Hi Reader, Welcome to the PYCAD newsletter, where every week you receive doses of machine learning and computer vision techniques and tools to help you learn how to build AI solutions that empower the most vulnerable members of our society: patients.
Foundation models such as SAM (Segment Anything Model) produce really impressive detection and segmentation results. But they are almost useless for building real-world products. Why?

Because they are too big, too slow, and too general.

Nonetheless, they can be used to build deep learning systems incredibly quickly. How?

By letting them auto-annotate your dataset!

A technique called Autodistill enables you to do exactly this.

Here are the steps:

1 - You take a foundation model such as SAM, run your images through it, and collect its outputs.

2 - You use prompting to choose which outputs to keep and which to discard. The kept outputs become your annotated dataset.

3 - You use these annotations to train a leaner, more specialized model such as YOLOv8.

Although you could build such a pipeline yourself, I recommend first taking a look at the autodistill package.

It lets you perform all of these steps in an easy way, as in the sketch below.
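Here is a minimal sketch of that workflow using autodistill's Grounded SAM and YOLOv8 plugins. The "tumor" prompt, the folder paths, and the epoch count are made-up placeholders, and the module and argument names reflect the package at the time of writing, so double-check them against the current docs:

```python
# pip install autodistill autodistill-grounded-sam autodistill-yolov8
from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM
from autodistill_yolov8 import YOLOv8

# Steps 1 and 2: prompt a SAM-based foundation model to auto-label raw images.
# The ontology maps a text prompt to the class name saved in the annotations.
base_model = GroundedSAM(ontology=CaptionOntology({"tumor": "tumor"}))
base_model.label(input_folder="./images", output_folder="./dataset")

# Step 3: distill the auto-labeled dataset into a small, fast YOLOv8 model.
target_model = YOLOv8("yolov8n.pt")
target_model.train("./dataset/data.yaml", epochs=50)
```

The result is a compact model trained on your own data that you can deploy like any other YOLOv8 checkpoint.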
You can also test the package directly inside a Google Colab notebook here.

This new tracking technique is a game changer!

It's from a paper titled "Tracking Everything Everywhere All at Once"!

Classical tracking algorithms such as pairwise optical flow lose track of objects when they are occluded, and they can produce inconsistencies when correspondences are chained over multiple frames.

The researchers behind this paper argue that the pairwise, frame-to-frame way these classical algorithms represent motion is simply too limited.

Therefore, they propose a global motion representation that provides accurate and consistent tracking, even through occlusion.

This global motion is represented as a data structure that encodes the trajectories of all points in a scene.

The proposed representation is called OmniMotion.

OmniMotion makes it possible to follow any point from one frame of a video to another while consistently keeping track of the 3D context of the scene.
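To make the idea concrete, here is a toy sketch (my own illustration, not the authors' code) of the core mechanism: every frame gets an invertible mapping into one shared canonical volume, and tracking a point from frame i to frame j amounts to composing frame i's mapping with the inverse of frame j's. The paper learns invertible neural networks over a quasi-3D canonical volume for each video; the simple affine maps below are placeholders that keep the sketch short while remaining exactly invertible:

```python
import torch
import torch.nn as nn

class FrameToCanonical(nn.Module):
    """Invertible mapping from one frame's local 3D volume into the shared
    canonical volume (a learned affine map standing in for the paper's
    invertible neural network)."""

    def __init__(self):
        super().__init__()
        self.A = nn.Parameter(torch.eye(3) + 0.01 * torch.randn(3, 3))
        self.b = nn.Parameter(torch.zeros(3))

    def forward(self, x):
        # Frame-local point -> canonical point: u = A x + b
        return x @ self.A.T + self.b

    def inverse(self, u):
        # Canonical point -> frame-local point: x = A^{-1} (u - b)
        return (u - self.b) @ torch.inverse(self.A).T


def track(point, src_map, dst_map):
    """Carry a 3D point from a source frame to a destination frame by
    passing through the canonical volume: x_j = T_j^{-1}(T_i(x_i))."""
    return dst_map.inverse(src_map(point))


frame_maps = [FrameToCanonical() for _ in range(10)]  # one mapping per frame
x0 = torch.tensor([0.2, -0.1, 1.5])                   # a point seen in frame 0
x7 = track(x0, frame_maps[0], frame_maps[7])          # its location in frame 7
print(x7)
```

Because every frame routes through the same canonical volume, correspondences stay consistent across the whole video instead of drifting the way chained pairwise flow does.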
Below you can see the result of this technique.

You can find out more in the original paper, and you can also check out a demo here.