Vision Transformer Made Simpler


Hello Reader,

Welcome to another edition of PYCAD newsletter where we cover interesting topics in Machine Learning and Computer Vision applied to Medical Imaging. The goal of this newsletter is to help you stay up-to-date and learn important concepts in this amazing field! I've got some cool insights for you below ↓

Vision Transformer Made Simpler

I thought CNNs were a must when it comes to vision tasks. I thought transformers would never be able to fully replace them for vision.

I was wrong. Here’s why.

There is a new vision transformer model from Meta AI. It’s called Hiera.

This vision transformer outperforms previous models in both accuracy and speed.

The performance was measured on image classification and video classification tasks.

But what makes Hiera impressive is its simplicity in design.

This model does NOT have:

  • Convolutional layers.
  • Shifted windows.
  • Attention bias.

Adding these techniques has historically been necessary to make transformers work well enough for vision tasks.

Why were these techniques necessary to make vision transformers work well?

Because they addressed some important drawbacks of vanilla vision transformers, such as their lack of inductive bias.

So how did Hiera address these drawbacks without the use of these techniques?

By using a self-supervised learning technique for learning visual representations. The technique is called “masked pretraining”.

Below is an image that shows the full architecture of the model when trained with this technique.
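To give a rough idea of what masked pretraining does, here is a minimal sketch of the masking step only. The patch count, patch dimension, and mask ratio below are illustrative assumptions, not Hiera's exact configuration; the real setup works on learned patch embeddings and trains a decoder to reconstruct the hidden patches.

```python
import numpy as np

# Minimal sketch of the masking step in MAE-style masked pretraining.
# Shapes here are hypothetical: 196 patches (a 14x14 grid over a
# 224x224 image), each flattened to 768 values.

def mask_patches(patches, mask_ratio=0.6, seed=0):
    """Randomly hide a fraction of patches; the model learns by reconstructing them."""
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])   # patches the encoder actually sees
    mask_idx = np.sort(perm[n_keep:])   # patches the decoder must reconstruct
    return patches[keep_idx], keep_idx, mask_idx

patches = np.random.rand(196, 768)
visible, keep_idx, mask_idx = mask_patches(patches, mask_ratio=0.6)
print(visible.shape)  # (78, 768): only 40% of the patches reach the encoder
```

Because the encoder only processes the visible patches, pretraining is cheap, and the reconstruction objective forces the model to learn useful visual structure without labels.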

Hiera achieves some impressive results for image and video tasks. For example:

On the ImageNet-1K dataset, it achieved a 0.3% accuracy gain while running almost twice as fast at inference compared to ConvNeXt V2-B.

On the Kinetics-400 dataset, it achieved a 2.5% accuracy gain while being almost 3 times faster than ViT-B.

You can check out the paper here and the code here.


Improving Object Detection Results without Training

Detecting small objects in images has always been a difficult task for deep learning models. But there is an approach that can dramatically improve your results.

This approach is called Slicing Aided Hyper Inference, or SAHI for short.

It is extremely simple, yet very effective.

You basically take your input image and divide it into patches.

You then resize these patches and pass everything to your model: the original image and the resized patches.

Then you aggregate the results and filter them based on an IoU (intersection over union) threshold.
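The slicing, aggregation, and IoU-filtering steps above can be sketched from scratch like this. This is a minimal illustration, not the actual `sahi` package: `detect` is a hypothetical stand-in for any trained detector that returns boxes as `[x1, y1, x2, y2, score]` in the coordinates of the image it is given, and the slice size and overlap values are arbitrary.

```python
import numpy as np

def slice_image(image, slice_size=256, overlap=0.2):
    """Yield (patch, x_offset, y_offset) tiles covering the image."""
    h, w = image.shape[:2]
    step = int(slice_size * (1 - overlap))
    for y in range(0, h, step):
        for x in range(0, w, step):
            yield image[y:y + slice_size, x:x + slice_size], x, y

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, iou_thresh=0.5):
    """Greedy non-maximum suppression on [x1, y1, x2, y2, score] boxes."""
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    for b in boxes:
        if all(iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept

def sliced_inference(image, detect, slice_size=256, overlap=0.2, iou_thresh=0.5):
    """Run the detector on the full image and on every patch, then merge."""
    all_boxes = [list(b) for b in detect(image)]       # full-image pass
    for patch, x0, y0 in slice_image(image, slice_size, overlap):
        for x1, y1, x2, y2, s in detect(patch):        # per-patch pass
            # shift patch-local coordinates back into the full image
            all_boxes.append([x1 + x0, y1 + y0, x2 + x0, y2 + y0, s])
    return nms(all_boxes, iou_thresh)
```

In practice you would plug in your real detector for `detect` and tune `slice_size`, `overlap`, and `iou_thresh` to your data; the released `sahi` package also handles patch resizing and edge cases more carefully than this sketch.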
The technique can be used directly with your trained object detection model without any finetuning.

It can also be used as a data augmentation technique during training.

The results reported in the paper are very impressive.

Used without finetuning, it gives an AP increase of 6.8%, 5.1% and 5.3% for the FCOS, VFNet and TOOD detectors, respectively.

With finetuning, the gains grow to 12.7%, 13.4% and 14.5% in the same order.

You can read more about it in the original paper. You can also check the code here.


Cool AI Tools (Affiliates)

​BlogAssistant helps minimize AI content flags, so you can produce articles in over 30 languages that draw readers in without giving off creepy robot vibes. It also lets you retrieve images for your content that are always 100% royalty-free!



What'd you think of today's edition?


Machine Learning for Medical Imaging

πŸ‘‰ Learn how to build AI systems for medical imaging domain by leveraging tools and techniques that I share with you! | πŸ’‘ The newsletter is read by people from: Nvidia, Baker Hughes, Harvard, NYU, Columbia University, University of Toronto and more!
