
Machine Learning for Medical Imaging

Model Distillation : Big to Small Models

Published 10 months ago • 2 min read

Knowledge Distillation : From Teacher to Student


Have you heard of the term "model distillation" or "knowledge distillation"? It's a very cool concept in deep learning for compressing what a large model has learned into a smaller model. Here's how it works.

In the typical knowledge distillation process, the larger model (the teacher) is first trained on the dataset. Once trained, this model's predictions are used as "soft targets" for training the smaller model (the student).

The smaller model is then trained on the same dataset, but instead of relying only on the original hard labels (0 or 1 in a binary classification problem), it also uses the output probabilities of the larger model as soft targets.

The smaller model learns to mimic the larger model's behavior, including its handling of the more nuanced or borderline cases reflected in these soft targets.

This process allows the smaller model to generalize better and often achieve performance close to the larger model's, despite its reduced size and complexity.

However, it's worth noting that the performance of the distilled model largely depends on the quality of the teacher model.

If the teacher model is poorly trained or not sophisticated enough, the student model's performance will also be subpar.

Below you can see a code sample showing how to do this in PyTorch.

The main part to look at in the student training function is the calculation of the loss function.

It has two components:

1 - The traditional cross-entropy loss between the student's predictions and the true labels,

2 - The Kullback-Leibler (KL) divergence between the "softened" output distributions of the student and the teacher. The softmax is "softened" by dividing the logits by the temperature parameter, which is usually set to a value greater than 1.

Notice that you're adding 2 hyperparameters here: alpha and temperature.
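Here is a minimal sketch of what that student training function could look like in PyTorch. The model definitions, the data loader, and the default values for alpha and temperature are placeholders, not the exact code from this edition.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature, alpha):
    # Component 1: standard cross-entropy against the true (hard) labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # Component 2: KL divergence between the temperature-softened
    # distributions of the student and the teacher.
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (temperature ** 2)
    # alpha balances the two components.
    return alpha * hard_loss + (1.0 - alpha) * soft_loss

def train_student(student, teacher, loader, optimizer, device, temperature=3.0, alpha=0.5):
    teacher.eval()
    student.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        with torch.no_grad():  # the teacher is frozen, only used for soft targets
            teacher_logits = teacher(images)
        student_logits = student(images)
        loss = distillation_loss(student_logits, teacher_logits, labels, temperature, alpha)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Here, alpha balances the hard-label loss against the distillation term, and a temperature above 1 flattens both distributions so the student can see which wrong classes the teacher still considers plausible.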


Why do you need to know this technique?

Knowledge distillation can help compress the knowledge of big models into small ones. Think about it this way: what if you could deploy a model that's 10 times smaller than the original?

This opens up a lot of opportunities for edge deployment. For example, below I share how you can deploy PyTorch models on mobile phones.


Deploying PyTorch Models on Mobile Phones

Let’s say you want to deploy a PyTorch model in a mobile app (on the edge and not in the cloud). How would you do that?

Well, you can use PyTorch Mobile. Here’s how you’d go about it.

Train your model using PyTorch. Once that’s done, you convert it to a format that can be used by PyTorch Mobile.

This is often done using TorchScript. TorchScript allows you to serialize your models, meaning they can be loaded in a non-Python environment.
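As a rough sketch, the conversion step could look something like this. The ResNet model, the example input shape, and the file name are just stand-ins for your own trained model.

import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

# Stand-in for your own trained model; any nn.Module in eval mode works.
model = torchvision.models.resnet18(weights=None)
model.eval()

# Trace the model with an example input to produce a TorchScript module.
example_input = torch.rand(1, 3, 224, 224)
scripted_model = torch.jit.trace(model, example_input)

# Optimize the graph for mobile and save it for the PyTorch Mobile runtime.
mobile_model = optimize_for_mobile(scripted_model)
mobile_model._save_for_lite_interpreter("model.ptl")

Tracing works well for models without data-dependent control flow; otherwise torch.jit.script is the usual alternative. The saved file is what you bundle with the iOS or Android app.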
Then, to integrate the model into your mobile app, you have several options depending on your target environment: iOS or Android.

If you’re targeting iOS, you can use TorchModule.

If you’re targeting Android, you can use org.pytorch.Module.

These are libraries provided by PyTorch.

You can also deploy your PyTorch model inside a Flutter app by writing custom platform-specific code. You can check this article on how to do that.

Why do you need to know this?

A lot of companies are developing deep learning models that need to be deployed directly on mobile devices. They don't want to deploy their models in the cloud. This is mainly for security and privacy reasons, and in some cases so that the app keeps working even without internet access.


What'd you think of today's edition?

That's it for this week's edition, I hope you enjoyed it!

Machine Learning for Medical Imaging

by Nour Islam Mokhtari from pycad.co

👉 Learn how to build AI systems for the medical imaging domain by leveraging tools and techniques that I share with you! | 💡 The newsletter is read by people from: Nvidia, Baker Hughes, Harvard, NYU, Columbia University, University of Toronto and more!
