fast.ai v3 Lesson 1: Image classification
I’m doing some 5-minute summaries of fast.ai Practical Deep Learning for Coders lessons.
This is lesson 1.
This is partly an introduction to the course, and partly instant gratification: it shows you a world-class image classifier.
The course uses the fastai library, which sits on top of PyTorch (similar to how Keras sits on top of TensorFlow and other frameworks).
The library gives you several abstractions to simplify “doing things in Deep Learning”.
One such abstraction is the `DataBunch`. Since data can be organised in many different ways and formats, using a `DataBunch` and passing in functions that specify how to get the data will in theory simplify things for you. For common cases you won't really need to think about that kind of thing: just use the `DataBunch` and get started. It even gives you a validation set automatically. In this particular case, we're using the subclass `ImageDataBunch`, as we're dealing with images.
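That automatic validation set is essentially a random holdout split. Here is a minimal sketch in plain Python of the idea (not the actual fastai implementation; the 20% default is what I believe the library uses):

```python
import random

def split_train_valid(items, valid_pct=0.2, seed=42):
    """Randomly hold out a fraction of items as a validation set,
    roughly what an ImageDataBunch does when you don't supply one."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]

filenames = [f"img_{i}.jpg" for i in range(100)]
train, valid = split_train_valid(filenames)
print(len(train), len(valid))  # → 80 20
```

The important property is that the two sets never overlap, so validation metrics are measured on images the model never trained on.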
Another one is the concept of a `Learner`. If you're familiar with machine learning, this is a thin wrapper on top of a model, and it makes it hard to accidentally build a bad model architecture. In this particular lesson we used the `ConvLearner`, which I believe takes a pre-trained ImageNet model (more on that soon) that you specify, and adds a couple of fully connected layers on top whose size depends on your labels. If you don't know what that means, we'll learn about fully connected layers, convolutional layers, etc. later in the course.
Training a world-class image classifier
It has only 2 steps, really (or 3, depending on how you count):
- Use transfer learning with the pre-trained part frozen, updating only the weights of the fully connected layers
- Unfreeze the pre-trained part and fine-tune using "discriminative learning rates" (more on those below).
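The two stages above can be sketched with a toy "frozen" flag that simply skips the update for the pre-trained parameters. Everything here (names, values, the one-number "layers") is made up for illustration:

```python
# Stage 1 trains only the new head while the pre-trained body is
# frozen; stage 2 unfreezes everything and fine-tunes.

params = {
    "body": {"value": 1.0, "frozen": True},   # pre-trained layers
    "head": {"value": 0.0, "frozen": False},  # new fully connected layers
}

def train_step(params, grads, lr=0.1):
    """One SGD step that leaves frozen parameters untouched."""
    for name, p in params.items():
        if not p["frozen"]:
            p["value"] -= lr * grads[name]

# Stage 1: frozen body, only the head moves.
train_step(params, {"body": 1.0, "head": 1.0})
assert params["body"]["value"] == 1.0  # body unchanged

# Stage 2: unfreeze and fine-tune everything.
params["body"]["frozen"] = False
train_step(params, {"body": 1.0, "head": 1.0})
print(params["body"]["value"], params["head"]["value"])  # → 0.9 -0.2
```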
Also, your default pre-trained model should be ResNet (either resnet34 or resnet50). For the love of God, don’t use VGG.
Transfer learning
Leveraging a model that was pre-trained on one dataset to solve a different task. You take the pre-trained model, chop off the last layer, and add new fully connected layers. In practice this gives you great results.
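Here is a toy numeric sketch of that chop-and-replace idea: the "body" stands in for the frozen pre-trained layers (a fixed feature extractor), and only the new head's weights get trained. All names, numbers, and the training target are illustrative, not fastai code:

```python
def pretrained_body(x):
    # Stand-in for the frozen pre-trained layers: a fixed feature map.
    return [x, x * x]

def head(features, w):
    # The new fully connected layer whose weights we train.
    return sum(f * wi for f, wi in zip(features, w))

# Fit the head to a toy target y = 3*x + 2*x^2 with plain SGD.
data = [(x, 3 * x + 2 * x * x) for x in [-2, -1, 0, 1, 2]]
w = [0.0, 0.0]
lr = 0.02
for _ in range(500):
    for x, y in data:
        feats = pretrained_body(x)  # body is frozen: never updated
        err = head(feats, w) - y
        w = [wi - lr * 2 * err * f for wi, f in zip(w, feats)]

print([round(wi, 2) for wi in w])  # → [3.0, 2.0]
```

Only `w` ever changes; the body's "knowledge" is reused as-is, which is why this works so well when the pre-trained features are already good.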
Discriminative learning rates
A learning rate is a number that tells the algorithm how much to update the weights at each batch. Since it’s unlikely that the earlier pre-trained layers would need much updating, they should have smaller learning rates than the final layers.
This technique is called discriminative learning rates. In an earlier version of the course, this was called “differential learning rates”.
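The idea can be sketched as an SGD step where each layer group carries its own learning rate, smallest for the earliest pre-trained layers. Group names and rates below are made up for illustration:

```python
layer_groups = {
    "early_conv": {"weights": [0.5, -0.2], "lr": 1e-5},  # barely moves
    "mid_conv":   {"weights": [0.1, 0.3],  "lr": 1e-4},
    "head":       {"weights": [0.0, 0.0],  "lr": 1e-3},  # moves the most
}

def sgd_step(groups, grads):
    """One update where each group uses its own learning rate."""
    for name, group in groups.items():
        group["weights"] = [w - group["lr"] * g
                            for w, g in zip(group["weights"], grads[name])]

# Same gradient everywhere, but each group steps a different distance.
grads = {name: [1.0, 1.0] for name in layer_groups}
sgd_step(layer_groups, grads)
print(layer_groups["head"]["weights"])  # → [-0.001, -0.001]
```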
Learning rate finder
An algorithm to find a good learning rate, by a researcher named Leslie Smith. The particular algorithm seems to have been updated since v2 of the course, but it wasn't described in detail in lesson 1 yet.
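A simplified sketch of the idea, run on a one-dimensional quadratic loss: grow the learning rate exponentially, take one step at each rate while recording the loss, stop if it blows up, and suggest the rate where the loss fell fastest. The real finder works on mini-batches and smooths the loss curve; everything here is illustrative:

```python
def lr_finder(loss, grad, w0=0.0, lr_min=1e-5, lr_max=10.0, steps=100):
    """Sketch of an LR range test: sweep the learning rate from
    lr_min to lr_max exponentially, recording the loss after each
    step, and bail out early once the loss diverges."""
    lrs, losses = [], []
    w = w0
    for i in range(steps):
        lr = lr_min * (lr_max / lr_min) ** (i / (steps - 1))
        w = w - lr * grad(w)
        l = loss(w)
        lrs.append(lr)
        losses.append(l)
        if l > 4 * losses[0]:  # loss exploding: stop the sweep
            break
    # Heuristic: suggest the lr that produced the steepest loss drop.
    drops = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]
    return lrs[drops.index(max(drops))]

# Toy loss (w - 3)^2 with gradient 2*(w - 3).
best = lr_finder(loss=lambda w: (w - 3) ** 2,
                 grad=lambda w: 2 * (w - 3))
print(f"suggested lr ≈ {best:.3g}")
```

With a too-small rate the loss barely moves, and with a too-large one it diverges; the suggestion lands in the region in between, which is exactly what you read off the finder's plot in the lesson.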