Transfer Learning with VGG

In the last article in the image processing series we discussed convolutional neural networks. We even created one, but admittedly it was very simple. The problem is that getting good results requires a lot of labelled images, but also a lot of resources, because we quickly have to stack a good number of layers in our deep neural network to reach high accuracy.

In short, time, resources and the choice of the network's hyperparameters can become key concerns for many projects.

What if we reused a piece of existing, pre-trained neural network?

This is exactly the goal of Transfer Learning! And to do this, many pre-trained neural networks are already available. In this article we will start with one of the most famous: VGG.

What is VGG?

VGG16 is a convolutional neural network model designed by K. Simonyan and A. Zisserman. The implementation details can be found in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. This model achieves 92.7% test accuracy on ImageNet, a dataset of more than 14 million images belonging to 1000 classes. Why VGG-16? Quite simply because this neural network comprises 16 weight layers: 13 convolution layers and 3 dense layers.
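If you want to check this count yourself, here is a quick sketch: it instantiates the architecture without downloading the pre-trained weights and counts the layers that actually carry parameters.

from tensorflow.keras.applications.vgg16 import VGG16
 
# weights=None builds the architecture without downloading the ImageNet weights
model = VGG16(weights=None)
weight_layers = [l.name for l in model.layers if l.count_params() > 0]
print(len(weight_layers))  # 16: 13 convolution layers + 3 dense layers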

Of course, you could build this neural network yourself, from scratch, then search for the best hyperparameters and finally train it. But that would take a lot of your time and resources… so why not reuse everything this model has already learned, and see how to complete the network by adding custom layers to it?

This is exactly what we are going to do in this article, and we will do it with labels/classes that VGG has never been trained on. Crazy, isn't it? That is part of the magic of convolutional neural networks: we will be able to reuse the feature maps produced by VGG to detect new shapes.
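To make this idea concrete, here is a small sketch (not from the original tutorial) that grabs the activations of one of VGG-16's early convolution layers; those activations are the feature maps we will be reusing:

from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
import numpy as np
 
# Build a probe model that outputs an early convolution layer's activations
vgg = VGG16(weights='imagenet', include_top=False, input_shape=(100, 100, 3))
probe = Model(inputs=vgg.input, outputs=vgg.get_layer('block1_conv2').output)
maps = probe.predict(np.zeros((1, 100, 100, 3)))  # any 100x100 RGB batch works here
print(maps.shape)  # (1, 100, 100, 64): 64 feature maps per image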

VGG & TensorFlow

Good news: TensorFlow provides the VGG-16 model out of the box and therefore makes Transfer Learning very easy.

In fact, it also provides other pre-trained models out of the box, such as:

  • VGG16
  • VGG19
  • ResNet50
  • Inception V3
  • Xception

You will see that reusing these models is child's play 🙂
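All of these live in tf.keras.applications and load the same way; for instance (a sketch, picking ResNet50 purely as an example):

from tensorflow.keras.applications import VGG16, VGG19, ResNet50, InceptionV3, Xception
 
# Every model is instantiated the same way, here with pre-trained ImageNet weights
backbone = ResNet50(weights='imagenet', include_top=False, input_shape=(100, 100, 3))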

Let’s build a fruit detector!

To test our custom VGG-16 Transfer Learning model we will use a dataset made up of fruit images (131 types of fruit to be exact). You can find this dataset on Kaggle at the following address: https://www.kaggle.com/moltean/fruits

Be careful: if you want to follow along step by step and build your own neural network, know that you will need computing power (GPU/TPU). So I suggest you do as I did and create a notebook on Kaggle.

You can view mine here: https://www.kaggle.com/shiftbc/fruit-and-vgg

Dataset presentation & discovery

The dataset is organized by directory (and you will see that this structure was not chosen at random):
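Here is what the layout looks like (a reconstructed sketch of the directory structure):

fruits-360/
├── Training/
│   ├── Apple Braeburn/
│   │   ├── 0_100.jpg
│   │   └── …
│   └── … (131 fruit sub-directories)
└── Test/
    └── … (the same 131 sub-directories)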

We have two datasets: Training and Test. Each of these directories contains one sub-directory per label, holding the photos of the different fruits.

Here is some additional information that may be useful later:

  • Images: 90483
  • One fruit per image
  • Training: 67703 images
  • Test: 19543 images
  • Labels/fruits: 131
  • Image Size: 100×100 pixels
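You can verify these numbers yourself once the dataset is downloaded (a quick sketch; the paths assume the archive was unzipped into a local fruits-360/ directory):

from glob import glob
 
# Count class directories and images in each split
print(len(glob("fruits-360/Training/*/")))       # 131 classes
print(len(glob("fruits-360/Training/*/*.jpg")))  # 67703 training images
print(len(glob("fruits-360/Test/*/*.jpg")))      # 19543 test images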

First steps …

Let's start by importing the necessary libraries:

import numpy as np
import pandas as pd
from glob import glob
 
# Using tensorflow.keras consistently avoids mixing the standalone keras
# package with TensorFlow's bundled Keras
from tensorflow.keras.layers import Input, Lambda, Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping
 
import matplotlib.pyplot as plt
from skimage.io import imread, imshow

Let’s look at an image:

image = imread("/content/drive/MyDrive/Colab Notebooks/fruits-360/Training/Apple Braeburn/0_100.jpg")
plt.imshow(image)

Perfect, we have a beautiful apple in color (RGB).

image.shape
(100, 100, 3)

This confirms the size of the images: 100 by 100 pixels.

Data Augmentation with TensorFlow

90,483 images is fine, but many more would be even better. I will use this article to introduce what is called “data augmentation”. The principle is very simple: the idea is to derive variants of an image by shifting, rotating and zooming it, so as to duplicate it in several copies. From one image we can therefore obtain several new ones, and thereby improve the training of our model.

With TensorFlow we will use what are called generators. There are several, but here we will use ImageDataGenerator, which does all this duplication work for us automatically.

To start, we configure how the image variants will be created, via the arguments of ImageDataGenerator().

Then we apply this generator to our two datasets (Training and Test):

src_path_train = "/content/drive/MyDrive/Colab Notebooks/fruits-360/Training"
src_path_test = "/content/drive/MyDrive/Colab Notebooks/fruits-360/Test"
batch_size = 32
IMSIZE = [100, 100]  # target size of the generated images
 
image_gen = ImageDataGenerator(
        rescale=1 / 255.0,      # normalize pixel values to [0, 1]
        rotation_range=20,      # random rotation of up to 20 degrees
        zoom_range=0.05,
        width_shift_range=0.05,
        height_shift_range=0.05,
        shear_range=0.05,
        horizontal_flip=True,
        fill_mode="nearest",
        validation_split=0.20)
 
# create generators (note: the same augmenting generator is applied to the
# Test directory here; a rescale-only generator is the more usual choice)
train_generator = image_gen.flow_from_directory(
  src_path_train,
  target_size=IMSIZE,
  shuffle=True,
  batch_size=batch_size,
)
 
test_generator = image_gen.flow_from_directory(
  src_path_test,
  target_size=IMSIZE,
  shuffle=True,
  batch_size=batch_size,
)
Found 67703 images belonging to 131 classes.
Found 19543 images belonging to 131 classes.

And there it is! Simple, isn't it?
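To see what the generator actually produces, we can pull a single batch and display a few of the augmented images (an illustrative sketch):

# Display the first 4 images of an augmented training batch
batch_images, batch_labels = next(train_generator)
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, img in zip(axes, batch_images):
    ax.imshow(img)  # pixel values already rescaled to [0, 1] by the generator
    ax.axis('off')
plt.show()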

Modeling

We will now create the model from VGG-16.

Three things to remember here:

  • We use the VGG16 class provided by TensorFlow (here with the imagenet weights); include_top=False means we take the whole model except its final classification layers
  • We freeze the layers of the network so as not to overwrite the weights already learned (layer.trainable = False)
  • We add a new Dense layer at the end (we could add more, by the way); it is this layer that will pick out one fruit or another.

NBCLASSES = 131
train_image_files = glob(src_path_train + '/*/*.jp*g')
test_image_files = glob(src_path_test + '/*/*.jp*g')
 
def create_model():
    vgg = VGG16(input_shape=IMSIZE + [3], weights='imagenet', include_top=False)
 
    # Freeze existing VGG already trained weights
    for layer in vgg.layers:
        layer.trainable = False
     
    # get the VGG output
    out = vgg.output
     
    # Add new dense layer at the end
    x = Flatten()(out)
    x = Dense(NBCLASSES, activation='softmax')(x)
     
    model = Model(inputs=vgg.input, outputs=x)
     
    model.compile(loss="categorical_crossentropy",  # multi-class (131 fruits) softmax output
                  optimizer="adam",
                  metrics=['accuracy'])
     
    model.summary()
     
    return model
 
mymodel = create_model()
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 100, 100, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 100, 100, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 100, 100, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 50, 50, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 50, 50, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 50, 50, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 25, 25, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 25, 25, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 25, 25, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 25, 25, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 12, 12, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 12, 12, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 12, 12, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 12, 12, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 6, 6, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 6, 6, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 6, 6, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 6, 6, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 3, 3, 512)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 4608)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 131)               603779    
=================================================================
Total params: 15,318,467
Trainable params: 603,779
Non-trainable params: 14,714,688
_________________________________________________________________

Upstream we find all the layers of VGG-16, with the added layer (Dense / 131) at the end. Also note the number of pre-trained parameters (14,714,688) that are reused as-is.
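As a quick sanity check (a minimal sketch; it assumes import tensorflow as tf), we can confirm that only the new Dense layer contributes trainable parameters:

import tensorflow as tf
 
# Only the new Dense layer should be trainable: 4608 x 131 weights + 131 biases
trainable = sum(tf.keras.backend.count_params(w) for w in mymodel.trainable_weights)
print(trainable)  # 603779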

Model Training

Training will take a long time, a very long time if you don't have a GPU locally, so use Colab or Kaggle if you don't have one.

epochs = 30
early_stop = EarlyStopping(monitor='val_loss', patience=2)  # stop once val_loss stops improving
 
# Keep the return value of fit() to plot the learning curves later
history = mymodel.fit(
  train_generator,
  validation_data=test_generator,
  epochs=epochs,
  steps_per_epoch=len(train_image_files) // batch_size,
  validation_steps=len(test_image_files) // batch_size,
  callbacks=[early_stop]
)
Epoch 1/10
2115/2115 [==============================] - 647s 304ms/step - loss: 0.0269 - accuracy: 0.6334 - val_loss: 0.0085 - val_accuracy: 0.8802
Epoch 2/10
2115/2115 [==============================] - 289s 136ms/step - loss: 0.0037 - accuracy: 0.9787 - val_loss: 0.0055 - val_accuracy: 0.9295
Epoch 3/10
2115/2115 [==============================] - 290s 137ms/step - loss: 0.0018 - accuracy: 0.9923 - val_loss: 0.0047 - val_accuracy: 0.9391
Epoch 4/10
2115/2115 [==============================] - 296s 140ms/step - loss: 0.0012 - accuracy: 0.9959 - val_loss: 0.0043 - val_accuracy: 0.9522
Epoch 5/10
2115/2115 [==============================] - 298s 141ms/step - loss: 8.8540e-04 - accuracy: 0.9967 - val_loss: 0.0040 - val_accuracy: 0.9524
Epoch 6/10
2115/2115 [==============================] - 298s 141ms/step - loss: 6.6982e-04 - accuracy: 0.9985 - val_loss: 0.0037 - val_accuracy: 0.9600
Epoch 7/10
2115/2115 [==============================] - 299s 142ms/step - loss: 5.5506e-04 - accuracy: 0.9984 - val_loss: 0.0035 - val_accuracy: 0.9613
Epoch 8/10
2115/2115 [==============================] - 353s 167ms/step - loss: 4.5906e-04 - accuracy: 0.9988 - val_loss: 0.0037 - val_accuracy: 0.9599
Epoch 9/10
2115/2115 [==============================] - 295s 139ms/step - loss: 3.9744e-04 - accuracy: 0.9987 - val_loss: 0.0033 - val_accuracy: 0.9680
Epoch 10/10
2115/2115 [==============================] - 296s 140ms/step - loss: 3.4436e-04 - accuracy: 0.9993 - val_loss: 0.0035 - val_accuracy: 0.9671
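Since we stored the return value of fit() in history, we can also plot the learning curves (a quick sketch using the matplotlib import from earlier):

plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()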

Quick evaluation

# evaluate() accepts generators directly (evaluate_generator is deprecated)
score = mymodel.evaluate(test_generator)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Test loss: 0.003505550790578127
Test accuracy: 0.9680007100105286

With an accuracy of 97% we can say that the model did its job very well… and did you see? All that in just a few lines of Python / TensorFlow.
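To go one step further, here is how you could classify a single image with the trained model. This is an illustrative sketch: the path simply reuses a file from the Training set, and we apply the same 1/255 rescaling as the generators.

# Classify one image and map the predicted index back to a fruit name
img = imread(src_path_train + "/Apple Braeburn/0_100.jpg") / 255.0
probs = mymodel.predict(np.expand_dims(img, 0))[0]
idx_to_class = {v: k for k, v in train_generator.class_indices.items()}
print(idx_to_class[np.argmax(probs)], probs.max())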

Once again I invite you to look at my notebook on Kaggle: https://www.kaggle.com/shiftbc/fruit-and-vgg

Try adding layers, changing settings, etc.

Benoit Cayla
