Image processing (part 7) Convolution Neural Networks – CNN

Share this post

So here we are at the end of this serie on image processing. And how better to end a serie like this than by opening up to another world … the wide world of neural networks. Of course, it is impossible to deal with neural networks in a single article, and even less with convolutional neural networks (or CNN) in detail. Nevertheless, I will try to introduce you to this technique that can be found (without always knowing it) everywhere when dealing with images.

This article is a logical continuation of the previous article and assumes that you have a good understanding of Artificial Neural Networks (ANN). If not, you can also read this article (tutorial) I wrote about the Titanic. Of course, other articles specific to Neural Networks will soon appear on this site 😉

What is a CNN ?

Quite simply, a CNN (or Convolutional Neural Network) is an artificial neural network that has at least one convolutional layer. A convolutional layer being quite simply a layer in which we will apply a certain number of convolutional filters.

Ok, but, why apply convolution filters ?

Quite simply because an image contains a lot, but then a lot of input data. Imagine with a small image of 100 × 100 pixels in color … it already gives us 100x100x3, so 30,000 data to be sent to the neural network (and it’s a small image!). If you start to stack layers and neurons, very quickly the number of parameters of your network will explode and the number of calculations will grow exponentially… enough to put down your machine!

It was therefore necessary to find another approach than the classic one of ANN networks (or Multilayer Perceptron). The idea behind convolution filters is that they allow you to find patterns, shapes in images (remember the previous article which allowed you to find outlines for example). CNNs make it possible to gradually determine the different shapes and then to assemble them to find others.

The classic example is that the first layers of such a network find the basic shapes of a face: the main features, then we will detect the first shapes: nose, mouth, eyes, etc., then finally the face. and why not recognize the person, etc.

The main advantages of convolution filters are:

  • The number of parameters is much smaller to find compared to an ANN type approach. In the neural network will only have to find the values of the convolution matrix (kernel) that is to say a small matrix of the type 2 × 2 or 3 × 3!
  • The calculations are extremely simple because a convolution only requires multiplications and additions.

A Convolutional Neural Network (or CNN) is ultimately just a neural network that will gradually detect the characteristics of an image.

The CNN’s convolution layers

The architecture of such a network is very often articulated by a stack of convolutional layers then of deep dense layers which will do the decision work. To summarize the convolutional layers find the shapes and patterns in the image and the final layers will do the decision work (like classification for example).

Convolution layers include several filters. Each Convolution Filter – as we explained previously – the same layer will therefore extract or detect a characteristic of the image. So at the exit of a convolutional layer we have a set of characteristics which are materialized by what we call feature maps.

These characteristics (or resulting images of convolution filters) are then fed back into other filters, etc.

Build our own CCN now

Goal

To illustrate convolutional neural networks, we are going to create our own from scratch that will allow us to classify images. To do this we will use Python & TensorFlow 2.x (with keras) and we will use a classic dataset the MNSIT Fashion.

Dataset description

The dataset contains more than 70,000 grayscale images (see below):

Each image is a 28 × 28 pixel square.

Good news, Tensorflow includes its images in its API so no need to bother to retrieve the dataset. To make your life easier, I suggest you use colab (the notebook will be downloadable from GitHub of course).

This data set identifies 10 types of objects (labels). These labels are coded with numbers from 0 to 9:

  • 0 T-shirt/top
  • 1 Trouser
  • 2 Pullover
  • 3 Dress
  • 4 Coat
  • 5 Sandal
  • 6 Shirt
  • 7 Sneaker
  • 8 Bag
  • 9 Ankle boot

Get data

Let’s start by importing the libraries:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Dense, Conv2D, Input, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.models import Model
import pandas as pd
from sklearn.metrics import classification_report,confusion_matrix
from tensorflow.keras.callbacks import EarlyStopping
import seaborn as sns

The recovery of the data set as well as the splitting is strait forward:

dataset_fashion_mnsit = tf.keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = dataset_fashion_mnsit.load_data()

Now we have two datasets (training and testing). Let’s look at the distribution of labels:

pd.DataFrame(y_train)[0].value_counts()
9    6000
8    6000
7    6000
6    6000
5    6000
4    6000
3    6000
2    6000
1    6000
0    6000
Name: 0, dtype: int64

Excellent news, we have a very even distribution of these labels.

Data Preparation

Neural networks are very sensitive to data normalization. In the case of grayscale images this is very simple and since the pixels go from 0 to 255, we just have to divide all the pixels by 255:

X_train = X_train / 255
X_test = X_test / 255
print(f"Training data: {X_train.shape}, Test data: {X_test.shape}")
Training data: (60000, 28, 28), Test data: (10000, 28, 28)

Try to look at an image sample:

plt.imshow(X_train[0])

And its label:

y_train[0]
9

Label 9 match with an Ankle boot !

Since we have grayscale images we are missing one dimension (color: RGB). Nothing serious we will add it …

X_train = X_train.reshape(60000, 28, 28, 1)
X_test = X_test.reshape(10000, 28, 28, 1)

Modeling

I’m not going to detail everything here, but we will stack the layers of our CNN as follows:

This is how to create this Neural Network with TensorFlow :

mon_cnn = tf.keras.Sequential()
 
# 3 couches de convolution, avec Nb filtres progressif 32, 64 puis 128
mon_cnn.add(Conv2D(filters=32, kernel_size=(3,3), input_shape=(28, 28, 1), activation='relu'))
mon_cnn.add(MaxPooling2D(pool_size=(2, 2)))
 
mon_cnn.add(Conv2D(filters=64, kernel_size=(3,3),input_shape=(28, 28, 1), activation='relu'))
mon_cnn.add(MaxPooling2D(pool_size=(2, 2)))
 
mon_cnn.add(Conv2D(filters=64, kernel_size=(3,3),input_shape=(28, 28, 1), activation='relu'))
mon_cnn.add(MaxPooling2D(pool_size=(2, 2)))
 
# remise à plat
mon_cnn.add(Flatten())
 
# Couche dense classique ANN
mon_cnn.add(Dense(512, activation='relu'))
 
# Couche de sortie (classes de 0 à 9)
mon_cnn.add(Dense(10, activation='softmax'))

Note: the explanation of the different hypermarameters and layers (Conv2D and pooling in particular) will come in a next article.

In order not to fumble over the number of epochs to perform, I will use the technique of EarlyStopping which allows the learning to be stopped as soon as the model begins to over-train. This allows me to neglect this parameter (epochs).

early_stop = EarlyStopping(monitor='val_loss',patience=2)

Now we can compile the model:

mon_cnn.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
mon_cnn.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_2 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 1, 1, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 64)                0         
_________________________________________________________________
dense (Dense)                (None, 512)               33280     
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 94,154
Trainable params: 94,154
Non-trainable params: 0

We see in the summary that our model will have to learn 94,154 parameters, so it will take a few minutes during the training phase.

Training phase (fit)

Let’s start training. Notice the number of epochs (iterations / backpropagation) of 25:

mon_cnn.fit(x=X_train, 
            y=y_train, 
            validation_data=(X_test, y_test), 
            epochs=25,
            callbacks=[early_stop])
Epoch 1/25
1875/1875 [==============================] - 59s 31ms/step - loss: 0.7872 - accuracy: 0.7077 - val_loss: 0.4386 - val_accuracy: 0.8408
Epoch 2/25
1875/1875 [==============================] - 58s 31ms/step - loss: 0.4102 - accuracy: 0.8490 - val_loss: 0.3833 - val_accuracy: 0.8625
Epoch 3/25
1875/1875 [==============================] - 58s 31ms/step - loss: 0.3345 - accuracy: 0.8752 - val_loss: 0.3404 - val_accuracy: 0.8740
Epoch 4/25
1875/1875 [==============================] - 58s 31ms/step - loss: 0.2958 - accuracy: 0.8887 - val_loss: 0.3470 - val_accuracy: 0.8747
Epoch 5/25
1875/1875 [==============================] - 58s 31ms/step - loss: 0.2694 - accuracy: 0.8987 - val_loss: 0.3225 - val_accuracy: 0.8844
Epoch 6/25
1875/1875 [==============================] - 58s 31ms/step - loss: 0.2422 - accuracy: 0.9092 - val_loss: 0.3194 - val_accuracy: 0.8862
Epoch 7/25
1875/1875 [==============================] - 57s 31ms/step - loss: 0.2329 - accuracy: 0.9115 - val_loss: 0.3220 - val_accuracy: 0.8851
Epoch 8/25
1875/1875 [==============================] - 58s 31ms/step - loss: 0.2058 - accuracy: 0.9217 - val_loss: 0.3184 - val_accuracy: 0.8898
Epoch 9/25
1875/1875 [==============================] - 58s 31ms/step - loss: 0.1969 - accuracy: 0.9271 - val_loss: 0.3080 - val_accuracy: 0.8962
Epoch 10/25
1875/1875 [==============================] - 58s 31ms/step - loss: 0.1827 - accuracy: 0.9314 - val_loss: 0.3258 - val_accuracy: 0.8890
Epoch 11/25
1875/1875 [==============================] - 58s 31ms/step - loss: 0.1740 - accuracy: 0.9318 - val_loss: 0.3455 - val_accuracy: 0.8878

Please note that the earlystopping condition allows stopping before 25 iterations (stopping after 11).

Model evaluation

TensorFlow has set aside the accuracy and loss information during the training phase and for each epoch. We just need to recover them:

losses = pd.DataFrame(mon_cnn.history.history)
losses[['accuracy', 'val_accuracy']].plot()

The orange curve represents the accuracy on the training data, the blue the accuracy on the test data. We also note that even if the accuracy continues to improve on the training data while the accuracyon the test data has flattened and even decreased. We then begin to over-fit, which is why the early-stopping stopped the process.

We can also see the loss curve:

losses[['loss', 'val_loss']].plot()

Let’s look at the confusion matrix (with a heat map with Seaborn):

plt.figure(figsize=(12,8))
sns.heatmap(confusion_matrix(y_test, pred),annot=True)

We can see that there are errors / confusion especially between shirts (6) and tops (0), which is not really surprising given the quality of the images.

Prediction

Let’s try our model on an image. For this test we will take the image from the beginning and see how our model behaves:

img = X_train[0]
mon_cnn.predict(img.reshape(1,28,28,1))
array([[3.9226734e-07, 8.9244217e-08, 6.7499624e-11, 4.7707250e-08,
        1.1513226e-08, 1.3388344e-05, 9.8523687e-09, 7.1390239e-03,
        6.6544054e-08, 9.9284691e-01]], dtype=float32)

The returned array actually offers a probability of result for each class … To get the most probable, all you have to do is take the greatest value:

np.argmax(mon_cnn.predict(img.reshape(1,28,28,1)), axis=-1)[0]
9

Our model works pretty well!

This concludes this series on image management. If you liked it, please let me know in the comments. I am well aware of having covered the subject but also had the idea somewhere … namely not to go into too much detail in order to be able to embark on this fascinating subject.

Share this post

Benoit Cayla

In more than 15 years, I have built-up a solid experience around various integration projects (data & applications). I have, indeed, worked in nine different companies and successively adopted the vision of the service provider, the customer and the software editor. This experience, which made me almost omniscient in my field naturally led me to be involved in large-scale projects around the digitalization of business processes, mainly in such sectors like insurance and finance. Really passionate about AI (Machine Learning, NLP and Deep Learning), I joined Blue Prism in 2019 as a pre-sales solution consultant, where I can combine my subject matter skills with automation to help my customers to automate complex business processes in a more efficient way. In parallel with my professional activity, I run a blog aimed at showing how to understand and analyze data as simply as possible: datacorner.fr Learning, convincing by the arguments and passing on my knowledge could be my caracteristic triptych.

View all posts by Benoit Cayla →

4 thoughts on “Image processing (part 7) Convolution Neural Networks – CNN

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Fork me on GitHub