Building a CNN

Posted on Wed 18 March 2020 in deep learning

This notebook describes how to build a CNN from scratch to classify images of dogs and cats. The following material contains my notes from Deep Learning with Python [1].

In [2]:
from __future__ import print_function
import numpy as np
from keras import layers
from keras import models
from keras import optimizers
from keras.models import load_model
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image
import matplotlib.pyplot as plt
import os
In [3]:
input_h = 150       # height
input_w = 150       # width
input_c = 3         # number of channels
save_model = True   # flag

In this case, the images of the dataset will be fed into the model using a generator. There are several advantages of using generators:

  • Instead of loading the whole dataset into memory at once, the images are loaded in batches.
  • The images can be preprocessed (rescaled, converted to grayscale, etc.) before the training step.
  • The number of images can be increased via data augmentation (see the sketch after this list).
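As a quick illustration of that last point, an augmented generator could be defined as follows. This is only a sketch; the parameter values are illustrative and are not used anywhere in this post:

# illustrative augmentation settings (not used in this post)
augmented_datagen = ImageDataGenerator(rescale=1./255,
                                       rotation_range=40,
                                       width_shift_range=0.2,
                                       height_shift_range=0.2,
                                       horizontal_flip=True)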
In [4]:
# --- paths ---
# train (cats):       /media/data/dogs_vs_cats_small/train/cats
# train (dogs):       /media/data/dogs_vs_cats_small/train/dogs
# validation (cats):  /media/data/dogs_vs_cats_small/validation/cats
# validation (dogs):  /media/data/dogs_vs_cats_small/validation/dogs
# test (cats):        /media/data/dogs_vs_cats_small/test/cats
# test (dogs):        /media/data/dogs_vs_cats_small/test/dogs

# --- data preprocessing ---
train_datagen = ImageDataGenerator(rescale=1./255)          # rescale all images by 1/255
validation_datagen = ImageDataGenerator(rescale=1./255)

train_path = "/media/data/dogs_vs_cats_small/train"
train_generator = train_datagen.flow_from_directory(train_path,
                                                    target_size=(input_h, input_w),
                                                    batch_size=20,
                                                    class_mode="binary"
                                                    )

validation_path = "/media/data/dogs_vs_cats_small/validation"
validation_generator = validation_datagen.flow_from_directory(validation_path,
                                                            target_size=(input_h, input_w),
                                                            batch_size=20,
                                                            class_mode="binary"
                                                            )
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
In [5]:
# check generator
for data_batch, labels_batch in train_generator:
    print("data batch shape:  ", data_batch.shape)
    print("labels batch shape:", labels_batch.shape)
    break
data batch shape:   (20, 150, 150, 3)
labels batch shape: (20,)

In this case, the generator yields a batch of 20 images each time. The shape of each image is (150, 150, 3), that is, height=150, width=150, and channels=3. Notice that the generators modify the input images in two ways:

  • The range of the pixel values changes from [0, 255] to [0, 1] by using rescale=1./255 (see the quick check below).
  • The size of the images changes to (150, 150) by using target_size=(input_h, input_w).
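
A quick sanity check of the first point, reusing data_batch from the cell above:

print(data_batch.min(), data_batch.max())   # expected to lie within [0.0, 1.0]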

The generators also assign a label to each image automatically, based on the names of the subdirectories (cats and dogs). Notice that such labels are encoded in the class_indices property:

In [6]:
# check labels
print("train mapping:", train_generator.class_indices)
print("validation mapping:", validation_generator.class_indices)

# mapping
class_indices = train_generator.class_indices
train mapping: {'cats': 0, 'dogs': 1}
validation mapping: {'cats': 0, 'dogs': 1}

It is worth mentioning that this property will be needed later when classifying unseen images.

The next block shows how to build the model:

In [8]:
# --- build model ---
model = models.Sequential()

model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(input_h, input_w, input_c)))
model.add(layers.MaxPooling2D((2,2)))

model.add(layers.Conv2D(64, (3,3), activation="relu"))
model.add(layers.MaxPooling2D((2,2)))

model.add(layers.Conv2D(128, (3,3), activation="relu"))
model.add(layers.MaxPooling2D((2,2)))

model.add(layers.Conv2D(128, (3,3), activation="relu"))
model.add(layers.MaxPooling2D((2,2)))

model.add(layers.Flatten())

model.add(layers.Dense(512, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))

The next block displays a summary of the model. Notice how the shape of the convolution/max-pooling outputs changes from the shallow to the deep layers. From [1], page 123, the convolution layers operate on feature maps: 3D tensors of shape (height, width, number_of_filters), where number_of_filters is given when the layer is defined. For instance, the feature map of the first layer has shape (148, 148, 32), that is, there are 32 filters. After the last pooling layer, the (7, 7, 128) feature map is flattened into a vector of 7 × 7 × 128 = 6272 values, which feeds the dense classifier. A sketch of the shape arithmetic follows.
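The spatial sizes in the summary can be reproduced by hand: a 3×3 convolution with no padding removes 2 pixels from each spatial dimension, and a 2×2 max pooling halves it (rounding down). A minimal sketch:

def conv_out(size, kernel=3):
    return size - kernel + 1    # "valid" padding, stride 1

def pool_out(size, pool=2):
    return size // pool         # 2x2 max pooling, stride 2

size = 150
for _ in range(4):              # four conv/pool stages
    size = pool_out(conv_out(size))

print(size)                     # 7, matching the (7, 7, 128) row in the summary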

In [9]:
# check the model
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_5 (Conv2D)            (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 15, 15, 128)       147584    
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 6272)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 512)               3211776   
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 513       
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________
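
As a cross-check, the parameter counts in the summary can also be reproduced by hand: each Conv2D filter has kernel_height × kernel_width × input_channels weights plus one bias, and each Dense unit has one weight per input plus one bias. For instance:

conv1_params = (3 * 3 * 3 + 1) * 32         # 3x3 kernel over 3 channels, 32 filters -> 896
dense1_params = (7 * 7 * 128 + 1) * 512     # 6272 flattened inputs + bias, 512 units -> 3211776
print(conv1_params, dense1_params)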

In addition, we can plot a diagram of the model:

In [11]:
from keras.utils import plot_model
plot_model(model, show_shapes=True)
Out[11]: (diagram of the model architecture, with layer shapes)
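
Since the model ends in a single sigmoid unit for a binary problem, the compilation below uses the binary_crossentropy loss, together with RMSprop and a small learning rate: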
In [10]:
# --- compile the model ---
model.compile(loss="binary_crossentropy",
                optimizer=optimizers.RMSprop(lr=1e-4),
                metrics=["acc"])
In [11]:
# Listing 5.8 Fitting the model using a batch generator
# --- training ---
# steps_per_epoch=100: the generator yields batches of 20 images and there
# are 2000 training images, so 100 steps cover one epoch; likewise,
# validation_steps=50 covers the 1000 validation images.
history = model.fit_generator(
            train_generator,
            steps_per_epoch=100,
            epochs=30,
            validation_data=validation_generator,
            validation_steps=50
            )

# Listing 5.9 Saving the model
# --- save model ---
if save_model:
    filename = "dogs_vs_cats_v1.h5"
    model.save(filename)
    print("model saved:", filename)

Epoch 1/30
100/100 [==============================] - 67s 673ms/step - loss: 0.6892 - acc: 0.5425 - val_loss: 0.6739 - val_acc: 0.5810
Epoch 2/30
100/100 [==============================] - 69s 687ms/step - loss: 0.6546 - acc: 0.6095 - val_loss: 0.6513 - val_acc: 0.6070
Epoch 3/30
100/100 [==============================] - 63s 632ms/step - loss: 0.6129 - acc: 0.6625 - val_loss: 0.6164 - val_acc: 0.6670
Epoch 4/30
100/100 [==============================] - 63s 627ms/step - loss: 0.5752 - acc: 0.6915 - val_loss: 0.5900 - val_acc: 0.6960
Epoch 5/30
100/100 [==============================] - 63s 629ms/step - loss: 0.5364 - acc: 0.7320 - val_loss: 0.6527 - val_acc: 0.6400
Epoch 6/30
100/100 [==============================] - 63s 628ms/step - loss: 0.5102 - acc: 0.7525 - val_loss: 0.5557 - val_acc: 0.7160
Epoch 7/30
100/100 [==============================] - 63s 631ms/step - loss: 0.4748 - acc: 0.7715 - val_loss: 0.5520 - val_acc: 0.7220
Epoch 8/30
100/100 [==============================] - 63s 629ms/step - loss: 0.4510 - acc: 0.7865 - val_loss: 0.6259 - val_acc: 0.6690
Epoch 9/30
100/100 [==============================] - 63s 628ms/step - loss: 0.4268 - acc: 0.8060 - val_loss: 0.5806 - val_acc: 0.7080
Epoch 10/30
100/100 [==============================] - 63s 630ms/step - loss: 0.4003 - acc: 0.8180 - val_loss: 0.5262 - val_acc: 0.7430
Epoch 11/30
100/100 [==============================] - 63s 627ms/step - loss: 0.3692 - acc: 0.8405 - val_loss: 0.5364 - val_acc: 0.7370
Epoch 12/30
100/100 [==============================] - 63s 632ms/step - loss: 0.3500 - acc: 0.8520 - val_loss: 0.5390 - val_acc: 0.7450
Epoch 13/30
100/100 [==============================] - 63s 631ms/step - loss: 0.3129 - acc: 0.8705 - val_loss: 0.6195 - val_acc: 0.7280
Epoch 14/30
100/100 [==============================] - 63s 628ms/step - loss: 0.2970 - acc: 0.8770 - val_loss: 0.6114 - val_acc: 0.7280
Epoch 15/30
100/100 [==============================] - 63s 627ms/step - loss: 0.2786 - acc: 0.8875 - val_loss: 0.5861 - val_acc: 0.7420
Epoch 16/30
100/100 [==============================] - 63s 628ms/step - loss: 0.2595 - acc: 0.8965 - val_loss: 0.5786 - val_acc: 0.7410
Epoch 17/30
100/100 [==============================] - 63s 629ms/step - loss: 0.2343 - acc: 0.9025 - val_loss: 0.6163 - val_acc: 0.7300
Epoch 18/30
100/100 [==============================] - 63s 631ms/step - loss: 0.2142 - acc: 0.9145 - val_loss: 0.5869 - val_acc: 0.7390
Epoch 19/30
100/100 [==============================] - 63s 630ms/step - loss: 0.1881 - acc: 0.9355 - val_loss: 0.6111 - val_acc: 0.7450
Epoch 20/30
100/100 [==============================] - 63s 632ms/step - loss: 0.1724 - acc: 0.9375 - val_loss: 0.6596 - val_acc: 0.7430
Epoch 21/30
100/100 [==============================] - 63s 630ms/step - loss: 0.1493 - acc: 0.9500 - val_loss: 0.6704 - val_acc: 0.7360
Epoch 22/30
100/100 [==============================] - 63s 631ms/step - loss: 0.1332 - acc: 0.9600 - val_loss: 0.6970 - val_acc: 0.7500
Epoch 23/30
100/100 [==============================] - 63s 630ms/step - loss: 0.1225 - acc: 0.9580 - val_loss: 0.7396 - val_acc: 0.7440
Epoch 24/30
100/100 [==============================] - 63s 632ms/step - loss: 0.1048 - acc: 0.9685 - val_loss: 0.7707 - val_acc: 0.7390
Epoch 25/30
100/100 [==============================] - 63s 633ms/step - loss: 0.0938 - acc: 0.9705 - val_loss: 0.9160 - val_acc: 0.7280
Epoch 26/30
100/100 [==============================] - 63s 630ms/step - loss: 0.0774 - acc: 0.9775 - val_loss: 0.7824 - val_acc: 0.7530
Epoch 27/30
100/100 [==============================] - 63s 630ms/step - loss: 0.0685 - acc: 0.9780 - val_loss: 0.8119 - val_acc: 0.7430
Epoch 28/30
100/100 [==============================] - 63s 629ms/step - loss: 0.0635 - acc: 0.9805 - val_loss: 0.8990 - val_acc: 0.7260
Epoch 29/30
100/100 [==============================] - 63s 632ms/step - loss: 0.0565 - acc: 0.9845 - val_loss: 0.8618 - val_acc: 0.7430
Epoch 30/30
100/100 [==============================] - 63s 630ms/step - loss: 0.0420 - acc: 0.9895 - val_loss: 0.9173 - val_acc: 0.7510
model saved: dogs_vs_cats_v1.h5

The next block plots the training and validation accuracy and loss curves.

In [12]:
# --- plotting ---
acc = history.history["acc"]
val_acc = history.history["val_acc"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]

epochs = range(1, len(acc)+1)

plt.plot(epochs, acc, "bo", label="Training Acc")
plt.plot(epochs, val_acc, "b", label="Validation Acc")
plt.title("Training and validation accuracy")
plt.legend()

plt.figure()

plt.plot(epochs, loss, "bo", label="Training Loss")
plt.plot(epochs, val_loss, "b", label="Validation Loss")
plt.title("Training and validation loss")
plt.legend()

plt.show()

From the figures, the model starts to overfit after roughly the 10th epoch: the training accuracy keeps climbing towards 1.0, while the validation accuracy plateaus around 0.73-0.75 and the validation loss starts to increase.
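
One common remedy, covered later in [1], is to fight overfitting with dropout and data augmentation. As a minimal sketch of the dropout part only (the 0.5 rate is illustrative), the classifier head defined earlier could be rewritten as:

model.add(layers.Flatten())
model.add(layers.Dropout(0.5))    # randomly zero half the activations during training
model.add(layers.Dense(512, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))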

In the following, we will classify a few images.

In [5]:
# --- paths ----
# train (cats):       /media/data/dogs_vs_cats_small/train/cats
# train (dogs):       /media/data/dogs_vs_cats_small/train/dogs
# validation (cats):  /media/data/dogs_vs_cats_small/validation/cats
# validation (dogs):  /media/data/dogs_vs_cats_small/validation/dogs
# test (cats):        /media/data/dogs_vs_cats_small/test/cats
# test (dogs):        /media/data/dogs_vs_cats_small/test/dogs

# --- display images (they are not preprocessed yet) ---
test_dir_cats = "/media/data/dogs_vs_cats_small/test/cats"
test_dir_dogs = "/media/data/dogs_vs_cats_small/test/dogs"

fnames_dogs = [os.path.join(test_dir_dogs, fname) for fname in os.listdir(test_dir_dogs)]
fnames_cats = [os.path.join(test_dir_cats, fname) for fname in os.listdir(test_dir_cats)]


# convert PIL to np.array (cast to np.int in the range [0, 255])
img_path = fnames_dogs[4]
img = image.load_img(img_path)
x = image.img_to_array(img).astype(int)
plt.figure()
plt.imshow(x)

# convert PIL to np.array (as np.float32 in the range [0, 1])
img_path = fnames_dogs[5]
img = image.load_img(img_path)
x = image.img_to_array(img) * 1./255
plt.figure()
plt.imshow(x)

# convert PIL to np.array (cast to np.int in the range [0, 255])
img_path = fnames_cats[4]
img = image.load_img(img_path)
x = image.img_to_array(img).astype(int)
plt.figure()
plt.imshow(x)

# convert PIL to np.array (as np.float32 in the range [0, 1])
img_path = fnames_cats[5]
img = image.load_img(img_path)
x = image.img_to_array(img) * 1./255
plt.figure()
plt.imshow(x)

plt.show()

Image classification

In [6]:
%matplotlib inline

def display_batch(batch, label="", limit=10):
    """
    Plots the images in the batch
    """
    
    for i, img in enumerate(batch):
        
        plt.figure()
        plt.imshow(img)
        title = "class: %s id: %d" % (label, i)
        plt.title(title)
        
        if (i+1) == limit:
            break

def create_batch(fnames):
    """
    Load images from a list of file paths.
    The images are preprocessed to match the shape and size expected by the model:
      - The images are rescaled in the range [0, 1]
      - The size is (150, 150)
    The output is a tensor of shape (n_samples, height, width, n_channels)
    """
    
    images = []
    
    for fname in fnames:
        
        # convert PIL to np.array (as np.float32 in the range [0, 1])
        img = image.load_img(fname,
                            color_mode="rgb",
                            target_size=(150, 150))
        x = image.img_to_array(img) * 1./255

        images.append(x)
        
    return np.array(images)
        
test_dir_cats = "/media/data/dogs_vs_cats_small/test/cats"
test_dir_dogs = "/media/data/dogs_vs_cats_small/test/dogs"

fnames_dogs = [os.path.join(test_dir_dogs, fname) for fname in os.listdir(test_dir_dogs)]
fnames_cats = [os.path.join(test_dir_cats, fname) for fname in os.listdir(test_dir_cats)]

# create the batch for dogs and cats
batch_dogs = create_batch(fnames_dogs[:32])
batch_cats = create_batch(fnames_cats[:32])
In [9]:
display_batch(batch_dogs, "dogs", limit=5)
display_batch(batch_cats, "cats", limit=5)
In [10]:
batch_dogs.shape
Out[10]:
(32, 150, 150, 3)
In [12]:
# load model
model = load_model("dogs_vs_cats_v1.h5")

# prediction
pred_dogs = model.predict(batch_dogs)
pred_cats = model.predict(batch_cats)
In [13]:
# cast to labels
pred_dogs_classes = (pred_dogs > 0.5).astype("int32").flatten()
pred_cats_classes = (pred_cats > 0.5).astype("int32").flatten()

Since the sigmoid output is the estimated probability of the positive class (dogs = 1), thresholding at 0.5 yields the predicted class indices. These are the predictions:

In [14]:
pred_dogs_classes
Out[14]:
array([0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1,
       1, 0, 1, 0, 0, 1, 1, 1, 1, 0], dtype=int32)
In [15]:
pred_cats_classes
Out[15]:
array([0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 0, 0, 0, 0], dtype=int32)

Remember the mapping:

In [22]:
class_indices
Out[22]:
{'cats': 0, 'dogs': 1}
In [23]:
error_dogs = (pred_dogs_classes != class_indices["dogs"]).sum()
print("misclassified dogs:", error_dogs)
misclassified dogs: 14
In [24]:
error_cats = (pred_cats_classes != class_indices["cats"]).sum()
print("misclassified cats:", error_cats)
misclassified cats: 7
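
For reference, these counts translate into a rough accuracy over the 64 test images used here (a small sample, so the estimate is noisy):

n_total = len(pred_dogs_classes) + len(pred_cats_classes)    # 64 images
accuracy = 1 - (error_dogs + error_cats) / n_total
print("accuracy on this sample: %.2f" % accuracy)            # about 0.67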

Single image classification

Now, rather than classify a batch of images, let's classify a single image at a time.

In [16]:
# load model
model = load_model("dogs_vs_cats_v1.h5")

Remember the mapping:

In [26]:
# mapping
class_indices
Out[26]:
{'cats': 0, 'dogs': 1}
In [40]:
# reversed mapping, from class index back to label: {0: 'cats', 1: 'dogs'}
rev_class_indices = {v: k for k, v in class_indices.items()}
In [54]:
def classify_single_image(img_path):
    
    # load the original image (no preprocessing), for display only
    img_original = image.load_img(img_path)
    
    # load the image again, resized to the input shape expected by the model
    img = image.load_img(img_path, 
                         color_mode="rgb",
                         target_size=(150, 150))

    # convert PIL to np.array (as np.float32 in the range [0, 1])
    x = image.img_to_array(img) * 1./255
    
    # change shape to (1, height, width, n_channels)
    batch = np.array([x])
    
    # predict
    pred = model.predict(batch)

    # cast prediction to label; [0] accesses the first and only image in the batch
    pred_label = (pred > 0.5).astype("int32").flatten()[0]
    pred_value = pred.flatten()[0]

    # results
    label = rev_class_indices[pred_label]
    print("prediction (float): %.4f, class (int): %d, label: %s" % (pred_value, pred_label, label))

    # plot
    fig = plt.figure()
    ax = fig.add_subplot(111)
    title = "Prediction: %s" % label
    
    ax.set_title(title)
    ax.imshow(img_original)
In [55]:
classify_single_image("hachi.jpg")
prediction (float): 0.9994, class (int): 1, label: dogs
In [56]:
classify_single_image("sheldon.jpg")
prediction (float): 0.0064, class (int): 0, label: cats

So, it turns out that Hachi is indeed a dog, and Sheldon is, indeed, a cat.