Classification of Digits
Posted on Sat 24 October 2020 in deep learning
This notebook shows how to create a neural network with Keras for classifying handwritten digits. The material is based on the book Deep Learning with Python. This notebook is available here.
from __future__ import print_function
from keras.datasets import mnist
from keras import models
from keras import layers
from keras.utils import to_categorical, plot_model
from keras.models import load_model
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.pyplot import rcParams
from sklearn.model_selection import train_test_split
def print_int_matrix(vector):
    # print a 2-D array of integers as an aligned grid of values
    for row in vector:
        print(" ".join(["%3d" % val for val in row]))
def plot_history(history, model_name):
    # training history recorded by Keras
    hist_dict = history.history
    # create figure
    fig, axes = plt.subplots(2, 1, figsize=(10, 8))
    colors = ["#454545", "#007ca5", "#A91458"]
    x = np.arange(1, len(hist_dict["acc"]) + 1)
    # accuracy
    ax = axes[0]
    ax.plot(x, hist_dict["acc"], marker="o", c=colors[0], label="Acc")
    ax.plot(x, hist_dict["val_acc"], ls="--", c=colors[1], label="Val Acc")
    ax.set_title("%s Accuracy" % model_name)
    ax.set_xlabel("Epochs")
    ax.legend(loc="lower right")
    # vertical bar at the epoch with the best validation accuracy
    y_a, y_b = ax.get_ylim()
    bar = np.array(hist_dict["val_acc"]).argmax() + 1
    ax.plot([bar, bar], [y_a, y_b], c="#505050", zorder=0, lw=0.5)
    ax.set_ylim((y_a, y_b))
    # loss
    ax = axes[1]
    ax.plot(x, hist_dict["loss"], marker="o", c=colors[0], label="Loss")
    ax.plot(x, hist_dict["val_loss"], ls="--", c=colors[1], label="Val Loss")
    ax.set_title("%s Loss" % model_name)
    ax.set_xlabel("Epochs")
    ax.legend(loc="lower right")
    # vertical bar at the epoch with the lowest validation loss
    y_a, y_b = ax.get_ylim()
    bar = np.array(hist_dict["val_loss"]).argmin() + 1
    ax.plot([bar, bar], [y_a, y_b], c="#505050", zorder=0, lw=0.5)
    ax.set_ylim((y_a, y_b))
    # adjust margins
    fig.subplots_adjust(hspace=0.4)
    # save figure
    fig.savefig("training_%s.jpg" % model_name, dpi=300)
Check dataset¶
This block loads the dataset:
# load dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images and train_labels comprise the training set, from which the model learns. On the other hand, test_images and test_labels comprise the test set, which we use to evaluate the performance of the model. Below, we display some properties of the dataset. Note that the images are encoded as NumPy uint8 arrays:
print("train_images shape (%s)" % (train_images.shape,))
print("test_images shape (%s)" % (test_images.shape,))
print("images dtype %s\n" % train_images.dtype)
print("train_labels shape (%s)" % (train_labels.shape,))
print("test_labels shape (%s)" % (test_labels.shape,))
print("labels dtype %s" % train_labels.dtype)
We have 60,000 training samples and 10,000 samples for testing. Notice that the data type of both the images and the labels is uint8. Below, we inspect some examples from the dataset. It is recommended to inspect samples from the training set only. Let's take a look at one sample:
sample_id = 5 # choose a sample from train_images
sample_image = train_images[sample_id] # shape (28, 28)
sample_label = train_labels[sample_id] # scalar
print_int_matrix(sample_image)
As can be seen, the matrix roughly depicts a '2'. Now, compare it with imshow:
%matplotlib inline
sample_id = 5 # choose a sample from train_images
sample_image = train_images[sample_id] # shape (28, 28)
sample_label = train_labels[sample_id] # scalar
plt.title("Real class: %d" % sample_label)
plt.imshow(sample_image, cmap=plt.cm.binary)
plt.show()
Let's show a few more examples:
%matplotlib inline
# choose some ids
sample_ids = [860,    # 0
              49723,  # 1
              52099,  # 2
              57651,  # 3
              4180,   # 4
              2445,   # 5
              6265,   # 6
              19183,  # 7
              51078,  # 8
              40753]  # 9
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, ax in zip(sample_ids, axes.flat):
    sample_image = train_images[ix]  # shape (28, 28)
    sample_label = train_labels[ix]  # scalar
    ax.set_title("Real class: %d\nID: %d" % (sample_label, ix))
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("numbers.jpg", dpi=300)
You can also use this block for displaying examples of a specific class:
%matplotlib inline
# choose some ids
np.random.seed(49)
sample_ids = np.random.randint(0, len(train_images), 10)
# or...
# choose some ids of a specific class
class_ids = np.where(train_labels == 5)[0] # choose your class here
sample_ids = np.random.choice(class_ids, 10)
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, ax in zip(sample_ids, axes.flat):
    sample_image = train_images[ix]  # shape (28, 28)
    sample_label = train_labels[ix]  # scalar
    ax.set_title("Real class: %d\nID: %d" % (sample_label, ix))
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
#fig.savefig("numbers.jpg", dpi=300)
Preprocessing¶
Before training, we need to preprocess our data as follows:
- Change the data type from uint8 to float32.
- Change the range from [0, 255] to [0, 1].
- Change the shape from (28, 28) to (784,).
We also need to encode the labels from integers to vectors using one-hot encoding (a minimal sketch of what this does is shown after the quote below). From Chapter 4:
[...] you should format your data in a way that can be fed into a machine-learning model -- here, we'll assume a deep neural network:
- Your data should be formatted as tensors
- The values taken by these tensors should usually be scaled to small values: for example, in the [-1, 1] range or the [0, 1] range
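For intuition, here is a minimal sketch (not from the book) of what one-hot encoding does, using the np module imported at the top: each integer label is mapped to the corresponding row of a 10 x 10 identity matrix. The to_categorical call below performs the same conversion.
demo_labels = np.array([5, 0, 4])                        # integer labels
demo_one_hot = np.eye(10, dtype="float32")[demo_labels]  # one identity-matrix row per label
print(demo_one_hot[0])  # label 5 -> [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]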
# load dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# data preprocessing
X_train = X_train.reshape((60000, 28*28))
X_train = X_train.astype("float32") / 255
X_test = X_test.reshape((10000, 28*28))
X_test = X_test.astype("float32") / 255
# prepare labels (one-hot encoding)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
Check the change:
print("before preprocessing (dtype: %s, shape: %s, min: %.2f, max: %.2f)" % (train_images[0].dtype, train_images.shape, train_images.min(), train_images.max()))
print("after preprocessing (dtype: %s, shape: %s, min: %.2f, max: %.2f)" % (X_train[0].dtype, X_train.shape, X_train.min(), X_train.max()))
Notice that the shape, range, and dtype changed. Now, each sample is a float32 vector of shape (784,).
Finally, check the labels. Notice that the label 5 is encoded as the binary vector [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]:
print("label ", train_labels[0])
print("encoded label ", y_train[0])
Create a neural network¶
We are ready to create our first neural network:
# network architecture
model = models.Sequential(name="model_v1")
model.add(layers.Dense(512, activation="relu", input_shape=(28*28, )))
model.add(layers.Dense(10, activation="softmax"))
# compilation
model.compile(optimizer="rmsprop",
loss="categorical_crossentropy",
metrics=["accuracy"])
We can use summary() to show the parameters of the model:
model.summary()
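As a quick sanity check (a small sketch, not part of the original notebook), we can reproduce the parameter counts reported by summary() by hand:
dense_1_params = 28*28 * 512 + 512   # weights + biases = 401,920
dense_2_params = 512 * 10 + 10       # weights + biases = 5,130
print(dense_1_params, dense_2_params, dense_1_params + dense_2_params)  # 401920 5130 407050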
Also, we can use plot_model() to show a graphical representation of the model:
plot_model(model, show_shapes=True)
Training and testing¶
Before training, we split the training set into two sets: one for training and another one for validation:
split_X_train, split_X_val, split_y_train, split_y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)
Next, we train the neural network. To analyze its convergence, fit returns a History object, which we use to create convergence plots.
# training time: ~45s
history = model.fit(split_X_train,
split_y_train,
epochs=15,
batch_size=128,
validation_data=(split_X_val, split_y_val))
Save the model:
# save model
model.save("mnist_model_v1.h5")
Measure the accuracy of the model:
# evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test)
print("test_acc:", test_acc)
In Chapter 3, the author shows how to plot the training and validation loss. Since we held out a validation split above, we can plot both the training and the validation curves. Before going any further, let's inspect history first:
print(history)
print(history.history['loss'])
print(len(history.history['loss']))
print(history.history.keys())
history contains a dictionary, also called history. As can be seen, it contains four keys:
- acc, the value of the metric (accuracy) on the training set
- loss, the value of the loss function on the training set
- val_acc, the metric on the validation set
- val_loss, the value of the loss function on the validation set
Notice that each entry in that dictionary is a list with the value of the corresponding indicator for each epoch. Let's use them to create a plot.
%matplotlib inline
plot_history(history, "model_v1")
Demo¶
Now, we are ready to use the trained model for classifying samples from the testing dataset.
# load trained model
model = load_model("mnist_model_v1.h5")
# prediction
predictions = model.predict(X_test)
# convert to labels
y_pred = predictions.argmax(axis=1)
# compare with correct labels
y_real = test_labels # use copy, not one-hot encoding
num_samples = len(y_real)
acc = (y_pred == y_real).sum() / num_samples
errors = y_pred != y_real
print("Accuracy: %4f" % acc)
print("# of misclassified images: %d" % errors.sum())
# indices of misclassified images
error_indices_ffnn = np.arange(num_samples, dtype=int)[errors]
# print some of the indices of the images classified incorrectly
error_indices_ffnn[:20]
Let's see some images from the training set. Remember, the model has already seen these images during training.
%matplotlib inline
# choose input images from TRAINING SET
sample_ids = [860, 49723, 52099, 57651, 4180, 2445, 6265, 19183, 51078, 40753]
# real labels
y_real = train_labels[sample_ids] # shape: (10, )
# predictions
predictions = model.predict(X_train[sample_ids])
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = train_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("model_v1_numbers_predictions.jpg", dpi=300)
Now, let's do the same with some images from the test set, which the model has not seen during training:
%matplotlib inline
# choose input images from TESTING SET
np.random.seed(42)
sample_ids = np.random.randint(0, len(test_labels), 10)
# real labels
y_real = test_labels[sample_ids] # shape: (10, )
# predictions
predictions = model.predict(X_test[sample_ids])
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
#fig.savefig("numbers_predictions.jpg", dpi=300)
The model misclassified more than 100 images. Let's see a few of them:
%matplotlib inline
# choose input images from TESTING SET
np.random.seed(42)
sample_ids = np.random.choice(error_indices_ffnn, 10)
# real labels
y_real = test_labels[sample_ids] # shape: (10, )
# predictions
predictions = model.predict(X_test[sample_ids])
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("model_v1_numbers_incorrect_predictions.jpg", dpi=300)
Convolutional neural network¶
Now, we use a Convolutional Neural Network for digit classification. Just for comparison, this is our previous architecture:
# network architecture
model = models.Sequential(name="model_v1")
model.add(layers.Dense(512, activation="relu", input_shape=(28*28, )))
model.add(layers.Dense(10, activation="softmax"))
# compilation
model.compile(optimizer="rmsprop",
loss="categorical_crossentropy",
metrics=["accuracy"])
And this is the new convolutional architecture:
# network architecture
model = models.Sequential(name="model_v2")
# feature extractor
model.add(layers.Conv2D(32, (3,3), activation="relu", input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation="relu"))
# classifier
model.add(layers.Flatten())
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10, activation="softmax"))
# compilation
model.compile(optimizer="rmsprop",
loss="categorical_crossentropy",
metrics=["accuracy"],
)
model.summary()
plot_model(model, show_shapes=True)
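As a sanity check on the shapes reported above: each 3x3 convolution without padding shrinks the spatial size by 2 and each 2x2 max pooling halves it, so the feature maps go 28 → 26 → 13 → 11 → 5 → 3. The Flatten layer therefore outputs 3 x 3 x 64 = 576 features, which feed the small Dense classifier.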
# load dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# data preprocessing
X_train = X_train.reshape((60000, 28, 28, 1)) # use (28, 28, 1), not (28*28, )
X_train = X_train.astype("float32") / 255
X_test = X_test.reshape((10000, 28, 28, 1)) # use (28, 28, 1), not (28*28, )
X_test = X_test.astype("float32") / 255
# prepare labels (one-hot encoding)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# split into training and validation
split_X_train, split_X_val, split_y_train, split_y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)
# add callback
callback_list = [
ModelCheckpoint( # saves the current weights after every epoch
verbose=1, # verbosity mode
filepath="model_v2_{epoch:02d}.h5", # path to the destination model file
monitor="val_loss", # if the validation loss is improved,
save_best_only=True, # saves (overwrites) the best model so far
)
]
# training time: ~45s
history = model.fit(split_X_train,
split_y_train,
epochs=15,
callbacks=callback_list,
batch_size=128,
validation_data=(split_X_val, split_y_val))
# save model
model.save("mnist_model_v2.h5")
# evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test)
print("test_acc:", test_acc)
%matplotlib inline
plot_history(history, "model_v2")
Let's find how many images were classified incorrectly:
# load trained model
model = load_model("model_v2_07.h5") # you may need to change this line
# prediction
predictions = model.predict(X_test)
# convert to labels
y_pred = predictions.argmax(axis=1)
# compare with correct labels
y_real = test_labels # use copy, not one-hot encoding
num_samples = len(y_real)
acc = (y_pred == y_real).sum() / num_samples
errors = y_pred != y_real
print("Accuracy: %4f" % acc)
print("# of misclassified images: %d" % errors.sum())
# indices of misclassified images
error_indices_cnn_v2 = np.arange(num_samples, dtype=int)[errors]
We use model v2 to classify some of the images that model v1 classified incorrectly:
%matplotlib inline
# load model
model = load_model("model_v2_07.h5") # you may need to change this line
# choose input images from TESTING SET
np.random.seed(42)
sample_ids = np.random.choice(error_indices_ffnn, 10)
# real labels
y_real = test_labels[sample_ids] # shape: (10, )
# predictions
predictions = model.predict(X_test[sample_ids])
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("numbers_predictions_v2.jpg", dpi=300)
Finally, let's see some of the images incorrectly classified by the model v2:
%matplotlib inline
# load model
model = load_model("model_v2_07.h5") # you may need to change this line
# choose input images from TESTING SET
np.random.seed(42)
sample_ids = np.random.choice(error_indices_cnn_v2, 10, replace=False)
# real labels
y_real = test_labels[sample_ids] # shape: (10, )
# predictions
predictions = model.predict(X_test[sample_ids])
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("numbers_incorrect_predictions_v2.jpg", dpi=300)
Data Augmentation¶
In this section, we follow the approach described in this notebook from Kaggle. It turns out that we can improve on our ~99% accuracy by using:
- A validation set
- A deeper model
- A technique called data augmentation.
For now, let's keep the same model and simply add data augmentation. First, let's preprocess and split the dataset:
# load dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# data preprocessing
X_train = X_train.reshape((60000, 28, 28, 1)) # use (28, 28, 1), not (28*28, )
X_train = X_train.astype("float32") / 255
X_test = X_test.reshape((10000, 28, 28, 1)) # use (28, 28, 1), not (28*28, )
X_test = X_test.astype("float32") / 255
# prepare labels (one-hot encoding)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# split into training and validation
split_X_train, split_X_val, split_y_train, split_y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)
Create a generator to use data augmentation:
datagen = ImageDataGenerator(
featurewise_center=False, # set input mean to 0 over the dataset
samplewise_center=False, # set each sample mean to 0
featurewise_std_normalization=False, # divide inputs by std of the dataset
samplewise_std_normalization=False, # divide each input by its std
zca_whitening=False, # apply ZCA whitening
rotation_range=10, # randomly rotate images in the range (degrees, 0 to 180)
zoom_range = 0.1, # randomly zoom image
width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
height_shift_range=0.1, # randomly shift images vertically (fraction of total height)
horizontal_flip=False, # randomly flip images
vertical_flip=False) # randomly flip images
train_gen = datagen.flow(split_X_train, split_y_train, batch_size=64)
val_gen = datagen.flow(split_X_val, split_y_val, batch_size=64)
We can use the generators to iterate over the dataset as follows. Note that the generator yields batches indefinitely, so be sure to use a break statement to stop the loop (during training, the steps_per_epoch argument plays this role).
for batch, labels in train_gen:
    print(batch.shape)
    print(labels.shape)
    break
Remember that our model expects images of shape (28, 28, 1), so we need to add a fourth (channel) dimension:
sample_ids = [860, 49723, 52099, 57651, 4180, 2445, 6265, 19183, 51078, 40753] # one example per class
sample_X = train_images[sample_ids]
print(sample_X.shape)
You can use expand_dims to add the last dimension:
np.expand_dims(sample_X, axis=3).shape
Now we are ready to plot a few images. Remember, we are using data augmentation, so the following transformed images are new to the model.
# use all 10 digits...
input_set = np.expand_dims(sample_X, axis=3)
# ...or use a single image from sample_X (here the '0') to show the transformations applied to one digit
input_set = np.expand_dims(sample_X[[0]], axis=3)
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
axes = axes.flatten()
i = 0
for batch in datagen.flow(input_set, batch_size=1):
    image = batch[0, :, :, 0]  # shape: (28, 28)
    axes[i].imshow(image, cmap=plt.cm.binary)
    i += 1
    if i % 10 == 0:
        break
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
#fig.savefig("numbers_predictions.jpg", dpi=300)
Let's train a new model using data augmentation:
# network architecture
model = models.Sequential(name="model_v3")
# feature extractor
model.add(layers.Conv2D(32, (3,3), activation="relu", input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation="relu"))
# classifier
model.add(layers.Flatten())
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10, activation="softmax"))
# compilation
model.compile(optimizer="rmsprop",
loss="categorical_crossentropy",
metrics=["accuracy"],
)
# add callback
callback_list = [
ModelCheckpoint( # saves the current weights after every epoch
verbose=1, # verbosity mode
filepath="model_v3_{epoch:02d}.h5", # path to the destination model file
monitor="val_loss", # if the validation loss is improved,
save_best_only=True, # saves (overwrites) the best model so far
)
]
# training
batch_size = 64
history = model.fit_generator(train_gen,
epochs = 15,
steps_per_epoch = split_X_train.shape[0] // batch_size,
validation_data = val_gen,
validation_steps = split_X_val.shape[0] // batch_size,
callbacks=callback_list
)
# save model
model.save("mnist_model_v3.h5")
# evaluate performance
test_loss, test_acc = model.evaluate(X_test, y_test)
print("test acc:", test_acc)
%matplotlib inline
plot_history(history, "model_v3")
Let's find how many images were classified incorrectly:
# load trained model
model = load_model("model_v3_07.h5") # you may need to change this line
# prediction
predictions = model.predict(X_test)
# convert to labels
y_pred = predictions.argmax(axis=1)
# compare with correct labels
y_real = test_labels # use copy, not one-hot encoding
num_samples = len(y_real)
acc = (y_pred == y_real).sum() / num_samples
errors = y_pred != y_real
print("Accuracy: %4f" % acc)
print("# of misclassified images: %d" % errors.sum())
# indices of misclassified images
error_indices_cnn = np.arange(num_samples, dtype=int)[errors]
It seems that this model overfits a bit, since it misclassified more images than the previous model:
Accuracy: 0.993600
# of misclassified images: 64
Dropout¶
In this section, we add a dropout layer to avoid overfitting.
# load dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# data preprocessing
X_train = X_train.reshape((60000, 28, 28, 1)) # use (28, 28, 1), not (28*28, )
X_train = X_train.astype("float32") / 255
X_test = X_test.reshape((10000, 28, 28, 1)) # use (28, 28, 1), not (28*28, )
X_test = X_test.astype("float32") / 255
# prepare labels (one-hot encoding)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# split into training and validation
split_X_train, split_X_val, split_y_train, split_y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)
# create generator
datagen = ImageDataGenerator(
featurewise_center=False, # set input mean to 0 over the dataset
samplewise_center=False, # set each sample mean to 0
featurewise_std_normalization=False, # divide inputs by std of the dataset
samplewise_std_normalization=False, # divide each input by its std
zca_whitening=False, # apply ZCA whitening
rotation_range=10, # randomly rotate images in the range (degrees, 0 to 180)
zoom_range = 0.1, # randomly zoom image
width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
height_shift_range=0.1, # randomly shift images vertically (fraction of total height)
horizontal_flip=False, # randomly flip images
vertical_flip=False) # randomly flip images
train_gen = datagen.flow(split_X_train, split_y_train, batch_size=64)
val_gen = datagen.flow(split_X_val, split_y_val, batch_size=64)
We add a dropout layer after flattening features:
# network architecture
model = models.Sequential(name="model_v4")
# feature extractor
model.add(layers.Conv2D(32, (3,3), activation="relu", input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation="relu"))
# classifier
model.add(layers.Flatten())
model.add(layers.Dropout(0.5)) # dropout layer
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10, activation="softmax"))
# compilation
model.compile(optimizer="rmsprop",
loss="categorical_crossentropy",
metrics=["accuracy"],
)
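# For intuition, a minimal NumPy sketch (not part of the original notebook) of
# inverted dropout: during training roughly half of the activations are zeroed
# and the survivors are rescaled, while at inference time the layer is a no-op.
np.random.seed(0)
rate = 0.5
x_demo = np.ones(10, dtype="float32")
mask = np.random.rand(10) >= rate             # keep each activation with probability 1 - rate
x_demo_train = x_demo * mask / (1.0 - rate)   # training: drop some activations, rescale the rest
x_demo_infer = x_demo                         # inference: dropout does nothing
print(x_demo_train)
print(x_demo_infer)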
# add callback
callback_list = [
ModelCheckpoint( # saves the current weights after every epoch
verbose=1, # verbosity mode
filepath="model_v4_{epoch:02d}.h5", # path to the destination model file
monitor="val_loss", # if the validation loss is improved,
save_best_only=True, # saves (overwrites) the best model so far
)
]
# training
batch_size = 64
history = model.fit_generator(train_gen,
epochs = 15,
steps_per_epoch = split_X_train.shape[0] // batch_size,
validation_data = val_gen,
validation_steps = split_X_val.shape[0] // batch_size,
callbacks=callback_list
)
# save model
model.save("mnist_model_v4.h5")
# evaluate performance
test_loss, test_acc = model.evaluate(X_test, y_test)
print("test acc:", test_acc)
%matplotlib inline
plot_history(history, "model_v4")
# load trained model
model = load_model("model_v4_12.h5") # you may need to change this line
# prediction
predictions = model.predict(X_test)
# convert to labels
y_pred = predictions.argmax(axis=1)
# compare with correct labels
y_real = test_labels # use copy, not one-hot encoding
num_samples = len(y_real)
acc = (y_pred == y_real).sum() / num_samples
errors = y_pred != y_real
print("Accuracy: %4f" % acc)
print("# of misclassified images: %d" % errors.sum())
# indices of misclassified images
error_indices_cnn_v4 = np.arange(num_samples, dtype=int)[errors]
This is an improvement! Our previous model (v3) misclassified 64 images. Let's see how the new model does on some of the images that model v2 misclassified:
%matplotlib inline
# load model
model = load_model("model_v4_12.h5") # you may need to change this line
# choose input images from TESTING SET
np.random.seed(42)
sample_ids = np.random.choice(error_indices_cnn_v2, 10, replace=False) # errors of model v2
# real labels
y_real = test_labels[sample_ids] # shape: (10, )
# predictions
predictions = model.predict(X_test[sample_ids])
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("numbers_predictions_v4.jpg", dpi=300)
Now, let's see some of the most challenging images for this model:
%matplotlib inline
# load model
model = load_model("model_v4_12.h5") # you may need to change this line
# choose input images from TESTING SET
np.random.seed(42)
sample_ids = np.random.choice(error_indices_cnn_v4, 10, replace=False) # errors of model v4
# real labels
y_real = test_labels[sample_ids] # shape: (10, )
# predictions
predictions = model.predict(X_test[sample_ids])
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("numbers_incorrect_predictions_v4.jpg", dpi=300)
Adding more filters¶
In this section, we change the model by adding more convolutional layers and filters, along with batch normalization.
# load dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# data preprocessing
X_train = X_train.reshape((60000, 28, 28, 1)) # use (28, 28, 1), not (28*28, )
X_train = X_train.astype("float32") / 255
X_test = X_test.reshape((10000, 28, 28, 1)) # use (28, 28, 1), not (28*28, )
X_test = X_test.astype("float32") / 255
# prepare labels (one-hot encoding)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# split into training and validation
split_X_train, split_X_val, split_y_train, split_y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)
# create generator
datagen = ImageDataGenerator(
featurewise_center=False, # set input mean to 0 over the dataset
samplewise_center=False, # set each sample mean to 0
featurewise_std_normalization=False, # divide inputs by std of the dataset
samplewise_std_normalization=False, # divide each input by its std
zca_whitening=False, # apply ZCA whitening
rotation_range=10, # randomly rotate images in the range (degrees, 0 to 180)
zoom_range = 0.1, # randomly zoom image
width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
height_shift_range=0.1, # randomly shift images vertically (fraction of total height)
horizontal_flip=False, # randomly flip images
vertical_flip=False) # randomly flip images
train_gen = datagen.flow(split_X_train, split_y_train, batch_size=64)
val_gen = datagen.flow(split_X_val, split_y_val, batch_size=64)
Here, we add more convolutional layers and batch normalization. Note that the author of the Kaggle notebook used the adam optimizer; we keep rmsprop for a fair comparison with our previous models:
# network architecture
model = models.Sequential(name="model_v5")
# feature extractor
model.add(layers.Conv2D(64, (3,3), activation="relu", input_shape=(28, 28, 1)))
model.add(layers.Conv2D(64, (3,3), activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(128, (3,3), activation="relu"))
model.add(layers.Conv2D(128, (3,3), activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(256, (3,3), activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
# classifier
model.add(layers.Flatten())
model.add(layers.BatchNormalization())
#model.add(layers.Dropout(0.5)) # dropout layer
model.add(layers.Dense(512, activation="relu"))
model.add(layers.Dense(10, activation="softmax"))
# compilation
model.compile(optimizer="rmsprop",
loss="categorical_crossentropy",
metrics=["accuracy"],
)
# add callback
callback_list = [
ModelCheckpoint( # saves the current weights after every epoch
verbose=1, # verbosity mode
filepath="model_v5_{epoch:02d}.h5", # path to the destination model file
monitor="val_loss", # if the validation loss is improved,
save_best_only=True, # saves (overwrites) the best model so far
)
]
# training
batch_size = 64
history = model.fit_generator(train_gen,
epochs = 15,
steps_per_epoch = split_X_train.shape[0] // batch_size,
validation_data = val_gen,
validation_steps = split_X_val.shape[0] // batch_size,
callbacks=callback_list
)
# save model
model.save("mnist_model_v5.h5")
# evaluate performance
test_loss, test_acc = model.evaluate(X_test, y_test)
print("test acc:", test_acc)
%matplotlib inline
plot_history(history, "model_v5")
# load trained model
model = load_model("model_v5_13.h5") # you may need to change this line
# prediction
predictions = model.predict(X_test)
# convert to labels
y_pred = predictions.argmax(axis=1)
# compare with correct labels
y_real = test_labels # use copy, not one-hot encoding
num_samples = len(y_real)
acc = (y_pred == y_real).sum() / num_samples
errors = y_pred != y_real
print("Accuracy: %4f" % acc)
print("# of misclassified images: %d" % errors.sum())
# indices of misclassified images
error_indices_cnn_v5 = np.arange(num_samples, dtype=int)[errors]
This is interesting: both models, v4 and v5, misclassified the same number of images. Let's see how model v5 does on some of the images that model v4 misclassified:
%matplotlib inline
# load model
model = load_model("model_v5_13.h5") # you may need to change this line
# choose input images from TESTING SET
np.random.seed(42)
sample_ids = np.random.choice(error_indices_cnn_v4, 10, replace=False) # errors of model v4
# real labels
y_real = test_labels[sample_ids] # shape: (10, )
# predictions
predictions = model.predict(X_test[sample_ids])
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("numbers_predictions_v5.jpg", dpi=300)
It seems that model v5 can classify some of the images that model v4 classified incorrectly. That is, both models fail on about the same number of images (59), but the images themselves are different. Let's see some of the challenging images for model v5:
%matplotlib inline
# load model
model = load_model("model_v5_13.h5") # you may need to change this line
# choose input images from TESTING SET
np.random.seed(42)
sample_ids = np.random.choice(error_indices_cnn_v5, 10, replace=False) # errors of model v5
# real labels
y_real = test_labels[sample_ids] # shape: (10, )
# predictions
predictions = model.predict(X_test[sample_ids])
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("numbers_incorrect_predictions_v5.jpg", dpi=300)
error_indices_cnn_v4
error_indices_cnn_v5
Combining predictions¶
Let's use both models, v4 and v5, for prediction. We are going to sum the predictions of both models into a single matrix. Remember, for each image both models return a 10-dimensional vector of probabilities, one per class:
prediction_v4 = [0.0000 0.0000 0.0000 0.1804 0.0000 0.0006 0.0000 0.0000 0.5270 0.2920]
prediction_v5 = [0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000]
Then, we add both vectors and take the maximum argument:
model_v4 - class: 8, probs: [0.0000 0.0000 0.0000 0.1804 0.0000 0.0006 0.0000 0.0000 0.5270 0.2920]
model_v5 - class: 9, probs: [0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000]
merge - class: 9, probs: [0.0000 0.0000 0.0000 0.1804 0.0000 0.0006 0.0000 0.0000 0.5270 1.2920]
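As a small sketch (using the example vectors above), the aggregation is just an element-wise sum followed by argmax:
demo_v4 = np.array([0, 0, 0, 0.1804, 0, 0.0006, 0, 0, 0.5270, 0.2920])
demo_v5 = np.array([0, 0, 0, 0,      0, 0,      0, 0, 0,      1.0])
print((demo_v4 + demo_v5).argmax())  # prints 9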
Let's load both models:
# load model
model_v4 = load_model("model_v4_12.h5") # you may need to change this line
model_v5 = load_model("model_v5_13.h5") # you may need to change this line
Below, we classify challenging images for the model v4:
# images classified incorrectly by model v4
y_real = test_labels[error_indices_cnn_v4]
predictions_v4 = model_v4.predict(X_test[error_indices_cnn_v4])
predictions_v5 = model_v5.predict(X_test[error_indices_cnn_v4])
print(predictions_v4.argmax(axis=1))
print(predictions_v5.argmax(axis=1))
Now, we aggregate both predictions:
predictions = predictions_v4 + predictions_v5
y_pred = predictions.argmax(axis=1)
Out of these images (all misclassified by model v4), the combined predictions still get 28 wrong:
# real labels
(y_real != y_pred).sum()
Below, we print the output of both models for a single example:
def print_prediction(pred, model):
    probs = "[" + " ".join(["%.4f" % val for val in pred]) + "]"
    pclass = pred.argmax()
    print("%10s - class: %d, probs: %s" % (model, pclass, probs))
# first prediction of both models
pred_v4 = predictions_v4[0]
pred_v5 = predictions_v5[0]
print_prediction(pred_v4, "model_v4")
print_prediction(pred_v5, "model_v5")
# merge predictions
print_prediction(pred_v4 + pred_v5, "merge")
As can be seen, model v4 predicts '8' whereas model v5 predicts '9'. When both vectors are added, the element corresponding to class '9' becomes the largest, so the combined prediction is '9'.
Let's try the combined model on the images misclassified by model v4.
%matplotlib inline
# load model
model_v4 = load_model("model_v4_12.h5") # you may need to change this line
model_v5 = load_model("model_v5_13.h5") # you may need to change this line
# choose input images from TESTING SET
np.random.seed(42)
sample_ids = np.random.choice(error_indices_cnn_v4, 10, replace=False) # errors of model v4
# real labels
y_real = test_labels[sample_ids] # shape: (10, )
# predictions from both models
predictions_v4 = model_v4.predict(X_test[sample_ids])
predictions_v5 = model_v5.predict(X_test[sample_ids])
# merge predictions
predictions = predictions_v4 + predictions_v5
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("numbers_incorrect_predictions_merge_v4.jpg", dpi=300)
Now, we use the combined model on the images misclassified by model v5.
%matplotlib inline
# load model
model_v4 = load_model("model_v4_12.h5") # you may need to change this line
model_v5 = load_model("model_v5_13.h5") # you may need to change this line
# choose input images from TESTING SET
np.random.seed(42)
sample_ids = np.random.choice(error_indices_cnn_v5, 10, replace=False) # errors of model v5
# real labels
y_real = test_labels[sample_ids] # shape: (10, )
# predictions from both models
predictions_v4 = model_v4.predict(X_test[sample_ids])
predictions_v5 = model_v5.predict(X_test[sample_ids])
# merge predictions
predictions = predictions_v4 + predictions_v5
y_pred = predictions.argmax(axis=1) # shape: (10, )
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ix, yreal, ypred, ax in zip(sample_ids, y_real, y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("numbers_incorrect_predictions_merge_v5.jpg", dpi=300)
Remember, all of the above images are challenging (i.e., incorrectly classified) for model v5. However, if we use both models, v4 and v5, 5 of the 10 images are classified correctly: 115, 4699, 9850, 7434, and 1438.
Now, let's finish our analysis by computing the accuracy of both models in the test set:
# load models
model_v4 = load_model("model_v4_12.h5") # you may need to change this line
model_v5 = load_model("model_v5_13.h5") # you may need to change this line
# predictions
prediction_v4 = model_v4.predict(X_test) # shape (10000, 10)
prediction_v5 = model_v5.predict(X_test) # shape (10000, 10)
# merge predictions
predictions = prediction_v4 + prediction_v5
# convert to labels
y_pred = predictions.argmax(axis=1)
# compare with correct labels
y_real = test_labels # use copy, not one-hot encoding
num_samples = len(y_real)
acc = (y_pred == y_real).sum() / num_samples
errors = y_pred != y_real
print("Accuracy: %4f" % acc)
print("# of misclassified images: %d" % errors.sum())
# indices of misclassified images
error_indices_cnn_merge = np.arange(num_samples, dtype=int)[errors]
That's it. By aggregating the predictions of the two models, we reduced the number of misclassified images from 59 to 39, for a final accuracy of 99.6%. Let's see the most challenging images:
error_indices_cnn_merge
%matplotlib inline
# real labels
set_y_real = y_real[error_indices_cnn_merge]
set_y_pred = y_pred[error_indices_cnn_merge]
# style
rcParams['xtick.color'] = "#505050" # ticks gray color
rcParams['ytick.color'] = "#505050" # ticks gray color
# create plot
fig, axes = plt.subplots(5, 8, figsize=(20, 15)) #12, 5
for ix, yreal, ypred, ax in zip(error_indices_cnn_merge, set_y_real, set_y_pred, axes.flat):
    sample_image = test_images[ix]  # shape (28, 28)
    color = "k"
    if yreal != ypred: color = "r"
    ax.set_title("Real class: %d\nPred class: %d\n ID: %d" % (yreal, ypred, ix), color=color)
    ax.imshow(sample_image, cmap=plt.cm.binary)
# adjust margins
fig.subplots_adjust(wspace=0.5, hspace=0.5)
fig.savefig("numbers_incorrect_predictions_merge.jpg", dpi=300)
Conclusions¶
In this notebook, we discussed how to design a neural network for digit classification. We trained several models and achieved an accuracy of 99.6% on the test set. This means that only 39 of 10,000 images were incorrectly classified.
Accuracy: 0.996100
# of misclassified images: 39
To achieve that result, we did the following:
- Preprocess the dataset:
  - Change the range from [0, 255] to [0, 1].
  - Change the data type from int to float.
  - Change the shape from (28, 28) to (28*28,) for the first model, and to (28, 28, 1) for the convolutional models.
- Split the dataset into training, validation, and test sets.
- Increase the number of training examples by using data augmentation.
- Try several architectures:
  - v1: feed-forward neural network
  - v2: convolutional neural network
  - v3: convolutional neural network + data augmentation
  - v4: convolutional neural network + data augmentation + dropout
  - v5: convolutional neural network + data augmentation + batch normalization
  - v4 + v5
The performance of the last two models is similar in terms of accuracy. However, the images classified incorrectly by these models are different. Hence, we combined both models to improve the accuracy. This notebook is available here.