Keras: computing accuracy

Posted on Sun 28 February 2021 in Recipes

Suppose you have just finished training your model and want to evaluate its performance on a large dataset. Here, we show a few ways to compute its accuracy.

First approach

Assumptions:

You have trained your model using an ImageDataGenerator.
You have included "acc" in the metrics when compiling the model, as follows:

model.compile(loss="binary_crossentropy",
              optimizer=RMSprop(lr=1e-4),
              metrics=["acc"]
             )


from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model 
import numpy as np

# load model
model = load_model("../streamlit/cats_and_dogs_small_1.h5")

# create generator
test_path = "/media/data/dogs_vs_cats_small/test"
test_datagen = ImageDataGenerator(rescale=1./255)     # configure this like the training generator (same rescale)
batch_size = 20
test_generator = test_datagen.flow_from_directory(
                            test_path,
                            target_size=(150, 150),
                            shuffle=False,            # this is recommended for testing
                            batch_size=batch_size,        
                            class_mode="binary")

# number of steps (batches) = n_images // batch_size
n_images = test_generator.samples
steps = n_images // batch_size

# compute loss and accuracy
test_loss, test_acc = model.evaluate_generator(test_generator, steps)  # chollet, p. 158

Found 1000 images belonging to 2 classes.


print("loss: %.8f" % test_loss)
print("acc:  %.8f" % test_acc)

loss: 0.99661190
acc:  0.73700000

Considerations:

Be sure to set up the generator (test_generator) properly. It should use the same configuration as the generator employed for training the model (same target_size, class_mode, and rescale). Attributes like directory, shuffle, and batch_size can differ. See the Keras documentation for more details.
Here we used steps = n_images // batch_size rather than alternatives such as steps = int(np.ceil(n_images/batch_size)), following the recommendation of Adrian Rosebrock. With 1000 test images and a batch size of 20, both expressions give 50 steps; when n_images is not a multiple of batch_size, the floor division skips the last, incomplete batch.
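To make the difference between the two formulas concrete, the short sketch below compares them for a batch size of 20. The second image count (1010) is a hypothetical value chosen only to show what happens when the number of images is not a multiple of the batch size; `steps_floor` and `steps_ceil` are illustrative helper names, not Keras functions.

```python
import math


def steps_floor(n_images, batch_size):
    # steps = n_images // batch_size: a final, partial batch is skipped
    return n_images // batch_size


def steps_ceil(n_images, batch_size):
    # steps = ceil(n_images / batch_size): a final, partial batch is included
    return int(math.ceil(n_images / batch_size))


batch_size = 20
for n_images in (1000, 1010):  # 1010 is a hypothetical test-set size
    f = steps_floor(n_images, batch_size)
    c = steps_ceil(n_images, batch_size)
    # images actually covered = min(steps * batch_size, n_images)
    print("n_images=%d  floor: %d steps, %d images  ceil: %d steps, %d images"
          % (n_images, f, min(f * batch_size, n_images),
             c, min(c * batch_size, n_images)))
```

With 1010 images, the floor rule would stop after 1000 images and leave the last 10 unevaluated, while the ceiling rule covers all of them.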

Second approach

If you need more control over the batches of images, or you did not specify "acc" when compiling the model, you can use the following approach.


from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model 
import numpy as np

# load model
model = load_model("../streamlit/cats_and_dogs_small_1.h5")

# create generator
test_path = "/media/data/dogs_vs_cats_small/test"
test_datagen = ImageDataGenerator(rescale=1./255)     # configure this like the training generator (same rescale)
batch_size = 20
test_generator = test_datagen.flow_from_directory(
                            test_path,
                            target_size=(150, 150),
                            shuffle=False,            # required here: keeps predictions in the order of test_generator.labels
                            batch_size=batch_size,        
                            class_mode="binary")

# number of steps (batches) = n_images // batch_size
n_images = test_generator.samples
steps = n_images // batch_size

# predict test images
pred = model.predict_generator(test_generator, steps)     # [[0.00], [0.99], [0.00], [0.33], ...]
all_pred_labels = (pred > 0.5).astype('float').flatten()  #  [ 0.0, 1.0,  0.0,  0.0, ...]
all_real_labels = test_generator.labels                   #  [ 0.0,  0.0, 0.0, 0.0, ..., 1.0, 1.0, 1.0]

# compute accuracy
acc = (all_pred_labels == all_real_labels).sum() / len(all_pred_labels)
print("acc:  %.8f" % acc)

Found 1000 images belonging to 2 classes.
acc:  0.73700000

Considerations:

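The output of model.predict_generator() consists of predicted probabilities (values between 0 and 1), not labels. We therefore need a threshold to convert them into labels (this is a binary classification problem, so there are two labels, 0.0 and 1.0; here we use 0.5).
We also need to flatten pred, since its shape is (1000, 1) while all_real_labels has shape (1000,). Without flattening, NumPy would broadcast the comparison into a (1000, 1000) array.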
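The thresholding and flattening steps can be checked without a trained model. The sketch below uses a handful of made-up probabilities in place of the real output of `model.predict_generator()`, and a made-up label vector in place of `test_generator.labels`; 0.5 is the same threshold used above.

```python
import numpy as np

# Made-up stand-ins for the real arrays:
# pred mimics the (n, 1) output of a single-unit model,
# real_labels mimics test_generator.labels, shape (n,)
pred = np.array([[0.0000], [0.9998], [0.0092], [0.3316], [0.7000]])
real_labels = np.array([0, 1, 0, 1, 1])

# Threshold at 0.5 to get hard labels, then flatten (n, 1) -> (n,)
pred_labels = (pred > 0.5).astype("float").flatten()

# Element-wise comparison now lines up prediction i with label i
acc = (pred_labels == real_labels).sum() / len(pred_labels)
print("acc: %.4f" % acc)   # 4 of 5 correct -> acc: 0.8000

# Without flatten, (n, 1) == (n,) broadcasts to an (n, n) matrix
wrong = (pred > 0.5).astype("float") == real_labels
print(wrong.shape)         # (5, 5)
```

`np.mean(pred_labels == real_labels)` is an equivalent, slightly shorter way to write the same accuracy.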


# This block shows a few predictions and their corresponding labels
for i in range(4):
    print("pred: %.4f, label: %d" % (pred[i, 0], all_pred_labels[i]))

pred: 0.0000, label: 0
pred: 0.9998, label: 1
pred: 0.0092, label: 0
pred: 0.3316, label: 0


# Shape of the predictions
pred.shape


(1000, 1)

Third approach

Now, we show one more way to compute the accuracy. Here, we predict a batch of images and save the predictions. We repeat this procedure until all the images are processed.
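The bookkeeping in this approach (predict one batch, threshold it, and write it into its slice of two preallocated arrays) can be tried without Keras. In the sketch below, `fake_predict` and the list `batches` are hypothetical stand-ins for `model.predict` and the test generator, and the data are random numbers constructed so that exactly 5 of 50 predictions are wrong.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 items in batches of 20 (the last batch has only 10)
n_images, batch_size = 50, 20
all_probs = rng.random(n_images)
all_true = (all_probs > 0.5).astype(float)   # "model" agrees with these labels...
all_true[:5] = 1.0 - all_true[:5]            # ...except on the first 5 items


def fake_predict(batch_inputs):
    # hypothetical stand-in for model.predict: returns shape (batch, 1)
    return batch_inputs.reshape(-1, 1)


# hypothetical stand-in for the generator: a list of (inputs, labels) batches
batches = [(all_probs[s:s + batch_size], all_true[s:s + batch_size])
           for s in range(0, n_images, batch_size)]

all_pred_labels = np.zeros(n_images)
all_real_labels = np.zeros(n_images)

for i, (batch_inputs, true_labels) in enumerate(batches):
    pred = fake_predict(batch_inputs)
    pred_labels = (pred > 0.5).astype("float").flatten()
    # store this batch's results in its slice of the preallocated arrays
    all_pred_labels[i * batch_size:(i + 1) * batch_size] = pred_labels
    all_real_labels[i * batch_size:(i + 1) * batch_size] = true_labels

acc = (all_pred_labels == all_real_labels).sum() / len(all_pred_labels)
print("acc: %.4f" % acc)   # 45 of 50 correct -> acc: 0.9000
```

Because NumPy slicing past the end of an array simply stops at the end, the shorter final batch (10 items here) still fits into `all_pred_labels[i*batch_size:(i+1)*batch_size]`.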


from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model 
import numpy as np

# load model
model = load_model("../streamlit/cats_and_dogs_small_1.h5")

# create generator
test_path = "/media/data/dogs_vs_cats_small/test"
test_datagen = ImageDataGenerator(rescale=1./255)     # configure this like the training generator (same rescale)
batch_size = 20
test_generator = test_datagen.flow_from_directory(
                            test_path,
                            target_size=(150, 150),
                            shuffle=False,            # this is recommended for testing
                            batch_size=batch_size,        
                            class_mode="binary")

# number of steps (batches) = n_images // batch_size
n_images = test_generator.samples
steps = n_images // batch_size

# predict test images
all_pred_labels = np.zeros(n_images)
all_real_labels = np.zeros(n_images)

for i in range(steps):

    if (i+1)%10 == 0:
        print("batch %d/%d" % (i+1, steps))

    # fetch the next batch; the generator continues where the previous call stopped
    images, true_labels = next(test_generator)

    pred = model.predict(images)
    pred_labels = (pred > 0.5).astype('float').flatten()

    # save predictions and real labels
    all_pred_labels[i*batch_size:(i+1)*batch_size] = pred_labels
    all_real_labels[i*batch_size:(i+1)*batch_size] = true_labels

# compute accuracy
acc = (all_pred_labels == all_real_labels).sum() / len(all_pred_labels)
print("acc:  %.8f" % acc)

Found 1000 images belonging to 2 classes.
batch 10/50
batch 20/50
batch 30/50
batch 40/50
batch 50/50
acc:  0.73700000

Considerations:

This approach requires a few more lines of code, but it returns the same value for acc.
Instead of looping over range(steps), you can also iterate over the generator directly, as long as you stop once all images have been processed:

i = 0
for images, true_labels in test_generator:

    pred = model.predict(images)
    pred_labels = (pred > 0.5).astype('float').flatten()

    # save predictions and real labels
    all_pred_labels[i*batch_size:(i+1)*batch_size] = pred_labels
    all_real_labels[i*batch_size:(i+1)*batch_size] = true_labels

    # stop criteria
    i += 1
    if i * batch_size >= n_images:    # chollet p. 147
        break
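Keras generators such as the one returned by `flow_from_directory()` keep yielding batches indefinitely, which is why the loop above needs an explicit stop criterion. The sketch below imitates that behaviour with a hypothetical plain-Python generator, `endless_batches`, and applies the same `i * batch_size >= n_images` check.

```python
import numpy as np


def endless_batches(x, y, batch_size):
    """Hypothetical stand-in for a Keras generator: yields (inputs, labels)
    batches forever, starting over once the data is exhausted."""
    while True:
        for s in range(0, len(x), batch_size):
            yield x[s:s + batch_size], y[s:s + batch_size]


n_images, batch_size = 50, 20
x = np.arange(n_images, dtype=float)
y = np.arange(n_images, dtype=float)

seen = np.zeros(n_images)
i = 0
for _, labels in endless_batches(x, y, batch_size):
    seen[i * batch_size:(i + 1) * batch_size] = labels
    # stop criterion: without it this loop would never terminate
    i += 1
    if i * batch_size >= n_images:
        break

print(i)                         # 3 batches consumed (20 + 20 + 10 items)
print(np.array_equal(seen, y))   # True: every item was stored exactly once
```

Removing the `break` would make this loop run forever, just as it would with the real test generator.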

The full code is given below:


from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model 
import numpy as np

# load model
model = load_model("../streamlit/cats_and_dogs_small_1.h5")

# create generator
test_path = "/media/data/dogs_vs_cats_small/test"
test_datagen = ImageDataGenerator(rescale=1./255)     # configure this like the training generator (same rescale)
batch_size = 20
test_generator = test_datagen.flow_from_directory(
                            test_path,
                            target_size=(150, 150),
                            shuffle=False,            # this is recommended for testing
                            batch_size=batch_size,        
                            class_mode="binary")

# number of steps (batches) = n_images // batch_size
n_images = test_generator.samples
steps = n_images // batch_size

# predict test images
all_pred_labels = np.zeros(n_images)
all_real_labels = np.zeros(n_images)

i = 0
for images, true_labels in test_generator:

    pred = model.predict(images)
    pred_labels = (pred > 0.5).astype('float').flatten()

    # save predictions and real labels
    all_pred_labels[i*batch_size:(i+1)*batch_size] = pred_labels
    all_real_labels[i*batch_size:(i+1)*batch_size] = true_labels

    if (i+1)%10 == 0:
        print("batch %d/%d" % (i+1, steps))
    
    # stop criteria
    i += 1
    if i * batch_size >= n_images:    # chollet p. 147
        break
        
# compute accuracy
acc = (all_pred_labels == all_real_labels).sum() / len(all_pred_labels)
print("acc:  %.8f" % acc)

Found 1000 images belonging to 2 classes.
batch 10/50
batch 20/50
batch 30/50
batch 40/50
batch 50/50
acc:  0.73700000

References

François Chollet, Deep Learning with Python
Adrian Rosebrock, pyimagesearch.com
Keras documentation