Written by Brian Hulela
Updated at 25 Jun 2025, 20:38
Images of cats and dogs classified with a CNN classifier
In this article, we’ll guide you through training a Convolutional Neural Network (CNN) from scratch in Python to classify images of cats and dogs. Whether you're a beginner or have some experience in machine learning, we’ll break down each step clearly so you can confidently build your own models.
All the code used in this guide can be found on this GitHub Repository.
Before we dive into coding, it's essential to set up our development environment. For this tutorial, we'll be using Jupyter Notebook, which allows us to execute Python code interactively and visualize data on the fly. This makes the development process smoother and more organized.
First, open your terminal (or Command Prompt on Windows), create a project folder called cnn_classification, and navigate to it:
mkdir cnn_classification
cd cnn_classification
This ensures that all project-related files stay organized in a dedicated directory.
We then create and activate a virtual environment. A virtual environment helps isolate your project’s dependencies, preventing conflicts with other projects and ensuring consistency across different setups. It allows you to manage packages specifically for your project. Here’s how to create and activate a virtual environment:
python -m venv venv
Activate the virtual environment by running:
On Windows (Command Prompt):
venv\Scripts\activate
On Windows (PowerShell):
.\venv\Scripts\Activate.ps1
On macOS/Linux:
source venv/bin/activate
Once the virtual environment is activated, install Jupyter by running:
pip install jupyter
Next, launch Jupyter Notebook with the following command:
jupyter notebook
Now, open a new notebook, name it classify.ipynb, and you’re ready to start coding!
To train our CNN, we'll need several Python libraries. In a new cell of the classify.ipynb notebook, run the following command to install the necessary dependencies:
# Install necessary libraries directly in the notebook
%pip install tensorflow numpy matplotlib seaborn pandas tabulate scipy scikit-learn
This installation includes:
tensorflow – for building and training deep learning models
numpy – for efficient numerical computations and array manipulations
matplotlib & seaborn – for visualizing data, training progress, and evaluation metrics
pandas – for handling and preprocessing datasets
tabulate – for displaying results in well-structured tables
scipy – required by Keras's ImageDataGenerator for geometric transformations such as the shear and zoom used later
scikit-learn – for evaluation metrics, model selection, and data preprocessing
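Once the install finishes, one quick way to confirm that each package is visible to the notebook's kernel is to query the installed versions with Python's standard importlib.metadata module. This is an optional sketch: the helper name is hypothetical, and the package list simply mirrors the %pip install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as passed to %pip install above
REQUIRED_PACKAGES = [
    "tensorflow", "numpy", "matplotlib", "seaborn",
    "pandas", "tabulate", "scipy", "scikit-learn",
]


def installed_versions(packages):
    """Return {package: installed version string, or None if it is missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(REQUIRED_PACKAGES)
for name, ver in report.items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be reinstalled with %pip before moving on.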
For this tutorial, we'll use the Cats and Dogs dataset from Kaggle, which contains images of cats and dogs. We will load and preprocess this data so that it can be fed into the CNN for training.
The dataset has two folders, train and test. Each folder contains images of cats and dogs (e.g. cat01.jpg or dog93.jpg) along with a CSV file that holds their labels.
The dataset contains:
18,388 train images, with 7,917 cats and 10,471 dogs.
3,998 test images, with 1,999 cats and 1,999 dogs.
All the images are RGB images and have a size of 128x128 pixels.
The train folder contains the train_label.csv file, and the test folder contains the test_label.csv file. Each file has a file name column and a label column, where the label is either 0 or 1:
0: Cat
1: Dog
First, let’s import the necessary libraries and load the CSV files containing the labels:
import os
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
from tabulate import tabulate
import random
import matplotlib.image as mpimg
from tensorflow.keras import layers, models
from sklearn.metrics import confusion_matrix
import seaborn as sns
# Define directory paths
train_dir = 'archive/train'
test_dir = 'archive/test'
# Load CSV files for train and test labels
train_labels_df = pd.read_csv(os.path.join(train_dir, 'train_label.csv'))
test_labels_df = pd.read_csv(os.path.join(test_dir, 'test_label.csv'))
Let’s convert the labels to strings, since Keras's flow_from_dataframe (which we use later to load the images) requires string labels when class_mode='binary', and then check how many images the training and test datasets contain:
# Convert the labels in the dataframe to strings
train_labels_df['label'] = train_labels_df['label'].astype(str)
test_labels_df['label'] = test_labels_df['label'].astype(str)
total_data = [
["Train", train_labels_df.shape[0]],
["Test", test_labels_df.shape[0]]
]
print(tabulate(total_data, headers=["Type", "Count"], tablefmt="grid"))
Output:
+--------+---------+
| Type   |   Count |
+========+=========+
| Train  |   18388 |
+--------+---------+
| Test   |    3998 |
+--------+---------+
This step ensures that the dataset is properly loaded and gives us an overview of the number of samples in each dataset.
To better understand the data, let's count how many cats and dogs are in both the training and test datasets:
# Count the number of cats and dogs in the train dataset
train_cat_count = train_labels_df[train_labels_df['label'] == "0"].shape[0]
train_dog_count = train_labels_df[train_labels_df['label'] == "1"].shape[0]
# Count the number of cats and dogs in the test dataset
test_cat_count = test_labels_df[test_labels_df['label'] == "0"].shape[0]
test_dog_count = test_labels_df[test_labels_df['label'] == "1"].shape[0]
# Prepare the data for display using tabulate
train_data = [
['Cats', train_cat_count],
['Dogs', train_dog_count]
]
test_data = [
['Cats', test_cat_count],
['Dogs', test_dog_count]
]
# Display the results using tabulate
print("Train Data:")
print(tabulate(train_data, headers=["Label", "Count"], tablefmt="grid"))
print("\nTest Data:")
print(tabulate(test_data, headers=["Label", "Count"], tablefmt="grid"))
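This will print the count of cats and dogs in both the training and test datasets. Understanding the dataset's balance matters for model training: the training set is somewhat imbalanced (about 43% cats and 57% dogs), while the test set is perfectly balanced. Stronger imbalances may call for techniques such as class weighting, oversampling the minority class, or extra augmentation of the minority class.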
Output:
Train Data:
+---------+---------+
| Label   |   Count |
+=========+=========+
| Cats    |    7917 |
+---------+---------+
| Dogs    |   10471 |
+---------+---------+
Test Data:
+---------+---------+
| Label   |   Count |
+=========+=========+
| Cats    |    1999 |
+---------+---------+
| Dogs    |    1999 |
+---------+---------+
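The training set has noticeably more dogs than cats. One common remedy, which this tutorial does not use, is class weighting: mistakes on the rarer class are made to cost more in the loss. The sketch below computes weights with the same n_samples / (n_classes * class_count) heuristic as scikit-learn's class_weight='balanced' option, using the training counts from the table above; the helper name is hypothetical.

```python
def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count).

    `counts` maps a class index to the number of training images in that class.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Training-set counts from the table above (0 = cat, 1 = dog)
weights = balanced_class_weights({0: 7917, 1: 10471})
print({cls: round(w, 3) for cls, w in weights.items()})  # {0: 1.161, 1: 0.878}
```

Each class ends up contributing the same total weight (about 9,194 in both cases). If you wanted to try it, Keras's model.fit accepts a class_weight dictionary mapping class indices to weights, e.g. model.fit(..., class_weight=weights).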
Let’s take a look at a few random images from the training set to get an idea of the data format and ensure everything is loaded correctly:
# Define a function to display and save random images from the train dataset
def visualize_random_images(train_dir, num_images=6, save_path="random_images.png"):
    # Set dark mode
    plt.style.use("dark_background")
    # Get all image filenames from the train labels
    train_image_files = train_labels_df['file name'].tolist()
    # Randomly sample the requested number of images (the grid below holds at most 6)
    random_images = random.sample(train_image_files, num_images)
    # Set up the plot (2 rows, 3 columns)
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    axes = axes.flatten()
    # Loop through the random images and display them
    for i, img_file in enumerate(random_images):
        img_path = os.path.join(train_dir, img_file)
        img = mpimg.imread(img_path)
        # Display the image
        axes[i].imshow(img)
        axes[i].axis('off')
        # Display the corresponding label (0 for cat, 1 for dog)
        label = train_labels_df[train_labels_df['file name'] == img_file]['label'].values[0]
        label_name = 'Cat' if label == "0" else 'Dog'
        axes[i].set_title(label_name, color="white")  # Set title color for dark mode
    # Save the figure to the requested path
    plt.savefig(save_path, dpi=300, bbox_inches="tight", facecolor="black")
    plt.show()
# Visualize and save 6 random images from the train set
visualize_random_images(train_dir, num_images=6, save_path="random_images_dark.png")
This function randomly samples six images from the training set, displays them in a 2x3 grid with their labels ('Cat' or 'Dog'), and saves the figure to save_path.
Sample Images from the Cats and Dogs Dataset from Kaggle
Proper data preprocessing is essential for training a robust and accurate model. Since raw images can vary in brightness, scale, and orientation, preprocessing ensures consistency and better generalization. One key aspect is rescaling pixel values, which involves dividing the pixel values by 255. This scales the original pixel range (0-255) to a 0-1 range, making it easier for the neural network to process and train effectively.
By standardizing the pixel values, the model can learn more efficiently: raw inputs of up to 255 produce large activations and gradients, which can slow down or destabilize training. Without this rescaling, the model may train more slowly or less reliably.
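The rescaling step can be sketched in NumPy on a tiny hypothetical image. Dividing by 255 maps the 0-255 range onto 0-1; ImageDataGenerator(rescale=1./255) achieves the same by multiplying each pixel by 1/255. The array values below are made up for illustration.

```python
import numpy as np

# A hypothetical 2x2 RGB "image" holding raw 8-bit pixel values (0-255)
raw = np.array(
    [[[0, 64, 128], [255, 255, 255]],
     [[10, 20, 30], [200, 100, 50]]],
    dtype=np.uint8,
)

# Convert to float first so the division is not done in integer arithmetic
scaled = raw.astype(np.float32) / 255.0

print(scaled.min(), scaled.max())  # 0.0 1.0
```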
Data augmentation is particularly important because it artificially expands the dataset by applying random transformations such as flipping, shearing, and zooming. This helps the model become more resilient to variations in real-world images and reduces overfitting to the training data. By exposing the model to different views of the same object, augmentation improves its ability to recognize patterns in unseen images, which often leads to better accuracy on test data.
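As an illustration of one of these transformations, here is a minimal NumPy sketch of random horizontal flipping, the operation that horizontal_flip=True enables in ImageDataGenerator (Keras mirrors each image with a 50% chance). It is a simplified stand-in, not Keras's implementation; the function name and the tiny test image are hypothetical.

```python
import numpy as np


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror an (height, width, channels) image left-to-right with probability p."""
    if rng.random() < p:
        return image[:, ::-1, :]  # reverse the width axis
    return image


rng = np.random.default_rng(seed=0)

# A hypothetical 1x3 RGB image whose three columns are easy to tell apart
img = np.array([[[1, 1, 1], [2, 2, 2], [3, 3, 3]]], dtype=np.uint8)

always_flipped = random_horizontal_flip(img, rng, p=1.0)
print(always_flipped[0, :, 0])  # [3 2 1]

# With p=0.5, roughly half of many augmented copies come out mirrored
n_flipped = sum(int(random_horizontal_flip(img, rng)[0, 0, 0] == 3) for _ in range(1000))
print(n_flipped)
```

Because the flip is drawn fresh every time an image is loaded, the model sees both orientations of the same cat or dog across epochs.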
Here's the code for data preprocessing with ImageDataGenerator:
# Create an ImageDataGenerator for data augmentation on the training set
train_datagen = ImageDataGenerator(
    rescale=1./255,        # Rescale pixel values to [0, 1]
    shear_range=0.2,       # Random shear transformations
    zoom_range=0.2,        # Random zoom
    horizontal_flip=True   # Random horizontal flip
)
# Create a validation data generator (no augmentation for validation data)
test_datagen = ImageDataGenerator(rescale=1./255)
# Flow data from the directories using the labels and paths from CSV
train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_labels_df,
    directory=train_dir,
    x_col='file name',
    y_col='label',
    target_size=(128, 128),  # Resize images to 128x128
    batch_size=32,           # Number of images per batch
    class_mode='binary'      # Binary classification (cat vs dog)
)
validation_generator = test_datagen.flow_from_dataframe(
    dataframe=test_labels_df,
    directory=test_dir,
    x_col='file name',
    y_col='label',
    target_size=(128, 128),  # Resize images to 128x128
    batch_size=32,           # Number of images per batch
    class_mode='binary'      # Binary classification (cat vs dog)
)
This setup preprocesses and augments the training images on the fly during training, while the validation (test) images are only rescaled, with no augmentation.
Now that we've prepared our data, it's time to build the Convolutional Neural Network (CNN). The CNN will automatically learn features from the images through convolution and pooling layers. The model will then make predictions based on these features.
In defining the CNN architecture, we made the following key choices:
Input Shape: Set to (128, 128, 3) for RGB images of size 128x128.
Convolutional Layers: Start with 32 filters and increase (32 → 64 → 128) to learn more complex features.
ReLU Activation: Used in convolutional and dense layers for faster learning and non-linearity.
Max-Pooling: Applied after each convolution to halve the spatial dimensions, which reduces computation and helps limit overfitting.
Flatten Layer: Converts 2D feature maps to 1D before feeding into dense layers.
Dense Layers: First dense layer with 128 units, followed by a sigmoid output layer for binary classification (cat vs. dog).
Compilation: Adam optimizer for fast convergence, binary cross-entropy loss for binary classification, and accuracy as the evaluation metric.
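To make the output-layer and loss choices concrete, here is a plain-Python sketch of what the sigmoid output unit and the binary cross-entropy loss compute. Keras has its own optimized implementation; the clipping value eps=1e-7 is an assumption chosen to mirror Keras's default epsilon so that log(0) never occurs.

```python
import math


def sigmoid(z):
    """Squash a real-valued score (logit) into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a batch; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Labels follow the dataset convention: 0 = cat, 1 = dog
print(sigmoid(0.0))                               # 0.5
print(round(binary_crossentropy([1], [0.9]), 4))  # 0.1054  (confident and correct)
print(round(binary_crossentropy([1], [0.1]), 4))  # 2.3026  (confident and wrong)
```

The loss grows sharply as the predicted probability moves away from the true label, so confidently wrong predictions are penalized far more than uncertain ones.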
The architecture of the CNN is explained in detail in the Feature Extraction and Convolutional Neural Networks section.
Here’s how we define the CNN architecture using TensorFlow's Keras API:
# Initialize the model
model = models.Sequential()
# Add Convolutional Layers and Pooling Layers
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Flatten the results to feed into a fully connected layer
model.add(layers.Flatten())
# Add Dense Layers
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid')) # Sigmoid for binary classification
# Compile the model
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
# Summarize the model architecture
model.summary()
This model consists of three convolutional layers, each followed by a max-pooling layer, which progressively extract higher-level features from the input images. The dense layers then process these features, and the final sigmoid unit outputs the probability that the image is a dog; a probability above 0.5 is read as 1 (dog), otherwise 0 (cat).
Model Architecture Summary
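As a cross-check on the summary, the output shapes and parameter counts can be worked out by hand. The sketch below assumes Keras's defaults for the layers used above: Conv2D with padding='valid' and stride 1, and MaxPooling2D((2, 2)) with strides equal to the pool size, which floors odd sizes. The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' (no padding), stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after 2x2 max-pooling with stride 2 (odd sizes are floored)."""
    return size // pool


size, channels = 128, 3
total_params = 0
for filters in (32, 64, 128):
    total_params += 3 * 3 * channels * filters + filters  # kernel weights + biases
    size = pool_out(conv_out(size))
    channels = filters
    print(f"Conv2D({filters}) + MaxPooling2D -> {size}x{size}x{channels}")

flat = size * size * channels      # length of the Flatten output
total_params += flat * 128 + 128   # Dense(128)
total_params += 128 * 1 + 1        # Dense(1) sigmoid output
print(f"Flatten -> {flat} values; total trainable parameters: {total_params:,}")
```

The feature maps shrink from 128x128 to 63, 30, and finally 14 pixels per side, and the total comes to 3,304,769 parameters. Over 97% of them (3,211,392) sit in the first Dense layer, because Flatten hands it 25,088 values.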
Now that the model is ready, we train it using the prepared data. Add this code after defining your model:
# Train the model
history = model.fit(
train_generator,
steps_per_epoch=train_generator.samples // train_generator.batch_size,
epochs=40, # You can adjust the number of epochs
validation_data=validation_generator,
validation_steps=validation_generator.samples // validation_generator.batch_size
)
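Training Data: train_generator feeds the training data.
Validation Data: validation_generator provides the validation data during training.
Epochs: Set to 40 for training; adjust as needed.
Steps per Epoch: train_generator.samples // train_generator.batch_size, i.e. 18388 // 32 = 574 steps, which covers all but a final partial batch of 20 training images.
Validation Steps: validation_generator.samples // validation_generator.batch_size, i.e. 3998 // 32 = 124 steps, which evaluates 3,968 of the 3,998 test images.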
This code trains the model, allowing it to learn from the data and improve its predictions.
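The floor division in steps_per_epoch and validation_steps drops a final partial batch. The arithmetic for this dataset can be checked with a small helper (its name is hypothetical). The ceiling value corresponds to len(generator) for a Keras data iterator, which counts the last partial batch, so passing steps_per_epoch=len(train_generator) should cover every image each epoch.

```python
import math


def steps_info(samples, batch_size=32):
    """Return (floor steps, ceiling steps, images left out by the floor version)."""
    floor_steps = samples // batch_size           # what the tutorial passes to fit()
    ceil_steps = math.ceil(samples / batch_size)  # includes the last partial batch
    skipped = samples - floor_steps * batch_size
    return floor_steps, ceil_steps, skipped


print(steps_info(18388))  # (574, 575, 20)  training set
print(steps_info(3998))   # (124, 125, 30)  test set used for validation
```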
Once training is complete, we can save the model for future use:
# Save the trained model
model.save('model.h5') # This will save the model as 'model.h5' in the current directory
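To reuse the saved model on a new photo, the photo needs the same preprocessing as the training images: resized to 128x128, scaled to [0, 1], and wrapped in a batch dimension. The helpers below are a hypothetical sketch of that preprocessing and of turning the sigmoid output into a label with the same 0.5 threshold this tutorial uses when visualizing predictions. The commented usage relies on TensorFlow's tf.keras.models.load_model, tf.keras.utils.load_img and tf.keras.utils.img_to_array; the image path is a placeholder.

```python
import numpy as np


def preprocess_single_image(pixels):
    """Turn one 128x128 RGB image (values 0-255) into a batch of one scaled to [0, 1].

    Mirrors the rescale=1./255 used by the training and validation generators.
    """
    arr = np.asarray(pixels, dtype=np.float32) / 255.0
    if arr.shape != (128, 128, 3):
        raise ValueError(f"expected an image of shape (128, 128, 3), got {arr.shape}")
    return np.expand_dims(arr, axis=0)  # shape (1, 128, 128, 3)


def label_from_probability(prob, threshold=0.5):
    """Map the sigmoid output (probability of 'dog') to a class name."""
    return "Dog" if prob > threshold else "Cat"


# Hypothetical usage with the saved model (needs TensorFlow, Pillow and a real image file):
# import tensorflow as tf
# model = tf.keras.models.load_model("model.h5")
# img = tf.keras.utils.load_img("new_photo.jpg", target_size=(128, 128))  # placeholder path
# batch = preprocess_single_image(tf.keras.utils.img_to_array(img))
# print(label_from_probability(float(model.predict(batch)[0][0])))
```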
After training the model, it's important to assess its performance. We can visualize how the model's accuracy and loss change over the epochs for both training and validation data.
Use the following code to plot the training and validation accuracy, as well as the training and validation loss:
# model.history holds the History object from the most recent model.fit() call
history = model.history.history
# Plot the training and validation accuracy
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.plot(history['accuracy'], label='Training Accuracy')
plt.plot(history['val_accuracy'], label='Validation Accuracy')
plt.title('Accuracy over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
# Plot the training and validation loss
plt.subplot(1, 2, 2)
plt.plot(history['loss'], label='Training Loss')
plt.plot(history['val_loss'], label='Validation Loss')
plt.title('Loss over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
# Save the figure
plt.savefig("train_performance.png", dpi=300, bbox_inches="tight", facecolor="black")
plt.show()
Training Accuracy and Loss Performance Metrics
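One common use of these curves is spotting the epoch where validation performance peaks; if validation loss starts rising while training loss keeps falling, the model is likely overfitting. The dictionary returned by history.history maps each metric name ('accuracy', 'val_accuracy', 'loss', 'val_loss') to a list with one value per epoch, so a small helper (hypothetical, not part of Keras) can locate the best epoch. The numbers below are invented for illustration and are not results from this tutorial's run.

```python
def best_epoch(history, metric="val_accuracy", mode="max"):
    """Return (1-based epoch, value) of the best score for `metric` in a Keras-style history dict."""
    values = history[metric]
    best_value = max(values) if mode == "max" else min(values)
    return values.index(best_value) + 1, best_value


# Invented numbers for illustration only, not results from this tutorial's training run
example_history = {
    "val_accuracy": [0.71, 0.79, 0.84, 0.83],
    "val_loss": [0.56, 0.45, 0.38, 0.41],
}

print(best_epoch(example_history))                     # (3, 0.84)
print(best_epoch(example_history, "val_loss", "min"))  # (3, 0.38)
```

With the tutorial's variables, best_epoch(history) would read the real dictionary. Keras can also stop at the best point automatically via the tf.keras.callbacks.EarlyStopping callback, e.g. EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True) passed in callbacks=[...] to model.fit.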
To evaluate the model’s performance, we use the test dataset:
# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(validation_generator, verbose=1)
# Prepare the results in a table format
results = [
["Test Loss", test_loss],
["Test Accuracy", test_accuracy]
]
# Print the table
print(tabulate(results, headers=["Metric", "Value"], tablefmt="grid"))
This will print the loss and accuracy of the model on the test dataset, allowing you to gauge its performance.
Output:
+---------------+----------+
| Metric        |    Value |
+===============+==========+
| Test Loss     | 0.383285 |
+---------------+----------+
| Test Accuracy | 0.864682 |
+---------------+----------+
To evaluate how well the model is performing, we visualize some predictions from the validation set:
Get a batch of images: We retrieve a batch of images and their true labels from the validation generator.
Make predictions: The model makes predictions on these images, and the predicted labels are converted to binary (0 or 1).
Randomly select images: A few random images are selected to display, along with their true and predicted labels.
This provides a quick visual comparison to see how well the model is classifying images.
Here’s the code to visualize predictions:
# Get a batch of images and their true labels from the validation generator
images, true_labels = next(validation_generator)
# Make predictions on the batch of images
predictions = model.predict(images)
# Convert the predicted probabilities to binary labels (0 or 1), one per image
predicted_labels = (predictions > 0.5).astype('int32').flatten()
# Randomly select a few images to display
num_images_to_display = 6
indices = random.sample(range(images.shape[0]), num_images_to_display)
# Create a figure to display the images in a 2x3 grid
plt.figure(figsize=(12, 8))
for i, index in enumerate(indices):
    plt.subplot(2, 3, i + 1)  # Create 2 rows and 3 columns
    img = images[index]
    true_label = true_labels[index]
    predicted_label = predicted_labels[index]
    # Show image
    plt.imshow(img)
    plt.axis('off')
    # Show true and predicted labels
    plt.title(f"True: {'Cat' if true_label == 0 else 'Dog'}, Pred: {'Cat' if predicted_label == 0 else 'Dog'}")
# Save the figure
plt.savefig("cat_dog_results.png", dpi=300, bbox_inches="tight", facecolor="black")
plt.show()
Ground Truth Labels with Labels Predicted using CNN
To evaluate the performance of the model more quantitatively, we use the confusion matrix. This matrix compares the model’s predictions against the true labels and helps us understand how well the model is classifying the images.
In a binary classification problem like this one (Cat vs. Dog), the confusion matrix is built from four counts:
True Positive (TP): The model correctly predicted the positive class (in this case, "Dog") when it was actually a "Dog."
Example: The model predicted "Dog," and the actual label was "Dog."
False Positive (FP): The model incorrectly predicted the positive class (in this case, "Dog") when the true label was negative (i.e., "Cat").
Example: The model predicted "Dog," but the actual label was "Cat."
True Negative (TN): The model correctly predicted the negative class (in this case, "Cat") when it was actually a "Cat."
Example: The model predicted "Cat," and the actual label was "Cat."
False Negative (FN): The model incorrectly predicted the negative class (in this case, "Cat") when the true label was positive (i.e., "Dog").
Example: The model predicted "Cat," but the actual label was "Dog."
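These four counts can be tallied directly from lists of true and predicted labels. The plain-Python sketch below (helper names and the six labels are hypothetical) treats dog (1) as the positive class and also derives accuracy, precision, and recall from the counts. For labels ordered [0, 1], scikit-learn's confusion_matrix lays the same counts out as [[TN, FP], [FN, TP]], so confusion_matrix(y_true, y_pred).ravel() returns tn, fp, fn, tp.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, TN, FN) for binary labels; here 1 = dog is the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted dog, actually dog
            else:
                fp += 1  # predicted dog, actually cat
        else:
            if t == positive:
                fn += 1  # predicted cat, actually dog
            else:
                tn += 1  # predicted cat, actually cat
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Accuracy over all images, plus precision and recall for the positive (dog) class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels: 0 = cat, 1 = dog
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (2, 1, 2, 1)
print(summary_metrics(*counts))
```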
To compute these four counts over the whole test set and visualize them as a confusion matrix, we collect the true and predicted labels for every test image:
# Initialize empty lists to store true and predicted labels
all_true_labels = []
all_predicted_labels = []
# Rewind the generator so the loop covers each test image exactly once
# (earlier calls such as next(validation_generator) may have left it mid-epoch)
validation_generator.reset()
# Iterate through the validation generator (test set)
for images, true_labels_batch in validation_generator:
    # Make predictions on the batch of images
    predictions = model.predict(images, verbose=0)
    # Convert the predicted probabilities to binary labels (0 or 1), one per image
    predicted_labels_batch = (predictions > 0.5).astype('int32').flatten()
    # Append the true and predicted labels to the lists
    all_true_labels.extend(true_labels_batch)
    all_predicted_labels.extend(predicted_labels_batch)
    # Break after one full pass through the dataset
    if len(all_true_labels) >= validation_generator.samples:
        break
# Compute the confusion matrix on all true and predicted labels
cm = confusion_matrix(all_true_labels, all_predicted_labels)
# Plot confusion matrix
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=["Cat", "Dog"], yticklabels=["Cat", "Dog"])
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
# Save the figure
plt.savefig("confusion_matrix.png", dpi=300, bbox_inches="tight", facecolor="black")
plt.show()
Confusion Matrix
In this article, we built and trained a Convolutional Neural Network (CNN) for binary image classification.
After preparing and augmenting the dataset, we defined the CNN architecture, trained it on the data, and monitored its performance using accuracy and loss metrics.
We visualized the results and evaluated the model on the test set, where it reached a test accuracy of about 86.5%. This article provides a foundation for developing and improving computer vision models.
If you have any issues, feel free to Contact Me. All the code for this guide is also available in this GitHub Repository.