MNIST Handwritten Digits Recognition using Python | Image Classification | Deep Learning Tutorial

Hackers Realm
Apr 25, 2022

Updated: Jun 4, 2023

This deep learning tutorial walks through handwritten digit recognition in Python using the MNIST dataset. It covers loading and preprocessing the images, building and training a small convolutional neural network, and checking the model's predictions.

MNIST Handwritten Digits - Image Classification

In previous projects we used the Jupyter Notebook interface to write code and process our datasets and models. From this point forward, all our deep learning and machine learning projects will use the Google Colab interface.

To know more and get started with Google Colab, click here

Dataset Information

This dataset lets you study, analyze, and recognize handwritten digits in images, the same kind of image recognition a camera uses to detect a face. It is a digit recognition problem with ten classes (0 to 9). This dataset has 49,000 images of 28 x 28 pixels, totaling 49 MB.

Download the dataset here

Import Modules

First, we have to import all the basic modules we will be needing for this project.

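!pip install tensorflow keras

Installation of the tensorflow backend module and the keras API. The separate tensorflow-gpu package is deprecated, so the standard tensorflow package is installed instead; Google Colab usually ships with it preinstalled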

import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
from keras.preprocessing.image import img_to_array, load_img
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

pandas - used for data manipulation and analysis
numpy - used for a wide variety of mathematical operations on arrays
tqdm.notebook - progress bar wrapper for iterators, rendered as a notebook widget
keras.preprocessing.image - helpers for loading an image and converting it to a numpy array
tensorflow - backend module used by Keras
matplotlib - used for data visualization and graphical plotting
warnings - controls how warning messages are handled
%matplotlib inline - enables inline plotting in the notebook

filterwarnings('ignore') suppresses the warnings raised by the modules, which keeps the output clean but can also hide useful deprecation notices

Unzip the train data

We need to extract the dataset for training and testing using unzip

!unzip Train_UQcUa52.zip

Load the data

df = pd.read_csv('train.csv')
df.head()

pd.read_csv() - reads the CSV file into a DataFrame
filename is the image file name
label is the digit shown in the image
The images must be converted into arrays for processing

!pwd

/content

Current directory path

image_path = 'Images/train/'

Storing the path of the image data

Now we load the whole dataset into an array for processing

X = np.array([img_to_array(load_img(image_path + df['filename'][i], target_size=(28, 28), color_mode='grayscale'))
                   for i in tqdm(range(df.shape[0]))
                   ]).astype('float32')

X - array that holds all the loaded images
target_size=(28, 28) - height and width each image is resized to; the single grayscale channel comes from color_mode, not from target_size
color_mode='grayscale' - loads each image with one channel; it replaces the deprecated grayscale=True argument
range(df.shape[0]) - iterates over every row of the training data
tqdm() - shows a progress bar that advances as each image is converted into an array
astype('float32') - converts the numpy array to the float32 data type

y = df['label']

Loading label data to the variable y

print(X.shape, y.shape)

(49000, 28, 28, 1) (49000,)

Input data has 49000 samples of 28 x 28 pixels with one (1) grayscale channel
Output attribute has 49000 labels, one per sample
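
As a rough illustration of how the list comprehension above ends up with a four-dimensional array, the sketch below stacks synthetic 28 x 28 images the same way. The random arrays stand in for real image files and the sample count of 5 is arbitrary; both are assumptions made for the example, not part of the project data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for decoded grayscale images: 5 arrays of shape (28, 28), values 0-255.
fake_images = [rng.integers(0, 256, size=(28, 28), dtype=np.uint8) for _ in range(5)]

# img_to_array returns (height, width, channels); mimic that by adding a channel axis.
arrays = [img[..., np.newaxis] for img in fake_images]

# Converting a list of equally shaped arrays adds a leading sample axis.
X_demo = np.array(arrays).astype('float32')

print(X_demo.shape, X_demo.dtype)
```

With 49000 real images in place of the 5 synthetic ones, the leading axis becomes 49000 and the remaining axes stay (28, 28, 1).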

Exploratory Data Analysis

In Exploratory Data Analysis (EDA), we visualize the data with different kinds of plots for inference. It helps to find patterns or relations within the data.

image_index = 0
print(y[image_index])
plt.imshow(X[image_index].reshape(28,28), cmap='Greys')

y[image_index] returns the label of the sample
X[image_index].reshape(28,28) drops the channel axis so the sample becomes a 28 x 28 array that imshow can draw
cmap='Greys' displays the image in shades of grey

image_index = 10
print(y[image_index])
plt.imshow(X[image_index].reshape(28,28), cmap='Greys')

image_index = 100
print(y[image_index])
plt.imshow(X[image_index].reshape(28,28), cmap='Greys')

Train-Test Split

We must split the dataset for training and testing

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=np.array(y))

stratify=np.array(y) - keeps the proportion of each digit class the same in the training and test sets; without it the split is purely random and the class proportions in the two sets can differ
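
To make the idea of stratification concrete, here is a minimal hand-rolled sketch in numpy. It is not how scikit-learn implements train_test_split; the helper stratified_test_indices and the imbalanced toy labels are hypothetical and exist only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical imbalanced labels: 900 samples of class 0, 100 of class 1.
labels = np.array([0] * 900 + [1] * 100)

def stratified_test_indices(labels, test_size, rng):
    """Pick test indices class by class so each class keeps its share."""
    test_idx = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)  # shuffle within the class
        n_test = int(round(len(cls_idx) * test_size))
        test_idx.extend(cls_idx[:n_test])
    return np.array(sorted(test_idx))

test_idx = stratified_test_indices(labels, test_size=0.25, rng=rng)

# Class shares in the full data and in the stratified test set.
full_share = np.bincount(labels) / len(labels)
test_share = np.bincount(labels[test_idx]) / len(test_idx)

print("full dataset:       ", full_share)
print("stratified test set:", test_share)
```

Both lines print the same 90/10 split, which is the property stratify is meant to guarantee for the ten digit classes.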

Normalization

Pre-processing step that scales all the values in the data to the range zero (0) to one (1), which usually helps the network train

x_train[0]

x_train /= 255
x_test /= 255

x_train[0]

x_train[0] displays the values of the first sample
Before normalization, every pixel has a value between 0 and 255
x_train /= 255, x_test /= 255 - divide in place to normalize the values in the dataset
Run this cell only once; running it again divides the data by 255 a second time and shrinks the values far below the intended range
x_train[0] again shows the data after normalization
Now all the values lie between 0 and 1
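
One way to guard against the double-division problem is to check the value range before scaling. The sketch below assumes a hypothetical helper named scale_to_unit; the max-value check is a simple heuristic (it would misjudge a raw image set whose brightest pixel is 0 or 1), not a general solution.

```python
import numpy as np

def scale_to_unit(pixels):
    """Return a float32 copy scaled to [0, 1]; leave already-scaled data unchanged."""
    pixels = np.asarray(pixels, dtype='float32')
    if pixels.max() > 1.0:
        # Values above 1 mean the data is still in the 0-255 range.
        return pixels / 255.0
    return pixels.copy()

raw = np.array([[0, 128, 255]], dtype='float32')
once = scale_to_unit(raw)
twice = scale_to_unit(once)  # second call detects scaled data and does nothing

print(once)
print(np.array_equal(once, twice))
```

Because the function returns a new array instead of dividing in place, re-running the cell cannot corrupt the data.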

Model Creation

input_shape = (28,28,1)
output_class = 10

input_shape - height, width, and number of channels of each input image
output_class - number of output classes (digits 0 to 9)

Let us create the model for training

from keras.models import Sequential
from keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D

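# define the model
model = Sequential()
# 28 convolution filters of size 3 x 3 applied to the 28 x 28 x 1 input
model.add(Conv2D(28, kernel_size=(3,3), input_shape=input_shape))
# keep the maximum of every 2 x 2 window, halving height and width
model.add(MaxPooling2D(pool_size=(2,2)))
# flatten the feature maps into a 1D vector for the dense layers
model.add(Flatten())
model.add(Dense(128, activation=tf.nn.relu))
# randomly drop 30% of the activations during training
model.add(Dropout(0.3))
# one probability per digit class
model.add(Dense(output_class, activation=tf.nn.softmax))

# metrics is passed as a list, the form the Keras documentation uses
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])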

Basic CNN model used for this project
Dense - fully connected layer
Conv2D - 2D convolution layer, suited to image data
Dropout - regularization layer that randomly sets a fraction of its inputs to zero during training to reduce overfitting
Flatten - converts the multi-dimensional feature maps into a 1D array
MaxPooling2D - passes the maximum value of each pooling window to the next layer
kernel_size=() - size of the convolution filter
model.compile() - configures the model with its optimizer, loss, and metrics
optimizer='adam' - optimizer that adapts the learning rate of each parameter during training
loss='sparse_categorical_crossentropy' - loss function for multi-class outputs with integer labels
The activation function of the last layer defines the output; softmax is used here for the ten classes, while sigmoid suits binary classification
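
To see how the layers fit together, the following sketch works out the output size and parameter count of each layer by hand. It assumes the Keras defaults for this model (no padding and stride 1 for Conv2D, pooling stride equal to the pool size); the helper conv_output is a hypothetical name used only here.

```python
def conv_output(size, kernel, stride=1):
    """Output length of a 'valid' (no padding) convolution or pooling window."""
    return (size - kernel) // stride + 1

height = 28
channels_in, filters = 1, 28

# Conv2D(28, kernel_size=(3, 3)): each filter has 3*3*channels_in weights plus one bias.
conv_hw = conv_output(height, 3)
conv_params = (3 * 3 * channels_in + 1) * filters

# MaxPooling2D(pool_size=(2, 2)): no trainable parameters.
pool_hw = conv_output(conv_hw, 2, stride=2)

# Flatten: height * width * filters values per image.
flat = pool_hw * pool_hw * filters

# Dense layers: one weight per input plus one bias, for every unit.
dense_params = (flat + 1) * 128
output_params = (128 + 1) * 10

total = conv_params + dense_params + output_params

print("after Conv2D:      ", (conv_hw, conv_hw, filters))
print("after MaxPooling2D:", (pool_hw, pool_hw, filters))
print("after Flatten:     ", flat)
print("trainable params:  ", total)
```

Almost all of the parameters sit in the first Dense layer, which is why the Dropout layer is placed right after it.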

Training the model for the entire dataset

# train the model
model.fit(x=x_train, y=y_train, batch_size=32, epochs=30, validation_data=(x_test, y_test))

The training results are displayed below
batch_size=32 - number of images processed per training step
epochs=30 - number of full passes over the training data
Highest validation accuracy is 98.38% (epochs 9 and 24)
Highest training accuracy is 99.68% (epoch 27); the final epoch ends at 99.59%
Training for too many epochs overfits the data: validation loss is lowest at epoch 9 and rises afterwards, so adjust the number of epochs accordingly

Epoch 1/30 1149/1149 [==============================] - 10s 3ms/step - loss: 0.4816 - accuracy: 0.8475 - val_loss: 0.1202 - val_accuracy: 0.9637
Epoch 2/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.1336 - accuracy: 0.9605 - val_loss: 0.0848 - val_accuracy: 0.9743
Epoch 3/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0863 - accuracy: 0.9732 - val_loss: 0.0807 - val_accuracy: 0.9742
Epoch 4/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0685 - accuracy: 0.9783 - val_loss: 0.0734 - val_accuracy: 0.9788
Epoch 5/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0543 - accuracy: 0.9825 - val_loss: 0.0690 - val_accuracy: 0.9809
Epoch 6/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0461 - accuracy: 0.9844 - val_loss: 0.0684 - val_accuracy: 0.9808
Epoch 7/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0360 - accuracy: 0.9873 - val_loss: 0.0743 - val_accuracy: 0.9798
Epoch 8/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0318 - accuracy: 0.9884 - val_loss: 0.0733 - val_accuracy: 0.9811
Epoch 9/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0319 - accuracy: 0.9891 - val_loss: 0.0658 - val_accuracy: 0.9838
Epoch 10/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0242 - accuracy: 0.9919 - val_loss: 0.0728 - val_accuracy: 0.9827
Epoch 11/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0218 - accuracy: 0.9926 - val_loss: 0.0815 - val_accuracy: 0.9818
Epoch 12/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0286 - accuracy: 0.9895 - val_loss: 0.0766 - val_accuracy: 0.9829
Epoch 13/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0199 - accuracy: 0.9928 - val_loss: 0.0762 - val_accuracy: 0.9820
Epoch 14/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0239 - accuracy: 0.9918 - val_loss: 0.0754 - val_accuracy: 0.9836
Epoch 15/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0160 - accuracy: 0.9938 - val_loss: 0.0865 - val_accuracy: 0.9820
Epoch 16/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0196 - accuracy: 0.9935 - val_loss: 0.0842 - val_accuracy: 0.9822
Epoch 17/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0152 - accuracy: 0.9951 - val_loss: 0.0825 - val_accuracy: 0.9828
Epoch 18/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0155 - accuracy: 0.9943 - val_loss: 0.0889 - val_accuracy: 0.9817
Epoch 19/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0207 - accuracy: 0.9930 - val_loss: 0.0886 - val_accuracy: 0.9822
Epoch 20/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0122 - accuracy: 0.9955 - val_loss: 0.0958 - val_accuracy: 0.9822
Epoch 21/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0135 - accuracy: 0.9957 - val_loss: 0.0986 - val_accuracy: 0.9824
Epoch 22/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0166 - accuracy: 0.9949 - val_loss: 0.0987 - val_accuracy: 0.9824
Epoch 23/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0153 - accuracy: 0.9949 - val_loss: 0.0917 - val_accuracy: 0.9832
Epoch 24/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0147 - accuracy: 0.9950 - val_loss: 0.0967 - val_accuracy: 0.9838
Epoch 25/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0112 - accuracy: 0.9957 - val_loss: 0.1057 - val_accuracy: 0.9816
Epoch 26/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0134 - accuracy: 0.9959 - val_loss: 0.1024 - val_accuracy: 0.9830
Epoch 27/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0085 - accuracy: 0.9968 - val_loss: 0.1256 - val_accuracy: 0.9795
Epoch 28/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0127 - accuracy: 0.9958 - val_loss: 0.1099 - val_accuracy: 0.9832
Epoch 29/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0136 - accuracy: 0.9952 - val_loss: 0.1043 - val_accuracy: 0.9824
Epoch 30/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0132 - accuracy: 0.9959 - val_loss: 0.1162 - val_accuracy: 0.9827
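
The log gives a way to choose a reasonable number of epochs: stop once validation loss has not improved for a few epochs in a row. The sketch below replays the val_loss values from the log through a simplified patience rule. The function early_stopping_point and the patience of 3 are assumptions for illustration; this is a plain Python approximation of the idea, not the Keras implementation.

```python
# val_loss values copied from the 30-epoch training log above.
val_losses = [
    0.1202, 0.0848, 0.0807, 0.0734, 0.0690, 0.0684, 0.0743, 0.0733, 0.0658, 0.0728,
    0.0815, 0.0766, 0.0762, 0.0754, 0.0865, 0.0842, 0.0825, 0.0889, 0.0886, 0.0958,
    0.0986, 0.0987, 0.0917, 0.0967, 0.1057, 0.1024, 0.1256, 0.1099, 0.1043, 0.1162,
]

def early_stopping_point(val_losses, patience=3):
    """Return (best_epoch, stop_epoch), both 1-based, for a simple patience rule."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

best_epoch, stop_epoch = early_stopping_point(val_losses, patience=3)
print("best epoch:", best_epoch, "training would stop at epoch:", stop_epoch)
```

Under this rule training would have stopped well before epoch 30. In a real run, the Keras EarlyStopping callback can apply the same kind of rule during model.fit.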

Testing the model

After training the model we will check the prediction results

image_index = 10
# y_test keeps the shuffled DataFrame index, so use .iloc for positional access
# print("Original output:", y_test.iloc[image_index])
plt.imshow(x_test[image_index].reshape(28,28), cmap='Greys')
pred = model.predict(x_test[image_index].reshape(1,28,28,1))
print("Predicted output:", pred.argmax())

Predicted output: 1

image_index = 100
# print("Original output:", y_test.iloc[image_index])
plt.imshow(x_test[image_index].reshape(28,28), cmap='Greys')
pred = model.predict(x_test[image_index].reshape(1,28,28,1))
print("Predicted output:", pred.argmax())

Predicted output: 8

x_test[image_index].reshape(1,28,28,1) - reshapes a single sample into a batch of one 28 x 28 grayscale image, the input shape model.predict expects
To predict on the whole test set, pass x_test directly, since it already has the shape (samples, 28, 28, 1)
pred.argmax() - returns the index of the class with the highest probability
The displayed digit corresponds to the predicted output
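
When predicting on more than one image, argmax needs an axis so that it picks one class per row instead of one index for the whole array. The sketch below uses made-up probabilities in place of real model.predict output; the array probs and the labels in true_labels are hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples over 10 classes (each row sums to 1).
probs = np.full((3, 10), 0.01)
probs[0, 1] = 0.91
probs[1, 8] = 0.91
probs[2, 3] = 0.91

# Single sample: argmax over the 10 class probabilities.
single = probs[0].argmax()

# Batch: axis=1 returns the most likely class for every row.
batch = probs.argmax(axis=1)

# Hypothetical true labels; the third prediction is deliberately wrong.
true_labels = np.array([1, 8, 5])
accuracy = (batch == true_labels).mean()

print(single)
print(batch)
print(accuracy)
```

The same comparison against the real test labels gives the test accuracy for the whole set in one step.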

Final Thoughts

Be careful when training the model; overtraining can ruin it and give inaccurate results on new data.
The same approach can be reused for other image classification tasks, such as face recognition and flower recognition, by changing the dataset and parameters.
This is a basic deep learning model built as a small neural network; adding new layers changes the results.

In this project tutorial, we explored the handwritten digits recognition dataset as a deep learning classification project. Different handwritten digits were examined through exploratory data analysis and identified by the trained model.

Get the project notebook from here

Thanks for reading the article!!!

Check out more project videos from the YouTube channel Hackers Realm