• Hackers Realm

MNIST Handwritten Digits Recognition using Python | Image Classification | Deep Learning Tutorial

The MNIST Handwritten Digits Recognition is an image classification project where we have to analyze and recognize handwritten digits. This is a basic deep learning project for beginners to learn.



In the previous projects we used the Jupyter Notebook interface to write code and to process our datasets and models. From this point forward, all our deep learning and machine learning projects will use the Google Colab interface.



You can watch the step by step explanation video tutorial down below


To know more and get started with Google Colab, click here


Dataset Information

This dataset lets you study, analyze and recognize the content of images, which is the same idea your camera uses to detect a face. Here the task is digit recognition. The dataset has 49,000 images of 28 x 28 pixels each, totaling 49 MB.


Download the dataset here



Import Modules


First, we have to import all the basic modules we will be needing for this project.

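!pip install tensorflow keras
  • Installs the TensorFlow backend and the Keras API. The separate tensorflow-gpu package is deprecated; the regular tensorflow package is used instead. Colab usually ships with both preinstalled, so this step may be unnecessary.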


import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
from keras.preprocessing.image import img_to_array, load_img
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')
  • pandas - used to perform data manipulation and analysis

  • numpy - used to perform a wide variety of mathematical operations on arrays

  • tqdm.notebook - wraps an iterator and displays a progress bar while it is consumed

  • keras.preprocessing.image - helpers for loading an image and converting it to a numpy array

  • tensorflow - backend module that Keras runs on

  • matplotlib - used for data visualization and graphical plotting

  • warnings - controls how warning messages are handled

  • %matplotlib inline - enables inline plotting in the notebook

filterwarnings('ignore') suppresses the warnings raised by the modules, which keeps the output clean


Unzip the train data


We need to extract the dataset from the zip archive before we can use it for training and testing

!unzip Train_UQcUa52.zip


Load the data


df = pd.read_csv('train.csv')
df.head()
  • pd.read_csv() - reads the CSV file into a DataFrame; df.head() shows its first rows

  • filename is the name of the image file

  • label is the digit shown in the image

  • The images must be converted into arrays before they can be processed


!pwd

/content

  • Current directory path


image_path = 'Images/train/'
  • Storing the path of the image data



Now we load the whole dataset into an array for processing

X = np.array([img_to_array(load_img(image_path + df['filename'][i], target_size=(28, 28), color_mode='grayscale'))
              for i in tqdm(range(df.shape[0]))
              ]).astype('float32')
  • X - array that holds all the loaded images

  • target_size=(28, 28) - height and width each image is resized to; load_img expects only these two values

  • color_mode='grayscale' - loads each image with a single channel, so img_to_array returns a 28 x 28 x 1 array (the older grayscale=True flag is deprecated in favor of this argument)

  • range(df.shape[0]) - iterates over every row of the training data set

  • tqdm() - displays a progress bar that advances as each image is converted into an array

  • astype('float32') - converts the numpy array to the float32 data type


y = df['label']
  • Loading label data to the variable y


print(X.shape, y.shape)

(49000, 28, 28, 1) (49000,)

  • Input data has 49000 samples with 28 x 28 size, at one (1) channel that is grayscale

  • Output attribute has 49000 samples
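
To see why X ends up with four dimensions, here is a minimal sketch that uses randomly generated arrays in place of the real image files. The name fake_images and the sample count of 5 are assumptions made purely for illustration. Each image is a (28, 28, 1) array, and wrapping the list in np.array stacks the images along a new first axis.

```python
import numpy as np

# Stand-ins for the arrays that img_to_array would return for
# grayscale 28 x 28 images: height, width, one channel.
rng = np.random.default_rng(0)
fake_images = [rng.integers(0, 256, size=(28, 28, 1)) for _ in range(5)]

# Stacking a list of equally shaped arrays adds a leading "sample" axis.
X_demo = np.array(fake_images).astype('float32')

print(X_demo.shape)   # (5, 28, 28, 1)
print(X_demo.dtype)   # float32
```

With the real data the first axis would be 49000 instead of 5, matching the shape printed above.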



Exploratory Data Analysis


In Exploratory Data Analysis (EDA), we visualize the data with different kinds of plots to draw inferences. This helps to find patterns or relations within the data.


image_index = 0
print(y[image_index])
plt.imshow(X[image_index].reshape(28,28), cmap='Greys')

4

  • y[image_index] returns the label of the selected sample

  • X[image_index].reshape(28,28) drops the channel axis so the sample can be drawn as a 28 x 28 image

  • plt.imshow(..., cmap='Greys') displays that image in grayscale



image_index = 10
print(y[image_index])
plt.imshow(X[image_index].reshape(28,28), cmap='Greys')

2


image_index = 100
print(y[image_index])
plt.imshow(X[image_index].reshape(28,28), cmap='Greys')

7
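
Besides looking at individual samples, it can help to check how many images there are per digit. The following is a minimal sketch under the assumption that the labels are integers from 0 to 9; labels_demo is a made-up array standing in for df['label'], not the real data.

```python
import numpy as np

# Hypothetical labels standing in for df['label']; the real column
# would be passed in the same way.
labels_demo = np.array([4, 9, 1, 7, 3, 9, 1, 2, 4, 9])

# np.bincount counts how often each integer label occurs;
# minlength=10 guarantees one slot per digit 0-9.
counts = np.bincount(labels_demo, minlength=10)

for digit, count in enumerate(counts):
    print(f"digit {digit}: {count} sample(s)")
```

A strongly uneven count would be one reason to use a stratified split later on.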



Train-Test Split


We must split the dataset for training and testing

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=np.array(y))
  • stratify=np.array(y) - keeps the proportion of each digit the same in the training and test sets; without it the split is purely random and some digits may end up over- or under-represented in the test data
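
The effect of stratify can be checked on a small synthetic example. This sketch assumes deliberately imbalanced, made-up labels (class_sizes, X_demo and y_demo are illustrative names, not part of the project) and compares the share of each class in the two splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, deliberately imbalanced labels (illustrative, not the real data).
class_sizes = [200, 160, 120, 100, 100, 80, 80, 60, 60, 40]
y_demo = np.repeat(np.arange(10), class_sizes)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder features

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

# Share of each digit in the two splits.
train_share = np.bincount(y_tr, minlength=10) / len(y_tr)
test_share = np.bincount(y_te, minlength=10) / len(y_te)
print(np.round(train_share, 3))
print(np.round(test_share, 3))
```

Both printed rows should be practically identical, which is what the stratified split is meant to guarantee.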


Normalization


Normalization is a pre-processing step that rescales all pixel values to the range zero (0) to one (1), which usually helps the model train better


x_train[0]
x_train /= 255
x_test /= 255
x_train[0]
  • x_train[0] - shows the raw values of the first training sample

  • Before normalization, every pixel value lies between 0 and 255

  • x_train /= 255, x_test /= 255 - divides every pixel by 255 to normalize the values in the dataset

  • Run this cell only once; running it again divides the data by 255 a second time and corrupts it

  • x_train[0] again - shows the same sample after normalization

  • Now every value lies between 0 and 1
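
One way to sidestep the "run it only once" problem is to compute the scaled array from an untouched copy of the raw data. This is a minimal sketch with made-up pixel values; raw and scaled are illustrative names, and the approach is an alternative to the in-place division used above, not the method of this tutorial.

```python
import numpy as np

# Synthetic pixel data in the 0-255 range (a stand-in for x_train).
raw = np.array([[0, 64, 128, 255]], dtype='float32')

# Dividing into a new array keeps the raw values intact, so re-running
# this line always gives the same result, unlike an in-place `/=`.
scaled = raw / 255.0

print(scaled.min(), scaled.max())   # 0.0 1.0

# Cheap guard against accidental double scaling.
assert scaled.max() <= 1.0 and raw.max() > 1.0
```

The trade-off is memory: both the raw and the scaled array are kept, which is fine for a dataset of this size.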



Model Creation


input_shape = (28,28,1)
output_class = 10
  • input_shape - height, width and number of channels of each input image

  • output_class - number of output classes (the digits 0 to 9)

Let us create the model for training

from keras.models import Sequential
from keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D

# define the model
model = Sequential()
model.add(Conv2D(28, kernel_size=(3,3), input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(128, activation=tf.nn.relu))
model.add(Dropout(0.3))
model.add(Dense(output_class, activation=tf.nn.softmax))

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
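  • A basic CNN model is used for this project

  • Dense - fully connected layer; every input is connected to every unit

  • Conv2D - 2D convolution layer that slides learned filters over the image (28 filters here)

  • Dropout - regularization layer that randomly sets a fraction of its inputs to zero during training (0.3 here) to reduce overfitting

  • Flatten - converts the multi-dimensional feature maps into a single one-dimensional vector

  • MaxPooling2D - downsamples the feature maps by passing only the maximum value of each 2 x 2 window to the next layer

  • kernel_size=(3,3) - height and width of the convolution filter

  • model.compile() - configures the model for training with an optimizer, a loss function and metrics

  • optimizer='adam' - adaptive optimizer that adjusts the step size for each weight during training

  • loss='sparse_categorical_crossentropy' - loss function for multi-class outputs whose labels are integers (0 to 9) rather than one-hot vectors

  • The activation function shapes a layer's output: softmax is used on the last layer for multi-class classification, while sigmoid is the usual choice for binary classification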
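
The layer list above can be followed by hand to see how the tensor shape and the number of trainable parameters evolve. The sketch below assumes the Keras defaults that the model code does not override (no padding and stride 1 for Conv2D, stride equal to the pool size for MaxPooling2D); the helper conv2d_output is a hypothetical name introduced for illustration.

```python
def conv2d_output(size, kernel, stride=1):
    """Spatial size after a convolution with no padding ('valid')."""
    return (size - kernel) // stride + 1

# Conv2D(28, kernel_size=(3,3)) on a 28 x 28 x 1 input
conv_side = conv2d_output(28, 3)            # 26
conv_params = (3 * 3 * 1 + 1) * 28          # weights + one bias per filter

# MaxPooling2D(pool_size=(2,2)) halves each spatial side
pool_side = conv_side // 2                  # 13

# Flatten turns 13 x 13 x 28 into one vector
flat_units = pool_side * pool_side * 28     # 4732

dense_params = flat_units * 128 + 128       # Dense(128)
output_params = 128 * 10 + 10               # Dense(10)

total_params = conv_params + dense_params + output_params
print(conv_side, pool_side, flat_units)     # 26 13 4732
print(total_params)                         # 607394
```

Almost all of the parameters sit in the first Dense layer, which is typical for a small network that flattens directly after a single convolution block. Calling model.summary() on the real model is the way to confirm these numbers.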



Training the model for the entire dataset

# train the model
model.fit(x=x_train, y=y_train, batch_size=32, epochs=30, validation_data=(x_test, y_test))
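  • model.fit() prints the loss and accuracy for every epoch while training

  • batch_size=32 - number of images processed per weight update

  • epochs=30 - number of full passes over the training data

  • Highest validation accuracy is 98.38% (epochs 9 and 24)

  • Highest training accuracy is 99.68% (epoch 27); the final epoch ends at 99.59%

  • Training for too long overfits the model, so choose the number of epochs with care: here the validation loss is lowest at epoch 9 (0.0658) and drifts upward afterwards while the training loss generally keeps falling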
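
The 1149 steps shown per epoch in the training log follow directly from the split and the batch size. A short worked calculation, assuming the 49,000 samples and test_size=0.25 used earlier:

```python
import math

total_samples = 49000      # size of the full dataset
test_size = 0.25           # fraction held out by train_test_split
batch_size = 32

test_samples = math.ceil(total_samples * test_size)   # 12250
train_samples = total_samples - test_samples          # 36750

# The last, partially filled batch still counts as one step.
steps_per_epoch = math.ceil(train_samples / batch_size)
print(train_samples, steps_per_epoch)   # 36750 1149
```

A larger batch size would reduce the number of steps per epoch, at the cost of fewer weight updates per pass over the data.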


Epoch 1/30 1149/1149 [==============================] - 10s 3ms/step - loss: 0.4816 - accuracy: 0.8475 - val_loss: 0.1202 - val_accuracy: 0.9637
Epoch 2/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.1336 - accuracy: 0.9605 - val_loss: 0.0848 - val_accuracy: 0.9743
Epoch 3/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0863 - accuracy: 0.9732 - val_loss: 0.0807 - val_accuracy: 0.9742
Epoch 4/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0685 - accuracy: 0.9783 - val_loss: 0.0734 - val_accuracy: 0.9788
Epoch 5/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0543 - accuracy: 0.9825 - val_loss: 0.0690 - val_accuracy: 0.9809
Epoch 6/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0461 - accuracy: 0.9844 - val_loss: 0.0684 - val_accuracy: 0.9808
Epoch 7/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0360 - accuracy: 0.9873 - val_loss: 0.0743 - val_accuracy: 0.9798
Epoch 8/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0318 - accuracy: 0.9884 - val_loss: 0.0733 - val_accuracy: 0.9811
Epoch 9/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0319 - accuracy: 0.9891 - val_loss: 0.0658 - val_accuracy: 0.9838
Epoch 10/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0242 - accuracy: 0.9919 - val_loss: 0.0728 - val_accuracy: 0.9827
Epoch 11/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0218 - accuracy: 0.9926 - val_loss: 0.0815 - val_accuracy: 0.9818
Epoch 12/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0286 - accuracy: 0.9895 - val_loss: 0.0766 - val_accuracy: 0.9829
Epoch 13/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0199 - accuracy: 0.9928 - val_loss: 0.0762 - val_accuracy: 0.9820
Epoch 14/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0239 - accuracy: 0.9918 - val_loss: 0.0754 - val_accuracy: 0.9836
Epoch 15/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0160 - accuracy: 0.9938 - val_loss: 0.0865 - val_accuracy: 0.9820
Epoch 16/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0196 - accuracy: 0.9935 - val_loss: 0.0842 - val_accuracy: 0.9822
Epoch 17/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0152 - accuracy: 0.9951 - val_loss: 0.0825 - val_accuracy: 0.9828
Epoch 18/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0155 - accuracy: 0.9943 - val_loss: 0.0889 - val_accuracy: 0.9817
Epoch 19/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0207 - accuracy: 0.9930 - val_loss: 0.0886 - val_accuracy: 0.9822
Epoch 20/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0122 - accuracy: 0.9955 - val_loss: 0.0958 - val_accuracy: 0.9822
Epoch 21/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0135 - accuracy: 0.9957 - val_loss: 0.0986 - val_accuracy: 0.9824
Epoch 22/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0166 - accuracy: 0.9949 - val_loss: 0.0987 - val_accuracy: 0.9824
Epoch 23/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0153 - accuracy: 0.9949 - val_loss: 0.0917 - val_accuracy: 0.9832
Epoch 24/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0147 - accuracy: 0.9950 - val_loss: 0.0967 - val_accuracy: 0.9838
Epoch 25/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0112 - accuracy: 0.9957 - val_loss: 0.1057 - val_accuracy: 0.9816
Epoch 26/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0134 - accuracy: 0.9959 - val_loss: 0.1024 - val_accuracy: 0.9830
Epoch 27/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0085 - accuracy: 0.9968 - val_loss: 0.1256 - val_accuracy: 0.9795
Epoch 28/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0127 - accuracy: 0.9958 - val_loss: 0.1099 - val_accuracy: 0.9832
Epoch 29/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0136 - accuracy: 0.9952 - val_loss: 0.1043 - val_accuracy: 0.9824
Epoch 30/30 1149/1149 [==============================] - 4s 3ms/step - loss: 0.0132 - accuracy: 0.9959 - val_loss: 0.1162 - val_accuracy: 0.9827
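
The val_loss column of this log can be used to estimate where training could have stopped. The sketch below replays the logged values through a simple patience rule; simulate_early_stopping is a hypothetical helper written for this illustration and the patience of 3 is an assumption, not a setting used in the tutorial.

```python
# val_loss per epoch, copied from the training log above
val_loss = [
    0.1202, 0.0848, 0.0807, 0.0734, 0.0690, 0.0684, 0.0743, 0.0733,
    0.0658, 0.0728, 0.0815, 0.0766, 0.0762, 0.0754, 0.0865, 0.0842,
    0.0825, 0.0889, 0.0886, 0.0958, 0.0986, 0.0987, 0.0917, 0.0967,
    0.1057, 0.1024, 0.1256, 0.1099, 0.1043, 0.1162,
]

def simulate_early_stopping(losses, patience):
    """Return (best_epoch, stop_epoch), both 1-based."""
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            # New best value: remember it and reset the wait counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(losses)

best_epoch, stop_epoch = simulate_early_stopping(val_loss, patience=3)
print(best_epoch, stop_epoch)   # 9 12
```

In a real training run, the Keras EarlyStopping callback (for example with monitor='val_loss', patience=3 and restore_best_weights=True) applies the same idea while the model trains.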



Testing the model


After training the model, we check its predictions on a few test samples

image_index = 10
# print("Original output:", y_test.iloc[image_index])  # use .iloc: y_test keeps the shuffled DataFrame index
plt.imshow(x_test[image_index].reshape(28,28), cmap='Greys')
pred = model.predict(x_test[image_index].reshape(1,28,28,1))
print("Predicted output:", pred.argmax())

Predicted output: 1


image_index = 100
# print("Original output:", y_test.iloc[image_index])  # use .iloc: y_test keeps the shuffled DataFrame index
plt.imshow(x_test[image_index].reshape(28,28), cmap='Greys')
pred = model.predict(x_test[image_index].reshape(1,28,28,1))
print("Predicted output:", pred.argmax())

Predicted output: 8

  • x_test[image_index].reshape(1,28,28,1) - reshapes a single sample into a batch of one 28 x 28 grayscale image, because the model expects a batch dimension

  • To predict on the whole test set, pass x_test directly; it already has the shape (samples, 28, 28, 1)

  • pred.argmax() - returns the index of the highest probability in the array, which is the predicted digit

  • The displayed image should correspond to the predicted output
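
To make the role of argmax concrete, the sketch below applies a hand-written softmax to made-up scores for two images. The logits values and the softmax helper are illustrative assumptions; in the project the probabilities come from model.predict() instead.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up scores for two images and ten digit classes (not model output).
logits = np.array([
    [0.1, 3.2, 0.0, 0.3, 0.1, 0.0, 0.2, 0.4, 1.1, 0.0],
    [0.0, 0.2, 0.1, 0.5, 0.0, 0.3, 0.1, 0.2, 2.9, 0.4],
])
probs = softmax(logits)

print(probs.sum(axis=1))        # each row sums to 1
print(probs[0].argmax())        # 1  -> single sample, as in pred.argmax()
print(probs.argmax(axis=1))     # [1 8] -> one prediction per row
```

For a batch of predictions, argmax needs axis=1; without it, numpy returns a single index into the flattened array rather than one digit per image.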



Final Thoughts


  • Be careful when training the model; overtraining leads to overfitting and less accurate results on new data.

  • The same kind of model can be reused for other image classification tasks, such as face recognition and flower recognition, by changing the data set and parameters.

  • This is a basic deep learning model with a small neural network; adding new layers will change the results.

In this project tutorial, we have explored the Handwritten Digits recognition dataset as a classification deep learning project. Different handwritten digits were inspected with exploratory data analysis and recognized by the trained model.


Get the project notebook from here


Thanks for reading the article!!!


Check out more project videos from the YouTube channel Hackers Realm
