Transfer learning lets a model reuse knowledge gained on one task for another, often quite different, task. One of its most common forms is the use of pretrained models: networks that have already been trained on massive, diverse datasets. These models capture patterns, features, and representations from their original training task and serve as a foundation for new ones. When adapted to a new task, they can shorten training considerably, improve performance, and make learning feasible in domains with limited labeled data. This tutorial implements the approach in Python, using a ResNet50 pretrained on ImageNet to classify MNIST digits.
Import Modules
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical
import numpy as np
import warnings
warnings.filterwarnings('ignore')
tensorflow.keras.datasets - provides a collection of commonly used datasets for machine learning and deep learning tasks.
matplotlib.pyplot - provides a simple and convenient interface for creating and customizing plots, making it a go-to tool for data visualization tasks.
tensorflow.keras.preprocessing.image - provides a collection of utilities for image data preprocessing and augmentation.
tensorflow.keras.utils - provides utility functions to assist with various tasks related to building, training, and evaluating deep learning models using the Keras API.
numpy - provides support for large, multi-dimensional arrays and matrices, along with a wide range of mathematical functions to operate on these arrays.
warnings - provides a way to control and manage warnings that are generated during program execution.
Load the dataset
Let us load the MNIST Dataset.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
Load the MNIST dataset using Keras 'mnist.load_data()' function.
The MNIST dataset is a commonly used dataset in the field of machine learning and computer vision, containing a large collection of handwritten digits.
X_train: This variable will hold the training images. It's a 3D NumPy array with the shape (num_samples, height, width) where num_samples is the number of training examples, and height and width are the dimensions of each image (28x28 pixels in the case of MNIST).
y_train: This variable will hold the corresponding labels for the training images. It's a 1D NumPy array with the shape (num_samples,) containing integers representing the digit labels (0 to 9).
X_test: Similar to X_train, this variable holds the testing images.
y_test: Similar to y_train, this variable holds the corresponding labels for the testing images.
These arrays can be used to train and test machine learning models for tasks like digit recognition and image classification.
Next check the shapes of these arrays.
X_train.shape, X_test.shape
((60000, 28, 28), (10000, 28, 28))
The shape attribute of a NumPy array returns a tuple representing the dimensions of the array. In this case, the shapes of X_train and X_test will indicate the number of samples, height, and width of each image.
For the MNIST dataset, each image is a 28x28 grayscale image.
This indicates that there are 60,000 training images, each of size 28x28 pixels, and 10,000 testing images of the same size.
Preprocess the Data
Next reshape and convert the data types of the X_train and X_test arrays, which are the training and testing images from the MNIST dataset.
# reshape the data
X_train = X_train.reshape((X_train.shape[0], 28, 28))
X_test = X_test.reshape((X_test.shape[0], 28, 28))
# change the type to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# convert data to 3 channels
X_train = np.stack((X_train,)*3, axis=-1)
X_test = np.stack((X_test,)*3, axis=-1)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
The MNIST dataset contains grayscale images with dimensions of 28x28 pixels each. However, many deep learning models, especially CNNs, expect input data in a different shape, typically including a channel dimension (e.g., 28x28x1 for grayscale or 28x28x3 for RGB).
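The code first reshapes X_train and X_test to (number_of_samples, 28, 28). Since mnist.load_data() already returns arrays of this shape, this step leaves the data unchanged; the channel dimension is added by the np.stack call further down.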
The pixel values of the images are typically represented as integers ranging from 0 to 255. Converting these values to a floating-point format is often necessary for numerical computations and model training.
The code converts X_train and X_test to the float32 data type.
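To see why the cast matters, here is a minimal sketch on a tiny made-up array (hypothetical values, not MNIST data). It shows that arithmetic on uint8 pixels wraps around at 256, while the same values cast to float32 behave as expected.

```python
import numpy as np

# Hypothetical 8-bit pixel values standing in for MNIST pixels
pixels = np.array([200, 250], dtype=np.uint8)
offset = np.array([100, 10], dtype=np.uint8)

# uint8 arithmetic wraps around modulo 256
wrapped = pixels + offset

# casting to float32 first avoids the wrap-around
as_float = pixels.astype('float32') + offset.astype('float32')

print(wrapped.dtype, wrapped.tolist())    # uint8 [44, 4]
print(as_float.dtype, as_float.tolist())  # float32 [300.0, 260.0]
```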
The code converts the grayscale images to RGB format by replicating the grayscale channel three times to mimic the three color channels (red, green, and blue) that RGB images have.
The code uses np.stack to create a new array where each grayscale channel is replicated three times along the last axis, resulting in images with shape (number_of_samples, 28, 28, 3).
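The channel replication can be checked on a small stand-in array. The sketch below assumes two hypothetical 4x4 "images" instead of the real 28x28 digits, and verifies that each of the three channels is an exact copy of the grayscale input.

```python
import numpy as np

# Small stand-in for a batch of grayscale images: 2 images of 4x4 pixels
gray = np.arange(2 * 4 * 4, dtype='float32').reshape(2, 4, 4)

# Replicate the single channel three times along a new last axis
rgb = np.stack((gray,) * 3, axis=-1)

print(gray.shape, '->', rgb.shape)  # (2, 4, 4) -> (2, 4, 4, 3)

# Every channel holds the same values as the original grayscale image
same = all(np.array_equal(rgb[..., c], gray) for c in range(3))
print(same)  # True
```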
The original y_train and y_test labels are represented as single integers indicating the digit in each image (0 to 9).
To use these labels for multi-class classification with a neural network, it's common to one-hot encode them. This means converting the single integer labels into binary vectors, where each vector has a 1 in the position corresponding to the correct class and 0 elsewhere.
The code uses the to_categorical() function to perform this one-hot encoding on y_train and y_test.
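For intuition, one-hot encoding can be reproduced in plain NumPy. This is only an illustrative sketch with hypothetical labels and an assumed class count of 10; the tutorial itself relies on to_categorical().

```python
import numpy as np

labels = np.array([3, 0, 9])   # hypothetical digit labels
num_classes = 10               # assumed: digits 0 to 9

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype='float32')[labels]

print(one_hot.shape)           # (3, 10)
print(one_hot[0].tolist())     # 1.0 at index 3, 0.0 elsewhere

# argmax recovers the original integer labels
recovered = one_hot.argmax(axis=1)
print(recovered.tolist())      # [3, 0, 9]
```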
Next check the shapes of these arrays after preprocessing.
X_train.shape, X_test.shape
((60000, 28, 28, 3), (10000, 28, 28, 3))
After these preprocessing steps, the shapes of X_train and X_test change: stacking the grayscale channel three times adds a fourth dimension for the color channels.
X_train.shape: (number_of_train_samples, 28, 28, 3).
X_test.shape: (number_of_test_samples, 28, 28, 3).
Here the value of number_of_train_samples and number_of_test_samples is 60000 and 10000 respectively.
Each image in both X_train and X_test has a shape of (28, 28, 3), where:
The first dimension (28) represents the height of the image.
The second dimension (28) represents the width of the image.
The third dimension (3) represents the number of color channels. In this case, it's 3, indicating that the images are now in RGB format.
These shapes are suitable for feeding into a convolutional neural network (CNN) model, which typically expects input in the form of (batch_size, height, width, channels).
Data augmentation using an ImageDataGenerator
Data augmentation is a common technique used to artificially increase the size of your training dataset and improve the generalization capabilities of your model.
# data augmentation with generator
train_generator = ImageDataGenerator(
    rescale = 1./255, # normalization of images
    rotation_range = 40, # augmentation of images to avoid overfitting
    shear_range = 0.2,
    zoom_range = 0.2,
    fill_mode = 'nearest'
)
val_generator = ImageDataGenerator(rescale = 1./255)
train_iterator = train_generator.flow(X_train, y_train, batch_size=512, shuffle=True)
val_iterator = val_generator.flow(X_test, y_test, batch_size=512, shuffle=False)
train_generator: This creates an instance of the ImageDataGenerator class, which will generate augmented training data batches. It's initialized with various data augmentation settings:
rescale = 1./255: This scales down the pixel values of the images to a range between 0 and 1, which is a common normalization step for image data.
rotation_range = 40: Images will be randomly rotated by angles between -40 and 40 degrees during augmentation.
shear_range = 0.2: Random shear transformations will be applied with a maximum shear intensity of 0.2.
zoom_range = 0.2: Images are randomly zoomed by a factor drawn from the range [0.8, 1.2], that is, from 1 - 0.2 to 1 + 0.2.
fill_mode = 'nearest': This determines how new pixels created by transformations will be filled. 'nearest' means new pixels take the value of the nearest existing pixel.
val_generator: This creates another instance of ImageDataGenerator, intended for generating validation data batches. It only includes normalization by rescaling pixel values to the range of 0 to 1, without the augmentation settings.
train_iterator: This is an iterator created from the training generator. It generates batches of augmented training data by applying the transformations specified in train_generator to the input images X_train and corresponding labels y_train. Key points:
X_train and y_train are the training images and labels.
batch_size=512: This specifies how many samples are in each batch.
shuffle=True: The order of the training samples is reshuffled for each epoch, so the model does not see the same batches in the same order every time.
val_iterator: Similar to the train_iterator, this iterator generates batches of validation data using the validation generator. Key points:
X_test and y_test are the validation images and labels. Here the MNIST test split doubles as the validation set.
batch_size=512: The number of samples in each validation batch.
shuffle=False: Since shuffling isn't necessary during validation, this is set to False to maintain the order of the validation data.
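The batch size and rescale settings above imply some simple arithmetic, sketched below. The helper batches_per_epoch is a hypothetical name introduced for illustration, and the sketch assumes the iterator yields a final, smaller batch rather than dropping the leftover samples.

```python
import math
import numpy as np

def batches_per_epoch(num_samples, batch_size):
    """Batches needed to cover every sample once (the last one may be smaller)."""
    return math.ceil(num_samples / batch_size)

train_batches = batches_per_epoch(60000, 512)
val_batches = batches_per_epoch(10000, 512)
# Samples left over for the final training batch
last_train_batch = 60000 - (train_batches - 1) * 512

print(train_batches, val_batches, last_train_batch)  # 118 20 96

# rescale = 1./255 maps the 0..255 pixel range onto 0..1
pixels = np.array([0.0, 127.5, 255.0], dtype='float32')
scaled = pixels * (1. / 255)
print(scaled.min(), scaled.max())
```

With 60,000 training images and a batch size of 512, one epoch therefore consists of 118 weight updates under this assumption.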
Create the Model
Let us import the modules needed to build the model with TensorFlow and Keras.
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
ResNet50: This is a pre-trained convolutional neural network (CNN) architecture included in the tensorflow.keras.applications module. It's a deep neural network that has shown excellent performance on a variety of image classification tasks.
Sequential: This is the type of Keras model you'll use. It represents a linear stack of layers.
Dense: This is the standard fully connected layer in Keras.
After importing these modules, you can proceed to create your own model by adding layers to it.
model = Sequential()
# add the pretrained model
model.add(ResNet50(include_top=False, pooling='avg', weights='imagenet'))
# add fully connected layer with output
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))
# set resnet layers not trainable
model.layers[0].trainable=False
model.summary()
You're creating a new Sequential model instance.
Next add the ResNet50 architecture as a base to your model. The include_top parameter is set to False to exclude the fully connected layers at the top of the network (this is because you're adding your own fully connected layers later). The pooling parameter is set to 'avg', which means global average pooling will be applied to the output of the ResNet50 base before passing it to the subsequent layers. The weights parameter is set to 'imagenet', which initializes the model with pre-trained weights from the ImageNet dataset.
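Global average pooling itself is a simple operation: it averages each feature map over its spatial positions, leaving one number per channel. The NumPy sketch below uses small random stand-in feature maps (hypothetical shapes, not the real ResNet50 output) to illustrate it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature maps: batch of 4, 2x2 spatial grid, 8 channels
features = rng.random((4, 2, 2, 8)).astype('float32')

# Global average pooling: mean over the height and width axes
pooled = features.mean(axis=(1, 2))

print(features.shape, '->', pooled.shape)  # (4, 2, 2, 8) -> (4, 8)
```

Each image is reduced to a single feature vector, which is what the Dense layers added afterwards receive as input.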
Next add two fully connected Dense layers to your model. The first layer has 512 units and uses the ReLU activation function. This can act as a feature extractor that processes the output from the ResNet50 base. The second layer has 10 units and uses the softmax activation function, suitable for multiclass classification with 10 classes.
Next set the layers of the ResNet50 base to be non-trainable. This means that only the weights of the added fully connected layers will be updated during training, while the pre-trained ResNet50 weights will remain fixed.
Finally print the summary of the model, which provides an overview of the layers, their output shapes, and the number of trainable parameters.
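As a rough cross-check of the trainable parameter count in the summary, the size of the two Dense layers can be computed by hand. The sketch assumes the frozen ResNet50 base contributes no trainable parameters and that its pooled output is a 2048-element vector; dense_params is a hypothetical helper name.

```python
def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units

resnet_features = 2048  # length of ResNet50's pooled output vector

hidden = dense_params(resnet_features, 512)  # Dense(512, activation='relu')
output = dense_params(512, 10)               # Dense(10, activation='softmax')
trainable = hidden + output

print(hidden, output, trainable)  # 1049088 5130 1054218
```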
By applying global average pooling to the output of the ResNet50 base (via pooling='avg') and freezing its layers, you're using ResNet50 as a fixed feature extractor and adding your custom classification layers on top.
Now, you can compile and train this model using the data generators and iterators you've defined earlier.
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
optimizer='Adam': You've chosen the Adam optimizer for updating the model's weights during training. Adam is an adaptive learning rate optimization algorithm that combines the benefits of both the AdaGrad and RMSProp optimizers.
loss='categorical_crossentropy': This is the loss function used for categorical classification problems. It measures the dissimilarity between the true class probabilities and the predicted class probabilities. Since you're dealing with multiclass classification (10 classes), categorical_crossentropy is an appropriate choice.
metrics=['accuracy']: During training, the model's accuracy will be calculated and displayed. This metric indicates how well the model's predictions match the actual labels.
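To make the loss and metric concrete, here is a minimal NumPy sketch of softmax, categorical cross-entropy, and accuracy on hypothetical logits for two samples and three classes. It illustrates the formulas only and is not how Keras implements them internally.

```python
import numpy as np

def softmax(logits):
    # subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # mean over samples of -sum(true * log(predicted))
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-(y_true * np.log(y_pred)).sum(axis=1).mean())

def accuracy(y_true, y_pred):
    # fraction of samples whose most probable class matches the label
    return float((y_true.argmax(axis=1) == y_pred.argmax(axis=1)).mean())

# Hypothetical logits and one-hot labels
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.2, 3.0]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
acc = accuracy(y_true, probs)

print(probs.sum(axis=1))  # each row sums to 1
print(acc)                # 0.5: the first sample is right, the second is wrong
```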
Train the Model
Train your model using the fit() function and the data iterators you've set up earlier.
model.fit(train_iterator, epochs=10, validation_data=val_iterator)
train_iterator: This is the data iterator you've defined for your training data using the ImageDataGenerator. It generates batches of augmented training data along with their corresponding labels.
epochs=10: This parameter specifies the number of times the entire training dataset will be used to train the model. In this case, you're training for 10 epochs, meaning the model will see the complete training data 10 times during training.
validation_data=val_iterator: This parameter specifies the validation data iterator. It generates batches of validation data and labels that are used to evaluate the model's performance during training after each epoch.
In each epoch, the train_iterator generates batches of augmented training data.
The model is trained on each batch, and the weights are updated based on the chosen optimizer and loss function.
This process continues for all batches until all training data is processed for that epoch.
After each epoch, the val_iterator generates batches of validation data.
The model's performance on the validation data is evaluated using the same loss function and metrics.
The validation metrics, such as loss and accuracy, are recorded for analysis.
The training process repeats for the specified number of epochs (in this case, 10).
As the training progresses, you'll see the output indicating the loss and accuracy on both the training and validation datasets for each epoch.
You can use these metrics to monitor the model's performance and potentially detect issues like overfitting or underfitting.
Once all epochs are completed, the weights of the trainable layers reflect what was learned from the training data, and the final model can be used to make predictions on new data.
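The epoch and batch structure described above can be sketched without any deep learning library. The loop below is a simplified illustration using a hypothetical iterate_minibatches helper and toy sizes (10 samples, batch size 4). It only tracks which samples are visited, where a real training loop would compute the loss and update the weights.

```python
import numpy as np

def iterate_minibatches(num_samples, batch_size, shuffle, rng):
    """Yield index arrays that together cover every sample exactly once."""
    indices = np.arange(num_samples)
    if shuffle:
        rng.shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

rng = np.random.default_rng(0)
num_samples, batch_size, epochs = 10, 4, 2

seen_per_epoch = []
for epoch in range(epochs):
    seen = []
    batch_sizes = []
    for batch in iterate_minibatches(num_samples, batch_size, True, rng):
        # a real loop would compute the loss on this batch and update weights here
        seen.extend(batch.tolist())
        batch_sizes.append(len(batch))
    seen_per_epoch.append(seen)
    print(f"epoch {epoch + 1}: batch sizes {batch_sizes}")
```

Every epoch visits all samples once, in a different order, which mirrors what shuffle=True does for the training iterator.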
Final Thoughts
Transfer learning can significantly reduce training time, as the base model has already learned useful features from a large dataset.
If you have a limited amount of data for your specific task, transfer learning can help by leveraging knowledge from the original dataset.
Pretrained models have learned rich feature representations that often generalize well to new tasks.
Using a pretrained model's features can help reduce overfitting, especially when you're working with limited data.
Choose a model architecture that's relevant to your task. For example, use a CNN for image classification tasks.
Where possible, select a model that has been pretrained on a dataset similar to yours, since the features it has learned are then more likely to be relevant.
Modify the top layers of the pretrained model to match your specific task. Add new layers for classification or regression.
Fine-tune the pretrained model by updating some of its layers while keeping others frozen. This helps to specialize the model for your task.
Transfer learning using pretrained models is a practical approach that has been successfully applied in various domains, including computer vision, natural language processing, and more. It can save you time and resources while enabling you to build effective models even with limited data. However, understanding the principles and nuances of transfer learning is crucial for making informed decisions and achieving the best results for your specific task.
Thanks for reading the article!!!
Check out more project videos from the YouTube channel Hackers Realm