Learning Rate Scheduler | Keras Tensorflow | Python

Hackers Realm
Aug 21, 2023

A learning rate scheduler is a technique used when training machine learning models, particularly neural networks, to adjust the learning rate dynamically during training. In this tutorial we implement one in Python with Keras and TensorFlow. The learning rate is a hyperparameter that determines the step size at which the model updates its weights in response to the gradient of the loss function. Properly tuning the learning rate is essential for achieving fast convergence and stable training.

Learning rate schedulers help in finding an appropriate learning rate by either gradually decreasing it, adapting it based on certain conditions, or using more complex strategies.

You can watch the video-based tutorial with step by step explanation down below.

Custom Learning Rate Scheduler Implementation

First let us define a custom learning rate scheduler using TensorFlow's Keras API.

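from tensorflow.keras.callbacks import LearningRateScheduler
import tensorflow as tf

def scheduler(epoch, lr):
    # Keras passes a 0-based epoch index, so epochs 0-3 keep the rate unchanged.
    if epoch <= 3:
        return lr
    else:
        # From epoch 4 onward, multiply the rate by exp(-0.1) (about 0.905).
        # tf.math.exp returns a tensor, so the result is a tensor, not a float.
        return lr * tf.math.exp(-0.1)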

LearningRateScheduler is a callback class from TensorFlow's Keras API that lets you customize the learning rate schedule during training.
import tensorflow as tf imports TensorFlow, the deep learning framework used here.
We define a function named scheduler that takes two arguments: epoch, the index of the current training epoch (counted from 0), and lr, the current learning rate.
The function checks whether epoch is less than or equal to 3. If so (that is, during epochs 0 through 3, the first four epochs of training), it returns the current learning rate (lr) unchanged, so the learning rate stays the same during these initial epochs.
Otherwise (epoch greater than 3), the else block runs. It computes a new learning rate by multiplying the current one by tf.math.exp(-0.1), which is e raised to the power -0.1, or roughly 0.905. Each later epoch therefore lowers the learning rate by about 9.5%, giving an exponential decay from epoch 4 onward.
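The same schedule can also be written with Python's standard math module so that it returns a plain float instead of a tensor, which matches how the Keras documentation describes the schedule function's return value. This is only a minimal sketch: the name scheduler_float and the constants HOLD_UNTIL_EPOCH and DECAY_RATE are illustrative and do not appear in the original code.

```python
import math

# Illustrative constants (assumptions, chosen to mirror the scheduler above).
HOLD_UNTIL_EPOCH = 3   # last 0-based epoch index that keeps the rate unchanged
DECAY_RATE = 0.1       # exponent used for the per-epoch decay factor


def scheduler_float(epoch, lr):
    """Same logic as scheduler(), but returns a plain Python float."""
    if epoch <= HOLD_UNTIL_EPOCH:
        return float(lr)
    # exp(-0.1) is about 0.9048, so the rate shrinks by roughly 9.5% per epoch.
    return float(lr) * math.exp(-DECAY_RATE)


print(scheduler_float(0, 0.01))  # unchanged during the hold phase
print(scheduler_float(4, 0.01))  # multiplied by exp(-0.1)
```

Returning a float keeps printed values easy to read and avoids mixing tensors with Python numbers in later arithmetic.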

Next let us see how the learning rate would change over 10 epochs using the scheduler function you defined earlier.

lr = 0.01
for i in range(10):
    lr = scheduler(i, lr)
    print(i, lr)

0 0.01
1 0.01
2 0.01
3 0.01
4 tf.Tensor(0.009048373, shape=(), dtype=float32)
5 tf.Tensor(0.008187306, shape=(), dtype=float32)
6 tf.Tensor(0.0074081807, shape=(), dtype=float32)
7 tf.Tensor(0.006703199, shape=(), dtype=float32)
8 tf.Tensor(0.006065305, shape=(), dtype=float32)
9 tf.Tensor(0.0054881144, shape=(), dtype=float32)

First we initialize the learning rate lr with a value of 0.01. This is the starting learning rate before any adjustments.
The for loop iterates over a range of 10 epochs (0 to 9). It simulates the learning rate adjustment for each epoch.
Within each iteration of the loop, the scheduler function is called with the current epoch i and the current learning rate lr. The function returns a new learning rate based on the conditions defined in the function, and that value is assigned back to lr.
After calculating the new learning rate, the loop prints the current epoch i and the corresponding learning rate lr.
The learning rate remains unchanged during the first four epochs (0 to 3) and then decreases exponentially, by a factor of about 0.905 per epoch. From epoch 4 onward the value is printed as a tf.Tensor because tf.math.exp returns a tensor rather than a Python float.
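Because the update multiplies by the same factor every epoch after the hold phase, the rate at any epoch can also be written in closed form: lr(epoch) = initial_lr * exp(-0.1 * max(0, epoch - 3)). The sketch below checks that formula against the epoch-by-epoch update using only the standard library. The function name lr_at_epoch and its parameter names are assumptions made for this illustration.

```python
import math


def lr_at_epoch(epoch, initial_lr=0.01, hold_until=3, decay=0.1):
    """Closed-form learning rate for a 0-based epoch index."""
    decay_steps = max(0, epoch - hold_until)
    return initial_lr * math.exp(-decay * decay_steps)


# Cross-check the closed form against the epoch-by-epoch update.
lr = 0.01
for epoch in range(10):
    if epoch > 3:
        lr *= math.exp(-0.1)
    assert math.isclose(lr, lr_at_epoch(epoch), rel_tol=1e-9)
    print(epoch, round(lr_at_epoch(epoch), 6))
```

For epoch 9 this gives 0.01 * exp(-0.6), which agrees with the last value printed by the loop above up to float32 rounding.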

Next let us create an instance of LearningRateScheduler.

callback = LearningRateScheduler(scheduler)

In this code snippet, you create an instance of the LearningRateScheduler callback by passing your custom scheduler function as an argument.
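The callback also accepts a verbose argument, which makes it log a message whenever it updates the learning rate. The sketch below assumes TensorFlow is installed and, as an assumption for readability, uses a float-returning variant of the schedule rather than the tensor-returning one defined earlier.

```python
import math

from tensorflow.keras.callbacks import LearningRateScheduler


def scheduler(epoch, lr):
    # Float-returning variant of the schedule (an assumption for this sketch).
    if epoch <= 3:
        return float(lr)
    return float(lr) * math.exp(-0.1)


# verbose=1 asks the callback to print a message when it updates the rate.
callback = LearningRateScheduler(scheduler, verbose=1)
```

With verbose=1 the training log should include a line per epoch reporting the rate being set, which is a simple way to confirm the schedule is actually applied.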

Next let us train the model.

# train the model (model, train_iterator and val_iterator must already be defined)
model.fit(train_iterator, epochs=10, validation_data=val_iterator, callbacks=[callback])

Epoch 1/10
118/118 [==============================] - 28s 212ms/step - loss: 1.8649 - accuracy: 0.3935 - val_loss: 0.8353 - val_accuracy: 0.7686
Epoch 2/10
118/118 [==============================] - 24s 198ms/step - loss: 1.0418 - accuracy: 0.6792 - val_loss: 0.6458 - val_accuracy: 0.8067
Epoch 3/10
118/118 [==============================] - 23s 196ms/step - loss: 0.8402 - accuracy: 0.7442 - val_loss: 0.5829 - val_accuracy: 0.8213
Epoch 4/10
118/118 [==============================] - 23s 197ms/step - loss: 0.7247 - accuracy: 0.7757 - val_loss: 0.5291 - val_accuracy: 0.8297
Epoch 5/10
118/118 [==============================] - 23s 194ms/step - loss: 0.6669 - accuracy: 0.7907 - val_loss: 0.4918 - val_accuracy: 0.8450
Epoch 6/10
118/118 [==============================] - 23s 197ms/step - loss: 0.6155 - accuracy: 0.8083 - val_loss: 0.4922 - val_accuracy: 0.8416
Epoch 7/10
118/118 [==============================] - 28s 232ms/step - loss: 0.5778 - accuracy: 0.8202 - val_loss: 0.4703 - val_accuracy: 0.8493
Epoch 8/10
118/118 [==============================] - 29s 239ms/step - loss: 0.5723 - accuracy: 0.8217 - val_loss: 0.4206 - val_accuracy: 0.8658
Epoch 9/10
118/118 [==============================] - 25s 210ms/step - loss: 0.5470 - accuracy: 0.8266 - val_loss: 0.4052 - val_accuracy: 0.8711
Epoch 10/10
118/118 [==============================] - 24s 199ms/step - loss: 0.5282 - accuracy: 0.8353 - val_loss: 0.4045 - val_accuracy: 0.8703

<tensorflow.python.keras.callbacks.History at 0x269702f0d60>

model.fit() is used to train your model.
train_iterator: This is typically a generator or an iterator that provides batches of training data for each epoch.
epochs=10: This argument specifies the number of epochs for which the model will be trained. In this case, the model will be trained for 10 epochs.
validation_data=val_iterator: This argument provides a validation data generator or iterator that supplies validation data for each epoch. This helps in monitoring the model's performance on unseen data during training.
callbacks: Here, you pass the LearningRateScheduler callback created earlier from the scheduler function to the callbacks parameter. Keras documents this parameter as a list of callbacks, so wrapping it as [callback] is the conventional form.
At the start of every epoch, the callback calls your scheduler function with the current epoch index and learning rate and applies the returned value to the optimizer. With this schedule, training runs at the initial rate for epochs 0 to 3 and at a gradually smaller rate afterwards, which can help convergence and overall training stability.
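To see what the callback does at the start of each epoch without needing a model or dataset, here is a framework-free sketch that applies the same schedule to plain gradient descent on the toy loss w**2. Everything in it (the train function, the starting values, the step counts) is hypothetical and chosen only for illustration. It is not how Keras is implemented internally.

```python
import math


def scheduler(epoch, lr):
    # Same rule as in the article, using math.exp so values stay plain floats.
    if epoch <= 3:
        return lr
    return lr * math.exp(-0.1)


def train(epochs=10, steps_per_epoch=5, lr=0.1, w=5.0):
    """Toy gradient descent on loss(w) = w**2 with a per-epoch schedule."""
    history = []
    for epoch in range(epochs):
        # Like the callback: ask the schedule for this epoch's rate first.
        lr = scheduler(epoch, lr)
        for _ in range(steps_per_epoch):
            grad = 2.0 * w   # derivative of w**2
            w -= lr * grad   # plain gradient descent step
        history.append((epoch, lr, w * w))
    return history


for epoch, lr, loss in train():
    print(f"epoch {epoch}: lr={lr:.5f} loss={loss:.3e}")
```

The printed rate stays at its initial value for epochs 0 to 3 and then shrinks each epoch, mirroring what the Keras callback does to the optimizer during model.fit().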

Final Thoughts

Learning rate schedulers are a crucial tool in training machine learning models, especially deep neural networks. Properly adjusting the learning rate during training can significantly impact the convergence speed and final performance of your model.
Finding an appropriate learning rate schedule can help your model converge faster and achieve better performance in terms of accuracy and loss on both the training and validation data.
Learning rate schedulers can help your model get through flat regions of the loss landscape, also known as plateaus, where a fixed learning rate might slow convergence or leave training stuck.
There's no one-size-fits-all solution for learning rate scheduling. Different problems, datasets, and architectures might require different strategies. Experiment with various learning rate schedules to find the one that works best for your specific task.
There are various learning rate schedules to choose from, including step decay, exponential decay, cyclical schedules, and adaptive schedules like ReduceLROnPlateau. Choose a scheduler that aligns well with your problem's characteristics.
The schedule and its parameters (for example, the decay rate and the epoch at which decay starts) are hyperparameters themselves, and tuning them is just as important as tuning other hyperparameters. Grid search, random search, or more advanced techniques can be used to find the best schedule for your model.
Plotting the learning rate schedule or monitoring the learning rate's behavior during training can provide insights into how well your chosen schedule is working and if any adjustments are needed.
While learning rate schedules can automate the process of adjusting the learning rate, it's still important to monitor the model's performance and convergence. Make adjustments if you observe unexpected behaviors.
Many deep learning frameworks provide built-in functions for common learning rate schedulers, making it easier to implement them in your training pipeline.
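Two of the schedule families mentioned above, step decay and a cyclical schedule, can be sketched as plain functions with the same (epoch, lr) signature that LearningRateScheduler expects. The function names, default values, and the triangular shape of the cycle are assumptions made for this illustration, not recommendations from the article. Both functions ignore the incoming lr and compute the rate from the epoch index alone.

```python
def step_decay(epoch, lr, initial_lr=0.01, drop=0.5, epochs_per_drop=3):
    """Multiply the initial rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


def triangular_cyclical(epoch, lr, min_lr=0.001, max_lr=0.01, half_cycle=4):
    """Linearly ramp from min_lr up to max_lr and back down again."""
    position = epoch % (2 * half_cycle)
    distance_from_peak = abs(position - half_cycle) / half_cycle
    return max_lr - (max_lr - min_lr) * distance_from_peak


# Print both schedules side by side as a quick text "plot".
print("epoch  step_decay  cyclical")
for epoch in range(10):
    print(f"{epoch:5d}  {step_decay(epoch, None):10.5f}  "
          f"{triangular_cyclical(epoch, None):8.5f}")
```

Printing a schedule like this before training is a cheap way to check that it behaves as intended over the planned number of epochs.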

In summary, learning rate schedulers are powerful tools for fine-tuning the training process of your machine learning models. By adjusting the learning rate dynamically over the course of training, you can help your model converge more efficiently and achieve better results. Experiment, monitor, and adapt your learning rate schedules to suit the characteristics of your specific task and dataset.

Get the project notebook from here

Thanks for reading the article!!!

Check out more project videos from the YouTube channel Hackers Realm