  • Hackers Realm

Deep CNN Autoencoder for Image Compression & Denoising | Deep Learning | Python Tutorial

Updated: 3 days ago

An autoencoder is a type of unsupervised learning algorithm that aims to reconstruct its input data at the output layer. In doing so, it typically learns an efficient data representation (encoding) by training the network to ignore signal “noise”. Autoencoders can be used for image denoising, image compression, data compression, anomaly detection, feature extraction and, in some cases, even generation of image data.


A deep CNN autoencoder is a powerful approach for both image compression and denoising tasks. In this project tutorial we will explore how a deep CNN autoencoder can be used for image compression and denoising.



First, we will see how an autoencoder can be used for image compression.


Deep CNN Autoencoder - Image Compression


For image compression, the deep CNN autoencoder learns to encode the important features of an input image into a compressed representation in the latent space. The encoding process reduces the dimensionality of the input image while retaining the essential information.

Deep CNN Autoencoder for Image Compression


You can watch the video-based tutorial with a step-by-step explanation down below.


Flow of Autoencoder


Input Image -> Encoder -> Compressed Representation -> Decoder -> Reconstruct Input Image

  • The autoencoder takes an input data sample (here, an image) and feeds it into the encoder network

  • The encoder network consists of several layers, typically convolutional and pooling layers and, in some architectures, fully connected layers. These layers progressively reduce the spatial dimensions and extract meaningful features from the input data. The model in this tutorial uses only convolutional and pooling layers

  • The final layer of the encoder network produces a compressed representation of the input data

  • The compressed representation from the encoding stage is passed into the decoder network

  • The decoder network mirrors the encoder network, typically using convolutional layers together with upsampling layers or transposed convolutional layers. It takes the compressed representation and gradually increases the spatial dimensions to reconstruct the original input data

  • The final layer of the decoder network generates the reconstructed output, which aims to closely resemble the original input data
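
The flow above can be traced with plain arithmetic before any model is built. The sketch below only tracks the spatial size of a 28x28 image; it assumes two 2x2 pooling stages with 'same' padding followed by two 2x upsampling stages, as in the model defined later in this tutorial. The helper names are hypothetical and exist only for this illustration.

```python
import math

def pool_same(size, factor=2):
    # With 'same' padding the output size is rounded up
    return math.ceil(size / factor)

def upsample(size, factor=2):
    # Upsampling repeats rows and columns, multiplying the size
    return size * factor

def trace_spatial_size(size, n_stages=2):
    # Record the size after every pooling and upsampling stage
    sizes = [size]
    for _ in range(n_stages):
        size = pool_same(size)
        sizes.append(size)
    for _ in range(n_stages):
        size = upsample(size)
        sizes.append(size)
    return sizes

sizes = trace_spatial_size(28)
print(sizes)
```

The middle entry of the list is the spatial size of the compressed representation, and the last entry matches the input size, which is what lets the output be compared pixel by pixel with the input.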


Import Modules


import numpy as np
import matplotlib.pyplot as plt
from keras import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.datasets import mnist
  • numpy - used to perform a wide variety of mathematical operations on arrays

  • matplotlib - used for data visualization and graphical plotting

  • keras - used to provide a user-friendly and intuitive interface for designing, training, and evaluating deep learning models

  • keras.layers - provides a variety of pre-defined layers that can be used to construct neural network models

  • keras.datasets - provides pre-loaded datasets that can be used for training, testing, and evaluating machine learning models


Load the Dataset


The project uses the MNIST handwritten digits dataset.

(x_train, _), (x_test, _) = mnist.load_data()
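  • mnist.load_data() loads the MNIST dataset as training and test splits. The labels are discarded (assigned to _) because an autoencoder only needs the images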


Preprocess the Image Data


Next we will have to normalize the input image data

# normalize the image data
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
  • astype('float32') converts the data type of the pixel values to float32. This step is performed to ensure compatibility with subsequent operations and to allow for decimal values

  • Then we will have to scale the pixel values by dividing them by 255. This step normalizes the pixel values to the range of 0 to 1, as the original pixel values are integers ranging from 0 to 255.

  • This normalization step is often applied to improve the training process and convergence of neural networks
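
The effect of these two steps can be checked on a small array. This is a minimal sketch that uses randomly generated uint8 values as a hypothetical stand-in for MNIST pixels, so it runs without downloading the dataset.

```python
import numpy as np

# Hypothetical stand-in for a few MNIST images: uint8 values in [0, 255]
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Same two steps as in the tutorial: cast to float32, then scale to [0, 1]
scaled = raw.astype('float32') / 255

print(raw.dtype, raw.min(), raw.max())
print(scaled.dtype, scaled.min(), scaled.max())

# Dividing the uint8 array directly also works, but NumPy then returns float64
print((raw / 255).dtype)
```

Casting to float32 first keeps the array at half the memory of float64, which is one practical reason for doing the conversion explicitly.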


Next we will reshape the input image data

# reshape the input data for the model
x_train = x_train.reshape(len(x_train), 28, 28, 1)
x_test = x_test.reshape(len(x_test), 28, 28, 1)
x_test.shape

(10000, 28, 28, 1)

  • reshape() is a NumPy array method that returns the array with a new shape; here it adds a trailing channel dimension

  • The number of test samples is 10000

  • (28,28,1) is the shape of each image: the first two dimensions are the spatial dimensions (height and width), and the last dimension is the number of channels
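
Adding the channel dimension does not change the pixel data, only how it is indexed. The sketch below uses a small zero-filled placeholder array (an assumption, not the real dataset) to show that reshape() gives the same result as two other common NumPy idioms.

```python
import numpy as np

# Placeholder batch with the per-image shape of MNIST (100 samples instead of 10000)
images = np.zeros((100, 28, 28), dtype='float32')

# Three equivalent ways to add a trailing channel axis
a = images.reshape(len(images), 28, 28, 1)
b = images[..., np.newaxis]
c = np.expand_dims(images, axis=-1)

print(a.shape, b.shape, c.shape)
```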


Exploratory Data Analysis


Here we will explore what the input image looks like

# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test[index].reshape(28,28))
plt.gray()
  • np.random.randint() generates a random integer index within the range of the length of the x_test array. This is used to randomly select an image from the test dataset

  • imshow() displays the image. It takes the reshaped image as input.

  • x_test[index] retrieves the image at the randomly generated index from the test dataset.

  • reshape() drops the trailing channel dimension, turning the (28, 28, 1) array into a 2D array of 28x28 pixels. This is needed because each image in x_test is stored with an explicit channel axis, while imshow() expects a 2D array for a grayscale image.

  • plt.gray() sets the color map of the plot to grayscale, so the image is displayed in black and white
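
Dropping the channel axis can be written in several ways. The following sketch uses a synthetic (28, 28, 1) array rather than a real MNIST image, and shows that reshape(28, 28) is interchangeable with np.squeeze() or plain indexing for this purpose.

```python
import numpy as np

# Synthetic single image with an explicit channel axis
sample = np.arange(28 * 28, dtype='float32').reshape(28, 28, 1)

via_reshape = sample.reshape(28, 28)        # as used in the tutorial
via_squeeze = np.squeeze(sample, axis=-1)   # removes the length-1 axis
via_index = sample[:, :, 0]                 # selects the only channel

print(via_reshape.shape, via_squeeze.shape, via_index.shape)
```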

MNIST Sample Image

  • This is the random image from the MNIST test dataset displayed using matplotlib, with the pixel values reshaped into a 2D grid and plotted in grayscale.


Next let us see one more image from the dataset

# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test[index].reshape(28,28))
plt.gray()

MNIST Sample Image
  • This is another random image from the MNIST test dataset displayed using matplotlib, with the pixel values reshaped into a 2D grid and plotted in grayscale


Model Creation


Next we will define a sequential model in Keras for a convolutional autoencoder

model = Sequential([
    # encoder network
    Conv2D(32, 3, activation='relu', padding='same', input_shape=(28, 28, 1)),
    MaxPooling2D(2, padding='same'),
    Conv2D(16, 3, activation='relu', padding='same'),
    MaxPooling2D(2, padding='same'),
    # decoder network
    Conv2D(16, 3, activation='relu', padding='same'),
    UpSampling2D(2),
    Conv2D(32, 3, activation='relu', padding='same'),
    UpSampling2D(2),
    # output layer
    Conv2D(1, 3, activation='sigmoid', padding='same')
])

model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
  • Sequential([]) creates a new sequential model object

  • Inside the sequential model, a series of layers are added in order. The model architecture follows an encoder-decoder structure, typical of autoencoders

  • The encoder network consists of convolutional and pooling layers. The input shape of the first layer is specified as (28, 28, 1), indicating grayscale images of size 28x28 pixels

  • The decoder network consists of convolutional and upsampling layers

  • The final layer is the output layer, which uses a convolutional layer with a single channel and a sigmoid activation function to reconstruct the image

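  • model.compile() compiles the model and configures the training process. The optimizer is set to 'adam', which is a popular optimization algorithm for neural networks. The loss function is set to 'binary_crossentropy', which is commonly used for binary classification problems; here it is applied per pixel, which works because the pixel values are normalized to the range 0 to 1 and the sigmoid output lies in the same range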

  • model.summary() prints a summary of the model architecture, including the number of parameters and the shape of each layer's output
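
To see what the loss measures for an image, it helps to compute binary cross-entropy by hand on a tiny example. This is a minimal NumPy sketch, not the Keras implementation: the function name and the clipping constant eps are assumptions chosen for the illustration.

```python
import numpy as np

def pixelwise_bce(target, recon, eps=1e-7):
    # Clip to avoid log(0); eps is an assumed value for this sketch
    recon = np.clip(recon, eps, 1 - eps)
    loss = -(target * np.log(recon) + (1 - target) * np.log(1 - recon))
    # Average over all pixels
    return float(loss.mean())

# A 2x2 "image" and two candidate reconstructions
target = np.array([[0.0, 1.0], [1.0, 0.0]])
good = np.array([[0.1, 0.9], [0.8, 0.2]])
bad = np.array([[0.9, 0.1], [0.2, 0.8]])

loss_good = pixelwise_bce(target, good)
loss_bad = pixelwise_bce(target, bad)
print(loss_good, loss_bad)
```

The reconstruction that is closer to the target gets the lower loss, which is the behaviour training relies on.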

Autoencoder Model Configuration
  • This is the overview of the model's structure and parameter counts
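
The parameter counts in the summary can be cross-checked by hand. The sketch below assumes the standard formula for a Conv2D layer with a bias term (the Keras default): each filter has kernel x kernel x input-channels weights plus one bias. Pooling and upsampling layers have no trainable parameters. The helper name is hypothetical.

```python
def conv2d_params(kernel, in_ch, out_ch):
    # Each filter: kernel*kernel*in_ch weights plus one bias
    return (kernel * kernel * in_ch + 1) * out_ch

# (input channels, output channels) of the five Conv2D layers, in order
channels = [(1, 32), (32, 16), (16, 16), (16, 32), (32, 1)]
per_layer = [conv2d_params(3, i, o) for i, o in channels]
total = sum(per_layer)
print(per_layer, total)

# Number of values in the input versus the encoder output (7x7 with 16 channels)
input_values = 28 * 28 * 1
bottleneck_values = 7 * 7 * 16
print(input_values, bottleneck_values)
```

One thing this makes visible is that the encoder output of this particular model holds as many values as the input image. The spatial size is reduced, but a bottleneck with fewer channels would be needed for a smaller representation in terms of raw value count.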


Training the Model


Next we will train the model

# train the model
model.fit(x_train, x_train, epochs=20, batch_size=256, validation_data=(x_test, x_test))
  • model.fit() is used to train the model

  • x_train is the input training data, and x_train is also used as the target output since it's an autoencoder (reconstructing the input)

  • epochs=20 specifies the number of times the entire training dataset will be iterated during training

  • batch_size=256 determines the number of samples used in each training update. In this case, 256 samples will be processed before updating the model's weights

  • validation_data=(x_test, x_test) is used to specify the validation data to evaluate the model's performance during training. Here, the same dataset (x_test) is used as both the input and target output

You will see output similar to the following:

Training Steps of the model
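
The number of steps Keras reports per epoch follows directly from the dataset size and the batch size. A quick sketch of that arithmetic, assuming the standard MNIST training split of 60000 images:

```python
import math

n_train = 60000      # size of the MNIST training split
batch_size = 256
epochs = 20

# The last batch of each epoch is smaller when the sizes do not divide evenly
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)
```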


Visualize the results


First we will randomly select the image and display it

# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test[index].reshape(28,28))
plt.gray()

MNIST Sample Image
  • This is the random image from the MNIST test dataset displayed using matplotlib


Next we will predict the results from model

# predict the results from model (reconstructions of the compressed images)
pred = model.predict(x_test)
  • model.predict() is a method in Keras used to obtain predictions from a trained model

  • x_test is the input test data on which predictions will be made


Next we will visualize the image that model.predict() reconstructed from the compressed representation

# visualize compressed image
plt.imshow(pred[index].reshape(28,28))
plt.gray()
Compressed Image from Autoencoder
  • This is the image reconstructed from its compressed representation. We can clearly see that there is some difference between the original image and the reconstruction


We can create subplots that display the original and the predicted compressed image side by side, which helps to visualize the difference between the two images clearly

index = np.random.randint(len(x_test))
plt.figure(figsize=(10, 4))
# display original image
ax = plt.subplot(1, 2, 1)
plt.imshow(x_test[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# display compressed image
ax = plt.subplot(1, 2, 2)
plt.imshow(pred[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
  • First, np.random.randint() as seen earlier generates a random integer index within the range of the length of the x_test array.

  • plt.figure() creates a new figure object with a specified size of 10 inches wide and 4 inches tall. This sets the overall size of the plot

  • plt.subplot(1, 2, 1) creates a subplot grid with 1 row and 2 columns and selects the first subplot for displaying the original image

  • plt.imshow() displays the original image at the selected index from x_test. The reshape(28, 28) drops the channel dimension so that the (28, 28, 1) image becomes a 2D array of 28x28 pixels.

  • plt.gray() as seen earlier sets the color map of the plot to grayscale.

  • ax.get_xaxis().set_visible(False) and ax.get_yaxis().set_visible(False) hide the x-axis and y-axis, respectively, which removes the ticks and axis labels

  • plt.subplot(1, 2, 2) selects the second subplot for displaying the compressed/reconstructed image.

  • Next we will display the reconstructed image at the selected index from pred using plt.imshow()

  • plt.show() displays the figure with both subplots showing the original and reconstructed images

Comparison between original and compressed image using autoencoder
  • This gives us a better visualization where we can clearly see the difference between the original image and the predicted compressed image
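
A visual comparison can be complemented with a number. The tutorial itself does not compute one; the sketch below is an optional addition that measures mean squared error (MSE) and peak signal-to-noise ratio (PSNR) with NumPy. The function names are hypothetical and the arrays are synthetic stand-ins; with the trained model you could pass x_test[index] and pred[index] instead.

```python
import numpy as np

def mse(original, recon):
    # Mean squared difference over all pixels
    return float(np.mean((original - recon) ** 2))

def psnr(original, recon, max_val=1.0):
    # Higher PSNR means the reconstruction is closer to the original
    err = mse(original, recon)
    if err == 0:
        return float('inf')
    return float(10 * np.log10(max_val ** 2 / err))

# Synthetic stand-ins for an original image and two reconstructions
rng = np.random.default_rng(0)
original = rng.random((28, 28)).astype('float32')
slightly_off = np.clip(original + 0.01, 0, 1)
far_off = np.clip(original + 0.2, 0, 1)

print(psnr(original, slightly_off), psnr(original, far_off))
```

Here max_val=1.0 assumes pixel values normalized to the range 0 to 1, as done in the preprocessing step.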


We can check the result for one more image to get a better sense of how well the model performs

index = np.random.randint(len(x_test))
plt.figure(figsize=(10, 4))
# display original image
ax = plt