Hackers Realm
Deep CNN Autoencoder for Image Compression & Denoising | Deep Learning | Python Tutorial
Updated: 3 days ago
An autoencoder is a type of unsupervised learning algorithm that aims to reconstruct its input data at the output layer. It typically learns an efficient data representation (encoding) by training the network to ignore signal “noise”. Autoencoders can be used for image denoising, image compression, data compression, anomaly detection, feature extraction and, in some cases, even generation of image data.
A deep CNN autoencoder is a powerful approach for both image compression and denoising tasks. In this project tutorial we will explore how a deep CNN autoencoder can be used for image compression and denoising.
We will first see how an autoencoder can be used for image compression
Deep CNN Autoencoder - Image Compression
For image compression, the deep CNN autoencoder learns to encode the important features of an input image into a compressed representation in the latent space. The encoding process reduces the dimensionality of the input image while retaining the essential information.

Flow of Autoencoder
Input Image -> Encoder -> Compressed Representation -> Decoder -> Reconstruct Input Image
The autoencoder takes an input data sample (here, an image) and feeds it into the encoder network
The encoder network consists of several layers, typically including convolutional layers, pooling layers, and fully connected layers. These layers progressively reduce the spatial dimensions and extract meaningful features from the input data
The final layer of the encoder network produces a compressed representation of the input data
The compressed representation from the encoding stage is passed into the decoder network
The decoder network is symmetrical to the encoder network, consisting of fully connected layers, upsampling layers, and sometimes transposed convolutional layers. It takes the compressed representation and gradually increases the spatial dimensions to reconstruct the original input data
The final layer of the decoder network generates the reconstructed output, which aims to closely resemble the original input data
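The shape flow described above can be sketched with plain NumPy. This is only an illustration of how pooling shrinks and upsampling restores the spatial dimensions; it contains no learned convolution weights, and the helper names max_pool_2x2 and upsample_2x are hypothetical, not part of Keras.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling on an (H, W) array with even H and W."""
    h, w = x.shape
    # Group pixels into 2x2 blocks and keep the largest value of each block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    """Nearest-neighbour upsampling: repeat every pixel twice along both axes."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
image = rng.random((28, 28))                   # stand-in for one grayscale image

encoded = max_pool_2x2(max_pool_2x2(image))    # 28x28 -> 14x14 -> 7x7
decoded = upsample_2x(upsample_2x(encoded))    # 7x7 -> 14x14 -> 28x28

print(image.shape, encoded.shape, decoded.shape)
```

In a real autoencoder, convolutional layers sit between these resizing steps and learn which features to keep, so the decoded image can resemble the input far more closely than this weight-free sketch does.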
Import Modules
import numpy as np
import matplotlib.pyplot as plt
from keras import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.datasets import mnist
numpy - used to perform a wide variety of mathematical operations on arrays
matplotlib - used for data visualization and graphical plotting
keras - used to provide a user-friendly and intuitive interface for designing, training, and evaluating deep learning models
keras.layers - provides a variety of pre-defined layers that can be used to construct neural network models
keras.datasets - provides pre-loaded datasets that can be used for training, testing, and evaluating machine learning models
Load the Dataset
The project uses the MNIST handwritten digits dataset
(x_train, _), (x_test, _) = mnist.load_data()
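mnist.load_data() loads the MNIST dataset as (images, labels) pairs for the training and test splits. The labels are discarded with _ because the autoencoder is trained on the images alone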
Preprocess the Image Data
Next we will have to normalize the input image data
# normalize the image data
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
astype('float32') converts the data type of the pixel values to float32. This step is performed to ensure compatibility with subsequent operations and to allow for decimal values
Then we will have to scale the pixel values by dividing them by 255. This step normalizes the pixel values to the range of 0 to 1, as the original pixel values are integers ranging from 0 to 255.
This normalization step is often applied to improve the training process and convergence of neural networks
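The effect of the normalization can be checked on a small synthetic array. This is a minimal sketch that assumes random uint8 pixels as a stand-in for MNIST, so it runs without downloading the dataset; the variable names raw and scaled are chosen only for this illustration.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 random 28x28 images with uint8 pixels
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Same two steps as in the tutorial: cast to float32, then divide by 255
scaled = raw.astype('float32') / 255

print(raw.dtype, raw.min(), raw.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

The scaled values stay within 0 to 1, and multiplying by 255 and rounding recovers the original integers, so no pixel information is lost by this step.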
Next we will reshape the input image data
# reshape the input data for the model
x_train = x_train.reshape(len(x_train), 28, 28, 1)
x_test = x_test.reshape(len(x_test), 28, 28, 1)
x_test.shape
(10000, 28, 28, 1)
reshape() is a NumPy method that returns the array with a new shape
The number of samples in x_test is 10000
(28, 28, 1) is the shape of each image: the first two dimensions represent the spatial dimensions (height and width), and the last dimension represents the number of channels
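To see what the reshape does to the array, here is a small sketch on a synthetic batch. The batch of 5 images is an assumption made for illustration; np.expand_dims is shown only as an equivalent alternative and is not what the tutorial code uses.

```python
import numpy as np

# Hypothetical batch of 5 grayscale images without a channel axis
batch = np.arange(5 * 28 * 28, dtype='float32').reshape(5, 28, 28)

with_channel = batch.reshape(len(batch), 28, 28, 1)   # same call as in the tutorial
alternative = np.expand_dims(batch, axis=-1)          # equivalent way to add the axis

print(with_channel.shape)
print(np.array_equal(with_channel, alternative))

# Dropping the channel axis again recovers the 2D image used for plotting
first_image = with_channel[0].reshape(28, 28)
print(np.array_equal(first_image, batch[0]))
```

The pixel values are untouched in both directions; only the shape metadata changes, which is why the same reshape(28, 28) call can be used later when plotting.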
Exploratory Data Analysis
Here we will explore what the input images look like
# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test[index].reshape(28,28))
plt.gray()
np.random.randint() generates a random integer index within the range of the length of the x_test array. This is used to randomly select an image from the test dataset
imshow() displays the image. It takes the reshaped image as input.
x_test[index] retrieves the image at the randomly generated index from the test dataset.
reshape() reshapes the selected image from (28, 28, 1) to a 2D array of 28x28 pixels. This drops the single channel dimension that was added during preprocessing, so imshow() receives a plain 2D grayscale image.
plt.gray() sets the color map of the plot to grayscale, so the image is displayed in black and white

This is the random image from the MNIST test dataset displayed using matplotlib, with the pixel values reshaped into a 2D grid and plotted in grayscale.
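Because the index is drawn at random, every run shows a different digit. If you want the same image on every run, one option is a seeded generator. This is a sketch, not part of the original code: the seed value 42 and the name num_images are arbitrary choices made here.

```python
import numpy as np

num_images = 10000                       # size of the MNIST test split
rng = np.random.default_rng(seed=42)     # the seed value is an arbitrary choice

index = int(rng.integers(num_images))    # integer in the range [0, num_images)
print(index)

# Re-creating the generator with the same seed yields the same index again
repeat = int(np.random.default_rng(seed=42).integers(num_images))
print(index == repeat)
```

With the real data, len(x_test) would take the place of num_images.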
Next let us see one more image from the dataset
# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test[index].reshape(28,28))
plt.gray()

This is another random image from the MNIST test dataset displayed using matplotlib, with the pixel values reshaped into a 2D grid and plotted in grayscale
Model Creation
Next we will define a sequential model in Keras for a convolutional autoencoder
model = Sequential([
    # encoder network
    Conv2D(32, 3, activation='relu', padding='same', input_shape=(28, 28, 1)),
    MaxPooling2D(2, padding='same'),
    Conv2D(16, 3, activation='relu', padding='same'),
    MaxPooling2D(2, padding='same'),
    # decoder network
    Conv2D(16, 3, activation='relu', padding='same'),
    UpSampling2D(2),
    Conv2D(32, 3, activation='relu', padding='same'),
    UpSampling2D(2),
    # output layer
    Conv2D(1, 3, activation='sigmoid', padding='same')
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
Sequential([]) creates a new sequential model object
Inside the sequential model, a series of layers are added in order. The model architecture follows an encoder-decoder structure, typical of autoencoders
The encoder network consists of convolutional and pooling layers. The input shape of the first layer is specified as (28, 28, 1), indicating grayscale images of size 28x28 pixels
The decoder network consists of convolutional and upsampling layers
The final layer is the output layer, which uses a convolutional layer with a single channel and a sigmoid activation function to reconstruct the image
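model.compile() compiles the model and configures the training process. The optimizer is set to 'adam', which is a popular optimization algorithm for neural networks. The loss function is set to 'binary_crossentropy'. It is commonly used for binary classification problems; here it is applied to every pixel, which works because the pixel values are scaled to the range of 0 to 1 and the output layer uses a sigmoid activation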
model.summary() prints a summary of the model architecture, including the number of parameters and the shape of each layer's output

This is the overview of the model's structure and parameter counts
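The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below walks through the same layer order in plain Python, assuming the standard parameter count for a convolution with bias, (kernel * kernel * input channels + 1) * filters. The helper conv2d_params and the layers list are written for this illustration and are not Keras APIs.

```python
import math

def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return (kernel * kernel * in_channels + 1) * filters

# (layer type, argument) in the same order as the Sequential model above
layers = [
    ("conv", 32), ("pool", 2), ("conv", 16), ("pool", 2),   # encoder
    ("conv", 16), ("up", 2), ("conv", 32), ("up", 2),       # decoder
    ("conv", 1),                                            # output layer
]

height, width, channels = 28, 28, 1
total_params = 0
shapes = []
for kind, arg in layers:
    if kind == "conv":      # padding='same' keeps height and width unchanged
        total_params += conv2d_params(3, channels, arg)
        channels = arg
    elif kind == "pool":    # padding='same' rounds odd sizes up
        height, width = math.ceil(height / arg), math.ceil(width / arg)
    else:                   # upsampling multiplies height and width
        height, width = height * arg, width * arg
    shapes.append((height, width, channels))
    print(f"{kind:<4} -> {(height, width, channels)}")

print("total parameters:", total_params)
```

The total comes to 12193 parameters, and the narrowest point of the network, after the second pooling layer, has shape (7, 7, 16). Keras additionally shows a leading batch dimension of None in front of each shape.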
Training the Model
Next we will train the model
# train the model
model.fit(x_train, x_train, epochs=20, batch_size=256, validation_data=(x_test, x_test))
model.fit() is used to train the model
x_train is the input training data, and x_train is also used as the target output since it's an autoencoder (reconstructing the input)
epochs=20 specifies the number of times the entire training dataset will be iterated over during training
batch_size=256 determines the number of samples used in each training update. In this case, 256 samples will be processed before updating the model's weights
validation_data=(x_test, x_test) is used to specify the validation data to evaluate the model's performance during training. Here, the same dataset (x_test) is used as both the input and target output
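The loss value reported during training is the binary cross-entropy between the input pixels and the reconstructed pixels, averaged over all pixels. The NumPy sketch below mirrors that formula on synthetic data; it is an approximation for illustration, the eps clipping constant is an assumption, and the Keras implementation may differ in detail.

```python
import numpy as np

def binary_crossentropy(target, prediction, eps=1e-7):
    """Mean per-pixel binary cross-entropy; eps avoids taking log(0)."""
    prediction = np.clip(prediction, eps, 1 - eps)
    loss = -(target * np.log(prediction) + (1 - target) * np.log(1 - prediction))
    return loss.mean()

rng = np.random.default_rng(0)
target = rng.random((2, 28, 28, 1))    # stand-in for normalized images in [0, 1]

close = 0.9 * target + 0.05            # reconstruction close to the target
far = 1 - target                       # inverted image, a poor reconstruction

print(binary_crossentropy(target, close) < binary_crossentropy(target, far))
```

A reconstruction that stays close to the input gives a lower loss than a poor one, which is what the optimizer exploits. Note that the loss does not reach zero even for a perfect reconstruction when the target pixels are not exactly 0 or 1.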
You will see the following result:

Visualize the results
First we will randomly select an image and display it
# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test[index].reshape(28,28))
plt.gray()

This is the random image from the MNIST test dataset displayed using matplotlib
Next we will predict the results from the model
# predict the results from model (get reconstructed images after compression)
pred = model.predict(x_test)
model.predict() is a method in Keras used to obtain predictions from a trained model
x_test is the input test data on which predictions will be made
Next we will visualize the reconstructed image obtained from model.predict()
# visualize reconstructed image
plt.imshow(pred[index].reshape(28,28))
plt.gray()

This is the image reconstructed from the compressed representation. We can clearly see that there is some difference between the original image and the reconstructed image
We can create subplots that display the original and reconstructed images side by side, which helps to visualize the difference between the two images clearly
index = np.random.randint(len(x_test))
plt.figure(figsize=(10, 4))
# display original image
ax = plt.subplot(1, 2, 1)
plt.imshow(x_test[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# display reconstructed image
ax = plt.subplot(1, 2, 2)
plt.imshow(pred[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
First, np.random.randint() as seen earlier generates a random integer index within the range of the length of the x_test array.
plt.figure() creates a new figure object with a specified size of 10 inches wide and 4 inches tall. This sets the overall size of the plot
plt.subplot(1, 2, 1) creates a subplot grid with 1 row and 2 columns and selects the first subplot for displaying the original image
plt.imshow() displays the original image at the selected index from x_test. reshape(28, 28) drops the single channel dimension so that the image is passed as a 2D array of 28x28 pixels.
plt.gray() as seen earlier sets the color map of the plot to grayscale.
ax.get_xaxis().set_visible(False) and ax.get_yaxis().set_visible(False) hide the x-axis and y-axis, respectively, which removes the ticks and axis labels
plt.subplot(1, 2, 2) selects the second subplot for displaying the compressed/reconstructed image.
Next we will display the reconstructed image at the selected index from pred using plt.imshow()
plt.show() displays the figure with both subplots showing the original and reconstructed images

This gives us a better visualization where we can clearly see the difference between the original image and the reconstructed image
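The visual comparison can be complemented with a number. One simple option, not used in the original tutorial, is the mean squared error per image. The sketch below assumes arrays shaped like x_test and pred and uses synthetic stand-ins so that it runs on its own; per_image_mse is a hypothetical helper name.

```python
import numpy as np

def per_image_mse(original, reconstructed):
    """Mean squared error for each image in an (N, 28, 28, 1) batch."""
    diff = original - reconstructed
    return (diff ** 2).mean(axis=(1, 2, 3))

# Hypothetical stand-ins for x_test and pred
rng = np.random.default_rng(0)
original = rng.random((3, 28, 28, 1)).astype('float32')
noise = rng.normal(0, 0.05, original.shape)
reconstructed = np.clip(original + noise, 0, 1).astype('float32')

mse = per_image_mse(original, reconstructed)
print(mse.shape)
print(mse)
```

With the trained model, per_image_mse(x_test, pred) would give one error value per test image, and the images with the largest values are the ones the autoencoder reconstructs worst.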
We can check the results for one more image to get a better sense of the model's performance
index = np.random.randint(len(x_test))
plt.figure(figsize=(10, 4))
# display original image
ax = plt