Gender and Age Prediction using Python | Image Classification & Regression | Deep Learning

Hackers Realm
Jun 25, 2022
10 min read

Updated: May 15, 2024

Uncover the power of gender and age prediction with Python! This tutorial dives into image classification and regression techniques in deep learning. Learn to build models that can accurately predict gender and age from images, unlocking applications in facial recognition, demographics analysis, and more. Enhance your skills in computer vision, machine learning, and advance your understanding of deep learning algorithms. Unlock the potential of gender and age prediction with this comprehensive project tutorial. #GenderPrediction #AgePrediction #Python #ImageClassification #Regression #DeepLearning #ComputerVision

Gender & Age Prediction with Keras Tensorflow — Gender & Age Prediction

In this project tutorial we will use Convolutional Neural Network (CNN) for image feature extraction and visualize the results with plot graphs. We will create an image classification model for the gender prediction and a regression model for the age prediction.

You can watch the video-based tutorial with step by step explanation down below.

Dataset Information

UTKFace dataset is a large-scale face dataset with long age span (range from 0 to 116 years old). The dataset consists of over 20,000 face images with annotations of age, gender, and ethnicity. The images cover large variation in pose, facial expression, illumination, occlusion, resolution, etc. This dataset could be used on a variety of tasks, e.g., face detection, age estimation, age progression/regression, landmark localization, etc.

The objective of the project is to detect gender and age using facial images. Convolutional Neural Network is used to classify the images. There are 2 output types namely, gender(M or F) and age.

Environment: kaggle

Download the UTKFace dataset here

Import Modules

import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from tqdm.notebook import tqdm
warnings.filterwarnings('ignore')
%matplotlib inline

import tensorflow as tf
from keras.preprocessing.image import load_img
from keras.models import Sequential, Model
from keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D, Input

pandas - used to perform data manipulation and analysis
numpy - used to perform a wide variety of mathematical operations on arrays
matplotlib - used for data visualization and graphical plotting
seaborn - built on top of matplotlib with similar functionalities
os - used to handle files using system commands
tqdm - progress bar decorator for iterators
warnings - to manipulate warnings details, filterwarnings('ignore') is to ignore the warnings thrown by the modules (gives clean results)
load_img - used for loading the image as numpy array
tensorflow - backend module for the use of Keras
Dense - single dimension linear layer
Dropout - used to add regularization to the data, avoiding over fitting & dropping out a fraction of the data
Activation - layer for the use of certain threshold
Flatten - convert a 2D array into a 1D array
Conv2D - convolutional layer in 2 dimension
MaxPooling2D - function to get the maximum pixel value to the next layer

Load the Dataset

Now we load the dataset for processing

BASE_DIR = '../input/utkface-new/UTKFace/'

Use the directory where you have stored the dataset

# labels - age, gender, ethnicity
image_paths = []
age_labels = []
gender_labels = []

for filename in tqdm(os.listdir(BASE_DIR)):
    image_path = os.path.join(BASE_DIR, filename)
    temp = filename.split('_')
    age = int(temp[0])
    gender = int(temp[1])
    image_paths.append(image_path)
    age_labels.append(age)
    gender_labels.append(gender)

Here we use the BASE_DIR to iterate the image paths
Age and gender labels are assigned to the corresponding image path
With the split function, we can extract the age and gender from the image path
The first index is the age and the second index is the gender

Now we create the dataframe

# convert to dataframe
df = pd.DataFrame()
df['image'], df['age'], df['gender'] = image_paths, age_labels, gender_labels
df.head()

From the display we can see better how the age and gender were extracted
In gender zero (0) is Male and one (1) is female.

Now we map the gender label for a better display in the graphs

# map labels for gender
gender_dict = {0:'Male', 1:'Female'}

Exploratory Data Analysis

from PIL import Image
img = Image.open(df['image'][0])
plt.axis('off')
plt.imshow(img);

Display of the first image in the dataset
You may resize the image to a uniform width and height for easier processing
In this project we will resize all images to 128 x 128 due to limited resources

sns.distplot(df['age'])

Distribution Plot of Age — Distribution of Age

Distplot of the age attribute
The majority are in between ages 25 to 30 years old.
You may convert this distribution into a scaled format using Standard Scalar (or) Min Max Normalization

sns.countplot(df['gender'])

Count Plot of Gender — Distribution of Gender

Visualization of the gender attribute and it's in uniform distribution

# to display grid of images
plt.figure(figsize=(20, 20))
files = df.iloc[0:25]

for index, file, age, gender in files.itertuples():
    plt.subplot(5, 5, index+1)
    img = load_img(file)
    img = np.array(img)
    plt.imshow(img)
    plt.title(f"Age: {age} Gender: {gender_dict[gender]}")
    plt.axis('off')

Sample Facial Images from UTK Dataset with Labels

Display of 25 random images with different genders and ages
You may shuffle the data for different result
Different saturation and qualities can be observed among the images

Feature Extraction

Now we define the feature extraction function

def extract_features(images):
    features = []
    for image in tqdm(images):
        img = load_img(image, color_mode='grayscale')
        img = img.resize((128, 128), Image.ANTIALIAS)
        img = np.array(img)features.append(img)
        
    features = np.array(features)
    # ignore this step if using RGB
    features = features.reshape(len(features), 128, 128, 1)
    return features

Image reshaped is defined and in grayscale for quicker processing

Now let us test the feature extraction

X = extract_features(df['image'])

X.shape

(23708, 128, 128, 1)

Features extracted from the image data

# normalize the images
X = X/255.0

All images normalized from a range of 1 to 255 into 0 to 1

y_gender = np.array(df['gender'])
y_age = np.array(df['age'])

Conversion of gender and age into a numpy array

input_shape = (128, 128, 1)

Configuration of input shape of the images into a fixed size and in grayscale

Model Creation

Now we proceed to the model creation

inputs = Input((input_shape))
# convolutional layers
conv_1 = Conv2D(32, kernel_size=(3, 3), activation='relu') (inputs)
maxp_1 = MaxPooling2D(pool_size=(2, 2)) (conv_1)
conv_2 = Conv2D(64, kernel_size=(3, 3), activation='relu') (maxp_1)
maxp_2 = MaxPooling2D(pool_size=(2, 2)) (conv_2)
conv_3 = Conv2D(128, kernel_size=(3, 3), activation='relu') (maxp_2)
maxp_3 = MaxPooling2D(pool_size=(2, 2)) (conv_3)
conv_4 = Conv2D(256, kernel_size=(3, 3), activation='relu') (maxp_3)
maxp_4 = MaxPooling2D(pool_size=(2, 2)) (conv_4)

flatten = Flatten() (maxp_4)

# fully connected layers
dense_1 = Dense(256, activation='relu') (flatten)
dense_2 = Dense(256, activation='relu') (flatten)

dropout_1 = Dropout(0.3) (dense_1)
dropout_2 = Dropout(0.3) (dense_2)

output_1 = Dense(1, activation='sigmoid', name='gender_out') (dropout_1)
output_2 = Dense(1, activation='relu', name='age_out') (dropout_2)

model = Model(inputs=[inputs], outputs=[output_1, output_2])

model.compile(loss=['binary_crossentropy', 'mae'], optimizer='adam', metrics=['accuracy', 'mae'])

Dropout() - used to add regularization to the data, avoiding over fitting & dropping out a fraction of the data from the layers
activation='sigmoid' - used for binary classification
optimizer=’adam’ - automatically adjust the learning rate for the model over the no. of epochs
loss='binary_crossentropy' - loss function for binary outputs

# plot the model
from tensorflow.keras.utils import plot_model
plot_model(model)

Model plot shows the image processing layers and split into 2 dense layers for classification and regression outputs

Now we train the dataset

# train model
history = model.fit(x=X, y=[y_gender, y_age], batch_size=32, epochs=30, validation_split=0.2)

Epoch 1/30

593/593 [==============================] - 18s 17ms/step - loss: 16.1346 - gender_out_loss: 0.6821 - age_out_loss: 15.4525 - gender_out_accuracy: 0.5472 - age_out_accuracy: 0.0476 - val_loss: 12.7578 - val_gender_out_loss: 0.5521 - val_age_out_loss: 12.2057 - val_gender_out_accuracy: 0.7269 - val_age_out_accuracy: 0.0460 Epoch 2/30 593/593 [==============================] - 9s 16ms/step - loss: 11.2216 - gender_out_loss: 0.4761 - age_out_loss: 10.7455 - gender_out_accuracy: 0.7741 - age_out_accuracy: 0.0285 - val_loss: 11.5279 - val_gender_out_loss: 0.4163 - val_age_out_loss: 11.1116 - val_gender_out_accuracy: 0.8064 - val_age_out_accuracy: 0.0255 Epoch 3/30 593/593 [==============================] - 10s 16ms/step - loss: 9.3465 - gender_out_loss: 0.3925 - age_out_loss: 8.9540 - gender_out_accuracy: 0.8214 - age_out_accuracy: 0.0157 - val_loss: 8.4260 - val_gender_out_loss: 0.3558 - val_age_out_loss: 8.0702 - val_gender_out_accuracy: 0.8361 - val_age_out_accuracy: 0.0074 Epoch 4/30 593/593 [==============================] - 9s 16ms/step - loss: 8.5809 - gender_out_loss: 0.3446 - age_out_loss: 8.2363 - gender_out_accuracy: 0.8416 - age_out_accuracy: 0.0119 - val_loss: 8.5080 - val_gender_out_loss: 0.3214 - val_age_out_loss: 8.1866 - val_gender_out_accuracy: 0.8541 - val_age_out_accuracy: 0.0078 Epoch 5/30 593/593 [==============================] - 9s 16ms/step - loss: 8.0615 - gender_out_loss: 0.3149 - age_out_loss: 7.7466 - gender_out_accuracy: 0.8602 - age_out_accuracy: 0.0109 - val_loss: 7.5080 - val_gender_out_loss: 0.3134 - val_age_out_loss: 7.1946 - val_gender_out_accuracy: 0.8551 - val_age_out_accuracy: 0.0076 Epoch 6/30 593/593 [==============================] - 10s 16ms/step - loss: 7.6047 - gender_out_loss: 0.2935 - age_out_loss: 7.3112 - gender_out_accuracy: 0.8672 - age_out_accuracy: 0.0096 - val_loss: 7.5676 - val_gender_out_loss: 0.2822 - val_age_out_loss: 7.2854 - val_gender_out_accuracy: 0.8747 - val_age_out_accuracy: 0.0063 Epoch 7/30 593/593 [==============================] - 9s 15ms/step - loss: 7.2906 - gender_out_loss: 0.2782 - age_out_loss: 7.0124 - gender_out_accuracy: 0.8743 - age_out_accuracy: 0.0080 - val_loss: 7.1280 - val_gender_out_loss: 0.2800 - val_age_out_loss: 6.8480 - val_gender_out_accuracy: 0.8739 - val_age_out_accuracy: 0.0049 Epoch 8/30 593/593 [==============================] - 9s 16ms/step - loss: 6.9194 - gender_out_loss: 0.2654 - age_out_loss: 6.6540 - gender_out_accuracy: 0.8818 - age_out_accuracy: 0.0072 - val_loss: 8.0823 - val_gender_out_loss: 0.2770 - val_age_out_loss: 7.8053 - val_gender_out_accuracy: 0.8766 - val_age_out_accuracy: 0.0049 Epoch 9/30 593/593 [==============================] - 9s 15ms/step - loss: 6.6902 - gender_out_loss: 0.2507 - age_out_loss: 6.4395 - gender_out_accuracy: 0.8903 - age_out_accuracy: 0.0064 - val_loss: 7.1591 - val_gender_out_loss: 0.2882 - val_age_out_loss: 6.8709 - val_gender_out_accuracy: 0.8707 - val_age_out_accuracy: 0.0032 Epoch 10/30 593/593 [==============================] - 10s 16ms/step - loss: 6.4238 - gender_out_loss: 0.2404 - age_out_loss: 6.1834 - gender_out_accuracy: 0.8941 - age_out_accuracy: 0.0063 - val_loss: 7.0038 - val_gender_out_loss: 0.2649 - val_age_out_loss: 6.7389 - val_gender_out_accuracy: 0.8842 - val_age_out_accuracy: 0.0051

Epoch 11/30 593/593 [==============================] - 10s 16ms/step - loss: 6.2591 - gender_out_loss: 0.2276 - age_out_loss: 6.0316 - gender_out_accuracy: 0.9011 - age_out_accuracy: 0.0063 - val_loss: 6.8535 - val_gender_out_loss: 0.2642 - val_age_out_loss: 6.5894 - val_gender_out_accuracy: 0.8876 - val_age_out_accuracy: 0.0027 Epoch 12/30 593/593 [==============================] - 10s 16ms/step - loss: 5.9888 - gender_out_loss: 0.2179 - age_out_loss: 5.7709 - gender_out_accuracy: 0.9047 - age_out_accuracy: 0.0072 - val_loss: 6.8253 - val_gender_out_loss: 0.2690 - val_age_out_loss: 6.5562 - val_gender_out_accuracy: 0.8851 - val_age_out_accuracy: 0.0049 Epoch 13/30 593/593 [==============================] - 10s 16ms/step - loss: 5.7775 - gender_out_loss: 0.2075 - age_out_loss: 5.5700 - gender_out_accuracy: 0.9118 - age_out_accuracy: 0.0059 - val_loss: 7.1583 - val_gender_out_loss: 0.2630 - val_age_out_loss: 6.8953 - val_gender_out_accuracy: 0.8876 - val_age_out_accuracy: 0.0036 Epoch 14/30 593/593 [==============================] - 9s 15ms/step - loss: 5.4795 - gender_out_loss: 0.1951 - age_out_loss: 5.2844 - gender_out_accuracy: 0.9160 - age_out_accuracy: 0.0054 - val_loss: 6.8055 - val_gender_out_loss: 0.2790 - val_age_out_loss: 6.5264 - val_gender_out_accuracy: 0.8838 - val_age_out_accuracy: 0.0034 Epoch 15/30 593/593 [==============================] - 9s 16ms/step - loss: 5.4528 - gender_out_loss: 0.1831 - age_out_loss: 5.2697 - gender_out_accuracy: 0.9243 - age_out_accuracy: 0.0057 - val_loss: 6.9825 - val_gender_out_loss: 0.2813 - val_age_out_loss: 6.7012 - val_gender_out_accuracy: 0.8882 - val_age_out_accuracy: 0.0034 Epoch 16/30 593/593 [==============================] - 9s 15ms/step - loss: 5.1696 - gender_out_loss: 0.1693 - age_out_loss: 5.0004 - gender_out_accuracy: 0.9304 - age_out_accuracy: 0.0051 - val_loss: 6.9286 - val_gender_out_loss: 0.2636 - val_age_out_loss: 6.6650 - val_gender_out_accuracy: 0.8817 - val_age_out_accuracy: 0.0044 Epoch 17/30 593/593 [==============================] - 10s 16ms/step - loss: 5.0638 - gender_out_loss: 0.1638 - age_out_loss: 4.9000 - gender_out_accuracy: 0.9312 - age_out_accuracy: 0.0054 - val_loss: 6.9296 - val_gender_out_loss: 0.2813 - val_age_out_loss: 6.6483 - val_gender_out_accuracy: 0.8956 - val_age_out_accuracy: 0.0042 Epoch 18/30 593/593 [==============================] - 9s 15ms/step - loss: 4.8813 - gender_out_loss: 0.1494 - age_out_loss: 4.7319 - gender_out_accuracy: 0.9396 - age_out_accuracy: 0.0053 - val_loss: 6.9294 - val_gender_out_loss: 0.2971 - val_age_out_loss: 6.6323 - val_gender_out_accuracy: 0.8880 - val_age_out_accuracy: 0.0040 Epoch 19/30 593/593 [==============================] - 9s 15ms/step - loss: 4.8204 - gender_out_loss: 0.1428 - age_out_loss: 4.6776 - gender_out_accuracy: 0.9414 - age_out_accuracy: 0.0057 - val_loss: 6.9242 - val_gender_out_loss: 0.3056 - val_age_out_loss: 6.6185 - val_gender_out_accuracy: 0.8832 - val_age_out_accuracy: 0.0042 Epoch 20/30 593/593 [==============================] - 9s 16ms/step - loss: 4.6624 - gender_out_loss: 0.1350 - age_out_loss: 4.5274 - gender_out_accuracy: 0.9444 - age_out_accuracy: 0.0057 - val_loss: 7.0920 - val_gender_out_loss: 0.3745 - val_age_out_loss: 6.7175 - val_gender_out_accuracy: 0.8699 - val_age_out_accuracy: 0.0070

Epoch 21/30 593/593 [==============================] - 9s 16ms/step - loss: 4.5481 - gender_out_loss: 0.1267 - age_out_loss: 4.4214 - gender_out_accuracy: 0.9491 - age_out_accuracy: 0.0068 - val_loss: 6.9295 - val_gender_out_loss: 0.3286 - val_age_out_loss: 6.6009 - val_gender_out_accuracy: 0.8865 - val_age_out_accuracy: 0.0032 Epoch 22/30 593/593 [==============================] - 9s 16ms/step - loss: 4.4753 - gender_out_loss: 0.1239 - age_out_loss: 4.3514 - gender_out_accuracy: 0.9497 - age_out_accuracy: 0.0054 - val_loss: 7.0483 - val_gender_out_loss: 0.3409 - val_age_out_loss: 6.7075 - val_gender_out_accuracy: 0.8918 - val_age_out_accuracy: 0.0070 Epoch 23/30 593/593 [==============================] - 9s 16ms/step - loss: 4.4120 - gender_out_loss: 0.1102 - age_out_loss: 4.3018 - gender_out_accuracy: 0.9540 - age_out_accuracy: 0.0090 - val_loss: 6.9948 - val_gender_out_loss: 0.3285 - val_age_out_loss: 6.6663 - val_gender_out_accuracy: 0.8895 - val_age_out_accuracy: 0.0105 Epoch 24/30 593/593 [==============================] - 10s 16ms/step - loss: 4.2673 - gender_out_loss: 0.1059 - age_out_loss: 4.1614 - gender_out_accuracy: 0.9583 - age_out_accuracy: 0.0193 - val_loss: 7.0131 - val_gender_out_loss: 0.3328 - val_age_out_loss: 6.6803 - val_gender_out_accuracy: 0.8897 - val_age_out_accuracy: 0.0243 Epoch 25/30 593/593 [==============================] - 9s 15ms/step - loss: 4.1578 - gender_out_loss: 0.1024 - age_out_loss: 4.0553 - gender_out_accuracy: 0.9582 - age_out_accuracy: 0.0264 - val_loss: 6.8706 - val_gender_out_loss: 0.3361 - val_age_out_loss: 6.5345 - val_gender_out_accuracy: 0.8958 - val_age_out_accuracy: 0.0287 Epoch 26/30 593/593 [==============================] - 9s 15ms/step - loss: 4.0662 - gender_out_loss: 0.0933 - age_out_loss: 3.9730 - gender_out_accuracy: 0.9611 - age_out_accuracy: 0.0299 - val_loss: 7.2064 - val_gender_out_loss: 0.3738 - val_age_out_loss: 6.8326 - val_gender_out_accuracy: 0.8918 - val_age_out_accuracy: 0.0266 Epoch 27/30 593/593 [==============================] - 9s 15ms/step - loss: 4.0040 - gender_out_loss: 0.0851 - age_out_loss: 3.9189 - gender_out_accuracy: 0.9644 - age_out_accuracy: 0.0311 - val_loss: 7.1397 - val_gender_out_loss: 0.4333 - val_age_out_loss: 6.7064 - val_gender_out_accuracy: 0.8903 - val_age_out_accuracy: 0.0331 Epoch 28/30 593/593 [==============================] - 10s 16ms/step - loss: 3.9340 - gender_out_loss: 0.0848 - age_out_loss: 3.8492 - gender_out_accuracy: 0.9641 - age_out_accuracy: 0.0344 - val_loss: 7.0291 - val_gender_out_loss: 0.4004 - val_age_out_loss: 6.6287 - val_gender_out_accuracy: 0.8889 - val_age_out_accuracy: 0.0251 Epoch 29/30 593/593 [==============================] - 9s 16ms/step - loss: 3.9378 - gender_out_loss: 0.0819 - age_out_loss: 3.8559 - gender_out_accuracy: 0.9659 - age_out_accuracy: 0.0329 - val_loss: 6.9958 - val_gender_out_loss: 0.3569 - val_age_out_loss: 6.6389 - val_gender_out_accuracy: 0.8895 - val_age_out_accuracy: 0.0346 Epoch 30/30 593/593 [==============================] - 9s 15ms/step - loss: 3.8026 - gender_out_loss: 0.0774 - age_out_loss: 3.7252 - gender_out_accuracy: 0.9671 - age_out_accuracy: 0.0341 - val_loss: 7.0322 - val_gender_out_loss: 0.4161 - val_age_out_loss: 6.6161 - val_gender_out_accuracy: 0.8920 - val_age_out_accuracy: 0.0259

Set the no. of epochs and batch size according to the hardware specifications
Training accuracy and validation accuracy increases each iteration
Training loss and validation loss decreases each iteration

Plot the Results

# plot results for gender
acc = history.history['gender_out_accuracy']
val_acc = history.history['val_gender_out_accuracy']
epochs = range(len(acc))

plt.plot(epochs, acc, 'b', label='Training Accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation Accuracy')
plt.title('Accuracy Graph')
plt.legend()
plt.figure()

loss = history.history['loss']
val_loss = history.history['val_loss']

plt.plot(epochs, loss, 'b', label='Training Loss')
plt.plot(epochs, val_loss, 'r', label='Validation Loss')
plt.title('Loss Graph')
plt.legend()
plt.show()

Training & Validation Accuracy for Gender

Gender Accuracy: 90.00
Age MAE: 6.5

# plot results for age
loss = history.history['age_out_mae']
val_loss = history.history['val_age_out_mae']
epochs = range(len(loss))

plt.plot(epochs, loss, 'b', label='Training MAE')
plt.plot(epochs, val_loss, 'r', label='Validation MAE')
plt.title('MAE Graph')
plt.legend()
plt.show()

Prediction with Test Data

image_index = 100
print("Original Gender:", gender_dict[y_gender[image_index]], "Original Age:", y_age[image_index])
# predict from model
pred = model.predict(X[image_index].reshape(1, 128, 128, 1))
pred_gender = gender_dict[round(pred[0][0][0])]
pred_age = round(pred[1][0][0])
print("Predicted Gender:", pred_gender, "Predicted Age:", pred_age)
plt.axis('off')
plt.imshow(X[image_index].reshape(128, 128), cmap='gray');

Original Gender: Female Original Age: 3 Predicted Gender: Female Predicted Age: 1

image_index = 3000
print("Original Gender:", gender_dict[y_gender[image_index]], "Original Age:", y_age[image_index])
# predict from model
pred = model.predict(X[image_index].reshape(1, 128, 128, 1))
pred_gender = gender_dict[round(pred[0][0][0])]
pred_age = round(pred[1][0][0])
print("Predicted Gender:", pred_gender, "Predicted Age:", pred_age)
plt.axis('off')
plt.imshow(X[image_index].reshape(128, 128), cmap='gray');

Original Gender: Male Original Age: 28 Predicted Gender: Male Predicted Age: 32

image_index = 10000
print("Original Gender:", gender_dict[y_gender[image_index]], "Original Age:", y_age[image_index])
# predict from model
pred = model.predict(X[image_index].reshape(1, 128, 128, 1))
pred_gender = gender_dict[round(pred[0][0][0])]
pred_age = round(pred[1][0][0])
print("Predicted Gender:", pred_gender, "Predicted Age:", pred_age)
plt.axis('off')
plt.imshow(X[image_index].reshape(128, 128), cmap='gray');

Original Gender: Male Original Age: 42 Predicted Gender: Male Predicted Age: 38

We could see a standard deviation of 4 for the prediction of age attribute

Final Thoughts

Training the model by increasing the no. of epochs can give better and more accurate results.
Processing large amount of data can take a lot of time and system resources.
You may use other image classification models of your preference for comparison.
The no. of layers of the model can be increased if you want to process large dataset

In this project tutorial, we have explored the Gender and Age Prediction image classification using UTKFace dataset. This is a deep learning project to learn image classification and regression using feature extraction techniques and visualize the results through different plots.

Get the project notebook from here

Thanks for reading the article!!!

Check out more project videos from the YouTube channel Hackers Realm