  • Hackers Realm

Gender and Age Prediction using Python | Image Classification & Regression | Deep Learning

Gender and Age Prediction is a deep learning project in which we analyze a face image and predict the person's gender and age. Gender is predicted with an image classification model and age with a regression model.

In this tutorial we will use a Convolutional Neural Network (CNN) for image feature extraction and visualize the results with plots. We will build a single model with two outputs: a classification output for gender and a regression output for age.



You can watch the video-based tutorial with step by step explanation down below.


Dataset Information


UTKFace dataset is a large-scale face dataset with long age span (range from 0 to 116 years old). The dataset consists of over 20,000 face images with annotations of age, gender, and ethnicity. The images cover large variation in pose, facial expression, illumination, occlusion, resolution, etc. This dataset could be used on a variety of tasks, e.g., face detection, age estimation, age progression/regression, landmark localization, etc.


The objective of the project is to predict gender and age from facial images. A Convolutional Neural Network extracts features from the images and feeds two outputs: gender (male or female) and age.


Environment: Kaggle


Download the UTKFace dataset here



Import Modules


import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from tqdm.notebook import tqdm
warnings.filterwarnings('ignore')
%matplotlib inline

import tensorflow as tf
from keras.preprocessing.image import load_img
from keras.models import Sequential, Model
from keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D, Input
  • pandas - used to perform data manipulation and analysis

  • numpy - used to perform a wide variety of mathematical operations on arrays

  • matplotlib - used for data visualization and graphical plotting

  • seaborn - built on top of matplotlib with similar functionalities

  • os - used to handle files using system commands

  • tqdm - progress bar decorator for iterators

  • warnings - to manipulate warnings details, filterwarnings('ignore') is to ignore the warnings thrown by the modules (gives clean results)

  • load_img - used for loading an image file as a PIL image object

  • tensorflow - backend module for the use of Keras

  • Dense - fully connected layer

  • Dropout - regularization layer that randomly sets a fraction of its inputs to zero during training, which helps avoid overfitting

  • Input - defines the shape of the input tensor for the model

  • Flatten - converts a multi-dimensional feature map into a 1D vector

  • Conv2D - convolutional layer in 2 dimensions

  • MaxPooling2D - layer that passes the maximum value of each pooling window to the next layer


Load the Dataset


Now we load the dataset for processing

BASE_DIR = '../input/utkface-new/UTKFace/'
  • Use the directory where you have stored the dataset


# labels - age, gender, ethnicity
image_paths = []
age_labels = []
gender_labels = []

for filename in tqdm(os.listdir(BASE_DIR)):
    image_path = os.path.join(BASE_DIR, filename)
    temp = filename.split('_')
    age = int(temp[0])
    gender = int(temp[1])
    image_paths.append(image_path)
    age_labels.append(age)
    gender_labels.append(gender)
  • Here we use the BASE_DIR to iterate the image paths

  • Age and gender labels are assigned to the corresponding image path

  • With the split function, we can extract the age and gender from the filename

  • The first element of the split is the age and the second element is the gender
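
The filename parsing above can be sketched as a small standalone function. This is a minimal sketch that assumes the UTKFace naming scheme, where age and gender are the first two underscore-separated fields; the function name `parse_utkface_name` and the sample filenames are hypothetical and used only for illustration. Unlike the loop above, it returns `None` for names that do not follow the pattern instead of raising an error.

```python
def parse_utkface_name(filename):
    """Return (age, gender) parsed from a UTKFace-style filename, or None."""
    parts = filename.split('_')
    # we need at least the age and gender fields
    if len(parts) < 2:
        return None
    try:
        age = int(parts[0])      # first field: age in years
        gender = int(parts[1])   # second field: 0 = male, 1 = female
    except ValueError:
        # one of the fields is not a number, so the name is malformed
        return None
    return age, gender


# hypothetical filenames used only for illustration
print(parse_utkface_name('25_1_2_20170116174525125.jpg'))
print(parse_utkface_name('notes.txt'))
```

Skipping files for which the function returns `None` keeps stray files in the dataset folder from interrupting the loading loop.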



Now we create the dataframe

# convert to dataframe
df = pd.DataFrame()
df['image'], df['age'], df['gender'] = image_paths, age_labels, gender_labels
df.head()
  • From the display we can see better how the age and gender were extracted

  • In the gender column, zero (0) is male and one (1) is female



Now we map the gender label for a better display in the graphs

# map labels for gender
gender_dict = {0:'Male', 1:'Female'}

Exploratory Data Analysis


from PIL import Image
img = Image.open(df['image'][0])
plt.axis('off')
plt.imshow(img);
  • Display of the first image in the dataset

  • You may resize the image to a uniform width and height for easier processing

  • In this project we will resize all images to 128 x 128 due to limited resources



sns.histplot(df['age'], kde=True)
  • Distribution plot (histogram with a density curve) of the age attribute

  • The most common ages are between 25 and 30 years

  • You may scale this attribute using standardization (StandardScaler) or min-max normalization
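
The two scaling options mentioned above can be sketched in plain Python. This is an illustrative sketch and not part of the original pipeline: the `ages` list holds made-up values and the variable names are assumptions. Note that if the age target is scaled before training, predictions have to be converted back to years afterwards.

```python
from statistics import mean, pstdev

# made-up ages used only for illustration
ages = [1, 26, 30, 55, 116]

# min-max normalization: maps the smallest age to 0 and the largest to 1
lowest, highest = min(ages), max(ages)
age_minmax = [(a - lowest) / (highest - lowest) for a in ages]

# standardization: subtract the mean and divide by the standard deviation
mu, sigma = mean(ages), pstdev(ages)
age_standard = [(a - mu) / sigma for a in ages]

# inverse of the min-max transform, to get back to years
ages_restored = [v * (highest - lowest) + lowest for v in age_minmax]

print(age_minmax)
print(age_standard)
```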



sns.countplot(x=df['gender'])
  • Visualization of the gender attribute; the two classes are roughly balanced



# to display grid of images
plt.figure(figsize=(20, 20))
files = df.iloc[0:25]

for index, file, age, gender in files.itertuples():
    plt.subplot(5, 5, index+1)
    img = load_img(file)
    img = np.array(img)
    plt.imshow(img)
    plt.title(f"Age: {age} Gender: {gender_dict[gender]}")
    plt.axis('off')
  • Display of the first 25 images in the dataframe, showing different genders and ages

  • You may shuffle the data for a different result

  • Different saturation and qualities can be observed among the images



Feature Extraction


Now we define the feature extraction function

def extract_features(images):
    features = []
    for image in tqdm(images):
        # color_mode='grayscale' replaces the deprecated grayscale=True argument
        img = load_img(image, color_mode='grayscale')
        # Image.LANCZOS is the current name of the removed Image.ANTIALIAS filter
        img = img.resize((128, 128), Image.LANCZOS)
        img = np.array(img)
        features.append(img)

    features = np.array(features)
    # ignore this step if using RGB
    features = features.reshape(len(features), 128, 128, 1)
    return features
  • Each image is loaded in grayscale and resized to 128 x 128 for quicker processing; the final reshape adds the single channel dimension



Now let us test the feature extraction

X = extract_features(df['image'])
X.shape

(23708, 128, 128, 1)

  • Features extracted from the image data


# normalize the images
X = X/255.0
  • All pixel values are scaled from the range 0 to 255 into the range 0 to 1
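
To make the reshape and normalization steps concrete, here is a minimal sketch on synthetic data. The random array stands in for two grayscale images and is an assumption for illustration only; it is not read from the dataset, and the names `fake_images` and `X_demo` are hypothetical.

```python
import numpy as np

# two synthetic 128 x 128 grayscale "images" with 8-bit pixel values
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(2, 128, 128), dtype=np.uint8)

# add the single channel dimension expected by Conv2D
X_demo = fake_images.reshape(len(fake_images), 128, 128, 1)

# scale pixel values from 0..255 to 0..1 (the result is a float array)
X_demo = X_demo / 255.0

print(X_demo.shape, X_demo.dtype)
print(X_demo.min() >= 0.0, X_demo.max() <= 1.0)
```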


y_gender = np.array(df['gender'])
y_age = np.array(df['age'])
  • Conversion of gender and age into a numpy array


input_shape = (128, 128, 1)
  • Configuration of input shape of the images into a fixed size and in grayscale



Model Creation


Now we proceed to the model creation

inputs = Input((input_shape))
# convolutional layers
conv_1 = Conv2D(32, kernel_size=(3, 3), activation='relu') (inputs)
maxp_1 = MaxPooling2D(pool_size=(2, 2)) (conv_1)
conv_2 = Conv2D(64, kernel_size=(3, 3), activation='relu') (maxp_1)
maxp_2 = MaxPooling2D(pool_size=(2, 2)) (conv_2)
conv_3 = Conv2D(128, kernel_size=(3, 3), activation='relu') (maxp_2)
maxp_3 = MaxPooling2D(pool_size=(2, 2)) (conv_3)
conv_4 = Conv2D(256, kernel_size=(3, 3), activation='relu') (maxp_3)
maxp_4 = MaxPooling2D(pool_size=(2, 2)) (conv_4)

flatten = Flatten() (maxp_4)

# fully connected layers
dense_1 = Dense(256, activation='relu') (flatten)
dense_2 = Dense(256, activation='relu') (flatten)

dropout_1 = Dropout(0.3) (dense_1)
dropout_2 = Dropout(0.3) (dense_2)

output_1 = Dense(1, activation='sigmoid', name='gender_out') (dropout_1)
output_2 = Dense(1, activation='relu', name='age_out') (dropout_2)

model = Model(inputs=[inputs], outputs=[output_1, output_2])

model.compile(loss=['binary_crossentropy', 'mae'], optimizer='adam', metrics=['accuracy'])
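  • Dropout() - regularization layer that randomly drops a fraction (here 30%) of the units during training to reduce overfitting

  • activation='sigmoid' - used for binary classification (the gender output)

  • optimizer='adam' - adaptive optimizer that adjusts the step size for each parameter during training

  • loss=['binary_crossentropy', 'mae'] - binary cross-entropy for the gender output and mean absolute error for the age output

  • metrics=['accuracy'] - applied to both outputs; accuracy is only meaningful for the gender output, so the age output should be judged by its MAE loss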
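
As a rough illustration of the two loss functions, the sketch below computes binary cross-entropy and mean absolute error by hand in plain Python. The helper names and the sample labels and predictions are made up for illustration; Keras computes these internally. In the training log further below, the total loss is the sum of the two per-output losses (for example, 0.6821 + 15.4525 = 16.1346 in the first epoch).

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy for labels in {0, 1} and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between true and predicted values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# made-up values: gender labels with sigmoid outputs, ages with predicted ages
gender_loss = binary_crossentropy([0, 1], [0.5, 0.5])
age_loss = mean_absolute_error([30, 40], [25, 50])

# with no loss weights, the total loss is the sum of the two
print(gender_loss, age_loss, gender_loss + age_loss)
```

Because the age loss is measured in years and the gender loss is typically below 1, the age term dominates the total; passing `loss_weights` to `model.compile` is one way to rebalance them.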



# plot the model
from tensorflow.keras.utils import plot_model
plot_model(model)
  • The model plot shows the shared convolutional layers, which then split into two dense branches for the classification and regression outputs
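
The layer shapes in the model can also be checked by hand. The sketch below traces the spatial size through the four Conv2D and MaxPooling2D pairs, assuming the Keras defaults used in the code above (3 x 3 kernels with 'valid' padding and stride 1, and 2 x 2 pooling that rounds down). The helper name `trace_shapes` is hypothetical.

```python
def trace_shapes(size=128, filters=(32, 64, 128, 256), kernel=3, pool=2):
    """Return the (height, width, channels) after each conv + pool pair."""
    shapes = []
    for n_filters in filters:
        size = size - kernel + 1   # 'valid' convolution shrinks by kernel - 1
        size = size // pool        # max pooling divides the size, rounding down
        shapes.append((size, size, n_filters))
    return shapes


shapes = trace_shapes()
print(shapes)
# → [(63, 63, 32), (30, 30, 64), (14, 14, 128), (6, 6, 256)]

# number of values produced by Flatten()
height, width, channels = shapes[-1]
print(height * width * channels)
# → 9216
```

Each of the two Dense(256) branches therefore receives a vector of 9216 features.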



Now we train the model

# train model
history = model.fit(x=X, y=[y_gender, y_age], batch_size=32, epochs=30, validation_split=0.2)

Epoch 1/30
2022-03-20 12:29:52.907433: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005
593/593 [==============================] - 18s 17ms/step - loss: 16.1346 - gender_out_loss: 0.6821 - age_out_loss: 15.4525 - gender_out_accuracy: 0.5472 - age_out_accuracy: 0.0476 - val_loss: 12.7578 - val_gender_out_loss: 0.5521 - val_age_out_loss: 12.2057 - val_gender_out_accuracy: 0.7269 - val_age_out_accuracy: 0.0460
Epoch 2/30
593/593 [==============================] - 9s 16ms/step - loss: 11.2216 - gender_out_loss: 0.4761 - age_out_loss: 10.7455 - gender_out_accuracy: 0.7741 - age_out_accuracy: 0.0285 - val_loss: 11.5279 - val_gender_out_loss: 0.4163 - val_age_out_loss: 11.1116 - val_gender_out_accuracy: 0.8064 - val_age_out_accuracy: 0.0255
Epoch 3/30
593/593 [==============================] - 10s 16ms/step - loss: 9.3465 - gender_out_loss: 0.3925 - age_out_loss: 8.9540 - gender_out_accuracy: 0.8214 - age_out_accuracy: 0.0157 - val_loss: 8.4260 - val_gender_out_loss: 0.3558 - val_age_out_loss: 8.0702 - val_gender_out_accuracy: 0.8361 - val_age_out_accuracy: 0.0074
Epoch 4/30
593/593 [==============================] - 9s 16ms/step - loss: 8.5809 - gender_out_loss: 0.3446 - age_out_loss: 8.2363 - gender_out_accuracy: 0.8416 - age_out_accuracy: 0.0119 - val_loss: 8.5080 - val_gender_out_loss: 0.3214 - val_age_out_loss: 8.1866 - val_gender_out_accuracy: 0.8541 - val_age_out_accuracy: 0.0078
Epoch 5/30
593/593 [==============================] - 9s 16ms/step - loss: 8.0615 - gender_out_loss: 0.3149 - age_out_loss: 7.7466 - gender_out_accuracy: 0.8602 - age_out_accuracy: 0.0109 - val_loss: 7.5080 - val_gender_out_loss: 0.3134 - val_age_out_loss: 7.1946 - val_gender_out_accuracy: 0.8551 - val_age_out_accuracy: 0.0076
Epoch 6/30
593/593 [==============================] - 10s 16ms/step - loss: 7.6047 - gender_out_loss: 0.2935 - age_out_loss: 7.3112 - gender_out_accuracy: 0.8672 - age_out_accuracy: 0.0096 - val_loss: 7.5676 - val_gender_out_loss: 0.2822 - val_age_out_loss: 7.2854 - val_gender_out_accuracy: 0.8747 - val_age_out_accuracy: 0.0063
Epoch 7/30
593/593 [==============================] - 9s 15ms/step - loss: 7.2906 - gender_out_loss: 0.2782 - age_out_loss: 7.0124 - gender_out_accuracy: 0.8743 - age_out_accuracy: 0.0080 - val_loss: 7.1280 - val_gender_out_loss: 0.2800 - val_age_out_loss: 6.8480 - val_gender_out_accuracy: 0.8739 - val_age_out_accuracy: 0.0049
Epoch 8/30
593/593 [==============================] - 9s 16ms/step - loss: 6.9194 - gender_out_loss: 0.2654 - age_out_loss: 6.6540 - gender_out_accuracy: 0.8818 - age_out_accuracy: 0.0072 - val_loss: 8.0823 - val_gender_out_loss: 0.2770 - val_age_out_loss: 7.8053 - val_gender_out_accuracy: 0.8766 - val_age_out_accuracy: 0.0049
Epoch 9/30
593/593 [==============================] - 9s 15ms/step - loss: 6.6902 - gender_out_loss: 0.2507 - age_out_loss: 6.4395 - gender_out_accuracy: 0.8903 - age_out_accuracy: 0.0064 - val_loss: 7.1591 - val_gender_out_loss: 0.2882 - val_age_out_loss: 6.8709 - val_gender_out_accuracy: 0.8707 - val_age_out_accuracy: 0.0032
Epoch 10/30
593/593 [==============================] - 10s 16ms/step - loss: 6.4238 - gender_out_loss: 0.2404 - age_out_loss: 6.1834 - gender_out_accuracy: 0.8941 - age_out_accuracy: 0.0063 - val_loss: 7.0038 - val_gender_out_loss: 0.2649 - val_age_out_loss: 6.7389 - val_gender_out_accuracy: 0.8842 - val_age_out_accuracy: 0.0051
Epoch 11/30
593/593 [==============================] - 10s 16ms/step - loss: 6.2591 - gender_out_loss: 0.2276 - age_out_loss: 6.0316 - gender_out_accuracy: 0.9011 - age_out_accuracy: 0.0063 - val_loss: 6.8535 - val_gender_out_loss: 0.2642 - val_age_out_loss: 6.5894 - val_gender_out_accuracy: 0.8876 - val_age_out_accuracy: 0.0027
Epoch 12/30
593/593 [==============================] - 10s 16ms/step - loss: 5.9888 - gender_out_loss: 0.2179 - age_out_loss: 5.7709 - gender_out_accuracy: 0.9047 - age_out_accuracy: 0.0072 - val_loss: 6.8253 - val_gender_out_loss: 0.2690 - val_age_out_loss: 6.5562 - val_gender_out_accuracy: 0.8851 - val_age_out_accuracy: 0.0049
Epoch 13/30
593/593 [==============================] - 10s 16ms/step - loss: 5.7775 - gender_out_loss: 0.2075 - age_out_loss: 5.5700 - gender_out_accuracy: 0.9118 - age_out_accuracy: 0.0059 - val_loss: 7.1583 - val_gender_out_loss: 0.2630 - val_age_out_loss: 6.8953 - val_gender_out_accuracy: 0.8876 - val_age_out_accuracy: 0.0036
Epoch 14/30
593/593 [==============================] - 9s 15ms/step - loss: 5.4795 - gender_out_loss: 0.1951 - age_out_loss: 5.2844 - gender_out_accuracy: 0.9160 - age_out_accuracy: 0.0054 - val_loss: 6.8055 - val_gender_out_loss: 0.2790 - val_age_out_loss: 6.5264 - val_gender_out_accuracy: 0.8838 - val_age_out_accuracy: 0.0034
Epoch 15/30
593/593 [==============================] - 9s 16ms/step - loss: 5.4528 - gender_out_loss: 0.1831 - age_out_loss: 5.2697 - gender_out_accuracy: 0.9243 - age_out_accuracy: 0.0057 - val_loss: 6.9825 - val_gender_out_loss: 0.2813 - val_age_out_loss: 6.7012 - val_gender_out_accuracy: 0.8882 - val_age_out_accuracy: 0.0034
Epoch 16/30
593/593 [==============================] - 9s 15ms/step - loss: 5.1696 - gender_out_loss: 0.1693 - age_out_loss: 5.0004 - gender_out_accuracy: 0.9304 - age_out_accuracy: 0.0051 - val_loss: 6.9286 - val_gender_out_loss: 0.2636 - val_age_out_loss: 6.6650 - val_gender_out_accuracy: 0.8817 - val_age_out_accuracy: 0.0044
Epoch 17/30
593/593 [==============================] - 10s 16ms/step - loss: 5.0638 - gender_out_loss: 0.1638 - age_out_loss: 4.9000 - gender_out_accuracy: 0.9312 - age_out_accuracy: 0.0054 - val_loss: 6.9296 - val_gender_out_loss: 0.2813 - val_age_out_loss: 6.6483 - val_gender_out_accuracy: 0.8956 - val_age_out_accuracy: 0.0042
Epoch 18/30
593/593 [==============================] - 9s 15ms/step - loss: 4.8813 - gender_out_loss: 0.1494 - age_out_loss: 4.7319 - gender_out_accuracy: 0.9396 - age_out_accuracy: 0.0053 - val_loss: 6.9294 - val_gender_out_loss: 0.2971 - val_age_out_loss: 6.6323 - val_gender_out_accuracy: 0.8880 - val_age_out_accuracy: 0.0040
Epoch 19/30
593/593 [==============================] - 9s 15ms/step - loss: 4.8204 - gender_out_loss: 0.1428 - age_out_loss: 4.6776 - gender_out_accuracy: 0.9414 - age_out_accuracy: 0.0057 - val_loss: 6.9242 - val_gender_out_loss: 0.3056 - val_age_out_loss: 6.6185 - val_gender_out_accuracy: 0.8832 - val_age_out_accuracy: 0.0042
Epoch 20/30
593/593 [==============================] - 9s 16ms/step - loss: 4.6624 - gender_out_loss: 0.1350 - age_out_loss: 4.5274 - gender_out_accuracy: 0.9444 - age_out_accuracy: 0.0057 - val_loss: 7.0920 - val_gender_out_loss: 0.3745 - val_age_out_loss: 6.7175 - val_gender_out_accuracy: 0.8699 - val_age_out_accuracy: 0.0070
Epoch 21/30
593/593 [==============================] - 9s 16ms/step - loss: 4.5481 - gender_out_loss: 0.1267 - age_out_loss: 4.4214 - gender_out_accuracy: 0.9491 - age_out_accuracy: 0.0068 - val_loss: 6.9295 - val_gender_out_loss: 0.3286 - val_age_out_loss: 6.6009 - val_gender_out_accuracy: 0.8865 - val_age_out_accuracy: 0.0032
Epoch 22/30
593/593 [==============================] - 9s 16ms/step - loss: 4.4753 - gender_out_loss: 0.1239 - age_out_loss: 4.3514 - gender_out_accuracy: 0.9497 - age_out_accuracy: 0.0054 - val_loss: 7.0483 - val_gender_out_loss: 0.3409 - val_age_out_loss: 6.7075 - val_gender_out_accuracy: 0.8918 - val_age_out_accuracy: 0.0070
Epoch 23/30
593/593 [==============================] - 9s 16ms/step - loss: 4.4120 - gender_out_loss: 0.1102 - age_out_loss: 4.3018 - gender_out_accuracy: 0.9540 - age_out_accuracy: 0.0090 - val_loss: 6.9948 - val_gender_out_loss: 0.3285 - val_age_out_loss: 6.6663 - val_gender_out_accuracy: 0.8895 - val_age_out_accuracy: 0.0105
Epoch 24/30
593/593 [==============================] - 10s 16ms/step - loss: 4.2673 - gender_out_loss: 0.1059 - age_out_loss: 4.1614 - gender_out_accuracy: 0.9583 - age_out_accuracy: 0.0193 - val_loss: 7.0131 - val_gender_out_loss: 0.3328 - val_age_out_loss: 6.6803 - val_gender_out_accuracy: 0.8897 - val_age_out_accuracy: 0.0243
Epoch 25/30
593/593 [==============================] - 9s 15ms/step - loss: 4.1578 - gender_out_loss: 0.1024 - age_out_loss: 4.0553 - gender_out_accuracy: 0.9582 - age_out_accuracy: 0.0264 - val_loss: 6.8706 - val_gender_out_loss: 0.3361 - val_age_out_loss: 6.5345 - val_gender_out_accuracy: 0.8958 - val_age_out_accuracy: 0.0287
Epoch 26/30
593/593 [==============================] - 9s 15ms/step - loss: 4.0662 - gender_out_loss: 0.0933 - age_out_loss: 3.9730 - gender_out_accuracy: 0.9611 - age_out_accuracy: 0.0299 - val_loss: 7.2064 - val_gender_out_loss: 0.3738 - val_age_out_loss: 6.8326 - val_gender_out_accuracy: 0.8918 - val_age_out_accuracy: 0.0266
Epoch 27/30
593/593 [==============================] - 9s 15ms/step - loss: 4.0040 - gender_out_loss: 0.0851 - age_out_loss: 3.9189 - gender_out_accuracy: 0.9644 - age_out_accuracy: 0.0311 - val_loss: 7.1397 - val_gender_out_loss: 0.4333 - val_age_out_loss: 6.7064 - val_gender_out_accuracy: 0.8903 - val_age_out_accuracy: 0.0331
Epoch 28/30
593/593 [==============================] - 10s 16ms/step - loss: 3.9340 - gender_out_loss: 0.0848 - age_out_loss: 3.8492 - gender_out_accuracy: 0.9641 - age_out_accuracy: 0.0344 - val_loss: 7.0291 - val_gender_out_loss: 0.4004 - val_age_out_loss: 6.6287 - val_gender_out_accuracy: 0.8889 - val_age_out_accuracy: 0.0251
Epoch 29/30
593/593 [==============================] - 9s 16ms/step - loss: 3.9378 - gender_out_loss: 0.0819 - age_out_loss: 3.8559 - gender_out_accuracy: 0.9659 - age_out_accuracy: 0.0329 - val_loss: 6.9958 - val_gender_out_loss: 0.3569 - val_age_out_loss: 6.6389 - val_gender_out_accuracy: 0.8895 - val_age_out_accuracy: 0.0346
Epoch 30/30
593/593 [==============================] - 9s 15ms/step - loss: 3.8026 - gender_out_loss: 0.0774 - age_out_loss: 3.7252 - gender_out_accuracy: 0.9671 - age_out_accuracy: 0.0341 - val_loss: 7.0322 - val_gender_out_loss: 0.4161 - val_age_out_loss: 6.6161 - val_gender_out_accuracy: 0.8920 - val_age_out_accuracy: 0.0259

  • Set the no. of epochs and batch size according to the hardware specifications

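  • Training accuracy for gender rises steadily to 0.9671, while validation accuracy plateaus at roughly 0.88 to 0.90 after epoch 10

  • Training loss decreases steadily, but validation loss stops improving at about 7.0 after epoch 10 and the gender validation loss rises from 0.2630 (epoch 13) to 0.4161 (epoch 30), which indicates overfitting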
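
One way to read a long training log is to look for the epoch with the lowest validation loss. The sketch below does this for a few `val_gender_out_loss` values copied from the log above; the helper name `best_epoch` is hypothetical. With a real run, the full list is available as `history.history['val_gender_out_loss']`.

```python
def best_epoch(val_losses):
    """Return the 1-based position of the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# val_gender_out_loss for epochs 10 to 16, copied from the log above
val_gender_loss = [0.2649, 0.2642, 0.2690, 0.2630, 0.2790, 0.2813, 0.2636]
first_epoch = 10

# shift the position so that it refers to the actual epoch number
epoch = best_epoch(val_gender_loss) + first_epoch - 1
print(epoch)  # → 13
```

When the validation loss bottoms out well before the final epoch, as it does here, training for fewer epochs or using the Keras `EarlyStopping` callback is a common option.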



Plot the Results


# plot results for gender
acc = history.history['gender_out_accuracy']
val_acc = history.history['val_gender_out_accuracy']
epochs = range(len(acc))

plt.plot(epochs, acc, 'b', label='Training Accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation Accuracy')
plt.title('Accuracy Graph')
plt.legend()
plt.figure()

loss = history.history['gender_out_loss']
val_loss = history.history['val_gender_out_loss']

plt.plot(epochs, loss, 'b', label='Training Loss')
plt.plot(epochs, val_loss, 'r', label='Validation Loss')
plt.title('Loss Graph')
plt.legend()
plt.show()



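  • Gender accuracy: about 90% on the validation split (best value in the log: 0.8958)

  • Age MAE: about 6.5 years on the validation split (best value in the log: 6.5264)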



# plot results for age
loss = history.history['age_out_loss']
val_loss = history.history['val_age_out_loss']
epochs = range(len(loss