Hackers Realm

Urban Sound Analysis using Python | Classification | Deep Learning Project Tutorial

Urban Sound Analysis is a deep learning classification project. The objective is to analyze sound data and assign each clip to a class. The same approach can be applied to other sound-based recognition tasks such as speech, music, or songs.



In this project tutorial we will analyze various audio files, classify each one into its corresponding class, and visualize the sounds as waveform plots.



You can watch the step by step explanation video tutorial down below


Dataset Information

This dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes:

  • air_conditioner

  • car_horn

  • children_playing

  • dog_bark

  • drilling

  • engine_idling

  • gun_shot

  • jackhammer

  • siren

  • street_music


Download the dataset here



Mounting Drive


We are mounting the sound dataset from Google Drive

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

  • The files must be uploaded to your Google Drive account for this to work.

  • An authorization link is provided; click it to obtain the authorization code, then paste the code into the input box.

Let us verify which directory we are working in

!pwd

/content


Unzip data


Now we unzip the train dataset from the drive

!unzip 'drive/MyDrive/Colab Notebooks/train.zip'
  • The dataset file is around 3GB

Streaming output truncated to the last 5000 lines.

inflating: Train/1674.wav
inflating: Train/1675.wav
inflating: Train/1677.wav
inflating: Train/1678.wav
inflating: Train/1679.wav
inflating: Train/168.wav
inflating: Train/1680.wav
inflating: Train/1681.wav
inflating: Train/1686.wav
inflating: Train/1687.wav

...

  • Only 10 of the extracted sound files are listed here, to keep the output short
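
The shell unzip command prints one line per extracted file. As an alternative sketch, Python's standard zipfile module can extract the archive quietly and report only the audio files it contained. The function name extract_wavs is illustrative and not part of the original notebook.

```python
import zipfile
from pathlib import Path


def extract_wavs(zip_path, dest_dir="."):
    """Extract an archive and return the sorted .wav paths it contained."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Keep only audio files; directories and other files are ignored.
    return sorted(dest / name for name in names if name.lower().endswith(".wav"))


# Hypothetical usage in Colab (path taken from the unzip command above):
# wavs = extract_wavs('drive/MyDrive/Colab Notebooks/train.zip')
# print(len(wavs), 'audio files extracted')
```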



Import modules

import pandas as pd
import numpy as np
import librosa
import librosa.display
import glob
import IPython.display as ipd
import random
%pylab inline

import warnings
warnings.filterwarnings('ignore')
  • pandas - used to perform data manipulation and analysis

  • numpy - used to perform a wide variety of mathematical operations on arrays

  • librosa - used to analyze music and sound files

  • librosa.display - used to display sound data as images

  • glob - used to find all pathnames matching a specific pattern

  • IPython.display - used to display and hear the audio

  • random - used for randomizing

  • %pylab inline - enables inline plotting and makes plotting names such as plt available in the notebook

  • warnings - used to control warning messages; here all warnings are suppressed



Loading the dataset


Now we load the dataset for processing

df = pd.read_csv('Urban Sound Dataset.csv')
df.head()
  • ID - Name of the audio file

  • Class - Name of the output class the audio file belongs to
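
Since every ID in the CSV is later turned into a file path, it can help to confirm that each audio file actually exists before processing. The following is a minimal sketch, assuming files are named '<ID>.wav' inside the Train folder as in this tutorial; the helper name missing_audio_files is hypothetical.

```python
import os


def missing_audio_files(ids, audio_dir="Train"):
    """Return the IDs whose '<audio_dir>/<id>.wav' file does not exist."""
    return [
        file_id
        for file_id in ids
        if not os.path.isfile(os.path.join(audio_dir, f"{file_id}.wav"))
    ]


# Hypothetical usage with the DataFrame loaded above:
# print(missing_audio_files(df['ID']))
```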


Let us display an audio file

ipd.Audio('Train/1.wav')
  • Displays an audio player for the selected file from the data



Exploratory Data Analysis


In this step we will visualize different audio samples from the data through wave plots.


We will load the audio file into an array

data, sampling_rate = librosa.load('Train/1.wav')
  • sampling_rate - number of samples per second of audio


Now we will view the data array

data

array([-0.09602016, -0.14303702, 0.05203498, ..., -0.01646687, -0.00915894, 0.09742922], dtype=float32)

  • The audio file is loaded as an array of floating-point values

  • Each value is the amplitude of the waveform at one sample point, not a frequency


Next we will view the sampling rate

sampling_rate

22050

  • The output value is the number of samples per second; librosa.load resamples audio to 22050 Hz by default
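
The relationship between the data array and the sampling rate can be sketched with a synthetic signal: the clip duration in seconds is the number of samples divided by the sampling rate. The 440 Hz tone and the 4-second length below are assumptions chosen for illustration, standing in for a real recording.

```python
import numpy as np

sampling_rate = 22050          # samples per second, as in the output above
duration_s = 4.0               # assumed clip length for this illustration

# Synthesize a 440 Hz sine tone in place of a real recording.
t = np.arange(int(sampling_rate * duration_s)) / sampling_rate
signal = 0.5 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

n_samples = len(signal)
print(n_samples)                    # number of amplitude values
print(n_samples / sampling_rate)    # clip duration in seconds
print(float(np.abs(signal).max()))  # peak amplitude
```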



Now we plot some graphs of the audio files

plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)
  • figsize=(12,4) - size of the plot graph

  • librosa.display.waveplot() - displays a waveplot of the data at the given sampling rate; in librosa 0.10 and later, use librosa.display.waveshow() instead
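
librosa 0.9 introduced waveshow and later releases removed waveplot, so the plotting call depends on the installed version. One way to keep the notebook working on either is to pick whichever function is available. This is a sketch only; the helper name pick_wave_plotter is hypothetical.

```python
def pick_wave_plotter(display_module):
    """Return display_module.waveshow if it exists, otherwise waveplot."""
    plotter = getattr(display_module, "waveshow", None)
    if plotter is None:
        plotter = getattr(display_module, "waveplot")
    return plotter


# Hypothetical usage in the notebook:
# plot_wave = pick_wave_plotter(librosa.display)
# plt.figure(figsize=(12, 4))
# plot_wave(data, sr=sampling_rate)
```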



index = random.choice(df.index)

print('Class:', df['Class'][index])
data, sampling_rate = librosa.load('Train/'+str(df['ID'][index]) + '.wav')

plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)

Class: dog_bark

  • A randomly picked audio file from the training data

  • librosa.load('Train/'+str(df['ID'][index]) + '.wav') - builds the full file path from the ID and appends the .wav extension

  • The graph shows the waveplot of a dog bark from the data
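
Because the index is chosen at random, each run shows a different sample. If a repeatable pick is wanted, a seeded random generator can be used. This is a minimal sketch; the helper name pick_index and the seed value are assumptions for illustration.

```python
import random


def pick_index(indices, seed=None):
    """Pick one index at random; a fixed seed makes the pick repeatable."""
    rng = random.Random(seed)
    return rng.choice(list(indices))


# Hypothetical usage with the DataFrame from above:
# index = pick_index(df.index, seed=42)
```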



index = random.choice(df.index)

print('Class:', df['Class'][index])
data, sampling_rate = librosa.load('Train/'+str(df['ID'][index]) + '.wav')

plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)

Class: gun_shot

  • A different audio file, again picked at random

  • The graph shows the waveplot of a gun shot sample



index = random.choice(df.index)

print('Class:', df['Class'][index])
data, sampling_rate = librosa.load('Train/'+str(df['ID'][index]) + '.wav')

plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)

Class: car_horn

  • The graph shows the waveplot of a car horn sample



Now we will view the class distribution in the dataset

import seaborn as sns
plt.figure(figsize=(12,7))
sns.countplot(x=df['Class'])
  • seaborn - a visualization library built on top of matplotlib

  • The bar graph shows the number of samples in each class.
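
The numbers behind the bar graph can also be computed directly. The sketch below counts labels with NumPy on a small toy list standing in for df['Class']; the function name class_counts is illustrative.

```python
import numpy as np


def class_counts(labels):
    """Return {class_name: count}, sorted from most to least frequent."""
    names, counts = np.unique(np.asarray(labels), return_counts=True)
    order = np.argsort(-counts, kind="stable")
    return {str(names[i]): int(counts[i]) for i in order}


# Toy labels standing in for df['Class']:
toy = ["siren", "dog_bark", "siren", "car_horn", "siren", "dog_bark"]
print(class_counts(toy))
```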



Input Split


The data is currently stored as audio files. We need to load each file into an array and extract features from it, so that the input and output data can be fed directly to the model.


import os

def parser(row):
    # path of the file
    file_name = os.path.join('Train', str(row.ID) + '.wav')
    # load the audio file
    x, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
    # extract features from the data
    mfccs = np.mean(librosa.feature.mfcc(y=x, sr=sample_rate, n_mfcc=40).T, axis=0)
    
    feature = mfccs
    label = row.Class
    
    return [feature, label]
  • import os - used to build file paths

  • res_type='kaiser_fast' - selects a faster resampling method, which speeds up loading

  • librosa.feature.mfcc() - extracts Mel-frequency cepstral coefficients (MFCCs) from the audio; the 40 coefficients are averaged over time into one vector per file

  • feature - array of the features extracted from the data

  • label - name of the class of the extracted data
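
The averaging step in parser() can be illustrated without any audio. The MFCC output is a matrix with one row per coefficient and one column per time frame; transposing it and taking the mean over axis 0 gives one value per coefficient. In this sketch a random matrix stands in for the real MFCC output, and the frame count of 173 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for librosa.feature.mfcc output: 40 coefficients x 173 frames.
# The frame count depends on clip length; 173 is an arbitrary choice here.
mfcc_matrix = rng.normal(size=(40, 173))

# Same reduction as in parser(): transpose, then average over frames.
feature = np.mean(mfcc_matrix.T, axis=0)

print(mfcc_matrix.shape)  # (40, 173)
print(feature.shape)      # (40,)
```

Averaging over time gives every clip a feature vector of the same length regardless of its duration, which is what lets a fixed-size Dense input layer be used later.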



Now we will extract the features from every audio file for training

data = df.apply(parser, axis=1)
data.columns = ['feature','label']
  • Each element of data is a [feature, label] pair: the extracted features and the corresponding class label


data[0]

[array([-82.12358939, 139.50591598, -42.43086489, 24.82786139, -11.62076447, 23.49708426, -12.19458986, 25.89713885, -9.40527728, 21.21042898, -7.36882138, 14.25433903, -8.67870015, 7.75023765, -10.1241154 , 3.2581183 , -11.35261914, 2.80096779, -7.04601346, 3.91331351, -2.3349743 , 2.01242254, -2.79394367, 4.12927394, -1.62076864, 4.32620082, -1.03440959, -1.23297714, -3.11085341, 0.32044827, -1.787786 , 0.44295495, -1.79164752, -0.76361758, -1.24246428, -0.27664012, 0.65718559, -0.50237115, -2.60428533, -1.05346291]), 'siren']

  • The first element is the array of 40 averaged MFCC features

  • The second element is the class label


Now we split the data into input features and output labels

# input split
X = np.array(list(zip(*data))[0])
y = np.array(list(zip(*data))[1])
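
The zip(*data) idiom regroups a sequence of [feature, label] pairs into one tuple of all features and one tuple of all labels. A small sketch with toy rows (the values are made up for illustration) shows the resulting shapes:

```python
import numpy as np

# Toy rows in the same [feature, label] layout that parser() returns.
rows = [
    [np.array([0.1, 0.2, 0.3]), "siren"],
    [np.array([0.4, 0.5, 0.6]), "dog_bark"],
]

# zip(*rows) regroups the rows into (all features), (all labels).
features, labels = zip(*rows)
X_toy = np.array(features)
y_toy = np.array(labels)

print(X_toy.shape)  # (2, 3)
print(y_toy)
```

Unpacking both tuples in one statement also avoids running zip over the data twice.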


Label encoder


We will transform the 10 class labels from text attributes to numerical attributes

from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

le = LabelEncoder()
y = to_categorical(le.fit_transform(y))
  • LabelEncoder maps each class name to an integer, and to_categorical turns each integer into a one-hot row with one column per class


y.shape

(5435, 10)

  • Shape of the data set for training, indicating 5435 samples of training data with 10 classes


y[0]

array([0., 0., 0., 0., 0., 0., 0., 0., 1., 0.], dtype=float32)

  • A single sample of the labels in one-hot form, with one column per class

  • The column of the sample's class is set to 1 and all other columns are 0
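
The two encoding steps can be reproduced with plain NumPy to see where the 1 ends up. This sketch assumes that integer codes are assigned in sorted order of the class names, which is how LabelEncoder orders its classes; the three sample labels are made up.

```python
import numpy as np

classes = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]
labels = np.array(["siren", "dog_bark", "siren"])

# Label encoding: position of each label in the sorted list of class names.
sorted_classes = np.array(sorted(classes))
encoded = np.searchsorted(sorted_classes, labels)

# One-hot encoding: pick the matching rows of an identity matrix.
one_hot = np.eye(len(sorted_classes), dtype=np.float32)[encoded]

print(encoded)     # integer code per label
print(one_hot[0])  # 1.0 at the position of 'siren'
```

With the ten class names sorted alphabetically, 'siren' is at position 8, which matches the position of the 1 in the y[0] output shown above.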



Model Training


Let us create the model for training

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten

num_classes = 10

# model creation
model = Sequential()

model.add(Dense(256, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.3))

model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.3))

model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.3))

model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
  • Dense - fully connected layer

  • Dropout - regularization layer that randomly drops a fraction of the units during training to reduce overfitting

  • Activation - applies an activation function to the output of the previous layer

  • Flatten - converts a multi-dimensional input into a 1D array (imported here but not used in this model)

  • loss='categorical_crossentropy' - loss function for one-hot encoded multi-class labels; it is the quantity minimized by gradient descent

  • optimizer='adam' - adapts the learning rate for each parameter during training
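
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input-output pair plus one bias per unit, while the Activation and Dropout layers add none. The sketch below does the arithmetic for the layer sizes used above; the helper name dense_params is illustrative.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


layer_sizes = [40, 256, 512, 256, 10]  # input, three hidden layers, output

per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

print(per_layer)       # parameters in each Dense layer
print(sum(per_layer))  # total trainable parameters
```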



Now we will train the model

# train the model
model.fit(X, y, batch_size=32, epochs=100, validation_split=0.25)
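
As a rough check on what to expect from this call, the training and validation sizes and the number of steps per epoch can be estimated from the sample count and batch size. The sketch assumes the 5435 samples reported by y.shape and that validation_split holds out the final fraction of the rows; the helper names are hypothetical. A log produced with a different dataset size or batch size will report a different step count.

```python
import math


def split_sizes(n_samples, validation_split):
    """Return (train, validation) sizes when the last fraction is held out."""
    n_train = int(n_samples * (1.0 - validation_split))
    return n_train, n_samples - n_train


def steps_per_epoch(n_train, batch_size):
    """Number of batches needed to cover the training samples once."""
    return math.ceil(n_train / batch_size)


n_train, n_val = split_sizes(5435, 0.25)
print(n_train, n_val)
print(steps_per_epoch(n_train, 32))
```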

Epoch 1/30

1149/1149 [==============================] - 10s 3ms/step - loss: 0.4816 - accuracy: 0.8475 - val_loss: 0.1202 - val_accuracy: 0.9637

Epoch 2/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.1336 - accuracy: 0.9605 - val_loss: 0.0848 - val_accuracy: 0.9743

Epoch 3/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0863 - accuracy: 0.9732 - val_loss: 0.0807 - val_accuracy: 0.9742

Epoch 4/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0685 - accuracy: 0.9783 - val_loss: 0.0734 - val_accuracy: 0.9788

Epoch 5/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0543 - accuracy: 0.9825 - val_loss: 0.0690 - val_accuracy: 0.9809

Epoch 6/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0461 - accuracy: 0.9844 - val_loss: 0.0684 - val_accuracy: 0.9808

Epoch 7/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0360 - accuracy: 0.9873 - val_loss: 0.0743 - val_accuracy: 0.9798

Epoch 8/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0318 - accuracy: 0.9884 - val_loss: 0.0733 - val_accuracy: 0.9811

Epoch 9/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0319 - accuracy: 0.9891 - val_loss: 0.0658 - val_accuracy: 0.9838

Epoch 10/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0242 - accuracy: 0.9919 - val_loss: 0.0728 - val_accuracy: 0.9827

Epoch 11/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0218 - accuracy: 0.9926 - val_loss: 0.0815 - val_accuracy: 0.9818

Epoch 12/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0286 - accuracy: 0.9895 - val_loss: 0.0766 - val_accuracy: 0.9829

Epoch 13/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0199 - accuracy: 0.9928 - val_loss: 0.0762 - val_accuracy: 0.9820

Epoch 14/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0239 - accuracy: 0.9918 - val_loss: 0.0754 - val_accuracy: 0.9836

Epoch 15/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0160 - accuracy: 0.9938 - val_loss: 0.0865 - val_accuracy: 0.9820



Epoch 16/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0196 - accuracy: 0.9935 - val_loss: 0.0842 - val_accuracy: 0.9822

Epoch 17/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0152 - accuracy: 0.9951 - val_loss: 0.0825 - val_accuracy: 0.9828

Epoch 18/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0155 - accuracy: 0.9943 - val_loss: 0.0889 - val_accuracy: 0.9817

Epoch 19/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0207 - accuracy: 0.9930 - val_loss: 0.0886 - val_accuracy: 0.9822

Epoch 20/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0122 - accuracy: 0.9955 - val_loss: 0.0958 - val_accuracy: 0.9822

Epoch 21/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0135 - accuracy: 0.9957 - val_loss: 0.0986 - val_accuracy: 0.9824

Epoch 22/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0166 - accuracy: 0.9949 - val_loss: 0.0987 - val_accuracy: 0.9824

Epoch 23/30