Hackers Realm

Jul 29, 2022 · 4 min read

Traffic Forecast using Python | Time Series Analysis | FbProphet | Machine Learning Tutorial

Updated: May 30, 2023

Traffic forecasting is the task of predicting traffic volumes from historical speed and volume data, here with the help of Time Series Analysis in Python. It is an integral part of designing road facilities, from the investment feasibility study through to the working documentation. You can also apply Time Series Analysis to the stock market, product sales, item demand, etc.

Traffic Forecast - Time Series Analysis

In this tutorial, we will load and analyze a data set from a transport company, preprocess the data, fit a prediction model to forecast the traffic, and visualize the results through graphs.

You can watch the video-based tutorial with step by step explanation down below.

Dataset Information

Investors are considering making an investment in a new form of transportation - JetRail. JetRail uses jet propulsion technology to run rails and move people at high speed! While JetRail has mastered the technology and holds the patent for its product, the investment would only make sense if it can get more than 1 million monthly users within the next 18 months.

You need to help Unicorn Ventures with the decision. They usually invest in B2C start-ups less than 4 years old that are looking for pre-series A funding. To help Unicorn Ventures decide, you need to forecast the traffic on JetRail for the next 7 months.

Download the Dataset here

Import Modules

Let us import all the basic modules we will be needing for this project.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from fbprophet import Prophet

  • pandas - used to perform data manipulation and analysis

  • numpy - used to perform a wide variety of mathematical operations on arrays

  • matplotlib - used for data visualization and graphical plotting

  • %matplotlib inline - Jupyter magic command that renders the plots inside the notebook

  • warnings - used to control how warning messages are handled

  • filterwarnings('ignore') - suppresses the warnings raised by the modules (gives cleaner output)

  • Prophet - the forecasting model class used for the Time Series Analysis

  • You must install fbprophet for the import to work
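From version 1.0 onward the library is published under the name prophet rather than fbprophet, so the import above may fail on a fresh install. Below is a minimal sketch of an import that accepts either name. The helper load_prophet is hypothetical and written only for this illustration; it is not part of the library.

```python
import importlib


def load_prophet():
    """Return the Prophet class from whichever package name is installed, else None."""
    # Try the current package name first, then the legacy one.
    for module_name in ("prophet", "fbprophet"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        return getattr(module, "Prophet", None)
    return None


Prophet = load_prophet()
if Prophet is None:
    print("Prophet is not installed; try `pip install prophet`.")
```

If the helper returns a class, the rest of the tutorial can use Prophet exactly as written.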
     

Loading the Dataset

df = pd.read_csv('Traffic data.csv')
 
df.head()

  • Here, we display the first five rows of the data set

  • The ID column is not needed for this tutorial, so it can be left out

df

Traffic Forecast Dataset
  • This displays the whole data frame; the notebook truncates the output for viewing purposes

  • The data was collected between 2012 and 2014

Preprocessing the dataset

# check null values
 
df.isnull().sum()

  • As we can see, there are no null values in the data set, so no imputation is needed

  • If the data set contained any null values, you would need to fill them using an imputation technique so they don't affect the results
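This data set needs no imputation, but here is a small sketch of what it could look like on a time series. The values are made up for the illustration, and linear interpolation is only one of several reasonable choices (forward fill or a seasonal average are others).

```python
import numpy as np
import pandas as pd

# Hypothetical hourly counts with two missing readings (not taken from the real data).
sample = pd.DataFrame({
    "Datetime": pd.date_range("2012-08-25", periods=6, freq=pd.Timedelta(hours=1)),
    "Count": [8.0, np.nan, 6.0, 2.0, np.nan, 4.0],
})

# Fill each gap with the value halfway between its two neighbours.
filled = sample.assign(Count=sample["Count"].interpolate(method="linear"))

print(filled["Count"].isnull().sum())  # number of nulls left after imputation
```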
     

df.info()

  • Datetime is the column we want to convert, so we check its current data type (object) to know how to process it

# convert object to datetime datatype
 
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%d-%m-%Y %H:%M')
 
df.info()

  • The Datetime column is now converted from object to a proper datetime data type
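The explicit format string matters because the dates are written day first. Here is a short sketch on two illustrative strings in that layout (they are examples, not rows quoted from the file), showing that the second one is read as 1 September and not 9 January.

```python
import pandas as pd

# Illustrative day-first timestamps in the '%d-%m-%Y %H:%M' layout.
raw = pd.Series(["25-08-2012 00:00", "01-09-2012 13:00"])

# With an explicit format there is no guessing about day/month order.
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M")

print(parsed.dt.month.tolist())
print(parsed.dt.day.tolist())
```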
     

# EDA: plot the raw traffic count over time
plt.figure(figsize=(10,7))
plt.plot(df['Datetime'], df['Count'])
plt.xlabel('Datetime')
plt.ylabel('Count')
plt.show()

Visualization of past traffic data
  • Shows how the traffic grew over the datetime range covered by the data

Format data for the model

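# use the timestamp as the index so the rows can be resampled by date
df.index = df['Datetime']
# Prophet expects the target column to be named 'y'
df['y'] = df['Count']
df.drop(columns=['ID', 'Datetime', 'Count'], inplace=True)
# combine all rows of the same day into one daily total
df = df.resample('D').sum()
df.head()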

  • This step sums the counts of all rows that share the same date, leaving one row per day

df['ds'] = df.index
 
df.head()

  • Prophet expects the dates in a column named 'ds' and the values to predict in a column named 'y'
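The two formatting steps can also be written as one chain. The sketch below uses a tiny made-up frame (48 hourly rows) in place of the real file, so the numbers are only there to show the shape of the result.

```python
import pandas as pd

# Hypothetical stand-in for the raw data: two days of hourly counts.
hourly = pd.DataFrame({
    "Datetime": pd.date_range("2012-08-25", periods=48, freq=pd.Timedelta(hours=1)),
    "Count": [2] * 24 + [3] * 24,
})

daily = (
    hourly.set_index("Datetime")["Count"]
    .resample("D").sum()   # one total per calendar day
    .rename("y")           # target column name Prophet expects
    .rename_axis("ds")     # date column name Prophet expects
    .reset_index()
)

print(daily)
```

Keeping 'ds' as a regular column from the start avoids having the dates both in the index and in a column.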

Input Split

size = 60
 
from sklearn.model_selection import train_test_split
 
train, test = train_test_split(df, test_size=size/len(df), shuffle=False)

train.tail()

  • test_size=size/len(df) - the fraction of rows held out for testing, here the last 60 days

  • shuffle=False - keeps the rows in chronological order, so the test set is the most recent period
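For time series, the same split can be done by slicing on position, which makes it explicit that exactly the last 60 rows are held out. This is an alternative sketch, not the method used in the tutorial; holdout_split is a hypothetical helper and the frame is synthetic.

```python
import pandas as pd


def holdout_split(frame, n_test):
    """Split a frame with a 'ds' column into (train, test) by time order."""
    ordered = frame.sort_values("ds")
    return ordered.iloc[:-n_test], ordered.iloc[-n_test:]


# Synthetic daily data standing in for the prepared frame.
daily = pd.DataFrame({
    "ds": pd.date_range("2014-01-01", periods=100, freq="D"),
    "y": range(100),
})

train, test = holdout_split(daily, 60)
print(len(train), len(test))
```

Either way, the important part is that every training date comes before every test date.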

test.head()

  • Display of the first five data samples in the test data

test.tail()

  • Display of the last five data samples from the test data
     

Model Training

model = Prophet(yearly_seasonality=True, seasonality_prior_scale=0.9)
 
model.fit(train)

  • Initializes the Prophet model with yearly seasonality enabled and a seasonality prior scale of 0.9, then fits it on the training data

future = model.make_future_dataframe(periods=60)
 
future

  • Generates a dataframe of dates that covers the training period plus the following 60 days
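To make the shape of the future dataframe concrete, here is a rough pandas equivalent of what the call produces with its default settings for daily data: the historical dates followed by the new ones. It is a sketch for understanding, not the library's own code, and the history frame is synthetic.

```python
import pandas as pd

# Synthetic training dates standing in for train['ds'].
history = pd.DataFrame({"ds": pd.date_range("2014-01-01", periods=10, freq="D")})
horizon = 60

# New dates start the day after the last training date.
future_dates = pd.date_range(
    history["ds"].max() + pd.Timedelta(days=1), periods=horizon, freq="D"
)
future = pd.concat(
    [history[["ds"]], pd.DataFrame({"ds": future_dates})], ignore_index=True
)

print(len(future))
```

This is why the forecast later contains predictions for the training dates as well as for the 60 new days.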

forecast = model.predict(future)
 
forecast.head()

Forecasted Data using FBProphet
  • The forecast holds the prediction (yhat) for each day in the future dataframe, together with the lower and upper bounds of its uncertainty interval (yhat_lower and yhat_upper)

model.plot_components(forecast)

Traffic trend analysis for different time periods

  • The components plot shows the overall trend of the traffic over the whole time range

  • It also shows the weekly and yearly seasonal patterns the model learned from the data
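A quick way to sanity check the weekly panel is to average the daily totals by day of the week. The sketch below does this on synthetic data with an artificial weekend dip. It is a plain group-by, not Prophet's decomposition, so it only gives a rough cross-check.

```python
import pandas as pd

# Synthetic two weeks of daily totals: lower traffic on Saturday and Sunday.
daily = pd.DataFrame({"ds": pd.date_range("2014-01-06", periods=14, freq="D")})
daily["y"] = [40 if d >= 5 else 100 for d in daily["ds"].dt.dayofweek]

# Mean traffic per weekday (0 = Monday ... 6 = Sunday).
weekday_mean = daily.groupby(daily["ds"].dt.dayofweek)["y"].mean()

print(weekday_mean)
```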
     

pred = forecast.iloc[-60:, :]
 
len(pred)

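# test results: actual values against the forecast and its bounds
plt.figure(figsize=(10,7))
plt.plot(test['ds'], test['y'], label='actual')
plt.plot(pred['ds'], pred['yhat'], color='red', label='yhat')
plt.plot(pred['ds'], pred['yhat_lower'], color='green', label='yhat_lower')
plt.plot(pred['ds'], pred['yhat_upper'], color='orange', label='yhat_upper')
plt.legend()
plt.show()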

Visualization of traffic forecast data for test set using fbprophet
  • The plot compares the actual test data with the predicted values and their lower and upper bounds

  • You can apply hyperparameter tuning to get more accurate results
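The plot gives a visual comparison; a few error metrics put a number on it. Below is a minimal sketch that joins actual and predicted values on the date and computes MAE, RMSE and MAPE. The function forecast_errors is a hypothetical helper and the four rows are made up, so the printed values say nothing about the real model.

```python
import numpy as np
import pandas as pd


def forecast_errors(actual, predicted):
    """Compare 'y' and 'yhat' on matching 'ds' dates; MAPE assumes y is never zero."""
    merged = actual.merge(predicted[["ds", "yhat"]], on="ds", how="inner")
    err = merged["y"] - merged["yhat"]
    return {
        "mae": float(err.abs().mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "mape": float((err.abs() / merged["y"]).mean() * 100),
    }


# Made-up values for illustration only.
dates = pd.date_range("2014-07-01", periods=4, freq="D")
actual = pd.DataFrame({"ds": dates, "y": [100.0, 200.0, 400.0, 500.0]})
predicted = pd.DataFrame({"ds": dates, "yhat": [110.0, 180.0, 400.0, 450.0]})

metrics = forecast_errors(actual, predicted)
print(metrics)
```

Joining on 'ds' instead of relying on row position guards against the test set and the prediction slice being off by a row.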
     

# input data
 
plt.plot(df['ds'], df['y'])
 
plt.show()

Visualization of original traffic data
  • Plot of the original daily data set

# forecast data
 
plt.plot(forecast['ds'], forecast['yhat'])
 
plt.show()

Visualization of forecasted traffic data
  • Plot of the predicted values over the training period and the 60 forecast days
     

# refit on the full data set and forecast 200 days past its last date
model = Prophet(yearly_seasonality=True, seasonality_prior_scale=0.9)
model.fit(df)
future = model.make_future_dataframe(periods=200)
forecast = model.predict(future)
forecast.head()

Forecasted traffic data
  • Here the model is retrained on the whole data set and the forecast period is extended to 200 days

# forecast data
 
plt.plot(forecast['ds'], forecast['yhat'])
 
plt.show()

Visualization of forecasted traffic data
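  • Compared with the graph of the 60 day period, the fitted curve follows the original data more closely because the model is now trained on the whole data set; the final 200 days are pure forecasts, so there is no actual data to compare them against yet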
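Since the investment question is about monthly users, the daily forecast can be rolled up into monthly totals and checked against the 1 million mark. The sketch below uses a made-up forecast frame, and it treats the summed daily count as a proxy for monthly users, which is an assumption on my part and not something the data set states.

```python
import pandas as pd

TARGET = 1_000_000  # monthly threshold from the problem statement

# Made-up forecast: 30,000 per day in October, 35,000 per day in November.
dates = pd.date_range("2014-10-01", "2014-11-30", freq="D")
forecast = pd.DataFrame({
    "ds": dates,
    "yhat": [30000.0 if d.month == 10 else 35000.0 for d in dates],
})

# Sum the daily predictions within each calendar month.
totals = forecast.set_index("ds")["yhat"].resample("MS").sum()
meets_target = totals > TARGET

print(pd.DataFrame({"total": totals, "meets_target": meets_target}))
```

Only complete months should be compared with the target; a partial first or last month will look artificially low.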

Final Thoughts

  • You can also use hyperparameter tuning to improve the model performance.

  • You can further try other models like ARIMA, LSTM, Transformer, etc.
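As a starting point for the tuning mentioned above, here is a small grid search sketch. changepoint_prior_scale and seasonality_prior_scale are real Prophet parameters, but the grid values, the grid_search helper and the dummy_score function are all hypothetical. In practice the scoring function would fit Prophet(**params) on the train set and return an error metric on the test set.

```python
import itertools


def grid_search(param_grid, score_fn):
    """Try every parameter combination and keep the one with the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


param_grid = {
    "changepoint_prior_scale": [0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.1, 1.0, 10.0],
}


def dummy_score(params):
    # Stand-in scorer so the sketch runs without Prophet; replace with a real error metric.
    return (abs(params["changepoint_prior_scale"] - 0.1)
            + abs(params["seasonality_prior_scale"] - 1.0))


best_params, best_score = grid_search(param_grid, dummy_score)
print(best_params, best_score)
```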

In this article, we explored Time Series Analysis by forecasting traffic over a specific time period to predict how it is likely to grow. The same approach is useful on any market or sales type project, for example to forecast sales, client growth, financial income, etc., over a time range.

Get the project notebook from here

Thanks for reading the article!!!

Check out more project videos from the YouTube channel Hackers Realm
