
Custom Object Detection using YOLOv8 | Python

In recent years, object detection has become a cornerstone in computer vision, driving advancements in areas ranging from autonomous vehicles to smart surveillance. One of the most popular and efficient models for real-time object detection is the YOLO (You Only Look Once) series, with the latest version, YOLOv8, pushing the boundaries of speed and accuracy even further. YOLOv8 introduces significant improvements in architecture, enabling faster inference, better precision, and easier customization compared to its predecessors.

Custom Object Detection Tutorial using YOLOv8

This article focuses on building a custom object detection model using YOLOv8. By training YOLOv8 on a custom dataset, you can create a specialized model capable of identifying unique objects relevant to specific applications—whether it’s for counting machinery on a factory floor, detecting different types of animals in a wildlife reserve, or recognizing defective items in a production line. We'll explore how to collect and annotate data, configure YOLOv8 for training, and deploy the trained model, providing a step-by-step guide to empower you to build and leverage your own object detection solutions.


You can watch the video-based tutorial with a step-by-step explanation down below.


Install Modules


# install pytorch with gpu support for faster training
!pip install torch
# install the official YOLOv8 package
!pip install ultralytics
  • The command !pip install torch installs PyTorch, a popular deep learning library. PyTorch provides a framework for building and training deep learning models, such as neural networks, and is widely used in tasks like computer vision, natural language processing, and more.

  • The command !pip install ultralytics installs the ultralytics library, which is the official package for YOLOv8. This package includes all the tools needed to use and customize YOLOv8 for object detection tasks, making it convenient to implement and train custom detection models.
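On platforms where the default torch wheel is CPU-only, a CUDA-specific build can be requested explicitly. This is a hedged sketch: the cu121 tag below is an assumption, so match it to your installed CUDA version, then verify the GPU is visible.

# request a CUDA build explicitly (cu121 is an assumption -- pick the
# wheel tag matching your CUDA version from pytorch.org)
!pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# quick check that PyTorch can see the GPU
import torch
print(torch.cuda.is_available())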


Import Modules


import os
import time
import random
import pandas as pd
import numpy as np
import cv2
import torch
from tqdm.auto import tqdm
from PIL import Image
import shutil
import matplotlib.pyplot as plt
%matplotlib inline
  • import os: Imports the os module, which provides functions for interacting with the operating system (e.g., file and directory management).

  • import time: Imports the time module, allowing you to work with time-related functions, such as measuring elapsed time.

  • import random: Imports the random module, which provides functions to generate random numbers and make random choices, useful for data splitting or augmentation.

  • import pandas as pd: Imports pandas, a powerful library for data manipulation and analysis, and aliases it as pd.

  • import numpy as np: Imports numpy, a library used for numerical operations, and aliases it as np. It provides support for arrays and matrices, as well as mathematical operations on them.

  • import cv2: Imports OpenCV, a computer vision library used for processing images and videos.

  • import torch: Imports PyTorch, which is used for building and training machine learning models.

  • from tqdm.auto import tqdm: Imports tqdm, a library for displaying progress bars in loops. The auto submodule allows it to adjust to various environments, such as Jupyter notebooks.

  • from PIL import Image: Imports the Image module from the Python Imaging Library (PIL). PIL is used for opening, manipulating, and saving different image formats.

  • import shutil: Imports the shutil module, which provides high-level file operations such as copying and removing files or directories.

  • import matplotlib.pyplot as plt: Imports the pyplot module from Matplotlib for data visualization and aliases it as plt. This module is used to create static, animated, or interactive plots.

  • %matplotlib inline: This line is a Jupyter notebook "magic command" that allows plots created by Matplotlib to be displayed directly in the notebook cells. This is useful for visualizing results without needing to open separate windows.


Load the Dataset


First we will load the dataset.

df = pd.read_csv('data/train_solution_bounding_boxes (1).csv')
df.head()
First 5 rows of the dataframe
  • df = pd.read_csv('data/train_solution_bounding_boxes (1).csv'): This line uses pandas to read a CSV file located at 'data/train_solution_bounding_boxes (1).csv'. The CSV file is loaded into a DataFrame called df. DataFrames are a powerful data structure in pandas used for tabular data, allowing easy manipulation, analysis, and visualization.

  • df.head(): This function is used to display the first 5 rows of the DataFrame df by default. It is helpful for quickly inspecting the data and verifying that it has been loaded correctly.


Next we will add new columns to the DataFrame df to extract information from the existing columns.

# get image_id
df['image_id'] = df['image'].apply(lambda x: x.split('.')[0])
df['classes'] = 0
df.head(2)
First 2 rows of the dataframe after adding new columns
  • df['image_id'] = df['image'].apply(lambda x: x.split('.')[0]):

    • This line creates a new column called 'image_id' in the DataFrame.

    • It uses the .apply() method to apply a lambda function to each value in the 'image' column.

    • The lambda function lambda x: x.split('.')[0] splits the string by the '.' character and takes the first part (typically the filename without the extension).

    • This is useful when you want a unique identifier for each image without including the file extension.

  • df['classes'] = 0:

    • This line adds a new column called 'classes' to the DataFrame and sets all of its values to 0.

    • This could be used to assign a default class label (e.g., for a single-class dataset) before further processing.

  • df.head(2):

    • Displays the first 2 rows of the updated DataFrame to inspect the changes.


Next we will initialize the configuration.

# initialize configuration
img_h, img_w, num_channels = (380, 676, 3)
images_folder = 'data/training_images/'
  • img_h, img_w, num_channels = (380, 676, 3):

    • This line sets the height, width, and number of channels for the images being processed.

    • img_h and img_w represent the height and width of the images in pixels (380x676 in this case).

    • num_channels = 3 indicates that the images are in RGB format, which has three color channels (Red, Green, Blue).

  • images_folder = 'data/training_images/':

    • This line sets the path to the folder containing the training images.

    • It specifies that the images are located in the 'data/training_images/' directory.
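Since these dimensions are hard-coded, it is worth confirming they match the files on disk. A quick sketch (assuming the training images are already in place) opens the first image in the folder and prints its size:

# confirm the configured dimensions match an actual image
sample = os.listdir(images_folder)[0]
with Image.open(os.path.join(images_folder, sample)) as im:
    print(im.size)  # PIL reports (width, height); expected (676, 380)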


Next we will convert the bounding box coordinates from their original format to the YOLO format. YOLO requires the bounding boxes to be specified in terms of their center coordinates and dimensions relative to the image.

# convert the data points to YOLO format
df['x_center'] = (df['xmin'] + df['xmax']) / 2
df['y_center'] = (df['ymin'] + df['ymax']) / 2
df['w'] = df['xmax'] - df['xmin']
df['h'] = df['ymax'] - df['ymin']
  • df['x_center'] = (df['xmin'] + df['xmax']) / 2:

    • Calculates the x-coordinate of the center of the bounding box.

    • The center x-coordinate is obtained by averaging the minimum ('xmin') and maximum ('xmax') x-coordinates of the bounding box.

  • df['y_center'] = (df['ymin'] + df['ymax']) / 2:

    • Calculates the y-coordinate of the center of the bounding box.

    • The center y-coordinate is obtained by averaging the minimum ('ymin') and maximum ('ymax') y-coordinates.

  • df['w'] = df['xmax'] - df['xmin']:

    • Computes the width ('w') of the bounding box by subtracting the minimum x-coordinate ('xmin') from the maximum x-coordinate ('xmax').

  • df['h'] = df['ymax'] - df['ymin']:

    • Computes the height ('h') of the bounding box by subtracting the minimum y-coordinate ('ymin') from the maximum y-coordinate ('ymax').

  • In YOLO format, each bounding box is represented by the (x_center, y_center, w, h) format, where x_center and y_center are the coordinates of the bounding box center, and w and h are its width and height. This representation is beneficial for YOLO's one-shot detection approach, making it easier for the model to learn and predict bounding boxes.


Next we will normalize the bounding box coordinates to make them relative to the dimensions of the image. This is an important step for YOLO, which requires normalized values between 0 and 1.

# normalize the values
df['x_center'] = df['x_center'] / img_w
df['y_center'] = df['y_center'] / img_h
df['w'] = df['w'] / img_w
df['h'] = df['h'] / img_h
df.head(2)
First 2 rows of the dataframe after normalizing the values
  • df['x_center'] = df['x_center'] / img_w:

    • Normalizes the x-coordinate of the bounding box center.

    • df['x_center'] is divided by img_w (the width of the image), which converts it to a value between 0 and 1, representing its relative position within the image.

  • df['y_center'] = df['y_center'] / img_h:

    • Normalizes the y-coordinate of the bounding box center.

    • df['y_center'] is divided by img_h (the height of the image), which converts it to a value between 0 and 1.

  • df['w'] = df['w'] / img_w:

    • Normalizes the width of the bounding box.

    • df['w'] is divided by img_w, resulting in a relative width value between 0 and 1.

  • df['h'] = df['h'] / img_h:

    • Normalizes the height of the bounding box.

    • df['h'] is divided by img_h, resulting in a relative height value between 0 and 1.

  • After normalization, all bounding box values (x_center, y_center, w, h) are in the range [0, 1]. This ensures that the model can work with different image sizes consistently, as it learns the positions and dimensions of objects relative to the entire image rather than in absolute pixel values.
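To make both steps concrete, here is a small worked example using made-up pixel coordinates for one box:

# hypothetical box in pixel coordinates
xmin, ymin, xmax, ymax = 100.0, 150.0, 300.0, 250.0

x_center = (xmin + xmax) / 2   # 200.0
y_center = (ymin + ymax) / 2   # 200.0
w = xmax - xmin                # 200.0
h = ymax - ymin                # 100.0

# normalize by image width (676) and height (380)
print(x_center / 676, y_center / 380, w / 676, h / 380)
# -> approximately 0.296 0.526 0.296 0.263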


Exploratory Data Analysis


First we will select a random image from the dataset and display it using matplotlib.

image = random.choice(df['image'])
image_path = os.path.join(images_folder, image)
img = Image.open(image_path)
plt.axis('off')
plt.imshow(img)
plt.show()
Random image from dataset

  • image = random.choice(df['image']):

    • Selects a random image filename from the 'image' column of the DataFrame df.

    • The random.choice() function is used to pick one item randomly from the list of image filenames.

  • image_path = os.path.join(images_folder, image):

    • Constructs the full path to the randomly selected image.

    • os.path.join(images_folder, image) combines the images_folder (the directory containing the images) and the image filename to create the complete file path.

  • img = Image.open(image_path):

    • Opens the image at the specified image_path using the PIL library (Image.open()).

    • This loads the image into memory for further processing or visualization.

  • plt.axis('off'):

    • Hides the axis when displaying the image using matplotlib, giving a cleaner visual output.

  • plt.imshow(img):

    • Displays the image using matplotlib.

  • plt.show():

    • Renders and displays the image plot.


We will display another random image.

image = random.choice(df['image'])
image_path = os.path.join(images_folder, image)
img = Image.open(image_path)
plt.axis('off')
plt.imshow(img)
plt.show()
Another random image from dataset

Next we will create a function draw_bounding_box to read an image from the dataset, draw a bounding box around a specified object, and display the annotated image.

def draw_bounding_box(idx):
    image = cv2.imread(os.path.join(images_folder, df['image'][idx]))
    x_min = int(df['xmin'][idx])
    y_min = int(df['ymin'][idx])
    x_max = int(df['xmax'][idx])
    y_max = int(df['ymax'][idx])

    # draw the rectangle
    cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
    # place the label above bounding boxes
    cv2.putText(image, 'car', (x_min, y_min-10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # convert from OpenCV's BGR order to RGB so matplotlib shows correct colors
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # display the image
    plt.axis('off')
    plt.imshow(image)
    plt.show()
  • Function Definition:

    • The function takes one argument, idx, which is the index of the DataFrame df that specifies which image and bounding box to draw.

  • Read the Image:

    • Next read the image file at the path constructed from the images_folder and the filename from the 'image' column of the DataFrame using OpenCV's cv2.imread() function.

  • Extract Bounding Box Coordinates:

    • Next retrieve the bounding box coordinates (xmin, ymin, xmax, ymax) from the DataFrame for the specified index. The coordinates are converted to integers.

  • Draw the Bounding Box:

    • cv2.rectangle()  is used to draw a rectangle (bounding box) on the image. The rectangle is defined by the top-left corner (x_min, y_min) and the bottom-right corner (x_max, y_max).

    • The color of the rectangle is green (0, 255, 0), and 2 specifies the thickness of the rectangle.

  • Place the Label Above the Bounding Box:

    • Next add text to the image using cv2.putText(). The label 'car' is placed above the bounding box.

    • The text color is green (0, 255, 0), and it uses the FONT_HERSHEY_SIMPLEX font. The size of the text is 1, and the thickness is 2.

  • Display the Image:

    • Next convert the image from OpenCV's BGR channel order to RGB (so matplotlib renders the colors correctly) and display the annotated image using matplotlib.

    • The axis is turned off for a cleaner display, and plt.imshow(image) is used to render the image. Finally, plt.show() displays the image.


Next we will randomly select an index from the DataFrame df and call the draw_bounding_box function to display the corresponding image with its bounding box.

idx = random.randrange(0, len(df))
draw_bounding_box(idx)
Image with bounding box
  • Next we will use random.randrange() to generate a random integer between 0 and len(df) - 1, which is a valid index for the DataFrame df. This ensures that the selected index is within the bounds of the DataFrame.

  • Next call the previously defined draw_bounding_box function, passing the randomly selected index idx as an argument.

  • The function reads the corresponding image from the dataset, draws the bounding box around the object, places a label, and displays the annotated image.

  • When you run this code snippet, you'll see a randomly selected image from your dataset with its bounding box drawn around the specified object (e.g., a car) and the label placed above the bounding box. This is a useful way to visually verify the correctness of your bounding box annotations in the dataset.


Next we will draw a bounding box for another image.

idx = random.randrange(0, len(df))
draw_bounding_box(idx)
Image with bounding box

Create Annotations for YOLO Format


First we will create a directory to store annotations for our project.

annotations_folder = 'data/annotations/'

if not os.path.exists(annotations_folder):
    os.mkdir(annotations_folder)
  • Define the Annotations Folder Path:

    • annotations_folder = 'data/annotations/': This line sets a variable annotations_folder to specify the path where the annotations will be stored. In this case, it is defined as 'data/annotations/'.

  • Check if the Directory Exists:

    • Next check whether the directory specified in annotations_folder already exists using os.path.exists().

    • If the directory does not exist (i.e., it returns False), the code inside the if block will execute.

  • Create the Directory:

    • If the directory does not exist, this line creates it using os.mkdir(). This function creates a new directory with the specified path.


Next we will create YOLO-style annotation files for each image in your dataset by iterating over the bounding box data in the DataFrame and writing annotations to individual text files.

annotations_dict = {}

# iterate through dataframe to consolidate bounding boxes for each image
for _, row in df.iterrows():
    image_file = row['image']
    class_label = int(row['classes'])
    x_center = row['x_center']
    y_center = row['y_center']
    w = row['w']
    h = row['h']

    # initialize list if image is not present in dictionary
    if image_file not in annotations_dict:
        annotations_dict[image_file] = []

    # append annotations to the list 
    annotations_dict[image_file].append(f"{class_label} {x_center} {y_center} {w} {h}")

# write the annotations in text file
for image_file, annotations in annotations_dict.items():
    annotation_file = os.path.join(annotations_folder, os.path.splitext(image_file)[0] + '.txt')
    with open(annotation_file, 'w') as f:
        for annotation in annotations:
            f.write(annotation + '\n')
  • Initialize an Empty Dictionary:

    • Initialize an empty dictionary named annotations_dict to store annotations for each image.

    • The keys in the dictionary will be the image filenames, and the values will be lists of bounding box annotations for each image.

  • Iterate Through the DataFrame and Consolidate Bounding Boxes:

    • Iterate over each row in the DataFrame df using iterrows().

    • _ is used to ignore the index.

    • row represents each row in the DataFrame.

    • Extract the necessary data from each row to create an annotation for an image.

      • image_file: Gets the filename of the image from the 'image' column.

      • class_label: Gets the class label from the 'classes' column, converted to an integer.

      • x_center, y_center, w, h: Extracts the center coordinates (x_center, y_center) and dimensions (w, h) of the bounding box, which have been normalized.

    • Check if the image file is already a key in annotations_dict. If image_file is not present in annotations_dict, it initializes an empty list for that key. This ensures that a list exists for each image to store multiple bounding boxes.

    • Next add a formatted annotation string to the list of annotations for the current image. The format is "class_label x_center y_center width height", which is the format required by YOLO. The string is appended to the list associated with image_file in annotations_dict.

  • Write the Annotations to Text Files:

    • First, iterate over each image and its list of annotations in annotations_dict.

    • os.path.splitext(image_file)[0] extracts the base name of the image (without the file extension).

    • + '.txt' adds the .txt extension to create the corresponding annotation filename.

    • os.path.join(annotations_folder, ...) constructs the full path of the annotation file inside the annotations_folder.

    • with open(annotation_file, 'w') as f: opens the file in write mode ('w').

    • for annotation in annotations: iterates over each annotation string for the image.

    • f.write(annotation + '\n') writes each annotation to a new line in the text file.

    • These .txt files are used during the training of the YOLO model to locate and identify objects in each image.
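To spot-check the output, the sketch below reads back one of the generated files (whichever happens to be listed first) and prints its contents:

# inspect one generated annotation file
sample_file = os.listdir(annotations_folder)[0]
with open(os.path.join(annotations_folder, sample_file)) as f:
    print(sample_file)
    print(f.read())
# each line reads: "<class> <x_center> <y_center> <w> <h>", all normalized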


Next we will split the dataset of images into training and testing sets.

# split images for training and testing
from sklearn.model_selection import train_test_split
images = list(annotations_dict.keys())
train_images, test_images = train_test_split(images, test_size=0.2, random_state=42)
  • train_test_split is a utility function that helps split data into training and testing sets, which is crucial for building machine learning models and evaluating their performance.

  • annotations_dict contains all the annotations for each image, with the image filename as the key.

  • list(annotations_dict.keys()) converts the keys (image filenames) into a list, which will be split into training and testing sets.

  • Uses train_test_split to randomly split the list of image filenames (images) into training and testing sets.

    • Parameters:

      • images: The list of image filenames to be split.

      • test_size=0.2: Specifies that 20% of the images should be used for testing, while the remaining 80% should be used for training.

      • random_state=42: Sets the seed for random number generation to ensure reproducibility. Using a fixed value (e.g., 42) ensures that the split is the same every time the code is run.

    • Returns:

      • train_images: A list of image filenames for training (80% of the data).

      • test_images: A list of image filenames for testing (20% of the data).
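A quick sanity check confirms the split sizes follow the 80/20 ratio:

# verify the split proportions
print(f"train: {len(train_images)} images, test: {len(test_images)} images")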


Next we will create directories (folders) for storing training and testing images, as well as their corresponding annotations.

# create folders for train and test
train_images_folder = 'datasets/train_images'
test_images_folder = 'datasets/test_images'
train_annotations_folder = 'datasets/train_annotations'
test_annotations_folder = 'datasets/test_annotations'

# create directories for the above paths
os.mkdir('datasets')
os.mkdir(train_images_folder)
os.mkdir(test_images_folder)
os.mkdir(train_annotations_folder)
os.mkdir(test_annotations_folder)
  • Define Folder Paths:

    • train_images_folder: Path for storing training images.

    • test_images_folder: Path for storing testing images.

    • train_annotations_folder: Path for storing annotation files for training images.

    • test_annotations_folder: Path for storing annotation files for testing images.

  • Create Directories for the Above Paths

    • os.mkdir('datasets'): Creates the root datasets directory, which will contain the subdirectories for training and testing data.

    • os.mkdir(train_images_folder): Creates a folder named train_images inside the datasets directory to store training images.

    • os.mkdir(test_images_folder): Creates a folder named test_images inside the datasets directory to store testing images.

    • os.mkdir(train_annotations_folder): Creates a folder named train_annotations inside the datasets directory to store annotation files for training images.

    • os.mkdir(test_annotations_folder): Creates a folder named test_annotations inside the datasets directory to store annotation files for testing images.

  • These directories help organize the dataset into separate parts for training and testing, which is crucial when training machine learning models like YOLO. Organizing the data in this way makes it easier to feed the appropriate data into the model during the training and evaluation phases.
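One caveat: os.mkdir() raises FileExistsError if a directory already exists, so re-running this cell fails. A more forgiving variant (a sketch, not what the code above does) uses os.makedirs() with exist_ok=True:

# idempotent alternative: creates parent directories as needed and
# silently skips any directory that already exists
for folder in [train_images_folder, test_images_folder,
               train_annotations_folder, test_annotations_folder]:
    os.makedirs(folder, exist_ok=True)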


Next we will define a function called copy_files() that copies image files and their corresponding annotation files to designated destination folders.

# copy the files to the respective folder paths
def copy_files(images_list, image_src_folder, annotation_src_folder, image_dest_folder, annotation_dest_folder):
    for image_file in images_list:
        # path for source and destination
        src_image_path = os.path.join(image_src_folder, image_file)
        annotation_file = os.path.splitext(image_file)[0] + '.txt'
        src_annotations_path = os.path.join(annotation_src_folder, annotation_file)
        dest_image_path = os.path.join(image_dest_folder, image_file)
        dest_annotation_path = os.path.join(annotation_dest_folder, annotation_file)

        # copy images
        shutil.copy2(src_image_path,  dest_image_path)
        # copy annotations
        shutil.copy2(src_annotations_path, dest_annotation_path)
  • Define the Function copy_files():

    • Define a function named copy_files() that takes five arguments.

    • Parameters:

      • images_list: A list of image filenames to be copied.

      • image_src_folder: The source folder containing the original image files.

      • annotation_src_folder: The source folder containing the corresponding annotation files.

      • image_dest_folder: The destination folder where image files should be copied.

      • annotation_dest_folder: The destination folder where annotation files should be copied.

  • Iterate Over the List of Image Filenames:

    • Iterates over each image filename in the images_list.

    • image_file: Represents the name of the current image being processed in each iteration.

  • Define Paths for Source and Destination Files:

    • os.path.join(image_src_folder, image_file): Joins the image_src_folder path and the image_file name to create the full source path for the image.

    • os.path.splitext(image_file)[0]: Extracts the base name of the image file (without the file extension).

    • + '.txt': Appends the .txt extension to get the corresponding annotation filename.

    • os.path.join(annotation_src_folder, annotation_file): Joins the annotation_src_folder and annotation_file name to create the full source path for the annotation.

    • dest_image_path: Path to where the image file should be copied.

    • dest_annotation_path: Path to where the annotation file should be copied.

    • shutil.copy2(src_image_path, dest_image_path): Copies the file from src_image_path to dest_image_path. shutil.copy2() preserves the original file metadata (such as creation and modification times).

    • shutil.copy2(src_annotations_path, dest_annotation_path): Copies the annotation file from src_annotations_path to dest_annotation_path.


Next we will call the copy_files() function twice to copy the training and testing data to their respective directories.

copy_files(train_images, images_folder, annotations_folder, train_images_folder, train_annotations_folder)

copy_files(test_images, images_folder, annotations_folder, test_images_folder, test_annotations_folder)
  • Copy Training Images and Annotations

    • train_images: List of filenames for images that are part of the training dataset.

    • images_folder: The source folder that contains all the original images.

    • annotations_folder: The source folder that contains all the annotation files.

    • train_images_folder: The destination folder where the training images will be copied.

    • train_annotations_folder: The destination folder where the annotation files for training images will be copied.

  • Copy Testing Images and Annotations

    • test_images: List of filenames for images that are part of the testing dataset.

    • images_folder: The source folder that contains all the original images.

    • annotations_folder: The source folder that contains all the annotation files.

    • test_images_folder: The destination folder where the testing images will be copied.

    • test_annotations_folder: The destination folder where the annotation files for testing images will be copied.


Model Training


Next we will train the model.

from ultralytics import YOLO
# build the YOLOv8 nano model from its architecture config
# (pass 'yolov8n.pt' instead to start from pretrained weights)
model = YOLO('yolov8n.yaml')

# define training parameters
model.train(data='data.yaml', epochs=50, batch=16, imgsz=676, workers=2)
Model training logs
  • The ultralytics library provides a simplified interface for working with YOLO models, specifically YOLOv8.

  • The YOLO class is used to create a model object that can be trained, validated, or used for inference.

  • YOLO('yolov8n.yaml'): Creates a model object based on the yolov8n.yaml configuration file.

  • 'yolov8n.yaml': Specifies the configuration file for the YOLOv8 model. The configuration file contains details about the model architecture (e.g., the number of layers, filters, etc.). yolov8n typically stands for YOLOv8 nano, which is a lightweight version of the model optimized for speed.

  • data='data.yaml': Specifies the path to the data configuration file (data.yaml).

    • The data.yaml file contains information about the dataset, including paths to training and validation images, class names, and the number of classes. A minimal example is sketched after this list.

  • epochs=50: Specifies that the model should be trained for 50 epochs.

    • An epoch is one complete pass through the entire training dataset.

  • batch=16: Specifies the batch size of 16.

    • The batch size is the number of samples that are processed before the model's internal parameters are updated. A batch size of 16 means that 16 images are processed at a time during training.

  • imgsz=676: Specifies the size of the images to be used for training.

    • The images will be resized to 676 pixels before being fed into the model. Note that YOLOv8 requires imgsz to be a multiple of the model's maximum stride (32), so ultralytics will round 676 up to 704 and log a warning.

  • workers=2: Specifies the number of worker threads to use for loading the data.

    • Using more workers can speed up the data loading process, especially for large datasets.
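The training call expects a data.yaml file describing the dataset, which the article does not show. The snippet below is a minimal sketch, assuming the folder layout created earlier and a single 'car' class; ultralytics needs to be able to associate label files with the images, so adjust these paths to match your setup:

# write a minimal data.yaml -- paths and class name are assumptions
# based on the folders created above, not taken from the article
yaml_content = """
train: datasets/train_images
val: datasets/test_images

names:
  0: car
"""
with open('data.yaml', 'w') as f:
    f.write(yaml_content)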


Next we will evaluate the trained YOLOv8 model on a validation dataset.

metrics = model.val()
Trained model evaluation
  • model.val():

    • Purpose: The .val() method is used to evaluate the model's performance using a validation dataset. This is typically done after training to check how well the model generalizes to unseen data.

    • Validation Process: During validation, the model is tested on images it has not seen during training. The results are compared against the ground-truth labels to measure accuracy, precision, recall, and other metrics.

  • metrics:

    • This variable will store the performance metrics of the model after validation.

    • Metrics may include:

      • Precision: The proportion of positive identifications that were actually correct.

      • Recall: The proportion of actual positives that were identified correctly.

      • mAP (mean Average Precision): Measures the precision and recall of the model at different thresholds.

      • F1 Score: A balance between precision and recall, often used as a summary of model performance.
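The returned metrics object also exposes these numbers programmatically. As a short sketch (attribute names follow the ultralytics DetMetrics API and may differ between versions):

# headline detection metrics from the validation run
print(metrics.box.map50)  # mAP at IoU threshold 0.50
print(metrics.box.map)    # mAP averaged over IoU thresholds 0.50-0.95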


Test the Real Image


Next we will run the trained YOLO model on a test image, draw bounding boxes for the detected objects, and display the results.

# process the results
image_path = 'datasets/test_images/vid_4_10020.jpg'
results = model(image_path)
image = cv2.imread(image_path)
for result in results:
    # loop through the detected objects
    for detection in result.boxes:
        x_min, y_min, x_max, y_max = detection.xyxy[0]
        confidence = round(float(detection.conf[0]), 2)
        class_id = int(detection.cls[0])

        # draw bounding box
        cv2.rectangle(image, (int(x_min), int(y_min)), (int(x_max), int(y_max)), (0, 255, 0), 2)

        # write label
        label = f"Car {confidence}"
        cv2.putText(image, label, (int(x_min), int(y_min)-10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

# convert the image to rgb
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

plt.imshow(image_rgb)
plt.axis('off')
plt.show()
Test image
Another test image
  • Load the Test Image and Get Predictions:

    • image_path: Specifies the path to the image you want to test (vid_4_10020.jpg).

    • results = model(image_path): Runs the model on the specified image and stores the detection results in the results variable.

    • The results object contains bounding boxes, confidence scores, and class labels for the detected objects.

  • Load the Image with OpenCV:

    • cv2.imread(image_path): Loads the image from the given path using OpenCV.

    • The loaded image is stored in the variable image.

  • Loop Through Detected Objects and Draw Bounding Boxes:

    • Outer Loop (for result in results):

      • Loops through each result in results. Typically, there will be only one result since we're running the model on a single image.

    • Inner Loop (for detection in result.boxes):

      • Loops through each detected object (detection) in result.

      • detection.xyxy[0]: Gets the bounding box coordinates (x_min, y_min, x_max, y_max).

      • detection.conf[0]: Retrieves the confidence score of the detection.

      • detection.cls[0]: Retrieves the class label ID of the detected object.

  • Draw Bounding Boxes and Add Labels:

    • Draw Bounding Box:

      • cv2.rectangle(image, (int(x_min), int(y_min)), (int(x_max), int(y_max)), (0, 255, 0), 2): Draws a green rectangle around the detected object.

      • (0, 255, 0) specifies the color of the rectangle (green), and 2 specifies the line thickness.

    • Write Label:

      • label = f"Car {confidence}": Creates a label that shows the class name (Car) and confidence score.

      • cv2.putText(image, label, (int(x_min), int(y_min)-10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2): Writes the label above the bounding box using green text. The 1 specifies the font size and 2 specifies the thickness of the text.

  • Convert the Image to RGB Format for Display:

    • OpenCV loads images in BGR format, while matplotlib expects RGB format for correct color representation.

    • cv2.cvtColor(image, cv2.COLOR_BGR2RGB): Converts the image from BGR to RGB format.

  • Display the Image with Bounding Boxes and Labels:

    • plt.imshow(image_rgb): Displays the processed image with bounding boxes and labels.

    • plt.axis('off'): Hides the axis for a cleaner display.

    • plt.show(): Renders the image.
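For quick checks, ultralytics also provides a built-in plotting helper on each result. The manual loop above offers finer control over labels and colors, but the sketch below (assuming the same results object) produces a comparable annotated image:

# result.plot() returns the annotated image as a BGR numpy array
annotated = results[0].plot()
plt.imshow(cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()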


Final Thoughts


  • Custom object detection using YOLOv8 offers a powerful and flexible approach to building effective computer vision solutions.

  • By leveraging YOLOv8's state-of-the-art architecture and customization capabilities, you can train models that precisely detect the objects relevant to your use case, whether it's for vehicles, people, wildlife, or specialized industry needs.

  • The step-by-step process of preparing data, training, and evaluating the model highlights the importance of a well-curated dataset and a good understanding of the parameters that impact performance.

  • While YOLOv8’s user-friendly API simplifies the workflow, successfully deploying a custom model still requires careful tuning and optimization based on the complexity of the target environment and task requirements.


To further enhance your results, consider experimenting with different augmentation techniques, hyperparameters, or incorporating techniques like transfer learning. By doing so, you can continue to push the boundaries of what your custom object detection system can achieve.
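As a concrete starting point for the transfer-learning suggestion, you can initialize from the released COCO-pretrained weights instead of the bare architecture config; this sketch reuses the training parameters from earlier:

from ultralytics import YOLO

# start from pretrained weights rather than a randomly initialized config
model = YOLO('yolov8n.pt')
model.train(data='data.yaml', epochs=50, batch=16, imgsz=676, workers=2)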


Get the project notebook from here


Thanks for reading the article!!!


Check out more project videos from the YouTube channel Hackers Realm
