Detecto is an object detection module built on top of PyTorch, used for detecting custom objects from datasets. It provides a fully functioning computer vision and object detection model in just 5 lines of code. Because Detecto is built on top of PyTorch, models can be transferred easily between the two libraries.
In this tutorial, we will explain the concepts related to Detecto, install PyTorch, and implement Detecto in 5 lines of code.
You can watch the video-based tutorial with step by step explanation down below.
The power of Detecto comes from its simplicity and ease of use. Creating and running a pre-trained Faster R-CNN ResNet-50 FPN from PyTorch's model zoo takes 4 lines of code.
R-CNN is short for Region-based Convolutional Neural Network. It is the first in a series of related algorithms, followed by Fast R-CNN and then Faster R-CNN. R-CNN detects objects by first generating region proposals, which are areas of the image that may contain an object, and then classifying each proposed region with a CNN. The proposals come from a separate algorithm known as Selective Search. Fast R-CNN still relies on Selective Search for proposals, but it runs the CNN once over the whole image and extracts the features for each proposal from the shared feature map. Faster R-CNN replaces Selective Search with a Region Proposal Network, which lets the network itself learn to generate the region proposals.
PyTorch is an open source machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Meta AI.
MS COCO (Microsoft Common Objects in Context) is a large-scale image dataset containing 328,000 images of everyday objects and humans. The dataset contains annotations you can use to train machine learning models to recognize, label, and describe objects.
MS COCO provides the following types of annotations:
Object detection—coordinates of bounding boxes and full segmentation masks for 80 categories of objects
Captioning—natural language descriptions of each image.
Keypoints—the dataset has more than 200,000 images containing over 250,000 humans, labeled with keypoints such as right eye, nose, left hip.
“Stuff image” segmentation—pixel maps of 91 categories of “stuff”—amorphous background regions like walls, sky, or grass.
Panoptic—full scene segmentation, indicating objects in the image according to 80 categories of “things” (cat, pen, fridge, etc.) and 91 “stuff” categories (road, sky, water, etc.).
Dense pose—the dataset has more than 39,000 images containing over 56,000 humans, with every labeled person annotated with an instance id and a mapping between pixels representing that person’s body and a template 3D model.
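To make the object detection annotations more concrete, here is a minimal sketch of one COCO-style annotation record and a small helper that converts its bounding box. COCO stores a box as [x, y, width, height], while Detecto reports boxes as [xmin, ymin, xmax, ymax]. The record values and the helper name `coco_to_corners` are made up for illustration and are not taken from the dataset or from any library.

```python
# A hand-written example of one COCO-style object annotation.
# All ids and numbers are illustrative, not real dataset entries.
annotation = {
    "id": 1,
    "image_id": 42,
    "category_id": 53,
    "bbox": [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height]
    "area": 4000.0,
    "iscrowd": 0,
}


def coco_to_corners(bbox):
    """Convert [x, y, width, height] to [xmin, ymin, xmax, ymax]."""
    x, y, width, height = bbox
    return [x, y, x + width, y + height]


corners = coco_to_corners(annotation["bbox"])
print(corners)
```

The corner format is the one to expect when reading the `boxes` returned by the prediction code later in this tutorial.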
!pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
!pip3 install detecto
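After the install commands finish, it can help to confirm that the packages are visible to the current Python environment before running the model. The following is a minimal sketch that uses only the standard library. The helper name `check_installed` is hypothetical, and the check only tells you whether a package can be found, not whether GPU support works.

```python
import importlib.util


def check_installed(package_names):
    """Return a dict mapping each top-level package name to True or False."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in package_names
    }


# Report which of the packages used in this tutorial can be found.
status = check_installed(["torch", "torchvision", "detecto"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

If any package is reported as missing, rerun the corresponding install command and restart the notebook runtime.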
# object detection
from detecto import core, utils, visualize
# read image
image = utils.read_image('apple.jpg')
# load the model
model = core.Model()
# make the prediction
labels, boxes, scores = model.predict_top(image)
# visualize the image
visualize.show_labeled_image(image, boxes, labels)
Display of the image with corresponding predictions
The model correctly detected the apple, but it also detected a dining table, likely because of the shadow and because apples are commonly placed on dining tables.
Applying a score threshold filters out low-confidence predictions, so only the apple is shown in this case.
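The thresholding step can be sketched as follows. This is a minimal example under stated assumptions: the helper name `indices_above_threshold`, the cutoff of 0.6, and the sample labels, boxes, and scores are all made up for illustration, and plain Python lists stand in for the tensors that Detecto returns.

```python
def indices_above_threshold(scores, threshold):
    """Return positions of predictions whose score is at least `threshold`."""
    return [i for i, score in enumerate(scores) if float(score) >= threshold]


# Made-up predictions shaped like the tutorial's output.
labels = ["apple", "dining table"]
boxes = [[50, 60, 300, 320], [0, 200, 400, 400]]  # [xmin, ymin, xmax, ymax]
scores = [0.98, 0.35]

# Keep only the predictions the model is reasonably confident about.
keep = indices_above_threshold(scores, threshold=0.6)
filtered_labels = [labels[i] for i in keep]
filtered_boxes = [boxes[i] for i in keep]
print(filtered_labels, filtered_boxes)
```

With real Detecto output, `boxes` and `scores` are PyTorch tensors, so indexing the tensor with the same list (`boxes[keep]`) should select the matching rows before they are passed to `visualize.show_labeled_image`. The right cutoff depends on the image, so treat 0.6 as a starting point rather than a recommended value.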
This is an easy implementation of an object detection model for images and videos.
Processing a large amount of data can take a lot of time and system resources.
The model used here is a basic deep learning model trained with a small neural network.
You may use other object detection models of your preference for comparison.
In this project tutorial, we explored the Detecto object detection model as a deep learning project and explained relevant background on R-CNN, PyTorch, and MS COCO. This is a basic deep learning project that uses a pre-trained model to analyze an image and predict the objects in it.
Thanks for reading the article!!!
Check out more project videos from the YouTube channel Hackers Realm