Revolutionizing Image Automation with YOLOV5: The Future of Object Detection

Imtiaz Ul Hassan
4 min readMar 27, 2023

In recent years, the field of computer vision has witnessed a significant advancement in the development of deep learning-based object detection models. YOLO (You Only Look Once) has emerged as one of the most popular object detection models due to its exceptional accuracy and real-time performance. With the latest version, YOLOv5, the possibilities of object detection have been taken to a whole new level.

In this article, we will explore the potential of YOLOv5 in automating image annotation. Specifically, we will demonstrate how YOLOv5 can be used to detect and label objects in images, streamlining the time-consuming and tedious task of manual annotation. At first glance, generating annotations from a trained model may seem like a futile task. However, it can be an incredibly valuable tool in a number of different scenarios.

By training a larger, heavier YOLOv5 model on a small manually annotated dataset, we can reach good accuracy with relatively little data. We can then use that heavy model to generate annotations for a much larger set of unlabeled images, leveraging its accuracy to build a bigger dataset with minimal manual effort. This larger dataset can in turn be used to train a smaller, more efficient YOLO model that can run in real time on low-compute devices.

Another use case for generating annotations from a trained model is correcting inaccurate annotations: generate labels with the model, fix its mistakes by hand, and retrain to improve performance. This can be especially useful in situations where the quality of the original dataset was poor, or where new types of objects need to be detected that were not present in the original dataset. With YOLOv5 and its annotation generation capabilities, we can easily and efficiently produce accurate annotations and improve the performance of our models with minimal effort.

To get started with the code, you will need to set YOLOv5 up locally on your machine or run the demo on Colab. However, if you want to use the code with a webcam, you will need a local setup.
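In either case, the usual YOLOv5 setup is to clone the repository and install its requirements. A minimal sketch, written with notebook-style commands (drop the leading ! and %cd works as a plain cd in a terminal):

!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt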

Import libraries

import os
import sys
from pathlib import Path
import time
import numpy as np
import cv2
import torch
import torch.backends.cudnn as cudnn
import io
import base64
import datetime
import argparse
from models.common import DetectMultiBackend
from utils.augmentations import letterbox
from utils.dataloaders import IMG_FORMATS, VID_FORMATS, LoadImages, LoadStreams

from utils.plots import Annotator, colors, save_one_box
from utils.torch_utils import select_device, time_sync
from PIL import Image
from scipy.spatial import distance as dist
import threading

from utils.general import (LOGGER, check_file, check_img_size, check_imshow, check_requirements, colorstr,
                           increment_path, non_max_suppression, print_args, scale_boxes, strip_optimizer, xyxy2xywh)


device = select_device('cpu')  # set to '0' if you have a GPU
model = DetectMultiBackend('yolov5s.pt', device=device, dnn=False, data='data/coco128.yaml')

This code sets the device to run the YOLOv5 model on; here it is the CPU, but if you have GPU support you can replace “cpu” with “0”. It then loads the YOLOv5 model using the DetectMultiBackend class from the YOLOv5 package. The arguments are the path to the model weights file, the device to run the model on, whether to use OpenCV DNN (rather than PyTorch) for inference, and the path to the configuration file of the dataset used to train the model. This creates an instance of the YOLOv5 model that can be used for object detection on images or video streams.
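Before running inference, it is worth a quick sanity check on the loaded model. The stride and names attributes used below are exposed by DetectMultiBackend:

print(model.stride)   # model stride used by letterbox for padding (32 for yolov5s)
print(model.names)    # class index to class name mapping for the 80 COCO classes
print(model.names[2]) # 'car'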

def perform_prediction(img0, model):
    stride = model.stride
    img0 = img0.copy()
    img = letterbox(img0, 640, stride=stride, auto=True)[0]  # resize image, keeping aspect ratio
    cv2.imwrite("car1.jpg", img)  # optional: save the letterboxed image for inspection
    img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    img = np.ascontiguousarray(img)
    im = torch.from_numpy(img).to(device)  # move the image tensor to the selected device
    im = im.float()  # uint8 to fp32
    im /= 255  # 0 - 255 to 0.0 - 1.0
    if len(im.shape) == 3:
        im = im[None]  # expand for batch dim
    pred = model(im, augment=False, visualize=False)  # forward pass through the model
    pred = non_max_suppression(pred, conf_thres=0.45, iou_thres=0.45, classes=[0, 1, 2, 3, 4, 6], max_det=1000)  # keep only selected COCO classes
    det = pred[0]
    det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], img0.shape).round()  # rescale boxes to the original image
    height, width = img0.shape[:2]  # original image size, used to normalize the annotations
    return det[:, :4].cpu().numpy(), det[:, -1].cpu().numpy(), height, width

This function performs object detection with the YOLOv5 model and returns the predicted bounding boxes and class labels for each object detected in the input image. Note that the image is resized with the letterbox function for inference, but scale_boxes then rescales the predicted boxes back to the original image, so the function also returns the original image's height and width for normalizing the annotations.
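Here is a minimal usage sketch to check the shapes of the returned arrays (run on a blank synthetic image, which should simply produce empty detections):

dummy = np.zeros((720, 1280, 3), dtype=np.uint8)  # synthetic 1280x720 BGR image
boxes, cls, h, w = perform_prediction(dummy, model)
print(boxes.shape)  # (N, 4): one x1, y1, x2, y2 row per detection
print(cls)          # class index of each detection
print(h, w)         # 720 1280: the original image size used for normalization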

def convert_pred(predictions, classes, height, width):
    x_norm, y_norm, w_norm, h_norm = [], [], [], []
    for i in range(predictions.shape[0]):
        x1, y1, x2, y2 = predictions[i]
        x_norm.append((x1 + x2) / (2 * width))   # box centre x, normalized
        y_norm.append((y1 + y2) / (2 * height))  # box centre y, normalized
        w_norm.append((x2 - x1) / width)         # box width, normalized
        h_norm.append((y2 - y1) / height)        # box height, normalized
    predictions_annotated = np.column_stack([classes, x_norm, y_norm, w_norm, h_norm])
    return predictions_annotated

This function takes in four parameters: predictions, classes, height, and width.

It loops over the predictions and calculates normalized values for the centre x-coordinate, centre y-coordinate, width, and height of each bounding box. It then stacks these values with the associated class label and returns the result in YOLO annotation format. Note that this is the format YOLOv5 uses for training, and it is the format most annotation tools export.
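As a quick sanity check on the arithmetic, here is the conversion for a single hypothetical box:

# A 1280x720 image with a 'car' (class 2) detected at x1=100, y1=200, x2=300, y2=400
x_c = (100 + 300) / (2 * 1280)  # 0.156250 (box centre x, as a fraction of image width)
y_c = (200 + 400) / (2 * 720)   # 0.416667 (box centre y, as a fraction of image height)
w_n = (300 - 100) / 1280        # 0.156250 (box width, as a fraction of image width)
h_n = (400 - 200) / 720         # 0.277778 (box height, as a fraction of image height)
# Resulting label line: "2 0.156250 0.416667 0.156250 0.277778"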

Now let's read an image, use the model to annotate it, and visualize the result with Roboflow. Note that I'm using the default yolov5s model, which can detect 80 classes.

im = cv2.imread("./cars.jpg")  # reading the input image
input image

Let's pass the image to the model and annotate it:

predictions, classes, height, width = perform_prediction(im, model)
converted_pred = convert_pred(predictions, classes, height, width)
np.savetxt('car1.txt', converted_pred, fmt='%.0f %.6f %.6f %.6f %.6f')  # class id as an integer, coordinates to 6 decimal places
annotation text file

Now let's visualize it on Roboflow.

Here we can see that our model has successfully annotated the input image.
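The same two functions scale directly to the bulk-annotation workflow described earlier. A sketch, assuming the unlabeled images live in a hypothetical ./unlabeled folder and one YOLO-format .txt file is written per image:

import glob

os.makedirs("labels", exist_ok=True)
for path in glob.glob("unlabeled/*.jpg"):
    image = cv2.imread(path)
    boxes, cls, h, w = perform_prediction(image, model)
    labels = convert_pred(boxes, cls, h, w)
    out = os.path.join("labels", Path(path).stem + ".txt")  # YOLO convention: image.jpg -> image.txt
    np.savetxt(out, labels, fmt='%.0f %.6f %.6f %.6f %.6f')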

You can find the notebook version in my GitHub repo from this link.
