Deploying YoloV5 Simplified

Imtiaz Ul Hassan
4 min read · Mar 13, 2023

In this article, I will write a simple script that uses a YOLOv5 object detection model for a custom task. Suppose you have trained a YOLOv5 model to detect dogs and cats, and you want a Python script that plays an audio file corresponding to the detected object: if a cat is detected, the script should play a cat-related audio file, and so on.

To achieve this, you will need to be able to pass an image to your trained model and receive information about the detected class. We will accomplish this using the following steps.

First of all, you should set up YOLOv5 locally by cloning the ultralytics/yolov5 repository and installing its requirements; you can follow this link for a Windows walkthrough. Once you are done setting up YOLOv5, create a Jupyter notebook inside the YOLOv5 directory. I have written the following for CPU. Let's code it.

In the following code we import the required packages, point Python at the local YOLOv5 repository, and then load the YOLOv5s model.

import os
import sys
from pathlib import Path

import cv2
import numpy as np
import torch
from PIL import Image

# Add the local YOLOv5 repository to the Python path so its modules can be imported
ROOT = '/home/rcai/Desktop/yolov5'
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # make it relative to the working directory

from models.common import DetectMultiBackend
from utils.augmentations import letterbox
from utils.dataloaders import LoadImages
from utils.general import check_img_size, non_max_suppression, scale_boxes
from utils.plots import Annotator, colors
from utils.torch_utils import select_device

device = select_device('cpu')  # pass '0' instead of 'cpu' if you have a GPU
model = DetectMultiBackend('yolov5s.pt', device=device, dnn=False, data='data/coco128.yaml')
stride, names, pt = model.stride, model.names, model.pt
imgsz = check_img_size((640, 640), s=stride)  # make sure the image size is a multiple of the stride
# Note: class filtering is done later via the `classes` argument of non_max_suppression
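Here names maps class indices to labels, taken from data/coco128.yaml. A quick way to check which index corresponds to which label (the indices below are the standard COCO ones):

print(names[0], names[2])  # 'person' and 'car' in the COCO class list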


# LoadImages is YOLOv5's built-in image/video loader; we create it here for reference,
# but below we prepare the image by hand to show each step
dataset = LoadImages('./car.jpg', img_size=imgsz, stride=stride, auto=pt)
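If you prefer to use this loader directly, iterating over it yields both the letterboxed array and the original image, so the manual preparation below would not be needed. A minimal sketch, based on how YOLOv5's own detect.py consumes the loader (the exact tuple shape can vary between YOLOv5 versions):

for path, im, im0s, vid_cap, s in dataset:
    print(path, im.shape, im0s.shape)  # letterboxed CHW array and the original HWC image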

# Preparing an image to pass to YOLOv5
img0 = cv2.imread('./car.jpg')  # read the input image; OpenCV loads it in BGR order

# Convert BGR to RGB only for display; keep img0 in BGR, which is what the
# preprocessing below expects
im_pil = Image.fromarray(cv2.cvtColor(img0, cv2.COLOR_BGR2RGB))
im_pil  # display the image in the notebook

Now that we have read the input image, we prepare it for YOLOv5: letterbox-resize it, reorder the channels, and convert it from a NumPy array to a PyTorch tensor on the CPU.

img = letterbox(img0, 640, stride=stride, auto=True)[0]  # resize keeping the aspect ratio, pad to a stride multiple
img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
img = np.ascontiguousarray(img)

im = torch.from_numpy(img).to(device)  # convert the image into a CPU torch tensor
im = im.float()  # uint8 to fp32
im /= 255  # scale 0-255 to 0.0-1.0
if len(im.shape) == 3:
    im = im[None]  # add a batch dimension

pred = model(im, augment=False, visualize=False)  # run inference

Next we apply non-max suppression and print the output of the model.


pred = non_max_suppression(pred, conf_thres=0.45, iou_thres=0.45, classes=[0, 1, 2, 3, 4, 6], max_det=1000)  # keep only these COCO class indices
print(pred)

We can see that three objects have been detected: non-max suppression returns a tensor with three rows, one per detected object. The first four entries in each row give the location of the object as the (x1, y1, x2, y2) corners of its bounding box; these will be used to draw the bounding boxes. The fifth entry is the model's confidence in the detection; for the first row the model is about 91% confident. The last entry is the class index; here the class is 2, which represents a car in the COCO class list. Next we convert this tensor to a NumPy array so we can write conditions that perform different tasks based on the detected classes.
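Each row can also be unpacked directly, which is how YOLOv5's own detect.py consumes the detections. A minimal sketch:

for *xyxy, conf, cls in pred[0]:
    print(f'{names[int(cls)]}: {conf:.2f} at {[int(x) for x in xyxy]}')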

The following code converts the tensor into a NumPy array.

det = pred[0]  # detections for the first (and only) image in the batch
prediction = det.cpu().numpy()
print(prediction)
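Note that the box coordinates still refer to the 640-pixel letterboxed input, so to draw them on the original image they first have to be mapped back with scale_boxes. Here is a minimal sketch using YOLOv5's Annotator helper (the output filename is my own choice):

det_draw = pred[0].clone()  # work on a copy so `prediction` above stays untouched
annot = Annotator(img0.copy(), line_width=2)
if len(det_draw):
    # map boxes from the letterboxed input back to the original image size
    det_draw[:, :4] = scale_boxes(im.shape[2:], det_draw[:, :4], img0.shape).round()
    for *xyxy, conf, cls in det_draw:
        annot.box_label(xyxy, f'{names[int(cls)]} {conf:.2f}', color=colors(int(cls), True))
cv2.imwrite('car_annotated.jpg', annot.result())  # save the annotated image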

Now we can use this NumPy array however we like. For example, I can print "Car detected" whenever a car is detected, as follows.

prediction[:, -1]  # the last column holds the detected class indices

for c in prediction[:, -1]:
    if c == 2:
        print("Car detected")
