YOLO (Part 2) Object detection with YOLO & OpenCV

In the article Object detection with YOLO v4 we saw how to run YOLO with Darknet. This article takes an easier route: it uses the YOLO implementation provided by one of the best-known computer vision libraries, OpenCV. So, instead of downloading Darknet as we did in the previous article, or implementing YOLO “from scratch”, why not use an existing, robust implementation? Agreed? Then let’s go …


In this article I will be using Python in Google Colab.

You still need to have downloaded the 3 files essential for YOLO:

  • The configuration (cfg file)
  • The pre-trained weights
  • The labels (coco.names)

I invite you to refer to my article Object detection with YOLO v4, where I explain how to get these files and copy them into Google Colab.

Once the notebook has been created in Colab, import the libraries as follows:

import numpy as np
import cv2
from google.colab.patches import cv2_imshow

Note that we will have to use the cv2_imshow function instead of cv2.imshow() because the latter is not supported in Colab.


We are now going to define the global variables used for configuration:

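ROOT_COLAB = '/content/drive/MyDrive/Colab Notebooks'
YOLO_CONFIG = ROOT_COLAB + '/YOLO/oc_data/'  # assumed location of the YOLO files under ROOT_COLAB
COCO_LABELS_FILE = YOLO_CONFIG + 'coco.names'
YOLO_CONFIG_FILE = YOLO_CONFIG + 'yolov3.cfg'  # assumed file name, matching yolov3.weights
YOLO_WEIGHTS_FILE = YOLO_CONFIG + 'yolov3.weights'
IMAGE_FILE = 'yoloimg.jpg'
IMAGE = cv2.imread(ROOT_COLAB + '/' + IMAGE_FILE)
CONFIDENCE_MIN = 0.5  # assumed minimum confidence, adjust as needed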

The ROOT_COLAB variable is the root of my Colab directories in Google Drive; you will certainly need to adapt it to your environment. I downloaded the YOLO files and put them in the /YOLO/oc_data/ directory.

To display the image without it taking up your entire screen, here is a handy little function that resizes it while keeping its proportions:

# Little function to resize an image while keeping its aspect ratio
# Cf. https://stackoverflow.com/questions/35180764/opencv-python-image-too-big-to-display
def ResizeWithAspectRatio(image, width=None, height=None, inter=cv2.INTER_AREA):
    dim = None
    image = image.copy()
    (h, w) = image.shape[:2]
    # nothing to do if no target size is given
    if width is None and height is None:
        return image
    if width is None:
        # scale from the requested height
        r = height / float(h)
        dim = (int(w * r), height)
    else:
        # scale from the requested width
        r = width / float(w)
        dim = (width, int(h * r))
    return cv2.resize(image, dim, interpolation=inter)

cv2_imshow(ResizeWithAspectRatio(IMAGE, width=700))
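
To make the arithmetic concrete, here is a minimal sketch of the dimension calculation alone, without OpenCV. The helper name scaled_size is hypothetical and is not part of the code above; it only mirrors how the target size is derived from one requested side.

```python
def scaled_size(w, h, width=None, height=None):
    """Return the (width, height) that preserves the aspect ratio of a w x h image."""
    if width is None and height is None:
        return (w, h)
    if width is None:
        # the height is imposed, the width follows
        r = height / float(h)
        return (int(w * r), height)
    # the width is imposed, the height follows
    r = width / float(w)
    return (width, int(h * r))

print(scaled_size(1920, 1080, width=700))   # (700, 393)
print(scaled_size(1920, 1080, height=540))  # (960, 540)
```

Note that the result is truncated by int(), so the ratio is preserved to within one pixel.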

We are now going to read the list of labels (classes) from the coco.names file:

with open(COCO_LABELS_FILE, 'rt') as f:
    labels = f.read().rstrip('\n').split('\n')

Then we will create a random color set for each class / label:

BOX_COLORS = np.random.randint(0, 255, size=(len(labels), 3), dtype="uint8")

This gives each class its own frame color, which is useful when there are many detections in the same image.
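
If you want the same colors from one run to the next, one option is to seed the random generator. The sketch below uses only the standard library; the function name make_box_colors and the seed value are assumptions for illustration, not something the original notebook defines.

```python
import random

def make_box_colors(n_classes, seed=42):
    """One random (B, G, R) tuple per class, each channel in 0..255."""
    rng = random.Random(seed)  # a fixed seed gives the same colors on every run
    return [tuple(rng.randint(0, 255) for _ in range(3)) for _ in range(n_classes)]

colors = make_box_colors(3)
# the color of a detection is looked up with its class ID
print(colors[1])
```

random.randint includes both bounds, whereas the upper bound of np.random.randint is exclusive.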

YOLO initialization through OpenCV

Here we are: we are now going to initialize the YOLO network (implemented in OpenCV) via the readNetFromDarknet() function:

yolo = cv2.dnn.readNetFromDarknet(YOLO_CONFIG_FILE, YOLO_WEIGHTS_FILE)

We then get the names of the output layers:

# getUnconnectedOutLayersNames() avoids index handling that differs between OpenCV versions
yololayers = yolo.getUnconnectedOutLayersNames()
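
The indices returned by getUnconnectedOutLayers() are 1-based, and depending on the OpenCV version they come back either nested (one index per row) or flat. Here is a minimal sketch of a lookup that copes with both shapes. Plain Python lists stand in for the OpenCV return values, and the function name and layer names are hypothetical.

```python
def output_layer_names(layer_names, out_indices):
    """Map 1-based output-layer indices to names; accepts [[3], [5]] or [3, 5]."""
    names = []
    for idx in out_indices:
        # unwrap a nested one-element entry such as [3]
        if isinstance(idx, (list, tuple)):
            idx = idx[0]
        # indices start at 1, Python lists start at 0
        names.append(layer_names[idx - 1])
    return names

layers = ["conv_0", "conv_1", "yolo_2", "conv_3", "yolo_4"]  # hypothetical names
print(output_layer_names(layers, [[3], [5]]))  # ['yolo_2', 'yolo_4']
print(output_layer_names(layers, [3, 5]))      # ['yolo_2', 'yolo_4']
```

With real NumPy arrays, flattening the index array first achieves the same effect.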

We are now going to send our image to the neural network, but for that we need to convert it into a blob:

blobimage = cv2.dnn.blobFromImage(IMAGE, 1 / 255.0, (416, 416), swapRB=True, crop=False)
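
To see what this conversion does, here is a rough sketch on a tiny image stored as nested lists. It only illustrates the pixel scaling, the blue/red swap and the batch, channel, height, width layout; it deliberately leaves out the resize to 416 x 416. The function name to_blob is hypothetical.

```python
def to_blob(image_bgr, divisor=255.0):
    """image_bgr: rows of (b, g, r) pixels. Returns a nested list shaped [1][3][H][W]."""
    planes = []
    for channel in (2, 1, 0):  # pick R, then G, then B (the effect of swapRB=True)
        # dividing by 255 mirrors the 1 / 255.0 scale factor
        planes.append([[pixel[channel] / divisor for pixel in row] for row in image_bgr])
    return [planes]  # leading batch dimension of size 1

tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (0, 0, 0)]]  # 2 x 2 BGR image: blue, green / red, black
blob = to_blob(tiny)
print(len(blob), len(blob[0]), len(blob[0][0]), len(blob[0][0][0]))  # 1 3 2 2
print(blob[0][0])  # red plane: [[0.0, 0.0], [1.0, 0.0]]
```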

Effective object detection

Now we just have to launch the detection with a single line of code:

layerOutputs = yolo.forward(yololayers)

The neural network has now done its job; let’s see how to retrieve and process its results. For this we will need 3 lists:

  • One for the bounding boxes of the detected objects
  • Another one for the confidence scores of these detections
  • One for the type of object (class / label) detected

YOLO returns all of this in bulk, and it is up to us to decide what to keep. For example, is an object detected with a confidence below 10% really of interest?

We will have to go through all these detected objects and filter / rework the information as needed:

boxes_detected = []
confidences_scores = []
labels_detected = []
# image height and width, used to scale the normalized box coordinates
(H, W) = IMAGE.shape[:2]
# loop over each of the layer outputs
for output in layerOutputs:
  # loop over each of the detections
  for detection in output:
    # extract the class ID and confidence (i.e., probability) of the current object detection
    scores = detection[5:]
    classID = np.argmax(scores)
    confidence = scores[classID]
    # Keep only predictions with a confidence above the CONFIDENCE_MIN threshold
    if confidence > CONFIDENCE_MIN:
      # Bounding box
      box = detection[0:4] * np.array([W, H, W, H])
      (centerX, centerY, width, height) = box.astype("int")
      # Use the center (x, y)-coordinates to derive the top-left corner of the bounding box
      x = int(centerX - (width / 2))
      y = int(centerY - (height / 2))
      # update our result lists (box, confidence score, class ID)
      boxes_detected.append([x, y, int(width), int(height)])
      confidences_scores.append(float(confidence))
      labels_detected.append(int(classID))

Above we have, for example, filtered on a minimum confidence score of CONFIDENCE_MIN. We have also converted the coordinates of the detected boxes so that they can be drawn later with the OpenCV cv2.rectangle function.
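
Here is the same decoding applied to a single detection row, written in plain Python so that each step can be checked by hand. A row holds the box center, width and height (all normalized between 0 and 1), then an objectness value, then one score per class. The function name decode_detection and the sample row with only three classes are made up for illustration.

```python
def decode_detection(detection, img_w, img_h, confidence_min):
    """Return ([x, y, w, h], class_id, confidence), or None if below the threshold."""
    scores = detection[5:]
    # index of the best class score (what np.argmax does)
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[class_id]
    if confidence <= confidence_min:
        return None
    # scale the normalized values to pixels
    center_x = int(detection[0] * img_w)
    center_y = int(detection[1] * img_h)
    width = int(detection[2] * img_w)
    height = int(detection[3] * img_h)
    # move from the box center to its top-left corner
    x = int(center_x - width / 2)
    y = int(center_y - height / 2)
    return [x, y, width, height], class_id, confidence

row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
print(decode_detection(row, 640, 480, 0.5))  # ([240, 120, 160, 240], 1, 0.8)
print(decode_detection(row, 640, 480, 0.9))  # None
```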

Also be careful: the classes (labels) returned by YOLO are IDs, not the common names. You need the label list read earlier to display them:

label_names = [labels[i] for i in labels_detected]
['laptop', 'cell phone']
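
As a small self-contained illustration of this lookup, the sketch below parses a three-line excerpt held in a string instead of the real coco.names file. The excerpt and the detected IDs are invented for the example.

```python
names_text = "person\nbicycle\ncar\n"  # hypothetical excerpt, one name per line
labels = names_text.rstrip('\n').split('\n')

labels_detected = [2, 0]  # class IDs as they would come out of the detection loop
label_names = [labels[i] for i in labels_detected]
print(label_names)  # ['car', 'person']
```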

Show the result …

Enough suspense! We will display the result by drawing the colored frames (one color per class) on a copy of the source image. We will of course add the confidence score of each detection:

image = IMAGE.copy()
# number of detections kept after filtering
nb_results = len(boxes_detected)
if nb_results > 0:
  for i in range(nb_results):
    # extract the bounding box coordinates
    (x, y) = (boxes_detected[i][0], boxes_detected[i][1])
    (w, h) = (boxes_detected[i][2], boxes_detected[i][3])
    # draw a bounding box rectangle and label on the image
    color = [int(c) for c in BOX_COLORS[labels_detected[i]]]
    cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
    score = str(round(float(confidences_scores[i]) * 100, 1)) + "%"
    text = "{}: {}".format(labels[labels_detected[i]], score)
    cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
cv2_imshow(ResizeWithAspectRatio(image, width=700))
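
The caption logic can be isolated and tried without an image. The sketch below reproduces the text formatting used above and adds one optional refinement: when a box touches the top of the image, y - 5 becomes negative and the caption would be drawn outside the picture, so the position is clamped. The helper names and the min_y value of 15 pixels are assumptions, not part of the original code.

```python
def format_label(name, confidence):
    """Build the caption drawn next to a box, e.g. 'laptop: 50.0%'."""
    score = str(round(float(confidence) * 100, 1)) + "%"
    return "{}: {}".format(name, score)

def label_origin(x, y, margin=5, min_y=15):
    """Place the caption above the box, but keep it inside the image near the top edge."""
    return (max(x, 0), max(y - margin, min_y))

print(format_label("laptop", 0.5))  # laptop: 50.0%
print(label_origin(120, 80))        # (120, 75)
print(label_origin(120, 3))         # (120, 15)
```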

Fast and efficient, isn’t it? Don’t hesitate to play around with the CONFIDENCE_MIN threshold as well. You will find that fairly low thresholds can produce some surprising detections …
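
To get a feel for the effect of the threshold, you can count how many detections survive at several values. The sketch below does this on a list of made-up confidence scores; with a real image you would run it on the raw scores before filtering. The function name count_kept is hypothetical.

```python
def count_kept(confidences, threshold):
    """Number of detections whose confidence is strictly above the threshold."""
    return sum(1 for c in confidences if c > threshold)

raw_confidences = [0.97, 0.62, 0.31, 0.12, 0.08]  # made-up scores for illustration
for threshold in (0.1, 0.3, 0.5, 0.9):
    print(threshold, count_kept(raw_confidences, threshold))
```

The lower the threshold, the more boxes are kept, including the doubtful ones.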

Benoit Cayla

In more than 15 years, I have built up solid experience around various integration projects (data & applications). I have worked in nine different companies and successively adopted the vision of the service provider, the customer and the software editor. This experience, which made me almost omniscient in my field, naturally led me to be involved in large-scale projects around the digitalization of business processes, mainly in sectors such as insurance and finance. Really passionate about AI (Machine Learning, NLP and Deep Learning), I joined Blue Prism in 2019 as a pre-sales solution consultant, where I can combine my subject matter skills with automation to help my customers automate complex business processes more efficiently. In parallel with my professional activity, I run a blog aimed at showing how to understand and analyze data as simply as possible: datacorner.fr. Learning, convincing through arguments and passing on my knowledge could be my characteristic triptych.
