Research Highlights: YOLO Revisited

In the insideBIGDATA Research Highlights column we take a look at new and upcoming results from the research community for data science, machine learning, AI and deep learning. Our readers need a glimpse of the technology coming down the pipeline that will make their efforts more strategic and competitive. In this installment we review a new update of the highly acclaimed real-time object detector “You Only Look Once,” or YOLO, that is more accurate than ever.

Researchers Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao at the Institute of Information Science, Academia Sinica, in Taiwan offer YOLOv4, the first version not to include the technique’s original authors.

Rapid inference is YOLO’s primary advantage. The authors prioritized newer techniques that improve accuracy without slowing inference, their so-called “bag of freebies.” YOLOv4 also includes improvements that boost accuracy at a minimal cost to speed, the “bag of specials.” All told, these enhancements enable the new version to outperform both its predecessor and high-accuracy competitors that run at real-time frame rates.

YOLO, like most object detectors since, attaches a model that predicts bounding boxes and classes to a feature extractor pre-trained on ImageNet.

  • Techniques under the heading “bag of freebies” boost accuracy by adding computation during training. These include alternate bounding box loss functions, data augmentation, and decreasing the model’s confidence for ambiguous classes.
  • The authors introduce new data augmentation techniques such as Mosaic, which mixes elements drawn from four training images to place objects in novel contexts.
  • “Bag of specials” techniques include the choice of activation function: ReLU variants are marginally slower, but they can yield better accuracy.
  • The researchers accommodate users with limited hardware resources by choosing techniques that allow training on a single, reasonably affordable GPU.
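
To make three of the techniques above concrete, here is a minimal sketch in plain Python. It is illustrative only, not the authors' implementation. Label smoothing is one common way to decrease confidence in the labeled class, Mish is the activation the YOLOv4 paper adopts in its backbone, and the `mosaic` function is a heavily simplified stand-in for Mosaic. The function names, the `epsilon=0.1` default, and the assumption of four equally sized square images are choices made for this example.

```python
import math
import random


def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: shift `epsilon` of the probability mass from the
    labeled class to a uniform distribution over all classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    # For large x, softplus(x) is numerically equal to x; skip exp() to avoid overflow.
    softplus = math.log1p(math.exp(x)) if x < 20 else x
    return x * math.tanh(softplus)


def relu(x):
    """Plain ReLU, shown for comparison with Mish."""
    return max(0.0, x)


def mosaic(images, size, rng=random):
    """Simplified Mosaic: combine four size-by-size images (nested lists)
    into one image by splitting it into quadrants at a random point.

    Assumption for this sketch: each quadrant is copied from the same
    coordinates of its source image. Real implementations also rescale
    the images and remap the bounding-box labels.
    """
    if len(images) != 4:
        raise ValueError("mosaic needs exactly four images")
    cx = rng.randint(1, size - 1)  # column where the left/right split falls
    cy = rng.randint(1, size - 1)  # row where the top/bottom split falls
    out = [[0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            # 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right
            idx = (0 if row < cy else 2) + (0 if col < cx else 1)
            out[row][col] = images[idx][row][col]
    return out


if __name__ == "__main__":
    print(smooth_labels([0, 0, 1, 0]))
    print(mish(-1.0), relu(-1.0))
    tiles = [[[v] * 4 for _ in range(4)] for v in (1, 2, 3, 4)]
    for line in mosaic(tiles, 4, random.Random(0)):
        print(line)
```

Unlike ReLU, Mish lets small negative values pass through rather than zeroing them, at the cost of evaluating `exp`, `log`, and `tanh` per activation. Label smoothing and Mosaic only change the training data and targets, so they add nothing to inference time.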

The researchers pitted YOLOv4 against other object detectors that process at least 30 frames per second, using the COCO image data set. YOLOv4 achieved 0.435 average precision (AP), running at 62 frames per second (FPS). It achieved 0.41 AP at its maximum rate of 96 FPS. The previous state of the art, EfficientDet, achieved 0.43 AP running at nearly 42 FPS and 0.333 AP at its top speed of 62 FPS.
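
For readers unfamiliar with the metric: COCO's headline AP scores a detection by how much its box overlaps the ground-truth box, measured as intersection over union (IoU), and averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only that box-matching step, assuming the figures above are this standard COCO AP. The function names and the `(x1, y1, x2, y2)` corner format are choices made for this example, and the full metric also involves ranking detections by confidence and computing precision and recall per class.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 used by COCO's headline AP.
COCO_IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def thresholds_matched(pred, truth):
    """Count how many COCO IoU thresholds a predicted box satisfies."""
    score = iou(pred, truth)
    return sum(1 for t in COCO_IOU_THRESHOLDS if score >= t)


if __name__ == "__main__":
    pred, truth = (0, 0, 10, 10), (1, 0, 11, 10)
    print(iou(pred, truth), thresholds_matched(pred, truth))
```

A prediction that is slightly offset still counts as a match at the looser thresholds but fails the stricter ones, which is why a detector's AP rewards tight localization as well as correct classification.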


YOLOv4 locates and classifies objects faster than measured human performance. While it’s not as accurate as slower networks such as the larger EfficientDet models, the new version boosts accuracy without sacrificing speed.
