Abstract

YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO’s evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with transformers. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO’s development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.

Keywords

Computer science, Architecture, Artificial intelligence, Object detection, Robotics, Systems engineering, Human–computer interaction, Engineering, Robot, Geography

Publication Info

Year: 2023
Type: review
Volume: 5
Issue: 4
Pages: 1680-1716
Citations: 1932
Access: Closed

Citation Metrics

OpenAlex: 1932
Influential: 86
CrossRef: 1950

Cite This

Juan Terven, Diana‐Margarita Córdova‐Esparza, Julio-Alejandro Romero-González (2023). A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Machine Learning and Knowledge Extraction, 5(4), 1680-1716. https://doi.org/10.3390/make5040083

Identifiers

DOI: 10.3390/make5040083
arXiv: 2304.00501
