COMPARATIVE ANALYSIS OF REAL-TIME COMPUTER VISION SYSTEMS

Elviz Ismayilov, Hasanrza Hasanli

Volume: 27

No.: 6

Year:

Pages: 142-147

DOI: 10.32010/IXNR6858

Abstract

Real-time object detection remains a significant challenge in modern computer vision, particularly in autonomous driving, intelligent surveillance, robotics, and innovative manufacturing. This work presents a comparative study of two widely used, high-performing object detection models: the recently released You Only Look Once version 12 (YOLOv12), known for its high processing speed and ease of use, and Faster R-CNN with a ResNet-50-FPN backbone, a two-stage detector valued for its accuracy and feature-extraction performance. The main objective is to evaluate both models for real-time applications in terms of inference speed, computational complexity, parameter count, and overall efficiency. Both models were tested under the same experimental conditions on a benchmark of test images in a GPU-based Google Colab environment, and were compared on average inference time in seconds, frames per second (FPS), total parameter count in millions, and floating-point operations (FLOPs). The experiment showed that, despite YOLOv12's architectural optimizations for real-time performance, Faster R-CNN surprisingly achieved a higher FPS and a lower inference time in the given configuration. YOLOv12, in contrast, showed considerably higher model complexity, which may make it better suited to generalized performance in more complex or diverse deployment situations. The results indicate that Faster R-CNN suits research-oriented applications that prioritize high accuracy and can tolerate a slight increase in inference time. YOLOv12, by comparison, offers greater versatility for edge computing platforms and real-time processing use cases because of its modular architecture, light weight, and compatibility with hardware acceleration. This side-by-side analysis clarifies the trade-offs among precision, speed, and computational requirements and supports more informed model selection for real-time computer vision applications.
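The two timing metrics named in the abstract, average inference time and FPS, are related by FPS = 1 / (average inference time per image). The following is a minimal sketch of how such a measurement might be taken, not the authors' benchmarking script. The function name `benchmark`, its parameters, and the `infer` callable (a stand-in for either detector's forward pass) are hypothetical names introduced for illustration.

```python
import time
from statistics import mean
from typing import Any, Callable, Dict, List


def benchmark(
    infer: Callable[[Any], Any],
    images: List[Any],
    warmup: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Return average per-image latency in seconds and the derived FPS.

    `infer` is a hypothetical stand-in for a detector's forward pass.
    """
    if not images:
        raise ValueError("images must be non-empty")

    # Warm-up calls are excluded from timing so that one-time setup costs
    # do not distort the average.
    for image in images[:warmup]:
        infer(image)

    latencies = []
    for image in images:
        start = clock()
        # Assumption: `infer` returns only after the work is finished. On a
        # GPU, execution is typically asynchronous, so a real harness would
        # need to synchronize the device before reading the clock.
        infer(image)
        latencies.append(clock() - start)

    avg = mean(latencies)
    fps = 1.0 / avg if avg > 0 else float("inf")
    return {"avg_inference_s": avg, "fps": fps}
```

Passing the same image list, warm-up count, and hardware to `benchmark` for both models keeps the comparison like-for-like, which is the condition the abstract describes. The injectable `clock` argument exists only so the function can be checked deterministically.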

Keywords

Real-time object detection; YOLOv12; Faster R-CNN; ResNet-50-FPN; inference speed; frames per second (FPS); model complexity; computational cost; deep learning; edge computing; two-stage detector; single-stage detector; computer vision; performance comparison
