COMPARATIVE ANALYSIS OF REAL-TIME COMPUTER VISION SYSTEMS
DOI: 10.32010/IXNR6858
Abstract
Real-time object detection is a significant challenge in modern computer vision, especially in areas such as autonomous driving, intelligent surveillance systems, robotics, and innovative manufacturing processes. This work presents a comparative study of two widely used, high-performing object detection models: the newly released You Only Look Once version 12 (YOLOv12), known for its high processing speed and ease of use, and Faster R-CNN with a ResNet-50-FPN backbone, a two-stage detector valued for its accuracy and the quality of its feature extraction. The main objective is to evaluate both models for real-time applications in terms of inference speed, computational complexity, number of parameters, and overall efficiency. To this end, both models were tested under the same experimental conditions on a test image benchmark in a GPU-based Google Colab environment. They were compared on average inference time in seconds, frames per second (FPS), total parameter count in millions, and floating-point operations (FLOPs). Despite YOLOv12's architectural optimizations for real-time performance, Faster R-CNN surprisingly achieved a higher FPS and a lower inference time in the given configuration. YOLOv12, however, showed considerably higher model complexity, which may make it better suited to generalized performance in more complex or diverse deployment scenarios. The results indicate that Faster R-CNN suits research-oriented applications that prioritize high accuracy and can tolerate a slight increase in inference time.
In contrast, YOLOv12 offers greater versatility for edge computing platforms and real-time processing use cases owing to its modular architecture, lightweight design, and compatibility with hardware acceleration. This side-by-side analysis clarifies the trade-offs between precision, speed, and computational requirements, and supports better-informed model selection for real-time computer vision applications.
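The two speed metrics named above are directly related: with an average per-image inference time of t seconds, FPS = 1 / t. The following is a minimal sketch of how such a measurement could be taken. It is not the authors' benchmarking code. The `benchmark` helper, its `warmup` and `runs` parameters, and the `dummy_model` stand-in are hypothetical names introduced only for illustration.

```python
import time
from statistics import mean
from typing import Callable, Dict


def benchmark(infer: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time a zero-argument inference callable; report average latency and FPS."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    # Warm-up calls are excluded so one-time setup costs do not skew the average.
    for _ in range(warmup):
        infer()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        latencies.append(time.perf_counter() - start)
    avg = mean(latencies)
    # FPS is the reciprocal of the average per-call latency.
    fps = 1.0 / avg if avg > 0 else float("inf")
    return {"avg_inference_s": avg, "fps": fps}


def dummy_model() -> int:
    # Hypothetical stand-in for a detector's forward pass on one image.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    stats = benchmark(dummy_model)
    print(f"avg inference time: {stats['avg_inference_s']:.6f} s, FPS: {stats['fps']:.1f}")
```

When the callable runs on a GPU, kernel launches are typically asynchronous, so a real measurement would also need to synchronize the device before each timer read. That step is omitted here to keep the sketch free of framework dependencies.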
Keywords
Real-time object detection
YOLOv12
Faster R-CNN
ResNet-50-FPN
inference speed
frames per second (FPS)
model complexity
computational cost
deep learning
edge computing
two-stage detector
single-stage detector
computer vision
performance comparison