Attention-Centric YOLOv12 for Real-Time Fine-Grained Waste Detection in the TACO Dataset

Authors

  • Hongye Wu

DOI:

https://doi.org/10.61173/aexy0p38

Keywords:

YOLOv12, Waste detection, TACO dataset, Attention mechanism, Real-time inference

Abstract

Efficient waste detection is crucial for environmental sustainability, yet existing models struggle with fine-grained objects in complex backgrounds, such as those in the TACO dataset. This paper proposes an attention-centric approach using YOLOv12n to balance detection accuracy and real-time performance. On the TACO dataset, YOLOv12n achieves a mean Average Precision at an IoU threshold of 0.5 (mAP50) of 0.376 with an end-to-end inference speed of 94.33 FPS on an NVIDIA RTX 5060 GPU. Ablation studies show that removing the Area-Attention (A2) module causes mAP50 to fall from 0.376 to 0.148. Furthermore, compared to the YOLOv8n model with a plug-in CBAM module (46.18 FPS, 0.310 mAP), the native attention-centric architecture of YOLOv12n roughly doubles inference speed and provides superior feature localization. These results indicate that a native attention-based design is more effective for real-time fine-grained waste detection than attention modules added as plug-ins.
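
The comparisons in the abstract can be checked with a few lines of arithmetic. The sketch below is only illustrative: the dictionary keys and helper names are hypothetical, and it assumes the 0.310 mAP reported for YOLOv8n with CBAM is an mAP50 value comparable to the YOLOv12n figure, which the abstract does not state explicitly.

```python
# Figures as reported in the abstract (TACO dataset, NVIDIA RTX 5060 GPU).
# Key names are illustrative labels, not identifiers from the paper.
REPORTED = {
    "yolov12n": {"map50": 0.376, "fps": 94.33},
    "yolov12n_no_a2": {"map50": 0.148},  # Area-Attention (A2) removed
    # Assumption: the abstract's "0.310 mAP" is treated here as mAP50.
    "yolov8n_cbam": {"map50": 0.310, "fps": 46.18},
}


def relative_change(new: float, base: float) -> float:
    """Return the fractional change of `new` relative to `base`."""
    return (new - base) / base


def latency_ms(fps: float) -> float:
    """Convert frames per second to average per-frame latency in ms."""
    return 1000.0 / fps


# Throughput ratio of YOLOv12n over YOLOv8n + CBAM.
speedup = REPORTED["yolov12n"]["fps"] / REPORTED["yolov8n_cbam"]["fps"]

# Relative mAP50 change when the A2 module is removed (negative = drop).
ablation_drop = relative_change(
    REPORTED["yolov12n_no_a2"]["map50"], REPORTED["yolov12n"]["map50"]
)

# Relative mAP50 gain of YOLOv12n over YOLOv8n + CBAM.
map_gain = relative_change(
    REPORTED["yolov12n"]["map50"], REPORTED["yolov8n_cbam"]["map50"]
)

if __name__ == "__main__":
    print(f"speedup: {speedup:.2f}x")
    print(f"A2 ablation, relative mAP50 change: {ablation_drop:.1%}")
    print(f"mAP50 gain over YOLOv8n + CBAM: {map_gain:.1%}")
    print(f"latency per frame: {latency_ms(94.33):.2f} ms vs {latency_ms(46.18):.2f} ms")
```

Under these assumptions, the throughput ratio is about 2.04, removing A2 costs about 61% of the mAP50, and YOLOv12n's mAP50 is about 21% higher than the YOLOv8n + CBAM figure in relative terms.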

Published

2026-04-24

Section

Articles