A Highly-Parallel AI Accelerator Architecture for Convolution and Activation, Implemented in Verilog
DOI: https://doi.org/10.61173/fmwnqv14

Keywords: Convolutional neural networks, FPGA acceleration, hardware implementation, LeNet-5, edge computing

Abstract
LeNet-5 is a classic Convolutional Neural Network (CNN) model whose core layers, such as the C1 convolution layer, remain constrained by hardware resources (computing power, power consumption, and storage bandwidth) when performing real-time inference in embedded and edge computing scenarios. To break through this bottleneck and improve the computational efficiency of artificial intelligence in resource-constrained environments, this study designs a dedicated hardware accelerator for the first convolution layer (C1) of LeNet-5 and its subsequent Rectified Linear Unit (ReLU) activation. A highly parallel convolution computing architecture is constructed, enabling synchronous operation and data reuse across multiple groups of convolution units (CUs), which significantly improves computational throughput and energy efficiency. Experimental results show that, while keeping Field Programmable Gate Array (FPGA) resource consumption under control, the accelerator achieves a significant inference speedup over a pure software implementation, verifying the technical feasibility and engineering advantages of implementing the basic operators of convolutional neural networks in hardware. This research not only provides a reusable hardware prototype and an optimization path for the efficient deployment of CNN models on embedded terminals, but also lays a theoretical and technical foundation for the industrial application of artificial intelligence edge computing, giving it both academic and practical value.
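To make the accelerated operation concrete, the following is a minimal software reference model of what the C1 + ReLU stage computes: a valid (no-padding) 2-D convolution of a 32×32 input with six 5×5 kernels, followed by ReLU. This sketch is for illustration only; the kernel count, input size, and function names reflect the standard LeNet-5 C1 configuration, not code from the paper's Verilog implementation, and the hardware version would parallelize the inner window multiplications across CUs rather than loop over them.

```python
import numpy as np

def conv_relu_c1(image, kernels, biases):
    """Reference model of LeNet-5 C1 + ReLU (valid convolution).

    image:   (32, 32) input feature map
    kernels: (6, 5, 5) filter weights, one per output channel
    biases:  (6,) per-channel bias
    returns: (6, 28, 28) activated output feature maps
    """
    H, W = image.shape
    n, k, _ = kernels.shape
    out = np.zeros((n, H - k + 1, W - k + 1))
    for f in range(n):                      # each filter -> one output map
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                # one 5x5 window: 25 multiply-accumulates, done
                # in parallel by the CU array in hardware
                window = image[i:i + k, j:j + k]
                out[f, i, j] = np.sum(window * kernels[f]) + biases[f]
    return np.maximum(out, 0.0)             # ReLU: clamp negatives to zero

if __name__ == "__main__":
    img = np.ones((32, 32))
    ker = np.ones((6, 5, 5))
    print(conv_relu_c1(img, ker, np.zeros(6)).shape)  # (6, 28, 28)
```

A model like this is also useful as a golden reference when verifying the Verilog design: the same stimulus can be fed to both, and the FPGA outputs compared element-by-element against the software result.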