Article Abstract
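Zhang Liguo, Huang Wenhan, Jin Mei. FPGA实现卷积神经网络加速器 (Implementation of a convolutional neural network on an FPGA) [J]. 高技术通讯 (Chinese High Technology Letters, Chinese edition), 2023, 33(10): 1060-1067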
Implementation of a convolutional neural network on an FPGA
(Chinese title: FPGA实现卷积神经网络加速器, literally "FPGA implementation of a convolutional neural network accelerator")
  
DOI: 10.3772/j.issn.1002-0470.2023.10.006
Keywords: field programmable gate array (FPGA); convolutional neural network (CNN); hardware accelerator; parallelism
Authors and affiliations
Zhang Liguo (School of Electrical Engineering, Yanshan University, Qinhuangdao 066004)
Huang Wenhan
Jin Mei
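Chinese abstract (translated):
      The traditional application platforms for convolutional neural networks are the central processing unit (CPU) and the graphics processing unit (GPU), whose size and power consumption are unsuitable for lightweight applications, while the development cost of dedicated accelerators on lightweight application-specific integrated circuit (ASIC) platforms cannot keep pace with increasingly complex and deep network structures. To address these problems, a convolutional neural network (CNN) accelerator based on a field programmable gate array (FPGA) is designed that both suits lightweight application scenarios and has a low development cost. A floating-point adder and a floating-point multiplier are designed and combined into the basic arithmetic unit of the convolution operation, so that a 16-bit floating-point multiply-accumulate operation consumes only one digital signal processing (DSP) resource. An activation layer module based on the ReLU function is designed around the computational characteristics of the FPGA. Each layer module is designed with adjustable parallelism, so that performance, power consumption and area can be balanced according to the resources of the platform. A SoftMax module simplified with comparators is also designed. Experimental results show that at an operating frequency of 100 MHz, the peak computing throughput reaches 44.8 GFLOPS at a power of only 4.51 W.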
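The abstract states that each FP16 multiply-accumulate (MAC) uses one DSP resource and that every layer has adjustable parallelism. The Python sketch below is a software model under stated assumptions, not the authors' design: it rounds both the product and the running sum to FP16 (the abstract does not describe the actual rounding or fusing scheme), treats `parallel_macs` as a hypothetical number of MAC units that each retire one MAC per cycle, and counts one MAC as 2 FLOPs. Under those assumptions, 44.8 GFLOPS at 100 MHz corresponds to 448 FLOPs, or 224 MACs, per cycle. All function names are hypothetical.

```python
import math
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.
    struct raises OverflowError if x is outside the FP16 range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fp16_mac(acc: float, a: float, b: float) -> float:
    """One MAC step; the product and the sum are each rounded to FP16
    (assumption: non-fused rounding, as in a separate multiplier + adder)."""
    return to_fp16(acc + to_fp16(a * b))

def fp16_dot(xs, ws) -> float:
    """Dot product built from repeated FP16 MAC steps (one convolution window)."""
    acc = 0.0
    for x, w in zip(xs, ws):
        acc = fp16_mac(acc, x, w)
    return acc

def conv_layer_cycles(out_h, out_w, out_c, in_c, k, parallel_macs):
    """Ideal cycle count for one k x k convolution layer when `parallel_macs`
    MAC units run concurrently; memory stalls and pipeline fill are ignored."""
    total_macs = out_h * out_w * out_c * in_c * k * k
    return math.ceil(total_macs / parallel_macs)

def peak_gflops(parallel_macs, freq_hz, flops_per_mac=2):
    """Peak throughput if every MAC unit is busy on every cycle."""
    return parallel_macs * flops_per_mac * freq_hz / 1e9
```

For example, a 3×3 convolution with 8 input channels producing a 28×28×16 output needs 903,168 MACs, so `conv_layer_cycles(28, 28, 16, 8, 3, 224)` gives 4,032 cycles (about 40 µs at 100 MHz) in this idealized model. Raising or lowering `parallel_macs` trades DSP usage against latency, which is the performance/area balance the abstract describes.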
English abstract:
      The traditional application platforms for convolutional neural networks are the central processing unit (CPU) and the graphics processing unit (GPU), whose size and power consumption are unsuitable for lightweight industries, while the development cost of lightweight application-specific integrated circuits (ASICs) cannot keep up with increasingly complex and deep network structures. To address these problems, a convolutional neural network (CNN) hardware accelerator based on a field programmable gate array (FPGA) is designed that suits lightweight application scenarios while keeping development cost low. A floating-point adder and a floating-point multiplier are designed and combined into the basic unit of the convolution operation, so that a 16-bit floating-point multiply-accumulate operation consumes only one digital signal processing (DSP) resource. An activation layer module based on the ReLU function is designed for the computing characteristics of the FPGA. Each layer module has adjustable parallelism, which allows performance, power consumption, and area to be balanced according to the available platform resources. A SoftMax module simplified with comparators is also designed. Experimental results show that at an operating frequency of 100 MHz, the peak computing throughput reaches 44.8 GFLOPS with a power consumption of only 4.51 W.
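Two of the simplifications named in the abstract can be illustrated in software. For ReLU on FP16 data, one plausible FPGA-friendly formulation (an assumption; the abstract does not give the circuit) is to test the sign bit of the 16-bit word and output +0 when it is set, which needs no arithmetic at all. For SoftMax, because exp is strictly increasing and every output shares the same normalizing sum, the class with the largest logit also has the largest SoftMax probability, so inference that only needs the predicted class can replace SoftMax with a tree of comparators. The helper names below are hypothetical.

```python
import math
import struct

def fp16_bits(x: float) -> int:
    """Return the IEEE 754 half-precision bit pattern of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def relu_fp16_bits(bits: int) -> int:
    """ReLU on a raw FP16 word (assumed formulation): if the sign bit
    (bit 15) is set, output +0; otherwise pass the word through unchanged."""
    return 0 if bits & 0x8000 else bits

def argmax_comparator_tree(logits):
    """Pairwise comparator tree: each level halves the candidates, as a
    hardware tree of comparators would. Ties keep the lower index."""
    cands = list(enumerate(logits))
    while len(cands) > 1:
        nxt = []
        for i in range(0, len(cands) - 1, 2):
            a, b = cands[i], cands[i + 1]
            nxt.append(a if a[1] >= b[1] else b)
        if len(cands) % 2:  # odd candidate passes straight to the next level
            nxt.append(cands[-1])
        cands = nxt
    return cands[0][0]

def softmax(xs):
    """Reference SoftMax, used only to check that the argmax agrees."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]
```

Checking `argmax_comparator_tree(logits)` against the index of the largest `softmax(logits)` entry shows that both select the same class, while the comparator tree avoids exponentials and division entirely. If calibrated probabilities are needed downstream, this shortcut does not apply.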