Neural network architecture and accelerator co-design for high energy-efficient scenarios

Chen Weiwei*,**; Wang Ying*; Zhang Lei*

Article abstract

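Chen Weiwei, Wang Ying, Zhang Lei. Neural network architecture and accelerator co-design for high energy-efficient scenarios[J]. Chinese High Technology Letters (高技术通讯), 2022, 32(11): 1143-1152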


Neural network architecture and accelerator co-design for high energy-efficient scenarios

DOI: 10.3772/j.issn.1002-0470.2022.11.005

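Chinese keywords (translated): neural network architecture design; accelerator design; hardware/software co-design; design space exploration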

English keywords: neural network architecture search, accelerator design, hardware/software co-design, design space exploration


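Author	Affiliation
Chen Weiwei*,**	(*Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (**University of Chinese Academy of Sciences, Beijing 100049)
Wang Ying*	(*Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (**University of Chinese Academy of Sciences, Beijing 100049)
Zhang Lei*	(*Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (**University of Chinese Academy of Sciences, Beijing 100049)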


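Chinese abstract (translated):

Neural network algorithms and deep learning accelerators have become the two most important forces driving the application of deep learning. However, current neural network architecture design focuses mainly on metrics such as model accuracy and computation cost, and ignores how much the computational efficiency of different models varies on the target accelerator. Meanwhile, accelerators are usually optimized for a predefined set of neural network benchmarks and often fail to cover the network models that keep evolving afterwards, so they can perform poorly on new network architectures. At root, because neural network architectures and accelerators are designed in relatively independent flows, their design and optimization are mismatched, and the best achievable deep learning inference performance is not reached. To address this, this paper proposes a hardware/software co-design framework for network architectures and accelerators targeting image classification. The framework merges network architecture design and accelerator design into a unified design space and, under the given design constraints, automatically searches for the optimal co-design solution, enabling end-to-end customization and optimization of deep learning inference. Experiments on real image classification datasets and a systolic array architecture show that, compared with the conventional approach of optimizing the network architecture and the accelerator independently, the proposed co-design method reduces energy consumption by 40% on average.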

English abstract:

Neural network architectures and hardware accelerators have been two driving forces behind the rapid progress of deep learning. However, previous work has optimized either neural architectures given fixed hardware, or hardware given fixed neural architectures. Neural network architecture design focuses on accuracy and does not take the characteristics of the accelerator hardware into consideration. Accelerator design generally targets specific benchmarks and often does not support new network structures, which makes hardware design lag behind algorithm updates. At the same time, deep learning has a variety of application scenarios with different software and hardware requirements, so each scenario calls for a dedicated software and hardware design, which demands substantial labor and expert knowledge. This paper studies the importance of co-designing neural architectures and hardware accelerators. To this end, it proposes an automatic framework that jointly searches for the best configuration of both the neural network architecture and the accelerator. The framework combines the network architecture and accelerator design spaces and automatically searches for a co-design solution under the given design constraints, thus offering more room for performance gains than previous approaches that design the network and the accelerator separately. Experiments on a real image classification task show that, compared with previous methods that optimize the two separately, joint optimization reduces average energy consumption by 40% under the same accuracy constraint.
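The abstract describes the framework only at a high level. As a rough illustration of why searching a combined design space can beat the separate flow, the following minimal Python sketch enumerates a tiny, hypothetical joint space: three candidate networks with made-up layer shapes and accuracies, and three systolic-array shapes that all have 256 processing elements (PEs). The cycle and energy model (`layer_cycles`, `energy`), the benchmark network, and every number are assumptions for illustration only; this is not the paper's search algorithm or cost model, and it does not reproduce the reported 40% result.

```python
import itertools
import math
from dataclasses import dataclass

# Hypothetical energy cost of one PE for one cycle (arbitrary units).
E_PE_CYCLE = 1.0


@dataclass(frozen=True)
class Network:
    name: str
    # Each layer is lowered to a GEMM (M, K, N): M output pixels,
    # K = reduction length, N = output channels. Shapes are hypothetical.
    layers: tuple
    accuracy: float  # hypothetical top-1 accuracy


@dataclass(frozen=True)
class SystolicArray:
    rows: int
    cols: int


def layer_cycles(layer, hw):
    """Rough cycle count for one GEMM tiled onto a rows x cols array (toy model)."""
    m, k, n = layer
    tiles = math.ceil(k / hw.rows) * math.ceil(n / hw.cols)
    # Each tile streams m rows through the array, plus a fill/drain
    # overhead approximated as rows + cols cycles.
    return tiles * (m + hw.rows + hw.cols)


def energy(net, hw):
    """Every PE is charged every cycle, so padded partial tiles waste energy."""
    cycles = sum(layer_cycles(layer, hw) for layer in net.layers)
    return cycles * hw.rows * hw.cols * E_PE_CYCLE


def macs(net):
    return sum(m * k * n for m, k, n in net.layers)


def separate_design(nets, arrays, benchmark, target_acc):
    # 1) Hardware-agnostic architecture choice: fewest MACs meeting accuracy.
    net = min((n for n in nets if n.accuracy >= target_acc), key=macs)
    # 2) Accelerator tuned only for a fixed benchmark network.
    hw = min(arrays, key=lambda a: energy(benchmark, a))
    return net, hw


def joint_design(nets, arrays, target_acc):
    # Exhaustively search the product of both design spaces under the
    # accuracy constraint and keep the lowest-energy pair.
    feasible = [(n, a) for n, a in itertools.product(nets, arrays)
                if n.accuracy >= target_acc]
    return min(feasible, key=lambda pair: energy(*pair))


# Toy design spaces (all values hypothetical).
NETS = [
    Network("net_a", ((784, 72, 24), (196, 216, 40)), 0.75),
    Network("net_b", ((784, 64, 32), (196, 256, 48)), 0.76),
    Network("net_c", ((196, 72, 16),), 0.60),
]
ARRAYS = [SystolicArray(8, 32), SystolicArray(16, 16), SystolicArray(32, 8)]
BENCHMARK = Network("benchmark", ((784, 1152, 128),), 0.0)
TARGET_ACC = 0.75

separate = separate_design(NETS, ARRAYS, BENCHMARK, TARGET_ACC)
joint = joint_design(NETS, ARRAYS, TARGET_ACC)
print("separate:", separate[0].name, separate[1], energy(*separate))
print("joint:   ", joint[0].name, joint[1], energy(*joint))
```

Because the pair chosen by the separate flow is itself a feasible point of the joint space, the exhaustive joint search can never return a higher-energy design under the same accuracy constraint. A practical framework would likely need a smarter search strategy than brute-force enumeration and a calibrated accelerator cost model in place of this toy one.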
