Hardware-Efficient Accelerator Design for Depthwise Separable Convolution

许浩博* **; 王颖*; 王郁杰*; 张士长* **; 刘博生* **; 韩银和*

Article Summary

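Citation: 许浩博, 王颖, 王郁杰, 张士长, 刘博生, 韩银和. Hardware-efficient accelerator design for depthwise separable convolution [J]. 高技术通讯 (Chinese High Technology Letters, Chinese edition), 2021, 31(8): 791-799.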

Hardware efficient accelerator for depthwise separate convolution

DOI: 10.3772/j.issn.1002-0470.2021.08.001

Keywords: depthwise separable (DS) convolution; accelerator; low area; low latency; utilization

Abstract (translated from the Chinese version):

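Replacing standard convolution with depthwise separable (DS) convolution has become a trend in lightweight neural network design. However, because DS convolution has irregular data dimensions and data sizes, existing convolutional neural network accelerators cannot guarantee computational parallelism or processing element (PE) utilization when they process such networks, which degrades accelerator performance. To address this problem, this paper proposes a channel-oriented computing dataflow under which Depthwise convolution, Pointwise convolution, and standard convolution, despite their different data dimensions, can all be computed in a unified way. Based on this dataflow, an accelerator for DS convolution is designed. It uses a single unified computing core to handle every type of convolution in DS convolution and achieves high computational parallelism at low area overhead. Experimental results show that, compared with existing DS convolution accelerators, the design achieves a 1.32× speedup and a 1.76× improvement in area efficiency.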
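
To make the utilization problem concrete, the sketch below models a hypothetical PE array that unrolls `tn` input channels by `tm` output channels per cycle. This is a common mapping in CNN accelerators, but it is an assumption here and not necessarily the baseline the authors evaluated; `dim_util`, `pe_utilization`, and the array sizes are illustrative names and values. Because each depthwise output channel reads only one input channel, only one of the `tn` input-channel lanes has work, whereas standard and pointwise layers can fill both array dimensions.

```python
import math


def dim_util(size: int, tile: int) -> float:
    """Fraction of `tile` parallel lanes doing useful work when `size`
    items are processed in ceil(size / tile) passes."""
    return size / (math.ceil(size / tile) * tile)


def pe_utilization(layer: str, in_ch: int, out_ch: int, tn: int, tm: int) -> float:
    """Utilization of a hypothetical tn x tm array (tn input channels by
    tm output channels in parallel) for one layer type."""
    if layer in ("standard", "pointwise"):
        # Every output channel reduces over all input channels,
        # so both array dimensions can be filled.
        return dim_util(in_ch, tn) * dim_util(out_ch, tm)
    if layer == "depthwise":
        # Assumes channel multiplier 1 (in_ch == out_ch): each output channel
        # reads exactly one input channel, so one of the tn lanes is busy.
        return dim_util(1, tn) * dim_util(out_ch, tm)
    raise ValueError(f"unknown layer type: {layer}")


for layer, cin, cout in [("standard", 32, 64), ("depthwise", 32, 32), ("pointwise", 32, 64)]:
    print(layer, pe_utilization(layer, cin, cout, tn=16, tm=16))
# standard 1.0
# depthwise 0.0625
# pointwise 1.0
```

With `tn = tm = 16`, the depthwise layer keeps only 1/16 of the PEs busy under this simple model, which is the kind of imbalance a unified, channel-oriented dataflow is meant to avoid.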

Abstract (English version):

Recent advances in convolutional neural networks (CNNs) reveal a trend towards compact structures such as depthwise separable (DS) convolution. However, the differing data dimensions of the depthwise and pointwise convolutions in DS convolution make data-parallel computing harder, which causes performance loss because processing element (PE) utilization drops. To overcome this problem, a novel channel-oriented dataflow is proposed to unify the computing dataflow of Depthwise convolution and Pointwise convolution. Based on the proposed dataflow, a compact CNN accelerator is developed that can process Depthwise and Pointwise convolution in a unified processing core with high PE utilization. The experimental results show that the proposed accelerator achieves a 1.32× speedup and a 1.76× improvement in area efficiency compared with the state-of-the-art depthwise separable CNN accelerator on the evaluated workloads.
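
One textbook way to see why Depthwise, Pointwise, and standard convolution can share a single computing core is the grouped-convolution formulation: all three are the same loop nest with a different group count and kernel size. The sketch below uses that formulation with plain Python lists. `grouped_conv2d` is a hypothetical helper that assumes stride 1 and no padding, and it is not the channel-oriented dataflow from the paper, whose loop order the abstract does not specify.

```python
from typing import List

Tensor3 = List[List[List[float]]]        # [channel][row][col]
Tensor4 = List[List[List[List[float]]]]  # [out_ch][in_ch_per_group][k][k]


def grouped_conv2d(x: Tensor3, w: Tensor4, groups: int) -> Tensor3:
    """Stride-1, unpadded grouped convolution.
    groups=1, k>1        -> standard convolution
    groups=1, k=1        -> pointwise convolution
    groups=C_in=C_out    -> depthwise convolution"""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, cin_g, k = len(w), len(w[0]), len(w[0][0])
    assert c_in % groups == 0 and c_out % groups == 0
    assert cin_g == c_in // groups
    cout_g = c_out // groups
    oh, ow = h - k + 1, wd - k + 1
    y = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for oc in range(c_out):              # one output channel at a time
        g = oc // cout_g                 # group this output channel belongs to
        for ic in range(cin_g):          # reduce over the group's input channels
            src = x[g * cin_g + ic]
            for r in range(oh):
                for c in range(ow):
                    acc = 0.0
                    for i in range(k):
                        for j in range(k):
                            acc += src[r + i][c + j] * w[oc][ic][i][j]
                    y[oc][r][c] += acc
    return y


# Depthwise (2x2 kernel per channel) followed by pointwise (1x1, 2 -> 1 channel).
x = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
dw = grouped_conv2d(x, [[[[1.0, 1.0], [1.0, 1.0]]], [[[2.0, 2.0], [2.0, 2.0]]]], groups=2)
pw = grouped_conv2d(dw, [[[[1.0]], [[1.0]]]], groups=1)
print(pw)  # [[[12.0, 12.0], [12.0, 12.0]]]
```

Only the `groups` argument and the kernel size change between the depthwise and pointwise calls, so one loop nest serves both; a hardware design can exploit the same observation by reusing one processing core.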
