HAO Yifan* **, DU Zidong*, ZHI Tian*. Simplifying inference computation of neural networks by identical-binary tensor factorization[J]. High Technology Letters (Chinese edition), 2022, 32(7): 687-695
Simplifying inference computation of neural networks by identical-binary tensor factorization
|
DOI:10.3772/j.issn.1002-0470.2022.07.003 |
Keywords: neural network; identical binary tensor factorization (IBTF); multiply-accumulate (MAC)
Authors and affiliations:
HAO Yifan* **, DU Zidong*, ZHI Tian*
*Intelligent Processor Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
**University of Chinese Academy of Sciences, Beijing 100049
|
Chinese abstract (translated):
Existing methods for simplifying neural network inference computation face model accuracy degradation and the extra overhead introduced by retraining. To address this, this paper proposes an identical binary tensor factorization (IBTF) method that reduces the number of multiply-accumulate (MAC) operands at the bit level. The method uses tensor factorization to eliminate the computation repeated across multiple convolution kernels due to repeated weight bit patterns, while keeping the computation results unchanged, so no retraining is required. Because IBTF simplifies model computation at the bit level, it is orthogonal to data-level simplification methods such as quantization and sparsification, so the two can be used together to further reduce the MAC workload. Experimental results on several mainstream neural networks show that, compared with quantized and sparsified models, IBTF further reduces the amount of computation by a factor of 3.32, and it remains effective for convolutions with different kernel sizes, weight bit-widths, and sparsity rates.
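The bit-level repetition the abstract refers to can be illustrated with a small worked identity. The notation below is my own and is not taken from the paper: a B-bit integer kernel w acting on an input patch x decomposes into a weighted sum over its binary bit-planes, so any bit-plane pattern shared by two kernels only has to be multiplied against x once.

```latex
% Illustrative bit-plane decomposition (notation assumed, not the paper's):
% w is a B-bit integer kernel, x an input patch, b_k the k-th bit-plane of w.
\mathbf{w}^{\top}\mathbf{x}
  = \Big(\sum_{k=0}^{B-1} 2^{k}\,\mathbf{b}_{k}\Big)^{\top}\mathbf{x}
  = \sum_{k=0}^{B-1} 2^{k}\,\big(\mathbf{b}_{k}^{\top}\mathbf{x}\big),
  \qquad \mathbf{b}_{k}\in\{0,1\}^{n}.
% If two kernels share a bit-plane pattern (b_k = b'_j), the partial sum
% b_k^T x can be computed once and reused; this is the kind of repeated
% computation that a bit-level factorization removes.
```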
English abstract:
Existing methods to simplify neural network inference often face the problems of model accuracy degradation and additional overhead caused by retraining. In this work, an identical binary tensor factorization (IBTF) algorithm is proposed to further reduce multiply-accumulate (MAC) operands at the bit level. IBTF uses tensor factorization to extract the computation repeated among multiple convolution kernels due to the bit repetition of synapses, and keeps the computational results identical without retraining. Moreover, IBTF, which simplifies models at the bit level, is orthogonal to data-level simplification methods such as quantization and sparsity, so they can be used synergistically to further reduce MAC operands. The experimental results show that, on several mainstream neural networks, IBTF further reduces MAC operands by 3.32 times compared with models after quantization and sparsification. In addition, IBTF plays a significant role in convolution layers with different kernel sizes, bit-widths, and sparsity rates.
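The sketch below illustrates, in Python, how sharing repeated bit patterns across kernels reduces the number of additions per input patch. It is my own illustration under simplifying assumptions: weights are unsigned integers, and sharing is limited to whole identical bit-plane rows, whereas the paper's IBTF performs a more general factorization of the binary weight tensor; the function and variable names are hypothetical.

```python
# Minimal sketch of the bit-level sharing idea behind IBTF (illustration only;
# not the paper's algorithm). Weights are assumed unsigned, and only whole
# identical bit-plane rows are shared across kernels.
import numpy as np

def bitplane_add_counts(weights, bits):
    """weights: (num_kernels, kernel_size) unsigned ints below 2**bits.

    Returns (naive_adds, shared_adds): additions needed per input patch when
    every kernel is evaluated independently vs. when each distinct bit-plane
    pattern is evaluated once and its partial sum reused across kernels.
    """
    naive_adds = 0
    shared = {}                                    # pattern bytes -> cost of that pattern
    for k in range(bits):
        planes = (weights >> k) & 1                # k-th bit of every weight
        for row in planes:
            ones = int(row.sum())
            if ones == 0:
                continue                           # all-zero pattern costs nothing
            naive_adds += ones                     # independent evaluation
            shared[row.tobytes()] = ones           # identical rows are counted once
    return naive_adds, sum(shared.values())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.integers(0, 16, size=(64, 9))          # 64 kernels, 3x3, 4-bit weights
    naive, shared = bitplane_add_counts(w, bits=4)
    print(f"naive adds: {naive}, shared adds: {shared}, "
          f"reduction: {naive / shared:.2f}x")
```

On random low-bit weights such as these, identical bit-plane rows already occur often enough to give a modest reduction; the printed factor is illustrative only and is not the 3.32x figure reported in the paper, which comes from its full factorization applied to quantized and sparsified networks.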