XIE Xiaoyan (谢晓燕)*, REN Xun*, ZHU Yun**, YU Jinhao*, JIN Luochen*, YANG Tianjiao*. [J]. High Technology Letters (高技术通讯(英文)), 2025, 31(4): 355-364
|
A resource-adaptive tensor decomposition method for convolutional neural networks

DOI: 10.3772/j.issn.1006-6748.2025.04.005
Keywords: tensor decomposition, operator parallelism, convolutional neural network (CNN)
Authors: XIE Xiaoyan (谢晓燕)*, REN Xun*, ZHU Yun**, YU Jinhao*, JIN Luochen*, YANG Tianjiao*
* School of Computer, Xi'an University of Posts and Telecommunications, Xi'an 710121, P. R. China
** School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, P. R. China
|
Abstract:
To enhance the inference efficiency of convolutional neural networks (CNNs), tensor parallelism is employed to improve parallelism within operators. However, existing methods are customized to specific networks and hardware, which limits their generalizability. This paper proposes an approach called resource-adaptive tensor decomposition (RATD) for CNN operators, which aims to achieve an optimal match between computational resources and parallel computing tasks. First, the CNN is represented with fine-grained tensors at the lower graph level, thereby decoupling tensors that can be computed in parallel within operators. Second, the convolution and pooling operators are fused, and the decoupled tensor blocks are scheduled in parallel. Finally, a cost model based on runtime and resource utilization is constructed to iteratively refine the tensor block decomposition and automatically determine the optimal decomposition. Experimental results demonstrate that the proposed RATD improves the accuracy of the model by 11%. Compared with the CUDA (compute unified device architecture) deep neural network library (cuDNN), RATD achieves an average speedup of 1.21 times in inference time across various convolution kernels, along with a 12% increase in computational resource utilization.
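The decompose-fuse-schedule idea in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's implementation: the names `conv2d`, `maxpool2`, `fused_block`, `ratd_like_split`, and `choose_blocks` are all hypothetical, and the cost model is a toy stand-in for the runtime/utilization model the paper constructs. It splits the pooled output rows of a fused convolution-plus-pooling operator into independent tensor blocks, each reading its input slice plus a halo of k-1 rows so blocks can run in parallel, and a toy cost model trades per-block launch overhead against the largest block's workload.

```python
import numpy as np

def conv2d(x, w):
    """Naive valid convolution, stride 1, single channel (illustrative only)."""
    H, W = x.shape
    k = w.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def maxpool2(y):
    """2x2 max pooling, stride 2 (truncates odd trailing rows/cols)."""
    H, W = y.shape
    return y[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def fused_block(x_slice, w):
    """Operator fusion step: conv + pool computed on one tensor block,
    so the intermediate conv output never leaves the block."""
    return maxpool2(conv2d(x_slice, w))

def ratd_like_split(x, w, n_blocks):
    """Decompose the pooled output rows into n_blocks independent blocks.
    Pooled row p depends on conv rows 2p..2p+1, i.e. input rows
    2p..2p+k, so block [a, b) reads input rows [2a, 2b + k - 1) -- a
    k-1 row halo makes every block self-contained and parallelizable."""
    k = w.shape[0]
    P_out = (x.shape[0] - k + 1) // 2            # pooled output rows
    bounds = np.linspace(0, P_out, n_blocks + 1, dtype=int)
    parts = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        if a == b:                               # skip empty blocks
            continue
        parts.append(fused_block(x[2 * a: 2 * b + k - 1, :], w))
    return np.vstack(parts)

def choose_blocks(P_out, n_workers, launch_cost=1.0, row_cost=4.0):
    """Toy cost model (assumption, not the paper's): predicted runtime is
    the largest block's work plus a per-block launch overhead; search
    candidate block counts and keep the cheapest."""
    best, best_cost = 1, float("inf")
    for n in range(1, n_workers + 1):
        cost = row_cost * (-(-P_out // n)) + launch_cost * n
        if cost < best_cost:
            best, best_cost = n, cost
    return best
```

Because each block carries its own halo, concatenating the block results reproduces the monolithic fused conv+pool output exactly; the real scheduler would map blocks to compute units instead of running them sequentially.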
|
|
|
|