Design and implementation of near-memory computing array architecture based on shared buffer

SHAN Rui(山蕊)*; GAO Xu*; FENG Yani*; HUI Chao*; CUI Xinyue*; CHAI Miaomiao**

Article Abstract

SHAN Rui(山蕊)*, GAO Xu*, FENG Yani*, HUI Chao*, CUI Xinyue*, CHAI Miaomiao**. [J]. High Technology Letters, 2022, 28(4): 345-353

DOI: 10.3772/j.issn.1006-6748.2022.04.002

Keywords: near-memory computing, shared buffer, reconfigurable array processor, convolutional neural network (CNN)

Author Name	Affiliation
SHAN Rui(山蕊)*	(* School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) (** School of Computer, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)
GAO Xu*	(* School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) (** School of Computer, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)
FENG Yani*	(* School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) (** School of Computer, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)
HUI Chao*	(* School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) (** School of Computer, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)
CUI Xinyue*	(* School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) (** School of Computer, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)
CHAI Miaomiao**	(* School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) (** School of Computer, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)

Abstract:

Deep learning algorithms are widely used in computer vision, natural language processing, and other fields. However, as deep learning models keep growing in scale, their demands on storage and computing performance rise accordingly, and processors based on the von Neumann architecture increasingly expose significant shortcomings such as high power consumption and long latency. To alleviate this problem, large-scale processing systems are shifting from a traditional computing-centric model to a data-centric model. This paper proposes a near-memory computing array architecture based on a shared buffer to improve system performance. The architecture supports instructions that integrate storage and computation, which reduces data movement between the processor and main memory, and data reuse further improves the processing speed of the algorithm. The proposed architecture is verified and tested through a parallel implementation of the convolutional neural network (CNN) algorithm. Experimental results show that, at a frequency of 110 MHz, the calculation speed of a single convolution operation increases by 66.64% on average compared with a CNN architecture that performs parallel calculations on a field programmable gate array (FPGA). The processing speed of the whole convolution layer improves by 8.81% compared with a reconfigurable array processor that does not support near-memory computing.
