Zeng Xi(曾惜), Xu Yunlong, Zhi Tian. Optimizing deep learning inference on mobile devices with neural network accelerators[J]. High Technology Letters, 2019, 25(4): 417-425
Optimizing deep learning inference on mobile devices with neural network accelerators |
|
DOI: 10.3772/j.issn.1006-6748.2019.04.010
Keywords: machine learning inference, neural network accelerator (NNA), low latency, kernel fusion, in-advance compilation
Authors: Zeng Xi(曾惜), Xu Yunlong, Zhi Tian
|
|
Abstract:
Deep learning is now widely used in intelligent apps on mobile devices. In pursuit of ultra-low power and latency, integrating neural network accelerators (NNAs) into mobile phones has become a trend. However, conventional deep learning programming frameworks are not well developed to support such devices, leading to low computing efficiency and high memory occupation. To address this problem, a two-stage pipeline is proposed for optimizing deep learning model inference on mobile devices with NNAs in terms of both speed and memory footprint. The first stage reduces the computation workload via graph optimization, including splitting and merging nodes. The second stage goes further by optimizing at the compilation level, including kernel fusion and in-advance compilation. The proposed optimizations are evaluated on a commercial mobile phone with an NNA. The experimental results show that the proposed approaches achieve a 2.8× to 26× speedup and reduce the memory footprint by up to 75%.
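
To illustrate the node-merging idea mentioned in the abstract, the following minimal Python sketch (an assumption for illustration only, not the paper's implementation) merges runs of adjacent element-wise operators in a toy computation graph into single fused nodes, so a device would launch one kernel instead of several and skip the intermediate buffers. The Node class, operator names, and graph representation are all hypothetical.

    # Sketch: merging consecutive element-wise nodes in a toy inference graph.
    from dataclasses import dataclass
    from typing import Callable, List
    import numpy as np

    @dataclass
    class Node:
        name: str
        fn: Callable[[np.ndarray], np.ndarray]
        elementwise: bool  # only element-wise ops are merged in this sketch

    def merge_elementwise(graph: List[Node]) -> List[Node]:
        """Merge runs of consecutive element-wise nodes into single fused nodes."""
        fused: List[Node] = []
        for node in graph:
            if fused and fused[-1].elementwise and node.elementwise:
                prev = fused.pop()
                # Compose the two functions; the intermediate tensor is never
                # written back to memory between the two original nodes.
                fused.append(Node(
                    name=f"{prev.name}+{node.name}",
                    fn=lambda x, f=prev.fn, g=node.fn: g(f(x)),
                    elementwise=True,
                ))
            else:
                fused.append(node)
        return fused

    if __name__ == "__main__":
        graph = [
            Node("conv", lambda x: x * 0.5, elementwise=False),  # stand-in for a conv kernel
            Node("bias", lambda x: x + 1.0, elementwise=True),
            Node("relu", lambda x: np.maximum(x, 0.0), elementwise=True),
        ]
        fused = merge_elementwise(graph)
        print([n.name for n in fused])  # ['conv', 'bias+relu']
        x = np.array([-2.0, 0.0, 2.0])
        for n in fused:
            x = n.fn(x)
        print(x)  # [0. 1. 2.]

In this toy example the bias-add and ReLU nodes collapse into one node, which is the same direction as the paper's graph-level merging and compilation-level kernel fusion: fewer kernel invocations and fewer intermediate tensors kept in memory.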
|
|
|