Zeng Xi(曾惜), Xu Yunlong, Zhi Tian. Optimizing deep learning inference on mobile devices with neural network accelerators[J]. High Technology Letters, 2019, 25(4): 417-425
Optimizing deep learning inference on mobile devices with neural network accelerators |
|
DOI: 10.3772/j.issn.1006-6748.2019.04.010
Keywords: machine learning inference, neural network accelerator (NNA), low latency, kernel fusion, in-advance compilation
Authors: Zeng Xi(曾惜), Xu Yunlong, Zhi Tian
|
|
Abstract:
Deep learning is now widely used in intelligent apps on mobile devices. In pursuit of ultra-low power and latency, integrating neural network accelerators (NNAs) into mobile phones has become a trend. However, conventional deep learning programming frameworks are not well developed to support such devices, leading to low computing efficiency and high memory occupation. To address this problem, a two-stage pipeline is proposed for optimizing deep learning model inference on mobile devices with NNAs in terms of both speed and memory footprint. The first stage reduces the computation workload via graph optimization, including splitting and merging nodes. The second stage goes further by optimizing at the compilation level, including kernel fusion and in-advance compilation. The proposed optimizations are evaluated on a commercial mobile phone with an NNA. The experimental results show that the proposed approaches achieve a 2.8× to 26× speedup and reduce the memory footprint by up to 75%.
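
To illustrate the node-merging idea mentioned in the abstract, the following minimal Python sketch (an assumption for illustration only, not the paper's implementation) merges runs of adjacent element-wise operators in a toy computation graph into single fused nodes, so a device would launch one kernel instead of several and skip the intermediate buffers. The Node class, operator names, and graph representation are all hypothetical.

    # Sketch: merging consecutive element-wise nodes in a toy inference graph.
    from dataclasses import dataclass
    from typing import Callable, List
    import numpy as np

    @dataclass
    class Node:
        name: str
        fn: Callable[[np.ndarray], np.ndarray]
        elementwise: bool  # only element-wise ops are merged in this sketch

    def merge_elementwise(graph: List[Node]) -> List[Node]:
        """Merge runs of consecutive element-wise nodes into single fused nodes."""
        fused: List[Node] = []
        for node in graph:
            if fused and fused[-1].elementwise and node.elementwise:
                prev = fused.pop()
                # Compose the two functions; the intermediate tensor is never
                # written back to memory between the two original nodes.
                fused.append(Node(
                    name=f"{prev.name}+{node.name}",
                    fn=lambda x, f=prev.fn, g=node.fn: g(f(x)),
                    elementwise=True,
                ))
            else:
                fused.append(node)
        return fused

    if __name__ == "__main__":
        graph = [
            Node("conv", lambda x: x * 0.5, elementwise=False),  # stand-in for a conv kernel
            Node("bias", lambda x: x + 1.0, elementwise=True),
            Node("relu", lambda x: np.maximum(x, 0.0), elementwise=True),
        ]
        fused = merge_elementwise(graph)
        print([n.name for n in fused])  # ['conv', 'bias+relu']
        x = np.array([-2.0, 0.0, 2.0])
        for n in fused:
            x = n.fn(x)
        print(x)  # [0. 1. 2.]

In this toy example the bias-add and ReLU nodes collapse into one node, which is the same direction as the paper's graph-level merging and compilation-level kernel fusion: fewer kernel invocations and fewer intermediate tensors kept in memory.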
|
|
|