Zeng Xi (曾惜), Xu Yunlong, Zhi Tian [J]. High Technology Letters (高技术通讯(英文)), 2019, 25(4): 417-425
|
| Optimizing deep learning inference on mobile devices with neural network accelerators |
DOI: 10.3772/j.issn.1006-6748.2019.04.010
Keywords: machine learning inference, neural network accelerator (NNA), low latency, kernel fusion, in-advance compilation
Authors: Zeng Xi (曾惜), Xu Yunlong, Zhi Tian
|
Abstract:
Deep learning is now widely used in intelligent apps on mobile devices. In pursuit of ultra-low power and latency, integrating neural network accelerators (NNAs) into mobile phones has become a trend. However, conventional deep learning programming frameworks are not well developed to support such devices, leading to low computing efficiency and high memory occupation. To address this problem, a 2-stage pipeline is proposed for optimizing deep learning model inference on mobile devices with NNAs, in terms of both speed and memory footprint. The first stage reduces the computation workload via graph optimization, including splitting and merging nodes. The second stage goes further by optimizing at the compilation level, including kernel fusion and in-advance compilation. The proposed optimizations are evaluated on a commercial mobile phone with an NNA. The experimental results show that the proposed approaches achieve 2.8x to 26x speedup, and reduce the memory footprint by up to 75%.
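
To illustrate the graph-level node merging and kernel fusion described in the abstract, the following is a minimal sketch only, assuming a toy intermediate representation. The names Node and fuse_conv_bn_relu are hypothetical and are not taken from the paper; the sketch simply shows how a conv2d -> batch_norm -> relu chain can be merged into one node, so the chain can run as a single accelerator kernel and the intermediate tensors never need to be materialized.

from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    op: str            # e.g. "conv2d", "batch_norm", "relu" (toy IR, hypothetical)
    inputs: List[str]  # tensor names consumed by this node
    output: str        # tensor name produced by this node

def fuse_conv_bn_relu(graph: List[Node]) -> List[Node]:
    """Merge conv2d -> batch_norm -> relu chains into a single fused node.
    Assumes the graph is a topologically ordered list with the chain adjacent."""
    fused: List[Node] = []
    i = 0
    while i < len(graph):
        if (i + 2 < len(graph)
                and graph[i].op == "conv2d"
                and graph[i + 1].op == "batch_norm"
                and graph[i + 2].op == "relu"
                and graph[i + 1].inputs[0] == graph[i].output
                and graph[i + 2].inputs[0] == graph[i + 1].output):
            # One fused node replaces three: fewer kernel launches on the NNA,
            # and the two intermediate tensors are no longer stored in memory.
            fused.append(Node("fused_conv_bn_relu",
                              graph[i].inputs,
                              graph[i + 2].output))
            i += 3
        else:
            fused.append(graph[i])
            i += 1
    return fused

if __name__ == "__main__":
    g = [
        Node("conv2d", ["input", "w0"], "t0"),
        Node("batch_norm", ["t0"], "t1"),
        Node("relu", ["t1"], "t2"),
        Node("conv2d", ["t2", "w1"], "t3"),
    ]
    for n in fuse_conv_bn_relu(g):
        print(n.op, n.inputs, "->", n.output)

In a real pipeline such a pass would run before in-advance (ahead-of-time) compilation, so the compiler sees the fused operators and can emit one accelerator kernel per fused node instead of one per original node.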
|
|
|
|