Article Abstract
Optimizing deep learning inference on mobile devices with neural network accelerators
Zeng Xi(曾惜), Xu Yunlong, Zhi Tian. High Technology Letters, 2019, 25(4): 417-425
DOI: 10.3772/j.issn.1006-6748.2019.04.010
Keywords: machine learning inference, neural network accelerator (NNA), low latency, kernel fusion, in-advance compilation
Authors: Zeng Xi(曾惜), Xu Yunlong, Zhi Tian
Abstract:
      Deep learning is now widely used in intelligent apps on mobile devices. In pursuit of ultra-low power and latency, integrating neural network accelerators (NNAs) into mobile phones has become a trend. However, conventional deep learning programming frameworks do not support such devices well, leading to low computing efficiency and high memory occupation. To address this problem, a 2-stage pipeline is proposed for optimizing deep learning model inference on mobile devices with NNAs in terms of both speed and memory footprint. The 1st stage reduces the computation workload via graph optimization, including splitting and merging nodes. The 2nd stage goes further by optimizing at the compilation level, including kernel fusion and in-advance compilation. The proposed optimizations are evaluated on a commercial mobile phone with an NNA. The experimental results show that the proposed approaches achieve a 2.8x to 26x speedup and reduce the memory footprint by up to 75%.
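To make the node-merging idea from the graph-optimization stage concrete, the following is a minimal, illustrative Python sketch, not the paper's implementation: adjacent operators that the target NNA is assumed to be able to execute as one kernel are greedily merged into a single graph node. The operator names and the FUSABLE rule set are assumptions for demonstration only.

    # Illustrative sketch of "merging nodes" on a sequential model graph.
    # FUSABLE lists operator pairs assumed to run as one kernel on the NNA.
    from typing import List, Tuple

    FUSABLE = {("conv", "bn"), ("conv", "relu"), ("conv+bn", "relu")}

    def merge_nodes(ops: List[str]) -> List[str]:
        """Greedily merge adjacent fusable operators of a sequential graph."""
        merged: List[str] = []
        for op in ops:
            if merged and (merged[-1], op) in FUSABLE:
                merged[-1] = merged[-1] + "+" + op   # fuse into one node
            else:
                merged.append(op)
        return merged

    if __name__ == "__main__":
        graph = ["conv", "bn", "relu", "pool", "conv", "relu", "fc"]
        print(merge_nodes(graph))
        # -> ['conv+bn+relu', 'pool', 'conv+relu', 'fc']

Fewer graph nodes mean fewer kernel launches and intermediate buffers, which is the motivation the abstract gives for the subsequent kernel-fusion and in-advance compilation stage.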