Article abstract
Zhang Jun*, Wang Xingbin**, Su Yulan**. An elastic computing accelerator architecture for multi-model workloads [J]. 高技术通讯 (Chinese edition), 2025, 35(7): 698-710
An elastic computing accelerator architecture for multi-model workloads
DOI: 10.3772/j.issn.1002-0470.2025.07.003
Keywords: deep neural network (DNN) accelerator, ensemble learning, multi-model workloads, elastic computing, systolic array, preemptive scheduling
Funding:
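Authors and affiliations:
Zhang Jun (张军)*, Wang Xingbin (王兴宾)**, Su Yulan (苏玉兰)**
* Smart Transportation Research Institute, Hubei University of Arts and Science, Xiangyang 441053
** Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093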
Abstract:
      Multi-model workloads suffer degraded quality of service when deployed on current deep neural network (DNN) accelerators. To address this problem, this paper proposes EnsBooster, a new accelerator architecture that provides a cost-effective parallel execution mode for efficient multi-model (ensemble) inference. First, an elastic systolic array is designed: a large systolic array is partitioned into multiple smaller systolic subarrays to meet the flexibility and scalability requirements of executing multiple models in parallel. Second, a spatial-temporal multiplexing resource allocation strategy is proposed, which exploits spatial and temporal sharing to improve the utilization efficiency of the underlying computing resources. Finally, a hierarchical scheduling mechanism is proposed. At the coarse-grained level, early-exit scheduling reduces the computational burden of multi-model inference; at the fine-grained level, a preemptive scheduling mechanism exploits the complementarity and data locality of the models to preempt idle computing resources, maximizing the utilization of hardware resources and bandwidth. Evaluation on a set of diverse multi-model workload benchmarks shows that EnsBooster significantly improves throughput and reduces energy consumption.
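The abstract does not spell out how early-exit scheduling decides when to stop. As a purely illustrative sketch, and not the EnsBooster mechanism itself, one common software-level form of early exit for an ensemble is to stop running member models once a majority vote can no longer change. The function name `ensemble_predict_early_exit` and the majority-vote criterion are assumptions introduced here for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple


def ensemble_predict_early_exit(
    models: Sequence[Callable[[object], Hashable]],
    x: object,
) -> Tuple[Hashable, int]:
    """Majority vote over `models`, stopping as soon as the leading label
    cannot be overtaken by the models that have not run yet.

    Hypothetical helper: the exit rule is an assumption, not the paper's.
    Returns (predicted_label, number_of_models_actually_run).
    """
    if not models:
        raise ValueError("at least one model is required")

    votes: Counter = Counter()
    total = len(models)

    for used, model in enumerate(models, start=1):
        votes[model(x)] += 1

        ranked = votes.most_common(2)
        lead = ranked[0][1]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        remaining = total - used

        # Even if every remaining model voted for the same rival label,
        # that rival would end with at most runner_up + remaining votes.
        if lead - runner_up > remaining:
            return ranked[0][0], used

    # No early exit was possible; on a tie the label seen first wins.
    return votes.most_common(1)[0][0], total


if __name__ == "__main__":
    unanimous = [lambda x: "cat"] * 5
    print(ensemble_predict_early_exit(unanimous, None))
```

With five models that all agree, the vote is decided after the third model, so the remaining two never run. On an accelerator, the compute those skipped models would have used is the kind of resource that an early-exit policy can free for other work.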