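Article abstract
Li Wenqing, Qi Han, Xiao Ziyuan, Zhu Weipu, Wang Jian. Tightly coupled heterogeneous thread processor [J]. 高技术通讯 (Chinese High Technology Letters, Chinese edition), 2023, 33(2): 113-123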
紧耦合异构线程处理器
Tightly coupled heterogeneous thread processor
  
DOI: 10.3772/j.issn.1002-0470.2023.02.001
Keywords: heterogeneous computing; heterogeneous interface; tight coupling; communication; fine-grained parallelism
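Authors and affiliations
Li Wenqing (李文青): State Key Lab of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049
Qi Han (齐寒): State Key Lab of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049
Xiao Ziyuan (肖子原): State Key Lab of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049
Zhu Weipu (朱威浦): State Key Lab of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049
Wang Jian (王剑): State Key Lab of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049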
Abstract:
      Heterogeneous computing offers a new route to a higher performance-to-power ratio, but the heavy control-signal traffic and data movement incurred when the central processing unit (CPU) and an accelerator cooperate on a task remain a major performance bottleneck in heterogeneous systems. To address this problem, this paper proposes a tightly coupled heterogeneous thread processor architecture consisting of one hardware CPU thread and one hardware accelerator thread. The two threads communicate through a hardware inter-thread communication interface that is tightly coupled to the pipeline, together with shared memory, which lowers the communication cost and substantially improves system performance. To verify the advantages of this architecture, a hardware inter-thread communication interface is designed on top of the open-source BOOM core, a tightly coupled heterogeneous thread processor with an advanced encryption standard (AES) encryption and decryption accelerator is implemented, and the design is evaluated on a field programmable gate array (FPGA). The results show that, on encryption tasks, the processor's throughput is about 5.7 times that of Intel Comet Lake using AES new instructions (AES-NI) and about 4000 times that of the BOOM platform using only general-purpose instructions. The experiments further confirm that the fine-grained parallelism enabled by fast communication between the CPU and the accelerator yields additional performance gains. The conclusion is that the architecture can agilely integrate accelerators around the CPU, reduce communication time, and achieve fine-grained parallelism between the CPU thread and the accelerator thread, thereby exploiting the advantages of heterogeneous computing and delivering considerable performance gains.