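Xu Han, Guo Zhenjiang, Xiao Junhua. Microprocessor performance analysis and optimization: a comparative study based on SPEC CPU2017[J]. High Technology Letters (Chinese edition), 2025, 35(3): 241-249 |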
Microprocessor performance analysis and optimization: a comparative study based on SPEC CPU2017 |
Processor performance measurement and analysis based on SPEC CPU2017 |
|
DOI: 10.3772/j.issn.1002-0470.2025.03.002 |
Keywords: SPEC CPU2017; performance analysis; Loongson 3A5000; vectorization; architecture |
Funding: |
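Author | Affiliation |
Xu Han | Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049 |
Guo Zhenjiang | |
Xiao Junhua | |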
Abstract: |
Running standard benchmark programs is a fundamental method for exploring the microprocessor design space. Comparing the performance metrics of mainstream domestic and international processors on standard benchmarks helps identify the performance bottlenecks of domestic processors and points to directions for further optimization. This paper uses SPEC CPU2017 to test three microprocessors at the same clock frequency and compares the results: the Loongson 3A5000 (LA464 architecture), the AMD R3-1200 (ZEN1 architecture), and the Intel i3-9100f (Skylake architecture). According to the test results, the integer (fixed-point) performance of the 3A5000 is roughly equal to that of the R3-1200 and about 10% lower than that of the i3-9100f, while its floating-point performance is about 70% of that of the other two processors. The paper compares and analyzes the processors from two perspectives: dynamic instruction count and instructions per cycle (IPC). On SPEC CPU2017, the 3A5000 executes about 10% more dynamic instructions on the integer benchmarks and about 25% more on the floating-point benchmarks than the other two processors. Measures such as an aggressive auto-vectorization compilation strategy and more efficient compilation of multiplication by immediate values can improve the performance of the 3A5000 by about 10%. The integer IPC of the 3A5000 is about 4% higher than that of the other two processors, and its floating-point IPC is about 8% lower. The three processors differ little in IPC mainly because their microarchitectural parameters are close, including issue width and the number, function, and latency of execution units. |
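The abstract's two perspectives combine through the standard relation: execution time = dynamic instruction count / (IPC × clock frequency). The sketch below is only a consistency check on the rounded figures quoted in the abstract, not the paper's own method. The function name `relative_performance` is hypothetical, and treating "the other two processors" as a single baseline is an assumption made for illustration.

```python
def relative_performance(ipc_ratio: float, instr_ratio: float,
                         freq_ratio: float = 1.0) -> float:
    """Performance of processor A relative to processor B.

    Execution time = instructions / (IPC * frequency), and performance is
    the reciprocal of time, so the ratio is (IPC * frequency) / instructions.
    freq_ratio defaults to 1.0 because the tests run at the same frequency.
    """
    return ipc_ratio * freq_ratio / instr_ratio


# Approximate ratios from the abstract (3A5000 versus the other two,
# treated here as one baseline; this grouping is an assumption).
integer = relative_performance(ipc_ratio=1.04, instr_ratio=1.10)
floating = relative_performance(ipc_ratio=0.92, instr_ratio=1.25)

print(f"integer: {integer:.2f}")          # integer: 0.95
print(f"floating point: {floating:.2f}")  # floating point: 0.74
```

Under these assumptions the estimate comes out near 0.95 for integer and 0.74 for floating point, broadly in line with the reported results (integer performance between parity and about 10% lower, floating-point performance about 70%). The small differences are expected given that every input is a rounded figure.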