A GPU-Oriented Loop Fusion Method

Yang Yang; Cui Huimin; Feng Xiaobing

Article abstract

Yang Yang, Cui Huimin, Feng Xiaobing. A GPU oriented loop fusion method [J]. Chinese High Technology Letters, 2013, 23(3): 257-262

A GPU oriented loop fusion method

Keywords: general-purpose graphics processing unit (GPU), loop fusion, parallelization, CUDA, inter-loop data reuse

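Funding: Supported by the National Basic Research Program of China (973 Program) (2011CB302504, 2011ZX01028-001-002), the National High-Tech Research and Development Program of China (863 Program) (2009AA01A129, 2012AA010902), and the National Natural Science Foundation of China (60970024, 60925009, 60921002).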

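Author	Affiliation
Yang Yang	State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences
Cui Huimin	State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
Feng Xiaobing	State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences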

Abstract:

Existing tools that automatically map C or Fortran programs onto general-purpose graphics processing units (GPUs) mainly generate an independent GPU kernel for each individual loop, which hinders the exploitation of inter-loop data reuse. To address this problem, this paper presents a novel GPU-oriented code transformation approach for loop fusion. The approach combines strip mining and redundant computation to eliminate data dependences between iterations, and uses the GPU's on-chip shared memory for inter-thread data exchange, so that such programs can be mapped onto GPUs efficiently. Experiments with typical programs on a GPU show that, by reducing global memory accesses, the approach achieves speedups of up to 1.96x.