University of Minnesota
Institute of Technology


Electrical and Computer Engineering

Performance Optimization for CUDA Programs

Dr. Weijun Xiao
Electrical and Computer Engineering

Duration: 1 day (7 hours)

Course Description:
The rapid development of Graphics Processing Units (GPUs) brings new opportunities for high-performance computing. Because of their low cost and parallel computing capability, GPUs have recently been used for a wide range of high-performance computing applications. With GPU hardware, applications can often achieve speedups of more than 100 times in signal processing, physical simulation, biomedical imaging, geologic computation, and other fields. This short course provides students with knowledge of and hands-on experience in optimizing the performance of CUDA programs. It concentrates on CUDA performance analysis, CUDA performance considerations and optimizations, and examples of CUDA performance optimization. All performance optimizations are based on the latest NVIDIA Fermi GPU architecture.

1.    Introduction
      Fermi architecture
      Performance considerations
2.    CUDA tools for performance optimization
      CUDA profiler
      Parallel Nsight
3.    CUDA performance analysis and optimization
      Kernel launch configuration
      Global memory
      Shared memory
      Constant and texture memory
      Control flow
4.    CUDA optimization examples

Intended audience and assumed background:
This short course is an advanced tutorial that follows on from an introduction to CUDA C programming. It is intended for software designers and application developers who need to understand how to analyze CUDA C programs and identify CUDA performance bottlenecks, and who need to learn how to use CUDA tools (Profiler and Parallel Nsight) for performance optimization. The audience is assumed to have basic knowledge of CUDA C programming.

Biographical sketch of the instructor:
Weijun Xiao received his Ph.D. in Electrical and Computer Engineering from the University of Rhode Island, and both his M.S. and B.S. degrees in Computer Science from Huazhong University of Science and Technology, China. He is currently a Research Associate in the Department of Electrical and Computer Engineering at the University of Minnesota-Twin Cities. He was named a 2009 Computing Innovation Fellow (CIFellow) under a postdoctoral fellowship program developed by the Computing Community Consortium (CCC) and the Computing Research Association (CRA), with funding from the National Science Foundation. Dr. Xiao's primary expertise is in data storage, high-performance computing, and computer architecture. He has published research papers in top journals and conferences in these areas, such as IEEE TPDS, ISCA, and ICDCS.