Session Type
Student Technical Talks
Date & Time
Thursday, October 12, 2023, 11:00 AM - 12:00 PM
Name
Student Talks
Talk Order

1) OpenMP Kernel Language Extensions for Performance Portable GPU Codes - Shilei Tian
2) Profiling the Profiler: New metrics to evaluate and improve profile guided optimization - Micah Weston
3) Optimization of CUDA GPU Kernels and Translation to AMDGPU in Polygeist/MLIR - Ivan Ivanov
4) Leveraging MLIR for Loop Vectorization and GPU Porting of FFT Libraries - Yifei He

Abstract/s

OpenMP Kernel Language Extensions for Performance Portable GPU Codes - Shilei Tian
In this talk, we will introduce extensions to LLVM OpenMP, transforming it into a versatile and performance portable kernel language for GPU programming. We will demonstrate how these extensions allow for the seamless porting of CUDA programs to high-performance OpenMP GPU programs with minimal modifications. Finally, we will present performance results on both NVIDIA and AMD GPUs.
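
To make the porting claim concrete, here is a minimal sketch (illustrative only, not taken from the talk) contrasting a CUDA SAXPY kernel with a standard OpenMP target offload of the same loop. The function names are hypothetical, and the pragma shown is plain OpenMP; the extensions presented in the talk aim to let kernels written in the CUDA style below compile directly as OpenMP with minimal changes.

    // CUDA: one thread per element, indexed explicitly.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
        y[i] = a * x[i] + y[i];
    }

    // Standard OpenMP target offload of the same computation.
    void saxpy_omp(int n, float a, const float *x, float *y) {
      #pragma omp target teams distribute parallel for \
          map(to: x[0:n]) map(tofrom: y[0:n])
      for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
    }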

Profiling the Profiler: New metrics to evaluate and improve profile guided optimization - Micah Weston
PGO can have the biggest impact when all compiler passes have accurate profile information, but in practice many parts of the compilation pipeline introduce inaccuracies. Recent PGO evaluations have measured accuracy of imported profiles, ignoring distortion from later optimizations, or looked at aggregate performance, which is often too noisy to correlate with profile accuracy. We propose new metrics that compare end-of-compilation profile data against instruction traces, letting us check the accuracy of profile data end-to-end and decoupling it from performance measurement noise. We share our experience using these new metrics to measure the accuracy of profiles used in PGO, pinpoint areas for improvement, and evaluate new fixes.
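
As an illustration of the idea only (the function name, data layout, and metric below are hypothetical, not the authors' implementation), a trace-based accuracy check of this kind can be sketched as: aggregate the absolute difference between the block counts the end-of-compilation profile predicts and the counts observed in an instruction trace.

    #include <cmath>
    #include <cstdint>
    #include <map>
    #include <string>

    // Hypothetical metric: 1.0 means the end-of-compilation profile
    // matches the instruction trace exactly; lower values indicate
    // more distortion introduced by the compilation pipeline.
    double profileAccuracy(const std::map<std::string, uint64_t> &profileCounts,
                           const std::map<std::string, uint64_t> &traceCounts) {
      double absError = 0.0;
      uint64_t totalTraced = 0;
      for (const auto &[block, traced] : traceCounts) {
        auto it = profileCounts.find(block);
        uint64_t predicted = (it == profileCounts.end()) ? 0 : it->second;
        absError += std::abs(static_cast<double>(predicted) -
                             static_cast<double>(traced));
        totalTraced += traced;
      }
      return totalTraced ? 1.0 - absError / static_cast<double>(totalTraced)
                         : 1.0;
    }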

Optimization of CUDA GPU Kernels and Translation to AMDGPU in Polygeist/MLIR - Ivan Ivanov
We extend the Polygeist C/C++ compiler to utilize a target-agnostic parallel representation of GPU kernels in MLIR to perform parallel optimizations and architecture-specific tuning. We also implement translation from CUDA to AMDGPU and expand the set of possible target hardware for CUDA code.

Leveraging MLIR for Loop Vectorization and GPU Porting of FFT Libraries - Yifei He
Related paper:
https://arxiv.org/abs/2308.00497

Location Name
Hall of Cities