Name
Quick Talks
Session Type
Quick Talks
Date & Time
Wednesday, October 23, 2024, 11:00 AM - 12:00 PM
Talk Order

1) Instrumenting MLIR Based ML Compilers for GPU Performance Analysis and Optimization - Corbin Robeck

2) PyDSL: A MLIR DSL for Python developers - Kai Ting Wang

3) Embedding Domain-Specific Languages in C++ with Polygeist - Lorenzo Chelini

4) Vector-DDG (Vector Data Dependence Graph): For Better Visualization and Verification of Vectorized LLVM-IR - Sumukh Bharadwaj, Raghesh Aloor

5) From Fallbacks to Performance: Towards Better GlobalISel Performance on AArch64 Advanced SIMD Platforms - Madhur Amilkanthwar

Abstract/s

1) Instrumenting MLIR Based ML Compilers for GPU Performance Analysis and Optimization - Corbin Robeck
Correlating GPU kernel performance bottleneck information back to program source remains a challenge in modern machine learning frameworks that use MLIR and JIT-style kernels: it is often difficult to attribute a performance issue to a specific point in the compiler toolchain and its many lowering stages (Python/C++, GPU kernel source, multiple MLIR IRs, LLVM IR, and architecture-specific ISA). In this talk we give an overview of a set of open-source, extensible GPU kernel instrumentation passes developed to address this issue, and show how they can be integrated into popular MLIR-based machine learning compilers.

2) PyDSL: A MLIR DSL for Python developers - Kai Ting Wang
This talk introduces new improvements to PyDSL, a compiler research project, originally presented at an MLIR Open Design Meeting (ODM) in December 2023, that compiles a subset of Python down to MLIR. While the existing MLIR infrastructure is essential to our optimization stack, it does not yet provide a language that can describe MLIR program behavior while also benefiting end-developer productivity. PyDSL aims to bridge this gap by providing a faithful Python-based syntax and programming style for writing MLIR programs. The presentation will review aspects of PyDSL and introduce new ways we manage typing, translate Python syntax into MLIR, and improve the modularity and usability of the language.
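To give a flavor of what a Python-to-MLIR DSL works with, the short sketch below shows an ordinary Python function of the kind such a subset might accept; the function and the lowering noted in its comments are illustrative assumptions only, not PyDSL's actual syntax or API.

    # Illustrative only: a hypothetical kernel in a restricted Python subset,
    # of the kind a Python-to-MLIR DSL might accept. The lowering noted in
    # the comments (affine/arith dialects) is an assumption, not PyDSL output.
    def saxpy(a: float, x: list[float], y: list[float]) -> list[float]:
        out = [0.0] * len(x)
        for i in range(len(x)):        # a DSL could map this to affine.for
            out[i] = a * x[i] + y[i]   # and the body to arith.mulf/arith.addf
        return out

    if __name__ == "__main__":
        # Runs as ordinary Python here; a DSL compiler would instead lower
        # and execute it through an MLIR pipeline.
        print(saxpy(2.0, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))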

3) Embedding Domain-Specific Languages in C++ with Polygeist - Lorenzo Chelini
Domain-specific languages (DSLs) and compilers enable high-level abstraction and high performance by mapping abstractions directly to hardware. DSLs are becoming more prevalent, spanning fields from linear algebra to quantum computing, yet they often remain isolated, complicating multi-domain application optimization and integration with C++ codebases. In this talk, we propose embedding DSLs and their optimizations into general-purpose code (C or C++) using Polygeist. Our approach leverages modern compiler technology and facilitates domain-specific compilation, bridging the gap between specialized and general-purpose programming.

4) Vector-DDG (Vector Data Dependence Graph): For Better Visualization and Verification of Vectorized LLVM-IR - Sumukh Bharadwaj, Raghesh Aloor
We propose Vector-DDG (Vector Data Dependence Graph), a tool to visualize and verify the complicated data flow in vectorized LLVM-IR. The visualization helps developers understand the vectorized IR and further improve its quality, and the automatic verification improves developer productivity by catching vectorization errors early.

5) From Fallbacks to Performance: Towards Better GlobalISel Performance on AArch64 Advanced SIMD Platforms - Madhur Amilkanthwar
In this talk, we will present our work on enhancing the Global Instruction Selection (GISel) framework for AArch64 Advanced SIMD platforms, addressing its fallbacks to the traditional SelectionDAG caused by incomplete support for certain instructions and patterns. We will share our experience running GISel on the TSVC-2, RajaPerf, and LLVM Test Suite benchmarks, which identified fallbacks in GISel due to missing support for various SVE instructions and the AAPCS ABI. Our contributions include eliminating fallbacks in GISel, particularly for the TSVC-2 benchmark, through patches across the phases of GISel. We also present our work on optimizing GISel's generated code, which has significantly closed the performance gap between GISel and SelectionDAG on Advanced SIMD-based AArch64 platforms, especially for the TSVC-2 benchmark. These advancements mark an important step forward in improving the GISel framework, bringing us one step closer to making it the default, though we acknowledge that further effort is required for full SVE support and tuning for other workloads.

Location Name
California Ballroom