Date & Time
Tuesday, October 28, 2025, 11:00 AM - 12:00 PM
Name
Quick Talks
Session Type
Quick Talks
Talk Order
  1. Generating efficient CPU code with MLIR for scalable vector extensions in an end-to-end case study - Andrzej Warzyński, Ege Beysel
  2. Accelerating ML on Hexagon: A Glimpse into Qualcomm’s MLIR-Based Compiler - Franck Slama, Muthu Baskaran
  3. Where We’re Legalizing, We Don’t Need Validators: Generating valid DXIL for the DirectX Backend - Farzon Lotfi
  4. An investigation of missed devirtualization opportunities - Ehsan Amiri
  5. Understanding linalg.pack and linalg.unpack - Maximilian Bartel
Abstract/s

Generating efficient CPU code with MLIR for scalable vector extensions in an end-to-end case study - Andrzej Warzyński, Ege Beysel
This talk demonstrates how to generate efficient CPU code for AI workloads using IREE's MLIR-based compiler infrastructure, with emphasis on ARM's Scalable Vector and Matrix Extensions (SVE and SME). We explore the integration of SVE and SME code generation into IREE, covering compilation strategies, vectorization techniques, and the targeting of two AI-centric architecture features, FEAT_BF16 and FEAT_I8MM.
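To make the "scalable" part concrete, below is a minimal hand-written C++ sketch using ACLE SVE intrinsics. It is not the IREE-generated code the talk covers, and add_inplace is a hypothetical helper, but it shows the vector-length-agnostic style that FEAT_BF16/FEAT_I8MM kernels build on.

```cpp
// Minimal sketch, assuming ACLE SVE intrinsics (<arm_sve.h>) and a build such as
// clang++ -O2 -march=armv8-a+sve. The same loop adapts to any SVE vector length.
#include <arm_sve.h>
#include <cstdint>

void add_inplace(float *a, const float *b, int64_t n) {
  for (int64_t i = 0; i < n; i += svcntw()) {        // svcntw(): f32 lanes per vector
    svbool_t pg = svwhilelt_b32(i, n);               // predicate masks the loop tail
    svfloat32_t va = svld1_f32(pg, a + i);           // predicated loads
    svfloat32_t vb = svld1_f32(pg, b + i);
    svst1_f32(pg, a + i, svadd_f32_m(pg, va, vb));   // predicated add and store
  }
}
```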

Accelerating ML on Hexagon: A Glimpse into Qualcomm’s MLIR-Based Compiler - Franck Slama, Muthu Baskaran
In this talk, I’ll present an overview of Qualcomm’s MLIR-based compiler for machine learning models, designed to target the Hexagon™ DSP via the Hexagon™ LLVM backend. I’ll outline the high-level architecture of the compiler stack, which lowers Torch models to Hexagon assembly, highlighting how MLIR enables modular and extensible compilation for embedded ML workloads. I’ll also touch on some of the key technical challenges the team has been addressing, such as memory management on constrained devices. This session aims to give attendees a quick but insightful look into the practical application of MLIR in a production-grade compiler.

Where We’re Legalizing, We Don’t Need Validators: Generating valid DXIL for the DirectX Backend - Farzon Lotfi
Clang-Doc is a LibTooling-based documentation generator that has been a part of LLVM for almost 10 years. In that timeframe, it has experienced long periods of neglect. However, over the last two years, the project has seen steady improvement. In this talk, we’ll give a historical overview of its development, evolution, improvements to performance, C++ support, and a redesign of its core architecture leveraging Mustache templates.

An investigation of missed devirtualization opportunities - Ehsan Amiri
We will present two groups of missed opportunities in whole program devirtualization (WPD). Our current statistics show that catching one of the two cases would increase the number of devirtualized callsites in some popular open-source C++ programs by hundreds to thousands (0.5% to 3.5% of all virtual calls that WPD does not devirtualize). In both groups of missed opportunities, the information needed to devirtualize the call is present in the source code, in the same function as the virtual callsite. Unfortunately, catching these missed cases does not seem easy. We will discuss why devirtualization currently misses these opportunities and what challenges stand in the way of addressing them. One notable issue highlighted during the discussion is the tension between WPD and non-strict aliasing. After illustrating this issue with an example, we will discuss why we believe a language-level improvement is needed to address it.
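The abstract leaves the two specific groups unnamed, so the C++ sketch below is only a generic illustration (with a hypothetical caller()) of the situation it describes: the dynamic type is fully visible in the same function as the virtual callsite, so the call could in principle become a direct call.

```cpp
// Generic illustration only (not the talk's specific missed cases): the dynamic type
// of *obj is visible inside caller() itself, so obj->run() is a candidate for
// replacement with a direct call to Derived::run().
#include <cstdio>

struct Base {
  virtual void run() { std::puts("Base"); }
  virtual ~Base() = default;
};

struct Derived final : Base {
  void run() override { std::puts("Derived"); }
};

void caller() {
  Base *obj = new Derived();  // dynamic type known here: Derived
  obj->run();                 // devirtualization candidate within the same function
  delete obj;
}
```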

Understanding linalg.pack and linalg.unpack - Maximilian Bartel
The linalg.pack and linalg.unpack operations enable critical data layout transformations for tensor computations in MLIR. This talk examines their design, implementation challenges, and production deployment insights. We begin by demonstrating how these operations facilitate efficient mapping to hardware-specific kernels, particularly for matrix multiplication workloads. Through visual examples, we illustrate the transformation patterns and their impact on memory access efficiency. Drawing from production AI compiler development, we present concrete examples of semantic ambiguities encountered during implementation—cases where operation behavior was undefined or inconsistent. We detail how these issues were identified, their implications for correctness, and the solutions adopted by the MLIR community. The talk concludes with practical guidance on when and how to employ these operations effectively. We share performance considerations for both isolated kernels and full network compilation and discuss the trade-offs between transformation overhead and execution efficiency. Attendees will gain actionable knowledge for integrating linalg.pack/unpack into their compilation flows while avoiding common implementation pitfalls.
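As a non-MLIR illustration of the layout change these ops describe, the C++ sketch below (a hypothetical pack2x2() with assumed 2x2 inner tiles) rewrites a row-major MxN matrix into a ceil(M/2) x ceil(N/2) x 2 x 2 blocked buffer, filling partial tiles with a padding value; linalg.unpack expresses the inverse movement.

```cpp
// Sketch of the data movement behind a pack with 2x2 inner tiles (my own illustration,
// not MLIR code): src is MxN row-major, dst is laid out as [MO][NO][2][2] with
// MO = ceil(M/2), NO = ceil(N/2); tail tiles are filled with `pad`.
#include <cstddef>
#include <vector>

std::vector<float> pack2x2(const std::vector<float> &src, std::size_t M, std::size_t N,
                           float pad = 0.0f) {
  const std::size_t TM = 2, TN = 2;
  const std::size_t MO = (M + TM - 1) / TM, NO = (N + TN - 1) / TN;
  std::vector<float> dst(MO * NO * TM * TN, pad);
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j) {
      const std::size_t io = i / TM, ii = i % TM;   // outer / inner tile coordinates
      const std::size_t jo = j / TN, ji = j % TN;
      dst[((io * NO + jo) * TM + ii) * TN + ji] = src[i * N + j];
    }
  return dst;
}
```

Keeping each tile contiguous in memory is what typically lets a subsequent matrix-multiplication kernel read its operands with unit-stride accesses.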

Location Name
California Ballroom