Date & Time
Tuesday, October 28, 2025, 11:00 AM - 12:00 PM
Name
Quick Talks
Session Type
Quick Talks
Talk Order
  1. Generating efficient CPU code with MLIR for scalable vector extensions in an end-to-end case study - Andrzej Warzyński, Ege Beysel
  2. Accelerating ML on Hexagon: A Glimpse into Qualcomm’s MLIR-Based Compiler - Franck Slama, Muthu Baskaran
  3. Where We’re Legalizing, We Don’t Need Validators: Generating valid DXIL for the DirectX Backend - Farzon Lotfi
  4. An investigation of missed devirtualization opportunities - Ehsan Amiri
  5. Understanding linalg.pack and linalg.unpack - Maximilian Bartel
Abstract/s

Generating efficient CPU code with MLIR for scalable vector extensions in an end-to-end case study - Andrzej Warzyński, Ege Beysel
This talk demonstrates how to generate efficient CPU code for AI workloads using IREE's MLIR-based compiler infrastructure, with emphasis on ARM's Scalable Vector and Matrix Extensions (SVE and SME). We explore the integration of SVE and SME code generation into IREE, covering compilation strategies, vectorization techniques, and the targeting of two AI-centric architecture features, FEAT_BF16 and FEAT_I8MM.
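To make the "scalable" part concrete, below is a minimal hand-written C++ sketch using ACLE SVE intrinsics. It is not the IREE-generated code the talk covers, and add_inplace is a hypothetical helper, but it shows the vector-length-agnostic style that FEAT_BF16/FEAT_I8MM kernels build on.

```cpp
// Minimal sketch, assuming ACLE SVE intrinsics (<arm_sve.h>) and a build such as
// clang++ -O2 -march=armv8-a+sve. The same loop adapts to any SVE vector length.
#include <arm_sve.h>
#include <cstdint>

void add_inplace(float *a, const float *b, int64_t n) {
  for (int64_t i = 0; i < n; i += svcntw()) {        // svcntw(): f32 lanes per vector
    svbool_t pg = svwhilelt_b32(i, n);               // predicate masks the loop tail
    svfloat32_t va = svld1_f32(pg, a + i);           // predicated loads
    svfloat32_t vb = svld1_f32(pg, b + i);
    svst1_f32(pg, a + i, svadd_f32_m(pg, va, vb));   // predicated add and store
  }
}
```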

Accelerating ML on Hexagon: A Glimpse into Qualcomm’s MLIR-Based Compiler - Franck Slama, Muthu Baskaran
In this talk, I’ll present an overview of Qualcomm’s MLIR-based compiler for machine learning models, designed to target the Hexagon™ DSP via the Hexagon™ LLVM backend. I’ll outline the high-level architecture of the compiler stack, which lowers Torch models to Hexagon assembly, highlighting how MLIR enables modular and extensible compilation for embedded ML workloads. I’ll also touch on some of the key technical challenges the team has been addressing, such as memory management on constrained devices. This session aims to give attendees a quick but insightful look into the practical application of MLIR in a production-grade compiler.

Where We’re Legalizing, We Don’t Need Validators: Generating valid DXIL for the DirectX Backend - Farzon Lotfi
Clang-Doc is a LibTooling-based documentation generator that has been a part of LLVM for almost 10 years. In that timeframe, it has experienced long periods of neglect. However, over the last two years, the project has seen steady improvement. In this talk, we’ll give a historical overview of its development, evolution, improvements to performance, C++ support, and a redesign of its core architecture leveraging Mustache templates.

An investigation of missed devirtualization opportunities - Ehsan Amiri
We will present two groups of missed opportunities in whole program devirtualization (WPD). Our current statistics show that catching one of the two cases would increase the number of devirtualized callsites in some popular open-source C++ programs by hundreds to thousands (0.5% to 3.5% of all virtual calls that WPD does not devirtualize). In both groups of missed opportunities, the information needed to devirtualize the call is present in the source code, in the same function as the virtual callsite. Unfortunately, catching these missed cases does not seem easy. We will discuss why devirtualization currently misses these opportunities and what challenges stand in the way of addressing them. One notable issue highlighted during the discussion is the tension between WPD and non-strict aliasing. After illustrating this issue with an example, we will discuss why we believe a language-level improvement is needed to address it.
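The abstract leaves the two specific groups unnamed, so the C++ sketch below is only a generic illustration (with a hypothetical caller()) of the situation it describes: the dynamic type is fully visible in the same function as the virtual callsite, so the call could in principle become a direct call.

```cpp
// Generic illustration only (not the talk's specific missed cases): the dynamic type
// of *obj is visible inside caller() itself, so obj->run() is a candidate for
// replacement with a direct call to Derived::run().
#include <cstdio>

struct Base {
  virtual void run() { std::puts("Base"); }
  virtual ~Base() = default;
};

struct Derived final : Base {
  void run() override { std::puts("Derived"); }
};

void caller() {
  Base *obj = new Derived();  // dynamic type known here: Derived
  obj->run();                 // devirtualization candidate within the same function
  delete obj;
}
```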

Understanding linalg.pack and linalg.unpack - Maximilian Bartel
The linalg.pack and linalg.unpack operations enable critical data layout transformations for tensor computations in MLIR. This talk examines their design, implementation challenges, and production deployment insights. We begin by demonstrating how these operations facilitate efficient mapping to hardware-specific kernels, particularly for matrix multiplication workloads. Through visual examples, we illustrate the transformation patterns and their impact on memory access efficiency. Drawing from production AI compiler development, we present concrete examples of semantic ambiguities encountered during implementation—cases where operation behavior was undefined or inconsistent. We detail how these issues were identified, their implications for correctness, and the solutions adopted by the MLIR community. The talk concludes with practical guidance on when and how to employ these operations effectively. We share performance considerations for both isolated kernels and full network compilation and discuss the trade-offs between transformation overhead and execution efficiency. Attendees will gain actionable knowledge for integrating linalg.pack/unpack into their compilation flows while avoiding common implementation pitfalls.
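As a non-MLIR illustration of the layout change these ops describe, the C++ sketch below (a hypothetical pack2x2() with assumed 2x2 inner tiles) rewrites a row-major MxN matrix into a ceil(M/2) x ceil(N/2) x 2 x 2 blocked buffer, filling partial tiles with a padding value; linalg.unpack expresses the inverse movement.

```cpp
// Sketch of the data movement behind a pack with 2x2 inner tiles (my own illustration,
// not MLIR code): src is MxN row-major, dst is laid out as [MO][NO][2][2] with
// MO = ceil(M/2), NO = ceil(N/2); tail tiles are filled with `pad`.
#include <cstddef>
#include <vector>

std::vector<float> pack2x2(const std::vector<float> &src, std::size_t M, std::size_t N,
                           float pad = 0.0f) {
  const std::size_t TM = 2, TN = 2;
  const std::size_t MO = (M + TM - 1) / TM, NO = (N + TN - 1) / TN;
  std::vector<float> dst(MO * NO * TM * TN, pad);
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j) {
      const std::size_t io = i / TM, ii = i % TM;   // outer / inner tile coordinates
      const std::size_t jo = j / TN, ji = j % TN;
      dst[((io * NO + jo) * TM + ii) * TN + ji] = src[i * N + j];
    }
  return dst;
}
```

Keeping each tile contiguous in memory is what typically lets a subsequent matrix-multiplication kernel read its operands with unit-stride accesses.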

Location Name
California Ballroom