- Self-Contained, Target-Specific GEMM Code Generation in MLIR - Adam Siemieniuk, Renato Golin, Rolf Morel
- Tracking Operations Through MLIR Pass Pipelines Using Source Locations - Florian Walbroel
- Mojo Compile-time Interpreter in MLIR - Weiwei Chen
- Apple GPU Support in Mojo - Kolya Panchenko
- Anatomy of Tiling and Vectorizing linalg.pack and linalg.unpack - Ege Beysel
Self-Contained, Target-Specific GEMM Code Generation in MLIR
Speaker(s): Adam Siemieniuk, Renato Golin, Rolf Morel
We present an MLIR-based approach for generating target-specific, highly optimized GEMM kernels that is fully self-contained within the LLVM/MLIR compiler infrastructure and does not rely on external libraries such as LIBXSMM, Intel MKL, or oneDNN. Building on prior work in the TPP-MLIR compiler, we upstream FP32, BF16, and INT8 code-generation techniques into MLIR and propose a transform schedule that combines existing and newly upstreamed passes to lower matmul, batch matmul, and batch-reduce matmul operations into optimized kernels, achieving performance competitive with the LIBXSMM library. As future work, we plan to leverage auto-tuning techniques to select efficient tile sizes based on hardware characteristics and problem dimensions.
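To give a flavor of the kind of lowering the abstract describes, here is an illustrative sketch of a plain `linalg.matmul` together with a transform-dialect schedule that tiles it. The shapes, tile sizes, and the minimal schedule are placeholder assumptions for illustration, not the actual upstreamed schedule from the talk:

```mlir
// Input payload: a plain matmul on tensors (shapes are illustrative).
func.func @gemm(%A: tensor<256x512xf32>, %B: tensor<512x256xf32>,
                %C: tensor<256x256xf32>) -> tensor<256x256xf32> {
  %0 = linalg.matmul ins(%A, %B : tensor<256x512xf32>, tensor<512x256xf32>)
                     outs(%C : tensor<256x256xf32>) -> tensor<256x256xf32>
  return %0 : tensor<256x256xf32>
}

// A minimal transform-dialect schedule that matches and tiles the matmul.
// The tile sizes [32, 32, 32] stand in for the target-specific values a
// real schedule (or future auto-tuning) would select.
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    %mm = transform.structured.match ops{["linalg.matmul"]} in %root
        : (!transform.any_op) -> !transform.any_op
    %tiled, %loops:3 = transform.structured.tile_using_for %mm
        tile_sizes [32, 32, 32]
        : (!transform.any_op)
        -> (!transform.any_op, !transform.any_op,
            !transform.any_op, !transform.any_op)
    transform.yield
  }
}
```

A full schedule would additionally vectorize the tiled body and lower toward target-specific instructions; this fragment only shows the matching-and-tiling step.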
Tracking Operations Through MLIR Pass Pipelines Using Source Locations
Speaker(s): Florian Walbroel
This talk presents a source-location-driven approach for tracking the evolution of MLIR operations across deep pass pipelines. Motivated by real-world optimization work on quantized convolutions in IREE, it shows how preserved source locations can be used to reconstruct operation lineage across IR stages, enabling systematic reasoning about transformation effects. The talk surveys source location semantics under common MLIR transformations and demonstrates a reusable Python-based tool that supports interactive, cross-stage operation tracking for improved debuggability in large MLIR programs.
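The tracking approach relies on MLIR's built-in location attributes, which passes preserve or combine as they rewrite operations. A small hypothetical example of the two location forms involved:

```mlir
// An operation carrying its original source location.
%sum = arith.addf %a, %b : f32 loc("model.py":12:8)

// When a transformation merges several operations, MLIR can record the
// combined lineage as a fused location, which a tracking tool can unwind.
%r = arith.mulf %sum, %c : f32 loc(fused["model.py":12:8, "model.py":13:4])
```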
Mojo Compile-time Interpreter in MLIR
Speaker(s): Weiwei Chen
Mojo supports powerful compile-time meta-programming that helps unlock performance on heterogeneous accelerators by enabling generic abstractions across different targets. Almost any runtime Mojo code can be moved to compile time, trading compilation time for runtime performance, while constants evaluated at compile time can be materialized into runtime values. In this talk, we will dive into the architecture of Mojo’s MLIR-based compile-time interpreter, which is at the core of materializing generic code into concrete form during compilation. We'll share implementation insights, performance challenges, and lessons learned, while fostering discussion on building meta-programming compilers with MLIR.
Apple GPU Support in Mojo
Speaker(s): Kolya Panchenko
Mojo's powerful compile-time meta-programming system and unified syntax make it exceptionally well-suited for heterogeneous accelerator programming, with proven success on CUDA and ROCm platforms. Since many developers work on Apple products (iMacs, MacBooks, Mac Studios, etc.), adding Apple GPU support to Mojo is a natural next step. However, unlike CUDA and ROCm, which have open-source compiler toolchains in upstream LLVM, Apple's GPU compiler stack is proprietary. We will present the chosen design and discuss the challenges of integrating Apple GPU support into Mojo.
Anatomy of Tiling and Vectorizing linalg.pack and linalg.unpack
Speaker(s): Ege Beysel
linalg.pack and linalg.unpack enable explicit data-tiling and layout transformations in MLIR, but their use in data-tiled compilation flows raises subtle questions about alignment, legality, and vectorization. This talk explores how these operations interact with MLIR’s tiling and vectorization infrastructure, focusing on alignment constraints, masking semantics, and performance implications. Using an end-to-end data-tiled matmul example, the talk highlights practical guidance and performance gains for developers building high-performance tensor pipelines.
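For readers unfamiliar with these operations, here is a hypothetical pack/unpack round trip; the shapes and inner tile sizes are illustrative choices, not the talk's example:

```mlir
// Pack a 128x256 matrix into 8x32 inner tiles; the tiled outer
// dimensions become 16x8, yielding a 4-D tiled layout.
%packed = linalg.pack %src
    inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %dest : tensor<128x256xf32> -> tensor<16x8x8x32xf32>

// linalg.unpack inverts the layout change, restoring the original shape.
%restored = linalg.unpack %packed
    inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %orig : tensor<16x8x8x32xf32> -> tensor<128x256xf32>
```

When the source dimensions are not divisible by the inner tile sizes, `linalg.pack` takes an explicit padding value, which is where the alignment and masking questions the talk examines come in.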