MLIR is a modular, composable compiler infrastructure that can be used to build a wide range of compilers and tools. The third MLIR workshop, co-located with the LLVM Developers' Meeting, brings together developers and users to discuss new developments, uses, and explorations of MLIR. The workshop is also open to work-in-progress and novel prototypes.
The workshop will include a keynote address, technical talks, and roundtable discussions. There will also be opportunities for networking and collaboration.
8am-9am: Welcome Coffee and Tea
9am: Session 1
Pallas: A JAX kernel language for GPU and TPU - Sharad Vikram (Google DeepMind)
We introduce Pallas, an extension to JAX that allows embedding custom kernels inside larger JAX programs. Pallas is cross-platform, allowing you to write kernels for both GPU and TPU. It is also compatible with JAX transformations, enabling you to do things like jax.vmap a kernel. Importantly, Pallas enables researchers to push the performance frontier of their hardware, while still being an accessible front end for numerical computing.
Enzyme-MLIR: Early Experiments on multi-level automatic differentiation - William Moses (MIT)
This presentation will discuss the continuing effort to scale up the Enzyme automatic differentiation (AD) tool from operating on the LLVM intermediate representation to the broader MLIR representation. MLIR offers unprecedented extensibility by supporting user-defined instructions and types in the compiler, which is a challenge for a compiler-based AD tool. It requires one to conceptualize a differentiable compiler instruction and capture all information required for AD in abstract terms. As a reward, it allows one to choose the most suitable level for differentiation, ranging from machine learning-style tensor operations, to loops, to assembly-like instructions.
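The idea of a "differentiable instruction" described in abstract terms can be illustrated with a minimal sketch. It is not Enzyme's API or the design presented in the talk: the tiny IR, the RULES table, and the differentiate function are all hypothetical, and only forward-mode differentiation is shown. Each operation carries an evaluation function and a derivative rule, and a generic driver differentiates any program built from registered operations.

```python
import math

# Hypothetical rule table: op name -> (evaluate, tangent rule).
# The tangent rule receives the operand values and the operand tangents.
RULES = {
    "add": (lambda x, y: x + y, lambda args, t: t[0] + t[1]),
    "mul": (lambda x, y: x * y, lambda args, t: t[0] * args[1] + args[0] * t[1]),
    "sin": (lambda x: math.sin(x), lambda args, t: math.cos(args[0]) * t[0]),
}


def differentiate(ops, inputs, seeds):
    """Forward-mode AD over a list of (result, op_name, operand_names)."""
    values = dict(inputs)
    tangents = dict(seeds)
    for result, op_name, operands in ops:
        evaluate, tangent_rule = RULES[op_name]
        args = [values[name] for name in operands]
        arg_tangents = [tangents[name] for name in operands]
        values[result] = evaluate(*args)
        tangents[result] = tangent_rule(args, arg_tangents)
    return values, tangents


# y = x * x + sin(x), so dy/dx = 2 * x + cos(x).
PROGRAM = [
    ("sq", "mul", ["x", "x"]),
    ("s", "sin", ["x"]),
    ("y", "add", ["sq", "s"]),
]

values, tangents = differentiate(PROGRAM, {"x": 2.0}, {"x": 1.0})
print(values["y"], tangents["y"])
```

The driver never inspects what an operation does; it only relies on the rule each operation registers, which is why the same approach can in principle apply to tensor operations, loops, or low-level instructions.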
Modernizing types and attributes (short talk) - Mathieu Fehr, Tobias Grosser (University of Edinburgh)
We will present some ideas on how to modernize types and attributes in MLIR, and show how this improves meta-dialects such as IRDL and PDL. We will first take a look at type and attribute name parsing, and show the benefits of adding proper names to attributes and types, similar to operations. We will then look at the relationship between the Type and Attribute classes, and the benefits of making Type a subtype of Attribute.
10:10am: Coffee Break
Physical Device Modeling in MLIR - Stephen Neuendorffer (AMD)
ML accelerators often have a complex internal architecture that does not map neatly onto traditional compiler concepts. The ability to represent concurrency, memory hierarchy, and explicit data movement is often critical to achieving performance. With the right dialect, MLIR enables these concepts to be explicitly represented and provides a foundation for end-to-end flows. This talk will illustrate this using the AMD/Xilinx AIEngine architecture and provide insight on how MLIR concepts can be used to model other devices.
CIRCT in 2023 - Mike Urbach (SiFive), John Demme (Microsoft)
We will provide brief updates on many CIRCT subprojects. We will then delve into interesting technical details, problems we have faced (some of which MLIR could have helped solve) and our solutions to them (some of which are generic enough to be upstream-able). Major issues include symbol tables, replacing Function[Type,Like], and Python bindings.
Dataflow Architecture Compiler Design Using MLIR (short talk) - Joseph Primmer (SambaNova)
xdsl-run: an interpreter for MLIR (short talk) - Sasha Lopoukhine (University of Edinburgh)
This talk gives an early look at xdsl-run, an interpreter that lets users run MLIR IR without compiling it, with hooks to register an individual Python function for each operation. It can be used to implement a modular interpreter, with operation implementations provided separately for each dialect and combined into a single interpreter instance at runtime. It has been useful as a way to test lowering correctness in our experiments. This talk/demo will show some example uses to verify that behavior is consistent across rewrites on Toy.
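The modular design described above can be sketched in a few lines of plain Python. This is an illustration only and does not use the xDSL or xdsl-run API: the tuple-based IR, the per-dialect tables, and the Interpreter class are hypothetical. Each dialect supplies a table of Python functions keyed by operation name, the tables are combined into one interpreter at runtime, and the interpreter is then used to check that a rewrite preserves behavior.

```python
# Hypothetical mini-IR: each op is (result_name, op_name, operand_names, attrs).

# One implementation table per dialect (op names are illustrative).
ARITH_IMPLS = {
    "arith.constant": lambda args, attrs: attrs["value"],
    "arith.addi": lambda args, attrs: args[0] + args[1],
    "arith.muli": lambda args, attrs: args[0] * args[1],
}
TOY_IMPLS = {
    "toy.square": lambda args, attrs: args[0] * args[0],
}


class Interpreter:
    """Runs a straight-line list of ops using registered Python functions."""

    def __init__(self):
        self.impls = {}

    def register(self, table):
        # Combine another dialect's implementations into this instance.
        self.impls.update(table)

    def run(self, ops, result_name):
        env = {}
        for name, op_name, operands, attrs in ops:
            impl = self.impls[op_name]
            env[name] = impl([env[o] for o in operands], attrs)
        return env[result_name]


interp = Interpreter()
interp.register(ARITH_IMPLS)
interp.register(TOY_IMPLS)

# The same computation before and after a hypothetical lowering of toy.square.
before = [("c", "arith.constant", [], {"value": 7}), ("r", "toy.square", ["c"], {})]
after = [("c", "arith.constant", [], {"value": 7}), ("r", "arith.muli", ["c", "c"], {})]

print(interp.run(before, "r") == interp.run(after, "r"))  # prints True
```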
1pm: Session 3
A stable serialization format using MLIR - Matteo Franciolini, Dhruv Saksena (Apple)
A year ago, a bytecode serialization format was introduced for MLIR. Compared with the existing textual format, bytecode offers significant improvements in I/O performance and memory requirements, as well as opportunities for memory mapping, among others. However, the initial functional implementation came without stability guarantees. This allowed the bytecode serialization to be used as a tool for temporary IR storage, but severely limited its adoption as a stable serialization format. The talk will go over the progress made over the last year towards reaching stability of the bytecode format and the additional features that were implemented, such as IR versioning, lazy loading, use-list order preservation, and efficient encoding of op properties. The talk will also discuss how a client dialect can leverage the MLIR bytecode features to build a backward and forward compatible serialization format.
Exorcizing the (C)Python interpreter to implement a universal MLIR frontend - Maksim Levental (University of Chicago)
An MLIR dialect without a language frontend is like a belt without any pants to hold up: very useful in principle but ultimately unproductive. If you find yourself in this unenviable position, don’t be caught pantsless: let me show you how a few (cute) tricks can transform the (C)Python interpreter into a convenient frontend for your dialect.
MLIR Interpreter for a stack-based programming language - Jeff Niu (Modular)
This talk will give an overview of the implementation of an MLIR interpreter. Interpreters can play many useful roles in a compiler stack, from constant folding function calls and memory accesses, to validating IR. We will discuss how the implementation is generic over control flow semantics and how it implements a memory model. We will give an example of how the memory model can be used to build a dense conditional constant propagation analysis. To conclude, we will discuss the challenges of building a generic MLIR interpreter and what it would take to get there.
Progress Report on the MLIR Sparsifier - Aart Bik, Peiming Liu, Yinying Li (Google)
In this talk, we will discuss recent improvements made in the MLIR Sparsifier, present ongoing efforts, and provide a demo of “sparse compilation” in a Colab environment.
3pm: Coffee Break
3:30pm: Session 4
Data Tiling and targeting fixed instruction sequences in IREE - Mahesh Ravishankar, Benoit Jacob, Hanhan Wang (Nod.ai)
This talk describes the recent work in IREE that enables the use of data layout transformations to improve matrix multiplication performance. There are two parts to this work that are independent but work together to deliver good performance: (a) a data layout transformation of the operands that makes them friendlier to the SIMD ISA and/or the memory system (caches), and (b) the ability to offload the innermost loop code to a predefined sequence of instructions on specific hardware. These techniques are individually well known to the community, but this talk will show how the two tie together. While the current implementation has been evaluated on CPU architectures, the same approach is readily usable on any backend that IREE targets. We will also discuss the performance of the code generated through this approach.
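The two parts described above can be sketched in plain Python. This is an illustration under stated assumptions, not IREE's implementation: pack, micro_kernel, and tiled_matmul are hypothetical names, the tile sizes are arbitrary, and the micro-kernel is an ordinary loop nest standing in for a fixed, hardware-specific instruction sequence. The operands are first rearranged into a tiled layout, and the inner computation is then delegated to a routine that only ever sees fixed-size tiles.

```python
def pack(matrix, tile_rows, tile_cols):
    """Rearrange a 2-D list into a [row_tile][col_tile][r][c] tiled layout."""
    rows, cols = len(matrix), len(matrix[0])
    assert rows % tile_rows == 0 and cols % tile_cols == 0
    return [
        [
            [matrix[i + r][j:j + tile_cols] for r in range(tile_rows)]
            for j in range(0, cols, tile_cols)
        ]
        for i in range(0, rows, tile_rows)
    ]


def micro_kernel(acc, a_tile, b_tile):
    """Stand-in for a fixed instruction sequence: acc += a_tile @ b_tile."""
    for r in range(len(a_tile)):
        for c in range(len(b_tile[0])):
            for k in range(len(b_tile)):
                acc[r][c] += a_tile[r][k] * b_tile[k][c]


def tiled_matmul(a, b, m0, k0, n0):
    """Multiply a (M x K) by b (K x N) using packed m0 x k0 and k0 x n0 tiles."""
    packed_a, packed_b = pack(a, m0, k0), pack(b, k0, n0)
    out = [[0] * len(b[0]) for _ in range(len(a))]
    for i, a_row_of_tiles in enumerate(packed_a):
        for j in range(len(packed_b[0])):
            acc = [[0] * n0 for _ in range(m0)]
            for k, a_tile in enumerate(a_row_of_tiles):
                micro_kernel(acc, a_tile, packed_b[k][j])
            # Write the accumulated tile back into the row-major result.
            for r in range(m0):
                out[i * m0 + r][j * n0:(j + 1) * n0] = acc[r]
    return out


def reference_matmul(a, b):
    """Plain triple-loop matmul used to check the tiled version."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [0, 1], [2, 0], [0, 2]]
print(tiled_matmul(A, B, 2, 2, 2) == reference_matmul(A, B))  # prints True
```

Because the micro-kernel only depends on the tile shape, swapping it for a different implementation would not require changing the packing or the outer loops.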
XeTile: A low-level dialect for generating high-efficient GEMM - Jianhui Li, Chao Chen, Shahneous Bari, Md Abdullah, Gusthinna Waduge, Charitha Saumya (Intel)
To facilitate efficient code generation for GEMM on Intel GPUs, we introduce the XeTile dialect, which supports a tile-based programming model. XeTile decomposes the GEMM kernel into tiles of large predefined sizes at the subgroup and workgroup level. With the XeTile dialect, tile-based GEMM algorithms can be expressed easily, and advanced optimizations such as cooperative load/prefetch, K-slicing, and software pipelining become possible. Underneath XeTile, the implementation uses target-specific features to get the best performance on specific hardware. Because the XeTile representation decomposes the GEMM at submatrix granularity and maps it to registers, it also supports optimizations such as fusion with neighboring operations. Although XeTile is developed for Intel GPUs, the dialect definition is target-independent, and it can be lowered to different hardware targets. We would like to discuss it with the MLIR community and obtain feedback to fine-tune it as an upstream dialect.
4:30pm: Roundtables
Poison Semantics - Jakub Kuderski (Nod.ai), Ivan Butygin (Intel), Karl Friebel (TU Dresden)
Poison semantics, originally introduced in LLVM, allow for defining semantics of ops with undefined behavior and modeling deferred undefined behavior. This roundtable focuses on the implementation strategy specific to MLIR.
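As a rough illustration of deferred undefined behavior, the sketch below models a few rules loosely based on LLVM's poison semantics. It is a simplified toy model, not the MLIR design under discussion: the POISON marker, add_nsw, select, and branch_on are hypothetical Python names. An overflowing signed add produces poison instead of failing immediately, poison propagates through operations that depend on it, and undefined behavior is only triggered when control flow depends on a poison value.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


class Poison:
    """Marker type for a poison value."""

    def __repr__(self):
        return "poison"


POISON = Poison()


class UndefinedBehavior(Exception):
    """Raised where the toy model treats execution as undefined."""


def add_nsw(a, b):
    """Signed 32-bit add: overflow yields poison rather than immediate UB."""
    if a is POISON or b is POISON:
        return POISON
    result = a + b
    return result if INT32_MIN <= result <= INT32_MAX else POISON


def select(cond, if_true, if_false):
    """A poison condition poisons the result; otherwise only the chosen arm matters."""
    if cond is POISON:
        return POISON
    return if_true if cond else if_false


def branch_on(cond):
    """Branching on poison is where the deferred UB becomes actual UB."""
    if cond is POISON:
        raise UndefinedBehavior("branch on poison")
    return bool(cond)


overflowed = add_nsw(INT32_MAX, 1)
print(overflowed)                    # prints poison
print(select(True, 5, overflowed))   # prints 5
```

In this model, computing a poison value is harmless as long as it is never observed, which is what allows such operations to be moved or speculated freely.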
Using MLIR from Python - Mathieu Fehr (University of Edinburgh)
This roundtable will discuss how we could improve the MLIR Python bindings, and how some of the ideas we had in xDSL could be contributed to them.