- Code-generation of highly efficient finite element operations using the MLIR compiler infrastructure
- LLVM Support for Sub-FP8 Quantization with RISC-V Extensions for Machine Learning Models
- Towards Multi-Level Arithmetic Optimizations
- CuSan: a data race detector for CUDA based on ThreadSanitizer
- SonicMachine: Scalable Architecture Description using MLIR
- Coroutines, reinforcement learning environments, typechecking, MLIR: tying them together.
1) Code-generation of highly efficient finite element operations using the MLIR compiler infrastructure - Edward Erasmie-Jones
In this poster, we present NektarIR, a work-in-progress high-level intermediate representation of high-order finite element operations, built using the MLIR compiler infrastructure. Our goal is to address the software fragmentation that arises in developing highly efficient finite element kernels for heterogeneous hardware by enabling the generation of hardware-specific JIT-compiled kernels through both MLIR and LLVM. We demonstrate the initial stages in the development of our MLIR dialect and the performance of JIT-compiled finite element kernels, tested as a back-end for the Nektar++ spectral/hp element framework.
2) LLVM Support for Sub-FP8 Quantization with RISC-V Extensions for Machine Learning Models - Kathryn Chapman, Fu-Jian Shen
Deep learning computations require substantial computational power, storage, and data transfer bandwidth. To enhance efficiency, model quantization has become essential, driving the evolution of data formats from 32-bit and 16-bit to 8-bit, 6-bit, and 4-bit in both integer and floating-point representations. Previously, at the RISC-V Summit North America 2024, we proposed the Sub-FP8 extension for RISC-V, a novel approach to low-precision floating-point computation. In this work, we further provide Sub-FP8 support in the LLVM backend, along with C/C++ intrinsics to facilitate its integration into machine learning workloads. In addition, ongoing work integrates a reference design of Sub-FP8 with CVA6 and FPnew, both open-hardware designs. In our reference design, we also examine the issues involved in integrating Sub-FP8 with the RISC-V vector and matrix extensions. Experiments and simulations with tools such as Spike and gem5 will also be presented.
3) Towards Multi-Level Arithmetic Optimizations - Louis Ledoux, Florent de Dinechin, Luc Forget
Numerical code is usually conceived using real numbers. However, programming languages only provide constructs that operate at the lower abstraction level of machine-encoded arithmetic. This work introduces an MLIR dialect for representing computations on real numbers at a high abstraction level. This enables more opportunities for arithmetic optimization than current compilers support. Such optimizations are particularly relevant when compiling a high-level mathematical description to application-specific hardware, for instance in signal processing and AI acceleration.
4) CuSan: a data race detector for CUDA based on ThreadSanitizer - Alexander Hück
CuSan is a tool for detecting data races between (asynchronous) CUDA calls and the host. To achieve this, we analyze and instrument CUDA API usage in the target code during compilation with Clang/LLVM to track CUDA-specific concurrency, memory accesses and synchronization semantics. Our runtime then exposes this information to ThreadSanitizer for final data race analysis.
5) SonicMachine: Scalable Architecture Description using MLIR - Daniyal Khan
SonicMachine is a novel MLIR dialect designed to efficiently represent complex, large-scale machine learning architectures, addressing the challenges of modeling hierarchical structures and data movement in modern accelerators. By leveraging MLIR’s shape system, SonicMachine enables explicit representation of interconnections and performance attributes through operations like slice, concat, and broadcast, along with builder functions and templates for scalable descriptions. This work demonstrates how compiler infrastructure can facilitate expressive, hierarchical hardware modeling, providing a foundation for analysis, optimization, and efficient system design.
6) Coroutines, reinforcement learning environments, typechecking, MLIR: tying them together. - Massimo Fioravanti
While machine learning tools become simpler by the day, developing robust environments for those tools to operate in remains an open challenge. Rulebook is an MLIR and coroutine-based DSL aimed at building environments and simplifying their interoperability. This talk describes the issues found at the boundaries between machine learning agents and machine learning environments from a programming language perspective, how Rulebook solves them, what challenges prevent other languages from adopting the same constructs, how the language leverages LLVM and MLIR to achieve its purpose, and various tips and tricks learned from implementing coroutines on top of MLIR.
Florent de Dinechin
Edward Erasmie-Jones
Alexander Hück
Louis Ledoux
Fu-Jian Shen
Daniyal Khan
Massimo Fioravanti