- Code-generation of highly efficient finite element operations using the MLIR compiler infrastructure
- LLVM Support for Sub-FP8 Quantization with RISC-V Extensions for Machine Learning Models
- Towards Multi-Level Arithmetic Optimizations
- CuSan: a data race detector for CUDA based on ThreadSanitizer
1) Code-generation of highly efficient finite element operations using the MLIR compiler infrastructure - Edward Erasmie-Jones
In this poster, we present NektarIR, a work-in-progress high-level intermediate representation of high-order finite element operations, built using the MLIR compiler infrastructure. Our goal is to address the software fragmentation that arises in developing highly efficient finite element kernels for heterogeneous hardware by enabling the generation of hardware-specific JIT-compiled kernels through both MLIR and LLVM. We demonstrate the initial stages in the development of our MLIR dialect and the performance of JIT-compiled finite element kernels, tested as a back-end for the Nektar++ spectral/hp element framework.
2) LLVM Support for Sub-FP8 Quantization with RISC-V Extensions for Machine Learning Models - Kathryn Chapman, Fu-Jian Shen
Deep learning computations require substantial computational power, storage, and data transfer bandwidth. To enhance efficiency, model quantization has become essential, driving the evolution of data formats from 32-bit and 16-bit to 8-bit, 6-bit, and 4-bit in both integer and floating-point representations. Previously, at the RISC-V Summit North America 2024, we proposed the Sub-FP8 extension for RISC-V, a novel approach to low-precision floating-point computation. In this work, we add Sub-FP8 support to the LLVM backend, along with C/C++ intrinsics to facilitate its integration into machine learning workloads. In addition, a reference design of Sub-FP8 is being integrated, as ongoing work, with the open-hardware CVA6 and FPnew cores. In this reference design, we also examine the issues in integrating Sub-FP8 with the RISC-V vector and matrix extensions. Experiments and simulations with tools such as Spike and gem5 will also be presented.
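To make the idea of a sub-8-bit floating-point format concrete, here is a minimal C++ sketch of decoding and quantizing a 4-bit value. The abstract does not specify the Sub-FP8 encodings, so the layout used here (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, no infinities or NaNs) is an assumption chosen for illustration. The names `decode_e2m1` and `encode_e2m1` are hypothetical and are not the intrinsics proposed in the poster.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode a hypothetical 4-bit float (assumed layout: s | ee | m, bias 1).
// Representable magnitudes under this assumption: 0, 0.5, 1, 1.5, 2, 3, 4, 6.
float decode_e2m1(std::uint8_t bits) {
    assert(bits < 16);                  // only the low 4 bits are meaningful
    const int sign = (bits >> 3) & 0x1;
    const int exp  = (bits >> 1) & 0x3;
    const int man  = bits & 0x1;
    // exp == 0 encodes zero and the single subnormal value 0.5.
    const float mag = (exp == 0)
        ? 0.5f * static_cast<float>(man)
        : std::ldexp(1.0f + 0.5f * static_cast<float>(man), exp - 1);
    return sign ? -mag : mag;
}

// Quantize a float to the nearest representable code by exhaustive search.
// Out-of-range inputs saturate to the largest magnitude; ties keep the
// lowest-numbered code. A hardware unit would use a rounding circuit instead.
std::uint8_t encode_e2m1(float x) {
    std::uint8_t best = 0;
    float best_err = std::fabs(x - decode_e2m1(0));
    for (std::uint8_t code = 1; code < 16; ++code) {
        const float err = std::fabs(x - decode_e2m1(code));
        if (err < best_err) {
            best = code;
            best_err = err;
        }
    }
    return best;
}
```

With only 16 codes, the exhaustive search in `encode_e2m1` is cheap and easy to check by hand; it is meant as a software reference, not as a model of how a backend or intrinsic would lower the conversion.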
3) Towards Multi-Level Arithmetic Optimizations - Louis Ledoux, Florent de Dinechin, Luc Forget
Numerical code is usually conceived using real numbers. However, programming languages only provide constructs operating at the lower abstraction level of machine-encoded arithmetic. This work introduces an MLIR dialect for representing computations on real numbers at a high abstraction level. This enables more opportunities for arithmetic optimizations than those supported by current compilers. Such optimizations are particularly relevant when compiling a high-level mathematical description to application-specific hardware, for instance in signal processing and AI acceleration.
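The gap between real-number semantics and machine arithmetic can be illustrated with a small C++ sketch. Over the reals, addition is associative, so both functions below compute the same value; in IEEE 754 double precision they can differ, which is why a compiler that only sees machine-level operations cannot freely reorder them. The function names are hypothetical and the example is not taken from the poster.

```cpp
// Two groupings of the same real-number expression a + b + c.
double sum_left(double a, double b, double c) {
    return (a + b) + c;   // cancel a and b first, then add c
}

double sum_right(double a, double b, double c) {
    return a + (b + c);   // c may be absorbed into b before the cancellation
}
```

For example, with `a = 1e30`, `b = -1e30`, and `c = 1.0`, `sum_left` returns `1.0` while `sum_right` returns `0.0` under default (non-fast-math) compilation, because `1.0` is far below the spacing between adjacent doubles near `1e30`. A representation that records the intended real-number computation leaves the choice of grouping, and of precision, to the compiler.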
4) CuSan: a data race detector for CUDA based on ThreadSanitizer - Alexander Hück
CuSan is a tool for detecting data races between (asynchronous) CUDA calls and the host. To achieve this, we analyze and instrument CUDA API usage in the target code during compilation with Clang/LLVM to track CUDA-specific concurrency, memory accesses and synchronization semantics. Our runtime then exposes this information to ThreadSanitizer for final data race analysis.
Florent de Dinechin
Edward Erasmie-Jones
Luc Forget
Alexander Hück
Louis Ledoux
Fu-Jian Shen