- Enhancing Tile & Fuse Transformation in MLIR with a Planning Interface
- LTO/BOLT Optimised Clang/LLD AArch64 toolchain
- MLIR Tensor Compiler Charter Update
- Accurate Runtime Performance Estimation for Predictably Training ML Guided Register Eviction Policies
- The New Premerge System
- Measuring the health of the LLVM community
- Efficient Cache Performance Prediction for Neural Networks
- Lessons learned from leveling up RISC-V LLVM testing
1) Enhancing Tile & Fuse Transformation in MLIR with a Planning Interface - Aviad Cohen
The MLIR project provides a Tile & Fuse transformation to enhance memory locality by breaking down computations into smaller tiles and merging tiled loops. While the transformation utilities exist, determining optimal tiling and fusion decisions is challenging due to the lack of readily available iteration domain mapping information. This talk proposes a new interface to expose this critical mapping independent of the transformation process, enabling more informed and flexible tiling and fusion strategies. By providing a common solution, this interface aims to benefit both in-tree and out-of-tree dialects, addressing challenges faced by downstream projects and improving the broader MLIR ecosystem.
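The abstract above concerns MLIR's structured-op transformation infrastructure; as a plain, language-neutral illustration of what tiling itself does (not the MLIR implementation or the proposed interface), the sketch below compares a naive matrix multiply with a tiled one. The function names and the tile size are hypothetical.

```python
def matmul(a, b, n):
    # Naive triple loop: streams whole rows/columns, poor cache reuse.
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_tiled(a, b, n, t):
    # Tiled version: the iteration domain is broken into t-by-t tiles so
    # each tile's working set can stay in cache before moving on. Choosing
    # good values of t is exactly the kind of decision that needs iteration
    # domain mapping information.
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, t):
        for jj in range(0, n, t):
            for kk in range(0, n, t):
                for i in range(ii, min(ii + t, n)):
                    for j in range(jj, min(jj + t, n)):
                        for k in range(kk, min(kk + t, n)):
                            c[i][j] += a[i][k] * b[k][j]
    return c
```

Both loops compute the same result; only the traversal order of the iteration domain changes, which is why tiling is legal whenever the mapping permits it.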
2) LTO/BOLT Optimised Clang/LLD AArch64 toolchain - Elvina Yakubova

In this talk, we present our work on building a faster AArch64 Clang compiler using an advanced LTO-PGO-BOLT build. We’ll show that the default settings lead to regressions in some applications because training is done on C++ codebases. To address this, we diversify the training phase and merge profiles from applications with different codebases, producing an LTO-PGO-BOLT-optimized compiler that achieves higher performance across a wider range of applications.
3) MLIR Tensor Compiler Charter Update - Renato Golin
This will be an update on the current efforts to consolidate the MLIR charter via the recently formed design group. In the two months before EuroLLVM, we'll collect rationale, design, and implementation documents; review their contents; propose changes; and reach a state where we can start making more assertive suggestions based on the community's feedback on current progress. This effort will continue upstream, in the open, but we recognize that the volume and intensity of the discussions can be daunting, so we plan to present a summary of the current state and potential futures. This will lead into a round table for discussion, with follow-up forum threads to reach back to the rest of the community.
4) Accurate Runtime Performance Estimation for Predictably Training ML Guided Register Eviction Policies - Aiden Grossman
In this talk, we describe how we trained a machine-learned register allocation heuristic using a new trace-based runtime estimation methodology. We outline how our new performance estimation system works, describe our overall training process, and present our results, including performance gains of around 1% on some server binaries.
5) The New Premerge System - Lucile Rose Nihlen, Aiden Grossman
Here we talk through the redesigned premerge system. We show a demo of the system in action from the contributor perspective, talk through the public analytics to understand system performance and reliability, explain the SLO and on-call process, provide a technical overview of the system, and answer any questions the audience might have.
6) Measuring the health of the LLVM community - Jeremy Bennett
In this short talk, I look at how data mining of git activity and mailing lists can give insight into the health of a community project. The talk offers no prescriptions; its purpose is to share techniques that may be useful to the community.
7) Efficient Cache Performance Prediction for Neural Networks - Arjun Pitchanathan
Compilers require cost models to guide optimizations. Compilers for neural networks in particular are currently of great interest. Existing tools like llvm-mca do not take memory cache behaviour into account. Current strategies to decide when to apply loop fusion use rough heuristics or greedy approaches. An analysis that predicts cache miss rates could be applied to better choose when and how to apply loop fusion or tiling. We introduce a new algorithm to predict these miss rates that is orders of magnitude faster in many practical settings. Moreover, when given more time to run, it can produce more accurate results than any prior model — for example, its L2 miss rate predictions have a correlation of 0.985 with hardware measurements as compared to 0.724 with the prior state-of-the-art.
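The talk's analytical predictor is not described in the abstract, but the brute-force baseline it aims to outrun is easy to state: simulate the cache over an address trace and count misses. The sketch below (function names and parameters are hypothetical) models a fully associative LRU cache; an analytical model tries to predict the same miss rate orders of magnitude faster, without replaying the trace.

```python
from collections import OrderedDict

def lru_miss_rate(trace, num_lines, line_size=64):
    # Brute-force baseline: replay an address trace through a fully
    # associative LRU cache and report the observed miss rate.
    cache = OrderedDict()  # cache line -> None, in LRU order
    misses = 0
    for addr in trace:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)  # hit: refresh recency
        else:
            misses += 1
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict least recently used
    return misses / len(trace)
```

For example, a trace that alternates between two cache lines in a two-line cache misses only on the two cold accesses, giving a miss rate of 0.5 on four accesses.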
8) Lessons learned from leveling up RISC-V LLVM testing - Alex Bradbury
Testing is essential to ensuring the quality of LLVM's releases and maintaining its development velocity. LLVM has built an extensive collection of CI and testing infrastructure over the years, but it can be challenging to stand up builders that provide a reasonable cycle time for scenarios such as testing a backend or target configuration where high-performance hardware isn't (yet) readily available. This talk describes work done to improve the situation for RISC-V, including extending support for qemu and cross-compilation scenarios, improvements to documentation and the ability to test buildbot configurations locally, balancing available compute resources against configuration coverage and test cycle time, and the path to providing feedback beyond functional testing (e.g. code size, performance).
Alex Bradbury
Aviad Cohen
Renato Golin
Aiden Grossman
Lucile Rose Nihlen
Arjun Pitchanathan
Elvina Yakubova