- GPU optimizations, and where Rust knows more than LLVM - Marcelo Domínguez
- IR2Vec Python Bindings: Native Integration for Pythonic ML Workflows - Nishant Sachdeva, S. Venkatakeerthy
- Accelerating Pass Order Auto-tuning via Profile-Guided Cost Modeling - Bingyu Gao
GPU optimizations, and where Rust knows more than LLVM
Speaker(s): Marcelo Domínguez
In this talk we compare the performance of Rust's `std::offload` interface on various benchmarks with C++ OpenMP, CUDA, and ROCm implementations. We show the impact of a new set of LLVM-IR optimizations, and the performance difference between "safe" and "unsafe" Rust. We briefly introduce two aliasing models under consideration in the Rust community, and show how higher-level Rust aliasing information can be combined with our lower-level LLVM-IR optimization pass.
IR2Vec Python Bindings: Native Integration for Pythonic ML Workflows
Speaker(s): Nishant Sachdeva, S. Venkatakeerthy
IR2Vec is a widely adopted framework for generating vector embeddings from LLVM IR to enable machine-learning–driven compiler optimizations. This work introduces native Python bindings for IR2Vec using pybind11, enabling seamless and efficient integration with Python-based ML ecosystems such as PyTorch and TensorFlow. By replacing subprocess-based CLI invocation with a direct programmatic interface, the bindings eliminate process overhead, provide inbuilt C++–Python type conversion, and support robust exception handling. The implementation and usage are demonstrated through practical embedding-generation examples. The project is currently under active development for upstream integration into the LLVM monorepo, with multiple pull requests already accepted, and is available in beta form via TestPyPI.
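The overhead argument above can be illustrated with a small stdlib-only sketch. This is not IR2Vec's actual API; both functions below are hypothetical stand-ins that compute the same toy "embedding" two ways, to contrast an in-process call (what a pybind11 binding provides) with a subprocess round-trip (what CLI invocation costs):

```python
import subprocess
import sys
import time

def embed_in_process(ir_text: str) -> list[float]:
    # Stand-in for a native binding call: the work happens in this process,
    # with no process spawn and no serialization across a pipe.
    return [float(len(line)) for line in ir_text.splitlines()]

def embed_via_subprocess(ir_text: str) -> list[float]:
    # Stand-in for a CLI invocation: spawn an interpreter, pipe the IR in,
    # and parse the printed result back out of stdout.
    proc = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(' '.join(str(float(len(l))) "
         "for l in sys.stdin.read().splitlines()))"],
        input=ir_text, capture_output=True, text=True, check=True,
    )
    return [float(tok) for tok in proc.stdout.split()]

ir = "define i32 @f() {\nentry:\n  ret i32 0\n}\n"

t0 = time.perf_counter()
direct = embed_in_process(ir)
t_direct = time.perf_counter() - t0

t0 = time.perf_counter()
piped = embed_via_subprocess(ir)
t_piped = time.perf_counter() - t0

assert direct == piped  # same result, very different cost
print(f"in-process: {t_direct * 1e6:.0f} us, subprocess: {t_piped * 1e3:.1f} ms")
```

The process-spawn cost dominates for small inputs, which is exactly the regime of ML training loops that call an embedder many thousands of times; an in-process binding also returns native Python objects directly rather than text to be re-parsed.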
Accelerating Pass Order Auto-tuning via Profile-Guided Cost Modeling
Speaker(s): Bingyu Gao
LLVM pass ordering auto-tuning can outperform standard -O3, but it is often hindered by an enormous search space and the high overhead of hundreds of dynamic measurements. This talk presents an efficient auto-tuning framework that minimizes expensive measurements using a profile-guided relative cost model and calibrated beam search. Evaluation on cBench shows an average 10.46% speedup over -O3 with only 20 dynamic measurements, significantly accelerating the search for optimal pass sequences.
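The search strategy described above can be sketched in miniature. The pass names and the cost model below are made up for illustration (the talk's profile-guided model is not specified here); the sketch shows only the beam-search skeleton, where a cheap static cost proxy prunes the search so that expensive dynamic measurements would be needed only to calibrate and rank the few surviving candidates:

```python
# Toy setting: order four abstract "passes" so a made-up cost model is
# minimized. A real tuner would calibrate this proxy with a small number
# of dynamic measurements on the surviving beam entries.
PASSES = ["inline", "sroa", "licm", "gvn"]

def static_cost(seq: tuple[str, ...]) -> float:
    # Stand-in for a profile-guided relative cost model: scored without
    # running the program. Here: prefer "inline" early, penalize repeats.
    cost = 0.0
    for i, p in enumerate(seq):
        if p == "inline":
            cost += i                      # inlining late loses opportunities
    cost += len(seq) - len(set(seq))       # duplicate passes add no value
    return cost

def beam_search(passes: list[str], length: int, width: int) -> tuple[str, ...]:
    """Grow pass sequences one step at a time, keeping only the `width`
    cheapest partial sequences under the cost model at each step."""
    beam: list[tuple[str, ...]] = [()]
    for _ in range(length):
        candidates = [seq + (p,) for seq in beam for p in passes]
        candidates.sort(key=static_cost)   # stable sort keeps generation order on ties
        beam = candidates[:width]
    return beam[0]

best = beam_search(PASSES, length=4, width=8)
print(best)
```

With width 8 the search scores 4 + 16 + 32 + 32 candidates instead of all 256 length-4 sequences, and the gap widens combinatorially at realistic pipeline lengths; this is the sense in which a cheap cost model makes a 20-measurement budget plausible.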