- Better Performance Models for MLGO Training - Viraj Shah
- Transform-dialect schedules: writing MLIR-lowering pipelines in MLIR - Rolf Morel
- How expensive is it? Big data for ML cost modeling - Aiden Grossman
Better Performance Models for MLGO Training - Viraj Shah
The systematic application of MLGO models to more optimizations is held back by existing performance models: because of the assumptions they make about the execution environment and the runtime behavior of code, they fail to account for many dynamic runtime effects. In this talk, we present our work on a performance model that accurately models longest-latency cache misses and folds the resulting overhead into the throughput estimate and, consequently, into the reward-signal calculation. We also discuss experiments with different ways of supplementing such models with additional features, aiming to balance how accurately the model can estimate performance against how feasible it is to build, train, and use.
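As a rough illustration of what folding cache-miss overhead into a reward signal can look like (a minimal sketch with made-up names, a fixed miss penalty, and a relative-improvement reward; not the model described in the talk):

```python
"""Illustrative sketch only: the names and the penalty formula are
assumptions, not the performance model presented in the talk."""


def estimated_cycles(static_cycles: float,
                     llc_misses: float,
                     miss_penalty_cycles: float = 200.0) -> float:
    """Fold longest-latency (LLC) cache-miss overhead into a static
    throughput estimate instead of assuming an ideal memory subsystem."""
    return static_cycles + llc_misses * miss_penalty_cycles


def reward(baseline: dict, policy: dict) -> float:
    """Reward = relative improvement of the modeled cycle count."""
    base = estimated_cycles(baseline["static_cycles"], baseline["llc_misses"])
    new = estimated_cycles(policy["static_cycles"], policy["llc_misses"])
    return (base - new) / base


# Example: a policy decision that lowers modeled cycles yields a positive reward.
print(reward({"static_cycles": 1000.0, "llc_misses": 10.0},
             {"static_cycles": 980.0, "llc_misses": 8.0}))
```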
Transform-dialect schedules: writing MLIR-lowering pipelines in MLIR - Rolf Morel
The Transform dialect exposes transformations of MLIR as ops in MLIR. These fine-grained operations can be sequenced to express more involved transformations. When such a sequence expresses a coherent lowering step, we refer to it as a schedule. Thanks to recent additions to the Transform dialect, these schedules can be named and called from other sequences. Leveraging this feature, we show how Transform ops compose into reusable schedules and how schedules compose into larger schedules. We show that entire MLIR-lowering pipelines can be declaratively specified in MLIR, with large parts shared among pipelines.
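As a rough sketch of the composition mechanism (not material from the talk; the op syntax follows recent upstream MLIR and may need adjustment for a particular revision), the script below emits a small transform module in which a named schedule is defined once and then called from an entry-point sequence via transform.include:

```python
"""Illustrative sketch only: a tiny Transform-dialect schedule written out as
a string. Op names and syntax track recent upstream MLIR and may differ
between revisions."""
from pathlib import Path

SCHEDULE = r"""
module attributes {transform.with_named_sequence} {
  // A reusable schedule: tile every linalg.matmul under the given root.
  transform.named_sequence @tile_matmuls(
      %root: !transform.any_op {transform.readonly}) {
    %matmuls = transform.structured.match ops{["linalg.matmul"]}
        in %root : (!transform.any_op) -> !transform.any_op
    %tiled, %loops:2 = transform.structured.tile_using_for %matmuls
        tile_sizes [8, 16]
        : (!transform.any_op)
        -> (!transform.any_op, !transform.any_op, !transform.any_op)
    transform.yield
  }

  // A larger schedule is built by calling named schedules.
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    transform.include @tile_matmuls failures(propagate) (%root)
        : (!transform.any_op) -> ()
    transform.yield
  }
}
"""

# Write the schedule next to the payload IR; a recent mlir-opt can apply it
# with its transform-interpreter pass (exact flags depend on the revision).
Path("schedule.mlir").write_text(SCHEDULE)
```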
How expensive is it? Big data for ML cost modeling - Aiden Grossman
In this talk, we present tooling and processes for creating highly accurate learned cost models. We take a large set of basic blocks from ComPile, benchmark them using llvm-exegesis, and then train a learned cost model on that data. In contrast to previous approaches, we are able to train on a significantly more representative set of basic blocks, thanks to combining a large dataset like ComPile with llvm-exegesis's production-grade benchmarking infrastructure.
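As a rough sketch of the final training step (toy opcode-count features and a linear least-squares fit standing in for the learned cost model; the blocks and cycle counts below are placeholders, not ComPile data or llvm-exegesis measurements):

```python
"""Minimal sketch, assuming basic blocks have already been extracted and
measured (e.g. cycles per block); real feature extraction and model
architecture are far richer than this toy opcode histogram + linear fit."""
import numpy as np


def opcode_histogram(block: list[str], vocab: dict[str, int]) -> np.ndarray:
    """Map a basic block (list of assembly lines) to a fixed-length count vector."""
    x = np.zeros(len(vocab))
    for insn in block:
        mnemonic = insn.split()[0]
        if mnemonic in vocab:
            x[vocab[mnemonic]] += 1
    return x


# Placeholder (block, measured cycles) pairs standing in for benchmark data.
blocks = [(["add rax, rbx", "imul rcx, rdx"], 4.0),
          (["mov rax, [rbp]", "add rax, 1"], 6.0)]
vocab = {"add": 0, "imul": 1, "mov": 2}

X = np.stack([opcode_histogram(b, vocab) for b, _ in blocks])
y = np.array([c for _, c in blocks])

# Least-squares fit: a stand-in for training the learned cost model.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
print(weights)  # per-opcode cost estimates under this toy linear model
```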