1) Profiling Based Global Machine Outlining - Gai Liu
2) Compromises with linking large x86-64 binaries - Arthur Eubanks
3) Novel Data Layout Optimizations in BiSheng Compiler - Ehsan Amiri
4) Precision and Performance Analysis of LLVM's C Standard Math Library on GPUs - Anton Rydahl
5) APX & AVX10: The next major evolution of Intel® architecture - Sergey Maslov
Profiling Based Global Machine Outlining - Gai Liu
While LTO-based global machine outlining can significantly reduce code size, it often suffers from very long compilation times. We propose a two-stage approach that moves the time-consuming global analysis stage offline, achieving similar code size savings without significantly lengthening frequent integration builds.
Compromises with linking large x86-64 binaries - Arthur Eubanks
When x86-64 binaries grow too large, the typical instruction sequences used to access globals stop working. We take a look at the medium code model, the compromises it makes to keep large binaries linking without sacrificing too much performance, and what needs to be added to LLVM to support it.
Novel Data Layout Optimizations in BiSheng Compiler - Ehsan Amiri
We talk about two new data layout optimizations in the BiSheng compiler. The first, Structure Peeling Using Runtime Memory Identifiers (SPRMI), is a variation of the well-known Array of Structures to Structure of Arrays (AoS-to-SoA) optimization that addresses cases where there are multiple arrays of the structure we want to optimize. The second, Nested Container Flattening (NCF), relocates some of the data members of one C++ class (e.g. class D) into another class (e.g. class A). As we will explain in the talk, this allows us to reduce the number of load instructions and improve the locality of data accesses in the program. These optimizations have a significant impact on some of the SPEC CPU benchmarks. We also highlight techniques used for legality analysis that may be of independent interest and applicable to other C++ workloads.
Precision and Performance Analysis of LLVM's C Standard Math Library on GPUs - Anton Rydahl
The LLVM C standard math library, LIBM, is under active development but primarily focused on supporting CPU architectures. We compare the accuracy and performance of existing implementations of standard math library functions across GPU targets. The analysis highlights where target-agnostic implementations from LIBM produce accurate results on GPU targets: in many cases, the existing LLVM intrinsics or LIBM target-agnostic implementations are comparable to vendor libraries in precision and performance. However, the analysis also highlights weak spots where LIBM must rely on vendor implementations for now. We propose a fully functional GPU math library that, as a starting point, mixes vendor and LLVM-native implementations. It will give users the best possible performance and precision and, where the two are mutually exclusive, offer configurations prioritizing one or the other.
APX & AVX10: The next major evolution of Intel® architecture - Sergey Maslov
Intel has disclosed two exciting extensions for future Intel architectures. Intel® Advanced Performance Extensions (Intel® APX) doubles the number of GPRs to 32 and introduces many other new features. Intel® Advanced Vector Extensions 10 (Intel® AVX10) introduces a modern vector ISA that can run across future Intel P-cores and E-cores. Compiler support is key to enabling these ISA features and exploiting the hardware's capabilities. In this talk, we will introduce the new ISA extensions and how they can be used to speed up applications while preserving compatibility.