Session Type
Quick Talks
Date & Time
Wednesday, November 9, 2022, 1:30 PM - 3:00 PM
Name
Quick Talk
Description

1) Inlining for Size - Kyungwoo Lee
2) Automatic indirect memory access instructions generation for pointer chasing patterns - Przemysław Ossowski
3) Expecting the expected: Honoring user branch hints for code placement optimizations - Stan Kvasov, Vince Del Vecchio
4) LLVM Education Initiative - Chris Bieneman, Mike Edwards, Kit Barton
5) Using modern CPU instructions to improve LLVM's libc math library. - Tue Ly
6) Building an End-to-End Toolchain for Fully Homomorphic Encryption with MLIR - Alexander Viand
7) Challenges Of Enabling Golang Binaries Optimization By BOLT - Vasily Leonenko, Vladislav Khmelevskyi
8) Enabling AArch64 Instrumentation Support In BOLT - Elvina Yakubova

Abstract/s
1) Inlining for Size - Kyungwoo Lee

Inlining for size is critical in mobile apps as app size continues to grow. While link-time optimization (LTO) largely minimizes app size at the minimum size optimization level (-Oz), scalable link-time optimization (ThinLTO) misses many inlining opportunities because each module's inliner works independently, without modeling size cost globally. We first show how to use the ModuleInliner with LTO. Then, we describe how to improve inlining with ThinLTO by extending the bitcode summary, followed by a global inline analysis. We also explain how to overcome import restrictions, which often appear in Objective-C or Swift, by pre-merging bitcode modules. We reduced code size by 2.8% for SocialApp, 4.0% for ChatApp, and 3.0% for Clang, compared to -Oz with ThinLTO.

2) Automatic indirect memory access instructions generation for pointer chasing patterns - Przemysław Ossowski

This short talk provides an example of how a feature newly introduced in real hardware can be adopted into Clang and LLVM and thereby made easily available to users. Indirect Memory Access Instructions (IMAI) can provide significant performance improvements, but their usability is limited by particular hardware restrictions. This talk will present how we tried to reconcile hardware limitations, the complexity of IMAI, and ease of use by handling a dedicated pragma in Clang and applying Complex Patterns in the DAG in the LLVM backend.

3) Expecting the expected: Honoring user branch hints for code placement optimizations - Stan Kvasov, Vince Del Vecchio

LLVM's __builtin_expect, and a variant we recently added, __builtin_expect_with_probability, allow source code control over branch weights and can boost performance with or without PGO via hot/cold splitting. But in LLVM optimization, it's not always intuitive how to update branch weight metadata when control flow changes.
We talk about recent issues with losing branch weights in SimplifyCFG and possible improvements to the infrastructure for maintaining branch weights.

Link-Time Attributes for LTO: Incorporating linker knowledge into the LTO recompile - Todd Snider

Embedded-application systems have limited memory, so user control over the placement of functions and variables is important. The programmer uses a linker script to define a memory configuration and specify placement constraints on input sections that contain function and variable definitions. With LTO enabled, it is critical that the compiler incorporate link-time placement information into the LTO recompile (Edler von Koch - LLVM 2017). This talk discusses a compiler and linker implementation that roughly follows the ideas presented in Edler von Koch, highlighting differences in our implementation that offer significant advantages.

4) LLVM Education Initiative - Chris Bieneman, Mike Edwards, Kit Barton

Interested in expanding the LLVM community through education? Interested in better documentation, tutorials, and examples? Interested in sharing your knowledge to help other engineers grow? Come learn about the proposal for a new LLVM Education working group!

5) Using modern CPU instructions to improve LLVM's libc math library. - Tue Ly

LLVM libc's math routines aim to be both performant and correctly rounded according to the IEEE 754 standard. Modern CPU instruction sets include many instructions useful for mathematical computation. Effectively utilizing these instructions can significantly boost the performance of math function implementations. In this talk, we will discuss how two families of such instructions, fused multiply-add (FMA) and floating-point rounding, are used in LLVM's libc for the x86-64 and ARMv8 architectures, allowing us to achieve performance comparable to glibc while remaining accurate for all rounding modes.
6) Building an End-to-End Toolchain for Fully Homomorphic Encryption with MLIR - Alexander Viand

Fully Homomorphic Encryption (FHE) allows a third party to perform arbitrary computations on encrypted data, learning neither the inputs nor the computation results. However, the complexity of developing an efficient FHE application currently limits deploying FHE in practice. In this talk, we will first present the underlying challenges of FHE development that motivate the development of tools and compilers. We then discuss how MLIR has been used by three different efforts, including one led by us, to significantly advance the state of the art in FHE tooling. While MLIR has brought great benefits to the FHE community, we also want to highlight some of the challenges experienced when introducing the framework to a new domain. Finally, we conclude by discussing how the ongoing efforts could be combined and unified before potentially being upstreamed.

7) Challenges Of Enabling Golang Binaries Optimization By BOLT - Vasily Leonenko, Vladislav Khmelevskyi

Golang is a very specific language: it compiles to an architecture-specific binary, but also uses its own runtime library, which in turn uses version-specific data structures to support internal mechanisms such as garbage collection, scheduling, and reflection. BOLT is a post-link optimizer: it rearranges code and data locations in the output binary, so Golang-specific tables must also be updated to match the modifications performed. In this talk, we will cover the status of the current implementation of Golang support in BOLT, the optimization effect achieved, and the challenges of enabling the optimization of Golang binaries with BOLT.

8) Enabling AArch64 Instrumentation Support In BOLT - Elvina Yakubova

BOLT is a post-link optimizer built on top of LLVM. It achieves performance improvements by optimizing an application's code layout based on an execution profile gathered by a sampling profiler, such as the Linux perf tool.
When the advanced hardware counters needed for precise profiling are not available on a target platform, a profile can instead be collected by instrumenting the binary. In this talk, we will cover the changes essential for enabling instrumentation support in BOLT for a new target platform, using AArch64 as an example.
Location Name
Monterey - Lower Level