Today, achieving peak performance on modern AI accelerators often requires control over low-level hardware features. This trend is expected to intensify as more asynchronicity and dynamism become first-class features of the hardware. As Dark Silicon trends continue, hardware is expected to expose coarser-grained primitives, which in turn call for coarser-grained programming models. For example, with warp/wave specialization, the low-level programming model increasingly resembles MPI/MIMD-style parallelism, but complicated by low-level hardware constraints such as instruction issue ports or warp/wave scheduling and specialization. AMD's open approach to hardware ISA documentation creates a unique opportunity to build world-class assembly tooling in the open, making AMDGPU ASM accessible to a broader community and to higher-level tools while maintaining expert-level control. To reap the benefits of current and future hardware, we believe low-level tooling needs to be an order of magnitude better. Aster builds the foundations for highly controllable assembly production and pushes the boundaries of what's possible in low-level performance tooling.
Speakers: Nicolas Vasilache (AMD), Fabian Mora Corder (AMD), Kunwar Grover (AMD)