Name
Simplifying GPU Programming with Parametric Tile-Level Tensors in Mojo
Session Type
Technical Talk
Date & Time
Thursday, October 24, 2024, 2:45 PM - 3:15 PM
Abstract
Today’s AI GPU workloads are dominated by operations such as matrix multiplication (matmul) and flash attention, and state-of-the-art implementations are designed to exploit the compute and memory hierarchy of modern GPUs at tile-level granularity. Expressing these algorithms at this level, rather than in the low-level SIMT (Single Instruction, Multiple Threads) model, is a significant challenge for kernel developers. In this talk, we will demonstrate how Mojo, a systems programming language built on MLIR, addresses this challenge through its powerful metaprogramming capabilities. Mojo enables simple yet composable abstractions for parametric Tensor types, which can be tiled, distributed across the compute hierarchy, and vectorized. The language also gives GPU library authors direct access to MLIR, making it easier to specialize high-level library operations for specific hardware targets and enabling the efficient development of state-of-the-art GPU kernels that outperform vendor libraries like cuBLAS.
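For flavor, here is a minimal sketch of what this tile-level style can look like. The Tensor type and its tile, distribute, vectorize, and accumulate methods are illustrative stand-ins assumed for this sketch, not a confirmed Mojo API; block_idx and thread_idx denote the usual GPU block and thread indices.

fn matmul_tiles[
    dtype: DType,
    BM: Int, BN: Int, BK: Int,  # block-level tile shape (hypothetical parameters)
    simd_width: Int,            # per-thread vector width
](a: Tensor[dtype], b: Tensor[dtype], c: Tensor[dtype]):
    # Each thread block owns one BM x BN tile of the output C.
    var c_tile = c.tile[BM, BN](block_idx.y, block_idx.x)

    for k in range(a.dim(1) // BK):
        # Matching input tiles for this step along the K dimension.
        var a_tile = a.tile[BM, BK](block_idx.y, k)
        var b_tile = b.tile[BK, BN](k, block_idx.x)

        # Spread each tile across the block's threads, then vectorize
        # the per-thread fragments before accumulating into c_tile.
        var a_frag = a_tile.distribute(thread_idx.x).vectorize[simd_width]()
        var b_frag = b_tile.distribute(thread_idx.x).vectorize[simd_width]()
        c_tile.accumulate(a_frag, b_frag)

Because BM, BN, BK, and simd_width are compile-time parameters, the same kernel source can be specialized per hardware target without runtime overhead, which is the core idea the talk develops.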
Location Name
California Ballroom