Name
Shardy: An MLIR-based Tensor Partitioning System for All Dialects
Session Type
Technical Talk
Date & Time
Thursday, October 24, 2024, 2:15 PM - 2:45 PM
Abstract
Generative AI models are so large that the tensor programs representing them must be chunked (partitioned) into programs running on thousands of hardware accelerators. Within Google DeepMind, these models are partitioned across TPU super clusters of over 4096 devices. In this presentation, we introduce a new MLIR tensor sharding propagation system that we have been developing and deploying to train these large AI models. We have defined our own dialect that expresses tensor shardings and compiler transformation rules as MLIR attributes. The system is MLIR-dialect agnostic and, compared with past systems, offers improved debugging capabilities and a more configurable propagation algorithm.
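As an illustrative sketch only (the mesh name, axis names, and exact attribute syntax below are assumptions for exposition, not taken from the abstract), a sharding dialect in this style might declare a logical device mesh and attach per-dimension shardings to tensor values as attributes:

```mlir
// Hypothetical example: a 2x2 logical mesh with axes "x" and "y".
sdy.mesh @mesh = <["x"=2, "y"=2]>

// The first operand is sharded on "x" along dim 0 and "y" along dim 1;
// propagation would then infer shardings for the other values.
func.func @matmul(
    %arg0: tensor<8x16xf32> {sdy.sharding = #sdy.sharding<@mesh, [{"x"}, {"y"}]>},
    %arg1: tensor<16x32xf32>) -> tensor<8x32xf32> {
  %0 = "any.matmul"(%arg0, %arg1)
      : (tensor<8x16xf32>, tensor<16x32xf32>) -> tensor<8x32xf32>
  return %0 : tensor<8x32xf32>
}
```

Because the sharding is carried as an attribute rather than baked into the ops, the same annotations can decorate values from any MLIR dialect, which is what makes the propagation dialect agnostic.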
Location Name
California Ballroom