NVIDIA's latest Blackwell architecture introduces many new features for Generative AI and accelerated computing, with a particular focus on Tensor compute. Support for these features is being actively added to the upstream LLVM and MLIR projects, mainly in the form of new intrinsics, new type additions to the APFloat subsystem, and their exposure through the NVGPU/NVVM dialects of MLIR. The first part of the talk discusses the challenges we faced in modelling these features as intrinsics, the alternatives we evaluated, and the lessons learnt while implementing them in the NVPTX backend and the NVVM dialect. The second part of the talk provides a hands-on tutorial on writing sample kernels using NVDSL in Python.