In this talk, we present optimized code generation for data layout transformations in MLIR. For an input tensor of arbitrary order (dimensionality) and a given index permutation, our code generator synthesizes high-performance vector code for the corresponding transposition via progressive lowering in MLIR. We currently support single-threaded transposition with explicit vectorization for mixed-precision tensor data on AVX2-based processors, and we are extending this work to support parallel code generation, autotuning, and other vector instruction sets, as well as integration with existing MLIR-based tensor algebra compilers. Performance results show a significant speedup over the existing unoptimized MLIR implementation, TensorFlow, and Eigen, and we achieve performance comparable to the HPTT library.
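To make the transformation concrete, the computation being generated can be sketched as a scalar reference in plain Python: given a row-major flat tensor, its shape, and an index permutation, produce the permuted (transposed) tensor. This is a minimal illustrative sketch, not the talk's implementation; the function and variable names (`transpose_flat`, `data`, `shape`, `perm`) are hypothetical, and the actual generator emits explicitly vectorized MLIR for this same computation.

```python
from itertools import product

def transpose_flat(data, shape, perm):
    """Scalar reference transposition of a row-major flat tensor.

    data:  flat list holding the tensor in row-major order
    shape: dimensions of the input tensor
    perm:  axis permutation, e.g. (1, 0) swaps the two axes of a matrix

    Illustrative sketch only; a real code generator would tile this
    loop nest and vectorize the innermost dimensions.
    """
    # Shape of the output tensor under the permutation.
    out_shape = [shape[p] for p in perm]

    # Row-major strides of the input tensor.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]

    out = []
    for idx in product(*(range(d) for d in out_shape)):
        # Map each output index back to the input offset via the permutation.
        src = sum(idx[j] * strides[perm[j]] for j in range(len(perm)))
        out.append(data[src])
    return out

# Example: transposing a 2x3 matrix stored as [0, 1, 2, 3, 4, 5]
# with perm (1, 0) yields the 3x2 matrix [0, 3, 1, 4, 2, 5].
```

The same routine handles tensors of arbitrary order, which is why the generated loop nest (rather than a fixed 2-D kernel) is the natural target for progressive lowering.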