TransposeOp are often followed by ExtractOp.
In certain cases however, it is unnecessary (and even detrimental) to lower a TransposeOp to either a flat transpose (llvm.matrix intrinsics) or to unrolled scalar insert / extract chains.
Providing foldings of ExtractOp mitigates some of the unnecessary complexity.
can getSubMap take a range instead of materializing the vector?