hlfir.matmul_transpose will be lowered to a new runtime call.
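As a rough illustration, assume hlfir.matmul_transpose(a, b) computes MATMUL(TRANSPOSE(a), b). Under that assumption, the fused form can read `a` in transposed order instead of first building a transposed temporary. This C++ sketch uses a hypothetical row-major `Matrix` type. The function names are illustrative only and are not the Flang runtime API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Unfused path, step 1: materialize TRANSPOSE(a) as a temporary.
Matrix transpose(const Matrix &a) {
  Matrix t(a.cols, a.rows);
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < a.cols; ++j)
      t.at(j, i) = a.at(i, j);
  return t;
}

// Unfused path, step 2: plain MATMUL.
Matrix matmul(const Matrix &a, const Matrix &b) {
  assert(a.cols == b.rows);
  Matrix c(a.rows, b.cols);
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < b.cols; ++j)
      for (std::size_t k = 0; k < a.cols; ++k)
        c.at(i, j) += a.at(i, k) * b.at(k, j);
  return c;
}

// Fused path: computes TRANSPOSE(a) * b by reading a(k, i) directly,
// so no transposed temporary is allocated.
Matrix matmulTranspose(const Matrix &a, const Matrix &b) {
  assert(a.rows == b.rows);
  Matrix c(a.cols, b.cols);
  for (std::size_t i = 0; i < a.cols; ++i)
    for (std::size_t j = 0; j < b.cols; ++j)
      for (std::size_t k = 0; k < a.rows; ++k)
        c.at(i, j) += a.at(k, i) * b.at(k, j);
  return c;
}
```

Both paths accumulate the same products in the same order. The only difference is that the fused version skips the intermediate allocation and copy.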
A canonicalizer was chosen over the following alternatives:
- A new pass for rewriting chained intrinsics: this would add a lot of unnecessary boilerplate.
- Including this rewrite in the HLFIR Intrinsic Lowering pass: I wanted to keep these two concerns separate instead of giving the intrinsic lowering pass a second purpose that would complicate it.
With this change, the MLIR built-in canonicalization pass should be run
before the HLFIR Intrinsic Lowering pass.
There has to be some verification that the two uses are exactly hlfir.matmul and hlfir.destroy. Otherwise, the transformation will silently produce incorrect code if the HLFIR contract changes at some point.
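A minimal sketch of that check follows. It uses a hypothetical toy representation in which each use of the value is identified by the name of the operation consuming it. Real MLIR code would iterate over the value's uses instead. The helper name `isSafeToFuse` is illustrative. The rewrite is allowed only when the uses are exactly one hlfir.matmul and one hlfir.destroy, and any other user causes it to bail out.

```cpp
#include <string>
#include <vector>

// Hypothetical toy model: each entry is the name of an operation that uses
// the value being rewritten. Returns true only for exactly one hlfir.matmul
// and exactly one hlfir.destroy, in any order.
bool isSafeToFuse(const std::vector<std::string> &userOpNames) {
  if (userOpNames.size() != 2)
    return false;
  int matmuls = 0;
  int destroys = 0;
  for (const std::string &name : userOpNames) {
    if (name == "hlfir.matmul")
      ++matmuls;
    else if (name == "hlfir.destroy")
      ++destroys;
    else
      return false; // unexpected user: refuse to rewrite rather than miscompile
  }
  return matmuls == 1 && destroys == 1;
}
```

Rejecting the rewrite on any unexpected user is the conservative choice. A missed optimization is harmless, whereas silently dropping a use is not.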