This new pattern mixes vector.transpose and direct lowering to vector.reduce.
This allows more progressive lowering than immediately going to insert/extract and
composes more nicely with other canonicalizations.
This has two use cases:
- for very wide vectors the generated IR may be much smaller
- when we have a custom lowering for transpose ops we can target it directly
rather than rely on LLVM
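
As a rough illustration of the first use case, the toy model below contrasts the two strategies for a reduction along the outer dimension of a 2-D vector. It is a sketch in plain Python, not MLIR, and makes several assumptions: the function names are hypothetical, the shape of the op being lowered is a guess, and the "op counts" are a simplified stand-in for IR size rather than a measurement of what the pattern actually emits.

```python
# Toy model only: a 2-D "vector" is a list of rows. Nothing here is MLIR API.

def transpose(v):
    """Models a single transpose of a 2-D vector."""
    return [list(col) for col in zip(*v)]


def reduce_add(row):
    """Models a single whole-vector add reduction."""
    return sum(row)


def lower_via_transpose(v):
    """Reduce along dim 0 with one transpose, then one reduction per result."""
    t = transpose(v)
    ops = 1 + len(t)  # 1 transpose + one reduction per transposed row
    return [reduce_add(r) for r in t], ops


def lower_via_extract_insert(v):
    """Reduce along dim 0 one scalar at a time: extract, add, then insert."""
    rows, cols = len(v), len(v[0])
    out = [0] * cols
    ops = 0
    for j in range(cols):
        acc = 0
        for i in range(rows):
            acc += v[i][j]  # counted as one extract + one add
            ops += 2
        out[j] = acc  # counted as one insert
        ops += 1
    return out, ops


if __name__ == "__main__":
    v = [[1, 2, 3], [4, 5, 6]]
    print(lower_via_transpose(v))
    print(lower_via_extract_insert(v))
```

In this model the transpose-based path grows with the number of result elements, while the scalar path grows with the total number of elements, which is the intuition behind the smaller IR for very wide vectors.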
nit: an output