We have a partial transform in the opposite direction, so this patch removes it and adds a more general transform that moves a bitcast after insertelement.
The motivating case from PR45748:
https://bugs.llvm.org/show_bug.cgi?id=45748
...is the last test diff. In that example, we are triggering an existing bitcast transform, so we reduce the number of casts, and that should give us the ideal x86 codegen.
I'm not sure what to do about the MMX test diffs. If the x86 backend expects a particular form, we need to specify that here: do we need to exclude the MMX type from, or add it to, either of these code changes?
I guess this won't work for scalable vectors?
Can we somehow just replace the element type in VecOp->getType() instead?