A vmv.v.v can be thought of as a vmerge where, instead of masking with a mask
operand, the tail is masked off by VL. For example, in the sequence below the
vmv.v.v copies over the first 2 elements from the vadd.vv:
vsetivli zero, 4, e32, m1, ta, ma
vadd.vv v9, v10, v11
vsetivli zero, 2, e32, m1, tu, ma
vmv.v.v v8, v9
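To make the equivalence concrete, here is a small sketch (not code from the patch; the helper names and register values are illustrative) modelling the tail-undisturbed vmv.v.v as a copy of the first VL elements that leaves the destination's tail alone:

```python
def vmv_v_v_tu(vd, vs, vl):
    """Model of a tail-undisturbed vmv.v.v: copy the first vl elements
    of vs into vd, leaving vd's remaining (tail) elements untouched."""
    return vs[:vl] + vd[vl:]

v10 = [1, 2, 3, 4]
v11 = [10, 20, 30, 40]
v8_old = [0, 0, 0, 0]

# Original sequence: vadd.vv at VL=4, then vmv.v.v at VL=2 (tu).
v9 = [a + b for a, b in zip(v10, v11)]
original = vmv_v_v_tu(v8_old, v9, 2)

# Folded sequence: vadd.vv written straight into v8 at VL=2,
# with the tail modelled as undisturbed.
folded = [v10[i] + v11[i] if i < 2 else v8_old[i] for i in range(4)]

assert original == folded  # both are [11, 22, 0, 0]
```

The assertion holds because only the first VL elements are observable through the vmv.v.v; whatever the producing op computed past that point is dead.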
This patch adds an optimisation to fold away the vmv.v.v into the preceding op,
provided the op has only one use, by modifying the op's VL:
vsetivli zero, 2, e32, m1, ta, ma
vadd.vv v8, v10, v11
In general we can simply replace the VL of the op with the VL of the vmv.v.v
(unless the op is a load, in which case we make sure the new VL is less than
or equal to the original, so we don't end up loading more elements than
before).
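The load restriction can be sketched as follows (illustrative only; the function name and shape are assumptions, not the patch's actual code): for a load we only ever shrink VL, since growing it would read memory the original program never touched.

```python
def folded_vl(producer_vl, vmv_vl, producer_is_load):
    """Pick the VL to use on the producer after folding the vmv.v.v,
    or None if the fold must be abandoned."""
    if producer_is_load and vmv_vl > producer_vl:
        # A larger VL would load elements the original code didn't;
        # that could fault or read past a valid buffer, so give up.
        return None
    # Otherwise the vmv.v.v's VL bounds everything observable.
    return vmv_vl

assert folded_vl(4, 2, producer_is_load=True) == 2
assert folded_vl(2, 4, producer_is_load=True) is None
assert folded_vl(2, 4, producer_is_load=False) == 4
```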
The actual optimisation shares largely the same structure as
performCombineVMergeAndVOps: I've just duplicated the code for now, but if
people want I could try to abstract some of the shared bits away.
Why aren't we using RISCVII::hasVLOp if we want to know whether there is a VL
operand?