Correct me if I am wrong, but I think there is minimal
microarchitectural difference between vmv.s.x with vl==1 and
v(f)mv.v.(i|x|f) with vl==avl. So we can splat the scalar to
length VL instead of 1 to reduce vtype context switches.
For VP reductions, we can do this iff we can prove AVL is non-zero.
To avoid increasing register pressure, we only do this when LMUL <= 1.
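
The two conditions above can be summarized as a single predicate. This is
only a minimal sketch: StartValueInfo, its fields and shouldSplatWithVL
are hypothetical names made up for illustration, not LLVM APIs, and LMUL
is assumed to be encoded as LMUL * 8 so that fractional LMULs stay
integral.

```cpp
#include <cstdint>

// Hypothetical summary of the reduction whose start value is being
// materialized. These names are illustrative only, not LLVM APIs.
struct StartValueInfo {
  bool IsVPReduction;   // true for a VP (vector-predicated) reduction
  bool AVLKnownNonZero; // true if AVL has been proven to be non-zero
  uint32_t LMULTimes8;  // assumed encoding: mf8 = 1, mf2 = 4, m1 = 8, m2 = 16
};

// Returns true if the scalar start value may be splatted with length VL
// rather than inserted with vl == 1.
constexpr bool shouldSplatWithVL(const StartValueInfo &Info) {
  return Info.LMULTimes8 <= 8 && // LMUL <= 1: no extra register pressure
         (!Info.IsVPReduction || Info.AVLKnownNonZero);
}

static_assert(shouldSplatWithVL({false, false, 8}),
              "non-VP reduction at m1 qualifies");
static_assert(!shouldSplatWithVL({true, false, 8}),
              "VP reduction with unproven AVL does not qualify");
static_assert(!shouldSplatWithVL({true, true, 16}),
              "LMUL > 1 does not qualify");
```

The non-zero AVL check is presumably needed because a splat executed with
vl == 0 writes no elements, so the start value would never reach the
register.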