This patch adds a DAG combine that replaces a vmv.s.x inserting into a splat vector with a vslide1up. This relies on the fact that we can shift a splat up by one element without changing any of the active lanes, and that vslide1up has separate source and destination vector registers. As a result, vslide1up can be tail agnostic whereas vmv.s.x has to be tail undisturbed, which in turn avoids the need for a vsetvli toggle.
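To illustrate why the two forms agree on the active lanes, here is a minimal lane-level sketch in Python. It is only a model of the instruction semantics as I understand them from the vector spec, not the combine itself: the function names, the `TAIL` sentinel, and the choice of VLMAX=8 with VL=5 are all assumptions made for illustration.

```python
# Hypothetical lane-level model; registers are plain Python lists of VLMAX elements.
TAIL = "agnostic"  # sentinel for a tail-agnostic lane whose value must not be relied on


def vmv_s_x(dest, x, vl):
    """Model of vmv.s.x with a tail-undisturbed policy.

    Only element 0 is written (when vl > 0); every other element is treated
    as tail, so it has to stay undisturbed for the splat to survive.
    """
    out = list(dest)
    if vl > 0:
        out[0] = x
    return out


def vslide1up(src, x, vl):
    """Model of vslide1up.vx with a tail-agnostic policy.

    Element 0 receives the scalar, element i receives src[i - 1] for
    0 < i < vl, and lanes at or past vl are marked as agnostic.
    The destination is a fresh list, mirroring the separate (non-overlapping)
    source and destination registers.
    """
    out = [TAIL] * len(src)
    for i in range(vl):
        out[i] = x if i == 0 else src[i - 1]
    return out


def active(reg, vl):
    """Return only the lanes below vl, i.e. the ones a consumer may read."""
    return reg[:vl]


VLMAX, VL = 8, 5
splat = [7] * VLMAX

# vmv.s.x runs with VL=1 here, so lanes 1..VL-1 depend on tail undisturbed.
via_vmv = vmv_s_x(splat, 3, vl=1)
# vslide1up runs at the splat's own VL, so every lane we care about is active.
via_slide = vslide1up(splat, 3, vl=VL)

print(active(via_vmv, VL) == active(via_slide, VL))
```

Because every source lane of the splat holds the same value, sliding it up by one position leaves lanes 1 through VL-1 unchanged, and only lane 0 picks up the new scalar.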
One downside to this conversion is that vslide1up has a restriction that the source and destination vector registers can't overlap. This increases register pressure locally and, particularly at very high LMUL, could force an additional spill for a value live over the vslide1up. I think this is net worthwhile, but I'm curious what others think.
There are several TODOs noted in the patch. I plan on implementing the vmv.s.f and narrower element type cases in a follow-up patch. I don't plan to bother with the wider VL one.
Noticed when glancing through other code that I hadn't handled the vmv.v.i case here. Consider that added to the TODO list above.