We had previously been going through the stack.
A couple small notes:
- We have the vslide1down idiom in a few other places. As a post patch, I plan to try to common the code a bit.
- VF=2 case is still going through the splat + insert path. Picking the optimal sequence for this seems to be a bit fiddly (due to constant mat costs), so I restricted this to cases which would have previously hit the stack.
- I'm only handling integer vectors for the moment. Mostly because I don't see the existing vfslide1down ISD nodes being in place. This will be an obvious followup.
- One of the test diffs does expose a missing combine - a build_vector with a prefix coming from a vector extract sequence. The code after this is arguably worse (due to domain crossing vs stack store), but I think this is a narrow enough case to be non-blocking for now. Let me know if you disagree.
I might go investigate if we can avoid scalarizing this.