This change adds generic support for scalarizing scalable vector operations. Unlike fixed length vectors, we can't simply unroll a scalable vector as we don't know how long it is at compile time. However, there's nothing that prevents us from emitting the loop directly as we do know the dynamic number of elements.
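The contrast between the two strategies can be sketched in plain C++. This is only a conceptual illustration, not the actual vectorizer code: `getRuntimeVScale`, `scalarizeFixed`, `scalarizeScalable`, and the `+ 1` operation are all placeholders standing in for vscale and the scalarized instruction.

```cpp
#include <cstddef>
#include <vector>

// Placeholder for the hardware's runtime vector-length multiple
// (analogous to vscale in LLVM IR); not known at compile time.
std::size_t getRuntimeVScale() { return 4; }

// A fixed-length vector op can be scalarized by full unrolling,
// because the element count N is a compile-time constant.
template <std::size_t N>
void scalarizeFixed(int (&dst)[N], const int (&src)[N]) {
  for (std::size_t i = 0; i < N; ++i) // conceptually unrolled N times
    dst[i] = src[i] + 1;              // placeholder scalar op
}

// A scalable vector op cannot be unrolled, but the dynamic element
// count (minElts * vscale) is known at run time, so we can emit a
// loop over the elements instead.
std::vector<int> scalarizeScalable(const std::vector<int> &src,
                                   std::size_t minElts) {
  std::size_t n = minElts * getRuntimeVScale(); // dynamic element count
  std::vector<int> dst(n);
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i] + 1; // same placeholder scalar op, looped
  return dst;
}
```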
For testing purposes, I have hooked this up to the uniform memory op path. This is not an optimal lowering for uniform mem ops, but it nicely demonstrates the value of having a fallback scalarization strategy available when smarter, more optimal lowerings haven't been implemented yet.
From here, I plan on doing the following:
- Add the support on the predicated path. This is quite a bit more involved and requires setting up VPBlocks for the CFG.
- Generalize the definition of uniform memory op to allow internal predication. (This fundamentally requires the fully general predicated scalarization fallback, so it makes a good test to make sure we haven't missed anything.)
- Write generic cost modeling for scalable scalarization, and start enabling other paths that we currently bail out of unconditionally.
- Implement a dedicated recipe for the uniform memory op case in the current predication-due-to-tail-folding-only form. The loop form will probably be removed via LICM, but we should really stop relying on pass ordering here.
I think you can avoid the scalarisation for SVE here simply by asking for the scalarisation cost of the instruction, similarly to how it's done elsewhere. For SVE this should return Invalid. Alternatively, you could add a TTI hook to ask whether the target should scalarise or not, i.e.
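A minimal sketch of what such a hook might look like. This is hypothetical and not the actual `TargetTransformInfo` interface: `TargetInfo`, `SVELikeTarget`, and `preferScalarizeScalableOps` are invented names used only to illustrate the shape of the query.

```cpp
// Hypothetical TTI-style interface: the vectorizer would query the
// target before falling back to scalarization of a scalable op.
struct TargetInfo {
  // Returns true if scalarizing scalable vector ops is acceptable;
  // the default target opts in to the generic fallback.
  virtual bool preferScalarizeScalableOps() const { return true; }
  virtual ~TargetInfo() = default;
};

// An SVE-like target opts out, matching the point that scalarising
// scalable vectors has always been considered illegal for SVE
// (equivalently, its scalarisation cost would be Invalid).
struct SVELikeTarget : TargetInfo {
  bool preferScalarizeScalableOps() const override { return false; }
};
```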
We have always considered it 'illegal' to scalarise for SVE.