This contains the RFC for the scalable vector type alone, without the intrinsics.
This document will not be committed to the codebase, this is only for review.
I will post additions/changes to the RFC here.
This is looking good to me, thanks!
A few comments inline.
Also, can you move this to docs/Proposals, together with the other RFCs, please?
Precisely. This is an optimisation problem, not an IR representation one.
It'd be good to have a small paragraph on why not for both restrictions.
It may not be that simple.
If the middle-end has assumed scalable vectors exist, then it has also added wrapping code for the predication, runtime check of induction evolution, and potentially have used intrinsics to represent concepts that cannot be done in plain IR.
If that's the case, then dropping the scalable would make a lot of other things to fail, potentially not in the legalisation phase, but further down the line, which will be much harder to debug.
I think a safer assumption is: if the target doesn't support scalable vectors, then scalable vectors are not legal and need to fail.
This seems like yet another step in the right direction. Of course I may be biased as I've already been happy with previous iterations.
Handling those in the frontend instead sounds like a good tradeoff for how much this simplifies the IR and the size queries in particular, kudos!
I don't think having a primitive IR type be not functional at all on the majority of targets will be acceptable. I also don't understand what problems you are seeing: any scalable vector code the middle end generates has to be valid if the processor implementation the code runs on happens to use a multiple (vscale) of 1, right? Dropping scalable in legalization just means specializing for this case (vscale=1) at compile time.
If target-specific intrinsics are used that can't sensibly be implemented on another target (e.g., SVE first-fault loads), then of course instruction selection should fail. Intrinsics like vscale() or stepvector() are trivial to implement on any fixed-width SIMD architecture though.
Since aggregates containing scalable vectors aren't part of this proposal any more, is one of these two integers always zero? If so, how about instead returning one integer and a flag indicating if it needs to be scaled by the runtime multile (similar to VectorType::ElementCount)?
Ah, I didn't spot that. Will do. Does phabricator automatically copy across the inline comments?
Our original approach assumed that scalable vectors being illegal on other targets would be sufficient, but the feedback I got at the dev meeting indicated otherwise -- hence this change.
As Robin says, the IR must be valid for vscale being 1 (as well as any other value), and the proposed intrinsics can easily be lowered to something appropriate for fixed-length vectors.
Yes, you're right -- I was still thinking about cases where we could have mixed comparisons in future if we found a use case, but without aggregates we only need to distinguish between scalable/non-scalable quantities.
This would also have the side effect of simplifying the comparison operations further, in that I'd only have to check that the flags were equal on both sides instead of checking for both sides being 0 for one term before returning the comparison of the other term (and then checking them again in the other order).
My worry is about different support on the different units. Specifically, SVE supports scatter/gather, predication, etc. which NEON doesn't. So, for Arm, generating scalable code and lowering to NEON may not only be terrible for performance, but could end up exposing a host of illegal lowering scenarios.
I'm not saying we should forbid them by definition, but we could make them hard fail now and add them, case by case, when we have analysed and found them to be benign.
I'm also not strongly set on this either, I'm just worried about safety. If everyone is clearly convinced this is safe in all cases, then I drop my argument. :)
Siding with @rengolin here: let the backend crash if it does not support scalable types.
Vectorizers should simply not generate scalable types if they are not supported by the target.
Legalizing scalable to unscaled IR is hard to get right (the legality issues already came up). It's worse if you expect fast SIMD to come out of this.. basically you open up the cost modelling can all over, considering TTI/TLI.. just like in LV/VPlan.
Nobody's expecting good code to come out of writing scalable vector code and compiling it for NEON or SSE or something, just as you can already generate atrocious code by e.g. generating 128 bit masked gathers & scatters while compiling for NEON (those intrinsics are expanded into a sequence of conditional scalar loads/stores if they're not legal). Still, in general it's nice to have all backends support most IR constructs functionally though inefficiently. Among other things, it helps ensure that the semantics are not tied to a specific target (so it can be implemented by other future backends) and allows running test programs to understand those semantics without needing a particular chip or emulator.
This is aspirational rather than a hard rule, there is and will always be a large body of seemingly simple LLVM IR that asserts somewhere in CodeGen, but I see absolutely no reason to rule it out from the get-go in this case. If some specific intrinsic can't be legalized reasonably and correctly, then don't legalize it and ISel crash on that specific operation. But many things can be legalized correctly with little effort (and already are for fixed-width vectors).
Updated based on feedback:
I didn't move it to the proposals directory, since I'm not sure if we would lose the inline comments that way.
Update based on off-list feedback:
That's ok. You can move after approval and before commit.
Those changes look good to me. I think we converged into a good solution.
I'll approve now, but please wait a day or two for the last comments form the other reviewers.
I concede, given that this is just a proposal. It may well be the case that legalising scalable vectors to non-scalable ones will prove hard and we'll end up adding a number of llvm_unreachable cases to protect it. I don't foresee generating scalable vectors by default on any target anyway, so this will always be an edge case at best.