Iterate through the loop and check that the observable values produced are the same whether tail predication happens or not.
We want to find out whether the tail-predicated version of this loop will produce the same values as the loop in its original form. For this to be true, the newly inserted implicit predication must not change the results. All MVE loads and stores have to be predicated, so we know that any load operands or stored results are already equivalent.
Other explicitly predicated instructions will perform the same operation in the original loop and in the tail-predicated form. We call predicated instructions 'Known', and their users are also Known if they overwrite a Known operand. This is because tail predication causes the false lanes to not be updated, but we know their output because we know their input.
For any 'Unknown' instructions, we check that they are only consumed by Known instructions, because that means the unknown false lane(s) are replaced by known lane(s).
The result should be that all instructions depend upon predicated inputs. It also means that, for an iteration of the loop, a register should hold the same value whether or not tail predication has happened, except when the differences are masked away by their user(s) and are not observable elsewhere.
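To make the Known/Unknown reasoning above concrete, here is a minimal, self-contained sketch of the classification in plain C++. The `Inst` struct, the `Known` set and the helper name `unknownLanesAreMasked` are illustrative inventions for this summary, not the types or names used by the patch, which works on MachineInstrs and reaching-definition analysis.

```cpp
// Toy model of the check: seed the Known set with explicitly predicated
// instructions, propagate through instructions that overwrite a Known value,
// then require every remaining Unknown result to be consumed only by Known
// instructions so its false lanes are never observable.
#include <cstdio>
#include <set>
#include <vector>

struct Inst {
  bool Predicated = false;     // carries an explicit VPR predicate
  Inst *TiedOperand = nullptr; // value this instruction overwrites, if any
  std::vector<Inst *> Uses;    // in-loop consumers of this result
};

static bool unknownLanesAreMasked(const std::vector<Inst *> &Body) {
  std::set<Inst *> Known; // false lanes hold a value we can reason about

  // Fixed-point propagation of the 'Known' property.
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (Inst *I : Body) {
      if (Known.count(I))
        continue;
      if (I->Predicated || (I->TiedOperand && Known.count(I->TiedOperand))) {
        Known.insert(I);
        Changed = true;
      }
    }
  }

  // Every Unknown instruction must only be consumed by Known instructions;
  // a result with no in-loop users is conservatively treated as observable.
  for (Inst *I : Body) {
    if (Known.count(I))
      continue;
    if (I->Uses.empty())
      return false;
    for (Inst *U : I->Uses)
      if (!Known.count(U))
        return false;
  }
  return true;
}

int main() {
  Inst Load{/*Predicated=*/true};
  Inst Add;                        // unpredicated vector arithmetic
  Inst Store{/*Predicated=*/true};
  Load.Uses = {&Add};
  Add.Uses = {&Store}; // only consumed by a predicated store, so it's fine
  std::vector<Inst *> Body = {&Load, &Add, &Store};
  std::printf("tail predication %s\n",
              unknownLanesAreMasked(Body) ? "looks safe" : "is not safe");
  return 0;
}
```

In the real pass the same idea is applied to MachineInstrs, with the VPR predicate operands deciding which instructions seed the Known set.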
Event Timeline
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
| Line | Comment |
|---|---|
| 555 | Would it be good to spell out here that we have these 2 use cases? Loads/stores produced by the vectoriser (when it thinks tail-folding is good) will all be nicely masked, so this concerns the 2nd use case. Is that right? |
| 560 | nit: analyse -> analysis? |
| 574 | I appreciate this is bikeshedding, but here we go anyway: UnknownFalseLanes and KnownFalseZeros are a bit difficult to read (in the code below), but your descriptions/comments above are clear. Summarising: we have a set of instructions whose behaviour we (immediately) understand, and a set of instructions that need further analysis. |
| 589 | Can you use one of your new RDA helper functions for this? |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
| Line | Comment |
|---|---|
| 555 | This pass shouldn't be concerned with whether the input is from the vectorizer or the intrinsics, as we can't know. On line 762 we enforce that all MVE memory operations are explicitly predicated. |
| 574 | I did have Predicated at some point, but I thought that was misleading because some instructions won't be using the VPR. I like KnownFalseZeros because it's explicit about what we expect, and I'd much rather have code that doesn't require a big comment to make sense of what is going on. It doesn't have to be that name, but it should convey what's important about the set. The naming of UnknownFalseLanes is definitely less important, though. |
| 589 | Those were just static helpers, and I don't think we'll do enough of this logic to warrant more of them. |
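For context on the names being discussed, the two sets would look something like this (a sketch of the shape only; the element count, scope and exact declarations in the patch may differ):

```cpp
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Instructions whose false lanes we can reason about once the loop becomes
// implicitly predicated: explicitly predicated MVE instructions, plus
// anything that overwrites one of their results.
SmallPtrSet<MachineInstr *, 4> KnownFalseZeros;

// Instructions whose false lanes are unknown; each of these must only be
// consumed by a member of KnownFalseZeros for tail predication to be valid.
SmallPtrSet<MachineInstr *, 4> UnknownFalseLanes;
```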
I was just wondering about more corner cases for "negative" tests: do we have a test where lanes are swapped (if that makes sense)?
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
| Line | Comment |
|---|---|
| 582 | Is there a test for this? |

llvm/test/CodeGen/Thumb2/LowOverheadLoops/unpredicated-min.mir
| Line | Comment |
|---|---|
| 2 | nit: rename file unpredicated-min.mir -> unpredicated-max.mir? |
> do we have a test where lanes are swapped (if that makes sense)?
I don't think there's one for this pass... but that was the reason for having a unit test for validForTailPredication, so we don't have to write a whole loop to test an instruction.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
| Line | Comment |
|---|---|
| 582 | The only instructions that I can think of would be indexed loads/stores, but they'll be caught by isVectorPredicated. If/when we can enable reductions, we could have an instruction producing two scalar values, and then this logic will need to be reorganised a bit. |
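For reference, the check referred to here could be along these lines. This is a sketch only: it assumes the helper is a static function in the pass, that findFirstVPTPredOperandIdx from ARMBaseInstrInfo.h is used to locate the VPT predicate, and that the VPR register operand immediately follows it; any of these may differ from the actual patch.

```cpp
// Sketch of the kind of check meant by "caught by isVectorPredicated":
// an MVE instruction counts as vector-predicated when its VPT predicate
// operand is backed by a real VPR register rather than NoRegister.
static bool isVectorPredicated(MachineInstr *MI) {
  int PIdx = llvm::findFirstVPTPredOperandIdx(*MI);
  return PIdx != -1 && MI->getOperand(PIdx + 1).getReg() == ARM::VPR;
}
```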
LGTM, with one nit inline.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
| Line | Comment |
|---|---|
| 582 | okidoki, then probably just remove this, or replace it with an assert? |