We already have patterns for matching fadd(select(..., -0.0)),
but an upcoming patch will lead to patterns using +0.0 as the
identity instead of -0.0. I'm adding support for these patterns
now to avoid any regressions for MVE.
Details
Diff Detail
Event Timeline
If this is inverting the transform from D126774, do we need 'nsz' to avoid miscompiling -0.0?
https://alive2.llvm.org/ce/z/Z6sngi
I'm assuming that for MVE an fadd of +0 for a given lane is equivalent to simply not performing the fadd for that lane (i.e. by using predication). This is the same assumption that we were making for -0 case, i.e. that adding either -0 or +0 leaves the lane unchanged. Perhaps @dmgreen can confirm this?
I'm assuming that for MVE an fadd of +0 for a given lane is equivalent to simply not performing the fadd for that lane (i.e. by using predication).
Yes, but I'm not sure how that matters. fadd -0.0, 0.0 should produce 0.0. The same should be true if the first argument is in a variable holding -0.0, and the second is from a select that could be 0.0 - it should give the output 0.0, not of the variable.
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
16709 | Can you update this comment. The Lambda name is no longer correct too. | |
16715 | ImmVal == 0 would show the intent a little better. | |
llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll | ||
375 | Can you add nsz variants of the tests and make sure it is only those that get folded. |
llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll | ||
---|---|---|
375 | I was a bit confused by your comment starting Yes, but I'm not sure how that matters. Do you agree with @spatel that we should only apply the fold for '+0.0' if the nsz flag is set on the select? Hopefully the FMF flags on the IR select have been propagated to the VSELECT DAG node. |
Oh, sorry - I did misread your comment. I should have said No, that's not how floating point fadd works :)
fadd -0.0, 0.0 is 0.0. The identity value for a fadd is -0.0. A vaddt.f32 Qd, Qn, Qm with a false lane predicate will take the original value of Qd unchanged for that lane (so if Qd==Qn, the original value from the input will be used, without any addition).
Luckily though, the transform you are altering is in terms of VSELECT and FADD nodes, which have the standard definitions, and you can think about without considering MVE specifics. The transform just needs to be valid in general.
Old: fadd (select c, y, 0), x true -> fadd y, x false -> fadd 0, x New: select c, (fadd x, y), x true -> fadd x, y false -> x
Which is only valid if fadd 0, x is x, which needs nsz.
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
16729 | This is not strictly correct. If the 'fadd' doesn't have 'nsz', then the transform results in an 'nsz' value where the original sequence did not. The transform is safe as long as the 'fadd' has 'nsz', so that's the flag we should use to enable the transform. We can propagate the fadd's 'nsz' to the new 'select' even if the original 'select' was not 'nsz' itself. |
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
16729 | Doesn't this contradict https://reviews.llvm.org/D126774 then? In my original instcombine patch I think it was suggested that we should add nsz to the select instruction being generated by the vectoriser, whereas from this comment it sounds like we don't need to and can just rely upon the nsz flag on the IR fadd? |
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
16729 | No, the transforms are not strictly reversible/bidirectional: In both cases, I think we need 'nsz' on the final value in the pattern to enable the transform safely. The semantics are fuzzy and having different flags on values within a function seems highly unlikely, but we should probably add more tests to cover those possibilities. |
- Use the flags for the fadd to determine the safety of the transformation and apply the NSZ flag to the select if needed.
- Tweaked the tests to demonstrate the transform succeeds when the nsz flag is only on the fadd.
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
16729 | OK thanks for explaining and looking into this in detail! I guess differences in flags are much more likely to occur with inlining, especially as part of LTO where different translation units may be compiled differently. |
Thanks for putting this patch together. That's a great help and the regressions were pretty large otherwise! I am still getting some odd results from the change, but I think that is just folding the select 0 into a masked load, which is OK.
Other than using FaddFlags instead of SelFlags, this LGTM.
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
16731 | I don't think it is valid to transfer the flags to the select, unfortunately. nsz may be valid, but not the existing flags. It should be valid to transfer the flags from the fadd, as that is the end result of the old pattern. https://alive2.llvm.org/ce/z/PzGZSw |
Can you update this comment. The Lambda name is no longer correct too.