This is an archive of the discontinued LLVM Phabricator instance.

[MVE] Fold fadd(select(..., +0.0)) into a predicated fadd
ClosedPublic

Authored by david-arm on Jun 8 2022, 1:33 AM.

Details

Summary

We already have patterns for matching fadd(select(..., -0.0)),
but an upcoming patch will lead to patterns using +0.0 as the
identity instead of -0.0. I'm adding support for these patterns
now to avoid any regressions for MVE.

Diff Detail

Event Timeline

david-arm created this revision.Jun 8 2022, 1:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2022, 1:33 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
david-arm requested review of this revision.Jun 8 2022, 1:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2022, 1:33 AM
spatel added a subscriber: spatel.EditedJun 8 2022, 5:06 AM

If this is inverting the transform from D126774, do we need 'nsz' to avoid miscompiling -0.0?
https://alive2.llvm.org/ce/z/Z6sngi

If this is inverting the transform from D126774, do we need 'nsz' to avoid miscompiling -0.0?
https://alive2.llvm.org/ce/z/Z6sngi

I'm assuming that for MVE an fadd of +0 for a given lane is equivalent to simply not performing the fadd for that lane (i.e. by using predication). This is the same assumption that we were making for -0 case, i.e. that adding either -0 or +0 leaves the lane unchanged. Perhaps @dmgreen can confirm this?

But if necessary I can certainly add an additional check for nsz.

I'm assuming that for MVE an fadd of +0 for a given lane is equivalent to simply not performing the fadd for that lane (i.e. by using predication).

Yes, but I'm not sure how that matters. fadd -0.0, 0.0 should produce 0.0. The same should be true if the first argument is in a variable holding -0.0, and the second is from a select that could be 0.0 - it should give the output 0.0, not of the variable.

llvm/lib/Target/ARM/ARMISelLowering.cpp
16709

Can you update this comment. The Lambda name is no longer correct too.

16715

ImmVal == 0 would show the intent a little better.

llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll
375

Can you add nsz variants of the tests and make sure it is only those that get folded.

david-arm added inline comments.Jun 8 2022, 5:41 AM
llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll
375

I was a bit confused by your comment starting Yes, but I'm not sure how that matters. Do you agree with @spatel that we should only apply the fold for '+0.0' if the nsz flag is set on the select? Hopefully the FMF flags on the IR select have been propagated to the VSELECT DAG node.

Oh, sorry - I did misread your comment. I should have said No, that's not how floating point fadd works :)
fadd -0.0, 0.0 is 0.0. The identity value for a fadd is -0.0. A vaddt.f32 Qd, Qn, Qm with a false lane predicate will take the original value of Qd unchanged for that lane (so if Qd==Qn, the original value from the input will be used, without any addition).

Luckily though, the transform you are altering is in terms of VSELECT and FADD nodes, which have the standard definitions, and you can think about without considering MVE specifics. The transform just needs to be valid in general.

Old: fadd (select c, y, 0), x
true -> fadd y, x
false -> fadd 0, x

New: select c, (fadd x, y), x
true -> fadd x, y
false -> x

Which is only valid if fadd 0, x is x, which needs nsz.

david-arm updated this revision to Diff 435156.Jun 8 2022, 7:39 AM
  • Added checks for nsz flag on the select when the identity is +0.0
david-arm marked 3 inline comments as done.Jun 8 2022, 7:39 AM

Thanks for the explanation @dmgreen!

spatel added inline comments.Jun 8 2022, 8:53 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
16729

This is not strictly correct. If the 'fadd' doesn't have 'nsz', then the transform results in an 'nsz' value where the original sequence did not.

The transform is safe as long as the 'fadd' has 'nsz', so that's the flag we should use to enable the transform.

We can propagate the fadd's 'nsz' to the new 'select' even if the original 'select' was not 'nsz' itself.
It's confusing, but you can run examples with Alive2 to see what's valid:
https://alive2.llvm.org/ce/z/Z-cuhQ

david-arm added inline comments.Jun 8 2022, 8:59 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
16729

Doesn't this contradict https://reviews.llvm.org/D126774 then? In my original instcombine patch I think it was suggested that we should add nsz to the select instruction being generated by the vectoriser, whereas from this comment it sounds like we don't need to and can just rely upon the nsz flag on the IR fadd?

spatel added inline comments.Jun 8 2022, 9:11 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
16729

No, the transforms are not strictly reversible/bidirectional:
https://alive2.llvm.org/ce/z/G42d8e

In both cases, I think we need 'nsz' on the final value in the pattern to enable the transform safely.
It is possible that we've created some unintended FMF propagation though when the flags are different on each value.

The semantics are fuzzy and having different flags on values within a function seems highly unlikely, but we should probably add more tests to cover those possibilities.

david-arm updated this revision to Diff 435449.Jun 9 2022, 1:33 AM
  • Use the flags for the fadd to determine the safety of the transformation and apply the NSZ flag to the select if needed.
  • Tweaked the tests to demonstrate the transform succeeds when the nsz flag is only on the fadd.
david-arm marked 2 inline comments as done.Jun 9 2022, 1:34 AM
david-arm added inline comments.
llvm/lib/Target/ARM/ARMISelLowering.cpp
16729

OK thanks for explaining and looking into this in detail! I guess differences in flags are much more likely to occur with inlining, especially as part of LTO where different translation units may be compiled differently.

dmgreen accepted this revision.Jun 10 2022, 1:14 AM

Thanks for putting this patch together. That's a great help and the regressions were pretty large otherwise! I am still getting some odd results from the change, but I think that is just folding the select 0 into a masked load, which is OK.

Other than using FaddFlags instead of SelFlags, this LGTM.

llvm/lib/Target/ARM/ARMISelLowering.cpp
16731

I don't think it is valid to transfer the flags to the select, unfortunately. nsz may be valid, but not the existing flags.
https://alive2.llvm.org/ce/z/9u943A

It should be valid to transfer the flags from the fadd, as that is the end result of the old pattern. https://alive2.llvm.org/ce/z/PzGZSw

This revision is now accepted and ready to land.Jun 10 2022, 1:14 AM
This revision was landed with ongoing or failed builds.Jun 10 2022, 3:10 AM
This revision was automatically updated to reflect the committed changes.
david-arm marked an inline comment as done.
david-arm marked an inline comment as done.Jun 10 2022, 3:11 AM
david-arm added inline comments.
llvm/lib/Target/ARM/ARMISelLowering.cpp
16731

Thanks for the LGTM @dmgreen! I addressed this comment before merging.