Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd.
Hmm. Yeah I was thinking it would be OK to always generate -0.0. Does that start creating mixed vectors though?
We could do that - I guess we might see quite a few test failures though. Is there potential for less optimal code because the FP vector can no longer be a zeroinitializer?
Changed getRecurrenceIdentity to always generate -0.0 for FAdd.
@david-arm - I also wondered if we should try to use zeroinitializer where possible, though when I changed this to always return -0.0 for FAdd, the only tests affected were in test/Transforms/LoopVectorize.
I think the original version was probably better, only doing this without nsz. A vector like <0.0, -0.0, -0.0, -0.0> is going to be more difficult to materialize than one that is all zeros, with no obvious way of converting it to all zeros. And we should try not to pessimize the existing -Ofast cases.
Sorry, that was what I was trying to say above; it wasn't very clear.
If you change this back to the original, then it LGTM.
Do we know why we're seeing the mixed zeros in the tests? Is it an artifact of the test (because the test includes a +0.0 to start with) that we don't see in practice?
Note that phi instructions can have FMF (D67564), so if we assume that was the right direction and we are losing the flags somewhere in the opt pipeline, we should try to fix that.
It would presumably need to be SROA that added flags to the phis it created? I'm not sure where it would get that info from, though.
> Do we know why we're seeing the mixed zeros in the tests? Is it an artifact of the test (because the test includes a +0.0 to start with) that we don't see in practice?
I think most reductions will start out as
`float s = 0; for (int i = 0; i < n; i++) s += x[i];`
I feel it would be unusual for a user to deliberately use -0.0 as the start value! 0 is always going to be the most common choice.
I'm not sure either. We might need to apply FMF to load, stores, and function args to fill the gap. Another option might be to back-propagate the FMF from the fadd to its phi operand:
%s.0 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
%add = fadd fast float %s.0, %0
Not sure if there's some corner-case I'm overlooking, but I'm imagining something like what instcombine does to fill-in / restore no-wrap flags (and so the FMF setting could also happen in instcombine).
> Do we know why we're seeing the mixed zeros in the tests? Is it an artifact of the test (because the test includes a +0.0 to start with) that we don't see in practice?
>
> I think most reductions will start out as
>
> `float s = 0; for (int i = 0; i < n; i++) s += x[i];`
>
> I feel it would be unusual for a user to deliberately use -0.0 as the start value! 0 is always going to be the most common choice.
Yes, I agree - hardly anyone considers the subtlety of -0.0 in FP math, so it's almost never specified in source.
That sounds great. It sounds like it might be useful beyond just these reductions, so long as we can say that flags propagate like that, or come up with some rules for when they do.
What do you folks think for the context of this review? Is it best to have a go at adding that into instcombine, plus a fold to make a phi with nsz and a -0.0 incoming value use 0.0 instead? Or should we go with the initial version of this patch, which just plumbs the nsz flag through and emits 0.0 for the nsz case?
Hi @dmgreen and @spatel, I've been trying to follow the discussion, but I'm not entirely sure what you're proposing @kmclaughlin should do here. Are you suggesting that Kerry could change this patch to:
- Change InstCombine in two places, so that a) prior to vectorisation the PHI inherits the FMF from the instruction it feeds into, and then b) a second InstCombine fold changes all 0.0 values to -0.0, including for any possible vector PHIs? Obviously we have to remember that InstCombine runs after the vectoriser too, so it will also see vector PHIs.
- Always set the default identity to be <-0.0, -0.0> regardless of the nsz flag?
I guess this might take some time to implement and test - I imagine issues/corner cases might come up, such as: what if the chain of FP operations includes an fadd, etc. with FMF different from the rest? Does that mean we cannot propagate FMF to the PHI?
The other possibility is to revert the patch back to the original so we at least get some sensible behaviour for now, whilst working on a second patch to investigate these new proposals?
Ah sorry, I realise now you meant change PHIs (vector or scalar) with nsz and -0.0 to use +0.0 instead!
Bending FMF propagation to match our goals is a multi-step process and will need more discussion to determine what is right/expected. So I am ok to go back to the initial version of the patch that checked nsz, but please put a FIXME comment there to state the ideal behavior and the mixed-zero problem that we're avoiding.
@dmgreen, @spatel, @david-arm Thank you for all the comments and discussion on this patch; I've summarised the changes required before we can always return -0.0 in the FIXME added to getRecurrenceIdentity(), and reverted back to the original version.