This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] relax FMF constraint for FP induction
ClosedPublic

Authored by spatel on Mar 16 2021, 7:16 AM.

Details

Summary

This makes the induction part of the loop vectorizer match the reduction part. We do not need all of the fast-math-flags. For example, there are some that clearly are not in play like arcp or afn.

If we want to make FMF constraints consistent across the IR optimizer, we might want to add nsz too, but that's up for debate (users can't expect associative FP math and preservation of sign-of-zero at the same time?).

I fixed the calling code to avoid miscompiles with 1bee549737ac and don't know of any more places in LV that need to be patched.

Diff Detail

Event Timeline

spatel created this revision.Mar 16 2021, 7:16 AM
spatel requested review of this revision.Mar 16 2021, 7:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2021, 7:16 AM
This revision is now accepted and ready to land.Mar 17 2021, 2:28 AM

If we want to make FMF constraints consistent across the IR optimizer, we might want to add nsz too, but that's up for debate (users can't expect associative FP math and preservation of sign-of-zero at the same time?).

What's the logic here? Can we assume that reassoc implies nsz somehow? (It does sound odd/rare to have reassoc without nsz).

We may need to end up adding 0's for predicated reductions, at least in certain modes. We may want to change the identity value for fadd to a -0.0 for in-order reductions anyway though.

Hi @dmgreen, yes of course you're right. I'd forgotten about the nsz requirement. It's definitely needed at compile time for vectorising FP reduction loops, i.e. clang -freassociative-math -fno-trapping-math -fno-signed-zeroes. I guess adding a check for nsz here is consistent with that?

Hi @dmgreen, yes of course you're right. I'd forgotten about the nsz requirement. It's definitely needed at compile time for vectorising FP reduction loops, i.e. clang -freassociative-math -fno-trapping-math -fno-signed-zeroes. I guess adding a check for nsz here is consistent with that?

Yes - clang derived its requirements from gcc, so we've passed that into the optimizer in some places (instcombine at least). I don't know of any practical examples where you could have FP reassociation and still guarantee sign-of-zero, but maybe I'm not being imaginative. :)
So currently there's no easy way (starting from C/C++ at least) to have IR that has reassoc without nsz.

Ok if I push this change, so we're consistent within the vectorizer? Then, I'll push a follow-up (we'll need a pile of new regression tests) to add the nsz requirement for both induction and reduction. That way, we'll be conservatively correct in requiring the extra flag, and we'll match the expected IR coming out of clang.

Note that the FMF requirements for fmul/fadd reduction/induction are different than the fmin/fmax patterns that we've also recently updated; fmin/fmax require nnan and nsz to rearrange, but not reassoc (since there's no FP math involved in those ops).

david-arm added a comment.EditedMar 17 2021, 5:47 AM

Yeah I'm fine with that if @dmgreen is happy? It makes sense to be consistent with the RecurrenceDescriptor code. I think from what I understand clang will only generate IR that contains the reassoc flag if we've set all the appropriate frontend flags. Therefore, currently the only ambiguity at the moment is when hand-writing IR and using the reassoc flag without nsz, right?

dmgreen accepted this revision.EditedMar 17 2021, 6:08 AM

Hi @dmgreen, yes of course you're right. I'd forgotten about the nsz requirement. It's definitely needed at compile time for vectorising FP reduction loops, i.e. clang -freassociative-math -fno-trapping-math -fno-signed-zeroes. I guess adding a check for nsz here is consistent with that?

Yes - clang derived its requirements from gcc, so we've passed that into the optimizer in some places (instcombine at least). I don't know of any practical examples where you could have FP reassociation and still guarantee sign-of-zero, but maybe I'm not being imaginative. :)
So currently there's no easy way (starting from C/C++ at least) to have IR that has reassoc without nsz.

Ok if I push this change, so we're consistent within the vectorizer? Then, I'll push a follow-up (we'll need a pile of new regression tests) to add the nsz requirement for both induction and reduction. That way, we'll be conservatively correct in requiring the extra flag, and we'll match the expected IR coming out of clang.

Note that the FMF requirements for fmul/fadd reduction/induction are different than the fmin/fmax patterns that we've also recently updated; fmin/fmax require nnan and nsz to rearrange, but not reassoc (since there's no FP math involved in those ops).

Yeah that sounds fine to me.

I was thinking more in terms of Alive - how reassoc might imply nsz or not, mathematically. It seems it would not general, but may be mistaken.

I'm not sure where nsz for a vanilla fadd reduction would be needed though. Reassoc might well be enough. For predication + inloop reductions (which no-one uses yet for floats) it would be important if the identity value was 0.0 and we generated:

%x = select %cond, %load, {0.0, 0.0,...}
%y = vecreduce.fadd(%x)
%z = fadd %y, %phi

But if we change that to -0.0 instead, would there be another need of nsz?

Yeah I'm fine with that if @dmgreen is happy? It makes sense to be consistent with the RecurrenceDescriptor code. I think from what I understand clang will only generate IR that contains the reassoc flag if we've set all the appropriate frontend flags. Therefore, currently the only ambiguity at the moment is when hand-writing IR and using the reassoc flag without nsz, right?

That's correct. If we're looking at IR coming from clang, it will always have both flags at least. AFAICT, we don't apply any fast-math-flags in the front-end if someone uses "-fassociative-math" alone.

I was thinking more in terms of Alive - how reassoc might imply nsz or not, mathematically. It seems it would not general, but may be mistaken.

I'm not sure where nsz for a vanilla fadd reduction would be needed though. Reassoc might well be enough. For predication + inloop reductions (which no-one uses yet for floats) it would be important if the identity value was 0.0 and we generated:

%x = select %cond, %load, {0.0, 0.0,...}
%y = vecreduce.fadd(%x)
%z = fadd %x, %phi

But if we change that to -0.0 instead, would there be another need of nsz?

I can't think of any other. If we go back to patches like D47335, we've been conservatively requiring nsz (because gcc did it + nobody has proven otherwise).

That patch included this test:

; ( A + C1 ) + ( B + -C1 )
; Verify this folds to 'A + B' with 'reassoc' and 'nsz' ('nsz' is required)

But I'm not seeing why nsz was needed there...

So I'm fine either way - we can leave this as-is, or follow-up by requiring nsz to be safer.

This revision was landed with ongoing or failed builds.Mar 18 2021, 5:11 AM
This revision was automatically updated to reflect the committed changes.