This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Enable float minmax reductions via instruction flag
Needs Review · Public

Authored by nlw0 on May 10 2019, 1:04 PM.

Details

Summary

Reduction loops using the min function are vectorized today for integers and floats, but in the float case the function attribute "no-nans-fp-math" is required. This patch makes float vectorization not strictly dependent on that attribute, additionally enabling it when the fcmp carries the nnan fast-math flag.

This issue is alluded to in bug report #35538 (https://bugs.llvm.org/show_bug.cgi?id=35538#c1).

Diff Detail

Event Timeline

nlw0 created this revision. May 10 2019, 1:04 PM
Herald added a project: Restricted Project. May 10 2019, 1:04 PM

[adding potential reviewers who are likely more familiar with the vectorizers than me]

  1. I checked in the test file with baseline (current) checks: rL360542. Assuming we can fix the motivating test with this patch, you should be able to run 'utils/update_test_checks.py' to automatically update that test file.
  2. Did you run 'make check' after applying your patch? I might not be seeing it correctly, but I think existing regression tests would change with this patch.
simoll added a subscriber: simoll. May 13 2019, 12:26 AM
nlw0 added a comment (edited). May 13 2019, 1:24 PM

I had not run make check, sorry, I'm still a bit lost. There are a number of broken tests. Is the idea that these might all be benign changes due to the vectorization not kicking in before, and we might just need to update what the generated code now looks like? And would the test file you already included then just be another test in that state? How do I update this PR now? Just apply the patch over current master and upload a new patch file?

> I had not run make check, sorry, I'm still a bit lost. There are a number of broken tests. Is the idea that these might all be benign changes due to the vectorization not kicking in before, and we might just need to update what the generated code now looks like? And would the test file you already included then just be another test in that state? How do I update this PR now? Just apply the patch over current master and upload a new patch file?

The problem is that the patch does not check that 'I' is an FPMathOperator before asking if 'I->hasNoNaNs()', so we are crashing (asserting) on a bunch of existing tests.

The logic should look like this to work with the current state of FMF:

diff --git a/llvm/lib/Analysis/IVDescriptors.cpp b/llvm/lib/Analysis/IVDescriptors.cpp
index 19f1a771b85..b6de4ace07c 100644
--- a/llvm/lib/Analysis/IVDescriptors.cpp
+++ b/llvm/lib/Analysis/IVDescriptors.cpp
@@ -586,8 +586,9 @@ RecurrenceDescriptor::isRecurrenceInstr(Instruction *I, RecurrenceKind Kind,
     LLVM_FALLTHROUGH;
   case Instruction::FCmp:
   case Instruction::ICmp:
-    if (Kind != RK_IntegerMinMax &&
-        (!HasFunNoNaNAttr || Kind != RK_FloatMinMax))
+    bool MightHaveNans = !HasFunNoNaNAttr &&
+                         I->getOpcode() == Instruction::FCmp && !I->hasNoNaNs();
+    if (Kind != RK_IntegerMinMax && (Kind != RK_FloatMinMax || MightHaveNans))
       return InstDesc(false, I);
     return isMinMaxSelectCmpPattern(I, Prev);
   }

Unfortunately, this patch is exposing existing problems with our instruction-level fast-math-flags. See:
https://bugs.llvm.org/show_bug.cgi?id=38086
https://bugs.llvm.org/show_bug.cgi?id=39535
https://bugs.llvm.org/show_bug.cgi?id=23116
D51701
https://bugs.llvm.org/show_bug.cgi?id=35538#c2

So the vectorization code already has the potential to miscompile because we can't just check for no-NaNs (nnan). We also need to check for no-signed-zeros (nsz). But checking for nsz should happen on the 'select' rather than the 'fcmp'. I think we should fix the fundamental problems in the IR itself instead of trying to hack around them any more.

nlw0 added a comment. May 14 2019, 11:14 AM

My main motivation is just enabling vectorization for "minimum" loops without requiring function attributes. I think having a focus might help ensure I can contribute something, and a small patch would be nice too. I'll gladly work on a larger issue if it's within my reach, though, or if it turns out to be required.

Where can I find more information about the expected behavior of fcmp? I don't understand why signed zeros should be relevant; the resulting behavior should just be whatever fcmp produces, whether it ignores them or not. And ignoring them should be the default anyway, from what I recall of IEEE-754. Anyway, isn't this test here just about the fact that NaNs violate associativity?

Why would select take a flag? Isn't the whole semantics encoded in the boolean returned from fcmp?

> My main motivation is just enabling vectorization for "minimum" loops without requiring function attributes. I think having a focus might help ensure I can contribute something, and a small patch would be nice too. I'll gladly work on a larger issue if it's within my reach, though, or if it turns out to be required.

No problem - I've been putting the underlying changes off for a while, so I'll try to make some progress now and cc you on those patches.

> Where can I find more information about the expected behavior of fcmp? I don't understand why signed zeros should be relevant; the resulting behavior should just be whatever fcmp produces, whether it ignores them or not. And ignoring them should be the default anyway, from what I recall of IEEE-754. Anyway, isn't this test here just about the fact that NaNs violate associativity?

Yes, you're correct that fcmp should ignore signed zeros per IEEE-754, and that's part of the problem: currently, we use 'nsz' on an fcmp to predicate at least one transform (that's PR38086).
That is relevant to this patch because FP min/max via select also violates associativity in the presence of signed zeros, but we're not accounting for that currently. We could ignore that as an existing bug and move ahead with this patch if it's urgent, but I'd rather try to fix this properly.

> Why would select take a flag? Isn't the whole semantics encoded in the boolean returned from fcmp?

I think it would be better to discuss more general questions in PR38086, one of the related bugs, or on llvm-dev.

> No problem - I've been putting the underlying changes off for a while, so I'll try to make some progress now and cc you on those patches.

D61917

nlw0 added a comment. May 15 2019, 12:58 PM

> My main motivation is just enabling vectorization for "minimum" loops without requiring function attributes. I think having a focus might help ensure I can contribute something, and a small patch would be nice too. I'll gladly work on a larger issue if it's within my reach, though, or if it turns out to be required.

> No problem - I've been putting the underlying changes off for a while, so I'll try to make some progress now and cc you on those patches.

This is excellent, thanks a lot. There's no hurry. I'll try to familiarize myself with PR38086.

@nlw0 reverse ping - are you still looking at this?

> But checking for nsz should happen on the 'select' rather than the 'fcmp'. I think we should fix the fundamental problems in the IR itself instead of trying to hack around them any more.

This is solved.

>> But checking for nsz should happen on the 'select' rather than the 'fcmp'. I think we should fix the fundamental problems in the IR itself instead of trying to hack around them any more.

> This is solved.

It's true that we now allow FMF on select and phi, but this example shows that we're still a long way from deprecating FMF on fcmp. The IR in these tests matches something like this in C source:

float f(float *p) {
  float min = p[0];
  for (unsigned i=0; i<65537; ++i)
    if (p[i] < min) min = p[i];
  return min;
}

When that begins life as IR in clang, there are no select or phi instructions, so existing FMF propagations are never even exercised. We will need to allow FMF on loads and arguments (effectively any FP value) to make this work without using any function attributes.

nlw0 added a comment. Dec 30 2019, 1:19 AM

Hi, I'm unable to spend much time on this right now, although I'm definitely still interested.

In Julia, the IR is currently generated with the flags on the fcmp instruction. I was hoping at first that this would just work that way, although now it sounds like there will need to be changes to how this IR is produced in the first place. What is the way forward?

spatel added a comment. Jan 2 2020, 1:08 PM

> Hi, I'm unable to spend much time on this right now, although I'm definitely still interested.

> In Julia, the IR is currently generated with the flags on the fcmp instruction. I was hoping at first that this would just work that way, although now it sounds like there will need to be changes to how this IR is produced in the first place. What is the way forward?

The current thinking is that we want any FP value to be able to carry fast-math-flag properties. That would hopefully remove logical inconsistencies such as shown here:
https://bugs.llvm.org/show_bug.cgi?id=38086

We've made some progress towards that goal by allowing FMF on select and phi instructions. Originally, I thought that just adding FMF to select/phi would be enough to fix this. But it seems clear now that we're going to need to allow FMF on memory ops and probably on function arguments before we can deprecate FMF on fcmp.

arsenm resigned from this revision. Feb 13 2020, 3:27 PM