This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Enable strict reductions when allowReordering() returns false
ClosedPublic

Authored by kmclaughlin on May 4 2021, 7:38 AM.

Details

Summary

When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:

bool allowReordering() const {
  // When enabling loop hints are provided we allow the vectorizer to change
  // the order of operations that is given by the scalar loop. This is not
  // enabled by default because can be unsafe or inefficient.

The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.

This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.

Diff Detail

Event Timeline

kmclaughlin created this revision.May 4 2021, 7:38 AM
kmclaughlin requested review of this revision.May 4 2021, 7:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2021, 7:38 AM
sdesmalen added inline comments.May 4 2021, 1:38 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9804–9805

nit: indentation, please use clang-format.

9810

Why do all of the reductions have to be ordered for the LV to be able to vectorize FP math?
(e.g. if there is an integer reduction and an ordered FP reduction, it would now choose not to vectorize based on this condition)

9919–9920

This condition is a bit odd. Should canVectorizeOrderedFPMath just contain the call to Requirements.canVectorizeFPMath instead? i.e. in order to vectorize ordered FP math, it must at least be able to vectorize FP math.

david-arm added inline comments.May 5 2021, 12:49 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9810

I guess it might be worth adding a test for this too then, i.e. having a loop with both an integer and FP reduction and ensure we vectorise with ordered reductions.

9919–9920

I think the existing canVectorizeFPMath function is badly named because it actually checks for reordering:

bool canVectorizeFPMath(const LoopVectorizeHints &Hints) const {
  return !ExactFPMathInst || Hints.allowReordering();
}

so the logic in Kerry's patch is something like this:

  1. Is this an exact FP math instruction? If not -> vectorise, else
  2. Do hints permit reordering? If so -> vectorise, else
  3. Can we vectorise with ordered reductions? If not -> emit remark.

It probably is possible to combine these into a single LoopVectorizationLegality::canVectorizeFPMath function that does all the above, since that class does have access to the Requirements I think.

kmclaughlin marked 2 inline comments as done.

Addressing review comments from @sdesmalen & @david-arm:

  • Merged canVectorizeFPMath with canVectorizeOrderedFPMath in LoopVectorizationLegality
  • Only check the IsOrdered flag of the RecurrenceDescriptor if hasExactFPMath() is true.
  • Added a test with different types of reductions (integer add & FP add) that we should be able to vectorize with the -enable-strict-reductions flag

Also added an extra RUN line to both strict-fadd.ll & scalable-strict-fadd.ll to test the changes made to allowReordering (i.e. changing EC.getKnownMinValue() > 1 to EC.isScalar()).

kmclaughlin added inline comments.May 5 2021, 7:50 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9810

Hi @sdesmalen, we should only need the FP reductions in the loop to be ordered. I've changed this so that only reductions where hasExactFPMath() is true need to be ordered & added a test for this scenario to strict-fadd.ll

david-arm added inline comments.May 5 2021, 7:57 AM
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
258

I think we can avoid passing in the Hints here as they are already a member of the class with the same name?

sdesmalen added inline comments.May 5 2021, 8:06 AM
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
146

is this change necessary?

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
865

Should a hint of VF=1 really lead to the diagnostic "loop not vectorized: cannot prove it is safe to reorder floating-point operations"?

865–866

This will also return false if there are no reductions at all, or if all reductions are unordered?

868–871

How about returning true if for each reduction variable, any of the following conditions is true:

  1. The reduction is no ExactFPMath instruction for the reduction.
  2. The reduction is unordered.
  3. EnableStrictReductions is true.
872

I'd prefer the default case to be return false;, i.e. when we cannot explicitly determine it is safe, we assume it isn't safe. That would handle the case where Requirements->getExactFPInst() is true, but it isn't an instruction used in the reduction. (although I don't know if that would ever happen?)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9919

You don't need to pass EnableStrictReductions, since it is defined in the same file?

david-arm added inline comments.May 5 2021, 8:10 AM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
16

Thanks a lot for adding the VF=vscale x 1 case here, but perhaps CHECK-SCALAR should be CHECK-VF1U1, since we're still vectorising? Also, it's probably worth adding an extra CHECK line for at least one instruction that shows the "<vscale x 1 ..." - maybe the call float ... instruction?

david-arm added inline comments.May 5 2021, 8:17 AM
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
146

It's fixing a missing case where we weren't previously allowing reordering for scalable VF=vscale x 1. I think it's worth fixing, but maybe it doesn't have to live in this patch?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9919

I think it has to be passed since EnableStrictReductions lives in LoopVectorize.cpp and canVectorizeFPMath lives in LoopVectorizationLegality.cpp.

sdesmalen added inline comments.May 5 2021, 8:30 AM
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
146

@kmclaughlin can this be pulled out into a separate patch, or does it depend on changes in this patch in order to test it?

I find the way the condition is written very confusing. It looks like the condition is synonymous, but it isn't. How about writing EC.isVector() instead?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9919

You're right, I didn't realise that, thanks!

kmclaughlin marked 2 inline comments as done.
kmclaughlin added a reviewer: spatel.
  • Removed changes to allowReordering() from this patch
  • Removed Hints.getWidth().isScalar() check from canVectorizeFPMath()
  • Changed canVectorizeFPMath to also look at induction variables, as we should not vectorize if the loop has any exact floating-point induction operators and we do not allow reassociation.
  • Added more tests to strict-fadd.ll which include floating-point induction variables to test the above changes.
kmclaughlin added inline comments.May 10 2021, 8:44 AM
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
146

@sdesmalen this doesn't depend on any other changes here so I've removed it from this patch

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
865

I've removed this check as I don't think it's necessary, but it was added to be consistent with allowReordering() which returns false if EC.getKnownMinValue() > 1

sdesmalen added inline comments.May 10 2021, 9:28 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
873–879

is it sufficient to write:

if (any_of(..... { }))
  return false;

(i.e. if ExactIndVars is true, then !getInductionVars().empty() must also be true)

884

Is it still necessary to iterate through the reduction variables at this point? Given that EnableStrictReductions is true, and that reductions are the only other operations that can have exact FPMath instructions, I think you can just return true.

david-arm added inline comments.May 11 2021, 12:32 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
884

We don't support vectorising all of these reductions, for example we don't suppose strict reductions involving fmul and we don't support chains of fadds currently either. That's why in the code below we check !RdxDesc.isOrdered() because the ordered flag is only set for cases we can support at the moment. I think @kmclaughlin has added a test for this case as well below called fast_induction_unordered_reduction which shows how there are both fmul and fadd reductions in the same loop.

kmclaughlin marked an inline comment as done.
  • Removed the !getInductionVars().empty() test from canVectorizeFPMath()
david-arm added inline comments.May 12 2021, 6:13 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
887

nit: This is just a suggestion, but you could rename ExactRdxVars to HasExactRdxVar and then here simply do:

return !HasExactRdxVar;

since I think when the list is empty that variable should be false?

kmclaughlin marked an inline comment as done.
  • Removed the getReductionVars().empty() test from canVectorizeFPMath() and renamed ExactRdxVar to HasExactRdxVar
sdesmalen added inline comments.May 14 2021, 9:01 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
882

Can you rewrite this to:

// We can now only vectorize if all reductions with Exact FP math also
// have the isOrdered flag set, which indicates that we can move the
// reduction operations in-loop.
return all_of(getReductionVars(), [&](auto &Reduction) -> bool {
  RecurrenceDescriptor RdxDesc = Reduction.second;
  return !RdxDesc.hasExactFPMath() || RdxDesc.isOrdered();
});
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
7–17

Should all test functions have check lines for all VF8UF1, VF8UF4, etc. ? Conversely, is it sufficient to just pass the interleave-count hint (not the vector width) via metadata and have 1 RUN line for VF8UF1, VF8UF4, VF4UF1?

Which also makes me wonder, what is the additional value of having both VF8UF1 and VF4UF1 ?

kmclaughlin marked 2 inline comments as done.
  • Removes HasExactRdxVar from canVectorizeFPMath() and instead returns the result from all_of(getReductionVars()...
  • Reduce the number of RUN lines in the tests
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
7–17

I think it should be sufficient to pass the interleave count via metadata. I've changed VF4UF1 to VF8UF1 as there was no additional benefit in having both, similarly I've changed VF4UF1 in the strict-fadd.ll test as well to reduce the number of RUN lines.

david-arm added inline comments.May 17 2021, 6:18 AM
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
82–104

Hi @kmclaughlin, I think maybe this is meant to be CHECK-VF8UF4?

kmclaughlin marked an inline comment as done.
  • Fixed incorrect CHECK lines in the @fadd_strict_unroll_last_val test in strict-fadd.ll (CHECK-VF8UF2 -> CHECK-VF8UF4)
sdesmalen added inline comments.May 17 2021, 1:32 PM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
76

Why does this test need to be vectorized with VF=vscale x 8 instead of VF=vscale x 4? Is that because it needs to be driven using the cmdline flags to circumvent the "hint allows reordering" behaviour? (and so that the 1 RUN line covers all tests?) If that's the case, can you do a NFC patch where you first change the test to use the new VF, and then rebase this patch on top?

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
1–7

The first test, @fadd_strict, has check lines that match the first RUN line, and the 4th RUN line, but none of the others. Why is that?
And are all these RUN lines needed?

396–397

nit:

; Strict reduction could be performed in-loop, but ordered FP induction variables are not supported.
420

nit: this PHI is unnecessary? (same for the tests below)

424

Can you be more explicit in the comment? i.e.

; As above, but with the FP induction being unordered (fast), the loop can be vectorized.
kmclaughlin added inline comments.May 18 2021, 6:40 AM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
76

Hi @sdesmalen, it is the case that I changed the VF so that the test could be covered by one RUN line, and to try and circumvent the hints allow reordering behaviour. I will move changes to the RUN lines into a new patch so that this patch only adds new tests.

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
1–7

Since the allowReordering() function returns false if EC.getKnownMinValue() > 1, I thought it was worth making sure that we don't vectorize a VF of 1 for at least one of the tests, which is why I added the extra RUN line to @fadd_strict.
The RUN lines are needed so that we can pass the different VFs & interleave counts needed for each of the tests (e.g. @fadd_strict_unroll needs a UF > 1) and I didn't want to change the 'allow reordering' behaviour by passing hints through metadata. Though I think I could remove the CHECK-PRED line since the @fadd_predicated does rely on metadata if this would help at all?

sdesmalen added inline comments.May 19 2021, 1:35 AM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
76

I think it's worth adding a flag to the vectorizer to disable this weird behaviour for testing purposes, so that we don't need to change this test, and so that you don't need the multiple RUN lines in the other test in favour of using metadata to control the VF per individual test.

kmclaughlin marked 4 inline comments as done.
  • Rebased on the dependent changes, D102774
  • Removed the separate RUN lines from both strict-fadd.ll & scalable-strict-fadd.ll for different VF/UFs.
  • The tests now use one RUN line with the -hints-allow-reordering=false flag. This uses the existing CHECK lines in the tests, which prior to these changes only tested vectorization where reordering was allowed.

As discussed with @sdesmalen, I have made the following changes to this patch so that the tests are clearer:

  • Combined this patch with D102774, adding the -hints-allow-reordering flag here
  • Added CHECK-ORDERED and CHECK-UNORDERED lines to the tests in D103015
  • Updated the RUN lines in this patch to use the -hints-allow-reordering flag and also added a RUN line for CHECK-NOT-VECTORIZED. The tests themselves remain unchanged, with the exception of the new tests added for induction variables.
kmclaughlin edited the summary of this revision. (Show Details)May 24 2021, 6:26 AM
sdesmalen added inline comments.May 24 2021, 6:30 AM
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
138

I'd suggest moving the implementation of allowReordering to LoopVectorizationLegality.cpp, so that you don't need to add the extern+include above.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
866–867

nit: can you fold this into the condition below:

if (!EnableStrictReductions || any_of(...))
  return false;
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
3

Can you also add a RUN line for -enable-strict-reductions=true -hints-allow-reordering=true (which I think can reuse prefix CHECK-UNORDERED)

david-arm added inline comments.May 24 2021, 7:21 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
866–867

I guess the comment below will need updating too if the check is moved.

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
474

Are these new tests missing hints that the other tests seem to use? I just wondered if it was better to be consistent here that's all. The reason I mention this is because I was expecting the UNORDERED case to vectorise due to the -hints-allow-reordering=true flag.

david-arm added inline comments.May 24 2021, 7:37 AM
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
474

I think I see now @kmclaughlin - you're testing the productisation of -enable-strict-reductions so you were adding some tests deliberately without hints, which also makes sense. In this case I'd also be happy if you left these tests as they are and just added some comments explaining why we expect the CHECK-UNORDERED case to not vectorise.

kmclaughlin marked 3 inline comments as done.
  • Moved the implementation of allowReordering() to LoopVectorizationLegality.cpp
  • Added a comment above the new tests in strict-fadd.ll to explain why the CHECK-UNORDERED case should not vectorize
kmclaughlin added inline comments.May 25 2021, 5:57 AM
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
3

Hi @sdesmalen, I haven't added another RUN line for -enable-strict-reductions=true -hints-allow-reordering=true as this will cause failures with the fadd_multiple test.
For most of the tests, the output where both flags are true will match the CHECK-ORDERED lines, since we will always use strict reductions where possible if this flag is set. For fadd_multiple, we cannot use strict reductions and so the value of -hints-allow-reordering will change whether or not the test vectorizes.
As we discussed previously, I will follow this up with a patch which ensures we only choose strict reductions if we do not allow reordering. At this point I can add a RUN line as you've suggested and reuse the CHECK-UNORDERED prefix.

474

Hi @david-arm, I've added a comment above these tests to explain why the CHECK-UNORDERED case shouldn't vectorize.

sdesmalen accepted this revision.May 25 2021, 2:12 PM

Other than my request for a FIXME, I'm happy with the patch. LGTM!

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
3

Fair enough, thanks for explaining. Can you just add a FIXME above the cl::opt for -enable-strict-reductions that this flag reverses the default behaviour we have now when hints are passed?

This revision is now accepted and ready to land.May 25 2021, 2:12 PM
This revision was landed with ongoing or failed builds.May 26 2021, 6:06 AM
This revision was automatically updated to reflect the committed changes.

Thank you @sdesmalen & @david-arm for the reviews and comments!