This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Add support for scalable vectorization of loops with int/fast FP reductions
ClosedPublic

Authored by kmclaughlin on Jan 22 2021, 9:28 AM.

Details

Summary

This patch enables scalable vectorization of loops with integer/fast reductions, e.g:

unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];
}

A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.

Diff Detail

Event Timeline

kmclaughlin created this revision.Jan 22 2021, 9:28 AM
kmclaughlin requested review of this revision.Jan 22 2021, 9:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 22 2021, 9:28 AM
Matt added a subscriber: Matt.Jan 22 2021, 11:13 AM

Hey Kerry,
Thank you for this patch.
I found some nit and I have some suggestions about instructionCost.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4695

So once we start to use Scalable vector and we start to use the VF.getKnownMinValue(), shouldn't;t this be multiplied by getMaxVScale()?

4743

Same here, should we not need to multiply by getMaxVScale()?

6192

I believe we can use LoopCost.isValid(), here!

6213

Can you change SmallLoopCost to be instruction cost as LoopCost, so you don't need to use *LoopCost.getValue()?
And I believe that in the std::min you will not need to use getValue

7700

nit

9470

nit

david-arm added inline comments.Jan 25 2021, 1:37 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1121

Just a thought - if we're excluding FMul from reductions is it worth having an assert here that the op is not fmul?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1519

It might be worth printing out the recurrence kind here. Do we also want to emit a remark here to help the user understand why it failed to vectorise?

4695

This is for vectorise of induction variables. I think we'll have to use a runtime VF that I introduced in D95139 here. I don't think Kerry has to fix this in her patch.

6192

I think since we're changing LoopCost to be InstructionCost we can change the line above too from

LoopCost = *expectedCost(VF).first.getValue();

to

LoopCost = expectedCost(VF).first;

david-arm added inline comments.Jan 25 2021, 1:37 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9485

Similar to an earlier comment, a remark here would be good I think.

llvm/test/Transforms/LoopVectorize/scalable_reductions.ll
1 ↗(On Diff #318549)

Needs a "REQUIRES: asserts" here I think because you're relying upon debug output. Also, since you're explicitly adding "-mattr=+sve" here I think you'll either have to:

  1. Make the test generic work for all targets (this test will fail on some builds due to lack of AArch64 support), or
  2. Move the test for LoopVectorize/AArch64
16 ↗(On Diff #318549)

I wonder if it's worth adding CHECK lines for the resulting IR to show we've vectorised the loop using reductions and checking we have the right structure, i.e. vector.body, middle.block, etc?

fhahn added inline comments.Jan 25 2021, 2:02 AM
llvm/include/llvm/Analysis/TargetTransformInfo.h
1309

This should probably have a comment,.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3924 ↗(On Diff #318549)

Can you add a test for this? Also, this seems completely unrelated, can you split it off?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1271

those changes could also be submitted separately?

1513

This also needs a comment. And the name could probably be improved. Maybe canVectorizeReductions?

7691

This should only be checked in the code handling UserVF below? Also, This seems like a property that generally limits to vectorization factor to fixed-width vectorization factors and would be good to check beforehand.

Would it be possible to just limit vectorization factors to fixed width factors in computeFeasibleMaxVF? This way, we won't need extra checks once automatically picked VFs are supported. You'd also won't need any extra code in the caller of ::plan.

This is similar to how we deal with other 'legality' properties that depend on the vectorization factor, like dependencies that may limit the vectorization factor.

9483

This message seems a bit odd. I think the cost model should just be responsible for assigning a cost, not deciding whether it is possible to vectorize or not; that's the job of the legality checks.

Please see my comment above, the could probably done in computeFeasibleMaxVF, which technically is part of the cost model, but is the first step and applies other legality constraints as well which limit the vectorization factor.

llvm/test/Transforms/LoopVectorize/scalable_reductions.ll
9 ↗(On Diff #318549)

Personally I don't think the C source code adds much value. The IR is very compact and it should be obvious from the IR & test name what is going on. Also, the IR that clang generates can change, clang options may change, pragmas may change and so on.

20 ↗(On Diff #318549)

this should not be needed for the test.

23 ↗(On Diff #318549)

this should not be needed for the test, you can just pass %n as i64.

27 ↗(On Diff #318549)

nit: can strip indvars from the name to mark things more compact.

kmclaughlin marked 15 inline comments as done.
kmclaughlin edited the summary of this revision. (Show Details)
  • Removed changes to LoopVectorizationPlanner::plan and instead check whether reductions can be vectorized in computeFeasibleMaxVF. If any reduction in the loop cannot be vectorized with a scalable VF, we fall back on fixed-width vectorization.
  • Changes to have VectorizationFactor use InstructionCost were not necessary to the patch after the above change and have also been removed.
  • Improved the tests in scalable_reductions.ll based on suggestions from @fhahn & @david-arm

Thanks for reviewing this patch, all!

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3924 ↗(On Diff #318549)

I've removed this from the patch, I don't think it's required for the tests here.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6213

Hi @CarolineConcatto, thanks for your suggestions on InstructionCost! I didn't change the SmallLoopCost flag to be an instruction cost in the last revision as this caused tests which use -small-loop-cost to fail (e.g. LoopVectorize/unroll_novec.ll)

7691

Thanks for this suggestion, @fhahn. I've moved the canVectorizeReductions check to computeFeasibleMaxVF & updated the affected test in scalable_reductions.ll, where we can use fixed-width vectorization instead (@mul)

llvm/test/Transforms/LoopVectorize/scalable_reductions.ll
1 ↗(On Diff #318549)

Added REQUIRES: asserts & moved the test to Transforms/LoopVectorize/AArch64

david-arm added inline comments.Jan 27 2021, 1:36 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1513

nit: Perhaps use /// here instead of '//' in line with other function comments?

5706

I wonder if it's worth bailing out even earlier, i.e. in the same place as above where you check initially? I think the main benefit to bailing out here is if you can reduce the VF to something smaller so that it becomes legal. However, I think for reductions changing the VF won't make a difference in practice.

5715

nit: Perhaps use "operations" here instead of types? I'm thinking that the user probably isn't aware of the RecurrenceKind so type might not make as much sense?

llvm/test/Transforms/LoopVectorize/AArch64/scalable_reductions.ll
2 ↗(On Diff #319303)

I think you can reduce the number of RUN lines here by piping stderr for the first RUN line to a temporary file, e.g. something like

; RUN: opt < %s -loop-vectorize -transform-warning -mtriple aarch64-unknown-linux-gnu -mattr=+sve -debug-only=loop-vectorize -S 2>%t | FileCheck %s -check-prefix=CHECK
; RUN cat %t | FileCheck %s -check-prefix=CHECK-DEBUG

4 ↗(On Diff #319303)

Is it worth changing this to check for the new remark instead? You can use something like this:

; RUN: opt < %s -loop-vectorize -pass-remarks='loop-vectorize' -disable-output -mtriple aarch64-unknown-linux-gnu -mattr=+sve -S 2>&1 | ...

223 ↗(On Diff #319303)

I'm a bit surprised this vectorises to be honest, since there is no 'fast' flag here! Perhaps for IEEE math you have to add specific attributes to the function?

kmclaughlin marked 4 inline comments as done.
  • Moved the canVectorizeReductions check to earlier in computeFeasibleMaxVF
  • Updated the RUN lines in scalable_reductions.ll
  • Removed duplicate test for FAdd
kmclaughlin marked an inline comment as not done.Feb 1 2021, 10:11 AM
kmclaughlin added inline comments.
llvm/test/Transforms/LoopVectorize/AArch64/scalable_reductions.ll
223 ↗(On Diff #319303)

I think what happened here is that the hints used to enable vectorization have allowed reordering, similar to using -Ofast. I found this comment at the top of allowReordering() in LoopVectorizationLegality:

// When enabling loop hints are provided we allow the vectorizer to change
// the order of operations that is given by the scalar loop. This is not
// enabled by default because can be unsafe or inefficient. For example,
// reordering floating-point operations will change the way round-off
// error accumulates in the loop.

This behaviour was queried on the mailing list last year:
https://lists.llvm.org/pipermail/llvm-dev/2020-June/142697.html

dmgreen added inline comments.
llvm/include/llvm/Analysis/TargetTransformInfo.h
1310

Does this need to check the type? Does an i128 reduction work, for example?

I presume if a <vscale x 4 x float> reduction works then any <vscale x ? x float> will work?

Thanks for making the changes @kmclaughlin! Just a couple more comments ...

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5659–5660

I think this looks much better now you're just checking reductions only once and early on - thanks for this! However, I think you might need to move this check down to line 5677 where we return UserVF. So the reason I think this is because if we have a loop that contains memory dependences and reductions in the same loop we want to ensure we always do the reduction checks regardless. For example, Legal->isSafeForAnyVectorWidth() could return false and then in the code below we may successfully reduce the UserVF from <vscale x 8 x float> to <vscale x 4 x float> without ever calling canVectorizeReductions.

llvm/test/Transforms/LoopVectorize/AArch64/scalable_reductions.ll
1 ↗(On Diff #320512)

Thanks for RUN line changes here - looks a lot neater now thanks! If it's not too difficult I think it would be great if you could test the remark here too, since this is user-facing rather than debug. If you want you can even test the remark instead of the debug - this would also mean you can remove the "REQUIRE: asserts" line above too.

sdesmalen added inline comments.Feb 2 2021, 4:16 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1094

nit: bail out early to reduce indentation.

if (!Scalable)
  return true;
1111

nit: can be removed if you add the early bail out.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1516

nit: use return llvm::all_of(....) with lambda, instead of loop?

1519

Is it worth just passing the whole Recurrence descriptor and the whole of VF?

When passing the whole Recurrence descriptor, in the future the function can also determine whether it can vectorize an ordered reduction (e.g. ordered fadd) in the loop body using some instruction.

fhahn added inline comments.Feb 2 2021, 4:39 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5659–5660

please also add a test for this scenario.

kmclaughlin marked 7 inline comments as done.
  • Moved the Legal->isSafeForAnyVectorWidth() check in computeFeasibleMaxVF further down so that we always check the reductions even if the loop contains memory dependencies. Added a test for this scenario to scalable_reductions.ll.
  • Changed isLegalToVectorizeReduction so that the whole RecurrenceDescriptor and VF are passed in, and added a check of the recurrence type.
  • Replaced the loop in canVectorizeReductions with lambda
  • Removed REQUIRE: asserts from the test file and added -pass-remarks-analysis/missed flags to the RUN line
llvm/include/llvm/Analysis/TargetTransformInfo.h
1310

Hi @dmgreen, thanks for taking a look at this!
I've added a check of the recurrence type to isLegalToVectorizeReduction. I think any <vscale x ? x float> reduction will work, I added some tests for legalization of vector reductions as part of D93050.

LGTM! Thanks for making all changes. Perhaps wait a while before merging in case others want a look?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1516

nit: I think you can remove the '(' and ')' surrounding the llvm::all_of call here.

david-arm accepted this revision.Feb 3 2021, 8:40 AM

LGTM. Forgot to click "Accept Revision" before. Doh!

This revision is now accepted and ready to land.Feb 3 2021, 8:40 AM
dmgreen added inline comments.Feb 4 2021, 12:42 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1098

Thanks. This looks like it should work for most current types. Are bfloats always supported? It may be better to be more specific in case other smaller-than-64bit float types are added in the future.

david-arm added inline comments.Feb 4 2021, 1:00 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1098

Is this needed though? If bfloats are in the scalar IR it means that the user has explicitly written code using the SVE ACLE so I'd imagine that all bets are off anyway if they didn't build with bf16 support. I'd also imagine that these would be flagged up as illegal types earlier on in the vectoriser too I think?

dmgreen added inline comments.Feb 4 2021, 3:46 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1098

Hmm. I guess I I don't see the advantage of getting it wrong. Clang isn't the only frontend and the vectorizer needs to take any valid input and not crash or produce code that will later crash. Being specific about which types are supported seems like a better idea to me than hoping it works and hoping that won't change in the future.

david-arm added inline comments.Feb 4 2021, 3:58 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1098

No that's a fair point and happy for @kmclaughlin to add the check. However, we can't test such a scenario even with hand written IR because the vectoriser crashes without bfloat support:

LLVM ERROR: Cannot legalize this vector
#8 0x0000ffff959efad8 llvm::TargetLoweringBase::getTypeConversion(llvm::LLVMContext&, llvm::EVT) const (.localalias) (/home/davshe01/upstream/llvm-project/build2/bin/../lib/libLLVMSupport.so.13git+0xcfad8)
#9 0x0000ffff959efbd8 llvm::TargetLoweringBase::getTypeLegalizationCost(llvm::DataLayout const&, llvm::Type*) const (/home/davshe01/upstream/llvm-project/build2/bin/../lib/libLLVMSupport.so.13git+0xcfbd8)

  • Added a function called isLegalScalarTypeForSVE which checks that the reduction type is supported & added a new test which uses bfloat to scalable-reductions.ll
dmgreen accepted this revision.Feb 4 2021, 7:36 AM

Nice one. Thanks for the change. LGTM

sdesmalen added inline comments.Feb 4 2021, 2:09 PM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1121

The same should hold for integer Mul.

nit: you can better add that to the switch statement below as:

case Instruction::Mul:
case Instruction::FMul:
  assert(!isa<ScalableVectorType>(Ty) && "Unexpected ...");
  LLVM_FALLTHROUGH;
case Instruction::Fadd:
...
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
189

Can you merge this function with isLegalScalarTypeForSVEMaskedMemOp and name it isLegalElementTypeForSVE?

I think their implementation should be the same (including your check here for hasBF16)

sdesmalen added inline comments.Feb 5 2021, 1:54 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
190–191

I forgot to mention that there are no reduction instructions for bfloat, so you'll need to catch out that specific case in isLegalToVectorizeReduction

kmclaughlin marked 2 inline comments as done.
  • Merged isLegalScalarTypeForSVEMaskedMemOp & isLegalScalarTypeForSVE
  • Return false from isLegalToVectorizeReduction for bfloat types
  • Included isa<ScalableVectorType>(Ty) in the switch statement conditions of useReductionIntrinsic

Thanks for the changes. I only have some more comments about the tests now.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
101

nit: remove dso_local here and in other definitions.

340

This CHECK-DEBUG (with it's own RUN line) is not checking which function is not vectorizing, it could just as well be emitted for one of the other functions. I'd suggest explicitly adding checks for @mul and adding a CHECK-DEBUG line for the other tests as well.

376

Same as above. Can you also add a comment saying why you're testing a memory_dependence issue in a test file called scalable-reductions.ll ?

424

These two fmin/fmax tests are not very useful, because the loop doesn't fail to vectorize because of code added in this patch.

470

nit: use nnan directly in the fp operation instead of an attribute.

david-arm added inline comments.Feb 8 2021, 12:38 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1121

Hi @sdesmalen, just for information the reason I'd asked for an assert here is that if we're still intending to create a target reduction intrinsic at this point with a mul or fmul then something has gone badly wrong and is almost certainly a bug. This is because this function is only ever called at the point where you've already decided that it's legal to reduce a scalable mul operation. The two places where this is called are from SLPVectorizer.cpp:createSimpleTargetReduction and InnerLoopVectorizer::fixReduction (via createTargetReduction).

david-arm added inline comments.Feb 8 2021, 1:14 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1121

Sorry, please ignore my comment! For some reason I hadn't seen the assert in there.

kmclaughlin marked 4 inline comments as done.

Changes to the tests in scalable-reductions.ll:

  • Removed dso_local from definitions
  • Added a comment on the purpose of the memory_dependence test
  • Added CHECK-REMARK lines for each test in the file
  • Removed the unnecessary fmin/fmax tests where we can't vectorize
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
470

Hi @sdesmalen, these tests for fmin/fmax fail without the no-nans-fp-math attribute, I think because RecurrenceDescriptor::isRecurrenceInstr is just checking for the function attribute and not the flags on the instruction. I've created a separate patch (D96350) to try and address this.

Rebased changes

david-arm accepted this revision.Feb 12 2021, 6:11 AM

LGTM! Latest version looks good and I think you've addressed @sdesmalen's comments. Thanks!

llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
377

nit: Perhaps you could make it clear you're testing the ordering, i.e. with something like:

This test was added to ensure we always check the legality of reductions (end emit a warning if necessary) before checking for memory dependencies
fhahn accepted this revision.Feb 15 2021, 12:41 AM

LV changes LGTM, thanks for the updates!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1514

nit: . at end of sentence.

1516

nit: llvm:: should not be required

5665

I think you should be bale to use reportVectorizationFailure to print to dbgs() and generate a remark with the same message

llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
21

nit: those checks should not be needed.

This revision was automatically updated to reflect the committed changes.
kmclaughlin marked 5 inline comments as done.

Thanks all for reviewing these changes!