Page MenuHomePhabricator

[SVE] Add support for scalable vectorization of loops with int/fast FP reductions

Authored by kmclaughlin on Jan 22 2021, 9:28 AM.



This patch enables scalable vectorization of loops with integer/fast reductions, e.g:

unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];

A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
david-arm added inline comments.Jan 25 2021, 1:37 AM

Similar to an earlier comment, a remark here would be good I think.

1 ↗(On Diff #318549)

Needs a "REQUIRES: asserts" here I think because you're relying upon debug output. Also, since you're explicitly adding "-mattr=+sve" here I think you'll either have to:

  1. Make the test generic work for all targets (this test will fail on some builds due to lack of AArch64 support), or
  2. Move the test for LoopVectorize/AArch64
16 ↗(On Diff #318549)

I wonder if it's worth adding CHECK lines for the resulting IR to show we've vectorised the loop using reductions and checking we have the right structure, i.e. vector.body, middle.block, etc?

fhahn added inline comments.Jan 25 2021, 2:02 AM

This should probably have a comment,.

3924 ↗(On Diff #318549)

Can you add a test for this? Also, this seems completely unrelated, can you split it off?


those changes could also be submitted separately?


This also needs a comment. And the name could probably be improved. Maybe canVectorizeReductions?


This should only be checked in the code handling UserVF below? Also, This seems like a property that generally limits to vectorization factor to fixed-width vectorization factors and would be good to check beforehand.

Would it be possible to just limit vectorization factors to fixed width factors in computeFeasibleMaxVF? This way, we won't need extra checks once automatically picked VFs are supported. You'd also won't need any extra code in the caller of ::plan.

This is similar to how we deal with other 'legality' properties that depend on the vectorization factor, like dependencies that may limit the vectorization factor.


This message seems a bit odd. I think the cost model should just be responsible for assigning a cost, not deciding whether it is possible to vectorize or not; that's the job of the legality checks.

Please see my comment above, the could probably done in computeFeasibleMaxVF, which technically is part of the cost model, but is the first step and applies other legality constraints as well which limit the vectorization factor.

9 ↗(On Diff #318549)

Personally I don't think the C source code adds much value. The IR is very compact and it should be obvious from the IR & test name what is going on. Also, the IR that clang generates can change, clang options may change, pragmas may change and so on.

20 ↗(On Diff #318549)

this should not be needed for the test.

23 ↗(On Diff #318549)

this should not be needed for the test, you can just pass %n as i64.

27 ↗(On Diff #318549)

nit: can strip indvars from the name to mark things more compact.

kmclaughlin marked 15 inline comments as done.
kmclaughlin edited the summary of this revision. (Show Details)
  • Removed changes to LoopVectorizationPlanner::plan and instead check whether reductions can be vectorized in computeFeasibleMaxVF. If any reduction in the loop cannot be vectorized with a scalable VF, we fall back on fixed-width vectorization.
  • Changes to have VectorizationFactor use InstructionCost were not necessary to the patch after the above change and have also been removed.
  • Improved the tests in scalable_reductions.ll based on suggestions from @fhahn & @david-arm

Thanks for reviewing this patch, all!

3924 ↗(On Diff #318549)

I've removed this from the patch, I don't think it's required for the tests here.


Hi @CarolineConcatto, thanks for your suggestions on InstructionCost! I didn't change the SmallLoopCost flag to be an instruction cost in the last revision as this caused tests which use -small-loop-cost to fail (e.g. LoopVectorize/unroll_novec.ll)


Thanks for this suggestion, @fhahn. I've moved the canVectorizeReductions check to computeFeasibleMaxVF & updated the affected test in scalable_reductions.ll, where we can use fixed-width vectorization instead (@mul)

1 ↗(On Diff #318549)

Added REQUIRES: asserts & moved the test to Transforms/LoopVectorize/AArch64

david-arm added inline comments.Jan 27 2021, 1:36 AM

nit: Perhaps use /// here instead of '//' in line with other function comments?


I wonder if it's worth bailing out even earlier, i.e. in the same place as above where you check initially? I think the main benefit to bailing out here is if you can reduce the VF to something smaller so that it becomes legal. However, I think for reductions changing the VF won't make a difference in practice.


nit: Perhaps use "operations" here instead of types? I'm thinking that the user probably isn't aware of the RecurrenceKind so type might not make as much sense?

2 ↗(On Diff #319303)

I think you can reduce the number of RUN lines here by piping stderr for the first RUN line to a temporary file, e.g. something like

; RUN: opt < %s -loop-vectorize -transform-warning -mtriple aarch64-unknown-linux-gnu -mattr=+sve -debug-only=loop-vectorize -S 2>%t | FileCheck %s -check-prefix=CHECK
; RUN cat %t | FileCheck %s -check-prefix=CHECK-DEBUG

4 ↗(On Diff #319303)

Is it worth changing this to check for the new remark instead? You can use something like this:

; RUN: opt < %s -loop-vectorize -pass-remarks='loop-vectorize' -disable-output -mtriple aarch64-unknown-linux-gnu -mattr=+sve -S 2>&1 | ...

223 ↗(On Diff #319303)

I'm a bit surprised this vectorises to be honest, since there is no 'fast' flag here! Perhaps for IEEE math you have to add specific attributes to the function?

kmclaughlin marked 4 inline comments as done.
  • Moved the canVectorizeReductions check to earlier in computeFeasibleMaxVF
  • Updated the RUN lines in scalable_reductions.ll
  • Removed duplicate test for FAdd
kmclaughlin marked an inline comment as not done.Feb 1 2021, 10:11 AM
kmclaughlin added inline comments.
223 ↗(On Diff #319303)

I think what happened here is that the hints used to enable vectorization have allowed reordering, similar to using -Ofast. I found this comment at the top of allowReordering() in LoopVectorizationLegality:

// When enabling loop hints are provided we allow the vectorizer to change
// the order of operations that is given by the scalar loop. This is not
// enabled by default because can be unsafe or inefficient. For example,
// reordering floating-point operations will change the way round-off
// error accumulates in the loop.

This behaviour was queried on the mailing list last year:

dmgreen added inline comments.

Does this need to check the type? Does an i128 reduction work, for example?

I presume if a <vscale x 4 x float> reduction works then any <vscale x ? x float> will work?

Thanks for making the changes @kmclaughlin! Just a couple more comments ...


I think this looks much better now you're just checking reductions only once and early on - thanks for this! However, I think you might need to move this check down to line 5677 where we return UserVF. So the reason I think this is because if we have a loop that contains memory dependences and reductions in the same loop we want to ensure we always do the reduction checks regardless. For example, Legal->isSafeForAnyVectorWidth() could return false and then in the code below we may successfully reduce the UserVF from <vscale x 8 x float> to <vscale x 4 x float> without ever calling canVectorizeReductions.

1 ↗(On Diff #320512)

Thanks for RUN line changes here - looks a lot neater now thanks! If it's not too difficult I think it would be great if you could test the remark here too, since this is user-facing rather than debug. If you want you can even test the remark instead of the debug - this would also mean you can remove the "REQUIRE: asserts" line above too.

sdesmalen added inline comments.Feb 2 2021, 4:16 AM

nit: bail out early to reduce indentation.

if (!Scalable)
  return true;

nit: can be removed if you add the early bail out.


nit: use return llvm::all_of(....) with lambda, instead of loop?


Is it worth just passing the whole Recurrence descriptor and the whole of VF?

When passing the whole Recurrence descriptor, in the future the function can also determine whether it can vectorize an ordered reduction (e.g. ordered fadd) in the loop body using some instruction.

fhahn added inline comments.Feb 2 2021, 4:39 AM

please also add a test for this scenario.

kmclaughlin marked 7 inline comments as done.
  • Moved the Legal->isSafeForAnyVectorWidth() check in computeFeasibleMaxVF further down so that we always check the reductions even if the loop contains memory dependencies. Added a test for this scenario to scalable_reductions.ll.
  • Changed isLegalToVectorizeReduction so that the whole RecurrenceDescriptor and VF are passed in, and added a check of the recurrence type.
  • Replaced the loop in canVectorizeReductions with lambda
  • Removed REQUIRE: asserts from the test file and added -pass-remarks-analysis/missed flags to the RUN line

Hi @dmgreen, thanks for taking a look at this!
I've added a check of the recurrence type to isLegalToVectorizeReduction. I think any <vscale x ? x float> reduction will work, I added some tests for legalization of vector reductions as part of D93050.

LGTM! Thanks for making all changes. Perhaps wait a while before merging in case others want a look?


nit: I think you can remove the '(' and ')' surrounding the llvm::all_of call here.

david-arm accepted this revision.Feb 3 2021, 8:40 AM

LGTM. Forgot to click "Accept Revision" before. Doh!

This revision is now accepted and ready to land.Feb 3 2021, 8:40 AM
dmgreen added inline comments.Feb 4 2021, 12:42 AM

Thanks. This looks like it should work for most current types. Are bfloats always supported? It may be better to be more specific in case other smaller-than-64bit float types are added in the future.

david-arm added inline comments.Feb 4 2021, 1:00 AM

Is this needed though? If bfloats are in the scalar IR it means that the user has explicitly written code using the SVE ACLE so I'd imagine that all bets are off anyway if they didn't build with bf16 support. I'd also imagine that these would be flagged up as illegal types earlier on in the vectoriser too I think?

dmgreen added inline comments.Feb 4 2021, 3:46 AM

Hmm. I guess I I don't see the advantage of getting it wrong. Clang isn't the only frontend and the vectorizer needs to take any valid input and not crash or produce code that will later crash. Being specific about which types are supported seems like a better idea to me than hoping it works and hoping that won't change in the future.

david-arm added inline comments.Feb 4 2021, 3:58 AM

No that's a fair point and happy for @kmclaughlin to add the check. However, we can't test such a scenario even with hand written IR because the vectoriser crashes without bfloat support:

LLVM ERROR: Cannot legalize this vector
#8 0x0000ffff959efad8 llvm::TargetLoweringBase::getTypeConversion(llvm::LLVMContext&, llvm::EVT) const (.localalias) (/home/davshe01/upstream/llvm-project/build2/bin/../lib/
#9 0x0000ffff959efbd8 llvm::TargetLoweringBase::getTypeLegalizationCost(llvm::DataLayout const&, llvm::Type*) const (/home/davshe01/upstream/llvm-project/build2/bin/../lib/

  • Added a function called isLegalScalarTypeForSVE which checks that the reduction type is supported & added a new test which uses bfloat to scalable-reductions.ll
dmgreen accepted this revision.Feb 4 2021, 7:36 AM

Nice one. Thanks for the change. LGTM

sdesmalen added inline comments.Feb 4 2021, 2:09 PM

The same should hold for integer Mul.

nit: you can better add that to the switch statement below as:

case Instruction::Mul:
case Instruction::FMul:
  assert(!isa<ScalableVectorType>(Ty) && "Unexpected ...");
case Instruction::Fadd:

Can you merge this function with isLegalScalarTypeForSVEMaskedMemOp and name it isLegalElementTypeForSVE?

I think their implementation should be the same (including your check here for hasBF16)

sdesmalen added inline comments.Feb 5 2021, 1:54 AM

I forgot to mention that there are no reduction instructions for bfloat, so you'll need to catch out that specific case in isLegalToVectorizeReduction

kmclaughlin marked 2 inline comments as done.
  • Merged isLegalScalarTypeForSVEMaskedMemOp & isLegalScalarTypeForSVE
  • Return false from isLegalToVectorizeReduction for bfloat types
  • Included isa<ScalableVectorType>(Ty) in the switch statement conditions of useReductionIntrinsic

Thanks for the changes. I only have some more comments about the tests now.


nit: remove dso_local here and in other definitions.


This CHECK-DEBUG (with it's own RUN line) is not checking which function is not vectorizing, it could just as well be emitted for one of the other functions. I'd suggest explicitly adding checks for @mul and adding a CHECK-DEBUG line for the other tests as well.


Same as above. Can you also add a comment saying why you're testing a memory_dependence issue in a test file called scalable-reductions.ll ?


These two fmin/fmax tests are not very useful, because the loop doesn't fail to vectorize because of code added in this patch.


nit: use nnan directly in the fp operation instead of an attribute.

david-arm added inline comments.Feb 8 2021, 12:38 AM

Hi @sdesmalen, just for information the reason I'd asked for an assert here is that if we're still intending to create a target reduction intrinsic at this point with a mul or fmul then something has gone badly wrong and is almost certainly a bug. This is because this function is only ever called at the point where you've already decided that it's legal to reduce a scalable mul operation. The two places where this is called are from SLPVectorizer.cpp:createSimpleTargetReduction and InnerLoopVectorizer::fixReduction (via createTargetReduction).

david-arm added inline comments.Feb 8 2021, 1:14 AM

Sorry, please ignore my comment! For some reason I hadn't seen the assert in there.

kmclaughlin marked 4 inline comments as done.

Changes to the tests in scalable-reductions.ll:

  • Removed dso_local from definitions
  • Added a comment on the purpose of the memory_dependence test
  • Added CHECK-REMARK lines for each test in the file
  • Removed the unnecessary fmin/fmax tests where we can't vectorize

Hi @sdesmalen, these tests for fmin/fmax fail without the no-nans-fp-math attribute, I think because RecurrenceDescriptor::isRecurrenceInstr is just checking for the function attribute and not the flags on the instruction. I've created a separate patch (D96350) to try and address this.

Rebased changes

david-arm accepted this revision.Feb 12 2021, 6:11 AM

LGTM! Latest version looks good and I think you've addressed @sdesmalen's comments. Thanks!


nit: Perhaps you could make it clear you're testing the ordering, i.e. with something like:

This test was added to ensure we always check the legality of reductions (end emit a warning if necessary) before checking for memory dependencies
fhahn accepted this revision.Feb 15 2021, 12:41 AM

LV changes LGTM, thanks for the updates!


nit: . at end of sentence.


nit: llvm:: should not be required


I think you should be bale to use reportVectorizationFailure to print to dbgs() and generate a remark with the same message


nit: those checks should not be needed.

This revision was automatically updated to reflect the committed changes.
kmclaughlin marked 5 inline comments as done.

Thanks all for reviewing these changes!