Page MenuHomePhabricator

[LoopVectorize] Fix crash for predicated instruction with scalable VF

Authored by CarolineConcatto on May 5 2021, 9:01 AM.



This patch avoids computing discounts for predicated instructions when the
VF is scalable.
There is no support for vectorization of loops with division because the
vectorizer cannot guarantee that zero divisions will not happen.

This loop now does not use VF scalable

for (long long i = 0; i < n; i++)
    if (cond[i])
      a[i] /= b[i];

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sdesmalen added inline comments.May 18 2021, 12:33 AM

Should this be !isScalarEpilogueAllowed()? i.e. if a scalar epilogue loop is not allowed, then we know predication is required.

CarolineConcatto marked 3 inline comments as done.May 18 2021, 1:31 AM

Thank you @david-arm and @sdesmalen for the review.
I believe I've addressed all your comments.
I've updated the patch based on D102437 because there was a dependency between the cost model and the scalarization.
I could only use predicateWithScalar if I was able to compute the cost model, but the cost model was only able to be computed if it was able to be scalarized, and in this case, that was not possible because of sdiv.


Hey Sander,
I may be wrong, but if the loop allows scalar epilogue then it should check if the instruction has a !mayDivideByZero(*I). If it does not allows scalar epilogue then we do not need to check the instruction, because the loop will not scalarize the epilogue.

// If predication is used for this block and the operation would otherwise
// be guarded, then this requires scalarizing.
if (blockNeedsPredication(I->getParent()) || VectorizeWithPredication)
  return !mayDivideByZero(*I);
sdesmalen added inline comments.May 18 2021, 1:50 AM

If the loop requires no predication, then we know that if the instruction divides by zero then that is an issue caused by the user, who should have avoided this case. If the loop requires predication - either because the user added some if (condition) around the divide, or because the compiler has chosen to fold the tail loop into the vector body and decided to use predication to enable/disable the lanes - then the LV must guarantee the program does not cause different behaviour after vectorizing the code. Because at the moment the LV cannot handle this case yet, it has to fall back on scalarization.

So, if the LV decided to use predication where no predication was needed before and the instruction may divide by zero, then it requires scalarization in order not to change the behaviour of the original program.

The question to ask is "does the loop have instructions that require predication, given that we need predication to handle the tail loop". The "do we need predication to handle the tail loop" part is only true, when the scalar epilogue is not allowed, because then the tail loop is folded into the main vector body.

e.g. 10 iterations with VF=4 without predication (and scalar tail):

1st vector iteration handles 0..3
2nd vector iteration handles 4..7
scalar tail loop handles 8..9

Alternatively, 10 iterations with VF=4 and predication

1st vector iteration handles 0..3, with predicate <true, true, true, true>
2nd vector iteration handles 4..7, with predicate <true, true, true, true>
3rd vector iteration handles 8, 9, xx, xx with predicate <true, true, false, false>
  • Address review's comment. Only check the loop body, that is when !isScalarEpilogueAllowed()
CarolineConcatto marked 2 inline comments as done and an inline comment as not done.May 18 2021, 3:40 AM
CarolineConcatto added inline comments.

Thank you @sdesmalen.
It took me a while to understand, but if I understood correct, now I agree with you.
The test here should only check instructions in the loop body and not in the tail of the loop. I added a comment in case I pass through here again and ask the same question.
I guess this same check(!Legal->canVectorizeWithPredication) will be needed when checking if the epilogue scalar is allowed.

david-arm accepted this revision.May 18 2021, 5:50 AM

LGTM! Looks like you've addressed all the review comments. :)


nit: Maybe "This test corresponds to the following function:"?


nit: Simple typo - it should be guaranteed.

This revision is now accepted and ready to land.May 18 2021, 5:50 AM
fhahn added a subscriber: fhahn.May 18 2021, 6:09 AM
fhahn added inline comments.

Would it be more accurate to say this can only be vectorized by scalarizing the division for now which cannot be done for scalable vectors at the moment?

sdesmalen added inline comments.May 19 2021, 1:12 AM

In the previous iteration you changed this condition, but this didn't affect the test. Can you add a test that exercises this change?


Instead of checking the output to be exactly this, is it sufficient to just check no vector type is being used?
e.g. CHECK-NOT: <vscale x 4>

CarolineConcatto marked an inline comment as done.
  • add test when the loop does not need predication and epilogue not allowed
CarolineConcatto marked 4 inline comments as done.May 19 2021, 3:46 AM
CarolineConcatto added inline comments.

Thank you @sdesmalen
I added the test as you suggested for when the loop does not need predication and the epilogue is not allowed.
This path is only triggered with the flag:


Otherwise isScalarEpilogueAllowed() is true and the loop can vectorize. That is the reason the second test has

; CHECK:  sdiv <vscale x 4 x i32>

Thank you @fhahn,
Is the explanation better now?
I added that it will be possible when llvm.vp is implemented.


Thank you @sdesmalen,
There is no problem. I have changed to check only SDIV.
Is that better?

CarolineConcatto marked 2 inline comments as done.
  • Fix and simplify llvm-ir test when the epilogue is not allowed
sdesmalen added inline comments.May 19 2021, 6:38 AM

I think it'd be good to give this a more generic name, so that we can use it to test for other cases as well. How about loopCanBeWidenedWithScalableVectors which implies the conditions should be negated, and you may want to return some message of why it did not vectorize, e.g.

LoopVectorizationCostModel::loopCanBeWidenedWithScalableVectors(ElementCount MaxVF, std::string &Message)

This is no longer true


nit: predication_in_loop


nit: unpredicated_loop_predication_through_tailfolding



david-arm added inline comments.May 19 2021, 7:23 AM

This is similar to where I moved the canVectorizeReductions call in there too.

CarolineConcatto marked an inline comment as done.
  • Rebase patch bases on D102394 to use canWidenLoopWithScalableVectors
  • Address reviewer's comment on the test
CarolineConcatto marked 5 inline comments as done.May 25 2021, 8:40 AM
CarolineConcatto added inline comments.

Thank you @david-arm and @sdesmalen
In order to not create the same function twice I have rebased this patch based on D102394.
So there is only one canWidenLoopWithScalableVectors.
This patch already has one dependence in D102437.
However, we can remove dependence on D102394, and copy the function if this patch and D102437 are accepted first

david-arm added inline comments.May 25 2021, 9:42 AM

nit: Hi @CarolineConcatto, this is just a suggestion but I think it might be clearer here to say something like:

// Check we are able to vectorize predicated instructions using scalable
// vectors. Such predication may occur if the scalar epilogue is folded into
// the vector loop body.

Not sure what others think?


I wonder if this is potentially misleading as we're not guaranteed to fall back on fixed-width vectorization, particularly if the user applied a C/C++ pragma to the loop?

CarolineConcatto marked an inline comment as done.
  • Address comments
CarolineConcatto marked 2 inline comments as done.May 26 2021, 6:27 AM

Thank you @david-arm for the review.
It is quite difficult to get the comments right, but I think that your comment is better than mine
So I've replaced.


Ok, I removed the part that used to say that it would rely on a fixed-width vector.
Now it has the same message as the others

CarolineConcatto marked an inline comment as done.
  • Rebase patch
  • Remove loopHasPredicatedScalar and add test for scalable in collectInstsToScalarize
sdesmalen added inline comments.Jul 19 2021, 1:38 PM

Can you extend the comment here explaining that the discount logic is not applied when the VF is scalable, because that would lead to invalid scalarization costs.


Can you merge these conditions, i.e. if (!VFScalable() && !useEmulatedMaskMemRefHack(&I) && ..


I think this assert can be removed in favour of using VF.getFixedValue() instead of VF.getKnownMinValue() everywhere in this function.


The RUN lines don't actually enable scalable vectorization (=off by default), so this test always passes.
You'll need to add -scalable-vectorization=preferred.

I'd also recommend moving this function to llvm/test/Transforms/LoopVectorize/AArch64. The -force-target-supports-scalable-vectors flag doesn't work properly, because the BasicTTIImpl costmodel implementation has defaults that must be overridden for scalable auto-vec to work.

CarolineConcatto marked 2 inline comments as done.
  • Add scalable-vectorization=on in the llvm-ir test.
  • Move llvm-ir test to be in AArch64 folder.
  • Remove assertions in computePredInstDiscount and getInstructionCost.
  • Replace getKnownMinValue by getFixedValue() in computePredInstDiscount and getInstructionCost.

Thanks for the changes! Just a few little nits left.


nit: this can just be part of the same expression, i.e.

if (!VF.isScalable() && !useEmulatedMaskMemRefHack(&I) &&
    computePredInstDiscount(&I, ScalarCosts, VF) >= 0)

Can you split out all the changes to use getFixedValue() from this patch and commit those as a separate NFC patch?

1 ↗(On Diff #360091)

nit: Can you remove the -mattr=+sve -mtriple aarch64-unknown-linux-gnu and instead encode it in the IR as:

target triple = aarch64-unknown-linux-gnu

define void @predication_in_loop(...) #0 {

attributes #0 = { "target-features"="+sve" }

Hey @sdesmalen,
Thank you for pointing out the missing flag on the test.
I have moved the test to be inside the AArch64 folder.
One thing I am not sure is the fix I added in getinstructionCost for scalable vector with BR instruction.
I believe it makes sense to not compute the cost of scalarized predicated instruction when VF is scalable, but not sure if the cost is correct.
Or if it should be invalid.


Is it ok if I do:

if (!VF.isScalable() && !useEmulatedMaskMemRefHack(&I))
  if (computePredInstDiscount(&I, ScalarCosts, VF) >= 0)

Because computePredInstDiscount should not be called for scalable vector.

  • Remove the changes from collectInstsToScalarize and getInstructionCost(now in D106359)
  • Add attribute in the llvm-ir test
CarolineConcatto marked 4 inline comments as done.Jul 20 2021, 8:01 AM
sdesmalen added inline comments.Jul 21 2021, 2:04 AM

The cost of the Br should also be invalid, because it tries to calculate the cost as if it's trying to scalarize the operation if ScalarPredicatedBB == true.

  • Make cost invalid for predicated instruction when VF is scalable in getInstructionCost for Instruction::Br
CarolineConcatto marked an inline comment as done.Jul 21 2021, 3:13 AM
sdesmalen accepted this revision.Jul 21 2021, 5:13 AM

LGTM, thanks @CarolineConcatto

Can you please also update the commit message to reflect the new approach and remove the 'Depends on <other patches>' since it no longer depends on either of those?

CarolineConcatto retitled this revision from [LoopVectorize] Fix crash for predicated instructions with scalable VF to [LoopVectorize] Fix crash for predicated instruction with scalable VF.
CarolineConcatto edited the summary of this revision. (Show Details)
  • Update the message with --verbatim
This revision was landed with ongoing or failed builds.Jul 22 2021, 4:50 AM
This revision was automatically updated to reflect the committed changes.