Page MenuHomePhabricator

[LoopVectorize] Fix crash for predicated instructions with scalable VF

Authored by CarolineConcatto on May 5 2021, 9:01 AM.



This patch adds a function that checks if inside a loop there is a scalar
predicated instruction. There is no support for vectorization of loops
with division because the vectorizer cannot guarantee that zero divisions
will not happen.

This patch adds a check in computeFeasibleMaxVF. If the instruction is not
supported for scalable VF, computeFeasibleMaxVF will fall back using
fixed-width vectorization.

This loop now does not use VF scalable

for (long long i = 0; i < n; i++)
    if (cond[i])
      a[i] /= b[i];

Depends on D102437

Depends on D102394

Diff Detail

Event Timeline

CarolineConcatto requested review of this revision.May 5 2021, 9:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2021, 9:01 AM
CarolineConcatto retitled this revision from [LoopVectorize] Fix crach for predicate instruction with scalable VF to [LoopVectorize] Fix crach for predicated instruction with scalable VF.


sdesmalen added inline comments.May 5 2021, 9:44 AM

nit: when the loop contains a predicated instruction that requires scalarization.


nit: loopHasScalarWithPredication


is this condition necessary? I would expect any instructions that explicitly need predication+scalarization, to never be values that can be ignored.


There is a problem with this approach, in that isScalarWithPredication calls blockNeedsPredication, which in turn calls foldTailByMasking, whose return value is determined after calling computeFeasibleMaxVF, which ends up calling this function in order to determine the min/max VF. This may lead to unexpected results when we want to use scalable vectors for tail-folded (predicated) loops, so you'll need to figure out a better place to call loopHasPredicatedScalar.

Additionally, it might be useful to add a mechanism that records if FoldTailByMasking already has its final value, and then assert that in isScalarWithPredication.

david-arm added inline comments.May 6 2021, 12:57 AM

OK that's a good point. It's tricky because we also need to ensure we call this before attempting to calculate any costs as we'll hit asserts in computePredInstDiscount again. If we didn't have to worry about blockNeedsPredication, this seems like a sensible place to bail out because we have to also worry about the general case without hints, i.e. we should really be telling the computeMaxVF code the max legal scalable VF = 0 as I believe this is how you've been recently structuring the code to work?

I realise this is not an ideal solution, but what if for now we could avoid concerns about tail folding by somehow marking tail folding as not an option for scalable vectors, since we don't support it anyway for scalable vectors?

CarolineConcatto retitled this revision from [LoopVectorize] Fix crach for predicated instruction with scalable VF to [LoopVectorize] Fix crash for predicated instructions with scalable VF.May 6 2021, 1:02 AM
Matt added a subscriber: Matt.May 7 2021, 8:35 AM
  • address review's comment about circular dependency with foldTailByMasking
  • add dependency in D102437 because of memory cost model
CarolineConcatto marked 2 inline comments as done.

-remove old comments in loopHasScalarWithPredication

sdesmalen added inline comments.May 18 2021, 12:33 AM

Should this be !isScalarEpilogueAllowed()? i.e. if a scalar epilogue loop is not allowed, then we know predication is required.

CarolineConcatto marked 3 inline comments as done.May 18 2021, 1:31 AM

Thank you @david-arm and @sdesmalen for the review.
I believe I've addressed all your comments.
I've updated the patch based on D102437 because there was a dependency between the cost model and the scalarization.
I could only use predicateWithScalar if I was able to compute the cost model, but the cost model was only able to be computed if it was able to be scalarized, and in this case, that was not possible because of sdiv.


Hey Sander,
I may be wrong, but if the loop allows scalar epilogue then it should check if the instruction has a !mayDivideByZero(*I). If it does not allows scalar epilogue then we do not need to check the instruction, because the loop will not scalarize the epilogue.

// If predication is used for this block and the operation would otherwise
// be guarded, then this requires scalarizing.
if (blockNeedsPredication(I->getParent()) || VectorizeWithPredication)
  return !mayDivideByZero(*I);
sdesmalen added inline comments.May 18 2021, 1:50 AM

If the loop requires no predication, then we know that if the instruction divides by zero then that is an issue caused by the user, who should have avoided this case. If the loop requires predication - either because the user added some if (condition) around the divide, or because the compiler has chosen to fold the tail loop into the vector body and decided to use predication to enable/disable the lanes - then the LV must guarantee the program does not cause different behaviour after vectorizing the code. Because at the moment the LV cannot handle this case yet, it has to fall back on scalarization.

So, if the LV decided to use predication where no predication was needed before and the instruction may divide by zero, then it requires scalarization in order not to change the behaviour of the original program.

The question to ask is "does the loop have instructions that require predication, given that we need predication to handle the tail loop". The "do we need predication to handle the tail loop" part is only true, when the scalar epilogue is not allowed, because then the tail loop is folded into the main vector body.

e.g. 10 iterations with VF=4 without predication (and scalar tail):

1st vector iteration handles 0..3
2nd vector iteration handles 4..7
scalar tail loop handles 8..9

Alternatively, 10 iterations with VF=4 and predication

1st vector iteration handles 0..3, with predicate <true, true, true, true>
2nd vector iteration handles 4..7, with predicate <true, true, true, true>
3rd vector iteration handles 8, 9, xx, xx with predicate <true, true, false, false>
  • Address review's comment. Only check the loop body, that is when !isScalarEpilogueAllowed()
CarolineConcatto marked 2 inline comments as done and an inline comment as not done.May 18 2021, 3:40 AM
CarolineConcatto added inline comments.

Thank you @sdesmalen.
It took me a while to understand, but if I understood correct, now I agree with you.
The test here should only check instructions in the loop body and not in the tail of the loop. I added a comment in case I pass through here again and ask the same question.
I guess this same check(!Legal->canVectorizeWithPredication) will be needed when checking if the epilogue scalar is allowed.

david-arm accepted this revision.May 18 2021, 5:50 AM

LGTM! Looks like you've addressed all the review comments. :)


nit: Maybe "This test corresponds to the following function:"?


nit: Simple typo - it should be guaranteed.

This revision is now accepted and ready to land.May 18 2021, 5:50 AM
fhahn added a subscriber: fhahn.May 18 2021, 6:09 AM
fhahn added inline comments.

Would it be more accurate to say this can only be vectorized by scalarizing the division for now which cannot be done for scalable vectors at the moment?

sdesmalen added inline comments.May 19 2021, 1:12 AM

In the previous iteration you changed this condition, but this didn't affect the test. Can you add a test that exercises this change?


Instead of checking the output to be exactly this, is it sufficient to just check no vector type is being used?
e.g. CHECK-NOT: <vscale x 4>

CarolineConcatto marked an inline comment as done.
  • add test when the loop does not need predication and epilogue not allowed
CarolineConcatto marked 4 inline comments as done.May 19 2021, 3:46 AM
CarolineConcatto added inline comments.

Thank you @sdesmalen
I added the test as you suggested for when the loop does not need predication and the epilogue is not allowed.
This path is only triggered with the flag:


Otherwise isScalarEpilogueAllowed() is true and the loop can vectorize. That is the reason the second test has

; CHECK:  sdiv <vscale x 4 x i32>

Thank you @fhahn,
Is the explanation better now?
I added that it will be possible when llvm.vp is implemented.


Thank you @sdesmalen,
There is no problem. I have changed to check only SDIV.
Is that better?

CarolineConcatto marked 2 inline comments as done.
  • Fix and simplify llvm-ir test when the epilogue is not allowed
sdesmalen added inline comments.May 19 2021, 6:38 AM

I think it'd be good to give this a more generic name, so that we can use it to test for other cases as well. How about loopCanBeWidenedWithScalableVectors which implies the conditions should be negated, and you may want to return some message of why it did not vectorize, e.g.

LoopVectorizationCostModel::loopCanBeWidenedWithScalableVectors(ElementCount MaxVF, std::string &Message)

This is no longer true


nit: predication_in_loop


nit: unpredicated_loop_predication_through_tailfolding



david-arm added inline comments.May 19 2021, 7:23 AM

This is similar to where I moved the canVectorizeReductions call in there too.

CarolineConcatto marked an inline comment as done.
  • Rebase patch bases on D102394 to use canWidenLoopWithScalableVectors
  • Address reviewer's comment on the test
CarolineConcatto marked 5 inline comments as done.Tue, May 25, 8:40 AM
CarolineConcatto added inline comments.

Thank you @david-arm and @sdesmalen
In order to not create the same function twice I have rebased this patch based on D102394.
So there is only one canWidenLoopWithScalableVectors.
This patch already has one dependence in D102437.
However, we can remove dependence on D102394, and copy the function if this patch and D102437 are accepted first

david-arm added inline comments.Tue, May 25, 9:42 AM

nit: Hi @CarolineConcatto, this is just a suggestion but I think it might be clearer here to say something like:

// Check we are able to vectorize predicated instructions using scalable
// vectors. Such predication may occur if the scalar epilogue is folded into
// the vector loop body.

Not sure what others think?


I wonder if this is potentially misleading as we're not guaranteed to fall back on fixed-width vectorization, particularly if the user applied a C/C++ pragma to the loop?

CarolineConcatto marked an inline comment as done.
  • Address comments
CarolineConcatto marked 2 inline comments as done.Wed, May 26, 6:27 AM

Thank you @david-arm for the review.
It is quite difficult to get the comments right, but I think that your comment is better than mine
So I've replaced.


Ok, I removed the part that used to say that it would rely on a fixed-width vector.
Now it has the same message as the others