This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Fix crash for predicated instruction with scalable VF
ClosedPublic

Authored by CarolineConcatto on May 5 2021, 9:01 AM.

Details

Summary

This patch avoids computing discounts for predicated instructions when the
VF is scalable.
There is no support for vectorization of loops with division because the
vectorizer cannot guarantee that zero divisions will not happen.

This loop now does not use VF scalable

for (long long i = 0; i < n; i++)
    if (cond[i])
      a[i] /= b[i];

Diff Detail

Event Timeline

CarolineConcatto requested review of this revision.May 5 2021, 9:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2021, 9:01 AM
CarolineConcatto retitled this revision from [LoopVectorize] Fix crach for predicate instruction with scalable VF to [LoopVectorize] Fix crach for predicated instruction with scalable VF.

-s/crach/crash/

sdesmalen added inline comments.May 5 2021, 9:44 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1282

nit: when the loop contains a predicated instruction that requires scalarization.

1283

nit: loopHasScalarWithPredication

7963–7965

is this condition necessary? I would expect any instructions that explicitly need predication+scalarization, to never be values that can be ignored.

7966

There is a problem with this approach, in that isScalarWithPredication calls blockNeedsPredication, which in turn calls foldTailByMasking, whose return value is determined after calling computeFeasibleMaxVF, which ends up calling this function in order to determine the min/max VF. This may lead to unexpected results when we want to use scalable vectors for tail-folded (predicated) loops, so you'll need to figure out a better place to call loopHasPredicatedScalar.

Additionally, it might be useful to add a mechanism that records if FoldTailByMasking already has its final value, and then assert that in isScalarWithPredication.

david-arm added inline comments.May 6 2021, 12:57 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7966

OK that's a good point. It's tricky because we also need to ensure we call this before attempting to calculate any costs as we'll hit asserts in computePredInstDiscount again. If we didn't have to worry about blockNeedsPredication, this seems like a sensible place to bail out because we have to also worry about the general case without hints, i.e. we should really be telling the computeMaxVF code the max legal scalable VF = 0 as I believe this is how you've been recently structuring the code to work?

I realise this is not an ideal solution, but what if for now we could avoid concerns about tail folding by somehow marking tail folding as not an option for scalable vectors, since we don't support it anyway for scalable vectors?

CarolineConcatto retitled this revision from [LoopVectorize] Fix crach for predicated instruction with scalable VF to [LoopVectorize] Fix crash for predicated instructions with scalable VF.May 6 2021, 1:02 AM
Matt added a subscriber: Matt.May 7 2021, 8:35 AM
  • address review's comment about circular dependency with foldTailByMasking
  • add dependency in D102437 because of memory cost model
CarolineConcatto marked 2 inline comments as done.

-remove old comments in loopHasScalarWithPredication

sdesmalen added inline comments.May 18 2021, 12:33 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5680

Should this be !isScalarEpilogueAllowed()? i.e. if a scalar epilogue loop is not allowed, then we know predication is required.

CarolineConcatto marked 3 inline comments as done.May 18 2021, 1:31 AM

Thank you @david-arm and @sdesmalen for the review.
I believe I've addressed all your comments.
I've updated the patch based on D102437 because there was a dependency between the cost model and the scalarization.
I could only use predicateWithScalar if I was able to compute the cost model, but the cost model was only able to be computed if it was able to be scalarized, and in this case, that was not possible because of sdiv.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5680

Hey Sander,
I may be wrong, but if the loop allows scalar epilogue then it should check if the instruction has a !mayDivideByZero(*I). If it does not allows scalar epilogue then we do not need to check the instruction, because the loop will not scalarize the epilogue.

// If predication is used for this block and the operation would otherwise
// be guarded, then this requires scalarizing.
if (blockNeedsPredication(I->getParent()) || VectorizeWithPredication)
  return !mayDivideByZero(*I);
sdesmalen added inline comments.May 18 2021, 1:50 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5680

If the loop requires no predication, then we know that if the instruction divides by zero then that is an issue caused by the user, who should have avoided this case. If the loop requires predication - either because the user added some if (condition) around the divide, or because the compiler has chosen to fold the tail loop into the vector body and decided to use predication to enable/disable the lanes - then the LV must guarantee the program does not cause different behaviour after vectorizing the code. Because at the moment the LV cannot handle this case yet, it has to fall back on scalarization.

So, if the LV decided to use predication where no predication was needed before and the instruction may divide by zero, then it requires scalarization in order not to change the behaviour of the original program.

The question to ask is "does the loop have instructions that require predication, given that we need predication to handle the tail loop". The "do we need predication to handle the tail loop" part is only true, when the scalar epilogue is not allowed, because then the tail loop is folded into the main vector body.

e.g. 10 iterations with VF=4 without predication (and scalar tail):

1st vector iteration handles 0..3
2nd vector iteration handles 4..7
scalar tail loop handles 8..9

Alternatively, 10 iterations with VF=4 and predication

1st vector iteration handles 0..3, with predicate <true, true, true, true>
2nd vector iteration handles 4..7, with predicate <true, true, true, true>
3rd vector iteration handles 8, 9, xx, xx with predicate <true, true, false, false>
  • Address review's comment. Only check the loop body, that is when !isScalarEpilogueAllowed()
CarolineConcatto marked 2 inline comments as done and an inline comment as not done.May 18 2021, 3:40 AM
CarolineConcatto added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5680

Thank you @sdesmalen.
It took me a while to understand, but if I understood correct, now I agree with you.
The test here should only check instructions in the loop body and not in the tail of the loop. I added a comment in case I pass through here again and ask the same question.
I guess this same check(!Legal->canVectorizeWithPredication) will be needed when checking if the epilogue scalar is allowed.

david-arm accepted this revision.May 18 2021, 5:50 AM

LGTM! Looks like you've addressed all the review comments. :)

llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
7

nit: Maybe "This test corresponds to the following function:"?

14

nit: Simple typo - it should be guaranteed.

This revision is now accepted and ready to land.May 18 2021, 5:50 AM
fhahn added a subscriber: fhahn.May 18 2021, 6:09 AM
fhahn added inline comments.
llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
14

Would it be more accurate to say this can only be vectorized by scalarizing the division for now which cannot be done for scalable vectors at the moment?

sdesmalen added inline comments.May 19 2021, 1:12 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5682

In the previous iteration you changed this condition, but this didn't affect the test. Can you add a test that exercises this change?

llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
17

Instead of checking the output to be exactly this, is it sufficient to just check no vector type is being used?
e.g. CHECK-NOT: <vscale x 4>

CarolineConcatto marked an inline comment as done.
  • add test when the loop does not need predication and epilogue not allowed
CarolineConcatto marked 4 inline comments as done.May 19 2021, 3:46 AM
CarolineConcatto added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5682

Thank you @sdesmalen
I added the test as you suggested for when the loop does not need predication and the epilogue is not allowed.
This path is only triggered with the flag:

prefer-predicate-over-epilogue=predicate-dont-vectorize

Otherwise isScalarEpilogueAllowed() is true and the loop can vectorize. That is the reason the second test has

; CHECK:  sdiv <vscale x 4 x i32>
llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
14

Thank you @fhahn,
Is the explanation better now?
I added that it will be possible when llvm.vp is implemented.

17

Thank you @sdesmalen,
There is no problem. I have changed to check only SDIV.
Is that better?

CarolineConcatto marked 2 inline comments as done.
  • Fix and simplify llvm-ir test when the epilogue is not allowed
sdesmalen added inline comments.May 19 2021, 6:38 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7959

I think it'd be good to give this a more generic name, so that we can use it to test for other cases as well. How about loopCanBeWidenedWithScalableVectors which implies the conditions should be negated, and you may want to return some message of why it did not vectorize, e.g.

LoopVectorizationCostModel::loopCanBeWidenedWithScalableVectors(ElementCount MaxVF, std::string &Message)
llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
2

This is no longer true

18

nit: predication_in_loop

64

nit: unpredicated_loop_predication_through_tailfolding

91

unnecessary?

david-arm added inline comments.May 19 2021, 7:23 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7959

This is similar to https://reviews.llvm.org/D102394 where I moved the canVectorizeReductions call in there too.

CarolineConcatto marked an inline comment as done.
  • Rebase patch bases on D102394 to use canWidenLoopWithScalableVectors
  • Address reviewer's comment on the test
CarolineConcatto marked 5 inline comments as done.May 25 2021, 8:40 AM
CarolineConcatto added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7959

Thank you @david-arm and @sdesmalen
In order to not create the same function twice I have rebased this patch based on D102394.
So there is only one canWidenLoopWithScalableVectors.
This patch already has one dependence in D102437.
However, we can remove dependence on D102394, and copy the function if this patch and D102437 are accepted first

david-arm added inline comments.May 25 2021, 9:42 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5646–5647

nit: Hi @CarolineConcatto, this is just a suggestion but I think it might be clearer here to say something like:

// Check we are able to vectorize predicated instructions using scalable
// vectors. Such predication may occur if the scalar epilogue is folded into
// the vector loop body.

Not sure what others think?

5650

I wonder if this is potentially misleading as we're not guaranteed to fall back on fixed-width vectorization, particularly if the user applied a C/C++ pragma to the loop?

CarolineConcatto marked an inline comment as done.
  • Address comments
CarolineConcatto marked 2 inline comments as done.May 26 2021, 6:27 AM

Thank you @david-arm for the review.
It is quite difficult to get the comments right, but I think that your comment is better than mine
So I've replaced.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5650

Ok, I removed the part that used to say that it would rely on a fixed-width vector.
Now it has the same message as the others

CarolineConcatto marked an inline comment as done.
  • Rebase patch
  • Remove loopHasPredicatedScalar and add test for scalable in collectInstsToScalarize
sdesmalen added inline comments.Jul 19 2021, 1:38 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6827

Can you extend the comment here explaining that the discount logic is not applied when the VF is scalable, because that would lead to invalid scalarization costs.

6828–6829

Can you merge these conditions, i.e. if (!VFScalable() && !useEmulatedMaskMemRefHack(&I) && ..

6917

I think this assert can be removed in favour of using VF.getFixedValue() instead of VF.getKnownMinValue() everywhere in this function.

llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
1

The RUN lines don't actually enable scalable vectorization (=off by default), so this test always passes.
You'll need to add -scalable-vectorization=preferred.

I'd also recommend moving this function to llvm/test/Transforms/LoopVectorize/AArch64. The -force-target-supports-scalable-vectors flag doesn't work properly, because the BasicTTIImpl costmodel implementation has defaults that must be overridden for scalable auto-vec to work.

CarolineConcatto marked 2 inline comments as done.
  • Add scalable-vectorization=on in the llvm-ir test.
  • Move llvm-ir test to be in AArch64 folder.
  • Remove assertions in computePredInstDiscount and getInstructionCost.
  • Replace getKnownMinValue by getFixedValue() in computePredInstDiscount and getInstructionCost.

Thanks for the changes! Just a few little nits left.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6829

nit: this can just be part of the same expression, i.e.

if (!VF.isScalable() && !useEmulatedMaskMemRefHack(&I) &&
    computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
6908

Can you split out all the changes to use getFixedValue() from this patch and commit those as a separate NFC patch?

llvm/test/Transforms/LoopVectorize/AArch64/scalable-predicate-instruction.ll
1 ↗(On Diff #360091)

nit: Can you remove the -mattr=+sve -mtriple aarch64-unknown-linux-gnu and instead encode it in the IR as:

target triple = aarch64-unknown-linux-gnu

define void @predication_in_loop(...) #0 {
  ..
}

attributes #0 = { "target-features"="+sve" }

Hey @sdesmalen,
Thank you for pointing out the missing flag on the test.
I have moved the test to be inside the AArch64 folder.
One thing I am not sure is the fix I added in getinstructionCost for scalable vector with BR instruction.
I believe it makes sense to not compute the cost of scalarized predicated instruction when VF is scalable, but not sure if the cost is correct.
Or if it should be invalid.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6828–6829

Is it ok if I do:

if (!VF.isScalable() && !useEmulatedMaskMemRefHack(&I))
  if (computePredInstDiscount(&I, ScalarCosts, VF) >= 0)

Because computePredInstDiscount should not be called for scalable vector.

  • Remove the changes from collectInstsToScalarize and getInstructionCost(now in D106359)
  • Add attribute in the llvm-ir test
CarolineConcatto marked 4 inline comments as done.Jul 20 2021, 8:01 AM
sdesmalen added inline comments.Jul 21 2021, 2:04 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7625

The cost of the Br should also be invalid, because it tries to calculate the cost as if it's trying to scalarize the operation if ScalarPredicatedBB == true.

  • Make cost invalid for predicated instruction when VF is scalable in getInstructionCost for Instruction::Br
CarolineConcatto marked an inline comment as done.Jul 21 2021, 3:13 AM
sdesmalen accepted this revision.Jul 21 2021, 5:13 AM

LGTM, thanks @CarolineConcatto

Can you please also update the commit message to reflect the new approach and remove the 'Depends on <other patches>' since it no longer depends on either of those?

CarolineConcatto retitled this revision from [LoopVectorize] Fix crash for predicated instructions with scalable VF to [LoopVectorize] Fix crash for predicated instruction with scalable VF.
CarolineConcatto edited the summary of this revision. (Show Details)
  • Update the message with --verbatim
This revision was landed with ongoing or failed builds.Jul 22 2021, 4:50 AM
This revision was automatically updated to reflect the committed changes.