This is an archive of the discontinued LLVM Phabricator instance.


Differential D101916

[LoopVectorize] Fix crash for predicated instruction with scalable VF
Closed, Public

Authored by CarolineConcatto on May 5 2021, 9:01 AM.


Details

Reviewers

sdesmalen
david-arm
kmclaughlin
Florian
dmgreen

Commits

rG5a4de84d55fa: [LoopVectorize] Fix crash for predicated instruction with scalable VF

Summary

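This patch avoids computing discounts for predicated instructions when the
VF is scalable.
Loops containing a predicated division cannot yet be vectorized with a
scalable VF: the vectorizer cannot guarantee that a division by zero will not
happen in masked-off lanes, so it has to scalarize the division, and
scalarization is not possible for scalable vectors.

With this patch, the following loop does not use a scalable VF: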

for (long long i = 0; i < n; i++)
    if (cond[i])
      a[i] /= b[i];
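To make the summary concrete, the following C++ sketch (a plain emulation, not LLVM code; all function names are hypothetical) compares the scalar loop with a lane-by-lane "scalarized" version, and checks whether dividing unconditionally in every lane would hit a zero divisor that the scalar loop never touches. The template parameter VF stands in for the known lane count that scalarization needs; a scalable VF (N x vscale, with vscale unknown until run time) cannot provide one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference loop: the division only runs when cond[i] is set.
void scalarLoop(std::vector<int> &a, const std::vector<int> &b,
                const std::vector<bool> &cond) {
  assert(b.size() == a.size() && cond.size() == a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    if (cond[i])
      a[i] /= b[i];
}

// True if dividing unconditionally in every lane would divide by zero in a
// lane that the scalar loop never divides in (a masked-off lane).
bool widenedDivideWouldTrap(const std::vector<int> &b,
                            const std::vector<bool> &cond) {
  assert(cond.size() == b.size());
  for (std::size_t i = 0; i < b.size(); ++i)
    if (!cond[i] && b[i] == 0)
      return true;
  return false;
}

// Scalarized predication: each vector iteration covers VF lanes, and each
// lane divides only if its predicate bit is set. Generated code would contain
// one scalar copy of the division per lane, which needs VF to be a known
// constant; here that is modeled by the template parameter.
template <std::size_t VF>
void scalarizedPredicatedLoop(std::vector<int> &a, const std::vector<int> &b,
                              const std::vector<bool> &cond) {
  static_assert(VF > 0, "VF must be a non-zero lane count");
  assert(b.size() == a.size() && cond.size() == a.size());
  for (std::size_t base = 0; base < a.size(); base += VF)
    for (std::size_t lane = 0; lane < VF; ++lane) {
      std::size_t i = base + lane;
      if (i < a.size() && cond[i]) // tail lanes and false lanes do nothing
        a[i] /= b[i];
    }
}
```

Calling `scalarizedPredicatedLoop<4>(a, b, cond)` produces the same result as `scalarLoop(a, b, cond)`, while `widenedDivideWouldTrap(b, cond)` reports when a naive widened divide would change the program's behaviour.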

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

CarolineConcatto created this revision. May 5 2021, 9:01 AM

Herald added a subscriber: hiraditya. May 5 2021, 9:01 AM

CarolineConcatto requested review of this revision. May 5 2021, 9:01 AM

Herald added a project: Restricted Project. May 5 2021, 9:01 AM

Herald added a subscriber: llvm-commits.

CarolineConcatto added reviewers: sdesmalen, david-arm, kmclaughlin, Florian, dmgreen. May 5 2021, 9:06 AM

CarolineConcatto retitled this revision from [LoopVectorize] Fix crach for predicate instruction with scalable VF to [LoopVectorize] Fix crach for predicated instruction with scalable VF.

-s/crach/crash/

sdesmalen added inline comments. May 5 2021, 9:44 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1266	nit: when the loop contains a predicated instruction that requires scalarization.
1267	nit: `loopHasScalarWithPredication`
7800–7802	Is this condition necessary? I would expect any instructions that explicitly need predication+scalarization to never be values that can be ignored.
7803	There is a problem with this approach, in that `isScalarWithPredication` calls `blockNeedsPredication`, which in turn calls `foldTailByMasking`, whose return value is determined after calling `computeFeasibleMaxVF`, which ends up calling this function in order to determine the min/max VF. This may lead to unexpected results when we want to use scalable vectors for tail-folded (predicated) loops, so you'll need to figure out a better place to call `loopHasPredicatedScalar`. Additionally, it might be useful to add a mechanism that records if FoldTailByMasking already has its final value, and then assert that in `isScalarWithPredication`.
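A minimal sketch of the mechanism suggested above, assuming C++17 `std::optional` to represent "not decided yet". `CostModelSketch` and its members are hypothetical and only loosely mirror `foldTailByMasking` and `blockNeedsPredication`; the point is that a query made before the decision is final asserts instead of silently using a value that may still change.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, heavily simplified cost model that records whether the
// tail-folding decision has been finalized.
class CostModelSketch {
  // Empty until the tail-folding decision is final.
  std::optional<bool> FoldTailByMasking;

public:
  void setFoldTailByMasking(bool Fold) {
    assert(!FoldTailByMasking.has_value() && "decision already made");
    FoldTailByMasking = Fold;
  }

  bool isFoldTailByMaskingFinal() const {
    return FoldTailByMasking.has_value();
  }

  bool foldTailByMasking() const {
    assert(FoldTailByMasking.has_value() &&
           "tail folding queried before the decision was final");
    return *FoldTailByMasking;
  }

  // Stand-in for blockNeedsPredication: with tail folding every block is
  // predicated; otherwise only blocks that were conditional in the original
  // loop are. foldTailByMasking() is evaluated first, so the assert always
  // fires if the decision is not final yet.
  bool blockNeedsPredication(bool BlockIsConditional) const {
    return foldTailByMasking() || BlockIsConditional;
  }
};
```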

Harbormaster completed remote builds in B102769: Diff 343077. May 5 2021, 10:04 AM

david-arm added inline comments. May 6 2021, 12:57 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7803	OK that's a good point. It's tricky because we also need to ensure we call this before attempting to calculate any costs as we'll hit asserts in computePredInstDiscount again. If we didn't have to worry about `blockNeedsPredication`, this seems like a sensible place to bail out because we have to also worry about the general case without hints, i.e. we should really be telling the computeMaxVF code the max legal scalable VF = 0 as I believe this is how you've been recently structuring the code to work? I realise this is not an ideal solution, but what if for now we could avoid concerns about tail folding by somehow marking tail folding as not an option for scalable vectors, since we don't support it anyway for scalable vectors?

CarolineConcatto retitled this revision from [LoopVectorize] Fix crach for predicated instruction with scalable VF to [LoopVectorize] Fix crash for predicated instructions with scalable VF. May 6 2021, 1:02 AM

Matt added a subscriber: Matt. May 7 2021, 8:35 AM

Address review comment about the circular dependency with foldTailByMasking.
Add a dependency on D102437 because of the memory cost model.

- Remove old comments in loopHasScalarWithPredication.

sdesmalen added inline comments. May 18 2021, 12:33 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5644	Should this be `!isScalarEpilogueAllowed()`? i.e. if a scalar epilogue loop is not allowed, then we know predication is required.

Harbormaster completed remote builds in B104948: Diff 346063. May 18 2021, 12:40 AM

Thank you @david-arm and @sdesmalen for the review.
I believe I've addressed all your comments.
I've updated the patch to build on D102437 because there was a dependency between the cost model and the scalarization.
I could only use predicateWithScalar if the cost model could be computed, but the cost model could only be computed if the instruction could be scalarized, which in this case was not possible because of the sdiv.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5644	Hey Sander, I may be wrong, but if the loop allows a scalar epilogue then it should check whether the instruction satisfies !mayDivideByZero(I). If it does not allow a scalar epilogue then we do not need to check the instruction, because the loop will not scalarize the epilogue. // If predication is used for this block and the operation would otherwise // be guarded, then this requires scalarizing. if (blockNeedsPredication(I->getParent()) || VectorizeWithPredication) return !mayDivideByZero(I);
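The quoted check can be read as a small decision table. The sketch below reduces it to three booleans; `canWidenPredicatedSketch` is a hypothetical name, and the real LLVM code works on an `Instruction*` rather than flags. In the diff of this revision, the second input is passed as `!isScalarEpilogueAllowed()`, i.e. "the tail is folded into the vector body".

```cpp
#include <cassert>

// Hypothetical reduction of the quoted logic to three inputs:
//   BlockNeedsPredication    - the instruction is in a conditionally executed
//                              block of the original loop.
//   VectorizeWithPredication - the tail is folded into the vector body, so
//                              every block is predicated.
//   MayDivideByZero          - the instruction is a division whose divisor is
//                              not known to be non-zero.
bool canWidenPredicatedSketch(bool BlockNeedsPredication,
                              bool VectorizeWithPredication,
                              bool MayDivideByZero) {
  // If predication is used for this block and the operation would otherwise
  // be guarded, widening it is only safe if it cannot trap.
  if (BlockNeedsPredication || VectorizeWithPredication)
    return !MayDivideByZero;
  // No predication: a division by zero would already occur in the scalar
  // loop, so widening does not change the program's behaviour.
  return true;
}
```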

sdesmalen added inline comments. May 18 2021, 1:50 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5644	If the loop requires no predication, then we know that if the instruction divides by zero then that is an issue caused by the user, who should have avoided this case. If the loop requires predication - either because the user added some `if (condition)` around the divide, or because the compiler has chosen to fold the tail loop into the vector body and decided to use predication to enable/disable the lanes - then the LV must guarantee the program does not cause different behaviour after vectorizing the code. Because at the moment the LV cannot handle this case yet, it has to fall back on scalarization. So, if the LV decided to use predication where no predication was needed before and the instruction may divide by zero, then it requires scalarization in order not to change the behaviour of the original program. The question to ask is "does the loop have instructions that require predication, given that we need predication to handle the tail loop". The "do we need predication to handle the tail loop" part is only `true` when the scalar epilogue is not allowed, because then the tail loop is folded into the main vector body. For example, 10 iterations with VF=4 without predication (and a scalar tail): the 1st vector iteration handles 0..3; the 2nd vector iteration handles 4..7; the scalar tail loop handles 8..9. Alternatively, 10 iterations with VF=4 and predication: the 1st vector iteration handles 0..3 with predicate <true, true, true, true>; the 2nd vector iteration handles 4..7 with predicate <true, true, true, true>; the 3rd vector iteration handles 8, 9, xx, xx with predicate <true, true, false, false>.
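The two schedules in the example above can be reproduced with a small C++ sketch (hypothetical helpers, not LLVM code). With tail folding, lane L of the vector iteration starting at index Base is active iff Base + L < TripCount; without it, TripCount % VF iterations are left for the scalar epilogue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-iteration lane masks when the scalar tail is folded into the vector
// body: one mask of VF bits per vector iteration.
std::vector<std::vector<bool>> tailFoldingMasks(std::size_t TripCount,
                                                std::size_t VF) {
  assert(VF > 0 && "VF must be non-zero");
  std::vector<std::vector<bool>> Masks;
  for (std::size_t Base = 0; Base < TripCount; Base += VF) {
    std::vector<bool> Mask(VF);
    for (std::size_t L = 0; L < VF; ++L)
      Mask[L] = Base + L < TripCount; // lanes past the trip count are off
    Masks.push_back(Mask);
  }
  return Masks;
}

// Without tail folding, the vector loop runs TripCount / VF full iterations
// and the scalar epilogue handles the remaining iterations.
std::size_t scalarEpilogueIterations(std::size_t TripCount, std::size_t VF) {
  assert(VF > 0 && "VF must be non-zero");
  return TripCount % VF;
}
```

For 10 iterations and VF=4, `tailFoldingMasks(10, 4)` yields three masks, the last being <true, true, false, false>, and `scalarEpilogueIterations(10, 4)` is 2 (iterations 8 and 9).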

Address review comment: only check the loop body, i.e. when !isScalarEpilogueAllowed().

CarolineConcatto marked 2 inline comments as done and an inline comment as not done. May 18 2021, 3:40 AM

CarolineConcatto added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5644	Thank you @sdesmalen. It took me a while to understand, but if I understood correctly, I now agree with you. The test here should only check instructions in the loop body and not in the tail of the loop. I added a comment in case I pass through here again and ask the same question. I guess this same check (!Legal->canVectorizeWithPredication) will be needed when checking whether the scalar epilogue is allowed.

Harbormaster completed remote builds in B104978: Diff 346104. May 18 2021, 3:54 AM

LGTM! Looks like you've addressed all the review comments. :)

llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
7	nit: Maybe "This test corresponds to the following function:"?
14	nit: Simple typo - it should be `guaranteed`.

This revision is now accepted and ready to land. May 18 2021, 5:50 AM

fhahn added a subscriber: fhahn. May 18 2021, 6:09 AM

fhahn added inline comments.

llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
14	Would it be more accurate to say this can only be vectorized by scalarizing the division for now which cannot be done for scalable vectors at the moment?

sdesmalen added inline comments. May 19 2021, 1:12 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5646	In the previous iteration you changed this condition, but this didn't affect the test. Can you add a test that exercises this change?
llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
17	Instead of checking the output to be exactly this, is it sufficient to just check no vector type is being used? e.g. CHECK-NOT: <vscale x 4>

Add test for when the loop does not need predication and the epilogue is not allowed.

Harbormaster completed remote builds in B105184: Diff 346387. May 19 2021, 3:42 AM

CarolineConcatto marked 4 inline comments as done. May 19 2021, 3:46 AM

CarolineConcatto added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5646	Thank you @sdesmalen. As you suggested, I added a test for when the loop does not need predication and the epilogue is not allowed. This path is only triggered with the flag prefer-predicate-over-epilogue=predicate-dont-vectorize; otherwise isScalarEpilogueAllowed() is true and the loop can be vectorized. That is why the second test has ; CHECK: sdiv <vscale x 4 x i32>
llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
14	Thank you @fhahn. Is the explanation better now? I added that it will be possible once llvm.vp is implemented.
17	Thank you @sdesmalen, no problem. I have changed the test to check only the sdiv. Is that better?

Fix and simplify llvm-ir test when the epilogue is not allowed

Harbormaster completed remote builds in B105217: Diff 346428. May 19 2021, 6:26 AM

sdesmalen added inline comments. May 19 2021, 6:38 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7796	I think it'd be good to give this a more generic name, so that we can use it to test for other cases as well. How about `loopCanBeWidenedWithScalableVectors` which implies the conditions should be negated, and you may want to return some message of why it did not vectorize, e.g. LoopVectorizationCostModel::loopCanBeWidenedWithScalableVectors(ElementCount MaxVF, std::string &Message)
llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
2	This is no longer true
18	nit: predication_in_loop
64	nit: unpredicated_loop_predication_through_tailfolding
91	unnecessary?

david-arm added inline comments. May 19 2021, 7:23 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7796	This is similar to https://reviews.llvm.org/D102394 where I moved the canVectorizeReductions call in there too.
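The shape suggested above, a single check that returns false together with a human-readable reason, can be sketched as follows. `InstSketch` and its two flags are hypothetical simplifications (the real function inspects `llvm::Instruction` objects and checks reductions separately); the two message strings are the ones used in the diff of this revision.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified description of an instruction; not an LLVM type.
struct InstSketch {
  bool IsUnsupportedReduction = false;
  bool NeedsPredicatedScalarization = false;
};

// Returns false on the first blocker found and fills Msg with the remark
// text that the caller reports; returns true (Msg untouched) otherwise.
bool canWidenLoopWithScalableVectorsSketch(const std::vector<InstSketch> &Body,
                                           std::string &Msg) {
  for (const InstSketch &I : Body) {
    if (I.IsUnsupportedReduction) {
      Msg = "Scalable vectorization not supported for the reduction "
            "operations found in this loop.";
      return false;
    }
    if (I.NeedsPredicatedScalarization) {
      Msg = "Scalable vectorization not supported for predicated operations "
            "found in this loop.";
      return false;
    }
  }
  return true;
}
```

Keeping every scalable-vector legality check behind one entry point means new restrictions only add another early return with its own message.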

Rebase patch on D102394 to use canWidenLoopWithScalableVectors
Address reviewer's comments on the test

Harbormaster completed remote builds in B106097: Diff 347686. May 25 2021, 8:29 AM

CarolineConcatto edited the summary of this revision. May 25 2021, 8:30 AM

CarolineConcatto added parent revisions: D102437: [LV] NFC: Decouple foldTailByMasking from isScalarWithPredication., D102394: [LoopVectorize] Don't attempt to widen certain calls for scalable vectors.

CarolineConcatto marked 5 inline comments as done. May 25 2021, 8:40 AM

CarolineConcatto added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7796	Thank you @david-arm and @sdesmalen. In order not to create the same function twice, I have rebased this patch on D102394, so there is only one canWidenLoopWithScalableVectors. This patch already depends on D102437. However, we can remove the dependency on D102394 and copy the function if this patch and D102437 are accepted first.

david-arm added inline comments. May 25 2021, 9:42 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5594–5595	nit: Hi @CarolineConcatto, this is just a suggestion but I think it might be clearer here to say something like: // Check we are able to vectorize predicated instructions using scalable // vectors. Such predication may occur if the scalar epilogue is folded into // the vector loop body. Not sure what others think?
5598	I wonder if this is potentially misleading as we're not guaranteed to fall back on fixed-width vectorization, particularly if the user applied a C/C++ pragma to the loop?

Address comments

Harbormaster completed remote builds in B106285: Diff 347947. May 26 2021, 6:22 AM

Thank you @david-arm for the review.
It is quite difficult to get the comments right, but I think that your comment is better than mine,
so I've replaced it.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5598	Ok, I removed the part that said it would fall back on fixed-width vectorization. Now it has the same message as the others.

CarolineConcatto mentioned this in D103180: [InstSimplify] Add constant fold for extractelement + splat for scalable vectors. May 26 2021, 9:41 AM

sdesmalen mentioned this in D102253: [LV] Prevent vectorization with unsupported element types. Jun 8 2021, 9:58 AM

Rebase patch
Remove loopHasPredicatedScalar and add test for scalable in collectInstsToScalarize

Harbormaster completed remote builds in B114886: Diff 359826. Jul 19 2021, 9:43 AM

sdesmalen added inline comments. Jul 19 2021, 1:38 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6687–6689	Can you extend the comment here explaining that the discount logic is not applied when the VF is scalable, because that would lead to invalid scalarization costs.
6690–6691	Can you merge these conditions, i.e. `if (!VFScalable() && !useEmulatedMaskMemRefHack(&I) && ..`
6776	I think this assert can be removed in favour of using `VF.getFixedValue()` instead of `VF.getKnownMinValue()` everywhere in this function.
llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
2	The RUN lines don't actually enable scalable vectorization (=off by default), so this test always passes. You'll need to add `-scalable-vectorization=preferred`. I'd also recommend moving this function to llvm/test/Transforms/LoopVectorize/AArch64. The `-force-target-supports-scalable-vectors` flag doesn't work properly, because the BasicTTIImpl costmodel implementation has defaults that must be overridden for scalable auto-vec to work.
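Why `getFixedValue()` can replace the explicit `!VF.isScalable()` assert: the sketch below is a hypothetical miniature of an element count (modeled loosely on LLVM's `ElementCount`, not the real class) in which `getKnownMinValue()` always answers, but is only a lower bound for scalable counts, while `getFixedValue()` asserts that the count is fixed. `scalarizedCost` is a hypothetical stand-in for the "one scalar copy per lane" cost in `computePredInstDiscount`.

```cpp
#include <cassert>

// Hypothetical miniature: either exactly N lanes (fixed) or N x vscale lanes
// (scalable, vscale unknown at compile time).
class ElementCountSketch {
  unsigned MinVal;
  bool Scalable;
  ElementCountSketch(unsigned MinVal, bool Scalable)
      : MinVal(MinVal), Scalable(Scalable) {}

public:
  static ElementCountSketch getFixed(unsigned N) { return {N, false}; }
  static ElementCountSketch getScalable(unsigned N) { return {N, true}; }

  bool isScalable() const { return Scalable; }

  // Always answers; for scalable counts this is only a lower bound.
  unsigned getKnownMinValue() const { return MinVal; }

  // Only meaningful for fixed counts, so it carries the assert itself.
  unsigned getFixedValue() const {
    assert(!Scalable && "request for a fixed count on a scalable object");
    return MinVal;
  }
};

// Scalarization cost: one scalar copy of the instruction per lane. Using
// getFixedValue() makes a scalable VF trip an assert instead of silently
// costing only the known-minimum number of lanes.
unsigned scalarizedCost(ElementCountSketch VF, unsigned ScalarInstCost) {
  return VF.getFixedValue() * ScalarInstCost;
}
```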

Add scalable-vectorization=on in the llvm-ir test.
Move llvm-ir test to be in AArch64 folder.
Remove assertions in computePredInstDiscount and getInstructionCost.
Replace getKnownMinValue() with getFixedValue() in computePredInstDiscount and getInstructionCost.

Harbormaster completed remote builds in B115074: Diff 360091. Jul 20 2021, 5:17 AM

Thanks for the changes! Just a few little nits left.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6691	nit: this can just be part of the same expression, i.e. if (!VF.isScalable() && !useEmulatedMaskMemRefHack(&I) && computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
6767	Can you split out all the changes to use `getFixedValue()` from this patch and commit those as a separate NFC patch?
llvm/test/Transforms/LoopVectorize/AArch64/scalable-predicate-instruction.ll
1 ↗	(On Diff #360091)	nit: Can you remove the `-mattr=+sve -mtriple aarch64-unknown-linux-gnu` and instead encode it in the IR as: target triple = aarch64-unknown-linux-gnu define void @predication_in_loop(...) #0 { .. } attributes #0 = { "target-features"="+sve" }

Hey @sdesmalen,
Thank you for pointing out the missing flag on the test.
I have moved the test to be inside the AArch64 folder.
One thing I am not sure about is the fix I added in getInstructionCost for scalable vectors with the Br instruction.
I believe it makes sense not to compute the cost of a scalarized predicated instruction when the VF is scalable, but I am not sure whether the cost is correct,
or whether it should be Invalid.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6690–6691	Is it ok if I do: if (!VF.isScalable() && !useEmulatedMaskMemRefHack(&I)) if (computePredInstDiscount(&I, ScalarCosts, VF) >= 0), because computePredInstDiscount should not be called for scalable vectors?
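On the question above: in C++, `&&` evaluates its operands left to right and stops at the first false one, so the merged condition already guarantees that the discount computation never runs for a scalable VF. The sketch below (hypothetical function names, with a call counter standing in for `computePredInstDiscount`) shows that the merged and nested forms behave identically.

```cpp
#include <cassert>

// Counts how often the (hypothetical) discount computation runs.
static int DiscountCalls = 0;

int computePredInstDiscountSketch() {
  ++DiscountCalls;
  return 1;
}

// Merged form from the review: evaluation stops at !Scalable when the VF is
// scalable, so the discount is never computed in that case.
bool applyDiscountMerged(bool Scalable, bool UseHack) {
  return !Scalable && !UseHack && computePredInstDiscountSketch() >= 0;
}

// Nested form: same result and same number of discount computations.
bool applyDiscountNested(bool Scalable, bool UseHack) {
  if (!Scalable && !UseHack)
    if (computePredInstDiscountSketch() >= 0)
      return true;
  return false;
}
```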

sdesmalen mentioned this in D106359: [NFC][LoopVectorizer] Remove VF.isScalable() assertion from collectInstsToScalarize and getInstructionCost. Jul 20 2021, 6:46 AM

Remove the changes from collectInstsToScalarize and getInstructionCost (now in D106359)
Add attribute in the llvm-ir test

Harbormaster completed remote builds in B115103: Diff 360135. Jul 20 2021, 7:59 AM

CarolineConcatto marked 4 inline comments as done. Jul 20 2021, 8:01 AM

sdesmalen added inline comments. Jul 21 2021, 2:04 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7468	The cost of the `Br` should also be invalid, because it tries to calculate the cost as if it's trying to scalarize the operation if ScalarPredicatedBB == true.
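A sketch of what an invalid branch cost means here. LLVM has its own `InstructionCost` type with an Invalid state; this sketch uses `std::optional<unsigned>` instead (empty meaning "invalid, do not choose this VF"), and the per-lane formula (one predicate-bit extract plus one branch per lane) is an approximation for illustration only. The function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <optional>

// An empty optional plays the role of an Invalid cost.
using CostSketch = std::optional<unsigned>;

// Branch cost in a vectorized loop. If the predicated block is scalarized
// (ScalarPredicatedBB), each lane needs its own extract-and-branch, which can
// only be counted when the lane count is fixed.
CostSketch branchCostSketch(unsigned KnownMinLanes, bool Scalable,
                            bool ScalarPredicatedBB, unsigned BranchCost,
                            unsigned ExtractCost) {
  if (!ScalarPredicatedBB)
    return BranchCost; // a single branch, no per-lane scalarization
  if (Scalable)
    return std::nullopt; // Invalid: the number of lanes is unknown
  return KnownMinLanes * (ExtractCost + BranchCost);
}
```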

Make cost invalid for predicated instruction when VF is scalable in getInstructionCost for Instruction::Br

Harbormaster completed remote builds in B115274: Diff 360399. Jul 21 2021, 3:09 AM

CarolineConcatto marked an inline comment as done. Jul 21 2021, 3:13 AM

LGTM, thanks @CarolineConcatto

Can you please also update the commit message to reflect the new approach and remove the 'Depends on <other patches>' since it no longer depends on either of those?

CarolineConcatto mentioned this in D106536: [LoopVectorize] Fix crash for predicated instruction with scalable VF. Jul 22 2021, 3:48 AM

Update the message with --verbatim

Harbormaster completed remote builds in B115532: Diff 360758. Jul 22 2021, 3:50 AM

This revision was landed with ongoing or failed builds. Jul 22 2021, 4:50 AM

Closed by commit rG5a4de84d55fa: [LoopVectorize] Fix crash for predicated instruction with scalable VF (authored by CarolineConcatto).

This revision was automatically updated to reflect the committed changes.

CarolineConcatto added a commit: rG5a4de84d55fa: [LoopVectorize] Fix crash for predicated instruction with scalable VF.

Revision Contents

Path	Size

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	8 lines

llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll	91 lines

Diff 347947

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


[1,257 lines hidden]	public:
   std::pair<unsigned, unsigned> getSmallestAndWidestTypes();

   /// \return The desired interleave count.
   /// If interleave count has been specified by metadata it will be returned.
   /// Otherwise, the interleave count is computed and returned. VF and LoopCost
   /// are the selected vectorization factor and the cost of the selected VF.
   unsigned selectInterleaveCount(ElementCount VF, unsigned LoopCost);

   /// Memory access instruction may be vectorized in more than one way.
   /// Form of instruction after vectorization depends on cost.
   /// This function takes cost-based decisions for Load/Store instructions
   /// and collects them in a map. This decisions map is used for building
   /// the lists of loop-uniform and loop-scalar instructions.
   /// The calculated cost is saved with widening decision in order to
   /// avoid redundant calculations.
   void setCostBasedWideningDecision(ElementCount VF);

   /// A struct that represents some properties of the register usage
[4,310 lines hidden]	Msg = "Scalable vectorization not supported for the reduction "
           "operations found in this loop.";
     return false;
   }

   // Iterate through all instructions in the loop ensuring that is legal to
   // vectorize with a scalable VF.
   for (BasicBlock *BB : TheLoop->blocks()) {
     for (Instruction &I : *BB) {
+      // Check we are able to vectorize predicated instructions using scalable
+      // vectors. Such predication may occur if the scalar epilogue is folded
+      // into the vector loop body.
+      if (!Legal->canVectorizeWithPredication(&I, !isScalarEpilogueAllowed())) {
+        Msg = "Scalable vectorization not supported for predicated operations "
+              "found in this loop.";
+        return false;
+      }
       if (auto *CI = dyn_cast<CallInst>(&I)) {
         Intrinsic::ID VecID = getVectorIntrinsicIDForCall(CI, TLI);

         // First check if it's always legal to widen this intrinsic regardless
         // of the scalable VF, i.e. we don't have to worry about scalarizing
         // the intrinsic.
         if (VecID && TTI.isLegalScalableVectorIntrinsic(VecID))
           continue;
[26 lines hidden]	auto MaxScalableVF = ElementCount::getScalable(
       std::numeric_limits<ElementCount::ScalarTy>::max());

   StringRef Msg;
   if (!canWidenLoopWithScalableVectors(MaxScalableVF, Msg)) {
     reportVectorizationInfo(Msg, "ScalableVFUnfeasible", ORE, TheLoop);
     return ElementCount::getScalable(0);
   }

   if (Legal->isSafeForAnyVectorWidth())
     return MaxScalableVF;

   // Limit MaxScalableVF by the maximum safe dependence distance.
   Optional<unsigned> MaxVScale = TTI.getMaxVScale();
   MaxScalableVF = ElementCount::getScalable(
       MaxVScale ? (MaxSafeElements / MaxVScale.getValue()) : 0);
   if (!MaxScalableVF)
     reportVectorizationInfo(
         "Max legal vector width too small, scalable vectorization "
         "unfeasible.",
[1,024 lines hidden]	void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) {
   // If so, we also record the instructions to scalarize.
   for (BasicBlock *BB : TheLoop->blocks()) {
     if (!Legal->blockNeedsPredication(BB) && !foldTailByMasking())
       continue;
     for (Instruction &I : *BB)
       if (isScalarWithPredication(&I)) {
         ScalarCostsTy ScalarCosts;
         // Do not apply discount logic if hacked cost is needed
         // for emulated masked memrefs.
         if (!useEmulatedMaskMemRefHack(&I) &&
             computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
           ScalarCostsVF.insert(ScalarCosts.begin(), ScalarCosts.end());
         // Remember that BB will remain after vectorization.
         PredicatedBBsAfterVectorization.insert(BB);
       }
   }
 }

 int LoopVectorizationCostModel::computePredInstDiscount(
     Instruction *PredInst, ScalarCostsTy &ScalarCosts, ElementCount VF) {
   assert(!isUniformAfterVectorization(PredInst, VF) &&
[59 lines hidden]	while (!Worklist.empty()) {
     InstructionCost VectorCost = getInstructionCost(I, VF).first;

     // Compute the cost of the scalarized instruction. This cost is the cost of
     // the instruction as if it wasn't if-converted and instead remained in the
     // predicated block. We will scale this cost by block probability after
     // computing the scalarization overhead.
     assert(!VF.isScalable() && "scalable vectors not yet supported.");
     InstructionCost ScalarCost =
         VF.getKnownMinValue() *
		sdesmalenUnsubmitted Done Reply Inline Actions Can you split out all the changes to use `getFixedValue()` from this patch and commit those as a separate NFC patch? sdesmalen: Can you split out all the changes to use `getFixedValue()` from this patch and commit those as…
getInstructionCost(I, ElementCount::getFixed(1)).first;		getInstructionCost(I, ElementCount::getFixed(1)).first;

// Compute the scalarization overhead of needed insertelement instructions
// and phi nodes.
if (isScalarWithPredication(I) && !I->getType()->isVoidTy()) {
ScalarCost += TTI.getScalarizationOverhead(
cast<VectorType>(ToVectorTy(I->getType(), VF)),
APInt::getAllOnesValue(VF.getKnownMinValue()), true, false);
assert(!VF.isScalable() && "scalable vectors not yet supported.");
sdesmalen (Done): I think this assert can be removed in favour of using `VF.getFixedValue()` instead of `VF.getKnownMinValue()` everywhere in this function.
ScalarCost +=
VF.getKnownMinValue() *
TTI.getCFInstrCost(Instruction::PHI, TTI::TCK_RecipThroughput);
}

// Compute the scalarization overhead of needed extractelement
// instructions. For each of the instruction's operands, if the operand can
// be scalarized, add it to the worklist; otherwise, account for the
[... 675 lines not shown ...] case Instruction::GetElementPtr:
// instruction cost.
return 0;
case Instruction::Br: {
// In cases of scalarized and predicated instructions, there will be VF
// predicated blocks in the vectorized loop. Each branch around these
// blocks requires also an extract of its vector compare i1 element.
bool ScalarPredicatedBB = false;
BranchInst *BI = cast<BranchInst>(I);
if (VF.isVector() && BI->isConditional() &&
sdesmalen (Done): The cost of the `Br` should also be invalid, because it tries to calculate the cost as if it's trying to scalarize the operation if ScalarPredicatedBB == true.
(PredicatedBBsAfterVectorization.count(BI->getSuccessor(0)) ||
PredicatedBBsAfterVectorization.count(BI->getSuccessor(1))))
ScalarPredicatedBB = true;

if (ScalarPredicatedBB) {
// Return cost for branches around scalarized and predicated blocks.
assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto *Vec_i1Ty =
[... 311 lines not shown ...]
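The `getFixedValue()` suggestions and the remark that the `Br` cost should be invalid both come down to one fact about `llvm::ElementCount`: for a fixed VF the lane count is exact, but for a scalable VF `getKnownMinValue()` is only a lower bound (the real count is vscale times that minimum), so a "VF times per-lane cost" formula undercounts. The sketch below is a simplified model, not LLVM's implementation: `ToyElementCount` only mirrors the names `getKnownMinValue()` and `getFixedValue()`, the per-lane insert cost is a hypothetical simplification of `getScalarizationOverhead`, and an empty `std::optional` plays the role of `InstructionCost::getInvalid()`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for llvm::ElementCount (not the real class).
class ToyElementCount {
  unsigned MinLanes;
  bool Scalable;
  ToyElementCount(unsigned Min, bool S) : MinLanes(Min), Scalable(S) {}

public:
  static ToyElementCount getFixed(unsigned N) { return ToyElementCount(N, false); }
  static ToyElementCount getScalable(unsigned MinN) { return ToyElementCount(MinN, true); }
  bool isScalable() const { return Scalable; }
  // For a scalable VF this is only a lower bound: the real lane count is
  // vscale * MinLanes, and vscale is not known at compile time.
  unsigned getKnownMinValue() const { return MinLanes; }
  // Only meaningful for fixed VFs, so it asserts instead of silently
  // returning the minimum.
  unsigned getFixedValue() const {
    assert(!Scalable && "getFixedValue() called on a scalable VF");
    return MinLanes;
  }
};

// Simplified model of the scalarization cost: one scalar copy of the
// instruction per lane, plus one insertelement and one phi per lane for the
// result. std::nullopt stands in for an invalid cost.
std::optional<unsigned> scalarizedCost(ToyElementCount VF, unsigned ScalarOpCost,
                                       unsigned InsertCost, unsigned PhiCost) {
  if (VF.isScalable())
    return std::nullopt; // lane count unknown: cannot cost per-lane copies
  unsigned Lanes = VF.getFixedValue();
  return Lanes * ScalarOpCost + Lanes * InsertCost + Lanes * PhiCost;
}
```

With a fixed VF of 4, a per-lane operation cost of 10 and unit insert/phi costs, the model returns 48; with `vscale x 4` it returns no cost at all rather than pretending there are exactly four lanes.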
bool LoopVectorizationCostModel::isConsecutiveLoadOrStore(Instruction *Inst) {
// Check if the pointer operand of a load or store instruction is
// consecutive.
if (auto *Ptr = getLoadStorePointerOperand(Inst))
return Legal->isConsecutivePtr(Ptr);
return false;
}

void LoopVectorizationCostModel::collectValuesToIgnore() {
sdesmalen (Done): I think it'd be good to give this a more generic name, so that we can use it to test for other cases as well. How about `loopCanBeWidenedWithScalableVectors`, which implies the conditions should be negated? You may also want to return a message explaining why it did not vectorize, e.g. `LoopVectorizationCostModel::loopCanBeWidenedWithScalableVectors(ElementCount MaxVF, std::string &Message)`.
david-arm (Done): This is similar to https://reviews.llvm.org/D102394, where I moved the canVectorizeReductions call in there too.
CarolineConcatto (Author, Done): Thank you @david-arm and @sdesmalen. In order not to create the same function twice, I have rebased this patch on D102394, so there is only one canWidenLoopWithScalableVectors. This patch already depends on D102437. However, we can remove the dependency on D102394 and copy the function if this patch and D102437 are accepted first.
// Ignore ephemeral values.
CodeMetrics::collectEphemeralValues(TheLoop, AC, ValuesToIgnore);

// Ignore type-promoting instructions we identified during reduction
// detection.
for (auto &Reduction : Legal->getReductionVars()) {
sdesmalen (Done): Is this condition necessary? I would expect any instructions that explicitly need predication+scalarization to never be values that can be ignored.
RecurrenceDescriptor &RedDes = Reduction.second;
sdesmalen (Done): There is a problem with this approach: `isScalarWithPredication` calls `blockNeedsPredication`, which in turn calls `foldTailByMasking`, whose return value is determined after calling `computeFeasibleMaxVF`, which ends up calling this function in order to determine the min/max VF. This may lead to unexpected results when we want to use scalable vectors for tail-folded (predicated) loops, so you'll need to find a better place to call `loopHasPredicatedScalar`. Additionally, it might be useful to add a mechanism that records whether FoldTailByMasking already has its final value, and then assert that in `isScalarWithPredication`.
david-arm (Done): OK, that's a good point. It's tricky because we also need to ensure we call this before attempting to calculate any costs, as we'll hit asserts in computePredInstDiscount again. If we didn't have to worry about `blockNeedsPredication`, this seems like a sensible place to bail out, because we also have to worry about the general case without hints, i.e. we should really be telling the computeMaxVF code that the max legal scalable VF = 0, as I believe this is how you've recently been structuring the code. I realise this is not an ideal solution, but what if, for now, we avoid concerns about tail folding by somehow marking tail folding as not an option for scalable vectors, since we don't support it for scalable vectors anyway?
const SmallPtrSetImpl<Instruction *> &Casts = RedDes.getCastInsts();
VecValuesToIgnore.insert(Casts.begin(), Casts.end());
}
// Ignore type-casting instructions we identified during induction
// detection.
for (auto &Induction : Legal->getInductionVars()) {
InductionDescriptor &IndDes = Induction.second;
const SmallVectorImpl<Instruction *> &Casts = IndDes.getCastInsts();
[... 2,460 lines not shown ...]
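The naming and ordering comments above can be illustrated together: a check that reports why scalable vectorization is rejected, and an assertion that the tail-folding decision is final before predication is queried, since tail folding makes every block predicated. This is a heavily simplified sketch under stated assumptions, not the patch's implementation: `ToyInstruction`, `ToyCostModel`, `setFoldTailByMasking` and `TailFoldingDecided` are hypothetical, and only the method names echo those discussed in the review.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins; not LLVM's real types or API.
struct ToyInstruction {
  bool MayTrap;            // e.g. sdiv/udiv, which may fault on a zero divisor
  bool InConditionalBlock; // sits in a block that is conditional in the source loop
};

class ToyCostModel {
  std::vector<ToyInstruction> Body;
  bool FoldTailByMasking = false;
  bool TailFoldingDecided = false; // true once FoldTailByMasking is final

public:
  explicit ToyCostModel(std::vector<ToyInstruction> B) : Body(std::move(B)) {}

  // Records the final tail-folding decision.
  void setFoldTailByMasking(bool Fold) {
    FoldTailByMasking = Fold;
    TailFoldingDecided = true;
  }

  // Tail folding predicates every block, so the answer depends on a decision
  // that must already be final; the assert catches queries made too early.
  bool blockNeedsPredication(const ToyInstruction &I) const {
    assert(TailFoldingDecided && "tail-folding decision is not final yet");
    return FoldTailByMasking || I.InConditionalBlock;
  }

  // A possibly trapping instruction in a predicated block must be scalarized.
  bool isScalarWithPredication(const ToyInstruction &I) const {
    return I.MayTrap && blockNeedsPredication(I);
  }

  // Returns false, with a reason in Message, if scalable vectors cannot be used.
  bool canWidenLoopWithScalableVectors(std::string &Message) const {
    for (const ToyInstruction &I : Body)
      if (isScalarWithPredication(I)) {
        Message = "loop contains a predicated instruction that requires "
                  "scalarization";
        return false;
      }
    return true;
  }
};
```

In this model, a division in an unconditional block passes the check without tail folding but fails it once tail folding is chosen, which is why the answer is not stable until FoldTailByMasking has its final value.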

llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -force-target-supports-scalable-vectors -S | FileCheck %s
				; RUN: opt < %s -loop-vectorize -force-target-supports-scalable-vectors -prefer-predicate-over-epilogue=predicate-dont-vectorize -S | FileCheck %s
				sdesmalen (Done): This is no longer true.
				sdesmalen (Not Done): The RUN lines don't actually enable scalable vectorization (it is off by default), so this test always passes. You'll need to add `-scalable-vectorization=preferred`. I'd also recommend moving this function to llvm/test/Transforms/LoopVectorize/AArch64. The `-force-target-supports-scalable-vectors` flag doesn't work properly, because the BasicTTIImpl cost-model implementation has defaults that must be overridden for scalable auto-vectorization to work.
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

				; The test predication_in_loop corresponds to the following function:
				; for (long long i = 0; i < 1024; i++) {
				david-arm (Done): nit: Maybe "This test corresponds to the following function:"?
				; if (cond[i])
				; a[i] /= b[i];
				; }

				; When the loop needs predication, the division can only be vectorized by
				; scalarizing it, which cannot be done for scalable vectors at the moment.
				; A future implementation of llvm.vp could allow this.
				david-arm (Done): nit: Simple typo: it should be `guaranteed`.
				fhahn (Not Done): Would it be more accurate to say this can only be vectorized by scalarizing the division for now, which cannot be done for scalable vectors at the moment?
				CarolineConcatto (Author, Done): Thank you @fhahn. Is the explanation better now? I added that it will be possible when llvm.vp is implemented.

				define void @predication_in_loop(i32* %a, i32* %b, i32* %cond) {
				; CHECK-LABEL: @predication_in_loop
				sdesmalen (Done): Instead of checking the output to be exactly this, is it sufficient to just check that no vector type is being used? e.g. `CHECK-NOT: <vscale x 4>`
				CarolineConcatto (Author, Done): Thank you @sdesmalen, no problem. I have changed it to check only the SDIV. Is that better?
				; CHECK-NOT: sdiv <vscale x 4 x i32>
				sdesmalen (Done): nit: predication_in_loop
				;
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.inc, %entry
				ret void

				for.body: ; preds = %entry, %for.inc
				%i.09 = phi i64 [ %inc, %for.inc ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %cond, i64 %i.09
				%0 = load i32, i32* %arrayidx, align 4
				%tobool.not = icmp eq i32 %0, 0
				br i1 %tobool.not, label %for.inc, label %if.then

				if.then: ; preds = %for.body
				%arrayidx1 = getelementptr inbounds i32, i32* %b, i64 %i.09
				%1 = load i32, i32* %arrayidx1, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %i.09
				%2 = load i32, i32* %arrayidx2, align 4
				%div = sdiv i32 %2, %1
				store i32 %div, i32* %arrayidx2, align 4
				br label %for.inc

				for.inc: ; preds = %for.body, %if.then
				%inc = add nuw nsw i64 %i.09, 1
				%exitcond.not = icmp eq i64 %inc, 1024
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !0
				}
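In predication_in_loop the division may only execute where `cond[i]` is nonzero: `b[i]` may be zero elsewhere, and both C/C++ integer division and LLVM's `sdiv` are undefined for a zero divisor. At a fixed VF, one way to keep this safe is to compute the mask for all lanes and emit VF separately guarded scalar divisions; with a scalable VF the number of copies is not known at compile time. The sketch below models that shape in plain C++; `predicatedDivide` is hypothetical and is not the code the vectorizer generates.

```cpp
#include <cassert>
#include <cstddef>

// Models "scalarize with predication" for
//   if (cond[i]) a[i] /= b[i];
// at a fixed VF: the compare is done for all VF lanes, but the division is
// emitted once per lane, each copy guarded by its own mask bit. This needs the
// lane count at compile time, which a scalable VF (vscale x 4) does not give.
template <std::size_t VF>
void predicatedDivide(int *A, const int *B, const int *Cond, std::size_t N) {
  std::size_t I = 0;
  for (; I + VF <= N; I += VF) {
    bool Mask[VF];
    for (std::size_t L = 0; L < VF; ++L) // vector compare: one i1 per lane
      Mask[L] = Cond[I + L] != 0;
    for (std::size_t L = 0; L < VF; ++L) // VF predicated scalar divisions
      if (Mask[L])
        A[I + L] /= B[I + L];
  }
  for (; I < N; ++I) // scalar epilogue for the remaining iterations
    if (Cond[I])
      A[I] /= B[I];
}
```

With `B` containing zeros exactly where `Cond` is zero, the masked lanes are skipped and no division by zero occurs, whereas an unmasked vector division over the same lanes would divide by zero.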


				;
				; This test unpredicated_loop_predication_through_tailfolding corresponds
				; to the following function:
				; for (long long i = 0; i < 1024; i++) {
				; a[i + 8] = b[i] / a[i];
				; }

				; Scalarization is not possible in the main loop when there is no predication,
				; and the epilogue should not allow scalarization either.
				; Otherwise the loop could be vectorized, but it will not be, because of:
				; "Max legal vector width too small, scalable vectorization unfeasible."

				define void @unpredicated_loop_predication_through_tailfolding(i32* %a, i32* %b) {
				; CHECK-LABEL: @unpredicated_loop_predication_through_tailfolding
				; CHECK-NOT: sdiv <vscale x 4 x i32>

				sdesmalen (Done): nit: unpredicated_loop_predication_through_tailfolding
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
				%1 = load i32, i32* %arrayidx2, align 4
				%sdiv = sdiv i32 %1, %0
				%2 = add nuw nsw i64 %iv, 8
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %sdiv, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !0

				exit:
				ret void

				}

				!0 = distinct !{!0, !1, !2, !3, !4}
				!1 = !{!"llvm.loop.vectorize.width", i32 4}
				!2 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
				!3 = !{!"llvm.loop.interleave.count", i32 1}
				!4 = !{!"llvm.loop.vectorize.enable", i1 true}
				sdesmalen (Done): unnecessary?
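In unpredicated_loop_predication_through_tailfolding, iteration i stores to `a[i + 8]` and iteration i + 8 loads that element, a loop-carried dependence with a distance of 8 `i32` elements, so at most 8 lanes can safely be processed together. A scalable VF `vscale x N` can have up to N times the maximum vscale lanes, so it is only provably safe if that product fits. The sketch below is a simplified model under stated assumptions: `maxLegalScalableMinLanes` is hypothetical, it assumes the limit is obtained by dividing the safe element count by the target's maximum vscale (for SVE the architectural maximum vector length is 2048 bits, i.e. vscale of at most 16), and it treats an unknown maximum vscale as allowing no scalable VF.

```cpp
#include <cassert>
#include <optional>

// Simplified model (hypothetical helper, not the vectorizer's code) of how a
// dependence distance limits scalable VFs.
//   MaxSafeElements: elements that may be processed together without breaking
//     a loop-carried dependence; writing a[i + 8] and reading a[i] gives 8.
//   MaxVScale: the target's upper bound on vscale, if it reports one.
// Returns the largest N such that "vscale x N" is provably safe, or 0 if none.
unsigned maxLegalScalableMinLanes(unsigned MaxSafeElements,
                                  std::optional<unsigned> MaxVScale) {
  if (!MaxVScale || *MaxVScale == 0)
    return 0; // vscale unbounded: even "vscale x 1" could be too wide
  // The widest possible vector has N * MaxVScale lanes, which must fit.
  return MaxSafeElements / *MaxVScale;
}
```

For this loop the model gives 8 / 16 = 0 (and 0 when no maximum vscale is known), so the `vscale x 4` requested by the `!1` and `!2` loop hints is not legal, consistent with the remark quoted in the test comment.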


Revision Contents

Diff 347947

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/test/Transforms/LoopVectorize/scalable-predicate-instruction.ll
