This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/Vectorize/
-
Transforms/
-
Vectorize/
-
LoopVectorize.cpp
-
test/Transforms/LoopVectorize/
-
Transforms/
-
LoopVectorize/
3
pr45259.ll

Differential D76669

[LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259)
ClosedPublic

Authored by vsk on Mar 23 2020, 10:41 PM.

Download Raw Diff

Details

Reviewers

fhahn
nemanjai
Ayal
gilr

Commits

rGd9ba731d5690: [LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259)
rGdcc410b5cf20: [LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259)

Summary

In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken
count is a SCEV add expression, its type is defined by the type of the
last operand of the add expression.

In the test case from PR45259, this last operand happens to be a
pointer, which (according to llvm::Type) does not have a primitive size
in bits. In this case, LoopVectorize fails to truncate the SCEV and
crashes as a result.

ISTM that the truncation really is needed, and that using
ScalarEvolution::getTypeSizeInBits makes the truncation work as
expected. As I'm not really familiar with this code, I'm not sure
whether this is an appropriate or sufficient fix -- any advice
appreciated.

https://bugs.llvm.org/show_bug.cgi?id=45259

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vsk created this revision.Mar 23 2020, 10:41 PM

Herald added subscribers: javed.absar, hiraditya. · View Herald TranscriptMar 23 2020, 10:41 PM

LGTM, with some suggestions to improve the test case. Please try to reduce the test case to the things required for the crash only. For example, the function below should be enough (includes a bit of renaming and also use for %t3.i, to make it less prone to being optimized away).

define i8 @widget(i8* %arr, i8 %t9) {
bb:
  br label %bb6

bb6:
  %t1.0 = phi i8* [ %arr, %bb ], [ null, %bb6 ]
  %c = call i1 @cond()
  br i1 %c, label %for.preheader, label %bb6

for.preheader:
  br label %for.body

for.body:
  %iv = phi i8 [ %iv.next, %for.body ], [ 0, %for.preheader ]
  %iv.next = add i8 %iv, 1
  %ptr = getelementptr inbounds i8, i8* %arr, i8 %iv.next
  %t3.i = icmp slt i8 %iv.next, %t9
  %t3.i8 = zext i1 %t3.i to i8
  store i8 %t3.i8, i8* %ptr
  %ec = icmp eq i8* %t1.0, %ptr
  br i1 %ec, label %for.exit, label %for.body

for.exit:
  %iv.next.lcssa = phi i8 [ %iv.next, %for.body ]
  ret i8 %iv.next.lcssa
}

declare i1 @cond()

llvm/test/Transforms/LoopVectorize/pr45259.ll
2	I think for the test it would be better to not auto-generate the checks. Here we mostly care about vectorizing without crashing. It should be enough to check if there's a vector body for the right loop with some vector instructions. The auto-generated checks seems to verbose here and also prone to changes when small things change, potentially causing unnecessary large diffs on further changes.
6	Tests directly in `llvm/test/Transforms/LoopVectorize/` should be target independent. If the x86 triple is required, please move it to the X86 subdirectory. But it should not be required. As we are not interested in cost-modeling here I think, you should be able to instead pass something like `-force-vector-width=4 -force-vector-interleave=1` and remove the triple.
10	nit: dso_local, local_unnamed_addr not required

This revision is now accepted and ready to land.Mar 24 2020, 3:38 AM

@fhahn thanks for the simplified test! This update should incorporate all the feedback.

Thanks Vedant. It might be good to wait a day or so with committing, in case someone else has additional thoughts.

Closed by commit rGdcc410b5cf20: [LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259) (authored by vsk). · Explain WhyMar 30 2020, 10:18 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2020, 10:18 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Vectorize/

LoopVectorize.cpp

2 lines

test/

Transforms/

LoopVectorize/

pr45259.ll

36 lines

Diff 253632

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 2,577 Lines • ▼ Show 20 Lines	Value InnerLoopVectorizer::getOrCreateTripCount(Loop L) {
Type *IdxTy = Legal->getWidestInductionType();		Type *IdxTy = Legal->getWidestInductionType();
assert(IdxTy && "No type for induction");		assert(IdxTy && "No type for induction");

// The exit count might have the type of i64 while the phi is i32. This can		// The exit count might have the type of i64 while the phi is i32. This can
// happen if we have an induction variable that is sign extended before the		// happen if we have an induction variable that is sign extended before the
// compare. The only way that we get a backedge taken count is that the		// compare. The only way that we get a backedge taken count is that the
// induction variable was signed and as such will not overflow. In such a case		// induction variable was signed and as such will not overflow. In such a case
// truncation is legal.		// truncation is legal.
if (BackedgeTakenCount->getType()->getPrimitiveSizeInBits() >		if (SE->getTypeSizeInBits(BackedgeTakenCount->getType()) >
IdxTy->getPrimitiveSizeInBits())		IdxTy->getPrimitiveSizeInBits())
BackedgeTakenCount = SE->getTruncateOrNoop(BackedgeTakenCount, IdxTy);		BackedgeTakenCount = SE->getTruncateOrNoop(BackedgeTakenCount, IdxTy);
BackedgeTakenCount = SE->getNoopOrZeroExtend(BackedgeTakenCount, IdxTy);		BackedgeTakenCount = SE->getNoopOrZeroExtend(BackedgeTakenCount, IdxTy);

// Get the total trip count from the count by adding 1.		// Get the total trip count from the count by adding 1.
const SCEV *ExitCount = SE->getAddExpr(		const SCEV *ExitCount = SE->getAddExpr(
BackedgeTakenCount, SE->getOne(BackedgeTakenCount->getType()));		BackedgeTakenCount, SE->getOne(BackedgeTakenCount->getType()));

▲ Show 20 Lines • Show All 5,430 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/pr45259.ll

This file was added.

				; RUN: opt < %s -S -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 \| FileCheck %s

				fhahnUnsubmitted Not Done Reply Inline Actions I think for the test it would be better to not auto-generate the checks. Here we mostly care about vectorizing without crashing. It should be enough to check if there's a vector body for the right loop with some vector instructions. The auto-generated checks seems to verbose here and also prone to changes when small things change, potentially causing unnecessary large diffs on further changes. fhahn: I think for the test it would be better to not auto-generate the checks. Here we mostly care…
				; Check that we can vectorize this loop without crashing.

				; CHECK-LABEL: define {{.*}} @widget(
				; CHECK: [[vecInd:%.*]] = phi <4 x i8> [ <i8 0, i8 1, i8 2, i8 3>
				fhahnUnsubmitted Not Done Reply Inline Actions Tests directly in `llvm/test/Transforms/LoopVectorize/` should be target independent. If the x86 triple is required, please move it to the X86 subdirectory. But it should not be required. As we are not interested in cost-modeling here I think, you should be able to instead pass something like `-force-vector-width=4 -force-vector-interleave=1` and remove the triple. fhahn: Tests directly in `llvm/test/Transforms/LoopVectorize/` should be target independent. If the…
				; CHECK-NEXT: add <4 x i8> [[vecInd]], <i8 1, i8 1, i8 1, i8 1>

				define i8 @widget(i8* %arr, i8 %t9) {
				bb:
				fhahnUnsubmitted Not Done Reply Inline Actions nit: dso_local, local_unnamed_addr not required fhahn: nit: dso_local, local_unnamed_addr not required
				br label %bb6

				bb6:
				%t1.0 = phi i8* [ %arr, %bb ], [ null, %bb6 ]
				%c = call i1 @cond()
				br i1 %c, label %for.preheader, label %bb6

				for.preheader:
				br label %for.body

				for.body:
				%iv = phi i8 [ %iv.next, %for.body ], [ 0, %for.preheader ]
				%iv.next = add i8 %iv, 1
				%ptr = getelementptr inbounds i8, i8* %arr, i8 %iv.next
				%t3.i = icmp slt i8 %iv.next, %t9
				%t3.i8 = zext i1 %t3.i to i8
				store i8 %t3.i8, i8* %ptr
				%ec = icmp eq i8* %t1.0, %ptr
				br i1 %ec, label %for.exit, label %for.body

				for.exit:
				%iv.next.lcssa = phi i8 [ %iv.next, %for.body ]
				ret i8 %iv.next.lcssa
				}

				declare i1 @cond()