This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259)
ClosedPublic

Authored by vsk on Mar 23 2020, 10:41 PM.

Details

Summary

In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken
count is a SCEV add expression, its type is defined by the type of the
last operand of the add expression.

In the test case from PR45259, this last operand happens to be a
pointer, which (according to llvm::Type) does not have a primitive size
in bits. In this case, LoopVectorize fails to truncate the SCEV and
crashes as a result.

ISTM that the truncation really is needed, and that using
ScalarEvolution::getTypeSizeInBits makes the truncation work as
expected. As I'm not really familiar with this code, I'm not sure
whether this is an appropriate or sufficient fix -- any advice
appreciated.

https://bugs.llvm.org/show_bug.cgi?id=45259

Diff Detail

Event Timeline

vsk created this revision.Mar 23 2020, 10:41 PM
fhahn accepted this revision.Mar 24 2020, 3:38 AM

LGTM, with some suggestions to improve the test case. Please try to reduce the test case to the things required for the crash only. For example, the function below should be enough (includes a bit of renaming and also use for %t3.i, to make it less prone to being optimized away).

define i8 @widget(i8* %arr, i8 %t9) {
bb:
  br label %bb6

bb6:
  %t1.0 = phi i8* [ %arr, %bb ], [ null, %bb6 ]
  %c = call i1 @cond()
  br i1 %c, label %for.preheader, label %bb6

for.preheader:
  br label %for.body

for.body:
  %iv = phi i8 [ %iv.next, %for.body ], [ 0, %for.preheader ]
  %iv.next = add i8 %iv, 1
  %ptr = getelementptr inbounds i8, i8* %arr, i8 %iv.next
  %t3.i = icmp slt i8 %iv.next, %t9
  %t3.i8 = zext i1 %t3.i to i8
  store i8 %t3.i8, i8* %ptr
  %ec = icmp eq i8* %t1.0, %ptr
  br i1 %ec, label %for.exit, label %for.body

for.exit:
  %iv.next.lcssa = phi i8 [ %iv.next, %for.body ]
  ret i8 %iv.next.lcssa
}

declare i1 @cond()
llvm/test/Transforms/LoopVectorize/pr45259.ll
2

I think for the test it would be better to not auto-generate the checks. Here we mostly care about vectorizing without crashing. It should be enough to check if there's a vector body for the right loop with some vector instructions.

The auto-generated checks seems to verbose here and also prone to changes when small things change, potentially causing unnecessary large diffs on further changes.

6

Tests directly in llvm/test/Transforms/LoopVectorize/ should be target independent. If the x86 triple is required, please move it to the X86 subdirectory. But it should not be required. As we are not interested in cost-modeling here I think, you should be able to instead pass something like -force-vector-width=4 -force-vector-interleave=1 and remove the triple.

10

nit: dso_local, local_unnamed_addr not required

This revision is now accepted and ready to land.Mar 24 2020, 3:38 AM
vsk updated this revision to Diff 252451.Mar 24 2020, 4:10 PM

@fhahn thanks for the simplified test! This update should incorporate all the feedback.

Thanks Vedant. It might be good to wait a day or so with committing, in case someone else has additional thoughts.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2020, 10:18 AM