This is an archive of the discontinued LLVM Phabricator instance.

Differential D94869

[LV] Fix crash when computing max VF too early
ClosedPublic

Authored by c-rhodes on Jan 16 2021, 9:00 AM.

Details

Reviewers

iajbar
fhahn
sdesmalen

Commits

rG8cda227432f1: [LV] Fix crash when computing max VF too early

Summary

D90687 introduced a crash:

llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed.

when compiling the following C code:

typedef struct {
char a;
} b;

b *c;
int d, e;

int f() {
  int g = 0;
  for (; d; d++) {
    e = 0;
    for (; e < c[d].a; e++)
      g++;
  }
  return g;
}

with:

clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o -

This occurs because, prior to D90687, computeMaxVF called
computeFeasibleMaxVF only when a scalar epilogue was allowed, whereas now
it is always called. This triggers the assert above: computeFeasibleMaxVF
collects all viable VFs larger than the default MaxVF and calculates the
register usage for each one, which performs the very analysis the assert
guards against. This happens in computeFeasibleMaxVF when
TTI.shouldMaximizeVectorBandwidth returns true, and the Hexagon backend
implements this target hook to always return true.
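
The ordering problem described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the LLVM implementation: `ToyCostModel`, `Result`, `EpilogueStatus`, `computeMaxVFEager` and `computeMaxVFLazy` are hypothetical names, the `Decisions` set stands in for `WideningDecisions`, `Uniforms` and `Scalars`, and the VF values 4 and 16 are arbitrary.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-ins; none of these types exist in LLVM.
enum class EpilogueStatus { Allowed, NotAllowedUsePredicate };

struct Result {
  unsigned MaxVF;
  bool InvariantHeld; // was "no decisions taken yet" true where LLVM asserts?
};

struct ToyCostModel {
  bool MaximizeBandwidth;       // models TTI.shouldMaximizeVectorBandwidth()
  EpilogueStatus Status;
  std::set<unsigned> Decisions; // models WideningDecisions/Uniforms/Scalars

  // Models computeFeasibleMaxVF: when maximizing bandwidth, wider VFs are
  // evaluated, and that evaluation records decisions as a side effect.
  unsigned computeFeasibleMaxVF() {
    unsigned MaxVF = 4; // arbitrary default for the sketch
    if (MaximizeBandwidth) {
      for (unsigned VF = MaxVF * 2; VF <= 16; VF *= 2) {
        Decisions.insert(VF);
        MaxVF = VF;
      }
    }
    return MaxVF;
  }

  // Eager ordering: MaxVF is computed up front, so decisions may already
  // exist when the tail-folding path checks the invariant.
  Result computeMaxVFEager() {
    unsigned MaxVF = computeFeasibleMaxVF();
    if (Status == EpilogueStatus::Allowed)
      return {MaxVF, true};
    return {MaxVF, Decisions.empty()};
  }

  // Lazy ordering: MaxVF is computed only where it is needed, after the
  // invariant has been checked on the tail-folding path.
  Result computeMaxVFLazy() {
    if (Status == EpilogueStatus::Allowed)
      return {computeFeasibleMaxVF(), true};
    bool Held = Decisions.empty();
    return {computeFeasibleMaxVF(), Held};
  }
};
```

With `MaximizeBandwidth` set and a predicated (no scalar epilogue) status, the eager variant reports a broken invariant while the lazy variant does not; both return the same MaxVF. With `MaximizeBandwidth` unset, neither ordering records decisions, which is consistent with the crash only showing up on a target whose hook returns true.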

Reported by @iajbar.

Event Timeline

c-rhodes created this revision. · Jan 16 2021, 9:00 AM

Herald added a subscriber: hiraditya. · Jan 16 2021, 9:00 AM

c-rhodes requested review of this revision. · Jan 16 2021, 9:00 AM

Herald added a project: Restricted Project. · Jan 16 2021, 9:00 AM

c-rhodes mentioned this in D90687: [LV] Clamp VF hint when unsafe. · Jan 16 2021, 9:07 AM

Harbormaster completed remote builds in B85499: Diff 317183. · Jan 16 2021, 9:47 AM

c-rhodes added a subscriber: fhahn. · Jan 18 2021, 6:47 AM

c-rhodes added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5534–5535	@fhahn Any thoughts on this? This assert is firing since D90687, details above. I'm just wondering having looked at your patch D78298 if there's a more sensible fix here, maybe to get rid of the assert and call `invalidateCostModelingDecisions`?

c-rhodes added reviewers: fhahn, sdesmalen. · Jan 18 2021, 6:47 AM

fhahn added inline comments. · Jan 21 2021, 9:15 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5534–5535	I think it is still preferable to avoid spending time computing cost-model decisions unnecessarily. Duplicating the MaxVF computation at a few places to ensure we only do it when needed looks good to me.
llvm/test/Transforms/LoopVectorize/Hexagon/maximum-vf-crash.ll
1	please add some check lines to make sure something sensible happens, besides not crashing.
14	Personally I think the C source code mostly just adds clutter. Ideally the IR source would be concise and with descriptive variable names, so it should be relatively easy to see what's going on without C source; especially if the C source looks like something auto-generated/C-reduced. Also, there's no guarantee that Clang will generate the same IR in future versions.
44	can this function be a bit more simplified & cleaned up? I'll leave some suggestions below & I think the basic block names could be improved & shortened.
46	none of this should be needed to reproduce the failure, you should be able to just use a constant instead of `%.pr` as incoming value below.
51	can we instead just pass a pointer argument?
57	instead of using a struct, can this just be plain pointer to `i8` or something like that?
61	are all those compares/extensions/selects needed?
72	not needed?
82	not needed?

@fhahn I've simplified the test, thanks for the comments!

LGTM, thanks! This should probably also go onto the 12.x release branch.

This revision is now accepted and ready to land. · Feb 1 2021, 1:14 AM

Closed by commit rG8cda227432f1: [LV] Fix crash when computing max VF too early (authored by c-rhodes). · Feb 1 2021, 4:15 AM

This revision was automatically updated to reflect the committed changes.

c-rhodes added a commit: rG8cda227432f1: [LV] Fix crash when computing max VF too early.

In D94869#2533313, @fhahn wrote:

LGTM, thanks! This should probably also go onto the 12.x release branch.

Landed, cheers!

I've created https://bugs.llvm.org/show_bug.cgi?id=48989 to get it cherry-picked onto the 12.x release branch.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	7 lines
llvm/test/Transforms/LoopVectorize/Hexagon/maximum-vf-crash.ll	83 lines

Diff 317183

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

(5,475 lines hidden) In LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC):

   LLVM_DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');
   if (TC == 1) {
     reportVectorizationFailure("Single iteration (non) loop",
         "loop trip count is one, irrelevant for vectorization",
         "SingleIterationLoop", ORE, TheLoop);
     return None;
   }

-  ElementCount MaxVF = computeFeasibleMaxVF(TC, UserVF);
-
   switch (ScalarEpilogueStatus) {
   case CM_ScalarEpilogueAllowed:
-    return MaxVF;
+    return computeFeasibleMaxVF(TC, UserVF);
   case CM_ScalarEpilogueNotAllowedUsePredicate:
     LLVM_FALLTHROUGH;
   case CM_ScalarEpilogueNotNeededUsePredicate:
     LLVM_DEBUG(
         dbgs() << "LV: vector predicate hint/switch found.\n"
                << "LV: Not allowing scalar epilogue, creating predicated "
                << "vector loop.\n");
     break;

(21 lines hidden) In LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC):

   // require a lane mask which varies through the vector loop body. (TODO)
   if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
     // If there was a tail-folding hint/switch, but we can't fold the tail by
     // masking, fallback to a vectorization with a scalar epilogue.
     if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
       LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
                            "scalar epilogue instead.\n");
       ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;
-      return MaxVF;
+      return computeFeasibleMaxVF(TC, UserVF);
     }
     return None;
   }

   // Now try the tail folding

   // Invalidate interleave groups that require an epilogue if we can't mask
   // the interleave-group.
   if (!useMaskedInterleavedAccesses(TTI)) {
     assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
            "No decisions should have been taken at this point");
     // Note: There is no need to invalidate any cost modeling decisions here, as
     // non where taken so far.
     InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();
   }

+  ElementCount MaxVF = computeFeasibleMaxVF(TC, UserVF);
   assert(!MaxVF.isScalable() &&
          "Scalable vectors do not yet support tail folding");
   assert((UserVF.isNonZero() || isPowerOf2_32(MaxVF.getFixedValue())) &&
          "MaxVF must be a power of 2");
   unsigned MaxVFtimesIC =
       UserIC ? MaxVF.getFixedValue() * UserIC : MaxVF.getFixedValue();
   // Avoid tail folding if the trip count is known to be a multiple of any VF we
   // chose.

(4,005 lines hidden)

llvm/test/Transforms/LoopVectorize/Hexagon/maximum-vf-crash.ll

This file was added.

				; RUN: opt -march=hexagon -hexagon-autohvx -loop-vectorize -disable-output < %s

				; Check that we don't crash.
				;
				; Testcase originated from this C code:
				;
				; typedef struct {
				; char a;
				; } b;
				;
				; b *c;
				; int d, e;
				;
				; int f() {
				; int g = 0;
				; for (; d; d++) {
				; e = 0;
				; for (; e < c[d].a; e++)
				; g++;
				; }
				; return g;
				; }
				;
				; which was crashing when compiling with:
				;
				; clang -Os -mhvx -fvectorize -mv67 testcase.c -S -o -
				;
				; Source of the crash was introduced in D90687.
				;
				; IR generated by:
				;
				; ./bin/clang -Os -mhvx -fvectorize -mv67 testcase.c -S -emit-llvm -o testcase.ll

				target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
				target triple = "hexagon"

				%struct.b = type { i8 }

				@d = dso_local local_unnamed_addr global i32 0, align 4
				@e = dso_local local_unnamed_addr global i32 0, align 4
				@c = dso_local local_unnamed_addr global %struct.b* null, align 4

				; Function Attrs: optsize
				define dso_local i32 @f() local_unnamed_addr #0 {
				entry:
				%.pr = load i32, i32* @d, align 4
				%tobool.not15 = icmp eq i32 %.pr, 0
				br i1 %tobool.not15, label %for.end7, label %for.cond1.preheader.lr.ph

				for.cond1.preheader.lr.ph: ; preds = %entry
				%0 = load %struct.b*, %struct.b** @c, align 4
				br label %for.cond1.preheader

				for.cond1.preheader: ; preds = %for.cond1.preheader.lr.ph, %for.cond1.preheader
				%g.016 = phi i32 [ 0, %for.cond1.preheader.lr.ph ], [ %g.1.lcssa, %for.cond1.preheader ]
				%1 = phi i32 [ %.pr, %for.cond1.preheader.lr.ph ], [ %inc6, %for.cond1.preheader ]
				%a10 = getelementptr inbounds %struct.b, %struct.b* %0, i32 %1, i32 0
				%2 = load i8, i8* %a10, align 1
				%cmp12.not = icmp eq i8 %2, 0
				%conv = zext i8 %2 to i32
				%3 = icmp ugt i32 %conv, 1
				%umax = select i1 %3, i32 %conv, i32 1
				%4 = select i1 %cmp12.not, i32 0, i32 %umax
				%g.1.lcssa = add i32 %g.016, %4
				%inc6 = add nsw i32 %1, 1
				%tobool.not = icmp eq i32 %inc6, 0
				br i1 %tobool.not, label %for.cond.for.end7_crit_edge, label %for.cond1.preheader, !llvm.loop !0

				for.cond.for.end7_crit_edge: ; preds = %for.cond1.preheader
				%inc4.lcssa18 = select i1 %cmp12.not, i32 0, i32 %umax
				store i32 %inc4.lcssa18, i32* @e, align 4
				store i32 0, i32* @d, align 4
				br label %for.end7

				for.end7: ; preds = %for.cond.for.end7_crit_edge, %entry
				%g.0.lcssa = phi i32 [ %g.1.lcssa, %for.cond.for.end7_crit_edge ], [ 0, %entry ]
				ret i32 %g.0.lcssa
				}

				attributes #0 = { optsize "target-cpu"="hexagonv67" "target-features"="+hvx-length128b,+hvxv67,+v67,-long-calls" }

				!0 = distinct !{!0, !1}
				!1 = !{!"llvm.loop.mustprogress"}