This is an archive of the discontinued LLVM Phabricator instance.


Differential D41913

[LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from a trunc.
ClosedPublic

Authored by a.elovikov on Jan 10 2018, 11:35 AM.

Details

Reviewers

dorit
Ayal
mssimpso

Commits

rGcf9e4fbd73af: Merging r322473: --------------------------------------------------------------…
rG7457aa0bce33: [LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from…
rL322673: Merging r322473:
rL322473: [LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from…

Summary

This method is supposed to be called for IVs that have casts in their use-def
chains that are completely ignored after vectorization under PSE. However, for
truncates of such IVs, the same InductionDescriptor is used during
creation/widening of both the original IV based on the PHINode and the new IV
based on the TruncInst.

This leads to an unintended second call to recordVectorLoopValueForInductionCast
with VectorLoopVal set to the newly created IV for the trunc, and triggers an
assert because of the attempt to store new information for an already existing
entry in the map. This is wrong and should not be done.

Fixes PR35773.
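
The failure mode described in the summary can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the LoopVectorize code: `WriteOnceMap`, `widenInduction`, and `widenBoth` are hypothetical names, integers stand in for IR values, and the double write is reported through a return value where the real code asserts.

```cpp
#include <map>
#include <utility>

// Hypothetical stand-in for the map behind
// recordVectorLoopValueForInductionCast: each (cast, unroll part) slot may
// be written only once.
class WriteOnceMap {
  std::map<std::pair<int, unsigned>, int> Slots;

public:
  // Returns false on a second write to the same slot. The real code asserts
  // instead; returning a flag keeps the failure observable in this sketch.
  bool set(int Cast, unsigned Part, int VectorValue) {
    return Slots.emplace(std::make_pair(Cast, Part), VectorValue).second;
  }
};

// Models widening one induction for a single unroll part. `CastId` plays the
// role of the cast named by the shared InductionDescriptor. `IsTrunc` is true
// when the entry value is the trunc rather than the original PHI.
// `SkipForTrunc` selects the patched behaviour.
bool widenInduction(WriteOnceMap &CastMap, int CastId, int VectorValue,
                    bool IsTrunc, bool SkipForTrunc) {
  if (IsTrunc && SkipForTrunc)
    return true; // patched path: nothing is recorded for the trunc IV
  return CastMap.set(CastId, /*Part=*/0, VectorValue);
}

// Widens the PHI-based IV and then the trunc-based IV, both sharing one
// descriptor. Returns true if no slot was written twice.
bool widenBoth(bool SkipForTrunc) {
  WriteOnceMap CastMap;
  const int SharedCast = 1; // same descriptor, hence the same cast id
  bool PhiOk = widenInduction(CastMap, SharedCast, /*VectorValue=*/100,
                              /*IsTrunc=*/false, SkipForTrunc);
  bool TruncOk = widenInduction(CastMap, SharedCast, /*VectorValue=*/200,
                                /*IsTrunc=*/true, SkipForTrunc);
  return PhiOk && TruncOk;
}
```

In this model, `widenBoth(false)` reports the second write that corresponds to the assert, while `widenBoth(true)` records only the value for the PHI-based IV.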

Diff Detail

Build Status

Buildable 13732
Build 13732: arc lint + arc unit

Event Timeline

a.elovikov created this revision. · Jan 10 2018, 11:35 AM

Herald added a subscriber: hiraditya. · Jan 10 2018, 11:35 AM

dcaballe added a subscriber: dcaballe. · Jan 10 2018, 12:14 PM

For me, this fixes both minimized test cases mentioned in PR35773, and the full test cases originating from https://bugs.freebsd.org/224867 and https://bugs.freebsd.org/224868.

Hi Dimitry,

This looks good to me, but it's probably a good idea to let Dorit have a look as well. Also, this should be a release blocker if it's not already. Thanks!

llvm/test/Transforms/LoopVectorize/pr35773.ll
35–37	Currently, every induction variable gets vectorized in some way, even though we may know that the vector version will not be needed. So even if we know the induction variable is uniform, as %main.iv is in this case, we will still vectorize it with broadcast/splats, knowing that InstCombine will simplify it all. The vectorizer relies on InstCombine for a lot of cleanup, actually. But I think we're to the point where we don't need to do this, since we can build up vector values on-demand if needed (this wasn't always the case). You can add "-instcombine" to your run line here if you'd rather just check the cleaner output. This is done in many test cases. For the TODO, it's probably better to add something like that in the code for greater visibility, instead of in the test case. The relevant piece of code is the block at the end of `widenIntOrFpInduction` that begins with `if (!VectorizedIV)`. This could be added as a separate patch I think.

In D41913#973706, @mssimpso wrote:

This looks good to me, but it's probably a good idea to let Dorit have a look as well. Also, this should be a release blocker if it's not already. Thanks!

Yes, PR35773 is marked as release blocker, and blocks PR35804 (the 6.0.0 release meta-bug).

In D41913#973706, @mssimpso wrote:

Also, this should be a release blocker if it's not already. Thanks!

Does it mean that I'll be required to commit the fix into some other branch after it lands to trunk?

llvm/test/Transforms/LoopVectorize/pr35773.ll
35–37	instcombine changes the output way too much, IMO. I'd rather keep the RUN line and checks as is because they describe what exactly the vectorizer is doing. I'll remove the "TODO" in such case. Another possibility is to add "-instcombine" but don't run FileCheck at all - just verify that "opt" is not crashing. Not sure if that's common in the tests though.

In D41913#973824, @a.elovikov wrote:

In D41913#973706, @mssimpso wrote:

Also, this should be a release blocker if it's not already. Thanks!

Does it mean that I'll be required to commit the fix into some other branch after it lands to trunk?

No extra commit should be necessary. Once it's committed to trunk, make a note of the revision on the Bugzilla in the "Fixed By Commit" field. PR35773 is already blocking 6.0.0, so if you don't resolve the ticket, the release team should see it and they can resolve it once they've merged.

mssimpso added inline comments. · Jan 11 2018, 1:56 PM

llvm/test/Transforms/LoopVectorize/pr35773.ll
35–37	> instcombine changes the output way too much, IMO. I'd rather keep the RUN line and checks as is because they describe what exactly the vectorizer is doing. I'll remove the "TODO" in such case.

Sounds good to me, but feel free to add a relevant TODO to the .cpp if you'd like. I'm pretty sure we can avoid generating these unneeded sequences, but probably no one has had the time or interest to look at that yet.

> Another possibility is to add "-instcombine" but don't run FileCheck at all - just verify that "opt" is not crashing. Not sure if that's common in the tests though.

I'd prefer that we not do this. We should definitely check that we're getting the expected output.

Move TODO from the test to the actual code.

LGTM. Thanks for the fix.

This revision is now accepted and ready to land. · Jan 14 2018, 2:46 AM

Closed by commit rL322473: [LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from… (authored by a.elovikov). · Jan 15 2018, 2:57 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	9 lines
llvm/test/Transforms/LoopVectorize/pr35773.ll	53 lines

Diff 129523

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

(2,429 lines not shown) · void InnerLoopVectorizer::createVectorIntOrFpInductionPHI(

   // We may need to add the step a number of times, depending on the unroll
   // factor. The last of those goes into the PHI.
   PHINode *VecInd = PHINode::Create(SteppedStart->getType(), 2, "vec.ind",
                                     &*LoopVectorBody->getFirstInsertionPt());
   Instruction *LastInduction = VecInd;
   for (unsigned Part = 0; Part < UF; ++Part) {
     VectorLoopValueMap.setVectorValue(EntryVal, Part, LastInduction);
-    recordVectorLoopValueForInductionCast(II, LastInduction, Part);
     if (isa<TruncInst>(EntryVal))
       addMetadata(LastInduction, EntryVal);
+    else
+      recordVectorLoopValueForInductionCast(II, LastInduction, Part);

     LastInduction = cast<Instruction>(addFastMathFlag(
         Builder.CreateBinOp(AddOp, LastInduction, SplatVF, "step.add")));
   }

   // Move the last step to the end of the latch block. This ensures consistent
   // placement of all induction updates.
   auto *LoopVectorLatch = LI->getLoopFor(LoopVectorBody)->getLoopLatch();
   auto *Br = cast<BranchInst>(LoopVectorLatch->getTerminator());

(105 lines not shown) · if (Trunc) {

              "Truncation requires an integer step");
       ScalarIV = Builder.CreateTrunc(ScalarIV, TruncType);
       Step = Builder.CreateTrunc(Step, TruncType);
     }
   }

   // If we haven't yet vectorized the induction variable, splat the scalar
   // induction variable, and build the necessary step vectors.
+  // TODO: Don't do it unless the vectorized IV is really required.
   if (!VectorizedIV) {
     Value *Broadcasted = getBroadcastInstrs(ScalarIV);
     for (unsigned Part = 0; Part < UF; ++Part) {
       Value *EntryPart =
           getStepVector(Broadcasted, VF * Part, Step, ID.getInductionOpcode());
       VectorLoopValueMap.setVectorValue(EntryVal, Part, EntryPart);
-      recordVectorLoopValueForInductionCast(ID, EntryPart, Part);
       if (Trunc)
         addMetadata(EntryPart, Trunc);
+      else
+        recordVectorLoopValueForInductionCast(ID, EntryPart, Part);
     }
   }

   // If an induction variable is only used for counting loop iterations or
   // calculating addresses, it doesn't need to be widened. Create scalar steps
   // that can be used by instructions we will later scalarize. Note that the
   // addition of the scalar steps will not increase the number of instructions
   // in the loop in the common case prior to InstCombine. We will be trading

(6,051 lines not shown)
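
The `if (!VectorizedIV)` path splats the scalar IV and adds a step vector for each unroll part. The sketch below shows only the lane arithmetic for the integer-add case. It assumes that `getStepVector(Val, StartIdx, Step, ...)` yields `Val + <StartIdx, StartIdx + 1, ...> * Step`. The helper name `splatPlusSteps` is hypothetical, and plain integers stand in for IR values.

```cpp
#include <cstdint>
#include <vector>

// Lane values for one unroll part when a scalar IV is broadcast and a step
// vector is added: lane I holds ScalarIV + (VF * Part + I) * Step.
std::vector<int64_t> splatPlusSteps(int64_t ScalarIV, int64_t Step,
                                    unsigned VF, unsigned Part) {
  std::vector<int64_t> Lanes(VF, ScalarIV); // broadcast of the scalar IV
  for (unsigned I = 0; I < VF; ++I)
    Lanes[I] += static_cast<int64_t>(VF * Part + I) * Step;
  return Lanes;
}
```

With `VF = 4`, `Part = 0`, and a step of 1, a scalar IV of 0 gives lanes 0, 1, 2, 3, which is the `add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>` shape checked for `%main.iv` in the test.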

llvm/test/Transforms/LoopVectorize/pr35773.ll

This file was added.

				; RUN: opt -S -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 < %s 2>&1 | FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				@a = common local_unnamed_addr global i32 0, align 4
				@b = common local_unnamed_addr global i8 0, align 1

				; Function Attrs: norecurse nounwind uwtable
				define void @doit1() local_unnamed_addr{
				entry:
				br label %for.body

				for.body:
				%main.iv = phi i32 [ 0, %entry ], [ %inc, %for.body ]

				%i8.iv = phi i8 [ 0, %entry ], [ %i8.add, %for.body ]
				%i32.iv = phi i32 [ 0, %entry ], [ %i32.add, %for.body ]

				%trunc.to.be.converted.to.new.iv = trunc i32 %i32.iv to i8
				%i8.add = add i8 %i8.iv, %trunc.to.be.converted.to.new.iv

				%noop.conv.under.pse = and i32 %i32.iv, 255
				%i32.add = add nuw nsw i32 %noop.conv.under.pse, 9

				%inc = add i32 %main.iv, 1
				%tobool = icmp eq i32 %inc, 16
				br i1 %tobool, label %for.cond.for.end_crit_edge, label %for.body

				; CHECK-LABEL: @doit1(
				; CHECK: vector.body:
				; CHECK-NEXT: [[MAIN_IV:%.*]] = phi i32 [ 0, [[VECTOR_PH:%.*]] ], [ [[MAIN_IV_NEXT:%.*]], [[VECTOR_BODY:%.*]] ]
				; CHECK-NEXT: [[I8_IV:%.*]] = phi <4 x i8> [ zeroinitializer, [[VECTOR_PH]] ], [ [[I8_IV_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[I32_IV:%.*]] = phi <4 x i32> [ <i32 0, i32 9, i32 18, i32 27>, [[VECTOR_PH]] ], [ [[I32_IV_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[IV_FROM_TRUNC:%.*]] = phi <4 x i8> [ <i8 0, i8 9, i8 18, i8 27>, [[VECTOR_PH]] ], [ [[IV_FROM_TRUNC_NEXT:%.*]], [[VECTOR_BODY]] ]

				; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[MAIN_IV]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: [[TMP7:%.*]] = add i32 [[MAIN_IV]], 0

				; CHECK-NEXT: [[I8_IV_NEXT]] = add <4 x i8> [[I8_IV]], [[IV_FROM_TRUNC]]

				; CHECK-NEXT: [[MAIN_IV_NEXT]] = add i32 [[MAIN_IV]], 4
				; CHECK-NEXT: [[I32_IV_NEXT]] = add <4 x i32> [[I32_IV]], <i32 36, i32 36, i32 36, i32 36>
				; CHECK-NEXT: [[IV_FROM_TRUNC_NEXT]] = add <4 x i8> [[IV_FROM_TRUNC]], <i8 36, i8 36, i8 36, i8 36>
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i32 [[MAIN_IV_NEXT]], 16
				; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0

				for.cond.for.end_crit_edge:
				store i8 %i8.add, i8* @b, align 1
				br label %for.end

				for.end:
				ret void
				}
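
The constants in the CHECK lines follow from the vector width of 4 on the RUN line and the step of 9 in `%i32.add`. As a hedged sanity check, the sketch below recomputes the lanes of the widened i32 IV and of the i8 IV created from the trunc. The function names `i32Lanes` and `i8Lanes` are hypothetical, and the model covers only the arithmetic, not the vectorizer.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned VF = 4;   // from -force-vector-width=4
constexpr uint32_t Step = 9; // from `add nuw nsw i32 %noop.conv.under.pse, 9`

// Lanes of the widened i32 IV at the start of vector iteration `Iter`.
std::array<uint32_t, VF> i32Lanes(uint32_t Iter) {
  std::array<uint32_t, VF> Lanes{};
  for (unsigned I = 0; I < VF; ++I)
    Lanes[I] = (Iter * VF + I) * Step;
  return Lanes;
}

// Lanes of the separate i8 IV created from the trunc. Values wrap modulo 256.
std::array<uint8_t, VF> i8Lanes(uint32_t Iter) {
  std::array<uint8_t, VF> Lanes{};
  for (unsigned I = 0; I < VF; ++I)
    Lanes[I] = static_cast<uint8_t>((Iter * VF + I) * Step);
  return Lanes;
}
```

Iteration 0 gives 0, 9, 18, 27 for both IVs, and consecutive iterations differ by `VF * Step = 36` in every lane, matching the start values and the `<i32 36, ...>` and `<i8 36, ...>` increments in the checks.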