This is an archive of the discontinued LLVM Phabricator instance.

[LV] Update dominator tree before fixing external IV users
ClosedPublic

Authored by mssimpso on Dec 29 2016, 2:03 PM.

Details

Reviewers

Commits

rGcf796478e9e9: [LV] Fix-up external IV users after updating dominator tree
rL291462: [LV] Fix-up external IV users after updating dominator tree

Summary

This patch delays the fix-up step for external induction variable users until after the dominator tree has been properly updated. This should fix PR30742. The SCEVExpander in InductionDescriptor::transform can generate code in the wrong location if the dominator tree is not up-to-date.

I'm not quite sure if this is the right approach or not. In particular, we use InductionDescriptor::transform in other locations before the dominator tree has been updated. Maybe this isn't a problem because the vector loop is still detached from the dominator tree? In any case, I made an attempt to keep the dominator tree up-to-date at the outset when creating the structure of the vector loop, but that caused the SCEVExpander to generate worse code (the expander was either crashing or creating a new canonical induction variable for every loop). The use of InductionDescriptor::transform when fixing the external induction variable users may be unique in that the insertion point is outside the loop (it's in the middle block). I'm not sure if this would make a difference, though. Please take a look.
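To make the "out-of-date dominator tree" concern concrete, here is a minimal, self-contained sketch. It is not LLVM's DominatorTree or SCEVExpander. It uses a textbook iterative dominator computation on a hypothetical, heavily simplified CFG (the block numbering and the helpers `computeDominators`, `originalLoop`, and `vectorizedLoop` are all made up for illustration). It only shows that dominance facts computed before the vector loop is wired in can be false afterwards, which is the kind of stale answer that could mislead a client choosing an insertion point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A toy CFG: Succs[B] lists the successors of block B. Block 0 is the entry.
using CFG = std::vector<std::vector<int>>;

// Dom[B][A] is true when block A dominates block B. This is the textbook
// iterative data-flow formulation, not LLVM's DominatorTree implementation.
using DomSets = std::vector<std::vector<bool>>;

DomSets computeDominators(const CFG &Succs) {
  const std::size_t N = Succs.size();
  assert(N > 0 && "CFG needs an entry block");

  // Build predecessor lists from the successor lists.
  std::vector<std::vector<int>> Preds(N);
  for (std::size_t B = 0; B < N; ++B)
    for (int S : Succs[B])
      Preds[S].push_back(static_cast<int>(B));

  // Start from "everything dominates everything", except that the entry
  // block is dominated only by itself.
  DomSets Dom(N, std::vector<bool>(N, true));
  Dom[0].assign(N, false);
  Dom[0][0] = true;

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 1; B < N; ++B) {
      // Dom(B) = {B} union (intersection of Dom(P) over all predecessors P).
      std::vector<bool> New(N, true);
      for (int P : Preds[B])
        for (std::size_t A = 0; A < N; ++A)
          New[A] = New[A] && Dom[P][A];
      New[B] = true;
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// Hypothetical block numbering (an assumption for this sketch):
//   0 = preheader, 1 = scalar loop, 2 = exit, 3 = vector body, 4 = middle block.

// Before vectorization: preheader -> scalar loop -> exit.
CFG originalLoop() { return {{1}, {1, 2}, {}}; }

// After the vector loop is created: the preheader can enter the vector body
// or bypass it, and the middle block branches to the exit or the scalar loop.
CFG vectorizedLoop() { return {{3, 1}, {1, 2}, {}, {3, 4}, {2, 1}}; }
```

In this toy model, the dominator sets computed from `originalLoop()` say the scalar loop (block 1) dominates the exit (block 2). Once the CFG looks like `vectorizedLoop()`, that is no longer true, because the exit is also reachable through the middle block. A client that still consults the old sets gets the stale answer.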

Reference: https://llvm.org/bugs/show_bug.cgi?id=30742

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso updated this revision to Diff 82694. (Dec 29 2016, 2:03 PM)

mssimpso retitled this revision to [LV] Update dominator tree before fixing external IV users.

mssimpso updated this object.

mssimpso added a reviewer: mkuper.

mssimpso added subscribers: llvm-commits, mcrosier.

Herald added a subscriber: mzolotukhin. (Dec 29 2016, 2:03 PM)

My gut feeling is that we really want to update DT as we go.
Even if all current uses of transform() except this one are safe (and I'm not at all sure about that), leaving this as is sounds very brittle, since any additional use of transform() can potentially break it.

In D28168#634459, @mkuper wrote:

My gut feeling is that we really want to update DT as we go.
Even if all current uses of transform() except this one are safe (and I'm not at all sure about that), leaving this as is sounds very brittle, since any additional use of transform() can potentially break it.

I totally agree. I'm happy to continue looking at this, but when I updated the DT at the beginning (right after we create the new loop structure), subsequent calls to transform caused the Expander to generate "new" canonical IVs in addition to the ones we already create. Any idea why it would do this? I haven't yet figured out why the Expander would be generating worse code after such a change.

For the Expander, it wants an IV that starts at zero and steps by one. But LV generates IVs that step by VFxUF.
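As a rough illustration of the mismatch described here, the sketch below lists the values taken by a vector-loop index that steps by VF x UF next to those of a canonical induction variable that starts at zero and steps by one. The helper names (`vectorTripCount`, `vectorLoopIndex`, `canonicalIndex`) are hypothetical, and the sketch says nothing about how the expander actually searches for an existing IV. It only shows that the two recurrences differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Trip count rounded down to a multiple of VF * UF. The vector loop covers
// this many scalar iterations; a scalar remainder loop covers the rest.
uint64_t vectorTripCount(uint64_t TripCount, unsigned VF, unsigned UF) {
  assert(VF > 0 && UF > 0 && "factors must be non-zero");
  const uint64_t Step = static_cast<uint64_t>(VF) * UF;
  return TripCount - TripCount % Step;
}

// Values of the vector loop's index: the recurrence {0,+,VF*UF}.
std::vector<uint64_t> vectorLoopIndex(uint64_t TripCount, unsigned VF,
                                      unsigned UF) {
  const uint64_t Step = static_cast<uint64_t>(VF) * UF;
  const uint64_t End = vectorTripCount(TripCount, VF, UF);
  std::vector<uint64_t> Values;
  for (uint64_t I = 0; I < End; I += Step)
    Values.push_back(I);
  return Values;
}

// Values a canonical induction variable {0,+,1} would take over the same
// number of vector-loop iterations.
std::vector<uint64_t> canonicalIndex(uint64_t NumIterations) {
  std::vector<uint64_t> Values;
  for (uint64_t I = 0; I < NumIterations; ++I)
    Values.push_back(I);
  return Values;
}
```

For example, with a trip count of 20, VF = 4 and UF = 2, the rounded-down trip count is 16 and the vector index takes the values 0 and 8, whereas a canonical IV over those two iterations would take 0 and 1.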

In D28168#634469, @mssimpso wrote:

In D28168#634459, @mkuper wrote:

My gut feeling is that we really want to update DT as we go.
Even if all current uses of transform() except this one are safe (and I'm not at all sure about that), leaving this as is sounds very brittle, since any additional use of transform() can potentially break it.

I totally agree. I'm happy to continue looking at this, but when I updated the DT at the beginning (right after we create the new loop structure), subsequent calls to transform caused the Expander to generate "new" canonical IVs in addition to the ones we already create. Any idea why it would do this? I haven't yet figured out why the Expander would be generating worse code after such a change.

For the Expander, it wants an IV that starts at zero and steps by one. But LV generates IVs that step by VFxUF.

I'm not really familiar with the expander, unfortunately.
In any case, I'm ok with this patch going in in the meanwhile. I think it still moves us in the right direction of updating DT earlier. :-)

Added a FIXME comment indicating that we should work towards updating the dominator tree earlier.

Should we commit this now so we can close the PR before the 4.0 branch? We can later try to resolve the SCEV issues and keep the dominator tree up-to-date as we go.

SGTM

This revision is now accepted and ready to land. (Jan 9 2017, 10:51 AM)

Closed by commit rL291462: [LV] Fix-up external IV users after updating dominator tree (authored by mssimpso). (Jan 9 2017, 11:16 AM)

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp (Size: 27 lines)

Path: llvm/trunk/test/Transforms/LoopVectorize/iv_outside_user.ll (Size: 45 lines)

Diff 83657

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ ... 777 lines not shown ... @@ protected:
   // Holds instructions from the original loop whose counterparts in the
   // vectorized loop would be trivially dead if generated. For example,
   // original induction update instructions can become dead because we
   // separately emit induction "steps" when generating code for the new loop.
   // Similarly, we create a new latch condition when setting up the structure
   // of the new loop, so the old one can become dead.
   SmallPtrSet<Instruction *, 4> DeadInstructions;
+
+  // Holds the end values for each induction variable. We save the end values
+  // so we can later fix-up the external users of the induction variables.
+  DenseMap<PHINode *, Value *> IVEndValues;
 };

 class InnerLoopUnroller : public InnerLoopVectorizer {
 public:
   InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
                     LoopInfo *LI, DominatorTree *DT,
                     const TargetLibraryInfo *TLI,
                     const TargetTransformInfo *TTI, AssumptionCache *AC,
@@ ... 2,618 lines not shown ... @@ void InnerLoopVectorizer::createEmptyLoop() {
   LoopVectorizationLegality::InductionList *List = Legal->getInductionVars();
   for (auto &InductionEntry : *List) {
     PHINode *OrigPhi = InductionEntry.first;
     InductionDescriptor II = InductionEntry.second;

     // Create phi nodes to merge from the backedge-taken check block.
     PHINode *BCResumeVal = PHINode::Create(
         OrigPhi->getType(), 3, "bc.resume.val", ScalarPH->getTerminator());
-    Value *EndValue;
+    Value *&EndValue = IVEndValues[OrigPhi];
     if (OrigPhi == OldInduction) {
       // We know what the end value is.
       EndValue = CountRoundDown;
     } else {
       IRBuilder<> B(LoopBypassBlocks.back()->getTerminator());
       Type *StepType = II.getStep()->getType();
       Instruction::CastOps CastOp =
           CastInst::getCastOpcode(CountRoundDown, true, StepType, true);
       Value *CRD = B.CreateCast(CastOp, CountRoundDown, StepType, "cast.crd");
       const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout();
       EndValue = II.transform(B, CRD, PSE.getSE(), DL);
       EndValue->setName("ind.end");
     }

     // The new PHI merges the original incoming value, in case of a bypass,
     // or the value at the end of the vectorized loop.
     BCResumeVal->addIncoming(EndValue, MiddleBlock);

-    // Fix up external users of the induction variable.
-    fixupIVUsers(OrigPhi, II, CountRoundDown, EndValue, MiddleBlock);
-
     // Fix the scalar body counter (PHI node).
     unsigned BlockIdx = OrigPhi->getBasicBlockIndex(ScalarPH);

     // The old induction's phi node in the scalar body needs the truncated
     // value.
     for (BasicBlock *BB : LoopBypassBlocks)
       BCResumeVal->addIncoming(II.getStartValue(), BB);
     OrigPhi->setIncomingValue(BlockIdx, BCResumeVal);
@@ ... 654 lines not shown ... @@ int IncomingEdgeBlockIdx =
         Phi->getBasicBlockIndex(OrigLoop->getLoopLatch());
     assert(IncomingEdgeBlockIdx >= 0 && "Invalid block index");
     // Pick the other block.
     int SelfEdgeBlockIdx = (IncomingEdgeBlockIdx ? 0 : 1);
     Phi->setIncomingValue(SelfEdgeBlockIdx, BCBlockPhi);
     Phi->setIncomingValue(IncomingEdgeBlockIdx, LoopExitInst);
   } // end of for each Phi in PHIsToFix.

-  fixLCSSAPHIs();
-
-  // Make sure DomTree is updated.
+  // Update the dominator tree.
+  //
+  // FIXME: After creating the structure of the new loop, the dominator tree is
+  // no longer up-to-date, and it remains that way until we update it
+  // here. An out-of-date dominator tree is problematic for SCEV,
+  // because SCEVExpander uses it to guide code generation. The
+  // vectorizer use SCEVExpanders in several places. Instead, we should
+  // keep the dominator tree up-to-date as we go.
   updateAnalysis();

+  // Fix-up external users of the induction variables.
+  for (auto &Entry : *Legal->getInductionVars())
+    fixupIVUsers(Entry.first, Entry.second,
+                 getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),
+                 IVEndValues[Entry.first], LoopMiddleBlock);
+
+  fixLCSSAPHIs();
   predicateInstructions();

   // Remove redundant induction instructions.
   cse(LoopVectorBody);
 }

 void InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi) {

@@ ... 3,528 lines not shown ... @@

llvm/trunk/test/Transforms/LoopVectorize/iv_outside_user.ll

@@ ... 127 lines not shown ... @@ for.body:
   br i1 %cmp, label %for.end, label %for.body

 for.end:
   %phi = phi i32 [ %inc, %for.body ]
   %phi2 = phi i32 [ %inc, %for.body ]
   store i32 %phi2, i32* %p
   ret i32 %phi
 }
+
+; CHECK-LABEL: @PR30742
+; CHECK: min.iters.checked
+; CHECK: %[[N_MOD_VF:.+]] = urem i32 %[[T5:.+]], 2
+; CHECK: %[[N_VEC:.+]] = sub i32 %[[T5]], %[[N_MOD_VF]]
+; CHECK: middle.block
+; CHECK: %[[CMP:.+]] = icmp eq i32 %[[T5]], %[[N_VEC]]
+; CHECK: %[[T15:.+]] = add i32 %tmp03, -7
+; CHECK: %[[T16:.+]] = shl i32 %[[N_MOD_VF]], 3
+; CHECK: %[[T17:.+]] = add i32 %[[T15]], %[[T16]]
+; CHECK: %[[T18:.+]] = shl i32 {{.*}}, 3
+; CHECK: %ind.escape = sub i32 %[[T17]], %[[T18]]
+; CHECK: br i1 %[[CMP]], label %BB3, label %scalar.ph
+define void @PR30742() {
+BB0:
+  br label %BB1
+
+BB1:
+  %tmp00 = load i32, i32* undef, align 16
+  %tmp01 = sub i32 %tmp00, undef
+  %tmp02 = icmp slt i32 %tmp01, 1
+  %tmp03 = select i1 %tmp02, i32 1, i32 %tmp01
+  %tmp04 = add nsw i32 %tmp03, -7
+  br label %BB2
+
+BB2:
+  %tmp05 = phi i32 [ %tmp04, %BB1 ], [ %tmp06, %BB2 ]
+  %tmp06 = add i32 %tmp05, -8
+  %tmp07 = icmp sgt i32 %tmp06, 0
+  br i1 %tmp07, label %BB2, label %BB3
+
+BB3:
+  %tmp08 = phi i32 [ %tmp05, %BB2 ]
+  %tmp09 = sub i32 %tmp00, undef
+  %tmp10 = icmp slt i32 %tmp09, 1
+  %tmp11 = select i1 %tmp10, i32 1, i32 %tmp09
+  %tmp12 = add nsw i32 %tmp11, -7
+  br label %BB4
+
+BB4:
+  %tmp13 = phi i32 [ %tmp12, %BB3 ], [ %tmp14, %BB4 ]
+  %tmp14 = add i32 %tmp13, -8
+  %tmp15 = icmp sgt i32 %tmp14, 0
+  br i1 %tmp15, label %BB4, label %BB1
+}
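
The arithmetic behind the `ind.end` and `ind.escape` values in the PR30742 test can be sanity-checked with a small scalar model. The sketch below is an assumption-laden illustration, not the vectorizer's code. It assumes the end value is Start + Step * N_VEC and that the value seen by an outside user of the phi itself is Start + Step * (N_VEC - 1), which is consistent with the CHECK lines but is not spelled out in this review. The names `runScalarLoop`, `roundDownTripCount`, `endValue`, and `escapeValue` are hypothetical, and 64-bit integers stand in for the test's i32 values.

```cpp
#include <cassert>
#include <cstdint>

struct ScalarResult {
  int64_t LastPhi;     // phi value on the final iteration (what %tmp08 sees)
  int64_t LastPostInc; // incremented value on the final iteration (%tmp06)
  uint64_t TripCount;  // number of times the loop body ran
};

// Scalar reference modeled on block BB2 of the test: the IV starts at Start,
// steps by -8, and the loop repeats while the incremented value is > 0.
ScalarResult runScalarLoop(int64_t Start) {
  int64_t Phi = Start;
  uint64_t Trip = 0;
  for (;;) {
    ++Trip;
    const int64_t Next = Phi - 8; // %tmp06 = add i32 %tmp05, -8
    if (!(Next > 0))              // %tmp07 = icmp sgt i32 %tmp06, 0
      return {Phi, Next, Trip};
    Phi = Next;
  }
}

// Trip count rounded down to a multiple of the vectorization step (VF * UF).
uint64_t roundDownTripCount(uint64_t TripCount, unsigned VFxUF) {
  assert(VFxUF > 0 && "step must be non-zero");
  return TripCount - TripCount % VFxUF;
}

// Assumed closed form for the IV value after VecTripCount scalar iterations.
int64_t endValue(int64_t Start, int64_t Step, uint64_t VecTripCount) {
  return Start + Step * static_cast<int64_t>(VecTripCount);
}

// Assumed closed form for the phi value on the last vectorized iteration.
int64_t escapeValue(int64_t Start, int64_t Step, uint64_t VecTripCount) {
  assert(VecTripCount > 0 && "vector loop must run at least once");
  return Start + Step * (static_cast<int64_t>(VecTripCount) - 1);
}
```

With Start = 25 the scalar loop runs four times and the phi's final value is 1. With a step of 2 the whole trip count is vectorized, and `escapeValue(25, -8, 4)` gives the same 1. With Start = 17 the trip count is 3, only 2 iterations are vectorized, and `endValue(17, -8, 2)` gives 1, the value from which a scalar remainder loop would resume.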