This is an archive of the discontinued LLVM Phabricator instance.

[CVP] Require DomTree for new Pass Manager
Closed, Public

Authored by dmgreen on May 15 2018, 10:20 AM.


Details

Reviewers

chandlerc
spatel
• dberlin

Summary

We were previously using a DT in CVP through SimplifyQuery, but not requiring it in
the new pass manager. This patch now gets the DT directly and plumbs it through to where it is
used (instead of accessing it through SQ).
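The failure mode in the summary can be sketched with toy types. This is a minimal, self-contained C++ illustration, not LLVM code: `ToyDomTree` and `ToyQuery` are hypothetical stand-ins for `DominatorTree` and `SimplifyQuery`. A query's `DT` member is an optional pointer that stays null when nothing computed the tree, while a function that takes the tree as an input states the requirement in its signature. The patch itself passes a `DominatorTree *` obtained from `AM.getResult<DominatorTreeAnalysis>(F)`, which is never null; the sketch uses a reference to make that guarantee explicit.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical stand-in for a dominator tree: it stores "block A dominates
// block B" facts explicitly instead of computing them from a CFG.
struct ToyDomTree {
  std::set<std::pair<int, int>> Dominates;

  bool dominates(int DefBB, int UseBB) const {
    // Every block dominates itself; otherwise look the fact up.
    return DefBB == UseBB || Dominates.count(std::make_pair(DefBB, UseBB)) != 0;
  }
};

// Hypothetical stand-in for a query object whose DT member is optional.
struct ToyQuery {
  const ToyDomTree *DT = nullptr; // null if nobody computed a tree
};

// Pre-patch shape: reach through the query. The real pre-patch code called
// SQ.DT->dominates(...) unconditionally, which dereferences null when no DT
// was available; this sketch adds a guard only so that it stays runnable.
bool validViaQuery(const ToyQuery &Q, int DefBB, int UseBB) {
  if (!Q.DT)
    return false; // guard added for the sketch, not present in the original
  return Q.DT->dominates(DefBB, UseBB);
}

// Post-patch shape: the tree is a required input, so it cannot be missing.
bool validWithRequiredDT(const ToyDomTree &DT, int DefBB, int UseBB) {
  return DT.dominates(DefBB, UseBB);
}
```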

Diff Detail

Event Timeline

dmgreen created this revision. May 15 2018, 10:20 AM

dmgreen added inline comments.

lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
162	Required here :)

There must be a way to test this? I.e., how did you notice this bug?

Ah, test, yes. I found this with a downstream pass that, unlike jump threading, doesn't preserve the DT. I'll see if I can come up with something.

chandlerc added inline comments. May 15 2018, 5:14 PM

lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
762–765	I really don't like this use of getBestSimplifyQuery. I would strongly suggest passing the analyses you want to pass into SimplifyQuery. Not doing that makes the pass subtly dependent on pass ordering and other vagaries.

I've done some plumbing to pass the DT directly through to where it is used. The only other use of SQ is the call to SimplifyInstruction. I hope this makes the use of getBestSimplifyQuery OK. If not, which analyses should be used to create it? All of them (DT, DL, TLI and AC)? And why do we have getBestSimplifyQuery? ;)

I've also added new pass manager RUN lines to a couple of tests. add.ll fails without requiring the DT.

So, this seems very confused.
getBestSimplifyQuery simply queries the analysis manager to see what passes are available.

This was used to replace something even *more* vague, where passes were randomly plumbing (or not plumbing) whatever pieces they had into SimplifyInstruction in various places, and were doing it quite wrong.
It was generally considered a significant improvement over what existed before.
You can look at what it replaced and make that decision.

All that said, it does not *require* anything.
Nor does SimplifyInstruction, which was deliberately built not to require a DomTree to function properly, and is used in places where the DomTree is not up to date or correct.
Hence, getBestSimplifyQuery uses what is available, as do the SimplifyQuery structure and SimplifyInstruction.

It is only meant to be used with SimplifyInstruction (SI), and using it elsewhere is a bad plan.

Otherwise, I strongly disagree with Chandler about going back to constructing SimplifyQuery objects directly. It was wrong often enough that this was clearly not a good way of doing things, IMHO.
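The behavior described above (getBestSimplifyQuery uses whatever is available and requires nothing, and SimplifyInstruction must still work without a DomTree) can be modeled in a few lines. This is a hypothetical C++ sketch: `ToyAnalysisManager`, `getBestQuery` and `simplifyPower` are invented for illustration and do not mirror the real signatures of LLVM's `FunctionAnalysisManager`, `getBestSimplifyQuery`, or `SimplifyInstruction`. The "cached" accessor plays the role of a cache lookup that never computes anything, and `simplifyPower` is a crude stand-in for how much a simplifier can do with the analyses it was handed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a dominator tree result.
struct ToyDomTree {};

// Hypothetical function analysis manager with a one-slot cache.
class ToyAnalysisManager {
  std::unique_ptr<ToyDomTree> DT;

public:
  // "Require" style: computes the tree on a cache miss.
  ToyDomTree &getDomTree() {
    if (!DT)
      DT = std::make_unique<ToyDomTree>();
    return *DT;
  }
  // "Cached" style: never computes anything; may return null.
  ToyDomTree *getCachedDomTree() const { return DT.get(); }
  // Models a transform that does not preserve the tree.
  void invalidate() { DT.reset(); }
};

struct ToyQuery {
  const ToyDomTree *DT = nullptr; // optional, like the DT in a SimplifyQuery
};

// Models the idea behind getBestSimplifyQuery: use whatever is cached,
// require nothing.
ToyQuery getBestQuery(const ToyAnalysisManager &AM) {
  ToyQuery Q;
  Q.DT = AM.getCachedDomTree();
  return Q;
}

// Models a simplifier that, like SimplifyInstruction, must work without a
// DT. It returns how many kinds of reasoning it could apply.
int simplifyPower(const ToyQuery &Q) {
  int Power = 1; // purely local folds never need a DT
  if (Q.DT)
    Power += 1; // dominance-based folds only when a DT happens to be present
  return Power;
}
```

Note that the simplifier never fails for lack of a DT; it just does less, which is exactly why the outcome depends on what the cache holds.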

dmgreen edited the summary of this revision. May 16 2018, 9:16 AM

I may have sown some confusion here by not updating commit messages as the code changed.

The code here in CVP currently:

use getBestSimplifyQuery to get the best SQ available (but not require a DT in the NPM)
use the SQ in SQ.DT->dominates (causing the problem) and SimplifyInstruction(.., SQ)

With this patch it:

gets DT directly and passes it through to where it's needed (i.e. DT->dominates)
uses getBestSimplifyQuery only for SimplifyInstruction(.., SQ)
no longer uses SQ.DT directly

Everyone happy (enough) with this state of affairs?

In D46893#1104202, @dmgreen wrote:

I may have sown some confusion here by not updating commit messages as the code changed.

The code here in CVP currently:

use getBestSimplifyQuery to get the best SQ available (but not require a DT in the NPM)

use the SQ in SQ.DT->dominates (causing the problem) and SimplifyInstruction(.., SQ)

Yeah, this is a bad idea :)

With this patch it:

gets DT directly and passes it through to where it's needed (i.e. DT->dominates)

uses getBestSimplifyQuery only for SimplifyInstruction(.., SQ)

no longer uses SQ.DT directly

This is a good idea ;)

Everyone happy (enough) with this state of affairs?

I am for sure.

I'll mark this accepted, but please wait until Monday to see if anyone has any more concerns.

This revision is now accepted and ready to land. May 18 2018, 8:49 AM

I'm happy as long as we have a test. :)

I haven't stepped through this to understand what actually happens here.

In the add.ll file, the new PM crashes. Explain that with a comment in the test file and/or commit message?
I don't see any difference in phi-common-val.ll - all of the auto-generated assertions pass even without this patch. Augment the run/assertions to show the failure (if it's visible there some other way)?

In D46893#1101272, @dberlin wrote:

So, this seems very confused.
getBestSimplifyQuery simply queries the analysis manager to see what passes are available.

This was used to replace something even *more* vague, where passes were randomly plumbing (or not plumbing) whatever pieces they had into SimplifyInstruction in various places, and were doing it quite wrong.
It was generally considered a significant improvement over what existed before.
You can look at what it replaced and make that decision.

All that said, it does not *require* anything.
Nor does SimplifyInstruction, which was deliberately built not to require a DomTree to function properly, and is used in places where the DomTree is not up to date or correct.
Hence, getBestSimplifyQuery uses what is available, as do the SimplifyQuery structure and SimplifyInstruction.

It is only meant to be used with SimplifyInstruction (SI), and using it elsewhere is a bad plan.

Otherwise, I strongly disagree with Chandler about going back to constructing SimplifyQuery objects directly. It was wrong often enough that this was clearly not a good way of doing things, IMHO.

I think we're miscommunicating about this.

I'm not necessarily suggesting constructing SimplifyQuery directly. I'm not trying to suggest any previous API is good.

I'm trying to point out that we're going to have a debugging and stability problem if we rely heavily on getting "available" passes in the new PM that is significantly different from the old PM. I'll try to explain the underlying problem here in a bit more detail so that we're on the same page.

In the old PM, getting the available passes was a really nice approach. Because the schedule of pass runs was decided up-front, getting available analyses essentially said "whatever the pass pipeline looks like, use what is there". This is a great optimization and avoids lots of overly tight coupling between passes and their schedule. But the results were *extremely predictable* -- there would be exactly one set of analyses available for a pass in a given position in the pipeline. I see zero problems here, and the API we're discussing does indeed seem way, way better than manually building SimplifyQuery.

But in the new PM, we have a very different situation. Because passes are cached, the "available" thing is really subtle. It isn't static, it is dynamic. One module may have that analysis available, while a *very* subtly different module won't. You can even have action at a distance with this, where one function changes in the module, and on a completely unrelated function the analysis is no longer available. I see at least three issues with this:

It will make it very hard to debug missed optimizations because it is much more complicated to reason about. For example, it will make bugpoint reduction of failures substantially harder as now you'll end up needing to preserve lots of very inexplicable things to get particular behaviors to occur.
If we ever want to add memory usage limiting functionality to the new PM (which everyone was expecting in the early days but hasn't been needed thus far), those memory saving clears of the cache *will change optimization power*.
It may in rare cases hurt users where seemingly benign refactorings and changes cause the optimizer to improve and regress in ways they can't understand and don't expect.

Now, all I'm trying to say here is that getting available analyses in the new PM is substantially more risky than in the old PM. I'm not suggesting any particular API approach. Just saying that in the new PM, especially with analyses being somewhat cheaper due to caching, we should be skewing much, much further towards requiring analyses rather than getting them if available.

That still could mean the raw SimplifyQuery API is bad, and that we need a better API. Totally on board with that. I'm just hoping that, to the extent possible, we can design APIs that actually require the analyses in the new PM rather than using whatever happens to be sitting in the cache.

And that also doesn't mean this patch isn't an incremental improvement over the prior state necessarily. I'm perfectly happy if we need to fix this in a follow-up. I just want people to be aware of and understand the risk and issues here.
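The cache dependence described above, where the same consumer pass sees a DomTree or not depending on what happened earlier in the pipeline, can be shown with another hypothetical C++ sketch. The names (`ToyAnalysisManager`, `Strategy`, `consumerUsesDT`, `runPipeline`) are invented for illustration. The two strategies stand for "use it if cached" and "require it"; the latter is what this patch switches CVP to for its dominance check.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a dominator tree result.
struct ToyDomTree {};

// Hypothetical analysis manager with a one-slot cache.
class ToyAnalysisManager {
  std::unique_ptr<ToyDomTree> DT;

public:
  ToyDomTree &getDomTree() { // computes on a cache miss
    if (!DT)
      DT = std::make_unique<ToyDomTree>();
    return *DT;
  }
  ToyDomTree *getCachedDomTree() const { return DT.get(); } // may be null
  void invalidate() { DT.reset(); }
};

enum class Strategy { UseCached, Require };

// Hypothetical consumer pass: reports whether it had a DT to work with.
bool consumerUsesDT(ToyAnalysisManager &AM, Strategy S) {
  const ToyDomTree *DT =
      S == Strategy::Require ? &AM.getDomTree() : AM.getCachedDomTree();
  return DT != nullptr;
}

// Hypothetical two-step pipeline. An earlier pass computes a DT; whether it
// is still cached when the consumer runs depends on something unrelated to
// the consumer, which is the "action at a distance" in the discussion.
bool runPipeline(bool DTStillCached, Strategy S) {
  ToyAnalysisManager AM;
  AM.getDomTree(); // some earlier pass computed a DT
  if (!DTStillCached)
    AM.invalidate(); // an intervening change invalidated it
  return consumerUsesDT(AM, S);
}
```

With `Strategy::UseCached` the consumer's behavior flips with the invalidation; with `Strategy::Require` it does not, at the price of recomputing the tree when it is missing (which caching in the new PM makes comparatively cheap).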

chandlerc added inline comments. May 18 2018, 10:22 AM

lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
760	And this makes it somewhat obvious that this patch is an incremental improvement -- it removes one dynamic aspect of this. So, this patch definitely LGTM, but I think we also want to think about what a better API for building SimplifyQuery looks like in the new PM world where we have more of these issues around cached analyses being unpredictable.

Thanks for the info. I see the problem. I will commit this, to make it so this isn't crashing, and we can go from there.

I've changed the test here to be a new test in phi-common-val.ll that shows the error. No longer using the fact that add.ll happened to crash.

rL332836
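Why did the earlier phi-common-val.ll cases pass even without the fix? In the diff, the dominance check in simplifyCommonValuePhi is guarded by `dyn_cast<Instruction>(CommonValue)`, so the DT is only dereferenced when the common value is an instruction. The existing case shown in the diff (@simplify_phi_common_value_op0) appears to use the function argument %ptr as the common value, whereas the new @simplify_phi_common_value_from_instruction test uses the result of a getelementptr. The hypothetical C++ sketch below models only that branch; `ValueKind`, `ToyDomTree` and `commonValueIsValid` are toy names, not LLVM's.

```cpp
#include <cassert>

// Hypothetical model of an IR value: only the distinction that matters here.
enum class ValueKind { Argument, Instruction };

// Hypothetical dominator tree that counts how often it is queried.
struct ToyDomTree {
  mutable int Queries = 0;
  bool dominates(int DefBB, int PhiBB) const {
    ++Queries;
    // Toy rule for the sketch: lower-numbered blocks dominate higher ones.
    return DefBB < PhiBB;
  }
};

// Mirrors the shape of the check in simplifyCommonValuePhi: non-instruction
// values (such as function arguments) skip the dominance query entirely.
bool commonValueIsValid(ValueKind K, int DefBB, int PhiBB,
                        const ToyDomTree *DT) {
  if (K != ValueKind::Instruction)
    return true;                      // DT is never touched on this path
  return DT->dominates(DefBB, PhiBB); // a null DT would crash here
}
```

A test whose common value is an argument therefore never reaches the null dereference, which is consistent with the auto-generated assertions passing without the patch.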

I also forgot to mention I took a quick look and it appears that getBestSimplifyQuery is only used in two places, here in CVP and LoopRotate.

As LoopRotate is a loop pass, it will always have access to all the analyses via LoopStandardAnalysisResults.

I'm pretty sure that TLI and AC are both cheap enough that we could just always require them in getBestSimplifyQuery (at least at no cost within getBestSimplifyQuery itself; I guess having them could still add cost inside InstructionSimplify). That would still leave the question of what to do about DT.
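The middle ground suggested above (always require the cheap analyses, and leave only the DomTree as "use if cached") might look like the following hypothetical C++ sketch. None of these names exist in LLVM: `getQueryRequiringCheapAnalyses` is an invented helper, and `ToyTLI`/`ToyAC` merely stand in for TargetLibraryInfo and AssumptionCache.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for analysis results.
struct ToyTLI {};
struct ToyAC {};
struct ToyDomTree {};

// Hypothetical analysis manager with one cache slot per analysis.
class ToyAnalysisManager {
  std::unique_ptr<ToyTLI> TLI;
  std::unique_ptr<ToyAC> AC;
  std::unique_ptr<ToyDomTree> DT;

  // Computes the result on a cache miss, then returns the cached object.
  template <typename T> static T &getOrCompute(std::unique_ptr<T> &Slot) {
    if (!Slot)
      Slot = std::make_unique<T>();
    return *Slot;
  }

public:
  ToyTLI &getTLI() { return getOrCompute(TLI); }
  ToyAC &getAC() { return getOrCompute(AC); }
  ToyDomTree &getDomTree() { return getOrCompute(DT); }
  ToyDomTree *getCachedDomTree() const { return DT.get(); } // may be null
};

struct ToyQuery {
  const ToyTLI *TLI = nullptr;
  const ToyAC *AC = nullptr;
  const ToyDomTree *DT = nullptr;
};

// Cheap analyses are always present regardless of pipeline history; only
// the DT still depends on what happens to be cached.
ToyQuery getQueryRequiringCheapAnalyses(ToyAnalysisManager &AM) {
  ToyQuery Q;
  Q.TLI = &AM.getTLI();
  Q.AC = &AM.getAC();
  Q.DT = AM.getCachedDomTree();
  return Q;
}
```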

Revision Contents

Path                                                          Size
lib/Transforms/Scalar/CorrelatedValuePropagation.cpp          22 lines
test/Transforms/CorrelatedValuePropagation/phi-common-val.ll  31 lines

Diff 147751

lib/Transforms/Scalar/CorrelatedValuePropagation.cpp

[… 130 unchanged lines not shown …]
 ///   br i1 %isnull, label %bb2, label %bb1
 /// bb1:
 ///   br label %bb2
 /// bb2:
 ///   %r = phi i8* [ %x, %bb1 ], [ null, %bb0 ]
 /// -->
 ///   %r = %x
 static bool simplifyCommonValuePhi(PHINode *P, LazyValueInfo *LVI,
-                                   const SimplifyQuery &SQ) {
+                                   DominatorTree *DT) {
   // Collect incoming constants and initialize possible common value.
   SmallVector<std::pair<Constant *, unsigned>, 4> IncomingConstants;
   Value *CommonValue = nullptr;
   for (unsigned i = 0, e = P->getNumIncomingValues(); i != e; ++i) {
     Value *Incoming = P->getIncomingValue(i);
     if (auto *IncomingConstant = dyn_cast<Constant>(Incoming)) {
       IncomingConstants.push_back(std::make_pair(IncomingConstant, i));
     } else if (!CommonValue) {
       // The potential common value is initialized to the first non-constant.
       CommonValue = Incoming;
     } else if (Incoming != CommonValue) {
       // There can be only one non-constant common value.
       return false;
     }
   }
 
   if (!CommonValue || IncomingConstants.empty())
     return false;
 
   // The common value must be valid in all incoming blocks.
   BasicBlock *ToBB = P->getParent();
   if (auto *CommonInst = dyn_cast<Instruction>(CommonValue))
-    if (!SQ.DT->dominates(CommonInst, ToBB))
+    if (!DT->dominates(CommonInst, ToBB))
       return false;
 
   // We have a phi with exactly 1 variable incoming value and 1 or more constant
   // incoming values. See if all constant incoming values can be mapped back to
   // the same incoming variable value.
   for (auto &IncomingConstant : IncomingConstants) {
     Constant *C = IncomingConstant.first;
     BasicBlock *IncomingBB = P->getIncomingBlock(IncomingConstant.second);
     if (C != LVI->getConstantOnEdge(CommonValue, IncomingBB, ToBB, P))
       return false;
   }
 
   // All constant incoming values map to the same variable along the incoming
   // edges of the phi. The phi is unnecessary.
   P->replaceAllUsesWith(CommonValue);
   P->eraseFromParent();
   ++NumPhiCommon;
   return true;
 }
 
-static bool processPHI(PHINode *P, LazyValueInfo *LVI,
+static bool processPHI(PHINode *P, LazyValueInfo *LVI, DominatorTree *DT,
                        const SimplifyQuery &SQ) {
   bool Changed = false;
 
   BasicBlock *BB = P->getParent();
   for (unsigned i = 0, e = P->getNumIncomingValues(); i < e; ++i) {
     Value *Incoming = P->getIncomingValue(i);
     if (isa<Constant>(Incoming)) continue;
 
[… 45 unchanged lines not shown; the following lines are still inside processPHI …]
 
   if (Value *V = SimplifyInstruction(P, SQ)) {
     P->replaceAllUsesWith(V);
     P->eraseFromParent();
     Changed = true;
   }
 
   if (!Changed)
-    Changed = simplifyCommonValuePhi(P, LVI, SQ);
+    Changed = simplifyCommonValuePhi(P, LVI, DT);
 
   if (Changed)
     ++NumPhis;
 
   return Changed;
 }
 
 static bool processMemAccess(Instruction *I, LazyValueInfo *LVI) {
[… 410 unchanged lines not shown; the following lines are inside static Constant *getConstantAt(Value *V, Instruction *At, LazyValueInfo *LVI) …]
   if (Result == LazyValueInfo::Unknown)
     return nullptr;
 
   return (Result == LazyValueInfo::True) ?
     ConstantInt::getTrue(C->getContext()) :
     ConstantInt::getFalse(C->getContext());
 }
 
-static bool runImpl(Function &F, LazyValueInfo *LVI, const SimplifyQuery &SQ) {
+static bool runImpl(Function &F, LazyValueInfo *LVI, DominatorTree *DT,
+                    const SimplifyQuery &SQ) {
   bool FnChanged = false;
   // Visiting in a pre-order depth-first traversal causes us to simplify early
   // blocks before querying later blocks (which require us to analyze early
   // blocks). Eagerly simplifying shallow blocks means there is strictly less
   // work to do for deep blocks. This also means we don't visit unreachable
   // blocks.
   for (BasicBlock *BB : depth_first(&F.getEntryBlock())) {
     bool BBChanged = false;
     for (BasicBlock::iterator BI = BB->begin(), BE = BB->end(); BI != BE;) {
       Instruction *II = &*BI++;
       switch (II->getOpcode()) {
       case Instruction::Select:
         BBChanged |= processSelect(cast<SelectInst>(II), LVI);
         break;
       case Instruction::PHI:
-        BBChanged |= processPHI(cast<PHINode>(II), LVI, SQ);
+        BBChanged |= processPHI(cast<PHINode>(II), LVI, DT, SQ);
         break;
       case Instruction::ICmp:
       case Instruction::FCmp:
         BBChanged |= processCmp(cast<CmpInst>(II), LVI);
         break;
       case Instruction::Load:
       case Instruction::Store:
         BBChanged |= processMemAccess(II, LVI);
[… 48 unchanged lines not shown; the following lines are still inside runImpl …]
   return FnChanged;
 }
 
 bool CorrelatedValuePropagation::runOnFunction(Function &F) {
   if (skipFunction(F))
     return false;
 
   LazyValueInfo *LVI = &getAnalysis<LazyValueInfoWrapperPass>().getLVI();
-  return runImpl(F, LVI, getBestSimplifyQuery(*this, F));
+  DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
+
+  return runImpl(F, LVI, DT, getBestSimplifyQuery(*this, F));
 }
 
 PreservedAnalyses
 CorrelatedValuePropagationPass::run(Function &F, FunctionAnalysisManager &AM) {
 
   LazyValueInfo *LVI = &AM.getResult<LazyValueAnalysis>(F);
-  bool Changed = runImpl(F, LVI, getBestSimplifyQuery(AM, F));
+  DominatorTree *DT = &AM.getResult<DominatorTreeAnalysis>(F);
+
+  bool Changed = runImpl(F, LVI, DT, getBestSimplifyQuery(AM, F));
 
   if (!Changed)
     return PreservedAnalyses::all();
   PreservedAnalyses PA;
   PA.preserve<GlobalsAA>();
   return PA;
 }

test/Transforms/CorrelatedValuePropagation/phi-common-val.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -correlated-propagation -S | FileCheck %s
+; RUN: opt < %s -passes="correlated-propagation" -S | FileCheck %s
 
 define i8* @simplify_phi_common_value_op0(i8* %ptr, i32* %b) {
 ; CHECK-LABEL: @simplify_phi_common_value_op0(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[ISNULL:%.*]] = icmp eq i8* [[PTR:%.*]], null
 ; CHECK-NEXT: br i1 [[ISNULL]], label [[RETURN:%.*]], label [[ELSE:%.*]]
 ; CHECK: else:
 ; CHECK-NEXT: [[LB:%.*]] = load i32, i32* [[B:%.*]]
[… 76 unchanged lines not shown; the following lines are in the else2: block …]
   store i32 %add, i32* %b
   br label %return
 
 return:
   %r = phi i8 [ 0, %entry], [ %x, %else2 ], [ 42, %else1 ]
   ret i8 %r
 }
+
+define i8* @simplify_phi_common_value_from_instruction(i8* %ptr_op, i32* %b, i32 %i) {
+; CHECK-LABEL: @simplify_phi_common_value_from_instruction(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[PTR:%.*]] = getelementptr i8, i8* [[PTR_OP:%.*]], i32 [[I:%.*]]
+; CHECK-NEXT: [[ISNULL:%.*]] = icmp eq i8* [[PTR]], null
+; CHECK-NEXT: br i1 [[ISNULL]], label [[RETURN:%.*]], label [[ELSE:%.*]]
+; CHECK: else:
+; CHECK-NEXT: [[LB:%.*]] = load i32, i32* [[B:%.*]]
+; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[LB]], 1
+; CHECK-NEXT: store i32 [[ADD]], i32* [[B]]
+; CHECK-NEXT: br label [[RETURN]]
+; CHECK: return:
+; CHECK-NEXT: ret i8* [[PTR]]
+;
+entry:
+  %ptr = getelementptr i8, i8* %ptr_op, i32 %i
+  %isnull = icmp eq i8* %ptr, null
+  br i1 %isnull, label %return, label %else
+
+else:
+  %lb = load i32, i32* %b
+  %add = add nsw i32 %lb, 1
+  store i32 %add, i32* %b
+  br label %return
+
+return:
+  %r = phi i8* [ %ptr, %else ], [ null, %entry ]
+  ret i8* %r
+}