This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Analysis/
-
Analysis/
8
LazyValueInfo.cpp
-
test/Transforms/CorrelatedValuePropagation/
-
Transforms/
-
CorrelatedValuePropagation/
-
identity.ll

Differential D25075

[LazyValueInfo] Understand monotonically increasing loops and identity values
AcceptedPublic

Authored by jmolloy on Sep 30 2016, 12:19 AM.

Download Raw Diff

Details

Reviewers

reames
gberry
nlewycky

Summary

We have a very broad-brush bailout in JumpThreading - we never thread over a
loop header. This is so we don't create irreducible control flow.

However, as mentioned in the existing comments there are situations where we'd
want to thread over a loop header and it can create significant improvements in
some cases.

As it's hard to know if irreducible control flow would be created, handle
specially the case where we thread from predecessor to itself - essentially
generating a single block loop. Single block loops aren't irreducible by
definition and are easy to check.

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 72992.Sep 30 2016, 12:19 AM

jmolloy retitled this revision from to [JumpThreading] Allow threading over loop headers when creating single block loops.

jmolloy updated this object.

jmolloy added a reviewer: reames.

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: llvm-commits.

reames added inline comments.Oct 1 2016, 5:52 PM

lib/Transforms/Scalar/JumpThreading.cpp
1460 ↗	(On Diff #72992)	I'm more than a bit nervous about this check. I'm trying to convince myself it's correct and am struggling.
1596 ↗	(On Diff #72992)	Some form of incremental update would be good.
test/Transforms/JumpThreading/singleblock-loop.ll
7 ↗	(On Diff #72992)	Please give more detailed checks. It's really hard to tell what the outcome here is.
12 ↗	(On Diff #72992)	Looking at the example, it seems like we actually have two sub-cases you want to handle: thread a branch through a header to a loop exit thread a branch through a header to itself (essentially removing the header from the loop) The former is obviously correct and easy to reason about. The second is the tricky case if we want to handle it generally while avoiding irreducible control flow. I wonder if this would be easier to reason about if you broke apart each sub-case explicitly?
18 ↗	(On Diff #72992)	Not related to this review, but we should be able to prove this is "3". Is the actual case you're worried about this fragile? If so, there are other simpler patches which might work.

majnemer added a subscriber: majnemer.Oct 1 2016, 6:36 PM

majnemer added inline comments.

lib/Transforms/Scalar/JumpThreading.cpp
1460 ↗	(On Diff #72992)	You could use `is_contained(PredBBs, SuccBB)` as a shorthand.

Hi Philip,

Thanks for your review. Based on that I was able to go back and re-look with fresh eyes at the problem.

As you mentioned, in the motivating testcase we should have been able to prove that the value of the incremented PHI was always greater than or equal to 3. You were right that this exposed an underlying problem - in fact there were several things I needed to fix:

LazyValueInfo gets confused by trivial loops. I have added a couple of special cases that I think are very common:
1. A PHI incoming value that is the PHI itself (%x = phi %x, %y) can be ignored, because it cannot contribute usefully to its own range.
2. A PHI incoming value that can be trivially proved monotonically increasing (an induction variable for instance) can have its range more precisely calculated.
JumpThreading should consider all candidates for threading and not just the most profitable one.
- If the most profitable candidate fails legality checks, it should try others and not simply bail.
JumpThreading should invalidate LVI when unfolding switches
- The unfolding changes the CFG and creates new edges that can now have more specific values than before.

Mixed in there is also a change to make JumpThreading use and keep up to date LoopInfo.

This patch is obviously too large, has conflated several concerns and needs to be split apart. However I just wanted to get your input on the direction - is there anything particularly objectionable, before I go split it all out into subpatches?

Cheers,

James

Commenting mostly on the LVI parts to start with. I request that you split those out into a separate change with tests.

lib/Analysis/LazyValueInfo.cpp
927	Nope. This will break key parts about LVI. The problem here is that there can be facts learned about the Phi along a particular path. The classic example is the backedge test of a loop can tell you that the value along the backedge is restricted to say < 400.
931	I see where you're going here, but this is the wrong way to structure this. You've got two reasonable approaches: Compute the existing result and then intersect it with the result computed this way Integrate this into the path sensitive logic. Doing this as the base case (i.e. instead of overdefined) might be reasonable.

This revision now requires changes to proceed.Oct 3 2016, 12:03 PM

Thanks Philip for the prompt review. Just one question for now before I rework this tomorrow.

lib/Analysis/LazyValueInfo.cpp
927	How is this actually supposed to work? My debugging shows that we recurse into getEdgeValue and because we end up in the same state we've been before, we stop recursing and just return overdefined. Surely we can be more clever than this - at the moment I can't see how even trivial loops are managed, but I must be missing something.

Hi Philip,

Thanks for the review. After spending a day poring over LVI, I think I've got something more palatable.

I investigated trying to understand monotonic increments in the base case (getEdgeValue and friends), but I drew a blank there. When analyzing PHI incoming values, we obviously haven't fully analyzed the PHI itself. Because those incoming values rely on the PHI, we really have no choice but to return a conservative result (overdefined). I experimented with providing a "partially evaluated PHI" stack but couldn't get a solution that returned conservatively correct results - really, it seems we need to look at a PHI holistically.

This new patch sits in solveBlockValuePHINode as before but has the big change you mentioned in your review; it intersect()s where applicable so we don't clobber any interesting information we may have picked up in getEdgeValue().

The one thing I don't like about this new diff is the "TheCache.eraseValue(PhiVal);" call. We need this because, while analyzing the PHI we will have analyzed the increment (in getEdgeValue) and decided it gives us no information (ConstantRange<full-set>). Later, we understand that increment in the context of the PHI it is incrementing and now have a more precise definition for it, but the full-set value is still cached. I erase it here to ensure it gets recomputed, but given that's never done elsewhere I'm not sure that's right.

Cheers,

James

Ping!

Adding more reviewers.

javed.absar added a subscriber: javed.absar.Oct 17 2016, 7:21 AM

Detailed comments inline. This change is conceptually complicated even if a small textual change. I'd suggest thinking about how to factor out complexity into separate changes.

lib/Analysis/LazyValueInfo.cpp
935	I had this whole long response written on why this was wrong when I realized I misread what you had in the new draft. :) I believe this piece is correct. Please separate it into it's own review with test cases. The complexity here is that Result is a speculative set of facts. This is relying on the fact that any result computed (after the entire loop), must be correct for the phi input. You're implicitly assuming that a later fact will force the lattice to a conservative value if needed. Yep, that sounds reasonable.
940	My (strongly) suggested factoring would be: Extract the current loop as a helper function called solveBlockValuePHINodeImpl Analyze the phi using your new logic in a helper function which returns a LatticeValue. Call both helper functions exactly once and return the intersected results. Another possible structure would be: Create the helper function mentioned in 2 above. Intersect it's results with the return value of getEdgeValue in the loop. The later will produce slightly more precise results than the former when we know a fact about the incoming value into the loop, but would otherwise loose all knowledge in the loop.
950	Using the result part was through the loop in this manner is incorrect. It contains a speculative set of facts which may be later misproven. p.s. I could see an line of argument in which this might be safe for this particular use, but a) I'm not sure that's what you intended and b) if it is, it needs to be clearly documented. p.p.s. I wrote this before revising my comment on the PhiVal == PN case. I'm now reasonable sure you're doing this intentionally, but please make the argument in comment form. :) p.p.p.s. I don't believe there is any guarantee as to the order in which we visit values in a phi node. In particular, we may not have visited the preheader yet...
952	This whole chain of logic can become ConstantRange::makeNoWrapRegion
960	Manually playing with the cache is a really dangerous thing to be doing and will need to be solidly justified. Please implement a variant of this without this piece, then post a separate review for this with clear test cases motivating it.

This revision now requires changes to proceed.Oct 17 2016, 3:36 PM

Hi Philip,

Thanks again. I've separated out the first, trivial, change and isolated a testcase for it.

I'm fairly sure the cache invalidation is required, but I will make that case in a followup phab review with the rest of the code as you suggested.

Cheers,
James

LGTM, sorry for the delay.

This revision is now accepted and ready to land.Nov 30 2016, 6:27 PM

mcrosier resigned from this revision.Mar 9 2017, 8:52 AM

Revision Contents

Path

Size

lib/

Analysis/

LazyValueInfo.cpp

5 lines

test/

Transforms/

CorrelatedValuePropagation/

identity.ll

30 lines

Diff 74991

lib/Analysis/LazyValueInfo.cpp

Show First 20 Lines • Show All 918 Lines • ▼ Show 20 Lines	bool LazyValueInfoImpl::solveBlockValuePHINode(LVILatticeVal &BBLV,
// Loop over all of our predecessors, merging what we know from them into		// Loop over all of our predecessors, merging what we know from them into
// result.		// result.
bool EdgesMissing = false;		bool EdgesMissing = false;
for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) {		for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) {
BasicBlock *PhiBB = PN->getIncomingBlock(i);		BasicBlock *PhiBB = PN->getIncomingBlock(i);
Value *PhiVal = PN->getIncomingValue(i);		Value *PhiVal = PN->getIncomingValue(i);
LVILatticeVal EdgeResult;		LVILatticeVal EdgeResult;
// Note that we can provide PN as the context value to getEdgeValue, even		// Note that we can provide PN as the context value to getEdgeValue, even
// though the results will be cached, because PN is the value being used as		// though the results will be cached, because PN is the value being used as
		reamesUnsubmitted Not Done Reply Inline Actions Nope. This will break key parts about LVI. The problem here is that there can be facts learned about the Phi along a particular path. The classic example is the backedge test of a loop can tell you that the value along the backedge is restricted to say < 400. reames: Nope. This will break key parts about LVI. The problem here is that there can be facts…
		jmolloyAuthorUnsubmitted Not Done Reply Inline Actions How is this actually supposed to work? My debugging shows that we recurse into getEdgeValue and because we end up in the same state we've been before, we stop recursing and just return overdefined. Surely we can be more clever than this - at the moment I can't see how even trivial loops are managed, but I must be missing something. jmolloy: How is this actually supposed to work? My debugging shows that we recurse into getEdgeValue…
// the cache key in the caller.		// the cache key in the caller.
EdgesMissing \|= !getEdgeValue(PhiVal, PhiBB, BB, EdgeResult, PN);		EdgesMissing \|= !getEdgeValue(PhiVal, PhiBB, BB, EdgeResult, PN);
if (EdgesMissing)		if (EdgesMissing)
continue;		continue;
		reamesUnsubmitted Not Done Reply Inline Actions I see where you're going here, but this is the wrong way to structure this. You've got two reasonable approaches: Compute the existing result and then intersect it with the result computed this way Integrate this into the path sensitive logic. Doing this as the base case (i.e. instead of overdefined) might be reasonable. reames: I see where you're going here, but this is the wrong way to structure this. You've got two…

		// If this is an identity arc, we cannot be less specific now than we were
		// before.
		if (PhiVal == PN)
		reamesUnsubmitted Not Done Reply Inline Actions I had this whole long response written on why this was wrong when I realized I misread what you had in the new draft. :) I believe this piece is correct. Please separate it into it's own review with test cases. The complexity here is that Result is a speculative set of facts. This is relying on the fact that any result computed (after the entire loop), must be correct for the phi input. You're implicitly assuming that a later fact will force the lattice to a conservative value if needed. Yep, that sounds reasonable. reames: I had this whole long response written on why this was wrong when I realized I misread what you…
		EdgeResult = intersect(EdgeResult, Result);

Result.mergeIn(EdgeResult, DL);		Result.mergeIn(EdgeResult, DL);

// If we hit overdefined, exit early. The BlockVals entry is already set		// If we hit overdefined, exit early. The BlockVals entry is already set
		reamesUnsubmitted Not Done Reply Inline Actions My (strongly) suggested factoring would be: Extract the current loop as a helper function called solveBlockValuePHINodeImpl Analyze the phi using your new logic in a helper function which returns a LatticeValue. Call both helper functions exactly once and return the intersected results. Another possible structure would be: Create the helper function mentioned in 2 above. Intersect it's results with the return value of getEdgeValue in the loop. The later will produce slightly more precise results than the former when we know a fact about the incoming value into the loop, but would otherwise loose all knowledge in the loop. reames: My (strongly) suggested factoring would be: 1) Extract the current loop as a helper function…
// to overdefined.		// to overdefined.
if (Result.isOverdefined()) {		if (Result.isOverdefined()) {
DEBUG(dbgs() << " compute BB '" << BB->getName()		DEBUG(dbgs() << " compute BB '" << BB->getName()
<< "' - overdefined because of pred (local).\n");		<< "' - overdefined because of pred (local).\n");

BBLV = Result;		BBLV = Result;
return true;		return true;
}		}
}		}
if (EdgesMissing)		if (EdgesMissing)
		reamesUnsubmitted Not Done Reply Inline Actions Using the result part was through the loop in this manner is incorrect. It contains a speculative set of facts which may be later misproven. p.s. I could see an line of argument in which this might be safe for this particular use, but a) I'm not sure that's what you intended and b) if it is, it needs to be clearly documented. p.p.s. I wrote this before revising my comment on the PhiVal == PN case. I'm now reasonable sure you're doing this intentionally, but please make the argument in comment form. :) p.p.p.s. I don't believe there is any guarantee as to the order in which we visit values in a phi node. In particular, we may not have visited the preheader yet... reames: Using the result part was through the loop in this manner is incorrect. It contains a…
return false;		return false;

		reamesUnsubmitted Not Done Reply Inline Actions This whole chain of logic can become ConstantRange::makeNoWrapRegion reames: This whole chain of logic can become ConstantRange::makeNoWrapRegion
// Return the merged value, which is more precise than 'overdefined'.		// Return the merged value, which is more precise than 'overdefined'.
assert(!Result.isOverdefined() && "Possible PHI in entry block?");		assert(!Result.isOverdefined() && "Possible PHI in entry block?");
BBLV = Result;		BBLV = Result;
return true;		return true;
}		}

static LVILatticeVal getValueFromCondition(Value Val, Value Cond,		static LVILatticeVal getValueFromCondition(Value Val, Value Cond,
bool isTrueDest = true);		bool isTrueDest = true);
		reamesUnsubmitted Not Done Reply Inline Actions Manually playing with the cache is a really dangerous thing to be doing and will need to be solidly justified. Please implement a variant of this without this piece, then post a separate review for this with clear test cases motivating it. reames: Manually playing with the cache is a really dangerous thing to be doing and will need to be…

// If we can determine a constraint on the value given conditions assumed by		// If we can determine a constraint on the value given conditions assumed by
// the program, intersect those constraints with BBLV		// the program, intersect those constraints with BBLV
void LazyValueInfoImpl::intersectAssumeOrGuardBlockValueConstantRange(		void LazyValueInfoImpl::intersectAssumeOrGuardBlockValueConstantRange(
Value Val, LVILatticeVal &BBLV, Instruction BBI) {		Value Val, LVILatticeVal &BBLV, Instruction BBI) {
BBI = BBI ? BBI : dyn_cast<Instruction>(Val);		BBI = BBI ? BBI : dyn_cast<Instruction>(Val);
if (!BBI)		if (!BBI)
return;		return;
▲ Show 20 Lines • Show All 907 Lines • Show Last 20 Lines

test/Transforms/CorrelatedValuePropagation/identity.ll

This file was added.

				; RUN: opt -S < %s -correlated-propagation \| FileCheck %s

				; CHECK-LABEL: @f
				; CHECK: ret i32 3
				define i32 @f(i32 %n) {
				entry:
				br label %while.cond

				while.cond:
				%i = phi i32 [ 2, %entry ], [ %i, %one ], [ 3, %two ]
				%cmp = icmp slt i32 %i, 3
				br i1 %cmp, label %while.body, label %return

				while.body:
				%call = call i32 @g(i32 1)
				%add = add i32 %i, 6
				%cmp3 = icmp sgt i32 %add , 6
				br i1 %cmp3, label %one, label %two

				one:
				br label %while.cond

				two:
				br label %while.cond

				return:
				ret i32 %i
				}

				declare i32 @g(i32)

This is an archive of the discontinued LLVM Phabricator instance.

[LazyValueInfo] Understand monotonically increasing loops and identity valuesAcceptedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 74991

lib/Analysis/LazyValueInfo.cpp

test/Transforms/CorrelatedValuePropagation/identity.ll

[LazyValueInfo] Understand monotonically increasing loops and identity values
AcceptedPublic