This is an archive of the discontinued LLVM Phabricator instance.

[LazyValueInfo] Understand monotonically increasing loops and identity values
AcceptedPublic

Authored by jmolloy on Sep 30 2016, 12:19 AM.

Details

Summary

We have a very broad-brush bailout in JumpThreading - we never thread over a
loop header. This is so we don't create irreducible control flow.

However, as mentioned in the existing comments there are situations where we'd
want to thread over a loop header and it can create significant improvements in
some cases.

As it's hard to know if irreducible control flow would be created, handle
specially the case where we thread from predecessor to itself - essentially
generating a single block loop. Single block loops aren't irreducible by
definition and are easy to check.

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 72992.Sep 30 2016, 12:19 AM
jmolloy retitled this revision from to [JumpThreading] Allow threading over loop headers when creating single block loops.
jmolloy updated this object.
jmolloy added a reviewer: reames.
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: llvm-commits.
reames added inline comments.Oct 1 2016, 5:52 PM
lib/Transforms/Scalar/JumpThreading.cpp
1460 ↗(On Diff #72992)

I'm more than a bit nervous about this check. I'm trying to convince myself it's correct and am struggling.

1596 ↗(On Diff #72992)

Some form of incremental update would be good.

test/Transforms/JumpThreading/singleblock-loop.ll
7 ↗(On Diff #72992)

Please give more detailed checks. It's really hard to tell what the outcome here is.

12 ↗(On Diff #72992)

Looking at the example, it seems like we actually have two sub-cases you want to handle:

  • thread a branch through a header to a loop exit
  • thread a branch through a header to itself (essentially removing the header from the loop)

The former is obviously correct and easy to reason about. The second is the tricky case if we want to handle it generally while avoiding irreducible control flow.

I wonder if this would be easier to reason about if you broke apart each sub-case explicitly?

18 ↗(On Diff #72992)

Not related to this review, but we should be able to prove this is "3". Is the actual case you're worried about this fragile? If so, there are other simpler patches which might work.

majnemer added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
1460 ↗(On Diff #72992)

You could use is_contained(PredBBs, SuccBB) as a shorthand.

jmolloy updated this revision to Diff 73285.Oct 3 2016, 9:01 AM

Hi Philip,

Thanks for your review. Based on that I was able to go back and re-look with fresh eyes at the problem.

As you mentioned, in the motivating testcase we should have been able to prove that the value of the incremented PHI was always greater than or equal to 3. You were right that this exposed an underlying problem - in fact there were several things I needed to fix:

  • LazyValueInfo gets confused by trivial loops. I have added a couple of special cases that I think are very common:
    1. A PHI incoming value that is the PHI itself (%x = phi %x, %y) can be ignored, because it cannot contribute usefully to its own range.
    2. A PHI incoming value that can be trivially proved monotonically increasing (an induction variable for instance) can have its range more precisely calculated.
  • JumpThreading should consider all candidates for threading and not just the most profitable one.
    • If the most profitable candidate fails legality checks, it should try others and not simply bail.
  • JumpThreading should invalidate LVI when unfolding switches
    • The unfolding changes the CFG and creates new edges that can now have more specific values than before.

Mixed in there is also a change to make JumpThreading use and keep up to date LoopInfo.

This patch is obviously too large, has conflated several concerns and needs to be split apart. However I just wanted to get your input on the direction - is there anything particularly objectionable, before I go split it all out into subpatches?

Cheers,

James

reames requested changes to this revision.Oct 3 2016, 12:03 PM
reames edited edge metadata.

Commenting mostly on the LVI parts to start with. I request that you split those out into a separate change with tests.

lib/Analysis/LazyValueInfo.cpp
927

Nope. This will break key parts about LVI. The problem here is that there can be facts learned about the Phi along a particular path. The classic example is the backedge test of a loop can tell you that the value along the backedge is restricted to say < 400.

931

I see where you're going here, but this is the wrong way to structure this. You've got two reasonable approaches:

  1. Compute the existing result and then intersect it with the result computed this way
  2. Integrate this into the path sensitive logic. Doing this as the base case (i.e. instead of overdefined) might be reasonable.
This revision now requires changes to proceed.Oct 3 2016, 12:03 PM

Thanks Philip for the prompt review. Just one question for now before I rework this tomorrow.

lib/Analysis/LazyValueInfo.cpp
927

How is this actually *supposed* to work? My debugging shows that we recurse into getEdgeValue and because we end up in the same state we've been before, we stop recursing and just return overdefined. Surely we can be more clever than this - at the moment I can't see how even trivial loops are managed, but I must be missing something.

jmolloy updated this revision to Diff 73478.Oct 4 2016, 7:46 AM
jmolloy retitled this revision from [JumpThreading] Allow threading over loop headers when creating single block loops to [LazyValueInfo] Understand monotonically increasing loops and identity values.
jmolloy edited edge metadata.

Hi Philip,

Thanks for the review. After spending a day poring over LVI, I think I've got something more palatable.

I investigated trying to understand monotonic increments in the base case (getEdgeValue and friends), but I drew a blank there. When analyzing PHI incoming values, we obviously haven't fully analyzed the PHI itself. Because those incoming values rely on the PHI, we really have no choice but to return a conservative result (overdefined). I experimented with providing a "partially evaluated PHI" stack but couldn't get a solution that returned conservatively correct results - really, it seems we need to look at a PHI holistically.

This new patch sits in solveBlockValuePHINode as before but has the big change you mentioned in your review; it intersect()s where applicable so we don't clobber any interesting information we may have picked up in getEdgeValue().

The one thing I don't like about this new diff is the "TheCache.eraseValue(PhiVal);" call. We need this because, while analyzing the PHI we will have analyzed the increment (in getEdgeValue) and decided it gives us no information (ConstantRange<full-set>). Later, we understand that increment in the context of the PHI it is incrementing and now have a more precise definition for it, but the full-set value is still cached. I erase it here to ensure it gets recomputed, but given that's never done elsewhere I'm not sure that's right.

Cheers,

James

Adding more reviewers.

reames requested changes to this revision.Oct 17 2016, 3:36 PM
reames edited edge metadata.

Detailed comments inline. This change is conceptually complicated even if a small textual change. I'd suggest thinking about how to factor out complexity into separate changes.

lib/Analysis/LazyValueInfo.cpp
935

I had this whole long response written on why this was wrong when I realized I misread what you had in the new draft. :)

I believe this piece is correct. Please separate it into it's own review with test cases.

The complexity here is that Result is a *speculative* set of facts. This is relying on the fact that any result computed (after the entire loop), must be correct for the phi input. You're implicitly assuming that a later fact will force the lattice to a conservative value if needed. Yep, that sounds reasonable.

940

My (strongly) suggested factoring would be:

  1. Extract the current loop as a helper function called solveBlockValuePHINodeImpl
  2. Analyze the phi using your new logic in a helper function which returns a LatticeValue.
  3. Call both helper functions exactly once and return the intersected results.

Another possible structure would be:

  1. Create the helper function mentioned in 2 above.
  2. Intersect it's results with the return value of getEdgeValue in the loop.

The later will produce slightly more precise results than the former when we know a fact about the incoming value into the loop, but would otherwise loose all knowledge in the loop.

950

Using the result part was through the loop in this manner is *incorrect*. It contains a speculative set of facts which may be later misproven.

p.s. I could see an line of argument in which this might be safe for this particular use, but a) I'm not sure that's what you intended and b) if it is, it needs to be clearly documented.

p.p.s. I wrote this before revising my comment on the PhiVal == PN case. I'm now reasonable sure you're doing this intentionally, but please make the argument in comment form. :)

p.p.p.s. I don't believe there is any guarantee as to the order in which we visit values in a phi node. In particular, we may not have visited the preheader yet...

952

This whole chain of logic can become ConstantRange::makeNoWrapRegion

960

Manually playing with the cache is a really dangerous thing to be doing and will need to be solidly justified. Please implement a variant of this without this piece, then post a separate review for this with clear test cases motivating it.

This revision now requires changes to proceed.Oct 17 2016, 3:36 PM
jmolloy updated this revision to Diff 74991.Oct 18 2016, 6:28 AM
jmolloy edited edge metadata.

Hi Philip,

Thanks again. I've separated out the first, trivial, change and isolated a testcase for it.

I'm fairly sure the cache invalidation is required, but I will make that case in a followup phab review with the rest of the code as you suggested.

Cheers,
James

reames accepted this revision.Nov 30 2016, 6:27 PM
reames edited edge metadata.

LGTM, sorry for the delay.

This revision is now accepted and ready to land.Nov 30 2016, 6:27 PM
mcrosier resigned from this revision.Mar 9 2017, 8:52 AM