This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] LiveDebugValues should always revisit backedges if it skips them
ClosedPublic

Authored by jmorse on Aug 23 2019, 9:27 AM.

Details

Summary

LiveDebugValues currently ignores loop backedges on the first iteration -- this is because any variable locations that are live-through a loop need to be propagated in regardless. Otherwise, the unvisited backedge would always invalidate any potential live-through variable locations.

However: LiveDebugValues only revisits blocks if the out-locations of a predecessor changes. This creates situations like the "baz" function added in this patch:

  • Location of !103 enters the loop block as %bar, which is taken on faith by ignoring the backedge,
  • Location of !103 changes in the loop block to constant-zero, which should later invalidate the in-location for the loop through the backedge, like it does in the "foo" function,
  • However, the out-locations of the loop block are not changed as the result of propagation, therefore the loop block is never re-visited.

The net effect of which is that we took the !103 == %bar location on faith, and as the backedge is not seen, it's kept forever. (This doesn't happen in the "foo" function in the same file because I've deliberately set that up to have an out-location change, to continue propagation). Some of this is behind PR38997.

This patch tries to rectify this by, whenever we ignore a backedge, forcing the ignoring block to be revisited at least once. This rectifies the immediate problem. The previous discussion about how this is supposed to work seems to be here [0]: I'm not completely convinced that what we have right now will always get the right answer because, as Vikram puts half way through that thread, getting a full answer requires initialising every block as having all variables in all locations. However, with this patch I think we get more accurate locations, so it's a step in the right direction.

On a clang-3.4 build we drop a single variable location; and the scope-bytes-covered statistic ticks up 0.2%. This is mildly counter-intuitive as I would expect more locations to be invalidated. I'm hoping this is just complex location interactions, with fewer wrong locations it might be easier to prove there are more right ones, currently looking closer.

I've added a third function in the test file that positively checks a variable location that's live-through a loop has its location preserved, just in case we mess with this further.

In the DW_AT_location-reference.ll test that changes, placeDbgValues moves the non-constant variable locations into the middle of the loops. Previously we marked the first instruction of each loop body as having a constant value (which would be wrong), which doesn't occur now -- in fact no instruction gets a constant-value of "x", hence the fewer location list entries. This test says it only cares about the fact that "x" moves around though, so I believe it still works (TM).

[0] https://groups.google.com/d/msg/llvm-dev/nlgxIfDQnhs/c55kcncIAAAJ

Diff Detail

Repository
rL LLVM

Event Timeline

jmorse created this revision.Aug 23 2019, 9:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2019, 9:27 AM

LiveDebugValues currently ignores loop backedges on the first iteration -- this is because any variable locations that are live-through a loop need to be propagated in regardless. Otherwise, the unvisited backedge would always invalidate any potential live-through variable locations.

Conceptually, the analysis uses a lattice with three elements to describe each VarLoc at each point: ⊥ (uninitialized), true (VarLoc is valid), false (VarLoc is not valid). As @dberlin pointed out in the linked thread, we LiveDebugValues *implicitly* represents the (uninitialized) state by ensuring a specific iteration order. (Not joining the unvisited backedge with the incoming edge, is equivalent to joining a set full of ⊥'s with the incoming edge).

However: LiveDebugValues only revisits blocks if the out-locations of a predecessor changes. This creates situations like the "baz" function added in this patch:

Location of !103 enters the loop block as %bar, which is taken on faith by ignoring the backedge,

Makes sense.

Location of !103 changes in the loop block to constant-zero, which should later invalidate the in-location for the loop through the backedge, like it does in the "foo" function,

When the set out OutLocs changes (and in this case it changes from [crumpets=⊥] to [crumpets=0]) all successor blocks (in this case also the block itself because of the back edge) should be added to worklist automatically. Why isn't this happening?

However, the out-locations of the loop block are not changed as the result of propagation, therefore the loop block is never re-visited.

That I don't understand yet. They change from "no info" -> "some info". What am I missing?

The net effect of which is that we took the !103 == %bar location on faith, and as the backedge is not seen, it's kept forever. (This doesn't happen in the "foo" function in the same file because I've deliberately set that up to have an out-location change, to continue propagation). Some of this is behind PR38997.

This patch tries to rectify this by, whenever we ignore a backedge, forcing the ignoring block to be revisited at least once. This rectifies the immediate problem. The previous discussion about how this is supposed to work seems to be here [0]: I'm not completely convinced that what we have right now will always get the right answer because, as Vikram puts half way through that thread, getting a full answer requires initialising every block as having all variables in all locations. However, with this patch I think we get more accurate locations, so it's a step in the right direction.

Location of !103 changes in the loop block to constant-zero, which should later invalidate the in-location for the loop through the backedge, like it does in the "foo" function,

When the set out OutLocs changes (and in this case it changes from [crumpets=⊥] to [crumpets=0]) all successor blocks (in this case also the block itself because of the back edge) should be added to worklist automatically. Why isn't this happening?

However, the out-locations of the loop block are not changed as the result of propagation, therefore the loop block is never re-visited.

That I don't understand yet. They change from "no info" -> "some info". What am I missing?

Ah, this comes down to the initialisation that OutLocs experiences, performed here [0] before cross-block propagation starts. Because OutLocs is initialised with [crumpets=0] rather than ⊥, reprocessing the loop block after joining appears to change nothing, hence there's no further iteration.

Exactly why the initialisation happens isn't clear to me (it appears to be as old as LiveDebugValues). It causes other problems too (the entry block is never revisited in propagation, causing entry register/stack transfers to not be recorded) but that's for some other time.

I'll try reimplementing this patch by removing the initialisation phase and see what happens; I think (80%) there will still be some special casing needed though. If in the example the variable location was instead clobbered/undef'd in the loop block, the OutLocs for the loop would have effectively changed from uninitialised to false and the loop head would need revisiting, but this wouldn't actually present as a difference in the OutLocs vector-of-valid-locations.

[0] https://github.com/llvm/llvm-project/blob/0ae549814693eb201d7e7a0a77856e55a98e19ee/llvm/lib/CodeGen/LiveDebugValues.cpp#L1257

I think (80%) there will still be some special casing needed though.

Unless we change the implementation of VarLocSet to actually include three values instead of two, I think you are right. We could try and see if something like:

  1. not in the set -> ⊥
  2. in the set -> true
  3. in the set with a special KILL marker (perhaps implemented as an extra set of killed VarLocs) -> false

makes the implementation look more regular. This is not the only way to achieve this, just the first thing I came with on the spot.

jmorse updated this revision to Diff 217424.Aug 27 2019, 10:09 AM

This update removes the "initialisation" phase that was happening in LiveDebugValues, and doing so fixes the original problem I was trying to solve, horray.

The change in logic (the |= to MBBJoined) forces all blocks to be joined again after the first pass through. Before, the "join" logic for unvisited predecessors was effectively firing on the _second_ pass through because OutLocs was already initialised. Now that happens on the first pass, and it correctly sees added/missing locations on the second pass through.

I've eliminated the initialization flag for the "process" method while doing this, as I can't find any reason for it to exist. All the instructions get processed for transfers at least once, just as before, and no tests fail. I've added a MIR regression test for the behaviour where the entry block not being revisited will lead to transfers there not being recognised, which is another thing fixed by getting rid of the initialisation.

compiler-gen-bbs-livedebugvalues.mir: This test now recognises a stack spill and creates a second DBG_VALUE, where LiveDebugVariables has already created a DBG_VALUE. This is annoying, but not faulty IMO.

debug-info-blocks.ll: is more problematic. It seems a hash-collision and PR43058 [0] causes a stack-load of a return address to be recognised as a variable transfer, which I've XFailed. I haven't committed in the fix for PR43058 because of what I've written on bugzilla; IMHO it's best to xfail this test and move forwards, although we could also leave the patch hanging until PR43058 is done.

There are negligible differences in the debug statistics for a clang-3.4 build.

[0] https://bugs.llvm.org/show_bug.cgi?id=43058

This patch looks really elegant. Does the removal of the initialization phase have any adverse effect on the performance?

Which hash is colliding to contribute top the XFAILed test? Should we fix PR43058 before landing this?

This patch looks really elegant. Does the removal of the initialization phase have any adverse effect on the performance?

A few asan-builds of clang-3.4 showed no notable difference in performance at all.

Which hash is colliding to contribute top the XFAILed test? Should we fix PR43058 before landing this?

I've uploaded D66895 to fix this, it wasn't as deep a problem as I feared. I've added it as a child revision to this patch because the regression test I add in D66895 depends on transfers in the entry block being considered, which they aren't without this patch.

aprantl accepted this revision.Aug 28 2019, 1:42 PM

Thanks!

This revision is now accepted and ready to land.Aug 28 2019, 1:42 PM
This revision was automatically updated to reflect the committed changes.