This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] LiveDebugValues: Defer all DBG_VALUE creation during analysis
ClosedPublic

Authored by jmorse on Sep 10 2019, 4:08 AM.

Details

Summary

As stated in the docs here [0], the meaning of a DBG_VALUE instruction changes after LiveDebugValues runs, from corresponding to a source-level assignment, to being a per-block variable location record [1]. Unfortunately, LiveDebugValues seems to mix these up and there's a feedback effect. The problem is this:

  • Currently all register/stack transfers have per-block DBG_VALUE instructions created and inserted during the dataflow analysis [2],
  • Sometimes variable locations are invalidated during the dataflow analysis [3],
  • But the register/stack transfer DBG_VALUEs have already been created by then, and are interpreted as source-level assignments.

This means that a variable location that's propagated once (such as the first iteration where backedges are ignored) but invalidated on later iterations, can "latch" if it experiences a transfer in that time. Observe the test case in this patch, where in the loop block a value is shifted through four different registers (esi -> edi -> ecx -> eax). Currently the incoming location in ecx is transferred to ebx in the loop block, then invalidated on the second LiveDebugValues iteration through the loop because ecx is clobbered. However a DBG_VALUE for ebx has already been created by that point, and gets propagated into the exit block.

I've mentally divided this into two issues: ensuring that block-only DBG_VALUEs aren't interpreted as source-assignments, and deleting transfers of locations that are later invalidated. The former is easy to fix by moving the code in [2] out of the location propagation loop, which is what this patch does. Dealing with deleting transfers is in some more patches that are coming.

This patch also removes a condition in transferTerminator that doesn't update OutLocs if there are no locations at the end of processing a block. I think this made sense when we were only taking the union of OutLocs after propagation, but now we're deleting locations too, an empty set of OutLocs is something that needs to be handled. This is covered by the test added too.

[0] https://llvm.org/docs/SourceLevelDebugging.html#livedebugvalues-expansion-of-variable-locations
[1] I'm pretty confident of this, but as I wrote that paragraph of the docs this might be a circular argument.
[2] https://github.com/llvm/llvm-project/blob/5d9cd3b4ca457e55c3c21094cfea5e49dddef36c/llvm/lib/CodeGen/LiveDebugValues.cpp#L1365
[3] https://reviews.llvm.org/D66599

Diff Detail

Repository
rL LLVM

Event Timeline

jmorse created this revision.Sep 10 2019, 4:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 10 2019, 4:08 AM
jmorse updated this revision to Diff 219516.Sep 10 2019, 4:13 AM

(Ninja-edit: this moves transfer-insertion to before flushPendingLocs. flushPendingLocs may manipulate VarLocs that refer to not-yet-inserted DBG_VALUEs as their sources).

As stated in the docs here [0], the meaning of a DBG_VALUE instruction changes after LiveDebugValues runs

Do you think we should avoid this confusion by calling them DBG_LOC or something else after LiveDebugValues to embrace the semantic difference?

Do you think we should avoid this confusion by calling them DBG_LOC or something else after LiveDebugValues to embrace the semantic difference?

That'd probably be best -- although something without "location" in its name, it's already a highly overloaded term :(. Something like DBG_SPAN, DBG_BLOCKVAL, or some other term that indicates the limited range of the instructions effect?

Okay, we should make sure that we do get the design right. I don't quite get the argument made in SourceLevelDebugInfo.rst for why the semantics are different:

After this pass the DBG_VALUE instruction changes meaning: rather than corresponding to a source-level assignment where the variable may change value, it asserts the location of a variable in a block, and loses effect outside the block.

First, DBG_VALUEs aren't necessarily source-level assignments *before* LiveDebugValues either, they update the SSA value that a (fragment) of a source-level variable can be found in, but that SSA value could have been created by the compiler and has not necessarily any relation to a source-level assignment (think about salvageDebugInfo, for example).
The fact that the DBG_VALUE has no effect outside of the current basic block just falls out of DbgEntityHistoryCalculator not doing a LiveDebugVariable-style data flow analysis, but IMO that isn't a change in semantics, it would be *legal* for it to perform one, it just would be pointless after LiveDebugValues has propagated DBG_VALUEs across basic blocks and reached a fixed point.

So I'm suspecting that I'm missing something and it isn't mentioned in the text. Can you perhaps fill me in?

First, DBG_VALUEs aren't necessarily source-level assignments *before* LiveDebugValues either, they update the SSA value that a (fragment) of a source-level variable can be found in, but that SSA value could have been created by the compiler and has not necessarily any relation to a source-level assignment (think about salvageDebugInfo, for example).

(Unfortunately I'm not known for operating the English language effectively). My meaning in the wording was about the placement of a dbg.value/DBG_VALUE within a block, ignoring its operand. AFAIUI, for any LLVM-IR instruction i in a function, and a (fragment of) variable, to determine the variables location at i one has to do an entire dominance-frontier analysis to work out which dbg.value/DBG_VALUE dominates i (potentially none of them). This method of recording the _position_ where variable locations _change_, closely matches the original source program: if you had five assignments to a variable in a program, you'd have five dbg.values, regardless of their operands.

The fact that the DBG_VALUE has no effect outside of the current basic block just falls out of DbgEntityHistoryCalculator not doing a LiveDebugVariable-style data flow analysis, but IMO that isn't a change in semantics, it would be *legal* for it to perform one, it just would be pointless after LiveDebugValues has propagated DBG_VALUEs across basic blocks and reached a fixed point.

All true; this is where a question of "what is the design" de-jure and de-facto comes in. Because DbgEntityHistoryCalculator currently relies on LiveDebugValues having run its analysis, does that not _make_ it a semantic change? At the very least, because that's what I saw when trying to document these things, that's what I wrote.

Alternately, we could document the change in interpretation as being an optimisation that DbgEntityHistoryCalculator performs/relies on, rather than being a change in semantics. (I think these are both sides of the same coin when it comes to explaining internal state).

vsk accepted this revision.Sep 20 2019, 1:05 PM

First, DBG_VALUEs aren't necessarily source-level assignments *before* LiveDebugValues either, they update the SSA value that a (fragment) of a source-level variable can be found in, but that SSA value could have been created by the compiler and has not necessarily any relation to a source-level assignment (think about salvageDebugInfo, for example).

(Unfortunately I'm not known for operating the English language effectively). My meaning in the wording was about the placement of a dbg.value/DBG_VALUE within a block, ignoring its operand. AFAIUI, for any LLVM-IR instruction i in a function, and a (fragment of) variable, to determine the variables location at i one has to do an entire dominance-frontier analysis to work out which dbg.value/DBG_VALUE dominates i (potentially none of them). This method of recording the _position_ where variable locations _change_, closely matches the original source program: if you had five assignments to a variable in a program, you'd have five dbg.values, regardless of their operands.

The fact that the DBG_VALUE has no effect outside of the current basic block just falls out of DbgEntityHistoryCalculator not doing a LiveDebugVariable-style data flow analysis, but IMO that isn't a change in semantics, it would be *legal* for it to perform one, it just would be pointless after LiveDebugValues has propagated DBG_VALUEs across basic blocks and reached a fixed point.

All true; this is where a question of "what is the design" de-jure and de-facto comes in. Because DbgEntityHistoryCalculator currently relies on LiveDebugValues having run its analysis, does that not _make_ it a semantic change? At the very least, because that's what I saw when trying to document these things, that's what I wrote.

Alternately, we could document the change in interpretation as being an optimisation that DbgEntityHistoryCalculator performs/relies on, rather than being a change in semantics. (I think these are both sides of the same coin when it comes to explaining internal state).

Does the location calculator rely on LiveDebugValues for correctness? IIUC it should be possible to run the location calculator without doing LiveDebugValues, and that should produce correct (if incomplete) information.

If that's right, then we can edit the docs to 1) clarify that, 2) describe LiveDebugValues as a pass that lets the location calculator "get away with" taking a narrow, one-block-at-a-time view of the program, and 3) state that the semantics of DBG_VALUE never change.

Now, for this patch, I think the change looks good, and the test case is really neat :). Please let @aprantl chime in as well, though.

This revision is now accepted and ready to land.Sep 20 2019, 1:05 PM

Does the location calculator rely on LiveDebugValues for correctness? IIUC it should be possible to run the location calculator without doing LiveDebugValues, and that should produce correct (if incomplete) information.

Hmmm, not for correctness, no -- you're right, the location calculator wouldn't produce any incorrect locations if LiveDebugValues didn't run.

If that's right, then we can edit the docs to 1) clarify that, 2) describe LiveDebugValues as a pass that lets the location calculator "get away with" taking a narrow, one-block-at-a-time view of the program, and 3) state that the semantics of DBG_VALUE never change.

Sure; I'm going to let this soak into my mind a little so that I'm confident it's not really a change in semantics.

Now, for this patch, I think the change looks good, and the test case is really neat :). Please let @aprantl chime in as well, though.

jmorse added a comment.Oct 3 2019, 3:24 AM

I wrote:

Sure; I'm going to let this soak into my mind a little so that I'm confident it's not really a change in semantics.

I think I've got it now: the location calculators purpose isn't to "produce perfect and complete variable locations", it's to calculate the locations that the debuginfo in the instruction stream describes. If the information described by the DBG_VALUEs is incomplete, that's not an error; and LiveDebugValues just propagates locations to create more information rather than fixing a correctness problem. I'll cough up a documentation fix probably tomorrow; assuming @aprantl is happy with all of this, I'll commit this patch tomorrow-ish.

aprantl accepted this revision.Oct 3 2019, 12:51 PM
This revision was automatically updated to reflect the committed changes.