This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][LDV] Teach LDV how to identify source variables and handle fragments
AbandonedPublic

Authored by Orlando on Nov 12 2019, 5:36 AM.

Details

Summary

Related bugs [0, 1]: 41992, 43957

This patch prevents DBG_VALUE instructions being incorrectly coalesced by LiveDebugVariables.

Currently a UserValue represents part of a source variable. A UserValue is identified by the source variable, DIExpression, and the inlined scope of the source variable. DBG_VALUES definitions are added to a particular UserValue if it can be identified by these constraints. The definitions are coalesced if they refer to the same location and are at adjacent slots.

This is a problem because arbitrary DIExpressions are used to classify the definitions. For example, a DW_OP_deref on the value presented in the DBG_VALUE shouldn't change what source variable (part) we're describing, but it will according to the existing model. Instead of looking at the whole expression to classify definitions we should just be looking at the fragment.

This patch changes the internal structure of LDV: Now DBG_VALUES are instead grouped by the Fragment of the SourceVariable (DIVariable, Inlining scope) they refer to.

When adding a value definition for a Fragment of a SourceVariable, we search for overlapping fragments within the SourceVariable and add a dummy 'blocker' definition into their location interval map to prevent adjacent locations being coalesced and the definition range being extended beyond this point.

With this patch overlapping fragments are correctly handled and arbitrary DIExpression operations are not used to classify which source variable is being referred to by the DBG_VALUES.

To give a concrete example for arbitrary DIExpression operations preventing DBG_VALUES being coalesced from pr43957:

Going into LiveDebugVariables:
DBG_VALUE 36901, $noreg, !"l_801", !DIExpression()
DBG_VALUE 36901, $noreg, !"l_801", !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value)

Coming out of VirtRegRewriter without this patch:
DBG_VALUE 36901, $noreg, !"l_801", !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value)
DBG_VALUE 36901, $noreg, !"l_801", !DIExpression() // Despite referring to the same source variable as the previous DBG_VALUE, this one isn't dropped AND has been reordered.

Coming out of VirtRegRewriter with this patch:
DBG_VALUE 36901, $noreg, !"l_801", !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value)

You can see a similar example for overlapping fragments by looking at pr41992.ll which covers 3 previously failing cases.

coalesce_dbg_values.cpp is a Dexter test based on the reproducer in 43957. I'm happy to discuss whether this is an appropriate case!

[0] https://bugs.llvm.org/show_bug.cgi?id=41992
[1] https://bugs.llvm.org/show_bug.cgi?id=43957

More info and notes:

  1. This is loosely based on top of D68816 (adding a read-from-invalid-ptr fix), but removes or changes a lot of it, so I've included all of it in this diff.
  1. I want to improve test coverage for this. I haven't managed to come up with any more just yet.
  1. I had to modify llvm/test/DebugInfo/ARM/PR26163.ll to make it pass.

Diff Detail

Event Timeline

Orlando created this revision.Nov 12 2019, 5:36 AM
krisb added a subscriber: krisb.Nov 12 2019, 5:54 AM
aprantl added inline comments.Nov 12 2019, 9:17 AM
llvm/lib/CodeGen/LiveDebugVariables.cpp
123

getExpression

125

getLocNo

338

I don't know if this helps, but in other places we conceptually treat std::pair<DILocalVariable *, FragmentInfo> as unique "variables" and the remainder expression as part of a "location".

458

Why this change?

Orlando marked 2 inline comments as done.Nov 12 2019, 9:40 AM
Orlando added inline comments.
llvm/lib/CodeGen/LiveDebugVariables.cpp
338

Yes that makes sense, I can probably improve the class name and comment here.

458

This line was changed as part of the work for D68816 (see point 1 in my summary). In D68816 VRMap was a DenseMap and this has been changed to an unordered_map to avoid potentially reading from an invalid pointer.

More Info about this patch: As far as LDV is concerned undefs and 'blockers' are semantically very similar. However, blockers are not emitted (as DBG_VALUES) because it is unnecessary: They are only added to UserValues with overlapping fragments for any given def, and a def implicitly terminates preceding defs with overlapping fragments [0].

[0] https://llvm.org/docs/SourceLevelDebugging.html#object-lifetime-in-optimized-code

Orlando updated this revision to Diff 229481.Nov 15 2019, 2:13 AM

+ Improve some comments
+ rename getter/setters as Adrian requested
+ remove some whitespace changes

Orlando marked 2 inline comments as done and an inline comment as not done.

+ Added reviewers

llvm/lib/CodeGen/LiveDebugVariables.cpp
338

These terms feel overloaded but much more descriptive. How do you feel about this kind of thing:

UserValue is now responsible for all DBG_VALUEs which refer to an exact _fragment_.

UserValue -> Fragment

UserValueGroup groups UserValues (Fragments) that refer to the same _variable_

UserValueGroup -> Variable

Maybe a suitable prefix would help too.

some super nits. Patch is huge and difficult to parse, will require several passes I feel.

llvm/lib/CodeGen/LiveDebugVariables.cpp
102

'This is acts'

should be

'This acts'

?

123–125

'May return nullptr'

Should be

'Will return nullptr'

or is there a case where it can be a dummy value and not return a nullptr?

345

We refer to? We point to?

not sure "are part of" is quite the right description here.

bit nitty I know...

@Orlando Thanks for this! Please consider splitting this into two or more patches (i.e. one for NFC and one teaching functionality).

llvm/test/DebugInfo/pr41992.ll
88

Nit: producer: "clang version 10.0.0"

aprantl added inline comments.Nov 19 2019, 10:03 AM
llvm/lib/CodeGen/LiveDebugVariables.cpp
142–143

= 0

143

= nullptr

338

UserValue is now responsible for all DBG_VALUEs which refer to an exact _fragment_.

You need to be prepared to deal with debug info were the fragments aren't always the same size. Particularly it's common to have both the entire variable and then a version of the variable being split up into fragments. In other passes I've used a concept of a non-overlapping fragment, which I believe is what you mean, but I wanted to be sure you are aware of this!

Orlando updated this revision to Diff 230215.Nov 20 2019, 2:26 AM
Orlando marked an inline comment as not done.
Orlando retitled this revision from [WIP][DebugInfo][LDV] Attempt to teach LDV how to handle fragments to [DebugInfo][LDV] Teach LDV how to identify source variables and handle fragments.
Orlando edited the summary of this revision. (Show Details)

Thanks all for the initial reviews.

Removing 'WIP' -- there is likely more work to do but it'll be driven by reviews at this point.

I've changed the names of some classes to better reflect what's happening and I've updated the summary for this review.

+ dexter test
+ suggested changes/nits
+ fix typo in previous update

@aprantl wrote https://reviews.llvm.org/D70318#inline-635016:

@Orlando cf. the discussion in https://reviews.llvm.org/D70121

This patch could be rewritten in terms of the DebugVariable class being discussed in D70318. We'd still need to group the DebugVariables by { Variable, InlinedAt } to let us search for overlaps so it looks to me like the only benefit would be unification of representations across passes (which is important IMO).

At a glance I'm not sure how much work this would require.

jmorse added inline comments.Nov 20 2019, 7:21 AM
debuginfo-tests/dexter-tests/coalesce_dbg_values.cpp
2

This would more ideally be "UNSUPPORTED: system-windows" so that it runs on the darwin debuginfo-tests bot too.

I think there's still some uncertainty (to me) about what should go into debuginfo-tests, but integration tests for what happens when things are optimised out sounds great.

Orlando updated this revision to Diff 231246.Nov 27 2019, 6:54 AM

Ping.

+ Update dexter test for @jmorse.
+ Small additional change to print().
+ Rebase.
+ Rephrase a small part of the summary.

@djtodoro said:

Please consider splitting this into two or more patches (i.e. one for NFC and one teaching functionality)

I'm not sure if that's possible here. Everything seems quite interdependent, and the bug fix (functionality change) happens as a result of the structural change (also a functionality change).
I could split this into two or more dependant patches and keep 'related' things together. For example, put all the debug print() updates in a single patch. Not sure that this would this help though, what do you think?
There is a very small change/fix to DbgValueLocation which I included in this patch. I could take that out (and probably should) but it won't have a noticable impact on the diff size.

@aprantl since your last review I've updated the class names and summary, how do you feel about these changes?

Orlando added inline comments.Nov 27 2019, 6:57 AM
llvm/test/DebugInfo/pr41992.ll
88

Ah forgot about this one, I'll include this in the next round of updates.

I'm not sure if that's possible here. Everything seems quite interdependent, and the bug fix (functionality change) happens as a result of the structural change (also a functionality change).
I could split this into two or more dependant patches and keep 'related' things together. For example, put all the debug print() updates in a single patch. Not sure that this would this help though, what do you think?
There is a very small change/fix to DbgValueLocation which I included in this patch. I could take that out (and probably should) but it won't have a noticable impact on the diff size.

I see. The semantic of the change does not allow us to get it shorter, so I am OK with that. Thanks for considering that!

llvm/lib/CodeGen/LiveDebugVariables.cpp
758–759

Do not need the braces anymore.

1061

Frag ==> Fragment

1506

Frag ==> Fragment

Orlando updated this revision to Diff 232328.Dec 5 2019, 6:14 AM

ping

+ Rebase.
+ Address comments from @djtodoro.

Orlando updated this revision to Diff 232331.Dec 5 2019, 6:18 AM

Missed one of djtodoro's comments, sorry for the spam!

aprantl added inline comments.Dec 5 2019, 9:23 AM
llvm/lib/CodeGen/LiveDebugVariables.cpp
120–121

DbgValueLocation() = default;

349
Orlando marked 7 inline comments as done.Dec 6 2019, 3:05 AM

Thanks for taking a look @aprantl.

llvm/lib/CodeGen/LiveDebugVariables.cpp
120–121

I'll update this in the next revision, thanks!

349

The point of SourceVariable is to, for any given def, allow us to quickly find
overlapping fragments and explicitly terminate the range of any previous defs.

The key difference between SourceVariable and DebugVariable (from D70486)
is that a SourceVariable represents a group of fragments, and a DebugVariable
is closer to what I've called a Fragment here; it represents a unique _part of_
a variable only.

So I don't believe we can use DebugVariable in place of SourceVariable. But at
the cost of some refactoring and a couple of redundant fields (repeated
DIVariable*, DILocation*, and bool part of OptFragInfo per DebugVariable)
I could replace Fragment with DebugVariable? I do not feel particularly
strongly either way.

djtodoro added inline comments.Dec 6 2019, 5:37 AM
llvm/lib/CodeGen/LiveDebugVariables.cpp
349

The key difference between SourceVariable and DebugVariable (from D70486)
is that a SourceVariable represents a group of fragments, and a DebugVariable
is closer to what I've called a Fragment here; it represents a unique _part of_
a variable only.

Can we document this somehow?

Orlando marked an inline comment as done.Dec 6 2019, 6:29 AM
Orlando added inline comments.
llvm/lib/CodeGen/LiveDebugVariables.cpp
349

I could certainly improve the doxygen comment. Would it be appropriate to document anywhere else?

djtodoro added inline comments.Dec 6 2019, 7:20 AM
llvm/lib/CodeGen/LiveDebugVariables.cpp
349

The doxygen comment will help. I am not sure for some other place, maybe the docs/SourceLevelDebugging is good place.

Orlando updated this revision to Diff 233802.Dec 13 2019, 8:04 AM
Orlando marked 5 inline comments as done.

Hi, sorry for the slow update.

The classes DbgValueLocation (line 144), Fragment (line 172), and SourceVariable (line 356) all have new doxygen comments.

+ Rebase
+ Address reviewer comments

Thanks,
Orlando

Thanks for this! This looks good to me now!

Thanks @djtodoro for looking at this. Sorry for this really delayed response!

This was initially going to be a ping, but as a result of some light offline
discussion I have a few questions. For clarification an "overlap" here refers to
a partial overlap between fragments in DIExpressions.

There are other places in llvm which do not consider overlaps, and patches
which ignore overlaps are being accepted (e.g. D71478).

  1. Do overlapping fragments in dbg.values ever occur before isel currently? I would be happy to write an IRVerifier check for this and document it somewhere if this is appropriate.
  1. More radically and long term, would it be feasible to ever ban overlapping fragments in DBG_VALUEs (MIR) somehow?

As demonstrated by this patch overlapping fragments are a pain to handle
properly. The original patch D66415 was much smaller but didn't handle
overlaps.

I have gathered some stats from clang and llvm-as RelWithDebInfo builds @
llvm 3.4 by counting overlaps found in the livedebugvars pass with this patch:

Terminology:
Def: A { Variable, Scope, Fragment } tuple representing a dbg.value instance.
VarSet: A set containing all the Defs with any given Variable and Scope.
Overlaps: Sum of unique Fragment overlaps between Defs in a VarSet for each VarSet.

llvm 3.4 binary |     llvm-as |       clang
----------------|--------------------------
Num VarSets     |     2284961 |    14430359
Overlaps        |           2 |           7

These stats are only anecdotal and the results are C++ specific so any conclusion
would have to be drawn very tentatively. But:

  1. I would be okay with dropping this patch in favour of cleaning up D66415 and adding a FIXME comment since this patch may be useless in the real world. What do you think?

Ping. Please see my previous comment for an update on this.

vsk added a comment.Jan 24 2020, 2:14 PM

Thanks @djtodoro for looking at this. Sorry for this really delayed response!

This was initially going to be a ping, but as a result of some light offline
discussion I have a few questions. For clarification an "overlap" here refers to
a partial overlap between fragments in DIExpressions.

Apologies, I'm not up-to-date on this patch and would appreciate some clarification / a few short examples of what overlapping fragments are. Is it when you have adjacent dbg.value / DBG_VALUEs, s.t. there is some overlap in the bits they describe? So, are these valid examples? --

Example 1 (full overlap)
dbg.value(..., "var", DIExpression())
dbg.value(..., "var", DIExpression())

Example 2 (partial overlap)
dbg.value(..., "var", DIExpression(DW_OP_LLVM_fragment 0 16))
dbg.value(..., "var", DIExpression(DW_OP_LLVM_fragment 8 24)) ; bits 8-16 overlap

Example 3 (no overlap)
dbg.value(..., "var", DIExpression(DW_OP_LLVM_fragment 0 16))
dbg.value(..., "var", DIExpression(DW_OP_LLVM_fragment 16 16))

Example 3 (also no overlap)
dbg.value(..., "var", DIExpression(DW_OP_LLVM_fragment 0 16))
dbg.value(..., "other_var", DIExpression())
dbg.value(..., "var", DIExpression(DW_OP_LLVM_fragment 8 24))  ; bits 8-16 would overlap, but there's an intervening dbg.value

There are other places in llvm which do not consider overlaps, and patches
which ignore overlaps are being accepted (e.g. D71478).

I'm fuzzy on the definition here so apologies for the probably naive question, but: why are overlaps a problem? Would you mind following up on the review for D71478 describing what it's missing?

  1. Do overlapping fragments in dbg.values ever occur before isel currently? I would be happy to write an IRVerifier check for this and document it somewhere if this is appropriate.
  1. More radically and long term, would it be feasible to ever ban overlapping fragments in DBG_VALUEs (MIR) somehow?

As demonstrated by this patch overlapping fragments are a pain to handle
properly. The original patch D66415 was much smaller but didn't handle
overlaps.

I have gathered some stats from clang and llvm-as RelWithDebInfo builds @
llvm 3.4 by counting overlaps found in the livedebugvars pass with this patch:

Terminology:
Def: A { Variable, Scope, Fragment } tuple representing a dbg.value instance.
VarSet: A set containing all the Defs with any given Variable and Scope.
Overlaps: Sum of unique Fragment overlaps between Defs in a VarSet for each VarSet.

^ Sorry, I didn't quite parse this definition.

llvm 3.4 binary |     llvm-as |       clang
----------------|--------------------------
Num VarSets     |     2284961 |    14430359
Overlaps        |           2 |           7

These stats are only anecdotal and the results are C++ specific so any conclusion
would have to be drawn very tentatively. But:

  1. I would be okay with dropping this patch in favour of cleaning up D66415 and adding a FIXME comment since this patch may be useless in the real world. What do you think?

Hi @vsk, thank you for taking a look at this. Please bear with me, getting you some (relatively concise) answers it is taking a little longer than I expected but I am working on it.

@vsk this is the situation as far as I understand it. I appreciate that this is
a lot of text; there is a summary at the end. I hope this helps.

Apologies, I'm not up-to-date on this patch and would appreciate some
clarification / a few short examples of what overlapping fragments are. Is it
when you have adjacent dbg.value / DBG_VALUEs, s.t. there is some overlap in
the bits they describe?

Yes, except adjacency is not part of that relationship.

Example 5
DBG_VALUE %0, $noreg, "x", DIExpression() ; DV1
...
DBG_VALUE %0, $noreg, "y", DIExpression(DW_OP_LLVM_fragment 0 16) ; DV2
...
DBG_VALUE %1, $noreg, "x", DIExpression() ; DV3 
...
DBG_VALUE %0, $noreg, "y", DIExpression(DW_OP_LLVM_fragment 8 8)  ; DV4

DV1, DV3, have fully overlapping fragments. DV2 and DV4 have partially
overlapping fragments.

So, in your example 4 the first and last DBG_VALUEs partially overlap.

are these valid examples?

Example 1: Correct - I will explicitly state "full overlap" if I must refer
           to this case.  
Example 2: Correct.
Example 3: Correct.
Example 4: Not quite, that's still a partial overlap. See above.

I'm fuzzy on the definition here so apologies for the probably naive question,
but: why are overlaps a problem?

Having to consider fragment overlaps complicates value liveness queries for any
particular bits of a source variable. 'value' here refers to the combination of
a location (e.g. vreg, const) and DIExpression (because this may add a modifier
to the location).

Would you mind following up on the review for D71478 describing what it's
missing?

Ah, looking closely at D71478 the pass will never produce wrong debuginfo. It
doesn't remove any dbg.values which _may overlap_ by ignoring the fragment part.
It takes a conservative approach, but this has no effect on the final debuginfo
so it turns out it isn't a very compelling example.

In the Example 6 below it would be safe to remove DV3, but the pass
implemented in D71478 will not currently do so because DV3 and DV2 _may
overlap_.

Example 6
dbg.value %1, "x", Fragment(0, 16) ; DV1
...
dbg.value %2, "x", Fragment(16, 16) ; DV2
...
dbg.value %1, "x", Fragment(0, 16) ; DV3

why are overlaps a problem? (continued)

LiveDebugVariables (LDV) removes DBG_VALUEs before regalloc and reinserts after.
It cannot afford to be so conservative because this reduces coverage. Before I
explain this I have to show you what is wrong with LDV currently.

LDV uses interval maps to explicitly represent the intervals that the removed
DBG_VALUEs implicitly described. The DBG_VALUEs are filtered into an interval
map based on their { Source Variable, DIExpression }. The interval map will
coalesce adjacent entries that use the same location.

Consider (interval maps before coalescing on the left):

Example 7
+------------DBG_VALUE %0, $noreg, "x", DIExpression() ; DV1
|            ...
|       +----DBG_VALUE %1, $noreg, "x", DIExpression(DW_OP_plus_uconst, 4, DW_OP_stack_value) ; DV2
|       |    ...    
+-------|----DBG_VALUE %0, $noreg, "x", DIExpression() ; DV3
|       |
DV1(%0) DV2(%1)
DV3(%0)

This approach generates incorrect debuginfo. The interval map coalesces DV3 into
DV1 because they use the same location.

Taking the same approach as D71478 and filtering only on { Source Variable }
would fix the bug described above but also may greatly reduce coverage. For
example:

Example 8
+----DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 8) ; DV1
|    ...
+----DBG_VALUE %1, "x", (DW_OP_LLVM_fragment 8 8) ; DV2
|  
+----DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 8) ; DV3
|
DV1(%0)
DV2(%1)
DV3(%0)

In Example 8 DV3 is not coalesced into DV1. Imagine that regalloc wants to
spill %0:

Example 9
+----DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 8) ; DV1
|    ...
+----DBG_VALUE %1, "x", (DW_OP_LLVM_fragment 8 8) ; DV2
|    <SPILL %0 into STACK0>
+----DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 8) ; DV3
|
DV1(%0)
DV2(%1) <----| %0 is not used in this interval in this map.
DV3(%0)      | Really, DV2 should not be in this map.

Because the spill range isn't represented in the interval map we just lose that
coverage.

Patch D66415 changes this so that the DBG_VALUEs are filtered into interval maps
based on { Source Variable, Fragment } and will coalesce adjacent entries that
use the same { Location, DIExpression }.

This method correctly handles fully overlapping fragments and non-overlapping
fragments of the same Source Variable and fixes case [0] and [1]. However, it
still produces incorrect debuginfo when presented with partially overlapping
fragments in much the same way:

Example 10
+-------------------DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 16) ; DV1
|                   ...
|              +----DBG_VALUE %1, "x", (DW_OP_LLVM_fragment 0 8)  ; DV2
|              |    ...
+--------------|----DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 16) ; DV3
|              |
DV1(%0, Expr0) DV2(%1, Expr1)
DV3(%0, Expr0)

DV2 will get filtered into a separate interval map and therefore DV3 and DV1
will be incorrectly coalesced.

So, the patch D66415 may cause incorrect debuginfo, but in practice it is should
cause much less than the current implementation because we are more building more
accurate intervals for any given bits.

^ Sorry, I didn't quite parse this definition.

TL;DR The upper bound of DBG_VALUEs coming into LDV with partially overlapping
fragments in a clang 3.4 RelWithDebInfo build is around 0.00005%.

Summary and questions

Partially overlapping fragments (between DBG_VALUEs) are allowed in IR and MIR
but they are not handled everywhere. For a large C++ codebase I saw a
negligible number of partial overlaps coming into LiveDebugVariablesq.

This patch (in theory) fixes the problem of overlapping fragments for LDV. But
it is large and difficult to review, so:

  1. Please can we accept a patch that looks like D66415 (plus FIXME: overlaps)? It will produce incorrect debuginfo given partially overlapping fragments, but it is still strictly an improvement on the current implementation, fixes [0] and [1], and is far simpler than this patch which does handle partial overlaps.

My numbers are only anecdotal and there may be languages / situations which
necessitate partially overlapping fragments (e.g. C allows much more type
punning than C++). With this in mind:

  1. It seems difficult to actually get clang to produce these pesky overlaps. Do you know of any cases where overlapping fragments in either IR or MIR are _required_ to produce correct debuginfo? Hypothetically, if it were feasible to ban partial overlaps, then D66415 would be fully correct and it would be very easy to make D71478 optimal.

Thanks,
Orlando

[0] https://bugs.llvm.org/show_bug.cgi?id=41992
[1] https://bugs.llvm.org/show_bug.cgi?id=43957

vsk added a comment.EditedJan 30 2020, 5:02 PM

@Orlando thanks for your patient and thorough explanation, I really appreciate it!

@vsk this is the situation as far as I understand it. I appreciate that this is
a lot of text; there is a summary at the end. I hope this helps.

Apologies, I'm not up-to-date on this patch and would appreciate some
clarification / a few short examples of what overlapping fragments are. Is it
when you have adjacent dbg.value / DBG_VALUEs, s.t. there is some overlap in
the bits they describe?

Yes, except adjacency is not part of that relationship.

Example 5
DBG_VALUE %0, $noreg, "x", DIExpression() ; DV1
...
DBG_VALUE %0, $noreg, "y", DIExpression(DW_OP_LLVM_fragment 0 16) ; DV2
...
DBG_VALUE %1, $noreg, "x", DIExpression() ; DV3 
...
DBG_VALUE %0, $noreg, "y", DIExpression(DW_OP_LLVM_fragment 8 8)  ; DV4

DV1, DV3, have fully overlapping fragments. DV2 and DV4 have partially
overlapping fragments.

So, in your example 4 the first and last DBG_VALUEs partially overlap.

are these valid examples?

Example 1: Correct - I will explicitly state "full overlap" if I must refer
           to this case.  
Example 2: Correct.
Example 3: Correct.
Example 4: Not quite, that's still a partial overlap. See above.

Got it.

I'm fuzzy on the definition here so apologies for the probably naive question,
but: why are overlaps a problem?

Having to consider fragment overlaps complicates value liveness queries for any
particular bits of a source variable. 'value' here refers to the combination of
a location (e.g. vreg, const) and DIExpression (because this may add a modifier
to the location).

Would you mind following up on the review for D71478 describing what it's
missing?

Ah, looking closely at D71478 the pass will never produce wrong debuginfo. It
doesn't remove any dbg.values which _may overlap_ by ignoring the fragment part.
It takes a conservative approach, but this has no effect on the final debuginfo
so it turns out it isn't a very compelling example.

In the Example 6 below it would be safe to remove DV3, but the pass
implemented in D71478 will not currently do so because DV3 and DV2 _may
overlap_.

Example 6
dbg.value %1, "x", Fragment(0, 16) ; DV1
...
dbg.value %2, "x", Fragment(16, 16) ; DV2
...
dbg.value %1, "x", Fragment(0, 16) ; DV3

why are overlaps a problem? (continued)

LiveDebugVariables (LDV) removes DBG_VALUEs before regalloc and reinserts after.
It cannot afford to be so conservative because this reduces coverage. Before I
explain this I have to show you what is wrong with LDV currently.

LDV uses interval maps to explicitly represent the intervals that the removed
DBG_VALUEs implicitly described. The DBG_VALUEs are filtered into an interval
map based on their { Source Variable, DIExpression }. The interval map will
coalesce adjacent entries that use the same location.

Consider (interval maps before coalescing on the left):

Example 7
+------------DBG_VALUE %0, $noreg, "x", DIExpression() ; DV1
|            ...
|       +----DBG_VALUE %1, $noreg, "x", DIExpression(DW_OP_plus_uconst, 4, DW_OP_stack_value) ; DV2
|       |    ...    
+-------|----DBG_VALUE %0, $noreg, "x", DIExpression() ; DV3
|       |
DV1(%0) DV2(%1)
DV3(%0)

This approach generates incorrect debuginfo. The interval map coalesces DV3 into
DV1 because they use the same location.

Let me see if I can restate that correctly. So, the problem here is: for the list of user-values tracked by the interval map for the range containing [DV1, DV3], there are two entries: one for {DV1, DV3} (coalesced), and another for {DV2}. This would result in "x" being interpreted as %1 + 4 instead of %0 in that interval (by the debugger)?

Taking the same approach as D71478 and filtering only on { Source Variable }
would fix the bug described above but also may greatly reduce coverage. For
example:

IIUC that would essentially make fragments not 'work', or appear fully, in debug sessions? As you'd only ever see the the last fragment described in an interval?

Example 8
+----DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 8) ; DV1
|    ...
+----DBG_VALUE %1, "x", (DW_OP_LLVM_fragment 8 8) ; DV2
|  
+----DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 8) ; DV3
|
DV1(%0)
DV2(%1)
DV3(%0)

In Example 8 DV3 is not coalesced into DV1. Imagine that regalloc wants to
spill %0:

Example 9
+----DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 8) ; DV1
|    ...
+----DBG_VALUE %1, "x", (DW_OP_LLVM_fragment 8 8) ; DV2
|    <SPILL %0 into STACK0>
+----DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 8) ; DV3
|
DV1(%0)
DV2(%1) <----| %0 is not used in this interval in this map.
DV3(%0)      | Really, DV2 should not be in this map.

Because the spill range isn't represented in the interval map we just lose that
coverage.

Patch D66415 changes this so that the DBG_VALUEs are filtered into interval maps
based on { Source Variable, Fragment } and will coalesce adjacent entries that
use the same { Location, DIExpression }.

Ah, sgtm, then.

This method correctly handles fully overlapping fragments and non-overlapping
fragments of the same Source Variable and fixes case [0] and [1]. However, it
still produces incorrect debuginfo when presented with partially overlapping
fragments in much the same way:

Example 10
+-------------------DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 16) ; DV1
|                   ...
|              +----DBG_VALUE %1, "x", (DW_OP_LLVM_fragment 0 8)  ; DV2
|              |    ...
+--------------|----DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 0 16) ; DV3
|              |
DV1(%0, Expr0) DV2(%1, Expr1)
DV3(%0, Expr0)

DV2 will get filtered into a separate interval map and therefore DV3 and DV1
will be incorrectly coalesced.

I don't think we should ban these partial overlaps from showing up in IR, as I can imagine situations where this sort of IR is legitimately describing the known bits in a value. What we might consider doing to make LDV more conservative is a) identifying partial overlaps in a dbg.value pack using a backwards scan and b) deleting any earlier dbg.values which overlap with bits described later in the pack. That should be conservatively correct, and wouldn't require changing the interval map structure?

So, the patch D66415 may cause incorrect debuginfo, but in practice it is should
cause much less than the current implementation because we are more building more
accurate intervals for any given bits.

^ Sorry, I didn't quite parse this definition.

TL;DR The upper bound of DBG_VALUEs coming into LDV with partially overlapping
fragments in a clang 3.4 RelWithDebInfo build is around 0.00005%.

Summary and questions

Partially overlapping fragments (between DBG_VALUEs) are allowed in IR and MIR
but they are not handled everywhere. For a large C++ codebase I saw a
negligible number of partial overlaps coming into LiveDebugVariablesq.

This patch (in theory) fixes the problem of overlapping fragments for LDV. But
it is large and difficult to review, so:

  1. Please can we accept a patch that looks like D66415 (plus FIXME: overlaps)? It will produce incorrect debuginfo given partially overlapping fragments, but it is still strictly an improvement on the current implementation, fixes [0] and [1], and is far simpler than this patch which does handle partial overlaps.

Sgtm, I very strongly feel that we should handle the 99.99995% case first and then handle the rest later.

My numbers are only anecdotal and there may be languages / situations which
necessitate partially overlapping fragments (e.g. C allows much more type
punning than C++). With this in mind:

  1. It seems difficult to actually get clang to produce these pesky overlaps. Do you know of any cases where overlapping fragments in either IR or MIR are _required_ to produce correct debuginfo?

Anything I'd write here would be highly contrived, but I don't believe that these overlaps are senseless. E.g. we see this in IR all the time:

dbg.value(i32 0, "x", DIExpression())
dbg.value(i32 1, "x", DIExpression())

and this is seen as well-defined.

Hypothetically, if it were feasible to ban partial overlaps, then D66415
would be fully correct and it would be very easy to make D71478 optimal.

Thanks,
Orlando

[0] https://bugs.llvm.org/show_bug.cgi?id=41992
[1] https://bugs.llvm.org/show_bug.cgi?id=43957

Let me see if I can restate that correctly. So, the problem here is: for the list of user-values tracked by the interval map for the range containing [DV1, DV3], there are two entries: one for {DV1, DV3} (coalesced), and another for {DV2}. This would result in "x" being interpreted as %1 + 4 instead of %0 in that interval (by the debugger)?

Short answer: yes, that is the behaviour we observe in the debugger. Longer
answer: the cause of the issue is slightly different to how I read your
explanation. The core problem here is that in this situation _two_ interval maps
are created instead of one:

A: { [DV1, DV3), [DV3, ...) }
   ** Coalesce these adjacent ranges because they both use the same location (%0) **
   { [DV1, ...) }
B: { [DV2, ...) }

This results in "x" being interpreted as %1 + 4 in the range [DV2, ...). As
stated previously this happens because of the method we use currently to filter
DBG_VALUEs into the 'correct' interval maps.

In this example, to get correct debuginfo, we would want to see the filter
changed so that all of these DBG_VALUEs end up in the same map, and slightly
different coalescing rules (e.g. D66415):

C: { [DV1, DV2), [DV2, DV3), [DV3, ...) }
   ** No coalescing because no adjacent ranges use the same location AND expression. **

Anything I'd write here would be highly contrived, but I don't believe that these overlaps are senseless. E.g. we see this in IR all the time:

dbg.value(i32 0, "x", DIExpression())
dbg.value(i32 1, "x", DIExpression())

and this is seen as well-defined.

Sorry, this question was badly written. I meant to ask if we could ever consider
banning _paritally_ overlapping fragments. But you do answer this elsewhere anyway:

I don't think we should ban these partial overlaps from showing up in IR, as I can
imagine situations where this sort of IR is legitimately describing the known bits
in a value. What we might consider doing to make LDV more conservative is a)
identifying partial overlaps in a dbg.value pack using a backwards scan and b)
deleting any earlier dbg.values which overlap with bits described later in the
pack. That should be conservatively correct, and wouldn't require changing the
interval map structure?

IIUC I'm not sure that works with this example?

DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 00 16) ; DV1
...
DBG_VALUE %1, "x", (DW_OP_LLVM_fragment 00 16) ; DV2
DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 08 32) ; DV3
...
DBG_VALUE %0, "x", (DW_OP_LLVM_fragment 00 16) ; DV4

Going forward, I will:

  1. Abandon this patch (for now...)
  2. Touch-up and reopen D66415 (fixes the 99.99995% case)

Thank you for sticking with me on this one!

Orlando abandoned this revision.Feb 5 2020, 9:05 AM

I've opened a stack of patches starting at D74052 which implement the 99.99...% fix (handle fully overlapping fragments) discussed above.