This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Trim call-clobbered location list entries when tuning for GDB
AbandonedPublic

Authored by dstenb on Oct 4 2019, 7:46 AM.

Details

Summary

This is a patch that mimics GCC's behavior for location list entries
that are clobbered by calls. This is introduced in order to stop
variables that are described by call-clobbered registers and reside in
outer frames from being printable when performing virtual unwinding in
GDB.

GDB has no information about which registers are scratch registers and
which are preserved, but instead relies on the producer to describe
that. One way of solving that would be to emit DW_CFA_undefined
instructions for all scratch registers, but that would result in a lot
of CFI output.

When unwinding to an outer frame, GCC and LLDB subtract one byte from
the return address in order to get the correct scope for the call. GCC
utilizes that by subtracting one byte from the range for all variables
that are clobbered by the call. This means that such variables will be
printed as <optimized out> in GDB when performing virtual unwinding, but
the variables will still be printable when standing on the call before
executing it. This commit introduces the same behavior in LLVM's
location list code. This assumes that a call instruction is larger than
one byte.

LLDB has knowledge about which registers are preserved, so trimming the
ranges in this way is not valuable for LLDB, as far as I can tell. I
have therefore added this as a GDB tuning.

This fixes PR39752.

Diff Detail

Event Timeline

dstenb created this revision.Oct 4 2019, 7:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2019, 7:46 AM
dstenb added a comment.Oct 4 2019, 7:51 AM

I'm uploading this to see it there is any interest for it. On the DWARF-discuss mailing list it was argued that the producer should be able to know which registers are preserved by the call, and that it is undesirable that the producer makes these sorts of assumptions about the consumer. I think that both of those points are fair.

On the other hand, I think that the incompatibility between Clang and GDB is quite nasty, as it requires the end user to be aware of how variables are described (e.g. by using info addr var) before trusting the values of variables in outer frames. That can be time consuming, and is perhaps easy to forget sometimes, e.g. when looking at backtraces.

As this can be implemented as a GDB-specific tuning with not too much impact on the rest of the code, except for the MCSymbol -> SymbolWithOffset rewrites, I think it may be worth considering, but I totally understand if you don't want this, and then we'll just keep it downstream.

Thanks for bringing this up!

A few thoughts from me:

  1. Yeah, I tend to agree with the DWARF Committee folks & the fact that LLDB can do the right thing without this change sort of points to this being a "fix it in GDB" situation. Have you tried asking the GDB folks about it/submitting patches there rather than here?
  1. Do you have a small example of GCC producing this kind of output?
  1. The ability to use an offset from a debug_addr entry is actually something that's quite desirable - but not quite in the way you're suggesting. Actually the goal would be to not use another debug_addr entry, but the ability to refer to an addr entry + offset in a DIE. Of course this would require an extension to DWARF (non-standard, or eventually standard) which would also mean updating the DWARF consumer... which probably defeats the point of your work, which I imagine is intended to avoid changing the consumer. Though perhaps GDB would be more inclined to accept a patch for an addr+offset form compared to support for the register details.
  1. If GDB can do the right thing when printing in a backtrace, then it seems like it should do the same/similar thing when in a frame
dstenb added a comment.Oct 7 2019, 6:29 AM

Thanks for bringing this up!

A few thoughts from me:

  1. Yeah, I tend to agree with the DWARF Committee folks & the fact that LLDB can do the right thing without this change sort of points to this being a "fix it in GDB" situation. Have you tried asking the GDB folks about it/submitting patches there rather than here?

No, we have not done that yet.

  1. Do you have a small example of GCC producing this kind of output?
extern int value(void);
extern void call(int);

int main() {
  int local = value();
  call(local);
  return 0;
}

compiled using -O1 -g with GCC 8.3.0 gives:

(gdb) info addr local
Symbol "local" is multi-location:
  Range 0x5555555550ee-0x5555555550f4: a variable in $rax
.
(gdb) disas main
[...]
   0x00005555555550f0 <+11>:	callq  0x5555555550ff <call>
   0x00005555555550f5 <+16>:	mov    $0x0,%eax
[...]

As seen, the location list entry ends one byte before the return address.

  1. The ability to use an offset from a debug_addr entry is actually something that's quite desirable - but not quite in the way you're suggesting. Actually the goal would be to not use another debug_addr entry, but the ability to refer to an addr entry + offset in a DIE. Of course this would require an extension to DWARF (non-standard, or eventually standard) which would also mean updating the DWARF consumer... which probably defeats the point of your work, which I imagine is intended to avoid changing the consumer. Though perhaps GDB would be more inclined to accept a patch for an addr+offset form compared to support for the register details.

Okay! How would that look like, and what would that be used for?

  1. If GDB can do the right thing when printing in a backtrace, then it seems like it should do the same/similar thing when in a frame

The same problem exists for backtraces. For example, in PR39752 the outer frames' parameters are printed using the inner-most register value:

(gdb) bt
#0  fn3 (p3=<optimized out>) at test.c:11
#1  0x000000000040050f in fn2 (p2=999) at test.c:15
#2  0x000000000040052f in fn1 (p1=999) at test.c:21
#3  0x000000000040054e in main () at test.c:26

Debugger tuning should not be used directly this way. There should be a DwarfDebug flag, and a CL option, and the default set in the DwarfDebug ctor based on tuning. This allows the defaulting to work how you want, but can be overridden easily for experimentation and testing. There are lots of examples of doing this in the ctor already. Also, if it turns out some other debugger also needs this, it's trivial to fix up the ctor to handle it with no code changes needed elsewhere.

@dblaikie I'm also not clear what you're suggestion about .debug_addr entry plus offset. DW_LLE_offset_pair does this, derived from the base address, which ought to be available for any given function, assuming DWARF v5. Can you explain more clearly what's missing?

Thanks for bringing this up!

A few thoughts from me:

  1. Yeah, I tend to agree with the DWARF Committee folks & the fact that LLDB can do the right thing without this change sort of points to this being a "fix it in GDB" situation. Have you tried asking the GDB folks about it/submitting patches there rather than here?

No, we have not done that yet.

I think it'd be worthwhile having at least a statement from GDB that they feel this should be the responsibility of the producer. Though even if that's the answer they provide - I think some amount of pushback (especially given the existence proof of LLDB's behavior, by the sounds of it/if I'm understanding you correctly) might be worthwhile.

  1. Do you have a small example of GCC producing this kind of output?
extern int value(void);
extern void call(int);

int main() {
  int local = value();
  call(local);
  return 0;
}

compiled using -O1 -g with GCC 8.3.0 gives:

(gdb) info addr local
Symbol "local" is multi-location:
  Range 0x5555555550ee-0x5555555550f4: a variable in $rax
.
(gdb) disas main
[...]
   0x00005555555550f0 <+11>:	callq  0x5555555550ff <call>
   0x00005555555550f5 <+16>:	mov    $0x0,%eax
[...]

As seen, the location list entry ends one byte before the return address.

Cool - thanks!

  1. The ability to use an offset from a debug_addr entry is actually something that's quite desirable - but not quite in the way you're suggesting.

OK, I've looked through the code some more & understand a little better.

So your changes to the address pool don't actually cause the address pool to contain entries with offsets - it stores only the base address in the actual debug_addr pool output, but then uses the offset from there in the place that refers to the address pool.

So I think that would mean there would end up with duplicate entries in debug_addr, which would be a waste of space/relocations/etc.

So only the address should go in the pool - the pool shouldn't be aware of the offset. (this would mean the semantics of the in-memory data structures would match more closely to the output)

So the DebugLocStream::Entry's Begin/End could be SymbolWithOffset - without having the AddressPool having any knowledge of these offsets, just the symbol itself.

Actually the goal would be to not use another debug_addr entry, but the ability to refer to an addr entry + offset in a DIE. Of course this would require an extension to DWARF (non-standard, or eventually standard) which would also mean updating the DWARF consumer... which probably defeats the point of your work, which I imagine is intended to avoid changing the consumer. Though perhaps GDB would be more inclined to accept a patch for an addr+offset form compared to support for the register details.

Okay! How would that look like, and what would that be used for?

  1. If GDB can do the right thing when printing in a backtrace, then it seems like it should do the same/similar thing when in a frame

The same problem exists for backtraces. For example, in PR39752 the outer frames' parameters are printed using the inner-most register value:

(gdb) bt
#0  fn3 (p3=<optimized out>) at test.c:11
#1  0x000000000040050f in fn2 (p2=999) at test.c:15
#2  0x000000000040052f in fn1 (p1=999) at test.c:21
#3  0x000000000040054e in main () at test.c:26

Ah, sorry, I misunderstood your description.

Debugger tuning should not be used directly this way. There should be a DwarfDebug flag, and a CL option, and the default set in the DwarfDebug ctor based on tuning. This allows the defaulting to work how you want, but can be overridden easily for experimentation and testing. There are lots of examples of doing this in the ctor already. Also, if it turns out some other debugger also needs this, it's trivial to fix up the ctor to handle it with no code changes needed elsewhere.

@dblaikie I'm also not clear what you're suggestion about .debug_addr entry plus offset. DW_LLE_offset_pair does this, derived from the base address, which ought to be available for any given function, assuming DWARF v5. Can you explain more clearly what's missing?

Right - for loclists there's no need for new forms, etc. It was specifically related to the other review related to this that modifies the in-memory representation of the debug_addr (llvm::AddressPool) - which I assume meant a difference in output in the address pool, but seems it doesn't add the offset inside the pool (but may end up with redundant entries in the pool which should be fixed in any case).

My point was generally that the debug_addr section shouldn't be incnluding addresses with offsets, it should be the places that refer to debug_addr that use the offsets. The specific place I'd like to use offsets would be from FORM_addr in debug_info. But, yes, in this case the support for debug addr references from loclists, the forms are already sufficiently descriptive for this.

@dblaikie I'm also not clear what you're suggestion about .debug_addr entry plus offset. DW_LLE_offset_pair does this, derived from the base address, which ought to be available for any given function, assuming DWARF v5. Can you explain more clearly what's missing?

Right - for loclists there's no need for new forms, etc. It was specifically related to the other review related to this that modifies the in-memory representation of the debug_addr (llvm::AddressPool) - which I assume meant a difference in output in the address pool, but seems it doesn't add the offset inside the pool (but may end up with redundant entries in the pool which should be fixed in any case).

I agree, address pool management should be able to eliminate redundant entries

My point was generally that the debug_addr section shouldn't be incnluding addresses with offsets, it should be the places that refer to debug_addr that use the offsets. The specific place I'd like to use offsets would be from FORM_addr in debug_info. But, yes, in this case the support for debug addr references from loclists, the forms are already sufficiently descriptive for this.

Ah, got it. The problem you're facing is that for a DIE, a FORM describes one value, while you want to describe two--the address (or index into .debug_addr), and a separate offset. For a DIE attribute, this would normally be done using an expression. Would it work to have (say) DW_AT_low_pc be allowed to have class exprloc? (It currently must have class address, either FORM_addr or one of the FORM_addrx's.) The expression can index into .debug_addr and then add an offset to the result.

@dblaikie I'm also not clear what you're suggestion about .debug_addr entry plus offset. DW_LLE_offset_pair does this, derived from the base address, which ought to be available for any given function, assuming DWARF v5. Can you explain more clearly what's missing?

My point was generally that the debug_addr section shouldn't be incnluding addresses with offsets, it should be the places that refer to debug_addr that use the offsets. The specific place I'd like to use offsets would be from FORM_addr in debug_info. But, yes, in this case the support for debug addr references from loclists, the forms are already sufficiently descriptive for this.

Ah, got it. The problem you're facing is that for a DIE, a FORM describes one value, while you want to describe two--the address (or index into .debug_addr), and a separate offset. For a DIE attribute, this would normally be done using an expression. Would it work to have (say) DW_AT_low_pc be allowed to have class exprloc? (It currently must have class address, either FORM_addr or one of the FORM_addrx's.) The expression can index into .debug_addr and then add an offset to the result.

Yep, we've (you, me, DWARF committee, other folks) have talked about it before, years ago - didn't mean to derail/rehash the whole thing. I've prototyped it with expressions - you're right, doesn't require much change to the DWARF standard (if any, really - DWARF being permissive & all) but maybe a fair bit of chaneg/work for DWARF consumers compared to a more specific encoding (& I'd previously asked/discussed what encoding would be suitable - given that both the address and teh offset support various different fixed (& then variable) length encodings and having the combination of all of those would be painful - so having only the 2xuleb encoding was the last thing discussed).

But, yeha, once we've got DWARFv5 fully implemented & deployed/functioning with LLDB and GDB I'll probably look at whether addr+offset is worthwhile. (the other, sort of cheaper workaround, is to use a rnglist of length 1 - not sure how much the addr+offset would save compared to a rnglist of length 1, etc - lots to compare, see what's worthwhile)

dstenb added a comment.Oct 9 2019, 5:48 AM

Thanks for bringing this up!

A few thoughts from me:

  1. Yeah, I tend to agree with the DWARF Committee folks & the fact that LLDB can do the right thing without this change sort of points to this being a "fix it in GDB" situation. Have you tried asking the GDB folks about it/submitting patches there rather than here?

No, we have not done that yet.

I think it'd be worthwhile having at least a statement from GDB that they feel this should be the responsibility of the producer. Though even if that's the answer they provide - I think some amount of pushback (especially given the existence proof of LLDB's behavior, by the sounds of it/if I'm understanding you correctly) might be worthwhile.

Yes, that's fair. I sent a mail to the GDB mailing list now: https://sourceware.org/ml/gdb/2019-10/msg00002.html. As mentioned in the mail, it appears that registers actually are described in that way for some targets, e.g. RS/6000 and S/390. Sorry for not noticing that before. (Although, I did not test any such targets in practice, so I might have misunderstood that code.)

So your changes to the address pool don't actually cause the address pool to contain entries with offsets - it stores only the base address in the actual debug_addr pool output, but then uses the offset from there in the place that refers to the address pool.

The offset will make it into the address pool output. In the parent patch (D68466) the offset is added to the output in AddressPool::emit().

Such a case is tested in the attached call-clobbered-split.mir test:

# CHECK: .Ldebug_loc0:
[...]
# CHECK-NEXT:    .byte    3
# CHECK-NEXT:    .byte    2 <---------
# CHECK-NEXT:    .long    .Ltmp2-(.Ltmp2-1)
# CHECK-NEXT:    .byte    4                       # Loc expr size
# CHECK-NEXT:    .byte    48                      # DW_OP_lit0
# CHECK-NEXT:    .byte    159                     # DW_OP_stack_value
# CHECK-NEXT:    .byte    147                     # DW_OP_piece
# CHECK-NEXT:    .byte    8                       # 8
[...]
# CHECK: .Laddr_table_base0:
# CHECK-NEXT:    .quad    .Lfunc_begin0
# CHECK-NEXT:    .quad    .Ltmp1
# CHECK-NEXT:    .quad    .Ltmp2-1 <-----
# CHECK-NEXT:    .quad    .Ltmp2

So I think that would mean there would end up with duplicate entries in debug_addr, which would be a waste of space/relocations/etc.

So only the address should go in the pool - the pool shouldn't be aware of the offset. (this would mean the semantics of the in-memory data structures would match more closely to the output)

I think it's necessary to emit the offsets in the address pool output since DW_LLE_offset_pair takes unsigned operands.

dstenb updated this revision to Diff 224386.Oct 10 2019, 9:16 AM

Rebase (after addressing comments in D68466), and move GDB tuning to DwarfDebug member.

Debugger tuning should not be used directly this way. There should be a DwarfDebug flag, and a CL option, and the default set in the DwarfDebug ctor based on tuning. This allows the defaulting to work how you want, but can be overridden easily for experimentation and testing. There are lots of examples of doing this in the ctor already. Also, if it turns out some other debugger also needs this, it's trivial to fix up the ctor to handle it with no code changes needed elsewhere.

Oh, right. That makes sense. When I added that I saw that there were inline uses of the tuner queries in constructCallSiteEntryDIEs() already, so I did not consider that. Perhaps those uses should be addressed also.

So your changes to the address pool don't actually cause the address pool to contain entries with offsets - it stores only the base address in the actual debug_addr pool output, but then uses the offset from there in the place that refers to the address pool.

The offset will make it into the address pool output. In the parent patch (D68466) the offset is added to the output in AddressPool::emit().

Such a case is tested in the attached call-clobbered-split.mir test:

# CHECK: .Ldebug_loc0:
[...]
# CHECK-NEXT:    .byte    3
# CHECK-NEXT:    .byte    2 <---------
# CHECK-NEXT:    .long    .Ltmp2-(.Ltmp2-1)
# CHECK-NEXT:    .byte    4                       # Loc expr size
# CHECK-NEXT:    .byte    48                      # DW_OP_lit0
# CHECK-NEXT:    .byte    159                     # DW_OP_stack_value
# CHECK-NEXT:    .byte    147                     # DW_OP_piece
# CHECK-NEXT:    .byte    8                       # 8
[...]
# CHECK: .Laddr_table_base0:
# CHECK-NEXT:    .quad    .Lfunc_begin0
# CHECK-NEXT:    .quad    .Ltmp1
# CHECK-NEXT:    .quad    .Ltmp2-1 <-----
# CHECK-NEXT:    .quad    .Ltmp2

Ah, OK. I haven't applied/tested the patch myself - I looked at GCC's behavior & figured you were probably aiming for the same as it, and it doesn't look like GCC does this - and I'd certainly like to avoid that if at all possible. (though all this is pending discussion/conclusions you get from discussing things with GDB)

So I think that would mean there would end up with duplicate entries in debug_addr, which would be a waste of space/relocations/etc.

So only the address should go in the pool - the pool shouldn't be aware of the offset. (this would mean the semantics of the in-memory data structures would match more closely to the output)

I think it's necessary to emit the offsets in the address pool output since DW_LLE_offset_pair takes unsigned operands.

Unsigned operands don't /seem/ to me to be problematic here..

GCC's output in -gdwarf-5 with the example you provided previously doesn't use debug_addr, instead using offset_pair (if there's a base address) or start_length (if I add another function to the example, and use -ffunction-sections, so the base address of the CU is constant zero):

.byte   0x8
.quad   .LVL0
.uleb128 .LVL1-1-.LVL0
.uleb128 0x1
.byte   0x50
.byte   0

With -gsplit-dwarf GCC has to use debug_addr, and we don't see any label arithmetic in debug_addr:

.section        .debug_addr,"",@progbits
.quad   .LVL1
.quad   .LFB0
.quad   .LFB1
.quad   call
.quad   value
.quad   .LVL0

& we do see it in debug_loclists.dwo:

.byte   0x3
.uleb128 0x5
.uleb128 .LVL1-1-.LVL0
.uleb128 0x1
.byte   0x50
.byte   0

So if we are going to do this, I'd certainly want to match that sort of behavior - and not make changes to/add extra addresses to debug_addr if we don't have to.

Unsigned operands don't /seem/ to me to be problematic here..

GCC's output in -gdwarf-5 with the example you provided previously doesn't use debug_addr, instead using offset_pair (if there's a base address) or start_length (if I add another function to the example, and use -ffunction-sections, so the base address of the CU is constant zero):

.byte   0x8
.quad   .LVL0
.uleb128 .LVL1-1-.LVL0
.uleb128 0x1
.byte   0x50
.byte   0

With -gsplit-dwarf GCC has to use debug_addr, and we don't see any label arithmetic in debug_addr:

.section        .debug_addr,"",@progbits
.quad   .LVL1
.quad   .LFB0
.quad   .LFB1
.quad   call
.quad   value
.quad   .LVL0

& we do see it in debug_loclists.dwo:

.byte   0x3
.uleb128 0x5
.uleb128 .LVL1-1-.LVL0
.uleb128 0x1
.byte   0x50
.byte   0

So if we are going to do this, I'd certainly want to match that sort of behavior - and not make changes to/add extra addresses to debug_addr if we don't have to.

Is that output based on the C reproducer I posted a few comments up? I don't think that case is representative for the issue with the offsets in the address pool. Since the variable is only described by a call-clobbered register, we only have to end a location list entry at $return_addr - 1, not start a location list entry at $return_addr - 1. I think we can get away without needing offsets in the address pool for such cases.

The C reproducer that the call-clobbered-split.mir test case is based on has an aggregate variable whose elements are described by a call-clobbered register respectively a constant, so we want to start a location list entry at $return_addr - 1 for which only the constant element is described.

Here is that C reproducer:

extern void fn2(int *);

void fn1() {
  int data[] = {1, 2};
  int *ptrs[] = {0, &data[1]};
  fn2(ptrs[1]);
  ptrs[1] = 0;
}

If I compile that with GCC 7.4.0 using the following command line:

$ gcc-7 -O1 -g -gdwarf-5 -gsplit-dwarf -S -o - foo.c

GCC emits an address pool entry with an offset:

[...]
.Ldebug_addr0:
        .quad   .LVL1
        .quad   .LVL2
        .quad   .LVL3
        .quad   .LFB0
        .quad   .LVL2-1 <----------
        .quad   fn2
        .quad   __stack_chk_fail
        .quad   .LVL0

So this patch seems to behave similarly as, admittedly a quite old version of, GCC. I'll build the latest GCC release and try the same with that.

I don't think that case is representative for the issue with the offsets in the address pool. Since the variable is only described by a call-clobbered register, we only have to end a location list entry at $return_addr - 1, not start a location list entry at $return_addr - 1. I think we can get away without needing offsets in the address pool for such cases.

The C reproducer that the call-clobbered-split.mir test case is based on has an aggregate variable whose elements are described by a call-clobbered register respectively a constant, so we want to start a location list entry at $return_addr - 1 for which only the constant element is described.

Here is that C reproducer:

extern void fn2(int *);

void fn1() {
  int data[] = {1, 2};
  int *ptrs[] = {0, &data[1]};
  fn2(ptrs[1]);
  ptrs[1] = 0;
}

If I compile that with GCC 7.4.0 using the following command line:

$ gcc-7 -O1 -g -gdwarf-5 -gsplit-dwarf -S -o - foo.c

GCC emits an address pool entry with an offset:

[...]
.Ldebug_addr0:
        .quad   .LVL1
        .quad   .LVL2
        .quad   .LVL3
        .quad   .LFB0
        .quad   .LVL2-1 <----------
        .quad   fn2
        .quad   __stack_chk_fail
        .quad   .LVL0

So this patch seems to behave similarly as, admittedly a quite old version of, GCC. I'll build the latest GCC release and try the same with that.

Yeah, GCC 8.1 has the same behavior.

But I think we can avoid that in LLVM with some of the loclist changes I'm working on - we should only ever be using the start of a function (not necessarily the current function, if there are multiple functions in the same section) as a base address in location lists. So locations like this should be rendered as offset pairs relative to that base address & should always be positive, even if you subtract 1.

I'm wondering whether it might be worth deferring all this design discussion until after there's more clarity from GDB - since I expect that's the right path forward & this design discussion may be unnecessary.

Yeah, GCC 8.1 has the same behavior.

But I think we can avoid that in LLVM with some of the loclist changes I'm working on - we should only ever be using the start of a function (not necessarily the current function, if there are multiple functions in the same section) as a base address in location lists. So locations like this should be rendered as offset pairs relative to that base address & should always be positive, even if you subtract 1.

Okay!

I'm wondering whether it might be worth deferring all this design discussion until after there's more clarity from GDB - since I expect that's the right path forward & this design discussion may be unnecessary.

Yes, agreed.

dstenb abandoned this revision.Oct 29 2019, 3:33 AM

Sorry for the delay!

I got an answer from Simon Marchi about this: https://sourceware.org/ml/gdb/2019-10/msg00007.html, and I wrote a ticket on GDB's Bugzilla to keep track of that: https://sourceware.org/bugzilla/show_bug.cgi?id=25143.

It seems that the way forward is to try to land such patches in GDB. Given that, I'll abandon this patch for now. I will probably not be able to do that GDB work in the near time, but there is at least a way forward if someone wants to pick it up.