This is an archive of the discontinued LLVM Phabricator instance.

DebugInfo: Use base address selection entries for debug_loc
ClosedPublic

Authored by dblaikie on Oct 7 2019, 6:19 PM.

Details

Summary

Unify the range and loc emission (for both DWARFv4 and DWARFv5 style lists) and take advantage of that unification to use strategic base addresses for loclists.

Needs more testing, but llvm-dwarfdump doesn't currently support LLE_base_addressx, for instance. Pavel's looking at some changes there, though, so I'm holding off in case his work addresses it, or at least so I can work on it afterwards without conflicting with his changes.

Anyone know whether they have consumers (LLDB, the Sony debugger) that would need to be updated for either the v4 changes (use of base address specifiers in classic debug_loc lists) or v5 (base_addressx, etc, etc)? GDB can't cope with the DWARFv5 stuff, but seems fine with the v4 version.
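For illustration (schematic, with made-up addresses - not exact llvm-dwarfdump output): a two-entry DWARFv4 debug_loc list currently carries a relocated start and end address per entry:

0x00000000: [0x0000000000001000, 0x0000000000001006): DW_OP_reg0 RAX
            [0x0000000000001006, 0x000000000000100c): DW_OP_reg3 RBX

With a base address selection entry, the same list becomes one relocated base address followed by relocation-free offset pairs:

0x00000000: Base address: 0x0000000000001000
            [0x0000000000000000, 0x0000000000000006): DW_OP_reg0 RAX
            [0x0000000000000006, 0x000000000000000c): DW_OP_reg3 RBX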

Diff Detail

Event Timeline

dblaikie created this revision.Oct 7 2019, 6:19 PM
Herald added a project: Restricted Project.Oct 7 2019, 6:19 PM
labath added a comment.Oct 8 2019, 5:30 AM

LLDB seems to have support for base address selection in v4 debug_loc. It does not have support for v5 LLE_base_address(x) stuff, but the whole of v5 location list support is kind of wonky, which is also why I am looking at getting it to use the llvm version of the parser.

As for llvm-dwarfdump, feel free to add new encodings there. My plan is to add support for all LLE encodings, but since I also need to figure out a way to refactor all of that stuff, it may take a while before I get to that. Having one or two new encodings appear in the mean time should only be a minor nuisance.

probinson accepted this revision.Oct 8 2019, 9:23 AM

Anyone know whether they have consumers (LLDB, the Sony debugger) that would need to be updated for either the v4 changes (use of base address specifiers in classic debug_loc lists) or v5 (base_addressx, etc, etc)?

I'll ask re Sony debugger. I have no direct visibility to that code.

This revision is now accepted and ready to land.Oct 8 2019, 9:23 AM

BTW the BinaryFormat part LGTM and can go in on its own if you like. Should have been done that way in the first place.

probinson requested changes to this revision.Oct 8 2019, 10:08 AM

For some reason a previous comment caused it to set Accept, and the only way I know to undo it is to set Request Changes. Sorry about that.

This revision now requires changes to proceed.Oct 8 2019, 10:08 AM
SouraVX removed a subscriber: SouraVX.Oct 8 2019, 12:19 PM
SouraVX added a subscriber: SouraVX.

I'll ask re Sony debugger. I have no direct visibility to that code.

My debugger guys say they have code to handle it and some hand-coded tests, so they are cautiously optimistic that nothing bad will happen.

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
2328 ↗(On Diff #223716)

Would it be more readable this way?

if (!UseDwarf5) {
  Base = NewBase;
  BaseIsSet = true;
  Asm->OutStreamer->EmitIntValue(-1, Size);
  // etc
} else if (NewBase != Begin || P.second.size() > 1) {
  Base = NewBase;
  BaseIsSet = true;
  Asm->OutStreamer->AddComment(StringifyEnum(BaseAddressx));
  // etc
}

As there are only 2 lines in common. (My eye caught if (!UseDwarf5) and two lines later if (UseDwarf5) and did a double-take.)

dblaikie marked an inline comment as done.Oct 11 2019, 2:49 PM

LLDB seems to have support for base address selection in v4 debug_loc. It does not have support for v5 LLE_base_address(x) stuff, but the whole of v5 location list support is kind of wonky, which is also why I am looking at getting it to use the llvm version of the parser.

Yeah, that sort of summarizes GDB's support too.

As for llvm-dwarfdump, feel free to add new encodings there. My plan is to add support for all LLE encodings, but since I also need to figure out a way to refactor all of that stuff, it may take a while before I get to that. Having one or two new encodings appear in the mean time should only be a minor nuisance.

Had to take a few goes at this to see if there was a good mid-point of refactoring & I think I found one that coalesces some of the codepaths for verbose, non-verbose, and inline dumping - insofar as it seemed reasonable, I tried to make things more similar to debug_rnglists (in several cases at least making the code look similar, even though it's not shared yet).

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
2328 ↗(On Diff #223716)

Sure, looks good to me! I know this whole function's got several cases to think about & is a bit unwieldy.

This revision was not accepted when it landed; it landed in state Needs Revision.Oct 11 2019, 2:59 PM
This revision was automatically updated to reflect the committed changes.

I haven't fully debugged this, but it looks like this change caused a failure on the Windows LLDB bot. There was already another failure, so you probably didn't get an email:

http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/9772

The Buildbot is still failing because of this test: http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/9884/steps/test/logs/stdio

When running the test locally, the output is:

(lldb) file E:\_work\22\b\tools\lldb\lldb-test-build.noindex\lang\c\local_variables\TestLocalVariables.test_c_local_variables_dwarf\a.out
Current executable set to 'E:\_work\22\b\tools\lldb\lldb-test-build.noindex\lang\c\local_variables\TestLocalVariables.test_c_local_variables_dwarf\a.out' (x86_64).
(lldb) br s -f main.c -l 13
Breakpoint 1: where = a.out`foo + 9 at main.c:13:3, address = 0x0000000140001029
(lldb) r
Process 43368 launched: 'E:\_work\22\b\tools\lldb\lldb-test-build.noindex\lang\c\local_variables\TestLocalVariables.test_c_local_variables_dwarf\a.out' (x86_64)
Process 43368 stopped
* thread #1, stop reason = breakpoint 1.1
    frame #0: 0x00007ff6a0141029 a.out`foo(j=<unavailable>) at main.c:13:3
   10     unsigned i = j;
   11     bar(i);
   12     i = 10;
-> 13     bar(i); // Set break point at this line.
   14   }
   15
   16   int main(int argc, char** argv)
(lldb) thread list
Process 43368 stopped
* thread #1: tid = 0x9a78, 0x00007ff6a0141029 a.out`foo(j=<unavailable>) at main.c:13:3, stop reason = breakpoint 1.1
(lldb) breakpoint list -f
Current breakpoints:
1: file = 'main.c', line = 13, exact_match = 0, locations = 1, resolved = 1, hit count = 1
  1.1: where = a.out`foo + 9 at main.c:13:3, address = 0x00007ff6a0141029, resolved, hit count = 1

(lldb) frame variable i
(unsigned int) i = <variable not available>

I haven't fully debugged this, but it looks like this change caused a failure on the Windows LLDB bot. There was already another failure, so you probably didn't get an email:

http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/9772

BTW, there were some failures on linux after this patch too, caused by lldb's incomplete support for base address selection entries. r374769 was enough to fix the failures I was seeing on linux, but with the location parsing being as scattered as it is, it is possible I did not catch all cases.

It looks like this caused a very large increase in binary size (627M->686M). Is that expected/has anyone else observed this?

It looks like this caused a very large increase in binary size (627M->686M). Is that expected/has anyone else observed this?

Yes; we're observing a 2.8% increase for non-LTO, 8.3% increase for LTO in Linux kernel image size when CONFIG_DEBUG_INFO is set to emit debug sections (DWARF4). b/154242577

Our LTO builds were also slowed down 4.2% by this.

@dblaikie was there any follow up to this?

It looks like this caused a very large increase in binary size (627M->686M). Is that expected/has anyone else observed this?

Yes; we're observing a 2.8% increase for non-LTO, 8.3% increase for LTO in Linux kernel image size when CONFIG_DEBUG_INFO is set to emit debug sections (DWARF4). b/154242577

Our LTO builds were also slowed down 4.2% by this.

@dblaikie was there any follow up to this?

No, there's not been any follow-up to this, it seems to have stuck fairly well in general. Happen to have a profile comparison for a representative compilation? (& any sense of the error bars on your measurements?)

Is executable size with debug info included a significant constraint for the Linux kernel image? (what's the scenario for that?) - if it is, perhaps linker debug info compression and/or Split DWARF, etc, might be helpful.

It looks like this caused a very large increase in binary size (627M->686M). Is that expected/has anyone else observed this?

sorry I didn't see this (somehow ended up muting this thread) - which binary built with what flags? could you run bloaty or otherwise compare the objects before/after. I'd expect some growth in linked executable size of a non-split, optimized debug build, but that seems a bit more than I'd expect/seems to be observed elsewhere so far as I know.

It looks like this caused a very large increase in binary size (627M->686M). Is that expected/has anyone else observed this?

sorry I didn't see this (somehow ended up muting this thread) - which binary built with what flags? could you run bloaty or otherwise compare the objects before/after. I'd expect some growth in linked executable size of a non-split, optimized debug build, but that seems a bit more than I'd expect/seems to be observed elsewhere so far as I know.

FWIW, we're seeing about a 30-35% increase in the size of the .debug_loc sections across a variety of benchmarks (DWARF4, optimized build, non-split). Given that the majority of location lists seem to have 2 entries, and the base selection entry adds a third, this figure makes intuitive sense. The tradeoff is a reduced number of relocations against the .debug_loc sections, which does not seem to have all that much impact on link times.
Given that with DWARF v5 we can use the start_offset/end_offset LLE types with possibly small offset values, it seems that the tradeoff is more favorable towards base selection entries with DWARF5 than it is with DWARF4.

It looks like this caused a very large increase in binary size (627M->686M). Is that expected/has anyone else observed this?

sorry I didn't see this (somehow ended up muting this thread) - which binary built with what flags? could you run bloaty or otherwise compare the objects before/after. I'd expect some growth in linked executable size of a non-split, optimized debug build, but that seems a bit more than I'd expect/seems to be observed elsewhere so far as I know.

FWIW, we're seeing about a 30-35% increase in the size of the .debug_loc sections across a variety of benchmarks (DWARF4, optimized build, non-split). Given that the majority of location lists seem to have 2 entries, and the base selection entry adds a third, this figure makes intuitive sense. The tradeoff is a reduced number of relocations against the .debug_loc sections, which does not seem to have all that much impact on link times.
Given that with DWARF v5 we can use the start_offset/end_offset LLE types with possibly small offset values, it seems that the tradeoff is more favorable towards base selection entries with DWARF5 than it is with DWARF4.

Sorry I haven't given this more attention - but it is on my list (though I guess you've worked around it in some manner/downstream patch for now?) - but out of curiosity, does the change in 57d8acac64b87cb4286b00485fb2da7521fc091e help much? (I mean, it helps in general, so maybe you'd still want to disable the base address specifier use - even if it was offset by that change, because the change would still be a win in addition to disabling the base address specifiers)

It looks like this caused a very large increase in binary size (627M->686M). Is that expected/has anyone else observed this?

sorry I didn't see this (somehow ended up muting this thread) - which binary built with what flags? could you run bloaty or otherwise compare the objects before/after. I'd expect some growth in linked executable size of a non-split, optimized debug build, but that seems a bit more than I'd expect/seems to be observed elsewhere so far as I know.

FWIW, we're seeing about a 30-35% increase in the size of the .debug_loc sections across a variety of benchmarks (DWARF4, optimized build, non-split). Given that the majority of location lists seem to have 2 entries, and the base selection entry adds a third, this figure makes intuitive sense. The tradeoff is a reduced number of relocations against the .debug_loc sections, which does not seem to have all that much impact on link times.
Given that with DWARF v5 we can use the start_offset/end_offset LLE types with possibly small offset values, it seems that the tradeoff is more favorable towards base selection entries with DWARF5 than it is with DWARF4.

Sorry I haven't given this more attention - but it is on my list (though I guess you've worked around it in some manner/downstream patch for now?) - but out of curiosity, does the change in 57d8acac64b87cb4286b00485fb2da7521fc091e help much? (I mean, it helps in general, so maybe you'd still want to disable the base address specifier use - even if it was offset by that change, because the change would still be a win in addition to disabling the base address specifiers)

@probinson - do you have any state here? Whether this is an ongoing issue, whether the improvements to avoid unnecessary debug_loc have reduced the overhead sufficiently, etc?

@probinson - do you have any state here? Whether this is an ongoing issue, whether the improvements to avoid unnecessary debug_loc have reduced the overhead sufficiently, etc?

@Orlando mentioned he was collecting some size data that would be relevant here, he'll post it when he's done. Basically .debug_loc sizes at various points.

But naively, pre-v5, base-address entries can only make lists longer, and for short lists the size cost is big while the reduction in relocations is small. I'd have to agree with @wolfgangp that the space-time tradeoff probably is not favorable for small lists, and maybe there should be some threshold. I know that's more complicated on the emission side, but it's not a trivial thing in the final object. Orlando showed me a preliminary chart where .debug_loc went from ~30% of all debug info in LLVM 5.0 to ~50% in LLVM 10.0, in one of our benchmarks.

@probinson - do you have any state here? Whether this is an ongoing issue, whether the improvements to avoid unnecessary debug_loc have reduced the overhead sufficiently, etc?

@Orlando mentioned he was collecting some size data that would be relevant here, he'll post it when he's done. Basically .debug_loc sizes at various points.

Awesome!

But naively, pre-v5, base-address entries can only make lists longer, and for short lists the size cost is big while the reduction in relocations is small.

Yep, at least on ELF, the relocation is 3 times the size of the address itself (and uncompressible). So a debug_loc (or debug_ranges) list with only a single entry already favors base address selection for object file size even without compression (2 addresses + 2 relocations == 2 + 2 * 3 == 8, compared to 1 base address selection entry + 1 address + 1 relocation + 2 offsets == 7), but yes, that doubles the address-sized data in the linked binary.

I'd have to agree with @wolfgangp that the space-time tradeoff probably is not favorable for small lists,

My hope/wonder is whether the reduction in unnecessary small (single entry that has scope identical to the enclosing scope: ie: the variable doesn't need a location list - it should use a direct location instead) entries might help make the tradeoff less problematic.
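(For concreteness - a hand-written sketch, not compiler output - such a variable would carry a direct exprloc rather than a reference into debug_loc:

DW_TAG_variable
  DW_AT_name      ("i")
  DW_AT_location  (DW_OP_fbreg -8)    ; direct location: no debug_loc list, no extra addresses

rather than a single-entry list covering exactly the enclosing scope's PC range, which pays for two relocated addresses without adding information.)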

and maybe there should be some threshold. I know that's more complicated on the emission side, but it's not a trivial thing in the final object. Orlando showed me a preliminary chart where .debug_loc went from ~30% of all debug info in LLVM 5.0 to ~50% in LLVM 10.0, in one of our benchmarks.

Yeah - not sure exactly what basis to use to choose that threshold as it'll depend on how important object file size versus binary size is, whether you're using compression (whether only compressing DWARF in objects, executables, or both). But also making it a customizable number doesn't seem super helpful either. Open to ideas, for sure.

@Orlando mentioned he was collecting some size data that would be relevant here, he'll post it when he's done. Basically .debug_loc sizes at various points.

Hi!

I've built a benchmark suite of 85 programs with "-O2 -g" with our downstream branch targeting X86 emitting DWARF v4.

This table provides a summary of the data. It shows the mean for some size data normalized as a percentage of the llvm-3 results for each benchmark.

For reference:
Largest binary built with llvm-3: 6010 kB
Smallest binary built with llvm-3: 79 kB

+--------------------------------------------------------------------------------------------+
| Mean binary size for benchmarks normalized as a percentage of llvm-3 builds                |
+---------------------------------+------------+------------------+-----------------+--------+
| llvm version                    | .debug_loc | other debug info | everything else | Total  |
+---------------------------------+------------+------------------+-----------------+--------+
| llvm-3                          | 13.7       | 33.2             | 53.1            | 100    |
| llvm-4                          | 12.7       | 33.8             | 53.8            | 100.3  |
| llvm-5                          | 13.4       | 35.6             | 54.6            | 103.7  |
| llvm-7                          | 18.4       | 35.6             | 54.0            | 108.0  |
| llvm-8                          | 17.5       | 37.1             | 54.5            | 109.1  |
| llvm-9                          | 19.7       | 37.2             | 54.6            | 111.5  |
| llvm-10 before dblaikie commit  | 19.8       | 37.4             | 54.9            | 112.1  |
| llvm-10 with dblaikie commit    | 25.6       | 37.4             | 54.9            | 117.9  |
| llvm-10                         | 25.8       | 37.5             | 54.8            | 118.1  |
| llvm-master before my commits   | 26.2       | 37.4             | 54.8            | 118.4  |
| llvm-master with my commits     | 18.4       | 35.5             | 55.3            | 109.3  |
+---------------------------------+------------+------------------+-----------------+--------+

Here's an image of that data in graph form: M3.

The llvm-6 entry has been omitted because the non-debug-info size is a distracting outlier. The .debug_loc section size is ~16.5% for that one.

If you'd like to see the data in another format or see data for the benchmarks individually please let me know.

@Orlando mentioned he was collecting some size data that would be relevant here, he'll post it when he's done. Basically .debug_loc sizes at various points.

Hi!

I've built a benchmark suite of 85 programs with "-O2 -g" with our downstream branch targeting X86 emitting DWARF v4.

This table provides a summary of the data. It shows the mean for some size data normalized as a percentage of the llvm-3 results for each benchmark.

For reference:
Largest binary built with llvm-3: 6010 kB
Smallest binary built with llvm-3: 79 kB

+--------------------------------------------------------------------------------------------+
| Mean binary size for benchmarks normalized as a percentage of llvm-3 builds                |
+---------------------------------+------------+------------------+-----------------+--------+
| llvm version                    | .debug_loc | other debug info | everything else | Total  |
+---------------------------------+------------+------------------+-----------------+--------+
| llvm-3                          | 13.7       | 33.2             | 53.1            | 100    |
| llvm-4                          | 12.7       | 33.8             | 53.8            | 100.3  |
| llvm-5                          | 13.4       | 35.6             | 54.6            | 103.7  |
| llvm-7                          | 18.4       | 35.6             | 54.0            | 108.0  |
| llvm-8                          | 17.5       | 37.1             | 54.5            | 109.1  |
| llvm-9                          | 19.7       | 37.2             | 54.6            | 111.5  |
| llvm-10 before dblaikie commit  | 19.8       | 37.4             | 54.9            | 112.1  |
| llvm-10 with dblaikie commit    | 25.6       | 37.4             | 54.9            | 117.9  |
| llvm-10                         | 25.8       | 37.5             | 54.8            | 118.1  |
| llvm-master before my commits   | 26.2       | 37.4             | 54.8            | 118.4  |
| llvm-master with my commits     | 18.4       | 35.5             | 55.3            | 109.3  |
+---------------------------------+------------+------------------+-----------------+--------+

Here's an image of that data in graph form: M3.

The llvm-6 entry has been omitted because the non-debug-info size is a distracting outlier. The .debug_loc section size is ~16.5% for that one.

If you'd like to see the data in another format or see data for the benchmarks individually please let me know.

Thanks for the data!

(as an aside: do you/@probinson have much of a sense of how much binary size increase you'd trade for object size reductions in this space? If binary size is ultimately what you care most about, I'm guessing maybe debug_loc base address specifiers will never be a win for you & perhaps we should just group them under a flag, maybe refactor the debug_ranges base address specifier flag to cover both (the flag there was introduced due to a gold+gdb_index+32 bit binary bug, unfortunately, but lumping them together seems OK-ish to me))

When you say "llvm-master before/after your commits" - what version of llvm-master and what commits did you have to test with? (if you can/want to discuss them)

I'm rather surprised, if master was moderately recent, that it shows no benefit from https://reviews.llvm.org/rG57d8acac64b87cb4286b00485fb2da7521fc091e (perhaps, if it's not too much hassle, you could run a sample benchmark before/after that change?)

Thanks for the data!

Happy to help :)

(as an aside: do you/@probinson have much of a sense of how much binary size increase you'd trade for object size reductions in this space? If binary size is ultimately what you care most about, I'm guessing maybe debug_loc base address specifiers will never be a win for you & perhaps we should just group them under a flag, maybe refactor the debug_ranges base address specifier flag to cover both (the flag there was introduced due to a gold+gdb_index+32 bit binary bug, unfortunately, but lumping them together seems OK-ish to me))

I'm not sure, I'll defer to @probinson on that.

When you say "llvm-master before/after your commits" - what version of llvm-master and what commits did you have to test with? (if you can/want to discuss them)

I'm rather surprised, if master was moderately recent, that it shows no benefit from https://reviews.llvm.org/rG57d8acac64b87cb4286b00485fb2da7521fc091e (perhaps, if it's not too much hassle, you could run a sample benchmark before/after that change?)

Apologies for being unclear. Stats for "before my commits" are taken from a build at the first commit before D79571, and "with my commits" is with D86153 / rG57d8acac64b87cb4286b00485fb2da7521fc091e applied (and all the commits in between, including D82129).

Thanks for the data!

Happy to help :)

(as an aside: do you/@probinson have much of a sense of how much binary size increase you'd trade for object size reductions in this space? If binary size is ultimately what you care most about, I'm guessing maybe debug_loc base address specifiers will never be a win for you & perhaps we should just group them under a flag, maybe refactor the debug_ranges base address specifier flag to cover both (the flag there was introduced due to a gold+gdb_index+32 bit binary bug, unfortunately, but lumping them together seems OK-ish to me))

I'm not sure, I'll defer to @probinson on that.

When you say "llvm-master before/after your commits" - what version of llvm-master and what commits did you have to test with? (if you can/want to discuss them)

I'm rather surprised, if master was moderately recent, that it shows no benefit from https://reviews.llvm.org/rG57d8acac64b87cb4286b00485fb2da7521fc091e (perhaps, if it's not too much hassle, you could run a sample benchmark before/after that change?)

Apologies for being unclear. Stats for "before my commits" are taken from a build at the first commit before D79571, and "with my commits" is with D86153 / rG57d8acac64b87cb4286b00485fb2da7521fc091e applied (and all the commits in between, including D82129).

Ah, OK - so it sounds like we're back down below the size before I added debug_loc base address specifiers? That's good to hear!

If you're really interested in binary size, then, it might be worth an extra experiment to see what would happen if you disable base address specifiers - might still get you significantly below where we were before (now that the "don't use debug_loc so often" improvements have been made) - so there might still be some discussion about whether more selective use of base address selection entries would be good for you/others.

Ah, OK - so it sounds like we're back down below the size before I added debug_loc base address specifiers? That's good to hear!

Yeah looks like it.

If you're really interested in binary size, then, it might be worth an extra experiment to see what would happen if you disable base address specifiers - might still get you significantly below where we were before (now that the "don't use debug_loc so often" improvements have been made) - so there might still be some discussion about whether more selective use of base address selection entries would be good for you/others.

SGTM I will have a look. To disable base address specifiers here, is it enough to just pass in false for ShouldUseBaseAddress to emitRangeList on line DwarfDebug.cpp:2398?

(as an aside: do you/@probinson have much of a sense of how much binary size increase you'd trade for object size reductions in this space? If binary size is ultimately what you care most about, I'm guessing maybe debug_loc base address specifiers will never be a win for you & perhaps we should just group them under a flag, maybe refactor the debug_ranges base address specifier flag to cover both (the flag there was introduced due to a gold+gdb_index+32 bit binary bug, unfortunately, but lumping them together seems OK-ish to me))

I'm not sure, I'll defer to @probinson on that.

What we actually care about is turnaround time, which for debug info encompasses compile time, link time, debugger startup time, and the attendant I/O latencies. Note that console download time is *not* a factor, as our downloader knows to omit the debug info.

It's not exactly raw size that matters, but we ought to care how efficiently the information is encoded in the file data. So, with respect to location/range lists, multiplying the number of entries without reducing the number of relocations is bad all around. The compiler has to produce more data; the linker has to copy more data, without any compensating reduction in relocation processing time; the debugger has to read more data to get the same information content.

Do we know how often a location/range list has a single non-base-address entry? Clearly we can avoid a list altogether if the location/range aligns with the containing scope (the ValidThroughout case); single-entry lists would come up where that range is a subset of the containing scope, but still only needs one entry. In those cases, there's no object-file or final-binary benefit to having a base-address entry followed by a single list entry. And that optimization really wouldn't have to be under a flag, because it would always be a win, even in v5.
Apologies for not going to look myself...

(as an aside: do you/@probinson have much of a sense of how much binary size increase you'd trade for object size reductions in this space? If binary size is ultimately what you care most about, I'm guessing maybe debug_loc base address specifiers will never be a win for you & perhaps we should just group them under a flag, maybe refactor the debug_ranges base address specifier flag to cover both (the flag there was introduced due to a gold+gdb_index+32 bit binary bug, unfortunately, but lumping them together seems OK-ish to me))

I'm not sure, I'll defer to @probinson on that.

What we actually care about is turnaround time, which for debug info encompasses compile time, link time, debugger startup time, and the attendant I/O latencies. Note that console download time is *not* a factor, as our downloader knows to omit the debug info.

It's not exactly raw size that matters, but we ought to care how efficiently the information is encoded in the file data. So, with respect to location lists, multiplying the number of entries without reducing the number of relocations is bad all around.

Agreed, though I don't think that's what we're doing here.

A single entry location list in DWARFv4 looks like this:

relocatable start address, relocatable end address

And at least on ELF x86_64, relocations are 3 times the size of an address, so the total size of that entry in the object file is (1 + 3) * 2 == 8

Whereas with a base address selection entry:

0xffffffffffffffff, relocatable address
start offset, end offset

So that's 1 + (1 + 3) + 1 + 1 == 7

So using a base address selection entry is a minor win in uncompressed object size - more significant win if you compress your .debug_info in object files (because relocations aren't compressed, so in the first case, only 1/4 of object file bytes are compressed, in the second case 4/7 bytes are compressed).
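(As a sanity check on that arithmetic, a minimal C++ sketch counting in address-sized units, assuming the 3-units-per-relocation figure for ELF x86_64 used above:

// Object-file cost of one v4 location list entry, in address-sized units.
constexpr unsigned RelocCost = 3;                          // a RELA relocation is ~3x an address
constexpr unsigned PlainEntry = (1 + RelocCost) * 2;       // relocated start + relocated end == 8
constexpr unsigned WithBase = 1 + (1 + RelocCost) + 1 + 1; // marker + relocated base + 2 offsets == 7
static_assert(PlainEntry == 8 && WithBase == 7, "matches the figures above");
)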

The compiler has to produce more data; the linker has to copy more data, without any compensating reduction in relocation processing time; the debugger has to read more data to get the same information content.

Do we know how often a location/range list has a single non-base-address entry?

aside: currently a range list never has a single entry, because we use low/high_pc then. Though per my thread from early this year, I think there might be some value in using range lists even in these single entry cases for the same reason as here with location lists - again, more so in DWARFv5 (see below) than DWARFv4. (as a workaround for the lack of addrx+offset encoding which I'd use for low_pc otherwise and gain the same benefits - essentially allowing a "base address + offset pair" form of encoding for low/high pc, whereas it's currently more like "start address+length" encoding (akin to RLE_startx_length))

Clearly we can avoid a list altogether if the location/range aligns with the containing scope (the ValidThroughout case); single-entry lists would come up where that range is a subset of the containing scope, but still only needs one entry. In those cases, there's no object-file or final-binary benefit to having a base-address entry followed by a single list entry. And that optimization really wouldn't have to be under a flag, because it would always be a win, even in v5.
Apologies for not going to look myself...

Not sure I understand - base address selection entries do have some benefit even for single entry lists. If they're explicitly worse across the board, yeah, I'd be all for not enabling this in single entry lists - but my understanding at the moment, and when I implemented it, was that it was still a (variable, depending on whether compressed debug info is used) win for object size and relocation count.

In v5 it's even more valuable, because you can share base addresses from other places. Imagine a location list for a scope inside a function. If we don't use a base address selection entry, it might be:

[DW_LLE_startx_length]:  uleb, uleb: location description
[DW_LLE_end_of_list  ]
...
debug_addr:
relocatable address

But with a base address selection entry, we can strategically choose an address that's already in the address pool:

[DW_LLE_base_addressx]: uleb (indexing an entry already in debug_addr - no new pool entry, no new relocation)
[DW_LLE_offset_pair  ]: uleb, uleb: location description
[DW_LLE_end_of_list  ]

This removes the address from the address pool entirely, relying instead on an existing one - at the cost of larger offset values, which in some cases would mean longer encoded ulebs - but given that an entry in the address pool is 4 or 8 bytes plus 3 times that in relocation, it's unlikely the ulebs would grow long enough to favor the address pool entry instead.
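To put a number on "longer encoded ulebs", here's a small C++ sketch of ULEB128 size (7 payload bits per byte); an offset doesn't need a 5th uleb byte until it reaches 2^28, and even then it's far cheaper than a new 4- or 8-byte address pool entry plus its relocation:

#include <cstdint>

// Bytes needed to ULEB128-encode V; each byte carries 7 bits of payload.
unsigned ulebSize(uint64_t V) {
  unsigned Bytes = 0;
  do {
    V >>= 7;
    ++Bytes;
  } while (V);
  return Bytes;
}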

Ah, OK - so it sounds like we're back down below the size before I added debug_loc base address specifiers? That's good to hear!

Yeah looks like it.

Awesome, thanks for confirming/looking into it/etc!

If you're really interested in binary size, then, it might be worth an extra experiment to see what would happen if you disable base address specifiers - might still get you significantly below where we were before (now that the "don't use debug_loc so often" improvements have been made) - so there might still be some discussion about whether more selective use of base address selection entries would be good for you/others.

SGTM I will have a look. To disable base address specifiers here, is it enough to just pass in false for ShouldUseBaseAddress to emitRangeList on line DwarfDebug.cpp:2398?

Yep, that ought to do it!
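For anyone repeating the experiment: it amounts to hardcoding that argument at the call site - a rough sketch, with the surrounding arguments elided and the exact signature possibly differing between revisions:

// DwarfDebug.cpp, near line 2398 at the time of this discussion:
emitRangeList(/* ...existing arguments..., */ /*ShouldUseBaseAddress=*/false);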

I have here a copy of the table I shared earlier, with a new row "ShouldUseBaseAddress=false". The stats for this row are taken at 57d8acac64b (D86153) with the changes mentioned in my previous comment (disabling base address specifiers).

+--------------------------------------------------------------------------------------------+
| Mean binary size for benchmarks normalized as a percentage of llvm-3 builds                |
+---------------------------------+------------+------------------+-----------------+--------+
| llvm version                    | .debug_loc | other debug info | everything else | Total  |
+---------------------------------+------------+------------------+-----------------+--------+
| llvm-3                          | 13.7       | 33.2             | 53.1            | 100    |
| llvm-4                          | 12.7       | 33.8             | 53.8            | 100.3  |
| llvm-5                          | 13.4       | 35.6             | 54.6            | 103.7  |
| llvm-7                          | 18.4       | 35.6             | 54.0            | 108.0  |
| llvm-8                          | 17.5       | 37.1             | 54.5            | 109.1  |
| llvm-9                          | 19.7       | 37.2             | 54.6            | 111.5  |
| llvm-10 before dblaikie commit  | 19.8       | 37.4             | 54.9            | 112.1  |
| llvm-10 with dblaikie commit    | 25.6       | 37.4             | 54.9            | 117.9  |
| llvm-10                         | 25.8       | 37.5             | 54.8            | 118.1  |
| llvm-master before my commits   | 26.2       | 37.4             | 54.8            | 118.4  |
| llvm-master with my commits     | 18.4       | 35.5             | 55.3            | 109.3  |
| ShouldUseBaseAddress=false      | 14.9       | 35.5             | 55.3            | 105.7  |
+---------------------------------+------------+------------------+-----------------+--------+

When disabling the base address specifier - for these benchmarks (-O2 -gdwarf-4) - there is a further 3.3% reduction in total file size, with .debug_loc 19% smaller. This brings the binary sizes nearly in line with llvm-5.

I'm PTO tomorrow but I'd be happy to continue looking into this when I'm back if that would be useful.

Out of curiosity I also did a clang-3.4 build using master @ 485e6db8729 (3rd September) with "-O2 -gdwarf-4". It is smaller when disabling base address specifiers (and emitting DWARFv4) too:
With base addresses (default): Total File Size: 527591064
Without base addresses: Total File Size: 513946184 (-2.59 %)

Thanks for all the data @Orlando - do you happen to have means to measure total object size too? Be useful to compare the binary size increase with the object size decrease.

The following builds are with clang @ 485e6db8729 (3rd September) targeting x86.

+---------------------------------------------------------------+
| File size (bytes) of clang-3.4 built with -O2 -gdwarf-4       |
*===============================================================*
|                    | base addr    | no base addr | % change   |
+--------------------+--------------+--------------+------------+
| Accumulated object | 1874653208   | 1924003152   | +2.63      |
| file sizes         |              |              |            |
+--------------------+--------------+--------------+------------+
| Elf size           | 527591064    | 513946184    | -2.59      |
+--------------------+--------------+--------------+------------+

+---------------------------------------------------------------+
| File size (bytes) of clang-3.4 built with -O2 -gdwarf-5       |
*===============================================================*
|                    | base addr    | no base addr | % change   |
+--------------------+--------------+--------------+------------+
| Accumulated object | 1501490184   | 1515647496   | +0.94      |
| file sizes         |              |              |            |
+--------------------+--------------+--------------+------------+
| Elf size           | 478841560    | 478725248    | -0.024     |
+--------------------+--------------+--------------+------------+

The build time difference between all 4 configurations appears to be negligible. For both DWARFv4 and v5, when disabling base address entries, the object files are larger and elfs smaller. The delta is more pronounced in DWARFv4, and the elf size reduction is very small for DWARFv5.

The following results are builds of a private benchmark suite (mentioned in previous comments) with downstream clang @ 57d8acac64b (27th August) targeting x86.

+---------------------------------------------------------------+
| File size (bytes) of benchmark suite built with -O2 -gdwarf-4 |
*===============================================================*
|                    | base addr    | no base addr | % change   |
+--------------------+--------------+--------------+------------+
| Accumulated object | 51828696     | 57886752     | +11.69     |
| file sizes         |              |              |            |
+--------------------+--------------+--------------+------------+
| Accumulated elf    | 43012748     | 41199724     | -4.22      |
| file sizes         |              |              |            |
+--------------------+--------------+--------------+------------+

The results are more extreme but follow the same pattern as the clang-3.4 builds. I don't have the build times or DWARFv5 builds to hand for these benchmarks.

Just to clarify: when I say "elf" here I'm talking about the linked executable file, and "object files" are the pre-link .o files.

Just to clarify: when I say "elf" here I'm talking about the linked executable file, and "object files" are the pre-link .o files.

Thanks - makes sense.

I guess all of these measurements were done without Split DWARF (shouldn't make things better/worse overall (across .o and .dwo), really, compared to DWARFv5 non-split - but it means that when looking only at .o files, the difference from avoiding more debug_addr entries is more significant, because there are fewer remaining .o debug bytes to begin with) and without compression (-gz) enabled?

@probinson - how're these tradeoffs all sounding to you, and did you have further thoughts on what sounds like somewhat of a source of confusion in the previous posts on this thread (the question of whether a base address selection entry for a one-entry list could be beneficial, which I believe it is - I showed the numbers and my reasoning there)?

I guess all of these measurements were done without Split DWARF (shouldn't make things better/worse overall (across .o and .dwo), really, compared to DWARFv5 non-split - but it means that when looking only at .o files, the difference from avoiding more debug_addr entries is more significant, because there are fewer remaining .o debug bytes to begin with) and without compression (-gz) enabled?

That's right; no split DWARF and no compression for any of those builds.

TL;DR: It's all good.

As I worked through the calculations again myself, I realized I had forgotten that both the start and end address in a v4 entry were relocated. (The spec says neither of them are; they are supposed to be relative to the CU base address. But nobody actually does it that way.) So in v4, using the base-address entry is a slight size win for object files, and reduces the total number of relocations. It has a cost in loadfile .debug_loc size, which is what people were noticing above.

And in v5, the combo of base_addressx/offset_pair is a win over startx_length because the former can reuse the .debug_addr entry for the function entry point. It shows up as an increase in .debug_loclists but there's a compensating decrease in .debug_addr as long as we actually do reuse a .debug_addr entry. Our debugger folks will whine about the additional indirection, but that's nothing new; v5 just has a lot more of that.

Orlando's data suggests that the overall build time difference is minimal. We have no data about debugger load times, but I'd hope that location lists wouldn't be on the critical path there.

My conclusion is: We have a better understanding of where the size difference is coming from; naively it doesn't cause turnaround-time problems. If we get complaints from licensees, we can revisit this, but I haven't been noticing size complaints in the last couple of years.

TL;DR: It's all good.

As I worked through the calculations again myself, I realized I had forgotten that both the start and end address in a v4 entry were relocated. (The spec says neither of them are; they are supposed to be relative to the CU base address. But nobody actually does it that way.)

Oh, they are relative to the CU base address, if there is one - but very few C++ CUs have a relocated base address - because they have code in multiple sections, they use DW_AT_ranges. If you make a simple CU with one function, then the CU gets a traditional low/high_pc - and, say, DW_AT_ranges on a scope inside that one function will be relative to the CU's base address (low_pc) and no relocations would be used (& even with base address selection entries in use - we won't emit a base address selection entry)

eg:

$ cat loc.cpp
void f1();
void f2() {
  int i = 7;
  f1();
  i = 3;
  f1();
}
void f3() {
}
$ clang++-tot loc.cpp -gdwarf-5 -c -O3 && llvm-dwarfdump-tot -v loc.o -debug-loclists
loc.o:  file format elf64-x86-64

.debug_loclists contents:
0x00000000: locations list header: length = 0x0000001b, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00, offset_entry_count = 0x00000001
offsets: [
0x00000004 => 0x00000010
]
0x00000010: 
            DW_LLE_offset_pair     (0x0000000000000001, 0x0000000000000006): DW_OP_consts +7, DW_OP_stack_value
            DW_LLE_offset_pair     (0x0000000000000006, 0x000000000000000c): DW_OP_consts +3, DW_OP_stack_value
            DW_LLE_end_of_list     ()

Compare that with -ffunction-sections, which puts f2 and f3 in separate .text sections, forcing the use of DW_AT_ranges at the CU; LLVM then produces a constant zero value DW_AT_low_pc to be clear what the "base address" is (which means basically no base address - everything has to use absolute addressing/explicit base addresses in the range/loc lists):

$ clang++-tot -ffunction-sections loc.cpp -gdwarf-5 -c -O3 && llvm-dwarfdump-tot -v loc.o -debug-loclists
loc.o:  file format elf64-x86-64

.debug_loclists contents:
0x00000000: locations list header: length = 0x0000001d, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00, offset_entry_count = 0x00000001
offsets: [
0x00000004 => 0x00000010
]
0x00000010: 
            DW_LLE_base_addressx   (0x0000000000000000)
            DW_LLE_offset_pair     (0x0000000000000001, 0x0000000000000006): DW_OP_consts +7, DW_OP_stack_value
            DW_LLE_offset_pair     (0x0000000000000006, 0x000000000000000c): DW_OP_consts +3, DW_OP_stack_value
            DW_LLE_end_of_list     ()

So in v4, using the base-address entry is a slight size win for object files, and reduces the total number of relocations. It has a cost in loadfile .debug_loc size, which is what people were noticing above.

And in v5, the combo of base_addressx/offset_pair is a win over startx_length because the former can reuse the .debug_addr entry for the function entry point. It shows up as an increase in .debug_loclists but there's a compensating decrease in .debug_addr as long as we actually do reuse a .debug_addr entry. Our debugger folks will whine about the additional indirection, but that's nothing new; v5 just has a lot more of that.

Orlando's data suggests that the overall build time difference is minimal. We have no data about debugger load times, but I'd hope that location lists wouldn't be on the critical path there.

My conclusion is: We have a better understanding of where the size difference is coming from; naively it doesn't cause turnaround-time problems. If we get complaints from licensees, we can revisit this, but I haven't been noticing size complaints in the last couple of years.

Awesome awesome - really appreciate you talking through all this!