This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Merge partially matching chains of inlined locations
ClosedPublic

Authored by dstenb on Jan 25 2023, 8:26 AM.

Details

Summary

For example, if you have a chain of inlined functions like this:

 1 #include <stdlib.h>
 2 int g1 = 4, g2 = 6;
 3
 4 static inline void bar(int q) {
 5   if (q > 5)
 6     abort();
 7 }
 8
 9 static inline void foo(int q) {
10   bar(q);
11 }
12
13 int main() {
14   foo(g1);
15   foo(g2);
16   return 0;
17 }

with optimizations you can end up with a single abort call for the two
inlined instances of foo(). When merging the locations of those inlined
instances, you would previously end up with a 0:0 location in main().
Dropping the inlined-at chain from the abort call's location can make
troubleshooting difficult in some cases.

This patch changes DILocation::getMergedLocation() to try to handle such
cases. The function is rewritten to first find a common starting point
for the two locations (same subprogram and inlined-at location), and to
then traverse the inlined-at chains in reverse, looking for matches in
each subprogram. For each subprogram, the merge function finds the
nearest common scope of the two locations and keeps the line and column
if they match (setting them to 0 otherwise).
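To make the above concrete, here is a simplified sketch of that strategy (not the code in DebugInfoMetadata.cpp); it merges at subprogram granularity only, whereas the actual implementation additionally finds the nearest common lexical scope within each subprogram:

// Simplified sketch of the merging strategy described above; not the patch's
// implementation. It only merges whole subprogram frames.
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/DebugInfoMetadata.h"

using namespace llvm;

static DILocation *mergeInlinedChains(DILocation *A, DILocation *B) {
  // Gather the inlined-at chains, innermost frame first.
  SmallVector<DILocation *, 8> AChain, BChain;
  for (DILocation *L = A; L; L = L->getInlinedAt())
    AChain.push_back(L);
  for (DILocation *L = B; L; L = L->getInlinedAt())
    BChain.push_back(L);

  // Walk both chains from the outermost frame inwards, keeping frames as
  // long as they agree on the subprogram; the first mismatch ends the merge.
  DILocation *Merged = nullptr;
  auto AIt = AChain.rbegin(), BIt = BChain.rbegin();
  for (; AIt != AChain.rend() && BIt != BChain.rend(); ++AIt, ++BIt) {
    DISubprogram *SP = (*AIt)->getScope()->getSubprogram();
    if (SP != (*BIt)->getScope()->getSubprogram())
      break;
    // Keep line/column only if both locations agree; line 0 means "unknown",
    // so the column is dropped with it (the two foo() calls in main() above
    // differ in line, which is what produces the 0:0 frame).
    bool SameLine = (*AIt)->getLine() == (*BIt)->getLine();
    bool SameCol = (*AIt)->getColumn() == (*BIt)->getColumn();
    unsigned Line = SameLine ? (*AIt)->getLine() : 0;
    unsigned Col = SameLine && SameCol ? (*AIt)->getColumn() : 0;
    Merged = DILocation::get(A->getContext(), Line, Col, SP, Merged);
  }
  return Merged;
}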

In the example above, the abort call will get a location in bar() at 6:5,
inlined in foo() at 10:3, inlined in main() at 0:0 (since the two foo()
call sites are on different lines, but in the same scope).

I have not seen anything in the DWARF standard that would disallow
inlining a non-zero location at 0:0 in the inlined-at function, and both
LLDB and GDB seem to accept these locations (with D142552 needed for
LLDB to handle cases where the file, line and column number are all 0).
One incompatibility with GDB is that it seems to ignore line 0 locations
in some cases, but I am not aware of this patch producing any specific
issue related to that.

With x86-64 LLDB (trunk) you previously got:

frame #0: 0x00007ffff7a44930 libc.so.6`abort
frame #1: 0x00005555555546ec a.out`main at merge.c:0

and will now get:

frame #0: 0x[...] libc.so.6`abort
frame #1: 0x[...] a.out`main [inlined] bar(q=<unavailable>) at merge.c:6:5
frame #2: 0x[...] a.out`main [inlined] foo(q=<unavailable>) at merge.c:10:3
frame #3: 0x[...] a.out`main at merge.c:0

and with x86-64 GDB (11.1) you will get:

(gdb) bt
#0  0x00007ffff7a44930 in abort () from /lib64/libc.so.6
#1  0x00005555555546ec in bar (q=<optimized out>) at merge.c:6
#2  foo (q=<optimized out>) at merge.c:10
#3  0x00005555555546ec in main ()

Diff Detail

Event Timeline

dstenb created this revision. Jan 25 2023, 8:26 AM
Herald added a project: Restricted Project. · View Herald Transcript · Jan 25 2023, 8:26 AM
dstenb requested review of this revision. Jan 25 2023, 8:26 AM
Herald added a project: Restricted Project. · View Herald Transcript · Jan 25 2023, 8:26 AM

As far as I can tell it is valid DWARF to inline at line 0.

I guess that one of the main questions here is whether debuggers accept this. From what I have tested with LLDB trunk and GDB (9 and 11), they seem to handle it, with the caveat regarding the LLDB patch D142552 and that GDB seems to ignore line 0 locations in some cases. I do not know how the Sony debugger, dbx, or any other debugger handles it.

Oh, neat idea - but I wonder what it does to the DIE tree, rather than just the line table? Do we end up with 3 inlined_subroutines? (one for each of the original calls, if they have any unique instructions after inlining, then one for this synthesized call? I guess that's not strictly worse - the instructions were from some sort of inlining... )

Oh, neat idea - but I wonder what it does to the DIE tree, rather than just the line table? Do we end up with 3 inlined_subroutines? (one for each of the original calls, if they have any unique instructions after inlining, then one for this synthesized call? I guess that's not strictly worse - the instructions were from some sort of inlining... )

Yes, we will end up with 3 inlined_subroutines in the example above (as both original calls have unique instructions):

$ grep -A7 DW_TAG_inlined_subroutine after.txt
0x00000066:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin	(0x0000004a "foo")
                  DW_AT_low_pc	(0x00000000000006d1)
                  DW_AT_high_pc	(0x00000000000006da)
                  DW_AT_call_file	("/upstream/llvm-project/a.c")
                  DW_AT_call_line	(14)
                  DW_AT_call_column	(0x03)

0x00000073:       DW_TAG_inlined_subroutine
                    DW_AT_abstract_origin	(0x0000003d "bar")
                    DW_AT_low_pc	(0x00000000000006d1)
                    DW_AT_high_pc	(0x00000000000006da)
                    DW_AT_call_file	("/upstream/llvm-project/a.c")
                    DW_AT_call_line	(10)
                    DW_AT_call_column	(0x03)

--
0x00000081:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin	(0x0000004a "foo")
                  DW_AT_low_pc	(0x00000000000006da)
                  DW_AT_high_pc	(0x00000000000006e3)
                  DW_AT_call_file	("/upstream/llvm-project/a.c")
                  DW_AT_call_line	(15)
                  DW_AT_call_column	(0x03)

0x0000008e:       DW_TAG_inlined_subroutine
                    DW_AT_abstract_origin	(0x0000003d "bar")
                    DW_AT_low_pc	(0x00000000000006da)
                    DW_AT_high_pc	(0x00000000000006e3)
                    DW_AT_call_file	("/upstream/llvm-project/a.c")
                    DW_AT_call_line	(10)
                    DW_AT_call_column	(0x03)

--
0x0000009c:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin	(0x0000004a "foo")
                  DW_AT_low_pc	(0x00000000000006e7)
                  DW_AT_high_pc	(0x00000000000006ec)
                  DW_AT_call_file	("/upstream/llvm-project/a.c")
                  DW_AT_call_line	(0)

0x000000a8:       DW_TAG_inlined_subroutine
                    DW_AT_abstract_origin	(0x0000003d "bar")
                    DW_AT_low_pc	(0x00000000000006e7)
                    DW_AT_high_pc	(0x00000000000006ec)
                    DW_AT_call_file	("/upstream/llvm-project/a.c")
                    DW_AT_call_line	(10)
                    DW_AT_call_column	(0x03)
dstenb updated this revision to Diff 492719. Edited Jan 27 2023, 5:58 AM

When building a Clang binary with this patch to see how often the new case occurs, I encountered a bug where the BRIt iterator was invalidated by a further insertion into BLocs. I have fixed that and added a unit test.
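For reference, that class of bug looks roughly like the following illustrative reconstruction (the names BLocs/BRIt come from the patch, but this is not its code): an iterator taken into a SmallVector dangles once a later push_back causes the vector to reallocate.

// Illustrative reconstruction of the iterator-invalidation pitfall; not the
// actual code from the patch.
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/DebugInfoMetadata.h"
#include <cstddef>

using namespace llvm;

void iteratorPitfall(SmallVectorImpl<DILocation *> &BLocs, DILocation *Extra) {
  [[maybe_unused]] auto BRIt = BLocs.rbegin(); // into BLocs' current storage
  BLocs.push_back(Extra); // may reallocate; any later use of BRIt is then UB
  // *BRIt;               // e.g. this could read freed memory

  // Safer pattern: remember an index, insert, then rebuild the iterator.
  std::size_t Idx = BLocs.size() - 1;
  BLocs.push_back(Extra);
  auto Fresh = BLocs.rbegin() + (BLocs.size() - Idx - 1);
  (void)Fresh; // Fresh points at element Idx in the grown vector
}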

Oh, neat idea - but I wonder what it does to the DIE tree, rather than just the line table? Do we end up with 3 inlined_subroutines? (one for each of the original calls, if they have any unique instructions after inlining, then one for this synthesized call? I guess that's not strictly worse - the instructions were from some sort of inlining... )

Yes, we will end up with 3 inlined_subroutines in the example above (as both original calls have unique instructions):

Fair enough - makes sense!

How's the performance?

(@aprantl @JDevlieghere - this seems pretty reasonable/good to me in terms of direction (Haven't looked in detail at the code, but at a glance it seems plausible, if perf is OK) - how about you folks?)

llvm/lib/IR/DebugInfoMetadata.cpp
194

Probably add an explicit return type to the lambda, rather than a cast on the return expression here.
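To illustrate what the suggestion amounts to (the lambda below is hypothetical; the one from the diff is not reproduced in this archive):

#include "llvm/IR/DebugInfoMetadata.h"

using namespace llvm;

// Explicit return type on the lambda; the return expression converts
// implicitly to the declared type.
auto getScopeNode = [](const DILocation *L) -> Metadata * {
  return L->getScope();
};

// ...instead of deducing the return type from a cast on the return expression.
auto getScopeNodeWithCast = [](const DILocation *L) {
  return static_cast<Metadata *>(L->getScope());
};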

jmorse added a subscriber: jmorse. Jan 27 2023, 10:34 AM

I've run out of time to read the code this week, but it sounds like a good approach to me, I'll give my debugger colleagues a prod about it too.

As far as I can tell it is valid DWARF to inline at line 0.

I guess that one of the main questions here is whether debuggers accept this. From what I have tested with LLDB trunk and GDB (9 and 11), they seem to handle it, with the caveat regarding the LLDB patch D142552 and that GDB seems to ignore line 0 locations in some cases. I do not know how the Sony debugger, dbx, or any other debugger handles it.

To speak for LLDB here: I support emitting an inlined-at: line 0. If LLDB can't deal with it (though I don't see why) we can fix LLDB.

dstenb added a comment. Edited Feb 3 2023, 6:53 AM

Oh, neat idea - but I wonder what it does to the DIE tree, rather than just the line table? Do we end up with 3 inlined_subroutines? (one for each of the original calls, if they have any unique instructions after inlining, then one for this synthesized call? I guess that's not strictly worse - the instructions were from some sort of inlining... )

Yes, we will end up with 3 inlined_subroutines in the example above (as both original calls have unique instructions):

How's the performance?

Sorry, I have been blocked by other work. I have now measured compilation speed for an opt 8.0 binary with the RelWithDebInfo build type, using patched and unpatched versions of trunk (both built as Release).

This was done on a shared machine, so the load varied a bit over time, but I've built opt 40 times with each binary and these are the user time results:

# With patch
[67.32661666666667, 67.33988333333333, 68.00153333333333, 69.95165, 70.0251, 70.08698333333332, 70.10561666666666, 70.15785000000001, 70.1597, 70.16915, 70.1803, 70.22085, 70.2247, 70.30366666666666, 70.30761666666666, 70.31343333333334, 70.31875, 70.43641666666667, 70.5036, 70.50515, 70.51246666666667, 70.53808333333333, 70.57755, 70.60194999999999, 70.64415000000001, 70.84625, 71.6538, 73.79506666666667, 73.89473333333333, 74.56911666666666, 76.15891666666667, 76.41226666666667, 79.76478333333333, 81.38295, 81.68751666666667, 86.29401666666666, 87.57535, 89.4209, 95.79031666666667, 109.24031666666667]
mean: 74.44997666666667
median: 70.50880833333333
std: 8.485396943096646

# Without patch
[66.41356666666667, 67.65480000000001, 70.14276666666667, 70.14448333333334, 70.18646666666667, 70.19085, 70.2014, 70.21263333333333, 70.26146666666666, 70.26425, 70.26745, 70.28258333333333, 70.28881666666666, 70.36748333333333, 70.41218333333335, 70.44623333333334, 70.47338333333333, 70.48428333333334, 70.51248333333334, 70.51689999999999, 70.54706666666667, 70.55458333333333, 70.55946666666667, 70.57926666666667, 70.58081666666666, 70.70003333333334, 70.99738333333333, 71.0452, 71.10905, 72.0509, 77.86593333333333, 79.43435000000001, 79.80055, 84.38, 88.10063333333333, 97.71156666666667, 98.03155, 101.29769999999999, 106.61161666666666, 115.31475]
mean: 75.92492250000001
median: 70.53198333333333
std: 11.548023516184701

So, it seems that this patch is not affecting the compile time noticeably, at least for this normal case.

dstenb added a comment. Feb 3 2023, 7:00 AM

For the RelWithDebInfo opt 8.0 binaries that were mentioned in the previous comment, the number of inlined subroutine DIEs increased by ~2.8%:

$ llvm-dwarfdump build-without/bin/opt | grep -E ':\s+DW_TAG_inlined_subroutine' | wc -l
1726835
$ llvm-dwarfdump build-with/bin/opt | grep -E ':\s+DW_TAG_inlined_subroutine' | wc -l
1775581

and this is the output from llvm-dwarfdump --statistics:

--- stats-without.txt	2023-02-03 14:53:06.328707620 +0100
+++ stats-with.txt	2023-02-03 14:51:35.704846099 +0100
@@ -3 +3 @@
-  "file": "build-without/bin/opt",
+  "file": "build-with/bin/opt",
@@ -5,4 +5,4 @@
-  "#functions": 175812,
-  "#functions with location": 175360,
-  "#inlined functions": 1726835,
-  "#inlined functions with abstract origins": 1726835,
+  "#functions": 177591,
+  "#functions with location": 177139,
+  "#inlined functions": 1775581,
+  "#inlined functions with abstract origins": 1775581,
@@ -10 +10 @@
-  "#source variables": 4042638,
+  "#source variables": 4133726,
@@ -12 +12 @@
-  "#call site entries": 1726835,
+  "#call site entries": 1775581,
@@ -15 +15 @@
-  "sum_all_variables(#bytes in parent scope)": 325501149,
+  "sum_all_variables(#bytes in parent scope)": 325501167,
@@ -22 +22 @@
-  "sum_all_local_vars(#bytes in parent scope)": 165535273,
+  "sum_all_local_vars(#bytes in parent scope)": 165535291,
@@ -26 +26 @@
-  "#bytes within inlined functions": 17090162,
+  "#bytes within inlined functions": 17532650,
@@ -35,11 +35,11 @@
-  "#bytes in .debug_info": 150806439,
-  "#bytes in .debug_abbrev": 2926872,
-  "#bytes in .debug_line": 19552340,
-  "#bytes in .debug_str": 55235762,
-  "#bytes in .debug_addr": 9078944,
-  "#bytes in .debug_loclists": 26593774,
-  "#bytes in .debug_rnglists": 7188993,
-  "#bytes in .debug_str_offsets": 24633440,
-  "#bytes in .debug_line_str": 113086,
-  "#variables processed by location statistics": 2579495,
-  "#variables with 0% of parent scope covered by DW_AT_location": 25017,
+  "#bytes in .debug_info": 151752899,
+  "#bytes in .debug_abbrev": 2959240,
+  "#bytes in .debug_line": 19702676,
+  "#bytes in .debug_str": 55419330,
+  "#bytes in .debug_addr": 9205856,
+  "#bytes in .debug_loclists": 26594317,
+  "#bytes in .debug_rnglists": 7304436,
+  "#bytes in .debug_str_offsets": 24650788,
+  "#bytes in .debug_line_str": 113071,
+  "#variables processed by location statistics": 2581225,
+  "#variables with 0% of parent scope covered by DW_AT_location": 26747,
@@ -57 +57 @@
-  "#variables - entry values with 0% of parent scope covered by DW_AT_location": 18853,
+  "#variables - entry values with 0% of parent scope covered by DW_AT_location": 18849,
@@ -69,2 +69,2 @@
-  "#params processed by location statistics": 2219749,
-  "#params with 0% of parent scope covered by DW_AT_location": 10094,
+  "#params processed by location statistics": 2221618,
+  "#params with 0% of parent scope covered by DW_AT_location": 11963,
@@ -82 +82 @@
-  "#params - entry values with 0% of parent scope covered by DW_AT_location": 4540,
+  "#params - entry values with 0% of parent scope covered by DW_AT_location": 4539,
@@ -94,2 +94,2 @@
-  "#local vars processed by location statistics": 359746,
-  "#local vars with 0% of parent scope covered by DW_AT_location": 14923,
+  "#local vars processed by location statistics": 359607,
+  "#local vars with 0% of parent scope covered by DW_AT_location": 14784,
@@ -107 +107 @@
-  "#local vars - entry values with 0% of parent scope covered by DW_AT_location": 14313,
+  "#local vars - entry values with 0% of parent scope covered by DW_AT_location": 14310,

The binary grew by 2 MB (0.6%):

$ du -h build-without/bin/opt build-with/bin/opt 
323M	build-without/bin/opt
325M	build-with/bin/opt

Numbers look fairly good - wonder if you could get this run through the LLVM perf tracker to get some more stable/precise numbers?
& any chance you could use something like bloaty ( https://github.com/google/bloaty ) or by hand comparison on a per-section basis on the clang binary? Getting a sense of, specifically, how much bigger this makes .debug_info (but any incidental changes to other sections too) might be helpful

Numbers look fairly good - wonder if you could get this run through the LLVM perf tracker to get some more stable/precise numbers?
& any chance you could use something like bloaty ( https://github.com/google/bloaty ) or by hand comparison on a per-section basis on the clang binary? Getting a sense of, specifically, how much bigger this makes .debug_info (but any incidental changes to other sections too) might be helpful

I feel like mentioning "llvm-dwarfdump --show-section-sizes" :)

Numbers look fairly good - wonder if you could get this run through the LLVM perf tracker to get some more stable/precise numbers?
& any chance you could use something like bloaty ( https://github.com/google/bloaty ) or by hand comparison on a per-section basis on the clang binary? Getting a sense of, specifically, how much bigger this makes .debug_info (but any incidental changes to other sections too) might be helpful

I feel like mentioning "llvm-dwarfdump --show-section-sizes" :)

Indeed - I usually use llvm-objdump -h when I just want section sizes (it's useful to have the DWARF ones in the context of the others, even if sometimes I'm interested only in the change in DWARF sections). - but the thing that I find especially useful about bloaty is the comparison mode - showing per-section % changes and overall % changes, etc.

djtodoro added a comment. Edited Feb 5 2023, 11:53 PM

Numbers look fairly good - wonder if you could get this run through the LLVM perf tracker to get some more stable/precise numbers?
& any chance you could use something like bloaty ( https://github.com/google/bloaty ) or by hand comparison on a per-section basis on the clang binary? Getting a sense of, specifically, how much bigger this makes .debug_info (but any incidental changes to other sections too) might be helpful

I feel like mentioning "llvm-dwarfdump --show-section-sizes" :)

Numbers look fairly good - wonder if you could get this run through the LLVM perf tracker to get some more stable/precise numbers?
& any chance you could use something like bloaty ( https://github.com/google/bloaty ) or by hand comparison on a per-section basis on the clang binary? Getting a sense of, specifically, how much bigger this makes .debug_info (but any incidental changes to other sections too) might be helpful

I feel like mentioning "llvm-dwarfdump --show-section-sizes" :)

Indeed - I usually use llvm-objdump -h when I just want section sizes (it's useful to have the DWARF ones in the context of the others, even if sometimes I'm interested only in the change in DWARF sections). - but the thing that I find especially useful about bloaty is the comparison mode - showing per-section % changes and overall % changes, etc.

(Sorry for diverging from the topic of the patch. :) ) I see. It would be really beneficial if we added something like that to our dwarfdump. I may find some free time to implement that. Thanks.

Numbers look fairly good - wonder if you could get this run through the LLVM perf tracker to get some more stable/precise numbers?

Just to make sure, is that https://llvm-compile-time-tracker.com/? I can try, otherwise I'll build binaries on a smaller machine that is otherwise unused during the nights, to get rid of most of the noise.

& any chance you could use something like bloaty ( https://github.com/google/bloaty ) or by hand comparison on a per-section basis on the clang binary? Getting a sense of, specifically, how much bigger this makes .debug_info (but any incidental changes to other sections too) might be helpful

Here are results for the above-mentioned RelWithDebInfo opt build:

$ bloaty normal/build-with/bin/opt -- normal/build-without/bin/opt 
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +101%  +924Ki  [ = ]       0    .debug_info
  +100%  +179Ki  [ = ]       0    .debug_str
  +101%  +146Ki  [ = ]       0    .debug_line
  +101%  +123Ki  [ = ]       0    .debug_addr
  +102%  +112Ki  [ = ]       0    .debug_rnglists
  +101% +31.6Ki  [ = ]       0    .debug_abbrev
  +100% +16.9Ki  [ = ]       0    .debug_str_offsets
  +100%    +543  [ = ]       0    .debug_loclists
  +100%     -18  [ = ]       0    .debug_line_str
  +100%  +324Mi  +100% +35.6Mi    TOTAL

RelWithDebInfo + Full LTO opt binary:

$ bloaty full-lto/build-with/bin/opt -- full-lto/build-without/bin/opt 
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +101%  +800Ki  [ = ]       0    .debug_info
  +101%  +152Ki  [ = ]       0    .debug_line
  +102%  +130Ki  [ = ]       0    .debug_rnglists
  +101%  +101Ki  [ = ]       0    .debug_addr
  +100% +92.8Ki  [ = ]       0    .debug_str
  +100% +2.85Ki  [ = ]       0    .debug_str_offsets
  +100%    +120  [ = ]       0    .debug_abbrev
  +100%      -5  [ = ]       0    .debug_loclists
  +100%     -14  [ = ]       0    .debug_line_str
  +101%  +243Mi  +100% +37.7Mi    TOTAL

RelWithDebInfo + Thin LTO opt binary:

$ bloaty thin-lto/build-with/bin/opt -- thin-lto/build-without/bin/opt 
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +101%  +790Ki  [ = ]       0    .debug_info
  +101%  +148Ki  [ = ]       0    .debug_line
  +102%  +121Ki  [ = ]       0    .debug_rnglists
  +101%  +105Ki  [ = ]       0    .debug_addr
  +100%  +102Ki  [ = ]       0    .debug_str
  +101% +34.7Ki  [ = ]       0    .debug_abbrev
  +100% +10.1Ki  [ = ]       0    .debug_str_offsets
  +100% +1.04Ki  [ = ]       0    .debug_loclists
  +100%     -15  [ = ]       0    .debug_line_str
  +100% -3.07Ki  [ = ]       0    .strtab
  +100%  +332Mi  +100% +38.0Mi    TOTAL

330668	normal/build-without/bin/opt
332204	normal/build-with/bin/opt # ~0.5% increase.

248560	full-lto/build-without/bin/opt
249840	full-lto/build-with/bin/opt # ~0.5% increase.

339144	thin-lto/build-without/bin/opt
340456	thin-lto/build-with/bin/opt # ~0.4% increase.

So, around a 0.5% increase in total binary size, with at most a 1-2% increase in some of the debug information sections for these RelWithDebInfo builds. I used the opt binary since I had used it in my earlier comments, but the diffs look the same for a clang binary.

I wonder what the increase is with ThinLTO?

@dblaikie, @ayermolo: I have now run measurements on an otherwise unused machine to get rid of the noise.

The measurements are from 21 opt binary builds per build type (RelWithDebInfo with/without ThinLTO), using a Clang built with the Release build type as before. The time is user+sys (I had missed the system time last round), taken from bash's time.

# With patch:
mean: 81.391325 (81m23.479476s)
median: 81.387383 (81m23.243000s)
std: 0.052490

# Without patch:
mean: 81.216625 (81m12.997476s)
median: 81.216417 (81m12.985000s)
std: 0.058820

# With patch (ThinLTO build):
mean: 90.751952 (90m45.117095s)
median: 90.761333 (90m45.680000s)
std: 0.045923

# Without patch (ThinLTO build):
mean: 90.619793 (90m37.187571s)
median: 90.600950 (90m36.057000s)
std: 0.085509

So, it seems like this patch adds ~10 seconds of CPU time for these ~80-90 minute builds.

Is this a sufficient measurement?

@ayermolo: And in case you meant the effect on the section and file sizes for a ThinLTO build, those are available in my previous comment!

@dblaikie, @ayermolo: I have now run measurements on an otherwise unused machine to get rid of the noise.

The measurements are from 21 opt binary builds per build type (RelWithDebInfo with/without ThinLTO), using a Clang built with the Release build type as before. The time is user+sys (I had missed the system time last round), taken from bash's time.

# With patch:
mean: 81.391325 (81m23.479476s)
median: 81.387383 (81m23.243000s)
std: 0.052490

# Without patch:
mean: 81.216625 (81m12.997476s)
median: 81.216417 (81m12.985000s)
std: 0.058820

# With patch (ThinLTO build):
mean: 90.751952 (90m45.117095s)
median: 90.761333 (90m45.680000s)
std: 0.045923

# Without patch (ThinLTO build):
mean: 90.619793 (90m37.187571s)
median: 90.600950 (90m36.057000s)
std: 0.085509

So, it seems like this patch adds ~10 seconds of CPU time for these ~80-90 minute builds.

Is this a sufficient measurement?

@ayermolo: And in case you meant the effect on the section and file sizes for a ThinLTO build, those are available in my previous comment!

Ah, sorry, I am blind.
Another data point: I used this patch on an internal build with all the bells and whistles, and the increase was about 28 MB total, which is negligible.

Numbers look fairly good - wonder if you could get this run through the LLVM perf tracker to get some more stable/precise numbers?

Just to make sure, is that https://llvm-compile-time-tracker.com/? I can try, otherwise I'll build binaries on a smaller machine that is otherwise unused during the nights, to get rid of most of the noise.

Yeah, that's the one I meant.

& any chance you could use something like bloaty ( https://github.com/google/bloaty ) or by hand comparison on a per-section basis on the clang binary? Getting a sense of, specifically, how much bigger this makes .debug_info (but any incidental changes to other sections too) might be helpful

Here are results for the above-mentioned RelWithDebInfo opt build:

$ bloaty normal/build-with/bin/opt -- normal/build-without/bin/opt 
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +101%  +924Ki  [ = ]       0    .debug_info
  +100%  +179Ki  [ = ]       0    .debug_str
  +101%  +146Ki  [ = ]       0    .debug_line
  +101%  +123Ki  [ = ]       0    .debug_addr
  +102%  +112Ki  [ = ]       0    .debug_rnglists
  +101% +31.6Ki  [ = ]       0    .debug_abbrev
  +100% +16.9Ki  [ = ]       0    .debug_str_offsets
  +100%    +543  [ = ]       0    .debug_loclists
  +100%     -18  [ = ]       0    .debug_line_str
  +100%  +324Mi  +100% +35.6Mi    TOTAL

I'm a bit confused by all these reports - at least with my bloaty locally, "+100%" would mean the section doubled in size (100% growth), but it shows "+100%" even for a section that shrank slightly (.debug_line_str)... Confusing.

but the overall numbers still sound OK.

I'm a bit confused by all these reports - at least with my bloaty locally, "+100%" would mean the section doubled in size (100% growth) but it shows "+100%" even for a section that shrunk (.debug_line_str) slightly... Confusing.

but the overall numbers still sound OK.

Yes, I thought that the output format looked odd. That was with bloaty built from main (52948c), which we can compare to v1.1 (for a small example binary):

$ diff -u <(./bloaty-1.1 with.out -- without.out) <(./bloaty-52948c1 with.out -- without.out)
--- /dev/fd/63	2023-02-11 01:25:09.785684383 +0100
+++ /dev/fd/62	2023-02-11 01:25:09.785684383 +0100
@@ -1,8 +1,8 @@
     FILE SIZE        VM SIZE    
  --------------  -------------- 
-   +11%     +26  [ = ]       0    .debug_info
-  +6.5%     +15  [ = ]       0    .debug_abbrev
-  +6.7%      +8  [ = ]       0    .debug_addr
-  +0.1%      +6  [ = ]       0    [Unmapped]
-  +0.5%      +1  [ = ]       0    .debug_line
-  +0.3%     +56  [ = ]       0    TOTAL
+  +115%     +26  [ = ]       0    .debug_info
+  +109%     +15  [ = ]       0    .debug_abbrev
+  +114%      +8  [ = ]       0    .debug_addr
+  +111%      +6  [ = ]       0    .debug_str_offsets
+  +101%      +1  [ = ]       0    .debug_line
+  +100% +17.1Ki  +100% +2.61Ki    TOTAL

Although they find the same number of bytes, v1.1 reports a +11% increase for .debug_info for this example, compared to +115% when using bloaty from main. Perhaps it is broken on main?

I don't have access to run v1.1 on the binaries in question now, but if we look at the ratios for the "#bytes" diff lines from llvm-dwarfdump in one of my earlier comments we can see that the ratios are smaller than how I interpreted bloaty's output, similar to this small example.

I'm a bit confused by all these reports - at least with my bloaty locally, "+100%" would mean the section doubled in size (100% growth), but it shows "+100%" even for a section that shrank slightly (.debug_line_str)... Confusing.

but the overall numbers still sound OK.

Yes, I thought that the output format looked odd. That was with bloaty built from main (52948c), which we can compare to v1.1 (for a small example binary):

$ diff -u <(./bloaty-1.1 with.out -- without.out) <(./bloaty-52948c1 with.out -- without.out)
--- /dev/fd/63	2023-02-11 01:25:09.785684383 +0100
+++ /dev/fd/62	2023-02-11 01:25:09.785684383 +0100
@@ -1,8 +1,8 @@
     FILE SIZE        VM SIZE    
  --------------  -------------- 
-   +11%     +26  [ = ]       0    .debug_info
-  +6.5%     +15  [ = ]       0    .debug_abbrev
-  +6.7%      +8  [ = ]       0    .debug_addr
-  +0.1%      +6  [ = ]       0    [Unmapped]
-  +0.5%      +1  [ = ]       0    .debug_line
-  +0.3%     +56  [ = ]       0    TOTAL
+  +115%     +26  [ = ]       0    .debug_info
+  +109%     +15  [ = ]       0    .debug_abbrev
+  +114%      +8  [ = ]       0    .debug_addr
+  +111%      +6  [ = ]       0    .debug_str_offsets
+  +101%      +1  [ = ]       0    .debug_line
+  +100% +17.1Ki  +100% +2.61Ki    TOTAL

Although they find the same number of bytes, v1.1 reports a +11% increase for .debug_info for this example, compared to +115% when using bloaty from main. Perhaps it is broken on main?

I don't have access to run v1.1 on the binaries in question now, but if we look at the ratios for the "#bytes" diff lines from llvm-dwarfdump in one of my earlier comments we can see that the ratios are smaller than how I interpreted bloaty's output, similar to this small example.

*nod* All a bit confusing, unfortunately - but I /think/ the numbers sound OK.

compile-time-tracker results?

compile-time-tracker results?

I have gotten results now, but I will not have time to look at them tonight: https://llvm-compile-time-tracker.com/compare.php?from=f7b10467b63f09ab74e67d4002b3e11601091882&to=d200692f433fbc5b8edaa68e92ab04124d0c7e6a

compile-time-tracker results?

I have gotten results now, but I will not have time to look at them tonight: https://llvm-compile-time-tracker.com/compare.php?from=f7b10467b63f09ab74e67d4002b3e11601091882&to=d200692f433fbc5b8edaa68e92ab04124d0c7e6a

Looking at the instruction count we have the following geomeans:

# NewPM-O3:
geomean         72506M         72501M (-0.01%)
# NewPM-ReleaseThinLTO:
geomean         95895M         95901M (+0.01%)
# NewPM-ReleaseLTO-g:
geomean        110051M        110076M (+0.02%)
# NewPM-O0-g:
geomean         21463M         21473M (+0.05%)

and cycles:

# NewPM-O3:
geomean         54478M         54520M (+0.08%)
# NewPM-ReleaseThinLTO:
geomean         70324M         70367M (+0.06%)
# NewPM-ReleaseLTO-g:
geomean         82451M         82524M (+0.09%)
# NewPM-O0-g:
geomean         16404M         16423M (+0.12%)

I assume that the NewPM-O3 and NewPM-ReleaseThinLTO configurations are not interesting since they do not use -g (there are no file-size changes for those), and that NewPM-O0-g is not interesting since we'd only (?) inline always_inline functions (there are no file-size changes there either). So, there is a slight instruction and cycle count increase in the geometric mean for NewPM-ReleaseLTO-g, but it is similar to that of the uninteresting configurations. I assume that the uninteresting configurations may move a bit due to different program layout (cache effects, etc.).

In terms of file size, there is a slight increase in the geometric mean, in the same ballpark as I saw with my opt compilations.

size-file, NewPM-ReleaseLTO-g:

Benchmark           Old         New
kimwitu++           3610KiB     3638KiB (+0.77%)
sqlite3             1911KiB     1917KiB (+0.31%)
consumer-typeset    1382KiB     1383KiB (+0.05%)
Bullet              2448KiB     2453KiB (+0.21%)
tramp3d-v4          12535KiB    12552KiB (+0.14%)
mafft               579KiB      580KiB (+0.17%)
ClamAV              2213KiB     2221KiB (+0.39%)
lencod              2514KiB     2522KiB (+0.34%)
SPASS               2616KiB     2629KiB (+0.47%)
7zip                5495KiB     5505KiB (+0.18%)
geomean             2589KiB     2597KiB (+0.30%)

I've run out of time to read the code this week, but it sounds like a good approach to me, I'll give my debugger colleagues a prod about it too.

Hi! Did you have a chance to talk with your colleagues regarding this?

perf report looks ok to me, thanks!

dstenb updated this revision to Diff 499059. Feb 21 2023, 12:37 AM

Address comment and fix the patch for non-assert builds.

dstenb marked an inline comment as done. Feb 21 2023, 12:39 AM
dstenb added inline comments.
llvm/lib/IR/DebugInfoMetadata.cpp
194

Sorry, I did not catch this comment!

dblaikie accepted this revision. Feb 21 2023, 1:24 PM

Reckon this is worth a go - maybe wait for @aprantl to sign off too, though.

llvm/lib/IR/DebugInfoMetadata.cpp
154–156

Would it be easier to express/read as arithmetic on rbegin, rather than begin->arithmetic->reverse?

This revision is now accepted and ready to land. Feb 21 2023, 1:24 PM
dstenb marked an inline comment as done. Feb 22 2023, 6:32 AM
dstenb added inline comments.
llvm/lib/IR/DebugInfoMetadata.cpp
154–156

The corresponding initializations based on rbegin would be:

ARIt = ALocs.rbegin() + (ALocs.size() - IT->second - 1);
BRIt = BLocs.rbegin() + (BLocs.size() - I - 1);

I think that they are about the same in terms of readability, so I'm happy to pick either.

dblaikie added inline comments. Feb 22 2023, 12:20 PM
llvm/lib/IR/DebugInfoMetadata.cpp
154–156

Yeah, I think I'd lean /slightly/ towards the latter/not having to explicitly construct a reverse iterator, but no big deal either way, indeed. Your call.
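For reference, the two spellings being weighed here are equivalent; a small sketch with assumed types (the surrounding diff is not reproduced in this archive):

#include "llvm/ADT/SmallVector.h"
#include <cstddef>
#include <iterator>

// Both expressions position a reverse iterator at element Idx of Vec.
template <typename T>
static typename llvm::SmallVectorImpl<T>::reverse_iterator
reverseIterAt(llvm::SmallVectorImpl<T> &Vec, std::size_t Idx) {
  // begin -> arithmetic -> reverse:
  //   return std::make_reverse_iterator(Vec.begin() + Idx + 1);
  // arithmetic on rbegin (the form shown in the reply above):
  return Vec.rbegin() + (Vec.size() - Idx - 1);
}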

dstenb added a comment. Edited Mar 1 2023, 2:42 AM

Reckon this is worth a go - maybe wait for @aprantl to sign off too, though.

Thanks for the review!

Any thoughts on this, @aprantl, or others? If not, I'll attempt to land this at the end of this week or in the beginning of next.

I'm fine with taking this, thanks!

aprantl accepted this revision. Mar 3 2023, 8:39 AM
dstenb added a comment. Mar 6 2023, 5:23 AM

Thanks! I'll land this shortly.

This revision was landed with ongoing or failed builds. Mar 6 2023, 5:37 AM
This revision was automatically updated to reflect the committed changes.