This is an archive of the discontinued LLVM Phabricator instance.

Create synthetic symbol names on demand to improve memory consumption and startup times.
Closed · Public

Authored by clayborg on Jul 26 2021, 4:42 PM.

Details

Summary

This is a resubmission of https://reviews.llvm.org/D105160 after fixing testing issues.

This fix was created after profiling the target creation of a large C/C++/ObjC application that contained almost 4,000,000 redacted symbol names. The symbol table parsing code was creating names for each of these synthetic symbols and adding them to the name indexes. The code was also adding the object file basename to the end of the symbol name which doesn't allow symbols from different shared libraries to share the names in the constant string pool.

Prior to this fix this was creating 180MB of "___lldb_unnamed_symbol" symbol names and was taking a long time to generate each name, add them to the string pool and then add each of these names to the name index.
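To make the string pool point concrete, here is a minimal sketch in plain Python (not LLDB code). `StringPool`, `old_style_name`, `new_style_name`, and `pool_names` are hypothetical names introduced for illustration, and the sketch assumes that symbols in different modules end up with the same numeric suffixes. The old name shape is taken from the symbol table dumps in this review ("___lldb_unnamed_symbol1$$a.out").

```python
PREFIX = "___lldb_unnamed_symbol"


class StringPool:
    """Hypothetical stand-in for a constant string pool: one copy per distinct string."""

    def __init__(self):
        self._strings = {}

    def intern(self, s):
        # Return the pooled copy, storing the string the first time it is seen.
        return self._strings.setdefault(s, s)

    def total_bytes(self):
        return sum(len(s) for s in self._strings)

    def __len__(self):
        return len(self._strings)


def old_style_name(index, basename):
    # Old scheme: counter plus "$$" and the object file basename.
    return f"{PREFIX}{index}$${basename}"


def new_style_name(user_id):
    # New scheme: prefix plus the numeric ID only.
    return f"{PREFIX}{user_id}"


def pool_names(make_name, modules, symbols_per_module):
    # Intern one generated name per symbol per module and return the pool.
    pool = StringPool()
    for basename in modules:
        for i in range(1, symbols_per_module + 1):
            pool.intern(make_name(i, basename))
    return pool


modules = ["a.out", "libc.so.6", "libfoo.so"]  # "libfoo.so" is a made-up module
old_pool = pool_names(old_style_name, modules, 1000)
new_pool = pool_names(lambda i, _basename: new_style_name(i), modules, 1000)

print(len(old_pool), old_pool.total_bytes())
print(len(new_pool), new_pool.total_bytes())
```

With the basename suffix, every module contributes its own set of strings, so the pool grows with the number of modules. Without it, modules whose symbols carry the same IDs reuse the same pooled strings.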

This patch fixes the issue by:

not adding a name to synthetic symbols at creation time, and allowing the name to be generated dynamically when it is accessed
not adding synthetic symbol names to the name indexes, but catching this special case at name lookup time. Users won't typically set breakpoints on or look up these synthetic names, but support was added to do the lookup in case it does happen
removing the object file basename from the generated names to allow the names to be shared in the constant string pool

Prior to this fix the startup times for a large application were:
12.5 seconds (cold file caches)
8.5 seconds (warm file caches)

After this fix:
9.7 seconds (cold file caches)
5.7 seconds (warm file caches)
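
The arithmetic behind these numbers can be worked through in a few lines of Python. Only the four timings come from the measurements above; the `improvement` helper and the dictionary layout are made up for illustration.

```python
# Startup times in seconds, (before, after), copied from the measurements above.
timings = {
    "cold file caches": (12.5, 9.7),
    "warm file caches": (8.5, 5.7),
}


def improvement(before, after):
    # Return (seconds saved, percent reduction), each rounded to one decimal place.
    saved = before - after
    return round(saved, 1), round(100.0 * saved / before, 1)


results = {label: improvement(b, a) for label, (b, a) in timings.items()}
for label, (saved, percent) in results.items():
    print(f"{label}: {saved} s saved ({percent}% less startup time)")
```

Both configurations save the same 2.8 seconds, which is roughly a 22% reduction with cold caches and a 33% reduction with warm caches. That the absolute saving is identical is consistent with the removed work being name generation and indexing rather than file I/O.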

The names of the symbols are auto-generated by appending the symbol's UserID to the end of the "___lldb_unnamed_symbol" string. This is done only when the name is requested from a synthetic symbol that has no name.
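
The on-demand naming and the lookup special case described above can be sketched as follows. This is an illustrative Python model, not the LLDB implementation: `Symbol`, `Symtab`, `get_name`, and `find_by_name` are hypothetical names, and the sample IDs 36 and 37 simply mirror the a.out dump later in this review.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX = "___lldb_unnamed_symbol"


@dataclass
class Symbol:
    user_id: int
    name: Optional[str] = None  # None means no name is stored
    synthetic: bool = False

    def get_name(self) -> Optional[str]:
        # Generate a name only for a synthetic symbol that has no stored name.
        if self.name is None and self.synthetic:
            return f"{PREFIX}{self.user_id}"
        return self.name


class Symtab:
    def __init__(self, symbols):
        self.symbols = list(symbols)
        self._by_id = {s.user_id: s for s in self.symbols}
        # The name index covers stored names only; generated names are left out.
        self._by_name = {}
        for s in self.symbols:
            if s.name is not None:
                self._by_name.setdefault(s.name, []).append(s)

    def find_by_name(self, name):
        hits = list(self._by_name.get(name, []))
        # Special case at lookup time: decode the ID from a generated name.
        if name.startswith(PREFIX):
            suffix = name[len(PREFIX):]
            if suffix.isdecimal():
                sym = self._by_id.get(int(suffix))
                # Accept the symbol only if it would really generate this name.
                if sym is not None and sym.name is None and sym.get_name() == name:
                    hits.append(sym)
        return hits


symtab = Symtab([
    Symbol(36, name="puts", synthetic=True),  # synthetic, but it has a real name
    Symbol(37, synthetic=True),               # synthetic and unnamed
])

print(symtab.symbols[0].get_name())
print(symtab.symbols[1].get_name())
print(symtab.find_by_name("___lldb_unnamed_symbol37"))
```

In this model a synthetic symbol that already has a name, such as a trampoline for puts, keeps that name, and only the truly unnamed symbol gets a generated one. No generated string is stored or indexed, so the per-symbol cost is paid only if someone actually asks for the name.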

Event Timeline

clayborg created this revision. Jul 26 2021, 4:42 PM
clayborg requested review of this revision. Jul 26 2021, 4:42 PM
Herald added a project: Restricted Project. Jul 26 2021, 4:42 PM
wallace accepted this revision. Jul 26 2021, 4:52 PM

good luck with the build bots!

This revision is now accepted and ready to land. Jul 26 2021, 4:52 PM
This revision was landed with ongoing or failed builds. Jul 27 2021, 4:51 PM
This revision was automatically updated to reflect the committed changes.

Hi all, I found that this patch causes PR52702; the parent of this commit and LLDB 12 both work fine.
When disassembling a hello world C program on Linux, LLDB used to show
callq 0x401030 ; symbol stub for: puts
but now shows
callq 0x401030 ; symbol stub for: ___lldb_unnamed_symbol36
Examining the symbol table by running lldb -b -o 'image dump symtab' a.out used to show:

[   18]     20   X Undefined       0x0000000000000000                    0x0000000000000000 0x00000012 puts@GLIBC_2.2.5
                         ........
[   33]     35   X Code            0x0000000000401000                    0x000000000000001b 0x00000212 _init
[   34]     36  S  Trampoline      0x0000000000401030                    0x0000000000000010 0x00000000 puts
[   35]     37  SX Code            0x0000000000401020                    0x0000000000000010 0x00000000 ___lldb_unnamed_symbol1$$a.out

and now (ToT and LLDB 13) it's:

[   18]     20   X Undefined       0x0000000000000000                    0x0000000000000000 0x00000012 puts@GLIBC_2.2.5
                         ........
[   33]     35   X Code            0x0000000000401000                    0x000000000000001b 0x00000212 _init
[   34]     36  S  Trampoline      0x0000000000401030                    0x0000000000000010 0x00000000 ___lldb_unnamed_symbol36
[   35]     37  SX Code            0x0000000000401020                    0x0000000000000010 0x00000000 ___lldb_unnamed_symbol37

image dump symtab libc.so.6 gives similar result.
Before ec1a4917 :

[ 2366]   2367  S  Trampoline      0x0000000000025010 0x00007ffff7df2010 0x0000000000000010 0x00000000 realloc
[ 2367]   2368  S  Trampoline      0x0000000000025020 0x00007ffff7df2020 0x0000000000000010 0x00000000 __tls_get_addr
[ 2368]   2369  S  Trampoline      0x0000000000025030 0x00007ffff7df2030 0x0000000000000010 0x00000000 memalign
[ 2369]   2370  S  Trampoline      0x0000000000025040 0x00007ffff7df2040 0x0000000000000010 0x00000000 _dl_exception_create
[ 2370]   2371  S  Trampoline      0x0000000000025050 0x00007ffff7df2050 0x0000000000000010 0x00000000 __tunable_get_val
[ 2371]   2372  S  Trampoline      0x0000000000025060 0x00007ffff7df2060 0x0000000000000010 0x00000000 _dl_find_dso_for_object
[ 2372]   2373  S  Trampoline      0x0000000000025070 0x00007ffff7df2070 0x0000000000000010 0x00000000 calloc
[ 2373]   2373  SX Code            0x0000000000025000 0x00007ffff7df2000 0x0000000000000010 0x00000000 ___lldb_unnamed_symbol1$$libc.so.6
[ 2374]   2373  SX Code            0x0000000000025300 0x00007ffff7df2300 0x0000000000000040 0x00000000 ___lldb_unnamed_symbol2$$libc.so.6
[ 2375]   2373  SX Code            0x0000000000025340 0x00007ffff7df2340 0x00000000000002f0 0x00000000 ___lldb_unnamed_symbol3$$libc.so.6
[ 2376]   2373  SX Code            0x0000000000025630 0x00007ffff7df2630 0x000000000000000c 0x00000000 ___lldb_unnamed_symbol4$$libc.so.6

After:

[ 2366]   2367  S  Trampoline      0x0000000000025010 0x00007ffff7df2010 0x0000000000000010 0x00000000 ___lldb_unnamed_symbol2367
[ 2367]   2368  S  Trampoline      0x0000000000025020 0x00007ffff7df2020 0x0000000000000010 0x00000000 ___lldb_unnamed_symbol2368
[ 2368]   2369  S  Trampoline      0x0000000000025030 0x00007ffff7df2030 0x0000000000000010 0x00000000 ___lldb_unnamed_symbol2369
[ 2369]   2370  S  Trampoline      0x0000000000025040 0x00007ffff7df2040 0x0000000000000010 0x00000000 ___lldb_unnamed_symbol2370
[ 2370]   2371  S  Trampoline      0x0000000000025050 0x00007ffff7df2050 0x0000000000000010 0x00000000 ___lldb_unnamed_symbol2371
[ 2371]   2372  S  Trampoline      0x0000000000025060 0x00007ffff7df2060 0x0000000000000010 0x00000000 ___lldb_unnamed_symbol2372
[ 2372]   2373  S  Trampoline      0x0000000000025070 0x00007ffff7df2070 0x0000000000000010 0x00000000 ___lldb_unnamed_symbol2373
[ 2373]   2374  SX Code            0x0000000000025000 0x00007ffff7df2000 0x0000000000000010 0x00000000 ___lldb_unnamed_symbol2374
[ 2374]   2375  SX Code            0x0000000000025300 0x00007ffff7df2300 0x0000000000000040 0x00000000 ___lldb_unnamed_symbol2375
[ 2375]   2376  SX Code            0x0000000000025340 0x00007ffff7df2340 0x00000000000002f0 0x00000000 ___lldb_unnamed_symbol2376
[ 2376]   2377  SX Code            0x0000000000025630 0x00007ffff7df2630 0x000000000000000c 0x00000000 ___lldb_unnamed_symbol2377

Is this intended for the performance boost? It seems to me that "S Trampoline" symbols should be handled differently.

FWIW, I also found that this doesn't affect macOS tho as puts is not even a synthetic symbol:

Index   UserID DSX Type            File Address/Value Load Address       Size               Flags      Name
------- ------ --- --------------- ------------------ ------------------ ------------------ ---------- ----------------------------------
[    0]      0     Data            0x0000000100008008                    0x0000000000000008 0x000e0000 _dyld_private
[    1]      1   X Data            0x0000000100000000                    0x0000000000003f70 0x000f0010 _mh_execute_header
[    2]      2   X Code            0x0000000100003f70                    0x0000000000000014 0x000f0000 main
[    3]      3     Trampoline      0x0000000100003f84                    0x0000000000000006 0x00010100 puts
[    4]      4   X Undefined       0x0000000000000000                    0x0000000000000000 0x00010100 dyld_stub_binder
callq  0x100003f84               ; symbol stub for: puts

It is a huge performance boost for stripped Darwin binaries. We end up with many symbols that have no names because Mach-O binaries have an LC_FUNCTION_STARTS load command that lists the start addresses of all functions, even those that have no entries in the symbol table.

Looks like you found the underlying issue in https://reviews.llvm.org/D116217. Thanks for finding this issue!