Page MenuHomePhabricator

Fix lookup of symbols at the same address with no size vs. size
ClosedPublic

Authored by jankratochvil on Jun 19 2019, 3:54 AM.

Details

Summary

This fixes a failing testcase on Fedora 30 x86_64 (regression Fedora 29->30):

PASS:
./bin/lldb ./lldb-test-build.noindex/functionalities/unwind/noreturn/TestNoreturnUnwind.test_dwarf/a.out -o 'settings set symbols.enable-external-lookup false' -o r -o bt -o quit
  * frame #0: 0x00007ffff7aa6e75 libc.so.6`__GI_raise + 325
    frame #1: 0x00007ffff7a91895 libc.so.6`__GI_abort + 295
    frame #2: 0x0000000000401140 a.out`func_c at main.c:12:2
    frame #3: 0x000000000040113a a.out`func_b at main.c:18:2
    frame #4: 0x0000000000401134 a.out`func_a at main.c:26:2
    frame #5: 0x000000000040112e a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:32:2
    frame #6: 0x00007ffff7a92f33 libc.so.6`__libc_start_main + 243
    frame #7: 0x000000000040106e a.out`_start + 46

vs.

FAIL - unrecognized abort() function:
./bin/lldb ./lldb-test-build.noindex/functionalities/unwind/noreturn/TestNoreturnUnwind.test_dwarf/a.out -o 'settings set symbols.enable-external-lookup false' -o r -o bt -o quit
  * frame #0: 0x00007ffff7aa6e75 libc.so.6`.annobin_raise.c + 325
    frame #1: 0x00007ffff7a91895 libc.so.6`.annobin_loadmsgcat.c_end.unlikely + 295
    frame #2: 0x0000000000401140 a.out`func_c at main.c:12:2
    frame #3: 0x000000000040113a a.out`func_b at main.c:18:2
    frame #4: 0x0000000000401134 a.out`func_a at main.c:26:2
    frame #5: 0x000000000040112e a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:32:2
    frame #6: 0x00007ffff7a92f33 libc.so.6`.annobin_libc_start.c + 243
    frame #7: 0x000000000040106e a.out`.annobin_init.c.hot + 46

The extra ELF symbols are there due to Annobin (I did not investigate why this problem happened specifically since F-30 and not since F-28).
It is due to:

Symbol table '.dynsym' contains 2361 entries:
Valu e          Size Type   Bind   Vis     Name
0000000000022769   5 FUNC   LOCAL  DEFAULT _nl_load_domain.cold
000000000002276e   0 NOTYPE LOCAL  HIDDEN  .annobin_abort.c.unlikely
...
000000000002276e   0 NOTYPE LOCAL  HIDDEN  .annobin_loadmsgcat.c_end.unlikely
...
000000000002276e   0 NOTYPE LOCAL  HIDDEN  .annobin_textdomain.c_end.unlikely
000000000002276e 548 FUNC   GLOBAL DEFAULT abort
000000000002276e 548 FUNC   GLOBAL DEFAULT abort@@GLIBC_2.2.5
000000000002276e 548 FUNC   LOCAL  DEFAULT __GI_abort
0000000000022992   0 NOTYPE LOCAL  HIDDEN  .annobin_abort.c_end.unlikely

GDB has some more complicated preferences between overlapping and/or sharing address symbols, I have made here so far the most simple fix for this case.

Diff Detail

Repository
rL LLVM

Event Timeline

jankratochvil created this revision.Jun 19 2019, 3:54 AM

The fix seems reasonable to me. There's just one thing I don't get: Why was it necessary to change the iteration order here?
IIUC, the real business happens on line 904, where you add the if (entry[i].base == entry[i+1].base) break; check. I could be missing something, but it seems to me like that thing would work no matter what the iteration order is...

lldb/lit/SymbolFile/sizeless-symbol.test
1 ↗(On Diff #205538)

Are you sure this will work whatever the host triple happens to be ? (you can try it out by manually forcing a couple of random targets with the -target argument). I have a feeling the .type, .size, etc. directives are a feature of elf-targetting assemblers. If that doesn't work you can always force a specific triple with the -target command.

4 ↗(On Diff #205538)

As you're fixing this bug by changing the way symbol size is computed, it would be nice to also put (and check it's output) a "image dump symtab" command here. That would also make it clear what are the assumptions the following "lookup" commands are operating under.

So I am fine with symbols having zero size being in the symbol table. I would be fine not changing anything in the sorting and leaving some symbols with zero size, we just need to fix:

Symbol *Symtab::FindSymbolAtFileAddress(addr_t file_addr);

To ignore zero sized symbols when it finds them _if_ there is another symbol that has a size for that address. Wouldn't that fix the issue here?

my fix assumes we would have zero sized symbols first in the m_file_addr_to_index map, and the symbols with the same size would be ordered after the ones with zero size. Also, I am assuming when we do:

const FileRangeToIndexMap::Entry *entry = m_file_addr_to_index.FindEntryStartsAt(file_addr);

that it finds the first symbol that starts with "file_addr". If this isn't the case we might need to be able to back up a few symbols to ensure we get consistent results

So I am fine with symbols having zero size being in the symbol table. I would be fine not changing anything in the sorting and leaving some symbols with zero size, we just need to fix:

Symbol *Symtab::FindSymbolAtFileAddress(addr_t file_addr);

To ignore zero sized symbols when it finds them _if_ there is another symbol that has a size for that address. Wouldn't that fix the issue here?

You are assuming here that the symbols have size zero at the time we are performing the lookup. If I understand correctly what is going on, the problem here is that the code munging the symbol table (InitAddressIndexes), will set these symbols to have non-zero size. This is what this patch is trying to avoid.

The reason we are fiddling with the size of the symbols is because there are valid instances of symbols not having a size (usually coming from hand-written assembly, where one just doesn't bother to add the .size directive). However, it certainly seems like we shouldn't be doing that if there is another symbol at the same address, and this symbol has the size set correctly...

So I am fine with symbols having zero size being in the symbol table. I would be fine not changing anything in the sorting and leaving some symbols with zero size, we just need to fix:

Symbol *Symtab::FindSymbolAtFileAddress(addr_t file_addr);

To ignore zero sized symbols when it finds them _if_ there is another symbol that has a size for that address. Wouldn't that fix the issue here?

You are assuming here that the symbols have size zero at the time we are performing the lookup. If I understand correctly what is going on, the problem here is that the code munging the symbol table (InitAddressIndexes), will set these symbols to have non-zero size. This is what this patch is trying to avoid.

I am saying to leave symbols with zero size as is _if_ there is another symbol that does have a valid size with the same start address.

The reason we are fiddling with the size of the symbols is because there are valid instances of symbols not having a size (usually coming from hand-written assembly, where one just doesn't bother to add the .size directive). However, it certainly seems like we shouldn't be doing that if there is another symbol at the same address, and this symbol has the size set correctly...

Exactly. The best solution in my mind is:

  • leave all symbols with sizes as is
  • only set a symbol size if there is no other symbol at that address that didn't originally have a size
  • maybe leave zero sizes symbols out of the lookup map if they have no sizes after doing the two steps above
jankratochvil marked 4 inline comments as done.Thu, Jun 27, 12:25 PM

Why was it necessary to change the iteration order here?

No, it was an accidental leftover.

Thanks for the review.

lldb/lit/SymbolFile/sizeless-symbol.test
1 ↗(On Diff #205538)

Yes, you are right.

4 ↗(On Diff #205538)

I only hope the image dump symtab output ordering is stable.

jankratochvil marked 2 inline comments as done.

I am saying to leave symbols with zero size as is _if_ there is another symbol that does have a valid size with the same start address.

This is what this patch does.

Exactly. The best solution in my mind is:

  • leave all symbols with sizes as is

This is what this patch does.

  • only set a symbol size if there is no other symbol at that address that didn't originally have a size

This is what this patch does.

  • maybe leave zero sizes symbols out of the lookup map if they have no sizes after doing the two steps above

I did not change this, IIUC it is already solved in existing code because InitAddressIndexes will expand any such zero-sized symbols first and it already uses the sized symbol in preference to the sizeless one during lookup.

labath accepted this revision.Mon, Jul 1, 2:11 AM

Looks good to me, but I'd suggest waiting a while to give @clayborg a chance to respond...

lldb/lit/SymbolFile/sizeless-symbol.test
4 ↗(On Diff #205538)

I'm pretty sure these are printed in the same order as present in .symtab. I'd be more worried about the order in which the symbols get emitted into the symtab in the first place. Theoretically this might change due to some patches to the assembler, but it should not change spuriously...

10 ↗(On Diff #206905)

Nit: Please line up this table by inserting a couple of spaces between ":" and "Index".

lldb/source/Symbol/Symtab.cpp
904 ↗(On Diff #206905)

Please clang-format this.

This revision is now accepted and ready to land.Mon, Jul 1, 2:11 AM
jankratochvil marked 5 inline comments as done.Mon, Jul 1, 3:05 AM
jankratochvil added inline comments.
lldb/lit/SymbolFile/sizeless-symbol.test
10 ↗(On Diff #206905)

Great, I did not expect it will still match with the added whitespaces.

lldb/source/Symbol/Symtab.cpp
904 ↗(On Diff #206905)

Sorry I forgot to run the clang-format.

jankratochvil marked 2 inline comments as done.
clayborg accepted this revision.Mon, Jul 1, 7:08 AM
clayborg added inline comments.
lldb/lit/SymbolFile/sizeless-symbol.test
5 ↗(On Diff #207252)

Symbols are ordered in the same order as they appear in the symbol table. On Darwin we coalesce symbols, so not all symbol are always in the symbol table, but we don't do this for ELF.

jankratochvil marked 2 inline comments as done.Mon, Jul 1, 7:30 AM
jankratochvil added inline comments.
lldb/lit/SymbolFile/sizeless-symbol.test
5 ↗(On Diff #207252)

OK, great; thanks for the review.

This revision was automatically updated to reflect the committed changes.
jankratochvil marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptMon, Jul 1, 7:32 AM