Page MenuHomePhabricator

[LLDB][ELF] Load both, .symtab and .dynsym sections
AbandonedPublic

Authored by kwk on Sep 10 2019, 2:38 AM.

Details

Summary

This change ensures that the .dynsym section will be parsed even when there's already is a .symtab.

It is motivated because of minidebuginfo (https://sourceware.org/gdb/current/onlinedocs/gdb/MiniDebugInfo.html#MiniDebugInfo).

There it says:

Keep all the function symbols not already in the dynamic symbol table.

That means the .symtab embedded inside the .gnu_debugdata does NOT contain the symbols from .dynsym. But in order to put a breakpoint on all symbols we need to load both.

To not add symbols that where already added I keep a list of unique ELF symbols on each invocation of ObjectFileELF::GetSymtab.

My other patch D66791 implements support for minidebuginfo, that's why I need this change.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kwk edited the summary of this revision. (Show Details)Sep 26 2019, 2:38 PM
kwk updated this revision to Diff 222073.Sep 26 2019, 8:38 PM
  • Cleanup
kwk updated this revision to Diff 222075.Sep 26 2019, 9:00 PM
kwk marked 10 inline comments as done.
  • Change order of compare members to match order of member definitions.
kwk added a comment.Sep 26 2019, 9:02 PM

Not all is answered now but please respect: https://reviews.llvm.org/D67390#1683705

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369

@clayborg I explicitly only compare what we care about and therefore always set the name index to be the same.

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
286

That is a good point.

443

That's why I explicitly set the name to 0 always which effectively ignores it because it has no effect on the hash then. Please see the specialization hash for NamedELFSymbol.

kwk updated this revision to Diff 222076.Sep 26 2019, 9:14 PM
  • make the section name part of NamedELFSymbol
kwk marked 9 inline comments as done.Sep 26 2019, 9:29 PM

I think I've finished the implementation now and should have answered all your comments and concerns. I run tests now. I would appreciate if you (@clayborg , @labath , @jankratochvil ) can take a look at this patch again.

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
286

@clayborg in fact I think this could be the reason not to use a set of lldb_private::Symbol objects because there we don't store the section name or symbol name but only addresses or indexes. I did add the st_section_name_string struct member.

433

I did add the section name to NamedELFSymbol and explicitly ignore it when building the hash for the base ELFSymbol.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
42

removed.

2201–2205

@clayborg I find it much easier with NamedELFSymbol because all we have to do is derive from ELFSymbol and add the strings for the symbol name and the section name. If we were to use lldb_private::Symbol I would have to lookup the symbols manually each time I calculate a hash which seems bad. I mean, the symbol and section name already are ConstStrings and should be stored and computed very efficiently. Also I wanted to keep things local to ELF and not mess with everything that uses lldb_private::Symbol. Makes sense?

2205

@jankratochvil we already have places inside this for-loop where we continue. I hope it is okay to ask the same question back that you've asked for those continue-places. Why don't we adjust the returned number (i) in case symbols where skipped?

kwk updated this revision to Diff 222080.Sep 26 2019, 10:02 PM
kwk marked an inline comment as done.
  • Use symbol name including @VERSION suffix.
kwk added a comment.EditedSep 26 2019, 10:04 PM

Interesting. It looks like we had a test that wants a symbol to be added twice:

[ RUN      ] MangledTest.NameIndexes_FindFunctionSymbols
/home/kkleine/llvm/lldb/unittests/Core/MangledTest.cpp:186: Failure
      Expected: 1
To be equal to: Count("puts@GLIBC_2.6", eFunctionNameTypeFull)
      Which is: 0
/home/kkleine/llvm/lldb/unittests/Core/MangledTest.cpp:187: Failure
      Expected: 2
To be equal to: Count("puts", eFunctionNameTypeFull)
      Which is: 1
/home/kkleine/llvm/lldb/unittests/Core/MangledTest.cpp:188: Failure
      Expected: 2
To be equal to: Count("puts", eFunctionNameTypeBase)
      Which is: 1
[  FAILED  ] MangledTest.NameIndexes_FindFunctionSymbols (1 ms)

Before I used the bare symbol name with stripped @VERSION suffix. Now I've changed the implementation of NamedELFSymbol to include the @VERSION suffix and the tests pass.

This looks fairly ok to me, but it could use a more explicit test of the symbol uniqueing code. Right, now I believe the two tests you added check that the symbols are _not_ uniqued. (They're also a bit hard to follow due to the whole
objcopy business). Could you create a test with a yaml file which will contain various edge cases relevant to this code. I'm thinking of stuff like "a dynsym and a symtab symbol at the same address, but a different name", "a dynsym and symtab symbols with identical names, but different addresses", etc. Then just run "image dump symtab" on that file to check what we have parsed?

I am also surprised that you weren't able to just use a Section* (instead of the name) for the uniqueing. I'd expect that all symbols (even those from the separate object file) should be resolved to the sections in the main object. I see that this isn't the case, but I am surprised that this isn't causing any problems. Anyway, as things seem to be working as they are now, we can leave that for another day.

In D67390#1685313, @kwk wrote:

Before I used the bare symbol name with stripped @VERSION suffix. Now I've changed the implementation of NamedELFSymbol to include the @VERSION suffix and the tests pass.

Interesting. I'm pretty sure that the symbol count is irrelevant for that test (it just wants to know if it is there), so we can change that if needed. However, having the uniqueing include the @VERSION sounds right to me, so if that makes the test happy too then, great.

lldb/lit/Modules/ELF/Inputs/load-from-dynsym-alone.c
1 ↗(On Diff #222080)

It looks like you could inline these test inputs into the test files. You'd just need to change all the comments to // and add .c to the list of valid suffixes in lit.local.cfg.

lldb/lit/Modules/ELF/load-from-dynsym-alone.test
16 ↗(On Diff #222080)

s/dynmic/dynamic/

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369

I'll echo @clayborg here. This business with copying the ELFSymbol and clearing some fields is confusing. Do you even need the ELFSymbol::operator== for anything? If not I'd just delete that, and have the derived version compare all fields.

Also, it would be nice if the operator== and hash function definitions were next to each other. Can you just forward declare the std::hash stuff in the header, and have the implementation be next to this?

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2202–2204

Move this into the NamedELFSymbol constructor?

2205

something like if (unique_elf_symbols.insert(needle).second) would be more efficient, as you don't need to mess with the map twice.

2648

what's wrong with the old-fashioned UniqueElfSymbolColl unique_elf_symbols; ?

jankratochvil added inline comments.Sep 27 2019, 4:43 AM
lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
446

I find better to rather define std::hash<ConstString> (or provide ConstString::Hasher which I do in my DWZ patchset).

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2205

I would unconditionally add all the symbols to std::vector<NamedElfSymbol> unique_elf_symbols (with unique_elf_symbols.reserve in advance for the sym of .symtab+.dynsym sizes) and then after processing both .symtab and .dynsym and can llvm::sort(unique_elf_symbols) and add to symtab only those which are unique. I believe it will be much faster, one could benchmark it.

labath added inline comments.Sep 27 2019, 4:48 AM
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2205

sounds like a good idea.

kwk marked 8 inline comments as done.Sep 30 2019, 4:26 AM

This looks fairly ok to me, but it could use a more explicit test of the symbol uniqueing code. Right, now I believe the two tests you added check that the symbols are _not_ uniqued. (They're also a bit hard to follow due to the whole

objcopy business). Could you create a test with a yaml file which will contain various edge cases relevant to this code. I'm thinking of stuff like "a dynsym and a symtab symbol at the same address, but a different name", "a dynsym and symtab symbols with identical names, but different addresses", etc. Then just run "image dump symtab" on that file to check what we have parsed?

I'll give my best to implement this today.

I am also surprised that you weren't able to just use a Section* (instead of the name) for the uniqueing. I'd expect that all symbols (even those from the separate object file) should be resolved to the sections in the main object. I see that this isn't the case, but I am surprised that this isn't causing any problems. Anyway, as things seem to be working as they are now, we can leave that for another day.

Okay.

In D67390#1685313, @kwk wrote:

Before I used the bare symbol name with stripped @VERSION suffix. Now I've changed the implementation of NamedELFSymbol to include the @VERSION suffix and the tests pass.

Interesting. I'm pretty sure that the symbol count is irrelevant for that test (it just wants to know if it is there), so we can change that if needed. However, having the uniqueing include the @VERSION sounds right to me, so if that makes the test happy too then, great.

Yes, I hoped so. Thank you. Please await another revision of this patch with the tests requested.

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369

I'll echo @clayborg here. This business with copying the ELFSymbol and clearing some fields is confusing.

I've cleared up the documentation now and it is exactly the way I like it. Every entity deals with it's own business (respects its own fields when comparing). I find it pretty dirty to compare fields from the base struct in a derived one. The way I compare fields from the base struct is minimally invasive.

Do you even need the ELFSymbol::operator== for anything?

Yes, when you want to compare ELFSymbols. I know that I don't do that explicitly but I the function only deals with fields from the entity itself and they don't spill into any derived structure (with the exception of explicitly ignoring fields).

If not I'd just delete that, and have the derived version compare all fields.

No because I call it explcitly from the derived NamedELFSymbol.

Also, it would be nice if the operator== and hash function definitions were next to each other. Can you just forward declare the std::hash stuff in the header, and have the implementation be next to this?

I've found a compromise that is even more appealing to me. The ELFSymbol and NamedELFSymbol structs now have a hash function which contains the implementation next to the one of operator==(). This hash is called in the specialization which remains in the same location as before; the reason being that I didn't find a way do define something in the std:: namespace when I'm in the elf:: namespace.

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
446

Once your DWZ patchset arrives, please let me know and we can change it.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2205

@jankratochvil @labath yes, this sound like a good idea for *performance* improvement but honestly, I need to get this patch done first in order to make any progress with minidebuginfo. I hope you don't mind when I take this task to another patch, okay?

2648

Apparently nothing now :) . But before I had troubles because of some deleted default constructor. Thanks for spotting this and bringing it back to my attention.

kwk updated this revision to Diff 222387.Sep 30 2019, 4:27 AM
kwk marked an inline comment as done.
  • typo: dynmic -> dynamic
  • Applied changes requested in review
kwk marked 2 inline comments as done.Sep 30 2019, 4:29 AM
kwk added a comment.EditedSep 30 2019, 6:16 AM

@labath I did prepare some YAML file but apparently yaml2obj isn't meant to deal with this properly. Instead I get an Error like this: yaml2obj: error: repeated symbol name: 'main'. It looks like symbols from the Symbols: part of the YAML file are just added by name to a map. Changing yaml2obj for this seems a bit too heavy right now. If you're okay I'll go with a few more c programs if I can pull them off.

Here's the part hat causes the error:

template <class ELFT> void ELFState<ELFT>::buildSymbolIndexes() {
  auto Build = [this](ArrayRef<ELFYAML::Symbol> V, NameToIdxMap &Map) {
    for (size_t I = 0, S = V.size(); I < S; ++I) {
      const ELFYAML::Symbol &Sym = V[I];
      if (!Sym.Name.empty() && !Map.addName(Sym.Name, I + 1))
        reportError("repeated symbol name: '" + Sym.Name + "'");
    }
  };

  Build(Doc.Symbols, SymN2I);
  Build(Doc.DynamicSymbols, DynSymN2I);
}
kwk updated this revision to Diff 222427.Sep 30 2019, 7:33 AM
  • Added YAML test to merge symbols
kwk updated this revision to Diff 222441.Sep 30 2019, 8:28 AM
  • include test code in .c test file
kwk marked an inline comment as done.Sep 30 2019, 8:28 AM
kwk updated this revision to Diff 222479.Sep 30 2019, 12:52 PM
  • Fix comment
labath added inline comments.Sep 30 2019, 11:40 PM
lldb/lit/Modules/ELF/merge-symbols.yaml
52

smymtab

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369

Yes, when you want to compare ELFSymbols. I know that I don't do that explicitly but I the function only deals with fields from the entity itself and they don't spill into any derived structure (with the exception of explicitly ignoring fields).

Yes, but to me that exception kind of invalidates the whole idea. In order to know which fields you need to ignore, you need the knowledge of what fields are present in the struct (and as the fields are public, that is not a big deal), at which point you can just avoid the whole copying business, and explicitly compare the fields that you care about. Given that this also saves a nontrivial amount of code, I still think it's a better way to go. (Also, defining equality operators on class hierarchies is usually not a good idea even if they "nest" nicely, since they can still produce strange results due to object slicing.)

I've found a compromise that is even more appealing to me. The ELFSymbol and NamedELFSymbol structs now have a hash function which contains the implementation next to the one of operator==().

That works for me.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2205

I don't think that kind of reasoning really applies here, since it's this patch that is introducing the potential performance problem. However, I don't think that is going to be a big deal, so I think we can leave this out for now.

kwk updated this revision to Diff 222569.Oct 1 2019, 2:38 AM
kwk marked 8 inline comments as done.
  • Remove verbose output in test
  • Fix typo: smymtab -> symtab
  • Move compare and hash logic out of base class into derived class as requested
kwk marked an inline comment as done.Oct 1 2019, 2:40 AM

@labath can you please check this patch one last time (hopefully)?

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369

@labath okay, I've remove the logic from ELFSymbol and coded everything straight away. I guess, that I wanted to be able to extend ELFSymbol with n number of fields and add them to the ELFSymbol::operator==() without touching the NamedELFSymbol::operator==() as long as the added fields shall not be ignored. Makes sense? I guess that you can find arguments for both ways to implement it. Anyway, I've coded it the way you want now, I hope.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2205

Okay, thank you for allowing me to leave it out for now.

kwk updated this revision to Diff 222576.Oct 1 2019, 3:21 AM
  • Correct comment of test
kwk updated this revision to Diff 222577.Oct 1 2019, 3:22 AM
  • Fix comment
kwk updated this revision to Diff 222584.Oct 1 2019, 3:55 AM
  • Simplify NamedELFSymbol::hash()
labath accepted this revision.Oct 1 2019, 5:10 AM

Ok, let's give this one more shot. Thanks for your patience. I do have a couple of additional comments inline, but I don't think we need another round of review for those.

lldb/lit/Modules/ELF/merge-symbols.yaml
23

Please add a CHECK-EMPTY: after this line to ensure there are no additional symbols here.

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
254–257

Is this really needed? It looks like the default compiler-generated copy constructor would be sufficient here.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
1918–1919

delete

2636

delete

jankratochvil added inline comments.Oct 1 2019, 6:32 AM
lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
379

llvm::hash_combine already calls std::hash<T> for each of its parameters.

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
446

https://people.redhat.com/jkratoch/ConstStringHasher.patch - Although then the whole UniqueElfSymbolColl should be replaced by std::sort+std::unique of std::vector you plan in a future patch so the hashing does not matter much.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
1925

also delete

2665

For the planned rework of the unification of symbols it could be put (I think) to Symtab::InitAddressIndexes which already sorts the Symtab anyway.

kwk updated this revision to Diff 222634.Oct 1 2019, 8:45 AM
kwk marked 10 inline comments as done.
  • Check that no additional symbols follow after the expected ones
  • Use compiler-generated copy-ctor
  • Cleanup from experiment
  • Simplify NamedELFSymbol::hash()
  • Cleanup
lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
379

Good to know. Thank you.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2636

Sorry, I noticed this myself as well. Some of the performance experiment spilled over in this patch.

2665

Honestly, for the rework I think about not to use an std::vector as you proposed but instead create the std::unordered_set using a bucket count that is equal to the number of symbols in .symtab and in .dynsym. Then inserts to that set will be constant as they are for the vector. But let's see how it goes in a followup patch. I have the feeling that if I use the approach you suggested, I need to keep a vector *and* a set around. The vector for collecting all symbols and the set for doing the unification, no? Anyway, let's postpone this.

kwk abandoned this revision.Oct 2 2019, 8:00 AM

Here's the relevant transcript from #lldb@otfc for why this change is abandoned.

[10/02/19 15:22:25] <jankratochvil> labath: Is it acceptable for you?  "BTW given how this unique-ness of symbols turns out to be non-trivial is it really needed?  Because for regular .symtab we can ignore .dynsym (as current LLDB does) and for .gnu_debugdata->.symtab we can just concatenate it with .dynsym and there will be no symbol duplicates anyway."
[10/02/19 15:22:25] <jankratochvil> Otherwise it is either a big code complication or a big performance regression and it is not really needed for anything.
[10/02/19 15:27:42] <labath> yeah, i suppose that would work
[10/02/19 15:28:11] <labath> then you just need to make sure the merging does not kick in for non-gnu_debugdata sections
[10/02/19 15:28:30] <labath> a simple check on the existing of this section might suffice
[10/02/19 15:29:39] <labath> it seems to me that it would be useful to have information from both sources available, but it is true that nobody has really needed that so far
[10/02/19 15:36:11] <jankratochvil> Thanks.
[10/02/19 15:36:22] <jankratochvil> kkleine: ^^^ Let's call it a win-win. :-)
[10/02/19 16:14:27] <kkleine> labath, jankratochvil I wonder if we the merging with .gnu_debugdata needs to be as "clever" as it is today and if it is enough to just add symbols without checking for uniqueness. That would simplify things even more.
[10/02/19 16:16:32] <jankratochvil> I think just add it without checks.
[10/02/19 16:16:53] <labath> works for me