This is an archive of the discontinued LLVM Phabricator instance.

Paths

lldb/lit/Modules/ELF/
  load-from-dynsym-alone.c
  load-symtab-and-dynsym.c
  merge-symbols.yaml
  lit.local.cfg
lldb/lit/helper/
  toolchain.py
lldb/source/Plugins/ObjectFile/ELF/
  ELFHeader.h
  ELFHeader.cpp
  ObjectFileELF.h
  ObjectFileELF.cpp

Differential D67390

[LLDB][ELF] Load both, .symtab and .dynsym sections
Abandoned · Public

Authored by kwk on Sep 10 2019, 2:38 AM.


Details

Reviewers

labath
espindola
alexander-shaposhnikov
JDevlieghere

Commits

rG3a4781bbf4f3: [LLDB][ELF] Load both, .symtab and .dynsym sections
rLLDB371599: [LLDB][ELF] Load both, .symtab and .dynsym sections
rL371599: [LLDB][ELF] Load both, .symtab and .dynsym sections

Summary

This change ensures that the .dynsym section will be parsed even when there is already a .symtab.

It is motivated by minidebuginfo (https://sourceware.org/gdb/current/onlinedocs/gdb/MiniDebugInfo.html#MiniDebugInfo).

There it says:

Keep all the function symbols not already in the dynamic symbol table.

That means the .symtab embedded inside the .gnu_debugdata does NOT contain the symbols from .dynsym. But in order to set a breakpoint on all symbols, we need to load both.

To avoid adding symbols that were already added, I keep a list of unique ELF symbols on each invocation of ObjectFileELF::GetSymtab.

My other patch D66791 implements support for minidebuginfo, that's why I need this change.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 38827
Build 38826: arc lint + arc unit

Event Timeline


Herald added a reviewer: espindola. Sep 10 2019, 2:38 AM

Herald added a reviewer: alexander-shaposhnikov.

Herald added a project: Restricted Project.

Herald added subscribers: lldb-commits, MaskRay, arichardson, emaste.

Harbormaster completed remote builds in B37948: Diff 219502. Sep 10 2019, 2:39 AM

JDevlieghere added a subscriber: JDevlieghere. Sep 10 2019, 12:41 PM

JDevlieghere added inline comments.

lldb/lit/Modules/ELF/load-from-dynsym-alone.test
24 ↗	(On Diff #219502)	Please also add `llvm-strip` to the list of support tools (`toolchain.py:127`).
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2654	Can you motivate the need for this change? This comment seems to suggest that reading the symtab table should be sufficient as it should contain all the information from the dynsym. If that is not true, it would be worth updating this comment.
2687	Why did you remove the last part of the original comment? This seemed to be the most useful part... The newly added sentences explain what we are doing (which is relatively clear from the code). I'd rather see a comment explaining "why" something needs to happen.

Added llvm-strip to the list of support tools

Harbormaster completed remote builds in B37974: Diff 219587. Sep 10 2019, 12:46 PM

@JDevlieghere I changed the support tools. It was @labath who requested (D66791#inline-601050) that this be split out into a separate patch.

Let's put this bit into a separate patch. As I said over IRC, I view this bit as functionally independent of the debugdata stuff (it's definitely independent in its current form, as it will kick in for non-debugdata files too, which I think is fine).

The change is motivated by minidebuginfo (https://sourceware.org/gdb/current/onlinedocs/gdb/MiniDebugInfo.html#MiniDebugInfo).

There it says:

Keep all the function symbols not already in the dynamic symbol table.

That means the .symtab embedded inside the .gnu_debugdata does NOT contain the symbols from .dynsym. But in order to set a breakpoint on all symbols, we need to load both. I hope this makes sense.

My other patch D66791 implements support for minidebuginfo, that's why I need this change.

update comment for .symtab section with minidebuginfo

Harbormaster completed remote builds in B37977: Diff 219594. Sep 10 2019, 1:21 PM

Fixed the comment as per request.

Thanks! This LGTM with the comment on line 2660 rephrased and the motivation as part of the summary/commit message.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2687	The symtab section is non-allocable and can be stripped, while the dynsym section should always be there. If both exist, we load both to support the minidebuginfo case. Otherwise, we just load the dynsym section.
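The loading rule described in this comment can be sketched as a small decision function. This is a hypothetical illustration: `SectionsToParse` and its boolean inputs are made-up names, not part of ObjectFileELF.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: decide which ELF symbol tables to parse.
// .symtab is non-allocable and may have been stripped, while .dynsym is
// present in dynamically linked files. When both exist, both are parsed,
// so that a partial .symtab (as produced for minidebuginfo) is
// complemented by the dynamic symbols.
std::vector<std::string> SectionsToParse(bool has_symtab, bool has_dynsym) {
  std::vector<std::string> sections;
  if (has_symtab)
    sections.push_back(".symtab");
  if (has_dynsym)
    sections.push_back(".dynsym");
  return sections;
}
```

For a stripped binary, only ".dynsym" is returned; for a minidebuginfo-style binary with both tables, both are returned in that order.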

This revision is now accepted and ready to land. Sep 10 2019, 2:07 PM

Rephrase comment

Harbormaster completed remote builds in B37979: Diff 219605. Sep 10 2019, 2:16 PM

kwk edited the summary of this revision. Sep 10 2019, 2:16 PM

lgtm. The reason I requested this to be put in a separate patch is that it is implemented in a way that kicks in even without minidebuginfo. I think this is fine, because entries can be removed from the symtab even without the whole minidebuginfo business. While this format will likely be the only real user of these partial symtabs, in theory there isn't anything stopping someone from removing symtab entries independently of that. While unlikely, I see no harm in supporting that, if it does not incur any extra maintenance costs.

lldb/lit/Modules/ELF/load-from-dynsym-alone.test
14 ↗	(On Diff #219605)	s/.dynamic/.dynsym/
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2657–2659	How about we make this less layered, and rephrase the existing comment a bit: "Information in the dynsym section is usually also found in the symtab, but this is not required as symtab entries can be removed after linking. The minidebuginfo format makes use of this facility to create smaller symbol tables."
2695–2698	I wouldn't bother with this. You can just unconditionally create a Symtab object before you start parsing any symbol tables.

Updated commit message and squashed commits into one rebased onto master.

Harbormaster completed remote builds in B37999: Diff 219677. Sep 11 2019, 3:01 AM

Closed by commit rL371599: [LLDB][ELF] Load both, .symtab and .dynsym sections (authored by kwk). Sep 11 2019, 3:01 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Sep 11 2019, 3:01 AM

Herald added a subscriber: llvm-commits.

kwk mentioned this in rL371600: [LLDB][ELF] Fixup for comments in D67390. Sep 11 2019, 3:11 AM

@labath I've addressed your comment rewrites in a fixup commit that I've committed without a review (llvm-svn: 371600): https://reviews.llvm.org/rLLDB371600

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2695–2698	I don't fully agree that it is that simple because further down in the code we do check for `if (m_symtab_up == nullptr)` and that is a condition I need to respect because of relocation, don't I?

kwk mentioned this in rG813f05915d29: [LLDB][ELF] Fixup for comments in D67390. Sep 11 2019, 3:15 AM

kwk mentioned this in D66791: [lldb][ELF] Read symbols from .gnu_debugdata sect.. Sep 11 2019, 3:18 AM

This broke the windows bot:

http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/8741

It broke on Linux too. You did run the tests before committing, didn't you?

labath added inline comments. Sep 11 2019, 6:04 AM

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2695–2698	Well.. I'm pretty sure you could delete those null checks too. But, given that these null checks seem to be the prevailing pattern in this function, changing that might be better left for a separate patch...

kwk mentioned this in rL371624: Revert "[LLDB][ELF] Fixup for comments in D67390". Sep 11 2019, 7:33 AM

kwk mentioned this in rGd44c4a71df9d: Revert "[LLDB][ELF] Fixup for comments in D67390".

@stella.stamenova I've reverted this patch and thanks for letting me know.

@labath I didn't notice that tests failed, I'm really sorry.

@labath how shall we go about this? We do have the situation that when you look up a symbol you might find it twice if it is in .dynsym and in .symtab. Shall I adjust the test expectation to that or change my implementation?

kwk reopened this revision. Sep 15 2019, 9:17 PM

This revision is now accepted and ready to land. Sep 15 2019, 9:17 PM

[LLDB][ELF] Fixup for comments in D67390
Change test expectation to find 2 instead of 1 symbol

Harbormaster completed remote builds in B38151: Diff 220273. Sep 15 2019, 9:19 PM

In D67390#1667270, @kwk wrote:

@labath how shall we go about this? We do have the situation that when you look up a symbol you might find it twice if it is in .dynsym and in .symtab. Shall I adjust the test expectation to that or change my implementation?

That's a good question (and another reason why I wanted this to be a separate patch). Since only two tests broke it does not seem like having some symbols twice does much harm. OTOH, having an identical symbol twice does seem like asking for trouble down the line. One possibility would be to restrict this merging to the gnu_debugdata case only. Another option would be to make the merging code smarter and avoid adding the symbol a second time if it has the same name and address. That would have the advantage of having the symbol just once in the common case, while still preserving the full information (in case the symbol tables were munged independently of the gnu_debugdata thingy).

Overall, I guess I would prefer the last solution (inserting only different symbols) unless that turns out to be difficult. In that case, I think just restricting this to gnu_debugdata is fine.
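A minimal sketch of the "avoid adding the symbol a second time if it has the same name and address" option, assuming a simplified `ParsedSymbol` struct in place of `lldb_private::Symbol` (both `ParsedSymbol` and `MergeTable` are hypothetical names). The same `seen` set is shared across every table that gets merged, for example .symtab followed by .dynsym.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a parsed symbol (hypothetical, not lldb_private::Symbol).
struct ParsedSymbol {
  std::string name;
  uint64_t file_addr;
};

// Keys of every symbol added so far, shared across all merged tables.
using SeenKeys = std::set<std::pair<std::string, uint64_t>>;

// Appends the symbols of `table` to `out`, skipping any symbol whose
// (name, file address) pair was already added. Returns how many were appended.
std::size_t MergeTable(const std::vector<ParsedSymbol> &table, SeenKeys &seen,
                       std::vector<ParsedSymbol> &out) {
  std::size_t added = 0;
  for (const ParsedSymbol &sym : table) {
    // emplace().second is true only if the key was not present yet.
    if (seen.emplace(sym.name, sym.file_addr).second) {
      out.push_back(sym);
      ++added;
    }
  }
  return added;
}
```

Because `seen` is shared, the first table to contribute a given (name, address) pair wins, while a symbol with the same name at another address, or an alias with another name at the same address, is still added.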

BTW, if you want, I think you can submit the rest of the gnu_debugdata changes without waiting for this, if you just adjust the test expectations to account for the fact that symtab+dynsym merging does not work (yet).

lldb/packages/Python/lldbsuite/test/functionalities/load_unload/TestLoadUnload.py
276 ↗	(On Diff #220273)	You can't do that, because this will have two matches only on ELF platforms. Since we don't really care about the number here, I suggest finding a string that can be grepped for, which does not depend on the number of matches (maybe just `a_function`).

In D67390#1671018, @labath wrote:

In D67390#1667270, @kwk wrote:

@labath how shall we go about this? We do have the situation that when you look up a symbol you might find it twice if it is in .dynsym and in .symtab. Shall I adjust the test expectation to that or change my implementation?

That's a good question (and another reason why I wanted this to be a separate patch). Since only two tests broke it does not seem like having some symbols twice does much harm. OTOH, having an identical symbol twice does seem like asking for trouble down the line. One possibility would be to restrict this merging to the gnu_debugdata case only. Another option would be to make the merging code smarter and avoid adding the symbol a second time if it has the same name and address. That would have the advantage of having the symbol just once in the common case, while still preserving the full information (in case the symbol tables were munged independently of the gnu_debugdata thingy).

Overall, I guess I would prefer the last solution (inserting only different symbols) unless that turns out to be difficult. In that case, I think just restricting this to gnu_debugdata is fine.

The crucial line is the following line in ObjectFileELF::ParseSymbols():

symtab->AddSymbol(dc_symbol);

If I change that to:

std::vector<uint32_t> indexVector;
symtab->FindSymbolsByNameAndAddress(dc_symbol.GetName(), dc_symbol.GetAddress(), indexVector);
if (indexVector.empty()) {
  symtab->AddSymbol(dc_symbol);
}

Is that the logic you have in mind? I would need to implement FindSymbolsByNameAndAddress.

BTW, if you want, I think you can submit the rest of the gnu_debugdata changes without waiting for this, if you just adjust the test expectations to account for the fact that symtab+dynsym merging does not work (yet).

Let's wait first, okay?

In D67390#1671018, @labath wrote:

In D67390#1667270, @kwk wrote:

@labath how shall we go about this? We do have the situation that when you look up a symbol you might find it twice if it is in .dynsym and in .symtab. Shall I adjust the test expectation to that or change my implementation?

That's a good question (and another reason why I wanted this to be a separate patch). Since only two tests broke it does not seem like having some symbols twice does much harm.

You should ensure that there aren't multiple symbols for the same address. Mach-O has this problem with the STAB symbols and the standard symbol table symbols. That plug-in will unique them into one, using the STAB symbol table index as the lldb::user_id_t.

OTOH, having an identical symbol twice does seem like asking for trouble down the line. One possibility would be to restrict this merging to the gnu_debugdata case only. Another option would be to make the merging code smarter and avoid adding the symbol a second time if it has the same name and address. That would have the advantage of having the symbol just once in the common case, while still preserving the full information (in case the symbol tables were munged independently of the gnu_debugdata thingy).

We really want to present the simplest and best view of the module to the user. So please create only one symbol, no matter how many sources we have.

Overall, I guess I would prefer the last solution (inserting only different symbols) unless that turns out to be difficult. In that case, I think just restricting this to gnu_debugdata is fine.

Code to unique symbols isn't that hard to write during symbol table parsing. Though, don't keep re-sorting the lldb_private::Symbol objects; just keep a side map from file address to symbol name. I am guessing it will be easier than in Mach-O, because there the different symbols have different info (STAB vs. normal symbol table symbol), whereas in ELF they should be the same.

BTW, if you want, I think you can submit the rest of the gnu_debugdata changes without waiting for this, if you just adjust the test expectations to account for the fact that symtab+dynsym merging does not work (yet).

I'd love to see this fixed prior to check-in, if possible.

In D67390#1671128, @kwk wrote:
In D67390#1671018, @labath wrote:

In D67390#1667270, @kwk wrote:

@labath how shall we go about this? We do have the situation that when you look up a symbol you might find it twice if it is in .dynsym and in .symtab. Shall I adjust the test expectation to that or change my implementation?

That's a good question (and another reason why I wanted this to be a separate patch). Since only two tests broke it does not seem like having some symbols twice does much harm. OTOH, having an identical symbol twice does seem like asking for trouble down the line. One possibility would be to restrict this merging to the gnu_debugdata case only. Another option would be to make the merging code smarter and avoid adding the symbol a second time if it has the same name and address. That would have the advantage of having the symbol just once in the common case, while still preserving the full information (in case the symbol tables were munged independently of the gnu_debugdata thingy).

Overall, I guess I would prefer the last solution (inserting only different symbols) unless that turns out to be difficult. In that case, I think just restricting this to gnu_debugdata is fine.

The crucial line is the following line in ObjectFileELF::ParseSymbols():
symtab->AddSymbol(dc_symbol);
If I change that to:
symtab->FindSymbolsByNameAndAddress(dc_symbol.GetName(), dc_symbol.GetAddress(), indexVector);
if (indexVector.empty()) {
  symtab->AddSymbol(dc_symbol);
}

In the Mach-O plug-in we keep a std::map or something simpler. We don't sort or search the current lldb_private::Symbol objects since they aren't sorted, nor are the name indexes created until we are done adding all symbols. Can we use a side map that is just something like:

typedef std::map<lldb::addr_t, ConstString> SymbolMapType;

Is that the logic you have in mind? I would need to implement FindSymbolsByNameAndAddress.

BTW, if you want, I think you can submit the rest of the gnu_debugdata changes without waiting for this, if you just adjust the test expectations to account for the fact that symtab+dynsym merging does not work (yet).

Let's wait first, okay?

@clayborg, which address exactly should be stored in std::map<lldb::addr_t, ConstString> SymbolMapType;? For example, dc_symbol.GetAddress().GetFileAddress() would work, but as soon as we have minidebuginfo, we might end up with the same symbol coming from two different object files, and so their addresses would still differ. Also, do you want me to keep this map in ObjectFileELF?

In D67390#1671463, @kwk wrote:

@clayborg, which address exactly should be stored in std::map<lldb::addr_t, ConstString> SymbolMapType;? For example, dc_symbol.GetAddress().GetFileAddress() would work, but as soon as we have minidebuginfo, we might end up with the same symbol coming from two different object files, and so their addresses would still differ. Also, do you want me to keep this map in ObjectFileELF?

We might need a private function on ObjectFileELF that takes an extra parameter. My idea would be something like:

... ObjectFileELF::GetSymtab() {
  typedef std::map<lldb::addr_t, ConstString> SymbolMapType;
  SymbolMapType symbol_map;
  ParseSymbolTablePrivate(..., symbol_map); // .symtab
  ParseSymbolTablePrivate(..., symbol_map); // .dynsym
  ParseSymbolTablePrivate(..., symbol_map); // .other?
  ...
}

In D67390#1671463, @kwk wrote:

@clayborg, which address exactly should be stored in std::map<lldb::addr_t, ConstString> SymbolMapType;? For example, dc_symbol.GetAddress().GetFileAddress() would work, but as soon as we have minidebuginfo, we might end up with the same symbol coming from two different object files, and so their addresses would still differ. Also, do you want me to keep this map in ObjectFileELF?

The file address should be sufficient for normal ELF files. When we have a minidebuginfo, we should almost treat it as part of the ObjectFileELF that points to the minidebuginfo and parse it as if it were part of that file. SymbolVendor used to exist to allow one view of an executable using multiple individual ObjectFile objects, but that got removed. So now it might be best to load the minidebuginfo file as an ObjectFileELF _just_ to access the data in the ".gnu_debugdata" section, decompress it, and use that data to add to the symbol table of the current file? So the code would be:

ObjectFileELF::GetSymtab() {
  typedef std::map<lldb::addr_t, ConstString> SymbolMapType;
  SymbolMapType symbol_map;
  ParseSymbolTablePrivate(..., symbol_map); // .symtab from current file
  ParseSymbolTablePrivate(..., symbol_map); // .dynsym from current file
  // Detect ".gnu_debugdata" file and load it
  FileSpec gnu_debugdata_file(...);
  if (FileSystem::Instance().Exists(gnu_debugdata_file)) {
    ObjectFileELF gnu_debugdata(gnu_debugdata_file);
    gnu_debugdata_symtab_data = gnu_debugdata.GetSectionData();
    // Decompress above data and parse the symbol table as if it is part of this file?
    ParseSymbolTablePrivate(...); // .gnu_debugdata symtab
  }
}

This assumes that the .gnu_debugdata file has the same section definitions for things like .text and .data etc.

@clayborg thank you for this explanation. My patch for minidebuginfo is already done in D66791. But I was asked to pull this one out as a separate patch, for good reasons as we see. I wonder how this extra SymbolMapType parameter of yours helps. In the end I have to identify duplicates. But if no symbol with the same name should be added, then why do I care about where the symbol is coming from?

Please help me understand or follow my thoughts here:

When I'm in the (lldb) console and type b foo I expect LLDB to set a breakpoint on the function foo, right? The type of the symbol foo is deduced as function. I ask this question because Symtab has no function to just search for a symbol by name; instead you always have to pass an address, a type or an ID:

Symbol *FindSymbolByID(lldb::user_id_t uid) const;
Symbol *FindSymbolWithType(lldb::SymbolType symbol_type,
size_t FindAllSymbolsWithNameAndType(ConstString name,
size_t FindAllSymbolsWithNameAndType(ConstString name,
size_t FindAllSymbolsMatchingRexExAndType(
Symbol *FindFirstSymbolWithNameAndType(ConstString name,
Symbol *FindSymbolAtFileAddress(lldb::addr_t file_addr);
Symbol *FindSymbolContainingFileAddress(lldb::addr_t file_addr);
size_t FindFunctionSymbols(ConstString name, uint32_t name_type_mask,

So my point of this whole question is: What makes a symbol unique in the sense that it shouldn't be added to the symtab if it is already there?

Shouldn't the type of the symbol together with its name define uniqueness? We shouldn't care about where the symbol is coming from, nor whether it is located at a different address. Well, if there's an overloaded function foo(int) and foo(char*), then both symbols are of type function and they both share the same name. When you type b foo you DO want 2 breakpoints to be set. Hence, uniqueness cannot be defined over the name and the type. But wait, the name is mangled, so it IS unique enough unless I use Symbol::GetNameNoArguments(); there only the name is returned.

Here's my naive approach to test the admittedly very weird thought process from above: https://github.com/kwk/llvm-project/commit/5da4559a00c73ebefd8f8199890bd1991c94fa3f

In D67390#1672210, @kwk wrote:

So my point of this whole question is: What makes a symbol unique in the sense that it shouldn't be added to the symtab if it is already there?

A symbol name is not unique because you can have multiple (static) functions with the same (mangled) name in one module. An address is not unique as well because you can have symbol aliases, which will have the same address (and we want to keep both names to resolve name breakpoints correctly for instance).

The name+address combination (my original suggestion) should be sufficiently unique for the purposes we care about. Theoretically, if you want, you could include some additional items in the uniqueness "key" like symbol type etc. (to rule out the perverse case of somebody setting a "file" symbol to conflict with some other function symbol), but I don't think that is really necessary.

In D67390#1672210, @kwk wrote:

@clayborg thank you for this explanation. My patch for minidebuginfo is already done in D66791. But I was asked to pull this one out as a separate patch, for good reasons as we see. I wonder how this extra SymbolMapType parameter of yours helps. In the end I have to identify duplicates. But if no symbol with the same name should be added, then why do I care about where the symbol is coming from?

The uniqueness is for symbols with the same name, file address, and size. You can have multiple symbols with the same name, and each one could have a different address. We want there to be only one symbol per ObjectFile with the same name + addr + size. That way, when we ask for symbols by name, we don't end up getting more than one symbol for what is the same thing.

Please help me understand or follow my thoughts here:

When I'm in the (lldb) console and type b foo I expect LLDB to set a breakpoint on the function foo, right? The type of the symbol foo is deduced as function.

Breakpoints ask for symbols whose type is lldb::eSymbolTypeCode and that match the name. The name matching is much more complex than you would think, though. "b foo" by default turns into "set a breakpoint on functions whose basename matches foo". This means a C function named "foo", any C++ function (free function or class method) whose basename is "foo" ("bar::foo(int)", "foo(int)", "foo(float)", "std::a::b::foo()", and many more), any Objective-C method whose selector is "foo" ("-[MyClass foo]", "+[AnotherClass foo]"), and any other basename from any other language.

If you type "b foo::bar" this will end up looking up all functions whose basename is "bar" and then making sure any found matches contain "foo::bar".
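To make the basename rule concrete, here is a deliberately simplified sketch. `Basename` and `MatchesBreakpointName` are hypothetical helpers, not LLDB functions; LLDB's real matching is done by its language plug-ins and handles many more cases (templates, operators, anonymous namespaces), and the substring check used for "foo::bar" here is naive.

```cpp
#include <string>

// Hypothetical, heavily simplified basename extraction.
std::string Basename(const std::string &name) {
  // Objective-C method: "-[Class selector]" or "+[Class selector]".
  if (name.size() > 3 && (name[0] == '-' || name[0] == '+') &&
      name[1] == '[' && name.back() == ']') {
    std::string::size_type space = name.find(' ');
    if (space != std::string::npos)
      return name.substr(space + 1, name.size() - space - 2);
  }
  // C/C++: drop the argument list, then any leading scope qualifiers.
  std::string base = name.substr(0, name.find('('));
  std::string::size_type scope = base.rfind("::");
  if (scope != std::string::npos)
    base = base.substr(scope + 2);
  return base;
}

// "b foo::bar": look up by basename "bar", then keep only symbols whose
// full name contains "foo::bar". For "b foo" this reduces to a basename match.
bool MatchesBreakpointName(const std::string &symbol_name,
                           const std::string &requested) {
  if (Basename(symbol_name) != Basename(requested))
    return false;
  return symbol_name.find(requested) != std::string::npos;
}
```

With these helpers, "bar::foo(int)", "std::a::b::foo()", and "-[MyClass foo]" all have the basename "foo", while "foobar(int)" does not.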

I ask this question because Symtab has no function to just search for a symbol by name; instead you always have to pass an address, a type or an ID:

Symbol *FindSymbolByID(lldb::user_id_t uid) const;
Symbol *FindSymbolWithType(lldb::SymbolType symbol_type,
size_t FindAllSymbolsWithNameAndType(ConstString name,
size_t FindAllSymbolsWithNameAndType(ConstString name,
size_t FindAllSymbolsMatchingRexExAndType(
Symbol *FindFirstSymbolWithNameAndType(ConstString name,
Symbol *FindSymbolAtFileAddress(lldb::addr_t file_addr);
Symbol *FindSymbolContainingFileAddress(lldb::addr_t file_addr);
size_t FindFunctionSymbols(ConstString name, uint32_t name_type_mask,

FindAllSymbolsWithNameAndType() takes a name, and the symbol type can be lldb::eSymbolTypeAny, so there is a way to search for a symbol by name. The main issue is that we are still parsing the symbol table, and we don't want to initialize the name lookup table in the symbol table just yet since we are still adding symbols to the complete list. This is why an extra map that is used only while parsing the symbol table makes sense.
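The point about the name lookup table can be illustrated with a toy symbol table whose name index is only built after all symbols have been added. `MiniSymtab` and its methods are hypothetical, not the lldb_private::Symtab API; the sketch only shows why a by-name lookup cannot serve as a duplicate check while parsing is still in progress.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical miniature symbol table: symbols are appended in order and
// the name index is built once, after parsing, by Finalize().
class MiniSymtab {
public:
  void AddSymbol(const std::string &name, uint64_t addr) {
    names_.push_back(name);
    addrs_.push_back(addr);
  }

  // Builds the name -> symbol index lookup table.
  void Finalize() {
    index_.clear();
    for (uint32_t i = 0; i < names_.size(); ++i)
      index_.emplace(names_[i], i);
  }

  // Returns the indexes of all symbols with this name (empty before Finalize()).
  std::vector<uint32_t> FindAllWithName(const std::string &name) const {
    std::vector<uint32_t> result;
    auto range = index_.equal_range(name);
    for (auto it = range.first; it != range.second; ++it)
      result.push_back(it->second);
    return result;
  }

private:
  std::vector<std::string> names_;
  std::vector<uint64_t> addrs_;
  std::multimap<std::string, uint32_t> index_;
};
```

Until `Finalize()` runs, `FindAllWithName` sees nothing, so duplicate detection during parsing needs its own side structure, as described above.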

So my point of this whole question is: What makes a symbol unique in the sense that it shouldn't be added to the symtab if it is already there?

Within one representation of an ObjectFile, a symbol is unique by its name, address, size, and type: two symbols that match in all four are the same symbol.

Shouldn't the type of the symbol together with its name define uniqueness? We shouldn't care about where the symbol is coming from, nor whether it is located at a different address.

We don't care where the symbol comes from as long as it represents information for one ObjectFile. I would contend that if you have an "a.out" binary with a .gnu_debugdata that points to "a.out.gnu_debugdata", we should have one single ObjectFile that represents "a.out" and gives the best description of what "a.out" contains.

We do care if a symbol has a different address. You can have as many static functions called "foo" as you want in a single binary. They are each unique since they have different addresses. So if you have 10 source files where 3 of those sources have a symbol "foo", we want there to be 3 different symbols with the same name, different addresses and possibly different sizes.

Well, if there's an overloaded function foo(int) and foo(char*) then both symbols are of type function and they both share the same name. When you type b foo you DO want 2 breakpoints to be set.

Yes, we do! One breakpoint in LLDB has N breakpoint locations. A breakpoint of any kind (source file and line, function name, and more) is constantly adding and removing locations as shared libraries get loaded. So if you have "foo(int)" and "foo(char *)" and say "b foo", we would end up with one breakpoint whose name matches "foo", with two locations.

Hence, uniqueness cannot be defined over the name and the type. But wait, the name is mangled, so it IS unique enough unless I use Symbol::GetNameNoArguments(); there only the name is returned.

Again, there can be N breakpoints with the same name and different addresses. We are just trying to avoid a symbol table that has 3 symbols all with the name "foo", address 0x1000, and size 0x10. Why? Because the information is redundant and is just noise. The map I suggested would track these symbols so we don't end up adding multiple copies of the same symbol. Again, name, address, and size must all match. We _can_ have multiple symbols at the same address with different names. How? Linkers often have aliased symbols that point to the same thing. If we have a symbol "foo" at 0x1000 with size 0x10, we can also have a symbol "foo_alias" at 0x1000 with size 0x10. We need both so that we can set breakpoints correctly when setting breakpoints by name.
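The uniqueness rule from this comment, sketched with a hypothetical `SymbolInfo` struct and the example values from the comment: three copies of "foo" at 0x1000 with size 0x10, an alias "foo_alias" at the same address, and (added for illustration) a second static "foo" at another address. `SymbolInfo`, `SymbolKey`, and `Unique` are made-up names.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical simplified symbol: name, file address and size.
struct SymbolInfo {
  std::string name;
  uint64_t addr;
  uint64_t size;
};

// Two symbols are considered the same only if name, address and size all match.
using SymbolKey = std::tuple<std::string, uint64_t, uint64_t>;

// Keeps the first occurrence of each key and preserves input order.
std::vector<SymbolInfo> Unique(const std::vector<SymbolInfo> &symbols) {
  std::set<SymbolKey> seen;
  std::vector<SymbolInfo> result;
  for (const SymbolInfo &s : symbols)
    if (seen.emplace(s.name, s.addr, s.size).second)
      result.push_back(s);
  return result;
}
```

The three redundant "foo" entries collapse into one, while the alias and the second "foo" at another address are both kept, which is what name breakpoints need.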

Here's my naive approach to test the admittedly very weird thought process from above: https://github.com/kwk/llvm-project/commit/5da4559a00c73ebefd8f8199890bd1991c94fa3f

I will take a look.

Let me know if you have any questions about what I said above.

In D67390#1672254, @labath wrote:

In D67390#1672210, @kwk wrote:

So my point of this whole question is: What makes a symbol unique in the sense that it shouldn't be added to the symtab if it is already there?

A symbol name is not unique because you can have multiple (static) functions with the same (mangled) name in one module. An address is not unique as well because you can have symbol aliases, which will have the same address (and we want to keep both names to resolve name breakpoints correctly for instance).

The name+address combination (my original suggestion) should be sufficiently unique for the purposes we care about. Theoretically, if you want, you could include some additional items in the uniqueness "key" like symbol type etc. (to rule out the perverse case of somebody setting a "file" symbol to conflict with some other function symbol), but I don't think that is really necessary.

We could track any extra data we need in the map if needed as Pavel suggests above. Not sure if it is needed, but we could do it if necessary.

@clayborg are you on IRC?

jankratochvil added a subscriber: jankratochvil. Sep 23 2019, 5:17 AM

@clayborg @labath I'm still trying to only add symbols when they are unique.

Take this already existing test:

./bin/llvm-lit -avv ~/llvm-project/lldb/lit/SymbolFile/DWARF/debug-types-line-tables.s

The symbols that are being added at the end of ObjectFileELF::ParseSymbols are these:

Symbol:
  i: 1
  start_id: 0
  symbol_bare: ""
  is_mangled: 0
  symbol_type: 0
  is_global: 0
  symbol_section_sp: 0x1915710 (.debug_str)
  symbol_value: 0
  symbol.st_size: 0
  symbol_size_valid: 1
  has_suffix: 0
  flags: 3

Symbol:
  i: 2
  start_id: 0
  symbol_bare: ""
  is_mangled: 0
  symbol_type: 0
  is_global: 0
  symbol_section_sp: 0x19157e0 (.debug_abbrev)
  symbol_value: 0
  symbol.st_size: 0
  symbol_size_valid: 1
  has_suffix: 0
  flags: 3

Symbol:
  i: 3
  start_id: 0
  symbol_bare: ""
  is_mangled: 0
  symbol_type: 0
  is_global: 0
  symbol_section_sp: 0x1915a50 (.debug_line)
  symbol_value: 0
  symbol.st_size: 0
  symbol_size_valid: 1
  has_suffix: 0
  flags: 3

I wonder how to define uniqueness for them. As you can see, the only difference is the symbol section which wasn't part of your definition of uniqueness (yet).

In D67390#1681034, @kwk wrote:

I wonder how to define uniqueness for them. As you can see, the only difference is the symbol section which wasn't part of your definition of uniqueness (yet).

These symbols should obviously all be kept distinct. Yes, you should also add ELFSymbol::st_shndx to the tuple, as I have seen in your code: https://people.redhat.com/jkratoch/kwk-branch.diff
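A small sketch of why the section index has to be part of the key: the three dumped symbols share name, value, and size and differ only in their section. `DumpedSymbol` and `CountDistinct` are hypothetical, and the section indices 1, 2, 3 are made up, because the dump prints section names rather than indices.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record mirroring the fields printed in the dump.
struct DumpedSymbol {
  std::string name;
  uint64_t value;
  uint64_t size;
  uint16_t shndx; // section index, as in ELFSymbol::st_shndx
};

// Counts distinct symbols, optionally including the section index in the key.
std::size_t CountDistinct(const std::vector<DumpedSymbol> &syms, bool use_shndx) {
  std::set<std::tuple<std::string, uint64_t, uint64_t, uint16_t>> keys;
  for (const DumpedSymbol &s : syms)
    keys.emplace(s.name, s.value, s.size, use_shndx ? s.shndx : uint16_t(0));
  return keys.size();
}
```

Without the section index, the section symbols for .debug_str, .debug_abbrev, and .debug_line would wrongly collapse into one.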

[LLDB][ELF] Fixup for comments in D67390
Change test expectation to find 2 instead of 1 symbol
symbol uniqueness by using elf::ELFSymbol

Harbormaster completed remote builds in B38524: Diff 221689. Sep 25 2019, 1:55 AM

Revert "Change test expectation to find 2 instead of 1 symbol"

Harbormaster completed remote builds in B38525: Diff 221690. Sep 25 2019, 2:00 AM

jankratochvil added inline comments. Sep 25 2019, 2:11 AM

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
277	It could be in the same order as `ELFSymbol` fields as otherwise it is difficult to verify all the fields are matched here.
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
42	Is it really needed?
2205	What if the symbol is ignored, the function will then incorrectly return a number of added symbols even when they were not added, wouldn't it?
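Both inline comments can be addressed in one sketch, assuming a simplified copy of the ELF symbol fields (`RawELFSymbol` and `AddUnique` are hypothetical, and the field order is assumed to match lldb's `ELFSymbol`). Comparing with `std::tie` in declaration order makes it easy to verify that every field is covered, and using the `bool` returned by `std::set::insert` means only symbols that were really added are counted.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// Simplified copy of the ELF symbol fields, in the assumed ELFSymbol order.
struct RawELFSymbol {
  uint64_t st_value;
  uint64_t st_size;
  uint32_t st_name;
  unsigned char st_info;
  unsigned char st_other;
  uint16_t st_shndx;

  // Fields are compared in declaration order, so a missing one is easy to spot.
  bool operator<(const RawELFSymbol &rhs) const {
    return std::tie(st_value, st_size, st_name, st_info, st_other, st_shndx) <
           std::tie(rhs.st_value, rhs.st_size, rhs.st_name, rhs.st_info,
                    rhs.st_other, rhs.st_shndx);
  }
};

// Adds each symbol at most once and returns how many were really added,
// so skipped duplicates are not counted.
std::size_t AddUnique(const std::vector<RawELFSymbol> &symbols,
                      std::set<RawELFSymbol> &unique) {
  std::size_t added = 0;
  for (const RawELFSymbol &sym : symbols)
    if (unique.insert(sym).second)
      ++added;
  return added;
}
```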

Please wait before reviewing this patch again. I will let you know when things do work.

Adjust other occurrence of AddSymbol where ELF plays a role
Working tests
Use unordered_set for storing unique elf symbols

Harbormaster completed remote builds in B38609: Diff 221968. Sep 26 2019, 9:03 AM

So the overall approach is good. See inline comments for issues and questions.

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369	I would almost just manually compare only the things we care about here. Again, what about st_shndx when comparing a symbol from the main symbol table with one from the .gnu_debugdata symbol table? Are those section indexes guaranteed to match? I would think not.
lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
286	Do we need a ConstString for the st_shndx as well so we can correctly compare a section from one file to a section from another as in the .gnu_debugdata case?
430	An ELF symbol from one symbol table can have the same name as another, yet with a different st_name string table offset. Do we even want to be able to hash an ELFSymbol on its own? Maybe remove this entire function and only hash NamedELFSymbol?
433	I know the section index will match between symbol tables in the same ELF file, but what about a symbol table in an external file like .gnu_debugdata?
443	Don't we need to hash everything we care about except the st_name? Those indexes can differ if they come from a different string table? Shouldn't this be:
std::size_t h1 = std::hash<elf::elf_addr>()(s.st_value);
std::size_t h2 = std::hash<elf::elf_xword>()(s.st_size);
// Skip std::size_t h3 = std::hash<elf::elf_word>()(s.st_name);
std::size_t h4 = std::hash<unsigned char>()(s.st_info);
std::size_t h5 = std::hash<unsigned char>()(s.st_other);
std::size_t h6 = std::hash<elf::elf_half>()(s.st_shndx);
std::size_t h7 = std::hash<const char *>()(s.st_name_string.AsCString());
return llvm::hash_combine(h1, h2, h4, h5, h6, h7);
I left the "h" variables with the same names to show we don't want "h3".
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2201–2205	Do we even need NamedELFSymbol? Can we just make an unordered_set of lldb_private::Symbol values?
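A self-contained sketch of the hashing suggested in the 443 comment, assuming a hypothetical `NamedSymbol` struct. `llvm::hash_combine` is not available in a standalone snippet, so a Boost-style `HashCombine` mixer stands in for it here; the point is only that `st_name` (a string-table offset) is left out while the resolved name string is hashed.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical simplification of NamedELFSymbol.
struct NamedSymbol {
  std::uint64_t st_value = 0;
  std::uint64_t st_size = 0;
  std::uint32_t st_name = 0;  // string-table offset: differs between tables
  unsigned char st_info = 0;
  unsigned char st_other = 0;
  std::uint16_t st_shndx = 0;
  std::string st_name_string; // resolved name: comparable across tables
};

// Boost-style hash mixing, standing in for llvm::hash_combine.
inline void HashCombine(std::size_t &seed, std::size_t h) {
  seed ^= h + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
}

inline std::size_t HashNamedSymbol(const NamedSymbol &s) {
  std::size_t seed = 0;
  HashCombine(seed, std::hash<std::uint64_t>()(s.st_value));
  HashCombine(seed, std::hash<std::uint64_t>()(s.st_size));
  // st_name is deliberately skipped: it is only an offset into a string table.
  HashCombine(seed, std::hash<unsigned char>()(s.st_info));
  HashCombine(seed, std::hash<unsigned char>()(s.st_other));
  HashCombine(seed, std::hash<std::uint16_t>()(s.st_shndx));
  HashCombine(seed, std::hash<std::string>()(s.st_name_string));
  return seed;
}
```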

Cleanup

Harbormaster completed remote builds in B38622: Diff 222029. Sep 26 2019, 2:35 PM

kwk edited the summary of this revision. Sep 26 2019, 2:38 PM

Cleanup

Harbormaster completed remote builds in B38634: Diff 222073. Sep 26 2019, 8:39 PM

Change order of compare members to match order of member definitions.

Harbormaster completed remote builds in B38635: Diff 222075. Sep 26 2019, 9:01 PM

Not everything is addressed yet, but please see: https://reviews.llvm.org/D67390#1683705

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369	@clayborg I explicitly only compare what we care about and therefore always set the name index to be the same.
lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
286	That is a good point.
443	That's why I always explicitly set the name to 0, which effectively ignores it because it then has no effect on the hash. Please see the `hash` specialization for `NamedELFSymbol`.

make the section name part of NamedELFSymbol

Harbormaster completed remote builds in B38636: Diff 222076. Sep 26 2019, 9:15 PM

I think I've finished the implementation now and should have answered all your comments and concerns. I'm running tests now. I would appreciate it if you (@clayborg , @labath , @jankratochvil ) could take a look at this patch again.

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
286	@clayborg in fact I think this could be the reason not to use a set of `lldb_private::Symbol` objects because there we don't store the section name or symbol name but only addresses or indexes. I did add the `st_section_name_string` struct member.
433	I did add the section name to `NamedELFSymbol` and explicitly ignore it when building the hash for the base `ELFSymbol`.
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
42	removed.
2201–2205	@clayborg I find it much easier with `NamedELFSymbol` because all we have to do is derive from `ELFSymbol` and add the strings for the symbol name and the section name. If we were to use `lldb_private::Symbol`, I would have to look up the symbols manually each time I calculate a hash, which seems bad. I mean, the symbol and section names already are `ConstString`s and should be stored and computed very efficiently. Also I wanted to keep things local to ELF and not mess with everything that uses `lldb_private::Symbol`. Makes sense?
2205	@jankratochvil we already have places inside this `for`-loop where we `continue`. I hope it is okay to ask you the same question back about those `continue` places: why don't we adjust the returned number (`i`) in case symbols were skipped?

Use symbol name including @VERSION suffix.

Harbormaster completed remote builds in B38637: Diff 222080. Sep 26 2019, 10:02 PM

Interesting. It looks like we had a test that wants a symbol to be added twice:

[ RUN      ] MangledTest.NameIndexes_FindFunctionSymbols
/home/kkleine/llvm/lldb/unittests/Core/MangledTest.cpp:186: Failure
      Expected: 1
To be equal to: Count("puts@GLIBC_2.6", eFunctionNameTypeFull)
      Which is: 0
/home/kkleine/llvm/lldb/unittests/Core/MangledTest.cpp:187: Failure
      Expected: 2
To be equal to: Count("puts", eFunctionNameTypeFull)
      Which is: 1
/home/kkleine/llvm/lldb/unittests/Core/MangledTest.cpp:188: Failure
      Expected: 2
To be equal to: Count("puts", eFunctionNameTypeBase)
      Which is: 1
[  FAILED  ] MangledTest.NameIndexes_FindFunctionSymbols (1 ms)

Before I used the bare symbol name with stripped @VERSION suffix. Now I've changed the implementation of NamedELFSymbol to include the @VERSION suffix and the tests pass.
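A small sketch of why keeping the suffix changes the result, using hypothetical `StripVersion` and `CountUnique` helpers and plain strings as names. If the key uses the bare name, `puts` and `puts@GLIBC_2.6` collapse into one entry, which is consistent with the `MangledTest` failures above.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Returns the name without its symbol-version suffix
// ("puts@GLIBC_2.6" -> "puts"). Cutting at the first '@' also handles "@@".
inline std::string StripVersion(const std::string &name) {
  return name.substr(0, name.find('@')); // npos keeps the whole string
}

// Counts distinct keys when symbols are keyed by full or by bare name.
inline std::size_t CountUnique(const std::vector<std::string> &names,
                               bool keep_version) {
  std::unordered_set<std::string> seen;
  for (const std::string &n : names)
    seen.insert(keep_version ? n : StripVersion(n));
  return seen.size();
}
```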

This looks fairly ok to me, but it could use a more explicit test of the symbol uniqueing code. Right now, I believe the two tests you added check that the symbols are _not_ uniqued. (They're also a bit hard to follow due to the whole objcopy business.) Could you create a test with a yaml file which will contain various edge cases relevant to this code? I'm thinking of stuff like "a dynsym and a symtab symbol at the same address, but a different name", "a dynsym and symtab symbols with identical names, but different addresses", etc. Then just run "image dump symtab" on that file to check what we have parsed?

I am also surprised that you weren't able to just use a Section* (instead of the name) for the uniqueing. I'd expect that all symbols (even those from the separate object file) should be resolved to the sections in the main object. I see that this isn't the case, but I am surprised that this isn't causing any problems. Anyway, as things seem to be working as they are now, we can leave that for another day.

In D67390#1685313, @kwk wrote:

Before I used the bare symbol name with stripped @VERSION suffix. Now I've changed the implementation of NamedELFSymbol to include the @VERSION suffix and the tests pass.

Interesting. I'm pretty sure that the symbol count is irrelevant for that test (it just wants to know if it is there), so we can change that if needed. However, having the uniqueing include the @VERSION sounds right to me, so if that makes the test happy too then, great.

lldb/lit/Modules/ELF/Inputs/load-from-dynsym-alone.c
1 ↗	(On Diff #222080)	It looks like you could inline these test inputs into the test files. You'd just need to change all the comments to `//` and add `.c` to the list of valid suffixes in lit.local.cfg.
lldb/lit/Modules/ELF/load-from-dynsym-alone.test
16 ↗	(On Diff #222080)	s/dynmic/dynamic/
lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369	I'll echo @clayborg here. This business with copying the ELFSymbol and clearing some fields is confusing. Do you even need the ELFSymbol::operator== for anything? If not I'd just delete that, and have the derived version compare all fields. Also, it would be nice if the operator== and hash function definitions were next to each other. Can you just forward declare the std::hash stuff in the header, and have the implementation be next to this?
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2202–2204	Move this into the NamedELFSymbol constructor?
2205	something like `if (unique_elf_symbols.insert(needle).second)` would be more efficient, as you don't need to mess with the map twice.
2648	what's wrong with the old-fashioned `UniqueElfSymbolColl unique_elf_symbols;` ?

jankratochvil added inline comments. Sep 27 2019, 4:43 AM

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
446	I find it better to define `std::hash<ConstString>` instead (or provide `ConstString::Hasher`, which I do in my DWZ patchset).
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2205	I would unconditionally add all the symbols to `std::vector<NamedElfSymbol> unique_elf_symbols` (with `unique_elf_symbols.reserve` called in advance for the sum of the `.symtab` and `.dynsym` sizes) and then, after processing both `.symtab` and `.dynsym`, call `llvm::sort(unique_elf_symbols)` and add to `symtab` only those which are unique. I believe it will be much faster; one could benchmark it.
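A minimal sketch of the `std::hash<ConstString>` idea. `InternedString` is a hypothetical stand-in for `lldb_private::ConstString`, not its real implementation: strings are pooled so that equal contents share one pointer, and the hash can then be computed from the pointer alone. The pool in this sketch is not thread-safe.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical ConstString-like type: each distinct string is stored once,
// so equal strings share the same pointer.
class InternedString {
public:
  explicit InternedString(const std::string &s) : m_ptr(Intern(s)) {}
  const char *AsCString() const { return m_ptr; }
  bool operator==(const InternedString &o) const { return m_ptr == o.m_ptr; }

private:
  static const char *Intern(const std::string &s) {
    // Node-based container: element addresses survive rehashing.
    static std::unordered_set<std::string> pool;
    return pool.insert(s).first->c_str();
  }
  const char *m_ptr;
};

// Hashing the pointer suffices because interning makes it unique per string.
namespace std {
template <> struct hash<InternedString> {
  size_t operator()(const InternedString &s) const noexcept {
    return hash<const char *>()(s.AsCString());
  }
};
} // namespace std
```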

labath added inline comments. Sep 27 2019, 4:48 AM

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2205	sounds like a good idea.
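A sketch of the vector-based alternative, assuming a hypothetical simplified `Sym` record: append the candidates from both tables to one pre-reserved vector, sort it, and erase adjacent duplicates with `std::unique`.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified symbol record.
struct Sym {
  std::string name;
  std::uint64_t value;
  std::uint64_t size;
  std::string section;
};

inline auto Key(const Sym &s) {
  return std::tie(s.name, s.value, s.size, s.section);
}

// Append everything, sort, then drop adjacent duplicates. One allocation up
// front and no hashing.
inline std::vector<Sym> MergeUnique(const std::vector<Sym> &symtab,
                                    const std::vector<Sym> &dynsym) {
  std::vector<Sym> all;
  all.reserve(symtab.size() + dynsym.size());
  all.insert(all.end(), symtab.begin(), symtab.end());
  all.insert(all.end(), dynsym.begin(), dynsym.end());
  std::sort(all.begin(), all.end(),
            [](const Sym &a, const Sym &b) { return Key(a) < Key(b); });
  all.erase(std::unique(all.begin(), all.end(),
                        [](const Sym &a, const Sym &b) {
                          return Key(a) == Key(b);
                        }),
            all.end());
  return all;
}
```

One design consequence: the result comes out in sort-key order rather than file order, so anything that relies on the original symbol order (for example, IDs derived from table indices) would need extra care.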

In D67390#1685484, @labath wrote:

This looks fairly ok to me, but it could use a more explicit test of the symbol uniqueing code. Right now, I believe the two tests you added check that the symbols are _not_ uniqued. (They're also a bit hard to follow due to the whole objcopy business.) Could you create a test with a yaml file which will contain various edge cases relevant to this code? I'm thinking of stuff like "a dynsym and a symtab symbol at the same address, but a different name", "a dynsym and symtab symbols with identical names, but different addresses", etc. Then just run "image dump symtab" on that file to check what we have parsed?

I'll do my best to implement this today.

I am also surprised that you weren't able to just use a Section* (instead of the name) for the uniqueing. I'd expect that all symbols (even those from the separate object file) should be resolved to the sections in the main object. I see that this isn't the case, but I am surprised that this isn't causing any problems. Anyway, as things seem to be working as they are now, we can leave that for another day.

Okay.

In D67390#1685313, @kwk wrote:

Before I used the bare symbol name with stripped @VERSION suffix. Now I've changed the implementation of NamedELFSymbol to include the @VERSION suffix and the tests pass.

Interesting. I'm pretty sure that the symbol count is irrelevant for that test (it just wants to know if it is there), so we can change that if needed. However, having the uniqueing include the @VERSION sounds right to me, so if that makes the test happy too then, great.

Yes, I hoped so. Thank you. Please await another revision of this patch with the tests requested.

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369	I'll echo @clayborg here. This business with copying the ELFSymbol and clearing some fields is confusing. I've cleared up the documentation now and it is exactly the way I like it. Every entity deals with its own business (respects its own fields when comparing). I find it pretty dirty to compare fields from the base struct in a derived one. The way I compare fields from the base struct is minimally invasive. Do you even need the ELFSymbol::operator== for anything? Yes, when you want to compare ELFSymbols. I know that I don't do that explicitly, but the function only deals with fields from the entity itself and they don't spill into any derived structure (with the exception of explicitly ignoring fields). If not I'd just delete that, and have the derived version compare all fields. No, because I call it explicitly from the derived NamedELFSymbol. Also, it would be nice if the operator== and hash function definitions were next to each other. Can you just forward declare the std::hash stuff in the header, and have the implementation be next to this? I've found a compromise that is even more appealing to me. The ELFSymbol and NamedELFSymbol structs now have a `hash` function which contains the implementation next to the one of `operator==()`. This `hash` is called in the specialization, which remains in the same location as before; the reason being that I didn't find a way to define something in the `std::` namespace when I'm in the `elf::` namespace.
lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
446	Once your DWZ patchset arrives, please let me know and we can change it.
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2205	@jankratochvil @labath yes, this sounds like a good idea for a performance improvement, but honestly, I need to get this patch done first in order to make any progress with minidebuginfo. I hope you don't mind if I move this task to another patch, okay?
2648	Apparently nothing now :) . But earlier I had trouble because of a deleted default constructor. Thanks for spotting this and bringing it back to my attention.
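On defining the `std::hash` specialization for code that lives in `elf::`: the usual pattern is to close the `elf` namespace and specialize `std::hash` at namespace scope, forwarding to a member `hash()` that is defined next to `operator==`. A minimal sketch with a hypothetical `elf::Symbol` struct (the mixing constant is an arbitrary choice for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

namespace elf {
// Hypothetical struct: equality and hashing are defined side by side.
struct Symbol {
  std::uint64_t st_value;
  std::uint64_t st_size;

  bool operator==(const Symbol &rhs) const {
    return st_value == rhs.st_value && st_size == rhs.st_size;
  }
  std::size_t hash() const {
    std::size_t h = std::hash<std::uint64_t>()(st_value);
    return h ^ (std::hash<std::uint64_t>()(st_size) + 0x9e3779b9 + (h << 6) +
                (h >> 2));
  }
};
} // namespace elf

// The specialization lives outside namespace elf and simply forwards.
namespace std {
template <> struct hash<elf::Symbol> {
  size_t operator()(const elf::Symbol &s) const noexcept { return s.hash(); }
};
} // namespace std
```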

typo: dynmic -> dynamic
Applied changes requested in review

Harbormaster completed remote builds in B38735: Diff 222387. Sep 30 2019, 4:27 AM

kwk marked 2 inline comments as done. Sep 30 2019, 4:29 AM

@labath I did prepare a YAML file, but apparently yaml2obj isn't meant to deal with this properly. Instead I get an error like this: yaml2obj: error: repeated symbol name: 'main'. It looks like symbols from the Symbols: part of the YAML file are simply added by name to a map. Changing yaml2obj for this seems a bit too heavy right now. If that's okay with you, I'll go with a few more C programs, if I can pull them off.

Here's the part that causes the error:

template <class ELFT> void ELFState<ELFT>::buildSymbolIndexes() {
  auto Build = [this](ArrayRef<ELFYAML::Symbol> V, NameToIdxMap &Map) {
    for (size_t I = 0, S = V.size(); I < S; ++I) {
      const ELFYAML::Symbol &Sym = V[I];
      if (!Sym.Name.empty() && !Map.addName(Sym.Name, I + 1))
        reportError("repeated symbol name: '" + Sym.Name + "'");
    }
  };

  Build(Doc.Symbols, SymN2I);
  Build(Doc.DynamicSymbols, DynSymN2I);
}

Added YAML test to merge symbols

Harbormaster completed remote builds in B38753: Diff 222427. Sep 30 2019, 7:33 AM

include test code in .c test file

kwk marked an inline comment as done. Sep 30 2019, 8:28 AM

Harbormaster completed remote builds in B38757: Diff 222441. Sep 30 2019, 8:31 AM

Fix comment

Harbormaster completed remote builds in B38768: Diff 222479. Sep 30 2019, 12:52 PM

labath added inline comments. Sep 30 2019, 11:40 PM

lldb/lit/Modules/ELF/merge-symbols.yaml
52	smymtab
lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369	Yes, when you want to compare ELFSymbols. I know that I don't do that explicitly, but the function only deals with fields from the entity itself and they don't spill into any derived structure (with the exception of explicitly ignoring fields). Yes, but to me that exception kind of invalidates the whole idea. In order to know which fields you need to ignore, you need the knowledge of what fields are present in the struct (and as the fields are public, that is not a big deal), at which point you can just avoid the whole copying business, and explicitly compare the fields that you care about. Given that this also saves a nontrivial amount of code, I still think it's a better way to go. (Also, defining equality operators on class hierarchies is usually not a good idea even if they "nest" nicely, since they can still produce strange results due to object slicing.) I've found a compromise that is even more appealing to me. The ELFSymbol and NamedELFSymbol structs now have a hash function which contains the implementation next to the one of operator==(). That works for me.
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2205	I don't think that kind of reasoning really applies here, since it's this patch that is introducing the potential performance problem. However, I don't think that is going to be a big deal, so I think we can leave this out for now.
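A minimal illustration of the object-slicing pitfall mentioned above, using hypothetical `Base`/`Named` structs: once a derived object is viewed through a base reference, only the base `operator==` is considered, so the derived fields are silently ignored.

```cpp
#include <string>

struct Base {
  unsigned long value;
  bool operator==(const Base &rhs) const { return value == rhs.value; }
};

struct Named : Base {
  std::string name;
  // Hides Base::operator== for Named operands and also compares the name.
  bool operator==(const Named &rhs) const {
    return value == rhs.value && name == rhs.name;
  }
};

// Through base references only Base::operator== is visible, so the names
// of two Named objects are never compared here.
inline bool EqualAsBase(const Base &a, const Base &b) { return a == b; }
```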

Remove verbose output in test
Fix typo: smymtab -> symtab
Move compare and hash logic out of base class into derived class as requested

Harbormaster completed remote builds in B38799: Diff 222569. Oct 1 2019, 2:38 AM

@labath can you please check this patch one last time (hopefully)?

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
367–369	@labath okay, I've removed the logic from `ELFSymbol` and coded everything directly. I guess I wanted to be able to extend `ELFSymbol` with any number of fields and add them to `ELFSymbol::operator==()` without touching `NamedELFSymbol::operator==()`, as long as the added fields should not be ignored. Makes sense? I guess you can find arguments for both ways to implement it. Anyway, I've coded it the way you want now, I hope.
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2205	Okay, thank you for allowing me to leave it out for now.

Correct comment of test

Harbormaster completed remote builds in B38802: Diff 222576. Oct 1 2019, 3:21 AM

Fix comment

Harbormaster completed remote builds in B38803: Diff 222577. Oct 1 2019, 3:22 AM

Simplify NamedELFSymbol::hash()

Harbormaster completed remote builds in B38807: Diff 222584. Oct 1 2019, 3:56 AM

Ok, let's give this one more shot. Thanks for your patience. I do have a couple of additional comments inline, but I don't think we need another round of review for those.

lldb/lit/Modules/ELF/merge-symbols.yaml
23	Please add a `CHECK-EMPTY:` after this line to ensure there are no additional symbols here.
lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
254–257	Is this really needed? It looks like the default compiler-generated copy constructor would be sufficient here.
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
1918–1919	delete
2636	delete

jankratochvil added inline comments. Oct 1 2019, 6:32 AM

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
379	llvm::hash_combine already calls std::hash<T> for each of its parameters.
lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h
446	https://people.redhat.com/jkratoch/ConstStringHasher.patch - Although then the whole UniqueElfSymbolColl should be replaced by `std::sort`+`std::unique` of `std::vector` you plan in a future patch so the hashing does not matter much.
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
1925	also delete
2665	For the planned rework of the unification of symbols it could be put (I think) to `Symtab::InitAddressIndexes` which already sorts the Symtab anyway.

Check that no additional symbols follow after the expected ones
Use compiler-generated copy-ctor
Cleanup from experiment
Simplify NamedELFSymbol::hash()
Cleanup

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp
379	Good to know. Thank you.
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2636	Sorry, I noticed this myself as well. Some of the performance experiment spilled over in this patch.
2665	Honestly, for the rework I'm thinking of not using a `std::vector` as you proposed, but instead creating the `std::unordered_set` with a bucket count equal to the number of symbols in `.symtab` and `.dynsym`. Then inserts into that set will take constant time, as they do for the vector. But let's see how it goes in a follow-up patch. I have the feeling that if I use the approach you suggested, I need to keep a vector and a set around: the vector for collecting all symbols and the set for doing the unification, no? Anyway, let's postpone this.
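A sketch of the pre-sized `std::unordered_set` variant, with plain strings standing in for symbols and a hypothetical `DedupPreservingOrder` helper. `reserve()` sizes the set for the combined `.symtab` and `.dynsym` count so no rehash happens while inserting. Note that preserving the original order this way does need both a set (for the uniqueness check) and a vector (for the output).

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Deduplicate while keeping first-seen order; the set is sized up front for
// the combined table sizes.
inline std::vector<std::string>
DedupPreservingOrder(const std::vector<std::string> &symtab,
                     const std::vector<std::string> &dynsym) {
  const std::size_t total = symtab.size() + dynsym.size();
  std::unordered_set<std::string> seen;
  seen.reserve(total); // enough buckets for `total` elements, no rehash
  std::vector<std::string> out;
  out.reserve(total);
  for (const auto *table : {&symtab, &dynsym})
    for (const std::string &name : *table)
      if (seen.insert(name).second) // true only for the first occurrence
        out.push_back(name);
  return out;
}
```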

Harbormaster completed remote builds in B38827: Diff 222634. Oct 1 2019, 8:47 AM

LLVM reasoning for why to go with std::vector: http://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc.

Here's the relevant transcript from #lldb@otfc explaining why this change was abandoned.

[10/02/19 15:22:25] <jankratochvil> labath: Is it acceptable for you?  "BTW given how this unique-ness of symbols turns out to be non-trivial is it really needed?  Because for regular .symtab we can ignore .dynsym (as current LLDB does) and for .gnu_debugdata->.symtab we can just concatenate it with .dynsym and there will be no symbol duplicates anyway."
[10/02/19 15:22:25] <jankratochvil> Otherwise it is either a big code complication or a big performance regression and it is not really needed for anything.
[10/02/19 15:27:42] <labath> yeah, i suppose that would work
[10/02/19 15:28:11] <labath> then you just need to make sure the merging does not kick in for non-gnu_debugdata sections
[10/02/19 15:28:30] <labath> a simple check on the existence of this section might suffice
[10/02/19 15:29:39] <labath> it seems to me that it would be useful to have information from both sources available, but it is true that nobody has really needed that so far
[10/02/19 15:36:11] <jankratochvil> Thanks.
[10/02/19 15:36:22] <jankratochvil> kkleine: ^^^ Let's call it a win-win. :-)
[10/02/19 16:14:27] <kkleine> labath, jankratochvil I wonder if the merging with .gnu_debugdata needs to be as "clever" as it is today and if it is enough to just add symbols without checking for uniqueness. That would simplify things even more.
[10/02/19 16:16:32] <jankratochvil> I think just add it without checks.
[10/02/19 16:16:53] <labath> works for me
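A minimal sketch of the approach agreed on above, using hypothetical `SymbolSources`/`CollectSymbols` names and plain strings in place of real symbols: concatenate `.dynsym` with the `.gnu_debugdata` `.symtab` only when `.gnu_debugdata` is present, and otherwise keep the previous behavior. The fallback to `.dynsym` when there is no `.symtab` is an assumption of this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical description of the symbol tables an object file provides.
struct SymbolSources {
  std::vector<std::string> symtab;           // .symtab of the main file
  std::vector<std::string> dynsym;           // .dynsym of the main file
  std::vector<std::string> debugdata_symtab; // .symtab inside .gnu_debugdata
  bool has_gnu_debugdata = false;
};

inline std::vector<std::string> CollectSymbols(const SymbolSources &src) {
  std::vector<std::string> result;
  if (src.has_gnu_debugdata) {
    // MiniDebugInfo keeps only symbols not already in .dynsym, so the two
    // tables are simply concatenated without a uniqueness check.
    result = src.dynsym;
    result.insert(result.end(), src.debugdata_symtab.begin(),
                  src.debugdata_symtab.end());
  } else if (!src.symtab.empty()) {
    // Regular case: use .symtab and ignore .dynsym.
    result = src.symtab;
  } else {
    // Assumed fallback: no .symtab at all, so use .dynsym.
    result = src.dynsym;
  }
  return result;
}
```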

Revision Contents

Path	Size
lldb/lit/Modules/ELF/load-from-dynsym-alone.c	41 lines
lldb/lit/Modules/ELF/load-symtab-and-dynsym.c	59 lines
lldb/lit/Modules/ELF/merge-symbols.yaml	98 lines
lldb/lit/Modules/lit.local.cfg	2 lines
lldb/lit/helper/toolchain.py	2 lines
lldb/source/Plugins/ObjectFile/ELF/	37 lines, 23 lines, 9 lines, 62 lines

Commit	Tree	Parents	Author	Summary	Date
6c8c9ad08ab2	caa0e90e531c	cc443945a766	Konrad Kleine	Cleanup	Oct 1 2019, 8:45 AM
cc443945a766	18321e2207a3	713b7d0b8cf2	Konrad Kleine	Simplify NamedELFSymbol::hash()	Oct 1 2019, 8:34 AM
713b7d0b8cf2	ba87623f39d5	84c326d80430	Konrad Kleine	Cleanup from experiment	Oct 1 2019, 8:32 AM
84c326d80430	c0e5651ec981	d15b978d2492	Konrad Kleine	Use compiler-generated copy-ctor	Oct 1 2019, 7:17 AM
d15b978d2492	b53e29314b8d	bec1351bba92	Konrad Kleine	Check that no additional symbols follow after the expected ones	Oct 1 2019, 7:15 AM
bec1351bba92	d412626d5871	6fdd2c66e6a0	Konrad Kleine	Simplify NamedELFSymbol::hash()	Oct 1 2019, 3:51 AM
6fdd2c66e6a0	b112bed11b92	7a9b44be50d6	Konrad Kleine	Fix comment	Oct 1 2019, 3:22 AM
7a9b44be50d6	f9e5b10f0be8	a80244e3a27b	Konrad Kleine	Correct comment of test	Oct 1 2019, 3:21 AM
a80244e3a27b	75558eb2a2f1	568c52b5ccb9	Konrad Kleine	Move compare and hash logic out of base class into derived class as requested	Oct 1 2019, 2:08 AM
568c52b5ccb9	889eeb78bc29	e8e9f278f5b0	Konrad Kleine	Fix typo: smymtab -> symtab	Oct 1 2019, 2:00 AM
e8e9f278f5b0	6bebcdd3d28d	c7a4d156c53e	Konrad Kleine	Remove verbose output in test	Sep 30 2019, 1:06 PM
c7a4d156c53e	48967fdd3d17	cec79fa80111	Konrad Kleine	Fix comment	Sep 30 2019, 12:52 PM
cec79fa80111	f0fca1088bea	5cdc12a8f6a8	Konrad Kleine	include test code in .c test file	Sep 30 2019, 8:28 AM
5cdc12a8f6a8	85f51c68cbd8	ce0c3a469967	Konrad Kleine	Added YAML test to merge symbols	Sep 30 2019, 7:33 AM
ce0c3a469967	ef53dab46f0d	6b6becc5cd9b	Konrad Kleine	Applied changes requested in review	Sep 30 2019, 4:27 AM
6b6becc5cd9b	0257f3efc744	525acf8196fb	Konrad Kleine	typo: dynmic -> dynamic	Sep 27 2019, 2:28 AM
525acf8196fb	a8dc8fc1dd07	edef40a2f4f6	Konrad Kleine	Use symbol name including @VERSION suffix.	Sep 26 2019, 10:00 PM
edef40a2f4f6	77b66aa27b35	c340ccd35cc8	Konrad Kleine	make the section name part of NamedELFSymbol	Sep 26 2019, 9:14 PM
c340ccd35cc8	77a641eea548	1fbd3af30c2b	Konrad Kleine	Change order of compare members to match order of member definitions.	Sep 26 2019, 9:00 PM
1fbd3af30c2b	c7948bf54f51	f1fdc6382a11	Konrad Kleine	Cleanup	Sep 26 2019, 8:38 PM
f1fdc6382a11	8af992fb09eb	72acd62c6d4a	Konrad Kleine	Cleanup	Sep 26 2019, 2:35 PM
72acd62c6d4a	c497560abcb7	1ea4fc24a0c2	Konrad Kleine	Use unordered_set for storing unique elf symbols	Sep 26 2019, 9:01 AM
1ea4fc24a0c2	5bc3ceeb763a	ca53f22ea000	Konrad Kleine	Working tests	Sep 26 2019, 6:05 AM
ca53f22ea000	b358ec6fb80a	31485d4ba86b	Konrad Kleine	Adjust other occurrence of AddSymbol where ELF plays a role	Sep 25 2019, 2:23 AM
31485d4ba86b	f1a1d142acd2	53d41e5eec11	Konrad Kleine	Revert "Change test expectation to find 2 instead of 1 symbol"	Sep 25 2019, 1:55 AM
53d41e5eec11	1dfae7411e82	3bf8ab3b2aff	Konrad Kleine	symbol uniqueness by using elf::ELFSymbol	Sep 17 2019, 2:02 AM
3bf8ab3b2aff	79ef5c93aa09	5b8dcc0d0124	Konrad Kleine	Change test expectation to find 2 instead of 1 symbol	Sep 12 2019, 2:24 AM
5b8dcc0d0124	4753925fd4c2	5198c7d88831	Konrad Kleine	[LLDB][ELF] Fixup for comments in D67390	Sep 11 2019, 3:12 AM
5198c7d88831	7709ae8e7ed2	ce7cfbccc63b	Konrad Kleine	[LLDB][ELF] Load both, .symtab and .dynsym sections	Sep 11 2019, 3:00 AM

Diff 222634

lldb/lit/Modules/ELF/load-from-dynsym-alone.c

This file was added.

				// REQUIRES: system-linux

				// This test ensures that we will load .dynsym even if there's no .symtab section.
				// We do this by compiling a small C program with a function and we direct the
				// linker where to put the symbols so that in the end the layout is as follows:
				//
				// Symbol table '.dynsym' contains 4 entries:
				// Num: Value Size Type Bind Vis Ndx Name
				// 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
				// 1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5
				// 2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
				// 3: 0000000000401110 13 FUNC GLOBAL DEFAULT 10 functionInDynsym

				// We want to keep the symbol "functionInDynsym" in the .dynsym section and not
				// have it put into the default .symtab section.
				// RUN: echo "{functionInDynsym;};" > %T/dynamic-symbols.txt
				// RUN: %clang -Wl,--dynamic-list=%T/dynamic-symbols.txt -g -o %t.binary %s

				// Remove unneeded symbols
				// RUN: echo "functionInDynsym" > %t.keep_symbols
				// RUN: llvm-objcopy --strip-all --remove-section .gdb_index --remove-section .comment --keep-symbols=%t.keep_symbols %t.binary

				// Remove functionInDynsym symbol from .symtab (will leave symbol in .dynsym intact)
				// RUN: llvm-strip --strip-symbol=functionInDynsym %t.binary

				// RUN: %lldb -b -o 'b functionInDynsym' -o 'run' -o 'continue' %t.binary | FileCheck %s

				// CHECK: (lldb) b functionInDynsym
				// CHECK-NEXT: Breakpoint 1: where = {{.*}}.binary`functionInDynsym, address = 0x{{.*}}

				// CHECK: (lldb) run
				// CHECK-NEXT: Process {{.*}} stopped
				// CHECK-NEXT: * thread #1, name = 'load-from-dynsy', stop reason = breakpoint 1.1

				// This function will be embedded within the .dynsym section of the main binary.
				int functionInDynsym(int num) { return num * 3; }

				int main(int argc, char *argv[]) {
				  int y = functionInDynsym(argc);
				  return y;
				}

lldb/lit/Modules/ELF/load-symtab-and-dynsym.c

This file was added.

				// REQUIRES: system-linux

				// This test ensures that we will load .dynsym even if there's a .symtab section.
				// We do this by compiling a small C program with two functions and we direct the
				// linker where to put the symbols so that in the end the layout is as follows:
				//
				// Symbol table '.dynsym' contains 4 entries:
				// Num: Value Size Type Bind Vis Ndx Name
				// 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
				// 1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5
				// 2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
				// 3: 0000000000401120 13 FUNC GLOBAL DEFAULT 10 functionInDynsym
				//
				// Symbol table '.symtab' contains 2 entries:
				// Num: Value Size Type Bind Vis Ndx Name
				// 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
				// 1: 0000000000401110 15 FUNC GLOBAL DEFAULT 10 functionInSymtab

				// We want to keep the symbol "functionInDynsym" in the .dynsym section and not
				// have it put into the default .symtab section.
				// RUN: echo "{functionInDynsym;};" > %T/dynamic-symbols.txt
				// RUN: %clang -Wl,--dynamic-list=%T/dynamic-symbols.txt -g -o %t.binary %s

				// Remove unneeded symbols
				// RUN: echo "functionInSymtab" > %t.keep_symbols
				// RUN: echo "functionInDynsym" >> %t.keep_symbols
				// RUN: llvm-objcopy --strip-all --remove-section .gdb_index --remove-section .comment --keep-symbols=%t.keep_symbols %t.binary

				// Remove functionInDynsym symbol from .symtab (will leave symbol in .dynsym intact)
				// RUN: llvm-strip --strip-symbol=functionInDynsym %t.binary

				// RUN: %lldb -b -o 'b functionInSymtab' -o 'b functionInDynsym' -o 'run' -o 'continue' %t.binary | FileCheck %s

				// CHECK: (lldb) b functionInSymtab
				// CHECK-NEXT: Breakpoint 1: where = {{.*}}.binary`functionInSymtab, address = 0x{{.*}}

				// CHECK: (lldb) b functionInDynsym
				// CHECK-NEXT: Breakpoint 2: where = {{.*}}.binary`functionInDynsym, address = 0x{{.*}}

				// CHECK: (lldb) run
				// CHECK-NEXT: Process {{.*}} stopped
				// CHECK-NEXT: * thread #1, name = 'load-symtab-and', stop reason = breakpoint 1.1

				// CHECK: (lldb) continue
				// CHECK-NEXT: Process {{.*}} resuming
				// CHECK-NEXT: Process {{.*}} stopped
				// CHECK-NEXT: * thread #1, name = 'load-symtab-and', stop reason = breakpoint 2.1

				// This function will be embedded within the .symtab section of the main binary.
				int functionInSymtab(int num) { return num * 4; }

				// This function will be embedded within the .dynsym section of the main binary.
				int functionInDynsym(int num) { return num * 3; }

				int main(int argc, char *argv[]) {
				  int x = functionInSymtab(argc);
				  int y = functionInDynsym(x);
				  return y;
				}

lldb/lit/Modules/ELF/merge-symbols.yaml

This file was added.

				# This test ensures that two symbols with the same name are only merged if all
				# of the following attributes are the same: symbol name, size, address/value,
				# section name.

				# RUN: yaml2obj %s > %t

				# RUN: llvm-objcopy \
				# RUN: --redefine-sym=all_same2=all_same1 \
				# RUN: --redefine-sym=different_section2=different_section1 \
				# RUN: --redefine-sym=different_address2=different_address1 \
				# RUN: --redefine-sym=different_size2=different_size1 %t

				# RUN: %lldb -b -o 'image dump symtab' %t | FileCheck %s

				# CHECK: Index UserID DSX Type File Address/Value Load Address Size Flags Name
				# CHECK: [ 0] 1 Code 0x0000000000500000 0x0000000000000008 0x00000002 all_same1
				# CHECK-NEXT: [ 1] 3 Code 0x0000000000500000 0x0000000000000008 0x00000002 different_section1
				# CHECK-NEXT: [ 2] 4 Code 0x0000000000500000 0x0000000000000008 0x00000002 different_section1
				# CHECK-NEXT: [ 3] 5 Code 0x0000000000500000 0x0000000000000008 0x00000002 different_address1
				# CHECK-NEXT: [ 4] 6 Code 0x0000000000500001 0x0000000000000008 0x00000002 different_address1
				# CHECK-NEXT: [ 5] 7 Code 0x0000000000500000 0x0000000000000008 0x00000002 different_size1
				# CHECK-NEXT: [ 6] 8 Code 0x0000000000500000 0x0000000000000009 0x00000002 different_size1
				# CHECK-EMPTY:

				--- !ELF
				FileHeader:
				Class: ELFCLASS64
				Data: ELFDATA2LSB
				Type: ET_EXEC
				Machine: EM_X86_64
				Entry: 0x0000000000400000
				Sections:
				- Name: .text
				Type: SHT_PROGBITS
				Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
				Address: 0x0000000000400000
				AddressAlign: 0x0000000000000010
				Content: DEADBEEFBAADF00D
				- Name: .text2
				Type: SHT_PROGBITS
				Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
				Address: 0x0000000000500000
				AddressAlign: 0x0000000000000010
				Content: DEADBEEFBAADF00D
				Symbols:
				# Symbols with same addresses, names, sizes but different sections
				- Name: all_same1
				Type: STT_FUNC
				Section: .text
				Value: 0x0000000000500000
				Size: 0x0000000000000008
				# Symbol all_same2 will be renamed to all_same1 and should therefore
				labathUnsubmitted Done Reply Inline Actions smymtab labath: smymtab
				# disappear from the dumped symtab because all fields are equal.
				- Name: all_same2
				Type: STT_FUNC
				Section: .text
				Value: 0x0000000000500000
				Size: 0x0000000000000008
				# Symbols with same addresses, names and sizes but different sections
				- Name: different_section1
				Type: STT_FUNC
				Section: .text
				Value: 0x0000000000500000
				Size: 0x0000000000000008
				# Symbol different_section2 will be renamed to different_section1 and we
				# should see two symbols named different_section1 in the dumped symtab.
				- Name: different_section2
				Type: STT_FUNC
				Section: .text2
				Value: 0x0000000000500000
				Size: 0x0000000000000008
				# Symbols with same names, sizes and sections but different addresses
				- Name: different_address1
				Type: STT_FUNC
				Section: .text
				Value: 0x0000000000500000
				Size: 0x0000000000000008
				# Symbol different_address2 will be renamed to different_address1 and we
				# should see two symbols named different_address1 in the dumped symtab.
				- Name: different_address2
				Type: STT_FUNC
				Section: .text
				Value: 0x0000000000500001
				Size: 0x0000000000000008
				# Symbols with same names, addresses and sections but different sizes
				- Name: different_size1
				Type: STT_FUNC
				Section: .text
				Value: 0x0000000000500000
				Size: 0x0000000000000008
				# Symbol different_size2 will be renamed to different_size1 and we should
				# see two symbols named different_size1 in the dumped symtab.
				- Name: different_size2
				Type: STT_FUNC
				Section: .text
				Value: 0x0000000000500000
				Size: 0x0000000000000009
				...
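To make the expected output easier to check, here is a small C++ model of the deduplication this test exercises. It assumes, as the comments in the YAML say, that each `*2` symbol has been renamed to its `*1` counterpart before LLDB reads the file, and it keys symbols only on the fields that actually vary in this test (name, section, value, size). `TestSymbol`, `MergeSymbolsModel` and `CountUnique` are hypothetical names, not LLDB code.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical model of one symbol from the YAML above, after renaming.
struct TestSymbol {
  std::string name;
  std::string section;
  std::uint64_t value;
  std::uint64_t size;
};

// The eight symbols of the test, in file order, with the *2 names already
// replaced by their *1 counterparts.
std::vector<TestSymbol> MergeSymbolsModel() {
  return {
      {"all_same1", ".text", 0x500000, 8},
      {"all_same1", ".text", 0x500000, 8}, // exact duplicate: dropped
      {"different_section1", ".text", 0x500000, 8},
      {"different_section1", ".text2", 0x500000, 8},
      {"different_address1", ".text", 0x500000, 8},
      {"different_address1", ".text", 0x500001, 8},
      {"different_size1", ".text", 0x500000, 8},
      {"different_size1", ".text", 0x500000, 9},
  };
}

// Counts the symbols that survive when entries equal in every modelled field
// are collapsed into one.
std::size_t CountUnique(const std::vector<TestSymbol> &symbols) {
  std::set<std::tuple<std::string, std::string, std::uint64_t, std::uint64_t>>
      seen;
  for (const TestSymbol &s : symbols)
    seen.insert(std::make_tuple(s.name, s.section, s.value, s.size));
  return seen.size();
}
```

Only the `all_same1` pair collapses, leaving seven entries, which lines up with the seven CHECK lines ([0] to [6]) and the missing UserID 2.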

lldb/lit/Modules/lit.local.cfg

config.suffixes = ['.s', '.test', '.yaml']

config.suffixes = ['.s', '.test', '.yaml', '.c']

lldb/lit/helper/toolchain.py

(120 lines not shown)	def use_support_substitutions(config):
have_lld = llvm_config.use_lld(additional_tool_dirs=additional_tool_dirs,		have_lld = llvm_config.use_lld(additional_tool_dirs=additional_tool_dirs,
required=False)		required=False)
if have_lld:		if have_lld:
config.available_features.add('lld')		config.available_features.add('lld')


support_tools = ['yaml2obj', 'obj2yaml', 'llvm-pdbutil',		support_tools = ['yaml2obj', 'obj2yaml', 'llvm-pdbutil',
'llvm-mc', 'llvm-readobj', 'llvm-objdump',		'llvm-mc', 'llvm-readobj', 'llvm-objdump',
'llvm-objcopy', 'lli']		'llvm-objcopy', 'lli', 'llvm-strip']
additional_tool_dirs += [config.lldb_tools_dir, config.llvm_tools_dir]		additional_tool_dirs += [config.lldb_tools_dir, config.llvm_tools_dir]
llvm_config.add_tool_substitutions(support_tools, additional_tool_dirs)		llvm_config.add_tool_substitutions(support_tools, additional_tool_dirs)

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h

(18 lines not shown)

#ifndef liblldb_ELFHeader_h_		#ifndef liblldb_ELFHeader_h_
#define liblldb_ELFHeader_h_		#define liblldb_ELFHeader_h_

#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"

#include "lldb/lldb-enumerations.h"		#include "lldb/lldb-enumerations.h"
#include "lldb/lldb-types.h"		#include "lldb/lldb-types.h"
		#include "lldb/Utility/ConstString.h"

namespace lldb_private {		namespace lldb_private {
class DataExtractor;		class DataExtractor;
} // End namespace lldb_private.		} // End namespace lldb_private.

namespace elf {		namespace elf {

/// \name ELF type definitions.		/// \name ELF type definitions.
(231 lines not shown)	struct ELFSymbol {
/// True if the ELFSymbol was successfully read and false otherwise.		/// True if the ELFSymbol was successfully read and false otherwise.
bool Parse(const lldb_private::DataExtractor &data, lldb::offset_t *offset);		bool Parse(const lldb_private::DataExtractor &data, lldb::offset_t *offset);

void Dump(lldb_private::Stream *s, uint32_t idx,		void Dump(lldb_private::Stream *s, uint32_t idx,
const lldb_private::DataExtractor *strtab_data,		const lldb_private::DataExtractor *strtab_data,
const lldb_private::SectionList *section_list);		const lldb_private::SectionList *section_list);
};		};

		/// A \c NamedELFSymbol is an ELFSymbol that, alongside its name and section
		/// index, stores the name of the symbol and section as a string and only uses
		/// those for comparison instead of the name or section index which could differ
		jankratochvil: It could be in the same order as `ELFSymbol` fields, as otherwise it is difficult to verify that all the fields are matched here.
		/// depending on which section the symbol is defined in (e.g. .strtab or
		/// .dynstr) or which object file a section belongs to (see .gnu_debugdata).
		struct NamedELFSymbol : ELFSymbol {
		lldb_private::ConstString st_name_string; ///< Actual name of the ELF symbol
		lldb_private::ConstString
		st_section_name_string; ///< Actual name of the section

		NamedELFSymbol(const ELFSymbol &sym, lldb_private::ConstString symbol_name,
		lldb_private::ConstString section_name);
		clayborg: Do we need a ConstString for the st_shndx as well so we can correctly compare a section from one file to a section from another, as in the .gnu_debugdata case?
		kwk: That is a good point.
		kwk: @clayborg in fact I think this could be the reason not to use a set of `lldb_private::Symbol` objects, because there we don't store the section name or symbol name but only addresses or indexes. I did add the `st_section_name_string` struct member.

		/// \returns \c true when all fields (except name and section indexes) of the
		/// right-hand side object are the same as those of this object; otherwise
		/// \c false is returned. We ignore the name and section index in order to
		/// only compare actual name strings and not where strings are located.
		bool operator==(const NamedELFSymbol &rhs) const noexcept;

		/// \returns a combined hash value for this \c NamedELFSymbol over all
		/// struct fields, but ignores the name and section index of the base struct
		/// in order to only compare actual name strings and not where strings are
		/// located.
		std::size_t hash() const noexcept;
		};

/// \class ELFDynamic		/// \class ELFDynamic
/// Represents an entry in an ELF dynamic table.		/// Represents an entry in an ELF dynamic table.
struct ELFDynamic {		struct ELFDynamic {
elf_sxword d_tag; ///< Type of dynamic table entry.		elf_sxword d_tag; ///< Type of dynamic table entry.
union {		union {
elf_xword d_val; ///< Integer value of the table entry.		elf_xword d_val; ///< Integer value of the table entry.
elf_addr d_ptr; ///< Pointer value of the table entry.		elf_addr d_ptr; ///< Pointer value of the table entry.
};		};
(104 lines not shown)	struct ELFRela {
/// relocation.		/// relocation.
static unsigned RelocSymbol64(const ELFRela &rela) {		static unsigned RelocSymbol64(const ELFRela &rela) {
return rela.r_info >> 32;		return rela.r_info >> 32;
}		}
};		};

} // End namespace elf.		} // End namespace elf.

		namespace std {
		/// Define specializations of the std::hash struct for NamedELFSymbol so they
		/// can be used in an std::unordered_set.
		template <> struct hash<elf::NamedELFSymbol> {
		std::size_t operator()(const elf::NamedELFSymbol &s) const noexcept {
		return s.hash();
		}
		};
		} // namespace std

		clayborg: An ELF symbol from one symbol table can have the same name as another yet with a different st_name string table offset. Do we even want to be able to hash an ELFSymbol on its own? Maybe remove this entire function and only hash NamedELFSymbol?
#endif // #ifndef liblldb_ELFHeader_h_		#endif // #ifndef liblldb_ELFHeader_h_
		clayborg: I know the section index will match between symbol tables in the same ELF file, but what about a symbol table in an external file like .gnu_debugdata?
		kwk: I did add the section name to `NamedELFSymbol` and explicitly ignore it when building the hash for the base `ELFSymbol`.
		clayborg: Don't we need to hash everything we care about except the st_name? Those indexes can differ if they come from a different string table. Shouldn't this be:
		std::size_t h1 = std::hash<elf::elf_addr>()(s.st_value);
		std::size_t h2 = std::hash<elf::elf_xword>()(s.st_size);
		// Skip std::size_t h3 = std::hash<elf::elf_word>()(s.st_name);
		std::size_t h4 = std::hash<unsigned char>()(s.st_info);
		std::size_t h5 = std::hash<unsigned char>()(s.st_other);
		std::size_t h6 = std::hash<elf::elf_half>()(s.st_shndx);
		std::size_t h7 = std::hash<const char *>()(s.st_name_string.AsCString());
		return llvm::hash_combine(h1, h2, h4, h5, h6, h7);
		I left the "h" variables with the same names to show we don't want "h3".
		kwk: That's why I always explicitly set the name to 0, which effectively ignores it because it then has no effect on the hash. Please see the hash specialization for `NamedELFSymbol`.
		jankratochvil: I find it better to define `std::hash<ConstString>` (or provide `ConstString::Hasher`, which I do in my DWZ patchset).
		kwk: Once your DWZ patchset arrives, please let me know and we can change it.
		jankratochvil: https://people.redhat.com/jkratoch/ConstStringHasher.patch - Although then the whole UniqueElfSymbolColl should be replaced by the `std::sort`+`std::unique` over a `std::vector` that you plan for a future patch, so the hashing does not matter much.
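The `std::sort`+`std::unique` alternative mentioned above (collect every candidate into a `std::vector`, then deduplicate once instead of probing a hash set per symbol) might look roughly like this. It is a sketch, not the planned patch: `Candidate` and `DedupSortedUnique` are hypothetical, the key mirrors the fields `NamedELFSymbol::operator==` compares, and `std::string` stands in for `ConstString`.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical dedup key mirroring the fields compared by
// NamedELFSymbol::operator==; std::string stands in for ConstString.
struct Candidate {
  std::uint64_t st_value;
  std::uint64_t st_size;
  unsigned char st_info;
  unsigned char st_other;
  std::string name;
  std::string section_name;

  // Tuple of references used for both ordering and equality.
  auto Key() const {
    return std::tie(st_value, st_size, st_info, st_other, name, section_name);
  }
};

// Sorts all candidates (e.g. .symtab entries followed by .dynsym entries) and
// drops adjacent duplicates in one pass. stable_sort keeps the earliest
// occurrence first within each run of equal keys, and std::unique keeps the
// first element of each run.
std::vector<Candidate> DedupSortedUnique(std::vector<Candidate> all) {
  std::stable_sort(all.begin(), all.end(),
                   [](const Candidate &a, const Candidate &b) {
                     return a.Key() < b.Key();
                   });
  auto last = std::unique(all.begin(), all.end(),
                          [](const Candidate &a, const Candidate &b) {
                            return a.Key() == b.Key();
                          });
  all.erase(last, all.end());
  return all;
}
```

One trade-off worth noting: the survivors come out in key order rather than file order, which may matter if anything downstream relies on the order in which symbols were added.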

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp

//===-- ELFHeader.cpp ----------------------------------------- -- C++ --===//		//===-- ELFHeader.cpp ----------------------------------------- -- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include <cstring>		#include <cstring>

#include "lldb/Core/Section.h"		#include "lldb/Core/Section.h"
#include "lldb/Utility/DataExtractor.h"		#include "lldb/Utility/DataExtractor.h"
#include "lldb/Utility/Stream.h"		#include "lldb/Utility/Stream.h"
		#include "llvm/ADT/Hashing.h"

#include "ELFHeader.h"		#include "ELFHeader.h"

using namespace elf;		using namespace elf;
using namespace lldb;		using namespace lldb;
using namespace llvm::ELF;		using namespace llvm::ELF;

// Static utility functions.		// Static utility functions.
(223 lines not shown)	bool ELFSectionHeader::Parse(const lldb_private::DataExtractor &data,

return true;		return true;
}		}

// ELFSymbol		// ELFSymbol

ELFSymbol::ELFSymbol() { memset(this, 0, sizeof(ELFSymbol)); }		ELFSymbol::ELFSymbol() { memset(this, 0, sizeof(ELFSymbol)); }

#define ENUM_TO_CSTR(e) \		#define ENUM_TO_CSTR(e) \
case e: \		case e: \
return #e		return #e

		labath: Is this really needed? It looks like the default compiler-generated copy constructor would be sufficient here.
const char *ELFSymbol::bindingToCString(unsigned char binding) {		const char *ELFSymbol::bindingToCString(unsigned char binding) {
switch (binding) {		switch (binding) {
ENUM_TO_CSTR(STB_LOCAL);		ENUM_TO_CSTR(STB_LOCAL);
ENUM_TO_CSTR(STB_GLOBAL);		ENUM_TO_CSTR(STB_GLOBAL);
ENUM_TO_CSTR(STB_WEAK);		ENUM_TO_CSTR(STB_WEAK);
ENUM_TO_CSTR(STB_LOOS);		ENUM_TO_CSTR(STB_LOOS);
ENUM_TO_CSTR(STB_HIOS);		ENUM_TO_CSTR(STB_HIOS);
ENUM_TO_CSTR(STB_LOPROC);		ENUM_TO_CSTR(STB_LOPROC);
(83 lines not shown)	if (parsing_32) {

// Read st_value and st_size.		// Read st_value and st_size.
if (data.GetU64(offset, &st_value, 2) == nullptr)		if (data.GetU64(offset, &st_value, 2) == nullptr)
return false;		return false;
}		}
return true;		return true;
}		}

		// NamedELFSymbol

		NamedELFSymbol::NamedELFSymbol(const ELFSymbol &other,
		lldb_private::ConstString symbol_name,
		lldb_private::ConstString section_name)
		: ELFSymbol(other), st_name_string(symbol_name),
		st_section_name_string(section_name) {}

		bool NamedELFSymbol::operator==(const NamedELFSymbol &rhs) const noexcept {
		return st_value == rhs.st_value && st_size == rhs.st_size &&
		st_info == rhs.st_info && st_other == rhs.st_other &&
		st_name_string == rhs.st_name_string &&
		st_section_name_string == rhs.st_section_name_string;
		clayborg: I would almost just manually compare only the things we care about here. Again, what about st_shndx when comparing a symbol from the main symbol table and one from the .gnu_debugdata symbol table? Are those section indexes guaranteed to match? I would think not.
		kwk: @clayborg I explicitly only compare what we care about and therefore always set the name index to be the same.
		labath: I'll echo @clayborg here. This business with copying the ELFSymbol and clearing some fields is confusing. Do you even need the ELFSymbol::operator== for anything? If not I'd just delete that, and have the derived version compare all fields. Also, it would be nice if the operator== and hash function definitions were next to each other. Can you just forward declare the std::hash stuff in the header, and have the implementation be next to this?
		kwk:
		> I'll echo @clayborg here. This business with copying the ELFSymbol and clearing some fields is confusing.
		I've cleared up the documentation now and it is exactly the way I like it. Every entity deals with its own business (respects its own fields when comparing). I find it pretty dirty to compare fields from the base struct in a derived one. The way I compare fields from the base struct is minimally invasive.
		> Do you even need the ELFSymbol::operator== for anything?
		Yes, when you want to compare ELFSymbols. I know that I don't do that explicitly, but the function only deals with fields from the entity itself and they don't spill into any derived structure (with the exception of explicitly ignoring fields).
		> If not I'd just delete that, and have the derived version compare all fields.
		No, because I call it explicitly from the derived NamedELFSymbol.
		> Also, it would be nice if the operator== and hash function definitions were next to each other. Can you just forward declare the std::hash stuff in the header, and have the implementation be next to this?
		I've found a compromise that is even more appealing to me. The ELFSymbol and NamedELFSymbol structs now have a `hash` function whose implementation sits next to that of `operator==()`. This `hash` is called in the specialization, which remains in the same location as before; the reason being that I didn't find a way to define something in the `std::` namespace when I'm in the `elf::` namespace.
		labath:
		> Yes, when you want to compare ELFSymbols. I know that I don't do that explicitly, but the function only deals with fields from the entity itself and they don't spill into any derived structure (with the exception of explicitly ignoring fields).
		Yes, but to me that exception kind of invalidates the whole idea. In order to know which fields you need to ignore, you need the knowledge of what fields are present in the struct (and as the fields are public, that is not a big deal), at which point you can just avoid the whole copying business, and explicitly compare the fields that you care about. Given that this also saves a nontrivial amount of code, I still think it's a better way to go. (Also, defining equality operators on class hierarchies is usually not a good idea even if they "nest" nicely, since they can still produce strange results due to object slicing.)
		> I've found a compromise that is even more appealing to me. The ELFSymbol and NamedELFSymbol structs now have a hash function whose implementation sits next to that of operator==().
		That works for me.
		kwk: @labath okay, I've removed the logic from `ELFSymbol` and coded everything directly. I guess I wanted to be able to extend `ELFSymbol` with any number of fields and add them to `ELFSymbol::operator==()` without touching `NamedELFSymbol::operator==()`, as long as the added fields shouldn't be ignored. Makes sense? I guess you can find arguments for both ways to implement it. Anyway, I've coded it the way you want now, I hope.
		}

		std::size_t NamedELFSymbol::hash() const noexcept {
		// ignore the name and section index when hashing the ELFSymbol
		return llvm::hash_combine(st_value, st_size, st_info, st_other,
		st_name_string.AsCString(),
		st_section_name_string.AsCString());
		}

// ELFProgramHeader		// ELFProgramHeader
		jankratochvil: llvm::hash_combine already calls std::hash<T> for each of its parameters.
		kwk: Good to know. Thank you.

ELFProgramHeader::ELFProgramHeader() {		ELFProgramHeader::ELFProgramHeader() {
memset(this, 0, sizeof(ELFProgramHeader));		memset(this, 0, sizeof(ELFProgramHeader));
}		}

bool ELFProgramHeader::Parse(const lldb_private::DataExtractor &data,		bool ELFProgramHeader::Parse(const lldb_private::DataExtractor &data,
lldb::offset_t *offset) {		lldb::offset_t *offset) {
const uint32_t byte_size = data.GetAddressByteSize();		const uint32_t byte_size = data.GetAddressByteSize();
(71 lines not shown)
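The arrangement the review settled on, where `operator==` and a `hash()` member cover exactly the same fields and sit side by side while the `std::hash` specialization merely forwards to `hash()`, can be sketched outside LLDB as below. `SymbolKey`, `Intern` and `HashCombine` are hypothetical. `Intern` mimics the property of `ConstString` that equal strings share one pooled pointer, which is what makes comparing and hashing the `const char *` consistent; `HashCombine` is a boost-style stand-in, not the algorithm behind `llvm::hash_combine`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical string pool: equal strings map to one stable pointer.
// unordered_set nodes never move, so c_str() stays valid.
const char *Intern(const std::string &s) {
  static std::unordered_set<std::string> pool;
  return pool.insert(s).first->c_str();
}

// Boost-style hash mixing; a stand-in for llvm::hash_combine.
inline void HashCombine(std::size_t &seed, std::size_t value) {
  seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

struct SymbolKey {
  std::uint64_t st_value;
  std::uint64_t st_size;
  unsigned char st_info;
  unsigned char st_other;
  const char *name;         // interned, like ConstString
  const char *section_name; // interned, like ConstString

  // Same fields as hash(); string-table offsets and section indexes are
  // deliberately not part of the key.
  bool operator==(const SymbolKey &rhs) const noexcept {
    return st_value == rhs.st_value && st_size == rhs.st_size &&
           st_info == rhs.st_info && st_other == rhs.st_other &&
           name == rhs.name && section_name == rhs.section_name;
  }

  std::size_t hash() const noexcept {
    std::size_t seed = 0;
    HashCombine(seed, std::hash<std::uint64_t>()(st_value));
    HashCombine(seed, std::hash<std::uint64_t>()(st_size));
    HashCombine(seed, std::hash<unsigned char>()(st_info));
    HashCombine(seed, std::hash<unsigned char>()(st_other));
    HashCombine(seed, std::hash<const char *>()(name));
    HashCombine(seed, std::hash<const char *>()(section_name));
    return seed;
  }
};

// The specialization lives in namespace std and only forwards.
namespace std {
template <> struct hash<SymbolKey> {
  std::size_t operator()(const SymbolKey &k) const noexcept { return k.hash(); }
};
} // namespace std
```

Keeping both definitions adjacent makes it easy to check the invariant an unordered container relies on: objects that compare equal must hash equally.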

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h

//===-- ObjectFileELF.h --------------------------------------- -- C++ --===//		//===-- ObjectFileELF.h --------------------------------------- -- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef liblldb_ObjectFileELF_h_		#ifndef liblldb_ObjectFileELF_h_
#define liblldb_ObjectFileELF_h_		#define liblldb_ObjectFileELF_h_

#include <stdint.h>		#include <stdint.h>

#include <vector>		#include <vector>
		#include <unordered_set>

#include "lldb/Symbol/ObjectFile.h"		#include "lldb/Symbol/ObjectFile.h"
#include "lldb/Utility/ArchSpec.h"		#include "lldb/Utility/ArchSpec.h"
#include "lldb/Utility/FileSpec.h"		#include "lldb/Utility/FileSpec.h"
#include "lldb/Utility/UUID.h"		#include "lldb/Utility/UUID.h"
#include "lldb/lldb-private.h"		#include "lldb/lldb-private.h"

#include "ELFHeader.h"		#include "ELFHeader.h"
(156 lines not shown)	private:
typedef SectionHeaderColl::const_iterator SectionHeaderCollConstIter;		typedef SectionHeaderColl::const_iterator SectionHeaderCollConstIter;

typedef std::vector<elf::ELFDynamic> DynamicSymbolColl;		typedef std::vector<elf::ELFDynamic> DynamicSymbolColl;
typedef DynamicSymbolColl::iterator DynamicSymbolCollIter;		typedef DynamicSymbolColl::iterator DynamicSymbolCollIter;
typedef DynamicSymbolColl::const_iterator DynamicSymbolCollConstIter;		typedef DynamicSymbolColl::const_iterator DynamicSymbolCollConstIter;

typedef std::map<lldb::addr_t, lldb_private::AddressClass>		typedef std::map<lldb::addr_t, lldb_private::AddressClass>
FileAddressToAddressClassMap;		FileAddressToAddressClassMap;

		typedef std::unordered_set<elf::NamedELFSymbol> UniqueElfSymbolColl;

/// Version of this reader common to all plugins based on this class.		/// Version of this reader common to all plugins based on this class.
static const uint32_t m_plugin_version = 1;		static const uint32_t m_plugin_version = 1;
static const uint32_t g_core_uuid_magic;		static const uint32_t g_core_uuid_magic;

/// ELF file header.		/// ELF file header.
elf::ELFHeader m_header;		elf::ELFHeader m_header;

/// ELF build ID.		/// ELF build ID.
(77 lines not shown)	private:
/// vector retains the order as found in the object file. Returns the		/// vector retains the order as found in the object file. Returns the
/// number of dynamic symbols parsed.		/// number of dynamic symbols parsed.
size_t ParseDynamicSymbols();		size_t ParseDynamicSymbols();

/// Populates m_symtab_up with all non-dynamic linker symbols. This method		/// Populates m_symtab_up with all non-dynamic linker symbols. This method
/// will parse the symbols only once. Returns the number of symbols parsed.		/// will parse the symbols only once. Returns the number of symbols parsed.
unsigned ParseSymbolTable(lldb_private::Symtab *symbol_table,		unsigned ParseSymbolTable(lldb_private::Symtab *symbol_table,
lldb::user_id_t start_id,		lldb::user_id_t start_id,
lldb_private::Section *symtab);		lldb_private::Section *symtab,
		UniqueElfSymbolColl &unique_elf_symbols);

/// Helper routine for ParseSymbolTable().		/// Helper routine for ParseSymbolTable().
unsigned ParseSymbols(lldb_private::Symtab *symbol_table,		unsigned ParseSymbols(lldb_private::Symtab *symbol_table,
lldb::user_id_t start_id,		lldb::user_id_t start_id,
lldb_private::SectionList *section_list,		lldb_private::SectionList *section_list,
const size_t num_symbols,		const size_t num_symbols,
const lldb_private::DataExtractor &symtab_data,		const lldb_private::DataExtractor &symtab_data,
const lldb_private::DataExtractor &strtab_data);		const lldb_private::DataExtractor &strtab_data,
		UniqueElfSymbolColl &unique_elf_symbols);

/// Scans the relocation entries and adds a set of artificial symbols to the		/// Scans the relocation entries and adds a set of artificial symbols to the
/// given symbol table for each PLT slot. Returns the number of symbols		/// given symbol table for each PLT slot. Returns the number of symbols
/// added.		/// added.
unsigned ParseTrampolineSymbols(lldb_private::Symtab *symbol_table,		unsigned ParseTrampolineSymbols(lldb_private::Symtab *symbol_table,
lldb::user_id_t start_id,		lldb::user_id_t start_id,
const ELFSectionHeaderInfo *rela_hdr,		const ELFSectionHeaderInfo *rela_hdr,
lldb::user_id_t section_id);		lldb::user_id_t section_id);
(91 lines not shown)
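The header changes thread a single `UniqueElfSymbolColl` through `ParseSymbolTable` and `ParseSymbols`, so whichever table is parsed second (for example `.dynsym` after `.symtab`) only contributes symbols the first one did not already provide. The sketch below models that flow with simplified, hypothetical types (`Sym`, `UniqueSet`, `ParseTable`) and uses an ordered `std::set` of tuples to avoid writing a hash; unlike `ParseSymbols`, it returns the number of symbols actually added rather than the loop index, which is the distinction jankratochvil raised in the review.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified symbol: resolved name, section name and address.
struct Sym {
  std::string name;
  std::string section;
  std::uint64_t value;
};

// Plays the role of UniqueElfSymbolColl; shared across both tables.
using UniqueSet = std::set<std::tuple<std::string, std::string, std::uint64_t>>;

// Appends the symbols of one table to `out`, skipping any whose key is
// already in `unique`. Returns how many symbols were actually added.
std::size_t ParseTable(const std::vector<Sym> &table, UniqueSet &unique,
                       std::vector<Sym> &out) {
  std::size_t added = 0;
  for (const Sym &s : table) {
    // insert(...).second is true only for a new key, so the set is probed
    // once per symbol (the pattern labath suggested).
    if (unique.insert(std::make_tuple(s.name, s.section, s.value)).second) {
      out.push_back(s);
      ++added;
    }
  }
  return added;
}
```

Calling `ParseTable` first for `.symtab` and then for `.dynsym` with the same `UniqueSet` adds each symbol once, which matches the minidebuginfo scenario from the summary, where the embedded `.symtab` deliberately omits symbols already present in `.dynsym`.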

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp

(33 lines not shown)
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/Object/Decompressor.h"		#include "llvm/Object/Decompressor.h"
#include "llvm/Support/ARMBuildAttributes.h"		#include "llvm/Support/ARMBuildAttributes.h"
#include "llvm/Support/JamCRC.h"		#include "llvm/Support/JamCRC.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/MipsABIFlags.h"		#include "llvm/Support/MipsABIFlags.h"

		jankratochvil: Is it really needed?
		kwk: removed.
#define CASE_AND_STREAM(s, def, width) \		#define CASE_AND_STREAM(s, def, width) \
case def: \		case def: \
s->Printf("%-*s", width, #def); \		s->Printf("%-*s", width, #def); \
break;		break;

using namespace lldb;		using namespace lldb;
using namespace lldb_private;		using namespace lldb_private;
using namespace elf;		using namespace elf;
(1,815 lines not shown)
#define STO_MICROMIPS (2 << 6)		#define STO_MICROMIPS (2 << 6)
#define IS_MICROMIPS(ST_OTHER) (((ST_OTHER)&STO_MIPS_ISA) == STO_MICROMIPS)		#define IS_MICROMIPS(ST_OTHER) (((ST_OTHER)&STO_MIPS_ISA) == STO_MICROMIPS)

// private		// private
unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,		unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,
SectionList *section_list,		SectionList *section_list,
const size_t num_symbols,		const size_t num_symbols,
const DataExtractor &symtab_data,		const DataExtractor &symtab_data,
const DataExtractor &strtab_data) {		const DataExtractor &strtab_data,
		UniqueElfSymbolColl &unique_elf_symbols) {
ELFSymbol symbol;		ELFSymbol symbol;
lldb::offset_t offset = 0;		lldb::offset_t offset = 0;

static ConstString text_section_name(".text");		static ConstString text_section_name(".text");
static ConstString init_section_name(".init");		static ConstString init_section_name(".init");
static ConstString fini_section_name(".fini");		static ConstString fini_section_name(".fini");
static ConstString ctors_section_name(".ctors");		static ConstString ctors_section_name(".ctors");
static ConstString dtors_section_name(".dtors");		static ConstString dtors_section_name(".dtors");
(26 lines not shown)	unsigned ObjectFileELF::ParseSymbols(Symtab *symtab, user_id_t start_id,

// Local cache to avoid doing a FindSectionByName for each symbol. The "const		// Local cache to avoid doing a FindSectionByName for each symbol. The "const
// char*" key must come from a ConstString object so they can be compared by		// char*" key must come from a ConstString object so they can be compared by
// pointer		// pointer
std::unordered_map<const char *, lldb::SectionSP> section_name_to_section;		std::unordered_map<const char *, lldb::SectionSP> section_name_to_section;

unsigned i;		unsigned i;
for (i = 0; i < num_symbols; ++i) {		for (i = 0; i < num_symbols; ++i) {
if (!symbol.Parse(symtab_data, &offset))		if (!symbol.Parse(symtab_data, &offset))
break;		break;
		labath: delete

const char *symbol_name = strtab_data.PeekCStr(symbol.st_name);		const char *symbol_name = strtab_data.PeekCStr(symbol.st_name);
if (!symbol_name)		if (!symbol_name)
symbol_name = "";		symbol_name = "";

// No need to add non-section symbols that have no names		// No need to add non-section symbols that have no names
		jankratochvil: also delete
if (symbol.getType() != STT_SECTION &&		if (symbol.getType() != STT_SECTION &&
(symbol_name == nullptr || symbol_name[0] == '\0'))		(symbol_name == nullptr || symbol_name[0] == '\0'))
continue;		continue;

// Skipping oatdata and oatexec sections if it is requested. See details		// Skipping oatdata and oatexec sections if it is requested. See details
// above the definition of skip_oatdata_oatexec for the reasons.		// above the definition of skip_oatdata_oatexec for the reasons.
if (skip_oatdata_oatexec && (::strcmp(symbol_name, "oatdata") == 0 ||		if (skip_oatdata_oatexec && (::strcmp(symbol_name, "oatdata") == 0 ||
::strcmp(symbol_name, "oatexec") == 0))		::strcmp(symbol_name, "oatexec") == 0))
(258 lines not shown)	Symbol dc_symbol(
false, // Is this symbol artificial?		false, // Is this symbol artificial?
AddressRange(symbol_section_sp, // Section in which this symbol is		AddressRange(symbol_section_sp, // Section in which this symbol is
// defined or null.		// defined or null.
symbol_value, // Offset in section or symbol value.		symbol_value, // Offset in section or symbol value.
symbol.st_size), // Size in bytes of this symbol.		symbol.st_size), // Size in bytes of this symbol.
symbol_size_valid, // Symbol size is valid		symbol_size_valid, // Symbol size is valid
has_suffix, // Contains linker annotations?		has_suffix, // Contains linker annotations?
flags); // Symbol flags.		flags); // Symbol flags.

		NamedELFSymbol needle(symbol, ConstString(symbol_ref),
		symbol_section_sp.get() ? symbol_section_sp->GetName()
		: ConstString());
		if (unique_elf_symbols.insert(needle).second) {
		labath: Move this into the NamedELFSymbol constructor?
symtab->AddSymbol(dc_symbol);		symtab->AddSymbol(dc_symbol);
		jankratochvil: What if the symbol is ignored? The function will then incorrectly return a number of added symbols even when they were not added, won't it?
		kwk: @jankratochvil we already have places inside this `for` loop where we `continue`. I hope it is okay to ask the same question back for those `continue` places: why don't we adjust the returned number (`i`) in case symbols were skipped?
		clayborg: Do we even need NamedELFSymbol? Can we just make an unordered_set of lldb_private::Symbol values?
		kwk: @clayborg I find it much easier with `NamedELFSymbol` because all we have to do is derive from `ELFSymbol` and add the strings for the symbol name and the section name. If we were to use `lldb_private::Symbol` I would have to look up the symbols manually each time I calculate a hash, which seems bad. I mean, the symbol and section names already are `ConstString`s and should be stored and computed very efficiently. Also I wanted to keep things local to ELF and not mess with everything that uses `lldb_private::Symbol`. Makes sense?
		labath: something like `if (unique_elf_symbols.insert(needle).second)` would be more efficient, as you don't need to mess with the map twice.
		jankratochvil: I would unconditionally add all the symbols to a `std::vector<NamedELFSymbol> unique_elf_symbols` (calling `unique_elf_symbols.reserve` in advance with the sum of the `.symtab`+`.dynsym` sizes), then, after processing both `.symtab` and `.dynsym`, `llvm::sort(unique_elf_symbols)` and add to `symtab` only those which are unique. I believe it will be much faster; one could benchmark it.
		labath: sounds like a good idea.
		kwk: @jankratochvil @labath yes, this sounds like a good idea for performance improvement, but honestly I need to get this patch done first in order to make any progress with minidebuginfo. I hope you don't mind if I leave this task to another patch, okay?
		labath: I don't think that kind of reasoning really applies here, since it's this patch that is introducing the potential performance problem. However, I don't think that is going to be a big deal, so I think we can leave this out for now.
		kwk: Okay, thank you for allowing me to leave it out for now.
}		}
		}
return i;		return i;
}		}
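The deduplication in the hunk above, together with labath's single-lookup `insert(...).second` idiom, can be sketched in isolation. This is a minimal sketch and not the patch's code. `NamedSymbolSketch`, `NamedSymbolSketchHash`, `UniqueSymbolSetSketch`, and `AddUniqueSymbols` are hypothetical stand-ins for `NamedELFSymbol`, `UniqueElfSymbolColl`, and the loop in `ParseSymbols`. The choice of fields that make two symbols "equal" is also an assumption. The function returns the number of symbols actually added, which is one possible answer to jankratochvil's question about the return value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-in for an ELF symbol plus the two strings
// that NamedELFSymbol adds (symbol name and section name). The real patch
// derives from ELFSymbol and stores ConstStrings instead of std::string.
struct NamedSymbolSketch {
  uint64_t value;
  uint64_t size;
  unsigned char info; // binding and type, as in st_info
  std::string name;
  std::string section_name;

  bool operator==(const NamedSymbolSketch &rhs) const {
    return value == rhs.value && size == rhs.size && info == rhs.info &&
           name == rhs.name && section_name == rhs.section_name;
  }
};

// Combines the member hashes (boost-style hash_combine).
struct NamedSymbolSketchHash {
  std::size_t operator()(const NamedSymbolSketch &s) const {
    std::size_t h = std::hash<uint64_t>()(s.value);
    auto mix = [&h](std::size_t v) {
      h ^= v + 0x9e3779b9 + (h << 6) + (h >> 2);
    };
    mix(std::hash<uint64_t>()(s.size));
    mix(std::hash<unsigned char>()(s.info));
    mix(std::hash<std::string>()(s.name));
    mix(std::hash<std::string>()(s.section_name));
    return h;
  }
};

using UniqueSymbolSetSketch =
    std::unordered_set<NamedSymbolSketch, NamedSymbolSketchHash>;

// Appends each symbol from `parsed` to `out` unless an identical symbol was
// seen before (possibly while parsing another section), and returns how many
// symbols were actually added.
std::size_t AddUniqueSymbols(const std::vector<NamedSymbolSketch> &parsed,
                             UniqueSymbolSetSketch &seen,
                             std::vector<NamedSymbolSketch> &out) {
  std::size_t added = 0;
  for (const NamedSymbolSketch &sym : parsed) {
    // insert() returns {iterator, bool}; the bool is false for duplicates,
    // so one hash lookup both tests for and records the symbol.
    if (seen.insert(sym).second) {
      out.push_back(sym);
      ++added;
    }
  }
  return added;
}
```

Whether the caller should receive this count or the raw loop index `i` depends on how `symbol_id` is meant to advance: `GetSymtab` increments it by the return value of `ParseSymbolTable`. The thread leaves that question open.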

-unsigned ObjectFileELF::ParseSymbolTable(Symtab *symbol_table,
-                                         user_id_t start_id,
-                                         lldb_private::Section *symtab) {
+unsigned
+ObjectFileELF::ParseSymbolTable(Symtab *symbol_table, user_id_t start_id,
+                                lldb_private::Section *symtab,
+                                UniqueElfSymbolColl &unique_elf_symbols) {
   if (symtab->GetObjectFile() != this) {
     // If the symbol table section is owned by a different object file, have it
     // do the parsing.
     ObjectFileELF *obj_file_elf =
         static_cast<ObjectFileELF *>(symtab->GetObjectFile());
-    return obj_file_elf->ParseSymbolTable(symbol_table, start_id, symtab);
+    return obj_file_elf->ParseSymbolTable(symbol_table, start_id, symtab,
+                                          unique_elf_symbols);
   }

   // Get section list for this object file.
   SectionList *section_list = m_sections_up.get();
   if (!section_list)
     return 0;

   user_id_t symtab_id = symtab->GetID();
@@ (11 unchanged lines not shown) @@ if (symtab && strtab) {

     DataExtractor symtab_data;
     DataExtractor strtab_data;
     if (ReadSectionData(symtab, symtab_data) &&
         ReadSectionData(strtab, strtab_data)) {
       size_t num_symbols = symtab_data.GetByteSize() / symtab_hdr->sh_entsize;

       return ParseSymbols(symbol_table, start_id, section_list, num_symbols,
-                          symtab_data, strtab_data);
+                          symtab_data, strtab_data, unique_elf_symbols);
     }
   }

   return 0;
 }
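jankratochvil's alternative to the per-insert hash set can be sketched as follows. The approach collects every candidate from `.symtab` and `.dynsym` into one vector reserved for their combined size, sorts once, and then keeps only unique entries. This is an illustration under assumptions. `SymbolRecordSketch`, `KeyLess`, `KeyEqual`, and `MergeSymbolTables` are hypothetical names, and the key fields are a guess at what identifies a duplicate. `std::sort` stands in for `llvm::sort` so the block compiles without LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical simplified symbol record; the real NamedELFSymbol carries more
// ELF fields and uses ConstString for the two names.
struct SymbolRecordSketch {
  std::string name;
  std::string section_name;
  uint64_t value;
  uint64_t size;
};

// Ordering and equality use the same key, so std::unique removes exactly the
// duplicates that std::sort placed next to each other.
static bool KeyLess(const SymbolRecordSketch &a, const SymbolRecordSketch &b) {
  return std::tie(a.name, a.section_name, a.value, a.size) <
         std::tie(b.name, b.section_name, b.value, b.size);
}

static bool KeyEqual(const SymbolRecordSketch &a, const SymbolRecordSketch &b) {
  return std::tie(a.name, a.section_name, a.value, a.size) ==
         std::tie(b.name, b.section_name, b.value, b.size);
}

// Collects all candidates first, then sorts once and drops adjacent
// duplicates. The result is ordered by key, not by section order.
std::vector<SymbolRecordSketch>
MergeSymbolTables(const std::vector<SymbolRecordSketch> &symtab,
                  const std::vector<SymbolRecordSketch> &dynsym) {
  std::vector<SymbolRecordSketch> all;
  all.reserve(symtab.size() + dynsym.size());
  all.insert(all.end(), symtab.begin(), symtab.end());
  all.insert(all.end(), dynsym.begin(), dynsym.end());

  std::sort(all.begin(), all.end(), KeyLess);
  all.erase(std::unique(all.begin(), all.end(), KeyEqual), all.end());
  return all;
}
```

The sketch needs only the vector; no separate set is required for the unification step. The trade-off is that the merged list comes out sorted by key rather than in section order. jankratochvil points out that `Symtab::InitAddressIndexes` already sorts the Symtab, which makes it a candidate place for this pass.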

 size_t ObjectFileELF::ParseDynamicSymbols() {
   if (m_dynamic_symbols.size())
@@ (151 unchanged lines not shown) @@ Symbol jump_symbol(
         true,           // Is this symbol a trampoline?
         true,           // Is this symbol artificial?
         plt_section_sp, // Section in which this symbol is defined or null.
         plt_index,      // Offset in section or symbol value.
         plt_entsize,    // Size in bytes of this symbol.
         true,           // Size is valid
         false,          // Contains linker annotations?
         0);             // Symbol flags.

     symbol_table->AddSymbol(jump_symbol);
   }

   return i;
 }

 unsigned
 ObjectFileELF::ParseTrampolineSymbols(Symtab *symbol_table, user_id_t start_id,
@@ (202 unchanged lines not shown) @@ if (GetData(rel->GetFileOffset(), rel->GetFileSize(), rel_data) &&
     ApplyRelocations(thetab, &m_header, rel_hdr, symtab_hdr, debug_hdr,
                      rel_data, symtab_data, debug_data, debug);
   }

   return 0;
 }

 Symtab *ObjectFileELF::GetSymtab() {
   ModuleSP module_sp(GetModule());
labath: delete
kwk (Author): Sorry, I noticed this myself as well. Some of the performance experiments spilled over into this patch.
   if (!module_sp)
     return nullptr;

   // We always want to use the main object file so we (hopefully) only have one
   // cached copy of our symtab, dynamic sections, etc.
   ObjectFile *module_obj_file = module_sp->GetObjectFile();
   if (module_obj_file && module_obj_file != this)
     return module_obj_file->GetSymtab();

   if (m_symtab_up == nullptr) {
     SectionList *section_list = module_sp->GetSectionList();
     if (!section_list)
labath: What's wrong with the old-fashioned `UniqueElfSymbolColl unique_elf_symbols;`?
kwk (Author): Apparently nothing now :). Before, I had trouble because of some deleted default constructor. Thanks for spotting this and bringing it back to my attention.
       return nullptr;

     uint64_t symbol_id = 0;
     std::lock_guard<std::recursive_mutex> guard(module_sp->GetMutex());

     // Sharable objects and dynamic executables usually have 2 distinct symbol
JDevlieghere: Can you motivate the need for this change? This comment seems to suggest that reading the symtab table should be sufficient, as it should contain all the information from the dynsym. If that is not true, it would be worth updating this comment.
     // tables, one named ".symtab", and the other ".dynsym". The dynsym is a
-    // smaller version of the symtab that only contains global symbols. The
-    // information found in the dynsym is therefore also found in the symtab,
-    // while the reverse is not necessarily true.
+    // smaller version of the symtab that only contains global symbols.
+    // Information in the dynsym section is usually also found in the symtab,
+    // but this is not required as symtab entries can be removed after linking.
+    // The minidebuginfo format makes use of this facility to create smaller
+    // symbol tables.
labath: How about we make this less layered, and rephrase the existing comment a bit: "Information in the dynsym section is usually also found in the symtab, but this is not required as symtab entries can be removed after linking. The minidebuginfo format makes use of this facility to create smaller symbol tables."
     Section *symtab =
         section_list->FindSectionByType(eSectionTypeELFSymbolTable, true).get();
-    if (!symtab) {
-      // The symtab section is non-allocable and can be stripped, so if it
-      // doesn't exist then use the dynsym section which should always be
-      // there.
-      symtab =
-          section_list->FindSectionByType(eSectionTypeELFDynamicSymbols, true)
-              .get();
-    }
+    // A unique set of ELF symbols added to the symtab
+    UniqueElfSymbolColl unique_elf_symbols;
jankratochvil: For the planned rework of the unification of symbols, it could (I think) be put into `Symtab::InitAddressIndexes`, which already sorts the Symtab anyway.
kwk (Author): Honestly, for the rework I am thinking about not using an `std::vector` as you proposed, but instead creating the `std::unordered_set` with a bucket count equal to the number of symbols in `.symtab` and in `.dynsym`. Then inserts into that set will be constant-time, as they are for the vector. But let's see how it goes in a follow-up patch. I have the feeling that if I use the approach you suggested, I need to keep a vector and a set around: the vector for collecting all symbols and the set for doing the unification, no? Anyway, let's postpone this.
     if (symtab) {
       m_symtab_up.reset(new Symtab(symtab->GetObjectFile()));
-      symbol_id += ParseSymbolTable(m_symtab_up.get(), symbol_id, symtab);
+      symbol_id += ParseSymbolTable(m_symtab_up.get(), symbol_id, symtab,
+                                    unique_elf_symbols);
+    }
+
+    // The symtab section is non-allocable and can be stripped, while the dynsym
+    // section which should always be always be there. If both exist we load
+    // both to support the minidebuginfo case. Otherwise we just load the dynsym
+    // section.
+    Section *dynsym =
+        section_list->FindSectionByType(eSectionTypeELFDynamicSymbols, true)
+            .get();
+    if (dynsym) {
+      if (!m_symtab_up)
+        m_symtab_up.reset(new Symtab(dynsym->GetObjectFile()));
+      symbol_id += ParseSymbolTable(m_symtab_up.get(), symbol_id, dynsym,
+                                    unique_elf_symbols);
     }

     // DT_JMPREL
JDevlieghere: Why did you remove the last part of the original comment? This seemed to be the most useful part... The newly added sentences explain what we are doing (which is relatively clear from the code). I'd rather see a comment explaining "why" something needs to happen.
JDevlieghere: > The symtab section is non-allocable and can be stripped, while the dynsym section which should always be always be there. If both exist we load both to support the minidebuginfo case. Otherwise we just load the dynsym section.
     // If present, this entry's d_ptr member holds the address of
     // relocation
     // entries associated solely with the procedure linkage table.
     // Separating
     // these relocation entries lets the dynamic linker ignore them during
     // process initialization, if lazy binding is enabled. If this entry is
     // present, the related entries of types DT_PLTRELSZ and DT_PLTREL must
     // also be present.
     const ELFDynamic *symbol = FindDynamicSymbol(DT_JMPREL);
     if (symbol) {
       // Synthesize trampoline symbols to help navigate the PLT.
labath: I wouldn't bother with this. You can just unconditionally create a Symtab object before you start parsing any symbol tables.
kwk (Author): I don't fully agree that it is that simple, because further down in the code we do check for `if (m_symtab_up == nullptr)` and that is a condition I need to respect because of relocation, don't I?
labath: Well... I'm pretty sure you could delete those null checks too. But, given that these null checks seem to be the prevailing pattern in this function, changing that might be better left for a separate patch...
       addr_t addr = symbol->d_ptr;
       Section *reloc_section =
           section_list->FindSectionContainingFileAddress(addr).get();
       if (reloc_section) {
         user_id_t reloc_id = reloc_section->GetID();
         const ELFSectionHeaderInfo *reloc_header =
             GetSectionHeaderByIndex(reloc_id);
         assert(reloc_header);
@@ (remaining 596 unchanged lines not shown) @@
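The loading order in the new `GetSymtab` can be sketched with hypothetical stand-in types (`SectionSketch`, `SymtabSketch`, `ParseInto`, `BuildSymtab`). The function parses `.symtab` if present, then `.dynsym` if present, and both passes share one set of already-seen symbols. Symbols are reduced to plain string keys here. Two details come from the review thread rather than from the patch. First, the table is created unconditionally, as labath suggests; the patch itself keeps the `if (!m_symtab_up)` check. Second, the set is pre-sized for both sections, in the spirit of kwk's bucket-count idea.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins; only the control flow mirrors the patch.
struct SectionSketch {
  std::vector<std::string> symbols; // already-decoded symbol keys
};

struct SymtabSketch {
  std::vector<std::string> symbols;
};

// Parses one symbol section into `table`, skipping keys already in `seen`.
static void ParseInto(const SectionSketch &section, SymtabSketch &table,
                      std::unordered_set<std::string> &seen) {
  for (const std::string &key : section.symbols)
    if (seen.insert(key).second)
      table.symbols.push_back(key);
}

// Either pointer may be null: .symtab is non-allocable and can be stripped,
// and a statically linked binary may lack .dynsym.
std::unique_ptr<SymtabSketch> BuildSymtab(const SectionSketch *symtab,
                                          const SectionSketch *dynsym) {
  // Creating the table up front removes the need for a null check before
  // parsing .dynsym.
  std::unique_ptr<SymtabSketch> table(new SymtabSketch());
  std::unordered_set<std::string> seen;
  // reserve() sizes the buckets for both sections so inserts do not rehash.
  seen.reserve((symtab ? symtab->symbols.size() : 0) +
               (dynsym ? dynsym->symbols.size() : 0));
  if (symtab)
    ParseInto(*symtab, *table, seen);
  if (dynsym)
    ParseInto(*dynsym, *table, seen);
  return table;
}
```

With this shape, a binary whose `.symtab` was stripped still gets its `.dynsym` symbols. A minidebuginfo-style `.symtab` that omits the dynamic symbols is completed by them, and symbols listed in both tables are not added twice.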

Revision Contents

Diff 222634

lldb/lit/Modules/ELF/load-from-dynsym-alone.c

lldb/lit/Modules/ELF/load-symtab-and-dynsym.c

lldb/lit/Modules/ELF/merge-symbols.yaml

lldb/lit/Modules/lit.local.cfg

lldb/lit/helper/toolchain.py

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.h

lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
