Page MenuHomePhabricator

[llvm-readobj/llvm-readelf]: Dump stack-size entry for all function symbols that have a particular value (fixes PR43245)
Needs ReviewPublic

Authored by wolfgangp on Sep 24 2019, 5:13 PM.

Details

Summary

It is possible that several function symbols have the same value, for example as a result of linker optimizations such as collapsing of identical functions into one function body. See PR43245.

This patch fixes readelf/readobj in the following way:
for each stack size entry, dump all functions (not just the first one) with the specified address.

(Note: as part of collapsing the function bodies, the linker has also collapsed the stack size entries).

Diff Detail

Event Timeline

wolfgangp created this revision.Sep 24 2019, 5:13 PM
jhenderson added inline comments.
tools/llvm-readobj/ELFDumper.cpp
4758–4760

I think it would be good if we could avoid duplicating this reportWarning call (especially the text part). Perhaps worth a small lambda, or at least pulling the message into a StringRef?

4761

Whilst not strictly necessary, it might be good to return here. I think that would be less likely to run into surprises due to future code changes below it.

grimar added inline comments.Sep 25 2019, 1:47 AM
tools/llvm-readobj/ELFDumper.cpp
4762

I'd add a early return here:

if (FuncSyms.empty()) {
  reportWarning(
      createError("could not identify function symbol for stack size entry"),
      Obj->getFileName());
  printStackSizeEntry(StackSize, "?");
  return;
}
4779

Perhaps inline FuncName? It would remove one string assignment for the normal case:

for (const SymbolRef &FuncSym : FuncSyms) {
  // A valid SymbolRef has a non-null object file pointer.
  if (FuncSym.BasicSymbolRef::getObject()) {
    // Extract the symbol name.
    Expected<StringRef> FuncNameOrErr = FuncSym.getName();
    if (FuncNameOrErr)
      printStackSizeEntry(StackSize, maybeDemangle(*FuncNameOrErr));
    else
      consumeError(FuncNameOrErr.takeError());
    continue;
  }
  reportWarning(
      createError("could not identify function symbol for stack size entry"),
      Obj->getFileName());
  printStackSizeEntry(StackSize, "?");
}
grimar added inline comments.Sep 25 2019, 1:51 AM
tools/llvm-readobj/ELFDumper.cpp
4779

Ah, my suggestion will not work for a case when getName() returns an error. Please ignore this comment.

One issue that just occurred to me, is what happens if you still have the stack size entry in this case? Something like:

<snip>
Sections:
  - Name: .stack_sizes
    Type: SHT_PROGBITS
    Entries:
      - Address: 0x10
        Size: 32
      - Address: 0x10
        Size: 32
Symbols:
  - Name: foo
    Type: STT_FUNC
    Value: 0x10
  - Name: baz
    Type: STT_FUNC
    Value: 0x10

I might be missing something in the code, but I think your change will result in the stack size entries effectively being printed twice per symbol:

# Printing for first entry:
32 foo
32 baz
# Printing for second entry:
32 foo
32 baz

I think this is undesirable. Perhaps you need to track whether a stack size entry for a given address has been printed yet, and skip it if it has?

It is possible that several function symbols have the same value, for example as a result of linker optimizations such as collapsing of identical functions into one function body. See PR43245.

Yes. A simple reproduce:

clang -ffunction-sections -fstack-size-section a.c -c
ld.lld --icf=all a.o -o a

Another type of cases is function aliases (__attribute__((alias(foo)))).

This patch fixes readelf/readobj to dump the stack size for all such functions, not just the first one found in the symbol table.

It may be clearer to restate this: for each stack size entry, dump all functions (not just the first one) with the specified address.

test/tools/llvm-readobj/stack-sizes.test
177

Add another entry with Type: STT_NOTYPE to demonstrate that a non-function symbol is not dumped.

MaskRay added inline comments.Sep 25 2019, 2:27 AM
tools/llvm-readobj/ELFDumper.cpp
4770

if (Expected<StringRef> FuncNameOrErr = FuncSym.getName())

One issue that just occurred to me, is what happens if you still have the stack size entry in this case? Something like:

<snip>

I think this is undesirable. Perhaps you need to track whether a stack size entry for a given address has been printed yet, and skip it if it has?

Yes, good observation. There is also the question what to do if the sizes of 2 different entries with the same address don't match. This should probably elicit a warning.

Yes, good observation. There is also the question what to do if the sizes of 2 different entries with the same address don't match. This should probably elicit a warning.

I agree this should probably print a warning, and I think it then should do a best guess print of the stack sizes. I have three options, in rough order of preference for this:

  1. Display the entries with symbols in the order they appear in the symbol table, i.e. if foo and bar appear in that order in the symbol table, with the same address, I'd print foo with the first stack size and bar with the second. This is probably a reasonable guess based on what the linker might have done (putting the symbols in order they are discovered, same as the stack sizes entries).
  2. Display only the largest entry, but once per symbol with the same address. In practice, this is probably just as good. It doesn't really make sense for two symbols to have the same address but different stack sizes: that would imply that the same bit of code has different possible sizes, which doesn't really make sense.
  3. Display the first entry size, for each symbol. This is probably the simplest.
MaskRay added a comment.EditedSep 26 2019, 2:18 AM

Yes, good observation. There is also the question what to do if the sizes of 2 different entries with the same address don't match. This should probably elicit a warning.

I agree this should probably print a warning, and I think it then should do a best guess print of the stack sizes. I have three options, in rough order of preference for this:

  1. Display the entries with symbols in the order they appear in the symbol table, i.e. if foo and bar appear in that order in the symbol table, with the same address, I'd print foo with the first stack size and bar with the second. This is probably a reasonable guess based on what the linker might have done (putting the symbols in order they are discovered, same as the stack sizes entries).
  2. Display only the largest entry, but once per symbol with the same address. In practice, this is probably just as good. It doesn't really make sense for two symbols to have the same address but different stack sizes: that would imply that the same bit of code has different possible sizes, which doesn't really make sense.
  3. Display the first entry size, for each symbol. This is probably the simplest.

The warning case should be very rare. The ICF implementation in neither gold nor lld folds sections with different sizes (i.e. functions with different sizes) gold/icf.cc#l756 lld/ELF/ICF.cpp#L305. st_size=0 aliases can sometimes happen for STT_NOTYPE symbols, but I can't find any case for STT_FUNC. Emitting a warnings looks good to me. I think the we can just pick the simplest guessing approach.

wolfgangp updated this revision to Diff 222015.Sep 26 2019, 1:06 PM
wolfgangp marked 4 inline comments as done.
wolfgangp edited the summary of this revision. (Show Details)
  1. Rebased with the latest upstream changes.
  2. Addressed review comments as follows:
    • Keeping track of stack sizes already dumped so that duplicate entries only are printed once.
    • Issue a warning when a duplicate entry with a different stack size is encountered.
    • Simplified code by collapsing some warning code into a lambda.
    • added the appropriate test cases for the changed functionality and added one for STT_NOTYPE.
    • added an early return in a case where no valid symbol is found.

Yes, good observation. There is also the question what to do if the sizes of 2 different entries with the same address don't match. This should probably elicit a warning.

<snip>

  1. Display the first entry size, for each symbol. This is probably the simplest.

For some reason, I didn't see your response until I had posted my updated. I basically implemented number 3.

jhenderson added inline comments.Sep 27 2019, 1:58 AM
test/tools/llvm-readobj/stack-sizes.test
95–100

I think combining the different cases into a single test run is risking confusion. I'd prefer it if each issue were a separate test (i.e. have a basic single-entry only case, a NOBITS case, a multiple entries for the same address case, and one where the sizes conflict).

172–176

It's not clear from anywhere why this symbol exists. It needs commenting at least. See also my comments above.

686

FileCheck has an --input-file switch, which I think is more commonly used instead of redirecting stdin, when the file exists on disk.

717

This looks out-of-line in comparison to the other fields.

735–737

Using a section symbol is rather confusing. Can't this just be another symbol with value 0 in the same section?

Also, I think relocations are a more complex case to the executable one. In particular, you need to handle two symbols in different sections, but with identical values. In these cases, they aren't duplicates (and you need to show that).

tools/llvm-readobj/ELFDumper.cpp
373

Does this map need to be ordered?

4724

I think these warnings need more context - which entry can't you identify a function symbol for here? (Add the offset).

4761

This is marked as done but hasn't been addressed?

4762

Which entries (include the offsets), and what were their sizes?