This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] Teach `llvm-objdump` dump dynamic symbols.
ClosedPublic

Authored by Higuoxing on Mar 6 2020, 8:49 AM.

Details

Summary

This patch is to teach llvm-objdump dump dynamic symbols (-T and --dynamic-syms). Currently, this patch is not fully compatible with gnu-objdump, but I would like to continue working on this in next few patches. It has two issues.

  1. Some symbols shouldn't be marked as global(g). (-t/--syms has same issue as well) (Fixed by D75659)
  2. gnu-objdump can dump version information and *dynamically* insert before symbol name field. (I haven't been able to figure out how it works)

objdump -T a.out gives:

DYNAMIC SYMBOL TABLE:
0000000000000000  w   D  *UND*  0000000000000000              _ITM_deregisterTMCloneTable
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 printf
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 __libc_start_main
0000000000000000  w   D  *UND*  0000000000000000              __gmon_start__
0000000000000000  w   D  *UND*  0000000000000000              _ITM_registerTMCloneTable
0000000000000000  w   DF *UND*  0000000000000000  GLIBC_2.2.5 __cxa_finalize

llvm-objdump -T a.out gives:

DYNAMIC SYMBOL TABLE:
0000000000000000  w   D  *UND*  0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 g    DF *UND*  0000000000000000 printf
0000000000000000 g    DF *UND*  0000000000000000 __libc_start_main
0000000000000000  w   D  *UND*  0000000000000000 __gmon_start__
0000000000000000  w   D  *UND*  0000000000000000 _ITM_registerTMCloneTable
0000000000000000  w   DF *UND*  0000000000000000 __cxa_finalize

Diff Detail

Event Timeline

Higuoxing created this revision.Mar 6 2020, 8:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2020, 8:49 AM

We should reuse printSymbolTable for DYNAMIC SYMBOL TABLE:. D75659 should have fixed the g ll problems.

Higuoxing updated this revision to Diff 248928.Mar 7 2020, 5:46 AM

Addressed @MaskRay 's comment.

  • Reuse printSymbolTable function.
Higuoxing edited the summary of this revision. (Show Details)Mar 7 2020, 8:15 AM
MaskRay added inline comments.Mar 7 2020, 8:21 AM
llvm/test/tools/llvm-objdump/elf-dynamic-symbol.test
1 ↗(On Diff #248928)

My idea is that after we have the binary format specific directory test/tools/llvm-objdump/ELF/. We can move tests under it.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1887–1888

For .dynsym, char Debug should use 'D' instead of ' '

Higuoxing updated this revision to Diff 248938.Mar 7 2020, 9:00 AM

Addressed @MaskRay 's comments.

  • Dump dynamic symbol as 'D'
  • Move tests to ELF/ (Shall I wait until D75798 lands?)
Higuoxing updated this revision to Diff 249293.Mar 10 2020, 2:33 AM

Rebase & Update doc.

grimar added inline comments.Mar 10 2020, 3:05 AM
llvm/test/tools/llvm-objdump/ELF/elf-dynamic-symbol.test
54 ↗(On Diff #248938)

Add an empty line before --- !ELF please.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1854–1855

DumpDynamicSymbol -> IsDynamic (Used commonly ind shorter).

1856

It is a bit strange when the code assigns a value to a variable and then reassign it right below.

I.e. why not do something like below?

  if (const COFFObjectFile *Coff = dyn_cast<const COFFObjectFile>(O)) {
    outs() << "SYMBOL TABLE:\n";
    printCOFFSymbolTable(Coff);
    return;
  }
.......
llvm::basic_symbol_iterator I, E;
if (!DumpDynamicSymbol) {
   I = O->symbol_begin();
   E = O->symbol_end();
   outs() << "SYMBOL TABLE:\n";
...
} else {
   I = ELF->getDynamicSymbolIterators().begin();
   E = ELF->getDynamicSymbolIterators().end();
   outs() << "DYNAMIC SYMBOL TABLE:\n"
...
}
...
1865

This error is untested. Also, perhaps you could use reportWarning instead.
(Does not seem we want to stop dump here).

1869

dyn_cast will return nullptr when an object is not ELF. Since you know it is always an ELF object at this point, you should use cast.

1873

Non dynamic symbol table also has a null symbol. Why we do not need it for !DumpDynamicSymbol? (Or: why do we need it here?).

1890

It seems that when we are consistent with the code around, it looks a bit simpler:

char Debug = ' ';
if (DumpDynamicSymbol)
 Debug = 'D';
else if (Type == SymbolRef::ST_Debug || Type == SymbolRef::ST_File)
 Debug = 'd';
2256

StringRef() -> /*ArchitectureName=*/=""

Some thoughts on this, which don't need to be addressed in this patch, but should be in follow-on ones if necessary:

  1. Does GNU objdump use the section name or section type to identify the section? What happens if there are multiple SHT_DYNSYM sections? What about if ".dynsym" is not SHT_DYNSYM?
  2. What should we do if the section header table is missing (e.g. due to llvm-objcopy --strip-sections)? See llvm-readobj bugs https://bugs.llvm.org/show_bug.cgi?id=45089 and https://bugs.llvm.org/show_bug.cgi?id=45090 for suggestions.
llvm/test/tools/llvm-objdump/ELF/elf-dynamic-symbol.test
1 ↗(On Diff #249293)

No need for "elf-" in the test file name.

4–5 ↗(On Diff #249293)

It would be a good idea to use --strict-whitespace and --match-full-lines in the FileCheck calls.

9–12 ↗(On Diff #249293)

Here and below, remove the excessive spacing, so that the values line up, but with only a single space between Machine: and its value.

14–16 ↗(On Diff #249293)

Could you line up blocks like this so that the values all line up:

- Name: .data
  Type:  SHT_PROGBITS
  Flags:  [ SHF_ALLOC, SHF_WRITE ]
18–21 ↗(On Diff #249293)
- Name:    localsym
  Type:    STT_OBJECT
  Section: .data
  Binding: STB_LOCAL

Same below

51 ↗(On Diff #249293)

I think there are a few different cases here, and I think you need to test them all:

  1. No dynamic symbol table at all (i.e. no SHT_DYNSYM section).
  2. A truly empty .dynsym section (size == 0)
  3. A logically empty .dynsym section (just has the 0-index symbol).

I think this is testing the first one, but I'm not sure.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1854–1855

DumpDynamicSymbol -> DumpDynamic

I think that's sufficient, and also it's the symbol table, not a single symbol.

Higuoxing updated this revision to Diff 249846.EditedMar 12 2020, 12:33 AM

Addressed @grimar 's comments

  • Change DumpDynamicSymbol to DumpDynamic
  • I have to use this pattern, due to there doesn't exist a default constructor for basic_symbol_iterator
auto I = O->symbol_begin();
...
if (DumpDynamic) {
  I = ELF->dynamic_symbol_begin();
  ...
}
  • Change WithColor::error() to reportWarning()
  • Change dyn_cast to cast on an ELF object.
  • Modify dynamic_symbol_begin() according to symbol_begin() method.

Addressed @jhenderson 's comments

  • Change test file's name
  • I didn't apply --strict-whitespace to FileCheck, because there are tabs preventing me doing so (Shall we look into this later?).
  • Rearrage yaml file blocks to line up values.
  • Add more tests for ELF files with truely empty .dynsym section, logically empty .dynsym section.
grimar added inline comments.Mar 12 2020, 2:05 AM
llvm/include/llvm/Object/ELFObjectFile.h
1024 ↗(On Diff #249846)

"Index 0 in any symbol table is used to represent undefined symbols.
This first entry in a symbol table is always completely zeroed." (Oracle ELF spec)
A null symbol is also a symbol, so it is not a correct change.

The change you want to do should not be here, in a higher level library.

Also, did you run llvm tests? I'd expect them should fail with such change.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1862–1863

Addressed @grimar 's comments

  • I have to use this pattern, due to there doesn't exist a default constructor for basic_symbol_iterator
auto I = O->symbol_begin();
...
if (DumpDynamic) {
  I = ELF->dynamic_symbol_begin();
  ...
}

I think lets create a printSymbol(SymbolRef Sym, ...) helper. It should help.

1868

I wonder if we want to have it here or move it above, out of scope of this if (!DumpDynamic).
Probably we do not want to change the behavior of COFF?

Higuoxing marked an inline comment as done.Mar 12 2020, 3:24 AM
Higuoxing added inline comments.
llvm/include/llvm/Object/ELFObjectFile.h
1024 ↗(On Diff #249846)

I've run tests locally, and no failure is observed. I also checked function call (getDynamicSymbolIterators()). link

  • llvm/tools/llvm-nm/llvm-nm.cpp: There's an attempt to omit NULL symbol in D62148. @MaskRay helps change the behavior of symbol_begin() in D62296. I think dynamic_symbol_begin() should be changed as well.
  • For calls in lib/Object/SymbolSize.cpp, I think this is similar to https://reviews.llvm.org/D62296#1514082
Higuoxing marked an inline comment as done.Mar 12 2020, 11:20 AM
Higuoxing added inline comments.
llvm/include/llvm/Object/ELFObjectFile.h
1024 ↗(On Diff #249846)

I have opened a new revision for this change in D76081 : )

  • I didn't apply --strict-whitespace to FileCheck, because there are tabs preventing me doing so (Shall we look into this later?).

Does GNU objdump use hard tabs too? How practical would it be to write hard tabs in the test check? I believe they should be respected with --strict-whitespace (although you'll need to remove the extra space between the CHECKs' ':' and the start of the string to check).

llvm/test/tools/llvm-objdump/ELF/dynamic-symbol.test
12 ↗(On Diff #249846)

You need to add a CHECK-EMPTY or similar the line below to show that there are no more symbols printed.

49 ↗(On Diff #249846)

You don't need the double blank line. It's clear enough that the next bit is a new case with a single one.

Same goes below.

56 ↗(On Diff #249846)

You need to add a CHECK-EMPTY or similar the line below to show that there are no symbols printed.

64–67 ↗(On Diff #249846)

You don't need the .data section here or in any of the test cases below..

70 ↗(On Diff #249846)

Put the brackets before the full stop. Also "have" -> "has".

74 ↗(On Diff #249846)

I'd change the check prefix to "ONLY-NULL"

You need to add a CHECK-EMPTY or similar the line below to show that there are no symbols printed.

88 ↗(On Diff #249846)

Same comment as above re. full stop position. Also truely -> truly.

92 ↗(On Diff #249846)

TRUELY_EMPTY to simply EMPTY is fine.

You need to add a CHECK-EMPTY or similar the line below to show that there are no symbols printed.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1875

Still need a test for this. Also, use lower-case first letter for diagnostics, and no trailing full stop. Finally, do you need the '\n'? I'd expect reportWarning to do that itself.

1887–1888

Unrelated formatting change should not be included.

Higuoxing updated this revision to Diff 251301.Mar 19 2020, 1:04 AM

Addressed comments.

  • Move printCOFFSymbol() out of if (!DumpDynamic) scope.
  • > @grimar: I think lets create a printSymbol(SymbolRef Sym, ...) helper. It should help.

    Can we do this in next few patches? It seems that we have to resolve some continue; and break; statements.
  • Correct usage of reportWarning() (e.g. lower-case first letter, remove '\n' and trailing full stop.)
  • Add test for reportWarning()
  • Add --strict-whitespace option for FileCheck
  • Add CHECK-EMPTY to ensure no more symbols are printed.
  • Remove useless Section: .data field in YAML.
  • Correct some checking prefix (e.g. NOT-NULL, EMPTY).
Higuoxing updated this revision to Diff 251302.Mar 19 2020, 1:09 AM

Break long # RUN: lines.

Can we do this in next few patches? It seems that we have to resolve some continue; and break; statements.

I think we should commit the nice code from start. If we can't because of a minor refactoring needed, this refactoring should be done first.

Higuoxing updated this revision to Diff 251316.Mar 19 2020, 3:08 AM

Fix lint warning.

Can we do this in next few patches? It seems that we have to resolve some continue; and break; statements.

I think we should commit the nice code from start. If we can't because of a minor refactoring needed, this refactoring should be done first.

Sure. Let me have a try, thanks a lot.

grimar added a comment.EditedMar 19 2020, 3:16 AM

Can we do this in next few patches? It seems that we have to resolve some continue; and break; statements.

I think we should commit the nice code from start. If we can't because of a minor refactoring needed, this refactoring should be done first.

Sure. Let me have a try, thanks a lot.

I think you can just do it in this patch: separating the code to a helper function is a normal common thing that is often done.
Probably no need to create a separate revision, since this refactoring is needed for changes you do right here.

Higuoxing updated this revision to Diff 251328.Mar 19 2020, 3:39 AM

Add printSymbol() helper function.

Higuoxing updated this revision to Diff 251330.Mar 19 2020, 3:45 AM

Remove irrevelent format change.

grimar added inline comments.Mar 19 2020, 3:54 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
1887–1888

The reason to create a printSymbol helper was to address my previous comments.

Now, when you have it you should be able to write the code above cleaner.
Something like:

if (!DumpDynamic) {
  outs() << "SYMBOL TABLE:\n";
  for (auto I = O->symbol_begin(), E = O->symbol_end(); I != E; ++I)
    printSymbol(O, *I, FileName, ArchiveName, ArchitectureName, DumpDynamic)l
  return;
}

outs() << "DYNAMIC SYMBOL TABLE:\n";
...
<the similar code for dynamic>
Higuoxing updated this revision to Diff 251341.Mar 19 2020, 4:17 AM

Great! I didn't get it just now 🤣

Addressed @grimar 's comment.

grimar accepted this revision.Mar 19 2020, 4:47 AM

I have only 2 minor nits and a question to others, otherwise this looks good, thanks!
So I've accepted, but let's wait for one more opinion.

llvm/test/tools/llvm-objdump/X86/elf-dynamic-symbols.test
87

DynamicSymbols: [] is a bit more commonly used in tests for cases when you want to have a .dynsym with a zero symbol only.

108

Size: 0 probably better reflects what you want to achieve.

119

I am not sure about this part.
I am not sure it should be in this test, because it called "elf-*".
And I am concerned that precomiled objects are used a bit, though we could use
a trivial YAMLs instead (anyways the general direction is to remove all committed precompiled objects,
so at least it worth to stop adding new places I think).

At the same time I wonder if we might want to add a test case like all-targets-unimlemented-features.test or alike and put all
temporarily checks like this there.

Or perhaps we can just go with this piece for now and remove it in a follow-up (since you do not add new binaries, I think it is OK)

2All: What do others think?

This revision is now accepted and ready to land.Mar 19 2020, 4:47 AM
Harbormaster failed remote builds in B49725: Diff 251333!
Harbormaster failed remote builds in B49732: Diff 251341!
grimar added inline comments.Mar 19 2020, 4:50 AM
llvm/test/tools/llvm-objdump/X86/elf-dynamic-symbols.test
119

If we decide to go with this piece for now, please add empty line after "RUN:"

Fix nits.

Thanks a lot for taking time review.

jhenderson added inline comments.Mar 23 2020, 2:12 AM
llvm/docs/CommandGuide/llvm-objdump.rst
88–91

Lexicographically, 'T' is before 't', so this should probably be before the -t option.

llvm/test/tools/llvm-objdump/X86/elf-dynamic-symbols.test
119

I'd move these into unimplemented-features.test probably. I'd also only have one version for each file format (i.e. just --dynamic-syms, not -T as well). I suspect there should be more versions of this though for other formats (I'm thinking XCOFF off the top of my head, but there may be others). As for using pre-compiled binaries, my preference is to not use them if we can avoid it. yaml2obj support exists for COFF and Mach-O at least, so those can be added even if other formats require existing prebuilt binaries.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1874

This should probably be changed to say "this operation is not currently supported for this file format", removing the "ELF" reference, since the message will need to be updated every time support is added for a new file format.

1885–1887

As this function is now unindented, you need to run clang-format on the whole function. Sorry, if I incorrectly commented incorrectly on something being an unrelated formatting before! Phabricator doesn't show the indentation change as an actual change.

Higuoxing marked 2 inline comments as done.Mar 23 2020, 3:02 AM
Higuoxing added inline comments.
llvm/docs/CommandGuide/llvm-objdump.rst
88–91

It seems that -r, --reloc is before -R, --dynamic-reloc, -d, --disassemble is before -D, --disassemble-all. I'm a little confused .. 🤣

llvm/tools/llvm-objdump/llvm-objdump.cpp
1885–1887

Never mind, I could format this now : )
I was afraid of unwanted format bringing troubles for reviewing.

jhenderson added inline comments.Mar 23 2020, 3:17 AM
llvm/docs/CommandGuide/llvm-objdump.rst
88–91

Oh okay, leave it as is then. Sorry!

Higuoxing updated this revision to Diff 251987.Mar 23 2020, 4:26 AM
  • Move unimplemented features test to unimplemented-features.test.
  • Improve warning message.
  • Format codes.
Higuoxing updated this revision to Diff 251988.Mar 23 2020, 4:28 AM

Format YAML.

Higuoxing updated this revision to Diff 252009.Mar 23 2020, 6:26 AM

Rename isSTAB to IsSTAB, attampt to make clang-tidy check happy (Fix builds).

  • Use [[FILE]] to check file name.
jhenderson added inline comments.Mar 24 2020, 2:33 AM
llvm/test/tools/llvm-objdump/unimplemented-features.test
1 ↗(On Diff #252226)

I'd change this to "Test dumping dynamic symbols reports a warning if the operation is unsupported for the file format."

3 ↗(On Diff #252226)

'##' here and below for all comments.

14 ↗(On Diff #252226)

No need for the cp. It's okay to reference the binary directly in the llvm-objdump call here.

Higuoxing marked an inline comment as done.Mar 24 2020, 3:11 AM
Higuoxing added inline comments.
llvm/test/tools/llvm-objdump/unimplemented-features.test
14 ↗(On Diff #252226)

I was using cp to shorten the file name as %t3, which is intended to replace [[FILE]]. Is this acceptable?

jhenderson added inline comments.Mar 24 2020, 3:19 AM
llvm/test/tools/llvm-objdump/unimplemented-features.test
14 ↗(On Diff #252226)

Personally, I don't think that really gains us anything.

Higuoxing updated this revision to Diff 252292.Mar 24 2020, 6:08 AM

Addressed comment.

Higuoxing updated this revision to Diff 252294.Mar 24 2020, 6:10 AM

Indent for continuous lines.

jhenderson accepted this revision.Mar 25 2020, 2:14 AM

Two minor comments. Otherwise, looks good to me.

llvm/test/tools/llvm-objdump/unimplemented-features.test
4 ↗(On Diff #252294)

You can get rid of --match-full-lines and --strict-whitespace form here and below. This is only testing a warning, not the output format of an option.

6 ↗(On Diff #252294)

No need to duplicate this comment throughout this test. The one at the top is sufficient (you can get rid of the extra blank lines then too. You can keep the old comments for "Mach-O/COFF object file" if you like, or just name the output files %t.coff.o, and %t.macho.o to make them self-documenting.

Higuoxing updated this revision to Diff 253328.Mar 28 2020, 4:02 AM
  • Remove duplicated comments.
  • Get rid of --match-full-lines and --strict-whitespace
This revision was automatically updated to reflect the committed changes.