This is an archive of the discontinued LLVM Phabricator instance.

[llvm-nm] Omit the symbol table entry at index 0 when --debug-syms is enabled for ELF files
ClosedPublic

Authored by chrisjackson on May 20 2019, 8:29 AM.

Details

Summary

Currently, when debug-syms is passed to llvm-nm, symbol table entry index 0 is also printed. gnu-nm omits this symbol. This diff causes llvm-nm to behave the same way for ELF files.

Diff Detail

Event Timeline

chrisjackson created this revision.May 20 2019, 8:29 AM

While this seems to be a direct way to address this, should the ELF object file instead begin the symbol range at 1 instead of 0? That way all clients of libObject know to skip the undefined range and you don't need to modify every call site.

While this seems to be a direct way to address this, should the ELF object file instead begin the symbol range at 1 instead of 0? That way all clients of libObject know to skip the undefined range and you don't need to modify every call site.

I'm not sure. I think there are limited use cases for being able to use that symbol, and it's possible that other clients actively want to dump the value. However, if there aren't any existing uses within LLVM, and we provide an alternative interface to get that symbol, I could get behind this.

mtrent accepted this revision.EditedMay 22 2019, 7:53 AM

I'm not sure. I think there are limited use cases for being able to use that symbol, and it's possible that other clients actively want to dump the value. However, if there aren't any existing uses within LLVM, and we provide an alternative interface to get that symbol, I could get behind this.

So I know the BSD a.out strings table, and therefore the Mach-O strings table, defines offset 0 as meaning "this symbol has no string, reading a value at offset 0 is undefined." Apple's older Mach-O linker would write a zero at location 0 so that if you accidentally did index into the strings table there you'd get an empty string. Apple's current Mach-O linker writes "<space>\0" at this location for a similar reason ( ... which is weird ... but it's 'undefined' so who's to say it's wrong?)

Looking at the ELF file format definition, it looks like ELF makes the following claims. I'm not 100% sure my internet sources are authoritative, but they look reasonable:

Index 0 both designates the first entry in the table and serves as the undefined symbol index.

Meaning Symbol 0 is reserved for the undefined symbols entry.

As mentioned above, the symbol table entry for index 0 (STN_UNDEF) is reserved. This entry holds the values listed in the following table.

And the table includes all zeroes, and the STN_UNDEF contant, which is also 0.

I'd argue that since ELF appears to define a semantic meaning for STN_UNDEF and also defines contents for the struct at that index, folks are expected to be able to read from that value if they want to. On the other hand, it looks like people don't want to see this symbol while iterating over the symbols:

tools/llvm-readobj/ELFDumper.cpp (which doesn't appear to use the libObject code):

if (Buckets[Buc] == ELF::STN_UNDEF)
  continue;

tools/llvm-objdump/llvm-objdump:

for (auto I = O->symbol_begin(), E = O->symbol_end(); I != E; ++I) {
  // Skip printing the special zero symbol when dumping an ELF file.
  // This makes the output consistent with the GNU objdump.
  if (I == O->symbol_begin() && isa<ELFObjectFileBase>(O))
    continue;

tools/llvm-elfabi/ELFObjHandler.cpp:

static Error populateSymbols(ELFStub &TargetStub,
                          const typename ELFT::SymRange DynSym,
                          StringRef DynStr) {
// Skips the first symbol since it's the NULL symbol.
for (auto RawSym : DynSym.drop_front(1)) {

I guess it's unfair to dump this on your shoulders for an llvm-nm change. Your diff is consistent with llvm-objdump. I can't help but feel like libObject should make people's lives easier here somehow. /shrug

This revision is now accepted and ready to land.May 22 2019, 7:53 AM
evgeny777 accepted this revision.May 22 2019, 8:30 AM

Created D62296 to change symbols() for ELF. If it is accepted, this patch can probably be changed to add the test.

Not much code change becaue many clients of symbols() filter out undefined symbols or don't change behavior w/ or w/o index 0. Clients that need STN_UNDEF (e.g. lld) can use getSectionContentsAsArray().

tools/llvm-nm/llvm-nm.cpp
1197

Nit: Sym == *Symbols.begin()

MaskRay accepted this revision.May 23 2019, 6:28 PM
MaskRay added inline comments.
tools/llvm-nm/llvm-nm.cpp
1197

This block is not needed after D62296. You may repurpose this patch to add the test :)

@chrisjackson, I assume this got committed, and just didn't get closed? Can you confirm?

@chrisjackson, I assume this got committed, and just didn't get closed? Can you confirm?

The change to llvm-nm.cpp was no longer needed, as Maskray described. The test was included in D62955.

mtrent resigned from this revision.Mar 7 2020, 11:58 AM