Currently, when debug-syms is passed to llvm-nm, symbol table entry index 0 is also printed. gnu-nm omits this symbol. This diff causes llvm-nm to behave the same way for ELF files.
Details
Diff Detail
Event Timeline
While this seems to be a direct way to address this, should the ELF object file instead begin the symbol range at 1 instead of 0? That way all clients of libObject know to skip the undefined range and you don't need to modify every call site.
I'm not sure. I think there are limited use cases for being able to use that symbol, and it's possible that other clients actively want to dump the value. However, if there aren't any existing uses within LLVM, and we provide an alternative interface to get that symbol, I could get behind this.
So I know the BSD a.out strings table, and therefore the Mach-O strings table, defines offset 0 as meaning "this symbol has no string, reading a value at offset 0 is undefined." Apple's older Mach-O linker would write a zero at location 0 so that if you accidentally did index into the strings table there you'd get an empty string. Apple's current Mach-O linker writes "<space>\0" at this location for a similar reason ( ... which is weird ... but it's 'undefined' so who's to say it's wrong?)
Looking at the ELF file format definition, it looks like ELF makes the following claims. I'm not 100% sure my internet sources are authoritative, but they look reasonable:
Index 0 both designates the first entry in the table and serves as the undefined symbol index.
Meaning Symbol 0 is reserved for the undefined symbols entry.
As mentioned above, the symbol table entry for index 0 (STN_UNDEF) is reserved. This entry holds the values listed in the following table.
And the table includes all zeroes, and the STN_UNDEF contant, which is also 0.
I'd argue that since ELF appears to define a semantic meaning for STN_UNDEF and also defines contents for the struct at that index, folks are expected to be able to read from that value if they want to. On the other hand, it looks like people don't want to see this symbol while iterating over the symbols:
tools/llvm-readobj/ELFDumper.cpp (which doesn't appear to use the libObject code):
if (Buckets[Buc] == ELF::STN_UNDEF) continue;
tools/llvm-objdump/llvm-objdump:
for (auto I = O->symbol_begin(), E = O->symbol_end(); I != E; ++I) { // Skip printing the special zero symbol when dumping an ELF file. // This makes the output consistent with the GNU objdump. if (I == O->symbol_begin() && isa<ELFObjectFileBase>(O)) continue;
tools/llvm-elfabi/ELFObjHandler.cpp:
static Error populateSymbols(ELFStub &TargetStub, const typename ELFT::SymRange DynSym, StringRef DynStr) { // Skips the first symbol since it's the NULL symbol. for (auto RawSym : DynSym.drop_front(1)) {
I guess it's unfair to dump this on your shoulders for an llvm-nm change. Your diff is consistent with llvm-objdump. I can't help but feel like libObject should make people's lives easier here somehow. /shrug
Created D62296 to change symbols() for ELF. If it is accepted, this patch can probably be changed to add the test.
Not much code change becaue many clients of symbols() filter out undefined symbols or don't change behavior w/ or w/o index 0. Clients that need STN_UNDEF (e.g. lld) can use getSectionContentsAsArray().
tools/llvm-nm/llvm-nm.cpp | ||
---|---|---|
1197 | Nit: Sym == *Symbols.begin() |
@chrisjackson, I assume this got committed, and just didn't get closed? Can you confirm?
The change to llvm-nm.cpp was no longer needed, as Maskray described. The test was included in D62955.
Nit: Sym == *Symbols.begin()