This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Return the section name when calling getSymbolName on a section symbol.
ClosedPublic

Authored by mattd on Jan 23 2019, 9:57 AM.

Details

Summary

Previously, llvm-nm would report symbols for .debug and .note sections as: '?' with an empty section name:

00000000 ?
00000000 ?
...

With this patch the output more closely resembles GNU nm:

00000000 N .debug_abbrev
00000000 n .note.GNU-stack
...

This patch calls getSectionName for sections that belong to symbols of type ELF::STT_SECTION, which returns the name of the section from the section string table.

Diff Detail

Event Timeline

mattd created this revision.Jan 23 2019, 9:57 AM
This revision is now accepted and ready to land.Jan 23 2019, 10:00 AM
mattd planned changes to this revision.Jan 23 2019, 4:13 PM

I have a few changes to this patch. Namely, I want to avoid breaking the llvm-objdump/AArch64/elf-aarch64-mapping-symbols.test. I didn't realize I had broken that test earlier, but I caught it in my second round of testing. It turns out that llvm-objdump presents a data mapping symbol ($x) when the test works. However, with the patch above, that mapping symbol is gone and the section name is displayed instead. I'm working on fixing my patch to avoid that situation.

mattd updated this revision to Diff 183358.Jan 24 2019, 11:54 AM
  • Added a comment to the change in ELFObjectFile.h.
  • Updated the condition to retrieve the section name if the symbol is a section and has no name.
  • I discovered that my original patch broke an aarch64 test. In that case, llvm-objdump was using the section name (produced from my patch) instead of ignoring the section in general. That had the effect of the section name being displayed in the disassembly instead of the aarch64/arm mapping name "$x." This patch now ignores section names and preserves the mapping name in that case. I'm not sure that the change to llvm-objdump is the best choice, but it doesn't seem to break any llvm tests or any test I generated while fiddling around with this patch.
This revision is now accepted and ready to land.Jan 24 2019, 11:54 AM
mattd requested review of this revision.Jan 24 2019, 11:54 AM

My instinct is that the place you've made the change is too deep in the hierarchy, and that's why you have the problem in llvm-objdump. My feeling is that Symbol::getName() should probably return the name of the symbol, and leave it up to the caller to fall back and get the section name for section symbols.

That being said, I see that llvm-nm, at least in places, is assuming that the section name IS being returned when looking up symbol names for section symbols (see the bottom of getSymblNMTypeChar for ELFObjectFileBase, which is currently broken because of this incorrect assumption - presumably this piece of code is untested!).

I think before putting this in as is, it's probably worth reviewing all the call sites and seeing if there is anywhere where returning the section name would be harmful. I suspect it's probably mostly fine, given the lack of test failures.

include/llvm/Object/ELFObjectFile.h
448

I'm not sure what the type of Sec here is, so this probably shouldn't be auto (but feel free to point to a similar call in this area that uses auto, and I'll back down).

test/Object/nm-trivial-object.test
21–22

I'm not sure you're using the right input for this sort of test. I'd expect it to be something like %p/Inputs/trivial-object-test.elf-x86-64, which allows you to then compare the two sets of symbols to see what's different.

123–124

Aside: Why are the majority of these 'N', not 'n'? I wouldn't expect to see global section symbols... but that's my understanding of the difference between the two, based on the GNU nm man page (although I note GNU nm does the same thing).

mattd marked 2 inline comments as done.Jan 25 2019, 8:32 AM

My instinct is that the place you've made the change is too deep in the hierarchy, and that's why you have the problem in llvm-objdump. My feeling is that Symbol::getName() should probably return the name of the symbol, and leave it up to the caller to fall back and get the section name for section symbols.

Interesting point. I was going to fix this up closer to the callsite (llvm-nm), but thought that I was fixing the wrong problem. Therefore, I made the change deeper. The change only uses a section name as a last resort: If there is no symbol name or an empty symbol name. I'm not sure we want to put the responsibility for using a section name in the caller. Ideally, the caller (llvm-nm, etc) calls SymbolRef::getName and gets an Expected<StringRef>. The caller already has to check if the current name is valid. For a symbol representing a section, the caller would need to add more ceremony to get the SectionName, after obtaining the SectionRef and validating that the value is not an error.

include/llvm/Object/ELFObjectFile.h
448

That's a fair point. This routine uses a variety of auto, I figured I could get away with it in this case. I'll update this to std::error_code, which is what getSymbolSection returns.

test/Object/nm-trivial-object.test
21–22

IsNAN.o has both .debug sections and a .note section. Testing IsNAN will exercise both of those cases in llvm-nm.cpp getSymbolNMTypeChar. The test you mention (trivial-object-test.elf-x86-64) only has a .note section. I wanted to test the .debug and .note cases.

123–124

That's the decision of getSymbolNMTypeChar in llvm-nm.cpp. As you point out, the 'N' representing the names for the debug sections is treated the same in GNU nm. According to the GNU nm manual:

"N" The symbol is a debugging symbol.

Since this is a section containing debug data, I assume 'N' is close enough.

For the record, this fixes https://bugs.llvm.org/show_bug.cgi?id=39995.

@mattd, could you please update that bug to show that you are working on it, to prevent duplicate work.

test/Object/nm-trivial-object.test
21–22

I personally feel like that's a different test case. I think that there are two related-but-different behaviours that need testing here:

  1. That section symbols have their name printed (that's not specific to .debug* sections - it includes other section symbols, such as .text and .data).
  2. That the (currently untested) behaviour in getSymbolNMTypeChar around debug sections is correct (i.e. 'N' for .debug* and 'n' for notes).

I suppose that you can test the first whilst testing the second, but they don't have to be.

mattd updated this revision to Diff 184320.Jan 30 2019, 10:09 AM
mattd marked 2 inline comments as done and 2 inline comments as done.
  • Updated the test to also look at the presentation of other sections/symbols, not just the debug sections.
  • Replaced the use of an auto with the actual type.
  • Rebased against master.
mattd updated this revision to Diff 184324.Jan 30 2019, 10:17 AM
  • Update a comment, it was bothering me.

Okay, the change as-is LGTM, assuming that the following has been done:

I think before putting this in as is, it's probably worth reviewing all the call sites and seeing if there is anywhere where returning the section name would be harmful. I suspect it's probably mostly fine, given the lack of test failures.

A few things to look out for:

  1. Some code paths may already do the extra work needed to look up the section name, if the symbol doesn't have a name. These should be tidied up to not do this any more (it can be a later change).
  2. Some code paths may always print the name returned, but this may be undesirable for certain situations (I can't think of any off the top of my head however). This wouldn't be picked up by tests necessarily, due to the way FileCheck tests tend to be written.
  3. Some code may be using the name in some kind of map, if non-empty. This could cause issues e.g. with relocations in LLD, when attempting to map references to symbols via names. In practice, it shouldn't be an issue, but we should double check.
jhenderson accepted this revision.Jan 31 2019, 2:10 AM
This revision is now accepted and ready to land.Jan 31 2019, 2:10 AM
mattd added a comment.Jan 31 2019, 9:41 AM

Okay, the change as-is LGTM, assuming that the following has been done:

I think before putting this in as is, it's probably worth reviewing all the call sites and seeing if there is anywhere where returning the section name would be harmful. I suspect it's probably mostly fine, given the lack of test failures.

A few things to look out for:

  1. Some code paths may already do the extra work needed to look up the section name, if the symbol doesn't have a name. These should be tidied up to not do this any more (it can be a later change).
  2. Some code paths may always print the name returned, but this may be undesirable for certain situations (I can't think of any off the top of my head however). This wouldn't be picked up by tests necessarily, due to the way FileCheck tests tend to be written.
  3. Some code may be using the name in some kind of map, if non-empty. This could cause issues e.g. with relocations in LLD, when attempting to map references to symbols via names. In practice, it shouldn't be an issue, but we should double check.

Thanks for the second pair of eyes on this patch. I did take a look at the codebase from the monorepo to identify uses of getSymbolName. I did not see any that would be disturbed by this change; however, I'll take another peek today.

This revision was automatically updated to reflect the committed changes.