This is an archive of the discontinued LLVM Phabricator instance.

[objdump] Fix disassembly for call/branch instructions for ELF binaries built with -ffunction-sections
Needs RevisionPublic

Authored by davidb on Feb 19 2020, 2:58 AM.

Details

Summary

When disassembling relocatable ELF object files built with -ffunction-sections, calls and branches to symbols that live in different text sections cannot be resolved correctly. You could add -r to dump the relocations inbetweeen the disassembly, however this is not widely known by users of objdump, and the default behaviour has been confusing many users of the tool. This behaviour differs from the gnu binutils... but I'm not sure that is really a bad thing.

Outstanding question about the symbol addend? I couldn't find out hor to access that from RelocationRef.

Note that the XCOFF fix has been split into a separate patch.

Event Timeline

davidb created this revision.Feb 19 2020, 2:58 AM
davidb retitled this revision from [objump] Fix disassembly for call/branch instructions for ELF binaries built with -ffunction-sections to [objdump] Fix disassembly for call/branch instructions for ELF binaries built with -ffunction-sections.Feb 19 2020, 2:58 AM
davidb edited the summary of this revision. (Show Details)Feb 19 2020, 3:04 AM
davidb added reviewers: grimar, MaskRay, rnk.
MaskRay added inline comments.Feb 19 2020, 6:08 PM
llvm/test/tools/llvm-objdump/X86/demangle.s
14

The addend is incorrectly dropped.

llvm/test/tools/llvm-objdump/X86/elf-disassemble-relocs.test
10–11

This is not correct. whiz is undefined. It is incorrect to symbolize with an undefined symbol.

objdump -d does not print the symbol:

5:   e8 00 00 00 00          callq  0xa
                      6: R_X86_64_PLT32       whiz+0x1
llvm/test/tools/llvm-objdump/X86/section-filter-relocs.test
13

foo is undefined. It is not correct to symbolize with an undefined symbol. The problem is that the display can get addends wrong.

For disassembly of object files, I think users should realize that they need to use -d -r.

llvm/test/tools/llvm-objdump/xcoff-disassemble-all.test
3 ↗(On Diff #245362)

You can move XCOFF support to a separate change.

davidb marked an inline comment as done.Feb 20 2020, 4:09 AM
davidb added inline comments.
llvm/test/tools/llvm-objdump/X86/section-filter-relocs.test
13

It is not correct to symbolize with an undefined symbol.

What do you mean by symbolize? Are you saying it is incorrect to print the name of the symbol that the call or branch instruction is to be patched to when linked? How is <.text+0x5> more correct?that seems like garbage to me...

In the -ffunction-section case, these symbols are defined, but live in a separate section, and therefore require a relocation. The result is the same as if the symbol was undefined. Would it be ok if defined symbols were displayed, but undefined ones left to print garbage?

For disassembly of object files, I think users should realize that they need to use -d -r.

This is pretty much what is being asked in the OP. And I'm not sure I agree. When a user is compiling ELF binaries with -ffunction-sections, then -d is pretty much useless. Worse than useless in the case that on some platforms it will pick the symbol name for that particular section, so it looks like a genuine call

davidb marked an inline comment as done.Feb 20 2020, 4:14 AM
davidb added inline comments.
llvm/test/tools/llvm-objdump/xcoff-disassemble-all.test
3 ↗(On Diff #245362)

Was submitted before as D74823

Looks like you've got changes related to D74823 mixed in here?

Looks like you've got changes related to D74823 mixed in here?

Yes I put a note in the original description mentioning that this has been split to a separate change.

Looks like you've got changes related to D74823 mixed in here?

Yes I put a note in the original description mentioning that this has been split to a separate change.

It would be clearer if you made them as two separate diffs, with the one that needs to be second based on the first one, so that Phabricator only shows the differences related to this patch (in other words, each Phabricator review represents one git commit). You can also indicate the dependency by saying "Depends on DXXXXXX" in the patch description, and Phabricator will then link the two patches together.

Looks like you've got changes related to D74823 mixed in here?

Yes I put a note in the original description mentioning that this has been split to a separate change.

It would be clearer if you made them as two separate diffs, with the one that needs to be second based on the first one, so that Phabricator only shows the differences related to this patch (in other words, each Phabricator review represents one git commit). You can also indicate the dependency by saying "Depends on DXXXXXX" in the patch description, and Phabricator will then link the two patches together.

Done. I'm aware that there is more required to do on this change, but I don't want to continue until we answer the question about handling undefined symbols, or symbols defined in different text sections... First impression is that current behaviour that matches GNU is preferable.

On the references to undefined symbols, I don't have a strong opinion either way. It is nice to see the symbol name, as produced in this case, and I don't know how important GNU compatibility is here. On the other hand, if you print with inline relocations, you will get the symbol name from that, so maybe it's unnecessary?

I think symbolizing an undefined symbol as if it is defined is fine.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1463

Addends may be important for "range extension thunks with addends"

https://bugs.llvm.org/show_bug.cgi?id=40438

I think a sensible approach is to shift r_addend as if r_offset is moved to the start address of the instruction.

MaskRay requested changes to this revision.Jul 28 2021, 8:47 PM

Request changes as this may regress disassembly for "range extension thunks with addends" (see inline comment)

This revision now requires changes to proceed.Jul 28 2021, 8:47 PM