
Add support for dumping relocations in non-relocatable files
ClosedPublic

Authored by colinl on Jan 7 2016, 12:22 PM.

Details

Summary

Relocations can appear in non-relocatable files when linking with --emit-relocs. This patch removes the assertion in ELFObjectFile that such files violate, and removes the short-circuit return in llvm-objdump. The existing comment says objdump doesn't print these relocations, but that isn't correct: running GNU objdump on the associated test file shows that it does print them.

Diff Detail

Repository
rL LLVM

Event Timeline

colinl updated this revision to Diff 44248.Jan 7 2016, 12:22 PM
colinl retitled this revision from to Add support for dumping relocations in non-relocatable files.
colinl updated this object.
colinl added a reviewer: rafael.
colinl set the repository for this revision to rL LLVM.
colinl added a subscriber: llvm-commits.
mcrosier resigned from this revision.Jan 12 2016, 6:27 AM
mcrosier removed a reviewer: mcrosier.

Fairly out of my field of expertise.

rafael edited edge metadata.Jan 14 2016, 2:18 PM
rafael added a subscriber: rafael.

Please add a comment saying how the binary was created.

The assert is correct. Whatever is calling getRelocationOffset should
instead be trying to get the address.

Testing locally I see that objdump prints relocations that were copied
by --emit-relocs but not dynamic relocations.

Given

void g(void);
void f(void) {
  g();
}

objdump on x86_64 will print

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000005 R_X86_64_PLT32    g-0x0000000000000004

RELOCATION RECORDS FOR [.eh_frame]:
OFFSET           TYPE              VALUE
0000000000000020 R_X86_64_PC32     .text+0x0000000000000270

readelf prints


Relocation section '.rela.plt' at offset 0x238 contains 1 entries:
Offset           Info             Type               Symbol's Value   Symbol's Name + Addend
00000000000013e8 0000000200000007 R_X86_64_JUMP_SLOT 0000000000000000 g + 0

Relocation section '.rela.text' at offset 0x428 contains 1 entries:
Offset           Info             Type               Symbol's Value   Symbol's Name + Addend
0000000000000275 0000000600000004 R_X86_64_PLT32     0000000000000000 g - 4

Relocation section '.rela.eh_frame' at offset 0x440 contains 1 entries:
Offset           Info             Type               Symbol's Value   Symbol's Name + Addend
00000000000002a0 0000000100000002 R_X86_64_PC32      0000000000000270 .text + 270

Note that:

  • The relocations from .rela.plt are missing in objdump.
  • The relocations in the .so have addresses. What objdump does is map address to section and subtract the section address to get an offset.

Please investigate exactly which relocations are skipped by objdump.

Please make sure llvm-objdump skips the same relocations and prints
offsets like objdump.

Cheers,
Rafael

colinl updated this revision to Diff 50459.Mar 11 2016, 11:57 AM
colinl edited edge metadata.
colinl removed rL LLVM as the repository for this revision.

It looks like in the ELF spec, as you already noticed, it says r_offset contains the virtual address if the file type is ET_EXEC or ET_DYN so in those cases we look for the relocated section and subtract the address.
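
As a rough illustration of that rule, here is a minimal standalone sketch. It is not the code from this diff: FileType and sectionRelativeOffset are hypothetical names, and the caller is assumed to have already found the relocated section and its address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ELF e_type values that matter here
// (ET_REL, ET_EXEC, ET_DYN). Not an LLVM type.
enum class FileType { Relocatable, Executable, SharedObject };

// Hypothetical helper: turn a raw r_offset into a section-relative offset.
// SectionAddr is the address of the section the relocation applies to.
inline uint64_t sectionRelativeOffset(FileType Type, uint64_t ROffset,
                                      uint64_t SectionAddr) {
  // In a relocatable file r_offset is already a byte offset into the section.
  if (Type == FileType::Relocatable)
    return ROffset;
  // In ET_EXEC / ET_DYN r_offset is a virtual address, so subtract the
  // address of the relocated section to recover the offset.
  assert(ROffset >= SectionAddr && "relocation lies before its section");
  return ROffset - SectionAddr;
}
```

With the numbers from the readelf output earlier in the thread (r_offset 0x275 in .rela.text, and .text at 0x270 according to the section symbol's value), this yields 0x5, which matches the offset objdump prints.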

I added a comment in the test on how the binary was created.

For equivalent handling of dynamic relocations we'd need to add the -R flag to llvm-objdump, and SymbolRef would need an accessor like isDynamic. The ELF implementation could use the code from getRelocationSymbol to determine whether it's dynamic:

bool IsDyn = Rel.d.b & 1;
DataRefImpl SymbolData;
if (IsDyn)
  SymbolData = toDRI(DotDynSymSec, symbolIdx);
else
  SymbolData = toDRI(DotSymtabSec, symbolIdx);

and then COFF and MachO would need implementations for those file types. The interwoven disassembly/relocations would also need to take this into account.

It'd be good to have equivalent functionality though I think we should do that in a separate revision since it's non-trivial.

colinl set the repository for this revision to rL LLVM.Mar 11 2016, 12:03 PM

*ping*

Does this solution and test look acceptable?

I'm noticing that r215844 says GNU objdump doesn't print relocations in non-relocatable files. This doesn't seem correct: with --emit-relocs, objdump does print relocations in executable and SO files.

r_offset is being modeled as per the ELF spec IMO and looks correct. GNU objdump does show relocations when there are relocation sections that point to the section that needs relocation.

Example :-

00000000004000c0 <foo>:
  4000c0: 55                    push   %rbp
  4000c1: 48 89 e5              mov    %rsp,%rbp
  4000c4: e8 e7 ff ff ff        callq  4000b0 <bar>
        4000c5: R_X86_64_PC32   bar-0x4
  4000c9: 5d                    pop    %rbp
  4000ca: c3                    retq
khemant added a subscriber: khemant.
khemant accepted this revision.Mar 21 2016, 11:16 AM
khemant edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Mar 21 2016, 11:16 AM
This revision was automatically updated to reflect the committed changes.

The assert is correct. Whatever is calling getRelocationOffset should
instead be trying to get the address.

I disagree, the assertion isn't correct.

You can ask for a relocation's offset inside an exe and SO file. r_offset is overloaded to contain the address in these, which is why the section address needs to be subtracted in order to get the offset.

I'm not looking for the address, I'm looking for precisely the relocation offset.

http://www.sco.com/developers/gabi/latest/ch4.reloc.html

r_offset
This member gives the location at which to apply the relocation action. For a relocatable file, the value is the byte offset from the beginning of the section to the storage unit affected by the relocation. For an executable file or a shared object, the value is the virtual address of the storage unit affected by the relocation.

This shows that the assertion is incorrect, as the ELF ABI suggests reading relocation records is valid in Executables as well.

While the relocation offset is not looked up for shared libraries from the .rela.dyn and .rela.plt sections, there is a use case where relocation sections are generated with the linker switch --emit-relocs.

Removing the assertion, LGTM.

llvm-objdump wants to print the relocationOffset, so it calls RelocationRef::relocationOffset which calls ELFObjectFile::getRelocationOffset

Since objdump wants the relocation offset, and this pipes down to ELFObjectFile::getRelocationOffset, at what point is the address supposed to be asked for?

colinl updated this revision to Diff 51649.Mar 25 2016, 10:09 AM
colinl edited edge metadata.
colinl removed rL LLVM as the repository for this revision.

Is this the recommendation on how to print relocation section offsets inside executable files?

This is much better, but needs a bit more work.

include/llvm/Object/ELFObjectFile.h
687 ↗(On Diff #51649)

It is undesirable to expose this.

Calling a relocation "dynamic" was a hack to avoid looking at sh_link every time one asks for a symbol for a relocation. I removed the hack in r264624.

test/tools/llvm-objdump/X86/relocations-in-nonrelocatable.test
7 ↗(On Diff #51649)

You can produce a much simpler file by running

gcc -c main.c -fno-asynchronous-unwind-tables -fPIC
ld.gold main.o -o main.so -shared --emit-relocs

Please do that and also run "llvm-readobj -r" to show all the relocations in the file.

I investigated a bit exactly which relocations are printed by bfd objdump. The answer is that it prints relocations from sections that use .symtab as the symbol table. That is just a limitation of bfd.

Some options we have given the desire to print relocations in executables and shared libraries.

  • Add a check for just the restriction that bfd has. That seems a bit too much of a bug for bug compatibility.
  • Print any relocation in sections that have a relocated section (sh_info is not zero). Unlike bfd this would print the info for .rela.plt.
  • Print all relocations. Unlike bfd this would print .rela.plt and .rela.dyn.
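
To make the three options concrete, here is a small sketch of them as a filter predicate. Everything in it is hypothetical (RelocSectionInfo, PrintPolicy, shouldPrint); it only models the two header properties discussed above and is not llvm-objdump code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of a relocation section header.
struct RelocSectionInfo {
  bool LinksToSymtab; // sh_link names .symtab rather than .dynsym
  uint32_t ShInfo;    // index of the relocated section, 0 if there is none
};

// The three options listed above, in the same order.
enum class PrintPolicy { BfdCompatible, HasRelocatedSection, All };

inline bool shouldPrint(PrintPolicy Policy, const RelocSectionInfo &Sec) {
  switch (Policy) {
  case PrintPolicy::BfdCompatible:
    return Sec.LinksToSymtab; // mirror the bfd limitation
  case PrintPolicy::HasRelocatedSection:
    return Sec.ShInfo != 0;   // only sections that name a relocated section
  case PrintPolicy::All:
    return true;              // print every relocation section
  }
  return false;
}
```

Under these assumptions, a .rela.text produced by --emit-relocs passes all three policies, .rela.plt passes the second and third, and .rela.dyn passes only the third.
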
tools/llvm-objdump/llvm-objdump.cpp
444 ↗(On Diff #51649)

This is guarding against a relocation with a symbol index of 0? That is an independent change, no?

445 ↗(On Diff #51649)

Since moving it, please use the new variable name style: Symb

462 ↗(On Diff #51649)

Does gnu objdump print ABS for this? Seems like an odd term. ABS refers to symbols, but in here you have no symbol at all.

1195 ↗(On Diff #51649)

You only need a sorted std::vector, no?

1196 ↗(On Diff #51649)

Initialize the iterators with "I = " not "I(...)".

1221 ↗(On Diff #51649)

Can you use lower_bound?
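
The code under review is not shown in this thread, so the following is only an illustration of the general pattern these last comments point at: a std::vector sorted by address, searched with std::lower_bound, with the iterator initialized using "I = ". SectionRange and findSection are hypothetical names, and the sections are assumed not to overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for one section's address range.
struct SectionRange {
  uint64_t Addr;
  uint64_t Size;
  int Index;
};

// Returns the section containing Addr, or nullptr if there is none.
// Sorted must be ordered by Addr and its ranges must not overlap.
inline const SectionRange *findSection(const std::vector<SectionRange> &Sorted,
                                       uint64_t Addr) {
  // Find the first section whose end lies strictly above Addr.
  auto I = std::lower_bound(
      Sorted.begin(), Sorted.end(), Addr,
      [](const SectionRange &S, uint64_t A) { return S.Addr + S.Size <= A; });
  // That section contains Addr only if it also starts at or below Addr.
  if (I == Sorted.end() || Addr < I->Addr)
    return nullptr;
  return &*I;
}
```

A sorted vector keeps the lookup at O(log n) without the per-node overhead of a std::map, which is presumably the motivation behind the suggestion.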