This is an archive of the discontinued LLVM Phabricator instance.

[lld][ELF2] Apply relocations.
ClosedPublic

Authored by Bigcheese on Jul 29 2015, 3:52 PM.

Details

Summary

This is located in the Writer as opposed to SectionChunk so that the Writer can
do a single pass over the relocations and simultaneously generate GOT/PLT
entries which will be placed after the relocated SectionChunks.

Fully supporting linker scripts, which allow placement of the PLT and GOT
anywhere, will require a pre-layout pass to generate them.

Diff Detail

Repository
rL LLVM

Event Timeline

Bigcheese updated this revision to Diff 30961.Jul 29 2015, 3:52 PM
Bigcheese retitled this revision from to [lld][ELF2] Apply relocations..
Bigcheese updated this object.
Bigcheese added reviewers: rafael, ruiu.
Bigcheese added a subscriber: llvm-commits.
ruiu accepted this revision.Jul 29 2015, 4:19 PM
ruiu edited edge metadata.

I kind of like the original approach to teach Chunks how to apply
relocations to themselves. It looks more modular and easy to use
(the writer has to just tell Chunks to "write to here" and everything
including applying relocations is done in one shot).

For COFF, we iterate the relocation table three times for dead-
stripping, applying relocations, and construct COFF base relocation
table. Iterating over a relocation table is not slow -- it's fast enough
to make me think to keep the three things separated. I'd keep them
separated in this file too.

ELF/Chunks.cpp
21–22 ↗(On Diff #30961)

May fit in 80 cols?

ELF/InputFiles.cpp
48 ↗(On Diff #30961)

Does this have to be std::stable_sort? std::sort doesn't work?

ELF/Symbols.h
90 ↗(On Diff #30961)

Add a blank line.

ELF/Writer.h
54 ↗(On Diff #30961)

These functions are sorted in the same order as they are called. So move this after writeSections (and move writeHeader below writeSections.)

This revision is now accepted and ready to land.Jul 29 2015, 4:19 PM
ruiu requested changes to this revision.Jul 29 2015, 4:20 PM
ruiu edited edge metadata.
This revision now requires changes to proceed.Jul 29 2015, 4:20 PM

@rafael really wanted this to be a single pass, which is why I designed it this way.

In my RelWithDebInfo build of llvm+lld there are about 1GiB of relocations from .o files. I assume that quite a few of these will be COMDAT merged away before being looked at, but that's still a lot of memory.

ruiu added a comment.Jul 29 2015, 5:58 PM

IIUC, doing GOT/PLT and applying relocations in one pass is just merging
multiple loops to one loop, and the total number of computation we need
to do for GOT/PLT and applying relocations wouldn't change that much.
Because relocations are continuous in memory and they are fixed in size,
iterating over them should be very fast, so I don't see loop-merging would
make things noticeably faster (that's what I observed in COFF.)

So I feel like it's a premature optimization. I'd like to keep it really simple and
straightforward as possible at this moment. We'll be able to do optimization
later.

Rafael, I'd like to hear from you since this is your idea.

rafael edited edge metadata.Aug 6 2015, 9:45 AM
rafael added a subscriber: rafael.

Starting to take a look at this. Please rebase it on top of trunk now
previous issues have been addressed.

I do want us to be able to do a single pass over the relocations. On other ELF linkers at least relocation processing shows up in profiles.

Having said that, I am not sure that forces where the code lives.

Rui, are you ok with this going in with the code in writer.cpp, we add GOT/PLT support and then try to refactor it to some other file? It seems easier than starting with 2 passes and trying to then combine them.

ELF/Chunks.h
19 ↗(On Diff #30961)

Independent cleanup, please commit it first.

81 ↗(On Diff #30961)

Why do you need to pass an ObjectFile here?

ELF/InputFiles.cpp
35 ↗(On Diff #30961)

For SparseChunks you can use resize and drop the push_backs below.

42 ↗(On Diff #30961)

This is not checking if getSection fails.

ELF/InputFiles.h
57 ↗(On Diff #30961)

Add a comment on why it is deterministic to use < on pointers (they are all from one file).

73 ↗(On Diff #30961)

don't use report_fatal_error.

ELF/SymbolTable.h
19 ↗(On Diff #30961)

independent cleanup. Please commit first.

ELF/Symbols.h
105 ↗(On Diff #30961)

This is dead.

108 ↗(On Diff #30961)

This is dead.

125 ↗(On Diff #30961)

Seems better to just cast to DefinedRegular at use.

ELF/Writer.cpp
189 ↗(On Diff #30961)

Can you use a range loop.

test/elf2/relocation.test
1 ↗(On Diff #30961)

llvm-mc

emaste added a subscriber: emaste.Aug 6 2015, 10:13 AM
ruiu added a comment.Aug 7 2015, 3:32 PM

I was thinking about the cost. I'm still thinking but I don't want to block this,
so writing a comment. I still prefer to not write code for relocations in Writer.cpp,
as I think the original code is much more cleaner than this is.

I believe it shouldn't be hard to rewrite it as one pass if we have to.
So it looks like a premature optimization to me.

I believe it shouldn't be hard to rewrite it as one pass if we have to.
So it looks like a premature optimization to me.

Suppose we have a relocation R. This relocation is in the .text section and refers a symbol in the .data section. Suppose that the .data section goes after the .text one. It is still a common case even if we forget about a linker script support. How will we write result of this relocation using a single pass approach? Will we know an address of the symbol in the .data section at the moment of writing the .text section?

There is one more question about single-pass relocation handling. On MIPS if you call non-PIC routine from a PIC one you should create a thunk contains three instructions. We do not known the number of the thunks until we scan all relocations. So we cannot predict the output .text section's size and offsets. IIRC ARM requires some kind of thunks too. How can the single-pass relocation calculation handle this problem?

ruiu added a subscriber: ruiu.Aug 11 2015, 4:28 PM

There may be more things to think. I'd like to see to land with
non-single-pass approach as I suggested so that it doesn't block anyone. We
can revisit this later when needed.

silvas added a subscriber: silvas.Aug 13 2015, 3:33 PM
silvas added inline comments.
test/elf2/relocation.test
1 ↗(On Diff #30961)

That doesn't make sense. It's like using a C file for an LLVM IR test. If we are testing relocations, there better be R_ in the input and not in a comment.

30–38 ↗(On Diff #30961)

Delete the data and bss sections. They don't matter for this test.

71 ↗(On Diff #30961)

For clarity, please have an explicit offset here.

Bigcheese updated this revision to Diff 32112.Aug 13 2015, 4:55 PM
Bigcheese edited edge metadata.
Bigcheese marked 11 inline comments as done.

Rebase and address review comments.

ELF/Chunks.h
81 ↗(On Diff #30961)

Because we need to know which ObjectFile a section is from to get its relocations.

ELF/InputFiles.cpp
35 ↗(On Diff #30961)

I cleaned it up some. Using resize and array indexing was just as complicated.

test/elf2/relocation.test
1 ↗(On Diff #30961)

This is testing explicit relocations. Assembly doesn't represent specific relocations, and it's not obvious what gets generated.

llvm-mc

This is testing explicit relocations. Assembly doesn't represent specific relocations, and it's not obvious what gets generated.

This test is using a valid file that can be created with assembly.
YAML is an unnecessary LLVM only invention here. Use assembly.

Cheers,
Rafael


llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

Bigcheese updated this revision to Diff 32352.Aug 17 2015, 4:20 PM
Bigcheese edited edge metadata.

Fixed symbol indexing bug. The underlying code changed while the patch was out for review.

I still find using assembly to represent relocations to be an obfuscation, but I'm not going to let that delay work.

davide added a subscriber: davide.Aug 25 2015, 9:09 AM

I'm fine with the general structure of this patch. I would really like to see this moving forward. Rafael is doing a fair amount of work that may potentially conflict with this, so, the more we delay the check-in of this code, the harder it will be (delays due to rebase etc..). Also, having the scaffolding for relocation in place will ease implementation of other relocations. Is there anything else outstanding?

I'm just waiting for a lgtm.

ruiu added a comment.Aug 25 2015, 11:09 PM

Rafael should be the person who signs off.

This doesn't even compile:

/home/espindola/llvm/llvm/tools/lld/ELF/Symbols.h:135:12: error: use
of undeclared identifier 'Sym'

return Sym.st_value;

The attached patch fixes the build and runs git-clang-format.

elf2/basic64be.s is still failing, looking.

Hacked to pass all tests.

Trying to merge now.

Rebased to trunk with all tests passing.

Main issues I see so far:

  • There seems to be more code than needed for the test.
  • Treating local symbols like regular symbols is incredibly wasteful.

They don't need the replacement logic.

I will try to further reduce the patch tomorrow.

This revision was automatically updated to reflect the committed changes.