Page MenuHomePhabricator

[WIP][LLD][RISCV] Linker Relaxation
Needs ReviewPublic

Authored by PkmX on Tue, Apr 20, 2:54 AM.

Details

Summary

This patch fully implements linker relaxation for RISC-V including relaxation for R_RISCV_CALL, R_RISCV_HI20/LO12, R_RISCV_PCREL_HI20/LO12 and handling for R_RISCV_ALIGN. Just for reference/link, there were some previous efforts/discussion to implementation linker relaxation in D77694 and D79105.

As linker relaxation is highly specific to RISC-V at the moment, most of the work are done in the Target::finalizeContents() function and isolated from generic code. For now I'm avoiding trying to come up with a common relaxation framework for multiple targets as their needs may be greatly different.

The relaxation process is split into several passes:

  • For each executable input section, search its relocation vector for R_RISCV_RELAX and determine how to relax the previous relocation.
  • If the relocation is R_RISCV_CALL (auipc+jalr pair), try to relax to jal or c.jal if the jump target is in range. This assumes that the PC-relative offset can only become smaller during the relaxation process.
  • If the relocation is R_RISCV_HI20/LO12 (absolute addressing) and the target symbol can be addressed from __global_pointer$ (defaulted to .sdata+0x800), delete the lui and rewrite the lo part to use gp as source register.
  • If the relocation is R_RISCV_PCREL_HI20/LO12, this requires two-pass to relax as PCREL_LO12 links to its PCREL_HI20 for address calculation and they may appear in arbitrary order in an input section. To preserve the look up from PCREL_LO12 the first pass relaxes/deletes`PCREL_LO12` and the second pass handles PCREL_HI20. The implementation is simpler than that in bfd because lld doesn't allow addends in PCREL_LO12 (https://github.com/riscv/riscv-elf-psabi-doc/issues/184), and so both parts can be relaxed independently.
  • The range of bytes are deleted after processing relaxation for a whole input section. This requires adjusting section content, symbol addresses/sizes and relocation offsets, which is handled in InputSectionBase::deleteRanges. Compared to other approaches we use a algorithm that is not quadratic by sorting symbols, relocations and bytes to be deleted by offset.
  • After relaxation, handle alignment as symbols addresses are now fixed modulo section alignment. This is always enabled regardless of the --relax option as this is required for correctness.

There are still some issues to solve:

  • --emit-relocs will be broken on relaxed section as currently it just copies the corresponding .rela section verbatim from input. It needs to be fixed to build relocation entries from the section's relocation vector.
  • The patch adds two additional RelExpr (R_RISCV_GPREL and R_RELAX_HINT) which unfortunately makes it unable to fit into a 64-bit mask, so it is now changed back to sequential checking. I think it should be somehow split into target-independent (R_ABS, R_PC, ...) exprs and target-dependent (R_RISCV_..., R_MIPS_...) exprs which can overlap in numeric range, but it is probably out of the scope of this patch.

Diff Detail

Event Timeline

PkmX created this revision.Tue, Apr 20, 2:54 AM
PkmX requested review of this revision.Tue, Apr 20, 2:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptTue, Apr 20, 2:54 AM

Why write your own deleteRanges? I took great care in writing mine to ensure there were no hidden quadratic complexities, but your simpler version has introduced them. Just take mine, unless there is a problem with it nobody's mentioned. Same goes for mutableData being a less efficient version of mine.

I also don't like making this overly target-specific. We should take the time to get the interface right, not just punt on it and shove it all into target-specific code.

lld/ELF/Arch/RISCV.cpp
617

Can we not just keep two iterators around? That feels nicer to me than careful +1s everywhere (e.g. it's not immediately obvious this won't crash on the first element until you look at the start point of the loop).

637–638

This condition seems unnecessary

686

When would the addend ever be 0? That seems like invalid input.

733–734

Why a limit of 5? It is guaranteed to converge, I don't see why there needs to be a limit, and it'll probably take just 1 or 2 almost all of the time.

740

Not for relocatable

lld/ELF/InputSection.cpp
195–196

This looks quadratic

216–217

As does this

227–228

And this

lld/ELF/InputSection.h
162

Make the relevant fields mutable and mark this const, like data()

173

std::map seems like a high-overhead way to store what is just a sorted list of intervals.

269

Why do we need a whole new pointer when rawData is also being updated?

lld/ELF/Relocations.cpp
128

(a) this is unrelated to the diff
(b) this completely removes the whole point of oneof

PkmX added inline comments.Tue, Apr 20, 8:36 PM
lld/ELF/Arch/RISCV.cpp
637–638

It's necessary as processRelaxations needs to access two iterators which would be invalid if the container is empty. I will move this into processRelaxations which is where it should be in the next patch.

733–734

Will be removed in the next patch.

740

Will do in the next patch.

lld/ELF/InputSection.cpp
195–196

This isn't quadratic.

We are really looping over the symbols in this section while simultaneously accumulating the number of bytes deleted. Since symbols and delete ranges are sorted by offset, if we know X bytes are deleted before offset A, and B > A, we only have to add up delete ranges between A and B to know the total number of bytes deleted before B. In essence every symbol and delete range is only visited once in this loop.

lld/ELF/InputSection.h
173

I agree. std::vector<std::pair<uint64_t, uint64_t>> should be sufficient although it needs to be sorted for deleteRanges as the vector may not sorted with two-pass handling for PCREL_HI20.

269

Will change it to use a copiedData bool as in your patch.

lld/ELF/Relocations.cpp
128

As noted in the summary adding more RelExpr makes it unable to fit in a 64-bit mask, so this is temporary change to make this patch work for now.

There were already 64 RelExprs before this patch so there is no room for growing and I don't think it is workable in the long run. The only solution I can think of is split them into target specific RelExpr with some common exprs like R_ABS being shared, so as long as the target doesn't define more than ~20 RelExpr it should be fine. However this probably should belong to another patch.

PkmX added inline comments.Wed, Apr 21, 1:01 AM
lld/ELF/Arch/RISCV.cpp
686

Yes, valid addends should only be 2^n-2 for n >= 2. With that said I don't think lld is really checking this anywhere, so malicious input could cause lld to crash.

lld/ELF/Relocations.cpp
128

Alternatively I think we can use two 64-bit masks, so this should work up to 128 RelExprs.

PkmX updated this revision to Diff 339181.EditedWed, Apr 21, 4:38 AM
PkmX marked an inline comment as done and an inline comment as not done.
PkmX edited the summary of this revision. (Show Details)

This addresses some of the comments by @jrtc27:

  • Check for empty relocation vector is moved into processRelaxations
  • Remove the iteration limit for relaxation
  • Don't perform alignment handling for relocatable link
  • No more mutData in InputSectionBase, now it uses copiedData to check if a mutable copy has been made.
  • DeleteRanges is now a std::vector of offset and size pairs.
  • oneof now tests for two 64-bit masks depending on which subset (<64 or >= 64) the expr is in.
arichardson added inline comments.Wed, Apr 21, 4:46 AM
lld/ELF/Relocations.cpp
144

This change should be a separate review. I would very much like support for > 64 RelExpr values to land upstream since we also had to make that change that for our CHERI LLD.

Why write your own deleteRanges? I took great care in writing mine to ensure there were no hidden quadratic complexities, but your simpler version has introduced them. Just take mine, unless there is a problem with it nobody's mentioned. Same goes for mutableData being a less efficient version of mine.

I also don't like making this overly target-specific. We should take the time to get the interface right, not just punt on it and shove it all into target-specific code.

+1 for a low time complexity implementation. The quadratic time complexity of the binutils approach may run into scalability problems.

I haven't closely looked into this yet, but it seems that R_RISCV_GPREL_* support can be contributed separately.

PkmX added a comment.Mon, Apr 26, 7:58 PM

Why write your own deleteRanges? I took great care in writing mine to ensure there were no hidden quadratic complexities, but your simpler version has introduced them. Just take mine, unless there is a problem with it nobody's mentioned. Same goes for mutableData being a less efficient version of mine.

I also don't like making this overly target-specific. We should take the time to get the interface right, not just punt on it and shove it all into target-specific code.

+1 for a low time complexity implementation. The quadratic time complexity of the binutils approach may run into scalability problems.

I haven't closely looked into this yet, but it seems that R_RISCV_GPREL_* support can be contributed separately.

Do you mean the whole gp-relaxation support? As only implementing relocations for R_RISCV_GPREL seems pointless as they are only generated and consumed by the linker.

lld/ELF/Relocations.cpp
144

That's okay for me. I will split this out into a new patch.