[lld] Support RISC-V
Needs ReviewPublic

Authored by PkmX on Oct 26 2017, 1:24 AM.

Details

Reviewers
ruiu
asb
Summary

This patch makes lld recognize RISC-V target and implements basic relocation for RV32/RV64 (and RVC). This should be necessary for static linking ELF applications.

The ABI documentation for RISC-V can be found at: https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md. Note that the documentation is far from complete so we had to figure out some details from bfd.

The patch should be pretty straightforward. Some highlights:

  • A new relocation Expr R_RISCV_PC_INDIRECT is added. This is needed as the low part of a PC-relative relocation is linked to the corresponding high part (auipc), see: https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#pc-relative-symbol-addresses
  • LLVM's MC support for RISC-V is very incomplete (we are working on this), so tests are given in objectyaml format with the original assembly included in the comments. Once we have complete support for RISC-V in MC, we can switch to llvm-as/llvm-objdump.
  • We don't support linker relaxation for now as it requires greater changes to lld that is beyond the scope of this patch. Once this is accepted we can start to work on adding relaxation to lld.

Diff Detail

PkmX created this revision.Oct 26 2017, 1:24 AM
ruiu added a comment.Oct 26 2017, 1:14 PM

As you said this looks like a pretty straightforward port to RISC-V. Thank you for doing this!

IIRC, last time when we discussed RISC-V support in lld, I had a concern that linker relaxation (moving code inside a section to make it compact) is virtually mandatory to link RISC-V object files. What is the situation of it? If you can link a small static executable with this patch, then it is probably no longer mandatory, but I want to make sure that that's the case.

ELF/Arch/RISCV.cpp
40–41

We usually do not enclose an entire cpp file in namespaces. Please take a look at X86.cpp and do the same thing.

46

This is redundant, I would remove this line.

47

Please use RelType instead of uint32_t if a variable represents a relocation type, even though RelType is just a typedef for uint32_t.

56

& 0xFFFF is not needed as it is automatically trimmed.

69

We do not indent cases. Maybe you want to just use clang-format-diff.

88

I don't like this comment -- could you describe the function?

91

I think

return (V >> End) & ((1 << Begin) - 1);

is more straightforward (if I understand this expression correctly.) You can do shift the result to the left after returning from this function.

I'd make the range half-open (e.g. pass (8, 9) instead of (8, 8)) because it feels more C++-ish.

Function names usually should be verb. Something like extract is a better name.

110

I don't think you need a cast because you are using the least signifcant 8 bits anyway.

112–117

They are obviously not mutated, and in that case, we don't usually add const.

219

+=

259–269

Move these cases to the end of this switch (just before default), to remove gotos.

294–297

Is big-endian RISCV actually a thing? I know it is technically defined, but I wonder if there exists a real user.

ELF/Driver.cpp
118–133

Sort

ELF/InputSection.cpp
576–578

Move this comment inside the following case

582

If Body is an absolute symbol, Section is a nullptr. Are you sure that this shouldn't happen?

(And basically, please remove const from local variables as our functions are usually small and their scopes are pretty small.)

ELF/Symbols.cpp
30–31

Sort

ELF/Symbols.h
324–333

Sort

329

Please add a comment.

ELF/Target.cpp
76–82

Sort

82–89

Big-endian RISC-Vs seems unreachable. I'd remove them if it is not needed now.

ELF/Target.h
128–137

Sort

(Basically, we want to add new code to the place where it would have been if it had existed from beginning. This makes our code look less patch-y.)

ELF/Writer.cpp
790–795

Needs comment.

912–918

Format

PkmX marked 12 inline comments as done.Oct 26 2017, 7:49 PM
In D39322#908393, @ruiu wrote:

As you said this looks like a pretty straightforward port to RISC-V. Thank you for doing this!

IIRC, last time when we discussed RISC-V support in lld, I had a concern that linker relaxation (moving code inside a section to make it compact) is virtually mandatory to link RISC-V object files. What is the situation of it? If you can link a small static executable with this patch, then it is probably no longer mandatory, but I want to make sure that that's the case.

Yes, linker relaxation is no longer mandatory, after these followup discussion here:
https://github.com/riscv/riscv-binutils-gdb/pull/88
https://github.com/riscv/riscv-tools/issues/132

ELF/Arch/RISCV.cpp
91

Yes, basically it extracts the bits from [Begin, End] and shifts the result so the LSB is at LShift bit.

Making the range half-open does make it work more like C++ iterators, but it does make cross-referencing with the ISA spec more difficult as ranges are often given in closed forms, e.g. Inst[5:0] means the lowest 6 bits.

I agree we can move the left shift out of the function, and I will also rename the function to extractBits.

294–297

I don't think there are any big-endian implementations (and the toolchain/libraries are most likely not intensively tested for BE), so yeah I'm going to remove these.

ELF/Symbols.h
324–333

RISCVGlobalPointer should be the last one in alphabetical order, no?

ELF/Target.cpp
76–82

It is already in alphabetical order.

ruiu added inline comments.Oct 26 2017, 8:51 PM
ELF/Arch/RISCV.cpp
91

If it is consistent with the specification, I'm fine with open interval.

PkmX updated this revision to Diff 120541.Oct 26 2017, 11:37 PM
PkmX marked 6 inline comments as done.
ruiu added inline comments.Oct 26 2017, 11:46 PM
ELF/Arch/RISCV.cpp
45

This is where you should use RelType too.

87

I'd replace static_cast<uint64_t>(1) with 1u, as we assume int is at least 32 bits.

294–297

Thanks!

Then maybe it makes more sense if you provide two functions, getRISCV32TargetInfo and getRISCV64TargetInfo, I guess?

PkmX updated this revision to Diff 120794.Oct 30 2017, 4:24 AM
  • Renamed uint32_t to RelType.
  • Detemplate the RISCV class.
ruiu added a comment.Oct 30 2017, 1:29 PM

Generally looking pretty good except lo12 relocations.

ELF/Arch/RISCV.cpp
50–55

Please remove this function and directly use write32le (I believe instead of calling write16le twice, you can use write32le.).

57–59

Ditto -- please use read32le directly.

85

static_cast<uint64_t>(1) -> (uint64_t)1 or 1ULL

as it is obvious that it is a static cast.

ELF/InputSection.cpp
583–590

It is O(n^2), isn't it? It doesn't seems good.

Fundamentally I think I don't understand why R_RISCV_PCREL_LO12{I,S} were designed that way. IIUC, they are relocations to load a PC-relative symbol address and used in combination with PC_RISCV_PCREL_HI20. A pair of HI and LO relocations are associated via label locations. That seems unnecessarily tricky to me.

The spec says that they are computed this way:

hi20 = ((symbol_address - hi20_reloc_offset + 0x800) >> 12);
lo12 = symbol_address - hi20_reloc_offset - hi20;

but why can't we define lo12 as a more regular relocation like this:

hi20 = ((symbol_address - hi20_reloc_offset + 0x800) >> 12);
lo12 = symbol_address - lo20_reloc_offset - ((symbol_address - lo20_reloc_offset + 0x800 + Addend) >> 12) + Addend;

and set a relocation addend to a distance between a lo12 relocation and a hi12 relocation? This way, you don't really have to associate two relocation, but you can just compute their values separately. What am I missing?

PkmX updated this revision to Diff 120939.Oct 30 2017, 11:52 PM
PkmX marked 3 inline comments as done.
  • Use {read,write}32le to read/write insturctions.
  • Make __global_pointer$ non-absolute.
asb added a comment.Oct 31 2017, 2:49 AM

I'm not an LLD expert, but the RISC-V bits seem correct as far as I can see. I've just added a couple of minor comments.

ELF/Arch/RISCV.cpp
72

A comment that clarifies the semantics would be useful. Perhaps "Begin and End are inclusive, e.g. extractBits(V, 0, 5) will return the lower 6 bits."

166

It would seem consistent with the other cases do define Imm* local variables to help piece together the final instruction. It's slightly more verbose, but enforcing uniformity across the cases might make this slightly easier to read?

niosHD added a subscriber: niosHD.Oct 31 2017, 8:11 AM
ruiu added a comment.Oct 31 2017, 4:01 PM

What do you think of lo20 relocations? If there's a better place to ask about the design of the RISC-V ABI, please let me know so that I can start a thread.

PkmX marked an inline comment as done.Oct 31 2017, 5:24 PM

I believe the best place would be either the issue tracker on the psABI doc, or RISC-V's sw-dev mailing list.

ELF/Arch/RISCV.cpp
72

Will do.

87

On common systems where int is 32-bit (x86_64) and with Begin == 31, 1u << 32 would invoke undefined behavior.

With that said, this will still break if Begin == 63, so I think it is more correct to right-shift from UINT64_MAX.

110

The cast is needed: consider uint64_t Val = -1, right-shifting an unsigned integer in C performs zero-extension in the MSB and produces a large positive number if interpreted as signed later.

I've removed redundant casts in other checkInt where no shifting is performed.

166

That sounds reasonable. The other cases are more complicated as the Imm needs to be split up and shifted into the correct bit position and or'ed together. With auipc (U-Type) and jalr (I-Type), the imm fields in the instruction are already packed in the right order, so we only need a simple mask and shift.

Actually, for R_RISCV_CALL, I think I should just recursively call relocateOne with R_RISCV_PCREL_HI20 at Buf and R_RISCV_PCREL_LO12_I at Buf + 4 to remove redundant code. This should be okay since we aren't doing relaxation for now.

294–297

I think it should actually be possible to de-template the class and just use Config->Wordsize and Config->Endianness, then we can just provide a single instance.

ELF/Driver.cpp
118–133

Sorting this alphabetically will put elf32lriscv awkwardly between elf32btsmip and elf32ltsmip, so I think it's better to put it after these two?

.Cases("elf32btsmip", "elf32btsmipn32", {ELF32BEKind, EM_MIPS})                                                                                                                      
.Cases("elf32ltsmip", "elf32ltsmipn32", {ELF32LEKind, EM_MIPS})        
.Case("elf32lriscv", {ELF32LEKind, EM_RISCV})                          
.Case("elf32ppc", {ELF32BEKind, EM_PPC})                               
.Case("elf64btsmip", {ELF64BEKind, EM_MIPS})                           
.Case("elf64ltsmip", {ELF64LEKind, EM_MIPS})                           
.Case("elf64lriscv", {ELF64LEKind, EM_RISCV})
ELF/InputSection.cpp
582

I don't think that should ever happen since the symbol should point to an auipc instruction somewhere in a text section.

583–590

RISC-V expects linker relaxation so the distance between a pair of HI20 and LO12 relocation isn't static. If some code is removed then you need to update the addend of the LO12 of all HI/LO pairs surrounding that change.

It should be possible to pre-compute a mapping from symbols to relocations to reduce the search time complexity, but I think that could go in another patch?

ruiu added inline comments.Oct 31 2017, 5:36 PM
ELF/InputSection.cpp
583–590

Yeah, I guessed that that might be reason. But, if the linker relaxes and repacks instructions, it naturally knows how many bytes have been removed between any two locations, so it can easily compute how many bytes need to be adjusted for lo20 addends, right? I'd think computing the same value using labels is unnecessarily complicated, and it seems like a residue of when the code shrinking relaxation was mandatory for RISC-V.

PkmX updated this revision to Diff 121106.Nov 1 2017, 1:46 AM
PkmX marked an inline comment as done.

Make R_RISCV_CALL reuse R_RISCV_PCREL_HI20 and R_RISCV_PCREL_LO12_I.

PkmX updated this revision to Diff 121837.Nov 6 2017, 11:12 PM
  • Rebase onto latest master
  • Clean up #includes
Razer6 added a subscriber: Razer6.Nov 7 2017, 12:14 AM
ruiu added inline comments.Nov 8 2017, 4:03 PM
ELF/InputSection.cpp
583–590

I still have a concern on HI20 and LO20 relocations. Could you start a thread to discuss whether it can be fixed in the ABI?

mgrang added a subscriber: mgrang.Fri, Nov 17, 11:11 AM
PkmX added a comment.Thu, Dec 7, 3:19 AM

Oops. It looks like I actually didn't submit some of these draft comments.

ELF/Arch/RISCV.cpp
11–28

I think some #includes here are only needed in the subsequent patches and some are simply unnecessary. Noting in case I forget to revise this.

ELF/InputSection.cpp
583–590

Yeah, that sounds workable. This does mean that the linker needs scan the whole section for LO12 to update the addend during relaxation, as the HI20/LO12 pair may be far apart (it is also perfectly fine for many PCREL_LO12 relocations to refer to the same PCREL_HI20). I'm not sure how this compares performance-wise with storing a label since you only pay for the penalty when you want the value or to find the linked PCREL_HI20.

With that said, I think it may be too late to change the ABI, especially when it comes to a fundamental change like this.

583–590

BTW, those are called LO12, not LO20.

583–590

FYI, I've opened a follow-up thread on RISC-V's mailing list:

https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/IqYjXvl4o4Q

583–590

FYI, I've opened a follow-up thread on RISC-V's mailing list:

https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/IqYjXvl4o4Q

583–590

It's been a while and I don't think they are willing to change the ABI at this point. Do you have further comments and is there still anything that needs to be done with this patch?