This is an archive of the discontinued LLVM Phabricator instance.

[BOLT] Add minimal RISC-V 64-bit support
ClosedPublic

Authored by jobnoorman on Mar 9 2023, 6:01 AM.

Details

Summary

Just enough features are implemented to process a simple "hello world"
executable and produce something that still runs (including libc calls).
This was mainly a matter of implementing support for various
relocations. Currently, the following are handled:

  • R_RISCV_JAL
  • R_RISCV_CALL
  • R_RISCV_CALL_PLT
  • R_RISCV_BRANCH
  • R_RISCV_RVC_BRANCH
  • R_RISCV_RVC_JUMP
  • R_RISCV_GOT_HI20
  • R_RISCV_PCREL_HI20
  • R_RISCV_PCREL_LO12_I
  • R_RISCV_RELAX
  • R_RISCV_NONE

Note that in order to implement some of these relocations, this patch
relies on another patch for RuntimeDyld (D145686). However, it has been pointed
out in the past (D98560) that it might be better to switch to JITLink.
That change would be a lot more involved, though, so using RuntimeDyld
for the time being seems to be easiest way forward for getting RISC-V
support in BOLT.

Executables linked with linker relaxation will probably fail to be
processed. BOLT relocates .text to a high address while leaving .plt at
its original (low) address. This causes PC-relative PLT calls that were
relaxed to a JAL to not fit their offset in an I-immediate anymore. This
is something that will be addressed in a later patch.

Changes to the BOLT core are relatively minor. Two things were tricky to
implement and needed slightly larger changes. I'll explain those below.

The R_RISCV_CALL(_PLT) relocation is put on the first instruction of a
AUIPC/JALR pair, the second does not get any relocation (unlike other
PCREL pairs). This causes issues with the combinations of the way BOLT
processes binaries and the RISC-V MC-layer handles relocations:

  • BOLT reassembles instructions one by one and since the JALR doesn't have a relocation, it simply gets copied without modification;
  • Even though the MC-layer handles R_RISCV_CALL properly (adjusts both the AUIPC and the JALR), it assumes the immediates of both instructions are 0 (to be able to or-in a new value). This will most likely not be the case for the JALR that got copied over.

To handle this difficulty without resorting to RISC-V-specific hacks in
the BOLT core, a new binary pass was added that searches for
AUIPC/JALR pairs and zeroes-out the immediate of the JALR.

A second difficulty was supporting ABS symbols. As far as I can tell,
ABS symbols were not handled at all, causing __global_pointer$ to break.
RewriteInstance::analyzeRelocation was updated to handle these
generically.

Tests are provided for all supported relocations. Note that in order to
test the correct handling of PLT entries, an ELF file produced by GCC
had to be used. While I tried to strip the YAML representation, it's
still quite large. Any suggestions on how to improve this would be
appreciated.

Diff Detail

Event Timeline

jobnoorman created this revision.Mar 9 2023, 6:01 AM
Herald added a project: Restricted Project. · View Herald Transcript
jobnoorman requested review of this revision.Mar 9 2023, 6:01 AM
jobnoorman updated this revision to Diff 503797.Mar 9 2023, 8:48 AM

Fix failing test.

yota9 added a comment.Mar 13 2023, 1:43 PM

Thank you for this patch :)

bolt/lib/Rewrite/RewriteInstance.cpp
1461

Is it first PLT entry skipped? I don't think it is good idea to set constant size here in that case. Maybe it is possible to make more universal like it is done in aarch64? there are no PLT entry size assumptions too as I remember

2096

Please add || for aarch64 above

2624

We would need to check this code. Please provide more info about example of such symbols, sections it is contained/referenced in and segments.
Is there any chance that the problem might be solved by this commit https://github.com/llvm/llvm-project/commit/0776fc32b15dc76c6b43c41fc4471552833265de ?

3255

Probably better to move it to arch-specific pass like it is currently done in ADRRelaxationPass or VeneerElimination for example

craig.topper added inline comments.
bolt/include/bolt/Core/BinaryContext.h
730

Should this be isRISCV64()?

bolt/lib/Core/Relocation.cpp
505

What about PCREL_LO12_S?

bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
40

Does this need to support c.nop?

254

This references R_RISCV_PCREL_LO12_S but nothing else in this patch does.

jobnoorman edited the summary of this revision. (Show Details)

Address some of the reviewer comments:

  • Remove references to PCREL_LO12_S. It was only partially implemented so I'll keep the full implementation for a later patch;
  • Collapse two separate ifs in one;
  • Invoke call-fixing code from binary pass;
  • Detect c.nop in isNoop()
jobnoorman marked 4 inline comments as done.Mar 14 2023, 5:08 AM
jobnoorman added inline comments.
bolt/include/bolt/Core/BinaryContext.h
730

This would technically be more correct at the moment. However, I don't think anything in this patch is actually RV64 specific. The only reason we cannot currently handle RV32 is that BOLT rejects ELF32 files. Once that's fixed, I believe this patch could also handle RV32. For this reason, I would propose to keep the current naming. Would you agree?

bolt/lib/Core/Relocation.cpp
505

I will leave the implementation of PCREL_LO12_S to a future patch.

bolt/lib/Rewrite/RewriteInstance.cpp
1461

It's not really the first entry that's skipped, but a special entry at the beginning of the PLT section. The riscv psabi defines the layout of the PLT section:

  • First 32 bytes: a special entry that calls _dl_runtime_resolve to resolve GOT entries. Initially, all GOT entries point here.
  • Then, 16 bytes per "normal" PLT entry.

So since the sizes of these entries are standardized, I think it's OK to keep the as constants here. Would you agree?

2624

The symbol looks like this:

Num:    Value          Size Type    Bind   Vis      Ndx Name
 86: 0000000000002800     0 NOTYPE  GLOBAL DEFAULT  ABS __global_pointer$

It is used in RISC-V to initialize the gp register using a pair of R_RISCV_PCREL_HI20/R_RISCV_PCREL_LO12 relocations.

Since ABS symbols don't reference/belong to any section, I believe that commit doesn't solve the problem?

3255

I moved this to a separate pass but I left the implementation in a MCPlusBuilder method. I did this because trying to implement the logic directly in a pass is quite difficult because passes don't have access to a target's MC interface. I would have needed to add methods like isCallAuipc/isJalr to MCPlusBuilder which feels wrong since it's supposed to be a generic interface.

Would it be a good idea to allow targets to register their own passes (e.g., MCPlusBuilder::registerPasses()) so that they can be implemented in a target-specific subfolder and have access to their MC interface? This might allow us to remove some target-specific methods from MCPlusBuilder. If you agree with this idea, I could give it a shot.

bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
40

It does, added!

254

Removed.

Awesome!

bolt/lib/Core/BinaryContext.cpp
740–742

Why are we creating a new method to do the same thing as the above one?

bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
297

BOLT's libTarget is designed in a way such that it doesn't know about BOLT's core data structures other than MCInst and the MCPlusBuilder class (see the other backend implementations). libBOLTCore depends on libBOLTTarget, but not vice-versa (except for the MCPlusBuilder base class itself, which is in BOLTCore).

So the way forward here would be to implement this loop over all instructions in the pass itself, and only call the target backend for specific target-specific actions on MCInsts. For example, only the body of the loop here would be in "fixCalls".

Another thing we could do is to rename fixCalls to fixRISCVCall, so it is obvious to other backends that this function is exclusive of RISCV and that they don't have to implement it.

Amir added a comment.Mar 14 2023, 5:39 PM

Thank you for working on it! Would you want to add a line to bolt/CODE_OWNERS.TXT?

bolt/include/bolt/Core/BinaryFunction.h
1307

Please move it to RISCVMCPlusBuilder, similar to D131813.

bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
36

Feel free to remove these per D145972

jobnoorman marked 3 inline comments as done and 4 inline comments as done.
jobnoorman edited the summary of this revision. (Show Details)

Rebase and address reviewer comments:

  • Adapt to D131813 and D145972
  • Remove duplicated method getOrCreateAbsoluteSymbol
  • Do not depend on libBOLTCore in libBOLTTarget: have the pass loop over instructions and only call back to MCPlusBuilder for target-specific info
  • Update bolt/CODE_OWNERS.TXT
jobnoorman marked 3 inline comments as done.Mar 15 2023, 4:01 AM
jobnoorman added inline comments.
bolt/lib/Core/BinaryContext.cpp
740–742

Nice catch! I think this method went through several rounds of refactoring and ended up the same as another one without me noticing :) Removed.

bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
297

Fair point, I gave it a shot: the pass now loops over all instructions, calls a new method MCPlusBuilder::isRISCVCall to identify AUIPC/JALR pairs and then fixes-up the JALR.

It still feels unfortunate to have target-specific methods in MCPlusBuilder though. I wonder of some of them (like this one) could be prevented by allowing passes to be implemented in the backend library. Not something to be addressed by this patch though.

jobnoorman marked an inline comment as done.Mar 23 2023, 7:32 AM

Friendly ping.

rafauler added inline comments.Mar 24 2023, 4:31 PM
bolt/lib/Rewrite/RewriteInstance.cpp
2627

Maybe convert this TODO into a regular comment saying this also passes for UND symbols, even though the intent is to capture ABS symbols that land outside any section.

2630

I'm probably not getting the full picture here, so correct my thinking below.

So, if I understand correctly, you have an ABS __global_pointer$ symbol that points outside any section that BOLT has loaded (otherwise ReferencedSection wouldn't be null here). Therefore, BOLT won't change the location of this symbol (if it is pointing to some unknown region of memory that is not covered by any section known by BC->getSectionForAddress(SymbolAddress).

So why register this reloc, if we are not changing this symbol? We will arrive at link step and RuntimeDyld will resolve this unknown symbol to zero, no? How do we know the correct value of this symbol?

If we don't create the relocation here (with addRelocation), we will disassemble the original code that is already patched with the correct value for this symbol -- at least for X86. For RISC, this is harder to do if the real address is split in different instructions, and we don't have the luxury of ignoring the relocation. So my feeling is that, at the very least, this code is RISC-only, but you didn't protect this code from running in the X86 backend.

jobnoorman added inline comments.Mar 27 2023, 4:35 AM
bolt/lib/Rewrite/RewriteInstance.cpp
2630

Thanks for the extensive review!

So, if I understand correctly, you have an ABS __global_pointer$ symbol that points outside any section that BOLT has loaded (otherwise ReferencedSection wouldn't be null here). Therefore, BOLT won't change the location of this symbol (if it is pointing to some unknown region of memory that is not covered by any section known by BC->getSectionForAddress(SymbolAddress).

Correct. Except that __global_pointer$ should be handled even if it happens to point to a section. I'll look into a way to fix this.

So why register this reloc, if we are not changing this symbol? We will arrive at link step and RuntimeDyld will resolve this unknown symbol to zero, no? How do we know the correct value of this symbol?

The current behavior of this patch is that __global_pointer$ keeps its original value. I just noticed that this works kind of accidentally because symbols that don't reference a section are registered for "pretty printing" here.

Now, whether this is correct or not is a good question. The linker is supposed to pick the value for __global_pointer$ when performing Global-pointer relaxation. Since linker relaxation is not addressed by this patch (-Wl,--no-relax is currently mandatory to use it), I don't have an answer to this question yet. (*)

If we don't create the relocation here (with addRelocation), we will disassemble the original code that is already patched with the correct value for this symbol -- at least for X86. For RISC, this is harder to do if the real address is split in different instructions, and we don't have the luxury of ignoring the relocation.

The problem is that the relocation against __global_pointer$ is typically used by a PC-relative code sequence like this (copied from the link above):

1:  auipc gp, %pcrel_hi(__global_pointer$) # R_RISCV_HI20
    addi  gp, gp, %pcrel_lo(1b)            # R_RISCV_LO12_I

So even though we don't change the value of __global_pointer$, the location of this code sequence *can* change and hence the immediate values in it need to be patched again. Therefore, I think adding this relocation is necessary.

So my feeling is that, at the very least, this code is RISC-only, but you didn't protect this code from running in the X86 backend.

I think it's true this code will currently only ever get triggered on RISC-V and I don't mind protecting it as such. However, do you think it will do the wrong thing for an X86 binary containing an ABS symbol? If not, it might make sense to not protect this code to reduce the number of target-specific code paths.

(*) The fact that we don't support linker relaxation yet got me thinking about whether we could skip handling of __global_pointer$ altogether in this patch. This should work in principle since the only place it's used in the binaries I've tested is during the initialization of gp in GCC's startup routines; gp is never actually read from in the rest of the binary. The problem is that we then should also ignore the second relocation (R_RISCV_LO12_I) in the example above or else RuntimeDyld gets angry. It feels like any way to filter-out this particular case of R_RISCV_LO12_I would be quite hacky so it will probably be preferable to not do this. I'll look into it though.

Hi all, I just got the following comment on the RuntimeDyLd part of this patch:

Is there something preventing Bolt from moving to ORC / JITLink? If Bolt is able to move over then the aim should be to do that. If Bolt is unable to move over then we need to know why so that we can address the issue. RuntimeDyld is very much in maintenance mode at the moment, and we're working hard to reach parity in backend coverage so that we can officially deprecate it.

None of that is a total blocker to landing this, but the bar is high, and it should be understood that Bolt will need to migrate in the future.

Since I cannot speak for BOLT's plans of migrating to JITLink, I was wondering if one of the BOLT devs could chip in here?

rafauler added inline comments.Mar 27 2023, 3:40 PM
bolt/lib/Rewrite/RewriteInstance.cpp
2630

Oh I see, thanks for explaining. I was afraid to push this because I thought we would resolve these symbols to zero, but since we're loading them correctly in discoverFileObjects, then it makes sense to push this to X86 too. Can you update the pretty printing comment to reference this patch and how it is important to handle relocs against these symbols?

To push this to X86, I would like this code to be more guardrails. First, is there a way to check that no undefined symbols are being handled here? Second, I really don't like the name "getOrCreateUndefinedGlobalSymbol" because this symbol is supposedly defined. Can you use "BC.Ctx->lookupSymbol()" and explicitly search for the symbol you need, and drop the relocation in case the symbol is not there? That's because, in case the symbol is not there, you will create an undefined symbol, which ideally we shouldn't have. You can even add an assertion here for RISCV only, in case it is global_pointer and you couldn't find it., to keep track of your support for it.

  • Rebase
  • Improve ABS symbol handling:
    • Reuse existing symbol-lookup code when handling relocations so we don't need to create new undefined symbols.
    • Also handle ABS symbol relocation if their value happens to point to a section.
jobnoorman added inline comments.Mar 28 2023, 1:51 AM
bolt/lib/Rewrite/RewriteInstance.cpp
2630

Oh I see, thanks for explaining. I was afraid to push this because I thought we would resolve these symbols to zero, but since we're loading them correctly in discoverFileObjects, then it makes sense to push this to X86 too. Can you update the pretty printing comment to reference this patch and how it is important to handle relocs against these symbols?

Done!

To push this to X86, I would like this code to be more guardrails.

I rewrote the implementation of ABS symbol handling and I believe it's more robust now. Note that it moved a bit up because of this so you won't find the implementation here anymore.

First, is there a way to check that no undefined symbols are being handled here?

The code will now not trigger anymore for symbols that were never registered in discoverFileObjects. However, I believe that the way ABS symbols are detected there (checking if they don't have a section) would also match UND symbols. I'm not even sure if it's possible to make the distinction between ABS and UND symbols here. I feel like this shouldn't be a big problem though as UND symbols shouldn't happen in practice. Would you agree?

Second, I really don't like the name "getOrCreateUndefinedGlobalSymbol" because this symbol is supposedly defined. Can you use "BC.Ctx->lookupSymbol()" and explicitly search for the symbol you need, and drop the relocation in case the symbol is not there? That's because, in case the symbol is not there, you will create an undefined symbol, which ideally we shouldn't have. You can even add an assertion here for RISCV only, in case it is global_pointer and you couldn't find it., to keep track of your support for it.

I managed to remove the need to create symbols entirely by reusing the symbol lookup that already happened earlier in the handleRelocation method.

rafauler added inline comments.Mar 29 2023, 1:55 PM
bolt/lib/Rewrite/RewriteInstance.cpp
2578–2584

The new solution is triggering a lot more than the previous one, now we're definitely getting undefined symbols. In one X86 input we're getting these entries now being used to generate relocs:

714106: 0000000000000000 0 NOTYPE WEAK DEFAULT UND symbol1
714107: 0000000000000000 0 NOTYPE WEAK DEFAULT UND symbol2

Unless we plan to define symbols in BOLT, which I can't think why we would do that, I think we should explicitly filter out undefined syms when adding relocations to functions.

Don't add relocations against UND symbols.

jobnoorman added inline comments.Mar 30 2023, 2:15 AM
bolt/lib/Rewrite/RewriteInstance.cpp
2578–2584

Interesting! I found a way to check that the symbol is actually an ABS one.

Note that I wasn't able to reproduce this issue (I might've misunderstood/done something wrong though). If you're still seeing issues with this, could you maybe point me to an input file to reproduce it?

This revision is now accepted and ready to land.Mar 31 2023, 8:46 PM
jobnoorman added a subscriber: rafaelauler.EditedApr 1 2023, 9:32 AM

Thanks a lot for the reviews and help improving this patch, @rafauler.

The main roadblock in getting this landed seems to be the issue around RuntimeDyLd. Since it's deprecated, I've been playing around with porting BOLT to JITLink and the results are starting to look quite promising (just 24 failing tests to solve). I hope to be able to post an RFC patch for this at some point next week.

jobnoorman edited the summary of this revision. (Show Details)

Rebase: integrate CMake changes from D148847.

This revision was landed with ongoing or failed builds.Jun 16 2023, 3:20 AM
This revision was automatically updated to reflect the committed changes.