This is an archive of the discontinued LLVM Phabricator instance.

[BOLT][RISCV] Add support for linker relaxation
ClosedPublic

Authored by jobnoorman on Aug 29 2023, 6:17 AM.

Details

Summary

Calls on RISC-V are typically compiled to auipc/jalr pairs to allow
a maximum target range (32-bit pc-relative). In order to optimize calls
to near targets, linker relaxation may replace those pairs with, for
example, single jal instructions.

To allow BOLT to freely reassign function addresses in relaxed binaries,
this patch proposes the following approach:

  • Expand all relaxed calls back to auipc/jalr;
  • Rely on JITLink to relax those back to shorter forms where possible.

This is implemented by detecting all possible call instructions and
replacing them with PseudoCALL (or PseudoTAIL) instructions. The
RISC-V backend then expands those and adds the necessary relocations for
relaxation.

Since BOLT generally ignores pseudo instruction, this patch makes
MCPlusBuilder::isPseudo virtual so that RISCVMCPlusBuilder can
override it to exclude PseudoCALL and PseudoTAIL.

To ensure JITLink knows about the correct section addresses while
relaxing, reassignment of addresses has been moved to a post-allocation
pass. Note that this is probably the time it had to be done in the
first place since in notifyResolved (where it was done before), all
symbols are supposed to be resolved already.

Depends on D159082

Diff Detail

Event Timeline

jobnoorman created this revision.Aug 29 2023, 6:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 6:17 AM
jobnoorman requested review of this revision.Aug 29 2023, 6:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 6:17 AM

Note that since this relies on a fix in lld (D159082), this patch will probably cause CI problems again (since the system's clang is used there).

Amir added a comment.Sep 13 2023, 12:30 PM

Note that since this relies on a fix in lld (D159082), this patch will probably cause CI problems again (since the system's clang is used there).

Pre-merge checks here and in GitHub PRs use trunk lld [1], so if you replace clang with lld in test then it should pass.

[1]: https://buildkite.com/llvm-project/phabricator-run-tests/builds/173788#018a4173-32e6-4d74-9871-6c0bff371c9c/117-120

bolt/test/RISCV/relax.s
2

Use llvm-mc+ld.lld instead of clang.

jobnoorman marked an inline comment as done.Sep 14 2023, 12:20 AM

Note that since this relies on a fix in lld (D159082), this patch will probably cause CI problems again (since the system's clang is used there).

Pre-merge checks here and in GitHub PRs use trunk lld [1], so if you replace clang with lld in test then it should pass.

Thanks @Amir, I completely missed that.

Thanks for the patch, Job! Do you know what's the effect of changing the reassignSectionAddress() invocation on other platforms? Everything else LGTM.

Thanks for the patch, Job! Do you know what's the effect of changing the reassignSectionAddress() invocation on other platforms? Everything else LGTM.

It shouldn't have any effect on other platforms. The change will call reassignSectionAddress() a bit earlier in the JITLink pipeline (notifyResolved is called right after running the PostAllocationPasses, see this) but this shouldn't matter from BOLT's perspective.

Just to be sure, I'll run llvm-bolt-wrapper in the morning to verify (I'm pretty sure I already did this but it has been a while).

Awesome. Let me also check this change internally just in case.

maksfb accepted this revision.Sep 15 2023, 1:33 AM
This revision is now accepted and ready to land.Sep 15 2023, 1:33 AM
This revision was automatically updated to reflect the committed changes.
yota9 added a comment.Oct 8 2023, 2:29 PM

It seems to be that on some of the buildbots the test is failing for some reason e.g. https://lab.llvm.org/buildbot/#/builders/221/builds/19162/steps/8/logs/FAIL__BOLT__relax_s

It seems to be that on some of the buildbots the test is failing for some reason e.g. https://lab.llvm.org/buildbot/#/builders/221/builds/19162/steps/8/logs/FAIL__BOLT__relax_s

Welp, that test has been broken for a while it seems... Looks like the first failing commit is c1ce21b450fd9607c3b44acd520f4e449c6a9a89 but I don't see why that could be relevant. Did something else change on the build bot (e.g., clang/lld version?). Do you have a way to reproduce this? (It works locally for me.) If you do, it would be very helpful if you could send me all the intermediary files of that test so that I can try to debug this.

yota9 added a comment.Oct 9 2023, 1:11 AM

I believe some patches could be tested together, so probably the reason is nearby patch. But I could not help you, probably @Amir might help you more with the investigation. For now I suggest to mark the test as invalid. Probably adding something like "UNRESOLVED: *" might help, I don't think XFAIL could fit since on some of the node the test is passed.

I believe some patches could be tested together, so probably the reason is nearby patch.

In this case, it really seems to be just one patch: this is the last good build (at 0b0ed8f76a264c3677b8254d8d334de43600568f), this the first bad (at c1ce21b450fd9607c3b44acd520f4e449c6a9a89).

$ git log --oneline 0b0ed8f76a264c3677b8254d8d334de43600568f..c1ce21b450fd9607c3b44acd520f4e449c6a9a89
c1ce21b450fd cmake: allow disabling building with the install name directory

That's why I feel something must have changed on the build bot. fwiw, the issue seems to be that relaxation is not performed properly. So that suggests either an llvm-mc issue (unlikely since it's built from tree and hence should be locally reproducible) or an lld issue (seems more likely since the system one is used in the build bots).