This is an archive of the discontinued LLVM Phabricator instance.

[BOLT][AArch64] Handle adrp+ld64 linker relaxations
ClosedPublic

Authored by yota9 on Nov 16 2022, 12:06 AM.

Details

Summary

Linker might relax adrp + ldr got address loading to adrp + add for
local non-preemptible symbols (e.g. hidden/protected symbols in
executable). As usually linker doesn't change relocations properly after
relaxation, so we have to handle such cases by ourselves. To do that
during relocations reading we change LD64 reloc to ADD if instruction
mismatch found and introduce FixRelaxationPass that searches for ADRP+ADD
pairs and after performing some checks we're replacing ADRP target symbol
to already fixed ADDs one.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Diff Detail

Event Timeline

yota9 created this revision.Nov 16 2022, 12:06 AM
yota9 requested review of this revision.Nov 16 2022, 12:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2022, 12:06 AM

Gentle ping @rafaelauler

bolt/lib/Passes/FixRelaxationPass.cpp
60

@rafaelauler To be honest I'm a bit confused, since sometimes I had a problems in the past within parallel execution of other passes. But it seems to be safe here, is it? I should probably make it parallel

yota9 added a comment.Nov 29 2022, 1:03 PM

2 weeks passed, gentle ping :)

@yota9, sorry for the delay. Rafael is on vacation. I took a quick peek at aarch64 lld, and it handles more relaxations than just ADRP/LDR, e.g. TLS-specific sequences. Do you plan to support those in the future?

In general, the pass looks fine. Ideally, we should fix such discrepancies as early in the pipeline as possible, i.e. during disassembly so that --print-disasm output looks correct. But this clearly requires scanning forward, which complicates things. Still, you can check if the next relocation doesn't match the instruction when you see a relaxable ADRP candidate. What do you think?

To speed things up with the current approach, you can detect functions with relaxable relocations while reading relocs, mark functions that need fixes, and only run the pass on them.

Please run spellcheck on the summary.

bolt/lib/Passes/FixRelaxationPass.cpp
60

Since you are not modifying MCContext, it should be thread-safe. You can run "valgrind --tool=helgrind ..." or thread sanitizer for extra validation.

bolt/lib/Rewrite/RewriteInstance.cpp
2557–2558

Can we use __BOLT_ prefix for the internally-created symbol?

bolt/test/AArch64/got-ld64-relaxation.test
2

Does it make sense to generate a test case with lld similar to x86 GOTPCRELX test in D126747?

yota9 marked 3 inline comments as done.EditedDec 16 2022, 6:29 AM

Thank you for review @maksfb and sorry for long response too.
We are now supporting most of them I think, currently every relaxation is handled in skipRelocationProcessAArch64. As for pure TLS ones we don't have to support them since we do not support TLS relocations and don't handle TLS sections in BOLT for now.

Ideally, we should fix such discrepancies as early in the pipeline as possible

I'm absolutely agree with this, but currently I don't see how to do it at relocation-handling stage. We need to handle relocations, get instruction opcode & etc, I expect the code to be very messy if we would do that. Although I don't really like the idea of the separate pass too, but probably it is the only way currently :(

To speed things up with the current approach, you can detect functions with relaxable relocations while reading relocs, mark functions that need fixes, and only run the pass on them.

It is true, but I don't really want to introduce new BF variables here.. Ideally we might make some bit mask values for such purposes, but for now I think it is ok to use it as-is.

bolt/lib/Passes/FixRelaxationPass.cpp
60

It seems to be no problem to run this pass in parallel :)

bolt/lib/Rewrite/RewriteInstance.cpp
2557–2558

Sure :)

bolt/test/AArch64/got-ld64-relaxation.test
2

The thing is that the linker will relax it to the adr + nop due to small size of the final binary. Such a case is already handled in skipRelocationProcessAArch64, but for adrp+add case it is better to use pre-built binary

yota9 edited the summary of this revision. (Show Details)Dec 16 2022, 6:32 AM
yota9 updated this revision to Diff 483514.Dec 16 2022, 6:33 AM
yota9 marked 3 inline comments as done.

Address comments

maksfb accepted this revision.Dec 19 2022, 3:45 PM

LGTM

bolt/test/AArch64/got-ld64-relaxation.test
2

You can make the binary artificially bigger with .zero/.skip directives.

This revision is now accepted and ready to land.Dec 19 2022, 3:45 PM
yota9 marked an inline comment as done.Dec 20 2022, 6:29 AM
yota9 added inline comments.
bolt/test/AArch64/got-ld64-relaxation.test
2

Well you're right, I was thinking about rept one. Usually I won't use it since it is quite slow and I'm not sure how big binary is OK to use in CI, but in this case it is just 1MB, so it is probably fine to use here.. Also a greater number of bytes really slows down the build. But again I think 1MB is good-enough limit to use in such tests..

yota9 updated this revision to Diff 484247.Dec 20 2022, 6:30 AM
yota9 marked an inline comment as done.

Replace test

yota9 updated this revision to Diff 484249.Dec 20 2022, 6:37 AM

Add lld relax option

yota9 added a comment.Dec 20 2022, 7:03 AM

It seems to be that CI prebuilt lld is a bit out of date and does not support --relax option. I may remove this since it is the default, although I think it is probably good to have one. What do you think @Amir ?

Amir added a comment.Dec 20 2022, 1:37 PM

It seems to be that CI prebuilt lld is a bit out of date and does not support --relax option. I may remove this since it is the default, although I think it is probably good to have one. What do you think @Amir ?

Pre-merge checks are managed by Google folks in a separate repo. I can submit a PR to update LLVM version used in pre-merge checks but it'll take me a while (don't have docker-enabled machine readily available to test).

yota9 updated this revision to Diff 484597.Dec 21 2022, 8:50 AM

Thanks @Amir ! For now I would probably just remove this option, I don't think it is really necessary to be honest :)

yota9 updated this revision to Diff 484601.EditedDec 21 2022, 9:15 AM

The CI lld does not support this relaxation for now, revert test to prebuilt blob.

This revision was automatically updated to reflect the committed changes.