This is an archive of the discontinued LLVM Phabricator instance.

[BOLT][AArch64] Handle gold linker veneers
ClosedPublic

Authored by yota9 on Jun 17 2022, 11:47 AM.

Details

Summary

The gold linker veneers are written between functions without symbols,
so we to handle it specially in BOLT.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Diff Detail

Event Timeline

yota9 created this revision.Jun 17 2022, 11:47 AM
yota9 requested review of this revision.Jun 17 2022, 11:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2022, 11:47 AM
yota9 updated this revision to Diff 437998.Jun 17 2022, 11:56 AM

ADd test description

yota9 added inline comments.Jun 17 2022, 11:59 AM
bolt/include/bolt/Core/BinaryContext.h
647

The list seems to be more suitable, then previously used set. Since we have to handle references one by one the insertion is faster here. But more important the newely-handled trampolines will create new references, that must be added to the end of the list while iterating it.

bolt/lib/Core/BinaryContext.cpp
1145

We might just create a BF here and call disassemble, but since we don't know veneer size it seems to be that a bit of code copy-paste is more proper solution here, what do you think?

bolt/lib/Core/BinaryFunction.cpp
1298

@maksfb I hope we could solve this problem, do you have ideas how to handle it properly? :)

yota9 updated this revision to Diff 438003.Jun 17 2022, 12:13 PM

Add test checks

yota9 updated this revision to Diff 438137.Jun 18 2022, 11:02 AM

Uncomment __builtin_unreachable() branch handling.
@maksfb as a tmp solution I purpose to add extra check in unreachable handling. I hope we will able to move elliminate this code here oneday,
but for now it should work fine. Also as I see there is some other code that I think we need to be refactore also, e.g. removeConditionalTailCalls
in postProcessCFG. Ideally we want to move all code transformations to the passes, since at least golang support needs it. Thank you!

yota9 retitled this revision from [BOLT][AArch64][RFC] Handle gold linker veneers to [BOLT][AArch64] Handle gold linker veneers.Jun 18 2022, 11:43 AM
yota9 updated this revision to Diff 439464.Jun 23 2022, 10:56 AM

Run veneer ellimination pass before instrumentation.
We would like to remove veneers asap and before instrumentation pass, since we want to record call to the real function, not it's veneer.
@maksfb @rafauler gentle ping

yota9 updated this revision to Diff 439466.Jun 23 2022, 10:58 AM

dbgs->outs in veneer pass

maksfb added inline comments.Jun 23 2022, 11:07 AM
bolt/test/AArch64/veneer.s
9

We don't enforce double dash long options (at least not yet), but encourage.

13

Why is it .align 2? I though AArch64 required 4 bytes.

32

Will be also good to check the input, to make sure the linker respects --no-relax. Alternatively, print the function in BOLT before and after the veneer elimination pass.

yota9 marked 3 inline comments as done.Jun 23 2022, 11:13 AM
yota9 added inline comments.
bolt/test/AArch64/veneer.s
13

It's 1 << 2 = 4 bytes

32

NP :)

maksfb added inline comments.Jun 23 2022, 11:18 AM
bolt/test/AArch64/veneer.s
13

Are you sure? From https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_7.html:

The way the required alignment is specified varies from system to system. For the a29k, hppa, m68k, m88k, w65, sparc, and Hitachi SH, and i386 using ELF format, the first expression is the alignment request in bytes. For example `.align 8' advances the location counter until it is a multiple of 8. If the location counter is already a multiple of 8, no change is needed.

For other systems, including the i386 using a.out format, it is the number of low-order zero bits the location counter must have after advancement. For example `.align 3' advances the location counter until it a multiple of 8. If the location counter is already a multiple of 8, no change is needed.

I suggest using .p2align to avoid the confusion.

yota9 marked 3 inline comments as done.Jun 23 2022, 11:27 AM
yota9 added inline comments.
bolt/test/AArch64/veneer.s
13

In your example .align 3 -> until it a multiple of 8 , 1<<3 = 8
From arm docs: https://developer.arm.com/documentation/dui0473/m/directives-reference/align

is a numeric expression evaluating to any power of 2 from 2^0 to 2^31

Also I've checked it by my self now :) But we can use .p2align 2 or .balign 4 indeed, no problem :)

yota9 marked 2 inline comments as done.Jun 23 2022, 11:37 AM
yota9 added inline comments.
bolt/test/AArch64/veneer.s
13

It is i386 test, there is no reason for arm platform to make alignment non power of two :) Anyway to avoid confusion I've replaced it with .balign 4 :)

maksfb added inline comments.Jun 23 2022, 11:44 AM
bolt/test/AArch64/veneer.s
13

Thanks. On AArch64 .align 2 is converted into .p2align 2 automatically.

yota9 updated this revision to Diff 439487.Jun 23 2022, 11:50 AM
yota9 marked an inline comment as done.

Address comments

rafauler added inline comments.Jun 23 2022, 5:03 PM
bolt/lib/Core/BinaryContext.cpp
1139

It's curious to see that we arrive at disassembly stage without having mapped this address to a given function. but I guess this might happen in this scenario if there are no symbols to cover the veneer region.

This function serves a dual purpose: both matching a veneer and asserting that there is a veneer at a given address, but also building a new binary function.

I would preferrably avoid functions that have a boolean param and try to break it in two:

  1. BC::matchAArch64Veneer()
  2. BC::createAArch64VeneerFunction()

And then try to write the code in a way that 2 calls 1, and the BF::disassemble() function calls 1, and RI::disassembleFunctions() calls 2. The "tryToHandle" name prefix here is unnecessary since we can just assume that if we have "bool" as the return type, it might fail because it returns false for some instances, so the naming becomes a bit redundant.

If breaking it up in two functions is too awkward, then I think we can at least rename this to "handleAArch64Veneer", but in that case make the comment in the header file more explicit by saying that this function serves both to match a veneer and, if MatchOnly is false, to create a new function to hold the unmapped veneer instructions.

yota9 marked 2 inline comments as done.Jun 25 2022, 7:12 AM
yota9 added inline comments.
bolt/lib/Core/BinaryContext.cpp
1139

Thanks for the review rafauler! I think I would stay with the handleAArch64Veneer single function variant, since it is a bit messy to pass the instruction parameters & etc.

yota9 updated this revision to Diff 439991.Jun 25 2022, 8:27 AM
yota9 marked an inline comment as done.

Address comments

This revision is now accepted and ready to land.Jun 27 2022, 3:22 PM
This revision was automatically updated to reflect the committed changes.

Hi @yota9, I reverted this commit because it is causing BOLT to crash in one of your internal binaries, let me work a bit more on it to understand what is happening before you land it again. Thanks!

yota9 added a comment.Jun 29 2022, 1:30 AM

Hello @rafauler ! Let me know if I can help you with investigation!

The cause of the crash caused by this diff is because this diff moves processing of interprocedural references outside the disassembly loop.

In the disassembly loop, we don't call "process interprocedural references" on every function. For non simple functions, we bail out early.

With this diff, we process all interprocedural references, even the ones coming from non-simple/ignored functions. We happen to have an internal binary with a non-simple function that has a very weird interprocedural reference that, when processed, causes BOLT to crash (it's an invalid address that can't be mapped to any function).

rafauler added inline comments.Jul 6 2022, 1:20 PM
bolt/lib/Rewrite/RewriteInstance.cpp
2889

Notice this "continue" here. It will cause us to skip calling "BC->processInterproceduralReferences(Function)" for an ignored function.

yota9 added inline comments.Jul 6 2022, 1:54 PM
bolt/lib/Rewrite/RewriteInstance.cpp
2889

I'm starting to see now, thanks! Probably your CI uses processAllFunctions flag? What if I will re-open the review and will add isIgnored() check in BCs processInterproceduralReferences()?

I think that works. Feel free to reopen or create a new review, whichever is easier. CI runs:

-reorder-blocks=cache -use-gnu-stack -jump-tables=move

yota9 added a comment.Jul 8 2022, 11:27 AM

@rafauler Already :) https://reviews.llvm.org/D129260
Also could you please approve this one: https://reviews.llvm.org/D129321 - it fixes CI. Interesting, that such thing does broken it, probably IR during compilation has some extra checks and does not only rely on the symbol name, but also on function argument types..