This is an archive of the discontinued LLVM Phabricator instance.

[CG][X86][NFC] Add an option to disable unconditional generation of PLT32 relocations for jmp/call
Needs Revision · Public

Authored by ebrevnov on May 19 2021, 7:37 AM.

Details

Summary

Unconditional generation of PLT32 relocations was added as an optimization in revision da4f43a4b4987f4b207b3ecee6bf67a9f5761c81.
Here is the related commit message:

[llvm-mc] - Produce R_X86_64_PLT32 for "call/jmp foo".

For instructions like call foo and jmp foo patch changes
relocation produced from R_X86_64_PC32 to R_X86_64_PLT32.
Relocation can be used as a marker for 32-bit PC-relative branches.
Linker will reduce PLT32 relocation to PC32 if function is defined locally.

Differential revision: https://reviews.llvm.org/D43383

The scheme relies on linker support to reduce PLT32 relocations back to PC32 when a PLT is not needed. It is not always feasible or convenient to rely on that.
This patch introduces an option to disable this optimization when it is not needed. The option is off by default.

Event Timeline

ebrevnov created this revision. May 19 2021, 7:37 AM
ebrevnov requested review of this revision. May 19 2021, 7:37 AM
Herald added a project: Restricted Project. May 19 2021, 7:37 AM
ebrevnov retitled this revision from [CG][X86][NFC] Add option to disable unconditional generation of PLT relocations for jmp/call to [CG][X86][NFC] Add option to disable unconditional generation of PLT32 relocations for jmp/call. May 19 2021, 7:45 AM
ebrevnov edited the summary of this revision.
ebrevnov added reviewers: skan, craig.topper, reames.
ebrevnov retitled this revision from [CG][X86][NFC] Add option to disable unconditional generation of PLT32 relocations for jmp/call to [CG][X86][NFC] Add an option to disable unconditional generation of PLT32 relocations for jmp/call.
ebrevnov added a reviewer: skatkov.
MaskRay requested changes to this revision. May 20 2021, 3:21 PM

How is it infeasible? PLT32 is moving toward the correct direction and matches most other architectures.

Branch relocation types should be different from PC-relative relocations because the former has less reliance on the semantics. PC-relative relocations can mean address taking operations and can cause issues like canonical PLT entries.

This revision now requires changes to proceed. May 20 2021, 3:21 PM

> How is it infeasible? PLT32 is moving toward the correct direction and matches most other architectures.
>
> Branch relocation types should be different from PC-relative relocations because the former has less reliance on the semantics. PC-relative relocations can mean address taking operations and can cause issues like canonical PLT entries.

Ok, let me share more details and maybe we can come up with a better solution.

We use LLVM as a JIT (just-in-time) compiler. In this scenario the system dynamic linker is not used, and relocations are resolved at compile time, before the code gets executed. The process has three phases and resembles the classical scheme: "static" linking, remapping, and "dynamic" linking (I put "static" and "dynamic" in quotes because with a JIT both happen during compilation of a method, and compilation itself happens at run time). All three are done by llvm::RuntimeDyld. JITed methods can depend on symbols defined by the VM (virtual machine). Such dependencies are ALWAYS satisfied in the "dynamic" linking phase: even though the addresses of VM-provided symbols are known before method compilation, they can't be resolved in the "static" phase, which happens before address remapping. Thus we want to avoid the unnecessary extra indirection on calls to VM-provided symbols, which is critical for performance.

Today there is no way to achieve the desired behavior. Even though such symbols are marked "dso_local", PLT32 relocations are still generated for them and PLT stubs are created during the "static" linking phase. Does it make sense to generate PLT relocations for "dso_local" symbols in the first place?

I would expect to get the desired result by using the "-fdirect-access-external-data -f[no-]pie" options, but I still see PLT32 relocations generated (https://godbolt.org/z/z3TEsrrMj). Is this expected, by the way?

Regarding "PC-relative relocations can mean address taking operations and can cause issues like canonical PLT entries": this is not an issue for us because 1) there is no way to take the address of a method in Java, and 2) if there were a way, we would have canonical PLT entries anyway. In the large code model on x86_64, all calls are made through a register, and R_X86_64_64 relocations are generated at the point where the register is loaded with the function address.
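
To make the large code model case concrete, here is a minimal sketch of a register-indirect call and of how an R_X86_64_64 relocation (computed as S + A) could be applied to it. The helper names `emitIndirectCall` and `applyAbs64` are hypothetical and are not LLVM APIs, and the choice of %rax is only an assumption for illustration; the register a compiler actually picks may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Relocation type number from the x86-64 psABI.
constexpr uint32_t R_X86_64_64 = 1;

// Hypothetical helper: appends `movabs $0, %rax; call *%rax` to Code and
// returns the offset of the 8-byte immediate that the relocation patches.
uint64_t emitIndirectCall(std::vector<uint8_t> &Code) {
  Code.push_back(0x48); // REX.W prefix
  Code.push_back(0xB8); // mov imm64 -> %rax
  const uint64_t ImmOffset = Code.size();
  for (int I = 0; I < 8; ++I)
    Code.push_back(0x00); // placeholder for the function address
  Code.push_back(0xFF); // call *%rax
  Code.push_back(0xD0);
  return ImmOffset;
}

// Hypothetical helper: applies R_X86_64_64, i.e. writes S + A as a
// little-endian 64-bit value at Offset.
void applyAbs64(std::vector<uint8_t> &Code, uint64_t Offset, int64_t Addend,
                uint64_t SymbolAddr) {
  assert(Offset + 8 <= Code.size() && "relocation field out of bounds");
  const uint64_t Value = SymbolAddr + static_cast<uint64_t>(Addend);
  for (unsigned I = 0; I < 8; ++I)
    Code[Offset + I] = static_cast<uint8_t>(Value >> (8 * I));
}
```

Because the full 64-bit address is materialized in a register, this form has no ±2 GiB range limit, at the cost of a longer instruction sequence than a direct `call rel32`.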

MaskRay added a comment. Edited May 27 2021, 5:29 PM

> Ok, let me share more details and maybe we can come up with a better solution.
>
> We use LLVM as a JIT (just-in-time) compiler. In this scenario the system dynamic linker is not used, and relocations are resolved at compile time, before the code gets executed. The process has three phases and resembles the classical scheme: "static" linking, remapping, and "dynamic" linking (I put "static" and "dynamic" in quotes because with a JIT both happen during compilation of a method, and compilation itself happens at run time). All three are done by llvm::RuntimeDyld. JITed methods can depend on symbols defined by the VM (virtual machine). Such dependencies are ALWAYS satisfied in the "dynamic" linking phase: even though the addresses of VM-provided symbols are known before method compilation, they can't be resolved in the "static" phase, which happens before address remapping. Thus we want to avoid the unnecessary extra indirection on calls to VM-provided symbols, which is critical for performance.

You can add R_X86_64_PLT32 support to your JIT compiler. You can handle R_X86_64_PLT32 the same way as R_X86_64_PC32 for such -fno-pie -no-pie usage. https://git.kernel.org/linus/b21ebf2fb4cde1618915a97cc773e287ff49173e

ExecutionEngine seems to support R_X86_64_PLT32.
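
As an illustration of handling both relocation types the same way, here is a minimal sketch of a resolver that computes S + A - P for either type and patches the 32-bit field of a `call`/`jmp`. The `Reloc` struct and `applyBranchReloc` function are hypothetical names made up for this example; this is not how llvm::RuntimeDyld is implemented. The sketch assumes the target is already known to be directly reachable, so no stub is ever created.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Relocation type numbers from the x86-64 psABI.
constexpr uint32_t R_X86_64_PC32 = 2;
constexpr uint32_t R_X86_64_PLT32 = 4;

// Hypothetical relocation record, not an LLVM type.
struct Reloc {
  uint32_t Type;
  uint64_t Offset; // offset of the 32-bit field within the section
  int64_t Addend;  // typically -4 for `call rel32` / `jmp rel32`
};

// Patches Section (assumed to be loaded at SectionAddr) so the branch at
// R.Offset targets SymbolAddr. Returns false if the relocation type is not
// handled or the displacement does not fit in a signed 32-bit field.
bool applyBranchReloc(std::vector<uint8_t> &Section, uint64_t SectionAddr,
                      const Reloc &R, uint64_t SymbolAddr) {
  // PLT32 is treated exactly like PC32: a direct PC-relative branch.
  if (R.Type != R_X86_64_PC32 && R.Type != R_X86_64_PLT32)
    return false;
  assert(R.Offset + 4 <= Section.size() && "relocation field out of bounds");

  // Value = S + A - P, computed modulo 2^64 and reinterpreted as signed.
  const uint64_t P = SectionAddr + R.Offset;
  const int64_t Value = static_cast<int64_t>(
      SymbolAddr + static_cast<uint64_t>(R.Addend) - P);
  if (Value < INT32_MIN || Value > INT32_MAX)
    return false; // target is out of rel32 range

  // Store the displacement as little-endian bytes.
  const uint32_t Bits = static_cast<uint32_t>(static_cast<int32_t>(Value));
  for (unsigned I = 0; I < 4; ++I)
    Section[R.Offset + I] = static_cast<uint8_t>(Bits >> (8 * I));
  return true;
}
```

For example, a `call` at address 0x1000 (opcode 0xE8, field at offset 1, addend -4) targeting 0x2000 gets the displacement 0xFFB with either relocation type. The `false` return for an out-of-range target marks the one case where a direct branch is not possible and some other mechanism would be needed.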

> Today there is no way to achieve the desired behavior. Even though such symbols are marked "dso_local", PLT32 relocations are still generated for them and PLT stubs are created during the "static" linking phase. Does it make sense to generate PLT relocations for "dso_local" symbols in the first place?

Please note that a PLT-generating relocation does not mean a PLT will be created.
The name is probably not great. On other architectures the relocation names may just be "*CALL*" or "*JUMP*".
The linker can optimize R_X86_64_PLT32 to R_X86_64_PC32.

> I would expect to get the desired result by using the "-fdirect-access-external-data -f[no-]pie" options, but I still see PLT32 relocations generated (https://godbolt.org/z/z3TEsrrMj). Is this expected, by the way?

It is expected. Branches should use branch relocation types, not PC-relative relocation types.

> Regarding "PC-relative relocations can mean address taking operations and can cause issues like canonical PLT entries": this is not an issue for us because 1) there is no way to take the address of a method in Java, and 2) if there were a way, we would have canonical PLT entries anyway. In the large code model on x86_64, all calls are made through a register, and R_X86_64_64 relocations are generated at the point where the register is loaded with the function address.