This is an archive of the discontinued LLVM Phabricator instance.

Differential D102776

[CG][X86][NFC] Add an option to disable unconditional generation of PLT32 relocations for jmp/call
Needs Revision · Public

Authored by ebrevnov on May 19 2021, 7:37 AM.

Details

Reviewers

skan
craig.topper
reames
skatkov
MaskRay

Summary

Unconditional generation of PLT32 relocations was added as an optimization in revision da4f43a4b4987f4b207b3ecee6bf67a9f5761c81.
Here is the related commit message:

[llvm-mc] - Produce R_X86_64_PLT32 for "call/jmp foo".

For instructions like call foo and jmp foo patch changes
relocation produced from R_X86_64_PC32 to R_X86_64_PLT32.
Relocation can be used as a marker for 32-bit PC-relative branches.
Linker will reduce PLT32 relocation to PC32 if function is defined locally.

Differential revision: https://reviews.llvm.org/D43383

The scheme relies on linker support to reduce PLT32 relocations back to PC32 when a PLT entry is not needed. It is not always feasible or convenient to rely on that.
This patch introduces an option to disable the optimization when it is not wanted. The option is off by default.
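
As a rough illustration of what the option changes, here is a simplified, self-contained model of the decision made in the RawFrm case of the patch. It is not LLVM code: `BranchFixup`, `selectFixup`, and `relocTypeFor` are hypothetical names, and the fixup-to-relocation mapping is an assumption about how the x86-64 ELF object writer treats `reloc_branch_4byte_pcrel`. Only the relocation type numbers (2 and 4) are taken from the x86-64 psABI.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two fixup kinds that matter here.
enum class BranchFixup { PCRel4, BranchPCRel4 };

// ELF relocation type numbers from the x86-64 psABI.
constexpr std::uint32_t R_X86_64_PC32 = 2;
constexpr std::uint32_t R_X86_64_PLT32 = 4;

// Models the check in the RawFrm case: only 32-bit PC-relative branches in
// 64-bit mode get the dedicated branch fixup, and the new flag opts out.
constexpr BranchFixup selectFixup(bool Is64Bit, bool IsPCRel32Branch,
                                  bool DisablePLT32) {
  if (DisablePLT32)
    return BranchFixup::PCRel4;
  if (!Is64Bit || !IsPCRel32Branch)
    return BranchFixup::PCRel4;
  return BranchFixup::BranchPCRel4;
}

// Assumed object-writer mapping, meaningful for x86-64 ELF output only:
// the branch fixup becomes PLT32, the plain PC-relative fixup becomes PC32.
constexpr std::uint32_t relocTypeFor(BranchFixup Kind) {
  return Kind == BranchFixup::BranchPCRel4 ? R_X86_64_PLT32 : R_X86_64_PC32;
}
```

With the flag set, a 64-bit `call foo` falls back to the plain PC-relative fixup in this model, which is the behavior that existed before the PLT32 change.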

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ebrevnov created this revision. · May 19 2021, 7:37 AM

Herald added subscribers: pengfei, hiraditya. · May 19 2021, 7:37 AM

ebrevnov requested review of this revision. · May 19 2021, 7:37 AM

Herald added a project: Restricted Project. · May 19 2021, 7:37 AM

Herald added a subscriber: llvm-commits.

ebrevnov retitled this revision from [CG][X86][NFC] Add option to disable unconditional generation of PLT relocations for jmp/call to [CG][X86][NFC] Add option to disable unconditional generation of PLT32 relocations for jmp/call. · May 19 2021, 7:45 AM

ebrevnov edited the summary of this revision.

ebrevnov added reviewers: skan, craig.topper, reames.

ebrevnov retitled this revision from [CG][X86][NFC] Add option to disable unconditional generation of PLT32 relocations for jmp/call to [CG][X86][NFC] Add an option to disable unconditional generation of PLT32 relocations for jmp/call.

ebrevnov added a reviewer: skatkov.

craig.topper added a reviewer: MaskRay. · May 19 2021, 7:47 AM

Harbormaster completed remote builds in B105237: Diff 346456. · May 19 2021, 8:09 AM

How is it infeasible? PLT32 is moving toward the correct direction and matches most other architectures.

Branch relocation types should be different from PC-relative relocations because the former has less reliance on the semantics. PC-relative relocations can mean address taking operations and can cause issues like canonical PLT entries.

This revision now requires changes to proceed. · May 20 2021, 3:21 PM

In D102776#2772339, @MaskRay wrote:

How is it infeasible? PLT32 is moving toward the correct direction and matches most other architectures.

Branch relocation types should be different from PC-relative relocations because the former has less reliance on the semantics. PC-relative relocations can mean address taking operations and can cause issues like canonical PLT entries.

Ok, let me share more details and maybe we can come up with a better solution.

We use LLVM as a JIT (just-in-time) compiler. In this scenario the system dynamic linker is not used, and relocations are resolved at compile time, before the code gets executed. The process has three phases and resembles the classical scheme: "static" linking, remapping, and "dynamic" linking. (I put "static" and "dynamic" in quotes because with a JIT both happen during compilation of a method, while compilation itself happens at runtime.) All three are done by llvm::RuntimeDyld. JITed methods can depend on symbols defined by the VM (virtual machine). Such dependencies are ALWAYS satisfied in the "dynamic" linking phase: even though the addresses of VM-provided symbols are known before method compilation, they can't be resolved in the "static" phase, which happens before address remapping. We therefore want to avoid the unnecessary extra indirection on calls to VM-provided symbols, which is critical for performance.

Today there is no way to achieve the desired behavior. Even though such symbols are marked "dso_local", PLT32 relocations are still generated for them, and PLT stubs are created during the "static" linking phase. I wonder whether it makes any sense to generate PLT relocations for "dso_local" symbols in the first place.

I would expect to get the desired result with the "-fdirect-access-external-data -f[no-]pie" options, but I still see PLT32 relocations generated (https://godbolt.org/z/z3TEsrrMj). Is this expected, by the way?

Regarding "PC-relative relocations can mean address taking operations and can cause issues like canonical PLT entries": this is not an issue for us because 1) there is no way to take the address of a method in Java, and 2) if there were a way, we would have canonical PLT entries anyway. In the large code model on x86_64, all calls are made through a register, and R_X86_64_64 relocations are generated where the register is loaded with the function address.

In D102776#2778922, @ebrevnov wrote:

Ok, let me share more details and maybe we can come up with a better solution.

We use LLVM as a JIT (just-in-time) compiler. In this scenario the system dynamic linker is not used, and relocations are resolved at compile time, before the code gets executed. The process has three phases and resembles the classical scheme: "static" linking, remapping, and "dynamic" linking. (I put "static" and "dynamic" in quotes because with a JIT both happen during compilation of a method, while compilation itself happens at runtime.) All three are done by llvm::RuntimeDyld. JITed methods can depend on symbols defined by the VM (virtual machine). Such dependencies are ALWAYS satisfied in the "dynamic" linking phase: even though the addresses of VM-provided symbols are known before method compilation, they can't be resolved in the "static" phase, which happens before address remapping. We therefore want to avoid the unnecessary extra indirection on calls to VM-provided symbols, which is critical for performance.

You can add R_X86_64_PLT32 support to your JIT compiler. You can handle R_X86_64_PLT32 the same way as R_X86_64_PC32 for such -fno-pie -no-pie usage. https://git.kernel.org/linus/b21ebf2fb4cde1618915a97cc773e287ff49173e

ExecutionEngine seems to support R_X86_64_PLT32.
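
The suggestion above can be sketched as follows: a loader that already computes S + A - P for R_X86_64_PC32 can send R_X86_64_PLT32 through the same path when every branch target is resolved directly. This is a minimal sketch under stated assumptions, not RuntimeDyld's implementation. `applyBranchReloc` and its parameters are hypothetical, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// ELF relocation type numbers from the x86-64 psABI.
constexpr std::uint32_t R_X86_64_PC32 = 2;
constexpr std::uint32_t R_X86_64_PLT32 = 4;

// Hypothetical helper. Patches the 4-byte displacement stored at Field.
//   P: run-time address of the patched field
//   S: run-time address of the branch target
//   A: addend from the relocation record (typically -4 for call/jmp)
// Returns false for an unsupported type or an out-of-range displacement.
bool applyBranchReloc(std::uint32_t Type, std::uint8_t *Field,
                      std::uint64_t P, std::uint64_t S, std::int64_t A) {
  // PLT32 is deliberately handled exactly like PC32: no stub is created.
  if (Type != R_X86_64_PC32 && Type != R_X86_64_PLT32)
    return false;

  // Wrap-around unsigned arithmetic, then reinterpret as a signed distance.
  const std::int64_t Value =
      static_cast<std::int64_t>(S + static_cast<std::uint64_t>(A) - P);
  if (Value < std::numeric_limits<std::int32_t>::min() ||
      Value > std::numeric_limits<std::int32_t>::max())
    return false; // target is not reachable with a rel32 branch

  const std::int32_t Disp = static_cast<std::int32_t>(Value);
  std::memcpy(Field, &Disp, sizeof(Disp)); // little-endian host assumed
  return true;
}
```

The range check is where a real loader would have to fall back to a stub or a different code model; this sketch only reports the failure.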

Today there is no way to achieve the desired behavior. Even though such symbols are marked "dso_local", PLT32 relocations are still generated for them, and PLT stubs are created during the "static" linking phase. I wonder whether it makes any sense to generate PLT relocations for "dso_local" symbols in the first place.

Please note that a PLT-generating relocation does not mean a PLT will be created.
The name is probably not great. On other architectures the relocation names may just be "*CALL*" or "*JUMP*".
The linker can optimize R_X86_64_PLT32 to R_X86_64_PC32.
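
To make the point above concrete, here is a small sketch of the kind of decision a linker makes when it sees a branch relocation. It is a simplified model, not the logic of any particular linker: `Symbol`, `needsPlt`, and `branchDestination` are hypothetical names, and real linkers consider more cases (for example ifunc symbols and undefined weak symbols).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of a symbol as seen by the linker.
struct Symbol {
  std::uint64_t Address; // final address when defined in the output, else 0
  bool DefinedLocally;   // defined in the module being linked
  bool Preemptible;      // may be overridden by another module at run time
};

// A PLT entry is only required when the final target is not known at link
// time, either because the symbol lives elsewhere or may be preempted.
constexpr bool needsPlt(const Symbol &Sym) {
  return !Sym.DefinedLocally || Sym.Preemptible;
}

// Destination used when resolving a PLT32-style branch relocation: the
// symbol itself when no PLT entry is needed, otherwise its PLT entry.
constexpr std::uint64_t branchDestination(const Symbol &Sym,
                                          std::uint64_t PltEntry) {
  return needsPlt(Sym) ? PltEntry : Sym.Address;
}
```

In this model a locally defined, non-preemptible symbol resolves to its own address, so the relocation behaves like R_X86_64_PC32 and no PLT entry is involved.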

I would expect to get the desired result with the "-fdirect-access-external-data -f[no-]pie" options, but I still see PLT32 relocations generated (https://godbolt.org/z/z3TEsrrMj). Is this expected, by the way?

It is expected. Branches should use branch relocation types, not PC-relative relocation types.

Regarding "PC-relative relocations can mean address taking operations and can cause issues like canonical PLT entries": this is not an issue for us because 1) there is no way to take the address of a method in Java, and 2) if there were a way, we would have canonical PLT entries anyway. In the large code model on x86_64, all calls are made through a register, and R_X86_64_64 relocations are generated where the register is loaded with the function address.

Revision Contents

Path: llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
Size: 14 lines

Diff 346456

llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp

[18 lines not shown]
 #include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCFixup.h"
 #include "llvm/MC/MCInst.h"
 #include "llvm/MC/MCInstrDesc.h"
 #include "llvm/MC/MCInstrInfo.h"
 #include "llvm/MC/MCRegisterInfo.h"
 #include "llvm/MC/MCSubtargetInfo.h"
 #include "llvm/MC/MCSymbol.h"
+#include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/raw_ostream.h"
 #include <cassert>
 #include <cstdint>
 #include <cstdlib>

 using namespace llvm;

 #define DEBUG_TYPE "mccodeemitter"

+cl::opt<bool> DontGenPLTForDSOLocalCalls(
+    "dont-gen-plt-for-dsolocal-calls", cl::Hidden, cl::init(false),
+    cl::desc("Disables generation of PLT relocations for DSO local symbols"));
+
 namespace {

 class X86MCCodeEmitter : public MCCodeEmitter {
   const MCInstrInfo &MCII;
   MCContext &Ctx;

 public:
   X86MCCodeEmitter(const MCInstrInfo &mcii, MCContext &ctx)
[1,414 lines not shown]
   case X86II::AddCCFrm: {
     // This will be added to the opcode in the fallthrough.
     OpcodeOffset = MI.getOperand(NumOps - 1).getImm();
     assert(OpcodeOffset < 16 && "Unexpected opcode offset!");
     --NumOps; // Drop the operand from the end.
     LLVM_FALLTHROUGH;
   case X86II::RawFrm:
     emitByte(BaseOpcode + OpcodeOffset, OS);

+    // Check if unconditional generation of PLT32 relocations for
+    // calls/jumps has been explicitly disabled. This may be useful for cases
+    // when nonstandard linker is used. For example, linking of JITed code is
+    // done in a special way and symbols exported by VM get linked only after
+    // PLT table has already been generated and it's too late to reduce PLT32
+    // relocations.
+    if (DontGenPLTForDSOLocalCalls)
+      break;
+
     if (!STI.hasFeature(X86::Mode64Bit) || !isPCRel32Branch(MI, MCII))
       break;

     const MCOperand &Op = MI.getOperand(CurOp++);
     emitImmediate(Op, MI.getLoc(), X86II::getSizeOfImm(TSFlags),
                   MCFixupKind(X86::reloc_branch_4byte_pcrel), StartByte, OS,
                   Fixups);
     break;
[366 lines not shown]