
Details

Reviewers

compnerd
efriedma
t.p.northover
peter.smith
aemerson
rengolin
rnk

Commits

rGcc24096d4df7: [AArch64] Implement native TLS for Windows
rL327220: [AArch64] Implement native TLS for Windows

Event Timeline

mstorsjo created this revision. (Mar 1 2018, 1:58 PM)

Herald added subscribers: kristof.beyls, javed.absar, rengolin. (Mar 1 2018, 1:58 PM)

I don't feel like I know enough AArch64 asm to review this.

I don't know enough about COFF TLS to tell why you need the special treatment, but the AArch64 asm looks correct (though very inefficient, but you know that :)).

In D43971#1030932, @rengolin wrote:

I don't know enough about COFF TLS to tell why you need the special treatment, but the AArch64 asm looks correct (though very inefficient, but you know that :)).

Thanks for taking a look in any case!

Let's see if I can make more progress on this by turning it into a few concrete questions:

First: There's currently the pseudoinstruction LOADgot, which lowers into an "adrp + ldr" instruction pair, but the ldr always loads a full X register. Here I need the same thing, but loading a W register instead, so that only 32 bits are read. I tried to do this by adding a separate LOADgot32 pseudoinstruction, but I wasn't able to make it return a 32-bit value at the SelectionDAG level.

Secondly: When I do a series of SHL + ZERO_EXTEND + Load, I would want it lowered into ldr x0, [x0, w1, uxtw #3], but it ends up as lsl w1, w1, #3; ldr x0, [x0, w1, uxtw], where the separate lsl feels pointless. I remember doing similar things elsewhere, where the shift was folded properly into the ldr; what's missing here?

Who's familiar with the tablegen part of the AArch64 backend and/or SelectionDAG and can help out with this?

In D43971#1031818, @mstorsjo wrote:

First: There's currently the pseudoinstruction LOADgot, which lowers into an "adrp + ldr" instruction pair, but the ldr always loads a full X register. Here I need the same thing, but loading a W register instead, so that only 32 bits are read. I tried to do this by adding a separate LOADgot32 pseudoinstruction, but I wasn't able to make it return a 32-bit value at the SelectionDAG level.

Right, I see. This seems to be a new problem. When I try it on current trunk I do hit an unreachable, but when I try older versions (e.g. 3.8) I get much simpler code.

@tlsVar16 = thread_local global i16 0
@tlsVar32 = thread_local global i32 0
@tlsVar64 = thread_local global i64 0

define i64 @getVar() {
  %1 = load i16, i16* @tlsVar16
  %2 = load i32, i32* @tlsVar32
  %3 = load i64, i64* @tlsVar64

  %4 = zext i16 %1 to i32
  %5 = add i32 %2, %4

  %6 = zext i32 %5 to i64
  %7 = add i64 %3, %6
  ret i64 %7
}

then:

$ llc -mtriple aarch64-windows tls-32b.ll -o -

I get:

mrs  x8, TPIDR_EL0
add  x9, x8, :tprel_hi12:tlsVar16
add  x10, x8, :tprel_hi12:tlsVar32
add  x8, x8, :tprel_hi12:tlsVar64
add  x9, x9, :tprel_lo12_nc:tlsVar16
add  x10, x10, :tprel_lo12_nc:tlsVar32
add  x8, x8, :tprel_lo12_nc:tlsVar64
ldr  w10, [x10]
ldrh w9, [x9]
ldr  x8, [x8]
add  w9, w10, w9
add  x0, x8, w9, uxtw
ret

This does the correct loads for the 16-, 32- and 64-bit sizes (ldrh, ldr w, ldr x).

Maybe the functionality was lost for not being tested properly on Windows?
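For readers more used to C++ than LLVM IR, the IR test above corresponds roughly to the following source. This is a hand-written sketch, not something posted in the review, and clang's actual IR for it may differ in small ways; the `uint32_t` temporary mirrors the IR's 32-bit `add`, which wraps modulo 2^32 before the final zero-extension to 64 bits.

```cpp
#include <cstdint>

// Hypothetical C++ counterpart of the IR test: one thread-local variable
// each of 16, 32 and 64 bits.
thread_local std::uint16_t tlsVar16 = 0;
thread_local std::uint32_t tlsVar32 = 0;
thread_local std::uint64_t tlsVar64 = 0;

std::uint64_t getVar() {
  // %4 = zext i16 to i32, %5 = add i32: the sum wraps modulo 2^32.
  std::uint32_t sum32 = tlsVar32 + static_cast<std::uint32_t>(tlsVar16);
  // %6 = zext i32 to i64, %7 = add i64.
  return tlsVar64 + static_cast<std::uint64_t>(sum32);
}
```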

Secondly: When I do a series of SHL + ZERO_EXTEND + Load, I would want it lowered into ldr x0, [x0, w1, uxtw #3], but it ends up as lsl w1, w1, #3; ldr x0, [x0, w1, uxtw], where the separate lsl feels pointless. I remember doing similar things elsewhere, where the shift was folded properly into the ldr; what's missing here?

Who's familiar with the tablegen part of the AArch64 backend and/or SelectionDAG and can help out with this?

AFAICR, this is supposed to be done by the DAGCombine. However, it depends on when the Pseudos get expanded (I forget). If it's too late, then they won't get merged.

In D43971#1032589, @rengolin wrote:

In D43971#1031818, @mstorsjo wrote:

First: There's currently the pseudoinstruction LOADgot, which lowers into an "adrp + ldr" instruction pair, but the ldr always loads a full X register. Here I need the same thing, but loading a W register instead, so that only 32 bits are read. I tried to do this by adding a separate LOADgot32 pseudoinstruction, but I wasn't able to make it return a 32-bit value at the SelectionDAG level.

Maybe the functionality was lost for not being tested properly on Windows?

No, that's probably because aarch64-windows didn't exist at all back then (there isn't even any publicly available hardware yet), so this just did something else (it looks like ELF relocations). And the issue isn't with reading TLS variables of different sizes, but with one small piece of implementing it properly: reading the global _tls_index variable, which is a 32-bit value.
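To make the width issue concrete: an X-register load from the address of a 32-bit variable such as `_tls_index` also reads the 4 bytes that follow it. The sketch below models this with two adjacent 32-bit words; the array name and the neighbouring value are made up, and on a little-endian target such as AArch64 Windows the neighbour ends up in the upper half of the 64-bit result.

```cpp
#include <cstdint>
#include <cstring>

// Two adjacent 32-bit words: the first stands in for _tls_index, the second
// for whatever unrelated data happens to follow it (a made-up value).
const std::uint32_t adjacentWords[2] = {3u, 0xDEADBEEFu};

// Models "ldr wN, [...]": reads exactly 4 bytes.
std::uint32_t load32(const void *p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Models "ldr xN, [...]": reads 8 bytes, i.e. the variable plus its neighbour.
std::uint64_t load64(const void *p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
```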

To reproduce the issue I'm facing, try adding the following to AArch64InstrInfo.td (next to the existing, similar pattern for LOADgot):

def LOADgot32 : Pseudo<(outs GPR32:$dst), (ins i64imm:$addr),
                     [(set GPR32:$dst, (AArch64LOADgot tglobaladdr:$addr))]>,
              Sched<[WriteLDAdr]>;

When building with this pattern in place, tablegen fails like this:

FAILED: cd /home/martin/code/llvm/build/lib/Target/AArch64 && /home/martin/code/llvm/build/bin/llvm-tblgen -gen-instr-info -I /home/martin/code/llvm/lib/Target/AArch64 -I /home/martin/code/llvm/include -I /home/martin/code/llvm/lib/Target /home/martin/code/llvm/lib/Target/AArch64/AArch64.td -o /home/martin/code/llvm/build/lib/Target/AArch64/AArch64GenInstrInfo.inc.tmp
Type set is empty for each HW mode:
possible type contradiction in the pattern below (use -print-records with llvm-tblgen to see all expanded records).
LOADgot32:      (LOADgot32:{ *:[i32] } (tglobaladdr:{ *:[] }):$addr)
UNREACHABLE executed at ../utils/TableGen/CodeGenDAGPatterns.cpp:817!

I guess I could try to see what dag nodes this produces if I do such a load of a 32 bit global variable from normal IR.

In D43971#1032615, @mstorsjo wrote:

No, that's probably because aarch64-windows didn't exist at all back then (there isn't even any publicly available hardware yet), so this just did something else (it looks like ELF relocations).

D'oh, of course! :)

I guess I could try to see what dag nodes this produces if I do such a load of a 32 bit global variable from normal IR.

That's what I'd recommend, yes.

Managed to implement it without adding any new pseudo instructions. There's no significant effect on the generated code (the two missing combines are still missing), but the change is much less messy, and in far fewer places, than before.

While still suboptimal, I wouldn't be as embarrassed to commit this as-is now that the rest of the instruction selection mess has been cleaned up.

Right, this makes a lot more sense! :)

I'm assuming only 32-bit values are supported? If not, would it be too hard to add different sizes now?

Managed to merge the lsl into the ldr addressing by swapping the order of ZERO_EXTEND and SHL.
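Why the order matters: the extended-register addressing form `ldr x0, [x0, w1, uxtw #3]` zero-extends `w1` to 64 bits and then shifts, while `zext(shl i32 x, 3)` shifts within 32 bits and drops any bits shifted out past bit 31. The two only agree when the top three bits of the index are clear, which is plausibly why the original order could not be folded without extra knowledge about the value. A plain C++ sketch of both orders (not LLVM code):

```cpp
#include <cstdint>

// zext(shl i32 x, 3): shift in 32 bits first, then zero-extend.
// Any bits shifted past bit 31 are lost.
std::uint64_t shlThenZext(std::uint32_t x) {
  std::uint32_t shifted = x << 3;  // unsigned arithmetic, wraps modulo 2^32
  return static_cast<std::uint64_t>(shifted);
}

// shl(zext x to i64, 3): zero-extend first, then shift in 64 bits.
// This is what the "uxtw #3" addressing mode computes.
std::uint64_t zextThenShl(std::uint32_t x) {
  return static_cast<std::uint64_t>(x) << 3;
}
```

`_tls_index` is a small module index in practice, so zero-extending before shifting yields the same slot offset while matching the addressing mode directly.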

In D43971#1033473, @rengolin wrote:

Right, this makes a lot more sense! :)

I'm assuming only 32-bit values are supported? If not, would it be too hard to add different sizes now?

No, it supports any size. This added code just calculates the address of the TLS variable. (The fiddling with a 32-bit load was only for the _tls_index ABI variable, not for the user-defined TLS variable itself.)

mstorsjo edited the summary of this revision. (Mar 10 2018, 1:55 AM)

Added testcases for getting a plain pointer, for storing to the tls var, and for other sizes of tls vars.

Right, now it's clear what it's doing, thanks! I'm happy with it, LGTM.

This revision is now accepted and ready to land. (Mar 10 2018, 5:50 AM)

Closed by commit rL327220: [AArch64] Implement native TLS for Windows (authored by mstorsjo). (Mar 10 2018, 11:09 AM)

This revision was automatically updated to reflect the committed changes.

Diff 136600

lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp

@@ ... @@ MachineInstrBuilder MIB1 =
             .add(MI.getOperand(1))
             .add(MI.getOperand(2))
             .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
     transferImpOps(MI, MIB1, MIB1);
     MI.eraseFromParent();
     return true;
   }
 
+  case AArch64::LOADgot32:
   case AArch64::LOADgot: {
     // Expand into ADRP + LDR.
     unsigned DstReg = MI.getOperand(0).getReg();
     const MachineOperand &MO1 = MI.getOperand(1);
     unsigned Flags = MO1.getTargetFlags();
     MachineInstrBuilder MIB1 =
         BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(AArch64::ADRP), DstReg);
     MachineInstrBuilder MIB2 =
-        BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(AArch64::LDRXui))
-            .add(MI.getOperand(0))
-            .addReg(DstReg);
+        BuildMI(MBB, MBBI, MI.getDebugLoc(),
+                TII->get(Opcode == AArch64::LOADgot32 ? AArch64::LDRWui
+                                                      : AArch64::LDRXui));
+    if (Opcode == AArch64::LOADgot32)
+      MIB2 = MIB2.addReg(getWRegFromXReg(DstReg));
+    else
+      MIB2 = MIB2.add(MI.getOperand(0));
+    MIB2 = MIB2.addReg(DstReg);
 
     if (MO1.isGlobal()) {
       MIB1.addGlobalAddress(MO1.getGlobal(), 0, Flags | AArch64II::MO_PAGE);
       MIB2.addGlobalAddress(MO1.getGlobal(), 0,
                             Flags | AArch64II::MO_PAGEOFF | AArch64II::MO_NC);
     } else if (MO1.isSymbol()) {
       MIB1.addExternalSymbol(MO1.getSymbolName(), Flags | AArch64II::MO_PAGE);
       MIB2.addExternalSymbol(MO1.getSymbolName(),
@@ ... @@ if (MF->getTarget().getTargetTriple().isOSFuchsia() &&
         MF->getTarget().getCodeModel() == CodeModel::Kernel)
       SysReg = AArch64SysReg::TPIDR_EL1;
     BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(AArch64::MRS), DstReg)
         .addImm(SysReg);
     MI.eraseFromParent();
     return true;
   }
 
+  case AArch64::ADDsecrel: {
+    // Expand into ADD + ADD.
+    const MachineOperand &GVOperand = MI.getOperand(2);
+    assert(GVOperand.isGlobal() && "ADDsecrel needs a global");
+    const GlobalValue *GV = GVOperand.getGlobal();
+    MachineOperand lo12 = MachineOperand::CreateGA(
+        GV, 0, AArch64II::MO_TLS | AArch64II::MO_PAGEOFF);
+    MachineOperand hi12 =
+        MachineOperand::CreateGA(GV, 0, AArch64II::MO_TLS | AArch64II::MO_PAGE);
+    MachineInstrBuilder MIB1 =
+        BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(AArch64::ADDXri))
+            .add(MI.getOperand(0))
+            .add(MI.getOperand(1))
+            .add(hi12)
+            .addImm(0);
+
+    MachineInstrBuilder MIB2 =
+        BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(AArch64::ADDXri))
+            .add(MI.getOperand(0))
+            .add(MI.getOperand(1))
+            .add(lo12)
+            .addImm(0);
+
+    transferImpOps(MI, MIB1, MIB2);
+    MI.eraseFromParent();
+    return true;
+  }
+
   case AArch64::MOVi32imm:
     return expandMOVImm(MBB, MBBI, 32);
   case AArch64::MOVi64imm:
     return expandMOVImm(MBB, MBBI, 64);
   case AArch64::RET_ReallyLR: {
     // Hiding the LR use with RET_ReallyLR may lead to extra kills in the
     // function and missing live-ins. We are fine in practice because callee
     // saved register handling ensures the register value is restored before

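The ADDsecrel expansion above materialises the variable's offset from the start of the .tls section with two ADD-immediate instructions. A sketch of that arithmetic, assuming (as with the analogous ELF :tprel_hi12: form) that the :secrel_hi12: immediate is applied shifted left by 12: the first ADD carries bits [23:12] of the offset and the second bits [11:0], so the pair can only reach offsets below 16 MiB. The helper names are hypothetical.

```cpp
#include <cstdint>

// Bits [23:12] of a section-relative offset (what :secrel_hi12: carries).
std::uint32_t secrelHi12(std::uint32_t offset) { return (offset >> 12) & 0xFFF; }

// Bits [11:0] of the offset (what :secrel_lo12: carries).
std::uint32_t secrelLo12(std::uint32_t offset) { return offset & 0xFFF; }

// Models the intended pair: add xD, xS, #hi12, lsl #12 ; add xD, xD, #lo12.
// Only offsets below 2^24 (16 MiB) survive the round trip.
std::uint64_t applySecrelAdds(std::uint64_t base, std::uint32_t offset) {
  std::uint64_t afterHi =
      base + (static_cast<std::uint64_t>(secrelHi12(offset)) << 12);
  return afterHi + secrelLo12(offset);
}
```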
lib/Target/AArch64/AArch64ISelLowering.h

@@ ... @@ enum NodeType : unsigned {
 
   // Produces the full sequence of instructions for getting the thread pointer
   // offset of a variable into X0, using the TLSDesc model.
   TLSDESC_CALLSEQ,
   ADRP,     // Page address of a TargetGlobalAddress operand.
   ADDlow,   // Add the low 12 bits of a TargetGlobalAddress operand.
   LOADgot,  // Load from automatically generated descriptor (e.g. Global
             // Offset Table, TLS record).
+  LOADgot32,// Load 32 bits from automatically generated descriptor
   RET_FLAG, // Return with a flag operand. Operand 0 is the chain operand.
   BRCOND,   // Conditional branch instruction; "b.cond".
   CSEL,
   FCSEL, // Conditional move instruction.
   CSINV, // Conditional select invert.
   CSNEG, // Conditional select negate.
   CSINC, // Conditional select increment.
+  ADDsecrel, // Add the section relative offset of a global variable
 
   // Pointer to the thread's local storage area. Materialised from TPIDR_EL0 on
   // ELF.
   THREAD_POINTER,
   ADC,
   SBC, // adc, sbc instructions
 
   // Arithmetic instructions which write flags.
@@ ... @@ private:
   SDValue getAddr(NodeTy *N, SelectionDAG &DAG, unsigned Flags = 0) const;
   SDValue LowerADDROFRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerDarwinGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerELFGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerELFTLSDescCallSeq(SDValue SymAddr, const SDLoc &DL,
                                  SelectionDAG &DAG) const;
+  SDValue LowerWindowsGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSETCC(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSELECT(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSELECT_CC(ISD::CondCode CC, SDValue LHS, SDValue RHS,
                          SDValue TVal, SDValue FVal, const SDLoc &dl,
                          SelectionDAG &DAG) const;
   SDValue LowerJumpTable(SDValue Op, SelectionDAG &DAG) const;

lib/Target/AArch64/AArch64ISelLowering.cpp

@@ ... @@
 }
 
 const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
   switch ((AArch64ISD::NodeType)Opcode) {
   case AArch64ISD::FIRST_NUMBER: break;
   case AArch64ISD::CALL: return "AArch64ISD::CALL";
   case AArch64ISD::ADRP: return "AArch64ISD::ADRP";
   case AArch64ISD::ADDlow: return "AArch64ISD::ADDlow";
+  case AArch64ISD::ADDsecrel: return "AArch64ISD::ADDsecrel";
   case AArch64ISD::LOADgot: return "AArch64ISD::LOADgot";
+  case AArch64ISD::LOADgot32: return "AArch64ISD::LOADgot32";
   case AArch64ISD::RET_FLAG: return "AArch64ISD::RET_FLAG";
   case AArch64ISD::BRCOND: return "AArch64ISD::BRCOND";
   case AArch64ISD::CSEL: return "AArch64ISD::CSEL";
   case AArch64ISD::FCSEL: return "AArch64ISD::FCSEL";
   case AArch64ISD::CSINV: return "AArch64ISD::CSINV";
   case AArch64ISD::CSNEG: return "AArch64ISD::CSNEG";
   case AArch64ISD::CSINC: return "AArch64ISD::CSINC";
   case AArch64ISD::THREAD_POINTER: return "AArch64ISD::THREAD_POINTER";
@@ ... @@
     // Finally we can make a call to calculate the offset from tpidr_el0.
     TPOff = LowerELFTLSDescCallSeq(SymAddr, DL, DAG);
   } else
     llvm_unreachable("Unsupported ELF TLS access model");
 
   return DAG.getNode(ISD::ADD, DL, PtrVT, ThreadBase, TPOff);
 }
 
+SDValue
+AArch64TargetLowering::LowerWindowsGlobalTLSAddress(SDValue Op,
+                                                    SelectionDAG &DAG) const {
+  assert(Subtarget->isTargetWindows() && "Windows specific TLS lowering");
+
+  SDValue Chain = DAG.getEntryNode();
+  EVT PtrVT = getPointerTy(DAG.getDataLayout());
+  SDLoc DL(Op);
+
+  SDValue TEB = DAG.getRegister(AArch64::X18, MVT::i64);
+
+  // Load the ThreadLocalStoragePointer from the TEB
+  // A pointer to the TLS array is located at offset 0x58 from the TEB.
+  SDValue TLSArray =
+      DAG.getNode(ISD::ADD, DL, PtrVT, TEB, DAG.getIntPtrConstant(0x58, DL));
+  TLSArray = DAG.getLoad(PtrVT, DL, Chain, TLSArray, MachinePointerInfo());
+  Chain = TLSArray.getValue(1);
+
+  // Load the TLS index from the C runtime
+  SDValue TLSIndex = DAG.getTargetExternalSymbol("_tls_index", PtrVT, 0);
+  TLSIndex = DAG.getNode(AArch64ISD::LOADgot32, DL, PtrVT, TLSIndex);
+  // LOADgot32 only loads 32 bits, but pretends to return an i64 to make
+  // tablegen not fail. Truncate it to i32 as it should be returned.
+  TLSIndex = DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, TLSIndex);
+
+  // The pointer to the thread's TLS data area is at the TLS Index scaled by 8
+  // offset into the TLSArray.
+  SDValue Slot = DAG.getNode(ISD::SHL, DL, MVT::i32, TLSIndex,
+                             DAG.getConstant(3, DL, MVT::i32));
+  Slot = DAG.getNode(ISD::ZERO_EXTEND, DL, PtrVT, Slot);
+  SDValue TLS = DAG.getLoad(PtrVT, DL, Chain,
+                            DAG.getNode(ISD::ADD, DL, PtrVT, TLSArray, Slot),
+                            MachinePointerInfo());
+  Chain = TLS.getValue(1);
+
+  const auto *GA = cast<GlobalAddressSDNode>(Op);
+  SDValue TGA = DAG.getTargetGlobalAddress(
+      GA->getGlobal(), DL, GA->getValueType(0), GA->getOffset(), 0);
+  // Add the offset from the start of the .tls section (section base).
+  return DAG.getNode(AArch64ISD::ADDsecrel, DL, PtrVT, TLS, TGA);
+}
+
 SDValue AArch64TargetLowering::LowerGlobalTLSAddress(SDValue Op,
                                                      SelectionDAG &DAG) const {
   const GlobalAddressSDNode *GA = cast<GlobalAddressSDNode>(Op);
   if (DAG.getTarget().useEmulatedTLS())
     return LowerToTLSEmulatedModel(GA, DAG);
 
   if (Subtarget->isTargetDarwin())
     return LowerDarwinGlobalTLSAddress(Op, DAG);
   if (Subtarget->isTargetELF())
     return LowerELFGlobalTLSAddress(Op, DAG);
+  if (Subtarget->isTargetWindows())
+    return LowerWindowsGlobalTLSAddress(Op, DAG);
 
   llvm_unreachable("Unexpected platform trying to use TLS");
 }
 
 SDValue AArch64TargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {
   SDValue Chain = Op.getOperand(0);
   ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();
   SDValue LHS = Op.getOperand(2);

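The lowering in `LowerWindowsGlobalTLSAddress` is a fixed chain of two loads plus an add. The sketch below models that chain in plain C++ over simulated memory, for a 64-bit host; the function and parameter names are hypothetical, and only the 0x58 offset, the role of x18 as the TEB pointer and the 8-byte slot scaling come from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Offset of ThreadLocalStoragePointer within the TEB, as used in the patch.
constexpr std::size_t kTlsArrayOffset = 0x58;

// Hypothetical model of the address computed by LowerWindowsGlobalTLSAddress.
//   teb          - stands in for the TEB that x18 points to
//   tlsIndex     - the value of the 32-bit _tls_index variable
//   secrelOffset - the variable's offset from the start of the .tls section
unsigned char *windowsTlsAddress(const unsigned char *teb,
                                 std::uint32_t tlsIndex,
                                 std::uint64_t secrelOffset) {
  // Load the ThreadLocalStoragePointer stored at teb + 0x58.
  unsigned char **tlsArray = nullptr;
  std::memcpy(&tlsArray, teb + kTlsArrayOffset, sizeof tlsArray);
  // Index the array; each slot is one pointer (8 bytes on a 64-bit host),
  // which is the "scaled by 8" step in the lowering.
  unsigned char *tlsBlock = tlsArray[tlsIndex];
  // Add the section-relative offset (the ADDsecrel node).
  return tlsBlock + secrelOffset;
}
```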
lib/Target/AArch64/AArch64InstrInfo.td

@@ ... @@ def SDT_AArch64WrapperLarge : SDTypeProfile<1, 4,
                                          [SDTCisVT<0, i64>, SDTCisVT<1, i32>,
                                           SDTCisSameAs<1, 2>, SDTCisSameAs<1, 3>,
                                           SDTCisSameAs<1, 4>]>;
 
 
 // Node definitions.
 def AArch64adrp          : SDNode<"AArch64ISD::ADRP", SDTIntUnaryOp, []>;
 def AArch64addlow        : SDNode<"AArch64ISD::ADDlow", SDTIntBinOp, []>;
+def AArch64addsecrel     : SDNode<"AArch64ISD::ADDsecrel", SDTIntBinOp, []>;
 def AArch64LOADgot       : SDNode<"AArch64ISD::LOADgot", SDTIntUnaryOp>;
+def AArch64LOADgot32     : SDNode<"AArch64ISD::LOADgot32", SDTIntUnaryOp>;
 def AArch64callseq_start : SDNode<"ISD::CALLSEQ_START",
                                   SDCallSeqStart<[ SDTCisVT<0, i32>,
                                                    SDTCisVT<1, i32> ]>,
                                   [SDNPHasChain, SDNPOutGlue]>;
 def AArch64callseq_end   : SDNode<"ISD::CALLSEQ_END",
                                   SDCallSeqEnd<[ SDTCisVT<0, i32>,
                                                  SDTCisVT<1, i32> ]>,
                                   [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
@@ ... @@
 // FIXME: The following pseudo instructions are only needed because remat
 // cannot handle multiple instructions. When that changes, they can be
 // removed, along with the AArch64Wrapper node.
 
 let AddedComplexity = 10 in
 def LOADgot : Pseudo<(outs GPR64:$dst), (ins i64imm:$addr),
                      [(set GPR64:$dst, (AArch64LOADgot tglobaladdr:$addr))]>,
               Sched<[WriteLDAdr]>;
+def LOADgot32 : Pseudo<(outs GPR64:$dst), (ins i64imm:$addr),
+                       [(set GPR64:$dst, (AArch64LOADgot32 tglobaladdr:$addr))]>,
+                Sched<[WriteLDAdr]>;
 
 // The MOVaddr instruction should match only when the add is not folded
 // into a load or store address.
 def MOVaddr
     : Pseudo<(outs GPR64:$dst), (ins i64imm:$hi, i64imm:$low),
              [(set GPR64:$dst, (AArch64addlow (AArch64adrp tglobaladdr:$hi),
                                               tglobaladdr:$low))]>,
       Sched<[WriteAdrAdr]>;
@@ ... @@ : Pseudo<(outs GPR64:$dst), (ins i64imm:$hi, i64imm:$low),
                                               tglobaltlsaddr:$low))]>,
       Sched<[WriteAdrAdr]>;
 def MOVaddrEXT
     : Pseudo<(outs GPR64:$dst), (ins i64imm:$hi, i64imm:$low),
              [(set GPR64:$dst, (AArch64addlow (AArch64adrp texternalsym:$hi),
                                               texternalsym:$low))]>,
       Sched<[WriteAdrAdr]>;
+
+def ADDsecrel
+    : Pseudo<(outs GPR64:$dst), (ins GPR64:$src, i64imm:$addr),
+             [(set GPR64:$dst, (AArch64addsecrel GPR64:$src, tglobaltlsaddr:$addr))]>,
+      Sched<[WriteAdrAdr]>;
 } // isReMaterializable, isCodeGenOnly
 
 def : Pat<(AArch64LOADgot tglobaltlsaddr:$addr),
           (LOADgot tglobaltlsaddr:$addr)>;
 
 def : Pat<(AArch64LOADgot texternalsym:$addr),
           (LOADgot texternalsym:$addr)>;
 
 def : Pat<(AArch64LOADgot tconstpool:$addr),
           (LOADgot tconstpool:$addr)>;
 
+def : Pat<(AArch64LOADgot32 tglobaltlsaddr:$addr),
+          (LOADgot32 tglobaltlsaddr:$addr)>;
+
+def : Pat<(AArch64LOADgot32 texternalsym:$addr),
+          (LOADgot32 texternalsym:$addr)>;
+
+def : Pat<(AArch64LOADgot32 tconstpool:$addr),
+          (LOADgot32 tconstpool:$addr)>;
+
 //===----------------------------------------------------------------------===//
 // System instructions.
 //===----------------------------------------------------------------------===//
 
 def HINT : HintI<"hint">;
 def : InstAlias<"nop",  (HINT 0b000)>;
 def : InstAlias<"yield",(HINT 0b001)>;
 def : InstAlias<"wfe",  (HINT 0b010)>;

lib/Target/AArch64/AArch64MCInstLower.cpp

@@ ... @@ MCOperand AArch64MCInstLower::lowerSymbolOperandELF(const MachineOperand &MO,
     RefKind = static_cast<AArch64MCExpr::VariantKind>(RefFlags);
   Expr = AArch64MCExpr::create(Expr, RefKind, Ctx);
 
   return MCOperand::createExpr(Expr);
 }
 
 MCOperand AArch64MCInstLower::lowerSymbolOperandCOFF(const MachineOperand &MO,
                                                      MCSymbol *Sym) const {
-  MCSymbolRefExpr::VariantKind RefKind = MCSymbolRefExpr::VK_None;
-  const MCExpr *Expr = MCSymbolRefExpr::create(Sym, RefKind, Ctx);
+  AArch64MCExpr::VariantKind RefKind = AArch64MCExpr::VK_NONE;
+  if (MO.getTargetFlags() & AArch64II::MO_TLS) {
+    if (MO.getTargetFlags() & AArch64II::MO_PAGEOFF)
+      RefKind = AArch64MCExpr::VK_SECREL_LO12;
+    else if (MO.getTargetFlags() & AArch64II::MO_PAGE)
+      RefKind = AArch64MCExpr::VK_SECREL_HI12;
+  }
+  const MCExpr *Expr =
+      MCSymbolRefExpr::create(Sym, MCSymbolRefExpr::VK_None, Ctx);
   if (!MO.isJTI() && MO.getOffset())
     Expr = MCBinaryExpr::createAdd(
         Expr, MCConstantExpr::create(MO.getOffset(), Ctx), Ctx);
+  Expr = AArch64MCExpr::create(Expr, RefKind, Ctx);
   return MCOperand::createExpr(Expr);
 }
 
 MCOperand AArch64MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
                                                  MCSymbol *Sym) const {
   if (Printer.TM.getTargetTriple().isOSDarwin())
     return lowerSymbolOperandDarwin(MO, Sym);
   if (Printer.TM.getTargetTriple().isOSBinFormatCOFF())

test/CodeGen/AArch64/win-tls.ll

This file was added.

; RUN: llc -mtriple aarch64-windows %s -o - | FileCheck %s

@tlsVar = thread_local global i32 0

define i32 @getVar() {
  %1 = load i32, i32* @tlsVar
  ret i32 %1
}

; CHECK: ldr [[TLS_POINTER:x[0-9]+]], [x18, #88]
; CHECK: adrp [[TLS_INDEX_ADDR:x[0-9]+]], _tls_index
; CHECK: ldr [[TLS_INDEX:w[0-9]+]], {{\[}}[[TLS_INDEX_ADDR]], _tls_index]
; This lsl could ideally be folded into the uxtw below, but that doesn't
; happen right now.
; CHECK: lsl [[TLS_INDEX]], [[TLS_INDEX]], #3

; CHECK: ldr [[TLS:x[0-9]+]], {{\[}}[[TLS_POINTER]], [[TLS_INDEX]], uxtw]
; CHECK: add [[TLS]], [[TLS]], :secrel_hi12:tlsVar
; This add+ldr could also be folded into a single ldr with a :secrel_lo12:
; offset.
; CHECK: add [[TLS]], [[TLS]], :secrel_lo12:tlsVar
; CHECK: ldr w0, {{\[}}[[TLS]]{{\]}}

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Implement native TLS for Windows
Closed · Public
