
Details

Reviewers

t.p.northover
compnerd
eli.friedman
aemerson

Commits

rG2778fd0b59b6: [AArch64] Implement stack probing for windows
rL321150: [AArch64] Implement stack probing for windows

Diff Detail

Repository: rL LLVM

Event Timeline

mstorsjo created this revision. Dec 12 2017, 1:22 PM

Herald added subscribers: kristof.beyls, javed.absar, rengolin, aemerson. Dec 12 2017, 1:22 PM

How to work around this?? The intended instruction is sub sp, sp, x15, lsl #4

I think you're using the wrong opcode? Should be SUBXrs.

compnerd added inline comments. Dec 12 2017, 1:57 PM

lib/Target/AArch64/AArch64FrameLowering.cpp
597 ↗	(On Diff #126615)	Is this something that Visual Studio supports? Or is the code model an extension, as in the ARMv7 case?
1224–1226 ↗	(On Diff #126615)	I'd just use a ternary: `unsigned BP = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : AArch64::NoRegister;`
1229 ↗	(On Diff #126615)	Why not use a range-based loop? `for (const auto &CSR : RegInfo->getCalleeSavedRegs(&MF)) if (CSR == BasePointerReg) ++SpillEstimate;`

In D41131#952864, @efriedma wrote:

How to work around this?? The intended instruction is sub sp, sp, x15, lsl #4

I think you're using the wrong opcode? Should be SUBXrs.

No, that ends up doing the wrong thing.

SP and XZR are different names for R31, and the interpretation depends on the kind of instruction. It refers to SP in add/sub with immediates or extended registers, but not with shifted registers. I therefore need to use SUBXrx; otherwise this ends up interpreted as sub xzr, xzr, x15, lsl #4, which gets simplified into neg xzr, x15, lsl #4 or so.
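The rule described above can be sketched as a small lookup. This is only an illustration: `AddSubForm` and `reg31Name` are hypothetical names, not LLVM types, and the sketch covers only the destination and first source operand of the non-flag-setting add/sub forms.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for illustration; not an LLVM type.
enum class AddSubForm { Immediate, ExtendedRegister, ShiftedRegister };

// How register number 31 reads in the destination/first-source position of a
// non-flag-setting add/sub, following the description in the comment above.
std::string reg31Name(AddSubForm Form) {
  switch (Form) {
  case AddSubForm::Immediate:
  case AddSubForm::ExtendedRegister:
    return "sp";
  case AddSubForm::ShiftedRegister:
    return "xzr";
  }
  return "?"; // Not reached for valid enum values.
}
```

Under this reading, an opcode of the extended-register form is the one that can name `sp`, which is why the shifted-register opcode produced the wrong instruction.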

mstorsjo added inline comments. Dec 12 2017, 2:19 PM

lib/Target/AArch64/AArch64FrameLowering.cpp
597 ↗	(On Diff #126615)	I modeled this on what MSVC produces. It's pretty similar to the ARMv7 case, with one difference: on ARM, on return from `__chkstk`, the register has been rescaled into bytes, so you do `sub sp, sp, r4`, while here it's kept in the same unit as on input to `__chkstk`, so you need to do `sub sp, sp, x15, lsl #4`.
1224–1226 ↗	(On Diff #126615)	Sure, I can do that.
1229 ↗	(On Diff #126615)	That doesn't work: `getCalleeSavedRegs` returns a plain pointer to a null-terminated array, so the length isn't known at compile time and it doesn't have `begin()`/`end()` functions.
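The point about the null-terminated list can be illustrated with a minimal sketch. `PhysReg` stands in for LLVM's `MCPhysReg`, and `countMatches` and the register numbers are made up for the example; only the sentinel-terminated walk is the technique being shown.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for LLVM's MCPhysReg; the register numbers used with it are made up.
using PhysReg = uint16_t;

// Count how many entries of a zero-terminated register list equal Target.
// The length is not known up front, so walk until the 0 sentinel is reached.
unsigned countMatches(const PhysReg *Regs, PhysReg Target) {
  unsigned Count = 0;
  for (unsigned I = 0; Regs[I]; ++I)
    if (Regs[I] == Target)
      ++Count;
  return Count;
}
```

A range-based `for` needs `begin()`/`end()` (or an array of known bound), which a bare pointer does not provide, so the index loop with a sentinel check is the natural form here.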

Used a ternary operator as suggested.

No, that ends up doing the wrong thing.

Oh, oops, you're right. The actual correct opcode is SUBXrx64.

In D41131#952896, @efriedma wrote:

No, that ends up doing the wrong thing.

Oh, oops, you're right. The actual correct opcode is SUBXrx64.

Thanks, that does fix it! Those opcodes aren't really easily discoverable...

Using SUBXrx64, added -verify-machineinstrs to the test.

How does this interact with https://reviews.llvm.org/D40863 ?

MatzeB added a reviewer: aemerson. Dec 12 2017, 3:31 PM

mstorsjo mentioned this in D40863: [AArch64][Darwin] Implement stack probing for static and dynamic stack objects. Dec 12 2017, 11:53 PM

In D41131#952995, @efriedma wrote:

How does this interact with https://reviews.llvm.org/D40863 ?

They're pretty similar, although I only implemented the case for static stack allocations so far, leaving dynamic allocations for later. The basis for what they do is almost identical (with the differences in the function name and calling convention), so I'm sure it should be easy to adapt one to match the pattern laid out by the other one when one of them gets committed first.

compnerd added inline comments. Dec 14 2017, 5:50 PM

lib/Target/AArch64/AArch64FrameLowering.cpp
597 ↗	(On Diff #126615)	Oh, I assumed as much. However, the behavior of the large code model is something which is a conforming extension. The generated code is slightly different to allow the values to extend past what the pattern MSVC emits would allow. I am asking if they did this differently in the ARM64 backend or if that is an extension (and thus should be documented).

mstorsjo added inline comments. Dec 14 2017, 10:16 PM

lib/Target/AArch64/AArch64FrameLowering.cpp
597 ↗	(On Diff #126615)	I didn't check that case with msvc, I'll have a look.

mstorsjo added inline comments. Dec 14 2017, 11:14 PM

lib/Target/AArch64/AArch64FrameLowering.cpp
597 ↗	(On Diff #126615)	So I presume that MSVC didn't have any command line flag to enable a large code model from before/on other archs? I don't seem to find one in MSVC for ARM64 at least. (I.e., for the armv7 case, was this a case of MSVC just not supporting such a code model, or that it does but didn't emit code like this?) In any case, I'll update this patch with similar documentation as for arm, about the fact that this is an extension.

Added documentation about the fact that the large code model operation is an extension.

Fixed the estimate of the number of registers to spill, adjusting it so that it doesn't underestimate the amount of stack used.

Please do add the docs that the large code model behavior is an extension.

lib/Target/AArch64/AArch64FrameLowering.cpp
367 ↗	(On Diff #127256)	Why not just combine these into a single `requiresStackProbe`?
498 ↗	(On Diff #127256)	Same.

This revision is now accepted and ready to land. Dec 19 2017, 2:14 PM

In D41131#960031, @compnerd wrote:

Please do add the docs that the large code model behavior is an extension.

I did add a section to docs/Extensions.rst about this.

lib/Target/AArch64/AArch64FrameLowering.cpp
367 ↗	(On Diff #127256)	I think I'd rather keep them separate like this, since they're there for quite different reasons - the 512 byte limit doesn't have anything to do with stack probing.
498 ↗	(On Diff #127256)	Not quite sure how you mean I should try to merge these

Closed by commit rL321150: [AArch64] Implement stack probing for windows (authored by mstorsjo). Dec 19 2017, 10:52 PM

This revision was automatically updated to reflect the committed changes.

mstorsjo mentioned this in D42356: [AArch64] Implement dynamic stack probing for windows. Jan 24 2018, 11:22 PM

Diff 127662

llvm/trunk/docs/Extensions.rst

	The Windows ARM Itanium ABI extends the base ABI by adding support for emitting			The Windows ARM Itanium ABI extends the base ABI by adding support for emitting
	a dynamic stack allocation. When emitting a variable stack allocation, a call			a dynamic stack allocation. When emitting a variable stack allocation, a call
	to ``__chkstk`` is emitted unconditionally to ensure that guard pages are setup			to ``__chkstk`` is emitted unconditionally to ensure that guard pages are setup
	properly. The emission of this stack probe emission is handled similar to the			properly. The emission of this stack probe emission is handled similar to the
	standard stack probe emission.			standard stack probe emission.

	The MSVC environment does not emit code for VLAs currently.			The MSVC environment does not emit code for VLAs currently.

				Windows on ARM64
				----------------

				Stack Probe Emission
				^^^^^^^^^^^^^^^^^^^^

				The reference implementation (Microsoft Visual Studio 2017) emits stack probes
				in the following fashion:

				.. code-block:: gas

				  mov x15, #constant
				  bl __chkstk
				  sub sp, sp, x15, lsl #4

				However, this has the limitation of 256 MiB (±128MiB). In order to accommodate
				larger binaries, LLVM supports the use of ``-mcode-model=large`` to allow a 8GiB
				(±4GiB) range via a slight deviation. It will generate an indirect jump as
				follows:

				.. code-block:: gas

				  mov x15, #constant
				  adrp x16, __chkstk
				  add x16, x16, :lo12:__chkstk
				  blr x16
				  sub sp, sp, x15, lsl #4
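The two ranges quoted in the documentation text above follow from the instruction encodings: `bl` carries a 26-bit signed immediate in 4-byte units, and `adrp` carries a 21-bit signed immediate in 4 KiB pages. A minimal sketch of that arithmetic, with `span`, `BLSpan` and `ADRPSpan` as illustrative names only:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t MiB = 1ull << 20;
constexpr uint64_t GiB = 1ull << 30;

// Total span covered by a signed immediate of Bits bits scaled by Scale bytes.
// Half of the span lies below the instruction and half above it.
constexpr uint64_t span(unsigned Bits, uint64_t Scale) {
  return (1ull << Bits) * Scale;
}

constexpr uint64_t BLSpan = span(26, 4);      // bl: imm26 in 4-byte units
constexpr uint64_t ADRPSpan = span(21, 4096); // adrp: imm21 in 4 KiB pages

static_assert(BLSpan == 256 * MiB, "bl covers 256 MiB (+/-128 MiB)");
static_assert(ADRPSpan == 8 * GiB, "adrp covers 8 GiB (+/-4 GiB)");
```

The following `add` with `:lo12:` supplies the low 12 bits within the page that `adrp` selected, so the pair can form any address inside the 8 GiB window.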

llvm/trunk/lib/Target/AArch64/AArch64FrameLowering.cpp

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AArch64FrameLowering.h"		#include "AArch64FrameLowering.h"
#include "AArch64InstrInfo.h"		#include "AArch64InstrInfo.h"
#include "AArch64MachineFunctionInfo.h"		#include "AArch64MachineFunctionInfo.h"
#include "AArch64RegisterInfo.h"		#include "AArch64RegisterInfo.h"
#include "AArch64Subtarget.h"		#include "AArch64Subtarget.h"
#include "AArch64TargetMachine.h"		#include "AArch64TargetMachine.h"
		#include "MCTargetDesc/AArch64AddressingModes.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/LivePhysRegs.h"		#include "llvm/CodeGen/LivePhysRegs.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
[222 lines hidden]	bool AArch64FrameLowering::canUseAsPrologue(
// Don't need a scratch register if we're not going to re-align the stack.		// Don't need a scratch register if we're not going to re-align the stack.
if (!RegInfo->needsStackRealignment(*MF))		if (!RegInfo->needsStackRealignment(*MF))
return true;		return true;
// Otherwise, we can use any block as long as it has a scratch register		// Otherwise, we can use any block as long as it has a scratch register
// available.		// available.
return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;		return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
}		}

		static bool windowsRequiresStackProbe(MachineFunction &MF,
		unsigned StackSizeInBytes) {
		const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
		if (!Subtarget.isTargetWindows())
		return false;
		const Function &F = MF.getFunction();
		// TODO: When implementing stack protectors, take that into account
		// for the probe threshold.
		unsigned StackProbeSize = 4096;
		if (F.hasFnAttribute("stack-probe-size"))
		F.getFnAttribute("stack-probe-size")
		.getValueAsString()
		.getAsInteger(0, StackProbeSize);
		return StackSizeInBytes >= StackProbeSize;
		}

bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(		bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
MachineFunction &MF, unsigned StackBumpBytes) const {		MachineFunction &MF, unsigned StackBumpBytes) const {
AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();		const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();		const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();

if (AFI->getLocalStackSize() == 0)		if (AFI->getLocalStackSize() == 0)
return false;		return false;

// 512 is the maximum immediate for stp/ldp that will be used for		// 512 is the maximum immediate for stp/ldp that will be used for
// callee-save save/restores		// callee-save save/restores
if (StackBumpBytes >= 512)		if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
return false;		return false;

if (MFI.hasVarSizedObjects())		if (MFI.hasVarSizedObjects())
return false;		return false;

if (RegInfo->needsStackRealignment(MF))		if (RegInfo->needsStackRealignment(MF))
return false;		return false;

[114 lines hidden]	void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
DebugLoc DL;		DebugLoc DL;

// All calls are tail calls in GHC calling conv, and functions have no		// All calls are tail calls in GHC calling conv, and functions have no
// prologue/epilogue.		// prologue/epilogue.
if (MF.getFunction().getCallingConv() == CallingConv::GHC)		if (MF.getFunction().getCallingConv() == CallingConv::GHC)
return;		return;

int NumBytes = (int)MFI.getStackSize();		int NumBytes = (int)MFI.getStackSize();
if (!AFI->hasStackFrame()) {		if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
assert(!HasFP && "unexpected function without stack frame but with FP");		assert(!HasFP && "unexpected function without stack frame but with FP");

// All of the stack allocation is for locals.		// All of the stack allocation is for locals.
AFI->setLocalStackSize(NumBytes);		AFI->setLocalStackSize(NumBytes);

if (!NumBytes)		if (!NumBytes)
return;		return;
// REDZONE: If the stack size is less than 128 bytes, we don't need		// REDZONE: If the stack size is less than 128 bytes, we don't need
[55 lines hidden]	if (HasFP) {
// Issue sub fp, sp, FPOffset or		// Issue sub fp, sp, FPOffset or
// mov fp,sp when FPOffset is zero.		// mov fp,sp when FPOffset is zero.
// Note: All stores of callee-saved registers are marked as "FrameSetup".		// Note: All stores of callee-saved registers are marked as "FrameSetup".
// This code marks the instruction(s) that set the FP also.		// This code marks the instruction(s) that set the FP also.
emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP, FPOffset, TII,		emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP, FPOffset, TII,
MachineInstr::FrameSetup);		MachineInstr::FrameSetup);
}		}

		if (windowsRequiresStackProbe(MF, NumBytes)) {
		uint32_t NumWords = NumBytes >> 4;

		BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
		.addImm(NumWords)
		.setMIFlags(MachineInstr::FrameSetup);

		switch (MF.getTarget().getCodeModel()) {
		case CodeModel::Small:
		case CodeModel::Medium:
		case CodeModel::Kernel:
		BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
		.addExternalSymbol("__chkstk")
		.addReg(AArch64::X15, RegState::Implicit)
		.setMIFlags(MachineInstr::FrameSetup);
		break;
		case CodeModel::Large:
		BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
		.addReg(AArch64::X16, RegState::Define)
		.addExternalSymbol("__chkstk")
		.addExternalSymbol("__chkstk")
		.setMIFlags(MachineInstr::FrameSetup);

		BuildMI(MBB, MBBI, DL, TII->get(AArch64::BLR))
		.addReg(AArch64::X16, RegState::Kill)
		.addReg(AArch64::X15, RegState::Implicit | RegState::Define)
		.setMIFlags(MachineInstr::FrameSetup);
		break;
		}

		BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
		.addReg(AArch64::SP, RegState::Kill)
		.addReg(AArch64::X15, RegState::Kill)
		.addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 4))
		.setMIFlags(MachineInstr::FrameSetup);
		NumBytes = 0;
		}

// Allocate space for the rest of the frame.		// Allocate space for the rest of the frame.
if (NumBytes) {		if (NumBytes) {
const bool NeedsRealignment = RegInfo->needsStackRealignment(MF);		const bool NeedsRealignment = RegInfo->needsStackRealignment(MF);
unsigned scratchSPReg = AArch64::SP;		unsigned scratchSPReg = AArch64::SP;

if (NeedsRealignment) {		if (NeedsRealignment) {
scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);		scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);
assert(scratchSPReg != AArch64::NoRegister);		assert(scratchSPReg != AArch64::NoRegister);
[598 lines hidden]	void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,

TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);		TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(		const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
MF.getSubtarget().getRegisterInfo());		MF.getSubtarget().getRegisterInfo());
AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
unsigned UnspilledCSGPR = AArch64::NoRegister;		unsigned UnspilledCSGPR = AArch64::NoRegister;
unsigned UnspilledCSGPRPaired = AArch64::NoRegister;		unsigned UnspilledCSGPRPaired = AArch64::NoRegister;

		MachineFrameInfo &MFI = MF.getFrameInfo();
		const MCPhysReg *CSRegs = RegInfo->getCalleeSavedRegs(&MF);

		unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
		? RegInfo->getBaseRegister()
		: (unsigned)AArch64::NoRegister;

		unsigned SpillEstimate = SavedRegs.count();
		for (unsigned i = 0; CSRegs[i]; ++i) {
		unsigned Reg = CSRegs[i];
		unsigned PairedReg = CSRegs[i ^ 1];
		if (Reg == BasePointerReg)
		SpillEstimate++;
		if (produceCompactUnwindFrame(MF) && !SavedRegs.test(PairedReg))
		SpillEstimate++;
		}
		SpillEstimate += 2; // Conservatively include FP+LR in the estimate
		unsigned StackEstimate = MFI.estimateStackSize(MF) + 8 * SpillEstimate;

// The frame record needs to be created by saving the appropriate registers		// The frame record needs to be created by saving the appropriate registers
if (hasFP(MF)) {		if (hasFP(MF) || windowsRequiresStackProbe(MF, StackEstimate)) {
SavedRegs.set(AArch64::FP);		SavedRegs.set(AArch64::FP);
SavedRegs.set(AArch64::LR);		SavedRegs.set(AArch64::LR);
}		}

unsigned BasePointerReg = AArch64::NoRegister;
if (RegInfo->hasBasePointer(MF))
BasePointerReg = RegInfo->getBaseRegister();

unsigned ExtraCSSpill = 0;		unsigned ExtraCSSpill = 0;
const MCPhysReg *CSRegs = RegInfo->getCalleeSavedRegs(&MF);
// Figure out which callee-saved registers to save/restore.		// Figure out which callee-saved registers to save/restore.
for (unsigned i = 0; CSRegs[i]; ++i) {		for (unsigned i = 0; CSRegs[i]; ++i) {
const unsigned Reg = CSRegs[i];		const unsigned Reg = CSRegs[i];

// Add the base pointer register to SavedRegs if it is callee-save.		// Add the base pointer register to SavedRegs if it is callee-save.
if (Reg == BasePointerReg)		if (Reg == BasePointerReg)
SavedRegs.set(Reg);		SavedRegs.set(Reg);

[25 lines hidden]	DEBUG(dbgs() << "*** determineCalleeSaves\nUsed CSRs:";
dbgs() << "\n";);		dbgs() << "\n";);

// If any callee-saved registers are used, the frame cannot be eliminated.		// If any callee-saved registers are used, the frame cannot be eliminated.
unsigned NumRegsSpilled = SavedRegs.count();		unsigned NumRegsSpilled = SavedRegs.count();
bool CanEliminateFrame = NumRegsSpilled == 0;		bool CanEliminateFrame = NumRegsSpilled == 0;

// The CSR spill slots have not been allocated yet, so estimateStackSize		// The CSR spill slots have not been allocated yet, so estimateStackSize
// won't include them.		// won't include them.
MachineFrameInfo &MFI = MF.getFrameInfo();
unsigned CFSize = MFI.estimateStackSize(MF) + 8 * NumRegsSpilled;		unsigned CFSize = MFI.estimateStackSize(MF) + 8 * NumRegsSpilled;
DEBUG(dbgs() << "Estimated stack frame size: " << CFSize << " bytes.\n");		DEBUG(dbgs() << "Estimated stack frame size: " << CFSize << " bytes.\n");
unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);		unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
bool BigStack = (CFSize > EstimatedStackSizeLimit);		bool BigStack = (CFSize > EstimatedStackSizeLimit);
if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))		if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
AFI->setHasStackFrame(true);		AFI->setHasStackFrame(true);

// Estimate if we might need to scavenge a register at some point in order		// Estimate if we might need to scavenge a register at some point in order
[43 lines hidden]

llvm/trunk/test/CodeGen/AArch64/chkstk.ll

				; RUN: llc -mtriple=aarch64-windows -verify-machineinstrs %s -o - \
				; RUN: | FileCheck -check-prefix CHECK-DEFAULT-CODE-MODEL %s

				; RUN: llc -mtriple=aarch64-windows -verify-machineinstrs -code-model=large %s -o - \
				; RUN: | FileCheck -check-prefix CHECK-LARGE-CODE-MODEL %s

				define void @check_watermark() {
				entry:
				%buffer = alloca [4096 x i8], align 1
				ret void
				}

				; CHECK-DEFAULT-CODE-MODEL: check_watermark:
				; CHECK-DEFAULT-CODE-MODEL-DAG: stp x29, x30, [sp
				; CHECK-DEFAULT-CODE-MODEL-DAG: orr x15, xzr, #0x100
				; CHECK-DEFAULT-CODE-MODEL: bl __chkstk
				; CHECK-DEFAULT-CODE-MODEL: sub sp, sp, x15, lsl #4

				; CHECK-LARGE-CODE-MODEL: check_watermark:
				; CHECK-LARGE-CODE-MODEL-DAG: stp x29, x30, [sp
				; CHECK-LARGE-CODE-MODEL-DAG: orr x15, xzr, #0x100
				; CHECK-LARGE-CODE-MODEL-DAG: adrp x16, __chkstk
				; CHECK-LARGE-CODE-MODEL-DAG: add x16, x16, __chkstk
				; CHECK-LARGE-CODE-MODEL: blr x16
				; CHECK-LARGE-CODE-MODEL: sub sp, sp, x15, lsl #4
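The constant `#0x100` checked above follows from the unit conversion in the prologue: the byte count is shifted right by 4 before the call, and `sub sp, sp, x15, lsl #4` scales it back afterwards. A minimal sketch of that arithmetic, assuming the allocation size is a multiple of 16 (as AArch64 stack alignment requires); `probeUnits` and `spAdjustment` are illustrative names, not functions from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Value loaded into x15 before the call: the allocation size in 16-byte units.
constexpr uint32_t probeUnits(uint32_t NumBytes) { return NumBytes >> 4; }

// Effect of `sub sp, sp, x15, lsl #4` after the call: scale back to bytes.
constexpr uint64_t spAdjustment(uint32_t Units) {
  return static_cast<uint64_t>(Units) << 4;
}

static_assert(probeUnits(4096) == 0x100, "4096 bytes is 0x100 units");
static_assert(spAdjustment(0x100) == 4096, "0x100 units is 4096 bytes");
```

For the 4096-byte `alloca` in the test, this gives 0x100 units going in and a 4096-byte stack adjustment coming out.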

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Implement stack probing for windows
ClosedPublic

