This is an archive of the discontinued LLVM Phabricator instance.

Differential D40863

[AArch64][Darwin] Implement stack probing for static and dynamic stack objects
Abandoned · Public

Authored by aemerson on Dec 5 2017, 2:58 PM.

Details

Reviewers

MatzeB
t.p.northover
ab
mstorsjo
javed.absar

Summary

[AArch64][Darwin] Implement stack probing for static and dynamic stack objects.

This feature generates calls to a stack probing function on Darwin with a custom calling convention. The probe function uses some temp registers to probe the SP at 4k intervals. For static stack objects, we can emit the probe call in the function prolog. For dynamic objects like allocas, we create a call to the same function with a custom calling convention.

Diff Detail

Repository: rL LLVM

Event Timeline

aemerson created this revision. · Dec 5 2017, 2:58 PM

Herald added subscribers: kristof.beyls, javed.absar, rengolin. · Dec 5 2017, 2:58 PM

aemerson added parent revisions: D40857: [AArch64][Darwin] Add new ARM64 stack probing function for Darwin, D40861: [X86] Add support for stack probing on x86_64 Darwin. · Dec 5 2017, 2:59 PM

efriedma added subscribers: pcwalton, efriedma. · Dec 5 2017, 3:05 PM

aemerson added a child revision: D40864: [Darwin] Add a new -mstack-probe option and enable by default. · Dec 5 2017, 3:07 PM

It would be helpful to describe somewhere what exactly stack probing is. Maybe add more comments to emitStackProbe() and the commit message.

Is it one of those situations where you have to touch the OS pages backing stack memory one after the other (instead of accidentally jumping across pages for big arrays)?

In D40863#945833, @MatzeB wrote:

It would be helpful to describe somewhere what exactly stack probing is. Maybe add more comments to emitStackProbe() and the commit message.

Is it one of those situations where you have to touch the OS pages backing stack memory one after the other (instead of accidentally jumping across pages for big arrays)?

Yes, that's correct. The default implementation I've created in D40857 in compiler-rt loads data at 4096-byte (= page size) intervals. This ensures that the stack guard page is always hit. I'll add some comments to the code as you suggest. However, the implementation in compiler-rt has a weakly defined symbol and may be overridden at link time, so I'm reluctant to define what the probe function does *exactly* in the compiler.
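
To make the page-stride idea concrete, here is a minimal C++ sketch that lists which offsets below SP a probe of this kind would touch. It is not the compiler-rt routine from D40857: `probeOffsets` and its parameters are hypothetical names, and touching the final partial step is an assumption of this sketch rather than something the thread confirms.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the patch or of compiler-rt: returns the
// byte offsets below the current SP that a page-stride probe would touch
// before the caller moves SP down by numBytes.
std::vector<std::uint64_t> probeOffsets(std::uint64_t numBytes,
                                        std::uint64_t pageSize = 4096) {
  assert(pageSize != 0 && "page size must be non-zero");
  std::vector<std::uint64_t> offsets;
  // Touch one address per page-sized step so that no step can jump over a
  // guard page.
  for (std::uint64_t k = 1; k <= numBytes / pageSize; ++k)
    offsets.push_back(k * pageSize);
  // Assumption of this sketch: also touch the end of a final partial step so
  // the lowest address of the new frame is covered.
  if (numBytes % pageSize != 0)
    offsets.push_back(numBytes);
  return offsets;
}
```

For a 10000-byte frame this yields the offsets 4096, 8192 and 10000, so consecutive touches are never more than one page apart.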

Looks generally good to me, with nitpicks below. But I'd like to hear a comment on how this is supposed to be enabled (see comment below).

lib/Target/AArch64/AArch64CallingConvention.td
348–349	From the perspective of the caller I'd call it "clobbered" instead of "used".
lib/Target/AArch64/AArch64FrameLowering.cpp
462	I think DebugLoc will only ever be a default constructed one, which you can just do inside the function instead of passing it as a parameter.
465–466	Isn't a `TargetInstrInfo` enough here so you can get away without a `static_cast`?
471–487	Generally it feels sketchy to save/extend/restore the stack frame without MachineFrameInfo knowing about it. (For example, you will hide this extra stuff from the `WarnStackSize` functionality in PrologEpilogInserter.) That's probably not a big deal, but if we can somehow find a better way that would be nice. "We don't bother updating SP...": this is problematic; AFAIK a Unix signal can come in at any time and will use your stack frame. It will probably work on platforms with stack red zones defined that the signal handlers have to respect.
lib/Target/AArch64/AArch64ISelLowering.cpp
11040–11042	I assume this will be a correctness problem (on newer darwins?). So maybe we should not leave the decision to the frontend to set Options, but instead always make our decision based on the target triple (being darwin and newer than some version). Otherwise I can easily see non-clang frontends, such as the various JITs we have around, missing the setting and failing. You could then additionally provide a cl::opt for cases where users want to explicitly disable the probe code generation for some reason.
test/CodeGen/AArch64/arm64-stack-probing.ll
68–69	Did you try removing all the function attributes to make the test simpler?

MatzeB added reviewers: t.p.northover, ab. · Dec 5 2017, 5:00 PM

aemerson mentioned this in D40864: [Darwin] Add a new -mstack-probe option and enable by default. · Dec 6 2017, 6:46 AM

aemerson added inline comments.

lib/Target/AArch64/AArch64FrameLowering.cpp
471–487	I think I should update MFI to ensure `hasCalls = true` here and in the dynamic case. However, for the stack size, since the SP is only ever adjusted by 16 bytes and then restored after probing, I don't think it's necessary to tell MFI about it, since it's purely a local change. Since we must have at least 4k of stack objects in order to be attempting to probe, the 16 bytes only use stack space that would have been re-used later anyway.
	@MatzeB wrote: "We don't bother updating SP...", this is problematic, AFAIK a unix signal can come in at any time and will use your stack frame. It will probably work on platforms with stack red zones defined that the signal handlers have to respect.
	I think that's a mis-worded comment. I meant to say that we don't bother saving SP since we know the probe function won't clobber it. However, it might not be a good idea to restrict it so much in future. The ABI isn't yet 100% finalized.
lib/Target/AArch64/AArch64ISelLowering.cpp
11040–11042	Akira suggested on D40864 that we use function attributes for this instead of a codegen option. That will still require front-ends to handle it. If we were to unconditionally enable this in the backend based on target triple, perhaps that would break cases where JITs etc. don't automatically link in compiler-rt?

MatzeB added inline comments. · Dec 6 2017, 12:59 PM

lib/Target/AArch64/AArch64FrameLowering.cpp
471–487	Ah, I didn't realize this happens before the stack frame is even set up, so it doesn't affect the maximum or the layout during the main part of the function.
lib/Target/AArch64/AArch64ISelLowering.cpp
11040–11042	He's right that for users explicitly enabling/disabling the feature a function attribute would be best. I think the question of whether we want the logic in llvm or just in the frontend depends a bit on why we need/want stack probing.
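
The three enabling mechanisms discussed here (a function attribute, an explicit command-line override, and a default derived from the target triple) could be combined along the following lines. This is only a sketch: `Setting`, `ProbePolicy` and `shouldEmitProbe` are hypothetical names that do not exist in LLVM, and the precedence order is an assumption, since the thread does not settle on one.

```cpp
#include <cassert>

// Hypothetical three-state setting; not an LLVM type.
enum class Setting { Unset, On, Off };

// Hypothetical inputs to the decision; none of these names exist in LLVM.
struct ProbePolicy {
  Setting functionAttr; // per-function request from the frontend, if any
  Setting commandLine;  // explicit user override (a cl::opt in the proposal)
  bool tripleDefault;   // e.g. true for a new enough Darwin target
};

// The most specific setting wins. This precedence is an assumption of the
// sketch, not something decided in the review.
bool shouldEmitProbe(const ProbePolicy &p) {
  if (p.functionAttr != Setting::Unset)
    return p.functionAttr == Setting::On;
  if (p.commandLine != Setting::Unset)
    return p.commandLine == Setting::On;
  return p.tripleDefault;
}
```

With this shape, a JIT that sets nothing still gets the triple-based default, while a user can opt out explicitly.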

efriedma added inline comments. · Dec 6 2017, 1:31 PM

lib/Target/AArch64/AArch64FrameLowering.cpp
560	Needs a comment to explain why NumBytes < StackProbeSize doesn't need a stack probe. (On x86, the "CALL" instruction stores the return pointer onto the stack, which simplifies the logic, but there isn't any equivalent on AArch64.)
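
The check this comment refers to can be restated as a small predicate. The sketch below mirrors the condition `NumBytes > StackProbeSize && ProbeSym != ""` from the diff; `needsPrologueProbe` and `kStackProbeSize` are hypothetical names used only for illustration, and any symbol string passed in is a placeholder, not the real probe symbol.

```cpp
#include <cassert>
#include <string>

// Probe size used by the patch for Darwin AArch64.
constexpr int kStackProbeSize = 4096;

// Hypothetical free-function restatement of the prologue condition: a probe
// call is emitted only when the frame is strictly larger than the probe size
// and the target provides a probe symbol.
bool needsPrologueProbe(int numBytes, const std::string &probeSymbol) {
  assert(numBytes >= 0 && "frame size cannot be negative");
  return numBytes > kStackProbeSize && !probeSymbol.empty();
}
```

Because the comparison is strict, a frame of exactly 4096 bytes is not probed, and a target without a probe symbol never probes regardless of frame size.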

MatzeB added a reviewer: mstorsjo. · Dec 12 2017, 3:31 PM

In D40863#945847, @aemerson wrote:

In D40863#945833, @MatzeB wrote:

It would be helpful to describe somewhere what exactly stack probing is. Maybe add more comments to emitStackProbe() and the commit message.

Is it one of those situations where you have to touch the OS pages backing stack memory one after the other (instead of accidentally jumping across pages for big arrays)?

Yes, that's correct. The default implementation I've created in D40857 in compiler-rt loads data in 4096 (=page size) byte intervals. This ensures that the stack guard page is always hit

Since aarch64 on darwin isn't exactly new, how did expanding the stack by more than 4096 bytes work prior to this patchset? Functions allocating more than 4096 bytes of stack space aren't exactly uncommon. Will darwin start requiring this if it detects an executable built by a new enough toolchain to support it?

lib/Target/AArch64/AArch64FrameLowering.cpp
502	In the implementation in D41131 (based on the one for Windows on ARM by @compnerd in rL207615), I check for CodeModel::Large as well, and do the function call via MOVaddrEXT and BLR for that case.

mstorsjo added inline comments. · Dec 12 2017, 11:55 PM

lib/Target/AArch64/AArch64FrameLowering.cpp
478	Instead of manually saving LR here if it wasn't already saved, in D41131 I instead tried to estimate whether the stack probe would end up being necessary in determineCalleeSaves, and made sure LR is saved already there if a stack probe will be necessary.

In D40863#953298, @mstorsjo wrote:

In D40863#945847, @aemerson wrote:

In D40863#945833, @MatzeB wrote:

It would be helpful to describe somewhere what exactly stack probing is. Maybe add more comments to emitStackProbe() and the commit message.

Is it one of those situations where you have to touch the OS pages backing stack memory one after the other (instead of accidentally jumping across pages for big arrays)?

Yes, that's correct. The default implementation I've created in D40857 in compiler-rt loads data in 4096 (=page size) byte intervals. This ensures that the stack guard page is always hit

Since aarch64 on darwin isn't exactly new, how did expanding the stack by more than 4096 bytes work prior to this patchset? Functions allocating more than 4096 bytes of stack space aren't exactly uncommon. Will darwin start requiring this if it detects an executable built by a new enough toolchain to support it?

The motivation for stack probing here is different to the Windows case, although the mechanism is the same. For Darwin this is a security feature to protect against stack clash based attacks.
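
As a rough illustration of the stack clash scenario, the sketch below models a downward-growing stack with a single guard page. It is a simplified model with hypothetical names (`skipsGuardPage`, `probeHitsGuard`), not code from this patch: one function checks whether a single large SP adjustment lands below the guard page, the other checks whether page-stride probing would touch the guard page first.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (an assumption of this sketch): the stack grows down and a
// single guard page occupies [guardLow, guardLow + pageSize) below SP.

// True if moving SP down by allocSize in one step puts the new SP below the
// guard page, so the adjustment itself never lands inside the guard.
bool skipsGuardPage(std::uint64_t sp, std::uint64_t allocSize,
                    std::uint64_t guardLow, std::uint64_t pageSize = 4096) {
  assert(allocSize <= sp && guardLow + pageSize <= sp);
  return sp - allocSize < guardLow;
}

// True if touching sp - pageSize, sp - 2 * pageSize, ... within the new
// allocation hits an address inside the guard page.
bool probeHitsGuard(std::uint64_t sp, std::uint64_t allocSize,
                    std::uint64_t guardLow, std::uint64_t pageSize = 4096) {
  assert(pageSize != 0 && allocSize <= sp && guardLow + pageSize <= sp);
  for (std::uint64_t k = 1; k <= allocSize / pageSize; ++k) {
    std::uint64_t addr = sp - k * pageSize;
    if (addr >= guardLow && addr < guardLow + pageSize)
      return true;
  }
  return false;
}
```

With `sp = 0x100000`, a guard page at `0xF0000` and a `0x20000`-byte allocation, both functions return true: the unprobed adjustment jumps past the guard page, whereas the probed one touches it on the way down.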

lib/Target/AArch64/AArch64FrameLowering.cpp
478	Is that estimate always conservative? I.e. does it always at least over-estimate the stack size?
502	This function is specific to the Darwin stack probing ABI under consideration, so I don't think we'll be able to share this code. No objections to your patch though.

In D40863#957575, @aemerson wrote:

The motivation for stack probing here is different to the Windows case, although the mechanism is the same. For Darwin this is a security feature to protect against stack clash based attacks.

Ah, thanks for explaining!

lib/Target/AArch64/AArch64FrameLowering.cpp
478	I think it should be possible to make it conservative. I found an issue with the estimate in my patch, but I'll update it.
502	Yes, the ABI for the function calls themselves is different, so this particular piece of code can't be shared, but ideally all the logic for when to emit it could be shared.

mstorsjo mentioned this in D42356: [AArch64] Implement dynamic stack probing for windows. · Jan 21 2018, 12:53 PM

aemerson abandoned this revision. · May 16 2018, 3:48 AM

Herald added a reviewer: javed.absar. · May 16 2018, 3:48 AM

I am looking to add stack probing for linux AArch64 using this diff as a basis for my work. Are there any plans to re-introduce this diff for Darwin? Is that something I should take on as well? Context is I am working on mobile apps and we want to add -fstack-clash-protection support :)

Herald added a project: Restricted Project. · Apr 19 2023, 11:30 AM

Current work is in https://reviews.llvm.org/D96004

In D40863#4281085, @oskarwirga wrote:

I am looking to add stack probing for linux AArch64 using this diff as a basis for my work. Are there any plans to re-introduce this diff for Darwin? Is that something I should take on as well? Context is I am working on mobile apps and we want to add -fstack-clash-protection support :)

For Darwin, we have this implemented downstream in the Xcode compiler, on by default. It's downstream because originally it was closely tied to platform routines on Darwin for fast stack bounds checking. I think we can upstream that implementation now (it uses -fstack-check instead).

Revision Contents

Path	Size
lib/Target/AArch64/AArch64CallingConvention.td	6 lines
lib/Target/AArch64/AArch64FrameLowering.h	4 lines
lib/Target/AArch64/AArch64FrameLowering.cpp	88 lines
lib/Target/AArch64/AArch64ISelLowering.h	4 lines
lib/Target/AArch64/AArch64ISelLowering.cpp	59 lines
lib/Target/AArch64/AArch64RegisterInfo.h	3 lines
lib/Target/AArch64/AArch64RegisterInfo.cpp	4 lines
test/CodeGen/AArch64/arm64-stack-probing.ll	70 lines

Diff 125624

lib/Target/AArch64/AArch64CallingConvention.td

@@ 339 lines hidden @@ : CalleeSavedRegs<(add (sequence "W%u", 0, 30), WSP,
                            (sequence "S%u", 0, 31), (sequence "D%u", 0, 31),
                            (sequence "Q%u", 0, 31))>;

 def CSR_AArch64_NoRegs : CalleeSavedRegs<(add)>;

 def CSR_AArch64_RT_MostRegs : CalleeSavedRegs<(add CSR_AArch64_AAPCS,
                                 (sequence "X%u", 9, 15))>;

+// Darwin stack probing function CSRs. Registers X9-X11 are used, LR since it's
+// a call.
+def CSR_AArch64_StackProbe_Darwin
+    : CalleeSavedRegs<(add (sequence "X%u", 0, 8),
+                           (sequence "X%u", 12, 28), FP, SP,
+                           (sequence "Q%u", 0, 31))>;

lib/Target/AArch64/AArch64FrameLowering.h

@@ 63 lines hidden @@ public:

   /// Returns true if the target will correctly handle shrink wrapping.
   bool enableShrinkWrapping(const MachineFunction &MF) const override {
     return true;
   }

   bool enableStackSlotScavenging(const MachineFunction &MF) const override;

+  void emitStackProbe(MachineFunction &MF, MachineBasicBlock &MBB,
+                      MachineBasicBlock::iterator MBBI, DebugLoc DL,
+                      unsigned NumBytes) const;
+
 private:
   bool shouldCombineCSRLocalStackBump(MachineFunction &MF,
                                       unsigned StackBumpBytes) const;
 };

 } // End llvm namespace

 #endif

lib/Target/AArch64/AArch64FrameLowering.cpp

@@ 449 lines hidden @@ assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
          "Unexpected base register in callee-save save/restore instruction!");
   // Last operand is immediate offset that needs fixing.
   MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
   // All generated opcodes have scaled offsets.
   assert(LocalStackSize % 8 == 0);
   OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / 8);
 }

+// Emit a stack probing function call at the specified location.
+void AArch64FrameLowering::emitStackProbe(MachineFunction &MF,
+                                          MachineBasicBlock &MBB,
+                                          MachineBasicBlock::iterator MBBI,
+                                          DebugLoc DL,
+                                          unsigned NumBytes) const {
+  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
+  const AArch64InstrInfo *TII =
+      static_cast<const AArch64InstrInfo *>(Subtarget.getInstrInfo());
+  const MachineFrameInfo &MFI = MF.getFrameInfo();
+  const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
+  const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
+
+  // If the LR has already been saved we don't need to save it before calling
+  // the probe function. However if it hasn't then the probe will clobber it.
+  bool LRIsSaved =
+      std::any_of(CSI.begin(), CSI.end(), [](const CalleeSavedInfo &SI) {
+        return SI.getReg() == AArch64::LR;
+      });
+  if (!LRIsSaved) {
+    // LR wasn't saved as a CSR, so we need to save it ourselves. We don't
+    // bother updating SP, as we know the probe function won't modify any
+    // memory.
+    BuildMI(MBB, MBBI, DL, TII->get(AArch64::STRXpre))
+        .addReg(AArch64::SP, RegState::Define)
+        .addReg(AArch64::LR)
+        .addReg(AArch64::SP)
+        .addImm(-16) // Keep SP 16 byte aligned.
+        .setMIFlags(MachineInstr::FrameSetup);
+  }
+
+  StringRef Symbol = Subtarget.getTargetLowering()->getStackProbeSymbolName(MF);
+  // We pass the number of bytes to check to the probe function in register
+  // W9, a temporary register that we can use in places like the prolog. The
+  // probe function should preserve all registers except X9, X10, X11 and LR.
+
+  // To materialize the probe size, we emit a MOVi32imm pseudo-instruction which
+  // will later get expanded into either an ORR Wd, Wzr, #bimm32 or into a
+  // sequence of MOV instructions depending on the value.
+  BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi32imm))
+      .addReg(AArch64::W9, RegState::Define)
+      .addImm(NumBytes)
+      .setMIFlags(MachineInstr::FrameSetup);
+
+  BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
+      .addExternalSymbol(MF.createExternalSymbolName(Symbol))
+      .addRegMask(RegInfo->getDarwinStackProbePreservedMask())
+      .addReg(AArch64::X9, RegState::Implicit)
+      .setMIFlags(MachineInstr::FrameSetup);
+
+  if (!LRIsSaved) {
+    // Restore LR.
+    BuildMI(MBB, MBBI, DL, TII->get(AArch64::LDRXpost))
+        .addReg(AArch64::SP, RegState::Define)
+        .addReg(AArch64::LR, RegState::Define)
+        .addReg(AArch64::SP)
+        .addImm(16)
+        .setMIFlags(MachineInstr::FrameSetup);
+  }
+}

 void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
                                         MachineBasicBlock &MBB) const {
   MachineBasicBlock::iterator MBBI = MBB.begin();
   const MachineFrameInfo &MFI = MF.getFrameInfo();
   const Function *Fn = MF.getFunction();
   const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
   const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
   const TargetInstrInfo *TII = Subtarget.getInstrInfo();
   MachineModuleInfo &MMI = MF.getMMI();
   AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
   bool needsFrameMoves = MMI.hasDebugInfo() || Fn->needsUnwindTableEntry();
   bool HasFP = hasFP(MF);

   // Debug location must be unknown since the first debug location is used
   // to determine the end of the prologue.
   DebugLoc DL;

   // All calls are tail calls in GHC calling conv, and functions have no
   // prologue/epilogue.
   if (MF.getFunction()->getCallingConv() == CallingConv::GHC)
     return;

+  // Currently only Darwin platforms support stack probing on AArch64,
+  // with a fixed probe size of 4096 bytes.
+  const int StackProbeSize = 4096;
+  StringRef ProbeSym = Subtarget.getTargetLowering()->getStackProbeSymbolName(MF);
+
   int NumBytes = (int)MFI.getStackSize();
   if (!AFI->hasStackFrame()) {
     assert(!HasFP && "unexpected function without stack frame but with FP");

     // All of the stack allocation is for locals.
     AFI->setLocalStackSize(NumBytes);

     if (!NumBytes)
       return;
     // REDZONE: If the stack size is less than 128 bytes, we don't need
     // to actually allocate.
     if (canUseRedZone(MF))
       ++NumRedZoneFunctions;
     else {
+      if (NumBytes > StackProbeSize && ProbeSym != "")
+        emitStackProbe(MF, MBB, MBBI, DL, NumBytes);
       emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP, -NumBytes, TII,
                       MachineInstr::FrameSetup);

       // Label used to tie together the PROLOG_LABEL and the MachineMoves.
       MCSymbol *FrameLabel = MMI.getContext().createTempSymbol();
       // Encode the stack size of the leaf function.
       unsigned CFIIndex = MF.addFrameInst(
           MCCFIInstruction::createDefCfaOffset(FrameLabel, -NumBytes));
       BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
           .addCFIIndex(CFIIndex)
           .setMIFlags(MachineInstr::FrameSetup);
     }
     return;
   }

   bool IsWin64 =
       Subtarget.isCallingConvWin64(MF.getFunction()->getCallingConv());
   unsigned FixedObject = IsWin64 ? alignTo(AFI->getVarArgsGPRSize(), 16) : 0;

   auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
   // All of the remaining stack allocations are for locals.
   AFI->setLocalStackSize(NumBytes - PrologueSaveSize);

   bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
+  // If we're going to combine SP updates then the stack adjustment must be less
+  // than 512 bytes, hence stack probing in the prologue is unnecessary.
   if (CombineSPBump) {
     emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP, -NumBytes, TII,
                     MachineInstr::FrameSetup);
     NumBytes = 0;
   } else if (PrologueSaveSize != 0) {
     MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(MBB, MBBI, DL, TII,
                                                      -PrologueSaveSize);
     NumBytes -= PrologueSaveSize;
@@ 29 lines hidden @@ if (NumBytes) {
     const bool NeedsRealignment = RegInfo->needsStackRealignment(MF);
     unsigned scratchSPReg = AArch64::SP;

     if (NeedsRealignment) {
       scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);
       assert(scratchSPReg != AArch64::NoRegister);
     }

+    if (NumBytes > StackProbeSize && ProbeSym != "") {
+      // We need to emit a call to the stack probe function. Note that we still
+      // need to adjust SP, the probing doesn't modify it.
+      emitStackProbe(MF, MBB, MBBI, DL, NumBytes);
+      emitFrameOffset(MBB, MBBI, DL, scratchSPReg, AArch64::SP, -NumBytes, TII,
+                      MachineInstr::FrameSetup);
+    } else {
       // If we're a leaf function, try using the red zone.
       if (!canUseRedZone(MF))
         // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
         // the correct value here, as NumBytes also includes padding bytes,
         // which shouldn't be counted here.
-      emitFrameOffset(MBB, MBBI, DL, scratchSPReg, AArch64::SP, -NumBytes, TII,
-                      MachineInstr::FrameSetup);
+        emitFrameOffset(MBB, MBBI, DL, scratchSPReg, AArch64::SP, -NumBytes,
+                        TII, MachineInstr::FrameSetup);
+    }

     if (NeedsRealignment) {
       const unsigned Alignment = MFI.getMaxAlignment();
       const unsigned NrBitsToZero = countTrailingZeros(Alignment);
       assert(NrBitsToZero > 1);
       assert(scratchSPReg != AArch64::SP);

       // SUB X9, SP, NumBytes
@@ 694 lines hidden @@

lib/Target/AArch64/AArch64ISelLowering.h

Show First 20 Lines • Show All 471 Lines • ▼ Show 20 Lines	public:
unsigned getNumInterleavedAccesses(VectorType *VecTy,		unsigned getNumInterleavedAccesses(VectorType *VecTy,
const DataLayout &DL) const;		const DataLayout &DL) const;

MachineMemOperand::Flags getMMOFlags(const Instruction &I) const override;		MachineMemOperand::Flags getMMOFlags(const Instruction &I) const override;

bool functionArgumentNeedsConsecutiveRegisters(Type *Ty,		bool functionArgumentNeedsConsecutiveRegisters(Type *Ty,
CallingConv::ID CallConv,		CallingConv::ID CallConv,
bool isVarArg) const override;		bool isVarArg) const override;

		StringRef getStackProbeSymbolName(MachineFunction &MF) const override;

private:		private:
bool isExtFreeImpl(const Instruction *Ext) const override;		bool isExtFreeImpl(const Instruction *Ext) const override;

/// Keep a pointer to the AArch64Subtarget around so that we can		/// Keep a pointer to the AArch64Subtarget around so that we can
/// make the right decision when generating code for different targets.		/// make the right decision when generating code for different targets.
const AArch64Subtarget *Subtarget;		const AArch64Subtarget *Subtarget;

void addTypeForNEON(MVT VT, MVT PromotedBitwiseVT);		void addTypeForNEON(MVT VT, MVT PromotedBitwiseVT);
▲ Show 20 Lines • Show All 101 Lines • ▼ Show 20 Lines	private:
SDValue LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVectorAND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVectorAND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVectorOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVectorOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFSINCOS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFSINCOS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVECREDUCE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVECREDUCE(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;

SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,		SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,
std::vector<SDNode *> *Created) const override;		std::vector<SDNode *> *Created) const override;
SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,		SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
int &ExtraSteps, bool &UseOneConst,		int &ExtraSteps, bool &UseOneConst,
bool Reciprocal) const override;		bool Reciprocal) const override;
SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,		SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
int &ExtraSteps) const override;		int &ExtraSteps) const override;
▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines

lib/Target/AArch64/AArch64ISelLowering.cpp

Show First 20 Lines • Show All 247 Lines • ▼ Show 20 Lines	AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VASTART, MVT::Other, Custom);		setOperationAction(ISD::VASTART, MVT::Other, Custom);
setOperationAction(ISD::VAARG, MVT::Other, Custom);		setOperationAction(ISD::VAARG, MVT::Other, Custom);
setOperationAction(ISD::VACOPY, MVT::Other, Custom);		setOperationAction(ISD::VACOPY, MVT::Other, Custom);
setOperationAction(ISD::VAEND, MVT::Other, Expand);		setOperationAction(ISD::VAEND, MVT::Other, Expand);

// Variable-sized objects.		// Variable-sized objects.
setOperationAction(ISD::STACKSAVE, MVT::Other, Expand);		setOperationAction(ISD::STACKSAVE, MVT::Other, Expand);
setOperationAction(ISD::STACKRESTORE, MVT::Other, Expand);		setOperationAction(ISD::STACKRESTORE, MVT::Other, Expand);
		if (Subtarget->isTargetDarwin() && TM.Options.EnableStackProbe)
		setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i64, Custom);
		else
setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i64, Expand);		setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i64, Expand);

// Constant pool entries		// Constant pool entries
setOperationAction(ISD::ConstantPool, MVT::i64, Custom);		setOperationAction(ISD::ConstantPool, MVT::i64, Custom);

// BlockAddress		// BlockAddress
setOperationAction(ISD::BlockAddress, MVT::i64, Custom);		setOperationAction(ISD::BlockAddress, MVT::i64, Custom);

// Add/Sub overflow ops with MVT::Glues are lowered to NZCV dependences.		// Add/Sub overflow ops with MVT::Glues are lowered to NZCV dependences.
▲ Show 20 Lines • Show All 2,415 Lines • ▼ Show 20 Lines	SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
case ISD::VECREDUCE_ADD:		case ISD::VECREDUCE_ADD:
case ISD::VECREDUCE_SMAX:		case ISD::VECREDUCE_SMAX:
case ISD::VECREDUCE_SMIN:		case ISD::VECREDUCE_SMIN:
case ISD::VECREDUCE_UMAX:		case ISD::VECREDUCE_UMAX:
case ISD::VECREDUCE_UMIN:		case ISD::VECREDUCE_UMIN:
case ISD::VECREDUCE_FMAX:		case ISD::VECREDUCE_FMAX:
case ISD::VECREDUCE_FMIN:		case ISD::VECREDUCE_FMIN:
return LowerVECREDUCE(Op, DAG);		return LowerVECREDUCE(Op, DAG);
		case ISD::DYNAMIC_STACKALLOC:
		return LowerDYNAMIC_STACKALLOC(Op, DAG);
}		}
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Calling Convention Implementation		// Calling Convention Implementation
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AArch64GenCallingConv.inc"		#include "AArch64GenCallingConv.inc"
▲ Show 20 Lines • Show All 4,666 Lines • ▼ Show 20 Lines	return DAG.getNode(
DAG.getConstant(Intrinsic::aarch64_neon_fminnmv, dl, MVT::i32),		DAG.getConstant(Intrinsic::aarch64_neon_fminnmv, dl, MVT::i32),
Op.getOperand(0));		Op.getOperand(0));
}		}
default:		default:
llvm_unreachable("Unhandled reduction");		llvm_unreachable("Unhandled reduction");
}		}
}		}

		SDValue
		AArch64TargetLowering::LowerDYNAMIC_STACKALLOC(SDValue Op,
		SelectionDAG &DAG) const {
		assert(Subtarget->isTargetDarwin() &&
		"Only Darwin dynamic alloca probing supported");
		SDLoc dl(Op);
		// Get the inputs.
		SDNode *Node = Op.getNode();
		SDValue Chain = Op.getOperand(0);
		SDValue Size = Op.getOperand(1);
		unsigned Align = cast<ConstantSDNode>(Op.getOperand(2))->getZExtValue();
		EVT VT = Node->getValueType(0);
		EVT PtrVT = getPointerTy(DAG.getDataLayout());

		SDValue Callee = DAG.getTargetExternalSymbol("___chkstk_darwin", PtrVT, 0);

		Chain = DAG.getCALLSEQ_START(Chain, 0, 0, dl);
		const auto Mask =
		Subtarget->getRegisterInfo()->getDarwinStackProbePreservedMask();

		Chain = DAG.getCopyToReg(Chain, dl, AArch64::X9, Size, SDValue());
		Chain =
		DAG.getNode(AArch64ISD::CALL, dl, DAG.getVTList(MVT::Other, MVT::Glue),
		Chain, Callee, DAG.getRegister(AArch64::X9, MVT::i64),
		DAG.getRegisterMask(Mask), Chain.getValue(1));

		SDValue SP = DAG.getCopyFromReg(Chain, dl, AArch64::SP, MVT::i64);
		Chain = SP.getValue(1);
		SP = DAG.getNode(ISD::SUB, dl, MVT::i64, SP, Size);
		Chain = DAG.getCopyToReg(Chain, dl, AArch64::SP, SP);

		if (Align) {
		SP = DAG.getNode(ISD::AND, dl, VT, SP.getValue(0),
		DAG.getConstant(-(uint64_t)Align, dl, VT));
		Chain = DAG.getCopyToReg(Chain, dl, AArch64::SP, SP);
		}

		Chain = DAG.getCALLSEQ_END(Chain, DAG.getIntPtrConstant(0, dl, true),
		DAG.getIntPtrConstant(0, dl, true), SDValue(), dl);

		SDValue Ops[2] = {SP, Chain};
		return DAG.getMergeValues(Ops, dl);
		}
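
The stack pointer arithmetic in LowerDYNAMIC_STACKALLOC above can be modelled outside of LLVM. The following is only a sketch, not the patch's code: the helper names `roundUpTo16` and `modelNewSP` are made up for illustration, the 16-byte rounding is taken from the `add`/`and` pair checked in the `test_dynamic` test, and the alignment is assumed to be zero or a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round a dynamic allocation size up to a multiple of 16,
// as the "add x8, x1, #15" / "and x8, x8, #0xfffffffffffffff0" pair does.
constexpr std::uint64_t roundUpTo16(std::uint64_t numBytes) {
  return (numBytes + 15) & ~std::uint64_t{15};
}

// Hypothetical model of the SP update: subtract the size, then, if an
// alignment is given, clear the low bits with the mask -(uint64_t)Align.
// Assumes align is 0 or a power of two.
constexpr std::uint64_t modelNewSP(std::uint64_t sp, std::uint64_t size,
                                   std::uint64_t align) {
  return align != 0 ? ((sp - size) & (std::uint64_t{0} - align))
                    : (sp - size);
}

static_assert(roundUpTo16(17) == 32, "17 bytes round up to 32");
static_assert(modelNewSP(0x1000, 0x28, 32) == 0xFC0,
              "0x1000 - 0x28 = 0xFD8, aligned down to 32 bytes");
```

In the patch itself the probe call happens before this subtraction, with the size passed in X9, so the pages about to be claimed are touched before SP moves.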


/// getTgtMemIntrinsic - Represent NEON load and store intrinsics as		/// getTgtMemIntrinsic - Represent NEON load and store intrinsics as
/// MemIntrinsicNodes. The associated MachineMemOperands record the alignment		/// MemIntrinsicNodes. The associated MachineMemOperands record the alignment
/// specified in the intrinsic calls.		/// specified in the intrinsic calls.
bool AArch64TargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,		bool AArch64TargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
const CallInst &I,		const CallInst &I,
unsigned Intrinsic) const {		unsigned Intrinsic) const {
auto &DL = I.getModule()->getDataLayout();		auto &DL = I.getModule()->getDataLayout();
switch (Intrinsic) {		switch (Intrinsic) {
▲ Show 20 Lines • Show All 3,601 Lines • ▼ Show 20 Lines

unsigned		unsigned
AArch64TargetLowering::getVaListSizeInBits(const DataLayout &DL) const {		AArch64TargetLowering::getVaListSizeInBits(const DataLayout &DL) const {
if (Subtarget->isTargetDarwin() || Subtarget->isTargetWindows())		if (Subtarget->isTargetDarwin() || Subtarget->isTargetWindows())
return getPointerTy(DL).getSizeInBits();		return getPointerTy(DL).getSizeInBits();

return 3 * getPointerTy(DL).getSizeInBits() + 2 * 32;		return 3 * getPointerTy(DL).getSizeInBits() + 2 * 32;
}		}

		StringRef
		AArch64TargetLowering::getStackProbeSymbolName(MachineFunction &MF) const {
		if (Subtarget->isTargetDarwin() && MF.getTarget().Options.EnableStackProbe)
		return "___chkstk_darwin";
		return "";
		MatzeB: I assume this will be a correctness problem (on newer Darwins?). So maybe we should not leave the decision to the frontend to set Options, but instead always make our decision based on the target triple (being Darwin and newer than some version). Otherwise I can easily see non-clang frontends, such as the various JITs we have around, missing the setting and failing. You could then additionally provide a cl::opt for cases where users want to explicitly disable the probe code generation for some reason.
		aemerson (author): Akira suggested on D40864 that we use function attributes for this instead of a codegen option. That will still require front-ends to handle it. If we were to unconditionally enable this in the backend based on the target triple, perhaps that would break cases where JITs etc. don't automatically link in compiler-rt?
		MatzeB: He's right that for users explicitly enabling/disabling the feature a function attribute would be best. I think the question of whether we want the logic in LLVM or just in the frontend depends a bit on why we need/want stack probing.
		}
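
The alternative raised in the inline thread above (decide from the target triple, with an explicit opt-out) could look roughly like the sketch below. Everything in it is hypothetical: `TargetInfo`, `shouldEmitStackProbes` and `minProbingVersion` are illustrative names that are not part of LLVM or of this patch, and the thread never states which OS version the cutoff would be, so it is left as a parameter.

```cpp
#include <cassert>

// Hypothetical stand-in for the facts a backend would read from the triple.
struct TargetInfo {
  bool isDarwin;
  unsigned osMajorVersion;
};

// Default derived from the target; an explicit opt-out always wins.
// minProbingVersion is a placeholder: the review only says
// "newer than some version".
inline bool shouldEmitStackProbes(TargetInfo target, unsigned minProbingVersion,
                                  bool explicitlyDisabled) {
  if (explicitlyDisabled)
    return false;
  return target.isDarwin && target.osMajorVersion >= minProbingVersion;
}
```

With this shape a frontend or JIT that never sets an option still gets probes on qualifying targets, which is the concern MatzeB raises; the open question from aemerson (whether the probe function is actually linked in) is not addressed by such a check.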

lib/Target/AArch64/AArch64RegisterInfo.h

Show First 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	public:
/// (i.e. it is a calling convention that uses the same register for the first		/// (i.e. it is a calling convention that uses the same register for the first
/// i64 argument and an i64 return value)		/// i64 argument and an i64 return value)
///		///
/// Should return NULL in the case that the calling convention does not have		/// Should return NULL in the case that the calling convention does not have
/// this property		/// this property
const uint32_t *getThisReturnPreservedMask(const MachineFunction &MF,		const uint32_t *getThisReturnPreservedMask(const MachineFunction &MF,
CallingConv::ID) const;		CallingConv::ID) const;

		/// Stack probing calls preserve different CSRs to the normal CC.
		const uint32_t *getDarwinStackProbePreservedMask() const;

BitVector getReservedRegs(const MachineFunction &MF) const override;		BitVector getReservedRegs(const MachineFunction &MF) const override;
bool isConstantPhysReg(unsigned PhysReg) const override;		bool isConstantPhysReg(unsigned PhysReg) const override;
const TargetRegisterClass *		const TargetRegisterClass *
getPointerRegClass(const MachineFunction &MF,		getPointerRegClass(const MachineFunction &MF,
unsigned Kind = 0) const override;		unsigned Kind = 0) const override;
const TargetRegisterClass *		const TargetRegisterClass *
getCrossCopyRegClass(const TargetRegisterClass *RC) const override;		getCrossCopyRegClass(const TargetRegisterClass *RC) const override;

Show All 35 Lines

lib/Target/AArch64/AArch64RegisterInfo.cpp

Show First 20 Lines • Show All 108 Lines • ▼ Show 20 Lines	AArch64RegisterInfo::getThisReturnPreservedMask(const MachineFunction &MF,
// single i64 return value)		// single i64 return value)
//		//
// In case that the calling convention does not use the same register for		// In case that the calling convention does not use the same register for
// both, the function should return NULL (does not currently apply)		// both, the function should return NULL (does not currently apply)
assert(CC != CallingConv::GHC && "should not be GHC calling convention.");		assert(CC != CallingConv::GHC && "should not be GHC calling convention.");
return CSR_AArch64_AAPCS_ThisReturn_RegMask;		return CSR_AArch64_AAPCS_ThisReturn_RegMask;
}		}

		const uint32_t *AArch64RegisterInfo::getDarwinStackProbePreservedMask() const {
		return CSR_AArch64_StackProbe_Darwin_RegMask;
		}

BitVector		BitVector
AArch64RegisterInfo::getReservedRegs(const MachineFunction &MF) const {		AArch64RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
const AArch64FrameLowering *TFI = getFrameLowering(MF);		const AArch64FrameLowering *TFI = getFrameLowering(MF);

// FIXME: avoid re-calculating this every time.		// FIXME: avoid re-calculating this every time.
BitVector Reserved(getNumRegs());		BitVector Reserved(getNumRegs());
markSuperRegs(Reserved, AArch64::WSP);		markSuperRegs(Reserved, AArch64::WSP);
markSuperRegs(Reserved, AArch64::WZR);		markSuperRegs(Reserved, AArch64::WZR);
▲ Show 20 Lines • Show All 320 Lines • Show Last 20 Lines

test/CodeGen/AArch64/arm64-stack-probing.ll

This file was added.

				; RUN: llc < %s -mtriple=arm64-apple-darwin -verify-machineinstrs -stack-probe | FileCheck %s
				; RUN: llc < %s -mtriple=arm64-apple-darwin -verify-machineinstrs | FileCheck --check-prefix=DISABLED %s
				target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
				target triple = "arm64-apple-darwin"

				declare i32 @use_ptr(i32*) #1

				; Expect a probe here due to static object size > 4096.
				; Function Attrs: noinline nounwind optnone uwtable
				define void @static_test1_probe() #0 {
				; CHECK-LABEL: static_test1_probe
				; CHECK: orr w9, wzr, #0x4000
				; CHECK-NEXT: bl ____chkstk_darwin
				; CHECK-NEXT: sub sp, sp, #4, lsl #12
				; DISABLED-NOT: bl ____chkstk_darwin
				%1 = alloca [4096 x i32], align 4
				%2 = getelementptr inbounds [4096 x i32], [4096 x i32]* %1, i32 0, i32 0
				%3 = call i32 @use_ptr(i32* %2)
				ret void
				}

				; Stack size should be less than 4k, no probe.
				; Function Attrs: noinline nounwind optnone uwtable
				define void @static_test2_small() #0 {
				; CHECK-LABEL: static_test2_small
				; CHECK-NOT: bl ____chkstk_darwin
				%1 = alloca [64 x i32], align 4
				%2 = getelementptr inbounds [64 x i32], [64 x i32]* %1, i32 0, i32 0
				%3 = call i32 @use_ptr(i32* %2)
				ret void
				}

				@g = common local_unnamed_addr global i32* null, align 8

				; Check that the LR is saved in the prolog for static allocas when it isn't
				; otherwise saved as a normal callee-save reg.
				; Function Attrs: nounwind optsize ssp uwtable
				define void @test_static_oversize_nocsr(i32* nocapture readnone) local_unnamed_addr #0 {
				; CHECK-LABEL: test_static_oversize_nocsr
				; CHECK: stp x28, x27, [sp, #-16]!
				; CHECK-NEXT: str x30, [sp, #-16]!
				; CHECK-NEXT: mov w9, #8000
				; CHECK-NEXT: bl ____chkstk_darwin
				; CHECK-NEXT: ldr x30, [sp], #16
				%2 = alloca [2000 x i32], align 4
				%3 = bitcast [2000 x i32]* %2 to i32*
				store i32* %3, i32** @g, align 8
				ret void
				}

				; Test dynamic sized allocas.
				; Function Attrs: nounwind optsize ssp uwtable
				define void @test_dynamic(i32* nocapture readnone, i64 %num) local_unnamed_addr #0 {
				; CHECK-LABEL: test_dynamic
				; CHECK: add x8, x1, #15
				; CHECK-NEXT: and x8, x8, #0xfffffffffffffff0
				; CHECK-NEXT: mov x9, x8
				; CHECK-NEXT: bl ____chkstk_darwin
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: subs x8, x9, x8
				%2 = alloca i8, i64 %num, align 16
				%3 = bitcast i8* %2 to i32*
				store i32* %3, i32** @g, align 8
				ret void
				}


				attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cyclone" "target-features"="+crypto,+fp-armv8,+neon,+zcm,+zcz" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cyclone" "target-features"="+crypto,+fp-armv8,+neon,+zcm,+zcz" "unsafe-fp-math"="false" "use-soft-float"="false" }
				MatzeB: Did you try removing all the function attributes to make the test simpler?
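
The sizes that appear in the static tests above follow from simple arithmetic, sketched here. The names `arrayBytes` and `wouldProbe` are illustrative only, and the exact comparison used by the patch at the 4096-byte boundary is an assumption based on the test comments ("static object size > 4096", "less than 4k").

```cpp
#include <cassert>
#include <cstdint>

// 4 KiB threshold taken from the test comments; whether the patch compares
// with > or >= at exactly 4096 bytes is an assumption.
constexpr std::uint64_t kProbeThreshold = 4096;

// Bytes occupied by an alloca of `count` elements of `elemBytes` each.
constexpr std::uint64_t arrayBytes(std::uint64_t count, std::uint64_t elemBytes) {
  return count * elemBytes;
}

// Hypothetical predicate: does a static allocation of this size get a probe?
constexpr bool wouldProbe(std::uint64_t bytes) {
  return bytes > kProbeThreshold;
}

// [4096 x i32] -> 16384 bytes, the #0x4000 loaded into w9 in static_test1_probe.
static_assert(arrayBytes(4096, 4) == 0x4000, "matches orr w9, wzr, #0x4000");
// [2000 x i32] -> 8000 bytes, the value in "mov w9, #8000".
static_assert(arrayBytes(2000, 4) == 8000, "matches mov w9, #8000");
// [64 x i32] -> 256 bytes, below the threshold, so no probe is expected.
static_assert(!wouldProbe(arrayBytes(64, 4)), "small frame, no probe");
```

A real frame can also contain callee-saved registers and spill slots, so the value passed in w9 need not always equal the array size as it does in these tests.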
