This is an archive of the discontinued LLVM Phabricator instance.

Differential D96004

[AArch64] Stack probing for function prologues
Needs Review · Public

Authored by ostannard on Feb 4 2021, 2:27 AM.

Details

Reviewers

serge-sans-paille
jnspaulsson
bzEq
tnfchris
efriedma
jonpa
aemerson

Summary

This adds code to AArch64 function prologues to protect against stack
clash attacks by probing (writing to) the stack at regular enough
intervals to ensure that the guard page cannot be skipped over.

There are multiple probing sequences that can be emitted, depending on
the size of the stack allocation:

A straight-line sequence of subtracts and stores, used when the allocation size is smaller than 3 guard pages.
A loop allocating and probing one page size per iteration, plus a single probe to deal with the remainder, used when the allocation size is larger but still known at compile time.
A loop which moves the SP down to the target value held in a register, used when the allocation size is not known at compile-time, such as when allocating space for SVE values, or when over-aligning the stack. This is emitted in AArch64InstrInfo because it will also be used for dynamic allocas in a future patch.
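The choice among the three sequences can be sketched as follows. This is an illustrative model rather than the patch's code: `ProbeSequence`, `chooseProbeSequence`, and the use of `std::nullopt` to stand for an unknown size are names assumed for this example. Only the 3-guard-page threshold comes from the summary.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical names for the three sequences described in the summary.
enum class ProbeSequence { StraightLine, FixedLoop, VariableLoop };

// AllocSize is std::nullopt when the size is not known at compile time
// (for example SVE objects or an over-aligned stack).
inline ProbeSequence chooseProbeSequence(std::optional<int64_t> AllocSize,
                                         int64_t GuardSize) {
  assert(GuardSize > 0 && "guard size must be positive");
  if (!AllocSize)
    return ProbeSequence::VariableLoop; // target SP held in a register
  if (*AllocSize < 3 * GuardSize)
    return ProbeSequence::StraightLine; // subtracts and stores, no loop
  return ProbeSequence::FixedLoop;      // one guard-size block per iteration
}
```

With a 4KiB guard, an 8KiB frame would take the straight-line form, while a 12KiB frame would already use the loop.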

By default, the stack guard size is 4KiB, which is a safe default as this is
the smallest possible page size for AArch64. Linux uses a 64KiB guard for
AArch64, so this can be overridden by the stack-probe-size function attribute.
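To see why the guard size matters, here is a small model of the fixed-size loop with its trailing remainder probe. It is a sketch under assumptions: `probeOffsets` is a hypothetical helper, and it assumes each probe is a store at the new SP right after each allocation step.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the distance below the incoming SP at which each probe is written,
// for a loop that allocates one guard-size block per iteration and then
// handles the remainder with a single extra probe.
inline std::vector<int64_t> probeOffsets(int64_t AllocSize, int64_t GuardSize) {
  assert(AllocSize >= 0 && GuardSize > 0);
  std::vector<int64_t> Offsets;
  int64_t Allocated = 0;
  while (AllocSize - Allocated >= GuardSize) {
    Allocated += GuardSize;       // sub sp, sp, #GuardSize
    Offsets.push_back(Allocated); // str xzr, [sp]
  }
  if (Allocated != AllocSize) {
    Allocated = AllocSize;        // allocate the remainder
    Offsets.push_back(Allocated); // single trailing probe
  }
  return Offsets;
}
```

In this model a 131584-byte frame needs 33 probes with a 4KiB guard but only 3 with a 64KiB guard, and no two consecutive probes are ever more than one guard size apart.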

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ostannard created this revision. Feb 4 2021, 2:27 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Feb 4 2021, 2:27 AM

ostannard requested review of this revision. Feb 4 2021, 2:27 AM

Herald added a project: Restricted Project. Feb 4 2021, 2:27 AM

ostannard added a child revision: D96005: [AArch64] Stack probing for dynamic allocas in SelectionDAG. Feb 4 2021, 2:29 AM

Harbormaster completed remote builds in B87854: Diff 321352. Feb 4 2021, 3:35 AM

lkail added a reviewer: efriedma. Feb 5 2021, 2:14 AM

lkail added a subscriber: lkail.

lkail added a reviewer: jonpa. Feb 5 2021, 2:18 AM

ostannard updated this revision to Diff 321686. Feb 5 2021, 2:37 AM

Ping.

alex added a subscriber: alex. Feb 22 2021, 4:32 PM

Ping

@ostannard I've done the best I can for this review, but I have no ARM background, so I can't be sure about the arch-specific parts. @kristof.beyls, can you please have a look?

llvm/lib/Target/AArch64/AArch64FrameLowering.h
153	Why did you set that one virtual?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16629	Should you warn at some point if another (unsupported) scheme is used?
16657	Can you give any hint about why 1024 was chosen, and why it's independent from `getStackProbeSize` ?

Rebase
Check for unsupported stack probing methods

ostannard planned changes to this revision. Mar 9 2021, 9:00 AM

ostannard added inline comments.

llvm/lib/Target/AArch64/AArch64FrameLowering.h
153	It's an override of a virtual function called by the target-independent part of the prologue/epilogue pass.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16657	This matches GCC's ABI, which doesn't appear to be explicitly documented anywhere, other than in comments in the GCC code: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/aarch64.c#L8190

ostannard requested review of this revision. Mar 9 2021, 9:01 AM

Harbormaster completed remote builds in B92867: Diff 329317. Mar 9 2021, 3:00 PM

Thanks for this patch Oliver!
I've started reviewing, but am not very far yet.
I thought I'd share my thoughts so far already, as I'm not sure exactly when I'll be able to do a complete review.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1351	I know this is not your original code, but I'd be tempted to go for more descriptive names than AllocateBefore and AllocateAfter. It's been a little while since I looked at the FrameLowering code, and from the names alone, I could not guess what these stack offsets were meant to represent. Maybe I'll have a suggestion for a better name later on after I've read more of the patch. I think it would also make this code easier to read if there were a comment above it explaining that this part of the code handles SVE CSRs and SVE locals (i.e. the SVE area of the stack frame).
1351	I'm a bit confused by this code change (probably partly due to not really understanding what these variables are meant to represent). Does the change on this line indicate that there was a bug here before? If so, would it be better to commit that separately? If not, does the code change alter the meaning of AllocateBefore and AllocateAfter?
1382	This line of code reminds me that frame setup can also happen in non-entry basic blocks, when shrink wrapping has happened. Assuming I checked correctly, there are no test cases in the tests in this patch verifying correct behavior in case shrink wrapping happens. I'm not sure if it's worthwhile to have such tests, but I thought I'd check if you thought about this.
1398–1400	This seems to be a key heuristic/logic on when to generate stack probes and when not. I think it would be useful to have some light design documentation/rationale on why this is the right heuristic. Depending on the length of that documentation, it could go either here, in the large comment at the top of this file or somewhere else (potentially as a target-independent design doc in the docs directory)? I don't think the design doc has to be particularly long, but a small doc probably will bring a lot of value for future developers needing to touch this code.
llvm/lib/Target/AArch64/AArch64FrameLowering.h
153	I think the keyword virtual is not needed when override is specified. There are other examples of override methods in this class; they do not explicitly write virtual. So I guess it's more consistent to not use virtual here?
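The point about `virtual` and `override` can be checked with a minimal example. The class names here are hypothetical stand-ins, not the LLVM classes: a member function marked `override` is virtual because the base declaration is, so repeating `virtual` adds nothing.

```cpp
#include <cassert>

// Hypothetical stand-in for the target-independent base class.
struct FrameLoweringBase {
  virtual ~FrameLoweringBase() = default;
  virtual int probeKind() const { return 0; }
};

// Hypothetical stand-in for the target-specific subclass.
struct TargetFrameLoweringImpl : FrameLoweringBase {
  // No `virtual` keyword here: `override` already requires that this
  // function overrides a virtual function of the base class.
  int probeKind() const override { return 1; }
};

// Calls through a base reference, as a target-independent pass would.
inline int callThroughBase(const FrameLoweringBase &FL) {
  return FL.probeKind();
}
```

The call through the base reference still dispatches to the derived implementation.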

Hi @ostannard ,

I don't know enough about LLVM to comment on the actual code so I will only comment on the output I see generated from the testcases.

From the testcases (like static_1024) I can see that you probe when there is more than 1k of incoming stack arguments.
For GCC this is guard-page - 1k. The reasoning is that with any outgoing argument larger than 1k we would probe such that we maintain the invariant, but probing that 1k means we have a whole guard-size - 1k left that we can use without probing. These sizes were chosen as they cover about 99% of all programs (for a subset of all :)).

So the idea is to minimize the number of probes required.
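The two thresholds being compared can be written down as a small sketch. The function names are hypothetical, the use of `>=` for the GCC rule is an assumption (the comment only says "guard-page - 1k"), and the 1024-byte constant is assumed to be what `getStackProbeMaxUnprobedStack` returns in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Buffer of unprobed bytes assumed by GCC's informal ABI.
constexpr int64_t MaxUnprobed = 1024;

// Threshold as the patch is described in this review: probe once the
// allocation reaches 1KiB, regardless of the guard size.
constexpr bool patchProbes(int64_t FrameSize) {
  return FrameSize >= MaxUnprobed;
}

// Threshold as described for GCC: probe once the allocation reaches the
// guard size minus 1KiB.
constexpr bool gccProbes(int64_t FrameSize, int64_t GuardSize) {
  return FrameSize >= GuardSize - MaxUnprobed;
}
```

Under these assumptions a 1040-byte frame with a 64KiB guard is probed by the patch but not by GCC, while a 65792-byte frame is probed by both.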

As such, this is what GCC generates for these cases:

int probe (int x)
{
  char arr[64 * 1028];
  return arr[x];
}

probe:
        sub     sp, sp, #65536
        str     xzr, [sp, 1024]
        sub     sp, sp, #256
        ldrb    w0, [sp, w0, sxtw]
        add     sp, sp, 256
        add     sp, sp, 65536
        ret

int no_probe (int x)
{
  char arr[1028];
  return arr[x];
}

no_probe:
        sub     sp, sp, #1040
        add     x1, sp, 8
        ldrb    w0, [x1, w0, sxtw]
        add     sp, sp, 1040
        ret

For 64k probe sizes.

The other difference is where we probe as well. You seem to be probing at SP but we probe at SP + 1k.

This is because if, say, you were at the 1k boundary and allocated one guard size worth of incoming args, you could skip a page. So we probe 1k up to ensure you touch the pages as you go.

For the alloca cases,

I noticed you don't have a testcase for alloca(n) where n is a variable. Also, how does it handle alloca(0)?

I also notice that you only set the CFA after the loop has finished.

In GCC we temporarily define the CFA in terms of a different register, which holds the final value SP is expected to have after the loop.
After the loop we switch it back, so we say beforehand which value it's going to be.

i.e.

.LFB0:
        .cfi_startproc
        sub     x12, sp, #1310720
        .cfi_def_cfa 12, 1310720
.LPSRL0:
        sub     sp, sp, 65536
        str     xzr, [sp, 1024]
        cmp     sp, x12
        b.ne    .LPSRL0
        .cfi_def_cfa_register 31
        sub     sp, sp, #512
        .cfi_def_cfa_offset 1311232
        ldrb    w0, [sp, w0, sxtw]
        add     sp, sp, 512
        .cfi_def_cfa_offset 1310720
        add     sp, sp, 1310720
        .cfi_def_cfa_offset 0
        ret
        .cfi_endproc
.LFE0:
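The effect of the temporary CFA register in the listing above can be modelled with a short simulation. This is only a sketch: `LoopCfaCheck` and `simulateProbeLoop` are hypothetical names, and the model assumes the CFA equals the SP value at function entry.

```cpp
#include <cassert>
#include <cstdint>

// Counts, over the probing loop, how often each candidate CFA rule is wrong.
struct LoopCfaCheck {
  int Steps = 0;        // loop iterations executed
  int SpRuleWrong = 0;  // iterations where the entry rule CFA = SP + 0 is stale
  int X12RuleWrong = 0; // iterations where CFA = x12 + FrameSize is wrong
};

inline LoopCfaCheck simulateProbeLoop(uint64_t EntrySP, uint64_t FrameSize,
                                      uint64_t GuardSize) {
  assert(GuardSize != 0 && FrameSize % GuardSize == 0 && FrameSize <= EntrySP);
  LoopCfaCheck R;
  const uint64_t CFA = EntrySP;             // CFA at function entry
  const uint64_t X12 = EntrySP - FrameSize; // sub x12, sp, #FrameSize
  uint64_t SP = EntrySP;
  while (SP != X12) {
    SP -= GuardSize; // sub sp, sp, #GuardSize (followed by the probe store)
    ++R.Steps;
    if (SP + 0 != CFA)
      ++R.SpRuleWrong; // an unwinder using the entry rule would be misled
    if (X12 + FrameSize != CFA)
      ++R.X12RuleWrong; // the x12-based rule never changes during the loop
  }
  return R;
}
```

For the 1310720-byte frame in the listing, the loop runs 20 times. The SP-based entry rule is stale in every iteration, whereas the x12-based rule stays correct throughout, which is what matters for asynchronous unwinding.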

siddhesh added a subscriber: siddhesh. Mar 16 2021, 8:26 AM

Can you please pre-commit the tests so that it is easier to see how the codegen changes? E.g. I suspect the CFI directives in prologue are already broken before your changes and so my comments aren't super relevant.

llvm/test/CodeGen/AArch64/stack-probing-64k.ll
31	This should come directly after the `sub` I believe. Otherwise the stack offsets would be incorrect while the `str x29, [sp, #256]` is being executed. (Not sure if preexisting, though)
32	Is this right? We spilled 8 bytes, but specify the offset for the 32-bit view of the register.
71	Similarly, this should come immediately after the `sub` instruction, otherwise the CFI won't describe the stack accurately during the `str` above.

nagisa added inline comments. Mar 21 2021, 6:58 AM

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1382	I believe this would fail or produce incorrect results for functions that both have the `no_caller_saved_registers` attribute and request the inline stack probes, right?

For GCC this is guard-page - 1k. The reasoning is that with any outgoing argument larger than 1k we would probe such that we maintain the invariant, but probing that 1k means we have a whole guard-size - 1k left that we can use without probing. These sizes were chosen as they cover about 99% of all programs (for a subset of all :)).

As an addendum, as I mentioned above, wrt the outgoing argument not being larger than 1k. That's the buffer we guarantee. We're able to do so because during a function call the storing of LR counts as an implicit probe. So in order for this scheme to be secure you'd need to check that LLVM (like GCC) always stores LR, even for no-return leaf functions.

emaste added a subscriber: emaste. Aug 23 2021, 12:02 PM

Herald added a subscriber: ctetreau. Aug 23 2021, 12:02 PM

tnfchris mentioned this in D96005: [AArch64] Stack probing for dynamic allocas in SelectionDAG. Oct 13 2021, 3:49 PM

cuviper added a subscriber: cuviper. Sep 27 2022, 9:48 AM

cuviper added inline comments.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1382	"This line of code reminds me that frame setup can also happen in non-entry basic blocks, when shrink wrapping has happened." FWIW, `TargetFrameLowering::canUseAsPrologue` can be used to block that if needed. (I just used that to fix an X86 bug where stack probes clobbered EFLAGS, D134494.)

Herald added a project: Restricted Project. Sep 27 2022, 9:48 AM

Matt added a subscriber: Matt. Sep 27 2022, 11:36 AM

What's the status of this patch? Were the review comments addressed, or is this still waiting for someone to address them?

In D96004#3992651, @efriedma wrote:

What's the status of this patch? Were the review comments addressed, or is this still waiting for someone to address them?

Current plans are I will be picking this up.

ab added a reviewer: aemerson. Feb 21 2023, 11:13 AM

I am resuming Oliver's work; I will address the current reviewers' remarks and send a newer version.

efriedma mentioned this in D40863: [AArch64][Darwin] Implement stack probing for static and dynamic stack objects. Apr 19 2023, 12:45 PM

oskarwirga added a subscriber: oskarwirga. Apr 19 2023, 2:10 PM

varunkumare99 mentioned this in D154911: Enabling fstack_clash_protection for arm32 bit, thumb and thumb2 mode. Jul 10 2023, 5:56 PM

I've spent some time myself working on rebasing this stack and getting it green. I'm currently running into a runtime crash in libunwind where parseFDEInstructions is called over and over, exhausting stack memory. My current best guess is that the stack probing is corrupting the call frame information in a way that causes a loop in libunwind. If anyone has any suggestions of where to look or how to debug this, I would appreciate it :)

I can think of the following possibilities:

The stack is getting adjusted by the wrong amount (i.e. the code with stack probes is subtracting a different total offset from the stack pointer compared to the code without probes).
The code is trying to asynchronously unwind the stack (i.e. unwind while the stack probing loop is running), and there isn't a frame pointer. This patch might need to be reworked a bit to support async unwind.

I've started a new review of this patch here: https://reviews.llvm.org/D158084
I believe I've addressed most of the comments in this review (except one?).

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1351	I've added more descriptive names in the new patch. As far as I can tell, there's no bug; the code is just a little convoluted. I've rearranged it in the new patch (it could just as well be a separate NFC/refactoring patch).
1382	In the new patch I've added a check for an available scratch register in `canUseAsPrologue`.
1398–1400	TBH, the main reason for this logic is compatibility with GCC (let's call it an informal ABI). GCC expects at most 1024 unprobed bytes above the stack, hence if we allocate more than that we need to issue a probe. The number 1024 comes from an analysis of SPEC frame size distributions and was chosen such that a great number of stack frames (of size < 3k or so, for 4k pages) won't need to perform a probe at all (for GCC). As far as I can understand, these considerations are not really relevant for LLVM, since it stores `x29` for frames greater than 240 bytes anyway.
llvm/lib/Target/AArch64/AArch64FrameLowering.h
153	Keyword `virtual` removed.
llvm/test/CodeGen/AArch64/stack-probing-64k.ll
31	This patch predates the support for asynchronous unwinding. In the new patch it's handled correctly, I believe.
32	Yes, it's correct, the DWARF encoding of register names does not encode separately `xN` and `wN`.
71	Also fixed.
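The remark about `xN` and `wN` sharing a DWARF register number can be illustrated with a small lookup. `dwarfRegNum` is a hypothetical helper written for this example, not an LLVM API. It relies on the AArch64 DWARF numbering, in which the general-purpose registers x0 to x30 are numbered 0 to 30 and there is no separate number for the 32-bit view.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Maps an AArch64 general-purpose register name to its DWARF register number.
// x0..x30 map to 0..30; a wN name is only the 32-bit view of the same
// register, so it resolves to the same number. Other names are rejected.
inline std::optional<unsigned> dwarfRegNum(std::string_view Name) {
  if (Name.size() < 2 || Name.size() > 3)
    return std::nullopt;
  if (Name[0] != 'x' && Name[0] != 'w')
    return std::nullopt;
  unsigned N = 0;
  for (char C : Name.substr(1)) {
    if (C < '0' || C > '9')
      return std::nullopt;
    N = N * 10 + static_cast<unsigned>(C - '0');
  }
  if (N > 30)
    return std::nullopt;
  return N;
}
```

So a CFI directive that names `w29` describes the same saved register as one that names `x29`.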

chill mentioned this in D158084: [AArch64] Stack probing for function prologues. Aug 16 2023, 7:47 AM

Revision Contents

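Path	Size
llvm/lib/Target/AArch64/AArch64FrameLowering.h	12 lines
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp	220 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	11 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	39 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.h	8 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp	84 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	15 lines
llvm/test/CodeGen/AArch64/stack-probing-64k.ll	289 lines
llvm/test/CodeGen/AArch64/stack-probing-sve.ll	351 lines
llvm/test/CodeGen/AArch64/stack-probing.ll	291 lines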

Diff 329317

llvm/lib/Target/AArch64/AArch64FrameLowering.h

[...]	int64_t assignSVEStackObjectOffsets(MachineFrameInfo &MF,
int &MaxCSFrameIndex) const;		int &MaxCSFrameIndex) const;
MCCFIInstruction		MCCFIInstruction
createDefCFAExpressionFromSP(const TargetRegisterInfo &TRI,		createDefCFAExpressionFromSP(const TargetRegisterInfo &TRI,
const StackOffset &OffsetFromSP) const;		const StackOffset &OffsetFromSP) const;
MCCFIInstruction createCfaOffset(const TargetRegisterInfo &MRI, unsigned DwarfReg,		MCCFIInstruction createCfaOffset(const TargetRegisterInfo &MRI, unsigned DwarfReg,
const StackOffset &OffsetFromDefCFA) const;		const StackOffset &OffsetFromDefCFA) const;
bool shouldCombineCSRLocalStackBumpInEpilogue(MachineBasicBlock &MBB,		bool shouldCombineCSRLocalStackBumpInEpilogue(MachineBasicBlock &MBB,
unsigned StackBumpBytes) const;		unsigned StackBumpBytes) const;

		/// Replace a StackProbe stub (if any) with the actual probe code inline
		virtual void inlineStackProbe(MachineFunction &MF,
		MachineBasicBlock &PrologueMBB) const override;
		MachineBasicBlock::iterator
		inlineStackProbeFixed(MachineBasicBlock::iterator MBBI) const;
		MachineBasicBlock::iterator
		inlineStackProbeVar(MachineBasicBlock::iterator MBBI) const;
		MachineBasicBlock::iterator
		inlineStackProbeLoopExactMultiple(MachineBasicBlock::iterator MBBI,
		int64_t NegProbeSize,
		Register TargetReg) const;
};		};

} // End llvm namespace		} // End llvm namespace

#endif		#endif

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp

void AArch64FrameLowering::emitPrologue(MachineFunction &MF,		void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
MachineBasicBlock &MBB) const {		MachineBasicBlock &MBB) const {
MachineBasicBlock::iterator MBBI = MBB.begin();		MachineBasicBlock::iterator MBBI = MBB.begin();
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
const Function &F = MF.getFunction();		const Function &F = MF.getFunction();
const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();		const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();		const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
		const AArch64TargetLowering &TLI = *Subtarget.getTargetLowering();
const TargetInstrInfo *TII = Subtarget.getInstrInfo();		const TargetInstrInfo *TII = Subtarget.getInstrInfo();
MachineModuleInfo &MMI = MF.getMMI();		MachineModuleInfo &MMI = MF.getMMI();
AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
bool needsFrameMoves =		bool needsFrameMoves =
MF.needsFrameMoves() && !MF.getTarget().getMCAsmInfo()->usesWindowsCFI();		MF.needsFrameMoves() && !MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
bool HasFP = hasFP(MF);		bool HasFP = hasFP(MF);
bool NeedsWinCFI = needsWinCFI(MF);		bool NeedsWinCFI = needsWinCFI(MF);
bool HasWinCFI = false;		bool HasWinCFI = false;
[...]	if (NeedsWinCFI) {
HasWinCFI = true;		HasWinCFI = true;
BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))		BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
.addImm(NumBytes)		.addImm(NumBytes)
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}
NumBytes = 0;		NumBytes = 0;
}		}

StackOffset AllocateBefore = SVEStackSize, AllocateAfter = {};		StackOffset AllocateBefore = {}, AllocateAfter = SVEStackSize;
MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;		MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;

// Process the SVE callee-saves to determine what space needs to be		// Process the SVE callee-saves to determine what space needs to be
// allocated.		// allocated.
if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {		if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
		LLVM_DEBUG(dbgs() << "SVECalleeSavedStackSize = " << CalleeSavedSize << "\n");
// Find callee save instructions in frame.		// Find callee save instructions in frame.
CalleeSavesBegin = MBBI;		CalleeSavesBegin = MBBI;
assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");		assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())		while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
++MBBI;		++MBBI;
CalleeSavesEnd = MBBI;		CalleeSavesEnd = MBBI;

AllocateBefore = StackOffset::getScalable(CalleeSavedSize);		AllocateBefore = StackOffset::getScalable(CalleeSavedSize);
AllocateAfter = SVEStackSize - AllocateBefore;		AllocateAfter = SVEStackSize - AllocateBefore;
}		}
		LLVM_DEBUG(dbgs() << "AllocateBefore = " << AllocateBefore.getFixed() << ", "
// Allocate space for the callee saves (if any).		<< AllocateBefore.getScalable() << "\n");
		LLVM_DEBUG(dbgs() << "AllocateAfter = " << AllocateAfter.getFixed() << ", "
		<< AllocateAfter.getScalable() << "\n");

		// Allocate space for the SVE callee saves (if any).
		// This space doesn't need stack probing, because it will all be written to
		// when saving the CSRs.
emitFrameOffset(MBB, CalleeSavesBegin, DL, AArch64::SP, AArch64::SP,		emitFrameOffset(MBB, CalleeSavesBegin, DL, AArch64::SP, AArch64::SP,
-AllocateBefore, TII,		-AllocateBefore, TII,
MachineInstr::FrameSetup);		MachineInstr::FrameSetup);

// Finally allocate remaining SVE stack space.		// Finally allocate remaining SVE stack space.
		if (TLI.hasInlineStackProbe(MF) && AllocateAfter) {
		Register ScratchReg = findScratchNonCalleeSaveRegister(&MBB);
		assert(ScratchReg != AArch64::NoRegister);
		emitFrameOffset(MBB, CalleeSavesEnd, DL, ScratchReg, AArch64::SP,
		-AllocateAfter, TII, MachineInstr::FrameSetup);
		BuildMI(MBB, MBBI, DL, TII->get(AArch64::PROBED_STACKALLOC_VAR))
		.addUse(ScratchReg);
		} else {
emitFrameOffset(MBB, CalleeSavesEnd, DL, AArch64::SP, AArch64::SP,		emitFrameOffset(MBB, CalleeSavesEnd, DL, AArch64::SP, AArch64::SP,
-AllocateAfter, TII,		-AllocateAfter, TII, MachineInstr::FrameSetup);
MachineInstr::FrameSetup);		}

// Allocate space for the rest of the frame.		// Allocate space for the rest of the frame.
if (NumBytes) {		if (NumBytes) {
// Alignment is required for the parent frame, not the funclet		// Alignment is required for the parent frame, not the funclet
const bool NeedsRealignment =		const bool NeedsRealignment =
!IsFunclet && RegInfo->needsStackRealignment(MF);		!IsFunclet && RegInfo->needsStackRealignment(MF);
		bool NeedsStackProbe = TLI.hasInlineStackProbe(MF) &&
		(NumBytes >= TLI.getStackProbeMaxUnprobedStack(MF) ||
		MFI.hasVarSizedObjects());
		if (NeedsRealignment)
		NeedsStackProbe |= TLI.hasInlineStackProbe(MF) &&
		(NumBytes + MFI.getMaxAlign().value()) >= TLI.getStackProbeMaxUnprobedStack(MF);
unsigned scratchSPReg = AArch64::SP;		unsigned scratchSPReg = AArch64::SP;

if (NeedsRealignment) {		if (NeedsRealignment) {
scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);		scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);
assert(scratchSPReg != AArch64::NoRegister);		assert(scratchSPReg != AArch64::NoRegister);
}		}

// If we're a leaf function, try using the red zone.		// If we're a leaf function, try using the red zone.
if (!canUseRedZone(MF))		if (!canUseRedZone(MF)) {
// FIXME: in the case of dynamic re-alignment, NumBytes doesn't have		// FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
// the correct value here, as NumBytes also includes padding bytes,		// the correct value here, as NumBytes also includes padding bytes,
// which shouldn't be counted here.		// which shouldn't be counted here.
		if (NeedsStackProbe && !NeedsRealignment) {
		// If we don't need to re-align the stack, we can use a more efficient
		// sequence for stack probing.
		Register ScratchReg = findScratchNonCalleeSaveRegister(&MBB);
		assert(ScratchReg != AArch64::NoRegister);
		BuildMI(MBB, MBBI, DL,
		TII->get(AArch64::PROBED_STACKALLOC))
		.addDef(ScratchReg)
		.addImm(-NumBytes);
		} else {
emitFrameOffset(MBB, MBBI, DL, scratchSPReg, AArch64::SP,		emitFrameOffset(MBB, MBBI, DL, scratchSPReg, AArch64::SP,
StackOffset::getFixed(-NumBytes), TII,		StackOffset::getFixed(-NumBytes), TII,
MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);		MachineInstr::FrameSetup, false, NeedsWinCFI,
		&HasWinCFI);
		}
		}

if (NeedsRealignment) {		if (NeedsRealignment) {
const unsigned NrBitsToZero = Log2(MFI.getMaxAlign());		const unsigned NrBitsToZero = Log2(MFI.getMaxAlign());
assert(NrBitsToZero > 1);		assert(NrBitsToZero > 1);
assert(scratchSPReg != AArch64::SP);		assert(scratchSPReg != AArch64::SP);

// SUB X9, SP, NumBytes		// SUB X9, SP, NumBytes
// -- X9 is temporary register, so shouldn't contain any live data here,		// -- X9 is temporary register, so shouldn't contain any live data here,
// -- free to use. This is already produced by emitFrameOffset above.		// -- free to use. This is already produced by emitFrameOffset above.
// AND SP, X9, 0b11111...0000		// AND SP, X9, 0b11111...0000
// The logical immediates have a non-trivial encoding. The following		// The logical immediates have a non-trivial encoding. The following
// formula computes the encoded immediate with all ones but		// formula computes the encoded immediate with all ones but
// NrBitsToZero zero bits as least significant bits.		// NrBitsToZero zero bits as least significant bits.
uint32_t andMaskEncoded = (1 << 12) // = N		uint32_t andMaskEncoded = (1 << 12) // = N
| ((64 - NrBitsToZero) << 6) // immr		| ((64 - NrBitsToZero) << 6) // immr
| ((64 - NrBitsToZero - 1) << 0); // imms		| ((64 - NrBitsToZero - 1) << 0); // imms

		if (NeedsStackProbe) {
		BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), scratchSPReg)
		.addReg(scratchSPReg, RegState::Kill)
		.addImm(andMaskEncoded);
		BuildMI(MBB, MBBI, DL, TII->get(AArch64::PROBED_STACKALLOC_VAR))
		.addUse(scratchSPReg);
		} else {
BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)		BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
.addReg(scratchSPReg, RegState::Kill)		.addReg(scratchSPReg, RegState::Kill)
.addImm(andMaskEncoded);		.addImm(andMaskEncoded);
		}
AFI->setStackRealigned(true);		AFI->setStackRealigned(true);
if (NeedsWinCFI) {		if (NeedsWinCFI) {
HasWinCFI = true;		HasWinCFI = true;
BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))		BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
.addImm(NumBytes & andMaskEncoded)		.addImm(NumBytes & andMaskEncoded)
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}
}		}
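
The immediate-encoding formula in the realignment code above can be checked in isolation. What follows is a minimal sketch and not part of the patch: `encodeAlignMask` repeats the formula from the diff, while `decodeLogicalImm64` is a hypothetical helper that only handles the N = 1 (64-bit element) case of the AArch64 logical-immediate encoding.

```cpp
#include <cassert>
#include <cstdint>

// Same formula as in the diff: N = 1, immr = 64 - k, imms = 64 - k - 1,
// where k (NrBitsToZero) is the number of low bits to clear.
uint32_t encodeAlignMask(unsigned NrBitsToZero) {
  assert(NrBitsToZero >= 1 && NrBitsToZero <= 63);
  return (1u << 12)                        // N
         | ((64 - NrBitsToZero) << 6)      // immr
         | ((64 - NrBitsToZero - 1) << 0); // imms
}

// Hypothetical decoder, restricted to N = 1: build (imms + 1) low ones,
// then rotate them right by immr within 64 bits.
uint64_t decodeLogicalImm64(uint32_t Encoded) {
  assert(((Encoded >> 12) & 1) == 1 && "sketch handles N = 1 only");
  unsigned Immr = (Encoded >> 6) & 0x3f;
  unsigned Imms = Encoded & 0x3f;
  assert(Imms < 63 && "an all-ones mask is not encodable");
  uint64_t Ones = (uint64_t(1) << (Imms + 1)) - 1;
  if (Immr == 0)
    return Ones;
  return (Ones >> Immr) | (Ones << (64 - Immr));
}
```

For an 8192-byte alignment (k = 13), decoding the encoded value yields 0xffffffffffffe000, which is the mask that appears in the `static_16_align_8192` test.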
		LLVM_DEBUG(dbgs() << "Final frame order:\n"; for (auto &Obj
dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;		dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
if (Obj.ObjectFirst)		if (Obj.ObjectFirst)
dbgs() << ", first";		dbgs() << ", first";
if (Obj.GroupFirst)		if (Obj.GroupFirst)
dbgs() << ", group-first";		dbgs() << ", group-first";
dbgs() << "\n";		dbgs() << "\n";
});		});
}		}

		/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
		/// least every NegProbeSize bytes. Returns an iterator of the first instruction
		/// after the loop. The difference between SP and TargetReg must be an exact
		/// multiple of NegProbeSize.
		MachineBasicBlock::iterator
		AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
		MachineBasicBlock::iterator MBBI, int64_t NegProbeSize,
		Register TargetReg) const {
		MachineBasicBlock &MBB = *MBBI->getParent();
		MachineFunction &MF = *MBB.getParent();
		const AArch64InstrInfo *TII =
		MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
		DebugLoc DL = MBB.findDebugLoc(MBBI);

		MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
		MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
		MF.insert(MBBInsertPoint, LoopMBB);
		MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
		MF.insert(MBBInsertPoint, ExitMBB);

		// ADD SP, SP, #NegProbeSize (or equivalent if NegProbeSize is not encodable
		// in ADD).
		emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
		StackOffset::getFixed(NegProbeSize), TII,
		MachineInstr::FrameSetup);
		// STR XZR, [SP]
		BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
		.addReg(AArch64::XZR)
		.addReg(AArch64::SP)
		.addImm(0)
		.setMIFlags(MachineInstr::FrameSetup);
		// CMP SP, TargetReg
		BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
		AArch64::XZR)
		.addReg(AArch64::SP)
		.addReg(TargetReg)
		.addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
		.setMIFlags(MachineInstr::FrameSetup);
		// B.NE Loop
		BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
		.addImm(AArch64CC::NE)
		.addMBB(LoopMBB)
		.setMIFlags(MachineInstr::FrameSetup);

		LoopMBB->addSuccessor(ExitMBB);
		LoopMBB->addSuccessor(LoopMBB);
		// Synthesize the exit MBB.
		ExitMBB->splice(ExitMBB->end(), &MBB, std::next(MBBI), MBB.end());
		ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
		MBB.addSuccessor(LoopMBB);
		// Update liveins.
		recomputeLiveIns(*LoopMBB);
		recomputeLiveIns(*ExitMBB);

		return ExitMBB->begin();
		}
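
The behaviour of the exact-multiple loop can be modelled outside the compiler. This is a hedged sketch rather than code from the patch: `probeLoopExactMultiple` is a hypothetical function that treats SP and the target as plain integers and records the address of each probe (the `STR XZR, [SP]`).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models the emitted loop:
//   Loop: SUB SP, SP, #ProbeSize ; STR XZR, [SP] ; CMP SP, Target ; B.NE Loop
// Returns the addresses that were probed, in order.
std::vector<uint64_t> probeLoopExactMultiple(uint64_t SP, uint64_t Target,
                                             int64_t NegProbeSize) {
  assert(NegProbeSize < 0 && SP > Target);
  uint64_t ProbeSize = uint64_t(-NegProbeSize);
  // The loop exits on equality only, so the distance must divide evenly.
  assert((SP - Target) % ProbeSize == 0 && "must be an exact multiple");
  std::vector<uint64_t> Probes;
  do {
    SP -= ProbeSize;      // move SP down by one probe interval
    Probes.push_back(SP); // probe the new top of stack
  } while (SP != Target); // B.NE back to the loop header
  return Probes;
}
```

The precondition matters because the loop tests for equality only: if the distance were not an exact multiple, SP would step past the target and the loop would not terminate.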

		MachineBasicBlock::iterator AArch64FrameLowering::inlineStackProbeFixed(
		MachineBasicBlock::iterator MBBI) const {
		MachineBasicBlock &MBB = *MBBI->getParent();
		MachineFunction &MF = *MBB.getParent();
		const AArch64TargetLowering *TLI =
		MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
		const AArch64InstrInfo *TII =
		MF.getSubtarget<AArch64Subtarget>().getInstrInfo();

		DebugLoc DL = MBB.findDebugLoc(MBBI);
		Register ScratchReg = MBBI->getOperand(0).getReg();
		int64_t NegFrameSize = MBBI->getOperand(1).getImm();
		int64_t NegProbeSize = -(int64_t)TLI->getStackProbeSize(MF);
		int64_t NumBlocks = NegFrameSize / NegProbeSize;
		int64_t NegResidualSize = NegFrameSize % NegProbeSize;
		MachineBasicBlock::iterator NextInst;
		LLVM_DEBUG(dbgs() << "Stack probing: total " << NegFrameSize << " bytes, " <<
		Lint: Pre-merge checks: clang-format: please reformat the code
		NumBlocks << " blocks of " << NegProbeSize << " bytes, plus " <<
		NegResidualSize << " bytes\n");

		if (NegResidualSize != 0) {
		// ADD SP, SP, #NegResidualSize (or equivalent if NegResidualSize is not
		// encodable in ADD).
		emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
		StackOffset::getFixed(NegResidualSize), TII, MachineInstr::FrameSetup);
		Lint: Pre-merge checks: clang-format: please reformat the code
		// STR XZR, [SP]
		BuildMI(MBB, MBBI, DL, TII->get(AArch64::STRXui))
		.addReg(AArch64::XZR)
		.addReg(AArch64::SP)
		.addImm(0)
		.setMIFlags(MachineInstr::FrameSetup);
		}

		if (NumBlocks < 3) {
		for (int i = 0; i < NumBlocks; ++i) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]
		// ADD SP, SP, #NegProbeSize (or equivalent if NegProbeSize is not
		// encodable in ADD).
		emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
		StackOffset::getFixed(NegProbeSize), TII, MachineInstr::FrameSetup);
		Lint: Pre-merge checks: clang-format: please reformat the code
		// STR XZR, [SP]
		BuildMI(MBB, MBBI, DL, TII->get(AArch64::STRXui))
		.addReg(AArch64::XZR)
		.addReg(AArch64::SP)
		.addImm(0)
		.setMIFlags(MachineInstr::FrameSetup);
		}
		NextInst = std::next(MBBI);
		} else if (NumBlocks != 0) {
		// ADD ScratchReg, SP, #(NegProbeSize * NumBlocks) (or equivalent if the
		// offset is not encodable in ADD).
		emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP,
		StackOffset::getFixed(NegProbeSize * NumBlocks), TII,
		MachineInstr::FrameSetup);
		NextInst =
		inlineStackProbeLoopExactMultiple(MBBI, NegProbeSize, ScratchReg);
		}

		MBBI->eraseFromParent();
		return NextInst;
		}
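
The arithmetic that `inlineStackProbeFixed` uses to choose between the straight-line and loop sequences can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `FixedProbePlan` and `planFixedProbe` are hypothetical names, and only the division, remainder and `NumBlocks < 3` test are taken from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the decisions made for a fixed-size allocation.
struct FixedProbePlan {
  int64_t NumBlocks;       // whole probe-sized blocks to allocate
  int64_t NegResidualSize; // leftover bytes (zero or negative), probed first
  bool UsesLoop;           // true when the loop form is chosen
};

FixedProbePlan planFixedProbe(int64_t NegFrameSize, int64_t NegProbeSize) {
  assert(NegFrameSize <= 0 && NegProbeSize < 0);
  FixedProbePlan Plan;
  // Both operands are negative, so the quotient is non-negative and the
  // remainder takes the sign of the dividend (zero or negative).
  Plan.NumBlocks = NegFrameSize / NegProbeSize;
  Plan.NegResidualSize = NegFrameSize % NegProbeSize;
  // Fewer than three blocks are unrolled; otherwise a loop is emitted.
  Plan.UsesLoop = Plan.NumBlocks >= 3;
  return Plan;
}
```

With a 64KiB probe size, a 197632-byte frame gives three blocks plus a 1024-byte residual and takes the loop path, whereas a 196592-byte frame gives two blocks plus a 65520-byte residual and is unrolled, matching the `static_197632` and `static_196592` tests.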

		MachineBasicBlock::iterator AArch64FrameLowering::inlineStackProbeVar(
		MachineBasicBlock::iterator MBBI) const {
		MachineBasicBlock &MBB = *MBBI->getParent();
		MachineFunction &MF = *MBB.getParent();
		const AArch64TargetLowering *TLI =
		MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
		const AArch64InstrInfo *TII =
		MF.getSubtarget<AArch64Subtarget>().getInstrInfo();

		DebugLoc DL = MBB.findDebugLoc(MBBI);
		Register TargetReg = MBBI->getOperand(0).getReg();
		int64_t NegProbeSize = -(int64_t)TLI->getStackProbeSize(MF);
		MachineBasicBlock::iterator NextInst = std::next(MBBI);

		NextInst = TII->insertStackProbingLoop(MBBI, NegProbeSize, TargetReg);

		MBBI->eraseFromParent();
		return NextInst;
		}

		void AArch64FrameLowering::inlineStackProbe(
		Lint: Pre-merge checks: clang-format: please reformat the code
		MachineFunction &MF, MachineBasicBlock &MBB) const {
		for (auto MBBI = MBB.begin(), E = MBB.end(); MBBI != E;) {
		if (MBBI->getOpcode() == AArch64::PROBED_STACKALLOC) {
		MBBI = inlineStackProbeFixed(MBBI);
		E = MBBI->getParent()->end();
		} else if (MBBI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR) {
		MBBI = inlineStackProbeVar(MBBI);
		E = MBBI->getParent()->end();
		} else {
		++MBBI;
		}
		}
		}

llvm/lib/Target/AArch64/AArch64ISelLowering.h

public:

// If the platform/function should have a redzone, return the size in bytes.		// If the platform/function should have a redzone, return the size in bytes.
unsigned getRedZoneSize(const Function &F) const {		unsigned getRedZoneSize(const Function &F) const {
if (F.hasFnAttribute(Attribute::NoRedZone))		if (F.hasFnAttribute(Attribute::NoRedZone))
return 0;		return 0;
return 128;		return 128;
}		}

		/// True if stack clash protection is enabled for this function.
		bool hasInlineStackProbe(MachineFunction &MF) const override;

		/// Get the interval between stack-clash probes, which is equal to the stack
		/// guard size, in bytes.
		unsigned getStackProbeSize(MachineFunction &MF) const;

		/// Get the maximum allowed number of unprobed bytes above SP at an ABI
		/// boundary.
		unsigned getStackProbeMaxUnprobedStack(MachineFunction &MF) const;

private:		private:
/// Keep a pointer to the AArch64Subtarget around so that we can		/// Keep a pointer to the AArch64Subtarget around so that we can
/// make the right decision when generating code for different targets.		/// make the right decision when generating code for different targets.
const AArch64Subtarget *Subtarget;		const AArch64Subtarget *Subtarget;

bool isExtFreeImpl(const Instruction *Ext) const override;		bool isExtFreeImpl(const Instruction *Ext) const override;

void addTypeForNEON(MVT VT, MVT PromotedBitwiseVT);		void addTypeForNEON(MVT VT, MVT PromotedBitwiseVT);

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


	Function *AArch64TargetLowering::getSSPStackGuardCheck(const Module &M) const {			Function *AArch64TargetLowering::getSSPStackGuardCheck(const Module &M) const {
	// MSVC CRT has a function to validate security cookie.			// MSVC CRT has a function to validate security cookie.
	if (Subtarget->getTargetTriple().isWindowsMSVCEnvironment())			if (Subtarget->getTargetTriple().isWindowsMSVCEnvironment())
	return M.getFunction("__security_check_cookie");			return M.getFunction("__security_check_cookie");
	return TargetLowering::getSSPStackGuardCheck(M);			return TargetLowering::getSSPStackGuardCheck(M);
	}			}

				bool AArch64TargetLowering::hasInlineStackProbe(MachineFunction &MF) const {
				// If the function specifically requests inline stack probes, emit them.
				if (MF.getFunction().hasFnAttribute("probe-stack")) {
				if (MF.getFunction().getFnAttribute("probe-stack").getValueAsString() ==
				serge-sans-paille: Should you warn at some point if another (unsupported) scheme is used?
				"inline-asm")
				return true;
				else
				llvm_unreachable("Unsupported stack probing method");
				}

				return false;
				}

				unsigned AArch64TargetLowering::getStackProbeSize(MachineFunction &MF) const {
				const TargetFrameLowering *TFI = Subtarget->getFrameLowering();
				unsigned StackAlign = TFI->getStackAlignment();
				assert(StackAlign >= 1 && isPowerOf2_32(StackAlign) &&
				"Unexpected stack alignment");
				// The default stack probe size is 4096 if the function has no
				// stack-probe-size attribute. This is a safe default because it is the
				// smallest possible guard page size.
				unsigned StackProbeSize = 4096;
				const Function &Fn = MF.getFunction();
				if (Fn.hasFnAttribute("stack-probe-size"))
				Fn.getFnAttribute("stack-probe-size")
				.getValueAsString()
				.getAsInteger(0, StackProbeSize);
				// Round down to the stack alignment.
				StackProbeSize &= ~(StackAlign - 1);
				return StackProbeSize ? StackProbeSize : StackAlign;
				}
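
The rounding step at the end of `getStackProbeSize` is easy to check on its own. The sketch below is an assumption-labelled illustration: `roundProbeSize` is a hypothetical free function that takes the requested size and the stack alignment as plain integers instead of reading the `stack-probe-size` attribute.

```cpp
#include <cassert>

// Round the requested probe size down to a multiple of the stack alignment,
// falling back to the alignment itself if that would give zero.
unsigned roundProbeSize(unsigned Requested, unsigned StackAlign) {
  // The mask trick below is only valid for power-of-two alignments.
  assert(StackAlign >= 1 && (StackAlign & (StackAlign - 1)) == 0 &&
         "Unexpected stack alignment");
  unsigned Rounded = Requested & ~(StackAlign - 1);
  return Rounded ? Rounded : StackAlign;
}
```

With the 16-byte AArch64 stack alignment, the default 4096 and the Linux value 65536 are returned unchanged, while a request smaller than 16 falls back to 16 so the probe interval is never zero.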

				serge-sans-paille: Can you give any hint about why 1024 was chosen, and why it's independent from `getStackProbeSize`?
				ostannard: This matches GCC's ABI, which doesn't appear to be explicitly documented anywhere, other than comments in the GCC code: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/aarch64.c#L8190
				unsigned AArch64TargetLowering::getStackProbeMaxUnprobedStack(
				MachineFunction &MF) const {
				// We assume and guarantee that, at an ABI boundary, the last probe was no
				// more than 1024 bytes above SP.
				return 1024;
				}

	Value *AArch64TargetLowering::getSafeStackPointerLocation(IRBuilder<> &IRB) const {			Value *AArch64TargetLowering::getSafeStackPointerLocation(IRBuilder<> &IRB) const {
	// Android provides a fixed TLS slot for the SafeStack pointer. See the			// Android provides a fixed TLS slot for the SafeStack pointer. See the
	// definition of TLS_SLOT_SAFESTACK in			// definition of TLS_SLOT_SAFESTACK in
	// https://android.googlesource.com/platform/bionic/+/master/libc/private/bionic_tls.h			// https://android.googlesource.com/platform/bionic/+/master/libc/private/bionic_tls.h
	if (Subtarget->isTargetAndroid())			if (Subtarget->isTargetAndroid())
	return UseTlsOffset(IRB, 0x48);			return UseTlsOffset(IRB, 0x48);

	// Fuchsia is similar.			// Fuchsia is similar.

llvm/lib/Target/AArch64/AArch64InstrInfo.h

public:

static void decomposeStackOffsetForFrameOffsets(const StackOffset &Offset,		static void decomposeStackOffsetForFrameOffsets(const StackOffset &Offset,
int64_t &NumBytes,		int64_t &NumBytes,
int64_t &NumPredicateVectors,		int64_t &NumPredicateVectors,
int64_t &NumDataVectors);		int64_t &NumDataVectors);
static void decomposeStackOffsetForDwarfOffsets(const StackOffset &Offset,		static void decomposeStackOffsetForDwarfOffsets(const StackOffset &Offset,
int64_t &ByteSized,		int64_t &ByteSized,
int64_t &VGSized);		int64_t &VGSized);

		/// Insert code to set SP to the value in TargetReg, ensuring that memory is
		/// written to every NegProbeSize bytes. TargetReg must be below SP, and has no
		/// alignment requirements other than the usual 16-byte alignment for SP.
		MachineBasicBlock::iterator
		insertStackProbingLoop(MachineBasicBlock::iterator MBBI, int64_t NegProbeSize,
		Register TargetReg) const;

#define GET_INSTRINFO_HELPER_DECLS		#define GET_INSTRINFO_HELPER_DECLS
#include "AArch64GenInstrInfo.inc"		#include "AArch64GenInstrInfo.inc"

protected:		protected:
/// If the specific machine instruction is an instruction that moves/copies		/// If the specific machine instruction is an instruction that moves/copies
/// value from one register to another register return destination and source		/// value from one register to another register return destination and source
/// registers as machine operands.		/// registers as machine operands.
Optional<DestSourcePair>		Optional<DestSourcePair>

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp


	unsigned llvm::getBLRCallOpcode(const MachineFunction &MF) {			unsigned llvm::getBLRCallOpcode(const MachineFunction &MF) {
	if (MF.getSubtarget<AArch64Subtarget>().hardenSlsBlr())			if (MF.getSubtarget<AArch64Subtarget>().hardenSlsBlr())
	return AArch64::BLRNoIP;			return AArch64::BLRNoIP;
	else			else
	return AArch64::BLR;			return AArch64::BLR;
	}			}

				MachineBasicBlock::iterator
				AArch64InstrInfo::insertStackProbingLoop(MachineBasicBlock::iterator MBBI,
				int64_t NegProbeSize,
				Register TargetReg) const {
				MachineBasicBlock &MBB = *MBBI->getParent();
				MachineFunction &MF = *MBB.getParent();
				const AArch64InstrInfo *TII =
				MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
				DebugLoc DL = MBB.findDebugLoc(MBBI);

				MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
				MachineBasicBlock *LoopTestMBB =
				MF.CreateMachineBasicBlock(MBB.getBasicBlock());
				MF.insert(MBBInsertPoint, LoopTestMBB);
				MachineBasicBlock *LoopBodyMBB =
				MF.CreateMachineBasicBlock(MBB.getBasicBlock());
				MF.insert(MBBInsertPoint, LoopBodyMBB);
				MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
				MF.insert(MBBInsertPoint, ExitMBB);

				// LoopTest:
				// SUB SP, SP, #ProbeSize
				emitFrameOffset(*LoopTestMBB, LoopTestMBB->end(), DL, AArch64::SP,
				AArch64::SP, StackOffset::getFixed(NegProbeSize), TII,
				MachineInstr::FrameSetup);

				// CMP SP, TargetReg
				BuildMI(*LoopTestMBB, LoopTestMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
				AArch64::XZR)
				.addReg(AArch64::SP)
				.addReg(TargetReg)
				.addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
				.setMIFlags(MachineInstr::FrameSetup);

				// B.LE LoopExit
				BuildMI(*LoopTestMBB, LoopTestMBB->end(), DL, TII->get(AArch64::Bcc))
				.addImm(AArch64CC::LE)
				.addMBB(ExitMBB)
				.setMIFlags(MachineInstr::FrameSetup);

				// STR XZR, [SP]
				BuildMI(*LoopBodyMBB, LoopBodyMBB->end(), DL, TII->get(AArch64::STRXui))
				.addReg(AArch64::XZR)
				.addReg(AArch64::SP)
				.addImm(0)
				.setMIFlags(MachineInstr::FrameSetup);

				// B loop
				BuildMI(*LoopBodyMBB, LoopBodyMBB->end(), DL, TII->get(AArch64::B))
				.addMBB(LoopTestMBB);

				// LoopExit:
				// MOV SP, TargetReg
				BuildMI(*ExitMBB, ExitMBB->end(), DL, TII->get(AArch64::ADDXri), AArch64::SP)
				.addReg(TargetReg)
				.addImm(0)
				.addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0))
				.setMIFlags(MachineInstr::FrameSetup);

				// STR XZR, [SP]
				BuildMI(*ExitMBB, ExitMBB->end(), DL, TII->get(AArch64::STRXui))
				.addReg(AArch64::XZR)
				.addReg(AArch64::SP)
				.addImm(0)
				.setMIFlags(MachineInstr::FrameSetup);

				LoopTestMBB->addSuccessor(ExitMBB);
				LoopTestMBB->addSuccessor(LoopBodyMBB);
				LoopBodyMBB->addSuccessor(LoopTestMBB);
				// Synthesize the exit MBB.
				ExitMBB->splice(ExitMBB->end(), &MBB, std::next(MBBI), MBB.end());
				ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
				MBB.addSuccessor(LoopTestMBB);

				// Update liveins.
				if (MF.getRegInfo().reservedRegsFrozen()) {
				recomputeLiveIns(*LoopTestMBB);
				recomputeLiveIns(*LoopBodyMBB);
				recomputeLiveIns(*ExitMBB);
				}

				return ExitMBB->begin();
				}
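
The control flow built by `insertStackProbingLoop` can be modelled with plain integers. This is a minimal sketch under stated assumptions, not the patch's code: `probeLoopVar` is a hypothetical function, addresses are held in `int64_t` because the `LE` condition used by the branch is a signed comparison, and the returned vector lists the probed addresses in order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models the three blocks emitted by the patch:
//   LoopTest: SUB SP, SP, #ProbeSize ; CMP SP, Target ; B.LE LoopExit
//   LoopBody: STR XZR, [SP] ; B LoopTest
//   LoopExit: MOV SP, Target ; STR XZR, [SP]
std::vector<int64_t> probeLoopVar(int64_t SP, int64_t Target,
                                  int64_t NegProbeSize) {
  assert(NegProbeSize < 0 && Target <= SP);
  std::vector<int64_t> Probes;
  for (;;) {
    SP += NegProbeSize;   // LoopTest: step down one probe interval
    if (SP <= Target)     // reached or passed the target: leave the loop
      break;
    Probes.push_back(SP); // LoopBody: probe, then branch back to the test
  }
  SP = Target;            // LoopExit: set SP to its final value
  Probes.push_back(SP);   // final probe at the new SP
  return Probes;
}
```

Unlike the exact-multiple loop, the target does not need to be a multiple of the probe size away from SP: the loop stops as soon as it would reach or pass the target, and the exit block always finishes with a probe at the final SP.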

	#define GET_INSTRINFO_HELPERS			#define GET_INSTRINFO_HELPERS
	#define GET_INSTRMAP_INFO			#define GET_INSTRMAP_INFO
	#include "AArch64GenInstrInfo.inc"			#include "AArch64GenInstrInfo.inc"

llvm/lib/Target/AArch64/AArch64InstrInfo.td

	// We set Sched to empty list because we expect these instructions to simply get			// We set Sched to empty list because we expect these instructions to simply get
	// removed in most cases.			// removed in most cases.
	def ADJCALLSTACKDOWN : Pseudo<(outs), (ins i32imm:$amt1, i32imm:$amt2),			def ADJCALLSTACKDOWN : Pseudo<(outs), (ins i32imm:$amt1, i32imm:$amt2),
	[(AArch64callseq_start timm:$amt1, timm:$amt2)]>,			[(AArch64callseq_start timm:$amt1, timm:$amt2)]>,
	Sched<[]>;			Sched<[]>;
	def ADJCALLSTACKUP : Pseudo<(outs), (ins i32imm:$amt1, i32imm:$amt2),			def ADJCALLSTACKUP : Pseudo<(outs), (ins i32imm:$amt1, i32imm:$amt2),
	[(AArch64callseq_end timm:$amt1, timm:$amt2)]>,			[(AArch64callseq_end timm:$amt1, timm:$amt2)]>,
	Sched<[]>;			Sched<[]>;

				// Probed stack allocation of a constant size, used in function prologues when
				// stack-clash protection is enabled.
				def PROBED_STACKALLOC : Pseudo<(outs GPR64:$scratch),
				(ins i64imm:$stacksize),
				[]>,
				Sched<[]>;
				// Probed stack allocation of a variable size, used in function prologues when
				// stack-clash protection is enabled. The register input is the target SP,
				// which should be below the current value, and has no alignment requirements
				// beyond the usual 16-byte alignment for SP.
				def PROBED_STACKALLOC_VAR : Pseudo<(outs),
				(ins GPR64:$target),
				[]>,
				Sched<[]>;
	} // Defs = [SP], Uses = [SP], hasSideEffects = 1, isCodeGenOnly = 1			} // Defs = [SP], Uses = [SP], hasSideEffects = 1, isCodeGenOnly = 1

	let isReMaterializable = 1, isCodeGenOnly = 1 in {			let isReMaterializable = 1, isCodeGenOnly = 1 in {
	// FIXME: The following pseudo instructions are only needed because remat			// FIXME: The following pseudo instructions are only needed because remat
	// cannot handle multiple instructions. When that changes, they can be			// cannot handle multiple instructions. When that changes, they can be
	// removed, along with the AArch64Wrapper node.			// removed, along with the AArch64Wrapper node.

	let AddedComplexity = 10 in			let AddedComplexity = 10 in

llvm/test/CodeGen/AArch64/stack-probing-64k.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple aarch64-none-eabi < %s -verify-machineinstrs \| FileCheck %s
				; RUN: llc -mtriple aarch64-none-eabi < %s -verify-machineinstrs -global-isel \| FileCheck %s

				; Tests for prolog sequences for stack probing, when using a 64KiB stack guard.

				; Small stack frame, no probing required.
				define void @static_64(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sub sp, sp, #64 // =64
				; CHECK-NEXT: .cfi_def_cfa_offset 64
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #64 // =64
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 64, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; At 256 bytes we start to always create a frame pointer. No frame smaller than
				; this needs a probe, so we can use the saving of at least one CSR as a probe
				; at the top of our frame.
				define void @static_256(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_256:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sub sp, sp, #272 // =272
				; CHECK-NEXT: str x29, [sp, #256] // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 272
				nagisa: This should come directly after the `sub`, I believe. Otherwise the stack offsets would be incorrect while the `str x29, [sp, #256]` is being executed. (Not sure if preexisting, though.)
				chill: This patch predates the support for asynchronous unwinding. In the new patch it's handled correctly, I believe.
				; CHECK-NEXT: .cfi_offset w29, -16
				nagisa: Is this right? We spilled 8 bytes, but specify the offset for the 32-bit view of the register.
				chill: Yes, it's correct: the DWARF encoding of register names does not encode `xN` and `wN` separately.
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #272 // =272
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 256, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; At just less than 1024 bytes, this is the largest frame which doesn't need
				; probing.
				define void @static_1008(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_1008:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #1008 // =1008
				; CHECK-NEXT: .cfi_def_cfa_offset 1024
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #1008 // =1008
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 1008, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; At 1024 bytes, we need to start probing to guarantee that SP does not go too
				; far into the guard at an ABI boundary.
				define void @static_1024(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_1024:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #1024 // =1024
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 1040
				nagisa: Similarly, this should come immediately after the `sub` instruction, otherwise the CFI won't describe the stack accurately during the `str` above.
				chill: Also fixed.
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #1024 // =1024
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 1024, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; The stack offset does not fit into one SUB instruction, so two are used.
				define void @static_65520(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_65520:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #15, lsl #12 // =61440
				; CHECK-NEXT: sub sp, sp, #4080 // =4080
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 65536
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #15, lsl #12 // =61440
				; CHECK-NEXT: add sp, sp, #4080 // =4080
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 65520, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}
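
The two-instruction split seen in `static_65520` follows from the ADD/SUB immediate form, which holds a 12-bit unsigned value optionally shifted left by 12. The sketch below only reproduces the arithmetic visible in the CHECK lines; `splitAddSubImm` is a hypothetical helper and is not claimed to be how `emitFrameOffset` is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split Offset into {Hi, Lo} with Offset == (Hi << 12) + Lo, where each part
// fits in 12 bits. Hi would be emitted as "#Hi, lsl #12", Lo as "#Lo".
std::pair<uint32_t, uint32_t> splitAddSubImm(uint32_t Offset) {
  assert(Offset < (1u << 24) && "sketch covers offsets below 2^24 only");
  return {Offset >> 12, Offset & 0xfffu};
}
```

For 65520 this gives 15 and 4080, that is `#15, lsl #12` (61440) followed by `#4080`; for 65536 the low part is zero, so a single instruction is enough.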


				; 64k bytes is the largest frame we can probe in one go.
				define void @static_65536(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_65536:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 65552
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 65536, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Smallest stack frame (64k+16) which needs two probes (the first, smaller one gets
				; folded into a store with writeback).
				define void @static_65552(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_65552:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: str xzr, [sp, #-16]!
				; CHECK-NEXT: sub sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 65568
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: add sp, sp, #16 // =16
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 65552, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Largest frame needing two probes.
				define void @static_131072(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_131072:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: sub sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 131088
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #32, lsl #12 // =131072
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 131072, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Largest frame probed without a loop (3 probes).
				define void @static_196592(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_196592:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #15, lsl #12 // =61440
				; CHECK-NEXT: sub sp, sp, #4080 // =4080
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: sub sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: sub sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 196608
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #47, lsl #12 // =192512
				; CHECK-NEXT: add sp, sp, #4080 // =4080
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 196592, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Smallest frame probed with a loop.
				define void @static_196608(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_196608:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub x9, sp, #48, lsl #12 // =196608
				; CHECK-NEXT: .LBB9_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b.ne .LBB9_1
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: .cfi_def_cfa_offset 196624
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #48, lsl #12 // =196608
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 196608, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Large enough to use a loop, but not a multiple of 64KiB so needs an extra
				; probe for the remainder.
				define void @static_197632(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_197632:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #1024 // =1024
				; CHECK-NEXT: sub x9, sp, #48, lsl #12 // =196608
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .LBB10_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b.ne .LBB10_1
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: .cfi_def_cfa_offset 197648
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #48, lsl #12 // =196608
				; CHECK-NEXT: add sp, sp, #1024 // =1024
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 197632, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; A small allocation, but with a very large alignment requirement. We do this
				; by moving SP far enough that a sufficiently-aligned block will exist
				; somewhere in the stack frame, so must probe the whole of that larger SP move.
				define void @static_16_align_8192(i8** %out) "probe-stack"="inline-asm" "stack-probe-size"="65536" "frame-pointer"="none" {
				; CHECK-LABEL: static_16_align_8192:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
				; CHECK-NEXT: sub x9, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: sub x9, x9, #4080 // =4080
				; CHECK-NEXT: and x9, x9, #0xffffffffffffe000
				; CHECK-NEXT: mov x29, sp
				; CHECK-NEXT: .LBB11_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: b.le .LBB11_3
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: // in Loop: Header=BB11_1 Depth=1
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b .LBB11_1
				; CHECK-NEXT: .LBB11_3: // %entry
				; CHECK-NEXT: mov sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa w29, 16
				; CHECK-NEXT: .cfi_offset w30, -8
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: mov sp, x29
				; CHECK-NEXT: ldp x29, x30, [sp], #16 // 16-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 16, align 8192
				store i8* %vla, i8** %out, align 8
				ret void
				}
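The target address in static_16_align_8192 above is built by three instructions: two subtractions totalling 8176 bytes, then an AND that clears the low 13 bits. The following is a minimal Python sketch of that arithmetic only. The function name `realigned_target` and the sample SP value are hypothetical, and the sketch does not model the probing loop itself.

```python
MASK64 = (1 << 64) - 1

def realigned_target(sp: int, frame_bytes: int = 4096 + 4080, align: int = 8192) -> int:
    """Model `sub x9, sp, #4096; sub x9, x9, #4080; and x9, x9, #0xffffffffffffe000`."""
    lowered = (sp - frame_bytes) & MASK64      # the two SUB instructions
    return lowered & ~(align - 1) & MASK64     # the AND rounds down to `align`

# Hypothetical 16-byte-aligned SP value after the initial stp.
sp = 0x0000_7FFF_FFFF_E3F0
target = realigned_target(sp)
moved = sp - target  # distance SP travels; depends on the incoming SP
```

Because the distance moved depends on the incoming SP value, it is not a compile-time constant, which matches the test expecting the register-target loop rather than an unrolled sequence.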

llvm/test/CodeGen/AArch64/stack-probing-sve.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple aarch64-none-eabi < %s -verify-machineinstrs | FileCheck %s
				; RUN: llc -mtriple aarch64-none-eabi < %s -verify-machineinstrs -global-isel -global-isel-abort=2 | FileCheck %s

				; Test prolog sequences for stack probing when SVE objects are involved.

				; An SVE stack slot needs probing, because we don't know its size at
				; compile-time.
				define void @sve_1_vector(<vscale x 4 x float>** %out) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" {
				; CHECK-LABEL: sve_1_vector:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl x9, sp, #-1
				; CHECK-NEXT: .LBB0_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: b.le .LBB0_3
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: // in Loop: Header=BB0_1 Depth=1
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b .LBB0_1
				; CHECK-NEXT: .LBB0_3: // %entry
				; CHECK-NEXT: mov sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vec = alloca <vscale x 4 x float>, align 16
				ret void
				}

				; As above, but with 4 SVE vectors of stack space.
				define void @sve_4_vector(<vscale x 4 x float>** %out) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" {
				; CHECK-LABEL: sve_4_vector:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl x9, sp, #-4
				; CHECK-NEXT: .LBB1_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: b.le .LBB1_3
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: // in Loop: Header=BB1_1 Depth=1
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b .LBB1_1
				; CHECK-NEXT: .LBB1_3: // %entry
				; CHECK-NEXT: mov sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x20, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 32 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #4
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vec1 = alloca <vscale x 4 x float>, align 16
				%vec2 = alloca <vscale x 4 x float>, align 16
				%vec3 = alloca <vscale x 4 x float>, align 16
				%vec4 = alloca <vscale x 4 x float>, align 16
				ret void
				}

				; The area allocated to save callee-saved SVE registers does not need to be
				; probed, because it will always be written to, which acts as a probe.
				define void @sve_1v_csr(<vscale x 4 x float> %a) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" {
				; CHECK-LABEL: sve_1v_csr:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: str z8, [sp] // 16-byte Folded Spill
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_escape 0x10, 0x48, 0x0a, 0x11, 0x70, 0x22, 0x11, 0x78, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d8 @ cfa - 16 - 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: //APP
				; CHECK-NEXT: //NO_APP
				; CHECK-NEXT: ldr z8, [sp] // 16-byte Folded Reload
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				call void asm sideeffect "", "~{z8}" ()
				ret void
				}

				define void @sve_4v_csr(<vscale x 4 x float> %a) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" {
				; CHECK-LABEL: sve_4v_csr:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-4
				; CHECK-NEXT: str z11, [sp] // 16-byte Folded Spill
				; CHECK-NEXT: str z10, [sp, #1, mul vl] // 16-byte Folded Spill
				; CHECK-NEXT: str z9, [sp, #2, mul vl] // 16-byte Folded Spill
				; CHECK-NEXT: str z8, [sp, #3, mul vl] // 16-byte Folded Spill
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x20, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 32 * VG
				; CHECK-NEXT: .cfi_escape 0x10, 0x48, 0x0a, 0x11, 0x70, 0x22, 0x11, 0x78, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d8 @ cfa - 16 - 8 * VG
				; CHECK-NEXT: .cfi_escape 0x10, 0x49, 0x0a, 0x11, 0x70, 0x22, 0x11, 0x70, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d9 @ cfa - 16 - 16 * VG
				; CHECK-NEXT: .cfi_escape 0x10, 0x4a, 0x0a, 0x11, 0x70, 0x22, 0x11, 0x68, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d10 @ cfa - 16 - 24 * VG
				; CHECK-NEXT: .cfi_escape 0x10, 0x4b, 0x0a, 0x11, 0x70, 0x22, 0x11, 0x60, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d11 @ cfa - 16 - 32 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: //APP
				; CHECK-NEXT: //NO_APP
				; CHECK-NEXT: ldr z11, [sp] // 16-byte Folded Reload
				; CHECK-NEXT: ldr z10, [sp, #1, mul vl] // 16-byte Folded Reload
				; CHECK-NEXT: ldr z9, [sp, #2, mul vl] // 16-byte Folded Reload
				; CHECK-NEXT: ldr z8, [sp, #3, mul vl] // 16-byte Folded Reload
				; CHECK-NEXT: addvl sp, sp, #4
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				call void asm sideeffect "", "~{z8},~{z9},~{z10},~{z11}" ()
				ret void
				}

				define void @sve_1p_csr(<vscale x 4 x float> %a) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" {
				; CHECK-LABEL: sve_1p_csr:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: str p8, [sp, #7, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: //APP
				; CHECK-NEXT: //NO_APP
				; CHECK-NEXT: ldr p8, [sp, #7, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				call void asm sideeffect "", "~{p8}" ()
				ret void
				}

				define void @sve_4p_csr(<vscale x 4 x float> %a) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" {
				; CHECK-LABEL: sve_4p_csr:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: str p11, [sp, #4, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p10, [sp, #5, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p9, [sp, #6, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p8, [sp, #7, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: //APP
				; CHECK-NEXT: //NO_APP
				; CHECK-NEXT: ldr p11, [sp, #4, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p10, [sp, #5, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p9, [sp, #6, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p8, [sp, #7, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				call void asm sideeffect "", "~{p8},~{p9},~{p10},~{p11}" ()
				ret void
				}

				; 1 SVE vector, which needs probing, and a 16-byte fixed size object, which
				; doesn't. Here the final store of the SVE probing loop gets merged with the
				; fixed-size SP decrement, but this doesn't affect probing as the pattern of
				; memory access is the same.
				define void @sve_1_vector_16_arr(<vscale x 4 x float>** %out) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" {
				; CHECK-LABEL: sve_1_vector_16_arr:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl x9, sp, #-1
				; CHECK-NEXT: mov x29, sp
				; CHECK-NEXT: .LBB6_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: b.le .LBB6_3
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: // in Loop: Header=BB6_1 Depth=1
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b .LBB6_1
				; CHECK-NEXT: .LBB6_3: // %entry
				; CHECK-NEXT: mov sp, x9
				; CHECK-NEXT: str xzr, [sp], #-16
				; CHECK-NEXT: .cfi_def_cfa w29, 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: add sp, sp, #16 // =16
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vec = alloca <vscale x 4 x float>, align 16
				%arr = alloca i8, i64 16, align 1
				ret void
				}

				; 1 SVE stack slot and a 4096-byte stack slot, both of which need probing.
				; TODO: This could be optimised by combining the fixed-size offset into the
				; loop.
				define void @sve_1_vector_4096_arr(<vscale x 4 x float>** %out) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" {
				; CHECK-LABEL: sve_1_vector_4096_arr:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl x9, sp, #-1
				; CHECK-NEXT: mov x29, sp
				; CHECK-NEXT: .LBB7_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: b.le .LBB7_3
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: // in Loop: Header=BB7_1 Depth=1
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b .LBB7_1
				; CHECK-NEXT: .LBB7_3: // %entry
				; CHECK-NEXT: mov sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa w29, 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: add sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vec = alloca <vscale x 4 x float>, align 16
				%arr = alloca i8, i64 4096, align 1
				ret void
				}

				; 1 SVE stack slot and a large stack slot, both of which need probing.
				; TODO: This could be optimised by combining both loops.
				define void @sve_1_vector_12288_arr(<vscale x 4 x float>** %out) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" {
				; CHECK-LABEL: sve_1_vector_12288_arr:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl x9, sp, #-1
				; CHECK-NEXT: mov x29, sp
				; CHECK-NEXT: .LBB8_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: b.le .LBB8_3
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: // in Loop: Header=BB8_1 Depth=1
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b .LBB8_1
				; CHECK-NEXT: .LBB8_3: // %entry
				; CHECK-NEXT: mov sp, x9
				; CHECK-NEXT: sub x9, sp, #3, lsl #12 // =12288
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .LBB8_4: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b.ne .LBB8_4
				; CHECK-NEXT: // %bb.5: // %entry
				; CHECK-NEXT: .cfi_def_cfa w29, 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: add sp, sp, #3, lsl #12 // =12288
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vec = alloca <vscale x 4 x float>, align 16
				%arr = alloca i8, i64 12288, align 1
				ret void
				}

				; Not tested: SVE stack objects with alignment >16 bytes, which isn't currently
				; supported even without stack-probing.

				; 1 SVE vector, which needs probing, and a 16-byte fixed size object, which
				; has a large alignment requirement so also needs a probing loop.
				define void @sve_1_vector_16_arr_align_8192(<vscale x 4 x float>** %out) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" {
				; CHECK-LABEL: sve_1_vector_16_arr_align_8192:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
				; CHECK-NEXT: addvl x9, sp, #-1
				; CHECK-NEXT: mov x29, sp
				; CHECK-NEXT: .LBB9_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: b.le .LBB9_3
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: // in Loop: Header=BB9_1 Depth=1
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b .LBB9_1
				; CHECK-NEXT: .LBB9_3: // %entry
				; CHECK-NEXT: mov sp, x9
				; CHECK-NEXT: sub x9, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: sub x9, x9, #4080 // =4080
				; CHECK-NEXT: and x9, x9, #0xffffffffffffe000
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .LBB9_4: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: b.le .LBB9_6
				; CHECK-NEXT: // %bb.5: // %entry
				; CHECK-NEXT: // in Loop: Header=BB9_4 Depth=1
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b .LBB9_4
				; CHECK-NEXT: .LBB9_6: // %entry
				; CHECK-NEXT: mov sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa w29, 16
				; CHECK-NEXT: .cfi_offset w30, -8
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov sp, x29
				; CHECK-NEXT: ldp x29, x30, [sp], #16 // 16-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vec = alloca <vscale x 4 x float>, align 16
				%arr = alloca i8, i64 16, align 8192
				ret void
				}

				; For 64k guard pages, the only difference is the constant subtracted from SP
				; in the loop.
				define void @sve_64k_guard(<vscale x 4 x float>** %out) "probe-stack"="inline-asm" "frame-pointer"="none" "target-features"="+sve" "stack-probe-size"="65536" {
				; CHECK-LABEL: sve_64k_guard:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl x9, sp, #-1
				; CHECK-NEXT: .LBB10_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #16, lsl #12 // =65536
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: b.le .LBB10_3
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: // in Loop: Header=BB10_1 Depth=1
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b .LBB10_1
				; CHECK-NEXT: .LBB10_3: // %entry
				; CHECK-NEXT: mov sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vec = alloca <vscale x 4 x float>, align 16
				ret void
				}

				; Not tested: dynamic allocations of SVE vectors, which don't currently work
				; even without stack probing.
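The SVE tests above all expect the same loop shape: step SP down by one guard size, stop as soon as SP reaches or passes the target in x9, then set SP to the target and probe once more. Below is a minimal Python model of that control flow, written as a sketch from the CHECK lines rather than from the patch's C++ code. The function name `probe_to_target` and the sample addresses are hypothetical.

```python
def probe_to_target(sp: int, target: int, guard: int = 4096) -> list:
    """Return the addresses written while moving SP down to `target`.

    Mirrors the checked sequence:
        loop: sub sp, sp, #guard
              cmp sp, x9
              b.le exit
              str xzr, [sp]
              b loop
        exit: mov sp, x9
              str xzr, [sp]
    """
    probes = []
    while True:
        sp -= guard
        if sp <= target:       # b.le exit: reached or passed the target
            break
        probes.append(sp)      # str xzr, [sp] inside the loop
    probes.append(target)      # final probe after `mov sp, x9`
    return probes

start = 0x100000  # hypothetical incoming SP
offsets = [start - addr for addr in probe_to_target(start, start - 10000)]
```

With a 4096-byte guard and a target 10000 bytes below the starting SP, the model probes at offsets 4096, 8192 and 10000, so no two consecutive probes are more than one guard size apart. For a target less than one guard size away the loop body never stores, and only the final probe at the target is issued.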

llvm/test/CodeGen/AArch64/stack-probing.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple aarch64-none-eabi < %s -verify-machineinstrs | FileCheck %s
				; RUN: llc -mtriple aarch64-none-eabi < %s -verify-machineinstrs -global-isel | FileCheck %s

				; Tests for prolog sequences for stack probing, when using a 4KiB stack guard.

				; Small stack frame, no probing required.
				define void @static_64(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sub sp, sp, #64 // =64
				; CHECK-NEXT: .cfi_def_cfa_offset 64
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #64 // =64
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 64, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; At 256 bytes we start to always create a frame pointer. No frame smaller than
				; this needs a probe, so we can use the saving of at least one CSR as a probe
				; at the top of our frame.
				define void @static_256(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_256:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sub sp, sp, #272 // =272
				; CHECK-NEXT: str x29, [sp, #256] // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 272
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #272 // =272
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 256, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; At just less than 1024 bytes, this is the largest frame which doesn't need
				; probing.
				define void @static_1008(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_1008:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #1008 // =1008
				; CHECK-NEXT: .cfi_def_cfa_offset 1024
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #1008 // =1008
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 1008, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; At 1024 bytes, we need to start probing to guarantee that SP does not go too
				; far into the guard at an ABI boundary.
				define void @static_1024(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_1024:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #1024 // =1024
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 1040
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #1024 // =1024
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 1024, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; 4096 bytes is the largest frame we can probe in one go.
				define void @static_4096(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_4096:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 4112
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 4096, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Smallest stack frame which needs two probes (the first, smaller one gets
				; folded into a store with writeback).
				define void @static_4112(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_4112:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: str xzr, [sp, #-16]!
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 4128
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: add sp, sp, #16 // =16
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 4112, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Needs two probes, but neither can be folded into a store with writeback.
				define void @static_6144(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_6144:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #2048 // =2048
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 6160
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: add sp, sp, #2048 // =2048
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 6144, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Largest frame needing two probes
				define void @static_8192(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_8192:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 8208
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #2, lsl #12 // =8192
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 8192, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Largest frame probed without a loop
				define void @static_12272(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_12272:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #4080 // =4080
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa_offset 12288
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #2, lsl #12 // =8192
				; CHECK-NEXT: add sp, sp, #4080 // =4080
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 12272, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Smallest frame probed with a loop
				define void @static_12288(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_12288:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub x9, sp, #3, lsl #12 // =12288
				; CHECK-NEXT: .LBB9_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b.ne .LBB9_1
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: .cfi_def_cfa_offset 12304
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #3, lsl #12 // =12288
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 12288, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; Large enough to use a loop, but not a multiple of 4KiB so needs an extra
				; probe for the remainder.
				define void @static_13312(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_13312:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: sub sp, sp, #1024 // =1024
				; CHECK-NEXT: sub x9, sp, #3, lsl #12 // =12288
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .LBB10_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b.ne .LBB10_1
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: .cfi_def_cfa_offset 13328
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: add sp, sp, #3, lsl #12 // =12288
				; CHECK-NEXT: add sp, sp, #1024 // =1024
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 13312, align 1
				store i8* %vla, i8** %out, align 8
				ret void
				}

				; A small allocation, but with a very large alignment requirement. We do this
				; by moving SP far enough that a sufficiently-aligned block will exist
				; somewhere in the stack frame, so must probe the whole of that larger SP move.
				define void @static_16_align_8192(i8** %out) "probe-stack"="inline-asm" "frame-pointer"="none" {
				; CHECK-LABEL: static_16_align_8192:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
				; CHECK-NEXT: sub x9, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: sub x9, x9, #4080 // =4080
				; CHECK-NEXT: and x9, x9, #0xffffffffffffe000
				; CHECK-NEXT: mov x29, sp
				; CHECK-NEXT: .LBB11_1: // %entry
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
				; CHECK-NEXT: cmp sp, x9
				; CHECK-NEXT: b.le .LBB11_3
				; CHECK-NEXT: // %bb.2: // %entry
				; CHECK-NEXT: // in Loop: Header=BB11_1 Depth=1
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: b .LBB11_1
				; CHECK-NEXT: .LBB11_3: // %entry
				; CHECK-NEXT: mov sp, x9
				; CHECK-NEXT: str xzr, [sp]
				; CHECK-NEXT: .cfi_def_cfa w29, 16
				; CHECK-NEXT: .cfi_offset w30, -8
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: str x8, [x0]
				; CHECK-NEXT: mov sp, x29
				; CHECK-NEXT: ldp x29, x30, [sp], #16 // 16-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				%vla = alloca i8, i64 16, align 8192
				store i8* %vla, i8** %out, align 8
				ret void
				}
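The fixed-size tests follow a regular pattern that can be summarised in a few lines. This is a sketch inferred from the CHECK lines only, not the patch's implementation. The function name `plan_static_probes` is hypothetical, and the `unprobed_limit` default of 1024 is an assumption taken from the static_1008 and static_1024 tests with a 4KiB guard. Here `size` is the local allocation made after the callee-save spill. The model ignores folding, such as the 16-byte remainder in static_4112 becoming a store with writeback.

```python
def plan_static_probes(size: int, guard: int = 4096, unprobed_limit: int = 1024) -> list:
    """Describe the SP adjustments and probes for a compile-time-known size."""
    if size < unprobed_limit:
        return [f"sub {size}"]              # small frame: no probe emitted
    steps = []
    remainder, pages = size % guard, size // guard
    if remainder:
        # The part that is not a whole guard page is allocated and probed first.
        steps += [f"sub {remainder}", "probe"]
    if size < 3 * guard:
        # Straight-line sequence: one subtract and one store per guard page.
        steps += [f"sub {guard}", "probe"] * pages
    else:
        # Loop allocating and probing one guard page per iteration.
        steps.append(f"loop {pages * guard}")
    return steps

plan_4k = plan_static_probes(13312)
plan_64k = plan_static_probes(196592, guard=65536)
```

For 13312 bytes with a 4KiB guard the model gives a 1024-byte subtract, a probe, and a loop over 12288 bytes, matching static_13312. For 196592 bytes with a 64KiB guard it gives a 65520-byte subtract and probe followed by two unrolled 65536-byte steps, matching static_196592.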