This is an archive of the discontinued LLVM Phabricator instance.

Differential D135687

[AArch64] Fix aligning the stack after calling __chkstk
ClosedPublic

Authored by mstorsjo on Oct 11 2022, 9:03 AM.

Details

Reviewers

efriedma
zzheng

Commits

rG6eb205b25771: Reapply [AArch64] Fix aligning the stack after calling __chkstk
rG50e0aced4521: [AArch64] Fix aligning the stack after calling __chkstk

Summary

Whenever a call to __chkstk was made, the frame lowering previously
omitted the aligning (as NumBytes was reset to zero before doing
alignment).

This fixes https://github.com/llvm/llvm-project/issues/56182.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mstorsjo created this revision. Oct 11 2022, 9:03 AM

Herald added a project: Restricted Project. Oct 11 2022, 9:03 AM

Herald added subscribers: hiraditya, kristof.beyls.

mstorsjo requested review of this revision. Oct 11 2022, 9:03 AM

mstorsjo added a parent revision: D135686: [AArch64] Exclude instructions after setting the FP from SEH prologues. Oct 11 2022, 9:03 AM

Harbormaster completed remote builds in B191516: Diff 466834. Oct 11 2022, 9:57 AM

In general, I think we need to pass the number of bytes required for realignment to __chkstk, instead of directly realigning the stack. As discussed on the bug.

In D135687#3850040, @efriedma wrote:

In general, I think we need to pass the number of bytes required for realignment to __chkstk, instead of directly realigning the stack. As discussed on the bug.

Oh, right, I didn't realize that we could fix that part of the requirement that easily.

In D135687#3850352, @mstorsjo wrote:

In D135687#3850040, @efriedma wrote:

In general, I think we need to pass the number of bytes required for realignment to __chkstk, instead of directly realigning the stack. As discussed on the bug.

Oh, right, I didn't realize that we could fix that part of the requirement that easily.

Actually, on second thought, either this requires a bit more changes than that (deviating more from the pattern for how __chkstk is used) or it doesn't actually fulfill the intent you have here (or I'm still overlooking something).

Currently, the call to __chkstk looks like this:

mov x15, #(NumBytes/16)
bl __chkstk
sub sp, sp, x15, lsl #4

If we'd include the alignment in the probing, it'd look like this:

mov x15, #((NumBytes+Align)/16)
bl __chkstk
sub x15, sp, x15, lsl #4
and sp, x15, #(AlignMask)

Here, the call to __chkstk does cover the extra align amount, but we also decrement the stack pointer by that amount, so when doing the alignment masking, we can still decrement the stack pointer further past what's been probed. To avoid this, I guess we'd need to add a sub x15, x15, #(Align/16) after the call, before decrementing the stack. Is that how you meant it?
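The over-decrement described here can be reproduced with plain integer arithmetic. The following is a minimal sketch, not LLVM code: the names `ProbeResult`, `probeThenMask` and `unprobedBytes` are hypothetical, and it assumes that Align is a power of two of at least 16 and that __chkstk probes exactly x15 * 16 bytes below the incoming stack pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM code) of the "probe NumBytes+Align, then mask"
// sequence from the comment above.
struct ProbeResult {
  uint64_t LowestProbed; // lowest address covered by the __chkstk probe
  uint64_t FinalSP;      // stack pointer after the final AND
};

// Assumption: Align is a power of two >= 16, and __chkstk probes exactly
// x15 * 16 bytes below the incoming stack pointer.
ProbeResult probeThenMask(uint64_t SP, uint64_t NumBytes, uint64_t Align) {
  uint64_t X15 = (NumBytes + Align) / 16;  // mov x15, #((NumBytes+Align)/16)
  uint64_t LowestProbed = SP - (X15 << 4); // range touched by bl __chkstk
  uint64_t Tmp = SP - (X15 << 4);          // sub x15, sp, x15, lsl #4
  uint64_t FinalSP = Tmp & ~(Align - 1);   // and sp, x15, #(AlignMask)
  return {LowestProbed, FinalSP};
}

// Number of bytes by which the final stack pointer lies below the probed range.
uint64_t unprobedBytes(uint64_t SP, uint64_t NumBytes, uint64_t Align) {
  ProbeResult R = probeThenMask(SP, NumBytes, Align);
  return R.LowestProbed - R.FinalSP;
}
```

With an incoming stack pointer of 0x10010, NumBytes = 4096 and Align = 64, the probe reaches down to 0xEFD0 while the masked stack pointer ends up at 0xEFC0, that is, 16 bytes below the probed range. When the incoming stack pointer already happens to be 64-byte aligned, the two coincide.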

I was thinking you could do something like this:

mov x15, #(NumBytes/16) ; Stack allocation size
mov x9, sp
and x9, x9, #AlignMask ; compute aligned stack
sub x9, sp, x9 ; compute number of bytes required to align stack
add x15, x15, x9, lsr #4 ; add bytes to chkstk input
bl __chkstk
sub sp, sp, x15, lsl #4

There's probably some slightly more efficient way to write this, but that's the idea.
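As an illustration of the idea, the sequence above can be modelled in C++. This is only a sketch under assumptions: `allocWithAlignBytes` is a hypothetical name, AlignMask is taken to mean ~(Align - 1), and the incoming stack pointer and NumBytes are both taken to be multiples of 16.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM code) of the sequence that adds the realignment
// bytes to the __chkstk input. Returns the stack pointer after the final SUB.
uint64_t allocWithAlignBytes(uint64_t SP, uint64_t NumBytes, uint64_t Align) {
  uint64_t X15 = NumBytes / 16;   // mov x15, #(NumBytes/16)
  uint64_t X9 = SP;               // mov x9, sp
  X9 &= ~(Align - 1);             // and x9, x9, #AlignMask
  X9 = SP - X9;                   // sub x9, sp, x9
  X15 += X9 >> 4;                 // add x15, x15, x9, lsr #4
                                  // bl __chkstk (probes X15 * 16 bytes)
  return SP - (X15 << 4);         // sub sp, sp, x15, lsl #4
}
```

Because the stack pointer is only ever moved by the amount handed to __chkstk, nothing below the probed range is touched. For example, an incoming stack pointer of 0x10010 with NumBytes = 4096 and Align = 64 yields 0xF000.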

In D135687#3850753, @efriedma wrote:
I was thinking you could do something like this:
mov x15, #(NumBytes/16) ; Stack allocation size
mov x9, sp
and x9, x9, #AlignMask ; compute aligned stack
sub x9, sp, x9 ; compute number of bytes required to align stack
add x15, x15, x9, lsr #4 ; add bytes to chkstk input
bl __chkstk
sub sp, sp, x15, lsl #4
There's probably some slightly more efficient way to write this, but that's the idea.

Hmm, ok, I see. I guess that'd work, but it feels a bit roundabout IMO.

It looks like MSVC does something like this:

mov x15, #(NumBytes+Align)/16
bl __chkstk
sub sp, sp, x15, lsl #4
mov x8, sp
add x9, x8, #(Align-1)
and x0, x9, #AlignMask

(The mov x8, sp feels superfluous here?) It doesn't align the stack pointer itself, but I think we should be able to do the same (while wasting fewer registers) by doing the last and into sp.

I think I'd prefer something like this:

mov x15, #(NumBytes+Align)/16
bl __chkstk
sub sp, sp, x15, lsl #4
add x9, sp, #(Align -1)
and sp, x9, #AlignMask

After staring at my code sequence a bit more, it simplifies to:

mov x15, #(NumBytes/16)
mov x9, sp
ubfx x9, x9, #4, #AlignBits
add x15, x15, x9
bl __chkstk
sub sp, sp, x15, lsl #4

Your suggested sequence also works, I guess, but it feels a little weird to overallocate, then deallocate the extra memory.

MSVC doesn't actually realign the stack in the sense LLVM does; it just overallocates stack memory, then uses masking to align the addresses of variables.
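The simplification from the and/sub/shift form to a single ubfx can be checked numerically. The sketch below assumes that AlignBits means log2(Align) - 4, which the thread does not spell out, and uses hypothetical helper names; `ubfx` here is a plain C++ model of the bitfield extract, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// C++ model of "ubfx xd, xn, #Lsb, #Width" for 1 <= Width < 64.
uint64_t ubfx(uint64_t Value, unsigned Lsb, unsigned Width) {
  return (Value >> Lsb) & ((uint64_t(1) << Width) - 1);
}

// Longer form: and x9, x9, #AlignMask; sub x9, sp, x9; then shift right by 4.
uint64_t alignWordsLong(uint64_t SP, uint64_t Align) {
  uint64_t Aligned = SP & ~(Align - 1);
  return (SP - Aligned) >> 4;
}

// Shorter form: ubfx x9, x9, #4, #AlignBits.
uint64_t alignWordsShort(uint64_t SP, unsigned AlignBits) {
  return ubfx(SP, 4, AlignBits);
}

// Assumption: AlignBits == log2(Align) - 4, i.e. Align == 16 << AlignBits.
bool sequencesAgree(unsigned AlignBits) {
  uint64_t Align = uint64_t(16) << AlignBits;
  for (uint64_t SP = 0x10000; SP < 0x10000 + 4 * Align; SP += 16)
    if (alignWordsLong(SP, Align) != alignWordsShort(SP, AlignBits))
      return false;
  return true;
}
```

Both forms compute the number of 16-byte units between the stack pointer and the next lower Align boundary, so the rest of the sequence is unchanged.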

In D135687#3850987, @efriedma wrote:
After staring at my code sequence a bit more, it simplifies to:
mov x15, #(NumBytes/16)
mov x9, sp
ubfx x9, x9, #4, #AlignBits
add x15, x15, x9
bl __chkstk
sub sp, sp, x15, lsl #4
Your suggested sequence also works, I guess, but it feels a little weird to overallocate, then deallocate the extra memory.

Yeah, that's a bit clumsy (but the amount of extra instructions is quite small).

I find your suggestion very neat, but it assumes that NumBytes itself is aligned to the alignment, otherwise we break the realignment we calculated from the actual stack pointer. In practice it isn't - the total allocation of the stack frame (including CSRs) is aligned, but NumBytes isn't. And trying to calculate how much is spent on the other bits and adjusting it here feels very brittle too. To counter that, we'd have to overallocate the chkstk amount to make sure that it is aligned on its own, and that's not very pretty either.
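The dependence on an aligned NumBytes can be seen in a small model of the simplified sequence. This is a sketch with hypothetical names (`simplifiedFinalSP`, `isAligned`), again assuming AlignBits = log2(Align) - 4 and a 16-byte aligned incoming stack pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM code) of the ubfx-based sequence.
// Returns the stack pointer after "sub sp, sp, x15, lsl #4".
uint64_t simplifiedFinalSP(uint64_t SP, uint64_t NumBytes, unsigned AlignBits) {
  uint64_t X15 = NumBytes / 16;                               // mov x15, #(NumBytes/16)
  uint64_t X9 = (SP >> 4) & ((uint64_t(1) << AlignBits) - 1); // ubfx x9, x9, #4, #AlignBits
  X15 += X9;                                                  // add x15, x15, x9
  return SP - (X15 << 4);                                     // sub sp, sp, x15, lsl #4
}

bool isAligned(uint64_t Addr, uint64_t Align) {
  return (Addr & (Align - 1)) == 0;
}
```

With a stack pointer of 0x10010 and 64-byte alignment (AlignBits = 2), NumBytes = 4096 gives 0xF000, which is aligned, whereas NumBytes = 4112 gives 0xEFF0, which is not.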

Updated to overallocate the chkstk amount based on the alignment, and then reduce the overallocation to align the stack pointer at the end.

I'm not very fond of this solution (your version would be neater), but I'm updating the patch to bring it a bit forward for discussion at least.

Harbormaster completed remote builds in B191706: Diff 467100. Oct 12 2022, 6:10 AM

Oh, to account for unaligned NumBytes, my solution would have to be something more like:

sub x15, sp, #NumBytes
and x15, x15, #AlignMask
sub x15, sp, x15
lsr x15, x15, #4
bl __chkstk
sub sp, sp, x15, lsl #4

(If NumBytes is large, add 1 or 2 instructions to materialize it.) Maybe emitting extra instructions to avoid overallocating isn't worth it, though...
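A C++ model of this variant shows that it no longer depends on NumBytes being a multiple of the alignment. It is a sketch only: `finalSPForUnalignedSize` is a hypothetical name, the fourth instruction is read as a plain shift right by 4, and the incoming stack pointer is assumed to be 16-byte aligned with Align a power of two of at least 16.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM code): compute the aligned target address
// first, then derive the __chkstk amount from the distance to it.
uint64_t finalSPForUnalignedSize(uint64_t SP, uint64_t NumBytes,
                                 uint64_t Align) {
  uint64_t X15 = SP - NumBytes; // sub x15, sp, #NumBytes
  X15 &= ~(Align - 1);          // and x15, x15, #AlignMask
  X15 = SP - X15;               // sub x15, sp, x15
  X15 >>= 4;                    // lsr x15, x15, #4
                                // bl __chkstk (probes X15 * 16 bytes)
  return SP - (X15 << 4);       // sub sp, sp, x15, lsl #4
}
```

For a stack pointer of 0x10010 and Align = 64, NumBytes = 4112 yields 0xF000 and NumBytes = 4128 yields 0xEFC0; both results are 64-byte aligned and leave at least NumBytes of space.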

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1650	This is allocating more than necessary? We only really need to add `RealignmentBytes ? RealignmentBytes - 16 : 0` bytes.
1756	We have a helper for this: AArch64_AM::encodeLogicalImmediate.

Reduced the amount of overallocation; moving back by (RealignmentBytes-16) aka RealignmentPadding in the new revision instead of (RealignmentBytes-1) when aligning the pointer at the end.

Using encodeLogicalImmediate to encode the immediate, added a missed setStackRealigned(true), added an assert that NeedsWinCFI is false at this point.
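The overallocate-then-align scheme in this revision can be sanity-checked with a small model. This is a sketch under assumptions, not the patch itself: `Realigned`, `overallocateThenAlign` and `holdsForEveryEntrySP` are hypothetical names, the incoming stack pointer and NumBytes are assumed to be multiples of 16, and Align is assumed to be a power of two of at least 16.

```cpp
#include <cassert>
#include <cstdint>

struct Realigned {
  uint64_t LowestProbed; // stack pointer right after the __chkstk allocation
  uint64_t FinalSP;      // stack pointer after realignment
};

// Hypothetical model (not LLVM code) of: probe NumBytes + Padding, move the
// stack pointer down by that amount, then add Padding back and mask.
Realigned overallocateThenAlign(uint64_t SP, uint64_t NumBytes,
                                uint64_t Align) {
  uint64_t Padding = Align - 16;            // RealignmentPadding
  uint64_t X15 = (NumBytes + Padding) >> 4; // 16-byte units passed to __chkstk
  uint64_t LowestProbed = SP - (X15 << 4);  // sub sp, sp, x15, lsl #4
  uint64_t Tmp = LowestProbed + Padding;    // add x15, sp, #Padding
  uint64_t FinalSP = Tmp & ~(Align - 1);    // and sp, x15, #AlignMask
  return {LowestProbed, FinalSP};
}

// Checks, for every 16-byte aligned entry stack pointer within one Align
// period, that the result is aligned, stays inside the probed range, and
// leaves at least NumBytes of space.
bool holdsForEveryEntrySP(uint64_t NumBytes, uint64_t Align) {
  for (uint64_t SP = 0x100000; SP < 0x100000 + Align; SP += 16) {
    Realigned R = overallocateThenAlign(SP, NumBytes, Align);
    bool Aligned = (R.FinalSP & (Align - 1)) == 0;
    bool Probed = R.FinalSP >= R.LowestProbed;
    bool BigEnough = SP - R.FinalSP >= NumBytes;
    if (!Aligned || !Probed || !BigEnough)
      return false;
  }
  return true;
}
```

A padding of Align - 16 suffices because a 16-byte aligned address is at most Align - 16 bytes above the next lower Align boundary, so the mask never moves the stack pointer below what was probed.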

Harbormaster completed remote builds in B191817: Diff 467262. Oct 12 2022, 2:34 PM

LGTM

This revision is now accepted and ready to land. Oct 12 2022, 3:17 PM

This revision was landed with ongoing or failed builds. Oct 12 2022, 11:54 PM

Closed by commit rG50e0aced4521: [AArch64] Fix aligning the stack after calling __chkstk (authored by mstorsjo).

This revision was automatically updated to reflect the committed changes.

mstorsjo added a commit: rG50e0aced4521: [AArch64] Fix aligning the stack after calling __chkstk.

mstorsjo added a reverting change: rGf309f095e7c6: Revert "[AArch64] Fix aligning the stack after calling __chkstk". Oct 14 2022, 2:00 AM

mstorsjo reopened this revision. Oct 14 2022, 2:11 AM

This revision is now accepted and ready to land. Oct 14 2022, 2:11 AM

Added a testcase for one case where things broke in practice with the previous form.

Since the previous version, I moved the setting of NeedsRealignment further down in the function. Previously I initialized this variable early in the function, before the first call to windowsRequiresStackProbe. However, NumBytes may be nonzero at this point (making NeedsRealignment true), even if NumBytes would be zero at the point when realignment is done later in the function, prior to the patch.

We don't need to include RealignmentPadding in the first initial check with windowsRequiresStackProbe, since if we're actually going to realign the stack, we do have a stack frame anyway.

Additionally, don't set RealignmentPadding to a nonzero value, if the requested alignment is less than 16. In the added testcase, MFI.getMaxAlign() is 8. (Curiously, for non-Windows OSes, this input actually does result in generating code for ANDing the stack pointer to align it to 8 bytes though.)
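To illustrate why the guard matters, here is a minimal sketch of the padding computation with and without it. The helper names are hypothetical and this is not the code from the patch; it only mirrors the `MFI.getMaxAlign().value() - 16` expression shown in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Padding as computed without the guard: negative when MaxAlign < 16.
int64_t unguardedPadding(uint64_t MaxAlign) {
  return static_cast<int64_t>(MaxAlign) - 16;
}

// Hypothetical guarded form: only pad when realignment is needed and the
// requested alignment exceeds the 16 bytes the stack pointer already has.
int64_t guardedPadding(bool NeedsRealignment, uint64_t MaxAlign) {
  return (NeedsRealignment && MaxAlign > 16)
             ? static_cast<int64_t>(MaxAlign) - 16
             : 0;
}
```

With MaxAlign = 8, as in the added testcase, the unguarded form yields -8, which would shrink the probed amount and feed a negative immediate into the later add; the guarded form yields 0.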

Harbormaster completed remote builds in B192141: Diff 467717. Oct 14 2022, 3:00 AM

mstorsjo requested review of this revision. Oct 14 2022, 4:54 AM

efriedma added inline comments. Oct 14 2022, 10:42 AM

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1651	I'm not sure it actually makes sense to check NumBytes here. If we're forcing realignment with an attribute, we want to make sure it happens even if there aren't any local variables.
1654	I suggested subtracting "16" here on the assumption the stack is actually 16-byte aligned on entry. If we're not assuming the incoming stack is 16-byte aligned, we can't do that. (Theoretically, we could subtract 1, but we can't actually allocate in increments of less than 16.) Normally, we shouldn't be using realignment in the first place if we're only trying to align the stack to 16 bytes, but I guess the "stackrealign" attribute forces us to realign the stack even if we think it's already aligned. (By default, the CPU faults if the alignment of sp is less than 16, so that seems unlikely, but I guess there could be some environment which disables that flag.)

mstorsjo added inline comments. Oct 14 2022, 11:55 AM

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1651	Right, I guess that’s true. However, in the current codepath for non-windows, all alignment is being done within an `if (NumBytes)` condition, so unless there’s further local allocations, it actually won’t realign the stack. Also, the alignment to apply on the stack (from `-mstack-alignment`) doesn’t seem to be taken into account for actually aligning the stack, only the alignment of the objects in the current stack frame. So this does seem like a preexisting bug. It seems like the x86 backend does the right thing here (aligning to the maximum of the objects in the stack frame and the stack’s own alignment), while this code here only looks at the objects.
1654	I don’t really think we need to worry about the case when the incoming stack isn’t aligned to 16 bytes, I guess this is just a degenerate case - while it’s only expected to do anything if the requested alignment is higher than that.

efriedma added inline comments. Oct 14 2022, 1:07 PM

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1651	Not sure I understand the different sources of alignment, but we can leave that for a followup in any case.
1654	In that case, should we just pretend the user didn't request stack realignment if they don't request alignment higher than 16?

mstorsjo added inline comments. Oct 14 2022, 1:53 PM

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1651	The two different cases are these:

    $ cat align1.c
    void other(char *buf);
    void func(void) {
      char buf[100] __attribute__((aligned(64)));
      other(buf);
    }
    $ clang -target aarch64-linux-gnu -S -o - align1.c -O2

    $ cat align2.c
    void other(void);
    void func(void) {
      other();
    }
    $ clang -target aarch64-linux-gnu -S -o - -O2 align2.c -mstack-alignment=64 -mstackrealign

For the first case, we want to allocate a 64-byte aligned object on the stack (which implicitly aligns the stack entirely to 64 bytes at this point).

For the second case, we want to realign the stack to 64 bytes (even though the function itself doesn't allocate anything) before calling another function, so that the second function can assume that the incoming stack is aligned to 64 bytes and not bother with more similar realignments, while maintaining 64 byte alignment.

The second case isn't taken into account at all by the aarch64 target; it optimizes the call into a tail call, and even if it is tweaked so that it doesn't do that, it still doesn't align the stack to 64 bytes. For x86_64, both cases are handled as intended. (I guess there's seldom need for much more alignment than 16 bytes on aarch64 anyway, so there probably hasn't been much need or desire to fix this yet. I guess SVE might benefit from it at some point though?)

So since the aarch64 target essentially doesn't really handle the explicit stack realignment requests properly right now, I think the patch in the current form is as good as it gets on its own - handling the alignment of stack objects correctly, and no longer blowing up on the realignment requests (but doing as little as we do for other targets about it).
1654	I guess we could. Currently this patch only needs to care about whether the alignment is higher than 16 for the windows codepath; we could extend that into the rest of the cases, but I'd leave that to a separate patch.

This seems like progress in the right direction; LGTM

This revision is now accepted and ready to land. Oct 14 2022, 1:59 PM

This revision was landed with ongoing or failed builds. Oct 14 2022, 2:43 PM

Closed by commit rG6eb205b25771: Reapply [AArch64] Fix aligning the stack after calling __chkstk (authored by mstorsjo).

This revision was automatically updated to reflect the committed changes.

mstorsjo added a commit: rG6eb205b25771: Reapply [AArch64] Fix aligning the stack after calling __chkstk.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp	34 lines
llvm/test/CodeGen/AArch64/win-align-chkstk.ll	27 lines

Diff 467369

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp

[... 1,490 lines hidden ...] void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
   // getStackSize() includes all the locals in its size calculation. We don't
   // include these locals when computing the stack size of a funclet, as they
   // are allocated in the parent's stack frame and accessed via the frame
   // pointer from the funclet. We only save the callee saved registers in the
   // funclet, which are really the callee saved registers of the parent
   // function, including the funclet.
   int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
                                : MFI.getStackSize();
-  if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
+  // Alignment is required for the parent frame, not the funclet
+  const bool NeedsRealignment =
+      NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
+  int64_t RealignmentPadding =
+      NeedsRealignment ? MFI.getMaxAlign().value() - 16 : 0;
+
+  if (!AFI->hasStackFrame() &&
+      !windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
     assert(!HasFP && "unexpected function without stack frame but with FP");
     assert(!SVEStackSize &&
            "unexpected function without stack frame but with SVE objects");
     // All of the stack allocation is for locals.
     AFI->setLocalStackSize(NumBytes);
     if (!NumBytes)
       return;
     // REDZONE: If the stack size is less than 128 bytes, we don't need

[... 125 lines hidden ...] void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
   }

   // Now emit the moves for whatever callee saved regs we have (including FP,
   // LR if those are saved). Frame instructions for SVE register are emitted
   // later, after the instruction which actually save SVE regs.
   if (EmitCFI)
     emitCalleeSavedGPRLocations(MBB, MBBI);

-  if (windowsRequiresStackProbe(MF, NumBytes)) {
-    uint64_t NumWords = NumBytes >> 4;
+  if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
+    uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
     if (NeedsWinCFI) {
       HasWinCFI = true;
       // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
       // exceed this amount. We need to move at most 2^24 - 1 into x15.
       // This is at most two instructions, MOVZ follwed by MOVK.
       // TODO: Fix to use multiple stack alloc unwind codes for stacks
       // exceeding 256MB in size.
       if (NumBytes >= (1 << 28))
         report_fatal_error("Stack size cannot exceed 256MB for stack "
                            "unwinding purposes");

       uint32_t LowNumWords = NumWords & 0xFFFF;

[... 71 lines hidden ...] BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
         .setMIFlags(MachineInstr::FrameSetup);
     if (NeedsWinCFI) {
       HasWinCFI = true;
       BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
           .addImm(NumBytes)
           .setMIFlag(MachineInstr::FrameSetup);
     }
     NumBytes = 0;
+
+    if (NeedsRealignment) {
+      BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
+          .addReg(AArch64::SP)
+          .addImm(RealignmentPadding)
+          .addImm(0);
+
+      uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
+      BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
+          .addReg(AArch64::X15, RegState::Kill)
+          .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
+      AFI->setStackRealigned(true);
+
+      // No need for SEH instructions here; if we're realigning the stack,
+      // we've set a frame pointer and already finished the SEH prologue.
+      assert(!NeedsWinCFI);
+    }
   }

   StackOffset AllocateBefore = SVEStackSize, AllocateAfter = {};
   MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;

   // Process the SVE callee-saves to determine what space needs to be
   // allocated.
   if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {

[... 22 lines hidden ...] void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
   emitFrameOffset(MBB, CalleeSavesEnd, DL, AArch64::SP, AArch64::SP,
                   -AllocateAfter, TII, MachineInstr::FrameSetup, false, false,
                   nullptr, EmitCFI && !HasFP && AllocateAfter,
                   AllocateBefore + StackOffset::getFixed(
                                        (int64_t)MFI.getStackSize() - NumBytes));

   // Allocate space for the rest of the frame.
   if (NumBytes) {
-    // Alignment is required for the parent frame, not the funclet
-    const bool NeedsRealignment =
-        !IsFunclet && RegInfo->hasStackRealignment(MF);
     unsigned scratchSPReg = AArch64::SP;

     if (NeedsRealignment) {
       scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);
       assert(scratchSPReg != AArch64::NoRegister);
     }

     // If we're a leaf function, try using the red zone.
[... 2,173 lines hidden ...]

llvm/test/CodeGen/AArch64/win-align-chkstk.ll

This file was added.

; RUN: llc < %s -mtriple=aarch64-windows | FileCheck %s

define dso_local void @func() {
entry:
  %buf = alloca [8192 x i8], align 32
  %arraydecay = getelementptr inbounds [8192 x i8], ptr %buf, i64 0, i64 0
  call void @other(ptr noundef %arraydecay)
  ret void
}

declare dso_local void @other(ptr noundef)

; CHECK-LABEL: func:
; CHECK-NEXT: .seh_proc func
; CHECK-NEXT: // %bb.0:
; CHECK-NEXT: str x28, [sp, #-32]!
; CHECK-NEXT: .seh_save_reg_x x28, 32
; CHECK-NEXT: stp x29, x30, [sp, #8]
; CHECK-NEXT: .seh_save_fplr 8
; CHECK-NEXT: add x29, sp, #8
; CHECK-NEXT: .seh_add_fp 8
; CHECK-NEXT: .seh_endprologue
; CHECK-NEXT: mov x15, #513
; CHECK-NEXT: bl __chkstk
; CHECK-NEXT: sub sp, sp, x15, lsl #4
; CHECK-NEXT: add x15, sp, #16
; CHECK-NEXT: and sp, x15, #0xffffffffffffffe0
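
The three immediates in the last CHECK lines follow from the arithmetic in the patch. The sketch below recomputes them with hypothetical helper names; it assumes that NumBytes at the probe is 8192 (the size of the alloca, with the callee-save area handled separately), which is consistent with the expected `#513` but is not stated in the test itself.

```cpp
#include <cassert>
#include <cstdint>

// Padding added for realignment: MaxAlign - 16 (assumes MaxAlign > 16).
uint64_t realignAddImm(uint64_t MaxAlign) { return MaxAlign - 16; }

// 16-byte units passed to __chkstk in x15: (NumBytes + padding) >> 4.
uint64_t chkstkWords(uint64_t NumBytes, uint64_t MaxAlign) {
  return (NumBytes + realignAddImm(MaxAlign)) >> 4;
}

// Mask used by the final "and sp, x15, #mask": ~(MaxAlign - 1).
uint64_t realignAndMask(uint64_t MaxAlign) { return ~(MaxAlign - 1); }
```

For NumBytes = 8192 and MaxAlign = 32 this gives 513 for the `mov x15` operand, 16 for the `add` immediate, and 0xffffffffffffffe0 for the `and` mask.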