
Details

Reviewers

Commits

rG42f628c84269: Reapply "[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing."
rGea475c77ff9e: [SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing.

Summary

For a function that saves the backchain while allocating stack space with probing (stack clash protection), R1D is used for both purposes. The probing modifies R1D to obtain a reference value for exiting the loop. To save the original value of R15D as the backchain after the probing loop, the amount subtracted from R1D must therefore be added back.

An additional improvement is to skip the copy of R15 to R1 in this case, since that copy has already been made for the purpose of the backchain.

I am not sure if there is a better way, but this seems ok as long as adding it back is just a single instruction (would there be any other reg available?).

This was discovered incidentally: for a big function with "backchain" and "probe-stack"="inline-asm", csmith/machine-verifier reported that R1D was not live-in to DoneMBB, where the backchain is saved by an STG. This was due to the curious fact that the StackAllocMI is erased *after* recomputeLiveIns(*DoneMBB) is called, and since PROBED_STACKALLOC is marked only as defining R1D, R1D was not live-in.

I started by adding R1D as live into DoneMBB in SystemZFrameLowering::inlineStackProbe(), but then realized that the problem seems to be even worse: R1D is used in the probing loop after first being decremented with LoopAlloc. That means that the backchain value is in fact not the incoming R15D anymore in this case.
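The problem described in the summary can be sketched with a toy model. This is only an illustration, not the code from SystemZFrameLowering.cpp: the `Regs` struct, the function name `probeSharingR1`, and the `AddBack` flag are hypothetical, and the probe itself (the volatile compare) is left out.

```cpp
#include <cassert>
#include <cstdint>

// Toy register file holding only the registers this sketch needs.
struct Regs {
  uint64_t R1 = 0, R15 = 0;
};

// Models a prologue where R1 is both the backchain copy and the loop exit
// value. Returns what a following "stg %r1, 0(%r15)" would store.
// AddBack models the fix proposed in the summary: re-add LoopAlloc to R1.
uint64_t probeSharingR1(Regs &R, uint64_t ProbeSize, uint64_t NumFullBlocks,
                        bool AddBack) {
  assert(ProbeSize > 0 && NumFullBlocks > 0);
  const uint64_t LoopAlloc = ProbeSize * NumFullBlocks;
  R.R1 = R.R15;            // lgr %r1, %r15 (intended backchain value)
  R.R1 -= LoopAlloc;       // R1 is reused as the loop exit value
  do {
    R.R15 -= ProbeSize;    // allocate one block; the probe is omitted here
  } while (R.R15 > R.R1);  // clgrjh %r15, %r1
  if (AddBack)
    R.R1 += LoopAlloc;     // restore the incoming R15 value in R1
  return R.R1;
}
```

With an incoming R15 of 0x100000, a probe size of 4096, and 7 full blocks, the model returns 0xF9000 without the add-back (the new stack pointer, not the caller's) and 0x100000 with it.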

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jonpa created this revision. Dec 7 2020, 5:04 PM

Herald added a subscriber: hiraditya. Dec 7 2020, 5:04 PM

jonpa requested review of this revision. Dec 7 2020, 5:04 PM

Herald added a project: Restricted Project. Dec 7 2020, 5:04 PM

Herald added a subscriber: llvm-commits.

I'm not sure I like the implicit assumptions the patch makes between inlineStackProbe and emitPrologue.

I think the code would be clearer if manipulation of the backchain were moved next to manipulation of r15. That is to say, emitPrologue should only store the backchain in the case where it itself updates r15. In the case where the r15 update happens in inlineStackProbe then storing the backchain should also happen in inlineStackProbe.

Then the code in inlineStackProbe can be self-contained and use r1 however it likes. If it helps, the code might also want to use r0 as a second temporary register to hold the backchain value temporarily. Also, strictly speaking the backchain ought to be stored every time the stack pointer is updated, so it could be stored every time through the loop. (Then the store would also implicitly serve as a probe.)

Updated per review. R0D is now used for the loop exit check while probing.
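A minimal sketch of the updated register usage, assuming a toy register model: R0 carries the loop exit value while R1 keeps the incoming stack pointer for the backchain store. The `Regs` struct and the name `probeWithR0Exit` are hypothetical, and the probe instruction is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Toy register file holding only the registers this sketch needs.
struct Regs {
  uint64_t R0 = 0, R1 = 0, R15 = 0;
};

// Models the probing loop with R0 as the exit value. Returns the value that
// a following "stg %r1, 0(%r15)" would store as the backchain.
uint64_t probeWithR0Exit(Regs &R, uint64_t ProbeSize, uint64_t NumFullBlocks) {
  assert(ProbeSize > 0 && NumFullBlocks > 0);
  R.R1 = R.R15;                       // lgr %r1, %r15 (backchain)
  R.R0 = R.R15;                       // lgr %r0, %r15
  R.R0 -= ProbeSize * NumFullBlocks;  // aghi %r0, -LoopAlloc
  do {
    R.R15 -= ProbeSize;               // aghi %r15, -ProbeSize; probe omitted
  } while (R.R15 > R.R0);             // clgrjh %r15, %r0
  return R.R1;                        // still the incoming R15
}
```

With a probe size of 4096 and 7 full blocks (28672 bytes, as in the fun9 test of this revision), the stack pointer ends 28672 bytes lower while the returned backchain value is the unchanged incoming stack pointer.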

Hmm... now that R0D is used for the loop exit, and R1D is used for the backchain, perhaps the backchain actually could be handled just in emitPrologue()?

I think it still makes sense to have the backchain store local in inlineStackProbe.

In fact, I think it would be best to have the backchain store in every iteration of the loop, i.e. to do the store in allocateAndProbe (of course that means the store then implicitly acts as a probe, so we don't need the volatile compare any more if we have a backchain).

// The back chain is stored topmost with packed-stack.
int Offset = usePackedStack(MF) ? SystemZMC::CallFrameSize - 8 : 0;

Given that this is now duplicated, maybe it would make sense to have that logic in a separate function "getBackchainOffset(MF)" or the like.
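A standalone sketch of what such a helper could look like. The name `getBackchainOffset` comes from the suggestion above; everything else is an assumption for illustration: the helper takes a plain `bool` instead of the MachineFunction, and the constant 160 stands in for SystemZMC::CallFrameSize (the CFA offsets in this revision's tests, such as 28672 + 160 = 28832, are consistent with that value).

```cpp
// Assumption: stands in for SystemZMC::CallFrameSize.
constexpr int CallFrameSize = 160;

// Hypothetical helper: offset from the stack pointer at which the backchain
// is stored. With packed-stack it is the topmost 8-byte slot of the call
// frame; otherwise it is at offset 0.
constexpr int getBackchainOffset(bool UsePackedStack) {
  return UsePackedStack ? CallFrameSize - 8 : 0;
}

static_assert(getBackchainOffset(false) == 0, "default layout");
static_assert(getBackchainOffset(true) == 152, "packed-stack layout");
```

Each site that currently repeats the ternary expression would then call the helper instead, so the packed-stack rule lives in one place.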

In D92803#2443137, @uweigand wrote:

I think it still makes sense to have the backchain store local in inlineStackProbe.

In fact, I think it would be best to have the backchain store in every iteration of the loop, i.e. to do the store in allocateAndProbe (of course that means the store then implicitly acts as a probe, so we don't need the volatile compare any more if we have a backchain).

I remember there was an issue with "store tags" which we are handling for instance when we do loop-unrolling. But maybe that is not an issue any more on newer machines (and maybe we don't need to consider that in unrolling then either)?

This suggestion wasn't for performance reasons, but correctness. In theory, when using the backchain, it should be updated on every change to %r15 so that %r15 at all times points to a valid backchain value. Otherwise, unwinding using the backchain will randomly fail when starting at some PC where the stack pointer has been updated but the backchain not yet written.

However, thinking about this again, it probably doesn't matter in this case since there's nothing in this loop where we might be triggering unwinding. In any case, GCC also doesn't seem to update the backchain each time through the loop, so we're probably fine without.
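The unwinding concern discussed above can be illustrated with a toy model of a backchain walk. This is a sketch under stated assumptions, not how any real unwinder is implemented: memory is a map from slot address to stored value, the backchain sits at offset 0 (no packed-stack), and the names `Memory` and `walkBackchain` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy memory: address of a backchain slot -> 8-byte value stored there.
using Memory = std::map<uint64_t, uint64_t>;

// Follows backchain links starting at SP and records each visited frame.
// Returns true if the walk ends at a zero link (outermost frame) and false
// if it reaches a slot that was never written. No cycle detection.
bool walkBackchain(const Memory &Mem, uint64_t SP,
                   std::vector<uint64_t> &Frames) {
  assert(SP != 0 && "walk must start at a valid stack pointer");
  while (SP != 0) {
    Frames.push_back(SP);
    auto It = Mem.find(SP);
    if (It == Mem.end())
      return false;  // stack pointer moved, backchain not yet stored
    SP = It->second;
  }
  return true;
}
```

If the caller's frame is at 0x10000 with a zero link and the callee has moved the stack pointer to 0xF000 without storing the backchain yet, the walk from 0xF000 fails after one frame; once 0x10000 is stored at 0xF000, the walk visits both frames and succeeds.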

So I think this patch LGTM now.

This suggestion:

// The back chain is stored topmost with packed-stack.
int Offset = usePackedStack(MF) ? SystemZMC::CallFrameSize - 8 : 0;

Given that this is now duplicated, maybe it would make sense to have that logic in a separate function "getBackchainOffset(MF)" or the like.

is still valid, but can be done in a separate patch. Looks like there are other places where this offset calculation is duplicated; they should really all be merged.

B.t.w. it seems lowerDYNAMIC_STACKALLOC and lowerSTACKRESTORE always use 0 as backchain offset, so those would be incorrect with the kernel backchain? I guess the kernel rarely uses dynamic stack allocation, but it seems those two places still ought to be fixed.

This revision is now accepted and ready to land. Dec 10 2020, 2:13 AM

Closed by commit rGea475c77ff9e: [SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing. (authored by jonpa). Dec 10 2020, 1:07 PM

This revision was automatically updated to reflect the committed changes.

jonpa added a commit: rGea475c77ff9e: [SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing..

jonpa added a reverting change: rGbc7a61b70360: Revert "[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing.". Dec 10 2020, 4:07 PM

Sorry - had to revert patch since the live-in lists had not been handled properly.

I think the right thing to do is to compute them after all else - the saving of the backchain and removal of the pseudo.
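Why the ordering matters can be shown with a simplified model. This is not LLVM's recomputeLiveIns, which also takes successor blocks into account; it only computes the registers read before being written inside one block. The `Inst` struct and the function name `upwardExposedUses` are hypothetical, and the pseudo is modeled as defining only R1D, as stated in the summary.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy instruction: just the registers it reads and writes.
struct Inst {
  std::vector<std::string> Uses;
  std::vector<std::string> Defs;
};

// Registers that are read in the block before any instruction in the block
// writes them. A simplification that ignores liveness from successors.
std::set<std::string> upwardExposedUses(const std::vector<Inst> &Block) {
  std::set<std::string> Defined, LiveIn;
  for (const Inst &I : Block) {
    for (const std::string &U : I.Uses)
      if (!Defined.count(U))
        LiveIn.insert(U);
    for (const std::string &D : I.Defs)
      Defined.insert(D);
  }
  return LiveIn;
}
```

If the block holds the pseudo (defining R1D) followed by the STG (using R1D and R15D), only R15D comes out as live-in. After the pseudo is erased, R1D is live-in as well, which is why the lists should be computed last.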

jonpa reopened this revision. Dec 10 2020, 5:20 PM

This revision is now accepted and ready to land. Dec 10 2020, 5:20 PM

jonpa requested review of this revision. Dec 10 2020, 5:21 PM

Ah, OK. Looks good to me.

This revision is now accepted and ready to land. Dec 11 2020, 12:29 AM

Closed by commit rG42f628c84269: Reapply "[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing." (authored by jonpa). Dec 11 2020, 4:30 PM

This revision was automatically updated to reflect the committed changes.

jonpa added a commit: rG42f628c84269: Reapply "[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing.".

Diff 311335

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp

[...]	if (HasStackObject || MFFrame.hasCalls())
StackSize += SystemZMC::CallFrameSize;		StackSize += SystemZMC::CallFrameSize;
// Don't allocate the incoming reg save area.		// Don't allocate the incoming reg save area.
StackSize = StackSize > SystemZMC::CallFrameSize		StackSize = StackSize > SystemZMC::CallFrameSize
? StackSize - SystemZMC::CallFrameSize		? StackSize - SystemZMC::CallFrameSize
: 0;		: 0;
MFFrame.setStackSize(StackSize);		MFFrame.setStackSize(StackSize);

if (StackSize) {		if (StackSize) {
// Determine if we want to store a backchain.
bool StoreBackchain = MF.getFunction().hasFnAttribute("backchain");

// If we need backchain, save current stack pointer. R1 is free at this
// point.
if (StoreBackchain)
BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::LGR))
.addReg(SystemZ::R1D, RegState::Define).addReg(SystemZ::R15D);

// Allocate StackSize bytes.		// Allocate StackSize bytes.
int64_t Delta = -int64_t(StackSize);		int64_t Delta = -int64_t(StackSize);
const unsigned ProbeSize = TLI.getStackProbeSize(MF);		const unsigned ProbeSize = TLI.getStackProbeSize(MF);
bool FreeProbe = (ZFI->getSpillGPRRegs().GPROffset &&		bool FreeProbe = (ZFI->getSpillGPRRegs().GPROffset &&
(ZFI->getSpillGPRRegs().GPROffset + StackSize) < ProbeSize);		(ZFI->getSpillGPRRegs().GPROffset + StackSize) < ProbeSize);
if (!FreeProbe &&		if (!FreeProbe &&
MF.getSubtarget().getTargetLowering()->hasInlineStackProbe(MF)) {		MF.getSubtarget().getTargetLowering()->hasInlineStackProbe(MF)) {
// Stack probing may involve looping, but splitting the prologue block		// Stack probing may involve looping, but splitting the prologue block
// is not possible at this point since it would invalidate the		// is not possible at this point since it would invalidate the
// SaveBlocks / RestoreBlocks sets of PEI in the single block function		// SaveBlocks / RestoreBlocks sets of PEI in the single block function
// case. Build a pseudo to be handled later by inlineStackProbe().		// case. Build a pseudo to be handled later by inlineStackProbe().
BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::PROBED_STACKALLOC))		BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::PROBED_STACKALLOC))
.addImm(StackSize);		.addImm(StackSize);
}		}
else {		else {
		bool StoreBackchain = MF.getFunction().hasFnAttribute("backchain");
		// If we need backchain, save current stack pointer. R1 is free at
		// this point.
		if (StoreBackchain)
		BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::LGR))
		.addReg(SystemZ::R1D, RegState::Define).addReg(SystemZ::R15D);
emitIncrement(MBB, MBBI, DL, SystemZ::R15D, Delta, ZII);		emitIncrement(MBB, MBBI, DL, SystemZ::R15D, Delta, ZII);
buildCFAOffs(MBB, MBBI, DL, SPOffsetFromCFA + Delta, ZII);		buildCFAOffs(MBB, MBBI, DL, SPOffsetFromCFA + Delta, ZII);
}
SPOffsetFromCFA += Delta;

if (StoreBackchain) {		if (StoreBackchain) {
// The back chain is stored topmost with packed-stack.		// The back chain is stored topmost with packed-stack.
int Offset = usePackedStack(MF) ? SystemZMC::CallFrameSize - 8 : 0;		int Offset = usePackedStack(MF) ? SystemZMC::CallFrameSize - 8 : 0;
BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG))		BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG))
.addReg(SystemZ::R1D, RegState::Kill).addReg(SystemZ::R15D)		.addReg(SystemZ::R1D, RegState::Kill).addReg(SystemZ::R15D)
.addImm(Offset).addReg(0);		.addImm(Offset).addReg(0);
}		}
}		}
		SPOffsetFromCFA += Delta;
		}

if (HasFP) {		if (HasFP) {
// Copy the base of the frame to R11.		// Copy the base of the frame to R11.
BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::LGR), SystemZ::R11D)		BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::LGR), SystemZ::R11D)
.addReg(SystemZ::R15D);		.addReg(SystemZ::R15D);

// Add CFI for the new frame location.		// Add CFI for the new frame location.
buildDefCFAReg(MBB, MBBI, DL, SystemZ::R11D, ZII);		buildDefCFAReg(MBB, MBBI, DL, SystemZ::R11D, ZII);
[...]	auto allocateAndProbe = [&](MachineBasicBlock &InsMBB,
MachineMemOperand *MMO = MF.getMachineMemOperand(MachinePointerInfo(),		MachineMemOperand *MMO = MF.getMachineMemOperand(MachinePointerInfo(),
MachineMemOperand::MOVolatile | MachineMemOperand::MOLoad, 8, Align(1));		MachineMemOperand::MOVolatile | MachineMemOperand::MOLoad, 8, Align(1));
BuildMI(InsMBB, InsPt, DL, ZII->get(SystemZ::CG))		BuildMI(InsMBB, InsPt, DL, ZII->get(SystemZ::CG))
.addReg(SystemZ::R0D, RegState::Undef)		.addReg(SystemZ::R0D, RegState::Undef)
.addReg(SystemZ::R15D).addImm(Size - 8).addReg(0)		.addReg(SystemZ::R15D).addImm(Size - 8).addReg(0)
.addMemOperand(MMO);		.addMemOperand(MMO);
};		};

		bool StoreBackchain = MF.getFunction().hasFnAttribute("backchain");
		if (StoreBackchain)
		BuildMI(*MBB, MBBI, DL, ZII->get(SystemZ::LGR))
		.addReg(SystemZ::R1D, RegState::Define).addReg(SystemZ::R15D);

		MachineBasicBlock *DoneMBB = nullptr;
		MachineBasicBlock *LoopMBB = nullptr;
if (NumFullBlocks < 3) {		if (NumFullBlocks < 3) {
// Emit unrolled probe statements.		// Emit unrolled probe statements.
for (unsigned int i = 0; i < NumFullBlocks; i++)		for (unsigned int i = 0; i < NumFullBlocks; i++)
allocateAndProbe(*MBB, MBBI, ProbeSize, true/*EmitCFI*/);		allocateAndProbe(*MBB, MBBI, ProbeSize, true/*EmitCFI*/);
} else {		} else {
// Emit a loop probing the pages.		// Emit a loop probing the pages.
uint64_t LoopAlloc = ProbeSize * NumFullBlocks;		uint64_t LoopAlloc = ProbeSize * NumFullBlocks;
SPOffsetFromCFA -= LoopAlloc;		SPOffsetFromCFA -= LoopAlloc;

BuildMI(*MBB, MBBI, DL, ZII->get(SystemZ::LGR), SystemZ::R1D)		// Use R0D to hold the exit value.
		BuildMI(*MBB, MBBI, DL, ZII->get(SystemZ::LGR), SystemZ::R0D)
.addReg(SystemZ::R15D);		.addReg(SystemZ::R15D);
buildDefCFAReg(*MBB, MBBI, DL, SystemZ::R1D, ZII);		buildDefCFAReg(*MBB, MBBI, DL, SystemZ::R0D, ZII);
emitIncrement(*MBB, MBBI, DL, SystemZ::R1D, -int64_t(LoopAlloc), ZII);		emitIncrement(*MBB, MBBI, DL, SystemZ::R0D, -int64_t(LoopAlloc), ZII);
buildCFAOffs(*MBB, MBBI, DL, -int64_t(SystemZMC::CallFrameSize + LoopAlloc),		buildCFAOffs(*MBB, MBBI, DL, -int64_t(SystemZMC::CallFrameSize + LoopAlloc),
ZII);		ZII);

MachineBasicBlock *DoneMBB = SystemZ::splitBlockBefore(MBBI, MBB);		DoneMBB = SystemZ::splitBlockBefore(MBBI, MBB);
MachineBasicBlock *LoopMBB = SystemZ::emitBlockAfter(MBB);		LoopMBB = SystemZ::emitBlockAfter(MBB);
MBB->addSuccessor(LoopMBB);		MBB->addSuccessor(LoopMBB);
LoopMBB->addSuccessor(LoopMBB);		LoopMBB->addSuccessor(LoopMBB);
LoopMBB->addSuccessor(DoneMBB);		LoopMBB->addSuccessor(DoneMBB);

MBB = LoopMBB;		MBB = LoopMBB;
allocateAndProbe(*MBB, MBB->end(), ProbeSize, false/*EmitCFI*/);		allocateAndProbe(*MBB, MBB->end(), ProbeSize, false/*EmitCFI*/);
BuildMI(*MBB, MBB->end(), DL, ZII->get(SystemZ::CLGR))		BuildMI(*MBB, MBB->end(), DL, ZII->get(SystemZ::CLGR))
.addReg(SystemZ::R15D).addReg(SystemZ::R1D);		.addReg(SystemZ::R15D).addReg(SystemZ::R0D);
BuildMI(*MBB, MBB->end(), DL, ZII->get(SystemZ::BRC))		BuildMI(*MBB, MBB->end(), DL, ZII->get(SystemZ::BRC))
.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_GT).addMBB(MBB);		.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_GT).addMBB(MBB);

MBB = DoneMBB;		MBB = DoneMBB;
MBBI = DoneMBB->begin();		MBBI = DoneMBB->begin();
buildDefCFAReg(*MBB, MBBI, DL, SystemZ::R15D, ZII);		buildDefCFAReg(*MBB, MBBI, DL, SystemZ::R15D, ZII);

recomputeLiveIns(*DoneMBB);
recomputeLiveIns(*LoopMBB);
}		}

if (Residual)		if (Residual)
allocateAndProbe(*MBB, MBBI, Residual, true/*EmitCFI*/);		allocateAndProbe(*MBB, MBBI, Residual, true/*EmitCFI*/);

		if (StoreBackchain) {
		// The back chain is stored topmost with packed-stack.
		int Offset = usePackedStack(MF) ? SystemZMC::CallFrameSize - 8 : 0;
		BuildMI(*MBB, MBBI, DL, ZII->get(SystemZ::STG))
		.addReg(SystemZ::R1D, RegState::Kill).addReg(SystemZ::R15D)
		.addImm(Offset).addReg(0);
		}

StackAllocMI->eraseFromParent();		StackAllocMI->eraseFromParent();
		if (DoneMBB != nullptr) {
		// Compute the live-in lists for the new blocks.
		recomputeLiveIns(*DoneMBB);
		recomputeLiveIns(*LoopMBB);
		}
}		}

bool SystemZFrameLowering::hasFP(const MachineFunction &MF) const {		bool SystemZFrameLowering::hasFP(const MachineFunction &MF) const {
return (MF.getTarget().Options.DisableFramePointerElim(MF) ||		return (MF.getTarget().Options.DisableFramePointerElim(MF) ||
MF.getFrameInfo().hasVarSizedObjects() ||		MF.getFrameInfo().hasVarSizedObjects() ||
MF.getInfo<SystemZMachineFunctionInfo>()->getManipulatesSP());		MF.getInfo<SystemZMachineFunctionInfo>()->getManipulatesSP());
}		}

[...]

llvm/test/CodeGen/SystemZ/stack-clash-dynamic-alloca.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z14 | FileCheck %s			; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z14 -verify-machineinstrs | FileCheck %s

	define i32 @fun0(i32 %n) #0 {			define i32 @fun0(i32 %n) #0 {
	; CHECK-LABEL: fun0:			; CHECK-LABEL: fun0:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: stmg %r11, %r15, 88(%r15)			; CHECK-NEXT: stmg %r11, %r15, 88(%r15)
	; CHECK-NEXT: .cfi_offset %r11, -72			; CHECK-NEXT: .cfi_offset %r11, -72
	; CHECK-NEXT: .cfi_offset %r15, -40			; CHECK-NEXT: .cfi_offset %r15, -40
	; CHECK-NEXT: aghi %r15, -160			; CHECK-NEXT: aghi %r15, -160
	[...]

	; The minimum probe size is the stack alignment.			; The minimum probe size is the stack alignment.
	define i32 @fun2(i32 %n) #0 "stack-probe-size"="4" {			define i32 @fun2(i32 %n) #0 "stack-probe-size"="4" {
	; CHECK-LABEL: fun2:			; CHECK-LABEL: fun2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: stmg %r11, %r15, 88(%r15)			; CHECK-NEXT: stmg %r11, %r15, 88(%r15)
	; CHECK-NEXT: .cfi_offset %r11, -72			; CHECK-NEXT: .cfi_offset %r11, -72
	; CHECK-NEXT: .cfi_offset %r15, -40			; CHECK-NEXT: .cfi_offset %r15, -40
	; CHECK-NEXT: lgr %r1, %r15			; CHECK-NEXT: lgr %r0, %r15
	; CHECK-NEXT: .cfi_def_cfa_register %r1			; CHECK-NEXT: .cfi_def_cfa_register %r0
	; CHECK-NEXT: aghi %r1, -160			; CHECK-NEXT: aghi %r0, -160
	; CHECK-NEXT: .cfi_def_cfa_offset 320			; CHECK-NEXT: .cfi_def_cfa_offset 320
	; CHECK-NEXT: .LBB2_1: # =>This Inner Loop Header: Depth=1			; CHECK-NEXT: .LBB2_1: # =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: aghi %r15, -8			; CHECK-NEXT: aghi %r15, -8
	; CHECK-NEXT: cg %r0, 0(%r15)			; CHECK-NEXT: cg %r0, 0(%r15)
	; CHECK-NEXT: clgrjh %r15, %r1, .LBB2_1			; CHECK-NEXT: clgrjh %r15, %r0, .LBB2_1
	; CHECK-NEXT: # %bb.2:			; CHECK-NEXT: # %bb.2:
	; CHECK-NEXT: .cfi_def_cfa_register %r15			; CHECK-NEXT: .cfi_def_cfa_register %r15
	; CHECK-NEXT: lgr %r11, %r15			; CHECK-NEXT: lgr %r11, %r15
	; CHECK-NEXT: .cfi_def_cfa_register %r11			; CHECK-NEXT: .cfi_def_cfa_register %r11
	; CHECK-NEXT: # kill: def $r2l killed $r2l def $r2d			; CHECK-NEXT: # kill: def $r2l killed $r2l def $r2d
	; CHECK-NEXT: risbgn %r1, %r2, 30, 189, 2			; CHECK-NEXT: risbgn %r1, %r2, 30, 189, 2
	; CHECK-NEXT: la %r0, 7(%r1)			; CHECK-NEXT: la %r0, 7(%r1)
	; CHECK-NEXT: risbgn %r1, %r0, 29, 188, 0			; CHECK-NEXT: risbgn %r1, %r0, 29, 188, 0
	[...]

llvm/test/CodeGen/SystemZ/stack-clash-protection.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z14 -O3 | FileCheck %s		; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z14 -O3 -verify-machineinstrs | FileCheck %s
;		;
; Test stack clash protection probing for static allocas.		; Test stack clash protection probing for static allocas.

; Small: one probe.		; Small: one probe.
define i32 @fun0() #0 {		define i32 @fun0() #0 {
; CHECK-LABEL: fun0:		; CHECK-LABEL: fun0:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: aghi %r15, -560		; CHECK-NEXT: aghi %r15, -560
[...]	; CHECK-NEXT: br %r14
%c = load volatile i32, i32* %a		%c = load volatile i32, i32* %a
ret i32 %c		ret i32 %c
}		}

; Large: Use a loop to allocate and probe in steps.		; Large: Use a loop to allocate and probe in steps.
define i32 @fun2() #0 {		define i32 @fun2() #0 {
; CHECK-LABEL: fun2:		; CHECK-LABEL: fun2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: lgr %r1, %r15		; CHECK-NEXT: lgr %r0, %r15
; CHECK-NEXT: .cfi_def_cfa_register %r1		; CHECK-NEXT: .cfi_def_cfa_register %r0
; CHECK-NEXT: agfi %r1, -69632		; CHECK-NEXT: agfi %r0, -69632
; CHECK-NEXT: .cfi_def_cfa_offset 69792		; CHECK-NEXT: .cfi_def_cfa_offset 69792
; CHECK-NEXT: .LBB2_1: # =>This Inner Loop Header: Depth=1		; CHECK-NEXT: .LBB2_1: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: aghi %r15, -4096		; CHECK-NEXT: aghi %r15, -4096
; CHECK-NEXT: cg %r0, 4088(%r15)		; CHECK-NEXT: cg %r0, 4088(%r15)
; CHECK-NEXT: clgrjh %r15, %r1, .LBB2_1		; CHECK-NEXT: clgrjh %r15, %r0, .LBB2_1
; CHECK-NEXT: # %bb.2:		; CHECK-NEXT: # %bb.2:
; CHECK-NEXT: .cfi_def_cfa_register %r15		; CHECK-NEXT: .cfi_def_cfa_register %r15
; CHECK-NEXT: aghi %r15, -2544		; CHECK-NEXT: aghi %r15, -2544
; CHECK-NEXT: .cfi_def_cfa_offset 72336		; CHECK-NEXT: .cfi_def_cfa_offset 72336
; CHECK-NEXT: cg %r0, 2536(%r15)		; CHECK-NEXT: cg %r0, 2536(%r15)
; CHECK-NEXT: lhi %r0, 1		; CHECK-NEXT: lhi %r0, 1
; CHECK-NEXT: mvhi 568(%r15), 1		; CHECK-NEXT: mvhi 568(%r15), 1
; CHECK-NEXT: sty %r0, 28968(%r15)		; CHECK-NEXT: sty %r0, 28968(%r15)
[...]	; CHECK-NEXT: br %r14
%c = load volatile i32, i32* %a		%c = load volatile i32, i32* %a
ret i32 %c		ret i32 %c
}		}

; Ends evenly on the step so no remainder needed.		; Ends evenly on the step so no remainder needed.
define void @fun3() #0 {		define void @fun3() #0 {
; CHECK-LABEL: fun3:		; CHECK-LABEL: fun3:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lgr %r1, %r15		; CHECK-NEXT: lgr %r0, %r15
; CHECK-NEXT: .cfi_def_cfa_register %r1		; CHECK-NEXT: .cfi_def_cfa_register %r0
; CHECK-NEXT: aghi %r1, -28672		; CHECK-NEXT: aghi %r0, -28672
; CHECK-NEXT: .cfi_def_cfa_offset 28832		; CHECK-NEXT: .cfi_def_cfa_offset 28832
; CHECK-NEXT: .LBB3_1: # %entry		; CHECK-NEXT: .LBB3_1: # %entry
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1		; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: aghi %r15, -4096		; CHECK-NEXT: aghi %r15, -4096
; CHECK-NEXT: cg %r0, 4088(%r15)		; CHECK-NEXT: cg %r0, 4088(%r15)
; CHECK-NEXT: clgrjh %r15, %r1, .LBB3_1		; CHECK-NEXT: clgrjh %r15, %r0, .LBB3_1
; CHECK-NEXT: # %bb.2: # %entry		; CHECK-NEXT: # %bb.2: # %entry
; CHECK-NEXT: .cfi_def_cfa_register %r15		; CHECK-NEXT: .cfi_def_cfa_register %r15
; CHECK-NEXT: mvhi 180(%r15), 0		; CHECK-NEXT: mvhi 180(%r15), 0
; CHECK-NEXT: l %r0, 180(%r15)		; CHECK-NEXT: l %r0, 180(%r15)
; CHECK-NEXT: aghi %r15, 28672		; CHECK-NEXT: aghi %r15, 28672
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
entry:		entry:
%stack = alloca [7122 x i32], align 4		%stack = alloca [7122 x i32], align 4
%i = alloca i32, align 4		%i = alloca i32, align 4
%0 = bitcast [7122 x i32]* %stack to i8*		%0 = bitcast [7122 x i32]* %stack to i8*
%i.0.i.0..sroa_cast = bitcast i32* %i to i8*		%i.0.i.0..sroa_cast = bitcast i32* %i to i8*
store volatile i32 0, i32* %i, align 4		store volatile i32 0, i32* %i, align 4
%i.0.i.0.6 = load volatile i32, i32* %i, align 4		%i.0.i.0.6 = load volatile i32, i32* %i, align 4
ret void		ret void
}		}

; Loop with bigger step.		; Loop with bigger step.
define void @fun4() #0 "stack-probe-size"="8192" {		define void @fun4() #0 "stack-probe-size"="8192" {
; CHECK-LABEL: fun4:		; CHECK-LABEL: fun4:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lgr %r1, %r15		; CHECK-NEXT: lgr %r0, %r15
; CHECK-NEXT: .cfi_def_cfa_register %r1		; CHECK-NEXT: .cfi_def_cfa_register %r0
; CHECK-NEXT: aghi %r1, -24576		; CHECK-NEXT: aghi %r0, -24576
; CHECK-NEXT: .cfi_def_cfa_offset 24736		; CHECK-NEXT: .cfi_def_cfa_offset 24736
; CHECK-NEXT: .LBB4_1: # %entry		; CHECK-NEXT: .LBB4_1: # %entry
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1		; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: aghi %r15, -8192		; CHECK-NEXT: aghi %r15, -8192
; CHECK-NEXT: cg %r0, 8184(%r15)		; CHECK-NEXT: cg %r0, 8184(%r15)
; CHECK-NEXT: clgrjh %r15, %r1, .LBB4_1		; CHECK-NEXT: clgrjh %r15, %r0, .LBB4_1
; CHECK-NEXT: # %bb.2: # %entry		; CHECK-NEXT: # %bb.2: # %entry
; CHECK-NEXT: .cfi_def_cfa_register %r15		; CHECK-NEXT: .cfi_def_cfa_register %r15
; CHECK-NEXT: aghi %r15, -7608		; CHECK-NEXT: aghi %r15, -7608
; CHECK-NEXT: .cfi_def_cfa_offset 32344		; CHECK-NEXT: .cfi_def_cfa_offset 32344
; CHECK-NEXT: cg %r0, 7600(%r15)		; CHECK-NEXT: cg %r0, 7600(%r15)
; CHECK-NEXT: mvhi 180(%r15), 0		; CHECK-NEXT: mvhi 180(%r15), 0
; CHECK-NEXT: l %r0, 180(%r15)		; CHECK-NEXT: l %r0, 180(%r15)
; CHECK-NEXT: aghi %r15, 32184		; CHECK-NEXT: aghi %r15, 32184
[...]	entry:
%i.0.i.0.6 = load volatile i32, i32* %i, align 4		%i.0.i.0.6 = load volatile i32, i32* %i, align 4
ret void		ret void
}		}

; The minimum probe size is the stack alignment.		; The minimum probe size is the stack alignment.
define void @fun6() #0 "stack-probe-size"="5" {		define void @fun6() #0 "stack-probe-size"="5" {
; CHECK-LABEL: fun6:		; CHECK-LABEL: fun6:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lgr %r1, %r15		; CHECK-NEXT: lgr %r0, %r15
; CHECK-NEXT: .cfi_def_cfa_register %r1		; CHECK-NEXT: .cfi_def_cfa_register %r0
; CHECK-NEXT: aghi %r1, -4184		; CHECK-NEXT: aghi %r0, -4184
; CHECK-NEXT: .cfi_def_cfa_offset 4344		; CHECK-NEXT: .cfi_def_cfa_offset 4344
; CHECK-NEXT: .LBB6_1: # %entry		; CHECK-NEXT: .LBB6_1: # %entry
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1		; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: aghi %r15, -8		; CHECK-NEXT: aghi %r15, -8
; CHECK-NEXT: cg %r0, 0(%r15)		; CHECK-NEXT: cg %r0, 0(%r15)
; CHECK-NEXT: clgrjh %r15, %r1, .LBB6_1		; CHECK-NEXT: clgrjh %r15, %r0, .LBB6_1
; CHECK-NEXT: # %bb.2: # %entry		; CHECK-NEXT: # %bb.2: # %entry
; CHECK-NEXT: .cfi_def_cfa_register %r15		; CHECK-NEXT: .cfi_def_cfa_register %r15
; CHECK-NEXT: mvhi 180(%r15), 0		; CHECK-NEXT: mvhi 180(%r15), 0
; CHECK-NEXT: l %r0, 180(%r15)		; CHECK-NEXT: l %r0, 180(%r15)
; CHECK-NEXT: aghi %r15, 4184		; CHECK-NEXT: aghi %r15, 4184
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
entry:		entry:
%stack = alloca [1000 x i32], align 4		%stack = alloca [1000 x i32], align 4
[...]	; CHECK-NEXT: br %r14
%v = call i32 @foo()		%v = call i32 @foo()
%a = alloca i32, i64 952		%a = alloca i32, i64 952
%b = getelementptr inbounds i32, i32* %a, i64 200		%b = getelementptr inbounds i32, i32* %a, i64 200
store volatile i32 %v, i32* %b		store volatile i32 %v, i32* %b
%c = load volatile i32, i32* %a		%c = load volatile i32, i32* %a
ret i32 %c		ret i32 %c
}		}

		define void @fun9() #0 "backchain" {
		; CHECK-LABEL: fun9:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: lgr %r1, %r15
		; CHECK-NEXT: lgr %r0, %r15
		; CHECK-NEXT: .cfi_def_cfa_register %r0
		; CHECK-NEXT: aghi %r0, -28672
		; CHECK-NEXT: .cfi_def_cfa_offset 28832
		; CHECK-NEXT: .LBB9_1: # %entry
		; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
		; CHECK-NEXT: aghi %r15, -4096
		; CHECK-NEXT: cg %r0, 4088(%r15)
		; CHECK-NEXT: clgrjh %r15, %r0, .LBB9_1
		; CHECK-NEXT: # %bb.2: # %entry
		; CHECK-NEXT: .cfi_def_cfa_register %r15
		; CHECK-NEXT: stg %r1, 0(%r15)
		; CHECK-NEXT: mvhi 180(%r15), 0
		; CHECK-NEXT: l %r0, 180(%r15)
		; CHECK-NEXT: aghi %r15, 28672
		; CHECK-NEXT: br %r14
		entry:
		%stack = alloca [7122 x i32], align 4
		%i = alloca i32, align 4
		%0 = bitcast [7122 x i32]* %stack to i8*
		%i.0.i.0..sroa_cast = bitcast i32* %i to i8*
		store volatile i32 0, i32* %i, align 4
		%i.0.i.0.6 = load volatile i32, i32* %i, align 4
		ret void
		}


declare i32 @foo()		declare i32 @foo()
attributes #0 = { "probe-stack"="inline-asm" }		attributes #0 = { "probe-stack"="inline-asm" }

This is an archive of the discontinued LLVM Phabricator instance.

[SystemZFrameLowering] Make sure R1 holding the backchain is not corrupted by probing
ClosedPublic
