This is an archive of the discontinued LLVM Phabricator instance.

Differential D98789

[PEI] add dwarf information for stack probe
AbandonedPublic

Authored by YangKeao on Mar 17 2021, 9:14 AM.

Details

Reviewers

serge-sans-paille
nagisa
efriedma
lkail

Summary

While probing the stack, the stack pointer is moved without
DWARF information, which could cause a panic if the backtrace
is unwound at that point. This commit adds DWARF information
for these operations, and uses r11 (instead of rsp) to iterate
over the pages to probe.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

YangKeao created this revision.Mar 17 2021, 9:14 AM

Herald added subscribers: pengfei, hiraditya. Mar 17 2021, 9:14 AM

YangKeao requested review of this revision.Mar 17 2021, 9:14 AM

Herald added a project: Restricted Project. Mar 17 2021, 9:14 AM

Herald added a subscriber: llvm-commits.

There has been a bug report for this on bugzilla. A more "downstream" context for this feature is discussed in rust#83139.

Harbormaster completed remote builds in B94257: Diff 331285.Mar 17 2021, 9:49 AM

nagisa added reviewers: serge-sans-paille, nagisa, efriedma, lkail.Mar 17 2021, 10:51 AM

nagisa added a subscriber: nagisa.

nagisa added inline comments.

llvm/test/CodeGen/X86/stack-clash-unknown-call.ll
18	The `def_cfa_offset` is superfluous here now. The last `cfi` directive should be either the `adjust` or `def`.

Generally, unlike the formats used on Windows etc., DWARF unwind isn't accurate in the prologue. I mean, you could add separate unwind info for each relevant instruction, but that would be a lot of data, and we currently don't have any option to do that.

Given that, I'm not sure what you're trying to accomplish here. I can't see how adding more .cfi_adjust_cfa_offset directives does anything useful.

Err, sorry, please pretend I didn't write that. I have no idea what I just wrote.

To clarify, I've done some more reading now, and figured out where I went wrong. For a long time, LLVM did not emit accurate unwind info to describe the prologue/epilogue (and still doesn't on some targets), so I was under the impression it wasn't possible. Clearly, it is, and it's implemented on x86.

The change to use r11 isn't implemented correctly: we can't adjust the stack pointer until *after* we've probed the relevant pages. It'll appear to work, but it won't actually provide complete protection if a signal handler triggers at the wrong time.
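One way to read this objection is as an invariant: at every instruction boundary, the stack pointer should never sit more than one probe interval below the lowest address already touched, because a signal handler may start writing below it at any time. The following is a toy model of that invariant, not LLVM code; `Op`, `maxUnprobedGap`, `moveThenProbe`, and `probeAsYouGo` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model, not LLVM code. One prologue step: either move the stack pointer
// down by `value` bytes, or touch (probe) the address sp + `value`.
struct Op {
  enum Kind { Sub, Probe } kind;
  uint64_t value;
};

// Largest distance that ever opens up between the lowest address touched so
// far and the stack pointer, checked after every step (that is, at every
// point where a signal handler could start pushing its frame below sp).
uint64_t maxUnprobedGap(const std::vector<Op> &ops) {
  uint64_t sp = uint64_t{1} << 20; // arbitrary starting stack pointer
  uint64_t lowestTouched = sp;     // assume everything at or above sp is mapped
  uint64_t worst = 0;
  for (const Op &op : ops) {
    if (op.kind == Op::Sub) {
      assert(op.value <= sp && "toy stack underflow");
      sp -= op.value;
    } else {
      lowestTouched = std::min(lowestTouched, sp + op.value);
    }
    if (sp < lowestTouched)
      worst = std::max(worst, lowestTouched - sp);
  }
  return worst;
}

// Moves sp past all pages first, then probes them from the top down.
std::vector<Op> moveThenProbe(uint64_t pages, uint64_t pageSize) {
  std::vector<Op> ops;
  ops.push_back({Op::Sub, pages * pageSize});
  for (uint64_t i = pages; i-- > 0;)
    ops.push_back({Op::Probe, i * pageSize});
  return ops;
}

// Moves sp one page at a time and probes after each move.
std::vector<Op> probeAsYouGo(uint64_t pages, uint64_t pageSize) {
  std::vector<Op> ops;
  for (uint64_t i = 0; i < pages; ++i) {
    ops.push_back({Op::Sub, pageSize});
    ops.push_back({Op::Probe, 0});
  }
  return ops;
}
```

In this model, `moveThenProbe(3, 4096)` leaves a 12288-byte window below untouched memory right after the first step, while `probeAsYouGo(3, 4096)` never exceeds 4096 bytes. Both sequences touch the same pages in the end, which is why the first one can appear to work.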

In D98789#2632702, @efriedma wrote:

To clarify, I've done some more reading now, and figured out where I went wrong. For a long time, LLVM did not emit accurate unwind info to describe the prologue/epilogue (and still doesn't on some targets), so I was under the impression it wasn't possible. Clearly, it is, and it's implemented on x86.

The change to use r11 isn't implemented correctly: we can't adjust the stack pointer until *after* we've probed the relevant pages. It'll appear to work, but it won't actually provide complete protection if a signal handler triggers at the wrong time.

Makes sense. I tried to use r11 + offset to represent the CFA temporarily. However, r11d cannot be used as a DWARF register on x86_32. Can I use another register (like di) here?

Use rdi to represent the stack bound and CFA
Remove extra trailing offset adjust

YangKeao marked an inline comment as done.Mar 18 2021, 12:44 AM

YangKeao added inline comments.

llvm/test/CodeGen/X86/stack-clash-unknown-call.ll
18	The `def_cfa_offset` is superfluous here now. The last `cfi` directive should be either the `adjust` or `def`.
18	Fixed by removing the last `adjust`

Harbormaster completed remote builds in B94406: Diff 331477.Mar 18 2021, 1:17 AM

reformat the code

Harbormaster completed remote builds in B94420: Diff 331503.Mar 18 2021, 4:01 AM

Makes sense. I tried to use r11 + offset to represent the CFA temporarily. However, r11d cannot be used as a DWARF register on x86_32.

What registers can be used? I did a quick search and couldn't find anything.

Can I use another register (like di) here?

Any register that isn't callee-preserved could be used (or any register in general if it's spilled first), assuming it isn't used for something else already. I did a quick search in an effort to figure out which registers are callee-saved on x86, but couldn't find anything definitive :(

This makes me wonder, though: why not leave selection of the register to use here to regalloc?

llvm/test/CodeGen/X86/stack-clash-large.ll
22	I… think this wants to be a `def_cfa_offset`? `def_cfa_register` does not reset the offset, so it's not at all obvious what this is offsetting from. Alternatively, there's a form that combines setting the new register and the offset into a single directive: .cfi_def_cfa %rdi, 69632
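The difference between these directives can be sketched with a toy model of the rule "CFA = register + offset". This is only an illustration of the assembler directive semantics, not LLVM's `MCCFIInstruction`; `CfaRule` and the helper names are hypothetical, and the starting rule `rsp + 8` in the usage note is an assumed function-entry state on x86-64.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy model of the DWARF rule "CFA = reg + offset".
struct CfaRule {
  std::string reg;
  int64_t offset;
};

// .cfi_def_cfa_register REG: switch the base register, keep the offset.
void defCfaRegister(CfaRule &rule, const std::string &reg) { rule.reg = reg; }

// .cfi_def_cfa_offset N: keep the register, replace the offset.
void defCfaOffset(CfaRule &rule, int64_t offset) { rule.offset = offset; }

// .cfi_adjust_cfa_offset N: keep the register, add N to the offset.
void adjustCfaOffset(CfaRule &rule, int64_t delta) { rule.offset += delta; }

// .cfi_def_cfa REG, N: replace both the register and the offset at once.
void defCfa(CfaRule &rule, const std::string &reg, int64_t offset) {
  rule.reg = reg;
  rule.offset = offset;
}
```

Starting from `{"rsp", 8}`, `defCfaRegister(rule, "rax")` followed by `adjustCfaOffset(rule, 69632)` yields `rax + 69640`, because the earlier offset is carried over. `defCfa(rule, "rax", 69632)` yields `rax + 69632` regardless of what came before, so the two forms agree only when the previous offset is accounted for explicitly.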

What registers can be used? I did a quick search and couldn't find anything.

I found the available dwarf registers under the x86 register table in LLVM (X86RegisterInfo.td).

Any register that isn't callee-preserved could be used (or any register in general if it's spilled first), assuming it isn't used for something else already. I did a quick search in an effort to figure out which registers are callee-saved on x86, but couldn't find anything definitive :(

It seems that different platforms have different sets of callee-preserved registers. I found them in the lists in LLVM (which are referred to in the function getCalleeSavedRegs of llvm/lib/Target/X86/X86RegisterInfo.cpp). Sadly, RDI turns out to be callee-saved on some of them (Win64, per the lists below).

Considering only SysV and Windows (64-bit), the callee-saved registers are:

SysV: X86::RBX, X86::R12, X86::R13, X86::R14, X86::R15, X86::RBP
Win64: X86::RBX, X86::RBP, X86::RDI, X86::RSI, X86::R12, X86::R13, X86::R14, X86::R15, X86::XMM6, X86::XMM7, X86::XMM8, X86::XMM9, X86::XMM10, X86::XMM11, X86::XMM12, X86::XMM13, X86::XMM14, X86::XMM15

And all available DWARF registers are:

32-bit:
EAX, EDX, ECX, EBX, ESI, EDI, EBP, ESP, EIP
64-bit (x86-64):
RAX, RDX, RCX, RBX, RSI, RDI, RBP, RSP, R8-R15, RIP

For 64-bit, RAX, RCX, RDX, and R8-R11 could be good choices; for 32-bit, the choice is limited to EAX, ECX, and EDX.
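The 64-bit selection above amounts to a set difference, which can be sketched as follows. This is a standalone illustration, not code from LLVM: the register names are plain strings copied from the lists in this comment, `scratchCandidates` and `candidatesFor64Bit` are hypothetical helpers, and treating RSP and RIP as reserved is an assumption of the sketch.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using RegList = std::vector<std::string>;
using RegSet = std::set<std::string>;

// Keeps, in order, every DWARF-describable register that is neither reserved
// nor callee-saved under any of the given calling conventions.
RegList scratchCandidates(const RegList &dwarfRegs,
                          const std::vector<RegSet> &calleeSavedSets,
                          const RegSet &reserved) {
  RegList result;
  for (const std::string &reg : dwarfRegs) {
    bool excluded = reserved.count(reg) != 0;
    for (const RegSet &saved : calleeSavedSets)
      excluded = excluded || saved.count(reg) != 0;
    if (!excluded)
      result.push_back(reg);
  }
  return result;
}

// Applies the filter to the 64-bit lists quoted in the comment above.
RegList candidatesFor64Bit() {
  const RegList dwarfRegs = {"RAX", "RDX", "RCX", "RBX", "RSI", "RDI",
                             "RBP", "RSP", "R8",  "R9",  "R10", "R11",
                             "R12", "R13", "R14", "R15", "RIP"};
  const RegSet sysv = {"RBX", "R12", "R13", "R14", "R15", "RBP"};
  // General-purpose registers only; the XMM entries cannot collide here.
  const RegSet win64 = {"RBX", "RBP", "RDI", "RSI",
                        "R12", "R13", "R14", "R15"};
  const RegSet reserved = {"RSP", "RIP"}; // assumption: never usable as scratch
  return scratchCandidates(dwarfRegs, {sysv, win64}, reserved);
}
```

With these inputs the result is RAX, RDX, RCX, R8, R9, R10, R11, which matches the list given in the comment. The filter says nothing about argument registers, which have to be ruled out separately.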

This makes me wonder, though: why not leave selection of the register to use here to regalloc?

Sounds like a good idea, but I'm not familiar with the LLVM code base. Is there an example of how to use regalloc for this?

(I don't know whether it would make this patch too complicated. Allocating the stack probe register with regalloc and fixing the DWARF information sound like two separate things, and I'd like to do them in separate patches.)

llvm/test/CodeGen/X86/stack-clash-large.ll
22	The problem is that I don't know the accurate offset. When the callee-saved registers are pushed to the stack, there will be an offset before probing the stack (if I understand the prolog part correctly, please tell me if I'm wrong).

Use RDX/EDX instead of RDI/EDI, as RDI/EDI is callee-saved under the Win64 calling convention

In D98789#2634751, @YangKeao wrote:
Any register that isn't callee-preserved could be used (or any register in general if it's spilled first), assuming it isn't used for something else already. I did a quick search in an effort to figure out which registers are callee-saved on x86, but couldn't find anything definitive :(

It seems that different platforms have different sets of callee-preserved registers. I found them in the lists in LLVM (which are referred to in the function getCalleeSavedRegs of llvm/lib/Target/X86/X86RegisterInfo.cpp). Sadly, RDI turns out to be callee-saved on some of them (Win64, per the lists below).

Considering only SysV and Windows (64-bit), the callee-saved registers are:
SysV: X86::RBX, X86::R12, X86::R13, X86::R14, X86::R15, X86::RBP
Win64: X86::RBX, X86::RBP, X86::RDI, X86::RSI, X86::R12, X86::R13, X86::R14, X86::R15, X86::XMM6, X86::XMM7, X86::XMM8, X86::XMM9, X86::XMM10, X86::XMM11, X86::XMM12, X86::XMM13, X86::XMM14, X86::XMM15
And all available DWARF registers are:
32-bit:
EAX, EDX, ECX, EBX, ESI, EDI, EBP, ESP, EIP
64-bit (x86-64):
RAX, RDX, RCX, RBX, RSI, RDI, RBP, RSP, R8-R15, RIP
For 64-bit, RAX, RCX, RDX, and R8-R11 could be good choices; for 32-bit, the choice is limited to EAX, ECX, and EDX.

Care must also be taken not to overwrite the arguments. For instance, on the SysV x86_64 ABI, rdi, rsi, rdx, rcx, r8, and r9 are used to pass integer arguments. For functions with a small number of arguments one of these could be reused, but if a function happens to use all of them, unconditional use of rdx would clobber an argument.

In D98789#2634751, @YangKeao wrote:

Sounds like a good idea, but I'm not familiar with the LLVM code base. Is there an example of how to use regalloc for this?

(I don't know whether it would make this patch too complicated. Allocating the stack probe register with regalloc and fixing the DWARF information sound like two separate things, and I'd like to do them in separate patches.)

You would probably have to split the patches up into distinct parts, yes (first one adjusting the backend to allocate virtual registers and the next being this one).

I'll get back to you with advice on how to specify a virtual register a little bit later.

Harbormaster completed remote builds in B94478: Diff 331583.Mar 18 2021, 9:52 AM

See createVirtualRegister.

Use RAX/EAX as the iteration register, as RDX/EDX is used for arguments under System V

Care must also be taken not to overwrite the arguments. For instance, on the SysV x86_64 ABI, rdi, rsi, rdx, rcx, r8, and r9 are used to pass integer arguments. For functions with a small number of arguments one of these could be reused, but if a function happens to use all of them, unconditional use of rdx would clobber an argument.

Nice catch! After eliminating RCX and RDX from the selection, it seems that the only "always correct" choice is RAX (its use for the return value is not a problem).

See createVirtualRegister.

Thanks. I have seen some uses of this function. It seems that there isn't any suitable RegClass for this situation (a "DWARF-suitable registers" class; is there an equivalent one?). Should I create one myself?

Harbormaster completed remote builds in B94501: Diff 331618.Mar 18 2021, 12:04 PM

nagisa mentioned this in D98909: [X86, NFC] Update stack-clash tests using the automated tooling.Mar 18 2021, 5:20 PM

I investigated this a little bit, and it seems that using createVirtualRegister won't work here after all, because this code does in fact run after regalloc. I'm out of ideas, and I am not familiar enough with this area to give any useful advice or guidance here.

Normally, I'd expect some register is naturally free in the prologue, but you could get into weird situations. On 32-bit specifically, consider compiling with -mregparm=3; I think there are no registers which are unconditionally safe in that case. One possibility is to always use EAX, and just save/restore it if necessary. See isEAXAlive in X86FrameLowering::emitPrologue.

Alternatively, you could ensure that some callee-save GPR is spilled, and explicitly use that register. This is taking advantage of the fact this is part of the prologue: there can't be any other uses of callee-save registers at that point. (In theory, it might be possible for an exotic calling convention to have no callee-save registers, but I don't think there are any in practice.)

Outside the prologue, the allocation should be represented by some instruction; that instruction should clobber some register, and regalloc will ensure that register is free.
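The "always use EAX, and save/restore it if necessary" option can be sketched as a small decision function. This is a standalone illustration and not the logic of `isEAXAlive` or `X86FrameLowering::emitPrologue`; `ScratchPlan` and `planFixedScratch` are hypothetical names, and the live-in set for `-mregparm=3` (EAX, EDX, ECX) is an assumption stated for the example.

```cpp
#include <cassert>
#include <set>
#include <string>

// Outcome of choosing one fixed scratch register for the probing sequence.
struct ScratchPlan {
  std::string reg;     // register the probing sequence will use
  bool saveAndRestore; // true if the register holds a live value on entry
};

// Always picks `reg`; requests a save/restore around the probing sequence
// only when `reg` is live on entry to the function.
ScratchPlan planFixedScratch(const std::string &reg,
                             const std::set<std::string> &liveIns) {
  return {reg, liveIns.count(reg) != 0};
}
```

With an assumed live-in set of EAX, EDX, and ECX, `planFixedScratch("EAX", liveIns)` asks for a save/restore; with an empty live-in set it does not, so the extra cost is paid only in the uncommon case.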

efriedma added inline comments.Mar 18 2021, 9:24 PM

llvm/test/CodeGen/X86/stack-clash-large.ll
38	BTW, this is completely broken; r11d doesn't exist on 32-bit x86.

YangKeao added inline comments.Mar 18 2021, 10:39 PM

llvm/test/CodeGen/X86/stack-clash-large.ll
38	Wow, surprising discovery. I would expect a "bad register name" error when compiling this code. Is there a pass that hides this problem? Running `clang -fstack-clash-protection -m32 -fomit-frame-pointer -S` generates code containing `r11d`, which is wrong. However, after running `clang -fstack-clash-protection -m32 -fomit-frame-pointer -c` and disassembling the output, the register used here becomes `ebx`.

Normally, I'd expect some register is naturally free in the prologue, but you could get into weird situations. On 32-bit specifically, consider compiling with -mregparm=3; I think there are no registers which are unconditionally safe in that case. One possibility is to always use EAX, and just save/restore it if necessary. See isEAXAlive in X86FrameLowering::emitPrologue.

Alternatively, you could ensure that some callee-save GPR is spilled, and explicitly use that register. This is taking advantage of the fact this is part of the prologue: there can't be any other uses of callee-save registers at that point. (In theory, it might be possible for an exotic calling convention to have no callee-save registers, but I don't think there are any in practice.)

Both solutions seem to be either complicated or costly. I was also surprised to find that the original implementation of the stack probe is wrong on 32-bit. Given that, I prefer to provide DWARF information only in the 64-bit case (and leave a comment), so that r11 can be used and this problem is solved easily.

Remove DWARF information for 32-bit and use R11 as the iteration bound / DWARF register

Left comments about 32-bit

Harbormaster completed remote builds in B94623: Diff 331774.Mar 19 2021, 12:41 AM

Harbormaster completed remote builds in B94624: Diff 331775.Mar 19 2021, 1:10 AM

btw I prototyped a D98906: [X86] Improve lowering of the unrolled inline-asm probing yesterday as an alternative approach towards improving the unrolled case.

In D98789#2636879, @nagisa wrote:

btw I prototyped a D98906: [X86] Improve lowering of the unrolled inline-asm probing yesterday as an alternative approach towards improving the unrolled case.

Great! It seems better. I will rebase on it.

nagisa mentioned this in rGc2313a45307e: [X86, NFC] Update stack-clash tests using the automated tooling.Mar 19 2021, 5:02 AM

nagisa mentioned this in D98999: [X86] Don't use sp as a loop variable in loop stack probing.Mar 19 2021, 5:59 PM

Looks like my alternative has caveats of its own. Can you please split your change into two parts? One that affects only the unrolled case (which I believe should be good to land) and the other which affects the loop case (potentially subject to further discussion).

@YangKeao Will you be pursuing this further? Should I take over this for you?

In D98789#2657046, @nagisa wrote:

@YangKeao Will you be pursuing this further? Should I take over this for you?

Oops. Sorry, I missed the former comment "Looks like my alternative has caveats of its own.". I will split this patch into two parts right now. Thanks.

YangKeao mentioned this in D99579: [X86] add dwarf annotation for inline stack probe.Mar 30 2021, 5:08 AM

YangKeao mentioned this in D99585: [X86] add dwarf information for loop stack probe.Mar 30 2021, 6:02 AM

Stack probing with loop should be discussed further in D99585.
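For reference, the loop case in this revision's diff splits the allocation into a page-aligned part that is probed in a loop (`BoundOffset`) and a remainder handled afterwards (`TailOffset`). The arithmetic can be sketched as below. This is a standalone illustration, not the LLVM implementation; `LoopProbePlan` and `planLoopProbe` are hypothetical names, and the sizes in the usage note are made-up inputs.

```cpp
#include <cassert>
#include <cstdint>

// Split of a stack allocation for loop-based probing.
struct LoopProbePlan {
  uint64_t boundOffset; // bytes covered by the probing loop
  uint64_t iterations;  // loop trips, one probe interval each
  uint64_t tailOffset;  // remaining bytes, smaller than one probe interval
};

// Mirrors Offset / StackProbeSize * StackProbeSize and Offset % StackProbeSize.
LoopProbePlan planLoopProbe(uint64_t offset, uint64_t probeSize) {
  assert(probeSize != 0 && "probe size must be non-zero");
  LoopProbePlan plan;
  plan.iterations = offset / probeSize;
  plan.boundOffset = plan.iterations * probeSize;
  plan.tailOffset = offset % probeSize;
  return plan;
}
```

For a hypothetical 70000-byte allocation with a 4096-byte probe interval, this gives 17 iterations covering 69632 bytes and a 368-byte tail. The two parts always sum to the original offset, which is what lets the CFA be described relative to the loop bound while the stack pointer is still moving.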

Revision Contents

Path                                                                                Size
llvm/lib/Target/X86/X86FrameLowering.h                                              4 lines
llvm/lib/Target/X86/X86FrameLowering.cpp                                            50 lines
llvm/test/CodeGen/X86/stack-clash-large-large-align.ll                              6 lines
llvm/test/CodeGen/X86/stack-clash-large.ll                                          20 lines
llvm/test/CodeGen/X86/stack-clash-medium-natural-probes-mutliple-objects.ll         1 line
llvm/test/CodeGen/X86/stack-clash-medium-natural-probes.ll                          1 line
llvm/test/CodeGen/X86/stack-clash-medium.ll                                         2 lines
llvm/test/CodeGen/X86/stack-clash-small-large-align.ll                              6 lines
llvm/test/CodeGen/X86/stack-clash-unknown-call.ll                                   1 line

Diff 331618

llvm/lib/Target/X86/X86FrameLowering.h

@@ ... @@ public:

   Register getInitialCFARegister(const MachineFunction &MF) const override;

   /// Return true if the function has a redzone (accessible bytes past the
   /// frame of the top of stack function) as part of it's ABI.
   bool has128ByteRedZone(const MachineFunction& MF) const;

 private:
+  bool isWin64Prologue(const MachineFunction &MF) const;
+
+  bool needsDwarfCFI(const MachineFunction &MF) const;
+
   uint64_t calculateMaxStackAlign(const MachineFunction &MF) const;

   /// Emit target stack probe as a call to a helper function
   void emitStackProbeCall(MachineFunction &MF, MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
                           bool InProlog) const;

   /// Emit target stack probe as an inline sequence.

llvm/lib/Target/X86/X86FrameLowering.cpp

@@ ... @@ void X86FrameLowering::emitStackProbeInlineGeneric(
   }
 }

 void X86FrameLowering::emitStackProbeInlineGenericBlock(
     MachineFunction &MF, MachineBasicBlock &MBB,
     MachineBasicBlock::iterator MBBI, const DebugLoc &DL, uint64_t Offset,
     uint64_t AlignOffset) const {

+  const bool NeedsDwarfCFI = needsDwarfCFI(MF);
+  const bool HasFP = hasFP(MF);
   const X86Subtarget &STI = MF.getSubtarget<X86Subtarget>();
   const X86TargetLowering &TLI = *STI.getTargetLowering();
   const unsigned Opc = getSUBriOpcode(Uses64BitFramePtr, Offset);
   const unsigned MovMIOpc = Is64Bit ? X86::MOV64mi32 : X86::MOV32mi;
   const uint64_t StackProbeSize = TLI.getStackProbeSize(MF);

   uint64_t CurrentOffset = 0;

   assert(AlignOffset < StackProbeSize);

   // If the offset is so small it fits within a page, there's nothing to do.
   if (StackProbeSize < Offset + AlignOffset) {

     MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)
                            .addReg(StackPtr)
                            .addImm(StackProbeSize - AlignOffset)
                            .setMIFlag(MachineInstr::FrameSetup);
     MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead.

+    if (!HasFP && NeedsDwarfCFI) {
+      BuildCFI(MBB, MBBI, DL,
+               MCCFIInstruction::createAdjustCfaOffset(
+                   nullptr, StackProbeSize - AlignOffset));
+    }
+
     addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(MovMIOpc))
                      .setMIFlag(MachineInstr::FrameSetup),
                  StackPtr, false, 0)
         .addImm(0)
         .setMIFlag(MachineInstr::FrameSetup);
     NumFrameExtraProbe++;
     CurrentOffset = StackProbeSize - AlignOffset;
   }

   // For the next N - 1 pages, just probe. I tried to take advantage of
   // natural probes but it implies much more logic and there was very few
   // interesting natural probes to interleave.
   while (CurrentOffset + StackProbeSize < Offset) {
     MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)
                            .addReg(StackPtr)
                            .addImm(StackProbeSize)
                            .setMIFlag(MachineInstr::FrameSetup);
     MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead.

+    if (!HasFP && NeedsDwarfCFI) {
+      BuildCFI(
+          MBB, MBBI, DL,
+          MCCFIInstruction::createAdjustCfaOffset(nullptr, StackProbeSize));
+    }
+
     addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(MovMIOpc))
                      .setMIFlag(MachineInstr::FrameSetup),
                  StackPtr, false, 0)
         .addImm(0)
         .setMIFlag(MachineInstr::FrameSetup);
     NumFrameExtraProbe++;
     CurrentOffset += StackProbeSize;
   }

   // No need to probe the tail, it is smaller than a Page.
   uint64_t ChunkSize = Offset - CurrentOffset;
   MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)
                          .addReg(StackPtr)
                          .addImm(ChunkSize)
                          .setMIFlag(MachineInstr::FrameSetup);
+  // No need to adjust Dwarf CFA offset here, the last position of the stack has
+  // been defined
   MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead.
 }

 void X86FrameLowering::emitStackProbeInlineGenericLoop(
     MachineFunction &MF, MachineBasicBlock &MBB,
     MachineBasicBlock::iterator MBBI, const DebugLoc &DL, uint64_t Offset,
     uint64_t AlignOffset) const {
   assert(Offset && "null offset");

+  const bool NeedsDwarfCFI = needsDwarfCFI(MF);
+  const bool HasFP = hasFP(MF);
   const X86Subtarget &STI = MF.getSubtarget<X86Subtarget>();
   const X86TargetLowering &TLI = *STI.getTargetLowering();
   const unsigned MovMIOpc = Is64Bit ? X86::MOV64mi32 : X86::MOV32mi;
   const uint64_t StackProbeSize = TLI.getStackProbeSize(MF);

   if (AlignOffset) {
     if (AlignOffset < StackProbeSize) {
       // Perform a first smaller allocation followed by a probe.
@@ ... @@ void X86FrameLowering::emitStackProbeInlineGenericLoop(

   MachineBasicBlock *testMBB = MF.CreateMachineBasicBlock(LLVM_BB);
   MachineBasicBlock *tailMBB = MF.CreateMachineBasicBlock(LLVM_BB);

   MachineFunction::iterator MBBIter = ++MBB.getIterator();
   MF.insert(MBBIter, testMBB);
   MF.insert(MBBIter, tailMBB);

-  Register FinalStackProbed = Uses64BitFramePtr ? X86::R11 : X86::R11D;
+  Register FinalStackProbed = Uses64BitFramePtr ? X86::RAX : X86::EAX;
   BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::COPY), FinalStackProbed)
       .addReg(StackPtr)
       .setMIFlag(MachineInstr::FrameSetup);

   // save loop bound
   {
+    const unsigned BoundOffset = Offset / StackProbeSize * StackProbeSize;
     const unsigned SUBOpc = getSUBriOpcode(Uses64BitFramePtr, Offset);
     BuildMI(MBB, MBBI, DL, TII.get(SUBOpc), FinalStackProbed)
         .addReg(FinalStackProbed)
-        .addImm(Offset / StackProbeSize * StackProbeSize)
+        .addImm(BoundOffset)
         .setMIFlag(MachineInstr::FrameSetup);
+
+    if (!HasFP && NeedsDwarfCFI) {
+      BuildCFI(MBB, MBBI, DL,
+               MCCFIInstruction::createDefCfaRegister(
+                   nullptr, TRI->getDwarfRegNum(FinalStackProbed, true)));
+      BuildCFI(MBB, MBBI, DL,
+               MCCFIInstruction::createAdjustCfaOffset(nullptr, BoundOffset));
+    }
   }

   // allocate a page
   {
     const unsigned SUBOpc = getSUBriOpcode(Uses64BitFramePtr, StackProbeSize);
     BuildMI(testMBB, DL, TII.get(SUBOpc), StackPtr)
         .addReg(StackPtr)
         .addImm(StackProbeSize)
@@ ... @@ void X86FrameLowering::emitStackProbeInlineGenericLoop(

   // BB management
   tailMBB->splice(tailMBB->end(), &MBB, MBBI, MBB.end());
   tailMBB->transferSuccessorsAndUpdatePHIs(&MBB);
   MBB.addSuccessor(testMBB);

   // handle tail
   unsigned TailOffset = Offset % StackProbeSize;
+  MachineBasicBlock::iterator TailMBBIter = tailMBB->begin();
   if (TailOffset) {
     const unsigned Opc = getSUBriOpcode(Uses64BitFramePtr, TailOffset);
-    BuildMI(*tailMBB, tailMBB->begin(), DL, TII.get(Opc), StackPtr)
+    BuildMI(*tailMBB, TailMBBIter, DL, TII.get(Opc), StackPtr)
         .addReg(StackPtr)
         .addImm(TailOffset)
         .setMIFlag(MachineInstr::FrameSetup);
   }

+  if (!HasFP && NeedsDwarfCFI) {
+    BuildCFI(*tailMBB, TailMBBIter, DL,
+             MCCFIInstruction::createDefCfaRegister(
+                 nullptr, TRI->getDwarfRegNum(StackPtr, true)));
+  }
+
   // Update Live In information
   recomputeLiveIns(*testMBB);
   recomputeLiveIns(*tailMBB);
 }

 void X86FrameLowering::emitStackProbeInlineWindowsCoreCLR64(
     MachineFunction &MF, MachineBasicBlock &MBB,
     MachineBasicBlock::iterator MBBI, const DebugLoc &DL, bool InProlog) const {
@@ ... @@ bool X86FrameLowering::has128ByteRedZone(const MachineFunction& MF) const {
   // clobbered by any interrupt handler.
   assert(&STI == &MF.getSubtarget<X86Subtarget>() &&
          "MF used frame lowering for wrong subtarget");
   const Function &Fn = MF.getFunction();
   const bool IsWin64CC = STI.isCallingConvWin64(Fn.getCallingConv());
   return Is64Bit && !IsWin64CC && !Fn.hasFnAttribute(Attribute::NoRedZone);
 }

+bool X86FrameLowering::isWin64Prologue(const MachineFunction &MF) const {
+  return MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
+}
+
+bool X86FrameLowering::needsDwarfCFI(const MachineFunction &MF) const {
+  return !isWin64Prologue(MF) && MF.needsFrameMoves();
+}
+
 /// emitPrologue - Push callee-saved registers onto the stack, which
 /// automatically adjust the stack pointer. Adjust the stack pointer to allocate
 /// space for local variables. Also emit labels used by the exception handler to
 /// generate the exception handling frames.

 /*
   Here's a gist of what gets emitted:
@@ ... @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
   bool IsFunclet = MBB.isEHFuncletEntry();
   EHPersonality Personality = EHPersonality::Unknown;
   if (Fn.hasPersonalityFn())
     Personality = classifyEHPersonality(Fn.getPersonalityFn());
   bool FnHasClrFunclet =
       MF.hasEHFunclets() && Personality == EHPersonality::CoreCLR;
   bool IsClrFunclet = IsFunclet && FnHasClrFunclet;
   bool HasFP = hasFP(MF);
-  bool IsWin64Prologue = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
+  bool IsWin64Prologue = isWin64Prologue(MF);
   bool NeedsWin64CFI = IsWin64Prologue && Fn.needsUnwindTableEntry();
   // FIXME: Emit FPO data for EH funclets.
   bool NeedsWinFPO =
       !IsFunclet && STI.isTargetWin32() && MMI.getModule()->getCodeViewFlag();
   bool NeedsWinCFI = NeedsWin64CFI || NeedsWinFPO;
-  bool NeedsDwarfCFI = !IsWin64Prologue && MF.needsFrameMoves();
+  bool NeedsDwarfCFI = needsDwarfCFI(MF);
   Register FramePtr = TRI->getFrameRegister(MF);
   const Register MachineFramePtr =
       STI.isTarget64BitILP32()
           ? Register(getX86SubSuperRegister(FramePtr, 64)) : FramePtr;
   Register BasePtr = TRI->getBaseRegister();
   bool HasWinCFI = false;

   // Debug location must be unknown since the first debug location is used

llvm/test/CodeGen/X86/stack-clash-large-large-align.ll

	Show First 20 Lines • Show All 51 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: movq $0, (%rsp)			; CHECK-NEXT: movq $0, (%rsp)
	; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000			; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000
	; CHECK-NEXT: cmpq %rsp, %r11			; CHECK-NEXT: cmpq %rsp, %r11
	; CHECK-NEXT: jb .LBB1_2			; CHECK-NEXT: jb .LBB1_2
	; CHECK-NEXT:.LBB1_3:			; CHECK-NEXT:.LBB1_3:
	; CHECK-NEXT: movq %r11, %rsp			; CHECK-NEXT: movq %r11, %rsp
	; CHECK-NEXT: movq $0, (%rsp)			; CHECK-NEXT: movq $0, (%rsp)
	; CHECK-NEXT:.LBB1_4:			; CHECK-NEXT:.LBB1_4:
	; CHECK-NEXT: movq %rsp, %r11			; CHECK-NEXT: movq %rsp, %rax
	; CHECK-NEXT: subq $73728, %r11 # imm = 0x12000			; CHECK-NEXT: subq $73728, %rax # imm = 0x12000
	; CHECK-NEXT:.LBB1_5: # =>This Inner Loop Header: Depth=1			; CHECK-NEXT:.LBB1_5: # =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000			; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000
	; CHECK-NEXT: movq $0, (%rsp)			; CHECK-NEXT: movq $0, (%rsp)
	; CHECK-NEXT: cmpq %r11, %rsp			; CHECK-NEXT: cmpq %rax, %rsp
	; CHECK-NEXT: jne .LBB1_5			; CHECK-NEXT: jne .LBB1_5
	; CHECK-NEXT:# %bb.6:			; CHECK-NEXT:# %bb.6:
	; CHECK-NEXT: movl $1, 392(%rsp)			; CHECK-NEXT: movl $1, 392(%rsp)
	; CHECK-NEXT: movl $1, 28792(%rsp)			; CHECK-NEXT: movl $1, 28792(%rsp)
	; CHECK-NEXT: movl (%rsp), %eax			; CHECK-NEXT: movl (%rsp), %eax
	; CHECK-NEXT: movq %rbp, %rsp			; CHECK-NEXT: movq %rbp, %rsp
	; CHECK-NEXT: popq %rbp			; CHECK-NEXT: popq %rbp
	; CHECK-NEXT: .cfi_def_cfa %rsp, 8			; CHECK-NEXT: .cfi_def_cfa %rsp, 8

llvm/test/CodeGen/X86/stack-clash-large.ll

		define i32 @foo() local_unnamed_addr #0 {
%c = load volatile i32, i32* %a		%c = load volatile i32, i32* %a
ret i32 %c		ret i32 %c
}		}

attributes #0 = {"probe-stack"="inline-asm"}		attributes #0 = {"probe-stack"="inline-asm"}

; CHECK-X86-64-LABEL: foo:		; CHECK-X86-64-LABEL: foo:
; CHECK-X86-64: # %bb.0:		; CHECK-X86-64: # %bb.0:
; CHECK-X86-64-NEXT: movq %rsp, %r11		; CHECK-X86-64-NEXT: movq %rsp, %rax
; CHECK-X86-64-NEXT: subq $69632, %r11 # imm = 0x11000		; CHECK-X86-64-NEXT: subq $69632, %rax # imm = 0x11000
		; CHECK-X86-64-NEXT: .cfi_def_cfa_register %rax
		; CHECK-X86-64-NEXT: .cfi_adjust_cfa_offset 69632
		nagisa (inline comment, not done): I… think this wants to be a `def_cfa_offset`? `def_cfa_register` does not reset the offset, so it's not at all obvious what this is offsetting from. Alternatively, there's a form that combines setting the new register and the offset into a single directive: `.cfi_def_cfa %rdi, 69632`
		YangKeao (author, done): The problem is that I don't know the accurate offset. When the callee-saved registers are pushed to the stack, there is already an offset before the stack is probed (if I understand the prologue correctly; please tell me if I'm wrong).
; CHECK-X86-64-NEXT: .LBB0_1:		; CHECK-X86-64-NEXT: .LBB0_1:
; CHECK-X86-64-NEXT: subq $4096, %rsp # imm = 0x1000		; CHECK-X86-64-NEXT: subq $4096, %rsp # imm = 0x1000
; CHECK-X86-64-NEXT: movq $0, (%rsp)		; CHECK-X86-64-NEXT: movq $0, (%rsp)
; CHECK-X86-64-NEXT: cmpq %r11, %rsp		; CHECK-X86-64-NEXT: cmpq %rax, %rsp
; CHECK-X86-64-NEXT: jne .LBB0_1		; CHECK-X86-64-NEXT: jne .LBB0_1
; CHECK-X86-64-NEXT:# %bb.2:		; CHECK-X86-64-NEXT:# %bb.2:
; CHECK-X86-64-NEXT: subq $2248, %rsp		; CHECK-X86-64-NEXT: subq $2248, %rsp
		; CHECK-X86-64-NEXT: .cfi_def_cfa_register %rsp
; CHECK-X86-64-NEXT: .cfi_def_cfa_offset 71888		; CHECK-X86-64-NEXT: .cfi_def_cfa_offset 71888
; CHECK-X86-64-NEXT: movl $1, 264(%rsp)		; CHECK-X86-64-NEXT: movl $1, 264(%rsp)
; CHECK-X86-64-NEXT: movl $1, 28664(%rsp)		; CHECK-X86-64-NEXT: movl $1, 28664(%rsp)
; CHECK-X86-64-NEXT: movl -128(%rsp), %eax		; CHECK-X86-64-NEXT: movl -128(%rsp), %eax
; CHECK-X86-64-NEXT: addq $71880, %rsp # imm = 0x118C8		; CHECK-X86-64-NEXT: addq $71880, %rsp # imm = 0x118C8
; CHECK-X86-64-NEXT: .cfi_def_cfa_offset 8		; CHECK-X86-64-NEXT: .cfi_def_cfa_offset 8
; CHECK-X86-64-NEXT: retq		; CHECK-X86-64-NEXT: retq
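
The question raised in the inline comments above (what `.cfi_adjust_cfa_offset` is relative to once `.cfi_def_cfa_register` has switched registers) can be checked with a small model. The following is only a sketch under simplifying assumptions: the CFA rule is modelled as "register value plus offset", the entry rule on x86-64 is taken to be `rsp + 8`, and the helper names (`CfaRule`, `defCfaRegister`, `probeLoopKeepsCfa`, and so on) are hypothetical and not part of LLVM. It walks the right-hand (new) x86-64 sequence from this test and checks that the CFA stays at the entry `rsp + 8` at every step.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Simplified model of a DWARF CFA rule: CFA = value(reg) + offset.
struct CfaRule {
  std::string reg;
  std::int64_t offset;
};

// .cfi_def_cfa_register: switch the register, keep the current offset.
inline CfaRule defCfaRegister(CfaRule rule, const std::string &reg) {
  rule.reg = reg;
  return rule;
}

// .cfi_def_cfa_offset: replace the offset, keep the current register.
inline CfaRule defCfaOffset(CfaRule rule, std::int64_t offset) {
  rule.offset = offset;
  return rule;
}

// .cfi_adjust_cfa_offset: add a delta to the current offset.
inline CfaRule adjustCfaOffset(CfaRule rule, std::int64_t delta) {
  rule.offset += delta;
  return rule;
}

using Registers = std::map<std::string, std::int64_t>;

// Evaluate the rule against concrete register values.
inline std::int64_t computeCfa(const CfaRule &rule, const Registers &regs) {
  auto it = regs.find(rule.reg);
  assert(it != regs.end() && "CFA register has no known value");
  return it->second + rule.offset;
}

// Replays the x86-64 CHECK sequence of stack-clash-large.ll (new column)
// and reports whether the CFA equals entryRsp + 8 after every step.
inline bool probeLoopKeepsCfa(std::int64_t entryRsp) {
  const std::int64_t expected = entryRsp + 8;
  Registers regs{{"rsp", entryRsp}, {"rax", 0}};
  CfaRule rule{"rsp", 8}; // assumed rule at function entry

  // movq %rsp, %rax ; subq $69632, %rax
  regs["rax"] = regs["rsp"] - 69632;
  // .cfi_def_cfa_register %rax ; .cfi_adjust_cfa_offset 69632
  rule = adjustCfaOffset(defCfaRegister(rule, "rax"), 69632);
  if (computeCfa(rule, regs) != expected)
    return false;

  // Probe loop: rsp moves one page at a time, rax stays fixed.
  while (regs["rsp"] != regs["rax"]) {
    regs["rsp"] -= 4096;
    if (computeCfa(rule, regs) != expected)
      return false;
  }

  // subq $2248, %rsp ; .cfi_def_cfa_register %rsp ; .cfi_def_cfa_offset 71888
  regs["rsp"] -= 2248;
  rule = defCfaOffset(defCfaRegister(rule, "rsp"), 71888);
  return computeCfa(rule, regs) == expected;
}
```

In this model, calling `probeLoopKeepsCfa` with any entry stack pointer returns true because the adjustment is applied on top of the offset inherited from the previous rule (8 here). That also illustrates the author's concern: if callee-saved registers had been pushed first, the inherited offset would be larger, and an absolute `def_cfa_offset` would need to know it.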

; CHECK-X86-32-LABEL: foo:		; CHECK-X86-32-LABEL: foo:
; CHECK-X86-32: # %bb.0:		; CHECK-X86-32: # %bb.0:
; CHECK-X86-32-NEXT: movl %esp, %r11d		; CHECK-X86-32-NEXT: movl %esp, %eax
efriedma (inline comment, not done): BTW, this is completely broken; r11d doesn't exist on 32-bit x86.
YangKeao (author, done): Wow, surprising discovery. I think a "bad register name" error should be reported when compiling this code. Is there any pass which will omit this problem? Running `clang -fstack-clash-protection -m32 -fomit-frame-pointer -S` generates code containing `r11d`, which is bad. However, after running `clang -fstack-clash-protection -m32 -fomit-frame-pointer -c` and disassembling the output, the register used here becomes `ebx`.
; CHECK-X86-32-NEXT: subl $69632, %r11d # imm = 0x11000		; CHECK-X86-32-NEXT: subl $69632, %eax # imm = 0x11000
		; CHECK-X86-32-NEXT: .cfi_def_cfa_register %eax
		; CHECK-X86-32-NEXT: .cfi_adjust_cfa_offset 69632
; CHECK-X86-32-NEXT: .LBB0_1: # =>This Inner Loop Header: Depth=1		; CHECK-X86-32-NEXT: .LBB0_1: # =>This Inner Loop Header: Depth=1
; CHECK-X86-32-NEXT: subl $4096, %esp # imm = 0x1000		; CHECK-X86-32-NEXT: subl $4096, %esp # imm = 0x1000
; CHECK-X86-32-NEXT: movl $0, (%esp)		; CHECK-X86-32-NEXT: movl $0, (%esp)
; CHECK-X86-32-NEXT: cmpl %r11d, %esp		; CHECK-X86-32-NEXT: cmpl %eax, %esp
; CHECK-X86-32-NEXT: jne .LBB0_1		; CHECK-X86-32-NEXT: jne .LBB0_1
; CHECK-X86-32-NEXT:# %bb.2:		; CHECK-X86-32-NEXT:# %bb.2:
; CHECK-X86-32-NEXT: subl $2380, %esp		; CHECK-X86-32-NEXT: subl $2380, %esp
		; CHECK-X86-32-NEXT: .cfi_def_cfa_register %esp
; CHECK-X86-32-NEXT: .cfi_def_cfa_offset 72016		; CHECK-X86-32-NEXT: .cfi_def_cfa_offset 72016
; CHECK-X86-32-NEXT: movl $1, 392(%esp)		; CHECK-X86-32-NEXT: movl $1, 392(%esp)
; CHECK-X86-32-NEXT: movl $1, 28792(%esp)		; CHECK-X86-32-NEXT: movl $1, 28792(%esp)
; CHECK-X86-32-NEXT: movl (%esp), %eax		; CHECK-X86-32-NEXT: movl (%esp), %eax
; CHECK-X86-32-NEXT: addl $72012, %esp # imm = 0x1194C		; CHECK-X86-32-NEXT: addl $72012, %esp # imm = 0x1194C
; CHECK-X86-32-NEXT: .cfi_def_cfa_offset 4		; CHECK-X86-32-NEXT: .cfi_def_cfa_offset 4
; CHECK-X86-32-NEXT: retl		; CHECK-X86-32-NEXT: retl
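
One plausible reading of the `r11d` versus `ebx` observation in the inline comments above, offered as an assumption rather than a confirmed diagnosis: R11 has hardware encoding 11, 32-bit mode has no REX prefix to carry the fourth encoding bit, and the remaining 3-bit register field (11 & 7 = 3) names EBX. The sketch below only demonstrates that arithmetic; `legacyRegisterName` is a hypothetical helper, not an LLVM function, and whether LLVM's encoder actually truncates the encoding this way is not established by this review.

```cpp
#include <cassert>
#include <string>

// Maps a 4-bit x86 general-purpose register encoding to the 32-bit register
// that the low three bits select when no REX prefix is available.
inline std::string legacyRegisterName(unsigned encoding) {
  static const char *const Names[8] = {"eax", "ecx", "edx", "ebx",
                                       "esp", "ebp", "esi", "edi"};
  assert(encoding < 16 && "general-purpose register encodings are 4 bits");
  return Names[encoding & 0x7]; // the top bit would normally live in REX
}
```

Under that assumption, `legacyRegisterName(11)` yields `"ebx"`, which matches what the author saw after disassembling the object file.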

llvm/test/CodeGen/X86/stack-clash-medium-natural-probes-mutliple-objects.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s | FileCheck %s			; RUN: llc < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define i32 @foo() local_unnamed_addr #0 {			define i32 @foo() local_unnamed_addr #0 {
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000			; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000
				; CHECK-NEXT: .cfi_adjust_cfa_offset 4096
	; CHECK-NEXT: movq $0, (%rsp)			; CHECK-NEXT: movq $0, (%rsp)
	; CHECK-NEXT: subq $1784, %rsp # imm = 0x6F8			; CHECK-NEXT: subq $1784, %rsp # imm = 0x6F8
	; CHECK-NEXT: .cfi_def_cfa_offset 5888			; CHECK-NEXT: .cfi_def_cfa_offset 5888
	; CHECK-NEXT: movl $1, {{[0-9]+}}(%rsp)			; CHECK-NEXT: movl $1, {{[0-9]+}}(%rsp)
	; CHECK-NEXT: movl $2, {{[0-9]+}}(%rsp)			; CHECK-NEXT: movl $2, {{[0-9]+}}(%rsp)
	; CHECK-NEXT: movl {{[0-9]+}}(%rsp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%rsp), %eax
	; CHECK-NEXT: addq $5880, %rsp # imm = 0x16F8			; CHECK-NEXT: addq $5880, %rsp # imm = 0x16F8
	; CHECK-NEXT: .cfi_def_cfa_offset 8			; CHECK-NEXT: .cfi_def_cfa_offset 8

llvm/test/CodeGen/X86/stack-clash-medium-natural-probes.ll

	; RUN: llc < %s | FileCheck %s			; RUN: llc < %s | FileCheck %s


	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define i32 @foo() local_unnamed_addr #0 {			define i32 @foo() local_unnamed_addr #0 {

	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000			; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000
				; CHECK-NEXT: .cfi_adjust_cfa_offset 4096
	; CHECK-NEXT: movq $0, (%rsp)			; CHECK-NEXT: movq $0, (%rsp)
	; CHECK-NEXT: subq $3784, %rsp # imm = 0xEC8			; CHECK-NEXT: subq $3784, %rsp # imm = 0xEC8
	; CHECK-NEXT: .cfi_def_cfa_offset 7888			; CHECK-NEXT: .cfi_def_cfa_offset 7888
	; CHECK-NEXT: movl $1, 264(%rsp)			; CHECK-NEXT: movl $1, 264(%rsp)
	; CHECK-NEXT: movl $1, 4664(%rsp)			; CHECK-NEXT: movl $1, 4664(%rsp)
	; CHECK-NEXT: movl -128(%rsp), %eax			; CHECK-NEXT: movl -128(%rsp), %eax
	; CHECK-NEXT: addq $7880, %rsp # imm = 0x1EC8			; CHECK-NEXT: addq $7880, %rsp # imm = 0x1EC8
	; CHECK-NEXT: .cfi_def_cfa_offset 8			; CHECK-NEXT: .cfi_def_cfa_offset 8

llvm/test/CodeGen/X86/stack-clash-medium.ll

	; RUN: llc -mtriple=x86_64-linux-android < %s | FileCheck -check-prefix=CHECK-X86-64 %s			; RUN: llc -mtriple=x86_64-linux-android < %s | FileCheck -check-prefix=CHECK-X86-64 %s
	; RUN: llc -mtriple=i686-linux-android < %s | FileCheck -check-prefix=CHECK-X86-32 %s			; RUN: llc -mtriple=i686-linux-android < %s | FileCheck -check-prefix=CHECK-X86-32 %s

	define i32 @foo() local_unnamed_addr #0 {			define i32 @foo() local_unnamed_addr #0 {
	%a = alloca i32, i64 2000, align 16			%a = alloca i32, i64 2000, align 16
	%b = getelementptr inbounds i32, i32* %a, i64 200			%b = getelementptr inbounds i32, i32* %a, i64 200
	store volatile i32 1, i32* %b			store volatile i32 1, i32* %b
	%c = load volatile i32, i32* %a			%c = load volatile i32, i32* %a
	ret i32 %c			ret i32 %c
	}			}

	attributes #0 = {"probe-stack"="inline-asm"}			attributes #0 = {"probe-stack"="inline-asm"}

	; CHECK-X86-64-LABEL: foo:			; CHECK-X86-64-LABEL: foo:
	; CHECK-X86-64: # %bb.0:			; CHECK-X86-64: # %bb.0:
	; CHECK-X86-64-NEXT: subq $4096, %rsp # imm = 0x1000			; CHECK-X86-64-NEXT: subq $4096, %rsp # imm = 0x1000
				; CHECK-X86-64-NEXT: .cfi_adjust_cfa_offset 4096
	; CHECK-X86-64-NEXT: movq $0, (%rsp)			; CHECK-X86-64-NEXT: movq $0, (%rsp)
	; CHECK-X86-64-NEXT: subq $3784, %rsp # imm = 0xEC8			; CHECK-X86-64-NEXT: subq $3784, %rsp # imm = 0xEC8
	; CHECK-X86-64-NEXT: .cfi_def_cfa_offset 7888			; CHECK-X86-64-NEXT: .cfi_def_cfa_offset 7888
	; CHECK-X86-64-NEXT: movl $1, 672(%rsp)			; CHECK-X86-64-NEXT: movl $1, 672(%rsp)
	; CHECK-X86-64-NEXT: movl -128(%rsp), %eax			; CHECK-X86-64-NEXT: movl -128(%rsp), %eax
	; CHECK-X86-64-NEXT: addq $7880, %rsp # imm = 0x1EC8			; CHECK-X86-64-NEXT: addq $7880, %rsp # imm = 0x1EC8
	; CHECK-X86-64-NEXT: .cfi_def_cfa_offset 8			; CHECK-X86-64-NEXT: .cfi_def_cfa_offset 8
	; CHECK-X86-64-NEXT: retq			; CHECK-X86-64-NEXT: retq


	; CHECK-X86-32-LABEL: foo:			; CHECK-X86-32-LABEL: foo:
	; CHECK-X86-32: # %bb.0:			; CHECK-X86-32: # %bb.0:
	; CHECK-X86-32-NEXT: subl $4096, %esp # imm = 0x1000			; CHECK-X86-32-NEXT: subl $4096, %esp # imm = 0x1000
				; CHECK-X86-32-NEXT: .cfi_adjust_cfa_offset 4096
	; CHECK-X86-32-NEXT: movl $0, (%esp)			; CHECK-X86-32-NEXT: movl $0, (%esp)
	; CHECK-X86-32-NEXT: subl $3916, %esp # imm = 0xF4C			; CHECK-X86-32-NEXT: subl $3916, %esp # imm = 0xF4C
	; CHECK-X86-32-NEXT: .cfi_def_cfa_offset 8016			; CHECK-X86-32-NEXT: .cfi_def_cfa_offset 8016
	; CHECK-X86-32-NEXT: movl $1, 800(%esp)			; CHECK-X86-32-NEXT: movl $1, 800(%esp)
	; CHECK-X86-32-NEXT: movl (%esp), %eax			; CHECK-X86-32-NEXT: movl (%esp), %eax
	; CHECK-X86-32-NEXT: addl $8012, %esp # imm = 0x1F4C			; CHECK-X86-32-NEXT: addl $8012, %esp # imm = 0x1F4C
	; CHECK-X86-32-NEXT: .cfi_def_cfa_offset 4			; CHECK-X86-32-NEXT: .cfi_def_cfa_offset 4
	; CHECK-X86-32-NEXT: retl			; CHECK-X86-32-NEXT: retl
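
The immediates in these CHECK lines follow a simple pattern: the allocation is split into whole 4096-byte probe chunks plus a tail, and the final `.cfi_def_cfa_offset` equals the total allocation plus the return-address slot (8 bytes on x86-64, 4 bytes on 32-bit x86). The sketch below reproduces only that arithmetic as read off the test expectations; `ProbePlan`, `splitAllocation`, and `finalCfaOffset` are hypothetical names, and this is not the logic in X86FrameLowering.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Number of whole probe-sized chunks and the leftover tail allocation.
struct ProbePlan {
  std::uint64_t fullChunks;
  std::uint64_t tail;
};

// Split a stack allocation into probe-sized chunks plus a remainder.
inline ProbePlan splitAllocation(std::uint64_t total, std::uint64_t probeSize) {
  assert(probeSize != 0 && "probe size must be non-zero");
  return ProbePlan{total / probeSize, total % probeSize};
}

// CFA offset once the whole frame is allocated: everything subtracted from
// the stack pointer plus the return-address slot pushed by the call.
inline std::uint64_t finalCfaOffset(const ProbePlan &plan,
                                    std::uint64_t probeSize,
                                    std::uint64_t returnSlotSize) {
  return plan.fullChunks * probeSize + plan.tail + returnSlotSize;
}
```

For stack-clash-medium.ll on x86-64, `splitAllocation(7880, 4096)` gives one full chunk and a tail of 3784, and `finalCfaOffset` with an 8-byte slot gives 7888, matching `subq $4096`, `subq $3784`, and `.cfi_def_cfa_offset 7888`. The same arithmetic applied to 71880 bytes gives 17 chunks (69632 bytes) and a tail of 2248, matching stack-clash-large.ll.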

llvm/test/CodeGen/X86/stack-clash-small-large-align.ll

	; CHECK-NEXT: movq $0, (%rsp)			; CHECK-NEXT: movq $0, (%rsp)
	; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000			; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000
	; CHECK-NEXT: cmpq %rsp, %r11			; CHECK-NEXT: cmpq %rsp, %r11
	; CHECK-NEXT: jb .LBB1_2			; CHECK-NEXT: jb .LBB1_2
	; CHECK-NEXT:.LBB1_3:			; CHECK-NEXT:.LBB1_3:
	; CHECK-NEXT: movq %r11, %rsp			; CHECK-NEXT: movq %r11, %rsp
	; CHECK-NEXT: movq $0, (%rsp)			; CHECK-NEXT: movq $0, (%rsp)
	; CHECK-NEXT:.LBB1_4:			; CHECK-NEXT:.LBB1_4:
	; CHECK-NEXT: movq %rsp, %r11			; CHECK-NEXT: movq %rsp, %rax
	; CHECK-NEXT: subq $65536, %r11 # imm = 0x10000			; CHECK-NEXT: subq $65536, %rax # imm = 0x10000
	; CHECK-NEXT:.LBB1_5: # =>This Inner Loop Header: Depth=1			; CHECK-NEXT:.LBB1_5: # =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000			; CHECK-NEXT: subq $4096, %rsp # imm = 0x1000
	; CHECK-NEXT: movq $0, (%rsp)			; CHECK-NEXT: movq $0, (%rsp)
	; CHECK-NEXT: cmpq %r11, %rsp			; CHECK-NEXT: cmpq %rax, %rsp
	; CHECK-NEXT: jne .LBB1_5			; CHECK-NEXT: jne .LBB1_5
	; CHECK-NEXT:# %bb.6:			; CHECK-NEXT:# %bb.6:
	; CHECK-NEXT: movl $1, 392(%rsp)			; CHECK-NEXT: movl $1, 392(%rsp)
	; CHECK-NEXT: movl (%rsp), %eax			; CHECK-NEXT: movl (%rsp), %eax
	; CHECK-NEXT: movq %rbp, %rsp			; CHECK-NEXT: movq %rbp, %rsp
	; CHECK-NEXT: popq %rbp			; CHECK-NEXT: popq %rbp
	; CHECK-NEXT: .cfi_def_cfa %rsp, 8			; CHECK-NEXT: .cfi_def_cfa %rsp, 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq

llvm/test/CodeGen/X86/stack-clash-unknown-call.ll

	; RUN: llc < %s | FileCheck %s			; RUN: llc < %s | FileCheck %s


	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg);			declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg);

	define void @foo() local_unnamed_addr #0 {			define void @foo() local_unnamed_addr #0 {

	;CHECK-LABEL: foo:			;CHECK-LABEL: foo:
	;CHECK: # %bb.0:			;CHECK: # %bb.0:
	;CHECK-NEXT: subq $4096, %rsp # imm = 0x1000			;CHECK-NEXT: subq $4096, %rsp # imm = 0x1000
				;CHECK-NEXT: .cfi_adjust_cfa_offset 4096
	; it's important that we don't use the call as a probe here			; it's important that we don't use the call as a probe here
	;CHECK-NEXT: movq $0, (%rsp)			;CHECK-NEXT: movq $0, (%rsp)
	;CHECK-NEXT: subq $3912, %rsp # imm = 0xF48			;CHECK-NEXT: subq $3912, %rsp # imm = 0xF48
	;CHECK-NEXT: .cfi_def_cfa_offset 8016			;CHECK-NEXT: .cfi_def_cfa_offset 8016
				nagisa (inline comment, not done): The `def_cfa_offset` is superfluous here now. The last `cfi` directive should be either the `adjust` or `def`.
				YangKeao (author, done): > The `def_cfa_offset` is superfluous here now. The last `cfi` directive should be either the `adjust` or `def`.
				YangKeao (author, done): Fixed by removing the last `adjust`.
	;CHECK-NEXT: movq %rsp, %rdi			;CHECK-NEXT: movq %rsp, %rdi
	;CHECK-NEXT: movl $8000, %edx # imm = 0x1F40			;CHECK-NEXT: movl $8000, %edx # imm = 0x1F40
	;CHECK-NEXT: xorl %esi, %esi			;CHECK-NEXT: xorl %esi, %esi
	;CHECK-NEXT: callq memset			;CHECK-NEXT: callq memset
	;CHECK-NEXT: addq $8008, %rsp # imm = 0x1F48			;CHECK-NEXT: addq $8008, %rsp # imm = 0x1F48
	;CHECK-NEXT: .cfi_def_cfa_offset 8			;CHECK-NEXT: .cfi_def_cfa_offset 8
	;CHECK-NEXT: retq			;CHECK-NEXT: retq

	%a = alloca i8, i64 8000, align 16			%a = alloca i8, i64 8000, align 16
	call void @llvm.memset.p0i8.i64(i8* align 16 %a, i8 0, i64 8000, i1 false)			call void @llvm.memset.p0i8.i64(i8* align 16 %a, i8 0, i64 8000, i1 false)
	ret void			ret void
	}			}

	attributes #0 = {"probe-stack"="inline-asm"}			attributes #0 = {"probe-stack"="inline-asm"}

Revision Contents

Diff 331618
