This is an archive of the discontinued LLVM Phabricator instance.

Differential D20263

X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions
Closed, Public

Authored by hans on May 13 2016, 4:40 PM.

Details

Reviewers

mkuper
DavidKreitzer
rnk

Commits

rGc3fb51171e90: X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions
rL269828: X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions

Summary

This patch moves the expansion of WIN_ALLOCA pseudo-instructions into a separate pass that walks the CFG and lowers the instructions based on a conservative estimate of the offset between the stack pointer and the lowest accessed stack address.

The goal is to reduce binary size and run-time costs by removing calls to _chkstk.
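
As a rough illustration of the decision described in the summary, the sketch below picks one of the three lowering strategies named in X86WinAllocaExpander.cpp (TouchAndSub, Sub, Probe) from a tracked stack offset and an allocation size. The helper name chooseLowering, the 4096-byte kStackProbeSize, and the exact thresholds are assumptions made for illustration; this is not the code from the patch.

```cpp
#include <cstdint>

// Strategy names follow the enum in X86WinAllocaExpander.cpp.
enum class Lowering { TouchAndSub, Sub, Probe };

// Assumed probe interval of one 4 KiB page (illustrative constant).
constexpr int64_t kStackProbeSize = 4096;

// Hypothetical helper. `offset` is a conservative estimate of the distance
// between the stack pointer and the lowest stack address known to have been
// accessed; `amount` is the allocation size, with a negative value meaning
// "not a compile-time constant".
Lowering chooseLowering(int64_t offset, int64_t amount) {
  // Unknown or large allocations still need a real stack probe.
  if (amount < 0 || amount > kStackProbeSize)
    return Lowering::Probe;
  // The allocation stays within one probe interval of touched memory,
  // so adjusting the stack pointer alone is enough.
  if (offset + amount <= kStackProbeSize)
    return Lowering::Sub;
  // Otherwise touch the current top of the stack first, then adjust.
  return Lowering::TouchAndSub;
}
```

Under these assumptions, a small allocation right after a stack access lowers to a plain stack-pointer adjustment, and only unknown or oversized allocations keep the _chkstk call.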

Diff Detail

Repository: rL LLVM

Event Timeline

hans updated this revision to Diff 57268. May 13 2016, 4:40 PM

hans retitled this revision to X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions.

hans updated this object.

hans added reviewers: rnk, DavidKreitzer, mkuper.

hans added a subscriber: llvm-commits.

mkuper added inline comments. May 13 2016, 5:11 PM

lib/Target/X86/X86.h
62 ↗	(On Diff #57268)	I think this needs a description. :-)
lib/Target/X86/X86ISelLowering.cpp
16606 ↗	(On Diff #57268)	Do we still need a CopyToReg? The previous code had it because it needed specifically a copy to EAX glued to the future-chkstk. Can you use Size directly?
lib/Target/X86/X86WinAllocaExpander.cpp
120 ↗	(On Diff #57268)	We have a ReversePostOrderTraversal<> in ADT. Can you use that, or is it inappropriate?
166 ↗	(On Diff #57268)	Are you sure this is conservative enough? Perhaps blacklist anything that defs the stack pointer, except for a specific whitelist (calls, pushes, pops...) that touches the tip? Or would we get a huge whitelist?
263 ↗	(On Diff #57268)	StackProbeSize = MF.... ? (Perhaps add a test for this?)

rnk added inline comments. May 13 2016, 5:12 PM

include/llvm/CodeGen/MachineFrameInfo.h
282 ↗	(On Diff #57268)	I think WIN_ALLOCA is x86-specific, so a better place for this would be lib/Target/X86/X86MachineFunctionInfo.h
lib/Target/X86/X86.h
62 ↗	(On Diff #57268)	write the comment :)
lib/Target/X86/X86ISelLowering.cpp
16606 ↗	(On Diff #57268)	I wonder if we could do something clever to avoid creating a copy of an immediate. Might be too much work.
lib/Target/X86/X86WinAllocaExpander.cpp
116 ↗	(On Diff #57268)	s/lowerst/lowest/
119 ↗	(On Diff #57268)	This can be: ReversePostOrderTraversal<MachineFunction *> RPOT(&MF);
168 ↗	(On Diff #57268)	Should this be INT32_MAX instead? It seems like we could do alloca, save+restore, alloca and run over the guard page this way. Maybe a good test case?
197 ↗	(On Diff #57268)	Maybe cache `Is64Bit ? 8 : 4` as a `SlotSize` member variable. Also RAX as... some variable. RegA?

mkuper added inline comments. May 13 2016, 5:24 PM

lib/Target/X86/X86WinAllocaExpander.cpp
197 ↗	(On Diff #57268)	We can probably just query MRI for getSlotSize instead.

Hi Hans,

I made a few detailed comments, but I have higher level concerns about the approach. This new pass will eliminate most of the _chkstk calls, which is great. But we'll still be leaving some performance on the table. We'd really like calls that take inalloca arguments to be optimized in all the same ways as other calls. In particular, we'd like to be able to reserve the outgoing argument block & do the store-to-push optimization for outgoing arguments.

I think the "inalloca" name is unfortunate, because these "WIN_ALLOCA" stack allocations have more in common with ADJCALLSTACKDOWN than they do with alloca. For one thing, they always allocate a fixed amount of space. Additionally, there is an implicit requirement that the stack pointer value immediately following the WIN_ALLOCA matches the stack pointer value at the time of the call to the inalloca function. This requirement is not currently enforced correctly in some cases. You'll find that this program doesn't work with clang on Win32, because the stack space for the _alloca call is allocated *after* the stack space for the "WIN_ALLOCA":

#include <stdio.h>
#include <malloc.h>
struct X
{
    int x;
    X() {x = 42;}
    X(const X&v) { x = v.x; }
};

X x;

void f2(X v, void*p) { printf("v.x = %d (should be 42)\n", v.x); }
void f(int n)
{
    f2(x, _alloca(n));
}

int main()
{
    f(1000);
    return 0;
}

Fixing that bug may be orthogonal to how we model & implement inalloca calls in the back end. But that illustrates why I think a more accurate modeling would be to move the ADJCALLSTACKDOWN instruction that precedes the inalloca call (i.e. the ADJCALLSTACKDOWN instruction that is hacked to always use an allocation size of 0) to the current location of the WIN_ALLOCA, use an accurate allocation size rather than 0, and get rid of the WIN_ALLOCA altogether. That would have the effect of introducing nested ADJCALLSTACKDOWN/ADJCALLSTACKUP structures, which I'm sure is going to cause some implementation problems. But I think it will allow us to more naturally handle & optimize these calls during FrameLowering. It's kind of a mess, but I think it is just a natural consequence of Microsoft's choice to construct inalloca arguments in the outgoing argument block rather than in an arbitrary location on the stack and passing a pointer to that location as is done on Linux.

I'll try to put together a prototype. That might make it easier to see what I have in mind.

Meanwhile, I have no objection to using this new pass as a shorter term performance fix. :)

Thanks,
-Dave

BTW, any thoughts on how to fix the above bug? This is another thing that I think ought to be handled by clang.

lib/Target/X86/X86WinAllocaExpander.cpp
50 ↗	(On Diff #57268)	Do you really want to return a LoweringMap? This will cause the entire container to be copied. Maybe pass in a LoweringMap& and fill it in instead?
99 ↗	(On Diff #57268)	The indentation looks off here.
101 ↗	(On Diff #57268)	To be extra safe, you could assert that AllocaAmount is >= -1.
109 ↗	(On Diff #57268)	This assertion seems unnecessary given line 101.
126 ↗	(On Diff #57268)	"Out" isn't very descriptive. Can you name this differently, e.g. OutOffset?
152 ↗	(On Diff #57268)	You should probably add a default: notreached().

mkuper added inline comments. May 16 2016, 3:50 PM

lib/Target/X86/X86WinAllocaExpander.cpp
50 ↗	(On Diff #57268)	Won't this be ok with RVO?
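
The RVO question above can be checked with a small standalone experiment. The sketch below uses std::map as a stand-in for the patch's LoweringMap (a MapVector) and a hypothetical Tracked type that counts copy constructions; buildByValue and buildInto are illustrative names, not functions from the patch.

```cpp
#include <map>

// Hypothetical element type that counts copy constructions.
struct Tracked {
  static int copies;
  Tracked() = default;
  Tracked(const Tracked &) { ++copies; }
  Tracked(Tracked &&) noexcept {}
};
int Tracked::copies = 0;

// std::map stands in for the LoweringMap container here.
using Table = std::map<int, Tracked>;

// Returns the container by value. The local is either constructed directly
// in the caller's storage (NRVO) or moved out; its elements are not copied.
Table buildByValue() {
  Table t;
  t.emplace(1, Tracked{});
  t.emplace(2, Tracked{});
  return t;
}

// The alternative suggested in the review: fill a caller-provided container.
void buildInto(Table &out) {
  out.emplace(1, Tracked{});
  out.emplace(2, Tracked{});
}
```

Both forms avoid copying the elements, so the choice between them is mostly about readability and house style.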

Thanks for the great comments everyone!

David, I agree this only addresses a small part of the problem and leaves a lot of performance on the table. The way I think about it is as an incremental improvement, easier than tackling the full inalloca problem all at once, and also helpful for dynamic allocas in general.

Having the WIN_ALLOCA pseudos expanded late might also help getting better code for these calls eventually, as it should remove the need for some of the pattern matching I tried in D20003 (unless we manage to fix them with some higher-level approach).

include/llvm/CodeGen/MachineFrameInfo.h
282 ↗	(On Diff #57268)	Done.
lib/Target/X86/X86.h
62 ↗	(On Diff #57268)	Done.
lib/Target/X86/X86ISelLowering.cpp
16606 ↗	(On Diff #57268)	We can avoid creating the copy here, as Michael pointed out, but since the MI instruction expects a register (because the amount might not be a constant), there will still be a copy at the MI level. We could create a variant of the pseudo-instruction that takes an immediate, but I'm not sure that would end up being a simplification.
lib/Target/X86/X86WinAllocaExpander.cpp
50 ↗	(On Diff #57268)	I figured the call would be inlined anyway, and the compiler could avoid the copy. Anyway, I've changed the call to return via a reference parameter instead, as that seems more idiomatic.
99 ↗	(On Diff #57268)	Done.
101 ↗	(On Diff #57268)	Hmm, I'll just change it to check AllocaAmount < 0 in the condition instead.
109 ↗	(On Diff #57268)	Removed.
116 ↗	(On Diff #57268)	Done.
119 ↗	(On Diff #57268)	Done.
126 ↗	(On Diff #57268)	Done.
152 ↗	(On Diff #57268)	I left it out intentionally, relying on -Wswitch to detect any missing cases: http://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations
166 ↗	(On Diff #57268)	I couldn't come up with any example code that would break this in practice. But you're right, a white-list would be better, and I don't think it will be big.
168 ↗	(On Diff #57268)	Oops, -1 was a leftover from an earlier version of the patch. I've added a test.
197 ↗	(On Diff #57268)	Done.
197 ↗	(On Diff #57268)	Done.
263 ↗	(On Diff #57268)	It's passed as an argument to .getAsInteger below. Or are you saying there's some shorter way to get at it? Added a test.

Addressing comments and adding another test.

mkuper added inline comments. May 16 2016, 5:02 PM

lib/Target/X86/X86WinAllocaExpander.cpp
181 ↗	(On Diff #57417)	This looks a bit scary to me. It can cause an offset to be temporarily negative, and negative offsets are special-cased now, right? (Well, we'll probably never hit "-1" due to alignment anyway, but...) I think I can live with this, though, especially since, as Dave's example shows, allocations within this region are probably broken as is, so it's not like this would make anything worse. But if we keep it as is, then I think we need at least a FIXME.
264 ↗	(On Diff #57417)	Never mind, I misread, sorry for the noise.

Any more comments, or are you all OK with this going in?

lib/Target/X86/X86WinAllocaExpander.cpp
181 ↗	(On Diff #57417)	Thinking more about this, I don't think a negative offset would break anything. The interesting code is on line 107, which I think will do the right thing (if we ever ended up in this situation). Negative-offsets are not special-cased (a negative WinAlloca amount means "unknown" though.)

In D20263#431442, @DavidKreitzer wrote:

BTW, any thoughts on how to fix the above bug? This is another thing that I think ought to be handled by clang.

We did notice this, it's https://llvm.org/bugs/show_bug.cgi?id=26776. We can leverage the fact that argument evaluation order is currently unsequenced, and evaluate the argument containing the alloca call first.

However, if C++17 defines argument evaluation order, then it will become impossible to satisfy the copy elision requirement and the evaluation order requirement in this old and busted 32-bit ABI. =/

I think we can transform WIN_ALLOCA to ADJCALLSTACK today in certain limited cases, and if we can, it would definitely be a win. I doubt we can tolerate nested ADJCALLSTACK frames, though, and inalloca typically happens precisely when there is a nested call sequence. Obviously, stack frame reservation conflicts with nested call stack adjustments, but we should already be protected from that by the hasVarSizedObjects flag, which is set earlier in FunctionLoweringInfo.

Do you intend to pursue store-to-push conversion for calls where an argument is still address-taken by the time we reach codegen? Consider this kind of code:

struct S { S(const S &); ~S(); int s; };
S makeS(); // cannot inline away
void takeS(int, S, int);
void passS() { takeS(1, makeS(), 3); }

MSVC can make this into "push 1, sub 4, call makeS, push 3", but by the time we reach LLVM codegen, we have no ability to reason about what memory makeS writes. I'm happy if LLVM can turn this into "sub, call makeS, store 1, store 3, call takeS, add". This is what we used to get before push conversion, and I think we can achieve it with our current approach. If we want to do full push conversion, then we need to revisit the IR.

I just have a couple other minor suggestions. Otherwise, LGTM.

lib/Target/X86/X86WinAllocaExpander.cpp
116 ↗	(On Diff #57417)	Isn't the formatting convention to line up "case" and "switch"?
153 ↗	(On Diff #57417)	Ah, cool. That's good to know, thanks!
217 ↗	(On Diff #57417)	Since SlotSize is always a power of 2, a minor compile-time efficiency improvement would be to change the assert condition to (Amount & (SlotSize - 1)) == 0.
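
The suggestion on line 217 rests on the fact that, for a power-of-two divisor, a remainder test can be written as a bit mask. A small sketch comparing the two forms follows; the function names are illustrative and not taken from the patch.

```cpp
#include <cstdint>

// Remainder form of the alignment check.
constexpr bool isSlotAlignedMod(int64_t amount, int64_t slotSize) {
  return amount % slotSize == 0;
}

// Mask form suggested in the review. It is equivalent to the remainder form
// only when slotSize is a power of two (4 or 8 for x86 stack slots).
constexpr bool isSlotAlignedMask(int64_t amount, int64_t slotSize) {
  return (amount & (slotSize - 1)) == 0;
}

// Guard for the power-of-two precondition of the mask form.
constexpr bool isPowerOfTwo(int64_t v) { return v > 0 && (v & (v - 1)) == 0; }
```

For a slot size such as 12 the two functions disagree (for example on an amount of 16), which is why the mask form depends on the power-of-two precondition.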

LGTM

lib/Target/X86/X86WinAllocaExpander.cpp
181 ↗	(On Diff #57417)	Ok, that sounds reasonable.

This revision is now accepted and ready to land. May 17 2016, 12:40 PM

Thanks for the pointer to https://llvm.org/bugs/show_bug.cgi?id=26776, Reid.

I doubt we can tolerate nested ADJCALLSTACK frames, though, and inalloca typically happens precisely when there is a nested call sequence.

I'm sure you are right. I'd still like to give it a try and see how badly things break.

Obviously, stack frame reservation conflicts with nested call stack adjustments, but we should already be protected from that by the hasVarSizedObjects flag, which is set earlier in FunctionLoweringInfo.

We can still do stack frame reservation with nested call stack adjustments. It's just that the reserved call frame is only available to the top-level calls. Nested calls would have to do their own stack adjusts. I'm sure making that distinction would create some implementation challenges, and it isn't likely to be high impact, but it seems theoretically possible.

Closed by commit rL269828: X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions (authored by hans). May 17 2016, 1:32 PM

This revision was automatically updated to reflect the committed changes.

hans marked an inline comment as done.

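Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/CMakeLists.txt	1 line
llvm/trunk/lib/Target/X86/X86.h	3 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.h	3 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp	23 lines
llvm/trunk/lib/Target/X86/X86InstrCompiler.td	33 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.td	6 lines
llvm/trunk/lib/Target/X86/X86MachineFunctionInfo.h	6 lines
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp	1 line
llvm/trunk/lib/Target/X86/X86WinAllocaExpander.cpp	294 lines
llvm/trunk/test/CodeGen/X86/cleanuppad-inalloca.ll	4 lines
llvm/trunk/test/CodeGen/X86/dynamic-alloca-in-entry.ll	2 lines
llvm/trunk/test/CodeGen/X86/inalloca-ctor.ll	4 lines
llvm/trunk/test/CodeGen/X86/inalloca-invoke.ll	3 lines
llvm/trunk/test/CodeGen/X86/inalloca-stdcall.ll	4 lines
llvm/trunk/test/CodeGen/X86/inalloca.ll	12 lines
llvm/trunk/test/CodeGen/X86/shrink-wrap-chkstk.ll	4 lines
llvm/trunk/test/CodeGen/X86/win-alloca-expander.ll	153 lines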

Diff 57514

llvm/trunk/lib/Target/X86/CMakeLists.txt

@@ ... @@ set(sources
   X86TargetMachine.cpp
   X86TargetObjectFile.cpp
   X86TargetTransformInfo.cpp
   X86VZeroUpper.cpp
   X86FixupLEAs.cpp
   X86WinEHState.cpp
   X86OptimizeLEAs.cpp
   X86FixupBWInsts.cpp
+  X86WinAllocaExpander.cpp
   )

 add_llvm_target(X86CodeGen ${sources})

 add_subdirectory(AsmParser)
 add_subdirectory(Disassembler)
 add_subdirectory(InstPrinter)
 add_subdirectory(MCTargetDesc)
 add_subdirectory(TargetInfo)
 add_subdirectory(Utils)

llvm/trunk/lib/Target/X86/X86.h

@@ ... @@
 /// sub, inc, dec, some shifts, and some multiplies) by equivalent LEA
 /// instructions, in order to eliminate execution delays in some processors.
 FunctionPass *createX86FixupLEAs();

 /// Return a pass that removes redundant LEA instructions and redundant address
 /// recalculations.
 FunctionPass *createX86OptimizeLEAs();

+/// Return a pass that expands WinAlloca pseudo-instructions.
+FunctionPass *createX86WinAllocaExpander();
+
 /// Return a pass that optimizes the code-size of x86 call sequences. This is
 /// done by replacing esp-relative movs with pushes.
 FunctionPass *createX86CallFrameOptimization();

 /// Return an IR pass that inserts EH registration stack objects and explicit
 /// EH state updates. This pass must run after EH preparation, which does
 /// Windows-specific but architecture-neutral preparation.
 FunctionPass *createX86WinEHStatePass();

llvm/trunk/lib/Target/X86/X86ISelLowering.h

@@ ... @@ MachineBasicBlock *EmitVAStartSaveXMMRegsWithCustomInserter(
                                                  MachineBasicBlock *BB) const;

     MachineBasicBlock *EmitLoweredSelect(MachineInstr *I,
                                          MachineBasicBlock *BB) const;

     MachineBasicBlock *EmitLoweredAtomicFP(MachineInstr *I,
                                            MachineBasicBlock *BB) const;

-    MachineBasicBlock *EmitLoweredWinAlloca(MachineInstr *MI,
-                                            MachineBasicBlock *BB) const;
-
     MachineBasicBlock *EmitLoweredCatchRet(MachineInstr *MI,
                                            MachineBasicBlock *BB) const;

     MachineBasicBlock *EmitLoweredCatchPad(MachineInstr *MI,
                                            MachineBasicBlock *BB) const;

     MachineBasicBlock *EmitLoweredSegAlloca(MachineInstr *MI,
                                             MachineBasicBlock *BB) const;

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

@@ ... @@ if (!Lower) {
     }

     const TargetRegisterClass *AddrRegClass = getRegClassFor(SPTy);
     unsigned Vreg = MRI.createVirtualRegister(AddrRegClass);
     Chain = DAG.getCopyToReg(Chain, dl, Vreg, Size);
     Result = DAG.getNode(X86ISD::SEG_ALLOCA, dl, SPTy, Chain,
                          DAG.getRegister(Vreg, SPTy));
   } else {
-    SDValue Flag;
-    const unsigned Reg = (Subtarget.isTarget64BitLP64() ? X86::RAX : X86::EAX);
-
-    Chain = DAG.getCopyToReg(Chain, dl, Reg, Size, Flag);
-    Flag = Chain.getValue(1);
     SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);
-    Chain = DAG.getNode(X86ISD::WIN_ALLOCA, dl, NodeTys, Chain, Flag);
+    Chain = DAG.getNode(X86ISD::WIN_ALLOCA, dl, NodeTys, Chain, Size);
+    MF.getInfo<X86MachineFunctionInfo>()->setHasWinAlloca(true);

     const X86RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
     unsigned SPReg = RegInfo->getStackRegister();
     SDValue SP = DAG.getCopyFromReg(Chain, dl, SPReg, SPTy);
     Chain = SP.getValue(1);

     if (Align) {
       SP = DAG.getNode(ISD::AND, dl, VT, SP.getValue(0),
@@ ... @@ X86TargetLowering::EmitLoweredSegAlloca(MachineInstr *MI,
   // Delete the original pseudo instruction.
   MI->eraseFromParent();

   // And we're done.
   return continueMBB;
 }

 MachineBasicBlock *
-X86TargetLowering::EmitLoweredWinAlloca(MachineInstr *MI,
-                                        MachineBasicBlock *BB) const {
-  assert(!Subtarget.isTargetMachO());
-  DebugLoc DL = MI->getDebugLoc();
-  MachineInstr *ResumeMI = Subtarget.getFrameLowering()->emitStackProbe(
-      *BB->getParent(), *BB, MI, DL, false);
-  MachineBasicBlock *ResumeBB = ResumeMI->getParent();
-  MI->eraseFromParent(); // The pseudo instruction is gone now.
-  return ResumeBB;
-}
-
-MachineBasicBlock *
 X86TargetLowering::EmitLoweredCatchRet(MachineInstr *MI,
                                        MachineBasicBlock *BB) const {
   MachineFunction *MF = BB->getParent();
   const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
   MachineBasicBlock *TargetMBB = MI->getOperand(0).getMBB();
   DebugLoc DL = MI->getDebugLoc();

   assert(!isAsynchronousEHPersonality(
@@ ... @@ X86TargetLowering::EmitInstrWithCustomInserter(MachineInstr *MI,
   case X86::TCRETURNri64:
   case X86::TCRETURNmi64:
     return BB;
   case X86::TLS_addr32:
   case X86::TLS_addr64:
   case X86::TLS_base_addr32:
   case X86::TLS_base_addr64:
     return EmitLoweredTLSAddr(MI, BB);
-  case X86::WIN_ALLOCA:
-    return EmitLoweredWinAlloca(MI, BB);
   case X86::CATCHRET:
     return EmitLoweredCatchRet(MI, BB);
   case X86::CATCHPAD:
     return EmitLoweredCatchPad(MI, BB);
   case X86::SEG_ALLOCA_32:
   case X86::SEG_ALLOCA_64:
     return EmitLoweredSegAlloca(MI, BB);
   case X86::TLSCall_32:

llvm/trunk/lib/Target/X86/X86InstrCompiler.td

@@ ... @@
 def VAARG_64 : I<0, Pseudo,
                  (outs GR64:$dst),
                  (ins i8mem:$ap, i32imm:$size, i8imm:$mode, i32imm:$align),
                  "#VAARG_64 $dst, $ap, $size, $mode, $align",
                  [(set GR64:$dst,
                     (X86vaarg64 addr:$ap, imm:$size, imm:$mode, imm:$align)),
                   (implicit EFLAGS)]>;

-// Dynamic stack allocation yields a _chkstk or _alloca call for all Windows
-// targets. These calls are needed to probe the stack when allocating more than
-// 4k bytes in one go. Touching the stack at 4K increments is necessary to
-// ensure that the guard pages used by the OS virtual memory manager are
-// allocated in correct sequence.
-// The main point of having separate instruction are extra unmodelled effects
-// (compared to ordinary calls) like stack pointer change.
-
-let Defs = [EAX, ESP, EFLAGS], Uses = [ESP] in
-  def WIN_ALLOCA : I<0, Pseudo, (outs), (ins),
-                     "# dynamic stack allocation",
-                     [(X86WinAlloca)]>;
-
 // When using segmented stacks these are lowered into instructions which first
 // check if the current stacklet has enough free memory. If it does, memory is
 // allocated by bumping the stack pointer. Otherwise memory is allocated from
 // the heap.

 let Defs = [EAX, ESP, EFLAGS], Uses = [ESP] in
 def SEG_ALLOCA_32 : I<0, Pseudo, (outs GR32:$dst), (ins GR32:$size),
                       "# variable sized alloca for segmented stacks",
                       [(set GR32:$dst,
                          (X86SegAlloca GR32:$size))]>,
                     Requires<[NotLP64]>;

 let Defs = [RAX, RSP, EFLAGS], Uses = [RSP] in
 def SEG_ALLOCA_64 : I<0, Pseudo, (outs GR64:$dst), (ins GR64:$size),
                       "# variable sized alloca for segmented stacks",
                       [(set GR64:$dst,
                          (X86SegAlloca GR64:$size))]>,
                     Requires<[In64BitMode]>;
 }

+// Dynamic stack allocation yields a _chkstk or _alloca call for all Windows
+// targets. These calls are needed to probe the stack when allocating more than
+// 4k bytes in one go. Touching the stack at 4K increments is necessary to
+// ensure that the guard pages used by the OS virtual memory manager are
+// allocated in correct sequence.
+// The main point of having separate instruction are extra unmodelled effects
+// (compared to ordinary calls) like stack pointer change.
+
+let Defs = [EAX, ESP, EFLAGS], Uses = [ESP] in
+def WIN_ALLOCA_32 : I<0, Pseudo, (outs), (ins GR32:$size),
+                      "# dynamic stack allocation",
+                      [(X86WinAlloca GR32:$size)]>,
+                    Requires<[NotLP64]>;
+
+let Defs = [RAX, RSP, EFLAGS], Uses = [RSP] in
+def WIN_ALLOCA_64 : I<0, Pseudo, (outs), (ins GR64:$size),
+                      "# dynamic stack allocation",
+                      [(X86WinAlloca GR64:$size)]>,
+                    Requires<[In64BitMode]>;
+
+
 //===----------------------------------------------------------------------===//
 // EH Pseudo Instructions
 //
 let SchedRW = [WriteSystem] in {
 let isTerminator = 1, isReturn = 1, isBarrier = 1,
     hasCtrlDep = 1, isCodeGenOnly = 1 in {
 def EH_RETURN : I<0xC3, RawFrm, (outs), (ins GR32:$addr),
                   "ret\t#eh_return, addr: $addr",

llvm/trunk/lib/Target/X86/X86InstrInfo.td

@@ ... @@
 def SDTX86Wrapper : SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>, SDTCisPtrTy<0>]>;

 def SDT_X86TLSADDR : SDTypeProfile<0, 1, [SDTCisInt<0>]>;

 def SDT_X86TLSBASEADDR : SDTypeProfile<0, 1, [SDTCisInt<0>]>;

 def SDT_X86TLSCALL : SDTypeProfile<0, 1, [SDTCisInt<0>]>;

+def SDT_X86WIN_ALLOCA : SDTypeProfile<0, 1, [SDTCisVT<0, iPTR>]>;
+
 def SDT_X86SEG_ALLOCA : SDTypeProfile<1, 1, [SDTCisVT<0, iPTR>, SDTCisVT<1, iPTR>]>;

 def SDT_X86EHRET : SDTypeProfile<0, 1, [SDTCisInt<0>]>;

 def SDT_X86TCRET : SDTypeProfile<0, 2, [SDTCisPtrTy<0>, SDTCisVT<1, i32>]>;

 def SDT_X86MEMBARRIER : SDTypeProfile<0, 0, []>;

@@ ... @@
 def X86lock_and : SDNode<"X86ISD::LAND", SDTLockBinaryArithWithFlags,
                          [SDNPHasChain, SDNPMayStore, SDNPMayLoad,
                           SDNPMemOperand]>;

 def X86bextr : SDNode<"X86ISD::BEXTR", SDTIntBinOp>;

 def X86mul_imm : SDNode<"X86ISD::MUL_IMM", SDTIntBinOp>;

-def X86WinAlloca : SDNode<"X86ISD::WIN_ALLOCA", SDTX86Void,
-                          [SDNPHasChain, SDNPInGlue, SDNPOutGlue]>;
+def X86WinAlloca : SDNode<"X86ISD::WIN_ALLOCA", SDT_X86WIN_ALLOCA,
+                          [SDNPHasChain, SDNPOutGlue]>;

 def X86SegAlloca : SDNode<"X86ISD::SEG_ALLOCA", SDT_X86SEG_ALLOCA,
                           [SDNPHasChain]>;

 def X86TLSCall : SDNode<"X86ISD::TLSCALL", SDT_X86TLSCALL,
                         [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;

 //===----------------------------------------------------------------------===//

llvm/trunk/lib/Target/X86/X86MachineFunctionInfo.h

@@ ... @@ class X86MachineFunctionInfo : public MachineFunctionInfo {

   /// True if this function has a subset of CSRs that is handled explicitly via
   /// copies.
   bool IsSplitCSR = false;

   /// True if this function uses the red zone.
   bool UsesRedZone = false;

+  /// True if this function has WIN_ALLOCA instructions.
+  bool HasWinAlloca = false;
+
 private:
   /// ForwardedMustTailRegParms - A list of virtual and physical registers
   /// that must be forwarded to every musttail call.
   SmallVector<ForwardedRegister, 1> ForwardedMustTailRegParms;

 public:
   X86MachineFunctionInfo() = default;

@@ ... @@ SmallVectorImpl<ForwardedRegister> &getForwardedMustTailRegParms() {
     return ForwardedMustTailRegParms;
   }

   bool isSplitCSR() const { return IsSplitCSR; }
   void setIsSplitCSR(bool s) { IsSplitCSR = s; }

   bool getUsesRedZone() const { return UsesRedZone; }
   void setUsesRedZone(bool V) { UsesRedZone = V; }
+
+  bool hasWinAlloca() const { return HasWinAlloca; }
+  void setHasWinAlloca(bool v) { HasWinAlloca = v; }
 };

 } // End llvm namespace

 #endif

llvm/trunk/lib/Target/X86/X86TargetMachine.cpp

@@ ... @@ bool X86PassConfig::addPreISel() {
   return true;
 }

 void X86PassConfig::addPreRegAlloc() {
   if (getOptLevel() != CodeGenOpt::None)
     addPass(createX86OptimizeLEAs());

   addPass(createX86CallFrameOptimization());
+  addPass(createX86WinAllocaExpander());
 }

 void X86PassConfig::addPostRegAlloc() {
   addPass(createX86FloatingPointStackifierPass());
 }

 void X86PassConfig::addPreSched2() { addPass(createX86ExpandPseudoPass()); }

llvm/trunk/lib/Target/X86/X86WinAllocaExpander.cpp

				//===----- X86WinAllocaExpander.cpp - Expand WinAlloca pseudo instruction -===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines a pass that expands WinAlloca pseudo-instructions.
				//
				// It performs a conservative analysis to determine whether each allocation
				// falls within a region of the stack that is safe to use, or whether stack
				// probes must be emitted.
				//
				//===----------------------------------------------------------------------===//

				#include "X86.h"
				#include "X86InstrBuilder.h"
				#include "X86InstrInfo.h"
				#include "X86MachineFunctionInfo.h"
				#include "X86Subtarget.h"
				#include "llvm/ADT/PostOrderIterator.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/IR/Function.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetInstrInfo.h"

				using namespace llvm;

				namespace {

				class X86WinAllocaExpander : public MachineFunctionPass {
				public:
				X86WinAllocaExpander() : MachineFunctionPass(ID) {}

				bool runOnMachineFunction(MachineFunction &MF) override;

				private:
				/// Strategies for lowering a WinAlloca.
				enum Lowering { TouchAndSub, Sub, Probe };

				/// Deterministic-order map from WinAlloca instruction to desired lowering.
				typedef MapVector<MachineInstr*, Lowering> LoweringMap;

				/// Compute which lowering to use for each WinAlloca instruction.
				void computeLowerings(MachineFunction &MF, LoweringMap& Lowerings);

				/// Get the appropriate lowering based on current offset and amount.
				Lowering getLowering(int64_t CurrentOffset, int64_t AllocaAmount);

				/// Lower a WinAlloca instruction.
				void lower(MachineInstr* MI, Lowering L);

				MachineRegisterInfo *MRI;
				const X86Subtarget *STI;
				const TargetInstrInfo *TII;
				const X86RegisterInfo *TRI;
				unsigned StackPtr;
				unsigned SlotSize;
				int64_t StackProbeSize;

				const char *getPassName() const override { return "X86 WinAlloca Expander"; }
				static char ID;
				};

				char X86WinAllocaExpander::ID = 0;

				} // end anonymous namespace

				FunctionPass *llvm::createX86WinAllocaExpander() {
				return new X86WinAllocaExpander();
				}

				/// Return the allocation amount for a WinAlloca instruction, or -1 if unknown.
				static int64_t getWinAllocaAmount(MachineInstr *MI, MachineRegisterInfo *MRI) {
				assert(MI->getOpcode() == X86::WIN_ALLOCA_32 ||
				MI->getOpcode() == X86::WIN_ALLOCA_64);
				assert(MI->getOperand(0).isReg());

				unsigned AmountReg = MI->getOperand(0).getReg();
				MachineInstr *Def = MRI->getUniqueVRegDef(AmountReg);

				// Look through copies.
				while (Def && Def->isCopy() && Def->getOperand(1).isReg())
				Def = MRI->getUniqueVRegDef(Def->getOperand(1).getReg());

				if (!Def ||
				(Def->getOpcode() != X86::MOV32ri && Def->getOpcode() != X86::MOV64ri) ||
				!Def->getOperand(1).isImm())
				return -1;

				return Def->getOperand(1).getImm();
				}

				X86WinAllocaExpander::Lowering
				X86WinAllocaExpander::getLowering(int64_t CurrentOffset,
				int64_t AllocaAmount) {
				// For a non-constant amount or a large amount, we have to probe.
				if (AllocaAmount < 0 || AllocaAmount > StackProbeSize)
				return Probe;

				// If it fits within the safe region of the stack, just subtract.
				if (CurrentOffset + AllocaAmount <= StackProbeSize)
				return Sub;

				// Otherwise, touch the current tip of the stack, then subtract.
				return TouchAndSub;
				}

				static bool isPushPop(const MachineInstr &MI) {
				switch (MI.getOpcode()) {
				case X86::PUSH32i8:
				case X86::PUSH32r:
				case X86::PUSH32rmm:
				case X86::PUSH32rmr:
				case X86::PUSHi32:
				case X86::PUSH64i8:
				case X86::PUSH64r:
				case X86::PUSH64rmm:
				case X86::PUSH64rmr:
				case X86::PUSH64i32:
				case X86::POP32r:
				case X86::POP64r:
				return true;
				default:
				return false;
				}
				}

				void X86WinAllocaExpander::computeLowerings(MachineFunction &MF,
				LoweringMap &Lowerings) {
				// Do a one-pass reverse post-order walk of the CFG to conservatively estimate
				// the offset between the stack pointer and the lowest touched part of the
				// stack, and use that to decide how to lower each WinAlloca instruction.

				// Initialize OutOffset[B], the stack offset at exit from B, to something big.
				DenseMap<MachineBasicBlock *, int64_t> OutOffset;
				for (MachineBasicBlock &MBB : MF)
				OutOffset[&MBB] = INT32_MAX;

				// Note: we don't know the offset at the start of the entry block since the
				// prologue hasn't been inserted yet, and how much that will adjust the stack
				// pointer depends on register spills, which have not been computed yet.

				// Compute the reverse post-order.
				ReversePostOrderTraversal<MachineFunction*> RPO(&MF);

				for (MachineBasicBlock *MBB : RPO) {
				int64_t Offset = -1;
				for (MachineBasicBlock *Pred : MBB->predecessors())
				Offset = std::max(Offset, OutOffset[Pred]);
				if (Offset == -1) Offset = INT32_MAX;

				for (MachineInstr &MI : *MBB) {
				if (MI.getOpcode() == X86::WIN_ALLOCA_32 ||
				MI.getOpcode() == X86::WIN_ALLOCA_64) {
				// A WinAlloca moves StackPtr, and potentially touches it.
				int64_t Amount = getWinAllocaAmount(&MI, MRI);
				Lowering L = getLowering(Offset, Amount);
				Lowerings[&MI] = L;
				switch (L) {
				case Sub:
				Offset += Amount;
				break;
				case TouchAndSub:
				Offset = Amount;
				break;
				case Probe:
				Offset = 0;
				break;
				}
				} else if (MI.isCall() || isPushPop(MI)) {
				// Calls, pushes and pops touch the tip of the stack.
				Offset = 0;
				} else if (MI.getOpcode() == X86::ADJCALLSTACKUP32 ||
				MI.getOpcode() == X86::ADJCALLSTACKUP64) {
				Offset -= MI.getOperand(0).getImm();
				} else if (MI.getOpcode() == X86::ADJCALLSTACKDOWN32 ||
				MI.getOpcode() == X86::ADJCALLSTACKDOWN64) {
				Offset += MI.getOperand(0).getImm();
				} else if (MI.modifiesRegister(StackPtr, TRI)) {
				// Any other modification of SP means we've lost track of it.
				Offset = INT32_MAX;
				}
				}

				OutOffset[MBB] = Offset;
				}
				}

				static unsigned getSubOpcode(bool Is64Bit, int64_t Amount) {
				if (Is64Bit)
				return isInt<8>(Amount) ? X86::SUB64ri8 : X86::SUB64ri32;
				return isInt<8>(Amount) ? X86::SUB32ri8 : X86::SUB32ri;
				}

				void X86WinAllocaExpander::lower(MachineInstr* MI, Lowering L) {
				DebugLoc DL = MI->getDebugLoc();
				MachineBasicBlock *MBB = MI->getParent();
				MachineBasicBlock::iterator I = *MI;

				int64_t Amount = getWinAllocaAmount(MI, MRI);
				if (Amount == 0) {
				MI->eraseFromParent();
				return;
				}

				bool Is64Bit = STI->is64Bit();
				assert(SlotSize == 4 || SlotSize == 8);
				unsigned RegA = (SlotSize == 8) ? X86::RAX : X86::EAX;

				switch (L) {
				case TouchAndSub:
				assert(Amount >= SlotSize);

				// Use a push to touch the top of the stack.
				BuildMI(*MBB, I, DL, TII->get(Is64Bit ? X86::PUSH64r : X86::PUSH32r))
				.addReg(RegA, RegState::Undef);
				Amount -= SlotSize;
				if (!Amount)
				break;

				// Fall through to make any remaining adjustment.
				case Sub:
				assert(Amount > 0);
				if (Amount == SlotSize) {
				// Use push to save size.
				BuildMI(*MBB, I, DL, TII->get(Is64Bit ? X86::PUSH64r : X86::PUSH32r))
				.addReg(RegA, RegState::Undef);
				} else {
				// Sub.
				BuildMI(*MBB, I, DL, TII->get(getSubOpcode(Is64Bit, Amount)), StackPtr)
				.addReg(StackPtr)
				.addImm(Amount);
				}
				break;
				case Probe:
				// The probe lowering expects the amount in RAX/EAX.
				BuildMI(*MBB, MI, DL, TII->get(TargetOpcode::COPY), RegA)
				.addReg(MI->getOperand(0).getReg());

				// Do the probe.
				STI->getFrameLowering()->emitStackProbe(*MBB->getParent(), *MBB, MI, DL,
				/*InPrologue=*/false);
				break;
				}

				unsigned AmountReg = MI->getOperand(0).getReg();
				MI->eraseFromParent();

				// Delete the definition of AmountReg, possibly walking a chain of copies.
				for (;;) {
				if (!MRI->use_empty(AmountReg))
				break;
				MachineInstr *AmountDef = MRI->getUniqueVRegDef(AmountReg);
				if (!AmountDef)
				break;
				if (AmountDef->isCopy() && AmountDef->getOperand(1).isReg())
				AmountReg = AmountDef->getOperand(1).isReg();
				AmountDef->eraseFromParent();
				break;
				}
				}

				bool X86WinAllocaExpander::runOnMachineFunction(MachineFunction &MF) {
				if (!MF.getInfo<X86MachineFunctionInfo>()->hasWinAlloca())
				return false;

				MRI = &MF.getRegInfo();
				STI = &MF.getSubtarget<X86Subtarget>();
				TII = STI->getInstrInfo();
				TRI = STI->getRegisterInfo();
				StackPtr = TRI->getStackRegister();
				SlotSize = TRI->getSlotSize();

				StackProbeSize = 4096;
				if (MF.getFunction()->hasFnAttribute("stack-probe-size")) {
				MF.getFunction()
				->getFnAttribute("stack-probe-size")
				.getValueAsString()
				.getAsInteger(0, StackProbeSize);
				}

				LoweringMap Lowerings;
				computeLowerings(MF, Lowerings);
				for (auto &P : Lowerings)
				lower(P.first, P.second);

				return true;
				}
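
The decision made by getLowering() and the predecessor merge at the top of the per-block loop in computeLowerings() can be modelled outside LLVM. The following is a minimal standalone sketch, not part of the patch: the names chooseLowering, entryOffset and kUnknownOffset are hypothetical, and the probe size is passed in explicitly instead of being read from the "stack-probe-size" attribute.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical names; this is a standalone model, not LLVM API.
enum class Lowering { TouchAndSub, Sub, Probe };

// Stand-in for the INT32_MAX "offset unknown" sentinel used by the pass.
constexpr int64_t kUnknownOffset = INT32_MAX;

// Mirrors getLowering(): a negative amount models a non-constant size.
Lowering chooseLowering(int64_t currentOffset, int64_t amount,
                        int64_t probeSize) {
  // Non-constant or larger than the probe size: a real probe is required.
  if (amount < 0 || amount > probeSize)
    return Lowering::Probe;
  // Still within the region known to be safe: a plain subtraction suffices.
  if (currentOffset + amount <= probeSize)
    return Lowering::Sub;
  // Otherwise touch the current tip of the stack first, then subtract.
  return Lowering::TouchAndSub;
}

// Mirrors the start of the per-block loop in computeLowerings(): take the
// largest exit offset over all predecessors; no predecessors means unknown.
int64_t entryOffset(const std::vector<int64_t> &predOutOffsets) {
  int64_t offset = -1;
  for (int64_t out : predOutOffsets)
    offset = std::max(offset, out);
  return offset == -1 ? kUnknownOffset : offset;
}
```

Because every block's exit offset starts at INT32_MAX, a predecessor reached only through a back-edge has not been visited yet in the reverse post-order walk and still carries the "unknown" value, so the merge is conservative for loops.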

llvm/trunk/test/CodeGen/X86/cleanuppad-inalloca.ll

Show All 32 Lines	ehcleanup: ; preds = %entry
call x86_thiscallcc void @"\01??1A@@QAE@XZ"(%struct.A* %0) [ "funclet"(token %2) ]		call x86_thiscallcc void @"\01??1A@@QAE@XZ"(%struct.A* %0) [ "funclet"(token %2) ]
cleanupret from %2 unwind to caller		cleanupret from %2 unwind to caller
}		}

; CHECK: _passes_two:		; CHECK: _passes_two:
; CHECK: pushl %ebp		; CHECK: pushl %ebp
; CHECK: movl %esp, %ebp		; CHECK: movl %esp, %ebp
; CHECK: subl ${{[0-9]+}}, %esp		; CHECK: subl ${{[0-9]+}}, %esp
; CHECK: movl $8, %eax		; CHECK: pushl %eax
; CHECK: calll __chkstk		; CHECK: pushl %eax
; CHECK: calll "??0A@@QAE@XZ"		; CHECK: calll "??0A@@QAE@XZ"
; CHECK: calll "??0A@@QAE@XZ"		; CHECK: calll "??0A@@QAE@XZ"
; CHECK: calll _takes_two		; CHECK: calll _takes_two
; ESP must be restored via EBP due to "dynamic" alloca.		; ESP must be restored via EBP due to "dynamic" alloca.
; CHECK: leal -{{[0-9]+}}(%ebp), %esp		; CHECK: leal -{{[0-9]+}}(%ebp), %esp
; CHECK: popl %ebp		; CHECK: popl %ebp
; CHECK: retl		; CHECK: retl

Show All 18 Lines

llvm/trunk/test/CodeGen/X86/dynamic-alloca-in-entry.ll

	Show All 9 Lines
	; CHECK: retl			; CHECK: retl

	; Use of inalloca implies that that the alloca is not static.			; Use of inalloca implies that that the alloca is not static.
	define void @bar() {			define void @bar() {
	%m = alloca inalloca i32			%m = alloca inalloca i32
	ret void			ret void
	}			}
	; CHECK-LABEL: _bar:			; CHECK-LABEL: _bar:
	; CHECK: calll __chkstk			; CHECK: pushl %eax
	; CHECK: retl			; CHECK: retl

llvm/trunk/test/CodeGen/X86/inalloca-ctor.ll

	; RUN: llc < %s -mtriple=i686-pc-win32 | FileCheck %s			; RUN: llc < %s -mtriple=i686-pc-win32 | FileCheck %s

	%Foo = type { i32, i32 }			%Foo = type { i32, i32 }

	%frame = type { %Foo, i32, %Foo }			%frame = type { %Foo, i32, %Foo }

	declare void @f(%frame* inalloca %a)			declare void @f(%frame* inalloca %a)

	declare void @Foo_ctor(%Foo* %this)			declare void @Foo_ctor(%Foo* %this)

	define void @g() {			define void @g() {
	entry:			entry:
	%args = alloca inalloca %frame			%args = alloca inalloca %frame
	%c = getelementptr %frame, %frame* %args, i32 0, i32 2			%c = getelementptr %frame, %frame* %args, i32 0, i32 2
	; CHECK: movl $20, %eax			; CHECK: pushl %eax
	; CHECK: calll __chkstk			; CHECK: subl $16, %esp
	; CHECK: movl %esp,			; CHECK: movl %esp,
	call void @Foo_ctor(%Foo* %c)			call void @Foo_ctor(%Foo* %c)
	; CHECK: leal 12(%{{.*}}),			; CHECK: leal 12(%{{.*}}),
	; CHECK-NEXT: pushl			; CHECK-NEXT: pushl
	; CHECK-NEXT: calll _Foo_ctor			; CHECK-NEXT: calll _Foo_ctor
	; CHECK: addl $4, %esp			; CHECK: addl $4, %esp
	%b = getelementptr %frame, %frame* %args, i32 0, i32 1			%b = getelementptr %frame, %frame* %args, i32 0, i32 1
	store i32 42, i32* %b			store i32 42, i32* %b
	Show All 10 Lines

llvm/trunk/test/CodeGen/X86/inalloca-invoke.ll

Show All 15 Lines	define i32 @main() personality i32 (...)* @pers {
br label %blah		br label %blah

blah:		blah:
%inalloca.save = call i8* @llvm.stacksave()		%inalloca.save = call i8* @llvm.stacksave()
%rev_args = alloca inalloca %frame.reverse, align 4		%rev_args = alloca inalloca %frame.reverse, align 4
%beg = getelementptr %frame.reverse, %frame.reverse* %rev_args, i32 0, i32 0		%beg = getelementptr %frame.reverse, %frame.reverse* %rev_args, i32 0, i32 0
%end = getelementptr %frame.reverse, %frame.reverse* %rev_args, i32 0, i32 1		%end = getelementptr %frame.reverse, %frame.reverse* %rev_args, i32 0, i32 1

; CHECK: calll __chkstk		; CHECK: pushl %eax
		; CHECK: subl $20, %esp
; CHECK: movl %esp, %[[beg:[^ ]*]]		; CHECK: movl %esp, %[[beg:[^ ]*]]
; CHECK: leal 12(%[[beg]]), %[[end:[^ ]*]]		; CHECK: leal 12(%[[beg]]), %[[end:[^ ]*]]

call void @begin(%Iter* sret %temp.lvalue)		call void @begin(%Iter* sret %temp.lvalue)
; CHECK: calll _begin		; CHECK: calll _begin

invoke void @plus(%Iter* sret %end, %Iter* %temp.lvalue, i32 4)		invoke void @plus(%Iter* sret %end, %Iter* %temp.lvalue, i32 4)
to label %invoke.cont unwind label %lpad		to label %invoke.cont unwind label %lpad
Show All 23 Lines

llvm/trunk/test/CodeGen/X86/inalloca-stdcall.ll

	; RUN: llc < %s -mtriple=i686-pc-win32 | FileCheck %s			; RUN: llc < %s -mtriple=i686-pc-win32 | FileCheck %s

	%Foo = type { i32, i32 }			%Foo = type { i32, i32 }

	declare x86_stdcallcc void @f(%Foo* inalloca %a)			declare x86_stdcallcc void @f(%Foo* inalloca %a)
	declare x86_stdcallcc void @i(i32 %a)			declare x86_stdcallcc void @i(i32 %a)

	define void @g() {			define void @g() {
	; CHECK-LABEL: _g:			; CHECK-LABEL: _g:
	%b = alloca inalloca %Foo			%b = alloca inalloca %Foo
	; CHECK: movl $8, %eax			; CHECK: pushl %eax
	; CHECK: calll __chkstk			; CHECK: pushl %eax
	%f1 = getelementptr %Foo, %Foo* %b, i32 0, i32 0			%f1 = getelementptr %Foo, %Foo* %b, i32 0, i32 0
	%f2 = getelementptr %Foo, %Foo* %b, i32 0, i32 1			%f2 = getelementptr %Foo, %Foo* %b, i32 0, i32 1
	store i32 13, i32* %f1			store i32 13, i32* %f1
	store i32 42, i32* %f2			store i32 42, i32* %f2
	; CHECK: movl %esp, %eax			; CHECK: movl %esp, %eax
	; CHECK: movl $13, (%eax)			; CHECK: movl $13, (%eax)
	; CHECK: movl $42, 4(%eax)			; CHECK: movl $42, 4(%eax)
	call x86_stdcallcc void @f(%Foo* inalloca %b)			call x86_stdcallcc void @f(%Foo* inalloca %b)
	; CHECK: calll _f@8			; CHECK: calll _f@8
	; CHECK-NOT: %esp			; CHECK-NOT: %esp
	; CHECK: pushl			; CHECK: pushl
	; CHECK: calll _i@4			; CHECK: calll _i@4
	call x86_stdcallcc void @i(i32 0)			call x86_stdcallcc void @i(i32 0)
	ret void			ret void
	}			}

llvm/trunk/test/CodeGen/X86/inalloca.ll

	; RUN: llc < %s -mtriple=i686-pc-win32 | FileCheck %s			; RUN: llc < %s -mtriple=i686-pc-win32 | FileCheck %s

	%Foo = type { i32, i32 }			%Foo = type { i32, i32 }

	declare void @f(%Foo* inalloca %b)			declare void @f(%Foo* inalloca %b)

	define void @a() {			define void @a() {
	; CHECK-LABEL: _a:			; CHECK-LABEL: _a:
	entry:			entry:
	%b = alloca inalloca %Foo			%b = alloca inalloca %Foo
	; CHECK: movl $8, %eax			; CHECK: pushl %eax
	; CHECK: calll __chkstk			; CHECK: pushl %eax
	%f1 = getelementptr %Foo, %Foo* %b, i32 0, i32 0			%f1 = getelementptr %Foo, %Foo* %b, i32 0, i32 0
	%f2 = getelementptr %Foo, %Foo* %b, i32 0, i32 1			%f2 = getelementptr %Foo, %Foo* %b, i32 0, i32 1
	store i32 13, i32* %f1			store i32 13, i32* %f1
	store i32 42, i32* %f2			store i32 42, i32* %f2
	; CHECK: movl %esp, %eax			; CHECK: movl %esp, %eax
	; CHECK: movl $13, (%eax)			; CHECK: movl $13, (%eax)
	; CHECK: movl $42, 4(%eax)			; CHECK: movl $42, 4(%eax)
	call void @f(%Foo* inalloca %b)			call void @f(%Foo* inalloca %b)
	; CHECK: calll _f			; CHECK: calll _f
	ret void			ret void
	}			}

	declare void @inreg_with_inalloca(i32 inreg %a, %Foo* inalloca %b)			declare void @inreg_with_inalloca(i32 inreg %a, %Foo* inalloca %b)

	define void @b() {			define void @b() {
	; CHECK-LABEL: _b:			; CHECK-LABEL: _b:
	entry:			entry:
	%b = alloca inalloca %Foo			%b = alloca inalloca %Foo
	; CHECK: movl $8, %eax			; CHECK: pushl %eax
	; CHECK: calll __chkstk			; CHECK: pushl %eax
	%f1 = getelementptr %Foo, %Foo* %b, i32 0, i32 0			%f1 = getelementptr %Foo, %Foo* %b, i32 0, i32 0
	%f2 = getelementptr %Foo, %Foo* %b, i32 0, i32 1			%f2 = getelementptr %Foo, %Foo* %b, i32 0, i32 1
	store i32 13, i32* %f1			store i32 13, i32* %f1
	store i32 42, i32* %f2			store i32 42, i32* %f2
	; CHECK: movl %esp, %eax			; CHECK: movl %esp, %eax
	; CHECK: movl $13, (%eax)			; CHECK: movl $13, (%eax)
	; CHECK: movl $42, 4(%eax)			; CHECK: movl $42, 4(%eax)
	call void @inreg_with_inalloca(i32 inreg 1, %Foo* inalloca %b)			call void @inreg_with_inalloca(i32 inreg 1, %Foo* inalloca %b)
	; CHECK: movl $1, %eax			; CHECK: movl $1, %eax
	; CHECK: calll _inreg_with_inalloca			; CHECK: calll _inreg_with_inalloca
	ret void			ret void
	}			}

	declare x86_thiscallcc void @thiscall_with_inalloca(i8* %a, %Foo* inalloca %b)			declare x86_thiscallcc void @thiscall_with_inalloca(i8* %a, %Foo* inalloca %b)

	define void @c() {			define void @c() {
	; CHECK-LABEL: _c:			; CHECK-LABEL: _c:
	entry:			entry:
	%b = alloca inalloca %Foo			%b = alloca inalloca %Foo
	; CHECK: movl $8, %eax			; CHECK: pushl %eax
	; CHECK: calll __chkstk			; CHECK: pushl %eax
	%f1 = getelementptr %Foo, %Foo* %b, i32 0, i32 0			%f1 = getelementptr %Foo, %Foo* %b, i32 0, i32 0
	%f2 = getelementptr %Foo, %Foo* %b, i32 0, i32 1			%f2 = getelementptr %Foo, %Foo* %b, i32 0, i32 1
	store i32 13, i32* %f1			store i32 13, i32* %f1
	store i32 42, i32* %f2			store i32 42, i32* %f2
	; CHECK: movl %esp, %eax			; CHECK: movl %esp, %eax
	; CHECK-DAG: movl $13, (%eax)			; CHECK-DAG: movl $13, (%eax)
	; CHECK-DAG: movl $42, 4(%eax)			; CHECK-DAG: movl $42, 4(%eax)
	call x86_thiscallcc void @thiscall_with_inalloca(i8* null, %Foo* inalloca %b)			call x86_thiscallcc void @thiscall_with_inalloca(i8* null, %Foo* inalloca %b)
	; CHECK-DAG: xorl %ecx, %ecx			; CHECK-DAG: xorl %ecx, %ecx
	; CHECK: calll _thiscall_with_inalloca			; CHECK: calll _thiscall_with_inalloca
	ret void			ret void
	}			}

llvm/trunk/test/CodeGen/X86/shrink-wrap-chkstk.ll

	; RUN: llc < %s -enable-shrink-wrap=true | FileCheck %s			; RUN: llc < %s -enable-shrink-wrap=true | FileCheck %s

	; chkstk cannot come before the usual prologue, since it adjusts ESP.			; chkstk cannot come before the usual prologue, since it adjusts ESP.
	; If chkstk is used in the prologue, we also have to be careful about preserving			; If chkstk is used in the prologue, we also have to be careful about preserving
	; EAX if it is used.			; EAX if it is used.

	target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"			target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"
	target triple = "i686-pc-windows-msvc18.0.0"			target triple = "i686-pc-windows-msvc18.0.0"

	%struct.S = type { [12 x i8] }			%struct.S = type { [8192 x i8] }

	define x86_thiscallcc void @call_inalloca(i1 %x) {			define x86_thiscallcc void @call_inalloca(i1 %x) {
	entry:			entry:
	%argmem = alloca inalloca <{ %struct.S }>, align 4			%argmem = alloca inalloca <{ %struct.S }>, align 4
	%argidx1 = getelementptr inbounds <{ %struct.S }>, <{ %struct.S }>* %argmem, i32 0, i32 0, i32 0, i32 0			%argidx1 = getelementptr inbounds <{ %struct.S }>, <{ %struct.S }>* %argmem, i32 0, i32 0, i32 0, i32 0
	%argidx2 = getelementptr inbounds <{ %struct.S }>, <{ %struct.S }>* %argmem, i32 0, i32 0, i32 0, i32 1			%argidx2 = getelementptr inbounds <{ %struct.S }>, <{ %struct.S }>* %argmem, i32 0, i32 0, i32 0, i32 1
	store i8 42, i8* %argidx2, align 4			store i8 42, i8* %argidx2, align 4
	br i1 %x, label %bb1, label %bb2			br i1 %x, label %bb1, label %bb2

	bb1:			bb1:
	store i8 42, i8* %argidx1, align 4			store i8 42, i8* %argidx1, align 4
	br label %bb2			br label %bb2

	bb2:			bb2:
	call void @inalloca_params(<{ %struct.S }>* inalloca nonnull %argmem)			call void @inalloca_params(<{ %struct.S }>* inalloca nonnull %argmem)
	ret void			ret void
	}			}

	; CHECK-LABEL: _call_inalloca: # @call_inalloca			; CHECK-LABEL: _call_inalloca: # @call_inalloca
	; CHECK: pushl %ebp			; CHECK: pushl %ebp
	; CHECK: movl %esp, %ebp			; CHECK: movl %esp, %ebp
	; CHECK: movl $12, %eax			; CHECK: movl $8192, %eax
	; CHECK: calll __chkstk			; CHECK: calll __chkstk
	; CHECK: calll _inalloca_params			; CHECK: calll _inalloca_params
	; CHECK: movl %ebp, %esp			; CHECK: movl %ebp, %esp
	; CHECK: popl %ebp			; CHECK: popl %ebp
	; CHECK: retl			; CHECK: retl

	declare void @inalloca_params(<{ %struct.S }>* inalloca)			declare void @inalloca_params(<{ %struct.S }>* inalloca)

	Show All 32 Lines

llvm/trunk/test/CodeGen/X86/win-alloca-expander.ll

				; RUN: llc < %s -mtriple=i686-pc-win32 | FileCheck %s

				%struct.S = type { [1024 x i8] }
				%struct.T = type { [3000 x i8] }
				%struct.U = type { [10000 x i8] }

				define void @basics() {
				; CHECK-LABEL: basics:
				entry:
				br label %bb1

				; Allocation move sizes should have been removed.
				; CHECK-NOT: movl $1024
				; CHECK-NOT: movl $3000

				bb1:
				%p0 = alloca %struct.S
				; The allocation is small enough not to require stack probing, but the %esp
				; offset after the prologue is not known, so the stack must be touched before
				; the pointer is adjusted.
				; CHECK: pushl %eax
				; CHECK: subl $1020, %esp

				%saved_stack = tail call i8* @llvm.stacksave()

				%p1 = alloca %struct.S
				; We know the %esp offset from above, so there is no need to touch the stack
				; before adjusting it.
				; CHECK: subl $1024, %esp

				%p2 = alloca %struct.T
				; The offset is now 2048 bytes, so allocating a T must touch the stack again.
				; CHECK: pushl %eax
				; CHECK: subl $2996, %esp

				call void @f(%struct.S* %p0)
				; CHECK: calll

				%p3 = alloca %struct.T
				; The call above touched the stack, so there is room for a T object.
				; CHECK: subl $3000, %esp

				%p4 = alloca %struct.U
				; The U object is large enough to require stack probing.
				; CHECK: movl $10000, %eax
				; CHECK: calll __chkstk

				%p5 = alloca %struct.T
				; The stack probing above touched the tip of the stack, so there's room for a T.
				; CHECK: subl $3000, %esp

				call void @llvm.stackrestore(i8* %saved_stack)
				%p6 = alloca %struct.S
				; The stack restore means we lose track of the stack pointer and must probe.
				; CHECK: pushl %eax
				; CHECK: subl $1020, %esp

				; Use the pointers so they're not optimized away.
				call void @f(%struct.S* %p1)
				call void @g(%struct.T* %p2)
				call void @g(%struct.T* %p3)
				call void @h(%struct.U* %p4)
				call void @g(%struct.T* %p5)
				ret void
				}

				define void @loop() {
				; CHECK-LABEL: loop:
				entry:
				br label %bb1

				bb1:
				%p1 = alloca %struct.S
				; The entry offset is unknown; touch-and-sub.
				; CHECK: pushl %eax
				; CHECK: subl $1020, %esp
				br label %loop1

				loop1:
				%i1 = phi i32 [ 10, %bb1 ], [ %dec1, %loop1 ]
				%p2 = alloca %struct.S
				; We know the incoming offset from bb1, but from the back-edge, we assume the
				; worst, and therefore touch-and-sub to allocate.
				; CHECK: pushl %eax
				; CHECK: subl $1020, %esp
				%dec1 = sub i32 %i1, 1
				%cmp1 = icmp sgt i32 %i1, 0
				br i1 %cmp1, label %loop1, label %end
				; CHECK: decl
				; CHECK: jg

				end:
				call void @f(%struct.S* %p1)
				call void @f(%struct.S* %p2)
				ret void
				}

				define void @probe_size_attribute() "stack-probe-size"="512" {
				; CHECK-LABEL: probe_size_attribute:
				entry:
				br label %bb1

				bb1:
				%p0 = alloca %struct.S
				; The allocation would be small enough not to require probing, if it wasn't
				; for the stack-probe-size attribute.
				; CHECK: movl $1024, %eax
				; CHECK: calll __chkstk
				call void @f(%struct.S* %p0)
				ret void
				}

				define void @cfg(i1 %x, i1 %y) {
				; Test that the blocks are analyzed in the correct order.
				; CHECK-LABEL: cfg:
				entry:
				br i1 %x, label %bb1, label %bb2

				bb1:
				%p1 = alloca %struct.S
				; CHECK: pushl %eax
				; CHECK: subl $1020, %esp
				br label %bb3
				bb2:
				%p2 = alloca %struct.T
				; CHECK: pushl %eax
				; CHECK: subl $2996, %esp
				br label %bb3

				bb3:
				br i1 %y, label %bb4, label %bb5

				bb4:
				%p4 = alloca %struct.S
				; CHECK: subl $1024, %esp
				call void @f(%struct.S* %p4)
				ret void

				bb5:
				%p5 = alloca %struct.T
				; CHECK: pushl %eax
				; CHECK: subl $2996, %esp
				call void @g(%struct.T* %p5)
				ret void
				}


				declare void @f(%struct.S*)
				declare void @g(%struct.T*)
				declare void @h(%struct.U*)

				declare i8* @llvm.stacksave()
				declare void @llvm.stackrestore(i8*)
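
The lowerings expected by the @basics test can be reproduced with a small simulation of the offset tracking. This is a simplified sketch under stated assumptions, not code from the patch: Kind, Event, simulate and basicsTrace are hypothetical names, a call is modelled only as "touches the tip of the stack" (the ADJCALLSTACK adjustments are ignored), and llvm.stackrestore is modelled as an untracked write to the stack pointer, as the test comment suggests.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical event model; not LLVM API.
enum class Kind { Alloca, Call, UnknownSPWrite };

struct Event {
  Kind kind;
  int64_t amount; // Only meaningful for Kind::Alloca.
};

// Stand-in for the INT32_MAX "offset unknown" sentinel.
const int64_t kUnknown = INT32_MAX;

// Walk a straight-line sequence of events and record the lowering chosen
// for each alloca, updating the tracked offset the way the pass does.
std::vector<std::string> simulate(const std::vector<Event> &events,
                                  int64_t probeSize) {
  std::vector<std::string> lowerings;
  int64_t offset = kUnknown; // The offset after the prologue is not known.
  for (const Event &e : events) {
    switch (e.kind) {
    case Kind::Alloca:
      if (e.amount < 0 || e.amount > probeSize) {
        lowerings.push_back("Probe");
        offset = 0; // The probe touched the new tip of the stack.
      } else if (offset + e.amount <= probeSize) {
        lowerings.push_back("Sub");
        offset += e.amount;
      } else {
        lowerings.push_back("TouchAndSub");
        offset = e.amount; // The old tip was touched before the subtraction.
      }
      break;
    case Kind::Call:
      offset = 0; // Calls touch the tip of the stack.
      break;
    case Kind::UnknownSPWrite:
      offset = kUnknown; // Lost track of the stack pointer.
      break;
    }
  }
  return lowerings;
}

// The alloca, call and stackrestore sequence of @basics, in program order.
std::vector<Event> basicsTrace() {
  return {{Kind::Alloca, 1024},  {Kind::Alloca, 1024},
          {Kind::Alloca, 3000},  {Kind::Call, 0},
          {Kind::Alloca, 3000},  {Kind::Alloca, 10000},
          {Kind::Alloca, 3000},  {Kind::UnknownSPWrite, 0},
          {Kind::Alloca, 1024}};
}
```

Calling simulate(basicsTrace(), 4096) yields one entry per alloca, which can be compared against the CHECK lines in @basics: a "TouchAndSub" corresponds to the pushl %eax followed by subl, a "Sub" to a lone subl, and a "Probe" to the movl into %eax followed by calll __chkstk.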

Diff 57514
