This is an archive of the discontinued LLVM Phabricator instance.

Differential D6789

[X86] Convert esp-relative movs of function arguments to pushes, step 2
ClosedPublic

Authored by mkuper on Dec 28 2014, 6:11 AM.

Details

Reviewers

nadav
delena
rnk

Commits

rG13fbd4526336: [X86] Convert esp-relative movs of function arguments to pushes, step 2
rGbd57186c763f: [X86] Convert esp-relative movs of function arguments to pushes, step 2
rL227752: [X86] Convert esp-relative movs of function arguments to pushes, step 2
rL227728: [X86] Convert esp-relative movs of function arguments to pushes, step 2

Summary

This is a first stab at the next step of the mov-to-push transformation.

It moves the transformation earlier in the pass order so that it can do load-folding, and prepares the required infrastructure.
It is still enabled only in cases where it should be a clear win - when we don't expect to have a reserved call frame, or when optimizing for size.
The next step will be a heuristic that makes a smarter decision on when this should be enabled.

As a side note: I've done some internal testing of the effect on code size, but I'd also like to test on code that other people care about. So, if you have a publicly available x86-32 code base where code size matters, let me know.

Event Timeline

mkuper updated this revision to Diff 17656. Dec 28 2014, 6:11 AM

mkuper retitled this revision to [X86] Convert esp-relative movs of function arguments to pushes, step 2.

mkuper updated this object.

mkuper edited the test plan for this revision.

mkuper added reviewers: nadav, rnk, delena.

mkuper added a subscriber: Unknown Object (MLST).

Removed a horrible hack that was, in addition to being horrible, completely wrong, and added a test-case to cover the issue.

Also, ping?

I suggest also checking varargs and stdcall functions, where the callee clears the stack.

lib/Target/X86/CMakeLists.txt
17	Can you add this code to X86FrameLowering.cpp ?
lib/Target/X86/X86.h
70	I suggest to choose another name, something like optimizeCallFrameForSize
lib/Target/X86/X86ConvertMovsToPushes.cpp
40	I don't think that we really need this knob.
114	If you change instructions inside bb, your iterator may be broken.
213	It should be immediate, right? Can we have a relocation here?
225	SlowPush should be a property of the target, like slowLea
227	The comment is missing here.

Thanks, Elena!
Will upload a new version.

lib/Target/X86/X86.h
70	I wasn't happy with the name either, but didn't have any good ideas at the time. Will do.
lib/Target/X86/X86ConvertMovsToPushes.cpp
40	I'd rather keep this knob, it's fairly useful for debugging. Of course, it's internal only, not exposed to clang.
114	As far as I know, MBB iterators aren't invalidated by removing other instructions, and we don't remove the FrameSetup itself. But it's probably better to keep going from the FrameDestroy instead of the next instruction. Will change that.
213	It can be a relocation, but in that case, isImm() will fail. Will document that more clearly.
225	I agree. Unfortunately, I've run out of bits. The Subtarget features are a 64-bit bitfield, and they're all taken.
lib/Target/X86/X86RegisterInfo.cpp
508	And, apparently, this is still wrong, because eliminateCallFramePseudoInstr() may actually adjust the SP by a different amount than what PEI passes as the SPAdj, e.g. due to stack alignment concerns.
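
To illustrate the iterator point in the reply on line 114: a MachineBasicBlock keeps its instructions in a linked list, and linked-list iterators stay valid when other elements are erased. The sketch below uses std::list<std::string> as a stand-in for the instruction list. The opcode strings and the helper name eraseArgStores are made up for the example and are not LLVM API. It removes the argument movs and then resumes from the frame-destroy marker, as the reply proposes.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

using Block = std::list<std::string>; // stand-in for a MachineBasicBlock

// Erases every "MOV..." entry between FrameSetup and the next
// "ADJCALLSTACKUP" marker, and returns an iterator to that marker
// (or end() if there is none). FrameSetup must be dereferenceable.
// FrameSetup itself is never erased, so it stays valid throughout.
Block::iterator eraseArgStores(Block &BB, Block::iterator FrameSetup) {
  Block::iterator I = std::next(FrameSetup);
  while (I != BB.end() && *I != "ADJCALLSTACKUP") {
    if (I->rfind("MOV", 0) == 0)
      I = BB.erase(I); // erase() returns the iterator after the erased node
    else
      ++I;
  }
  return I;
}
```

The caller can keep iterating from the returned iterator rather than from the instruction after FrameSetup, which avoids revisiting the sequence that was just rewritten.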

mkuper added inline comments. Jan 6 2015, 3:19 AM

lib/Target/X86/X86ConvertMovsToPushes.cpp
286	Argh. This is nonsense. Commented one thing, coded another... (mayStore() is extremely far from being a strong enough condition to allow this.)

rnk added inline comments. Jan 12 2015, 4:02 PM

lib/Target/X86/X86ConvertMovsToPushes.cpp
40	I would also like this as a temporary testing knob so that I can evaluate this across a large codebase.

So, this version should actually work (e.g. it can self-host and pass check-llvm, without the stackalign restriction of course, since that currently makes it a nop except on Windows).
Unfortunately, it has several big warts, so I'm not planning to commit it as is. This is more of a request for ideas on how to improve the code.

So, any ideas on how to make this sane, especially X86InstrInfo::getSPAdjust(), are welcome.

I haven't finished reviewing yet, but I've got to run and handle something personal.

At a high level, is there any reason we shouldn't commit to push/pop earlier to allow for better ISel, rather than trying to transform call sequences later? Specifically, I'm thinking about adding an X86ISD::PUSH DAG node and changing X86TargetLowering::LowerCall() to use it.

lib/CodeGen/PrologEpilogInserter.cpp
855–856	This seems like an x86-specific quirk, right? Given "push [esp + 8]", x86 chips will load [esp + 8] before adjusting esp, and I think this code motion accomplishes that. I'm OK with that motion so long as there are no other upstream LLVM backends with CISC-y instructions like "push [SP-mem]". :)
lib/Target/X86/X86ConvertMovsToPushes.cpp
11	s/stck/stack/
82–83	I think it's important to at least support __thiscall eventually, since that's a very common convention with one regparm.
84–86	I guess I would justify this more in terms of reducing the extra CFI that we would have to emit to describe the SP adjustments. Converting a few movs to pushes isn't worth the complexity.
143	Can you explain why this is unprofitable? I guess if we get here we are in dynamic alloca plus stack realignment land, i.e. the worst thing that could possibly happen. Is this about extra code for preserving the outgoing stack alignment then? Like on Linux, where we provide 16-byte stack alignment?

Thanks, Reid!

Waiting for the second part, you didn't get to the really horrible stuff yet...

Regarding the high level, two reasons:

It seemed like it was going to be simpler. I'm not so sure anymore, but I still think it is. (Note that we'll still need to fix all of the code that tracks SP adjustment, that's not going away in either case).
The main problem is that next step after this is going to be a function-scope heuristic. To use this transformation for even one call-site, I have to disable the reserved frame for the whole function. So, I need to try to approximate the impact on the whole function (which contains some calls that will be converted to use pushes, and some calls that won't be). I don't see how this can be done on the DAG level.

lib/CodeGen/PrologEpilogInserter.cpp
855–856	This call to SPAdjust() always returns 0 right now (barring the code in this patch), it was added as part of my refactoring in D6863, and I added it in the wrong place. The motivation here wasn't a push, actually, since I try to never generate push [esp + 8], that's filtered out by the code in the optimization. Although I can probably start generating them - I was trying to filter them out precisely because I didn't want all of this complexity at the first stage, but apparently it's necessary. The problem is that once we don't have a reserved call frame (regardless of the push transformation), you can have things like CALL32r <fi#1>, where the call is callee-pop. So you need to resolve the indirect call using the stack-pointer from before the call.
lib/Target/X86/X86ConvertMovsToPushes.cpp
82–83	Yes, and maybe even for _fastcall (It looks like gcc will do this for fastcall, icc won't). But I am still trying to do this gradually, to the extent that I can. :-)
84–86	You're right, that too.
143	If we get here, we're in opt-for-size + stack-realignment land. And, yes, that's exactly what it is about. If you are passing only one parameter, the original code would be:
	mov %eax, 128(%esp)
	call $foo
Without re-alignment, you have:
	push %eax
	call $foo
	add $4, %esp
which is still a win in terms of code size. With re-alignment, you get:
	sub $16, %esp
	push %eax
	call $foo
	add $12, %esp
which is... questionable. The code size for the sequence is the same (in this case, 7 bytes for both, not including the call), but if you have other call sites which you didn't convert, you may actually lose. And, of course, you lose performance (3 instructions instead of 1) without anything to show for it. Once there is a heuristic that tries to estimate the overhead, we can address this on a case-by-case basis (e.g. if we have 16-byte stack re-alignment, but most call-sites have a lot of parameters, then it's still worth it).
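
The byte counts in the reply above can be checked with a small tally. This is only a sketch: the constants are the usual 32-bit x86 encoding sizes for the three instruction forms in the example, and the function names are hypothetical.

```cpp
#include <cassert>

// Encoded sizes in bytes (32-bit mode), excluding the call itself.
// mov %eax, 128(%esp): opcode + ModRM + SIB + disp32, since 128 does not
// fit in a signed 8-bit displacement.
constexpr unsigned MovRegToEspDisp32Bytes = 7;
// push %eax: single-byte opcode.
constexpr unsigned PushRegBytes = 1;
// add/sub $imm8, %esp: opcode + ModRM + imm8.
constexpr unsigned AdjustEspImm8Bytes = 3;

// Size of the original mov-based sequence for one argument.
constexpr unsigned movSequenceBytes() { return MovRegToEspDisp32Bytes; }

// Size of the push-based sequence for one argument. With re-alignment
// there is an extra sub before the push in addition to the add after
// the call.
constexpr unsigned pushSequenceBytes(bool NeedsRealign) {
  return (NeedsRealign ? AdjustEspImm8Bytes : 0) + PushRegBytes +
         AdjustEspImm8Bytes;
}

static_assert(pushSequenceBytes(false) < movSequenceBytes(),
              "push is smaller without re-alignment");
static_assert(pushSequenceBytes(true) == movSequenceBytes(),
              "with re-alignment the two sequences tie");
```

With a smaller displacement the mov shrinks (disp8 instead of disp32), so the tie in the re-aligned case can turn into a loss; the numbers here only cover the 128(%esp) example.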

rnk added inline comments. Jan 13 2015, 3:58 PM

lib/Target/X86/X86ConvertMovsToPushes.cpp
128–130	I think I misinterpreted this on the first pass. We always expect this to be profitable if we know we can't reserve space for the call frame. Maybe rename the bool to CannotReserveFrame to match the sense?
143	Based on my misinterpretation, I think I understand why you get this code. SP is assumed to be aligned coming into the sequence. We realign SP after dynamic allocas. The sequence is probably more like:
	sub $12, %esp
	push %eax
	call $foo
	add $16, %esp
I can see why this is less profitable.
208	std::map is really malloc heavy. This can probably be a SmallVector<MachineInstr*, 8> or something, mapping slot index to the MI that fills it. The frame setup opcode should tell you how much stack space to allocate up front, and you can index into the vector by StackOffset / 4.
220–222	This seems worth tackling, given that you had to handle the `call <fi>` case. :)
364–368	It's not clear to me that same BB is sufficient, consider this potential BB:
	movl (%edi), %eax
	movl $42, (%edi)
	<call setup>
	movl %eax, (%esp)
	calll foo
	<call end>
We can't move the load if there is a potentially aliasing store in the way. There might be a utility to help with the aliasing query, or you can assume that any stores other than arg stores might alias it and bail on that.
lib/Target/X86/X86InstrInfo.cpp
1717–1718	This is the best thing I can think of at the moment. =/
test/CodeGen/X86/movtopush.ll
206	Test case suggestions:
	; Where the callee is indirect via the stack, `call <fi>`
	define void @test10() optsize {
	  %stack_fptr = alloca void (i32, i32, i32, i32)*
	  store void (i32, i32, i32, i32)* @good, void (i32, i32, i32, i32)** %stack_fptr
	  %good_ptr = load void (i32, i32, i32, i32)** %stack_fptr
	  call void (i32, i32, i32, i32)* %good_ptr(i32 1, i32 2, i32 3, i32 4)
	  ret void
	}
	; We can't fold the load into the push here, skipping the store.
	@the_global = global i32
	define void @test11() optsize {
	  %myload = load i32* @the_global
	  store i32 42, i32* @the_global
	  call void @good(i32 %myload, i32 2, i32 3, i32 4)
	  ret void
	}
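
The slot-vector idea from the comment on line 208 above can be sketched as follows. This is an illustration only: std::vector stands in for SmallVector, ArgStore is a hypothetical stand-in for the MachineInstr that stores an argument, and the failure conditions are assumptions about what a pass would want to reject.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a MachineInstr that stores one argument.
struct ArgStore {
  unsigned StackOffset; // esp-relative offset in bytes
  int Id;               // only used to tell stores apart in the example
};

// Fills Slots so that Slots[k] points at the store writing offset 4 * k.
// FrameSize is the stack space announced by the frame setup instruction.
// Returns false if a store is misaligned or out of range, if two stores
// hit the same slot, or if some slot is never written.
bool mapStoresToSlots(const std::vector<ArgStore> &Stores, unsigned FrameSize,
                      std::vector<const ArgStore *> &Slots) {
  Slots.assign(FrameSize / 4, nullptr);
  for (const ArgStore &S : Stores) {
    if (S.StackOffset % 4 != 0)
      return false;
    std::size_t Idx = S.StackOffset / 4;
    if (Idx >= Slots.size() || Slots[Idx])
      return false;
    Slots[Idx] = &S;
  }
  for (const ArgStore *S : Slots)
    if (!S)
      return false;
  return true;
}
```

Because the stack grows down, a conversion to pushes would walk Slots from the last entry to the first, so that the argument at offset 0 is pushed last.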

Thanks, Reid!

lib/Target/X86/X86ConvertMovsToPushes.cpp
128–130	Err, yes, you're right, sorry about that... got distracted while naming the variable, I guess, I meant the opposite. Thanks!
143	Yes, that sequence. :-) It doesn't depend on dynamic allocas, though. If you don't have a reserved frame (for whatever reason - for x86 after this patch, it's either dynamic allocas, or because we forced it not to reserve by using pushes), then you need this re-alignment.
208	That can work. Thanks, I'll try.
220–222	Yes, definitely. :-) It may even work out of the box now. But I think I still want to split it into a separate commit.
364–368	Right now I'm way more conservative than even that - I'm checking below that everything between this mov and the call setup is a MOV32rm. The "same basic block" check here is just a way to short-circuit the obviously wrong cases. This catches some common cases like the one in the comment above, but of course misses other opportunities. I could check for a mayStore() instead, but I'm not sure that's safe enough. I'd like to relax the condition - but again, I think that ought to be a separate commit.
lib/Target/X86/X86InstrInfo.cpp
1717–1718	Too bad. :-\ So you think I should commit with this code as is? This shouldn't be a huge problem in terms of compile-time (since I'm looking only until the next call, it can't go quadratic), but it's insanely ugly.
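
The conservative folding condition described in the reply on lines 364–368 above (everything between the load and the call setup must itself be a plain load) can be sketched on a toy instruction list. The Opcode enum and the function name are invented for the example; the real pass works on MachineInstrs and checks for MOV32rm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy opcodes; Load plays the role of MOV32rm.
enum class Opcode { Load, Store, Other, CallFrameSetup };

// Returns true if the load at LoadIdx may be sunk to the call setup at
// SetupIdx under the conservative rule: every instruction strictly
// between them is also a load, so nothing in between can write memory.
bool canSinkLoadToCallSetup(const std::vector<Opcode> &Block,
                            std::size_t LoadIdx, std::size_t SetupIdx) {
  if (LoadIdx >= SetupIdx || SetupIdx >= Block.size())
    return false;
  if (Block[LoadIdx] != Opcode::Load ||
      Block[SetupIdx] != Opcode::CallFrameSetup)
    return false;
  for (std::size_t I = LoadIdx + 1; I < SetupIdx; ++I)
    if (Block[I] != Opcode::Load)
      return false;
  return true;
}
```

This rejects the aliasing-store block from the review comment, at the cost of also rejecting harmless intervening instructions such as register-only arithmetic.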

Applied review comments
Fixed another bug in the way PEI was handling push sequences (argh) - this required adding a target query.
Made the tests check a bit more (which would have exposed the bug above earlier).

rnk added inline comments. Jan 15 2015, 10:19 AM

lib/Target/X86/X86InstrInfo.cpp
1717–1718	Yeah, if we go with this MI pass approach to mov -> push conversion, then we'll have to keep this ADJCALLSTACKUP scan. We aren't going to move the callee cleanup stack adjustment onto the CALL instr without major changes.
1745	I wonder if it's possible for __readeflags() (pushf ; pop %reg) or others to get folded into a call sequence. Probably not.

mkuper added inline comments. Jan 16 2015, 5:57 AM

lib/Target/X86/X86InstrInfo.cpp
1717–1718	This will have to happen regardless of the MI pass vs. DAG approach. I mean, I still think doing it on the DAG is unfeasible, but even if we could do that, it wouldn't help. This code is used for the case where fi resolution needs to handle a sequence where there is a fi reference between the call and the adjcallstackup, with callee cleanup for the call. This is just a side effect of making canSimplifyCallFramePseudos return false.
1745	I don't see how it could happen. In any case, we won't match either the pushf or the pop, so it should be ok.
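
The ADJCALLSTACKUP scan discussed on lines 1717–1718 above can be pictured with a toy model. This is a sketch, not the code in X86InstrInfo.cpp: the Instr struct, the field names, and the sign convention (positive means the instruction grows the stack) are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { Push, Call, AdjCallStackUp, Other };

// Toy instruction. CalleePopBytes is only meaningful for AdjCallStackUp,
// where it records how many bytes the callee removed from the stack.
struct Instr {
  Op Opcode;
  int CalleePopBytes;
};

// Returns the SP adjustment made by Block[Idx] (Idx must be in range).
// For a call, the callee-pop amount is not stored on the call itself, so
// scan forward for the ADJCALLSTACKUP that follows it. The scan stops at
// the next call, which keeps the total work linear in the block size.
int getSPAdjust(const std::vector<Instr> &Block, std::size_t Idx) {
  const Instr &MI = Block[Idx];
  if (MI.Opcode == Op::Push)
    return 4;
  if (MI.Opcode != Op::Call)
    return 0;
  for (std::size_t I = Idx + 1; I < Block.size(); ++I) {
    if (Block[I].Opcode == Op::AdjCallStackUp)
      return -Block[I].CalleePopBytes;
    if (Block[I].Opcode == Op::Call)
      break;
  }
  return 0;
}
```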

lgtm

I still think forming pushes prior to isel is the way to go long term. It's a lot easier to convert pushes to 'load, SP adjust, store' than it is to go the other way.

include/llvm/Target/TargetFrameLowering.h
196 ↗	(On Diff #18222)	"- Do" uppercase
lib/Target/X86/X86ConvertMovsToPushes.cpp
101	Can this be `for (MachineBasicBlock &BB : *MF) {`?
103	Ditto, `for (MachineInstr &MI : BB) {` ?
lib/Target/X86/X86InstrInfo.cpp
1713–1725	I would shorten this to just something like "look for the ADJCALLSTACKUP instr that follows the call".
1717–1718	I was imagining in the DAG LowerCall implementation we emit FrameIndex operands with some kind of SP offset to indicate the current stack level. We'd end up with MI looking like this:
	ADJCALLSTACKDOWN32 <N> ; N is <size-of-args> % <stack-alignment>, which is usually zero
	PUSH32rmm <fi> <sp offset, N>
	PUSH32rmm <fi> <sp offset, N + 4>
	PUSH32rmm <fi> <sp offset, N + 8>
	CALL32rm <fi> <sp offset, N + 12>
	ADJCALLSTACKUP32 <N + 12>
The main thing is that if we commit to pushes instead of movs at DAG time, it's impossible for the push conversion to fail for hard to diagnose reasons. It looks like the frame index MachineOperand type has an unused offset field.

This revision is now accepted and ready to land. Jan 22 2015, 12:57 PM

Hi Chandler,

This is something that Reid and I talked about on IRC, but I don’t think we came to a conclusion both of us were happy with (hence Reid’s “lgtm with reservations”, I guess :-) )

First, I don’t think the decision on whether to use movs or pushes belongs in the DAG.
The decision on whether a call-site should use movs or pushes needs to be aware of its context, because having even one call-site use pushes means we will not have a reserved call frame, which affects the way all other call sites are treated as well. This patch makes the decision based on global attributes only (opt for size vs. speed, stack alignment), but the next step will be to make it based on an analysis of the call-sites – e.g. even with stack alignment of 16, it can still often be a win, depending on just how many of the function calls we can actually transform, and how many memory arguments each call has.

So the way I envision the next step is that the pass will:

a) Collect the necessary information from all call sites in the function.

b) Make a judgment on whether the transformation is worth it – in terms of size for Os/Oz, in terms of performance for other opt levels.

c) Perform the transformation.
I don’t see how we can do this on the DAG.
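
Steps (a) and (b) above could look roughly like the following size-only estimate. It is a guess at the shape of a future heuristic, not something taken from the patch: the CallSite struct, the byte constants, and the caller-cleanup assumption are all hypothetical, and a real heuristic would also need a performance model for non-size optimization levels.

```cpp
#include <cassert>
#include <vector>

// Step (a): what we collect per call site (hypothetical).
struct CallSite {
  unsigned NumStackArgs; // number of 4-byte stack arguments
  bool Convertible;      // can this site's movs be turned into pushes?
};

// Assumed rough encoding sizes, in bytes.
constexpr unsigned MovToStackBytes = 4; // mov %reg, disp8(%esp)
constexpr unsigned PushBytes = 1;       // push %reg
constexpr unsigned AdjustEspBytes = 3;  // add/sub $imm8, %esp

// Bytes of padding needed so the outgoing stack stays StackAlign-aligned.
// StackAlign must be non-zero.
unsigned alignmentPadding(unsigned ArgBytes, unsigned StackAlign) {
  return (StackAlign - ArgBytes % StackAlign) % StackAlign;
}

// Argument-passing bytes when the call frame is reserved in the prologue.
unsigned sizeWithReservedFrame(const std::vector<CallSite> &Sites) {
  unsigned Bytes = 0;
  for (const CallSite &S : Sites)
    Bytes += S.NumStackArgs * MovToStackBytes;
  return Bytes;
}

// Argument-passing bytes when the frame is not reserved. Converted sites
// use pushes; the others keep their movs but now pay for an explicit
// sub/add pair around the call. Caller cleanup is assumed everywhere.
unsigned sizeWithPushes(const std::vector<CallSite> &Sites,
                        unsigned StackAlign) {
  unsigned Bytes = 0;
  for (const CallSite &S : Sites) {
    if (S.NumStackArgs == 0)
      continue;
    if (S.Convertible) {
      unsigned Pad = alignmentPadding(S.NumStackArgs * 4, StackAlign);
      Bytes += (Pad ? AdjustEspBytes : 0) + S.NumStackArgs * PushBytes +
               AdjustEspBytes;
    } else {
      Bytes += 2 * AdjustEspBytes + S.NumStackArgs * MovToStackBytes;
    }
  }
  return Bytes;
}

// Step (b): the whole-function judgment, for size only.
bool shouldUsePushes(const std::vector<CallSite> &Sites, unsigned StackAlign) {
  return sizeWithPushes(Sites, StackAlign) < sizeWithReservedFrame(Sites);
}
```

The point of the sketch is that the decision is a property of the whole function: one convertible site with four arguments wins on its own, but the same site next to two non-convertible ones can lose, because those sites start paying for explicit stack adjustments.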

If I understand Reid’s last suggestion, he proposed to flip the default – that is, emit pushes in the DAG, and have an MI pass that does the opposite (push -> mov) transformation if necessary.

I don’t believe that removes a lot of complexity or would improve performance.
The code in PEI, InstrInfo and FrameLowering is just a side effect of not being able to rely on a 0 SPAdj in PEI anymore (that is, canSimplifyCallFramePseudos() can now return false), and is needed regardless of how the transformation is performed. And we will still need the heuristic decision.
Some of the logic in looking for sequences where the conversion is possible will disappear, but I think a lot of it will remain as conditions on the incoming operand DAG nodes. And since we don’t want to transform each push into an “adjust esp, mov” but rather want to group all the esp adjustments back into the ADJCALLSTACKs, we will still need to have code in the pass that makes sure this is safe w.r.t. the final sequence.
The main benefit I see is that we will no longer need to have the folding code – rather, we will have to unfold PUSH32rmm, which is simpler. However, I hope I can eventually get rid of the folding here by teaching PeepholeOptimizer to be smarter about this.

On the other hand, X86TargetLowering::LowerCall() is already, IMHO, a fairly complex piece of code, and I’d rather avoid making it even more complex.
Conceptually, I’d prefer that LowerCall() did standard mov-based lowering in all cases like it does now (we aren’t always going to lower to pushes anyway – it doesn’t really make sense for x86-64) and treat pushes as an optimization where available.

What do you think?

Michael

From: Chandler Carruth [mailto:chandlerc@google.com]
Sent: Thursday, January 22, 2015 23:07
To: reviews+D6789+public+a4ec4af5a5133e84@reviews.llvm.org
Cc: Kuperstein, Michael M; Nadav Rotem; Demikhovsky, Elena; Commit Messages and Patches for LLVM
Subject: Re: [PATCH] [X86] Convert esp-relative movs of function arguments to pushes, step 2

Closed by commit rL227728: [X86] Convert esp-relative movs of function arguments to pushes, step 2 (authored by mkuper). Feb 1 2015, 3:46 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/PrologEpilogInserter.cpp (revision 225780)	17 lines
lib/Target/X86/CMakeLists.txt (revision 225780)	1 line
lib/Target/X86/X86.h (revision 225780)	5 lines
lib/Target/X86/X86ConvertMovsToPushes.cpp (revision 0)	386 lines
lib/Target/X86/X86FastISel.cpp (revision 225780)	2 lines
lib/Target/X86/X86FrameLowering.h (revision 225780)	1 line
lib/Target/X86/X86FrameLowering.cpp (revision 225780)	154 lines
lib/Target/X86/X86InstrCompiler.td (revision 225780)	14 lines
lib/Target/X86/X86InstrInfo.h (revision 225780)	5 lines
lib/Target/X86/X86InstrInfo.cpp (revision 225780)	62 lines
lib/Target/X86/X86MachineFunctionInfo.h (revision 225780)	12 lines
lib/Target/X86/X86RegisterInfo.cpp (revision 225780)	5 lines
lib/Target/X86/X86TargetMachine.cpp (revision 225780)	5 lines
test/CodeGen/X86/inalloca-invoke.ll (revision 225780)	2 lines
test/CodeGen/X86/movtopush.ll (revision 225780)	126 lines

Diff 18084

lib/CodeGen/PrologEpilogInserter.cpp

[763 lines hidden; hunk context: if (I->getOpcode() == FrameSetupOpcode ||]
         // Visit the instructions created by eliminateCallFramePseudoInstr().
         if (PrevI == BB->end())
           I = BB->begin();     // The replaced instr was the first in the block.
         else
           I = std::next(PrevI);
         continue;
       }
 
-      // If we are looking at a call sequence, we need to keep track of
-      // the SP adjustment made by each instruction in the sequence.
-      // This includes both the frame setup/destroy pseudos (handled above),
-      // as well as other instructions that have side effects w.r.t the SP.
-      if (InsideCallSequence)
-        SPAdj += TII.getSPAdjust(I);
-
       MachineInstr *MI = I;
       bool DoIncr = true;
       for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
         if (!MI->getOperand(i).isFI())
           continue;
 
         // Frame indicies in debug values are encoded in a target independent
         // way with simply the frame index and offset rather than any
[62 lines hidden; hunk context: for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {]
           I = BB->begin();
           DoIncr = false;
         }
 
         MI = nullptr;
         break;
       }
 
+      // If we are looking at a call sequence, we need to keep track of
+      // the SP adjustment made by each instruction in the sequence.
+      // This includes both the frame setup/destroy pseudos (handled above),
+      // as well as other instructions that have side effects w.r.t the SP.
+      // Note that this must come after eliminateFrameIndex, because
+      // if I itself referred to a frame index, we shouldn't count its own
+      // adjustment.
rnk: This seems like an x86-specific quirk, right? Given "push [esp + 8]", x86 chips will load [esp + 8] before adjusting esp, and I think this code motion accomplishes that. I'm OK with that motion so long as there are no other upstream LLVM backends with CISC-y instructions like "push [SP-mem]". :)
mkuper: This call to SPAdjust() always returns 0 right now (barring the code in this patch), it was added as part of my refactoring in D6863, and I added it in the wrong place. The motivation here wasn't a push, actually, since I try to never generate push [esp + 8], that's filtered out by the code in the optimization. Although I can probably start generating them - I was trying to filter them out precisely because I didn't want all of this complexity at the first stage, but apparently it's necessary. The problem is that once we don't have a reserved call frame (regardless of the push transformation), you can have things like CALL32r <fi#1>, where the call is callee-pop. So you need to resolve the indirect call using the stack-pointer from before the call.
+      if (MI && InsideCallSequence)
+        SPAdj += TII.getSPAdjust(MI);
+
       if (DoIncr && I != BB->end()) ++I;
 
       // Update register states.
       if (RS && !FrameIndexVirtualScavenging && MI) RS->forward(MI);
     }
   }
 
 /// scavengeFrameVirtualRegs - Replace all frame index virtual registers
[93 lines hidden]

lib/Target/X86/CMakeLists.txt

 set(LLVM_TARGET_DEFINITIONS X86.td)
 
 tablegen(LLVM X86GenRegisterInfo.inc -gen-register-info)
 tablegen(LLVM X86GenDisassemblerTables.inc -gen-disassembler)
 tablegen(LLVM X86GenInstrInfo.inc -gen-instr-info)
 tablegen(LLVM X86GenAsmWriter.inc -gen-asm-writer)
 tablegen(LLVM X86GenAsmWriter1.inc -gen-asm-writer -asmwriternum=1)
 tablegen(LLVM X86GenAsmMatcher.inc -gen-asm-matcher)
 tablegen(LLVM X86GenDAGISel.inc -gen-dag-isel)
 tablegen(LLVM X86GenFastISel.inc -gen-fast-isel)
 tablegen(LLVM X86GenCallingConv.inc -gen-callingconv)
 tablegen(LLVM X86GenSubtargetInfo.inc -gen-subtarget)
 add_public_tablegen_target(X86CommonTableGen)
 
 set(sources
   X86AsmPrinter.cpp
+  X86ConvertMovsToPushes.cpp
delena: Can you add this code to X86FrameLowering.cpp ?
   X86FastISel.cpp
   X86FloatingPoint.cpp
   X86FrameLowering.cpp
   X86ISelDAGToDAG.cpp
   X86ISelLowering.cpp
   X86InstrInfo.cpp
   X86MCInstLower.cpp
   X86MachineFunctionInfo.cpp
[29 lines hidden]

lib/Target/X86/X86.h

[61 lines hidden]
 /// with NOOPs. This will prevent a stall when returning on the Atom.
 FunctionPass *createX86PadShortFunctions();
 /// createX86FixupLEAs - Return a a pass that selectively replaces
 /// certain instructions (like add, sub, inc, dec, some shifts,
 /// and some multiplies) by equivalent LEA instructions, in order
 /// to eliminate execution delays in some Atom processors.
 FunctionPass *createX86FixupLEAs();
 
+/// createX86CallFrameOptimization - Return a pass that optimizes
delena: I suggest to choose another name, something like optimizeCallFrameForSize
mkuper: I wasn't happy with the name either, but didn't have any good ideas at the time. Will do.
+/// the code-size of x86 call sequences. This is done by replacing
+/// esp-relative movs with pushes.
+FunctionPass *createX86CallFrameOptimization();
+
 } // End llvm namespace
 
 #endif

lib/Target/X86/X86ConvertMovsToPushes.cpp

//===----- X86CallFrameOptimization.cpp - Optimize x86 call sequences -----===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file defines a pass that optimizes call sequences on x86.
// Currently, it converts movs of function parameters onto the stck into
rnk: s/stck/stack/
// pushes. This is beneficial for two main reasons:
// 1) The push instruction encoding is much smaller than an esp-relative mov
// 2) It is possible to push memory arguments directly. So, if the
//    the transformation is preformed pre-reg-alloc, it can help relieve
//    register pressure.
//
//===----------------------------------------------------------------------===//

#include <algorithm>

#include "X86.h"
#include "X86InstrInfo.h"
#include "X86Subtarget.h"
#include "X86MachineFunctionInfo.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"

using namespace llvm;

#define DEBUG_TYPE "x86-cf-opt"

cl::opt<bool> NoX86CFOpt("no-x86-call-frame-opt",
delena: I don't think that we really need this knob.
mkuper: I'd rather keep this knob, it's fairly useful for debugging. Of course, it's internal only, not exposed to clang.
rnk: I would also like this as a temporary testing knob so that I can evaluate this across a large codebase.
              cl::desc("Avoid optimizing x86 call frames for size"),
              cl::init(false), cl::Hidden);

namespace {
class X86CallFrameOptimization : public MachineFunctionPass {
public:
  X86CallFrameOptimization() : MachineFunctionPass(ID) {}

  bool runOnMachineFunction(MachineFunction &MF) override;

private:
  bool shouldPerformTransformation(MachineFunction &MF);

  bool adjustCallSequence(MachineFunction &MF, MachineBasicBlock &MBB,
                          MachineBasicBlock::iterator I);

  MachineInstr *canFoldIntoRegPush(MachineBasicBlock::iterator FrameSetup,
                                   unsigned Reg);

  const char *getPassName() const override {
    return "X86 Optimize Call Frame";
  }

  const TargetInstrInfo *TII;
  const TargetFrameLowering *TFL;
  const MachineRegisterInfo *MRI;
  static char ID;
};

char X86CallFrameOptimization::ID = 0;
}

FunctionPass *llvm::createX86CallFrameOptimization() {
  return new X86CallFrameOptimization();
}

				// This checks whether the transformation is legal and profitable
				bool X86CallFrameOptimization::shouldPerformTransformation(MachineFunction &MF) {
				if (NoX86CFOpt.getValue())
				return false;

				// We currently only support call sequences where all parameters.
				// are passed on the stack.
				rnkUnsubmitted Not Done Reply Inline Actions I think it's important to at least support __thiscall eventually, since that's a very common convention with one regparm. rnk: I think it's important to at least support __thiscall eventually, since that's a very common…
				mkuperAuthorUnsubmitted Not Done Reply Inline Actions Yes, and maybe even for _fastcall (It looks like gcc will do this for fastcall, icc won't). But I am still trying to do this gradually, to the extent that I can. :-) mkuper: Yes, and maybe even for _fastcall (It looks like gcc will do this for fastcall, icc won't).
				// No point in running this in 64-bit mode, since some arguments are
				// passed in-register in all common calling conventions, so the pattern
				// we're looking for will never match.
				rnk: I guess I would justify this more in terms of reducing the extra CFI that we would have to emit to describe the SP adjustments. Converting a few movs to pushes isn't worth the complexity.
				mkuper: You're right, that too.
				const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
				if (STI.is64Bit())
				return false;

				// You would expect straight-line code between call-frame setup and
				// call-frame destroy. You would be wrong. There are circumstances (e.g.
				// CMOV_GR8 expansion of a select that feeds a function call!) where we can
				// end up with the setup and the destroy in different basic blocks.
				// This is bad, and breaks SP adjustment.
				// So, check that all of the frames in the function are closed inside
				// the same block, and, for good measure, that there are no nested frames.
				int FrameSetupOpcode = TII->getCallFrameSetupOpcode();
				int FrameDestroyOpcode = TII->getCallFrameDestroyOpcode();
				for (MachineFunction::iterator BB = MF.begin(), E = MF.end(); BB != E; ++BB) {
				bool InsideFrameSequence = false;
				rnk: Can this be `for (MachineBasicBlock &BB : *MF) {`?
				for (MachineBasicBlock::iterator I = BB->begin(); I != BB->end(); ++I) {
				if (I->getOpcode() == FrameSetupOpcode) {
				rnk: Ditto, `for (MachineInstr &MI : BB) {` ?
				if (InsideFrameSequence)
				return false;
				InsideFrameSequence = true;
				}
				else if (I->getOpcode() == FrameDestroyOpcode) {
				if (!InsideFrameSequence)
				return false;
				InsideFrameSequence = false;
				}
				}

				delena: If you change instructions inside bb, your iterator may be broken.
				mkuper: As far as I know, MBB iterators aren't invalidated by removing other instructions, and we don't remove the FrameSetup itself. But it's probably better to keep going from the FrameDestroy instead of the next instruction. Will change that.
				if (InsideFrameSequence)
				return false;
				}

				// Now that we know the transformation is legal, check if it is
				// profitable.
				// TODO: Add a heuristic that actually looks at the function,
				// and enable this for more cases.

				// This transformation is always a win when we expected to have
				// a reserved call frame. Under other circumstances, it may be either
				// a win or a loss, and requires a heuristic.
				// For now, enable it only for the relatively clear win cases.
				bool ExpectReservedFrame = MF.getFrameInfo()->hasVarSizedObjects();
				if (ExpectReservedFrame)
				return true;
				rnk: I think I misinterpreted this on the first pass. We always expect this to be profitable if we know we can't reserve space for the call frame. Maybe rename the bool to CannotReserveFrame to match the sense?
				mkuper: Err, yes, you're right, sorry about that... got distracted while naming the variable, I guess, I meant the opposite. Thanks!

				// For now, don't even try to evaluate the profitability when
				// not optimizing for size.
				AttributeSet FnAttrs = MF.getFunction()->getAttributes();
				bool OptForSize =
				FnAttrs.hasAttribute(AttributeSet::FunctionIndex,
				Attribute::OptimizeForSize) ||
				FnAttrs.hasAttribute(AttributeSet::FunctionIndex, Attribute::MinSize);

				if (!OptForSize)
				return false;

				// Stack re-alignment can make this unprofitable even in terms of size.
				rnk: Can you explain why this is unprofitable? I guess if we get here we are in dynamic alloca plus stack realignment land, i.e. the worst thing that could possibly happen. Is this about extra code for preserving the outgoing stack alignment then? Like on Linux, where we provide 16 byte stack alignment?
				mkuper: If we get here, we're in opt-for-size + stack-realignment land. And, yes, that's exactly what it is about. If you are passing only one parameter, the original code would be:
					mov %eax, 128(%esp)
					call $foo
				Without re-alignment, you have
					push %eax
					call $foo
					add $4, %esp
				which is still a win in terms of code size.
				With re-alignment, you get:
					sub $16, %esp
					push %eax
					call $foo
					add $12, %esp
				Which is... questionable. The code size for the sequence is the same (in this case, 7 bytes for both, not including the call), but if you have other call sites which you didn't convert, you may actually lose. And, of course, you lose performance (3 instructions instead of 1) without anything to show for it. Once there is a heuristic that tries to estimate the overhead, we can address this on a case-by-case basis (e.g. if we have 16-byte stack re-alignment, but most call-sites have a lot of parameters, then it's still worth it.)
				rnk: Based on my misinterpretation, I think I understand why you get this code. SP is assumed to be aligned coming into the sequence. We realign SP after dynamic allocas. The sequence is probably more like:
					sub $12, %esp
					push %eax
					call $foo
					add $16, %esp
				I can see why this is less profitable.
				mkuper: Yes, that sequence. :-) It doesn't depend on dynamic allocas, though. If you don't have a reserved frame (for whatever reason - for x86 after this patch, it's either dynamic allocas, or because we forced it not to reserve by using pushes), then you need this re-alignment.
				// As mentioned above, a better heuristic is needed. For now, don't do this
				// when the required alignment is above 8. (4 would be the safe choice, but
				// some experimentation showed 8 is generally good).
				if (TFL->getStackAlignment() > 8)
				return false;

				return true;
				}
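
The legality scan in shouldPerformTransformation above can be modeled outside of LLVM. The following is only a sketch under stated assumptions: `Op`, `Block`, and `framesAreWellFormed` are hypothetical names that stand in for machine opcodes and basic blocks, and the profitability checks are left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for machine opcodes; these are not LLVM's enums.
enum class Op { FrameSetup, FrameDestroy, Other };

// A basic block is modeled as a flat list of opcodes.
using Block = std::vector<Op>;

// Returns true if every call frame is opened and closed inside the same
// block and no frame is nested inside another one.
bool framesAreWellFormed(const std::vector<Block> &Function) {
  for (const Block &BB : Function) {
    bool InsideFrameSequence = false;
    for (Op O : BB) {
      if (O == Op::FrameSetup) {
        if (InsideFrameSequence)
          return false; // a second setup before the matching destroy
        InsideFrameSequence = true;
      } else if (O == Op::FrameDestroy) {
        if (!InsideFrameSequence)
          return false; // destroy whose setup is in another block
        InsideFrameSequence = false;
      }
    }
    if (InsideFrameSequence)
      return false; // setup still open at the end of the block
  }
  return true;
}
```

A function whose setup and destroy land in different blocks (the CMOV_GR8 case mentioned in the comment) is rejected by this model, as is any nested sequence.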

				bool X86CallFrameOptimization::runOnMachineFunction(MachineFunction &MF) {
				TII = MF.getSubtarget().getInstrInfo();
				TFL = MF.getSubtarget().getFrameLowering();
				MRI = &MF.getRegInfo();

				if (!shouldPerformTransformation(MF))
				return false;

				int FrameSetupOpcode = TII->getCallFrameSetupOpcode();

				bool Changed = false;

				for (MachineFunction::iterator BB = MF.begin(), E = MF.end(); BB != E; ++BB)
				for (MachineBasicBlock::iterator I = BB->begin(); I != BB->end(); ++I)
				if (I->getOpcode() == FrameSetupOpcode)
				Changed |= adjustCallSequence(MF, *BB, I);

				return Changed;
				}

				bool X86CallFrameOptimization::adjustCallSequence(MachineFunction &MF,
				MachineBasicBlock &MBB,
				MachineBasicBlock::iterator I) {

				// Check that this particular call sequence is amenable to the
				// transformation.
				const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(
				MF.getSubtarget().getRegisterInfo());
				unsigned StackPtr = RegInfo.getStackRegister();
				int FrameDestroyOpcode = TII->getCallFrameDestroyOpcode();

				// We expect to enter this at the beginning of a call sequence
				assert(I->getOpcode() == TII->getCallFrameSetupOpcode());
				MachineBasicBlock::iterator FrameSetup = I++;


				// For globals in PIC mode, we can have some LEAs here.
				// Ignore them, they don't bother us.
				// TODO: Extend this to something that covers more cases.
				while (I->getOpcode() == X86::LEA32r)
				++I;

				// We expect a copy instruction here.
				// TODO: The copy instruction is a lowering artifact.
				// We should also support a copy-less version, where the stack
				// pointer is used directly.
				if (!I->isCopy() || !I->getOperand(0).isReg())
				return false;
				MachineBasicBlock::iterator SPCopy = I++;
				StackPtr = SPCopy->getOperand(0).getReg();

				// Scan the call setup sequence for the pattern we're looking for.
				// We only handle a simple case - a sequence of MOV32mi or MOV32mr
				// instructions, that push a sequence of 32-bit values onto the stack, with
				// no gaps between them.
				std::map<int64_t, MachineBasicBlock::iterator> MovMap;
				rnk: std::map is really malloc heavy. This can probably be a SmallVector<MachineInstr*, 8> or something, mapping slot index to the MI that fills it. The frame setup opcode should tell you how much stack space to allocate up front, and you can index into the vector by StackOffset / 4.
				mkuper: That can work. Thanks, I'll try.

				do {
				int Opcode = I->getOpcode();
				if (Opcode != X86::MOV32mi && Opcode != X86::MOV32mr)
				break;
				delena: It should be immediate, right? Can we have a relocation here?
				mkuper: It can be a relocation, but in that case, isImm() will fail. Will document that more clearly.

				// We only want movs of the form:
				// movl imm/r32, k(%esp)
				// If we run into something else, bail.
				// Note that AddrBaseReg may, counter to its name, not be a register,
				// but rather a frame index.
				// TODO: Support the fi case. This should probably work now that we
				// have the infrastructure to track the stack pointer within a call
				// sequence.
				rnk: This seems worth tackling, given that you had to handle the `call <fi>` case. :)
				mkuper: Yes, definitely. :-) It may even work out of the box now. But I think I still want to split it into a separate commit.
				if (!I->getOperand(X86::AddrBaseReg).isReg() ||
				(I->getOperand(X86::AddrBaseReg).getReg() != StackPtr) ||
				!I->getOperand(X86::AddrScaleAmt).isImm() ||
				delena: SlowPush should be a property of the target, like slowLea
				mkuper: I agree. Unfortunately, I've run out of bits. The Subtarget features are a 64-bit bitfield, and they're all taken.
				(I->getOperand(X86::AddrScaleAmt).getImm() != 1) ||
				(I->getOperand(X86::AddrIndexReg).getReg() != X86::NoRegister) ||
				delena: The comment is missing here.
				(I->getOperand(X86::AddrSegmentReg).getReg() != X86::NoRegister) ||
				!I->getOperand(X86::AddrDisp).isImm())
				return false;

				int64_t StackDisp = I->getOperand(X86::AddrDisp).getImm();

				// We really don't want to consider the unaligned case.
				if (StackDisp % 4)
				return false;

				// If the same stack slot is being filled twice, something's fishy.
				if (!MovMap.insert(std::pair<int64_t, MachineInstr *>(StackDisp, I)).second)
				return false;

				++I;
				} while (I != MBB.end());

				// We now expect the end of the sequence - a call and a stack adjust.
				if (I == MBB.end())
				return false;

				// For PCrel calls, we expect an additional COPY of the basereg.
				// If we find one, skip it.
				if (I->isCopy()) {
				if (I->getOperand(1).getReg() ==
				MF.getInfo<X86MachineFunctionInfo>()->getGlobalBaseReg())
				++I;
				else
				return false;
				}

				if (!I->isCall())
				return false;
				MachineBasicBlock::iterator Call = I;
				if ((++I)->getOpcode() != FrameDestroyOpcode)
				return false;

				// Now, go through the map, and see that we don't have any gaps,
				// but only a series of 32-bit MOVs.
				// Since std::map provides ordered iteration, the original order
				// of the MOVs doesn't matter.
				int64_t ExpectedDist = 0;
				for (auto MMI = MovMap.begin(), MME = MovMap.end(); MMI != MME;
				++MMI, ExpectedDist += 4)
				if (MMI->first != ExpectedDist)
				return false;

				// If the call had no parameters, do nothing
				if (!ExpectedDist)
				return false;

				// Ok, we can in fact do the transformation for this call.
				// Do not remove the FrameSetup instruction, but adjust the parameters.
				// PEI will end up finalizing the handling of this.
				FrameSetup->getOperand(1).setImm(ExpectedDist);

				DebugLoc DL = I->getDebugLoc();
				// Now, iterate through the map in reverse order, and replace the movs
				// with pushes. MOVmi/MOVmr doesn't have any defs, so no need to
				mkuper: Argh. This is nonsense. Commented one thing, coded another... (mayStore() is extremely far from being a strong enough condition to allow this.)
				// replace uses.
				for (auto MMI = MovMap.rbegin(), MME = MovMap.rend(); MMI != MME; ++MMI) {
				MachineBasicBlock::iterator MOV = MMI->second;
				MachineOperand PushOp = MOV->getOperand(X86::AddrNumOperands);
				if (MOV->getOpcode() == X86::MOV32mi) {
				unsigned PushOpcode = X86::PUSHi32;
				// If the operand is a small (8-bit) immediate, we can use a
				// PUSH instruction with a shorter encoding.
				// Note that isImm() may fail even though this is a MOVmi, because
				// the operand can also be a symbol.
				if (PushOp.isImm()) {
				int64_t Val = PushOp.getImm();
				if (isInt<8>(Val))
				PushOpcode = X86::PUSH32i8;
				}
				BuildMI(MBB, Call, DL, TII->get(PushOpcode)).addOperand(PushOp);
				} else {
				unsigned int Reg = PushOp.getReg();

				// If PUSHrmm is not slow on this target, try to fold the source of the
				// push into the instruction.
				const X86Subtarget &ST = MF.getTarget().getSubtarget<X86Subtarget>();
				bool SlowPUSHrmm = ST.isAtom() || ST.isSLM();

				// Check that this is legal to fold. Right now, we're extremely
				// conservative about that.
				MachineInstr *DefMov = nullptr;
				if (!SlowPUSHrmm && (DefMov = canFoldIntoRegPush(FrameSetup, Reg))) {
				MachineInstr *Push = BuildMI(MBB, Call, DL, TII->get(X86::PUSH32rmm));

				unsigned NumOps = DefMov->getDesc().getNumOperands();
				for (unsigned i = NumOps - X86::AddrNumOperands; i != NumOps; ++i)
				Push->addOperand(DefMov->getOperand(i));

				DefMov->eraseFromParent();
				} else {
				BuildMI(MBB, Call, DL, TII->get(X86::PUSH32r)).addReg(Reg).getInstr();
				}
				}

				MBB.erase(MOV);
				}

				// The stack-pointer copy is no longer used in the call sequences.
				// There should not be any other users, but we can't commit to that, so:
				if (MRI->use_empty(SPCopy->getOperand(0).getReg()))
				SPCopy->eraseFromParent();

				// Once we've done this, we need to make sure PEI doesn't assume a reserved
				// frame.
				X86MachineFunctionInfo *FuncInfo = MF.getInfo<X86MachineFunctionInfo>();
				FuncInfo->setHasPushSequences(true);

				return true;
				}
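
The slot-indexed table suggested in the review as a replacement for std::map can be sketched in isolation. This is a simplified model, not the patch's code: `ArgMov` and `planPushes` are hypothetical names, values are plain integers rather than machine operands, and the frame size is passed in directly instead of being read from the frame setup instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one argument store: "movl Value, Disp(%esp)".
struct ArgMov {
  int64_t Disp;
  int Value;
};

// Fills a table indexed by Disp / 4 and, on success, writes the values to
// Pushes in push order (highest stack slot first). Fails on unaligned or
// out-of-range offsets, a slot filled twice, a gap, or an empty sequence.
bool planPushes(const std::vector<ArgMov> &Movs, int64_t FrameBytes,
                std::vector<int> &Pushes) {
  Pushes.clear();
  if (FrameBytes < 0 || FrameBytes % 4 != 0)
    return false;
  const std::size_t NumSlots = static_cast<std::size_t>(FrameBytes / 4);
  std::vector<int> Values(NumSlots, 0);
  std::vector<bool> Filled(NumSlots, false);

  for (const ArgMov &M : Movs) {
    if (M.Disp < 0 || M.Disp % 4 != 0)
      return false; // negative or unaligned displacement
    const std::size_t Idx = static_cast<std::size_t>(M.Disp / 4);
    if (Idx >= NumSlots || Filled[Idx])
      return false; // outside the frame, or same slot filled twice
    Filled[Idx] = true;
    Values[Idx] = M.Value;
  }

  // The filled slots must form one contiguous run starting at offset 0.
  std::size_t N = 0;
  while (N < NumSlots && Filled[N])
    ++N;
  for (std::size_t I = N; I < NumSlots; ++I)
    if (Filled[I])
      return false; // gap between argument slots
  if (N == 0)
    return false; // no stack arguments, nothing to convert

  // Push the highest slot first so that slot 0 ends up at the top of stack.
  for (std::size_t I = N; I-- > 0;)
    Pushes.push_back(Values[I]);
  return true;
}
```

Because the table is indexed by offset, the order in which the stores appear does not matter, which is the same property the patch gets from the ordered iteration of std::map.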

				MachineInstr *X86CallFrameOptimization::canFoldIntoRegPush(
				MachineBasicBlock::iterator FrameSetup, unsigned Reg) {
				// Do an extremely restricted form of load folding.
				// ISel will often create patterns like:
				// movl 4(%edi), %eax
				// movl 8(%edi), %ecx
				// movl 12(%edi), %edx
				// movl %edx, 8(%esp)
				// movl %ecx, 4(%esp)
				// movl %eax, (%esp)
				// call
				// Get rid of those with prejudice.
				if (!TargetRegisterInfo::isVirtualRegister(Reg))
				return nullptr;

				// Make sure this is the only use of Reg.
				if (!MRI->hasOneNonDBGUse(Reg))
				return nullptr;

				MachineBasicBlock::iterator DefMI = MRI->getVRegDef(Reg);

				// Make sure the def is a MOV from memory.
				// If the def is in another block, give up.
				if (DefMI->getOpcode() != X86::MOV32rm ||
				DefMI->getParent() != FrameSetup->getParent())
				return nullptr;
				rnk: It's not clear to me that same BB is sufficient, consider this potential BB:
					movl (%edi), %eax
					movl $42, (%edi)
					<call setup>
					movl %eax, (%esp)
					calll foo
					<call end>
				We can't move the load if there is a potentially aliasing store in the way. There might be a utility to help with the aliasing query, or you can assume that any stores other than arg stores might alias it and bail on that.
				mkuper: Right now I'm way more conservative than even that - I'm checking below that everything between this mov and the call setup is a MOV32rm. The "same basic block" check here is just a way to short-circuit the obviously wrong cases. This catches some common cases like the one in the comment above, but of course misses other opportunities. I could check for a mayStore() instead, but I'm not sure that's safe enough. I'd like to relax the condition - but again, I think that ought to be a separate commit.

				// Be careful with movs that load from a stack slot, since it may get
				// resolved incorrectly.
				// TODO: Again, we already have the infrastructure, so this should work.
				if (!DefMI->getOperand(1).isReg())
				return nullptr;

				// Now, make sure everything else up until the ADJCALLSTACK is a sequence
				// of MOVs. To be less conservative would require duplicating a lot of the
				// logic from PeepholeOptimizer.
				// FIXME: A possibly better approach would be to teach the PeepholeOptimizer
				// to be smarter about folding into pushes.
				for (auto I = DefMI; I != FrameSetup; ++I)
				if (I->getOpcode() != X86::MOV32rm)
				return nullptr;

				return DefMI;
				}
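
The conservative rule at the end of canFoldIntoRegPush (everything from the defining load up to the frame setup must itself be a plain load) can be illustrated with a small model. `Opc` and `canFoldLoad` are hypothetical names, and the sketch deliberately omits the other conditions checked above: the virtual-register test, the single-use test, and the stack-slot restriction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcode categories; Load plays the role of MOV32rm.
enum class Opc { Load, Store, Other, FrameSetup };

// Returns true if the load at DefIdx may be sunk past everything up to the
// frame setup at SetupIdx. Any non-load in between (for example a store that
// might alias the loaded address) blocks the fold.
bool canFoldLoad(const std::vector<Opc> &Block, std::size_t DefIdx,
                 std::size_t SetupIdx) {
  if (SetupIdx >= Block.size() || DefIdx >= SetupIdx)
    return false;
  if (Block[SetupIdx] != Opc::FrameSetup)
    return false;
  for (std::size_t I = DefIdx; I != SetupIdx; ++I)
    if (Block[I] != Opc::Load)
      return false;
  return true;
}
```

With the block from the review comment (a load, then a store to the same address, then the call setup), the model refuses to fold, while a run of three loads directly before the setup is accepted.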

lib/Target/X86/X86FastISel.cpp

bool X86FastISel::fastLowerCall(CallLoweringInfo &CLI) {
...
CCInfo.AnalyzeCallOperands(OutVTs, OutFlags, CC_X86);		CCInfo.AnalyzeCallOperands(OutVTs, OutFlags, CC_X86);

// Get a count of how many bytes are to be pushed on the stack.		// Get a count of how many bytes are to be pushed on the stack.
unsigned NumBytes = CCInfo.getNextStackOffset();		unsigned NumBytes = CCInfo.getNextStackOffset();

// Issue CALLSEQ_START		// Issue CALLSEQ_START
unsigned AdjStackDown = TII.getCallFrameSetupOpcode();		unsigned AdjStackDown = TII.getCallFrameSetupOpcode();
BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(AdjStackDown))		BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(AdjStackDown))
.addImm(NumBytes);		.addImm(NumBytes).addImm(0);

// Walk the register/memloc assignments, inserting copies/loads.		// Walk the register/memloc assignments, inserting copies/loads.
const X86RegisterInfo *RegInfo = static_cast<const X86RegisterInfo *>(		const X86RegisterInfo *RegInfo = static_cast<const X86RegisterInfo *>(
TM.getSubtargetImpl()->getRegisterInfo());		TM.getSubtargetImpl()->getRegisterInfo());
for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {		for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {
CCValAssign const &VA = ArgLocs[i];		CCValAssign const &VA = ArgLocs[i];
const Value *ArgVal = OutVals[VA.getValNo()];		const Value *ArgVal = OutVals[VA.getValNo()];
MVT ArgVT = OutVTs[VA.getValNo()];		MVT ArgVT = OutVTs[VA.getValNo()];

lib/Target/X86/X86FrameLowering.h

public:
...

bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB,		bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,		const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const override;		const TargetRegisterInfo *TRI) const override;

bool hasFP(const MachineFunction &MF) const override;		bool hasFP(const MachineFunction &MF) const override;
bool hasReservedCallFrame(const MachineFunction &MF) const override;		bool hasReservedCallFrame(const MachineFunction &MF) const override;
		bool canSimplifyCallFramePseudos(const MachineFunction &MF) const override;

int getFrameIndexOffset(const MachineFunction &MF, int FI) const override;		int getFrameIndexOffset(const MachineFunction &MF, int FI) const override;
int getFrameIndexReference(const MachineFunction &MF, int FI,		int getFrameIndexReference(const MachineFunction &MF, int FI,
unsigned &FrameReg) const override;		unsigned &FrameReg) const override;

int getFrameIndexOffsetFromSP(const MachineFunction &MF, int FI) const;		int getFrameIndexOffsetFromSP(const MachineFunction &MF, int FI) const;
int getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI,		int getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI,
unsigned &FrameReg) const override;		unsigned &FrameReg) const override;

lib/Target/X86/X86FrameLowering.cpp

#include <cstdlib>		#include <cstdlib>

using namespace llvm;		using namespace llvm;

// FIXME: completely move here.		// FIXME: completely move here.
extern cl::opt<bool> ForceStackAlign;		extern cl::opt<bool> ForceStackAlign;

bool X86FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {		bool X86FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {
return !MF.getFrameInfo()->hasVarSizedObjects();		return !MF.getFrameInfo()->hasVarSizedObjects() &&
		!MF.getInfo<X86MachineFunctionInfo>()->getHasPushSequences();
		}

		/// canSimplifyCallFramePseudos - If there is a reserved call frame, the
		/// call frame pseudos can be simplified. Having a FP, as in the default
		/// implementation, is not sufficient here since we can't always use it.
		/// Use a more nuanced condition.
		bool
		X86FrameLowering::canSimplifyCallFramePseudos(const MachineFunction &MF) const {
		const X86RegisterInfo *TRI = static_cast<const X86RegisterInfo *>
		(MF.getSubtarget().getRegisterInfo());
		return hasReservedCallFrame(MF) ||
		(hasFP(MF) && !TRI->needsStackRealignment(MF))
		|| TRI->hasBasePointer(MF);
}		}

/// hasFP - Return true if the specified function should have a dedicated frame		/// hasFP - Return true if the specified function should have a dedicated frame
/// pointer register. This is true if the function has variable sized allocas		/// pointer register. This is true if the function has variable sized allocas
/// or if frame pointer elimination is disabled.		/// or if frame pointer elimination is disabled.
bool X86FrameLowering::hasFP(const MachineFunction &MF) const {		bool X86FrameLowering::hasFP(const MachineFunction &MF) const {
const MachineFrameInfo *MFI = MF.getFrameInfo();		const MachineFrameInfo *MFI = MF.getFrameInfo();
const MachineModuleInfo &MMI = MF.getMMI();		const MachineModuleInfo &MMI = MF.getMMI();
...
if (isInt<8>(Imm))
return X86::AND64ri8;		return X86::AND64ri8;
return X86::AND64ri32;		return X86::AND64ri32;
}		}
if (isInt<8>(Imm))		if (isInt<8>(Imm))
return X86::AND32ri8;		return X86::AND32ri8;
return X86::AND32ri;		return X86::AND32ri;
}		}

static unsigned getPUSHiOpcode(bool IsLP64, MachineOperand MO) {
// We don't support LP64 for now.
assert(!IsLP64);

if (MO.isImm() && isInt<8>(MO.getImm()))
return X86::PUSH32i8;

return X86::PUSHi32;
}
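
The opcode choice in getPUSHiOpcode comes down to whether the operand is a plain immediate that fits in a signed 8-bit value. A standalone sketch of that decision, with hypothetical names (`PushEnc`, `fitsInSignedByte`, `choosePushImmEncoding`) standing in for the LLVM opcodes and for isInt<8>:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the two immediate push encodings.
enum class PushEnc { Imm8, Imm32 };

// Signed 8-bit range check, the same test isInt<8> performs.
bool fitsInSignedByte(int64_t V) { return V >= -128 && V <= 127; }

// IsImm is false when the operand is a symbol or relocation rather than a
// known constant; in that case the short form cannot be used.
PushEnc choosePushImmEncoding(bool IsImm, int64_t V) {
  return (IsImm && fitsInSignedByte(V)) ? PushEnc::Imm8 : PushEnc::Imm32;
}
```

The `IsImm` flag mirrors the note in the new pass that isImm() can fail for a MOV32mi whose operand is a symbol.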

static unsigned getLEArOpcode(unsigned IsLP64) {		static unsigned getLEArOpcode(unsigned IsLP64) {
return IsLP64 ? X86::LEA64r : X86::LEA32r;		return IsLP64 ? X86::LEA64r : X86::LEA32r;
}		}

/// findDeadCallerSavedReg - Return a caller-saved register that isn't live		/// findDeadCallerSavedReg - Return a caller-saved register that isn't live
/// when it reaches the "return" instruction. We can then pop a stack object		/// when it reaches the "return" instruction. We can then pop a stack object
/// to this register without worry about clobbering it.		/// to this register without worry about clobbering it.
static unsigned findDeadCallerSavedReg(MachineBasicBlock &MBB,		static unsigned findDeadCallerSavedReg(MachineBasicBlock &MBB,
...
if (MaxStack > Guaranteed) {
incStackMBB->addSuccessor(&prologueMBB, 99);		incStackMBB->addSuccessor(&prologueMBB, 99);
incStackMBB->addSuccessor(incStackMBB, 1);		incStackMBB->addSuccessor(incStackMBB, 1);
}		}
#ifdef XDEBUG		#ifdef XDEBUG
MF.verify();		MF.verify();
#endif		#endif
}		}

bool X86FrameLowering::
convertArgMovsToPushes(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator I, uint64_t Amount) const {
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(
MF.getSubtarget().getRegisterInfo());
unsigned StackPtr = RegInfo.getStackRegister();

// Scan the call setup sequence for the pattern we're looking for.
// We only handle a simple case now - a sequence of MOV32mi or MOV32mr
// instructions, that push a sequence of 32-bit values onto the stack, with
// no gaps.
std::map<int64_t, MachineBasicBlock::iterator> MovMap;
do {
int Opcode = I->getOpcode();
if (Opcode != X86::MOV32mi && Opcode != X86::MOV32mr)
break;

// We only want movs of the form:
// movl imm/r32, k(%ecx)
// If we run into something else, bail
// Note that AddrBaseReg may, counterintuitively, not be a register...
if (!I->getOperand(X86::AddrBaseReg).isReg() ||
(I->getOperand(X86::AddrBaseReg).getReg() != StackPtr) ||
!I->getOperand(X86::AddrScaleAmt).isImm() ||
(I->getOperand(X86::AddrScaleAmt).getImm() != 1) ||
(I->getOperand(X86::AddrIndexReg).getReg() != X86::NoRegister) ||
(I->getOperand(X86::AddrSegmentReg).getReg() != X86::NoRegister) ||
!I->getOperand(X86::AddrDisp).isImm())
return false;

int64_t StackDisp = I->getOperand(X86::AddrDisp).getImm();

// We don't want to consider the unaligned case.
if (StackDisp % 4)
return false;

// If the same stack slot is being filled twice, something's fishy.
if (!MovMap.insert(std::pair<int64_t, MachineInstr*>(StackDisp, I)).second)
return false;

++I;
} while (I != MBB.end());

// We now expect the end of the sequence - a call and a stack adjust.
if (I == MBB.end())
return false;
if (!I->isCall())
return false;
MachineBasicBlock::iterator Call = I;
if ((++I)->getOpcode() != TII.getCallFrameDestroyOpcode())
return false;

// Now, go through the map, and see that we don't have any gaps,
// but only a series of 32-bit MOVs.
// Since std::map provides ordered iteration, the original order
// of the MOVs doesn't matter.
int64_t ExpectedDist = 0;
for (auto MMI = MovMap.begin(), MME = MovMap.end(); MMI != MME;
++MMI, ExpectedDist += 4)
if (MMI->first != ExpectedDist)
return false;

// Ok, everything looks fine. Do the transformation.
DebugLoc DL = I->getDebugLoc();

// It's possible the original stack adjustment amount was larger than
// that done by the pushes. If so, we still need a SUB.
Amount -= ExpectedDist;
if (Amount) {
MachineInstr* Sub = BuildMI(MBB, Call, DL,
TII.get(getSUBriOpcode(false, Amount)), StackPtr)
.addReg(StackPtr).addImm(Amount);
Sub->getOperand(3).setIsDead();
}

// Now, iterate through the map in reverse order, and replace the movs
// with pushes. MOVmi/MOVmr doesn't have any defs, so no need to replace uses.
for (auto MMI = MovMap.rbegin(), MME = MovMap.rend(); MMI != MME; ++MMI) {
MachineBasicBlock::iterator MOV = MMI->second;
MachineOperand PushOp = MOV->getOperand(X86::AddrNumOperands);

// Replace MOVmr with PUSH32r, and MOVmi with PUSHi of appropriate size
int PushOpcode = X86::PUSH32r;
if (MOV->getOpcode() == X86::MOV32mi)
PushOpcode = getPUSHiOpcode(false, PushOp);

BuildMI(MBB, Call, DL, TII.get(PushOpcode)).addOperand(PushOp);
MBB.erase(MOV);
}

return true;
}
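
The stack-adjustment arithmetic in this part of the patch (round the outgoing argument area up to the stack alignment, then subtract whatever is handled inside the sequence) can be worked through with a small model. The names `roundUp`, `Adjust`, and `computeAdjust` are hypothetical, and the sketch assumes the pushed bytes never exceed the argument bytes.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Amount up to the next multiple of Align (Align must be nonzero).
uint64_t roundUp(uint64_t Amount, uint64_t Align) {
  return (Amount + Align - 1) / Align * Align;
}

// Explicit stack-pointer adjustments around one call site.
struct Adjust {
  uint64_t SubBeforePushes; // "sub $N, %esp" emitted before the pushes
  uint64_t AddAfterCall;    // "add $N, %esp" emitted after the call
};

// ArgBytes: size of the outgoing argument area.
// PushedBytes: bytes written by pushes inside the sequence.
// CalleePopped: bytes the callee removes itself (0 for cdecl).
Adjust computeAdjust(uint64_t ArgBytes, uint64_t PushedBytes,
                     uint64_t CalleePopped, uint64_t StackAlign) {
  const uint64_t Rounded = roundUp(ArgBytes, StackAlign);
  return {Rounded - PushedBytes, Rounded - CalleePopped};
}
```

For a single 4-byte argument, `computeAdjust(4, 4, 0, 16)` gives a 12-byte sub and a 16-byte add, which is the sequence discussed in the stack re-alignment thread; with 4-byte alignment the sub disappears and only `add $4` remains.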

void X86FrameLowering::		void X86FrameLowering::
eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,		eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator I) const {		MachineBasicBlock::iterator I) const {
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(		const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(
MF.getSubtarget().getRegisterInfo());		MF.getSubtarget().getRegisterInfo());
unsigned StackPtr = RegInfo.getStackRegister();		unsigned StackPtr = RegInfo.getStackRegister();
bool reserveCallFrame = hasReservedCallFrame(MF);		bool reserveCallFrame = hasReservedCallFrame(MF);
int Opcode = I->getOpcode();		int Opcode = I->getOpcode();
bool isDestroy = Opcode == TII.getCallFrameDestroyOpcode();		bool isDestroy = Opcode == TII.getCallFrameDestroyOpcode();
const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();		const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
bool IsLP64 = STI.isTarget64BitLP64();		bool IsLP64 = STI.isTarget64BitLP64();
DebugLoc DL = I->getDebugLoc();		DebugLoc DL = I->getDebugLoc();
uint64_t Amount = !reserveCallFrame ? I->getOperand(0).getImm() : 0;		uint64_t Amount = !reserveCallFrame ? I->getOperand(0).getImm() : 0;
uint64_t CalleeAmt = isDestroy ? I->getOperand(1).getImm() : 0;		uint64_t InternalAmt = (isDestroy || Amount) ? I->getOperand(1).getImm() : 0;
I = MBB.erase(I);		I = MBB.erase(I);

if (!reserveCallFrame) {		if (!reserveCallFrame) {
// If the stack pointer can be changed after prologue, turn the		// If the stack pointer can be changed after prologue, turn the
// adjcallstackup instruction into a 'sub ESP, <amt>' and the		// adjcallstackup instruction into a 'sub ESP, <amt>' and the
// adjcallstackdown instruction into 'add ESP, <amt>'		// adjcallstackdown instruction into 'add ESP, <amt>'
if (Amount == 0)		if (Amount == 0)
return;		return;

// We need to keep the stack aligned properly. To do this, we round the		// We need to keep the stack aligned properly. To do this, we round the
// amount of space needed for the outgoing arguments up to the next		// amount of space needed for the outgoing arguments up to the next
// alignment boundary.		// alignment boundary.
unsigned StackAlign = MF.getTarget()		unsigned StackAlign = MF.getTarget()
.getSubtargetImpl()		.getSubtargetImpl()
->getFrameLowering()		->getFrameLowering()
->getStackAlignment();		->getStackAlignment();
Amount = (Amount + StackAlign - 1) / StackAlign * StackAlign;		Amount = (Amount + StackAlign - 1) / StackAlign * StackAlign;

MachineInstr *New = nullptr;		MachineInstr *New = nullptr;
if (Opcode == TII.getCallFrameSetupOpcode()) {
// Try to convert movs to the stack into pushes.
// We currently only look for a pattern that appears in 32-bit
// calling conventions.
if (!IsLP64 && convertArgMovsToPushes(MF, MBB, I, Amount))
return;

New = BuildMI(MF, DL, TII.get(getSUBriOpcode(IsLP64, Amount)),		// Factor out the amount that gets handled inside the sequence
StackPtr)		// (Pushes of argument for frame setup, callee pops for frame destroy)
.addReg(StackPtr)		Amount -= InternalAmt;
.addImm(Amount);
		if (Amount) {
		if (Opcode == TII.getCallFrameSetupOpcode()) {
		New = BuildMI(MF, DL, TII.get(getSUBriOpcode(IsLP64, Amount)), StackPtr)
		.addReg(StackPtr).addImm(Amount);
} else {		} else {
assert(Opcode == TII.getCallFrameDestroyOpcode());		assert(Opcode == TII.getCallFrameDestroyOpcode());

// Factor out the amount the callee already popped.
Amount -= CalleeAmt;

if (Amount) {
unsigned Opc = getADDriOpcode(IsLP64, Amount);		unsigned Opc = getADDriOpcode(IsLP64, Amount);
New = BuildMI(MF, DL, TII.get(Opc), StackPtr)		New = BuildMI(MF, DL, TII.get(Opc), StackPtr)
.addReg(StackPtr).addImm(Amount);		.addReg(StackPtr).addImm(Amount);
}		}
}		}

if (New) {		if (New) {
// The EFLAGS implicit def is dead.		// The EFLAGS implicit def is dead.
New->getOperand(3).setIsDead();		New->getOperand(3).setIsDead();

// Replace the pseudo instruction with a new instruction.		// Replace the pseudo instruction with a new instruction.
MBB.insert(I, New);		MBB.insert(I, New);
}		}

return;		return;
}		}

if (Opcode == TII.getCallFrameDestroyOpcode() && CalleeAmt) {		if (Opcode == TII.getCallFrameDestroyOpcode() && InternalAmt) {
// If we are performing frame pointer elimination and if the callee pops		// If we are performing frame pointer elimination and if the callee pops
// something off the stack pointer, add it back. We do this until we have		// something off the stack pointer, add it back. We do this until we have
// more advanced stack pointer tracking ability.		// more advanced stack pointer tracking ability.
unsigned Opc = getSUBriOpcode(IsLP64, CalleeAmt);		unsigned Opc = getSUBriOpcode(IsLP64, InternalAmt);
MachineInstr *New = BuildMI(MF, DL, TII.get(Opc), StackPtr)		MachineInstr *New = BuildMI(MF, DL, TII.get(Opc), StackPtr)
.addReg(StackPtr).addImm(CalleeAmt);		.addReg(StackPtr).addImm(InternalAmt);

// The EFLAGS implicit def is dead.		// The EFLAGS implicit def is dead.
New->getOperand(3).setIsDead();		New->getOperand(3).setIsDead();

// We are not tracking the stack pointer adjustment by the callee, so make		// We are not tracking the stack pointer adjustment by the callee, so make
// sure we restore the stack pointer immediately after the call, there may		// sure we restore the stack pointer immediately after the call, there may
// be spill code inserted between the CALL and ADJCALLSTACKUP instructions.		// be spill code inserted between the CALL and ADJCALLSTACKUP instructions.
MachineBasicBlock::iterator B = MBB.begin();		MachineBasicBlock::iterator B = MBB.begin();
while (I != B && !std::prev(I)->isCall())		while (I != B && !std::prev(I)->isCall())
--I;		--I;
MBB.insert(I, New);		MBB.insert(I, New);
}		}
}		}
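
The arithmetic on the right-hand side of eliminateCallFramePseudoInstr can be checked in isolation. The following is a minimal sketch, not the patch's code: `residualAdjust` is a hypothetical helper that takes the two immediates of the pseudo and a stack alignment, and returns the immediate that would end up on the emitted `sub`/`add` (0 meaning no instruction is emitted). It assumes there is no reserved call frame and that the internal amount never exceeds the aligned amount.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only (not part of the patch).
// Amount:      first immediate of ADJCALLSTACKDOWN/UP (bytes of outgoing args).
// InternalAmt: second immediate (bytes already handled inside the sequence:
//              pushes on frame setup, callee pops on frame destroy).
// StackAlign:  stack alignment in bytes, assumed to be non-zero.
uint64_t residualAdjust(uint64_t Amount, uint64_t InternalAmt,
                        uint64_t StackAlign) {
  if (Amount == 0)
    return 0; // Nothing to adjust; the pseudo is simply erased.
  // Round up to the next alignment boundary, as the patch does.
  Amount = (Amount + StackAlign - 1) / StackAlign * StackAlign;
  // Factor out what the pushes (or the callee) already took care of.
  return Amount - InternalAmt;
}
```

With 16 bytes of arguments that are all pushed and a 32-byte alignment (the ALIGNED run of test5 in movtopush.ll), the helper yields 16, which lines up with the expected `subl $16, %esp`. With a 4-byte alignment (assumed here for the NORMAL run) it yields 0, consistent with the `NORMAL-NOT: subl` checks.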

lib/Target/X86/X86InstrCompiler.td



	// ADJCALLSTACKDOWN/UP implicitly use/def ESP because they may be expanded into			// ADJCALLSTACKDOWN/UP implicitly use/def ESP because they may be expanded into
	// a stack adjustment and the codegen must know that they may modify the stack			// a stack adjustment and the codegen must know that they may modify the stack
	// pointer before prolog-epilog rewriting occurs.			// pointer before prolog-epilog rewriting occurs.
	// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become			// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become
	// sub / add which can clobber EFLAGS.			// sub / add which can clobber EFLAGS.
	let Defs = [ESP, EFLAGS], Uses = [ESP] in {			let Defs = [ESP, EFLAGS], Uses = [ESP] in {
	def ADJCALLSTACKDOWN32 : I<0, Pseudo, (outs), (ins i32imm:$amt),			def ADJCALLSTACKDOWN32 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	"#ADJCALLSTACKDOWN",			"#ADJCALLSTACKDOWN",
	[(X86callseq_start timm:$amt)]>,			[]>,
	Requires<[NotLP64]>;			Requires<[NotLP64]>;
	def ADJCALLSTACKUP32 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),			def ADJCALLSTACKUP32 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	"#ADJCALLSTACKUP",			"#ADJCALLSTACKUP",
	[(X86callseq_end timm:$amt1, timm:$amt2)]>,			[(X86callseq_end timm:$amt1, timm:$amt2)]>,
	Requires<[NotLP64]>;			Requires<[NotLP64]>;
	}			}
				def : Pat<(X86callseq_start timm:$amt1),
				(ADJCALLSTACKDOWN32 i32imm:$amt1, 0)>, Requires<[NotLP64]>;


	// ADJCALLSTACKDOWN/UP implicitly use/def RSP because they may be expanded into			// ADJCALLSTACKDOWN/UP implicitly use/def RSP because they may be expanded into
	// a stack adjustment and the codegen must know that they may modify the stack			// a stack adjustment and the codegen must know that they may modify the stack
	// pointer before prolog-epilog rewriting occurs.			// pointer before prolog-epilog rewriting occurs.
	// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become			// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become
	// sub / add which can clobber EFLAGS.			// sub / add which can clobber EFLAGS.
	let Defs = [RSP, EFLAGS], Uses = [RSP] in {			let Defs = [RSP, EFLAGS], Uses = [RSP] in {
	def ADJCALLSTACKDOWN64 : I<0, Pseudo, (outs), (ins i32imm:$amt),			def ADJCALLSTACKDOWN64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	"#ADJCALLSTACKDOWN",			"#ADJCALLSTACKDOWN",
	[(X86callseq_start timm:$amt)]>,			[]>,
	Requires<[IsLP64]>;			Requires<[IsLP64]>;
	def ADJCALLSTACKUP64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),			def ADJCALLSTACKUP64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	"#ADJCALLSTACKUP",			"#ADJCALLSTACKUP",
	[(X86callseq_end timm:$amt1, timm:$amt2)]>,			[(X86callseq_end timm:$amt1, timm:$amt2)]>,
	Requires<[IsLP64]>;			Requires<[IsLP64]>;
	}			}
				def : Pat<(X86callseq_start timm:$amt1),
				(ADJCALLSTACKDOWN64 i32imm:$amt1, 0)>, Requires<[IsLP64]>;


	// x86-64 va_start lowering magic.			// x86-64 va_start lowering magic.
	let usesCustomInserter = 1, Defs = [EFLAGS] in {			let usesCustomInserter = 1, Defs = [EFLAGS] in {
	def VASTART_SAVE_XMM_REGS : I<0, Pseudo,			def VASTART_SAVE_XMM_REGS : I<0, Pseudo,
	(outs),			(outs),
	(ins GR8:$al,			(ins GR8:$al,
	i64imm:$regsavefi, i64imm:$offset,			i64imm:$regsavefi, i64imm:$offset,

lib/Target/X86/X86InstrInfo.h

public:
explicit X86InstrInfo(X86Subtarget &STI);		explicit X86InstrInfo(X86Subtarget &STI);

/// getRegisterInfo - TargetInstrInfo is a superset of MRegister info. As		/// getRegisterInfo - TargetInstrInfo is a superset of MRegister info. As
/// such, whenever a client has an instance of instruction info, it should		/// such, whenever a client has an instance of instruction info, it should
/// always be able to get register info as well (through this method).		/// always be able to get register info as well (through this method).
///		///
const X86RegisterInfo &getRegisterInfo() const { return RI; }		const X86RegisterInfo &getRegisterInfo() const { return RI; }

		/// getSPAdjust - This returns the stack pointer adjustment made by
		/// this instruction. For x86, we need to handle more complex call
		/// sequences involving PUSHes.
		int getSPAdjust(const MachineInstr *MI) const override;

/// isCoalescableExtInstr - Return true if the instruction is a "coalescable"		/// isCoalescableExtInstr - Return true if the instruction is a "coalescable"
/// extension instruction. That is, it's like a copy where it's legal for the		/// extension instruction. That is, it's like a copy where it's legal for the
/// source to overlap the destination. e.g. X86::MOVSX64rr32. If this returns		/// source to overlap the destination. e.g. X86::MOVSX64rr32. If this returns
/// true, then it's expected the pre-extension value is available as a subreg		/// true, then it's expected the pre-extension value is available as a subreg
/// of the result register. This also returns the sub-register index in		/// of the result register. This also returns the sub-register index in
/// SubIdx.		/// SubIdx.
bool isCoalescableExtInstr(const MachineInstr &MI,		bool isCoalescableExtInstr(const MachineInstr &MI,
unsigned &SrcReg, unsigned &DstReg,		unsigned &SrcReg, unsigned &DstReg,

lib/Target/X86/X86InstrInfo.cpp

case X86::MOVSX64rr32:
break;		break;
}		}
return true;		return true;
}		}
}		}
return false;		return false;
}		}

		int X86InstrInfo::getSPAdjust(const MachineInstr *MI) const {
		const MachineFunction *MF = MI->getParent()->getParent();
		const TargetFrameLowering *TFI = MF->getSubtarget().getFrameLowering();

		if (MI->getOpcode() == getCallFrameSetupOpcode() ||
		MI->getOpcode() == getCallFrameDestroyOpcode()) {
		unsigned StackAlign = TFI->getStackAlignment();
		int SPAdj = (MI->getOperand(0).getImm() + StackAlign - 1) / StackAlign *
		StackAlign;

		SPAdj -= MI->getOperand(1).getImm();

		if (MI->getOpcode() == getCallFrameSetupOpcode())
		return SPAdj;
		else
		return -SPAdj;
		}

		// FIXME: This is horrible.
		// The problem is that to know whether a call adjusts the stack, we
		// need information that is bound to the following ADJCALLSTACKUP
		// pseudo. This leaves us with a choice:
		// 1) Look for the next ADJCALLSTACKUP, like the code below. This
		// works, but certainly doesn't belong here.
		rnk: This is the best thing I can think of at the moment. =/
		mkuper: Too bad. :-\ So you think I should commit with this code as is? This shouldn't be a huge problem in terms of compile-time (since I'm looking only until the next call, it can't go quadratic), but it's insanely ugly.
		rnk: Yeah, if we go with this MI pass approach to mov -> push conversion, then we'll have to keep this ADJCALLSTACKUP scan. We aren't going to move the callee cleanup stack adjustment onto the CALL instr without major changes.
		mkuper: This will have to happen regardless of the MI pass vs. DAG approach. I mean, I still think doing it on the DAG is unfeasible, but even if we could do that, it wouldn't help. This code is used for the case where fi resolution needs to handle a sequence where there is a fi reference between the call and the adjcallstackup, with callee cleanup for the call. This is just a side effect of making canSimplifyCallFramePseudos return false.
		rnk: I was imagining in the DAG LowerCall implementation we emit FrameIndex operands with some kind of SP offset to indicate the current stack level. We'd end up with MI looking like this:
		    ADJCALLSTACKDOWN32 <N> ; N is <size-of-args> % <stack-alignment>, which is usually zero
		    PUSH32rmm <fi> <sp offset, N>
		    PUSH32rmm <fi> <sp offset, N + 4>
		    PUSH32rmm <fi> <sp offset, N + 8>
		    CALL32rm <fi> <sp offset, N + 12>
		    ADJCALLSTACKUP32 <N + 12>
		The main thing is that if we commit to pushes instead of movs at DAG time, it's impossible for the push conversion to fail for hard to diagnose reasons. It looks like the frame index MachineOperand type has an unused offset field.
		// 2) Factor out the code that makes the decision on whether a function
		// has callee-pop, and call it here as well as when creating the
		// ADJCALLSTACKUP. This is also insanely ugly.
		// 3) Don't use SP adjustments (in practice, disable mov -> push
		// transformation) in the presence of callee-pop. This would be
		// unfortunate, but may be a good temporary solution.
		// 4) ?
		rnk: I would shorten this to just something like "look for the ADJCALLSTACKUP instr that follows the call".
		if (MI->isCall()) {
		const MachineBasicBlock* MBB = MI->getParent();
		auto I = ++MachineBasicBlock::const_iterator(MI);
		for (auto E = MBB->end(); I != E; ++I) {
		if (I->getOpcode() == getCallFrameDestroyOpcode() ||
		I->isCall())
		break;
		}

		// If we could not find a frame destroy opcode, then it has already
		// been simplified, so we don't care.
		if (I->getOpcode() != getCallFrameDestroyOpcode())
		return 0;

		return -(I->getOperand(1).getImm());
		}

		// Currently handle only PUSHes we can reasonably expect to see
		// in call sequences
		switch (MI->getOpcode()) {
		rnk: I wonder if it's possible for __readeflags() (pushf ; pop %reg) or others to get folded into a call sequence. Probably not.
		mkuper: I don't see how it could happen. In any case, we won't match either the pushf or the pop, so it should be ok.
		default:
		return 0;
		case X86::PUSH32i8:
		case X86::PUSH32r:
		case X86::PUSH32rmm:
		case X86::PUSH32rmr:
		case X86::PUSHi32:
		return 4;
		}
		}
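
The bookkeeping that getSPAdjust performs can be modelled without any LLVM types. What follows is a sketch under stated assumptions, not the patch itself: `Kind`, `Instr`, `spAdjust` and `netAdjust` are hypothetical names, a basic block is reduced to a vector of tagged records, and the forward scan stops explicitly at the end of the block, which is a choice of this sketch rather than something taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-ins for MachineInstr; all names here are hypothetical.
enum class Kind { FrameSetup, FrameDestroy, Call, Push32, Other };

struct Instr {
  Kind K;
  int64_t Amt1; // First immediate of a frame pseudo.
  int64_t Amt2; // Second immediate of a frame pseudo.
  Instr(Kind K, int64_t Amt1 = 0, int64_t Amt2 = 0)
      : K(K), Amt1(Amt1), Amt2(Amt2) {}
};

// Stack pointer adjustment attributed to Block[Idx].
int64_t spAdjust(const std::vector<Instr> &Block, std::size_t Idx,
                 int64_t StackAlign) {
  const Instr &MI = Block[Idx];
  switch (MI.K) {
  case Kind::FrameSetup:
  case Kind::FrameDestroy: {
    // Round the outgoing-argument size up to the stack alignment, then
    // drop the part handled inside the sequence (pushes or callee pops).
    int64_t Adj = (MI.Amt1 + StackAlign - 1) / StackAlign * StackAlign;
    Adj -= MI.Amt2;
    return MI.K == Kind::FrameSetup ? Adj : -Adj;
  }
  case Kind::Call: {
    // Scan forward for the frame destroy that follows this call.
    for (std::size_t J = Idx + 1; J < Block.size(); ++J) {
      if (Block[J].K == Kind::FrameDestroy)
        return -Block[J].Amt2; // Bytes popped by the callee.
      if (Block[J].K == Kind::Call)
        break;
    }
    return 0; // No frame destroy before the next call or the block end.
  }
  case Kind::Push32:
    return 4;
  case Kind::Other:
    return 0;
  }
  return 0;
}

// Sum of the adjustments over a whole block.
int64_t netAdjust(const std::vector<Instr> &Block, int64_t StackAlign) {
  int64_t Sum = 0;
  for (std::size_t I = 0; I < Block.size(); ++I)
    Sum += spAdjust(Block, I, StackAlign);
  return Sum;
}
```

A useful sanity check on such a model is that the adjustments over a complete call sequence sum to zero, both when the caller cleans up (second immediate of the frame destroy is 0) and when the callee pops its arguments (second immediate equals the argument size).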

/// isFrameOperand - Return true and the FrameIndex if the specified		/// isFrameOperand - Return true and the FrameIndex if the specified
/// operand and follow operands form a reference to the stack frame.		/// operand and follow operands form a reference to the stack frame.
bool X86InstrInfo::isFrameOperand(const MachineInstr *MI, unsigned int Op,		bool X86InstrInfo::isFrameOperand(const MachineInstr *MI, unsigned int Op,
int &FrameIndex) const {		int &FrameIndex) const {
if (MI->getOperand(Op+X86::AddrBaseReg).isFI() &&		if (MI->getOperand(Op+X86::AddrBaseReg).isFI() &&
MI->getOperand(Op+X86::AddrScaleAmt).isImm() &&		MI->getOperand(Op+X86::AddrScaleAmt).isImm() &&
MI->getOperand(Op+X86::AddrIndexReg).isReg() &&		MI->getOperand(Op+X86::AddrIndexReg).isReg() &&
MI->getOperand(Op+X86::AddrDisp).isImm() &&		MI->getOperand(Op+X86::AddrDisp).isImm() &&

lib/Target/X86/X86MachineFunctionInfo.h

class X86MachineFunctionInfo : public MachineFunctionInfo {
unsigned VarArgsGPOffset;		unsigned VarArgsGPOffset;
/// VarArgsFPOffset - X86-64 vararg func fp reg offset.		/// VarArgsFPOffset - X86-64 vararg func fp reg offset.
unsigned VarArgsFPOffset;		unsigned VarArgsFPOffset;
/// ArgumentStackSize - The number of bytes on stack consumed by the arguments		/// ArgumentStackSize - The number of bytes on stack consumed by the arguments
/// being passed on the stack.		/// being passed on the stack.
unsigned ArgumentStackSize;		unsigned ArgumentStackSize;
/// NumLocalDynamics - Number of local-dynamic TLS accesses.		/// NumLocalDynamics - Number of local-dynamic TLS accesses.
unsigned NumLocalDynamics;		unsigned NumLocalDynamics;
		/// HasPushSequences - Keeps track of whether this function uses sequences
		/// of pushes to pass function parameters.
		bool HasPushSequences;

private:		private:
/// ForwardedMustTailRegParms - A list of virtual and physical registers		/// ForwardedMustTailRegParms - A list of virtual and physical registers
/// that must be forwarded to every musttail call.		/// that must be forwarded to every musttail call.
SmallVector<ForwardedRegister, 1> ForwardedMustTailRegParms;		SmallVector<ForwardedRegister, 1> ForwardedMustTailRegParms;

public:		public:
X86MachineFunctionInfo() : ForceFramePointer(false),		X86MachineFunctionInfo() : ForceFramePointer(false),
RestoreBasePointerOffset(0),		RestoreBasePointerOffset(0),
CalleeSavedFrameSize(0),		CalleeSavedFrameSize(0),
BytesToPopOnReturn(0),		BytesToPopOnReturn(0),
ReturnAddrIndex(0),		ReturnAddrIndex(0),
TailCallReturnAddrDelta(0),		TailCallReturnAddrDelta(0),
SRetReturnReg(0),		SRetReturnReg(0),
GlobalBaseReg(0),		GlobalBaseReg(0),
VarArgsFrameIndex(0),		VarArgsFrameIndex(0),
RegSaveFrameIndex(0),		RegSaveFrameIndex(0),
VarArgsGPOffset(0),		VarArgsGPOffset(0),
VarArgsFPOffset(0),		VarArgsFPOffset(0),
ArgumentStackSize(0),		ArgumentStackSize(0),
NumLocalDynamics(0) {}		NumLocalDynamics(0),
		HasPushSequences(false) {}

explicit X86MachineFunctionInfo(MachineFunction &MF)		explicit X86MachineFunctionInfo(MachineFunction &MF)
: ForceFramePointer(false),		: ForceFramePointer(false),
RestoreBasePointerOffset(0),		RestoreBasePointerOffset(0),
CalleeSavedFrameSize(0),		CalleeSavedFrameSize(0),
BytesToPopOnReturn(0),		BytesToPopOnReturn(0),
ReturnAddrIndex(0),		ReturnAddrIndex(0),
TailCallReturnAddrDelta(0),		TailCallReturnAddrDelta(0),
SRetReturnReg(0),		SRetReturnReg(0),
GlobalBaseReg(0),		GlobalBaseReg(0),
VarArgsFrameIndex(0),		VarArgsFrameIndex(0),
RegSaveFrameIndex(0),		RegSaveFrameIndex(0),
VarArgsGPOffset(0),		VarArgsGPOffset(0),
VarArgsFPOffset(0),		VarArgsFPOffset(0),
ArgumentStackSize(0),		ArgumentStackSize(0),
NumLocalDynamics(0) {}		NumLocalDynamics(0),
		HasPushSequences(false) {}

bool getForceFramePointer() const { return ForceFramePointer;}		bool getForceFramePointer() const { return ForceFramePointer;}
void setForceFramePointer(bool forceFP) { ForceFramePointer = forceFP; }		void setForceFramePointer(bool forceFP) { ForceFramePointer = forceFP; }

		bool getHasPushSequences() const { return HasPushSequences; }
		void setHasPushSequences(bool HasPush) { HasPushSequences = HasPush; }

bool getRestoreBasePointer() const { return RestoreBasePointerOffset!=0; }		bool getRestoreBasePointer() const { return RestoreBasePointerOffset!=0; }
void setRestoreBasePointer(const MachineFunction *MF);		void setRestoreBasePointer(const MachineFunction *MF);
int getRestoreBasePointerOffset() const {return RestoreBasePointerOffset; }		int getRestoreBasePointerOffset() const {return RestoreBasePointerOffset; }

unsigned getCalleeSavedFrameSize() const { return CalleeSavedFrameSize; }		unsigned getCalleeSavedFrameSize() const { return CalleeSavedFrameSize; }
void setCalleeSavedFrameSize(unsigned bytes) { CalleeSavedFrameSize = bytes; }		void setCalleeSavedFrameSize(unsigned bytes) { CalleeSavedFrameSize = bytes; }

unsigned getBytesToPopOnReturn() const { return BytesToPopOnReturn; }		unsigned getBytesToPopOnReturn() const { return BytesToPopOnReturn; }

lib/Target/X86/X86RegisterInfo.cpp

bool X86RegisterInfo::hasReservedSpillSlot(const MachineFunction &MF,
// this function neither used nor tested.		// this function neither used nor tested.
llvm_unreachable("Unused function on X86. Otherwise need a test case.");		llvm_unreachable("Unused function on X86. Otherwise need a test case.");
}		}

void		void
X86RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,		X86RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
int SPAdj, unsigned FIOperandNum,		int SPAdj, unsigned FIOperandNum,
RegScavenger *RS) const {		RegScavenger *RS) const {
assert(SPAdj == 0 && "Unexpected");

MachineInstr &MI = *II;		MachineInstr &MI = *II;
MachineFunction &MF = *MI.getParent()->getParent();		MachineFunction &MF = *MI.getParent()->getParent();
const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();		const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();
int FrameIndex = MI.getOperand(FIOperandNum).getIndex();		int FrameIndex = MI.getOperand(FIOperandNum).getIndex();
unsigned BasePtr;		unsigned BasePtr;

unsigned Opc = MI.getOpcode();		unsigned Opc = MI.getOpcode();
bool AfterFPPop = Opc == X86::TAILJMPm64 || Opc == X86::TAILJMPm;		bool AfterFPPop = Opc == X86::TAILJMPm64 || Opc == X86::TAILJMPm;
int FIOffset;		int FIOffset;
if (AfterFPPop) {		if (AfterFPPop) {
// Tail call jmp happens after FP is popped.		// Tail call jmp happens after FP is popped.
const MachineFrameInfo *MFI = MF.getFrameInfo();		const MachineFrameInfo *MFI = MF.getFrameInfo();
FIOffset = MFI->getObjectOffset(FrameIndex) - TFI->getOffsetOfLocalArea();		FIOffset = MFI->getObjectOffset(FrameIndex) - TFI->getOffsetOfLocalArea();
} else		} else
FIOffset = TFI->getFrameIndexOffset(MF, FrameIndex);		FIOffset = TFI->getFrameIndexOffset(MF, FrameIndex);

		if (BasePtr == StackPtr)
		FIOffset += SPAdj;
		mkuper: And, apparently, this is still wrong, because eliminateCallFramePseudoInstr() may actually adjust the SP by a different amount than what PEI passes as the SPAdj, e.g. due to stack alignment concerns.

// The frame index format for stackmaps and patchpoints is different from the		// The frame index format for stackmaps and patchpoints is different from the
// X86 format. It only has a FI and an offset.		// X86 format. It only has a FI and an offset.
if (Opc == TargetOpcode::STACKMAP || Opc == TargetOpcode::PATCHPOINT) {		if (Opc == TargetOpcode::STACKMAP || Opc == TargetOpcode::PATCHPOINT) {
assert(BasePtr == FramePtr && "Expected the FP as base register");		assert(BasePtr == FramePtr && "Expected the FP as base register");
int64_t Offset = MI.getOperand(FIOperandNum + 1).getImm() + FIOffset;		int64_t Offset = MI.getOperand(FIOperandNum + 1).getImm() + FIOffset;
MI.getOperand(FIOperandNum + 1).ChangeToImmediate(Offset);		MI.getOperand(FIOperandNum + 1).ChangeToImmediate(Offset);
return;		return;
}		}
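
The effect of the two lines added to eliminateFrameIndex can be illustrated with a small stand-alone function. This is only a sketch: `resolveOffset` and its parameter names are hypothetical, and it takes SPAdj at face value, which the inline comment above warns may not match what eliminateCallFramePseudoInstr actually emits.

```cpp
#include <cassert>

// Hypothetical helper mirroring the two added lines, for illustration only.
// FIOffset:       offset of the frame object from the base register, computed
//                 as if no call sequence were in progress.
// SPAdj:          bytes by which the stack pointer has been lowered so far
//                 inside the current call sequence.
// BaseIsStackPtr: whether the object is addressed off the stack pointer.
int resolveOffset(int FIOffset, int SPAdj, bool BaseIsStackPtr) {
  if (BaseIsStackPtr)
    FIOffset += SPAdj; // The object is now further above the lowered SP.
  return FIOffset;     // Frame-pointer-relative offsets are unaffected.
}
```

For test9 in movtopush.ll, assuming `%p` would sit at `0(%esp)` outside any call sequence, the second call lowers the stack pointer by 16 bytes before the address is taken, so the sketch gives 16, in line with the expected `leal 16(%esp)`.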

lib/Target/X86/X86TargetMachine.cpp

public:

const X86Subtarget &getX86Subtarget() const {		const X86Subtarget &getX86Subtarget() const {
return *getX86TargetMachine().getSubtargetImpl();		return *getX86TargetMachine().getSubtargetImpl();
}		}

void addIRPasses() override;		void addIRPasses() override;
bool addInstSelector() override;		bool addInstSelector() override;
bool addILPOpts() override;		bool addILPOpts() override;
		void addPreRegAlloc() override;
void addPostRegAlloc() override;		void addPostRegAlloc() override;
void addPreEmitPass() override;		void addPreEmitPass() override;
};		};
} // namespace		} // namespace

TargetPassConfig *X86TargetMachine::createPassConfig(PassManagerBase &PM) {		TargetPassConfig *X86TargetMachine::createPassConfig(PassManagerBase &PM) {
return new X86PassConfig(this, PM);		return new X86PassConfig(this, PM);
}		}
bool X86PassConfig::addInstSelector() {
return false;		return false;
}		}

bool X86PassConfig::addILPOpts() {		bool X86PassConfig::addILPOpts() {
addPass(&EarlyIfConverterID);		addPass(&EarlyIfConverterID);
return true;		return true;
}		}

		void X86PassConfig::addPreRegAlloc() {
		addPass(createX86CallFrameOptimization());
		}

void X86PassConfig::addPostRegAlloc() {		void X86PassConfig::addPostRegAlloc() {
addPass(createX86FloatingPointStackifierPass());		addPass(createX86FloatingPointStackifierPass());
}		}

void X86PassConfig::addPreEmitPass() {		void X86PassConfig::addPreEmitPass() {
if (getOptLevel() != CodeGenOpt::None && getX86Subtarget().hasSSE2())		if (getOptLevel() != CodeGenOpt::None && getX86Subtarget().hasSSE2())
addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));		addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));

if (UseVZeroUpper)		if (UseVZeroUpper)
addPass(createX86IssueVZeroUpperPass());		addPass(createX86IssueVZeroUpperPass());

if (getOptLevel() != CodeGenOpt::None) {		if (getOptLevel() != CodeGenOpt::None) {
addPass(createX86PadShortFunctions());		addPass(createX86PadShortFunctions());
addPass(createX86FixupLEAs());		addPass(createX86FixupLEAs());
}		}
}		}

test/CodeGen/X86/inalloca-invoke.ll

; CHECK: leal 12(%[[beg]]), %[[end:[^ ]*]]

call void @begin(%Iter* sret %temp.lvalue)		call void @begin(%Iter* sret %temp.lvalue)
; CHECK: calll _begin		; CHECK: calll _begin

invoke void @plus(%Iter* sret %end, %Iter* %temp.lvalue, i32 4)		invoke void @plus(%Iter* sret %end, %Iter* %temp.lvalue, i32 4)
to label %invoke.cont unwind label %lpad		to label %invoke.cont unwind label %lpad

; Uses end as sret param.		; Uses end as sret param.
; CHECK: movl %[[end]], (%esp)		; CHECK: pushl %[[end]]
; CHECK: calll _plus		; CHECK: calll _plus

invoke.cont:		invoke.cont:
call void @begin(%Iter* sret %beg)		call void @begin(%Iter* sret %beg)

; CHECK: pushl %[[beg]]		; CHECK: pushl %[[beg]]
; CHECK: calll _begin		; CHECK: calll _begin


test/CodeGen/X86/movtopush.ll

	; RUN: llc < %s -mtriple=i686-windows | FileCheck %s -check-prefix=NORMAL			; RUN: llc < %s -mtriple=i686-windows | FileCheck %s -check-prefix=NORMAL
				; RUN: llc < %s -mtriple=x86_64-windows | FileCheck %s -check-prefix=X64
	; RUN: llc < %s -mtriple=i686-windows -force-align-stack -stack-alignment=32 | FileCheck %s -check-prefix=ALIGNED			; RUN: llc < %s -mtriple=i686-windows -force-align-stack -stack-alignment=32 | FileCheck %s -check-prefix=ALIGNED

	declare void @good(i32 %a, i32 %b, i32 %c, i32 %d)			declare void @good(i32 %a, i32 %b, i32 %c, i32 %d)
	declare void @inreg(i32 %a, i32 inreg %b, i32 %c, i32 %d)			declare void @inreg(i32 %a, i32 inreg %b, i32 %c, i32 %d)

	; Here, we should have a reserved frame, so we don't expect pushes			; Here, we should have a reserved frame, so we don't expect pushes
	; NORMAL-LABEL: test1			; NORMAL-LABEL: test1:
	; NORMAL: subl $16, %esp			; NORMAL: subl $16, %esp
	; NORMAL-NEXT: movl $4, 12(%esp)			; NORMAL-NEXT: movl $4, 12(%esp)
	; NORMAL-NEXT: movl $3, 8(%esp)			; NORMAL-NEXT: movl $3, 8(%esp)
	; NORMAL-NEXT: movl $2, 4(%esp)			; NORMAL-NEXT: movl $2, 4(%esp)
	; NORMAL-NEXT: movl $1, (%esp)			; NORMAL-NEXT: movl $1, (%esp)
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test1() {			define void @test1() {
	entry:			entry:
	call void @good(i32 1, i32 2, i32 3, i32 4)			call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Here, we expect a sequence of 4 immediate pushes			; We're optimizing for code size, so we should get pushes for x86,
	; NORMAL-LABEL: test2			; even though there is a reserved call frame.
				; Make sure we don't touch x86-64
				; NORMAL-LABEL: test1b:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: pushl $4
				; NORMAL-NEXT: pushl $3
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				; X64-LABEL: test1b:
				; X64: movl $1, %ecx
				; X64-NEXT: movl $2, %edx
				; X64-NEXT: movl $3, %r8d
				; X64-NEXT: movl $4, %r9d
				; X64-NEXT: callq good
				define void @test1b() optsize {
				entry:
				call void @good(i32 1, i32 2, i32 3, i32 4)
				ret void
				}

				; Same as above, but for minsize
				; NORMAL-LABEL: test1c:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: pushl $4
				; NORMAL-NEXT: pushl $3
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				define void @test1c() minsize {
				entry:
				call void @good(i32 1, i32 2, i32 3, i32 4)
				ret void
				}

				; If we don't have a reserved frame, we should have pushes
				; NORMAL-LABEL: test2:
	; NORMAL-NOT: subl {{.*}} %esp			; NORMAL-NOT: subl {{.*}} %esp
	; NORMAL: pushl $4			; NORMAL: pushl $4
	; NORMAL-NEXT: pushl $3			; NORMAL-NEXT: pushl $3
	; NORMAL-NEXT: pushl $2			; NORMAL-NEXT: pushl $2
	; NORMAL-NEXT: pushl $1			; NORMAL-NEXT: pushl $1
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test2(i32 %k) {			define void @test2(i32 %k) {
	entry:			entry:
	%a = alloca i32, i32 %k			%a = alloca i32, i32 %k
	call void @good(i32 1, i32 2, i32 3, i32 4)			call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Again, we expect a sequence of 4 immediate pushes			; Again, we expect a sequence of 4 immediate pushes
	; Checks that we generate the right pushes for >8bit immediates			; Checks that we generate the right pushes for >8bit immediates
	; NORMAL-LABEL: test2b			; NORMAL-LABEL: test2b:
	; NORMAL-NOT: subl {{.*}} %esp			; NORMAL-NOT: subl {{.*}} %esp
	; NORMAL: pushl $4096			; NORMAL: pushl $4096
	; NORMAL-NEXT: pushl $3072			; NORMAL-NEXT: pushl $3072
	; NORMAL-NEXT: pushl $2048			; NORMAL-NEXT: pushl $2048
	; NORMAL-NEXT: pushl $1024			; NORMAL-NEXT: pushl $1024
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test2b(i32 %k) {			define void @test2b() optsize {
	entry:			entry:
	%a = alloca i32, i32 %k
	call void @good(i32 1024, i32 2048, i32 3072, i32 4096)			call void @good(i32 1024, i32 2048, i32 3072, i32 4096)
	ret void			ret void
	}			}

	; The first push should push a register			; The first push should push a register
	; NORMAL-LABEL: test3			; NORMAL-LABEL: test3:
	; NORMAL-NOT: subl {{.*}} %esp			; NORMAL-NOT: subl {{.*}} %esp
	; NORMAL: pushl $4			; NORMAL: pushl $4
	; NORMAL-NEXT: pushl $3			; NORMAL-NEXT: pushl $3
	; NORMAL-NEXT: pushl $2			; NORMAL-NEXT: pushl $2
	; NORMAL-NEXT: pushl %e{{..}}			; NORMAL-NEXT: pushl %e{{..}}
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test3(i32 %k) {			define void @test3(i32 %k) optsize {
	entry:			entry:
	%a = alloca i32, i32 %k
	call void @good(i32 %k, i32 2, i32 3, i32 4)			call void @good(i32 %k, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; We don't support weird calling conventions			; We don't support weird calling conventions
	; NORMAL-LABEL: test4			; NORMAL-LABEL: test4:
	; NORMAL: subl $12, %esp			; NORMAL: subl $12, %esp
	; NORMAL-NEXT: movl $4, 8(%esp)			; NORMAL-NEXT: movl $4, 8(%esp)
	; NORMAL-NEXT: movl $3, 4(%esp)			; NORMAL-NEXT: movl $3, 4(%esp)
	; NORMAL-NEXT: movl $1, (%esp)			; NORMAL-NEXT: movl $1, (%esp)
	; NORMAL-NEXT: movl $2, %eax			; NORMAL-NEXT: movl $2, %eax
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test4(i32 %k) {			define void @test4() optsize {
	entry:			entry:
	%a = alloca i32, i32 %k
	call void @inreg(i32 1, i32 2, i32 3, i32 4)			call void @inreg(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Check that additional alignment is added when the pushes			; When there is no reserved call frame, check that additional alignment
	; don't add up to the required alignment.			; is added when the pushes don't add up to the required alignment.
	; ALIGNED-LABEL: test5			; ALIGNED-LABEL: test5:
	; ALIGNED: subl $16, %esp			; ALIGNED: subl $16, %esp
	; ALIGNED-NEXT: pushl $4			; ALIGNED-NEXT: pushl $4
	; ALIGNED-NEXT: pushl $3			; ALIGNED-NEXT: pushl $3
	; ALIGNED-NEXT: pushl $2			; ALIGNED-NEXT: pushl $2
	; ALIGNED-NEXT: pushl $1			; ALIGNED-NEXT: pushl $1
	; ALIGNED-NEXT: call			; ALIGNED-NEXT: call
	define void @test5(i32 %k) {			define void @test5(i32 %k) {
	entry:			entry:
	%a = alloca i32, i32 %k			%a = alloca i32, i32 %k
	call void @good(i32 1, i32 2, i32 3, i32 4)			call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Check that pushing the addresses of globals (Or generally, things that			; Check that pushing the addresses of globals (Or generally, things that
	; aren't exactly immediates) isn't broken.			; aren't exactly immediates) isn't broken.
	; Fixes PR21878.			; Fixes PR21878.
	; NORMAL-LABEL: test6			; NORMAL-LABEL: test6:
	; NORMAL: pushl $_ext			; NORMAL: pushl $_ext
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	declare void @f(i8*)			declare void @f(i8*)
	@ext = external constant i8			@ext = external constant i8

	define void @test6() {			define void @test6() {
	call void @f(i8* @ext)			call void @f(i8* @ext)
	br label %bb			br label %bb
	bb:			bb:
	alloca i32			alloca i32
	ret void			ret void
	}			}

				; Check that we fold simple cases into the push
				; NORMAL-LABEL: test7:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: movl 4(%esp), [[EAX:%e..]]
				; NORMAL-NEXT: pushl $4
				; NORMAL-NEXT: pushl ([[EAX]])
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				define void @test7(i32* %ptr) optsize {
				entry:
				%val = load i32* %ptr
				call void @good(i32 1, i32 2, i32 %val, i32 4)
				ret void
				}

				; But we don't want to fold stack-relative loads into the push,
				; because the offset will be wrong
				; NORMAL-LABEL: test8:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: movl 4(%esp), [[EAX:%e..]]
				; NORMAL-NEXT: pushl $4
				; NORMAL-NEXT: pushl [[EAX]]
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				define void @test8(i32* %ptr) optsize {
				entry:
				%val = ptrtoint i32* %ptr to i32
				call void @good(i32 1, i32 2, i32 %val, i32 4)
				ret void
				}

				; If one function is using push instructions, and the other isn't
				; (because it has frame-index references), then we must resolve
				; these references correctly.
				; NORMAL-LABEL: test9:
				; NORMAL-NOT: leal (%esp),
				; NORMAL: pushl $4
				; NORMAL-NEXT: pushl $3
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				; NORMAL: subl $16, %esp
				; NORMAL-NEXT: leal 16(%esp), [[EAX:%e..]]
				; NORMAL-NEXT: movl [[EAX]], 12(%esp)
				; NORMAL-NEXT: movl $7, 8(%esp)
				; NORMAL-NEXT: movl $6, 4(%esp)
				; NORMAL-NEXT: movl $5, (%esp)
				; NORMAL-NEXT: call
				define void @test9() optsize {
				entry:
				%p = alloca i32, align 4
				call void @good(i32 1, i32 2, i32 3, i32 4)
				%0 = ptrtoint i32* %p to i32
				call void @good(i32 5, i32 6, i32 7, i32 %0)
				ret void
				}
rnk (inline comment): Test case suggestions:

	; Where the callee is indirect via the stack, `call <fi>`
	define void @test10() optsize {
	  %stack_fptr = alloca void (i32, i32, i32, i32)*
	  store void (i32, i32, i32, i32)* @good, void (i32, i32, i32, i32) %stack_fptr
	  %good_ptr = load void (i32, i32, i32, i32) %stack_fptr
	  call void (i32, i32, i32, i32)* %good_ptr(i32 1, i32 2, i32 3, i32 4)
	  ret void
	}

	; We can't fold the load into the push here, skipping the store.
	@the_global = global i32
	define void @test11() optsize {
	  %myload = load i32* @the_global
	  store i32 42, i32* @the_global
	  call void @good(i32 %myload, i32 2, i32 3, i32 4)
	  ret void
	}
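The hazard behind the suggested test11 can be illustrated with a toy model. The sketch below is not LLVM code: it assumes memory is a plain dictionary and the stack a list, and the function names are hypothetical, chosen only for this illustration. It models the first argument of the call only, and shows that folding the load into the push moves the read past the store, so a different value is pushed.

```python
# Toy model (assumption: not LLVM code). Memory is a dict, the stack is a list.
# All names are hypothetical and exist only for this illustration.

def call_without_folding(memory):
    """Load first, then store, then push the value held in the register."""
    stack = []
    reg = memory["the_global"]          # %myload = load i32* @the_global
    memory["the_global"] = 42           # store i32 42, i32* @the_global
    stack.append(reg)                   # push the register holding %myload
    return stack[-1]


def call_with_unsafe_folding(memory):
    """Fold the load into the push, which moves the read past the store."""
    stack = []
    memory["the_global"] = 42           # the store now runs before the read
    stack.append(memory["the_global"])  # push with a folded memory operand
    return stack[-1]
```

With an initial value of 7 in `the_global`, the first function pushes 7 and the second pushes 42, which is why a transformation of this kind would need to refuse the fold when a store to the same location sits between the load and the call.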

Revision Contents

Diff 18084
