This is an archive of the discontinued LLVM Phabricator instance.

Differential D6789

[X86] Convert esp-relative movs of function arguments to pushes, step 2
ClosedPublic

Authored by mkuper on Dec 28 2014, 6:11 AM.

Details

Reviewers

nadav
delena
rnk

Commits

rG13fbd4526336: [X86] Convert esp-relative movs of function arguments to pushes, step 2
rGbd57186c763f: [X86] Convert esp-relative movs of function arguments to pushes, step 2
rL227752: [X86] Convert esp-relative movs of function arguments to pushes, step 2
rL227728: [X86] Convert esp-relative movs of function arguments to pushes, step 2

Summary

This is a first stab at the next step of the mov-to-push transformation.

It moves the transformation earlier in the pass order so that it can do load-folding, and prepares the required infrastructure.
It is still enabled only in cases where it should be a clear win - when we don't expect to have a reserved call frame, or when optimizing for size.
The next step will be a heuristic that makes a smarter decision on when this should be enabled.
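
The gating described above can be sketched as a small predicate. This is only an illustration that mirrors the checks visible in this revision's runOnMachineFunction(); the FunctionFacts struct and the function name are hypothetical stand-ins, not part of the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-function facts the pass consults.
struct FunctionFacts {
  bool KnobDisabled;       // models the hidden -no-mov-to-push option
  bool Is64Bit;            // models X86Subtarget::is64Bit()
  bool HasVarSizedObjects; // models MachineFrameInfo::hasVarSizedObjects()
  bool OptForSize;         // optsize or minsize attribute is present
};

// Returns true only for the "clear win" cases named in the summary:
// no reserved call frame is expected, or the function is optimized for size.
bool shouldConvertMovsToPushes(const FunctionFacts &F) {
  if (F.KnobDisabled)
    return false; // debugging knob turns the pass off
  if (F.Is64Bit)
    return false; // the pattern is not expected to match in 64-bit mode
  return F.HasVarSizedObjects || F.OptForSize;
}
```

A function with dynamic allocas or an optsize attribute passes the gate; everything else is left alone until a heuristic exists.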

As a side note - I've done some internal testing of the effect on code size, but I'd like to test things other people care about as well. So, if you have a publicly available x86-32 code base where code size matters, let me know.

Diff Detail

Event Timeline

mkuper updated this revision to Diff 17656. Dec 28 2014, 6:11 AM

mkuper retitled this revision from to [X86] Convert esp-relative movs of function arguments to pushes, step 2.

mkuper updated this object.

mkuper edited the test plan for this revision.

mkuper added reviewers: nadav, rnk, delena.

mkuper added a subscriber: Unknown Object (MLST).

Removed a horrible hack that was, in addition to being horrible, completely wrong, and added a test-case to cover the issue.

Also, ping?

I suggest also checking varargs and stdcall functions, where the callee clears the stack.

lib/Target/X86/CMakeLists.txt
17	Can you add this code to X86FrameLowering.cpp ?
lib/Target/X86/X86.h
70	I suggest to choose another name, something like optimizeCallFrameForSize
lib/Target/X86/X86ConvertMovsToPushes.cpp
40	I don't think that we really need this knob.
114	If you change instructions inside bb, your iterator may be broken.
213	It should be immediate, right? Can we have a relocation here?
225	SlowPush should be a property of the target, like slowLea
227	The comment is missing here.

Thanks, Elena!
Will upload a new version.

lib/Target/X86/X86.h
70	I wasn't happy with the name either, but didn't have any good ideas at the time. Will do.
lib/Target/X86/X86ConvertMovsToPushes.cpp
40	I'd rather keep this knob, it's fairly useful for debugging. Of course, it's internal only, not exposed to clang.
114	As far as I know, MBB iterators aren't invalidated by removing other instructions, and we don't remove the FrameSetup itself. But it's probably better to keep going from the FrameDestroy instead of the next instruction. Will change that.
213	It can be a relocation, but in that case, isImm() will fail. Will document that more clearly
225	I agree. Unfortunately, I've run out of bits. The Subtarget features are 64-bit bitfield, and they're all taken.
lib/Target/X86/X86RegisterInfo.cpp
508 ↗	(On Diff #17788)	And, apparently, this is still wrong, because eliminateCallFramePseudoInstr() may actually adjust the SP by a different amount than what PEI passes as the SPAdj, e.g. due to stack alignment concerns.
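
On the iterator question raised for line 114: a machine basic block keeps its instructions in a linked list, so an iterator to one instruction stays valid while other instructions are erased. The sketch below uses std::list as an analogy only (an assumption, it is not LLVM's list type), and eraseAfter is a hypothetical helper.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Erases every element equal to Victim that comes after Anchor and returns
// how many were erased. Anchor itself is never erased, so it stays valid.
int eraseAfter(std::list<int> &Block, std::list<int>::iterator Anchor,
               int Victim) {
  int Erased = 0;
  for (auto It = std::next(Anchor); It != Block.end();) {
    if (*It == Victim) {
      It = Block.erase(It); // erase returns the next valid iterator
      ++Erased;
    } else {
      ++It;
    }
  }
  return Erased;
}
```

The same reasoning is why continuing the scan from an instruction that is known to survive (such as the frame-destroy marker) is the safer choice.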

mkuper added inline comments. Jan 6 2015, 3:19 AM

lib/Target/X86/X86ConvertMovsToPushes.cpp
286	Argh. This is nonsense. Commented one thing, coded another... (mayStore() is extremely far from being a strong enough condition to allow this.)

rnk added inline comments. Jan 12 2015, 4:02 PM

lib/Target/X86/X86ConvertMovsToPushes.cpp
40	I would also like this as a temporary testing knob so that I can evaluate this across a large codebase.

So, this version should actually work (e.g. it can self-host and pass check-llvm, without the stackalign restriction of course, since that currently makes it a nop except on Windows).
Unfortunately, it has several big warts, so I'm not planning to commit it as is. This is more of a request for ideas on how to improve the code.

So, any ideas on how to make this sane, especially X86InstrInfo::getSPAdjust(), are welcome.

I haven't finished reviewing yet, but I've got to run and handle something personal.

At a high level, is there any reason we shouldn't commit to push/pop earlier to allow for better ISel, rather than trying to transform call sequences later? Specifically, I'm thinking about adding an X86ISD::PUSH DAG node and changing X86TargetLowering::LowerCall() to use it.

lib/CodeGen/PrologEpilogInserter.cpp
855–856 ↗	(On Diff #18084)	This seems like an x86-specific quirk, right? Given "push [esp + 8]", x86 chips will load [esp + 8] before adjusting esp, and I think this code motion accomplishes that. I'm OK with that motion so long as there are no other upstream LLVM backends with CISC-y instructions like "push [SP-mem]". :)
lib/Target/X86/X86ConvertMovsToPushes.cpp
12	s/stck/stack/
83–84	I think it's important to at least support __thiscall eventually, since that's a very common convention with one regparm.
85–87	I guess I would justify this more in terms of reducing the extra CFI that we would have to emit to describe the SP adjustments. Converting a few movs to pushes isn't worth the complexity.
144	Can you explain why this is unprofitable? I guess if we get here we are in dynamic alloca plus stack realignment land, i.e. the worst thing that could possibly happen. Is this about extra code for preserving the outgoing stack alignment then? Like on Linux, where we provide 16 byte stack alignment?

Thanks, Reid!

Waiting for the second part, you didn't get to the really horrible stuff yet...

Regarding the high level, two reasons:

1. It seemed like it was going to be simpler. I'm not so sure anymore, but I still think it is. (Note that we'll still need to fix all of the code that tracks SP adjustment, that's not going away in either case).
2. The main problem is that the next step after this is going to be a function-scope heuristic. To use this transformation for even one call-site, I have to disable the reserved frame for the whole function. So, I need to try to approximate the impact on the whole function (which contains some calls that will be converted to use pushes, and some calls that won't be). I don't see how this can be done on the DAG level.

lib/CodeGen/PrologEpilogInserter.cpp
855–856 ↗	(On Diff #18084)	This call to SPAdjust() always returns 0 right now (barring the code in this patch), it was added as part of my refactoring in D6863, and I added it in the wrong place. The motivation here wasn't a push, actually, since I try to never generate push [esp + 8], that's filtered out by the code in the optimization. Although I can probably start generating them - I was trying to filter them out precisely because I didn't want all of this complexity at the first stage, but apparently it's necessary. The problem is that once we don't have a reserved call frame (regardless of the push transformation), you can have things like CALL32r <fi#1>, where the call is callee-pop. So you need to resolve the indirect call using the stack-pointer from before the call.
lib/Target/X86/X86ConvertMovsToPushes.cpp
83–84	Yes, and maybe even for _fastcall (It looks like gcc will do this for fastcall, icc won't). But I am still trying to do this gradually, to the extent that I can. :-)
85–87	You're right, that too.
144	If we get here, we're in opt-for-size + stack-realignment land. And, yes, that's exactly what it is about. If you are passing only one parameter, the original code would be:
    mov %eax, 128(%esp)
    call $foo
Without re-alignment, you have:
    push %eax
    call $foo
    add $4, %esp
which is still a win in terms of code size. With re-alignment, you get:
    sub $16, %esp
    push %eax
    call $foo
    add $12, %esp
Which is... questionable. The code size for the sequence is the same (in this case, 7 bytes for both, not including the call), but if you have other call sites which you didn't convert, you may actually lose. And, of course, you lose performance (3 instructions instead of 1) without anything to show for it. Once there is a heuristic that tries to estimate the overhead, we can address this on a case-by-case basis (e.g. if we have 16-byte stack re-alignment, but most call-sites have a lot of parameters, then it's still worth it.)
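
The byte counts in the comment above can be reproduced with a little arithmetic. This is a sketch under assumptions that are not stated in the review: the usual 32-bit x86 encodings (7 bytes for a register store to %esp with a 32-bit displacement, 1 byte for push of a register, 3 bytes for add/sub of an 8-bit immediate on %esp) and caller cleanup. The function names are hypothetical.

```cpp
#include <cassert>

// Assumed 32-bit x86 encoding sizes in bytes (not taken from the patch).
constexpr int MovRegToEspDisp32 = 7; // mov %eax, 128(%esp)
constexpr int PushReg = 1;           // push %eax
constexpr int AdjustEspImm8 = 3;     // add/sub $imm8, %esp

// Size of the mov-based argument setup, excluding the call itself.
int movSequenceBytes(int NumArgs) { return NumArgs * MovRegToEspDisp32; }

// Size of the push-based setup plus stack cleanup, excluding the call.
int pushSequenceBytes(int NumArgs, bool NeedsRealign) {
  int Bytes = NumArgs * PushReg + AdjustEspImm8; // pushes + add after the call
  if (NeedsRealign)
    Bytes += AdjustEspImm8; // extra sub before the pushes
  return Bytes;
}
```

With one argument this gives 7 bytes for the mov form, 4 bytes for pushes without re-alignment, and 7 bytes with it. A store with a displacement that fits in 8 bits would be shorter, which makes the push form less attractive than this example suggests.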

rnk added inline comments. Jan 13 2015, 3:58 PM

lib/Target/X86/X86ConvertMovsToPushes.cpp
129–131	I think I misinterpreted this on the first pass. We always expect this to be profitable if we know we can't reserve space for the call frame. Maybe rename the bool to CannotReserveFrame to match the sense?
144	Based on my misinterpretation, I think I understand why you get this code. SP is assumed to be aligned coming into the sequence. We realign SP after dynamic allocas. The sequence is probably more like:
    sub $12, %esp
    push %eax
    call $foo
    add $16, %esp
I can see why this is less profitable.
209	std::map is really malloc heavy. This can probably be a SmallVector<MachineInstr*, 8> or something, mapping slot index to the MI that fills it. The frame setup opcode should tell you how much stack space to allocate up front, and you can index into the vector by StackOffset / 4.
221–223	This seems worth tackling, given that you had to handle the `call <fi>` case. :)
365–369	It's not clear to me that same BB is sufficient, consider this potential BB:
    movl (%edi), %eax
    movl $42, (%edi)
    <call setup>
    movl %eax, (%esp)
    calll foo
    <call end>
We can't move the load if there is a potentially aliasing store in the way. There might be a utility to help with the aliasing query, or you can assume that any stores other than arg stores might alias it and bail on that.
lib/Target/X86/X86InstrInfo.cpp
1717–1718 ↗	(On Diff #18084)	This is the best thing I can think of at the moment. =/
test/CodeGen/X86/movtopush.ll
204	Test case suggestions:
    ; Where the callee is indirect via the stack, `call <fi>`
    define void @test10() optsize {
      %stack_fptr = alloca void (i32, i32, i32, i32)*
      store void (i32, i32, i32, i32)* @good, void (i32, i32, i32, i32)** %stack_fptr
      %good_ptr = load void (i32, i32, i32, i32)** %stack_fptr
      call void (i32, i32, i32, i32)* %good_ptr(i32 1, i32 2, i32 3, i32 4)
      ret void
    }
    ; We can't fold the load into the push here, skipping the store.
    @the_global = global i32
    define void @test11() optsize {
      %myload = load i32* @the_global
      store i32 42, i32* @the_global
      call void @good(i32 %myload, i32 2, i32 3, i32 4)
      ret void
    }
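
The load-folding hazard in the 365–369 comment and in test11 can be illustrated with a deliberately conservative check: the load may be folded into the push only if every instruction between it and the call setup is itself a plain load. This is a sketch, not the code in the patch; the Kind enum and canFoldLoadIntoPush are hypothetical, and the rule is stricter than "bail on non-argument stores".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, coarse classification of the instructions in one basic block.
enum class Kind { PlainLoad, Store, Other };

// Returns true if the load at LoadIdx can be moved down to the call setup at
// CallSetupIdx. Anything other than a plain load in between blocks the move,
// because it might write the loaded location or have other side effects.
bool canFoldLoadIntoPush(const std::vector<Kind> &Block, std::size_t LoadIdx,
                         std::size_t CallSetupIdx) {
  if (LoadIdx >= CallSetupIdx || CallSetupIdx > Block.size())
    return false;
  if (Block[LoadIdx] != Kind::PlainLoad)
    return false;
  for (std::size_t I = LoadIdx + 1; I < CallSetupIdx; ++I)
    if (Block[I] != Kind::PlainLoad)
      return false;
  return true;
}
```

For the block in the review comment (a load, then a store of $42, then the call setup) the check rejects the fold, which is the behaviour test11 is meant to pin down.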

Thanks, Reid!

lib/Target/X86/X86ConvertMovsToPushes.cpp
129–131	Err, yes, you're right, sorry about that... got distracted while naming the variable, I guess, I meant the opposite. Thanks!
144	Yes, that sequence. :-) It doesn't depend on dynamic allocas, though. If you don't have a reserved frame (for whatever reason - for x86 after this patch, it's either dynamic allocas, or because we forced it not to reserve by using pushes), then you need this re-alignment.
209	That can work. Thanks, I'll try.
221–223	Yes, definitely. :-) It may even work out of the box now. But I think I still want to split it into a separate commit.
365–369	Right now I'm way more conservative than even that - I'm checking below that everything between this mov and the call setup is a MOV32rm. The "same basic block" check here is just a way to short-circuit the obviously wrong cases. This catches some common cases like the one in the comment above, but of course misses other opportunities. I could check for a mayStore() instead, but I'm not sure that's safe enough. I'd like to relax the condition - but again, I think that ought to be a separate commit.
lib/Target/X86/X86InstrInfo.cpp
1717–1718 ↗	(On Diff #18084)	Too bad. :-\ So you think I should commit with this code as is? This shouldn't be a huge problem in terms of compile-time (since I'm looking only until the next call, it can't go quadratic), but it's insanely ugly.
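
The suggestion for line 209 (replace the std::map with a vector indexed by StackOffset / 4) can be sketched as follows. std::vector stands in for SmallVector so the example is self-contained, integer ids stand in for MachineInstr pointers, and ArgStore and mapStoresToSlots are hypothetical names. The "no gaps" rule comes from the comment in the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one argument store found in a call sequence.
struct ArgStore {
  std::int64_t StackOffset; // byte offset from the stack pointer
  int InstrId;              // non-negative stand-in for a MachineInstr*
};

// Fills Slots so that Slots[i] is the id of the store writing bytes
// [4*i, 4*i + 4). Returns false if a store is misaligned, out of range,
// duplicates a slot, or if any slot is left unfilled (a gap).
bool mapStoresToSlots(const std::vector<ArgStore> &Stores,
                      std::int64_t FrameBytes, std::vector<int> &Slots) {
  Slots.clear();
  if (FrameBytes < 0 || FrameBytes % 4 != 0)
    return false;
  Slots.assign(static_cast<std::size_t>(FrameBytes / 4), -1);
  for (const ArgStore &S : Stores) {
    if (S.StackOffset < 0 || S.StackOffset >= FrameBytes ||
        S.StackOffset % 4 != 0)
      return false;
    int &Slot = Slots[static_cast<std::size_t>(S.StackOffset / 4)];
    if (Slot != -1)
      return false; // two stores to the same slot
    Slot = S.InstrId;
  }
  for (int Id : Slots)
    if (Id == -1)
      return false; // gap in the argument area
  return true;
}
```

Because the frame setup instruction gives the total size up front, the vector is allocated once, and pushes can then be emitted by walking the slots from the highest index down.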

Applied review comments
Fixed another bug in the way PEI was handling push sequences (argh) - this required adding a target query.
Made the tests check a bit more (which would have exposed the bug above earlier).

rnk added inline comments. Jan 15 2015, 10:19 AM

lib/Target/X86/X86InstrInfo.cpp
1717–1718 ↗	(On Diff #18084)	Yeah, if we go with this MI pass approach to mov -> push conversion, then we'll have to keep this ADJCALLSTACKUP scan. We aren't going to move the callee cleanup stack adjustment onto the CALL instr without major changes.
1745 ↗	(On Diff #18084)	I wonder if it's possible for __readeflags() (pushf ; pop %reg) or others to get folded into a call sequence. Probably not.

mkuper added inline comments. Jan 16 2015, 5:57 AM

lib/Target/X86/X86InstrInfo.cpp
1717–1718 ↗	(On Diff #18084)	This will have to happen regardless of the MI pass vs. DAG approach. I mean, I still think doing it on the DAG is unfeasible, but even if we could do that, it wouldn't help. This code is used for the case where fi resolution needs to handle a sequence where there is a fi reference between the call and the adjcallstackup, with callee cleanup for the call. This is just a side effect of making canSimplifyCallFramePseudos return false.
1745 ↗	(On Diff #18084)	I don't see how it could happen. In any case, we won't match either the pushf or the pop, so it should be ok.

lgtm

I still think forming pushes prior to isel is the way to go long term. It's a lot easier to convert pushes to 'load, SP adjust, store' than it is to go the other way.

include/llvm/Target/TargetFrameLowering.h
196 ↗	(On Diff #18222)	"- Do" uppercase
lib/Target/X86/X86ConvertMovsToPushes.cpp
101	Can this be `for (MachineBasicBlock &BB : *MF) {`?
103	Ditto, `for (MachineInstr &MI : BB) {` ?
lib/Target/X86/X86InstrInfo.cpp
1713–1725 ↗	(On Diff #18222)	I would shorten this to just something like "look for the ADJCALLSTACKUP instr that follows the call".
1717–1718 ↗	(On Diff #18084)	I was imagining in the DAG LowerCall implementation we emit FrameIndex operands with some kind of SP offset to indicate the current stack level. We'd end up with MI looking like this:
    ADJCALLSTACKDOWN32 <N> ; N is <size-of-args> % <stack-alignment>, which is usually zero
    PUSH32rmm <fi> <sp offset, N>
    PUSH32rmm <fi> <sp offset, N + 4>
    PUSH32rmm <fi> <sp offset, N + 8>
    CALL32rm <fi> <sp offset, N + 12>
    ADJCALLSTACKUP32 <N + 12>
The main thing is that if we commit to pushes instead of movs at DAG time, it's impossible for the push conversion to fail for hard to diagnose reasons. It looks like the frame index MachineOperand type has an unused offset field.

This revision is now accepted and ready to land. Jan 22 2015, 12:57 PM

Hi Chandler,

This is something that Reid and I talked about on IRC, but I don’t think we came to a conclusion both of us were happy with (hence Reid’s “lgtm with reservations”, I guess :-) )

First, I don’t think the decision on whether to use movs or pushes belongs in the DAG.
The decision on whether a call-site should use movs or pushes needs to be aware of its context, because having even one call-site use pushes means we will not have a reserved call frame, which affects the way all other call sites are treated as well. This patch makes the decision based on global attributes only (opt for size vs. speed, stack alignment), but the next step will be to make it based on an analysis of the call-sites – e.g. even with stack alignment of 16, it can still often be a win, depending on just how many of the function calls we can actually transform, and how many memory arguments each call has.

So the way I envision the next step is that the pass will:

a) Collect the necessary information from all call sites in the function.

b) Make a judgment on whether the transformation is worth it – in terms of size for Os/Oz, in terms of performance for other opt levels.

c) Perform the transformation.
I don’t see how we can do this on the DAG.
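
Steps a) and b) above could look roughly like the sketch below. It is purely illustrative and is not the heuristic that was later implemented. The CallSite struct, the function names, and the size constants (1 byte per push, 3 bytes per add/sub on %esp, a caller-supplied size per mov, caller cleanup) are all assumptions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of one call site, gathered in step a).
struct CallSite {
  unsigned NumArgs; // number of 32-bit stack arguments
  bool Convertible; // whether its movs can be turned into pushes
};

constexpr int PushBytes = 1;   // assumed size of push %reg
constexpr int AdjustBytes = 3; // assumed size of add/sub $imm8, %esp

// Padding needed so the argument area is a multiple of StackAlign (non-zero).
unsigned paddingFor(unsigned ArgBytes, unsigned StackAlign) {
  return (StackAlign - ArgBytes % StackAlign) % StackAlign;
}

// Step b): estimated change in code size, in bytes, from giving up the
// reserved call frame for the whole function. Negative means smaller.
int estimateSizeDelta(const std::vector<CallSite> &Sites, unsigned StackAlign,
                      int MovBytes) {
  int Delta = 0;
  for (const CallSite &S : Sites) {
    if (S.NumArgs == 0)
      continue; // nothing on the stack, nothing to adjust
    if (S.Convertible) {
      Delta += static_cast<int>(S.NumArgs) * (PushBytes - MovBytes);
      Delta += AdjustBytes; // add after the call
      if (paddingFor(4 * S.NumArgs, StackAlign) != 0)
        Delta += AdjustBytes; // sub before the pushes for alignment
    } else {
      Delta += 2 * AdjustBytes; // movs stay, but sub/add are now explicit
    }
  }
  return Delta;
}

bool worthConverting(const std::vector<CallSite> &Sites, unsigned StackAlign,
                     int MovBytes) {
  return estimateSizeDelta(Sites, StackAlign, MovBytes) < 0;
}
```

The point of the sketch is that the decision depends on every call site in the function at once, including the ones that cannot be converted, which is the kind of whole-function view the argument above says is hard to get on the DAG.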

If I understand Reid’s last suggestion, he proposed to flip the default – that is, emit pushes in the DAG, and have an MI pass that does the opposite (push -> mov) transformation if necessary.

I don’t believe that removes a lot of complexity or would improve performance.
The code in PEI, InstrInfo and FrameLowering is just a side effect of not being able to rely on a 0 SPAdj in PEI anymore (that is, canSimplifyCallFramePseudos() can now return false), and is needed regardless of how the transformation is performed. And we will still need the heuristic decision.
Some of the logic in looking for sequences where the conversion is possible will disappear, but I think a lot of it will remain as conditions on the incoming operand DAG nodes. And since we don’t want to transform each push into an “adjust esp, mov” but rather want to group all the esp adjustments back into the ADJCALLSTACKs, we will still need to have code in the pass that makes sure this is safe w.r.t. the final sequence.
The main benefit I see is that we will no longer need to have the folding code – rather, we will have to unfold PUSH32rmm, which is simpler. However, I hope I can eventually get rid of the folding here by teaching PeepholeOptimizer to be smarter about this.

On the other hand, X86TargetLowering::LowerCall() is already, IMHO, a fairly complex piece of code, and I’d rather avoid making it even more complex.
Conceptually, I’d prefer that LowerCall() did standard mov-based lowering in all cases like it does now (we aren’t always going to lower to pushes anyway – it doesn’t really make sense for x86-64) and treat pushes as an optimization where available.

What do you think?

Michael

From: Chandler Carruth [mailto:chandlerc@google.com]
Sent: Thursday, January 22, 2015 23:07
To: reviews+D6789+public+a4ec4af5a5133e84@reviews.llvm.org
Cc: Kuperstein, Michael M; Nadav Rotem; Demikhovsky, Elena; Commit Messages and Patches for LLVM
Subject: Re: [PATCH] [X86] Convert esp-relative movs of function arguments to pushes, step 2

Closed by commit rL227728: [X86] Convert esp-relative movs of function arguments to pushes, step 2 (authored by mkuper). Feb 1 2015, 3:46 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/X86/CMakeLists.txt (revision 224837)	1 line
lib/Target/X86/X86.h (revision 224837)	4 lines
lib/Target/X86/X86ConvertMovsToPushes.cpp (revision 0)	294 lines
lib/Target/X86/X86FastISel.cpp (revision 224837)	2 lines
lib/Target/X86/X86FrameLowering.h (revision 224837)	1 line
lib/Target/X86/X86FrameLowering.cpp (revision 224837)	150 lines
lib/Target/X86/X86InstrCompiler.td (revision 224837)	14 lines
lib/Target/X86/X86MachineFunctionInfo.h (revision 224837)	12 lines
lib/Target/X86/X86TargetMachine.cpp (revision 224837)	5 lines
test/CodeGen/X86/inalloca-invoke.ll (revision 224837)	2 lines
test/CodeGen/X86/movtopush.ll (revision 224837)	98 lines

Diff 17656

lib/Target/X86/CMakeLists.txt

	set(LLVM_TARGET_DEFINITIONS X86.td)

	tablegen(LLVM X86GenRegisterInfo.inc -gen-register-info)
	tablegen(LLVM X86GenDisassemblerTables.inc -gen-disassembler)
	tablegen(LLVM X86GenInstrInfo.inc -gen-instr-info)
	tablegen(LLVM X86GenAsmWriter.inc -gen-asm-writer)
	tablegen(LLVM X86GenAsmWriter1.inc -gen-asm-writer -asmwriternum=1)
	tablegen(LLVM X86GenAsmMatcher.inc -gen-asm-matcher)
	tablegen(LLVM X86GenDAGISel.inc -gen-dag-isel)
	tablegen(LLVM X86GenFastISel.inc -gen-fast-isel)
	tablegen(LLVM X86GenCallingConv.inc -gen-callingconv)
	tablegen(LLVM X86GenSubtargetInfo.inc -gen-subtarget)
	add_public_tablegen_target(X86CommonTableGen)

	set(sources
	X86AsmPrinter.cpp
+	X86ConvertMovsToPushes.cpp
	X86FastISel.cpp
	X86FloatingPoint.cpp
	X86FrameLowering.cpp
	X86ISelDAGToDAG.cpp
	X86ISelLowering.cpp
	X86InstrInfo.cpp
	X86MCInstLower.cpp
	X86MachineFunctionInfo.cpp

lib/Target/X86/X86.h

	/// with NOOPs. This will prevent a stall when returning on the Atom.
	FunctionPass *createX86PadShortFunctions();
	/// createX86FixupLEAs - Return a a pass that selectively replaces
	/// certain instructions (like add, sub, inc, dec, some shifts,
	/// and some multiplies) by equivalent LEA instructions, in order
	/// to eliminate execution delays in some Atom processors.
	FunctionPass *createX86FixupLEAs();

+	/// createX86ConvertMovsToPushes - Return a pass that converts movs
+	/// that store function parameters onto the stack into pushes.
+	FunctionPass *createX86ConvertMovsToPushes();

	} // End llvm namespace

	#endif

lib/Target/X86/X86ConvertMovsToPushes.cpp

				//===-------- X86ConvertMovsToPushes.cpp - pad short functions ------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines a pass that converts movs of function parameters onto the
				// stack into pushes. This is beneficial for two main reasons
				// 1) The push instruction encoding is much smaller than an esp-relative mov
				// 2) It is possible to push memory arguments directly. So, if the
				// transformation is performed pre-reg-alloc, it can help relieve
				// register pressure.
				//
				//===----------------------------------------------------------------------===//

				#include <algorithm>

				#include "X86.h"
				#include "X86InstrInfo.h"
				#include "X86Subtarget.h"
				#include "X86MachineFunctionInfo.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/IR/Function.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetInstrInfo.h"

				using namespace llvm;

				#define DEBUG_TYPE "x86-mov-to-push"

				cl::opt<bool> NoMovToPush("no-mov-to-push",
				cl::desc("Avoid function argument mov-to-push transformation"),
				cl::init(false), cl::Hidden);

				namespace {
				class X86ConvertMovsToPushes : public MachineFunctionPass {
				public:
				X86ConvertMovsToPushes() : MachineFunctionPass(ID) {}

				bool runOnMachineFunction(MachineFunction &MF) override;

				private:
				bool adjustCallSequence(MachineFunction &MF, MachineBasicBlock &MBB,
				MachineBasicBlock::iterator I);

				MachineInstr *canFoldIntoRegPush(MachineBasicBlock::iterator FrameSetup,
				unsigned Reg);

				const char *getPassName() const override {
				return "X86 Convert Movs to Pushes";
				}

				const TargetInstrInfo *TII;
				const MachineRegisterInfo *MRI;
				static char ID;
				};

				char X86ConvertMovsToPushes::ID = 0;
				}

				FunctionPass *llvm::createX86ConvertMovsToPushes() {
				return new X86ConvertMovsToPushes();
				}

				bool X86ConvertMovsToPushes::runOnMachineFunction(MachineFunction &MF) {
				if (NoMovToPush.getValue())
				return false;

				// We currently only support call sequences where all parameters
				// are passed on the stack.
				// No point in running this in 64-bit mode, since some arguments are
				// passed in-register in all common calling conventions, so the pattern
				// we're looking for will never match.
				const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
				if (STI.is64Bit())
				return false;

				// This transformation is always a win when optimizing for size,
				// or when we are not going to have a reserved call stack.
				// Under other circumstances, it may be either a win or a loss,
				// and requires a heuristic.
				// For now, enable it only for the clear win cases.

				// TODO: Add a heuristic that actually looks at the function,
				// and enable this for more cases.

				AttributeSet FnAttrs = MF.getFunction()->getAttributes();
				bool OptForSize =
				FnAttrs.hasAttribute(AttributeSet::FunctionIndex,
				Attribute::OptimizeForSize) ||
				FnAttrs.hasAttribute(AttributeSet::FunctionIndex, Attribute::MinSize);

				if (!MF.getFrameInfo()->hasVarSizedObjects() && !OptForSize)
				return false;

				TII = MF.getSubtarget().getInstrInfo();
				MRI = &MF.getRegInfo();
				int FrameSetupOpcode = TII->getCallFrameSetupOpcode();

				bool Changed = false;

				for (MachineFunction::iterator BB = MF.begin(), E = MF.end(); BB != E; ++BB)
				for (MachineBasicBlock::iterator I = BB->begin(); I != BB->end(); ++I)
				if (I->getOpcode() == FrameSetupOpcode)
				Changed |= adjustCallSequence(MF, *BB, I);

				return Changed;
				}

				bool X86ConvertMovsToPushes::adjustCallSequence(MachineFunction &MF,
				MachineBasicBlock &MBB,
				MachineBasicBlock::iterator I) {

				// Check that this particular call sequence is amenable to the
				// transformation.
				const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(
				MF.getSubtarget().getRegisterInfo());
				unsigned StackPtr = RegInfo.getStackRegister();
				int FrameDestroyOpcode = TII->getCallFrameDestroyOpcode();

				// We expect to enter this at the beginning of a call sequence
				assert(I->getOpcode() == TII->getCallFrameSetupOpcode());
				MachineBasicBlock::iterator FrameSetup = I++;

				// We expect a copy instruction here.
				// TODO: The copy instruction here is a lowering artifact.
				// We should also support a copy-less version, where the stack
				// pointer is used directly.
				if (!I->isCopy() || !I->getOperand(0).isReg())
				return false;
				MachineBasicBlock::iterator SPCopy = I++;
				StackPtr = SPCopy->getOperand(0).getReg();

				// Scan the call setup sequence for the pattern we're looking for.
				// We only handle a simple case - a sequence of MOV32mi or MOV32mr
				// instructions, that push a sequence of 32-bit values onto the stack, with
				rnk: Can you explain why this is unprofitable? I guess if we get here we are in dynamic alloca plus stack realignment land, i.e. the worst thing that could possibly happen. Is this about extra code for preserving the outgoing stack alignment then? Like on Linux, where we provide 16 byte stack alignment?
				mkuper (Author): If we get here, we're in opt-for-size + stack-realignment land. And, yes, that's exactly what it is about. If you are passing only one parameter, the original code would be:
				    mov %eax, 128(%esp)
				    call $foo
				Without re-alignment, you have
				    push %eax
				    call $foo
				    add $4, %esp
				which is still a win in terms of code size. With re-alignment, you get:
				    sub $16, %esp
				    push %eax
				    call $foo
				    add $12, %esp
				Which is... questionable. The code size for the sequence is the same (in this case, 7 bytes for both, not including the call), but if you have other call sites which you didn't convert, you may actually lose. And, of course, you lose performance (3 instructions instead of 1) without anything to show for it. Once there is a heuristic that tries to estimate the overhead, we can address this on a case-by-case basis (e.g. if we have 16-byte stack re-alignment, but most call sites have a lot of parameters, then it's still worth it.)
				rnk: Based on my misinterpretation, I think I understand why you get this code. SP is assumed to be aligned coming into the sequence. We realign SP after dynamic allocas. The sequence is probably more like:
				    sub $12, %esp
				    push %eax
				    call $foo
				    add $16, %esp
				I can see why this is less profitable.
				mkuper (Author): Yes, that sequence. :-) It doesn't depend on dynamic allocas, though. If you don't have a reserved frame (for whatever reason - for x86 after this patch, it's either dynamic allocas, or because we forced it not to reserve by using pushes), then you need this re-alignment.
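The padding arithmetic in this exchange can be checked with a small standalone sketch. It is not code from the patch: `paddingBeforePushes` is a hypothetical helper, and it assumes the stack pointer is aligned on entry to the call sequence, as rnk describes. It reuses the round-up idiom that X86FrameLowering applies to `Amount` elsewhere in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): number of bytes to subtract
// from the stack pointer before the pushes so that it is StackAlign-aligned
// again at the call. Assumes SP is aligned when the call sequence starts.
inline uint64_t paddingBeforePushes(uint64_t PushedBytes, uint64_t StackAlign) {
  assert(StackAlign > 0 && "alignment must be non-zero");
  // Round the outgoing-argument size up to the next alignment boundary.
  uint64_t Aligned = (PushedBytes + StackAlign - 1) / StackAlign * StackAlign;
  // Whatever the pushes do not cover has to be allocated explicitly.
  return Aligned - PushedBytes;
}
```

With one 4-byte argument and 16-byte alignment this yields 12, which matches the `sub $12, %esp` / `add $16, %esp` sequence rnk wrote; with 4-byte alignment it yields 0, so the push form needs no extra `sub`.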
				// no gaps between them.
				std::map<int64_t, MachineBasicBlock::iterator> MovMap;

				do {
				int Opcode = I->getOpcode();
				if (Opcode != X86::MOV32mi && Opcode != X86::MOV32mr)
				break;

				// We only want movs of the form:
				// movl imm/r32, k(%esp)
				// If we run into something else, bail.
				// Note that AddrBaseReg may, counter to its name, not be a register,
				// but rather a frame index.
				if (!I->getOperand(X86::AddrBaseReg).isReg() ||
				(I->getOperand(X86::AddrBaseReg).getReg() != StackPtr) ||
				!I->getOperand(X86::AddrScaleAmt).isImm() ||
				(I->getOperand(X86::AddrScaleAmt).getImm() != 1) ||
				(I->getOperand(X86::AddrIndexReg).getReg() != X86::NoRegister) ||
				(I->getOperand(X86::AddrSegmentReg).getReg() != X86::NoRegister) ||
				!I->getOperand(X86::AddrDisp).isImm())
				return false;

				int64_t StackDisp = I->getOperand(X86::AddrDisp).getImm();

				// We really don't want to consider the unaligned case.
				if (StackDisp % 4)
				return false;

				// If the same stack slot is being filled twice, something's fishy.
				if (!MovMap.insert(std::pair<int64_t, MachineInstr *>(StackDisp, I)).second)
				return false;

				++I;
				} while (I != MBB.end());

				// We now expect the end of the sequence - a call and a stack adjust.
				if (I == MBB.end())
				return false;
				if (!I->isCall())
				return false;
				MachineBasicBlock::iterator Call = I;
				if ((++I)->getOpcode() != FrameDestroyOpcode)
				return false;

				// Now, go through the map, and see that we don't have any gaps,
				// but only a series of 32-bit MOVs.
				// Since std::map provides ordered iteration, the original order
				// of the MOVs doesn't matter.
				int64_t ExpectedDist = 0;
				for (auto MMI = MovMap.begin(), MME = MovMap.end(); MMI != MME;
				++MMI, ExpectedDist += 4)
				if (MMI->first != ExpectedDist)
				return false;

				// Ok, we can in fact do the transformation for this call.
				// Do not remove the FrameSetup instruction, but adjust the size.
				// PEI will end up finalizing the handling of that.
				FrameSetup->getOperand(1).setImm(ExpectedDist);

				DebugLoc DL = I->getDebugLoc();
				// Now, iterate through the map in reverse order, and replace the movs
				// with pushes. MOVmi/MOVmr doesn't have any defs, so need to replace uses.
				for (auto MMI = MovMap.rbegin(), MME = MovMap.rend(); MMI != MME; ++MMI) {
				MachineBasicBlock::iterator MOV = MMI->second;
				MachineOperand PushOp = MOV->getOperand(X86::AddrNumOperands);
				rnk: std::map is really malloc heavy. This can probably be a SmallVector<MachineInstr*, 8> or something, mapping slot index to the MI that fills it. The frame setup opcode should tell you how much stack space to allocate up front, and you can index into the vector by StackOffset / 4.
				mkuper (Author): That can work. Thanks, I'll try.
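A minimal sketch of the slot-table idea rnk suggests, written against the standard library so it stands alone. `std::vector<int>` stands in for `SmallVector<MachineInstr*, 8>`, and `ArgStore`, `InstrId`, and `collectSlots` are hypothetical names. As a simplification it assumes the frame size equals the bytes written by the arg stores, whereas the patch also tolerates a larger frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for MachineInstr*; a real pass would store instruction pointers.
using InstrId = int;
constexpr InstrId NoInstr = -1;

// One esp-relative 32-bit store: its displacement and the storing instruction.
struct ArgStore {
  int64_t Disp;
  InstrId Instr;
};

// Fills Slots (indexed by Disp / 4) and returns true only if the stores cover
// [0, FrameSize) in 4-byte slots with no gaps, duplicates, or unaligned
// displacements.
inline bool collectSlots(const std::vector<ArgStore> &Stores,
                         int64_t FrameSize, std::vector<InstrId> &Slots) {
  if (FrameSize < 0 || FrameSize % 4 != 0)
    return false;
  Slots.assign(static_cast<std::size_t>(FrameSize / 4), NoInstr);
  for (const ArgStore &S : Stores) {
    if (S.Disp < 0 || S.Disp % 4 != 0 || S.Disp >= FrameSize)
      return false;
    InstrId &Slot = Slots[static_cast<std::size_t>(S.Disp / 4)];
    if (Slot != NoInstr)
      return false; // The same slot is filled twice.
    Slot = S.Instr;
  }
  for (InstrId I : Slots)
    if (I == NoInstr)
      return false; // A gap in the argument area.
  return true;
}
```

Iterating `Slots` from back to front then gives the push order directly, without the ordered-map traversal.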
				if (MOV->getOpcode() == X86::MOV32mi) {
				unsigned PushOpcode = X86::PUSHi32;
				if (PushOp.isImm()) {
				int64_t Val = PushOp.getImm();
				delena: It should be immediate, right? Can we have a relocation here?
				mkuper (Author): It can be a relocation, but in that case, isImm() will fail. Will document that more clearly.
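The opcode choice discussed here can be sketched in isolation. `PushKind` and `selectPushImm` are hypothetical stand-ins for the `PUSH32i8` / `PUSHi32` selection in the diff, and the `IsPlainImm` flag models `isImm()` returning false for a relocation, as mkuper describes.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two push-immediate opcodes in the patch.
enum class PushKind { Imm8, Imm32 };

// True if V is representable as a sign-extended 8-bit immediate.
inline bool fitsInSigned8(int64_t V) { return V >= -128 && V <= 127; }

// Only a plain immediate may use the short form; anything else (for example
// a relocation, whose final value is unknown here) keeps the 32-bit form.
inline PushKind selectPushImm(bool IsPlainImm, int64_t Val) {
  return (IsPlainImm && fitsInSigned8(Val)) ? PushKind::Imm8 : PushKind::Imm32;
}
```

On x86-32 the short form encodes in 2 bytes against 5 for the 32-bit form, which is the code-size motivation for checking the range at all.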
				if (isInt<8>(Val))
				PushOpcode = X86::PUSH32i8;
				}
				BuildMI(MBB, Call, DL, TII->get(PushOpcode)).addOperand(PushOp);
				} else {
				unsigned int Reg = PushOp.getReg();

				// If PUSHrmm is not slow on this target, try to fold the source of the
				// push into the instruction.
				const X86Subtarget &ST = MF.getTarget().getSubtarget<X86Subtarget>();
				rnk: This seems worth tackling, given that you had to handle the `call <fi>` case. :)
				mkuper (Author): Yes, definitely. :-) It may even work out of the box now. But I think I still want to split it into a separate commit.
				bool SlowPUSHrmm = ST.isAtom() || ST.isSLM();
				MachineInstr *DefMov = nullptr;
				delena: SlowPush should be a property of the target, like slowLea.
				mkuper (Author): I agree. Unfortunately, I've run out of bits. The Subtarget features are a 64-bit bitfield, and they're all taken.
				if (!SlowPUSHrmm && (DefMov = canFoldIntoRegPush(FrameSetup, Reg))) {
				MachineInstr *Push = BuildMI(MBB, Call, DL, TII->get(X86::PUSH32rmm));
				delena: The comment is missing here.

				unsigned NumOps = DefMov->getDesc().getNumOperands();
				for (unsigned i = NumOps - X86::AddrNumOperands; i != NumOps; ++i)
				Push->addOperand(DefMov->getOperand(i));

				DefMov->eraseFromParent();
				} else {
				BuildMI(MBB, Call, DL, TII->get(X86::PUSH32r)).addReg(Reg).getInstr();
				}
				}

				MBB.erase(MOV);
				}

				// The stack-pointer copy is no longer used in the call sequences.
				// There should not be any other users, but we can't commit to that, so:
				if (MRI->use_empty(SPCopy->getOperand(0).getReg()))
				SPCopy->eraseFromParent();

				// Once we've done this, we need to make sure PEI doesn't assume a reserved
				// frame.
				X86MachineFunctionInfo *FuncInfo = MF.getInfo<X86MachineFunctionInfo>();
				FuncInfo->setHasPushSequences(true);

				return true;
				}

				MachineInstr *X86ConvertMovsToPushes::canFoldIntoRegPush(
				MachineBasicBlock::iterator FrameSetup, unsigned Reg) {
				// Do an extremely restricted form of load folding.
				// ISel will often create patterns like:
				// movl 4(%edi), %eax
				// movl 8(%edi), %ecx
				// movl 12(%edi), %edx
				// movl %edx, 8(%esp)
				// movl %ecx, 4(%esp)
				// movl %eax, (%esp)
				// call
				// Get rid of those with prejudice.
				if (!TargetRegisterInfo::isVirtualRegister(Reg))
				return nullptr;

				// Make sure this is the only use of Reg.
				if (!MRI->hasOneNonDBGUse(Reg))
				return nullptr;

				MachineBasicBlock::iterator DefMI = MRI->getVRegDef(Reg);

				// Make sure the def is a MOV from memory.
				// If the def is in another block, give up.
				if (DefMI->getOpcode() != X86::MOV32rm ||
				DefMI->getParent() != FrameSetup->getParent())
				return nullptr;

				// Now, make sure everything else up until the ADJCALLSTACK is a sequence
				// of MOVs.
				for (auto I = DefMI; I != FrameSetup; ++I)
				if (I->mayStore())
				return nullptr;
				mkuper (Author): Argh. This is nonsense. Commented one thing, coded another... (mayStore() is extremely far from being a strong enough condition to allow this.)

				// Be careful with movs that load from a stack slot, since it may get
				// resolved incorrectly.
				if (!DefMI->getOperand(1).isReg())
				return nullptr;

				return DefMI;
				}
				rnk: It's not clear to me that same BB is sufficient, consider this potential BB:
				    movl (%edi), %eax
				    movl $42, (%edi)
				    <call setup>
				    movl %eax, (%esp)
				    calll foo
				    <call end>
				We can't move the load if there is a potentially aliasing store in the way. There might be a utility to help with the aliasing query, or you can assume that any stores other than arg stores might alias it and bail on that.
				mkuper (Author): Right now I'm way more conservative than even that - I'm checking below that everything between this mov and the call setup is a MOV32rm. The "same basic block" check here is just a way to short-circuit the obviously wrong cases. This catches some common cases like the one in the comment above, but of course misses other opportunities. I could check for a mayStore() instead, but I'm not sure that's safe enough. I'd like to relax the condition - but again, I think that ought to be a separate commit.
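The conservative rule mkuper describes (every instruction between the defining load and the call setup must itself be a plain load) can be modelled on a toy instruction list. This is only an illustration of the stated intent, not the patch's code: `ToyOp` and `canFoldLoad` are hypothetical, and real machine instructions carry far more state than an opcode tag.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins for machine opcodes; these are not LLVM types.
enum class ToyOp { Load, Store, Call, Other };

// Returns true if the load at DefIdx may be folded into a push placed at
// SetupIdx (the position of the call-frame setup). The rule is deliberately
// strict: every instruction strictly between the two must be a plain load,
// so no store, call, or other side effect can sit between the load's
// original position and the point where the push will read memory.
inline bool canFoldLoad(const std::vector<ToyOp> &Block, std::size_t DefIdx,
                        std::size_t SetupIdx) {
  if (DefIdx >= SetupIdx || SetupIdx > Block.size())
    return false;
  if (Block[DefIdx] != ToyOp::Load)
    return false;
  for (std::size_t I = DefIdx + 1; I < SetupIdx; ++I)
    if (Block[I] != ToyOp::Load)
      return false;
  return true;
}
```

On rnk's example block (a load followed by a store to the same address, then the call setup) this rejects the fold, while a run of three loads immediately before the call setup is accepted.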

lib/Target/X86/X86FastISel.cpp

bool X86FastISel::fastLowerCall(CallLoweringInfo &CLI) {
CCInfo.AnalyzeCallOperands(OutVTs, OutFlags, CC_X86);		CCInfo.AnalyzeCallOperands(OutVTs, OutFlags, CC_X86);

// Get a count of how many bytes are to be pushed on the stack.		// Get a count of how many bytes are to be pushed on the stack.
unsigned NumBytes = CCInfo.getNextStackOffset();		unsigned NumBytes = CCInfo.getNextStackOffset();

// Issue CALLSEQ_START		// Issue CALLSEQ_START
unsigned AdjStackDown = TII.getCallFrameSetupOpcode();		unsigned AdjStackDown = TII.getCallFrameSetupOpcode();
BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(AdjStackDown))		BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(AdjStackDown))
.addImm(NumBytes);		.addImm(NumBytes).addImm(0);

// Walk the register/memloc assignments, inserting copies/loads.		// Walk the register/memloc assignments, inserting copies/loads.
const X86RegisterInfo *RegInfo = static_cast<const X86RegisterInfo *>(		const X86RegisterInfo *RegInfo = static_cast<const X86RegisterInfo *>(
TM.getSubtargetImpl()->getRegisterInfo());		TM.getSubtargetImpl()->getRegisterInfo());
for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {		for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {
CCValAssign const &VA = ArgLocs[i];		CCValAssign const &VA = ArgLocs[i];
const Value *ArgVal = OutVals[VA.getValNo()];		const Value *ArgVal = OutVals[VA.getValNo()];
MVT ArgVT = OutVTs[VA.getValNo()];		MVT ArgVT = OutVTs[VA.getValNo()];

lib/Target/X86/X86FrameLowering.h

public:

bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB,		bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,		const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const override;		const TargetRegisterInfo *TRI) const override;

bool hasFP(const MachineFunction &MF) const override;		bool hasFP(const MachineFunction &MF) const override;
bool hasReservedCallFrame(const MachineFunction &MF) const override;		bool hasReservedCallFrame(const MachineFunction &MF) const override;
		bool canSimplifyCallFramePseudos(const MachineFunction &MF) const override;

int getFrameIndexOffset(const MachineFunction &MF, int FI) const override;		int getFrameIndexOffset(const MachineFunction &MF, int FI) const override;
int getFrameIndexReference(const MachineFunction &MF, int FI,		int getFrameIndexReference(const MachineFunction &MF, int FI,
unsigned &FrameReg) const override;		unsigned &FrameReg) const override;

int getFrameIndexOffsetFromSP(const MachineFunction &MF, int FI) const;		int getFrameIndexOffsetFromSP(const MachineFunction &MF, int FI) const;
int getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI,		int getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI,
unsigned &FrameReg) const override;		unsigned &FrameReg) const override;

lib/Target/X86/X86FrameLowering.cpp

#include <cstdlib>		#include <cstdlib>

using namespace llvm;		using namespace llvm;

// FIXME: completely move here.		// FIXME: completely move here.
extern cl::opt<bool> ForceStackAlign;		extern cl::opt<bool> ForceStackAlign;

bool X86FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {		bool X86FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {
return !MF.getFrameInfo()->hasVarSizedObjects();		return !MF.getFrameInfo()->hasVarSizedObjects() &&
		!MF.getInfo<X86MachineFunctionInfo>()->getHasPushSequences();
		}

		// We can simplify even if we don't have a reserved call frame, in case
		// the only reason we don't have it is because we did the mov -> push
		// transformation.
		bool X86FrameLowering::canSimplifyCallFramePseudos(const MachineFunction &MF)
		const {
		return hasReservedCallFrame(MF) || hasFP(MF) ||
		(!hasReservedCallFrame(MF) && !MF.getFrameInfo()->hasVarSizedObjects());
}		}

/// hasFP - Return true if the specified function should have a dedicated frame		/// hasFP - Return true if the specified function should have a dedicated frame
/// pointer register. This is true if the function has variable sized allocas		/// pointer register. This is true if the function has variable sized allocas
/// or if frame pointer elimination is disabled.		/// or if frame pointer elimination is disabled.
bool X86FrameLowering::hasFP(const MachineFunction &MF) const {		bool X86FrameLowering::hasFP(const MachineFunction &MF) const {
const MachineFrameInfo *MFI = MF.getFrameInfo();		const MachineFrameInfo *MFI = MF.getFrameInfo();
const MachineModuleInfo &MMI = MF.getMMI();		const MachineModuleInfo &MMI = MF.getMMI();
if (isInt<8>(Imm))
return X86::AND64ri8;		return X86::AND64ri8;
return X86::AND64ri32;		return X86::AND64ri32;
}		}
if (isInt<8>(Imm))		if (isInt<8>(Imm))
return X86::AND32ri8;		return X86::AND32ri8;
return X86::AND32ri;		return X86::AND32ri;
}		}

static unsigned getPUSHiOpcode(bool IsLP64, MachineOperand MO) {
// We don't support LP64 for now.
assert(!IsLP64);

if (MO.isImm() && isInt<8>(MO.getImm()))
return X86::PUSH32i8;

return X86::PUSHi32;
}

static unsigned getLEArOpcode(unsigned IsLP64) {		static unsigned getLEArOpcode(unsigned IsLP64) {
return IsLP64 ? X86::LEA64r : X86::LEA32r;		return IsLP64 ? X86::LEA64r : X86::LEA32r;
}		}

/// findDeadCallerSavedReg - Return a caller-saved register that isn't live		/// findDeadCallerSavedReg - Return a caller-saved register that isn't live
/// when it reaches the "return" instruction. We can then pop a stack object		/// when it reaches the "return" instruction. We can then pop a stack object
/// to this register without worry about clobbering it.		/// to this register without worry about clobbering it.
static unsigned findDeadCallerSavedReg(MachineBasicBlock &MBB,		static unsigned findDeadCallerSavedReg(MachineBasicBlock &MBB,
if (MaxStack > Guaranteed) {
incStackMBB->addSuccessor(&prologueMBB, 99);		incStackMBB->addSuccessor(&prologueMBB, 99);
incStackMBB->addSuccessor(incStackMBB, 1);		incStackMBB->addSuccessor(incStackMBB, 1);
}		}
#ifdef XDEBUG		#ifdef XDEBUG
MF.verify();		MF.verify();
#endif		#endif
}		}

bool X86FrameLowering::
convertArgMovsToPushes(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator I, uint64_t Amount) const {
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(
MF.getSubtarget().getRegisterInfo());
unsigned StackPtr = RegInfo.getStackRegister();

// Scan the call setup sequence for the pattern we're looking for.
// We only handle a simple case now - a sequence of MOV32mi or MOV32mr
// instructions, that push a sequence of 32-bit values onto the stack, with
// no gaps.
std::map<int64_t, MachineBasicBlock::iterator> MovMap;
do {
int Opcode = I->getOpcode();
if (Opcode != X86::MOV32mi && Opcode != X86::MOV32mr)
break;

// We only want movs of the form:
// movl imm/r32, k(%ecx)
// If we run into something else, bail
// Note that AddrBaseReg may, counterintuitively, not be a register...
if (!I->getOperand(X86::AddrBaseReg).isReg() ||
(I->getOperand(X86::AddrBaseReg).getReg() != StackPtr) ||
!I->getOperand(X86::AddrScaleAmt).isImm() ||
(I->getOperand(X86::AddrScaleAmt).getImm() != 1) ||
(I->getOperand(X86::AddrIndexReg).getReg() != X86::NoRegister) ||
(I->getOperand(X86::AddrSegmentReg).getReg() != X86::NoRegister) ||
!I->getOperand(X86::AddrDisp).isImm())
return false;

int64_t StackDisp = I->getOperand(X86::AddrDisp).getImm();

// We don't want to consider the unaligned case.
if (StackDisp % 4)
return false;

// If the same stack slot is being filled twice, something's fishy.
if (!MovMap.insert(std::pair<int64_t, MachineInstr*>(StackDisp, I)).second)
return false;

++I;
} while (I != MBB.end());

// We now expect the end of the sequence - a call and a stack adjust.
if (I == MBB.end())
return false;
if (!I->isCall())
return false;
MachineBasicBlock::iterator Call = I;
if ((++I)->getOpcode() != TII.getCallFrameDestroyOpcode())
return false;

// Now, go through the map, and see that we don't have any gaps,
// but only a series of 32-bit MOVs.
// Since std::map provides ordered iteration, the original order
// of the MOVs doesn't matter.
int64_t ExpectedDist = 0;
for (auto MMI = MovMap.begin(), MME = MovMap.end(); MMI != MME;
++MMI, ExpectedDist += 4)
if (MMI->first != ExpectedDist)
return false;

// Ok, everything looks fine. Do the transformation.
DebugLoc DL = I->getDebugLoc();

// It's possible the original stack adjustment amount was larger than
// that done by the pushes. If so, we still need a SUB.
Amount -= ExpectedDist;
if (Amount) {
MachineInstr* Sub = BuildMI(MBB, Call, DL,
TII.get(getSUBriOpcode(false, Amount)), StackPtr)
.addReg(StackPtr).addImm(Amount);
Sub->getOperand(3).setIsDead();
}

// Now, iterate through the map in reverse order, and replace the movs
// with pushes. MOVmi/MOVmr doesn't have any defs, so need to replace uses.
for (auto MMI = MovMap.rbegin(), MME = MovMap.rend(); MMI != MME; ++MMI) {
MachineBasicBlock::iterator MOV = MMI->second;
MachineOperand PushOp = MOV->getOperand(X86::AddrNumOperands);

// Replace MOVmr with PUSH32r, and MOVmi with PUSHi of appropriate size
int PushOpcode = X86::PUSH32r;
if (MOV->getOpcode() == X86::MOV32mi)
PushOpcode = getPUSHiOpcode(false, PushOp);

BuildMI(MBB, Call, DL, TII.get(PushOpcode)).addOperand(PushOp);
MBB.erase(MOV);
}

return true;
}

void X86FrameLowering::		void X86FrameLowering::
eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,		eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator I) const {		MachineBasicBlock::iterator I) const {
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(		const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(
MF.getSubtarget().getRegisterInfo());		MF.getSubtarget().getRegisterInfo());
unsigned StackPtr = RegInfo.getStackRegister();		unsigned StackPtr = RegInfo.getStackRegister();
bool reserveCallFrame = hasReservedCallFrame(MF);		bool reserveCallFrame = hasReservedCallFrame(MF);
int Opcode = I->getOpcode();		int Opcode = I->getOpcode();
bool isDestroy = Opcode == TII.getCallFrameDestroyOpcode();		bool isDestroy = Opcode == TII.getCallFrameDestroyOpcode();
const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();		const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
bool IsLP64 = STI.isTarget64BitLP64();		bool IsLP64 = STI.isTarget64BitLP64();
DebugLoc DL = I->getDebugLoc();		DebugLoc DL = I->getDebugLoc();
uint64_t Amount = !reserveCallFrame ? I->getOperand(0).getImm() : 0;		uint64_t Amount = !reserveCallFrame ? I->getOperand(0).getImm() : 0;
uint64_t CalleeAmt = isDestroy ? I->getOperand(1).getImm() : 0;		uint64_t InternalAmt = (isDestroy || Amount) ? I->getOperand(1).getImm() : 0;
I = MBB.erase(I);		I = MBB.erase(I);

if (!reserveCallFrame) {		if (!reserveCallFrame) {
// If the stack pointer can be changed after prologue, turn the		// If the stack pointer can be changed after prologue, turn the
// adjcallstackup instruction into a 'sub ESP, <amt>' and the		// adjcallstackup instruction into a 'sub ESP, <amt>' and the
// adjcallstackdown instruction into 'add ESP, <amt>'		// adjcallstackdown instruction into 'add ESP, <amt>'
if (Amount == 0)		if (Amount == 0)
return;		return;

// We need to keep the stack aligned properly. To do this, we round the		// We need to keep the stack aligned properly. To do this, we round the
// amount of space needed for the outgoing arguments up to the next		// amount of space needed for the outgoing arguments up to the next
// alignment boundary.		// alignment boundary.
unsigned StackAlign = MF.getTarget()		unsigned StackAlign = MF.getTarget()
.getSubtargetImpl()		.getSubtargetImpl()
->getFrameLowering()		->getFrameLowering()
->getStackAlignment();		->getStackAlignment();
Amount = (Amount + StackAlign - 1) / StackAlign * StackAlign;		Amount = (Amount + StackAlign - 1) / StackAlign * StackAlign;

MachineInstr *New = nullptr;		MachineInstr *New = nullptr;
if (Opcode == TII.getCallFrameSetupOpcode()) {
// Try to convert movs to the stack into pushes.
// We currently only look for a pattern that appears in 32-bit
// calling conventions.
if (!IsLP64 && convertArgMovsToPushes(MF, MBB, I, Amount))
return;

New = BuildMI(MF, DL, TII.get(getSUBriOpcode(IsLP64, Amount)),		// Factor out the amount that gets handled inside the sequence
StackPtr)		// (Pushes of argument for frame setup, callee pops for frame destroy)
.addReg(StackPtr)		Amount -= InternalAmt;
.addImm(Amount);
		if (Amount) {
		if (Opcode == TII.getCallFrameSetupOpcode()) {
		New = BuildMI(MF, DL, TII.get(getSUBriOpcode(IsLP64, Amount)), StackPtr)
		.addReg(StackPtr).addImm(Amount);
} else {		} else {
assert(Opcode == TII.getCallFrameDestroyOpcode());		assert(Opcode == TII.getCallFrameDestroyOpcode());

// Factor out the amount the callee already popped.
Amount -= CalleeAmt;

if (Amount) {
unsigned Opc = getADDriOpcode(IsLP64, Amount);		unsigned Opc = getADDriOpcode(IsLP64, Amount);
New = BuildMI(MF, DL, TII.get(Opc), StackPtr)		New = BuildMI(MF, DL, TII.get(Opc), StackPtr)
.addReg(StackPtr).addImm(Amount);		.addReg(StackPtr).addImm(Amount);
}		}
}		}

if (New) {		if (New) {
// The EFLAGS implicit def is dead.		// The EFLAGS implicit def is dead.
New->getOperand(3).setIsDead();		New->getOperand(3).setIsDead();

// Replace the pseudo instruction with a new instruction.		// Replace the pseudo instruction with a new instruction.
MBB.insert(I, New);		MBB.insert(I, New);
}		}

return;		return;
}		}

if (Opcode == TII.getCallFrameDestroyOpcode() && CalleeAmt) {		if (Opcode == TII.getCallFrameDestroyOpcode() && InternalAmt) {
// If we are performing frame pointer elimination and if the callee pops		// If we are performing frame pointer elimination and if the callee pops
// something off the stack pointer, add it back. We do this until we have		// something off the stack pointer, add it back. We do this until we have
// more advanced stack pointer tracking ability.		// more advanced stack pointer tracking ability.
unsigned Opc = getSUBriOpcode(IsLP64, CalleeAmt);		unsigned Opc = getSUBriOpcode(IsLP64, InternalAmt);
MachineInstr *New = BuildMI(MF, DL, TII.get(Opc), StackPtr)		MachineInstr *New = BuildMI(MF, DL, TII.get(Opc), StackPtr)
.addReg(StackPtr).addImm(CalleeAmt);		.addReg(StackPtr).addImm(InternalAmt);

// The EFLAGS implicit def is dead.		// The EFLAGS implicit def is dead.
New->getOperand(3).setIsDead();		New->getOperand(3).setIsDead();

// We are not tracking the stack pointer adjustment by the callee, so make		// We are not tracking the stack pointer adjustment by the callee, so make
// sure we restore the stack pointer immediately after the call, there may		// sure we restore the stack pointer immediately after the call, there may
// be spill code inserted between the CALL and ADJCALLSTACKUP instructions.		// be spill code inserted between the CALL and ADJCALLSTACKUP instructions.
MachineBasicBlock::iterator B = MBB.begin();		MachineBasicBlock::iterator B = MBB.begin();
while (I != B && !std::prev(I)->isCall())		while (I != B && !std::prev(I)->isCall())
--I;		--I;
MBB.insert(I, New);		MBB.insert(I, New);
}		}
}		}

lib/Target/X86/X86InstrCompiler.td



	// ADJCALLSTACKDOWN/UP implicitly use/def ESP because they may be expanded into			// ADJCALLSTACKDOWN/UP implicitly use/def ESP because they may be expanded into
	// a stack adjustment and the codegen must know that they may modify the stack			// a stack adjustment and the codegen must know that they may modify the stack
	// pointer before prolog-epilog rewriting occurs.			// pointer before prolog-epilog rewriting occurs.
	// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become			// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become
	// sub / add which can clobber EFLAGS.			// sub / add which can clobber EFLAGS.
	let Defs = [ESP, EFLAGS], Uses = [ESP] in {			let Defs = [ESP, EFLAGS], Uses = [ESP] in {
	def ADJCALLSTACKDOWN32 : I<0, Pseudo, (outs), (ins i32imm:$amt),			def ADJCALLSTACKDOWN32 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	"#ADJCALLSTACKDOWN",			"#ADJCALLSTACKDOWN",
	[(X86callseq_start timm:$amt)]>,			[]>,
	Requires<[NotLP64]>;			Requires<[NotLP64]>;
	def ADJCALLSTACKUP32 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),			def ADJCALLSTACKUP32 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	"#ADJCALLSTACKUP",			"#ADJCALLSTACKUP",
	[(X86callseq_end timm:$amt1, timm:$amt2)]>,			[(X86callseq_end timm:$amt1, timm:$amt2)]>,
	Requires<[NotLP64]>;			Requires<[NotLP64]>;
	}			}
				def : Pat<(X86callseq_start timm:$amt1),
				(ADJCALLSTACKDOWN32 i32imm:$amt1, 0)>, Requires<[NotLP64]>;


	// ADJCALLSTACKDOWN/UP implicitly use/def RSP because they may be expanded into			// ADJCALLSTACKDOWN/UP implicitly use/def RSP because they may be expanded into
	// a stack adjustment and the codegen must know that they may modify the stack			// a stack adjustment and the codegen must know that they may modify the stack
	// pointer before prolog-epilog rewriting occurs.			// pointer before prolog-epilog rewriting occurs.
	// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become			// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become
	// sub / add which can clobber EFLAGS.			// sub / add which can clobber EFLAGS.
	let Defs = [RSP, EFLAGS], Uses = [RSP] in {			let Defs = [RSP, EFLAGS], Uses = [RSP] in {
	def ADJCALLSTACKDOWN64 : I<0, Pseudo, (outs), (ins i32imm:$amt),			def ADJCALLSTACKDOWN64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	"#ADJCALLSTACKDOWN",			"#ADJCALLSTACKDOWN",
	[(X86callseq_start timm:$amt)]>,			[]>,
	Requires<[IsLP64]>;			Requires<[IsLP64]>;
	def ADJCALLSTACKUP64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),			def ADJCALLSTACKUP64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	"#ADJCALLSTACKUP",			"#ADJCALLSTACKUP",
	[(X86callseq_end timm:$amt1, timm:$amt2)]>,			[(X86callseq_end timm:$amt1, timm:$amt2)]>,
	Requires<[IsLP64]>;			Requires<[IsLP64]>;
	}			}
				def : Pat<(X86callseq_start timm:$amt1),
				(ADJCALLSTACKDOWN64 i32imm:$amt1, 0)>, Requires<[IsLP64]>;


	// x86-64 va_start lowering magic.			// x86-64 va_start lowering magic.
	let usesCustomInserter = 1, Defs = [EFLAGS] in {			let usesCustomInserter = 1, Defs = [EFLAGS] in {
	def VASTART_SAVE_XMM_REGS : I<0, Pseudo,			def VASTART_SAVE_XMM_REGS : I<0, Pseudo,
	(outs),			(outs),
	(ins GR8:$al,			(ins GR8:$al,
	i64imm:$regsavefi, i64imm:$offset,			i64imm:$regsavefi, i64imm:$offset,

lib/Target/X86/X86MachineFunctionInfo.h

class X86MachineFunctionInfo : public MachineFunctionInfo {
unsigned VarArgsGPOffset;		unsigned VarArgsGPOffset;
/// VarArgsFPOffset - X86-64 vararg func fp reg offset.		/// VarArgsFPOffset - X86-64 vararg func fp reg offset.
unsigned VarArgsFPOffset;		unsigned VarArgsFPOffset;
/// ArgumentStackSize - The number of bytes on stack consumed by the arguments		/// ArgumentStackSize - The number of bytes on stack consumed by the arguments
/// being passed on the stack.		/// being passed on the stack.
unsigned ArgumentStackSize;		unsigned ArgumentStackSize;
/// NumLocalDynamics - Number of local-dynamic TLS accesses.		/// NumLocalDynamics - Number of local-dynamic TLS accesses.
unsigned NumLocalDynamics;		unsigned NumLocalDynamics;
		/// HasPushSequences - Keeps track of whether this function uses sequences
		/// of pushes to pass function parameters.
		bool HasPushSequences;

private:		private:
/// ForwardedMustTailRegParms - A list of virtual and physical registers		/// ForwardedMustTailRegParms - A list of virtual and physical registers
/// that must be forwarded to every musttail call.		/// that must be forwarded to every musttail call.
SmallVector<ForwardedRegister, 1> ForwardedMustTailRegParms;		SmallVector<ForwardedRegister, 1> ForwardedMustTailRegParms;

public:		public:
X86MachineFunctionInfo() : ForceFramePointer(false),		X86MachineFunctionInfo() : ForceFramePointer(false),
RestoreBasePointerOffset(0),		RestoreBasePointerOffset(0),
CalleeSavedFrameSize(0),		CalleeSavedFrameSize(0),
BytesToPopOnReturn(0),		BytesToPopOnReturn(0),
ReturnAddrIndex(0),		ReturnAddrIndex(0),
TailCallReturnAddrDelta(0),		TailCallReturnAddrDelta(0),
SRetReturnReg(0),		SRetReturnReg(0),
GlobalBaseReg(0),		GlobalBaseReg(0),
VarArgsFrameIndex(0),		VarArgsFrameIndex(0),
RegSaveFrameIndex(0),		RegSaveFrameIndex(0),
VarArgsGPOffset(0),		VarArgsGPOffset(0),
VarArgsFPOffset(0),		VarArgsFPOffset(0),
ArgumentStackSize(0),		ArgumentStackSize(0),
NumLocalDynamics(0) {}		NumLocalDynamics(0),
		HasPushSequences(false) {}

explicit X86MachineFunctionInfo(MachineFunction &MF)		explicit X86MachineFunctionInfo(MachineFunction &MF)
: ForceFramePointer(false),		: ForceFramePointer(false),
RestoreBasePointerOffset(0),		RestoreBasePointerOffset(0),
CalleeSavedFrameSize(0),		CalleeSavedFrameSize(0),
BytesToPopOnReturn(0),		BytesToPopOnReturn(0),
ReturnAddrIndex(0),		ReturnAddrIndex(0),
TailCallReturnAddrDelta(0),		TailCallReturnAddrDelta(0),
SRetReturnReg(0),		SRetReturnReg(0),
GlobalBaseReg(0),		GlobalBaseReg(0),
VarArgsFrameIndex(0),		VarArgsFrameIndex(0),
RegSaveFrameIndex(0),		RegSaveFrameIndex(0),
VarArgsGPOffset(0),		VarArgsGPOffset(0),
VarArgsFPOffset(0),		VarArgsFPOffset(0),
ArgumentStackSize(0),		ArgumentStackSize(0),
NumLocalDynamics(0) {}		NumLocalDynamics(0),
		HasPushSequences(false) {}

bool getForceFramePointer() const { return ForceFramePointer;}		bool getForceFramePointer() const { return ForceFramePointer;}
void setForceFramePointer(bool forceFP) { ForceFramePointer = forceFP; }		void setForceFramePointer(bool forceFP) { ForceFramePointer = forceFP; }

		bool getHasPushSequences() const { return HasPushSequences; }
		void setHasPushSequences(bool HasPush) { HasPushSequences = HasPush; }

bool getRestoreBasePointer() const { return RestoreBasePointerOffset!=0; }		bool getRestoreBasePointer() const { return RestoreBasePointerOffset!=0; }
void setRestoreBasePointer(const MachineFunction *MF);		void setRestoreBasePointer(const MachineFunction *MF);
int getRestoreBasePointerOffset() const {return RestoreBasePointerOffset; }		int getRestoreBasePointerOffset() const {return RestoreBasePointerOffset; }

unsigned getCalleeSavedFrameSize() const { return CalleeSavedFrameSize; }		unsigned getCalleeSavedFrameSize() const { return CalleeSavedFrameSize; }
void setCalleeSavedFrameSize(unsigned bytes) { CalleeSavedFrameSize = bytes; }		void setCalleeSavedFrameSize(unsigned bytes) { CalleeSavedFrameSize = bytes; }

unsigned getBytesToPopOnReturn() const { return BytesToPopOnReturn; }		unsigned getBytesToPopOnReturn() const { return BytesToPopOnReturn; }
[40 unchanged lines hidden]

lib/Target/X86/X86TargetMachine.cpp

[148 unchanged lines hidden]	public:

const X86Subtarget &getX86Subtarget() const {		const X86Subtarget &getX86Subtarget() const {
return *getX86TargetMachine().getSubtargetImpl();		return *getX86TargetMachine().getSubtargetImpl();
}		}

void addIRPasses() override;		void addIRPasses() override;
bool addInstSelector() override;		bool addInstSelector() override;
bool addILPOpts() override;		bool addILPOpts() override;
		void addPreRegAlloc() override;
void addPostRegAlloc() override;		void addPostRegAlloc() override;
void addPreEmitPass() override;		void addPreEmitPass() override;
};		};
} // namespace		} // namespace

TargetPassConfig *X86TargetMachine::createPassConfig(PassManagerBase &PM) {		TargetPassConfig *X86TargetMachine::createPassConfig(PassManagerBase &PM) {
return new X86PassConfig(this, PM);		return new X86PassConfig(this, PM);
}		}
[17 unchanged lines hidden]	bool X86PassConfig::addInstSelector() {
return false;		return false;
}		}

bool X86PassConfig::addILPOpts() {		bool X86PassConfig::addILPOpts() {
addPass(&EarlyIfConverterID);		addPass(&EarlyIfConverterID);
return true;		return true;
}		}

		void X86PassConfig::addPreRegAlloc() {
		addPass(createX86ConvertMovsToPushes());
		}

void X86PassConfig::addPostRegAlloc() {		void X86PassConfig::addPostRegAlloc() {
addPass(createX86FloatingPointStackifierPass());		addPass(createX86FloatingPointStackifierPass());
}		}

void X86PassConfig::addPreEmitPass() {		void X86PassConfig::addPreEmitPass() {
if (getOptLevel() != CodeGenOpt::None && getX86Subtarget().hasSSE2())		if (getOptLevel() != CodeGenOpt::None && getX86Subtarget().hasSSE2())
addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));		addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));

if (UseVZeroUpper)		if (UseVZeroUpper)
addPass(createX86IssueVZeroUpperPass());		addPass(createX86IssueVZeroUpperPass());

if (getOptLevel() != CodeGenOpt::None) {		if (getOptLevel() != CodeGenOpt::None) {
addPass(createX86PadShortFunctions());		addPass(createX86PadShortFunctions());
addPass(createX86FixupLEAs());		addPass(createX86FixupLEAs());
}		}
}		}

test/CodeGen/X86/inalloca-invoke.ll

[25 unchanged lines hidden]	; CHECK: leal 12(%[[beg]]), %[[end:[^ ]*]]

call void @begin(%Iter* sret %temp.lvalue)		call void @begin(%Iter* sret %temp.lvalue)
; CHECK: calll _begin		; CHECK: calll _begin

invoke void @plus(%Iter* sret %end, %Iter* %temp.lvalue, i32 4)		invoke void @plus(%Iter* sret %end, %Iter* %temp.lvalue, i32 4)
to label %invoke.cont unwind label %lpad		to label %invoke.cont unwind label %lpad

; Uses end as sret param.		; Uses end as sret param.
; CHECK: movl %[[end]], (%esp)		; CHECK: pushl %[[end]]
; CHECK: calll _plus		; CHECK: calll _plus

invoke.cont:		invoke.cont:
call void @begin(%Iter* sret %beg)		call void @begin(%Iter* sret %beg)

; CHECK: pushl %[[beg]]		; CHECK: pushl %[[beg]]
; CHECK: calll _begin		; CHECK: calll _begin

[12 unchanged lines hidden]

test/CodeGen/X86/movtopush.ll

	; RUN: llc < %s -mtriple=i686-windows \| FileCheck %s -check-prefix=NORMAL			; RUN: llc < %s -mtriple=i686-windows \| FileCheck %s -check-prefix=NORMAL
				; RUN: llc < %s -mtriple=x86_64-windows \| FileCheck %s -check-prefix=X64
	; RUN: llc < %s -mtriple=i686-windows -force-align-stack -stack-alignment=32 \| FileCheck %s -check-prefix=ALIGNED			; RUN: llc < %s -mtriple=i686-windows -force-align-stack -stack-alignment=32 \| FileCheck %s -check-prefix=ALIGNED
	declare void @good(i32 %a, i32 %b, i32 %c, i32 %d)			declare void @good(i32 %a, i32 %b, i32 %c, i32 %d)
	declare void @inreg(i32 %a, i32 inreg %b, i32 %c, i32 %d)			declare void @inreg(i32 %a, i32 inreg %b, i32 %c, i32 %d)

	; Here, we should have a reserved frame, so we don't expect pushes			; Here, we should have a reserved frame, so we don't expect pushes
	; NORMAL-LABEL: test1			; NORMAL-LABEL: test1:
	; NORMAL: subl $16, %esp			; NORMAL: subl $16, %esp
	; NORMAL-NEXT: movl $4, 12(%esp)			; NORMAL-NEXT: movl $4, 12(%esp)
	; NORMAL-NEXT: movl $3, 8(%esp)			; NORMAL-NEXT: movl $3, 8(%esp)
	; NORMAL-NEXT: movl $2, 4(%esp)			; NORMAL-NEXT: movl $2, 4(%esp)
	; NORMAL-NEXT: movl $1, (%esp)			; NORMAL-NEXT: movl $1, (%esp)
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test1() {			define void @test1() {
	entry:			entry:
	call void @good(i32 1, i32 2, i32 3, i32 4)			call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Here, we expect a sequence of 4 immediate pushes			; We're optimizing for code size, so we should get pushes for x86.
	; NORMAL-LABEL: test2			; Make sure we don't touch x86-64
				; NORMAL-LABEL: test1b:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: pushl $4
				; NORMAL-NEXT: pushl $3
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				; X64-LABEL: test1b:
				; X64: movl $1, %ecx
				; X64-NEXT: movl $2, %edx
				; X64-NEXT: movl $3, %r8d
				; X64-NEXT: movl $4, %r9d
				; X64-NEXT: callq good
				define void @test1b() optsize {
				entry:
				call void @good(i32 1, i32 2, i32 3, i32 4)
				ret void
				}

				; Same as above, but for minsize
				; NORMAL-LABEL: test1c:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: pushl $4
				; NORMAL-NEXT: pushl $3
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				define void @test1c() minsize {
				entry:
				call void @good(i32 1, i32 2, i32 3, i32 4)
				ret void
				}

				; If we don't have a reserved frame, we should have pushes
				; NORMAL-LABEL: test2:
	; NORMAL-NOT: subl {{.*}} %esp			; NORMAL-NOT: subl {{.*}} %esp
	; NORMAL: pushl $4			; NORMAL: pushl $4
	; NORMAL-NEXT: pushl $3			; NORMAL-NEXT: pushl $3
	; NORMAL-NEXT: pushl $2			; NORMAL-NEXT: pushl $2
	; NORMAL-NEXT: pushl $1			; NORMAL-NEXT: pushl $1
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test2(i32 %k) {			define void @test2(i32 %k) {
	entry:			entry:
	%a = alloca i32, i32 %k			%a = alloca i32, i32 %k
	call void @good(i32 1, i32 2, i32 3, i32 4)			call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Again, we expect a sequence of 4 immediate pushes			; Again, we expect a sequence of 4 immediate pushes
	; Checks that we generate the right pushes for >8bit immediates			; Checks that we generate the right pushes for >8bit immediates
	; NORMAL-LABEL: test2b			; NORMAL-LABEL: test2b:
	; NORMAL-NOT: subl {{.*}} %esp			; NORMAL-NOT: subl {{.*}} %esp
	; NORMAL: pushl $4096			; NORMAL: pushl $4096
	; NORMAL-NEXT: pushl $3072			; NORMAL-NEXT: pushl $3072
	; NORMAL-NEXT: pushl $2048			; NORMAL-NEXT: pushl $2048
	; NORMAL-NEXT: pushl $1024			; NORMAL-NEXT: pushl $1024
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test2b(i32 %k) {			define void @test2b() optsize {
	entry:			entry:
	%a = alloca i32, i32 %k
	call void @good(i32 1024, i32 2048, i32 3072, i32 4096)			call void @good(i32 1024, i32 2048, i32 3072, i32 4096)
	ret void			ret void
	}			}

	; The first push should push a register			; The first push should push a register
	; NORMAL-LABEL: test3			; NORMAL-LABEL: test3:
	; NORMAL-NOT: subl {{.*}} %esp			; NORMAL-NOT: subl {{.*}} %esp
	; NORMAL: pushl $4			; NORMAL: pushl $4
	; NORMAL-NEXT: pushl $3			; NORMAL-NEXT: pushl $3
	; NORMAL-NEXT: pushl $2			; NORMAL-NEXT: pushl $2
	; NORMAL-NEXT: pushl %e{{..}}			; NORMAL-NEXT: pushl %e{{..}}
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test3(i32 %k) {			define void @test3(i32 %k) optsize {
	entry:			entry:
	%a = alloca i32, i32 %k
	call void @good(i32 %k, i32 2, i32 3, i32 4)			call void @good(i32 %k, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; We don't support weird calling conventions			; We don't support weird calling conventions
	; NORMAL-LABEL: test4			; NORMAL-LABEL: test4:
	; NORMAL: subl $12, %esp			; NORMAL: subl $12, %esp
	; NORMAL-NEXT: movl $4, 8(%esp)			; NORMAL-NEXT: movl $4, 8(%esp)
	; NORMAL-NEXT: movl $3, 4(%esp)			; NORMAL-NEXT: movl $3, 4(%esp)
	; NORMAL-NEXT: movl $1, (%esp)			; NORMAL-NEXT: movl $1, (%esp)
	; NORMAL-NEXT: movl $2, %eax			; NORMAL-NEXT: movl $2, %eax
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test4(i32 %k) {			define void @test4() optsize {
	entry:			entry:
	%a = alloca i32, i32 %k
	call void @inreg(i32 1, i32 2, i32 3, i32 4)			call void @inreg(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Check that additional alignment is added when the pushes			; When there is no reserved call frame, check that additional alignment
	; don't add up to the required alignment.			; is added when the pushes don't add up to the required alignment.
	; ALIGNED-LABEL: test5			; ALIGNED-LABEL: test5:
	; ALIGNED: subl $16, %esp			; ALIGNED: subl $16, %esp
	; ALIGNED-NEXT: pushl $4			; ALIGNED-NEXT: pushl $4
	; ALIGNED-NEXT: pushl $3			; ALIGNED-NEXT: pushl $3
	; ALIGNED-NEXT: pushl $2			; ALIGNED-NEXT: pushl $2
	; ALIGNED-NEXT: pushl $1			; ALIGNED-NEXT: pushl $1
	; ALIGNED-NEXT: call			; ALIGNED-NEXT: call
	define void @test5(i32 %k) {			define void @test5(i32 %k) {
	entry:			entry:
	%a = alloca i32, i32 %k			%a = alloca i32, i32 %k
	call void @good(i32 1, i32 2, i32 3, i32 4)			call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}
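
Annotation: the `subl $16, %esp` expected under the ALIGNED prefix in test5 can be reproduced with simple rounding arithmetic. The following is a minimal sketch, assuming 4-byte push slots and assuming the extra adjustment is whatever rounds the pushed bytes up to the requested stack alignment. `paddingBeforePushes` is a hypothetical helper written for illustration; it is not a function from this patch or from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes of extra stack adjustment needed so that the
// adjustment plus the pushed bytes is a multiple of the stack alignment.
// Assumes stackAlign is non-zero.
constexpr uint32_t paddingBeforePushes(uint32_t numPushes, uint32_t slotBytes,
                                       uint32_t stackAlign) {
  // Total bytes the push sequence moves esp by.
  const uint32_t pushed = numPushes * slotBytes;
  // Round the pushed byte count up to the next multiple of the alignment.
  const uint32_t rounded = (pushed + stackAlign - 1) / stackAlign * stackAlign;
  return rounded - pushed;
}
```

For the four 4-byte pushes in test5 with `-stack-alignment=32`, `paddingBeforePushes(4, 4, 32)` evaluates to 16, which agrees with the `subl $16, %esp` check line. With an alignment of 4 the result is 0, so no extra adjustment would be needed.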

	; Check that pushing the addresses of globals (Or generally, things that			; Check that pushing the addresses of globals (Or generally, things that
	; aren't exactly immediates) isn't broken.			; aren't exactly immediates) isn't broken.
	; Fixes PR21878.			; Fixes PR21878.
	; NORMAL-LABEL: test6			; NORMAL-LABEL: test6:
	; NORMAL: pushl $_ext			; NORMAL: pushl $_ext
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	declare void @f(i8*)			declare void @f(i8*)
	@ext = external constant i8			@ext = external constant i8

	define void @test6() {			define void @test6() {
	call void @f(i8* @ext)			call void @f(i8* @ext)
	br label %bb			br label %bb
	bb:			bb:
	alloca i32			alloca i32
	ret void			ret void
	}			}

				; Check that we fold simple cases into the push
				; NORMAL-LABEL: test7:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: movl 4(%esp), [[EAX:%e..]]
				; NORMAL-NEXT: pushl $4
				; NORMAL-NEXT: pushl ([[EAX]])
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				define void @test7(i32* %ptr) optsize {
				entry:
				%val = load i32* %ptr
				call void @good(i32 1, i32 2, i32 %val, i32 4)
				ret void
				}

				; But we don't want to fold stack-relative loads into the push,
				; because the offset will be wrong
				; NORMAL-LABEL: test8:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: movl 4(%esp), [[EAX:%e..]]
				; NORMAL-NEXT: pushl $4
				; NORMAL-NEXT: pushl [[EAX]]
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				define void @test8(i32* %ptr) optsize {
				entry:
				%val = ptrtoint i32* %ptr to i32
				call void @good(i32 1, i32 2, i32 %val, i32 4)
				ret void
				}
				No newline at end of file
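
Annotation: test8 expects the stack-relative load to stay in a register instead of being folded into the push, "because the offset will be wrong". The toy model below sketches why, assuming a downward-growing stack of 4-byte slots. `ToyStack`, `kArg`, `kRetAddr` and the two functions are hypothetical names introduced for this illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a downward-growing 32-bit stack; addresses are byte offsets.
struct ToyStack {
  std::vector<uint32_t> slots;
  uint32_t esp;

  explicit ToyStack(uint32_t bytes) : slots(bytes / 4, 0), esp(bytes) {}

  // Models pushl: decrement esp by 4, then store the value at (%esp).
  void push(uint32_t value) {
    esp -= 4;
    slots[esp / 4] = value;
  }

  // Reads the 4-byte slot at off(%esp).
  uint32_t loadEspRel(uint32_t off) const { return slots[(esp + off) / 4]; }
};

constexpr uint32_t kArg = 0xAAAA;      // hypothetical incoming argument value
constexpr uint32_t kRetAddr = 0x1111;  // hypothetical return address

// State on function entry: return address at (%esp), argument at 4(%esp).
inline ToyStack makeEntryStack() {
  ToyStack s(64);
  s.push(kArg);
  s.push(kRetAddr);
  return s;
}

// test8 shape: read 4(%esp) into a register first, then push the register.
inline uint32_t pushViaRegister() {
  ToyStack s = makeEntryStack();
  const uint32_t reg = s.loadEspRel(4);  // movl 4(%esp), %eax
  s.push(4);                             // pushl $4
  s.push(reg);                           // pushl %eax
  return s.loadEspRel(0);                // the value that was just pushed
}

// Naive fold: the load is evaluated after esp has already moved by one push,
// so the unadjusted offset 4 now names a different slot.
inline uint32_t pushFoldedNaively() {
  ToyStack s = makeEntryStack();
  s.push(4);                // pushl $4
  s.push(s.loadEspRel(4));  // pushl 4(%esp), offset not adjusted
  return s.loadEspRel(0);
}
```

In this model `pushViaRegister()` pushes the argument, while `pushFoldedNaively()` pushes the return address, because the preceding `pushl $4` moved `esp` by 4 bytes before the folded load was evaluated. A load through an ordinary pointer register, as in test7, is not affected by the moving stack pointer.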
rnk: Test case suggestions:

; Where the callee is indirect via the stack, `call <fi>`
define void @test10() optsize {
%stack_fptr = alloca void (i32, i32, i32, i32)*
store void (i32, i32, i32, i32)* @good, void (i32, i32, i32, i32)** %stack_fptr
%good_ptr = load void (i32, i32, i32, i32)** %stack_fptr
call void (i32, i32, i32, i32)* %good_ptr(i32 1, i32 2, i32 3, i32 4)
ret void
}

; We can't fold the load into the push here, skipping the store.
@the_global = global i32
define void @test11() optsize {
%myload = load i32* @the_global
store i32 42, i32* @the_global
call void @good(i32 %myload, i32 2, i32 3, i32 4)
ret void
}