This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/trunk/
  include/llvm/Target/
    TargetFrameLowering.h
  lib/CodeGen/
    PrologEpilogInserter.cpp
    TargetFrameLoweringImpl.cpp
  lib/Target/X86/
    CMakeLists.txt
    X86.h
    X86CallFrameOptimization.cpp
    X86FastISel.cpp
    X86FrameLowering.h
    X86FrameLowering.cpp
    X86InstrCompiler.td
    X86InstrInfo.h
    X86InstrInfo.cpp
    X86MachineFunctionInfo.h
    X86RegisterInfo.cpp
    X86TargetMachine.cpp
  test/CodeGen/X86/
    inalloca-invoke.ll
    movtopush.ll

Differential D6789

[X86] Convert esp-relative movs of function arguments to pushes, step 2
ClosedPublic

Authored by mkuper on Dec 28 2014, 6:11 AM.

Details

Reviewers

nadav
delena
rnk

Commits

rG13fbd4526336: [X86] Convert esp-relative movs of function arguments to pushes, step 2
rGbd57186c763f: [X86] Convert esp-relative movs of function arguments to pushes, step 2
rL227752: [X86] Convert esp-relative movs of function arguments to pushes, step 2
rL227728: [X86] Convert esp-relative movs of function arguments to pushes, step 2

Summary

This is a first stab at the next step of the mov-to-push transformation.

It moves the transformation earlier in the pass order so that it can do load-folding, and prepares the required infrastructure.
It is still enabled only in cases where it should be a clear win - when we don't expect to have a reserved call frame, or when optimizing for size.
The next step will be a heuristic that makes a smarter decision on when this should be enabled.

As a side note - I've done some internal testing for effects on the code size, but I'd like to do some testing for things other people care about as well. So, if you have an x86-32 code-base whose code size you care about, and it is publicly available, let me know.

Diff Detail

Repository: rL LLVM

Event Timeline

mkuper updated this revision to Diff 17656. Dec 28 2014, 6:11 AM

mkuper retitled this revision from to [X86] Convert esp-relative movs of function arguments to pushes, step 2.

mkuper updated this object.

mkuper edited the test plan for this revision.

mkuper added reviewers: nadav, rnk, delena.

mkuper added a subscriber: Unknown Object (MLST).

Removed a horrible hack that was, in addition to being horrible, completely wrong, and added a test-case to cover the issue.

Also, ping?

I suggest also checking varargs and stdcall functions, where the callee clears the stack.

lib/Target/X86/CMakeLists.txt
17 ↗	(On Diff #17788)	Can you add this code to X86FrameLowering.cpp ?
lib/Target/X86/X86.h
70 ↗	(On Diff #17788)	I suggest to choose another name, something like optimizeCallFrameForSize
lib/Target/X86/X86ConvertMovsToPushes.cpp
39 ↗	(On Diff #17788)	I don't think that we really need this knob.
113 ↗	(On Diff #17788)	If you change instructions inside bb, your iterator may be broken.
212 ↗	(On Diff #17788)	It should be immediate, right? Can we have a relocation here?
224 ↗	(On Diff #17788)	SlowPush should be a property of the target, like slowLea
226 ↗	(On Diff #17788)	The comment is missing here.

Thanks, Elena!
Will upload a new version.

lib/Target/X86/X86.h
70 ↗	(On Diff #17788)	I wasn't happy with the name either, but didn't have any good ideas at the time. Will do.
lib/Target/X86/X86ConvertMovsToPushes.cpp
39 ↗	(On Diff #17788)	I'd rather keep this knob, it's fairly useful for debugging. Of course, it's internal only, not exposed to clang.
113 ↗	(On Diff #17788)	As far as I know, MBB iterators aren't invalidated by removing other instructions, and we don't remove the FrameSetup itself. But it's probably better to keep going from the FrameDestroy instead of the next instruction. Will change that.
212 ↗	(On Diff #17788)	It can be a relocation, but in that case, isImm() will fail. Will document that more clearly
224 ↗	(On Diff #17788)	I agree. Unfortunately, I've run out of bits. The Subtarget features are 64-bit bitfield, and they're all taken.
lib/Target/X86/X86RegisterInfo.cpp
508 ↗	(On Diff #17788)	And, apparently, this is still wrong, because eliminateCallFramePseudoInstr() may actually adjust the SP by a different amount than what PEI passes as the SPAdj, e.g. due to stack alignment concerns.
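
The mismatch described in the last comment (the real SP change differing from the nominal SPAdj because of stack alignment) can be illustrated with a small sketch. The helper names below are hypothetical and are not part of the patch; the sketch only assumes that a call frame is rounded up to the stack alignment.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Bytes up to the next multiple of Align. Align must be a power of two.
inline uint32_t alignUp(uint32_t Bytes, uint32_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Bytes + Align - 1) & ~(Align - 1);
}

// Hypothetical helper: the amount by which SP really moves for a call whose
// arguments occupy ArgBytes, when the outgoing stack must stay aligned.
inline uint32_t actualCallFrameBytes(uint32_t ArgBytes, uint32_t StackAlign) {
  return alignUp(ArgBytes, StackAlign);
}

// Hypothetical helper: padding that must be subtracted from SP before the
// pushes so that SP is aligned again at the call instruction.
inline uint32_t paddingBeforePushes(uint32_t ArgBytes, uint32_t StackAlign) {
  return actualCallFrameBytes(ArgBytes, StackAlign) - ArgBytes;
}
```

With one 4-byte argument and 16-byte alignment, the nominal adjustment is 4 bytes but the real one is 16, of which 12 are padding. Any code that tracks SP using only the nominal amount would be off by that padding.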

mkuper added inline comments. Jan 6 2015, 3:19 AM

lib/Target/X86/X86ConvertMovsToPushes.cpp
285 ↗	(On Diff #17788)	Argh. This is nonsense. Commented one thing, coded another... (mayStore() is extremely far from being a strong enough condition to allow this.)

rnk added inline comments. Jan 12 2015, 4:02 PM

lib/Target/X86/X86ConvertMovsToPushes.cpp
39 ↗	(On Diff #17788)	I would also like this as a temporary testing knob so that I can evaluate this across a large codebase.

So, this version should actually work (e.g. it can self-host and pass check-llvm, without the stackalign restriction of course, since that currently makes it a nop except on Windows).
Unfortunately, it has several big warts, so I'm not planning to commit it as is. This is more of a request for ideas on how to improve the code.

So, any ideas on how to make this sane, especially X86InstrInfo::getSPAdjust(), are welcome.

I haven't finished reviewing yet, but I've got to run and handle something personal.

At a high level, is there any reason we shouldn't commit to push/pop earlier to allow for better ISel, rather than trying to transform call sequences later? Specifically, I'm thinking about adding an X86ISD::PUSH DAG node and changing X86TargetLowering::LowerCall() to use it.

lib/CodeGen/PrologEpilogInserter.cpp
855–856 ↗	(On Diff #18084)	This seems like an x86-specific quirk, right? Given "push [esp + 8]", x86 chips will load [esp + 8] before adjusting esp, and I think this code motion accomplishes that. I'm OK with that motion so long as there are no other upstream LLVM backends with CISC-y instructions like "push [SP-mem]". :)
lib/Target/X86/X86ConvertMovsToPushes.cpp
11 ↗	(On Diff #18084)	s/stck/stack/
82–83 ↗	(On Diff #18084)	I think it's important to at least support __thiscall eventually, since that's a very common convention with one regparm.
84–86 ↗	(On Diff #18084)	I guess I would justify this more in terms of reducing the extra CFI that we would have to emit to describe the SP adjustments. Converting a few movs to pushes isn't worth the complexity.
143 ↗	(On Diff #18084)	Can you explain why this is unprofitable? I guess if we get here we are in dynamic alloca plus stack realignment land, i.e. the worst thing that could possibly happen. Is this about extra code for preserving the outgoing stack alignment then? Like on Linux, where we provide 16 byte stack alignment?

Thanks, Reid!

Waiting for the second part, you didn't get to the really horrible stuff yet...

Regarding the high level, two reasons:

It seemed like it was going to be simpler. I'm not so sure anymore, but I still think it is. (Note that we'll still need to fix all of the code that tracks SP adjustment, that's not going away in either case).
The main problem is that next step after this is going to be a function-scope heuristic. To use this transformation for even one call-site, I have to disable the reserved frame for the whole function. So, I need to try to approximate the impact on the whole function (which contains some calls that will be converted to use pushes, and some calls that won't be). I don't see how this can be done on the DAG level.

lib/CodeGen/PrologEpilogInserter.cpp
855–856 ↗	(On Diff #18084)	This call to SPAdjust() always returns 0 right now (barring the code in this patch), it was added as part of my refactoring in D6863, and I added it in the wrong place. The motivation here wasn't a push, actually, since I try to never generate push [esp + 8], that's filtered out by the code in the optimization. Although I can probably start generating them - I was trying to filter them out precisely because I didn't want all of this complexity at the first stage, but apparently it's necessary. The problem is that once we don't have a reserved call frame (regardless of the push transformation), you can have things like CALL32r <fi#1>, where the call is callee-pop. So you need to resolve the indirect call using the stack-pointer from before the call.
lib/Target/X86/X86ConvertMovsToPushes.cpp
82–83 ↗	(On Diff #18084)	Yes, and maybe even for _fastcall (It looks like gcc will do this for fastcall, icc won't). But I am still trying to do this gradually, to the extent that I can. :-)
84–86 ↗	(On Diff #18084)	You're right, that too.
143 ↗	(On Diff #18084)	If we get here, we're in opt-for-size + stack-realignment land. And, yes, that's exactly what it is about. If you are passing only one parameter, the original code would be:
    mov %eax, 128(%esp)
    call $foo
Without re-alignment, you have:
    push %eax
    call $foo
    add $4, %esp
which is still a win in terms of code size. With re-alignment, you get:
    sub $16, %esp
    push %eax
    call $foo
    add $12, %esp
Which is... questionable. The code size for the sequence is the same (in this case, 7 bytes for both, not including the call), but if you have other call sites which you didn't convert, you may actually lose. And, of course, you lose performance (3 instructions instead of 1) without anything to show for it. Once there is a heuristic that tries to estimate the overhead, we can address this on a case-by-case basis (e.g. if we have 16-byte stack re-alignment, but most call-sites have a lot of parameters, then it's still worth it).
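
The byte counts quoted in the comment above can be reproduced with a rough model of the 32-bit encodings. This is only a sketch: the function names are made up for illustration, it assumes register arguments and a caller-cleanup call, and it ignores the call instruction itself, as the comment does.

```cpp
#include <cassert>

// Size in bytes of `mov %reg, Disp(%esp)` in 32-bit mode:
// opcode + ModRM + SIB, plus no displacement, a disp8, or a disp32.
inline unsigned movToEspBytes(int Disp) {
  if (Disp == 0)
    return 3;
  if (Disp >= -128 && Disp <= 127)
    return 4;
  return 7;
}

// Size in bytes of a push-based sequence for NumRegArgs register arguments:
// one byte per `push %reg`, three bytes for the trailing `add $imm8, %esp`,
// and three more for a leading `sub $imm8, %esp` when padding is needed.
inline unsigned pushSequenceBytes(unsigned NumRegArgs, bool PadForAlignment) {
  const unsigned PushRegBytes = 1;
  const unsigned AdjustEspBytes = 3;
  return NumRegArgs * PushRegBytes + AdjustEspBytes +
         (PadForAlignment ? AdjustEspBytes : 0);
}
```

For the single-argument example, `movToEspBytes(128)` gives 7, `pushSequenceBytes(1, false)` gives 4, and `pushSequenceBytes(1, true)` gives 7, which matches the "same size, more instructions" observation.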

rnk added inline comments. Jan 13 2015, 3:58 PM

lib/Target/X86/X86ConvertMovsToPushes.cpp
128–130 ↗	(On Diff #18084)	I think I misinterpreted this on the first pass. We always expect this to be profitable if we know we can't reserve space for the call frame. Maybe rename the bool to CannotReserveFrame to match the sense?
143 ↗	(On Diff #18084)	Based on my misinterpretation, I think I understand why you get this code. SP is assumed to be aligned coming into the sequence. We realign SP after dynamic allocas. The sequence is probably more like:
    sub $12, %esp
    push %eax
    call $foo
    add $16, %esp
I can see why this is less profitable.
208 ↗	(On Diff #18084)	std::map is really malloc heavy. This can probably be a SmallVector<MachineInstr*, 8> or something, mapping slot index to the MI that fills it. The frame setup opcode should tell you how much stack space to allocate up front, and you can index into the vector by StackOffset / 4.
220–222 ↗	(On Diff #18084)	This seems worth tackling, given that you had to handle the `call <fi>` case. :)
364–368 ↗	(On Diff #18084)	It's not clear to me that same BB is sufficient, consider this potential BB:
    movl (%edi), %eax
    movl $42, (%edi)
    <call setup>
    movl %eax, (%esp)
    calll foo
    <call end>
We can't move the load if there is a potentially aliasing store in the way. There might be a utility to help with the aliasing query, or you can assume that any stores other than arg stores might alias it and bail on that.
lib/Target/X86/X86InstrInfo.cpp
1717–1718 ↗	(On Diff #18084)	This is the best thing I can think of at the moment. =/
test/CodeGen/X86/movtopush.ll
206 ↗	(On Diff #18084)	Test case suggestions:
    ; Where the callee is indirect via the stack, `call <fi>`
    define void @test10() optsize {
      %stack_fptr = alloca void (i32, i32, i32, i32)*
      store void (i32, i32, i32, i32)* @good, void (i32, i32, i32, i32)** %stack_fptr
      %good_ptr = load void (i32, i32, i32, i32)** %stack_fptr
      call void (i32, i32, i32, i32)* %good_ptr(i32 1, i32 2, i32 3, i32 4)
      ret void
    }
    ; We can't fold the load into the push here, skipping the store.
    @the_global = global i32
    define void @test11() optsize {
      %myload = load i32* @the_global
      store i32 42, i32* @the_global
      call void @good(i32 %myload, i32 2, i32 3, i32 4)
      ret void
    }
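
The slot-indexed vector suggested in the comment on line 208 could look roughly like the sketch below. It uses `std::vector<int>` and integer instruction ids as stand-ins for `SmallVector<MachineInstr*, 8>` so that it is self-contained; the struct and function names are hypothetical and not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an esp-relative argument store: the stack offset it writes to
// and a non-negative id identifying the instruction.
struct ArgStore {
  int StackOffset;
  int InstrId;
};

// Fills Slots so that Slots[Offset / 4] is the id of the store writing that
// slot. Returns false if a store is misaligned or out of range, if a slot is
// written twice, or if any slot of the FrameBytes-sized frame stays empty.
inline bool collectArgSlots(const std::vector<ArgStore> &Stores,
                            unsigned FrameBytes, std::vector<int> &Slots) {
  if (FrameBytes % 4 != 0)
    return false;
  Slots.assign(FrameBytes / 4, -1);
  for (const ArgStore &S : Stores) {
    if (S.StackOffset < 0 || S.StackOffset % 4 != 0)
      return false;
    std::size_t Idx = static_cast<std::size_t>(S.StackOffset) / 4;
    if (Idx >= Slots.size() || Slots[Idx] != -1)
      return false;
    Slots[Idx] = S.InstrId;
  }
  for (int Id : Slots)
    if (Id == -1)
      return false;
  return true;
}

// Pushes are emitted from the highest slot down, so that the value for
// offset 0 ends up on top of the stack.
inline std::vector<int> pushOrder(const std::vector<int> &Slots) {
  return std::vector<int>(Slots.rbegin(), Slots.rend());
}
```

Compared with a map, the vector is sized once from the frame-setup amount and each lookup is a plain index, which is the allocation saving the comment is after.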

Thanks, Reid!

lib/Target/X86/X86ConvertMovsToPushes.cpp
128–130 ↗	(On Diff #18084)	Err, yes, you're right, sorry about that... got distracted while naming the variable, I guess, I meant the opposite. Thanks!
143 ↗	(On Diff #18084)	Yes, that sequence. :-) It doesn't depend on dynamic allocas, though. If you don't have a reserved frame (for whatever reason - for x86 after this patch, it's either dynamic allocas, or because we forced it not to reserve by using pushes), then you need this re-alignment.
208 ↗	(On Diff #18084)	That can work. Thanks, I'll try.
220–222 ↗	(On Diff #18084)	Yes, definitely. :-) It may even work out of the box now. But I think I still want to split it into a separate commit.
364–368 ↗	(On Diff #18084)	Right now I'm way more conservative than even that - I'm checking below that everything between this mov and the call setup is a MOV32rm. The "same basic block" check here is just a way to short-circuit the obviously wrong cases. This catches some common cases like the one in the comment above, but of course misses other opportunities. I could check for a mayStore() instead, but I'm not sure that's safe enough. I'd like to relax the condition - but again, I think that ought to be a separate commit.
lib/Target/X86/X86InstrInfo.cpp
1717–1718 ↗	(On Diff #18084)	Too bad. :-\ So you think I should commit with this code as is? This shouldn't be a huge problem in terms of compile-time (since I'm looking only until the next call, it can't go quadratic), but it's insanely ugly.
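
The conservative folding condition described in the reply on lines 364–368 (everything between the load and the call setup must itself be a plain load) can be sketched on a toy instruction model. The `Op` enum and the function name are hypothetical; `Op::Load32` stands in for MOV32rm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy opcode classification for one basic block.
enum class Op { Load32, Store32, Other };

// Returns true if the load at LoadIdx may be folded into a push placed at
// CallSetupIdx. The check is deliberately conservative: every instruction
// strictly between the two positions must be another plain load, so no
// store (or anything else) can change the loaded memory in between.
inline bool canFoldLoadIntoPush(const std::vector<Op> &Block,
                                std::size_t LoadIdx,
                                std::size_t CallSetupIdx) {
  if (LoadIdx >= CallSetupIdx || CallSetupIdx > Block.size())
    return false;
  if (Block[LoadIdx] != Op::Load32)
    return false;
  for (std::size_t I = LoadIdx + 1; I < CallSetupIdx; ++I)
    if (Block[I] != Op::Load32)
      return false;
  return true;
}
```

On the sequence from the earlier comment (a load followed by `movl $42, (%edi)`), the check fails because of the intervening store. It also rejects harmless non-load instructions, which is the missed opportunity the reply acknowledges.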

Applied review comments
Fixed another bug in the way PEI was handling push sequences (argh) - this required adding a target query.
Made the tests check a bit more (which would have exposed the bug above earlier).

rnk added inline comments. Jan 15 2015, 10:19 AM

lib/Target/X86/X86InstrInfo.cpp
1717–1718 ↗	(On Diff #18084)	Yeah, if we go with this MI pass approach to mov -> push conversion, then we'll have to keep this ADJCALLSTACKUP scan. We aren't going to move the callee cleanup stack adjustment onto the CALL instr without major changes.
1745 ↗	(On Diff #18084)	I wonder if it's possible for __readeflags() (pushf ; pop %reg) or others to get folded into a call sequence. Probably not.

mkuper added inline comments. Jan 16 2015, 5:57 AM

lib/Target/X86/X86InstrInfo.cpp
1717–1718 ↗	(On Diff #18084)	This will have to happen regardless of the MI pass vs. DAG approach. I mean, I still think doing it on the DAG is unfeasible, but even if we could do that, it wouldn't help. This code is used for the case where fi resolution needs to handle a sequence where there is a fi reference between the call and the adjcallstackup, with callee cleanup for the call. This is just a side effect of making canSimplifyCallFramePseudos return false.
1745 ↗	(On Diff #18084)	I don't see how it could happen. In any case, we won't match either the pushf or the pop, so it should be ok.

lgtm

I still think forming pushes prior to isel is the way to go long term. It's a lot easier to convert pushes to 'load, SP adjust, store' than it is to go the other way.

include/llvm/Target/TargetFrameLowering.h
196 ↗	(On Diff #18222)	"- Do" uppercase
lib/Target/X86/X86ConvertMovsToPushes.cpp
100 ↗	(On Diff #18222)	Can this be `for (MachineBasicBlock &BB : *MF) {`?
102 ↗	(On Diff #18222)	Ditto, `for (MachineInstr &MI : BB) {` ?
lib/Target/X86/X86InstrInfo.cpp
1717–1718 ↗	(On Diff #18084)	I was imagining in the DAG LowerCall implementation we emit FrameIndex operands with some kind of SP offset to indicate the current stack level. We'd end up with MI looking like this:
    ADJCALLSTACKDOWN32 <N> ; N is <size-of-args> % <stack-alignment>, which is usually zero
    PUSH32rmm <fi> <sp offset, N>
    PUSH32rmm <fi> <sp offset, N + 4>
    PUSH32rmm <fi> <sp offset, N + 8>
    CALL32rm <fi> <sp offset, N + 12>
    ADJCALLSTACKUP32 <N + 12>
The main thing is that if we commit to pushes instead of movs at DAG time, it's impossible for the push conversion to fail for hard to diagnose reasons. It looks like the frame index MachineOperand type has an unused offset field.
1713–1725 ↗	(On Diff #18222)	I would shorten this to just something like "look for the ADJCALLSTACKUP instr that follows the call".

This revision is now accepted and ready to land. Jan 22 2015, 12:57 PM

Hi Chandler,

This is something that Reid and I talked about on IRC, but I don’t think we came to a conclusion both of us were happy with (hence Reid’s “lgtm with reservations”, I guess :-) )

First, I don’t think the decision on whether to use movs or pushes belongs in the DAG.
The decision on whether a call-site should use movs or pushes needs to be aware of its context, because having even one call-site use pushes means we will not have a reserved call frame, which affects the way all other call sites are treated as well. This patch makes the decision based on global attributes only (opt for size vs. speed, stack alignment), but the next step will be to make it based on an analysis of the call-sites – e.g. even with stack alignment of 16, it can still often be a win, depending on just how many of the function calls we can actually transform, and how many memory arguments each call has.

So the way I envision the next step is that the pass will:

a) Collect the necessary information from all call sites in the function.

b) Make a judgment on whether the transformation is worth it – in terms of size for Os/Oz, in terms of performance for other opt levels.

c) Perform the transformation.
I don’t see how we can do this on the DAG.
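
The collect / judge / transform outline above could be organized roughly as in the following sketch. The cost model is purely hypothetical (it is not the heuristic from any later patch): it counts one byte per push and three bytes per explicit esp adjustment, and it charges unconverted call sites for the explicit adjustments they need once the call frame is no longer reserved.

```cpp
#include <cassert>
#include <vector>

// Step (a): information collected per call site. All fields are illustrative.
struct CallSite {
  bool Convertible; // can all argument movs become pushes?
  int NumArgs;      // number of 4-byte arguments
  int MovBytes;     // size of the existing mov-based argument setup
};

struct Decision {
  bool UsePushes;
  int BytesSaved; // positive means the push version is smaller
};

// Step (b): a function-wide size judgment under the toy cost model.
inline Decision decideForFunction(const std::vector<CallSite> &Sites,
                                  bool NeedsPadding) {
  const int PushBytes = 1;   // push %reg
  const int AdjustBytes = 3; // add/sub $imm8, %esp
  int Before = 0, After = 0;
  for (const CallSite &CS : Sites) {
    Before += CS.MovBytes;
    if (CS.Convertible)
      After += CS.NumArgs * PushBytes + AdjustBytes +
               (NeedsPadding ? AdjustBytes : 0);
    else
      After += CS.MovBytes + 2 * AdjustBytes; // sub before, add after
  }
  int Saved = Before - After;
  return {Saved > 0, Saved};
}
```

Step (c), the transformation itself, would run only when `UsePushes` is true. Because one unconverted call site can cancel the gain from a converted one, the decision has to be made per function rather than per call, which is the argument made above against deciding at DAG level.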

If I understand Reid’s last suggestion, he proposed to flip the default – that is, emit pushes in the DAG, and have an MI pass that does the opposite (push -> mov) transformation if necessary.

I don’t believe that removes a lot of complexity or would improve performance.
The code in PEI, InstrInfo and FrameLowering is just a side effect of not being able to rely on a 0 SPAdj in PEI anymore (that is, canSimplifyCallFramePseudos() can now return false), and is needed regardless of how the transformation is performed. And we will still need the heuristic decision.
Some of the logic in looking for sequences where the conversion is possible will disappear, but I think a lot of it will remain as conditions on the incoming operand DAG nodes. And since we don’t want to transform each push into an “adjust esp, mov” but rather want to group all the esp adjustments back into the ADJCALLSTACKs, we will still need to have code in the pass that makes sure this is safe w.r.t. the final sequence.
The main benefit I see is that we will no longer need to have the folding code – rather, we will have to unfold PUSH32rmm, which is simpler. However, I hope I can eventually get rid of the folding here by teaching PeepholeOptimizer to be smarter about this.

On the other hand, X86TargetLowering::LowerCall() is already, IMHO, a fairly complex piece of code, and I’d rather avoid making it even more complex.
Conceptually, I’d prefer that LowerCall() did standard mov-based lowering in all cases like it does now (we aren’t always going to lower to pushes anyway – it doesn’t really make sense for x86-64) and treat pushes as an optimization where available.

What do you think?

Michael

From: Chandler Carruth [mailto:chandlerc@google.com]
Sent: Thursday, January 22, 2015 23:07
To: reviews+D6789+public+a4ec4af5a5133e84@reviews.llvm.org
Cc: Kuperstein, Michael M; Nadav Rotem; Demikhovsky, Elena; Commit Messages and Patches for LLVM
Subject: Re: [PATCH] [X86] Convert esp-relative movs of function arguments to pushes, step 2

Closed by commit rL227728: [X86] Convert esp-relative movs of function arguments to pushes, step 2 (authored by mkuper). Feb 1 2015, 3:46 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

trunk/

include/

llvm/

Target/

TargetFrameLowering.h

5 lines

lib/

CodeGen/

PrologEpilogInserter.cpp

20 lines

TargetFrameLoweringImpl.cpp

5 lines

Target/

X86/

CMakeLists.txt

107 lines

X86.h

5 lines

X86CallFrameOptimization.cpp

400 lines

6716 lines

192 lines

4123 lines

3700 lines

5 lines

52 lines

X86MachineFunctionInfo.h

12 lines

X86RegisterInfo.cpp

5 lines

X86TargetMachine.cpp

5 lines

test/

CodeGen/

X86/

inalloca-invoke.ll

2 lines

movtopush.ll

178 lines

Diff 19110

llvm/trunk/include/llvm/Target/TargetFrameLowering.h

@@ public:
   /// need adjusting for the call frame adjustments. Normally, that's true
   /// if the function has a reserved call frame or a frame pointer. Some
   /// targets (Thumb2, for example) may have more complicated criteria,
   /// however, and can override this behavior.
   virtual bool canSimplifyCallFramePseudos(const MachineFunction &MF) const {
     return hasReservedCallFrame(MF) || hasFP(MF);
   }

+  // needsFrameIndexResolution - Do we need to perform FI resolution for
+  // this function. Normally, this is required only when the function
+  // has any stack objects. However, targets may want to override this.
+  virtual bool needsFrameIndexResolution(const MachineFunction &MF) const;
+
   /// getFrameIndexOffset - Returns the displacement from the frame register to
   /// the stack frame of the specified index.
   virtual int getFrameIndexOffset(const MachineFunction &MF, int FI) const;

   /// getFrameIndexReference - This method should return the base register
   /// and offset used to reference a frame index location. The offset is
   /// returned directly, and the base register is returned via FrameReg.
   virtual int getFrameIndexReference(const MachineFunction &MF, int FI,

llvm/trunk/lib/CodeGen/PrologEpilogInserter.cpp

@@ void PEI::insertPrologEpilogCode(MachineFunction &Fn) {
   if (Fn.getFunction()->getCallingConv() == CallingConv::HiPE)
     TFI.adjustForHiPEPrologue(Fn);
 }

 /// replaceFrameIndices - Replace all MO_FrameIndex operands with physical
 /// register references and actual offsets.
 ///
 void PEI::replaceFrameIndices(MachineFunction &Fn) {
-  if (!Fn.getFrameInfo()->hasStackObjects()) return; // Nothing to do?
+  const TargetFrameLowering &TFI = *Fn.getSubtarget().getFrameLowering();
+  if (!TFI.needsFrameIndexResolution(Fn)) return;

   // Store SPAdj at exit of a basic block.
   SmallVector<int, 8> SPState;
   SPState.resize(Fn.getNumBlockIDs());
   SmallPtrSet<MachineBasicBlock*, 8> Reachable;

   // Iterate over the reachable blocks in DFS order.
   for (auto DFI = df_ext_begin(&Fn, Reachable), DFE = df_ext_end(&Fn, Reachable);
@@ if (I->getOpcode() == FrameSetupOpcode ||
       // Visit the instructions created by eliminateCallFramePseudoInstr().
       if (PrevI == BB->end())
         I = BB->begin(); // The replaced instr was the first in the block.
       else
         I = std::next(PrevI);
       continue;
     }

-    // If we are looking at a call sequence, we need to keep track of
-    // the SP adjustment made by each instruction in the sequence.
-    // This includes both the frame setup/destroy pseudos (handled above),
-    // as well as other instructions that have side effects w.r.t the SP.
-    if (InsideCallSequence)
-      SPAdj += TII.getSPAdjust(I);
-
     MachineInstr *MI = I;
     bool DoIncr = true;
     for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
       if (!MI->getOperand(i).isFI())
         continue;

       // Frame indicies in debug values are encoded in a target independent
       // way with simply the frame index and offset rather than any
@@ for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
         I = BB->begin();
         DoIncr = false;
       }

       MI = nullptr;
       break;
     }

+    // If we are looking at a call sequence, we need to keep track of
+    // the SP adjustment made by each instruction in the sequence.
+    // This includes both the frame setup/destroy pseudos (handled above),
+    // as well as other instructions that have side effects w.r.t the SP.
+    // Note that this must come after eliminateFrameIndex, because
+    // if I itself referred to a frame index, we shouldn't count its own
+    // adjustment.
+    if (MI && InsideCallSequence)
+      SPAdj += TII.getSPAdjust(MI);
+
     if (DoIncr && I != BB->end()) ++I;

     // Update register states.
     if (RS && !FrameIndexVirtualScavenging && MI) RS->forward(MI);
   }
 }

 /// scavengeFrameVirtualRegs - Replace all frame index virtual registers

llvm/trunk/lib/CodeGen/TargetFrameLoweringImpl.cpp

@@ int TargetFrameLowering::getFrameIndexReference(const MachineFunction &MF,
   const TargetRegisterInfo *RI = MF.getSubtarget().getRegisterInfo();

   // By default, assume all frame indices are referenced via whatever
   // getFrameRegister() says. The target can override this if it's doing
   // something different.
   FrameReg = RI->getFrameRegister(MF);
   return getFrameIndexOffset(MF, FI);
 }
+
+bool TargetFrameLowering::needsFrameIndexResolution(
+    const MachineFunction &MF) const {
+  return MF.getFrameInfo()->hasStackObjects();
+}

llvm/trunk/lib/Target/X86/CMakeLists.txt

 set(LLVM_TARGET_DEFINITIONS X86.td)

 tablegen(LLVM X86GenRegisterInfo.inc -gen-register-info)
 tablegen(LLVM X86GenDisassemblerTables.inc -gen-disassembler)
 tablegen(LLVM X86GenInstrInfo.inc -gen-instr-info)
 tablegen(LLVM X86GenAsmWriter.inc -gen-asm-writer)
 tablegen(LLVM X86GenAsmWriter1.inc -gen-asm-writer -asmwriternum=1)
 tablegen(LLVM X86GenAsmMatcher.inc -gen-asm-matcher)
 tablegen(LLVM X86GenDAGISel.inc -gen-dag-isel)
 tablegen(LLVM X86GenFastISel.inc -gen-fast-isel)
 tablegen(LLVM X86GenCallingConv.inc -gen-callingconv)
 tablegen(LLVM X86GenSubtargetInfo.inc -gen-subtarget)
 add_public_tablegen_target(X86CommonTableGen)

 set(sources
   X86AsmPrinter.cpp
+  X86CallFrameOptimization.cpp
   X86FastISel.cpp
   X86FloatingPoint.cpp
   X86FrameLowering.cpp
   X86ISelDAGToDAG.cpp
   X86ISelLowering.cpp
   X86InstrInfo.cpp
   X86MCInstLower.cpp
   X86MachineFunctionInfo.cpp
   X86PadShortFunction.cpp
   X86RegisterInfo.cpp
   X86SelectionDAGInfo.cpp
   X86Subtarget.cpp
   X86TargetMachine.cpp
   X86TargetObjectFile.cpp
   X86TargetTransformInfo.cpp
   X86VZeroUpper.cpp
   X86FixupLEAs.cpp
   )

 if( CMAKE_CL_64 )
   enable_language(ASM_MASM)
   ADD_CUSTOM_COMMAND(
     OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/X86CompilationCallback_Win64.obj
     MAIN_DEPENDENCY X86CompilationCallback_Win64.asm
     COMMAND ${CMAKE_ASM_MASM_COMPILER} /Fo ${CMAKE_CURRENT_BINARY_DIR}/X86CompilationCallback_Win64.obj /c ${CMAKE_CURRENT_SOURCE_DIR}/X86CompilationCallback_Win64.asm
     )
   set(sources ${sources} ${CMAKE_CURRENT_BINARY_DIR}/X86CompilationCallback_Win64.obj)
 endif()

 add_llvm_target(X86CodeGen ${sources})

 add_subdirectory(AsmParser)
 add_subdirectory(Disassembler)
 add_subdirectory(InstPrinter)
 add_subdirectory(MCTargetDesc)
 add_subdirectory(TargetInfo)
 add_subdirectory(Utils)

llvm/trunk/lib/Target/X86/X86.h

 /// with NOOPs. This will prevent a stall when returning on the Atom.
 FunctionPass *createX86PadShortFunctions();
 /// createX86FixupLEAs - Return a a pass that selectively replaces
 /// certain instructions (like add, sub, inc, dec, some shifts,
 /// and some multiplies) by equivalent LEA instructions, in order
 /// to eliminate execution delays in some Atom processors.
 FunctionPass *createX86FixupLEAs();

+/// createX86CallFrameOptimization - Return a pass that optimizes
+/// the code-size of x86 call sequences. This is done by replacing
+/// esp-relative movs with pushes.
+FunctionPass *createX86CallFrameOptimization();
+
 } // End llvm namespace

 #endif

llvm/trunk/lib/Target/X86/X86CallFrameOptimization.cpp

				//===----- X86CallFrameOptimization.cpp - Optimize x86 call sequences -----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines a pass that optimizes call sequences on x86.
				// Currently, it converts movs of function parameters onto the stack into
				// pushes. This is beneficial for two main reasons:
				// 1) The push instruction encoding is much smaller than an esp-relative mov
				// 2) It is possible to push memory arguments directly. So, if the
				// transformation is performed pre-reg-alloc, it can help relieve
				// register pressure.
				//
				//===----------------------------------------------------------------------===//
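To make reason 1 in the header comment concrete, here is a rough sketch of the instruction sizes involved. It is not part of the patch: `movToStackSize` and `pushSize` are hypothetical helpers that count bytes for the standard 32-bit encodings without prefixes. They ignore the stack-pointer adjustments around the call, which is part of why the real profitability decision needs a heuristic.

```cpp
#include <cassert>
#include <cstdint>

// Where the stored value comes from: a register or an immediate.
enum class Src { Reg, Imm };

// Size in bytes of "movl src, Disp(%esp)".
// An %esp base always needs a SIB byte, so the fixed part is
// opcode + ModRM + SIB; a non-zero displacement adds 1 byte (disp8)
// or 4 bytes (disp32), and an immediate source adds a 4-byte imm32.
unsigned movToStackSize(Src S, int32_t Disp) {
  unsigned Size = 3;
  if (Disp != 0)
    Size += (Disp >= -128 && Disp <= 127) ? 1 : 4;
  if (S == Src::Imm)
    Size += 4;
  return Size;
}

// Size in bytes of "pushl src": 1 for a register, 2 for an immediate
// that fits a sign-extended 8-bit field, 5 for a full imm32.
unsigned pushSize(Src S, int64_t Imm = 0) {
  if (S == Src::Reg)
    return 1;
  return (Imm >= -128 && Imm <= 127) ? 2 : 5;
}
```

Under these assumptions, storing a small constant at `4(%esp)` takes 8 bytes as a mov and 2 bytes as a push.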

				#include <algorithm>

				#include "X86.h"
				#include "X86InstrInfo.h"
				#include "X86Subtarget.h"
				#include "X86MachineFunctionInfo.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/IR/Function.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetInstrInfo.h"

				using namespace llvm;

				#define DEBUG_TYPE "x86-cf-opt"

				cl::opt<bool> NoX86CFOpt("no-x86-call-frame-opt",
				cl::desc("Avoid optimizing x86 call frames for size"),
				cl::init(false), cl::Hidden);

				namespace {
				class X86CallFrameOptimization : public MachineFunctionPass {
				public:
				X86CallFrameOptimization() : MachineFunctionPass(ID) {}

				bool runOnMachineFunction(MachineFunction &MF) override;

				private:
				bool shouldPerformTransformation(MachineFunction &MF);

				bool adjustCallSequence(MachineFunction &MF, MachineBasicBlock &MBB,
				MachineBasicBlock::iterator I);

				MachineInstr *canFoldIntoRegPush(MachineBasicBlock::iterator FrameSetup,
				unsigned Reg);

				const char *getPassName() const override {
				return "X86 Optimize Call Frame";
				}

				const TargetInstrInfo *TII;
				const TargetFrameLowering *TFL;
				const MachineRegisterInfo *MRI;
				static char ID;
				};

				char X86CallFrameOptimization::ID = 0;
				}

				FunctionPass *llvm::createX86CallFrameOptimization() {
				return new X86CallFrameOptimization();
				}

				// This checks whether the transformation is legal and profitable
				bool X86CallFrameOptimization::shouldPerformTransformation(MachineFunction &MF) {
				if (NoX86CFOpt.getValue())
				return false;

				// We currently only support call sequences where all parameters
				// are passed on the stack.
				// No point in running this in 64-bit mode, since some arguments are
				// passed in-register in all common calling conventions, so the pattern
				// we're looking for will never match.
				const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
				if (STI.is64Bit())
				return false;

				// You would expect straight-line code between call-frame setup and
				// call-frame destroy. You would be wrong. There are circumstances (e.g.
				// CMOV_GR8 expansion of a select that feeds a function call!) where we can
				// end up with the setup and the destroy in different basic blocks.
				// This is bad, and breaks SP adjustment.
				// So, check that all of the frames in the function are closed inside
				// the same block, and, for good measure, that there are no nested frames.
				int FrameSetupOpcode = TII->getCallFrameSetupOpcode();
				int FrameDestroyOpcode = TII->getCallFrameDestroyOpcode();
				for (MachineBasicBlock &BB : MF) {
				bool InsideFrameSequence = false;
				for (MachineInstr &MI : BB) {
				if (MI.getOpcode() == FrameSetupOpcode) {
				if (InsideFrameSequence)
				return false;
				InsideFrameSequence = true;
				}
				else if (MI.getOpcode() == FrameDestroyOpcode) {
				if (!InsideFrameSequence)
				return false;
				InsideFrameSequence = false;
				}
				}

				if (InsideFrameSequence)
				return false;
				}

				// Now that we know the transformation is legal, check if it is
				// profitable.
				// TODO: Add a heuristic that actually looks at the function,
				// and enable this for more cases.

				// This transformation is always a win when we do not expect to have
				// a reserved call frame. Under other circumstances, it may be either
				// a win or a loss, and requires a heuristic.
				// For now, enable it only for the relatively clear win cases.
				bool CannotReserveFrame = MF.getFrameInfo()->hasVarSizedObjects();
				if (CannotReserveFrame)
				return true;

				// For now, don't even try to evaluate the profitability when
				// not optimizing for size.
				AttributeSet FnAttrs = MF.getFunction()->getAttributes();
				bool OptForSize =
				FnAttrs.hasAttribute(AttributeSet::FunctionIndex,
				Attribute::OptimizeForSize) ||
				FnAttrs.hasAttribute(AttributeSet::FunctionIndex, Attribute::MinSize);

				if (!OptForSize)
				return false;

				// Stack re-alignment can make this unprofitable even in terms of size.
				// As mentioned above, a better heuristic is needed. For now, don't do this
				// when the required alignment is above 8. (4 would be the safe choice, but
				// some experimentation showed 8 is generally good).
				if (TFL->getStackAlignment() > 8)
				return false;

				return true;
				}
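The legality part of `shouldPerformTransformation` can be read in isolation as a balance check over each block. The following is a minimal standalone sketch of that check, not the pass itself: `Op`, `Block` and `framesAreBlockLocal` are hypothetical stand-ins for the machine opcodes, `MachineBasicBlock` and the loop above.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for the call-frame pseudo-instructions.
enum class Op { Setup, Destroy, Other };
using Block = std::vector<Op>;

// Returns true only if every call frame is opened and closed inside the
// same block and no frame is opened while another is still open.
bool framesAreBlockLocal(const std::vector<Block> &Function) {
  for (const Block &BB : Function) {
    bool InsideFrameSequence = false;
    for (Op O : BB) {
      if (O == Op::Setup) {
        if (InsideFrameSequence)
          return false; // nested frame
        InsideFrameSequence = true;
      } else if (O == Op::Destroy) {
        if (!InsideFrameSequence)
          return false; // destroy whose setup is in another block
        InsideFrameSequence = false;
      }
    }
    if (InsideFrameSequence)
      return false; // frame still open at the end of the block
  }
  return true;
}
```

A function whose setup and destroy land in different blocks (the CMOV_GR8 case mentioned in the comment) is rejected by the final check of the first block.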

				bool X86CallFrameOptimization::runOnMachineFunction(MachineFunction &MF) {
				TII = MF.getSubtarget().getInstrInfo();
				TFL = MF.getSubtarget().getFrameLowering();
				MRI = &MF.getRegInfo();

				if (!shouldPerformTransformation(MF))
				return false;

				int FrameSetupOpcode = TII->getCallFrameSetupOpcode();

				bool Changed = false;

				for (MachineFunction::iterator BB = MF.begin(), E = MF.end(); BB != E; ++BB)
				for (MachineBasicBlock::iterator I = BB->begin(); I != BB->end(); ++I)
				if (I->getOpcode() == FrameSetupOpcode)
				Changed |= adjustCallSequence(MF, *BB, I);

				return Changed;
				}

				bool X86CallFrameOptimization::adjustCallSequence(MachineFunction &MF,
				MachineBasicBlock &MBB,
				MachineBasicBlock::iterator I) {

				// Check that this particular call sequence is amenable to the
				// transformation.
				const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(
				MF.getSubtarget().getRegisterInfo());
				unsigned StackPtr = RegInfo.getStackRegister();
				int FrameDestroyOpcode = TII->getCallFrameDestroyOpcode();

				// We expect to enter this at the beginning of a call sequence
				assert(I->getOpcode() == TII->getCallFrameSetupOpcode());
				MachineBasicBlock::iterator FrameSetup = I++;


				// For globals in PIC mode, we can have some LEAs here.
				// Ignore them, they don't bother us.
				// TODO: Extend this to something that covers more cases.
				while (I->getOpcode() == X86::LEA32r)
				++I;

				// We expect a copy instruction here.
				// TODO: The copy instruction is a lowering artifact.
				// We should also support a copy-less version, where the stack
				// pointer is used directly.
				if (!I->isCopy() || !I->getOperand(0).isReg())
				return false;
				MachineBasicBlock::iterator SPCopy = I++;
				StackPtr = SPCopy->getOperand(0).getReg();

				// Scan the call setup sequence for the pattern we're looking for.
				// We only handle a simple case - a sequence of MOV32mi or MOV32mr
				// instructions, that push a sequence of 32-bit values onto the stack, with
				// no gaps between them.
				SmallVector<MachineInstr*, 4> MovVector(4, nullptr);
				unsigned int MaxAdjust = FrameSetup->getOperand(0).getImm() / 4;
				if (MaxAdjust > 4)
				MovVector.resize(MaxAdjust, nullptr);

				do {
				int Opcode = I->getOpcode();
				if (Opcode != X86::MOV32mi && Opcode != X86::MOV32mr)
				break;

				// We only want movs of the form:
				// movl imm/r32, k(%esp)
				// If we run into something else, bail.
				// Note that AddrBaseReg may, counter to its name, not be a register,
				// but rather a frame index.
				// TODO: Support the fi case. This should probably work now that we
				// have the infrastructure to track the stack pointer within a call
				// sequence.
				if (!I->getOperand(X86::AddrBaseReg).isReg() ||
				(I->getOperand(X86::AddrBaseReg).getReg() != StackPtr) ||
				!I->getOperand(X86::AddrScaleAmt).isImm() ||
				(I->getOperand(X86::AddrScaleAmt).getImm() != 1) ||
				(I->getOperand(X86::AddrIndexReg).getReg() != X86::NoRegister) ||
				(I->getOperand(X86::AddrSegmentReg).getReg() != X86::NoRegister) ||
				!I->getOperand(X86::AddrDisp).isImm())
				return false;

				int64_t StackDisp = I->getOperand(X86::AddrDisp).getImm();
				assert(StackDisp >= 0 && "Negative stack displacement when passing parameters");

				// We really don't want to consider the unaligned case.
				if (StackDisp % 4)
				return false;
				StackDisp /= 4;

				assert((size_t)StackDisp < MovVector.size() &&
				"Function call has more parameters than the stack is adjusted for.");

				// If the same stack slot is being filled twice, something's fishy.
				if (MovVector[StackDisp] != nullptr)
				return false;
				MovVector[StackDisp] = I;

				++I;
				} while (I != MBB.end());

				// We now expect the end of the sequence - a call and a stack adjust.
				if (I == MBB.end())
				return false;

				// For PCrel calls, we expect an additional COPY of the basereg.
				// If we find one, skip it.
				if (I->isCopy()) {
				if (I->getOperand(1).getReg() ==
				MF.getInfo<X86MachineFunctionInfo>()->getGlobalBaseReg())
				++I;
				else
				return false;
				}

				if (!I->isCall())
				return false;
				MachineBasicBlock::iterator Call = I;
				if ((++I)->getOpcode() != FrameDestroyOpcode)
				return false;

				// Now, go through the vector, and see that we don't have any gaps,
				// but only a series of 32-bit MOVs.

				int64_t ExpectedDist = 0;
				auto MMI = MovVector.begin(), MME = MovVector.end();
				for (; MMI != MME; ++MMI, ExpectedDist += 4)
				if (*MMI == nullptr)
				break;

				// If the call had no parameters, do nothing
				if (!ExpectedDist)
				return false;

				// We are either at the last parameter, or a gap.
				// Make sure it's not a gap
				for (; MMI != MME; ++MMI)
				if (*MMI != nullptr)
				return false;

				// Ok, we can in fact do the transformation for this call.
				// Do not remove the FrameSetup instruction, but adjust the parameters.
				// PEI will end up finalizing the handling of this.
				FrameSetup->getOperand(1).setImm(ExpectedDist);

				DebugLoc DL = I->getDebugLoc();
				// Now, iterate through the vector in reverse order, and replace the movs
				// with pushes. MOVmi/MOVmr doesn't have any defs, so no need to
				// replace uses.
				for (int Idx = (ExpectedDist / 4) - 1; Idx >= 0; --Idx) {
				MachineBasicBlock::iterator MOV = *MovVector[Idx];
				MachineOperand PushOp = MOV->getOperand(X86::AddrNumOperands);
				if (MOV->getOpcode() == X86::MOV32mi) {
				unsigned PushOpcode = X86::PUSHi32;
				// If the operand is a small (8-bit) immediate, we can use a
				// PUSH instruction with a shorter encoding.
				// Note that isImm() may fail even though this is a MOVmi, because
				// the operand can also be a symbol.
				if (PushOp.isImm()) {
				int64_t Val = PushOp.getImm();
				if (isInt<8>(Val))
				PushOpcode = X86::PUSH32i8;
				}
				BuildMI(MBB, Call, DL, TII->get(PushOpcode)).addOperand(PushOp);
				} else {
				unsigned int Reg = PushOp.getReg();

				// If PUSHrmm is not slow on this target, try to fold the source of the
				// push into the instruction.
				const X86Subtarget &ST = MF.getTarget().getSubtarget<X86Subtarget>();
				bool SlowPUSHrmm = ST.isAtom() || ST.isSLM();

				// Check that this is legal to fold. Right now, we're extremely
				// conservative about that.
				MachineInstr *DefMov = nullptr;
				if (!SlowPUSHrmm && (DefMov = canFoldIntoRegPush(FrameSetup, Reg))) {
				MachineInstr *Push = BuildMI(MBB, Call, DL, TII->get(X86::PUSH32rmm));

				unsigned NumOps = DefMov->getDesc().getNumOperands();
				for (unsigned i = NumOps - X86::AddrNumOperands; i != NumOps; ++i)
				Push->addOperand(DefMov->getOperand(i));

				DefMov->eraseFromParent();
				} else {
				BuildMI(MBB, Call, DL, TII->get(X86::PUSH32r)).addReg(Reg).getInstr();
				}
				}

				MBB.erase(MOV);
				}

				// The stack-pointer copy is no longer used in the call sequences.
				// There should not be any other users, but we can't commit to that, so:
				if (MRI->use_empty(SPCopy->getOperand(0).getReg()))
				SPCopy->eraseFromParent();

				// Once we've done this, we need to make sure PEI doesn't assume a reserved
				// frame.
				X86MachineFunctionInfo *FuncInfo = MF.getInfo<X86MachineFunctionInfo>();
				FuncInfo->setHasPushSequences(true);

				return true;
				}
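The slot bookkeeping in `adjustCallSequence` boils down to two steps: check that the stores fill slots 0..N-1 with no gap, then emit the pushes from the highest slot down so that slot 0 ends up at the lowest address. Here is a small sketch of just that logic, assuming a plain `std::vector<bool>` in place of `MovVector`; `contiguousArgSlots` and `pushOrder` are illustrative names, not functions from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Filled[i] is true when a 32-bit store to 4*i(%esp) was seen.
// Returns the number of leading filled slots, or 0 if a filled slot
// appears after the first empty one (a gap) or nothing is filled.
std::size_t contiguousArgSlots(const std::vector<bool> &Filled) {
  std::size_t N = 0;
  while (N < Filled.size() && Filled[N])
    ++N;
  for (std::size_t I = N; I < Filled.size(); ++I)
    if (Filled[I])
      return 0; // gap: bail out, as the pass does
  return N;
}

// Slots must be pushed last-to-first: each push lowers %esp by 4, so
// the final push (slot 0) is the one the callee sees at the top.
std::vector<std::size_t> pushOrder(std::size_t NumSlots) {
  std::vector<std::size_t> Order;
  for (std::size_t I = NumSlots; I-- > 0;)
    Order.push_back(I);
  return Order;
}
```

In this model `ExpectedDist` from the code above corresponds to `4 * contiguousArgSlots(...)`.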

				MachineInstr *X86CallFrameOptimization::canFoldIntoRegPush(
				MachineBasicBlock::iterator FrameSetup, unsigned Reg) {
				// Do an extremely restricted form of load folding.
				// ISel will often create patterns like:
				// movl 4(%edi), %eax
				// movl 8(%edi), %ecx
				// movl 12(%edi), %edx
				// movl %edx, 8(%esp)
				// movl %ecx, 4(%esp)
				// movl %eax, (%esp)
				// call
				// Get rid of those with prejudice.
				if (!TargetRegisterInfo::isVirtualRegister(Reg))
				return nullptr;

				// Make sure this is the only use of Reg.
				if (!MRI->hasOneNonDBGUse(Reg))
				return nullptr;

				MachineBasicBlock::iterator DefMI = MRI->getVRegDef(Reg);

				// Make sure the def is a MOV from memory.
				// If the def is in another block, give up.
				if (DefMI->getOpcode() != X86::MOV32rm ||
				DefMI->getParent() != FrameSetup->getParent())
				return nullptr;

				// Be careful with movs that load from a stack slot, since it may get
				// resolved incorrectly.
				// TODO: Again, we already have the infrastructure, so this should work.
				if (!DefMI->getOperand(1).isReg())
				return nullptr;

				// Now, make sure everything else up until the ADJCALLSTACK is a sequence
				// of MOVs. To be less conservative would require duplicating a lot of the
				// logic from PeepholeOptimizer.
				// FIXME: A possibly better approach would be to teach the PeepholeOptimizer
				// to be smarter about folding into pushes.
				for (auto I = DefMI; I != FrameSetup; ++I)
				if (I->getOpcode() != X86::MOV32rm)
				return nullptr;

				return DefMI;
				}
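The conditions in `canFoldIntoRegPush` can be summarized as a model over a flat instruction list. This is only a sketch under simplifying assumptions: `Kind`, `Inst` and `findFoldableLoad` are hypothetical, each register has a single def, and the use count is passed in rather than queried from `MachineRegisterInfo`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// LoadFromReg models a MOV32rm whose base operand is a register;
// LoadFromFrameIndex models a MOV32rm that reads a stack slot.
enum class Kind { LoadFromReg, LoadFromFrameIndex, FrameSetup, Other };

struct Inst {
  Kind K;
  int DefReg; // register defined by the instruction, -1 if none
};

static bool isLoad(Kind K) {
  return K == Kind::LoadFromReg || K == Kind::LoadFromFrameIndex;
}

// Returns the index of the load that can be folded into the push of
// Reg, or -1. SetupIdx is the position of the call-frame setup.
int findFoldableLoad(const std::vector<Inst> &Block, std::size_t SetupIdx,
                     int Reg, unsigned NumUses) {
  if (NumUses != 1)
    return -1; // the push must be the only user
  for (std::size_t I = 0; I < SetupIdx; ++I) {
    if (Block[I].DefReg != Reg)
      continue;
    if (Block[I].K != Kind::LoadFromReg)
      return -1; // not a load, or a load from a stack slot
    // Everything from the def up to the frame setup must be a load.
    for (std::size_t J = I; J < SetupIdx; ++J)
      if (!isLoad(Block[J].K))
        return -1;
    return static_cast<int>(I);
  }
  return -1; // def is not in this block
}
```

The "only loads in between" rule is what keeps the check cheap: no store or call can sit between the load and the push, so moving the memory read down to the push cannot change the value read.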

llvm/trunk/lib/Target/X86/X86FastISel.cpp

	//===-- X86FastISel.cpp - X86 FastISel implementation ---------------------===//			//===-- X86FastISel.cpp - X86 FastISel implementation ---------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file defines the X86-specific support for the FastISel class. Much			// This file defines the X86-specific support for the FastISel class. Much
	// of the target-specific code is generated by tablegen in the file			// of the target-specific code is generated by tablegen in the file
	// X86GenFastISel.inc, which is #included here.			// X86GenFastISel.inc, which is #included here.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "X86.h"			#include "X86.h"
	#include "X86CallingConv.h"			#include "X86CallingConv.h"
	#include "X86InstrBuilder.h"			#include "X86InstrBuilder.h"
	#include "X86InstrInfo.h"			#include "X86InstrInfo.h"
	#include "X86MachineFunctionInfo.h"			#include "X86MachineFunctionInfo.h"
	#include "X86RegisterInfo.h"			#include "X86RegisterInfo.h"
	#include "X86Subtarget.h"			#include "X86Subtarget.h"
	#include "X86TargetMachine.h"			#include "X86TargetMachine.h"
	#include "llvm/Analysis/BranchProbabilityInfo.h"			#include "llvm/Analysis/BranchProbabilityInfo.h"
	#include "llvm/CodeGen/Analysis.h"			#include "llvm/CodeGen/Analysis.h"
	#include "llvm/CodeGen/FastISel.h"			#include "llvm/CodeGen/FastISel.h"
	#include "llvm/CodeGen/FunctionLoweringInfo.h"			#include "llvm/CodeGen/FunctionLoweringInfo.h"
	#include "llvm/CodeGen/MachineConstantPool.h"			#include "llvm/CodeGen/MachineConstantPool.h"
	#include "llvm/CodeGen/MachineFrameInfo.h"			#include "llvm/CodeGen/MachineFrameInfo.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"
	#include "llvm/IR/CallSite.h"			#include "llvm/IR/CallSite.h"
	#include "llvm/IR/CallingConv.h"			#include "llvm/IR/CallingConv.h"
	#include "llvm/IR/DerivedTypes.h"			#include "llvm/IR/DerivedTypes.h"
	#include "llvm/IR/GetElementPtrTypeIterator.h"			#include "llvm/IR/GetElementPtrTypeIterator.h"
	#include "llvm/IR/GlobalAlias.h"			#include "llvm/IR/GlobalAlias.h"
	#include "llvm/IR/GlobalVariable.h"			#include "llvm/IR/GlobalVariable.h"
	#include "llvm/IR/Instructions.h"			#include "llvm/IR/Instructions.h"
	#include "llvm/IR/IntrinsicInst.h"			#include "llvm/IR/IntrinsicInst.h"
	#include "llvm/IR/Operator.h"			#include "llvm/IR/Operator.h"
	#include "llvm/Support/ErrorHandling.h"			#include "llvm/Support/ErrorHandling.h"
	#include "llvm/Target/TargetOptions.h"			#include "llvm/Target/TargetOptions.h"
	using namespace llvm;			using namespace llvm;

	namespace {			namespace {

	class X86FastISel final : public FastISel {			class X86FastISel final : public FastISel {
	/// Subtarget - Keep a pointer to the X86Subtarget around so that we can			/// Subtarget - Keep a pointer to the X86Subtarget around so that we can
	/// make the right decision when generating code for different targets.			/// make the right decision when generating code for different targets.
	const X86Subtarget *Subtarget;			const X86Subtarget *Subtarget;

	/// X86ScalarSSEf32, X86ScalarSSEf64 - Select between SSE or x87			/// X86ScalarSSEf32, X86ScalarSSEf64 - Select between SSE or x87
	/// floating point ops.			/// floating point ops.
	/// When SSE is available, use it for f32 operations.			/// When SSE is available, use it for f32 operations.
	/// When SSE2 is available, use it for f64 operations.			/// When SSE2 is available, use it for f64 operations.
	bool X86ScalarSSEf64;			bool X86ScalarSSEf64;
	bool X86ScalarSSEf32;			bool X86ScalarSSEf32;

	public:			public:
	explicit X86FastISel(FunctionLoweringInfo &funcInfo,			explicit X86FastISel(FunctionLoweringInfo &funcInfo,
	const TargetLibraryInfo *libInfo)			const TargetLibraryInfo *libInfo)
	: FastISel(funcInfo, libInfo) {			: FastISel(funcInfo, libInfo) {
	Subtarget = &TM.getSubtarget<X86Subtarget>();			Subtarget = &TM.getSubtarget<X86Subtarget>();
	X86ScalarSSEf64 = Subtarget->hasSSE2();			X86ScalarSSEf64 = Subtarget->hasSSE2();
	X86ScalarSSEf32 = Subtarget->hasSSE1();			X86ScalarSSEf32 = Subtarget->hasSSE1();
	}			}

	bool fastSelectInstruction(const Instruction *I) override;			bool fastSelectInstruction(const Instruction *I) override;

	/// \brief The specified machine instr operand is a vreg, and that			/// \brief The specified machine instr operand is a vreg, and that
	/// vreg is being provided by the specified load instruction. If possible,			/// vreg is being provided by the specified load instruction. If possible,
	/// try to fold the load as an operand to the instruction, returning true if			/// try to fold the load as an operand to the instruction, returning true if
	/// possible.			/// possible.
	bool tryToFoldLoadIntoMI(MachineInstr *MI, unsigned OpNo,			bool tryToFoldLoadIntoMI(MachineInstr *MI, unsigned OpNo,
	const LoadInst *LI) override;			const LoadInst *LI) override;

	bool fastLowerArguments() override;			bool fastLowerArguments() override;
	bool fastLowerCall(CallLoweringInfo &CLI) override;			bool fastLowerCall(CallLoweringInfo &CLI) override;
	bool fastLowerIntrinsicCall(const IntrinsicInst *II) override;			bool fastLowerIntrinsicCall(const IntrinsicInst *II) override;

	#include "X86GenFastISel.inc"			#include "X86GenFastISel.inc"

	private:			private:
	bool X86FastEmitCompare(const Value *LHS, const Value *RHS, EVT VT, DebugLoc DL);			bool X86FastEmitCompare(const Value *LHS, const Value *RHS, EVT VT, DebugLoc DL);

	bool X86FastEmitLoad(EVT VT, const X86AddressMode &AM, MachineMemOperand *MMO,			bool X86FastEmitLoad(EVT VT, const X86AddressMode &AM, MachineMemOperand *MMO,
	unsigned &ResultReg);			unsigned &ResultReg);

	bool X86FastEmitStore(EVT VT, const Value *Val, const X86AddressMode &AM,			bool X86FastEmitStore(EVT VT, const Value *Val, const X86AddressMode &AM,
	MachineMemOperand *MMO = nullptr, bool Aligned = false);			MachineMemOperand *MMO = nullptr, bool Aligned = false);
	bool X86FastEmitStore(EVT VT, unsigned ValReg, bool ValIsKill,			bool X86FastEmitStore(EVT VT, unsigned ValReg, bool ValIsKill,
	const X86AddressMode &AM,			const X86AddressMode &AM,
	MachineMemOperand *MMO = nullptr, bool Aligned = false);			MachineMemOperand *MMO = nullptr, bool Aligned = false);

	bool X86FastEmitExtend(ISD::NodeType Opc, EVT DstVT, unsigned Src, EVT SrcVT,			bool X86FastEmitExtend(ISD::NodeType Opc, EVT DstVT, unsigned Src, EVT SrcVT,
	unsigned &ResultReg);			unsigned &ResultReg);

	bool X86SelectAddress(const Value *V, X86AddressMode &AM);			bool X86SelectAddress(const Value *V, X86AddressMode &AM);
	bool X86SelectCallAddress(const Value *V, X86AddressMode &AM);			bool X86SelectCallAddress(const Value *V, X86AddressMode &AM);

	bool X86SelectLoad(const Instruction *I);			bool X86SelectLoad(const Instruction *I);

	bool X86SelectStore(const Instruction *I);			bool X86SelectStore(const Instruction *I);

	bool X86SelectRet(const Instruction *I);			bool X86SelectRet(const Instruction *I);

	bool X86SelectCmp(const Instruction *I);			bool X86SelectCmp(const Instruction *I);

	bool X86SelectZExt(const Instruction *I);			bool X86SelectZExt(const Instruction *I);

	bool X86SelectBranch(const Instruction *I);			bool X86SelectBranch(const Instruction *I);

	bool X86SelectShift(const Instruction *I);			bool X86SelectShift(const Instruction *I);

	bool X86SelectDivRem(const Instruction *I);			bool X86SelectDivRem(const Instruction *I);

	bool X86FastEmitCMoveSelect(MVT RetVT, const Instruction *I);			bool X86FastEmitCMoveSelect(MVT RetVT, const Instruction *I);

	bool X86FastEmitSSESelect(MVT RetVT, const Instruction *I);			bool X86FastEmitSSESelect(MVT RetVT, const Instruction *I);

	bool X86FastEmitPseudoSelect(MVT RetVT, const Instruction *I);			bool X86FastEmitPseudoSelect(MVT RetVT, const Instruction *I);

	bool X86SelectSelect(const Instruction *I);			bool X86SelectSelect(const Instruction *I);

	bool X86SelectTrunc(const Instruction *I);			bool X86SelectTrunc(const Instruction *I);

	bool X86SelectFPExt(const Instruction *I);			bool X86SelectFPExt(const Instruction *I);
	bool X86SelectFPTrunc(const Instruction *I);			bool X86SelectFPTrunc(const Instruction *I);

	const X86InstrInfo *getInstrInfo() const {			const X86InstrInfo *getInstrInfo() const {
	return getTargetMachine()->getSubtargetImpl()->getInstrInfo();			return getTargetMachine()->getSubtargetImpl()->getInstrInfo();
	}			}
	const X86TargetMachine *getTargetMachine() const {			const X86TargetMachine *getTargetMachine() const {
	return static_cast<const X86TargetMachine *>(&TM);			return static_cast<const X86TargetMachine *>(&TM);
	}			}

	bool handleConstantAddresses(const Value *V, X86AddressMode &AM);			bool handleConstantAddresses(const Value *V, X86AddressMode &AM);

	unsigned X86MaterializeInt(const ConstantInt *CI, MVT VT);			unsigned X86MaterializeInt(const ConstantInt *CI, MVT VT);
	unsigned X86MaterializeFP(const ConstantFP *CFP, MVT VT);			unsigned X86MaterializeFP(const ConstantFP *CFP, MVT VT);
	unsigned X86MaterializeGV(const GlobalValue *GV, MVT VT);			unsigned X86MaterializeGV(const GlobalValue *GV, MVT VT);
	unsigned fastMaterializeConstant(const Constant *C) override;			unsigned fastMaterializeConstant(const Constant *C) override;

	unsigned fastMaterializeAlloca(const AllocaInst *C) override;			unsigned fastMaterializeAlloca(const AllocaInst *C) override;

	unsigned fastMaterializeFloatZero(const ConstantFP *CF) override;			unsigned fastMaterializeFloatZero(const ConstantFP *CF) override;

	/// isScalarFPTypeInSSEReg - Return true if the specified scalar FP type is			/// isScalarFPTypeInSSEReg - Return true if the specified scalar FP type is
	/// computed in an SSE register, not on the X87 floating point stack.			/// computed in an SSE register, not on the X87 floating point stack.
	bool isScalarFPTypeInSSEReg(EVT VT) const {			bool isScalarFPTypeInSSEReg(EVT VT) const {
	return (VT == MVT::f64 && X86ScalarSSEf64) || // f64 is when SSE2			return (VT == MVT::f64 && X86ScalarSSEf64) || // f64 is when SSE2
	(VT == MVT::f32 && X86ScalarSSEf32); // f32 is when SSE1			(VT == MVT::f32 && X86ScalarSSEf32); // f32 is when SSE1
	}			}

	bool isTypeLegal(Type *Ty, MVT &VT, bool AllowI1 = false);			bool isTypeLegal(Type *Ty, MVT &VT, bool AllowI1 = false);

	bool IsMemcpySmall(uint64_t Len);			bool IsMemcpySmall(uint64_t Len);

	bool TryEmitSmallMemcpy(X86AddressMode DestAM,			bool TryEmitSmallMemcpy(X86AddressMode DestAM,
	X86AddressMode SrcAM, uint64_t Len);			X86AddressMode SrcAM, uint64_t Len);

	bool foldX86XALUIntrinsic(X86::CondCode &CC, const Instruction *I,			bool foldX86XALUIntrinsic(X86::CondCode &CC, const Instruction *I,
	const Value *Cond);			const Value *Cond);
	};			};

	} // end anonymous namespace.			} // end anonymous namespace.

	static std::pair<X86::CondCode, bool>			static std::pair<X86::CondCode, bool>
	getX86ConditionCode(CmpInst::Predicate Predicate) {			getX86ConditionCode(CmpInst::Predicate Predicate) {
	X86::CondCode CC = X86::COND_INVALID;			X86::CondCode CC = X86::COND_INVALID;
	bool NeedSwap = false;			bool NeedSwap = false;
	switch (Predicate) {			switch (Predicate) {
	default: break;			default: break;
	// Floating-point Predicates			// Floating-point Predicates
	case CmpInst::FCMP_UEQ: CC = X86::COND_E; break;			case CmpInst::FCMP_UEQ: CC = X86::COND_E; break;
	case CmpInst::FCMP_OLT: NeedSwap = true; // fall-through			case CmpInst::FCMP_OLT: NeedSwap = true; // fall-through
	case CmpInst::FCMP_OGT: CC = X86::COND_A; break;			case CmpInst::FCMP_OGT: CC = X86::COND_A; break;
	case CmpInst::FCMP_OLE: NeedSwap = true; // fall-through			case CmpInst::FCMP_OLE: NeedSwap = true; // fall-through
	case CmpInst::FCMP_OGE: CC = X86::COND_AE; break;			case CmpInst::FCMP_OGE: CC = X86::COND_AE; break;
	case CmpInst::FCMP_UGT: NeedSwap = true; // fall-through			case CmpInst::FCMP_UGT: NeedSwap = true; // fall-through
	case CmpInst::FCMP_ULT: CC = X86::COND_B; break;			case CmpInst::FCMP_ULT: CC = X86::COND_B; break;
	case CmpInst::FCMP_UGE: NeedSwap = true; // fall-through			case CmpInst::FCMP_UGE: NeedSwap = true; // fall-through
	case CmpInst::FCMP_ULE: CC = X86::COND_BE; break;			case CmpInst::FCMP_ULE: CC = X86::COND_BE; break;
	case CmpInst::FCMP_ONE: CC = X86::COND_NE; break;			case CmpInst::FCMP_ONE: CC = X86::COND_NE; break;
	case CmpInst::FCMP_UNO: CC = X86::COND_P; break;			case CmpInst::FCMP_UNO: CC = X86::COND_P; break;
	case CmpInst::FCMP_ORD: CC = X86::COND_NP; break;			case CmpInst::FCMP_ORD: CC = X86::COND_NP; break;
	case CmpInst::FCMP_OEQ: // fall-through			case CmpInst::FCMP_OEQ: // fall-through
	case CmpInst::FCMP_UNE: CC = X86::COND_INVALID; break;			case CmpInst::FCMP_UNE: CC = X86::COND_INVALID; break;

	// Integer Predicates			// Integer Predicates
	case CmpInst::ICMP_EQ: CC = X86::COND_E; break;			case CmpInst::ICMP_EQ: CC = X86::COND_E; break;
	case CmpInst::ICMP_NE: CC = X86::COND_NE; break;			case CmpInst::ICMP_NE: CC = X86::COND_NE; break;
	case CmpInst::ICMP_UGT: CC = X86::COND_A; break;			case CmpInst::ICMP_UGT: CC = X86::COND_A; break;
	case CmpInst::ICMP_UGE: CC = X86::COND_AE; break;			case CmpInst::ICMP_UGE: CC = X86::COND_AE; break;
	case CmpInst::ICMP_ULT: CC = X86::COND_B; break;			case CmpInst::ICMP_ULT: CC = X86::COND_B; break;
	case CmpInst::ICMP_ULE: CC = X86::COND_BE; break;			case CmpInst::ICMP_ULE: CC = X86::COND_BE; break;
	case CmpInst::ICMP_SGT: CC = X86::COND_G; break;			case CmpInst::ICMP_SGT: CC = X86::COND_G; break;
	case CmpInst::ICMP_SGE: CC = X86::COND_GE; break;			case CmpInst::ICMP_SGE: CC = X86::COND_GE; break;
	case CmpInst::ICMP_SLT: CC = X86::COND_L; break;			case CmpInst::ICMP_SLT: CC = X86::COND_L; break;
	case CmpInst::ICMP_SLE: CC = X86::COND_LE; break;			case CmpInst::ICMP_SLE: CC = X86::COND_LE; break;
	}			}

	return std::make_pair(CC, NeedSwap);			return std::make_pair(CC, NeedSwap);
	}			}

	static std::pair<unsigned, bool>			static std::pair<unsigned, bool>
	getX86SSEConditionCode(CmpInst::Predicate Predicate) {			getX86SSEConditionCode(CmpInst::Predicate Predicate) {
	unsigned CC;			unsigned CC;
	bool NeedSwap = false;			bool NeedSwap = false;

	// SSE Condition code mapping:			// SSE Condition code mapping:
	// 0 - EQ			// 0 - EQ
	// 1 - LT			// 1 - LT
	// 2 - LE			// 2 - LE
	// 3 - UNORD			// 3 - UNORD
	// 4 - NEQ			// 4 - NEQ
	// 5 - NLT			// 5 - NLT
	// 6 - NLE			// 6 - NLE
	// 7 - ORD			// 7 - ORD
	switch (Predicate) {			switch (Predicate) {
	default: llvm_unreachable("Unexpected predicate");			default: llvm_unreachable("Unexpected predicate");
	case CmpInst::FCMP_OEQ: CC = 0; break;			case CmpInst::FCMP_OEQ: CC = 0; break;
	case CmpInst::FCMP_OGT: NeedSwap = true; // fall-through			case CmpInst::FCMP_OGT: NeedSwap = true; // fall-through
	case CmpInst::FCMP_OLT: CC = 1; break;			case CmpInst::FCMP_OLT: CC = 1; break;
	case CmpInst::FCMP_OGE: NeedSwap = true; // fall-through			case CmpInst::FCMP_OGE: NeedSwap = true; // fall-through
	case CmpInst::FCMP_OLE: CC = 2; break;			case CmpInst::FCMP_OLE: CC = 2; break;
	case CmpInst::FCMP_UNO: CC = 3; break;			case CmpInst::FCMP_UNO: CC = 3; break;
	case CmpInst::FCMP_UNE: CC = 4; break;			case CmpInst::FCMP_UNE: CC = 4; break;
	case CmpInst::FCMP_ULE: NeedSwap = true; // fall-through			case CmpInst::FCMP_ULE: NeedSwap = true; // fall-through
	case CmpInst::FCMP_UGE: CC = 5; break;			case CmpInst::FCMP_UGE: CC = 5; break;
	case CmpInst::FCMP_ULT: NeedSwap = true; // fall-through			case CmpInst::FCMP_ULT: NeedSwap = true; // fall-through
	case CmpInst::FCMP_UGT: CC = 6; break;			case CmpInst::FCMP_UGT: CC = 6; break;
	case CmpInst::FCMP_ORD: CC = 7; break;			case CmpInst::FCMP_ORD: CC = 7; break;
	case CmpInst::FCMP_UEQ:			case CmpInst::FCMP_UEQ:
	case CmpInst::FCMP_ONE: CC = 8; break;			case CmpInst::FCMP_ONE: CC = 8; break;
	}			}

	return std::make_pair(CC, NeedSwap);			return std::make_pair(CC, NeedSwap);
	}			}
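The `NeedSwap` flag in `getX86SSEConditionCode` (unchanged context in this diff) exists because the eight SSE compare codes have no "greater than" forms. As an illustration only, here is a scalar model of codes 0 through 7 and two of the swapped mappings; `evalSSE`, `fcmpOGT` and `fcmpULE` are hypothetical helpers, not LLVM functions.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of the SSE compare predicates 0..7 listed in the
// comment above. "Unordered" means at least one operand is NaN.
bool evalSSE(unsigned CC, double A, double B) {
  bool Unord = std::isnan(A) || std::isnan(B);
  switch (CC) {
  case 0: return !Unord && A == B;   // EQ
  case 1: return !Unord && A < B;    // LT
  case 2: return !Unord && A <= B;   // LE
  case 3: return Unord;              // UNORD
  case 4: return Unord || A != B;    // NEQ
  case 5: return Unord || !(A < B);  // NLT
  case 6: return Unord || !(A <= B); // NLE
  default: return !Unord;            // ORD (7)
  }
}

// FCMP_OGT is expressed as LT (1) with the operands swapped.
bool fcmpOGT(double A, double B) { return evalSSE(1, B, A); }

// FCMP_ULE is expressed as NLT (5) with the operands swapped.
bool fcmpULE(double A, double B) { return evalSSE(5, B, A); }
```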

	/// \brief Check if it is possible to fold the condition from the XALU intrinsic			/// \brief Check if it is possible to fold the condition from the XALU intrinsic
	/// into the user. The condition code will only be updated on success.			/// into the user. The condition code will only be updated on success.
	bool X86FastISel::foldX86XALUIntrinsic(X86::CondCode &CC, const Instruction *I,			bool X86FastISel::foldX86XALUIntrinsic(X86::CondCode &CC, const Instruction *I,
	const Value *Cond) {			const Value *Cond) {
	if (!isa<ExtractValueInst>(Cond))			if (!isa<ExtractValueInst>(Cond))
	return false;			return false;

	const auto *EV = cast<ExtractValueInst>(Cond);			const auto *EV = cast<ExtractValueInst>(Cond);
	if (!isa<IntrinsicInst>(EV->getAggregateOperand()))			if (!isa<IntrinsicInst>(EV->getAggregateOperand()))
	return false;			return false;

	const auto *II = cast<IntrinsicInst>(EV->getAggregateOperand());			const auto *II = cast<IntrinsicInst>(EV->getAggregateOperand());
	MVT RetVT;			MVT RetVT;
	const Function *Callee = II->getCalledFunction();			const Function *Callee = II->getCalledFunction();
	Type *RetTy =			Type *RetTy =
	cast<StructType>(Callee->getReturnType())->getTypeAtIndex(0U);			cast<StructType>(Callee->getReturnType())->getTypeAtIndex(0U);
	if (!isTypeLegal(RetTy, RetVT))			if (!isTypeLegal(RetTy, RetVT))
	return false;			return false;

	if (RetVT != MVT::i32 && RetVT != MVT::i64)			if (RetVT != MVT::i32 && RetVT != MVT::i64)
	return false;			return false;

	X86::CondCode TmpCC;			X86::CondCode TmpCC;
	switch (II->getIntrinsicID()) {			switch (II->getIntrinsicID()) {
	default: return false;			default: return false;
	case Intrinsic::sadd_with_overflow:			case Intrinsic::sadd_with_overflow:
	case Intrinsic::ssub_with_overflow:			case Intrinsic::ssub_with_overflow:
	case Intrinsic::smul_with_overflow:			case Intrinsic::smul_with_overflow:
	case Intrinsic::umul_with_overflow: TmpCC = X86::COND_O; break;			case Intrinsic::umul_with_overflow: TmpCC = X86::COND_O; break;
	case Intrinsic::uadd_with_overflow:			case Intrinsic::uadd_with_overflow:
	case Intrinsic::usub_with_overflow: TmpCC = X86::COND_B; break;			case Intrinsic::usub_with_overflow: TmpCC = X86::COND_B; break;
	}			}

	// Check if both instructions are in the same basic block.			// Check if both instructions are in the same basic block.
	if (II->getParent() != I->getParent())			if (II->getParent() != I->getParent())
	return false;			return false;

	// Make sure nothing is in the way			// Make sure nothing is in the way
	BasicBlock::const_iterator Start = I;			BasicBlock::const_iterator Start = I;
	BasicBlock::const_iterator End = II;			BasicBlock::const_iterator End = II;
	for (auto Itr = std::prev(Start); Itr != End; --Itr) {			for (auto Itr = std::prev(Start); Itr != End; --Itr) {
	// We only expect extractvalue instructions between the intrinsic and the			// We only expect extractvalue instructions between the intrinsic and the
	// instruction to be selected.			// instruction to be selected.
	if (!isa<ExtractValueInst>(Itr))			if (!isa<ExtractValueInst>(Itr))
	return false;			return false;

	// Check that the extractvalue operand comes from the intrinsic.			// Check that the extractvalue operand comes from the intrinsic.
	const auto *EVI = cast<ExtractValueInst>(Itr);			const auto *EVI = cast<ExtractValueInst>(Itr);
	if (EVI->getAggregateOperand() != II)			if (EVI->getAggregateOperand() != II)
	return false;			return false;
	}			}

	CC = TmpCC;			CC = TmpCC;
	return true;			return true;
	}			}

	bool X86FastISel::isTypeLegal(Type *Ty, MVT &VT, bool AllowI1) {			bool X86FastISel::isTypeLegal(Type *Ty, MVT &VT, bool AllowI1) {
	EVT evt = TLI.getValueType(Ty, /*HandleUnknown=*/true);			EVT evt = TLI.getValueType(Ty, /*HandleUnknown=*/true);
	if (evt == MVT::Other || !evt.isSimple())			if (evt == MVT::Other || !evt.isSimple())
	// Unhandled type. Halt "fast" selection and bail.			// Unhandled type. Halt "fast" selection and bail.
	return false;			return false;

	VT = evt.getSimpleVT();			VT = evt.getSimpleVT();
	// For now, require SSE/SSE2 for performing floating-point operations,			// For now, require SSE/SSE2 for performing floating-point operations,
	// since x87 requires additional work.			// since x87 requires additional work.
	if (VT == MVT::f64 && !X86ScalarSSEf64)			if (VT == MVT::f64 && !X86ScalarSSEf64)
	return false;			return false;
	if (VT == MVT::f32 && !X86ScalarSSEf32)			if (VT == MVT::f32 && !X86ScalarSSEf32)
	return false;			return false;
	// Similarly, no f80 support yet.			// Similarly, no f80 support yet.
	if (VT == MVT::f80)			if (VT == MVT::f80)
	return false;			return false;
	// We only handle legal types. For example, on x86-32 the instruction			// We only handle legal types. For example, on x86-32 the instruction
	// selector contains all of the 64-bit instructions from x86-64,			// selector contains all of the 64-bit instructions from x86-64,
	// under the assumption that i64 won't be used if the target doesn't			// under the assumption that i64 won't be used if the target doesn't
	// support it.			// support it.
	return (AllowI1 && VT == MVT::i1) || TLI.isTypeLegal(VT);			return (AllowI1 && VT == MVT::i1) || TLI.isTypeLegal(VT);
	}			}

	#include "X86GenCallingConv.inc"			#include "X86GenCallingConv.inc"

	/// X86FastEmitLoad - Emit a machine instruction to load a value of type VT.			/// X86FastEmitLoad - Emit a machine instruction to load a value of type VT.
	/// The address is either pre-computed, i.e. Ptr, or a GlobalAddress, i.e. GV.			/// The address is either pre-computed, i.e. Ptr, or a GlobalAddress, i.e. GV.
	/// Return true and the result register by reference if it is possible.			/// Return true and the result register by reference if it is possible.
	bool X86FastISel::X86FastEmitLoad(EVT VT, const X86AddressMode &AM,			bool X86FastISel::X86FastEmitLoad(EVT VT, const X86AddressMode &AM,
	MachineMemOperand *MMO, unsigned &ResultReg) {			MachineMemOperand *MMO, unsigned &ResultReg) {
	// Get opcode and regclass of the output for the given load instruction.			// Get opcode and regclass of the output for the given load instruction.
	unsigned Opc = 0;			unsigned Opc = 0;
	const TargetRegisterClass *RC = nullptr;			const TargetRegisterClass *RC = nullptr;
	switch (VT.getSimpleVT().SimpleTy) {			switch (VT.getSimpleVT().SimpleTy) {
	default: return false;			default: return false;
	case MVT::i1:			case MVT::i1:
	case MVT::i8:			case MVT::i8:
	Opc = X86::MOV8rm;			Opc = X86::MOV8rm;
	RC = &X86::GR8RegClass;			RC = &X86::GR8RegClass;
	break;			break;
	case MVT::i16:			case MVT::i16:
	Opc = X86::MOV16rm;			Opc = X86::MOV16rm;
	RC = &X86::GR16RegClass;			RC = &X86::GR16RegClass;
	break;			break;
	case MVT::i32:			case MVT::i32:
	Opc = X86::MOV32rm;			Opc = X86::MOV32rm;
	RC = &X86::GR32RegClass;			RC = &X86::GR32RegClass;
	break;			break;
	case MVT::i64:			case MVT::i64:
	// Must be in x86-64 mode.			// Must be in x86-64 mode.
	Opc = X86::MOV64rm;			Opc = X86::MOV64rm;
	RC = &X86::GR64RegClass;			RC = &X86::GR64RegClass;
	break;			break;
	case MVT::f32:			case MVT::f32:
	if (X86ScalarSSEf32) {			if (X86ScalarSSEf32) {
	Opc = Subtarget->hasAVX() ? X86::VMOVSSrm : X86::MOVSSrm;			Opc = Subtarget->hasAVX() ? X86::VMOVSSrm : X86::MOVSSrm;
	RC = &X86::FR32RegClass;			RC = &X86::FR32RegClass;
	} else {			} else {
	Opc = X86::LD_Fp32m;			Opc = X86::LD_Fp32m;
	RC = &X86::RFP32RegClass;			RC = &X86::RFP32RegClass;
	}			}
	break;			break;
	case MVT::f64:			case MVT::f64:
	if (X86ScalarSSEf64) {			if (X86ScalarSSEf64) {
	Opc = Subtarget->hasAVX() ? X86::VMOVSDrm : X86::MOVSDrm;			Opc = Subtarget->hasAVX() ? X86::VMOVSDrm : X86::MOVSDrm;
	RC = &X86::FR64RegClass;			RC = &X86::FR64RegClass;
	} else {			} else {
	Opc = X86::LD_Fp64m;			Opc = X86::LD_Fp64m;
	RC = &X86::RFP64RegClass;			RC = &X86::RFP64RegClass;
	}			}
	break;			break;
	case MVT::f80:			case MVT::f80:
	// No f80 support yet.			// No f80 support yet.
	return false;			return false;
	}			}

	ResultReg = createResultReg(RC);			ResultReg = createResultReg(RC);
	MachineInstrBuilder MIB =			MachineInstrBuilder MIB =
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg);			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg);
	addFullAddress(MIB, AM);			addFullAddress(MIB, AM);
	if (MMO)			if (MMO)
	MIB->addMemOperand(*FuncInfo.MF, MMO);			MIB->addMemOperand(*FuncInfo.MF, MMO);
	return true;			return true;
	}			}

	/// X86FastEmitStore - Emit a machine instruction to store a value Val of			/// X86FastEmitStore - Emit a machine instruction to store a value Val of
	/// type VT. The address is either pre-computed, consisting of a base ptr, Ptr			/// type VT. The address is either pre-computed, consisting of a base ptr, Ptr
	/// and a displacement offset, or a GlobalAddress,			/// and a displacement offset, or a GlobalAddress,
	/// i.e. V. Return true if it is possible.			/// i.e. V. Return true if it is possible.
	bool X86FastISel::X86FastEmitStore(EVT VT, unsigned ValReg, bool ValIsKill,			bool X86FastISel::X86FastEmitStore(EVT VT, unsigned ValReg, bool ValIsKill,
	const X86AddressMode &AM,			const X86AddressMode &AM,
	MachineMemOperand *MMO, bool Aligned) {			MachineMemOperand *MMO, bool Aligned) {
	// Get opcode and regclass of the output for the given store instruction.			// Get opcode and regclass of the output for the given store instruction.
	unsigned Opc = 0;			unsigned Opc = 0;
	switch (VT.getSimpleVT().SimpleTy) {			switch (VT.getSimpleVT().SimpleTy) {
	case MVT::f80: // No f80 support yet.			case MVT::f80: // No f80 support yet.
	default: return false;			default: return false;
	case MVT::i1: {			case MVT::i1: {
	// Mask out all but lowest bit.			// Mask out all but lowest bit.
	unsigned AndResult = createResultReg(&X86::GR8RegClass);			unsigned AndResult = createResultReg(&X86::GR8RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(X86::AND8ri), AndResult)			TII.get(X86::AND8ri), AndResult)
	.addReg(ValReg, getKillRegState(ValIsKill)).addImm(1);			.addReg(ValReg, getKillRegState(ValIsKill)).addImm(1);
	ValReg = AndResult;			ValReg = AndResult;
	}			}
	// FALLTHROUGH, handling i1 as i8.			// FALLTHROUGH, handling i1 as i8.
	case MVT::i8: Opc = X86::MOV8mr; break;			case MVT::i8: Opc = X86::MOV8mr; break;
	case MVT::i16: Opc = X86::MOV16mr; break;			case MVT::i16: Opc = X86::MOV16mr; break;
	case MVT::i32: Opc = X86::MOV32mr; break;			case MVT::i32: Opc = X86::MOV32mr; break;
	case MVT::i64: Opc = X86::MOV64mr; break; // Must be in x86-64 mode.			case MVT::i64: Opc = X86::MOV64mr; break; // Must be in x86-64 mode.
	case MVT::f32:			case MVT::f32:
	Opc = X86ScalarSSEf32 ?			Opc = X86ScalarSSEf32 ?
	(Subtarget->hasAVX() ? X86::VMOVSSmr : X86::MOVSSmr) : X86::ST_Fp32m;			(Subtarget->hasAVX() ? X86::VMOVSSmr : X86::MOVSSmr) : X86::ST_Fp32m;
	break;			break;
	case MVT::f64:			case MVT::f64:
	Opc = X86ScalarSSEf64 ?			Opc = X86ScalarSSEf64 ?
	(Subtarget->hasAVX() ? X86::VMOVSDmr : X86::MOVSDmr) : X86::ST_Fp64m;			(Subtarget->hasAVX() ? X86::VMOVSDmr : X86::MOVSDmr) : X86::ST_Fp64m;
	break;			break;
	case MVT::v4f32:			case MVT::v4f32:
	if (Aligned)			if (Aligned)
	Opc = Subtarget->hasAVX() ? X86::VMOVAPSmr : X86::MOVAPSmr;			Opc = Subtarget->hasAVX() ? X86::VMOVAPSmr : X86::MOVAPSmr;
	else			else
	Opc = Subtarget->hasAVX() ? X86::VMOVUPSmr : X86::MOVUPSmr;			Opc = Subtarget->hasAVX() ? X86::VMOVUPSmr : X86::MOVUPSmr;
	break;			break;
	case MVT::v2f64:			case MVT::v2f64:
	if (Aligned)			if (Aligned)
	Opc = Subtarget->hasAVX() ? X86::VMOVAPDmr : X86::MOVAPDmr;			Opc = Subtarget->hasAVX() ? X86::VMOVAPDmr : X86::MOVAPDmr;
	else			else
	Opc = Subtarget->hasAVX() ? X86::VMOVUPDmr : X86::MOVUPDmr;			Opc = Subtarget->hasAVX() ? X86::VMOVUPDmr : X86::MOVUPDmr;
	break;			break;
	case MVT::v4i32:			case MVT::v4i32:
	case MVT::v2i64:			case MVT::v2i64:
	case MVT::v8i16:			case MVT::v8i16:
	case MVT::v16i8:			case MVT::v16i8:
	if (Aligned)			if (Aligned)
	Opc = Subtarget->hasAVX() ? X86::VMOVDQAmr : X86::MOVDQAmr;			Opc = Subtarget->hasAVX() ? X86::VMOVDQAmr : X86::MOVDQAmr;
	else			else
	Opc = Subtarget->hasAVX() ? X86::VMOVDQUmr : X86::MOVDQUmr;			Opc = Subtarget->hasAVX() ? X86::VMOVDQUmr : X86::MOVDQUmr;
	break;			break;
	}			}

	MachineInstrBuilder MIB =			MachineInstrBuilder MIB =
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc));			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc));
	addFullAddress(MIB, AM).addReg(ValReg, getKillRegState(ValIsKill));			addFullAddress(MIB, AM).addReg(ValReg, getKillRegState(ValIsKill));
	if (MMO)			if (MMO)
	MIB->addMemOperand(*FuncInfo.MF, MMO);			MIB->addMemOperand(*FuncInfo.MF, MMO);

	return true;			return true;
	}			}

	bool X86FastISel::X86FastEmitStore(EVT VT, const Value *Val,			bool X86FastISel::X86FastEmitStore(EVT VT, const Value *Val,
	const X86AddressMode &AM,			const X86AddressMode &AM,
	MachineMemOperand *MMO, bool Aligned) {			MachineMemOperand *MMO, bool Aligned) {
	// Handle 'null' like i32/i64 0.			// Handle 'null' like i32/i64 0.
	if (isa<ConstantPointerNull>(Val))			if (isa<ConstantPointerNull>(Val))
	Val = Constant::getNullValue(DL.getIntPtrType(Val->getContext()));			Val = Constant::getNullValue(DL.getIntPtrType(Val->getContext()));

	// If this is a store of a simple constant, fold the constant into the store.			// If this is a store of a simple constant, fold the constant into the store.
	if (const ConstantInt *CI = dyn_cast<ConstantInt>(Val)) {			if (const ConstantInt *CI = dyn_cast<ConstantInt>(Val)) {
	unsigned Opc = 0;			unsigned Opc = 0;
	bool Signed = true;			bool Signed = true;
	switch (VT.getSimpleVT().SimpleTy) {			switch (VT.getSimpleVT().SimpleTy) {
	default: break;			default: break;
	case MVT::i1: Signed = false; // FALLTHROUGH to handle as i8.			case MVT::i1: Signed = false; // FALLTHROUGH to handle as i8.
	case MVT::i8: Opc = X86::MOV8mi; break;			case MVT::i8: Opc = X86::MOV8mi; break;
	case MVT::i16: Opc = X86::MOV16mi; break;			case MVT::i16: Opc = X86::MOV16mi; break;
	case MVT::i32: Opc = X86::MOV32mi; break;			case MVT::i32: Opc = X86::MOV32mi; break;
	case MVT::i64:			case MVT::i64:
	// Must be a 32-bit sign extended value.			// Must be a 32-bit sign extended value.
	if (isInt<32>(CI->getSExtValue()))			if (isInt<32>(CI->getSExtValue()))
	Opc = X86::MOV64mi32;			Opc = X86::MOV64mi32;
	break;			break;
	}			}

	if (Opc) {			if (Opc) {
	MachineInstrBuilder MIB =			MachineInstrBuilder MIB =
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc));			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc));
	addFullAddress(MIB, AM).addImm(Signed ? (uint64_t) CI->getSExtValue()			addFullAddress(MIB, AM).addImm(Signed ? (uint64_t) CI->getSExtValue()
	: CI->getZExtValue());			: CI->getZExtValue());
	if (MMO)			if (MMO)
	MIB->addMemOperand(*FuncInfo.MF, MMO);			MIB->addMemOperand(*FuncInfo.MF, MMO);
	return true;			return true;
	}			}
	}			}

	unsigned ValReg = getRegForValue(Val);			unsigned ValReg = getRegForValue(Val);
	if (ValReg == 0)			if (ValReg == 0)
	return false;			return false;

	bool ValKill = hasTrivialKill(Val);			bool ValKill = hasTrivialKill(Val);
	return X86FastEmitStore(VT, ValReg, ValKill, AM, MMO, Aligned);			return X86FastEmitStore(VT, ValReg, ValKill, AM, MMO, Aligned);
	}			}

	/// X86FastEmitExtend - Emit a machine instruction to extend a value Src of			/// X86FastEmitExtend - Emit a machine instruction to extend a value Src of
	/// type SrcVT to type DstVT using the specified extension opcode Opc (e.g.			/// type SrcVT to type DstVT using the specified extension opcode Opc (e.g.
	/// ISD::SIGN_EXTEND).			/// ISD::SIGN_EXTEND).
	bool X86FastISel::X86FastEmitExtend(ISD::NodeType Opc, EVT DstVT,			bool X86FastISel::X86FastEmitExtend(ISD::NodeType Opc, EVT DstVT,
	unsigned Src, EVT SrcVT,			unsigned Src, EVT SrcVT,
	unsigned &ResultReg) {			unsigned &ResultReg) {
	unsigned RR = fastEmit_r(SrcVT.getSimpleVT(), DstVT.getSimpleVT(), Opc,			unsigned RR = fastEmit_r(SrcVT.getSimpleVT(), DstVT.getSimpleVT(), Opc,
	Src, /*TODO: Kill=*/false);			Src, /*TODO: Kill=*/false);
	if (RR == 0)			if (RR == 0)
	return false;			return false;

	ResultReg = RR;			ResultReg = RR;
	return true;			return true;
	}			}

	bool X86FastISel::handleConstantAddresses(const Value *V, X86AddressMode &AM) {			bool X86FastISel::handleConstantAddresses(const Value *V, X86AddressMode &AM) {
	// Handle constant address.			// Handle constant address.
	if (const GlobalValue *GV = dyn_cast<GlobalValue>(V)) {			if (const GlobalValue *GV = dyn_cast<GlobalValue>(V)) {
	// Can't handle alternate code models yet.			// Can't handle alternate code models yet.
	if (TM.getCodeModel() != CodeModel::Small)			if (TM.getCodeModel() != CodeModel::Small)
	return false;			return false;

	// Can't handle TLS yet.			// Can't handle TLS yet.
	if (GV->isThreadLocal())			if (GV->isThreadLocal())
	return false;			return false;

	// RIP-relative addresses can't have additional register operands, so if			// RIP-relative addresses can't have additional register operands, so if
	// we've already folded stuff into the addressing mode, just force the			// we've already folded stuff into the addressing mode, just force the
	// global value into its own register, which we can use as the basereg.			// global value into its own register, which we can use as the basereg.
	if (!Subtarget->isPICStyleRIPRel() ||			if (!Subtarget->isPICStyleRIPRel() ||
	(AM.Base.Reg == 0 && AM.IndexReg == 0)) {			(AM.Base.Reg == 0 && AM.IndexReg == 0)) {
	// Okay, we've committed to selecting this global. Set up the address.			// Okay, we've committed to selecting this global. Set up the address.
	AM.GV = GV;			AM.GV = GV;

	// Allow the subtarget to classify the global.			// Allow the subtarget to classify the global.
	unsigned char GVFlags = Subtarget->ClassifyGlobalReference(GV, TM);			unsigned char GVFlags = Subtarget->ClassifyGlobalReference(GV, TM);

	// If this reference is relative to the pic base, set it now.			// If this reference is relative to the pic base, set it now.
	if (isGlobalRelativeToPICBase(GVFlags)) {			if (isGlobalRelativeToPICBase(GVFlags)) {
	// FIXME: How do we know Base.Reg is free??			// FIXME: How do we know Base.Reg is free??
	AM.Base.Reg = getInstrInfo()->getGlobalBaseReg(FuncInfo.MF);			AM.Base.Reg = getInstrInfo()->getGlobalBaseReg(FuncInfo.MF);
	}			}

	// Unless the ABI requires an extra load, return a direct reference to			// Unless the ABI requires an extra load, return a direct reference to
	// the global.			// the global.
	if (!isGlobalStubReference(GVFlags)) {			if (!isGlobalStubReference(GVFlags)) {
	if (Subtarget->isPICStyleRIPRel()) {			if (Subtarget->isPICStyleRIPRel()) {
	// Use rip-relative addressing if we can. Above we verified that the			// Use rip-relative addressing if we can. Above we verified that the
	// base and index registers are unused.			// base and index registers are unused.
	assert(AM.Base.Reg == 0 && AM.IndexReg == 0);			assert(AM.Base.Reg == 0 && AM.IndexReg == 0);
	AM.Base.Reg = X86::RIP;			AM.Base.Reg = X86::RIP;
	}			}
	AM.GVOpFlags = GVFlags;			AM.GVOpFlags = GVFlags;
	return true;			return true;
	}			}

	// Ok, we need to do a load from a stub. If we've already loaded from			// Ok, we need to do a load from a stub. If we've already loaded from
	// this stub, reuse the loaded pointer, otherwise emit the load now.			// this stub, reuse the loaded pointer, otherwise emit the load now.
	DenseMap<const Value *, unsigned>::iterator I = LocalValueMap.find(V);			DenseMap<const Value *, unsigned>::iterator I = LocalValueMap.find(V);
	unsigned LoadReg;			unsigned LoadReg;
	if (I != LocalValueMap.end() && I->second != 0) {			if (I != LocalValueMap.end() && I->second != 0) {
	LoadReg = I->second;			LoadReg = I->second;
	} else {			} else {
	// Issue load from stub.			// Issue load from stub.
	unsigned Opc = 0;			unsigned Opc = 0;
	const TargetRegisterClass *RC = nullptr;			const TargetRegisterClass *RC = nullptr;
	X86AddressMode StubAM;			X86AddressMode StubAM;
	StubAM.Base.Reg = AM.Base.Reg;			StubAM.Base.Reg = AM.Base.Reg;
	StubAM.GV = GV;			StubAM.GV = GV;
	StubAM.GVOpFlags = GVFlags;			StubAM.GVOpFlags = GVFlags;

	// Prepare for inserting code in the local-value area.			// Prepare for inserting code in the local-value area.
	SavePoint SaveInsertPt = enterLocalValueArea();			SavePoint SaveInsertPt = enterLocalValueArea();

	if (TLI.getPointerTy() == MVT::i64) {			if (TLI.getPointerTy() == MVT::i64) {
	Opc = X86::MOV64rm;			Opc = X86::MOV64rm;
	RC = &X86::GR64RegClass;			RC = &X86::GR64RegClass;

	if (Subtarget->isPICStyleRIPRel())			if (Subtarget->isPICStyleRIPRel())
	StubAM.Base.Reg = X86::RIP;			StubAM.Base.Reg = X86::RIP;
	} else {			} else {
	Opc = X86::MOV32rm;			Opc = X86::MOV32rm;
	RC = &X86::GR32RegClass;			RC = &X86::GR32RegClass;
	}			}

	LoadReg = createResultReg(RC);			LoadReg = createResultReg(RC);
	MachineInstrBuilder LoadMI =			MachineInstrBuilder LoadMI =
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), LoadReg);			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), LoadReg);
	addFullAddress(LoadMI, StubAM);			addFullAddress(LoadMI, StubAM);

	// Ok, back to normal mode.			// Ok, back to normal mode.
	leaveLocalValueArea(SaveInsertPt);			leaveLocalValueArea(SaveInsertPt);

	// Prevent loading GV stub multiple times in same MBB.			// Prevent loading GV stub multiple times in same MBB.
	LocalValueMap[V] = LoadReg;			LocalValueMap[V] = LoadReg;
	}			}

	// Now construct the final address. Note that the Disp, Scale,			// Now construct the final address. Note that the Disp, Scale,
	// and Index values may already be set here.			// and Index values may already be set here.
	AM.Base.Reg = LoadReg;			AM.Base.Reg = LoadReg;
	AM.GV = nullptr;			AM.GV = nullptr;
	return true;			return true;
	}			}
	}			}

	// If all else fails, try to materialize the value in a register.			// If all else fails, try to materialize the value in a register.
	if (!AM.GV || !Subtarget->isPICStyleRIPRel()) {			if (!AM.GV || !Subtarget->isPICStyleRIPRel()) {
	if (AM.Base.Reg == 0) {			if (AM.Base.Reg == 0) {
	AM.Base.Reg = getRegForValue(V);			AM.Base.Reg = getRegForValue(V);
	return AM.Base.Reg != 0;			return AM.Base.Reg != 0;
	}			}
	if (AM.IndexReg == 0) {			if (AM.IndexReg == 0) {
	assert(AM.Scale == 1 && "Scale with no index!");			assert(AM.Scale == 1 && "Scale with no index!");
	AM.IndexReg = getRegForValue(V);			AM.IndexReg = getRegForValue(V);
	return AM.IndexReg != 0;			return AM.IndexReg != 0;
	}			}
	}			}

	return false;			return false;
	}			}

	/// X86SelectAddress - Attempt to fill in an address from the given value.			/// X86SelectAddress - Attempt to fill in an address from the given value.
	///			///
	bool X86FastISel::X86SelectAddress(const Value *V, X86AddressMode &AM) {			bool X86FastISel::X86SelectAddress(const Value *V, X86AddressMode &AM) {
	SmallVector<const Value *, 32> GEPs;			SmallVector<const Value *, 32> GEPs;
	redo_gep:			redo_gep:
	const User *U = nullptr;			const User *U = nullptr;
	unsigned Opcode = Instruction::UserOp1;			unsigned Opcode = Instruction::UserOp1;
	if (const Instruction *I = dyn_cast<Instruction>(V)) {			if (const Instruction *I = dyn_cast<Instruction>(V)) {
	// Don't walk into other basic blocks; it's possible we haven't			// Don't walk into other basic blocks; it's possible we haven't
	// visited them yet, so the instructions may not yet be assigned			// visited them yet, so the instructions may not yet be assigned
	// virtual registers.			// virtual registers.
	if (FuncInfo.StaticAllocaMap.count(static_cast<const AllocaInst *>(V)) ||			if (FuncInfo.StaticAllocaMap.count(static_cast<const AllocaInst *>(V)) ||
	FuncInfo.MBBMap[I->getParent()] == FuncInfo.MBB) {			FuncInfo.MBBMap[I->getParent()] == FuncInfo.MBB) {
	Opcode = I->getOpcode();			Opcode = I->getOpcode();
	U = I;			U = I;
	}			}
	} else if (const ConstantExpr *C = dyn_cast<ConstantExpr>(V)) {			} else if (const ConstantExpr *C = dyn_cast<ConstantExpr>(V)) {
	Opcode = C->getOpcode();			Opcode = C->getOpcode();
	U = C;			U = C;
	}			}

	if (PointerType *Ty = dyn_cast<PointerType>(V->getType()))			if (PointerType *Ty = dyn_cast<PointerType>(V->getType()))
	if (Ty->getAddressSpace() > 255)			if (Ty->getAddressSpace() > 255)
	// Fast instruction selection doesn't support the special			// Fast instruction selection doesn't support the special
	// address spaces.			// address spaces.
	return false;			return false;

	switch (Opcode) {			switch (Opcode) {
	default: break;			default: break;
	case Instruction::BitCast:			case Instruction::BitCast:
	// Look past bitcasts.			// Look past bitcasts.
	return X86SelectAddress(U->getOperand(0), AM);			return X86SelectAddress(U->getOperand(0), AM);

	case Instruction::IntToPtr:			case Instruction::IntToPtr:
	// Look past no-op inttoptrs.			// Look past no-op inttoptrs.
	if (TLI.getValueType(U->getOperand(0)->getType()) == TLI.getPointerTy())			if (TLI.getValueType(U->getOperand(0)->getType()) == TLI.getPointerTy())
	return X86SelectAddress(U->getOperand(0), AM);			return X86SelectAddress(U->getOperand(0), AM);
	break;			break;

	case Instruction::PtrToInt:			case Instruction::PtrToInt:
	// Look past no-op ptrtoints.			// Look past no-op ptrtoints.
	if (TLI.getValueType(U->getType()) == TLI.getPointerTy())			if (TLI.getValueType(U->getType()) == TLI.getPointerTy())
	return X86SelectAddress(U->getOperand(0), AM);			return X86SelectAddress(U->getOperand(0), AM);
	break;			break;

	case Instruction::Alloca: {			case Instruction::Alloca: {
	// Do static allocas.			// Do static allocas.
	const AllocaInst *A = cast<AllocaInst>(V);			const AllocaInst *A = cast<AllocaInst>(V);
	DenseMap<const AllocaInst *, int>::iterator SI =			DenseMap<const AllocaInst *, int>::iterator SI =
	FuncInfo.StaticAllocaMap.find(A);			FuncInfo.StaticAllocaMap.find(A);
	if (SI != FuncInfo.StaticAllocaMap.end()) {			if (SI != FuncInfo.StaticAllocaMap.end()) {
	AM.BaseType = X86AddressMode::FrameIndexBase;			AM.BaseType = X86AddressMode::FrameIndexBase;
	AM.Base.FrameIndex = SI->second;			AM.Base.FrameIndex = SI->second;
	return true;			return true;
	}			}
	break;			break;
	}			}

	case Instruction::Add: {			case Instruction::Add: {
	// Adds of constants are common and easy enough.			// Adds of constants are common and easy enough.
	if (const ConstantInt *CI = dyn_cast<ConstantInt>(U->getOperand(1))) {			if (const ConstantInt *CI = dyn_cast<ConstantInt>(U->getOperand(1))) {
	uint64_t Disp = (int32_t)AM.Disp + (uint64_t)CI->getSExtValue();			uint64_t Disp = (int32_t)AM.Disp + (uint64_t)CI->getSExtValue();
	// They have to fit in the 32-bit signed displacement field though.			// They have to fit in the 32-bit signed displacement field though.
	if (isInt<32>(Disp)) {			if (isInt<32>(Disp)) {
	AM.Disp = (uint32_t)Disp;			AM.Disp = (uint32_t)Disp;
	return X86SelectAddress(U->getOperand(0), AM);			return X86SelectAddress(U->getOperand(0), AM);
	}			}
	}			}
	break;			break;
	}			}

	case Instruction::GetElementPtr: {			case Instruction::GetElementPtr: {
	X86AddressMode SavedAM = AM;			X86AddressMode SavedAM = AM;

	// Pattern-match simple GEPs.			// Pattern-match simple GEPs.
	uint64_t Disp = (int32_t)AM.Disp;			uint64_t Disp = (int32_t)AM.Disp;
	unsigned IndexReg = AM.IndexReg;			unsigned IndexReg = AM.IndexReg;
	unsigned Scale = AM.Scale;			unsigned Scale = AM.Scale;
	gep_type_iterator GTI = gep_type_begin(U);			gep_type_iterator GTI = gep_type_begin(U);
	// Iterate through the indices, folding what we can. Constants can be			// Iterate through the indices, folding what we can. Constants can be
	// folded, and one dynamic index can be handled, if the scale is supported.			// folded, and one dynamic index can be handled, if the scale is supported.
	for (User::const_op_iterator i = U->op_begin() + 1, e = U->op_end();			for (User::const_op_iterator i = U->op_begin() + 1, e = U->op_end();
	i != e; ++i, ++GTI) {			i != e; ++i, ++GTI) {
	const Value *Op = *i;			const Value *Op = *i;
	if (StructType *STy = dyn_cast<StructType>(*GTI)) {			if (StructType *STy = dyn_cast<StructType>(*GTI)) {
	const StructLayout *SL = DL.getStructLayout(STy);			const StructLayout *SL = DL.getStructLayout(STy);
	Disp += SL->getElementOffset(cast<ConstantInt>(Op)->getZExtValue());			Disp += SL->getElementOffset(cast<ConstantInt>(Op)->getZExtValue());
	continue;			continue;
	}			}

	// An array/variable index is always of the form i*S where S is the			// An array/variable index is always of the form i*S where S is the
	// constant scale size. See if we can push the scale into immediates.			// constant scale size. See if we can push the scale into immediates.
	uint64_t S = DL.getTypeAllocSize(GTI.getIndexedType());			uint64_t S = DL.getTypeAllocSize(GTI.getIndexedType());
	for (;;) {			for (;;) {
	if (const ConstantInt *CI = dyn_cast<ConstantInt>(Op)) {			if (const ConstantInt *CI = dyn_cast<ConstantInt>(Op)) {
	// Constant-offset addressing.			// Constant-offset addressing.
	Disp += CI->getSExtValue() * S;			Disp += CI->getSExtValue() * S;
	break;			break;
	}			}
	if (canFoldAddIntoGEP(U, Op)) {			if (canFoldAddIntoGEP(U, Op)) {
	// A compatible add with a constant operand. Fold the constant.			// A compatible add with a constant operand. Fold the constant.
	ConstantInt *CI =			ConstantInt *CI =
	cast<ConstantInt>(cast<AddOperator>(Op)->getOperand(1));			cast<ConstantInt>(cast<AddOperator>(Op)->getOperand(1));
	Disp += CI->getSExtValue() * S;			Disp += CI->getSExtValue() * S;
	// Iterate on the other operand.			// Iterate on the other operand.
	Op = cast<AddOperator>(Op)->getOperand(0);			Op = cast<AddOperator>(Op)->getOperand(0);
	continue;			continue;
	}			}
	if (IndexReg == 0 &&			if (IndexReg == 0 &&
	(!AM.GV || !Subtarget->isPICStyleRIPRel()) &&			(!AM.GV || !Subtarget->isPICStyleRIPRel()) &&
	(S == 1 || S == 2 || S == 4 || S == 8)) {			(S == 1 || S == 2 || S == 4 || S == 8)) {
	// Scaled-index addressing.			// Scaled-index addressing.
	Scale = S;			Scale = S;
	IndexReg = getRegForGEPIndex(Op).first;			IndexReg = getRegForGEPIndex(Op).first;
	if (IndexReg == 0)			if (IndexReg == 0)
	return false;			return false;
	break;			break;
	}			}
	// Unsupported.			// Unsupported.
	goto unsupported_gep;			goto unsupported_gep;
	}			}
	}			}

	// Check for displacement overflow.			// Check for displacement overflow.
	if (!isInt<32>(Disp))			if (!isInt<32>(Disp))
	break;			break;

	AM.IndexReg = IndexReg;			AM.IndexReg = IndexReg;
	AM.Scale = Scale;			AM.Scale = Scale;
	AM.Disp = (uint32_t)Disp;			AM.Disp = (uint32_t)Disp;
	GEPs.push_back(V);			GEPs.push_back(V);

	if (const GetElementPtrInst *GEP =			if (const GetElementPtrInst *GEP =
	dyn_cast<GetElementPtrInst>(U->getOperand(0))) {			dyn_cast<GetElementPtrInst>(U->getOperand(0))) {
	// Ok, the GEP indices were covered by constant-offset and scaled-index			// Ok, the GEP indices were covered by constant-offset and scaled-index
	// addressing. Update the address state and move on to examining the base.			// addressing. Update the address state and move on to examining the base.
	V = GEP;			V = GEP;
	goto redo_gep;			goto redo_gep;
	} else if (X86SelectAddress(U->getOperand(0), AM)) {			} else if (X86SelectAddress(U->getOperand(0), AM)) {
	return true;			return true;
	}			}

	// If we couldn't merge the gep value into this addr mode, revert back to			// If we couldn't merge the gep value into this addr mode, revert back to
	// our address and just match the value instead of completely failing.			// our address and just match the value instead of completely failing.
	AM = SavedAM;			AM = SavedAM;

	for (SmallVectorImpl<const Value *>::reverse_iterator			for (SmallVectorImpl<const Value *>::reverse_iterator
	I = GEPs.rbegin(), E = GEPs.rend(); I != E; ++I)			I = GEPs.rbegin(), E = GEPs.rend(); I != E; ++I)
	if (handleConstantAddresses(*I, AM))			if (handleConstantAddresses(*I, AM))
	return true;			return true;

	return false;			return false;
	unsupported_gep:			unsupported_gep:
	// Ok, the GEP indices weren't all covered.			// Ok, the GEP indices weren't all covered.
	break;			break;
	}			}
	}			}

	return handleConstantAddresses(V, AM);			return handleConstantAddresses(V, AM);
	}			}

/// X86SelectCallAddress - Attempt to fill in an address from the given value.
///
bool X86FastISel::X86SelectCallAddress(const Value *V, X86AddressMode &AM) {
  const User *U = nullptr;
  unsigned Opcode = Instruction::UserOp1;
  const Instruction *I = dyn_cast<Instruction>(V);
  // Record if the value is defined in the same basic block.
  //
  // This information is crucial to know whether or not folding an
  // operand is valid.
  // Indeed, FastISel generates or reuses a virtual register for all
  // operands of all instructions it selects. Obviously, the definition and
  // its uses must use the same virtual register, otherwise the produced
  // code is incorrect.
  // Before instruction selection, FunctionLoweringInfo::set sets the virtual
  // registers for values that are alive across basic blocks. This ensures
  // that the values are consistently set across basic blocks, even
  // if different instruction selection mechanisms are used (e.g., a mix of
  // SDISel and FastISel).
  // For values local to a basic block, the instruction selection process
  // generates these virtual registers with whatever method is appropriate
  // for its needs. In particular, FastISel and SDISel do not share the way
  // local virtual registers are set.
  // Therefore, it is impossible (or at least unsafe) to share values
  // between basic blocks unless they use the same instruction selection
  // method, which is not guaranteed for X86.
  // Moreover, things like hasOneUse could not be used accurately if we
  // allowed references to values across basic blocks when they are not
  // alive across basic blocks initially.
  bool InMBB = true;
  if (I) {
    Opcode = I->getOpcode();
    U = I;
    InMBB = I->getParent() == FuncInfo.MBB->getBasicBlock();
  } else if (const ConstantExpr *C = dyn_cast<ConstantExpr>(V)) {
    Opcode = C->getOpcode();
    U = C;
  }

  switch (Opcode) {
  default: break;
  case Instruction::BitCast:
    // Look past bitcasts if its operand is in the same BB.
    if (InMBB)
      return X86SelectCallAddress(U->getOperand(0), AM);
    break;

  case Instruction::IntToPtr:
    // Look past no-op inttoptrs if its operand is in the same BB.
    if (InMBB &&
        TLI.getValueType(U->getOperand(0)->getType()) == TLI.getPointerTy())
      return X86SelectCallAddress(U->getOperand(0), AM);
    break;

  case Instruction::PtrToInt:
    // Look past no-op ptrtoints if its operand is in the same BB.
    if (InMBB &&
        TLI.getValueType(U->getType()) == TLI.getPointerTy())
      return X86SelectCallAddress(U->getOperand(0), AM);
    break;
  }

  // Handle constant address.
  if (const GlobalValue *GV = dyn_cast<GlobalValue>(V)) {
    // Can't handle alternate code models yet.
    if (TM.getCodeModel() != CodeModel::Small)
      return false;

    // RIP-relative addresses can't have additional register operands.
    if (Subtarget->isPICStyleRIPRel() &&
        (AM.Base.Reg != 0 || AM.IndexReg != 0))
      return false;

    // Can't handle DLL Import.
    if (GV->hasDLLImportStorageClass())
      return false;

    // Can't handle TLS.
    if (const GlobalVariable *GVar = dyn_cast<GlobalVariable>(GV))
      if (GVar->isThreadLocal())
        return false;

    // Okay, we've committed to selecting this global. Set up the basic address.
    AM.GV = GV;

    // No ABI requires an extra load for anything other than DLLImport, which
    // we rejected above. Return a direct reference to the global.
    if (Subtarget->isPICStyleRIPRel()) {
      // Use rip-relative addressing if we can. Above we verified that the
      // base and index registers are unused.
      assert(AM.Base.Reg == 0 && AM.IndexReg == 0);
      AM.Base.Reg = X86::RIP;
    } else if (Subtarget->isPICStyleStubPIC()) {
      AM.GVOpFlags = X86II::MO_PIC_BASE_OFFSET;
    } else if (Subtarget->isPICStyleGOT()) {
      AM.GVOpFlags = X86II::MO_GOTOFF;
    }

    return true;
  }

  // If all else fails, try to materialize the value in a register.
  if (!AM.GV || !Subtarget->isPICStyleRIPRel()) {
    if (AM.Base.Reg == 0) {
      AM.Base.Reg = getRegForValue(V);
      return AM.Base.Reg != 0;
    }
    if (AM.IndexReg == 0) {
      assert(AM.Scale == 1 && "Scale with no index!");
      AM.IndexReg = getRegForValue(V);
      return AM.IndexReg != 0;
    }
  }

  return false;
}


/// X86SelectStore - Select and emit code to implement store instructions.
bool X86FastISel::X86SelectStore(const Instruction *I) {
  // Atomic stores need special handling.
  const StoreInst *S = cast<StoreInst>(I);

  if (S->isAtomic())
    return false;

  const Value *Val = S->getValueOperand();
  const Value *Ptr = S->getPointerOperand();

  MVT VT;
  if (!isTypeLegal(Val->getType(), VT, /*AllowI1=*/true))
    return false;

  unsigned Alignment = S->getAlignment();
  unsigned ABIAlignment = DL.getABITypeAlignment(Val->getType());
  if (Alignment == 0) // Ensure that codegen never sees alignment 0
    Alignment = ABIAlignment;
  bool Aligned = Alignment >= ABIAlignment;

  X86AddressMode AM;
  if (!X86SelectAddress(Ptr, AM))
    return false;

  return X86FastEmitStore(VT, Val, AM, createMachineMemOperandFor(I), Aligned);
}
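
// Illustration only, not part of the diff: a minimal standalone sketch of the
// alignment decision made in X86SelectStore above. The helper name
// storeMeetsABIAlignment is hypothetical; it only mirrors the rule that an
// alignment of 0 is replaced by the ABI alignment before the comparison.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API). Alignment == 0 means "unspecified",
// so it is replaced by the ABI alignment, as in the code above.
bool storeMeetsABIAlignment(unsigned Alignment, unsigned ABIAlignment) {
  if (Alignment == 0)
    Alignment = ABIAlignment;
  return Alignment >= ABIAlignment;
}
```

// Under this sketch, an unspecified alignment always counts as aligned, while
// an explicit alignment below the ABI value does not.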

/// X86SelectRet - Select and emit code to implement ret instructions.
bool X86FastISel::X86SelectRet(const Instruction *I) {
  const ReturnInst *Ret = cast<ReturnInst>(I);
  const Function &F = *I->getParent()->getParent();
  const X86MachineFunctionInfo *X86MFInfo =
      FuncInfo.MF->getInfo<X86MachineFunctionInfo>();

  if (!FuncInfo.CanLowerReturn)
    return false;

  CallingConv::ID CC = F.getCallingConv();
  if (CC != CallingConv::C &&
      CC != CallingConv::Fast &&
      CC != CallingConv::X86_FastCall &&
      CC != CallingConv::X86_64_SysV)
    return false;

  if (Subtarget->isCallingConvWin64(CC))
    return false;

  // Don't handle popping bytes on return for now.
  if (X86MFInfo->getBytesToPopOnReturn() != 0)
    return false;

  // fastcc with -tailcallopt is intended to provide a guaranteed
  // tail call optimization. Fastisel doesn't know how to do that.
  if (CC == CallingConv::Fast && TM.Options.GuaranteedTailCallOpt)
    return false;

  // Let SDISel handle vararg functions.
  if (F.isVarArg())
    return false;

  // Build a list of return value registers.
  SmallVector<unsigned, 4> RetRegs;

  if (Ret->getNumOperands() > 0) {
    SmallVector<ISD::OutputArg, 4> Outs;
    GetReturnInfo(F.getReturnType(), F.getAttributes(), Outs, TLI);

    // Analyze operands of the call, assigning locations to each operand.
    SmallVector<CCValAssign, 16> ValLocs;
    CCState CCInfo(CC, F.isVarArg(), *FuncInfo.MF, ValLocs, I->getContext());
    CCInfo.AnalyzeReturn(Outs, RetCC_X86);

    const Value *RV = Ret->getOperand(0);
    unsigned Reg = getRegForValue(RV);
    if (Reg == 0)
      return false;

    // Only handle a single return value for now.
    if (ValLocs.size() != 1)
      return false;

    CCValAssign &VA = ValLocs[0];

    // Don't bother handling odd stuff for now.
    if (VA.getLocInfo() != CCValAssign::Full)
      return false;
    // Only handle register returns for now.
    if (!VA.isRegLoc())
      return false;

    // The calling-convention tables for x87 returns don't tell
    // the whole story.
    if (VA.getLocReg() == X86::FP0 || VA.getLocReg() == X86::FP1)
      return false;

    unsigned SrcReg = Reg + VA.getValNo();
    EVT SrcVT = TLI.getValueType(RV->getType());
    EVT DstVT = VA.getValVT();
    // Special handling for extended integers.
    if (SrcVT != DstVT) {
      if (SrcVT != MVT::i1 && SrcVT != MVT::i8 && SrcVT != MVT::i16)
        return false;

      if (!Outs[0].Flags.isZExt() && !Outs[0].Flags.isSExt())
        return false;

      assert(DstVT == MVT::i32 && "X86 should always ext to i32");

      if (SrcVT == MVT::i1) {
        if (Outs[0].Flags.isSExt())
          return false;
        SrcReg = fastEmitZExtFromI1(MVT::i8, SrcReg, /*TODO: Kill=*/false);
        SrcVT = MVT::i8;
      }
      unsigned Op = Outs[0].Flags.isZExt() ? ISD::ZERO_EXTEND :
                                             ISD::SIGN_EXTEND;
      SrcReg = fastEmit_r(SrcVT.getSimpleVT(), DstVT.getSimpleVT(), Op,
                          SrcReg, /*TODO: Kill=*/false);
    }

    // Make the copy.
    unsigned DstReg = VA.getLocReg();
    const TargetRegisterClass *SrcRC = MRI.getRegClass(SrcReg);
    // Avoid a cross-class copy. This is very unlikely.
    if (!SrcRC->contains(DstReg))
      return false;
    BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
            TII.get(TargetOpcode::COPY), DstReg).addReg(SrcReg);

    // Add register to return instruction.
    RetRegs.push_back(VA.getLocReg());
  }

  // The x86-64 ABI for returning structs by value requires that we copy
  // the sret argument into %rax for the return. We saved the argument into
  // a virtual register in the entry block, so now we copy the value out
  // and into %rax. We also do the same with %eax for Win32.
  if (F.hasStructRetAttr() &&
      (Subtarget->is64Bit() || Subtarget->isTargetKnownWindowsMSVC())) {
    unsigned Reg = X86MFInfo->getSRetReturnReg();
    assert(Reg &&
           "SRetReturnReg should have been set in LowerFormalArguments()!");
    unsigned RetReg = Subtarget->is64Bit() ? X86::RAX : X86::EAX;
    BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
            TII.get(TargetOpcode::COPY), RetReg).addReg(Reg);
    RetRegs.push_back(RetReg);
  }

  // Now emit the RET.
  MachineInstrBuilder MIB =
      BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
              TII.get(Subtarget->is64Bit() ? X86::RETQ : X86::RETL));
  for (unsigned i = 0, e = RetRegs.size(); i != e; ++i)
    MIB.addReg(RetRegs[i], RegState::Implicit);
  return true;
}

/// X86SelectLoad - Select and emit code to implement load instructions.
///
bool X86FastISel::X86SelectLoad(const Instruction *I) {
  const LoadInst *LI = cast<LoadInst>(I);

  // Atomic loads need special handling.
  if (LI->isAtomic())
    return false;

  MVT VT;
  if (!isTypeLegal(LI->getType(), VT, /*AllowI1=*/true))
    return false;

  const Value *Ptr = LI->getPointerOperand();

  X86AddressMode AM;
  if (!X86SelectAddress(Ptr, AM))
    return false;

  unsigned ResultReg = 0;
  if (!X86FastEmitLoad(VT, AM, createMachineMemOperandFor(LI), ResultReg))
    return false;

  updateValueMap(I, ResultReg);
  return true;
}

static unsigned X86ChooseCmpOpcode(EVT VT, const X86Subtarget *Subtarget) {
  bool HasAVX = Subtarget->hasAVX();
  bool X86ScalarSSEf32 = Subtarget->hasSSE1();
  bool X86ScalarSSEf64 = Subtarget->hasSSE2();

  switch (VT.getSimpleVT().SimpleTy) {
  default: return 0;
  case MVT::i8: return X86::CMP8rr;
  case MVT::i16: return X86::CMP16rr;
  case MVT::i32: return X86::CMP32rr;
  case MVT::i64: return X86::CMP64rr;
  case MVT::f32:
    return X86ScalarSSEf32 ? (HasAVX ? X86::VUCOMISSrr : X86::UCOMISSrr) : 0;
  case MVT::f64:
    return X86ScalarSSEf64 ? (HasAVX ? X86::VUCOMISDrr : X86::UCOMISDrr) : 0;
  }
}

/// X86ChooseCmpImmediateOpcode - If we have a comparison with RHS as the RHS
/// of the comparison, return an opcode that works for the compare (e.g.
/// CMP32ri) otherwise return 0.
static unsigned X86ChooseCmpImmediateOpcode(EVT VT, const ConstantInt *RHSC) {
  switch (VT.getSimpleVT().SimpleTy) {
  // Otherwise, we can't fold the immediate into this comparison.
  default: return 0;
  case MVT::i8: return X86::CMP8ri;
  case MVT::i16: return X86::CMP16ri;
  case MVT::i32: return X86::CMP32ri;
  case MVT::i64:
    // 64-bit comparisons are only valid if the immediate fits in a 32-bit sext
    // field.
    if ((int)RHSC->getSExtValue() == RHSC->getSExtValue())
      return X86::CMP64ri32;
    return 0;
  }
}
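
// Illustration only, not part of the diff: a minimal standalone sketch of the
// "fits in a 32-bit sext field" test used for the i64 case above. The name
// fitsInSExt32 is hypothetical. The sketch uses an explicit range check rather
// than the (int) cast round trip in the original, which is an assumption made
// here to avoid relying on how out-of-range narrowing conversions behave.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): true when Imm can be encoded as a
// 32-bit immediate that the CPU sign-extends to 64 bits.
constexpr bool fitsInSExt32(int64_t Imm) {
  return Imm >= INT32_MIN && Imm <= INT32_MAX;
}
```

// For example, -1 qualifies (it sign-extends back to itself), whereas
// 0x80000000 does not, because sign extension would turn it into a negative
// 64-bit value.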

bool X86FastISel::X86FastEmitCompare(const Value *Op0, const Value *Op1,
                                     EVT VT, DebugLoc CurDbgLoc) {
  unsigned Op0Reg = getRegForValue(Op0);
  if (Op0Reg == 0) return false;

  // Handle 'null' like i32/i64 0.
  if (isa<ConstantPointerNull>(Op1))
    Op1 = Constant::getNullValue(DL.getIntPtrType(Op0->getContext()));

  // We have two options: compare with register or immediate. If the RHS of
  // the compare is an immediate that we can fold into this compare, use
  // CMPri, otherwise use CMPrr.
  if (const ConstantInt *Op1C = dyn_cast<ConstantInt>(Op1)) {
    if (unsigned CompareImmOpc = X86ChooseCmpImmediateOpcode(VT, Op1C)) {
      BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, CurDbgLoc, TII.get(CompareImmOpc))
        .addReg(Op0Reg)
        .addImm(Op1C->getSExtValue());
      return true;
    }
  }

  unsigned CompareOpc = X86ChooseCmpOpcode(VT, Subtarget);
  if (CompareOpc == 0) return false;

  unsigned Op1Reg = getRegForValue(Op1);
  if (Op1Reg == 0) return false;
  BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, CurDbgLoc, TII.get(CompareOpc))
    .addReg(Op0Reg)
    .addReg(Op1Reg);

  return true;
}

bool X86FastISel::X86SelectCmp(const Instruction *I) {
  const CmpInst *CI = cast<CmpInst>(I);

  MVT VT;
  if (!isTypeLegal(I->getOperand(0)->getType(), VT))
    return false;

  // Try to optimize or fold the cmp.
  CmpInst::Predicate Predicate = optimizeCmpPredicate(CI);
  unsigned ResultReg = 0;
  switch (Predicate) {
  default: break;
  case CmpInst::FCMP_FALSE: {
    ResultReg = createResultReg(&X86::GR32RegClass);
    BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::MOV32r0),
            ResultReg);
    ResultReg = fastEmitInst_extractsubreg(MVT::i8, ResultReg, /*Kill=*/true,
                                           X86::sub_8bit);
    if (!ResultReg)
      return false;
    break;
  }
  case CmpInst::FCMP_TRUE: {
    ResultReg = createResultReg(&X86::GR8RegClass);
    BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::MOV8ri),
            ResultReg).addImm(1);
    break;
  }
  }

  if (ResultReg) {
    updateValueMap(I, ResultReg);
    return true;
  }

  const Value *LHS = CI->getOperand(0);
  const Value *RHS = CI->getOperand(1);

  // The optimizer might have replaced fcmp oeq %x, %x with fcmp ord %x, 0.0.
  // We don't have to materialize a zero constant for this case and can just use
  // %x again on the RHS.
  if (Predicate == CmpInst::FCMP_ORD || Predicate == CmpInst::FCMP_UNO) {
    const auto *RHSC = dyn_cast<ConstantFP>(RHS);
    if (RHSC && RHSC->isNullValue())
      RHS = LHS;
  }

  // FCMP_OEQ and FCMP_UNE cannot be checked with a single instruction.
  static unsigned SETFOpcTable[2][3] = {
    { X86::SETEr, X86::SETNPr, X86::AND8rr },
    { X86::SETNEr, X86::SETPr, X86::OR8rr }
  };
  unsigned *SETFOpc = nullptr;
  switch (Predicate) {
  default: break;
  case CmpInst::FCMP_OEQ: SETFOpc = &SETFOpcTable[0][0]; break;
  case CmpInst::FCMP_UNE: SETFOpc = &SETFOpcTable[1][0]; break;
  }

  ResultReg = createResultReg(&X86::GR8RegClass);
  if (SETFOpc) {
    if (!X86FastEmitCompare(LHS, RHS, VT, I->getDebugLoc()))
      return false;

    unsigned FlagReg1 = createResultReg(&X86::GR8RegClass);
    unsigned FlagReg2 = createResultReg(&X86::GR8RegClass);
    BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(SETFOpc[0]),
            FlagReg1);
    BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(SETFOpc[1]),
            FlagReg2);
    BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(SETFOpc[2]),
            ResultReg).addReg(FlagReg1).addReg(FlagReg2);
    updateValueMap(I, ResultReg);
    return true;
  }

  X86::CondCode CC;
  bool SwapArgs;
  std::tie(CC, SwapArgs) = getX86ConditionCode(Predicate);
  assert(CC <= X86::LAST_VALID_COND && "Unexpected condition code.");
  unsigned Opc = X86::getSETFromCond(CC);

  if (SwapArgs)
    std::swap(LHS, RHS);

  // Emit a compare of LHS/RHS.
  if (!X86FastEmitCompare(LHS, RHS, VT, I->getDebugLoc()))
    return false;

  BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg);
  updateValueMap(I, ResultReg);
  return true;
}
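
// Illustration only, not part of the diff: a minimal standalone model of why
// FCMP_OEQ and FCMP_UNE need two SETcc results combined with AND/OR, as in
// SETFOpcTable above. The Flags struct and the functions modelUCOMIS, fcmpOEQ
// and fcmpUNE are hypothetical names. The flag values assume the documented
// UCOMISS/UCOMISD behavior: an unordered compare sets ZF, PF and CF together,
// so ZF alone cannot distinguish "equal" from "unordered".

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the EFLAGS bits written by UCOMISS/UCOMISD.
struct Flags {
  bool ZF, PF, CF;
};

Flags modelUCOMIS(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return {true, true, true};   // unordered
  if (A == B)
    return {true, false, false}; // equal
  if (A < B)
    return {false, false, true}; // less than
  return {false, false, false};  // greater than
}

// OEQ: SETE and SETNP combined with AND (first table row).
bool fcmpOEQ(double A, double B) {
  Flags F = modelUCOMIS(A, B);
  bool SetE = F.ZF;
  bool SetNP = !F.PF;
  return SetE && SetNP;
}

// UNE: SETNE and SETP combined with OR (second table row).
bool fcmpUNE(double A, double B) {
  Flags F = modelUCOMIS(A, B);
  bool SetNE = !F.ZF;
  bool SetP = F.PF;
  return SetNE || SetP;
}
```

// In this model a NaN operand makes fcmpOEQ false and fcmpUNE true, which a
// single SETE or SETNE could not produce since ZF is also set for unordered.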

	bool X86FastISel::X86SelectZExt(const Instruction *I) {			bool X86FastISel::X86SelectZExt(const Instruction *I) {
	EVT DstVT = TLI.getValueType(I->getType());			EVT DstVT = TLI.getValueType(I->getType());
	if (!TLI.isTypeLegal(DstVT))			if (!TLI.isTypeLegal(DstVT))
	return false;			return false;

	unsigned ResultReg = getRegForValue(I->getOperand(0));			unsigned ResultReg = getRegForValue(I->getOperand(0));
	if (ResultReg == 0)			if (ResultReg == 0)
	return false;			return false;

	// Handle zero-extension from i1 to i8, which is common.			// Handle zero-extension from i1 to i8, which is common.
	MVT SrcVT = TLI.getSimpleValueType(I->getOperand(0)->getType());			MVT SrcVT = TLI.getSimpleValueType(I->getOperand(0)->getType());
	if (SrcVT.SimpleTy == MVT::i1) {			if (SrcVT.SimpleTy == MVT::i1) {
	// Set the high bits to zero.			// Set the high bits to zero.
	ResultReg = fastEmitZExtFromI1(MVT::i8, ResultReg, /TODO: Kill=/false);			ResultReg = fastEmitZExtFromI1(MVT::i8, ResultReg, /TODO: Kill=/false);
	SrcVT = MVT::i8;			SrcVT = MVT::i8;

	if (ResultReg == 0)			if (ResultReg == 0)
	return false;			return false;
	}			}

	if (DstVT == MVT::i64) {			if (DstVT == MVT::i64) {
	// Handle extension to 64-bits via sub-register shenanigans.			// Handle extension to 64-bits via sub-register shenanigans.
	unsigned MovInst;			unsigned MovInst;

	switch (SrcVT.SimpleTy) {			switch (SrcVT.SimpleTy) {
	case MVT::i8: MovInst = X86::MOVZX32rr8; break;			case MVT::i8: MovInst = X86::MOVZX32rr8; break;
	case MVT::i16: MovInst = X86::MOVZX32rr16; break;			case MVT::i16: MovInst = X86::MOVZX32rr16; break;
	case MVT::i32: MovInst = X86::MOV32rr; break;			case MVT::i32: MovInst = X86::MOV32rr; break;
	default: llvm_unreachable("Unexpected zext to i64 source type");			default: llvm_unreachable("Unexpected zext to i64 source type");
	}			}

	unsigned Result32 = createResultReg(&X86::GR32RegClass);			unsigned Result32 = createResultReg(&X86::GR32RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(MovInst), Result32)			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(MovInst), Result32)
	.addReg(ResultReg);			.addReg(ResultReg);

	ResultReg = createResultReg(&X86::GR64RegClass);			ResultReg = createResultReg(&X86::GR64RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(TargetOpcode::SUBREG_TO_REG),			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(TargetOpcode::SUBREG_TO_REG),
	ResultReg)			ResultReg)
	.addImm(0).addReg(Result32).addImm(X86::sub_32bit);			.addImm(0).addReg(Result32).addImm(X86::sub_32bit);
	} else if (DstVT != MVT::i8) {			} else if (DstVT != MVT::i8) {
	ResultReg = fastEmit_r(MVT::i8, DstVT.getSimpleVT(), ISD::ZERO_EXTEND,			ResultReg = fastEmit_r(MVT::i8, DstVT.getSimpleVT(), ISD::ZERO_EXTEND,
	ResultReg, /*Kill=*/true);			ResultReg, /*Kill=*/true);
	if (ResultReg == 0)			if (ResultReg == 0)
	return false;			return false;
	}			}

	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}

	bool X86FastISel::X86SelectBranch(const Instruction *I) {			bool X86FastISel::X86SelectBranch(const Instruction *I) {
	// Unconditional branches are selected by tablegen-generated code.			// Unconditional branches are selected by tablegen-generated code.
	// Handle a conditional branch.			// Handle a conditional branch.
	const BranchInst *BI = cast<BranchInst>(I);			const BranchInst *BI = cast<BranchInst>(I);
	MachineBasicBlock *TrueMBB = FuncInfo.MBBMap[BI->getSuccessor(0)];			MachineBasicBlock *TrueMBB = FuncInfo.MBBMap[BI->getSuccessor(0)];
	MachineBasicBlock *FalseMBB = FuncInfo.MBBMap[BI->getSuccessor(1)];			MachineBasicBlock *FalseMBB = FuncInfo.MBBMap[BI->getSuccessor(1)];

	// Fold the common case of a conditional branch with a comparison			// Fold the common case of a conditional branch with a comparison
	// in the same block (values defined on other blocks may not have			// in the same block (values defined on other blocks may not have
	// initialized registers).			// initialized registers).
	X86::CondCode CC;			X86::CondCode CC;
	if (const CmpInst *CI = dyn_cast<CmpInst>(BI->getCondition())) {			if (const CmpInst *CI = dyn_cast<CmpInst>(BI->getCondition())) {
	if (CI->hasOneUse() && CI->getParent() == I->getParent()) {			if (CI->hasOneUse() && CI->getParent() == I->getParent()) {
	EVT VT = TLI.getValueType(CI->getOperand(0)->getType());			EVT VT = TLI.getValueType(CI->getOperand(0)->getType());

	// Try to optimize or fold the cmp.			// Try to optimize or fold the cmp.
	CmpInst::Predicate Predicate = optimizeCmpPredicate(CI);			CmpInst::Predicate Predicate = optimizeCmpPredicate(CI);
	switch (Predicate) {			switch (Predicate) {
	default: break;			default: break;
	case CmpInst::FCMP_FALSE: fastEmitBranch(FalseMBB, DbgLoc); return true;			case CmpInst::FCMP_FALSE: fastEmitBranch(FalseMBB, DbgLoc); return true;
	case CmpInst::FCMP_TRUE: fastEmitBranch(TrueMBB, DbgLoc); return true;			case CmpInst::FCMP_TRUE: fastEmitBranch(TrueMBB, DbgLoc); return true;
	}			}

	const Value *CmpLHS = CI->getOperand(0);			const Value *CmpLHS = CI->getOperand(0);
	const Value *CmpRHS = CI->getOperand(1);			const Value *CmpRHS = CI->getOperand(1);

	// The optimizer might have replaced fcmp oeq %x, %x with fcmp ord %x,			// The optimizer might have replaced fcmp oeq %x, %x with fcmp ord %x,
	// 0.0.			// 0.0.
	// We don't have to materialize a zero constant for this case and can just			// We don't have to materialize a zero constant for this case and can just
	// use %x again on the RHS.			// use %x again on the RHS.
	if (Predicate == CmpInst::FCMP_ORD || Predicate == CmpInst::FCMP_UNO) {			if (Predicate == CmpInst::FCMP_ORD || Predicate == CmpInst::FCMP_UNO) {
	const auto *CmpRHSC = dyn_cast<ConstantFP>(CmpRHS);			const auto *CmpRHSC = dyn_cast<ConstantFP>(CmpRHS);
	if (CmpRHSC && CmpRHSC->isNullValue())			if (CmpRHSC && CmpRHSC->isNullValue())
	CmpRHS = CmpLHS;			CmpRHS = CmpLHS;
	}			}

	// Try to take advantage of fallthrough opportunities.			// Try to take advantage of fallthrough opportunities.
	if (FuncInfo.MBB->isLayoutSuccessor(TrueMBB)) {			if (FuncInfo.MBB->isLayoutSuccessor(TrueMBB)) {
	std::swap(TrueMBB, FalseMBB);			std::swap(TrueMBB, FalseMBB);
	Predicate = CmpInst::getInversePredicate(Predicate);			Predicate = CmpInst::getInversePredicate(Predicate);
	}			}

	// FCMP_OEQ and FCMP_UNE cannot be expressed with a single flag/condition			// FCMP_OEQ and FCMP_UNE cannot be expressed with a single flag/condition
	// code check. Instead two branch instructions are required to check all			// code check. Instead two branch instructions are required to check all
	// the flags. First we change the predicate to a supported condition code,			// the flags. First we change the predicate to a supported condition code,
	// which will be the first branch. Later on we will emit the second			// which will be the first branch. Later on we will emit the second
	// branch.			// branch.
	bool NeedExtraBranch = false;			bool NeedExtraBranch = false;
	switch (Predicate) {			switch (Predicate) {
	default: break;			default: break;
	case CmpInst::FCMP_OEQ:			case CmpInst::FCMP_OEQ:
	std::swap(TrueMBB, FalseMBB); // fall-through			std::swap(TrueMBB, FalseMBB); // fall-through
	case CmpInst::FCMP_UNE:			case CmpInst::FCMP_UNE:
	NeedExtraBranch = true;			NeedExtraBranch = true;
	Predicate = CmpInst::FCMP_ONE;			Predicate = CmpInst::FCMP_ONE;
	break;			break;
	}			}

	bool SwapArgs;			bool SwapArgs;
	unsigned BranchOpc;			unsigned BranchOpc;
	std::tie(CC, SwapArgs) = getX86ConditionCode(Predicate);			std::tie(CC, SwapArgs) = getX86ConditionCode(Predicate);
	assert(CC <= X86::LAST_VALID_COND && "Unexpected condition code.");			assert(CC <= X86::LAST_VALID_COND && "Unexpected condition code.");

	BranchOpc = X86::GetCondBranchFromCond(CC);			BranchOpc = X86::GetCondBranchFromCond(CC);
	if (SwapArgs)			if (SwapArgs)
	std::swap(CmpLHS, CmpRHS);			std::swap(CmpLHS, CmpRHS);

	// Emit a compare of the LHS and RHS, setting the flags.			// Emit a compare of the LHS and RHS, setting the flags.
	if (!X86FastEmitCompare(CmpLHS, CmpRHS, VT, CI->getDebugLoc()))			if (!X86FastEmitCompare(CmpLHS, CmpRHS, VT, CI->getDebugLoc()))
	return false;			return false;

	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(BranchOpc))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(BranchOpc))
	.addMBB(TrueMBB);			.addMBB(TrueMBB);

	// X86 requires a second branch to handle UNE (and OEQ, which is mapped			// X86 requires a second branch to handle UNE (and OEQ, which is mapped
	// to UNE above).			// to UNE above).
	if (NeedExtraBranch) {			if (NeedExtraBranch) {
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::JP_1))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::JP_1))
	.addMBB(TrueMBB);			.addMBB(TrueMBB);
	}			}

	// Obtain the branch weight and add the TrueBB to the successor list.			// Obtain the branch weight and add the TrueBB to the successor list.
	uint32_t BranchWeight = 0;			uint32_t BranchWeight = 0;
	if (FuncInfo.BPI)			if (FuncInfo.BPI)
	BranchWeight = FuncInfo.BPI->getEdgeWeight(BI->getParent(),			BranchWeight = FuncInfo.BPI->getEdgeWeight(BI->getParent(),
	TrueMBB->getBasicBlock());			TrueMBB->getBasicBlock());
	FuncInfo.MBB->addSuccessor(TrueMBB, BranchWeight);			FuncInfo.MBB->addSuccessor(TrueMBB, BranchWeight);

	// Emits an unconditional branch to the FalseBB, obtains the branch			// Emits an unconditional branch to the FalseBB, obtains the branch
	// weight, and adds it to the successor list.			// weight, and adds it to the successor list.
	fastEmitBranch(FalseMBB, DbgLoc);			fastEmitBranch(FalseMBB, DbgLoc);

	return true;			return true;
	}			}
	} else if (TruncInst *TI = dyn_cast<TruncInst>(BI->getCondition())) {			} else if (TruncInst *TI = dyn_cast<TruncInst>(BI->getCondition())) {
	// Handle things like "%cond = trunc i32 %X to i1 / br i1 %cond", which			// Handle things like "%cond = trunc i32 %X to i1 / br i1 %cond", which
	// typically happen for _Bool and C++ bools.			// typically happen for _Bool and C++ bools.
	MVT SourceVT;			MVT SourceVT;
	if (TI->hasOneUse() && TI->getParent() == I->getParent() &&			if (TI->hasOneUse() && TI->getParent() == I->getParent() &&
	isTypeLegal(TI->getOperand(0)->getType(), SourceVT)) {			isTypeLegal(TI->getOperand(0)->getType(), SourceVT)) {
	unsigned TestOpc = 0;			unsigned TestOpc = 0;
	switch (SourceVT.SimpleTy) {			switch (SourceVT.SimpleTy) {
	default: break;			default: break;
	case MVT::i8: TestOpc = X86::TEST8ri; break;			case MVT::i8: TestOpc = X86::TEST8ri; break;
	case MVT::i16: TestOpc = X86::TEST16ri; break;			case MVT::i16: TestOpc = X86::TEST16ri; break;
	case MVT::i32: TestOpc = X86::TEST32ri; break;			case MVT::i32: TestOpc = X86::TEST32ri; break;
	case MVT::i64: TestOpc = X86::TEST64ri32; break;			case MVT::i64: TestOpc = X86::TEST64ri32; break;
	}			}
	if (TestOpc) {			if (TestOpc) {
	unsigned OpReg = getRegForValue(TI->getOperand(0));			unsigned OpReg = getRegForValue(TI->getOperand(0));
	if (OpReg == 0) return false;			if (OpReg == 0) return false;
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(TestOpc))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(TestOpc))
	.addReg(OpReg).addImm(1);			.addReg(OpReg).addImm(1);

	unsigned JmpOpc = X86::JNE_1;			unsigned JmpOpc = X86::JNE_1;
	if (FuncInfo.MBB->isLayoutSuccessor(TrueMBB)) {			if (FuncInfo.MBB->isLayoutSuccessor(TrueMBB)) {
	std::swap(TrueMBB, FalseMBB);			std::swap(TrueMBB, FalseMBB);
	JmpOpc = X86::JE_1;			JmpOpc = X86::JE_1;
	}			}

	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(JmpOpc))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(JmpOpc))
	.addMBB(TrueMBB);			.addMBB(TrueMBB);
	fastEmitBranch(FalseMBB, DbgLoc);			fastEmitBranch(FalseMBB, DbgLoc);
	uint32_t BranchWeight = 0;			uint32_t BranchWeight = 0;
	if (FuncInfo.BPI)			if (FuncInfo.BPI)
	BranchWeight = FuncInfo.BPI->getEdgeWeight(BI->getParent(),			BranchWeight = FuncInfo.BPI->getEdgeWeight(BI->getParent(),
	TrueMBB->getBasicBlock());			TrueMBB->getBasicBlock());
	FuncInfo.MBB->addSuccessor(TrueMBB, BranchWeight);			FuncInfo.MBB->addSuccessor(TrueMBB, BranchWeight);
	return true;			return true;
	}			}
	}			}
	} else if (foldX86XALUIntrinsic(CC, BI, BI->getCondition())) {			} else if (foldX86XALUIntrinsic(CC, BI, BI->getCondition())) {
	// Fake request the condition, otherwise the intrinsic might be completely			// Fake request the condition, otherwise the intrinsic might be completely
	// optimized away.			// optimized away.
	unsigned TmpReg = getRegForValue(BI->getCondition());			unsigned TmpReg = getRegForValue(BI->getCondition());
	if (TmpReg == 0)			if (TmpReg == 0)
	return false;			return false;

	unsigned BranchOpc = X86::GetCondBranchFromCond(CC);			unsigned BranchOpc = X86::GetCondBranchFromCond(CC);

	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(BranchOpc))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(BranchOpc))
	.addMBB(TrueMBB);			.addMBB(TrueMBB);
	fastEmitBranch(FalseMBB, DbgLoc);			fastEmitBranch(FalseMBB, DbgLoc);
	uint32_t BranchWeight = 0;			uint32_t BranchWeight = 0;
	if (FuncInfo.BPI)			if (FuncInfo.BPI)
	BranchWeight = FuncInfo.BPI->getEdgeWeight(BI->getParent(),			BranchWeight = FuncInfo.BPI->getEdgeWeight(BI->getParent(),
	TrueMBB->getBasicBlock());			TrueMBB->getBasicBlock());
	FuncInfo.MBB->addSuccessor(TrueMBB, BranchWeight);			FuncInfo.MBB->addSuccessor(TrueMBB, BranchWeight);
	return true;			return true;
	}			}

	// Otherwise do a clumsy setcc and re-test it.			// Otherwise do a clumsy setcc and re-test it.
	// Note that i1 essentially gets ANY_EXTEND'ed to i8 where it isn't used			// Note that i1 essentially gets ANY_EXTEND'ed to i8 where it isn't used
	// in an explicit cast, so make sure to handle that correctly.			// in an explicit cast, so make sure to handle that correctly.
	unsigned OpReg = getRegForValue(BI->getCondition());			unsigned OpReg = getRegForValue(BI->getCondition());
	if (OpReg == 0) return false;			if (OpReg == 0) return false;

	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::TEST8ri))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::TEST8ri))
	.addReg(OpReg).addImm(1);			.addReg(OpReg).addImm(1);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::JNE_1))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::JNE_1))
	.addMBB(TrueMBB);			.addMBB(TrueMBB);
	fastEmitBranch(FalseMBB, DbgLoc);			fastEmitBranch(FalseMBB, DbgLoc);
	uint32_t BranchWeight = 0;			uint32_t BranchWeight = 0;
	if (FuncInfo.BPI)			if (FuncInfo.BPI)
	BranchWeight = FuncInfo.BPI->getEdgeWeight(BI->getParent(),			BranchWeight = FuncInfo.BPI->getEdgeWeight(BI->getParent(),
	TrueMBB->getBasicBlock());			TrueMBB->getBasicBlock());
	FuncInfo.MBB->addSuccessor(TrueMBB, BranchWeight);			FuncInfo.MBB->addSuccessor(TrueMBB, BranchWeight);
	return true;			return true;
	}			}

	bool X86FastISel::X86SelectShift(const Instruction *I) {			bool X86FastISel::X86SelectShift(const Instruction *I) {
	unsigned CReg = 0, OpReg = 0;			unsigned CReg = 0, OpReg = 0;
	const TargetRegisterClass *RC = nullptr;			const TargetRegisterClass *RC = nullptr;
	if (I->getType()->isIntegerTy(8)) {			if (I->getType()->isIntegerTy(8)) {
	CReg = X86::CL;			CReg = X86::CL;
	RC = &X86::GR8RegClass;			RC = &X86::GR8RegClass;
	switch (I->getOpcode()) {			switch (I->getOpcode()) {
	case Instruction::LShr: OpReg = X86::SHR8rCL; break;			case Instruction::LShr: OpReg = X86::SHR8rCL; break;
	case Instruction::AShr: OpReg = X86::SAR8rCL; break;			case Instruction::AShr: OpReg = X86::SAR8rCL; break;
	case Instruction::Shl: OpReg = X86::SHL8rCL; break;			case Instruction::Shl: OpReg = X86::SHL8rCL; break;
	default: return false;			default: return false;
	}			}
	} else if (I->getType()->isIntegerTy(16)) {			} else if (I->getType()->isIntegerTy(16)) {
	CReg = X86::CX;			CReg = X86::CX;
	RC = &X86::GR16RegClass;			RC = &X86::GR16RegClass;
	switch (I->getOpcode()) {			switch (I->getOpcode()) {
	case Instruction::LShr: OpReg = X86::SHR16rCL; break;			case Instruction::LShr: OpReg = X86::SHR16rCL; break;
	case Instruction::AShr: OpReg = X86::SAR16rCL; break;			case Instruction::AShr: OpReg = X86::SAR16rCL; break;
	case Instruction::Shl: OpReg = X86::SHL16rCL; break;			case Instruction::Shl: OpReg = X86::SHL16rCL; break;
	default: return false;			default: return false;
	}			}
	} else if (I->getType()->isIntegerTy(32)) {			} else if (I->getType()->isIntegerTy(32)) {
	CReg = X86::ECX;			CReg = X86::ECX;
	RC = &X86::GR32RegClass;			RC = &X86::GR32RegClass;
	switch (I->getOpcode()) {			switch (I->getOpcode()) {
	case Instruction::LShr: OpReg = X86::SHR32rCL; break;			case Instruction::LShr: OpReg = X86::SHR32rCL; break;
	case Instruction::AShr: OpReg = X86::SAR32rCL; break;			case Instruction::AShr: OpReg = X86::SAR32rCL; break;
	case Instruction::Shl: OpReg = X86::SHL32rCL; break;			case Instruction::Shl: OpReg = X86::SHL32rCL; break;
	default: return false;			default: return false;
	}			}
	} else if (I->getType()->isIntegerTy(64)) {			} else if (I->getType()->isIntegerTy(64)) {
	CReg = X86::RCX;			CReg = X86::RCX;
	RC = &X86::GR64RegClass;			RC = &X86::GR64RegClass;
	switch (I->getOpcode()) {			switch (I->getOpcode()) {
	case Instruction::LShr: OpReg = X86::SHR64rCL; break;			case Instruction::LShr: OpReg = X86::SHR64rCL; break;
	case Instruction::AShr: OpReg = X86::SAR64rCL; break;			case Instruction::AShr: OpReg = X86::SAR64rCL; break;
	case Instruction::Shl: OpReg = X86::SHL64rCL; break;			case Instruction::Shl: OpReg = X86::SHL64rCL; break;
	default: return false;			default: return false;
	}			}
	} else {			} else {
	return false;			return false;
	}			}

	MVT VT;			MVT VT;
	if (!isTypeLegal(I->getType(), VT))			if (!isTypeLegal(I->getType(), VT))
	return false;			return false;

	unsigned Op0Reg = getRegForValue(I->getOperand(0));			unsigned Op0Reg = getRegForValue(I->getOperand(0));
	if (Op0Reg == 0) return false;			if (Op0Reg == 0) return false;

	unsigned Op1Reg = getRegForValue(I->getOperand(1));			unsigned Op1Reg = getRegForValue(I->getOperand(1));
	if (Op1Reg == 0) return false;			if (Op1Reg == 0) return false;
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(TargetOpcode::COPY),			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(TargetOpcode::COPY),
	CReg).addReg(Op1Reg);			CReg).addReg(Op1Reg);

	// The shift instruction uses X86::CL. If we defined a super-register			// The shift instruction uses X86::CL. If we defined a super-register
	// of X86::CL, emit a subreg KILL to precisely describe what we're doing here.			// of X86::CL, emit a subreg KILL to precisely describe what we're doing here.
	if (CReg != X86::CL)			if (CReg != X86::CL)
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::KILL), X86::CL)			TII.get(TargetOpcode::KILL), X86::CL)
	.addReg(CReg, RegState::Kill);			.addReg(CReg, RegState::Kill);

	unsigned ResultReg = createResultReg(RC);			unsigned ResultReg = createResultReg(RC);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(OpReg), ResultReg)			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(OpReg), ResultReg)
	.addReg(Op0Reg);			.addReg(Op0Reg);
	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}

	bool X86FastISel::X86SelectDivRem(const Instruction *I) {			bool X86FastISel::X86SelectDivRem(const Instruction *I) {
	const static unsigned NumTypes = 4; // i8, i16, i32, i64			const static unsigned NumTypes = 4; // i8, i16, i32, i64
	const static unsigned NumOps = 4; // SDiv, SRem, UDiv, URem			const static unsigned NumOps = 4; // SDiv, SRem, UDiv, URem
	const static bool S = true; // IsSigned			const static bool S = true; // IsSigned
	const static bool U = false; // !IsSigned			const static bool U = false; // !IsSigned
	const static unsigned Copy = TargetOpcode::COPY;			const static unsigned Copy = TargetOpcode::COPY;
	// For the X86 DIV/IDIV instruction, in most cases the dividend			// For the X86 DIV/IDIV instruction, in most cases the dividend
	// (numerator) must be in a specific register pair highreg:lowreg,			// (numerator) must be in a specific register pair highreg:lowreg,
	// producing the quotient in lowreg and the remainder in highreg.			// producing the quotient in lowreg and the remainder in highreg.
	// For most data types, to set up the instruction, the dividend is			// For most data types, to set up the instruction, the dividend is
	// copied into lowreg, and lowreg is sign-extended or zero-extended			// copied into lowreg, and lowreg is sign-extended or zero-extended
	// into highreg. The exception is i8, where the dividend is defined			// into highreg. The exception is i8, where the dividend is defined
	// as a single register rather than a register pair, and we			// as a single register rather than a register pair, and we
	// therefore directly sign-extend or zero-extend the dividend into			// therefore directly sign-extend or zero-extend the dividend into
	// lowreg, instead of copying, and ignore the highreg.			// lowreg, instead of copying, and ignore the highreg.
	const static struct DivRemEntry {			const static struct DivRemEntry {
	// The following portion depends only on the data type.			// The following portion depends only on the data type.
	const TargetRegisterClass *RC;			const TargetRegisterClass *RC;
	unsigned LowInReg; // low part of the register pair			unsigned LowInReg; // low part of the register pair
	unsigned HighInReg; // high part of the register pair			unsigned HighInReg; // high part of the register pair
	// The following portion depends on both the data type and the operation.			// The following portion depends on both the data type and the operation.
	struct DivRemResult {			struct DivRemResult {
	unsigned OpDivRem; // The specific DIV/IDIV opcode to use.			unsigned OpDivRem; // The specific DIV/IDIV opcode to use.
	unsigned OpSignExtend; // Opcode for sign-extending lowreg into			unsigned OpSignExtend; // Opcode for sign-extending lowreg into
	// highreg, or copying a zero into highreg.			// highreg, or copying a zero into highreg.
	unsigned OpCopy; // Opcode for copying dividend into lowreg, or			unsigned OpCopy; // Opcode for copying dividend into lowreg, or
	// zero/sign-extending into lowreg for i8.			// zero/sign-extending into lowreg for i8.
	unsigned DivRemResultReg; // Register containing the desired result.			unsigned DivRemResultReg; // Register containing the desired result.
	bool IsOpSigned; // Whether to use signed or unsigned form.			bool IsOpSigned; // Whether to use signed or unsigned form.
	} ResultTable[NumOps];			} ResultTable[NumOps];
	} OpTable[NumTypes] = {			} OpTable[NumTypes] = {
	{ &X86::GR8RegClass, X86::AX, 0, {			{ &X86::GR8RegClass, X86::AX, 0, {
	{ X86::IDIV8r, 0, X86::MOVSX16rr8, X86::AL, S }, // SDiv			{ X86::IDIV8r, 0, X86::MOVSX16rr8, X86::AL, S }, // SDiv
	{ X86::IDIV8r, 0, X86::MOVSX16rr8, X86::AH, S }, // SRem			{ X86::IDIV8r, 0, X86::MOVSX16rr8, X86::AH, S }, // SRem
	{ X86::DIV8r, 0, X86::MOVZX16rr8, X86::AL, U }, // UDiv			{ X86::DIV8r, 0, X86::MOVZX16rr8, X86::AL, U }, // UDiv
	{ X86::DIV8r, 0, X86::MOVZX16rr8, X86::AH, U }, // URem			{ X86::DIV8r, 0, X86::MOVZX16rr8, X86::AH, U }, // URem
	}			}
	}, // i8			}, // i8
	{ &X86::GR16RegClass, X86::AX, X86::DX, {			{ &X86::GR16RegClass, X86::AX, X86::DX, {
	{ X86::IDIV16r, X86::CWD, Copy, X86::AX, S }, // SDiv			{ X86::IDIV16r, X86::CWD, Copy, X86::AX, S }, // SDiv
	{ X86::IDIV16r, X86::CWD, Copy, X86::DX, S }, // SRem			{ X86::IDIV16r, X86::CWD, Copy, X86::DX, S }, // SRem
	{ X86::DIV16r, X86::MOV32r0, Copy, X86::AX, U }, // UDiv			{ X86::DIV16r, X86::MOV32r0, Copy, X86::AX, U }, // UDiv
	{ X86::DIV16r, X86::MOV32r0, Copy, X86::DX, U }, // URem			{ X86::DIV16r, X86::MOV32r0, Copy, X86::DX, U }, // URem
	}			}
	}, // i16			}, // i16
	{ &X86::GR32RegClass, X86::EAX, X86::EDX, {			{ &X86::GR32RegClass, X86::EAX, X86::EDX, {
	{ X86::IDIV32r, X86::CDQ, Copy, X86::EAX, S }, // SDiv			{ X86::IDIV32r, X86::CDQ, Copy, X86::EAX, S }, // SDiv
	{ X86::IDIV32r, X86::CDQ, Copy, X86::EDX, S }, // SRem			{ X86::IDIV32r, X86::CDQ, Copy, X86::EDX, S }, // SRem
	{ X86::DIV32r, X86::MOV32r0, Copy, X86::EAX, U }, // UDiv			{ X86::DIV32r, X86::MOV32r0, Copy, X86::EAX, U }, // UDiv
	{ X86::DIV32r, X86::MOV32r0, Copy, X86::EDX, U }, // URem			{ X86::DIV32r, X86::MOV32r0, Copy, X86::EDX, U }, // URem
	}			}
	}, // i32			}, // i32
	{ &X86::GR64RegClass, X86::RAX, X86::RDX, {			{ &X86::GR64RegClass, X86::RAX, X86::RDX, {
	{ X86::IDIV64r, X86::CQO, Copy, X86::RAX, S }, // SDiv			{ X86::IDIV64r, X86::CQO, Copy, X86::RAX, S }, // SDiv
	{ X86::IDIV64r, X86::CQO, Copy, X86::RDX, S }, // SRem			{ X86::IDIV64r, X86::CQO, Copy, X86::RDX, S }, // SRem
	{ X86::DIV64r, X86::MOV32r0, Copy, X86::RAX, U }, // UDiv			{ X86::DIV64r, X86::MOV32r0, Copy, X86::RAX, U }, // UDiv
	{ X86::DIV64r, X86::MOV32r0, Copy, X86::RDX, U }, // URem			{ X86::DIV64r, X86::MOV32r0, Copy, X86::RDX, U }, // URem
	}			}
	}, // i64			}, // i64
	};			};

	MVT VT;			MVT VT;
	if (!isTypeLegal(I->getType(), VT))			if (!isTypeLegal(I->getType(), VT))
	return false;			return false;

	unsigned TypeIndex, OpIndex;			unsigned TypeIndex, OpIndex;
	switch (VT.SimpleTy) {			switch (VT.SimpleTy) {
	default: return false;			default: return false;
	case MVT::i8: TypeIndex = 0; break;			case MVT::i8: TypeIndex = 0; break;
	case MVT::i16: TypeIndex = 1; break;			case MVT::i16: TypeIndex = 1; break;
	case MVT::i32: TypeIndex = 2; break;			case MVT::i32: TypeIndex = 2; break;
	case MVT::i64: TypeIndex = 3;			case MVT::i64: TypeIndex = 3;
	if (!Subtarget->is64Bit())			if (!Subtarget->is64Bit())
	return false;			return false;
	break;			break;
	}			}

	switch (I->getOpcode()) {			switch (I->getOpcode()) {
	default: llvm_unreachable("Unexpected div/rem opcode");			default: llvm_unreachable("Unexpected div/rem opcode");
	case Instruction::SDiv: OpIndex = 0; break;			case Instruction::SDiv: OpIndex = 0; break;
	case Instruction::SRem: OpIndex = 1; break;			case Instruction::SRem: OpIndex = 1; break;
	case Instruction::UDiv: OpIndex = 2; break;			case Instruction::UDiv: OpIndex = 2; break;
	case Instruction::URem: OpIndex = 3; break;			case Instruction::URem: OpIndex = 3; break;
	}			}

	const DivRemEntry &TypeEntry = OpTable[TypeIndex];			const DivRemEntry &TypeEntry = OpTable[TypeIndex];
	const DivRemEntry::DivRemResult &OpEntry = TypeEntry.ResultTable[OpIndex];			const DivRemEntry::DivRemResult &OpEntry = TypeEntry.ResultTable[OpIndex];
	unsigned Op0Reg = getRegForValue(I->getOperand(0));			unsigned Op0Reg = getRegForValue(I->getOperand(0));
	if (Op0Reg == 0)			if (Op0Reg == 0)
	return false;			return false;
	unsigned Op1Reg = getRegForValue(I->getOperand(1));			unsigned Op1Reg = getRegForValue(I->getOperand(1));
	if (Op1Reg == 0)			if (Op1Reg == 0)
	return false;			return false;

	// Move op0 into low-order input register.			// Move op0 into low-order input register.
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(OpEntry.OpCopy), TypeEntry.LowInReg).addReg(Op0Reg);			TII.get(OpEntry.OpCopy), TypeEntry.LowInReg).addReg(Op0Reg);
	// Zero-extend or sign-extend into high-order input register.			// Zero-extend or sign-extend into high-order input register.
	if (OpEntry.OpSignExtend) {			if (OpEntry.OpSignExtend) {
	if (OpEntry.IsOpSigned)			if (OpEntry.IsOpSigned)
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(OpEntry.OpSignExtend));			TII.get(OpEntry.OpSignExtend));
	else {			else {
	unsigned Zero32 = createResultReg(&X86::GR32RegClass);			unsigned Zero32 = createResultReg(&X86::GR32RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(X86::MOV32r0), Zero32);			TII.get(X86::MOV32r0), Zero32);

	// Copy the zero into the appropriate sub/super/identical physical			// Copy the zero into the appropriate sub/super/identical physical
	// register. Unfortunately the operations needed are not uniform enough			// register. Unfortunately the operations needed are not uniform enough
	// to fit neatly into the table above.			// to fit neatly into the table above.
	if (VT.SimpleTy == MVT::i16) {			if (VT.SimpleTy == MVT::i16) {
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Copy), TypeEntry.HighInReg)			TII.get(Copy), TypeEntry.HighInReg)
	.addReg(Zero32, 0, X86::sub_16bit);			.addReg(Zero32, 0, X86::sub_16bit);
	} else if (VT.SimpleTy == MVT::i32) {			} else if (VT.SimpleTy == MVT::i32) {
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Copy), TypeEntry.HighInReg)			TII.get(Copy), TypeEntry.HighInReg)
	.addReg(Zero32);			.addReg(Zero32);
	} else if (VT.SimpleTy == MVT::i64) {			} else if (VT.SimpleTy == MVT::i64) {
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::SUBREG_TO_REG), TypeEntry.HighInReg)			TII.get(TargetOpcode::SUBREG_TO_REG), TypeEntry.HighInReg)
	.addImm(0).addReg(Zero32).addImm(X86::sub_32bit);			.addImm(0).addReg(Zero32).addImm(X86::sub_32bit);
	}			}
	}			}
	}			}
	// Generate the DIV/IDIV instruction.			// Generate the DIV/IDIV instruction.
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(OpEntry.OpDivRem)).addReg(Op1Reg);			TII.get(OpEntry.OpDivRem)).addReg(Op1Reg);
	// For i8 remainder, we can't reference AH directly, as we'll end			// For i8 remainder, we can't reference AH directly, as we'll end
	// up with bogus copies like %R9B = COPY %AH. Reference AX			// up with bogus copies like %R9B = COPY %AH. Reference AX
	// instead to prevent AH references in a REX instruction.			// instead to prevent AH references in a REX instruction.
	//			//
	// The current assumption of the fast register allocator is that isel			// The current assumption of the fast register allocator is that isel
	// won't generate explicit references to the GPR8_NOREX registers. If			// won't generate explicit references to the GPR8_NOREX registers. If
	// the allocator and/or the backend get enhanced to be more robust in			// the allocator and/or the backend get enhanced to be more robust in
	// that regard, this can be, and should be, removed.			// that regard, this can be, and should be, removed.
	unsigned ResultReg = 0;			unsigned ResultReg = 0;
	if ((I->getOpcode() == Instruction::SRem ||			if ((I->getOpcode() == Instruction::SRem ||
	I->getOpcode() == Instruction::URem) &&			I->getOpcode() == Instruction::URem) &&
	OpEntry.DivRemResultReg == X86::AH && Subtarget->is64Bit()) {			OpEntry.DivRemResultReg == X86::AH && Subtarget->is64Bit()) {
	unsigned SourceSuperReg = createResultReg(&X86::GR16RegClass);			unsigned SourceSuperReg = createResultReg(&X86::GR16RegClass);
	unsigned ResultSuperReg = createResultReg(&X86::GR16RegClass);			unsigned ResultSuperReg = createResultReg(&X86::GR16RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Copy), SourceSuperReg).addReg(X86::AX);			TII.get(Copy), SourceSuperReg).addReg(X86::AX);

	// Shift AX right by 8 bits instead of using AH.			// Shift AX right by 8 bits instead of using AH.
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::SHR16ri),			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::SHR16ri),
	ResultSuperReg).addReg(SourceSuperReg).addImm(8);			ResultSuperReg).addReg(SourceSuperReg).addImm(8);

	// Now reference the 8-bit subreg of the result.			// Now reference the 8-bit subreg of the result.
	ResultReg = fastEmitInst_extractsubreg(MVT::i8, ResultSuperReg,			ResultReg = fastEmitInst_extractsubreg(MVT::i8, ResultSuperReg,
	/*Kill=*/true, X86::sub_8bit);			/*Kill=*/true, X86::sub_8bit);
	}			}
	// Copy the result out of the physreg if we haven't already.			// Copy the result out of the physreg if we haven't already.
	if (!ResultReg) {			if (!ResultReg) {
	ResultReg = createResultReg(TypeEntry.RC);			ResultReg = createResultReg(TypeEntry.RC);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Copy), ResultReg)			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Copy), ResultReg)
	.addReg(OpEntry.DivRemResultReg);			.addReg(OpEntry.DivRemResultReg);
	}			}
	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);

	return true;			return true;
	}			}

	/// \brief Emit a conditional move instruction (if they are supported) to lower			/// \brief Emit a conditional move instruction (if they are supported) to lower
	/// the select.			/// the select.
	bool X86FastISel::X86FastEmitCMoveSelect(MVT RetVT, const Instruction *I) {			bool X86FastISel::X86FastEmitCMoveSelect(MVT RetVT, const Instruction *I) {
	// Check if the subtarget supports these instructions.			// Check if the subtarget supports these instructions.
	if (!Subtarget->hasCMov())			if (!Subtarget->hasCMov())
	return false;			return false;

	// FIXME: Add support for i8.			// FIXME: Add support for i8.
	if (RetVT < MVT::i16 || RetVT > MVT::i64)			if (RetVT < MVT::i16 || RetVT > MVT::i64)
	return false;			return false;

	const Value *Cond = I->getOperand(0);			const Value *Cond = I->getOperand(0);
	const TargetRegisterClass *RC = TLI.getRegClassFor(RetVT);			const TargetRegisterClass *RC = TLI.getRegClassFor(RetVT);
	bool NeedTest = true;			bool NeedTest = true;
	X86::CondCode CC = X86::COND_NE;			X86::CondCode CC = X86::COND_NE;

	// Optimize conditions coming from a compare if both instructions are in the			// Optimize conditions coming from a compare if both instructions are in the
	// same basic block (values defined in other basic blocks may not have			// same basic block (values defined in other basic blocks may not have
	// initialized registers).			// initialized registers).
	const auto *CI = dyn_cast<CmpInst>(Cond);			const auto *CI = dyn_cast<CmpInst>(Cond);
	if (CI && (CI->getParent() == I->getParent())) {			if (CI && (CI->getParent() == I->getParent())) {
	CmpInst::Predicate Predicate = optimizeCmpPredicate(CI);			CmpInst::Predicate Predicate = optimizeCmpPredicate(CI);

	// FCMP_OEQ and FCMP_UNE cannot be checked with a single instruction.			// FCMP_OEQ and FCMP_UNE cannot be checked with a single instruction.
	static unsigned SETFOpcTable[2][3] = {			static unsigned SETFOpcTable[2][3] = {
	{ X86::SETNPr, X86::SETEr , X86::TEST8rr },			{ X86::SETNPr, X86::SETEr , X86::TEST8rr },
	{ X86::SETPr, X86::SETNEr, X86::OR8rr }			{ X86::SETPr, X86::SETNEr, X86::OR8rr }
	};			};
	unsigned *SETFOpc = nullptr;			unsigned *SETFOpc = nullptr;
	switch (Predicate) {			switch (Predicate) {
	default: break;			default: break;
	case CmpInst::FCMP_OEQ:			case CmpInst::FCMP_OEQ:
	SETFOpc = &SETFOpcTable[0][0];			SETFOpc = &SETFOpcTable[0][0];
	Predicate = CmpInst::ICMP_NE;			Predicate = CmpInst::ICMP_NE;
	break;			break;
	case CmpInst::FCMP_UNE:			case CmpInst::FCMP_UNE:
	SETFOpc = &SETFOpcTable[1][0];			SETFOpc = &SETFOpcTable[1][0];
	Predicate = CmpInst::ICMP_NE;			Predicate = CmpInst::ICMP_NE;
	break;			break;
	}			}

	bool NeedSwap;			bool NeedSwap;
	std::tie(CC, NeedSwap) = getX86ConditionCode(Predicate);			std::tie(CC, NeedSwap) = getX86ConditionCode(Predicate);
	assert(CC <= X86::LAST_VALID_COND && "Unexpected condition code.");			assert(CC <= X86::LAST_VALID_COND && "Unexpected condition code.");

	const Value *CmpLHS = CI->getOperand(0);			const Value *CmpLHS = CI->getOperand(0);
	const Value *CmpRHS = CI->getOperand(1);			const Value *CmpRHS = CI->getOperand(1);
	if (NeedSwap)			if (NeedSwap)
	std::swap(CmpLHS, CmpRHS);			std::swap(CmpLHS, CmpRHS);

	EVT CmpVT = TLI.getValueType(CmpLHS->getType());			EVT CmpVT = TLI.getValueType(CmpLHS->getType());
	// Emit a compare of the LHS and RHS, setting the flags.			// Emit a compare of the LHS and RHS, setting the flags.
	if (!X86FastEmitCompare(CmpLHS, CmpRHS, CmpVT, CI->getDebugLoc()))			if (!X86FastEmitCompare(CmpLHS, CmpRHS, CmpVT, CI->getDebugLoc()))
	return false;			return false;

	if (SETFOpc) {			if (SETFOpc) {
	unsigned FlagReg1 = createResultReg(&X86::GR8RegClass);			unsigned FlagReg1 = createResultReg(&X86::GR8RegClass);
	unsigned FlagReg2 = createResultReg(&X86::GR8RegClass);			unsigned FlagReg2 = createResultReg(&X86::GR8RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(SETFOpc[0]),			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(SETFOpc[0]),
	FlagReg1);			FlagReg1);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(SETFOpc[1]),			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(SETFOpc[1]),
	FlagReg2);			FlagReg2);
	auto const &II = TII.get(SETFOpc[2]);			auto const &II = TII.get(SETFOpc[2]);
	if (II.getNumDefs()) {			if (II.getNumDefs()) {
	unsigned TmpReg = createResultReg(&X86::GR8RegClass);			unsigned TmpReg = createResultReg(&X86::GR8RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, II, TmpReg)			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, II, TmpReg)
	.addReg(FlagReg2).addReg(FlagReg1);			.addReg(FlagReg2).addReg(FlagReg1);
	} else {			} else {
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, II)			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, II)
	.addReg(FlagReg2).addReg(FlagReg1);			.addReg(FlagReg2).addReg(FlagReg1);
	}			}
	}			}
	NeedTest = false;			NeedTest = false;
	} else if (foldX86XALUIntrinsic(CC, I, Cond)) {			} else if (foldX86XALUIntrinsic(CC, I, Cond)) {
	// Fake request the condition, otherwise the intrinsic might be completely			// Fake request the condition, otherwise the intrinsic might be completely
	// optimized away.			// optimized away.
	unsigned TmpReg = getRegForValue(Cond);			unsigned TmpReg = getRegForValue(Cond);
	if (TmpReg == 0)			if (TmpReg == 0)
	return false;			return false;

	NeedTest = false;			NeedTest = false;
	}			}

	if (NeedTest) {			if (NeedTest) {
	// Selects operate on i1, but CondReg is 8 bits wide and may contain			// Selects operate on i1, but CondReg is 8 bits wide and may contain
	// garbage. Only the least significant bit is supposed to be			// garbage. Only the least significant bit is supposed to be
	// accurate. If we read more than the lsb, we may see non-zero values			// accurate. If we read more than the lsb, we may see non-zero values
	// even though the lsb is zero. Therefore, we have to truncate CondReg to i1 for			// even though the lsb is zero. Therefore, we have to truncate CondReg to i1 for
	// the select. This is achieved by performing TEST against 1.			// the select. This is achieved by performing TEST against 1.
	unsigned CondReg = getRegForValue(Cond);			unsigned CondReg = getRegForValue(Cond);
	if (CondReg == 0)			if (CondReg == 0)
	return false;			return false;
	bool CondIsKill = hasTrivialKill(Cond);			bool CondIsKill = hasTrivialKill(Cond);

	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::TEST8ri))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::TEST8ri))
	.addReg(CondReg, getKillRegState(CondIsKill)).addImm(1);			.addReg(CondReg, getKillRegState(CondIsKill)).addImm(1);
	}			}

	const Value *LHS = I->getOperand(1);			const Value *LHS = I->getOperand(1);
	const Value *RHS = I->getOperand(2);			const Value *RHS = I->getOperand(2);

	unsigned RHSReg = getRegForValue(RHS);			unsigned RHSReg = getRegForValue(RHS);
	bool RHSIsKill = hasTrivialKill(RHS);			bool RHSIsKill = hasTrivialKill(RHS);

	unsigned LHSReg = getRegForValue(LHS);			unsigned LHSReg = getRegForValue(LHS);
	bool LHSIsKill = hasTrivialKill(LHS);			bool LHSIsKill = hasTrivialKill(LHS);

	if (!LHSReg || !RHSReg)			if (!LHSReg || !RHSReg)
	return false;			return false;

	unsigned Opc = X86::getCMovFromCond(CC, RC->getSize());			unsigned Opc = X86::getCMovFromCond(CC, RC->getSize());
	unsigned ResultReg = fastEmitInst_rr(Opc, RC, RHSReg, RHSIsKill,			unsigned ResultReg = fastEmitInst_rr(Opc, RC, RHSReg, RHSIsKill,
	LHSReg, LHSIsKill);			LHSReg, LHSIsKill);
	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}
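
Inline note: the `TEST8ri ... 1` step in `X86FastEmitCMoveSelect` can be modelled in plain C++. This is a minimal sketch, not LLVM code; `condFromByte` is a hypothetical name. It only illustrates why the upper seven bits of the 8-bit condition register must be ignored.

```cpp
#include <cstdint>

// Hypothetical model (not part of LLVM): an i1 condition lives in an 8-bit
// register whose upper seven bits may hold garbage. Masking with 1 plays the
// role of "TEST8ri CondReg, 1", so only the least significant bit decides.
bool condFromByte(uint8_t CondReg) {
  return (CondReg & 1u) != 0;
}
```

With this model, a register holding `0xFE` is treated as false even though the byte as a whole is non-zero.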

	/// \brief Emit SSE instructions to lower the select.			/// \brief Emit SSE instructions to lower the select.
	///			///
	/// Try to use SSE1/SSE2 instructions to simulate a select without branches.			/// Try to use SSE1/SSE2 instructions to simulate a select without branches.
	/// This lowers fp selects into a CMP/AND/ANDN/OR sequence when the necessary			/// This lowers fp selects into a CMP/AND/ANDN/OR sequence when the necessary
	/// SSE instructions are available.			/// SSE instructions are available.
	bool X86FastISel::X86FastEmitSSESelect(MVT RetVT, const Instruction *I) {			bool X86FastISel::X86FastEmitSSESelect(MVT RetVT, const Instruction *I) {
	// Optimize conditions coming from a compare if both instructions are in the			// Optimize conditions coming from a compare if both instructions are in the
	// same basic block (values defined in other basic blocks may not have			// same basic block (values defined in other basic blocks may not have
	// initialized registers).			// initialized registers).
	const auto *CI = dyn_cast<FCmpInst>(I->getOperand(0));			const auto *CI = dyn_cast<FCmpInst>(I->getOperand(0));
	if (!CI || (CI->getParent() != I->getParent()))			if (!CI || (CI->getParent() != I->getParent()))
	return false;			return false;

	if (I->getType() != CI->getOperand(0)->getType() ||			if (I->getType() != CI->getOperand(0)->getType() ||
	!((Subtarget->hasSSE1() && RetVT == MVT::f32) ||			!((Subtarget->hasSSE1() && RetVT == MVT::f32) ||
	(Subtarget->hasSSE2() && RetVT == MVT::f64)))			(Subtarget->hasSSE2() && RetVT == MVT::f64)))
	return false;			return false;

	const Value *CmpLHS = CI->getOperand(0);			const Value *CmpLHS = CI->getOperand(0);
	const Value *CmpRHS = CI->getOperand(1);			const Value *CmpRHS = CI->getOperand(1);
	CmpInst::Predicate Predicate = optimizeCmpPredicate(CI);			CmpInst::Predicate Predicate = optimizeCmpPredicate(CI);

	// The optimizer might have replaced fcmp oeq %x, %x with fcmp ord %x, 0.0.			// The optimizer might have replaced fcmp oeq %x, %x with fcmp ord %x, 0.0.
	// We don't have to materialize a zero constant for this case and can just use			// We don't have to materialize a zero constant for this case and can just use
	// %x again on the RHS.			// %x again on the RHS.
	if (Predicate == CmpInst::FCMP_ORD || Predicate == CmpInst::FCMP_UNO) {			if (Predicate == CmpInst::FCMP_ORD || Predicate == CmpInst::FCMP_UNO) {
	const auto *CmpRHSC = dyn_cast<ConstantFP>(CmpRHS);			const auto *CmpRHSC = dyn_cast<ConstantFP>(CmpRHS);
	if (CmpRHSC && CmpRHSC->isNullValue())			if (CmpRHSC && CmpRHSC->isNullValue())
	CmpRHS = CmpLHS;			CmpRHS = CmpLHS;
	}			}

	unsigned CC;			unsigned CC;
	bool NeedSwap;			bool NeedSwap;
	std::tie(CC, NeedSwap) = getX86SSEConditionCode(Predicate);			std::tie(CC, NeedSwap) = getX86SSEConditionCode(Predicate);
	if (CC > 7)			if (CC > 7)
	return false;			return false;

	if (NeedSwap)			if (NeedSwap)
	std::swap(CmpLHS, CmpRHS);			std::swap(CmpLHS, CmpRHS);

	static unsigned OpcTable[2][2][4] = {			static unsigned OpcTable[2][2][4] = {
	{ { X86::CMPSSrr, X86::FsANDPSrr, X86::FsANDNPSrr, X86::FsORPSrr },			{ { X86::CMPSSrr, X86::FsANDPSrr, X86::FsANDNPSrr, X86::FsORPSrr },
	{ X86::VCMPSSrr, X86::VFsANDPSrr, X86::VFsANDNPSrr, X86::VFsORPSrr } },			{ X86::VCMPSSrr, X86::VFsANDPSrr, X86::VFsANDNPSrr, X86::VFsORPSrr } },
	{ { X86::CMPSDrr, X86::FsANDPDrr, X86::FsANDNPDrr, X86::FsORPDrr },			{ { X86::CMPSDrr, X86::FsANDPDrr, X86::FsANDNPDrr, X86::FsORPDrr },
	{ X86::VCMPSDrr, X86::VFsANDPDrr, X86::VFsANDNPDrr, X86::VFsORPDrr } }			{ X86::VCMPSDrr, X86::VFsANDPDrr, X86::VFsANDNPDrr, X86::VFsORPDrr } }
	};			};

	bool HasAVX = Subtarget->hasAVX();			bool HasAVX = Subtarget->hasAVX();
	unsigned *Opc = nullptr;			unsigned *Opc = nullptr;
	switch (RetVT.SimpleTy) {			switch (RetVT.SimpleTy) {
	default: return false;			default: return false;
	case MVT::f32: Opc = &OpcTable[0][HasAVX][0]; break;			case MVT::f32: Opc = &OpcTable[0][HasAVX][0]; break;
	case MVT::f64: Opc = &OpcTable[1][HasAVX][0]; break;			case MVT::f64: Opc = &OpcTable[1][HasAVX][0]; break;
	}			}

	const Value *LHS = I->getOperand(1);			const Value *LHS = I->getOperand(1);
	const Value *RHS = I->getOperand(2);			const Value *RHS = I->getOperand(2);

	unsigned LHSReg = getRegForValue(LHS);			unsigned LHSReg = getRegForValue(LHS);
	bool LHSIsKill = hasTrivialKill(LHS);			bool LHSIsKill = hasTrivialKill(LHS);

	unsigned RHSReg = getRegForValue(RHS);			unsigned RHSReg = getRegForValue(RHS);
	bool RHSIsKill = hasTrivialKill(RHS);			bool RHSIsKill = hasTrivialKill(RHS);

	unsigned CmpLHSReg = getRegForValue(CmpLHS);			unsigned CmpLHSReg = getRegForValue(CmpLHS);
	bool CmpLHSIsKill = hasTrivialKill(CmpLHS);			bool CmpLHSIsKill = hasTrivialKill(CmpLHS);

	unsigned CmpRHSReg = getRegForValue(CmpRHS);			unsigned CmpRHSReg = getRegForValue(CmpRHS);
	bool CmpRHSIsKill = hasTrivialKill(CmpRHS);			bool CmpRHSIsKill = hasTrivialKill(CmpRHS);

	if (!LHSReg || !RHSReg || !CmpLHSReg || !CmpRHSReg)			if (!LHSReg || !RHSReg || !CmpLHSReg || !CmpRHSReg)
	return false;			return false;

	const TargetRegisterClass *RC = TLI.getRegClassFor(RetVT);			const TargetRegisterClass *RC = TLI.getRegClassFor(RetVT);
	unsigned CmpReg = fastEmitInst_rri(Opc[0], RC, CmpLHSReg, CmpLHSIsKill,			unsigned CmpReg = fastEmitInst_rri(Opc[0], RC, CmpLHSReg, CmpLHSIsKill,
	CmpRHSReg, CmpRHSIsKill, CC);			CmpRHSReg, CmpRHSIsKill, CC);
	unsigned AndReg = fastEmitInst_rr(Opc[1], RC, CmpReg, /*IsKill=*/false,			unsigned AndReg = fastEmitInst_rr(Opc[1], RC, CmpReg, /*IsKill=*/false,
	LHSReg, LHSIsKill);			LHSReg, LHSIsKill);
	unsigned AndNReg = fastEmitInst_rr(Opc[2], RC, CmpReg, /*IsKill=*/true,			unsigned AndNReg = fastEmitInst_rr(Opc[2], RC, CmpReg, /*IsKill=*/true,
	RHSReg, RHSIsKill);			RHSReg, RHSIsKill);
	unsigned ResultReg = fastEmitInst_rr(Opc[3], RC, AndNReg, /*IsKill=*/true,			unsigned ResultReg = fastEmitInst_rr(Opc[3], RC, AndNReg, /*IsKill=*/true,
	AndReg, /*IsKill=*/true);			AndReg, /*IsKill=*/true);
	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}
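
Inline note: the CMP/AND/ANDN/OR sequence emitted by `X86FastEmitSSESelect` is a branch-free bitmask select. The sketch below is a scalar model only, assuming a 32-bit `float`; `bitsOf`, `floatOf` and `selectViaMask` are hypothetical names and do not exist in LLVM.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers (not LLVM code): reinterpret a float as raw bits and back.
static uint32_t bitsOf(float F) {
  uint32_t B;
  std::memcpy(&B, &F, sizeof B);
  return B;
}

static float floatOf(uint32_t B) {
  float F;
  std::memcpy(&F, &B, sizeof F);
  return F;
}

// Scalar model of the select: the compare produces an all-ones or all-zeros
// mask, which then picks one operand bit by bit without a branch.
float selectViaMask(bool CmpResult, float TrueVal, float FalseVal) {
  uint32_t Mask = CmpResult ? 0xFFFFFFFFu : 0u;   // CMPSS result
  uint32_t AndBits = Mask & bitsOf(TrueVal);      // AND:  TrueVal where mask is set
  uint32_t AndNBits = ~Mask & bitsOf(FalseVal);   // ANDN: FalseVal where mask is clear
  return floatOf(AndBits | AndNBits);             // OR:   merge the two halves
}
```

Because exactly one of the two masked values is non-zero in every bit position, the final OR reproduces the selected operand unchanged.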

	bool X86FastISel::X86FastEmitPseudoSelect(MVT RetVT, const Instruction *I) {			bool X86FastISel::X86FastEmitPseudoSelect(MVT RetVT, const Instruction *I) {
	// These are pseudo CMOV instructions and will be later expanded into control-			// These are pseudo CMOV instructions and will be later expanded into control-
	// flow.			// flow.
	unsigned Opc;			unsigned Opc;
	switch (RetVT.SimpleTy) {			switch (RetVT.SimpleTy) {
	default: return false;			default: return false;
	case MVT::i8: Opc = X86::CMOV_GR8; break;			case MVT::i8: Opc = X86::CMOV_GR8; break;
	case MVT::i16: Opc = X86::CMOV_GR16; break;			case MVT::i16: Opc = X86::CMOV_GR16; break;
	case MVT::i32: Opc = X86::CMOV_GR32; break;			case MVT::i32: Opc = X86::CMOV_GR32; break;
	case MVT::f32: Opc = X86::CMOV_FR32; break;			case MVT::f32: Opc = X86::CMOV_FR32; break;
	case MVT::f64: Opc = X86::CMOV_FR64; break;			case MVT::f64: Opc = X86::CMOV_FR64; break;
	}			}

	const Value *Cond = I->getOperand(0);			const Value *Cond = I->getOperand(0);
	X86::CondCode CC = X86::COND_NE;			X86::CondCode CC = X86::COND_NE;

	// Optimize conditions coming from a compare if both instructions are in the			// Optimize conditions coming from a compare if both instructions are in the
	// same basic block (values defined in other basic blocks may not have			// same basic block (values defined in other basic blocks may not have
	// initialized registers).			// initialized registers).
	const auto *CI = dyn_cast<CmpInst>(Cond);			const auto *CI = dyn_cast<CmpInst>(Cond);
	if (CI && (CI->getParent() == I->getParent())) {			if (CI && (CI->getParent() == I->getParent())) {
	bool NeedSwap;			bool NeedSwap;
	std::tie(CC, NeedSwap) = getX86ConditionCode(CI->getPredicate());			std::tie(CC, NeedSwap) = getX86ConditionCode(CI->getPredicate());
	if (CC > X86::LAST_VALID_COND)			if (CC > X86::LAST_VALID_COND)
	return false;			return false;

	const Value *CmpLHS = CI->getOperand(0);			const Value *CmpLHS = CI->getOperand(0);
	const Value *CmpRHS = CI->getOperand(1);			const Value *CmpRHS = CI->getOperand(1);

	if (NeedSwap)			if (NeedSwap)
	std::swap(CmpLHS, CmpRHS);			std::swap(CmpLHS, CmpRHS);

	EVT CmpVT = TLI.getValueType(CmpLHS->getType());			EVT CmpVT = TLI.getValueType(CmpLHS->getType());
	if (!X86FastEmitCompare(CmpLHS, CmpRHS, CmpVT, CI->getDebugLoc()))			if (!X86FastEmitCompare(CmpLHS, CmpRHS, CmpVT, CI->getDebugLoc()))
	return false;			return false;
	} else {			} else {
	unsigned CondReg = getRegForValue(Cond);			unsigned CondReg = getRegForValue(Cond);
	if (CondReg == 0)			if (CondReg == 0)
	return false;			return false;
	bool CondIsKill = hasTrivialKill(Cond);			bool CondIsKill = hasTrivialKill(Cond);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::TEST8ri))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::TEST8ri))
	.addReg(CondReg, getKillRegState(CondIsKill)).addImm(1);			.addReg(CondReg, getKillRegState(CondIsKill)).addImm(1);
	}			}

	const Value *LHS = I->getOperand(1);			const Value *LHS = I->getOperand(1);
	const Value *RHS = I->getOperand(2);			const Value *RHS = I->getOperand(2);

	unsigned LHSReg = getRegForValue(LHS);			unsigned LHSReg = getRegForValue(LHS);
	bool LHSIsKill = hasTrivialKill(LHS);			bool LHSIsKill = hasTrivialKill(LHS);

	unsigned RHSReg = getRegForValue(RHS);			unsigned RHSReg = getRegForValue(RHS);
	bool RHSIsKill = hasTrivialKill(RHS);			bool RHSIsKill = hasTrivialKill(RHS);

	if (!LHSReg || !RHSReg)			if (!LHSReg || !RHSReg)
	return false;			return false;

	const TargetRegisterClass *RC = TLI.getRegClassFor(RetVT);			const TargetRegisterClass *RC = TLI.getRegClassFor(RetVT);

	unsigned ResultReg =			unsigned ResultReg =
	fastEmitInst_rri(Opc, RC, RHSReg, RHSIsKill, LHSReg, LHSIsKill, CC);			fastEmitInst_rri(Opc, RC, RHSReg, RHSIsKill, LHSReg, LHSIsKill, CC);
	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}

	bool X86FastISel::X86SelectSelect(const Instruction *I) {			bool X86FastISel::X86SelectSelect(const Instruction *I) {
	MVT RetVT;			MVT RetVT;
	if (!isTypeLegal(I->getType(), RetVT))			if (!isTypeLegal(I->getType(), RetVT))
	return false;			return false;

	// Check if we can fold the select.			// Check if we can fold the select.
	if (const auto *CI = dyn_cast<CmpInst>(I->getOperand(0))) {			if (const auto *CI = dyn_cast<CmpInst>(I->getOperand(0))) {
	CmpInst::Predicate Predicate = optimizeCmpPredicate(CI);			CmpInst::Predicate Predicate = optimizeCmpPredicate(CI);
	const Value *Opnd = nullptr;			const Value *Opnd = nullptr;
	switch (Predicate) {			switch (Predicate) {
	default: break;			default: break;
	case CmpInst::FCMP_FALSE: Opnd = I->getOperand(2); break;			case CmpInst::FCMP_FALSE: Opnd = I->getOperand(2); break;
	case CmpInst::FCMP_TRUE: Opnd = I->getOperand(1); break;			case CmpInst::FCMP_TRUE: Opnd = I->getOperand(1); break;
	}			}
	// No need for a select anymore - this is an unconditional move.			// No need for a select anymore - this is an unconditional move.
	if (Opnd) {			if (Opnd) {
	unsigned OpReg = getRegForValue(Opnd);			unsigned OpReg = getRegForValue(Opnd);
	if (OpReg == 0)			if (OpReg == 0)
	return false;			return false;
	bool OpIsKill = hasTrivialKill(Opnd);			bool OpIsKill = hasTrivialKill(Opnd);
	const TargetRegisterClass *RC = TLI.getRegClassFor(RetVT);			const TargetRegisterClass *RC = TLI.getRegClassFor(RetVT);
	unsigned ResultReg = createResultReg(RC);			unsigned ResultReg = createResultReg(RC);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::COPY), ResultReg)			TII.get(TargetOpcode::COPY), ResultReg)
	.addReg(OpReg, getKillRegState(OpIsKill));			.addReg(OpReg, getKillRegState(OpIsKill));
	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}
	}			}

	// First try to use real conditional move instructions.			// First try to use real conditional move instructions.
	if (X86FastEmitCMoveSelect(RetVT, I))			if (X86FastEmitCMoveSelect(RetVT, I))
	return true;			return true;

	// Try to use a sequence of SSE instructions to simulate a conditional move.			// Try to use a sequence of SSE instructions to simulate a conditional move.
	if (X86FastEmitSSESelect(RetVT, I))			if (X86FastEmitSSESelect(RetVT, I))
	return true;			return true;

	// Fall-back to pseudo conditional move instructions, which will be later			// Fall-back to pseudo conditional move instructions, which will be later
	// converted to control-flow.			// converted to control-flow.
	if (X86FastEmitPseudoSelect(RetVT, I))			if (X86FastEmitPseudoSelect(RetVT, I))
	return true;			return true;

	return false;			return false;
	}			}

	bool X86FastISel::X86SelectFPExt(const Instruction *I) {			bool X86FastISel::X86SelectFPExt(const Instruction *I) {
	// fpext from float to double.			// fpext from float to double.
	if (X86ScalarSSEf64 &&			if (X86ScalarSSEf64 &&
	I->getType()->isDoubleTy()) {			I->getType()->isDoubleTy()) {
	const Value *V = I->getOperand(0);			const Value *V = I->getOperand(0);
	if (V->getType()->isFloatTy()) {			if (V->getType()->isFloatTy()) {
	unsigned OpReg = getRegForValue(V);			unsigned OpReg = getRegForValue(V);
	if (OpReg == 0) return false;			if (OpReg == 0) return false;
	unsigned ResultReg = createResultReg(&X86::FR64RegClass);			unsigned ResultReg = createResultReg(&X86::FR64RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(X86::CVTSS2SDrr), ResultReg)			TII.get(X86::CVTSS2SDrr), ResultReg)
	.addReg(OpReg);			.addReg(OpReg);
	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}
	}			}

	return false;			return false;
	}			}

	bool X86FastISel::X86SelectFPTrunc(const Instruction *I) {			bool X86FastISel::X86SelectFPTrunc(const Instruction *I) {
	if (X86ScalarSSEf64) {			if (X86ScalarSSEf64) {
	if (I->getType()->isFloatTy()) {			if (I->getType()->isFloatTy()) {
	const Value *V = I->getOperand(0);			const Value *V = I->getOperand(0);
	if (V->getType()->isDoubleTy()) {			if (V->getType()->isDoubleTy()) {
	unsigned OpReg = getRegForValue(V);			unsigned OpReg = getRegForValue(V);
	if (OpReg == 0) return false;			if (OpReg == 0) return false;
	unsigned ResultReg = createResultReg(&X86::FR32RegClass);			unsigned ResultReg = createResultReg(&X86::FR32RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(X86::CVTSD2SSrr), ResultReg)			TII.get(X86::CVTSD2SSrr), ResultReg)
	.addReg(OpReg);			.addReg(OpReg);
	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}
	}			}
	}			}

	return false;			return false;
	}			}

	bool X86FastISel::X86SelectTrunc(const Instruction *I) {			bool X86FastISel::X86SelectTrunc(const Instruction *I) {
	EVT SrcVT = TLI.getValueType(I->getOperand(0)->getType());			EVT SrcVT = TLI.getValueType(I->getOperand(0)->getType());
	EVT DstVT = TLI.getValueType(I->getType());			EVT DstVT = TLI.getValueType(I->getType());

	// This code only handles truncation to byte.			// This code only handles truncation to byte.
	if (DstVT != MVT::i8 && DstVT != MVT::i1)			if (DstVT != MVT::i8 && DstVT != MVT::i1)
	return false;			return false;
	if (!TLI.isTypeLegal(SrcVT))			if (!TLI.isTypeLegal(SrcVT))
	return false;			return false;

	unsigned InputReg = getRegForValue(I->getOperand(0));			unsigned InputReg = getRegForValue(I->getOperand(0));
	if (!InputReg)			if (!InputReg)
	// Unhandled operand. Halt "fast" selection and bail.			// Unhandled operand. Halt "fast" selection and bail.
	return false;			return false;

	if (SrcVT == MVT::i8) {			if (SrcVT == MVT::i8) {
	// Truncate from i8 to i1; no code needed.			// Truncate from i8 to i1; no code needed.
	updateValueMap(I, InputReg);			updateValueMap(I, InputReg);
	return true;			return true;
	}			}

	if (!Subtarget->is64Bit()) {			if (!Subtarget->is64Bit()) {
	// If we're on x86-32, we can't extract an i8 from a general register.			// If we're on x86-32, we can't extract an i8 from a general register.
	// First issue a copy to GR16_ABCD or GR32_ABCD.			// First issue a copy to GR16_ABCD or GR32_ABCD.
	const TargetRegisterClass *CopyRC =			const TargetRegisterClass *CopyRC =
	(SrcVT == MVT::i16) ? &X86::GR16_ABCDRegClass : &X86::GR32_ABCDRegClass;			(SrcVT == MVT::i16) ? &X86::GR16_ABCDRegClass : &X86::GR32_ABCDRegClass;
	unsigned CopyReg = createResultReg(CopyRC);			unsigned CopyReg = createResultReg(CopyRC);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::COPY), CopyReg).addReg(InputReg);			TII.get(TargetOpcode::COPY), CopyReg).addReg(InputReg);
	InputReg = CopyReg;			InputReg = CopyReg;
	}			}

	// Issue an extract_subreg.			// Issue an extract_subreg.
	unsigned ResultReg = fastEmitInst_extractsubreg(MVT::i8,			unsigned ResultReg = fastEmitInst_extractsubreg(MVT::i8,
	InputReg, /*Kill=*/true,			InputReg, /*Kill=*/true,
	X86::sub_8bit);			X86::sub_8bit);
	if (!ResultReg)			if (!ResultReg)
	return false;			return false;

	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}

	bool X86FastISel::IsMemcpySmall(uint64_t Len) {			bool X86FastISel::IsMemcpySmall(uint64_t Len) {
	return Len <= (Subtarget->is64Bit() ? 32 : 16);			return Len <= (Subtarget->is64Bit() ? 32 : 16);
	}			}

	bool X86FastISel::TryEmitSmallMemcpy(X86AddressMode DestAM,			bool X86FastISel::TryEmitSmallMemcpy(X86AddressMode DestAM,
	X86AddressMode SrcAM, uint64_t Len) {			X86AddressMode SrcAM, uint64_t Len) {

	// Make sure we don't bloat code by inlining very large memcpy's.			// Make sure we don't bloat code by inlining very large memcpy's.
	if (!IsMemcpySmall(Len))			if (!IsMemcpySmall(Len))
	return false;			return false;

	bool i64Legal = Subtarget->is64Bit();			bool i64Legal = Subtarget->is64Bit();

	// We don't care about alignment here since we just emit integer accesses.			// We don't care about alignment here since we just emit integer accesses.
	while (Len) {			while (Len) {
	MVT VT;			MVT VT;
	if (Len >= 8 && i64Legal)			if (Len >= 8 && i64Legal)
	VT = MVT::i64;			VT = MVT::i64;
	else if (Len >= 4)			else if (Len >= 4)
	VT = MVT::i32;			VT = MVT::i32;
	else if (Len >= 2)			else if (Len >= 2)
	VT = MVT::i16;			VT = MVT::i16;
	else			else
	VT = MVT::i8;			VT = MVT::i8;

	unsigned Reg;			unsigned Reg;
	bool RV = X86FastEmitLoad(VT, SrcAM, nullptr, Reg);			bool RV = X86FastEmitLoad(VT, SrcAM, nullptr, Reg);
	RV &= X86FastEmitStore(VT, Reg, /*Kill=*/true, DestAM);			RV &= X86FastEmitStore(VT, Reg, /*Kill=*/true, DestAM);
	assert(RV && "Failed to emit load or store??");			assert(RV && "Failed to emit load or store??");

	unsigned Size = VT.getSizeInBits()/8;			unsigned Size = VT.getSizeInBits()/8;
	Len -= Size;			Len -= Size;
	DestAM.Disp += Size;			DestAM.Disp += Size;
	SrcAM.Disp += Size;			SrcAM.Disp += Size;
	}			}

	return true;			return true;
	}			}
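
Inline note: `TryEmitSmallMemcpy` splits the copy greedily, widest access first. The following is a minimal sketch of just that size decomposition, with the `IsMemcpySmall` limit folded in. `splitSmallMemcpy` is a hypothetical helper, not part of LLVM, and it returns byte widths instead of emitting loads and stores.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LLVM): returns the access widths, in bytes,
// that a greedy widest-first split would use to copy Len bytes.
std::vector<unsigned> splitSmallMemcpy(uint64_t Len, bool Is64Bit) {
  std::vector<unsigned> Chunks;
  // Mirror the IsMemcpySmall limit: 32 bytes on x86-64, 16 bytes on x86-32.
  if (Len > (Is64Bit ? 32u : 16u))
    return Chunks; // Too large to inline; a real caller would emit a call.
  while (Len) {
    unsigned Size;
    if (Len >= 8 && Is64Bit)
      Size = 8; // i64 accesses are only legal on x86-64.
    else if (Len >= 4)
      Size = 4;
    else if (Len >= 2)
      Size = 2;
    else
      Size = 1;
    Chunks.push_back(Size);
    Len -= Size;
  }
  return Chunks;
}
```

Under these assumptions a 15-byte copy becomes 8+4+2+1 on x86-64 and 4+4+4+2+1 on x86-32, which matches the order in which the loop above picks `MVT::i64`, `i32`, `i16` and `i8`.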

	bool X86FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {			bool X86FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
	// FIXME: Handle more intrinsics.			// FIXME: Handle more intrinsics.
	switch (II->getIntrinsicID()) {			switch (II->getIntrinsicID()) {
	default: return false;			default: return false;
	case Intrinsic::frameaddress: {			case Intrinsic::frameaddress: {
	Type *RetTy = II->getCalledFunction()->getReturnType();			Type *RetTy = II->getCalledFunction()->getReturnType();

	MVT VT;			MVT VT;
	if (!isTypeLegal(RetTy, VT))			if (!isTypeLegal(RetTy, VT))
	return false;			return false;

	unsigned Opc;			unsigned Opc;
	const TargetRegisterClass *RC = nullptr;			const TargetRegisterClass *RC = nullptr;

	switch (VT.SimpleTy) {			switch (VT.SimpleTy) {
	default: llvm_unreachable("Invalid result type for frameaddress.");			default: llvm_unreachable("Invalid result type for frameaddress.");
	case MVT::i32: Opc = X86::MOV32rm; RC = &X86::GR32RegClass; break;			case MVT::i32: Opc = X86::MOV32rm; RC = &X86::GR32RegClass; break;
	case MVT::i64: Opc = X86::MOV64rm; RC = &X86::GR64RegClass; break;			case MVT::i64: Opc = X86::MOV64rm; RC = &X86::GR64RegClass; break;
	}			}

	// This needs to be set before we call getPtrSizedFrameRegister, otherwise			// This needs to be set before we call getPtrSizedFrameRegister, otherwise
	// we get the wrong frame register.			// we get the wrong frame register.
	MachineFrameInfo *MFI = FuncInfo.MF->getFrameInfo();			MachineFrameInfo *MFI = FuncInfo.MF->getFrameInfo();
	MFI->setFrameAddressIsTaken(true);			MFI->setFrameAddressIsTaken(true);

	const X86RegisterInfo *RegInfo = static_cast<const X86RegisterInfo *>(			const X86RegisterInfo *RegInfo = static_cast<const X86RegisterInfo *>(
	TM.getSubtargetImpl()->getRegisterInfo());			TM.getSubtargetImpl()->getRegisterInfo());
	unsigned FrameReg = RegInfo->getPtrSizedFrameRegister(*(FuncInfo.MF));			unsigned FrameReg = RegInfo->getPtrSizedFrameRegister(*(FuncInfo.MF));
	assert(((FrameReg == X86::RBP && VT == MVT::i64) ||			assert(((FrameReg == X86::RBP && VT == MVT::i64) ||
	(FrameReg == X86::EBP && VT == MVT::i32)) &&			(FrameReg == X86::EBP && VT == MVT::i32)) &&
	"Invalid Frame Register!");			"Invalid Frame Register!");

	// Always make a copy of the frame register to a vreg first, so that we			// Always make a copy of the frame register to a vreg first, so that we
	// never directly reference the frame register (the TwoAddressInstruction-			// never directly reference the frame register (the TwoAddressInstruction-
	// Pass doesn't like that).			// Pass doesn't like that).
	unsigned SrcReg = createResultReg(RC);			unsigned SrcReg = createResultReg(RC);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::COPY), SrcReg).addReg(FrameReg);			TII.get(TargetOpcode::COPY), SrcReg).addReg(FrameReg);

	// Now recursively load from the frame address.			// Now recursively load from the frame address.
	// movq (%rbp), %rax			// movq (%rbp), %rax
	// movq (%rax), %rax			// movq (%rax), %rax
	// movq (%rax), %rax			// movq (%rax), %rax
	// ...			// ...
	unsigned DestReg;			unsigned DestReg;
	unsigned Depth = cast<ConstantInt>(II->getOperand(0))->getZExtValue();			unsigned Depth = cast<ConstantInt>(II->getOperand(0))->getZExtValue();
	while (Depth--) {			while (Depth--) {
	DestReg = createResultReg(RC);			DestReg = createResultReg(RC);
	addDirectMem(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			addDirectMem(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Opc), DestReg), SrcReg);			TII.get(Opc), DestReg), SrcReg);
	SrcReg = DestReg;			SrcReg = DestReg;
	}			}

	updateValueMap(II, SrcReg);			updateValueMap(II, SrcReg);
	return true;			return true;
	}			}
	case Intrinsic::memcpy: {			case Intrinsic::memcpy: {
	const MemCpyInst *MCI = cast<MemCpyInst>(II);			const MemCpyInst *MCI = cast<MemCpyInst>(II);
	// Don't handle volatile or variable length memcpys.			// Don't handle volatile or variable length memcpys.
	if (MCI->isVolatile())			if (MCI->isVolatile())
	return false;			return false;

	if (isa<ConstantInt>(MCI->getLength())) {			if (isa<ConstantInt>(MCI->getLength())) {
	// Small memcpy's are common enough that we want to do them			// Small memcpy's are common enough that we want to do them
	// without a call if possible.			// without a call if possible.
	uint64_t Len = cast<ConstantInt>(MCI->getLength())->getZExtValue();			uint64_t Len = cast<ConstantInt>(MCI->getLength())->getZExtValue();
	if (IsMemcpySmall(Len)) {			if (IsMemcpySmall(Len)) {
	X86AddressMode DestAM, SrcAM;			X86AddressMode DestAM, SrcAM;
	if (!X86SelectAddress(MCI->getRawDest(), DestAM) ||			if (!X86SelectAddress(MCI->getRawDest(), DestAM) ||
	!X86SelectAddress(MCI->getRawSource(), SrcAM))			!X86SelectAddress(MCI->getRawSource(), SrcAM))
	return false;			return false;
	TryEmitSmallMemcpy(DestAM, SrcAM, Len);			TryEmitSmallMemcpy(DestAM, SrcAM, Len);
	return true;			return true;
	}			}
	}			}

	unsigned SizeWidth = Subtarget->is64Bit() ? 64 : 32;			unsigned SizeWidth = Subtarget->is64Bit() ? 64 : 32;
	if (!MCI->getLength()->getType()->isIntegerTy(SizeWidth))			if (!MCI->getLength()->getType()->isIntegerTy(SizeWidth))
	return false;			return false;

	if (MCI->getSourceAddressSpace() > 255 || MCI->getDestAddressSpace() > 255)			if (MCI->getSourceAddressSpace() > 255 || MCI->getDestAddressSpace() > 255)
	return false;			return false;

	return lowerCallTo(II, "memcpy", II->getNumArgOperands() - 2);			return lowerCallTo(II, "memcpy", II->getNumArgOperands() - 2);
	}			}
	case Intrinsic::memset: {			case Intrinsic::memset: {
	const MemSetInst *MSI = cast<MemSetInst>(II);			const MemSetInst *MSI = cast<MemSetInst>(II);

	if (MSI->isVolatile())			if (MSI->isVolatile())
	return false;			return false;

	unsigned SizeWidth = Subtarget->is64Bit() ? 64 : 32;			unsigned SizeWidth = Subtarget->is64Bit() ? 64 : 32;
	if (!MSI->getLength()->getType()->isIntegerTy(SizeWidth))			if (!MSI->getLength()->getType()->isIntegerTy(SizeWidth))
	return false;			return false;

	if (MSI->getDestAddressSpace() > 255)			if (MSI->getDestAddressSpace() > 255)
	return false;			return false;

	return lowerCallTo(II, "memset", II->getNumArgOperands() - 2);			return lowerCallTo(II, "memset", II->getNumArgOperands() - 2);
	}			}
	case Intrinsic::stackprotector: {			case Intrinsic::stackprotector: {
	// Emit code to store the stack guard onto the stack.			// Emit code to store the stack guard onto the stack.
	EVT PtrTy = TLI.getPointerTy();			EVT PtrTy = TLI.getPointerTy();

	const Value *Op1 = II->getArgOperand(0); // The guard's value.			const Value *Op1 = II->getArgOperand(0); // The guard's value.
	const AllocaInst *Slot = cast<AllocaInst>(II->getArgOperand(1));			const AllocaInst *Slot = cast<AllocaInst>(II->getArgOperand(1));

	MFI.setStackProtectorIndex(FuncInfo.StaticAllocaMap[Slot]);			MFI.setStackProtectorIndex(FuncInfo.StaticAllocaMap[Slot]);

	// Grab the frame index.			// Grab the frame index.
	X86AddressMode AM;			X86AddressMode AM;
	if (!X86SelectAddress(Slot, AM)) return false;			if (!X86SelectAddress(Slot, AM)) return false;
	if (!X86FastEmitStore(PtrTy, Op1, AM)) return false;			if (!X86FastEmitStore(PtrTy, Op1, AM)) return false;
	return true;			return true;
	}			}
	case Intrinsic::dbg_declare: {			case Intrinsic::dbg_declare: {
	const DbgDeclareInst *DI = cast<DbgDeclareInst>(II);			const DbgDeclareInst *DI = cast<DbgDeclareInst>(II);
	X86AddressMode AM;			X86AddressMode AM;
	assert(DI->getAddress() && "Null address should be checked earlier!");			assert(DI->getAddress() && "Null address should be checked earlier!");
	if (!X86SelectAddress(DI->getAddress(), AM))			if (!X86SelectAddress(DI->getAddress(), AM))
	return false;			return false;
	const MCInstrDesc &II = TII.get(TargetOpcode::DBG_VALUE);			const MCInstrDesc &II = TII.get(TargetOpcode::DBG_VALUE);
	// FIXME may need to add RegState::Debug to any registers produced,			// FIXME may need to add RegState::Debug to any registers produced,
	// although ESP/EBP should be the only ones at the moment.			// although ESP/EBP should be the only ones at the moment.
	addFullAddress(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, II), AM)			addFullAddress(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, II), AM)
	.addImm(0)			.addImm(0)
	.addMetadata(DI->getVariable())			.addMetadata(DI->getVariable())
	.addMetadata(DI->getExpression());			.addMetadata(DI->getExpression());
	return true;			return true;
	}			}
	case Intrinsic::trap: {			case Intrinsic::trap: {
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::TRAP));			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::TRAP));
	return true;			return true;
	}			}
	case Intrinsic::sqrt: {			case Intrinsic::sqrt: {
	if (!Subtarget->hasSSE1())			if (!Subtarget->hasSSE1())
	return false;			return false;

	Type *RetTy = II->getCalledFunction()->getReturnType();			Type *RetTy = II->getCalledFunction()->getReturnType();

	MVT VT;			MVT VT;
	if (!isTypeLegal(RetTy, VT))			if (!isTypeLegal(RetTy, VT))
	return false;			return false;

	// Unfortunately we can't use fastEmit_r, because the AVX version of FSQRT			// Unfortunately we can't use fastEmit_r, because the AVX version of FSQRT
	// is not generated by FastISel yet.			// is not generated by FastISel yet.
	// FIXME: Update this code once tablegen can handle it.			// FIXME: Update this code once tablegen can handle it.
	static const unsigned SqrtOpc[2][2] = {			static const unsigned SqrtOpc[2][2] = {
	{X86::SQRTSSr, X86::VSQRTSSr},			{X86::SQRTSSr, X86::VSQRTSSr},
	{X86::SQRTSDr, X86::VSQRTSDr}			{X86::SQRTSDr, X86::VSQRTSDr}
	};			};
	bool HasAVX = Subtarget->hasAVX();			bool HasAVX = Subtarget->hasAVX();
	unsigned Opc;			unsigned Opc;
	const TargetRegisterClass *RC;			const TargetRegisterClass *RC;
	switch (VT.SimpleTy) {			switch (VT.SimpleTy) {
	default: return false;			default: return false;
	case MVT::f32: Opc = SqrtOpc[0][HasAVX]; RC = &X86::FR32RegClass; break;			case MVT::f32: Opc = SqrtOpc[0][HasAVX]; RC = &X86::FR32RegClass; break;
	case MVT::f64: Opc = SqrtOpc[1][HasAVX]; RC = &X86::FR64RegClass; break;			case MVT::f64: Opc = SqrtOpc[1][HasAVX]; RC = &X86::FR64RegClass; break;
	}			}

	const Value *SrcVal = II->getArgOperand(0);			const Value *SrcVal = II->getArgOperand(0);
	unsigned SrcReg = getRegForValue(SrcVal);			unsigned SrcReg = getRegForValue(SrcVal);

	if (SrcReg == 0)			if (SrcReg == 0)
	return false;			return false;

	unsigned ImplicitDefReg = 0;			unsigned ImplicitDefReg = 0;
	if (HasAVX) {			if (HasAVX) {
	ImplicitDefReg = createResultReg(RC);			ImplicitDefReg = createResultReg(RC);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::IMPLICIT_DEF), ImplicitDefReg);			TII.get(TargetOpcode::IMPLICIT_DEF), ImplicitDefReg);
	}			}

	unsigned ResultReg = createResultReg(RC);			unsigned ResultReg = createResultReg(RC);
	MachineInstrBuilder MIB;			MachineInstrBuilder MIB;
	MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc),			MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc),
	ResultReg);			ResultReg);

	if (ImplicitDefReg)			if (ImplicitDefReg)
	MIB.addReg(ImplicitDefReg);			MIB.addReg(ImplicitDefReg);

	MIB.addReg(SrcReg);			MIB.addReg(SrcReg);

	updateValueMap(II, ResultReg);			updateValueMap(II, ResultReg);
	return true;			return true;
	}			}
	case Intrinsic::sadd_with_overflow:			case Intrinsic::sadd_with_overflow:
	case Intrinsic::uadd_with_overflow:			case Intrinsic::uadd_with_overflow:
	case Intrinsic::ssub_with_overflow:			case Intrinsic::ssub_with_overflow:
	case Intrinsic::usub_with_overflow:			case Intrinsic::usub_with_overflow:
	case Intrinsic::smul_with_overflow:			case Intrinsic::smul_with_overflow:
	case Intrinsic::umul_with_overflow: {			case Intrinsic::umul_with_overflow: {
	// This implements the basic lowering of the xalu with overflow intrinsics			// This implements the basic lowering of the xalu with overflow intrinsics
	// into add/sub/mul followed by either seto or setb.			// into add/sub/mul followed by either seto or setb.
	const Function *Callee = II->getCalledFunction();			const Function *Callee = II->getCalledFunction();
	auto *Ty = cast<StructType>(Callee->getReturnType());			auto *Ty = cast<StructType>(Callee->getReturnType());
	Type *RetTy = Ty->getTypeAtIndex(0U);			Type *RetTy = Ty->getTypeAtIndex(0U);
	Type *CondTy = Ty->getTypeAtIndex(1);			Type *CondTy = Ty->getTypeAtIndex(1);

	MVT VT;			MVT VT;
	if (!isTypeLegal(RetTy, VT))			if (!isTypeLegal(RetTy, VT))
	return false;			return false;

	if (VT < MVT::i8 || VT > MVT::i64)			if (VT < MVT::i8 || VT > MVT::i64)
	return false;			return false;

	const Value *LHS = II->getArgOperand(0);			const Value *LHS = II->getArgOperand(0);
	const Value *RHS = II->getArgOperand(1);			const Value *RHS = II->getArgOperand(1);

	// Canonicalize immediate to the RHS.			// Canonicalize immediate to the RHS.
	if (isa<ConstantInt>(LHS) && !isa<ConstantInt>(RHS) &&			if (isa<ConstantInt>(LHS) && !isa<ConstantInt>(RHS) &&
	isCommutativeIntrinsic(II))			isCommutativeIntrinsic(II))
	std::swap(LHS, RHS);			std::swap(LHS, RHS);

	bool UseIncDec = false;			bool UseIncDec = false;
	if (isa<ConstantInt>(RHS) && cast<ConstantInt>(RHS)->isOne())			if (isa<ConstantInt>(RHS) && cast<ConstantInt>(RHS)->isOne())
	UseIncDec = true;			UseIncDec = true;

	unsigned BaseOpc, CondOpc;			unsigned BaseOpc, CondOpc;
	switch (II->getIntrinsicID()) {			switch (II->getIntrinsicID()) {
	default: llvm_unreachable("Unexpected intrinsic!");			default: llvm_unreachable("Unexpected intrinsic!");
	case Intrinsic::sadd_with_overflow:			case Intrinsic::sadd_with_overflow:
	BaseOpc = UseIncDec ? unsigned(X86ISD::INC) : unsigned(ISD::ADD);			BaseOpc = UseIncDec ? unsigned(X86ISD::INC) : unsigned(ISD::ADD);
	CondOpc = X86::SETOr;			CondOpc = X86::SETOr;
	break;			break;
	case Intrinsic::uadd_with_overflow:			case Intrinsic::uadd_with_overflow:
	BaseOpc = ISD::ADD; CondOpc = X86::SETBr; break;			BaseOpc = ISD::ADD; CondOpc = X86::SETBr; break;
	case Intrinsic::ssub_with_overflow:			case Intrinsic::ssub_with_overflow:
	BaseOpc = UseIncDec ? unsigned(X86ISD::DEC) : unsigned(ISD::SUB);			BaseOpc = UseIncDec ? unsigned(X86ISD::DEC) : unsigned(ISD::SUB);
	CondOpc = X86::SETOr;			CondOpc = X86::SETOr;
	break;			break;
	case Intrinsic::usub_with_overflow:			case Intrinsic::usub_with_overflow:
	BaseOpc = ISD::SUB; CondOpc = X86::SETBr; break;			BaseOpc = ISD::SUB; CondOpc = X86::SETBr; break;
	case Intrinsic::smul_with_overflow:			case Intrinsic::smul_with_overflow:
	BaseOpc = X86ISD::SMUL; CondOpc = X86::SETOr; break;			BaseOpc = X86ISD::SMUL; CondOpc = X86::SETOr; break;
	case Intrinsic::umul_with_overflow:			case Intrinsic::umul_with_overflow:
	BaseOpc = X86ISD::UMUL; CondOpc = X86::SETOr; break;			BaseOpc = X86ISD::UMUL; CondOpc = X86::SETOr; break;
	}			}

	unsigned LHSReg = getRegForValue(LHS);			unsigned LHSReg = getRegForValue(LHS);
	if (LHSReg == 0)			if (LHSReg == 0)
	return false;			return false;
	bool LHSIsKill = hasTrivialKill(LHS);			bool LHSIsKill = hasTrivialKill(LHS);

	unsigned ResultReg = 0;			unsigned ResultReg = 0;
	// Check if we have an immediate version.			// Check if we have an immediate version.
	if (const auto *CI = dyn_cast<ConstantInt>(RHS)) {			if (const auto *CI = dyn_cast<ConstantInt>(RHS)) {
	static const unsigned Opc[2][4] = {			static const unsigned Opc[2][4] = {
	{ X86::INC8r, X86::INC16r, X86::INC32r, X86::INC64r },			{ X86::INC8r, X86::INC16r, X86::INC32r, X86::INC64r },
	{ X86::DEC8r, X86::DEC16r, X86::DEC32r, X86::DEC64r }			{ X86::DEC8r, X86::DEC16r, X86::DEC32r, X86::DEC64r }
	};			};

	if (BaseOpc == X86ISD::INC || BaseOpc == X86ISD::DEC) {			if (BaseOpc == X86ISD::INC || BaseOpc == X86ISD::DEC) {
	ResultReg = createResultReg(TLI.getRegClassFor(VT));			ResultReg = createResultReg(TLI.getRegClassFor(VT));
	bool IsDec = BaseOpc == X86ISD::DEC;			bool IsDec = BaseOpc == X86ISD::DEC;
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Opc[IsDec][VT.SimpleTy-MVT::i8]), ResultReg)			TII.get(Opc[IsDec][VT.SimpleTy-MVT::i8]), ResultReg)
	.addReg(LHSReg, getKillRegState(LHSIsKill));			.addReg(LHSReg, getKillRegState(LHSIsKill));
	} else			} else
	ResultReg = fastEmit_ri(VT, VT, BaseOpc, LHSReg, LHSIsKill,			ResultReg = fastEmit_ri(VT, VT, BaseOpc, LHSReg, LHSIsKill,
	CI->getZExtValue());			CI->getZExtValue());
	}			}

	unsigned RHSReg;			unsigned RHSReg;
	bool RHSIsKill;			bool RHSIsKill;
	if (!ResultReg) {			if (!ResultReg) {
	RHSReg = getRegForValue(RHS);			RHSReg = getRegForValue(RHS);
	if (RHSReg == 0)			if (RHSReg == 0)
	return false;			return false;
	RHSIsKill = hasTrivialKill(RHS);			RHSIsKill = hasTrivialKill(RHS);
	ResultReg = fastEmit_rr(VT, VT, BaseOpc, LHSReg, LHSIsKill, RHSReg,			ResultReg = fastEmit_rr(VT, VT, BaseOpc, LHSReg, LHSIsKill, RHSReg,
	RHSIsKill);			RHSIsKill);
	}			}

	// FastISel doesn't have a pattern for all X86::MULr and X86::IMULr. Emit			// FastISel doesn't have a pattern for all X86::MULr and X86::IMULr. Emit
	// it manually.			// it manually.
	if (BaseOpc == X86ISD::UMUL && !ResultReg) {			if (BaseOpc == X86ISD::UMUL && !ResultReg) {
	static const unsigned MULOpc[] =			static const unsigned MULOpc[] =
	{ X86::MUL8r, X86::MUL16r, X86::MUL32r, X86::MUL64r };			{ X86::MUL8r, X86::MUL16r, X86::MUL32r, X86::MUL64r };
	static const unsigned Reg[] = { X86::AL, X86::AX, X86::EAX, X86::RAX };			static const unsigned Reg[] = { X86::AL, X86::AX, X86::EAX, X86::RAX };
	// First copy the first operand into RAX, which is an implicit input to			// First copy the first operand into RAX, which is an implicit input to
	// the X86::MUL*r instruction.			// the X86::MUL*r instruction.
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::COPY), Reg[VT.SimpleTy-MVT::i8])			TII.get(TargetOpcode::COPY), Reg[VT.SimpleTy-MVT::i8])
	.addReg(LHSReg, getKillRegState(LHSIsKill));			.addReg(LHSReg, getKillRegState(LHSIsKill));
	ResultReg = fastEmitInst_r(MULOpc[VT.SimpleTy-MVT::i8],			ResultReg = fastEmitInst_r(MULOpc[VT.SimpleTy-MVT::i8],
	TLI.getRegClassFor(VT), RHSReg, RHSIsKill);			TLI.getRegClassFor(VT), RHSReg, RHSIsKill);
	} else if (BaseOpc == X86ISD::SMUL && !ResultReg) {			} else if (BaseOpc == X86ISD::SMUL && !ResultReg) {
	static const unsigned MULOpc[] =			static const unsigned MULOpc[] =
	{ X86::IMUL8r, X86::IMUL16rr, X86::IMUL32rr, X86::IMUL64rr };			{ X86::IMUL8r, X86::IMUL16rr, X86::IMUL32rr, X86::IMUL64rr };
	if (VT == MVT::i8) {			if (VT == MVT::i8) {
	// Copy the first operand into AL, which is an implicit input to the			// Copy the first operand into AL, which is an implicit input to the
	// X86::IMUL8r instruction.			// X86::IMUL8r instruction.
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::COPY), X86::AL)			TII.get(TargetOpcode::COPY), X86::AL)
	.addReg(LHSReg, getKillRegState(LHSIsKill));			.addReg(LHSReg, getKillRegState(LHSIsKill));
	ResultReg = fastEmitInst_r(MULOpc[0], TLI.getRegClassFor(VT), RHSReg,			ResultReg = fastEmitInst_r(MULOpc[0], TLI.getRegClassFor(VT), RHSReg,
	RHSIsKill);			RHSIsKill);
	} else			} else
	ResultReg = fastEmitInst_rr(MULOpc[VT.SimpleTy-MVT::i8],			ResultReg = fastEmitInst_rr(MULOpc[VT.SimpleTy-MVT::i8],
	TLI.getRegClassFor(VT), LHSReg, LHSIsKill,			TLI.getRegClassFor(VT), LHSReg, LHSIsKill,
	RHSReg, RHSIsKill);			RHSReg, RHSIsKill);
	}			}

	if (!ResultReg)			if (!ResultReg)
	return false;			return false;

	unsigned ResultReg2 = FuncInfo.CreateRegs(CondTy);			unsigned ResultReg2 = FuncInfo.CreateRegs(CondTy);
	assert((ResultReg+1) == ResultReg2 && "Nonconsecutive result registers.");			assert((ResultReg+1) == ResultReg2 && "Nonconsecutive result registers.");
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(CondOpc),			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(CondOpc),
	ResultReg2);			ResultReg2);

	updateValueMap(II, ResultReg, 2);			updateValueMap(II, ResultReg, 2);
	return true;			return true;
	}			}
	case Intrinsic::x86_sse_cvttss2si:			case Intrinsic::x86_sse_cvttss2si:
	case Intrinsic::x86_sse_cvttss2si64:			case Intrinsic::x86_sse_cvttss2si64:
	case Intrinsic::x86_sse2_cvttsd2si:			case Intrinsic::x86_sse2_cvttsd2si:
	case Intrinsic::x86_sse2_cvttsd2si64: {			case Intrinsic::x86_sse2_cvttsd2si64: {
	bool IsInputDouble;			bool IsInputDouble;
	switch (II->getIntrinsicID()) {			switch (II->getIntrinsicID()) {
	default: llvm_unreachable("Unexpected intrinsic.");			default: llvm_unreachable("Unexpected intrinsic.");
	case Intrinsic::x86_sse_cvttss2si:			case Intrinsic::x86_sse_cvttss2si:
	case Intrinsic::x86_sse_cvttss2si64:			case Intrinsic::x86_sse_cvttss2si64:
	if (!Subtarget->hasSSE1())			if (!Subtarget->hasSSE1())
	return false;			return false;
	IsInputDouble = false;			IsInputDouble = false;
	break;			break;
	case Intrinsic::x86_sse2_cvttsd2si:			case Intrinsic::x86_sse2_cvttsd2si:
	case Intrinsic::x86_sse2_cvttsd2si64:			case Intrinsic::x86_sse2_cvttsd2si64:
	if (!Subtarget->hasSSE2())			if (!Subtarget->hasSSE2())
	return false;			return false;
	IsInputDouble = true;			IsInputDouble = true;
	break;			break;
	}			}

	Type *RetTy = II->getCalledFunction()->getReturnType();			Type *RetTy = II->getCalledFunction()->getReturnType();
	MVT VT;			MVT VT;
	if (!isTypeLegal(RetTy, VT))			if (!isTypeLegal(RetTy, VT))
	return false;			return false;

	static const unsigned CvtOpc[2][2][2] = {			static const unsigned CvtOpc[2][2][2] = {
	{ { X86::CVTTSS2SIrr, X86::VCVTTSS2SIrr },			{ { X86::CVTTSS2SIrr, X86::VCVTTSS2SIrr },
	{ X86::CVTTSS2SI64rr, X86::VCVTTSS2SI64rr } },			{ X86::CVTTSS2SI64rr, X86::VCVTTSS2SI64rr } },
	{ { X86::CVTTSD2SIrr, X86::VCVTTSD2SIrr },			{ { X86::CVTTSD2SIrr, X86::VCVTTSD2SIrr },
	{ X86::CVTTSD2SI64rr, X86::VCVTTSD2SI64rr } }			{ X86::CVTTSD2SI64rr, X86::VCVTTSD2SI64rr } }
	};			};
	bool HasAVX = Subtarget->hasAVX();			bool HasAVX = Subtarget->hasAVX();
	unsigned Opc;			unsigned Opc;
	switch (VT.SimpleTy) {			switch (VT.SimpleTy) {
	default: llvm_unreachable("Unexpected result type.");			default: llvm_unreachable("Unexpected result type.");
	case MVT::i32: Opc = CvtOpc[IsInputDouble][0][HasAVX]; break;			case MVT::i32: Opc = CvtOpc[IsInputDouble][0][HasAVX]; break;
	case MVT::i64: Opc = CvtOpc[IsInputDouble][1][HasAVX]; break;			case MVT::i64: Opc = CvtOpc[IsInputDouble][1][HasAVX]; break;
	}			}

	// Check if we can fold insertelement instructions into the convert.			// Check if we can fold insertelement instructions into the convert.
	const Value *Op = II->getArgOperand(0);			const Value *Op = II->getArgOperand(0);
	while (auto *IE = dyn_cast<InsertElementInst>(Op)) {			while (auto *IE = dyn_cast<InsertElementInst>(Op)) {
	const Value *Index = IE->getOperand(2);			const Value *Index = IE->getOperand(2);
	if (!isa<ConstantInt>(Index))			if (!isa<ConstantInt>(Index))
	break;			break;
	unsigned Idx = cast<ConstantInt>(Index)->getZExtValue();			unsigned Idx = cast<ConstantInt>(Index)->getZExtValue();

	if (Idx == 0) {			if (Idx == 0) {
	Op = IE->getOperand(1);			Op = IE->getOperand(1);
	break;			break;
	}			}
	Op = IE->getOperand(0);			Op = IE->getOperand(0);
	}			}

	unsigned Reg = getRegForValue(Op);			unsigned Reg = getRegForValue(Op);
	if (Reg == 0)			if (Reg == 0)
	return false;			return false;

	unsigned ResultReg = createResultReg(TLI.getRegClassFor(VT));			unsigned ResultReg = createResultReg(TLI.getRegClassFor(VT));
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg)			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg)
	.addReg(Reg);			.addReg(Reg);

	updateValueMap(II, ResultReg);			updateValueMap(II, ResultReg);
	return true;			return true;
	}			}
	}			}
	}			}

	bool X86FastISel::fastLowerArguments() {			bool X86FastISel::fastLowerArguments() {
	if (!FuncInfo.CanLowerReturn)			if (!FuncInfo.CanLowerReturn)
	return false;			return false;

	const Function *F = FuncInfo.Fn;			const Function *F = FuncInfo.Fn;
	if (F->isVarArg())			if (F->isVarArg())
	return false;			return false;

	CallingConv::ID CC = F->getCallingConv();			CallingConv::ID CC = F->getCallingConv();
	if (CC != CallingConv::C)			if (CC != CallingConv::C)
	return false;			return false;

	if (Subtarget->isCallingConvWin64(CC))			if (Subtarget->isCallingConvWin64(CC))
	return false;			return false;

	if (!Subtarget->is64Bit())			if (!Subtarget->is64Bit())
	return false;			return false;

	// Only handle simple cases. i.e. Up to 6 i32/i64 scalar arguments.			// Only handle simple cases. i.e. Up to 6 i32/i64 scalar arguments.
	unsigned GPRCnt = 0;			unsigned GPRCnt = 0;
	unsigned FPRCnt = 0;			unsigned FPRCnt = 0;
	unsigned Idx = 0;			unsigned Idx = 0;
	for (auto const &Arg : F->args()) {			for (auto const &Arg : F->args()) {
	// The first argument is at index 1.			// The first argument is at index 1.
	++Idx;			++Idx;
	if (F->getAttributes().hasAttribute(Idx, Attribute::ByVal) ||			if (F->getAttributes().hasAttribute(Idx, Attribute::ByVal) ||
	F->getAttributes().hasAttribute(Idx, Attribute::InReg) ||			F->getAttributes().hasAttribute(Idx, Attribute::InReg) ||
	F->getAttributes().hasAttribute(Idx, Attribute::StructRet) ||			F->getAttributes().hasAttribute(Idx, Attribute::StructRet) ||
	F->getAttributes().hasAttribute(Idx, Attribute::Nest))			F->getAttributes().hasAttribute(Idx, Attribute::Nest))
	return false;			return false;

	Type *ArgTy = Arg.getType();			Type *ArgTy = Arg.getType();
	if (ArgTy->isStructTy() || ArgTy->isArrayTy() || ArgTy->isVectorTy())			if (ArgTy->isStructTy() || ArgTy->isArrayTy() || ArgTy->isVectorTy())
	return false;			return false;

	EVT ArgVT = TLI.getValueType(ArgTy);			EVT ArgVT = TLI.getValueType(ArgTy);
	if (!ArgVT.isSimple()) return false;			if (!ArgVT.isSimple()) return false;
	switch (ArgVT.getSimpleVT().SimpleTy) {			switch (ArgVT.getSimpleVT().SimpleTy) {
	default: return false;			default: return false;
	case MVT::i32:			case MVT::i32:
	case MVT::i64:			case MVT::i64:
	++GPRCnt;			++GPRCnt;
	break;			break;
	case MVT::f32:			case MVT::f32:
	case MVT::f64:			case MVT::f64:
	if (!Subtarget->hasSSE1())			if (!Subtarget->hasSSE1())
	return false;			return false;
	++FPRCnt;			++FPRCnt;
	break;			break;
	}			}

	if (GPRCnt > 6)			if (GPRCnt > 6)
	return false;			return false;

	if (FPRCnt > 8)			if (FPRCnt > 8)
	return false;			return false;
	}			}

	static const MCPhysReg GPR32ArgRegs[] = {			static const MCPhysReg GPR32ArgRegs[] = {
	X86::EDI, X86::ESI, X86::EDX, X86::ECX, X86::R8D, X86::R9D			X86::EDI, X86::ESI, X86::EDX, X86::ECX, X86::R8D, X86::R9D
	};			};
	static const MCPhysReg GPR64ArgRegs[] = {			static const MCPhysReg GPR64ArgRegs[] = {
	X86::RDI, X86::RSI, X86::RDX, X86::RCX, X86::R8 , X86::R9			X86::RDI, X86::RSI, X86::RDX, X86::RCX, X86::R8 , X86::R9
	};			};
	static const MCPhysReg XMMArgRegs[] = {			static const MCPhysReg XMMArgRegs[] = {
	X86::XMM0, X86::XMM1, X86::XMM2, X86::XMM3,			X86::XMM0, X86::XMM1, X86::XMM2, X86::XMM3,
	X86::XMM4, X86::XMM5, X86::XMM6, X86::XMM7			X86::XMM4, X86::XMM5, X86::XMM6, X86::XMM7
	};			};

	unsigned GPRIdx = 0;			unsigned GPRIdx = 0;
	unsigned FPRIdx = 0;			unsigned FPRIdx = 0;
	for (auto const &Arg : F->args()) {			for (auto const &Arg : F->args()) {
	MVT VT = TLI.getSimpleValueType(Arg.getType());			MVT VT = TLI.getSimpleValueType(Arg.getType());
	const TargetRegisterClass *RC = TLI.getRegClassFor(VT);			const TargetRegisterClass *RC = TLI.getRegClassFor(VT);
	unsigned SrcReg;			unsigned SrcReg;
	switch (VT.SimpleTy) {			switch (VT.SimpleTy) {
	default: llvm_unreachable("Unexpected value type.");			default: llvm_unreachable("Unexpected value type.");
	case MVT::i32: SrcReg = GPR32ArgRegs[GPRIdx++]; break;			case MVT::i32: SrcReg = GPR32ArgRegs[GPRIdx++]; break;
	case MVT::i64: SrcReg = GPR64ArgRegs[GPRIdx++]; break;			case MVT::i64: SrcReg = GPR64ArgRegs[GPRIdx++]; break;
	case MVT::f32: // fall-through			case MVT::f32: // fall-through
	case MVT::f64: SrcReg = XMMArgRegs[FPRIdx++]; break;			case MVT::f64: SrcReg = XMMArgRegs[FPRIdx++]; break;
	}			}
	unsigned DstReg = FuncInfo.MF->addLiveIn(SrcReg, RC);			unsigned DstReg = FuncInfo.MF->addLiveIn(SrcReg, RC);
	// FIXME: Unfortunately it's necessary to emit a copy from the livein copy.			// FIXME: Unfortunately it's necessary to emit a copy from the livein copy.
	// Without this, EmitLiveInCopies may eliminate the livein if its only			// Without this, EmitLiveInCopies may eliminate the livein if its only
	// use is a bitcast (which isn't turned into an instruction).			// use is a bitcast (which isn't turned into an instruction).
	unsigned ResultReg = createResultReg(RC);			unsigned ResultReg = createResultReg(RC);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::COPY), ResultReg)			TII.get(TargetOpcode::COPY), ResultReg)
	.addReg(DstReg, getKillRegState(true));			.addReg(DstReg, getKillRegState(true));
	updateValueMap(&Arg, ResultReg);			updateValueMap(&Arg, ResultReg);
	}			}
	return true;			return true;
	}			}

	static unsigned computeBytesPoppedByCallee(const X86Subtarget *Subtarget,			static unsigned computeBytesPoppedByCallee(const X86Subtarget *Subtarget,
	CallingConv::ID CC,			CallingConv::ID CC,
	ImmutableCallSite *CS) {			ImmutableCallSite *CS) {
	if (Subtarget->is64Bit())			if (Subtarget->is64Bit())
	return 0;			return 0;
	if (Subtarget->getTargetTriple().isOSMSVCRT())			if (Subtarget->getTargetTriple().isOSMSVCRT())
	return 0;			return 0;
	if (CC == CallingConv::Fast || CC == CallingConv::GHC ||			if (CC == CallingConv::Fast || CC == CallingConv::GHC ||
	CC == CallingConv::HiPE)			CC == CallingConv::HiPE)
	return 0;			return 0;
	if (CS && !CS->paramHasAttr(1, Attribute::StructRet))			if (CS && !CS->paramHasAttr(1, Attribute::StructRet))
	return 0;			return 0;
	if (CS && CS->paramHasAttr(1, Attribute::InReg))			if (CS && CS->paramHasAttr(1, Attribute::InReg))
	return 0;			return 0;
	return 4;			return 4;
	}			}

	bool X86FastISel::fastLowerCall(CallLoweringInfo &CLI) {			bool X86FastISel::fastLowerCall(CallLoweringInfo &CLI) {
	auto &OutVals = CLI.OutVals;			auto &OutVals = CLI.OutVals;
	auto &OutFlags = CLI.OutFlags;			auto &OutFlags = CLI.OutFlags;
	auto &OutRegs = CLI.OutRegs;			auto &OutRegs = CLI.OutRegs;
	auto &Ins = CLI.Ins;			auto &Ins = CLI.Ins;
	auto &InRegs = CLI.InRegs;			auto &InRegs = CLI.InRegs;
	CallingConv::ID CC = CLI.CallConv;			CallingConv::ID CC = CLI.CallConv;
	bool &IsTailCall = CLI.IsTailCall;			bool &IsTailCall = CLI.IsTailCall;
	bool IsVarArg = CLI.IsVarArg;			bool IsVarArg = CLI.IsVarArg;
	const Value *Callee = CLI.Callee;			const Value *Callee = CLI.Callee;
	const char *SymName = CLI.SymName;			const char *SymName = CLI.SymName;

	bool Is64Bit = Subtarget->is64Bit();			bool Is64Bit = Subtarget->is64Bit();
	bool IsWin64 = Subtarget->isCallingConvWin64(CC);			bool IsWin64 = Subtarget->isCallingConvWin64(CC);

	// Handle only C, fastcc, and webkit_js calling conventions for now.			// Handle only C, fastcc, and webkit_js calling conventions for now.
	switch (CC) {			switch (CC) {
	default: return false;			default: return false;
	case CallingConv::C:			case CallingConv::C:
	case CallingConv::Fast:			case CallingConv::Fast:
	case CallingConv::WebKit_JS:			case CallingConv::WebKit_JS:
	case CallingConv::X86_FastCall:			case CallingConv::X86_FastCall:
	case CallingConv::X86_64_Win64:			case CallingConv::X86_64_Win64:
	case CallingConv::X86_64_SysV:			case CallingConv::X86_64_SysV:
	break;			break;
	}			}

	// Allow SelectionDAG isel to handle tail calls.			// Allow SelectionDAG isel to handle tail calls.
	if (IsTailCall)			if (IsTailCall)
	return false;			return false;

	// fastcc with -tailcallopt is intended to provide a guaranteed			// fastcc with -tailcallopt is intended to provide a guaranteed
	// tail call optimization. Fastisel doesn't know how to do that.			// tail call optimization. Fastisel doesn't know how to do that.
	if (CC == CallingConv::Fast && TM.Options.GuaranteedTailCallOpt)			if (CC == CallingConv::Fast && TM.Options.GuaranteedTailCallOpt)
	return false;			return false;

	// Don't know how to handle Win64 varargs yet. Nothing special needed for			// Don't know how to handle Win64 varargs yet. Nothing special needed for
	// x86-32. Special handling for x86-64 is implemented.			// x86-32. Special handling for x86-64 is implemented.
	if (IsVarArg && IsWin64)			if (IsVarArg && IsWin64)
	return false;			return false;

	// Don't know about inalloca yet.			// Don't know about inalloca yet.
	if (CLI.CS && CLI.CS->hasInAllocaArgument())			if (CLI.CS && CLI.CS->hasInAllocaArgument())
	return false;			return false;

	// Fast-isel doesn't know about callee-pop yet.			// Fast-isel doesn't know about callee-pop yet.
	if (X86::isCalleePop(CC, Subtarget->is64Bit(), IsVarArg,			if (X86::isCalleePop(CC, Subtarget->is64Bit(), IsVarArg,
	TM.Options.GuaranteedTailCallOpt))			TM.Options.GuaranteedTailCallOpt))
	return false;			return false;

	SmallVector<MVT, 16> OutVTs;			SmallVector<MVT, 16> OutVTs;
	SmallVector<unsigned, 16> ArgRegs;			SmallVector<unsigned, 16> ArgRegs;

	// If this is a constant i1/i8/i16 argument, promote to i32 to avoid an extra			// If this is a constant i1/i8/i16 argument, promote to i32 to avoid an extra
	// instruction. This is safe because it is common to all FastISel supported			// instruction. This is safe because it is common to all FastISel supported
	// calling conventions on x86.			// calling conventions on x86.
	for (int i = 0, e = OutVals.size(); i != e; ++i) {			for (int i = 0, e = OutVals.size(); i != e; ++i) {
	Value *&Val = OutVals[i];			Value *&Val = OutVals[i];
	ISD::ArgFlagsTy Flags = OutFlags[i];			ISD::ArgFlagsTy Flags = OutFlags[i];
	if (auto *CI = dyn_cast<ConstantInt>(Val)) {			if (auto *CI = dyn_cast<ConstantInt>(Val)) {
	if (CI->getBitWidth() < 32) {			if (CI->getBitWidth() < 32) {
	if (Flags.isSExt())			if (Flags.isSExt())
	Val = ConstantExpr::getSExt(CI, Type::getInt32Ty(CI->getContext()));			Val = ConstantExpr::getSExt(CI, Type::getInt32Ty(CI->getContext()));
	else			else
	Val = ConstantExpr::getZExt(CI, Type::getInt32Ty(CI->getContext()));			Val = ConstantExpr::getZExt(CI, Type::getInt32Ty(CI->getContext()));
	}			}
	}			}

	// Passing bools around ends up doing a trunc to i1 and passing it.			// Passing bools around ends up doing a trunc to i1 and passing it.
	// Codegen this as an argument + "and 1".			// Codegen this as an argument + "and 1".
	MVT VT;			MVT VT;
	auto *TI = dyn_cast<TruncInst>(Val);			auto *TI = dyn_cast<TruncInst>(Val);
	unsigned ResultReg;			unsigned ResultReg;
	if (TI && TI->getType()->isIntegerTy(1) && CLI.CS &&			if (TI && TI->getType()->isIntegerTy(1) && CLI.CS &&
	(TI->getParent() == CLI.CS->getInstruction()->getParent()) &&			(TI->getParent() == CLI.CS->getInstruction()->getParent()) &&
	TI->hasOneUse()) {			TI->hasOneUse()) {
	Value *PrevVal = TI->getOperand(0);			Value *PrevVal = TI->getOperand(0);
	ResultReg = getRegForValue(PrevVal);			ResultReg = getRegForValue(PrevVal);

	if (!ResultReg)			if (!ResultReg)
	return false;			return false;

	if (!isTypeLegal(PrevVal->getType(), VT))			if (!isTypeLegal(PrevVal->getType(), VT))
	return false;			return false;

	ResultReg =			ResultReg =
	fastEmit_ri(VT, VT, ISD::AND, ResultReg, hasTrivialKill(PrevVal), 1);			fastEmit_ri(VT, VT, ISD::AND, ResultReg, hasTrivialKill(PrevVal), 1);
	} else {			} else {
	if (!isTypeLegal(Val->getType(), VT))			if (!isTypeLegal(Val->getType(), VT))
	return false;			return false;
	ResultReg = getRegForValue(Val);			ResultReg = getRegForValue(Val);
	}			}

	if (!ResultReg)			if (!ResultReg)
	return false;			return false;

	ArgRegs.push_back(ResultReg);			ArgRegs.push_back(ResultReg);
	OutVTs.push_back(VT);			OutVTs.push_back(VT);
	}			}

	// Analyze operands of the call, assigning locations to each operand.			// Analyze operands of the call, assigning locations to each operand.
	SmallVector<CCValAssign, 16> ArgLocs;			SmallVector<CCValAssign, 16> ArgLocs;
	CCState CCInfo(CC, IsVarArg, *FuncInfo.MF, ArgLocs, CLI.RetTy->getContext());			CCState CCInfo(CC, IsVarArg, *FuncInfo.MF, ArgLocs, CLI.RetTy->getContext());

	// Allocate shadow area for Win64			// Allocate shadow area for Win64
	if (IsWin64)			if (IsWin64)
	CCInfo.AllocateStack(32, 8);			CCInfo.AllocateStack(32, 8);

	CCInfo.AnalyzeCallOperands(OutVTs, OutFlags, CC_X86);			CCInfo.AnalyzeCallOperands(OutVTs, OutFlags, CC_X86);

	// Get a count of how many bytes are to be pushed on the stack.			// Get a count of how many bytes are to be pushed on the stack.
	unsigned NumBytes = CCInfo.getNextStackOffset();			unsigned NumBytes = CCInfo.getNextStackOffset();

	// Issue CALLSEQ_START			// Issue CALLSEQ_START
	unsigned AdjStackDown = TII.getCallFrameSetupOpcode();			unsigned AdjStackDown = TII.getCallFrameSetupOpcode();
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(AdjStackDown))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(AdjStackDown))
	.addImm(NumBytes);			.addImm(NumBytes).addImm(0);

	// Walk the register/memloc assignments, inserting copies/loads.			// Walk the register/memloc assignments, inserting copies/loads.
	const X86RegisterInfo *RegInfo = static_cast<const X86RegisterInfo *>(			const X86RegisterInfo *RegInfo = static_cast<const X86RegisterInfo *>(
	TM.getSubtargetImpl()->getRegisterInfo());			TM.getSubtargetImpl()->getRegisterInfo());
	for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {			for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {
	CCValAssign const &VA = ArgLocs[i];			CCValAssign const &VA = ArgLocs[i];
	const Value *ArgVal = OutVals[VA.getValNo()];			const Value *ArgVal = OutVals[VA.getValNo()];
	MVT ArgVT = OutVTs[VA.getValNo()];			MVT ArgVT = OutVTs[VA.getValNo()];

	if (ArgVT == MVT::x86mmx)			if (ArgVT == MVT::x86mmx)
	return false;			return false;

	unsigned ArgReg = ArgRegs[VA.getValNo()];			unsigned ArgReg = ArgRegs[VA.getValNo()];

	// Promote the value if needed.			// Promote the value if needed.
	switch (VA.getLocInfo()) {			switch (VA.getLocInfo()) {
	case CCValAssign::Full: break;			case CCValAssign::Full: break;
	case CCValAssign::SExt: {			case CCValAssign::SExt: {
	assert(VA.getLocVT().isInteger() && !VA.getLocVT().isVector() &&			assert(VA.getLocVT().isInteger() && !VA.getLocVT().isVector() &&
	"Unexpected extend");			"Unexpected extend");
	bool Emitted = X86FastEmitExtend(ISD::SIGN_EXTEND, VA.getLocVT(), ArgReg,			bool Emitted = X86FastEmitExtend(ISD::SIGN_EXTEND, VA.getLocVT(), ArgReg,
	ArgVT, ArgReg);			ArgVT, ArgReg);
	assert(Emitted && "Failed to emit a sext!"); (void)Emitted;			assert(Emitted && "Failed to emit a sext!"); (void)Emitted;
	ArgVT = VA.getLocVT();			ArgVT = VA.getLocVT();
	break;			break;
	}			}
	case CCValAssign::ZExt: {			case CCValAssign::ZExt: {
	assert(VA.getLocVT().isInteger() && !VA.getLocVT().isVector() &&			assert(VA.getLocVT().isInteger() && !VA.getLocVT().isVector() &&
	"Unexpected extend");			"Unexpected extend");
	bool Emitted = X86FastEmitExtend(ISD::ZERO_EXTEND, VA.getLocVT(), ArgReg,			bool Emitted = X86FastEmitExtend(ISD::ZERO_EXTEND, VA.getLocVT(), ArgReg,
	ArgVT, ArgReg);			ArgVT, ArgReg);
	assert(Emitted && "Failed to emit a zext!"); (void)Emitted;			assert(Emitted && "Failed to emit a zext!"); (void)Emitted;
	ArgVT = VA.getLocVT();			ArgVT = VA.getLocVT();
	break;			break;
	}			}
	case CCValAssign::AExt: {			case CCValAssign::AExt: {
	assert(VA.getLocVT().isInteger() && !VA.getLocVT().isVector() &&			assert(VA.getLocVT().isInteger() && !VA.getLocVT().isVector() &&
	"Unexpected extend");			"Unexpected extend");
	bool Emitted = X86FastEmitExtend(ISD::ANY_EXTEND, VA.getLocVT(), ArgReg,			bool Emitted = X86FastEmitExtend(ISD::ANY_EXTEND, VA.getLocVT(), ArgReg,
	ArgVT, ArgReg);			ArgVT, ArgReg);
	if (!Emitted)			if (!Emitted)
	Emitted = X86FastEmitExtend(ISD::ZERO_EXTEND, VA.getLocVT(), ArgReg,			Emitted = X86FastEmitExtend(ISD::ZERO_EXTEND, VA.getLocVT(), ArgReg,
	ArgVT, ArgReg);			ArgVT, ArgReg);
	if (!Emitted)			if (!Emitted)
	Emitted = X86FastEmitExtend(ISD::SIGN_EXTEND, VA.getLocVT(), ArgReg,			Emitted = X86FastEmitExtend(ISD::SIGN_EXTEND, VA.getLocVT(), ArgReg,
	ArgVT, ArgReg);			ArgVT, ArgReg);

	assert(Emitted && "Failed to emit a aext!"); (void)Emitted;			assert(Emitted && "Failed to emit a aext!"); (void)Emitted;
	ArgVT = VA.getLocVT();			ArgVT = VA.getLocVT();
	break;			break;
	}			}
	case CCValAssign::BCvt: {			case CCValAssign::BCvt: {
	ArgReg = fastEmit_r(ArgVT, VA.getLocVT(), ISD::BITCAST, ArgReg,			ArgReg = fastEmit_r(ArgVT, VA.getLocVT(), ISD::BITCAST, ArgReg,
	/*TODO: Kill=*/false);			/*TODO: Kill=*/false);
	assert(ArgReg && "Failed to emit a bitcast!");			assert(ArgReg && "Failed to emit a bitcast!");
	ArgVT = VA.getLocVT();			ArgVT = VA.getLocVT();
	break;			break;
	}			}
	case CCValAssign::VExt:			case CCValAssign::VExt:
	// VExt has not been implemented, so this should be impossible to reach			// VExt has not been implemented, so this should be impossible to reach
	// for now. However, fallback to Selection DAG isel once implemented.			// for now. However, fallback to Selection DAG isel once implemented.
	return false;			return false;
	case CCValAssign::AExtUpper:			case CCValAssign::AExtUpper:
	case CCValAssign::SExtUpper:			case CCValAssign::SExtUpper:
	case CCValAssign::ZExtUpper:			case CCValAssign::ZExtUpper:
	case CCValAssign::FPExt:			case CCValAssign::FPExt:
	llvm_unreachable("Unexpected loc info!");			llvm_unreachable("Unexpected loc info!");
	case CCValAssign::Indirect:			case CCValAssign::Indirect:
	// FIXME: Indirect doesn't need extending, but fast-isel doesn't fully			// FIXME: Indirect doesn't need extending, but fast-isel doesn't fully
	// support this.			// support this.
	return false;			return false;
	}			}

	if (VA.isRegLoc()) {			if (VA.isRegLoc()) {
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::COPY), VA.getLocReg()).addReg(ArgReg);			TII.get(TargetOpcode::COPY), VA.getLocReg()).addReg(ArgReg);
	OutRegs.push_back(VA.getLocReg());			OutRegs.push_back(VA.getLocReg());
	} else {			} else {
	assert(VA.isMemLoc());			assert(VA.isMemLoc());

	// Don't emit stores for undef values.			// Don't emit stores for undef values.
	if (isa<UndefValue>(ArgVal))			if (isa<UndefValue>(ArgVal))
	continue;			continue;

	unsigned LocMemOffset = VA.getLocMemOffset();			unsigned LocMemOffset = VA.getLocMemOffset();
	X86AddressMode AM;			X86AddressMode AM;
	AM.Base.Reg = RegInfo->getStackRegister();			AM.Base.Reg = RegInfo->getStackRegister();
	AM.Disp = LocMemOffset;			AM.Disp = LocMemOffset;
	ISD::ArgFlagsTy Flags = OutFlags[VA.getValNo()];			ISD::ArgFlagsTy Flags = OutFlags[VA.getValNo()];
	unsigned Alignment = DL.getABITypeAlignment(ArgVal->getType());			unsigned Alignment = DL.getABITypeAlignment(ArgVal->getType());
	MachineMemOperand *MMO = FuncInfo.MF->getMachineMemOperand(			MachineMemOperand *MMO = FuncInfo.MF->getMachineMemOperand(
	MachinePointerInfo::getStack(LocMemOffset), MachineMemOperand::MOStore,			MachinePointerInfo::getStack(LocMemOffset), MachineMemOperand::MOStore,
	ArgVT.getStoreSize(), Alignment);			ArgVT.getStoreSize(), Alignment);
	if (Flags.isByVal()) {			if (Flags.isByVal()) {
	X86AddressMode SrcAM;			X86AddressMode SrcAM;
	SrcAM.Base.Reg = ArgReg;			SrcAM.Base.Reg = ArgReg;
	if (!TryEmitSmallMemcpy(AM, SrcAM, Flags.getByValSize()))			if (!TryEmitSmallMemcpy(AM, SrcAM, Flags.getByValSize()))
	return false;			return false;
	} else if (isa<ConstantInt>(ArgVal) || isa<ConstantPointerNull>(ArgVal)) {			} else if (isa<ConstantInt>(ArgVal) || isa<ConstantPointerNull>(ArgVal)) {
	// If this is a really simple value, emit this with the Value* version			// If this is a really simple value, emit this with the Value* version
	// of X86FastEmitStore. If it isn't simple, we don't want to do this,			// of X86FastEmitStore. If it isn't simple, we don't want to do this,
	// as it can cause us to reevaluate the argument.			// as it can cause us to reevaluate the argument.
	if (!X86FastEmitStore(ArgVT, ArgVal, AM, MMO))			if (!X86FastEmitStore(ArgVT, ArgVal, AM, MMO))
	return false;			return false;
	} else {			} else {
	bool ValIsKill = hasTrivialKill(ArgVal);			bool ValIsKill = hasTrivialKill(ArgVal);
	if (!X86FastEmitStore(ArgVT, ArgReg, ValIsKill, AM, MMO))			if (!X86FastEmitStore(ArgVT, ArgReg, ValIsKill, AM, MMO))
	return false;			return false;
	}			}
	}			}
	}			}

	// ELF / PIC requires GOT in the EBX register before function calls via PLT			// ELF / PIC requires GOT in the EBX register before function calls via PLT
	// GOT pointer.			// GOT pointer.
	if (Subtarget->isPICStyleGOT()) {			if (Subtarget->isPICStyleGOT()) {
	unsigned Base = getInstrInfo()->getGlobalBaseReg(FuncInfo.MF);			unsigned Base = getInstrInfo()->getGlobalBaseReg(FuncInfo.MF);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::COPY), X86::EBX).addReg(Base);			TII.get(TargetOpcode::COPY), X86::EBX).addReg(Base);
	}			}

	if (Is64Bit && IsVarArg && !IsWin64) {			if (Is64Bit && IsVarArg && !IsWin64) {
	// From AMD64 ABI document:			// From AMD64 ABI document:
	// For calls that may call functions that use varargs or stdargs			// For calls that may call functions that use varargs or stdargs
	// (prototype-less calls or calls to functions containing ellipsis (...) in			// (prototype-less calls or calls to functions containing ellipsis (...) in
	// the declaration) %al is used as hidden argument to specify the number			// the declaration) %al is used as hidden argument to specify the number
	// of SSE registers used. The contents of %al do not need to match exactly			// of SSE registers used. The contents of %al do not need to match exactly
	// the number of registers, but must be an upper bound on the number of SSE			// the number of registers, but must be an upper bound on the number of SSE
	// registers used and is in the range 0 - 8 inclusive.			// registers used and is in the range 0 - 8 inclusive.

	// Count the number of XMM registers allocated.			// Count the number of XMM registers allocated.
	static const MCPhysReg XMMArgRegs[] = {			static const MCPhysReg XMMArgRegs[] = {
	X86::XMM0, X86::XMM1, X86::XMM2, X86::XMM3,			X86::XMM0, X86::XMM1, X86::XMM2, X86::XMM3,
	X86::XMM4, X86::XMM5, X86::XMM6, X86::XMM7			X86::XMM4, X86::XMM5, X86::XMM6, X86::XMM7
	};			};
	unsigned NumXMMRegs = CCInfo.getFirstUnallocated(XMMArgRegs, 8);			unsigned NumXMMRegs = CCInfo.getFirstUnallocated(XMMArgRegs, 8);
	assert((Subtarget->hasSSE1() || !NumXMMRegs)			assert((Subtarget->hasSSE1() || !NumXMMRegs)
	&& "SSE registers cannot be used when SSE is disabled");			&& "SSE registers cannot be used when SSE is disabled");
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::MOV8ri),			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::MOV8ri),
	X86::AL).addImm(NumXMMRegs);			X86::AL).addImm(NumXMMRegs);
	}			}

	// Materialize callee address in a register. FIXME: GV address can be			// Materialize callee address in a register. FIXME: GV address can be
	// handled with a CALLpcrel32 instead.			// handled with a CALLpcrel32 instead.
	X86AddressMode CalleeAM;			X86AddressMode CalleeAM;
	if (!X86SelectCallAddress(Callee, CalleeAM))			if (!X86SelectCallAddress(Callee, CalleeAM))
	return false;			return false;

	unsigned CalleeOp = 0;			unsigned CalleeOp = 0;
	const GlobalValue *GV = nullptr;			const GlobalValue *GV = nullptr;
	if (CalleeAM.GV != nullptr) {			if (CalleeAM.GV != nullptr) {
	GV = CalleeAM.GV;			GV = CalleeAM.GV;
	} else if (CalleeAM.Base.Reg != 0) {			} else if (CalleeAM.Base.Reg != 0) {
	CalleeOp = CalleeAM.Base.Reg;			CalleeOp = CalleeAM.Base.Reg;
	} else			} else
	return false;			return false;

	// Issue the call.			// Issue the call.
	MachineInstrBuilder MIB;			MachineInstrBuilder MIB;
	if (CalleeOp) {			if (CalleeOp) {
	// Register-indirect call.			// Register-indirect call.
	unsigned CallOpc = Is64Bit ? X86::CALL64r : X86::CALL32r;			unsigned CallOpc = Is64Bit ? X86::CALL64r : X86::CALL32r;
	MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(CallOpc))			MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(CallOpc))
	.addReg(CalleeOp);			.addReg(CalleeOp);
	} else {			} else {
	// Direct call.			// Direct call.
	assert(GV && "Not a direct call");			assert(GV && "Not a direct call");
	unsigned CallOpc = Is64Bit ? X86::CALL64pcrel32 : X86::CALLpcrel32;			unsigned CallOpc = Is64Bit ? X86::CALL64pcrel32 : X86::CALLpcrel32;

	// See if we need any target-specific flags on the GV operand.			// See if we need any target-specific flags on the GV operand.
	unsigned char OpFlags = 0;			unsigned char OpFlags = 0;

	// On ELF targets, in both X86-64 and X86-32 mode, direct calls to			// On ELF targets, in both X86-64 and X86-32 mode, direct calls to
	// external symbols must go through the PLT in PIC mode. If the symbol			// external symbols must go through the PLT in PIC mode. If the symbol
	// has hidden or protected visibility, or if it is static or local, then			// has hidden or protected visibility, or if it is static or local, then
	// we don't need to use the PLT - we can directly call it.			// we don't need to use the PLT - we can directly call it.
	if (Subtarget->isTargetELF() &&			if (Subtarget->isTargetELF() &&
	TM.getRelocationModel() == Reloc::PIC_ &&			TM.getRelocationModel() == Reloc::PIC_ &&
	GV->hasDefaultVisibility() && !GV->hasLocalLinkage()) {			GV->hasDefaultVisibility() && !GV->hasLocalLinkage()) {
	OpFlags = X86II::MO_PLT;			OpFlags = X86II::MO_PLT;
	} else if (Subtarget->isPICStyleStubAny() &&			} else if (Subtarget->isPICStyleStubAny() &&
	(GV->isDeclaration() || GV->isWeakForLinker()) &&			(GV->isDeclaration() || GV->isWeakForLinker()) &&
	(!Subtarget->getTargetTriple().isMacOSX() ||			(!Subtarget->getTargetTriple().isMacOSX() ||
	Subtarget->getTargetTriple().isMacOSXVersionLT(10, 5))) {			Subtarget->getTargetTriple().isMacOSXVersionLT(10, 5))) {
	// PC-relative references to external symbols should go through $stub,			// PC-relative references to external symbols should go through $stub,
	// unless we're building with the leopard linker or later, which			// unless we're building with the leopard linker or later, which
	// automatically synthesizes these stubs.			// automatically synthesizes these stubs.
	OpFlags = X86II::MO_DARWIN_STUB;			OpFlags = X86II::MO_DARWIN_STUB;
	}			}

	MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(CallOpc));			MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(CallOpc));
	if (SymName)			if (SymName)
	MIB.addExternalSymbol(SymName, OpFlags);			MIB.addExternalSymbol(SymName, OpFlags);
	else			else
	MIB.addGlobalAddress(GV, 0, OpFlags);			MIB.addGlobalAddress(GV, 0, OpFlags);
	}			}

	// Add a register mask operand representing the call-preserved registers.			// Add a register mask operand representing the call-preserved registers.
	// Proper defs for return values will be added by setPhysRegsDeadExcept().			// Proper defs for return values will be added by setPhysRegsDeadExcept().
	MIB.addRegMask(TRI.getCallPreservedMask(CC));			MIB.addRegMask(TRI.getCallPreservedMask(CC));

	// Add an implicit use GOT pointer in EBX.			// Add an implicit use GOT pointer in EBX.
	if (Subtarget->isPICStyleGOT())			if (Subtarget->isPICStyleGOT())
	MIB.addReg(X86::EBX, RegState::Implicit);			MIB.addReg(X86::EBX, RegState::Implicit);

	if (Is64Bit && IsVarArg && !IsWin64)			if (Is64Bit && IsVarArg && !IsWin64)
	MIB.addReg(X86::AL, RegState::Implicit);			MIB.addReg(X86::AL, RegState::Implicit);

	// Add implicit physical register uses to the call.			// Add implicit physical register uses to the call.
	for (auto Reg : OutRegs)			for (auto Reg : OutRegs)
	MIB.addReg(Reg, RegState::Implicit);			MIB.addReg(Reg, RegState::Implicit);

	// Issue CALLSEQ_END			// Issue CALLSEQ_END
	unsigned NumBytesForCalleeToPop =			unsigned NumBytesForCalleeToPop =
	computeBytesPoppedByCallee(Subtarget, CC, CLI.CS);			computeBytesPoppedByCallee(Subtarget, CC, CLI.CS);
	unsigned AdjStackUp = TII.getCallFrameDestroyOpcode();			unsigned AdjStackUp = TII.getCallFrameDestroyOpcode();
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(AdjStackUp))			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(AdjStackUp))
	.addImm(NumBytes).addImm(NumBytesForCalleeToPop);			.addImm(NumBytes).addImm(NumBytesForCalleeToPop);

	// Now handle call return values.			// Now handle call return values.
	SmallVector<CCValAssign, 16> RVLocs;			SmallVector<CCValAssign, 16> RVLocs;
	CCState CCRetInfo(CC, IsVarArg, *FuncInfo.MF, RVLocs,			CCState CCRetInfo(CC, IsVarArg, *FuncInfo.MF, RVLocs,
	CLI.RetTy->getContext());			CLI.RetTy->getContext());
	CCRetInfo.AnalyzeCallResult(Ins, RetCC_X86);			CCRetInfo.AnalyzeCallResult(Ins, RetCC_X86);

	// Copy all of the result registers out of their specified physreg.			// Copy all of the result registers out of their specified physreg.
	unsigned ResultReg = FuncInfo.CreateRegs(CLI.RetTy);			unsigned ResultReg = FuncInfo.CreateRegs(CLI.RetTy);
	for (unsigned i = 0; i != RVLocs.size(); ++i) {			for (unsigned i = 0; i != RVLocs.size(); ++i) {
	CCValAssign &VA = RVLocs[i];			CCValAssign &VA = RVLocs[i];
	EVT CopyVT = VA.getValVT();			EVT CopyVT = VA.getValVT();
	unsigned CopyReg = ResultReg + i;			unsigned CopyReg = ResultReg + i;

	// If this is x86-64, and we disabled SSE, we can't return FP values			// If this is x86-64, and we disabled SSE, we can't return FP values
	if ((CopyVT == MVT::f32 || CopyVT == MVT::f64) &&			if ((CopyVT == MVT::f32 || CopyVT == MVT::f64) &&
	((Is64Bit || Ins[i].Flags.isInReg()) && !Subtarget->hasSSE1())) {			((Is64Bit || Ins[i].Flags.isInReg()) && !Subtarget->hasSSE1())) {
	report_fatal_error("SSE register return with SSE disabled");			report_fatal_error("SSE register return with SSE disabled");
	}			}

	// If we prefer to use the value in xmm registers, copy it out as f80 and			// If we prefer to use the value in xmm registers, copy it out as f80 and
	// use a truncate to move it from fp stack reg to xmm reg.			// use a truncate to move it from fp stack reg to xmm reg.
	if ((VA.getLocReg() == X86::FP0 || VA.getLocReg() == X86::FP1) &&			if ((VA.getLocReg() == X86::FP0 || VA.getLocReg() == X86::FP1) &&
	isScalarFPTypeInSSEReg(VA.getValVT())) {			isScalarFPTypeInSSEReg(VA.getValVT())) {
	CopyVT = MVT::f80;			CopyVT = MVT::f80;
	CopyReg = createResultReg(&X86::RFP80RegClass);			CopyReg = createResultReg(&X86::RFP80RegClass);
	}			}

	// Copy out the result.			// Copy out the result.
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::COPY), CopyReg).addReg(VA.getLocReg());			TII.get(TargetOpcode::COPY), CopyReg).addReg(VA.getLocReg());
	InRegs.push_back(VA.getLocReg());			InRegs.push_back(VA.getLocReg());

	// Round the f80 to the right size, which also moves it to the appropriate			// Round the f80 to the right size, which also moves it to the appropriate
	// xmm register. This is accomplished by storing the f80 value in memory			// xmm register. This is accomplished by storing the f80 value in memory
	// and then loading it back.			// and then loading it back.
	if (CopyVT != VA.getValVT()) {			if (CopyVT != VA.getValVT()) {
	EVT ResVT = VA.getValVT();			EVT ResVT = VA.getValVT();
	unsigned Opc = ResVT == MVT::f32 ? X86::ST_Fp80m32 : X86::ST_Fp80m64;			unsigned Opc = ResVT == MVT::f32 ? X86::ST_Fp80m32 : X86::ST_Fp80m64;
	unsigned MemSize = ResVT.getSizeInBits()/8;			unsigned MemSize = ResVT.getSizeInBits()/8;
	int FI = MFI.CreateStackObject(MemSize, MemSize, false);			int FI = MFI.CreateStackObject(MemSize, MemSize, false);
	addFrameReference(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			addFrameReference(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Opc)), FI)			TII.get(Opc)), FI)
	.addReg(CopyReg);			.addReg(CopyReg);
	Opc = ResVT == MVT::f32 ? X86::MOVSSrm : X86::MOVSDrm;			Opc = ResVT == MVT::f32 ? X86::MOVSSrm : X86::MOVSDrm;
	addFrameReference(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			addFrameReference(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Opc), ResultReg + i), FI);			TII.get(Opc), ResultReg + i), FI);
	}			}
	}			}

	CLI.ResultReg = ResultReg;			CLI.ResultReg = ResultReg;
	CLI.NumResultRegs = RVLocs.size();			CLI.NumResultRegs = RVLocs.size();
	CLI.Call = MIB;			CLI.Call = MIB;

	return true;			return true;
	}			}

	bool			bool
	X86FastISel::fastSelectInstruction(const Instruction *I) {			X86FastISel::fastSelectInstruction(const Instruction *I) {
	switch (I->getOpcode()) {			switch (I->getOpcode()) {
	default: break;			default: break;
	case Instruction::Load:			case Instruction::Load:
	return X86SelectLoad(I);			return X86SelectLoad(I);
	case Instruction::Store:			case Instruction::Store:
	return X86SelectStore(I);			return X86SelectStore(I);
	case Instruction::Ret:			case Instruction::Ret:
	return X86SelectRet(I);			return X86SelectRet(I);
	case Instruction::ICmp:			case Instruction::ICmp:
	case Instruction::FCmp:			case Instruction::FCmp:
	return X86SelectCmp(I);			return X86SelectCmp(I);
	case Instruction::ZExt:			case Instruction::ZExt:
	return X86SelectZExt(I);			return X86SelectZExt(I);
	case Instruction::Br:			case Instruction::Br:
	return X86SelectBranch(I);			return X86SelectBranch(I);
	case Instruction::LShr:			case Instruction::LShr:
	case Instruction::AShr:			case Instruction::AShr:
	case Instruction::Shl:			case Instruction::Shl:
	return X86SelectShift(I);			return X86SelectShift(I);
	case Instruction::SDiv:			case Instruction::SDiv:
	case Instruction::UDiv:			case Instruction::UDiv:
	case Instruction::SRem:			case Instruction::SRem:
	case Instruction::URem:			case Instruction::URem:
	return X86SelectDivRem(I);			return X86SelectDivRem(I);
	case Instruction::Select:			case Instruction::Select:
	return X86SelectSelect(I);			return X86SelectSelect(I);
	case Instruction::Trunc:			case Instruction::Trunc:
	return X86SelectTrunc(I);			return X86SelectTrunc(I);
	case Instruction::FPExt:			case Instruction::FPExt:
	return X86SelectFPExt(I);			return X86SelectFPExt(I);
	case Instruction::FPTrunc:			case Instruction::FPTrunc:
	return X86SelectFPTrunc(I);			return X86SelectFPTrunc(I);
	case Instruction::IntToPtr: // Deliberate fall-through.			case Instruction::IntToPtr: // Deliberate fall-through.
	case Instruction::PtrToInt: {			case Instruction::PtrToInt: {
	EVT SrcVT = TLI.getValueType(I->getOperand(0)->getType());			EVT SrcVT = TLI.getValueType(I->getOperand(0)->getType());
	EVT DstVT = TLI.getValueType(I->getType());			EVT DstVT = TLI.getValueType(I->getType());
	if (DstVT.bitsGT(SrcVT))			if (DstVT.bitsGT(SrcVT))
	return X86SelectZExt(I);			return X86SelectZExt(I);
	if (DstVT.bitsLT(SrcVT))			if (DstVT.bitsLT(SrcVT))
	return X86SelectTrunc(I);			return X86SelectTrunc(I);
	unsigned Reg = getRegForValue(I->getOperand(0));			unsigned Reg = getRegForValue(I->getOperand(0));
	if (Reg == 0) return false;			if (Reg == 0) return false;
	updateValueMap(I, Reg);			updateValueMap(I, Reg);
	return true;			return true;
	}			}
	}			}

	return false;			return false;
	}			}

	unsigned X86FastISel::X86MaterializeInt(const ConstantInt *CI, MVT VT) {			unsigned X86FastISel::X86MaterializeInt(const ConstantInt *CI, MVT VT) {
	if (VT > MVT::i64)			if (VT > MVT::i64)
	return 0;			return 0;

	uint64_t Imm = CI->getZExtValue();			uint64_t Imm = CI->getZExtValue();
	if (Imm == 0) {			if (Imm == 0) {
	unsigned SrcReg = fastEmitInst_(X86::MOV32r0, &X86::GR32RegClass);			unsigned SrcReg = fastEmitInst_(X86::MOV32r0, &X86::GR32RegClass);
	switch (VT.SimpleTy) {			switch (VT.SimpleTy) {
	default: llvm_unreachable("Unexpected value type");			default: llvm_unreachable("Unexpected value type");
	case MVT::i1:			case MVT::i1:
	case MVT::i8:			case MVT::i8:
	return fastEmitInst_extractsubreg(MVT::i8, SrcReg, /*Kill=*/true,			return fastEmitInst_extractsubreg(MVT::i8, SrcReg, /*Kill=*/true,
	X86::sub_8bit);			X86::sub_8bit);
	case MVT::i16:			case MVT::i16:
	return fastEmitInst_extractsubreg(MVT::i16, SrcReg, /*Kill=*/true,			return fastEmitInst_extractsubreg(MVT::i16, SrcReg, /*Kill=*/true,
	X86::sub_16bit);			X86::sub_16bit);
	case MVT::i32:			case MVT::i32:
	return SrcReg;			return SrcReg;
	case MVT::i64: {			case MVT::i64: {
	unsigned ResultReg = createResultReg(&X86::GR64RegClass);			unsigned ResultReg = createResultReg(&X86::GR64RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::SUBREG_TO_REG), ResultReg)			TII.get(TargetOpcode::SUBREG_TO_REG), ResultReg)
	.addImm(0).addReg(SrcReg).addImm(X86::sub_32bit);			.addImm(0).addReg(SrcReg).addImm(X86::sub_32bit);
	return ResultReg;			return ResultReg;
	}			}
	}			}
	}			}

	unsigned Opc = 0;			unsigned Opc = 0;
	switch (VT.SimpleTy) {			switch (VT.SimpleTy) {
	default: llvm_unreachable("Unexpected value type");			default: llvm_unreachable("Unexpected value type");
	case MVT::i1: VT = MVT::i8; // fall-through			case MVT::i1: VT = MVT::i8; // fall-through
	case MVT::i8: Opc = X86::MOV8ri; break;			case MVT::i8: Opc = X86::MOV8ri; break;
	case MVT::i16: Opc = X86::MOV16ri; break;			case MVT::i16: Opc = X86::MOV16ri; break;
	case MVT::i32: Opc = X86::MOV32ri; break;			case MVT::i32: Opc = X86::MOV32ri; break;
	case MVT::i64: {			case MVT::i64: {
	if (isUInt<32>(Imm))			if (isUInt<32>(Imm))
	Opc = X86::MOV32ri;			Opc = X86::MOV32ri;
	else if (isInt<32>(Imm))			else if (isInt<32>(Imm))
	Opc = X86::MOV64ri32;			Opc = X86::MOV64ri32;
	else			else
	Opc = X86::MOV64ri;			Opc = X86::MOV64ri;
	break;			break;
	}			}
	}			}
	if (VT == MVT::i64 && Opc == X86::MOV32ri) {			if (VT == MVT::i64 && Opc == X86::MOV32ri) {
	unsigned SrcReg = fastEmitInst_i(Opc, &X86::GR32RegClass, Imm);			unsigned SrcReg = fastEmitInst_i(Opc, &X86::GR32RegClass, Imm);
	unsigned ResultReg = createResultReg(&X86::GR64RegClass);			unsigned ResultReg = createResultReg(&X86::GR64RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(TargetOpcode::SUBREG_TO_REG), ResultReg)			TII.get(TargetOpcode::SUBREG_TO_REG), ResultReg)
	.addImm(0).addReg(SrcReg).addImm(X86::sub_32bit);			.addImm(0).addReg(SrcReg).addImm(X86::sub_32bit);
	return ResultReg;			return ResultReg;
	}			}
	return fastEmitInst_i(Opc, TLI.getRegClassFor(VT), Imm);			return fastEmitInst_i(Opc, TLI.getRegClassFor(VT), Imm);
	}			}

	unsigned X86FastISel::X86MaterializeFP(const ConstantFP *CFP, MVT VT) {			unsigned X86FastISel::X86MaterializeFP(const ConstantFP *CFP, MVT VT) {
	if (CFP->isNullValue())			if (CFP->isNullValue())
	return fastMaterializeFloatZero(CFP);			return fastMaterializeFloatZero(CFP);

	// Can't handle alternate code models yet.			// Can't handle alternate code models yet.
	CodeModel::Model CM = TM.getCodeModel();			CodeModel::Model CM = TM.getCodeModel();
	if (CM != CodeModel::Small && CM != CodeModel::Large)			if (CM != CodeModel::Small && CM != CodeModel::Large)
	return 0;			return 0;

	// Get opcode and regclass of the output for the given load instruction.			// Get opcode and regclass of the output for the given load instruction.
	unsigned Opc = 0;			unsigned Opc = 0;
	const TargetRegisterClass *RC = nullptr;			const TargetRegisterClass *RC = nullptr;
	switch (VT.SimpleTy) {			switch (VT.SimpleTy) {
	default: return 0;			default: return 0;
	case MVT::f32:			case MVT::f32:
	if (X86ScalarSSEf32) {			if (X86ScalarSSEf32) {
	Opc = Subtarget->hasAVX() ? X86::VMOVSSrm : X86::MOVSSrm;			Opc = Subtarget->hasAVX() ? X86::VMOVSSrm : X86::MOVSSrm;
	RC = &X86::FR32RegClass;			RC = &X86::FR32RegClass;
	} else {			} else {
	Opc = X86::LD_Fp32m;			Opc = X86::LD_Fp32m;
	RC = &X86::RFP32RegClass;			RC = &X86::RFP32RegClass;
	}			}
	break;			break;
	case MVT::f64:			case MVT::f64:
	if (X86ScalarSSEf64) {			if (X86ScalarSSEf64) {
	Opc = Subtarget->hasAVX() ? X86::VMOVSDrm : X86::MOVSDrm;			Opc = Subtarget->hasAVX() ? X86::VMOVSDrm : X86::MOVSDrm;
	RC = &X86::FR64RegClass;			RC = &X86::FR64RegClass;
	} else {			} else {
	Opc = X86::LD_Fp64m;			Opc = X86::LD_Fp64m;
	RC = &X86::RFP64RegClass;			RC = &X86::RFP64RegClass;
	}			}
	break;			break;
	case MVT::f80:			case MVT::f80:
	// No f80 support yet.			// No f80 support yet.
	return 0;			return 0;
	}			}

	// MachineConstantPool wants an explicit alignment.			// MachineConstantPool wants an explicit alignment.
	unsigned Align = DL.getPrefTypeAlignment(CFP->getType());			unsigned Align = DL.getPrefTypeAlignment(CFP->getType());
	if (Align == 0) {			if (Align == 0) {
	// Alignment of vector types. FIXME!			// Alignment of vector types. FIXME!
	Align = DL.getTypeAllocSize(CFP->getType());			Align = DL.getTypeAllocSize(CFP->getType());
	}			}

	// x86-32 PIC requires a PIC base register for constant pools.			// x86-32 PIC requires a PIC base register for constant pools.
	unsigned PICBase = 0;			unsigned PICBase = 0;
	unsigned char OpFlag = 0;			unsigned char OpFlag = 0;
	if (Subtarget->isPICStyleStubPIC()) { // Not dynamic-no-pic			if (Subtarget->isPICStyleStubPIC()) { // Not dynamic-no-pic
	OpFlag = X86II::MO_PIC_BASE_OFFSET;			OpFlag = X86II::MO_PIC_BASE_OFFSET;
	PICBase = getInstrInfo()->getGlobalBaseReg(FuncInfo.MF);			PICBase = getInstrInfo()->getGlobalBaseReg(FuncInfo.MF);
	} else if (Subtarget->isPICStyleGOT()) {			} else if (Subtarget->isPICStyleGOT()) {
	OpFlag = X86II::MO_GOTOFF;			OpFlag = X86II::MO_GOTOFF;
	PICBase = getInstrInfo()->getGlobalBaseReg(FuncInfo.MF);			PICBase = getInstrInfo()->getGlobalBaseReg(FuncInfo.MF);
	} else if (Subtarget->isPICStyleRIPRel() &&			} else if (Subtarget->isPICStyleRIPRel() &&
	TM.getCodeModel() == CodeModel::Small) {			TM.getCodeModel() == CodeModel::Small) {
	PICBase = X86::RIP;			PICBase = X86::RIP;
	}			}

	// Create the load from the constant pool.			// Create the load from the constant pool.
	unsigned CPI = MCP.getConstantPoolIndex(CFP, Align);			unsigned CPI = MCP.getConstantPoolIndex(CFP, Align);
	unsigned ResultReg = createResultReg(RC);			unsigned ResultReg = createResultReg(RC);

	if (CM == CodeModel::Large) {			if (CM == CodeModel::Large) {
	unsigned AddrReg = createResultReg(&X86::GR64RegClass);			unsigned AddrReg = createResultReg(&X86::GR64RegClass);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::MOV64ri),			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::MOV64ri),
	AddrReg)			AddrReg)
	.addConstantPoolIndex(CPI, 0, OpFlag);			.addConstantPoolIndex(CPI, 0, OpFlag);
	MachineInstrBuilder MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			MachineInstrBuilder MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Opc), ResultReg);			TII.get(Opc), ResultReg);
	addDirectMem(MIB, AddrReg);			addDirectMem(MIB, AddrReg);
	MachineMemOperand *MMO = FuncInfo.MF->getMachineMemOperand(			MachineMemOperand *MMO = FuncInfo.MF->getMachineMemOperand(
	MachinePointerInfo::getConstantPool(), MachineMemOperand::MOLoad,			MachinePointerInfo::getConstantPool(), MachineMemOperand::MOLoad,
	TM.getDataLayout()->getPointerSize(), Align);			TM.getDataLayout()->getPointerSize(), Align);
	MIB->addMemOperand(*FuncInfo.MF, MMO);			MIB->addMemOperand(*FuncInfo.MF, MMO);
	return ResultReg;			return ResultReg;
	}			}

	addConstantPoolReference(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			addConstantPoolReference(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Opc), ResultReg),			TII.get(Opc), ResultReg),
	CPI, PICBase, OpFlag);			CPI, PICBase, OpFlag);
	return ResultReg;			return ResultReg;
	}			}

	unsigned X86FastISel::X86MaterializeGV(const GlobalValue *GV, MVT VT) {			unsigned X86FastISel::X86MaterializeGV(const GlobalValue *GV, MVT VT) {
	// Can't handle alternate code models yet.			// Can't handle alternate code models yet.
	if (TM.getCodeModel() != CodeModel::Small)			if (TM.getCodeModel() != CodeModel::Small)
	return 0;			return 0;

	// Materialize addresses with LEA/MOV instructions.			// Materialize addresses with LEA/MOV instructions.
	X86AddressMode AM;			X86AddressMode AM;
	if (X86SelectAddress(GV, AM)) {			if (X86SelectAddress(GV, AM)) {
	// If the expression is just a basereg, then we're done, otherwise we need			// If the expression is just a basereg, then we're done, otherwise we need
	// to emit an LEA.			// to emit an LEA.
	if (AM.BaseType == X86AddressMode::RegBase &&			if (AM.BaseType == X86AddressMode::RegBase &&
	AM.IndexReg == 0 && AM.Disp == 0 && AM.GV == nullptr)			AM.IndexReg == 0 && AM.Disp == 0 && AM.GV == nullptr)
	return AM.Base.Reg;			return AM.Base.Reg;

	unsigned ResultReg = createResultReg(TLI.getRegClassFor(VT));			unsigned ResultReg = createResultReg(TLI.getRegClassFor(VT));
	if (TM.getRelocationModel() == Reloc::Static &&			if (TM.getRelocationModel() == Reloc::Static &&
	TLI.getPointerTy() == MVT::i64) {			TLI.getPointerTy() == MVT::i64) {
	// The displacement code could be more than 32 bits away so we need to use			// The displacement code could be more than 32 bits away so we need to use
	// an instruction with a 64 bit immediate			// an instruction with a 64 bit immediate
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::MOV64ri),			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(X86::MOV64ri),
	ResultReg)			ResultReg)
	.addGlobalAddress(GV);			.addGlobalAddress(GV);
	} else {			} else {
	unsigned Opc = TLI.getPointerTy() == MVT::i32			unsigned Opc = TLI.getPointerTy() == MVT::i32
	? (Subtarget->isTarget64BitILP32()			? (Subtarget->isTarget64BitILP32()
	? X86::LEA64_32r : X86::LEA32r)			? X86::LEA64_32r : X86::LEA32r)
	: X86::LEA64r;			: X86::LEA64r;
	addFullAddress(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			addFullAddress(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Opc), ResultReg), AM);			TII.get(Opc), ResultReg), AM);
	}			}
	return ResultReg;			return ResultReg;
	}			}
	return 0;			return 0;
	}			}

	unsigned X86FastISel::fastMaterializeConstant(const Constant *C) {			unsigned X86FastISel::fastMaterializeConstant(const Constant *C) {
	EVT CEVT = TLI.getValueType(C->getType(), true);			EVT CEVT = TLI.getValueType(C->getType(), true);

	// Only handle simple types.			// Only handle simple types.
	if (!CEVT.isSimple())			if (!CEVT.isSimple())
	return 0;			return 0;
	MVT VT = CEVT.getSimpleVT();			MVT VT = CEVT.getSimpleVT();

	if (const auto *CI = dyn_cast<ConstantInt>(C))			if (const auto *CI = dyn_cast<ConstantInt>(C))
	return X86MaterializeInt(CI, VT);			return X86MaterializeInt(CI, VT);
	else if (const ConstantFP *CFP = dyn_cast<ConstantFP>(C))			else if (const ConstantFP *CFP = dyn_cast<ConstantFP>(C))
	return X86MaterializeFP(CFP, VT);			return X86MaterializeFP(CFP, VT);
	else if (const GlobalValue *GV = dyn_cast<GlobalValue>(C))			else if (const GlobalValue *GV = dyn_cast<GlobalValue>(C))
	return X86MaterializeGV(GV, VT);			return X86MaterializeGV(GV, VT);

	return 0;			return 0;
	}			}

	unsigned X86FastISel::fastMaterializeAlloca(const AllocaInst *C) {			unsigned X86FastISel::fastMaterializeAlloca(const AllocaInst *C) {
	// Fail on dynamic allocas. At this point, getRegForValue has already			// Fail on dynamic allocas. At this point, getRegForValue has already
	// checked its CSE maps, so if we're here trying to handle a dynamic			// checked its CSE maps, so if we're here trying to handle a dynamic
	// alloca, we're not going to succeed. X86SelectAddress has a			// alloca, we're not going to succeed. X86SelectAddress has a
	// check for dynamic allocas, because it's called directly from			// check for dynamic allocas, because it's called directly from
	// various places, but targetMaterializeAlloca also needs a check			// various places, but targetMaterializeAlloca also needs a check
	// in order to avoid recursion between getRegForValue,			// in order to avoid recursion between getRegForValue,
	// X86SelectAddrss, and targetMaterializeAlloca.			// X86SelectAddrss, and targetMaterializeAlloca.
	if (!FuncInfo.StaticAllocaMap.count(C))			if (!FuncInfo.StaticAllocaMap.count(C))
	return 0;			return 0;
	assert(C->isStaticAlloca() && "dynamic alloca in the static alloca map?");			assert(C->isStaticAlloca() && "dynamic alloca in the static alloca map?");

	X86AddressMode AM;			X86AddressMode AM;
	if (!X86SelectAddress(C, AM))			if (!X86SelectAddress(C, AM))
	return 0;			return 0;
	unsigned Opc = TLI.getPointerTy() == MVT::i32			unsigned Opc = TLI.getPointerTy() == MVT::i32
	? (Subtarget->isTarget64BitILP32()			? (Subtarget->isTarget64BitILP32()
	? X86::LEA64_32r : X86::LEA32r)			? X86::LEA64_32r : X86::LEA32r)
	: X86::LEA64r;			: X86::LEA64r;
	const TargetRegisterClass* RC = TLI.getRegClassFor(TLI.getPointerTy());			const TargetRegisterClass* RC = TLI.getRegClassFor(TLI.getPointerTy());
	unsigned ResultReg = createResultReg(RC);			unsigned ResultReg = createResultReg(RC);
	addFullAddress(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			addFullAddress(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(Opc), ResultReg), AM);			TII.get(Opc), ResultReg), AM);
	return ResultReg;			return ResultReg;
	}			}

	unsigned X86FastISel::fastMaterializeFloatZero(const ConstantFP *CF) {			unsigned X86FastISel::fastMaterializeFloatZero(const ConstantFP *CF) {
	MVT VT;			MVT VT;
	if (!isTypeLegal(CF->getType(), VT))			if (!isTypeLegal(CF->getType(), VT))
	return 0;			return 0;

	// Get opcode and regclass for the given zero.			// Get opcode and regclass for the given zero.
	unsigned Opc = 0;			unsigned Opc = 0;
	const TargetRegisterClass *RC = nullptr;			const TargetRegisterClass *RC = nullptr;
	switch (VT.SimpleTy) {			switch (VT.SimpleTy) {
	default: return 0;			default: return 0;
	case MVT::f32:			case MVT::f32:
	if (X86ScalarSSEf32) {			if (X86ScalarSSEf32) {
	Opc = X86::FsFLD0SS;			Opc = X86::FsFLD0SS;
	RC = &X86::FR32RegClass;			RC = &X86::FR32RegClass;
	} else {			} else {
	Opc = X86::LD_Fp032;			Opc = X86::LD_Fp032;
	RC = &X86::RFP32RegClass;			RC = &X86::RFP32RegClass;
	}			}
	break;			break;
	case MVT::f64:			case MVT::f64:
	if (X86ScalarSSEf64) {			if (X86ScalarSSEf64) {
	Opc = X86::FsFLD0SD;			Opc = X86::FsFLD0SD;
	RC = &X86::FR64RegClass;			RC = &X86::FR64RegClass;
	} else {			} else {
	Opc = X86::LD_Fp064;			Opc = X86::LD_Fp064;
	RC = &X86::RFP64RegClass;			RC = &X86::RFP64RegClass;
	}			}
	break;			break;
	case MVT::f80:			case MVT::f80:
	// No f80 support yet.			// No f80 support yet.
	return 0;			return 0;
	}			}

	unsigned ResultReg = createResultReg(RC);			unsigned ResultReg = createResultReg(RC);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg);			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg);
	return ResultReg;			return ResultReg;
	}			}


	bool X86FastISel::tryToFoldLoadIntoMI(MachineInstr *MI, unsigned OpNo,			bool X86FastISel::tryToFoldLoadIntoMI(MachineInstr *MI, unsigned OpNo,
	const LoadInst *LI) {			const LoadInst *LI) {
	const Value *Ptr = LI->getPointerOperand();			const Value *Ptr = LI->getPointerOperand();
	X86AddressMode AM;			X86AddressMode AM;
	if (!X86SelectAddress(Ptr, AM))			if (!X86SelectAddress(Ptr, AM))
	return false;			return false;

	const X86InstrInfo &XII = (const X86InstrInfo &)TII;			const X86InstrInfo &XII = (const X86InstrInfo &)TII;

	unsigned Size = DL.getTypeAllocSize(LI->getType());			unsigned Size = DL.getTypeAllocSize(LI->getType());
	unsigned Alignment = LI->getAlignment();			unsigned Alignment = LI->getAlignment();

	if (Alignment == 0) // Ensure that codegen never sees alignment 0			if (Alignment == 0) // Ensure that codegen never sees alignment 0
	Alignment = DL.getABITypeAlignment(LI->getType());			Alignment = DL.getABITypeAlignment(LI->getType());

	SmallVector<MachineOperand, 8> AddrOps;			SmallVector<MachineOperand, 8> AddrOps;
	AM.getFullAddress(AddrOps);			AM.getFullAddress(AddrOps);

	MachineInstr *Result =			MachineInstr *Result =
	XII.foldMemoryOperandImpl(*FuncInfo.MF, MI, OpNo, AddrOps,			XII.foldMemoryOperandImpl(*FuncInfo.MF, MI, OpNo, AddrOps,
	Size, Alignment, /*AllowCommute=*/true);			Size, Alignment, /*AllowCommute=*/true);
	if (!Result)			if (!Result)
	return false;			return false;

	Result->addMemOperand(*FuncInfo.MF, createMachineMemOperandFor(LI));			Result->addMemOperand(*FuncInfo.MF, createMachineMemOperandFor(LI));
	FuncInfo.MBB->insert(FuncInfo.InsertPt, Result);			FuncInfo.MBB->insert(FuncInfo.InsertPt, Result);
	MI->eraseFromParent();			MI->eraseFromParent();
	return true;			return true;
	}			}


	namespace llvm {			namespace llvm {
	FastISel *X86::createFastISel(FunctionLoweringInfo &funcInfo,			FastISel *X86::createFastISel(FunctionLoweringInfo &funcInfo,
	const TargetLibraryInfo *libInfo) {			const TargetLibraryInfo *libInfo) {
	return new X86FastISel(funcInfo, libInfo);			return new X86FastISel(funcInfo, libInfo);
	}			}
	}			}

llvm/trunk/lib/Target/X86/X86FrameLowering.h

	//===-- X86TargetFrameLowering.h - Define frame lowering for X86 -*- C++ -*-==//			//===-- X86TargetFrameLowering.h - Define frame lowering for X86 -*- C++ -*-==//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This class implements X86-specific bits of TargetFrameLowering class.			// This class implements X86-specific bits of TargetFrameLowering class.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_LIB_TARGET_X86_X86FRAMELOWERING_H			#ifndef LLVM_LIB_TARGET_X86_X86FRAMELOWERING_H
	#define LLVM_LIB_TARGET_X86_X86FRAMELOWERING_H			#define LLVM_LIB_TARGET_X86_X86FRAMELOWERING_H

	#include "llvm/Target/TargetFrameLowering.h"			#include "llvm/Target/TargetFrameLowering.h"

	namespace llvm {			namespace llvm {

	class MCSymbol;			class MCSymbol;
	class X86TargetMachine;			class X86TargetMachine;
	class X86Subtarget;			class X86Subtarget;

	class X86FrameLowering : public TargetFrameLowering {			class X86FrameLowering : public TargetFrameLowering {
	public:			public:
	explicit X86FrameLowering(StackDirection D, unsigned StackAl, int LAO)			explicit X86FrameLowering(StackDirection D, unsigned StackAl, int LAO)
	: TargetFrameLowering(StackGrowsDown, StackAl, LAO) {}			: TargetFrameLowering(StackGrowsDown, StackAl, LAO) {}

	/// Emit a call to the target's stack probe function. This is required for all			/// Emit a call to the target's stack probe function. This is required for all
	/// large stack allocations on Windows. The caller is required to materialize			/// large stack allocations on Windows. The caller is required to materialize
	/// the number of bytes to probe in RAX/EAX.			/// the number of bytes to probe in RAX/EAX.
	static void emitStackProbeCall(MachineFunction &MF, MachineBasicBlock &MBB,			static void emitStackProbeCall(MachineFunction &MF, MachineBasicBlock &MBB,
	MachineBasicBlock::iterator MBBI, DebugLoc DL);			MachineBasicBlock::iterator MBBI, DebugLoc DL);

	void emitCalleeSavedFrameMoves(MachineBasicBlock &MBB,			void emitCalleeSavedFrameMoves(MachineBasicBlock &MBB,
	MachineBasicBlock::iterator MBBI,			MachineBasicBlock::iterator MBBI,
	DebugLoc DL) const;			DebugLoc DL) const;

	/// emitProlog/emitEpilog - These methods insert prolog and epilog code into			/// emitProlog/emitEpilog - These methods insert prolog and epilog code into
	/// the function.			/// the function.
	void emitPrologue(MachineFunction &MF) const override;			void emitPrologue(MachineFunction &MF) const override;
	void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;			void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;

	void adjustForSegmentedStacks(MachineFunction &MF) const override;			void adjustForSegmentedStacks(MachineFunction &MF) const override;

	void adjustForHiPEPrologue(MachineFunction &MF) const override;			void adjustForHiPEPrologue(MachineFunction &MF) const override;

	void processFunctionBeforeCalleeSavedScan(MachineFunction &MF,			void processFunctionBeforeCalleeSavedScan(MachineFunction &MF,
	RegScavenger *RS = nullptr) const override;			RegScavenger *RS = nullptr) const override;

	bool			bool
	assignCalleeSavedSpillSlots(MachineFunction &MF,			assignCalleeSavedSpillSlots(MachineFunction &MF,
	const TargetRegisterInfo *TRI,			const TargetRegisterInfo *TRI,
	std::vector<CalleeSavedInfo> &CSI) const override;			std::vector<CalleeSavedInfo> &CSI) const override;

	bool spillCalleeSavedRegisters(MachineBasicBlock &MBB,			bool spillCalleeSavedRegisters(MachineBasicBlock &MBB,
	MachineBasicBlock::iterator MI,			MachineBasicBlock::iterator MI,
	const std::vector<CalleeSavedInfo> &CSI,			const std::vector<CalleeSavedInfo> &CSI,
	const TargetRegisterInfo *TRI) const override;			const TargetRegisterInfo *TRI) const override;

	bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB,			bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
	MachineBasicBlock::iterator MI,			MachineBasicBlock::iterator MI,
	const std::vector<CalleeSavedInfo> &CSI,			const std::vector<CalleeSavedInfo> &CSI,
	const TargetRegisterInfo *TRI) const override;			const TargetRegisterInfo *TRI) const override;

	bool hasFP(const MachineFunction &MF) const override;			bool hasFP(const MachineFunction &MF) const override;
	bool hasReservedCallFrame(const MachineFunction &MF) const override;			bool hasReservedCallFrame(const MachineFunction &MF) const override;
				bool canSimplifyCallFramePseudos(const MachineFunction &MF) const override;
	int getFrameIndexOffset(const MachineFunction &MF, int FI) const override;			bool needsFrameIndexResolution(const MachineFunction &MF) const override;
	int getFrameIndexReference(const MachineFunction &MF, int FI,
	unsigned &FrameReg) const override;			int getFrameIndexOffset(const MachineFunction &MF, int FI) const override;
				int getFrameIndexReference(const MachineFunction &MF, int FI,
	int getFrameIndexOffsetFromSP(const MachineFunction &MF, int FI) const;			unsigned &FrameReg) const override;
	int getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI,
	unsigned &FrameReg) const override;			int getFrameIndexOffsetFromSP(const MachineFunction &MF, int FI) const;
				int getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI,
	void eliminateCallFramePseudoInstr(MachineFunction &MF,			unsigned &FrameReg) const override;
	MachineBasicBlock &MBB,
	MachineBasicBlock::iterator MI) const override;			void eliminateCallFramePseudoInstr(MachineFunction &MF,
				MachineBasicBlock &MBB,
	private:			MachineBasicBlock::iterator MI) const override;
	/// convertArgMovsToPushes - This method tries to convert a call sequence
	/// that uses sub and mov instructions to put the argument onto the stack			private:
	/// into a series of pushes.			/// convertArgMovsToPushes - This method tries to convert a call sequence
	/// Returns true if the transformation succeeded, false if not.			/// that uses sub and mov instructions to put the argument onto the stack
	bool convertArgMovsToPushes(MachineFunction &MF,			/// into a series of pushes.
	MachineBasicBlock &MBB,			/// Returns true if the transformation succeeded, false if not.
	MachineBasicBlock::iterator I,			bool convertArgMovsToPushes(MachineFunction &MF,
	uint64_t Amount) const;			MachineBasicBlock &MBB,
	};			MachineBasicBlock::iterator I,
				uint64_t Amount) const;
	} // End llvm namespace			};

	#endif			} // End llvm namespace

				#endif
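
The header above adds two overrides, canSimplifyCallFramePseudos and needsFrameIndexResolution, and the .cpp changes in this diff also make hasReservedCallFrame depend on whether the function has push sequences. As a rough, standalone sketch of how those three predicates relate, the snippet below models them over a plain struct. The FrameFacts struct, its field names, and checkPushSequenceCase are hypothetical stand-ins invented for illustration; the real hooks query MachineFunction, MachineFrameInfo, X86MachineFunctionInfo and X86RegisterInfo instead.

```cpp
#include <cassert>

// Hypothetical summary of the per-function facts the real hooks look up.
// None of these names exist in LLVM; they only mirror the queries in the diff.
struct FrameFacts {
  bool HasVarSizedObjects = false;    // MFI->hasVarSizedObjects()
  bool HasPushSequences = false;      // X86MachineFunctionInfo::getHasPushSequences()
  bool HasFP = false;                 // hasFP(MF)
  bool NeedsStackRealignment = false; // TRI->needsStackRealignment(MF)
  bool HasBasePointer = false;        // TRI->hasBasePointer(MF)
  bool HasStackObjects = false;       // MFI->hasStackObjects()
};

// A call frame can only be reserved if the stack pointer does not move
// between prologue and epilogue: no dynamic allocas and no push sequences.
bool hasReservedCallFrame(const FrameFacts &F) {
  return !F.HasVarSizedObjects && !F.HasPushSequences;
}

// Call frame pseudos can be simplified when the frame is reserved, or when
// a usable frame pointer or base pointer is available to address the stack.
bool canSimplifyCallFramePseudos(const FrameFacts &F) {
  return hasReservedCallFrame(F) ||
         (F.HasFP && !F.NeedsStackRealignment) || F.HasBasePointer;
}

// Frame index resolution also lowers leftover call frame setup/destroy
// pseudos, so it must run when push sequences exist even without stack objects.
bool needsFrameIndexResolution(const FrameFacts &F) {
  return F.HasStackObjects || F.HasPushSequences;
}

// Small self-check: a function with push sequences, no frame pointer and no
// stack objects.
void checkPushSequenceCase() {
  FrameFacts F;
  F.HasPushSequences = true;
  assert(!hasReservedCallFrame(F));
  assert(!canSimplifyCallFramePseudos(F));
  assert(needsFrameIndexResolution(F));
}
```

In this model, a function with push sequences and neither a frame pointer nor a base pointer keeps its call frame pseudos until frame index resolution, which is why needsFrameIndexResolution has to return true for it even when there are no stack objects.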

llvm/trunk/lib/Target/X86/X86FrameLowering.cpp

	//===-- X86FrameLowering.cpp - X86 Frame Information ----------------------===//			//===-- X86FrameLowering.cpp - X86 Frame Information ----------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file contains the X86 implementation of TargetFrameLowering class.			// This file contains the X86 implementation of TargetFrameLowering class.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "X86FrameLowering.h"			#include "X86FrameLowering.h"
	#include "X86InstrBuilder.h"			#include "X86InstrBuilder.h"
	#include "X86InstrInfo.h"			#include "X86InstrInfo.h"
	#include "X86MachineFunctionInfo.h"			#include "X86MachineFunctionInfo.h"
	#include "X86Subtarget.h"			#include "X86Subtarget.h"
	#include "X86TargetMachine.h"			#include "X86TargetMachine.h"
	#include "llvm/ADT/SmallSet.h"			#include "llvm/ADT/SmallSet.h"
	#include "llvm/CodeGen/MachineFrameInfo.h"			#include "llvm/CodeGen/MachineFrameInfo.h"
	#include "llvm/CodeGen/MachineFunction.h"			#include "llvm/CodeGen/MachineFunction.h"
	#include "llvm/CodeGen/MachineInstrBuilder.h"			#include "llvm/CodeGen/MachineInstrBuilder.h"
	#include "llvm/CodeGen/MachineModuleInfo.h"			#include "llvm/CodeGen/MachineModuleInfo.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"
	#include "llvm/IR/DataLayout.h"			#include "llvm/IR/DataLayout.h"
	#include "llvm/IR/Function.h"			#include "llvm/IR/Function.h"
	#include "llvm/MC/MCAsmInfo.h"			#include "llvm/MC/MCAsmInfo.h"
	#include "llvm/MC/MCSymbol.h"			#include "llvm/MC/MCSymbol.h"
	#include "llvm/Support/CommandLine.h"			#include "llvm/Support/CommandLine.h"
	#include "llvm/Target/TargetOptions.h"			#include "llvm/Target/TargetOptions.h"
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"
	#include <cstdlib>			#include <cstdlib>

	using namespace llvm;			using namespace llvm;

	// FIXME: completely move here.			// FIXME: completely move here.
	extern cl::opt<bool> ForceStackAlign;			extern cl::opt<bool> ForceStackAlign;

	bool X86FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {			bool X86FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {
	return !MF.getFrameInfo()->hasVarSizedObjects();			return !MF.getFrameInfo()->hasVarSizedObjects() &&
	}			!MF.getInfo<X86MachineFunctionInfo>()->getHasPushSequences();
				}
	/// hasFP - Return true if the specified function should have a dedicated frame
	/// pointer register. This is true if the function has variable sized allocas			/// canSimplifyCallFramePseudos - If there is a reserved call frame, the
	/// or if frame pointer elimination is disabled.			/// call frame pseudos can be simplified. Having a FP, as in the default
	bool X86FrameLowering::hasFP(const MachineFunction &MF) const {			/// implementation, is not sufficient here since we can't always use it.
	const MachineFrameInfo *MFI = MF.getFrameInfo();			/// Use a more nuanced condition.
	const MachineModuleInfo &MMI = MF.getMMI();			bool
	const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();			X86FrameLowering::canSimplifyCallFramePseudos(const MachineFunction &MF) const {
				const X86RegisterInfo *TRI = static_cast<const X86RegisterInfo *>
	return (MF.getTarget().Options.DisableFramePointerElim(MF) \|\|			(MF.getSubtarget().getRegisterInfo());
	RegInfo->needsStackRealignment(MF) \|\|			return hasReservedCallFrame(MF) \|\|
	MFI->hasVarSizedObjects() \|\|			(hasFP(MF) && !TRI->needsStackRealignment(MF))
	MFI->isFrameAddressTaken() \|\| MFI->hasInlineAsmWithSPAdjust() \|\|			\|\| TRI->hasBasePointer(MF);
	MF.getInfo<X86MachineFunctionInfo>()->getForceFramePointer() \|\|			}
	MMI.callsUnwindInit() \|\| MMI.callsEHReturn() \|\|
	MFI->hasStackMap() \|\| MFI->hasPatchPoint());			// needsFrameIndexResolution - Do we need to perform FI resolution for
	}			// this function. Normally, this is required only when the function
				// has any stack objects. However, FI resolution actually has another job,
	static unsigned getSUBriOpcode(unsigned IsLP64, int64_t Imm) {			// not apparent from the title - it resolves callframesetup/destroy
	if (IsLP64) {			// that were not simplified earlier.
	if (isInt<8>(Imm))			// So, this is required for x86 functions that have push sequences even
	return X86::SUB64ri8;			// when there are no stack objects.
	return X86::SUB64ri32;			bool
	} else {			X86FrameLowering::needsFrameIndexResolution(const MachineFunction &MF) const {
	if (isInt<8>(Imm))			return MF.getFrameInfo()->hasStackObjects() \|\|
	return X86::SUB32ri8;			MF.getInfo<X86MachineFunctionInfo>()->getHasPushSequences();
	return X86::SUB32ri;			}
	}
	}			/// hasFP - Return true if the specified function should have a dedicated frame
				/// pointer register. This is true if the function has variable sized allocas
	static unsigned getADDriOpcode(unsigned IsLP64, int64_t Imm) {			/// or if frame pointer elimination is disabled.
	if (IsLP64) {			bool X86FrameLowering::hasFP(const MachineFunction &MF) const {
	if (isInt<8>(Imm))			const MachineFrameInfo *MFI = MF.getFrameInfo();
	return X86::ADD64ri8;			const MachineModuleInfo &MMI = MF.getMMI();
	return X86::ADD64ri32;			const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
	} else {
	if (isInt<8>(Imm))			return (MF.getTarget().Options.DisableFramePointerElim(MF) \|\|
	return X86::ADD32ri8;			RegInfo->needsStackRealignment(MF) \|\|
	return X86::ADD32ri;			MFI->hasVarSizedObjects() \|\|
	}			MFI->isFrameAddressTaken() \|\| MFI->hasInlineAsmWithSPAdjust() \|\|
	}			MF.getInfo<X86MachineFunctionInfo>()->getForceFramePointer() \|\|
				MMI.callsUnwindInit() \|\| MMI.callsEHReturn() \|\|
	static unsigned getSUBrrOpcode(unsigned isLP64) {			MFI->hasStackMap() \|\| MFI->hasPatchPoint());
	return isLP64 ? X86::SUB64rr : X86::SUB32rr;			}
	}
				static unsigned getSUBriOpcode(unsigned IsLP64, int64_t Imm) {
	static unsigned getADDrrOpcode(unsigned isLP64) {			if (IsLP64) {
	return isLP64 ? X86::ADD64rr : X86::ADD32rr;			if (isInt<8>(Imm))
	}			return X86::SUB64ri8;
				return X86::SUB64ri32;
	static unsigned getANDriOpcode(bool IsLP64, int64_t Imm) {			} else {
	if (IsLP64) {			if (isInt<8>(Imm))
	if (isInt<8>(Imm))			return X86::SUB32ri8;
	return X86::AND64ri8;			return X86::SUB32ri;
	return X86::AND64ri32;			}
	}			}
	if (isInt<8>(Imm))
	return X86::AND32ri8;			static unsigned getADDriOpcode(unsigned IsLP64, int64_t Imm) {
	return X86::AND32ri;			if (IsLP64) {
	}			if (isInt<8>(Imm))
				return X86::ADD64ri8;
	static unsigned getPUSHiOpcode(bool IsLP64, MachineOperand MO) {			return X86::ADD64ri32;
	// We don't support LP64 for now.			} else {
	assert(!IsLP64);			if (isInt<8>(Imm))
				return X86::ADD32ri8;
	if (MO.isImm() && isInt<8>(MO.getImm()))			return X86::ADD32ri;
	return X86::PUSH32i8;			}
				}
	return X86::PUSHi32;;
	}			static unsigned getSUBrrOpcode(unsigned isLP64) {
				return isLP64 ? X86::SUB64rr : X86::SUB32rr;
	static unsigned getLEArOpcode(unsigned IsLP64) {			}
	return IsLP64 ? X86::LEA64r : X86::LEA32r;
	}			static unsigned getADDrrOpcode(unsigned isLP64) {
				return isLP64 ? X86::ADD64rr : X86::ADD32rr;
	/// findDeadCallerSavedReg - Return a caller-saved register that isn't live			}
	/// when it reaches the "return" instruction. We can then pop a stack object
	/// to this register without worry about clobbering it.			static unsigned getANDriOpcode(bool IsLP64, int64_t Imm) {
	static unsigned findDeadCallerSavedReg(MachineBasicBlock &MBB,			if (IsLP64) {
	MachineBasicBlock::iterator &MBBI,			if (isInt<8>(Imm))
	const TargetRegisterInfo &TRI,			return X86::AND64ri8;
	bool Is64Bit) {			return X86::AND64ri32;
	const MachineFunction *MF = MBB.getParent();			}
	const Function *F = MF->getFunction();			if (isInt<8>(Imm))
	if (!F \|\| MF->getMMI().callsEHReturn())			return X86::AND32ri8;
	return 0;			return X86::AND32ri;
				}
	static const uint16_t CallerSavedRegs32Bit[] = {
	X86::EAX, X86::EDX, X86::ECX, 0			static unsigned getLEArOpcode(unsigned IsLP64) {
	};			return IsLP64 ? X86::LEA64r : X86::LEA32r;
				}
	static const uint16_t CallerSavedRegs64Bit[] = {
	X86::RAX, X86::RDX, X86::RCX, X86::RSI, X86::RDI,			/// findDeadCallerSavedReg - Return a caller-saved register that isn't live
	X86::R8, X86::R9, X86::R10, X86::R11, 0			/// when it reaches the "return" instruction. We can then pop a stack object
	};			/// to this register without worry about clobbering it.
				static unsigned findDeadCallerSavedReg(MachineBasicBlock &MBB,
	unsigned Opc = MBBI->getOpcode();			MachineBasicBlock::iterator &MBBI,
	switch (Opc) {			const TargetRegisterInfo &TRI,
	default: return 0;			bool Is64Bit) {
	case X86::RETL:			const MachineFunction *MF = MBB.getParent();
	case X86::RETQ:			const Function *F = MF->getFunction();
	case X86::RETIL:			if (!F \|\| MF->getMMI().callsEHReturn())
	case X86::RETIQ:			return 0;
	case X86::TCRETURNdi:
	case X86::TCRETURNri:			static const uint16_t CallerSavedRegs32Bit[] = {
	case X86::TCRETURNmi:			X86::EAX, X86::EDX, X86::ECX, 0
	case X86::TCRETURNdi64:			};
	case X86::TCRETURNri64:
	case X86::TCRETURNmi64:			static const uint16_t CallerSavedRegs64Bit[] = {
	case X86::EH_RETURN:			X86::RAX, X86::RDX, X86::RCX, X86::RSI, X86::RDI,
	case X86::EH_RETURN64: {			X86::R8, X86::R9, X86::R10, X86::R11, 0
	SmallSet<uint16_t, 8> Uses;			};
	for (unsigned i = 0, e = MBBI->getNumOperands(); i != e; ++i) {
	MachineOperand &MO = MBBI->getOperand(i);			unsigned Opc = MBBI->getOpcode();
	if (!MO.isReg() \|\| MO.isDef())			switch (Opc) {
	continue;			default: return 0;
	unsigned Reg = MO.getReg();			case X86::RETL:
	if (!Reg)			case X86::RETQ:
	continue;			case X86::RETIL:
	for (MCRegAliasIterator AI(Reg, &TRI, true); AI.isValid(); ++AI)			case X86::RETIQ:
	Uses.insert(*AI);			case X86::TCRETURNdi:
	}			case X86::TCRETURNri:
				case X86::TCRETURNmi:
	const uint16_t *CS = Is64Bit ? CallerSavedRegs64Bit : CallerSavedRegs32Bit;			case X86::TCRETURNdi64:
	for (; *CS; ++CS)			case X86::TCRETURNri64:
	if (!Uses.count(*CS))			case X86::TCRETURNmi64:
	return *CS;			case X86::EH_RETURN:
	}			case X86::EH_RETURN64: {
	}			SmallSet<uint16_t, 8> Uses;
				for (unsigned i = 0, e = MBBI->getNumOperands(); i != e; ++i) {
	return 0;			MachineOperand &MO = MBBI->getOperand(i);
	}			if (!MO.isReg() \|\| MO.isDef())
				continue;
	static bool isEAXLiveIn(MachineFunction &MF) {			unsigned Reg = MO.getReg();
	for (MachineRegisterInfo::livein_iterator II = MF.getRegInfo().livein_begin(),			if (!Reg)
	EE = MF.getRegInfo().livein_end(); II != EE; ++II) {			continue;
	unsigned Reg = II->first;			for (MCRegAliasIterator AI(Reg, &TRI, true); AI.isValid(); ++AI)
				Uses.insert(*AI);
	if (Reg == X86::RAX \|\| Reg == X86::EAX \|\| Reg == X86::AX \|\|			}
	Reg == X86::AH \|\| Reg == X86::AL)
	return true;			const uint16_t *CS = Is64Bit ? CallerSavedRegs64Bit : CallerSavedRegs32Bit;
	}			for (; *CS; ++CS)
				if (!Uses.count(*CS))
	return false;			return *CS;
	}			}
				}
	/// emitSPUpdate - Emit a series of instructions to increment / decrement the
	/// stack pointer by a constant value.			return 0;
	static			}
	void emitSPUpdate(MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI,
	unsigned StackPtr, int64_t NumBytes,			static bool isEAXLiveIn(MachineFunction &MF) {
	bool Is64BitTarget, bool Is64BitStackPtr, bool UseLEA,			for (MachineRegisterInfo::livein_iterator II = MF.getRegInfo().livein_begin(),
	const TargetInstrInfo &TII, const TargetRegisterInfo &TRI) {			EE = MF.getRegInfo().livein_end(); II != EE; ++II) {
	bool isSub = NumBytes < 0;			unsigned Reg = II->first;
	uint64_t Offset = isSub ? -NumBytes : NumBytes;
	unsigned Opc;			if (Reg == X86::RAX \|\| Reg == X86::EAX \|\| Reg == X86::AX \|\|
	if (UseLEA)			Reg == X86::AH \|\| Reg == X86::AL)
	Opc = getLEArOpcode(Is64BitStackPtr);			return true;
	else			}
	Opc = isSub
	? getSUBriOpcode(Is64BitStackPtr, Offset)			return false;
	: getADDriOpcode(Is64BitStackPtr, Offset);			}

	uint64_t Chunk = (1LL << 31) - 1;			/// emitSPUpdate - Emit a series of instructions to increment / decrement the
	DebugLoc DL = MBB.findDebugLoc(MBBI);			/// stack pointer by a constant value.
				static
	while (Offset) {			void emitSPUpdate(MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI,
	if (Offset > Chunk) {			unsigned StackPtr, int64_t NumBytes,
	// Rather than emit a long series of instructions for large offsets,			bool Is64BitTarget, bool Is64BitStackPtr, bool UseLEA,
	// load the offset into a register and do one sub/add			const TargetInstrInfo &TII, const TargetRegisterInfo &TRI) {
	unsigned Reg = 0;			bool isSub = NumBytes < 0;
				uint64_t Offset = isSub ? -NumBytes : NumBytes;
	if (isSub && !isEAXLiveIn(*MBB.getParent()))			unsigned Opc;
	Reg = (unsigned)(Is64BitTarget ? X86::RAX : X86::EAX);			if (UseLEA)
	else			Opc = getLEArOpcode(Is64BitStackPtr);
	Reg = findDeadCallerSavedReg(MBB, MBBI, TRI, Is64BitTarget);			else
				Opc = isSub
	if (Reg) {			? getSUBriOpcode(Is64BitStackPtr, Offset)
	Opc = Is64BitTarget ? X86::MOV64ri : X86::MOV32ri;			: getADDriOpcode(Is64BitStackPtr, Offset);
	BuildMI(MBB, MBBI, DL, TII.get(Opc), Reg)
	.addImm(Offset);			uint64_t Chunk = (1LL << 31) - 1;
	Opc = isSub			DebugLoc DL = MBB.findDebugLoc(MBBI);
	? getSUBrrOpcode(Is64BitTarget)
	: getADDrrOpcode(Is64BitTarget);			while (Offset) {
	MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)			if (Offset > Chunk) {
	.addReg(StackPtr)			// Rather than emit a long series of instructions for large offsets,
	.addReg(Reg);			// load the offset into a register and do one sub/add
	MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead.			unsigned Reg = 0;
	Offset = 0;
	continue;			if (isSub && !isEAXLiveIn(*MBB.getParent()))
	}			Reg = (unsigned)(Is64BitTarget ? X86::RAX : X86::EAX);
	}			else
				Reg = findDeadCallerSavedReg(MBB, MBBI, TRI, Is64BitTarget);
	uint64_t ThisVal = (Offset > Chunk) ? Chunk : Offset;
	if (ThisVal == (Is64BitTarget ? 8 : 4)) {			if (Reg) {
	// Use push / pop instead.			Opc = Is64BitTarget ? X86::MOV64ri : X86::MOV32ri;
	unsigned Reg = isSub			BuildMI(MBB, MBBI, DL, TII.get(Opc), Reg)
	? (unsigned)(Is64BitTarget ? X86::RAX : X86::EAX)			.addImm(Offset);
	: findDeadCallerSavedReg(MBB, MBBI, TRI, Is64BitTarget);			Opc = isSub
	if (Reg) {			? getSUBrrOpcode(Is64BitTarget)
	Opc = isSub			: getADDrrOpcode(Is64BitTarget);
	? (Is64BitTarget ? X86::PUSH64r : X86::PUSH32r)			MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)
	: (Is64BitTarget ? X86::POP64r : X86::POP32r);			.addReg(StackPtr)
	MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII.get(Opc))			.addReg(Reg);
	.addReg(Reg, getDefRegState(!isSub) \| getUndefRegState(isSub));			MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead.
	if (isSub)			Offset = 0;
	MI->setFlag(MachineInstr::FrameSetup);			continue;
	Offset -= ThisVal;			}
	continue;			}
	}
	}			uint64_t ThisVal = (Offset > Chunk) ? Chunk : Offset;
				if (ThisVal == (Is64BitTarget ? 8 : 4)) {
	MachineInstr *MI = nullptr;			// Use push / pop instead.
				unsigned Reg = isSub
	if (UseLEA) {			? (unsigned)(Is64BitTarget ? X86::RAX : X86::EAX)
	MI = addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr),			: findDeadCallerSavedReg(MBB, MBBI, TRI, Is64BitTarget);
	StackPtr, false, isSub ? -ThisVal : ThisVal);			if (Reg) {
	} else {			Opc = isSub
	MI = BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)			? (Is64BitTarget ? X86::PUSH64r : X86::PUSH32r)
	.addReg(StackPtr)			: (Is64BitTarget ? X86::POP64r : X86::POP32r);
	.addImm(ThisVal);			MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII.get(Opc))
	MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead.			.addReg(Reg, getDefRegState(!isSub) \| getUndefRegState(isSub));
	}			if (isSub)
				MI->setFlag(MachineInstr::FrameSetup);
	if (isSub)			Offset -= ThisVal;
	MI->setFlag(MachineInstr::FrameSetup);			continue;
				}
	Offset -= ThisVal;			}
	}
	}			MachineInstr *MI = nullptr;

	/// mergeSPUpdatesUp - Merge two stack-manipulating instructions upper iterator.			if (UseLEA) {
	static			MI = addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr),
	void mergeSPUpdatesUp(MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI,			StackPtr, false, isSub ? -ThisVal : ThisVal);
	unsigned StackPtr, uint64_t *NumBytes = nullptr) {			} else {
	if (MBBI == MBB.begin()) return;			MI = BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)
				.addReg(StackPtr)
	MachineBasicBlock::iterator PI = std::prev(MBBI);			.addImm(ThisVal);
	unsigned Opc = PI->getOpcode();			MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead.
	if ((Opc == X86::ADD64ri32 \|\| Opc == X86::ADD64ri8 \|\|			}
	Opc == X86::ADD32ri \|\| Opc == X86::ADD32ri8 \|\|
	Opc == X86::LEA32r \|\| Opc == X86::LEA64_32r) &&			if (isSub)
	PI->getOperand(0).getReg() == StackPtr) {			MI->setFlag(MachineInstr::FrameSetup);
	if (NumBytes)
	*NumBytes += PI->getOperand(2).getImm();			Offset -= ThisVal;
	MBB.erase(PI);			}
	} else if ((Opc == X86::SUB64ri32 \|\| Opc == X86::SUB64ri8 \|\|			}
	Opc == X86::SUB32ri \|\| Opc == X86::SUB32ri8) &&
	PI->getOperand(0).getReg() == StackPtr) {			/// mergeSPUpdatesUp - Merge two stack-manipulating instructions upper iterator.
	if (NumBytes)			static
	*NumBytes -= PI->getOperand(2).getImm();			void mergeSPUpdatesUp(MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI,
	MBB.erase(PI);			unsigned StackPtr, uint64_t *NumBytes = nullptr) {
	}			if (MBBI == MBB.begin()) return;
	}
				MachineBasicBlock::iterator PI = std::prev(MBBI);
	/// mergeSPUpdatesDown - Merge two stack-manipulating instructions lower			unsigned Opc = PI->getOpcode();
	/// iterator.			if ((Opc == X86::ADD64ri32 \|\| Opc == X86::ADD64ri8 \|\|
	static			Opc == X86::ADD32ri \|\| Opc == X86::ADD32ri8 \|\|
	void mergeSPUpdatesDown(MachineBasicBlock &MBB,			Opc == X86::LEA32r \|\| Opc == X86::LEA64_32r) &&
	MachineBasicBlock::iterator &MBBI,			PI->getOperand(0).getReg() == StackPtr) {
	unsigned StackPtr, uint64_t *NumBytes = nullptr) {			if (NumBytes)
	// FIXME: THIS ISN'T RUN!!!			*NumBytes += PI->getOperand(2).getImm();
	return;			MBB.erase(PI);
				} else if ((Opc == X86::SUB64ri32 \|\| Opc == X86::SUB64ri8 \|\|
	if (MBBI == MBB.end()) return;			Opc == X86::SUB32ri \|\| Opc == X86::SUB32ri8) &&
				PI->getOperand(0).getReg() == StackPtr) {
	MachineBasicBlock::iterator NI = std::next(MBBI);			if (NumBytes)
	if (NI == MBB.end()) return;			*NumBytes -= PI->getOperand(2).getImm();
				MBB.erase(PI);
	unsigned Opc = NI->getOpcode();			}
	if ((Opc == X86::ADD64ri32 \|\| Opc == X86::ADD64ri8 \|\|			}
	Opc == X86::ADD32ri \|\| Opc == X86::ADD32ri8) &&
	NI->getOperand(0).getReg() == StackPtr) {			/// mergeSPUpdatesDown - Merge two stack-manipulating instructions lower
	if (NumBytes)			/// iterator.
	*NumBytes -= NI->getOperand(2).getImm();			static
	MBB.erase(NI);			void mergeSPUpdatesDown(MachineBasicBlock &MBB,
	MBBI = NI;			MachineBasicBlock::iterator &MBBI,
	} else if ((Opc == X86::SUB64ri32 || Opc == X86::SUB64ri8 ||			unsigned StackPtr, uint64_t *NumBytes = nullptr) {
	Opc == X86::SUB32ri || Opc == X86::SUB32ri8) &&			// FIXME: THIS ISN'T RUN!!!
	NI->getOperand(0).getReg() == StackPtr) {			return;
	if (NumBytes)
	*NumBytes += NI->getOperand(2).getImm();			if (MBBI == MBB.end()) return;
	MBB.erase(NI);
	MBBI = NI;			MachineBasicBlock::iterator NI = std::next(MBBI);
	}			if (NI == MBB.end()) return;
	}
				unsigned Opc = NI->getOpcode();
	/// mergeSPUpdates - Checks the instruction before/after the passed			if ((Opc == X86::ADD64ri32 || Opc == X86::ADD64ri8 ||
	/// instruction. If it is an ADD/SUB/LEA instruction it is deleted argument and			Opc == X86::ADD32ri || Opc == X86::ADD32ri8) &&
	/// the stack adjustment is returned as a positive value for ADD/LEA and a			NI->getOperand(0).getReg() == StackPtr) {
	/// negative for SUB.			if (NumBytes)
	static int mergeSPUpdates(MachineBasicBlock &MBB,			*NumBytes -= NI->getOperand(2).getImm();
	MachineBasicBlock::iterator &MBBI, unsigned StackPtr,			MBB.erase(NI);
	bool doMergeWithPrevious) {			MBBI = NI;
	if ((doMergeWithPrevious && MBBI == MBB.begin()) ||			} else if ((Opc == X86::SUB64ri32 || Opc == X86::SUB64ri8 ||
	(!doMergeWithPrevious && MBBI == MBB.end()))			Opc == X86::SUB32ri || Opc == X86::SUB32ri8) &&
	return 0;			NI->getOperand(0).getReg() == StackPtr) {
				if (NumBytes)
	MachineBasicBlock::iterator PI = doMergeWithPrevious ? std::prev(MBBI) : MBBI;			*NumBytes += NI->getOperand(2).getImm();
	MachineBasicBlock::iterator NI = doMergeWithPrevious ? nullptr			MBB.erase(NI);
	: std::next(MBBI);			MBBI = NI;
	unsigned Opc = PI->getOpcode();			}
	int Offset = 0;			}

	if ((Opc == X86::ADD64ri32 || Opc == X86::ADD64ri8 ||			/// mergeSPUpdates - Checks the instruction before/after the passed
	Opc == X86::ADD32ri || Opc == X86::ADD32ri8 ||			/// instruction. If it is an ADD/SUB/LEA instruction it is deleted argument and
	Opc == X86::LEA32r || Opc == X86::LEA64_32r) &&			/// the stack adjustment is returned as a positive value for ADD/LEA and a
	PI->getOperand(0).getReg() == StackPtr){			/// negative for SUB.
	Offset += PI->getOperand(2).getImm();			static int mergeSPUpdates(MachineBasicBlock &MBB,
	MBB.erase(PI);			MachineBasicBlock::iterator &MBBI, unsigned StackPtr,
	if (!doMergeWithPrevious) MBBI = NI;			bool doMergeWithPrevious) {
	} else if ((Opc == X86::SUB64ri32 || Opc == X86::SUB64ri8 ||			if ((doMergeWithPrevious && MBBI == MBB.begin()) ||
	Opc == X86::SUB32ri || Opc == X86::SUB32ri8) &&			(!doMergeWithPrevious && MBBI == MBB.end()))
	PI->getOperand(0).getReg() == StackPtr) {			return 0;
	Offset -= PI->getOperand(2).getImm();
	MBB.erase(PI);			MachineBasicBlock::iterator PI = doMergeWithPrevious ? std::prev(MBBI) : MBBI;
	if (!doMergeWithPrevious) MBBI = NI;			MachineBasicBlock::iterator NI = doMergeWithPrevious ? nullptr
	}			: std::next(MBBI);
				unsigned Opc = PI->getOpcode();
	return Offset;			int Offset = 0;
	}
				if ((Opc == X86::ADD64ri32 || Opc == X86::ADD64ri8 ||
	void			Opc == X86::ADD32ri || Opc == X86::ADD32ri8 ||
	X86FrameLowering::emitCalleeSavedFrameMoves(MachineBasicBlock &MBB,			Opc == X86::LEA32r || Opc == X86::LEA64_32r) &&
	MachineBasicBlock::iterator MBBI,			PI->getOperand(0).getReg() == StackPtr){
	DebugLoc DL) const {			Offset += PI->getOperand(2).getImm();
	MachineFunction &MF = *MBB.getParent();			MBB.erase(PI);
	MachineFrameInfo *MFI = MF.getFrameInfo();			if (!doMergeWithPrevious) MBBI = NI;
	MachineModuleInfo &MMI = MF.getMMI();			} else if ((Opc == X86::SUB64ri32 || Opc == X86::SUB64ri8 ||
	const MCRegisterInfo *MRI = MMI.getContext().getRegisterInfo();			Opc == X86::SUB32ri || Opc == X86::SUB32ri8) &&
	const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();			PI->getOperand(0).getReg() == StackPtr) {
				Offset -= PI->getOperand(2).getImm();
	// Add callee saved registers to move list.			MBB.erase(PI);
	const std::vector<CalleeSavedInfo> &CSI = MFI->getCalleeSavedInfo();			if (!doMergeWithPrevious) MBBI = NI;
	if (CSI.empty()) return;			}

	// Calculate offsets.			return Offset;
	for (std::vector<CalleeSavedInfo>::const_iterator			}
	I = CSI.begin(), E = CSI.end(); I != E; ++I) {
	int64_t Offset = MFI->getObjectOffset(I->getFrameIdx());			void
	unsigned Reg = I->getReg();			X86FrameLowering::emitCalleeSavedFrameMoves(MachineBasicBlock &MBB,
				MachineBasicBlock::iterator MBBI,
	unsigned DwarfReg = MRI->getDwarfRegNum(Reg, true);			DebugLoc DL) const {
	unsigned CFIIndex =			MachineFunction &MF = *MBB.getParent();
	MMI.addFrameInst(MCCFIInstruction::createOffset(nullptr, DwarfReg,			MachineFrameInfo *MFI = MF.getFrameInfo();
	Offset));			MachineModuleInfo &MMI = MF.getMMI();
	BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))			const MCRegisterInfo *MRI = MMI.getContext().getRegisterInfo();
	.addCFIIndex(CFIIndex);			const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
	}
	}			// Add callee saved registers to move list.
				const std::vector<CalleeSavedInfo> &CSI = MFI->getCalleeSavedInfo();
	/// usesTheStack - This function checks if any of the users of EFLAGS			if (CSI.empty()) return;
	/// copies the EFLAGS. We know that the code that lowers COPY of EFLAGS has
	/// to use the stack, and if we don't adjust the stack we clobber the first			// Calculate offsets.
	/// frame index.			for (std::vector<CalleeSavedInfo>::const_iterator
	/// See X86InstrInfo::copyPhysReg.			I = CSI.begin(), E = CSI.end(); I != E; ++I) {
	static bool usesTheStack(const MachineFunction &MF) {			int64_t Offset = MFI->getObjectOffset(I->getFrameIdx());
	const MachineRegisterInfo &MRI = MF.getRegInfo();			unsigned Reg = I->getReg();

	for (MachineRegisterInfo::reg_instr_iterator			unsigned DwarfReg = MRI->getDwarfRegNum(Reg, true);
	ri = MRI.reg_instr_begin(X86::EFLAGS), re = MRI.reg_instr_end();			unsigned CFIIndex =
	ri != re; ++ri)			MMI.addFrameInst(MCCFIInstruction::createOffset(nullptr, DwarfReg,
	if (ri->isCopy())			Offset));
	return true;			BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
				.addCFIIndex(CFIIndex);
	return false;			}
	}			}

	void X86FrameLowering::emitStackProbeCall(MachineFunction &MF,			/// usesTheStack - This function checks if any of the users of EFLAGS
	MachineBasicBlock &MBB,			/// copies the EFLAGS. We know that the code that lowers COPY of EFLAGS has
	MachineBasicBlock::iterator MBBI,			/// to use the stack, and if we don't adjust the stack we clobber the first
	DebugLoc DL) {			/// frame index.
	const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();			/// See X86InstrInfo::copyPhysReg.
	const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();			static bool usesTheStack(const MachineFunction &MF) {
	bool Is64Bit = STI.is64Bit();			const MachineRegisterInfo &MRI = MF.getRegInfo();
	bool IsLargeCodeModel = MF.getTarget().getCodeModel() == CodeModel::Large;
				for (MachineRegisterInfo::reg_instr_iterator
	unsigned CallOp;			ri = MRI.reg_instr_begin(X86::EFLAGS), re = MRI.reg_instr_end();
	if (Is64Bit)			ri != re; ++ri)
	CallOp = IsLargeCodeModel ? X86::CALL64r : X86::CALL64pcrel32;			if (ri->isCopy())
	else			return true;
	CallOp = X86::CALLpcrel32;
				return false;
	const char *Symbol;			}
	if (Is64Bit) {
	if (STI.isTargetCygMing()) {			void X86FrameLowering::emitStackProbeCall(MachineFunction &MF,
	Symbol = "___chkstk_ms";			MachineBasicBlock &MBB,
	} else {			MachineBasicBlock::iterator MBBI,
	Symbol = "__chkstk";			DebugLoc DL) {
	}			const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
	} else if (STI.isTargetCygMing())			const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
	Symbol = "_alloca";			bool Is64Bit = STI.is64Bit();
	else			bool IsLargeCodeModel = MF.getTarget().getCodeModel() == CodeModel::Large;
	Symbol = "_chkstk";
				unsigned CallOp;
	MachineInstrBuilder CI;			if (Is64Bit)
				CallOp = IsLargeCodeModel ? X86::CALL64r : X86::CALL64pcrel32;
	// All current stack probes take AX and SP as input, clobber flags, and			else
	// preserve all registers. x86_64 probes leave RSP unmodified.			CallOp = X86::CALLpcrel32;
	if (Is64Bit && MF.getTarget().getCodeModel() == CodeModel::Large) {
	// For the large code model, we have to call through a register. Use R11,			const char *Symbol;
	// as it is scratch in all supported calling conventions.			if (Is64Bit) {
	BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::R11)			if (STI.isTargetCygMing()) {
	.addExternalSymbol(Symbol);			Symbol = "___chkstk_ms";
	CI = BuildMI(MBB, MBBI, DL, TII.get(CallOp)).addReg(X86::R11);			} else {
	} else {			Symbol = "__chkstk";
	CI = BuildMI(MBB, MBBI, DL, TII.get(CallOp)).addExternalSymbol(Symbol);			}
	}			} else if (STI.isTargetCygMing())
				Symbol = "_alloca";
	unsigned AX = Is64Bit ? X86::RAX : X86::EAX;			else
	unsigned SP = Is64Bit ? X86::RSP : X86::ESP;			Symbol = "_chkstk";
	CI.addReg(AX, RegState::Implicit)
	.addReg(SP, RegState::Implicit)			MachineInstrBuilder CI;
	.addReg(AX, RegState::Define | RegState::Implicit)
	.addReg(SP, RegState::Define | RegState::Implicit)			// All current stack probes take AX and SP as input, clobber flags, and
	.addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);			// preserve all registers. x86_64 probes leave RSP unmodified.
				if (Is64Bit && MF.getTarget().getCodeModel() == CodeModel::Large) {
	if (Is64Bit) {			// For the large code model, we have to call through a register. Use R11,
	// MSVC x64's __chkstk and cygwin/mingw's ___chkstk_ms do not adjust %rsp			// as it is scratch in all supported calling conventions.
	// themselves. It also does not clobber %rax so we can reuse it when			BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::R11)
	// adjusting %rsp.			.addExternalSymbol(Symbol);
	BuildMI(MBB, MBBI, DL, TII.get(X86::SUB64rr), X86::RSP)			CI = BuildMI(MBB, MBBI, DL, TII.get(CallOp)).addReg(X86::R11);
	.addReg(X86::RSP)			} else {
	.addReg(X86::RAX);			CI = BuildMI(MBB, MBBI, DL, TII.get(CallOp)).addExternalSymbol(Symbol);
	}			}
	}
				unsigned AX = Is64Bit ? X86::RAX : X86::EAX;
	/// emitPrologue - Push callee-saved registers onto the stack, which			unsigned SP = Is64Bit ? X86::RSP : X86::ESP;
	/// automatically adjust the stack pointer. Adjust the stack pointer to allocate			CI.addReg(AX, RegState::Implicit)
	/// space for local variables. Also emit labels used by the exception handler to			.addReg(SP, RegState::Implicit)
	/// generate the exception handling frames.			.addReg(AX, RegState::Define | RegState::Implicit)
				.addReg(SP, RegState::Define | RegState::Implicit)
	/*			.addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);
	Here's a gist of what gets emitted:
				if (Is64Bit) {
	; Establish frame pointer, if needed			// MSVC x64's __chkstk and cygwin/mingw's ___chkstk_ms do not adjust %rsp
	[if needs FP]			// themselves. It also does not clobber %rax so we can reuse it when
	push %rbp			// adjusting %rsp.
	.cfi_def_cfa_offset 16			BuildMI(MBB, MBBI, DL, TII.get(X86::SUB64rr), X86::RSP)
	.cfi_offset %rbp, -16			.addReg(X86::RSP)
	.seh_pushreg %rbp			.addReg(X86::RAX);
	mov %rsp, %rbp			}
	.cfi_def_cfa_register %rbp			}

	; Spill general-purpose registers			/// emitPrologue - Push callee-saved registers onto the stack, which
	[for all callee-saved GPRs]			/// automatically adjust the stack pointer. Adjust the stack pointer to allocate
	pushq %<reg>			/// space for local variables. Also emit labels used by the exception handler to
	[if not needs FP]			/// generate the exception handling frames.
	.cfi_def_cfa_offset (offset from RETADDR)
	.seh_pushreg %<reg>			/*
				Here's a gist of what gets emitted:
	; If the required stack alignment > default stack alignment
	; rsp needs to be re-aligned. This creates a "re-alignment gap"			; Establish frame pointer, if needed
	; of unknown size in the stack frame.			[if needs FP]
	[if stack needs re-alignment]			push %rbp
	and $MASK, %rsp			.cfi_def_cfa_offset 16
				.cfi_offset %rbp, -16
	; Allocate space for locals			.seh_pushreg %rbp
	[if target is Windows and allocated space > 4096 bytes]			mov %rsp, %rbp
	; Windows needs special care for allocations larger			.cfi_def_cfa_register %rbp
	; than one page.
	mov $NNN, %rax			; Spill general-purpose registers
	call ___chkstk_ms/___chkstk			[for all callee-saved GPRs]
	sub %rax, %rsp			pushq %<reg>
	[else]			[if not needs FP]
	sub $NNN, %rsp			.cfi_def_cfa_offset (offset from RETADDR)
				.seh_pushreg %<reg>
	[if needs FP]
	.seh_stackalloc (size of XMM spill slots)			; If the required stack alignment > default stack alignment
	.seh_setframe %rbp, SEHFrameOffset ; = size of all spill slots			; rsp needs to be re-aligned. This creates a "re-alignment gap"
	[else]			; of unknown size in the stack frame.
	.seh_stackalloc NNN			[if stack needs re-alignment]
				and $MASK, %rsp
	; Spill XMMs
	; Note, that while only Windows 64 ABI specifies XMMs as callee-preserved,			; Allocate space for locals
	; they may get spilled on any platform, if the current function			[if target is Windows and allocated space > 4096 bytes]
	; calls @llvm.eh.unwind.init			; Windows needs special care for allocations larger
	[if needs FP]			; than one page.
	[for all callee-saved XMM registers]			mov $NNN, %rax
	movaps %<xmm reg>, -MMM(%rbp)			call ___chkstk_ms/___chkstk
	[for all callee-saved XMM registers]			sub %rax, %rsp
	.seh_savexmm %<xmm reg>, (-MMM + SEHFrameOffset)			[else]
	; i.e. the offset relative to (%rbp - SEHFrameOffset)			sub $NNN, %rsp
	[else]
	[for all callee-saved XMM registers]			[if needs FP]
	movaps %<xmm reg>, KKK(%rsp)			.seh_stackalloc (size of XMM spill slots)
	[for all callee-saved XMM registers]			.seh_setframe %rbp, SEHFrameOffset ; = size of all spill slots
	.seh_savexmm %<xmm reg>, KKK			[else]
				.seh_stackalloc NNN
	.seh_endprologue
				; Spill XMMs
	[if needs base pointer]			; Note, that while only Windows 64 ABI specifies XMMs as callee-preserved,
	mov %rsp, %rbx			; they may get spilled on any platform, if the current function
	[if needs to restore base pointer]			; calls @llvm.eh.unwind.init
	mov %rsp, -MMM(%rbp)			[if needs FP]
				[for all callee-saved XMM registers]
	; Emit CFI info			movaps %<xmm reg>, -MMM(%rbp)
	[if needs FP]			[for all callee-saved XMM registers]
	[for all callee-saved registers]			.seh_savexmm %<xmm reg>, (-MMM + SEHFrameOffset)
	.cfi_offset %<reg>, (offset from %rbp)			; i.e. the offset relative to (%rbp - SEHFrameOffset)
	[else]			[else]
	.cfi_def_cfa_offset (offset from RETADDR)			[for all callee-saved XMM registers]
	[for all callee-saved registers]			movaps %<xmm reg>, KKK(%rsp)
	.cfi_offset %<reg>, (offset from %rsp)			[for all callee-saved XMM registers]
				.seh_savexmm %<xmm reg>, KKK
	Notes:
	- .seh directives are emitted only for Windows 64 ABI			.seh_endprologue
	- .cfi directives are emitted for all other ABIs
	- for 32-bit code, substitute %e?? registers for %r??			[if needs base pointer]
	*/			mov %rsp, %rbx
				[if needs to restore base pointer]
	void X86FrameLowering::emitPrologue(MachineFunction &MF) const {			mov %rsp, -MMM(%rbp)
	MachineBasicBlock &MBB = MF.front(); // Prologue goes in entry BB.
	MachineBasicBlock::iterator MBBI = MBB.begin();			; Emit CFI info
	MachineFrameInfo *MFI = MF.getFrameInfo();			[if needs FP]
	const Function *Fn = MF.getFunction();			[for all callee-saved registers]
	const X86RegisterInfo *RegInfo =			.cfi_offset %<reg>, (offset from %rbp)
	static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());			[else]
	const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();			.cfi_def_cfa_offset (offset from RETADDR)
	MachineModuleInfo &MMI = MF.getMMI();			[for all callee-saved registers]
	X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();			.cfi_offset %<reg>, (offset from %rsp)
	uint64_t MaxAlign = MFI->getMaxAlignment(); // Desired stack alignment.
	uint64_t StackSize = MFI->getStackSize(); // Number of bytes to allocate.			Notes:
	bool HasFP = hasFP(MF);			- .seh directives are emitted only for Windows 64 ABI
	const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();			- .cfi directives are emitted for all other ABIs
	bool Is64Bit = STI.is64Bit();			- for 32-bit code, substitute %e?? registers for %r??
	// standard x86_64 and NaCl use 64-bit frame/stack pointers, x32 - 32-bit.			*/
	const bool Uses64BitFramePtr = STI.isTarget64BitLP64() || STI.isTargetNaCl64();
	bool IsWin64 = STI.isTargetWin64();			void X86FrameLowering::emitPrologue(MachineFunction &MF) const {
	// Not necessarily synonymous with IsWin64.			MachineBasicBlock &MBB = MF.front(); // Prologue goes in entry BB.
	bool IsWinEH = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();			MachineBasicBlock::iterator MBBI = MBB.begin();
	bool NeedsWinEH = IsWinEH && Fn->needsUnwindTableEntry();			MachineFrameInfo *MFI = MF.getFrameInfo();
	bool NeedsDwarfCFI =			const Function *Fn = MF.getFunction();
	!IsWinEH && (MMI.hasDebugInfo() || Fn->needsUnwindTableEntry());			const X86RegisterInfo *RegInfo =
	bool UseLEA = STI.useLeaForSP();			static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());
	unsigned StackAlign = getStackAlignment();			const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
	unsigned SlotSize = RegInfo->getSlotSize();			MachineModuleInfo &MMI = MF.getMMI();
	unsigned FramePtr = RegInfo->getFrameRegister(MF);			X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
	const unsigned MachineFramePtr = STI.isTarget64BitILP32() ?			uint64_t MaxAlign = MFI->getMaxAlignment(); // Desired stack alignment.
	getX86SubSuperRegister(FramePtr, MVT::i64, false) : FramePtr;			uint64_t StackSize = MFI->getStackSize(); // Number of bytes to allocate.
	unsigned StackPtr = RegInfo->getStackRegister();			bool HasFP = hasFP(MF);
	unsigned BasePtr = RegInfo->getBaseRegister();			const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
	DebugLoc DL;			bool Is64Bit = STI.is64Bit();
				// standard x86_64 and NaCl use 64-bit frame/stack pointers, x32 - 32-bit.
	// If we're forcing a stack realignment we can't rely on just the frame			const bool Uses64BitFramePtr = STI.isTarget64BitLP64() || STI.isTargetNaCl64();
	// info, we need to know the ABI stack alignment as well in case we			bool IsWin64 = STI.isTargetWin64();
	// have a call out. Otherwise just make sure we have some alignment - we'll			// Not necessarily synonymous with IsWin64.
	// go with the minimum SlotSize.			bool IsWinEH = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
	if (ForceStackAlign) {			bool NeedsWinEH = IsWinEH && Fn->needsUnwindTableEntry();
	if (MFI->hasCalls())			bool NeedsDwarfCFI =
	MaxAlign = (StackAlign > MaxAlign) ? StackAlign : MaxAlign;			!IsWinEH && (MMI.hasDebugInfo() || Fn->needsUnwindTableEntry());
	else if (MaxAlign < SlotSize)			bool UseLEA = STI.useLeaForSP();
	MaxAlign = SlotSize;			unsigned StackAlign = getStackAlignment();
	}			unsigned SlotSize = RegInfo->getSlotSize();
				unsigned FramePtr = RegInfo->getFrameRegister(MF);
	// Add RETADDR move area to callee saved frame size.			const unsigned MachineFramePtr = STI.isTarget64BitILP32() ?
	int TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();			getX86SubSuperRegister(FramePtr, MVT::i64, false) : FramePtr;
	if (TailCallReturnAddrDelta < 0)			unsigned StackPtr = RegInfo->getStackRegister();
	X86FI->setCalleeSavedFrameSize(			unsigned BasePtr = RegInfo->getBaseRegister();
	X86FI->getCalleeSavedFrameSize() - TailCallReturnAddrDelta);			DebugLoc DL;

	bool UseStackProbe = (STI.isOSWindows() && !STI.isTargetMachO());			// If we're forcing a stack realignment we can't rely on just the frame
				// info, we need to know the ABI stack alignment as well in case we
	// The default stack probe size is 4096 if the function has no stackprobesize			// have a call out. Otherwise just make sure we have some alignment - we'll
	// attribute.			// go with the minimum SlotSize.
	unsigned StackProbeSize = 4096;			if (ForceStackAlign) {
	if (Fn->hasFnAttribute("stack-probe-size"))			if (MFI->hasCalls())
	Fn->getFnAttribute("stack-probe-size")			MaxAlign = (StackAlign > MaxAlign) ? StackAlign : MaxAlign;
	.getValueAsString()			else if (MaxAlign < SlotSize)
	.getAsInteger(0, StackProbeSize);			MaxAlign = SlotSize;
				}
	// If this is x86-64 and the Red Zone is not disabled, if we are a leaf
	// function, and use up to 128 bytes of stack space, don't have a frame			// Add RETADDR move area to callee saved frame size.
	// pointer, calls, or dynamic alloca then we do not need to adjust the			int TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();
	// stack pointer (we fit in the Red Zone). We also check that we don't			if (TailCallReturnAddrDelta < 0)
	// push and pop from the stack.			X86FI->setCalleeSavedFrameSize(
	if (Is64Bit && !Fn->getAttributes().hasAttribute(AttributeSet::FunctionIndex,			X86FI->getCalleeSavedFrameSize() - TailCallReturnAddrDelta);
	Attribute::NoRedZone) &&
	!RegInfo->needsStackRealignment(MF) &&			bool UseStackProbe = (STI.isOSWindows() && !STI.isTargetMachO());
	!MFI->hasVarSizedObjects() && // No dynamic alloca.
	!MFI->adjustsStack() && // No calls.			// The default stack probe size is 4096 if the function has no stackprobesize
	!IsWin64 && // Win64 has no Red Zone			// attribute.
	!usesTheStack(MF) && // Don't push and pop.			unsigned StackProbeSize = 4096;
	!MF.shouldSplitStack()) { // Regular stack			if (Fn->hasFnAttribute("stack-probe-size"))
	uint64_t MinSize = X86FI->getCalleeSavedFrameSize();			Fn->getFnAttribute("stack-probe-size")
	if (HasFP) MinSize += SlotSize;			.getValueAsString()
	StackSize = std::max(MinSize, StackSize > 128 ? StackSize - 128 : 0);			.getAsInteger(0, StackProbeSize);
	MFI->setStackSize(StackSize);
	}			// If this is x86-64 and the Red Zone is not disabled, if we are a leaf
				// function, and use up to 128 bytes of stack space, don't have a frame
	// Insert stack pointer adjustment for later moving of return addr. Only			// pointer, calls, or dynamic alloca then we do not need to adjust the
	// applies to tail call optimized functions where the callee argument stack			// stack pointer (we fit in the Red Zone). We also check that we don't
	// size is bigger than the caller's.			// push and pop from the stack.
	if (TailCallReturnAddrDelta < 0) {			if (Is64Bit && !Fn->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
	MachineInstr *MI =			Attribute::NoRedZone) &&
	BuildMI(MBB, MBBI, DL,			!RegInfo->needsStackRealignment(MF) &&
	TII.get(getSUBriOpcode(Uses64BitFramePtr, -TailCallReturnAddrDelta)),			!MFI->hasVarSizedObjects() && // No dynamic alloca.
	StackPtr)			!MFI->adjustsStack() && // No calls.
	.addReg(StackPtr)			!IsWin64 && // Win64 has no Red Zone
	.addImm(-TailCallReturnAddrDelta)			!usesTheStack(MF) && // Don't push and pop.
	.setMIFlag(MachineInstr::FrameSetup);			!MF.shouldSplitStack()) { // Regular stack
	MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead.			uint64_t MinSize = X86FI->getCalleeSavedFrameSize();
	}			if (HasFP) MinSize += SlotSize;
				StackSize = std::max(MinSize, StackSize > 128 ? StackSize - 128 : 0);
	// Mapping for machine moves:			MFI->setStackSize(StackSize);
	//			}
	// DST: VirtualFP AND
	// SRC: VirtualFP => DW_CFA_def_cfa_offset			// Insert stack pointer adjustment for later moving of return addr. Only
	// ELSE => DW_CFA_def_cfa			// applies to tail call optimized functions where the callee argument stack
	//			// size is bigger than the caller's.
	// SRC: VirtualFP AND			if (TailCallReturnAddrDelta < 0) {
	// DST: Register => DW_CFA_def_cfa_register			MachineInstr *MI =
	//			BuildMI(MBB, MBBI, DL,
	// ELSE			TII.get(getSUBriOpcode(Uses64BitFramePtr, -TailCallReturnAddrDelta)),
	// OFFSET < 0 => DW_CFA_offset_extended_sf			StackPtr)
	// REG < 64 => DW_CFA_offset + Reg			.addReg(StackPtr)
	// ELSE => DW_CFA_offset_extended			.addImm(-TailCallReturnAddrDelta)
				.setMIFlag(MachineInstr::FrameSetup);
	uint64_t NumBytes = 0;			MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead.
	int stackGrowth = -SlotSize;			}

	if (HasFP) {			// Mapping for machine moves:
	// Calculate required stack adjustment.			//
	uint64_t FrameSize = StackSize - SlotSize;			// DST: VirtualFP AND
	// If required, include space for extra hidden slot for stashing base pointer.			// SRC: VirtualFP => DW_CFA_def_cfa_offset
	if (X86FI->getRestoreBasePointer())			// ELSE => DW_CFA_def_cfa
	FrameSize += SlotSize;			//
	if (RegInfo->needsStackRealignment(MF)) {			// SRC: VirtualFP AND
	// Callee-saved registers are pushed on stack before the stack			// DST: Register => DW_CFA_def_cfa_register
	// is realigned.			//
	FrameSize -= X86FI->getCalleeSavedFrameSize();			// ELSE
	NumBytes = (FrameSize + MaxAlign - 1) / MaxAlign * MaxAlign;			// OFFSET < 0 => DW_CFA_offset_extended_sf
	} else {			// REG < 64 => DW_CFA_offset + Reg
	NumBytes = FrameSize - X86FI->getCalleeSavedFrameSize();			// ELSE => DW_CFA_offset_extended
	}
				uint64_t NumBytes = 0;
	// Get the offset of the stack slot for the EBP register, which is			int stackGrowth = -SlotSize;
	// guaranteed to be the last slot by processFunctionBeforeFrameFinalized.
	// Update the frame offset adjustment.			if (HasFP) {
	MFI->setOffsetAdjustment(-NumBytes);			// Calculate required stack adjustment.
				uint64_t FrameSize = StackSize - SlotSize;
	// Save EBP/RBP into the appropriate stack slot.			// If required, include space for extra hidden slot for stashing base pointer.
	BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::PUSH64r : X86::PUSH32r))			if (X86FI->getRestoreBasePointer())
	.addReg(MachineFramePtr, RegState::Kill)			FrameSize += SlotSize;
	.setMIFlag(MachineInstr::FrameSetup);			if (RegInfo->needsStackRealignment(MF)) {
				// Callee-saved registers are pushed on stack before the stack
	if (NeedsDwarfCFI) {			// is realigned.
	// Mark the place where EBP/RBP was saved.			FrameSize -= X86FI->getCalleeSavedFrameSize();
	// Define the current CFA rule to use the provided offset.			NumBytes = (FrameSize + MaxAlign - 1) / MaxAlign * MaxAlign;
	assert(StackSize);			} else {
	unsigned CFIIndex = MMI.addFrameInst(			NumBytes = FrameSize - X86FI->getCalleeSavedFrameSize();
	MCCFIInstruction::createDefCfaOffset(nullptr, 2 * stackGrowth));			}
	BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
	.addCFIIndex(CFIIndex);			// Get the offset of the stack slot for the EBP register, which is
				// guaranteed to be the last slot by processFunctionBeforeFrameFinalized.
	// Change the rule for the FramePtr to be an "offset" rule.			// Update the frame offset adjustment.
	unsigned DwarfFramePtr = RegInfo->getDwarfRegNum(MachineFramePtr, true);			MFI->setOffsetAdjustment(-NumBytes);
	CFIIndex = MMI.addFrameInst(
	MCCFIInstruction::createOffset(nullptr,			// Save EBP/RBP into the appropriate stack slot.
	DwarfFramePtr, 2 * stackGrowth));			BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::PUSH64r : X86::PUSH32r))
	BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))			.addReg(MachineFramePtr, RegState::Kill)
	.addCFIIndex(CFIIndex);			.setMIFlag(MachineInstr::FrameSetup);
	}
				if (NeedsDwarfCFI) {
	if (NeedsWinEH) {			// Mark the place where EBP/RBP was saved.
	BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))			// Define the current CFA rule to use the provided offset.
	.addImm(FramePtr)			assert(StackSize);
	.setMIFlag(MachineInstr::FrameSetup);			unsigned CFIIndex = MMI.addFrameInst(
	}			MCCFIInstruction::createDefCfaOffset(nullptr, 2 * stackGrowth));
				BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
	// Update EBP with the new base value.			.addCFIIndex(CFIIndex);
	BuildMI(MBB, MBBI, DL,
	TII.get(Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr), FramePtr)			// Change the rule for the FramePtr to be an "offset" rule.
	.addReg(StackPtr)			unsigned DwarfFramePtr = RegInfo->getDwarfRegNum(MachineFramePtr, true);
	.setMIFlag(MachineInstr::FrameSetup);			CFIIndex = MMI.addFrameInst(
				MCCFIInstruction::createOffset(nullptr,
	if (NeedsDwarfCFI) {			DwarfFramePtr, 2 * stackGrowth));
	// Mark effective beginning of when frame pointer becomes valid.			BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
	// Define the current CFA to use the EBP/RBP register.			.addCFIIndex(CFIIndex);
	unsigned DwarfFramePtr = RegInfo->getDwarfRegNum(MachineFramePtr, true);			}
	unsigned CFIIndex = MMI.addFrameInst(
	MCCFIInstruction::createDefCfaRegister(nullptr, DwarfFramePtr));			if (NeedsWinEH) {
	BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))			BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
	.addCFIIndex(CFIIndex);			.addImm(FramePtr)
	}			.setMIFlag(MachineInstr::FrameSetup);
				}
	// Mark the FramePtr as live-in in every block.
	for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I)			// Update EBP with the new base value.
	I->addLiveIn(MachineFramePtr);			BuildMI(MBB, MBBI, DL,
	} else {			TII.get(Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr), FramePtr)
	NumBytes = StackSize - X86FI->getCalleeSavedFrameSize();			.addReg(StackPtr)
	}			.setMIFlag(MachineInstr::FrameSetup);

	// Skip the callee-saved push instructions.			if (NeedsDwarfCFI) {
	bool PushedRegs = false;			// Mark effective beginning of when frame pointer becomes valid.
	int StackOffset = 2 * stackGrowth;			// Define the current CFA to use the EBP/RBP register.
				unsigned DwarfFramePtr = RegInfo->getDwarfRegNum(MachineFramePtr, true);
	while (MBBI != MBB.end() &&			unsigned CFIIndex = MMI.addFrameInst(
	(MBBI->getOpcode() == X86::PUSH32r ||			MCCFIInstruction::createDefCfaRegister(nullptr, DwarfFramePtr));
	MBBI->getOpcode() == X86::PUSH64r)) {			BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
	PushedRegs = true;			.addCFIIndex(CFIIndex);
	unsigned Reg = MBBI->getOperand(0).getReg();			}
	++MBBI;
				// Mark the FramePtr as live-in in every block.
	if (!HasFP && NeedsDwarfCFI) {			for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I)
	// Mark callee-saved push instruction.			I->addLiveIn(MachineFramePtr);
	// Define the current CFA rule to use the provided offset.			} else {
	assert(StackSize);			NumBytes = StackSize - X86FI->getCalleeSavedFrameSize();
	unsigned CFIIndex = MMI.addFrameInst(			}
	MCCFIInstruction::createDefCfaOffset(nullptr, StackOffset));
	BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))			// Skip the callee-saved push instructions.
	.addCFIIndex(CFIIndex);			bool PushedRegs = false;
	StackOffset += stackGrowth;			int StackOffset = 2 * stackGrowth;
	}
				while (MBBI != MBB.end() &&
	if (NeedsWinEH) {			(MBBI->getOpcode() == X86::PUSH32r ||
	BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg)).addImm(Reg).setMIFlag(			MBBI->getOpcode() == X86::PUSH64r)) {
	MachineInstr::FrameSetup);			PushedRegs = true;
	}			unsigned Reg = MBBI->getOperand(0).getReg();
	}			++MBBI;

	// Realign stack after we pushed callee-saved registers (so that we'll be			if (!HasFP && NeedsDwarfCFI) {
	// able to calculate their offsets from the frame pointer).			// Mark callee-saved push instruction.
	if (RegInfo->needsStackRealignment(MF)) {			// Define the current CFA rule to use the provided offset.
	assert(HasFP && "There should be a frame pointer if stack is realigned.");			assert(StackSize);
	uint64_t Val = -MaxAlign;			unsigned CFIIndex = MMI.addFrameInst(
	MachineInstr *MI =			MCCFIInstruction::createDefCfaOffset(nullptr, StackOffset));
	BuildMI(MBB, MBBI, DL,			BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
	TII.get(getANDriOpcode(Uses64BitFramePtr, Val)), StackPtr)			.addCFIIndex(CFIIndex);
	.addReg(StackPtr)			StackOffset += stackGrowth;
	.addImm(Val)			}
	.setMIFlag(MachineInstr::FrameSetup);
				if (NeedsWinEH) {
	// The EFLAGS implicit def is dead.			BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg)).addImm(Reg).setMIFlag(
	MI->getOperand(3).setIsDead();			MachineInstr::FrameSetup);
	}			}
				}
	// If there is an SUB32ri of ESP immediately before this instruction, merge
	// the two. This can be the case when tail call elimination is enabled and			// Realign stack after we pushed callee-saved registers (so that we'll be
	// the callee has more arguments then the caller.			// able to calculate their offsets from the frame pointer).
	NumBytes -= mergeSPUpdates(MBB, MBBI, StackPtr, true);			if (RegInfo->needsStackRealignment(MF)) {
				assert(HasFP && "There should be a frame pointer if stack is realigned.");
	// If there is an ADD32ri or SUB32ri of ESP immediately after this			uint64_t Val = -MaxAlign;
	// instruction, merge the two instructions.			MachineInstr *MI =
	mergeSPUpdatesDown(MBB, MBBI, StackPtr, &NumBytes);			BuildMI(MBB, MBBI, DL,
				TII.get(getANDriOpcode(Uses64BitFramePtr, Val)), StackPtr)
	// Adjust stack pointer: ESP -= numbytes.			.addReg(StackPtr)
				.addImm(Val)
	// Windows and cygwin/mingw require a prologue helper routine when allocating			.setMIFlag(MachineInstr::FrameSetup);
	// more than 4K bytes on the stack. Windows uses __chkstk and cygwin/mingw
	// uses __alloca. __alloca and the 32-bit version of __chkstk will probe the			// The EFLAGS implicit def is dead.
	// stack and adjust the stack pointer in one go. The 64-bit version of			MI->getOperand(3).setIsDead();
	// __chkstk is only responsible for probing the stack. The 64-bit prologue is			}
	// responsible for adjusting the stack pointer. Touching the stack at 4K
	// increments is necessary to ensure that the guard pages used by the OS			// If there is an SUB32ri of ESP immediately before this instruction, merge
	// virtual memory manager are allocated in correct sequence.			// the two. This can be the case when tail call elimination is enabled and
	if (NumBytes >= StackProbeSize && UseStackProbe) {			// the callee has more arguments then the caller.
	// Check whether EAX is livein for this function.			NumBytes -= mergeSPUpdates(MBB, MBBI, StackPtr, true);
	bool isEAXAlive = isEAXLiveIn(MF);
				// If there is an ADD32ri or SUB32ri of ESP immediately after this
	if (isEAXAlive) {			// instruction, merge the two instructions.
	// Sanity check that EAX is not livein for this function.			mergeSPUpdatesDown(MBB, MBBI, StackPtr, &NumBytes);
	// It should not be, so throw an assert.
	assert(!Is64Bit && "EAX is livein in x64 case!");			// Adjust stack pointer: ESP -= numbytes.

	// Save EAX			// Windows and cygwin/mingw require a prologue helper routine when allocating
	BuildMI(MBB, MBBI, DL, TII.get(X86::PUSH32r))			// more than 4K bytes on the stack. Windows uses __chkstk and cygwin/mingw
	.addReg(X86::EAX, RegState::Kill)			// uses __alloca. __alloca and the 32-bit version of __chkstk will probe the
	.setMIFlag(MachineInstr::FrameSetup);			// stack and adjust the stack pointer in one go. The 64-bit version of
	}			// __chkstk is only responsible for probing the stack. The 64-bit prologue is
				// responsible for adjusting the stack pointer. Touching the stack at 4K
	if (Is64Bit) {			// increments is necessary to ensure that the guard pages used by the OS
	// Handle the 64-bit Windows ABI case where we need to call __chkstk.			// virtual memory manager are allocated in correct sequence.
	// Function prologue is responsible for adjusting the stack pointer.			if (NumBytes >= StackProbeSize && UseStackProbe) {
	BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::RAX)			// Check whether EAX is livein for this function.
	.addImm(NumBytes)			bool isEAXAlive = isEAXLiveIn(MF);
	.setMIFlag(MachineInstr::FrameSetup);
	} else {			if (isEAXAlive) {
	// Allocate NumBytes-4 bytes on stack in case of isEAXAlive.			// Sanity check that EAX is not livein for this function.
	// We'll also use 4 already allocated bytes for EAX.			// It should not be, so throw an assert.
	BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), X86::EAX)			assert(!Is64Bit && "EAX is livein in x64 case!");
	.addImm(isEAXAlive ? NumBytes - 4 : NumBytes)
	.setMIFlag(MachineInstr::FrameSetup);			// Save EAX
	}			BuildMI(MBB, MBBI, DL, TII.get(X86::PUSH32r))
				.addReg(X86::EAX, RegState::Kill)
	// Save a pointer to the MI where we set AX.			.setMIFlag(MachineInstr::FrameSetup);
	MachineBasicBlock::iterator SetRAX = MBBI;			}
	--SetRAX;
				if (Is64Bit) {
	// Call __chkstk, __chkstk_ms, or __alloca.			// Handle the 64-bit Windows ABI case where we need to call __chkstk.
	emitStackProbeCall(MF, MBB, MBBI, DL);			// Function prologue is responsible for adjusting the stack pointer.
				BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::RAX)
	// Apply the frame setup flag to all inserted instrs.			.addImm(NumBytes)
	for (; SetRAX != MBBI; ++SetRAX)			.setMIFlag(MachineInstr::FrameSetup);
	SetRAX->setFlag(MachineInstr::FrameSetup);			} else {
				// Allocate NumBytes-4 bytes on stack in case of isEAXAlive.
	if (isEAXAlive) {			// We'll also use 4 already allocated bytes for EAX.
	// Restore EAX			BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), X86::EAX)
	MachineInstr *MI = addRegOffset(BuildMI(MF, DL, TII.get(X86::MOV32rm),			.addImm(isEAXAlive ? NumBytes - 4 : NumBytes)
	X86::EAX),			.setMIFlag(MachineInstr::FrameSetup);
	StackPtr, false, NumBytes - 4);			}
	MI->setFlag(MachineInstr::FrameSetup);
	MBB.insert(MBBI, MI);			// Save a pointer to the MI where we set AX.
	}			MachineBasicBlock::iterator SetRAX = MBBI;
	} else if (NumBytes) {			--SetRAX;
	emitSPUpdate(MBB, MBBI, StackPtr, -(int64_t)NumBytes, Is64Bit, Uses64BitFramePtr,
	UseLEA, TII, *RegInfo);			// Call __chkstk, __chkstk_ms, or __alloca.
	}			emitStackProbeCall(MF, MBB, MBBI, DL);

	int SEHFrameOffset = 0;			// Apply the frame setup flag to all inserted instrs.
	if (NeedsWinEH) {			for (; SetRAX != MBBI; ++SetRAX)
	if (HasFP) {			SetRAX->setFlag(MachineInstr::FrameSetup);
	// We need to set frame base offset low enough such that all saved
	// register offsets would be positive relative to it, but we can't			if (isEAXAlive) {
	// just use NumBytes, because .seh_setframe offset must be <=240.			// Restore EAX
	// So we pretend to have only allocated enough space to spill the			MachineInstr *MI = addRegOffset(BuildMI(MF, DL, TII.get(X86::MOV32rm),
	// non-volatile registers.			X86::EAX),
	// We don't care about the rest of stack allocation, because unwinder			StackPtr, false, NumBytes - 4);
	// will restore SP to (BP - SEHFrameOffset)			MI->setFlag(MachineInstr::FrameSetup);
	for (const CalleeSavedInfo &Info : MFI->getCalleeSavedInfo()) {			MBB.insert(MBBI, MI);
	int offset = MFI->getObjectOffset(Info.getFrameIdx());			}
	SEHFrameOffset = std::max(SEHFrameOffset, std::abs(offset));			} else if (NumBytes) {
	}			emitSPUpdate(MBB, MBBI, StackPtr, -(int64_t)NumBytes, Is64Bit, Uses64BitFramePtr,
	SEHFrameOffset += SEHFrameOffset % 16; // ensure alignmant			UseLEA, TII, *RegInfo);
				}
	// This only needs to account for XMM spill slots, GPR slots
	// are covered by the .seh_pushreg's emitted above.			int SEHFrameOffset = 0;
	unsigned Size = SEHFrameOffset - X86FI->getCalleeSavedFrameSize();			if (NeedsWinEH) {
	if (Size) {			if (HasFP) {
	BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_StackAlloc))			// We need to set frame base offset low enough such that all saved
	.addImm(Size)			// register offsets would be positive relative to it, but we can't
	.setMIFlag(MachineInstr::FrameSetup);			// just use NumBytes, because .seh_setframe offset must be <=240.
	}			// So we pretend to have only allocated enough space to spill the
				// non-volatile registers.
	BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SetFrame))			// We don't care about the rest of stack allocation, because unwinder
	.addImm(FramePtr)			// will restore SP to (BP - SEHFrameOffset)
	.addImm(SEHFrameOffset)			for (const CalleeSavedInfo &Info : MFI->getCalleeSavedInfo()) {
	.setMIFlag(MachineInstr::FrameSetup);			int offset = MFI->getObjectOffset(Info.getFrameIdx());
	} else {			SEHFrameOffset = std::max(SEHFrameOffset, std::abs(offset));
	// SP will be the base register for restoring XMMs			}
	if (NumBytes) {			SEHFrameOffset += SEHFrameOffset % 16; // ensure alignmant
	BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_StackAlloc))
	.addImm(NumBytes)			// This only needs to account for XMM spill slots, GPR slots
	.setMIFlag(MachineInstr::FrameSetup);			// are covered by the .seh_pushreg's emitted above.
	}			unsigned Size = SEHFrameOffset - X86FI->getCalleeSavedFrameSize();
	}			if (Size) {
	}			BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_StackAlloc))
				.addImm(Size)
	// Skip the rest of register spilling code			.setMIFlag(MachineInstr::FrameSetup);
	while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))			}
	++MBBI;
				BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SetFrame))
	// Emit SEH info for non-GPRs			.addImm(FramePtr)
	if (NeedsWinEH) {			.addImm(SEHFrameOffset)
	for (const CalleeSavedInfo &Info : MFI->getCalleeSavedInfo()) {			.setMIFlag(MachineInstr::FrameSetup);
	unsigned Reg = Info.getReg();			} else {
	if (X86::GR64RegClass.contains(Reg) || X86::GR32RegClass.contains(Reg))			// SP will be the base register for restoring XMMs
	continue;			if (NumBytes) {
	assert(X86::FR64RegClass.contains(Reg) && "Unexpected register class");			BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_StackAlloc))
				.addImm(NumBytes)
	int Offset = getFrameIndexOffset(MF, Info.getFrameIdx());			.setMIFlag(MachineInstr::FrameSetup);
	Offset += SEHFrameOffset;			}
				}
	BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SaveXMM))			}
	.addImm(Reg)
	.addImm(Offset)			// Skip the rest of register spilling code
	.setMIFlag(MachineInstr::FrameSetup);			while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
	}			++MBBI;

	BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_EndPrologue))			// Emit SEH info for non-GPRs
	.setMIFlag(MachineInstr::FrameSetup);			if (NeedsWinEH) {
	}			for (const CalleeSavedInfo &Info : MFI->getCalleeSavedInfo()) {
				unsigned Reg = Info.getReg();
	// If we need a base pointer, set it up here. It's whatever the value			if (X86::GR64RegClass.contains(Reg) || X86::GR32RegClass.contains(Reg))
	// of the stack pointer is at this point. Any variable size objects			continue;
	// will be allocated after this, so we can still use the base pointer			assert(X86::FR64RegClass.contains(Reg) && "Unexpected register class");
	// to reference locals.
	if (RegInfo->hasBasePointer(MF)) {			int Offset = getFrameIndexOffset(MF, Info.getFrameIdx());
	// Update the base pointer with the current stack pointer.			Offset += SEHFrameOffset;
	unsigned Opc = Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr;
	BuildMI(MBB, MBBI, DL, TII.get(Opc), BasePtr)			BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SaveXMM))
	.addReg(StackPtr)			.addImm(Reg)
	.setMIFlag(MachineInstr::FrameSetup);			.addImm(Offset)
	if (X86FI->getRestoreBasePointer()) {			.setMIFlag(MachineInstr::FrameSetup);
	// Stash value of base pointer. Saving RSP instead of EBP shortens dependence chain.			}
	unsigned Opm = Uses64BitFramePtr ? X86::MOV64mr : X86::MOV32mr;
	addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(Opm)),			BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_EndPrologue))
	FramePtr, true, X86FI->getRestoreBasePointerOffset())			.setMIFlag(MachineInstr::FrameSetup);
	.addReg(StackPtr)			}
	.setMIFlag(MachineInstr::FrameSetup);
	}			// If we need a base pointer, set it up here. It's whatever the value
	}			// of the stack pointer is at this point. Any variable size objects
				// will be allocated after this, so we can still use the base pointer
	if (((!HasFP && NumBytes) || PushedRegs) && NeedsDwarfCFI) {			// to reference locals.
	// Mark end of stack pointer adjustment.			if (RegInfo->hasBasePointer(MF)) {
	if (!HasFP && NumBytes) {			// Update the base pointer with the current stack pointer.
	// Define the current CFA rule to use the provided offset.			unsigned Opc = Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr;
	assert(StackSize);			BuildMI(MBB, MBBI, DL, TII.get(Opc), BasePtr)
	unsigned CFIIndex = MMI.addFrameInst(			.addReg(StackPtr)
	MCCFIInstruction::createDefCfaOffset(nullptr,			.setMIFlag(MachineInstr::FrameSetup);
	-StackSize + stackGrowth));			if (X86FI->getRestoreBasePointer()) {
				// Stash value of base pointer. Saving RSP instead of EBP shortens dependence chain.
	BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))			unsigned Opm = Uses64BitFramePtr ? X86::MOV64mr : X86::MOV32mr;
	.addCFIIndex(CFIIndex);			addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(Opm)),
	}			FramePtr, true, X86FI->getRestoreBasePointerOffset())
				.addReg(StackPtr)
	// Emit DWARF info specifying the offsets of the callee-saved registers.			.setMIFlag(MachineInstr::FrameSetup);
	if (PushedRegs)			}
	emitCalleeSavedFrameMoves(MBB, MBBI, DL);			}
	}
	}			if (((!HasFP && NumBytes) || PushedRegs) && NeedsDwarfCFI) {
				// Mark end of stack pointer adjustment.
	void X86FrameLowering::emitEpilogue(MachineFunction &MF,			if (!HasFP && NumBytes) {
	MachineBasicBlock &MBB) const {			// Define the current CFA rule to use the provided offset.
	const MachineFrameInfo *MFI = MF.getFrameInfo();			assert(StackSize);
	X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();			unsigned CFIIndex = MMI.addFrameInst(
	const X86RegisterInfo *RegInfo =			MCCFIInstruction::createDefCfaOffset(nullptr,
	static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());			-StackSize + stackGrowth));
	const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
	MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();			BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
	assert(MBBI != MBB.end() && "Returning block has no instructions");			.addCFIIndex(CFIIndex);
	unsigned RetOpcode = MBBI->getOpcode();			}
	DebugLoc DL = MBBI->getDebugLoc();
	const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();			// Emit DWARF info specifying the offsets of the callee-saved registers.
	bool Is64Bit = STI.is64Bit();			if (PushedRegs)
	// standard x86_64 and NaCl use 64-bit frame/stack pointers, x32 - 32-bit.			emitCalleeSavedFrameMoves(MBB, MBBI, DL);
	const bool Uses64BitFramePtr = STI.isTarget64BitLP64() || STI.isTargetNaCl64();			}
	const bool Is64BitILP32 = STI.isTarget64BitILP32();			}
	bool UseLEA = STI.useLeaForSP();
	unsigned StackAlign = getStackAlignment();			void X86FrameLowering::emitEpilogue(MachineFunction &MF,
	unsigned SlotSize = RegInfo->getSlotSize();			MachineBasicBlock &MBB) const {
	unsigned FramePtr = RegInfo->getFrameRegister(MF);			const MachineFrameInfo *MFI = MF.getFrameInfo();
	unsigned MachineFramePtr = Is64BitILP32 ?			X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
	getX86SubSuperRegister(FramePtr, MVT::i64, false) : FramePtr;			const X86RegisterInfo *RegInfo =
	unsigned StackPtr = RegInfo->getStackRegister();			static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());
				const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
	bool IsWinEH = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();			MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
	bool NeedsWinEH = IsWinEH && MF.getFunction()->needsUnwindTableEntry();			assert(MBBI != MBB.end() && "Returning block has no instructions");
				unsigned RetOpcode = MBBI->getOpcode();
	switch (RetOpcode) {			DebugLoc DL = MBBI->getDebugLoc();
	default:			const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
	llvm_unreachable("Can only insert epilog into returning blocks");			bool Is64Bit = STI.is64Bit();
	case X86::RETQ:			// standard x86_64 and NaCl use 64-bit frame/stack pointers, x32 - 32-bit.
	case X86::RETL:			const bool Uses64BitFramePtr = STI.isTarget64BitLP64() || STI.isTargetNaCl64();
	case X86::RETIL:			const bool Is64BitILP32 = STI.isTarget64BitILP32();
	case X86::RETIQ:			bool UseLEA = STI.useLeaForSP();
	case X86::TCRETURNdi:			unsigned StackAlign = getStackAlignment();
	case X86::TCRETURNri:			unsigned SlotSize = RegInfo->getSlotSize();
	case X86::TCRETURNmi:			unsigned FramePtr = RegInfo->getFrameRegister(MF);
	case X86::TCRETURNdi64:			unsigned MachineFramePtr = Is64BitILP32 ?
	case X86::TCRETURNri64:			getX86SubSuperRegister(FramePtr, MVT::i64, false) : FramePtr;
	case X86::TCRETURNmi64:			unsigned StackPtr = RegInfo->getStackRegister();
	case X86::EH_RETURN:
	case X86::EH_RETURN64:			bool IsWinEH = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
	break; // These are ok			bool NeedsWinEH = IsWinEH && MF.getFunction()->needsUnwindTableEntry();
	}
				switch (RetOpcode) {
	// Get the number of bytes to allocate from the FrameInfo.			default:
	uint64_t StackSize = MFI->getStackSize();			llvm_unreachable("Can only insert epilog into returning blocks");
	uint64_t MaxAlign = MFI->getMaxAlignment();			case X86::RETQ:
	unsigned CSSize = X86FI->getCalleeSavedFrameSize();			case X86::RETL:
	uint64_t NumBytes = 0;			case X86::RETIL:
				case X86::RETIQ:
	// If we're forcing a stack realignment we can't rely on just the frame			case X86::TCRETURNdi:
	// info, we need to know the ABI stack alignment as well in case we			case X86::TCRETURNri:
	// have a call out. Otherwise just make sure we have some alignment - we'll			case X86::TCRETURNmi:
	// go with the minimum.			case X86::TCRETURNdi64:
	if (ForceStackAlign) {			case X86::TCRETURNri64:
	if (MFI->hasCalls())			case X86::TCRETURNmi64:
	MaxAlign = (StackAlign > MaxAlign) ? StackAlign : MaxAlign;			case X86::EH_RETURN:
	else			case X86::EH_RETURN64:
	MaxAlign = MaxAlign ? MaxAlign : 4;			break; // These are ok
	}			}

	if (hasFP(MF)) {			// Get the number of bytes to allocate from the FrameInfo.
	// Calculate required stack adjustment.			uint64_t StackSize = MFI->getStackSize();
	uint64_t FrameSize = StackSize - SlotSize;			uint64_t MaxAlign = MFI->getMaxAlignment();
	if (RegInfo->needsStackRealignment(MF)) {			unsigned CSSize = X86FI->getCalleeSavedFrameSize();
	// Callee-saved registers were pushed on stack before the stack			uint64_t NumBytes = 0;
	// was realigned.
	FrameSize -= CSSize;			// If we're forcing a stack realignment we can't rely on just the frame
	NumBytes = (FrameSize + MaxAlign - 1) / MaxAlign * MaxAlign;			// info, we need to know the ABI stack alignment as well in case we
	} else {			// have a call out. Otherwise just make sure we have some alignment - we'll
	NumBytes = FrameSize - CSSize;			// go with the minimum.
	}			if (ForceStackAlign) {
				if (MFI->hasCalls())
	// Pop EBP.			MaxAlign = (StackAlign > MaxAlign) ? StackAlign : MaxAlign;
	BuildMI(MBB, MBBI, DL,			else
	TII.get(Is64Bit ? X86::POP64r : X86::POP32r), MachineFramePtr);			MaxAlign = MaxAlign ? MaxAlign : 4;
	} else {			}
	NumBytes = StackSize - CSSize;
	}			if (hasFP(MF)) {
				// Calculate required stack adjustment.
	// Skip the callee-saved pop instructions.			uint64_t FrameSize = StackSize - SlotSize;
	while (MBBI != MBB.begin()) {			if (RegInfo->needsStackRealignment(MF)) {
	MachineBasicBlock::iterator PI = std::prev(MBBI);			// Callee-saved registers were pushed on stack before the stack
	unsigned Opc = PI->getOpcode();			// was realigned.
				FrameSize -= CSSize;
	if (Opc != X86::POP32r && Opc != X86::POP64r && Opc != X86::DBG_VALUE &&			NumBytes = (FrameSize + MaxAlign - 1) / MaxAlign * MaxAlign;
	!PI->isTerminator())			} else {
	break;			NumBytes = FrameSize - CSSize;
				}
	--MBBI;
	}			// Pop EBP.
	MachineBasicBlock::iterator FirstCSPop = MBBI;			BuildMI(MBB, MBBI, DL,
				TII.get(Is64Bit ? X86::POP64r : X86::POP32r), MachineFramePtr);
	DL = MBBI->getDebugLoc();			} else {
				NumBytes = StackSize - CSSize;
	// If there is an ADD32ri or SUB32ri of ESP immediately before this			}
	// instruction, merge the two instructions.
	if (NumBytes || MFI->hasVarSizedObjects())			// Skip the callee-saved pop instructions.
	mergeSPUpdatesUp(MBB, MBBI, StackPtr, &NumBytes);			while (MBBI != MBB.begin()) {
				MachineBasicBlock::iterator PI = std::prev(MBBI);
	// If dynamic alloca is used, then reset esp to point to the last callee-saved			unsigned Opc = PI->getOpcode();
	// slot before popping them off! Same applies for the case, when stack was
	// realigned.			if (Opc != X86::POP32r && Opc != X86::POP64r && Opc != X86::DBG_VALUE &&
	if (RegInfo->needsStackRealignment(MF) || MFI->hasVarSizedObjects()) {			!PI->isTerminator())
	if (RegInfo->needsStackRealignment(MF))			break;
	MBBI = FirstCSPop;
	if (CSSize != 0) {			--MBBI;
	unsigned Opc = getLEArOpcode(Uses64BitFramePtr);			}
	addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr),			MachineBasicBlock::iterator FirstCSPop = MBBI;
	FramePtr, false, -CSSize);
	--MBBI;			DL = MBBI->getDebugLoc();
	} else {
	unsigned Opc = (Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr);			// If there is an ADD32ri or SUB32ri of ESP immediately before this
	BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)			// instruction, merge the two instructions.
	.addReg(FramePtr);			if (NumBytes || MFI->hasVarSizedObjects())
	--MBBI;			mergeSPUpdatesUp(MBB, MBBI, StackPtr, &NumBytes);
	}
	} else if (NumBytes) {			// If dynamic alloca is used, then reset esp to point to the last callee-saved
	// Adjust stack pointer back: ESP += numbytes.			// slot before popping them off! Same applies for the case, when stack was
	emitSPUpdate(MBB, MBBI, StackPtr, NumBytes, Is64Bit, Uses64BitFramePtr, UseLEA,			// realigned.
	TII, *RegInfo);			if (RegInfo->needsStackRealignment(MF) || MFI->hasVarSizedObjects()) {
	--MBBI;			if (RegInfo->needsStackRealignment(MF))
	}			MBBI = FirstCSPop;
				if (CSSize != 0) {
	// Windows unwinder will not invoke function's exception handler if IP is			unsigned Opc = getLEArOpcode(Uses64BitFramePtr);
	// either in prologue or in epilogue. This behavior causes a problem when a			addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr),
	// call immediately precedes an epilogue, because the return address points			FramePtr, false, -CSSize);
	// into the epilogue. To cope with that, we insert an epilogue marker here,			--MBBI;
	// then replace it with a 'nop' if it ends up immediately after a CALL in the			} else {
	// final emitted code.			unsigned Opc = (Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr);
	if (NeedsWinEH)			BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)
	BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_Epilogue));			.addReg(FramePtr);
				--MBBI;
	// We're returning from function via eh_return.			}
	if (RetOpcode == X86::EH_RETURN || RetOpcode == X86::EH_RETURN64) {			} else if (NumBytes) {
	MBBI = MBB.getLastNonDebugInstr();			// Adjust stack pointer back: ESP += numbytes.
	MachineOperand &DestAddr = MBBI->getOperand(0);			emitSPUpdate(MBB, MBBI, StackPtr, NumBytes, Is64Bit, Uses64BitFramePtr, UseLEA,
	assert(DestAddr.isReg() && "Offset should be in register!");			TII, *RegInfo);
	BuildMI(MBB, MBBI, DL,			--MBBI;
	TII.get(Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr),			}
	StackPtr).addReg(DestAddr.getReg());
	} else if (RetOpcode == X86::TCRETURNri || RetOpcode == X86::TCRETURNdi ||			// Windows unwinder will not invoke function's exception handler if IP is
	RetOpcode == X86::TCRETURNmi ||			// either in prologue or in epilogue. This behavior causes a problem when a
	RetOpcode == X86::TCRETURNri64 || RetOpcode == X86::TCRETURNdi64 ||			// call immediately precedes an epilogue, because the return address points
	RetOpcode == X86::TCRETURNmi64) {			// into the epilogue. To cope with that, we insert an epilogue marker here,
	bool isMem = RetOpcode == X86::TCRETURNmi || RetOpcode == X86::TCRETURNmi64;			// then replace it with a 'nop' if it ends up immediately after a CALL in the
	// Tail call return: adjust the stack pointer and jump to callee.			// final emitted code.
	MBBI = MBB.getLastNonDebugInstr();			if (NeedsWinEH)
	MachineOperand &JumpTarget = MBBI->getOperand(0);			BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_Epilogue));
	MachineOperand &StackAdjust = MBBI->getOperand(isMem ? 5 : 1);
	assert(StackAdjust.isImm() && "Expecting immediate value.");			// We're returning from function via eh_return.
				if (RetOpcode == X86::EH_RETURN || RetOpcode == X86::EH_RETURN64) {
	// Adjust stack pointer.			MBBI = MBB.getLastNonDebugInstr();
	int StackAdj = StackAdjust.getImm();			MachineOperand &DestAddr = MBBI->getOperand(0);
	int MaxTCDelta = X86FI->getTCReturnAddrDelta();			assert(DestAddr.isReg() && "Offset should be in register!");
	int Offset = 0;			BuildMI(MBB, MBBI, DL,
	assert(MaxTCDelta <= 0 && "MaxTCDelta should never be positive");			TII.get(Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr),
				StackPtr).addReg(DestAddr.getReg());
	// Incoporate the retaddr area.			} else if (RetOpcode == X86::TCRETURNri || RetOpcode == X86::TCRETURNdi ||
	Offset = StackAdj-MaxTCDelta;			RetOpcode == X86::TCRETURNmi ||
	assert(Offset >= 0 && "Offset should never be negative");			RetOpcode == X86::TCRETURNri64 || RetOpcode == X86::TCRETURNdi64 ||
				RetOpcode == X86::TCRETURNmi64) {
	if (Offset) {			bool isMem = RetOpcode == X86::TCRETURNmi || RetOpcode == X86::TCRETURNmi64;
	// Check for possible merge with preceding ADD instruction.			// Tail call return: adjust the stack pointer and jump to callee.
	Offset += mergeSPUpdates(MBB, MBBI, StackPtr, true);			MBBI = MBB.getLastNonDebugInstr();
	emitSPUpdate(MBB, MBBI, StackPtr, Offset, Is64Bit, Uses64BitFramePtr,			MachineOperand &JumpTarget = MBBI->getOperand(0);
	UseLEA, TII, *RegInfo);			MachineOperand &StackAdjust = MBBI->getOperand(isMem ? 5 : 1);
	}			assert(StackAdjust.isImm() && "Expecting immediate value.");

	// Jump to label or value in register.			// Adjust stack pointer.
	bool IsWin64 = STI.isTargetWin64();			int StackAdj = StackAdjust.getImm();
	if (RetOpcode == X86::TCRETURNdi || RetOpcode == X86::TCRETURNdi64) {			int MaxTCDelta = X86FI->getTCReturnAddrDelta();
	unsigned Op = (RetOpcode == X86::TCRETURNdi)			int Offset = 0;
	? X86::TAILJMPd			assert(MaxTCDelta <= 0 && "MaxTCDelta should never be positive");
	: (IsWin64 ? X86::TAILJMPd64_REX : X86::TAILJMPd64);
	MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(Op));			// Incoporate the retaddr area.
	if (JumpTarget.isGlobal())			Offset = StackAdj-MaxTCDelta;
	MIB.addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset(),			assert(Offset >= 0 && "Offset should never be negative");
	JumpTarget.getTargetFlags());
	else {			if (Offset) {
	assert(JumpTarget.isSymbol());			// Check for possible merge with preceding ADD instruction.
	MIB.addExternalSymbol(JumpTarget.getSymbolName(),			Offset += mergeSPUpdates(MBB, MBBI, StackPtr, true);
	JumpTarget.getTargetFlags());			emitSPUpdate(MBB, MBBI, StackPtr, Offset, Is64Bit, Uses64BitFramePtr,
	}			UseLEA, TII, *RegInfo);
	} else if (RetOpcode == X86::TCRETURNmi || RetOpcode == X86::TCRETURNmi64) {			}
	unsigned Op = (RetOpcode == X86::TCRETURNmi)
	? X86::TAILJMPm			// Jump to label or value in register.
	: (IsWin64 ? X86::TAILJMPm64_REX : X86::TAILJMPm64);			bool IsWin64 = STI.isTargetWin64();
	MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(Op));			if (RetOpcode == X86::TCRETURNdi || RetOpcode == X86::TCRETURNdi64) {
	for (unsigned i = 0; i != 5; ++i)			unsigned Op = (RetOpcode == X86::TCRETURNdi)
	MIB.addOperand(MBBI->getOperand(i));			? X86::TAILJMPd
	} else if (RetOpcode == X86::TCRETURNri64) {			: (IsWin64 ? X86::TAILJMPd64_REX : X86::TAILJMPd64);
	BuildMI(MBB, MBBI, DL,			MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(Op));
	TII.get(IsWin64 ? X86::TAILJMPr64_REX : X86::TAILJMPr64))			if (JumpTarget.isGlobal())
	.addReg(JumpTarget.getReg(), RegState::Kill);			MIB.addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset(),
	} else {			JumpTarget.getTargetFlags());
	BuildMI(MBB, MBBI, DL, TII.get(X86::TAILJMPr)).			else {
	addReg(JumpTarget.getReg(), RegState::Kill);			assert(JumpTarget.isSymbol());
	}			MIB.addExternalSymbol(JumpTarget.getSymbolName(),
				JumpTarget.getTargetFlags());
	MachineInstr *NewMI = std::prev(MBBI);			}
	NewMI->copyImplicitOps(MF, MBBI);			} else if (RetOpcode == X86::TCRETURNmi || RetOpcode == X86::TCRETURNmi64) {
				unsigned Op = (RetOpcode == X86::TCRETURNmi)
	// Delete the pseudo instruction TCRETURN.			? X86::TAILJMPm
	MBB.erase(MBBI);			: (IsWin64 ? X86::TAILJMPm64_REX : X86::TAILJMPm64);
	} else if ((RetOpcode == X86::RETQ || RetOpcode == X86::RETL ||			MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(Op));
	RetOpcode == X86::RETIQ || RetOpcode == X86::RETIL) &&			for (unsigned i = 0; i != 5; ++i)
	(X86FI->getTCReturnAddrDelta() < 0)) {			MIB.addOperand(MBBI->getOperand(i));
	// Add the return addr area delta back since we are not tail calling.			} else if (RetOpcode == X86::TCRETURNri64) {
	int delta = -1*X86FI->getTCReturnAddrDelta();			BuildMI(MBB, MBBI, DL,
	MBBI = MBB.getLastNonDebugInstr();			TII.get(IsWin64 ? X86::TAILJMPr64_REX : X86::TAILJMPr64))
				.addReg(JumpTarget.getReg(), RegState::Kill);
	// Check for possible merge with preceding ADD instruction.			} else {
	delta += mergeSPUpdates(MBB, MBBI, StackPtr, true);			BuildMI(MBB, MBBI, DL, TII.get(X86::TAILJMPr)).
	emitSPUpdate(MBB, MBBI, StackPtr, delta, Is64Bit, Uses64BitFramePtr, UseLEA, TII,			addReg(JumpTarget.getReg(), RegState::Kill);
	*RegInfo);			}
	}
	}			MachineInstr *NewMI = std::prev(MBBI);
				NewMI->copyImplicitOps(MF, MBBI);
	int X86FrameLowering::getFrameIndexOffset(const MachineFunction &MF,
	int FI) const {			// Delete the pseudo instruction TCRETURN.
	const X86RegisterInfo *RegInfo =			MBB.erase(MBBI);
	static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());			} else if ((RetOpcode == X86::RETQ || RetOpcode == X86::RETL ||
	const MachineFrameInfo *MFI = MF.getFrameInfo();			RetOpcode == X86::RETIQ || RetOpcode == X86::RETIL) &&
	int Offset = MFI->getObjectOffset(FI) - getOffsetOfLocalArea();			(X86FI->getTCReturnAddrDelta() < 0)) {
	uint64_t StackSize = MFI->getStackSize();			// Add the return addr area delta back since we are not tail calling.
				int delta = -1*X86FI->getTCReturnAddrDelta();
	if (RegInfo->hasBasePointer(MF)) {			MBBI = MBB.getLastNonDebugInstr();
	assert (hasFP(MF) && "VLAs and dynamic stack realign, but no FP?!");
	if (FI < 0) {			// Check for possible merge with preceding ADD instruction.
	// Skip the saved EBP.			delta += mergeSPUpdates(MBB, MBBI, StackPtr, true);
	return Offset + RegInfo->getSlotSize();			emitSPUpdate(MBB, MBBI, StackPtr, delta, Is64Bit, Uses64BitFramePtr, UseLEA, TII,
	} else {			*RegInfo);
	assert((-(Offset + StackSize)) % MFI->getObjectAlignment(FI) == 0);			}
	return Offset + StackSize;			}
	}
	} else if (RegInfo->needsStackRealignment(MF)) {			int X86FrameLowering::getFrameIndexOffset(const MachineFunction &MF,
	if (FI < 0) {			int FI) const {
	// Skip the saved EBP.			const X86RegisterInfo *RegInfo =
	return Offset + RegInfo->getSlotSize();			static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());
	} else {			const MachineFrameInfo *MFI = MF.getFrameInfo();
	assert((-(Offset + StackSize)) % MFI->getObjectAlignment(FI) == 0);			int Offset = MFI->getObjectOffset(FI) - getOffsetOfLocalArea();
	return Offset + StackSize;			uint64_t StackSize = MFI->getStackSize();
	}
	// FIXME: Support tail calls			if (RegInfo->hasBasePointer(MF)) {
	} else {			assert (hasFP(MF) && "VLAs and dynamic stack realign, but no FP?!");
	if (!hasFP(MF))			if (FI < 0) {
	return Offset + StackSize;			// Skip the saved EBP.
				return Offset + RegInfo->getSlotSize();
	// Skip the saved EBP.			} else {
	Offset += RegInfo->getSlotSize();			assert((-(Offset + StackSize)) % MFI->getObjectAlignment(FI) == 0);
				return Offset + StackSize;
	// Skip the RETADDR move area			}
	const X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();			} else if (RegInfo->needsStackRealignment(MF)) {
	int TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();			if (FI < 0) {
	if (TailCallReturnAddrDelta < 0)			// Skip the saved EBP.
	Offset -= TailCallReturnAddrDelta;			return Offset + RegInfo->getSlotSize();
	}			} else {
				assert((-(Offset + StackSize)) % MFI->getObjectAlignment(FI) == 0);
	return Offset;			return Offset + StackSize;
	}			}
				// FIXME: Support tail calls
	int X86FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,			} else {
	unsigned &FrameReg) const {			if (!hasFP(MF))
	const X86RegisterInfo *RegInfo =			return Offset + StackSize;
	static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());
	// We can't calculate offset from frame pointer if the stack is realigned,			// Skip the saved EBP.
	// so enforce usage of stack/base pointer. The base pointer is used when we			Offset += RegInfo->getSlotSize();
	// have dynamic allocas in addition to dynamic realignment.
	if (RegInfo->hasBasePointer(MF))			// Skip the RETADDR move area
	FrameReg = RegInfo->getBaseRegister();			const X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
	else if (RegInfo->needsStackRealignment(MF))			int TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();
	FrameReg = RegInfo->getStackRegister();			if (TailCallReturnAddrDelta < 0)
	else			Offset -= TailCallReturnAddrDelta;
	FrameReg = RegInfo->getFrameRegister(MF);			}
	return getFrameIndexOffset(MF, FI);
	}			return Offset;
				}
	// Simplified from getFrameIndexOffset keeping only StackPointer cases
	int X86FrameLowering::getFrameIndexOffsetFromSP(const MachineFunction &MF, int FI) const {			int X86FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
	const MachineFrameInfo *MFI = MF.getFrameInfo();			unsigned &FrameReg) const {
	// Does not include any dynamic realign.			const X86RegisterInfo *RegInfo =
	const uint64_t StackSize = MFI->getStackSize();			static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());
	{			// We can't calculate offset from frame pointer if the stack is realigned,
	#ifndef NDEBUG			// so enforce usage of stack/base pointer. The base pointer is used when we
	const X86RegisterInfo *RegInfo =			// have dynamic allocas in addition to dynamic realignment.
	static_cast<const X86RegisterInfo*>(MF.getSubtarget().getRegisterInfo());			if (RegInfo->hasBasePointer(MF))
	// Note: LLVM arranges the stack as:			FrameReg = RegInfo->getBaseRegister();
	// Args > Saved RetPC (<--FP) > CSRs > dynamic alignment (<--BP)			else if (RegInfo->needsStackRealignment(MF))
	// > "Stack Slots" (<--SP)			FrameReg = RegInfo->getStackRegister();
	// We can always address StackSlots from RSP. We can usually (unless			else
	// needsStackRealignment) address CSRs from RSP, but sometimes need to			FrameReg = RegInfo->getFrameRegister(MF);
	// address them from RBP. FixedObjects can be placed anywhere in the stack			return getFrameIndexOffset(MF, FI);
	// frame depending on their specific requirements (i.e. we can actually			}
	// refer to arguments to the function which are stored in the caller's
	// frame). As a result, THE RESULT OF THIS CALL IS MEANINGLESS FOR CSRs			// Simplified from getFrameIndexOffset keeping only StackPointer cases
	// AND FixedObjects IFF needsStackRealignment or hasVarSizedObject.			int X86FrameLowering::getFrameIndexOffsetFromSP(const MachineFunction &MF, int FI) const {
				const MachineFrameInfo *MFI = MF.getFrameInfo();
	assert(!RegInfo->hasBasePointer(MF) && "we don't handle this case");			// Does not include any dynamic realign.
				const uint64_t StackSize = MFI->getStackSize();
	// We don't handle tail calls, and shouldn't be seeing them			{
	// either.			#ifndef NDEBUG
	int TailCallReturnAddrDelta =			const X86RegisterInfo *RegInfo =
	MF.getInfo<X86MachineFunctionInfo>()->getTCReturnAddrDelta();			static_cast<const X86RegisterInfo*>(MF.getSubtarget().getRegisterInfo());
	assert(!(TailCallReturnAddrDelta < 0) && "we don't handle this case!");			// Note: LLVM arranges the stack as:
	#endif			// Args > Saved RetPC (<--FP) > CSRs > dynamic alignment (<--BP)
	}			// > "Stack Slots" (<--SP)
				// We can always address StackSlots from RSP. We can usually (unless
	// This is how the math works out:			// needsStackRealignment) address CSRs from RSP, but sometimes need to
	//			// address them from RBP. FixedObjects can be placed anywhere in the stack
	// %rsp grows (i.e. gets lower) left to right. Each box below is			// frame depending on their specific requirements (i.e. we can actually
	// one word (eight bytes). Obj0 is the stack slot we're trying to			// refer to arguments to the function which are stored in the caller's
	// get to.			// frame). As a result, THE RESULT OF THIS CALL IS MEANINGLESS FOR CSRs
	//			// AND FixedObjects IFF needsStackRealignment or hasVarSizedObject.
	// ----------------------------------
	// | BP | Obj0 | Obj1 | ... | ObjN |			assert(!RegInfo->hasBasePointer(MF) && "we don't handle this case");
	// ----------------------------------
	// ^ ^ ^ ^			// We don't handle tail calls, and shouldn't be seeing them
	// A B C E			// either.
	//			int TailCallReturnAddrDelta =
	// A is the incoming stack pointer.			MF.getInfo<X86MachineFunctionInfo>()->getTCReturnAddrDelta();
	// (B - A) is the local area offset (-8 for x86-64) [1]			assert(!(TailCallReturnAddrDelta < 0) && "we don't handle this case!");
	// (C - A) is the Offset returned by MFI->getObjectOffset for Obj0 [2]			#endif
	//			}
	// |(E - B)| is the StackSize (absolute value, positive). For a
	// stack that grows down, this works out to be (B - E). [3]			// This is how the math works out:
	//			//
	// E is also the value of %rsp after stack has been set up, and we			// %rsp grows (i.e. gets lower) left to right. Each box below is
	// want (C - E) -- the value we can add to %rsp to get to Obj0. Now			// one word (eight bytes). Obj0 is the stack slot we're trying to
	// (C - E) == (C - A) - (B - A) + (B - E)			// get to.
	// { Using [1], [2] and [3] above }			//
	// == getObjectOffset - LocalAreaOffset + StackSize			// ----------------------------------
	//			// | BP | Obj0 | Obj1 | ... | ObjN |
				// ----------------------------------
	// Get the Offset from the StackPointer			// ^ ^ ^ ^
	int Offset = MFI->getObjectOffset(FI) - getOffsetOfLocalArea();			// A B C E
				//
	return Offset + StackSize;			// A is the incoming stack pointer.
	}			// (B - A) is the local area offset (-8 for x86-64) [1]
	// Simplified from getFrameIndexReference keeping only StackPointer cases			// (C - A) is the Offset returned by MFI->getObjectOffset for Obj0 [2]
	int X86FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI,			//
	unsigned &FrameReg) const {			// |(E - B)| is the StackSize (absolute value, positive). For a
	const X86RegisterInfo *RegInfo =			// stack that grows down, this works out to be (B - E). [3]
	static_cast<const X86RegisterInfo*>(MF.getSubtarget().getRegisterInfo());			//
				// E is also the value of %rsp after stack has been set up, and we
	assert(!RegInfo->hasBasePointer(MF) && "we don't handle this case");			// want (C - E) -- the value we can add to %rsp to get to Obj0. Now
				// (C - E) == (C - A) - (B - A) + (B - E)
	FrameReg = RegInfo->getStackRegister();			// { Using [1], [2] and [3] above }
	return getFrameIndexOffsetFromSP(MF, FI);			// == getObjectOffset - LocalAreaOffset + StackSize
	}			//
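The offset arithmetic described in the comment of getFrameIndexOffsetFromSP can be checked in isolation. The following is a minimal sketch and not code from the patch: `FrameLayout` and `offsetFromSP` are hypothetical names, and the three fields stand in for the values the real code reads via `MFI->getObjectOffset(FI)`, `getOffsetOfLocalArea()` and `MFI->getStackSize()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the frame data used by getFrameIndexOffsetFromSP.
struct FrameLayout {
  int ObjectOffset;     // (C - A): offset of the object from the incoming SP
  int LocalAreaOffset;  // (B - A): -8 on x86-64
  uint64_t StackSize;   // (B - E): size of the frame, always non-negative
};

// (C - E) == (C - A) - (B - A) + (B - E)
int offsetFromSP(const FrameLayout &L) {
  return L.ObjectOffset - L.LocalAreaOffset + static_cast<int>(L.StackSize);
}
```

For an object at offset -16, a local area offset of -8 and a 32-byte frame, this places the object 24 bytes above the final stack pointer.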

	bool X86FrameLowering::assignCalleeSavedSpillSlots(			// Get the Offset from the StackPointer
	MachineFunction &MF, const TargetRegisterInfo *TRI,			int Offset = MFI->getObjectOffset(FI) - getOffsetOfLocalArea();
	std::vector<CalleeSavedInfo> &CSI) const {
	MachineFrameInfo *MFI = MF.getFrameInfo();			return Offset + StackSize;
	const X86RegisterInfo *RegInfo =			}
	static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());			// Simplified from getFrameIndexReference keeping only StackPointer cases
	unsigned SlotSize = RegInfo->getSlotSize();			int X86FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI,
	X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();			unsigned &FrameReg) const {
				const X86RegisterInfo *RegInfo =
	unsigned CalleeSavedFrameSize = 0;			static_cast<const X86RegisterInfo*>(MF.getSubtarget().getRegisterInfo());
	int SpillSlotOffset = getOffsetOfLocalArea() + X86FI->getTCReturnAddrDelta();
				assert(!RegInfo->hasBasePointer(MF) && "we don't handle this case");
	if (hasFP(MF)) {
	// emitPrologue always spills frame register the first thing.			FrameReg = RegInfo->getStackRegister();
	SpillSlotOffset -= SlotSize;			return getFrameIndexOffsetFromSP(MF, FI);
	MFI->CreateFixedSpillStackObject(SlotSize, SpillSlotOffset);			}

	// Since emitPrologue and emitEpilogue will handle spilling and restoring of			bool X86FrameLowering::assignCalleeSavedSpillSlots(
	// the frame register, we can delete it from CSI list and not have to worry			MachineFunction &MF, const TargetRegisterInfo *TRI,
	// about avoiding it later.			std::vector<CalleeSavedInfo> &CSI) const {
	unsigned FPReg = RegInfo->getFrameRegister(MF);			MachineFrameInfo *MFI = MF.getFrameInfo();
	for (unsigned i = 0; i < CSI.size(); ++i) {			const X86RegisterInfo *RegInfo =
	if (TRI->regsOverlap(CSI[i].getReg(),FPReg)) {			static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());
	CSI.erase(CSI.begin() + i);			unsigned SlotSize = RegInfo->getSlotSize();
	break;			X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
	}
	}			unsigned CalleeSavedFrameSize = 0;
	}			int SpillSlotOffset = getOffsetOfLocalArea() + X86FI->getTCReturnAddrDelta();

	// Assign slots for GPRs. It increases frame size.			if (hasFP(MF)) {
	for (unsigned i = CSI.size(); i != 0; --i) {			// emitPrologue always spills frame register the first thing.
	unsigned Reg = CSI[i - 1].getReg();			SpillSlotOffset -= SlotSize;
				MFI->CreateFixedSpillStackObject(SlotSize, SpillSlotOffset);
	if (!X86::GR64RegClass.contains(Reg) && !X86::GR32RegClass.contains(Reg))
	continue;			// Since emitPrologue and emitEpilogue will handle spilling and restoring of
				// the frame register, we can delete it from CSI list and not have to worry
	SpillSlotOffset -= SlotSize;			// about avoiding it later.
	CalleeSavedFrameSize += SlotSize;			unsigned FPReg = RegInfo->getFrameRegister(MF);
				for (unsigned i = 0; i < CSI.size(); ++i) {
	int SlotIndex = MFI->CreateFixedSpillStackObject(SlotSize, SpillSlotOffset);			if (TRI->regsOverlap(CSI[i].getReg(),FPReg)) {
	CSI[i - 1].setFrameIdx(SlotIndex);			CSI.erase(CSI.begin() + i);
	}			break;
				}
	X86FI->setCalleeSavedFrameSize(CalleeSavedFrameSize);			}
				}
	// Assign slots for XMMs.
	for (unsigned i = CSI.size(); i != 0; --i) {			// Assign slots for GPRs. It increases frame size.
	unsigned Reg = CSI[i - 1].getReg();			for (unsigned i = CSI.size(); i != 0; --i) {
	if (X86::GR64RegClass.contains(Reg) || X86::GR32RegClass.contains(Reg))			unsigned Reg = CSI[i - 1].getReg();
	continue;
				if (!X86::GR64RegClass.contains(Reg) && !X86::GR32RegClass.contains(Reg))
	const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);			continue;
	// ensure alignment
	SpillSlotOffset -= std::abs(SpillSlotOffset) % RC->getAlignment();			SpillSlotOffset -= SlotSize;
	// spill into slot			CalleeSavedFrameSize += SlotSize;
	SpillSlotOffset -= RC->getSize();
	int SlotIndex =			int SlotIndex = MFI->CreateFixedSpillStackObject(SlotSize, SpillSlotOffset);
	MFI->CreateFixedSpillStackObject(RC->getSize(), SpillSlotOffset);			CSI[i - 1].setFrameIdx(SlotIndex);
	CSI[i - 1].setFrameIdx(SlotIndex);			}
	MFI->ensureMaxAlignment(RC->getAlignment());
	}			X86FI->setCalleeSavedFrameSize(CalleeSavedFrameSize);

	return true;			// Assign slots for XMMs.
	}			for (unsigned i = CSI.size(); i != 0; --i) {
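The XMM slot assignment in assignCalleeSavedSpillSlots first rounds the running offset down to the register class alignment and then reserves the slot. A minimal sketch of that step alone, assuming a negative offset (the stack grows down) and a positive alignment; `nextAlignedSpillOffset` is a hypothetical helper and does not exist in the patch:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper mirroring the two SpillSlotOffset updates in the XMM loop.
// Assumes SpillSlotOffset <= 0 and Align > 0.
int nextAlignedSpillOffset(int SpillSlotOffset, int Size, int Align) {
  // Ensure alignment: move further down to the next multiple of Align.
  SpillSlotOffset -= std::abs(SpillSlotOffset) % Align;
  // Reserve the slot itself.
  SpillSlotOffset -= Size;
  return SpillSlotOffset;
}
```

Starting at offset -24 with a 16-byte, 16-byte-aligned register, the offset is first rounded to -32 and the slot then lands at -48.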
				unsigned Reg = CSI[i - 1].getReg();
	bool X86FrameLowering::spillCalleeSavedRegisters(			if (X86::GR64RegClass.contains(Reg) || X86::GR32RegClass.contains(Reg))
	MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,			continue;
	const std::vector<CalleeSavedInfo> &CSI,
	const TargetRegisterInfo *TRI) const {			const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
	DebugLoc DL = MBB.findDebugLoc(MI);			// ensure alignment
				SpillSlotOffset -= std::abs(SpillSlotOffset) % RC->getAlignment();
	MachineFunction &MF = *MBB.getParent();			// spill into slot
	const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();			SpillSlotOffset -= RC->getSize();
	const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();			int SlotIndex =
				MFI->CreateFixedSpillStackObject(RC->getSize(), SpillSlotOffset);
	// Push GPRs. It increases frame size.			CSI[i - 1].setFrameIdx(SlotIndex);
	unsigned Opc = STI.is64Bit() ? X86::PUSH64r : X86::PUSH32r;			MFI->ensureMaxAlignment(RC->getAlignment());
	for (unsigned i = CSI.size(); i != 0; --i) {			}
	unsigned Reg = CSI[i - 1].getReg();
				return true;
	if (!X86::GR64RegClass.contains(Reg) && !X86::GR32RegClass.contains(Reg))			}
	continue;
	// Add the callee-saved register as live-in. It's killed at the spill.			bool X86FrameLowering::spillCalleeSavedRegisters(
	MBB.addLiveIn(Reg);			MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
				const std::vector<CalleeSavedInfo> &CSI,
	BuildMI(MBB, MI, DL, TII.get(Opc)).addReg(Reg, RegState::Kill)			const TargetRegisterInfo *TRI) const {
	.setMIFlag(MachineInstr::FrameSetup);			DebugLoc DL = MBB.findDebugLoc(MI);
	}
				MachineFunction &MF = *MBB.getParent();
	// Make XMM regs spilled. X86 does not have ability of push/pop XMM.			const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
	// It can be done by spilling XMMs to stack frame.			const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
	for (unsigned i = CSI.size(); i != 0; --i) {
	unsigned Reg = CSI[i-1].getReg();			// Push GPRs. It increases frame size.
	if (X86::GR64RegClass.contains(Reg) ||			unsigned Opc = STI.is64Bit() ? X86::PUSH64r : X86::PUSH32r;
	X86::GR32RegClass.contains(Reg))			for (unsigned i = CSI.size(); i != 0; --i) {
	continue;			unsigned Reg = CSI[i - 1].getReg();
	// Add the callee-saved register as live-in. It's killed at the spill.
	MBB.addLiveIn(Reg);			if (!X86::GR64RegClass.contains(Reg) && !X86::GR32RegClass.contains(Reg))
	const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);			continue;
				// Add the callee-saved register as live-in. It's killed at the spill.
	TII.storeRegToStackSlot(MBB, MI, Reg, true, CSI[i - 1].getFrameIdx(), RC,			MBB.addLiveIn(Reg);
	TRI);
	--MI;			BuildMI(MBB, MI, DL, TII.get(Opc)).addReg(Reg, RegState::Kill)
	MI->setFlag(MachineInstr::FrameSetup);			.setMIFlag(MachineInstr::FrameSetup);
	++MI;			}
	}
				// Make XMM regs spilled. X86 does not have ability of push/pop XMM.
	return true;			// It can be done by spilling XMMs to stack frame.
	}			for (unsigned i = CSI.size(); i != 0; --i) {
				unsigned Reg = CSI[i-1].getReg();
	bool X86FrameLowering::restoreCalleeSavedRegisters(MachineBasicBlock &MBB,			if (X86::GR64RegClass.contains(Reg) ||
	MachineBasicBlock::iterator MI,			X86::GR32RegClass.contains(Reg))
	const std::vector<CalleeSavedInfo> &CSI,			continue;
	const TargetRegisterInfo *TRI) const {			// Add the callee-saved register as live-in. It's killed at the spill.
	if (CSI.empty())			MBB.addLiveIn(Reg);
	return false;			const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);

	DebugLoc DL = MBB.findDebugLoc(MI);			TII.storeRegToStackSlot(MBB, MI, Reg, true, CSI[i - 1].getFrameIdx(), RC,
				TRI);
	MachineFunction &MF = *MBB.getParent();			--MI;
	const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();			MI->setFlag(MachineInstr::FrameSetup);
	const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();			++MI;
				}
	// Reload XMMs from stack frame.
	for (unsigned i = 0, e = CSI.size(); i != e; ++i) {			return true;
	unsigned Reg = CSI[i].getReg();			}
	if (X86::GR64RegClass.contains(Reg) ||
	X86::GR32RegClass.contains(Reg))			bool X86FrameLowering::restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
	continue;			MachineBasicBlock::iterator MI,
				const std::vector<CalleeSavedInfo> &CSI,
	const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);			const TargetRegisterInfo *TRI) const {
	TII.loadRegFromStackSlot(MBB, MI, Reg, CSI[i].getFrameIdx(), RC, TRI);			if (CSI.empty())
	}			return false;

	// POP GPRs.			DebugLoc DL = MBB.findDebugLoc(MI);
	unsigned Opc = STI.is64Bit() ? X86::POP64r : X86::POP32r;
	for (unsigned i = 0, e = CSI.size(); i != e; ++i) {			MachineFunction &MF = *MBB.getParent();
	unsigned Reg = CSI[i].getReg();			const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
	if (!X86::GR64RegClass.contains(Reg) &&			const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
	!X86::GR32RegClass.contains(Reg))
	continue;			// Reload XMMs from stack frame.
				for (unsigned i = 0, e = CSI.size(); i != e; ++i) {
	BuildMI(MBB, MI, DL, TII.get(Opc), Reg);			unsigned Reg = CSI[i].getReg();
	}			if (X86::GR64RegClass.contains(Reg) ||
	return true;			X86::GR32RegClass.contains(Reg))
	}			continue;

	void			const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);
	X86FrameLowering::processFunctionBeforeCalleeSavedScan(MachineFunction &MF,			TII.loadRegFromStackSlot(MBB, MI, Reg, CSI[i].getFrameIdx(), RC, TRI);
	RegScavenger *RS) const {			}
	MachineFrameInfo *MFI = MF.getFrameInfo();
	const X86RegisterInfo *RegInfo =			// POP GPRs.
	static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());			unsigned Opc = STI.is64Bit() ? X86::POP64r : X86::POP32r;
	unsigned SlotSize = RegInfo->getSlotSize();			for (unsigned i = 0, e = CSI.size(); i != e; ++i) {
				unsigned Reg = CSI[i].getReg();
	X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();			if (!X86::GR64RegClass.contains(Reg) &&
	int64_t TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();			!X86::GR32RegClass.contains(Reg))
				continue;
	if (TailCallReturnAddrDelta < 0) {
	// create RETURNADDR area			BuildMI(MBB, MI, DL, TII.get(Opc), Reg);
	// arg			}
	// arg			return true;
	// RETADDR			}
	// { ...
	// RETADDR area			void
	// ...			X86FrameLowering::processFunctionBeforeCalleeSavedScan(MachineFunction &MF,
	// }			RegScavenger *RS) const {
	// [EBP]			MachineFrameInfo *MFI = MF.getFrameInfo();
	MFI->CreateFixedObject(-TailCallReturnAddrDelta,			const X86RegisterInfo *RegInfo =
	TailCallReturnAddrDelta - SlotSize, true);			static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());
	}			unsigned SlotSize = RegInfo->getSlotSize();

	// Spill the BasePtr if it's used.			X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
	if (RegInfo->hasBasePointer(MF))			int64_t TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();
	MF.getRegInfo().setPhysRegUsed(RegInfo->getBaseRegister());
	}			if (TailCallReturnAddrDelta < 0) {
				// create RETURNADDR area
	static bool			// arg
	HasNestArgument(const MachineFunction *MF) {			// arg
	const Function *F = MF->getFunction();			// RETADDR
	for (Function::const_arg_iterator I = F->arg_begin(), E = F->arg_end();			// { ...
	I != E; I++) {			// RETADDR area
	if (I->hasNestAttr())			// ...
	return true;			// }
	}			// [EBP]
	return false;			MFI->CreateFixedObject(-TailCallReturnAddrDelta,
	}			TailCallReturnAddrDelta - SlotSize, true);
				}
	/// GetScratchRegister - Get a temp register for performing work in the
	/// segmented stack and the Erlang/HiPE stack prologue. Depending on platform			// Spill the BasePtr if it's used.
	/// and the properties of the function either one or two registers will be			if (RegInfo->hasBasePointer(MF))
	/// needed. Set primary to true for the first register, false for the second.			MF.getRegInfo().setPhysRegUsed(RegInfo->getBaseRegister());
	static unsigned			}
	GetScratchRegister(bool Is64Bit, bool IsLP64, const MachineFunction &MF, bool Primary) {
	CallingConv::ID CallingConvention = MF.getFunction()->getCallingConv();			static bool
				HasNestArgument(const MachineFunction *MF) {
	// Erlang stuff.			const Function *F = MF->getFunction();
	if (CallingConvention == CallingConv::HiPE) {			for (Function::const_arg_iterator I = F->arg_begin(), E = F->arg_end();
	if (Is64Bit)			I != E; I++) {
	return Primary ? X86::R14 : X86::R13;			if (I->hasNestAttr())
	else			return true;
	return Primary ? X86::EBX : X86::EDI;			}
	}			return false;
				}
	if (Is64Bit) {
	if (IsLP64)			/// GetScratchRegister - Get a temp register for performing work in the
	return Primary ? X86::R11 : X86::R12;			/// segmented stack and the Erlang/HiPE stack prologue. Depending on platform
	else			/// and the properties of the function either one or two registers will be
	return Primary ? X86::R11D : X86::R12D;			/// needed. Set primary to true for the first register, false for the second.
	}			static unsigned
				GetScratchRegister(bool Is64Bit, bool IsLP64, const MachineFunction &MF, bool Primary) {
	bool IsNested = HasNestArgument(&MF);			CallingConv::ID CallingConvention = MF.getFunction()->getCallingConv();

	if (CallingConvention == CallingConv::X86_FastCall ||			// Erlang stuff.
	CallingConvention == CallingConv::Fast) {			if (CallingConvention == CallingConv::HiPE) {
	if (IsNested)			if (Is64Bit)
	report_fatal_error("Segmented stacks does not support fastcall with "			return Primary ? X86::R14 : X86::R13;
	"nested function.");			else
	return Primary ? X86::EAX : X86::ECX;			return Primary ? X86::EBX : X86::EDI;
	}			}
	if (IsNested)
	return Primary ? X86::EDX : X86::EAX;			if (Is64Bit) {
	return Primary ? X86::ECX : X86::EAX;			if (IsLP64)
	}			return Primary ? X86::R11 : X86::R12;
				else
	// The stack limit in the TCB is set to this many bytes above the actual stack			return Primary ? X86::R11D : X86::R12D;
	// limit.			}
	static const uint64_t kSplitStackAvailable = 256;
				bool IsNested = HasNestArgument(&MF);
	void
	X86FrameLowering::adjustForSegmentedStacks(MachineFunction &MF) const {			if (CallingConvention == CallingConv::X86_FastCall ||
	MachineBasicBlock &prologueMBB = MF.front();			CallingConvention == CallingConv::Fast) {
	MachineFrameInfo *MFI = MF.getFrameInfo();			if (IsNested)
	const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();			report_fatal_error("Segmented stacks does not support fastcall with "
	uint64_t StackSize;			"nested function.");
	const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();			return Primary ? X86::EAX : X86::ECX;
	bool Is64Bit = STI.is64Bit();			}
	const bool IsLP64 = STI.isTarget64BitLP64();			if (IsNested)
	unsigned TlsReg, TlsOffset;			return Primary ? X86::EDX : X86::EAX;
	DebugLoc DL;			return Primary ? X86::ECX : X86::EAX;
				}
	unsigned ScratchReg = GetScratchRegister(Is64Bit, IsLP64, MF, true);
	assert(!MF.getRegInfo().isLiveIn(ScratchReg) &&			// The stack limit in the TCB is set to this many bytes above the actual stack
	"Scratch register is live-in");			// limit.
				static const uint64_t kSplitStackAvailable = 256;
	if (MF.getFunction()->isVarArg())
	report_fatal_error("Segmented stacks do not support vararg functions.");			void
	if (!STI.isTargetLinux() && !STI.isTargetDarwin() && !STI.isTargetWin32() &&			X86FrameLowering::adjustForSegmentedStacks(MachineFunction &MF) const {
	!STI.isTargetWin64() && !STI.isTargetFreeBSD() &&			MachineBasicBlock &prologueMBB = MF.front();
	!STI.isTargetDragonFly())			MachineFrameInfo *MFI = MF.getFrameInfo();
	report_fatal_error("Segmented stacks not supported on this platform.");			const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
				uint64_t StackSize;
	// Eventually StackSize will be calculated by a link-time pass; which will			const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
	// also decide whether checking code needs to be injected into this particular			bool Is64Bit = STI.is64Bit();
	// prologue.			const bool IsLP64 = STI.isTarget64BitLP64();
	StackSize = MFI->getStackSize();			unsigned TlsReg, TlsOffset;
				DebugLoc DL;
	// Do not generate a prologue for functions with a stack of size zero
	if (StackSize == 0)			unsigned ScratchReg = GetScratchRegister(Is64Bit, IsLP64, MF, true);
	return;			assert(!MF.getRegInfo().isLiveIn(ScratchReg) &&
				"Scratch register is live-in");
	MachineBasicBlock *allocMBB = MF.CreateMachineBasicBlock();
	MachineBasicBlock *checkMBB = MF.CreateMachineBasicBlock();			if (MF.getFunction()->isVarArg())
	X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();			report_fatal_error("Segmented stacks do not support vararg functions.");
	bool IsNested = false;			if (!STI.isTargetLinux() && !STI.isTargetDarwin() && !STI.isTargetWin32() &&
				!STI.isTargetWin64() && !STI.isTargetFreeBSD() &&
	// We need to know if the function has a nest argument only in 64 bit mode.			!STI.isTargetDragonFly())
	if (Is64Bit)			report_fatal_error("Segmented stacks not supported on this platform.");
	IsNested = HasNestArgument(&MF);
				// Eventually StackSize will be calculated by a link-time pass; which will
	// The MOV R10, RAX needs to be in a different block, since the RET we emit in			// also decide whether checking code needs to be injected into this particular
	// allocMBB needs to be last (terminating) instruction.			// prologue.
				StackSize = MFI->getStackSize();
	for (MachineBasicBlock::livein_iterator i = prologueMBB.livein_begin(),
	e = prologueMBB.livein_end(); i != e; i++) {			// Do not generate a prologue for functions with a stack of size zero
	allocMBB->addLiveIn(*i);			if (StackSize == 0)
	checkMBB->addLiveIn(*i);			return;
	}
				MachineBasicBlock *allocMBB = MF.CreateMachineBasicBlock();
	if (IsNested)			MachineBasicBlock *checkMBB = MF.CreateMachineBasicBlock();
	allocMBB->addLiveIn(IsLP64 ? X86::R10 : X86::R10D);			X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
				bool IsNested = false;
	MF.push_front(allocMBB);
	MF.push_front(checkMBB);			// We need to know if the function has a nest argument only in 64 bit mode.
				if (Is64Bit)
	// When the frame size is less than 256 we just compare the stack			IsNested = HasNestArgument(&MF);
	// boundary directly to the value of the stack pointer, per gcc.
	bool CompareStackPointer = StackSize < kSplitStackAvailable;			// The MOV R10, RAX needs to be in a different block, since the RET we emit in
				// allocMBB needs to be last (terminating) instruction.
	// Read the limit off the current stacklet off the stack_guard location.
	if (Is64Bit) {			for (MachineBasicBlock::livein_iterator i = prologueMBB.livein_begin(),
	if (STI.isTargetLinux()) {			e = prologueMBB.livein_end(); i != e; i++) {
	TlsReg = X86::FS;			allocMBB->addLiveIn(*i);
	TlsOffset = IsLP64 ? 0x70 : 0x40;			checkMBB->addLiveIn(*i);
	} else if (STI.isTargetDarwin()) {			}
	TlsReg = X86::GS;
	TlsOffset = 0x60 + 90*8; // See pthread_machdep.h. Steal TLS slot 90.			if (IsNested)
	} else if (STI.isTargetWin64()) {			allocMBB->addLiveIn(IsLP64 ? X86::R10 : X86::R10D);
	TlsReg = X86::GS;
	TlsOffset = 0x28; // pvArbitrary, reserved for application use			MF.push_front(allocMBB);
	} else if (STI.isTargetFreeBSD()) {			MF.push_front(checkMBB);
	TlsReg = X86::FS;
	TlsOffset = 0x18;			// When the frame size is less than 256 we just compare the stack
	} else if (STI.isTargetDragonFly()) {			// boundary directly to the value of the stack pointer, per gcc.
	TlsReg = X86::FS;			bool CompareStackPointer = StackSize < kSplitStackAvailable;
	TlsOffset = 0x20; // use tls_tcb.tcb_segstack
	} else {			// Read the limit off the current stacklet off the stack_guard location.
	report_fatal_error("Segmented stacks not supported on this platform.");			if (Is64Bit) {
	}			if (STI.isTargetLinux()) {
				TlsReg = X86::FS;
	if (CompareStackPointer)			TlsOffset = IsLP64 ? 0x70 : 0x40;
	ScratchReg = IsLP64 ? X86::RSP : X86::ESP;			} else if (STI.isTargetDarwin()) {
	else			TlsReg = X86::GS;
	BuildMI(checkMBB, DL, TII.get(IsLP64 ? X86::LEA64r : X86::LEA64_32r), ScratchReg).addReg(X86::RSP)			TlsOffset = 0x60 + 90*8; // See pthread_machdep.h. Steal TLS slot 90.
	.addImm(1).addReg(0).addImm(-StackSize).addReg(0);			} else if (STI.isTargetWin64()) {
				TlsReg = X86::GS;
	BuildMI(checkMBB, DL, TII.get(IsLP64 ? X86::CMP64rm : X86::CMP32rm)).addReg(ScratchReg)			TlsOffset = 0x28; // pvArbitrary, reserved for application use
	.addReg(0).addImm(1).addReg(0).addImm(TlsOffset).addReg(TlsReg);			} else if (STI.isTargetFreeBSD()) {
	} else {			TlsReg = X86::FS;
	if (STI.isTargetLinux()) {			TlsOffset = 0x18;
	TlsReg = X86::GS;			} else if (STI.isTargetDragonFly()) {
	TlsOffset = 0x30;			TlsReg = X86::FS;
	} else if (STI.isTargetDarwin()) {			TlsOffset = 0x20; // use tls_tcb.tcb_segstack
	TlsReg = X86::GS;			} else {
	TlsOffset = 0x48 + 90*4;			report_fatal_error("Segmented stacks not supported on this platform.");
	} else if (STI.isTargetWin32()) {			}
	TlsReg = X86::FS;
	TlsOffset = 0x14; // pvArbitrary, reserved for application use			if (CompareStackPointer)
	} else if (STI.isTargetDragonFly()) {			ScratchReg = IsLP64 ? X86::RSP : X86::ESP;
	TlsReg = X86::FS;			else
	TlsOffset = 0x10; // use tls_tcb.tcb_segstack			BuildMI(checkMBB, DL, TII.get(IsLP64 ? X86::LEA64r : X86::LEA64_32r), ScratchReg).addReg(X86::RSP)
	} else if (STI.isTargetFreeBSD()) {			.addImm(1).addReg(0).addImm(-StackSize).addReg(0);
	report_fatal_error("Segmented stacks not supported on FreeBSD i386.");
	} else {			BuildMI(checkMBB, DL, TII.get(IsLP64 ? X86::CMP64rm : X86::CMP32rm)).addReg(ScratchReg)
	report_fatal_error("Segmented stacks not supported on this platform.");			.addReg(0).addImm(1).addReg(0).addImm(TlsOffset).addReg(TlsReg);
	}			} else {
				if (STI.isTargetLinux()) {
	if (CompareStackPointer)			TlsReg = X86::GS;
	ScratchReg = X86::ESP;			TlsOffset = 0x30;
	else			} else if (STI.isTargetDarwin()) {
	BuildMI(checkMBB, DL, TII.get(X86::LEA32r), ScratchReg).addReg(X86::ESP)			TlsReg = X86::GS;
	.addImm(1).addReg(0).addImm(-StackSize).addReg(0);			TlsOffset = 0x48 + 90*4;
				} else if (STI.isTargetWin32()) {
	if (STI.isTargetLinux() || STI.isTargetWin32() || STI.isTargetWin64() ||			STI.isTargetDragonFly()) {
	STI.isTargetDragonFly()) {			TlsOffset = 0x14; // pvArbitrary, reserved for application use
	BuildMI(checkMBB, DL, TII.get(X86::CMP32rm)).addReg(ScratchReg)			} else if (STI.isTargetDragonFly()) {
	.addReg(0).addImm(0).addReg(0).addImm(TlsOffset).addReg(TlsReg);			TlsReg = X86::FS;
	} else if (STI.isTargetDarwin()) {			TlsOffset = 0x10; // use tls_tcb.tcb_segstack
				} else if (STI.isTargetFreeBSD()) {
	// TlsOffset doesn't fit into a mod r/m byte so we need an extra register.			report_fatal_error("Segmented stacks not supported on FreeBSD i386.");
	unsigned ScratchReg2;			} else {
	bool SaveScratch2;			report_fatal_error("Segmented stacks not supported on this platform.");
	if (CompareStackPointer) {			}
	// The primary scratch register is available for holding the TLS offset.
	ScratchReg2 = GetScratchRegister(Is64Bit, IsLP64, MF, true);			if (CompareStackPointer)
	SaveScratch2 = false;			ScratchReg = X86::ESP;
	} else {			else
	// Need to use a second register to hold the TLS offset			BuildMI(checkMBB, DL, TII.get(X86::LEA32r), ScratchReg).addReg(X86::ESP)
	ScratchReg2 = GetScratchRegister(Is64Bit, IsLP64, MF, false);			.addImm(1).addReg(0).addImm(-StackSize).addReg(0);

	// Unfortunately, with fastcc the second scratch register may hold an			if (STI.isTargetLinux() || STI.isTargetWin32() || STI.isTargetWin64() ||
	// argument.			STI.isTargetDragonFly()) {
	SaveScratch2 = MF.getRegInfo().isLiveIn(ScratchReg2);			BuildMI(checkMBB, DL, TII.get(X86::CMP32rm)).addReg(ScratchReg)
	}			.addReg(0).addImm(0).addReg(0).addImm(TlsOffset).addReg(TlsReg);
				} else if (STI.isTargetDarwin()) {
	// If Scratch2 is live-in then it needs to be saved.
	assert((!MF.getRegInfo().isLiveIn(ScratchReg2) || SaveScratch2) &&			// TlsOffset doesn't fit into a mod r/m byte so we need an extra register.
	"Scratch register is live-in and not saved");			unsigned ScratchReg2;
				bool SaveScratch2;
	if (SaveScratch2)			if (CompareStackPointer) {
	BuildMI(checkMBB, DL, TII.get(X86::PUSH32r))			// The primary scratch register is available for holding the TLS offset.
	.addReg(ScratchReg2, RegState::Kill);			ScratchReg2 = GetScratchRegister(Is64Bit, IsLP64, MF, true);
				SaveScratch2 = false;
	BuildMI(checkMBB, DL, TII.get(X86::MOV32ri), ScratchReg2)			} else {
	.addImm(TlsOffset);			// Need to use a second register to hold the TLS offset
	BuildMI(checkMBB, DL, TII.get(X86::CMP32rm))			ScratchReg2 = GetScratchRegister(Is64Bit, IsLP64, MF, false);
	.addReg(ScratchReg)
	.addReg(ScratchReg2).addImm(1).addReg(0)			// Unfortunately, with fastcc the second scratch register may hold an
	.addImm(0)			// argument.
	.addReg(TlsReg);			SaveScratch2 = MF.getRegInfo().isLiveIn(ScratchReg2);
				}
	if (SaveScratch2)
	BuildMI(checkMBB, DL, TII.get(X86::POP32r), ScratchReg2);			// If Scratch2 is live-in then it needs to be saved.
	}			assert((!MF.getRegInfo().isLiveIn(ScratchReg2) || SaveScratch2) &&
	}			"Scratch register is live-in and not saved");

	// This jump is taken if SP >= (Stacklet Limit + Stack Space required).			if (SaveScratch2)
	// It jumps to normal execution of the function body.			BuildMI(checkMBB, DL, TII.get(X86::PUSH32r))
	BuildMI(checkMBB, DL, TII.get(X86::JA_1)).addMBB(&prologueMBB);			.addReg(ScratchReg2, RegState::Kill);

	// On 32 bit we first push the arguments size and then the frame size. On 64			BuildMI(checkMBB, DL, TII.get(X86::MOV32ri), ScratchReg2)
	// bit, we pass the stack frame size in r10 and the argument size in r11.			.addImm(TlsOffset);
	if (Is64Bit) {			BuildMI(checkMBB, DL, TII.get(X86::CMP32rm))
	// Functions with nested arguments use R10, so it needs to be saved across			.addReg(ScratchReg)
	// the call to _morestack			.addReg(ScratchReg2).addImm(1).addReg(0)
				.addImm(0)
	const unsigned RegAX = IsLP64 ? X86::RAX : X86::EAX;			.addReg(TlsReg);
	const unsigned Reg10 = IsLP64 ? X86::R10 : X86::R10D;
	const unsigned Reg11 = IsLP64 ? X86::R11 : X86::R11D;			if (SaveScratch2)
	const unsigned MOVrr = IsLP64 ? X86::MOV64rr : X86::MOV32rr;			BuildMI(checkMBB, DL, TII.get(X86::POP32r), ScratchReg2);
	const unsigned MOVri = IsLP64 ? X86::MOV64ri : X86::MOV32ri;			}
				}
	if (IsNested)
	BuildMI(allocMBB, DL, TII.get(MOVrr), RegAX).addReg(Reg10);			// This jump is taken if SP >= (Stacklet Limit + Stack Space required).
				// It jumps to normal execution of the function body.
	BuildMI(allocMBB, DL, TII.get(MOVri), Reg10)			BuildMI(checkMBB, DL, TII.get(X86::JA_1)).addMBB(&prologueMBB);
	.addImm(StackSize);
	BuildMI(allocMBB, DL, TII.get(MOVri), Reg11)			// On 32 bit we first push the arguments size and then the frame size. On 64
	.addImm(X86FI->getArgumentStackSize());			// bit, we pass the stack frame size in r10 and the argument size in r11.
	MF.getRegInfo().setPhysRegUsed(Reg10);			if (Is64Bit) {
	MF.getRegInfo().setPhysRegUsed(Reg11);			// Functions with nested arguments use R10, so it needs to be saved across
	} else {			// the call to _morestack
	BuildMI(allocMBB, DL, TII.get(X86::PUSHi32))
	.addImm(X86FI->getArgumentStackSize());			const unsigned RegAX = IsLP64 ? X86::RAX : X86::EAX;
	BuildMI(allocMBB, DL, TII.get(X86::PUSHi32))			const unsigned Reg10 = IsLP64 ? X86::R10 : X86::R10D;
	.addImm(StackSize);			const unsigned Reg11 = IsLP64 ? X86::R11 : X86::R11D;
	}			const unsigned MOVrr = IsLP64 ? X86::MOV64rr : X86::MOV32rr;
				const unsigned MOVri = IsLP64 ? X86::MOV64ri : X86::MOV32ri;
	// __morestack is in libgcc
	if (Is64Bit && MF.getTarget().getCodeModel() == CodeModel::Large) {			if (IsNested)
	// Under the large code model, we cannot assume that __morestack lives			BuildMI(allocMBB, DL, TII.get(MOVrr), RegAX).addReg(Reg10);
	// within 2^31 bytes of the call site, so we cannot use pc-relative
	// addressing. We cannot perform the call via a temporary register,			BuildMI(allocMBB, DL, TII.get(MOVri), Reg10)
	// as the rax register may be used to store the static chain, and all			.addImm(StackSize);
	// other suitable registers may be either callee-save or used for			BuildMI(allocMBB, DL, TII.get(MOVri), Reg11)
	// parameter passing. We cannot use the stack at this point either			.addImm(X86FI->getArgumentStackSize());
	// because __morestack manipulates the stack directly.			MF.getRegInfo().setPhysRegUsed(Reg10);
	//			MF.getRegInfo().setPhysRegUsed(Reg11);
	// To avoid these issues, perform an indirect call via a read-only memory			} else {
	// location containing the address.			BuildMI(allocMBB, DL, TII.get(X86::PUSHi32))
	//			.addImm(X86FI->getArgumentStackSize());
	// This solution is not perfect, as it assumes that the .rodata section			BuildMI(allocMBB, DL, TII.get(X86::PUSHi32))
	// is laid out within 2^31 bytes of each function body, but this seems			.addImm(StackSize);
	// to be sufficient for JIT.			}
	BuildMI(allocMBB, DL, TII.get(X86::CALL64m))
	.addReg(X86::RIP)			// __morestack is in libgcc
	.addImm(0)			if (Is64Bit && MF.getTarget().getCodeModel() == CodeModel::Large) {
	.addReg(0)			// Under the large code model, we cannot assume that __morestack lives
	.addExternalSymbol("__morestack_addr")			// within 2^31 bytes of the call site, so we cannot use pc-relative
	.addReg(0);			// addressing. We cannot perform the call via a temporary register,
	MF.getMMI().setUsesMorestackAddr(true);			// as the rax register may be used to store the static chain, and all
	} else {			// other suitable registers may be either callee-save or used for
	if (Is64Bit)			// parameter passing. We cannot use the stack at this point either
	BuildMI(allocMBB, DL, TII.get(X86::CALL64pcrel32))			// because __morestack manipulates the stack directly.
	.addExternalSymbol("__morestack");			//
	else			// To avoid these issues, perform an indirect call via a read-only memory
	BuildMI(allocMBB, DL, TII.get(X86::CALLpcrel32))			// location containing the address.
	.addExternalSymbol("__morestack");			//
	}			// This solution is not perfect, as it assumes that the .rodata section
				// is laid out within 2^31 bytes of each function body, but this seems
	if (IsNested)			// to be sufficient for JIT.
	BuildMI(allocMBB, DL, TII.get(X86::MORESTACK_RET_RESTORE_R10));			BuildMI(allocMBB, DL, TII.get(X86::CALL64m))
	else			.addReg(X86::RIP)
	BuildMI(allocMBB, DL, TII.get(X86::MORESTACK_RET));			.addImm(0)
				.addReg(0)
	allocMBB->addSuccessor(&prologueMBB);			.addExternalSymbol("__morestack_addr")
				.addReg(0);
	checkMBB->addSuccessor(allocMBB);			MF.getMMI().setUsesMorestackAddr(true);
	checkMBB->addSuccessor(&prologueMBB);			} else {
				if (Is64Bit)
	#ifdef XDEBUG			BuildMI(allocMBB, DL, TII.get(X86::CALL64pcrel32))
	MF.verify();			.addExternalSymbol("__morestack");
	#endif			else
	}			BuildMI(allocMBB, DL, TII.get(X86::CALLpcrel32))
				.addExternalSymbol("__morestack");
	/// Erlang programs may need a special prologue to handle the stack size they			}
	/// might need at runtime. That is because Erlang/OTP does not implement a C
	/// stack but uses a custom implementation of hybrid stack/heap architecture.			if (IsNested)
	/// (for more information see Eric Stenman's Ph.D. thesis:			BuildMI(allocMBB, DL, TII.get(X86::MORESTACK_RET_RESTORE_R10));
	/// http://publications.uu.se/uu/fulltext/nbn_se_uu_diva-2688.pdf)			else
	///			BuildMI(allocMBB, DL, TII.get(X86::MORESTACK_RET));
	/// CheckStack:
	/// temp0 = sp - MaxStack			allocMBB->addSuccessor(&prologueMBB);
	/// if( temp0 < SP_LIMIT(P) ) goto IncStack else goto OldStart
	/// OldStart:			checkMBB->addSuccessor(allocMBB);
	/// ...			checkMBB->addSuccessor(&prologueMBB);
	/// IncStack:
	/// call inc_stack # doubles the stack space			#ifdef XDEBUG
	/// temp0 = sp - MaxStack			MF.verify();
	/// if( temp0 < SP_LIMIT(P) ) goto IncStack else goto OldStart			#endif
	void X86FrameLowering::adjustForHiPEPrologue(MachineFunction &MF) const {			}
	const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
	MachineFrameInfo *MFI = MF.getFrameInfo();			/// Erlang programs may need a special prologue to handle the stack size they
	const unsigned SlotSize =			/// might need at runtime. That is because Erlang/OTP does not implement a C
	static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo())			/// stack but uses a custom implementation of hybrid stack/heap architecture.
	->getSlotSize();			/// (for more information see Eric Stenman's Ph.D. thesis:
	const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();			/// http://publications.uu.se/uu/fulltext/nbn_se_uu_diva-2688.pdf)
	const bool Is64Bit = STI.is64Bit();			///
	const bool IsLP64 = STI.isTarget64BitLP64();			/// CheckStack:
	DebugLoc DL;			/// temp0 = sp - MaxStack
	// HiPE-specific values			/// if( temp0 < SP_LIMIT(P) ) goto IncStack else goto OldStart
	const unsigned HipeLeafWords = 24;			/// OldStart:
	const unsigned CCRegisteredArgs = Is64Bit ? 6 : 5;			/// ...
	const unsigned Guaranteed = HipeLeafWords * SlotSize;			/// IncStack:
	unsigned CallerStkArity = MF.getFunction()->arg_size() > CCRegisteredArgs ?			/// call inc_stack # doubles the stack space
	MF.getFunction()->arg_size() - CCRegisteredArgs : 0;			/// temp0 = sp - MaxStack
	unsigned MaxStack = MFI->getStackSize() + CallerStkArity*SlotSize + SlotSize;			/// if( temp0 < SP_LIMIT(P) ) goto IncStack else goto OldStart
				void X86FrameLowering::adjustForHiPEPrologue(MachineFunction &MF) const {
	assert(STI.isTargetLinux() &&			const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
	"HiPE prologue is only supported on Linux operating systems.");			MachineFrameInfo *MFI = MF.getFrameInfo();
				const unsigned SlotSize =
	// Compute the largest caller's frame that is needed to fit the callees'			static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo())
	// frames. This 'MaxStack' is computed from:			->getSlotSize();
	//			const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
	// a) the fixed frame size, which is the space needed for all spilled temps,			const bool Is64Bit = STI.is64Bit();
	// b) outgoing on-stack parameter areas, and			const bool IsLP64 = STI.isTarget64BitLP64();
	// c) the minimum stack space this function needs to make available for the			DebugLoc DL;
	// functions it calls (a tunable ABI property).			// HiPE-specific values
	if (MFI->hasCalls()) {			const unsigned HipeLeafWords = 24;
	unsigned MoreStackForCalls = 0;			const unsigned CCRegisteredArgs = Is64Bit ? 6 : 5;
				const unsigned Guaranteed = HipeLeafWords * SlotSize;
	for (MachineFunction::iterator MBBI = MF.begin(), MBBE = MF.end();			unsigned CallerStkArity = MF.getFunction()->arg_size() > CCRegisteredArgs ?
	MBBI != MBBE; ++MBBI)			MF.getFunction()->arg_size() - CCRegisteredArgs : 0;
	for (MachineBasicBlock::iterator MI = MBBI->begin(), ME = MBBI->end();			unsigned MaxStack = MFI->getStackSize() + CallerStkArity*SlotSize + SlotSize;
	MI != ME; ++MI) {
	if (!MI->isCall())			assert(STI.isTargetLinux() &&
	continue;			"HiPE prologue is only supported on Linux operating systems.");

	// Get callee operand.			// Compute the largest caller's frame that is needed to fit the callees'
	const MachineOperand &MO = MI->getOperand(0);			// frames. This 'MaxStack' is computed from:
				//
	// Only take account of global function calls (no closures etc.).			// a) the fixed frame size, which is the space needed for all spilled temps,
	if (!MO.isGlobal())			// b) outgoing on-stack parameter areas, and
	continue;			// c) the minimum stack space this function needs to make available for the
				// functions it calls (a tunable ABI property).
	const Function *F = dyn_cast<Function>(MO.getGlobal());			if (MFI->hasCalls()) {
	if (!F)			unsigned MoreStackForCalls = 0;
	continue;
				for (MachineFunction::iterator MBBI = MF.begin(), MBBE = MF.end();
	// Do not update 'MaxStack' for primitive and built-in functions			MBBI != MBBE; ++MBBI)
	// (encoded with names either starting with "erlang."/"bif_" or not			for (MachineBasicBlock::iterator MI = MBBI->begin(), ME = MBBI->end();
	// having a ".", such as a simple <Module>.<Function>.<Arity>, or an			MI != ME; ++MI) {
	// "_", such as the BIF "suspend_0") as they are executed on another			if (!MI->isCall())
	// stack.			continue;
	if (F->getName().find("erlang.") != StringRef::npos ||
	F->getName().find("bif_") != StringRef::npos ||			// Get callee operand.
	F->getName().find_first_of("._") == StringRef::npos)			const MachineOperand &MO = MI->getOperand(0);
	continue;
				// Only take account of global function calls (no closures etc.).
	unsigned CalleeStkArity =			if (!MO.isGlobal())
	F->arg_size() > CCRegisteredArgs ? F->arg_size()-CCRegisteredArgs : 0;			continue;
	if (HipeLeafWords - 1 > CalleeStkArity)
	MoreStackForCalls = std::max(MoreStackForCalls,			const Function *F = dyn_cast<Function>(MO.getGlobal());
	(HipeLeafWords - 1 - CalleeStkArity) * SlotSize);			if (!F)
	}			continue;
	MaxStack += MoreStackForCalls;
	}			// Do not update 'MaxStack' for primitive and built-in functions
				// (encoded with names either starting with "erlang."/"bif_" or not
	// If the stack frame needed is larger than the guaranteed then runtime checks			// having a ".", such as a simple <Module>.<Function>.<Arity>, or an
	// and calls to "inc_stack_0" BIF should be inserted in the assembly prologue.			// "_", such as the BIF "suspend_0") as they are executed on another
	if (MaxStack > Guaranteed) {			// stack.
	MachineBasicBlock &prologueMBB = MF.front();			if (F->getName().find("erlang.") != StringRef::npos ||
	MachineBasicBlock *stackCheckMBB = MF.CreateMachineBasicBlock();			F->getName().find("bif_") != StringRef::npos ||
	MachineBasicBlock *incStackMBB = MF.CreateMachineBasicBlock();			F->getName().find_first_of("._") == StringRef::npos)
				continue;
	for (MachineBasicBlock::livein_iterator I = prologueMBB.livein_begin(),
	E = prologueMBB.livein_end(); I != E; I++) {			unsigned CalleeStkArity =
	stackCheckMBB->addLiveIn(*I);			F->arg_size() > CCRegisteredArgs ? F->arg_size()-CCRegisteredArgs : 0;
	incStackMBB->addLiveIn(*I);			if (HipeLeafWords - 1 > CalleeStkArity)
	}			MoreStackForCalls = std::max(MoreStackForCalls,
				(HipeLeafWords - 1 - CalleeStkArity) * SlotSize);
	MF.push_front(incStackMBB);			}
	MF.push_front(stackCheckMBB);			MaxStack += MoreStackForCalls;
				}
	unsigned ScratchReg, SPReg, PReg, SPLimitOffset;
	unsigned LEAop, CMPop, CALLop;			// If the stack frame needed is larger than the guaranteed then runtime checks
	if (Is64Bit) {			// and calls to "inc_stack_0" BIF should be inserted in the assembly prologue.
	SPReg = X86::RSP;			if (MaxStack > Guaranteed) {
	PReg = X86::RBP;			MachineBasicBlock &prologueMBB = MF.front();
	LEAop = X86::LEA64r;			MachineBasicBlock *stackCheckMBB = MF.CreateMachineBasicBlock();
	CMPop = X86::CMP64rm;			MachineBasicBlock *incStackMBB = MF.CreateMachineBasicBlock();
	CALLop = X86::CALL64pcrel32;
	SPLimitOffset = 0x90;			for (MachineBasicBlock::livein_iterator I = prologueMBB.livein_begin(),
	} else {			E = prologueMBB.livein_end(); I != E; I++) {
	SPReg = X86::ESP;			stackCheckMBB->addLiveIn(*I);
	PReg = X86::EBP;			incStackMBB->addLiveIn(*I);
	LEAop = X86::LEA32r;			}
	CMPop = X86::CMP32rm;
	CALLop = X86::CALLpcrel32;			MF.push_front(incStackMBB);
	SPLimitOffset = 0x4c;			MF.push_front(stackCheckMBB);
	}
				unsigned ScratchReg, SPReg, PReg, SPLimitOffset;
	ScratchReg = GetScratchRegister(Is64Bit, IsLP64, MF, true);			unsigned LEAop, CMPop, CALLop;
	assert(!MF.getRegInfo().isLiveIn(ScratchReg) &&			if (Is64Bit) {
	"HiPE prologue scratch register is live-in");			SPReg = X86::RSP;
				PReg = X86::RBP;
	// Create new MBB for StackCheck:			LEAop = X86::LEA64r;
	addRegOffset(BuildMI(stackCheckMBB, DL, TII.get(LEAop), ScratchReg),			CMPop = X86::CMP64rm;
	SPReg, false, -MaxStack);			CALLop = X86::CALL64pcrel32;
	// SPLimitOffset is in a fixed heap location (pointed by BP).			SPLimitOffset = 0x90;
	addRegOffset(BuildMI(stackCheckMBB, DL, TII.get(CMPop))			} else {
	.addReg(ScratchReg), PReg, false, SPLimitOffset);			SPReg = X86::ESP;
	BuildMI(stackCheckMBB, DL, TII.get(X86::JAE_1)).addMBB(&prologueMBB);			PReg = X86::EBP;
				LEAop = X86::LEA32r;
	// Create new MBB for IncStack:			CMPop = X86::CMP32rm;
	BuildMI(incStackMBB, DL, TII.get(CALLop)).			CALLop = X86::CALLpcrel32;
	addExternalSymbol("inc_stack_0");			SPLimitOffset = 0x4c;
	addRegOffset(BuildMI(incStackMBB, DL, TII.get(LEAop), ScratchReg),			}
	SPReg, false, -MaxStack);
	addRegOffset(BuildMI(incStackMBB, DL, TII.get(CMPop))			ScratchReg = GetScratchRegister(Is64Bit, IsLP64, MF, true);
	.addReg(ScratchReg), PReg, false, SPLimitOffset);			assert(!MF.getRegInfo().isLiveIn(ScratchReg) &&
	BuildMI(incStackMBB, DL, TII.get(X86::JLE_1)).addMBB(incStackMBB);			"HiPE prologue scratch register is live-in");

	stackCheckMBB->addSuccessor(&prologueMBB, 99);			// Create new MBB for StackCheck:
	stackCheckMBB->addSuccessor(incStackMBB, 1);			addRegOffset(BuildMI(stackCheckMBB, DL, TII.get(LEAop), ScratchReg),
	incStackMBB->addSuccessor(&prologueMBB, 99);			SPReg, false, -MaxStack);
	incStackMBB->addSuccessor(incStackMBB, 1);			// SPLimitOffset is in a fixed heap location (pointed by BP).
	}			addRegOffset(BuildMI(stackCheckMBB, DL, TII.get(CMPop))
	#ifdef XDEBUG			.addReg(ScratchReg), PReg, false, SPLimitOffset);
	MF.verify();			BuildMI(stackCheckMBB, DL, TII.get(X86::JAE_1)).addMBB(&prologueMBB);
	#endif
	}			// Create new MBB for IncStack:
				BuildMI(incStackMBB, DL, TII.get(CALLop)).
	bool X86FrameLowering::			addExternalSymbol("inc_stack_0");
	convertArgMovsToPushes(MachineFunction &MF, MachineBasicBlock &MBB,			addRegOffset(BuildMI(incStackMBB, DL, TII.get(LEAop), ScratchReg),
	MachineBasicBlock::iterator I, uint64_t Amount) const {			SPReg, false, -MaxStack);
	const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();			addRegOffset(BuildMI(incStackMBB, DL, TII.get(CMPop))
	const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(			.addReg(ScratchReg), PReg, false, SPLimitOffset);
	MF.getSubtarget().getRegisterInfo());			BuildMI(incStackMBB, DL, TII.get(X86::JLE_1)).addMBB(incStackMBB);
	unsigned StackPtr = RegInfo.getStackRegister();
				stackCheckMBB->addSuccessor(&prologueMBB, 99);
	// Scan the call setup sequence for the pattern we're looking for.			stackCheckMBB->addSuccessor(incStackMBB, 1);
	// We only handle a simple case now - a sequence of MOV32mi or MOV32mr			incStackMBB->addSuccessor(&prologueMBB, 99);
	// instructions, that push a sequence of 32-bit values onto the stack, with			incStackMBB->addSuccessor(incStackMBB, 1);
	// no gaps.			}
	std::map<int64_t, MachineBasicBlock::iterator> MovMap;			#ifdef XDEBUG
	do {			MF.verify();
	int Opcode = I->getOpcode();			#endif
	if (Opcode != X86::MOV32mi && Opcode != X86::MOV32mr)			}
	break;
				void X86FrameLowering::
	// We only want movs of the form:			eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
	// movl imm/r32, k(%ecx)			MachineBasicBlock::iterator I) const {
	// If we run into something else, bail			const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
	// Note that AddrBaseReg may, counterintuitively, not be a register...			const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(
	if (!I->getOperand(X86::AddrBaseReg).isReg() ||			MF.getSubtarget().getRegisterInfo());
	(I->getOperand(X86::AddrBaseReg).getReg() != StackPtr) ||			unsigned StackPtr = RegInfo.getStackRegister();
	!I->getOperand(X86::AddrScaleAmt).isImm() ||			bool reserveCallFrame = hasReservedCallFrame(MF);
	(I->getOperand(X86::AddrScaleAmt).getImm() != 1) ||			int Opcode = I->getOpcode();
	(I->getOperand(X86::AddrIndexReg).getReg() != X86::NoRegister) ||			bool isDestroy = Opcode == TII.getCallFrameDestroyOpcode();
	(I->getOperand(X86::AddrSegmentReg).getReg() != X86::NoRegister) ||			const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
	!I->getOperand(X86::AddrDisp).isImm())			bool IsLP64 = STI.isTarget64BitLP64();
	return false;			DebugLoc DL = I->getDebugLoc();
				uint64_t Amount = !reserveCallFrame ? I->getOperand(0).getImm() : 0;
	int64_t StackDisp = I->getOperand(X86::AddrDisp).getImm();			uint64_t InternalAmt = (isDestroy || Amount) ? I->getOperand(1).getImm() : 0;
				I = MBB.erase(I);
	// We don't want to consider the unaligned case.
	if (StackDisp % 4)			if (!reserveCallFrame) {
	return false;			// If the stack pointer can be changed after prologue, turn the
				// adjcallstackup instruction into a 'sub ESP, <amt>' and the
	// If the same stack slot is being filled twice, something's fishy.			// adjcallstackdown instruction into 'add ESP, <amt>'
	if (!MovMap.insert(std::pair<int64_t, MachineInstr*>(StackDisp, I)).second)			if (Amount == 0)
	return false;			return;

	++I;			// We need to keep the stack aligned properly. To do this, we round the
	} while (I != MBB.end());			// amount of space needed for the outgoing arguments up to the next
				// alignment boundary.
	// We now expect the end of the sequence - a call and a stack adjust.			unsigned StackAlign = MF.getTarget()
	if (I == MBB.end())			.getSubtargetImpl()
	return false;			->getFrameLowering()
	if (!I->isCall())			->getStackAlignment();
	return false;			Amount = (Amount + StackAlign - 1) / StackAlign * StackAlign;
	MachineBasicBlock::iterator Call = I;
	if ((++I)->getOpcode() != TII.getCallFrameDestroyOpcode())			MachineInstr *New = nullptr;
	return false;
				// Factor out the amount that gets handled inside the sequence
	// Now, go through the map, and see that we don't have any gaps,			// (Pushes of argument for frame setup, callee pops for frame destroy)
	// but only a series of 32-bit MOVs.			Amount -= InternalAmt;
	// Since std::map provides ordered iteration, the original order
	// of the MOVs doesn't matter.			if (Amount) {
	int64_t ExpectedDist = 0;			if (Opcode == TII.getCallFrameSetupOpcode()) {
	for (auto MMI = MovMap.begin(), MME = MovMap.end(); MMI != MME;			New = BuildMI(MF, DL, TII.get(getSUBriOpcode(IsLP64, Amount)), StackPtr)
	++MMI, ExpectedDist += 4)			.addReg(StackPtr).addImm(Amount);
	if (MMI->first != ExpectedDist)			} else {
	return false;			assert(Opcode == TII.getCallFrameDestroyOpcode());

	// Ok, everything looks fine. Do the transformation.			unsigned Opc = getADDriOpcode(IsLP64, Amount);
	DebugLoc DL = I->getDebugLoc();			New = BuildMI(MF, DL, TII.get(Opc), StackPtr)
				.addReg(StackPtr).addImm(Amount);
	// It's possible the original stack adjustment amount was larger than			}
	// that done by the pushes. If so, we still need a SUB.			}
	Amount -= ExpectedDist;
	if (Amount) {			if (New) {
	MachineInstr* Sub = BuildMI(MBB, Call, DL,			// The EFLAGS implicit def is dead.
	TII.get(getSUBriOpcode(false, Amount)), StackPtr)			New->getOperand(3).setIsDead();
	.addReg(StackPtr).addImm(Amount);
	Sub->getOperand(3).setIsDead();			// Replace the pseudo instruction with a new instruction.
	}			MBB.insert(I, New);
				}
	// Now, iterate through the map in reverse order, and replace the movs
	// with pushes. MOVmi/MOVmr doesn't have any defs, so need to replace uses.			return;
	for (auto MMI = MovMap.rbegin(), MME = MovMap.rend(); MMI != MME; ++MMI) {			}
	MachineBasicBlock::iterator MOV = MMI->second;
	MachineOperand PushOp = MOV->getOperand(X86::AddrNumOperands);			if (Opcode == TII.getCallFrameDestroyOpcode() && InternalAmt) {
				// If we are performing frame pointer elimination and if the callee pops
	// Replace MOVmr with PUSH32r, and MOVmi with PUSHi of appropriate size			// something off the stack pointer, add it back. We do this until we have
	int PushOpcode = X86::PUSH32r;			// more advanced stack pointer tracking ability.
	if (MOV->getOpcode() == X86::MOV32mi)			unsigned Opc = getSUBriOpcode(IsLP64, InternalAmt);
	PushOpcode = getPUSHiOpcode(false, PushOp);			MachineInstr *New = BuildMI(MF, DL, TII.get(Opc), StackPtr)
				.addReg(StackPtr).addImm(InternalAmt);
	BuildMI(MBB, Call, DL, TII.get(PushOpcode)).addOperand(PushOp);
	MBB.erase(MOV);			// The EFLAGS implicit def is dead.
	}			New->getOperand(3).setIsDead();

	return true;			// We are not tracking the stack pointer adjustment by the callee, so make
	}			// sure we restore the stack pointer immediately after the call, there may
				// be spill code inserted between the CALL and ADJCALLSTACKUP instructions.
	void X86FrameLowering::			MachineBasicBlock::iterator B = MBB.begin();
	eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,			while (I != B && !std::prev(I)->isCall())
	MachineBasicBlock::iterator I) const {			--I;
	const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();			MBB.insert(I, New);
	const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(			}
	MF.getSubtarget().getRegisterInfo());			}
	unsigned StackPtr = RegInfo.getStackRegister();
	bool reserveCallFrame = hasReservedCallFrame(MF);
	int Opcode = I->getOpcode();
	bool isDestroy = Opcode == TII.getCallFrameDestroyOpcode();
	const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
	bool IsLP64 = STI.isTarget64BitLP64();
	DebugLoc DL = I->getDebugLoc();
	uint64_t Amount = !reserveCallFrame ? I->getOperand(0).getImm() : 0;
	uint64_t CalleeAmt = isDestroy ? I->getOperand(1).getImm() : 0;
	I = MBB.erase(I);

	if (!reserveCallFrame) {
	// If the stack pointer can be changed after prologue, turn the
	// adjcallstackup instruction into a 'sub ESP, <amt>' and the
	// adjcallstackdown instruction into 'add ESP, <amt>'
	if (Amount == 0)
	return;

	// We need to keep the stack aligned properly. To do this, we round the
	// amount of space needed for the outgoing arguments up to the next
	// alignment boundary.
	unsigned StackAlign = MF.getTarget()
	.getSubtargetImpl()
	->getFrameLowering()
	->getStackAlignment();
	Amount = (Amount + StackAlign - 1) / StackAlign * StackAlign;

	MachineInstr *New = nullptr;
	if (Opcode == TII.getCallFrameSetupOpcode()) {
	// Try to convert movs to the stack into pushes.
	// We currently only look for a pattern that appears in 32-bit
	// calling conventions.
	if (!IsLP64 && convertArgMovsToPushes(MF, MBB, I, Amount))
	return;

	New = BuildMI(MF, DL, TII.get(getSUBriOpcode(IsLP64, Amount)),
	StackPtr)
	.addReg(StackPtr)
	.addImm(Amount);
	} else {
	assert(Opcode == TII.getCallFrameDestroyOpcode());

	// Factor out the amount the callee already popped.
	Amount -= CalleeAmt;

	if (Amount) {
	unsigned Opc = getADDriOpcode(IsLP64, Amount);
	New = BuildMI(MF, DL, TII.get(Opc), StackPtr)
	.addReg(StackPtr).addImm(Amount);
	}
	}

	if (New) {
	// The EFLAGS implicit def is dead.
	New->getOperand(3).setIsDead();

	// Replace the pseudo instruction with a new instruction.
	MBB.insert(I, New);
	}

	return;
	}

	if (Opcode == TII.getCallFrameDestroyOpcode() && CalleeAmt) {
	// If we are performing frame pointer elimination and if the callee pops
	// something off the stack pointer, add it back. We do this until we have
	// more advanced stack pointer tracking ability.
	unsigned Opc = getSUBriOpcode(IsLP64, CalleeAmt);
	MachineInstr *New = BuildMI(MF, DL, TII.get(Opc), StackPtr)
	.addReg(StackPtr).addImm(CalleeAmt);

	// The EFLAGS implicit def is dead.
	New->getOperand(3).setIsDead();

	// We are not tracking the stack pointer adjustment by the callee, so make
	// sure we restore the stack pointer immediately after the call, there may
	// be spill code inserted between the CALL and ADJCALLSTACKUP instructions.
	MachineBasicBlock::iterator B = MBB.begin();
	while (I != B && !std::prev(I)->isCall())
	--I;
	MBB.insert(I, New);
	}
	}
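
Note on the gap check in convertArgMovsToPushes above: the function records each mov by its stack displacement in a std::map and then relies on the map's ordered iteration to confirm the displacements are exactly 0, 4, 8, ... The following is a minimal standalone sketch of that check only, not code from the patch. The helper name coversContiguousSlots and the int payload (standing in for the instruction iterator) are hypothetical.

```cpp
#include <cstdint>
#include <map>

// Hypothetical helper, not part of the patch. Keys are stack displacements
// in bytes; the mapped int stands in for the MachineBasicBlock::iterator
// that the real code stores.
bool coversContiguousSlots(const std::map<int64_t, int> &Slots) {
  int64_t Expected = 0;
  // std::map iterates in ascending key order, so the order in which the
  // movs were originally recorded does not matter.
  for (const auto &Entry : Slots) {
    if (Entry.first != Expected)
      return false; // gap, or a sequence that does not start at offset 0
    Expected += 4;  // each slot holds one 32-bit value
  }
  return true;
}
```

With this shape, a map holding displacements 0, 4 and 8 is accepted, while one holding only 0 and 8 is rejected because slot 4 is missing.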

llvm/trunk/lib/Target/X86/X86InstrCompiler.td

	//===- X86InstrCompiler.td - Compiler Pseudos and Patterns -- tablegen --===//			//===- X86InstrCompiler.td - Compiler Pseudos and Patterns -- tablegen --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file describes the various pseudo instructions used by the compiler,			// This file describes the various pseudo instructions used by the compiler,
	// as well as Pat patterns used during instruction selection.			// as well as Pat patterns used during instruction selection.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Pattern Matching Support			// Pattern Matching Support

	def GetLo32XForm : SDNodeXForm<imm, [{			def GetLo32XForm : SDNodeXForm<imm, [{
	// Transformation function: get the low 32 bits.			// Transformation function: get the low 32 bits.
	return getI32Imm((unsigned)N->getZExtValue());			return getI32Imm((unsigned)N->getZExtValue());
	}]>;			}]>;

	def GetLo8XForm : SDNodeXForm<imm, [{			def GetLo8XForm : SDNodeXForm<imm, [{
	// Transformation function: get the low 8 bits.			// Transformation function: get the low 8 bits.
	return getI8Imm((uint8_t)N->getZExtValue());			return getI8Imm((uint8_t)N->getZExtValue());
	}]>;			}]>;


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Random Pseudo Instructions.			// Random Pseudo Instructions.

	// PIC base construction. This expands to code that looks like this:			// PIC base construction. This expands to code that looks like this:
	// call $next_inst			// call $next_inst
	// popl %destreg"			// popl %destreg"
	let hasSideEffects = 0, isNotDuplicable = 1, Uses = [ESP] in			let hasSideEffects = 0, isNotDuplicable = 1, Uses = [ESP] in
	def MOVPC32r : Ii32<0xE8, Pseudo, (outs GR32:$reg), (ins i32imm:$label),			def MOVPC32r : Ii32<0xE8, Pseudo, (outs GR32:$reg), (ins i32imm:$label),
	"", []>;			"", []>;


	// ADJCALLSTACKDOWN/UP implicitly use/def ESP because they may be expanded into			// ADJCALLSTACKDOWN/UP implicitly use/def ESP because they may be expanded into
	// a stack adjustment and the codegen must know that they may modify the stack			// a stack adjustment and the codegen must know that they may modify the stack
	// pointer before prolog-epilog rewriting occurs.			// pointer before prolog-epilog rewriting occurs.
	// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become			// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become
	// sub / add which can clobber EFLAGS.			// sub / add which can clobber EFLAGS.
	let Defs = [ESP, EFLAGS], Uses = [ESP] in {			let Defs = [ESP, EFLAGS], Uses = [ESP] in {
	def ADJCALLSTACKDOWN32 : I<0, Pseudo, (outs), (ins i32imm:$amt),			def ADJCALLSTACKDOWN32 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	"#ADJCALLSTACKDOWN",			"#ADJCALLSTACKDOWN",
	[(X86callseq_start timm:$amt)]>,			[]>,
	Requires<[NotLP64]>;			Requires<[NotLP64]>;
	def ADJCALLSTACKUP32 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),			def ADJCALLSTACKUP32 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	"#ADJCALLSTACKUP",			"#ADJCALLSTACKUP",
	[(X86callseq_end timm:$amt1, timm:$amt2)]>,			[(X86callseq_end timm:$amt1, timm:$amt2)]>,
	Requires<[NotLP64]>;			Requires<[NotLP64]>;
	}			}
				def : Pat<(X86callseq_start timm:$amt1),
	// ADJCALLSTACKDOWN/UP implicitly use/def RSP because they may be expanded into			(ADJCALLSTACKDOWN32 i32imm:$amt1, 0)>, Requires<[NotLP64]>;
	// a stack adjustment and the codegen must know that they may modify the stack
	// pointer before prolog-epilog rewriting occurs.
	// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become			// ADJCALLSTACKDOWN/UP implicitly use/def RSP because they may be expanded into
	// sub / add which can clobber EFLAGS.			// a stack adjustment and the codegen must know that they may modify the stack
	let Defs = [RSP, EFLAGS], Uses = [RSP] in {			// pointer before prolog-epilog rewriting occurs.
	def ADJCALLSTACKDOWN64 : I<0, Pseudo, (outs), (ins i32imm:$amt),			// Pessimistically assume ADJCALLSTACKDOWN / ADJCALLSTACKUP will become
	"#ADJCALLSTACKDOWN",			// sub / add which can clobber EFLAGS.
	[(X86callseq_start timm:$amt)]>,			let Defs = [RSP, EFLAGS], Uses = [RSP] in {
	Requires<[IsLP64]>;			def ADJCALLSTACKDOWN64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	def ADJCALLSTACKUP64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),			"#ADJCALLSTACKDOWN",
	"#ADJCALLSTACKUP",			[]>,
	[(X86callseq_end timm:$amt1, timm:$amt2)]>,			Requires<[IsLP64]>;
	Requires<[IsLP64]>;			def ADJCALLSTACKUP64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
	}			"#ADJCALLSTACKUP",
				[(X86callseq_end timm:$amt1, timm:$amt2)]>,
				Requires<[IsLP64]>;
				}
	// x86-64 va_start lowering magic.			def : Pat<(X86callseq_start timm:$amt1),
	let usesCustomInserter = 1, Defs = [EFLAGS] in {			(ADJCALLSTACKDOWN64 i32imm:$amt1, 0)>, Requires<[IsLP64]>;
	def VASTART_SAVE_XMM_REGS : I<0, Pseudo,
	(outs),
	(ins GR8:$al,			// x86-64 va_start lowering magic.
	i64imm:$regsavefi, i64imm:$offset,			let usesCustomInserter = 1, Defs = [EFLAGS] in {
	variable_ops),			def VASTART_SAVE_XMM_REGS : I<0, Pseudo,
	"#VASTART_SAVE_XMM_REGS $al, $regsavefi, $offset",			(outs),
	[(X86vastart_save_xmm_regs GR8:$al,			(ins GR8:$al,
	imm:$regsavefi,			i64imm:$regsavefi, i64imm:$offset,
	imm:$offset),			variable_ops),
	(implicit EFLAGS)]>;			"#VASTART_SAVE_XMM_REGS $al, $regsavefi, $offset",
				[(X86vastart_save_xmm_regs GR8:$al,
	// The VAARG_64 pseudo-instruction takes the address of the va_list,			imm:$regsavefi,
	// and places the address of the next argument into a register.			imm:$offset),
	let Defs = [EFLAGS] in			(implicit EFLAGS)]>;
	def VAARG_64 : I<0, Pseudo,
	(outs GR64:$dst),			// The VAARG_64 pseudo-instruction takes the address of the va_list,
	(ins i8mem:$ap, i32imm:$size, i8imm:$mode, i32imm:$align),			// and places the address of the next argument into a register.
	"#VAARG_64 $dst, $ap, $size, $mode, $align",			let Defs = [EFLAGS] in
	[(set GR64:$dst,			def VAARG_64 : I<0, Pseudo,
	(X86vaarg64 addr:$ap, imm:$size, imm:$mode, imm:$align)),			(outs GR64:$dst),
	(implicit EFLAGS)]>;			(ins i8mem:$ap, i32imm:$size, i8imm:$mode, i32imm:$align),
				"#VAARG_64 $dst, $ap, $size, $mode, $align",
	// Dynamic stack allocation yields a _chkstk or _alloca call for all Windows			[(set GR64:$dst,
	// targets. These calls are needed to probe the stack when allocating more than			(X86vaarg64 addr:$ap, imm:$size, imm:$mode, imm:$align)),
	// 4k bytes in one go. Touching the stack at 4K increments is necessary to			(implicit EFLAGS)]>;
	// ensure that the guard pages used by the OS virtual memory manager are
	// allocated in correct sequence.			// Dynamic stack allocation yields a _chkstk or _alloca call for all Windows
	// The main point of having separate instruction are extra unmodelled effects			// targets. These calls are needed to probe the stack when allocating more than
	// (compared to ordinary calls) like stack pointer change.			// 4k bytes in one go. Touching the stack at 4K increments is necessary to
				// ensure that the guard pages used by the OS virtual memory manager are
	let Defs = [EAX, ESP, EFLAGS], Uses = [ESP] in			// allocated in correct sequence.
	def WIN_ALLOCA : I<0, Pseudo, (outs), (ins),			// The main point of having separate instruction are extra unmodelled effects
	"# dynamic stack allocation",			// (compared to ordinary calls) like stack pointer change.
	[(X86WinAlloca)]>;
				let Defs = [EAX, ESP, EFLAGS], Uses = [ESP] in
	// When using segmented stacks these are lowered into instructions which first			def WIN_ALLOCA : I<0, Pseudo, (outs), (ins),
	// check if the current stacklet has enough free memory. If it does, memory is			"# dynamic stack allocation",
	// allocated by bumping the stack pointer. Otherwise memory is allocated from			[(X86WinAlloca)]>;
	// the heap.
				// When using segmented stacks these are lowered into instructions which first
	let Defs = [EAX, ESP, EFLAGS], Uses = [ESP] in			// check if the current stacklet has enough free memory. If it does, memory is
	def SEG_ALLOCA_32 : I<0, Pseudo, (outs GR32:$dst), (ins GR32:$size),			// allocated by bumping the stack pointer. Otherwise memory is allocated from
	"# variable sized alloca for segmented stacks",			// the heap.
	[(set GR32:$dst,
	(X86SegAlloca GR32:$size))]>,			let Defs = [EAX, ESP, EFLAGS], Uses = [ESP] in
	Requires<[NotLP64]>;			def SEG_ALLOCA_32 : I<0, Pseudo, (outs GR32:$dst), (ins GR32:$size),
				"# variable sized alloca for segmented stacks",
	let Defs = [RAX, RSP, EFLAGS], Uses = [RSP] in			[(set GR32:$dst,
	def SEG_ALLOCA_64 : I<0, Pseudo, (outs GR64:$dst), (ins GR64:$size),			(X86SegAlloca GR32:$size))]>,
	"# variable sized alloca for segmented stacks",			Requires<[NotLP64]>;
	[(set GR64:$dst,
	(X86SegAlloca GR64:$size))]>,			let Defs = [RAX, RSP, EFLAGS], Uses = [RSP] in
	Requires<[In64BitMode]>;			def SEG_ALLOCA_64 : I<0, Pseudo, (outs GR64:$dst), (ins GR64:$size),
	}			"# variable sized alloca for segmented stacks",
				[(set GR64:$dst,
	// The MSVC runtime contains an _ftol2 routine for converting floating-point			(X86SegAlloca GR64:$size))]>,
	// to integer values. It has a strange calling convention: the input is			Requires<[In64BitMode]>;
	// popped from the x87 stack, and the return value is given in EDX:EAX. ECX is			}
	// used as a temporary register. No other registers (aside from flags) are
	// touched.			// The MSVC runtime contains an _ftol2 routine for converting floating-point
	// Microsoft toolchains do not support 80-bit precision, so a WIN_FTOL_80			// to integer values. It has a strange calling convention: the input is
	// variant is unnecessary.			// popped from the x87 stack, and the return value is given in EDX:EAX. ECX is
				// used as a temporary register. No other registers (aside from flags) are
	let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in {			// touched.
	def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src),			// Microsoft toolchains do not support 80-bit precision, so a WIN_FTOL_80
	"# win32 fptoui",			// variant is unnecessary.
	[(X86WinFTOL RFP32:$src)]>,
	Requires<[Not64BitMode]>;			let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in {
				def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src),
	def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src),			"# win32 fptoui",
	"# win32 fptoui",			[(X86WinFTOL RFP32:$src)]>,
	[(X86WinFTOL RFP64:$src)]>,			Requires<[Not64BitMode]>;
	Requires<[Not64BitMode]>;
	}			def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src),
				"# win32 fptoui",
	//===----------------------------------------------------------------------===//			[(X86WinFTOL RFP64:$src)]>,
	// EH Pseudo Instructions			Requires<[Not64BitMode]>;
	//			}
	let SchedRW = [WriteSystem] in {
	let isTerminator = 1, isReturn = 1, isBarrier = 1,			//===----------------------------------------------------------------------===//
	hasCtrlDep = 1, isCodeGenOnly = 1 in {			// EH Pseudo Instructions
	def EH_RETURN : I<0xC3, RawFrm, (outs), (ins GR32:$addr),			//
	"ret\t#eh_return, addr: $addr",			let SchedRW = [WriteSystem] in {
	[(X86ehret GR32:$addr)], IIC_RET>, Sched<[WriteJumpLd]>;			let isTerminator = 1, isReturn = 1, isBarrier = 1,
				hasCtrlDep = 1, isCodeGenOnly = 1 in {
	}			def EH_RETURN : I<0xC3, RawFrm, (outs), (ins GR32:$addr),
				"ret\t#eh_return, addr: $addr",
	let isTerminator = 1, isReturn = 1, isBarrier = 1,			[(X86ehret GR32:$addr)], IIC_RET>, Sched<[WriteJumpLd]>;
	hasCtrlDep = 1, isCodeGenOnly = 1 in {
	def EH_RETURN64 : I<0xC3, RawFrm, (outs), (ins GR64:$addr),			}
	"ret\t#eh_return, addr: $addr",
	[(X86ehret GR64:$addr)], IIC_RET>, Sched<[WriteJumpLd]>;			let isTerminator = 1, isReturn = 1, isBarrier = 1,
				hasCtrlDep = 1, isCodeGenOnly = 1 in {
	}			def EH_RETURN64 : I<0xC3, RawFrm, (outs), (ins GR64:$addr),
				"ret\t#eh_return, addr: $addr",
	let hasSideEffects = 1, isBarrier = 1, isCodeGenOnly = 1,			[(X86ehret GR64:$addr)], IIC_RET>, Sched<[WriteJumpLd]>;
	usesCustomInserter = 1 in {
	def EH_SjLj_SetJmp32 : I<0, Pseudo, (outs GR32:$dst), (ins i32mem:$buf),			}
	"#EH_SJLJ_SETJMP32",
	[(set GR32:$dst, (X86eh_sjlj_setjmp addr:$buf))]>,			let hasSideEffects = 1, isBarrier = 1, isCodeGenOnly = 1,
	Requires<[Not64BitMode]>;			usesCustomInserter = 1 in {
	def EH_SjLj_SetJmp64 : I<0, Pseudo, (outs GR32:$dst), (ins i64mem:$buf),			def EH_SjLj_SetJmp32 : I<0, Pseudo, (outs GR32:$dst), (ins i32mem:$buf),
	"#EH_SJLJ_SETJMP64",			"#EH_SJLJ_SETJMP32",
	[(set GR32:$dst, (X86eh_sjlj_setjmp addr:$buf))]>,			[(set GR32:$dst, (X86eh_sjlj_setjmp addr:$buf))]>,
	Requires<[In64BitMode]>;			Requires<[Not64BitMode]>;
	let isTerminator = 1 in {			def EH_SjLj_SetJmp64 : I<0, Pseudo, (outs GR32:$dst), (ins i64mem:$buf),
	def EH_SjLj_LongJmp32 : I<0, Pseudo, (outs), (ins i32mem:$buf),			"#EH_SJLJ_SETJMP64",
	"#EH_SJLJ_LONGJMP32",			[(set GR32:$dst, (X86eh_sjlj_setjmp addr:$buf))]>,
	[(X86eh_sjlj_longjmp addr:$buf)]>,			Requires<[In64BitMode]>;
	Requires<[Not64BitMode]>;			let isTerminator = 1 in {
	def EH_SjLj_LongJmp64 : I<0, Pseudo, (outs), (ins i64mem:$buf),			def EH_SjLj_LongJmp32 : I<0, Pseudo, (outs), (ins i32mem:$buf),
	"#EH_SJLJ_LONGJMP64",			"#EH_SJLJ_LONGJMP32",
	[(X86eh_sjlj_longjmp addr:$buf)]>,			[(X86eh_sjlj_longjmp addr:$buf)]>,
	Requires<[In64BitMode]>;			Requires<[Not64BitMode]>;
	}			def EH_SjLj_LongJmp64 : I<0, Pseudo, (outs), (ins i64mem:$buf),
	}			"#EH_SJLJ_LONGJMP64",
	} // SchedRW			[(X86eh_sjlj_longjmp addr:$buf)]>,
				Requires<[In64BitMode]>;
	let isBranch = 1, isTerminator = 1, isCodeGenOnly = 1 in {			}
	def EH_SjLj_Setup : I<0, Pseudo, (outs), (ins brtarget:$dst),			}
	"#EH_SjLj_Setup\t$dst", []>;			} // SchedRW
	}
				let isBranch = 1, isTerminator = 1, isCodeGenOnly = 1 in {
	//===----------------------------------------------------------------------===//			def EH_SjLj_Setup : I<0, Pseudo, (outs), (ins brtarget:$dst),
	// Pseudo instructions used by unwind info.			"#EH_SjLj_Setup\t$dst", []>;
	//			}
	let isPseudo = 1 in {
	def SEH_PushReg : I<0, Pseudo, (outs), (ins i32imm:$reg),			//===----------------------------------------------------------------------===//
	"#SEH_PushReg $reg", []>;			// Pseudo instructions used by unwind info.
	def SEH_SaveReg : I<0, Pseudo, (outs), (ins i32imm:$reg, i32imm:$dst),			//
	"#SEH_SaveReg $reg, $dst", []>;			let isPseudo = 1 in {
	def SEH_SaveXMM : I<0, Pseudo, (outs), (ins i32imm:$reg, i32imm:$dst),			def SEH_PushReg : I<0, Pseudo, (outs), (ins i32imm:$reg),
	"#SEH_SaveXMM $reg, $dst", []>;			"#SEH_PushReg $reg", []>;
	def SEH_StackAlloc : I<0, Pseudo, (outs), (ins i32imm:$size),			def SEH_SaveReg : I<0, Pseudo, (outs), (ins i32imm:$reg, i32imm:$dst),
	"#SEH_StackAlloc $size", []>;			"#SEH_SaveReg $reg, $dst", []>;
	def SEH_SetFrame : I<0, Pseudo, (outs), (ins i32imm:$reg, i32imm:$offset),			def SEH_SaveXMM : I<0, Pseudo, (outs), (ins i32imm:$reg, i32imm:$dst),
	"#SEH_SetFrame $reg, $offset", []>;			"#SEH_SaveXMM $reg, $dst", []>;
	def SEH_PushFrame : I<0, Pseudo, (outs), (ins i1imm:$mode),			def SEH_StackAlloc : I<0, Pseudo, (outs), (ins i32imm:$size),
	"#SEH_PushFrame $mode", []>;			"#SEH_StackAlloc $size", []>;
	def SEH_EndPrologue : I<0, Pseudo, (outs), (ins),			def SEH_SetFrame : I<0, Pseudo, (outs), (ins i32imm:$reg, i32imm:$offset),
	"#SEH_EndPrologue", []>;			"#SEH_SetFrame $reg, $offset", []>;
	def SEH_Epilogue : I<0, Pseudo, (outs), (ins),			def SEH_PushFrame : I<0, Pseudo, (outs), (ins i1imm:$mode),
	"#SEH_Epilogue", []>;			"#SEH_PushFrame $mode", []>;
	}			def SEH_EndPrologue : I<0, Pseudo, (outs), (ins),
				"#SEH_EndPrologue", []>;
	//===----------------------------------------------------------------------===//			def SEH_Epilogue : I<0, Pseudo, (outs), (ins),
	// Pseudo instructions used by segmented stacks.			"#SEH_Epilogue", []>;
	//			}

	// This is lowered into a RET instruction by MCInstLower. We need			//===----------------------------------------------------------------------===//
	// this so that we don't have to have a MachineBasicBlock which ends			// Pseudo instructions used by segmented stacks.
	// with a RET and also has successors.			//
	let isPseudo = 1 in {
	def MORESTACK_RET: I<0, Pseudo, (outs), (ins),			// This is lowered into a RET instruction by MCInstLower. We need
	"", []>;			// this so that we don't have to have a MachineBasicBlock which ends
				// with a RET and also has successors.
	// This instruction is lowered to a RET followed by a MOV. The two			let isPseudo = 1 in {
	// instructions are not generated on a higher level since then the			def MORESTACK_RET: I<0, Pseudo, (outs), (ins),
	// verifier sees a MachineBasicBlock ending with a non-terminator.			"", []>;
	def MORESTACK_RET_RESTORE_R10 : I<0, Pseudo, (outs), (ins),
	"", []>;			// This instruction is lowered to a RET followed by a MOV. The two
	}			// instructions are not generated on a higher level since then the
				// verifier sees a MachineBasicBlock ending with a non-terminator.
	//===----------------------------------------------------------------------===//			def MORESTACK_RET_RESTORE_R10 : I<0, Pseudo, (outs), (ins),
	// Alias Instructions			"", []>;
	//===----------------------------------------------------------------------===//			}

	// Alias instruction mapping movr0 to xor.			//===----------------------------------------------------------------------===//
	// FIXME: remove when we can teach regalloc that xor reg, reg is ok.			// Alias Instructions
	let Defs = [EFLAGS], isReMaterializable = 1, isAsCheapAsAMove = 1,			//===----------------------------------------------------------------------===//
	isPseudo = 1 in
	def MOV32r0 : I<0, Pseudo, (outs GR32:$dst), (ins), "",			// Alias instruction mapping movr0 to xor.
	[(set GR32:$dst, 0)], IIC_ALU_NONMEM>, Sched<[WriteZero]>;			// FIXME: remove when we can teach regalloc that xor reg, reg is ok.
				let Defs = [EFLAGS], isReMaterializable = 1, isAsCheapAsAMove = 1,
	// Other widths can also make use of the 32-bit xor, which may have a smaller			isPseudo = 1 in
	// encoding and avoid partial register updates.			def MOV32r0 : I<0, Pseudo, (outs GR32:$dst), (ins), "",
	def : Pat<(i8 0), (EXTRACT_SUBREG (MOV32r0), sub_8bit)>;			[(set GR32:$dst, 0)], IIC_ALU_NONMEM>, Sched<[WriteZero]>;
	def : Pat<(i16 0), (EXTRACT_SUBREG (MOV32r0), sub_16bit)>;
	def : Pat<(i64 0), (SUBREG_TO_REG (i64 0), (MOV32r0), sub_32bit)> {			// Other widths can also make use of the 32-bit xor, which may have a smaller
	let AddedComplexity = 20;			// encoding and avoid partial register updates.
	}			def : Pat<(i8 0), (EXTRACT_SUBREG (MOV32r0), sub_8bit)>;
				def : Pat<(i16 0), (EXTRACT_SUBREG (MOV32r0), sub_16bit)>;
	// Materialize i64 constant where top 32-bits are zero. This could theoretically			def : Pat<(i64 0), (SUBREG_TO_REG (i64 0), (MOV32r0), sub_32bit)> {
	// use MOV32ri with a SUBREG_TO_REG to represent the zero-extension, however			let AddedComplexity = 20;
	// that would make it more difficult to rematerialize.			}
	let AddedComplexity = 1, isReMaterializable = 1, isAsCheapAsAMove = 1,
	isCodeGenOnly = 1, hasSideEffects = 0 in			// Materialize i64 constant where top 32-bits are zero. This could theoretically
	def MOV32ri64 : Ii32<0xb8, AddRegFrm, (outs GR32:$dst), (ins i64i32imm:$src),			// use MOV32ri with a SUBREG_TO_REG to represent the zero-extension, however
	"", [], IIC_ALU_NONMEM>, Sched<[WriteALU]>;			// that would make it more difficult to rematerialize.
				let AddedComplexity = 1, isReMaterializable = 1, isAsCheapAsAMove = 1,
	// This 64-bit pseudo-move can be used for both a 64-bit constant that is			isCodeGenOnly = 1, hasSideEffects = 0 in
	// actually the zero-extension of a 32-bit constant, and for labels in the			def MOV32ri64 : Ii32<0xb8, AddRegFrm, (outs GR32:$dst), (ins i64i32imm:$src),
	// x86-64 small code model.			"", [], IIC_ALU_NONMEM>, Sched<[WriteALU]>;
	def mov64imm32 : ComplexPattern<i64, 1, "SelectMOV64Imm32", [imm, X86Wrapper]>;
				// This 64-bit pseudo-move can be used for both a 64-bit constant that is
	let AddedComplexity = 1 in			// actually the zero-extension of a 32-bit constant, and for labels in the
	def : Pat<(i64 mov64imm32:$src),			// x86-64 small code model.
	(SUBREG_TO_REG (i64 0), (MOV32ri64 mov64imm32:$src), sub_32bit)>;			def mov64imm32 : ComplexPattern<i64, 1, "SelectMOV64Imm32", [imm, X86Wrapper]>;

	// Use sbb to materialize carry bit.			let AddedComplexity = 1 in
	let Uses = [EFLAGS], Defs = [EFLAGS], isPseudo = 1, SchedRW = [WriteALU] in {			def : Pat<(i64 mov64imm32:$src),
	// FIXME: These are pseudo ops that should be replaced with Pat<> patterns.			(SUBREG_TO_REG (i64 0), (MOV32ri64 mov64imm32:$src), sub_32bit)>;
	// However, Pat<> can't replicate the destination reg into the inputs of the
	// result.			// Use sbb to materialize carry bit.
	def SETB_C8r : I<0, Pseudo, (outs GR8:$dst), (ins), "",			let Uses = [EFLAGS], Defs = [EFLAGS], isPseudo = 1, SchedRW = [WriteALU] in {
	[(set GR8:$dst, (X86setcc_c X86_COND_B, EFLAGS))]>;			// FIXME: These are pseudo ops that should be replaced with Pat<> patterns.
	def SETB_C16r : I<0, Pseudo, (outs GR16:$dst), (ins), "",			// However, Pat<> can't replicate the destination reg into the inputs of the
	[(set GR16:$dst, (X86setcc_c X86_COND_B, EFLAGS))]>;			// result.
	def SETB_C32r : I<0, Pseudo, (outs GR32:$dst), (ins), "",			def SETB_C8r : I<0, Pseudo, (outs GR8:$dst), (ins), "",
	[(set GR32:$dst, (X86setcc_c X86_COND_B, EFLAGS))]>;			[(set GR8:$dst, (X86setcc_c X86_COND_B, EFLAGS))]>;
	def SETB_C64r : I<0, Pseudo, (outs GR64:$dst), (ins), "",			def SETB_C16r : I<0, Pseudo, (outs GR16:$dst), (ins), "",
	[(set GR64:$dst, (X86setcc_c X86_COND_B, EFLAGS))]>;			[(set GR16:$dst, (X86setcc_c X86_COND_B, EFLAGS))]>;
	} // isCodeGenOnly			def SETB_C32r : I<0, Pseudo, (outs GR32:$dst), (ins), "",
				[(set GR32:$dst, (X86setcc_c X86_COND_B, EFLAGS))]>;
				def SETB_C64r : I<0, Pseudo, (outs GR64:$dst), (ins), "",
	def : Pat<(i16 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),			[(set GR64:$dst, (X86setcc_c X86_COND_B, EFLAGS))]>;
	(SETB_C16r)>;			} // isCodeGenOnly
	def : Pat<(i32 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
	(SETB_C32r)>;
	def : Pat<(i64 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),			def : Pat<(i16 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
	(SETB_C64r)>;			(SETB_C16r)>;
				def : Pat<(i32 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
	def : Pat<(i16 (sext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),			(SETB_C32r)>;
	(SETB_C16r)>;			def : Pat<(i64 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
	def : Pat<(i32 (sext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),			(SETB_C64r)>;
	(SETB_C32r)>;
	def : Pat<(i64 (sext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),			def : Pat<(i16 (sext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
	(SETB_C64r)>;			(SETB_C16r)>;
				def : Pat<(i32 (sext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
	// We canonicalize 'setb' to "(and (sbb reg,reg), 1)" on the hope that the and			(SETB_C32r)>;
	// will be eliminated and that the sbb can be extended up to a wider type. When			def : Pat<(i64 (sext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
	// this happens, it is great. However, if we are left with an 8-bit sbb and an			(SETB_C64r)>;
	// and, we might as well just match it as a setb.
	def : Pat<(and (i8 (X86setcc_c X86_COND_B, EFLAGS)), 1),			// We canonicalize 'setb' to "(and (sbb reg,reg), 1)" on the hope that the and
	(SETBr)>;			// will be eliminated and that the sbb can be extended up to a wider type. When
				// this happens, it is great. However, if we are left with an 8-bit sbb and an
	// (add OP, SETB) -> (adc OP, 0)			// and, we might as well just match it as a setb.
	def : Pat<(add (and (i8 (X86setcc_c X86_COND_B, EFLAGS)), 1), GR8:$op),			def : Pat<(and (i8 (X86setcc_c X86_COND_B, EFLAGS)), 1),
	(ADC8ri GR8:$op, 0)>;			(SETBr)>;
	def : Pat<(add (and (i32 (X86setcc_c X86_COND_B, EFLAGS)), 1), GR32:$op),
	(ADC32ri8 GR32:$op, 0)>;			// (add OP, SETB) -> (adc OP, 0)
	def : Pat<(add (and (i64 (X86setcc_c X86_COND_B, EFLAGS)), 1), GR64:$op),			def : Pat<(add (and (i8 (X86setcc_c X86_COND_B, EFLAGS)), 1), GR8:$op),
	(ADC64ri8 GR64:$op, 0)>;			(ADC8ri GR8:$op, 0)>;
				def : Pat<(add (and (i32 (X86setcc_c X86_COND_B, EFLAGS)), 1), GR32:$op),
	// (sub OP, SETB) -> (sbb OP, 0)			(ADC32ri8 GR32:$op, 0)>;
	def : Pat<(sub GR8:$op, (and (i8 (X86setcc_c X86_COND_B, EFLAGS)), 1)),			def : Pat<(add (and (i64 (X86setcc_c X86_COND_B, EFLAGS)), 1), GR64:$op),
	(SBB8ri GR8:$op, 0)>;			(ADC64ri8 GR64:$op, 0)>;
	def : Pat<(sub GR32:$op, (and (i32 (X86setcc_c X86_COND_B, EFLAGS)), 1)),
	(SBB32ri8 GR32:$op, 0)>;			// (sub OP, SETB) -> (sbb OP, 0)
	def : Pat<(sub GR64:$op, (and (i64 (X86setcc_c X86_COND_B, EFLAGS)), 1)),			def : Pat<(sub GR8:$op, (and (i8 (X86setcc_c X86_COND_B, EFLAGS)), 1)),
	(SBB64ri8 GR64:$op, 0)>;			(SBB8ri GR8:$op, 0)>;
				def : Pat<(sub GR32:$op, (and (i32 (X86setcc_c X86_COND_B, EFLAGS)), 1)),
	// (sub OP, SETCC_CARRY) -> (adc OP, 0)			(SBB32ri8 GR32:$op, 0)>;
	def : Pat<(sub GR8:$op, (i8 (X86setcc_c X86_COND_B, EFLAGS))),			def : Pat<(sub GR64:$op, (and (i64 (X86setcc_c X86_COND_B, EFLAGS)), 1)),
	(ADC8ri GR8:$op, 0)>;			(SBB64ri8 GR64:$op, 0)>;
	def : Pat<(sub GR32:$op, (i32 (X86setcc_c X86_COND_B, EFLAGS))),
	(ADC32ri8 GR32:$op, 0)>;			// (sub OP, SETCC_CARRY) -> (adc OP, 0)
	def : Pat<(sub GR64:$op, (i64 (X86setcc_c X86_COND_B, EFLAGS))),			def : Pat<(sub GR8:$op, (i8 (X86setcc_c X86_COND_B, EFLAGS))),
	(ADC64ri8 GR64:$op, 0)>;			(ADC8ri GR8:$op, 0)>;
				def : Pat<(sub GR32:$op, (i32 (X86setcc_c X86_COND_B, EFLAGS))),
	//===----------------------------------------------------------------------===//			(ADC32ri8 GR32:$op, 0)>;
	// String Pseudo Instructions			def : Pat<(sub GR64:$op, (i64 (X86setcc_c X86_COND_B, EFLAGS))),
	//			(ADC64ri8 GR64:$op, 0)>;
	let SchedRW = [WriteMicrocoded] in {
	let Defs = [ECX,EDI,ESI], Uses = [ECX,EDI,ESI], isCodeGenOnly = 1 in {			//===----------------------------------------------------------------------===//
	def REP_MOVSB_32 : I<0xA4, RawFrm, (outs), (ins), "{rep;movsb|rep movsb}",			// String Pseudo Instructions
	[(X86rep_movs i8)], IIC_REP_MOVS>, REP,			//
	Requires<[Not64BitMode]>;			let SchedRW = [WriteMicrocoded] in {
	def REP_MOVSW_32 : I<0xA5, RawFrm, (outs), (ins), "{rep;movsw|rep movsw}",			let Defs = [ECX,EDI,ESI], Uses = [ECX,EDI,ESI], isCodeGenOnly = 1 in {
	[(X86rep_movs i16)], IIC_REP_MOVS>, REP, OpSize16,			def REP_MOVSB_32 : I<0xA4, RawFrm, (outs), (ins), "{rep;movsb|rep movsb}",
	Requires<[Not64BitMode]>;			[(X86rep_movs i8)], IIC_REP_MOVS>, REP,
	def REP_MOVSD_32 : I<0xA5, RawFrm, (outs), (ins), "{rep;movsl|rep movsd}",			Requires<[Not64BitMode]>;
	[(X86rep_movs i32)], IIC_REP_MOVS>, REP, OpSize32,			def REP_MOVSW_32 : I<0xA5, RawFrm, (outs), (ins), "{rep;movsw|rep movsw}",
	Requires<[Not64BitMode]>;			[(X86rep_movs i16)], IIC_REP_MOVS>, REP, OpSize16,
	}			Requires<[Not64BitMode]>;
				def REP_MOVSD_32 : I<0xA5, RawFrm, (outs), (ins), "{rep;movsl|rep movsd}",
	let Defs = [RCX,RDI,RSI], Uses = [RCX,RDI,RSI], isCodeGenOnly = 1 in {			[(X86rep_movs i32)], IIC_REP_MOVS>, REP, OpSize32,
	def REP_MOVSB_64 : I<0xA4, RawFrm, (outs), (ins), "{rep;movsb|rep movsb}",			[(X86rep_movs i8)], IIC_REP_MOVS>, REP,
	[(X86rep_movs i8)], IIC_REP_MOVS>, REP,			}
	Requires<[In64BitMode]>;
	def REP_MOVSW_64 : I<0xA5, RawFrm, (outs), (ins), "{rep;movsw|rep movsw}",			let Defs = [RCX,RDI,RSI], Uses = [RCX,RDI,RSI], isCodeGenOnly = 1 in {
	[(X86rep_movs i16)], IIC_REP_MOVS>, REP, OpSize16,			def REP_MOVSB_64 : I<0xA4, RawFrm, (outs), (ins), "{rep;movsb|rep movsb}",
	Requires<[In64BitMode]>;			[(X86rep_movs i8)], IIC_REP_MOVS>, REP,
	def REP_MOVSD_64 : I<0xA5, RawFrm, (outs), (ins), "{rep;movsl|rep movsd}",			Requires<[In64BitMode]>;
	[(X86rep_movs i32)], IIC_REP_MOVS>, REP, OpSize32,			def REP_MOVSW_64 : I<0xA5, RawFrm, (outs), (ins), "{rep;movsw|rep movsw}",
	Requires<[In64BitMode]>;			[(X86rep_movs i16)], IIC_REP_MOVS>, REP, OpSize16,
	def REP_MOVSQ_64 : RI<0xA5, RawFrm, (outs), (ins), "{rep;movsq|rep movsq}",			Requires<[In64BitMode]>;
	[(X86rep_movs i64)], IIC_REP_MOVS>, REP,			def REP_MOVSD_64 : I<0xA5, RawFrm, (outs), (ins), "{rep;movsl|rep movsd}",
	Requires<[In64BitMode]>;			[(X86rep_movs i32)], IIC_REP_MOVS>, REP, OpSize32,
	}			Requires<[In64BitMode]>;
				def REP_MOVSQ_64 : RI<0xA5, RawFrm, (outs), (ins), "{rep;movsq|rep movsq}",
	// FIXME: Should use "(X86rep_stos AL)" as the pattern.			[(X86rep_movs i64)], IIC_REP_MOVS>, REP,
	let Defs = [ECX,EDI], isCodeGenOnly = 1 in {			Requires<[In64BitMode]>;
	let Uses = [AL,ECX,EDI] in			}
	def REP_STOSB_32 : I<0xAA, RawFrm, (outs), (ins), "{rep;stosb|rep stosb}",
	[(X86rep_stos i8)], IIC_REP_STOS>, REP,			// FIXME: Should use "(X86rep_stos AL)" as the pattern.
	Requires<[Not64BitMode]>;			let Defs = [ECX,EDI], isCodeGenOnly = 1 in {
	let Uses = [AX,ECX,EDI] in			let Uses = [AL,ECX,EDI] in
	def REP_STOSW_32 : I<0xAB, RawFrm, (outs), (ins), "{rep;stosw|rep stosw}",			def REP_STOSB_32 : I<0xAA, RawFrm, (outs), (ins), "{rep;stosb|rep stosb}",
	[(X86rep_stos i16)], IIC_REP_STOS>, REP, OpSize16,			[(X86rep_stos i8)], IIC_REP_STOS>, REP,
	Requires<[Not64BitMode]>;			Requires<[Not64BitMode]>;
	let Uses = [EAX,ECX,EDI] in			let Uses = [AX,ECX,EDI] in
	def REP_STOSD_32 : I<0xAB, RawFrm, (outs), (ins), "{rep;stosl|rep stosd}",			def REP_STOSW_32 : I<0xAB, RawFrm, (outs), (ins), "{rep;stosw|rep stosw}",
	[(X86rep_stos i32)], IIC_REP_STOS>, REP, OpSize32,			[(X86rep_stos i16)], IIC_REP_STOS>, REP, OpSize16,
	Requires<[Not64BitMode]>;			Requires<[Not64BitMode]>;
	}			let Uses = [EAX,ECX,EDI] in
				def REP_STOSD_32 : I<0xAB, RawFrm, (outs), (ins), "{rep;stosl|rep stosd}",
	let Defs = [RCX,RDI], isCodeGenOnly = 1 in {			[(X86rep_stos i32)], IIC_REP_STOS>, REP, OpSize32,
	let Uses = [AL,RCX,RDI] in			Requires<[Not64BitMode]>;
	def REP_STOSB_64 : I<0xAA, RawFrm, (outs), (ins), "{rep;stosb|rep stosb}",			}
	[(X86rep_stos i8)], IIC_REP_STOS>, REP,
	Requires<[In64BitMode]>;			let Defs = [RCX,RDI], isCodeGenOnly = 1 in {
	let Uses = [AX,RCX,RDI] in			let Uses = [AL,RCX,RDI] in
	def REP_STOSW_64 : I<0xAB, RawFrm, (outs), (ins), "{rep;stosw|rep stosw}",			def REP_STOSB_64 : I<0xAA, RawFrm, (outs), (ins), "{rep;stosb|rep stosb}",
	[(X86rep_stos i16)], IIC_REP_STOS>, REP, OpSize16,			[(X86rep_stos i8)], IIC_REP_STOS>, REP,
	Requires<[In64BitMode]>;			Requires<[In64BitMode]>;
	let Uses = [RAX,RCX,RDI] in			let Uses = [AX,RCX,RDI] in
	def REP_STOSD_64 : I<0xAB, RawFrm, (outs), (ins), "{rep;stosl|rep stosd}",			def REP_STOSW_64 : I<0xAB, RawFrm, (outs), (ins), "{rep;stosw|rep stosw}",
	[(X86rep_stos i32)], IIC_REP_STOS>, REP, OpSize32,			[(X86rep_stos i16)], IIC_REP_STOS>, REP, OpSize16,
	Requires<[In64BitMode]>;			Requires<[In64BitMode]>;
				let Uses = [RAX,RCX,RDI] in
	let Uses = [RAX,RCX,RDI] in			def REP_STOSD_64 : I<0xAB, RawFrm, (outs), (ins), "{rep;stosl|rep stosd}",
	def REP_STOSQ_64 : RI<0xAB, RawFrm, (outs), (ins), "{rep;stosq|rep stosq}",			[(X86rep_stos i32)], IIC_REP_STOS>, REP, OpSize32,
	[(X86rep_stos i64)], IIC_REP_STOS>, REP,			Requires<[In64BitMode]>;
	Requires<[In64BitMode]>;
	}			let Uses = [RAX,RCX,RDI] in
	} // SchedRW			def REP_STOSQ_64 : RI<0xAB, RawFrm, (outs), (ins), "{rep;stosq|rep stosq}",
				[(X86rep_stos i64)], IIC_REP_STOS>, REP,
	//===----------------------------------------------------------------------===//			Requires<[In64BitMode]>;
	// Thread Local Storage Instructions			}
	//			} // SchedRW

	// ELF TLS Support			//===----------------------------------------------------------------------===//
	// All calls clobber the non-callee saved registers. ESP is marked as			// Thread Local Storage Instructions
	// a use to prevent stack-pointer assignments that appear immediately			//
	// before calls from potentially appearing dead.
	let Defs = [EAX, ECX, EDX, FP0, FP1, FP2, FP3, FP4, FP5, FP6, FP7,			// ELF TLS Support
	ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,			// All calls clobber the non-callee saved registers. ESP is marked as
	MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,			// a use to prevent stack-pointer assignments that appear immediately
	XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,			// before calls from potentially appearing dead.
	XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],			let Defs = [EAX, ECX, EDX, FP0, FP1, FP2, FP3, FP4, FP5, FP6, FP7,
	Uses = [ESP] in {			ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
	def TLS_addr32 : I<0, Pseudo, (outs), (ins i32mem:$sym),			MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
	"# TLS_addr32",			XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
	[(X86tlsaddr tls32addr:$sym)]>,			XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
	Requires<[Not64BitMode]>;			Uses = [ESP] in {
	def TLS_base_addr32 : I<0, Pseudo, (outs), (ins i32mem:$sym),			def TLS_addr32 : I<0, Pseudo, (outs), (ins i32mem:$sym),
	"# TLS_base_addr32",			"# TLS_addr32",
	[(X86tlsbaseaddr tls32baseaddr:$sym)]>,			[(X86tlsaddr tls32addr:$sym)]>,
	Requires<[Not64BitMode]>;			Requires<[Not64BitMode]>;
	}			def TLS_base_addr32 : I<0, Pseudo, (outs), (ins i32mem:$sym),
				"# TLS_base_addr32",
	// All calls clobber the non-callee saved registers. RSP is marked as			[(X86tlsbaseaddr tls32baseaddr:$sym)]>,
	// a use to prevent stack-pointer assignments that appear immediately			Requires<[Not64BitMode]>;
	// before calls from potentially appearing dead.			}
	let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11,
	FP0, FP1, FP2, FP3, FP4, FP5, FP6, FP7,			// All calls clobber the non-callee saved registers. RSP is marked as
	ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,			// a use to prevent stack-pointer assignments that appear immediately
	MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,			// before calls from potentially appearing dead.
	XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,			let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11,
	XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],			FP0, FP1, FP2, FP3, FP4, FP5, FP6, FP7,
	Uses = [RSP] in {			ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
	def TLS_addr64 : I<0, Pseudo, (outs), (ins i64mem:$sym),			MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
	"# TLS_addr64",			XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
	[(X86tlsaddr tls64addr:$sym)]>,			XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
	Requires<[In64BitMode]>;			Uses = [RSP] in {
	def TLS_base_addr64 : I<0, Pseudo, (outs), (ins i64mem:$sym),			def TLS_addr64 : I<0, Pseudo, (outs), (ins i64mem:$sym),
	"# TLS_base_addr64",			"# TLS_addr64",
	[(X86tlsbaseaddr tls64baseaddr:$sym)]>,			[(X86tlsaddr tls64addr:$sym)]>,
	Requires<[In64BitMode]>;			Requires<[In64BitMode]>;
	}			def TLS_base_addr64 : I<0, Pseudo, (outs), (ins i64mem:$sym),
				"# TLS_base_addr64",
	// Darwin TLS Support			[(X86tlsbaseaddr tls64baseaddr:$sym)]>,
	// For i386, the address of the thunk is passed on the stack, on return the			Requires<[In64BitMode]>;
	// address of the variable is in %eax. %ecx is trashed during the function			}
	// call. All other registers are preserved.
	let Defs = [EAX, ECX, EFLAGS],			// Darwin TLS Support
	Uses = [ESP],			// For i386, the address of the thunk is passed on the stack, on return the
	usesCustomInserter = 1 in			// address of the variable is in %eax. %ecx is trashed during the function
	def TLSCall_32 : I<0, Pseudo, (outs), (ins i32mem:$sym),			// call. All other registers are preserved.
	"# TLSCall_32",			let Defs = [EAX, ECX, EFLAGS],
	[(X86TLSCall addr:$sym)]>,			Uses = [ESP],
	Requires<[Not64BitMode]>;			usesCustomInserter = 1 in
				def TLSCall_32 : I<0, Pseudo, (outs), (ins i32mem:$sym),
	// For x86_64, the address of the thunk is passed in %rdi, on return			"# TLSCall_32",
	// the address of the variable is in %rax. All other registers are preserved.			[(X86TLSCall addr:$sym)]>,
	let Defs = [RAX, EFLAGS],			Requires<[Not64BitMode]>;
	Uses = [RSP, RDI],
	usesCustomInserter = 1 in			// For x86_64, the address of the thunk is passed in %rdi, on return
	def TLSCall_64 : I<0, Pseudo, (outs), (ins i64mem:$sym),			// the address of the variable is in %rax. All other registers are preserved.
	"# TLSCall_64",			let Defs = [RAX, EFLAGS],
	[(X86TLSCall addr:$sym)]>,			Uses = [RSP, RDI],
	Requires<[In64BitMode]>;			usesCustomInserter = 1 in
				def TLSCall_64 : I<0, Pseudo, (outs), (ins i64mem:$sym),
				"# TLSCall_64",
	//===----------------------------------------------------------------------===//			[(X86TLSCall addr:$sym)]>,
	// Conditional Move Pseudo Instructions			Requires<[In64BitMode]>;

	// X86 doesn't have 8-bit conditional moves. Use a customInserter to
	// emit control flow. An alternative to this is to mark i8 SELECT as Promote,			//===----------------------------------------------------------------------===//
	// however that requires promoting the operands, and can induce additional			// Conditional Move Pseudo Instructions
	// i8 register pressure.
	let usesCustomInserter = 1, Uses = [EFLAGS] in {			// X86 doesn't have 8-bit conditional moves. Use a customInserter to
	def CMOV_GR8 : I<0, Pseudo,			// emit control flow. An alternative to this is to mark i8 SELECT as Promote,
	(outs GR8:$dst), (ins GR8:$src1, GR8:$src2, i8imm:$cond),			// however that requires promoting the operands, and can induce additional
	"#CMOV_GR8 PSEUDO!",			// i8 register pressure.
	[(set GR8:$dst, (X86cmov GR8:$src1, GR8:$src2,			let usesCustomInserter = 1, Uses = [EFLAGS] in {
	imm:$cond, EFLAGS))]>;			def CMOV_GR8 : I<0, Pseudo,
				(outs GR8:$dst), (ins GR8:$src1, GR8:$src2, i8imm:$cond),
	let Predicates = [NoCMov] in {			"#CMOV_GR8 PSEUDO!",
	def CMOV_GR32 : I<0, Pseudo,			[(set GR8:$dst, (X86cmov GR8:$src1, GR8:$src2,
	(outs GR32:$dst), (ins GR32:$src1, GR32:$src2, i8imm:$cond),			imm:$cond, EFLAGS))]>;
	"#CMOV_GR32* PSEUDO!",
	[(set GR32:$dst,			let Predicates = [NoCMov] in {
	(X86cmov GR32:$src1, GR32:$src2, imm:$cond, EFLAGS))]>;			def CMOV_GR32 : I<0, Pseudo,
	def CMOV_GR16 : I<0, Pseudo,			(outs GR32:$dst), (ins GR32:$src1, GR32:$src2, i8imm:$cond),
	(outs GR16:$dst), (ins GR16:$src1, GR16:$src2, i8imm:$cond),			"#CMOV_GR32* PSEUDO!",
	"#CMOV_GR16* PSEUDO!",			[(set GR32:$dst,
	[(set GR16:$dst,			(X86cmov GR32:$src1, GR32:$src2, imm:$cond, EFLAGS))]>;
	(X86cmov GR16:$src1, GR16:$src2, imm:$cond, EFLAGS))]>;			def CMOV_GR16 : I<0, Pseudo,
	} // Predicates = [NoCMov]			(outs GR16:$dst), (ins GR16:$src1, GR16:$src2, i8imm:$cond),
				"#CMOV_GR16* PSEUDO!",
	// fcmov doesn't handle all possible EFLAGS, provide a fallback if there is no			[(set GR16:$dst,
	// SSE1.			(X86cmov GR16:$src1, GR16:$src2, imm:$cond, EFLAGS))]>;
	let Predicates = [FPStackf32] in			} // Predicates = [NoCMov]
	def CMOV_RFP32 : I<0, Pseudo,
	(outs RFP32:$dst),			// fcmov doesn't handle all possible EFLAGS, provide a fallback if there is no
	(ins RFP32:$src1, RFP32:$src2, i8imm:$cond),			// SSE1.
	"#CMOV_RFP32 PSEUDO!",			let Predicates = [FPStackf32] in
	[(set RFP32:$dst,			def CMOV_RFP32 : I<0, Pseudo,
	(X86cmov RFP32:$src1, RFP32:$src2, imm:$cond,			(outs RFP32:$dst),
	EFLAGS))]>;			(ins RFP32:$src1, RFP32:$src2, i8imm:$cond),
	// fcmov doesn't handle all possible EFLAGS, provide a fallback if there is no			"#CMOV_RFP32 PSEUDO!",
	// SSE2.			[(set RFP32:$dst,
	let Predicates = [FPStackf64] in			(X86cmov RFP32:$src1, RFP32:$src2, imm:$cond,
	def CMOV_RFP64 : I<0, Pseudo,			EFLAGS))]>;
	(outs RFP64:$dst),			// fcmov doesn't handle all possible EFLAGS, provide a fallback if there is no
	(ins RFP64:$src1, RFP64:$src2, i8imm:$cond),			// SSE2.
	"#CMOV_RFP64 PSEUDO!",			let Predicates = [FPStackf64] in
	[(set RFP64:$dst,			def CMOV_RFP64 : I<0, Pseudo,
	(X86cmov RFP64:$src1, RFP64:$src2, imm:$cond,			(outs RFP64:$dst),
	EFLAGS))]>;			(ins RFP64:$src1, RFP64:$src2, i8imm:$cond),
	def CMOV_RFP80 : I<0, Pseudo,			"#CMOV_RFP64 PSEUDO!",
	(outs RFP80:$dst),			[(set RFP64:$dst,
	(ins RFP80:$src1, RFP80:$src2, i8imm:$cond),			(X86cmov RFP64:$src1, RFP64:$src2, imm:$cond,
	"#CMOV_RFP80 PSEUDO!",			EFLAGS))]>;
	[(set RFP80:$dst,			def CMOV_RFP80 : I<0, Pseudo,
	(X86cmov RFP80:$src1, RFP80:$src2, imm:$cond,			(outs RFP80:$dst),
	EFLAGS))]>;			(ins RFP80:$src1, RFP80:$src2, i8imm:$cond),
	} // UsesCustomInserter = 1, Uses = [EFLAGS]			"#CMOV_RFP80 PSEUDO!",
				[(set RFP80:$dst,
				(X86cmov RFP80:$src1, RFP80:$src2, imm:$cond,
	//===----------------------------------------------------------------------===//			EFLAGS))]>;
	// Normal-Instructions-With-Lock-Prefix Pseudo Instructions			} // UsesCustomInserter = 1, Uses = [EFLAGS]
	//===----------------------------------------------------------------------===//

	// FIXME: Use normal instructions and add lock prefix dynamically.			//===----------------------------------------------------------------------===//
				// Normal-Instructions-With-Lock-Prefix Pseudo Instructions
	// Memory barriers			//===----------------------------------------------------------------------===//

	// TODO: Get this to fold the constant into the instruction.			// FIXME: Use normal instructions and add lock prefix dynamically.
	let isCodeGenOnly = 1, Defs = [EFLAGS] in
	def OR32mrLocked : I<0x09, MRMDestMem, (outs), (ins i32mem:$dst, GR32:$zero),			// Memory barriers
	"or{l}\t{$zero, $dst|$dst, $zero}",
	[], IIC_ALU_MEM>, Requires<[Not64BitMode]>, LOCK,			// TODO: Get this to fold the constant into the instruction.
	Sched<[WriteALULd, WriteRMW]>;			let isCodeGenOnly = 1, Defs = [EFLAGS] in
				def OR32mrLocked : I<0x09, MRMDestMem, (outs), (ins i32mem:$dst, GR32:$zero),
	let hasSideEffects = 1 in			"or{l}\t{$zero, $dst|$dst, $zero}",
	def Int_MemBarrier : I<0, Pseudo, (outs), (ins),			[], IIC_ALU_MEM>, Requires<[Not64BitMode]>, LOCK,
	"#MEMBARRIER",			Sched<[WriteALULd, WriteRMW]>;
	[(X86MemBarrier)]>, Sched<[WriteLoad]>;
				let hasSideEffects = 1 in
	// RegOpc corresponds to the mr version of the instruction			def Int_MemBarrier : I<0, Pseudo, (outs), (ins),
	// ImmOpc corresponds to the mi version of the instruction			"#MEMBARRIER",
	// ImmOpc8 corresponds to the mi8 version of the instruction			[(X86MemBarrier)]>, Sched<[WriteLoad]>;
	// ImmMod corresponds to the instruction format of the mi and mi8 versions
	multiclass LOCK_ArithBinOp<bits<8> RegOpc, bits<8> ImmOpc, bits<8> ImmOpc8,			// RegOpc corresponds to the mr version of the instruction
	Format ImmMod, string mnemonic> {			// ImmOpc corresponds to the mi version of the instruction
	let Defs = [EFLAGS], mayLoad = 1, mayStore = 1, isCodeGenOnly = 1,			// ImmOpc8 corresponds to the mi8 version of the instruction
	SchedRW = [WriteALULd, WriteRMW] in {			// ImmMod corresponds to the instruction format of the mi and mi8 versions
				multiclass LOCK_ArithBinOp<bits<8> RegOpc, bits<8> ImmOpc, bits<8> ImmOpc8,
	def NAME#8mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},			Format ImmMod, string mnemonic> {
	RegOpc{3}, RegOpc{2}, RegOpc{1}, 0 },			let Defs = [EFLAGS], mayLoad = 1, mayStore = 1, isCodeGenOnly = 1,
	MRMDestMem, (outs), (ins i8mem:$dst, GR8:$src2),			SchedRW = [WriteALULd, WriteRMW] in {
	!strconcat(mnemonic, "{b}\t",
	"{$src2, $dst|$dst, $src2}"),			def NAME#8mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
	[], IIC_ALU_NONMEM>, LOCK;			RegOpc{3}, RegOpc{2}, RegOpc{1}, 0 },
	def NAME#16mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},			MRMDestMem, (outs), (ins i8mem:$dst, GR8:$src2),
	RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },			!strconcat(mnemonic, "{b}\t",
	MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src2),			"{$src2, $dst|$dst, $src2}"),
	!strconcat(mnemonic, "{w}\t",			[], IIC_ALU_NONMEM>, LOCK;
	"{$src2, $dst|$dst, $src2}"),			def NAME#16mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
	[], IIC_ALU_NONMEM>, OpSize16, LOCK;			RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
	def NAME#32mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},			MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src2),
	RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },			!strconcat(mnemonic, "{w}\t",
	MRMDestMem, (outs), (ins i32mem:$dst, GR32:$src2),			"{$src2, $dst|$dst, $src2}"),
	!strconcat(mnemonic, "{l}\t",			[], IIC_ALU_NONMEM>, OpSize16, LOCK;
	"{$src2, $dst|$dst, $src2}"),			def NAME#32mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
	[], IIC_ALU_NONMEM>, OpSize32, LOCK;			RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
	def NAME#64mr : RI<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},			MRMDestMem, (outs), (ins i32mem:$dst, GR32:$src2),
	RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },			!strconcat(mnemonic, "{l}\t",
	MRMDestMem, (outs), (ins i64mem:$dst, GR64:$src2),			"{$src2, $dst|$dst, $src2}"),
	!strconcat(mnemonic, "{q}\t",			[], IIC_ALU_NONMEM>, OpSize32, LOCK;
	"{$src2, $dst|$dst, $src2}"),			def NAME#64mr : RI<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
	[], IIC_ALU_NONMEM>, LOCK;			RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
				MRMDestMem, (outs), (ins i64mem:$dst, GR64:$src2),
	def NAME#8mi : Ii8<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},			!strconcat(mnemonic, "{q}\t",
	ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 0 },			"{$src2, $dst|$dst, $src2}"),
	ImmMod, (outs), (ins i8mem :$dst, i8imm :$src2),			[], IIC_ALU_NONMEM>, LOCK;
	!strconcat(mnemonic, "{b}\t",
	"{$src2, $dst|$dst, $src2}"),			def NAME#8mi : Ii8<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
	[], IIC_ALU_MEM>, LOCK;			ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 0 },
				ImmMod, (outs), (ins i8mem :$dst, i8imm :$src2),
	def NAME#16mi : Ii16<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},			!strconcat(mnemonic, "{b}\t",
	ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 1 },			"{$src2, $dst|$dst, $src2}"),
	ImmMod, (outs), (ins i16mem :$dst, i16imm :$src2),			[], IIC_ALU_MEM>, LOCK;
	!strconcat(mnemonic, "{w}\t",
	"{$src2, $dst|$dst, $src2}"),			def NAME#16mi : Ii16<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
	[], IIC_ALU_MEM>, OpSize16, LOCK;			ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 1 },
				ImmMod, (outs), (ins i16mem :$dst, i16imm :$src2),
	def NAME#32mi : Ii32<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},			!strconcat(mnemonic, "{w}\t",
	ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 1 },			"{$src2, $dst|$dst, $src2}"),
	ImmMod, (outs), (ins i32mem :$dst, i32imm :$src2),			[], IIC_ALU_MEM>, OpSize16, LOCK;
	!strconcat(mnemonic, "{l}\t",
	"{$src2, $dst|$dst, $src2}"),			def NAME#32mi : Ii32<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
	[], IIC_ALU_MEM>, OpSize32, LOCK;			ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 1 },
				ImmMod, (outs), (ins i32mem :$dst, i32imm :$src2),
	def NAME#64mi32 : RIi32S<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},			!strconcat(mnemonic, "{l}\t",
	ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 1 },			"{$src2, $dst|$dst, $src2}"),
	ImmMod, (outs), (ins i64mem :$dst, i64i32imm :$src2),			[], IIC_ALU_MEM>, OpSize32, LOCK;
	!strconcat(mnemonic, "{q}\t",
	"{$src2, $dst|$dst, $src2}"),			def NAME#64mi32 : RIi32S<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
	[], IIC_ALU_MEM>, LOCK;			ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 1 },
				ImmMod, (outs), (ins i64mem :$dst, i64i32imm :$src2),
	def NAME#16mi8 : Ii8<{ImmOpc8{7}, ImmOpc8{6}, ImmOpc8{5}, ImmOpc8{4},			!strconcat(mnemonic, "{q}\t",
	ImmOpc8{3}, ImmOpc8{2}, ImmOpc8{1}, 1 },			"{$src2, $dst|$dst, $src2}"),
	ImmMod, (outs), (ins i16mem :$dst, i16i8imm :$src2),			[], IIC_ALU_MEM>, LOCK;
	!strconcat(mnemonic, "{w}\t",
	"{$src2, $dst|$dst, $src2}"),			def NAME#16mi8 : Ii8<{ImmOpc8{7}, ImmOpc8{6}, ImmOpc8{5}, ImmOpc8{4},
	[], IIC_ALU_MEM>, OpSize16, LOCK;			ImmOpc8{3}, ImmOpc8{2}, ImmOpc8{1}, 1 },
	def NAME#32mi8 : Ii8<{ImmOpc8{7}, ImmOpc8{6}, ImmOpc8{5}, ImmOpc8{4},			ImmMod, (outs), (ins i16mem :$dst, i16i8imm :$src2),
	ImmOpc8{3}, ImmOpc8{2}, ImmOpc8{1}, 1 },			!strconcat(mnemonic, "{w}\t",
	ImmMod, (outs), (ins i32mem :$dst, i32i8imm :$src2),			"{$src2, $dst|$dst, $src2}"),
	!strconcat(mnemonic, "{l}\t",			[], IIC_ALU_MEM>, OpSize16, LOCK;
	"{$src2, $dst|$dst, $src2}"),			def NAME#32mi8 : Ii8<{ImmOpc8{7}, ImmOpc8{6}, ImmOpc8{5}, ImmOpc8{4},
	[], IIC_ALU_MEM>, OpSize32, LOCK;			ImmOpc8{3}, ImmOpc8{2}, ImmOpc8{1}, 1 },
	def NAME#64mi8 : RIi8<{ImmOpc8{7}, ImmOpc8{6}, ImmOpc8{5}, ImmOpc8{4},			ImmMod, (outs), (ins i32mem :$dst, i32i8imm :$src2),
	ImmOpc8{3}, ImmOpc8{2}, ImmOpc8{1}, 1 },			!strconcat(mnemonic, "{l}\t",
	ImmMod, (outs), (ins i64mem :$dst, i64i8imm :$src2),			"{$src2, $dst|$dst, $src2}"),
	!strconcat(mnemonic, "{q}\t",			[], IIC_ALU_MEM>, OpSize32, LOCK;
	"{$src2, $dst|$dst, $src2}"),			def NAME#64mi8 : RIi8<{ImmOpc8{7}, ImmOpc8{6}, ImmOpc8{5}, ImmOpc8{4},
	[], IIC_ALU_MEM>, LOCK;			ImmOpc8{3}, ImmOpc8{2}, ImmOpc8{1}, 1 },
				ImmMod, (outs), (ins i64mem :$dst, i64i8imm :$src2),
	}			!strconcat(mnemonic, "{q}\t",
				"{$src2, $dst|$dst, $src2}"),
	}			[], IIC_ALU_MEM>, LOCK;

	defm LOCK_ADD : LOCK_ArithBinOp<0x00, 0x80, 0x83, MRM0m, "add">;			}
	defm LOCK_SUB : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, "sub">;
	defm LOCK_OR : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, "or">;			}
	defm LOCK_AND : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, "and">;
	defm LOCK_XOR : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, "xor">;			defm LOCK_ADD : LOCK_ArithBinOp<0x00, 0x80, 0x83, MRM0m, "add">;
				defm LOCK_SUB : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, "sub">;
	// Optimized codegen when the non-memory output is not used.			defm LOCK_OR : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, "or">;
	multiclass LOCK_ArithUnOp<bits<8> Opc8, bits<8> Opc, Format Form,			defm LOCK_AND : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, "and">;
	string mnemonic> {			defm LOCK_XOR : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, "xor">;
	let Defs = [EFLAGS], mayLoad = 1, mayStore = 1, isCodeGenOnly = 1,
	SchedRW = [WriteALULd, WriteRMW] in {			// Optimized codegen when the non-memory output is not used.
				multiclass LOCK_ArithUnOp<bits<8> Opc8, bits<8> Opc, Format Form,
	def NAME#8m : I<Opc8, Form, (outs), (ins i8mem :$dst),			string mnemonic> {
	!strconcat(mnemonic, "{b}\t$dst"),			let Defs = [EFLAGS], mayLoad = 1, mayStore = 1, isCodeGenOnly = 1,
	[], IIC_UNARY_MEM>, LOCK;			SchedRW = [WriteALULd, WriteRMW] in {
	def NAME#16m : I<Opc, Form, (outs), (ins i16mem:$dst),
	!strconcat(mnemonic, "{w}\t$dst"),			def NAME#8m : I<Opc8, Form, (outs), (ins i8mem :$dst),
	[], IIC_UNARY_MEM>, OpSize16, LOCK;			!strconcat(mnemonic, "{b}\t$dst"),
	def NAME#32m : I<Opc, Form, (outs), (ins i32mem:$dst),			[], IIC_UNARY_MEM>, LOCK;
	!strconcat(mnemonic, "{l}\t$dst"),			def NAME#16m : I<Opc, Form, (outs), (ins i16mem:$dst),
	[], IIC_UNARY_MEM>, OpSize32, LOCK;			!strconcat(mnemonic, "{w}\t$dst"),
	def NAME#64m : RI<Opc, Form, (outs), (ins i64mem:$dst),			[], IIC_UNARY_MEM>, OpSize16, LOCK;
	!strconcat(mnemonic, "{q}\t$dst"),			def NAME#32m : I<Opc, Form, (outs), (ins i32mem:$dst),
	[], IIC_UNARY_MEM>, LOCK;			!strconcat(mnemonic, "{l}\t$dst"),
	}			[], IIC_UNARY_MEM>, OpSize32, LOCK;
	}			def NAME#64m : RI<Opc, Form, (outs), (ins i64mem:$dst),
				!strconcat(mnemonic, "{q}\t$dst"),
	defm LOCK_INC : LOCK_ArithUnOp<0xFE, 0xFF, MRM0m, "inc">;			[], IIC_UNARY_MEM>, LOCK;
	defm LOCK_DEC : LOCK_ArithUnOp<0xFE, 0xFF, MRM1m, "dec">;			}
				}
	// Atomic compare and swap.
	multiclass LCMPXCHG_UnOp<bits<8> Opc, Format Form, string mnemonic,			defm LOCK_INC : LOCK_ArithUnOp<0xFE, 0xFF, MRM0m, "inc">;
	SDPatternOperator frag, X86MemOperand x86memop,			defm LOCK_DEC : LOCK_ArithUnOp<0xFE, 0xFF, MRM1m, "dec">;
	InstrItinClass itin> {
	let isCodeGenOnly = 1 in {			// Atomic compare and swap.
	def NAME : I<Opc, Form, (outs), (ins x86memop:$ptr),			multiclass LCMPXCHG_UnOp<bits<8> Opc, Format Form, string mnemonic,
	!strconcat(mnemonic, "\t$ptr"),			SDPatternOperator frag, X86MemOperand x86memop,
	[(frag addr:$ptr)], itin>, TB, LOCK;			InstrItinClass itin> {
	}			let isCodeGenOnly = 1 in {
	}			def NAME : I<Opc, Form, (outs), (ins x86memop:$ptr),
				!strconcat(mnemonic, "\t$ptr"),
	multiclass LCMPXCHG_BinOp<bits<8> Opc8, bits<8> Opc, Format Form,			[(frag addr:$ptr)], itin>, TB, LOCK;
	string mnemonic, SDPatternOperator frag,			}
	InstrItinClass itin8, InstrItinClass itin> {			}
	let isCodeGenOnly = 1, SchedRW = [WriteALULd, WriteRMW] in {
	let Defs = [AL, EFLAGS], Uses = [AL] in			multiclass LCMPXCHG_BinOp<bits<8> Opc8, bits<8> Opc, Format Form,
	def NAME#8 : I<Opc8, Form, (outs), (ins i8mem:$ptr, GR8:$swap),			string mnemonic, SDPatternOperator frag,
	!strconcat(mnemonic, "{b}\t{$swap, $ptr|$ptr, $swap}"),			InstrItinClass itin8, InstrItinClass itin> {
	[(frag addr:$ptr, GR8:$swap, 1)], itin8>, TB, LOCK;			let isCodeGenOnly = 1, SchedRW = [WriteALULd, WriteRMW] in {
	let Defs = [AX, EFLAGS], Uses = [AX] in			let Defs = [AL, EFLAGS], Uses = [AL] in
	def NAME#16 : I<Opc, Form, (outs), (ins i16mem:$ptr, GR16:$swap),			def NAME#8 : I<Opc8, Form, (outs), (ins i8mem:$ptr, GR8:$swap),
	!strconcat(mnemonic, "{w}\t{$swap, $ptr|$ptr, $swap}"),			!strconcat(mnemonic, "{b}\t{$swap, $ptr|$ptr, $swap}"),
	[(frag addr:$ptr, GR16:$swap, 2)], itin>, TB, OpSize16, LOCK;			[(frag addr:$ptr, GR8:$swap, 1)], itin8>, TB, LOCK;
	let Defs = [EAX, EFLAGS], Uses = [EAX] in			let Defs = [AX, EFLAGS], Uses = [AX] in
	def NAME#32 : I<Opc, Form, (outs), (ins i32mem:$ptr, GR32:$swap),			def NAME#16 : I<Opc, Form, (outs), (ins i16mem:$ptr, GR16:$swap),
	!strconcat(mnemonic, "{l}\t{$swap, $ptr|$ptr, $swap}"),			!strconcat(mnemonic, "{w}\t{$swap, $ptr|$ptr, $swap}"),
	[(frag addr:$ptr, GR32:$swap, 4)], itin>, TB, OpSize32, LOCK;			[(frag addr:$ptr, GR16:$swap, 2)], itin>, TB, OpSize16, LOCK;
	let Defs = [RAX, EFLAGS], Uses = [RAX] in			let Defs = [EAX, EFLAGS], Uses = [EAX] in
	def NAME#64 : RI<Opc, Form, (outs), (ins i64mem:$ptr, GR64:$swap),			def NAME#32 : I<Opc, Form, (outs), (ins i32mem:$ptr, GR32:$swap),
	!strconcat(mnemonic, "{q}\t{$swap, $ptr|$ptr, $swap}"),			!strconcat(mnemonic, "{l}\t{$swap, $ptr|$ptr, $swap}"),
	[(frag addr:$ptr, GR64:$swap, 8)], itin>, TB, LOCK;			[(frag addr:$ptr, GR32:$swap, 4)], itin>, TB, OpSize32, LOCK;
	}			let Defs = [RAX, EFLAGS], Uses = [RAX] in
	}			def NAME#64 : RI<Opc, Form, (outs), (ins i64mem:$ptr, GR64:$swap),
				!strconcat(mnemonic, "{q}\t{$swap, $ptr|$ptr, $swap}"),
	let Defs = [EAX, EDX, EFLAGS], Uses = [EAX, EBX, ECX, EDX],			[(frag addr:$ptr, GR64:$swap, 8)], itin>, TB, LOCK;
	SchedRW = [WriteALULd, WriteRMW] in {			}
	defm LCMPXCHG8B : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg8b",			}
	X86cas8, i64mem,
	IIC_CMPX_LOCK_8B>;			let Defs = [EAX, EDX, EFLAGS], Uses = [EAX, EBX, ECX, EDX],
	}			SchedRW = [WriteALULd, WriteRMW] in {
				defm LCMPXCHG8B : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg8b",
	let Defs = [RAX, RDX, EFLAGS], Uses = [RAX, RBX, RCX, RDX],			X86cas8, i64mem,
	Predicates = [HasCmpxchg16b], SchedRW = [WriteALULd, WriteRMW] in {			IIC_CMPX_LOCK_8B>;
	defm LCMPXCHG16B : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg16b",			}
	X86cas16, i128mem,
	IIC_CMPX_LOCK_16B>, REX_W;			let Defs = [RAX, RDX, EFLAGS], Uses = [RAX, RBX, RCX, RDX],
	}			Predicates = [HasCmpxchg16b], SchedRW = [WriteALULd, WriteRMW] in {
				defm LCMPXCHG16B : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg16b",
	defm LCMPXCHG : LCMPXCHG_BinOp<0xB0, 0xB1, MRMDestMem, "cmpxchg",			X86cas16, i128mem,
	X86cas, IIC_CMPX_LOCK_8, IIC_CMPX_LOCK>;			IIC_CMPX_LOCK_16B>, REX_W;
				}
	// Atomic exchange and add
	multiclass ATOMIC_LOAD_BINOP<bits<8> opc8, bits<8> opc, string mnemonic,			defm LCMPXCHG : LCMPXCHG_BinOp<0xB0, 0xB1, MRMDestMem, "cmpxchg",
	string frag,			X86cas, IIC_CMPX_LOCK_8, IIC_CMPX_LOCK>;
	InstrItinClass itin8, InstrItinClass itin> {
	let Constraints = "$val = $dst", Defs = [EFLAGS], isCodeGenOnly = 1,			// Atomic exchange and add
	SchedRW = [WriteALULd, WriteRMW] in {			multiclass ATOMIC_LOAD_BINOP<bits<8> opc8, bits<8> opc, string mnemonic,
	def NAME#8 : I<opc8, MRMSrcMem, (outs GR8:$dst),			string frag,
	(ins GR8:$val, i8mem:$ptr),			InstrItinClass itin8, InstrItinClass itin> {
	!strconcat(mnemonic, "{b}\t{$val, $ptr|$ptr, $val}"),			let Constraints = "$val = $dst", Defs = [EFLAGS], isCodeGenOnly = 1,
	[(set GR8:$dst,			SchedRW = [WriteALULd, WriteRMW] in {
	(!cast<PatFrag>(frag # "_8") addr:$ptr, GR8:$val))],			def NAME#8 : I<opc8, MRMSrcMem, (outs GR8:$dst),
	itin8>;			(ins GR8:$val, i8mem:$ptr),
	def NAME#16 : I<opc, MRMSrcMem, (outs GR16:$dst),			!strconcat(mnemonic, "{b}\t{$val, $ptr|$ptr, $val}"),
	(ins GR16:$val, i16mem:$ptr),			[(set GR8:$dst,
	!strconcat(mnemonic, "{w}\t{$val, $ptr|$ptr, $val}"),			(!cast<PatFrag>(frag # "_8") addr:$ptr, GR8:$val))],
	[(set			itin8>;
	GR16:$dst,			def NAME#16 : I<opc, MRMSrcMem, (outs GR16:$dst),
	(!cast<PatFrag>(frag # "_16") addr:$ptr, GR16:$val))],			(ins GR16:$val, i16mem:$ptr),
	itin>, OpSize16;			!strconcat(mnemonic, "{w}\t{$val, $ptr|$ptr, $val}"),
	def NAME#32 : I<opc, MRMSrcMem, (outs GR32:$dst),			[(set
	(ins GR32:$val, i32mem:$ptr),			GR16:$dst,
	!strconcat(mnemonic, "{l}\t{$val, $ptr|$ptr, $val}"),			(!cast<PatFrag>(frag # "_16") addr:$ptr, GR16:$val))],
	[(set			itin>, OpSize16;
	GR32:$dst,			def NAME#32 : I<opc, MRMSrcMem, (outs GR32:$dst),
	(!cast<PatFrag>(frag # "_32") addr:$ptr, GR32:$val))],			(ins GR32:$val, i32mem:$ptr),
	itin>, OpSize32;			!strconcat(mnemonic, "{l}\t{$val, $ptr|$ptr, $val}"),
	def NAME#64 : RI<opc, MRMSrcMem, (outs GR64:$dst),			[(set
	(ins GR64:$val, i64mem:$ptr),			GR32:$dst,
	!strconcat(mnemonic, "{q}\t{$val, $ptr|$ptr, $val}"),			(!cast<PatFrag>(frag # "_32") addr:$ptr, GR32:$val))],
	[(set			itin>, OpSize32;
	GR64:$dst,			def NAME#64 : RI<opc, MRMSrcMem, (outs GR64:$dst),
	(!cast<PatFrag>(frag # "_64") addr:$ptr, GR64:$val))],			(ins GR64:$val, i64mem:$ptr),
	itin>;			!strconcat(mnemonic, "{q}\t{$val, $ptr|$ptr, $val}"),
	}			[(set
	}			GR64:$dst,
				(!cast<PatFrag>(frag # "_64") addr:$ptr, GR64:$val))],
	defm LXADD : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add",			itin>;
	IIC_XADD_LOCK_MEM8, IIC_XADD_LOCK_MEM>,			}
	TB, LOCK;			}

	/* The following multiclass tries to make sure that in code like			defm LXADD : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add",
	* x.store (immediate op x.load(acquire), release)			IIC_XADD_LOCK_MEM8, IIC_XADD_LOCK_MEM>,
	* an operation directly on memory is generated instead of wasting a register.			TB, LOCK;
	* It is not automatic as atomic_store/load are only lowered to MOV instructions
	* extremely late to prevent them from being accidentally reordered in the backend			/* The following multiclass tries to make sure that in code like
	* (see below the RELEASE_MOV* / ACQUIRE_MOV* pseudo-instructions)			* x.store (immediate op x.load(acquire), release)
	*/			* an operation directly on memory is generated instead of wasting a register.
	multiclass RELEASE_BINOP_MI<string op> {			* It is not automatic as atomic_store/load are only lowered to MOV instructions
	def NAME#8mi : I<0, Pseudo, (outs), (ins i8mem:$dst, i8imm:$src),			* extremely late to prevent them from being accidentally reordered in the backend
	"#RELEASE_BINOP PSEUDO!",			* (see below the RELEASE_MOV* / ACQUIRE_MOV* pseudo-instructions)
	[(atomic_store_8 addr:$dst, (!cast<PatFrag>(op)			*/
	(atomic_load_8 addr:$dst), (i8 imm:$src)))]>;			multiclass RELEASE_BINOP_MI<string op> {
	// NAME#16 is not generated as 16-bit arithmetic instructions are considered			def NAME#8mi : I<0, Pseudo, (outs), (ins i8mem:$dst, i8imm:$src),
	// costly and avoided as far as possible by this backend anyway			"#RELEASE_BINOP PSEUDO!",
	def NAME#32mi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm:$src),			[(atomic_store_8 addr:$dst, (!cast<PatFrag>(op)
	"#RELEASE_BINOP PSEUDO!",			(atomic_load_8 addr:$dst), (i8 imm:$src)))]>;
	[(atomic_store_32 addr:$dst, (!cast<PatFrag>(op)			// NAME#16 is not generated as 16-bit arithmetic instructions are considered
	(atomic_load_32 addr:$dst), (i32 imm:$src)))]>;			// costly and avoided as far as possible by this backend anyway
	def NAME#64mi32 : I<0, Pseudo, (outs), (ins i64mem:$dst, i64i32imm:$src),			def NAME#32mi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm:$src),
	"#RELEASE_BINOP PSEUDO!",			"#RELEASE_BINOP PSEUDO!",
	[(atomic_store_64 addr:$dst, (!cast<PatFrag>(op)			[(atomic_store_32 addr:$dst, (!cast<PatFrag>(op)
	(atomic_load_64 addr:$dst), (i64immSExt32:$src)))]>;			(atomic_load_32 addr:$dst), (i32 imm:$src)))]>;
	}			def NAME#64mi32 : I<0, Pseudo, (outs), (ins i64mem:$dst, i64i32imm:$src),
	defm RELEASE_ADD : RELEASE_BINOP_MI<"add">;			"#RELEASE_BINOP PSEUDO!",
	defm RELEASE_AND : RELEASE_BINOP_MI<"and">;			[(atomic_store_64 addr:$dst, (!cast<PatFrag>(op)
	defm RELEASE_OR : RELEASE_BINOP_MI<"or">;			(atomic_load_64 addr:$dst), (i64immSExt32:$src)))]>;
	defm RELEASE_XOR : RELEASE_BINOP_MI<"xor">;			}
	// Note: we don't deal with sub, because subtractions of constants are			defm RELEASE_ADD : RELEASE_BINOP_MI<"add">;
	// optimized into additions before this code can run			defm RELEASE_AND : RELEASE_BINOP_MI<"and">;
				defm RELEASE_OR : RELEASE_BINOP_MI<"or">;
	multiclass RELEASE_UNOP<dag dag8, dag dag16, dag dag32, dag dag64> {			defm RELEASE_XOR : RELEASE_BINOP_MI<"xor">;
	def NAME#8m : I<0, Pseudo, (outs), (ins i8mem:$dst),			// Note: we don't deal with sub, because subtractions of constants are
	"#RELEASE_UNOP PSEUDO!",			// optimized into additions before this code can run
	[(atomic_store_8 addr:$dst, dag8)]>;
	def NAME#16m : I<0, Pseudo, (outs), (ins i16mem:$dst),			multiclass RELEASE_UNOP<dag dag8, dag dag16, dag dag32, dag dag64> {
	"#RELEASE_UNOP PSEUDO!",			def NAME#8m : I<0, Pseudo, (outs), (ins i8mem:$dst),
	[(atomic_store_16 addr:$dst, dag16)]>;			"#RELEASE_UNOP PSEUDO!",
	def NAME#32m : I<0, Pseudo, (outs), (ins i32mem:$dst),			[(atomic_store_8 addr:$dst, dag8)]>;
	"#RELEASE_UNOP PSEUDO!",			def NAME#16m : I<0, Pseudo, (outs), (ins i16mem:$dst),
	[(atomic_store_32 addr:$dst, dag32)]>;			"#RELEASE_UNOP PSEUDO!",
	def NAME#64m : I<0, Pseudo, (outs), (ins i64mem:$dst),			[(atomic_store_16 addr:$dst, dag16)]>;
	"#RELEASE_UNOP PSEUDO!",			def NAME#32m : I<0, Pseudo, (outs), (ins i32mem:$dst),
	[(atomic_store_64 addr:$dst, dag64)]>;			"#RELEASE_UNOP PSEUDO!",
	}			[(atomic_store_32 addr:$dst, dag32)]>;
				def NAME#64m : I<0, Pseudo, (outs), (ins i64mem:$dst),
	defm RELEASE_INC : RELEASE_UNOP<			"#RELEASE_UNOP PSEUDO!",
	(add (atomic_load_8 addr:$dst), (i8 1)),			[(atomic_store_64 addr:$dst, dag64)]>;
	(add (atomic_load_16 addr:$dst), (i16 1)),			}
	(add (atomic_load_32 addr:$dst), (i32 1)),
	(add (atomic_load_64 addr:$dst), (i64 1))>, Requires<[NotSlowIncDec]>;			defm RELEASE_INC : RELEASE_UNOP<
	defm RELEASE_DEC : RELEASE_UNOP<			(add (atomic_load_8 addr:$dst), (i8 1)),
	(add (atomic_load_8 addr:$dst), (i8 -1)),			(add (atomic_load_16 addr:$dst), (i16 1)),
	(add (atomic_load_16 addr:$dst), (i16 -1)),			(add (atomic_load_32 addr:$dst), (i32 1)),
	(add (atomic_load_32 addr:$dst), (i32 -1)),			(add (atomic_load_64 addr:$dst), (i64 1))>, Requires<[NotSlowIncDec]>;
	(add (atomic_load_64 addr:$dst), (i64 -1))>, Requires<[NotSlowIncDec]>;			defm RELEASE_DEC : RELEASE_UNOP<
	/*			(add (atomic_load_8 addr:$dst), (i8 -1)),
	TODO: These don't work because the type inference of TableGen fails.			(add (atomic_load_16 addr:$dst), (i16 -1)),
	TODO: find a way to fix it.			(add (atomic_load_32 addr:$dst), (i32 -1)),
	defm RELEASE_NEG : RELEASE_UNOP<			(add (atomic_load_64 addr:$dst), (i64 -1))>, Requires<[NotSlowIncDec]>;
	(ineg (atomic_load_8 addr:$dst)),			/*
	(ineg (atomic_load_16 addr:$dst)),			TODO: These don't work because the type inference of TableGen fails.
	(ineg (atomic_load_32 addr:$dst)),			TODO: find a way to fix it.
	(ineg (atomic_load_64 addr:$dst))>;			defm RELEASE_NEG : RELEASE_UNOP<
	defm RELEASE_NOT : RELEASE_UNOP<			(ineg (atomic_load_8 addr:$dst)),
	(not (atomic_load_8 addr:$dst)),			(ineg (atomic_load_16 addr:$dst)),
	(not (atomic_load_16 addr:$dst)),			(ineg (atomic_load_32 addr:$dst)),
	(not (atomic_load_32 addr:$dst)),			(ineg (atomic_load_64 addr:$dst))>;
	(not (atomic_load_64 addr:$dst))>;			defm RELEASE_NOT : RELEASE_UNOP<
	*/			(not (atomic_load_8 addr:$dst)),
				(not (atomic_load_16 addr:$dst)),
	def RELEASE_MOV8mi : I<0, Pseudo, (outs), (ins i8mem:$dst, i8imm:$src),			(not (atomic_load_32 addr:$dst)),
	"#RELEASE_MOV PSEUDO !",			(not (atomic_load_64 addr:$dst))>;
	[(atomic_store_8 addr:$dst, (i8 imm:$src))]>;			*/
	def RELEASE_MOV16mi : I<0, Pseudo, (outs), (ins i16mem:$dst, i16imm:$src),
	"#RELEASE_MOV PSEUDO !",			def RELEASE_MOV8mi : I<0, Pseudo, (outs), (ins i8mem:$dst, i8imm:$src),
	[(atomic_store_16 addr:$dst, (i16 imm:$src))]>;			"#RELEASE_MOV PSEUDO !",
	def RELEASE_MOV32mi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm:$src),			[(atomic_store_8 addr:$dst, (i8 imm:$src))]>;
	"#RELEASE_MOV PSEUDO !",			def RELEASE_MOV16mi : I<0, Pseudo, (outs), (ins i16mem:$dst, i16imm:$src),
	[(atomic_store_32 addr:$dst, (i32 imm:$src))]>;			"#RELEASE_MOV PSEUDO !",
	def RELEASE_MOV64mi32 : I<0, Pseudo, (outs), (ins i64mem:$dst, i64i32imm:$src),			[(atomic_store_16 addr:$dst, (i16 imm:$src))]>;
	"#RELEASE_MOV PSEUDO !",			def RELEASE_MOV32mi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm:$src),
	[(atomic_store_64 addr:$dst, i64immSExt32:$src)]>;			"#RELEASE_MOV PSEUDO !",
				[(atomic_store_32 addr:$dst, (i32 imm:$src))]>;
	def RELEASE_MOV8mr : I<0, Pseudo, (outs), (ins i8mem :$dst, GR8 :$src),			def RELEASE_MOV64mi32 : I<0, Pseudo, (outs), (ins i64mem:$dst, i64i32imm:$src),
	"#RELEASE_MOV PSEUDO!",			"#RELEASE_MOV PSEUDO !",
	[(atomic_store_8 addr:$dst, GR8 :$src)]>;			[(atomic_store_64 addr:$dst, i64immSExt32:$src)]>;
	def RELEASE_MOV16mr : I<0, Pseudo, (outs), (ins i16mem:$dst, GR16:$src),
	"#RELEASE_MOV PSEUDO!",			def RELEASE_MOV8mr : I<0, Pseudo, (outs), (ins i8mem :$dst, GR8 :$src),
	[(atomic_store_16 addr:$dst, GR16:$src)]>;			"#RELEASE_MOV PSEUDO!",
	def RELEASE_MOV32mr : I<0, Pseudo, (outs), (ins i32mem:$dst, GR32:$src),			[(atomic_store_8 addr:$dst, GR8 :$src)]>;
	"#RELEASE_MOV PSEUDO!",			def RELEASE_MOV16mr : I<0, Pseudo, (outs), (ins i16mem:$dst, GR16:$src),
	[(atomic_store_32 addr:$dst, GR32:$src)]>;			"#RELEASE_MOV PSEUDO!",
	def RELEASE_MOV64mr : I<0, Pseudo, (outs), (ins i64mem:$dst, GR64:$src),			[(atomic_store_16 addr:$dst, GR16:$src)]>;
	"#RELEASE_MOV PSEUDO!",			def RELEASE_MOV32mr : I<0, Pseudo, (outs), (ins i32mem:$dst, GR32:$src),
	[(atomic_store_64 addr:$dst, GR64:$src)]>;			"#RELEASE_MOV PSEUDO!",
				[(atomic_store_32 addr:$dst, GR32:$src)]>;
	def ACQUIRE_MOV8rm : I<0, Pseudo, (outs GR8 :$dst), (ins i8mem :$src),			def RELEASE_MOV64mr : I<0, Pseudo, (outs), (ins i64mem:$dst, GR64:$src),
	"#ACQUIRE_MOV PSEUDO!",			"#RELEASE_MOV PSEUDO!",
	[(set GR8:$dst, (atomic_load_8 addr:$src))]>;			[(atomic_store_64 addr:$dst, GR64:$src)]>;
	def ACQUIRE_MOV16rm : I<0, Pseudo, (outs GR16:$dst), (ins i16mem:$src),
	"#ACQUIRE_MOV PSEUDO!",			def ACQUIRE_MOV8rm : I<0, Pseudo, (outs GR8 :$dst), (ins i8mem :$src),
	[(set GR16:$dst, (atomic_load_16 addr:$src))]>;			"#ACQUIRE_MOV PSEUDO!",
	def ACQUIRE_MOV32rm : I<0, Pseudo, (outs GR32:$dst), (ins i32mem:$src),			[(set GR8:$dst, (atomic_load_8 addr:$src))]>;
	"#ACQUIRE_MOV PSEUDO!",			def ACQUIRE_MOV16rm : I<0, Pseudo, (outs GR16:$dst), (ins i16mem:$src),
	[(set GR32:$dst, (atomic_load_32 addr:$src))]>;			"#ACQUIRE_MOV PSEUDO!",
	def ACQUIRE_MOV64rm : I<0, Pseudo, (outs GR64:$dst), (ins i64mem:$src),			[(set GR16:$dst, (atomic_load_16 addr:$src))]>;
	"#ACQUIRE_MOV PSEUDO!",			def ACQUIRE_MOV32rm : I<0, Pseudo, (outs GR32:$dst), (ins i32mem:$src),
	[(set GR64:$dst, (atomic_load_64 addr:$src))]>;			"#ACQUIRE_MOV PSEUDO!",
	//===----------------------------------------------------------------------===//			[(set GR32:$dst, (atomic_load_32 addr:$src))]>;
	// Conditional Move Pseudo Instructions.			def ACQUIRE_MOV64rm : I<0, Pseudo, (outs GR64:$dst), (ins i64mem:$src),
	//===----------------------------------------------------------------------===//			"#ACQUIRE_MOV PSEUDO!",
				[(set GR64:$dst, (atomic_load_64 addr:$src))]>;
	// CMOV* - Used to implement the SSE SELECT DAG operation. Expanded after			//===----------------------------------------------------------------------===//
	// instruction selection into a branch sequence.			// Conditional Move Pseudo Instructions.
	let Uses = [EFLAGS], usesCustomInserter = 1 in {			//===----------------------------------------------------------------------===//
	def CMOV_FR32 : I<0, Pseudo,
	(outs FR32:$dst), (ins FR32:$t, FR32:$f, i8imm:$cond),			// CMOV* - Used to implement the SSE SELECT DAG operation. Expanded after
	"#CMOV_FR32 PSEUDO!",			// instruction selection into a branch sequence.
	[(set FR32:$dst, (X86cmov FR32:$t, FR32:$f, imm:$cond,			let Uses = [EFLAGS], usesCustomInserter = 1 in {
	EFLAGS))]>;			def CMOV_FR32 : I<0, Pseudo,
	def CMOV_FR64 : I<0, Pseudo,			(outs FR32:$dst), (ins FR32:$t, FR32:$f, i8imm:$cond),
	(outs FR64:$dst), (ins FR64:$t, FR64:$f, i8imm:$cond),			"#CMOV_FR32 PSEUDO!",
	"#CMOV_FR64 PSEUDO!",			[(set FR32:$dst, (X86cmov FR32:$t, FR32:$f, imm:$cond,
	[(set FR64:$dst, (X86cmov FR64:$t, FR64:$f, imm:$cond,			EFLAGS))]>;
	EFLAGS))]>;			def CMOV_FR64 : I<0, Pseudo,
	def CMOV_V4F32 : I<0, Pseudo,			(outs FR64:$dst), (ins FR64:$t, FR64:$f, i8imm:$cond),
	(outs VR128:$dst), (ins VR128:$t, VR128:$f, i8imm:$cond),			"#CMOV_FR64 PSEUDO!",
	"#CMOV_V4F32 PSEUDO!",			[(set FR64:$dst, (X86cmov FR64:$t, FR64:$f, imm:$cond,
	[(set VR128:$dst,			EFLAGS))]>;
	(v4f32 (X86cmov VR128:$t, VR128:$f, imm:$cond,			def CMOV_V4F32 : I<0, Pseudo,
	EFLAGS)))]>;			(outs VR128:$dst), (ins VR128:$t, VR128:$f, i8imm:$cond),
	def CMOV_V2F64 : I<0, Pseudo,			"#CMOV_V4F32 PSEUDO!",
	(outs VR128:$dst), (ins VR128:$t, VR128:$f, i8imm:$cond),			[(set VR128:$dst,
	"#CMOV_V2F64 PSEUDO!",			(v4f32 (X86cmov VR128:$t, VR128:$f, imm:$cond,
	[(set VR128:$dst,			EFLAGS)))]>;
	(v2f64 (X86cmov VR128:$t, VR128:$f, imm:$cond,			def CMOV_V2F64 : I<0, Pseudo,
	EFLAGS)))]>;			(outs VR128:$dst), (ins VR128:$t, VR128:$f, i8imm:$cond),
	def CMOV_V2I64 : I<0, Pseudo,			"#CMOV_V2F64 PSEUDO!",
	(outs VR128:$dst), (ins VR128:$t, VR128:$f, i8imm:$cond),			[(set VR128:$dst,
	"#CMOV_V2I64 PSEUDO!",			(v2f64 (X86cmov VR128:$t, VR128:$f, imm:$cond,
	[(set VR128:$dst,			EFLAGS)))]>;
	(v2i64 (X86cmov VR128:$t, VR128:$f, imm:$cond,			def CMOV_V2I64 : I<0, Pseudo,
	EFLAGS)))]>;			(outs VR128:$dst), (ins VR128:$t, VR128:$f, i8imm:$cond),
	def CMOV_V8F32 : I<0, Pseudo,			"#CMOV_V2I64 PSEUDO!",
	(outs VR256:$dst), (ins VR256:$t, VR256:$f, i8imm:$cond),			[(set VR128:$dst,
	"#CMOV_V8F32 PSEUDO!",			(v2i64 (X86cmov VR128:$t, VR128:$f, imm:$cond,
	[(set VR256:$dst,			EFLAGS)))]>;
	(v8f32 (X86cmov VR256:$t, VR256:$f, imm:$cond,			def CMOV_V8F32 : I<0, Pseudo,
	EFLAGS)))]>;			(outs VR256:$dst), (ins VR256:$t, VR256:$f, i8imm:$cond),
	def CMOV_V4F64 : I<0, Pseudo,			"#CMOV_V8F32 PSEUDO!",
	(outs VR256:$dst), (ins VR256:$t, VR256:$f, i8imm:$cond),			[(set VR256:$dst,
	"#CMOV_V4F64 PSEUDO!",			(v8f32 (X86cmov VR256:$t, VR256:$f, imm:$cond,
	[(set VR256:$dst,			EFLAGS)))]>;
	(v4f64 (X86cmov VR256:$t, VR256:$f, imm:$cond,			def CMOV_V4F64 : I<0, Pseudo,
	EFLAGS)))]>;			(outs VR256:$dst), (ins VR256:$t, VR256:$f, i8imm:$cond),
	def CMOV_V4I64 : I<0, Pseudo,			"#CMOV_V4F64 PSEUDO!",
	(outs VR256:$dst), (ins VR256:$t, VR256:$f, i8imm:$cond),			[(set VR256:$dst,
	"#CMOV_V4I64 PSEUDO!",			(v4f64 (X86cmov VR256:$t, VR256:$f, imm:$cond,
	[(set VR256:$dst,			EFLAGS)))]>;
	(v4i64 (X86cmov VR256:$t, VR256:$f, imm:$cond,			def CMOV_V4I64 : I<0, Pseudo,
	EFLAGS)))]>;			(outs VR256:$dst), (ins VR256:$t, VR256:$f, i8imm:$cond),
	def CMOV_V8I64 : I<0, Pseudo,			"#CMOV_V4I64 PSEUDO!",
	(outs VR512:$dst), (ins VR512:$t, VR512:$f, i8imm:$cond),			[(set VR256:$dst,
	"#CMOV_V8I64 PSEUDO!",			(v4i64 (X86cmov VR256:$t, VR256:$f, imm:$cond,
	[(set VR512:$dst,			EFLAGS)))]>;
	(v8i64 (X86cmov VR512:$t, VR512:$f, imm:$cond,			def CMOV_V8I64 : I<0, Pseudo,
	EFLAGS)))]>;			(outs VR512:$dst), (ins VR512:$t, VR512:$f, i8imm:$cond),
	def CMOV_V8F64 : I<0, Pseudo,			"#CMOV_V8I64 PSEUDO!",
	(outs VR512:$dst), (ins VR512:$t, VR512:$f, i8imm:$cond),			[(set VR512:$dst,
	"#CMOV_V8F64 PSEUDO!",			(v8i64 (X86cmov VR512:$t, VR512:$f, imm:$cond,
	[(set VR512:$dst,			EFLAGS)))]>;
	(v8f64 (X86cmov VR512:$t, VR512:$f, imm:$cond,			def CMOV_V8F64 : I<0, Pseudo,
	EFLAGS)))]>;			(outs VR512:$dst), (ins VR512:$t, VR512:$f, i8imm:$cond),
	def CMOV_V16F32 : I<0, Pseudo,			"#CMOV_V8F64 PSEUDO!",
	(outs VR512:$dst), (ins VR512:$t, VR512:$f, i8imm:$cond),			[(set VR512:$dst,
	"#CMOV_V16F32 PSEUDO!",			(v8f64 (X86cmov VR512:$t, VR512:$f, imm:$cond,
	[(set VR512:$dst,			EFLAGS)))]>;
	(v16f32 (X86cmov VR512:$t, VR512:$f, imm:$cond,			def CMOV_V16F32 : I<0, Pseudo,
	EFLAGS)))]>;			(outs VR512:$dst), (ins VR512:$t, VR512:$f, i8imm:$cond),
	}			"#CMOV_V16F32 PSEUDO!",
				[(set VR512:$dst,
				(v16f32 (X86cmov VR512:$t, VR512:$f, imm:$cond,
	//===----------------------------------------------------------------------===//			EFLAGS)))]>;
	// DAG Pattern Matching Rules			}
	//===----------------------------------------------------------------------===//

	// ConstantPool GlobalAddress, ExternalSymbol, and JumpTable			//===----------------------------------------------------------------------===//
	def : Pat<(i32 (X86Wrapper tconstpool :$dst)), (MOV32ri tconstpool :$dst)>;			// DAG Pattern Matching Rules
	def : Pat<(i32 (X86Wrapper tjumptable :$dst)), (MOV32ri tjumptable :$dst)>;			//===----------------------------------------------------------------------===//
	def : Pat<(i32 (X86Wrapper tglobaltlsaddr:$dst)),(MOV32ri tglobaltlsaddr:$dst)>;
	def : Pat<(i32 (X86Wrapper tglobaladdr :$dst)), (MOV32ri tglobaladdr :$dst)>;			// ConstantPool GlobalAddress, ExternalSymbol, and JumpTable
	def : Pat<(i32 (X86Wrapper texternalsym:$dst)), (MOV32ri texternalsym:$dst)>;			def : Pat<(i32 (X86Wrapper tconstpool :$dst)), (MOV32ri tconstpool :$dst)>;
	def : Pat<(i32 (X86Wrapper tblockaddress:$dst)), (MOV32ri tblockaddress:$dst)>;			def : Pat<(i32 (X86Wrapper tjumptable :$dst)), (MOV32ri tjumptable :$dst)>;
				def : Pat<(i32 (X86Wrapper tglobaltlsaddr:$dst)),(MOV32ri tglobaltlsaddr:$dst)>;
	def : Pat<(add GR32:$src1, (X86Wrapper tconstpool:$src2)),			def : Pat<(i32 (X86Wrapper tglobaladdr :$dst)), (MOV32ri tglobaladdr :$dst)>;
	(ADD32ri GR32:$src1, tconstpool:$src2)>;			def : Pat<(i32 (X86Wrapper texternalsym:$dst)), (MOV32ri texternalsym:$dst)>;
	def : Pat<(add GR32:$src1, (X86Wrapper tjumptable:$src2)),			def : Pat<(i32 (X86Wrapper tblockaddress:$dst)), (MOV32ri tblockaddress:$dst)>;
	(ADD32ri GR32:$src1, tjumptable:$src2)>;
	def : Pat<(add GR32:$src1, (X86Wrapper tglobaladdr :$src2)),			def : Pat<(add GR32:$src1, (X86Wrapper tconstpool:$src2)),
	(ADD32ri GR32:$src1, tglobaladdr:$src2)>;			(ADD32ri GR32:$src1, tconstpool:$src2)>;
	def : Pat<(add GR32:$src1, (X86Wrapper texternalsym:$src2)),			def : Pat<(add GR32:$src1, (X86Wrapper tjumptable:$src2)),
	(ADD32ri GR32:$src1, texternalsym:$src2)>;			(ADD32ri GR32:$src1, tjumptable:$src2)>;
	def : Pat<(add GR32:$src1, (X86Wrapper tblockaddress:$src2)),			def : Pat<(add GR32:$src1, (X86Wrapper tglobaladdr :$src2)),
	(ADD32ri GR32:$src1, tblockaddress:$src2)>;			(ADD32ri GR32:$src1, tglobaladdr:$src2)>;
				def : Pat<(add GR32:$src1, (X86Wrapper texternalsym:$src2)),
	def : Pat<(store (i32 (X86Wrapper tglobaladdr:$src)), addr:$dst),			(ADD32ri GR32:$src1, texternalsym:$src2)>;
	(MOV32mi addr:$dst, tglobaladdr:$src)>;			def : Pat<(add GR32:$src1, (X86Wrapper tblockaddress:$src2)),
	def : Pat<(store (i32 (X86Wrapper texternalsym:$src)), addr:$dst),			(ADD32ri GR32:$src1, tblockaddress:$src2)>;
	(MOV32mi addr:$dst, texternalsym:$src)>;
	def : Pat<(store (i32 (X86Wrapper tblockaddress:$src)), addr:$dst),			def : Pat<(store (i32 (X86Wrapper tglobaladdr:$src)), addr:$dst),
	(MOV32mi addr:$dst, tblockaddress:$src)>;			(MOV32mi addr:$dst, tglobaladdr:$src)>;
				def : Pat<(store (i32 (X86Wrapper texternalsym:$src)), addr:$dst),
	// ConstantPool GlobalAddress, ExternalSymbol, and JumpTable when not in small			(MOV32mi addr:$dst, texternalsym:$src)>;
	// code model mode, should use 'movabs'. FIXME: This is really a hack, the			def : Pat<(store (i32 (X86Wrapper tblockaddress:$src)), addr:$dst),
	// 'movabs' predicate should handle this sort of thing.			(MOV32mi addr:$dst, tblockaddress:$src)>;
	def : Pat<(i64 (X86Wrapper tconstpool :$dst)),
	(MOV64ri tconstpool :$dst)>, Requires<[FarData]>;			// ConstantPool GlobalAddress, ExternalSymbol, and JumpTable when not in small
	def : Pat<(i64 (X86Wrapper tjumptable :$dst)),			// code model mode, should use 'movabs'. FIXME: This is really a hack, the
	(MOV64ri tjumptable :$dst)>, Requires<[FarData]>;			// 'movabs' predicate should handle this sort of thing.
	def : Pat<(i64 (X86Wrapper tglobaladdr :$dst)),			def : Pat<(i64 (X86Wrapper tconstpool :$dst)),
	(MOV64ri tglobaladdr :$dst)>, Requires<[FarData]>;			(MOV64ri tconstpool :$dst)>, Requires<[FarData]>;
	def : Pat<(i64 (X86Wrapper texternalsym:$dst)),			def : Pat<(i64 (X86Wrapper tjumptable :$dst)),
	(MOV64ri texternalsym:$dst)>, Requires<[FarData]>;			(MOV64ri tjumptable :$dst)>, Requires<[FarData]>;
	def : Pat<(i64 (X86Wrapper tblockaddress:$dst)),			def : Pat<(i64 (X86Wrapper tglobaladdr :$dst)),
	(MOV64ri tblockaddress:$dst)>, Requires<[FarData]>;			(MOV64ri tglobaladdr :$dst)>, Requires<[FarData]>;
				def : Pat<(i64 (X86Wrapper texternalsym:$dst)),
	// In kernel code model, we can get the address of a label			(MOV64ri texternalsym:$dst)>, Requires<[FarData]>;
	// into a register with 'movq'. FIXME: This is a hack, the 'imm' predicate of			def : Pat<(i64 (X86Wrapper tblockaddress:$dst)),
	// the MOV64ri32 should accept these.			(MOV64ri tblockaddress:$dst)>, Requires<[FarData]>;
	def : Pat<(i64 (X86Wrapper tconstpool :$dst)),
	(MOV64ri32 tconstpool :$dst)>, Requires<[KernelCode]>;			// In kernel code model, we can get the address of a label
	def : Pat<(i64 (X86Wrapper tjumptable :$dst)),			// into a register with 'movq'. FIXME: This is a hack, the 'imm' predicate of
	(MOV64ri32 tjumptable :$dst)>, Requires<[KernelCode]>;			// the MOV64ri32 should accept these.
	def : Pat<(i64 (X86Wrapper tglobaladdr :$dst)),			def : Pat<(i64 (X86Wrapper tconstpool :$dst)),
	(MOV64ri32 tglobaladdr :$dst)>, Requires<[KernelCode]>;			(MOV64ri32 tconstpool :$dst)>, Requires<[KernelCode]>;
	def : Pat<(i64 (X86Wrapper texternalsym:$dst)),			def : Pat<(i64 (X86Wrapper tjumptable :$dst)),
	(MOV64ri32 texternalsym:$dst)>, Requires<[KernelCode]>;			(MOV64ri32 tjumptable :$dst)>, Requires<[KernelCode]>;
	def : Pat<(i64 (X86Wrapper tblockaddress:$dst)),			def : Pat<(i64 (X86Wrapper tglobaladdr :$dst)),
	(MOV64ri32 tblockaddress:$dst)>, Requires<[KernelCode]>;			(MOV64ri32 tglobaladdr :$dst)>, Requires<[KernelCode]>;
				def : Pat<(i64 (X86Wrapper texternalsym:$dst)),
	// If we have small model and -static mode, it is safe to store global addresses			(MOV64ri32 texternalsym:$dst)>, Requires<[KernelCode]>;
	// directly as immediates. FIXME: This is really a hack, the 'imm' predicate			def : Pat<(i64 (X86Wrapper tblockaddress:$dst)),
	// for MOV64mi32 should handle this sort of thing.			(MOV64ri32 tblockaddress:$dst)>, Requires<[KernelCode]>;
	def : Pat<(store (i64 (X86Wrapper tconstpool:$src)), addr:$dst),
	(MOV64mi32 addr:$dst, tconstpool:$src)>,			// If we have small model and -static mode, it is safe to store global addresses
	Requires<[NearData, IsStatic]>;			// directly as immediates. FIXME: This is really a hack, the 'imm' predicate
	def : Pat<(store (i64 (X86Wrapper tjumptable:$src)), addr:$dst),			// for MOV64mi32 should handle this sort of thing.
	(MOV64mi32 addr:$dst, tjumptable:$src)>,			def : Pat<(store (i64 (X86Wrapper tconstpool:$src)), addr:$dst),
	Requires<[NearData, IsStatic]>;			(MOV64mi32 addr:$dst, tconstpool:$src)>,
	def : Pat<(store (i64 (X86Wrapper tglobaladdr:$src)), addr:$dst),			Requires<[NearData, IsStatic]>;
	(MOV64mi32 addr:$dst, tglobaladdr:$src)>,			def : Pat<(store (i64 (X86Wrapper tjumptable:$src)), addr:$dst),
	Requires<[NearData, IsStatic]>;			(MOV64mi32 addr:$dst, tjumptable:$src)>,
	def : Pat<(store (i64 (X86Wrapper texternalsym:$src)), addr:$dst),			Requires<[NearData, IsStatic]>;
	(MOV64mi32 addr:$dst, texternalsym:$src)>,			def : Pat<(store (i64 (X86Wrapper tglobaladdr:$src)), addr:$dst),
	Requires<[NearData, IsStatic]>;			(MOV64mi32 addr:$dst, tglobaladdr:$src)>,
	def : Pat<(store (i64 (X86Wrapper tblockaddress:$src)), addr:$dst),			Requires<[NearData, IsStatic]>;
	(MOV64mi32 addr:$dst, tblockaddress:$src)>,			def : Pat<(store (i64 (X86Wrapper texternalsym:$src)), addr:$dst),
	Requires<[NearData, IsStatic]>;			(MOV64mi32 addr:$dst, texternalsym:$src)>,
				Requires<[NearData, IsStatic]>;
	def : Pat<(i32 (X86RecoverFrameAlloc texternalsym:$dst)), (MOV32ri texternalsym:$dst)>;			def : Pat<(store (i64 (X86Wrapper tblockaddress:$src)), addr:$dst),
	def : Pat<(i64 (X86RecoverFrameAlloc texternalsym:$dst)), (MOV64ri texternalsym:$dst)>;			(MOV64mi32 addr:$dst, tblockaddress:$src)>,
				Requires<[NearData, IsStatic]>;
	// Calls
				def : Pat<(i32 (X86RecoverFrameAlloc texternalsym:$dst)), (MOV32ri texternalsym:$dst)>;
	// tls has some funny stuff here...			def : Pat<(i64 (X86RecoverFrameAlloc texternalsym:$dst)), (MOV64ri texternalsym:$dst)>;
	// This corresponds to movabs $foo@tpoff, %rax
	def : Pat<(i64 (X86Wrapper tglobaltlsaddr :$dst)),			// Calls
	(MOV64ri32 tglobaltlsaddr :$dst)>;
	// This corresponds to add $foo@tpoff, %rax			// tls has some funny stuff here...
	def : Pat<(add GR64:$src1, (X86Wrapper tglobaltlsaddr :$dst)),			// This corresponds to movabs $foo@tpoff, %rax
	(ADD64ri32 GR64:$src1, tglobaltlsaddr :$dst)>;			def : Pat<(i64 (X86Wrapper tglobaltlsaddr :$dst)),
				(MOV64ri32 tglobaltlsaddr :$dst)>;
				// This corresponds to add $foo@tpoff, %rax
	// Direct PC relative function call for small code model. 32-bit displacement			def : Pat<(add GR64:$src1, (X86Wrapper tglobaltlsaddr :$dst)),
	// sign extended to 64-bit.			(ADD64ri32 GR64:$src1, tglobaltlsaddr :$dst)>;
	def : Pat<(X86call (i64 tglobaladdr:$dst)),
	(CALL64pcrel32 tglobaladdr:$dst)>;
	def : Pat<(X86call (i64 texternalsym:$dst)),			// Direct PC relative function call for small code model. 32-bit displacement
	(CALL64pcrel32 texternalsym:$dst)>;			// sign extended to 64-bit.
				def : Pat<(X86call (i64 tglobaladdr:$dst)),
	// Tailcall stuff. The TCRETURN instructions execute after the epilog, so they			(CALL64pcrel32 tglobaladdr:$dst)>;
	// can never use callee-saved registers. That is the purpose of the GR64_TC			def : Pat<(X86call (i64 texternalsym:$dst)),
	// register classes.			(CALL64pcrel32 texternalsym:$dst)>;
	//
	// The only volatile register that is never used by the calling convention is			// Tailcall stuff. The TCRETURN instructions execute after the epilog, so they
	// %r11. This happens when calling a vararg function with 6 arguments.			// can never use callee-saved registers. That is the purpose of the GR64_TC
	//			// register classes.
	// Match an X86tcret that uses less than 7 volatile registers.			//
	def X86tcret_6regs : PatFrag<(ops node:$ptr, node:$off),			// The only volatile register that is never used by the calling convention is
	(X86tcret node:$ptr, node:$off), [{			// %r11. This happens when calling a vararg function with 6 arguments.
	// X86tcret args: (*chain, ptr, imm, regs..., glue)			//
	unsigned NumRegs = 0;			// Match an X86tcret that uses less than 7 volatile registers.
	for (unsigned i = 3, e = N->getNumOperands(); i != e; ++i)			def X86tcret_6regs : PatFrag<(ops node:$ptr, node:$off),
	if (isa<RegisterSDNode>(N->getOperand(i)) && ++NumRegs > 6)			(X86tcret node:$ptr, node:$off), [{
	return false;			// X86tcret args: (*chain, ptr, imm, regs..., glue)
	return true;			unsigned NumRegs = 0;
	}]>;			for (unsigned i = 3, e = N->getNumOperands(); i != e; ++i)
				if (isa<RegisterSDNode>(N->getOperand(i)) && ++NumRegs > 6)
	def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),			return false;
	(TCRETURNri ptr_rc_tailcall:$dst, imm:$off)>,			return true;
	Requires<[Not64BitMode]>;			}]>;
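
For readers following the `X86tcret_6regs` predicate above, here is a minimal standalone sketch of the same counting logic. It assumes a simplified model: `OperandKind`, `usesAtMostSixRegs`, and `example` are hypothetical names invented for illustration and are not part of LLVM; the real predicate inspects `SDNode` operands with `isa<RegisterSDNode>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operand kinds standing in for SelectionDAG operands.
enum class OperandKind { Chain, Pointer, Immediate, Register, Glue };

// Returns true when at most six register operands follow the fixed
// (chain, ptr, imm) prefix. Like the PatFrag body, it bails out as soon
// as a seventh register operand is seen.
bool usesAtMostSixRegs(const std::vector<OperandKind> &ops) {
  unsigned numRegs = 0;
  for (std::size_t i = 3, e = ops.size(); i < e; ++i)
    if (ops[i] == OperandKind::Register && ++numRegs > 6)
      return false;
  return true;
}

// Usage: a tail call passing exactly six register arguments is accepted.
void example() {
  std::vector<OperandKind> ops = {OperandKind::Chain, OperandKind::Pointer,
                                  OperandKind::Immediate};
  ops.insert(ops.end(), 6, OperandKind::Register);
  ops.push_back(OperandKind::Glue);
  assert(usesAtMostSixRegs(ops));
}
```

The sketch scans from index 3 with `i < e` rather than `i != e` so that a short operand list cannot run past the end; the original can rely on the node always carrying its fixed prefix.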

	// FIXME: This is disabled for 32-bit PIC mode because the global base			def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),
	// register which is part of the address mode may be assigned a			(TCRETURNri ptr_rc_tailcall:$dst, imm:$off)>,
	// callee-saved register.			Requires<[Not64BitMode]>;
	def : Pat<(X86tcret (load addr:$dst), imm:$off),
	(TCRETURNmi addr:$dst, imm:$off)>,			// FIXME: This is disabled for 32-bit PIC mode because the global base
	Requires<[Not64BitMode, IsNotPIC]>;			// register which is part of the address mode may be assigned a
				// callee-saved register.
	def : Pat<(X86tcret (i32 tglobaladdr:$dst), imm:$off),			def : Pat<(X86tcret (load addr:$dst), imm:$off),
	(TCRETURNdi tglobaladdr:$dst, imm:$off)>,			(TCRETURNmi addr:$dst, imm:$off)>,
	Requires<[NotLP64]>;			Requires<[Not64BitMode, IsNotPIC]>;

	def : Pat<(X86tcret (i32 texternalsym:$dst), imm:$off),			def : Pat<(X86tcret (i32 tglobaladdr:$dst), imm:$off),
	(TCRETURNdi texternalsym:$dst, imm:$off)>,			(TCRETURNdi tglobaladdr:$dst, imm:$off)>,
	Requires<[NotLP64]>;			Requires<[NotLP64]>;

	def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),			def : Pat<(X86tcret (i32 texternalsym:$dst), imm:$off),
	(TCRETURNri64 ptr_rc_tailcall:$dst, imm:$off)>,			(TCRETURNdi texternalsym:$dst, imm:$off)>,
	Requires<[In64BitMode]>;			Requires<[NotLP64]>;

	// Don't fold loads into X86tcret requiring more than 6 regs.			def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),
	// There wouldn't be enough scratch registers for base+index.			(TCRETURNri64 ptr_rc_tailcall:$dst, imm:$off)>,
	def : Pat<(X86tcret_6regs (load addr:$dst), imm:$off),			Requires<[In64BitMode]>;
	(TCRETURNmi64 addr:$dst, imm:$off)>,
	Requires<[In64BitMode]>;			// Don't fold loads into X86tcret requiring more than 6 regs.
				// There wouldn't be enough scratch registers for base+index.
	def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off),			def : Pat<(X86tcret_6regs (load addr:$dst), imm:$off),
	(TCRETURNdi64 tglobaladdr:$dst, imm:$off)>,			(TCRETURNmi64 addr:$dst, imm:$off)>,
	Requires<[IsLP64]>;			Requires<[In64BitMode]>;

	def : Pat<(X86tcret (i64 texternalsym:$dst), imm:$off),			def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off),
	(TCRETURNdi64 texternalsym:$dst, imm:$off)>,			(TCRETURNdi64 tglobaladdr:$dst, imm:$off)>,
	Requires<[IsLP64]>;			Requires<[IsLP64]>;

	// Normal calls, with various flavors of addresses.			def : Pat<(X86tcret (i64 texternalsym:$dst), imm:$off),
	def : Pat<(X86call (i32 tglobaladdr:$dst)),			(TCRETURNdi64 texternalsym:$dst, imm:$off)>,
	(CALLpcrel32 tglobaladdr:$dst)>;			Requires<[IsLP64]>;
	def : Pat<(X86call (i32 texternalsym:$dst)),
	(CALLpcrel32 texternalsym:$dst)>;			// Normal calls, with various flavors of addresses.
	def : Pat<(X86call (i32 imm:$dst)),			def : Pat<(X86call (i32 tglobaladdr:$dst)),
	(CALLpcrel32 imm:$dst)>, Requires<[CallImmAddr]>;			(CALLpcrel32 tglobaladdr:$dst)>;
				def : Pat<(X86call (i32 texternalsym:$dst)),
	// Comparisons.			(CALLpcrel32 texternalsym:$dst)>;
				def : Pat<(X86call (i32 imm:$dst)),
	// TEST R,R is smaller than CMP R,0			(CALLpcrel32 imm:$dst)>, Requires<[CallImmAddr]>;
	def : Pat<(X86cmp GR8:$src1, 0),
	(TEST8rr GR8:$src1, GR8:$src1)>;			// Comparisons.
	def : Pat<(X86cmp GR16:$src1, 0),
	(TEST16rr GR16:$src1, GR16:$src1)>;			// TEST R,R is smaller than CMP R,0
	def : Pat<(X86cmp GR32:$src1, 0),			def : Pat<(X86cmp GR8:$src1, 0),
	(TEST32rr GR32:$src1, GR32:$src1)>;			(TEST8rr GR8:$src1, GR8:$src1)>;
	def : Pat<(X86cmp GR64:$src1, 0),			def : Pat<(X86cmp GR16:$src1, 0),
	(TEST64rr GR64:$src1, GR64:$src1)>;			(TEST16rr GR16:$src1, GR16:$src1)>;
				def : Pat<(X86cmp GR32:$src1, 0),
	// Conditional moves with folded loads with operands swapped and conditions			(TEST32rr GR32:$src1, GR32:$src1)>;
	// inverted.			def : Pat<(X86cmp GR64:$src1, 0),
	multiclass CMOVmr<PatLeaf InvertedCond, Instruction Inst16, Instruction Inst32,			(TEST64rr GR64:$src1, GR64:$src1)>;
	Instruction Inst64> {
	let Predicates = [HasCMov] in {			// Conditional moves with folded loads with operands swapped and conditions
	def : Pat<(X86cmov (loadi16 addr:$src1), GR16:$src2, InvertedCond, EFLAGS),			// inverted.
	(Inst16 GR16:$src2, addr:$src1)>;			multiclass CMOVmr<PatLeaf InvertedCond, Instruction Inst16, Instruction Inst32,
	def : Pat<(X86cmov (loadi32 addr:$src1), GR32:$src2, InvertedCond, EFLAGS),			Instruction Inst64> {
	(Inst32 GR32:$src2, addr:$src1)>;			let Predicates = [HasCMov] in {
	def : Pat<(X86cmov (loadi64 addr:$src1), GR64:$src2, InvertedCond, EFLAGS),			def : Pat<(X86cmov (loadi16 addr:$src1), GR16:$src2, InvertedCond, EFLAGS),
	(Inst64 GR64:$src2, addr:$src1)>;			(Inst16 GR16:$src2, addr:$src1)>;
	}			def : Pat<(X86cmov (loadi32 addr:$src1), GR32:$src2, InvertedCond, EFLAGS),
	}			(Inst32 GR32:$src2, addr:$src1)>;
				def : Pat<(X86cmov (loadi64 addr:$src1), GR64:$src2, InvertedCond, EFLAGS),
	defm : CMOVmr<X86_COND_B , CMOVAE16rm, CMOVAE32rm, CMOVAE64rm>;			(Inst64 GR64:$src2, addr:$src1)>;
	defm : CMOVmr<X86_COND_AE, CMOVB16rm , CMOVB32rm , CMOVB64rm>;			}
	defm : CMOVmr<X86_COND_E , CMOVNE16rm, CMOVNE32rm, CMOVNE64rm>;			}
	defm : CMOVmr<X86_COND_NE, CMOVE16rm , CMOVE32rm , CMOVE64rm>;
	defm : CMOVmr<X86_COND_BE, CMOVA16rm , CMOVA32rm , CMOVA64rm>;			defm : CMOVmr<X86_COND_B , CMOVAE16rm, CMOVAE32rm, CMOVAE64rm>;
	defm : CMOVmr<X86_COND_A , CMOVBE16rm, CMOVBE32rm, CMOVBE64rm>;			defm : CMOVmr<X86_COND_AE, CMOVB16rm , CMOVB32rm , CMOVB64rm>;
	defm : CMOVmr<X86_COND_L , CMOVGE16rm, CMOVGE32rm, CMOVGE64rm>;			defm : CMOVmr<X86_COND_E , CMOVNE16rm, CMOVNE32rm, CMOVNE64rm>;
	defm : CMOVmr<X86_COND_GE, CMOVL16rm , CMOVL32rm , CMOVL64rm>;			defm : CMOVmr<X86_COND_NE, CMOVE16rm , CMOVE32rm , CMOVE64rm>;
	defm : CMOVmr<X86_COND_LE, CMOVG16rm , CMOVG32rm , CMOVG64rm>;			defm : CMOVmr<X86_COND_BE, CMOVA16rm , CMOVA32rm , CMOVA64rm>;
	defm : CMOVmr<X86_COND_G , CMOVLE16rm, CMOVLE32rm, CMOVLE64rm>;			defm : CMOVmr<X86_COND_A , CMOVBE16rm, CMOVBE32rm, CMOVBE64rm>;
	defm : CMOVmr<X86_COND_P , CMOVNP16rm, CMOVNP32rm, CMOVNP64rm>;			defm : CMOVmr<X86_COND_L , CMOVGE16rm, CMOVGE32rm, CMOVGE64rm>;
	defm : CMOVmr<X86_COND_NP, CMOVP16rm , CMOVP32rm , CMOVP64rm>;			defm : CMOVmr<X86_COND_GE, CMOVL16rm , CMOVL32rm , CMOVL64rm>;
	defm : CMOVmr<X86_COND_S , CMOVNS16rm, CMOVNS32rm, CMOVNS64rm>;			defm : CMOVmr<X86_COND_LE, CMOVG16rm , CMOVG32rm , CMOVG64rm>;
	defm : CMOVmr<X86_COND_NS, CMOVS16rm , CMOVS32rm , CMOVS64rm>;			defm : CMOVmr<X86_COND_G , CMOVLE16rm, CMOVLE32rm, CMOVLE64rm>;
	defm : CMOVmr<X86_COND_O , CMOVNO16rm, CMOVNO32rm, CMOVNO64rm>;			defm : CMOVmr<X86_COND_P , CMOVNP16rm, CMOVNP32rm, CMOVNP64rm>;
	defm : CMOVmr<X86_COND_NO, CMOVO16rm , CMOVO32rm , CMOVO64rm>;			defm : CMOVmr<X86_COND_NP, CMOVP16rm , CMOVP32rm , CMOVP64rm>;
				defm : CMOVmr<X86_COND_S , CMOVNS16rm, CMOVNS32rm, CMOVNS64rm>;
	// zextload bool -> zextload byte			defm : CMOVmr<X86_COND_NS, CMOVS16rm , CMOVS32rm , CMOVS64rm>;
	def : Pat<(zextloadi8i1 addr:$src), (MOV8rm addr:$src)>;			defm : CMOVmr<X86_COND_O , CMOVNO16rm, CMOVNO32rm, CMOVNO64rm>;
	def : Pat<(zextloadi16i1 addr:$src), (MOVZX16rm8 addr:$src)>;			defm : CMOVmr<X86_COND_NO, CMOVO16rm , CMOVO32rm , CMOVO64rm>;
	def : Pat<(zextloadi32i1 addr:$src), (MOVZX32rm8 addr:$src)>;
	def : Pat<(zextloadi64i1 addr:$src),			// zextload bool -> zextload byte
	(SUBREG_TO_REG (i64 0), (MOVZX32rm8 addr:$src), sub_32bit)>;			def : Pat<(zextloadi8i1 addr:$src), (MOV8rm addr:$src)>;
				def : Pat<(zextloadi16i1 addr:$src), (MOVZX16rm8 addr:$src)>;
	// extload bool -> extload byte			def : Pat<(zextloadi32i1 addr:$src), (MOVZX32rm8 addr:$src)>;
	// When extloading from 16-bit and smaller memory locations into 64-bit			def : Pat<(zextloadi64i1 addr:$src),
	// registers, use zero-extending loads so that the entire 64-bit register is			(SUBREG_TO_REG (i64 0), (MOVZX32rm8 addr:$src), sub_32bit)>;
	// defined, avoiding partial-register updates.
				// extload bool -> extload byte
	def : Pat<(extloadi8i1 addr:$src), (MOV8rm addr:$src)>;			// When extloading from 16-bit and smaller memory locations into 64-bit
	def : Pat<(extloadi16i1 addr:$src), (MOVZX16rm8 addr:$src)>;			// registers, use zero-extending loads so that the entire 64-bit register is
	def : Pat<(extloadi32i1 addr:$src), (MOVZX32rm8 addr:$src)>;			// defined, avoiding partial-register updates.
	def : Pat<(extloadi16i8 addr:$src), (MOVZX16rm8 addr:$src)>;
	def : Pat<(extloadi32i8 addr:$src), (MOVZX32rm8 addr:$src)>;			def : Pat<(extloadi8i1 addr:$src), (MOV8rm addr:$src)>;
	def : Pat<(extloadi32i16 addr:$src), (MOVZX32rm16 addr:$src)>;			def : Pat<(extloadi16i1 addr:$src), (MOVZX16rm8 addr:$src)>;
				def : Pat<(extloadi32i1 addr:$src), (MOVZX32rm8 addr:$src)>;
	// For other extloads, use subregs, since the high contents of the register are			def : Pat<(extloadi16i8 addr:$src), (MOVZX16rm8 addr:$src)>;
	// defined after an extload.			def : Pat<(extloadi32i8 addr:$src), (MOVZX32rm8 addr:$src)>;
	def : Pat<(extloadi64i1 addr:$src),			def : Pat<(extloadi32i16 addr:$src), (MOVZX32rm16 addr:$src)>;
	(SUBREG_TO_REG (i64 0), (MOVZX32rm8 addr:$src), sub_32bit)>;
	def : Pat<(extloadi64i8 addr:$src),			// For other extloads, use subregs, since the high contents of the register are
	(SUBREG_TO_REG (i64 0), (MOVZX32rm8 addr:$src), sub_32bit)>;			// defined after an extload.
	def : Pat<(extloadi64i16 addr:$src),			def : Pat<(extloadi64i1 addr:$src),
	(SUBREG_TO_REG (i64 0), (MOVZX32rm16 addr:$src), sub_32bit)>;			(SUBREG_TO_REG (i64 0), (MOVZX32rm8 addr:$src), sub_32bit)>;
	def : Pat<(extloadi64i32 addr:$src),			def : Pat<(extloadi64i8 addr:$src),
	(SUBREG_TO_REG (i64 0), (MOV32rm addr:$src), sub_32bit)>;			(SUBREG_TO_REG (i64 0), (MOVZX32rm8 addr:$src), sub_32bit)>;
				def : Pat<(extloadi64i16 addr:$src),
	// anyext. Define these to do an explicit zero-extend to			(SUBREG_TO_REG (i64 0), (MOVZX32rm16 addr:$src), sub_32bit)>;
	// avoid partial-register updates.			def : Pat<(extloadi64i32 addr:$src),
	def : Pat<(i16 (anyext GR8 :$src)), (EXTRACT_SUBREG			(SUBREG_TO_REG (i64 0), (MOV32rm addr:$src), sub_32bit)>;
	(MOVZX32rr8 GR8 :$src), sub_16bit)>;
	def : Pat<(i32 (anyext GR8 :$src)), (MOVZX32rr8 GR8 :$src)>;			// anyext. Define these to do an explicit zero-extend to
				// avoid partial-register updates.
	// Except for i16 -> i32 since isel expect i16 ops to be promoted to i32.			def : Pat<(i16 (anyext GR8 :$src)), (EXTRACT_SUBREG
	def : Pat<(i32 (anyext GR16:$src)),			(MOVZX32rr8 GR8 :$src), sub_16bit)>;
	(INSERT_SUBREG (i32 (IMPLICIT_DEF)), GR16:$src, sub_16bit)>;			def : Pat<(i32 (anyext GR8 :$src)), (MOVZX32rr8 GR8 :$src)>;

	def : Pat<(i64 (anyext GR8 :$src)),			// Except for i16 -> i32 since isel expect i16 ops to be promoted to i32.
	(SUBREG_TO_REG (i64 0), (MOVZX32rr8 GR8 :$src), sub_32bit)>;			def : Pat<(i32 (anyext GR16:$src)),
	def : Pat<(i64 (anyext GR16:$src)),			(INSERT_SUBREG (i32 (IMPLICIT_DEF)), GR16:$src, sub_16bit)>;
	(SUBREG_TO_REG (i64 0), (MOVZX32rr16 GR16 :$src), sub_32bit)>;
	def : Pat<(i64 (anyext GR32:$src)),			def : Pat<(i64 (anyext GR8 :$src)),
	(SUBREG_TO_REG (i64 0), GR32:$src, sub_32bit)>;			(SUBREG_TO_REG (i64 0), (MOVZX32rr8 GR8 :$src), sub_32bit)>;
				def : Pat<(i64 (anyext GR16:$src)),
				(SUBREG_TO_REG (i64 0), (MOVZX32rr16 GR16 :$src), sub_32bit)>;
	// Any instruction that defines a 32-bit result leaves the high half of the			def : Pat<(i64 (anyext GR32:$src)),
	// register. Truncate can be lowered to EXTRACT_SUBREG. CopyFromReg may			(SUBREG_TO_REG (i64 0), GR32:$src, sub_32bit)>;
	// be copying from a truncate. And x86's cmov doesn't do anything if the
	// condition is false. But any other 32-bit operation will zero-extend
	// up to 64 bits.			// Any instruction that defines a 32-bit result leaves the high half of the
	def def32 : PatLeaf<(i32 GR32:$src), [{			// register. Truncate can be lowered to EXTRACT_SUBREG. CopyFromReg may
	return N->getOpcode() != ISD::TRUNCATE &&			// be copying from a truncate. And x86's cmov doesn't do anything if the
	N->getOpcode() != TargetOpcode::EXTRACT_SUBREG &&			// condition is false. But any other 32-bit operation will zero-extend
	N->getOpcode() != ISD::CopyFromReg &&			// up to 64 bits.
	N->getOpcode() != ISD::AssertSext &&			def def32 : PatLeaf<(i32 GR32:$src), [{
	N->getOpcode() != X86ISD::CMOV;			return N->getOpcode() != ISD::TRUNCATE &&
	}]>;			N->getOpcode() != TargetOpcode::EXTRACT_SUBREG &&
				N->getOpcode() != ISD::CopyFromReg &&
	// In the case of a 32-bit def that is known to implicitly zero-extend,			N->getOpcode() != ISD::AssertSext &&
	// we can use a SUBREG_TO_REG.			N->getOpcode() != X86ISD::CMOV;
	def : Pat<(i64 (zext def32:$src)),			}]>;
	(SUBREG_TO_REG (i64 0), GR32:$src, sub_32bit)>;
				// In the case of a 32-bit def that is known to implicitly zero-extend,
	//===----------------------------------------------------------------------===//			// we can use a SUBREG_TO_REG.
	// Pattern match OR as ADD			def : Pat<(i64 (zext def32:$src)),
	//===----------------------------------------------------------------------===//			(SUBREG_TO_REG (i64 0), GR32:$src, sub_32bit)>;

	// If safe, we prefer to pattern match OR as ADD at isel time. ADD can be			//===----------------------------------------------------------------------===//
	// 3-addressified into an LEA instruction to avoid copies. However, we also			// Pattern match OR as ADD
	// want to finally emit these instructions as an or at the end of the code			//===----------------------------------------------------------------------===//
	// generator to make the generated code easier to read. To do this, we select
	// into "disjoint bits" pseudo ops.			// If safe, we prefer to pattern match OR as ADD at isel time. ADD can be
				// 3-addressified into an LEA instruction to avoid copies. However, we also
	// Treat an 'or' node is as an 'add' if the or'ed bits are known to be zero.			// want to finally emit these instructions as an or at the end of the code
	def or_is_add : PatFrag<(ops node:$lhs, node:$rhs), (or node:$lhs, node:$rhs),[{			// generator to make the generated code easier to read. To do this, we select
	if (ConstantSDNode *CN = dyn_cast<ConstantSDNode>(N->getOperand(1)))			// into "disjoint bits" pseudo ops.
	return CurDAG->MaskedValueIsZero(N->getOperand(0), CN->getAPIntValue());
				// Treat an 'or' node is as an 'add' if the or'ed bits are known to be zero.
	APInt KnownZero0, KnownOne0;			def or_is_add : PatFrag<(ops node:$lhs, node:$rhs), (or node:$lhs, node:$rhs),[{
	CurDAG->computeKnownBits(N->getOperand(0), KnownZero0, KnownOne0, 0);			if (ConstantSDNode *CN = dyn_cast<ConstantSDNode>(N->getOperand(1)))
	APInt KnownZero1, KnownOne1;			return CurDAG->MaskedValueIsZero(N->getOperand(0), CN->getAPIntValue());
	CurDAG->computeKnownBits(N->getOperand(1), KnownZero1, KnownOne1, 0);
	return (~KnownZero0 & ~KnownZero1) == 0;			APInt KnownZero0, KnownOne0;
	}]>;			CurDAG->computeKnownBits(N->getOperand(0), KnownZero0, KnownOne0, 0);
				APInt KnownZero1, KnownOne1;
				CurDAG->computeKnownBits(N->getOperand(1), KnownZero1, KnownOne1, 0);
	// (or x1, x2) -> (add x1, x2) if two operands are known not to share bits.			return (~KnownZero0 & ~KnownZero1) == 0;
	// Try this before the selecting to OR.			}]>;
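
The final test in `or_is_add`, `(~KnownZero0 & ~KnownZero1) == 0`, can be read as "no bit position may be 1 in both operands". A minimal sketch of that reasoning on plain 32-bit masks follows. `KnownZeroMask`, `orIsAdd`, and `demo` are hypothetical names used only for illustration; the real code obtains the masks from `CurDAG->computeKnownBits`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a known-bits result: a set bit in `zero`
// means that bit of the value is proven to be 0.
struct KnownZeroMask {
  std::uint32_t zero;
};

// Mirrors the predicate's last line: if no bit may be 1 in both operands,
// OR produces no overlapping bits, so it equals ADD (no carries occur).
constexpr bool orIsAdd(KnownZeroMask lhs, KnownZeroMask rhs) {
  return (~lhs.zero & ~rhs.zero) == 0;
}

// lhs can only use the high 16 bits, rhs only the low 16 bits: disjoint.
static_assert(orIsAdd({0x0000FFFFu}, {0xFFFF0000u}), "disjoint operands");
// Both operands may set bit 0, so treating OR as ADD would be unsafe.
static_assert(!orIsAdd({0xFFFFFFFEu}, {0xFFFFFFFEu}), "overlap at bit 0");

// Concrete values with disjoint bits give the same result for OR and ADD.
inline void demo() {
  const std::uint32_t a = 0x12340000u, b = 0x00005678u;
  assert((a | b) == (a + b));
}
```

This only models the second half of the predicate; the constant-operand fast path through `MaskedValueIsZero` is not reproduced here.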
	let AddedComplexity = 5, SchedRW = [WriteALU] in {

	let isConvertibleToThreeAddress = 1,			// (or x1, x2) -> (add x1, x2) if two operands are known not to share bits.
	Constraints = "$src1 = $dst", Defs = [EFLAGS] in {			// Try this before the selecting to OR.
	let isCommutable = 1 in {			let AddedComplexity = 5, SchedRW = [WriteALU] in {
	def ADD16rr_DB : I<0, Pseudo, (outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
	"", // orw/addw REG, REG			let isConvertibleToThreeAddress = 1,
	[(set GR16:$dst, (or_is_add GR16:$src1, GR16:$src2))]>;			Constraints = "$src1 = $dst", Defs = [EFLAGS] in {
	def ADD32rr_DB : I<0, Pseudo, (outs GR32:$dst), (ins GR32:$src1, GR32:$src2),			let isCommutable = 1 in {
	"", // orl/addl REG, REG			def ADD16rr_DB : I<0, Pseudo, (outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
	[(set GR32:$dst, (or_is_add GR32:$src1, GR32:$src2))]>;			"", // orw/addw REG, REG
	def ADD64rr_DB : I<0, Pseudo, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),			[(set GR16:$dst, (or_is_add GR16:$src1, GR16:$src2))]>;
	"", // orq/addq REG, REG			def ADD32rr_DB : I<0, Pseudo, (outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
	[(set GR64:$dst, (or_is_add GR64:$src1, GR64:$src2))]>;			"", // orl/addl REG, REG
	} // isCommutable			[(set GR32:$dst, (or_is_add GR32:$src1, GR32:$src2))]>;
				def ADD64rr_DB : I<0, Pseudo, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
	// NOTE: These are order specific, we want the ri8 forms to be listed			"", // orq/addq REG, REG
	// first so that they are slightly preferred to the ri forms.			[(set GR64:$dst, (or_is_add GR64:$src1, GR64:$src2))]>;
				} // isCommutable
	def ADD16ri8_DB : I<0, Pseudo,
	(outs GR16:$dst), (ins GR16:$src1, i16i8imm:$src2),			// NOTE: These are order specific, we want the ri8 forms to be listed
	"", // orw/addw REG, imm8			// first so that they are slightly preferred to the ri forms.
	[(set GR16:$dst,(or_is_add GR16:$src1,i16immSExt8:$src2))]>;
	def ADD16ri_DB : I<0, Pseudo, (outs GR16:$dst), (ins GR16:$src1, i16imm:$src2),			def ADD16ri8_DB : I<0, Pseudo,
	"", // orw/addw REG, imm			(outs GR16:$dst), (ins GR16:$src1, i16i8imm:$src2),
	[(set GR16:$dst, (or_is_add GR16:$src1, imm:$src2))]>;			"", // orw/addw REG, imm8
				[(set GR16:$dst,(or_is_add GR16:$src1,i16immSExt8:$src2))]>;
	def ADD32ri8_DB : I<0, Pseudo,			def ADD16ri_DB : I<0, Pseudo, (outs GR16:$dst), (ins GR16:$src1, i16imm:$src2),
	(outs GR32:$dst), (ins GR32:$src1, i32i8imm:$src2),			"", // orw/addw REG, imm
	"", // orl/addl REG, imm8			[(set GR16:$dst, (or_is_add GR16:$src1, imm:$src2))]>;
	[(set GR32:$dst,(or_is_add GR32:$src1,i32immSExt8:$src2))]>;
	def ADD32ri_DB : I<0, Pseudo, (outs GR32:$dst), (ins GR32:$src1, i32imm:$src2),			def ADD32ri8_DB : I<0, Pseudo,
	"", // orl/addl REG, imm			(outs GR32:$dst), (ins GR32:$src1, i32i8imm:$src2),
	[(set GR32:$dst, (or_is_add GR32:$src1, imm:$src2))]>;			"", // orl/addl REG, imm8
				[(set GR32:$dst,(or_is_add GR32:$src1,i32immSExt8:$src2))]>;
				def ADD32ri_DB : I<0, Pseudo, (outs GR32:$dst), (ins GR32:$src1, i32imm:$src2),
	def ADD64ri8_DB : I<0, Pseudo,			"", // orl/addl REG, imm
	(outs GR64:$dst), (ins GR64:$src1, i64i8imm:$src2),			[(set GR32:$dst, (or_is_add GR32:$src1, imm:$src2))]>;
	"", // orq/addq REG, imm8
	[(set GR64:$dst, (or_is_add GR64:$src1,
	i64immSExt8:$src2))]>;			def ADD64ri8_DB : I<0, Pseudo,
	def ADD64ri32_DB : I<0, Pseudo,			(outs GR64:$dst), (ins GR64:$src1, i64i8imm:$src2),
	(outs GR64:$dst), (ins GR64:$src1, i64i32imm:$src2),			"", // orq/addq REG, imm8
	"", // orq/addq REG, imm			[(set GR64:$dst, (or_is_add GR64:$src1,
	[(set GR64:$dst, (or_is_add GR64:$src1,			i64immSExt8:$src2))]>;
	i64immSExt32:$src2))]>;			def ADD64ri32_DB : I<0, Pseudo,
	}			(outs GR64:$dst), (ins GR64:$src1, i64i32imm:$src2),
	} // AddedComplexity, SchedRW			"", // orq/addq REG, imm
				[(set GR64:$dst, (or_is_add GR64:$src1,
				i64immSExt32:$src2))]>;
	//===----------------------------------------------------------------------===//			}
	// Some peepholes			} // AddedComplexity, SchedRW
	//===----------------------------------------------------------------------===//

	// Odd encoding trick: -128 fits into an 8-bit immediate field while			//===----------------------------------------------------------------------===//
	// +128 doesn't, so in this special case use a sub instead of an add.			// Some peepholes
	def : Pat<(add GR16:$src1, 128),			//===----------------------------------------------------------------------===//
	(SUB16ri8 GR16:$src1, -128)>;
	def : Pat<(store (add (loadi16 addr:$dst), 128), addr:$dst),			// Odd encoding trick: -128 fits into an 8-bit immediate field while
	(SUB16mi8 addr:$dst, -128)>;			// +128 doesn't, so in this special case use a sub instead of an add.
				def : Pat<(add GR16:$src1, 128),
	def : Pat<(add GR32:$src1, 128),			(SUB16ri8 GR16:$src1, -128)>;
	(SUB32ri8 GR32:$src1, -128)>;			def : Pat<(store (add (loadi16 addr:$dst), 128), addr:$dst),
	def : Pat<(store (add (loadi32 addr:$dst), 128), addr:$dst),			(SUB16mi8 addr:$dst, -128)>;
	(SUB32mi8 addr:$dst, -128)>;
				def : Pat<(add GR32:$src1, 128),
	def : Pat<(add GR64:$src1, 128),			(SUB32ri8 GR32:$src1, -128)>;
	(SUB64ri8 GR64:$src1, -128)>;			def : Pat<(store (add (loadi32 addr:$dst), 128), addr:$dst),
	def : Pat<(store (add (loadi64 addr:$dst), 128), addr:$dst),			(SUB32mi8 addr:$dst, -128)>;
	(SUB64mi8 addr:$dst, -128)>;
				def : Pat<(add GR64:$src1, 128),
	// The same trick applies for 32-bit immediate fields in 64-bit			(SUB64ri8 GR64:$src1, -128)>;
	// instructions.			def : Pat<(store (add (loadi64 addr:$dst), 128), addr:$dst),
	def : Pat<(add GR64:$src1, 0x0000000080000000),			(SUB64mi8 addr:$dst, -128)>;
	(SUB64ri32 GR64:$src1, 0xffffffff80000000)>;
	def : Pat<(store (add (loadi64 addr:$dst), 0x0000000080000000), addr:$dst),			// The same trick applies for 32-bit immediate fields in 64-bit
	(SUB64mi32 addr:$dst, 0xffffffff80000000)>;			// instructions.
				def : Pat<(add GR64:$src1, 0x0000000080000000),
	// To avoid needing to materialize an immediate in a register, use a 32-bit and			(SUB64ri32 GR64:$src1, 0xffffffff80000000)>;
	// with implicit zero-extension instead of a 64-bit and if the immediate has at			def : Pat<(store (add (loadi64 addr:$dst), 0x0000000080000000), addr:$dst),
	// least 32 bits of leading zeros. If in addition the last 32 bits can be			(SUB64mi32 addr:$dst, 0xffffffff80000000)>;
	// represented with a sign extension of an 8-bit constant, use that.
				// To avoid needing to materialize an immediate in a register, use a 32-bit and
	def : Pat<(and GR64:$src, i64immZExt32SExt8:$imm),			// with implicit zero-extension instead of a 64-bit and if the immediate has at
	(SUBREG_TO_REG			// least 32 bits of leading zeros. If in addition the last 32 bits can be
	(i64 0),			// represented with a sign extension of an 8-bit constant, use that.
	(AND32ri8
	(EXTRACT_SUBREG GR64:$src, sub_32bit),			def : Pat<(and GR64:$src, i64immZExt32SExt8:$imm),
	(i32 (GetLo8XForm imm:$imm))),			(SUBREG_TO_REG
	sub_32bit)>;			(i64 0),
				(AND32ri8
	def : Pat<(and GR64:$src, i64immZExt32:$imm),			(EXTRACT_SUBREG GR64:$src, sub_32bit),
	(SUBREG_TO_REG			(i32 (GetLo8XForm imm:$imm))),
	(i64 0),			sub_32bit)>;
	(AND32ri
	(EXTRACT_SUBREG GR64:$src, sub_32bit),			def : Pat<(and GR64:$src, i64immZExt32:$imm),
	(i32 (GetLo32XForm imm:$imm))),			(SUBREG_TO_REG
	sub_32bit)>;			(i64 0),
				(AND32ri
				(EXTRACT_SUBREG GR64:$src, sub_32bit),
	// r & (2^16-1) ==> movz			(i32 (GetLo32XForm imm:$imm))),
	def : Pat<(and GR32:$src1, 0xffff),			sub_32bit)>;
	(MOVZX32rr16 (EXTRACT_SUBREG GR32:$src1, sub_16bit))>;
	// r & (2^8-1) ==> movz
	def : Pat<(and GR32:$src1, 0xff),			// r & (2^16-1) ==> movz
	(MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src1,			def : Pat<(and GR32:$src1, 0xffff),
	GR32_ABCD)),			(MOVZX32rr16 (EXTRACT_SUBREG GR32:$src1, sub_16bit))>;
	sub_8bit))>,			// r & (2^8-1) ==> movz
	Requires<[Not64BitMode]>;			def : Pat<(and GR32:$src1, 0xff),
	// r & (2^8-1) ==> movz			(MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src1,
	def : Pat<(and GR16:$src1, 0xff),			GR32_ABCD)),
	(EXTRACT_SUBREG (MOVZX32rr8 (EXTRACT_SUBREG			sub_8bit))>,
	(i16 (COPY_TO_REGCLASS GR16:$src1, GR16_ABCD)), sub_8bit)),			Requires<[Not64BitMode]>;
	sub_16bit)>,			// r & (2^8-1) ==> movz
	Requires<[Not64BitMode]>;			def : Pat<(and GR16:$src1, 0xff),
				(EXTRACT_SUBREG (MOVZX32rr8 (EXTRACT_SUBREG
	// r & (2^32-1) ==> movz			(i16 (COPY_TO_REGCLASS GR16:$src1, GR16_ABCD)), sub_8bit)),
	def : Pat<(and GR64:$src, 0x00000000FFFFFFFF),			sub_16bit)>,
	(SUBREG_TO_REG (i64 0),			Requires<[Not64BitMode]>;
	(MOV32rr (EXTRACT_SUBREG GR64:$src, sub_32bit)),
	sub_32bit)>;			// r & (2^32-1) ==> movz
	// r & (2^16-1) ==> movz			def : Pat<(and GR64:$src, 0x00000000FFFFFFFF),
	def : Pat<(and GR64:$src, 0xffff),			(SUBREG_TO_REG (i64 0),
	(SUBREG_TO_REG (i64 0),			(MOV32rr (EXTRACT_SUBREG GR64:$src, sub_32bit)),
	(MOVZX32rr16 (i16 (EXTRACT_SUBREG GR64:$src, sub_16bit))),			sub_32bit)>;
	sub_32bit)>;			// r & (2^16-1) ==> movz
	// r & (2^8-1) ==> movz			def : Pat<(and GR64:$src, 0xffff),
	def : Pat<(and GR64:$src, 0xff),			(SUBREG_TO_REG (i64 0),
	(SUBREG_TO_REG (i64 0),			(MOVZX32rr16 (i16 (EXTRACT_SUBREG GR64:$src, sub_16bit))),
	(MOVZX32rr8 (i8 (EXTRACT_SUBREG GR64:$src, sub_8bit))),			sub_32bit)>;
	sub_32bit)>;			// r & (2^8-1) ==> movz
	// r & (2^8-1) ==> movz			def : Pat<(and GR64:$src, 0xff),
	def : Pat<(and GR32:$src1, 0xff),			(SUBREG_TO_REG (i64 0),
	(MOVZX32rr8 (EXTRACT_SUBREG GR32:$src1, sub_8bit))>,			(MOVZX32rr8 (i8 (EXTRACT_SUBREG GR64:$src, sub_8bit))),
	Requires<[In64BitMode]>;			sub_32bit)>;
	// r & (2^8-1) ==> movz			// r & (2^8-1) ==> movz
	def : Pat<(and GR16:$src1, 0xff),			def : Pat<(and GR32:$src1, 0xff),
	(EXTRACT_SUBREG (MOVZX32rr8 (i8			(MOVZX32rr8 (EXTRACT_SUBREG GR32:$src1, sub_8bit))>,
	(EXTRACT_SUBREG GR16:$src1, sub_8bit))), sub_16bit)>,			Requires<[In64BitMode]>;
	Requires<[In64BitMode]>;			// r & (2^8-1) ==> movz
				def : Pat<(and GR16:$src1, 0xff),
				(EXTRACT_SUBREG (MOVZX32rr8 (i8
	// sext_inreg patterns			(EXTRACT_SUBREG GR16:$src1, sub_8bit))), sub_16bit)>,
	def : Pat<(sext_inreg GR32:$src, i16),			Requires<[In64BitMode]>;
	(MOVSX32rr16 (EXTRACT_SUBREG GR32:$src, sub_16bit))>;
	def : Pat<(sext_inreg GR32:$src, i8),
	(MOVSX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src,			// sext_inreg patterns
	GR32_ABCD)),			def : Pat<(sext_inreg GR32:$src, i16),
	sub_8bit))>,			(MOVSX32rr16 (EXTRACT_SUBREG GR32:$src, sub_16bit))>;
	Requires<[Not64BitMode]>;			def : Pat<(sext_inreg GR32:$src, i8),
				(MOVSX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src,
	def : Pat<(sext_inreg GR16:$src, i8),			GR32_ABCD)),
	(EXTRACT_SUBREG (i32 (MOVSX32rr8 (EXTRACT_SUBREG			sub_8bit))>,
	(i32 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)), sub_8bit))),			Requires<[Not64BitMode]>;
	sub_16bit)>,
	Requires<[Not64BitMode]>;			def : Pat<(sext_inreg GR16:$src, i8),
				(EXTRACT_SUBREG (i32 (MOVSX32rr8 (EXTRACT_SUBREG
	def : Pat<(sext_inreg GR64:$src, i32),			(i32 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)), sub_8bit))),
	(MOVSX64rr32 (EXTRACT_SUBREG GR64:$src, sub_32bit))>;			sub_16bit)>,
	def : Pat<(sext_inreg GR64:$src, i16),			Requires<[Not64BitMode]>;
	(MOVSX64rr16 (EXTRACT_SUBREG GR64:$src, sub_16bit))>;
	def : Pat<(sext_inreg GR64:$src, i8),			def : Pat<(sext_inreg GR64:$src, i32),
	(MOVSX64rr8 (EXTRACT_SUBREG GR64:$src, sub_8bit))>;			(MOVSX64rr32 (EXTRACT_SUBREG GR64:$src, sub_32bit))>;
	def : Pat<(sext_inreg GR32:$src, i8),			def : Pat<(sext_inreg GR64:$src, i16),
	(MOVSX32rr8 (EXTRACT_SUBREG GR32:$src, sub_8bit))>,			(MOVSX64rr16 (EXTRACT_SUBREG GR64:$src, sub_16bit))>;
	Requires<[In64BitMode]>;			def : Pat<(sext_inreg GR64:$src, i8),
	def : Pat<(sext_inreg GR16:$src, i8),			(MOVSX64rr8 (EXTRACT_SUBREG GR64:$src, sub_8bit))>;
	(EXTRACT_SUBREG (MOVSX32rr8			def : Pat<(sext_inreg GR32:$src, i8),
	(EXTRACT_SUBREG GR16:$src, sub_8bit)), sub_16bit)>,			(MOVSX32rr8 (EXTRACT_SUBREG GR32:$src, sub_8bit))>,
	Requires<[In64BitMode]>;			Requires<[In64BitMode]>;
				def : Pat<(sext_inreg GR16:$src, i8),
	// sext, sext_load, zext, zext_load			(EXTRACT_SUBREG (MOVSX32rr8
	def: Pat<(i16 (sext GR8:$src)),			(EXTRACT_SUBREG GR16:$src, sub_8bit)), sub_16bit)>,
	(EXTRACT_SUBREG (MOVSX32rr8 GR8:$src), sub_16bit)>;			Requires<[In64BitMode]>;
	def: Pat<(sextloadi16i8 addr:$src),
	(EXTRACT_SUBREG (MOVSX32rm8 addr:$src), sub_16bit)>;			// sext, sext_load, zext, zext_load
	def: Pat<(i16 (zext GR8:$src)),			def: Pat<(i16 (sext GR8:$src)),
	(EXTRACT_SUBREG (MOVZX32rr8 GR8:$src), sub_16bit)>;			(EXTRACT_SUBREG (MOVSX32rr8 GR8:$src), sub_16bit)>;
	def: Pat<(zextloadi16i8 addr:$src),			def: Pat<(sextloadi16i8 addr:$src),
	(EXTRACT_SUBREG (MOVZX32rm8 addr:$src), sub_16bit)>;			(EXTRACT_SUBREG (MOVSX32rm8 addr:$src), sub_16bit)>;
				def: Pat<(i16 (zext GR8:$src)),
	// trunc patterns			(EXTRACT_SUBREG (MOVZX32rr8 GR8:$src), sub_16bit)>;
	def : Pat<(i16 (trunc GR32:$src)),			def: Pat<(zextloadi16i8 addr:$src),
	(EXTRACT_SUBREG GR32:$src, sub_16bit)>;			(EXTRACT_SUBREG (MOVZX32rm8 addr:$src), sub_16bit)>;
	def : Pat<(i8 (trunc GR32:$src)),
	(EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),			// trunc patterns
	sub_8bit)>,			def : Pat<(i16 (trunc GR32:$src)),
	Requires<[Not64BitMode]>;			(EXTRACT_SUBREG GR32:$src, sub_16bit)>;
	def : Pat<(i8 (trunc GR16:$src)),			def : Pat<(i8 (trunc GR32:$src)),
	(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),			(EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),
	sub_8bit)>,			sub_8bit)>,
	Requires<[Not64BitMode]>;			Requires<[Not64BitMode]>;
	def : Pat<(i32 (trunc GR64:$src)),			def : Pat<(i8 (trunc GR16:$src)),
	(EXTRACT_SUBREG GR64:$src, sub_32bit)>;			(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
	def : Pat<(i16 (trunc GR64:$src)),			sub_8bit)>,
	(EXTRACT_SUBREG GR64:$src, sub_16bit)>;			Requires<[Not64BitMode]>;
	def : Pat<(i8 (trunc GR64:$src)),			def : Pat<(i32 (trunc GR64:$src)),
	(EXTRACT_SUBREG GR64:$src, sub_8bit)>;			(EXTRACT_SUBREG GR64:$src, sub_32bit)>;
	def : Pat<(i8 (trunc GR32:$src)),			def : Pat<(i16 (trunc GR64:$src)),
	(EXTRACT_SUBREG GR32:$src, sub_8bit)>,			(EXTRACT_SUBREG GR64:$src, sub_16bit)>;
	Requires<[In64BitMode]>;			def : Pat<(i8 (trunc GR64:$src)),
	def : Pat<(i8 (trunc GR16:$src)),			(EXTRACT_SUBREG GR64:$src, sub_8bit)>;
	(EXTRACT_SUBREG GR16:$src, sub_8bit)>,			def : Pat<(i8 (trunc GR32:$src)),
	Requires<[In64BitMode]>;			(EXTRACT_SUBREG GR32:$src, sub_8bit)>,
				Requires<[In64BitMode]>;
	// h-register tricks			def : Pat<(i8 (trunc GR16:$src)),
	def : Pat<(i8 (trunc (srl_su GR16:$src, (i8 8)))),			(EXTRACT_SUBREG GR16:$src, sub_8bit)>,
	(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),			Requires<[In64BitMode]>;
	sub_8bit_hi)>,
	Requires<[Not64BitMode]>;			// h-register tricks
	def : Pat<(i8 (trunc (srl_su GR32:$src, (i8 8)))),			def : Pat<(i8 (trunc (srl_su GR16:$src, (i8 8)))),
	(EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),			(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
	sub_8bit_hi)>,			sub_8bit_hi)>,
	Requires<[Not64BitMode]>;			Requires<[Not64BitMode]>;
	def : Pat<(srl GR16:$src, (i8 8)),			def : Pat<(i8 (trunc (srl_su GR32:$src, (i8 8)))),
	(EXTRACT_SUBREG			(EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),
	(MOVZX32rr8			sub_8bit_hi)>,
	(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),			Requires<[Not64BitMode]>;
	sub_8bit_hi)),			def : Pat<(srl GR16:$src, (i8 8)),
	sub_16bit)>,			(EXTRACT_SUBREG
	Requires<[Not64BitMode]>;			(MOVZX32rr8
	def : Pat<(i32 (zext (srl_su GR16:$src, (i8 8)))),			(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
	(MOVZX32rr8 (EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src,			sub_8bit_hi)),
	GR16_ABCD)),			sub_16bit)>,
	sub_8bit_hi))>,			Requires<[Not64BitMode]>;
	Requires<[Not64BitMode]>;			def : Pat<(i32 (zext (srl_su GR16:$src, (i8 8)))),
	def : Pat<(i32 (anyext (srl_su GR16:$src, (i8 8)))),			(MOVZX32rr8 (EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src,
	(MOVZX32rr8 (EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src,			GR16_ABCD)),
	GR16_ABCD)),			sub_8bit_hi))>,
	sub_8bit_hi))>,			Requires<[Not64BitMode]>;
	Requires<[Not64BitMode]>;			def : Pat<(i32 (anyext (srl_su GR16:$src, (i8 8)))),
	def : Pat<(and (srl_su GR32:$src, (i8 8)), (i32 255)),			(MOVZX32rr8 (EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src,
	(MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src,			GR16_ABCD)),
	GR32_ABCD)),			sub_8bit_hi))>,
	sub_8bit_hi))>,			Requires<[Not64BitMode]>;
	Requires<[Not64BitMode]>;			def : Pat<(and (srl_su GR32:$src, (i8 8)), (i32 255)),
	def : Pat<(srl (and_su GR32:$src, 0xff00), (i8 8)),			(MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src,
	(MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src,			GR32_ABCD)),
	GR32_ABCD)),			sub_8bit_hi))>,
	sub_8bit_hi))>,			Requires<[Not64BitMode]>;
	Requires<[Not64BitMode]>;			def : Pat<(srl (and_su GR32:$src, 0xff00), (i8 8)),
				(MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src,
	// h-register tricks.			GR32_ABCD)),
	// For now, be conservative on x86-64 and use an h-register extract only if the			sub_8bit_hi))>,
	// value is immediately zero-extended or stored, which are somewhat common			Requires<[Not64BitMode]>;
	// cases. This uses a bunch of code to prevent a register requiring a REX prefix
	// from being allocated in the same instruction as the h register, as there's			// h-register tricks.
	// currently no way to describe this requirement to the register allocator.			// For now, be conservative on x86-64 and use an h-register extract only if the
				// value is immediately zero-extended or stored, which are somewhat common
	// h-register extract and zero-extend.			// cases. This uses a bunch of code to prevent a register requiring a REX prefix
	def : Pat<(and (srl_su GR64:$src, (i8 8)), (i64 255)),			// from being allocated in the same instruction as the h register, as there's
	(SUBREG_TO_REG			// currently no way to describe this requirement to the register allocator.
	(i64 0),
	(MOVZX32_NOREXrr8			// h-register extract and zero-extend.
	(EXTRACT_SUBREG (i64 (COPY_TO_REGCLASS GR64:$src, GR64_ABCD)),			def : Pat<(and (srl_su GR64:$src, (i8 8)), (i64 255)),
	sub_8bit_hi)),			(SUBREG_TO_REG
	sub_32bit)>;			(i64 0),
	def : Pat<(and (srl_su GR32:$src, (i8 8)), (i32 255)),			(MOVZX32_NOREXrr8
	(MOVZX32_NOREXrr8			(EXTRACT_SUBREG (i64 (COPY_TO_REGCLASS GR64:$src, GR64_ABCD)),
	(EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),			sub_8bit_hi)),
	sub_8bit_hi))>,			sub_32bit)>;
	Requires<[In64BitMode]>;			def : Pat<(and (srl_su GR32:$src, (i8 8)), (i32 255)),
	def : Pat<(srl (and_su GR32:$src, 0xff00), (i8 8)),			(MOVZX32_NOREXrr8
	(MOVZX32_NOREXrr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src,			(EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),
	GR32_ABCD)),			sub_8bit_hi))>,
	sub_8bit_hi))>,			Requires<[In64BitMode]>;
	Requires<[In64BitMode]>;			def : Pat<(srl (and_su GR32:$src, 0xff00), (i8 8)),
	def : Pat<(srl GR16:$src, (i8 8)),			(MOVZX32_NOREXrr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src,
	(EXTRACT_SUBREG			GR32_ABCD)),
	(MOVZX32_NOREXrr8			sub_8bit_hi))>,
	(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),			Requires<[In64BitMode]>;
	sub_8bit_hi)),			def : Pat<(srl GR16:$src, (i8 8)),
	sub_16bit)>,			(EXTRACT_SUBREG
	Requires<[In64BitMode]>;			(MOVZX32_NOREXrr8
	def : Pat<(i32 (zext (srl_su GR16:$src, (i8 8)))),			(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
	(MOVZX32_NOREXrr8			sub_8bit_hi)),
	(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),			sub_16bit)>,
	sub_8bit_hi))>,			Requires<[In64BitMode]>;
	Requires<[In64BitMode]>;			def : Pat<(i32 (zext (srl_su GR16:$src, (i8 8)))),
	def : Pat<(i32 (anyext (srl_su GR16:$src, (i8 8)))),			(MOVZX32_NOREXrr8
	(MOVZX32_NOREXrr8			(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
	(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),			sub_8bit_hi))>,
	sub_8bit_hi))>,			Requires<[In64BitMode]>;
	Requires<[In64BitMode]>;			def : Pat<(i32 (anyext (srl_su GR16:$src, (i8 8)))),
	def : Pat<(i64 (zext (srl_su GR16:$src, (i8 8)))),			(MOVZX32_NOREXrr8
	(SUBREG_TO_REG			(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
	(i64 0),			sub_8bit_hi))>,
	(MOVZX32_NOREXrr8			Requires<[In64BitMode]>;
	(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),			def : Pat<(i64 (zext (srl_su GR16:$src, (i8 8)))),
	sub_8bit_hi)),			(SUBREG_TO_REG
	sub_32bit)>;			(i64 0),
	def : Pat<(i64 (anyext (srl_su GR16:$src, (i8 8)))),			(MOVZX32_NOREXrr8
	(SUBREG_TO_REG			(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
	(i64 0),			sub_8bit_hi)),
	(MOVZX32_NOREXrr8			sub_32bit)>;
	(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),			def : Pat<(i64 (anyext (srl_su GR16:$src, (i8 8)))),
	sub_8bit_hi)),			(SUBREG_TO_REG
	sub_32bit)>;			(i64 0),
				(MOVZX32_NOREXrr8
	// h-register extract and store.			(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
	def : Pat<(store (i8 (trunc_su (srl_su GR64:$src, (i8 8)))), addr:$dst),			sub_8bit_hi)),
	(MOV8mr_NOREX			sub_32bit)>;
	addr:$dst,
	(EXTRACT_SUBREG (i64 (COPY_TO_REGCLASS GR64:$src, GR64_ABCD)),			// h-register extract and store.
	sub_8bit_hi))>;			def : Pat<(store (i8 (trunc_su (srl_su GR64:$src, (i8 8)))), addr:$dst),
	def : Pat<(store (i8 (trunc_su (srl_su GR32:$src, (i8 8)))), addr:$dst),			(MOV8mr_NOREX
	(MOV8mr_NOREX			addr:$dst,
	addr:$dst,			(EXTRACT_SUBREG (i64 (COPY_TO_REGCLASS GR64:$src, GR64_ABCD)),
	(EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),			sub_8bit_hi))>;
	sub_8bit_hi))>,			def : Pat<(store (i8 (trunc_su (srl_su GR32:$src, (i8 8)))), addr:$dst),
	Requires<[In64BitMode]>;			(MOV8mr_NOREX
	def : Pat<(store (i8 (trunc_su (srl_su GR16:$src, (i8 8)))), addr:$dst),			addr:$dst,
	(MOV8mr_NOREX			(EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),
	addr:$dst,			sub_8bit_hi))>,
	(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),			Requires<[In64BitMode]>;
	sub_8bit_hi))>,			def : Pat<(store (i8 (trunc_su (srl_su GR16:$src, (i8 8)))), addr:$dst),
	Requires<[In64BitMode]>;			(MOV8mr_NOREX
				addr:$dst,
				(EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
	// (shl x, 1) ==> (add x, x)			sub_8bit_hi))>,
	// Note that if x is undef (immediate or otherwise), we could theoretically			Requires<[In64BitMode]>;
	// end up with the two uses of x getting different values, producing a result
	// where the least significant bit is not 0. However, the probability of this
	// happening is considered low enough that this is officially not a			// (shl x, 1) ==> (add x, x)
	// "real problem".			// Note that if x is undef (immediate or otherwise), we could theoretically
	def : Pat<(shl GR8 :$src1, (i8 1)), (ADD8rr GR8 :$src1, GR8 :$src1)>;			// end up with the two uses of x getting different values, producing a result
	def : Pat<(shl GR16:$src1, (i8 1)), (ADD16rr GR16:$src1, GR16:$src1)>;			// where the least significant bit is not 0. However, the probability of this
	def : Pat<(shl GR32:$src1, (i8 1)), (ADD32rr GR32:$src1, GR32:$src1)>;			// happening is considered low enough that this is officially not a
	def : Pat<(shl GR64:$src1, (i8 1)), (ADD64rr GR64:$src1, GR64:$src1)>;			// "real problem".
				def : Pat<(shl GR8 :$src1, (i8 1)), (ADD8rr GR8 :$src1, GR8 :$src1)>;
	// Helper imms that check if a mask doesn't change significant shift bits.			def : Pat<(shl GR16:$src1, (i8 1)), (ADD16rr GR16:$src1, GR16:$src1)>;
	def immShift32 : ImmLeaf<i8, [{ return CountTrailingOnes_32(Imm) >= 5; }]>;			def : Pat<(shl GR32:$src1, (i8 1)), (ADD32rr GR32:$src1, GR32:$src1)>;
	def immShift64 : ImmLeaf<i8, [{ return CountTrailingOnes_32(Imm) >= 6; }]>;			def : Pat<(shl GR64:$src1, (i8 1)), (ADD64rr GR64:$src1, GR64:$src1)>;

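As a rough illustration of the helper immediates and the masked-shift patterns around this point, the C++ sketch below checks when an explicit `and` on the shift count is redundant. It is not LLVM code: `countTrailingOnes`, `maskKeepsShiftBits32`, `maskKeepsShiftBits64`, `hardwareShl32` and `checkRedundantMask` are hypothetical names, and the sketch assumes the usual x86 behaviour that shifts of 32 bits or narrower use only the low 5 bits of CL while 64-bit shifts use the low 6 bits.

```cpp
#include <cassert>
#include <cstdint>

// Count consecutive 1 bits starting at bit 0. Hypothetical stand-in for the
// CountTrailingOnes_32 call used by the ImmLeaf predicates.
inline unsigned countTrailingOnes(uint32_t v) {
  unsigned n = 0;
  while (v & 1u) {
    ++n;
    v >>= 1;
  }
  return n;
}

// Mirrors the immShift32 / immShift64 conditions: the mask must keep every
// bit of the shift count that the hardware actually looks at.
inline bool maskKeepsShiftBits32(uint8_t imm) {
  return countTrailingOnes(imm) >= 5;
}
inline bool maskKeepsShiftBits64(uint8_t imm) {
  return countTrailingOnes(imm) >= 6;
}

// Assumed model of a 32-bit x86 shift-left: the count is masked to 5 bits.
inline uint32_t hardwareShl32(uint32_t x, uint8_t cl) {
  return x << (cl & 31u);
}

// For a qualifying mask, shifting by (cl & mask) and shifting by cl give the
// same result for every possible 8-bit count, so the `and` can be dropped.
inline void checkRedundantMask(uint8_t mask) {
  assert(maskKeepsShiftBits32(mask));
  for (unsigned cl = 0; cl < 256; ++cl) {
    uint8_t masked = static_cast<uint8_t>(cl & mask);
    assert(hardwareShl32(0x12345678u, masked) ==
           hardwareShl32(0x12345678u, static_cast<uint8_t>(cl)));
  }
}
```

Calling `checkRedundantMask(31)` or `checkRedundantMask(0x3F)` passes under these assumptions, whereas a mask such as `0x0F` fails the predicate because it clears a bit the shifter would otherwise read.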
	// Shift amount is implicitly masked.			// Helper imms that check if a mask doesn't change significant shift bits.
	multiclass MaskedShiftAmountPats<SDNode frag, string name> {			def immShift32 : ImmLeaf<i8, [{ return CountTrailingOnes_32(Imm) >= 5; }]>;
	// (shift x (and y, 31)) ==> (shift x, y)			def immShift64 : ImmLeaf<i8, [{ return CountTrailingOnes_32(Imm) >= 6; }]>;
	def : Pat<(frag GR8:$src1, (and CL, immShift32)),
	(!cast<Instruction>(name # "8rCL") GR8:$src1)>;			// Shift amount is implicitly masked.
	def : Pat<(frag GR16:$src1, (and CL, immShift32)),			multiclass MaskedShiftAmountPats<SDNode frag, string name> {
	(!cast<Instruction>(name # "16rCL") GR16:$src1)>;			// (shift x (and y, 31)) ==> (shift x, y)
	def : Pat<(frag GR32:$src1, (and CL, immShift32)),			def : Pat<(frag GR8:$src1, (and CL, immShift32)),
	(!cast<Instruction>(name # "32rCL") GR32:$src1)>;			(!cast<Instruction>(name # "8rCL") GR8:$src1)>;
	def : Pat<(store (frag (loadi8 addr:$dst), (and CL, immShift32)), addr:$dst),			def : Pat<(frag GR16:$src1, (and CL, immShift32)),
	(!cast<Instruction>(name # "8mCL") addr:$dst)>;			(!cast<Instruction>(name # "16rCL") GR16:$src1)>;
	def : Pat<(store (frag (loadi16 addr:$dst), (and CL, immShift32)), addr:$dst),			def : Pat<(frag GR32:$src1, (and CL, immShift32)),
	(!cast<Instruction>(name # "16mCL") addr:$dst)>;			(!cast<Instruction>(name # "32rCL") GR32:$src1)>;
	def : Pat<(store (frag (loadi32 addr:$dst), (and CL, immShift32)), addr:$dst),			def : Pat<(store (frag (loadi8 addr:$dst), (and CL, immShift32)), addr:$dst),
	(!cast<Instruction>(name # "32mCL") addr:$dst)>;			(!cast<Instruction>(name # "8mCL") addr:$dst)>;
				def : Pat<(store (frag (loadi16 addr:$dst), (and CL, immShift32)), addr:$dst),
	// (shift x (and y, 63)) ==> (shift x, y)			(!cast<Instruction>(name # "16mCL") addr:$dst)>;
	def : Pat<(frag GR64:$src1, (and CL, immShift64)),			def : Pat<(store (frag (loadi32 addr:$dst), (and CL, immShift32)), addr:$dst),
	(!cast<Instruction>(name # "64rCL") GR64:$src1)>;			(!cast<Instruction>(name # "32mCL") addr:$dst)>;
	def : Pat<(store (frag (loadi64 addr:$dst), (and CL, 63)), addr:$dst),
	(!cast<Instruction>(name # "64mCL") addr:$dst)>;			// (shift x (and y, 63)) ==> (shift x, y)
	}			def : Pat<(frag GR64:$src1, (and CL, immShift64)),
				(!cast<Instruction>(name # "64rCL") GR64:$src1)>;
	defm : MaskedShiftAmountPats<shl, "SHL">;			def : Pat<(store (frag (loadi64 addr:$dst), (and CL, 63)), addr:$dst),
	defm : MaskedShiftAmountPats<srl, "SHR">;			(!cast<Instruction>(name # "64mCL") addr:$dst)>;
	defm : MaskedShiftAmountPats<sra, "SAR">;			}
	defm : MaskedShiftAmountPats<rotl, "ROL">;
	defm : MaskedShiftAmountPats<rotr, "ROR">;			defm : MaskedShiftAmountPats<shl, "SHL">;
				defm : MaskedShiftAmountPats<srl, "SHR">;
	// (anyext (setcc_carry)) -> (setcc_carry)			defm : MaskedShiftAmountPats<sra, "SAR">;
	def : Pat<(i16 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),			defm : MaskedShiftAmountPats<rotl, "ROL">;
	(SETB_C16r)>;			defm : MaskedShiftAmountPats<rotr, "ROR">;
	def : Pat<(i32 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
	(SETB_C32r)>;			// (anyext (setcc_carry)) -> (setcc_carry)
	def : Pat<(i32 (anyext (i16 (X86setcc_c X86_COND_B, EFLAGS)))),			def : Pat<(i16 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
	(SETB_C32r)>;			(SETB_C16r)>;
				def : Pat<(i32 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
				(SETB_C32r)>;
				def : Pat<(i32 (anyext (i16 (X86setcc_c X86_COND_B, EFLAGS)))),
				(SETB_C32r)>;
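The "odd encoding trick" peepholes earlier in this section rely on two facts: -128 fits in a sign-extended 8-bit immediate while +128 does not, and adding 128 produces the same value as subtracting -128. A minimal C++ sketch of that reasoning follows. The helper names (`fitsInSExt8`, `fitsInSExt32`, `addImm`, `subImm`, `checkEncodingTrick`) are hypothetical and not part of LLVM, and the sketch models only the result value, not EFLAGS.

```cpp
#include <cassert>
#include <cstdint>

// True if v can be encoded as a sign-extended 8-bit immediate.
inline bool fitsInSExt8(int64_t v) { return v >= -128 && v <= 127; }

// True if v can be encoded as a sign-extended 32-bit immediate.
inline bool fitsInSExt32(int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// 32-bit add/sub with wrap-around, as a register-immediate instruction
// would compute the result value.
inline uint32_t addImm(uint32_t x, int32_t imm) {
  return x + static_cast<uint32_t>(imm);
}
inline uint32_t subImm(uint32_t x, int32_t imm) {
  return x - static_cast<uint32_t>(imm);
}

inline void checkEncodingTrick() {
  // +128 needs a wider immediate; -128 does not.
  assert(!fitsInSExt8(128) && fitsInSExt8(-128));
  // The same asymmetry holds one level up, for 32-bit immediate fields.
  assert(!fitsInSExt32(INT64_C(0x80000000)));
  assert(fitsInSExt32(-INT64_C(0x80000000)));
  // add x, 128 and sub x, -128 agree on the result for every sample.
  const uint32_t samples[] = {0u, 1u, 0x7FFFFFFFu, 0xFFFFFF80u, 0xFFFFFFFFu};
  for (uint32_t x : samples)
    assert(addImm(x, 128) == subImm(x, -128));
}
```

The design point is simply that the rewritten form uses the shorter immediate encoding while computing the same value.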
	//===----------------------------------------------------------------------===//
	// EFLAGS-defining Patterns
	//===----------------------------------------------------------------------===//

	// add reg, reg			//===----------------------------------------------------------------------===//
	def : Pat<(add GR8 :$src1, GR8 :$src2), (ADD8rr GR8 :$src1, GR8 :$src2)>;			// EFLAGS-defining Patterns
	def : Pat<(add GR16:$src1, GR16:$src2), (ADD16rr GR16:$src1, GR16:$src2)>;			//===----------------------------------------------------------------------===//
	def : Pat<(add GR32:$src1, GR32:$src2), (ADD32rr GR32:$src1, GR32:$src2)>;
				// add reg, reg
	// add reg, mem			def : Pat<(add GR8 :$src1, GR8 :$src2), (ADD8rr GR8 :$src1, GR8 :$src2)>;
	def : Pat<(add GR8:$src1, (loadi8 addr:$src2)),			def : Pat<(add GR16:$src1, GR16:$src2), (ADD16rr GR16:$src1, GR16:$src2)>;
	(ADD8rm GR8:$src1, addr:$src2)>;			def : Pat<(add GR32:$src1, GR32:$src2), (ADD32rr GR32:$src1, GR32:$src2)>;
	def : Pat<(add GR16:$src1, (loadi16 addr:$src2)),
	(ADD16rm GR16:$src1, addr:$src2)>;			// add reg, mem
	def : Pat<(add GR32:$src1, (loadi32 addr:$src2)),			def : Pat<(add GR8:$src1, (loadi8 addr:$src2)),
	(ADD32rm GR32:$src1, addr:$src2)>;			(ADD8rm GR8:$src1, addr:$src2)>;
				def : Pat<(add GR16:$src1, (loadi16 addr:$src2)),
	// add reg, imm			(ADD16rm GR16:$src1, addr:$src2)>;
	def : Pat<(add GR8 :$src1, imm:$src2), (ADD8ri GR8:$src1 , imm:$src2)>;			def : Pat<(add GR32:$src1, (loadi32 addr:$src2)),
	def : Pat<(add GR16:$src1, imm:$src2), (ADD16ri GR16:$src1, imm:$src2)>;			(ADD32rm GR32:$src1, addr:$src2)>;
	def : Pat<(add GR32:$src1, imm:$src2), (ADD32ri GR32:$src1, imm:$src2)>;
	def : Pat<(add GR16:$src1, i16immSExt8:$src2),			// add reg, imm
	(ADD16ri8 GR16:$src1, i16immSExt8:$src2)>;			def : Pat<(add GR8 :$src1, imm:$src2), (ADD8ri GR8:$src1 , imm:$src2)>;
	def : Pat<(add GR32:$src1, i32immSExt8:$src2),			def : Pat<(add GR16:$src1, imm:$src2), (ADD16ri GR16:$src1, imm:$src2)>;
	(ADD32ri8 GR32:$src1, i32immSExt8:$src2)>;			def : Pat<(add GR32:$src1, imm:$src2), (ADD32ri GR32:$src1, imm:$src2)>;
				def : Pat<(add GR16:$src1, i16immSExt8:$src2),
	// sub reg, reg			(ADD16ri8 GR16:$src1, i16immSExt8:$src2)>;
	def : Pat<(sub GR8 :$src1, GR8 :$src2), (SUB8rr GR8 :$src1, GR8 :$src2)>;			def : Pat<(add GR32:$src1, i32immSExt8:$src2),
	def : Pat<(sub GR16:$src1, GR16:$src2), (SUB16rr GR16:$src1, GR16:$src2)>;			(ADD32ri8 GR32:$src1, i32immSExt8:$src2)>;
	def : Pat<(sub GR32:$src1, GR32:$src2), (SUB32rr GR32:$src1, GR32:$src2)>;
				// sub reg, reg
	// sub reg, mem			def : Pat<(sub GR8 :$src1, GR8 :$src2), (SUB8rr GR8 :$src1, GR8 :$src2)>;
	def : Pat<(sub GR8:$src1, (loadi8 addr:$src2)),			def : Pat<(sub GR16:$src1, GR16:$src2), (SUB16rr GR16:$src1, GR16:$src2)>;
	(SUB8rm GR8:$src1, addr:$src2)>;			def : Pat<(sub GR32:$src1, GR32:$src2), (SUB32rr GR32:$src1, GR32:$src2)>;
	def : Pat<(sub GR16:$src1, (loadi16 addr:$src2)),
	(SUB16rm GR16:$src1, addr:$src2)>;			// sub reg, mem
	def : Pat<(sub GR32:$src1, (loadi32 addr:$src2)),			def : Pat<(sub GR8:$src1, (loadi8 addr:$src2)),
	(SUB32rm GR32:$src1, addr:$src2)>;			(SUB8rm GR8:$src1, addr:$src2)>;
				def : Pat<(sub GR16:$src1, (loadi16 addr:$src2)),
	// sub reg, imm			(SUB16rm GR16:$src1, addr:$src2)>;
	def : Pat<(sub GR8:$src1, imm:$src2),			def : Pat<(sub GR32:$src1, (loadi32 addr:$src2)),
	(SUB8ri GR8:$src1, imm:$src2)>;			(SUB32rm GR32:$src1, addr:$src2)>;
	def : Pat<(sub GR16:$src1, imm:$src2),
	(SUB16ri GR16:$src1, imm:$src2)>;			// sub reg, imm
	def : Pat<(sub GR32:$src1, imm:$src2),			def : Pat<(sub GR8:$src1, imm:$src2),
	(SUB32ri GR32:$src1, imm:$src2)>;			(SUB8ri GR8:$src1, imm:$src2)>;
	def : Pat<(sub GR16:$src1, i16immSExt8:$src2),			def : Pat<(sub GR16:$src1, imm:$src2),
	(SUB16ri8 GR16:$src1, i16immSExt8:$src2)>;			(SUB16ri GR16:$src1, imm:$src2)>;
	def : Pat<(sub GR32:$src1, i32immSExt8:$src2),			def : Pat<(sub GR32:$src1, imm:$src2),
	(SUB32ri8 GR32:$src1, i32immSExt8:$src2)>;			(SUB32ri GR32:$src1, imm:$src2)>;
				def : Pat<(sub GR16:$src1, i16immSExt8:$src2),
	// sub 0, reg			(SUB16ri8 GR16:$src1, i16immSExt8:$src2)>;
	def : Pat<(X86sub_flag 0, GR8 :$src), (NEG8r GR8 :$src)>;			def : Pat<(sub GR32:$src1, i32immSExt8:$src2),
	def : Pat<(X86sub_flag 0, GR16:$src), (NEG16r GR16:$src)>;			(SUB32ri8 GR32:$src1, i32immSExt8:$src2)>;
	def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
	def : Pat<(X86sub_flag 0, GR64:$src), (NEG64r GR64:$src)>;			// sub 0, reg
				def : Pat<(X86sub_flag 0, GR8 :$src), (NEG8r GR8 :$src)>;
	// mul reg, reg			def : Pat<(X86sub_flag 0, GR16:$src), (NEG16r GR16:$src)>;
	def : Pat<(mul GR16:$src1, GR16:$src2),			def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
	(IMUL16rr GR16:$src1, GR16:$src2)>;			def : Pat<(X86sub_flag 0, GR64:$src), (NEG64r GR64:$src)>;
	def : Pat<(mul GR32:$src1, GR32:$src2),
	(IMUL32rr GR32:$src1, GR32:$src2)>;			// mul reg, reg
				def : Pat<(mul GR16:$src1, GR16:$src2),
	// mul reg, mem			(IMUL16rr GR16:$src1, GR16:$src2)>;
	def : Pat<(mul GR16:$src1, (loadi16 addr:$src2)),			def : Pat<(mul GR32:$src1, GR32:$src2),
	(IMUL16rm GR16:$src1, addr:$src2)>;			(IMUL32rr GR32:$src1, GR32:$src2)>;
	def : Pat<(mul GR32:$src1, (loadi32 addr:$src2)),
	(IMUL32rm GR32:$src1, addr:$src2)>;			// mul reg, mem
				def : Pat<(mul GR16:$src1, (loadi16 addr:$src2)),
	// mul reg, imm			(IMUL16rm GR16:$src1, addr:$src2)>;
	def : Pat<(mul GR16:$src1, imm:$src2),			def : Pat<(mul GR32:$src1, (loadi32 addr:$src2)),
	(IMUL16rri GR16:$src1, imm:$src2)>;			(IMUL32rm GR32:$src1, addr:$src2)>;
	def : Pat<(mul GR32:$src1, imm:$src2),
	(IMUL32rri GR32:$src1, imm:$src2)>;			// mul reg, imm
	def : Pat<(mul GR16:$src1, i16immSExt8:$src2),			def : Pat<(mul GR16:$src1, imm:$src2),
	(IMUL16rri8 GR16:$src1, i16immSExt8:$src2)>;			(IMUL16rri GR16:$src1, imm:$src2)>;
	def : Pat<(mul GR32:$src1, i32immSExt8:$src2),			def : Pat<(mul GR32:$src1, imm:$src2),
	(IMUL32rri8 GR32:$src1, i32immSExt8:$src2)>;			(IMUL32rri GR32:$src1, imm:$src2)>;
				def : Pat<(mul GR16:$src1, i16immSExt8:$src2),
	// reg = mul mem, imm			(IMUL16rri8 GR16:$src1, i16immSExt8:$src2)>;
	def : Pat<(mul (loadi16 addr:$src1), imm:$src2),			def : Pat<(mul GR32:$src1, i32immSExt8:$src2),
	(IMUL16rmi addr:$src1, imm:$src2)>;			(IMUL32rri8 GR32:$src1, i32immSExt8:$src2)>;
	def : Pat<(mul (loadi32 addr:$src1), imm:$src2),
	(IMUL32rmi addr:$src1, imm:$src2)>;			// reg = mul mem, imm
	def : Pat<(mul (loadi16 addr:$src1), i16immSExt8:$src2),			def : Pat<(mul (loadi16 addr:$src1), imm:$src2),
	(IMUL16rmi8 addr:$src1, i16immSExt8:$src2)>;			(IMUL16rmi addr:$src1, imm:$src2)>;
	def : Pat<(mul (loadi32 addr:$src1), i32immSExt8:$src2),			def : Pat<(mul (loadi32 addr:$src1), imm:$src2),
	(IMUL32rmi8 addr:$src1, i32immSExt8:$src2)>;			(IMUL32rmi addr:$src1, imm:$src2)>;
				def : Pat<(mul (loadi16 addr:$src1), i16immSExt8:$src2),
	// Patterns for nodes that do not produce flags, for instructions that do.			(IMUL16rmi8 addr:$src1, i16immSExt8:$src2)>;
				def : Pat<(mul (loadi32 addr:$src1), i32immSExt8:$src2),
	// addition			(IMUL32rmi8 addr:$src1, i32immSExt8:$src2)>;
	def : Pat<(add GR64:$src1, GR64:$src2),
	(ADD64rr GR64:$src1, GR64:$src2)>;			// Patterns for nodes that do not produce flags, for instructions that do.
	def : Pat<(add GR64:$src1, i64immSExt8:$src2),
	(ADD64ri8 GR64:$src1, i64immSExt8:$src2)>;			// addition
	def : Pat<(add GR64:$src1, i64immSExt32:$src2),			def : Pat<(add GR64:$src1, GR64:$src2),
	(ADD64ri32 GR64:$src1, i64immSExt32:$src2)>;			(ADD64rr GR64:$src1, GR64:$src2)>;
	def : Pat<(add GR64:$src1, (loadi64 addr:$src2)),			def : Pat<(add GR64:$src1, i64immSExt8:$src2),
	(ADD64rm GR64:$src1, addr:$src2)>;			(ADD64ri8 GR64:$src1, i64immSExt8:$src2)>;
				def : Pat<(add GR64:$src1, i64immSExt32:$src2),
	// subtraction			(ADD64ri32 GR64:$src1, i64immSExt32:$src2)>;
	def : Pat<(sub GR64:$src1, GR64:$src2),			def : Pat<(add GR64:$src1, (loadi64 addr:$src2)),
	(SUB64rr GR64:$src1, GR64:$src2)>;			(ADD64rm GR64:$src1, addr:$src2)>;
	def : Pat<(sub GR64:$src1, (loadi64 addr:$src2)),
	(SUB64rm GR64:$src1, addr:$src2)>;			// subtraction
	def : Pat<(sub GR64:$src1, i64immSExt8:$src2),			def : Pat<(sub GR64:$src1, GR64:$src2),
	(SUB64ri8 GR64:$src1, i64immSExt8:$src2)>;			(SUB64rr GR64:$src1, GR64:$src2)>;
	def : Pat<(sub GR64:$src1, i64immSExt32:$src2),			def : Pat<(sub GR64:$src1, (loadi64 addr:$src2)),
	(SUB64ri32 GR64:$src1, i64immSExt32:$src2)>;			(SUB64rm GR64:$src1, addr:$src2)>;
				def : Pat<(sub GR64:$src1, i64immSExt8:$src2),
	// Multiply			(SUB64ri8 GR64:$src1, i64immSExt8:$src2)>;
	def : Pat<(mul GR64:$src1, GR64:$src2),			def : Pat<(sub GR64:$src1, i64immSExt32:$src2),
	(IMUL64rr GR64:$src1, GR64:$src2)>;			(SUB64ri32 GR64:$src1, i64immSExt32:$src2)>;
	def : Pat<(mul GR64:$src1, (loadi64 addr:$src2)),
	(IMUL64rm GR64:$src1, addr:$src2)>;			// Multiply
	def : Pat<(mul GR64:$src1, i64immSExt8:$src2),			def : Pat<(mul GR64:$src1, GR64:$src2),
	(IMUL64rri8 GR64:$src1, i64immSExt8:$src2)>;			(IMUL64rr GR64:$src1, GR64:$src2)>;
	def : Pat<(mul GR64:$src1, i64immSExt32:$src2),			def : Pat<(mul GR64:$src1, (loadi64 addr:$src2)),
	(IMUL64rri32 GR64:$src1, i64immSExt32:$src2)>;			(IMUL64rm GR64:$src1, addr:$src2)>;
	def : Pat<(mul (loadi64 addr:$src1), i64immSExt8:$src2),			def : Pat<(mul GR64:$src1, i64immSExt8:$src2),
	(IMUL64rmi8 addr:$src1, i64immSExt8:$src2)>;			(IMUL64rri8 GR64:$src1, i64immSExt8:$src2)>;
	def : Pat<(mul (loadi64 addr:$src1), i64immSExt32:$src2),			def : Pat<(mul GR64:$src1, i64immSExt32:$src2),
	(IMUL64rmi32 addr:$src1, i64immSExt32:$src2)>;			(IMUL64rri32 GR64:$src1, i64immSExt32:$src2)>;
				def : Pat<(mul (loadi64 addr:$src1), i64immSExt8:$src2),
	// Increment/Decrement reg.			(IMUL64rmi8 addr:$src1, i64immSExt8:$src2)>;
	// Do not make INC/DEC if it is slow			def : Pat<(mul (loadi64 addr:$src1), i64immSExt32:$src2),
	let Predicates = [NotSlowIncDec] in {			(IMUL64rmi32 addr:$src1, i64immSExt32:$src2)>;
	def : Pat<(add GR8:$src, 1), (INC8r GR8:$src)>;
	def : Pat<(add GR16:$src, 1), (INC16r GR16:$src)>;			// Increment/Decrement reg.
	def : Pat<(add GR32:$src, 1), (INC32r GR32:$src)>;			// Do not make INC/DEC if it is slow
	def : Pat<(add GR64:$src, 1), (INC64r GR64:$src)>;			let Predicates = [NotSlowIncDec] in {
	def : Pat<(add GR8:$src, -1), (DEC8r GR8:$src)>;			def : Pat<(add GR8:$src, 1), (INC8r GR8:$src)>;
	def : Pat<(add GR16:$src, -1), (DEC16r GR16:$src)>;			def : Pat<(add GR16:$src, 1), (INC16r GR16:$src)>;
	def : Pat<(add GR32:$src, -1), (DEC32r GR32:$src)>;			def : Pat<(add GR32:$src, 1), (INC32r GR32:$src)>;
	def : Pat<(add GR64:$src, -1), (DEC64r GR64:$src)>;			def : Pat<(add GR64:$src, 1), (INC64r GR64:$src)>;
	}			def : Pat<(add GR8:$src, -1), (DEC8r GR8:$src)>;
				def : Pat<(add GR16:$src, -1), (DEC16r GR16:$src)>;
	// or reg/reg.			def : Pat<(add GR32:$src, -1), (DEC32r GR32:$src)>;
	def : Pat<(or GR8 :$src1, GR8 :$src2), (OR8rr GR8 :$src1, GR8 :$src2)>;			def : Pat<(add GR64:$src, -1), (DEC64r GR64:$src)>;
	def : Pat<(or GR16:$src1, GR16:$src2), (OR16rr GR16:$src1, GR16:$src2)>;			}
	def : Pat<(or GR32:$src1, GR32:$src2), (OR32rr GR32:$src1, GR32:$src2)>;
	def : Pat<(or GR64:$src1, GR64:$src2), (OR64rr GR64:$src1, GR64:$src2)>;			// or reg/reg.
				def : Pat<(or GR8 :$src1, GR8 :$src2), (OR8rr GR8 :$src1, GR8 :$src2)>;
	// or reg/mem			def : Pat<(or GR16:$src1, GR16:$src2), (OR16rr GR16:$src1, GR16:$src2)>;
	def : Pat<(or GR8:$src1, (loadi8 addr:$src2)),			def : Pat<(or GR32:$src1, GR32:$src2), (OR32rr GR32:$src1, GR32:$src2)>;
	(OR8rm GR8:$src1, addr:$src2)>;			def : Pat<(or GR64:$src1, GR64:$src2), (OR64rr GR64:$src1, GR64:$src2)>;
	def : Pat<(or GR16:$src1, (loadi16 addr:$src2)),
	(OR16rm GR16:$src1, addr:$src2)>;			// or reg/mem
	def : Pat<(or GR32:$src1, (loadi32 addr:$src2)),			def : Pat<(or GR8:$src1, (loadi8 addr:$src2)),
	(OR32rm GR32:$src1, addr:$src2)>;			(OR8rm GR8:$src1, addr:$src2)>;
	def : Pat<(or GR64:$src1, (loadi64 addr:$src2)),			def : Pat<(or GR16:$src1, (loadi16 addr:$src2)),
	(OR64rm GR64:$src1, addr:$src2)>;			(OR16rm GR16:$src1, addr:$src2)>;
				def : Pat<(or GR32:$src1, (loadi32 addr:$src2)),
	// or reg/imm			(OR32rm GR32:$src1, addr:$src2)>;
	def : Pat<(or GR8:$src1 , imm:$src2), (OR8ri GR8 :$src1, imm:$src2)>;			def : Pat<(or GR64:$src1, (loadi64 addr:$src2)),
	def : Pat<(or GR16:$src1, imm:$src2), (OR16ri GR16:$src1, imm:$src2)>;			(OR64rm GR64:$src1, addr:$src2)>;
	def : Pat<(or GR32:$src1, imm:$src2), (OR32ri GR32:$src1, imm:$src2)>;
	def : Pat<(or GR16:$src1, i16immSExt8:$src2),			// or reg/imm
	(OR16ri8 GR16:$src1, i16immSExt8:$src2)>;			def : Pat<(or GR8:$src1 , imm:$src2), (OR8ri GR8 :$src1, imm:$src2)>;
	def : Pat<(or GR32:$src1, i32immSExt8:$src2),			def : Pat<(or GR16:$src1, imm:$src2), (OR16ri GR16:$src1, imm:$src2)>;
	(OR32ri8 GR32:$src1, i32immSExt8:$src2)>;			def : Pat<(or GR32:$src1, imm:$src2), (OR32ri GR32:$src1, imm:$src2)>;
	def : Pat<(or GR64:$src1, i64immSExt8:$src2),			def : Pat<(or GR16:$src1, i16immSExt8:$src2),
	(OR64ri8 GR64:$src1, i64immSExt8:$src2)>;			(OR16ri8 GR16:$src1, i16immSExt8:$src2)>;
	def : Pat<(or GR64:$src1, i64immSExt32:$src2),			def : Pat<(or GR32:$src1, i32immSExt8:$src2),
	(OR64ri32 GR64:$src1, i64immSExt32:$src2)>;			(OR32ri8 GR32:$src1, i32immSExt8:$src2)>;
				def : Pat<(or GR64:$src1, i64immSExt8:$src2),
	// xor reg/reg			(OR64ri8 GR64:$src1, i64immSExt8:$src2)>;
	def : Pat<(xor GR8 :$src1, GR8 :$src2), (XOR8rr GR8 :$src1, GR8 :$src2)>;			def : Pat<(or GR64:$src1, i64immSExt32:$src2),
	def : Pat<(xor GR16:$src1, GR16:$src2), (XOR16rr GR16:$src1, GR16:$src2)>;			(OR64ri32 GR64:$src1, i64immSExt32:$src2)>;
	def : Pat<(xor GR32:$src1, GR32:$src2), (XOR32rr GR32:$src1, GR32:$src2)>;
	def : Pat<(xor GR64:$src1, GR64:$src2), (XOR64rr GR64:$src1, GR64:$src2)>;			// xor reg/reg
				def : Pat<(xor GR8 :$src1, GR8 :$src2), (XOR8rr GR8 :$src1, GR8 :$src2)>;
	// xor reg/mem			def : Pat<(xor GR16:$src1, GR16:$src2), (XOR16rr GR16:$src1, GR16:$src2)>;
	def : Pat<(xor GR8:$src1, (loadi8 addr:$src2)),			def : Pat<(xor GR32:$src1, GR32:$src2), (XOR32rr GR32:$src1, GR32:$src2)>;
	(XOR8rm GR8:$src1, addr:$src2)>;			def : Pat<(xor GR64:$src1, GR64:$src2), (XOR64rr GR64:$src1, GR64:$src2)>;
	def : Pat<(xor GR16:$src1, (loadi16 addr:$src2)),
	(XOR16rm GR16:$src1, addr:$src2)>;			// xor reg/mem
	def : Pat<(xor GR32:$src1, (loadi32 addr:$src2)),			def : Pat<(xor GR8:$src1, (loadi8 addr:$src2)),
	(XOR32rm GR32:$src1, addr:$src2)>;			(XOR8rm GR8:$src1, addr:$src2)>;
	def : Pat<(xor GR64:$src1, (loadi64 addr:$src2)),			def : Pat<(xor GR16:$src1, (loadi16 addr:$src2)),
	(XOR64rm GR64:$src1, addr:$src2)>;			(XOR16rm GR16:$src1, addr:$src2)>;
				def : Pat<(xor GR32:$src1, (loadi32 addr:$src2)),
	// xor reg/imm			(XOR32rm GR32:$src1, addr:$src2)>;
	def : Pat<(xor GR8:$src1, imm:$src2),			def : Pat<(xor GR64:$src1, (loadi64 addr:$src2)),
	(XOR8ri GR8:$src1, imm:$src2)>;			(XOR64rm GR64:$src1, addr:$src2)>;
	def : Pat<(xor GR16:$src1, imm:$src2),
	(XOR16ri GR16:$src1, imm:$src2)>;			// xor reg/imm
	def : Pat<(xor GR32:$src1, imm:$src2),			def : Pat<(xor GR8:$src1, imm:$src2),
	(XOR32ri GR32:$src1, imm:$src2)>;			(XOR8ri GR8:$src1, imm:$src2)>;
	def : Pat<(xor GR16:$src1, i16immSExt8:$src2),			def : Pat<(xor GR16:$src1, imm:$src2),
	(XOR16ri8 GR16:$src1, i16immSExt8:$src2)>;			(XOR16ri GR16:$src1, imm:$src2)>;
	def : Pat<(xor GR32:$src1, i32immSExt8:$src2),			def : Pat<(xor GR32:$src1, imm:$src2),
	(XOR32ri8 GR32:$src1, i32immSExt8:$src2)>;			(XOR32ri GR32:$src1, imm:$src2)>;
	def : Pat<(xor GR64:$src1, i64immSExt8:$src2),			def : Pat<(xor GR16:$src1, i16immSExt8:$src2),
	(XOR64ri8 GR64:$src1, i64immSExt8:$src2)>;			(XOR16ri8 GR16:$src1, i16immSExt8:$src2)>;
	def : Pat<(xor GR64:$src1, i64immSExt32:$src2),			def : Pat<(xor GR32:$src1, i32immSExt8:$src2),
	(XOR64ri32 GR64:$src1, i64immSExt32:$src2)>;			(XOR32ri8 GR32:$src1, i32immSExt8:$src2)>;
				def : Pat<(xor GR64:$src1, i64immSExt8:$src2),
	// and reg/reg			(XOR64ri8 GR64:$src1, i64immSExt8:$src2)>;
	def : Pat<(and GR8 :$src1, GR8 :$src2), (AND8rr GR8 :$src1, GR8 :$src2)>;			def : Pat<(xor GR64:$src1, i64immSExt32:$src2),
	def : Pat<(and GR16:$src1, GR16:$src2), (AND16rr GR16:$src1, GR16:$src2)>;			(XOR64ri32 GR64:$src1, i64immSExt32:$src2)>;
	def : Pat<(and GR32:$src1, GR32:$src2), (AND32rr GR32:$src1, GR32:$src2)>;
	def : Pat<(and GR64:$src1, GR64:$src2), (AND64rr GR64:$src1, GR64:$src2)>;			// and reg/reg
				def : Pat<(and GR8 :$src1, GR8 :$src2), (AND8rr GR8 :$src1, GR8 :$src2)>;
	// and reg/mem			def : Pat<(and GR16:$src1, GR16:$src2), (AND16rr GR16:$src1, GR16:$src2)>;
	def : Pat<(and GR8:$src1, (loadi8 addr:$src2)),			def : Pat<(and GR32:$src1, GR32:$src2), (AND32rr GR32:$src1, GR32:$src2)>;
	(AND8rm GR8:$src1, addr:$src2)>;			def : Pat<(and GR64:$src1, GR64:$src2), (AND64rr GR64:$src1, GR64:$src2)>;
	def : Pat<(and GR16:$src1, (loadi16 addr:$src2)),
	(AND16rm GR16:$src1, addr:$src2)>;			// and reg/mem
	def : Pat<(and GR32:$src1, (loadi32 addr:$src2)),			def : Pat<(and GR8:$src1, (loadi8 addr:$src2)),
	(AND32rm GR32:$src1, addr:$src2)>;			(AND8rm GR8:$src1, addr:$src2)>;
	def : Pat<(and GR64:$src1, (loadi64 addr:$src2)),			def : Pat<(and GR16:$src1, (loadi16 addr:$src2)),
	(AND64rm GR64:$src1, addr:$src2)>;			(AND16rm GR16:$src1, addr:$src2)>;
				def : Pat<(and GR32:$src1, (loadi32 addr:$src2)),
	// and reg/imm			(AND32rm GR32:$src1, addr:$src2)>;
	def : Pat<(and GR8:$src1, imm:$src2),			def : Pat<(and GR64:$src1, (loadi64 addr:$src2)),
	(AND8ri GR8:$src1, imm:$src2)>;			(AND64rm GR64:$src1, addr:$src2)>;
	def : Pat<(and GR16:$src1, imm:$src2),
	(AND16ri GR16:$src1, imm:$src2)>;			// and reg/imm
	def : Pat<(and GR32:$src1, imm:$src2),			def : Pat<(and GR8:$src1, imm:$src2),
	(AND32ri GR32:$src1, imm:$src2)>;			(AND8ri GR8:$src1, imm:$src2)>;
	def : Pat<(and GR16:$src1, i16immSExt8:$src2),			def : Pat<(and GR16:$src1, imm:$src2),
	(AND16ri8 GR16:$src1, i16immSExt8:$src2)>;			(AND16ri GR16:$src1, imm:$src2)>;
	def : Pat<(and GR32:$src1, i32immSExt8:$src2),			def : Pat<(and GR32:$src1, imm:$src2),
	(AND32ri8 GR32:$src1, i32immSExt8:$src2)>;			(AND32ri GR32:$src1, imm:$src2)>;
	def : Pat<(and GR64:$src1, i64immSExt8:$src2),			def : Pat<(and GR16:$src1, i16immSExt8:$src2),
	(AND64ri8 GR64:$src1, i64immSExt8:$src2)>;			(AND16ri8 GR16:$src1, i16immSExt8:$src2)>;
	def : Pat<(and GR64:$src1, i64immSExt32:$src2),			def : Pat<(and GR32:$src1, i32immSExt8:$src2),
	(AND64ri32 GR64:$src1, i64immSExt32:$src2)>;			(AND32ri8 GR32:$src1, i32immSExt8:$src2)>;
				def : Pat<(and GR64:$src1, i64immSExt8:$src2),
	// Bit scan instruction patterns to match explicit zero-undef behavior.			(AND64ri8 GR64:$src1, i64immSExt8:$src2)>;
	def : Pat<(cttz_zero_undef GR16:$src), (BSF16rr GR16:$src)>;			def : Pat<(and GR64:$src1, i64immSExt32:$src2),
	def : Pat<(cttz_zero_undef GR32:$src), (BSF32rr GR32:$src)>;			(AND64ri32 GR64:$src1, i64immSExt32:$src2)>;
	def : Pat<(cttz_zero_undef GR64:$src), (BSF64rr GR64:$src)>;
	def : Pat<(cttz_zero_undef (loadi16 addr:$src)), (BSF16rm addr:$src)>;			// Bit scan instruction patterns to match explicit zero-undef behavior.
	def : Pat<(cttz_zero_undef (loadi32 addr:$src)), (BSF32rm addr:$src)>;			def : Pat<(cttz_zero_undef GR16:$src), (BSF16rr GR16:$src)>;
	def : Pat<(cttz_zero_undef (loadi64 addr:$src)), (BSF64rm addr:$src)>;			def : Pat<(cttz_zero_undef GR32:$src), (BSF32rr GR32:$src)>;
				def : Pat<(cttz_zero_undef GR64:$src), (BSF64rr GR64:$src)>;
	// When HasMOVBE is enabled it is possible to get a non-legalized			def : Pat<(cttz_zero_undef (loadi16 addr:$src)), (BSF16rm addr:$src)>;
	// register-register 16 bit bswap. This maps it to a ROL instruction.			def : Pat<(cttz_zero_undef (loadi32 addr:$src)), (BSF32rm addr:$src)>;
	let Predicates = [HasMOVBE] in {			def : Pat<(cttz_zero_undef (loadi64 addr:$src)), (BSF64rm addr:$src)>;
	def : Pat<(bswap GR16:$src), (ROL16ri GR16:$src, (i8 8))>;
	}			// When HasMOVBE is enabled it is possible to get a non-legalized
				// register-register 16 bit bswap. This maps it to a ROL instruction.
				let Predicates = [HasMOVBE] in {
				def : Pat<(bswap GR16:$src), (ROL16ri GR16:$src, (i8 8))>;
				}

llvm/trunk/lib/Target/X86/X86InstrInfo.h

Show First 20 Lines • Show All 169 Lines • ▼ Show 20 Lines	public:
explicit X86InstrInfo(X86Subtarget &STI);		explicit X86InstrInfo(X86Subtarget &STI);

/// getRegisterInfo - TargetInstrInfo is a superset of MRegister info. As		/// getRegisterInfo - TargetInstrInfo is a superset of MRegister info. As
/// such, whenever a client has an instance of instruction info, it should		/// such, whenever a client has an instance of instruction info, it should
/// always be able to get register info as well (through this method).		/// always be able to get register info as well (through this method).
///		///
const X86RegisterInfo &getRegisterInfo() const { return RI; }		const X86RegisterInfo &getRegisterInfo() const { return RI; }

		/// getSPAdjust - This returns the stack pointer adjustment made by
		/// this instruction. For x86, we need to handle more complex call
		/// sequences involving PUSHes.
		int getSPAdjust(const MachineInstr *MI) const override;

/// isCoalescableExtInstr - Return true if the instruction is a "coalescable"		/// isCoalescableExtInstr - Return true if the instruction is a "coalescable"
/// extension instruction. That is, it's like a copy where it's legal for the		/// extension instruction. That is, it's like a copy where it's legal for the
/// source to overlap the destination. e.g. X86::MOVSX64rr32. If this returns		/// source to overlap the destination. e.g. X86::MOVSX64rr32. If this returns
/// true, then it's expected the pre-extension value is available as a subreg		/// true, then it's expected the pre-extension value is available as a subreg
/// of the result register. This also returns the sub-register index in		/// of the result register. This also returns the sub-register index in
/// SubIdx.		/// SubIdx.
bool isCoalescableExtInstr(const MachineInstr &MI,		bool isCoalescableExtInstr(const MachineInstr &MI,
unsigned &SrcReg, unsigned &DstReg,		unsigned &SrcReg, unsigned &DstReg,
▲ Show 20 Lines • Show All 283 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

Show First 20 Lines • Show All 1,798 Lines • ▼ Show 20 Lines	case X86::MOVSX64rr32:
break;		break;
}		}
return true;		return true;
}		}
}		}
return false;		return false;
}		}

		int X86InstrInfo::getSPAdjust(const MachineInstr *MI) const {
		const MachineFunction *MF = MI->getParent()->getParent();
		const TargetFrameLowering *TFI = MF->getSubtarget().getFrameLowering();

		if (MI->getOpcode() == getCallFrameSetupOpcode() ||
		MI->getOpcode() == getCallFrameDestroyOpcode()) {
		unsigned StackAlign = TFI->getStackAlignment();
		int SPAdj = (MI->getOperand(0).getImm() + StackAlign - 1) / StackAlign *
		StackAlign;

		SPAdj -= MI->getOperand(1).getImm();

		if (MI->getOpcode() == getCallFrameSetupOpcode())
		return SPAdj;
		else
		return -SPAdj;
		}

		// To know whether a call adjusts the stack, we need information
		// that is bound to the following ADJCALLSTACKUP pseudo.
		// Look for the next ADJCALLSTACKUP that follows the call.
		if (MI->isCall()) {
		const MachineBasicBlock* MBB = MI->getParent();
		auto I = ++MachineBasicBlock::const_iterator(MI);
		for (auto E = MBB->end(); I != E; ++I) {
		if (I->getOpcode() == getCallFrameDestroyOpcode() ||
		I->isCall())
		break;
		}

		// If we could not find a frame destroy opcode, then it has already
		// been simplified, so we don't care.
		if (I->getOpcode() != getCallFrameDestroyOpcode())
		return 0;

		return -(I->getOperand(1).getImm());
		}

		// Currently handle only PUSHes we can reasonably expect to see
		// in call sequences
		switch (MI->getOpcode()) {
		default:
		return 0;
		case X86::PUSH32i8:
		case X86::PUSH32r:
		case X86::PUSH32rmm:
		case X86::PUSH32rmr:
		case X86::PUSHi32:
		return 4;
		}
		}

/// isFrameOperand - Return true and the FrameIndex if the specified		/// isFrameOperand - Return true and the FrameIndex if the specified
/// operand and follow operands form a reference to the stack frame.		/// operand and follow operands form a reference to the stack frame.
bool X86InstrInfo::isFrameOperand(const MachineInstr *MI, unsigned int Op,		bool X86InstrInfo::isFrameOperand(const MachineInstr *MI, unsigned int Op,
int &FrameIndex) const {		int &FrameIndex) const {
if (MI->getOperand(Op+X86::AddrBaseReg).isFI() &&		if (MI->getOperand(Op+X86::AddrBaseReg).isFI() &&
MI->getOperand(Op+X86::AddrScaleAmt).isImm() &&		MI->getOperand(Op+X86::AddrScaleAmt).isImm() &&
MI->getOperand(Op+X86::AddrIndexReg).isReg() &&		MI->getOperand(Op+X86::AddrIndexReg).isReg() &&
MI->getOperand(Op+X86::AddrDisp).isImm() &&		MI->getOperand(Op+X86::AddrDisp).isImm() &&
▲ Show 20 Lines • Show All 4,265 Lines • Show Last 20 Lines
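
The call-frame branch of getSPAdjust above rounds the first immediate up to the stack alignment, subtracts the second immediate, and flips the sign for the frame-destroy pseudo. The following is a minimal standalone sketch of that arithmetic only. The names `alignUp` and `spAdjust` and the plain integer parameters are hypothetical stand-ins for the MachineInstr operands, not LLVM API, and the meaning of the second immediate is not spelled out in this excerpt, so it is just called `Subtracted` here.

```cpp
#include <cstdint>

// Round Amount up to the next multiple of StackAlign.
// Assumes Amount >= 0 and StackAlign > 0, mirroring the expression
// (Imm + StackAlign - 1) / StackAlign * StackAlign in the patch.
int64_t alignUp(int64_t Amount, int64_t StackAlign) {
  return (Amount + StackAlign - 1) / StackAlign * StackAlign;
}

// Hypothetical model of the call-frame pseudo case:
//   FrameSize  ~ operand 0 of the pseudo
//   Subtracted ~ operand 1 of the pseudo
//   IsSetup    ~ true for the frame-setup opcode, false for frame-destroy
int64_t spAdjust(int64_t FrameSize, int64_t Subtracted, bool IsSetup,
                 int64_t StackAlign) {
  int64_t SPAdj = alignUp(FrameSize, StackAlign) - Subtracted;
  // Setup reports a positive adjustment, destroy the negated one.
  return IsSetup ? SPAdj : -SPAdj;
}
```

With a 16-byte alignment, a 12-byte frame is reported as 16 on setup and -16 on destroy. If the second immediate equals the aligned size, the pseudo itself contributes nothing, which is the situation where the adjustment has to come from the individual PUSH instructions (4 bytes each in the switch above).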

llvm/trunk/lib/Target/X86/X86MachineFunctionInfo.h

Show First 20 Lines • Show All 71 Lines • ▼ Show 20 Lines	class X86MachineFunctionInfo : public MachineFunctionInfo {
unsigned VarArgsGPOffset;		unsigned VarArgsGPOffset;
/// VarArgsFPOffset - X86-64 vararg func fp reg offset.		/// VarArgsFPOffset - X86-64 vararg func fp reg offset.
unsigned VarArgsFPOffset;		unsigned VarArgsFPOffset;
/// ArgumentStackSize - The number of bytes on stack consumed by the arguments		/// ArgumentStackSize - The number of bytes on stack consumed by the arguments
/// being passed on the stack.		/// being passed on the stack.
unsigned ArgumentStackSize;		unsigned ArgumentStackSize;
/// NumLocalDynamics - Number of local-dynamic TLS accesses.		/// NumLocalDynamics - Number of local-dynamic TLS accesses.
unsigned NumLocalDynamics;		unsigned NumLocalDynamics;
		/// HasPushSequences - Keeps track of whether this function uses sequences
		/// of pushes to pass function parameters.
		bool HasPushSequences;

private:		private:
/// ForwardedMustTailRegParms - A list of virtual and physical registers		/// ForwardedMustTailRegParms - A list of virtual and physical registers
/// that must be forwarded to every musttail call.		/// that must be forwarded to every musttail call.
SmallVector<ForwardedRegister, 1> ForwardedMustTailRegParms;		SmallVector<ForwardedRegister, 1> ForwardedMustTailRegParms;

public:		public:
X86MachineFunctionInfo() : ForceFramePointer(false),		X86MachineFunctionInfo() : ForceFramePointer(false),
RestoreBasePointerOffset(0),		RestoreBasePointerOffset(0),
CalleeSavedFrameSize(0),		CalleeSavedFrameSize(0),
BytesToPopOnReturn(0),		BytesToPopOnReturn(0),
ReturnAddrIndex(0),		ReturnAddrIndex(0),
TailCallReturnAddrDelta(0),		TailCallReturnAddrDelta(0),
SRetReturnReg(0),		SRetReturnReg(0),
GlobalBaseReg(0),		GlobalBaseReg(0),
VarArgsFrameIndex(0),		VarArgsFrameIndex(0),
RegSaveFrameIndex(0),		RegSaveFrameIndex(0),
VarArgsGPOffset(0),		VarArgsGPOffset(0),
VarArgsFPOffset(0),		VarArgsFPOffset(0),
ArgumentStackSize(0),		ArgumentStackSize(0),
NumLocalDynamics(0) {}		NumLocalDynamics(0),
		HasPushSequences(false) {}

explicit X86MachineFunctionInfo(MachineFunction &MF)		explicit X86MachineFunctionInfo(MachineFunction &MF)
: ForceFramePointer(false),		: ForceFramePointer(false),
RestoreBasePointerOffset(0),		RestoreBasePointerOffset(0),
CalleeSavedFrameSize(0),		CalleeSavedFrameSize(0),
BytesToPopOnReturn(0),		BytesToPopOnReturn(0),
ReturnAddrIndex(0),		ReturnAddrIndex(0),
TailCallReturnAddrDelta(0),		TailCallReturnAddrDelta(0),
SRetReturnReg(0),		SRetReturnReg(0),
GlobalBaseReg(0),		GlobalBaseReg(0),
VarArgsFrameIndex(0),		VarArgsFrameIndex(0),
RegSaveFrameIndex(0),		RegSaveFrameIndex(0),
VarArgsGPOffset(0),		VarArgsGPOffset(0),
VarArgsFPOffset(0),		VarArgsFPOffset(0),
ArgumentStackSize(0),		ArgumentStackSize(0),
NumLocalDynamics(0) {}		NumLocalDynamics(0),
		HasPushSequences(false) {}

bool getForceFramePointer() const { return ForceFramePointer;}		bool getForceFramePointer() const { return ForceFramePointer;}
void setForceFramePointer(bool forceFP) { ForceFramePointer = forceFP; }		void setForceFramePointer(bool forceFP) { ForceFramePointer = forceFP; }

		bool getHasPushSequences() const { return HasPushSequences; }
		void setHasPushSequences(bool HasPush) { HasPushSequences = HasPush; }

bool getRestoreBasePointer() const { return RestoreBasePointerOffset!=0; }		bool getRestoreBasePointer() const { return RestoreBasePointerOffset!=0; }
void setRestoreBasePointer(const MachineFunction *MF);		void setRestoreBasePointer(const MachineFunction *MF);
int getRestoreBasePointerOffset() const {return RestoreBasePointerOffset; }		int getRestoreBasePointerOffset() const {return RestoreBasePointerOffset; }

unsigned getCalleeSavedFrameSize() const { return CalleeSavedFrameSize; }		unsigned getCalleeSavedFrameSize() const { return CalleeSavedFrameSize; }
void setCalleeSavedFrameSize(unsigned bytes) { CalleeSavedFrameSize = bytes; }		void setCalleeSavedFrameSize(unsigned bytes) { CalleeSavedFrameSize = bytes; }

unsigned getBytesToPopOnReturn() const { return BytesToPopOnReturn; }		unsigned getBytesToPopOnReturn() const { return BytesToPopOnReturn; }
Show All 40 Lines

llvm/trunk/lib/Target/X86/X86RegisterInfo.cpp

Show First 20 Lines • Show All 462 Lines • ▼ Show 20 Lines	bool X86RegisterInfo::hasReservedSpillSlot(const MachineFunction &MF,
// this function neither used nor tested.		// this function neither used nor tested.
llvm_unreachable("Unused function on X86. Otherwise need a test case.");		llvm_unreachable("Unused function on X86. Otherwise need a test case.");
}		}

void		void
X86RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,		X86RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
int SPAdj, unsigned FIOperandNum,		int SPAdj, unsigned FIOperandNum,
RegScavenger *RS) const {		RegScavenger *RS) const {
assert(SPAdj == 0 && "Unexpected");

MachineInstr &MI = *II;		MachineInstr &MI = *II;
MachineFunction &MF = *MI.getParent()->getParent();		MachineFunction &MF = *MI.getParent()->getParent();
const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();		const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();
int FrameIndex = MI.getOperand(FIOperandNum).getIndex();		int FrameIndex = MI.getOperand(FIOperandNum).getIndex();
unsigned BasePtr;		unsigned BasePtr;

unsigned Opc = MI.getOpcode();		unsigned Opc = MI.getOpcode();
bool AfterFPPop = Opc == X86::TAILJMPm64 || Opc == X86::TAILJMPm;		bool AfterFPPop = Opc == X86::TAILJMPm64 || Opc == X86::TAILJMPm;
Show All 20 Lines	X86RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
int FIOffset;		int FIOffset;
if (AfterFPPop) {		if (AfterFPPop) {
// Tail call jmp happens after FP is popped.		// Tail call jmp happens after FP is popped.
const MachineFrameInfo *MFI = MF.getFrameInfo();		const MachineFrameInfo *MFI = MF.getFrameInfo();
FIOffset = MFI->getObjectOffset(FrameIndex) - TFI->getOffsetOfLocalArea();		FIOffset = MFI->getObjectOffset(FrameIndex) - TFI->getOffsetOfLocalArea();
} else		} else
FIOffset = TFI->getFrameIndexOffset(MF, FrameIndex);		FIOffset = TFI->getFrameIndexOffset(MF, FrameIndex);

		if (BasePtr == StackPtr)
		FIOffset += SPAdj;

// The frame index format for stackmaps and patchpoints is different from the		// The frame index format for stackmaps and patchpoints is different from the
// X86 format. It only has a FI and an offset.		// X86 format. It only has a FI and an offset.
if (Opc == TargetOpcode::STACKMAP || Opc == TargetOpcode::PATCHPOINT) {		if (Opc == TargetOpcode::STACKMAP || Opc == TargetOpcode::PATCHPOINT) {
assert(BasePtr == FramePtr && "Expected the FP as base register");		assert(BasePtr == FramePtr && "Expected the FP as base register");
int64_t Offset = MI.getOperand(FIOperandNum + 1).getImm() + FIOffset;		int64_t Offset = MI.getOperand(FIOperandNum + 1).getImm() + FIOffset;
MI.getOperand(FIOperandNum + 1).ChangeToImmediate(Offset);		MI.getOperand(FIOperandNum + 1).ChangeToImmediate(Offset);
return;		return;
}		}
▲ Show 20 Lines • Show All 214 Lines • Show Last 20 Lines
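
The eliminateFrameIndex change above drops the `SPAdj == 0` assertion and instead adds SPAdj to the offset when the base register is the stack pointer. A rough illustration of why, as a standalone sketch: once arguments are pushed, the stack pointer has moved, so a stack-pointer-relative displacement computed for the unadjusted frame must be shifted by the bytes pushed so far, while a frame-pointer-relative one is unaffected. `BaseReg` and `resolveOffset` are hypothetical names used only for this example.

```cpp
// Which register a frame object is addressed from (hypothetical model).
enum class BaseReg { StackPtr, FramePtr };

// Final displacement for a frame object.
//   FIOffset ~ offset computed for the frame with no pending pushes
//   SPAdj    ~ bytes the stack pointer has moved inside the call sequence
int resolveOffset(int FIOffset, BaseReg Base, int SPAdj) {
  // Only stack-pointer-relative addresses depend on pending pushes.
  if (Base == BaseReg::StackPtr)
    FIOffset += SPAdj;
  return FIOffset;
}
```

For example, an object at 8(%esp) before a call sequence is at 12(%esp) after one 4-byte push and at 16(%esp) after two, whereas an object at -8(%ebp) stays at -8(%ebp) throughout.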

llvm/trunk/lib/Target/X86/X86TargetMachine.cpp

Show First 20 Lines • Show All 186 Lines • ▼ Show 20 Lines	public:

const X86Subtarget &getX86Subtarget() const {		const X86Subtarget &getX86Subtarget() const {
return *getX86TargetMachine().getSubtargetImpl();		return *getX86TargetMachine().getSubtargetImpl();
}		}

void addIRPasses() override;		void addIRPasses() override;
bool addInstSelector() override;		bool addInstSelector() override;
bool addILPOpts() override;		bool addILPOpts() override;
		void addPreRegAlloc() override;
void addPostRegAlloc() override;		void addPostRegAlloc() override;
void addPreEmitPass() override;		void addPreEmitPass() override;
};		};
} // namespace		} // namespace

TargetPassConfig *X86TargetMachine::createPassConfig(PassManagerBase &PM) {		TargetPassConfig *X86TargetMachine::createPassConfig(PassManagerBase &PM) {
return new X86PassConfig(this, PM);		return new X86PassConfig(this, PM);
}		}
Show All 17 Lines	bool X86PassConfig::addInstSelector() {
return false;		return false;
}		}

bool X86PassConfig::addILPOpts() {		bool X86PassConfig::addILPOpts() {
addPass(&EarlyIfConverterID);		addPass(&EarlyIfConverterID);
return true;		return true;
}		}

		void X86PassConfig::addPreRegAlloc() {
		addPass(createX86CallFrameOptimization());
		}

void X86PassConfig::addPostRegAlloc() {		void X86PassConfig::addPostRegAlloc() {
addPass(createX86FloatingPointStackifierPass());		addPass(createX86FloatingPointStackifierPass());
}		}

void X86PassConfig::addPreEmitPass() {		void X86PassConfig::addPreEmitPass() {
if (getOptLevel() != CodeGenOpt::None && getX86Subtarget().hasSSE2())		if (getOptLevel() != CodeGenOpt::None && getX86Subtarget().hasSSE2())
addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));		addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));

if (UseVZeroUpper)		if (UseVZeroUpper)
addPass(createX86IssueVZeroUpperPass());		addPass(createX86IssueVZeroUpperPass());

if (getOptLevel() != CodeGenOpt::None) {		if (getOptLevel() != CodeGenOpt::None) {
addPass(createX86PadShortFunctions());		addPass(createX86PadShortFunctions());
addPass(createX86FixupLEAs());		addPass(createX86FixupLEAs());
}		}
}		}

llvm/trunk/test/CodeGen/X86/inalloca-invoke.ll

Show All 25 Lines	; CHECK: leal 12(%[[beg]]), %[[end:[^ ]*]]

call void @begin(%Iter* sret %temp.lvalue)		call void @begin(%Iter* sret %temp.lvalue)
; CHECK: calll _begin		; CHECK: calll _begin

invoke void @plus(%Iter* sret %end, %Iter* %temp.lvalue, i32 4)		invoke void @plus(%Iter* sret %end, %Iter* %temp.lvalue, i32 4)
to label %invoke.cont unwind label %lpad		to label %invoke.cont unwind label %lpad

; Uses end as sret param.		; Uses end as sret param.
; CHECK: movl %[[end]], (%esp)		; CHECK: pushl %[[end]]
; CHECK: calll _plus		; CHECK: calll _plus

invoke.cont:		invoke.cont:
call void @begin(%Iter* sret %beg)		call void @begin(%Iter* sret %beg)

; CHECK: pushl %[[beg]]		; CHECK: pushl %[[beg]]
; CHECK: calll _begin		; CHECK: calll _begin

Show All 12 Lines

llvm/trunk/test/CodeGen/X86/movtopush.ll

	; RUN: llc < %s -mtriple=i686-windows | FileCheck %s -check-prefix=NORMAL			; RUN: llc < %s -mtriple=i686-windows | FileCheck %s -check-prefix=NORMAL
				; RUN: llc < %s -mtriple=x86_64-windows | FileCheck %s -check-prefix=X64
	; RUN: llc < %s -mtriple=i686-windows -force-align-stack -stack-alignment=32 | FileCheck %s -check-prefix=ALIGNED			; RUN: llc < %s -mtriple=i686-windows -force-align-stack -stack-alignment=32 | FileCheck %s -check-prefix=ALIGNED

	declare void @good(i32 %a, i32 %b, i32 %c, i32 %d)			declare void @good(i32 %a, i32 %b, i32 %c, i32 %d)
	declare void @inreg(i32 %a, i32 inreg %b, i32 %c, i32 %d)			declare void @inreg(i32 %a, i32 inreg %b, i32 %c, i32 %d)

	; Here, we should have a reserved frame, so we don't expect pushes			; Here, we should have a reserved frame, so we don't expect pushes
	; NORMAL-LABEL: test1			; NORMAL-LABEL: test1:
	; NORMAL: subl $16, %esp			; NORMAL: subl $16, %esp
	; NORMAL-NEXT: movl $4, 12(%esp)			; NORMAL-NEXT: movl $4, 12(%esp)
	; NORMAL-NEXT: movl $3, 8(%esp)			; NORMAL-NEXT: movl $3, 8(%esp)
	; NORMAL-NEXT: movl $2, 4(%esp)			; NORMAL-NEXT: movl $2, 4(%esp)
	; NORMAL-NEXT: movl $1, (%esp)			; NORMAL-NEXT: movl $1, (%esp)
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
				; NORMAL-NEXT: addl $16, %esp
	define void @test1() {			define void @test1() {
	entry:			entry:
	call void @good(i32 1, i32 2, i32 3, i32 4)			call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Here, we expect a sequence of 4 immediate pushes			; We're optimizing for code size, so we should get pushes for x86,
	; NORMAL-LABEL: test2			; even though there is a reserved call frame.
				; Make sure we don't touch x86-64
				; NORMAL-LABEL: test1b:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: pushl $4
				; NORMAL-NEXT: pushl $3
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				; NORMAL-NEXT: addl $16, %esp
				; X64-LABEL: test1b:
				; X64: movl $1, %ecx
				; X64-NEXT: movl $2, %edx
				; X64-NEXT: movl $3, %r8d
				; X64-NEXT: movl $4, %r9d
				; X64-NEXT: callq good
				define void @test1b() optsize {
				entry:
				call void @good(i32 1, i32 2, i32 3, i32 4)
				ret void
				}

				; Same as above, but for minsize
				; NORMAL-LABEL: test1c:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: pushl $4
				; NORMAL-NEXT: pushl $3
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				; NORMAL-NEXT: addl $16, %esp
				define void @test1c() minsize {
				entry:
				call void @good(i32 1, i32 2, i32 3, i32 4)
				ret void
				}

				; If we don't have a reserved frame, we should have pushes
				; NORMAL-LABEL: test2:
	; NORMAL-NOT: subl {{.*}} %esp			; NORMAL-NOT: subl {{.*}} %esp
	; NORMAL: pushl $4			; NORMAL: pushl $4
	; NORMAL-NEXT: pushl $3			; NORMAL-NEXT: pushl $3
	; NORMAL-NEXT: pushl $2			; NORMAL-NEXT: pushl $2
	; NORMAL-NEXT: pushl $1			; NORMAL-NEXT: pushl $1
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test2(i32 %k) {			define void @test2(i32 %k) {
	entry:			entry:
	%a = alloca i32, i32 %k			%a = alloca i32, i32 %k
	call void @good(i32 1, i32 2, i32 3, i32 4)			call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Again, we expect a sequence of 4 immediate pushes			; Again, we expect a sequence of 4 immediate pushes
	; Checks that we generate the right pushes for >8bit immediates			; Checks that we generate the right pushes for >8bit immediates
	; NORMAL-LABEL: test2b			; NORMAL-LABEL: test2b:
	; NORMAL-NOT: subl {{.*}} %esp			; NORMAL-NOT: subl {{.*}} %esp
	; NORMAL: pushl $4096			; NORMAL: pushl $4096
	; NORMAL-NEXT: pushl $3072			; NORMAL-NEXT: pushl $3072
	; NORMAL-NEXT: pushl $2048			; NORMAL-NEXT: pushl $2048
	; NORMAL-NEXT: pushl $1024			; NORMAL-NEXT: pushl $1024
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test2b(i32 %k) {			; NORMAL-NEXT: addl $16, %esp
				define void @test2b() optsize {
	entry:			entry:
	%a = alloca i32, i32 %k
	call void @good(i32 1024, i32 2048, i32 3072, i32 4096)			call void @good(i32 1024, i32 2048, i32 3072, i32 4096)
	ret void			ret void
	}			}

	; The first push should push a register			; The first push should push a register
	; NORMAL-LABEL: test3			; NORMAL-LABEL: test3:
	; NORMAL-NOT: subl {{.*}} %esp			; NORMAL-NOT: subl {{.*}} %esp
	; NORMAL: pushl $4			; NORMAL: pushl $4
	; NORMAL-NEXT: pushl $3			; NORMAL-NEXT: pushl $3
	; NORMAL-NEXT: pushl $2			; NORMAL-NEXT: pushl $2
	; NORMAL-NEXT: pushl %e{{..}}			; NORMAL-NEXT: pushl %e{{..}}
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test3(i32 %k) {			; NORMAL-NEXT: addl $16, %esp
				define void @test3(i32 %k) optsize {
	entry:			entry:
	%a = alloca i32, i32 %k
	call void @good(i32 %k, i32 2, i32 3, i32 4)			call void @good(i32 %k, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; We don't support weird calling conventions			; We don't support weird calling conventions
	; NORMAL-LABEL: test4			; NORMAL-LABEL: test4:
	; NORMAL: subl $12, %esp			; NORMAL: subl $12, %esp
	; NORMAL-NEXT: movl $4, 8(%esp)			; NORMAL-NEXT: movl $4, 8(%esp)
	; NORMAL-NEXT: movl $3, 4(%esp)			; NORMAL-NEXT: movl $3, 4(%esp)
	; NORMAL-NEXT: movl $1, (%esp)			; NORMAL-NEXT: movl $1, (%esp)
	; NORMAL-NEXT: movl $2, %eax			; NORMAL-NEXT: movl $2, %eax
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	define void @test4(i32 %k) {			; NORMAL-NEXT: addl $12, %esp
				define void @test4() optsize {
	entry:			entry:
	%a = alloca i32, i32 %k
	call void @inreg(i32 1, i32 2, i32 3, i32 4)			call void @inreg(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Check that additional alignment is added when the pushes			; When there is no reserved call frame, check that additional alignment
	; don't add up to the required alignment.			; is added when the pushes don't add up to the required alignment.
	; ALIGNED-LABEL: test5			; ALIGNED-LABEL: test5:
	; ALIGNED: subl $16, %esp			; ALIGNED: subl $16, %esp
	; ALIGNED-NEXT: pushl $4			; ALIGNED-NEXT: pushl $4
	; ALIGNED-NEXT: pushl $3			; ALIGNED-NEXT: pushl $3
	; ALIGNED-NEXT: pushl $2			; ALIGNED-NEXT: pushl $2
	; ALIGNED-NEXT: pushl $1			; ALIGNED-NEXT: pushl $1
	; ALIGNED-NEXT: call			; ALIGNED-NEXT: call
	define void @test5(i32 %k) {			define void @test5(i32 %k) {
	entry:			entry:
	%a = alloca i32, i32 %k			%a = alloca i32, i32 %k
	call void @good(i32 1, i32 2, i32 3, i32 4)			call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}

	; Check that pushing the addresses of globals (Or generally, things that			; Check that pushing the addresses of globals (Or generally, things that
	; aren't exactly immediates) isn't broken.			; aren't exactly immediates) isn't broken.
	; Fixes PR21878.			; Fixes PR21878.
	; NORMAL-LABEL: test6			; NORMAL-LABEL: test6:
	; NORMAL: pushl $_ext			; NORMAL: pushl $_ext
	; NORMAL-NEXT: call			; NORMAL-NEXT: call
	declare void @f(i8*)			declare void @f(i8*)
	@ext = external constant i8			@ext = external constant i8

	define void @test6() {			define void @test6() {
	call void @f(i8* @ext)			call void @f(i8* @ext)
	br label %bb			br label %bb
	bb:			bb:
	alloca i32			alloca i32
	ret void			ret void
	}			}

				; Check that we fold simple cases into the push
				; NORMAL-LABEL: test7:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: movl 4(%esp), [[EAX:%e..]]
				; NORMAL-NEXT: pushl $4
				; NORMAL-NEXT: pushl ([[EAX]])
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				; NORMAL-NEXT: addl $16, %esp
				define void @test7(i32* %ptr) optsize {
				entry:
				%val = load i32* %ptr
				call void @good(i32 1, i32 2, i32 %val, i32 4)
				ret void
				}

				; But we don't want to fold stack-relative loads into the push,
				; because the offset will be wrong
				; NORMAL-LABEL: test8:
				; NORMAL-NOT: subl {{.*}} %esp
				; NORMAL: movl 4(%esp), [[EAX:%e..]]
				; NORMAL-NEXT: pushl $4
				; NORMAL-NEXT: pushl [[EAX]]
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				; NORMAL-NEXT: addl $16, %esp
				define void @test8(i32* %ptr) optsize {
				entry:
				%val = ptrtoint i32* %ptr to i32
				call void @good(i32 1, i32 2, i32 %val, i32 4)
				ret void
				}

				; If one call is using push instructions, and the other isn't
				; (because it has frame-index references), then we must resolve
				; these references correctly.
				; NORMAL-LABEL: test9:
				; NORMAL-NOT: leal (%esp),
				; NORMAL: pushl $4
				; NORMAL-NEXT: pushl $3
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: call
				; NORMAL-NEXT: addl $16, %esp
				; NORMAL-NEXT: subl $16, %esp
				; NORMAL-NEXT: leal 16(%esp), [[EAX:%e..]]
				; NORMAL-NEXT: movl [[EAX]], 12(%esp)
				; NORMAL-NEXT: movl $7, 8(%esp)
				; NORMAL-NEXT: movl $6, 4(%esp)
				; NORMAL-NEXT: movl $5, (%esp)
				; NORMAL-NEXT: call
				; NORMAL-NEXT: addl $16, %esp
				define void @test9() optsize {
				entry:
				%p = alloca i32, align 4
				call void @good(i32 1, i32 2, i32 3, i32 4)
				%0 = ptrtoint i32* %p to i32
				call void @good(i32 5, i32 6, i32 7, i32 %0)
				ret void
				}

				; We can end up with an indirect call which gets reloaded on the spot.
				; Make sure we reference the correct stack slot - we spill into (%esp)
				; and reload from 16(%esp) due to the pushes.
				; NORMAL-LABEL: test10:
				; NORMAL: movl $_good, [[ALLOC:.*]]
				; NORMAL-NEXT: movl [[ALLOC]], [[EAX:%e..]]
				; NORMAL-NEXT: movl [[EAX]], (%esp) # 4-byte Spill
				; NORMAL: nop
				; NORMAL: pushl $4
				; NORMAL-NEXT: pushl $3
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl $1
				; NORMAL-NEXT: calll *16(%esp)
				; NORMAL-NEXT: addl $16, %esp
				define void @test10() optsize {
				%stack_fptr = alloca void (i32, i32, i32, i32)*
				store void (i32, i32, i32, i32)* @good, void (i32, i32, i32, i32)** %stack_fptr
				%good_ptr = load volatile void (i32, i32, i32, i32)** %stack_fptr
				call void asm sideeffect "nop", "~{ax},~{bx},~{cx},~{dx},~{bp},~{si},~{di}"()
				call void (i32, i32, i32, i32)* %good_ptr(i32 1, i32 2, i32 3, i32 4)
				ret void
				}

				; We can't fold the load from the global into the push because of
				; interference from the store
				; NORMAL-LABEL: test11:
				; NORMAL: movl _the_global, [[EAX:%e..]]
				; NORMAL-NEXT: movl $42, _the_global
				; NORMAL-NEXT: pushl $4
				; NORMAL-NEXT: pushl $3
				; NORMAL-NEXT: pushl $2
				; NORMAL-NEXT: pushl [[EAX]]
				; NORMAL-NEXT: call
				; NORMAL-NEXT: addl $16, %esp
				@the_global = external global i32
				define void @test11() optsize {
				%myload = load i32* @the_global
				store i32 42, i32* @the_global
				call void @good(i32 %myload, i32 2, i32 3, i32 4)
				ret void
				}
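
Reading note on the offsets checked above: the expectations in test5, test8 and test10 all follow from the fact that each pushl moves %esp down by 4 bytes. The snippet below is only a rough arithmetic model of that, not LLVM code. The function names are hypothetical, and the 32-byte alignment used for the test5 case is an assumption, since the RUN lines are not shown in this part of the diff.

```python
# Illustrative model only; this is not how LLVM implements the transformation.
# All names below are hypothetical.

PUSH_SIZE = 4  # bytes by which one 32-bit pushl lowers %esp


def esp_offset_after_pushes(offset_before: int, num_pushes: int) -> int:
    """Return the %esp-relative offset of a stack slot after some pushes.

    A slot addressed as offset_before(%esp) keeps its absolute address,
    but %esp itself drops by PUSH_SIZE per push, so the relative offset grows.
    """
    return offset_before + num_pushes * PUSH_SIZE


def alignment_padding(num_pushes: int, stack_alignment: int) -> int:
    """Return the extra bytes to subtract from %esp before the pushes so that
    padding plus pushed bytes is a multiple of stack_alignment."""
    pushed = num_pushes * PUSH_SIZE
    return -pushed % stack_alignment


if __name__ == "__main__":
    # test10: spilled to (%esp), reloaded after four pushes.
    print(esp_offset_after_pushes(0, 4))
    # test8: argument at 4(%esp), read after one push.
    print(esp_offset_after_pushes(4, 1))
    # test5: four pushes, assuming a 32-byte stack alignment.
    print(alignment_padding(4, 32))
```

Under this model the spill slot in test10 moves from (%esp) to 16(%esp) after four pushes, which matches the `calll *16(%esp)` check. Likewise, the incoming argument in test8 would no longer be at 4(%esp) once the first push has executed, which is why folding that load into the push would use the wrong offset.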

Diff 19110