This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

Path	Inline comments (done/total)
llvm/include/llvm/CodeGen/CallingConvLower.h	-
llvm/lib/CodeGen/CallingConvLower.cpp	-
llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp	-
llvm/lib/Target/AArch64/AArch64CallLowering.cpp	-
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	-
llvm/lib/Target/X86/X86ExpandPseudo.cpp	1/2
llvm/lib/Target/X86/X86ISelLowering.h	2/4
llvm/lib/Target/X86/X86ISelLowering.cpp	4/7
llvm/lib/Target/X86/X86InstrCompiler.td	-
llvm/lib/Target/X86/X86InstrInfo.td	-
llvm/lib/Target/X86/X86MachineFunctionInfo.h	-
llvm/test/CodeGen/X86/musttail-varargs.ll	1/2
llvm/test/CodeGen/X86/vastart-defs-eflags.ll	-
llvm/test/CodeGen/X86/x32-va_start.ll	-
llvm/test/CodeGen/X86/xmm-vararg-noopt.ll	-

Differential D69372

[X86][VARARG] Avoid spilling xmm vararg arguments.
Abandoned · Public

Authored by avl on Oct 24 2019, 3:08 AM.

Details

Reviewers

craig.topper
RKSimon
andreadb
qcolombet
rnk

Summary

This patch fixes bug https://bugs.llvm.org/show_bug.cgi?id=42219.
Related review: D62639.

In noimplicitfloat mode, the compiler must not generate
floating-point code unless it is explicitly asked to do so.
Currently this rule is not honored for variadic function arguments.
Although the compiler correctly guards the block of code that copies xmm vararg parameters with a check of %al,
it does not protect spills of xmm registers. Such spills are therefore generated in unguarded areas and can break code that does not expect floating-point data.
The problem happens at the -O0 optimization level, which uses
the fast register allocator; it spills virtual registers at basic block boundaries.
The register allocator does not protect spills with additional control-flow modifications.
To resolve the problem, this patch does not copy incoming physical
registers into virtual registers. Instead, it stores the incoming physical xmm registers directly to memory.
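
To make the affected pattern concrete, here is a minimal sketch of the kind of function the summary is about: a variadic function that only ever reads integer arguments. The name sumInts is hypothetical and not part of the patch. On AMD64 SysV targets the prologue of such a function normally contains the %al-guarded block that saves xmm0-xmm7 into the register save area, which is why any additional unguarded xmm spill matters under noimplicitfloat.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical example: sums `count` int arguments passed through "...".
// The source never reads a floating-point value; any xmm traffic in the
// generated code comes from the vararg prologue, not from this logic.
int sumInts(int count, ...) {
  assert(count >= 0 && "count must not be negative");
  va_list ap;
  va_start(ap, count);
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}
```

A call such as sumInts(3, 1, 2, 3) passes no floating-point arguments, so a conforming caller sets %al to 0 and the guarded save block is skipped at run time.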

Another variant of this problem occurs at high optimization levels with thunk functions.
At high optimization levels, the greedy register allocator is used.
Consider the following test case (CodeGen/X86/musttail-varargs.ll):

define void @f_thunk(i8* %this, ...) {
  ; Use va_start so that we exercise the combination.
  %ap = alloca [4 x i8*], align 16
  %ap_i8 = bitcast [4 x i8*]* %ap to i8*
  call void @llvm.va_start(i8* %ap_i8)

  %fptr = call void(i8*, ...)*(i8*) @get_f(i8* %this)  ; <<< incoming xmm values must survive this call
  musttail call void (i8*, ...) %fptr(i8* %this, ...)
  ret void
}

The f_thunk function needs to forward all of its parameters to %fptr. To do so, it must save and restore the virtual registers
corresponding to the incoming xmm registers around the call to get_f(), so the final code contains unguarded stores and reloads of xmm registers. The solution for this case is also to avoid virtual registers: copy the incoming physical xmm registers to memory at function entry, and restore them from memory right before the tail-call jump instruction. The new asm code for this case would look like this:

f_thunk:
        testb   %al, %al
        je      .LBB0_2
# %bb.1:
        movaps  %xmm0, 96(%rsp)         # store incoming xmm registers onto the stack

.LBB0_2:
        callq   get_f
        testb   %al, %al                # check for the existence of xmm vararg parameters
        je      .LBB0_4
# %bb.3:
        movaps  96(%rsp), %xmm0         # restore xmm vararg parameters before the tail-call jump
        jmpq    *%r11                   # TAILCALL
.LBB0_4:
        jmpq    *%r11                   # TAILCALL
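
The control flow of the assembly above can be modeled in a few lines of C++. This is only a toy simulation, not LLVM code: the Regs struct, callGetF, and runThunk are hypothetical names, integers stand in for xmm contents, and the model assumes that %al itself is preserved across the call like any other forwarded integer register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file. Integers stand in for the contents of xmm0..xmm7.
struct Regs {
  std::uint8_t al = 0;                 // hidden vararg argument (0 = no xmm used)
  std::array<std::uint64_t, 8> xmm{};  // stand-in for xmm0..xmm7
};

// Stand-in for get_f(): a call may clobber %al and all xmm registers.
void callGetF(Regs &r) {
  r.al = 0xFF;
  r.xmm.fill(0xDEAD);
}

struct ThunkResult {
  Regs forwarded;   // register state seen by the tail-called function
  int xmmAccesses;  // how many times the thunk touched xmm registers
};

ThunkResult runThunk(Regs incoming) {
  int xmmAccesses = 0;
  const std::uint8_t savedAl = incoming.al;  // model assumption: %al is preserved
  std::array<std::uint64_t, 8> slots{};      // stack slots for xmm0..xmm7

  if (savedAl != 0) {      // testb %al, %al ; je .LBB0_2
    slots = incoming.xmm;  // movaps %xmmN, <slot>(%rsp)
    ++xmmAccesses;
  }

  callGetF(incoming);      // callq get_f

  incoming.al = savedAl;
  if (savedAl != 0) {      // testb %al, %al ; je .LBB0_4
    incoming.xmm = slots;  // movaps <slot>(%rsp), %xmmN
    ++xmmAccesses;
  }
  return ThunkResult{incoming, xmmAccesses};
}

// Usage: with %al == 0 the thunk never touches an xmm register.
void demoThunk() {
  Regs noFp;
  assert(runThunk(noFp).xmmAccesses == 0);
}
```

The point of the sketch is the second case: when the caller passed no xmm varargs, both guarded blocks are skipped and the thunk performs no xmm access at all.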

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
1,640 ms	libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_mutex_requirements_mutex/thread_mutex_class::Unknown Unit Message ("")

Event Timeline

avl created this revision. Oct 24 2019, 3:08 AM

Herald added a project: Restricted Project. Oct 24 2019, 3:08 AM

Herald added subscribers: jfb, hiraditya.

avl edited the summary of this revision. Oct 24 2019, 3:11 AM

avl edited the summary of this revision. Oct 24 2019, 3:13 AM

asl added inline comments. Oct 24 2019, 8:41 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
3690	What is the status of all these TODO here and there?
llvm/lib/Target/X86/X86ISelLowering.h
264	Do you really need these changes?

avl marked 2 inline comments as done. Oct 24 2019, 9:51 AM

avl added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
3690	I assume they will be done later; they are not required for correctness. Not storing xmm registers in the "no fp, only musttail calls, noimplicitfloat" case: I am going to do that after this patch and D62639 are integrated. Support for YMM and ZMM: that would be done by whoever implements YMM and ZMM register support for varargs; it is a separate task. The comment for AArch64 is for whoever implements this on AArch64.
llvm/lib/Target/X86/X86ISelLowering.h
264	No, I don't. It was done by clang-format.

asl added inline comments. Oct 24 2019, 9:59 AM

llvm/lib/Target/X86/X86ISelLowering.h
264	Please do not add unrelated changes

avl marked an inline comment as done. Oct 24 2019, 10:13 AM

avl added inline comments.

llvm/lib/Target/X86/X86ISelLowering.h
264	OK, I will delete them. clang-format made the changes above because VARARG_THUNK_SAVE_XMM_REGS was added, so in that sense they are related. But OK, I will delete them.

Removed the formatting done by clang-format in X86ISelLowering.h. Ping.

Fixed a small typo introduced by the previous update.

Is it possible to avoid expanding VARARG_THUNK_SAVE_XMM_REGS until after register allocation? I would rather not make the liveness rules more complicated just for the sake of working around a limitation of fast regalloc.

Yes, it is. I will do it that way. Thanks.

Addressed comments (moved the expansion of the xmm register code until after register allocation).

ping.

ping..

ping.

avl added a reviewer: qcolombet. Dec 13 2019, 2:06 AM

rebased. ping.

ping

ping.

Colleagues, any comments on this review?

ping.

Target-independent changes look fine.

I'm not planning to review the x86 backend changes.

craig.topper added inline comments. Jan 13 2020, 2:45 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
3534	First letter of variables should be capitalized.
3683	!guardedXmmRegs.empty()

For the noimplicitfloat mode, the compiler mustn't generate
floating-point code if it was not asked directly to do so.

I don't think checking AL on function entry will work reliably. In general, musttail is meant to forward all possible register parameters. Normal arguments may be passed in XMM registers without setting AL. Consider this code:

typedef float v4f32 __attribute__((__vector_size__(16), __aligned__(16)));
int gv_i32;
v4f32 gv_v4f32;
void bar(int x, v4f32 v) {
  gv_i32 = x;
  gv_v4f32 = v;
}
void foo() {
  v4f32 v = {};
  bar(42, v);
}

It should be possible to use musttail thunk with a prototype of void (int, ...) between the call to foo and bar. And, if that thunk contains a function call, it will need to spill&fill all XMM argument registers. If you look at the generated assembly for calling bar, AL is not set to 1:

# %bb.0:                                # %entry
        pushq   %rax
        .cfi_def_cfa_offset 16
        xorps   %xmm0, %xmm0
        movl    $42, %edi
        callq   _Z3bariDv4_f

Is there actually a use case for musttail thunks and noimplicitfloat? Could we instead just declare that such a thunk represents explicit FP usage?

Furthermore, this will probably do the wrong thing on Windows, where AL is never set, to my knowledge. Consider this C++ code that uses musttail in the MS ABI:

int gv_i32;
double gv_f64;
class Foo {
  void virtual bar(int x, double v);
  void foo();
};
void Foo::bar(int x, double v) {
  gv_i32 = x;
  gv_f64 = v;
}
void Foo::foo() {
  auto mp = &Foo::bar;
  (this->*mp)(42, 0.0);
}

The virtual member pointer thunk needs to pass along XMM2, although it doesn't contain a function call.

llvm/lib/Target/X86/X86ExpandPseudo.cpp
250	This looks like a lot of very fragile pattern matching. I would greatly prefer it if we didn't have to do this.
llvm/lib/Target/X86/X86ISelLowering.cpp
3565	We don't in general reference the issue tracker from code comments, only from test cases.

This revision now requires changes to proceed. Jan 13 2020, 2:50 PM

In D69372#1818166, @rnk wrote:
For the noimplicitfloat mode, the compiler mustn't generate
floating-point code if it was not asked directly to do so.

I don't think checking AL on function entry will work reliably. In general, musttail is meant to forward all possible register parameters. Normal arguments may be passed in XMM registers without setting AL. Consider this code:
typedef float v4f32 __attribute__((__vector_size__(16), __aligned__(16)));
int gv_i32;
v4f32 gv_v4f32;
void bar(int x, v4f32 v) {
  gv_i32 = x;
  gv_v4f32 = v;
}
void foo() {
  v4f32 v = {};
  bar(42, v);
}
It should be possible to use musttail thunk with a prototype of void (int, ...) between the call to foo and bar.

If I understood the example correctly, such a situation should not occur.
My understanding is that a musttail thunk and its target _must_ have identical signatures.

In the above example, the thunk is assumed to have the definition void (int, ...) while the target has the definition void (int x, v4f32 v); that is wrong.

It should be either:

thunk: void (int, ...)
target: void (int, ...)

or:

thunk: void (int x, v4f32 v)
target: void (int x, v4f32 v)

In both of these cases, this patch would work correctly.
Thus, assuming identical declarations, checking AL should work correctly.

And, if that thunk contains a function call, it will need to spill&fill all XMM argument registers. If you look at the generated assembly for calling bar, AL is not set to 1:
# %bb.0:                                # %entry
        pushq   %rax
        .cfi_def_cfa_offset 16
        xorps   %xmm0, %xmm0
        movl    $42, %edi
        callq   _Z3bariDv4_f

Right, AL is not set to 1. That is because bar() does not use variable arguments according to its declaration.
The thunk function should then assume the same signature and not use the AL trick.

Is there actually a use case for musttail thunks and noimplicitfloat? Could we instead just declare that such a thunk represents explicit FP usage?

I do not know of any real examples. But in general, it looks like a valid use case (musttail thunks and noimplicitfloat).

Furthermore, this will probably do the wrong thing on Windows, where AL is never set, to my knowledge. Consider this C++ code that uses musttail in the MS ABI:
int gv_i32;
double gv_f64;
class Foo {
  void virtual bar(int x, double v);
  void foo();
};
void Foo::bar(int x, double v) {
  gv_i32 = x;
  gv_f64 = v;
}
void Foo::foo() {
  auto mp = &Foo::bar;
  (this->*mp)(42, 0.0);
}
The virtual member pointer thunk needs to pass along XMM2, although it doesn't contain a function call.

This patch does not apply on Windows. It is only for the AMD64 ABI:

bool X86CallLowering::lowerCall()
...

if (STI.is64Bit() && !IsFixed && !STI.isCallingConvWin64(Info.CallConv)) {
  // From AMD64 ABI document:
  // For calls that may call functions that use varargs or stdargs
  // (prototype-less calls or calls to functions containing ellipsis (...) in
  // the declaration) %al is used as hidden argument to specify the number
  // of SSE registers used. The contents of %al do not need to match exactly
  // the number of registers, but must be an ubound on the number of SSE
  // registers used and is in the range 0 - 8 inclusive.

  MIRBuilder.buildInstr(X86::MOV8ri)
      .addDef(X86::AL)
      .addImm(Handler.getNumXmmRegs());
  MIB.addUse(X86::AL, RegState::Implicit);
}
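
The condition in the quoted lowerCall fragment can be restated as a small standalone function. This is a sketch only: hiddenAlValue and its parameter names are hypothetical, and it models nothing beyond the quoted condition and the 0-8 range from the ABI comment.

```cpp
#include <cassert>

// Returns the value the caller writes to %al, or -1 when %al is left unset.
// Only the condition from the quoted fragment is modeled here.
int hiddenAlValue(bool is64Bit, bool isFixedPrototype, bool isWin64,
                  unsigned numXmmRegsUsed) {
  assert(numXmmRegsUsed <= 8 && "AMD64 ABI: %al is in the range 0-8");
  if (is64Bit && !isFixedPrototype && !isWin64)
    return static_cast<int>(numXmmRegsUsed);
  return -1;  // fixed prototype, 32-bit, or Win64: no hidden %al argument
}
```

Under this model a call through a fixed prototype such as void (int, v4f32) leaves %al unset, while a call through void (int, ...) always sets it, which is the distinction both sides of the discussion rely on.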

X86ISelLowering.cpp:
// FIXME: Get this from tablegen.
static ArrayRef<MCPhysReg> get64BitArgumentXMMs(MachineFunction &MF,
                                                CallingConv::ID CallConv,
                                                const X86Subtarget &Subtarget) {
  assert(Subtarget.is64Bit());
  if (Subtarget.isCallingConvWin64(CallConv)) {
    // The XMM registers which might contain var arg parameters are shadowed
    // in their paired GPR.  So we only need to save the GPR to their home
    // slots.
    // TODO: __vectorcall will change this.
    return None;
  }
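
As a rough illustration of what the truncated helper decides, the following sketch returns an empty list for Win64 and the eight AMD64 SysV vector argument registers otherwise. The function name is hypothetical, register names are plain strings rather than MCPhysReg values, and the non-Win64 branch is an assumption based on the ABI, since the rest of the real function is not shown in the quote.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the quoted helper. Only the Win64 early exit is
// taken from the quote; the xmm0..xmm7 list is assumed from the AMD64 ABI.
std::vector<std::string> varargXmmRegsToSave(bool isWin64) {
  std::vector<std::string> regs;
  if (isWin64)
    return regs;  // shadowed in paired GPRs, so there is nothing to save
  for (int i = 0; i < 8; ++i)
    regs.push_back("xmm" + std::to_string(i));
  assert(regs.size() == 8);
  return regs;
}
```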

avl marked 3 inline comments as done. Jan 14 2020, 10:47 AM

avl added inline comments.

llvm/lib/Target/X86/X86ExpandPseudo.cpp
250	I will rewrite it.

In D69372#1820099, @avl wrote:

In D69372#1818166, @rnk wrote:

It should be possible to use musttail thunk with a prototype of void (int, ...) between the call to foo and bar.

If I understood the example correctly, such a situation should not occur.
My understanding is that a musttail thunk and its target _must_ have identical signatures.

The third point under musttail is meant to create an exception for perfectly forwarding thunks: https://llvm.org/docs/LangRef.html#id325

If the musttail call appears in a function with the "thunk" attribute and the caller and callee both have varargs, than any unprototyped arguments in register or memory are forwarded to the callee. Similarly, the return value of the callee is returned to the caller’s caller, even if a void return type is in use.

In retrospect, I think it may have been a mistake to repurpose varargs to indicate that all remaining argument registers should be preserved. Maybe we don't need to use a varargs function prototype to implement perfectly forwarding thunks. We ended up needing the "thunk" function attribute, so we could base it on that instead.

This patch does not apply on Windows. It is only for the AMD64 ABI:

OK, but the C++ code I provided illustrates the use case of perfect forwarding, which might come up on Linux. It sets up a chain of calls that looks like:

indirect call to thunk, pass FP in XMM ->
in thunk, save XMM, use XMM, restore XMM, tail call to callee ->
receive XMM value in callee

The use case for perfectly forwarding thunks is pretty rare. I don't think there is a way to convince clang to generate one for Linux, just Windows.

Anyway, it seems like you could do away with a fair amount of the complexity in this patch for handling guarding musttail forwarding by declaring them to be an "explicit FP use", since it is not, in general, possible for a perfectly forwarding thunk to know if XMM arguments have been used.

In D69372#1820845, @rnk wrote:

In D69372#1820099, @avl wrote:

In D69372#1818166, @rnk wrote:

It should be possible to use musttail thunk with a prototype of void (int, ...) between the call to foo and bar.

If I understood the example correctly, such a situation should not occur.
My understanding is that a musttail thunk and its target _must_ have identical signatures.

The third point under musttail is meant to create an exception for perfectly forwarding thunks: https://llvm.org/docs/LangRef.html#id325

If the musttail call appears in a function with the "thunk" attribute and the caller and callee both have varargs, than any unprototyped arguments in register or memory are forwarded to the callee. Similarly, the return value of the callee is returned to the caller’s caller, even if a void return type is in use.

In retrospect, I think it may have been a mistake to repurpose varargs to indicate that all remaining argument registers should be preserved. Maybe we don't need to use a varargs function prototype to implement perfectly forwarding thunks. We ended up needing the "thunk" function attribute, so we could base it on that instead.

It looks like point 3 does not allow the thunk signature to be "void (int, ...)" while the callee signature is "void (int x, v4f32 v)".
That point means that if the caller and callee both have varargs, all varargs arguments must be properly forwarded from caller to callee.

There is another rule that looks relevant:

"The callee must be varargs iff the caller is varargs. Bitcasting a non-varargs function to the appropriate varargs type is legal so long as the non-varargs prefixes obey the other rules."

I.e., if the caller/thunk signature is "void (int, ...)", then the callee signature could be "void (int x, v4f32 v)".
The caller, when calling the thunk, should use the "void (int, ...)" signature.
The caller must set AL because of the thunk signature "void (int, ...)".

That scenario should work correctly with or without this patch.
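
The two rules being argued about here can be written down as a small check. This is a heavily simplified sketch: Sig, musttailSignaturesAgree, and viewAsVarArgs are hypothetical names, parameter types are plain strings, and the real LangRef rules cover more (pointer types, calling conventions, ABI attributes) than this models.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified function type: fixed parameter type names plus a
// varargs flag.
struct Sig {
  std::vector<std::string> fixedParams;
  bool isVarArg;
};

// "The callee must be varargs iff the caller is varargs", plus identical
// fixed parameter lists.
bool musttailSignaturesAgree(const Sig &caller, const Sig &callee) {
  if (caller.isVarArg != callee.isVarArg)
    return false;
  return caller.fixedParams == callee.fixedParams;
}

// Models bitcasting a non-varargs function to a varargs type: the first
// prefixLen parameters stay fixed, the rest are treated as varargs.
Sig viewAsVarArgs(const Sig &fn, std::size_t prefixLen) {
  assert(prefixLen <= fn.fixedParams.size());
  Sig out;
  out.fixedParams.assign(
      fn.fixedParams.begin(),
      fn.fixedParams.begin() + static_cast<std::ptrdiff_t>(prefixLen));
  out.isVarArg = true;
  return out;
}

// Usage: thunk "void (int, ...)" versus target "void (int, v4f32)".
void demoSignatures() {
  Sig thunk;
  thunk.fixedParams.push_back("i32");
  thunk.isVarArg = true;
  Sig bar;
  bar.fixedParams.push_back("i32");
  bar.fixedParams.push_back("v4f32");
  bar.isVarArg = false;
  assert(!musttailSignaturesAgree(thunk, bar));
  assert(musttailSignaturesAgree(thunk, viewAsVarArgs(bar, 1)));
}
```

In this model the direct pairing is rejected, while the bitcast view is accepted; the open question in the thread is whether every caller of such a thunk really uses the varargs view and therefore sets AL.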

If there is currently a use case where the caller/thunk has the signature "void (int, ...)" and AL is not set by the caller (for the AMD64 ABI), then it looks like a bug in the implementation.

This patch does not apply on Windows. It is only for the AMD64 ABI:

OK, but the C++ code I provided illustrates the use case of perfect forwarding, which might come up on Linux. It sets up a chain of calls that looks like:

indirect call to thunk, pass FP in XMM ->

in thunk, save XMM, use XMM, restore XMM, tail call to callee ->

receive XMM value in callee

The use case for perfectly forwarding thunks is pretty rare. I don't think there is a way to convince clang to generate one for Linux, just Windows.

It seems to me that this case is already handled properly and my patch does not break it.
This example does not have varargs in its function declarations,
so the AL trick would not be used.

Both the thunk and the target have the signature void (int x, double v),
which explicitly uses XMM registers (as per the ABI for function parameters).
All of the following steps would be done explicitly:

indirect call to thunk, pass FP in XMM -> OK
in thunk, save XMM, use XMM, restore XMM, tail call to callee -> OK
receive XMM value in callee -> OK

What would go wrong with this patch?

@rnk

Anyway, it seems like you could do away with a fair amount of the complexity in this patch for handling guarding musttail forwarding by declaring them to be an "explicit FP use", since it is not, in general, possible for a perfectly forwarding thunk to know if XMM arguments have been used.

This effectively means that any program containing thunks could not be compiled for a non-floating-point environment, even if it does not use FP at all.
Taking into account that thunks are usually inserted by the compiler, that could be unexpected and hard for the user to resolve.
I.e., the program might contain no floating-point code at all, yet it would be impossible to compile it because of thunks.
I do not think it is a good alternative.

In retrospect, I think it may have been a mistake to repurpose varargs to indicate that all remaining argument registers should be preserved. Maybe we don't need to use a varargs function prototype to implement perfectly forwarding thunks. We ended up needing the "thunk" function attribute, so we could base it on that instead.

I agree that it is a mistake to use varargs for perfectly forwarding thunks, since that does not conform to the current behavior and the documented rules.

Test case from llvm/test/CodeGen/X86/musttail-varargs.ll:

declare void @llvm.va_start(i8*) nounwind

declare void(i8*, ...)* @get_f(i8* %this)

define void @f_thunk(i8* %this, ...) {
  %ap = alloca [4 x i8*], align 16
  %ap_i8 = bitcast [4 x i8*]* %ap to i8*
  call void @llvm.va_start(i8* %ap_i8)

  %fptr = call void(i8*, ...)*(i8*) @get_f(i8* %this)
  musttail call void (i8*, ...) %fptr(i8* %this, ...)
  ret void
}

It contains "f_thunk", which calls llvm.va_start. Thus it saves the incoming xmm registers into the va_start area:

; LINUX-NEXT:    testb %al, %al
; LINUX-NEXT:    je .LBB0_2
; LINUX-NEXT:  # %bb.1:
; LINUX-NEXT:    movaps %xmm0, {{[0-9]+}}(%rsp)
; LINUX-NEXT:    movaps %xmm1, {{[0-9]+}}(%rsp)
; LINUX-NEXT:    movaps %xmm2, {{[0-9]+}}(%rsp)
; LINUX-NEXT:    movaps %xmm3, {{[0-9]+}}(%rsp)
; LINUX-NEXT:    movaps %xmm4, {{[0-9]+}}(%rsp)
; LINUX-NEXT:    movaps %xmm5, {{[0-9]+}}(%rsp)
; LINUX-NEXT:    movaps %xmm6, {{[0-9]+}}(%rsp)
; LINUX-NEXT:    movaps %xmm7, {{[0-9]+}}(%rsp)
; LINUX-NEXT:  .LBB0_2:

If f_thunk were called without setting AL, it would work incorrectly.

Calling a "void (int, ...)" function without setting AL is an AMD64 ABI violation.

https://llvm.org/docs/LangRef.html#id325

The caller and callee prototypes must match. Pointer types of parameters or return types may differ in pointee type, but not in address space. The calling conventions of the caller and callee must match. All ABI-impacting function attributes, such as sret, byval, inreg, returned, and inalloca, must match.

These rules clearly state that the thunk signature and the callee signature should agree on calling conventions. If the thunk assumes that AL is set, then it is an error not to set it.

Perhaps we could fix the implementation of perfectly forwarding thunks in an ABI-compatible way?

@rnk Reid, I am trying to research the question of using a varargs thunk for non-varargs methods
(like using a "void (int, ...)" thunk with a "void (int x, v4f32 v)" signature, as in the previous discussion).
I found the following place, which looks like precisely the case that uses musttail for more than just varargs:

CGVTables.cpp:EmitCallAndReturnForThunk():

// If perfect forwarding is required a variadic method, a method using
// inalloca, or an unprototyped thunk, use musttail. Emit an error if this
// thunk requires a return adjustment, since that is impossible with musttail.
if (CurFnInfo->usesInAlloca() || CurFnInfo->isVariadic() || IsUnprototyped) {

In the cited fragment, "CurFnInfo->isVariadic()" is the common use of musttail.
"CurFnInfo->usesInAlloca()" and "IsUnprototyped" look like extended uses.
Both of these features are specific to the MS ABI:

InAlloca

https://llvm.org/docs/InAlloca.html

Primarily, this feature is required for compatibility with the Microsoft C++ ABI.

Unprototyped thunk

CGVTables.cpp:EmitCallAndReturnForThunk()
// Arrange a function prototype appropriate for a function definition. In some
// cases in the MS ABI, we may need to build an unprototyped musttail thunk.
const CGFunctionInfo &FnInfo =
    IsUnprototyped ? CGM.getTypes().arrangeUnprototypedMustTailThunk(MD)
                   : CGM.getTypes().arrangeGlobalDeclaration(GD);
llvm::FunctionType *ThunkFnTy = CGM.getTypes().GetFunctionType(FnInfo);

Thus it looks like the case of using musttail for non-varargs functions never happens with the AMD64 ABI.
So it is safe for this patch (Avoid spilling xmm vararg arguments) to rely on the ABI requirement that the AL register is always properly set for vararg functions.
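
The argument above reduces to a short piece of boolean logic, sketched below. The function names are hypothetical; the first function mirrors only the quoted condition from EmitCallAndReturnForThunk, and the second encodes the assumption made in this comment that inalloca and unprototyped thunks arise only with the MS ABI.

```cpp
#include <cassert>

// Mirrors the quoted condition: musttail is used when perfect forwarding is
// required.
bool thunkUsesMustTail(bool usesInAlloca, bool isVariadic,
                       bool isUnprototyped) {
  return usesInAlloca || isVariadic || isUnprototyped;
}

// Assumption from the discussion: on AMD64 SysV neither inalloca nor
// unprototyped thunks occur, so only variadic methods get musttail thunks.
bool amd64ThunkUsesMustTail(bool isVariadic) {
  return thunkUsesMustTail(false, isVariadic, false);
}

// Usage: a musttail thunk on AMD64 implies a variadic prototype.
void demoThunkPredicate() {
  assert(amd64ThunkUsesMustTail(true));
  assert(!amd64ThunkUsesMustTail(false));
}
```

If that assumption holds, every clang-generated musttail thunk on AMD64 has a variadic prototype, and its callers are required by the ABI to set AL.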

The ability to avoid floating-point usage is essential for environments that do not use floating point. If thunks were considered an "explicit FP use", many simple C++ programs could not be compiled for a non-floating-point environment because of unintended FP usage.

addressed comments:

Do not create guarded registers for functions with "thunk" attribute.
Resolve style issues.
Check for MachineInstr::FrameDestroy instead of checking instruction patterns.

Unit tests: fail. 62440 tests passed, 1 failed and 845 were skipped.

failed: libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_mutex_requirements_mutex/thread_mutex_class/try_lock.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 2 warnings. 0 of them are added as review comments below.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster failed remote builds in B45714: Diff 242407! Feb 4 2020, 1:19 PM

clang-format recommendations: I was asked not to apply them in this review.
clang-tidy: I will apply the recommendations.
Unit test: the failure does not seem to reproduce, though I will check further.

Applied the clang-tidy recommendations; removed the icall_branch_funnel workaround.

Harbormaster failed remote builds in B46055: Diff 243463! Feb 9 2020, 1:52 PM

Unit tests pass. 62573 tests passed, 0 failed and 842 were skipped.

clang-tidy pass.

clang-format fail.

rnk added inline comments. Feb 10 2020, 5:15 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
3702	I haven't reviewed all of this code, but we have to find some way to refactor LowerCall. It was already poorly factored and long, but this is just too much complexity, too many lines of code. We need to find some way to separate concerns.
llvm/test/CodeGen/X86/musttail-varargs.ll
299–300	This seems unfortunate. :( Does all this go away if you put the "thunk" attribute on it? Everything in this file is meant to be a test for universal thunks, so adding the attribute is reasonable.

avl marked 2 inline comments as done. Feb 11 2020, 2:40 AM

avl added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
3702	OK, I will refactor it. Would it be OK if the refactoring were part of this patch, or do I need to do it beforehand in a separate patch?
llvm/test/CodeGen/X86/musttail-varargs.ll
299–300	Yes, all this xmm save/restore code would go away if "thunk" is specified. I will add it to the test case. Additionally, I would like to make a separate patch which would NOT do this xmm store/restore if noimplicitfloat=false, so that the store/restore code is generated only for the noimplicitfloat=true case. Thus, I assume the following patches would be done: (1) this patch: xmm stores/restores through phys regs would be generated for all _usual_ thunks (not including universal thunks); (2) the patch which fixes the ABI breakage for the noimplicitfloat case, D62639; (3) a patch that does not generate xmm store/restore through phys regs for the noimplicitfloat=false case.

avl mentioned this in D74794: [X86][ISelLowering] refactor Varargs handling in X86ISelLowering.cpp. Feb 18 2020, 2:07 PM

avl mentioned this in rGaa1eb5152d9a: [X86][ISelLowering] refactor Varargs handling in X86ISelLowering.cpp. May 12 2020, 3:05 PM

avl mentioned this in D80163: [X86][VARARG] Avoid spilling xmm registers for va_start. May 18 2020, 2:24 PM

avl planned changes to this revision. Oct 22 2020, 3:00 AM

Herald added a subscriber: pengfei. Oct 22 2020, 3:00 AM

avl mentioned this in rGcf7cdaff64fb: [X86][VARARG] Avoid spilling xmm registers for va_start. Mar 6 2021, 4:30 AM

Already fixed by D80163.

Herald added a project: Restricted Project. Mar 14 2023, 8:09 AM

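Revision Contents

Path	Size
llvm/include/llvm/CodeGen/CallingConvLower.h	15 lines
llvm/lib/CodeGen/CallingConvLower.cpp	12 lines
llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp	7 lines
llvm/lib/Target/AArch64/AArch64CallLowering.cpp	5 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	6 lines
llvm/lib/Target/X86/X86ExpandPseudo.cpp	236 lines
llvm/lib/Target/X86/X86ISelLowering.h	18 lines
llvm/lib/Target/X86/X86ISelLowering.cpp	384 lines
llvm/lib/Target/X86/X86InstrCompiler.td	22 lines
llvm/lib/Target/X86/X86InstrInfo.td	11 lines
llvm/lib/Target/X86/X86MachineFunctionInfo.h	6 lines
llvm/test/CodeGen/X86/musttail-varargs.ll	985 lines
llvm/test/CodeGen/X86/vastart-defs-eflags.ll	10 lines
llvm/test/CodeGen/X86/x32-va_start.ll	10 lines
llvm/test/CodeGen/X86/xmm-vararg-noopt.ll	49 lines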

Diff 242407

llvm/include/llvm/CodeGen/CallingConvLower.h

 //===- llvm/CallingConvLower.h - Calling Conventions ------------*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file declares the CCState and CCValAssign classes, used for lowering
 // and implementing calling conventions.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_CODEGEN_CALLINGCONVLOWER_H
 #define LLVM_CODEGEN_CALLINGCONVLOWER_H

+#include "llvm/ADT/DenseSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/TargetCallingConv.h"
 #include "llvm/IR/CallingConv.h"
 #include "llvm/MC/MCRegisterInfo.h"
 #include "llvm/Support/Alignment.h"

... (133 lines not shown) ...
   }

   bool isUpperBitsInLoc() const {
     return HTP == AExtUpper || HTP == SExtUpper || HTP == ZExtUpper;
   }
 };

 /// Describes a register that needs to be forwarded from the prologue to a
-/// musttail call.
+/// musttail call. Specifying VReg == 0 means that the register should be
+/// put into guarded area and no virtual register was created for it.
 struct ForwardedRegister {
   ForwardedRegister(unsigned VReg, MCPhysReg PReg, MVT VT)
       : VReg(VReg), PReg(PReg), VT(VT) {}
+  bool IsGuarded() const { return VReg == 0; }
   unsigned VReg;
   MCPhysReg PReg;
   MVT VT;
 };

 /// CCAssignFn - This function assigns a location for Val, updating State to
 /// reflect the change. It returns 'true' if it failed to handle Val.
 typedef bool CCAssignFn(unsigned ValNo, MVT ValVT,
... (340 lines not shown) ...
   /// Compute the remaining unused register parameters that would be used for
   /// the given value type. This is useful when varargs are passed in the
   /// registers that normal prototyped parameters would be passed in, or for
   /// implementing perfect forwarding.
   void getRemainingRegParmsForType(SmallVectorImpl<MCPhysReg> &Regs, MVT VT,
                                    CCAssignFn Fn);

   /// Compute the set of registers that need to be preserved and forwarded to
-  /// any musttail calls.
+  /// any musttail calls. Some platforms(AMD64) allow to guard access to
+  /// certain kind of input registers. f.e. Accesses to xmm registers should
+  /// be done according to the state of %al register. This function set
+  /// IsGuarded bit according to the specified set of GuardedForwardedRegs.
   void analyzeMustTailForwardedRegisters(
-      SmallVectorImpl<ForwardedRegister> &Forwards, ArrayRef<MVT> RegParmTypes,
-      CCAssignFn Fn);
+      SmallVectorImpl<ForwardedRegister> &Forwards,
+      const SmallDenseSet<MCPhysReg, 8> &GuardedForwardedRegs,
+      ArrayRef<MVT> RegParmTypes, CCAssignFn Fn);

   /// Returns true if the results of the two calling conventions are compatible.
   /// This is usually part of the check for tailcall eligibility.
   static bool resultsCompatible(CallingConv::ID CalleeCC,
                                 CallingConv::ID CallerCC, MachineFunction &MF,
                                 LLVMContext &C,
                                 const SmallVectorImpl<ISD::InputArg> &Ins,
                                 CCAssignFn CalleeFn, CCAssignFn CallerFn);
... (40 lines not shown) ...

llvm/lib/CodeGen/CallingConvLower.cpp

Show First 20 Lines • Show All 230 Lines • ▼ Show 20 Lines	#endif
// as allocated so that future queries don't return the same registers, i.e.		// as allocated so that future queries don't return the same registers, i.e.
// when i64 and f64 are both passed in GPRs.		// when i64 and f64 are both passed in GPRs.
StackOffset = SavedStackOffset;		StackOffset = SavedStackOffset;
MaxStackArgAlign = SavedMaxStackArgAlign;		MaxStackArgAlign = SavedMaxStackArgAlign;
Locs.resize(NumLocs);		Locs.resize(NumLocs);
}		}

void CCState::analyzeMustTailForwardedRegisters(		void CCState::analyzeMustTailForwardedRegisters(
SmallVectorImpl<ForwardedRegister> &Forwards, ArrayRef<MVT> RegParmTypes,		SmallVectorImpl<ForwardedRegister> &Forwards,
CCAssignFn Fn) {		const SmallDenseSet<MCPhysReg, 8> &GuardedForwardedRegs,
		ArrayRef<MVT> RegParmTypes, CCAssignFn Fn) {
// Oftentimes calling conventions will not use register parameters for		// Oftentimes calling conventions will not use register parameters for
// variadic functions, so we need to assume we're not variadic so that we get		// variadic functions, so we need to assume we're not variadic so that we get
// all the registers that might be used in a non-variadic call.		// all the registers that might be used in a non-variadic call.
SaveAndRestore<bool> SavedVarArg(IsVarArg, false);		SaveAndRestore<bool> SavedVarArg(IsVarArg, false);
SaveAndRestore<bool> SavedMustTail(AnalyzingMustTailForwardedRegs, true);		SaveAndRestore<bool> SavedMustTail(AnalyzingMustTailForwardedRegs, true);

for (MVT RegVT : RegParmTypes) {		for (MVT RegVT : RegParmTypes) {
SmallVector<MCPhysReg, 8> RemainingRegs;		SmallVector<MCPhysReg, 8> RemainingRegs;
getRemainingRegParmsForType(RemainingRegs, RegVT, Fn);		getRemainingRegParmsForType(RemainingRegs, RegVT, Fn);
const TargetLowering *TL = MF.getSubtarget().getTargetLowering();		const TargetLowering *TL = MF.getSubtarget().getTargetLowering();
const TargetRegisterClass *RC = TL->getRegClassFor(RegVT);		const TargetRegisterClass *RC = TL->getRegClassFor(RegVT);
for (MCPhysReg PReg : RemainingRegs) {		for (MCPhysReg PReg : RemainingRegs) {
		if (GuardedForwardedRegs.count(PReg) == 0) {
unsigned VReg = MF.addLiveIn(PReg, RC);		unsigned VReg = MF.addLiveIn(PReg, RC);
Forwards.push_back(ForwardedRegister(VReg, PReg, RegVT));		Forwards.push_back(ForwardedRegister(VReg, PReg, RegVT));
		} else
		Forwards.push_back(ForwardedRegister(0, PReg, RegVT));
}		}
}		}
}		}

bool CCState::resultsCompatible(CallingConv::ID CalleeCC,		bool CCState::resultsCompatible(CallingConv::ID CalleeCC,
CallingConv::ID CallerCC, MachineFunction &MF,		CallingConv::ID CallerCC, MachineFunction &MF,
LLVMContext &C,		LLVMContext &C,
const SmallVectorImpl<ISD::InputArg> &Ins,		const SmallVectorImpl<ISD::InputArg> &Ins,
Show All 31 Lines
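
The guarded/unguarded split performed by the new loop in analyzeMustTailForwardedRegisters can be modelled outside LLVM. The following is a minimal sketch, not the patch's code: `ForwardedRegModel`, `partitionForwardedRegs`, and the vreg numbering are hypothetical names introduced for illustration, and treating a VReg of 0 as the guarded marker is an assumption, since the ForwardedRegister definition is not shown in this hunk.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for llvm::ForwardedRegister. The real struct is not
// shown in this hunk; a VReg of 0 is ASSUMED to mean "guarded".
struct ForwardedRegModel {
  unsigned VReg; // 0 => no virtual register was created for this register
  unsigned PReg; // physical register number (arbitrary in this model)
  bool isGuarded() const { return VReg == 0; }
};

// Mirrors the loop body in the diff: an unguarded register gets a fresh
// virtual register (modelling MF.addLiveIn), while a guarded register is
// recorded with VReg 0 so that no copy into a virtual register exists.
std::vector<ForwardedRegModel>
partitionForwardedRegs(const std::vector<unsigned> &RemainingRegs,
                       const std::set<unsigned> &GuardedRegs) {
  std::vector<ForwardedRegModel> Forwards;
  unsigned NextVReg = 1; // hypothetical numbering; 0 is reserved as a marker
  for (unsigned PReg : RemainingRegs) {
    if (GuardedRegs.count(PReg) == 0)
      Forwards.push_back({NextVReg++, PReg});
    else
      Forwards.push_back({0, PReg});
  }
  return Forwards;
}
```

Under these assumptions, a guarded register never gets a virtual register, so the register allocator has nothing to spill for it outside the %al-protected region.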

llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp

Show First 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	for (const Instruction &I : BB) {
if (const auto *II = dyn_cast<IntrinsicInst>(&I)) {		if (const auto *II = dyn_cast<IntrinsicInst>(&I)) {
if (II->getIntrinsicID() == Intrinsic::vastart)		if (II->getIntrinsicID() == Intrinsic::vastart)
MF->getFrameInfo().setHasVAStart(true);		MF->getFrameInfo().setHasVAStart(true);
}		}

// If we have a musttail call in a variadic function, we need to ensure we		// If we have a musttail call in a variadic function, we need to ensure we
// forward implicit register parameters.		// forward implicit register parameters.
if (const auto *CI = dyn_cast<CallInst>(&I)) {		if (const auto *CI = dyn_cast<CallInst>(&I)) {
		// Skip calls to the llvm::Intrinsic::icall_branch_funnel intrinsic:
		// varargs parameters are not stored explicitly for icall_branch_funnel.
		if (CI->getCalledFunction() &&
		CI->getCalledFunction()->getIntrinsicID() ==
		llvm::Intrinsic::icall_branch_funnel)
		continue;

if (CI->isMustTailCall() && Fn->isVarArg())		if (CI->isMustTailCall() && Fn->isVarArg())
MF->getFrameInfo().setHasMustTailInVarArgFunc(true);		MF->getFrameInfo().setHasMustTailInVarArgFunc(true);
}		}

// Mark values used outside their block as exported, by allocating		// Mark values used outside their block as exported, by allocating
// a virtual register for them.		// a virtual register for them.
if (isUsedOutsideOfDefiningBlock(&I))		if (isUsedOutsideOfDefiningBlock(&I))
if (!isa<AllocaInst>(I) || !StaticAllocaMap.count(cast<AllocaInst>(&I)))		if (!isa<AllocaInst>(I) || !StaticAllocaMap.count(cast<AllocaInst>(&I)))
▲ Show 20 Lines • Show All 327 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64CallLowering.cpp

Show First 20 Lines • Show All 385 Lines • ▼ Show 20 Lines	CCState CCInfo(F.getCallingConv(), /*IsVarArg=*/true, MF, ArgLocs,
F.getContext());		F.getContext());
SmallVector<MVT, 2> RegParmTypes;		SmallVector<MVT, 2> RegParmTypes;
RegParmTypes.push_back(MVT::i64);		RegParmTypes.push_back(MVT::i64);
RegParmTypes.push_back(MVT::f128);		RegParmTypes.push_back(MVT::f128);

// Later on, we can use this vector to restore the registers if necessary.		// Later on, we can use this vector to restore the registers if necessary.
SmallVectorImpl<ForwardedRegister> &Forwards =		SmallVectorImpl<ForwardedRegister> &Forwards =
FuncInfo->getForwardedMustTailRegParms();		FuncInfo->getForwardedMustTailRegParms();
CCInfo.analyzeMustTailForwardedRegisters(Forwards, RegParmTypes, AssignFn);
		SmallDenseSet<MCPhysReg, 8> GuardedRegs;
		CCInfo.analyzeMustTailForwardedRegisters(Forwards, GuardedRegs, RegParmTypes,
		AssignFn);

// Conservatively forward X8, since it might be used for an aggregate		// Conservatively forward X8, since it might be used for an aggregate
// return.		// return.
if (!CCInfo.isAllocated(AArch64::X8)) {		if (!CCInfo.isAllocated(AArch64::X8)) {
unsigned X8VReg = MF.addLiveIn(AArch64::X8, &AArch64::GPR64RegClass);		unsigned X8VReg = MF.addLiveIn(AArch64::X8, &AArch64::GPR64RegClass);
Forwards.push_back(ForwardedRegister(X8VReg, AArch64::X8, MVT::i64));		Forwards.push_back(ForwardedRegister(X8VReg, AArch64::X8, MVT::i64));
}		}

▲ Show 20 Lines • Show All 621 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Show First 20 Lines • Show All 3,566 Lines • ▼ Show 20 Lines	if (isVarArg) {

if (MFI.hasMustTailInVarArgFunc()) {		if (MFI.hasMustTailInVarArgFunc()) {
SmallVector<MVT, 2> RegParmTypes;		SmallVector<MVT, 2> RegParmTypes;
RegParmTypes.push_back(MVT::i64);		RegParmTypes.push_back(MVT::i64);
RegParmTypes.push_back(MVT::f128);		RegParmTypes.push_back(MVT::f128);
// Compute the set of forwarded registers. The rest are scratch.		// Compute the set of forwarded registers. The rest are scratch.
SmallVectorImpl<ForwardedRegister> &Forwards =		SmallVectorImpl<ForwardedRegister> &Forwards =
FuncInfo->getForwardedMustTailRegParms();		FuncInfo->getForwardedMustTailRegParms();
CCInfo.analyzeMustTailForwardedRegisters(Forwards, RegParmTypes,
CC_AArch64_AAPCS);		SmallDenseSet<MCPhysReg, 8> GuardedRegs;
		CCInfo.analyzeMustTailForwardedRegisters(Forwards, GuardedRegs,
		RegParmTypes, CC_AArch64_AAPCS);

// Conservatively forward X8, since it might be used for aggregate return.		// Conservatively forward X8, since it might be used for aggregate return.
if (!CCInfo.isAllocated(AArch64::X8)) {		if (!CCInfo.isAllocated(AArch64::X8)) {
unsigned X8VReg = MF.addLiveIn(AArch64::X8, &AArch64::GPR64RegClass);		unsigned X8VReg = MF.addLiveIn(AArch64::X8, &AArch64::GPR64RegClass);
Forwards.push_back(ForwardedRegister(X8VReg, AArch64::X8, MVT::i64));		Forwards.push_back(ForwardedRegister(X8VReg, AArch64::X8, MVT::i64));
}		}
}		}
}		}
▲ Show 20 Lines • Show All 9,959 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86ExpandPseudo.cpp

Show First 20 Lines • Show All 58 Lines • ▼ Show 20 Lines	public:
}		}

private:		private:
void ExpandICallBranchFunnel(MachineBasicBlock *MBB,		void ExpandICallBranchFunnel(MachineBasicBlock *MBB,
MachineBasicBlock::iterator MBBI);		MachineBasicBlock::iterator MBBI);

bool ExpandMI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI);		bool ExpandMI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI);
bool ExpandMBB(MachineBasicBlock &MBB);		bool ExpandMBB(MachineBasicBlock &MBB);

		void expandSaveVarargXmmRegs(MachineBasicBlock *MBB,
		MachineBasicBlock::iterator MBBI) const;

		void createTailCallBlocksPair(MachineBasicBlock &OriginalTailCallBlk,
		MachineBasicBlock::iterator &TCPseudoInstr);
};		};
char X86ExpandPseudo::ID = 0;		char X86ExpandPseudo::ID = 0;

} // End anonymous namespace.		} // End anonymous namespace.

INITIALIZE_PASS(X86ExpandPseudo, DEBUG_TYPE, X86_EXPAND_PSEUDO_NAME, false,		INITIALIZE_PASS(X86ExpandPseudo, DEBUG_TYPE, X86_EXPAND_PSEUDO_NAME, false,
false)		false)

▲ Show 20 Lines • Show All 93 Lines • ▼ Show 20 Lines	void X86ExpandPseudo::ExpandICallBranchFunnel(
for (auto P : TargetMBBs) {		for (auto P : TargetMBBs) {
MF->insert(InsPt, P.first);		MF->insert(InsPt, P.first);
BuildMI(P.first, DL, TII->get(X86::TAILJMPd64))		BuildMI(P.first, DL, TII->get(X86::TAILJMPd64))
.add(JTInst->getOperand(3 + 2 * P.second));		.add(JTInst->getOperand(3 + 2 * P.second));
}		}
JTMBB->erase(JTInst);		JTMBB->erase(JTInst);
}		}

		// This function replaces the original tail call instruction with two
		// versions of it. One is identical to the original; the other is preceded
		// by code restoring the xmm registers. A branch that checks %al is also
		// created to select the proper version of the tail call. This %al trick
		// is specific to AMD64.
		//
		//   f_thunk:                  f_thunk:
		//   # %bb.1:          =>      # %bb.1:
		//     addq 32, %rsp             testb %al, %al
		//     jmpq tc_func              je .LBB0_2
		//                             # %bb.2:
		//                               movaps 96(%rsp), %xmm0
		//                               addq 32, %rsp
		//                               jmpq tc_func
		//                             .LBB0_2:
		//                             # %bb.3:
		//                               addq 32, %rsp
		//                               jmpq tc_func
		//
		void X86ExpandPseudo::createTailCallBlocksPair(
		MachineBasicBlock &OriginalTailCallBlk,
		MachineBasicBlock::iterator &TCPseudoInstr) {

		MachineFunction *Func = OriginalTailCallBlk.getParent();
		X86MachineFunctionInfo *X86Info = Func->getInfo<X86MachineFunctionInfo>();
		const auto &Forwards = X86Info->getForwardedMustTailRegParms();
		const BasicBlock *BB = OriginalTailCallBlk.getBasicBlock();

		MachineBasicBlock::iterator TailCallMInstr = std::prev(TCPseudoInstr);
		DebugLoc DL = TCPseudoInstr->getDebugLoc();

		// create two blocks for tailcalls.
		MachineFunction::iterator MBBIter = ++OriginalTailCallBlk.getIterator();
		MachineBasicBlock *TailCallBlkWithGuardedRegs =
		Func->CreateMachineBasicBlock(BB);
		MachineBasicBlock *TailCallBlk = Func->CreateMachineBasicBlock(BB);
		Func->insert(MBBIter, TailCallBlkWithGuardedRegs);
		Func->insert(MBBIter, TailCallBlk);

		TailCallBlk->transferSuccessors(&OriginalTailCallBlk);
		OriginalTailCallBlk.addSuccessor(TailCallBlkWithGuardedRegs);
		OriginalTailCallBlk.addSuccessor(TailCallBlk);

		// search for the first stack frame destroying instruction
		MachineBasicBlock::iterator FirstStackFrameDestroyingInstr = &*TailCallMInstr;
		for (MachineBasicBlock::iterator I = TCPseudoInstr;
		I != OriginalTailCallBlk.begin(); --I) {
		MachineBasicBlock::iterator PI = std::prev(I);
		if (PI->getFlag(MachineInstr::FrameDestroy))
		FirstStackFrameDestroyingInstr = PI;
		}

		// copy stack restoring code and tailcall instruction into
		// two created blocks. Delete copied instructions from the
		// OriginalTailCallBlk.
		MachineBasicBlock::iterator CurInstr = FirstStackFrameDestroyingInstr;
		do {
		// duplicate instructions and put them into new blocks.
		// handle CFI instructions separately.
		if (CurInstr->isCFIInstruction()) {
		BuildMI(*TailCallBlkWithGuardedRegs, TailCallBlkWithGuardedRegs->end(),
		TailCallBlkWithGuardedRegs->findDebugLoc(
		TailCallBlkWithGuardedRegs->begin()),
		TII->get(TargetOpcode::CFI_INSTRUCTION))
		.addCFIIndex(CurInstr->getOperand(0).getCFIIndex());
		BuildMI(*TailCallBlk, TailCallBlk->end(),
		TailCallBlk->findDebugLoc(TailCallBlk->begin()),
		TII->get(TargetOpcode::CFI_INSTRUCTION))
		.addCFIIndex(CurInstr->getOperand(0).getCFIIndex());
		rnk: This looks like a lot of very fragile pattern matching. I would greatly prefer it if we didn't have to do this.
		avl: would rewrite it.
		} else {
		TII->duplicate(*TailCallBlkWithGuardedRegs,
		TailCallBlkWithGuardedRegs->end(), *CurInstr);

		TII->duplicate(*TailCallBlk, TailCallBlk->end(), *CurInstr);
		}

		// stop copying once the tail call instruction has been reached
		if (CurInstr->getOpcode() == TailCallMInstr->getOpcode()) {
		OriginalTailCallBlk.erase(CurInstr);
		break;
		}

		CurInstr = &*OriginalTailCallBlk.erase(CurInstr);
		} while (CurInstr != OriginalTailCallBlk.end());

		// copy call site information into new tail call instructions
		OriginalTailCallBlk.getParent()->copyCallSiteInfo(
		&*TCPseudoInstr, &*TailCallBlkWithGuardedRegs->getLastNonDebugInstr());

		OriginalTailCallBlk.getParent()->copyCallSiteInfo(
		&*TCPseudoInstr, &*TailCallBlk->getLastNonDebugInstr());

		// If %al is 0, branch around the XMM save block.
		BuildMI(&OriginalTailCallBlk, DL, TII->get(X86::TEST8rr))
		.addReg(X86::AL)
		.addReg(X86::AL);
		BuildMI(&OriginalTailCallBlk, DL, TII->get(X86::JCC_1))
		.addMBB(TailCallBlk)
		.addImm(X86::COND_E);

		// add code restoring xmm registers at the start of TailCallBlkWithGuardedRegs
		MachineInstr &TailCallInstrFromGuardedBlk =
		*TailCallBlkWithGuardedRegs->getLastNonDebugInstr();

		// TODO: take into account YMM, ZMM here
		unsigned MOVOpc = STI->hasAVX() ? X86::VMOVAPSrm : X86::MOVAPSrm;

		unsigned BaseReg;
		int64_t FrameOffset = X86FL->getFrameIndexReference(
		*Func, X86Info->getThunkRegSaveFrameIndex(), BaseReg);
		int64_t SaveAreaOffset =
		(Func->getFrameInfo().hasVAStart() ? X86Info->getVarArgsFPOffset() : 0);

		int RegIdx = 0;
		for (const auto &Fwd : Forwards) {
		if (Fwd.IsGuarded()) {
		int64_t Offset = FrameOffset + SaveAreaOffset + RegIdx * 16;

		MachineMemOperand *MMO = Func->getMachineMemOperand(
		MachinePointerInfo::getFixedStack(
		*Func, X86Info->getThunkRegSaveFrameIndex(), Offset),
		MachineMemOperand::MOLoad,
		/*Size=*/16, /*Align=*/16);

		BuildMI(*TailCallBlkWithGuardedRegs, TailCallBlkWithGuardedRegs->begin(),
		DL, TII->get(MOVOpc), Fwd.PReg)
		.addReg(BaseReg)
		.addImm(/*Scale=*/1)
		.addReg(/*IndexReg=*/0)
		.addImm(/*Disp=*/Offset)
		.addReg(/*Segment=*/0)
		.addMemOperand(MMO);

		TailCallInstrFromGuardedBlk.addOperand(
		MachineOperand::CreateReg(Fwd.PReg, false /*IsDef*/, true /*IsImp*/));
		RegIdx++;
		}
		}

		// add liveins into newly created blocks
		for (auto &MO : TCPseudoInstr->operands()) {
		if (MO.isReg() && Register::isPhysicalRegister(MO.getReg())) {
		TailCallBlk->addLiveIn(MO.getReg());
		TailCallBlkWithGuardedRegs->addLiveIn(MO.getReg());
		}
		}
		}

/// If \p MBBI is a pseudo instruction, this method expands		/// If \p MBBI is a pseudo instruction, this method expands
/// it to the corresponding (sequence of) actual instruction(s).		/// it to the corresponding (sequence of) actual instruction(s).
/// \returns true if \p MBBI has been expanded.		/// \returns true if \p MBBI has been expanded.
bool X86ExpandPseudo::ExpandMI(MachineBasicBlock &MBB,		bool X86ExpandPseudo::ExpandMI(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI) {		MachineBasicBlock::iterator MBBI) {
MachineInstr &MI = *MBBI;		MachineInstr &MI = *MBBI;
unsigned Opcode = MI.getOpcode();		unsigned Opcode = MI.getOpcode();
DebugLoc DL = MBBI->getDebugLoc();		DebugLoc DL = MBBI->getDebugLoc();
▲ Show 20 Lines • Show All 86 Lines • ▼ Show 20 Lines	case X86::TCRETURNmi64: {
} else {		} else {
JumpTarget.setIsKill();		JumpTarget.setIsKill();
BuildMI(MBB, MBBI, DL, TII->get(X86::TAILJMPr))		BuildMI(MBB, MBBI, DL, TII->get(X86::TAILJMPr))
.add(JumpTarget);		.add(JumpTarget);
}		}

MachineInstr &NewMI = *std::prev(MBBI);		MachineInstr &NewMI = *std::prev(MBBI);
NewMI.copyImplicitOps(MBBI->getParent()->getParent(), MBBI);		NewMI.copyImplicitOps(MBBI->getParent()->getParent(), MBBI);
MBB.getParent()->moveCallSiteInfo(&*MBBI, &NewMI);		MBB.getParent()->copyCallSiteInfo(&*MBBI, &NewMI);

		MachineFunction *Func = MBB.getParent();
		X86MachineFunctionInfo *X86Info = Func->getInfo<X86MachineFunctionInfo>();
		const auto &Forwards = X86Info->getForwardedMustTailRegParms();

		// if tailcall return sequence is a "musttail"
		// and some of forwarded registers should be guarded
		// then replace current tailcall return sequence
		// with two return sequences: one which restores
		// guarded registers and another one which does not.
		// Otherwise, leave current tailcall return sequence as is.
		if (Func->getFrameInfo().hasMustTailInVarArgFunc()) {
		for (auto &F : Forwards)
		if (F.IsGuarded()) {
		createTailCallBlocksPair(MBB, MBBI);
		break;
		}
		}

// Delete the pseudo instruction TCRETURN.		// Delete the pseudo instruction TCRETURN.
		MBB.getParent()->eraseCallSiteInfo(&*MBBI);
MBB.erase(MBBI);		MBB.erase(MBBI);

return true;		return true;
}		}
case X86::EH_RETURN:		case X86::EH_RETURN:
case X86::EH_RETURN64: {		case X86::EH_RETURN64: {
MachineOperand &DestAddr = MBBI->getOperand(0);		MachineOperand &DestAddr = MBBI->getOperand(0);
assert(DestAddr.isReg() && "Offset should be in register!");		assert(DestAddr.isReg() && "Offset should be in register!");
▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines	case X86::LCMPXCHG16B_SAVE_RBX: {

// Delete the pseudo.		// Delete the pseudo.
MBBI->eraseFromParent();		MBBI->eraseFromParent();
return true;		return true;
}		}
case TargetOpcode::ICALL_BRANCH_FUNNEL:		case TargetOpcode::ICALL_BRANCH_FUNNEL:
ExpandICallBranchFunnel(&MBB, MBBI);		ExpandICallBranchFunnel(&MBB, MBBI);
return true;		return true;

		case X86::SAVE_VARARG_XMM_REGS:
		expandSaveVarargXmmRegs(&MBB, MBBI);
		return true;
}		}
llvm_unreachable("Previous switch has a fallthrough?");		llvm_unreachable("Previous switch has a fallthrough?");
}		}

/// Expand all pseudo instructions contained in \p MBB.		/// Expand all pseudo instructions contained in \p MBB.
/// \returns true if any expansion occurred for \p MBB.		/// \returns true if any expansion occurred for \p MBB.
bool X86ExpandPseudo::ExpandMBB(MachineBasicBlock &MBB) {		bool X86ExpandPseudo::ExpandMBB(MachineBasicBlock &MBB) {
bool Modified = false;		bool Modified = false;

// MBBI may be invalidated by the expansion.		// MBBI may be invalidated by the expansion.
MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();		MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
while (MBBI != E) {		while (MBBI != E) {
MachineBasicBlock::iterator NMBBI = std::next(MBBI);		MachineBasicBlock::iterator NMBBI = std::next(MBBI);
Modified |= ExpandMI(MBB, MBBI);		Modified |= ExpandMI(MBB, MBBI);
MBBI = NMBBI;		MBBI = NMBBI;
}		}

return Modified;		return Modified;
}		}

		/// This function replaces X86::SAVE_VARARG_XMM_REGS pseudo instruction
		/// with set of copying instructions for specified xmm vararg registers.
		///
		/// [0] parameter of X86::SAVE_VARARG_XMM_REGS is frame index of stack area,
		/// where registers should be stored
		/// [1] parameter of X86::SAVE_VARARG_XMM_REGS is offset inside stack frame
		/// to the area where registers should be stored
		/// [2] - [till end] parameters of X86::SAVE_VARARG_XMM_REGS are set of
		/// xmm registers which should be stored.
		void X86ExpandPseudo::expandSaveVarargXmmRegs(
		MachineBasicBlock *GuardedBlock,
		MachineBasicBlock::iterator SaveVarargXmmRegsInstr) const {
		assert(SaveVarargXmmRegsInstr->getOpcode() == X86::SAVE_VARARG_XMM_REGS);

		MachineFunction *Func = GuardedBlock->getParent();
		DebugLoc DL = SaveVarargXmmRegsInstr->getDebugLoc();

		int64_t FrameIndex = SaveVarargXmmRegsInstr->getOperand(0).getImm();
		unsigned BaseReg;
		int64_t FrameOffset =
		X86FL->getFrameIndexReference(*Func, FrameIndex, BaseReg);
		int64_t VarArgsRegsOffset = SaveVarargXmmRegsInstr->getOperand(1).getImm();

		// TODO: add support for YMM and ZMM here.
		unsigned MOVOpc = STI->hasAVX() ? X86::VMOVAPSmr : X86::MOVAPSmr;

		// In the XMM save block, save all the XMM argument registers.
		for (int64_t OpndIdx = 2, RegIdx = 0;
		OpndIdx < SaveVarargXmmRegsInstr->getNumOperands();
		OpndIdx++, RegIdx++) {

		int64_t Offset = FrameOffset + VarArgsRegsOffset + RegIdx * 16;

		MachineMemOperand *MMO = Func->getMachineMemOperand(
		MachinePointerInfo::getFixedStack(*Func, FrameIndex, Offset),
		MachineMemOperand::MOStore,
		/*Size=*/16, /*Align=*/16);

		BuildMI(GuardedBlock, DL, TII->get(MOVOpc))
		.addReg(BaseReg)
		.addImm(/*Scale=*/1)
		.addReg(/*IndexReg=*/0)
		.addImm(/*Disp=*/Offset)
		.addReg(/*Segment=*/0)
		.addReg(SaveVarargXmmRegsInstr->getOperand(OpndIdx).getReg())
		.addMemOperand(MMO);
		assert(Register::isPhysicalRegister(
		SaveVarargXmmRegsInstr->getOperand(OpndIdx).getReg()));

		GuardedBlock->addLiveIn(
		SaveVarargXmmRegsInstr->getOperand(OpndIdx).getReg());
		}

		// Delete the pseudo.
		SaveVarargXmmRegsInstr->eraseFromParent();
		}

bool X86ExpandPseudo::runOnMachineFunction(MachineFunction &MF) {		bool X86ExpandPseudo::runOnMachineFunction(MachineFunction &MF) {
STI = &static_cast<const X86Subtarget &>(MF.getSubtarget());		STI = &static_cast<const X86Subtarget &>(MF.getSubtarget());
TII = STI->getInstrInfo();		TII = STI->getInstrInfo();
TRI = STI->getRegisterInfo();		TRI = STI->getRegisterInfo();
X86FI = MF.getInfo<X86MachineFunctionInfo>();		X86FI = MF.getInfo<X86MachineFunctionInfo>();
X86FL = STI->getFrameLowering();		X86FL = STI->getFrameLowering();

bool Modified = false;		bool Modified = false;
Show All 9 Lines
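
The displacement arithmetic shared by expandSaveVarargXmmRegs and createTailCallBlocksPair (FrameOffset plus the save-area offset plus RegIdx * 16) can be checked in isolation. This is a rough sketch under stated assumptions: `xmmSaveDisplacements` and `XmmSlotSize` are hypothetical names, and the offsets are plain integers, whereas in the patch they come from getFrameIndexReference() and from operand 1 of the pseudo instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Each XMM register occupies one 16-byte slot, matching the "RegIdx * 16"
// term and the 16-byte size and alignment of the memory operand in the diff.
constexpr int64_t XmmSlotSize = 16;

// Returns the displacement used for each saved register, in operand order.
// FrameOffset and VarArgsRegsOffset are illustrative inputs only; no frame
// layout is computed here.
std::vector<int64_t> xmmSaveDisplacements(int64_t FrameOffset,
                                          int64_t VarArgsRegsOffset,
                                          unsigned NumRegs) {
  assert(NumRegs <= 8 && "AMD64 passes at most 8 vararg values in xmm0-xmm7");
  std::vector<int64_t> Disps;
  for (unsigned RegIdx = 0; RegIdx < NumRegs; ++RegIdx)
    Disps.push_back(FrameOffset + VarArgsRegsOffset + RegIdx * XmmSlotSize);
  return Disps;
}
```

For instance, with a frame offset of -176 and a save-area offset of 48, three registers would land at -128, -112 and -96. Because the store and reload sides use the same formula, a register saved at slot RegIdx is reloaded from the same slot, provided both sides enumerate the registers in the same order.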

llvm/lib/Target/X86/X86ISelLowering.h

Show All 18 Lines
#include "llvm/CodeGen/TargetLowering.h"		#include "llvm/CodeGen/TargetLowering.h"

namespace llvm {		namespace llvm {
class X86Subtarget;		class X86Subtarget;
class X86TargetMachine;		class X86TargetMachine;

namespace X86ISD {		namespace X86ISD {
// X86 Specific DAG Nodes		// X86 Specific DAG Nodes
enum NodeType : unsigned {		enum NodeType : unsigned {
		Lint: Pre-merge checks: clang-format: please reformat the code
- FNSTSW16r, - - // Store contents of %ah into %eflags. - SAHF, - - // Get a random integer and indicate whether it is valid in CF. - RDRAND, - - // Get a NIST SP800-90B & C compliant random integer and - // indicate whether it is valid in CF. - RDSEED, - - // Protection keys - // RDPKRU - Operand 0 is chain. Operand 1 is value for ECX. - // WRPKRU - Operand 0 is chain. Operand 1 is value for EDX. Operand 2 is - // value for ECX. - RDPKRU, WRPKRU, - - // SSE42 string comparisons. - // These nodes produce 3 results, index, mask, and flags. X86ISelDAGToDAG - // will emit one or two instructions based on which results are used. If - // flags and index/mask this allows us to use a single instruction since - // we won't have to pick and opcode for flags. Instead we can rely on the - // DAG to CSE everything and decide at isel. - PCMPISTR, - PCMPESTR, - - // Test if in transactional execution. - XTEST, - - // ERI instructions. - RSQRT28, RSQRT28_SAE, RSQRT28S, RSQRT28S_SAE, - RCP28, RCP28_SAE, RCP28S, RCP28S_SAE, EXP2, EXP2_SAE, - - // Conversions between float and half-float. - CVTPS2PH, CVTPH2PS, CVTPH2PS_SAE, - - // Masked version of above. - // SRC, RND, PASSTHRU, MASK - MCVTPS2PH, - - // Galois Field Arithmetic Instructions - GF2P8AFFINEINVQB, GF2P8AFFINEQB, GF2P8MULB, - - // LWP insert record. - LWPINS, - - // User level wait - UMWAIT, TPAUSE, - - // Enqueue Stores Instructions - ENQCMD, ENQCMDS, - - // For avx512-vp2intersect - VP2INTERSECT, - - /// X86 strict FP compare instructions. - STRICT_FCMP = ISD::FIRST_TARGET_STRICTFP_OPCODE, - STRICT_FCMPS, - - // Vector packed double/float comparison. - STRICT_CMPP, - - /// Vector comparison generating mask bits for fp and - /// integer signed and unsigned data types. - STRICT_CMPM, - - // Vector float/double to signed/unsigned integer with truncation. - STRICT_CVTTP2SI, STRICT_CVTTP2UI, - - // Vector FP extend. - STRICT_VFPEXT, - - // Vector FP round. 
- STRICT_VFPROUND, - - // RndScale - Round FP Values To Include A Given Number Of Fraction Bits. - // Also used by the legacy (V)ROUND intrinsics where we mask out the - // scaling part of the immediate. - STRICT_VRNDSCALE, - - // Vector signed/unsigned integer to float/double. - STRICT_CVTSI2P, STRICT_CVTUI2P, - - // Strict FMA nodes. - STRICT_FNMADD, STRICT_FMSUB, STRICT_FNMSUB, - - // Compare and swap. - LCMPXCHG_DAG = ISD::FIRST_TARGET_MEMORY_OPCODE, - LCMPXCHG8_DAG, - LCMPXCHG16_DAG, - LCMPXCHG8_SAVE_EBX_DAG, - LCMPXCHG16_SAVE_RBX_DAG, - - /// LOCK-prefixed arithmetic read-modify-write instructions. - /// EFLAGS, OUTCHAIN = LADD(INCHAIN, PTR, RHS) - LADD, LSUB, LOR, LXOR, LAND, - - // Load, scalar_to_vector, and zero extend. - VZEXT_LOAD, - - // extract_vector_elt, store. - VEXTRACT_STORE, - - // scalar broadcast from memory - VBROADCAST_LOAD, - - // Store FP control world into i16 memory. - FNSTCW16m, - - /// This instruction implements FP_TO_SINT with the - /// integer destination in memory and a FP reg source. This corresponds - /// to the X86::FISTm instructions and the rounding mode change stuff. It - /// has two inputs (token chain and address) and two outputs (int value - /// and token chain). Memory VT specifies the type to store to. - FP_TO_INT_IN_MEM, - - /// This instruction implements SINT_TO_FP with the - /// integer source in memory and FP reg result. This corresponds to the - /// X86::FILDm instructions. It has two inputs (token chain and address) - /// and two outputs (FP value and token chain). The integer source type is - /// specified by the memory VT. - FILD, - - /// This instruction implements a fp->int store from FP stack - /// slots. This corresponds to the fist instruction. It takes a - /// chain operand, value to store, address, and glue. The memory VT - /// specifies the type to store as. - FIST, - - /// This instruction implements an extending load to FP stack slots. - /// This corresponds to the X86::FLD32m / X86::FLD64m. 
It takes a chain - /// operand, and ptr to load from. The memory VT specifies the type to - /// load from. - FLD, - - /// This instruction implements a truncating store from FP stack - /// slots. This corresponds to the X86::FST32m / X86::FST64m. It takes a - /// chain operand, value to store, address, and glue. The memory VT - /// specifies the type to store as. - FST, - - /// This instruction grabs the address of the next argument - /// from a va_list. (reads and modifies the va_list in memory) - VAARG_64, - - // Vector truncating store with unsigned/signed saturation - VTRUNCSTOREUS, VTRUNCSTORES, - // Vector truncating masked store with unsigned/signed saturation - VMTRUNCSTOREUS, VMTRUNCSTORES, - - // X86 specific gather and scatter - MGATHER, MSCATTER, - - // WARNING: Do not add anything in the end unless you want the node to - // have memop! In fact, starting from FIRST_TARGET_MEMORY_OPCODE all - // opcodes will be thought as target memory ops! - }; + enum NodeType : unsigned { + // Start the numbering where the builtin ops leave off. + FIRST_NUMBER = ISD::BUILTIN_OP_END, + + /// Bit scan forward. + BSF, + /// Bit scan reverse. + BSR, + + /// Double shift instructions. These correspond to + /// X86::SHLDxx and X86::SHRDxx instructions. + SHLD, + SHRD, + + /// Bitwise logical AND of floating point values. This corresponds + /// to X86::ANDPS or X86::ANDPD. + FAND, + + /// Bitwise logical OR of floating point values. This corresponds + /// to X86::ORPS or X86::ORPD. + FOR, + + /// Bitwise logical XOR of floating point values. This corresponds + /// to X86::XORPS or X86::XORPD. + FXOR, + + /// Bitwise logical ANDNOT of floating point values. This + /// corresponds to X86::ANDNPS or X86::ANDNPD. + FANDN, + + /// These operations represent an abstract X86 call + /// instruction, which includes a bunch of information. 
In particular the + /// operands of these node are: + /// + /// #0 - The incoming token chain + /// #1 - The callee + /// #2 - The number of arg bytes the caller pushes on the stack. + /// #3 - The number of arg bytes the callee pops off the stack. + /// #4 - The value to pass in AL/AX/EAX (optional) + /// #5 - The value to pass in DL/DX/EDX (optional) + /// + /// The result values of these nodes are: + /// + /// #0 - The outgoing token chain + /// #1 - The first register result value (optional) + /// #2 - The second register result value (optional) + /// + CALL, + + /// Same as call except it adds the NoTrack prefix. + NT_CALL, + + /// X86 compare and logical compare instructions. + CMP, + COMI, + UCOMI, + + /// X86 bit-test instructions. + BT, + + /// X86 SetCC. Operand 0 is condition code, and operand 1 is the EFLAGS + /// operand, usually produced by a CMP instruction. + SETCC, + + /// X86 Select + SELECTS, + + // Same as SETCC except it's materialized with a sbb and the value is all + // one's or all zero's. + SETCC_CARRY, // R = carry_bit ? ~0 : 0 + + /// X86 FP SETCC, implemented with CMP{cc}SS/CMP{cc}SD. + /// Operands are two FP values to compare; result is a mask of + /// 0s or 1s. Generally DTRT for C/C++ with NaNs. + FSETCC, + + /// X86 FP SETCC, similar to above, but with output as an i1 mask and + /// and a version with SAE. + FSETCCM, + FSETCCM_SAE, + + /// X86 conditional moves. Operand 0 and operand 1 are the two values + /// to select from. Operand 2 is the condition code, and operand 3 is the + /// flag operand produced by a CMP or TEST instruction. + CMOV, + + /// X86 conditional branches. Operand 0 is the chain operand, operand 1 + /// is the block to branch if condition is true, operand 2 is the + /// condition code, and operand 3 is the flag operand produced by a CMP + /// or TEST instruction. + BRCOND, + + /// BRIND node with NoTrack prefix. Operand 0 is the chain operand and + /// operand 1 is the target address. 
+ NT_BRIND, + + /// Return with a flag operand. Operand 0 is the chain operand, operand + /// 1 is the number of bytes of stack to pop. + RET_FLAG, + + /// Return from interrupt. Operand 0 is the number of bytes to pop. + IRET, + + /// Repeat fill, corresponds to X86::REP_STOSx. + REP_STOS, + + /// Repeat move, corresponds to X86::REP_MOVSx. + REP_MOVS, + + /// On Darwin, this node represents the result of the popl + /// at function entry, used for PIC code. + GlobalBaseReg, + + /// A wrapper node for TargetConstantPool, TargetJumpTable, + /// TargetExternalSymbol, TargetGlobalAddress, TargetGlobalTLSAddress, + /// MCSymbol and TargetBlockAddress. + Wrapper, + + /// Special wrapper used under X86-64 PIC mode for RIP + /// relative displacements. + WrapperRIP, + + /// Copies a 64-bit value from an MMX vector to the low word + /// of an XMM vector, with the high word zero filled. + MOVQ2DQ, + + /// Copies a 64-bit value from the low word of an XMM vector + /// to an MMX vector. + MOVDQ2Q, + + /// Copies a 32-bit value from the low word of a MMX + /// vector to a GPR. + MMX_MOVD2W, + + /// Copies a GPR into the low 32-bit word of a MMX vector + /// and zero out the high word. + MMX_MOVW2D, + + /// Extract an 8-bit value from a vector and zero extend it to + /// i32, corresponds to X86::PEXTRB. + PEXTRB, + + /// Extract a 16-bit value from a vector and zero extend it to + /// i32, corresponds to X86::PEXTRW. + PEXTRW, + + /// Insert any element of a 4 x float vector into any element + /// of a destination 4 x floatvector. + INSERTPS, + + /// Insert the lower 8-bits of a 32-bit value to a vector, + /// corresponds to X86::PINSRB. + PINSRB, + + /// Insert the lower 16-bits of a 32-bit value to a vector, + /// corresponds to X86::PINSRW. + PINSRW, + + /// Shuffle 16 8-bit values within a vector. + PSHUFB, + + /// Compute Sum of Absolute Differences. 
+ PSADBW, + /// Compute Double Block Packed Sum-Absolute-Differences + DBPSADBW, + + /// Bitwise Logical AND NOT of Packed FP values. + ANDNP, + + /// Blend where the selector is an immediate. + BLENDI, + + /// Dynamic (non-constant condition) vector blend where only the sign bits + /// of the condition elements are used. This is used to enforce that the + /// condition mask is not valid for generic VSELECT optimizations. This + /// is also used to implement the intrinsics. + /// Operands are in VSELECT order: MASK, TRUE, FALSE + BLENDV, + + /// Combined add and sub on an FP vector. + ADDSUB, + + // FP vector ops with rounding mode. + FADD_RND, + FADDS, + FADDS_RND, + FSUB_RND, + FSUBS, + FSUBS_RND, + FMUL_RND, + FMULS, + FMULS_RND, + FDIV_RND, + FDIVS, + FDIVS_RND, + FMAX_SAE, + FMAXS_SAE, + FMIN_SAE, + FMINS_SAE, + FSQRT_RND, + FSQRTS, + FSQRTS_RND, + + // FP vector get exponent. + FGETEXP, + FGETEXP_SAE, + FGETEXPS, + FGETEXPS_SAE, + // Extract Normalized Mantissas. + VGETMANT, + VGETMANT_SAE, + VGETMANTS, + VGETMANTS_SAE, + // FP Scale. + SCALEF, + SCALEF_RND, + SCALEFS, + SCALEFS_RND, + + // Unsigned Integer average. + AVG, + + /// Integer horizontal add/sub. + HADD, + HSUB, + + /// Floating point horizontal add/sub. + FHADD, + FHSUB, + + // Detect Conflicts Within a Vector + CONFLICT, + + /// Floating point max and min. + FMAX, + FMIN, + + /// Commutative FMIN and FMAX. + FMAXC, + FMINC, + + /// Scalar intrinsic floating point max and min. + FMAXS, + FMINS, + + /// Floating point reciprocal-sqrt and reciprocal approximation. + /// Note that these typically require refinement + /// in order to obtain suitable precision. + FRSQRT, + FRCP, + + // AVX-512 reciprocal approximations with a little more precision. + RSQRT14, + RSQRT14S, + RCP14, + RCP14S, + + // Thread Local Storage. + TLSADDR, + + // Thread Local Storage. A call to get the start address + // of the TLS block for the current module. + TLSBASEADDR, + + // Thread Local Storage. 
When calling to an OS provided + // thunk at the address from an earlier relocation. + TLSCALL, + + // Exception Handling helpers. + EH_RETURN, + + // SjLj exception handling setjmp. + EH_SJLJ_SETJMP, + + // SjLj exception handling longjmp. + EH_SJLJ_LONGJMP, + + // SjLj exception handling dispatch. + EH_SJLJ_SETUP_DISPATCH, + + /// Tail call return. See X86TargetLowering::LowerCall for + /// the list of operands. + TC_RETURN, + + // Vector move to low scalar and zero higher vector elements. + VZEXT_MOVL, + + // Vector integer truncate. + VTRUNC, + // Vector integer truncate with unsigned/signed saturation. + VTRUNCUS, + VTRUNCS, + + // Masked version of the above. Used when less than a 128-bit result is + // produced since the mask only applies to the lower elements and can't + // be represented by a select. + // SRC, PASSTHRU, MASK + VMTRUNC, + VMTRUNCUS, + VMTRUNCS, + + // Vector FP extend. + VFPEXT, + VFPEXT_SAE, + VFPEXTS, + VFPEXTS_SAE, + + // Vector FP round. + VFPROUND, + VFPROUND_RND, + VFPROUNDS, + VFPROUNDS_RND, + + // Masked version of above. Used for v2f64->v4f32. + // SRC, PASSTHRU, MASK + VMFPROUND, + + // 128-bit vector logical left / right shift + VSHLDQ, + VSRLDQ, + + // Vector shift elements + VSHL, + VSRL, + VSRA, + + // Vector variable shift + VSHLV, + VSRLV, + VSRAV, + + // Vector shift elements by immediate + VSHLI, + VSRLI, + VSRAI, + + // Shifts of mask registers. + KSHIFTL, + KSHIFTR, + + // Bit rotate by immediate + VROTLI, + VROTRI, + + // Vector packed double/float comparison. + CMPP, + + // Vector integer comparisons. + PCMPEQ, + PCMPGT, + + // v8i16 Horizontal minimum and position. + PHMINPOS, + + MULTISHIFT, + + /// Vector comparison generating mask bits for fp and + /// integer signed and unsigned data types. + CMPM, + // Vector comparison with SAE for FP values + CMPM_SAE, + + // Arithmetic operations with FLAGS results. + ADD, + SUB, + ADC, + SBB, + SMUL, + UMUL, + OR, + XOR, + AND, + + // Bit field extract. 
+ BEXTR, + + // Zero High Bits Starting with Specified Bit Position. + BZHI, + + // X86-specific multiply by immediate. + MUL_IMM, + + // Vector sign bit extraction. + MOVMSK, + + // Vector bitwise comparisons. + PTEST, + + // Vector packed fp sign bitwise comparisons. + TESTP, + + // OR/AND test for masks. + KORTEST, + KTEST, + + // ADD for masks. + KADD, + + // Several flavors of instructions with vector shuffle behaviors. + // Saturated signed/unnsigned packing. + PACKSS, + PACKUS, + // Intra-lane alignr. + PALIGNR, + // AVX512 inter-lane alignr. + VALIGN, + PSHUFD, + PSHUFHW, + PSHUFLW, + SHUFP, + // VBMI2 Concat & Shift. + VSHLD, + VSHRD, + VSHLDV, + VSHRDV, + // Shuffle Packed Values at 128-bit granularity. + SHUF128, + MOVDDUP, + MOVSHDUP, + MOVSLDUP, + MOVLHPS, + MOVHLPS, + MOVSD, + MOVSS, + UNPCKL, + UNPCKH, + VPERMILPV, + VPERMILPI, + VPERMI, + VPERM2X128, + + // Variable Permute (VPERM). + // Res = VPERMV MaskV, V0 + VPERMV, + + // 3-op Variable Permute (VPERMT2). + // Res = VPERMV3 V0, MaskV, V1 + VPERMV3, + + // Bitwise ternary logic. + VPTERNLOG, + // Fix Up Special Packed Float32/64 values. + VFIXUPIMM, + VFIXUPIMM_SAE, + VFIXUPIMMS, + VFIXUPIMMS_SAE, + // Range Restriction Calculation For Packed Pairs of Float32/64 values. + VRANGE, + VRANGE_SAE, + VRANGES, + VRANGES_SAE, + // Reduce - Perform Reduction Transformation on scalar\packed FP. + VREDUCE, + VREDUCE_SAE, + VREDUCES, + VREDUCES_SAE, + // RndScale - Round FP Values To Include A Given Number Of Fraction Bits. + // Also used by the legacy (V)ROUND intrinsics where we mask out the + // scaling part of the immediate. + VRNDSCALE, + VRNDSCALE_SAE, + VRNDSCALES, + VRNDSCALES_SAE, + // Tests Types Of a FP Values for packed types. + VFPCLASS, + // Tests Types Of a FP Values for scalar types. + VFPCLASSS, + + // Broadcast (splat) scalar or element 0 of a vector. If the operand is + // a vector, this node may change the vector length as part of the splat. + VBROADCAST, + // Broadcast mask to vector. 
+ VBROADCASTM, + // Broadcast subvector to vector. + SUBV_BROADCAST, + + /// SSE4A Extraction and Insertion. + EXTRQI, + INSERTQI, + + // XOP arithmetic/logical shifts. + VPSHA, + VPSHL, + // XOP signed/unsigned integer comparisons. + VPCOM, + VPCOMU, + // XOP packed permute bytes. + VPPERM, + // XOP two source permutation. + VPERMIL2, + + // Vector multiply packed unsigned doubleword integers. + PMULUDQ, + // Vector multiply packed signed doubleword integers. + PMULDQ, + // Vector Multiply Packed UnsignedIntegers with Round and Scale. + MULHRS, + + // Multiply and Add Packed Integers. + VPMADDUBSW, + VPMADDWD, + + // AVX512IFMA multiply and add. + // NOTE: These are different than the instruction and perform + // op0 x op1 + op2. + VPMADD52L, + VPMADD52H, + + // VNNI + VPDPBUSD, + VPDPBUSDS, + VPDPWSSD, + VPDPWSSDS, + + // FMA nodes. + // We use the target independent ISD::FMA for the non-inverted case. + FNMADD, + FMSUB, + FNMSUB, + FMADDSUB, + FMSUBADD, + + // FMA with rounding mode. + FMADD_RND, + FNMADD_RND, + FMSUB_RND, + FNMSUB_RND, + FMADDSUB_RND, + FMSUBADD_RND, + + // Compress and expand. + COMPRESS, + EXPAND, + + // Bits shuffle + VPSHUFBITQMB, + + // Convert Unsigned/Integer to Floating-Point Value with rounding mode. + SINT_TO_FP_RND, + UINT_TO_FP_RND, + SCALAR_SINT_TO_FP, + SCALAR_UINT_TO_FP, + SCALAR_SINT_TO_FP_RND, + SCALAR_UINT_TO_FP_RND, + + // Vector float/double to signed/unsigned integer. + CVTP2SI, + CVTP2UI, + CVTP2SI_RND, + CVTP2UI_RND, + // Scalar float/double to signed/unsigned integer. + CVTS2SI, + CVTS2UI, + CVTS2SI_RND, + CVTS2UI_RND, + + // Vector float/double to signed/unsigned integer with truncation. + CVTTP2SI, + CVTTP2UI, + CVTTP2SI_SAE, + CVTTP2UI_SAE, + // Scalar float/double to signed/unsigned integer with truncation. + CVTTS2SI, + CVTTS2UI, + CVTTS2SI_SAE, + CVTTS2UI_SAE, + + // Vector signed/unsigned integer to float/double. + CVTSI2P, + CVTUI2P, + + // Masked versions of above. Used for v2f64->v4f32. 
+ // SRC, PASSTHRU, MASK + MCVTP2SI, + MCVTP2UI, + MCVTTP2SI, + MCVTTP2UI, + MCVTSI2P, + MCVTUI2P, + + // Vector float to bfloat16. + // Convert TWO packed single data to one packed BF16 data + CVTNE2PS2BF16, + // Convert packed single data to packed BF16 data + CVTNEPS2BF16, + // Masked version of above. + // SRC, PASSTHRU, MASK + MCVTNEPS2BF16, + + // Dot product of BF16 pairs to accumulated into + // packed single precision. + DPBF16PS, + + // Save xmm argument registers to the stack, according to %al. An operator + // is needed so that this can be expanded with control flow. + VASTART_SAVE_XMM_REGS, + + // Save xmm argument registers of the vararg thunk function to the stack, + // according to %al. An operator is needed so that this can be expanded with + // control flow. + MUSTTAIL_SAVE_GUARDED_REGS, + + // Windows's _chkstk call to do stack probing. + WIN_ALLOCA, + + // For allocating variable amounts of stack space when using + // segmented stacks. Check if the current stacklet has enough space, and + // falls back to heap allocation if not. + SEG_ALLOCA, + + // Memory barriers. + MEMBARRIER, + MFENCE, + + // Store FP status word into i16 register. + FNSTSW16r, + + // Store contents of %ah into %eflags. + SAHF, + + // Get a random integer and indicate whether it is valid in CF. + RDRAND, + + // Get a NIST SP800-90B & C compliant random integer and + // indicate whether it is valid in CF. + RDSEED, + + // Protection keys + // RDPKRU - Operand 0 is chain. Operand 1 is value for ECX. + // WRPKRU - Operand 0 is chain. Operand 1 is value for EDX. Operand 2 is + // value for ECX. + RDPKRU, + WRPKRU, + + // SSE42 string comparisons. + // These nodes produce 3 results, index, mask, and flags. X86ISelDAGToDAG + // will emit one or two instructions based on which results are used. If + // flags and index/mask this allows us to use a single instruction since + // we won't have to pick and opcode for flags. 
Instead we can rely on the + // DAG to CSE everything and decide at isel. + PCMPISTR, + PCMPESTR, + + // Test if in transactional execution. + XTEST, + + // ERI instructions. + RSQRT28, + RSQRT28_SAE, + RSQRT28S, + RSQRT28S_SAE, + RCP28, + RCP28_SAE, + RCP28S, + RCP28S_SAE, + EXP2, + EXP2_SAE, + + // Conversions between float and half-float. + CVTPS2PH, + CVTPH2PS, + CVTPH2PS_SAE, + + // Masked version of above. + // SRC, RND, PASSTHRU, MASK + MCVTPS2PH, + + // Galois Field Arithmetic Instructions + GF2P8AFFINEINVQB, + GF2P8AFFINEQB, + GF2P8MULB, + + // LWP insert record. + LWPINS, + + // User level wait + UMWAIT, + TPAUSE, + + // Enqueue Stores Instructions + ENQCMD, + ENQCMDS, + + // For avx512-vp2intersect + VP2INTERSECT, + + /// X86 strict FP compare instructions. + STRICT_FCMP = ISD::FIRST_TARGET_STRICTFP_OPCODE, + STRICT_FCMPS, + + // Vector packed double/float comparison. + STRICT_CMPP, + + /// Vector comparison generating mask bits for fp and + /// integer signed and unsigned data types. + STRICT_CMPM, + + // Vector float/double to signed/unsigned integer with truncation. + STRICT_CVTTP2SI, + STRICT_CVTTP2UI, + + // Vector FP extend. + STRICT_VFPEXT, + + // Vector FP round. + STRICT_VFPROUND, + + // RndScale - Round FP Values To Include A Given Number Of Fraction Bits. + // Also used by the legacy (V)ROUND intrinsics where we mask out the + // scaling part of the immediate. + STRICT_VRNDSCALE, + + // Vector signed/unsigned integer to float/double. + STRICT_CVTSI2P, + STRICT_CVTUI2P, + + // Strict FMA nodes. + STRICT_FNMADD, + STRICT_FMSUB, + STRICT_FNMSUB, + + // Compare and swap. + LCMPXCHG_DAG = ISD::FIRST_TARGET_MEMORY_OPCODE, + LCMPXCHG8_DAG, + LCMPXCHG16_DAG, + LCMPXCHG8_SAVE_EBX_DAG, + LCMPXCHG16_SAVE_RBX_DAG, + + /// LOCK-prefixed arithmetic read-modify-write instructions. + /// EFLAGS, OUTCHAIN = LADD(INCHAIN, PTR, RHS) + LADD, + LSUB, + LOR, + LXOR, + LAND, + + // Load, scalar_to_vector, and zero extend. 
+ VZEXT_LOAD, + + // extract_vector_elt, store. + VEXTRACT_STORE, + + // scalar broadcast from memory + VBROADCAST_LOAD, + + // Store FP control world into i16 memory. + FNSTCW16m, + + /// This instruction implements FP_TO_SINT with the + /// integer destination in memory and a FP reg source. This corresponds + /// to the X86::FISTm instructions and the rounding mode change stuff. It + /// has two inputs (token chain and address) and two outputs (int value + /// and token chain). Memory VT specifies the type to store to. + FP_TO_INT_IN_MEM, + + /// This instruction implements SINT_TO_FP with the + /// integer source in memory and FP reg result. This corresponds to the + /// X86::FILDm instructions. It has two inputs (token chain and address) + /// and two outputs (FP value and token chain). The integer source type is + /// specified by the memory VT. + FILD, + + /// This instruction implements a fp->int store from FP stack + /// slots. This corresponds to the fist instruction. It takes a + /// chain operand, value to store, address, and glue. The memory VT + /// specifies the type to store as. + FIST, + + /// This instruction implements an extending load to FP stack slots. + /// This corresponds to the X86::FLD32m / X86::FLD64m. It takes a chain + /// operand, and ptr to load from. The memory VT specifies the type to + /// load from. + FLD, + + /// This instruction implements a truncating store from FP stack + /// slots. This corresponds to the X86::FST32m / X86::FST64m. It takes a + /// chain operand, value to store, address, and glue. The memory VT + /// specifies the type to store as. + FST, + + /// This instruction grabs the address of the next argument + /// from a va_list. 
(reads and modifies the va_list in memory)
+ VAARG_64,
+
+ // Vector truncating store with unsigned/signed saturation
+ VTRUNCSTOREUS,
+ VTRUNCSTORES,
+ // Vector truncating masked store with unsigned/signed saturation
+ VMTRUNCSTOREUS,
+ VMTRUNCSTORES,
+
+ // X86 specific gather and scatter
+ MGATHER,
+ MSCATTER,
+
+ // WARNING: Do not add anything in the end unless you want the node to
+ // have memop! In fact, starting from FIRST_TARGET_MEMORY_OPCODE all
+ // opcodes will be thought as target memory ops!
+ };

Lint: Pre-merge checks: clang-format: please reformat the code
// Start the numbering where the builtin ops leave off.		// Start the numbering where the builtin ops leave off.
FIRST_NUMBER = ISD::BUILTIN_OP_END,		FIRST_NUMBER = ISD::BUILTIN_OP_END,

/// Bit scan forward.		/// Bit scan forward.
BSF,		BSF,
/// Bit scan reverse.		/// Bit scan reverse.
BSR,		BSR,

[220 lines not shown]	enum NodeType : unsigned {

// Thread Local Storage. A call to get the start address		// Thread Local Storage. A call to get the start address
// of the TLS block for the current module.		// of the TLS block for the current module.
TLSBASEADDR,		TLSBASEADDR,

// Thread Local Storage. When calling to an OS provided		// Thread Local Storage. When calling to an OS provided
// thunk at the address from an earlier relocation.		// thunk at the address from an earlier relocation.
TLSCALL,		TLSCALL,

		asl: Do you really need these changes?
		avl (author): No, I don't. It was done by clang-format.
		asl: Please do not add unrelated changes.
		avl (author): OK, I will delete them. clang-format made the changes above because VARARG_THUNK_SAVE_XMM_REGS was added, so in that sense they are related. But OK, I will delete them.
// Exception Handling helpers.		// Exception Handling helpers.
EH_RETURN,		EH_RETURN,

// SjLj exception handling setjmp.		// SjLj exception handling setjmp.
EH_SJLJ_SETJMP,		EH_SJLJ_SETJMP,

// SjLj exception handling longjmp.		// SjLj exception handling longjmp.
EH_SJLJ_LONGJMP,		EH_SJLJ_LONGJMP,
[251 lines not shown]	enum NodeType : unsigned {
// Dot product of BF16 pairs to accumulated into		// Dot product of BF16 pairs to accumulated into
// packed single precision.		// packed single precision.
DPBF16PS,		DPBF16PS,

// Save xmm argument registers to the stack, according to %al. An operator		// Save xmm argument registers to the stack, according to %al. An operator
// is needed so that this can be expanded with control flow.		// is needed so that this can be expanded with control flow.
VASTART_SAVE_XMM_REGS,		VASTART_SAVE_XMM_REGS,

		// Save xmm argument registers of the vararg thunk function to the stack,
		// according to %al. An operator is needed so that this can be expanded with
		// control flow.
		MUSTTAIL_SAVE_GUARDED_REGS,

// Windows's _chkstk call to do stack probing.		// Windows's _chkstk call to do stack probing.
WIN_ALLOCA,		WIN_ALLOCA,

// For allocating variable amounts of stack space when using		// For allocating variable amounts of stack space when using
// segmented stacks. Check if the current stacklet has enough space, and		// segmented stacks. Check if the current stacklet has enough space, and
// falls back to heap allocation if not.		// falls back to heap allocation if not.
SEG_ALLOCA,		SEG_ALLOCA,
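
The WARNING comment that closes the NodeType enum says every opcode from FIRST_TARGET_MEMORY_OPCODE onward is treated as a target memory op, which would explain why the new MUSTTAIL_SAVE_GUARDED_REGS node is placed next to VASTART_SAVE_XMM_REGS rather than at the end. The sketch below illustrates that ordering rule with a toy enum. The `toy` namespace, the sentinel values, and `isTargetMemoryOpcode` are assumptions made for illustration; they are not the definitions used by LLVM.

```cpp
namespace toy {
// Hypothetical sentinel values; the real ones live in the ISD namespace.
constexpr unsigned BUILTIN_OP_END = 1000;
constexpr unsigned FIRST_TARGET_MEMORY_OPCODE = BUILTIN_OP_END + 500;

enum NodeType : unsigned {
  // Start the numbering where the builtin ops leave off.
  FIRST_NUMBER = BUILTIN_OP_END,
  VASTART_SAVE_XMM_REGS,
  MUSTTAIL_SAVE_GUARDED_REGS, // new node, kept among the non-memory opcodes
  WIN_ALLOCA,

  // Everything from here on is numbered at or above the sentinel.
  LCMPXCHG_DAG = FIRST_TARGET_MEMORY_OPCODE,
  VAARG_64,
};

// Hypothetical helper: the classification depends only on the numeric
// position of the opcode relative to the sentinel.
constexpr bool isTargetMemoryOpcode(unsigned Opc) {
  return Opc >= FIRST_TARGET_MEMORY_OPCODE;
}

static_assert(!isTargetMemoryOpcode(MUSTTAIL_SAVE_GUARDED_REGS),
              "a node added before the sentinel is not a memory op");
static_assert(isTargetMemoryOpcode(VAARG_64),
              "a node added after the sentinel is a memory op");
} // namespace toy
```

In this toy model, appending a node after VAARG_64 would silently make it a memory opcode, which is the situation the WARNING comment describes.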

[896 lines not shown]	private:

bool needsCmpXchgNb(Type *MemType) const;		bool needsCmpXchgNb(Type *MemType) const;

void SetupEntryBlockForSjLj(MachineInstr &MI, MachineBasicBlock *MBB,		void SetupEntryBlockForSjLj(MachineInstr &MI, MachineBasicBlock *MBB,
MachineBasicBlock *DispatchBB, int FI) const;		MachineBasicBlock *DispatchBB, int FI) const;

// Utility function to emit the low-level va_arg code for X86-64.		// Utility function to emit the low-level va_arg code for X86-64.
MachineBasicBlock *		MachineBasicBlock *
EmitVAARG64WithCustomInserter(MachineInstr &MI,		emitVAARG64WithCustomInserter(MachineInstr &MI,
MachineBasicBlock *MBB) const;		MachineBasicBlock *MBB) const;

/// Utility function to emit the xmm reg save portion of va_start.		/// Utility function to emit the xmm reg save portion of va_start.
MachineBasicBlock *		MachineBasicBlock *
EmitVAStartSaveXMMRegsWithCustomInserter(MachineInstr &BInstr,		emitVAStartSaveXMMRegsWithCustomInserter(MachineInstr &BInstr,
MachineBasicBlock *BB) const;		MachineBasicBlock *BB) const;

		/// Utility function to emit the guarded xmm regs saving block.
		MachineBasicBlock *
		emitVarargThunkSaveXMMRegsWithCustomInserter(MachineInstr &BInstr,
		MachineBasicBlock *BB) const;

		void addSaveVarargXmmRegsPseudo(MachineBasicBlock *GuardedRegsBlk,
		MachineBasicBlock *TailBlk,
		MachineInstr &SrcPseudoInstr) const;

MachineBasicBlock *EmitLoweredCascadedSelect(MachineInstr &MI1,		MachineBasicBlock *EmitLoweredCascadedSelect(MachineInstr &MI1,
MachineInstr &MI2,		MachineInstr &MI2,
MachineBasicBlock *BB) const;		MachineBasicBlock *BB) const;

MachineBasicBlock *EmitLoweredSelect(MachineInstr &I,		MachineBasicBlock *EmitLoweredSelect(MachineInstr &I,
MachineBasicBlock *BB) const;		MachineBasicBlock *BB) const;

MachineBasicBlock *EmitLoweredAtomicFP(MachineInstr &I,		MachineBasicBlock *EmitLoweredAtomicFP(MachineInstr &I,
[267 lines not shown]

llvm/lib/Target/X86/X86ISelLowering.cpp

[3,525 lines not shown]	if (MFI.hasVAStart() &&
FuncInfo->setVarArgsFrameIndex(MFI.CreateFixedObject(1, StackSize, true));		FuncInfo->setVarArgsFrameIndex(MFI.CreateFixedObject(1, StackSize, true));
}		}

// Figure out if XMM registers are in use.		// Figure out if XMM registers are in use.
assert(!(Subtarget.useSoftFloat() &&		assert(!(Subtarget.useSoftFloat() &&
F.hasFnAttribute(Attribute::NoImplicitFloat)) &&		F.hasFnAttribute(Attribute::NoImplicitFloat)) &&
"SSE register cannot be used when SSE is disabled!");		"SSE register cannot be used when SSE is disabled!");

		SmallDenseSet<MCPhysReg, 8> GuardedRegs;
		craig.topper: First letter of variables should be capitalized.
		SmallVector<SDValue, 6> LiveGPRs;
		SmallVector<SDValue, 8> LiveXMMRegs;
		SDValue ALVal;

// 64-bit calling conventions support varargs and register parameters, so we		// 64-bit calling conventions support varargs and register parameters, so we
// have to do extra work to spill them in the prologue.		// have to do extra work to spill them in the prologue.
if (Is64Bit && isVarArg && MFI.hasVAStart()) {		if (Is64Bit && isVarArg) {
// Find the first unallocated argument registers.		// Find the first unallocated argument registers.
ArrayRef<MCPhysReg> ArgGPRs = get64BitArgumentGPRs(CallConv, Subtarget);		ArrayRef<MCPhysReg> ArgGPRs = get64BitArgumentGPRs(CallConv, Subtarget);
ArrayRef<MCPhysReg> ArgXMMs = get64BitArgumentXMMs(MF, CallConv, Subtarget);		ArrayRef<MCPhysReg> ArgXMMs = get64BitArgumentXMMs(MF, CallConv, Subtarget);
unsigned NumIntRegs = CCInfo.getFirstUnallocated(ArgGPRs);		unsigned NumIntRegs = CCInfo.getFirstUnallocated(ArgGPRs);
unsigned NumXMMRegs = CCInfo.getFirstUnallocated(ArgXMMs);		unsigned NumXMMRegs = CCInfo.getFirstUnallocated(ArgXMMs);
assert(!(NumXMMRegs && !Subtarget.hasSSE1()) &&		assert(!(NumXMMRegs && !Subtarget.hasSSE1()) &&
"SSE register cannot be used when SSE is disabled!");		"SSE register cannot be used when SSE is disabled!");

// Gather all the live in physical registers.		// Gather all the live in physical registers.
SmallVector<SDValue, 6> LiveGPRs;
SmallVector<SDValue, 8> LiveXMMRegs;
SDValue ALVal;
for (MCPhysReg Reg : ArgGPRs.slice(NumIntRegs)) {		for (MCPhysReg Reg : ArgGPRs.slice(NumIntRegs)) {
unsigned GPR = MF.addLiveIn(Reg, &X86::GR64RegClass);		unsigned GPR = MF.addLiveIn(Reg, &X86::GR64RegClass);
LiveGPRs.push_back(		LiveGPRs.push_back(
DAG.getCopyFromReg(Chain, dl, GPR, MVT::i64));		DAG.getCopyFromReg(Chain, dl, GPR, MVT::i64));
}		}

if (!ArgXMMs.empty()) {		if (!ArgXMMs.empty()) {
unsigned AL = MF.addLiveIn(X86::AL, &X86::GR8RegClass);		unsigned AL = MF.addLiveIn(X86::AL, &X86::GR8RegClass);
ALVal = DAG.getCopyFromReg(Chain, dl, AL, MVT::i8);		ALVal = DAG.getCopyFromReg(Chain, dl, AL, MVT::i8);
for (MCPhysReg Reg : ArgXMMs.slice(NumXMMRegs)) {		for (MCPhysReg Reg : ArgXMMs.slice(NumXMMRegs)) {
unsigned XMMReg = MF.addLiveIn(Reg, &X86::VR128RegClass);		// FastRegisterAllocator spills virtual registers at basic
LiveXMMRegs.push_back(		// block boundary. That leads to usages of xmm registers
DAG.getCopyFromReg(Chain, dl, XMMReg, MVT::v4f32));		// outside of check for %al. Pass physical registers to
		// VASTART_SAVE_XMM_REGS to avoid unnecessary spilling.
		MF.getRegInfo().addLiveIn(Reg);
		rnk: We don't in general reference the issue tracker from code comments, only from test cases.
		LiveXMMRegs.push_back(DAG.getRegister(Reg, MVT::v4f32));

		// 'musttail' calls forward input registers from thunk function to
		// callee. AMD64 ABI allows to avoid access to xmm registers by guarding
		// them with %al register. That behavior is important for
		// NoImplicitFloat case. To implement that behavior we create set of
		// registers which should be guarded while forwarding.
		//
		// We do not guard registers for functions with "thunk" attribute
		// currently. Attribute "thunk" points to special kind of thunk
		// function: "perfectly forwarding thunk". It is assumed that functions
		// with "thunk" attribute should not be used in NoImplicitFloat case.
		if (!F.hasFnAttribute("thunk"))
		GuardedRegs.insert(Reg);
}		}
}		}

		if (MFI.hasVAStart()) {
if (IsWin64) {		if (IsWin64) {
// Get to the caller-allocated home save location. Add 8 to account		// Get to the caller-allocated home save location. Add 8 to account
// for the return address.		// for the return address.
int HomeOffset = TFI.getOffsetOfLocalArea() + 8;		int HomeOffset = TFI.getOffsetOfLocalArea() + 8;
FuncInfo->setRegSaveFrameIndex(		FuncInfo->setRegSaveFrameIndex(
MFI.CreateFixedObject(1, NumIntRegs * 8 + HomeOffset, false));		MFI.CreateFixedObject(1, NumIntRegs * 8 + HomeOffset, false));
// Fixup to set vararg frame on shadow area (4 x i64).		// Fixup to set vararg frame on shadow area (4 x i64).
if (NumIntRegs < 4)		if (NumIntRegs < 4)
FuncInfo->setVarArgsFrameIndex(FuncInfo->getRegSaveFrameIndex());		FuncInfo->setVarArgsFrameIndex(FuncInfo->getRegSaveFrameIndex());
} else {		} else {
// For X86-64, if there are vararg parameters that are passed via		// For X86-64, if there are vararg parameters that are passed via
// registers, then we must store them to their spots on the stack so		// registers, then we must store them to their spots on the stack so
// they may be loaded by dereferencing the result of va_next.		// they may be loaded by dereferencing the result of va_next.
FuncInfo->setVarArgsGPOffset(NumIntRegs * 8);		FuncInfo->setVarArgsGPOffset(NumIntRegs * 8);
FuncInfo->setVarArgsFPOffset(ArgGPRs.size() * 8 + NumXMMRegs * 16);		FuncInfo->setVarArgsFPOffset(ArgGPRs.size() * 8 + NumXMMRegs * 16);
FuncInfo->setRegSaveFrameIndex(MFI.CreateStackObject(		FuncInfo->setRegSaveFrameIndex(MFI.CreateStackObject(
ArgGPRs.size() * 8 + ArgXMMs.size() * 16, 16, false));		ArgGPRs.size() * 8 + ArgXMMs.size() * 16, 16, false));
}		}

// Store the integer parameter registers.		// Store the integer parameter registers.
SmallVector<SDValue, 8> MemOps;		SmallVector<SDValue, 8> MemOps;
SDValue RSFIN = DAG.getFrameIndex(FuncInfo->getRegSaveFrameIndex(),		SDValue RSFIN = DAG.getFrameIndex(FuncInfo->getRegSaveFrameIndex(),
getPointerTy(DAG.getDataLayout()));		getPointerTy(DAG.getDataLayout()));
unsigned Offset = FuncInfo->getVarArgsGPOffset();		unsigned Offset = FuncInfo->getVarArgsGPOffset();
for (SDValue Val : LiveGPRs) {		for (SDValue Val : LiveGPRs) {
SDValue FIN = DAG.getNode(ISD::ADD, dl, getPointerTy(DAG.getDataLayout()),		SDValue FIN =
RSFIN, DAG.getIntPtrConstant(Offset, dl));		DAG.getNode(ISD::ADD, dl, getPointerTy(DAG.getDataLayout()), RSFIN,
		DAG.getIntPtrConstant(Offset, dl));
SDValue Store =		SDValue Store =
DAG.getStore(Val.getValue(1), dl, Val, FIN,		DAG.getStore(Val.getValue(1), dl, Val, FIN,
MachinePointerInfo::getFixedStack(		MachinePointerInfo::getFixedStack(
DAG.getMachineFunction(),		DAG.getMachineFunction(),
FuncInfo->getRegSaveFrameIndex(), Offset));		FuncInfo->getRegSaveFrameIndex(), Offset));
MemOps.push_back(Store);		MemOps.push_back(Store);
Offset += 8;		Offset += 8;
}		}

if (!ArgXMMs.empty() && NumXMMRegs != ArgXMMs.size()) {		if (!ArgXMMs.empty() && NumXMMRegs != ArgXMMs.size()) {
// Now store the XMM (fp + vector) parameter registers.		// Now store the XMM (fp + vector) parameter registers.
SmallVector<SDValue, 12> SaveXMMOps;		SmallVector<SDValue, 12> SaveXMMOps;
SaveXMMOps.push_back(Chain);		SaveXMMOps.push_back(Chain);
SaveXMMOps.push_back(ALVal);		SaveXMMOps.push_back(ALVal);
SaveXMMOps.push_back(DAG.getIntPtrConstant(		SaveXMMOps.push_back(
FuncInfo->getRegSaveFrameIndex(), dl));		DAG.getIntPtrConstant(FuncInfo->getRegSaveFrameIndex(), dl));
SaveXMMOps.push_back(DAG.getIntPtrConstant(		SaveXMMOps.push_back(
FuncInfo->getVarArgsFPOffset(), dl));		DAG.getIntPtrConstant(FuncInfo->getVarArgsFPOffset(), dl));
SaveXMMOps.insert(SaveXMMOps.end(), LiveXMMRegs.begin(),		SaveXMMOps.insert(SaveXMMOps.end(), LiveXMMRegs.begin(),
LiveXMMRegs.end());		LiveXMMRegs.end());
MemOps.push_back(DAG.getNode(X86ISD::VASTART_SAVE_XMM_REGS, dl,		MemOps.push_back(DAG.getNode(X86ISD::VASTART_SAVE_XMM_REGS, dl,
MVT::Other, SaveXMMOps));		MVT::Other, SaveXMMOps));
}		}

if (!MemOps.empty())		if (!MemOps.empty())
Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, MemOps);		Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, MemOps);
}		}
		}

if (isVarArg && MFI.hasMustTailInVarArgFunc()) {		if (isVarArg && MFI.hasMustTailInVarArgFunc()) {
// Find the largest legal vector type.		// Find the largest legal vector type.
MVT VecVT = MVT::Other;		MVT VecVT = MVT::Other;
// FIXME: Only some x86_32 calling conventions support AVX512.		// FIXME: Only some x86_32 calling conventions support AVX512.
if (Subtarget.useAVX512Regs() &&		if (Subtarget.useAVX512Regs() &&
(Is64Bit || (CallConv == CallingConv::X86_VectorCall ||		(Is64Bit || (CallConv == CallingConv::X86_VectorCall ||
CallConv == CallingConv::Intel_OCL_BI)))		CallConv == CallingConv::Intel_OCL_BI)))
VecVT = MVT::v16f32;		VecVT = MVT::v16f32;
else if (Subtarget.hasAVX())		else if (Subtarget.hasAVX())
VecVT = MVT::v8f32;		VecVT = MVT::v8f32;
else if (Subtarget.hasSSE2())		else if (Subtarget.hasSSE2())
VecVT = MVT::v4f32;		VecVT = MVT::v4f32;

// We forward some GPRs and some vector types.		// We forward some GPRs and some vector types.
SmallVector<MVT, 2> RegParmTypes;		SmallVector<MVT, 2> RegParmTypes;
MVT IntVT = Is64Bit ? MVT::i64 : MVT::i32;		MVT IntVT = Is64Bit ? MVT::i64 : MVT::i32;
RegParmTypes.push_back(IntVT);		RegParmTypes.push_back(IntVT);
if (VecVT != MVT::Other)		if (VecVT != MVT::Other)
RegParmTypes.push_back(VecVT);		RegParmTypes.push_back(VecVT);

// Compute the set of forwarded registers. The rest are scratch.		// Compute the set of forwarded registers. The rest are scratch.
SmallVectorImpl<ForwardedRegister> &Forwards =		SmallVectorImpl<ForwardedRegister> &Forwards =
FuncInfo->getForwardedMustTailRegParms();		FuncInfo->getForwardedMustTailRegParms();
CCInfo.analyzeMustTailForwardedRegisters(Forwards, RegParmTypes, CC_X86);		CCInfo.analyzeMustTailForwardedRegisters(Forwards, GuardedRegs,
		RegParmTypes, CC_X86);

// Forward AL for SysV x86_64 targets, since it is used for varargs.		// Forward AL for SysV x86_64 targets, since it is used for varargs.
if (Is64Bit && !IsWin64 && !CCInfo.isAllocated(X86::AL)) {		if (Is64Bit && !IsWin64 && !CCInfo.isAllocated(X86::AL)) {
unsigned ALVReg = MF.addLiveIn(X86::AL, &X86::GR8RegClass);		unsigned ALVReg = MF.addLiveIn(X86::AL, &X86::GR8RegClass);
Forwards.push_back(ForwardedRegister(ALVReg, X86::AL, MVT::i8));		Forwards.push_back(ForwardedRegister(ALVReg, X86::AL, MVT::i8));
}		}

// Copy all forwards from physical to virtual registers.		// Copy all forwards from physical to virtual registers.
for (ForwardedRegister &FR : Forwards) {		for (ForwardedRegister &FR : Forwards) {
// FIXME: Can we use a less constrained schedule?		// FIXME: Can we use a less constrained schedule?
		if (!FR.IsGuarded()) {
SDValue RegVal = DAG.getCopyFromReg(Chain, dl, FR.VReg, FR.VT);		SDValue RegVal = DAG.getCopyFromReg(Chain, dl, FR.VReg, FR.VT);
FR.VReg = MF.getRegInfo().createVirtualRegister(getRegClassFor(FR.VT));		FR.VReg = MF.getRegInfo().createVirtualRegister(getRegClassFor(FR.VT));
Chain = DAG.getCopyToReg(Chain, dl, FR.VReg, RegVal);		Chain = DAG.getCopyToReg(Chain, dl, FR.VReg, RegVal);
}		}
}		}

		if (!GuardedRegs.empty()) {
		craig.topper: !guardedXmmRegs.empty()
		if (MFI.hasVAStart()) {
		// all incoming xmm registers are already stored by VAStart
		// handling. Reuse these stored values for thunk forwarded
		// parameters here.
		FuncInfo->setThunkRegSaveFrameIndex(FuncInfo->getRegSaveFrameIndex());
		} else {
		// TODO: implement support for YMM, ZMM vararg registers
		asl: What is the status of all these TODOs here and there?
		avl: I assume they will be done later; they are not required for correctness. 1. Not storing xmm registers in the "no fp, only musttail calls, noimplicitfloat" case: I am going to do that after this patch and D62639 are integrated. 2. Support for YMM and ZMM will be done by whoever implements YMM and ZMM register support for varargs; that is a separate task. 3. The comment for AArch64 is for whoever implements this on AArch64.

		// allocate stack space to save registers which should be guarded by
		// ABI, 16 is size of XMM
		FuncInfo->setThunkRegSaveFrameIndex(
		MFI.CreateStackObject(GuardedRegs.size() * 16, 16, false));

		// Save guarded forwards into guarded area
		SmallVector<SDValue, 8> VarargMemOps;
		SmallVector<SDValue, 12> VarargXMMOps;
		VarargXMMOps.push_back(Chain);
		VarargXMMOps.push_back(ALVal);
		VarargXMMOps.push_back(
		rnk: I haven't reviewed all of this code, but we have to find some way to refactor LowerCall. It was already poorly factored and long, but this is just too much complexity, too many lines of code. We need to find some way to separate concerns.
		avl: OK, I will refactor it. Would it be OK for that refactoring to be part of this patch, or do I need to do it beforehand in a separate patch?
		DAG.getIntPtrConstant(FuncInfo->getThunkRegSaveFrameIndex(), dl));
		VarargXMMOps.push_back(DAG.getIntPtrConstant(0, dl));
		VarargXMMOps.insert(VarargXMMOps.end(), LiveXMMRegs.begin(),
		LiveXMMRegs.end());
		VarargMemOps.push_back(DAG.getNode(X86ISD::MUSTTAIL_SAVE_GUARDED_REGS,
		dl, MVT::Other, VarargXMMOps));
		if (!VarargMemOps.empty())
		Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, VarargMemOps);
		}
		}
		}

// Some CCs need callee pop.		// Some CCs need callee pop.
if (X86::isCalleePop(CallConv, Is64Bit, isVarArg,		if (X86::isCalleePop(CallConv, Is64Bit, isVarArg,
MF.getTarget().Options.GuaranteedTailCallOpt)) {		MF.getTarget().Options.GuaranteedTailCallOpt)) {
FuncInfo->setBytesToPopOnReturn(StackSize); // Callee pops everything.		FuncInfo->setBytesToPopOnReturn(StackSize); // Callee pops everything.
} else if (CallConv == CallingConv::X86_INTR && Ins.size() == 2) {		} else if (CallConv == CallingConv::X86_INTR && Ins.size() == 2) {
// X86 interrupts must pop the error code (and the alignment padding) if		// X86 interrupts must pop the error code (and the alignment padding) if
// present.		// present.
FuncInfo->setBytesToPopOnReturn(Is64Bit ? 16 : 4);		FuncInfo->setBytesToPopOnReturn(Is64Bit ? 16 : 4);
} else {		} else {
FuncInfo->setBytesToPopOnReturn(0); // Callee pops nothing.		FuncInfo->setBytesToPopOnReturn(0); // Callee pops nothing.
// If this is an sret function, the return should pop the hidden pointer.		// If this is an sret function, the return should pop the hidden pointer.
if (!Is64Bit && !canGuaranteeTCO(CallConv) &&		if (!Is64Bit && !canGuaranteeTCO(CallConv) &&
!Subtarget.getTargetTriple().isOSMSVCRT() &&		!Subtarget.getTargetTriple().isOSMSVCRT() &&
argsAreStructReturn(Ins, Subtarget.isTargetMCU()) == StackStructReturn)		argsAreStructReturn(Ins, Subtarget.isTargetMCU()) == StackStructReturn)
FuncInfo->setBytesToPopOnReturn(4);		FuncInfo->setBytesToPopOnReturn(4);
}		}

if (!Is64Bit) {		if (!Is64Bit) {
// RegSaveFrameIndex is X86-64 only.		// RegSaveFrameIndex and ThunkRegSaveFrameIndex are X86-64 only.
FuncInfo->setRegSaveFrameIndex(0xAAAAAAA);		FuncInfo->setRegSaveFrameIndex(0xAAAAAAA);
		FuncInfo->setThunkRegSaveFrameIndex(0xAAAAAAA);
if (CallConv == CallingConv::X86_FastCall ||		if (CallConv == CallingConv::X86_FastCall ||
CallConv == CallingConv::X86_ThisCall)		CallConv == CallingConv::X86_ThisCall)
// fastcc functions can't have varargs.		// fastcc functions can't have varargs.
FuncInfo->setVarArgsFrameIndex(0xAAAAAAA);		FuncInfo->setVarArgsFrameIndex(0xAAAAAAA);
}		}

FuncInfo->setArgumentStackSize(StackSize);		FuncInfo->setArgumentStackSize(StackSize);

[385 lines not shown]	if (Is64Bit && isVarArg && !IsWin64 && !IsMustTail) {
RegsToPass.push_back(std::make_pair(unsigned(X86::AL),		RegsToPass.push_back(std::make_pair(unsigned(X86::AL),
DAG.getConstant(NumXMMRegs, dl,		DAG.getConstant(NumXMMRegs, dl,
MVT::i8)));		MVT::i8)));
}		}

if (isVarArg && IsMustTail) {		if (isVarArg && IsMustTail) {
const auto &Forwards = X86Info->getForwardedMustTailRegParms();		const auto &Forwards = X86Info->getForwardedMustTailRegParms();
for (const auto &F : Forwards) {		for (const auto &F : Forwards) {
		if (!F.IsGuarded()) {
SDValue Val = DAG.getCopyFromReg(Chain, dl, F.VReg, F.VT);		SDValue Val = DAG.getCopyFromReg(Chain, dl, F.VReg, F.VT);
RegsToPass.push_back(std::make_pair(unsigned(F.PReg), Val));		RegsToPass.push_back(std::make_pair(unsigned(F.PReg), Val));
}		}
}		}
		}

// For tail calls lower the arguments to the 'real' stack slots. Sibcalls		// For tail calls lower the arguments to the 'real' stack slots. Sibcalls
// don't need this because the eligibility check rejects calls that require		// don't need this because the eligibility check rejects calls that require
// shuffling arguments passed in memory.		// shuffling arguments passed in memory.
if (!IsSibcall && isTailCall) {		if (!IsSibcall && isTailCall) {
// Force all the incoming stack arguments to be loaded from the stack		// Force all the incoming stack arguments to be loaded from the stack
// before any new outgoing arguments are stored to the stack, because the		// before any new outgoing arguments are stored to the stack, because the
// outgoing stack slots may alias the incoming argument stack slots, and		// outgoing stack slots may alias the incoming argument stack slots, and
[25,742 lines not shown]	#define NODE_NAME_CASE(NODE) case X86ISD::NODE: return "X86ISD::" #NODE;
NODE_NAME_CASE(VRANGE_SAE)		NODE_NAME_CASE(VRANGE_SAE)
NODE_NAME_CASE(VRANGES)		NODE_NAME_CASE(VRANGES)
NODE_NAME_CASE(VRANGES_SAE)		NODE_NAME_CASE(VRANGES_SAE)
NODE_NAME_CASE(PMULUDQ)		NODE_NAME_CASE(PMULUDQ)
NODE_NAME_CASE(PMULDQ)		NODE_NAME_CASE(PMULDQ)
NODE_NAME_CASE(PSADBW)		NODE_NAME_CASE(PSADBW)
NODE_NAME_CASE(DBPSADBW)		NODE_NAME_CASE(DBPSADBW)
NODE_NAME_CASE(VASTART_SAVE_XMM_REGS)		NODE_NAME_CASE(VASTART_SAVE_XMM_REGS)
		NODE_NAME_CASE(MUSTTAIL_SAVE_GUARDED_REGS)
NODE_NAME_CASE(VAARG_64)		NODE_NAME_CASE(VAARG_64)
NODE_NAME_CASE(WIN_ALLOCA)		NODE_NAME_CASE(WIN_ALLOCA)
NODE_NAME_CASE(MEMBARRIER)		NODE_NAME_CASE(MEMBARRIER)
NODE_NAME_CASE(MFENCE)		NODE_NAME_CASE(MFENCE)
NODE_NAME_CASE(SEG_ALLOCA)		NODE_NAME_CASE(SEG_ALLOCA)
NODE_NAME_CASE(SAHF)		NODE_NAME_CASE(SAHF)
NODE_NAME_CASE(RDRAND)		NODE_NAME_CASE(RDRAND)
NODE_NAME_CASE(RDSEED)		NODE_NAME_CASE(RDSEED)
[495 lines not shown]	static MachineBasicBlock *emitXBegin(MachineInstr &MI, MachineBasicBlock *MBB,
BuildMI(*sinkMBB, sinkMBB->begin(), DL, TII->get(X86::PHI), DstReg)		BuildMI(*sinkMBB, sinkMBB->begin(), DL, TII->get(X86::PHI), DstReg)
.addReg(mainDstReg).addMBB(mainMBB)		.addReg(mainDstReg).addMBB(mainMBB)
.addReg(fallDstReg).addMBB(fallMBB);		.addReg(fallDstReg).addMBB(fallMBB);

MI.eraseFromParent();		MI.eraseFromParent();
return sinkMBB;		return sinkMBB;
}		}


		Lint: Pre-merge checks: clang-format: please reformat the code

MachineBasicBlock *		MachineBasicBlock *
X86TargetLowering::EmitVAARG64WithCustomInserter(MachineInstr &MI,		X86TargetLowering::emitVAARG64WithCustomInserter(MachineInstr &MI,
MachineBasicBlock *MBB) const {		MachineBasicBlock *MBB) const {
// Emit va_arg instruction on X86-64.		// Emit va_arg instruction on X86-64.

// Operands to this pseudo-instruction:		// Operands to this pseudo-instruction:
// 0 ) Output : destination address (reg)		// 0 ) Output : destination address (reg)
// 1-5) Input : va_list address (addr, i64mem)		// 1-5) Input : va_list address (addr, i64mem)
// 6 ) ArgSize : Size (in bytes) of vararg type		// 6 ) ArgSize : Size (in bytes) of vararg type
// 7 ) ArgMode : 0=overflow only, 1=use gp_offset, 2=use fp_offset		// 7 ) ArgMode : 0=overflow only, 1=use gp_offset, 2=use fp_offset
[242 lines not shown]	X86TargetLowering::emitVAARG64WithCustomInserter(MachineInstr &MI,
}		}

// Erase the pseudo instruction		// Erase the pseudo instruction
MI.eraseFromParent();		MI.eraseFromParent();

return endMBB;		return endMBB;
}		}

MachineBasicBlock *X86TargetLowering::EmitVAStartSaveXMMRegsWithCustomInserter(		// This function creates additional block for storing varargs guarded
MachineInstr &MI, MachineBasicBlock *MBB) const {		// registers. It adds check for %al into entry block, to skip
// Emit code to save XMM registers to the stack. The ABI says that the		// GuardedRegsBlk if xmm registers should not be stored.
// number of registers to save is given in %al, so it's theoretically		//
// possible to do an indirect jump trick to avoid saving all of them,		// EntryBlk[VAPseudoInstr]         EntryBlk
// however this code takes a simpler approach and just executes all		//           |                        |   .
// of the stores if %al is non-zero. It's less code, and it's probably		//           |                        |      .
// easier on the hardware branch predictor, and stores aren't all that		//           |                        |   GuardedRegsBlk
// expensive anyway.		//           |           =>           |      .
		//           |                        |   .
		//           |                     TailBlk[VAPseudoInstr]
		//           |                        |
		//           |                        |
		//
		static std::pair<MachineBasicBlock *, MachineBasicBlock *>
		createGuardedRegsBlock(MachineBasicBlock *EntryBlk, MachineInstr &VAPseudoInstr,
		const X86Subtarget &Subtarget) {

		MachineFunction *Func = EntryBlk->getParent();
		const TargetInstrInfo *TII = Subtarget.getInstrInfo();
		DebugLoc DL = VAPseudoInstr.getDebugLoc();
		Register CountReg = VAPseudoInstr.getOperand(0).getReg();

// Create the new basic blocks. One block contains all the XMM stores,		// Create the new basic blocks. One block contains all the XMM stores,
// and one block is the final destination regardless of whether any		// and one block is the final destination regardless of whether any
// stores were performed.		// stores were performed.
const BasicBlock *LLVM_BB = MBB->getBasicBlock();		const BasicBlock *LLVMBlk = EntryBlk->getBasicBlock();
MachineFunction *F = MBB->getParent();		MachineFunction::iterator EntryBlkIter = ++EntryBlk->getIterator();
MachineFunction::iterator MBBIter = ++MBB->getIterator();		MachineBasicBlock *GuardedRegsBlk = Func->CreateMachineBasicBlock(LLVMBlk);
MachineBasicBlock *XMMSaveMBB = F->CreateMachineBasicBlock(LLVM_BB);		MachineBasicBlock *TailBlk = Func->CreateMachineBasicBlock(LLVMBlk);
MachineBasicBlock *EndMBB = F->CreateMachineBasicBlock(LLVM_BB);		Func->insert(EntryBlkIter, GuardedRegsBlk);
F->insert(MBBIter, XMMSaveMBB);		Func->insert(EntryBlkIter, TailBlk);
F->insert(MBBIter, EndMBB);

// Transfer the remainder of MBB and its successor edges to EndMBB.		// Transfer the remainder of MBB and its successor edges to EndMBB.
EndMBB->splice(EndMBB->begin(), MBB,		TailBlk->splice(TailBlk->begin(), EntryBlk,
std::next(MachineBasicBlock::iterator(MI)), MBB->end());		std::next(MachineBasicBlock::iterator(VAPseudoInstr)),
EndMBB->transferSuccessorsAndUpdatePHIs(MBB);		EntryBlk->end());
		TailBlk->transferSuccessorsAndUpdatePHIs(EntryBlk);

// The original block will now fall through to the XMM save block.		// The original block will now fall through to the XMM save block.
MBB->addSuccessor(XMMSaveMBB);		EntryBlk->addSuccessor(GuardedRegsBlk);
// The XMMSaveMBB will fall through to the end block.		// The XMMSaveMBB will fall through to the end block.
XMMSaveMBB->addSuccessor(EndMBB);		GuardedRegsBlk->addSuccessor(TailBlk);

// Now add the instructions.
const TargetInstrInfo *TII = Subtarget.getInstrInfo();
DebugLoc DL = MI.getDebugLoc();

Register CountReg = MI.getOperand(0).getReg();
int64_t RegSaveFrameIndex = MI.getOperand(1).getImm();
int64_t VarArgsFPOffset = MI.getOperand(2).getImm();

if (!Subtarget.isCallingConvWin64(F->getFunction().getCallingConv())) {		if (!Subtarget.isCallingConvWin64(Func->getFunction().getCallingConv())) {
// If %al is 0, branch around the XMM save block.		// If %al is 0, branch around the XMM save block.
BuildMI(MBB, DL, TII->get(X86::TEST8rr)).addReg(CountReg).addReg(CountReg);		BuildMI(EntryBlk, DL, TII->get(X86::TEST8rr))
BuildMI(MBB, DL, TII->get(X86::JCC_1)).addMBB(EndMBB).addImm(X86::COND_E);		.addReg(CountReg)
MBB->addSuccessor(EndMBB);		.addReg(CountReg);
		BuildMI(EntryBlk, DL, TII->get(X86::JCC_1))
		.addMBB(TailBlk)
		.addImm(X86::COND_E);
		EntryBlk->addSuccessor(TailBlk);
}		}

		return std::make_pair(GuardedRegsBlk, TailBlk);
		}

		void X86TargetLowering::addSaveVarargXmmRegsPseudo(
		MachineBasicBlock *GuardedRegsBlk, MachineBasicBlock *TailBlk,
		MachineInstr &SrcPseudoInstr) const {
// Make sure the last operand is EFLAGS, which gets clobbered by the branch		// Make sure the last operand is EFLAGS, which gets clobbered by the branch
// that was just emitted, but clearly shouldn't be "saved".		// that was just emitted, but clearly shouldn't be "saved".
assert((MI.getNumOperands() <= 3 ||		assert((SrcPseudoInstr.getNumOperands() <= 3 ||
!MI.getOperand(MI.getNumOperands() - 1).isReg() ||		!SrcPseudoInstr.getOperand(SrcPseudoInstr.getNumOperands() - 1)
MI.getOperand(MI.getNumOperands() - 1).getReg() == X86::EFLAGS) &&		.isReg() ||
		SrcPseudoInstr.getOperand(SrcPseudoInstr.getNumOperands() - 1)
		.getReg() == X86::EFLAGS) &&
"Expected last argument to be EFLAGS");		"Expected last argument to be EFLAGS");
unsigned MOVOpc = Subtarget.hasAVX() ? X86::VMOVAPSmr : X86::MOVAPSmr;
// In the XMM save block, save all the XMM argument registers.		// create SAVE_VARARG_XMM_REGS pseudo
for (int i = 3, e = MI.getNumOperands() - 1; i != e; ++i) {		MachineInstrBuilder MIB =
int64_t Offset = (i - 3) * 16 + VarArgsFPOffset;		BuildMI(GuardedRegsBlk, SrcPseudoInstr.getDebugLoc(),
MachineMemOperand *MMO = F->getMachineMemOperand(		Subtarget.getInstrInfo()->get(X86::SAVE_VARARG_XMM_REGS));
MachinePointerInfo::getFixedStack(*F, RegSaveFrameIndex, Offset),
MachineMemOperand::MOStore,		// set Frame Index
/*Size=*/16, /*Align=*/16);		MIB.addImm(SrcPseudoInstr.getOperand(1).getImm());
BuildMI(XMMSaveMBB, DL, TII->get(MOVOpc))
.addFrameIndex(RegSaveFrameIndex)		// set ArgsOffset
.addImm(/*Scale=*/1)		MIB.addImm(SrcPseudoInstr.getOperand(2).getImm());
.addReg(/*IndexReg=*/0)
.addImm(/*Disp=*/Offset)		for (unsigned OpndIdx = 3, RegIdx = 0;
.addReg(/*Segment=*/0)		OpndIdx + 1 < SrcPseudoInstr.getNumOperands(); OpndIdx++, RegIdx++)
.addReg(MI.getOperand(i).getReg())		MIB.addReg(SrcPseudoInstr.getOperand(OpndIdx).getReg(),
.addMemOperand(MMO);		RegState::InternalRead);

		SrcPseudoInstr.eraseFromParent(); // The pseudo instruction is gone now.
}		}

MI.eraseFromParent(); // The pseudo instruction is gone now.		MachineBasicBlock *X86TargetLowering::emitVAStartSaveXMMRegsWithCustomInserter(
		MachineInstr &PseudoVaStartInstr, MachineBasicBlock *EntryBlk) const {
		// Emit code to save XMM registers to the stack. The ABI says that the
		// number of registers to save is given in %al, so it's theoretically
		// possible to do an indirect jump trick to avoid saving all of them,
		// however this code takes a simpler approach and just executes all
		// of the stores if %al is non-zero. It's less code, and it's probably
		// easier on the hardware branch predictor, and stores aren't all that
		// expensive anyway.

		MachineBasicBlock *GuardedRegsBlk = nullptr;
		MachineBasicBlock *TailBlk = nullptr;

return EndMBB;		std::tie(GuardedRegsBlk, TailBlk) =
		createGuardedRegsBlock(EntryBlk, PseudoVaStartInstr, Subtarget);

		addSaveVarargXmmRegsPseudo(GuardedRegsBlk, TailBlk, PseudoVaStartInstr);

		return TailBlk;
		}

		MachineBasicBlock *
		X86TargetLowering::emitVarargThunkSaveXMMRegsWithCustomInserter(
		MachineInstr &PseudoVarargThunkInstr, MachineBasicBlock *EntryBlk) const {
		MachineBasicBlock *GuardedRegsBlk = nullptr;
		MachineBasicBlock *TailBlk = nullptr;
		MachineFunction *Func = EntryBlk->getParent();

		// check whether GuardedRegsBlk is already created by VASTART handling code
		assert(Func->begin() != Func->end());
		for (auto &Succ : (*Func->begin()).successors()) {

		for (auto &Instr : Succ->instrs()) {
		if (Instr.getOpcode() == X86::SAVE_VARARG_XMM_REGS) {
		// GuardedRegsBlk is already created by VASTART handling code
		assert(Func->getFrameInfo().hasVAStart());
		GuardedRegsBlk = Succ;
		TailBlk = *GuardedRegsBlk->succ_begin();
		break;
		}
		}

		if (GuardedRegsBlk)
		break;
		}

		if (GuardedRegsBlk == nullptr)
		std::tie(GuardedRegsBlk, TailBlk) =
		createGuardedRegsBlock(EntryBlk, PseudoVarargThunkInstr, Subtarget);

		addSaveVarargXmmRegsPseudo(GuardedRegsBlk, TailBlk, PseudoVarargThunkInstr);

		return TailBlk;
}		}
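
Note on the block split: the control-flow change that createGuardedRegsBlock makes above can be pictured with a small standalone model. This is only a sketch, not code from the patch. ToyBlock, ToyFunction and splitAtPseudo are hypothetical names that do not exist in LLVM, instructions are plain strings, and successor edges are indices rather than MachineBasicBlock pointers. It assumes the non-Win64 path, where the %al test and the conditional branch are emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types; these are NOT LLVM's MachineBasicBlock/MachineFunction.
struct ToyBlock {
  std::string Name;
  std::vector<std::string> Instrs;
  std::vector<std::size_t> Succs; // indices into ToyFunction::Blocks
};

struct ToyFunction {
  std::vector<ToyBlock> Blocks;
};

// Splits block EntryIdx at the pseudo instruction Instrs[PseudoIdx].
// Returns the indices {guarded block, tail block}.
std::pair<std::size_t, std::size_t>
splitAtPseudo(ToyFunction &F, std::size_t EntryIdx, std::size_t PseudoIdx) {
  assert(EntryIdx < F.Blocks.size() && "no such block");
  assert(PseudoIdx < F.Blocks[EntryIdx].Instrs.size() && "no such instruction");

  // The guarded block holds only the register-saving pseudo.
  ToyBlock Guarded{"guarded", {"SAVE_VARARG_XMM_REGS"}, {}};
  ToyBlock Tail{"tail", {}, {}};

  const std::size_t GuardedIdx = F.Blocks.size();
  const std::size_t TailIdx = GuardedIdx + 1;
  {
    // Finish with this reference before push_back can invalidate it.
    ToyBlock &Entry = F.Blocks[EntryIdx];
    const auto Rest =
        Entry.Instrs.begin() + static_cast<std::ptrdiff_t>(PseudoIdx) + 1;
    // Everything after the pseudo, and the old successor edges, move to Tail.
    Tail.Instrs.assign(Rest, Entry.Instrs.end());
    Tail.Succs = Entry.Succs;
    // Drop the pseudo and the moved instructions, then add the %al guard.
    Entry.Instrs.resize(PseudoIdx);
    Entry.Instrs.push_back("test %al, %al");
    Entry.Instrs.push_back("je tail");
    Entry.Succs = {GuardedIdx, TailIdx};
  }
  Guarded.Succs = {TailIdx}; // the guarded block falls through to the tail
  F.Blocks.push_back(std::move(Guarded));
  F.Blocks.push_back(std::move(Tail));
  return {GuardedIdx, TailIdx};
}
```

For an entry block holding a, VASTART_SAVE_XMM_REGS, b, ret, calling splitAtPseudo(F, 0, 1) leaves the entry block with a followed by the test and the branch, puts the save pseudo alone in the guarded block, and moves b and ret to the tail. No xmm access then happens outside the path taken when %al is non-zero.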

// The EFLAGS operand of SelectItr might be missing a kill marker		// The EFLAGS operand of SelectItr might be missing a kill marker
// because there were multiple uses of EFLAGS, and ISel didn't know		// because there were multiple uses of EFLAGS, and ISel didn't know
// which to mark. Figure out whether SelectItr should have had a		// which to mark. Figure out whether SelectItr should have had a
// kill marker, and set it if it should. Returns the correct kill		// kill marker, and set it if it should. Returns the correct kill
// marker value.		// marker value.
static bool checkAndUpdateEFLAGSKill(MachineBasicBlock::iterator SelectItr,		static bool checkAndUpdateEFLAGSKill(MachineBasicBlock::iterator SelectItr,
[1,720 lines not shown]	case X86::FP80_TO_INT64_IN_MEM: {
return BB;		return BB;
}		}

// xbegin		// xbegin
case X86::XBEGIN:		case X86::XBEGIN:
return emitXBegin(MI, BB, Subtarget.getInstrInfo());		return emitXBegin(MI, BB, Subtarget.getInstrInfo());

case X86::VASTART_SAVE_XMM_REGS:		case X86::VASTART_SAVE_XMM_REGS:
return EmitVAStartSaveXMMRegsWithCustomInserter(MI, BB);		return emitVAStartSaveXMMRegsWithCustomInserter(MI, BB);

		case X86::MUSTTAIL_SAVE_GUARDED_REGS:
		return emitVarargThunkSaveXMMRegsWithCustomInserter(MI, BB);

case X86::VAARG_64:		case X86::VAARG_64:
return EmitVAARG64WithCustomInserter(MI, BB);		return emitVAARG64WithCustomInserter(MI, BB);

case X86::EH_SjLj_SetJmp32:		case X86::EH_SjLj_SetJmp32:
case X86::EH_SjLj_SetJmp64:		case X86::EH_SjLj_SetJmp64:
return emitEHSjLjSetJmp(MI, BB);		return emitEHSjLjSetJmp(MI, BB);

case X86::EH_SjLj_LongJmp32:		case X86::EH_SjLj_LongJmp32:
case X86::EH_SjLj_LongJmp64:		case X86::EH_SjLj_LongJmp64:
return emitEHSjLjLongJmp(MI, BB);		return emitEHSjLjLongJmp(MI, BB);

llvm/lib/Target/X86/X86InstrCompiler.td

def ADJCALLSTACKUP64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
[(X86callseq_end timm:$amt1, timm:$amt2)]>,		[(X86callseq_end timm:$amt1, timm:$amt2)]>,
Requires<[IsLP64]>;		Requires<[IsLP64]>;
}		}
def : Pat<(X86callseq_start timm:$amt1, timm:$amt2),		def : Pat<(X86callseq_start timm:$amt1, timm:$amt2),
(ADJCALLSTACKDOWN64 i32imm:$amt1, i32imm:$amt2, 0)>, Requires<[IsLP64]>;		(ADJCALLSTACKDOWN64 i32imm:$amt1, i32imm:$amt2, 0)>, Requires<[IsLP64]>;

let SchedRW = [WriteSystem] in {		let SchedRW = [WriteSystem] in {

		let hasSideEffects = 1 in {
		def SAVE_VARARG_XMM_REGS : I<0, Pseudo,
		(outs),
		(ins i64imm:$regsavefi, i64imm:$offset,
		variable_ops),
		"#SAVE_VARARG_XMM_REGS $regsavefi, $offset",
		[]>;
		}

// x86-64 va_start lowering magic.		// x86-64 va_start lowering magic.
let usesCustomInserter = 1, Defs = [EFLAGS] in {		let usesCustomInserter = 1, Defs = [EFLAGS] in {
def VASTART_SAVE_XMM_REGS : I<0, Pseudo,		def VASTART_SAVE_XMM_REGS : I<0, Pseudo,
(outs),		(outs),
(ins GR8:$al,		(ins GR8:$al,
i64imm:$regsavefi, i64imm:$offset,		i64imm:$regsavefi, i64imm:$offset,
variable_ops),		variable_ops),
"#VASTART_SAVE_XMM_REGS $al, $regsavefi, $offset",		"#VASTART_SAVE_XMM_REGS $al, $regsavefi, $offset",
[(X86vastart_save_xmm_regs GR8:$al,		[(X86vastart_save_xmm_regs GR8:$al,
imm:$regsavefi,		imm:$regsavefi,
imm:$offset),		imm:$offset),
(implicit EFLAGS)]>;		(implicit EFLAGS)]>;

		// x86-64 %al guarded thunk arguments lowering magic.
		def MUSTTAIL_SAVE_GUARDED_REGS : I<0, Pseudo,
		(outs),
		(ins GR8:$al,
		i64imm:$regsavefi, i64imm:$offset,
		variable_ops),
		"#MUSTTAIL_SAVE_GUARDED_REGS $al, $regsavefi, $offset",
		[(X86musttail_save_guarded_regs GR8:$al,
		imm:$regsavefi,
		imm:$offset),
		(implicit EFLAGS)]>;


// The VAARG_64 pseudo-instruction takes the address of the va_list,		// The VAARG_64 pseudo-instruction takes the address of the va_list,
// and places the address of the next argument into a register.		// and places the address of the next argument into a register.
let Defs = [EFLAGS] in		let Defs = [EFLAGS] in
def VAARG_64 : I<0, Pseudo,		def VAARG_64 : I<0, Pseudo,
(outs GR64:$dst),		(outs GR64:$dst),
(ins i8mem:$ap, i32imm:$size, i8imm:$mode, i32imm:$align),		(ins i8mem:$ap, i32imm:$size, i8imm:$mode, i32imm:$align),
"#VAARG_64 $dst, $ap, $size, $mode, $align",		"#VAARG_64 $dst, $ap, $size, $mode, $align",
[(set GR64:$dst,		[(set GR64:$dst,

llvm/lib/Target/X86/X86InstrInfo.td

def SDT_X86Call : SDTypeProfile<0, -1, [SDTCisVT<0, iPTR>]>;		def SDT_X86Call : SDTypeProfile<0, -1, [SDTCisVT<0, iPTR>]>;

def SDT_X86NtBrind : SDTypeProfile<0, -1, [SDTCisVT<0, iPTR>]>;		def SDT_X86NtBrind : SDTypeProfile<0, -1, [SDTCisVT<0, iPTR>]>;

def SDT_X86VASTART_SAVE_XMM_REGS : SDTypeProfile<0, -1, [SDTCisVT<0, i8>,		def SDT_X86VASTART_SAVE_XMM_REGS : SDTypeProfile<0, -1, [SDTCisVT<0, i8>,
SDTCisVT<1, iPTR>,		SDTCisVT<1, iPTR>,
SDTCisVT<2, iPTR>]>;		SDTCisVT<2, iPTR>]>;

		def SDT_X86MUSTTAIL_SAVE_GUARDED_REGS : SDTypeProfile<0, -1, [SDTCisVT<0, i8>,
		SDTCisVT<1, iPTR>,
		SDTCisVT<2, iPTR>]>;


def SDT_X86VAARG_64 : SDTypeProfile<1, -1, [SDTCisPtrTy<0>,		def SDT_X86VAARG_64 : SDTypeProfile<1, -1, [SDTCisPtrTy<0>,
SDTCisPtrTy<1>,		SDTCisPtrTy<1>,
SDTCisVT<2, i32>,		SDTCisVT<2, i32>,
SDTCisVT<3, i8>,		SDTCisVT<3, i8>,
SDTCisVT<4, i32>]>;		SDTCisVT<4, i32>]>;

def SDTX86RepStr : SDTypeProfile<0, 1, [SDTCisVT<0, OtherVT>]>;		def SDTX86RepStr : SDTypeProfile<0, 1, [SDTCisVT<0, OtherVT>]>;

def X86retflag : SDNode<"X86ISD::RET_FLAG", SDTX86Ret,
[SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;		[SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
def X86iret : SDNode<"X86ISD::IRET", SDTX86Ret,		def X86iret : SDNode<"X86ISD::IRET", SDTX86Ret,
[SDNPHasChain, SDNPOptInGlue]>;		[SDNPHasChain, SDNPOptInGlue]>;

def X86vastart_save_xmm_regs :		def X86vastart_save_xmm_regs :
SDNode<"X86ISD::VASTART_SAVE_XMM_REGS",		SDNode<"X86ISD::VASTART_SAVE_XMM_REGS",
SDT_X86VASTART_SAVE_XMM_REGS,		SDT_X86VASTART_SAVE_XMM_REGS,
[SDNPHasChain, SDNPVariadic]>;		[SDNPHasChain, SDNPVariadic]>;

		def X86musttail_save_guarded_regs :
		SDNode<"X86ISD::MUSTTAIL_SAVE_GUARDED_REGS",
		SDT_X86MUSTTAIL_SAVE_GUARDED_REGS,
		[SDNPHasChain, SDNPVariadic]>;

def X86vaarg64 :		def X86vaarg64 :
SDNode<"X86ISD::VAARG_64", SDT_X86VAARG_64,		SDNode<"X86ISD::VAARG_64", SDT_X86VAARG_64,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore,		[SDNPHasChain, SDNPMayLoad, SDNPMayStore,
SDNPMemOperand]>;		SDNPMemOperand]>;
def X86callseq_start :		def X86callseq_start :
SDNode<"ISD::CALLSEQ_START", SDT_X86CallSeqStart,		SDNode<"ISD::CALLSEQ_START", SDT_X86CallSeqStart,
[SDNPHasChain, SDNPOutGlue]>;		[SDNPHasChain, SDNPOutGlue]>;
def X86callseq_end :		def X86callseq_end :

llvm/lib/Target/X86/X86MachineFunctionInfo.h

class X86MachineFunctionInfo : public MachineFunctionInfo {
/// use as the global base register. This is used for PIC in some PIC		/// use as the global base register. This is used for PIC in some PIC
/// relocation models.		/// relocation models.
unsigned GlobalBaseReg = 0;		unsigned GlobalBaseReg = 0;

/// VarArgsFrameIndex - FrameIndex for start of varargs area.		/// VarArgsFrameIndex - FrameIndex for start of varargs area.
int VarArgsFrameIndex = 0;		int VarArgsFrameIndex = 0;
/// RegSaveFrameIndex - X86-64 vararg func register save area.		/// RegSaveFrameIndex - X86-64 vararg func register save area.
int RegSaveFrameIndex = 0;		int RegSaveFrameIndex = 0;
		/// ThunkRegSaveFrameIndex - X86-64 vararg func register save area for thunk
		/// functions.
		int ThunkRegSaveFrameIndex = 0;
/// VarArgsGPOffset - X86-64 vararg func int reg offset.		/// VarArgsGPOffset - X86-64 vararg func int reg offset.
unsigned VarArgsGPOffset = 0;		unsigned VarArgsGPOffset = 0;
/// VarArgsFPOffset - X86-64 vararg func fp reg offset.		/// VarArgsFPOffset - X86-64 vararg func fp reg offset.
unsigned VarArgsFPOffset = 0;		unsigned VarArgsFPOffset = 0;
/// ArgumentStackSize - The number of bytes on stack consumed by the arguments		/// ArgumentStackSize - The number of bytes on stack consumed by the arguments
/// being passed on the stack.		/// being passed on the stack.
unsigned ArgumentStackSize = 0;		unsigned ArgumentStackSize = 0;
/// NumLocalDynamics - Number of local-dynamic TLS accesses.		/// NumLocalDynamics - Number of local-dynamic TLS accesses.
public:
void setGlobalBaseReg(unsigned Reg) { GlobalBaseReg = Reg; }		void setGlobalBaseReg(unsigned Reg) { GlobalBaseReg = Reg; }

int getVarArgsFrameIndex() const { return VarArgsFrameIndex; }		int getVarArgsFrameIndex() const { return VarArgsFrameIndex; }
void setVarArgsFrameIndex(int Idx) { VarArgsFrameIndex = Idx; }		void setVarArgsFrameIndex(int Idx) { VarArgsFrameIndex = Idx; }

int getRegSaveFrameIndex() const { return RegSaveFrameIndex; }		int getRegSaveFrameIndex() const { return RegSaveFrameIndex; }
void setRegSaveFrameIndex(int Idx) { RegSaveFrameIndex = Idx; }		void setRegSaveFrameIndex(int Idx) { RegSaveFrameIndex = Idx; }

		int getThunkRegSaveFrameIndex() const { return ThunkRegSaveFrameIndex; }
		void setThunkRegSaveFrameIndex(int Idx) { ThunkRegSaveFrameIndex = Idx; }

unsigned getVarArgsGPOffset() const { return VarArgsGPOffset; }		unsigned getVarArgsGPOffset() const { return VarArgsGPOffset; }
void setVarArgsGPOffset(unsigned Offset) { VarArgsGPOffset = Offset; }		void setVarArgsGPOffset(unsigned Offset) { VarArgsGPOffset = Offset; }

unsigned getVarArgsFPOffset() const { return VarArgsFPOffset; }		unsigned getVarArgsFPOffset() const { return VarArgsFPOffset; }
void setVarArgsFPOffset(unsigned Offset) { VarArgsFPOffset = Offset; }		void setVarArgsFPOffset(unsigned Offset) { VarArgsFPOffset = Offset; }

unsigned getArgumentStackSize() const { return ArgumentStackSize; }		unsigned getArgumentStackSize() const { return ArgumentStackSize; }
void setArgumentStackSize(unsigned size) { ArgumentStackSize = size; }		void setArgumentStackSize(unsigned size) { ArgumentStackSize = size; }

llvm/test/CodeGen/X86/musttail-varargs.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -verify-machineinstrs < %s -enable-tail-merge=0 -mtriple=x86_64-linux | FileCheck %s --check-prefix=LINUX			; RUN: llc -verify-machineinstrs < %s -enable-tail-merge=0 -mtriple=x86_64-linux | FileCheck %s --check-prefix=LINUX
				; RUN: llc -verify-machineinstrs -O0 < %s -enable-tail-merge=0 -mtriple=x86_64-linux | FileCheck %s --check-prefix=LINUX-OPT0
	; RUN: llc -verify-machineinstrs < %s -enable-tail-merge=0 -mtriple=x86_64-linux-gnux32 | FileCheck %s --check-prefix=LINUX-X32			; RUN: llc -verify-machineinstrs < %s -enable-tail-merge=0 -mtriple=x86_64-linux-gnux32 | FileCheck %s --check-prefix=LINUX-X32
				; RUN: llc -verify-machineinstrs -O0 < %s -enable-tail-merge=0 -mtriple=x86_64-linux-gnux32 | FileCheck %s --check-prefix=LINUX-X32-OPT0
	; RUN: llc -verify-machineinstrs < %s -enable-tail-merge=0 -mtriple=x86_64-windows | FileCheck %s --check-prefix=WINDOWS			; RUN: llc -verify-machineinstrs < %s -enable-tail-merge=0 -mtriple=x86_64-windows | FileCheck %s --check-prefix=WINDOWS
				; RUN: llc -verify-machineinstrs -O0 < %s -enable-tail-merge=0 -mtriple=x86_64-windows | FileCheck %s --check-prefix=WINDOWS-OPT0
	; RUN: llc -verify-machineinstrs < %s -enable-tail-merge=0 -mtriple=i686-windows | FileCheck %s --check-prefix=X86 --check-prefix=X86-NOSSE			; RUN: llc -verify-machineinstrs < %s -enable-tail-merge=0 -mtriple=i686-windows | FileCheck %s --check-prefix=X86 --check-prefix=X86-NOSSE
				; RUN: llc -verify-machineinstrs -O0 < %s -enable-tail-merge=0 -mtriple=i686-windows | FileCheck %s --check-prefix=X86 --check-prefix=X86-NOSSE-OPT0
	; RUN: llc -verify-machineinstrs < %s -enable-tail-merge=0 -mtriple=i686-windows -mattr=+sse2 | FileCheck %s --check-prefix=X86 --check-prefix=X86-SSE			; RUN: llc -verify-machineinstrs < %s -enable-tail-merge=0 -mtriple=i686-windows -mattr=+sse2 | FileCheck %s --check-prefix=X86 --check-prefix=X86-SSE
				; RUN: llc -verify-machineinstrs -O0 < %s -enable-tail-merge=0 -mtriple=i686-windows -mattr=+sse2 | FileCheck %s --check-prefix=X86 --check-prefix=X86-SSE-OPT0

	; Test that we actually spill and reload all arguments in the variadic argument			; Test that we actually spill and reload all arguments in the variadic argument
	; pack. Doing a normal call will clobber all argument registers, and we will			; pack. Doing a normal call will clobber all argument registers, and we will
	; spill around it. A simple adjustment should not require any XMM spills.			; spill around it. A simple adjustment should not require any XMM spills.

	declare void @llvm.va_start(i8*) nounwind			declare void @llvm.va_start(i8*) nounwind

	declare void(i8, ...) @get_f(i8* %this)			declare void(i8, ...) @get_f(i8* %this)
	; LINUX-NEXT: pushq %r14			; LINUX-NEXT: pushq %r14
	; LINUX-NEXT: .cfi_def_cfa_offset 32			; LINUX-NEXT: .cfi_def_cfa_offset 32
	; LINUX-NEXT: pushq %r13			; LINUX-NEXT: pushq %r13
	; LINUX-NEXT: .cfi_def_cfa_offset 40			; LINUX-NEXT: .cfi_def_cfa_offset 40
	; LINUX-NEXT: pushq %r12			; LINUX-NEXT: pushq %r12
	; LINUX-NEXT: .cfi_def_cfa_offset 48			; LINUX-NEXT: .cfi_def_cfa_offset 48
	; LINUX-NEXT: pushq %rbx			; LINUX-NEXT: pushq %rbx
	; LINUX-NEXT: .cfi_def_cfa_offset 56			; LINUX-NEXT: .cfi_def_cfa_offset 56
	; LINUX-NEXT: subq $360, %rsp # imm = 0x168			; LINUX-NEXT: subq $232, %rsp
	; LINUX-NEXT: .cfi_def_cfa_offset 416			; LINUX-NEXT: .cfi_def_cfa_offset 288
	; LINUX-NEXT: .cfi_offset %rbx, -56			; LINUX-NEXT: .cfi_offset %rbx, -56
	; LINUX-NEXT: .cfi_offset %r12, -48			; LINUX-NEXT: .cfi_offset %r12, -48
	; LINUX-NEXT: .cfi_offset %r13, -40			; LINUX-NEXT: .cfi_offset %r13, -40
	; LINUX-NEXT: .cfi_offset %r14, -32			; LINUX-NEXT: .cfi_offset %r14, -32
	; LINUX-NEXT: .cfi_offset %r15, -24			; LINUX-NEXT: .cfi_offset %r15, -24
	; LINUX-NEXT: .cfi_offset %rbp, -16			; LINUX-NEXT: .cfi_offset %rbp, -16
	; LINUX-NEXT: movq %r9, %r15			; LINUX-NEXT: movq %r9, %r15
	; LINUX-NEXT: movq %r8, %r12			; LINUX-NEXT: movq %r8, %r12
	; LINUX-NEXT: movq %rcx, %r13			; LINUX-NEXT: movq %rcx, %r13
	; LINUX-NEXT: movq %rdx, %rbp			; LINUX-NEXT: movq %rdx, %rbp
	; LINUX-NEXT: movq %rsi, %rbx			; LINUX-NEXT: movq %rsi, %rbx
	; LINUX-NEXT: movq %rdi, %r14			; LINUX-NEXT: movq %rdi, %r14
				; LINUX-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
				; LINUX-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; LINUX-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
				; LINUX-NEXT: movq %r8, {{[0-9]+}}(%rsp)
				; LINUX-NEXT: movq %r9, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movb %al, {{[-0-9]+}}(%r{{[sb]}}p) # 1-byte Spill			; LINUX-NEXT: movb %al, {{[-0-9]+}}(%r{{[sb]}}p) # 1-byte Spill
	; LINUX-NEXT: testb %al, %al			; LINUX-NEXT: testb %al, %al
	; LINUX-NEXT: je .LBB0_2			; LINUX-NEXT: je .LBB0_2
	; LINUX-NEXT: # %bb.1:			; LINUX-NEXT: # %bb.1:
	; LINUX-NEXT: movaps %xmm0, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movaps %xmm0, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movaps %xmm1, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movaps %xmm1, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movaps %xmm2, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movaps %xmm2, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movaps %xmm3, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movaps %xmm3, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movaps %xmm4, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movaps %xmm4, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movaps %xmm5, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movaps %xmm5, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movaps %xmm6, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movaps %xmm6, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movaps %xmm7, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movaps %xmm7, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: .LBB0_2:			; LINUX-NEXT: .LBB0_2:
	; LINUX-NEXT: movq %rbx, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movq %rbp, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movq %r13, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movq %r12, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movq %r15, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: leaq {{[0-9]+}}(%rsp), %rax			; LINUX-NEXT: leaq {{[0-9]+}}(%rsp), %rax
	; LINUX-NEXT: movq %rax, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movq %rax, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: leaq {{[0-9]+}}(%rsp), %rax			; LINUX-NEXT: leaq {{[0-9]+}}(%rsp), %rax
	; LINUX-NEXT: movq %rax, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movq %rax, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movabsq $206158430216, %rax # imm = 0x3000000008			; LINUX-NEXT: movabsq $206158430216, %rax # imm = 0x3000000008
	; LINUX-NEXT: movq %rax, {{[0-9]+}}(%rsp)			; LINUX-NEXT: movq %rax, {{[0-9]+}}(%rsp)
	; LINUX-NEXT: movq %r14, %rdi			; LINUX-NEXT: movq %r14, %rdi
	; LINUX-NEXT: movaps %xmm7, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
	; LINUX-NEXT: movaps %xmm6, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
	; LINUX-NEXT: movaps %xmm5, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
	; LINUX-NEXT: movaps %xmm4, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
	; LINUX-NEXT: movaps %xmm3, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
	; LINUX-NEXT: movaps %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
	; LINUX-NEXT: movaps %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
	; LINUX-NEXT: movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
	; LINUX-NEXT: callq get_f			; LINUX-NEXT: callq get_f
	; LINUX-NEXT: movq %rax, %r11			; LINUX-NEXT: movq %rax, %r11
	; LINUX-NEXT: movq %r14, %rdi			; LINUX-NEXT: movq %r14, %rdi
	; LINUX-NEXT: movq %rbx, %rsi			; LINUX-NEXT: movq %rbx, %rsi
	; LINUX-NEXT: movq %rbp, %rdx			; LINUX-NEXT: movq %rbp, %rdx
	; LINUX-NEXT: movq %r13, %rcx			; LINUX-NEXT: movq %r13, %rcx
	; LINUX-NEXT: movq %r12, %r8			; LINUX-NEXT: movq %r12, %r8
	; LINUX-NEXT: movq %r15, %r9			; LINUX-NEXT: movq %r15, %r9
	; LINUX-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
	; LINUX-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
	; LINUX-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm2 # 16-byte Reload
	; LINUX-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm3 # 16-byte Reload
	; LINUX-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm4 # 16-byte Reload
	; LINUX-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm5 # 16-byte Reload
	; LINUX-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm6 # 16-byte Reload
	; LINUX-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm7 # 16-byte Reload
	; LINUX-NEXT: movb {{[-0-9]+}}(%r{{[sb]}}p), %al # 1-byte Reload			; LINUX-NEXT: movb {{[-0-9]+}}(%r{{[sb]}}p), %al # 1-byte Reload
	; LINUX-NEXT: addq $360, %rsp # imm = 0x168			; LINUX-NEXT: testb %al, %al
				; LINUX-NEXT: je .LBB0_4
				; LINUX-NEXT: # %bb.3:
				; LINUX-NEXT: movaps {{[0-9]+}}(%rsp), %xmm7
				; LINUX-NEXT: movaps {{[0-9]+}}(%rsp), %xmm6
				; LINUX-NEXT: movaps {{[0-9]+}}(%rsp), %xmm5
				; LINUX-NEXT: movaps {{[0-9]+}}(%rsp), %xmm4
				; LINUX-NEXT: movaps {{[0-9]+}}(%rsp), %xmm3
				; LINUX-NEXT: movaps {{[0-9]+}}(%rsp), %xmm2
				; LINUX-NEXT: movaps {{[0-9]+}}(%rsp), %xmm1
				; LINUX-NEXT: movaps {{[0-9]+}}(%rsp), %xmm0
				; LINUX-NEXT: addq $232, %rsp
				; LINUX-NEXT: .cfi_def_cfa_offset 56
				; LINUX-NEXT: popq %rbx
				; LINUX-NEXT: .cfi_def_cfa_offset 48
				; LINUX-NEXT: popq %r12
				; LINUX-NEXT: .cfi_def_cfa_offset 40
				; LINUX-NEXT: popq %r13
				; LINUX-NEXT: .cfi_def_cfa_offset 32
				; LINUX-NEXT: popq %r14
				; LINUX-NEXT: .cfi_def_cfa_offset 24
				; LINUX-NEXT: popq %r15
				; LINUX-NEXT: .cfi_def_cfa_offset 16
				; LINUX-NEXT: popq %rbp
				; LINUX-NEXT: .cfi_def_cfa_offset 8
				; LINUX-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-NEXT: .LBB0_4:
				; LINUX-NEXT: .cfi_def_cfa_offset 288
				; LINUX-NEXT: addq $232, %rsp
	; LINUX-NEXT: .cfi_def_cfa_offset 56			; LINUX-NEXT: .cfi_def_cfa_offset 56
	; LINUX-NEXT: popq %rbx			; LINUX-NEXT: popq %rbx
	; LINUX-NEXT: .cfi_def_cfa_offset 48			; LINUX-NEXT: .cfi_def_cfa_offset 48
	; LINUX-NEXT: popq %r12			; LINUX-NEXT: popq %r12
	; LINUX-NEXT: .cfi_def_cfa_offset 40			; LINUX-NEXT: .cfi_def_cfa_offset 40
	; LINUX-NEXT: popq %r13			; LINUX-NEXT: popq %r13
	; LINUX-NEXT: .cfi_def_cfa_offset 32			; LINUX-NEXT: .cfi_def_cfa_offset 32
	; LINUX-NEXT: popq %r14			; LINUX-NEXT: popq %r14
	; LINUX-NEXT: .cfi_def_cfa_offset 24			; LINUX-NEXT: .cfi_def_cfa_offset 24
	; LINUX-NEXT: popq %r15			; LINUX-NEXT: popq %r15
	; LINUX-NEXT: .cfi_def_cfa_offset 16			; LINUX-NEXT: .cfi_def_cfa_offset 16
	; LINUX-NEXT: popq %rbp			; LINUX-NEXT: popq %rbp
	; LINUX-NEXT: .cfi_def_cfa_offset 8			; LINUX-NEXT: .cfi_def_cfa_offset 8
	; LINUX-NEXT: jmpq *%r11 # TAILCALL			; LINUX-NEXT: jmpq *%r11 # TAILCALL
	;			;
				; LINUX-OPT0-LABEL: f_thunk:
				; LINUX-OPT0: # %bb.0:
				; LINUX-OPT0-NEXT: subq $328, %rsp # imm = 0x148
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 336
				; LINUX-OPT0-NEXT: testb %al, %al
				; LINUX-OPT0-NEXT: movb %al, {{[-0-9]+}}(%r{{[sb]}}p) # 1-byte Spill
				; LINUX-OPT0-NEXT: movq %r9, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %r8, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rsi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: je .LBB0_2
				; LINUX-OPT0-NEXT: # %bb.1:
				; LINUX-OPT0-NEXT: movaps %xmm0, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm1, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm2, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm3, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm4, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm5, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm6, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm7, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: .LBB0_2:
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload
				; LINUX-OPT0-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rcx # 8-byte Reload
				; LINUX-OPT0-NEXT: movq %rcx, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload
				; LINUX-OPT0-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rsi # 8-byte Reload
				; LINUX-OPT0-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdi # 8-byte Reload
				; LINUX-OPT0-NEXT: movq %rdi, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movb {{[-0-9]+}}(%r{{[sb]}}p), %r8b # 1-byte Reload
				; LINUX-OPT0-NEXT: leaq {{[0-9]+}}(%rsp), %r9
				; LINUX-OPT0-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: leaq {{[0-9]+}}(%rsp), %r9
				; LINUX-OPT0-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movl $48, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movl $8, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
				; LINUX-OPT0-NEXT: movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %r9, %rdi
				; LINUX-OPT0-NEXT: movq %rsi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movb %r8b, {{[-0-9]+}}(%r{{[sb]}}p) # 1-byte Spill
				; LINUX-OPT0-NEXT: callq get_f
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdi # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rsi # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rcx # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
				; LINUX-OPT0-NEXT: movb {{[-0-9]+}}(%r{{[sb]}}p), %r10b # 1-byte Reload
				; LINUX-OPT0-NEXT: movq %rax, (%rsp) # 8-byte Spill
				; LINUX-OPT0-NEXT: movb %r10b, %al
				; LINUX-OPT0-NEXT: movq (%rsp), %r11 # 8-byte Reload
				; LINUX-OPT0-NEXT: testb %al, %al
				; LINUX-OPT0-NEXT: je .LBB0_4
				; LINUX-OPT0-NEXT: # %bb.3:
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm7
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm6
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm5
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm4
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm3
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm2
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm1
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm0
				; LINUX-OPT0-NEXT: addq $328, %rsp # imm = 0x148
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-OPT0-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-OPT0-NEXT: .LBB0_4:
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 336
				; LINUX-OPT0-NEXT: addq $328, %rsp # imm = 0x148
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-OPT0-NEXT: jmpq *%r11 # TAILCALL
				;
	; LINUX-X32-LABEL: f_thunk:			; LINUX-X32-LABEL: f_thunk:
	; LINUX-X32: # %bb.0:			; LINUX-X32: # %bb.0:
	; LINUX-X32-NEXT: pushq %rbp			; LINUX-X32-NEXT: pushq %rbp
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 16			; LINUX-X32-NEXT: .cfi_def_cfa_offset 16
	; LINUX-X32-NEXT: pushq %r15			; LINUX-X32-NEXT: pushq %r15
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 24			; LINUX-X32-NEXT: .cfi_def_cfa_offset 24
	; LINUX-X32-NEXT: pushq %r14			; LINUX-X32-NEXT: pushq %r14
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 32			; LINUX-X32-NEXT: .cfi_def_cfa_offset 32
	; LINUX-X32-NEXT: pushq %r13			; LINUX-X32-NEXT: pushq %r13
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 40			; LINUX-X32-NEXT: .cfi_def_cfa_offset 40
	; LINUX-X32-NEXT: pushq %r12			; LINUX-X32-NEXT: pushq %r12
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 48			; LINUX-X32-NEXT: .cfi_def_cfa_offset 48
	; LINUX-X32-NEXT: pushq %rbx			; LINUX-X32-NEXT: pushq %rbx
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 56			; LINUX-X32-NEXT: .cfi_def_cfa_offset 56
	; LINUX-X32-NEXT: subl $344, %esp # imm = 0x158			; LINUX-X32-NEXT: subl $216, %esp
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 400			; LINUX-X32-NEXT: .cfi_def_cfa_offset 272
	; LINUX-X32-NEXT: .cfi_offset %rbx, -56			; LINUX-X32-NEXT: .cfi_offset %rbx, -56
	; LINUX-X32-NEXT: .cfi_offset %r12, -48			; LINUX-X32-NEXT: .cfi_offset %r12, -48
	; LINUX-X32-NEXT: .cfi_offset %r13, -40			; LINUX-X32-NEXT: .cfi_offset %r13, -40
	; LINUX-X32-NEXT: .cfi_offset %r14, -32			; LINUX-X32-NEXT: .cfi_offset %r14, -32
	; LINUX-X32-NEXT: .cfi_offset %r15, -24			; LINUX-X32-NEXT: .cfi_offset %r15, -24
	; LINUX-X32-NEXT: .cfi_offset %rbp, -16			; LINUX-X32-NEXT: .cfi_offset %rbp, -16
	; LINUX-X32-NEXT: movq %r9, %r15			; LINUX-X32-NEXT: movq %r9, %r15
	; LINUX-X32-NEXT: movq %r8, %r12			; LINUX-X32-NEXT: movq %r8, %r12
	; LINUX-X32-NEXT: movq %rcx, %r13			; LINUX-X32-NEXT: movq %rcx, %r13
	; LINUX-X32-NEXT: movq %rdx, %rbp			; LINUX-X32-NEXT: movq %rdx, %rbp
	; LINUX-X32-NEXT: movq %rsi, %rbx			; LINUX-X32-NEXT: movq %rsi, %rbx
	; LINUX-X32-NEXT: movl %edi, %r14d			; LINUX-X32-NEXT: movl %edi, %r14d
				; LINUX-X32-NEXT: movq %rsi, {{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movq %rdx, {{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movq %rcx, {{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movq %r8, {{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movq %r9, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill			; LINUX-X32-NEXT: movb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
	; LINUX-X32-NEXT: testb %al, %al			; LINUX-X32-NEXT: testb %al, %al
	; LINUX-X32-NEXT: je .LBB0_2			; LINUX-X32-NEXT: je .LBB0_2
	; LINUX-X32-NEXT: # %bb.1:			; LINUX-X32-NEXT: # %bb.1:
	; LINUX-X32-NEXT: movaps %xmm0, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movaps %xmm0, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movaps %xmm1, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movaps %xmm1, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movaps %xmm2, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movaps %xmm2, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movaps %xmm3, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movaps %xmm3, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movaps %xmm4, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movaps %xmm4, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movaps %xmm5, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movaps %xmm5, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movaps %xmm6, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movaps %xmm6, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movaps %xmm7, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movaps %xmm7, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: .LBB0_2:			; LINUX-X32-NEXT: .LBB0_2:
	; LINUX-X32-NEXT: movq %rbx, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movq %rbp, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movq %r13, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movq %r12, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movq %r15, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: leal {{[0-9]+}}(%rsp), %eax			; LINUX-X32-NEXT: leal {{[0-9]+}}(%rsp), %eax
	; LINUX-X32-NEXT: movl %eax, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: leal {{[0-9]+}}(%rsp), %eax			; LINUX-X32-NEXT: leal {{[0-9]+}}(%rsp), %eax
	; LINUX-X32-NEXT: movl %eax, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movabsq $206158430216, %rax # imm = 0x3000000008			; LINUX-X32-NEXT: movabsq $206158430216, %rax # imm = 0x3000000008
	; LINUX-X32-NEXT: movq %rax, {{[0-9]+}}(%esp)			; LINUX-X32-NEXT: movq %rax, {{[0-9]+}}(%esp)
	; LINUX-X32-NEXT: movl %r14d, %edi			; LINUX-X32-NEXT: movl %r14d, %edi
	; LINUX-X32-NEXT: movaps %xmm7, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
	; LINUX-X32-NEXT: movaps %xmm6, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
	; LINUX-X32-NEXT: movaps %xmm5, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
	; LINUX-X32-NEXT: movaps %xmm4, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
	; LINUX-X32-NEXT: movaps %xmm3, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
	; LINUX-X32-NEXT: movaps %xmm2, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
	; LINUX-X32-NEXT: movaps %xmm1, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
	; LINUX-X32-NEXT: movaps %xmm0, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
	; LINUX-X32-NEXT: callq get_f			; LINUX-X32-NEXT: callq get_f
	; LINUX-X32-NEXT: movl %eax, %r11d			; LINUX-X32-NEXT: movl %eax, %r11d
	; LINUX-X32-NEXT: movl %r14d, %edi			; LINUX-X32-NEXT: movl %r14d, %edi
	; LINUX-X32-NEXT: movq %rbx, %rsi			; LINUX-X32-NEXT: movq %rbx, %rsi
	; LINUX-X32-NEXT: movq %rbp, %rdx			; LINUX-X32-NEXT: movq %rbp, %rdx
	; LINUX-X32-NEXT: movq %r13, %rcx			; LINUX-X32-NEXT: movq %r13, %rcx
	; LINUX-X32-NEXT: movq %r12, %r8			; LINUX-X32-NEXT: movq %r12, %r8
	; LINUX-X32-NEXT: movq %r15, %r9			; LINUX-X32-NEXT: movq %r15, %r9
	; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm0 # 16-byte Reload
	; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm1 # 16-byte Reload
	; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm2 # 16-byte Reload
	; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm3 # 16-byte Reload
	; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm4 # 16-byte Reload
	; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm5 # 16-byte Reload
	; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm6 # 16-byte Reload
	; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm7 # 16-byte Reload
	; LINUX-X32-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Reload			; LINUX-X32-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Reload
	; LINUX-X32-NEXT: addl $344, %esp # imm = 0x158			; LINUX-X32-NEXT: testb %al, %al
				; LINUX-X32-NEXT: je .LBB0_4
				; LINUX-X32-NEXT: # %bb.3:
				; LINUX-X32-NEXT: movaps {{[0-9]+}}(%esp), %xmm7
				; LINUX-X32-NEXT: movaps {{[0-9]+}}(%esp), %xmm6
				; LINUX-X32-NEXT: movaps {{[0-9]+}}(%esp), %xmm5
				; LINUX-X32-NEXT: movaps {{[0-9]+}}(%esp), %xmm4
				; LINUX-X32-NEXT: movaps {{[0-9]+}}(%esp), %xmm3
				; LINUX-X32-NEXT: movaps {{[0-9]+}}(%esp), %xmm2
				; LINUX-X32-NEXT: movaps {{[0-9]+}}(%esp), %xmm1
				; LINUX-X32-NEXT: movaps {{[0-9]+}}(%esp), %xmm0
				; LINUX-X32-NEXT: addl $216, %esp
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 56
				; LINUX-X32-NEXT: popq %rbx
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 48
				; LINUX-X32-NEXT: popq %r12
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 40
				; LINUX-X32-NEXT: popq %r13
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 32
				; LINUX-X32-NEXT: popq %r14
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 24
				; LINUX-X32-NEXT: popq %r15
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 16
				; LINUX-X32-NEXT: popq %rbp
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-X32-NEXT: .LBB0_4:
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 272
				; LINUX-X32-NEXT: addl $216, %esp
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 56			; LINUX-X32-NEXT: .cfi_def_cfa_offset 56
	; LINUX-X32-NEXT: popq %rbx			; LINUX-X32-NEXT: popq %rbx
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 48			; LINUX-X32-NEXT: .cfi_def_cfa_offset 48
	; LINUX-X32-NEXT: popq %r12			; LINUX-X32-NEXT: popq %r12
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 40			; LINUX-X32-NEXT: .cfi_def_cfa_offset 40
	; LINUX-X32-NEXT: popq %r13			; LINUX-X32-NEXT: popq %r13
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 32			; LINUX-X32-NEXT: .cfi_def_cfa_offset 32
	; LINUX-X32-NEXT: popq %r14			; LINUX-X32-NEXT: popq %r14
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 24			; LINUX-X32-NEXT: .cfi_def_cfa_offset 24
	; LINUX-X32-NEXT: popq %r15			; LINUX-X32-NEXT: popq %r15
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 16			; LINUX-X32-NEXT: .cfi_def_cfa_offset 16
	; LINUX-X32-NEXT: popq %rbp			; LINUX-X32-NEXT: popq %rbp
	; LINUX-X32-NEXT: .cfi_def_cfa_offset 8			; LINUX-X32-NEXT: .cfi_def_cfa_offset 8
	; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL			; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
	;			;
				; LINUX-X32-OPT0-LABEL: f_thunk:
				; LINUX-X32-OPT0: # %bb.0:
				; LINUX-X32-OPT0-NEXT: subl $312, %esp # imm = 0x138
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 320
				; LINUX-X32-OPT0-NEXT: testb %al, %al
				; LINUX-X32-OPT0-NEXT: movb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %r9, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %r8, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rsi, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movl %edi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; LINUX-X32-OPT0-NEXT: je .LBB0_2
				; LINUX-X32-OPT0-NEXT: # %bb.1:
				; LINUX-X32-OPT0-NEXT: movaps %xmm0, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm1, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm2, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm3, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm4, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm5, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm6, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm7, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: .LBB0_2:
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rax # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq %rax, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rcx # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq %rcx, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rdx # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq %rdx, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rsi # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq %rsi, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rdi # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq %rdi, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %r8b # 1-byte Reload
				; LINUX-X32-OPT0-NEXT: leal {{[0-9]+}}(%rsp), %r9d
				; LINUX-X32-OPT0-NEXT: movl %r9d, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: leal {{[0-9]+}}(%rsp), %r9d
				; LINUX-X32-OPT0-NEXT: movl %r9d, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movl $48, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movl $8, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %r9d # 4-byte Reload
				; LINUX-X32-OPT0-NEXT: movq %rdi, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movl %r9d, %edi
				; LINUX-X32-OPT0-NEXT: movq %rsi, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rax, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movb %r8b, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
				; LINUX-X32-OPT0-NEXT: callq get_f
				; LINUX-X32-OPT0-NEXT: movl %eax, %eax
				; LINUX-X32-OPT0-NEXT: movl %eax, %ecx
				; LINUX-X32-OPT0-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edi # 4-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rsi # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rdx # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r10 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq %rcx, (%esp) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %r10, %rcx
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r9 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Reload
				; LINUX-X32-OPT0-NEXT: movq (%esp), %r11 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: testb %al, %al
				; LINUX-X32-OPT0-NEXT: je .LBB0_4
				; LINUX-X32-OPT0-NEXT: # %bb.3:
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm7
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm6
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm5
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm4
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm3
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm2
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm1
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm0
				; LINUX-X32-OPT0-NEXT: addl $312, %esp # imm = 0x138
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-OPT0-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-X32-OPT0-NEXT: .LBB0_4:
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 320
				; LINUX-X32-OPT0-NEXT: addl $312, %esp # imm = 0x138
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-OPT0-NEXT: jmpq *%r11 # TAILCALL
				;
	; WINDOWS-LABEL: f_thunk:			; WINDOWS-LABEL: f_thunk:
	; WINDOWS: # %bb.0:			; WINDOWS: # %bb.0:
	; WINDOWS-NEXT: pushq %r14			; WINDOWS-NEXT: pushq %r14
	; WINDOWS-NEXT: .seh_pushreg %r14			; WINDOWS-NEXT: .seh_pushreg %r14
	; WINDOWS-NEXT: pushq %rsi			; WINDOWS-NEXT: pushq %rsi
	; WINDOWS-NEXT: .seh_pushreg %rsi			; WINDOWS-NEXT: .seh_pushreg %rsi
	; WINDOWS-NEXT: pushq %rdi			; WINDOWS-NEXT: pushq %rdi
	; WINDOWS-NEXT: .seh_pushreg %rdi			; WINDOWS-NEXT: .seh_pushreg %rdi
	Show All 21 Lines
	; WINDOWS-NEXT: popq %rdi			; WINDOWS-NEXT: popq %rdi
	; WINDOWS-NEXT: popq %rsi			; WINDOWS-NEXT: popq %rsi
	; WINDOWS-NEXT: popq %r14			; WINDOWS-NEXT: popq %r14
	; WINDOWS-NEXT: rex64 jmpq *%rax # TAILCALL			; WINDOWS-NEXT: rex64 jmpq *%rax # TAILCALL
	; WINDOWS-NEXT: .seh_handlerdata			; WINDOWS-NEXT: .seh_handlerdata
	; WINDOWS-NEXT: .text			; WINDOWS-NEXT: .text
	; WINDOWS-NEXT: .seh_endproc			; WINDOWS-NEXT: .seh_endproc
	;			;
				; WINDOWS-OPT0-LABEL: f_thunk:
				; WINDOWS-OPT0: # %bb.0:
				; WINDOWS-OPT0-NEXT: subq $104, %rsp
				; WINDOWS-OPT0-NEXT: .seh_stackalloc 104
				; WINDOWS-OPT0-NEXT: .seh_endprologue
				; WINDOWS-OPT0-NEXT: movq %r9, {{[0-9]+}}(%rsp)
				; WINDOWS-OPT0-NEXT: movq %r8, {{[0-9]+}}(%rsp)
				; WINDOWS-OPT0-NEXT: movq %rdx, {{[0-9]+}}(%rsp)
				; WINDOWS-OPT0-NEXT: leaq {{[0-9]+}}(%rsp), %rax
				; WINDOWS-OPT0-NEXT: movq %rax, {{[0-9]+}}(%rsp)
				; WINDOWS-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; WINDOWS-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; WINDOWS-OPT0-NEXT: movq %r8, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; WINDOWS-OPT0-NEXT: movq %r9, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; WINDOWS-OPT0-NEXT: callq get_f
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rcx # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r8 # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
				; WINDOWS-OPT0-NEXT: addq $104, %rsp
				; WINDOWS-OPT0-NEXT: rex64 jmpq *%rax # TAILCALL
				; WINDOWS-OPT0-NEXT: .seh_handlerdata
				; WINDOWS-OPT0-NEXT: .text
				; WINDOWS-OPT0-NEXT: .seh_endproc
				;
	; X86-NOSSE-LABEL: f_thunk:			; X86-NOSSE-LABEL: f_thunk:
	; X86-NOSSE: # %bb.0:			; X86-NOSSE: # %bb.0:
	; X86-NOSSE-NEXT: pushl %ebp			; X86-NOSSE-NEXT: pushl %ebp
	; X86-NOSSE-NEXT: movl %esp, %ebp			; X86-NOSSE-NEXT: movl %esp, %ebp
	; X86-NOSSE-NEXT: pushl %esi			; X86-NOSSE-NEXT: pushl %esi
	; X86-NOSSE-NEXT: andl $-16, %esp			; X86-NOSSE-NEXT: andl $-16, %esp
	; X86-NOSSE-NEXT: subl $32, %esp			; X86-NOSSE-NEXT: subl $32, %esp
	; X86-NOSSE-NEXT: movl 8(%ebp), %esi			; X86-NOSSE-NEXT: movl 8(%ebp), %esi
	; X86-NOSSE-NEXT: leal 12(%ebp), %eax			; X86-NOSSE-NEXT: leal 12(%ebp), %eax
	; X86-NOSSE-NEXT: movl %eax, (%esp)			; X86-NOSSE-NEXT: movl %eax, (%esp)
	; X86-NOSSE-NEXT: pushl %esi			; X86-NOSSE-NEXT: pushl %esi
	; X86-NOSSE-NEXT: calll _get_f			; X86-NOSSE-NEXT: calll _get_f
	; X86-NOSSE-NEXT: addl $4, %esp			; X86-NOSSE-NEXT: addl $4, %esp
	; X86-NOSSE-NEXT: movl %esi, 8(%ebp)			; X86-NOSSE-NEXT: movl %esi, 8(%ebp)
	; X86-NOSSE-NEXT: leal -4(%ebp), %esp			; X86-NOSSE-NEXT: leal -4(%ebp), %esp
	; X86-NOSSE-NEXT: popl %esi			; X86-NOSSE-NEXT: popl %esi
	; X86-NOSSE-NEXT: popl %ebp			; X86-NOSSE-NEXT: popl %ebp
	; X86-NOSSE-NEXT: jmpl *%eax # TAILCALL			; X86-NOSSE-NEXT: jmpl *%eax # TAILCALL
	;			;
				; X86-NOSSE-OPT0-LABEL: f_thunk:
				; X86-NOSSE-OPT0: # %bb.0:
				; X86-NOSSE-OPT0-NEXT: pushl %ebp
				; X86-NOSSE-OPT0-NEXT: movl %esp, %ebp
				; X86-NOSSE-OPT0-NEXT: andl $-16, %esp
				; X86-NOSSE-OPT0-NEXT: subl $48, %esp
				; X86-NOSSE-OPT0-NEXT: movl 8(%ebp), %eax
				; X86-NOSSE-OPT0-NEXT: leal 12(%ebp), %ecx
				; X86-NOSSE-OPT0-NEXT: movl %ecx, {{[0-9]+}}(%esp)
				; X86-NOSSE-OPT0-NEXT: movl %esp, %ecx
				; X86-NOSSE-OPT0-NEXT: movl %eax, (%ecx)
				; X86-NOSSE-OPT0-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-NOSSE-OPT0-NEXT: calll _get_f
				; X86-NOSSE-OPT0-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
				; X86-NOSSE-OPT0-NEXT: movl %ecx, 8(%ebp)
				; X86-NOSSE-OPT0-NEXT: movl %ebp, %esp
				; X86-NOSSE-OPT0-NEXT: popl %ebp
				; X86-NOSSE-OPT0-NEXT: jmpl *%eax # TAILCALL
				;
	; X86-SSE-LABEL: f_thunk:			; X86-SSE-LABEL: f_thunk:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: pushl %ebp			; X86-SSE-NEXT: pushl %ebp
	; X86-SSE-NEXT: movl %esp, %ebp			; X86-SSE-NEXT: movl %esp, %ebp
	; X86-SSE-NEXT: pushl %esi			; X86-SSE-NEXT: pushl %esi
	; X86-SSE-NEXT: andl $-16, %esp			; X86-SSE-NEXT: andl $-16, %esp
	; X86-SSE-NEXT: subl $80, %esp			; X86-SSE-NEXT: subl $80, %esp
	; X86-SSE-NEXT: movaps %xmm2, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill			; X86-SSE-NEXT: movaps %xmm2, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
	; X86-SSE-NEXT: movaps %xmm1, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill			; X86-SSE-NEXT: movaps %xmm1, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
	; X86-SSE-NEXT: movaps %xmm0, (%esp) # 16-byte Spill			; X86-SSE-NEXT: movaps %xmm0, (%esp) # 16-byte Spill
	; X86-SSE-NEXT: movl 8(%ebp), %esi			; X86-SSE-NEXT: movl 8(%ebp), %esi
	; X86-SSE-NEXT: leal 12(%ebp), %eax			; X86-SSE-NEXT: leal 12(%ebp), %eax
	; X86-SSE-NEXT: movl %eax, {{[0-9]+}}(%esp)			; X86-SSE-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; X86-SSE-NEXT: pushl %esi			; X86-SSE-NEXT: pushl %esi
	; X86-SSE-NEXT: calll _get_f			; X86-SSE-NEXT: calll _get_f
	; X86-SSE-NEXT: addl $4, %esp			; X86-SSE-NEXT: addl $4, %esp
	; X86-SSE-NEXT: movl %esi, 8(%ebp)			; X86-SSE-NEXT: movl %esi, 8(%ebp)
	; X86-SSE-NEXT: movaps (%esp), %xmm0 # 16-byte Reload			; X86-SSE-NEXT: movaps (%esp), %xmm0 # 16-byte Reload
	; X86-SSE-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm1 # 16-byte Reload			; X86-SSE-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm1 # 16-byte Reload
	; X86-SSE-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm2 # 16-byte Reload			; X86-SSE-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm2 # 16-byte Reload
	; X86-SSE-NEXT: leal -4(%ebp), %esp			; X86-SSE-NEXT: leal -4(%ebp), %esp
	; X86-SSE-NEXT: popl %esi			; X86-SSE-NEXT: popl %esi
	; X86-SSE-NEXT: popl %ebp			; X86-SSE-NEXT: popl %ebp
	; X86-SSE-NEXT: jmpl *%eax # TAILCALL			; X86-SSE-NEXT: jmpl *%eax # TAILCALL
				;
				; X86-SSE-OPT0-LABEL: f_thunk:
				; X86-SSE-OPT0: # %bb.0:
				; X86-SSE-OPT0-NEXT: pushl %ebp
				; X86-SSE-OPT0-NEXT: movl %esp, %ebp
				; X86-SSE-OPT0-NEXT: andl $-16, %esp
				; X86-SSE-OPT0-NEXT: subl $112, %esp
				; X86-SSE-OPT0-NEXT: movl 8(%ebp), %eax
				; X86-SSE-OPT0-NEXT: leal 12(%ebp), %ecx
				; X86-SSE-OPT0-NEXT: movl %ecx, {{[0-9]+}}(%esp)
				; X86-SSE-OPT0-NEXT: movl %esp, %ecx
				; X86-SSE-OPT0-NEXT: movl %eax, (%ecx)
				; X86-SSE-OPT0-NEXT: movaps %xmm0, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
				; X86-SSE-OPT0-NEXT: movaps %xmm1, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
				; X86-SSE-OPT0-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-SSE-OPT0-NEXT: movaps %xmm2, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
				; X86-SSE-OPT0-NEXT: calll _get_f
				; X86-SSE-OPT0-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
				; X86-SSE-OPT0-NEXT: movl %ecx, 8(%ebp)
				; X86-SSE-OPT0-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm0 # 16-byte Reload
				; X86-SSE-OPT0-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm1 # 16-byte Reload
				; X86-SSE-OPT0-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm2 # 16-byte Reload
				; X86-SSE-OPT0-NEXT: movl %ebp, %esp
				; X86-SSE-OPT0-NEXT: popl %ebp
				; X86-SSE-OPT0-NEXT: jmpl *%eax # TAILCALL
	%ap = alloca [4 x i8*], align 16			%ap = alloca [4 x i8*], align 16
	%ap_i8 = bitcast [4 x i8*]* %ap to i8*			%ap_i8 = bitcast [4 x i8*]* %ap to i8*
	call void @llvm.va_start(i8* %ap_i8)			call void @llvm.va_start(i8* %ap_i8)

	%fptr = call void(i8*, ...)*(i8*) @get_f(i8* %this)			%fptr = call void(i8*, ...)*(i8*) @get_f(i8* %this)
	musttail call void (i8*, ...) %fptr(i8* %this, ...)			musttail call void (i8*, ...) %fptr(i8* %this, ...)
	ret void			ret void
	}			}

	; Save and restore 6 GPRs, 8 XMMs, and AL around the call.			; Save and restore 6 GPRs, 8 XMMs, and AL around the call.

	; No regparms on normal x86 conventions.			; No regparms on normal x86 conventions.

	; This thunk shouldn't require any spills and reloads, assuming the register			; This thunk stores xmms on entry and restores them before jumping.
	; allocator knows what it's doing.			; Storing and restoring xmms could be optimized out for this concrete case.
	rnk: This seems unfortunate. :( Does all this go away if you put the "thunk" attribute on it? Everything in this file is meant to be a test for universal thunks, so adding the attribute is reasonable.
	avl (author): Yes, all this xmm save/restore code goes away if "thunk" is specified. I will add it to the test case. Additionally, I would like to make a separate patch that does NOT do this xmm store/restore if noimplicitfloat=false, so that the store/restore code is generated only for the noimplicitfloat=true case. Thus, I assume the following patches would be done: (1) this patch: xmm stores/restores through phys regs are generated for all usual thunks (not including universal thunks); (2) a patch that fixes the ABI breakage for the noimplicitfloat case, D62639; (3) a patch that does not generate xmm store/restore through phys regs for the noimplicitfloat=false case.

	define void @g_thunk(i8* %fptr_i8, ...) {			define void @g_thunk(i8* %fptr_i8, ...) {
	; LINUX-LABEL: g_thunk:			; LINUX-LABEL: g_thunk:
	; LINUX: # %bb.0:			; LINUX: # %bb.0:
				; LINUX-NEXT: pushq %rax
				; LINUX-NEXT: .cfi_def_cfa_offset 16
				; LINUX-NEXT: testb %al, %al
				; LINUX-NEXT: je .LBB1_2
				; LINUX-NEXT: # %bb.1:
				; LINUX-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm2, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm3, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm4, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm5, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm6, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm7, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: .LBB1_2:
				; LINUX-NEXT: testb %al, %al
				; LINUX-NEXT: je .LBB1_4
				; LINUX-NEXT: # %bb.3:
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm7
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm6
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm5
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm4
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm3
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm2
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm1
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm0
				; LINUX-NEXT: popq %r11
				; LINUX-NEXT: .cfi_def_cfa_offset 8
				; LINUX-NEXT: jmpq *%rdi # TAILCALL
				; LINUX-NEXT: .LBB1_4:
				; LINUX-NEXT: .cfi_def_cfa_offset 16
				; LINUX-NEXT: popq %r11
				; LINUX-NEXT: .cfi_def_cfa_offset 8
	; LINUX-NEXT: jmpq *%rdi # TAILCALL			; LINUX-NEXT: jmpq *%rdi # TAILCALL
	;			;
				; LINUX-OPT0-LABEL: g_thunk:
				; LINUX-OPT0: # %bb.0:
				; LINUX-OPT0-NEXT: subq $72, %rsp
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 80
				; LINUX-OPT0-NEXT: movb %al, %r10b
				; LINUX-OPT0-NEXT: testb %al, %al
				; LINUX-OPT0-NEXT: movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rsi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %r8, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %r9, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movb %r10b, {{[-0-9]+}}(%r{{[sb]}}p) # 1-byte Spill
				; LINUX-OPT0-NEXT: je .LBB1_2
				; LINUX-OPT0-NEXT: # %bb.1:
				; LINUX-OPT0-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm2, -{{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm3, -{{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm4, (%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm5, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm6, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm7, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: .LBB1_2:
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdi # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rsi # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rcx # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
				; LINUX-OPT0-NEXT: movb {{[-0-9]+}}(%r{{[sb]}}p), %al # 1-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r11 # 8-byte Reload
				; LINUX-OPT0-NEXT: testb %al, %al
				; LINUX-OPT0-NEXT: je .LBB1_4
				; LINUX-OPT0-NEXT: # %bb.3:
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm7
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm6
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm5
				; LINUX-OPT0-NEXT: movaps (%rsp), %xmm4
				; LINUX-OPT0-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm3
				; LINUX-OPT0-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm2
				; LINUX-OPT0-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm1
				; LINUX-OPT0-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm0
				; LINUX-OPT0-NEXT: addq $72, %rsp
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-OPT0-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-OPT0-NEXT: .LBB1_4:
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 80
				; LINUX-OPT0-NEXT: addq $72, %rsp
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-OPT0-NEXT: jmpq *%r11 # TAILCALL
				;
	; LINUX-X32-LABEL: g_thunk:			; LINUX-X32-LABEL: g_thunk:
	; LINUX-X32: # %bb.0:			; LINUX-X32: # %bb.0:
				; LINUX-X32-NEXT: pushq %rax
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 16
				; LINUX-X32-NEXT: testb %al, %al
				; LINUX-X32-NEXT: je .LBB1_2
				; LINUX-X32-NEXT: # %bb.1:
				; LINUX-X32-NEXT: movaps %xmm0, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm1, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm2, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm3, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm4, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm5, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm6, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm7, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: .LBB1_2:
	; LINUX-X32-NEXT: movl %edi, %r11d			; LINUX-X32-NEXT: movl %edi, %r11d
				; LINUX-X32-NEXT: testb %al, %al
				; LINUX-X32-NEXT: je .LBB1_4
				; LINUX-X32-NEXT: # %bb.3:
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm7
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm6
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm5
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm4
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm3
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm2
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm1
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm0
				; LINUX-X32-NEXT: addl $8, %esp
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-X32-NEXT: .LBB1_4:
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 16
				; LINUX-X32-NEXT: addl $8, %esp
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 8
	; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL			; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
	;			;
				; LINUX-X32-OPT0-LABEL: g_thunk:
				; LINUX-X32-OPT0: # %bb.0:
				; LINUX-X32-OPT0-NEXT: subl $72, %esp
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 80
				; LINUX-X32-OPT0-NEXT: movb %al, %r10b
				; LINUX-X32-OPT0-NEXT: testb %al, %al
				; LINUX-X32-OPT0-NEXT: movl %edi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rsi, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %r8, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %r9, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movb %r10b, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
				; LINUX-X32-OPT0-NEXT: je .LBB1_2
				; LINUX-X32-OPT0-NEXT: # %bb.1:
				; LINUX-X32-OPT0-NEXT: movaps %xmm0, -{{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm1, -{{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm2, -{{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm3, -{{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm4, (%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm5, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm6, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm7, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: .LBB1_2:
				; LINUX-X32-OPT0-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; LINUX-X32-OPT0-NEXT: movl %eax, %ecx
				; LINUX-X32-OPT0-NEXT: movl %ecx, %edx
				; LINUX-X32-OPT0-NEXT: movl %eax, %edi
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rsi # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %r8, %rdx
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rcx # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r9 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r11 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: testb %al, %al
				; LINUX-X32-OPT0-NEXT: je .LBB1_4
				; LINUX-X32-OPT0-NEXT: # %bb.3:
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm7
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm6
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm5
				; LINUX-X32-OPT0-NEXT: movaps (%esp), %xmm4
				; LINUX-X32-OPT0-NEXT: movaps -{{[0-9]+}}(%esp), %xmm3
				; LINUX-X32-OPT0-NEXT: movaps -{{[0-9]+}}(%esp), %xmm2
				; LINUX-X32-OPT0-NEXT: movaps -{{[0-9]+}}(%esp), %xmm1
				; LINUX-X32-OPT0-NEXT: movaps -{{[0-9]+}}(%esp), %xmm0
				; LINUX-X32-OPT0-NEXT: addl $72, %esp
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-OPT0-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-X32-OPT0-NEXT: .LBB1_4:
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 80
				; LINUX-X32-OPT0-NEXT: addl $72, %esp
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-OPT0-NEXT: jmpq *%r11 # TAILCALL
				;
	; WINDOWS-LABEL: g_thunk:			; WINDOWS-LABEL: g_thunk:
	; WINDOWS: # %bb.0:			; WINDOWS: # %bb.0:
	; WINDOWS-NEXT: rex64 jmpq *%rcx # TAILCALL			; WINDOWS-NEXT: rex64 jmpq *%rcx # TAILCALL
	;			;
				; WINDOWS-OPT0-LABEL: g_thunk:
				; WINDOWS-OPT0: # %bb.0:
				; WINDOWS-OPT0-NEXT: pushq %rax
				; WINDOWS-OPT0-NEXT: .seh_stackalloc 8
				; WINDOWS-OPT0-NEXT: .seh_endprologue
				; WINDOWS-OPT0-NEXT: movq %rcx, (%rsp) # 8-byte Spill
				; WINDOWS-OPT0-NEXT: movq (%rsp), %rax # 8-byte Reload
				; WINDOWS-OPT0-NEXT: popq %r10
				; WINDOWS-OPT0-NEXT: rex64 jmpq *%rax # TAILCALL
				; WINDOWS-OPT0-NEXT: .seh_handlerdata
				; WINDOWS-OPT0-NEXT: .text
				; WINDOWS-OPT0-NEXT: .seh_endproc
				;
	; X86-LABEL: g_thunk:			; X86-LABEL: g_thunk:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)			; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; X86-NEXT: jmpl *%eax # TAILCALL			; X86-NEXT: jmpl *%eax # TAILCALL
	%fptr = bitcast i8* %fptr_i8 to void (i8*, ...)*			%fptr = bitcast i8* %fptr_i8 to void (i8*, ...)*
	musttail call void (i8*, ...) %fptr(i8* %fptr_i8, ...)			musttail call void (i8*, ...) %fptr(i8* %fptr_i8, ...)
	ret void			ret void
	}			}

	; Do a simple multi-exit multi-bb test.			; Do a simple multi-exit multi-bb test.

	%struct.Foo = type { i1, i8*, i8* }			%struct.Foo = type { i1, i8*, i8* }

	@g = external global i32			@g = external global i32

	define void @h_thunk(%struct.Foo* %this, ...) {			define void @h_thunk(%struct.Foo* %this, ...) {
	; LINUX-LABEL: h_thunk:			; LINUX-LABEL: h_thunk:
	; LINUX: # %bb.0:			; LINUX: # %bb.0:
				; LINUX-NEXT: pushq %rax
				; LINUX-NEXT: .cfi_def_cfa_offset 16
				; LINUX-NEXT: testb %al, %al
				; LINUX-NEXT: je .LBB2_2
				; LINUX-NEXT: # %bb.1:
				; LINUX-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm2, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm3, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm4, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm5, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm6, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: movaps %xmm7, -{{[0-9]+}}(%rsp)
				; LINUX-NEXT: .LBB2_2:
	; LINUX-NEXT: cmpb $1, (%rdi)			; LINUX-NEXT: cmpb $1, (%rdi)
	; LINUX-NEXT: jne .LBB2_2			; LINUX-NEXT: jne .LBB2_4
	; LINUX-NEXT: # %bb.1: # %then			; LINUX-NEXT: # %bb.3: # %then
	; LINUX-NEXT: movq 8(%rdi), %r11			; LINUX-NEXT: movq 8(%rdi), %r11
				; LINUX-NEXT: testb %al, %al
				; LINUX-NEXT: je .LBB2_6
				; LINUX-NEXT: # %bb.5: # %then
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm7
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm6
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm5
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm4
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm3
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm2
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm1
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm0
				; LINUX-NEXT: addq $8, %rsp
				; LINUX-NEXT: .cfi_def_cfa_offset 8
	; LINUX-NEXT: jmpq *%r11 # TAILCALL			; LINUX-NEXT: jmpq *%r11 # TAILCALL
	; LINUX-NEXT: .LBB2_2: # %else			; LINUX-NEXT: .LBB2_4: # %else
				; LINUX-NEXT: .cfi_def_cfa_offset 16
	; LINUX-NEXT: movq 16(%rdi), %r11			; LINUX-NEXT: movq 16(%rdi), %r11
	; LINUX-NEXT: movl $42, {{.*}}(%rip)			; LINUX-NEXT: movl $42, {{.*}}(%rip)
				; LINUX-NEXT: testb %al, %al
				; LINUX-NEXT: je .LBB2_8
				; LINUX-NEXT: # %bb.7: # %else
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm7
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm6
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm5
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm4
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm3
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm2
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm1
				; LINUX-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm0
				; LINUX-NEXT: addq $8, %rsp
				; LINUX-NEXT: .cfi_def_cfa_offset 8
				; LINUX-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-NEXT: .LBB2_6: # %then
				; LINUX-NEXT: .cfi_def_cfa_offset 16
				; LINUX-NEXT: addq $8, %rsp
				; LINUX-NEXT: .cfi_def_cfa_offset 8
				; LINUX-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-NEXT: .LBB2_8: # %else
				; LINUX-NEXT: .cfi_def_cfa_offset 16
				; LINUX-NEXT: addq $8, %rsp
				; LINUX-NEXT: .cfi_def_cfa_offset 8
	; LINUX-NEXT: jmpq *%r11 # TAILCALL			; LINUX-NEXT: jmpq *%r11 # TAILCALL
	;			;
				; LINUX-OPT0-LABEL: h_thunk:
				; LINUX-OPT0: # %bb.0:
				; LINUX-OPT0-NEXT: subq $88, %rsp
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 96
				; LINUX-OPT0-NEXT: movb %al, %r10b
				; LINUX-OPT0-NEXT: testb %al, %al
				; LINUX-OPT0-NEXT: movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rsi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %r8, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %r9, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movb %r10b, {{[-0-9]+}}(%r{{[sb]}}p) # 1-byte Spill
				; LINUX-OPT0-NEXT: je .LBB2_4
				; LINUX-OPT0-NEXT: # %bb.3:
				; LINUX-OPT0-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm2, -{{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm3, (%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm4, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm5, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm6, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: movaps %xmm7, {{[0-9]+}}(%rsp)
				; LINUX-OPT0-NEXT: .LBB2_4:
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload
				; LINUX-OPT0-NEXT: testb $1, (%rax)
				; LINUX-OPT0-NEXT: jne .LBB2_1
				; LINUX-OPT0-NEXT: jmp .LBB2_2
				; LINUX-OPT0-NEXT: .LBB2_1: # %then
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload
				; LINUX-OPT0-NEXT: movq 8(%rax), %rcx
				; LINUX-OPT0-NEXT: movq %rax, %rdi
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rsi # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %r8, %rcx
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
				; LINUX-OPT0-NEXT: movb {{[-0-9]+}}(%r{{[sb]}}p), %al # 1-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r11 # 8-byte Reload
				; LINUX-OPT0-NEXT: testb %al, %al
				; LINUX-OPT0-NEXT: je .LBB2_6
				; LINUX-OPT0-NEXT: # %bb.5: # %then
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm7
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm6
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm5
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm4
				; LINUX-OPT0-NEXT: movaps (%rsp), %xmm3
				; LINUX-OPT0-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm2
				; LINUX-OPT0-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm1
				; LINUX-OPT0-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm0
				; LINUX-OPT0-NEXT: addq $88, %rsp
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-OPT0-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-OPT0-NEXT: .LBB2_6: # %then
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 96
				; LINUX-OPT0-NEXT: addq $88, %rsp
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-OPT0-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-OPT0-NEXT: .LBB2_2: # %else
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 96
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload
				; LINUX-OPT0-NEXT: movq 16(%rax), %rcx
				; LINUX-OPT0-NEXT: movl $42, {{.*}}(%rip)
				; LINUX-OPT0-NEXT: movq %rax, %rdi
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rsi # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; LINUX-OPT0-NEXT: movq %r8, %rcx
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
				; LINUX-OPT0-NEXT: movb {{[-0-9]+}}(%r{{[sb]}}p), %al # 1-byte Reload
				; LINUX-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r11 # 8-byte Reload
				; LINUX-OPT0-NEXT: testb %al, %al
				; LINUX-OPT0-NEXT: je .LBB2_8
				; LINUX-OPT0-NEXT: # %bb.7: # %else
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm7
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm6
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm5
				; LINUX-OPT0-NEXT: movaps {{[0-9]+}}(%rsp), %xmm4
				; LINUX-OPT0-NEXT: movaps (%rsp), %xmm3
				; LINUX-OPT0-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm2
				; LINUX-OPT0-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm1
				; LINUX-OPT0-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm0
				; LINUX-OPT0-NEXT: addq $88, %rsp
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-OPT0-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-OPT0-NEXT: .LBB2_8: # %else
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 96
				; LINUX-OPT0-NEXT: addq $88, %rsp
				; LINUX-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-OPT0-NEXT: jmpq *%r11 # TAILCALL
				;
	; LINUX-X32-LABEL: h_thunk:			; LINUX-X32-LABEL: h_thunk:
	; LINUX-X32: # %bb.0:			; LINUX-X32: # %bb.0:
				; LINUX-X32-NEXT: pushq %rax
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 16
				; LINUX-X32-NEXT: testb %al, %al
				; LINUX-X32-NEXT: je .LBB2_2
				; LINUX-X32-NEXT: # %bb.1:
				; LINUX-X32-NEXT: movaps %xmm0, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm1, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm2, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm3, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm4, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm5, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm6, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: movaps %xmm7, -{{[0-9]+}}(%esp)
				; LINUX-X32-NEXT: .LBB2_2:
	; LINUX-X32-NEXT: cmpb $1, (%edi)			; LINUX-X32-NEXT: cmpb $1, (%edi)
	; LINUX-X32-NEXT: jne .LBB2_2			; LINUX-X32-NEXT: jne .LBB2_4
	; LINUX-X32-NEXT: # %bb.1: # %then			; LINUX-X32-NEXT: # %bb.3: # %then
	; LINUX-X32-NEXT: movl 4(%edi), %r11d			; LINUX-X32-NEXT: movl 4(%edi), %r11d
				; LINUX-X32-NEXT: testb %al, %al
				; LINUX-X32-NEXT: je .LBB2_6
				; LINUX-X32-NEXT: # %bb.5: # %then
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm7
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm6
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm5
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm4
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm3
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm2
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm1
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm0
				; LINUX-X32-NEXT: addl $8, %esp
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 8
	; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL			; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
	; LINUX-X32-NEXT: .LBB2_2: # %else			; LINUX-X32-NEXT: .LBB2_4: # %else
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 16
	; LINUX-X32-NEXT: movl 8(%edi), %r11d			; LINUX-X32-NEXT: movl 8(%edi), %r11d
	; LINUX-X32-NEXT: movl $42, {{.*}}(%rip)			; LINUX-X32-NEXT: movl $42, {{.*}}(%rip)
				; LINUX-X32-NEXT: testb %al, %al
				; LINUX-X32-NEXT: je .LBB2_8
				; LINUX-X32-NEXT: # %bb.7: # %else
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm7
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm6
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm5
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm4
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm3
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm2
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm1
				; LINUX-X32-NEXT: movaps -{{[0-9]+}}(%esp), %xmm0
				; LINUX-X32-NEXT: addl $8, %esp
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-X32-NEXT: .LBB2_6: # %then
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 16
				; LINUX-X32-NEXT: addl $8, %esp
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-X32-NEXT: .LBB2_8: # %else
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 16
				; LINUX-X32-NEXT: addl $8, %esp
				; LINUX-X32-NEXT: .cfi_def_cfa_offset 8
	; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL			; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
	;			;
				; LINUX-X32-OPT0-LABEL: h_thunk:
				; LINUX-X32-OPT0: # %bb.0:
				; LINUX-X32-OPT0-NEXT: subl $88, %esp
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 96
				; LINUX-X32-OPT0-NEXT: movb %al, %r10b
				; LINUX-X32-OPT0-NEXT: testb %al, %al
				; LINUX-X32-OPT0-NEXT: movl %edi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rsi, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %r8, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %r9, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movb %r10b, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
				; LINUX-X32-OPT0-NEXT: je .LBB2_4
				; LINUX-X32-OPT0-NEXT: # %bb.3:
				; LINUX-X32-OPT0-NEXT: movaps %xmm0, -{{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm1, -{{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm2, -{{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm3, (%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm4, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm5, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm6, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: movaps %xmm7, {{[0-9]+}}(%esp)
				; LINUX-X32-OPT0-NEXT: .LBB2_4:
				; LINUX-X32-OPT0-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; LINUX-X32-OPT0-NEXT: testb $1, (%eax)
				; LINUX-X32-OPT0-NEXT: jne .LBB2_1
				; LINUX-X32-OPT0-NEXT: jmp .LBB2_2
				; LINUX-X32-OPT0-NEXT: .LBB2_1: # %then
				; LINUX-X32-OPT0-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; LINUX-X32-OPT0-NEXT: movl 4(%eax), %ecx
				; LINUX-X32-OPT0-NEXT: movl %ecx, %edx
				; LINUX-X32-OPT0-NEXT: movl %eax, %edi
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rsi # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %r8, %rdx
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rcx # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r9 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r11 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: testb %al, %al
				; LINUX-X32-OPT0-NEXT: je .LBB2_6
				; LINUX-X32-OPT0-NEXT: # %bb.5: # %then
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm7
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm6
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm5
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm4
				; LINUX-X32-OPT0-NEXT: movaps (%esp), %xmm3
				; LINUX-X32-OPT0-NEXT: movaps -{{[0-9]+}}(%esp), %xmm2
				; LINUX-X32-OPT0-NEXT: movaps -{{[0-9]+}}(%esp), %xmm1
				; LINUX-X32-OPT0-NEXT: movaps -{{[0-9]+}}(%esp), %xmm0
				; LINUX-X32-OPT0-NEXT: addl $88, %esp
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-OPT0-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-X32-OPT0-NEXT: .LBB2_6: # %then
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 96
				; LINUX-X32-OPT0-NEXT: addl $88, %esp
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-OPT0-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-X32-OPT0-NEXT: .LBB2_2: # %else
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 96
				; LINUX-X32-OPT0-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; LINUX-X32-OPT0-NEXT: movl 8(%eax), %ecx
				; LINUX-X32-OPT0-NEXT: movl %ecx, %edx
				; LINUX-X32-OPT0-NEXT: movl $42, {{.*}}(%rip)
				; LINUX-X32-OPT0-NEXT: movl %eax, %edi
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rsi # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%e{{[sb]}}p) # 8-byte Spill
				; LINUX-X32-OPT0-NEXT: movq %r8, %rdx
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %rcx # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r8 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r9 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Reload
				; LINUX-X32-OPT0-NEXT: movq {{[-0-9]+}}(%e{{[sb]}}p), %r11 # 8-byte Reload
				; LINUX-X32-OPT0-NEXT: testb %al, %al
				; LINUX-X32-OPT0-NEXT: je .LBB2_8
				; LINUX-X32-OPT0-NEXT: # %bb.7: # %else
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm7
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm6
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm5
				; LINUX-X32-OPT0-NEXT: movaps {{[0-9]+}}(%esp), %xmm4
				; LINUX-X32-OPT0-NEXT: movaps (%esp), %xmm3
				; LINUX-X32-OPT0-NEXT: movaps -{{[0-9]+}}(%esp), %xmm2
				; LINUX-X32-OPT0-NEXT: movaps -{{[0-9]+}}(%esp), %xmm1
				; LINUX-X32-OPT0-NEXT: movaps -{{[0-9]+}}(%esp), %xmm0
				; LINUX-X32-OPT0-NEXT: addl $88, %esp
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-OPT0-NEXT: jmpq *%r11 # TAILCALL
				; LINUX-X32-OPT0-NEXT: .LBB2_8: # %else
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 96
				; LINUX-X32-OPT0-NEXT: addl $88, %esp
				; LINUX-X32-OPT0-NEXT: .cfi_def_cfa_offset 8
				; LINUX-X32-OPT0-NEXT: jmpq *%r11 # TAILCALL
				;
	; WINDOWS-LABEL: h_thunk:			; WINDOWS-LABEL: h_thunk:
	; WINDOWS: # %bb.0:			; WINDOWS: # %bb.0:
	; WINDOWS-NEXT: cmpb $1, (%rcx)			; WINDOWS-NEXT: cmpb $1, (%rcx)
	; WINDOWS-NEXT: jne .LBB2_2			; WINDOWS-NEXT: jne .LBB2_2
	; WINDOWS-NEXT: # %bb.1: # %then			; WINDOWS-NEXT: # %bb.1: # %then
	; WINDOWS-NEXT: movq 8(%rcx), %rax			; WINDOWS-NEXT: movq 8(%rcx), %rax
	; WINDOWS-NEXT: rex64 jmpq *%rax # TAILCALL			; WINDOWS-NEXT: rex64 jmpq *%rax # TAILCALL
	; WINDOWS-NEXT: .LBB2_2: # %else			; WINDOWS-NEXT: .LBB2_2: # %else
	; WINDOWS-NEXT: movq 16(%rcx), %rax			; WINDOWS-NEXT: movq 16(%rcx), %rax
	; WINDOWS-NEXT: movl $42, {{.*}}(%rip)			; WINDOWS-NEXT: movl $42, {{.*}}(%rip)
	; WINDOWS-NEXT: rex64 jmpq *%rax # TAILCALL			; WINDOWS-NEXT: rex64 jmpq *%rax # TAILCALL
	;			;
	; X86-LABEL: h_thunk:			; WINDOWS-OPT0-LABEL: h_thunk:
	; X86: # %bb.0:			; WINDOWS-OPT0: # %bb.0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; WINDOWS-OPT0-NEXT: subq $48, %rsp
	; X86-NEXT: cmpb $1, (%eax)			; WINDOWS-OPT0-NEXT: .seh_stackalloc 48
	; X86-NEXT: jne LBB2_2			; WINDOWS-OPT0-NEXT: .seh_endprologue
	; X86-NEXT: # %bb.1: # %then			; WINDOWS-OPT0-NEXT: testb $1, (%rcx)
	; X86-NEXT: movl 4(%eax), %ecx			; WINDOWS-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
	; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)			; WINDOWS-OPT0-NEXT: movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
	; X86-NEXT: jmpl *%ecx # TAILCALL			; WINDOWS-OPT0-NEXT: movq %r8, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
	; X86-NEXT: LBB2_2: # %else			; WINDOWS-OPT0-NEXT: movq %r9, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
	; X86-NEXT: movl 8(%eax), %ecx			; WINDOWS-OPT0-NEXT: jne .LBB2_1
	; X86-NEXT: movl $42, _g			; WINDOWS-OPT0-NEXT: jmp .LBB2_2
	; X86-NEXT: movl %eax, {{[0-9]+}}(%esp)			; WINDOWS-OPT0-NEXT: .LBB2_1: # %then
	; X86-NEXT: jmpl *%ecx # TAILCALL			; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq 8(%rax), %rcx
				; WINDOWS-OPT0-NEXT: movq %rcx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
				; WINDOWS-OPT0-NEXT: movq %rax, %rcx
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r8 # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r10 # 8-byte Reload
				; WINDOWS-OPT0-NEXT: addq $48, %rsp
				; WINDOWS-OPT0-NEXT: rex64 jmpq *%r10 # TAILCALL
				; WINDOWS-OPT0-NEXT: .LBB2_2: # %else
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq 16(%rax), %rcx
				; WINDOWS-OPT0-NEXT: movl $42, {{.*}}(%rip)
				; WINDOWS-OPT0-NEXT: movq %rcx, (%rsp) # 8-byte Spill
				; WINDOWS-OPT0-NEXT: movq %rax, %rcx
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r8 # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
				; WINDOWS-OPT0-NEXT: movq (%rsp), %r10 # 8-byte Reload
				; WINDOWS-OPT0-NEXT: addq $48, %rsp
				; WINDOWS-OPT0-NEXT: rex64 jmpq *%r10 # TAILCALL
				; WINDOWS-OPT0-NEXT: .seh_handlerdata
				; WINDOWS-OPT0-NEXT: .text
				; WINDOWS-OPT0-NEXT: .seh_endproc
				;
				; X86-NOSSE-LABEL: h_thunk:
				; X86-NOSSE: # %bb.0:
				; X86-NOSSE-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NOSSE-NEXT: cmpb $1, (%eax)
				; X86-NOSSE-NEXT: jne LBB2_2
				; X86-NOSSE-NEXT: # %bb.1: # %then
				; X86-NOSSE-NEXT: movl 4(%eax), %ecx
				; X86-NOSSE-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NOSSE-NEXT: jmpl *%ecx # TAILCALL
				; X86-NOSSE-NEXT: LBB2_2: # %else
				; X86-NOSSE-NEXT: movl 8(%eax), %ecx
				; X86-NOSSE-NEXT: movl $42, _g
				; X86-NOSSE-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NOSSE-NEXT: jmpl *%ecx # TAILCALL
				;
				; X86-NOSSE-OPT0-LABEL: h_thunk:
				; X86-NOSSE-OPT0: # %bb.0:
				; X86-NOSSE-OPT0-NEXT: pushl %eax
				; X86-NOSSE-OPT0-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NOSSE-OPT0-NEXT: testb $1, (%eax)
				; X86-NOSSE-OPT0-NEXT: movl %eax, (%esp) # 4-byte Spill
				; X86-NOSSE-OPT0-NEXT: jne LBB2_1
				; X86-NOSSE-OPT0-NEXT: jmp LBB2_2
				; X86-NOSSE-OPT0-NEXT: LBB2_1: # %then
				; X86-NOSSE-OPT0-NEXT: movl (%esp), %eax # 4-byte Reload
				; X86-NOSSE-OPT0-NEXT: movl 4(%eax), %ecx
				; X86-NOSSE-OPT0-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NOSSE-OPT0-NEXT: popl %eax
				; X86-NOSSE-OPT0-NEXT: jmpl *%ecx # TAILCALL
				; X86-NOSSE-OPT0-NEXT: LBB2_2: # %else
				; X86-NOSSE-OPT0-NEXT: movl (%esp), %eax # 4-byte Reload
				; X86-NOSSE-OPT0-NEXT: movl 8(%eax), %ecx
				; X86-NOSSE-OPT0-NEXT: movl $42, _g
				; X86-NOSSE-OPT0-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-NOSSE-OPT0-NEXT: popl %eax
				; X86-NOSSE-OPT0-NEXT: jmpl *%ecx # TAILCALL
				;
				; X86-SSE-LABEL: h_thunk:
				; X86-SSE: # %bb.0:
				; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-SSE-NEXT: cmpb $1, (%eax)
				; X86-SSE-NEXT: jne LBB2_2
				; X86-SSE-NEXT: # %bb.1: # %then
				; X86-SSE-NEXT: movl 4(%eax), %ecx
				; X86-SSE-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-SSE-NEXT: jmpl *%ecx # TAILCALL
				; X86-SSE-NEXT: LBB2_2: # %else
				; X86-SSE-NEXT: movl 8(%eax), %ecx
				; X86-SSE-NEXT: movl $42, _g
				; X86-SSE-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-SSE-NEXT: jmpl *%ecx # TAILCALL
				;
				; X86-SSE-OPT0-LABEL: h_thunk:
				; X86-SSE-OPT0: # %bb.0:
				; X86-SSE-OPT0-NEXT: subl $76, %esp
				; X86-SSE-OPT0-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-SSE-OPT0-NEXT: testb $1, (%eax)
				; X86-SSE-OPT0-NEXT: movups %xmm0, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
				; X86-SSE-OPT0-NEXT: movups %xmm1, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
				; X86-SSE-OPT0-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
				; X86-SSE-OPT0-NEXT: movups %xmm2, (%esp) # 16-byte Spill
				; X86-SSE-OPT0-NEXT: jne LBB2_1
				; X86-SSE-OPT0-NEXT: jmp LBB2_2
				; X86-SSE-OPT0-NEXT: LBB2_1: # %then
				; X86-SSE-OPT0-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-SSE-OPT0-NEXT: movl 4(%eax), %ecx
				; X86-SSE-OPT0-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-SSE-OPT0-NEXT: movups {{[-0-9]+}}(%e{{[sb]}}p), %xmm0 # 16-byte Reload
				; X86-SSE-OPT0-NEXT: movups {{[-0-9]+}}(%e{{[sb]}}p), %xmm1 # 16-byte Reload
				; X86-SSE-OPT0-NEXT: movups (%esp), %xmm2 # 16-byte Reload
				; X86-SSE-OPT0-NEXT: addl $76, %esp
				; X86-SSE-OPT0-NEXT: jmpl *%ecx # TAILCALL
				; X86-SSE-OPT0-NEXT: LBB2_2: # %else
				; X86-SSE-OPT0-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
				; X86-SSE-OPT0-NEXT: movl 8(%eax), %ecx
				; X86-SSE-OPT0-NEXT: movl $42, _g
				; X86-SSE-OPT0-NEXT: movl %eax, {{[0-9]+}}(%esp)
				; X86-SSE-OPT0-NEXT: movups {{[-0-9]+}}(%e{{[sb]}}p), %xmm0 # 16-byte Reload
				; X86-SSE-OPT0-NEXT: movups {{[-0-9]+}}(%e{{[sb]}}p), %xmm1 # 16-byte Reload
				; X86-SSE-OPT0-NEXT: movups (%esp), %xmm2 # 16-byte Reload
				; X86-SSE-OPT0-NEXT: addl $76, %esp
				; X86-SSE-OPT0-NEXT: jmpl *%ecx # TAILCALL
	%cond_p = getelementptr %struct.Foo, %struct.Foo* %this, i32 0, i32 0			%cond_p = getelementptr %struct.Foo, %struct.Foo* %this, i32 0, i32 0
	%cond = load i1, i1* %cond_p			%cond = load i1, i1* %cond_p
	br i1 %cond, label %then, label %else			br i1 %cond, label %then, label %else

	then:			then:
	%a_p = getelementptr %struct.Foo, %struct.Foo* %this, i32 0, i32 1			%a_p = getelementptr %struct.Foo, %struct.Foo* %this, i32 0, i32 1
	%a_i8 = load i8*, i8** %a_p			%a_i8 = load i8*, i8** %a_p
	%a = bitcast i8* %a_i8 to void (%struct.Foo*, ...)*			%a = bitcast i8* %a_i8 to void (%struct.Foo*, ...)*
	[... 11 lines not shown ...]

llvm/test/CodeGen/X86/vastart-defs-eflags.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc %s -o - | FileCheck %s			; RUN: llc %s -o - | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.10.0"			target triple = "x86_64-apple-macosx10.10.0"

	; Check that vastart handling doesn't get between testb and je for the branch.			; Check that vastart handling doesn't get between testb and je for the branch.
	define i32 @check_flag(i32 %flags, ...) nounwind {			define i32 @check_flag(i32 %flags, ...) nounwind {
	; CHECK-LABEL: check_flag:			; CHECK-LABEL: check_flag:
	; CHECK: ## %bb.0: ## %entry			; CHECK: ## %bb.0: ## %entry
	; CHECK-NEXT: subq $56, %rsp			; CHECK-NEXT: subq $56, %rsp
				; CHECK-NEXT: movq %rsi, -{{[0-9]+}}(%rsp)
				; CHECK-NEXT: movq %rdx, -{{[0-9]+}}(%rsp)
				; CHECK-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
				; CHECK-NEXT: movq %r8, -{{[0-9]+}}(%rsp)
				; CHECK-NEXT: movq %r9, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %al, %al
	; CHECK-NEXT: je LBB0_2			; CHECK-NEXT: je LBB0_2
	; CHECK-NEXT: ## %bb.1: ## %entry			; CHECK-NEXT: ## %bb.1: ## %entry
	; CHECK-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm2, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm2, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm3, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm3, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm4, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm4, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm5, (%rsp)			; CHECK-NEXT: movaps %xmm5, (%rsp)
	; CHECK-NEXT: movaps %xmm6, {{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm6, {{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm7, {{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm7, {{[0-9]+}}(%rsp)
	; CHECK-NEXT: LBB0_2: ## %entry			; CHECK-NEXT: LBB0_2: ## %entry
	; CHECK-NEXT: movq %rsi, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movq %rdx, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movq %r8, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movq %r9, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: testl $512, %edi ## imm = 0x200			; CHECK-NEXT: testl $512, %edi ## imm = 0x200
	; CHECK-NEXT: je LBB0_4			; CHECK-NEXT: je LBB0_4
	; CHECK-NEXT: ## %bb.3: ## %if.then			; CHECK-NEXT: ## %bb.3: ## %if.then
	; CHECK-NEXT: leaq -{{[0-9]+}}(%rsp), %rax			; CHECK-NEXT: leaq -{{[0-9]+}}(%rsp), %rax
	; CHECK-NEXT: movq %rax, 16			; CHECK-NEXT: movq %rax, 16
	; CHECK-NEXT: leaq {{[0-9]+}}(%rsp), %rax			; CHECK-NEXT: leaq {{[0-9]+}}(%rsp), %rax
	; CHECK-NEXT: movq %rax, 8			; CHECK-NEXT: movq %rax, 8
	[... 21 lines not shown ...]

llvm/test/CodeGen/X86/x32-va_start.ll

	[... 21 lines not shown ...]

	define i32 @foo(float %a, i8* nocapture readnone %fmt, ...) nounwind {			define i32 @foo(float %a, i8* nocapture readnone %fmt, ...) nounwind {
	entry:			entry:
	%ap = alloca [1 x %struct.__va_list_tag], align 16			%ap = alloca [1 x %struct.__va_list_tag], align 16
	%0 = bitcast [1 x %struct.__va_list_tag]* %ap to i8*			%0 = bitcast [1 x %struct.__va_list_tag]* %ap to i8*
	call void @llvm.lifetime.start.p0i8(i64 16, i8* %0) #2			call void @llvm.lifetime.start.p0i8(i64 16, i8* %0) #2
	call void @llvm.va_start(i8* %0)			call void @llvm.va_start(i8* %0)
	; SSE: subl $72, %esp			; SSE: subl $72, %esp
				; CHECK-DAG: movq %r9
				; CHECK-DAG: movq %r8
				; CHECK-DAG: movq %rcx
				; CHECK-DAG: movq %rdx
				; CHECK-DAG: movq %rsi
	; SSE: testb %al, %al			; SSE: testb %al, %al
	; SSE: je .[[NOFP:.*]]			; SSE: je .[[NOFP:.*]]
	; SSE-DAG: movaps %xmm1			; SSE-DAG: movaps %xmm1
	; SSE-DAG: movaps %xmm2			; SSE-DAG: movaps %xmm2
	; SSE-DAG: movaps %xmm3			; SSE-DAG: movaps %xmm3
	; SSE-DAG: movaps %xmm4			; SSE-DAG: movaps %xmm4
	; SSE-DAG: movaps %xmm5			; SSE-DAG: movaps %xmm5
	; SSE-DAG: movaps %xmm6			; SSE-DAG: movaps %xmm6
	; SSE-DAG: movaps %xmm7			; SSE-DAG: movaps %xmm7
	; NOSSE-NOT: xmm			; NOSSE-NOT: xmm
	; SSE: .[[NOFP]]:			; SSE: .[[NOFP]]:
	; CHECK-DAG: movq %r9
	; CHECK-DAG: movq %r8
	; CHECK-DAG: movq %rcx
	; CHECK-DAG: movq %rdx
	; CHECK-DAG: movq %rsi
	%gp_offset_p = getelementptr inbounds [1 x %struct.__va_list_tag], [1 x %struct.__va_list_tag]* %ap, i32 0, i32 0, i32 0			%gp_offset_p = getelementptr inbounds [1 x %struct.__va_list_tag], [1 x %struct.__va_list_tag]* %ap, i32 0, i32 0, i32 0
	%gp_offset = load i32, i32* %gp_offset_p, align 16			%gp_offset = load i32, i32* %gp_offset_p, align 16
	%fits_in_gp = icmp ult i32 %gp_offset, 41			%fits_in_gp = icmp ult i32 %gp_offset, 41
	br i1 %fits_in_gp, label %vaarg.in_reg, label %vaarg.in_mem			br i1 %fits_in_gp, label %vaarg.in_reg, label %vaarg.in_mem
	; CHECK: cmpl $40, [[COUNT:.*]]			; CHECK: cmpl $40, [[COUNT:.*]]
	; CHECK: ja .[[IN_MEM:.*]]			; CHECK: ja .[[IN_MEM:.*]]

	vaarg.in_reg: ; preds = %entry			vaarg.in_reg: ; preds = %entry
	[... 46 lines not shown ...]

llvm/test/CodeGen/X86/xmm-vararg-noopt.ll

This file was added.

				; RUN: llc -O0 -mtriple=x86_64-unknown-unknown < %s | FileCheck %s

				; CHECK-LABEL: testvarargs
				; Ensure that xmm registers are not used before testing %al
				; CHECK-NOT: xmm
				; CHECK: testb %al, %al
				; CHECK-NOT: xmm
				; CHECK: # %bb.1
				; CHECK-NEXT: %xmm0, {{.*}}%rsp
				; CHECK-NEXT: %xmm1, {{.*}}%rsp
				; CHECK-NEXT: %xmm2, {{.*}}%rsp
				; CHECK-NEXT: %xmm3, {{.*}}%rsp
				; CHECK-NEXT: %xmm4, {{.*}}%rsp
				; CHECK-NEXT: %xmm5, {{.*}}%rsp
				; CHECK-NEXT: %xmm6, {{.*}}%rsp
				; CHECK-NEXT: %xmm7, {{.*}}%rsp

				; ModuleID = 'variadic.c'
				source_filename = "variadic.c"
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux"

				%struct.__va_list_tag = type { i32, i32, i8*, i8* }

				@.str = private unnamed_addr constant [9 x i8] c"\0A hello \00", align 1

				; Function Attrs: noinline nounwind optnone uwtable
				define dso_local void @testvarargs(i8* %fmt, ...) {
				entry:
				%fmt.addr = alloca i8*, align 8
				%va = alloca [1 x %struct.__va_list_tag], align 16
				store i8* %fmt, i8** %fmt.addr, align 8
				%arraydecay = getelementptr inbounds [1 x %struct.__va_list_tag], [1 x %struct.__va_list_tag]* %va, i64 0, i64 0
				%arraydecay1 = bitcast %struct.__va_list_tag* %arraydecay to i8*
				call void @llvm.va_start(i8* %arraydecay1)
				%arraydecay2 = getelementptr inbounds [1 x %struct.__va_list_tag], [1 x %struct.__va_list_tag]* %va, i64 0, i64 0
				%arraydecay23 = bitcast %struct.__va_list_tag* %arraydecay2 to i8*
				call void @llvm.va_end(i8* %arraydecay23)
				%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([9 x i8], [9 x i8]* @.str, i64 0, i64 0))
				ret void
				}

				; Function Attrs: nounwind
				declare void @llvm.va_start(i8*)

				; Function Attrs: nounwind
				declare void @llvm.va_end(i8*)

				declare dso_local i32 @printf(i8*, ...)
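
For orientation, a C source along the following lines would plausibly lower to IR of the shape shown in xmm-vararg-noopt.ll when compiled at -O0. This is a reconstruction from the IR (the `variadic.c` module name, the `testvarargs` signature, the `va_start`/`va_end` pair and the "\n hello " string), not the author's actual source file, so treat the details as assumptions.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical reconstruction of variadic.c; inferred from the IR only. */
void testvarargs(char *fmt, ...) {
    va_list va;

    /* va_start makes the callee save the incoming argument registers.
       On x86-64 SysV the xmm saves are expected to sit behind the
       "testb %al, %al" guard that the test checks for. */
    va_start(va, fmt);
    va_end(va);

    /* Matches the 9-byte constant @.str in the IR. */
    printf("\n hello ");
}
```

The function never reads a variadic argument, which keeps the test focused on the prologue: any reference to an xmm register before the `%al` check would come from the register save sequence itself.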

Unit Tests: Failed


Diff 242407
