This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/trunk/lib/Target/X86/
  X86InstrAVX512.td
  X86InstrFMA.td
  X86InstrSSE.td
  X86SchedBroadwell.td
  X86SchedHaswell.td
  X86ScheduleBdVer2.td
llvm/trunk/test/tools/llvm-mca/X86/
  BdVer2/int-to-fpu-forwarding-2.s
  BtVer2/int-to-fpu-forwarding-2.s

Differential D60441

[X86] Make _Int instructions the preferred instructon for the assembly parser and disassembly parser to remove inconsistencies between VEX and EVEX.
Closed, Public

Authored by craig.topper on Apr 9 2019, 12:44 AM.

Details

Reviewers

RKSimon
spatel
andreadb
lebedev.ri

Commits

rG4a32ce39b79d: [X86] Make _Int instructions the preferred instructon for the assembly parser…
rL358138: [X86] Make _Int instructions the preferred instructon for the assembly parser…

Summary

Many of our instructions have both a _Int form used by intrinsics and a form
used by other IR constructs. In the EVEX space the _Int versions usually cover
all the capabilities, including broadcasting and rounding, while the other version
only covers simple register/register or register/load forms. For this reason,
in EVEX, the non-intrinsic form is usually marked isCodeGenOnly=1.

In the VEX encoding space we were less consistent, but usually the _Int version
was the isCodeGenOnly version.

This commit makes the VEX instructions match the EVEX instructions. This was
done by manually studying the AsmMatcher table, so it's possible I missed some
cases, but we should be closer now.

I'm thinking about using the isCodeGenOnly bit to simplify the EVEX2VEX
tablegen code that disambiguates the _Int and non _Int versions. Currently it
checks register class sizes and the Record the memory operands come from. I have
some other changes I was looking into for D59266 that may break the memory check.

I had to make a few scheduler hacks to keep the _Int versions from being treated
differently than the non _Int version.

I'm not sure what to do about the int-to-fpu-forwarding-2.s tests. That seems to be
an issue with the fact that we don't model the tied input on the non _Int
instructions in SSE.

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Apr 9 2019, 12:44 AM

Herald added a reviewer: lebedev.ri. Apr 9 2019, 12:44 AM

Herald added a project: Restricted Project.

Herald added subscribers: lebedev.ri, gbedwell, hiraditya.

Harbormaster completed remote builds in B30215: Diff 194256. Apr 9 2019, 12:48 AM

Many of our instructions have both a _Int form used by intrinsics and a form used by other IR constructs.

Is there any documentation, a comment, a mail thread somewhere that explains why this is the way it is?
I.e. why do those _Int variants need to exist? (are they temporary, or to stay forever)

The mca(?) regression is troubling.

In D60441#1459597, @lebedev.ri wrote:

Many of our instructions have both a _Int form used by intrinsics and a form used by other IR constructs.

Is there any documentation, a comment, a mail thread somewhere that explains why this is the way it is?
I.e. why do those _Int variants need to exist? (are they temporary, or to stay forever)

The mca(?) regression is troubling.

There may be a workaround to fix that regression. I plan to send a comment where I describe that workaround.

In D60441#1459597, @lebedev.ri wrote:

Many of our instructions have both a _Int form used by intrinsics and a form used by other IR constructs.

Is there any documentation, a comment, a mail thread somewhere that explains why this is the way it is?
I.e. why do those _Int variants need to exist? (are they temporary, or to stay forever)

The mca(?) regression is troubling.

The _Int instructions use VR128 regclasses while the non _Int versions use the smaller FR32/FR64 register classes. For unary operations like cvtss2sd, the X86 hardware definition reads two source registers: one of them determines the input for the operation that is being performed, and the other is just used to define the final upper bits. For the _Int versions we model this with 2 inputs. For the non _Int version we only model one of the inputs for the legacy SSE encoding. For the VEX encoding we do model both inputs and set one to IMPLICIT_DEF. We have to do this for VEX since the operands are "tied" so the register allocator must assign a register for the implicit_def. For the SSE instructions we do have a late pass that knows these instructions have a "partialRegUpdate" and will insert a dependency-breaking XOR based on how long it's been since that register was last written. For VEX we try to reassign the register to the oldest register we can find, and if that doesn't work we use an XOR to break the dependency.

For binary instructions like addss, the lower bits are calculated by adding the lower bits of both sources. The upper bits of the output are defined by the upper bits of the first source register. For _Int we use VR128 for both sources and the destination. For non _Int we use FR32/FR64 for both sources and the output.
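To make the pass-through behaviour described above concrete, here is a minimal Python sketch that treats a 128-bit register as four float lanes. The function names mirror the mnemonics only for readability; this is a toy model of the architectural semantics, not LLVM code, and it does not model single-precision rounding (Python floats are doubles).

```python
# Toy model: a 128-bit XMM register viewed as a list of 4 float lanes.
# Assumption: single-precision rounding is ignored (Python floats are doubles).

def addss(src1, src2):
    """Binary scalar op: lane 0 is src1[0] + src2[0]; lanes 1..3 come from src1."""
    return [src1[0] + src2[0]] + list(src1[1:])


def cvtsi2ss(passthru, value):
    """Unary scalar op: lane 0 is the converted integer; lanes 1..3 come from passthru."""
    return [float(value)] + list(passthru[1:])


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
print(addss(a, b))     # [11.0, 2.0, 3.0, 4.0]
print(cvtsi2ss(a, 7))  # [7.0, 2.0, 3.0, 4.0]
```

In both cases the result depends on the upper lanes of the first operand, which is the extra input the _Int forms model explicitly.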

If we were to merge them it would require a bunch of COPY_TO_REGCLASS conversions to be added to the isel patterns. I fear it would have weird effects on the coalescer and on how the register allocator calculates spill slot sizes. Pure scalar float code coming from C that doesn't use intrinsics would be pessimized. After isel the instructions would produce a VR128, it would be copied to FR32, and then it would be copied back to VR128 for the next instruction. The register coalescer would merge those copies and only VR128 would exist. Then any spills would use a 128-bit spill slot even though we don't care about the upper bits.

I do think we should look into fixing the unary non _Int instructions to list their pass through input and assign it to IMPLICIT_DEF like we do for VEX.

The llvm-mca change does reflect the data the scheduler would see when the user used intrinsics, so it's not exactly a "regression". It's showing an existing difference in modeling between the _Int and non _Int instructions, even though they use the same encoding and the same hardware.

The reason why there is a regression is because this patch adds an extra input operand to the following instructions:

cvtsi2ssl  %ecx, %xmm0
cvtsi2sdl  %ecx, %xmm0

For example:

cvtsi2ssl       %ecx, %xmm0     # <MCInst #775 CVTSI2SSrr_Int
                                #  <MCOperand Reg:142>
                                #  <MCOperand Reg:142>
                                #  <MCOperand Reg:25>>

XMM0 is now an in/out operand. Before this patch, XMM0 was only an output register.
This is equivalent to introducing a false dependency on the output register. That is what causes the extra latency from those two mca tests.
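The effect of turning the destination into an in/out operand can be illustrated with a small Python sketch. It assumes an idealized machine that issues one instruction per cycle and has no other resource limits; the latency value is hypothetical and is not taken from any X86Sched*.td file.

```python
def total_cycles(n, latency, reads_own_dest):
    """Cycles needed for n back-to-back copies of one instruction writing the same register.

    Assumption: one instruction may issue per cycle; all other limits are ignored.
    """
    ready = 0  # cycle at which the previous result becomes available
    issue = 0  # earliest cycle at which the next instruction may issue
    for _ in range(n):
        # If the instruction also reads its destination, it must wait for the previous write.
        start = max(issue, ready) if reads_own_dest else issue
        ready = start + latency
        issue = start + 1
    return ready


n, latency = 1000, 4  # latency is a hypothetical value chosen only for illustration
print(n / total_cycles(n, latency, reads_own_dest=False))  # close to 1 instruction per cycle
print(n / total_cycles(n, latency, reads_own_dest=True))   # 1 / latency instructions per cycle
```

Once every instruction reads the register written by its predecessor, the sequence becomes a serial chain and throughput is bounded by the latency rather than by the issue rate.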

There is a way to work around the problem introduced by the presence of that extra register read.
We can force that extra read to always have zero-latency by introducing a special "ReadAdvance" definition.

--- X86Schedule.td      (revision 357997)
+++ X86Schedule.td      (working copy)
@@ -22,6 +22,7 @@
 // This SchedRead describes a bypass delay caused by data being moved from the
 // integer unit to the floating point unit.
 def ReadInt2Fpu : SchedRead;
+def ReadAfterInt2Fpu : SchedRead;

For BdVer2 ReadAfterInt2Fpu would be defined as:

def : ReadAdvance<ReadAfterInt2Fpu, 13>;

For BtVer2 it would be defined as follows:

def : ReadAdvance<ReadAfterInt2Fpu, 7>;
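As a rough illustration of why such a ReadAdvance would hide the extra read, the following Python sketch models the latency seen by a consumer as the producer's latency minus the reader's advance, clamped at zero. This is a simplified reading of how the scheduling model applies ReadAdvance, stated here as an assumption; the helper name is hypothetical. The 13-cycle write latency is the value shown for BdVer2 in this review.

```python
def operand_latency(write_latency, read_advance):
    """Latency seen by a consumer operand (simplified model, never negative)."""
    return max(0, write_latency - read_advance)


bdver2_write_latency = 13  # "let Latency = 13" in the BdVer2 snippet of this review

# A default read (advance 0) waits for the full latency of the producer.
print(operand_latency(bdver2_write_latency, 0))   # 13
# With ReadAdvance<ReadAfterInt2Fpu, 13> the dependency costs nothing.
print(operand_latency(bdver2_write_latency, 13))  # 0
```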

The last step is to add ReadAfterInt2Fpu to the schedule read/write list of SSE CVTSI*_Int definitions.
In my experimental change, I had to change multiclass sse12_cvt_sint_3addr by adding an extra param (see below):

Index: X86InstrSSE.td
===================================================================
--- X86InstrSSE.td      (revision 357997)
+++ X86InstrSSE.td      (working copy)
@@ -1026,6 +989,7 @@
 multiclass sse12_cvt_sint_3addr<bits<8> opc, RegisterClass SrcRC,
                     RegisterClass DstRC, X86MemOperand x86memop,
                     string asm, X86FoldableSchedWrite sched,
+                    SchedRead ReadAdv = ReadDefault,
                     bit Is2Addr = 1> {

That required the following changes too:

+let Predicates = [UseAVX] in {
+defm VCVTSI2SS : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
+          i32mem, "cvtsi2ss{l}", WriteCvtI2SS, ReadDefault, 0>, XS, VEX_4V;
+defm VCVTSI642SS : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
+          i64mem, "cvtsi2ss{q}", WriteCvtI2SS, ReadDefault, 0>, XS, VEX_4V, VEX_W;
+defm VCVTSI2SD : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
+          i32mem, "cvtsi2sd{l}", WriteCvtI2SD, ReadDefault, 0>, XD, VEX_4V;
+defm VCVTSI642SD : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
+          i64mem, "cvtsi2sd{q}", WriteCvtI2SD, ReadDefault, 0>, XD, VEX_4V, VEX_W;
+}
+let Constraints = "$src1 = $dst" in {
+  defm CVTSI2SS : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
+                        i32mem, "cvtsi2ss{l}", WriteCvtI2SS, ReadAfterInt2Fpu>, XS;
+  defm CVTSI642SS : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
+                        i64mem, "cvtsi2ss{q}", WriteCvtI2SS, ReadAfterInt2Fpu>, XS, REX_W;
+  defm CVTSI2SD : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
+                        i32mem, "cvtsi2sd{l}", WriteCvtI2SD, ReadAfterInt2Fpu>, XD;
+  defm CVTSI642SD : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
+                        i64mem, "cvtsi2sd{q}", WriteCvtI2SD, ReadAfterInt2Fpu>, XD, REX_W;
+}

This was enough to avoid the regression on BtVer2.

There is one last change required for BdVer2:

===================================================================
--- X86ScheduleBdVer2.td        (revision 357997)
+++ X86ScheduleBdVer2.td        (working copy)
@@ -898,7 +899,8 @@
   let Latency = 13;
   let NumMicroOps = 2;
 }
-def : InstRW<[PdWriteCVTSI642SDrr_CVTSI642SSrr_CVTSI2SDr_CVTSI2SSrr], (instrs CVTSI642SDrr, CVTSI642SSrr, CVTSI2SDrr, CVTSI2SSrr)>;
+def : InstRW<[PdWriteCVTSI642SDrr_CVTSI642SSrr_CVTSI2SDr_CVTSI2SSrr, ReadAfterInt2Fpu], (instrs CVTSI642SDrr, CVTSI642SSrr, CVTSI2SDrr, CVTSI2SSrr,
+                                                                              CVTSI642SDrr_Int, CVTSI642SSrr_Int, CVTSI2SDrr_Int, CVTSI2SSrr_Int)>;

Basically, we need to explicitly pass ReadAfterInt2Fpu to that InstRW. Otherwise the regression would not go away.

Tbh. I don't know if there is another way to fix that regression.
If we want to fix it, then this may be a way to do it.

I hope it helps.

Architecturally that read really does exist. It's not a false dependency. That read defines the upper bits of the result. The fact that AVX and SSE were different before this patch seems like a bug. It looks like with your proposed change they would still be different. That doesn't seem right.

In D60441#1460147, @craig.topper wrote:

Architecturally that read really does exist. It's not a false dependency. That read defines the upper bits of the result. The fact that AVX and SSE were different before this patch seems like a bug. It looks like with your proposed change they would still be different. That doesn't seem right.

Right. Sorry. The upper bits of the result are unmodified for the SSE variants. So yes, it was a bug before, and that read does exist in practice.

In D60441#1460184, @andreadb wrote:

In D60441#1460147, @craig.topper wrote:

Architecturally that read really does exist. It's not a false dependency. That read defines the upper bits of the result. The fact that AVX and SSE were different before this patch seems like a bug. It looks like with your proposed change they would still be different. That doesn't seem right.

Right. Sorry. The upper bits of the result are unmodified for the SSE variants. So yes, it was a bug before, and that read does exist in practice.

So, I found out the reason why the modified BtVer2 test was expecting a higher IPC.
I am specifically talking about the code comment from that test that says `# Throughput for the SSE code snippets below should tend to 1.00 IPC.`

The microbenchmark which was used to measure the actual throughput on the target was pre-initializing XMM0 to all-zeroes.
It shouldn't have done that because AMD processors (at least since AMDFam15h) implement a "register merge optimization" based on the knowledge of zero bits in XMM registers (see below).

Quoting the AMDFam16h SOG:

2.11 XMM Register Merge Optimization
The AMD Family 16h processor implements an XMM register merge optimization.
The processor keeps track of XMM registers whose upper portions have been cleared to zeros. This information
can be followed through multiple operations and register destinations until non-zero data is written into a
register. For certain instructions, this information can be used to bypass the usual result merging for the upper
parts of the register.

Instructions CVTSI2SS and CVTSI2SD are listed by that document as instructions that can benefit from that register merge optimization.
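The bookkeeping described in the quoted paragraph can be pictured with a toy Python model. The class and method names are hypothetical, and this is only a sketch of the idea (tracking which registers have known-zero upper bits), not a description of how the hardware implements it.

```python
class MergeTracker:
    """Toy bookkeeping of XMM registers whose upper bits are known to be zero."""

    def __init__(self):
        self.upper_zero = set()

    def clear_register(self, reg):
        # The whole register was set to zero, so its upper part is known to be zero.
        self.upper_zero.add(reg)

    def write_nonzero_full(self, reg):
        # Arbitrary data was written to the full register: nothing is known anymore.
        self.upper_zero.discard(reg)

    def scalar_convert_needs_merge(self, reg):
        # A CVTSI2SS/CVTSI2SD-style partial write can skip the merge with the old
        # upper bits only if those bits are known to be zero.
        return reg not in self.upper_zero


tracker = MergeTracker()
print(tracker.scalar_convert_needs_merge("xmm0"))  # True
tracker.clear_register("xmm0")
print(tracker.scalar_convert_needs_merge("xmm0"))  # False
```

In this model, a benchmark that zeroes XMM0 up front never pays for the merge, which is consistent with the higher IPC the original test expected.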

So... I have rerun my original microbenchmark. This time, I made sure not to set XMM0 to all-zeroes.
This is what I've got:

cycles:           105415644                                       ( +- 0.20% )
instructions:     26000303         #   0.25 insn per cycle        ( +- 0.00% )
micro-opcodes:    51640304         #   0.49 uOps per cycle        ( +- 0.01% )

So, yes. The test should not have expected a 1.00 IPC. It was a bug.
It should have been 0.25 IPC instead (which is what we would get with your patch).
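The per-cycle figures in the perf output follow directly from the raw counters, as this short Python check shows (the variable names are mine; the numbers are copied from the output above).

```python
cycles = 105_415_644
instructions = 26_000_303
micro_ops = 51_640_304

ipc = instructions / cycles
uops_per_cycle = micro_ops / cycles

print(f"{ipc:.2f} insn per cycle")             # 0.25 insn per cycle
print(f"{uops_per_cycle:.2f} uOps per cycle")  # 0.49 uOps per cycle
```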

I also noticed that this same optimization is done by Fam15h processors (so, it applies to Piledriver). That same paragraph can be found in AMD Fam15h SOG - Section 5.5 Partial-Register Writes.

@lebedev.ri can probably verify those numbers for BdVer2.

Again, sorry for the confusion caused by my previous post. I was trying to be useful but I failed... (I keep forgetting about those SSE partial writes).

This patch looks good to me. Hopefully Roman will be able to verify those numbers for BdVer2.

-Andrea

This revision is now accepted and ready to land. Apr 10 2019, 4:56 AM

In D60441#1461088, @andreadb wrote:

I also noticed that this same optimization is done by Fam15h processors (so, it applies to Piledriver). That same paragraph can be found in AMD Fam15h SOG - Section 5.5 Partial-Register Writes.

@lebedev.ri can probably verify those numbers for BdVer2.

Sorry, I lost track here. What's the exact methodology and the test?
Apply this patch and then $ perf stat ./bin/llvm-exegesis -opcode-name=CVTSI2SSrr_Int -mode=inverse_throughput -num-repetitions=1000000 ?

Closed by commit rL358138: [X86] Make _Int instructions the preferred instructon for the assembly parser… (authored by ctopper). Apr 10 2019, 2:28 PM

This revision was automatically updated to reflect the committed changes.

xiangzhangllvm mentioned this in D147996: [X86] combineConcatVectorOps - remove FADD/FSUB/FMUL handling (2-1). Apr 17 2023, 5:38 PM

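Revision Contents

Path                                                                   Size
llvm/trunk/lib/Target/X86/X86InstrAVX512.td                            40 lines
llvm/trunk/lib/Target/X86/X86InstrFMA.td                               6 lines
llvm/trunk/lib/Target/X86/X86InstrSSE.td                               245 lines
llvm/trunk/lib/Target/X86/X86SchedBroadwell.td                         1 line
llvm/trunk/lib/Target/X86/X86SchedHaswell.td                           4 lines
llvm/trunk/lib/Target/X86/X86ScheduleBdVer2.td                         3 lines
llvm/trunk/test/tools/llvm-mca/X86/BdVer2/int-to-fpu-forwarding-2.s    12 lines
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/int-to-fpu-forwarding-2.s    12 lines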

Diff 194589

llvm/trunk/lib/Target/X86/X86InstrAVX512.td

@@ (7,296 lines not shown) @@
 //===----------------------------------------------------------------------===//
 // AVX-512 Scalar convert from sign integer to float/double
 //===----------------------------------------------------------------------===//

 multiclass avx512_vcvtsi<bits<8> opc, SDPatternOperator OpNode, X86FoldableSchedWrite sched,
 RegisterClass SrcRC, X86VectorVTInfo DstVT,
 X86MemOperand x86memop, PatFrag ld_frag, string asm> {
-let hasSideEffects = 0 in {
+let hasSideEffects = 0, isCodeGenOnly = 1 in {
 def rr : SI<opc, MRMSrcReg, (outs DstVT.FRC:$dst),
 (ins DstVT.FRC:$src1, SrcRC:$src),
 !strconcat(asm,"\t{$src, $src1, $dst|$dst, $src1, $src}"), []>,
 EVEX_4V, Sched<[sched, ReadDefault, ReadInt2Fpu]>;
 let mayLoad = 1 in
 def rm : SI<opc, MRMSrcMem, (outs DstVT.FRC:$dst),
 (ins DstVT.FRC:$src1, x86memop:$src),
 !strconcat(asm,"\t{$src, $src1, $dst|$dst, $src1, $src}"), []>,
 EVEX_4V, Sched<[sched.Folded, sched.ReadAfterFold]>;
 } // hasSideEffects = 0
-let isCodeGenOnly = 1 in {
 def rr_Int : SI<opc, MRMSrcReg, (outs DstVT.RC:$dst),
 (ins DstVT.RC:$src1, SrcRC:$src2),
 !strconcat(asm,"\t{$src2, $src1, $dst|$dst, $src1, $src2}"),
 [(set DstVT.RC:$dst,
 (OpNode (DstVT.VT DstVT.RC:$src1), SrcRC:$src2))]>,
 EVEX_4V, Sched<[sched, ReadDefault, ReadInt2Fpu]>;

 def rm_Int : SI<opc, MRMSrcMem, (outs DstVT.RC:$dst),
 (ins DstVT.RC:$src1, x86memop:$src2),
 !strconcat(asm,"\t{$src2, $src1, $dst|$dst, $src1, $src2}"),
 [(set DstVT.RC:$dst,
 (OpNode (DstVT.VT DstVT.RC:$src1),
 (ld_frag addr:$src2)))]>,
 EVEX_4V, Sched<[sched.Folded, sched.ReadAfterFold]>;
-}//isCodeGenOnly = 1
 }

 multiclass avx512_vcvtsi_round<bits<8> opc, SDNode OpNode,
 X86FoldableSchedWrite sched, RegisterClass SrcRC,
 X86VectorVTInfo DstVT, string asm> {
 def rrb_Int : SI<opc, MRMSrcReg, (outs DstVT.RC:$dst),
 (ins DstVT.RC:$src1, SrcRC:$src2, AVX512RC:$rc),
 !strconcat(asm,
@@ (27 lines not shown) @@ defm VCVTSI2SDZ : avx512_vcvtsi<0x2A, null_frag, WriteCvtI2SD, GR32,
 v2f64x_info, i32mem, loadi32, "cvtsi2sd{l}">,
 XD, VEX_LIG, EVEX_CD8<32, CD8VT1>;
 defm VCVTSI642SDZ: avx512_vcvtsi_common<0x2A, X86SintToFp, X86SintToFpRnd,
 WriteCvtI2SD, GR64,
 v2f64x_info, i64mem, loadi64, "cvtsi2sd{q}">,
 XD, VEX_W, EVEX_CD8<64, CD8VT1>;

 def : InstAlias<"vcvtsi2ss\t{$src, $src1, $dst|$dst, $src1, $src}",
-(VCVTSI2SSZrm FR64X:$dst, FR64X:$src1, i32mem:$src), 0, "att">;
+(VCVTSI2SSZrm_Int VR128X:$dst, VR128X:$src1, i32mem:$src), 0, "att">;
 def : InstAlias<"vcvtsi2sd\t{$src, $src1, $dst|$dst, $src1, $src}",
-(VCVTSI2SDZrm FR64X:$dst, FR64X:$src1, i32mem:$src), 0, "att">;
+(VCVTSI2SDZrm_Int VR128X:$dst, VR128X:$src1, i32mem:$src), 0, "att">;

 def : Pat<(f32 (sint_to_fp (loadi32 addr:$src))),
 (VCVTSI2SSZrm (f32 (IMPLICIT_DEF)), addr:$src)>;
 def : Pat<(f32 (sint_to_fp (loadi64 addr:$src))),
 (VCVTSI642SSZrm (f32 (IMPLICIT_DEF)), addr:$src)>;
 def : Pat<(f64 (sint_to_fp (loadi32 addr:$src))),
 (VCVTSI2SDZrm (f64 (IMPLICIT_DEF)), addr:$src)>;
 def : Pat<(f64 (sint_to_fp (loadi64 addr:$src))),
@@ (20 lines not shown) @@ defm VCVTUSI2SDZ : avx512_vcvtsi<0x7B, null_frag, WriteCvtI2SD, GR32, v2f64x_info,
 i32mem, loadi32, "cvtusi2sd{l}">,
 XD, VEX_LIG, EVEX_CD8<32, CD8VT1>;
 defm VCVTUSI642SDZ : avx512_vcvtsi_common<0x7B, X86UintToFp, X86UintToFpRnd,
 WriteCvtI2SD, GR64,
 v2f64x_info, i64mem, loadi64, "cvtusi2sd{q}">,
 XD, VEX_W, EVEX_CD8<64, CD8VT1>;

 def : InstAlias<"vcvtusi2ss\t{$src, $src1, $dst|$dst, $src1, $src}",
-(VCVTUSI2SSZrm FR64X:$dst, FR64X:$src1, i32mem:$src), 0, "att">;
+(VCVTUSI2SSZrm_Int VR128X:$dst, VR128X:$src1, i32mem:$src), 0, "att">;
 def : InstAlias<"vcvtusi2sd\t{$src, $src1, $dst|$dst, $src1, $src}",
-(VCVTUSI2SDZrm FR64X:$dst, FR64X:$src1, i32mem:$src), 0, "att">;
+(VCVTUSI2SDZrm_Int VR128X:$dst, VR128X:$src1, i32mem:$src), 0, "att">;

 def : Pat<(f32 (uint_to_fp (loadi32 addr:$src))),
 (VCVTUSI2SSZrm (f32 (IMPLICIT_DEF)), addr:$src)>;
 def : Pat<(f32 (uint_to_fp (loadi64 addr:$src))),
 (VCVTUSI642SSZrm (f32 (IMPLICIT_DEF)), addr:$src)>;
 def : Pat<(f64 (uint_to_fp (loadi32 addr:$src))),
 (VCVTUSI2SDZrm (f64 (IMPLICIT_DEF)), addr:$src)>;
 def : Pat<(f64 (uint_to_fp (loadi64 addr:$src))),
@@ (5,213 lines not shown) @@

llvm/trunk/lib/Target/X86/X86InstrFMA.td

@@ (230 lines not shown) @@ def m : FMA3S<opc, MRMSrcMem, (outs RC:$dst),
 (ins RC:$src1, RC:$src2, x86memop:$src3),
 !strconcat(OpcodeStr,
 "\t{$src3, $src2, $dst|$dst, $src2, $src3}"),
 [(set RC:$dst,
 (OpNode (load addr:$src3), RC:$src1, RC:$src2))]>,
 Sched<[sched.Folded, sched.ReadAfterFold, sched.ReadAfterFold]>;
 }

-let Constraints = "$src1 = $dst", isCommutable = 1, hasSideEffects = 0 in
+let Constraints = "$src1 = $dst", isCommutable = 1, isCodeGenOnly = 1,
+hasSideEffects = 0 in
 multiclass fma3s_forms<bits<8> opc132, bits<8> opc213, bits<8> opc231,
 string OpStr, string PackTy, string Suff,
 SDNode OpNode, RegisterClass RC,
 X86MemOperand x86memop, X86FoldableSchedWrite sched> {
 defm NAME#213#Suff : fma3s_rm_213<opc213, !strconcat(OpStr, "213", PackTy),
 x86memop, RC, OpNode, sched>;
 defm NAME#231#Suff : fma3s_rm_231<opc231, !strconcat(OpStr, "231", PackTy),
 x86memop, RC, OpNode, sched>;
@@ (9 lines not shown) @@
 // All of the FMA*_Int opcodes are defined as commutable here.
 // Commuting the 2nd and 3rd source register operands of FMAs is quite trivial
 // and the corresponding optimizations have been developed.
 // Commuting the 1st operand of FMA*_Int requires some additional analysis,
 // the commute optimization is legal only if all users of FMA*_Int use only
 // the lowest element of the FMA*_Int instruction. Even though such analysis
 // may be not implemented yet we allow the routines doing the actual commute
 // transformation to decide if one or another instruction is commutable or not.
-let Constraints = "$src1 = $dst", isCommutable = 1, isCodeGenOnly = 1,
-hasSideEffects = 0 in
+let Constraints = "$src1 = $dst", isCommutable = 1, hasSideEffects = 0 in
 multiclass fma3s_rm_int<bits<8> opc, string OpcodeStr,
 Operand memopr, RegisterClass RC,
 X86FoldableSchedWrite sched> {
 def r_Int : FMA3S_Int<opc, MRMSrcReg, (outs RC:$dst),
 (ins RC:$src1, RC:$src2, RC:$src3),
 !strconcat(OpcodeStr,
 "\t{$src3, $src2, $dst|$dst, $src2, $src3}"),
 []>, Sched<[sched]>;
@@ (361 lines not shown) @@

llvm/trunk/lib/Target/X86/X86InstrSSE.td

@@ (15 lines not shown) @@
 // SSE 1 & 2 Instructions Classes
 //===----------------------------------------------------------------------===//

 /// sse12_fp_scalar - SSE 1 & 2 scalar instructions class
 multiclass sse12_fp_scalar<bits<8> opc, string OpcodeStr, SDNode OpNode,
 RegisterClass RC, X86MemOperand x86memop,
 Domain d, X86FoldableSchedWrite sched,
 bit Is2Addr = 1> {
+let isCodeGenOnly = 1 in {
 let isCommutable = 1 in {
 def rr : SI<opc, MRMSrcReg, (outs RC:$dst), (ins RC:$src1, RC:$src2),
 !if(Is2Addr,
 !strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
 !strconcat(OpcodeStr, "\t{$src2, $src1, $dst|$dst, $src1, $src2}")),
 [(set RC:$dst, (OpNode RC:$src1, RC:$src2))], d>,
 Sched<[sched]>;
 }
 def rm : SI<opc, MRMSrcMem, (outs RC:$dst), (ins RC:$src1, x86memop:$src2),
 !if(Is2Addr,
 !strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
 !strconcat(OpcodeStr, "\t{$src2, $src1, $dst|$dst, $src1, $src2}")),
 [(set RC:$dst, (OpNode RC:$src1, (load addr:$src2)))], d>,
 Sched<[sched.Folded, sched.ReadAfterFold]>;
 }
+}

 /// sse12_fp_scalar_int - SSE 1 & 2 scalar instructions intrinsics class
 multiclass sse12_fp_scalar_int<bits<8> opc, string OpcodeStr,
 SDPatternOperator OpNode, RegisterClass RC,
 ValueType VT, string asm, Operand memopr,
 ComplexPattern mem_cpat, Domain d,
 X86FoldableSchedWrite sched, bit Is2Addr = 1> {
-let isCodeGenOnly = 1, hasSideEffects = 0 in {
+let hasSideEffects = 0 in {
 def rr_Int : SI_Int<opc, MRMSrcReg, (outs RC:$dst), (ins RC:$src1, RC:$src2),
 !if(Is2Addr,
 !strconcat(asm, "\t{$src2, $dst|$dst, $src2}"),
 !strconcat(asm, "\t{$src2, $src1, $dst|$dst, $src1, $src2}")),
 [(set RC:$dst, (VT (OpNode RC:$src1, RC:$src2)))], d>,
 Sched<[sched]>;
 let mayLoad = 1 in
 def rm_Int : SI_Int<opc, MRMSrcMem, (outs RC:$dst), (ins RC:$src1, memopr:$src2),
@@ (800 lines not shown) @@ let hasSideEffects = 0, Predicates = [UseAVX] in {
 let mayLoad = 1 in
 def rm : SI<opc, MRMSrcMem, (outs DstRC:$dst),
 (ins DstRC:$src1, x86memop:$src),
 !strconcat(asm,"\t{$src, $src1, $dst|$dst, $src1, $src}"), []>,
 Sched<[sched.Folded, sched.ReadAfterFold]>;
 } // hasSideEffects = 0
 }

-let Predicates = [UseAVX] in {
+let isCodeGenOnly = 1, Predicates = [UseAVX] in {
 defm VCVTTSS2SI : sse12_cvt_s<0x2C, FR32, GR32, fp_to_sint, f32mem, loadf32,
 "cvttss2si\t{$src, $dst|$dst, $src}",
 WriteCvtSS2I>,
 XS, VEX, VEX_LIG;
 defm VCVTTSS2SI64 : sse12_cvt_s<0x2C, FR32, GR64, fp_to_sint, f32mem, loadf32,
 "cvttss2si\t{$src, $dst|$dst, $src}",
 WriteCvtSS2I>,
 XS, VEX, VEX_W, VEX_LIG;
 defm VCVTTSD2SI : sse12_cvt_s<0x2C, FR64, GR32, fp_to_sint, f64mem, loadf64,
 "cvttsd2si\t{$src, $dst|$dst, $src}",
 WriteCvtSD2I>,
 XD, VEX, VEX_LIG;
 defm VCVTTSD2SI64 : sse12_cvt_s<0x2C, FR64, GR64, fp_to_sint, f64mem, loadf64,
 "cvttsd2si\t{$src, $dst|$dst, $src}",
 WriteCvtSD2I>,
 XD, VEX, VEX_W, VEX_LIG;
-
-def : InstAlias<"vcvttss2si{l}\t{$src, $dst|$dst, $src}",
-(VCVTTSS2SIrr GR32:$dst, FR32:$src), 0, "att">;
-def : InstAlias<"vcvttss2si{l}\t{$src, $dst|$dst, $src}",
-(VCVTTSS2SIrm GR32:$dst, f32mem:$src), 0, "att">;
-def : InstAlias<"vcvttsd2si{l}\t{$src, $dst|$dst, $src}",
-(VCVTTSD2SIrr GR32:$dst, FR64:$src), 0, "att">;
-def : InstAlias<"vcvttsd2si{l}\t{$src, $dst|$dst, $src}",
-(VCVTTSD2SIrm GR32:$dst, f64mem:$src), 0, "att">;
-def : InstAlias<"vcvttss2si{q}\t{$src, $dst|$dst, $src}",
-(VCVTTSS2SI64rr GR64:$dst, FR32:$src), 0, "att">;
-def : InstAlias<"vcvttss2si{q}\t{$src, $dst|$dst, $src}",
-(VCVTTSS2SI64rm GR64:$dst, f32mem:$src), 0, "att">;
-def : InstAlias<"vcvttsd2si{q}\t{$src, $dst|$dst, $src}",
-(VCVTTSD2SI64rr GR64:$dst, FR64:$src), 0, "att">;
-def : InstAlias<"vcvttsd2si{q}\t{$src, $dst|$dst, $src}",
-(VCVTTSD2SI64rm GR64:$dst, f64mem:$src), 0, "att">;
 }

 // The assembler can recognize rr 64-bit instructions by seeing a rxx
 // register, but the same isn't true when only using memory operands,
 // provide other assembly "l" and "q" forms to address this explicitly
 // where appropriate to do so.
+let isCodeGenOnly = 1 in {
 defm VCVTSI2SS : sse12_vcvt_avx<0x2A, GR32, FR32, i32mem, "cvtsi2ss{l}",
 WriteCvtI2SS>, XS, VEX_4V, VEX_LIG;
 defm VCVTSI642SS : sse12_vcvt_avx<0x2A, GR64, FR32, i64mem, "cvtsi2ss{q}",
 WriteCvtI2SS>, XS, VEX_4V, VEX_W, VEX_LIG;
 defm VCVTSI2SD : sse12_vcvt_avx<0x2A, GR32, FR64, i32mem, "cvtsi2sd{l}",
 WriteCvtI2SD>, XD, VEX_4V, VEX_LIG;
defm VCVTSI642SD : sse12_vcvt_avx<0x2A, GR64, FR64, i64mem, "cvtsi2sd{q}",		defm VCVTSI642SD : sse12_vcvt_avx<0x2A, GR64, FR64, i64mem, "cvtsi2sd{q}",
WriteCvtI2SD>, XD, VEX_4V, VEX_W, VEX_LIG;		WriteCvtI2SD>, XD, VEX_4V, VEX_W, VEX_LIG;
		} // isCodeGenOnly = 1
def : InstAlias<"vcvtsi2ss\t{$src, $src1, $dst\|$dst, $src1, $src}",
(VCVTSI2SSrm FR64:$dst, FR64:$src1, i32mem:$src), 0, "att">;
def : InstAlias<"vcvtsi2sd\t{$src, $src1, $dst\|$dst, $src1, $src}",
(VCVTSI2SDrm FR64:$dst, FR64:$src1, i32mem:$src), 0, "att">;

let Predicates = [UseAVX] in {		let Predicates = [UseAVX] in {
def : Pat<(f32 (sint_to_fp (loadi32 addr:$src))),		def : Pat<(f32 (sint_to_fp (loadi32 addr:$src))),
(VCVTSI2SSrm (f32 (IMPLICIT_DEF)), addr:$src)>;		(VCVTSI2SSrm (f32 (IMPLICIT_DEF)), addr:$src)>;
def : Pat<(f32 (sint_to_fp (loadi64 addr:$src))),		def : Pat<(f32 (sint_to_fp (loadi64 addr:$src))),
(VCVTSI642SSrm (f32 (IMPLICIT_DEF)), addr:$src)>;		(VCVTSI642SSrm (f32 (IMPLICIT_DEF)), addr:$src)>;
def : Pat<(f64 (sint_to_fp (loadi32 addr:$src))),		def : Pat<(f64 (sint_to_fp (loadi32 addr:$src))),
(VCVTSI2SDrm (f64 (IMPLICIT_DEF)), addr:$src)>;		(VCVTSI2SDrm (f64 (IMPLICIT_DEF)), addr:$src)>;
def : Pat<(f64 (sint_to_fp (loadi64 addr:$src))),		def : Pat<(f64 (sint_to_fp (loadi64 addr:$src))),
(VCVTSI642SDrm (f64 (IMPLICIT_DEF)), addr:$src)>;		(VCVTSI642SDrm (f64 (IMPLICIT_DEF)), addr:$src)>;

def : Pat<(f32 (sint_to_fp GR32:$src)),		def : Pat<(f32 (sint_to_fp GR32:$src)),
(VCVTSI2SSrr (f32 (IMPLICIT_DEF)), GR32:$src)>;		(VCVTSI2SSrr (f32 (IMPLICIT_DEF)), GR32:$src)>;
def : Pat<(f32 (sint_to_fp GR64:$src)),		def : Pat<(f32 (sint_to_fp GR64:$src)),
(VCVTSI642SSrr (f32 (IMPLICIT_DEF)), GR64:$src)>;		(VCVTSI642SSrr (f32 (IMPLICIT_DEF)), GR64:$src)>;
def : Pat<(f64 (sint_to_fp GR32:$src)),		def : Pat<(f64 (sint_to_fp GR32:$src)),
(VCVTSI2SDrr (f64 (IMPLICIT_DEF)), GR32:$src)>;		(VCVTSI2SDrr (f64 (IMPLICIT_DEF)), GR32:$src)>;
def : Pat<(f64 (sint_to_fp GR64:$src)),		def : Pat<(f64 (sint_to_fp GR64:$src)),
(VCVTSI642SDrr (f64 (IMPLICIT_DEF)), GR64:$src)>;		(VCVTSI642SDrr (f64 (IMPLICIT_DEF)), GR64:$src)>;
}		}

		let isCodeGenOnly = 1 in {
defm CVTTSS2SI : sse12_cvt_s<0x2C, FR32, GR32, fp_to_sint, f32mem, loadf32,		defm CVTTSS2SI : sse12_cvt_s<0x2C, FR32, GR32, fp_to_sint, f32mem, loadf32,
"cvttss2si\t{$src, $dst\|$dst, $src}",		"cvttss2si\t{$src, $dst\|$dst, $src}",
WriteCvtSS2I>, XS;		WriteCvtSS2I>, XS;
defm CVTTSS2SI64 : sse12_cvt_s<0x2C, FR32, GR64, fp_to_sint, f32mem, loadf32,		defm CVTTSS2SI64 : sse12_cvt_s<0x2C, FR32, GR64, fp_to_sint, f32mem, loadf32,
"cvttss2si\t{$src, $dst\|$dst, $src}",		"cvttss2si\t{$src, $dst\|$dst, $src}",
WriteCvtSS2I>, XS, REX_W;		WriteCvtSS2I>, XS, REX_W;
defm CVTTSD2SI : sse12_cvt_s<0x2C, FR64, GR32, fp_to_sint, f64mem, loadf64,		defm CVTTSD2SI : sse12_cvt_s<0x2C, FR64, GR32, fp_to_sint, f64mem, loadf64,
"cvttsd2si\t{$src, $dst\|$dst, $src}",		"cvttsd2si\t{$src, $dst\|$dst, $src}",
WriteCvtSD2I>, XD;		WriteCvtSD2I>, XD;
defm CVTTSD2SI64 : sse12_cvt_s<0x2C, FR64, GR64, fp_to_sint, f64mem, loadf64,		defm CVTTSD2SI64 : sse12_cvt_s<0x2C, FR64, GR64, fp_to_sint, f64mem, loadf64,
"cvttsd2si\t{$src, $dst\|$dst, $src}",		"cvttsd2si\t{$src, $dst\|$dst, $src}",
WriteCvtSD2I>, XD, REX_W;		WriteCvtSD2I>, XD, REX_W;
defm CVTSI2SS : sse12_cvt_s<0x2A, GR32, FR32, sint_to_fp, i32mem, loadi32,		defm CVTSI2SS : sse12_cvt_s<0x2A, GR32, FR32, sint_to_fp, i32mem, loadi32,
"cvtsi2ss{l}\t{$src, $dst\|$dst, $src}",		"cvtsi2ss{l}\t{$src, $dst\|$dst, $src}",
WriteCvtI2SS, ReadInt2Fpu>, XS;		WriteCvtI2SS, ReadInt2Fpu>, XS;
defm CVTSI642SS : sse12_cvt_s<0x2A, GR64, FR32, sint_to_fp, i64mem, loadi64,		defm CVTSI642SS : sse12_cvt_s<0x2A, GR64, FR32, sint_to_fp, i64mem, loadi64,
"cvtsi2ss{q}\t{$src, $dst\|$dst, $src}",		"cvtsi2ss{q}\t{$src, $dst\|$dst, $src}",
WriteCvtI2SS, ReadInt2Fpu>, XS, REX_W;		WriteCvtI2SS, ReadInt2Fpu>, XS, REX_W;
defm CVTSI2SD : sse12_cvt_s<0x2A, GR32, FR64, sint_to_fp, i32mem, loadi32,		defm CVTSI2SD : sse12_cvt_s<0x2A, GR32, FR64, sint_to_fp, i32mem, loadi32,
"cvtsi2sd{l}\t{$src, $dst\|$dst, $src}",		"cvtsi2sd{l}\t{$src, $dst\|$dst, $src}",
WriteCvtI2SD, ReadInt2Fpu>, XD;		WriteCvtI2SD, ReadInt2Fpu>, XD;
defm CVTSI642SD : sse12_cvt_s<0x2A, GR64, FR64, sint_to_fp, i64mem, loadi64,		defm CVTSI642SD : sse12_cvt_s<0x2A, GR64, FR64, sint_to_fp, i64mem, loadi64,
"cvtsi2sd{q}\t{$src, $dst\|$dst, $src}",		"cvtsi2sd{q}\t{$src, $dst\|$dst, $src}",
WriteCvtI2SD, ReadInt2Fpu>, XD, REX_W;		WriteCvtI2SD, ReadInt2Fpu>, XD, REX_W;
		} // isCodeGenOnly = 1
def : InstAlias<"cvttss2si{l}\t{$src, $dst\|$dst, $src}",
(CVTTSS2SIrr GR32:$dst, FR32:$src), 0, "att">;
def : InstAlias<"cvttss2si{l}\t{$src, $dst\|$dst, $src}",
(CVTTSS2SIrm GR32:$dst, f32mem:$src), 0, "att">;
def : InstAlias<"cvttsd2si{l}\t{$src, $dst\|$dst, $src}",
(CVTTSD2SIrr GR32:$dst, FR64:$src), 0, "att">;
def : InstAlias<"cvttsd2si{l}\t{$src, $dst\|$dst, $src}",
(CVTTSD2SIrm GR32:$dst, f64mem:$src), 0, "att">;
def : InstAlias<"cvttss2si{q}\t{$src, $dst\|$dst, $src}",
(CVTTSS2SI64rr GR64:$dst, FR32:$src), 0, "att">;
def : InstAlias<"cvttss2si{q}\t{$src, $dst\|$dst, $src}",
(CVTTSS2SI64rm GR64:$dst, f32mem:$src), 0, "att">;
def : InstAlias<"cvttsd2si{q}\t{$src, $dst\|$dst, $src}",
(CVTTSD2SI64rr GR64:$dst, FR64:$src), 0, "att">;
def : InstAlias<"cvttsd2si{q}\t{$src, $dst\|$dst, $src}",
(CVTTSD2SI64rm GR64:$dst, f64mem:$src), 0, "att">;

def : InstAlias<"cvtsi2ss\t{$src, $dst\|$dst, $src}",
(CVTSI2SSrm FR64:$dst, i32mem:$src), 0, "att">;
def : InstAlias<"cvtsi2sd\t{$src, $dst\|$dst, $src}",
(CVTSI2SDrm FR64:$dst, i32mem:$src), 0, "att">;

// Conversion Instructions Intrinsics - Match intrinsics which expect MM		// Conversion Instructions Intrinsics - Match intrinsics which expect MM
// and/or XMM operand(s).		// and/or XMM operand(s).

multiclass sse12_cvt_sint<bits<8> opc, RegisterClass SrcRC, RegisterClass DstRC,		multiclass sse12_cvt_sint<bits<8> opc, RegisterClass SrcRC, RegisterClass DstRC,
ValueType DstVT, ValueType SrcVT, SDNode OpNode,		ValueType DstVT, ValueType SrcVT, SDNode OpNode,
Operand memop, ComplexPattern mem_cpat, string asm,		Operand memop, ComplexPattern mem_cpat, string asm,
X86FoldableSchedWrite sched> {		X86FoldableSchedWrite sched> {
[... 36 lines not shown ...]
defm VCVTSD2SI64 : sse12_cvt_sint<0x2D, VR128, GR64, i64, v2f64,
WriteCvtSD2I>, XD, VEX, VEX_W, VEX_LIG;		WriteCvtSD2I>, XD, VEX, VEX_W, VEX_LIG;
}		}
defm CVTSD2SI : sse12_cvt_sint<0x2D, VR128, GR32, i32, v2f64, X86cvts2si,		defm CVTSD2SI : sse12_cvt_sint<0x2D, VR128, GR32, i32, v2f64, X86cvts2si,
sdmem, sse_load_f64, "cvtsd2si", WriteCvtSD2I>, XD;		sdmem, sse_load_f64, "cvtsd2si", WriteCvtSD2I>, XD;
defm CVTSD2SI64 : sse12_cvt_sint<0x2D, VR128, GR64, i64, v2f64, X86cvts2si,		defm CVTSD2SI64 : sse12_cvt_sint<0x2D, VR128, GR64, i64, v2f64, X86cvts2si,
sdmem, sse_load_f64, "cvtsd2si", WriteCvtSD2I>, XD, REX_W;		sdmem, sse_load_f64, "cvtsd2si", WriteCvtSD2I>, XD, REX_W;


let isCodeGenOnly = 1 in {
let Predicates = [UseAVX] in {		let Predicates = [UseAVX] in {
defm VCVTSI2SS : sse12_cvt_sint_3addr<0x2A, GR32, VR128,		defm VCVTSI2SS : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
i32mem, "cvtsi2ss{l}", WriteCvtI2SS, 0>, XS, VEX_4V, VEX_LIG;		i32mem, "cvtsi2ss{l}", WriteCvtI2SS, 0>, XS, VEX_4V, VEX_LIG;
defm VCVTSI642SS : sse12_cvt_sint_3addr<0x2A, GR64, VR128,		defm VCVTSI642SS : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
i64mem, "cvtsi2ss{q}", WriteCvtI2SS, 0>, XS, VEX_4V, VEX_LIG, VEX_W;		i64mem, "cvtsi2ss{q}", WriteCvtI2SS, 0>, XS, VEX_4V, VEX_LIG, VEX_W;
defm VCVTSI2SD : sse12_cvt_sint_3addr<0x2A, GR32, VR128,		defm VCVTSI2SD : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
i32mem, "cvtsi2sd{l}", WriteCvtI2SD, 0>, XD, VEX_4V, VEX_LIG;		i32mem, "cvtsi2sd{l}", WriteCvtI2SD, 0>, XD, VEX_4V, VEX_LIG;
defm VCVTSI642SD : sse12_cvt_sint_3addr<0x2A, GR64, VR128,		defm VCVTSI642SD : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
i64mem, "cvtsi2sd{q}", WriteCvtI2SD, 0>, XD, VEX_4V, VEX_LIG, VEX_W;		i64mem, "cvtsi2sd{q}", WriteCvtI2SD, 0>, XD, VEX_4V, VEX_LIG, VEX_W;
}		}
let Constraints = "$src1 = $dst" in {		let Constraints = "$src1 = $dst" in {
defm CVTSI2SS : sse12_cvt_sint_3addr<0x2A, GR32, VR128,		defm CVTSI2SS : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
i32mem, "cvtsi2ss{l}", WriteCvtI2SS>, XS;		i32mem, "cvtsi2ss{l}", WriteCvtI2SS>, XS;
defm CVTSI642SS : sse12_cvt_sint_3addr<0x2A, GR64, VR128,		defm CVTSI642SS : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
i64mem, "cvtsi2ss{q}", WriteCvtI2SS>, XS, REX_W;		i64mem, "cvtsi2ss{q}", WriteCvtI2SS>, XS, REX_W;
defm CVTSI2SD : sse12_cvt_sint_3addr<0x2A, GR32, VR128,		defm CVTSI2SD : sse12_cvt_sint_3addr<0x2A, GR32, VR128,
i32mem, "cvtsi2sd{l}", WriteCvtI2SD>, XD;		i32mem, "cvtsi2sd{l}", WriteCvtI2SD>, XD;
defm CVTSI642SD : sse12_cvt_sint_3addr<0x2A, GR64, VR128,		defm CVTSI642SD : sse12_cvt_sint_3addr<0x2A, GR64, VR128,
i64mem, "cvtsi2sd{q}", WriteCvtI2SD>, XD, REX_W;		i64mem, "cvtsi2sd{q}", WriteCvtI2SD>, XD, REX_W;
}		}
} // isCodeGenOnly = 1
		def : InstAlias<"vcvtsi2ss\t{$src, $src1, $dst\|$dst, $src1, $src}",
		(VCVTSI2SSrm_Int VR128:$dst, VR128:$src1, i32mem:$src), 0, "att">;
		def : InstAlias<"vcvtsi2sd\t{$src, $src1, $dst\|$dst, $src1, $src}",
		(VCVTSI2SDrm_Int VR128:$dst, VR128:$src1, i32mem:$src), 0, "att">;

		def : InstAlias<"cvtsi2ss\t{$src, $dst\|$dst, $src}",
		(CVTSI2SSrm_Int VR128:$dst, i32mem:$src), 0, "att">;
		def : InstAlias<"cvtsi2sd\t{$src, $dst\|$dst, $src}",
		(CVTSI2SDrm_Int VR128:$dst, i32mem:$src), 0, "att">;

/// SSE 1 Only		/// SSE 1 Only

// Aliases for intrinsics		// Aliases for intrinsics
let isCodeGenOnly = 1 in {
let Predicates = [UseAVX] in {		let Predicates = [UseAVX] in {
defm VCVTTSS2SI : sse12_cvt_sint<0x2C, VR128, GR32, i32, v4f32, X86cvtts2Int,		defm VCVTTSS2SI : sse12_cvt_sint<0x2C, VR128, GR32, i32, v4f32, X86cvtts2Int,
ssmem, sse_load_f32, "cvttss2si",		ssmem, sse_load_f32, "cvttss2si",
WriteCvtSS2I>, XS, VEX, VEX_LIG;		WriteCvtSS2I>, XS, VEX, VEX_LIG;
defm VCVTTSS2SI64 : sse12_cvt_sint<0x2C, VR128, GR64, i64, v4f32,		defm VCVTTSS2SI64 : sse12_cvt_sint<0x2C, VR128, GR64, i64, v4f32,
X86cvtts2Int, ssmem, sse_load_f32,		X86cvtts2Int, ssmem, sse_load_f32,
"cvttss2si", WriteCvtSS2I>,		"cvttss2si", WriteCvtSS2I>,
XS, VEX, VEX_LIG, VEX_W;		XS, VEX, VEX_LIG, VEX_W;
[... 12 lines not shown ...]
defm CVTTSS2SI64 : sse12_cvt_sint<0x2C, VR128, GR64, i64, v4f32,
X86cvtts2Int, ssmem, sse_load_f32,		X86cvtts2Int, ssmem, sse_load_f32,
"cvttss2si", WriteCvtSS2I>, XS, REX_W;		"cvttss2si", WriteCvtSS2I>, XS, REX_W;
defm CVTTSD2SI : sse12_cvt_sint<0x2C, VR128, GR32, i32, v2f64, X86cvtts2Int,		defm CVTTSD2SI : sse12_cvt_sint<0x2C, VR128, GR32, i32, v2f64, X86cvtts2Int,
sdmem, sse_load_f64, "cvttsd2si",		sdmem, sse_load_f64, "cvttsd2si",
WriteCvtSD2I>, XD;		WriteCvtSD2I>, XD;
defm CVTTSD2SI64 : sse12_cvt_sint<0x2C, VR128, GR64, i64, v2f64,		defm CVTTSD2SI64 : sse12_cvt_sint<0x2C, VR128, GR64, i64, v2f64,
X86cvtts2Int, sdmem, sse_load_f64,		X86cvtts2Int, sdmem, sse_load_f64,
"cvttsd2si", WriteCvtSD2I>, XD, REX_W;		"cvttsd2si", WriteCvtSD2I>, XD, REX_W;
} // isCodeGenOnly = 1
		def : InstAlias<"vcvttss2si{l}\t{$src, $dst\|$dst, $src}",
		(VCVTTSS2SIrr_Int GR32:$dst, VR128:$src), 0, "att">;
		def : InstAlias<"vcvttss2si{l}\t{$src, $dst\|$dst, $src}",
		(VCVTTSS2SIrm_Int GR32:$dst, f32mem:$src), 0, "att">;
		def : InstAlias<"vcvttsd2si{l}\t{$src, $dst\|$dst, $src}",
		(VCVTTSD2SIrr_Int GR32:$dst, VR128:$src), 0, "att">;
		def : InstAlias<"vcvttsd2si{l}\t{$src, $dst\|$dst, $src}",
		(VCVTTSD2SIrm_Int GR32:$dst, f64mem:$src), 0, "att">;
		def : InstAlias<"vcvttss2si{q}\t{$src, $dst\|$dst, $src}",
		(VCVTTSS2SI64rr_Int GR64:$dst, VR128:$src), 0, "att">;
		def : InstAlias<"vcvttss2si{q}\t{$src, $dst\|$dst, $src}",
		(VCVTTSS2SI64rm_Int GR64:$dst, f32mem:$src), 0, "att">;
		def : InstAlias<"vcvttsd2si{q}\t{$src, $dst\|$dst, $src}",
		(VCVTTSD2SI64rr_Int GR64:$dst, VR128:$src), 0, "att">;
		def : InstAlias<"vcvttsd2si{q}\t{$src, $dst\|$dst, $src}",
		(VCVTTSD2SI64rm_Int GR64:$dst, f64mem:$src), 0, "att">;

		def : InstAlias<"cvttss2si{l}\t{$src, $dst\|$dst, $src}",
		(CVTTSS2SIrr_Int GR32:$dst, VR128:$src), 0, "att">;
		def : InstAlias<"cvttss2si{l}\t{$src, $dst\|$dst, $src}",
		(CVTTSS2SIrm_Int GR32:$dst, f32mem:$src), 0, "att">;
		def : InstAlias<"cvttsd2si{l}\t{$src, $dst\|$dst, $src}",
		(CVTTSD2SIrr_Int GR32:$dst, VR128:$src), 0, "att">;
		def : InstAlias<"cvttsd2si{l}\t{$src, $dst\|$dst, $src}",
		(CVTTSD2SIrm_Int GR32:$dst, f64mem:$src), 0, "att">;
		def : InstAlias<"cvttss2si{q}\t{$src, $dst\|$dst, $src}",
		(CVTTSS2SI64rr_Int GR64:$dst, VR128:$src), 0, "att">;
		def : InstAlias<"cvttss2si{q}\t{$src, $dst\|$dst, $src}",
		(CVTTSS2SI64rm_Int GR64:$dst, f32mem:$src), 0, "att">;
		def : InstAlias<"cvttsd2si{q}\t{$src, $dst\|$dst, $src}",
		(CVTTSD2SI64rr_Int GR64:$dst, VR128:$src), 0, "att">;
		def : InstAlias<"cvttsd2si{q}\t{$src, $dst\|$dst, $src}",
		(CVTTSD2SI64rm_Int GR64:$dst, f64mem:$src), 0, "att">;

let Predicates = [UseAVX] in {		let Predicates = [UseAVX] in {
defm VCVTSS2SI : sse12_cvt_sint<0x2D, VR128, GR32, i32, v4f32, X86cvts2si,		defm VCVTSS2SI : sse12_cvt_sint<0x2D, VR128, GR32, i32, v4f32, X86cvts2si,
ssmem, sse_load_f32, "cvtss2si",		ssmem, sse_load_f32, "cvtss2si",
WriteCvtSS2I>, XS, VEX, VEX_LIG;		WriteCvtSS2I>, XS, VEX, VEX_LIG;
defm VCVTSS2SI64 : sse12_cvt_sint<0x2D, VR128, GR64, i64, v4f32, X86cvts2si,		defm VCVTSS2SI64 : sse12_cvt_sint<0x2D, VR128, GR64, i64, v4f32, X86cvts2si,
ssmem, sse_load_f32, "cvtss2si",		ssmem, sse_load_f32, "cvtss2si",
WriteCvtSS2I>, XS, VEX, VEX_W, VEX_LIG;		WriteCvtSS2I>, XS, VEX, VEX_W, VEX_LIG;
[... 53 lines not shown ...]
def : InstAlias<"cvtsd2si{q}\t{$src, $dst\|$dst, $src}",		def : InstAlias<"cvtsd2si{q}\t{$src, $dst\|$dst, $src}",
(CVTSD2SI64rr_Int GR64:$dst, VR128:$src), 0, "att">;		(CVTSD2SI64rr_Int GR64:$dst, VR128:$src), 0, "att">;
def : InstAlias<"cvtsd2si{q}\t{$src, $dst\|$dst, $src}",		def : InstAlias<"cvtsd2si{q}\t{$src, $dst\|$dst, $src}",
(CVTSD2SI64rm_Int GR64:$dst, sdmem:$src), 0, "att">;		(CVTSD2SI64rm_Int GR64:$dst, sdmem:$src), 0, "att">;

/// SSE 2 Only		/// SSE 2 Only

// Convert scalar double to scalar single		// Convert scalar double to scalar single
let hasSideEffects = 0, Predicates = [UseAVX] in {		let isCodeGenOnly = 1, hasSideEffects = 0, Predicates = [UseAVX] in {
def VCVTSD2SSrr : VSDI<0x5A, MRMSrcReg, (outs FR32:$dst),		def VCVTSD2SSrr : VSDI<0x5A, MRMSrcReg, (outs FR32:$dst),
(ins FR32:$src1, FR64:$src2),		(ins FR32:$src1, FR64:$src2),
"cvtsd2ss\t{$src2, $src1, $dst\|$dst, $src1, $src2}", []>,		"cvtsd2ss\t{$src2, $src1, $dst\|$dst, $src1, $src2}", []>,
VEX_4V, VEX_LIG, VEX_WIG,		VEX_4V, VEX_LIG, VEX_WIG,
Sched<[WriteCvtSD2SS]>;		Sched<[WriteCvtSD2SS]>;
let mayLoad = 1 in		let mayLoad = 1 in
def VCVTSD2SSrm : I<0x5A, MRMSrcMem, (outs FR32:$dst),		def VCVTSD2SSrm : I<0x5A, MRMSrcMem, (outs FR32:$dst),
(ins FR32:$src1, f64mem:$src2),		(ins FR32:$src1, f64mem:$src2),
"vcvtsd2ss\t{$src2, $src1, $dst\|$dst, $src1, $src2}", []>,		"vcvtsd2ss\t{$src2, $src1, $dst\|$dst, $src1, $src2}", []>,
XD, VEX_4V, VEX_LIG, VEX_WIG,		XD, VEX_4V, VEX_LIG, VEX_WIG,
Sched<[WriteCvtSD2SS.Folded, WriteCvtSD2SS.ReadAfterFold]>;		Sched<[WriteCvtSD2SS.Folded, WriteCvtSD2SS.ReadAfterFold]>;
}		}

def : Pat<(f32 (fpround FR64:$src)),		def : Pat<(f32 (fpround FR64:$src)),
(VCVTSD2SSrr (f32 (IMPLICIT_DEF)), FR64:$src)>,		(VCVTSD2SSrr (f32 (IMPLICIT_DEF)), FR64:$src)>,
Requires<[UseAVX]>;		Requires<[UseAVX]>;

		let isCodeGenOnly = 1 in {
def CVTSD2SSrr : SDI<0x5A, MRMSrcReg, (outs FR32:$dst), (ins FR64:$src),		def CVTSD2SSrr : SDI<0x5A, MRMSrcReg, (outs FR32:$dst), (ins FR64:$src),
"cvtsd2ss\t{$src, $dst\|$dst, $src}",		"cvtsd2ss\t{$src, $dst\|$dst, $src}",
[(set FR32:$dst, (fpround FR64:$src))]>,		[(set FR32:$dst, (fpround FR64:$src))]>,
Sched<[WriteCvtSD2SS]>;		Sched<[WriteCvtSD2SS]>;
def CVTSD2SSrm : I<0x5A, MRMSrcMem, (outs FR32:$dst), (ins f64mem:$src),		def CVTSD2SSrm : I<0x5A, MRMSrcMem, (outs FR32:$dst), (ins f64mem:$src),
"cvtsd2ss\t{$src, $dst\|$dst, $src}",		"cvtsd2ss\t{$src, $dst\|$dst, $src}",
[(set FR32:$dst, (fpround (loadf64 addr:$src)))]>,		[(set FR32:$dst, (fpround (loadf64 addr:$src)))]>,
XD, Requires<[UseSSE2, OptForSize]>,		XD, Requires<[UseSSE2, OptForSize]>,
Sched<[WriteCvtSD2SS.Folded]>;		Sched<[WriteCvtSD2SS.Folded]>;
		}

let isCodeGenOnly = 1 in {
def VCVTSD2SSrr_Int: I<0x5A, MRMSrcReg,		def VCVTSD2SSrr_Int: I<0x5A, MRMSrcReg,
(outs VR128:$dst), (ins VR128:$src1, VR128:$src2),		(outs VR128:$dst), (ins VR128:$src1, VR128:$src2),
"vcvtsd2ss\t{$src2, $src1, $dst\|$dst, $src1, $src2}",		"vcvtsd2ss\t{$src2, $src1, $dst\|$dst, $src1, $src2}",
[(set VR128:$dst,		[(set VR128:$dst,
(v4f32 (X86frounds VR128:$src1, (v2f64 VR128:$src2))))]>,		(v4f32 (X86frounds VR128:$src1, (v2f64 VR128:$src2))))]>,
XD, VEX_4V, VEX_LIG, VEX_WIG, Requires<[UseAVX]>,		XD, VEX_4V, VEX_LIG, VEX_WIG, Requires<[UseAVX]>,
Sched<[WriteCvtSD2SS]>;		Sched<[WriteCvtSD2SS]>;
def VCVTSD2SSrm_Int: I<0x5A, MRMSrcMem,		def VCVTSD2SSrm_Int: I<0x5A, MRMSrcMem,
[... 13 lines not shown ...]
def CVTSD2SSrm_Int: I<0x5A, MRMSrcMem,		def CVTSD2SSrm_Int: I<0x5A, MRMSrcMem,
(outs VR128:$dst), (ins VR128:$src1, sdmem:$src2),		(outs VR128:$dst), (ins VR128:$src1, sdmem:$src2),
"cvtsd2ss\t{$src2, $dst\|$dst, $src2}",		"cvtsd2ss\t{$src2, $dst\|$dst, $src2}",
[(set VR128:$dst,		[(set VR128:$dst,
(v4f32 (X86frounds VR128:$src1,sse_load_f64:$src2)))]>,		(v4f32 (X86frounds VR128:$src1,sse_load_f64:$src2)))]>,
XD, Requires<[UseSSE2]>,		XD, Requires<[UseSSE2]>,
Sched<[WriteCvtSD2SS.Folded, WriteCvtSD2SS.ReadAfterFold]>;		Sched<[WriteCvtSD2SS.Folded, WriteCvtSD2SS.ReadAfterFold]>;
}		}
} // isCodeGenOnly = 1

// Convert scalar single to scalar double		// Convert scalar single to scalar double
// SSE2 instructions with XS prefix		// SSE2 instructions with XS prefix
let hasSideEffects = 0 in {		let isCodeGenOnly = 1, hasSideEffects = 0 in {
def VCVTSS2SDrr : I<0x5A, MRMSrcReg, (outs FR64:$dst),		def VCVTSS2SDrr : I<0x5A, MRMSrcReg, (outs FR64:$dst),
(ins FR64:$src1, FR32:$src2),		(ins FR64:$src1, FR32:$src2),
"vcvtss2sd\t{$src2, $src1, $dst\|$dst, $src1, $src2}", []>,		"vcvtss2sd\t{$src2, $src1, $dst\|$dst, $src1, $src2}", []>,
XS, VEX_4V, VEX_LIG, VEX_WIG,		XS, VEX_4V, VEX_LIG, VEX_WIG,
Sched<[WriteCvtSS2SD]>, Requires<[UseAVX]>;		Sched<[WriteCvtSS2SD]>, Requires<[UseAVX]>;
let mayLoad = 1 in		let mayLoad = 1 in
def VCVTSS2SDrm : I<0x5A, MRMSrcMem, (outs FR64:$dst),		def VCVTSS2SDrm : I<0x5A, MRMSrcMem, (outs FR64:$dst),
(ins FR64:$src1, f32mem:$src2),		(ins FR64:$src1, f32mem:$src2),
"vcvtss2sd\t{$src2, $src1, $dst\|$dst, $src1, $src2}", []>,		"vcvtss2sd\t{$src2, $src1, $dst\|$dst, $src1, $src2}", []>,
XS, VEX_4V, VEX_LIG, VEX_WIG,		XS, VEX_4V, VEX_LIG, VEX_WIG,
Sched<[WriteCvtSS2SD.Folded, WriteCvtSS2SD.ReadAfterFold]>,		Sched<[WriteCvtSS2SD.Folded, WriteCvtSS2SD.ReadAfterFold]>,
Requires<[UseAVX, OptForSize]>;		Requires<[UseAVX, OptForSize]>;
}		} // isCodeGenOnly = 1, hasSideEffects = 0

def : Pat<(f64 (fpextend FR32:$src)),		def : Pat<(f64 (fpextend FR32:$src)),
(VCVTSS2SDrr (f64 (IMPLICIT_DEF)), FR32:$src)>, Requires<[UseAVX]>;		(VCVTSS2SDrr (f64 (IMPLICIT_DEF)), FR32:$src)>, Requires<[UseAVX]>;
def : Pat<(fpextend (loadf32 addr:$src)),		def : Pat<(fpextend (loadf32 addr:$src)),
(VCVTSS2SDrm (f64 (IMPLICIT_DEF)), addr:$src)>, Requires<[UseAVX, OptForSize]>;		(VCVTSS2SDrm (f64 (IMPLICIT_DEF)), addr:$src)>, Requires<[UseAVX, OptForSize]>;

def : Pat<(extloadf32 addr:$src),		def : Pat<(extloadf32 addr:$src),
(VCVTSS2SDrm (f64 (IMPLICIT_DEF)), addr:$src)>,		(VCVTSS2SDrm (f64 (IMPLICIT_DEF)), addr:$src)>,
Requires<[UseAVX, OptForSize]>;		Requires<[UseAVX, OptForSize]>;
def : Pat<(extloadf32 addr:$src),		def : Pat<(extloadf32 addr:$src),
(VCVTSS2SDrr (f64 (IMPLICIT_DEF)), (VMOVSSrm addr:$src))>,		(VCVTSS2SDrr (f64 (IMPLICIT_DEF)), (VMOVSSrm addr:$src))>,
Requires<[UseAVX, OptForSpeed]>;		Requires<[UseAVX, OptForSpeed]>;

		let isCodeGenOnly = 1 in {
def CVTSS2SDrr : I<0x5A, MRMSrcReg, (outs FR64:$dst), (ins FR32:$src),		def CVTSS2SDrr : I<0x5A, MRMSrcReg, (outs FR64:$dst), (ins FR32:$src),
"cvtss2sd\t{$src, $dst\|$dst, $src}",		"cvtss2sd\t{$src, $dst\|$dst, $src}",
[(set FR64:$dst, (fpextend FR32:$src))]>,		[(set FR64:$dst, (fpextend FR32:$src))]>,
XS, Requires<[UseSSE2]>, Sched<[WriteCvtSS2SD]>;		XS, Requires<[UseSSE2]>, Sched<[WriteCvtSS2SD]>;
def CVTSS2SDrm : I<0x5A, MRMSrcMem, (outs FR64:$dst), (ins f32mem:$src),		def CVTSS2SDrm : I<0x5A, MRMSrcMem, (outs FR64:$dst), (ins f32mem:$src),
"cvtss2sd\t{$src, $dst\|$dst, $src}",		"cvtss2sd\t{$src, $dst\|$dst, $src}",
[(set FR64:$dst, (extloadf32 addr:$src))]>,		[(set FR64:$dst, (extloadf32 addr:$src))]>,
XS, Requires<[UseSSE2, OptForSize]>,		XS, Requires<[UseSSE2, OptForSize]>,
Sched<[WriteCvtSS2SD.Folded]>;		Sched<[WriteCvtSS2SD.Folded]>;
		} // isCodeGenOnly = 1

// extload f32 -> f64. This matches load+fpextend because we have a hack in		// extload f32 -> f64. This matches load+fpextend because we have a hack in
// the isel (PreprocessForFPConvert) that can introduce loads after dag		// the isel (PreprocessForFPConvert) that can introduce loads after dag
// combine.		// combine.
// Since these loads aren't folded into the fpextend, we have to match it		// Since these loads aren't folded into the fpextend, we have to match it
// explicitly here.		// explicitly here.
def : Pat<(fpextend (loadf32 addr:$src)),		def : Pat<(fpextend (loadf32 addr:$src)),
(CVTSS2SDrm addr:$src)>, Requires<[UseSSE2, OptForSize]>;		(CVTSS2SDrm addr:$src)>, Requires<[UseSSE2, OptForSize]>;
def : Pat<(extloadf32 addr:$src),		def : Pat<(extloadf32 addr:$src),
(CVTSS2SDrr (MOVSSrm addr:$src))>, Requires<[UseSSE2, OptForSpeed]>;		(CVTSS2SDrr (MOVSSrm addr:$src))>, Requires<[UseSSE2, OptForSpeed]>;

let isCodeGenOnly = 1, hasSideEffects = 0 in {		let hasSideEffects = 0 in {
def VCVTSS2SDrr_Int: I<0x5A, MRMSrcReg,		def VCVTSS2SDrr_Int: I<0x5A, MRMSrcReg,
(outs VR128:$dst), (ins VR128:$src1, VR128:$src2),		(outs VR128:$dst), (ins VR128:$src1, VR128:$src2),
"vcvtss2sd\t{$src2, $src1, $dst\|$dst, $src1, $src2}",		"vcvtss2sd\t{$src2, $src1, $dst\|$dst, $src1, $src2}",
[]>, XS, VEX_4V, VEX_LIG, VEX_WIG,		[]>, XS, VEX_4V, VEX_LIG, VEX_WIG,
Requires<[HasAVX]>, Sched<[WriteCvtSS2SD]>;		Requires<[HasAVX]>, Sched<[WriteCvtSS2SD]>;
let mayLoad = 1 in		let mayLoad = 1 in
def VCVTSS2SDrm_Int: I<0x5A, MRMSrcMem,		def VCVTSS2SDrm_Int: I<0x5A, MRMSrcMem,
(outs VR128:$dst), (ins VR128:$src1, ssmem:$src2),		(outs VR128:$dst), (ins VR128:$src1, ssmem:$src2),
"vcvtss2sd\t{$src2, $src1, $dst\|$dst, $src1, $src2}",		"vcvtss2sd\t{$src2, $src1, $dst\|$dst, $src1, $src2}",
[]>, XS, VEX_4V, VEX_LIG, VEX_WIG, Requires<[HasAVX]>,		[]>, XS, VEX_4V, VEX_LIG, VEX_WIG, Requires<[HasAVX]>,
Sched<[WriteCvtSS2SD.Folded, WriteCvtSS2SD.ReadAfterFold]>;		Sched<[WriteCvtSS2SD.Folded, WriteCvtSS2SD.ReadAfterFold]>;
let Constraints = "$src1 = $dst" in { // SSE2 instructions with XS prefix		let Constraints = "$src1 = $dst" in { // SSE2 instructions with XS prefix
def CVTSS2SDrr_Int: I<0x5A, MRMSrcReg,		def CVTSS2SDrr_Int: I<0x5A, MRMSrcReg,
(outs VR128:$dst), (ins VR128:$src1, VR128:$src2),		(outs VR128:$dst), (ins VR128:$src1, VR128:$src2),
"cvtss2sd\t{$src2, $dst\|$dst, $src2}",		"cvtss2sd\t{$src2, $dst\|$dst, $src2}",
[]>, XS, Requires<[UseSSE2]>,		[]>, XS, Requires<[UseSSE2]>,
Sched<[WriteCvtSS2SD]>;		Sched<[WriteCvtSS2SD]>;
let mayLoad = 1 in		let mayLoad = 1 in
def CVTSS2SDrm_Int: I<0x5A, MRMSrcMem,		def CVTSS2SDrm_Int: I<0x5A, MRMSrcMem,
(outs VR128:$dst), (ins VR128:$src1, ssmem:$src2),		(outs VR128:$dst), (ins VR128:$src1, ssmem:$src2),
"cvtss2sd\t{$src2, $dst\|$dst, $src2}",		"cvtss2sd\t{$src2, $dst\|$dst, $src2}",
[]>, XS, Requires<[UseSSE2]>,		[]>, XS, Requires<[UseSSE2]>,
Sched<[WriteCvtSS2SD.Folded, WriteCvtSS2SD.ReadAfterFold]>;		Sched<[WriteCvtSS2SD.Folded, WriteCvtSS2SD.ReadAfterFold]>;
}		}
} // isCodeGenOnly = 1		} // hasSideEffects = 0

// Patterns used for matching (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and		// Patterns used for matching (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and
// (v)cvtss2sd intrinsic sequences from clang which produce unnecessary		// (v)cvtss2sd intrinsic sequences from clang which produce unnecessary
// vmovs{s,d} instructions		// vmovs{s,d} instructions
let Predicates = [UseAVX] in {		let Predicates = [UseAVX] in {
def : Pat<(v4f32 (X86Movss		def : Pat<(v4f32 (X86Movss
(v4f32 VR128:$dst),		(v4f32 VR128:$dst),
(v4f32 (scalar_to_vector		(v4f32 (scalar_to_vector
[... 450 lines not shown ...]
def rr : SIi8<0xC2, MRMSrcReg,
Sched<[sched]>;		Sched<[sched]>;
def rm : SIi8<0xC2, MRMSrcMem,		def rm : SIi8<0xC2, MRMSrcMem,
(outs RC:$dst), (ins RC:$src1, x86memop:$src2, u8imm:$cc), asm,		(outs RC:$dst), (ins RC:$src1, x86memop:$src2, u8imm:$cc), asm,
[(set RC:$dst, (OpNode (VT RC:$src1),		[(set RC:$dst, (OpNode (VT RC:$src1),
(ld_frag addr:$src2), imm:$cc))]>,		(ld_frag addr:$src2), imm:$cc))]>,
Sched<[sched.Folded, sched.ReadAfterFold]>;		Sched<[sched.Folded, sched.ReadAfterFold]>;
}		}

		let isCodeGenOnly = 1 in {
let ExeDomain = SSEPackedSingle in		let ExeDomain = SSEPackedSingle in
defm VCMPSS : sse12_cmp_scalar<FR32, f32mem, X86cmps, f32, loadf32,		defm VCMPSS : sse12_cmp_scalar<FR32, f32mem, X86cmps, f32, loadf32,
"cmpss\t{$cc, $src2, $src1, $dst\|$dst, $src1, $src2, $cc}",		"cmpss\t{$cc, $src2, $src1, $dst\|$dst, $src1, $src2, $cc}",
SchedWriteFCmpSizes.PS.Scl>, XS, VEX_4V, VEX_LIG, VEX_WIG;		SchedWriteFCmpSizes.PS.Scl>, XS, VEX_4V, VEX_LIG, VEX_WIG;
let ExeDomain = SSEPackedDouble in		let ExeDomain = SSEPackedDouble in
defm VCMPSD : sse12_cmp_scalar<FR64, f64mem, X86cmps, f64, loadf64,		defm VCMPSD : sse12_cmp_scalar<FR64, f64mem, X86cmps, f64, loadf64,
"cmpsd\t{$cc, $src2, $src1, $dst\|$dst, $src1, $src2, $cc}",		"cmpsd\t{$cc, $src2, $src1, $dst\|$dst, $src1, $src2, $cc}",
SchedWriteFCmpSizes.PD.Scl>,		SchedWriteFCmpSizes.PD.Scl>,
XD, VEX_4V, VEX_LIG, VEX_WIG;		XD, VEX_4V, VEX_LIG, VEX_WIG;

let Constraints = "$src1 = $dst" in {		let Constraints = "$src1 = $dst" in {
let ExeDomain = SSEPackedSingle in		let ExeDomain = SSEPackedSingle in
defm CMPSS : sse12_cmp_scalar<FR32, f32mem, X86cmps, f32, loadf32,		defm CMPSS : sse12_cmp_scalar<FR32, f32mem, X86cmps, f32, loadf32,
"cmpss\t{$cc, $src2, $dst\|$dst, $src2, $cc}",		"cmpss\t{$cc, $src2, $dst\|$dst, $src2, $cc}",
SchedWriteFCmpSizes.PS.Scl>, XS;		SchedWriteFCmpSizes.PS.Scl>, XS;
let ExeDomain = SSEPackedDouble in		let ExeDomain = SSEPackedDouble in
defm CMPSD : sse12_cmp_scalar<FR64, f64mem, X86cmps, f64, loadf64,		defm CMPSD : sse12_cmp_scalar<FR64, f64mem, X86cmps, f64, loadf64,
"cmpsd\t{$cc, $src2, $dst\|$dst, $src2, $cc}",		"cmpsd\t{$cc, $src2, $dst\|$dst, $src2, $cc}",
SchedWriteFCmpSizes.PD.Scl>, XD;		SchedWriteFCmpSizes.PD.Scl>, XD;
}		}
		}

multiclass sse12_cmp_scalar_int<Operand memop,		multiclass sse12_cmp_scalar_int<Operand memop,
Intrinsic Int, string asm, X86FoldableSchedWrite sched,		Intrinsic Int, string asm, X86FoldableSchedWrite sched,
ComplexPattern mem_cpat> {		ComplexPattern mem_cpat> {
def rr_Int : SIi8<0xC2, MRMSrcReg, (outs VR128:$dst),		def rr_Int : SIi8<0xC2, MRMSrcReg, (outs VR128:$dst),
(ins VR128:$src1, VR128:$src, u8imm:$cc), asm,		(ins VR128:$src1, VR128:$src, u8imm:$cc), asm,
[(set VR128:$dst, (Int VR128:$src1,		[(set VR128:$dst, (Int VR128:$src1,
VR128:$src, imm:$cc))]>,		VR128:$src, imm:$cc))]>,
Sched<[sched]>;		Sched<[sched]>;
let mayLoad = 1 in		let mayLoad = 1 in
def rm_Int : SIi8<0xC2, MRMSrcMem, (outs VR128:$dst),		def rm_Int : SIi8<0xC2, MRMSrcMem, (outs VR128:$dst),
(ins VR128:$src1, memop:$src, u8imm:$cc), asm,		(ins VR128:$src1, memop:$src, u8imm:$cc), asm,
[(set VR128:$dst, (Int VR128:$src1,		[(set VR128:$dst, (Int VR128:$src1,
mem_cpat:$src, imm:$cc))]>,		mem_cpat:$src, imm:$cc))]>,
Sched<[sched.Folded, sched.ReadAfterFold]>;		Sched<[sched.Folded, sched.ReadAfterFold]>;
}		}

let isCodeGenOnly = 1 in {
// Aliases to match intrinsics which expect XMM operand(s).		// Aliases to match intrinsics which expect XMM operand(s).
let ExeDomain = SSEPackedSingle in		let ExeDomain = SSEPackedSingle in
defm VCMPSS : sse12_cmp_scalar_int<ssmem, int_x86_sse_cmp_ss,		defm VCMPSS : sse12_cmp_scalar_int<ssmem, int_x86_sse_cmp_ss,
"cmpss\t{$cc, $src, $src1, $dst\|$dst, $src1, $src, $cc}",		"cmpss\t{$cc, $src, $src1, $dst\|$dst, $src1, $src, $cc}",
SchedWriteFCmpSizes.PS.Scl, sse_load_f32>,		SchedWriteFCmpSizes.PS.Scl, sse_load_f32>,
XS, VEX_4V, VEX_LIG, VEX_WIG;		XS, VEX_4V, VEX_LIG, VEX_WIG;
let ExeDomain = SSEPackedDouble in		let ExeDomain = SSEPackedDouble in
defm VCMPSD : sse12_cmp_scalar_int<sdmem, int_x86_sse2_cmp_sd,		defm VCMPSD : sse12_cmp_scalar_int<sdmem, int_x86_sse2_cmp_sd,
"cmpsd\t{$cc, $src, $src1, $dst\|$dst, $src1, $src, $cc}",		"cmpsd\t{$cc, $src, $src1, $dst\|$dst, $src1, $src, $cc}",
SchedWriteFCmpSizes.PD.Scl, sse_load_f64>,		SchedWriteFCmpSizes.PD.Scl, sse_load_f64>,
XD, VEX_4V, VEX_LIG, VEX_WIG;		XD, VEX_4V, VEX_LIG, VEX_WIG;
let Constraints = "$src1 = $dst" in {		let Constraints = "$src1 = $dst" in {
let ExeDomain = SSEPackedSingle in		let ExeDomain = SSEPackedSingle in
defm CMPSS : sse12_cmp_scalar_int<ssmem, int_x86_sse_cmp_ss,		defm CMPSS : sse12_cmp_scalar_int<ssmem, int_x86_sse_cmp_ss,
"cmpss\t{$cc, $src, $dst\|$dst, $src, $cc}",		"cmpss\t{$cc, $src, $dst\|$dst, $src, $cc}",
SchedWriteFCmpSizes.PS.Scl, sse_load_f32>, XS;		SchedWriteFCmpSizes.PS.Scl, sse_load_f32>, XS;
let ExeDomain = SSEPackedDouble in		let ExeDomain = SSEPackedDouble in
defm CMPSD : sse12_cmp_scalar_int<sdmem, int_x86_sse2_cmp_sd,		defm CMPSD : sse12_cmp_scalar_int<sdmem, int_x86_sse2_cmp_sd,
"cmpsd\t{$cc, $src, $dst\|$dst, $src, $cc}",		"cmpsd\t{$cc, $src, $dst\|$dst, $src, $cc}",
SchedWriteFCmpSizes.PD.Scl, sse_load_f64>, XD;		SchedWriteFCmpSizes.PD.Scl, sse_load_f64>, XD;
}		}
}


// sse12_ord_cmp - Unordered/Ordered scalar fp compare and set EFLAGS		// sse12_ord_cmp - Unordered/Ordered scalar fp compare and set EFLAGS
multiclass sse12_ord_cmp<bits<8> opc, RegisterClass RC, SDNode OpNode,		multiclass sse12_ord_cmp<bits<8> opc, RegisterClass RC, SDNode OpNode,
ValueType vt, X86MemOperand x86memop,		ValueType vt, X86MemOperand x86memop,
PatFrag ld_frag, string OpcodeStr,		PatFrag ld_frag, string OpcodeStr,
X86FoldableSchedWrite sched> {		X86FoldableSchedWrite sched> {
let hasSideEffects = 0 in {		let hasSideEffects = 0 in {

/// sse_fp_unop_s - SSE1 unops in scalar form		/// sse_fp_unop_s - SSE1 unops in scalar form
/// For the non-AVX defs, we need $src1 to be tied to $dst because		/// For the non-AVX defs, we need $src1 to be tied to $dst because
/// the HW instructions are 2 operand / destructive.		/// the HW instructions are 2 operand / destructive.
multiclass sse_fp_unop_s<bits<8> opc, string OpcodeStr, RegisterClass RC,		multiclass sse_fp_unop_s<bits<8> opc, string OpcodeStr, RegisterClass RC,
ValueType ScalarVT, X86MemOperand x86memop,		ValueType ScalarVT, X86MemOperand x86memop,
Operand intmemop, SDNode OpNode, Domain d,		Operand intmemop, SDNode OpNode, Domain d,
X86FoldableSchedWrite sched, Predicate target> {		X86FoldableSchedWrite sched, Predicate target> {
let hasSideEffects = 0 in {		let isCodeGenOnly = 1, hasSideEffects = 0 in {
def r : I<opc, MRMSrcReg, (outs RC:$dst), (ins RC:$src1),		def r : I<opc, MRMSrcReg, (outs RC:$dst), (ins RC:$src1),
!strconcat(OpcodeStr, "\t{$src1, $dst\|$dst, $src1}"),		!strconcat(OpcodeStr, "\t{$src1, $dst\|$dst, $src1}"),
[(set RC:$dst, (OpNode RC:$src1))], d>, Sched<[sched]>,		[(set RC:$dst, (OpNode RC:$src1))], d>, Sched<[sched]>,
Requires<[target]>;		Requires<[target]>;
let mayLoad = 1 in		let mayLoad = 1 in
def m : I<opc, MRMSrcMem, (outs RC:$dst), (ins x86memop:$src1),		def m : I<opc, MRMSrcMem, (outs RC:$dst), (ins x86memop:$src1),
!strconcat(OpcodeStr, "\t{$src1, $dst\|$dst, $src1}"),		!strconcat(OpcodeStr, "\t{$src1, $dst\|$dst, $src1}"),
[(set RC:$dst, (OpNode (load addr:$src1)))], d>,		[(set RC:$dst, (OpNode (load addr:$src1)))], d>,
Sched<[sched.Folded]>,		Sched<[sched.Folded]>,
Requires<[target, OptForSize]>;		Requires<[target, OptForSize]>;
		}

let isCodeGenOnly = 1, Constraints = "$src1 = $dst", ExeDomain = d in {		let hasSideEffects = 0, Constraints = "$src1 = $dst", ExeDomain = d in {
def r_Int : I<opc, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src1, VR128:$src2),		def r_Int : I<opc, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src1, VR128:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst\|$dst, $src2}"), []>,		!strconcat(OpcodeStr, "\t{$src2, $dst\|$dst, $src2}"), []>,
Sched<[sched]>;		Sched<[sched]>;
let mayLoad = 1 in		let mayLoad = 1 in
def m_Int : I<opc, MRMSrcMem, (outs VR128:$dst), (ins VR128:$src1, intmemop:$src2),		def m_Int : I<opc, MRMSrcMem, (outs VR128:$dst), (ins VR128:$src1, intmemop:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst\|$dst, $src2}"), []>,		!strconcat(OpcodeStr, "\t{$src2, $dst\|$dst, $src2}"), []>,
Sched<[sched.Folded, sched.ReadAfterFold]>;		Sched<[sched.Folded, sched.ReadAfterFold]>;
}		}
}

}		}

multiclass sse_fp_unop_s_intr<RegisterClass RC, ValueType vt,		multiclass sse_fp_unop_s_intr<RegisterClass RC, ValueType vt,
ComplexPattern int_cpat, Intrinsic Intr,		ComplexPattern int_cpat, Intrinsic Intr,
Predicate target, string Suffix> {		Predicate target, string Suffix> {
let Predicates = [target] in {		let Predicates = [target] in {
// These are unary operations, but they are modeled as having 2 source operands		// These are unary operations, but they are modeled as having 2 source operands
// 28 lines hidden; enclosing context: def : Pat<(Intr int_cpat:$src2),
(vt (IMPLICIT_DEF)), addr:$src2)>;		(vt (IMPLICIT_DEF)), addr:$src2)>;
}		}
}		}

multiclass avx_fp_unop_s<bits<8> opc, string OpcodeStr, RegisterClass RC,		multiclass avx_fp_unop_s<bits<8> opc, string OpcodeStr, RegisterClass RC,
ValueType ScalarVT, X86MemOperand x86memop,		ValueType ScalarVT, X86MemOperand x86memop,
Operand intmemop, SDNode OpNode, Domain d,		Operand intmemop, SDNode OpNode, Domain d,
X86FoldableSchedWrite sched, Predicate target> {		X86FoldableSchedWrite sched, Predicate target> {
let hasSideEffects = 0 in {		let isCodeGenOnly = 1, hasSideEffects = 0 in {
def r : I<opc, MRMSrcReg, (outs RC:$dst), (ins RC:$src1, RC:$src2),		def r : I<opc, MRMSrcReg, (outs RC:$dst), (ins RC:$src1, RC:$src2),
!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),		!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),
[], d>, Sched<[sched]>;		[], d>, Sched<[sched]>;
let mayLoad = 1 in		let mayLoad = 1 in
def m : I<opc, MRMSrcMem, (outs RC:$dst), (ins RC:$src1, x86memop:$src2),		def m : I<opc, MRMSrcMem, (outs RC:$dst), (ins RC:$src1, x86memop:$src2),
!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),		!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),
[], d>, Sched<[sched.Folded, sched.ReadAfterFold]>;		[], d>, Sched<[sched.Folded, sched.ReadAfterFold]>;
let isCodeGenOnly = 1, ExeDomain = d in {		}
		let hasSideEffects = 0, ExeDomain = d in {
def r_Int : I<opc, MRMSrcReg, (outs VR128:$dst),		def r_Int : I<opc, MRMSrcReg, (outs VR128:$dst),
(ins VR128:$src1, VR128:$src2),		(ins VR128:$src1, VR128:$src2),
!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),		!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),
[]>, Sched<[sched]>;		[]>, Sched<[sched]>;
let mayLoad = 1 in		let mayLoad = 1 in
def m_Int : I<opc, MRMSrcMem, (outs VR128:$dst),		def m_Int : I<opc, MRMSrcMem, (outs VR128:$dst),
(ins VR128:$src1, intmemop:$src2),		(ins VR128:$src1, intmemop:$src2),
!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),		!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),
[]>, Sched<[sched.Folded, sched.ReadAfterFold]>;		[]>, Sched<[sched.Folded, sched.ReadAfterFold]>;
}		}
}

// We don't want to fold scalar loads into these instructions unless		// We don't want to fold scalar loads into these instructions unless
// optimizing for size. This is because the folded instruction will have a		// optimizing for size. This is because the folded instruction will have a
// partial register update, while the unfolded sequence will not, e.g.		// partial register update, while the unfolded sequence will not, e.g.
// vmovss mem, %xmm0		// vmovss mem, %xmm0
// vrcpss %xmm0, %xmm0, %xmm0		// vrcpss %xmm0, %xmm0, %xmm0
// which has a clobber before the rcp, vs.		// which has a clobber before the rcp, vs.
// vrcpss mem, %xmm0, %xmm0		// vrcpss mem, %xmm0, %xmm0

llvm/trunk/lib/Target/X86/X86SchedBroadwell.td


	def BWWriteResGroup59 : SchedWriteRes<[BWPort0,BWPort23]> {			def BWWriteResGroup59 : SchedWriteRes<[BWPort0,BWPort23]> {
	let Latency = 6;			let Latency = 6;
	let NumMicroOps = 2;			let NumMicroOps = 2;
	let ResourceCycles = [1,1];			let ResourceCycles = [1,1];
	}			}
	def: InstRW<[BWWriteResGroup59], (instrs CVTPS2PDrm, VCVTPS2PDrm,			def: InstRW<[BWWriteResGroup59], (instrs CVTPS2PDrm, VCVTPS2PDrm,
	CVTSS2SDrm, VCVTSS2SDrm,			CVTSS2SDrm, VCVTSS2SDrm,
				CVTSS2SDrm_Int, VCVTSS2SDrm_Int,
	VPSLLVQrm,			VPSLLVQrm,
	VPSRLVQrm)>;			VPSRLVQrm)>;

	def BWWriteResGroup60 : SchedWriteRes<[BWPort1,BWPort5]> {			def BWWriteResGroup60 : SchedWriteRes<[BWPort1,BWPort5]> {
	let Latency = 6;			let Latency = 6;
	let NumMicroOps = 2;			let NumMicroOps = 2;
	let ResourceCycles = [1,1];			let ResourceCycles = [1,1];
	}			}

llvm/trunk/lib/Target/X86/X86SchedHaswell.td

// Earlier lines hidden; enclosing context: def: InstRW<[HWWriteResGroup78], (instrs CVTPD2PSrm,
VCVTDQ2PDrm)>;		VCVTDQ2PDrm)>;

def HWWriteResGroup78_1 : SchedWriteRes<[HWPort1,HWPort5,HWPort23]> {		def HWWriteResGroup78_1 : SchedWriteRes<[HWPort1,HWPort5,HWPort23]> {
let Latency = 9;		let Latency = 9;
let NumMicroOps = 3;		let NumMicroOps = 3;
let ResourceCycles = [1,1,1];		let ResourceCycles = [1,1,1];
}		}
def: InstRW<[HWWriteResGroup78_1], (instrs MMX_CVTPI2PDirm,		def: InstRW<[HWWriteResGroup78_1], (instrs MMX_CVTPI2PDirm,
CVTSD2SSrm,		CVTSD2SSrm, CVTSD2SSrm_Int,
VCVTSD2SSrm)>;		VCVTSD2SSrm, VCVTSD2SSrm_Int)>;

def HWWriteResGroup80 : SchedWriteRes<[HWPort5,HWPort23,HWPort015]> {		def HWWriteResGroup80 : SchedWriteRes<[HWPort5,HWPort23,HWPort015]> {
let Latency = 9;		let Latency = 9;
let NumMicroOps = 3;		let NumMicroOps = 3;
let ResourceCycles = [1,1,1];		let ResourceCycles = [1,1,1];
}		}
def: InstRW<[HWWriteResGroup80], (instregex "VPBROADCAST(B\|W)(Y?)rm")>;		def: InstRW<[HWWriteResGroup80], (instregex "VPBROADCAST(B\|W)(Y?)rm")>;


llvm/trunk/lib/Target/X86/X86ScheduleBdVer2.td


	defm : PdWriteResXMMPair<WriteCvtI2SD, [PdFPU1, PdFPSTO], 4, [], 2>;			defm : PdWriteResXMMPair<WriteCvtI2SD, [PdFPU1, PdFPSTO], 4, [], 2>;
	// FIXME: .Folded version is one NumMicroOp less..			// FIXME: .Folded version is one NumMicroOp less..

	def PdWriteCVTSI642SDrr_CVTSI642SSrr_CVTSI2SDr_CVTSI2SSrr : SchedWriteRes<[PdFPU1, PdFPSTO]> {			def PdWriteCVTSI642SDrr_CVTSI642SSrr_CVTSI2SDr_CVTSI2SSrr : SchedWriteRes<[PdFPU1, PdFPSTO]> {
	let Latency = 13;			let Latency = 13;
	let NumMicroOps = 2;			let NumMicroOps = 2;
	}			}
	def : InstRW<[PdWriteCVTSI642SDrr_CVTSI642SSrr_CVTSI2SDr_CVTSI2SSrr], (instrs CVTSI642SDrr, CVTSI642SSrr, CVTSI2SDrr, CVTSI2SSrr)>;			def : InstRW<[PdWriteCVTSI642SDrr_CVTSI642SSrr_CVTSI2SDr_CVTSI2SSrr], (instrs CVTSI642SDrr, CVTSI642SSrr, CVTSI2SDrr, CVTSI2SSrr,
				CVTSI642SDrr_Int, CVTSI642SSrr_Int, CVTSI2SDrr_Int, CVTSI2SSrr_Int)>;

	defm : PdWriteResXMMPair<WriteCvtI2PD, [PdFPU1, PdFPSTO], 8, [], 2>;			defm : PdWriteResXMMPair<WriteCvtI2PD, [PdFPU1, PdFPSTO], 8, [], 2>;
	defm : PdWriteResYMMPair<WriteCvtI2PDY, [PdFPU1, PdFPSTO], 8, [2, 1], 4, 1>;			defm : PdWriteResYMMPair<WriteCvtI2PDY, [PdFPU1, PdFPSTO], 8, [2, 1], 4, 1>;
	defm : X86WriteResPairUnsupported<WriteCvtI2PDZ>;			defm : X86WriteResPairUnsupported<WriteCvtI2PDZ>;

	defm : PdWriteResXMMPair<WriteCvtSS2SD, [PdFPU1, PdFPSTO], 4>;			defm : PdWriteResXMMPair<WriteCvtSS2SD, [PdFPU1, PdFPSTO], 4>;

	defm : PdWriteResXMMPair<WriteCvtPS2PD, [PdFPU1, PdFPSTO], 8, [], 2>;			defm : PdWriteResXMMPair<WriteCvtPS2PD, [PdFPU1, PdFPSTO], 8, [], 2>;

llvm/trunk/test/tools/llvm-mca/X86/BdVer2/int-to-fpu-forwarding-2.s

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:			# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:
	# CHECK-NEXT: - - - - - - - - - - - - - 1.00 - 1.00 - - - - - - - vcvtsi2sdl %ecx, %xmm0, %xmm0			# CHECK-NEXT: - - - - - - - - - - - - - 1.00 - 1.00 - - - - - - - vcvtsi2sdl %ecx, %xmm0, %xmm0

	# CHECK: [2] Code Region			# CHECK: [2] Code Region

	# CHECK: Iterations: 500			# CHECK: Iterations: 500
	# CHECK-NEXT: Instructions: 500			# CHECK-NEXT: Instructions: 500
	# CHECK-NEXT: Total Cycles: 515			# CHECK-NEXT: Total Cycles: 6503
	# CHECK-NEXT: Total uOps: 1000			# CHECK-NEXT: Total uOps: 1000

	# CHECK: Dispatch Width: 4			# CHECK: Dispatch Width: 4
	# CHECK-NEXT: uOps Per Cycle: 1.94			# CHECK-NEXT: uOps Per Cycle: 0.15
	# CHECK-NEXT: IPC: 0.97			# CHECK-NEXT: IPC: 0.08
	# CHECK-NEXT: Block RThroughput: 1.0			# CHECK-NEXT: Block RThroughput: 1.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:			# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:
	# CHECK-NEXT: - - - - - - - - - - - - - 1.00 - 1.00 - - - - - - - cvtsi2ssl %ecx, %xmm0			# CHECK-NEXT: - - - - - - - - - - - - - 1.00 - 1.00 - - - - - - - cvtsi2ssl %ecx, %xmm0

	# CHECK: [3] Code Region			# CHECK: [3] Code Region

	# CHECK: Iterations: 500			# CHECK: Iterations: 500
	# CHECK-NEXT: Instructions: 500			# CHECK-NEXT: Instructions: 500
	# CHECK-NEXT: Total Cycles: 515			# CHECK-NEXT: Total Cycles: 6503
	# CHECK-NEXT: Total uOps: 1000			# CHECK-NEXT: Total uOps: 1000

	# CHECK: Dispatch Width: 4			# CHECK: Dispatch Width: 4
	# CHECK-NEXT: uOps Per Cycle: 1.94			# CHECK-NEXT: uOps Per Cycle: 0.15
	# CHECK-NEXT: IPC: 0.97			# CHECK-NEXT: IPC: 0.08
	# CHECK-NEXT: Block RThroughput: 1.0			# CHECK-NEXT: Block RThroughput: 1.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore

llvm/trunk/test/tools/llvm-mca/X86/BtVer2/int-to-fpu-forwarding-2.s

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
	# CHECK-NEXT: - - - - - - 1.00 - - - 1.00 - - - vcvtsi2sdl %ecx, %xmm0, %xmm0			# CHECK-NEXT: - - - - - - 1.00 - - - 1.00 - - - vcvtsi2sdl %ecx, %xmm0, %xmm0

	# CHECK: [2] Code Region			# CHECK: [2] Code Region

	# CHECK: Iterations: 500			# CHECK: Iterations: 500
	# CHECK-NEXT: Instructions: 500			# CHECK-NEXT: Instructions: 500
	# CHECK-NEXT: Total Cycles: 506			# CHECK-NEXT: Total Cycles: 2003
	# CHECK-NEXT: Total uOps: 1000			# CHECK-NEXT: Total uOps: 1000

	# CHECK: Dispatch Width: 2			# CHECK: Dispatch Width: 2
	# CHECK-NEXT: uOps Per Cycle: 1.98			# CHECK-NEXT: uOps Per Cycle: 0.50
	# CHECK-NEXT: IPC: 0.99			# CHECK-NEXT: IPC: 0.25
	# CHECK-NEXT: Block RThroughput: 1.0			# CHECK-NEXT: Block RThroughput: 1.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
	# CHECK-NEXT: - - - - - - 1.00 - - - 1.00 - - - cvtsi2ssl %ecx, %xmm0			# CHECK-NEXT: - - - - - - 1.00 - - - 1.00 - - - cvtsi2ssl %ecx, %xmm0

	# CHECK: [3] Code Region			# CHECK: [3] Code Region

	# CHECK: Iterations: 500			# CHECK: Iterations: 500
	# CHECK-NEXT: Instructions: 500			# CHECK-NEXT: Instructions: 500
	# CHECK-NEXT: Total Cycles: 506			# CHECK-NEXT: Total Cycles: 2003
	# CHECK-NEXT: Total uOps: 1000			# CHECK-NEXT: Total uOps: 1000

	# CHECK: Dispatch Width: 2			# CHECK: Dispatch Width: 2
	# CHECK-NEXT: uOps Per Cycle: 1.98			# CHECK-NEXT: uOps Per Cycle: 0.50
	# CHECK-NEXT: IPC: 0.99			# CHECK-NEXT: IPC: 0.25
	# CHECK-NEXT: Block RThroughput: 1.0			# CHECK-NEXT: Block RThroughput: 1.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
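
The updated CHECK lines in the two int-to-fpu-forwarding-2.s tests can be sanity-checked with simple arithmetic. The sketch below is not part of the patch. It is a small Python check, with hypothetical names (`RUNS`, `summarize`), that recomputes "uOps Per Cycle" and "IPC" from the "Instructions", "Total uOps", and "Total Cycles" values shown in the hunks above, and also reports cycles per iteration. Rounding to two decimals is an assumption made only to compare against the printed values.

```python
# Figures copied from the CHECK lines in the hunks above (old column -> new column).
# RUNS and summarize are illustrative names, not part of LLVM or llvm-mca.
INSTRUCTIONS = 500   # "Instructions: 500"
TOTAL_UOPS = 1000    # "Total uOps: 1000"
ITERATIONS = 500     # "Iterations: 500"

RUNS = {
    "BdVer2": {"old": 515, "new": 6503},   # "Total Cycles" before and after
    "BtVer2": {"old": 506, "new": 2003},
}


def summarize(total_cycles):
    """Recompute the derived summary values from a Total Cycles figure."""
    return {
        "uops_per_cycle": round(TOTAL_UOPS / total_cycles, 2),
        "ipc": round(INSTRUCTIONS / total_cycles, 2),
        "cycles_per_iteration": round(total_cycles / ITERATIONS, 2),
    }


if __name__ == "__main__":
    for cpu, cycles in RUNS.items():
        for side in ("old", "new"):
            print(cpu, side, cycles[side], summarize(cycles[side]))
```

The recomputed ratios agree with the CHECK lines on both sides of the diff. On BdVer2 the new figure works out to roughly 13 cycles per iteration, which lines up with `Latency = 13` in the X86ScheduleBdVer2.td hunk. That is consistent with each conversion now waiting on the previous one's result, though the diff itself does not state the cause.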