This is an archive of the discontinued LLVM Phabricator instance.

Changes in conversion cost model for X86 target
AbandonedPublic

Authored by delena on Dec 16 2015, 11:59 PM.

Details

Reviewers

nadav
andreadb
congh
dorit
hfinkel

Summary

The current cost calculation for conversion instructions has several problems:

The provided numbers are inaccurate.
Huge numbers are given for vector splits.

I changed the approach to vector cost calculation:

If the original types are simple, check them first.
If the original vector has to be split, take the legal types and multiply by the split factor.
The split factor is the maximum of the source and destination split factors.
Do not add a v8i32 -> v4i64 cost entry. The calculated cost should be 2 * (v4i32 -> v4i64).

I also checked all the SSE numbers and replaced them with the actual number of instructions.
I understand that instruction latency differs between SSE2 and AVX, but
the purpose of this cost model is to let the vectorizer choose the right VF (comparing VFx to VFy for the same target).
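
As a rough illustration of the approach described in this summary, the sketch below looks up the original types first and otherwise scales the cost of one legal-sized part by the larger split factor. It is not the code from the patch: `VecTy`, `CostTable`, `splitFactor`, `conversionCost`, and the register-width parameter are hypothetical names introduced for the example, and a return value of -1 simply stands for "fall back to the generic cost".

```cpp
#include <algorithm>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical stand-in for a simple vector type such as v8i32.
struct VecTy {
  int NumElts; // number of elements
  int EltBits; // width of one element in bits
  bool operator<(const VecTy &O) const {
    return std::tie(NumElts, EltBits) < std::tie(O.NumElts, O.EltBits);
  }
};

// Hypothetical cost table: (source type, destination type) -> instruction count.
using CostTable = std::map<std::pair<VecTy, VecTy>, int>;

// Number of RegBits-wide registers the vector occupies (at least 1).
int splitFactor(VecTy T, int RegBits) {
  return std::max(1, T.NumElts * T.EltBits / RegBits);
}

// Returns the estimated cost, or -1 if no table entry applies.
int conversionCost(const CostTable &Tbl, VecTy Src, VecTy Dst, int RegBits) {
  // 1. Check the original types first.
  auto It = Tbl.find(std::make_pair(Src, Dst));
  if (It != Tbl.end())
    return It->second;

  // 2. Otherwise split both sides by the larger split factor and
  //    multiply the cost of one part by that factor.
  int Factor = std::max(splitFactor(Src, RegBits), splitFactor(Dst, RegBits));
  VecTy SrcPart = {Src.NumElts / Factor, Src.EltBits};
  VecTy DstPart = {Dst.NumElts / Factor, Dst.EltBits};
  It = Tbl.find(std::make_pair(SrcPart, DstPart));
  if (It != Tbl.end())
    return Factor * It->second;
  return -1;
}
```

With a single hypothetical entry of cost 1 for v8i32 -> v8i64 and 512-bit registers, `conversionCost` for v16i32 -> v16i64 returns 2: the destination needs two registers, so the entry is counted twice.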

Diff Detail

Event Timeline

delena updated this revision to Diff 43102. Dec 16 2015, 11:59 PM

delena retitled this revision from to Changes in conversion cost model for X86 target.

delena updated this object.

delena added reviewers: congh, hfinkel, andreadb, nadav.

delena set the repository for this revision to rL LLVM.

delena added a subscriber: llvm-commits.

This patch overall looks very nice! Please check my inline comments.

../lib/Target/X86/X86TargetTransformInfo.cpp
837	Those two statements can be merged into one: std::tie(SrcSplitFactor, SrcVT) = LTSrc;
851	Use a lambda to avoid code duplication? Also should the last parameter be Src?
861	Split SrcVT?
864	Instead of further splitting SrcVT/DstVT, should we do the opposite? For example, given SrcVT = v16i16 and DstVT = v16i32, on SSE2 we have SrcSplitFactor = 2 and DstSplitFactor = 4. Now suppose we have a cost entry for SrcVT = v8i16 and DstVT = v8i32; should we use this entry to estimate the cost? What you are doing here is making SrcVT = v4i16 and DstVT = v4i32, which gives a less precise cost.
867	Is this necessary? When we reach here, SrcVT/DstVT should be the same as OrgSimpleSrcVT/OrgSimpleDstVT, so I think it is safe to let CheckOriginalTypes keep its true value.
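
The first two suggestions above (unpacking the pair with std::tie, and using a lambda to avoid duplicating the source and destination handling) could look roughly like the sketch below. `SimpleTy`, `Operand`, `Picked`, and `pickTypes` are hypothetical stand-ins, not LLVM types; the pair plays the role of the value returned by getTypeLegalizationCost().

```cpp
#include <tuple>
#include <utility>

// Hypothetical stand-in for llvm::MVT.
enum class SimpleTy { Invalid, v8i8, v8i16, v4i64, v8i64 };

// Hypothetical description of one cast operand.
struct Operand {
  bool IsSimple;               // does the original type map to a simple type?
  SimpleTy OrigVT;             // original type (meaningful only if IsSimple)
  std::pair<int, SimpleTy> LT; // (split factor, legal type)
};

struct Picked {
  int SplitFactor;
  SimpleTy VT;
};

// Returns false when the caller should fall back to the generic cost.
bool pickTypes(const Operand &Src, const Operand &Dst, Picked &SrcOut,
               Picked &DstOut) {
  // One lambda serves both operands, so the logic is written once.
  auto Pick = [](const Operand &Op, Picked &Out) -> bool {
    if (Op.LT.first > 1) {
      // Split: take the factor and the legal type in one statement.
      std::tie(Out.SplitFactor, Out.VT) = Op.LT;
      return true;
    }
    if (!Op.IsSimple)
      return false;
    // Not split: keep the original type rather than the promoted one.
    Out.SplitFactor = 1;
    Out.VT = Op.OrigVT;
    return true;
  };
  return Pick(Src, SrcOut) && Pick(Dst, DstOut);
}
```

Passing Src as the operand of both fallback paths also sidesteps the Dst/Dst mix-up that the second comment asks about, since the lambda never names the other operand.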

nadav added inline comments. Dec 17 2015, 4:22 PM

../lib/Target/X86/X86TargetTransformInfo.cpp
812	Elena, I don't understand the logic below. It looks like you are re-implementing type legalization in the cost model code. Why not just use getTypeLegalizationCost? Can you give an example where the existing logic fails?

getTypeLegalizationCost() reports only the split factor for vectors. It does not report the cost of promotion, sign-extension, zero-extension, or widening.
Now, let's assume that we have a vector split. I take the maximum split factor of the source and destination. (And this is right, agree?)

In the case of promotion or widening, getTypeLegalizationCost() does not help. It just shows the new types after promotion.
For example, for v4i8 sext to v4i32, the type legalization result will be v4i32 sext to v4i32.
Or for v2i8 to v2f32, type legalization will convert the operation to "v2i64 to v4f32".
It is very difficult to describe the right cost in this mode. If you don't have an automatic engine for cost estimation, it is easier to put in the right cost for v2i8 to v2f32.

The only thing that I probably forgot is
v8i16 -> v8f64 on AVX2.
The result will be
v4i32 to v4f64, and I want to see v4i16 to v4f64. This way I'll be able to add the SEXT cost from v4i16 to v4i32.

Why did I start the rewrite?

Because one operation
%a = sext i32 %b to i64
prevents the whole loop from being vectorized with VF 16, just because the cost of "%a = sext <16 x i32> %b to <16 x i64>" is 300 instead of 2.
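
To see why a single inflated entry matters, consider how a vectorizer might compare factors. The sketch below assumes a simplified rule (the vector form wins only if it is cheaper than VF copies of the scalar operation) and a hypothetical scalar cost of 1; the real loop vectorizer weighs the whole loop, so this only illustrates the arithmetic. `vectorFormIsCheaper` is a name made up for the example.

```cpp
// Simplified, hypothetical profitability check for one operation:
// the vector form wins only if it is cheaper than VF scalar copies.
bool vectorFormIsCheaper(int ScalarCost, int VectorCost, int VF) {
  return VectorCost < ScalarCost * VF;
}
```

With a scalar cost of 1 and VF = 16, a vector cost of 300 is rejected (300 is not below 16), while a vector cost of 2 is accepted.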

Elena

Thanks for explaining this Elena. Have you considered handling all of the special cases by adding them to the 'TypeConversionCostTblEntry' table? Also, have you considered improving getTypeLegalizationCost?

I re-checked and fixed many entries in the SSE and AVX tables. I do not plan to rewrite getTypeLegalizationCost right now.

Elena

delena added a reviewer: dorit. Dec 20 2015, 4:56 AM

delena marked 3 inline comments as done. Dec 20 2015, 6:57 AM

delena added inline comments.

../lib/Target/X86/X86TargetTransformInfo.cpp
867	Ok.

I changed the code according to Cong's comments.

Elena, I don't understand your comment. getTypeLegalizationCost imitates the type legalizer by splitting and promoting. It actually uses the type legalizer. What do you mean it does not support zero-extend and sign-extend?

Are you saying that it does not handle dag-combine peepholes that we have? Please explain what problem your patch is solving.

I suspect that you can commit changes to the cost table not as part of this patch. But please do not modify the code that uses getTypeLegalizationCost without explaining what it does.

Hi Nadav,

The getTypeLegalizationCost() returns a pair:
Res.first is a Split-Factor
Res.second is a legal MVT

Let's assume that the original type is v4i8. Res.second will be MVT::v4i32 and Res.first will be 1.
Now v4i8 is promoted to v4i32 and you don't know the real cost of this promotion. getTypeLegalizationCost() can't provide this information,
because it does not know how the promotion is done: by sign-extension or by zero-extension.
SITOFP requires sign-extension, UITOFP requires zero-extension.

The real information that we receive from getTypeLegalizationCost() is the split-factor. I use it in the following way:
Example 1:
SSE2:

Sitofp <4 x i32 > %a to <4 x f64>

The split-factor of destination is 2. How do I calculate the cost?

2 * costof (sitofp <2 x i32 > %a to <2 x f64>) + ExtraSplitCost

The ExtraSplitCost is the cost of splitting vector %a.

Example 2:
AVX2:

Sitofp <8 x i64 > %a to <8 x f64>

The split-factor of source and destination is 2. How do I calculate the cost?

2 * costof (sitofp <4 x i64 > %a to <4 x f64>)
ExtraSplitCost = 0 because the source and the destination both occupy 2 registers, so no additional shuffle is needed.
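
Examples 1 and 2 follow the same formula, which can be written as a small function. This is only a sketch: `splitCost` and its parameters are names chosen here, and the rule that the extra cost applies only when the two split factors differ is an assumption generalized from these two examples.

```cpp
#include <algorithm>

// Cost of a conversion that has to be split across registers.
//   SrcSplit, DstSplit: split factors of the source and destination types
//   PartCost:           table cost of converting one legal-sized part
//   ExtraSplitCost:     cost of splitting the narrower side (for example a shuffle)
int splitCost(int SrcSplit, int DstSplit, int PartCost, int ExtraSplitCost) {
  int Factor = std::max(SrcSplit, DstSplit);
  // Assumption: when both sides already occupy the same number of
  // registers, no extra shuffle is needed.
  int Extra = (SrcSplit == DstSplit) ? 0 : ExtraSplitCost;
  return Factor * PartCost + Extra;
}
```

For Example 1, taking a part cost of 1 and an extra split cost of 1 as assumed inputs, `splitCost(1, 2, 1, 1)` gives 2 * 1 + 1 = 3. For Example 2, `splitCost(2, 2, C, 1)` gives 2 * C for any part cost C.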

Example 3:
SSE2:
Sitofp <4 x i8 > to <4 x f32>
For the source, getTypeLegalizationCost() returns (1, MVT::v4i32)
For the destination, getTypeLegalizationCost() returns (1, MVT::v4f32)

The original version returns
1 * costof(Sitofp <4 x i32 > to <4 x f32>)
I changed it to
1 * costof(Sitofp <4 x i8 > to <4 x f32>)

Now, after my changes, the following 2 operations
Sitofp <4 x i8 > to <4 x f32>
and
Sitofp <4 x i32 > to <4 x f32>
have different costs.
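
The difference in Example 3 comes down to which type is used as the lookup key. A minimal sketch, assuming a string-keyed table and a hypothetical `promote` helper that mimics promotion of narrow 4-element integer vectors to v4i32; the two costs (3 and 1) are the values this patch lists in the SSE2 table for these conversions, and all function names are made up for the example:

```cpp
#include <map>
#include <string>
#include <utility>

// (source type, destination type) -> cost
using Key = std::pair<std::string, std::string>;
using Table = std::map<Key, int>;

// Two sitofp entries, with the costs listed in the patched SSE2 table.
Table sitofpTable() {
  Table T;
  T[Key("v4i8", "v4f32")] = 3;
  T[Key("v4i32", "v4f32")] = 1;
  return T;
}

// Hypothetical model of promotion: narrow 4-element integer vectors
// are promoted to v4i32.
std::string promote(const std::string &Ty) {
  if (Ty == "v4i8" || Ty == "v4i16")
    return "v4i32";
  return Ty;
}

// Returns the table cost, or -1 if there is no entry.
int lookup(const Table &T, const std::string &Src, const std::string &Dst) {
  auto It = T.find(Key(Src, Dst));
  return It == T.end() ? -1 : It->second;
}

// Behaviour described as the original: the key is the promoted type.
int costByPromotedType(const Table &T, const std::string &Src,
                       const std::string &Dst) {
  return lookup(T, promote(Src), Dst);
}

// Proposed behaviour: the key is the original type.
int costByOriginalType(const Table &T, const std::string &Src,
                       const std::string &Dst) {
  return lookup(T, Src, Dst);
}
```

Keyed by the promoted type, v4i8 -> v4f32 is indistinguishable from v4i32 -> v4f32 and costs 1; keyed by the original type, it finds its own entry and costs 3.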

The model I'm proposing gives a more exact cost estimation. I use getTypeLegalizationCost() for the split factor only.
I can't just add the one or two lines that I'm missing, because the current model is wrong.
This is the original wrong code:

> EVT SrcTy = TLI->getValueType(DL, Src);
> EVT DstTy = TLI->getValueType(DL, Dst);
  
> // The function getSimpleVT only handles simple value types.
> if (!SrcTy.isSimple() || !DstTy.isSimple())
>   return BaseT::getCastInstrCost(Opcode, Dst, Src);

Elena

Elena,

We don't let the instruction Sitofp <4 x i32 > %a to <4 x f64> go through the type legalizer.

We handle it as a DAGCombine optimization. DAGCombine optimizations are represented as entries in the table.

-Nadav

The current cost model gives cost 40.
The new model gives 3.

This is the real code (SSE2)
sitofpv4i32v4double: # @sitofpv4i32v4double

.cfi_startproc

BB#0:

cvtdq2pd        %xmm0, %xmm2
pshufd  $78, %xmm0, %xmm0       # xmm0 = xmm0[2,3,0,1]
cvtdq2pd        %xmm0, %xmm1
movaps  %xmm2, %xmm0
retq

Elena

Elena, please fix the cost table and don't change anything else.

The proposed solution was not accepted.

mkuper mentioned this in D22064: [X86] Make some cast costs more precise. Jul 6 2016, 1:06 PM

mkuper mentioned this in rL275106: [X86] Make some cast costs more precise. Jul 11 2016, 2:47 PM

Revision Contents

Path	Size
../lib/Target/X86/X86TargetTransformInfo.cpp	185 lines
../test/Analysis/CostModel/X86/cast.ll	3 lines
../test/Analysis/CostModel/X86/sitofp.ll	250 lines
../test/Analysis/CostModel/X86/uitofp.ll	192 lines
../test/Transforms/LoopVectorize/X86/conversion-cost.ll	2 lines
../test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll	4 lines

Diff 43102

../lib/Target/X86/X86TargetTransformInfo.cpp

[572 lines not shown]
static const TypeConversionCostTblEntry AVX512FConversionTbl[] = {
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },
		{ ISD::SINT_TO_FP, MVT::v8f32, MVT::v8i64, 26 },
		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i64, 26 },

{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 1 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 1 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 1 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 1 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 2 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 2 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 2 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 1 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 1 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 2 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 1 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 5 },
{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i32, 2 },		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i32, 2 },
		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i64, 5 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 5 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 5 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 12 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 12 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 26 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 26 },

{ ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },		{ ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },
{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },		{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },
{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 1 },		{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 1 },
{ ISD::FP_TO_UINT, MVT::v16i32, MVT::v16f32, 1 },		{ ISD::FP_TO_UINT, MVT::v16i32, MVT::v16f32, 1 },
[76 lines not shown]
static const TypeConversionCostTblEntry AVXConversionTbl[] = {
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i1, 7 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i1, 7 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i8, 2 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i8, 2 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i16, 2 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i16, 2 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 6 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 6 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i1, 7 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i1, 7 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 6 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 6 },
		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 6 },
// The generic code to compute the scalar overhead is currently broken.		// The generic code to compute the scalar overhead is currently broken.
// Workaround this limitation by estimating the scalarization overhead		// Workaround this limitation by estimating the scalarization overhead
// here. We have roughly 10 instructions per scalar element.		// here. We have roughly 10 instructions per scalar element.
// Multiply that by the vector width.		// Multiply that by the vector width.
// FIXME: remove that when PR19268 is fixed.		// FIXME: remove that when PR19268 is fixed.
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 2*10 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 10 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 4*10 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 20 },

		{ ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i64, 13 },
		{ ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 13 },
{ ISD::FP_TO_SINT, MVT::v8i8, MVT::v8f32, 7 },		{ ISD::FP_TO_SINT, MVT::v8i8, MVT::v8f32, 7 },
{ ISD::FP_TO_SINT, MVT::v4i8, MVT::v4f32, 1 },		{ ISD::FP_TO_SINT, MVT::v4i8, MVT::v4f32, 1 },
// This node is expanded into scalarized operations but BasicTTI is overly		// This node is expanded into scalarized operations but BasicTTI is overly
// optimistic estimating its cost. It computes 3 per element (one		// optimistic estimating its cost. It computes 3 per element (one
// vector-extract, one scalar conversion and one vector-insert). The		// vector-extract, one scalar conversion and one vector-insert). The
// problem is that the inserts form a read-modify-write chain so latency		// problem is that the inserts form a read-modify-write chain so latency
// should be factored in too. Inflating the cost per element by 1.		// should be factored in too. Inflating the cost per element by 1.
{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 8*4 },		{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 8*4 },
{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 4*4 },		{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 4*4 },

		{ ISD::FP_EXTEND, MVT::v4f64, MVT::v4f32, 1 },
		{ ISD::FP_ROUND, MVT::v4f32, MVT::v4f64, 1 },
};		};

static const TypeConversionCostTblEntry SSE41ConversionTbl[] = {		static const TypeConversionCostTblEntry SSE41ConversionTbl[] = {
{ ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i16, 4 },		{ ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i16, 4 },
{ ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i16, 4 },		{ ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i16, 4 },
{ ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 2 },		{ ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 2 },
{ ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 2 },		{ ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 2 },
{ ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i16, 1 },		{ ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i16, 1 },
[18 lines not shown]
static const TypeConversionCostTblEntry SSE41ConversionTbl[] = {
{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 3 },		{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 3 },
{ ISD::TRUNCATE, MVT::v4i8, MVT::v4i32, 1 },		{ ISD::TRUNCATE, MVT::v4i8, MVT::v4i32, 1 },
{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i16, 3 },		{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i16, 3 },
{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i16, 1 },		{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i16, 1 },
{ ISD::TRUNCATE, MVT::v4i8, MVT::v4i16, 2 },		{ ISD::TRUNCATE, MVT::v4i8, MVT::v4i16, 2 },
};		};

static const TypeConversionCostTblEntry SSE2ConversionTbl[] = {		static const TypeConversionCostTblEntry SSE2ConversionTbl[] = {
// These are somewhat magic numbers justified by looking at the output of		// These numbers reflect the number of generated instructions
// Intel's IACA, running some kernels and making sure when we take		// and do not reflect instruction latency
// legalization into account the throughput will be overestimated.
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 2*10 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 2*10 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v4i32, 4*10 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 15 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v8i16, 8*10 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 2 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v16i8, 16*10 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 2 },
{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i64, 2*10 },		{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i64, 2*10 },
{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v4i32, 4*10 },		{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i32, 1 },
{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v8i16, 8*10 },		{ ISD::SINT_TO_FP, MVT::v2f32, MVT::v2i32, 1 },
{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v16i8, 16*10 },		{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i16, 3 },
		{ ISD::SINT_TO_FP, MVT::v2f32, MVT::v2i16, 3 },
		{ ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i16, 3 },
		{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i8, 3 },
		{ ISD::SINT_TO_FP, MVT::v2f32, MVT::v2i8, 3 },
		{ ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i8, 3 },
		{ ISD::SINT_TO_FP, MVT::v2f32, MVT::v2i64, 7 },
// There are faster sequences for float conversions.		// There are faster sequences for float conversions.
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v2i64, 15 },		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i8, 2 },
		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i16, 2 },
		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i32, 8 },
		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i64, 15 },

		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i8, 2 },
		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i16, 2 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 8 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 8 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v8i16, 15 },		{ ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i64, 15 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v16i8, 8 },		{ ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i32, 1 },
{ ISD::SINT_TO_FP, MVT::v4f32, MVT::v2i64, 15 },
{ ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i32, 15 },
{ ISD::SINT_TO_FP, MVT::v4f32, MVT::v8i16, 15 },
{ ISD::SINT_TO_FP, MVT::v4f32, MVT::v16i8, 8 },

{ ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i16, 6 },		{ ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i16, 6 },
{ ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i16, 8 },		{ ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i16, 8 },
{ ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 3 },		{ ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 3 },
{ ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 4 },		{ ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 4 },
{ ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i16, 1 },		{ ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i16, 1 },
{ ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i16, 2 },		{ ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i16, 2 },
{ ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, 9 },		{ ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, 9 },
{ ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, 12 },		{ ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, 12 },
[14 lines not shown]
static const TypeConversionCostTblEntry SSE2ConversionTbl[] = {
{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 31 },		{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 31 },
{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 4 },		{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 4 },
{ ISD::TRUNCATE, MVT::v4i8, MVT::v4i32, 3 },		{ ISD::TRUNCATE, MVT::v4i8, MVT::v4i32, 3 },
{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i16, 3 },		{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i16, 3 },
{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i16, 2 },		{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i16, 2 },
{ ISD::TRUNCATE, MVT::v4i8, MVT::v4i16, 4 },		{ ISD::TRUNCATE, MVT::v4i8, MVT::v4i16, 4 },
};		};

std::pair<int, MVT> LTSrc = TLI->getTypeLegalizationCost(DL, Src);
std::pair<int, MVT> LTDest = TLI->getTypeLegalizationCost(DL, Dst);

if (ST->hasSSE2() && !ST->hasAVX()) {
if (const auto *Entry = ConvertCostTableLookup(SSE2ConversionTbl, ISD,
LTDest.second, LTSrc.second))
return LTSrc.first * Entry->Cost;
}

EVT SrcTy = TLI->getValueType(DL, Src);		EVT SrcTy = TLI->getValueType(DL, Src);
EVT DstTy = TLI->getValueType(DL, Dst);		EVT DstTy = TLI->getValueType(DL, Dst);

// The function getSimpleVT only handles simple value types.		MVT SrcVT;
		MVT DstVT;
		MVT OrgSimpleSrcVT;
		MVT OrgSimpleDstVT;
		int SplitFactor = 1;
		bool CheckOriginalTypes = false;
		if (!SrcTy.isVector()) {
		// Scalar types
if (!SrcTy.isSimple() || !DstTy.isSimple())		if (!SrcTy.isSimple() || !DstTy.isSimple())
return BaseT::getCastInstrCost(Opcode, Dst, Src);		return BaseT::getCastInstrCost(Opcode, Dst, Src);
		SrcVT = SrcTy.getSimpleVT();
if (ST->hasDQI())		DstVT = DstTy.getSimpleVT();
if (const auto *Entry = ConvertCostTableLookup(AVX512DQConversionTbl, ISD,		} else {
DstTy.getSimpleVT(),		// Vector types
SrcTy.getSimpleVT()))		if (SrcTy.isSimple() && DstTy.isSimple()) {
return Entry->Cost;		// The both are simple, check the original types
		OrgSimpleSrcVT = SrcTy.getSimpleVT();
if (ST->hasAVX512())		OrgSimpleDstVT = DstTy.getSimpleVT();
if (const auto *Entry = ConvertCostTableLookup(AVX512FConversionTbl, ISD,		CheckOriginalTypes = true;
DstTy.getSimpleVT(),
SrcTy.getSimpleVT()))
return Entry->Cost;

if (ST->hasAVX2()) {
if (const auto *Entry = ConvertCostTableLookup(AVX2ConversionTbl, ISD,
DstTy.getSimpleVT(),
SrcTy.getSimpleVT()))
return Entry->Cost;
}		}
		int SrcSplitFactor = 1;
		int DstSplitFactor = 1;
		std::pair<int, MVT> LTSrc = TLI->getTypeLegalizationCost(DL, Src);
		if (LTSrc.first > 1) {
		SrcVT = LTSrc.second;
		SrcSplitFactor = LTSrc.first;
		} else if (SrcTy.isSimple())
		// do not take the promoted type, check the original one
		SrcVT = SrcTy.getSimpleVT();
		else
		return BaseT::getCastInstrCost(Opcode, Dst, Src);

if (ST->hasAVX()) {		std::pair<int, MVT> LTDest = TLI->getTypeLegalizationCost(DL, Dst);
if (const auto *Entry = ConvertCostTableLookup(AVXConversionTbl, ISD,		if (LTDest.first > 1) {
DstTy.getSimpleVT(),		DstVT = LTDest.second;
SrcTy.getSimpleVT()))		DstSplitFactor = LTDest.first;
return Entry->Cost;		} else if (DstTy.isSimple())
		DstVT = DstTy.getSimpleVT();
		else
		return BaseT::getCastInstrCost(Opcode, Dst, Dst);

		if (SrcSplitFactor > 1 || DstSplitFactor > 1) {
		SplitFactor = std::max(SrcSplitFactor, DstSplitFactor);
		if (SrcSplitFactor > DstSplitFactor)
		// Split DstVT
		DstVT = MVT::getVectorVT(DstVT.getScalarType(),
		DstVT.getVectorNumElements() /
		(SrcSplitFactor / DstSplitFactor));
		else if (DstSplitFactor > SrcSplitFactor)
		// Split DstVT
		SrcVT = MVT::getVectorVT(SrcVT.getScalarType(),
		SrcVT.getVectorNumElements() /
		(DstSplitFactor / SrcSplitFactor));
}		}
		else
if (ST->hasSSE41()) {		CheckOriginalTypes = false;
if (const auto *Entry = ConvertCostTableLookup(SSE41ConversionTbl, ISD,
DstTy.getSimpleVT(),
SrcTy.getSimpleVT()))
return Entry->Cost;
}		}

if (ST->hasSSE2()) {		SmallVector<ArrayRef<TypeConversionCostTblEntry>, 8> Tbls;
if (const auto *Entry = ConvertCostTableLookup(SSE2ConversionTbl, ISD,		if (ST->hasDQI())
DstTy.getSimpleVT(),		Tbls.push_back(AVX512DQConversionTbl);
SrcTy.getSimpleVT()))		if (ST->hasAVX512())
		Tbls.push_back(AVX512FConversionTbl);
		if (ST->hasAVX2())
		Tbls.push_back(AVX2ConversionTbl);
		if (ST->hasAVX())
		Tbls.push_back(AVXConversionTbl);
		if (ST->hasSSE41())
		Tbls.push_back(SSE41ConversionTbl);
		if (ST->hasSSE2())
		Tbls.push_back(SSE2ConversionTbl);

		for (ArrayRef<TypeConversionCostTblEntry> Tbl : Tbls) {
		if (CheckOriginalTypes)
		if (const auto *Entry = ConvertCostTableLookup(Tbl, ISD,
		OrgSimpleDstVT,
		OrgSimpleSrcVT))
return Entry->Cost;		return Entry->Cost;
		if (const auto *Entry = ConvertCostTableLookup(Tbl, ISD,
		DstVT,
		SrcVT))
		return Entry->Cost * SplitFactor;
}		}

return BaseT::getCastInstrCost(Opcode, Dst, Src);		return BaseT::getCastInstrCost(Opcode, Dst, Src);
}		}

int X86TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type ValTy, Type CondTy) {		int X86TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type ValTy, Type CondTy) {
// Legalize the type.		// Legalize the type.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);

MVT MTy = LT.second;		MVT MTy = LT.second;
[482 lines not shown]

../test/Analysis/CostModel/X86/cast.ll

[31 lines not shown]
; CHECK-LABEL: for function 'add'

;CHECK: cost of 0 {{.*}} ret		;CHECK: cost of 0 {{.*}} ret
ret i32 undef		ret i32 undef
}		}

define i32 @zext_sext(<8 x i1> %in) {		define i32 @zext_sext(<8 x i1> %in) {
; CHECK-AVX2-LABEL: for function 'zext_sext'		; CHECK-AVX2-LABEL: for function 'zext_sext'
; CHECK-AVX-LABEL: for function 'zext_sext'		; CHECK-AVX-LABEL: for function 'zext_sext'
		; CHECK-AVX512-LABEL: for function 'zext_sext'
;CHECK-AVX2: cost of 3 {{.*}} zext		;CHECK-AVX2: cost of 3 {{.*}} zext
;CHECK-AVX: cost of 4 {{.*}} zext		;CHECK-AVX: cost of 4 {{.*}} zext
%Z = zext <8 x i1> %in to <8 x i32>		%Z = zext <8 x i1> %in to <8 x i32>
;CHECK-AVX2: cost of 3 {{.*}} sext		;CHECK-AVX2: cost of 3 {{.*}} sext
;CHECK-AVX: cost of 7 {{.*}} sext		;CHECK-AVX: cost of 7 {{.*}} sext
%S = sext <8 x i1> %in to <8 x i32>		%S = sext <8 x i1> %in to <8 x i32>

;CHECK-AVX2: cost of 1 {{.*}} zext		;CHECK-AVX2: cost of 1 {{.*}} zext
[61 lines not shown]
; CHECK-AVX512-LABEL: for function 'zext_sext'
;CHECK-AVX2: cost of 2 {{.*}} trunc		;CHECK-AVX2: cost of 2 {{.*}} trunc
;CHECK-AVX: cost of 4 {{.*}} trunc		;CHECK-AVX: cost of 4 {{.*}} trunc
%F2 = trunc <8 x i32> undef to <8 x i8>		%F2 = trunc <8 x i32> undef to <8 x i8>
;CHECK-AVX2: cost of 2 {{.*}} trunc		;CHECK-AVX2: cost of 2 {{.*}} trunc
;CHECK-AVX: cost of 4 {{.*}} trunc		;CHECK-AVX: cost of 4 {{.*}} trunc
%F3 = trunc <4 x i64> undef to <4 x i8>		%F3 = trunc <4 x i64> undef to <4 x i8>

;CHECK-AVX2: cost of 4 {{.*}} trunc		;CHECK-AVX2: cost of 4 {{.*}} trunc
;CHECK-AVX: cost of 9 {{.*}} trunc		;CHECK-AVX: cost of 8 {{.*}} trunc
;CHECK_AVX512: cost of 1 {{.*}} G = trunc		;CHECK_AVX512: cost of 1 {{.*}} G = trunc
%G = trunc <8 x i64> undef to <8 x i32>		%G = trunc <8 x i64> undef to <8 x i32>

;CHECK-AVX512: cost of 1 {{.*}} %G1 = trunc		;CHECK-AVX512: cost of 1 {{.*}} %G1 = trunc
%G1 = trunc <16 x i32> undef to <16 x i16>		%G1 = trunc <16 x i32> undef to <16 x i16>

;CHECK-AVX512: cost of 1 {{.*}} %G2 = trunc		;CHECK-AVX512: cost of 1 {{.*}} %G2 = trunc
%G2 = trunc <16 x i32> undef to <16 x i8>		%G2 = trunc <16 x i32> undef to <16 x i8>
[133 lines not shown]

../test/Analysis/CostModel/X86/sitofp.ll

; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+sse2 -cost-model -analyze < %s | FileCheck --check-prefix=SSE --check-prefix=SSE2 %s		; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+sse2 -cost-model -analyze < %s | FileCheck --check-prefix=SSE --check-prefix=SSE2 %s
; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx -cost-model -analyze < %s | FileCheck --check-prefix=AVX --check-prefix=AVX1 %s		; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx -cost-model -analyze < %s | FileCheck --check-prefix=AVX --check-prefix=AVX1 %s
; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx2 -cost-model -analyze < %s | FileCheck --check-prefix=AVX --check-prefix=AVX2 %s		; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx2 -cost-model -analyze < %s | FileCheck --check-prefix=AVX --check-prefix=AVX2 %s
; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx512f -cost-model -analyze < %s | FileCheck --check-prefix=AVX512F %s		; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx512f -cost-model -analyze < %s | FileCheck --check-prefix=AVX512F %s

define <2 x double> @sitofpv2i8v2double(<2 x i8> %a) {		define <2 x double> @sitofpv2i8v2double(<2 x i8> %a) {
; SSE2-LABEL: sitofpv2i8v2double		; SSE2-LABEL: sitofpv2i8v2double
; SSE2: cost of 20 {{.*}} sitofp		; SSE2: cost of 3 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv2i8v2double		; AVX1-LABEL: sitofpv2i8v2double
; AVX1: cost of 4 {{.*}} sitofp		; AVX1: cost of 3 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv2i8v2double		; AVX2-LABEL: sitofpv2i8v2double
; AVX2: cost of 4 {{.*}} sitofp		; AVX2: cost of 3 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv2i8v2double		; AVX512F-LABEL: sitofpv2i8v2double
; AVX512F: cost of 4 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%1 = sitofp <2 x i8> %a to <2 x double>		%1 = sitofp <2 x i8> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @sitofpv4i8v4double(<4 x i8> %a) {		define <4 x double> @sitofpv4i8v4double(<4 x i8> %a) {
; SSE2-LABEL: sitofpv4i8v4double		; SSE2-LABEL: sitofpv4i8v4double
; SSE2: cost of 40 {{.*}} sitofp		; SSE2: cost of 6 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv4i8v4double		; AVX1-LABEL: sitofpv4i8v4double
; AVX1: cost of 3 {{.*}} sitofp		; AVX1: cost of 3 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv4i8v4double		; AVX2-LABEL: sitofpv4i8v4double
; AVX2: cost of 3 {{.*}} sitofp		; AVX2: cost of 3 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv4i8v4double		; AVX512F-LABEL: sitofpv4i8v4double
; AVX512F: cost of 3 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%1 = sitofp <4 x i8> %a to <4 x double>		%1 = sitofp <4 x i8> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @sitofpv8i8v8double(<8 x i8> %a) {		define <8 x double> @sitofpv8i8v8double(<8 x i8> %a) {
; SSE2-LABEL: sitofpv8i8v8double		; SSE2-LABEL: sitofpv8i8v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 12 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i8v8double		; AVX1-LABEL: sitofpv8i8v8double
; AVX1: cost of 20 {{.*}} sitofp		; AVX1: cost of 6 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i8v8double		; AVX2-LABEL: sitofpv8i8v8double
; AVX2: cost of 20 {{.*}} sitofp		; AVX2: cost of 6 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i8v8double		; AVX512F-LABEL: sitofpv8i8v8double
; AVX512F: cost of 2 {{.*}} sitofp		; AVX512F: cost of 2 {{.*}} sitofp
%1 = sitofp <8 x i8> %a to <8 x double>		%1 = sitofp <8 x i8> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @sitofpv16i8v16double(<16 x i8> %a) {		define <16 x double> @sitofpv16i8v16double(<16 x i8> %a) {
; SSE2-LABEL: sitofpv16i8v16double		; SSE2-LABEL: sitofpv16i8v16double
; SSE2: cost of 160 {{.*}} sitofp		; SSE2: cost of 24 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i8v16double		; AVX1-LABEL: sitofpv16i8v16double
; AVX1: cost of 40 {{.*}} sitofp		; AVX1: cost of 12 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i8v16double		; AVX2-LABEL: sitofpv16i8v16double
; AVX2: cost of 40 {{.*}} sitofp		; AVX2: cost of 12 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i8v16double		; AVX512F-LABEL: sitofpv16i8v16double
; AVX512F: cost of 44 {{.*}} sitofp		; AVX512F: cost of 4 {{.*}} sitofp
%1 = sitofp <16 x i8> %a to <16 x double>		%1 = sitofp <16 x i8> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @sitofpv32i8v32double(<32 x i8> %a) {		define <32 x double> @sitofpv32i8v32double(<32 x i8> %a) {
; SSE2-LABEL: sitofpv32i8v32double		; SSE2-LABEL: sitofpv32i8v32double
; SSE2: cost of 320 {{.*}} sitofp		; SSE2: cost of 48 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i8v32double		; AVX1-LABEL: sitofpv32i8v32double
; AVX1: cost of 80 {{.*}} sitofp		; AVX1: cost of 24 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i8v32double		; AVX2-LABEL: sitofpv32i8v32double
; AVX2: cost of 80 {{.*}} sitofp		; AVX2: cost of 24 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i8v32double		; AVX512F-LABEL: sitofpv32i8v32double
; AVX512F: cost of 88 {{.*}} sitofp		; AVX512F: cost of 8 {{.*}} sitofp
%1 = sitofp <32 x i8> %a to <32 x double>		%1 = sitofp <32 x i8> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @sitofpv2i16v2double(<2 x i16> %a) {		define <2 x double> @sitofpv2i16v2double(<2 x i16> %a) {
; SSE2-LABEL: sitofpv2i16v2double		; SSE2-LABEL: sitofpv2i16v2double
; SSE2: cost of 20 {{.*}} sitofp		; SSE2: cost of 3 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv2i16v2double		; AVX1-LABEL: sitofpv2i16v2double
; AVX1: cost of 4 {{.*}} sitofp		; AVX1: cost of 3 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv2i16v2double		; AVX2-LABEL: sitofpv2i16v2double
; AVX2: cost of 4 {{.*}} sitofp		; AVX2: cost of 3 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv2i16v2double		; AVX512F-LABEL: sitofpv2i16v2double
; AVX512F: cost of 4 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%1 = sitofp <2 x i16> %a to <2 x double>		%1 = sitofp <2 x i16> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @sitofpv4i16v4double(<4 x i16> %a) {		define <4 x double> @sitofpv4i16v4double(<4 x i16> %a) {
; SSE2-LABEL: sitofpv4i16v4double		; SSE2-LABEL: sitofpv4i16v4double
; SSE2: cost of 40 {{.*}} sitofp		; SSE2: cost of 6 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv4i16v4double		; AVX1-LABEL: sitofpv4i16v4double
; AVX1: cost of 3 {{.*}} sitofp		; AVX1: cost of 3 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv4i16v4double		; AVX2-LABEL: sitofpv4i16v4double
; AVX2: cost of 3 {{.*}} sitofp		; AVX2: cost of 3 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv4i16v4double		; AVX512F-LABEL: sitofpv4i16v4double
; AVX512F: cost of 3 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%1 = sitofp <4 x i16> %a to <4 x double>		%1 = sitofp <4 x i16> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @sitofpv8i16v8double(<8 x i16> %a) {		define <8 x double> @sitofpv8i16v8double(<8 x i16> %a) {
; SSE2-LABEL: sitofpv8i16v8double		; SSE2-LABEL: sitofpv8i16v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 12 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i16v8double		; AVX1-LABEL: sitofpv8i16v8double
; AVX1: cost of 20 {{.*}} sitofp		; AVX1: cost of 6 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i16v8double		; AVX2-LABEL: sitofpv8i16v8double
; AVX2: cost of 20 {{.*}} sitofp		; AVX2: cost of 6 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i16v8double		; AVX512F-LABEL: sitofpv8i16v8double
; AVX512F: cost of 2 {{.*}} sitofp		; AVX512F: cost of 2 {{.*}} sitofp
%1 = sitofp <8 x i16> %a to <8 x double>		%1 = sitofp <8 x i16> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @sitofpv16i16v16double(<16 x i16> %a) {		define <16 x double> @sitofpv16i16v16double(<16 x i16> %a) {
; SSE2-LABEL: sitofpv16i16v16double		; SSE2-LABEL: sitofpv16i16v16double
; SSE2: cost of 160 {{.*}} sitofp		; SSE2: cost of 24 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i16v16double		; AVX1-LABEL: sitofpv16i16v16double
; AVX1: cost of 40 {{.*}} sitofp		; AVX1: cost of 12 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i16v16double		; AVX2-LABEL: sitofpv16i16v16double
; AVX2: cost of 40 {{.*}} sitofp		; AVX2: cost of 12 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i16v16double		; AVX512F-LABEL: sitofpv16i16v16double
; AVX512F: cost of 44 {{.*}} sitofp		; AVX512F: cost of 4 {{.*}} sitofp
%1 = sitofp <16 x i16> %a to <16 x double>		%1 = sitofp <16 x i16> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @sitofpv32i16v32double(<32 x i16> %a) {		define <32 x double> @sitofpv32i16v32double(<32 x i16> %a) {
; SSE2-LABEL: sitofpv32i16v32double		; SSE2-LABEL: sitofpv32i16v32double
; SSE2: cost of 320 {{.*}} sitofp		; SSE2: cost of 48 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i16v32double		; AVX1-LABEL: sitofpv32i16v32double
; AVX1: cost of 80 {{.*}} sitofp		; AVX1: cost of 24 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i16v32double		; AVX2-LABEL: sitofpv32i16v32double
; AVX2: cost of 80 {{.*}} sitofp		; AVX2: cost of 24 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i16v32double		; AVX512F-LABEL: sitofpv32i16v32double
; AVX512F: cost of 88 {{.*}} sitofp		; AVX512F: cost of 8 {{.*}} sitofp
%1 = sitofp <32 x i16> %a to <32 x double>		%1 = sitofp <32 x i16> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @sitofpv2i32v2double(<2 x i32> %a) {		define <2 x double> @sitofpv2i32v2double(<2 x i32> %a) {
; SSE2-LABEL: sitofpv2i32v2double		; SSE2-LABEL: sitofpv2i32v2double
; SSE2: cost of 20 {{.*}} sitofp		; SSE2: cost of 1 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv2i32v2double		; AVX1-LABEL: sitofpv2i32v2double
; AVX1: cost of 4 {{.*}} sitofp		; AVX1: cost of 1 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv2i32v2double		; AVX2-LABEL: sitofpv2i32v2double
; AVX2: cost of 4 {{.*}} sitofp		; AVX2: cost of 1 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv2i32v2double		; AVX512F-LABEL: sitofpv2i32v2double
; AVX512F: cost of 4 {{.*}} sitofp		; AVX512F: cost of 1 {{.*}} sitofp
%1 = sitofp <2 x i32> %a to <2 x double>		%1 = sitofp <2 x i32> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @sitofpv4i32v4double(<4 x i32> %a) {		define <4 x double> @sitofpv4i32v4double(<4 x i32> %a) {
; SSE2-LABEL: sitofpv4i32v4double		; SSE2-LABEL: sitofpv4i32v4double
; SSE2: cost of 40 {{.*}} sitofp		; SSE2: cost of 2 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv4i32v4double		; AVX1-LABEL: sitofpv4i32v4double
; AVX1: cost of 1 {{.*}} sitofp		; AVX1: cost of 1 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv4i32v4double		; AVX2-LABEL: sitofpv4i32v4double
; AVX2: cost of 1 {{.*}} sitofp		; AVX2: cost of 1 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv4i32v4double		; AVX512F-LABEL: sitofpv4i32v4double
; AVX512F: cost of 1 {{.*}} sitofp		; AVX512F: cost of 1 {{.*}} sitofp
%1 = sitofp <4 x i32> %a to <4 x double>		%1 = sitofp <4 x i32> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @sitofpv8i32v8double(<8 x i32> %a) {		define <8 x double> @sitofpv8i32v8double(<8 x i32> %a) {
; SSE2-LABEL: sitofpv8i32v8double		; SSE2-LABEL: sitofpv8i32v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 4 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i32v8double		; AVX1-LABEL: sitofpv8i32v8double
; AVX1: cost of 20 {{.*}} sitofp		; AVX1: cost of 2 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i32v8double		; AVX2-LABEL: sitofpv8i32v8double
; AVX2: cost of 20 {{.*}} sitofp		; AVX2: cost of 2 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i32v8double		; AVX512F-LABEL: sitofpv8i32v8double
; AVX512F: cost of 1 {{.*}} sitofp		; AVX512F: cost of 1 {{.*}} sitofp
%1 = sitofp <8 x i32> %a to <8 x double>		%1 = sitofp <8 x i32> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @sitofpv16i32v16double(<16 x i32> %a) {		define <16 x double> @sitofpv16i32v16double(<16 x i32> %a) {
; SSE2-LABEL: sitofpv16i32v16double		; SSE2-LABEL: sitofpv16i32v16double
; SSE2: cost of 160 {{.*}} sitofp		; SSE2: cost of 8 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i32v16double		; AVX1-LABEL: sitofpv16i32v16double
; AVX1: cost of 40 {{.*}} sitofp		; AVX1: cost of 4 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i32v16double		; AVX2-LABEL: sitofpv16i32v16double
; AVX2: cost of 40 {{.*}} sitofp		; AVX2: cost of 4 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i32v16double		; AVX512F-LABEL: sitofpv16i32v16double
; AVX512F: cost of 44 {{.*}} sitofp		; AVX512F: cost of 2 {{.*}} sitofp
%1 = sitofp <16 x i32> %a to <16 x double>		%1 = sitofp <16 x i32> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @sitofpv32i32v32double(<32 x i32> %a) {		define <32 x double> @sitofpv32i32v32double(<32 x i32> %a) {
; SSE2-LABEL: sitofpv32i32v32double		; SSE2-LABEL: sitofpv32i32v32double
; SSE2: cost of 320 {{.*}} sitofp		; SSE2: cost of 16 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i32v32double		; AVX1-LABEL: sitofpv32i32v32double
; AVX1: cost of 80 {{.*}} sitofp		; AVX1: cost of 8 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i32v32double		; AVX2-LABEL: sitofpv32i32v32double
; AVX2: cost of 80 {{.*}} sitofp		; AVX2: cost of 8 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i32v32double		; AVX512F-LABEL: sitofpv32i32v32double
; AVX512F: cost of 88 {{.*}} sitofp		; AVX512F: cost of 4 {{.*}} sitofp
%1 = sitofp <32 x i32> %a to <32 x double>		%1 = sitofp <32 x i32> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @sitofpv2i64v2double(<2 x i64> %a) {		define <2 x double> @sitofpv2i64v2double(<2 x i64> %a) {
; SSE2-LABEL: sitofpv2i64v2double		; SSE2-LABEL: sitofpv2i64v2double
; SSE2: cost of 20 {{.*}} sitofp		; SSE2: cost of 20 {{.*}} sitofp
;		;
; [9 unchanged lines not shown]		; [9 unchanged lines not shown]
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @sitofpv4i64v4double(<4 x i64> %a) {		define <4 x double> @sitofpv4i64v4double(<4 x i64> %a) {
; SSE2-LABEL: sitofpv4i64v4double		; SSE2-LABEL: sitofpv4i64v4double
; SSE2: cost of 40 {{.*}} sitofp		; SSE2: cost of 40 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv4i64v4double		; AVX1-LABEL: sitofpv4i64v4double
; AVX1: cost of 10 {{.*}} sitofp		; AVX1: cost of 13 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv4i64v4double		; AVX2-LABEL: sitofpv4i64v4double
; AVX2: cost of 10 {{.*}} sitofp		; AVX2: cost of 13 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv4i64v4double		; AVX512F-LABEL: sitofpv4i64v4double
; AVX512F: cost of 10 {{.*}} sitofp		; AVX512F: cost of 13 {{.*}} sitofp
%1 = sitofp <4 x i64> %a to <4 x double>		%1 = sitofp <4 x i64> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @sitofpv8i64v8double(<8 x i64> %a) {		define <8 x double> @sitofpv8i64v8double(<8 x i64> %a) {
; SSE2-LABEL: sitofpv8i64v8double		; SSE2-LABEL: sitofpv8i64v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 80 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i64v8double		; AVX1-LABEL: sitofpv8i64v8double
; AVX1: cost of 20 {{.*}} sitofp		; AVX1: cost of 26 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i64v8double		; AVX2-LABEL: sitofpv8i64v8double
; AVX2: cost of 20 {{.*}} sitofp		; AVX2: cost of 26 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i64v8double		; AVX512F-LABEL: sitofpv8i64v8double
; AVX512F: cost of 22 {{.*}} sitofp		; AVX512F: cost of 26 {{.*}} sitofp
%1 = sitofp <8 x i64> %a to <8 x double>		%1 = sitofp <8 x i64> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @sitofpv16i64v16double(<16 x i64> %a) {		define <16 x double> @sitofpv16i64v16double(<16 x i64> %a) {
; SSE2-LABEL: sitofpv16i64v16double		; SSE2-LABEL: sitofpv16i64v16double
; SSE2: cost of 160 {{.*}} sitofp		; SSE2: cost of 160 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i64v16double		; AVX1-LABEL: sitofpv16i64v16double
; AVX1: cost of 40 {{.*}} sitofp		; AVX1: cost of 52 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i64v16double		; AVX2-LABEL: sitofpv16i64v16double
; AVX2: cost of 40 {{.*}} sitofp		; AVX2: cost of 52 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i64v16double		; AVX512F-LABEL: sitofpv16i64v16double
; AVX512F: cost of 44 {{.*}} sitofp		; AVX512F: cost of 52 {{.*}} sitofp
%1 = sitofp <16 x i64> %a to <16 x double>		%1 = sitofp <16 x i64> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @sitofpv32i64v32double(<32 x i64> %a) {		define <32 x double> @sitofpv32i64v32double(<32 x i64> %a) {
; SSE2-LABEL: sitofpv32i64v32double		; SSE2-LABEL: sitofpv32i64v32double
; SSE2: cost of 320 {{.*}} sitofp		; SSE2: cost of 320 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i64v32double		; AVX1-LABEL: sitofpv32i64v32double
; AVX1: cost of 80 {{.*}} sitofp		; AVX1: cost of 104 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i64v32double		; AVX2-LABEL: sitofpv32i64v32double
; AVX2: cost of 80 {{.*}} sitofp		; AVX2: cost of 104 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i64v32double		; AVX512F-LABEL: sitofpv32i64v32double
; AVX512F: cost of 88 {{.*}} sitofp		; AVX512F: cost of 104 {{.*}} sitofp
%1 = sitofp <32 x i64> %a to <32 x double>		%1 = sitofp <32 x i64> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x float> @sitofpv2i8v2float(<2 x i8> %a) {		define <2 x float> @sitofpv2i8v2float(<2 x i8> %a) {
; SSE2-LABEL: sitofpv2i8v2float		; SSE2-LABEL: sitofpv2i8v2float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 3 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv2i8v2float		; AVX1-LABEL: sitofpv2i8v2float
; AVX1: cost of 4 {{.*}} sitofp		; AVX1: cost of 3 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv2i8v2float		; AVX2-LABEL: sitofpv2i8v2float
; AVX2: cost of 4 {{.*}} sitofp		; AVX2: cost of 3 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv2i8v2float		; AVX512F-LABEL: sitofpv2i8v2float
; AVX512F: cost of 4 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%1 = sitofp <2 x i8> %a to <2 x float>		%1 = sitofp <2 x i8> %a to <2 x float>
ret <2 x float> %1		ret <2 x float> %1
}		}

define <4 x float> @sitofpv4i8v4float(<4 x i8> %a) {		define <4 x float> @sitofpv4i8v4float(<4 x i8> %a) {
; SSE2-LABEL: sitofpv4i8v4float		; SSE2-LABEL: sitofpv4i8v4float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 3 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv4i8v4float		; AVX1-LABEL: sitofpv4i8v4float
; AVX1: cost of 3 {{.*}} sitofp		; AVX1: cost of 3 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv4i8v4float		; AVX2-LABEL: sitofpv4i8v4float
; AVX2: cost of 3 {{.*}} sitofp		; AVX2: cost of 3 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv4i8v4float		; AVX512F-LABEL: sitofpv4i8v4float
; AVX512F: cost of 3 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%1 = sitofp <4 x i8> %a to <4 x float>		%1 = sitofp <4 x i8> %a to <4 x float>
ret <4 x float> %1		ret <4 x float> %1
}		}

define <8 x float> @sitofpv8i8v8float(<8 x i8> %a) {		define <8 x float> @sitofpv8i8v8float(<8 x i8> %a) {
; SSE2-LABEL: sitofpv8i8v8float		; SSE2-LABEL: sitofpv8i8v8float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 6 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i8v8float		; AVX1-LABEL: sitofpv8i8v8float
; AVX1: cost of 8 {{.*}} sitofp		; AVX1: cost of 8 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i8v8float		; AVX2-LABEL: sitofpv8i8v8float
; AVX2: cost of 8 {{.*}} sitofp		; AVX2: cost of 8 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i8v8float		; AVX512F-LABEL: sitofpv8i8v8float
; AVX512F: cost of 8 {{.*}} sitofp		; AVX512F: cost of 8 {{.*}} sitofp
%1 = sitofp <8 x i8> %a to <8 x float>		%1 = sitofp <8 x i8> %a to <8 x float>
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @sitofpv16i8v16float(<16 x i8> %a) {		define <16 x float> @sitofpv16i8v16float(<16 x i8> %a) {
; SSE2-LABEL: sitofpv16i8v16float		; SSE2-LABEL: sitofpv16i8v16float
; SSE2: cost of 8 {{.*}} sitofp		; SSE2: cost of 12 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i8v16float		; AVX1-LABEL: sitofpv16i8v16float
; AVX1: cost of 44 {{.*}} sitofp		; AVX1: cost of 16 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i8v16float		; AVX2-LABEL: sitofpv16i8v16float
; AVX2: cost of 44 {{.*}} sitofp		; AVX2: cost of 16 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i8v16float		; AVX512F-LABEL: sitofpv16i8v16float
; AVX512F: cost of 2 {{.*}} sitofp		; AVX512F: cost of 2 {{.*}} sitofp
%1 = sitofp <16 x i8> %a to <16 x float>		%1 = sitofp <16 x i8> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @sitofpv32i8v32float(<32 x i8> %a) {		define <32 x float> @sitofpv32i8v32float(<32 x i8> %a) {
; SSE2-LABEL: sitofpv32i8v32float		; SSE2-LABEL: sitofpv32i8v32float
; SSE2: cost of 16 {{.*}} sitofp		; SSE2: cost of 24 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i8v32float		; AVX1-LABEL: sitofpv32i8v32float
; AVX1: cost of 88 {{.*}} sitofp		; AVX1: cost of 32 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i8v32float		; AVX2-LABEL: sitofpv32i8v32float
; AVX2: cost of 88 {{.*}} sitofp		; AVX2: cost of 32 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i8v32float		; AVX512F-LABEL: sitofpv32i8v32float
; AVX512F: cost of 92 {{.*}} sitofp		; AVX512F: cost of 4 {{.*}} sitofp
%1 = sitofp <32 x i8> %a to <32 x float>		%1 = sitofp <32 x i8> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @sitofpv2i16v2float(<2 x i16> %a) {		define <2 x float> @sitofpv2i16v2float(<2 x i16> %a) {
; SSE2-LABEL: sitofpv2i16v2float		; SSE2-LABEL: sitofpv2i16v2float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 3 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv2i16v2float		; AVX1-LABEL: sitofpv2i16v2float
; AVX1: cost of 4 {{.*}} sitofp		; AVX1: cost of 3 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv2i16v2float		; AVX2-LABEL: sitofpv2i16v2float
; AVX2: cost of 4 {{.*}} sitofp		; AVX2: cost of 3 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv2i16v2float		; AVX512F-LABEL: sitofpv2i16v2float
; AVX512F: cost of 4 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%1 = sitofp <2 x i16> %a to <2 x float>		%1 = sitofp <2 x i16> %a to <2 x float>
ret <2 x float> %1		ret <2 x float> %1
}		}

define <4 x float> @sitofpv4i16v4float(<4 x i16> %a) {		define <4 x float> @sitofpv4i16v4float(<4 x i16> %a) {
; SSE2-LABEL: sitofpv4i16v4float		; SSE2-LABEL: sitofpv4i16v4float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 3 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv4i16v4float		; AVX1-LABEL: sitofpv4i16v4float
; AVX1: cost of 3 {{.*}} sitofp		; AVX1: cost of 3 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv4i16v4float		; AVX2-LABEL: sitofpv4i16v4float
; AVX2: cost of 3 {{.*}} sitofp		; AVX2: cost of 3 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv4i16v4float		; AVX512F-LABEL: sitofpv4i16v4float
; AVX512F: cost of 3 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%1 = sitofp <4 x i16> %a to <4 x float>		%1 = sitofp <4 x i16> %a to <4 x float>
ret <4 x float> %1		ret <4 x float> %1
}		}

define <8 x float> @sitofpv8i16v8float(<8 x i16> %a) {		define <8 x float> @sitofpv8i16v8float(<8 x i16> %a) {
; SSE2-LABEL: sitofpv8i16v8float		; SSE2-LABEL: sitofpv8i16v8float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 6 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i16v8float		; AVX1-LABEL: sitofpv8i16v8float
; AVX1: cost of 5 {{.*}} sitofp		; AVX1: cost of 5 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i16v8float		; AVX2-LABEL: sitofpv8i16v8float
; AVX2: cost of 5 {{.*}} sitofp		; AVX2: cost of 5 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i16v8float		; AVX512F-LABEL: sitofpv8i16v8float
; AVX512F: cost of 5 {{.*}} sitofp		; AVX512F: cost of 5 {{.*}} sitofp
%1 = sitofp <8 x i16> %a to <8 x float>		%1 = sitofp <8 x i16> %a to <8 x float>
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @sitofpv16i16v16float(<16 x i16> %a) {		define <16 x float> @sitofpv16i16v16float(<16 x i16> %a) {
; SSE2-LABEL: sitofpv16i16v16float		; SSE2-LABEL: sitofpv16i16v16float
; SSE2: cost of 30 {{.*}} sitofp		; SSE2: cost of 12 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i16v16float		; AVX1-LABEL: sitofpv16i16v16float
; AVX1: cost of 44 {{.*}} sitofp		; AVX1: cost of 10 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i16v16float		; AVX2-LABEL: sitofpv16i16v16float
; AVX2: cost of 44 {{.*}} sitofp		; AVX2: cost of 10 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i16v16float		; AVX512F-LABEL: sitofpv16i16v16float
; AVX512F: cost of 2 {{.*}} sitofp		; AVX512F: cost of 2 {{.*}} sitofp
%1 = sitofp <16 x i16> %a to <16 x float>		%1 = sitofp <16 x i16> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @sitofpv32i16v32float(<32 x i16> %a) {		define <32 x float> @sitofpv32i16v32float(<32 x i16> %a) {
; SSE2-LABEL: sitofpv32i16v32float		; SSE2-LABEL: sitofpv32i16v32float
; SSE2: cost of 60 {{.*}} sitofp		; SSE2: cost of 24 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i16v32float		; AVX1-LABEL: sitofpv32i16v32float
; AVX1: cost of 88 {{.*}} sitofp		; AVX1: cost of 20 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i16v32float		; AVX2-LABEL: sitofpv32i16v32float
; AVX2: cost of 88 {{.*}} sitofp		; AVX2: cost of 20 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i16v32float		; AVX512F-LABEL: sitofpv32i16v32float
; AVX512F: cost of 92 {{.*}} sitofp		; AVX512F: cost of 4 {{.*}} sitofp
%1 = sitofp <32 x i16> %a to <32 x float>		%1 = sitofp <32 x i16> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @sitofpv2i32v2float(<2 x i32> %a) {		define <2 x float> @sitofpv2i32v2float(<2 x i32> %a) {
; SSE2-LABEL: sitofpv2i32v2float		; SSE2-LABEL: sitofpv2i32v2float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 1 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv2i32v2float		; AVX1-LABEL: sitofpv2i32v2float
; AVX1: cost of 4 {{.*}} sitofp		; AVX1: cost of 1 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv2i32v2float		; AVX2-LABEL: sitofpv2i32v2float
; AVX2: cost of 4 {{.*}} sitofp		; AVX2: cost of 1 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv2i32v2float		; AVX512F-LABEL: sitofpv2i32v2float
; AVX512F: cost of 4 {{.*}} sitofp		; AVX512F: cost of 1 {{.*}} sitofp
%1 = sitofp <2 x i32> %a to <2 x float>		%1 = sitofp <2 x i32> %a to <2 x float>
ret <2 x float> %1		ret <2 x float> %1
}		}

define <4 x float> @sitofpv4i32v4float(<4 x i32> %a) {		define <4 x float> @sitofpv4i32v4float(<4 x i32> %a) {
; SSE2-LABEL: sitofpv4i32v4float		; SSE2-LABEL: sitofpv4i32v4float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 1 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv4i32v4float		; AVX1-LABEL: sitofpv4i32v4float
; AVX1: cost of 1 {{.*}} sitofp		; AVX1: cost of 1 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv4i32v4float		; AVX2-LABEL: sitofpv4i32v4float
; AVX2: cost of 1 {{.*}} sitofp		; AVX2: cost of 1 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv4i32v4float		; AVX512F-LABEL: sitofpv4i32v4float
; AVX512F: cost of 1 {{.*}} sitofp		; AVX512F: cost of 1 {{.*}} sitofp
%1 = sitofp <4 x i32> %a to <4 x float>		%1 = sitofp <4 x i32> %a to <4 x float>
ret <4 x float> %1		ret <4 x float> %1
}		}

define <8 x float> @sitofpv8i32v8float(<8 x i32> %a) {		define <8 x float> @sitofpv8i32v8float(<8 x i32> %a) {
; SSE2-LABEL: sitofpv8i32v8float		; SSE2-LABEL: sitofpv8i32v8float
; SSE2: cost of 30 {{.*}} sitofp		; SSE2: cost of 2 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i32v8float		; AVX1-LABEL: sitofpv8i32v8float
; AVX1: cost of 1 {{.*}} sitofp		; AVX1: cost of 1 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i32v8float		; AVX2-LABEL: sitofpv8i32v8float
; AVX2: cost of 1 {{.*}} sitofp		; AVX2: cost of 1 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i32v8float		; AVX512F-LABEL: sitofpv8i32v8float
; AVX512F: cost of 1 {{.*}} sitofp		; AVX512F: cost of 1 {{.*}} sitofp
%1 = sitofp <8 x i32> %a to <8 x float>		%1 = sitofp <8 x i32> %a to <8 x float>
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @sitofpv16i32v16float(<16 x i32> %a) {		define <16 x float> @sitofpv16i32v16float(<16 x i32> %a) {
; SSE2-LABEL: sitofpv16i32v16float		; SSE2-LABEL: sitofpv16i32v16float
; SSE2: cost of 60 {{.*}} sitofp		; SSE2: cost of 4 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i32v16float		; AVX1-LABEL: sitofpv16i32v16float
; AVX1: cost of 44 {{.*}} sitofp		; AVX1: cost of 2 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i32v16float		; AVX2-LABEL: sitofpv16i32v16float
; AVX2: cost of 44 {{.*}} sitofp		; AVX2: cost of 2 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i32v16float		; AVX512F-LABEL: sitofpv16i32v16float
; AVX512F: cost of 1 {{.*}} sitofp		; AVX512F: cost of 1 {{.*}} sitofp
%1 = sitofp <16 x i32> %a to <16 x float>		%1 = sitofp <16 x i32> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @sitofpv32i32v32float(<32 x i32> %a) {		define <32 x float> @sitofpv32i32v32float(<32 x i32> %a) {
; SSE2-LABEL: sitofpv32i32v32float		; SSE2-LABEL: sitofpv32i32v32float
; SSE2: cost of 120 {{.*}} sitofp		; SSE2: cost of 8 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i32v32float		; AVX1-LABEL: sitofpv32i32v32float
; AVX1: cost of 88 {{.*}} sitofp		; AVX1: cost of 4 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i32v32float		; AVX2-LABEL: sitofpv32i32v32float
; AVX2: cost of 88 {{.*}} sitofp		; AVX2: cost of 4 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i32v32float		; AVX512F-LABEL: sitofpv32i32v32float
; AVX512F: cost of 92 {{.*}} sitofp		; AVX512F: cost of 2 {{.*}} sitofp
%1 = sitofp <32 x i32> %a to <32 x float>		%1 = sitofp <32 x i32> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @sitofpv2i64v2float(<2 x i64> %a) {		define <2 x float> @sitofpv2i64v2float(<2 x i64> %a) {
; SSE2-LABEL: sitofpv2i64v2float		; SSE2-LABEL: sitofpv2i64v2float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 7 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv2i64v2float		; AVX1-LABEL: sitofpv2i64v2float
; AVX1: cost of 4 {{.*}} sitofp		; AVX1: cost of 7 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv2i64v2float		; AVX2-LABEL: sitofpv2i64v2float
; AVX2: cost of 4 {{.*}} sitofp		; AVX2: cost of 7 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv2i64v2float		; AVX512F-LABEL: sitofpv2i64v2float
; AVX512F: cost of 4 {{.*}} sitofp		; AVX512F: cost of 7 {{.*}} sitofp
%1 = sitofp <2 x i64> %a to <2 x float>		%1 = sitofp <2 x i64> %a to <2 x float>
ret <2 x float> %1		ret <2 x float> %1
}		}

define <4 x float> @sitofpv4i64v4float(<4 x i64> %a) {		define <4 x float> @sitofpv4i64v4float(<4 x i64> %a) {
; SSE2-LABEL: sitofpv4i64v4float		; SSE2-LABEL: sitofpv4i64v4float
; SSE2: cost of 30 {{.*}} sitofp		; SSE2: cost of 15 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv4i64v4float		; AVX1-LABEL: sitofpv4i64v4float
; AVX1: cost of 10 {{.*}} sitofp		; AVX1: cost of 13 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv4i64v4float		; AVX2-LABEL: sitofpv4i64v4float
; AVX2: cost of 10 {{.*}} sitofp		; AVX2: cost of 13 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv4i64v4float		; AVX512F-LABEL: sitofpv4i64v4float
; AVX512F: cost of 10 {{.*}} sitofp		; AVX512F: cost of 13 {{.*}} sitofp
%1 = sitofp <4 x i64> %a to <4 x float>		%1 = sitofp <4 x i64> %a to <4 x float>
ret <4 x float> %1		ret <4 x float> %1
}		}

define <8 x float> @sitofpv8i64v8float(<8 x i64> %a) {		define <8 x float> @sitofpv8i64v8float(<8 x i64> %a) {
; SSE2-LABEL: sitofpv8i64v8float		; SSE2-LABEL: sitofpv8i64v8float
; SSE2: cost of 60 {{.*}} sitofp		; SSE2: cost of 28 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i64v8float		; AVX1-LABEL: sitofpv8i64v8float
; AVX1: cost of 22 {{.*}} sitofp		; AVX1: cost of 26 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i64v8float		; AVX2-LABEL: sitofpv8i64v8float
; AVX2: cost of 22 {{.*}} sitofp		; AVX2: cost of 26 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i64v8float		; AVX512F-LABEL: sitofpv8i64v8float
; AVX512F: cost of 22 {{.*}} sitofp		; AVX512F: cost of 26 {{.*}} sitofp
%1 = sitofp <8 x i64> %a to <8 x float>		%1 = sitofp <8 x i64> %a to <8 x float>
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @sitofpv16i64v16float(<16 x i64> %a) {		define <16 x float> @sitofpv16i64v16float(<16 x i64> %a) {
; SSE2-LABEL: sitofpv16i64v16float		; SSE2-LABEL: sitofpv16i64v16float
; SSE2: cost of 120 {{.*}} sitofp		; SSE2: cost of 56 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i64v16float		; AVX1-LABEL: sitofpv16i64v16float
; AVX1: cost of 44 {{.*}} sitofp		; AVX1: cost of 52 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i64v16float		; AVX2-LABEL: sitofpv16i64v16float
; AVX2: cost of 44 {{.*}} sitofp		; AVX2: cost of 52 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i64v16float		; AVX512F-LABEL: sitofpv16i64v16float
; AVX512F: cost of 46 {{.*}} sitofp		; AVX512F: cost of 52 {{.*}} sitofp
%1 = sitofp <16 x i64> %a to <16 x float>		%1 = sitofp <16 x i64> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @sitofpv32i64v32float(<32 x i64> %a) {		define <32 x float> @sitofpv32i64v32float(<32 x i64> %a) {
; SSE2-LABEL: sitofpv32i64v32float		; SSE2-LABEL: sitofpv32i64v32float
; SSE2: cost of 240 {{.*}} sitofp		; SSE2: cost of 112 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i64v32float		; AVX1-LABEL: sitofpv32i64v32float
; AVX1: cost of 88 {{.*}} sitofp		; AVX1: cost of 104 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i64v32float		; AVX2-LABEL: sitofpv32i64v32float
; AVX2: cost of 88 {{.*}} sitofp		; AVX2: cost of 104 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i64v32float		; AVX512F-LABEL: sitofpv32i64v32float
; AVX512F: cost of 92 {{.*}} sitofp		; AVX512F: cost of 104 {{.*}} sitofp
%1 = sitofp <32 x i64> %a to <32 x float>		%1 = sitofp <32 x i64> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <8 x double> @sitofpv8i1v8double(<8 x double> %a) {		define <8 x double> @sitofpv8i1v8double(<8 x double> %a) {
; SSE2-LABEL: sitofpv8i1v8double		; SSE2-LABEL: sitofpv8i1v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 16 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i1v8double		; AVX1-LABEL: sitofpv8i1v8double
; AVX1: cost of 20 {{.*}} sitofp		; AVX1: cost of 6 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i1v8double		; AVX2-LABEL: sitofpv8i1v8double
; AVX2: cost of 20 {{.*}} sitofp		; AVX2: cost of 6 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i1v8double		; AVX512F-LABEL: sitofpv8i1v8double
; AVX512F: cost of 4 {{.*}} sitofp		; AVX512F: cost of 4 {{.*}} sitofp
%cmpres = fcmp ogt <8 x double> %a, zeroinitializer		%cmpres = fcmp ogt <8 x double> %a, zeroinitializer
%1 = sitofp <8 x i1> %cmpres to <8 x double>		%1 = sitofp <8 x i1> %cmpres to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x float> @sitofpv16i1v16float(<16 x float> %a) {		define <16 x float> @sitofpv16i1v16float(<16 x float> %a) {
; SSE2-LABEL: sitofpv16i1v16float		; SSE2-LABEL: sitofpv16i1v16float
; SSE2: cost of 8 {{.*}} sitofp		; SSE2: cost of 40 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i1v16float		; AVX1-LABEL: sitofpv16i1v16float
; AVX1: cost of 44 {{.*}} sitofp		; AVX1: cost of 16 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i1v16float		; AVX2-LABEL: sitofpv16i1v16float
; AVX2: cost of 44 {{.*}} sitofp		; AVX2: cost of 16 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i1v16float		; AVX512F-LABEL: sitofpv16i1v16float
; AVX512F: cost of 3 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%cmpres = fcmp ogt <16 x float> %a, zeroinitializer		%cmpres = fcmp ogt <16 x float> %a, zeroinitializer
%1 = sitofp <16 x i1> %cmpres to <16 x float>		%1 = sitofp <16 x i1> %cmpres to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

../test/Analysis/CostModel/X86/uitofp.ll

; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+sse2 -cost-model -analyze < %s | FileCheck --check-prefix=SSE --check-prefix=SSE2 %s		; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+sse2 -cost-model -analyze < %s | FileCheck --check-prefix=SSE --check-prefix=SSE2 %s
; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx -cost-model -analyze < %s | FileCheck --check-prefix=AVX --check-prefix=AVX1 %s		; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx -cost-model -analyze < %s | FileCheck --check-prefix=AVX --check-prefix=AVX1 %s
; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx2 -cost-model -analyze < %s | FileCheck --check-prefix=AVX --check-prefix=AVX2 %s		; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx2 -cost-model -analyze < %s | FileCheck --check-prefix=AVX --check-prefix=AVX2 %s
; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx512f -cost-model -analyze < %s | FileCheck --check-prefix=AVX512F %s		; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx512f -cost-model -analyze < %s | FileCheck --check-prefix=AVX512F %s
; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx512dq -cost-model -analyze < %s | FileCheck --check-prefix=AVX512DQ %s		; RUN: opt -mtriple=x86_64-apple-darwin -mattr=+avx512dq -cost-model -analyze < %s | FileCheck --check-prefix=AVX512DQ %s

define <2 x double> @uitofpv2i8v2double(<2 x i8> %a) {		define <2 x double> @uitofpv2i8v2double(<2 x i8> %a) {
; SSE2-LABEL: uitofpv2i8v2double		; SSE2-LABEL: uitofpv2i8v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 2 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i8v2double		; AVX1-LABEL: uitofpv2i8v2double
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 2 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i8v2double		; AVX2-LABEL: uitofpv2i8v2double
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 2 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i8v2double		; AVX512F-LABEL: uitofpv2i8v2double
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <2 x i8> %a to <2 x double>		%1 = uitofp <2 x i8> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @uitofpv4i8v4double(<4 x i8> %a) {		define <4 x double> @uitofpv4i8v4double(<4 x i8> %a) {
; SSE2-LABEL: uitofpv4i8v4double		; SSE2-LABEL: uitofpv4i8v4double
; SSE2: cost of 40 {{.*}} uitofp		; SSE2: cost of 4 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv4i8v4double		; AVX1-LABEL: uitofpv4i8v4double
; AVX1: cost of 2 {{.*}} uitofp		; AVX1: cost of 2 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv4i8v4double		; AVX2-LABEL: uitofpv4i8v4double
; AVX2: cost of 2 {{.*}} uitofp		; AVX2: cost of 2 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv4i8v4double		; AVX512F-LABEL: uitofpv4i8v4double
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <4 x i8> %a to <4 x double>		%1 = uitofp <4 x i8> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @uitofpv8i8v8double(<8 x i8> %a) {		define <8 x double> @uitofpv8i8v8double(<8 x i8> %a) {
; SSE2-LABEL: uitofpv8i8v8double		; SSE2-LABEL: uitofpv8i8v8double
; SSE2: cost of 80 {{.*}} uitofp		; SSE2: cost of 8 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i8v8double		; AVX1-LABEL: uitofpv8i8v8double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 4 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i8v8double		; AVX2-LABEL: uitofpv8i8v8double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 4 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i8v8double		; AVX512F-LABEL: uitofpv8i8v8double
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <8 x i8> %a to <8 x double>		%1 = uitofp <8 x i8> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @uitofpv16i8v16double(<16 x i8> %a) {		define <16 x double> @uitofpv16i8v16double(<16 x i8> %a) {
; SSE2-LABEL: uitofpv16i8v16double		; SSE2-LABEL: uitofpv16i8v16double
; SSE2: cost of 160 {{.*}} uitofp		; SSE2: cost of 16 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i8v16double		; AVX1-LABEL: uitofpv16i8v16double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 8 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i8v16double		; AVX2-LABEL: uitofpv16i8v16double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 8 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i8v16double		; AVX512F-LABEL: uitofpv16i8v16double
; AVX512F: cost of 44 {{.*}} uitofp		; AVX512F: cost of 4 {{.*}} uitofp
%1 = uitofp <16 x i8> %a to <16 x double>		%1 = uitofp <16 x i8> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @uitofpv32i8v32double(<32 x i8> %a) {		define <32 x double> @uitofpv32i8v32double(<32 x i8> %a) {
; SSE2-LABEL: uitofpv32i8v32double		; SSE2-LABEL: uitofpv32i8v32double
; SSE2: cost of 320 {{.*}} uitofp		; SSE2: cost of 32 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i8v32double		; AVX1-LABEL: uitofpv32i8v32double
; AVX1: cost of 80 {{.*}} uitofp		; AVX1: cost of 16 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i8v32double		; AVX2-LABEL: uitofpv32i8v32double
; AVX2: cost of 80 {{.*}} uitofp		; AVX2: cost of 16 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i8v32double		; AVX512F-LABEL: uitofpv32i8v32double
; AVX512F: cost of 88 {{.*}} uitofp		; AVX512F: cost of 8 {{.*}} uitofp
%1 = uitofp <32 x i8> %a to <32 x double>		%1 = uitofp <32 x i8> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @uitofpv2i16v2double(<2 x i16> %a) {		define <2 x double> @uitofpv2i16v2double(<2 x i16> %a) {
; SSE2-LABEL: uitofpv2i16v2double		; SSE2-LABEL: uitofpv2i16v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 2 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i16v2double		; AVX1-LABEL: uitofpv2i16v2double
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 2 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i16v2double		; AVX2-LABEL: uitofpv2i16v2double
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 2 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i16v2double		; AVX512F-LABEL: uitofpv2i16v2double
; AVX512F: cost of 5 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <2 x i16> %a to <2 x double>		%1 = uitofp <2 x i16> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @uitofpv4i16v4double(<4 x i16> %a) {		define <4 x double> @uitofpv4i16v4double(<4 x i16> %a) {
; SSE2-LABEL: uitofpv4i16v4double		; SSE2-LABEL: uitofpv4i16v4double
; SSE2: cost of 40 {{.*}} uitofp		; SSE2: cost of 4 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv4i16v4double		; AVX1-LABEL: uitofpv4i16v4double
; AVX1: cost of 2 {{.*}} uitofp		; AVX1: cost of 2 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv4i16v4double		; AVX2-LABEL: uitofpv4i16v4double
; AVX2: cost of 2 {{.*}} uitofp		; AVX2: cost of 2 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv4i16v4double		; AVX512F-LABEL: uitofpv4i16v4double
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <4 x i16> %a to <4 x double>		%1 = uitofp <4 x i16> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @uitofpv8i16v8double(<8 x i16> %a) {		define <8 x double> @uitofpv8i16v8double(<8 x i16> %a) {
; SSE2-LABEL: uitofpv8i16v8double		; SSE2-LABEL: uitofpv8i16v8double
; SSE2: cost of 80 {{.*}} uitofp		; SSE2: cost of 8 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i16v8double		; AVX1-LABEL: uitofpv8i16v8double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 4 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i16v8double		; AVX2-LABEL: uitofpv8i16v8double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 4 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i16v8double		; AVX512F-LABEL: uitofpv8i16v8double
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <8 x i16> %a to <8 x double>		%1 = uitofp <8 x i16> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @uitofpv16i16v16double(<16 x i16> %a) {		define <16 x double> @uitofpv16i16v16double(<16 x i16> %a) {
; SSE2-LABEL: uitofpv16i16v16double		; SSE2-LABEL: uitofpv16i16v16double
; SSE2: cost of 160 {{.*}} uitofp		; SSE2: cost of 16 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i16v16double		; AVX1-LABEL: uitofpv16i16v16double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 8 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i16v16double		; AVX2-LABEL: uitofpv16i16v16double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 8 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i16v16double		; AVX512F-LABEL: uitofpv16i16v16double
; AVX512F: cost of 44 {{.*}} uitofp		; AVX512F: cost of 4 {{.*}} uitofp
%1 = uitofp <16 x i16> %a to <16 x double>		%1 = uitofp <16 x i16> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @uitofpv32i16v32double(<32 x i16> %a) {		define <32 x double> @uitofpv32i16v32double(<32 x i16> %a) {
; SSE2-LABEL: uitofpv32i16v32double		; SSE2-LABEL: uitofpv32i16v32double
; SSE2: cost of 320 {{.*}} uitofp		; SSE2: cost of 32 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i16v32double		; AVX1-LABEL: uitofpv32i16v32double
; AVX1: cost of 80 {{.*}} uitofp		; AVX1: cost of 16 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i16v32double		; AVX2-LABEL: uitofpv32i16v32double
; AVX2: cost of 80 {{.*}} uitofp		; AVX2: cost of 16 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i16v32double		; AVX512F-LABEL: uitofpv32i16v32double
; AVX512F: cost of 88 {{.*}} uitofp		; AVX512F: cost of 8 {{.*}} uitofp
%1 = uitofp <32 x i16> %a to <32 x double>		%1 = uitofp <32 x i16> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @uitofpv2i32v2double(<2 x i32> %a) {		define <2 x double> @uitofpv2i32v2double(<2 x i32> %a) {
; SSE2-LABEL: uitofpv2i32v2double		; SSE2-LABEL: uitofpv2i32v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 15 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i32v2double		; AVX1-LABEL: uitofpv2i32v2double
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 6 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i32v2double		; AVX2-LABEL: uitofpv2i32v2double
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 6 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i32v2double		; AVX512F-LABEL: uitofpv2i32v2double
; AVX512F: cost of 4 {{.*}} uitofp		; AVX512F: cost of 1 {{.*}} uitofp
%1 = uitofp <2 x i32> %a to <2 x double>		%1 = uitofp <2 x i32> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @uitofpv4i32v4double(<4 x i32> %a) {		define <4 x double> @uitofpv4i32v4double(<4 x i32> %a) {
; SSE2-LABEL: uitofpv4i32v4double		; SSE2-LABEL: uitofpv4i32v4double
; SSE2: cost of 40 {{.*}} uitofp		; SSE2: cost of 30 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv4i32v4double		; AVX1-LABEL: uitofpv4i32v4double
; AVX1: cost of 6 {{.*}} uitofp		; AVX1: cost of 6 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv4i32v4double		; AVX2-LABEL: uitofpv4i32v4double
; AVX2: cost of 6 {{.*}} uitofp		; AVX2: cost of 6 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv4i32v4double		; AVX512F-LABEL: uitofpv4i32v4double
; AVX512F: cost of 1 {{.*}} uitofp		; AVX512F: cost of 1 {{.*}} uitofp
%1 = uitofp <4 x i32> %a to <4 x double>		%1 = uitofp <4 x i32> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @uitofpv8i32v8double(<8 x i32> %a) {		define <8 x double> @uitofpv8i32v8double(<8 x i32> %a) {
; SSE2-LABEL: uitofpv8i32v8double		; SSE2-LABEL: uitofpv8i32v8double
; SSE2: cost of 80 {{.*}} uitofp		; SSE2: cost of 60 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i32v8double		; AVX1-LABEL: uitofpv8i32v8double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 12 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i32v8double		; AVX2-LABEL: uitofpv8i32v8double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 12 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i32v8double		; AVX512F-LABEL: uitofpv8i32v8double
; AVX512F: cost of 1 {{.*}} uitofp		; AVX512F: cost of 1 {{.*}} uitofp
%1 = uitofp <8 x i32> %a to <8 x double>		%1 = uitofp <8 x i32> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @uitofpv16i32v16double(<16 x i32> %a) {		define <16 x double> @uitofpv16i32v16double(<16 x i32> %a) {
; SSE2-LABEL: uitofpv16i32v16double		; SSE2-LABEL: uitofpv16i32v16double
; SSE2: cost of 160 {{.*}} uitofp		; SSE2: cost of 120 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i32v16double		; AVX1-LABEL: uitofpv16i32v16double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 24 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i32v16double		; AVX2-LABEL: uitofpv16i32v16double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 24 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i32v16double		; AVX512F-LABEL: uitofpv16i32v16double
; AVX512F: cost of 44 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <16 x i32> %a to <16 x double>		%1 = uitofp <16 x i32> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @uitofpv32i32v32double(<32 x i32> %a) {		define <32 x double> @uitofpv32i32v32double(<32 x i32> %a) {
; SSE2-LABEL: uitofpv32i32v32double		; SSE2-LABEL: uitofpv32i32v32double
; SSE2: cost of 320 {{.*}} uitofp		; SSE2: cost of 240 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i32v32double		; AVX1-LABEL: uitofpv32i32v32double
; AVX1: cost of 80 {{.*}} uitofp		; AVX1: cost of 48 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i32v32double		; AVX2-LABEL: uitofpv32i32v32double
; AVX2: cost of 80 {{.*}} uitofp		; AVX2: cost of 48 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i32v32double		; AVX512F-LABEL: uitofpv32i32v32double
; AVX512F: cost of 88 {{.*}} uitofp		; AVX512F: cost of 4 {{.*}} uitofp
%1 = uitofp <32 x i32> %a to <32 x double>		%1 = uitofp <32 x i32> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @uitofpv2i64v2double(<2 x i64> %a) {		define <2 x double> @uitofpv2i64v2double(<2 x i64> %a) {
; SSE2-LABEL: uitofpv2i64v2double		; SSE2-LABEL: uitofpv2i64v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 20 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i64v2double		; AVX1-LABEL: uitofpv2i64v2double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 10 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i64v2double		; AVX2-LABEL: uitofpv2i64v2double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 10 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i64v2double		; AVX512F-LABEL: uitofpv2i64v2double
; AVX512F: cost of 5 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
;		;
; AVX512DQ: uitofpv2i64v2double		; AVX512DQ: uitofpv2i64v2double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <2 x i64> %a to <2 x double>		%1 = uitofp <2 x i64> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @uitofpv4i64v4double(<4 x i64> %a) {		define <4 x double> @uitofpv4i64v4double(<4 x i64> %a) {
; SSE2-LABEL: uitofpv4i64v4double		; SSE2-LABEL: uitofpv4i64v4double
; SSE2: cost of 40 {{.*}} uitofp		; SSE2: cost of 40 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv4i64v4double		; AVX1-LABEL: uitofpv4i64v4double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 20 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv4i64v4double		; AVX2-LABEL: uitofpv4i64v4double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 20 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv4i64v4double		; AVX512F-LABEL: uitofpv4i64v4double
; AVX512F: cost of 12 {{.*}} uitofp		; AVX512F: cost of 12 {{.*}} uitofp
;		;
; AVX512DQ: uitofpv4i64v4double		; AVX512DQ: uitofpv4i64v4double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <4 x i64> %a to <4 x double>		%1 = uitofp <4 x i64> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @uitofpv8i64v8double(<8 x i64> %a) {		define <8 x double> @uitofpv8i64v8double(<8 x i64> %a) {
; SSE2-LABEL: uitofpv8i64v8double		; SSE2-LABEL: uitofpv8i64v8double
; SSE2: cost of 80 {{.*}} uitofp		; SSE2: cost of 80 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i64v8double		; AVX1-LABEL: uitofpv8i64v8double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 40 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i64v8double		; AVX2-LABEL: uitofpv8i64v8double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 40 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i64v8double		; AVX512F-LABEL: uitofpv8i64v8double
; AVX512F: cost of 26 {{.*}} uitofp		; AVX512F: cost of 26 {{.*}} uitofp
;		;
; AVX512DQ: uitofpv8i64v8double		; AVX512DQ: uitofpv8i64v8double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <8 x i64> %a to <8 x double>		%1 = uitofp <8 x i64> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @uitofpv16i64v16double(<16 x i64> %a) {		define <16 x double> @uitofpv16i64v16double(<16 x i64> %a) {
; SSE2-LABEL: uitofpv16i64v16double		; SSE2-LABEL: uitofpv16i64v16double
; SSE2: cost of 160 {{.*}} uitofp		; SSE2: cost of 160 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i64v16double		; AVX1-LABEL: uitofpv16i64v16double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 80 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i64v16double		; AVX2-LABEL: uitofpv16i64v16double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 80 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i64v16double		; AVX512F-LABEL: uitofpv16i64v16double
; AVX512F: cost of 44 {{.*}} uitofp		; AVX512F: cost of 52 {{.*}} uitofp
;		;
; AVX512DQ: uitofpv16i64v16double		; AVX512DQ: uitofpv16i64v16double
; AVX512DQ: cost of 44 {{.*}} uitofp		; AVX512DQ: cost of 2 {{.*}} uitofp
%1 = uitofp <16 x i64> %a to <16 x double>		%1 = uitofp <16 x i64> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @uitofpv32i64v32double(<32 x i64> %a) {		define <32 x double> @uitofpv32i64v32double(<32 x i64> %a) {
; SSE2-LABEL: uitofpv32i64v32double		; SSE2-LABEL: uitofpv32i64v32double
; SSE2: cost of 320 {{.*}} uitofp		; SSE2: cost of 320 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i64v32double		; AVX1-LABEL: uitofpv32i64v32double
; AVX1: cost of 80 {{.*}} uitofp		; AVX1: cost of 160 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i64v32double		; AVX2-LABEL: uitofpv32i64v32double
; AVX2: cost of 80 {{.*}} uitofp		; AVX2: cost of 160 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i64v32double		; AVX512F-LABEL: uitofpv32i64v32double
; AVX512F: cost of 88 {{.*}} uitofp		; AVX512F: cost of 104 {{.*}} uitofp
;		;
; AVX512DQ: uitofpv32i64v32double		; AVX512DQ: uitofpv32i64v32double
; AVX512DQ: cost of 88 {{.*}} uitofp		; AVX512DQ: cost of 4 {{.*}} uitofp
%1 = uitofp <32 x i64> %a to <32 x double>		%1 = uitofp <32 x i64> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x float> @uitofpv2i8v2float(<2 x i8> %a) {		define <2 x float> @uitofpv2i8v2float(<2 x i8> %a) {
; SSE2-LABEL: uitofpv2i8v2float		; SSE2-LABEL: uitofpv2i8v2float
; SSE2: cost of 15 {{.*}} uitofp		; SSE2: cost of 2 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i8v2float		; AVX1-LABEL: uitofpv2i8v2float
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 2 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i8v2float		; AVX2-LABEL: uitofpv2i8v2float
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 2 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i8v2float		; AVX512F-LABEL: uitofpv2i8v2float
; AVX512F: cost of 4 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <2 x i8> %a to <2 x float>		%1 = uitofp <2 x i8> %a to <2 x float>
ret <2 x float> %1		ret <2 x float> %1
}		}

define <4 x float> @uitofpv4i8v4float(<4 x i8> %a) {		define <4 x float> @uitofpv4i8v4float(<4 x i8> %a) {
; SSE2-LABEL: uitofpv4i8v4float		; SSE2-LABEL: uitofpv4i8v4float
; SSE2: cost of 8 {{.*}} uitofp		; SSE2: cost of 2 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv4i8v4float		; AVX1-LABEL: uitofpv4i8v4float
; AVX1: cost of 2 {{.*}} uitofp		; AVX1: cost of 2 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv4i8v4float		; AVX2-LABEL: uitofpv4i8v4float
; AVX2: cost of 2 {{.*}} uitofp		; AVX2: cost of 2 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv4i8v4float		; AVX512F-LABEL: uitofpv4i8v4float
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <4 x i8> %a to <4 x float>		%1 = uitofp <4 x i8> %a to <4 x float>
ret <4 x float> %1		ret <4 x float> %1
}		}

define <8 x float> @uitofpv8i8v8float(<8 x i8> %a) {		define <8 x float> @uitofpv8i8v8float(<8 x i8> %a) {
; SSE2-LABEL: uitofpv8i8v8float		; SSE2-LABEL: uitofpv8i8v8float
; SSE2: cost of 15 {{.*}} uitofp		; SSE2: cost of 4 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i8v8float		; AVX1-LABEL: uitofpv8i8v8float
; AVX1: cost of 5 {{.*}} uitofp		; AVX1: cost of 5 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i8v8float		; AVX2-LABEL: uitofpv8i8v8float
; AVX2: cost of 5 {{.*}} uitofp		; AVX2: cost of 5 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i8v8float		; AVX512F-LABEL: uitofpv8i8v8float
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <8 x i8> %a to <8 x float>		%1 = uitofp <8 x i8> %a to <8 x float>
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @uitofpv16i8v16float(<16 x i8> %a) {		define <16 x float> @uitofpv16i8v16float(<16 x i8> %a) {
; SSE2-LABEL: uitofpv16i8v16float		; SSE2-LABEL: uitofpv16i8v16float
; SSE2: cost of 8 {{.*}} uitofp		; SSE2: cost of 8 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i8v16float		; AVX1-LABEL: uitofpv16i8v16float
; AVX1: cost of 44 {{.*}} uitofp		; AVX1: cost of 10 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i8v16float		; AVX2-LABEL: uitofpv16i8v16float
; AVX2: cost of 44 {{.*}} uitofp		; AVX2: cost of 10 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i8v16float		; AVX512F-LABEL: uitofpv16i8v16float
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <16 x i8> %a to <16 x float>		%1 = uitofp <16 x i8> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @uitofpv32i8v32float(<32 x i8> %a) {		define <32 x float> @uitofpv32i8v32float(<32 x i8> %a) {
; SSE2-LABEL: uitofpv32i8v32float		; SSE2-LABEL: uitofpv32i8v32float
; SSE2: cost of 16 {{.*}} uitofp		; SSE2: cost of 16 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i8v32float		; AVX1-LABEL: uitofpv32i8v32float
; AVX1: cost of 88 {{.*}} uitofp		; AVX1: cost of 20 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i8v32float		; AVX2-LABEL: uitofpv32i8v32float
; AVX2: cost of 88 {{.*}} uitofp		; AVX2: cost of 20 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i8v32float		; AVX512F-LABEL: uitofpv32i8v32float
; AVX512F: cost of 92 {{.*}} uitofp		; AVX512F: cost of 4 {{.*}} uitofp
%1 = uitofp <32 x i8> %a to <32 x float>		%1 = uitofp <32 x i8> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @uitofpv2i16v2float(<2 x i16> %a) {		define <2 x float> @uitofpv2i16v2float(<2 x i16> %a) {
; SSE2-LABEL: uitofpv2i16v2float		; SSE2-LABEL: uitofpv2i16v2float
; SSE2: cost of 15 {{.*}} uitofp		; SSE2: cost of 2 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i16v2float		; AVX1-LABEL: uitofpv2i16v2float
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 2 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i16v2float		; AVX2-LABEL: uitofpv2i16v2float
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 2 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i16v2float		; AVX512F-LABEL: uitofpv2i16v2float
; AVX512F: cost of 4 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <2 x i16> %a to <2 x float>		%1 = uitofp <2 x i16> %a to <2 x float>
ret <2 x float> %1		ret <2 x float> %1
}		}

define <4 x float> @uitofpv4i16v4float(<4 x i16> %a) {		define <4 x float> @uitofpv4i16v4float(<4 x i16> %a) {
; SSE2-LABEL: uitofpv4i16v4float		; SSE2-LABEL: uitofpv4i16v4float
; SSE2: cost of 8 {{.*}} uitofp		; SSE2: cost of 2 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv4i16v4float		; AVX1-LABEL: uitofpv4i16v4float
; AVX1: cost of 2 {{.*}} uitofp		; AVX1: cost of 2 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv4i16v4float		; AVX2-LABEL: uitofpv4i16v4float
; AVX2: cost of 2 {{.*}} uitofp		; AVX2: cost of 2 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv4i16v4float		; AVX512F-LABEL: uitofpv4i16v4float
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <4 x i16> %a to <4 x float>		%1 = uitofp <4 x i16> %a to <4 x float>
ret <4 x float> %1		ret <4 x float> %1
}		}

define <8 x float> @uitofpv8i16v8float(<8 x i16> %a) {		define <8 x float> @uitofpv8i16v8float(<8 x i16> %a) {
; SSE2-LABEL: uitofpv8i16v8float		; SSE2-LABEL: uitofpv8i16v8float
; SSE2: cost of 15 {{.*}} uitofp		; SSE2: cost of 4 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i16v8float		; AVX1-LABEL: uitofpv8i16v8float
; AVX1: cost of 5 {{.*}} uitofp		; AVX1: cost of 5 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i16v8float		; AVX2-LABEL: uitofpv8i16v8float
; AVX2: cost of 5 {{.*}} uitofp		; AVX2: cost of 5 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i16v8float		; AVX512F-LABEL: uitofpv8i16v8float
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <8 x i16> %a to <8 x float>		%1 = uitofp <8 x i16> %a to <8 x float>
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @uitofpv16i16v16float(<16 x i16> %a) {		define <16 x float> @uitofpv16i16v16float(<16 x i16> %a) {
; SSE2-LABEL: uitofpv16i16v16float		; SSE2-LABEL: uitofpv16i16v16float
; SSE2: cost of 30 {{.*}} uitofp		; SSE2: cost of 8 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i16v16float		; AVX1-LABEL: uitofpv16i16v16float
; AVX1: cost of 44 {{.*}} uitofp		; AVX1: cost of 10 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i16v16float		; AVX2-LABEL: uitofpv16i16v16float
; AVX2: cost of 44 {{.*}} uitofp		; AVX2: cost of 10 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i16v16float		; AVX512F-LABEL: uitofpv16i16v16float
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <16 x i16> %a to <16 x float>		%1 = uitofp <16 x i16> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @uitofpv32i16v32float(<32 x i16> %a) {		define <32 x float> @uitofpv32i16v32float(<32 x i16> %a) {
; SSE2-LABEL: uitofpv32i16v32float		; SSE2-LABEL: uitofpv32i16v32float
; SSE2: cost of 60 {{.*}} uitofp		; SSE2: cost of 16 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i16v32float		; AVX1-LABEL: uitofpv32i16v32float
; AVX1: cost of 88 {{.*}} uitofp		; AVX1: cost of 20 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i16v32float		; AVX2-LABEL: uitofpv32i16v32float
; AVX2: cost of 88 {{.*}} uitofp		; AVX2: cost of 20 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i16v32float		; AVX512F-LABEL: uitofpv32i16v32float
; AVX512F: cost of 92 {{.*}} uitofp		; AVX512F: cost of 4 {{.*}} uitofp
%1 = uitofp <32 x i16> %a to <32 x float>		%1 = uitofp <32 x i16> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @uitofpv2i32v2float(<2 x i32> %a) {		define <2 x float> @uitofpv2i32v2float(<2 x i32> %a) {
; SSE2-LABEL: uitofpv2i32v2float		; SSE2-LABEL: uitofpv2i32v2float
; SSE2: cost of 15 {{.*}} uitofp		; SSE2: cost of 8 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i32v2float		; AVX1-LABEL: uitofpv2i32v2float
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 8 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i32v2float		; AVX2-LABEL: uitofpv2i32v2float
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 8 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i32v2float		; AVX512F-LABEL: uitofpv2i32v2float
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <2 x i32> %a to <2 x float>		%1 = uitofp <2 x i32> %a to <2 x float>
ret <2 x float> %1		ret <2 x float> %1
}		}

define <4 x float> @uitofpv4i32v4float(<4 x i32> %a) {		define <4 x float> @uitofpv4i32v4float(<4 x i32> %a) {
[28 lines collapsed]	define <8 x float> @uitofpv8i32v8float(<8 x i32> %a) {
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @uitofpv16i32v16float(<16 x i32> %a) {		define <16 x float> @uitofpv16i32v16float(<16 x i32> %a) {
; SSE2-LABEL: uitofpv16i32v16float		; SSE2-LABEL: uitofpv16i32v16float
; SSE2: cost of 32 {{.*}} uitofp		; SSE2: cost of 32 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i32v16float		; AVX1-LABEL: uitofpv16i32v16float
; AVX1: cost of 44 {{.*}} uitofp		; AVX1: cost of 18 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i32v16float		; AVX2-LABEL: uitofpv16i32v16float
; AVX2: cost of 44 {{.*}} uitofp		; AVX2: cost of 16 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i32v16float		; AVX512F-LABEL: uitofpv16i32v16float
; AVX512F: cost of 1 {{.*}} uitofp		; AVX512F: cost of 1 {{.*}} uitofp
%1 = uitofp <16 x i32> %a to <16 x float>		%1 = uitofp <16 x i32> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @uitofpv32i32v32float(<32 x i32> %a) {		define <32 x float> @uitofpv32i32v32float(<32 x i32> %a) {
; SSE2-LABEL: uitofpv32i32v32float		; SSE2-LABEL: uitofpv32i32v32float
; SSE2: cost of 64 {{.*}} uitofp		; SSE2: cost of 64 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i32v32float		; AVX1-LABEL: uitofpv32i32v32float
; AVX1: cost of 88 {{.*}} uitofp		; AVX1: cost of 36 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i32v32float		; AVX2-LABEL: uitofpv32i32v32float
; AVX2: cost of 88 {{.*}} uitofp		; AVX2: cost of 32 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i32v32float		; AVX512F-LABEL: uitofpv32i32v32float
; AVX512F: cost of 92 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <32 x i32> %a to <32 x float>		%1 = uitofp <32 x i32> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @uitofpv2i64v2float(<2 x i64> %a) {		define <2 x float> @uitofpv2i64v2float(<2 x i64> %a) {
; SSE2-LABEL: uitofpv2i64v2float		; SSE2-LABEL: uitofpv2i64v2float
; SSE2: cost of 15 {{.*}} uitofp		; SSE2: cost of 15 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i64v2float		; AVX1-LABEL: uitofpv2i64v2float
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 15 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i64v2float		; AVX2-LABEL: uitofpv2i64v2float
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 15 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i64v2float		; AVX512F-LABEL: uitofpv2i64v2float
; AVX512F: cost of 4 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
%1 = uitofp <2 x i64> %a to <2 x float>		%1 = uitofp <2 x i64> %a to <2 x float>
ret <2 x float> %1		ret <2 x float> %1
}		}

define <4 x float> @uitofpv4i64v4float(<4 x i64> %a) {		define <4 x float> @uitofpv4i64v4float(<4 x i64> %a) {
; SSE2-LABEL: uitofpv4i64v4float		; SSE2-LABEL: uitofpv4i64v4float
; SSE2: cost of 30 {{.*}} uitofp		; SSE2: cost of 30 {{.*}} uitofp
;		;
[108 lines collapsed]

../test/Transforms/LoopVectorize/X86/conversion-cost.ll

[19 lines collapsed]	.lr.ph: ; preds = %0, %.lr.ph
%exitcond = icmp eq i32 %lftr.wideiv, %n		%exitcond = icmp eq i32 %lftr.wideiv, %n
br i1 %exitcond, label %._crit_edge, label %.lr.ph		br i1 %exitcond, label %._crit_edge, label %.lr.ph

._crit_edge: ; preds = %.lr.ph, %0		._crit_edge: ; preds = %.lr.ph, %0
ret i32 undef		ret i32 undef
}		}

;CHECK-LABEL: @conversion_cost2(		;CHECK-LABEL: @conversion_cost2(
;CHECK: <2 x float>		;CHECK: <8 x float>
;CHECK: ret		;CHECK: ret
define i32 @conversion_cost2(i32 %n, i8* nocapture %A, float* nocapture %B) nounwind uwtable ssp {		define i32 @conversion_cost2(i32 %n, i8* nocapture %A, float* nocapture %B) nounwind uwtable ssp {
%1 = icmp sgt i32 %n, 9		%1 = icmp sgt i32 %n, 9
br i1 %1, label %.lr.ph, label %._crit_edge		br i1 %1, label %.lr.ph, label %._crit_edge

.lr.ph: ; preds = %0, %.lr.ph		.lr.ph: ; preds = %0, %.lr.ph
%indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 9, %0 ]		%indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 9, %0 ]
%add = add nsw i64 %indvars.iv, 3		%add = add nsw i64 %indvars.iv, 3
[11 lines collapsed]

../test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll

	; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"


	; CHECK: cost of 20 for VF 2 For instruction: %conv = uitofp i64 %tmp to double			; CHECK: cost of 10 for VF 2 For instruction: %conv = uitofp i64 %tmp to double
	; CHECK: cost of 40 for VF 4 For instruction: %conv = uitofp i64 %tmp to double			; CHECK: cost of 20 for VF 4 For instruction: %conv = uitofp i64 %tmp to double
	define void @uint64_to_double_cost(i64* noalias nocapture %a, double* noalias nocapture readonly %b) nounwind {			define void @uint64_to_double_cost(i64* noalias nocapture %a, double* noalias nocapture readonly %b) nounwind {
	entry:			entry:
	br label %for.body			br label %for.body
	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i64, i64* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds i64, i64* %a, i64 %indvars.iv
	%tmp = load i64, i64* %arrayidx, align 4			%tmp = load i64, i64* %arrayidx, align 4
	%conv = uitofp i64 %tmp to double			%conv = uitofp i64 %tmp to double
	[9 lines collapsed]
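
The updated SSE2 expectations for uitofp from i64 to double (20, 40, 80, 160 and 320 for 2, 4, 8, 16 and 32 elements) scale the way the summary describes: take the cost of the conversion on legal types and multiply it by the split factor, where the split factor is the larger of the source and destination factors. A minimal C++ sketch of that arithmetic follows. The names `splitFactor` and `splitConversionCost` and the explicit register-width parameter are illustrative assumptions, not code from the patch.

```cpp
#include <algorithm>

// Hypothetical helper (not from the patch): number of legal registers needed
// to hold a vector of `numElts` elements of `eltBits` bits each, when the
// widest legal register is `regBits` bits.
inline unsigned splitFactor(unsigned numElts, unsigned eltBits,
                            unsigned regBits) {
  unsigned totalBits = numElts * eltBits;
  if (totalBits <= regBits)
    return 1; // The vector fits in one register, so there is no split.
  return (totalBits + regBits - 1) / regBits; // Round up.
}

// Hypothetical helper (not from the patch): cost of a conversion that has to
// be split, computed as the legal-type cost times the larger split factor.
inline unsigned splitConversionCost(unsigned legalCost, unsigned srcFactor,
                                    unsigned dstFactor) {
  return legalCost * std::max(srcFactor, dstFactor);
}
```

With 128-bit registers and a legal v2i64 to v2f64 cost of 20, `splitConversionCost(20, splitFactor(8, 64, 128), splitFactor(8, 64, 128))` gives 80, which is the SSE2 value expected for `uitofpv8i64v8double`. With 512-bit registers and a legal cost of 1, a 16 x i32 source (factor 1) and a 16 x double destination (factor 2) give 2, consistent with the new AVX512F expectation for `uitofpv16i32v16double`.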
