This is an archive of the discontinued LLVM Phabricator instance.

[TTI] The cost model should not assume illegal vector casts get completely scalarized
Closed · Public

Authored by mkuper on Jun 10 2016, 4:50 PM.

Details

Reviewers

RKSimon
delena
• tstellarAMD
jmolloy
arsenm
hfinkel
sbaranga

Commits

rGaa71bdd3afe0: [TTI] The cost model should not assume vector casts get completely scalarized
rL274642: [TTI] The cost model should not assume vector casts get completely scalarized

Summary

If a vector cast gets split, it's quite possible that the resulting casts are legal and cheap.
So, instead of pessimistically assuming scalarization, we use the costs the concrete TTI provides for the split vector.

This looks like it does the right thing for AVX (a lot of overblown costs drop dramatically) - but I'm less sure about ARM.
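The idea can be sketched with a toy model rather than the real TTI interfaces. The names `ToyTarget`, `castCost`, and `scalarizedCost`, and all numeric costs, are hypothetical; only the shape of the formula (split cost plus twice the cost of the half-width cast) comes from the patch.

```cpp
#include <cassert>

// Toy stand-in for a target: vectors wider than MaxLegalElts are split in
// half during legalization; a cast on a legal type costs LegalCastCost.
// All names and numbers here are illustrative assumptions, not LLVM APIs.
struct ToyTarget {
  unsigned MaxLegalElts;
  unsigned LegalCastCost;
  unsigned SplitCost; // plays the role of getVectorSplitCost() in the patch
};

// Cost of casting a vector with NumElts elements: if the type is legal, use
// the per-cast cost; otherwise split in two and recurse on each half.
unsigned castCost(const ToyTarget &T, unsigned NumElts) {
  if (NumElts <= T.MaxLegalElts)
    return T.LegalCastCost;
  return T.SplitCost + 2 * castCost(T, NumElts / 2);
}

// The old, pessimistic model: every element pays a scalar cast plus
// insert/extract overhead.
unsigned scalarizedCost(unsigned NumElts, unsigned ScalarCastCost,
                        unsigned PerEltOverhead) {
  return NumElts * (ScalarCastCost + PerEltOverhead);
}
```

With `ToyTarget{8, 1, 1}`, a 16-element cast costs 1 + 2 * 1 = 3 under the split model, while `scalarizedCost(16, 1, 2)` gives 48. The gap grows with the element count, which is the kind of drop the summary describes.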

Event Timeline

mkuper updated this revision to Diff 60426.Jun 10 2016, 4:50 PM

mkuper retitled this revision from to [TTI] The cost model should not assume illegal vector casts get completely scalarized.

mkuper updated this object.

mkuper added reviewers: sbaranga, jmolloy, hfinkel, RKSimon, delena.

mkuper added a subscriber: llvm-commits.

Herald added a reviewer: • tstellarAMD.Jun 10 2016, 4:50 PM

Herald added a subscriber: aemerson.

The ARM test changes look vaguely sensible to me.

• tstellarAMD added a reviewer: arsenm.Jun 14 2016, 6:04 PM

arsenm added inline comments.Jun 14 2016, 6:08 PM

test/Analysis/CostModel/AMDGPU/addrspacecast.ll
39 ↗	(On Diff #60426)	Pretty much everything should be scalarized. The vector inserts and extracts are supposed to be free (and the cost is reported as 0 for those), so I think adding 1 there is inconsistent; this should check the extract/insert cost instead.

mkuper added inline comments.Jun 14 2016, 6:34 PM

test/Analysis/CostModel/AMDGPU/addrspacecast.ll
39 ↗	(On Diff #60426)	Until now, we assumed scalarization, but I think this is actually the rare case in practice. If the platform cares about vectors, I'd expect it to support most vector operations at least at some vector width, so it usually won't scalarize. And if we assume partial splitting instead of scalarization, using the insert/extract costs will be the wrong thing, regardless of how imprecise "1" is for splitting (and it's definitely imprecise, but it's what the generic getTypeLegalizationCost() uses). We could trace the entire legalization chain, and see whether the end result is a vector or a scalar, and then use either 1 or the getScalarizationOverhead() based on that, but I'm not a huge fan of that. (Is full scalarization common on AMDGPU, or is this a corner case? If it's common, perhaps we should specialize this for the AMDGPU TTI.)

arsenm added inline comments.Jun 17 2016, 1:53 PM

test/Analysis/CostModel/AMDGPU/addrspacecast.ll
39 ↗	(On Diff #60426)	There are no vector operations. Vectors are only for loading and storing; every operation is scalar.
39 ↗	(On Diff #60426)	There's also no additional scalarization cost since access of a vector element is just access of a subregister

Could someone on the X86 side verify that this is sane-ish?
Some costs dropped by a very large factor, and the new costs look better, but I'd appreciate another set of eyes.

test/Analysis/CostModel/AMDGPU/addrspacecast.ll
39 ↗	(On Diff #60426)	Ah, I see. I could add a TTI hook for "split cost", and set that to 0 for AMDGPU. But I'm not entirely happy with that either. How does that sound to you? Any other suggestions that will make this work correctly for both AMDGPU and platforms that legalize vectors partially?

Updated to not affect AMDGPU, by specializing the "split cost".

Unfortunately, we can't just plug in getVectorSplitCost() (either this or a more sophisticated version) into getTypeLegalizationCost().

This is because most of the users of getTypeLegalizationCost() apparently don't use the returned value as a "cost". Rather, they seem to assume it's just the number of post-legalization pieces, and multiply per-piece costs by that. So that needs a completely separate clean-up first.
AMDGPU itself is also guilty of this - it multiplies by getTypeLegalizationCost().first, so if getTypeLegalizationCost(LegalType) were to return 0 - which it ought to, if we really consider it the "cost" of legalization - things would get seriously out of whack. But this happens across all targets.
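The "split cost" specialization relies on the CRTP dispatch that BasicTTIImplBase already uses (`static_cast<T *>(this)`). A stripped-down sketch of that mechanism follows; the class names `CostModelBase`, `GenericTarget`, and `FreeSplitTarget` and the helper `getSplitCastCost()` are hypothetical stand-ins, not the real TTI classes.

```cpp
#include <cassert>

// CRTP base: calls go through static_cast<T *>(this), so a derived class
// can shadow getVectorSplitCost() without any virtual dispatch.
template <typename T> class CostModelBase {
public:
  // Default: charge 1 for splitting a vector in two.
  unsigned getVectorSplitCost() { return 1; }

  // Cost of a cast that is legalized by one split into two halves, each of
  // which costs HalfCost.
  unsigned getSplitCastCost(unsigned HalfCost) {
    T *TTI = static_cast<T *>(this);
    return TTI->getVectorSplitCost() + 2 * HalfCost;
  }
};

// A target that keeps the default split cost.
class GenericTarget : public CostModelBase<GenericTarget> {};

// A target where splitting is free (e.g. element access is just a
// subregister access), so it overrides the hook to return 0.
class FreeSplitTarget : public CostModelBase<FreeSplitTarget> {
public:
  unsigned getVectorSplitCost() { return 0; }
};
```

For a half-width cost of 4, `GenericTarget` reports 9 and `FreeSplitTarget` reports 8, so the override removes only the additive term and leaves the per-piece costs untouched.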

Herald added a subscriber: arsenm.Jun 20 2016, 3:29 PM

The SSE2 costs look very high which I guess is due to missing cost entries - that should be fixable in a pre/post patch if you prefer to keep them separate. The changed x86 costs all look reasonable.

test/Analysis/CostModel/X86/sitofp.ll
72	This looks very high, codegen looks like it takes 26 ops - I haven't checked the throughput.

Thanks, Simon!
And you're right about the SSE2 costs, I'll look into that separately.

In D21251#462734, @mkuper wrote:

Updated to not affect AMDGPU, by specializing the "split cost".

Unfortunately, we can't just plug in getVectorSplitCost() (either this or a more sophisticated version) into getTypeLegalizationCost().

This is because most of the users of getTypeLegalizationCost() apparently don't use the returned value as a "cost". Rather, they seem to assume it's just the number of post-legalization pieces, and multiply per-piece costs by that. So that needs a completely separate clean-up first.
AMDGPU itself is also guilty of this - it multiplies by getTypeLegalizationCost().first, so if getTypeLegalizationCost(LegalType) were to return 0 - which it ought to, if we really consider it the "cost" of legalization - things would get seriously out of whack. But this happens across all targets.

I was thinking that was to get the cost of the split. Everything should be cost * NElts * scalar cost, so e.g. <8 x float> -> <4 x float>, <4 x float> = 2 * 4 * cost

arsenm added inline comments.Jun 22 2016, 6:39 PM

test/Analysis/CostModel/ARM/cast.ll
267	LGTM, but it looks to me like this should be adding 0, so not increasing by 1?

In D21251#465230, @arsenm wrote:

In D21251#462734, @mkuper wrote:

Unfortunately, we can't just plug in getVectorSplitCost() (either this or a more sophisticated version) into getTypeLegalizationCost().

This is because most of the users of getTypeLegalizationCost() apparently don't use the returned value as a "cost". Rather, they seem to assume it's just the number of post-legalization pieces, and multiply per-piece costs by that. So that needs a completely separate clean-up first.
AMDGPU itself is also guilty of this - it multiplies by getTypeLegalizationCost().first, so if getTypeLegalizationCost(LegalType) were to return 0 - which it ought to, if we really consider it the "cost" of legalization - things would get seriously out of whack. But this happens across all targets.

I was thinking that was to get the cost of the split. Everything should be cost * NElts * scalar cost, so e.g. <8 x float> -> <4 x float>, <4 x float> = 2 * 4 * cost

What exactly do you mean when you say "cost of the split"?
My expectation was for getTypeLegalizationCost() to return an approximation of the total cost of legalization (cost of INSERT_SUBVECTOR and EXTRACT_SUBVECTOR operations), that is, an additive factor. So, it would make sense to have "legalization cost + NElts * scalar cost". Or, in the more general case, when we're not fully scalarizing, "legalization cost + NPieces * piece cost". Instead, what it's actually used for is to get NPieces. This is even explicitly referred to as the "split factor" in one of the X86 TTI uses:

std::pair<int, MVT> IdxsLT = TLI->getTypeLegalizationCost(DL, IndexVTy);
std::pair<int, MVT> SrcLT = TLI->getTypeLegalizationCost(DL, SrcVTy);
int SplitFactor = std::max(IdxsLT.first, SrcLT.first);

I'll have to go over all the uses of getTypeLegalizationCost(). If everything uses it as a proxy for the split factor, it's probably just a matter of changing the name to indicate what it really does. If not, then we may need two different functions, one to get the split factor, and one to get the *cost* of performing the splits.
But I think that's independent of this patch.
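The distinction between the two meanings can be sketched as two separate helpers. This is only an illustration of the "two different functions" option; `splitFactor`, `splitCost`, and `totalCost` are hypothetical names, and the model assumes legalization proceeds by repeated halving.

```cpp
#include <cassert>

// Number of legal pieces a vector of NumElts elements is broken into,
// assuming repeated halving until at most MaxLegalElts elements remain.
unsigned splitFactor(unsigned NumElts, unsigned MaxLegalElts) {
  unsigned Pieces = 1;
  while (NumElts > MaxLegalElts) {
    NumElts /= 2;
    Pieces *= 2;
  }
  return Pieces;
}

// Additive cost of performing those splits, charging CostPerSplit for each
// halving step. Producing P pieces takes P - 1 halvings.
unsigned splitCost(unsigned NumElts, unsigned MaxLegalElts,
                   unsigned CostPerSplit) {
  return (splitFactor(NumElts, MaxLegalElts) - 1) * CostPerSplit;
}

// "legalization cost + NPieces * piece cost"
unsigned totalCost(unsigned NumElts, unsigned MaxLegalElts,
                   unsigned CostPerSplit, unsigned PieceCost) {
  return splitCost(NumElts, MaxLegalElts, CostPerSplit) +
         splitFactor(NumElts, MaxLegalElts) * PieceCost;
}
```

For an already-legal type the split factor is 1 but the split cost is 0, which is why a function that truly returned a "cost" could not be used as a multiplier by its current callers.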

test/Analysis/CostModel/ARM/cast.ll
267	Generally, with the new formula, it makes sense to have costs like 2 * 32 + 1, so I didn't pay too much attention to those little changes (what concerned me more were the big drops e.g. 64 -> 11 on line 326). But you're right, I need to verify that this is really reasonable and not some unexpected artifact. Thanks!

Could you, please, add a test for "sext <16 x i32 > to <16 x i64 >" for Skylake-avx512 and see what happens?
The cost should be 2. I know that this case did not work correctly and in many cases prevented vectorization to 16.

In D21251#465930, @delena wrote:

Could you, please, add a test for "sext <16 x i32 > to <16 x i64 >" for Skylake-avx512 and see what happens?
The cost should be 2. I know that this case did not work correctly and in many cases prevented vectorization to 16.

The cost is now 3 (instead of the old 48). It's not 2 because of the getVectorSplitCost() fudge factor.
We'll probably need to tune this in the future, as I said before, this is just a first approximation. It's possible that 0 is the right value most of the time.

If you don't mind, I'll add the test as a separate patch - we don't only want this specific test, we need tests for a bunch of sexts/zexts, like ARM has.

test/Analysis/CostModel/ARM/cast.ll
267	So it mostly makes sense. This used to be evaluated as fully scalarizing, with a per-element scalarization cost of 6, and cast cost of 10. So, (6 + 10) * 4 = 64. Now we evaluate it as 1 + 2 * (cost(fptoui <2 x float> to <2 x i64>)). But the 2-wide cast is still considered fully scalarizing (even though the types are now legal), so we get 1 + 2 * (2 * (10 + 6)) = 65.
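The arithmetic in this comment can be checked mechanically. The sketch below uses the numbers quoted above (per-element scalarization cost 6, scalar cast cost 10, split cost 1); the function names are hypothetical, and it assumes splitting stops at 2-wide vectors, which are then costed as fully scalarized.

```cpp
#include <cassert>

constexpr unsigned PerEltOverhead = 6;  // per-element scalarization cost
constexpr unsigned ScalarCastCost = 10; // cost of one scalar cast
constexpr unsigned SplitCost = 1;       // getVectorSplitCost() default

// Fully scalarized cast of a vector with NumElts elements.
constexpr unsigned scalarized(unsigned NumElts) {
  return NumElts * (PerEltOverhead + ScalarCastCost);
}

// Split in half until the vector is 2 elements wide, then fall back to the
// fully scalarized estimate for each 2-wide piece.
constexpr unsigned splitUntilPair(unsigned NumElts) {
  return NumElts <= 2 ? scalarized(NumElts)
                      : SplitCost + 2 * splitUntilPair(NumElts / 2);
}

static_assert(scalarized(4) == 64, "old cost: (6 + 10) * 4");
static_assert(splitUntilPair(4) == 65, "new cost: 1 + 2 * (2 * (10 + 6))");
```

Under the same assumptions, `splitUntilPair(8)` is 131 and `splitUntilPair(16)` is 263, which is consistent with the updated CHECK lines for the 8-wide and 16-wide float-to-i64 casts in cast.ll.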

> If you don't mind, I'll add the test as a separate patch - we don't only want this specific test, we need tests for a bunch of sexts/zexts, like ARM has.

[Demikhovsky, Elena] yes, you can add tests later. You can look at this revision as a test reference for X86

http://reviews.llvm.org/D15604

So, just to make sure - at this point, everybody involved is "vaguely OK" with this, and there are no objections, right?

ashutosh.nema added a subscriber: ashutosh.nema.Jun 27 2016, 3:35 AM

Closed by commit rL274642: [TTI] The cost model should not assume vector casts get completely scalarized (authored by mkuper).Jul 6 2016, 10:38 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/CodeGen/BasicTTIImpl.h	34 lines
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h	2 lines
test/Analysis/CostModel/ARM/cast.ll	176 lines
test/Analysis/CostModel/PowerPC/ext.ll	2 lines
test/Analysis/CostModel/X86/sitofp.ll	118 lines
test/Analysis/CostModel/X86/uitofp.ll	124 lines
test/Transforms/LoopVectorize/X86/gather_scatter.ll	6 lines

Diff 61313

include/llvm/CodeGen/BasicTTIImpl.h

[309 lines not shown]	unsigned getArithmeticInstrCost(

if (!TLI->isOperationExpand(ISD, LT.second)) {		if (!TLI->isOperationExpand(ISD, LT.second)) {
// If the operation is custom lowered, then assume that the code is twice		// If the operation is custom lowered, then assume that the code is twice
// as expensive.		// as expensive.
return LT.first * 2 * OpCost;		return LT.first * 2 * OpCost;
}		}

// Else, assume that we need to scalarize this op.		// Else, assume that we need to scalarize this op.
		// TODO: If one of the types get legalized by splitting, handle this
		// similarly to what getCastInstrCost() does.
if (Ty->isVectorTy()) {		if (Ty->isVectorTy()) {
unsigned Num = Ty->getVectorNumElements();		unsigned Num = Ty->getVectorNumElements();
unsigned Cost = static_cast<T *>(this)		unsigned Cost = static_cast<T *>(this)
->getArithmeticInstrCost(Opcode, Ty->getScalarType());		->getArithmeticInstrCost(Opcode, Ty->getScalarType());
// return the cost of multiple scalar invocation plus the cost of		// return the cost of multiple scalar invocation plus the cost of
// inserting		// inserting
// and extracting the values.		// and extracting the values.
return getScalarizationOverhead(Ty, true, true) + Num * Cost;		return getScalarizationOverhead(Ty, true, true) + Num * Cost;
[78 lines not shown]	if (Dst->isVectorTy() && Src->isVectorTy()) {

// Just check the op cost. If the operation is legal then assume it		// Just check the op cost. If the operation is legal then assume it
// costs		// costs
// 1 and multiply by the type-legalization overhead.		// 1 and multiply by the type-legalization overhead.
if (!TLI->isOperationExpand(ISD, DstLT.second))		if (!TLI->isOperationExpand(ISD, DstLT.second))
return SrcLT.first * 1;		return SrcLT.first * 1;
}		}

// If we are converting vectors and the operation is illegal, or		// If we are legalizing by splitting, query the concrete TTI for the cost
// if the vectors are legalized to different types, estimate the		// of casting the original vector twice. We also need to factor int the
// scalarization costs.		// cost of the split itself. Count that as 1, to be consistent with
// TODO: This is probably a big overestimate. For splits, we should have		// TLI->getTypeLegalizationCost().
// something like getTypeLegalizationCost() + 2 * getCastInstrCost().		if ((TLI->getTypeAction(Src->getContext(), TLI->getValueType(DL, Src)) ==
// The same applies to getCmpSelInstrCost() and getArithmeticInstrCost()		TargetLowering::TypeSplitVector) ||
		(TLI->getTypeAction(Dst->getContext(), TLI->getValueType(DL, Dst)) ==
		TargetLowering::TypeSplitVector)) {
		Type *SplitDst = VectorType::get(Dst->getVectorElementType(),
		Dst->getVectorNumElements() / 2);
		Type *SplitSrc = VectorType::get(Src->getVectorElementType(),
		Src->getVectorNumElements() / 2);
		T *TTI = static_cast<T *>(this);
		return TTI->getVectorSplitCost() +
		(2 * TTI->getCastInstrCost(Opcode, SplitDst, SplitSrc));
		}

		// In other cases where the source or destination are illegal, assume
		// the operation will get scalarized.
unsigned Num = Dst->getVectorNumElements();		unsigned Num = Dst->getVectorNumElements();
unsigned Cost = static_cast<T *>(this)->getCastInstrCost(		unsigned Cost = static_cast<T *>(this)->getCastInstrCost(
Opcode, Dst->getScalarType(), Src->getScalarType());		Opcode, Dst->getScalarType(), Src->getScalarType());

// Return the cost of multiple scalar invocation plus the cost of		// Return the cost of multiple scalar invocation plus the cost of
// inserting and extracting the values.		// inserting and extracting the values.
return getScalarizationOverhead(Dst, true, true) + Num * Cost;		return getScalarizationOverhead(Dst, true, true) + Num * Cost;
}		}
[41 lines not shown]	unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy) {
if (!(ValTy->isVectorTy() && !LT.second.isVector()) &&		if (!(ValTy->isVectorTy() && !LT.second.isVector()) &&
!TLI->isOperationExpand(ISD, LT.second)) {		!TLI->isOperationExpand(ISD, LT.second)) {
// The operation is legal. Assume it costs 1. Multiply		// The operation is legal. Assume it costs 1. Multiply
// by the type-legalization overhead.		// by the type-legalization overhead.
return LT.first * 1;		return LT.first * 1;
}		}

// Otherwise, assume that the cast is scalarized.		// Otherwise, assume that the cast is scalarized.
		// TODO: If one of the types get legalized by splitting, handle this
		// similarly to what getCastInstrCost() does.
if (ValTy->isVectorTy()) {		if (ValTy->isVectorTy()) {
unsigned Num = ValTy->getVectorNumElements();		unsigned Num = ValTy->getVectorNumElements();
if (CondTy)		if (CondTy)
CondTy = CondTy->getScalarType();		CondTy = CondTy->getScalarType();
unsigned Cost = static_cast<T *>(this)->getCmpSelInstrCost(		unsigned Cost = static_cast<T *>(this)->getCmpSelInstrCost(
Opcode, ValTy->getScalarType(), CondTy);		Opcode, ValTy->getScalarType(), CondTy);

// Return the cost of multiple scalar invocation plus the cost of		// Return the cost of multiple scalar invocation plus the cost of
// inserting		// inserting and extracting the values.
// and extracting the values.
return getScalarizationOverhead(ValTy, true, false) + Num * Cost;		return getScalarizationOverhead(ValTy, true, false) + Num * Cost;
}		}

// Unknown scalar opcode.		// Unknown scalar opcode.
return 1;		return 1;
}		}

unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {		unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
[408 lines not shown]	unsigned getReductionCost(unsigned Opcode, Type *Ty, bool IsPairwise) {
// Assume the pairwise shuffles add a cost.		// Assume the pairwise shuffles add a cost.
unsigned ShuffleCost =		unsigned ShuffleCost =
NumReduxLevels * (IsPairwise + 1) *		NumReduxLevels * (IsPairwise + 1) *
static_cast<T *>(this)		static_cast<T *>(this)
->getShuffleCost(TTI::SK_ExtractSubvector, Ty, NumVecElts / 2, Ty);		->getShuffleCost(TTI::SK_ExtractSubvector, Ty, NumVecElts / 2, Ty);
return ShuffleCost + ArithCost + getScalarizationOverhead(Ty, false, true);		return ShuffleCost + ArithCost + getScalarizationOverhead(Ty, false, true);
}		}

		unsigned getVectorSplitCost() { return 1; }

/// @}		/// @}
};		};

/// \brief Concrete BasicTTIImpl that can be used if no further customization		/// \brief Concrete BasicTTIImpl that can be used if no further customization
/// is needed.		/// is needed.
class BasicTTIImpl : public BasicTTIImplBase<BasicTTIImpl> {		class BasicTTIImpl : public BasicTTIImplBase<BasicTTIImpl> {
typedef BasicTTIImplBase<BasicTTIImpl> BaseT;		typedef BasicTTIImplBase<BasicTTIImpl> BaseT;
friend class BasicTTIImplBase<BasicTTIImpl>;		friend class BasicTTIImplBase<BasicTTIImpl>;
[21 lines not shown]

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

[88 lines not shown]	int getArithmeticInstrCost(
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None);		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None);

unsigned getCFInstrCost(unsigned Opcode);		unsigned getCFInstrCost(unsigned Opcode);

int getVectorInstrCost(unsigned Opcode, Type *ValTy, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *ValTy, unsigned Index);
bool isSourceOfDivergence(const Value *V) const;		bool isSourceOfDivergence(const Value *V) const;

		unsigned getVectorSplitCost() { return 0; }
};		};

} // end namespace llvm		} // end namespace llvm

#endif		#endif

test/Analysis/CostModel/ARM/cast.ll

[258 lines not shown]	define i32 @casts() {
; CHECK: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>		; CHECK: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>
%r114 = fptoui <4 x float> undef to <4 x i16>		%r114 = fptoui <4 x float> undef to <4 x i16>
; CHECK: Found an estimated cost of 2 for instruction: %r115 = fptosi <4 x float> undef to <4 x i16>		; CHECK: Found an estimated cost of 2 for instruction: %r115 = fptosi <4 x float> undef to <4 x i16>
%r115 = fptosi <4 x float> undef to <4 x i16>		%r115 = fptosi <4 x float> undef to <4 x i16>
; CHECK: Found an estimated cost of 1 for instruction: %r116 = fptoui <4 x float> undef to <4 x i32>		; CHECK: Found an estimated cost of 1 for instruction: %r116 = fptoui <4 x float> undef to <4 x i32>
%r116 = fptoui <4 x float> undef to <4 x i32>		%r116 = fptoui <4 x float> undef to <4 x i32>
; CHECK: Found an estimated cost of 1 for instruction: %r117 = fptosi <4 x float> undef to <4 x i32>		; CHECK: Found an estimated cost of 1 for instruction: %r117 = fptosi <4 x float> undef to <4 x i32>
%r117 = fptosi <4 x float> undef to <4 x i32>		%r117 = fptosi <4 x float> undef to <4 x i32>
; CHECK: Found an estimated cost of 64 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>		; CHECK: Found an estimated cost of 65 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>
%r118 = fptoui <4 x float> undef to <4 x i64>		%r118 = fptoui <4 x float> undef to <4 x i64>
; CHECK: Found an estimated cost of 64 for instruction: %r119 = fptosi <4 x float> undef to <4 x i64>		; CHECK: Found an estimated cost of 65 for instruction: %r119 = fptosi <4 x float> undef to <4 x i64>
%r119 = fptosi <4 x float> undef to <4 x i64>		%r119 = fptosi <4 x float> undef to <4 x i64>

; CHECK: Found an estimated cost of 32 for instruction: %r120 = fptoui <4 x double> undef to <4 x i1>		; CHECK: Found an estimated cost of 33 for instruction: %r120 = fptoui <4 x double> undef to <4 x i1>
%r120 = fptoui <4 x double> undef to <4 x i1>		%r120 = fptoui <4 x double> undef to <4 x i1>
; CHECK: Found an estimated cost of 32 for instruction: %r121 = fptosi <4 x double> undef to <4 x i1>		; CHECK: Found an estimated cost of 33 for instruction: %r121 = fptosi <4 x double> undef to <4 x i1>
%r121 = fptosi <4 x double> undef to <4 x i1>		%r121 = fptosi <4 x double> undef to <4 x i1>
; CHECK: Found an estimated cost of 32 for instruction: %r122 = fptoui <4 x double> undef to <4 x i8>		; CHECK: Found an estimated cost of 33 for instruction: %r122 = fptoui <4 x double> undef to <4 x i8>
%r122 = fptoui <4 x double> undef to <4 x i8>		%r122 = fptoui <4 x double> undef to <4 x i8>
; CHECK: Found an estimated cost of 32 for instruction: %r123 = fptosi <4 x double> undef to <4 x i8>		; CHECK: Found an estimated cost of 33 for instruction: %r123 = fptosi <4 x double> undef to <4 x i8>
%r123 = fptosi <4 x double> undef to <4 x i8>		%r123 = fptosi <4 x double> undef to <4 x i8>
; CHECK: Found an estimated cost of 32 for instruction: %r124 = fptoui <4 x double> undef to <4 x i16>		; CHECK: Found an estimated cost of 33 for instruction: %r124 = fptoui <4 x double> undef to <4 x i16>
%r124 = fptoui <4 x double> undef to <4 x i16>		%r124 = fptoui <4 x double> undef to <4 x i16>
; CHECK: Found an estimated cost of 32 for instruction: %r125 = fptosi <4 x double> undef to <4 x i16>		; CHECK: Found an estimated cost of 33 for instruction: %r125 = fptosi <4 x double> undef to <4 x i16>
%r125 = fptosi <4 x double> undef to <4 x i16>		%r125 = fptosi <4 x double> undef to <4 x i16>
; CHECK: Found an estimated cost of 32 for instruction: %r126 = fptoui <4 x double> undef to <4 x i32>		; CHECK: Found an estimated cost of 5 for instruction: %r126 = fptoui <4 x double> undef to <4 x i32>
%r126 = fptoui <4 x double> undef to <4 x i32>		%r126 = fptoui <4 x double> undef to <4 x i32>
; CHECK: Found an estimated cost of 32 for instruction: %r127 = fptosi <4 x double> undef to <4 x i32>		; CHECK: Found an estimated cost of 5 for instruction: %r127 = fptosi <4 x double> undef to <4 x i32>
%r127 = fptosi <4 x double> undef to <4 x i32>		%r127 = fptosi <4 x double> undef to <4 x i32>
; CHECK: Found an estimated cost of 64 for instruction: %r128 = fptoui <4 x double> undef to <4 x i64>		; CHECK: Found an estimated cost of 65 for instruction: %r128 = fptoui <4 x double> undef to <4 x i64>
%r128 = fptoui <4 x double> undef to <4 x i64>		%r128 = fptoui <4 x double> undef to <4 x i64>
; CHECK: Found an estimated cost of 64 for instruction: %r129 = fptosi <4 x double> undef to <4 x i64>		; CHECK: Found an estimated cost of 65 for instruction: %r129 = fptosi <4 x double> undef to <4 x i64>
%r129 = fptosi <4 x double> undef to <4 x i64>		%r129 = fptosi <4 x double> undef to <4 x i64>

; CHECK: Found an estimated cost of 64 for instruction: %r130 = fptoui <8 x float> undef to <8 x i1>		; CHECK: Found an estimated cost of 65 for instruction: %r130 = fptoui <8 x float> undef to <8 x i1>
%r130 = fptoui <8 x float> undef to <8 x i1>		%r130 = fptoui <8 x float> undef to <8 x i1>
; CHECK: Found an estimated cost of 64 for instruction: %r131 = fptosi <8 x float> undef to <8 x i1>		; CHECK: Found an estimated cost of 65 for instruction: %r131 = fptosi <8 x float> undef to <8 x i1>
%r131 = fptosi <8 x float> undef to <8 x i1>		%r131 = fptosi <8 x float> undef to <8 x i1>
; CHECK: Found an estimated cost of 64 for instruction: %r132 = fptoui <8 x float> undef to <8 x i8>		; CHECK: Found an estimated cost of 7 for instruction: %r132 = fptoui <8 x float> undef to <8 x i8>
%r132 = fptoui <8 x float> undef to <8 x i8>		%r132 = fptoui <8 x float> undef to <8 x i8>
; CHECK: Found an estimated cost of 64 for instruction: %r133 = fptosi <8 x float> undef to <8 x i8>		; CHECK: Found an estimated cost of 7 for instruction: %r133 = fptosi <8 x float> undef to <8 x i8>
%r133 = fptosi <8 x float> undef to <8 x i8>		%r133 = fptosi <8 x float> undef to <8 x i8>
; CHECK: Found an estimated cost of 4 for instruction: %r134 = fptoui <8 x float> undef to <8 x i16>		; CHECK: Found an estimated cost of 4 for instruction: %r134 = fptoui <8 x float> undef to <8 x i16>
%r134 = fptoui <8 x float> undef to <8 x i16>		%r134 = fptoui <8 x float> undef to <8 x i16>
; CHECK: Found an estimated cost of 4 for instruction: %r135 = fptosi <8 x float> undef to <8 x i16>		; CHECK: Found an estimated cost of 4 for instruction: %r135 = fptosi <8 x float> undef to <8 x i16>
%r135 = fptosi <8 x float> undef to <8 x i16>		%r135 = fptosi <8 x float> undef to <8 x i16>
; CHECK: Found an estimated cost of 2 for instruction: %r136 = fptoui <8 x float> undef to <8 x i32>		; CHECK: Found an estimated cost of 2 for instruction: %r136 = fptoui <8 x float> undef to <8 x i32>
%r136 = fptoui <8 x float> undef to <8 x i32>		%r136 = fptoui <8 x float> undef to <8 x i32>
; CHECK: Found an estimated cost of 2 for instruction: %r137 = fptosi <8 x float> undef to <8 x i32>		; CHECK: Found an estimated cost of 2 for instruction: %r137 = fptosi <8 x float> undef to <8 x i32>
%r137 = fptosi <8 x float> undef to <8 x i32>		%r137 = fptosi <8 x float> undef to <8 x i32>
; CHECK: Found an estimated cost of 128 for instruction: %r138 = fptoui <8 x float> undef to <8 x i64>		; CHECK: Found an estimated cost of 131 for instruction: %r138 = fptoui <8 x float> undef to <8 x i64>
%r138 = fptoui <8 x float> undef to <8 x i64>		%r138 = fptoui <8 x float> undef to <8 x i64>
; CHECK: Found an estimated cost of 128 for instruction: %r139 = fptosi <8 x float> undef to <8 x i64>		; CHECK: Found an estimated cost of 131 for instruction: %r139 = fptosi <8 x float> undef to <8 x i64>
%r139 = fptosi <8 x float> undef to <8 x i64>		%r139 = fptosi <8 x float> undef to <8 x i64>

; CHECK: Found an estimated cost of 64 for instruction: %r140 = fptoui <8 x double> undef to <8 x i1>		; CHECK: Found an estimated cost of 67 for instruction: %r140 = fptoui <8 x double> undef to <8 x i1>
%r140 = fptoui <8 x double> undef to <8 x i1>		%r140 = fptoui <8 x double> undef to <8 x i1>
; CHECK: Found an estimated cost of 64 for instruction: %r141 = fptosi <8 x double> undef to <8 x i1>		; CHECK: Found an estimated cost of 67 for instruction: %r141 = fptosi <8 x double> undef to <8 x i1>
%r141 = fptosi <8 x double> undef to <8 x i1>		%r141 = fptosi <8 x double> undef to <8 x i1>
; CHECK: Found an estimated cost of 64 for instruction: %r142 = fptoui <8 x double> undef to <8 x i8>		; CHECK: Found an estimated cost of 67 for instruction: %r142 = fptoui <8 x double> undef to <8 x i8>
%r142 = fptoui <8 x double> undef to <8 x i8>		%r142 = fptoui <8 x double> undef to <8 x i8>
; CHECK: Found an estimated cost of 64 for instruction: %r143 = fptosi <8 x double> undef to <8 x i8>		; CHECK: Found an estimated cost of 67 for instruction: %r143 = fptosi <8 x double> undef to <8 x i8>
%r143 = fptosi <8 x double> undef to <8 x i8>		%r143 = fptosi <8 x double> undef to <8 x i8>
; CHECK: Found an estimated cost of 64 for instruction: %r144 = fptoui <8 x double> undef to <8 x i16>		; CHECK: Found an estimated cost of 67 for instruction: %r144 = fptoui <8 x double> undef to <8 x i16>
%r144 = fptoui <8 x double> undef to <8 x i16>		%r144 = fptoui <8 x double> undef to <8 x i16>
; CHECK: Found an estimated cost of 64 for instruction: %r145 = fptosi <8 x double> undef to <8 x i16>		; CHECK: Found an estimated cost of 67 for instruction: %r145 = fptosi <8 x double> undef to <8 x i16>
%r145 = fptosi <8 x double> undef to <8 x i16>		%r145 = fptosi <8 x double> undef to <8 x i16>
; CHECK: Found an estimated cost of 64 for instruction: %r146 = fptoui <8 x double> undef to <8 x i32>		; CHECK: Found an estimated cost of 11 for instruction: %r146 = fptoui <8 x double> undef to <8 x i32>
%r146 = fptoui <8 x double> undef to <8 x i32>		%r146 = fptoui <8 x double> undef to <8 x i32>
; CHECK: Found an estimated cost of 64 for instruction: %r147 = fptosi <8 x double> undef to <8 x i32>		; CHECK: Found an estimated cost of 11 for instruction: %r147 = fptosi <8 x double> undef to <8 x i32>
%r147 = fptosi <8 x double> undef to <8 x i32>		%r147 = fptosi <8 x double> undef to <8 x i32>
; CHECK: Found an estimated cost of 128 for instruction: %r148 = fptoui <8 x double> undef to <8 x i64>		; CHECK: Found an estimated cost of 131 for instruction: %r148 = fptoui <8 x double> undef to <8 x i64>
%r148 = fptoui <8 x double> undef to <8 x i64>		%r148 = fptoui <8 x double> undef to <8 x i64>
; CHECK: Found an estimated cost of 128 for instruction: %r149 = fptosi <8 x double> undef to <8 x i64>		; CHECK: Found an estimated cost of 131 for instruction: %r149 = fptosi <8 x double> undef to <8 x i64>
%r149 = fptosi <8 x double> undef to <8 x i64>		%r149 = fptosi <8 x double> undef to <8 x i64>

; CHECK: Found an estimated cost of 128 for instruction: %r150 = fptoui <16 x float> undef to <16 x i1>		; CHECK: Found an estimated cost of 131 for instruction: %r150 = fptoui <16 x float> undef to <16 x i1>
%r150 = fptoui <16 x float> undef to <16 x i1>		%r150 = fptoui <16 x float> undef to <16 x i1>
; CHECK: Found an estimated cost of 128 for instruction: %r151 = fptosi <16 x float> undef to <16 x i1>		; CHECK: Found an estimated cost of 131 for instruction: %r151 = fptosi <16 x float> undef to <16 x i1>
%r151 = fptosi <16 x float> undef to <16 x i1>		%r151 = fptosi <16 x float> undef to <16 x i1>
; CHECK: Found an estimated cost of 128 for instruction: %r152 = fptoui <16 x float> undef to <16 x i8>		; CHECK: Found an estimated cost of 15 for instruction: %r152 = fptoui <16 x float> undef to <16 x i8>
%r152 = fptoui <16 x float> undef to <16 x i8>		%r152 = fptoui <16 x float> undef to <16 x i8>
; CHECK: Found an estimated cost of 128 for instruction: %r153 = fptosi <16 x float> undef to <16 x i8>		; CHECK: Found an estimated cost of 15 for instruction: %r153 = fptosi <16 x float> undef to <16 x i8>
%r153 = fptosi <16 x float> undef to <16 x i8>		%r153 = fptosi <16 x float> undef to <16 x i8>
; CHECK: Found an estimated cost of 8 for instruction: %r154 = fptoui <16 x float> undef to <16 x i16>		; CHECK: Found an estimated cost of 8 for instruction: %r154 = fptoui <16 x float> undef to <16 x i16>
%r154 = fptoui <16 x float> undef to <16 x i16>		%r154 = fptoui <16 x float> undef to <16 x i16>
; CHECK: Found an estimated cost of 8 for instruction: %r155 = fptosi <16 x float> undef to <16 x i16>		; CHECK: Found an estimated cost of 8 for instruction: %r155 = fptosi <16 x float> undef to <16 x i16>
%r155 = fptosi <16 x float> undef to <16 x i16>		%r155 = fptosi <16 x float> undef to <16 x i16>
; CHECK: Found an estimated cost of 4 for instruction: %r156 = fptoui <16 x float> undef to <16 x i32>		; CHECK: Found an estimated cost of 4 for instruction: %r156 = fptoui <16 x float> undef to <16 x i32>
%r156 = fptoui <16 x float> undef to <16 x i32>		%r156 = fptoui <16 x float> undef to <16 x i32>
; CHECK: Found an estimated cost of 4 for instruction: %r157 = fptosi <16 x float> undef to <16 x i32>		; CHECK: Found an estimated cost of 4 for instruction: %r157 = fptosi <16 x float> undef to <16 x i32>
%r157 = fptosi <16 x float> undef to <16 x i32>		%r157 = fptosi <16 x float> undef to <16 x i32>
; CHECK: Found an estimated cost of 256 for instruction: %r158 = fptoui <16 x float> undef to <16 x i64>		; CHECK: Found an estimated cost of 263 for instruction: %r158 = fptoui <16 x float> undef to <16 x i64>
%r158 = fptoui <16 x float> undef to <16 x i64>		%r158 = fptoui <16 x float> undef to <16 x i64>
; CHECK: Found an estimated cost of 256 for instruction: %r159 = fptosi <16 x float> undef to <16 x i64>		; CHECK: Found an estimated cost of 263 for instruction: %r159 = fptosi <16 x float> undef to <16 x i64>
%r159 = fptosi <16 x float> undef to <16 x i64>		%r159 = fptosi <16 x float> undef to <16 x i64>

; CHECK: Found an estimated cost of 128 for instruction: %r160 = fptoui <16 x double> undef to <16 x i1>		; CHECK: Found an estimated cost of 135 for instruction: %r160 = fptoui <16 x double> undef to <16 x i1>
%r160 = fptoui <16 x double> undef to <16 x i1>		%r160 = fptoui <16 x double> undef to <16 x i1>
; CHECK: Found an estimated cost of 128 for instruction: %r161 = fptosi <16 x double> undef to <16 x i1>		; CHECK: Found an estimated cost of 135 for instruction: %r161 = fptosi <16 x double> undef to <16 x i1>
%r161 = fptosi <16 x double> undef to <16 x i1>		%r161 = fptosi <16 x double> undef to <16 x i1>
; CHECK: Found an estimated cost of 128 for instruction: %r162 = fptoui <16 x double> undef to <16 x i8>		; CHECK: Found an estimated cost of 135 for instruction: %r162 = fptoui <16 x double> undef to <16 x i8>
%r162 = fptoui <16 x double> undef to <16 x i8>		%r162 = fptoui <16 x double> undef to <16 x i8>
; CHECK: Found an estimated cost of 128 for instruction: %r163 = fptosi <16 x double> undef to <16 x i8>		; CHECK: Found an estimated cost of 135 for instruction: %r163 = fptosi <16 x double> undef to <16 x i8>
%r163 = fptosi <16 x double> undef to <16 x i8>		%r163 = fptosi <16 x double> undef to <16 x i8>
; CHECK: Found an estimated cost of 128 for instruction: %r164 = fptoui <16 x double> undef to <16 x i16>		; CHECK: Found an estimated cost of 135 for instruction: %r164 = fptoui <16 x double> undef to <16 x i16>
%r164 = fptoui <16 x double> undef to <16 x i16>		%r164 = fptoui <16 x double> undef to <16 x i16>
; CHECK: Found an estimated cost of 128 for instruction: %r165 = fptosi <16 x double> undef to <16 x i16>		; CHECK: Found an estimated cost of 135 for instruction: %r165 = fptosi <16 x double> undef to <16 x i16>
%r165 = fptosi <16 x double> undef to <16 x i16>		%r165 = fptosi <16 x double> undef to <16 x i16>
; CHECK: Found an estimated cost of 128 for instruction: %r166 = fptoui <16 x double> undef to <16 x i32>		; CHECK: Found an estimated cost of 23 for instruction: %r166 = fptoui <16 x double> undef to <16 x i32>
%r166 = fptoui <16 x double> undef to <16 x i32>		%r166 = fptoui <16 x double> undef to <16 x i32>
; CHECK: Found an estimated cost of 128 for instruction: %r167 = fptosi <16 x double> undef to <16 x i32>		; CHECK: Found an estimated cost of 23 for instruction: %r167 = fptosi <16 x double> undef to <16 x i32>
%r167 = fptosi <16 x double> undef to <16 x i32>		%r167 = fptosi <16 x double> undef to <16 x i32>
; CHECK: Found an estimated cost of 256 for instruction: %r168 = fptoui <16 x double> undef to <16 x i64>		; CHECK: Found an estimated cost of 263 for instruction: %r168 = fptoui <16 x double> undef to <16 x i64>
%r168 = fptoui <16 x double> undef to <16 x i64>		%r168 = fptoui <16 x double> undef to <16 x i64>
; CHECK: Found an estimated cost of 256 for instruction: %r169 = fptosi <16 x double> undef to <16 x i64>		; CHECK: Found an estimated cost of 263 for instruction: %r169 = fptosi <16 x double> undef to <16 x i64>
%r169 = fptosi <16 x double> undef to <16 x i64>		%r169 = fptosi <16 x double> undef to <16 x i64>

; CHECK: Found an estimated cost of 12 for instruction: %r170 = uitofp <2 x i1> undef to <2 x float>		; CHECK: Found an estimated cost of 12 for instruction: %r170 = uitofp <2 x i1> undef to <2 x float>
%r170 = uitofp <2 x i1> undef to <2 x float>		%r170 = uitofp <2 x i1> undef to <2 x float>
; CHECK: Found an estimated cost of 12 for instruction: %r171 = sitofp <2 x i1> undef to <2 x float>		; CHECK: Found an estimated cost of 12 for instruction: %r171 = sitofp <2 x i1> undef to <2 x float>
%r171 = sitofp <2 x i1> undef to <2 x float>		%r171 = sitofp <2 x i1> undef to <2 x float>
; CHECK: Found an estimated cost of 3 for instruction: %r172 = uitofp <2 x i8> undef to <2 x float>		; CHECK: Found an estimated cost of 3 for instruction: %r172 = uitofp <2 x i8> undef to <2 x float>
%r172 = uitofp <2 x i8> undef to <2 x float>		%r172 = uitofp <2 x i8> undef to <2 x float>
[44 lines not shown]
; CHECK: Found an estimated cost of 2 for instruction: %r194 = uitofp <4 x i16> undef to <4 x float>		; CHECK: Found an estimated cost of 2 for instruction: %r194 = uitofp <4 x i16> undef to <4 x float>
%r194 = uitofp <4 x i16> undef to <4 x float>		%r194 = uitofp <4 x i16> undef to <4 x float>
; CHECK: Found an estimated cost of 2 for instruction: %r195 = sitofp <4 x i16> undef to <4 x float>		; CHECK: Found an estimated cost of 2 for instruction: %r195 = sitofp <4 x i16> undef to <4 x float>
%r195 = sitofp <4 x i16> undef to <4 x float>		%r195 = sitofp <4 x i16> undef to <4 x float>
; CHECK: Found an estimated cost of 1 for instruction: %r196 = uitofp <4 x i32> undef to <4 x float>		; CHECK: Found an estimated cost of 1 for instruction: %r196 = uitofp <4 x i32> undef to <4 x float>
%r196 = uitofp <4 x i32> undef to <4 x float>		%r196 = uitofp <4 x i32> undef to <4 x float>
; CHECK: Found an estimated cost of 1 for instruction: %r197 = sitofp <4 x i32> undef to <4 x float>		; CHECK: Found an estimated cost of 1 for instruction: %r197 = sitofp <4 x i32> undef to <4 x float>
%r197 = sitofp <4 x i32> undef to <4 x float>		%r197 = sitofp <4 x i32> undef to <4 x float>
; CHECK: Found an estimated cost of 56 for instruction: %r198 = uitofp <4 x i64> undef to <4 x float>		; CHECK: Found an estimated cost of 57 for instruction: %r198 = uitofp <4 x i64> undef to <4 x float>
%r198 = uitofp <4 x i64> undef to <4 x float>		%r198 = uitofp <4 x i64> undef to <4 x float>
; CHECK: Found an estimated cost of 56 for instruction: %r199 = sitofp <4 x i64> undef to <4 x float>		; CHECK: Found an estimated cost of 57 for instruction: %r199 = sitofp <4 x i64> undef to <4 x float>
%r199 = sitofp <4 x i64> undef to <4 x float>		%r199 = sitofp <4 x i64> undef to <4 x float>

; CHECK: Found an estimated cost of 16 for instruction: %r200 = uitofp <4 x i1> undef to <4 x double>		; CHECK: Found an estimated cost of 17 for instruction: %r200 = uitofp <4 x i1> undef to <4 x double>
%r200 = uitofp <4 x i1> undef to <4 x double>		%r200 = uitofp <4 x i1> undef to <4 x double>
; CHECK: Found an estimated cost of 16 for instruction: %r201 = sitofp <4 x i1> undef to <4 x double>		; CHECK: Found an estimated cost of 17 for instruction: %r201 = sitofp <4 x i1> undef to <4 x double>
%r201 = sitofp <4 x i1> undef to <4 x double>		%r201 = sitofp <4 x i1> undef to <4 x double>
; CHECK: Found an estimated cost of 16 for instruction: %r202 = uitofp <4 x i8> undef to <4 x double>		; CHECK: Found an estimated cost of 9 for instruction: %r202 = uitofp <4 x i8> undef to <4 x double>
%r202 = uitofp <4 x i8> undef to <4 x double>		%r202 = uitofp <4 x i8> undef to <4 x double>
; CHECK: Found an estimated cost of 16 for instruction: %r203 = sitofp <4 x i8> undef to <4 x double>		; CHECK: Found an estimated cost of 9 for instruction: %r203 = sitofp <4 x i8> undef to <4 x double>
%r203 = sitofp <4 x i8> undef to <4 x double>		%r203 = sitofp <4 x i8> undef to <4 x double>
; CHECK: Found an estimated cost of 16 for instruction: %r204 = uitofp <4 x i16> undef to <4 x double>		; CHECK: Found an estimated cost of 7 for instruction: %r204 = uitofp <4 x i16> undef to <4 x double>
%r204 = uitofp <4 x i16> undef to <4 x double>		%r204 = uitofp <4 x i16> undef to <4 x double>
; CHECK: Found an estimated cost of 16 for instruction: %r205 = sitofp <4 x i16> undef to <4 x double>		; CHECK: Found an estimated cost of 7 for instruction: %r205 = sitofp <4 x i16> undef to <4 x double>
%r205 = sitofp <4 x i16> undef to <4 x double>		%r205 = sitofp <4 x i16> undef to <4 x double>
; CHECK: Found an estimated cost of 16 for instruction: %r206 = uitofp <4 x i32> undef to <4 x double>		; CHECK: Found an estimated cost of 5 for instruction: %r206 = uitofp <4 x i32> undef to <4 x double>
%r206 = uitofp <4 x i32> undef to <4 x double>		%r206 = uitofp <4 x i32> undef to <4 x double>
; CHECK: Found an estimated cost of 16 for instruction: %r207 = sitofp <4 x i32> undef to <4 x double>		; CHECK: Found an estimated cost of 5 for instruction: %r207 = sitofp <4 x i32> undef to <4 x double>
%r207 = sitofp <4 x i32> undef to <4 x double>		%r207 = sitofp <4 x i32> undef to <4 x double>
; CHECK: Found an estimated cost of 48 for instruction: %r208 = uitofp <4 x i64> undef to <4 x double>		; CHECK: Found an estimated cost of 49 for instruction: %r208 = uitofp <4 x i64> undef to <4 x double>
%r208 = uitofp <4 x i64> undef to <4 x double>		%r208 = uitofp <4 x i64> undef to <4 x double>
; CHECK: Found an estimated cost of 48 for instruction: %r209 = sitofp <4 x i64> undef to <4 x double>		; CHECK: Found an estimated cost of 49 for instruction: %r209 = sitofp <4 x i64> undef to <4 x double>
%r209 = sitofp <4 x i64> undef to <4 x double>		%r209 = sitofp <4 x i64> undef to <4 x double>

; CHECK: Found an estimated cost of 48 for instruction: %r210 = uitofp <8 x i1> undef to <8 x float>		; CHECK: Found an estimated cost of 7 for instruction: %r210 = uitofp <8 x i1> undef to <8 x float>
%r210 = uitofp <8 x i1> undef to <8 x float>		%r210 = uitofp <8 x i1> undef to <8 x float>
; CHECK: Found an estimated cost of 48 for instruction: %r211 = sitofp <8 x i1> undef to <8 x float>		; CHECK: Found an estimated cost of 7 for instruction: %r211 = sitofp <8 x i1> undef to <8 x float>
%r211 = sitofp <8 x i1> undef to <8 x float>		%r211 = sitofp <8 x i1> undef to <8 x float>
; CHECK: Found an estimated cost of 48 for instruction: %r212 = uitofp <8 x i8> undef to <8 x float>		; CHECK: Found an estimated cost of 7 for instruction: %r212 = uitofp <8 x i8> undef to <8 x float>
%r212 = uitofp <8 x i8> undef to <8 x float>		%r212 = uitofp <8 x i8> undef to <8 x float>
; CHECK: Found an estimated cost of 48 for instruction: %r213 = sitofp <8 x i8> undef to <8 x float>		; CHECK: Found an estimated cost of 7 for instruction: %r213 = sitofp <8 x i8> undef to <8 x float>
%r213 = sitofp <8 x i8> undef to <8 x float>		%r213 = sitofp <8 x i8> undef to <8 x float>
; CHECK: Found an estimated cost of 4 for instruction: %r214 = uitofp <8 x i16> undef to <8 x float>		; CHECK: Found an estimated cost of 4 for instruction: %r214 = uitofp <8 x i16> undef to <8 x float>
%r214 = uitofp <8 x i16> undef to <8 x float>		%r214 = uitofp <8 x i16> undef to <8 x float>
; CHECK: Found an estimated cost of 4 for instruction: %r215 = sitofp <8 x i16> undef to <8 x float>		; CHECK: Found an estimated cost of 4 for instruction: %r215 = sitofp <8 x i16> undef to <8 x float>
%r215 = sitofp <8 x i16> undef to <8 x float>		%r215 = sitofp <8 x i16> undef to <8 x float>
; CHECK: Found an estimated cost of 2 for instruction: %r216 = uitofp <8 x i32> undef to <8 x float>		; CHECK: Found an estimated cost of 2 for instruction: %r216 = uitofp <8 x i32> undef to <8 x float>
%r216 = uitofp <8 x i32> undef to <8 x float>		%r216 = uitofp <8 x i32> undef to <8 x float>
; CHECK: Found an estimated cost of 2 for instruction: %r217 = sitofp <8 x i32> undef to <8 x float>		; CHECK: Found an estimated cost of 2 for instruction: %r217 = sitofp <8 x i32> undef to <8 x float>
%r217 = sitofp <8 x i32> undef to <8 x float>		%r217 = sitofp <8 x i32> undef to <8 x float>
; CHECK: Found an estimated cost of 112 for instruction: %r218 = uitofp <8 x i64> undef to <8 x float>		; CHECK: Found an estimated cost of 115 for instruction: %r218 = uitofp <8 x i64> undef to <8 x float>
%r218 = uitofp <8 x i64> undef to <8 x float>		%r218 = uitofp <8 x i64> undef to <8 x float>
; CHECK: Found an estimated cost of 112 for instruction: %r219 = sitofp <8 x i64> undef to <8 x float>		; CHECK: Found an estimated cost of 115 for instruction: %r219 = sitofp <8 x i64> undef to <8 x float>
%r219 = sitofp <8 x i64> undef to <8 x float>		%r219 = sitofp <8 x i64> undef to <8 x float>

; CHECK: Found an estimated cost of 32 for instruction: %r220 = uitofp <8 x i1> undef to <8 x double>		; CHECK: Found an estimated cost of 35 for instruction: %r220 = uitofp <8 x i1> undef to <8 x double>
%r220 = uitofp <8 x i1> undef to <8 x double>		%r220 = uitofp <8 x i1> undef to <8 x double>
; CHECK: Found an estimated cost of 32 for instruction: %r221 = sitofp <8 x i1> undef to <8 x double>		; CHECK: Found an estimated cost of 35 for instruction: %r221 = sitofp <8 x i1> undef to <8 x double>
%r221 = sitofp <8 x i1> undef to <8 x double>		%r221 = sitofp <8 x i1> undef to <8 x double>
; CHECK: Found an estimated cost of 32 for instruction: %r222 = uitofp <8 x i8> undef to <8 x double>		; CHECK: Found an estimated cost of 19 for instruction: %r222 = uitofp <8 x i8> undef to <8 x double>
%r222 = uitofp <8 x i8> undef to <8 x double>		%r222 = uitofp <8 x i8> undef to <8 x double>
; CHECK: Found an estimated cost of 32 for instruction: %r223 = sitofp <8 x i8> undef to <8 x double>		; CHECK: Found an estimated cost of 19 for instruction: %r223 = sitofp <8 x i8> undef to <8 x double>
%r223 = sitofp <8 x i8> undef to <8 x double>		%r223 = sitofp <8 x i8> undef to <8 x double>
; CHECK: Found an estimated cost of 32 for instruction: %r224 = uitofp <8 x i16> undef to <8 x double>		; CHECK: Found an estimated cost of 15 for instruction: %r224 = uitofp <8 x i16> undef to <8 x double>
%r224 = uitofp <8 x i16> undef to <8 x double>		%r224 = uitofp <8 x i16> undef to <8 x double>
; CHECK: Found an estimated cost of 32 for instruction: %r225 = sitofp <8 x i16> undef to <8 x double>		; CHECK: Found an estimated cost of 15 for instruction: %r225 = sitofp <8 x i16> undef to <8 x double>
%r225 = sitofp <8 x i16> undef to <8 x double>		%r225 = sitofp <8 x i16> undef to <8 x double>
; CHECK: Found an estimated cost of 32 for instruction: %r226 = uitofp <8 x i16> undef to <8 x double>		; CHECK: Found an estimated cost of 15 for instruction: %r226 = uitofp <8 x i16> undef to <8 x double>
%r226 = uitofp <8 x i16> undef to <8 x double>		%r226 = uitofp <8 x i16> undef to <8 x double>
; CHECK: Found an estimated cost of 32 for instruction: %r227 = sitofp <8 x i16> undef to <8 x double>		; CHECK: Found an estimated cost of 15 for instruction: %r227 = sitofp <8 x i16> undef to <8 x double>
%r227 = sitofp <8 x i16> undef to <8 x double>		%r227 = sitofp <8 x i16> undef to <8 x double>
; CHECK: Found an estimated cost of 96 for instruction: %r228 = uitofp <8 x i64> undef to <8 x double>		; CHECK: Found an estimated cost of 99 for instruction: %r228 = uitofp <8 x i64> undef to <8 x double>
%r228 = uitofp <8 x i64> undef to <8 x double>		%r228 = uitofp <8 x i64> undef to <8 x double>
; CHECK: Found an estimated cost of 96 for instruction: %r229 = sitofp <8 x i64> undef to <8 x double>		; CHECK: Found an estimated cost of 99 for instruction: %r229 = sitofp <8 x i64> undef to <8 x double>
%r229 = sitofp <8 x i64> undef to <8 x double>		%r229 = sitofp <8 x i64> undef to <8 x double>

; CHECK: Found an estimated cost of 96 for instruction: %r230 = uitofp <16 x i1> undef to <16 x float>		; CHECK: Found an estimated cost of 15 for instruction: %r230 = uitofp <16 x i1> undef to <16 x float>
%r230 = uitofp <16 x i1> undef to <16 x float>		%r230 = uitofp <16 x i1> undef to <16 x float>
; CHECK: Found an estimated cost of 96 for instruction: %r231 = sitofp <16 x i1> undef to <16 x float>		; CHECK: Found an estimated cost of 15 for instruction: %r231 = sitofp <16 x i1> undef to <16 x float>
%r231 = sitofp <16 x i1> undef to <16 x float>		%r231 = sitofp <16 x i1> undef to <16 x float>
; CHECK: Found an estimated cost of 96 for instruction: %r232 = uitofp <16 x i8> undef to <16 x float>		; CHECK: Found an estimated cost of 15 for instruction: %r232 = uitofp <16 x i8> undef to <16 x float>
%r232 = uitofp <16 x i8> undef to <16 x float>		%r232 = uitofp <16 x i8> undef to <16 x float>
; CHECK: Found an estimated cost of 96 for instruction: %r233 = sitofp <16 x i8> undef to <16 x float>		; CHECK: Found an estimated cost of 15 for instruction: %r233 = sitofp <16 x i8> undef to <16 x float>
%r233 = sitofp <16 x i8> undef to <16 x float>		%r233 = sitofp <16 x i8> undef to <16 x float>
; CHECK: Found an estimated cost of 8 for instruction: %r234 = uitofp <16 x i16> undef to <16 x float>		; CHECK: Found an estimated cost of 8 for instruction: %r234 = uitofp <16 x i16> undef to <16 x float>
%r234 = uitofp <16 x i16> undef to <16 x float>		%r234 = uitofp <16 x i16> undef to <16 x float>
; CHECK: Found an estimated cost of 8 for instruction: %r235 = sitofp <16 x i16> undef to <16 x float>		; CHECK: Found an estimated cost of 8 for instruction: %r235 = sitofp <16 x i16> undef to <16 x float>
%r235 = sitofp <16 x i16> undef to <16 x float>		%r235 = sitofp <16 x i16> undef to <16 x float>
; CHECK: Found an estimated cost of 4 for instruction: %r236 = uitofp <16 x i32> undef to <16 x float>		; CHECK: Found an estimated cost of 4 for instruction: %r236 = uitofp <16 x i32> undef to <16 x float>
%r236 = uitofp <16 x i32> undef to <16 x float>		%r236 = uitofp <16 x i32> undef to <16 x float>
; CHECK: Found an estimated cost of 4 for instruction: %r237 = sitofp <16 x i32> undef to <16 x float>		; CHECK: Found an estimated cost of 4 for instruction: %r237 = sitofp <16 x i32> undef to <16 x float>
%r237 = sitofp <16 x i32> undef to <16 x float>		%r237 = sitofp <16 x i32> undef to <16 x float>
; CHECK: Found an estimated cost of 224 for instruction: %r238 = uitofp <16 x i64> undef to <16 x float>		; CHECK: Found an estimated cost of 231 for instruction: %r238 = uitofp <16 x i64> undef to <16 x float>
%r238 = uitofp <16 x i64> undef to <16 x float>		%r238 = uitofp <16 x i64> undef to <16 x float>
; CHECK: Found an estimated cost of 224 for instruction: %r239 = sitofp <16 x i64> undef to <16 x float>		; CHECK: Found an estimated cost of 231 for instruction: %r239 = sitofp <16 x i64> undef to <16 x float>
%r239 = sitofp <16 x i64> undef to <16 x float>		%r239 = sitofp <16 x i64> undef to <16 x float>

; CHECK: Found an estimated cost of 64 for instruction: %r240 = uitofp <16 x i1> undef to <16 x double>		; CHECK: Found an estimated cost of 71 for instruction: %r240 = uitofp <16 x i1> undef to <16 x double>
%r240 = uitofp <16 x i1> undef to <16 x double>		%r240 = uitofp <16 x i1> undef to <16 x double>
; CHECK: Found an estimated cost of 64 for instruction: %r241 = sitofp <16 x i1> undef to <16 x double>		; CHECK: Found an estimated cost of 71 for instruction: %r241 = sitofp <16 x i1> undef to <16 x double>
%r241 = sitofp <16 x i1> undef to <16 x double>		%r241 = sitofp <16 x i1> undef to <16 x double>
; CHECK: Found an estimated cost of 64 for instruction: %r242 = uitofp <16 x i8> undef to <16 x double>		; CHECK: Found an estimated cost of 39 for instruction: %r242 = uitofp <16 x i8> undef to <16 x double>
%r242 = uitofp <16 x i8> undef to <16 x double>		%r242 = uitofp <16 x i8> undef to <16 x double>
; CHECK: Found an estimated cost of 64 for instruction: %r243 = sitofp <16 x i8> undef to <16 x double>		; CHECK: Found an estimated cost of 39 for instruction: %r243 = sitofp <16 x i8> undef to <16 x double>
%r243 = sitofp <16 x i8> undef to <16 x double>		%r243 = sitofp <16 x i8> undef to <16 x double>
; CHECK: Found an estimated cost of 64 for instruction: %r244 = uitofp <16 x i16> undef to <16 x double>		; CHECK: Found an estimated cost of 31 for instruction: %r244 = uitofp <16 x i16> undef to <16 x double>
%r244 = uitofp <16 x i16> undef to <16 x double>		%r244 = uitofp <16 x i16> undef to <16 x double>
; CHECK: Found an estimated cost of 64 for instruction: %r245 = sitofp <16 x i16> undef to <16 x double>		; CHECK: Found an estimated cost of 31 for instruction: %r245 = sitofp <16 x i16> undef to <16 x double>
%r245 = sitofp <16 x i16> undef to <16 x double>		%r245 = sitofp <16 x i16> undef to <16 x double>
; CHECK: Found an estimated cost of 64 for instruction: %r246 = uitofp <16 x i16> undef to <16 x double>		; CHECK: Found an estimated cost of 31 for instruction: %r246 = uitofp <16 x i16> undef to <16 x double>
%r246 = uitofp <16 x i16> undef to <16 x double>		%r246 = uitofp <16 x i16> undef to <16 x double>
; CHECK: Found an estimated cost of 64 for instruction: %r247 = sitofp <16 x i16> undef to <16 x double>		; CHECK: Found an estimated cost of 31 for instruction: %r247 = sitofp <16 x i16> undef to <16 x double>
%r247 = sitofp <16 x i16> undef to <16 x double>		%r247 = sitofp <16 x i16> undef to <16 x double>
; CHECK: Found an estimated cost of 192 for instruction: %r248 = uitofp <16 x i64> undef to <16 x double>		; CHECK: Found an estimated cost of 199 for instruction: %r248 = uitofp <16 x i64> undef to <16 x double>
%r248 = uitofp <16 x i64> undef to <16 x double>		%r248 = uitofp <16 x i64> undef to <16 x double>
; CHECK: Found an estimated cost of 192 for instruction: %r249 = sitofp <16 x i64> undef to <16 x double>		; CHECK: Found an estimated cost of 199 for instruction: %r249 = sitofp <16 x i64> undef to <16 x double>
%r249 = sitofp <16 x i64> undef to <16 x double>		%r249 = sitofp <16 x i64> undef to <16 x double>

; CHECK: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK: Found an estimated cost of 0 for instruction: ret i32 undef
ret i32 undef		ret i32 undef
}		}

test/Analysis/CostModel/PowerPC/ext.ll

	; RUN: opt < %s -cost-model -analyze -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx \| FileCheck %s			; RUN: opt < %s -cost-model -analyze -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx \| FileCheck %s
	target datalayout = "E-m:e-i64:64-n32:64"			target datalayout = "E-m:e-i64:64-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	define void @exts() {			define void @exts() {

	; CHECK: cost of 1 {{.*}} sext			; CHECK: cost of 1 {{.*}} sext
	%v1 = sext i16 undef to i32			%v1 = sext i16 undef to i32

	; CHECK: cost of 1 {{.*}} sext			; CHECK: cost of 1 {{.*}} sext
	%v2 = sext <2 x i16> undef to <2 x i32>			%v2 = sext <2 x i16> undef to <2 x i32>

	; CHECK: cost of 1 {{.*}} sext			; CHECK: cost of 1 {{.*}} sext
	%v3 = sext <4 x i16> undef to <4 x i32>			%v3 = sext <4 x i16> undef to <4 x i32>

	; CHECK: cost of 112 {{.*}} sext			; CHECK: cost of 3 {{.*}} sext
	%v4 = sext <8 x i16> undef to <8 x i32>			%v4 = sext <8 x i16> undef to <8 x i32>

	ret void			ret void
	}			}

test/Analysis/CostModel/X86/sitofp.ll

[34 lines not shown]	define <4 x double> @sitofpv4i8v4double(<4 x i8> %a) {
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @sitofpv8i8v8double(<8 x i8> %a) {		define <8 x double> @sitofpv8i8v8double(<8 x i8> %a) {
; SSE2-LABEL: sitofpv8i8v8double		; SSE2-LABEL: sitofpv8i8v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 80 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i8v8double		; AVX1-LABEL: sitofpv8i8v8double
; AVX1: cost of 20 {{.*}} sitofp		; AVX1: cost of 7 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i8v8double		; AVX2-LABEL: sitofpv8i8v8double
; AVX2: cost of 20 {{.*}} sitofp		; AVX2: cost of 7 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i8v8double		; AVX512F-LABEL: sitofpv8i8v8double
; AVX512F: cost of 2 {{.*}} sitofp		; AVX512F: cost of 2 {{.*}} sitofp
%1 = sitofp <8 x i8> %a to <8 x double>		%1 = sitofp <8 x i8> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @sitofpv16i8v16double(<16 x i8> %a) {		define <16 x double> @sitofpv16i8v16double(<16 x i8> %a) {
; SSE2-LABEL: sitofpv16i8v16double		; SSE2-LABEL: sitofpv16i8v16double
; SSE2: cost of 160 {{.*}} sitofp		; SSE2: cost of 160 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i8v16double		; AVX1-LABEL: sitofpv16i8v16double
; AVX1: cost of 40 {{.*}} sitofp		; AVX1: cost of 15 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i8v16double		; AVX2-LABEL: sitofpv16i8v16double
; AVX2: cost of 40 {{.*}} sitofp		; AVX2: cost of 15 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i8v16double		; AVX512F-LABEL: sitofpv16i8v16double
; AVX512F: cost of 44 {{.*}} sitofp		; AVX512F: cost of 5 {{.*}} sitofp
%1 = sitofp <16 x i8> %a to <16 x double>		%1 = sitofp <16 x i8> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @sitofpv32i8v32double(<32 x i8> %a) {		define <32 x double> @sitofpv32i8v32double(<32 x i8> %a) {
; SSE2-LABEL: sitofpv32i8v32double		; SSE2-LABEL: sitofpv32i8v32double
; SSE2: cost of 320 {{.*}} sitofp		; SSE2: cost of 320 {{.*}} sitofp
		RKSimon: This looks very high; codegen looks like it takes 26 ops, but I haven't checked the throughput.
;		;
; AVX1-LABEL: sitofpv32i8v32double		; AVX1-LABEL: sitofpv32i8v32double
; AVX1: cost of 80 {{.*}} sitofp		; AVX1: cost of 31 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i8v32double		; AVX2-LABEL: sitofpv32i8v32double
; AVX2: cost of 80 {{.*}} sitofp		; AVX2: cost of 31 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i8v32double		; AVX512F-LABEL: sitofpv32i8v32double
; AVX512F: cost of 88 {{.*}} sitofp		; AVX512F: cost of 11 {{.*}} sitofp
%1 = sitofp <32 x i8> %a to <32 x double>		%1 = sitofp <32 x i8> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @sitofpv2i16v2double(<2 x i16> %a) {		define <2 x double> @sitofpv2i16v2double(<2 x i16> %a) {
; SSE2-LABEL: sitofpv2i16v2double		; SSE2-LABEL: sitofpv2i16v2double
; SSE2: cost of 20 {{.*}} sitofp		; SSE2: cost of 20 {{.*}} sitofp
;		;
[25 lines not shown]	define <4 x double> @sitofpv4i16v4double(<4 x i16> %a) {
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @sitofpv8i16v8double(<8 x i16> %a) {		define <8 x double> @sitofpv8i16v8double(<8 x i16> %a) {
; SSE2-LABEL: sitofpv8i16v8double		; SSE2-LABEL: sitofpv8i16v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 80 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i16v8double		; AVX1-LABEL: sitofpv8i16v8double
; AVX1: cost of 20 {{.*}} sitofp		; AVX1: cost of 7 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i16v8double		; AVX2-LABEL: sitofpv8i16v8double
; AVX2: cost of 20 {{.*}} sitofp		; AVX2: cost of 7 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i16v8double		; AVX512F-LABEL: sitofpv8i16v8double
; AVX512F: cost of 2 {{.*}} sitofp		; AVX512F: cost of 2 {{.*}} sitofp
%1 = sitofp <8 x i16> %a to <8 x double>		%1 = sitofp <8 x i16> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @sitofpv16i16v16double(<16 x i16> %a) {		define <16 x double> @sitofpv16i16v16double(<16 x i16> %a) {
; SSE2-LABEL: sitofpv16i16v16double		; SSE2-LABEL: sitofpv16i16v16double
; SSE2: cost of 160 {{.*}} sitofp		; SSE2: cost of 160 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i16v16double		; AVX1-LABEL: sitofpv16i16v16double
; AVX1: cost of 40 {{.*}} sitofp		; AVX1: cost of 15 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i16v16double		; AVX2-LABEL: sitofpv16i16v16double
; AVX2: cost of 40 {{.*}} sitofp		; AVX2: cost of 15 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i16v16double		; AVX512F-LABEL: sitofpv16i16v16double
; AVX512F: cost of 44 {{.*}} sitofp		; AVX512F: cost of 5 {{.*}} sitofp
%1 = sitofp <16 x i16> %a to <16 x double>		%1 = sitofp <16 x i16> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @sitofpv32i16v32double(<32 x i16> %a) {		define <32 x double> @sitofpv32i16v32double(<32 x i16> %a) {
; SSE2-LABEL: sitofpv32i16v32double		; SSE2-LABEL: sitofpv32i16v32double
; SSE2: cost of 320 {{.*}} sitofp		; SSE2: cost of 320 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i16v32double		; AVX1-LABEL: sitofpv32i16v32double
; AVX1: cost of 80 {{.*}} sitofp		; AVX1: cost of 31 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i16v32double		; AVX2-LABEL: sitofpv32i16v32double
; AVX2: cost of 80 {{.*}} sitofp		; AVX2: cost of 31 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i16v32double		; AVX512F-LABEL: sitofpv32i16v32double
; AVX512F: cost of 88 {{.*}} sitofp		; AVX512F: cost of 11 {{.*}} sitofp
%1 = sitofp <32 x i16> %a to <32 x double>		%1 = sitofp <32 x i16> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @sitofpv2i32v2double(<2 x i32> %a) {		define <2 x double> @sitofpv2i32v2double(<2 x i32> %a) {
; SSE2-LABEL: sitofpv2i32v2double		; SSE2-LABEL: sitofpv2i32v2double
; SSE2: cost of 20 {{.*}} sitofp		; SSE2: cost of 20 {{.*}} sitofp
;		;
[25 lines not shown]	define <4 x double> @sitofpv4i32v4double(<4 x i32> %a) {
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @sitofpv8i32v8double(<8 x i32> %a) {		define <8 x double> @sitofpv8i32v8double(<8 x i32> %a) {
; SSE2-LABEL: sitofpv8i32v8double		; SSE2-LABEL: sitofpv8i32v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 80 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i32v8double		; AVX1-LABEL: sitofpv8i32v8double
; AVX1: cost of 20 {{.*}} sitofp		; AVX1: cost of 3 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i32v8double		; AVX2-LABEL: sitofpv8i32v8double
; AVX2: cost of 20 {{.*}} sitofp		; AVX2: cost of 3 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i32v8double		; AVX512F-LABEL: sitofpv8i32v8double
; AVX512F: cost of 1 {{.*}} sitofp		; AVX512F: cost of 1 {{.*}} sitofp
%1 = sitofp <8 x i32> %a to <8 x double>		%1 = sitofp <8 x i32> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @sitofpv16i32v16double(<16 x i32> %a) {		define <16 x double> @sitofpv16i32v16double(<16 x i32> %a) {
; SSE2-LABEL: sitofpv16i32v16double		; SSE2-LABEL: sitofpv16i32v16double
; SSE2: cost of 160 {{.*}} sitofp		; SSE2: cost of 160 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i32v16double		; AVX1-LABEL: sitofpv16i32v16double
; AVX1: cost of 40 {{.*}} sitofp		; AVX1: cost of 7 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i32v16double		; AVX2-LABEL: sitofpv16i32v16double
; AVX2: cost of 40 {{.*}} sitofp		; AVX2: cost of 7 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i32v16double		; AVX512F-LABEL: sitofpv16i32v16double
; AVX512F: cost of 44 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%1 = sitofp <16 x i32> %a to <16 x double>		%1 = sitofp <16 x i32> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @sitofpv32i32v32double(<32 x i32> %a) {		define <32 x double> @sitofpv32i32v32double(<32 x i32> %a) {
; SSE2-LABEL: sitofpv32i32v32double		; SSE2-LABEL: sitofpv32i32v32double
; SSE2: cost of 320 {{.*}} sitofp		; SSE2: cost of 320 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i32v32double		; AVX1-LABEL: sitofpv32i32v32double
; AVX1: cost of 80 {{.*}} sitofp		; AVX1: cost of 15 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i32v32double		; AVX2-LABEL: sitofpv32i32v32double
; AVX2: cost of 80 {{.*}} sitofp		; AVX2: cost of 15 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i32v32double		; AVX512F-LABEL: sitofpv32i32v32double
; AVX512F: cost of 88 {{.*}} sitofp		; AVX512F: cost of 7 {{.*}} sitofp
%1 = sitofp <32 x i32> %a to <32 x double>		%1 = sitofp <32 x i32> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @sitofpv2i64v2double(<2 x i64> %a) {		define <2 x double> @sitofpv2i64v2double(<2 x i64> %a) {
; SSE2-LABEL: sitofpv2i64v2double		; SSE2-LABEL: sitofpv2i64v2double
; SSE2: cost of 20 {{.*}} sitofp		; SSE2: cost of 20 {{.*}} sitofp
;		;
[25 lines not shown]	define <4 x double> @sitofpv4i64v4double(<4 x i64> %a) {
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @sitofpv8i64v8double(<8 x i64> %a) {		define <8 x double> @sitofpv8i64v8double(<8 x i64> %a) {
; SSE2-LABEL: sitofpv8i64v8double		; SSE2-LABEL: sitofpv8i64v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 80 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i64v8double		; AVX1-LABEL: sitofpv8i64v8double
; AVX1: cost of 20 {{.*}} sitofp		; AVX1: cost of 21 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i64v8double		; AVX2-LABEL: sitofpv8i64v8double
; AVX2: cost of 20 {{.*}} sitofp		; AVX2: cost of 21 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i64v8double		; AVX512F-LABEL: sitofpv8i64v8double
; AVX512F: cost of 22 {{.*}} sitofp		; AVX512F: cost of 22 {{.*}} sitofp
%1 = sitofp <8 x i64> %a to <8 x double>		%1 = sitofp <8 x i64> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @sitofpv16i64v16double(<16 x i64> %a) {		define <16 x double> @sitofpv16i64v16double(<16 x i64> %a) {
; SSE2-LABEL: sitofpv16i64v16double		; SSE2-LABEL: sitofpv16i64v16double
; SSE2: cost of 160 {{.*}} sitofp		; SSE2: cost of 160 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i64v16double		; AVX1-LABEL: sitofpv16i64v16double
; AVX1: cost of 40 {{.*}} sitofp		; AVX1: cost of 43 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i64v16double		; AVX2-LABEL: sitofpv16i64v16double
; AVX2: cost of 40 {{.*}} sitofp		; AVX2: cost of 43 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i64v16double		; AVX512F-LABEL: sitofpv16i64v16double
; AVX512F: cost of 44 {{.*}} sitofp		; AVX512F: cost of 45 {{.*}} sitofp
%1 = sitofp <16 x i64> %a to <16 x double>		%1 = sitofp <16 x i64> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @sitofpv32i64v32double(<32 x i64> %a) {		define <32 x double> @sitofpv32i64v32double(<32 x i64> %a) {
; SSE2-LABEL: sitofpv32i64v32double		; SSE2-LABEL: sitofpv32i64v32double
; SSE2: cost of 320 {{.*}} sitofp		; SSE2: cost of 320 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i64v32double		; AVX1-LABEL: sitofpv32i64v32double
; AVX1: cost of 80 {{.*}} sitofp		; AVX1: cost of 87 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i64v32double		; AVX2-LABEL: sitofpv32i64v32double
; AVX2: cost of 80 {{.*}} sitofp		; AVX2: cost of 87 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i64v32double		; AVX512F-LABEL: sitofpv32i64v32double
; AVX512F: cost of 88 {{.*}} sitofp		; AVX512F: cost of 91 {{.*}} sitofp
%1 = sitofp <32 x i64> %a to <32 x double>		%1 = sitofp <32 x i64> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x float> @sitofpv2i8v2float(<2 x i8> %a) {		define <2 x float> @sitofpv2i8v2float(<2 x i8> %a) {
; SSE2-LABEL: sitofpv2i8v2float		; SSE2-LABEL: sitofpv2i8v2float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 15 {{.*}} sitofp
;		;
[41 lines not shown]	define <8 x float> @sitofpv8i8v8float(<8 x i8> %a) {
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @sitofpv16i8v16float(<16 x i8> %a) {		define <16 x float> @sitofpv16i8v16float(<16 x i8> %a) {
; SSE2-LABEL: sitofpv16i8v16float		; SSE2-LABEL: sitofpv16i8v16float
; SSE2: cost of 8 {{.*}} sitofp		; SSE2: cost of 8 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i8v16float		; AVX1-LABEL: sitofpv16i8v16float
; AVX1: cost of 44 {{.*}} sitofp		; AVX1: cost of 17 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i8v16float		; AVX2-LABEL: sitofpv16i8v16float
; AVX2: cost of 44 {{.*}} sitofp		; AVX2: cost of 17 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i8v16float		; AVX512F-LABEL: sitofpv16i8v16float
; AVX512F: cost of 2 {{.*}} sitofp		; AVX512F: cost of 2 {{.*}} sitofp
%1 = sitofp <16 x i8> %a to <16 x float>		%1 = sitofp <16 x i8> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @sitofpv32i8v32float(<32 x i8> %a) {		define <32 x float> @sitofpv32i8v32float(<32 x i8> %a) {
; SSE2-LABEL: sitofpv32i8v32float		; SSE2-LABEL: sitofpv32i8v32float
; SSE2: cost of 16 {{.*}} sitofp		; SSE2: cost of 16 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i8v32float		; AVX1-LABEL: sitofpv32i8v32float
; AVX1: cost of 88 {{.*}} sitofp		; AVX1: cost of 35 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i8v32float		; AVX2-LABEL: sitofpv32i8v32float
; AVX2: cost of 88 {{.*}} sitofp		; AVX2: cost of 35 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i8v32float		; AVX512F-LABEL: sitofpv32i8v32float
; AVX512F: cost of 92 {{.*}} sitofp		; AVX512F: cost of 5 {{.*}} sitofp
%1 = sitofp <32 x i8> %a to <32 x float>		%1 = sitofp <32 x i8> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @sitofpv2i16v2float(<2 x i16> %a) {		define <2 x float> @sitofpv2i16v2float(<2 x i16> %a) {
; SSE2-LABEL: sitofpv2i16v2float		; SSE2-LABEL: sitofpv2i16v2float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 15 {{.*}} sitofp
;		;
[41 lines not shown]	define <8 x float> @sitofpv8i16v8float(<8 x i16> %a) {
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @sitofpv16i16v16float(<16 x i16> %a) {		define <16 x float> @sitofpv16i16v16float(<16 x i16> %a) {
; SSE2-LABEL: sitofpv16i16v16float		; SSE2-LABEL: sitofpv16i16v16float
; SSE2: cost of 30 {{.*}} sitofp		; SSE2: cost of 30 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i16v16float		; AVX1-LABEL: sitofpv16i16v16float
; AVX1: cost of 44 {{.*}} sitofp		; AVX1: cost of 11 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i16v16float		; AVX2-LABEL: sitofpv16i16v16float
; AVX2: cost of 44 {{.*}} sitofp		; AVX2: cost of 11 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i16v16float		; AVX512F-LABEL: sitofpv16i16v16float
; AVX512F: cost of 2 {{.*}} sitofp		; AVX512F: cost of 2 {{.*}} sitofp
%1 = sitofp <16 x i16> %a to <16 x float>		%1 = sitofp <16 x i16> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @sitofpv32i16v32float(<32 x i16> %a) {		define <32 x float> @sitofpv32i16v32float(<32 x i16> %a) {
; SSE2-LABEL: sitofpv32i16v32float		; SSE2-LABEL: sitofpv32i16v32float
; SSE2: cost of 60 {{.*}} sitofp		; SSE2: cost of 60 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i16v32float		; AVX1-LABEL: sitofpv32i16v32float
; AVX1: cost of 88 {{.*}} sitofp		; AVX1: cost of 23 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i16v32float		; AVX2-LABEL: sitofpv32i16v32float
; AVX2: cost of 88 {{.*}} sitofp		; AVX2: cost of 23 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i16v32float		; AVX512F-LABEL: sitofpv32i16v32float
; AVX512F: cost of 92 {{.*}} sitofp		; AVX512F: cost of 5 {{.*}} sitofp
%1 = sitofp <32 x i16> %a to <32 x float>		%1 = sitofp <32 x i16> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @sitofpv2i32v2float(<2 x i32> %a) {		define <2 x float> @sitofpv2i32v2float(<2 x i32> %a) {
; SSE2-LABEL: sitofpv2i32v2float		; SSE2-LABEL: sitofpv2i32v2float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 15 {{.*}} sitofp
;		;
[41 lines not shown]	define <8 x float> @sitofpv8i32v8float(<8 x i32> %a) {
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @sitofpv16i32v16float(<16 x i32> %a) {		define <16 x float> @sitofpv16i32v16float(<16 x i32> %a) {
; SSE2-LABEL: sitofpv16i32v16float		; SSE2-LABEL: sitofpv16i32v16float
; SSE2: cost of 60 {{.*}} sitofp		; SSE2: cost of 60 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i32v16float		; AVX1-LABEL: sitofpv16i32v16float
; AVX1: cost of 44 {{.*}} sitofp		; AVX1: cost of 3 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i32v16float		; AVX2-LABEL: sitofpv16i32v16float
; AVX2: cost of 44 {{.*}} sitofp		; AVX2: cost of 3 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i32v16float		; AVX512F-LABEL: sitofpv16i32v16float
; AVX512F: cost of 1 {{.*}} sitofp		; AVX512F: cost of 1 {{.*}} sitofp
%1 = sitofp <16 x i32> %a to <16 x float>		%1 = sitofp <16 x i32> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @sitofpv32i32v32float(<32 x i32> %a) {		define <32 x float> @sitofpv32i32v32float(<32 x i32> %a) {
; SSE2-LABEL: sitofpv32i32v32float		; SSE2-LABEL: sitofpv32i32v32float
; SSE2: cost of 120 {{.*}} sitofp		; SSE2: cost of 120 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i32v32float		; AVX1-LABEL: sitofpv32i32v32float
; AVX1: cost of 88 {{.*}} sitofp		; AVX1: cost of 7 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i32v32float		; AVX2-LABEL: sitofpv32i32v32float
; AVX2: cost of 88 {{.*}} sitofp		; AVX2: cost of 7 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i32v32float		; AVX512F-LABEL: sitofpv32i32v32float
; AVX512F: cost of 92 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%1 = sitofp <32 x i32> %a to <32 x float>		%1 = sitofp <32 x i32> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @sitofpv2i64v2float(<2 x i64> %a) {		define <2 x float> @sitofpv2i64v2float(<2 x i64> %a) {
; SSE2-LABEL: sitofpv2i64v2float		; SSE2-LABEL: sitofpv2i64v2float
; SSE2: cost of 15 {{.*}} sitofp		; SSE2: cost of 15 {{.*}} sitofp
;		;
[25 lines not shown]	define <4 x float> @sitofpv4i64v4float(<4 x i64> %a) {
ret <4 x float> %1		ret <4 x float> %1
}		}

define <8 x float> @sitofpv8i64v8float(<8 x i64> %a) {		define <8 x float> @sitofpv8i64v8float(<8 x i64> %a) {
; SSE2-LABEL: sitofpv8i64v8float		; SSE2-LABEL: sitofpv8i64v8float
; SSE2: cost of 60 {{.*}} sitofp		; SSE2: cost of 60 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i64v8float		; AVX1-LABEL: sitofpv8i64v8float
; AVX1: cost of 22 {{.*}} sitofp		; AVX1: cost of 21 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i64v8float		; AVX2-LABEL: sitofpv8i64v8float
; AVX2: cost of 22 {{.*}} sitofp		; AVX2: cost of 21 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i64v8float		; AVX512F-LABEL: sitofpv8i64v8float
; AVX512F: cost of 22 {{.*}} sitofp		; AVX512F: cost of 22 {{.*}} sitofp
%1 = sitofp <8 x i64> %a to <8 x float>		%1 = sitofp <8 x i64> %a to <8 x float>
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @sitofpv16i64v16float(<16 x i64> %a) {		define <16 x float> @sitofpv16i64v16float(<16 x i64> %a) {
; SSE2-LABEL: sitofpv16i64v16float		; SSE2-LABEL: sitofpv16i64v16float
; SSE2: cost of 120 {{.*}} sitofp		; SSE2: cost of 120 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i64v16float		; AVX1-LABEL: sitofpv16i64v16float
; AVX1: cost of 44 {{.*}} sitofp		; AVX1: cost of 43 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i64v16float		; AVX2-LABEL: sitofpv16i64v16float
; AVX2: cost of 44 {{.*}} sitofp		; AVX2: cost of 43 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i64v16float		; AVX512F-LABEL: sitofpv16i64v16float
; AVX512F: cost of 46 {{.*}} sitofp		; AVX512F: cost of 45 {{.*}} sitofp
%1 = sitofp <16 x i64> %a to <16 x float>		%1 = sitofp <16 x i64> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @sitofpv32i64v32float(<32 x i64> %a) {		define <32 x float> @sitofpv32i64v32float(<32 x i64> %a) {
; SSE2-LABEL: sitofpv32i64v32float		; SSE2-LABEL: sitofpv32i64v32float
; SSE2: cost of 240 {{.*}} sitofp		; SSE2: cost of 240 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i64v32float		; AVX1-LABEL: sitofpv32i64v32float
; AVX1: cost of 88 {{.*}} sitofp		; AVX1: cost of 87 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i64v32float		; AVX2-LABEL: sitofpv32i64v32float
; AVX2: cost of 88 {{.*}} sitofp		; AVX2: cost of 87 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i64v32float		; AVX512F-LABEL: sitofpv32i64v32float
; AVX512F: cost of 92 {{.*}} sitofp		; AVX512F: cost of 91 {{.*}} sitofp
%1 = sitofp <32 x i64> %a to <32 x float>		%1 = sitofp <32 x i64> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <8 x double> @sitofpv8i1v8double(<8 x double> %a) {		define <8 x double> @sitofpv8i1v8double(<8 x double> %a) {
; SSE2-LABEL: sitofpv8i1v8double		; SSE2-LABEL: sitofpv8i1v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 80 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i1v8double		; AVX1-LABEL: sitofpv8i1v8double
; AVX1: cost of 20 {{.*}} sitofp		; AVX1: cost of 7 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i1v8double		; AVX2-LABEL: sitofpv8i1v8double
; AVX2: cost of 20 {{.*}} sitofp		; AVX2: cost of 7 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i1v8double		; AVX512F-LABEL: sitofpv8i1v8double
; AVX512F: cost of 4 {{.*}} sitofp		; AVX512F: cost of 4 {{.*}} sitofp
%cmpres = fcmp ogt <8 x double> %a, zeroinitializer		%cmpres = fcmp ogt <8 x double> %a, zeroinitializer
%1 = sitofp <8 x i1> %cmpres to <8 x double>		%1 = sitofp <8 x i1> %cmpres to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x float> @sitofpv16i1v16float(<16 x float> %a) {		define <16 x float> @sitofpv16i1v16float(<16 x float> %a) {
; SSE2-LABEL: sitofpv16i1v16float		; SSE2-LABEL: sitofpv16i1v16float
; SSE2: cost of 8 {{.*}} sitofp		; SSE2: cost of 8 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i1v16float		; AVX1-LABEL: sitofpv16i1v16float
; AVX1: cost of 44 {{.*}} sitofp		; AVX1: cost of 17 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i1v16float		; AVX2-LABEL: sitofpv16i1v16float
; AVX2: cost of 44 {{.*}} sitofp		; AVX2: cost of 17 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i1v16float		; AVX512F-LABEL: sitofpv16i1v16float
; AVX512F: cost of 3 {{.*}} sitofp		; AVX512F: cost of 3 {{.*}} sitofp
%cmpres = fcmp ogt <16 x float> %a, zeroinitializer		%cmpres = fcmp ogt <16 x float> %a, zeroinitializer
%1 = sitofp <16 x i1> %cmpres to <16 x float>		%1 = sitofp <16 x i1> %cmpres to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}
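
The updated expectations in this file appear to follow a doubling pattern: each time the vector has to be halved once more to reach a legal type, the reported cost becomes twice the cost of the half-width cast plus 1 for the split. For example, the AVX1 cost for sitofp to double from i32 goes 3, 7, 15 for 8, 16 and 32 elements. The sketch below only reproduces that arithmetic as inferred from the numbers in this diff; `splitCastCost` is a hypothetical helper, not a function in LLVM, and the constant 1 per split is the value the review discussion attributes to the generic legalization cost.

```cpp
#include <cassert>

// Hypothetical helper (not part of LLVM): estimate the cost of a vector cast
// whose type must be halved `splits` times before it reaches a type whose
// cast cost, `legalCost`, is known to the target.
// Assumption inferred from the test numbers: every halving costs two
// half-width casts plus 1 for the split itself.
unsigned splitCastCost(unsigned legalCost, unsigned splits) {
  unsigned cost = legalCost;
  for (unsigned i = 0; i < splits; ++i)
    cost = 2 * cost + 1;
  return cost;
}
```

Under this assumption, `splitCastCost(3, 1)` gives the 7 expected for sitofpv16i32v16double on AVX1, and `splitCastCost(21, 2)` gives the 87 expected for sitofpv32i64v32double.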

test/Analysis/CostModel/X86/uitofp.ll

[35 lines not shown]	define <4 x double> @uitofpv4i8v4double(<4 x i8> %a) {
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @uitofpv8i8v8double(<8 x i8> %a) {		define <8 x double> @uitofpv8i8v8double(<8 x i8> %a) {
; SSE2-LABEL: uitofpv8i8v8double		; SSE2-LABEL: uitofpv8i8v8double
; SSE2: cost of 80 {{.*}} uitofp		; SSE2: cost of 80 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i8v8double		; AVX1-LABEL: uitofpv8i8v8double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 5 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i8v8double		; AVX2-LABEL: uitofpv8i8v8double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 5 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i8v8double		; AVX512F-LABEL: uitofpv8i8v8double
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <8 x i8> %a to <8 x double>		%1 = uitofp <8 x i8> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @uitofpv16i8v16double(<16 x i8> %a) {		define <16 x double> @uitofpv16i8v16double(<16 x i8> %a) {
; SSE2-LABEL: uitofpv16i8v16double		; SSE2-LABEL: uitofpv16i8v16double
; SSE2: cost of 160 {{.*}} uitofp		; SSE2: cost of 160 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i8v16double		; AVX1-LABEL: uitofpv16i8v16double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 11 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i8v16double		; AVX2-LABEL: uitofpv16i8v16double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 11 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i8v16double		; AVX512F-LABEL: uitofpv16i8v16double
; AVX512F: cost of 44 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
%1 = uitofp <16 x i8> %a to <16 x double>		%1 = uitofp <16 x i8> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @uitofpv32i8v32double(<32 x i8> %a) {		define <32 x double> @uitofpv32i8v32double(<32 x i8> %a) {
; SSE2-LABEL: uitofpv32i8v32double		; SSE2-LABEL: uitofpv32i8v32double
; SSE2: cost of 320 {{.*}} uitofp		; SSE2: cost of 320 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i8v32double		; AVX1-LABEL: uitofpv32i8v32double
; AVX1: cost of 80 {{.*}} uitofp		; AVX1: cost of 23 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i8v32double		; AVX2-LABEL: uitofpv32i8v32double
; AVX2: cost of 80 {{.*}} uitofp		; AVX2: cost of 23 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i8v32double		; AVX512F-LABEL: uitofpv32i8v32double
; AVX512F: cost of 88 {{.*}} uitofp		; AVX512F: cost of 11 {{.*}} uitofp
%1 = uitofp <32 x i8> %a to <32 x double>		%1 = uitofp <32 x i8> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @uitofpv2i16v2double(<2 x i16> %a) {		define <2 x double> @uitofpv2i16v2double(<2 x i16> %a) {
; SSE2-LABEL: uitofpv2i16v2double		; SSE2-LABEL: uitofpv2i16v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 20 {{.*}} uitofp
;		;
[25 lines not shown]	define <4 x double> @uitofpv4i16v4double(<4 x i16> %a) {
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @uitofpv8i16v8double(<8 x i16> %a) {		define <8 x double> @uitofpv8i16v8double(<8 x i16> %a) {
; SSE2-LABEL: uitofpv8i16v8double		; SSE2-LABEL: uitofpv8i16v8double
; SSE2: cost of 80 {{.*}} uitofp		; SSE2: cost of 80 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i16v8double		; AVX1-LABEL: uitofpv8i16v8double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 5 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i16v8double		; AVX2-LABEL: uitofpv8i16v8double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 5 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i16v8double		; AVX512F-LABEL: uitofpv8i16v8double
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <8 x i16> %a to <8 x double>		%1 = uitofp <8 x i16> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @uitofpv16i16v16double(<16 x i16> %a) {		define <16 x double> @uitofpv16i16v16double(<16 x i16> %a) {
; SSE2-LABEL: uitofpv16i16v16double		; SSE2-LABEL: uitofpv16i16v16double
; SSE2: cost of 160 {{.*}} uitofp		; SSE2: cost of 160 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i16v16double		; AVX1-LABEL: uitofpv16i16v16double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 11 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i16v16double		; AVX2-LABEL: uitofpv16i16v16double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 11 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i16v16double		; AVX512F-LABEL: uitofpv16i16v16double
; AVX512F: cost of 44 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
%1 = uitofp <16 x i16> %a to <16 x double>		%1 = uitofp <16 x i16> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @uitofpv32i16v32double(<32 x i16> %a) {		define <32 x double> @uitofpv32i16v32double(<32 x i16> %a) {
; SSE2-LABEL: uitofpv32i16v32double		; SSE2-LABEL: uitofpv32i16v32double
; SSE2: cost of 320 {{.*}} uitofp		; SSE2: cost of 320 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i16v32double		; AVX1-LABEL: uitofpv32i16v32double
; AVX1: cost of 80 {{.*}} uitofp		; AVX1: cost of 23 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i16v32double		; AVX2-LABEL: uitofpv32i16v32double
; AVX2: cost of 80 {{.*}} uitofp		; AVX2: cost of 23 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i16v32double		; AVX512F-LABEL: uitofpv32i16v32double
; AVX512F: cost of 88 {{.*}} uitofp		; AVX512F: cost of 11 {{.*}} uitofp
%1 = uitofp <32 x i16> %a to <32 x double>		%1 = uitofp <32 x i16> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @uitofpv2i32v2double(<2 x i32> %a) {		define <2 x double> @uitofpv2i32v2double(<2 x i32> %a) {
; SSE2-LABEL: uitofpv2i32v2double		; SSE2-LABEL: uitofpv2i32v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 20 {{.*}} uitofp
;		;
[25 lines not shown]	define <4 x double> @uitofpv4i32v4double(<4 x i32> %a) {
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @uitofpv8i32v8double(<8 x i32> %a) {		define <8 x double> @uitofpv8i32v8double(<8 x i32> %a) {
; SSE2-LABEL: uitofpv8i32v8double		; SSE2-LABEL: uitofpv8i32v8double
; SSE2: cost of 80 {{.*}} uitofp		; SSE2: cost of 80 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i32v8double		; AVX1-LABEL: uitofpv8i32v8double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 13 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i32v8double		; AVX2-LABEL: uitofpv8i32v8double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 13 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i32v8double		; AVX512F-LABEL: uitofpv8i32v8double
; AVX512F: cost of 1 {{.*}} uitofp		; AVX512F: cost of 1 {{.*}} uitofp
%1 = uitofp <8 x i32> %a to <8 x double>		%1 = uitofp <8 x i32> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @uitofpv16i32v16double(<16 x i32> %a) {		define <16 x double> @uitofpv16i32v16double(<16 x i32> %a) {
; SSE2-LABEL: uitofpv16i32v16double		; SSE2-LABEL: uitofpv16i32v16double
; SSE2: cost of 160 {{.*}} uitofp		; SSE2: cost of 160 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i32v16double		; AVX1-LABEL: uitofpv16i32v16double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 27 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i32v16double		; AVX2-LABEL: uitofpv16i32v16double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 27 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i32v16double		; AVX512F-LABEL: uitofpv16i32v16double
; AVX512F: cost of 44 {{.*}} uitofp		; AVX512F: cost of 3 {{.*}} uitofp
%1 = uitofp <16 x i32> %a to <16 x double>		%1 = uitofp <16 x i32> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @uitofpv32i32v32double(<32 x i32> %a) {		define <32 x double> @uitofpv32i32v32double(<32 x i32> %a) {
; SSE2-LABEL: uitofpv32i32v32double		; SSE2-LABEL: uitofpv32i32v32double
; SSE2: cost of 320 {{.*}} uitofp		; SSE2: cost of 320 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i32v32double		; AVX1-LABEL: uitofpv32i32v32double
; AVX1: cost of 80 {{.*}} uitofp		; AVX1: cost of 55 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i32v32double		; AVX2-LABEL: uitofpv32i32v32double
; AVX2: cost of 80 {{.*}} uitofp		; AVX2: cost of 55 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i32v32double		; AVX512F-LABEL: uitofpv32i32v32double
; AVX512F: cost of 88 {{.*}} uitofp		; AVX512F: cost of 7 {{.*}} uitofp
%1 = uitofp <32 x i32> %a to <32 x double>		%1 = uitofp <32 x i32> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @uitofpv2i64v2double(<2 x i64> %a) {		define <2 x double> @uitofpv2i64v2double(<2 x i64> %a) {
; SSE2-LABEL: uitofpv2i64v2double		; SSE2-LABEL: uitofpv2i64v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 20 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i64v2double		; AVX1-LABEL: uitofpv2i64v2double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 20 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i64v2double		; AVX2-LABEL: uitofpv2i64v2double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 20 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i64v2double		; AVX512F-LABEL: uitofpv2i64v2double
; AVX512F: cost of 5 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
;		;
; AVX512DQ: uitofpv2i64v2double		; AVX512DQ-LABEL: uitofpv2i64v2double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <2 x i64> %a to <2 x double>		%1 = uitofp <2 x i64> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @uitofpv4i64v4double(<4 x i64> %a) {		define <4 x double> @uitofpv4i64v4double(<4 x i64> %a) {
; SSE2-LABEL: uitofpv4i64v4double		; SSE2-LABEL: uitofpv4i64v4double
; SSE2: cost of 40 {{.*}} uitofp		; SSE2: cost of 40 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv4i64v4double		; AVX1-LABEL: uitofpv4i64v4double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 40 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv4i64v4double		; AVX2-LABEL: uitofpv4i64v4double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 40 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv4i64v4double		; AVX512F-LABEL: uitofpv4i64v4double
; AVX512F: cost of 12 {{.*}} uitofp		; AVX512F: cost of 12 {{.*}} uitofp
;		;
; AVX512DQ: uitofpv4i64v4double		; AVX512DQ-LABEL: uitofpv4i64v4double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <4 x i64> %a to <4 x double>		%1 = uitofp <4 x i64> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @uitofpv8i64v8double(<8 x i64> %a) {		define <8 x double> @uitofpv8i64v8double(<8 x i64> %a) {
; SSE2-LABEL: uitofpv8i64v8double		; SSE2-LABEL: uitofpv8i64v8double
; SSE2: cost of 80 {{.*}} uitofp		; SSE2: cost of 80 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i64v8double		; AVX1-LABEL: uitofpv8i64v8double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 81 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i64v8double		; AVX2-LABEL: uitofpv8i64v8double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 81 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i64v8double		; AVX512F-LABEL: uitofpv8i64v8double
; AVX512F: cost of 26 {{.*}} uitofp		; AVX512F: cost of 26 {{.*}} uitofp
;		;
; AVX512DQ: uitofpv8i64v8double		; AVX512DQ-LABEL: uitofpv8i64v8double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <8 x i64> %a to <8 x double>		%1 = uitofp <8 x i64> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @uitofpv16i64v16double(<16 x i64> %a) {		define <16 x double> @uitofpv16i64v16double(<16 x i64> %a) {
; SSE2-LABEL: uitofpv16i64v16double		; SSE2-LABEL: uitofpv16i64v16double
; SSE2: cost of 160 {{.*}} uitofp		; SSE2: cost of 160 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i64v16double		; AVX1-LABEL: uitofpv16i64v16double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 163 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i64v16double		; AVX2-LABEL: uitofpv16i64v16double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 163 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i64v16double		; AVX512F-LABEL: uitofpv16i64v16double
; AVX512F: cost of 44 {{.*}} uitofp		; AVX512F: cost of 53 {{.*}} uitofp
;		;
; AVX512DQ: uitofpv16i64v16double		; AVX512DQ-LABEL: uitofpv16i64v16double
; AVX512DQ: cost of 44 {{.*}} uitofp		; AVX512DQ: cost of 3 {{.*}} uitofp
%1 = uitofp <16 x i64> %a to <16 x double>		%1 = uitofp <16 x i64> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @uitofpv32i64v32double(<32 x i64> %a) {		define <32 x double> @uitofpv32i64v32double(<32 x i64> %a) {
; SSE2-LABEL: uitofpv32i64v32double		; SSE2-LABEL: uitofpv32i64v32double
; SSE2: cost of 320 {{.*}} uitofp		; SSE2: cost of 320 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i64v32double		; AVX1-LABEL: uitofpv32i64v32double
; AVX1: cost of 80 {{.*}} uitofp		; AVX1: cost of 327 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i64v32double		; AVX2-LABEL: uitofpv32i64v32double
; AVX2: cost of 80 {{.*}} uitofp		; AVX2: cost of 327 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i64v32double		; AVX512F-LABEL: uitofpv32i64v32double
; AVX512F: cost of 88 {{.*}} uitofp		; AVX512F: cost of 107 {{.*}} uitofp
;		;
; AVX512DQ: uitofpv32i64v32double		; AVX512DQ-LABEL: uitofpv32i64v32double
; AVX512DQ: cost of 88 {{.*}} uitofp		; AVX512DQ: cost of 2 {{.*}} uitofp
%1 = uitofp <32 x i64> %a to <32 x double>		%1 = uitofp <32 x i64> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x float> @uitofpv2i8v2float(<2 x i8> %a) {		define <2 x float> @uitofpv2i8v2float(<2 x i8> %a) {
; SSE2-LABEL: uitofpv2i8v2float		; SSE2-LABEL: uitofpv2i8v2float
; SSE2: cost of 15 {{.*}} uitofp		; SSE2: cost of 15 {{.*}} uitofp
;		;
[41 lines not shown]	define <8 x float> @uitofpv8i8v8float(<8 x i8> %a) {
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @uitofpv16i8v16float(<16 x i8> %a) {		define <16 x float> @uitofpv16i8v16float(<16 x i8> %a) {
; SSE2-LABEL: uitofpv16i8v16float		; SSE2-LABEL: uitofpv16i8v16float
; SSE2: cost of 8 {{.*}} uitofp		; SSE2: cost of 8 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i8v16float		; AVX1-LABEL: uitofpv16i8v16float
; AVX1: cost of 44 {{.*}} uitofp		; AVX1: cost of 11 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i8v16float		; AVX2-LABEL: uitofpv16i8v16float
; AVX2: cost of 44 {{.*}} uitofp		; AVX2: cost of 11 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i8v16float		; AVX512F-LABEL: uitofpv16i8v16float
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <16 x i8> %a to <16 x float>		%1 = uitofp <16 x i8> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @uitofpv32i8v32float(<32 x i8> %a) {		define <32 x float> @uitofpv32i8v32float(<32 x i8> %a) {
; SSE2-LABEL: uitofpv32i8v32float		; SSE2-LABEL: uitofpv32i8v32float
; SSE2: cost of 16 {{.*}} uitofp		; SSE2: cost of 16 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i8v32float		; AVX1-LABEL: uitofpv32i8v32float
; AVX1: cost of 88 {{.*}} uitofp		; AVX1: cost of 23 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i8v32float		; AVX2-LABEL: uitofpv32i8v32float
; AVX2: cost of 88 {{.*}} uitofp		; AVX2: cost of 23 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i8v32float		; AVX512F-LABEL: uitofpv32i8v32float
; AVX512F: cost of 92 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
%1 = uitofp <32 x i8> %a to <32 x float>		%1 = uitofp <32 x i8> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @uitofpv2i16v2float(<2 x i16> %a) {		define <2 x float> @uitofpv2i16v2float(<2 x i16> %a) {
; SSE2-LABEL: uitofpv2i16v2float		; SSE2-LABEL: uitofpv2i16v2float
; SSE2: cost of 15 {{.*}} uitofp		; SSE2: cost of 15 {{.*}} uitofp
;		;
[41 lines not shown]	define <8 x float> @uitofpv8i16v8float(<8 x i16> %a) {
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @uitofpv16i16v16float(<16 x i16> %a) {		define <16 x float> @uitofpv16i16v16float(<16 x i16> %a) {
; SSE2-LABEL: uitofpv16i16v16float		; SSE2-LABEL: uitofpv16i16v16float
; SSE2: cost of 30 {{.*}} uitofp		; SSE2: cost of 30 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i16v16float		; AVX1-LABEL: uitofpv16i16v16float
; AVX1: cost of 44 {{.*}} uitofp		; AVX1: cost of 11 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i16v16float		; AVX2-LABEL: uitofpv16i16v16float
; AVX2: cost of 44 {{.*}} uitofp		; AVX2: cost of 11 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i16v16float		; AVX512F-LABEL: uitofpv16i16v16float
; AVX512F: cost of 2 {{.*}} uitofp		; AVX512F: cost of 2 {{.*}} uitofp
%1 = uitofp <16 x i16> %a to <16 x float>		%1 = uitofp <16 x i16> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @uitofpv32i16v32float(<32 x i16> %a) {		define <32 x float> @uitofpv32i16v32float(<32 x i16> %a) {
; SSE2-LABEL: uitofpv32i16v32float		; SSE2-LABEL: uitofpv32i16v32float
; SSE2: cost of 60 {{.*}} uitofp		; SSE2: cost of 60 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i16v32float		; AVX1-LABEL: uitofpv32i16v32float
; AVX1: cost of 88 {{.*}} uitofp		; AVX1: cost of 23 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i16v32float		; AVX2-LABEL: uitofpv32i16v32float
; AVX2: cost of 88 {{.*}} uitofp		; AVX2: cost of 23 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i16v32float		; AVX512F-LABEL: uitofpv32i16v32float
; AVX512F: cost of 92 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
%1 = uitofp <32 x i16> %a to <32 x float>		%1 = uitofp <32 x i16> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @uitofpv2i32v2float(<2 x i32> %a) {		define <2 x float> @uitofpv2i32v2float(<2 x i32> %a) {
; SSE2-LABEL: uitofpv2i32v2float		; SSE2-LABEL: uitofpv2i32v2float
; SSE2: cost of 15 {{.*}} uitofp		; SSE2: cost of 15 {{.*}} uitofp
;		;
[41 lines elided]	define <8 x float> @uitofpv8i32v8float(<8 x i32> %a) {
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @uitofpv16i32v16float(<16 x i32> %a) {		define <16 x float> @uitofpv16i32v16float(<16 x i32> %a) {
; SSE2-LABEL: uitofpv16i32v16float		; SSE2-LABEL: uitofpv16i32v16float
; SSE2: cost of 32 {{.*}} uitofp		; SSE2: cost of 32 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i32v16float		; AVX1-LABEL: uitofpv16i32v16float
; AVX1: cost of 44 {{.*}} uitofp		; AVX1: cost of 19 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i32v16float		; AVX2-LABEL: uitofpv16i32v16float
; AVX2: cost of 44 {{.*}} uitofp		; AVX2: cost of 17 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i32v16float		; AVX512F-LABEL: uitofpv16i32v16float
; AVX512F: cost of 1 {{.*}} uitofp		; AVX512F: cost of 1 {{.*}} uitofp
%1 = uitofp <16 x i32> %a to <16 x float>		%1 = uitofp <16 x i32> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @uitofpv32i32v32float(<32 x i32> %a) {		define <32 x float> @uitofpv32i32v32float(<32 x i32> %a) {
; SSE2-LABEL: uitofpv32i32v32float		; SSE2-LABEL: uitofpv32i32v32float
; SSE2: cost of 64 {{.*}} uitofp		; SSE2: cost of 64 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i32v32float		; AVX1-LABEL: uitofpv32i32v32float
; AVX1: cost of 88 {{.*}} uitofp		; AVX1: cost of 39 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i32v32float		; AVX2-LABEL: uitofpv32i32v32float
; AVX2: cost of 88 {{.*}} uitofp		; AVX2: cost of 35 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i32v32float		; AVX512F-LABEL: uitofpv32i32v32float
; AVX512F: cost of 92 {{.*}} uitofp		; AVX512F: cost of 3 {{.*}} uitofp
%1 = uitofp <32 x i32> %a to <32 x float>		%1 = uitofp <32 x i32> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <2 x float> @uitofpv2i64v2float(<2 x i64> %a) {		define <2 x float> @uitofpv2i64v2float(<2 x i64> %a) {
; SSE2-LABEL: uitofpv2i64v2float		; SSE2-LABEL: uitofpv2i64v2float
; SSE2: cost of 15 {{.*}} uitofp		; SSE2: cost of 15 {{.*}} uitofp
;		;
[25 lines elided]	define <4 x float> @uitofpv4i64v4float(<4 x i64> %a) {
ret <4 x float> %1		ret <4 x float> %1
}		}

define <8 x float> @uitofpv8i64v8float(<8 x i64> %a) {		define <8 x float> @uitofpv8i64v8float(<8 x i64> %a) {
; SSE2-LABEL: uitofpv8i64v8float		; SSE2-LABEL: uitofpv8i64v8float
; SSE2: cost of 60 {{.*}} uitofp		; SSE2: cost of 60 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i64v8float		; AVX1-LABEL: uitofpv8i64v8float
; AVX1: cost of 22 {{.*}} uitofp		; AVX1: cost of 21 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i64v8float		; AVX2-LABEL: uitofpv8i64v8float
; AVX2: cost of 22 {{.*}} uitofp		; AVX2: cost of 21 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i64v8float		; AVX512F-LABEL: uitofpv8i64v8float
; AVX512F: cost of 22 {{.*}} uitofp		; AVX512F: cost of 22 {{.*}} uitofp
%1 = uitofp <8 x i64> %a to <8 x float>		%1 = uitofp <8 x i64> %a to <8 x float>
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @uitofpv16i64v16float(<16 x i64> %a) {		define <16 x float> @uitofpv16i64v16float(<16 x i64> %a) {
; SSE2-LABEL: uitofpv16i64v16float		; SSE2-LABEL: uitofpv16i64v16float
; SSE2: cost of 120 {{.*}} uitofp		; SSE2: cost of 120 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i64v16float		; AVX1-LABEL: uitofpv16i64v16float
; AVX1: cost of 44 {{.*}} uitofp		; AVX1: cost of 43 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i64v16float		; AVX2-LABEL: uitofpv16i64v16float
; AVX2: cost of 44 {{.*}} uitofp		; AVX2: cost of 43 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i64v16float		; AVX512F-LABEL: uitofpv16i64v16float
; AVX512F: cost of 46 {{.*}} uitofp		; AVX512F: cost of 45 {{.*}} uitofp
%1 = uitofp <16 x i64> %a to <16 x float>		%1 = uitofp <16 x i64> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @uitofpv32i64v32float(<32 x i64> %a) {		define <32 x float> @uitofpv32i64v32float(<32 x i64> %a) {
; SSE2-LABEL: uitofpv32i64v32float		; SSE2-LABEL: uitofpv32i64v32float
; SSE2: cost of 240 {{.*}} uitofp		; SSE2: cost of 240 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i64v32float		; AVX1-LABEL: uitofpv32i64v32float
; AVX1: cost of 88 {{.*}} uitofp		; AVX1: cost of 87 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i64v32float		; AVX2-LABEL: uitofpv32i64v32float
; AVX2: cost of 88 {{.*}} uitofp		; AVX2: cost of 87 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i64v32float		; AVX512F-LABEL: uitofpv32i64v32float
; AVX512F: cost of 92 {{.*}} uitofp		; AVX512F: cost of 91 {{.*}} uitofp
%1 = uitofp <32 x i64> %a to <32 x float>		%1 = uitofp <32 x i64> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <8 x i32> @fptouiv8f32v8i32(<8 x float> %a) {		define <8 x i32> @fptouiv8f32v8i32(<8 x float> %a) {
; AVX512F-LABEL: fptouiv8f32v8i32		; AVX512F-LABEL: fptouiv8f32v8i32
; AVX512F: cost of 1 {{.*}} fptoui		; AVX512F: cost of 1 {{.*}} fptoui
%1 = fptoui <8 x float> %a to <8 x i32>		%1 = fptoui <8 x float> %a to <8 x i32>
[44 lines elided]

test/Transforms/LoopVectorize/X86/gather_scatter.ll

	[11 lines elided]
	; for (int i=0; i < SIZE; ++i) {			; for (int i=0; i < SIZE; ++i) {
	; if (trigger[i] > 0) {			; if (trigger[i] > 0) {
	; out[i] = in[index[i]] + (float) 0.5;			; out[i] = in[index[i]] + (float) 0.5;
	; }			; }
	; }			; }
	;}			;}

	;AVX512-LABEL: @foo1			;AVX512-LABEL: @foo1
	;AVX512: llvm.masked.load.v8i32			;AVX512: llvm.masked.load.v16i32
	;AVX512: llvm.masked.gather.v8f32			;AVX512: llvm.masked.gather.v16f32
	;AVX512: llvm.masked.store.v8f32			;AVX512: llvm.masked.store.v16f32
	;AVX512: ret void			;AVX512: ret void

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger, i32* noalias %index) {			define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger, i32* noalias %index) {
	entry:			entry:
	%in.addr = alloca float*, align 8			%in.addr = alloca float*, align 8
	%out.addr = alloca float*, align 8			%out.addr = alloca float*, align 8
	%trigger.addr = alloca i32*, align 8			%trigger.addr = alloca i32*, align 8
	[206 lines elided]