Goldmont is similar to Silvermont, so we should probably use the Silvermont cost model as a starting point.
Diff Detail: Repository rL LLVM
I suspect this is wrong for PMULLD. I think that improved to a single uop on Goldmont.
It turns out that almost all of the operations that are slow in the SLM table have been improved on GLM, so this isn't the right table to use as-is. The only thing that didn't change much was floating-point division.
I'm hoping to start work on PR36550 reasonably soon - a better approach might be to (a) ensure that the SLM model matches what the TTI says it should and (b) decide how best to provide a GLM model. What do you think?
Introduced a GLM-specific table that overrides FDIV, and overrode FSQRT for both GLM and SLM. For packed operations, it appears only half of the 128-bit vector is processed at a time on both SLM and GLM, since the packed throughput is twice the scalar throughput. The default SSE42 throughputs we were getting otherwise don't match that behavior.
Throughput data for GLM was taken from table 16-17 in the latest Intel Optimization Manual.
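To make the override scheme concrete, here is a minimal sketch of the lookup order being proposed: a small GLM-only table (FDIV stayed slow), a table shared by GLM and SLM (FSQRT), and the full SLM table consulted only for SLM, with everything else falling through to the generic cost. All names, costs, and the enum are illustrative assumptions for this sketch, not LLVM's actual X86TargetTransformInfo API or its real numbers.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical ops and cost entries; the real code keys on ISD opcodes and MVTs.
enum Op { FDIV_F32, FSQRT_F32, PMULLD };

struct CostEntry { Op op; unsigned cost; };

static std::optional<unsigned> lookup(const std::vector<CostEntry> &Table, Op O) {
  for (const auto &E : Table)
    if (E.op == O)
      return E.cost;
  return std::nullopt;
}

unsigned getArithCost(bool IsGLM, bool IsSLM, Op O) {
  // GLM-only overrides: FDIV is the one op that stayed slow on Goldmont.
  static const std::vector<CostEntry> GLMTable = {{FDIV_F32, 17}};
  // Overrides shared by GLM and SLM (FSQRT, per the patch description).
  static const std::vector<CostEntry> SharedTable = {{FSQRT_F32, 20}};
  // SLM-only entries: most of these improved on GLM, so GLM must not see them.
  static const std::vector<CostEntry> SLMTable = {{PMULLD, 11}};

  if (IsGLM)
    if (auto C = lookup(GLMTable, O))
      return *C;
  if (IsGLM || IsSLM)
    if (auto C = lookup(SharedTable, O))
      return *C;
  if (IsSLM)
    if (auto C = lookup(SLMTable, O))
      return *C;
  return 1; // fall through to the generic SSE42-style cost
}
```

With this ordering, GLM picks up the SLM FSQRT penalty and its own FDIV penalty, but PMULLD on GLM falls through to the default cost rather than inheriting SLM's slow entry.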
Should I make a copy of the SLM scheduler and use it for GLM so we can start refining it?
Probably - I was looking for tidy ways to override models such as these - architecturally the same but with a few latency tweaks - but couldn't see anything. It's probably easier just to copy it, maybe once you're happy with the accuracy of the SLM model.
These changes LGTM though