This is an archive of the discontinued LLVM Phabricator instance.

Differential D22064

[X86] Make some cast costs more precise
Closed, Public

Authored by mkuper on Jul 6 2016, 1:06 PM.


Details

Reviewers

spatel
RKSimon
delena

Commits

rGf0c59330e914: [X86] Make some cast costs more precise
rL275106: [X86] Make some cast costs more precise

Summary

This brings back from the dead the AVX table parts of Elena's D15604.

The SSE changes seem more closely bound to the strategy change (since the current costs use the weird "convert to simple types, then look up" form), so I'll look into them separately.

Diff Detail

Event Timeline

mkuper updated this revision to Diff 62946. Jul 6 2016, 1:06 PM

mkuper retitled this revision to [X86] Make some cast costs more precise.

mkuper updated this object.

mkuper added reviewers: delena, RKSimon, spatel.

mkuper added a subscriber: llvm-commits.

RKSimon added inline comments. Jul 6 2016, 1:15 PM

lib/Target/X86/X86TargetTransformInfo.cpp
540	Depending on how thorough we need to be, shouldn't there be AVX512DQ+AVX512VL UINT_TO_FP cases for 128/256 bit vectors?

mkuper added inline comments. Jul 6 2016, 1:20 PM

lib/Target/X86/X86TargetTransformInfo.cpp
540	Probably. I'd rather leave that to the Intel folks, they can probably get more precise numbers for SKX.

delena added inline comments. Jul 6 2016, 11:29 PM

lib/Target/X86/X86TargetTransformInfo.cpp
540	In this case, even if you have only DQ without VL, the conversion is in ZMM instead of YMM, but the cost is the same.
test/Analysis/CostModel/X86/sitofp.ll
273	We should have a nicer cost for DQ here, because it handles all 64 bit integers, right?

mkuper added inline comments. Jul 7 2016, 10:44 AM

lib/Target/X86/X86TargetTransformInfo.cpp
540	We don't do this right now, see below.
test/Analysis/CostModel/X86/sitofp.ll
273	Right now, we scalarize this unless we have VL.

That is, both F and F+DQ produce:

	vextracti128	$1, %ymm0, %xmm1
	vpextrq	$1, %xmm1, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm2
	vmovq	%xmm1, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm1
	vunpcklpd	%xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0]
	vpextrq	$1, %xmm0, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm2
	vmovq	%xmm0, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm0
	vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
	vinsertf128	$1, %xmm1, %ymm0, %ymm0
	retq

And with VL:

	vcvtqq2pd	%ymm0, %ymm0
	retq

I guess we could, potentially, have a nicer sequence with DQ without VL (insert low lanes, vcvtqq2pd, extract low lanes), but we currently don't.
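The "insert low lanes, vcvtqq2pd, extract low lanes" idea can be modelled in plain C++ to show why it gives the same result as a 256-bit conversion. This is only a scalar sketch of the semantics, not the code the backend emits; `cvtqq2pd512`, `sitofpV4ViaZmm` and `checkWiden` are hypothetical names made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of one 512-bit vcvtqq2pd: converts 8 signed 64-bit lanes.
std::array<double, 8> cvtqq2pd512(const std::array<std::int64_t, 8> &In) {
  std::array<double, 8> Out{};
  for (std::size_t I = 0; I < 8; ++I)
    Out[I] = static_cast<double>(In[I]);
  return Out;
}

// Hypothetical helper: widen 4 lanes to 8, convert, keep the low 4 lanes.
std::array<double, 4> sitofpV4ViaZmm(const std::array<std::int64_t, 4> &In) {
  // "Insert low lanes": the upper 4 lanes are don't-care, zeroed here.
  std::array<std::int64_t, 8> Wide{};
  for (std::size_t I = 0; I < 4; ++I)
    Wide[I] = In[I];

  // One wide conversion instead of four scalar ones.
  const std::array<double, 8> Conv = cvtqq2pd512(Wide);

  // "Extract low lanes".
  std::array<double, 4> Out{};
  for (std::size_t I = 0; I < 4; ++I)
    Out[I] = Conv[I];
  return Out;
}

// Small usage check: each lane matches a per-element conversion.
void checkWiden() {
  const std::array<std::int64_t, 4> In{1, -2, 3, -4};
  const std::array<double, 4> Out = sitofpV4ViaZmm(In);
  for (std::size_t I = 0; I < 4; ++I)
    assert(Out[I] == static_cast<double>(In[I]));
}
```

In this model the upper lanes never influence the low four results, which is what would make the wide instruction usable without VL.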

RKSimon added inline comments.

A couple of minors but otherwise this looks good to me. The AVX512 people should probably give the final OK though.

lib/Target/X86/X86TargetTransformInfo.cpp
540	OK - please add a TODO comment to the table for the AVX512DQ 128/256 entries.
717	Test?

mkuper added inline comments.

Thanks, Simon!

lib/Target/X86/X86TargetTransformInfo.cpp
540	Ack.
717	Right, thanks. Elena's original patch didn't have one, and I didn't notice. I'll add.

delena accepted this revision. Jul 9 2016, 11:13 PM

delena edited edge metadata.

This revision is now accepted and ready to land. Jul 9 2016, 11:13 PM

Closed by commit rL275106: [X86] Make some cast costs more precise (authored by mkuper). Jul 11 2016, 2:47 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/X86/X86TargetTransformInfo.cpp	16 lines
test/Analysis/CostModel/X86/sitofp.ll	18 lines
test/Analysis/CostModel/X86/uitofp.ll	34 lines
test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll	4 lines

Diff 62946

lib/Target/X86/X86TargetTransformInfo.cpp

...	int X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) {
// potential massive combinations (elem_num x src_type x dst_type).		// potential massive combinations (elem_num x src_type x dst_type).

static const TypeConversionCostTblEntry AVX512DQConversionTbl[] = {		static const TypeConversionCostTblEntry AVX512DQConversionTbl[] = {
{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i64, 1 },		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i64, 1 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 1 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 1 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i64, 1 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i64, 1 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 1 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 1 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i64, 1 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i64, 1 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 1 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 1 },

{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f32, 1 },		{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f32, 1 },
{ ISD::FP_TO_UINT, MVT::v4i64, MVT::v4f32, 1 },		{ ISD::FP_TO_UINT, MVT::v4i64, MVT::v4f32, 1 },
{ ISD::FP_TO_UINT, MVT::v8i64, MVT::v8f32, 1 },		{ ISD::FP_TO_UINT, MVT::v8i64, MVT::v8f32, 1 },
{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f64, 1 },		{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f64, 1 },
{ ISD::FP_TO_UINT, MVT::v4i64, MVT::v4f64, 1 },		{ ISD::FP_TO_UINT, MVT::v4i64, MVT::v4f64, 1 },
{ ISD::FP_TO_UINT, MVT::v8i64, MVT::v8f64, 1 },		{ ISD::FP_TO_UINT, MVT::v8i64, MVT::v8f64, 1 },
};		};
...	static const TypeConversionCostTblEntry AVX512FConversionTbl[] = {
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },
		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i64, 26 },
		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 26 },

{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 2 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 2 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 2 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 5 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 5 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 2 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 2 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },
{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i32, 2 },		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i32, 2 },
		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 1 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 1 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 1 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 1 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 1 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 1 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 1 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },
		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i64, 5 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 5 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 5 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 12 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 12 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 26 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 26 },

{ ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },		{ ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },
{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },		{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },
{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 1 },		{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 1 },
{ ISD::FP_TO_UINT, MVT::v16i32, MVT::v16f32, 1 },		{ ISD::FP_TO_UINT, MVT::v16i32, MVT::v16f32, 1 },
...	static const TypeConversionCostTblEntry AVXConversionTbl[] = {
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i1, 7 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i1, 7 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i1, 6 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i1, 6 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i8, 2 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i8, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 5 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 5 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i16, 2 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i16, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 5 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 5 },
		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 6 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 6 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 6 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 6 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 6 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 9 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 9 },
// The generic code to compute the scalar overhead is currently broken.		// The generic code to compute the scalar overhead is currently broken.
// Workaround this limitation by estimating the scalarization overhead		// Workaround this limitation by estimating the scalarization overhead
// here. We have roughly 10 instructions per scalar element.		// here. We have roughly 10 instructions per scalar element.
// Multiply that by the vector width.		// Multiply that by the vector width.
// FIXME: remove that when PR19268 is fixed.		// FIXME: remove that when PR19268 is fixed.
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 2*10 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 10 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 4*10 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 20 },
		{ ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 13 },
		{ ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 13 },

{ ISD::FP_TO_SINT, MVT::v4i8, MVT::v4f32, 1 },		{ ISD::FP_TO_SINT, MVT::v4i8, MVT::v4f32, 1 },
{ ISD::FP_TO_SINT, MVT::v8i8, MVT::v8f32, 7 },		{ ISD::FP_TO_SINT, MVT::v8i8, MVT::v8f32, 7 },
// This node is expanded into scalarized operations but BasicTTI is overly		// This node is expanded into scalarized operations but BasicTTI is overly
// optimistic estimating its cost. It computes 3 per element (one		// optimistic estimating its cost. It computes 3 per element (one
// vector-extract, one scalar conversion and one vector-insert). The		// vector-extract, one scalar conversion and one vector-insert). The
// problem is that the inserts form a read-modify-write chain so latency		// problem is that the inserts form a read-modify-write chain so latency
// should be factored in too. Inflating the cost per element by 1.		// should be factored in too. Inflating the cost per element by 1.
{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 8*4 },		{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 8*4 },
{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 4*4 },		{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 4*4 },

		{ ISD::FP_EXTEND, MVT::v4f64, MVT::v4f32, 1 },
		{ ISD::FP_ROUND, MVT::v4f32, MVT::v4f64, 1 },
};		};

static const TypeConversionCostTblEntry SSE41ConversionTbl[] = {		static const TypeConversionCostTblEntry SSE41ConversionTbl[] = {
{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i8, 2 },		{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i8, 2 },
{ ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i8, 2 },		{ ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i8, 2 },
{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 2 },		{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 2 },
{ ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 2 },		{ ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 2 },
{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 2 },		{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 2 },
...
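The tables in this diff are keyed on (opcode, destination type, source type), and the test expectations suggest they are tried from the most specific feature set down. The following is a minimal sketch of that kind of layered lookup, assuming a simple linear search and a fixed DQ, F, AVX order. The enums, `CostEntry`, `Features`, `lookup`, `castCost` and `checkLookup` are hypothetical stand-ins rather than LLVM's real types or API; only the cost values are copied from the right-hand side of the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-ins for ISD opcodes and MVT types (not LLVM's enums).
enum class Op { UIntToFP, SIntToFP };
enum class VT { v2i64, v4i64, v8i64, v2f64, v4f64, v8f64 };

// Simplified analogue of one conversion cost table row.
struct CostEntry {
  Op Opcode;
  VT Dst;
  VT Src;
  int Cost;
};

// A few rows copied from the new side of the diff.
constexpr CostEntry DQTbl[] = {
    {Op::UIntToFP, VT::v2f64, VT::v2i64, 1},
    {Op::UIntToFP, VT::v4f64, VT::v4i64, 1},
    {Op::UIntToFP, VT::v8f64, VT::v8i64, 1},
};
constexpr CostEntry FTbl[] = {
    {Op::UIntToFP, VT::v2f64, VT::v2i64, 5},
    {Op::UIntToFP, VT::v4f64, VT::v4i64, 12},
    {Op::UIntToFP, VT::v8f64, VT::v8i64, 26},
};
constexpr CostEntry AVXTbl[] = {
    {Op::UIntToFP, VT::v2f64, VT::v2i64, 10},
    {Op::UIntToFP, VT::v4f64, VT::v4i64, 20},
    {Op::SIntToFP, VT::v4f64, VT::v4i64, 13},
};

// Linear search for an exact (opcode, dst, src) match.
template <std::size_t N>
std::optional<int> lookup(const CostEntry (&Tbl)[N], Op O, VT Dst, VT Src) {
  for (const CostEntry &E : Tbl)
    if (E.Opcode == O && E.Dst == Dst && E.Src == Src)
      return E.Cost;
  return std::nullopt;
}

// Hypothetical subtarget feature flags.
struct Features {
  bool AVX512DQ;
  bool AVX512F;
  bool AVX;
};

// Try the most specific table first and fall through on a miss.
std::optional<int> castCost(const Features &F, Op O, VT Dst, VT Src) {
  if (F.AVX512DQ)
    if (auto C = lookup(DQTbl, O, Dst, Src))
      return C;
  if (F.AVX512F)
    if (auto C = lookup(FTbl, O, Dst, Src))
      return C;
  if (F.AVX)
    if (auto C = lookup(AVXTbl, O, Dst, Src))
      return C;
  return std::nullopt; // A real cost model would fall back to a generic cost.
}

// Usage check against values that appear in the tables and tests.
void checkLookup() {
  assert(castCost({true, true, true}, Op::UIntToFP, VT::v4f64, VT::v4i64) == 1);
  assert(castCost({false, true, true}, Op::UIntToFP, VT::v4f64, VT::v4i64) == 12);
  assert(castCost({false, false, true}, Op::UIntToFP, VT::v4f64, VT::v4i64) == 20);
  // No AVX512F row for this key in the sketch, so the AVX row is used.
  assert(castCost({false, true, true}, Op::SIntToFP, VT::v4f64, VT::v4i64) == 13);
}
```

The last check mirrors the sitofp.ll expectation that AVX512F reports the same cost of 13 as AVX1 and AVX2 for `sitofpv4i64v4double`.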

test/Analysis/CostModel/X86/sitofp.ll

...	define <2 x double> @sitofpv2i64v2double(<2 x i64> %a) {
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @sitofpv4i64v4double(<4 x i64> %a) {		define <4 x double> @sitofpv4i64v4double(<4 x i64> %a) {
; SSE2-LABEL: sitofpv4i64v4double		; SSE2-LABEL: sitofpv4i64v4double
; SSE2: cost of 40 {{.*}} sitofp		; SSE2: cost of 40 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv4i64v4double		; AVX1-LABEL: sitofpv4i64v4double
; AVX1: cost of 10 {{.*}} sitofp		; AVX1: cost of 13 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv4i64v4double		; AVX2-LABEL: sitofpv4i64v4double
; AVX2: cost of 10 {{.*}} sitofp		; AVX2: cost of 13 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv4i64v4double		; AVX512F-LABEL: sitofpv4i64v4double
; AVX512F: cost of 10 {{.*}} sitofp		; AVX512F: cost of 13 {{.*}} sitofp
%1 = sitofp <4 x i64> %a to <4 x double>		%1 = sitofp <4 x i64> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @sitofpv8i64v8double(<8 x i64> %a) {		define <8 x double> @sitofpv8i64v8double(<8 x i64> %a) {
; SSE2-LABEL: sitofpv8i64v8double		; SSE2-LABEL: sitofpv8i64v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 80 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i64v8double		; AVX1-LABEL: sitofpv8i64v8double
; AVX1: cost of 21 {{.*}} sitofp		; AVX1: cost of 27 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i64v8double		; AVX2-LABEL: sitofpv8i64v8double
; AVX2: cost of 21 {{.*}} sitofp		; AVX2: cost of 27 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i64v8double		; AVX512F-LABEL: sitofpv8i64v8double
; AVX512F: cost of 22 {{.*}} sitofp		; AVX512F: cost of 22 {{.*}} sitofp
%1 = sitofp <8 x i64> %a to <8 x double>		%1 = sitofp <8 x i64> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @sitofpv16i64v16double(<16 x i64> %a) {		define <16 x double> @sitofpv16i64v16double(<16 x i64> %a) {
; SSE2-LABEL: sitofpv16i64v16double		; SSE2-LABEL: sitofpv16i64v16double
; SSE2: cost of 160 {{.*}} sitofp		; SSE2: cost of 160 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i64v16double		; AVX1-LABEL: sitofpv16i64v16double
; AVX1: cost of 43 {{.*}} sitofp		; AVX1: cost of 55 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i64v16double		; AVX2-LABEL: sitofpv16i64v16double
; AVX2: cost of 43 {{.*}} sitofp		; AVX2: cost of 55 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i64v16double		; AVX512F-LABEL: sitofpv16i64v16double
; AVX512F: cost of 45 {{.*}} sitofp		; AVX512F: cost of 45 {{.*}} sitofp
%1 = sitofp <16 x i64> %a to <16 x double>		%1 = sitofp <16 x i64> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @sitofpv32i64v32double(<32 x i64> %a) {		define <32 x double> @sitofpv32i64v32double(<32 x i64> %a) {
; SSE2-LABEL: sitofpv32i64v32double		; SSE2-LABEL: sitofpv32i64v32double
; SSE2: cost of 320 {{.*}} sitofp		; SSE2: cost of 320 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i64v32double		; AVX1-LABEL: sitofpv32i64v32double
; AVX1: cost of 87 {{.*}} sitofp		; AVX1: cost of 111 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i64v32double		; AVX2-LABEL: sitofpv32i64v32double
; AVX2: cost of 87 {{.*}} sitofp		; AVX2: cost of 111 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i64v32double		; AVX512F-LABEL: sitofpv32i64v32double
; AVX512F: cost of 91 {{.*}} sitofp		; AVX512F: cost of 91 {{.*}} sitofp
%1 = sitofp <32 x i64> %a to <32 x double>		%1 = sitofp <32 x i64> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x float> @sitofpv2i8v2float(<2 x i8> %a) {		define <2 x float> @sitofpv2i8v2float(<2 x i8> %a) {
...

test/Analysis/CostModel/X86/uitofp.ll

...	define <32 x double> @uitofpv32i16v32double(<32 x i16> %a) {
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @uitofpv2i32v2double(<2 x i32> %a) {		define <2 x double> @uitofpv2i32v2double(<2 x i32> %a) {
; SSE2-LABEL: uitofpv2i32v2double		; SSE2-LABEL: uitofpv2i32v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 20 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i32v2double		; AVX1-LABEL: uitofpv2i32v2double
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 6 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i32v2double		; AVX2-LABEL: uitofpv2i32v2double
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 6 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i32v2double		; AVX512F-LABEL: uitofpv2i32v2double
; AVX512F: cost of 4 {{.*}} uitofp		; AVX512F: cost of 1 {{.*}} uitofp
%1 = uitofp <2 x i32> %a to <2 x double>		%1 = uitofp <2 x i32> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @uitofpv4i32v4double(<4 x i32> %a) {		define <4 x double> @uitofpv4i32v4double(<4 x i32> %a) {
; SSE2-LABEL: uitofpv4i32v4double		; SSE2-LABEL: uitofpv4i32v4double
; SSE2: cost of 40 {{.*}} uitofp		; SSE2: cost of 40 {{.*}} uitofp
;		;
...	define <32 x double> @uitofpv32i32v32double(<32 x i32> %a) {
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @uitofpv2i64v2double(<2 x i64> %a) {		define <2 x double> @uitofpv2i64v2double(<2 x i64> %a) {
; SSE2-LABEL: uitofpv2i64v2double		; SSE2-LABEL: uitofpv2i64v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 20 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i64v2double		; AVX1-LABEL: uitofpv2i64v2double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 10 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i64v2double		; AVX2-LABEL: uitofpv2i64v2double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 10 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i64v2double		; AVX512F-LABEL: uitofpv2i64v2double
; AVX512F: cost of 5 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
;		;
; AVX512DQ-LABEL: uitofpv2i64v2double		; AVX512DQ-LABEL: uitofpv2i64v2double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <2 x i64> %a to <2 x double>		%1 = uitofp <2 x i64> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @uitofpv4i64v4double(<4 x i64> %a) {		define <4 x double> @uitofpv4i64v4double(<4 x i64> %a) {
; SSE2-LABEL: uitofpv4i64v4double		; SSE2-LABEL: uitofpv4i64v4double
; SSE2: cost of 40 {{.*}} uitofp		; SSE2: cost of 40 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv4i64v4double		; AVX1-LABEL: uitofpv4i64v4double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 20 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv4i64v4double		; AVX2-LABEL: uitofpv4i64v4double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 20 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv4i64v4double		; AVX512F-LABEL: uitofpv4i64v4double
; AVX512F: cost of 12 {{.*}} uitofp		; AVX512F: cost of 12 {{.*}} uitofp
;		;
; AVX512DQ-LABEL: uitofpv4i64v4double		; AVX512DQ-LABEL: uitofpv4i64v4double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <4 x i64> %a to <4 x double>		%1 = uitofp <4 x i64> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @uitofpv8i64v8double(<8 x i64> %a) {		define <8 x double> @uitofpv8i64v8double(<8 x i64> %a) {
; SSE2-LABEL: uitofpv8i64v8double		; SSE2-LABEL: uitofpv8i64v8double
; SSE2: cost of 80 {{.*}} uitofp		; SSE2: cost of 80 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i64v8double		; AVX1-LABEL: uitofpv8i64v8double
; AVX1: cost of 81 {{.*}} uitofp		; AVX1: cost of 41 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i64v8double		; AVX2-LABEL: uitofpv8i64v8double
; AVX2: cost of 81 {{.*}} uitofp		; AVX2: cost of 41 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i64v8double		; AVX512F-LABEL: uitofpv8i64v8double
; AVX512F: cost of 26 {{.*}} uitofp		; AVX512F: cost of 26 {{.*}} uitofp
;		;
; AVX512DQ-LABEL: uitofpv8i64v8double		; AVX512DQ-LABEL: uitofpv8i64v8double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <8 x i64> %a to <8 x double>		%1 = uitofp <8 x i64> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @uitofpv16i64v16double(<16 x i64> %a) {		define <16 x double> @uitofpv16i64v16double(<16 x i64> %a) {
; SSE2-LABEL: uitofpv16i64v16double		; SSE2-LABEL: uitofpv16i64v16double
; SSE2: cost of 160 {{.*}} uitofp		; SSE2: cost of 160 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i64v16double		; AVX1-LABEL: uitofpv16i64v16double
; AVX1: cost of 163 {{.*}} uitofp		; AVX1: cost of 83 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i64v16double		; AVX2-LABEL: uitofpv16i64v16double
; AVX2: cost of 163 {{.*}} uitofp		; AVX2: cost of 83 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i64v16double		; AVX512F-LABEL: uitofpv16i64v16double
; AVX512F: cost of 53 {{.*}} uitofp		; AVX512F: cost of 53 {{.*}} uitofp
;		;
; AVX512DQ-LABEL: uitofpv16i64v16double		; AVX512DQ-LABEL: uitofpv16i64v16double
; AVX512DQ: cost of 3 {{.*}} uitofp		; AVX512DQ: cost of 3 {{.*}} uitofp
%1 = uitofp <16 x i64> %a to <16 x double>		%1 = uitofp <16 x i64> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @uitofpv32i64v32double(<32 x i64> %a) {		define <32 x double> @uitofpv32i64v32double(<32 x i64> %a) {
; SSE2-LABEL: uitofpv32i64v32double		; SSE2-LABEL: uitofpv32i64v32double
; SSE2: cost of 320 {{.*}} uitofp		; SSE2: cost of 320 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i64v32double		; AVX1-LABEL: uitofpv32i64v32double
; AVX1: cost of 327 {{.*}} uitofp		; AVX1: cost of 167 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i64v32double		; AVX2-LABEL: uitofpv32i64v32double
; AVX2: cost of 327 {{.*}} uitofp		; AVX2: cost of 167 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i64v32double		; AVX512F-LABEL: uitofpv32i64v32double
; AVX512F: cost of 107 {{.*}} uitofp		; AVX512F: cost of 107 {{.*}} uitofp
;		;
; AVX512DQ-LABEL: uitofpv32i64v32double		; AVX512DQ-LABEL: uitofpv32i64v32double
; AVX512DQ: cost of 2 {{.*}} uitofp		; AVX512DQ: cost of 2 {{.*}} uitofp
%1 = uitofp <32 x i64> %a to <32 x double>		%1 = uitofp <32 x i64> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
...	define <2 x float> @uitofpv2i64v2float(<2 x i64> %a) {
;		;
; AVX1-LABEL: uitofpv2i64v2float		; AVX1-LABEL: uitofpv2i64v2float
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 4 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i64v2float		; AVX2-LABEL: uitofpv2i64v2float
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 4 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i64v2float		; AVX512F-LABEL: uitofpv2i64v2float
; AVX512F: cost of 4 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
%1 = uitofp <2 x i64> %a to <2 x float>		%1 = uitofp <2 x i64> %a to <2 x float>
ret <2 x float> %1		ret <2 x float> %1
}		}

define <4 x float> @uitofpv4i64v4float(<4 x i64> %a) {		define <4 x float> @uitofpv4i64v4float(<4 x i64> %a) {
; SSE2-LABEL: uitofpv4i64v4float		; SSE2-LABEL: uitofpv4i64v4float
; SSE2: cost of 30 {{.*}} uitofp		; SSE2: cost of 30 {{.*}} uitofp
;		;
...	define <8 x float> @uitofpv8i64v8float(<8 x i64> %a) {
;		;
; AVX1-LABEL: uitofpv8i64v8float		; AVX1-LABEL: uitofpv8i64v8float
; AVX1: cost of 21 {{.*}} uitofp		; AVX1: cost of 21 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i64v8float		; AVX2-LABEL: uitofpv8i64v8float
; AVX2: cost of 21 {{.*}} uitofp		; AVX2: cost of 21 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i64v8float		; AVX512F-LABEL: uitofpv8i64v8float
; AVX512F: cost of 22 {{.*}} uitofp		; AVX512F: cost of 26 {{.*}} uitofp
%1 = uitofp <8 x i64> %a to <8 x float>		%1 = uitofp <8 x i64> %a to <8 x float>
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @uitofpv16i64v16float(<16 x i64> %a) {		define <16 x float> @uitofpv16i64v16float(<16 x i64> %a) {
; SSE2-LABEL: uitofpv16i64v16float		; SSE2-LABEL: uitofpv16i64v16float
; SSE2: cost of 120 {{.*}} uitofp		; SSE2: cost of 120 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i64v16float		; AVX1-LABEL: uitofpv16i64v16float
; AVX1: cost of 43 {{.*}} uitofp		; AVX1: cost of 43 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i64v16float		; AVX2-LABEL: uitofpv16i64v16float
; AVX2: cost of 43 {{.*}} uitofp		; AVX2: cost of 43 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i64v16float		; AVX512F-LABEL: uitofpv16i64v16float
; AVX512F: cost of 45 {{.*}} uitofp		; AVX512F: cost of 53 {{.*}} uitofp
%1 = uitofp <16 x i64> %a to <16 x float>		%1 = uitofp <16 x i64> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @uitofpv32i64v32float(<32 x i64> %a) {		define <32 x float> @uitofpv32i64v32float(<32 x i64> %a) {
; SSE2-LABEL: uitofpv32i64v32float		; SSE2-LABEL: uitofpv32i64v32float
; SSE2: cost of 240 {{.*}} uitofp		; SSE2: cost of 240 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i64v32float		; AVX1-LABEL: uitofpv32i64v32float
; AVX1: cost of 87 {{.*}} uitofp		; AVX1: cost of 87 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i64v32float		; AVX2-LABEL: uitofpv32i64v32float
; AVX2: cost of 87 {{.*}} uitofp		; AVX2: cost of 87 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i64v32float		; AVX512F-LABEL: uitofpv32i64v32float
; AVX512F: cost of 91 {{.*}} uitofp		; AVX512F: cost of 107 {{.*}} uitofp
%1 = uitofp <32 x i64> %a to <32 x float>		%1 = uitofp <32 x i64> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <8 x i32> @fptouiv8f32v8i32(<8 x float> %a) {		define <8 x i32> @fptouiv8f32v8i32(<8 x float> %a) {
; AVX512F-LABEL: fptouiv8f32v8i32		; AVX512F-LABEL: fptouiv8f32v8i32
; AVX512F: cost of 1 {{.*}} fptoui		; AVX512F: cost of 1 {{.*}} fptoui
%1 = fptoui <8 x float> %a to <8 x i32>		%1 = fptoui <8 x float> %a to <8 x i32>
...
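The expected costs for vectors wider than the widest table entry follow a regular pattern in these tests: each doubling of the vector width gives twice the previous cost plus one. The sketch below reproduces that arithmetic. It assumes the extra 1 is a per-split overhead, which the numbers are consistent with but which the review itself does not state; `splitCastCost` and `checkSplitCosts` are hypothetical names.

```cpp
#include <cassert>

// Cost of a cast after the vector had to be split in half `Halvings` times,
// given the table cost `LegalCost` of the widest type that has an entry.
// Assumption: every split doubles the work and adds a fixed overhead of 1.
constexpr int splitCastCost(int LegalCost, int Halvings) {
  int Cost = LegalCost;
  for (int I = 0; I < Halvings; ++I)
    Cost = 2 * Cost + 1;
  return Cost;
}

// Compare against the AVX1/AVX2 and AVX512F expectations in the tests.
void checkSplitCosts() {
  // uitofp <N x i64> to double on AVX, old entry 4*10 for v4i64.
  assert(splitCastCost(40, 1) == 81);
  assert(splitCastCost(40, 2) == 163);
  assert(splitCastCost(40, 3) == 327);
  // Same conversion with the new entry of 20.
  assert(splitCastCost(20, 1) == 41);
  assert(splitCastCost(20, 2) == 83);
  assert(splitCastCost(20, 3) == 167);
  // sitofp <N x i64> to double on AVX, new entry 13 for v4i64.
  assert(splitCastCost(13, 1) == 27);
  assert(splitCastCost(13, 2) == 55);
  assert(splitCastCost(13, 3) == 111);
  // uitofp on AVX512F, entry 26 for v8i64.
  assert(splitCastCost(26, 1) == 53);
  assert(splitCastCost(26, 2) == 107);
}
```

The AVX512DQ expectation of 2 for `uitofpv32i64v32double` does not fit this pattern; the review does not say why.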

test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll

	; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"


	; CHECK: cost of 20 for VF 2 For instruction: %conv = uitofp i64 %tmp to double			; CHECK: cost of 10 for VF 2 For instruction: %conv = uitofp i64 %tmp to double
	; CHECK: cost of 40 for VF 4 For instruction: %conv = uitofp i64 %tmp to double			; CHECK: cost of 20 for VF 4 For instruction: %conv = uitofp i64 %tmp to double
	define void @uint64_to_double_cost(i64* noalias nocapture %a, double* noalias nocapture readonly %b) nounwind {			define void @uint64_to_double_cost(i64* noalias nocapture %a, double* noalias nocapture readonly %b) nounwind {
	entry:			entry:
	br label %for.body			br label %for.body
	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i64, i64* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds i64, i64* %a, i64 %indvars.iv
	%tmp = load i64, i64* %arrayidx, align 4			%tmp = load i64, i64* %arrayidx, align 4
	%conv = uitofp i64 %tmp to double			%conv = uitofp i64 %tmp to double
	...
