This is an archive of the discontinued LLVM Phabricator instance.

Differential D22064

[X86] Make some cast costs more precise
ClosedPublic

Authored by mkuper on Jul 6 2016, 1:06 PM.

Details

Reviewers

spatel
RKSimon
delena

Commits

rGf0c59330e914: [X86] Make some cast costs more precise
rL275106: [X86] Make some cast costs more precise

Summary

This brings back from the dead the AVX table parts of Elena's D15604.

The SSE changes seem more closely bound to the strategy change (since the current costs use the weird "convert to simple types, then look up" form), so I'll look into that separately.
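
For readers unfamiliar with the cost tables, here is a minimal sketch of the kind of lookup they support. It is a simplified model, not LLVM's actual API: `Op`, `VT`, `CostEntry`, `AVXTable` and `lookupCost` are hypothetical names standing in for the `ISD` opcodes, `MVT` types, `TypeConversionCostTblEntry` rows and the real lookup helper. The three rows copy values from the AVX table in this revision.

```cpp
// Hypothetical stand-ins for ISD opcodes and MVT simple value types.
enum class Op { SIntToFP, UIntToFP };
enum class VT { v2i64, v4i64, v2f64, v4f64 };

// Simplified model of one conversion-cost table row.
struct CostEntry {
  Op Opcode;
  VT Dst;
  VT Src;
  int Cost;
};

// Values copied from the AVX rows in this revision.
static const CostEntry AVXTable[] = {
    {Op::UIntToFP, VT::v2f64, VT::v2i64, 10},
    {Op::UIntToFP, VT::v4f64, VT::v4i64, 20},
    {Op::SIntToFP, VT::v4f64, VT::v4i64, 13},
};

// Linear search over the table. Returns -1 when no row matches; a real
// cost model would then fall back to another table or a generic estimate.
int lookupCost(Op Opcode, VT Dst, VT Src) {
  for (const CostEntry &E : AVXTable)
    if (E.Opcode == Opcode && E.Dst == Dst && E.Src == Src)
      return E.Cost;
  return -1;
}
```

In this model, `lookupCost(Op::UIntToFP, VT::v4f64, VT::v4i64)` returns 20, while a query with no row, such as `Op::SIntToFP` from `VT::v2i64` to `VT::v2f64`, returns -1.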

Diff Detail

Repository: rL LLVM

Event Timeline

mkuper updated this revision to Diff 62946. Jul 6 2016, 1:06 PM

mkuper retitled this revision from to [X86] Make some cast costs more precise.

mkuper updated this object.

mkuper added reviewers: delena, RKSimon, spatel.

mkuper added a subscriber: llvm-commits.

RKSimon added inline comments. Jul 6 2016, 1:15 PM

lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗	(On Diff #62946)	Depending on how thorough we need to be shouldn't there be AVX512DQ+AVX512VL UINT_TO_FP cases for 128/256 bit vectors?

mkuper added inline comments. Jul 6 2016, 1:20 PM

lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗	(On Diff #62946)	Probably. I'd rather leave that to the Intel folks, they can probably get more precise numbers for SKX.

delena added inline comments. Jul 6 2016, 11:29 PM

lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗	(On Diff #62946)	In this case, even if you have only DQ without VL, the conversion is in ZMM instead of YMM, but the cost is the same.
test/Analysis/CostModel/X86/sitofp.ll
273 ↗	(On Diff #62946)	We should have a nicer cost for DQ here, because it handles all 64 bit integers, right?

mkuper added inline comments. Jul 7 2016, 10:44 AM

lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗	(On Diff #62946)	We don't do this right now, see below.
test/Analysis/CostModel/X86/sitofp.ll
273 ↗	(On Diff #62946)	Right now, we scalarize this unless we have VL.

That is, both F and F+DQ produce:

	vextracti128	$1, %ymm0, %xmm1
	vpextrq	$1, %xmm1, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm2
	vmovq	%xmm1, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm1
	vunpcklpd	%xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0]
	vpextrq	$1, %xmm0, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm2
	vmovq	%xmm0, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm0
	vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
	vinsertf128	$1, %xmm1, %ymm0, %ymm0
	retq

And with VL:

	vcvtqq2pd	%ymm0, %ymm0
	retq

I guess we could, potentially, have a nicer sequence with DQ without VL (insert low lanes, vcvtqq2pd, extract low lanes), but we currently don't.
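
The widening idea mentioned above can be modelled in plain C++. This is only an illustration of the data flow (insert into low lanes, convert at full width, extract low lanes), not code LLVM generates and not an intrinsics implementation; `convert8` and `convert4ViaWiden` are hypothetical names, and `convert8` stands in for a single 512-bit packed signed convert such as `vcvtqq2pd` on a ZMM register.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Stand-in for one full-width packed convert: all eight signed 64-bit
// lanes are converted to double.
std::array<double, 8> convert8(const std::array<std::int64_t, 8> &In) {
  std::array<double, 8> Out{};
  for (std::size_t I = 0; I < 8; ++I)
    Out[I] = static_cast<double>(In[I]);
  return Out;
}

// Hypothetical widening strategy for a 4-lane input:
// 1. place the input in the low lanes of a wider vector,
// 2. run the single full-width convert,
// 3. keep only the low four results.
std::array<double, 4> convert4ViaWiden(const std::array<std::int64_t, 4> &In) {
  std::array<std::int64_t, 8> Wide{}; // upper lanes are don't-care; zeroed here
  for (std::size_t I = 0; I < 4; ++I)
    Wide[I] = In[I];
  const std::array<double, 8> WideOut = convert8(Wide);
  std::array<double, 4> Out{};
  for (std::size_t I = 0; I < 4; ++I)
    Out[I] = WideOut[I];
  return Out;
}
```

The result is lane-for-lane identical to converting the four elements one by one, which is what the scalarized sequence does; the difference is in how many instructions each approach would need.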

A couple of minors but otherwise this looks good to me. The AVX512 people should probably give the final OK though.

lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗	(On Diff #62946)	OK - please add a TODO comment to the table for the AVX512DQ 128/256 entries.
717 ↗	(On Diff #62946)	Test?

Thanks, Simon!

lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗	(On Diff #62946)	Ack.
717 ↗	(On Diff #62946)	Right, thanks. Elena's original patch didn't have one, and I didn't notice. I'll add.

delena accepted this revision. Jul 9 2016, 11:13 PM

delena edited edge metadata.

This revision is now accepted and ready to land. Jul 9 2016, 11:13 PM

Closed by commit rL275106: [X86] Make some cast costs more precise (authored by mkuper). Jul 11 2016, 2:47 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp	19 lines
llvm/trunk/test/Analysis/CostModel/X86/cast.ll	18 lines
llvm/trunk/test/Analysis/CostModel/X86/sitofp.ll	18 lines
llvm/trunk/test/Analysis/CostModel/X86/uitofp.ll	34 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll	4 lines

Diff 63581

llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 541 Lines • ▼ Show 20 Lines	static const TypeConversionCostTblEntry AVX512DQConversionTbl[] = {
{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f32, 1 },		{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f32, 1 },
{ ISD::FP_TO_UINT, MVT::v4i64, MVT::v4f32, 1 },		{ ISD::FP_TO_UINT, MVT::v4i64, MVT::v4f32, 1 },
{ ISD::FP_TO_UINT, MVT::v8i64, MVT::v8f32, 1 },		{ ISD::FP_TO_UINT, MVT::v8i64, MVT::v8f32, 1 },
{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f64, 1 },		{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f64, 1 },
{ ISD::FP_TO_UINT, MVT::v4i64, MVT::v4f64, 1 },		{ ISD::FP_TO_UINT, MVT::v4i64, MVT::v4f64, 1 },
{ ISD::FP_TO_UINT, MVT::v8i64, MVT::v8f64, 1 },		{ ISD::FP_TO_UINT, MVT::v8i64, MVT::v8f64, 1 },
};		};

		// TODO: For AVX512DQ + AVX512VL, we also have cheap casts for 128-bit and
		// 256-bit wide vectors.

static const TypeConversionCostTblEntry AVX512FConversionTbl[] = {		static const TypeConversionCostTblEntry AVX512FConversionTbl[] = {
{ ISD::FP_EXTEND, MVT::v8f64, MVT::v8f32, 1 },		{ ISD::FP_EXTEND, MVT::v8f64, MVT::v8f32, 1 },
{ ISD::FP_EXTEND, MVT::v8f64, MVT::v16f32, 3 },		{ ISD::FP_EXTEND, MVT::v8f64, MVT::v16f32, 3 },
{ ISD::FP_ROUND, MVT::v8f32, MVT::v8f64, 1 },		{ ISD::FP_ROUND, MVT::v8f32, MVT::v8f64, 1 },

{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 1 },		{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 1 },
{ ISD::TRUNCATE, MVT::v16i16, MVT::v16i32, 1 },		{ ISD::TRUNCATE, MVT::v16i16, MVT::v16i32, 1 },
{ ISD::TRUNCATE, MVT::v8i16, MVT::v8i64, 1 },		{ ISD::TRUNCATE, MVT::v8i16, MVT::v8i64, 1 },
Show All 14 Lines	static const TypeConversionCostTblEntry AVX512FConversionTbl[] = {
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },
{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },		{ ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },
{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },		{ ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },
		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i64, 26 },
		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 26 },

{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 2 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 2 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 2 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 5 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 5 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 2 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 2 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },
{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i32, 2 },		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i32, 2 },
		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 1 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 1 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 1 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 1 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 1 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 1 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 1 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },
{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },		{ ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },
		{ ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i64, 5 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 5 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 5 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 12 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 12 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 26 },		{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 26 },

{ ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },		{ ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },
{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },		{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },
{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 1 },		{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 1 },
{ ISD::FP_TO_UINT, MVT::v16i32, MVT::v16f32, 1 },		{ ISD::FP_TO_UINT, MVT::v16i32, MVT::v16f32, 1 },
▲ Show 20 Lines • Show All 73 Lines • ▼ Show 20 Lines	static const TypeConversionCostTblEntry AVXConversionTbl[] = {
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i1, 7 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i1, 7 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i1, 6 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i1, 6 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i8, 2 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i8, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 5 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 5 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i16, 2 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i16, 2 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 5 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 5 },
		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 6 },
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 6 },		{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 6 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 6 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 6 },
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 9 },		{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 9 },
// The generic code to compute the scalar overhead is currently broken.		// The generic code to compute the scalar overhead is currently broken.
// Workaround this limitation by estimating the scalarization overhead		// Workaround this limitation by estimating the scalarization overhead
// here. We have roughly 10 instructions per scalar element.		// here. We have roughly 10 instructions per scalar element.
// Multiply that by the vector width.		// Multiply that by the vector width.
// FIXME: remove that when PR19268 is fixed.		// FIXME: remove that when PR19268 is fixed.
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 2*10 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 10 },
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 4*10 },		{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 20 },
		{ ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 13 },
		{ ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 13 },

{ ISD::FP_TO_SINT, MVT::v4i8, MVT::v4f32, 1 },		{ ISD::FP_TO_SINT, MVT::v4i8, MVT::v4f32, 1 },
{ ISD::FP_TO_SINT, MVT::v8i8, MVT::v8f32, 7 },		{ ISD::FP_TO_SINT, MVT::v8i8, MVT::v8f32, 7 },
// This node is expanded into scalarized operations but BasicTTI is overly		// This node is expanded into scalarized operations but BasicTTI is overly
// optimistic estimating its cost. It computes 3 per element (one		// optimistic estimating its cost. It computes 3 per element (one
// vector-extract, one scalar conversion and one vector-insert). The		// vector-extract, one scalar conversion and one vector-insert). The
// problem is that the inserts form a read-modify-write chain so latency		// problem is that the inserts form a read-modify-write chain so latency
// should be factored in too. Inflating the cost per element by 1.		// should be factored in too. Inflating the cost per element by 1.
{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 8*4 },		{ ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 8*4 },
{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 4*4 },		{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 4*4 },

		{ ISD::FP_EXTEND, MVT::v4f64, MVT::v4f32, 1 },
		{ ISD::FP_ROUND, MVT::v4f32, MVT::v4f64, 1 },
};		};

static const TypeConversionCostTblEntry SSE41ConversionTbl[] = {		static const TypeConversionCostTblEntry SSE41ConversionTbl[] = {
{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i8, 2 },		{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i8, 2 },
{ ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i8, 2 },		{ ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i8, 2 },
{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 2 },		{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 2 },
{ ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 2 },		{ ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 2 },
{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 2 },		{ ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 2 },
▲ Show 20 Lines • Show All 894 Lines • Show Last 20 Lines

llvm/trunk/test/Analysis/CostModel/X86/cast.ll

Show First 20 Lines • Show All 232 Lines • ▼ Show 20 Lines	; CHECK-LABEL: for function 'uitofp8'
; CHECK-AVX512: cost of 1 {{.*}} uitofp		; CHECK-AVX512: cost of 1 {{.*}} uitofp
; CHECK-AVX: cost of 9 {{.*}} uitofp		; CHECK-AVX: cost of 9 {{.*}} uitofp
%D1 = uitofp <8 x i32> %d to <8 x float>		%D1 = uitofp <8 x i32> %d to <8 x float>
ret void		ret void
}		}

define void @fp_conv(<8 x float> %a, <16 x float>%b, <4 x float> %c) {		define void @fp_conv(<8 x float> %a, <16 x float>%b, <4 x float> %c) {
;CHECK-LABEL: for function 'fp_conv'		;CHECK-LABEL: for function 'fp_conv'
; CHECK-AVX512: cost of 1 {{.*}} fpext		; CHECK: cost of 1 {{.*}} %A1 = fpext
%A1 = fpext <8 x float> %a to <8 x double>		%A1 = fpext <4 x float> %c to <4 x double>

; CHECK-AVX512: cost of 1 {{.*}} fpext		; CHECK-AVX: cost of 3 {{.*}} %A2 = fpext
%A2 = fpext <4 x float> %c to <4 x double>		; CHECK-AVX2: cost of 3 {{.*}} %A2 = fpext
		; CHECK-AVX512: cost of 1 {{.*}} %A2 = fpext
		%A2 = fpext <8 x float> %a to <8 x double>

; CHECK-AVX2: cost of 3 {{.*}} %A3 = fpext		; CHECK: cost of 1 {{.*}} %A3 = fptrunc
; CHECK-AVX512: cost of 1 {{.*}} %A3 = fpext		%A3 = fptrunc <4 x double> undef to <4 x float>
%A3 = fpext <8 x float> %a to <8 x double>

		; CHECK-AVX: cost of 3 {{.*}} %A4 = fptrunc
; CHECK-AVX2: cost of 3 {{.*}} %A4 = fptrunc		; CHECK-AVX2: cost of 3 {{.*}} %A4 = fptrunc
; CHECK-AVX512: cost of 1 {{.*}} %A4 = fptrunc		; CHECK-AVX512: cost of 1 {{.*}} %A4 = fptrunc
%A4 = fptrunc <8 x double> undef to <8 x float>		%A4 = fptrunc <8 x double> undef to <8 x float>

; CHECK-AVX512: cost of 1 {{.*}} %A5 = fptrunc
%A5 = fptrunc <4 x double> undef to <4 x float>
ret void		ret void
}		}

llvm/trunk/test/Analysis/CostModel/X86/sitofp.ll

Show First 20 Lines • Show All 258 Lines • ▼ Show 20 Lines	define <2 x double> @sitofpv2i64v2double(<2 x i64> %a) {
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @sitofpv4i64v4double(<4 x i64> %a) {		define <4 x double> @sitofpv4i64v4double(<4 x i64> %a) {
; SSE2-LABEL: sitofpv4i64v4double		; SSE2-LABEL: sitofpv4i64v4double
; SSE2: cost of 40 {{.*}} sitofp		; SSE2: cost of 40 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv4i64v4double		; AVX1-LABEL: sitofpv4i64v4double
; AVX1: cost of 10 {{.*}} sitofp		; AVX1: cost of 13 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv4i64v4double		; AVX2-LABEL: sitofpv4i64v4double
; AVX2: cost of 10 {{.*}} sitofp		; AVX2: cost of 13 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv4i64v4double		; AVX512F-LABEL: sitofpv4i64v4double
; AVX512F: cost of 10 {{.*}} sitofp		; AVX512F: cost of 13 {{.*}} sitofp
%1 = sitofp <4 x i64> %a to <4 x double>		%1 = sitofp <4 x i64> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @sitofpv8i64v8double(<8 x i64> %a) {		define <8 x double> @sitofpv8i64v8double(<8 x i64> %a) {
; SSE2-LABEL: sitofpv8i64v8double		; SSE2-LABEL: sitofpv8i64v8double
; SSE2: cost of 80 {{.*}} sitofp		; SSE2: cost of 80 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv8i64v8double		; AVX1-LABEL: sitofpv8i64v8double
; AVX1: cost of 21 {{.*}} sitofp		; AVX1: cost of 27 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv8i64v8double		; AVX2-LABEL: sitofpv8i64v8double
; AVX2: cost of 21 {{.*}} sitofp		; AVX2: cost of 27 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv8i64v8double		; AVX512F-LABEL: sitofpv8i64v8double
; AVX512F: cost of 22 {{.*}} sitofp		; AVX512F: cost of 22 {{.*}} sitofp
%1 = sitofp <8 x i64> %a to <8 x double>		%1 = sitofp <8 x i64> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @sitofpv16i64v16double(<16 x i64> %a) {		define <16 x double> @sitofpv16i64v16double(<16 x i64> %a) {
; SSE2-LABEL: sitofpv16i64v16double		; SSE2-LABEL: sitofpv16i64v16double
; SSE2: cost of 160 {{.*}} sitofp		; SSE2: cost of 160 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv16i64v16double		; AVX1-LABEL: sitofpv16i64v16double
; AVX1: cost of 43 {{.*}} sitofp		; AVX1: cost of 55 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv16i64v16double		; AVX2-LABEL: sitofpv16i64v16double
; AVX2: cost of 43 {{.*}} sitofp		; AVX2: cost of 55 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv16i64v16double		; AVX512F-LABEL: sitofpv16i64v16double
; AVX512F: cost of 45 {{.*}} sitofp		; AVX512F: cost of 45 {{.*}} sitofp
%1 = sitofp <16 x i64> %a to <16 x double>		%1 = sitofp <16 x i64> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @sitofpv32i64v32double(<32 x i64> %a) {		define <32 x double> @sitofpv32i64v32double(<32 x i64> %a) {
; SSE2-LABEL: sitofpv32i64v32double		; SSE2-LABEL: sitofpv32i64v32double
; SSE2: cost of 320 {{.*}} sitofp		; SSE2: cost of 320 {{.*}} sitofp
;		;
; AVX1-LABEL: sitofpv32i64v32double		; AVX1-LABEL: sitofpv32i64v32double
; AVX1: cost of 87 {{.*}} sitofp		; AVX1: cost of 111 {{.*}} sitofp
;		;
; AVX2-LABEL: sitofpv32i64v32double		; AVX2-LABEL: sitofpv32i64v32double
; AVX2: cost of 87 {{.*}} sitofp		; AVX2: cost of 111 {{.*}} sitofp
;		;
; AVX512F-LABEL: sitofpv32i64v32double		; AVX512F-LABEL: sitofpv32i64v32double
; AVX512F: cost of 91 {{.*}} sitofp		; AVX512F: cost of 91 {{.*}} sitofp
%1 = sitofp <32 x i64> %a to <32 x double>		%1 = sitofp <32 x i64> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x float> @sitofpv2i8v2float(<2 x i8> %a) {		define <2 x float> @sitofpv2i8v2float(<2 x i8> %a) {
▲ Show 20 Lines • Show All 352 Lines • Show Last 20 Lines

llvm/trunk/test/Analysis/CostModel/X86/uitofp.ll

Show First 20 Lines • Show All 163 Lines • ▼ Show 20 Lines	define <32 x double> @uitofpv32i16v32double(<32 x i16> %a) {
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @uitofpv2i32v2double(<2 x i32> %a) {		define <2 x double> @uitofpv2i32v2double(<2 x i32> %a) {
; SSE2-LABEL: uitofpv2i32v2double		; SSE2-LABEL: uitofpv2i32v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 20 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i32v2double		; AVX1-LABEL: uitofpv2i32v2double
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 6 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i32v2double		; AVX2-LABEL: uitofpv2i32v2double
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 6 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i32v2double		; AVX512F-LABEL: uitofpv2i32v2double
; AVX512F: cost of 4 {{.*}} uitofp		; AVX512F: cost of 1 {{.*}} uitofp
%1 = uitofp <2 x i32> %a to <2 x double>		%1 = uitofp <2 x i32> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @uitofpv4i32v4double(<4 x i32> %a) {		define <4 x double> @uitofpv4i32v4double(<4 x i32> %a) {
; SSE2-LABEL: uitofpv4i32v4double		; SSE2-LABEL: uitofpv4i32v4double
; SSE2: cost of 40 {{.*}} uitofp		; SSE2: cost of 40 {{.*}} uitofp
;		;
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	define <32 x double> @uitofpv32i32v32double(<32 x i32> %a) {
ret <32 x double> %1		ret <32 x double> %1
}		}

define <2 x double> @uitofpv2i64v2double(<2 x i64> %a) {		define <2 x double> @uitofpv2i64v2double(<2 x i64> %a) {
; SSE2-LABEL: uitofpv2i64v2double		; SSE2-LABEL: uitofpv2i64v2double
; SSE2: cost of 20 {{.*}} uitofp		; SSE2: cost of 20 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv2i64v2double		; AVX1-LABEL: uitofpv2i64v2double
; AVX1: cost of 20 {{.*}} uitofp		; AVX1: cost of 10 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i64v2double		; AVX2-LABEL: uitofpv2i64v2double
; AVX2: cost of 20 {{.*}} uitofp		; AVX2: cost of 10 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i64v2double		; AVX512F-LABEL: uitofpv2i64v2double
; AVX512F: cost of 5 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
;		;
; AVX512DQ-LABEL: uitofpv2i64v2double		; AVX512DQ-LABEL: uitofpv2i64v2double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <2 x i64> %a to <2 x double>		%1 = uitofp <2 x i64> %a to <2 x double>
ret <2 x double> %1		ret <2 x double> %1
}		}

define <4 x double> @uitofpv4i64v4double(<4 x i64> %a) {		define <4 x double> @uitofpv4i64v4double(<4 x i64> %a) {
; SSE2-LABEL: uitofpv4i64v4double		; SSE2-LABEL: uitofpv4i64v4double
; SSE2: cost of 40 {{.*}} uitofp		; SSE2: cost of 40 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv4i64v4double		; AVX1-LABEL: uitofpv4i64v4double
; AVX1: cost of 40 {{.*}} uitofp		; AVX1: cost of 20 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv4i64v4double		; AVX2-LABEL: uitofpv4i64v4double
; AVX2: cost of 40 {{.*}} uitofp		; AVX2: cost of 20 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv4i64v4double		; AVX512F-LABEL: uitofpv4i64v4double
; AVX512F: cost of 12 {{.*}} uitofp		; AVX512F: cost of 12 {{.*}} uitofp
;		;
; AVX512DQ-LABEL: uitofpv4i64v4double		; AVX512DQ-LABEL: uitofpv4i64v4double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <4 x i64> %a to <4 x double>		%1 = uitofp <4 x i64> %a to <4 x double>
ret <4 x double> %1		ret <4 x double> %1
}		}

define <8 x double> @uitofpv8i64v8double(<8 x i64> %a) {		define <8 x double> @uitofpv8i64v8double(<8 x i64> %a) {
; SSE2-LABEL: uitofpv8i64v8double		; SSE2-LABEL: uitofpv8i64v8double
; SSE2: cost of 80 {{.*}} uitofp		; SSE2: cost of 80 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv8i64v8double		; AVX1-LABEL: uitofpv8i64v8double
; AVX1: cost of 81 {{.*}} uitofp		; AVX1: cost of 41 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i64v8double		; AVX2-LABEL: uitofpv8i64v8double
; AVX2: cost of 81 {{.*}} uitofp		; AVX2: cost of 41 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i64v8double		; AVX512F-LABEL: uitofpv8i64v8double
; AVX512F: cost of 26 {{.*}} uitofp		; AVX512F: cost of 26 {{.*}} uitofp
;		;
; AVX512DQ-LABEL: uitofpv8i64v8double		; AVX512DQ-LABEL: uitofpv8i64v8double
; AVX512DQ: cost of 1 {{.*}} uitofp		; AVX512DQ: cost of 1 {{.*}} uitofp
%1 = uitofp <8 x i64> %a to <8 x double>		%1 = uitofp <8 x i64> %a to <8 x double>
ret <8 x double> %1		ret <8 x double> %1
}		}

define <16 x double> @uitofpv16i64v16double(<16 x i64> %a) {		define <16 x double> @uitofpv16i64v16double(<16 x i64> %a) {
; SSE2-LABEL: uitofpv16i64v16double		; SSE2-LABEL: uitofpv16i64v16double
; SSE2: cost of 160 {{.*}} uitofp		; SSE2: cost of 160 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i64v16double		; AVX1-LABEL: uitofpv16i64v16double
; AVX1: cost of 163 {{.*}} uitofp		; AVX1: cost of 83 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i64v16double		; AVX2-LABEL: uitofpv16i64v16double
; AVX2: cost of 163 {{.*}} uitofp		; AVX2: cost of 83 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i64v16double		; AVX512F-LABEL: uitofpv16i64v16double
; AVX512F: cost of 53 {{.*}} uitofp		; AVX512F: cost of 53 {{.*}} uitofp
;		;
; AVX512DQ-LABEL: uitofpv16i64v16double		; AVX512DQ-LABEL: uitofpv16i64v16double
; AVX512DQ: cost of 3 {{.*}} uitofp		; AVX512DQ: cost of 3 {{.*}} uitofp
%1 = uitofp <16 x i64> %a to <16 x double>		%1 = uitofp <16 x i64> %a to <16 x double>
ret <16 x double> %1		ret <16 x double> %1
}		}

define <32 x double> @uitofpv32i64v32double(<32 x i64> %a) {		define <32 x double> @uitofpv32i64v32double(<32 x i64> %a) {
; SSE2-LABEL: uitofpv32i64v32double		; SSE2-LABEL: uitofpv32i64v32double
; SSE2: cost of 320 {{.*}} uitofp		; SSE2: cost of 320 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i64v32double		; AVX1-LABEL: uitofpv32i64v32double
; AVX1: cost of 327 {{.*}} uitofp		; AVX1: cost of 167 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i64v32double		; AVX2-LABEL: uitofpv32i64v32double
; AVX2: cost of 327 {{.*}} uitofp		; AVX2: cost of 167 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i64v32double		; AVX512F-LABEL: uitofpv32i64v32double
; AVX512F: cost of 107 {{.*}} uitofp		; AVX512F: cost of 107 {{.*}} uitofp
;		;
; AVX512DQ-LABEL: uitofpv32i64v32double		; AVX512DQ-LABEL: uitofpv32i64v32double
; AVX512DQ: cost of 2 {{.*}} uitofp		; AVX512DQ: cost of 2 {{.*}} uitofp
%1 = uitofp <32 x i64> %a to <32 x double>		%1 = uitofp <32 x i64> %a to <32 x double>
ret <32 x double> %1		ret <32 x double> %1
▲ Show 20 Lines • Show All 245 Lines • ▼ Show 20 Lines	define <2 x float> @uitofpv2i64v2float(<2 x i64> %a) {
;		;
; AVX1-LABEL: uitofpv2i64v2float		; AVX1-LABEL: uitofpv2i64v2float
; AVX1: cost of 4 {{.*}} uitofp		; AVX1: cost of 4 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv2i64v2float		; AVX2-LABEL: uitofpv2i64v2float
; AVX2: cost of 4 {{.*}} uitofp		; AVX2: cost of 4 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv2i64v2float		; AVX512F-LABEL: uitofpv2i64v2float
; AVX512F: cost of 4 {{.*}} uitofp		; AVX512F: cost of 5 {{.*}} uitofp
%1 = uitofp <2 x i64> %a to <2 x float>		%1 = uitofp <2 x i64> %a to <2 x float>
ret <2 x float> %1		ret <2 x float> %1
}		}

define <4 x float> @uitofpv4i64v4float(<4 x i64> %a) {		define <4 x float> @uitofpv4i64v4float(<4 x i64> %a) {
; SSE2-LABEL: uitofpv4i64v4float		; SSE2-LABEL: uitofpv4i64v4float
; SSE2: cost of 30 {{.*}} uitofp		; SSE2: cost of 30 {{.*}} uitofp
;		;
Show All 15 Lines	define <8 x float> @uitofpv8i64v8float(<8 x i64> %a) {
;		;
; AVX1-LABEL: uitofpv8i64v8float		; AVX1-LABEL: uitofpv8i64v8float
; AVX1: cost of 21 {{.*}} uitofp		; AVX1: cost of 21 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv8i64v8float		; AVX2-LABEL: uitofpv8i64v8float
; AVX2: cost of 21 {{.*}} uitofp		; AVX2: cost of 21 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv8i64v8float		; AVX512F-LABEL: uitofpv8i64v8float
; AVX512F: cost of 22 {{.*}} uitofp		; AVX512F: cost of 26 {{.*}} uitofp
%1 = uitofp <8 x i64> %a to <8 x float>		%1 = uitofp <8 x i64> %a to <8 x float>
ret <8 x float> %1		ret <8 x float> %1
}		}

define <16 x float> @uitofpv16i64v16float(<16 x i64> %a) {		define <16 x float> @uitofpv16i64v16float(<16 x i64> %a) {
; SSE2-LABEL: uitofpv16i64v16float		; SSE2-LABEL: uitofpv16i64v16float
; SSE2: cost of 120 {{.*}} uitofp		; SSE2: cost of 120 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv16i64v16float		; AVX1-LABEL: uitofpv16i64v16float
; AVX1: cost of 43 {{.*}} uitofp		; AVX1: cost of 43 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv16i64v16float		; AVX2-LABEL: uitofpv16i64v16float
; AVX2: cost of 43 {{.*}} uitofp		; AVX2: cost of 43 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv16i64v16float		; AVX512F-LABEL: uitofpv16i64v16float
; AVX512F: cost of 45 {{.*}} uitofp		; AVX512F: cost of 53 {{.*}} uitofp
%1 = uitofp <16 x i64> %a to <16 x float>		%1 = uitofp <16 x i64> %a to <16 x float>
ret <16 x float> %1		ret <16 x float> %1
}		}

define <32 x float> @uitofpv32i64v32float(<32 x i64> %a) {		define <32 x float> @uitofpv32i64v32float(<32 x i64> %a) {
; SSE2-LABEL: uitofpv32i64v32float		; SSE2-LABEL: uitofpv32i64v32float
; SSE2: cost of 240 {{.*}} uitofp		; SSE2: cost of 240 {{.*}} uitofp
;		;
; AVX1-LABEL: uitofpv32i64v32float		; AVX1-LABEL: uitofpv32i64v32float
; AVX1: cost of 87 {{.*}} uitofp		; AVX1: cost of 87 {{.*}} uitofp
;		;
; AVX2-LABEL: uitofpv32i64v32float		; AVX2-LABEL: uitofpv32i64v32float
; AVX2: cost of 87 {{.*}} uitofp		; AVX2: cost of 87 {{.*}} uitofp
;		;
; AVX512F-LABEL: uitofpv32i64v32float		; AVX512F-LABEL: uitofpv32i64v32float
; AVX512F: cost of 91 {{.*}} uitofp		; AVX512F: cost of 107 {{.*}} uitofp
%1 = uitofp <32 x i64> %a to <32 x float>		%1 = uitofp <32 x i64> %a to <32 x float>
ret <32 x float> %1		ret <32 x float> %1
}		}

define <8 x i32> @fptouiv8f32v8i32(<8 x float> %a) {		define <8 x i32> @fptouiv8f32v8i32(<8 x float> %a) {
; AVX512F-LABEL: fptouiv8f32v8i32		; AVX512F-LABEL: fptouiv8f32v8i32
; AVX512F: cost of 1 {{.*}} fptoui		; AVX512F: cost of 1 {{.*}} fptoui
%1 = fptoui <8 x float> %a to <8 x i32>		%1 = fptoui <8 x float> %a to <8 x i32>
▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines
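
The updated AVX1 expectations for over-wide vectors in sitofp.ll and uitofp.ll appear to follow a simple pattern: each doubling of the vector width costs twice the narrower conversion plus one. The sketch below reproduces those numbers from the 4-element table entries under that assumption. The recurrence is inferred from the test values in this revision rather than quoted from the LLVM sources, and `splitCost` is a hypothetical helper name.

```cpp
// Cost of converting a vector of NumElts elements when the widest table
// entry covers LegalElts elements at cost LegalCost.
// Assumed recurrence: cost(2n) = 2 * cost(n) + 1, i.e. two half-width
// conversions plus one unit for the split.
// NumElts is expected to be LegalElts times a power of two.
unsigned splitCost(unsigned NumElts, unsigned LegalElts, unsigned LegalCost) {
  if (NumElts <= LegalElts)
    return LegalCost;
  return 2 * splitCost(NumElts / 2, LegalElts, LegalCost) + 1;
}
```

With the new AVX entries, `splitCost(8, 4, 20)` gives 41 and `splitCost(32, 4, 20)` gives 167 for uitofp from i64 to double, and `splitCost(16, 4, 13)` gives 55 for sitofp, matching the AVX1 checks. The old entries (40 and 10) give 81, 327 and 43 in the same way.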

llvm/trunk/test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll

	; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"


	; CHECK: cost of 20 for VF 2 For instruction: %conv = uitofp i64 %tmp to double			; CHECK: cost of 10 for VF 2 For instruction: %conv = uitofp i64 %tmp to double
	; CHECK: cost of 40 for VF 4 For instruction: %conv = uitofp i64 %tmp to double			; CHECK: cost of 20 for VF 4 For instruction: %conv = uitofp i64 %tmp to double
	define void @uint64_to_double_cost(i64* noalias nocapture %a, double* noalias nocapture readonly %b) nounwind {			define void @uint64_to_double_cost(i64* noalias nocapture %a, double* noalias nocapture readonly %b) nounwind {
	entry:			entry:
	br label %for.body			br label %for.body
	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i64, i64* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds i64, i64* %a, i64 %indvars.iv
	%tmp = load i64, i64* %arrayidx, align 4			%tmp = load i64, i64* %arrayidx, align 4
	%conv = uitofp i64 %tmp to double			%conv = uitofp i64 %tmp to double
	Show All 9 Lines
