This is an archive of the discontinued LLVM Phabricator instance.

[X86] Make some cast costs more precise
ClosedPublic

Authored by mkuper on Jul 6 2016, 1:06 PM.

Details

Summary

This brings back from the dead the AVX table parts of Elena's D15604.

The SSE changes seem more closely bound to the strategy change (since the current costs use the weird "convert to simple types, then look up" form), I'll look into that separately.

Diff Detail

Event Timeline

mkuper updated this revision to Diff 62946.Jul 6 2016, 1:06 PM
mkuper retitled this revision from to [X86] Make some cast costs more precise.
mkuper updated this object.
mkuper added reviewers: delena, RKSimon, spatel.
mkuper added a subscriber: llvm-commits.
RKSimon added inline comments.Jul 6 2016, 1:15 PM
lib/Target/X86/X86TargetTransformInfo.cpp
540

Depending on how thorough we need to be shouldn't there be AVX512DQ+AVX512VL UINT_TO_FP cases for 128/256 bit vectors?

mkuper added inline comments.Jul 6 2016, 1:20 PM
lib/Target/X86/X86TargetTransformInfo.cpp
540

Probably.
I'd rather leave that to the Intel folks, they can probably get more precise numbers for SKX.

delena added inline comments.Jul 6 2016, 11:29 PM
lib/Target/X86/X86TargetTransformInfo.cpp
540

In this case, even if you have only DQ without VL, the conversion is in ZMM instead of YMM, but the cost is the same.

test/Analysis/CostModel/X86/sitofp.ll
273

We should have a nicer cost for DQ here, because it handles all 64 bit integers, right?

mkuper added inline comments.Jul 7 2016, 10:44 AM
lib/Target/X86/X86TargetTransformInfo.cpp
540

We don't do this right now, see below.

test/Analysis/CostModel/X86/sitofp.ll
273

Right now, we scalarize this unless we have VL.

That is, both F and F+DQ produce:

	vextracti128	$1, %ymm0, %xmm1
	vpextrq	$1, %xmm1, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm2
	vmovq	%xmm1, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm1
	vunpcklpd	%xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0]
	vpextrq	$1, %xmm0, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm2
	vmovq	%xmm0, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm0
	vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
	vinsertf128	$1, %xmm1, %ymm0, %ymm0
	retq

And with VL:

	vcvtqq2pd	%ymm0, %ymm0
	retq

I guess we could, potentially, have a nicer sequence with DQ without VL (insert low lanes, vcvtqq2pd, extract low lanes), but we currently don't.

RKSimon edited edge metadata.Jul 8 2016, 12:44 PM

A couple of minors but otherwise this looks good to me. The AVX512 people should probably give the final OK though.

lib/Target/X86/X86TargetTransformInfo.cpp
540

OK - please add a TODO comment to the table for the AVX512DQ 128/256 entries.

717

Test?

mkuper added a comment.Jul 8 2016, 2:24 PM

Thanks, Simon!

lib/Target/X86/X86TargetTransformInfo.cpp
540

Ack.

717

Right, thanks.
Elena's original patch didn't have one, and I didn't notice. I'll add.

delena accepted this revision.Jul 9 2016, 11:13 PM
delena edited edge metadata.
This revision is now accepted and ready to land.Jul 9 2016, 11:13 PM
This revision was automatically updated to reflect the committed changes.