This is an archive of the discontinued LLVM Phabricator instance.

[X86] Make some cast costs more precise
ClosedPublic

Authored by mkuper on Jul 6 2016, 1:06 PM.

Details

Summary

This brings back from the dead the AVX table parts of Elena's D15604.

The SSE changes seem more closely bound to the strategy change (since the current costs use the weird "convert to simple types, then look up" form), I'll look into that separately.

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 62946.Jul 6 2016, 1:06 PM
mkuper retitled this revision from to [X86] Make some cast costs more precise.
mkuper updated this object.
mkuper added reviewers: delena, RKSimon, spatel.
mkuper added a subscriber: llvm-commits.
RKSimon added inline comments.Jul 6 2016, 1:15 PM
lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗(On Diff #62946)

Depending on how thorough we need to be shouldn't there be AVX512DQ+AVX512VL UINT_TO_FP cases for 128/256 bit vectors?

mkuper added inline comments.Jul 6 2016, 1:20 PM
lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗(On Diff #62946)

Probably.
I'd rather leave that to the Intel folks, they can probably get more precise numbers for SKX.

delena added inline comments.Jul 6 2016, 11:29 PM
lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗(On Diff #62946)

In this case, even if you have only DQ without VL, the conversion is in ZMM instead of YMM, but the cost is the same.

test/Analysis/CostModel/X86/sitofp.ll
273 ↗(On Diff #62946)

We should have a nicer cost for DQ here, because it handles all 64 bit integers, right?

mkuper added inline comments.Jul 7 2016, 10:44 AM
lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗(On Diff #62946)

We don't do this right now, see below.

test/Analysis/CostModel/X86/sitofp.ll
273 ↗(On Diff #62946)

Right now, we scalarize this unless we have VL.

That is, both F and F+DQ produce:

	vextracti128	$1, %ymm0, %xmm1
	vpextrq	$1, %xmm1, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm2
	vmovq	%xmm1, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm1
	vunpcklpd	%xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0]
	vpextrq	$1, %xmm0, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm2
	vmovq	%xmm0, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm0
	vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
	vinsertf128	$1, %xmm1, %ymm0, %ymm0
	retq

And with VL:

	vcvtqq2pd	%ymm0, %ymm0
	retq

I guess we could, potentially, have a nicer sequence with DQ without VL (insert low lanes, vcvtqq2pd, extract low lanes), but we currently don't.

RKSimon edited edge metadata.Jul 8 2016, 12:44 PM

A couple of minors but otherwise this looks good to me. The AVX512 people should probably give the final OK though.

lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗(On Diff #62946)

OK - please add a TODO comment to the table for the AVX512DQ 128/256 entries.

717 ↗(On Diff #62946)

Test?

mkuper added a comment.Jul 8 2016, 2:24 PM

Thanks, Simon!

lib/Target/X86/X86TargetTransformInfo.cpp
540 ↗(On Diff #62946)

Ack.

717 ↗(On Diff #62946)

Right, thanks.
Elena's original patch didn't have one, and I didn't notice. I'll add.

delena accepted this revision.Jul 9 2016, 11:13 PM
delena edited edge metadata.
This revision is now accepted and ready to land.Jul 9 2016, 11:13 PM
This revision was automatically updated to reflect the committed changes.