This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Vectorize v2i32 to v2f64 conversions
ClosedPublic

Authored by RKSimon on Jun 14 2015, 8:33 AM.

Details

Summary

This patch enables support for the conversion of v2i32 to v2f64 to use the CVTDQ2PD xmm instruction and stay on the SSE unit instead of scalarizing, sign extending to i64 and using CVTSI2SDQ scalar conversions.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 27645.Jun 14 2015, 8:33 AM
RKSimon retitled this revision from to [X86][SSE] Vectorize v2i32 to v2f64 conversions.
RKSimon updated this object.
RKSimon edited the test plan for this revision. (Show Details)
RKSimon added reviewers: qcolombet, delena, andreadb, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).
andreadb edited edge metadata.Jun 15 2015, 11:34 AM

Hi Simon,

I am not sure this is the best way to fix this issue.
In particular, I wonder if there is an alternative approach that doesn't involve adding a new target opcode.

At least, on AVX, you can have a canonicalization rule that converts the following dag node sequence:

v4i32: A = ...
v2i32: B  = extract_subvector A, 0
v2f64: C = sint_to_fp B

into:

v4i32: A = ...
v4f64: B  = sint_to_fp A
v2f64: C = extract_subvector B, 0

Then, I think you can add a ISel pattern to match a VCVTDQ2PDrr from a:

(v2f64 (extract_subvector (v4f64 (sint_to_fp v2f64:%V ), 0).

Unfortunately, the combine rule above would not fix the problem for non-AVX targets.
On those targets you will end up with a dag that looks like this:
v2f64 = build_vector (f64 (sint_to_fp i32:A)), (f64 (sint_to_fp i32:B))

Where:

A: i32 = extract_vector_elt %InVec, i64 0
B: i32 = extract_vector_elt %InVec, i64 1

I am not sure if this would be a good approach, but I think one way to fix this is to add a (quite long) ISel pattern to match that sequence and select a VCVTDQ2PDrr.

I hope it helps.
Andrea

test/CodeGen/X86/vec_int_to_fp.ll
11 ↗(On Diff #27645)

I know that this is unrelated to your patch, but I noticed that on SSE2, this 'i64 extract element has been expanded to 'movd'. Shouldn't this be a 'movq' instead?

14 ↗(On Diff #27645)

Same as above.
Although this is unrelated to your patch, I think this should be 'movq'. Otherwise, we end up losing the upper half of the quadword in input.

26 ↗(On Diff #27645)

Again, this is unrelated to your patch but
this vxorps seems redundant. I haven't looked at the code, but I suspect that this may be caused by a sub-optimal build_vector lowering.

test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll
30–31 ↗(On Diff #27645)

You can get rid of that FIXME since you fixed it with this patch :-)

qcolombet accepted this revision.Jun 15 2015, 11:46 AM
qcolombet edited edge metadata.

Hi Simon,

Look good to me.

Do we want to do something similar with i32 -> PS using CVTPI2PS?

That would require to use MMX so that may not be desirable though.

Just something to experiment ;).

Cheers,
-Quentin

This revision is now accepted and ready to land.Jun 15 2015, 11:46 AM

PS: Shouldn’t we update the vectorizer cost model as well?

Thanks guys for the reviews.

Andrea - I did investigate TableGen / ISel pattern approaches but couldn't manage to make anything work - it doesn't like the fact that v2i32 isn't valid. I believe its why the X86ISD VFPEXT and VFPROUND node types were added in a similar manner. I considered giving the node a more general name 'SINT_TO_FPEXT' but given that cvtdq2pd appears to be the only user of this pattern it didn't seem necessary.

Quentin - yes the MMX instructions (CVTPI2PD and CVTPI2PS) could work in a similar fashion. Are we actively adding MMX/3DNow! lowering? I always thought they were just hidden behind their builtin intrinsics.

I'll add the vectorization costs (+ a test) as part of the submission (or possibly as a followup).

test/CodeGen/X86/vec_int_to_fp.ll
11 ↗(On Diff #27645)

This has come up before - I was sure we made a bugzilla for this but can't find it (Sanjay can you remember?).

26 ↗(On Diff #27645)

I'll see if I can find out what's causing it - odd that xmm0 had just been used in exactly the same type of instruction without clearing the upper bits.

andreadb accepted this revision.Jun 15 2015, 3:22 PM
andreadb edited edge metadata.

Thanks guys for the reviews.

Andrea - I did investigate TableGen / ISel pattern approaches but couldn't manage to make anything work - it doesn't like the fact that v2i32 isn't valid. I believe its why the X86ISD VFPEXT and VFPROUND node types were added in a similar manner. I considered giving the node a more general name 'SINT_TO_FPEXT' but given that cvtdq2pd appears to be the only user of this pattern it didn't seem necessary.

Thanks for pointing it out.
I see how FP_EXTEND and FP_ROUND are marked as 'Custom' on v2f32.

The patch looks good to me too.

Quentin - yes the MMX instructions (CVTPI2PD and CVTPI2PS) could work in a similar fashion. Are we actively adding MMX/3DNow! lowering? I always thought they were just hidden behind their builtin intrinsics.

I'll add the vectorization costs (+ a test) as part of the submission (or possibly as a followup).

Are we actively adding MMX/3DNow! lowering?

No, not particularly, that was just a thought :).

This revision was automatically updated to reflect the committed changes.

Thanks guys - I'll add vectorization costs/tests in the next day or so.