This patch enables support for the conversion of v2i32 to v2f64 to use the CVTDQ2PD xmm instruction and stay on the SSE unit instead of scalarizing, sign extending to i64 and using CVTSI2SDQ scalar conversions.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hi Simon,
I am not sure this is the best way to fix this issue.
In particular, I wonder if there is an alternative approach that doesn't involve adding a new target opcode.
At least, on AVX, you can have a canonicalization rule that converts the following dag node sequence:
v4i32: A = ... v2i32: B = extract_subvector A, 0 v2f64: C = sint_to_fp B
into:
v4i32: A = ... v4f64: B = sint_to_fp A v2f64: C = extract_subvector B, 0
Then, I think you can add a ISel pattern to match a VCVTDQ2PDrr from a:
(v2f64 (extract_subvector (v4f64 (sint_to_fp v2f64:%V ), 0).
Unfortunately, the combine rule above would not fix the problem for non-AVX targets.
On those targets you will end up with a dag that looks like this:
v2f64 = build_vector (f64 (sint_to_fp i32:A)), (f64 (sint_to_fp i32:B))
Where:
A: i32 = extract_vector_elt %InVec, i64 0 B: i32 = extract_vector_elt %InVec, i64 1
I am not sure if this would be a good approach, but I think one way to fix this is to add a (quite long) ISel pattern to match that sequence and select a VCVTDQ2PDrr.
I hope it helps.
Andrea
test/CodeGen/X86/vec_int_to_fp.ll | ||
---|---|---|
11 | I know that this is unrelated to your patch, but I noticed that on SSE2, this 'i64 extract element has been expanded to 'movd'. Shouldn't this be a 'movq' instead? | |
14 | Same as above. | |
26 | Again, this is unrelated to your patch but | |
test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll | ||
30–31 | You can get rid of that FIXME since you fixed it with this patch :-) |
Hi Simon,
Look good to me.
Do we want to do something similar with i32 -> PS using CVTPI2PS?
That would require to use MMX so that may not be desirable though.
Just something to experiment ;).
Cheers,
-Quentin
Thanks guys for the reviews.
Andrea - I did investigate TableGen / ISel pattern approaches but couldn't manage to make anything work - it doesn't like the fact that v2i32 isn't valid. I believe its why the X86ISD VFPEXT and VFPROUND node types were added in a similar manner. I considered giving the node a more general name 'SINT_TO_FPEXT' but given that cvtdq2pd appears to be the only user of this pattern it didn't seem necessary.
Quentin - yes the MMX instructions (CVTPI2PD and CVTPI2PS) could work in a similar fashion. Are we actively adding MMX/3DNow! lowering? I always thought they were just hidden behind their builtin intrinsics.
I'll add the vectorization costs (+ a test) as part of the submission (or possibly as a followup).
test/CodeGen/X86/vec_int_to_fp.ll | ||
---|---|---|
11 | This has come up before - I was sure we made a bugzilla for this but can't find it (Sanjay can you remember?). | |
26 | I'll see if I can find out what's causing it - odd that xmm0 had just been used in exactly the same type of instruction without clearing the upper bits. |
Thanks for pointing it out.
I see how FP_EXTEND and FP_ROUND are marked as 'Custom' on v2f32.
The patch looks good to me too.
Quentin - yes the MMX instructions (CVTPI2PD and CVTPI2PS) could work in a similar fashion. Are we actively adding MMX/3DNow! lowering? I always thought they were just hidden behind their builtin intrinsics.
I'll add the vectorization costs (+ a test) as part of the submission (or possibly as a followup).
Are we actively adding MMX/3DNow! lowering?
No, not particularly, that was just a thought :).
I know that this is unrelated to your patch, but I noticed that on SSE2, this 'i64 extract element has been expanded to 'movd'. Shouldn't this be a 'movq' instead?