This is an archive of the discontinued LLVM Phabricator instance.

[X86] Potential improvement for v2i32->v2f64 uint_to_fp
ClosedPublic

Authored by craig.topper on Dec 27 2019, 12:36 PM.

Details

Summary

This patch proposes an alternate implementation for this conversion, derived from our v2i32->v2f32 handling. We can zero extend the v2i32 to v2i64 and OR it with the bit representation of 2.0^52. Because a double's mantissa is 52 bits, this gives us 2.0^52 plus the 32-bit integer. Then we just need to subtract 2.0^52 as a double and let the floating point unit normalize the remaining bits into a valid double.

This takes fewer instructions than our previous code, but does require a port 5 shuffle for the zero extend or unpack.
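
As a rough scalar illustration of the bit trick described in the summary (not the patch's actual lowering code, which operates on vectors inside the X86 backend), the following Python sketch applies the same steps to a single 32-bit value. The function name `u32_to_f64` and the constant name `TWO_POW_52_BITS` are hypothetical and chosen only for this example.

```python
import struct

# IEEE-754 double bit pattern of 2.0**52: biased exponent 1075 (0x433), zero mantissa.
TWO_POW_52_BITS = 0x4330000000000000


def u32_to_f64(x: int) -> float:
    """Convert an unsigned 32-bit integer to a double using the 2**52 trick.

    Hypothetical helper for illustration only; it models one lane of the
    vector conversion discussed in the summary.
    """
    if not 0 <= x <= 0xFFFFFFFF:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    # "Zero extend" to 64 bits and OR in the exponent of 2**52. The integer
    # now sits in the low 32 bits of the 52-bit mantissa, so the bit pattern
    # represents exactly 2**52 + x.
    bits = TWO_POW_52_BITS | x
    biased = struct.unpack("<d", struct.pack("<Q", bits))[0]
    # Subtracting 2**52 is exact and leaves x as a normalized double.
    return biased - 2.0 ** 52
```

The subtraction is exact here because both 2^52 + x and x are representable as doubles for any 32-bit x, so no rounding occurs.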

Event Timeline

craig.topper created this revision. Dec 27 2019, 12:36 PM
Herald added a project: Restricted Project. Dec 27 2019, 12:36 PM
Herald added a subscriber: hiraditya.
RKSimon added inline comments. Dec 28 2019, 1:48 AM
llvm/test/CodeGen/X86/vec_int_to_fp.ll
1038

Would AVX1/AVX2 benefit for the v4i32-v4f64 case?

craig.topper marked an inline comment as done. Dec 28 2019, 2:04 AM
craig.topper added inline comments.
llvm/test/CodeGen/X86/vec_int_to_fp.ll
1038

What are our options for zext v4i32->v4i64 on AVX1? We don't get the instruction until AVX2.

RKSimon added inline comments. Dec 28 2019, 1:37 PM
llvm/test/CodeGen/X86/vec_int_to_fp.ll
734

Would AVX1/AVX2 benefit here?

craig.topper marked an inline comment as done. Dec 28 2019, 1:55 PM
craig.topper added inline comments.
llvm/test/CodeGen/X86/vec_int_to_fp.ll
734

This test case is weird. It explicitly uses a v4i32->v4f64 conversion and then extracts a v2f64 from the result. The SSE tests changed because we had to split the v4f64 during type legalization, and then half of the split became dead. With AVX we don't split it, and we push the extract through later, after vector op legalization.

RKSimon added inline comments. Dec 29 2019, 2:58 AM
llvm/test/CodeGen/X86/vec_int_to_fp.ll
1038

We'd probably end up with a PMOVZX(xmm) for the lower v2i32, a PUNPCKH(xmm, zero) for the upper v2i32 followed by a VINSERTF128 and the VORPD(ymm) - and that removes a PBLENDW, PSRLD, 2*CVTDQ2PD and MULPD (+replace VADDPD with VSUBPD) - so that should be an improvement.

RKSimon accepted this revision. Jan 3 2020, 12:49 AM

LGTM. The missed AVX improvements should be handled in other patches already under review.

This revision is now accepted and ready to land. Jan 3 2020, 12:49 AM
This revision was automatically updated to reflect the committed changes.