
Optimized instruction sequence for sitofp operation on X86-32
Closed, Public

Authored by delena on Jan 7 2016, 2:48 AM.

Details

Summary

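Optimized the X86-32 lowering of

sitofp i64 %x to double

The current sequence stores the two 32-bit halves separately and then reloads them with one 64-bit x87 load:

movl %ecx, 8(%esp)      # low 32 bits
movl %edx, 12(%esp)     # high 32 bits
fildll 8(%esp)          # 64-bit load spanning both 32-bit stores

It is replaced with a sequence that assembles the 64-bit value in an SSE register and writes it with a single 64-bit store:

movd %ecx, %xmm0        # low 32 bits
movd %edx, %xmm1        # high 32 bits
punpckldq %xmm1, %xmm0  # xmm0[63:0] = high:low
movq %xmm0, 8(%esp)     # one 64-bit store
fildll 8(%esp)          # unchanged 64-bit x87 load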
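To make the data movement concrete, here is a minimal Python model (an illustration, not LLVM code) of the bytes each sequence leaves in the stack slot and of the value fildll then reads. It assumes, as the sequences imply, that %ecx holds the low half of %x and %edx the high half; the function names are hypothetical.

```python
import struct


def sitofp_i64_via_two_stores(lo: int, hi: int) -> float:
    """Model of the old sequence: two 32-bit stores, then fildll."""
    # movl %ecx, 8(%esp) ; movl %edx, 12(%esp): low dword at the lower address.
    slot = struct.pack("<II", lo & 0xFFFFFFFF, hi & 0xFFFFFFFF)
    # fildll 8(%esp): read the 8 bytes back as a signed 64-bit integer.
    (value,) = struct.unpack("<q", slot)
    return float(value)


def sitofp_i64_via_single_store(lo: int, hi: int) -> float:
    """Model of the new sequence: build the i64 in xmm0, one 64-bit store, then fildll."""
    # movd + movd + punpckldq: xmm0[31:0] = lo, xmm0[63:32] = hi.
    packed = ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)
    # movq %xmm0, 8(%esp): a single 8-byte little-endian store.
    slot = struct.pack("<Q", packed)
    # fildll 8(%esp): same reload as before.
    (value,) = struct.unpack("<q", slot)
    # fildll loads the integer exactly; float() on a Python int rounds once,
    # to nearest with ties to even, like storing the x87 result as a double
    # under the default rounding mode.
    return float(value)
```

Both models return the same value for every input: the change affects only how the bytes reach memory, not what fildll reads. For example, `sitofp_i64_via_single_store(0xFFFFFFFF, 0xFFFFFFFF)` models `%x = -1`.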

Diff Detail

Repository
rL LLVM

Event Timeline

delena updated this revision to Diff 44200 (Jan 7 2016, 2:48 AM).
delena retitled this revision to "Optimized instruction sequence for sitofp operation on X86-32".
delena updated this object.
delena added a reviewer: mbodart.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: llvm-commits.
mbodart edited edge metadata (Jan 8 2016, 3:23 PM).

Hi Elena,

Just a few minor comments, otherwise LGTM!

-- mitch

../lib/Target/X86/X86ISelLowering.cpp, line 12658 (on Diff #44200)

It would be useful to have a source comment here to the effect:

Bitcasting to f64 here allows us to do a single 64-bit store from an SSE register,
avoiding the store forwarding penalty that would come with two 32-bit stores.
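The store-forwarding argument can be checked with a simplified model. It assumes a load is forwarded from the store buffer only when a single earlier store covers all of its bytes; real Intel and AMD cores have more detailed rules, and every name below is hypothetical.

```python
from typing import List, Tuple

# An access is (offset, size) in bytes, relative to %esp.
Access = Tuple[int, int]


def can_forward(stores: List[Access], load: Access) -> bool:
    """Simplified rule (an assumption): forwarding succeeds only if one
    store fully contains the bytes the load reads."""
    load_off, load_size = load
    return any(
        st_off <= load_off and load_off + load_size <= st_off + st_size
        for st_off, st_size in stores
    )


OLD_STORES = [(8, 4), (12, 4)]  # movl %ecx, 8(%esp) ; movl %edx, 12(%esp)
NEW_STORES = [(8, 8)]           # movq %xmm0, 8(%esp)
FILDLL_LOAD = (8, 8)            # fildll 8(%esp)
```

Under this model `can_forward(OLD_STORES, FILDLL_LOAD)` is false, because neither 4-byte store covers the 8-byte load, while `can_forward(NEW_STORES, FILDLL_LOAD)` is true.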

../test/CodeGen/X86/scalar-int-to-fp.ll, line 76 (on Diff #44200)

For both test functions u64_to_f and s64_to_f, we should add the following additional checks
before the fildll:

AVX512_32: punpckldq
SSE_32: punpckldq

Lines 120–130 (on Diff #44200)

Rather than creating a new function, it would seem simpler to just add a check for punpckldq, for both SSE2_32 and AVX512_32, in the existing s64_to_d function.

delena marked 2 inline comments as done (Jan 10 2016, 1:37 AM).
delena added inline comments.
../test/CodeGen/X86/scalar-int-to-fp.ll, lines 120–130 (on Diff #44200)

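This is the code generated for s64_to_d. Its i64 argument is already on the stack, so fildll loads it directly from 8(%ebp) and no punpckldq is emitted; a punpckldq check in s64_to_d would therefore not match.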

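pushl   %ebp
movl    %esp, %ebp
andl    $-8, %esp          # align the frame to 8 bytes
subl    $8, %esp           # reserve an 8-byte temporary
fildll  8(%ebp)            # load the i64 argument in place from the caller's frame
fstpl   (%esp)             # round to double through memory
fldl    (%esp)             # return the double in st(0)
movl    %ebp, %esp
popl    %ebp
retl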
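Why 8(%ebp): after the prologue, the i386 frame holds the saved %ebp at 0(%ebp) and the return address at 4(%ebp), so the first stack argument starts at 8(%ebp). A small ctypes model of that layout (hypothetical class name; 32-bit slots and little-endian layout assumed) shows the offset that fildll uses:

```python
import ctypes


class I386FrameAfterPrologue(ctypes.LittleEndianStructure):
    """Hypothetical model of memory at and above %ebp after
    `pushl %ebp; movl %esp, %ebp` in an i386 function whose first
    argument is an i64 passed on the stack."""
    _fields_ = [
        ("saved_ebp", ctypes.c_uint32),       # 0(%ebp): caller's %ebp
        ("return_address", ctypes.c_uint32),  # 4(%ebp): pushed by the call
        ("x", ctypes.c_int64),                # 8(%ebp): the i64 argument
    ]


# fildll 8(%ebp) reads the argument's 8 bytes where the caller left them.
ARG_OFFSET = I386FrameAfterPrologue.x.offset
```

Because the value is already in memory, there is no pair of 32-bit register halves to combine, which is why this function does not exercise the new sequence.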
delena updated this revision to Diff 44421 (Jan 10 2016, 1:42 AM).
delena edited edge metadata.

Added the same bitcast for UINT_TO_FP.
Added more tests.

This revision was automatically updated to reflect the committed changes.