⚙ D7296 Add memory variant of vcvtps2ph.

alexr updated this revision to Diff 19039.Jan 30 2015, 8:12 AM

alexr retitled this revision from to Add memory variant of vcvtps2ph..

alexr updated this object.

alexr edited the test plan for this revision. (Show Details)

alexr added a reviewer: craig.topper.

alexr added a subscriber: Unknown Object (MLST).

Also add the xmm version.

asl added a subscriber: asl.Jan 30 2015, 8:44 AM

RKSimon added a subscriber: RKSimon.Feb 2 2015, 3:17 AM

RKSimon added a subscriber: andreadb.Feb 2 2015, 5:56 AM

Thanks for working on this Alex - I've a concern with the 128->64 bit folded store.

lib/Target/X86/X86InstrSSE.td
8523	While VCVTPS2PHrr does write to an entire xmm (result half floats to lower 64-bits, and clears the upper 64-bits), VCVTPS2PHmr only writes out 64-bits to memory (so with this pattern the upper 64-bits would be undefined / not touched). Please can you update the pattern to use a suitable extract?

Try storing only 64 bits.

Try storing only 64 bits.
Support f64 as well as i64. Add more tests.

Thanks Alex, there is a 32-bit X86 test failure that needs dealing with but is good otherwise.

test/CodeGen/X86/f16c-intrinsics.ll
111	This fails on the X86 tests at the moment, but passes the X86-64 version fine.

spatel added a subscriber: spatel.Mar 24 2015, 2:14 PM

Storing i64 on i386 with SSE2 is broken in general:
https://llvm.org/bugs/show_bug.cgi?id=23049

I'm almost afraid to ask, but why are these intrinsics defined with integer vectors?

def int_x86_vcvtps2ph_128 : GCCBuiltin<"__builtin_ia32_vcvtps2ph">,
            Intrinsic<[llvm_v8i16_ty], [llvm_v4f32_ty, llvm_i32_ty],
                      [IntrNoMem]>;

MIPS has v8f16 defs...so this should be possible for x86 too?

On Fri, Mar 27, 2015 at 5:40 PM, Ahmed Bougacha <ahmed.bougacha@gmail.com> wrote:

On Fri, Mar 27, 2015 at 3:53 PM, Sanjay Patel <spatel@rotateright.com> wrote:
> I'm almost afraid to ask,

*can see where this is going*

> but why are these intrinsics defined with integer vectors?
>
>   def int_x86_vcvtps2ph_128 : GCCBuiltin<"__builtin_ia32_vcvtps2ph">,
>               Intrinsic<[llvm_v8i16_ty], [llvm_v4f32_ty, llvm_i32_ty],
>                         [IntrNoMem]>;
>
> MIPS has v8f16 defs...

Oh god, why?

> so this should be possible for x86 too?

It should, but would just be asking for trouble, with no benefit whatsoever.

With f16/half you're opening the door to any kind of half operation,
and you really don't want to do that (given that there's no such thing
except for these conversions, pretty much anywhere except GPUs).

On most targets, half is softened by the frontend to i16,  and this
avoids a whole class of problems.  If the ps2ph builtin returned <8 x
half>, I (or some optimization) could write:

  %1 = ..ps2ph..
  %2 = ..ps2ph..
  %l = fadd <8 x half> %1, %2

And this would just crash the compiler.
Also, clang emits __fp16 as i16, and it's reasonable to match that in
the vector builtins (though I'm not sure there's much interaction).

Ideally, we would always use f16/half, and let legalization take care
of it.  Until then, stick with integers ;)

Thanks, Ahmed! I don't know what MIPS is doing with v8f16, but I just took that as a grep-based existence proof. :)

I see now that X86ISelLowering has:

// TODO: when we have SSE, these could be more efficient, by using movd/movq.
if (!X86ScalarSSEf64) {
  setOperationAction(ISD::BITCAST        , MVT::f32  , Expand);
  ...

I haven't looked into how that part works before, but I'm guessing that's where I need to hack to get i64 stores working for i386 in general...and that'll solve this problem with f16 intrinsics too.

Let the yak shaving begin: D8691 is the first of hopefully not too many steps to fix this problem.

spatel mentioned this in D9134: [x86] allow 64-bit extracted vector element integer stores on a 32-bit system.Apr 20 2015, 4:50 PM

spatel mentioned this in rL235460: [x86] allow 64-bit extracted vector element integer stores on a 32-bit system.Apr 21 2015, 5:28 PM

This fixes the 64-bit store on x86-32:
http://reviews.llvm.org/rL235460

So unless there are any other objections, this patch should be good to go.

With the -mtriple fix to the test case LGTM

test/CodeGen/X86/f16c-intrinsics.ll
3	Please can you replace -march with a -mtriple equivalent - it prevents the tests failing on windows build/tests

This revision is now accepted and ready to land.Apr 22 2015, 5:13 AM

spatel commandeered this revision.Apr 22 2015, 8:25 AM

spatel added a reviewer: alexr.

Closed by commit rL235517: [x86] Add store-folded memop patterns for vcvtps2ph (authored by spatel). · Explain WhyApr 22 2015, 9:14 AM

This revision was automatically updated to reflect the committed changes.

This is an archive of the discontinued LLVM Phabricator instance.

Add memory variant of vcvtps2ph.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 19042

lib/Target/X86/X86InstrSSE.td

test/CodeGen/X86/f16c-intrinsics.ll