I have added the 2 missing intrinsics _cvtsh_ss and _cvtss_sh to the intrinsics header f16cintrin.h.
GCC also has these intrinsics in f16cintrin.h. Here are its definitions:
extern __inline float __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_cvtsh_ss (unsigned short __S)
{
__v8hi __H = __extension__ (__v8hi){ __S, 0, 0, 0, 0, 0, 0, 0 };
__v4sf __A = __builtin_ia32_vcvtph2ps (__H);
return __builtin_ia32_vec_ext_v4sf (__A, 0);
}
#ifdef __OPTIMIZE__
extern __inline unsigned short __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_cvtss_sh (float __F, const int __I)
{
__v4sf __A = __extension__ (__v4sf){ __F, 0, 0, 0 };
__v8hi __H = __builtin_ia32_vcvtps2ph (__A, __I);
return (unsigned short) __builtin_ia32_vec_ext_v8hi (__H, 0);
}
#else
#define _cvtss_sh(__F, __I) \
(__extension__ \
({ \
__v4sf __A = __extension__ (__v4sf){ __F, 0, 0, 0 }; \
__v8hi __H = __builtin_ia32_vcvtps2ph (__A, __I); \
(unsigned short) __builtin_ia32_vec_ext_v8hi (__H, 0); \
}))
#endif /* __OPTIMIZE__ */

Intel's documentation expects _cvtsh_ss to have 2 parameters (instead of one)
https://software.intel.com/en-us/node/524287
but most likely the documentation is wrong, because Intel's headers contain these intrinsic prototypes in emmintrin.h:
extern float __ICL_INTRINCC _cvtsh_ss(unsigned short);
extern unsigned short __ICL_INTRINCC _cvtss_sh(float, int);
BTW, emmintrin.h includes f16cintrin.h, so it should be OK to place these 2 intrinsics in f16cintrin.h. This should satisfy both Intel's and GCC's expectations.
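For reference, here is a minimal usage sketch of the two intrinsics, assuming an F16C-capable target (compile with -mf16c); the round-trip program itself is only an illustration, not part of the patch:

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
  float f = 1.5f;
  /* Scalar float -> half conversion; the second argument selects the
     rounding mode (here: round to nearest, suppress exceptions). */
  unsigned short h = _cvtss_sh(f, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
  /* Scalar half -> float conversion; takes only the half value,
     matching the one-parameter prototype from Intel's emmintrin.h. */
  float g = _cvtsh_ss(h);
  printf("%f -> 0x%04x -> %f\n", f, h, g);
  return 0;
}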
Clang generates the following IR for _cvtsh_ss (see below). In the test I simply check that the builtin @llvm.x86.vcvtph2ps.128 is generated for _cvtsh_ss. I was afraid that checks for the initialization of the vector 'v' would be too lengthy and that this part of the IR is prone to change frequently, so I didn't add them. However, if you think these checks are important and won't create too much maintenance headache as the IR changes over time, I can certainly add them.
The same goes for test_cvtss_sh.
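To make the test strategy concrete, here is a rough sketch of what the codegen checks could look like; the RUN line and the exact CHECK patterns are my assumptions about the test setup, not the committed test:

// RUN: %clang_cc1 -ffreestanding -triple x86_64-unknown-unknown -target-feature +f16c -emit-llvm -o - %s | FileCheck %s

#include <immintrin.h>

float test_cvtsh_ss(unsigned short a) {
  // CHECK-LABEL: test_cvtsh_ss
  // CHECK: call <4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16> %{{.*}})
  return _cvtsh_ss(a);
}

unsigned short test_cvtss_sh(float a) {
  // CHECK-LABEL: test_cvtss_sh
  // CHECK: call <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %{{.*}}, i32 0)
  return _cvtss_sh(a, 0);
}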
static __inline float __DEFAULT_FN_ATTRS
_cvtsh_ss(unsigned short a)
{
__v8hi v = {(short)a, 0, 0, 0, 0, 0, 0, 0};
__v4sf r = __builtin_ia32_vcvtph2ps(v);
return r[0];
}

define float @test_cvtsh_ss(i16 zeroext %a) #0 {
entry:
%a.addr.i = alloca i16, align 2
%v.i = alloca <8 x i16>, align 16
%r.i = alloca <4 x float>, align 16
%a.addr = alloca i16, align 2
store i16 %a, i16* %a.addr, align 2
%0 = load i16, i16* %a.addr, align 2
store i16 %0, i16* %a.addr.i, align 2
%1 = load i16, i16* %a.addr.i, align 2
%vecinit.i = insertelement <8 x i16> undef, i16 %1, i32 0
%vecinit1.i = insertelement <8 x i16> %vecinit.i, i16 0, i32 1
%vecinit2.i = insertelement <8 x i16> %vecinit1.i, i16 0, i32 2
%vecinit3.i = insertelement <8 x i16> %vecinit2.i, i16 0, i32 3
%vecinit4.i = insertelement <8 x i16> %vecinit3.i, i16 0, i32 4
%vecinit5.i = insertelement <8 x i16> %vecinit4.i, i16 0, i32 5
%vecinit6.i = insertelement <8 x i16> %vecinit5.i, i16 0, i32 6
%vecinit7.i = insertelement <8 x i16> %vecinit6.i, i16 0, i32 7
store <8 x i16> %vecinit7.i, <8 x i16>* %v.i, align 16
%2 = load <8 x i16>, <8 x i16>* %v.i, align 16
%3 = call <4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16> %2) #2
store <4 x float> %3, <4 x float>* %r.i, align 16
%4 = load <4 x float>, <4 x float>* %r.i, align 16
%vecext.i = extractelement <4 x float> %4, i32 0
ret float %vecext.i
}