
emmintrin.h documentation fixes and updates

Authored by dyung on Dec 21 2017, 2:14 PM.



This is the result of several patches we made internally to update the documentation; we would like to have them reviewed for possible submission.

The changes include:

  1. Fixed inaccurate instruction mappings for various intrinsics.
  2. Fixed the description of NaN handling in comparison intrinsics.
  3. Unified the description of _mm_store_pd1 to match _mm_store1_pd.
  4. Fixed incorrect wording in various intrinsic descriptions: previously the descriptions used "low-order" and "high-order" where the intended meaning was "even-indexed" and "odd-indexed".
  5. Fixed typos.
  6. Added the missing italics command (\a) for parameters and fixed some parameter spellings.

These patches were made by Craig Flores.



Event Timeline

dyung created this revision. Dec 21 2017, 2:14 PM
This revision is now accepted and ready to land. Dec 22 2017, 7:17 PM
This revision was automatically updated to reflect the committed changes.
kromanova added inline comments. Jan 4 2018, 8:12 PM

The formatting here is inconsistent with the surrounding changes: this comment has a single sentence set off by empty lines, whereas everywhere else there are two sentences.


I'm not sure about this change.

Intel documentation says they generate MOVDQ2Q (I don't have icc handy to try).
However, when I tried on Linux/x86_64 with clang and gcc, we just return.

kromanova added inline comments. Jan 8 2018, 6:35 PM

It would be better to use the same language here as the "before"/"after" text used for many other intrinsics, just for consistency purposes.

/// This intrinsic is a utility function and does not correspond to a specific
///    instruction.

Though I suspect it's possible to generate movdq2q, I couldn't come up with a test that triggers generation of this instruction.
Should we revert this change?

__m64 fooepi64_pi64 (__m128i a, __m128 c)
{
  __m64 x;

  x = _mm_movepi64_pi64 (a);
  return x;
}

On Linux we generate a return instruction.
I would expect (v)movq %xmm0, %rax to be generated instead of just retq.
Am I missing something? Why do we return the 64-bit integer in an xmm register rather than in %rax?


For Linux x86_64 I can only generate VMOVQ (or MOVQ) instructions, respectively, for the AVX/non-AVX case.
Can we even generate MOVD+VMOVQ?
How do we want to document this intrinsic?

I have a similar question as above.

__m128i foopi64_epi64 (__m64 a)
{
  __m128i x;

  x = _mm_movpi64_epi64 (a);
  return x;
}

Why do we generate this code

        vmovq   %xmm0, %rax
        vmovq   %rax, %xmm0

instead of something simple like vmovq %rdi, %xmm0?

efriedma added inline comments.

The x86-64 calling convention rules say that __m64 is passed/returned in SSE registers.

Try the following, which generates movdq2q:

__m64 foo(__m128i a, __m128 c)
{
  return _mm_add_pi8(_mm_movepi64_pi64(a), _mm_set1_pi8(5));
}
kromanova added inline comments. Jan 8 2018, 8:04 PM

Thanks! That explains it :)
I can see that MOVDQ2Q gets generated.

What about the intrinsic below, _mm_movpi64_epi64? Can we ever generate MOVD+VMOVQ as stated in the review?
Or should we write VMOVQ / MOVQ?

efriedma added inline comments. Jan 9 2018, 12:45 PM


#include <immintrin.h>
__m128i foo(__m128i a, __m128 c)
{
  return _mm_movpi64_epi64(_mm_add_pi8(_mm_movepi64_pi64(a), _mm_set1_pi8(5)));
}

In this case, we should generate movq2dq, but currently don't (I assume due to a missing DAGCombine). I don't see how you could ever get MOVD... but see, which could be the source of some confusion.