
emmintrin.h documentation fixes and updates

Authored by dyung on Dec 21 2017, 2:14 PM.



This is the result of several patches we made internally to update the documentation; we would like to have them reviewed for possible submission.

The changes include:

  1. Fixed inaccurate instruction mappings for various intrinsics.
  2. Fixed the description of NaN handling in comparison intrinsics.
  3. Unified the description of _mm_store_pd1 to match _mm_store1_pd.
  4. Fixed incorrect wording in various intrinsic descriptions: they previously used "low-order" and "high-order" where the intended meaning was "even-indexed" and "odd-indexed".
  5. Fixed typos.
  6. Added the missing italics command (\a) for parameters and fixed some parameter spellings.

These patches were made by Craig Flores.

Event Timeline

dyung created this revision. Dec 21 2017, 2:14 PM
This revision is now accepted and ready to land. Dec 22 2017, 7:17 PM
This revision was automatically updated to reflect the committed changes.
kromanova added inline comments. Jan 4 2018, 8:12 PM

The formatting here is inconsistent with the rest of the changes above and below: this one is a single sentence set off by empty lines, whereas everywhere else there are two sentences.


I'm not sure about this change.

Intel's documentation says they generate MOVDQ2Q (I don't have icc handy to try).
However, I've tried on Linux/x86_64 with clang and gcc, and we just generate a return.

kromanova added inline comments. Jan 8 2018, 6:35 PM

It's better to use the same language here as for many of the other intrinsics, both "before" and "after", just for consistency purposes:

/// This intrinsic is a utility function and does not correspond to a specific
///    instruction.

Though I suspect it's possible to generate MOVDQ2Q, I couldn't come up with a test to trigger generation of this instruction.
Should we revert this change?

__m64 fooepi64_pi64 (__m128i a, __m128 c)
{
  __m64 x;

  x = _mm_movepi64_pi64 (a);
  return x;
}

On Linux we generate just a return instruction.
I would expect (v)movq %xmm0,%rax to be generated instead of retq.
Am I missing something? Why do we return a 64-bit integer in an xmm register rather than in %rax?


For Linux x86_64 I can only generate VMOVQ (or MOVQ) instructions, respectively, for the AVX/non-AVX case.
Can we even generate MOVD+VMOVQ?
How do we want to document this intrinsic?

I have a similar question as above.

__m128i foopi64_epi64 (__m64 a)
{
  __m128i x;

  x = _mm_movpi64_epi64 (a);
  return x;
}

Why do we generate this code

        vmovq   %xmm0, %rax
        vmovq   %rax, %xmm0

instead of something simple like vmovq %rdi, %xmm0?

efriedma added inline comments.

The x86-64 calling convention rules say that __m64 is passed and returned in SSE registers.

Try the following, which generates movdq2q:

__m64 foo(__m128i a, __m128 c)
{
  return _mm_add_pi8(_mm_movepi64_pi64(a), _mm_set1_pi8(5));
}
kromanova added inline comments. Jan 8 2018, 8:04 PM

Thanks! That explains it :)
I can see that MOVDQ2Q gets generated.

What about the intrinsic below, _mm_movpi64_epi64? Can we ever generate MOVD+VMOVQ as stated in the review?
Or should we write VMOVQ / MOVQ?

efriedma added inline comments. Jan 9 2018, 12:45 PM


#include <immintrin.h>
__m128i foo(__m128i a, __m128 c)
{
  return _mm_movpi64_epi64(_mm_add_pi8(_mm_movepi64_pi64(a), _mm_set1_pi8(5)));
}

In this case, we should generate movq2dq, but currently we don't (I assume due to a missing DAGCombine). I don't see how you could ever get MOVD... but see, which could be the source of some confusion.