Page MenuHomePhabricator

Documentation for the newly added x86 intrinsics.

Authored by kromanova on Jan 9 2017, 4:53 PM.


  • Added doxygen comments for the newly added intrinsics in avxintrin.h, namely _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32 (@Michael Zuckerman, please review doxygen comments for the intrinsics that you added). We generate the documentation automatically based on comments and our generation tools choke if some of the intrinsics in avxintrin.h are not documented.
  • Added doxygen comments for the new intrinsics in emmintrin.h, namely _mm_loadu_si64 and _mm_load_sd. (@Asaf Badoug, please review doxygen comments related to the intrinsic that you added, _mm_loadu_si64, and while you are there, for _mm_load_sd).
  • Explicit parameter names were added for _mm_clflush, _mm_setcsr (@Albert Gutowski, please review this change, since you removed the explicit param names for these intrinsics). Doxygen gets upset when it can't find (and later match with the comment) a parameter name in intrinsics prototype.
  • The rest of the changes are editorial, removing trailing spaces at the end of the lines.

Diff Detail


Event Timeline

kromanova updated this revision to Diff 83732.Jan 9 2017, 4:53 PM
kromanova retitled this revision from to Documentation for the newly added x86 intrinsics..
kromanova updated this object.
kromanova set the repository for this revision to rL LLVM.
kromanova added a subscriber: RKSimon.
m_zuckerman edited edge metadata.Jan 10 2017, 2:56 AM

For my intrinsics ( _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32) - LGTM.

agutowski edited edge metadata.Jan 10 2017, 3:41 PM

For my part, LGTM.

probinson added inline comments.Jan 11 2017, 12:41 PM
1607 ↗(On Diff #83732)

should this be VMOVQ/MOVQ instead?

kromanova added inline comments.Jan 11 2017, 3:01 PM
1607 ↗(On Diff #83732)

Probably yes. Let me know if you have a different opinion.

If I use this intrinsic by itself, clang generates VMOVSD instruction.
It happens because the default domain is chooses to generate smaller instruction code.
I got confused because I couldn't find Intel's documentation about _mm_loadu_si64, so I just wrote a test like the one below and looked what instructions got generated.

__m128i foo22 (void const * __a)
  return _mm_loadu_si64 (__a);

However, if I change the test and use an intrisic to add 2 64-bit integers after the load intrinsics, I can see that VMOVQ instruction gets generated.

__m128d foo44 (double const * __a)
  __m128i first  = _mm_loadu_si64 (__a);
  __m128i second = _mm_loadu_si64 (__a);
  return _mm_add_epi64(first, second);


So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I think it makes sense to change the documentation as Paul suggested:

/// This intrinsic corresponds to the VMOVSQ/MOVSQ.

Or, alternatively, we could list all the instructions that correspond to this intrinsics:

/// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.

1607 ↗(On Diff #83732)

It will be interesting to hear Asaf Badoug opinion, since he added this intrisic. He probably has access to Intel's documentation for this intrinsic too (which I wasn't able to find online).

1607 ↗(On Diff #83732)

There is a similar situation for one intrisic just a few lines above, namely _mm_loadu_pd. It could generate either VMOVUPD / MOVUPD or VMOVUPS/MOVUPS instructions.
I have actually asked Simon question about it offline just a couple of days ago.

I decided to kept referring to VMOVUPD / MOVUPD as a corresponding instruction for _mm_loadu_pd. However, if we end up doing things differently for _mm_loadu_si64, we need to do a similar change to _mm_loadu_pd (and probably to some other intrinsics).

RKSimon added inline comments.Jan 11 2017, 3:04 PM
1607 ↗(On Diff #83732)

It should be VMOVQ/MOVQ (note NOT VMOVSQ/MOVSQ!). Whatever the domain fixup code does to it, that was the original intent of the code and matches what other compilers says it will (probably) be.

kromanova added inline comments.Jan 11 2017, 3:11 PM
1607 ↗(On Diff #83732)

Yep, sorry, inaccurate editing after copy and paste. Thank you for noticing.
I agree should say VMOVQ/MOVQ (similar to what is done for _mm_loadu_pd that we discussed a few days ago).

I will do this change and reload the review shortly.

kromanova updated this revision to Diff 84038.Jan 11 2017, 3:20 PM
kromanova edited edge metadata.

Changed the instruction name from VMOVSD to VMOVQ for _mm_loadu_si64

AsafBadouh edited edge metadata.Jan 12 2017, 4:20 AM


1607 ↗(On Diff #83732)

sorry for the late response.
In general, not all the intrinsics will lowered to the exact instruction that described in the software manual guide.
We do make effort that the intrinsics will be implemented as C-style functions, it can help the compiler to optimized the code.
you can see that in all the arithmetic intrinsics as example.

it seems that you already got the answer to your questions from the Simon.

BTW, Intel has nice tool that contain descriptions for all the intrinsics.

This revision was automatically updated to reflect the committed changes.