This is an archive of the discontinued LLVM Phabricator instance.

Documentation for the newly added x86 intrinsics.
Closed · Public

Authored by kromanova on Jan 9 2017, 4:53 PM.


Details

Reviewers

AsafBadouh
agutowski
Sunil_Srivastava
probinson
m_zuckerman

Commits

rG2e041c9c2072: [DOXYGEN] Documentation for the newly added x86 intrinsics.
rC291876: [DOXYGEN] Documentation for the newly added x86 intrinsics.
rL291876: [DOXYGEN] Documentation for the newly added x86 intrinsics.

Summary

Added doxygen comments for the newly added intrinsics in avxintrin.h, namely _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32 (@Michael Zuckerman, please review the doxygen comments for the intrinsics that you added). We generate the documentation automatically from these comments, and our generation tools fail if any intrinsic in avxintrin.h is undocumented.
Added doxygen comments for the new intrinsics in emmintrin.h, namely _mm_loadu_si64 and _mm_load_sd (@Asaf Badoug, please review the doxygen comments for the intrinsic that you added, _mm_loadu_si64, and, while you are there, the ones for _mm_load_sd).
Added explicit parameter names for _mm_clflush and _mm_setcsr (@Albert Gutowski, please review this change, since you removed the explicit parameter names for these intrinsics). Doxygen complains when it cannot find a parameter name in an intrinsic's prototype to match against the \param entry in the comment.
The rest of the changes are editorial, removing trailing spaces at the end of the lines.

Diff Detail

Repository: rL LLVM

Event Timeline

kromanova updated this revision to Diff 83732 (Jan 9 2017, 4:53 PM).

kromanova retitled this revision to "Documentation for the newly added x86 intrinsics."

kromanova updated this object.

kromanova added reviewers: m_zuckerman, AsafBadouh, agutowski, Sunil_Srivastava, probinson.

kromanova set the repository for this revision to rL LLVM.

kromanova added a subscriber: RKSimon.

RKSimon added a subscriber: cfe-commits (Jan 10 2017, 2:05 AM).

For my intrinsics (_mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32): LGTM.

For my part, LGTM.

probinson added inline comments (Jan 11 2017, 12:41 PM).

emmintrin.h
Line 1607 (On Diff #83732): Should this be VMOVQ/MOVQ instead?

kromanova added inline comments (Jan 11 2017, 3:01 PM).

emmintrin.h
Line 1607 (On Diff #83732): Probably yes. Let me know if you have a different opinion. If I use this intrinsic by itself, clang generates a VMOVSD instruction. This happens because the default execution domain is chosen to produce smaller instruction code. I got confused because I couldn't find Intel's documentation for _mm_loadu_si64, so I wrote a test like the one below and looked at which instructions were generated:

    __m128i foo22(void const *__a) { return _mm_loadu_si64(__a); }

However, if I change the test and use an intrinsic to add the loaded 64-bit integers after the loads, I can see that a VMOVQ instruction gets generated:

    __m128i foo44(double const *__a) {
      __m128i first = _mm_loadu_si64(__a);
      __m128i second = _mm_loadu_si64(__a);
      return _mm_add_epi64(first, second);
    }

So, as you can see, clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I think it makes sense to change the documentation as Paul suggested:

    /// This intrinsic corresponds to the VMOVSQ/MOVSQ.

Or, alternatively, we could list all the instructions that correspond to this intrinsic:

    /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.

Line 1607 (On Diff #83732): It would be interesting to hear Asaf Badoug's opinion, since he added this intrinsic. He probably also has access to Intel's documentation for it (which I wasn't able to find online).

Line 1607 (On Diff #83732): There is a similar situation for an intrinsic just a few lines above, namely _mm_loadu_pd. It could generate either VMOVUPD/MOVUPD or VMOVUPS/MOVUPS instructions. I actually asked Simon about it offline just a couple of days ago, and I decided to keep referring to VMOVUPD/MOVUPD as the corresponding instruction for _mm_loadu_pd. However, if we end up doing things differently for _mm_loadu_si64, we need to make a similar change to _mm_loadu_pd (and probably to some other intrinsics).

RKSimon added inline comments (Jan 11 2017, 3:04 PM).

emmintrin.h
Line 1607 (On Diff #83732): It should be VMOVQ/MOVQ (note NOT VMOVSQ/MOVSQ!). Whatever the domain fixup code does to it, that was the original intent of the code, and it matches what other compilers say it will (probably) be.

kromanova added inline comments (Jan 11 2017, 3:11 PM).

emmintrin.h
Line 1607 (On Diff #83732): Yep, sorry, inaccurate editing after copy and paste. Thank you for noticing. I agree it should say VMOVQ/MOVQ (similar to what was done for _mm_loadu_pd, which we discussed a few days ago). I will make this change and update the review shortly.

Changed the instruction name from VMOVSD to VMOVQ for _mm_loadu_si64

LGTM

emmintrin.h
Line 1607 (On Diff #83732): Sorry for the late response. In general, not every intrinsic is lowered to the exact instruction described in the software manual. We make an effort to implement the intrinsics as C-style functions, because that helps the compiler optimize the code; the arithmetic intrinsics are an example of this. It seems that you already got the answer to your question from Simon. BTW, Intel has a nice tool that contains descriptions of all the intrinsics: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
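The "C-style functions" point above can be illustrated with a minimal sketch. The helpers `loadu_si64_c`, `load_sd_c`, `same_i`, and `same_d` are hypothetical names for illustration, not part of any header. They use `memcpy` for the unaligned, aliasing-safe 8-byte load (the header itself uses `__packed__`/`__may_alias__` structs for the same purpose) and build the result with ordinary SSE2 set intrinsics, leaving the choice of final instruction to the compiler. The sketch assumes an x86 target and a compiler whose `<emmintrin.h>` provides `_mm_loadu_si64`; availability depends on the compiler version.

```c
#include <emmintrin.h>
#include <string.h>

/* Hypothetical C-style counterpart of _mm_loadu_si64: memcpy performs an
   unaligned, aliasing-safe 8-byte load, and the upper lane is zeroed. */
static __m128i loadu_si64_c(const void *p) {
  long long u;
  memcpy(&u, p, sizeof u);
  return _mm_set_epi64x(0, u);   /* arguments are (high, low) */
}

/* Hypothetical C-style counterpart of _mm_load_sd: low lane = *p, high = 0.0. */
static __m128d load_sd_c(const double *p) {
  double d;
  memcpy(&d, p, sizeof d);
  return _mm_set_sd(d);
}

/* Nonzero if the two integer vectors are bitwise identical. */
static int same_i(__m128i a, __m128i b) {
  return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}

/* Nonzero if the two double vectors are bitwise identical. */
static int same_d(__m128d a, __m128d b) {
  return same_i(_mm_castpd_si128(a), _mm_castpd_si128(b));
}
```

With an 8-byte value stored at an odd offset of a byte buffer, `loadu_si64_c` and `_mm_loadu_si64` yield the same lanes (the value low, zero high), and `load_sd_c` matches `_mm_load_sd`. As discussed in the thread, the compiler may still select VMOVQ/MOVQ or VMOVSD/MOVSD for either form.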

Closed by commit rL291876: [DOXYGEN] Documentation for the newly added x86 intrinsics. (authored by kromanova) (Jan 12 2017, 5:25 PM).

This revision was automatically updated to reflect the committed changes.

Revision Contents

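Path                                 Size
cfe/trunk/lib/Headers/avxintrin.h    30 lines
cfe/trunk/lib/Headers/emmintrin.h    26 lines
cfe/trunk/lib/Headers/mmintrin.h     2 lines
cfe/trunk/lib/Headers/pmmintrin.h    4 lines
cfe/trunk/lib/Headers/xmmintrin.h    14 lines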

Diff 84200

cfe/trunk/lib/Headers/avxintrin.h

[... 2,178 unchanged lines not shown ...]
 /// A 256-bit vector of [8 x float].
 /// \returns A 256-bit integer vector containing the converted values.
 static __inline __m256i __DEFAULT_FN_ATTRS
 _mm256_cvttps_epi32(__m256 __a)
 {
 return (__m256i)__builtin_ia32_cvttps2dq256((__v8sf) __a);
 }

+/// \brief Returns the first element of the input vector of [4 x double].
+///
+/// \headerfile <avxintrin.h>
+///
+/// This intrinsic is a utility function and does not correspond to a specific
+/// instruction.
+///
+/// \param __a
+/// A 256-bit vector of [4 x double].
+/// \returns A 64 bit double containing the first element of the input vector.
 static __inline double __DEFAULT_FN_ATTRS
 _mm256_cvtsd_f64(__m256d __a)
 {
 return __a[0];
 }

+/// \brief Returns the first element of the input vector of [8 x i32].
+///
+/// \headerfile <avxintrin.h>
+///
+/// This intrinsic is a utility function and does not correspond to a specific
+/// instruction.
+///
+/// \param __a
+/// A 256-bit vector of [8 x i32].
+/// \returns A 32 bit integer containing the first element of the input vector.
 static __inline int __DEFAULT_FN_ATTRS
 _mm256_cvtsi256_si32(__m256i __a)
 {
 __v8si __b = (__v8si)__a;
 return __b[0];
 }

+/// \brief Returns the first element of the input vector of [8 x float].
+///
+/// \headerfile <avxintrin.h>
+///
+/// This intrinsic is a utility function and does not correspond to a specific
+/// instruction.
+///
+/// \param __a
+/// A 256-bit vector of [8 x float].
+/// \returns A 32 bit float containing the first element of the input vector.
 static __inline float __DEFAULT_FN_ATTRS
 _mm256_cvtss_f32(__m256 __a)
 {
 return __a[0];
 }

 /* Vector replicate */
 /// \brief Moves and duplicates high-order (odd-indexed) values from a 256-bit
[... 2,687 unchanged lines not shown ...]

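The three utility intrinsics documented above simply return element 0 of a 256-bit vector. A minimal usage sketch follows; the helpers `first_f64`, `first_i32`, `first_f32`, and `have_avx` are hypothetical. It assumes GCC or Clang on x86 with an `avxintrin.h` that provides these three intrinsics, uses `__attribute__((target("avx")))` so the file compiles without `-mavx`, and relies on `__builtin_cpu_supports("avx")` as the run-time check before any AVX code runs.

```c
#include <immintrin.h>

/* Each helper is compiled for AVX via the target attribute, so this file
   builds without -mavx; call them only after have_avx() returns 1. */
__attribute__((target("avx")))
static double first_f64(const double v[4]) {
  return _mm256_cvtsd_f64(_mm256_loadu_pd(v));          /* == v[0] */
}

__attribute__((target("avx")))
static int first_i32(const int v[8]) {
  __m256i x = _mm256_loadu_si256((const __m256i *)v);   /* unaligned load */
  return _mm256_cvtsi256_si32(x);                       /* == v[0] */
}

__attribute__((target("avx")))
static float first_f32(const float v[8]) {
  return _mm256_cvtss_f32(_mm256_loadu_ps(v));          /* == v[0] */
}

/* Returns 1 if the CPU (and OS) report AVX support, 0 otherwise. */
static int have_avx(void) {
  return __builtin_cpu_supports("avx") != 0;
}
```

Because each helper is `static` and target-specific, the compiler will not inline it into non-AVX callers, which keeps AVX instructions behind the `have_avx()` check.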
cfe/trunk/lib/Headers/emmintrin.h

[... 1,593 unchanged lines not shown ...]
 _mm_loadu_pd(double const *__dp)
 {
 struct __loadu_pd {
 __m128d __v;
 } __attribute__((__packed__, __may_alias__));
 return ((struct __loadu_pd*)__dp)->__v;
 }

+/// \brief Loads a 64-bit integer value to the low element of a 128-bit integer
+/// vector and clears the upper element.
+///
+/// \headerfile <x86intrin.h>
+///
+/// This intrinsic corresponds to the <c> VMOVQ / MOVQ </c> instruction.
+///
+/// \param __dp
+/// A pointer to a 64-bit memory location. The address of the memory
+/// location does not have to be aligned.
+/// \returns A 128-bit vector of [2 x i64] containing the loaded value.
 static __inline__ __m128i __DEFAULT_FN_ATTRS
 _mm_loadu_si64(void const *__a)
 {
 struct __loadu_si64 {
 long long __v;
 } __attribute__((__packed__, __may_alias__));
 long long __u = ((struct __loadu_si64*)__a)->__v;
 return (__m128i){__u, 0L};
 }

+/// \brief Loads a 64-bit double-precision value to the low element of a
+/// 128-bit integer vector and clears the upper element.
+///
+/// \headerfile <x86intrin.h>
+///
+/// This intrinsic corresponds to the <c> VMOVSD / MOVSD </c> instruction.
+///
+/// \param __dp
+/// An pointer to a memory location containing a double-precision value.
+/// The address of the memory location does not have to be aligned.
+/// \returns A 128-bit vector of [2 x double] containing the loaded value.
 static __inline__ __m128d __DEFAULT_FN_ATTRS
 _mm_load_sd(double const *__dp)
 {
 struct __mm_load_sd_struct {
 double __u;
 } __attribute__((__packed__, __may_alias__));
 double __u = ((struct __mm_load_sd_struct*)__dp)->__u;
 return (__m128d){ __u, 0 };
[... 2,394 unchanged lines not shown ...]
 ///
 /// \headerfile <x86intrin.h>
 ///
 /// This intrinsic corresponds to the <c> CLFLUSH </c> instruction.
 ///
 /// \param __p
 /// A pointer to the memory location used to identify the cache line to be
 /// flushed.
-void _mm_clflush(void const *);
+void _mm_clflush(void const * __p);

 /// \brief Forces strong memory ordering (serialization) between load
 /// instructions preceding this instruction and load instructions following
 /// this instruction, ensuring the system completes all previous loads before
 /// executing subsequent loads.
 ///
 /// \headerfile <x86intrin.h>
 ///
[... 105 unchanged lines not shown ...]
 ///
 /// \headerfile <x86intrin.h>
 ///
 /// This intrinsic corresponds to the <c> VPEXTRW / PEXTRW </c> instruction.
 ///
 /// \param __a
 /// A 128-bit integer vector.
 /// \param __imm
-/// An immediate value. Bits [3:0] selects values from \a __a to be assigned
+/// An immediate value. Bits [2:0] selects values from \a __a to be assigned
 /// to bits[15:0] of the result. \n
 /// 000: assign values from bits [15:0] of \a __a. \n
 /// 001: assign values from bits [31:16] of \a __a. \n
 /// 010: assign values from bits [47:32] of \a __a. \n
 /// 011: assign values from bits [63:48] of \a __a. \n
 /// 100: assign values from bits [79:64] of \a __a. \n
 /// 101: assign values from bits [95:80] of \a __a. \n
 /// 110: assign values from bits [111:96] of \a __a. \n
[... 639 unchanged lines not shown ...]

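Two of the emmintrin.h comments above describe behavior that a short sketch can show. The PEXTRW intrinsic (`_mm_extract_epi16` in this header) picks one of eight 16-bit lanes, so only selector values 0 to 7 (bits [2:0]) are meaningful, and the selected word is zero-extended. `_mm_clflush` takes a pointer that identifies the cache line to flush. The helpers `word5`, `word6`, and `flush_range`, and the 64-byte line size, are assumptions for illustration, not part of the header.

```c
#include <emmintrin.h>
#include <stddef.h>

/* The PEXTRW selector must be a compile-time constant in 0..7. */
static int word5(__m128i v) { return _mm_extract_epi16(v, 5); } /* bits [95:80] */
static int word6(__m128i v) { return _mm_extract_epi16(v, 6); } /* bits [111:96] */

/* Assumption: 64-byte cache lines (common on x86, but not guaranteed). */
#define ASSUMED_LINE_SIZE 64

/* Flushes every cache line that overlaps [p, p + n) using CLFLUSH. */
static void flush_range(const void *p, size_t n) {
  const char *c = (const char *)p;
  if (n == 0)
    return;
  for (size_t i = 0; i < n; i += ASSUMED_LINE_SIZE)
    _mm_clflush(c + i);
  _mm_clflush(c + n - 1);   /* last line, in case p is not line-aligned */
  _mm_mfence();             /* MFENCE orders the flushes with later accesses */
}
```

For a vector built with `_mm_setr_epi16(10, 11, 12, 13, 14, 15, -1, 17)`, `word5` returns 15 and `word6` returns 65535, since the 16-bit value -1 is zero-extended rather than sign-extended. Flushing does not change memory contents; it only evicts the lines from the cache hierarchy.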
cfe/trunk/lib/Headers/mmintrin.h

[... 205 unchanged lines not shown ...]
 /// \brief Unpacks the upper 32 bits from two 64-bit integer vectors of [8 x i8]
 /// and interleaves them into a 64-bit integer vector of [8 x i8].
 ///
 /// \headerfile <x86intrin.h>
 ///
 /// This intrinsic corresponds to the <c> PUNPCKHBW </c> instruction.
 ///
 /// \param __m1
 /// A 64-bit integer vector of [8 x i8]. \n
 /// Bits [39:32] are written to bits [7:0] of the result. \n
 /// Bits [47:40] are written to bits [23:16] of the result. \n
 /// Bits [55:48] are written to bits [39:32] of the result. \n
 /// Bits [63:56] are written to bits [55:48] of the result.
 /// \param __m2
 /// A 64-bit integer vector of [8 x i8].
 /// Bits [39:32] are written to bits [15:8] of the result. \n
 /// Bits [47:40] are written to bits [31:24] of the result. \n
[... 1,327 unchanged lines not shown ...]

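The PUNPCKHBW comment above spells out which bytes land where. The sketch below (the helper `unpackhi_bytes` is hypothetical) checks that mapping on concrete data, assuming an x86 target where the MMX intrinsics are available. Because x86 is little-endian, array byte k corresponds to bits [8k+7:8k] of the `__m64` value, so the result interleaves bytes 4 to 7 of each input. `_mm_empty` is called before returning, as MMX code should do before any x87 floating-point code runs.

```c
#include <mmintrin.h>
#include <string.h>

/* Interleaves the upper four bytes of a and b, as PUNPCKHBW does:
   out = { a[4], b[4], a[5], b[5], a[6], b[6], a[7], b[7] }. */
static void unpackhi_bytes(const unsigned char a[8], const unsigned char b[8],
                           unsigned char out[8]) {
  __m64 ma, mb, r;
  memcpy(&ma, a, 8);             /* byte k -> bits [8k+7:8k] */
  memcpy(&mb, b, 8);
  r = _mm_unpackhi_pi8(ma, mb);
  memcpy(out, &r, 8);
  _mm_empty();                   /* EMMS: leave MMX state */
}
```

With `a = {0, ..., 7}` and `b = {10, ..., 17}`, the output is `{4, 14, 5, 15, 6, 16, 7, 17}`: bits [39:32] of `__m1` land in bits [7:0], and bits [39:32] of `__m2` land in bits [15:8], as documented.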
cfe/trunk/lib/Headers/pmmintrin.h

[... 109 unchanged lines not shown ...]
 static __inline__ __m128 __DEFAULT_FN_ATTRS
 _mm_hsub_ps(__m128 __a, __m128 __b)
 {
 return __builtin_ia32_hsubps((__v4sf)__a, (__v4sf)__b);
 }

 /// \brief Moves and duplicates high-order (odd-indexed) values from a 128-bit
 /// vector of [4 x float] to float values stored in a 128-bit vector of
 /// [4 x float].
 ///
 /// \headerfile <x86intrin.h>
 ///
 /// This intrinsic corresponds to the <c> VMOVSHDUP </c> instruction.
 ///
 /// \param __a
 /// A 128-bit vector of [4 x float]. \n
 /// Bits [127:96] of the source are written to bits [127:96] and [95:64] of
 /// the destination. \n
 /// Bits [63:32] of the source are written to bits [63:32] and [31:0] of the
 /// destination.
 /// \returns A 128-bit vector of [4 x float] containing the moved and duplicated
 /// values.
 static __inline__ __m128 __DEFAULT_FN_ATTRS
 _mm_movehdup_ps(__m128 __a)
 {
 return __builtin_shufflevector((__v4sf)__a, (__v4sf)__a, 1, 1, 3, 3);
 }

 /// \brief Duplicates low-order (even-indexed) values from a 128-bit vector of
 /// [4 x float] to float values stored in a 128-bit vector of [4 x float].
 ///
 /// \headerfile <x86intrin.h>
 ///
 /// This intrinsic corresponds to the <c> VMOVSLDUP </c> instruction.
 ///
 /// \param __a
 /// A 128-bit vector of [4 x float] \n
 /// Bits [95:64] of the source are written to bits [127:96] and [95:64] of
[... 163 unchanged lines not shown ...]

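A common use of the two duplicating moves documented above is complex multiplication on interleaved (real, imaginary) data: `_mm_moveldup_ps` broadcasts the real parts of one operand and `_mm_movehdup_ps` its imaginary parts. The sketch below is one such formulation, not taken from the header. The helper `cmul2` is hypothetical, and the code assumes GCC or Clang on x86, using a `target("sse3")` attribute plus a run-time CPU check.

```c
#include <pmmintrin.h>

/* Multiplies two pairs of complex numbers stored as (re, im, re, im).
   Compiled for SSE3 via the target attribute; check CPU support first. */
__attribute__((target("sse3")))
static void cmul2(const float a[4], const float b[4], float out[4]) {
  __m128 va = _mm_loadu_ps(a);                 /* [ar0, ai0, ar1, ai1] */
  __m128 vb = _mm_loadu_ps(b);                 /* [br0, bi0, br1, bi1] */
  __m128 re = _mm_moveldup_ps(vb);             /* [br0, br0, br1, br1] */
  __m128 im = _mm_movehdup_ps(vb);             /* [bi0, bi0, bi1, bi1] */
  __m128 sw = _mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 3, 0, 1)); /* [ai0, ar0, ai1, ar1] */
  __m128 t1 = _mm_mul_ps(va, re);              /* [ar*br, ai*br, ...] */
  __m128 t2 = _mm_mul_ps(sw, im);              /* [ai*bi, ar*bi, ...] */
  /* ADDSUBPS subtracts in even lanes and adds in odd lanes:
     [ar*br - ai*bi, ai*br + ar*bi, ...] */
  _mm_storeu_ps(out, _mm_addsub_ps(t1, t2));
}
```

For inputs (1+2i, 3+4i) and (5+6i, 7+8i), `out` holds {-7, 16, -11, 52}, i.e. (-7+16i, -11+52i).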
cfe/trunk/lib/Headers/xmmintrin.h

[... 2,061 unchanged lines not shown ...]
 /// operation: \n
 /// _MM_HINT_NTA: Move data using the non-temporal access (NTA) hint. The
 /// PREFETCHNTA instruction will be generated. \n
 /// _MM_HINT_T0: Move data using the T0 hint. The PREFETCHT0 instruction will
 /// be generated. \n
 /// _MM_HINT_T1: Move data using the T1 hint. The PREFETCHT1 instruction will
 /// be generated. \n
 /// _MM_HINT_T2: Move data using the T2 hint. The PREFETCHT2 instruction will
 /// be generated.
 #define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
 #endif

 /// \brief Stores a 64-bit integer in the specified aligned memory location. To
 /// minimize caching, the data is flagged as non-temporal (unlikely to be
 /// used again soon).
 ///
 /// \headerfile <x86intrin.h>
[... 351 unchanged lines not shown ...]
 /// _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW,
 /// _MM_EXCEPT_INEXACT. There is a convenience wrapper
 /// _MM_GET_EXCEPTION_STATE().
 /// </li>
 /// <li>
 /// For checking exception masks: _MM_MASK_UNDERFLOW, _MM_MASK_OVERFLOW,
 /// _MM_MASK_INVALID, _MM_MASK_DENORM, _MM_MASK_DIV_ZERO, _MM_MASK_INEXACT.
 /// There is a convenience wrapper _MM_GET_EXCEPTION_MASK().
 /// </li>
 /// <li>
 /// For checking rounding modes: _MM_ROUND_NEAREST, _MM_ROUND_DOWN,
 /// _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO. There is a convenience wrapper
 /// _MM_GET_ROUNDING_MODE(x) where x is one of these macros.
 /// </li>
 /// <li>
 /// For checking flush-to-zero mode: _MM_FLUSH_ZERO_ON, _MM_FLUSH_ZERO_OFF.
 /// There is a convenience wrapper _MM_GET_FLUSH_ZERO_MODE().
 /// </li>
 /// <li>
 /// For checking denormals-are-zero mode: _MM_DENORMALS_ZERO_ON,
 /// _MM_DENORMALS_ZERO_OFF. There is a convenience wrapper
 /// _MM_GET_DENORMALS_ZERO_MODE().
 /// </li>
 /// </ul>
 ///
 /// For example, the expression below checks if an overflow exception has
 /// occurred:
 /// ( _mm_getcsr() & _MM_EXCEPT_OVERFLOW )
 ///
 /// The following example gets the current rounding mode:
 /// _MM_GET_ROUNDING_MODE()
 ///
 /// \headerfile <x86intrin.h>
 ///
 /// This intrinsic corresponds to the <c> VSTMXCSR / STMXCSR </c> instruction.
 ///
 /// \returns A 32-bit unsigned integer containing the contents of the MXCSR
 /// register.
 unsigned int _mm_getcsr(void);

 /// \brief Sets the MXCSR register with the 32-bit unsigned integer value.
 ///
 /// There are several groups of macros associated with this intrinsic,
 /// including:
 /// <ul>
 /// <li>
 /// For setting exception states: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO,
 /// _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW,
 /// _MM_EXCEPT_INEXACT. There is a convenience wrapper
 /// _MM_SET_EXCEPTION_STATE(x) where x is one of these macros.
 /// </li>
 /// <li>
 /// For setting exception masks: _MM_MASK_UNDERFLOW, _MM_MASK_OVERFLOW,
 /// _MM_MASK_INVALID, _MM_MASK_DENORM, _MM_MASK_DIV_ZERO, _MM_MASK_INEXACT.
[... 28 unchanged lines not shown ...]
 /// }
 ///
 /// \headerfile <x86intrin.h>
 ///
 /// This intrinsic corresponds to the <c> VLDMXCSR / LDMXCSR </c> instruction.
 ///
 /// \param __i
 /// A 32-bit unsigned integer value to be written to the MXCSR register.
-void _mm_setcsr(unsigned int);
+void _mm_setcsr(unsigned int __i);

 #if defined(__cplusplus)
 } // extern "C"
 #endif

 /// \brief Selects 4 float values from the 128-bit operands of [4 x float], as
 /// specified by the immediate value operand.
 ///
[... 444 unchanged lines not shown ...]
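The xmmintrin.h comments above cover two intrinsics whose use is easy to sketch: `_mm_prefetch` with the `_MM_HINT_T0` hint, and the `_mm_getcsr`/`_mm_setcsr` pair together with the rounding-mode macros. In the sketch below, `sum_with_prefetch`, `set_rounding_mode`, and the `PREFETCH_AHEAD` distance are hypothetical. The prefetch distance in particular is a tuning assumption, and since prefetching is only a hint, the result is the same with or without it.

```c
#include <xmmintrin.h>
#include <stddef.h>

/* Prefetch distance in elements: a tuning assumption, not a recommendation. */
#define PREFETCH_AHEAD 64

/* Sums p[0..n) while hinting (PREFETCHT0) the element PREFETCH_AHEAD ahead. */
static double sum_with_prefetch(const double *p, size_t n) {
  double s = 0.0;
  for (size_t i = 0; i < n; ++i) {
    if (i + PREFETCH_AHEAD < n)   /* never form a pointer past the array */
      _mm_prefetch((const char *)(p + i + PREFETCH_AHEAD), _MM_HINT_T0);
    s += p[i];
  }
  return s;
}

/* Replaces the MXCSR rounding-control field with mode (one of
   _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO)
   and returns the previous MXCSR value for restoring with _mm_setcsr. */
static unsigned int set_rounding_mode(unsigned int mode) {
  unsigned int saved = _mm_getcsr();               /* STMXCSR */
  _mm_setcsr((saved & ~_MM_ROUND_MASK) | mode);    /* LDMXCSR */
  return saved;
}
```

The rounding helper rewrites only the rounding-control bits selected by `_MM_ROUND_MASK` and leaves the exception flags, exception masks, and flush-to-zero bits unchanged; `_MM_GET_ROUNDING_MODE()` then reports the new mode until the saved value is restored. Compilers generally optimize floating-point code assuming the default rounding mode, so code that computes under a changed mode may need options such as GCC's `-frounding-math`.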