This is an archive of the discontinued LLVM Phabricator instance.

clang/lib/Headers/avx2intrin.h
1324	j
1352	j
1407	j
1483	j
1509	j
1560	j
3474	A format closer to the Intrinsic Guide is `MEM[__X+j:j]`
3506	ditto.
3538	ditto.
3570	ditto.
3602	MEM[j+31:j] := __Y[j+31:j]
3632	ditto.
3662	ditto.
3692	ditto.

probinson added inline comments. Jun 29 2023, 7:47 AM

clang/lib/Headers/avx2intrin.h
3474	LoadXX is the syntax in the gather intrinsics, e.g. _mm_mask_i32gather_pd. I'd prefer to be consistent.

s/:7/:j/ correcting bit references.

probinson marked 6 inline comments as done. Jun 29 2023, 7:55 AM

Harbormaster completed remote builds in B242097: Diff 535804. Jun 29 2023, 8:50 AM

pengfei added inline comments. Jun 29 2023, 9:31 AM

clang/lib/Headers/avx2intrin.h
3474	I think the problem here is that the unit of measurement is easily confused. From a C point of view, `__X` is an `int` pointer, so we should use `+ i` rather than `i * 4`. Elsewhere in the pseudo-code we are measuring in bits, but here `i * 4` is a byte offset.

probinson added inline comments. Jun 29 2023, 9:49 AM

clang/lib/Headers/avx2intrin.h
3474	Well, the pseudo-code is clearly not C. If you look at the gather code, it computes a byte address using an offset multiplied by an explicit scale factor. I am doing exactly the same here. The syntax `MEM[__X+j:j]` is mixing a byte address with a bit offset, which I think is more confusing. To be fully consistent, using `[]` with bit offsets only, it should be `k := __X*8 + i*32` followed by `result[j+31:j] := MEM[k+31:k]`, which I think obscures more than it explains.
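
The two notations being compared, the byte-address form `Load32(__X+(i*4))` and the bits-only form `MEM[k+31:k]` with `k := __X*8 + i*32`, can be checked against each other with a minimal sketch. Memory is modelled as a little-endian byte array, and `load32_byte_addr` and `load32_bit_range` are hypothetical helper names invented for this illustration; neither appears in the header or in Intel's documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 32 bits starting at byte address x + i*4,
   modelling Load32(__X + (i*4)) over a little-endian byte array. */
static uint32_t load32_byte_addr(const uint8_t *mem, size_t x, size_t i)
{
    assert(mem != NULL);
    size_t addr = x + i * 4;
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* Hypothetical helper: read bits [k+31:k] of memory viewed as one long
   bit string, modelling MEM[k+31:k] with k := x*8 + i*32. */
static uint32_t load32_bit_range(const uint8_t *mem, size_t x, size_t i)
{
    assert(mem != NULL);
    size_t k = x * 8 + i * 32;
    uint32_t value = 0;
    for (size_t bit = 0; bit < 32; ++bit) {
        size_t pos = k + bit;                        /* absolute bit index */
        uint32_t b = (mem[pos / 8] >> (pos % 8)) & 1u;
        value |= b << bit;
    }
    return value;
}
```

Both helpers return the same value for any in-range `x` and `i`; the difference is only in whether the reader has to track one unit (bits) or two (bytes for the address, bits for the element).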

pengfei added inline comments. Jun 30 2023, 12:09 AM

clang/lib/Headers/avx2intrin.h
3474	Yeah, it's not C code here. But it is easy to fall back on C concepts, e.g., why assume `__X` is measured in bytes? That's why I think it's clearer to express both in bits. I made a mistake here; I wanted to propose `MEM[__X+j+31: __X+j]`. It matches the Intrinsic Guide.

probinson added inline comments. Jun 30 2023, 7:16 AM

clang/lib/Headers/avx2intrin.h
3474	We assume `__X` is in bytes because that's how addresses work on X86. Adding a bit offset to a byte address makes no sense. I see that is how existing Intel documentation works, which does not make it correct. To "make both in bits" means multiplying `__X` by 8, as in the example in my previous comment. Or coming up with a different syntax that makes the difference clear. `MEM(__X)[j+31:j]` or even `MEM[__X][j+31:j]` would be preferable.

LGTM.

clang/lib/Headers/avx2intrin.h
3474	My intention is to match the Intrinsic Guide as closely as possible. Multiplying by 8 cannot achieve that, but I cannot deny that `__X` in bytes makes sense. So I'm fine with using a different syntax.
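
For reference, the masked-load pseudo-code under discussion can be written as a scalar sketch in portable C, which sidesteps the byte-versus-bit notation question because C pointer arithmetic already scales by the element size. `maskload_epi32_model` is a hypothetical name used only here, and the plain arrays standing in for the 256-bit vectors are an assumption of the sketch, not the header's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the masked-load pseudo-code: element i is loaded only
   when the most significant bit of mask[i] is set; otherwise the result
   element is zero. Hypothetical name, not a function from the header. */
static void maskload_epi32_model(const int32_t *x, const int32_t mask[8],
                                 int32_t result[8])
{
    assert(mask != NULL && result != NULL);
    for (size_t i = 0; i < 8; ++i) {
        if ((uint32_t)mask[i] >> 31) {
            result[i] = x[i];   /* x + i in C is byte address __X + i*4 */
        } else {
            result[i] = 0;      /* x[i] is not read in this branch */
        }
    }
}
```

Only bit 31 of each mask element matters in this model, so a mask value such as `0x7FFFFFFF` still selects zero.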

This revision is now accepted and ready to land. Jun 30 2023, 8:15 AM

This revision was landed with ongoing or failed builds. Jun 30 2023, 8:31 AM

Closed by commit rG1461fabfb141: [Headers][doc] Add load/store/cmp/cvt intrinsic descriptions to avx2intrin.h (authored by probinson).

This revision was automatically updated to reflect the committed changes.

probinson added a commit: rG1461fabfb141: [Headers][doc] Add load/store/cmp/cvt intrinsic descriptions to avx2intrin.h.

Herald added a project: Restricted Project. Jun 30 2023, 8:31 AM

Thanks!

Revision Contents

Path	Size
clang/lib/Headers/avx2intrin.h	605 lines

Diff 536266

clang/lib/Headers/avx2intrin.h

	[594 lines not shown]
	/// element is copied from \a V1; otherwise, it is copied from \a V2.			/// element is copied from \a V1; otherwise, it is copied from \a V2.
	/// \a M[0] determines the source for elements 0 and 8, \a M[1] for			/// \a M[0] determines the source for elements 0 and 8, \a M[1] for
	/// elements 1 and 9, and so forth.			/// elements 1 and 9, and so forth.
	/// \returns A 256-bit vector of [16 x i16] containing the result.			/// \returns A 256-bit vector of [16 x i16] containing the result.
	#define _mm256_blend_epi16(V1, V2, M) \			#define _mm256_blend_epi16(V1, V2, M) \
	((__m256i)__builtin_ia32_pblendw256((__v16hi)(__m256i)(V1), \			((__m256i)__builtin_ia32_pblendw256((__v16hi)(__m256i)(V1), \
	(__v16hi)(__m256i)(V2), (int)(M)))			(__v16hi)(__m256i)(V2), (int)(M)))

				/// Compares corresponding bytes in the 256-bit integer vectors in \a __a and
				/// \a __b for equality and returns the outcomes in the corresponding
				/// bytes of the 256-bit result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 31
				/// j := i*8
				/// result[j+7:j] := (__a[j+7:j] == __b[j+7:j]) ? 0xFF : 0
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPCMPEQB instruction.
				///
				/// \param __a
				/// A 256-bit integer vector containing one of the inputs.
				/// \param __b
				/// A 256-bit integer vector containing one of the inputs.
				/// \returns A 256-bit integer vector containing the result.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cmpeq_epi8(__m256i __a, __m256i __b)			_mm256_cmpeq_epi8(__m256i __a, __m256i __b)
	{			{
	return (__m256i)((__v32qi)__a == (__v32qi)__b);			return (__m256i)((__v32qi)__a == (__v32qi)__b);
	}			}
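
The `.operation` pseudo-code for the byte-wise equality compare can be mirrored by a minimal scalar sketch. It assumes each 256-bit vector is represented as an array of 32 bytes, and `cmpeq_epi8_model` is a hypothetical name chosen for this illustration; the header itself uses the vector `==` operator shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the pseudo-code: for each of the 32 byte lanes, write
   0xFF when the inputs match and 0 otherwise. Hypothetical helper. */
static void cmpeq_epi8_model(const uint8_t a[32], const uint8_t b[32],
                             uint8_t result[32])
{
    assert(a != NULL && b != NULL && result != NULL);
    for (size_t i = 0; i < 32; ++i) {
        /* lane i corresponds to bits [i*8+7 : i*8] in the pseudo-code */
        result[i] = (a[i] == b[i]) ? 0xFF : 0x00;
    }
}
```

The wider variants follow the same pattern with 16, 8, or 4 lanes and an all-ones value of the matching width.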

				/// Compares corresponding elements in the 256-bit vectors of [16 x i16] in
				/// \a __a and \a __b for equality and returns the outcomes in the
				/// corresponding elements of the 256-bit result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 15
				/// j := i*16
				/// result[j+15:j] := (__a[j+15:j] == __b[j+15:j]) ? 0xFFFF : 0
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPCMPEQW instruction.
				///
				/// \param __a
				/// A 256-bit vector of [16 x i16] containing one of the inputs.
				/// \param __b
				/// A 256-bit vector of [16 x i16] containing one of the inputs.
				/// \returns A 256-bit vector of [16 x i16] containing the result.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cmpeq_epi16(__m256i __a, __m256i __b)			_mm256_cmpeq_epi16(__m256i __a, __m256i __b)
	{			{
	return (__m256i)((__v16hi)__a == (__v16hi)__b);			return (__m256i)((__v16hi)__a == (__v16hi)__b);
	}			}

				/// Compares corresponding elements in the 256-bit vectors of [8 x i32] in
				/// \a __a and \a __b for equality and returns the outcomes in the
				/// corresponding elements of the 256-bit result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 7
				/// j := i*32
				/// result[j+31:j] := (__a[j+31:j] == __b[j+31:j]) ? 0xFFFFFFFF : 0
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPCMPEQD instruction.
				///
				/// \param __a
				/// A 256-bit vector of [8 x i32] containing one of the inputs.
				/// \param __b
				/// A 256-bit vector of [8 x i32] containing one of the inputs.
				/// \returns A 256-bit vector of [8 x i32] containing the result.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cmpeq_epi32(__m256i __a, __m256i __b)			_mm256_cmpeq_epi32(__m256i __a, __m256i __b)
	{			{
	return (__m256i)((__v8si)__a == (__v8si)__b);			return (__m256i)((__v8si)__a == (__v8si)__b);
	}			}

				/// Compares corresponding elements in the 256-bit vectors of [4 x i64] in
				/// \a __a and \a __b for equality and returns the outcomes in the
				/// corresponding elements of the 256-bit result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 3
				/// j := i*64
				/// result[j+63:j] := (__a[j+63:j] == __b[j+63:j]) ? 0xFFFFFFFFFFFFFFFF : 0
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPCMPEQQ instruction.
				///
				/// \param __a
				/// A 256-bit vector of [4 x i64] containing one of the inputs.
				/// \param __b
				/// A 256-bit vector of [4 x i64] containing one of the inputs.
				/// \returns A 256-bit vector of [4 x i64] containing the result.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cmpeq_epi64(__m256i __a, __m256i __b)			_mm256_cmpeq_epi64(__m256i __a, __m256i __b)
	{			{
	return (__m256i)((__v4di)__a == (__v4di)__b);			return (__m256i)((__v4di)__a == (__v4di)__b);
	}			}

				/// Compares corresponding signed bytes in the 256-bit integer vectors in
				/// \a __a and \a __b for greater-than and returns the outcomes in the
				/// corresponding bytes of the 256-bit result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 31
				/// j := i*8
				/// result[j+7:j] := (__a[j+7:j] > __b[j+7:j]) ? 0xFF : 0
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPCMPGTB instruction.
				///
				/// \param __a
				/// A 256-bit integer vector containing one of the inputs.
				/// \param __b
				/// A 256-bit integer vector containing one of the inputs.
				/// \returns A 256-bit integer vector containing the result.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cmpgt_epi8(__m256i __a, __m256i __b)			_mm256_cmpgt_epi8(__m256i __a, __m256i __b)
	{			{
	/* This function always performs a signed comparison, but __v32qi is a char			/* This function always performs a signed comparison, but __v32qi is a char
	which may be signed or unsigned, so use __v32qs. */			which may be signed or unsigned, so use __v32qs. */
	return (__m256i)((__v32qs)__a > (__v32qs)__b);			return (__m256i)((__v32qs)__a > (__v32qs)__b);
	}			}
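
The comment in the function body about `char` signedness can be illustrated with a small scalar sketch that compares the same bytes both ways. `cmpgt_bytes_model` is a hypothetical helper written for this note, and the `(int8_t)` casts assume the usual two's-complement conversion that mainstream compilers implement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a per-byte greater-than comparison.
   When is_signed is nonzero each byte is reinterpreted as int8_t
   before comparing; otherwise the bytes are compared as unsigned. */
static void cmpgt_bytes_model(const uint8_t *a, const uint8_t *b,
                              uint8_t *result, size_t n, int is_signed)
{
    assert(a != NULL && b != NULL && result != NULL);
    for (size_t i = 0; i < n; ++i) {
        int gt = is_signed ? ((int8_t)a[i] > (int8_t)b[i])
                           : (a[i] > b[i]);
        result[i] = gt ? 0xFF : 0x00;
    }
}
```

With the byte `0x80` on the left and `0x01` on the right, the signed comparison sees -128 > 1 and yields 0, while the unsigned comparison sees 128 > 1 and yields `0xFF`. That difference is why a vector type with an explicitly signed element is needed for this intrinsic.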

				/// Compares corresponding signed elements in the 256-bit vectors of
				/// [16 x i16] in \a __a and \a __b for greater-than and returns the
				/// outcomes in the corresponding elements of the 256-bit result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 15
				/// j := i*16
				/// result[j+15:j] := (__a[j+15:j] > __b[j+15:j]) ? 0xFFFF : 0
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPCMPGTW instruction.
				///
				/// \param __a
				/// A 256-bit vector of [16 x i16] containing one of the inputs.
				/// \param __b
				/// A 256-bit vector of [16 x i16] containing one of the inputs.
				/// \returns A 256-bit vector of [16 x i16] containing the result.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cmpgt_epi16(__m256i __a, __m256i __b)			_mm256_cmpgt_epi16(__m256i __a, __m256i __b)
	{			{
	return (__m256i)((__v16hi)__a > (__v16hi)__b);			return (__m256i)((__v16hi)__a > (__v16hi)__b);
	}			}

				/// Compares corresponding signed elements in the 256-bit vectors of
				/// [8 x i32] in \a __a and \a __b for greater-than and returns the
				/// outcomes in the corresponding elements of the 256-bit result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 7
				/// j := i*32
				/// result[j+31:j] := (__a[j+31:j] > __b[j+31:j]) ? 0xFFFFFFFF : 0
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPCMPGTD instruction.
				///
				/// \param __a
				/// A 256-bit vector of [8 x i32] containing one of the inputs.
				/// \param __b
				/// A 256-bit vector of [8 x i32] containing one of the inputs.
				/// \returns A 256-bit vector of [8 x i32] containing the result.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cmpgt_epi32(__m256i __a, __m256i __b)			_mm256_cmpgt_epi32(__m256i __a, __m256i __b)
	{			{
	return (__m256i)((__v8si)__a > (__v8si)__b);			return (__m256i)((__v8si)__a > (__v8si)__b);
	}			}

				/// Compares corresponding signed elements in the 256-bit vectors of
				/// [4 x i64] in \a __a and \a __b for greater-than and returns the
				/// outcomes in the corresponding elements of the 256-bit result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 3
				/// j := i*64
				/// result[j+63:j] := (__a[j+63:j] > __b[j+63:j]) ? 0xFFFFFFFFFFFFFFFF : 0
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPCMPGTQ instruction.
				///
				/// \param __a
				/// A 256-bit vector of [4 x i64] containing one of the inputs.
				/// \param __b
				/// A 256-bit vector of [4 x i64] containing one of the inputs.
				/// \returns A 256-bit vector of [4 x i64] containing the result.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cmpgt_epi64(__m256i __a, __m256i __b)			_mm256_cmpgt_epi64(__m256i __a, __m256i __b)
	{			{
	return (__m256i)((__v4di)__a > (__v4di)__b);			return (__m256i)((__v4di)__a > (__v4di)__b);
	}			}

	/// Horizontally adds the adjacent pairs of 16-bit integers from two 256-bit			/// Horizontally adds the adjacent pairs of 16-bit integers from two 256-bit
	/// vectors of [16 x i16] and returns the lower 16 bits of each sum in an			/// vectors of [16 x i16] and returns the lower 16 bits of each sum in an
	[493 lines not shown]
	}			}

	static __inline__ int __DEFAULT_FN_ATTRS256			static __inline__ int __DEFAULT_FN_ATTRS256
	_mm256_movemask_epi8(__m256i __a)			_mm256_movemask_epi8(__m256i __a)
	{			{
	return __builtin_ia32_pmovmskb256((__v32qi)__a);			return __builtin_ia32_pmovmskb256((__v32qi)__a);
	}			}

				/// Sign-extends bytes from the 128-bit integer vector in \a __V and returns
				/// the 16-bit values in the corresponding elements of a 256-bit vector
				/// of [16 x i16].
				///
				/// \code{.operation}
				/// FOR i := 0 TO 15
				/// j := i*8
				/// k := i*16
				/// result[k+15:k] := SignExtend(__V[j+7:j])
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVSXBW instruction.
				///
				/// \param __V
				/// A 128-bit integer vector containing the source bytes.
				/// \returns A 256-bit vector of [16 x i16] containing the sign-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepi8_epi16(__m128i __V)			_mm256_cvtepi8_epi16(__m128i __V)
	{			{
	/* This function always performs a signed extension, but __v16qi is a char			/* This function always performs a signed extension, but __v16qi is a char
	which may be signed or unsigned, so use __v16qs. */			which may be signed or unsigned, so use __v16qs. */
	return (__m256i)__builtin_convertvector((__v16qs)__V, __v16hi);			return (__m256i)__builtin_convertvector((__v16qs)__V, __v16hi);
	}			}
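
The two index variables in the pseudo-code, `j` for the source bit offset and `k` for the destination bit offset, can be made concrete with a scalar sketch. It assumes the 128-bit source is an array of 16 bytes and the result an array of 16 `int16_t` values; `cvtepi8_epi16_model` is a hypothetical name, and the header itself relies on `__builtin_convertvector`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the sign-extension pseudo-code. Hypothetical helper. */
static void cvtepi8_epi16_model(const uint8_t v[16], int16_t result[16])
{
    assert(v != NULL && result != NULL);
    for (size_t i = 0; i < 16; ++i) {
        size_t j = i * 8;            /* source bit offset: __V[j+7:j]       */
        size_t k = i * 16;           /* destination offset: result[k+15:k]  */
        uint8_t byte = v[j / 8];     /* the byte holding bits j+7..j        */
        /* Casting through int8_t replicates the sign bit into the upper
           byte (assumes two's-complement conversion). */
        result[k / 16] = (int16_t)(int8_t)byte;
    }
}
```

The source range has to be written in terms of `j`, not a fixed bit number, because the selected byte moves with each iteration.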

				/// Sign-extends bytes from the lower half of the 128-bit integer vector in
				/// \a __V and returns the 32-bit values in the corresponding elements of a
				/// 256-bit vector of [8 x i32].
				///
				/// \code{.operation}
				/// FOR i := 0 TO 7
				/// j := i*8
				/// k := i*32
				/// result[k+31:k] := SignExtend(__V[j+7:j])
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVSXBD instruction.
				///
				/// \param __V
				/// A 128-bit integer vector containing the source bytes.
				/// \returns A 256-bit vector of [8 x i32] containing the sign-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepi8_epi32(__m128i __V)			_mm256_cvtepi8_epi32(__m128i __V)
	{			{
	/* This function always performs a signed extension, but __v16qi is a char			/* This function always performs a signed extension, but __v16qi is a char
	which may be signed or unsigned, so use __v16qs. */			which may be signed or unsigned, so use __v16qs. */
	return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v16qs)__V, (__v16qs)__V, 0, 1, 2, 3, 4, 5, 6, 7), __v8si);			return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v16qs)__V, (__v16qs)__V, 0, 1, 2, 3, 4, 5, 6, 7), __v8si);
	}			}

				/// Sign-extends the first four bytes from the 128-bit integer vector in
				/// \a __V and returns the 64-bit values in the corresponding elements of a
				/// 256-bit vector of [4 x i64].
				///
				/// \code{.operation}
				/// result[63:0] := SignExtend(__V[7:0])
				/// result[127:64] := SignExtend(__V[15:8])
				/// result[191:128] := SignExtend(__V[23:16])
				/// result[255:192] := SignExtend(__V[31:24])
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVSXBQ instruction.
				///
				/// \param __V
				/// A 128-bit integer vector containing the source bytes.
				/// \returns A 256-bit vector of [4 x i64] containing the sign-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepi8_epi64(__m128i __V)			_mm256_cvtepi8_epi64(__m128i __V)
	{			{
	/* This function always performs a signed extension, but __v16qi is a char			/* This function always performs a signed extension, but __v16qi is a char
	which may be signed or unsigned, so use __v16qs. */			which may be signed or unsigned, so use __v16qs. */
	return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v16qs)__V, (__v16qs)__V, 0, 1, 2, 3), __v4di);			return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v16qs)__V, (__v16qs)__V, 0, 1, 2, 3), __v4di);
	}			}

				/// Sign-extends 16-bit elements from the 128-bit vector of [8 x i16] in
				/// \a __V and returns the 32-bit values in the corresponding elements of a
				/// 256-bit vector of [8 x i32].
				///
				/// \code{.operation}
				/// FOR i := 0 TO 7
				/// j := i*16
				/// k := i*32
				/// result[k+31:k] := SignExtend(__V[j+15:j])
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVSXWD instruction.
				///
				/// \param __V
				/// A 128-bit vector of [8 x i16] containing the source values.
				/// \returns A 256-bit vector of [8 x i32] containing the sign-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepi16_epi32(__m128i __V)			_mm256_cvtepi16_epi32(__m128i __V)
	{			{
	return (__m256i)__builtin_convertvector((__v8hi)__V, __v8si);			return (__m256i)__builtin_convertvector((__v8hi)__V, __v8si);
	}			}

				/// Sign-extends 16-bit elements from the lower half of the 128-bit vector of
				/// [8 x i16] in \a __V and returns the 64-bit values in the corresponding
				/// elements of a 256-bit vector of [4 x i64].
				///
				/// \code{.operation}
				/// result[63:0] := SignExtend(__V[15:0])
				/// result[127:64] := SignExtend(__V[31:16])
				/// result[191:128] := SignExtend(__V[47:32])
				/// result[255:192] := SignExtend(__V[63:48])
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVSXWQ instruction.
				///
				/// \param __V
				/// A 128-bit vector of [8 x i16] containing the source values.
				/// \returns A 256-bit vector of [4 x i64] containing the sign-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepi16_epi64(__m128i __V)			_mm256_cvtepi16_epi64(__m128i __V)
	{			{
	return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v8hi)__V, (__v8hi)__V, 0, 1, 2, 3), __v4di);			return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v8hi)__V, (__v8hi)__V, 0, 1, 2, 3), __v4di);
	}			}
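
Since only the lower half of the source is consumed here, a scalar sketch may help confirm the bit ranges: source element `i` occupies bits `[i*16+15 : i*16]`, so the fourth and last element used is `[63:48]`. The names `cvtepi16_epi64_model` and `source_high_bit` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Highest bit index of source element i for a given element width. */
static size_t source_high_bit(size_t i, size_t width)
{
    return i * width + width - 1;
}

/* Scalar model: sign-extend the low four 16-bit elements to 64 bits.
   Elements v[4]..v[7] (bits 127:64) are ignored. Hypothetical helper. */
static void cvtepi16_epi64_model(const int16_t v[8], int64_t result[4])
{
    assert(v != NULL && result != NULL);
    for (size_t i = 0; i < 4; ++i) {
        /* source bits [i*16+15 : i*16] -> result bits [i*64+63 : i*64] */
        result[i] = (int64_t)v[i];
    }
}
```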

				/// Sign-extends 32-bit elements from the 128-bit vector of [4 x i32] in
				/// \a __V and returns the 64-bit values in the corresponding elements of a
				/// 256-bit vector of [4 x i64].
				///
				/// \code{.operation}
				/// result[63:0] := SignExtend(__V[31:0])
				/// result[127:64] := SignExtend(__V[63:32])
				/// result[191:128] := SignExtend(__V[95:64])
				/// result[255:192] := SignExtend(__V[127:96])
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVSXDQ instruction.
				///
				/// \param __V
				/// A 128-bit vector of [4 x i32] containing the source values.
				/// \returns A 256-bit vector of [4 x i64] containing the sign-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepi32_epi64(__m128i __V)			_mm256_cvtepi32_epi64(__m128i __V)
	{			{
	return (__m256i)__builtin_convertvector((__v4si)__V, __v4di);			return (__m256i)__builtin_convertvector((__v4si)__V, __v4di);
	}			}

				/// Zero-extends bytes from the 128-bit integer vector in \a __V and returns
				/// the 16-bit values in the corresponding elements of a 256-bit vector
				/// of [16 x i16].
				///
				/// \code{.operation}
				/// FOR i := 0 TO 15
				/// j := i*8
				/// k := i*16
				/// result[k+15:k] := ZeroExtend(__V[j+7:j])
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVZXBW instruction.
				///
				/// \param __V
				/// A 128-bit integer vector containing the source bytes.
				/// \returns A 256-bit vector of [16 x i16] containing the zero-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepu8_epi16(__m128i __V)			_mm256_cvtepu8_epi16(__m128i __V)
	{			{
	return (__m256i)__builtin_convertvector((__v16qu)__V, __v16hi);			return (__m256i)__builtin_convertvector((__v16qu)__V, __v16hi);
	}			}

				/// Zero-extends bytes from the lower half of the 128-bit integer vector in
				/// \a __V and returns the 32-bit values in the corresponding elements of a
				/// 256-bit vector of [8 x i32].
				///
				/// \code{.operation}
				/// FOR i := 0 TO 7
				/// j := i*8
				/// k := i*32
				/// result[k+31:k] := ZeroExtend(__V[j+7:j])
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVZXBD instruction.
				///
				/// \param __V
				/// A 128-bit integer vector containing the source bytes.
				/// \returns A 256-bit vector of [8 x i32] containing the zero-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepu8_epi32(__m128i __V)			_mm256_cvtepu8_epi32(__m128i __V)
	{			{
	return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v16qu)__V, (__v16qu)__V, 0, 1, 2, 3, 4, 5, 6, 7), __v8si);			return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v16qu)__V, (__v16qu)__V, 0, 1, 2, 3, 4, 5, 6, 7), __v8si);
	}			}

				/// Zero-extends the first four bytes from the 128-bit integer vector in
				/// \a __V and returns the 64-bit values in the corresponding elements of a
				/// 256-bit vector of [4 x i64].
				///
				/// \code{.operation}
				/// result[63:0] := ZeroExtend(__V[7:0])
				/// result[127:64] := ZeroExtend(__V[15:8])
				/// result[191:128] := ZeroExtend(__V[23:16])
				/// result[255:192] := ZeroExtend(__V[31:24])
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVZXBQ instruction.
				///
				/// \param __V
				/// A 128-bit integer vector containing the source bytes.
				/// \returns A 256-bit vector of [4 x i64] containing the zero-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepu8_epi64(__m128i __V)			_mm256_cvtepu8_epi64(__m128i __V)
	{			{
	return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v16qu)__V, (__v16qu)__V, 0, 1, 2, 3), __v4di);			return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v16qu)__V, (__v16qu)__V, 0, 1, 2, 3), __v4di);
	}			}
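
For contrast with the sign-extending variants, a scalar sketch of this zero-extension shows that bytes with the top bit set stay positive and that only the first four source bytes are read. The 16-byte array standing in for the 128-bit vector and the name `cvtepu8_epi64_model` are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model: zero-extend source bytes 0..3 to 64 bits each.
   Bytes 4..15 of the source are ignored. Hypothetical helper. */
static void cvtepu8_epi64_model(const uint8_t v[16], uint64_t result[4])
{
    assert(v != NULL && result != NULL);
    for (size_t i = 0; i < 4; ++i) {
        /* source bits [i*8+7 : i*8] -> result bits [i*64+63 : i*64];
           widening an unsigned byte fills the upper bits with zeros */
        result[i] = (uint64_t)v[i];
    }
}
```

A source byte of `0xFF` therefore becomes 255 here, whereas the sign-extending counterpart would produce -1.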

				/// Zero-extends 16-bit elements from the 128-bit vector of [8 x i16] in
				/// \a __V and returns the 32-bit values in the corresponding elements of a
				/// 256-bit vector of [8 x i32].
				///
				/// \code{.operation}
				/// FOR i := 0 TO 7
				/// j := i*16
				/// k := i*32
				/// result[k+31:k] := ZeroExtend(__V[j+15:j])
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVZXWD instruction.
				///
				/// \param __V
				/// A 128-bit vector of [8 x i16] containing the source values.
				/// \returns A 256-bit vector of [8 x i32] containing the zero-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepu16_epi32(__m128i __V)			_mm256_cvtepu16_epi32(__m128i __V)
	{			{
	return (__m256i)__builtin_convertvector((__v8hu)__V, __v8si);			return (__m256i)__builtin_convertvector((__v8hu)__V, __v8si);
	}			}

				/// Zero-extends 16-bit elements from the lower half of the 128-bit vector of
				/// [8 x i16] in \a __V and returns the 64-bit values in the corresponding
				/// elements of a 256-bit vector of [4 x i64].
				///
				/// \code{.operation}
				/// result[63:0] := ZeroExtend(__V[15:0])
				/// result[127:64] := ZeroExtend(__V[31:16])
				/// result[191:128] := ZeroExtend(__V[47:32])
				/// result[255:192] := ZeroExtend(__V[63:48])
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVZXWQ instruction.
				///
				/// \param __V
				/// A 128-bit vector of [8 x i16] containing the source values.
				/// \returns A 256-bit vector of [4 x i64] containing the zero-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepu16_epi64(__m128i __V)			_mm256_cvtepu16_epi64(__m128i __V)
	{			{
	return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v8hu)__V, (__v8hu)__V, 0, 1, 2, 3), __v4di);			return (__m256i)__builtin_convertvector(__builtin_shufflevector((__v8hu)__V, (__v8hu)__V, 0, 1, 2, 3), __v4di);
	}			}

				/// Zero-extends 32-bit elements from the 128-bit vector of [4 x i32] in
				/// \a __V and returns the 64-bit values in the corresponding elements of a
				/// 256-bit vector of [4 x i64].
				///
				/// \code{.operation}
				/// result[63:0] := ZeroExtend(__V[31:0])
				/// result[127:64] := ZeroExtend(__V[63:32])
				/// result[191:128] := ZeroExtend(__V[95:64])
				/// result[255:192] := ZeroExtend(__V[127:96])
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMOVZXDQ instruction.
				///
				/// \param __V
				/// A 128-bit vector of [4 x i32] containing the source values.
				/// \returns A 256-bit vector of [4 x i64] containing the zero-extended
				/// values.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_cvtepu32_epi64(__m128i __V)			_mm256_cvtepu32_epi64(__m128i __V)
	{			{
	return (__m256i)__builtin_convertvector((__v4su)__V, __v4di);			return (__m256i)__builtin_convertvector((__v4su)__V, __v4di);
	}			}

	/// Multiplies signed 32-bit integers from even-numbered elements of two			/// Multiplies signed 32-bit integers from even-numbered elements of two
	/// 256-bit vectors of [8 x i32] and returns the 64-bit products in the			/// 256-bit vectors of [8 x i32] and returns the 64-bit products in the
	[1,315 lines not shown]
	/// A 256-bit integer vector.			/// A 256-bit integer vector.
	/// \returns A 256-bit integer vector containing the result.			/// \returns A 256-bit integer vector containing the result.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_xor_si256(__m256i __a, __m256i __b)			_mm256_xor_si256(__m256i __a, __m256i __b)
	{			{
	return (__m256i)((__v4du)__a ^ (__v4du)__b);			return (__m256i)((__v4du)__a ^ (__v4du)__b);
	}			}

				/// Loads the 256-bit integer vector from memory \a __V using a non-temporal
				/// memory hint and returns the vector. \a __V must be aligned on a 32-byte
				/// boundary.
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VMOVNTDQA instruction.
				///
				/// \param __V
				/// A pointer to the 32-byte aligned memory containing the vector to load.
				/// \returns A 256-bit integer vector loaded from memory.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256			static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_stream_load_si256(__m256i const *__V)			_mm256_stream_load_si256(__m256i const *__V)
	{			{
	typedef __v4di __v4di_aligned __attribute__((aligned(32)));			typedef __v4di __v4di_aligned __attribute__((aligned(32)));
	return (__m256i)__builtin_nontemporal_load((const __v4di_aligned *)__V);			return (__m256i)__builtin_nontemporal_load((const __v4di_aligned *)__V);
	}			}

	/// Broadcasts the 32-bit floating-point value from the low element of the			/// Broadcasts the 32-bit floating-point value from the low element of the
	[485 lines not shown]
	/// A 128-bit integer vector containing a source value.			/// A 128-bit integer vector containing a source value.
	/// \param M			/// \param M
	/// An immediate value specifying where to put \a V2 in the result.			/// An immediate value specifying where to put \a V2 in the result.
	/// \returns A 256-bit integer vector containing the result.			/// \returns A 256-bit integer vector containing the result.
	#define _mm256_inserti128_si256(V1, V2, M) \			#define _mm256_inserti128_si256(V1, V2, M) \
	((__m256i)__builtin_ia32_insert128i256((__v4di)(__m256i)(V1), \			((__m256i)__builtin_ia32_insert128i256((__v4di)(__m256i)(V1), \
	(__v2di)(__m128i)(V2), (int)(M)))			(__v2di)(__m128i)(V2), (int)(M)))

				/// Conditionally loads eight 32-bit integer elements from memory \a __X, if
				/// the most significant bit of the corresponding element in the mask
				/// \a __M is set; otherwise, sets that element of the result to zero.
				/// Returns the 256-bit [8 x i32] result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 7
				/// j := i*32
				/// IF __M[j+31] == 1
				/// result[j+31:j] := Load32(__X+(i*4))
				/// ELSE
				/// result[j+31:j] := 0
				/// FI
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMASKMOVD instruction.
				///
				/// \param __X
				/// A pointer to the memory used for loading values.
				/// \param __M
				/// A 256-bit vector of [8 x i32] containing the mask bits.
				/// \returns A 256-bit vector of [8 x i32] containing the loaded or zeroed
				/// elements.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_maskload_epi32(int const *__X, __m256i __M)
	{
	  return (__m256i)__builtin_ia32_maskloadd256((const __v8si *)__X, (__v8si)__M);
	}
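
The byte-address versus bit-offset question raised in the inline comments may be easier to follow with a scalar model of the pseudo-code for `_mm256_maskload_epi32`. The following is only a sketch under stated assumptions: `ref_maskload_epi32` is a hypothetical helper, not part of avx2intrin.h, and it takes the mask and the result as plain 8-element arrays instead of `__m256i` values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference model; NOT part of avx2intrin.h.
 * mask[i] and result[i] stand for bits [i*32+31 : i*32] of __M and the result. */
static void ref_maskload_epi32(const void *x, const int32_t mask[8],
                               int32_t result[8])
{
    const unsigned char *base = (const unsigned char *)x; /* __X as a byte address */
    assert(sizeof(int32_t) == 4); /* the byte offset i*4 presumes 8-bit bytes */
    for (int i = 0; i < 8; ++i) {
        if ((uint32_t)mask[i] >> 31) {
            /* __M[j+31] == 1: result[j+31:j] := Load32(__X+(i*4)) */
            memcpy(&result[i], base + (size_t)i * 4, sizeof(int32_t));
        } else {
            /* mask bit clear: the element is zeroed and memory is not read */
            result[i] = 0;
        }
    }
}
```

In this model `base + i*4` is the byte address that `Load32(__X+(i*4))` denotes; with C pointer arithmetic on an `int const *` the same element would be written `__X + i`.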

				/// Conditionally loads four 64-bit integer elements from memory \a __X, if
				/// the most significant bit of the corresponding element in the mask
				/// \a __M is set; otherwise, sets that element of the result to zero.
				/// Returns the 256-bit [4 x i64] result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 3
				/// j := i*64
				/// IF __M[j+63] == 1
				/// result[j+63:j] := Load64(__X+(i*8))
				pengfei: ditto.
				/// ELSE
				/// result[j+63:j] := 0
				/// FI
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMASKMOVQ instruction.
				///
				/// \param __X
				/// A pointer to the memory used for loading values.
				/// \param __M
				/// A 256-bit vector of [4 x i64] containing the mask bits.
				/// \returns A 256-bit vector of [4 x i64] containing the loaded or zeroed
				/// elements.
	static __inline__ __m256i __DEFAULT_FN_ATTRS256
	_mm256_maskload_epi64(long long const *__X, __m256i __M)
	{
	  return (__m256i)__builtin_ia32_maskloadq256((const __v4di *)__X, (__v4di)__M);
	}

				/// Conditionally loads four 32-bit integer elements from memory \a __X, if
				/// the most significant bit of the corresponding element in the mask
				/// \a __M is set; otherwise, sets that element of the result to zero.
				/// Returns the 128-bit [4 x i32] result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 3
				/// j := i*32
				/// IF __M[j+31] == 1
				/// result[j+31:j] := Load32(__X+(i*4))
				pengfei: ditto.
				/// ELSE
				/// result[j+31:j] := 0
				/// FI
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMASKMOVD instruction.
				///
				/// \param __X
				/// A pointer to the memory used for loading values.
				/// \param __M
				/// A 128-bit vector of [4 x i32] containing the mask bits.
				/// \returns A 128-bit vector of [4 x i32] containing the loaded or zeroed
				/// elements.
	static __inline__ __m128i __DEFAULT_FN_ATTRS128
	_mm_maskload_epi32(int const *__X, __m128i __M)
	{
	  return (__m128i)__builtin_ia32_maskloadd((const __v4si *)__X, (__v4si)__M);
	}

				/// Conditionally loads two 64-bit integer elements from memory \a __X, if
				/// the most significant bit of the corresponding element in the mask
				/// \a __M is set; otherwise, sets that element of the result to zero.
				/// Returns the 128-bit [2 x i64] result.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 1
				/// j := i*64
				/// IF __M[j+63] == 1
				/// result[j+63:j] := Load64(__X+(i*8))
				pengfei: ditto.
				/// ELSE
				/// result[j+63:j] := 0
				/// FI
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMASKMOVQ instruction.
				///
				/// \param __X
				/// A pointer to the memory used for loading values.
				/// \param __M
				/// A 128-bit vector of [2 x i64] containing the mask bits.
				/// \returns A 128-bit vector of [2 x i64] containing the loaded or zeroed
				/// elements.
	static __inline__ __m128i __DEFAULT_FN_ATTRS128
	_mm_maskload_epi64(long long const *__X, __m128i __M)
	{
	  return (__m128i)__builtin_ia32_maskloadq((const __v2di *)__X, (__v2di)__M);
	}

				/// Conditionally stores eight 32-bit integer elements from the 256-bit vector
				/// of [8 x i32] in \a __Y to memory \a __X, if the most significant bit of
				/// the corresponding element in the mask \a __M is set; otherwise, the
				/// memory element is unchanged.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 7
				/// j := i*32
				/// IF __M[j+31] == 1
				/// Store32(__X+(i*4), __Y[j+31:j])
				pengfei: MEM[j+31:j] := __Y[j+31:j]
				/// FI
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMASKMOVD instruction.
				///
				/// \param __X
				/// A pointer to the memory used for storing values.
				/// \param __M
				/// A 256-bit vector of [8 x i32] containing the mask bits.
				/// \param __Y
				/// A 256-bit vector of [8 x i32] containing the values to store.
	static __inline__ void __DEFAULT_FN_ATTRS256
	_mm256_maskstore_epi32(int *__X, __m256i __M, __m256i __Y)
	{
	  __builtin_ia32_maskstored256((__v8si *)__X, (__v8si)__M, (__v8si)__Y);
	}
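
The store pseudo-code for `_mm256_maskstore_epi32` can be modelled the same way. Again this is a sketch, not the header's implementation: `ref_maskstore_epi32` is a hypothetical helper, and the mask and source vector are passed as plain 8-element arrays instead of `__m256i` values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference model; NOT part of avx2intrin.h.
 * mask[i] and y[i] stand for bits [i*32+31 : i*32] of __M and __Y. */
static void ref_maskstore_epi32(void *x, const int32_t mask[8],
                                const int32_t y[8])
{
    unsigned char *base = (unsigned char *)x; /* __X as a byte address */
    assert(sizeof(int32_t) == 4); /* the byte offset i*4 presumes 8-bit bytes */
    for (int i = 0; i < 8; ++i) {
        if ((uint32_t)mask[i] >> 31) {
            /* __M[j+31] == 1: Store32(__X+(i*4), __Y[j+31:j]) */
            memcpy(base + (size_t)i * 4, &y[i], sizeof(int32_t));
        }
        /* mask bit clear: the memory element is left unchanged */
    }
}
```

Only the most significant bit of each mask element matters in this model, so a mask element of 1 leaves memory untouched while -1 or `INT32_MIN` triggers the store.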

				/// Conditionally stores four 64-bit integer elements from the 256-bit vector
				/// of [4 x i64] in \a __Y to memory \a __X, if the most significant bit of
				/// the corresponding element in the mask \a __M is set; otherwise, the
				/// memory element is unchanged.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 3
				/// j := i*64
				/// IF __M[j+63] == 1
				/// Store64(__X+(i*8), __Y[j+63:j])
				pengfei: ditto.
				/// FI
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMASKMOVQ instruction.
				///
				/// \param __X
				/// A pointer to the memory used for storing values.
				/// \param __M
				/// A 256-bit vector of [4 x i64] containing the mask bits.
				/// \param __Y
				/// A 256-bit vector of [4 x i64] containing the values to store.
	static __inline__ void __DEFAULT_FN_ATTRS256
	_mm256_maskstore_epi64(long long *__X, __m256i __M, __m256i __Y)
	{
	  __builtin_ia32_maskstoreq256((__v4di *)__X, (__v4di)__M, (__v4di)__Y);
	}

				/// Conditionally stores four 32-bit integer elements from the 128-bit vector
				/// of [4 x i32] in \a __Y to memory \a __X, if the most significant bit of
				/// the corresponding element in the mask \a __M is set; otherwise, the
				/// memory element is unchanged.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 3
				/// j := i*32
				/// IF __M[j+31] == 1
				/// Store32(__X+(i*4), __Y[j+31:j])
				pengfei: ditto.
				/// FI
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMASKMOVD instruction.
				///
				/// \param __X
				/// A pointer to the memory used for storing values.
				/// \param __M
				/// A 128-bit vector of [4 x i32] containing the mask bits.
				/// \param __Y
				/// A 128-bit vector of [4 x i32] containing the values to store.
	static __inline__ void __DEFAULT_FN_ATTRS128
	_mm_maskstore_epi32(int *__X, __m128i __M, __m128i __Y)
	{
	  __builtin_ia32_maskstored((__v4si *)__X, (__v4si)__M, (__v4si)__Y);
	}

				/// Conditionally stores two 64-bit integer elements from the 128-bit vector
				/// of [2 x i64] in \a __Y to memory \a __X, if the most significant bit of
				/// the corresponding element in the mask \a __M is set; otherwise, the
				/// memory element is unchanged.
				///
				/// \code{.operation}
				/// FOR i := 0 TO 1
				/// j := i*64
				/// IF __M[j+63] == 1
				/// Store64(__X+(i*8), __Y[j+63:j])
				pengfei: ditto.
				/// FI
				/// ENDFOR
				/// \endcode
				///
				/// \headerfile <immintrin.h>
				///
				/// This intrinsic corresponds to the \c VPMASKMOVQ instruction.
				///
				/// \param __X
				/// A pointer to the memory used for storing values.
				/// \param __M
				/// A 128-bit vector of [2 x i64] containing the mask bits.
				/// \param __Y
				/// A 128-bit vector of [2 x i64] containing the values to store.
	static __inline__ void __DEFAULT_FN_ATTRS128
	_mm_maskstore_epi64(long long *__X, __m128i __M, __m128i __Y)
	{
	  __builtin_ia32_maskstoreq(( __v2di *)__X, (__v2di)__M, (__v2di)__Y);
	}

	/// Shifts each 32-bit element of the 256-bit vector of [8 x i32] in \a __X
	/// left by the number of bits given in the corresponding element of the

[Headers][doc] Add load/store/cmp/cvt intrinsic descriptions to avx2intrin.h (Closed, Public)

Diff 536266

clang/lib/Headers/avx2intrin.h