This is an archive of the discontinued LLVM Phabricator instance.

Differential D20468

[X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16
ClosedPublic

Authored by RKSimon on May 20 2016, 6:17 AM.


Details

Reviewers

spatel
mkuper
kromanova
craig.topper
hjl.tools

Commits

rG28666ce77845: [X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16
rC270330: [X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16
rL270330: [X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16

Summary

Ensure _mm256_extract_epi8 and _mm256_extract_epi16 zero extend their i8/i16 result to i32. This matches _mm_extract_epi8 and _mm_extract_epi16.

Fix for PR27594

Katya, I've updated the doxygen comments for _mm256_extract_epi8 and _mm256_extract_epi16. I guess this will also need to be updated in Sony's intrinsics document at the next regeneration?
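
As a rough illustration of the behaviour described in the summary, the sketch below models a single vector lane as a plain `signed char` or `short`. It compares the implicit conversion to `int` (sign extension) with a cast through the unsigned type, as the patch does (zero extension). The helper names are hypothetical and the code uses ordinary C integers, not the real intrinsics.

```c
#include <assert.h>

/* Hypothetical helpers modelling a single lane; not part of avxintrin.h. */

/* Implicit conversion of a signed 8-bit lane to int sign-extends. */
int lane8_sext(signed char lane) { return lane; }

/* Casting through unsigned char first zero-extends, as in the patch. */
int lane8_zext(signed char lane) { return (unsigned char)lane; }

/* Same pair for a 16-bit lane. */
int lane16_sext(short lane) { return lane; }
int lane16_zext(short lane) { return (unsigned short)lane; }

/* Small self-check: lanes with the top bit set differ between the two. */
void lane_ext_demo(void) {
  assert(lane8_sext((signed char)-1) == -1);
  assert(lane8_zext((signed char)-1) == 255);
  assert(lane16_sext((short)-32768) == -32768);
  assert(lane16_zext((short)-32768) == 32768);
  /* Non-negative lanes are unaffected. */
  assert(lane8_sext(127) == lane8_zext(127));
}
```

Only lanes with the most significant bit set are affected, which is why the difference is easy to miss in tests that use small positive values.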

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 57927. May 20 2016, 6:17 AM

RKSimon retitled this revision to [X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16.

RKSimon updated this object.

RKSimon added reviewers: mkuper, craig.topper, kromanova, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: cfe-commits.

Could you point me to where in the documentation it says they must be zero-extended?
The Intel intrinsics guide actually has them with shorter return types:

__int8 _mm256_extract_epi8 (__m256i a, const int index)
__int16 _mm256_extract_epi16 (__m256i a, const int index)

mkuper added a reviewer: hjl.tools. May 20 2016, 8:00 AM

In D20468#435522, @mkuper wrote:
Could you point me to where in the documentation it says they must be zero-extended?
The Intel intrinsics guide actually has them with shorter return types:
__int8 _mm256_extract_epi8 (__m256i a, const int index)
__int16 _mm256_extract_epi16 (__m256i a, const int index)

And the gcc version wraps them around the _mm_extract_epi* intrinsics, which map to the real 128-bit instructions, and those do zero-extend.

I'm open to changing the return types in the headers instead, but really I'd expect the mm256 versions to zero extend like the older mm versions.
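
The gcc-style wrapping mentioned above can be modelled in portable C: select the 128-bit half that holds the element, then extract from that half with zero extension. This is only a sketch under that assumption; `model_extract_epi16_128` and `model_extract_epi16_256` are hypothetical stand-ins that operate on plain arrays, not the code from any real header.

```c
#include <stdint.h>

/* Stand-in for a 128-bit extract of a 16-bit element: zero-extends to int. */
int model_extract_epi16_128(const int16_t half[8], int index) {
  return (uint16_t)half[index & 7];
}

/* Stand-in for the 256-bit wrapper: select the half, then reuse the
   128-bit helper on the element index within that half. */
int model_extract_epi16_256(const int16_t v[16], int index) {
  const int16_t *half = v + 8 * ((index >> 3) & 1);
  return model_extract_epi16_128(half, index & 7);
}
```

Because the 256-bit helper delegates to the 128-bit one, it inherits the zero extension automatically, which is the consistency argument made in the comment.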

You're right, the underlying instructions zext, and it seems like it's the right thing to do. I'm just wondering if this is something user code is supposed to rely on, given the way the intrinsics guide documents them right now.
H.J, could you please take a look?
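
One way for user code to avoid depending on the extension behaviour is to mask the result explicitly. The sketch below uses a hypothetical `extract_byte_fn` callback in place of the intrinsic so that it stays portable. With a sign-extending extractor, an unmasked result for a byte such as 0x80 would be negative and unusable as a table index.

```c
/* Hypothetical extractor type standing in for a byte-extract intrinsic
   whose extension behaviour is unknown to the caller. */
typedef int (*extract_byte_fn)(const signed char *lanes, int index);

/* Example extractor that sign-extends (the pre-patch behaviour). */
int extract_byte_sext(const signed char *lanes, int index) {
  return lanes[index];
}

/* Example extractor that zero-extends (the post-patch behaviour). */
int extract_byte_zext(const signed char *lanes, int index) {
  return (unsigned char)lanes[index];
}

/* Count byte values into a 256-entry histogram. Masking with 0xFF gives
   an index in [0, 255] whichever extractor is used. */
void byte_histogram(extract_byte_fn extract, const signed char *lanes,
                    int count, unsigned hist[256]) {
  for (int i = 0; i < count; ++i)
    hist[extract(lanes, i) & 0xFF]++;
}
```

With the mask in place both extractors produce the same histogram, so code written this way behaves identically before and after the patch.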

Hi Michael,

I think the Intel Intrinsics reference and the Intel Compiler are in error and that this is the right fix for the LLVM headers. (I'll follow up to get the Intel Intrinsics reference & Intel Compiler fixed.)

The _mm256_extract_epiN "convenience" intrinsics were first introduced by gcc and used an "int" return type. They were added to the Intel Compiler about 2 years ago, but for some reason were defined to use the smaller signed types. I'm double checking with the developer that added them, but I think it was just a mistake.

-Dave

Thanks, Dave!

In that case, LGTM.

This revision is now accepted and ready to land. May 20 2016, 2:10 PM

Closed by commit rL270330: [X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16 (authored by RKSimon). May 21 2016, 2:20 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
cfe/trunk/lib/Headers/avxintrin.h	10 lines
cfe/trunk/test/CodeGen/avx-builtins.c	6 lines

Diff 58045

cfe/trunk/lib/Headers/avxintrin.h

	/// This intrinsic corresponds to the \c VEXTRACTF128+COMPOSITE /			/// This intrinsic corresponds to the \c VEXTRACTF128+COMPOSITE /
	/// EXTRACTF128+COMPOSITE instruction.			/// EXTRACTF128+COMPOSITE instruction.
	///			///
	/// \param __a			/// \param __a
	/// A 256-bit integer vector of [16 x i16].			/// A 256-bit integer vector of [16 x i16].
	/// \param __imm			/// \param __imm
	/// An immediate integer operand with bits [3:0] determining which vector			/// An immediate integer operand with bits [3:0] determining which vector
	/// element is extracted and returned.			/// element is extracted and returned.
	/// \returns A 32-bit integer containing the extracted 16 bits of extended			/// \returns A 32-bit integer containing the extracted 16 bits of zero extended
	/// packed data.			/// packed data.
	static __inline int __DEFAULT_FN_ATTRS			static __inline int __DEFAULT_FN_ATTRS
	_mm256_extract_epi16(__m256i __a, const int __imm)			_mm256_extract_epi16(__m256i __a, const int __imm)
	{			{
	__v16hi __b = (__v16hi)__a;			__v16hi __b = (__v16hi)__a;
	return __b[__imm & 15];			return (unsigned short)__b[__imm & 15];
	}			}

	/// \brief Takes a [32 x i8] vector and returns the vector element value			/// \brief Takes a [32 x i8] vector and returns the vector element value
	/// indexed by the immediate constant operand.			/// indexed by the immediate constant operand.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// This intrinsic corresponds to the \c VEXTRACTF128+COMPOSITE /			/// This intrinsic corresponds to the \c VEXTRACTF128+COMPOSITE /
	/// EXTRACTF128+COMPOSITE instruction.			/// EXTRACTF128+COMPOSITE instruction.
	///			///
	/// \param __a			/// \param __a
	/// A 256-bit integer vector of [32 x i8].			/// A 256-bit integer vector of [32 x i8].
	/// \param __imm			/// \param __imm
	/// An immediate integer operand with bits [4:0] determining which vector			/// An immediate integer operand with bits [4:0] determining which vector
	/// element is extracted and returned.			/// element is extracted and returned.
	/// \returns A 32-bit integer containing the extracted 8 bits of extended packed			/// \returns A 32-bit integer containing the extracted 8 bits of zero extended
	/// data.			/// packed data.
	static __inline int __DEFAULT_FN_ATTRS			static __inline int __DEFAULT_FN_ATTRS
	_mm256_extract_epi8(__m256i __a, const int __imm)			_mm256_extract_epi8(__m256i __a, const int __imm)
	{			{
	__v32qi __b = (__v32qi)__a;			__v32qi __b = (__v32qi)__a;
	return __b[__imm & 31];			return (unsigned char)__b[__imm & 31];
	}			}

	#ifdef __x86_64__			#ifdef __x86_64__
	/// \brief Takes a [4 x i64] vector and returns the vector element value			/// \brief Takes a [4 x i64] vector and returns the vector element value
	/// indexed by the immediate constant operand.			/// indexed by the immediate constant operand.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///

cfe/trunk/test/CodeGen/avx-builtins.c

	}			}

	__m256 test_mm256_dp_ps(__m256 A, __m256 B) {			__m256 test_mm256_dp_ps(__m256 A, __m256 B) {
	// CHECK-LABEL: test_mm256_dp_ps			// CHECK-LABEL: test_mm256_dp_ps
	// CHECK: call <8 x float> @llvm.x86.avx.dp.ps.256(<8 x float> {{.*}}, <8 x float> {{.*}}, i8 7)			// CHECK: call <8 x float> @llvm.x86.avx.dp.ps.256(<8 x float> {{.*}}, <8 x float> {{.*}}, i8 7)
	return _mm256_dp_ps(A, B, 7);			return _mm256_dp_ps(A, B, 7);
	}			}

	// FIXME: ZEXT instead of SEXT
	int test_mm256_extract_epi8(__m256i A) {			int test_mm256_extract_epi8(__m256i A) {
	// CHECK-LABEL: test_mm256_extract_epi8			// CHECK-LABEL: test_mm256_extract_epi8
	// CHECK: and i32 %{{.*}}, 31			// CHECK: and i32 %{{.*}}, 31
	// CHECK: extractelement <32 x i8> %{{.*}}, i32 %{{.*}}			// CHECK: extractelement <32 x i8> %{{.*}}, i32 %{{.*}}
	// CHECK: ext i8 %{{.*}} to i32			// CHECK: zext i8 %{{.*}} to i32
	return _mm256_extract_epi8(A, 32);			return _mm256_extract_epi8(A, 32);
	}			}

	// FIXME: ZEXT instead of SEXT
	int test_mm256_extract_epi16(__m256i A) {			int test_mm256_extract_epi16(__m256i A) {
	// CHECK-LABEL: test_mm256_extract_epi16			// CHECK-LABEL: test_mm256_extract_epi16
	// CHECK: and i32 %{{.*}}, 15			// CHECK: and i32 %{{.*}}, 15
	// CHECK: extractelement <16 x i16> %{{.*}}, i32 %{{.*}}			// CHECK: extractelement <16 x i16> %{{.*}}, i32 %{{.*}}
	// CHECK: ext i16 %{{.*}} to i32			// CHECK: zext i16 %{{.*}} to i32
	return _mm256_extract_epi16(A, 16);			return _mm256_extract_epi16(A, 16);
	}			}

	int test_mm256_extract_epi32(__m256i A) {			int test_mm256_extract_epi32(__m256i A) {
	// CHECK-LABEL: test_mm256_extract_epi32			// CHECK-LABEL: test_mm256_extract_epi32
	// CHECK: and i32 %{{.*}}, 7			// CHECK: and i32 %{{.*}}, 7
	// CHECK: extractelement <8 x i32> %{{.*}}, i32 %{{.*}}			// CHECK: extractelement <8 x i32> %{{.*}}, i32 %{{.*}}
	return _mm256_extract_epi32(A, 8);			return _mm256_extract_epi32(A, 8);
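
The extract tests in this file pass an index equal to the element count (32, 16 and 8). The header masks the immediate (`__imm & 31`, `__imm & 15`) and the CHECK lines expect the matching `and`, so such an index wraps around to element 0. A minimal model of that masking, using a hypothetical helper name and assuming a power-of-two lane count:

```c
/* Hypothetical model of the index masking done in the header;
   lane_count must be a power of two (32, 16 or 8 here). */
int wrapped_lane_index(int imm, int lane_count) {
  return imm & (lane_count - 1);
}
```

For example, `wrapped_lane_index(32, 32)` and `wrapped_lane_index(16, 16)` both select element 0, so the tests exercise the mask as well as the extension.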