Page MenuHomePhabricator

[X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16
ClosedPublic

Authored by RKSimon on May 20 2016, 6:17 AM.

Details

Summary

Ensure _mm256_extract_epi8 and _mm256_extract_epi16 zero extend their i8/i16 result to i32. This matches _mm_extract_epi8 and _mm_extract_epi16.

Fix for PR27594

Katya - I've updated the doxygen comments for _mm256_extract_epi8 and _mm256_extract_epi16, I guess this will need to be updated in Sony's intrinsics document for the next regeneration?

Diff Detail

Event Timeline

RKSimon updated this revision to Diff 57927.May 20 2016, 6:17 AM
RKSimon retitled this revision from to [X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16.
RKSimon updated this object.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: cfe-commits.
mkuper edited edge metadata.May 20 2016, 8:00 AM

Could you point me to where in the documentation it says they must be zero-extended?
The Intel intrinsics guide actually has them with shorter return types:

__int8 _mm256_extract_epi8 (__m256i a, const int index)
__int16 _mm256_extract_epi16 (__m256i a, const int index)

Could you point me to where in the documentation it says they must be zero-extended?
The Intel intrinsics guide actually has them with shorter return types:

__int8 _mm256_extract_epi8 (__m256i a, const int index)
__int16 _mm256_extract_epi16 (__m256i a, const int index)

And the gcc version has them wrapped to the _mm_extract_epi* intrinsics which map to the real 128-bit instructions which do zero-extend.

I'm open to changing the return types in the headers instead, but really I'd expect the mm256 versions to zero extend like the older mm versions.

You're right, the underlying instructions zext, and it seems like it's the right thing to do. I'm just wondering if this is something user code is supposed to rely on, given the way the intrinsics guide documents them right now.
H.J, could you please take a look?

Hi Michael,

I think the Intel Intrinsics reference and the Intel Compiler are in error and that this is the right fix for the LLVM headers. (I'll follow up to get the Intel Intrinsics reference & Intel Compiler fixed.)

The _mm256_extract_epiN "convenience" intrinsics were first introduced by gcc and used an "int" return type. They were added to the Intel Compiler about 2 years ago, but for some reason were defined to use the smaller signed types. I'm double checking with the developer that added them, but I think it was just a mistake.

-Dave

mkuper accepted this revision.May 20 2016, 2:10 PM
mkuper edited edge metadata.

Thanks, Dave!

In that case, LGTM.

This revision is now accepted and ready to land.May 20 2016, 2:10 PM
This revision was automatically updated to reflect the committed changes.