As long as the destination type is a 256- or 128-bit vector, we can use __builtin_convertvector to directly generate a trunc IR instruction, which the backend handles natively.
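Here is a minimal sketch of the idea. It uses GCC/Clang vector extensions rather than Clang's actual header code, and the typedef and function names are hypothetical. Unsigned element types keep the narrowing conversion well defined in C. Converting eight 32-bit lanes (256 bits) to eight 16-bit lanes (128 bits) with __builtin_convertvector is an element-wise truncation, which Clang emits as a single trunc instruction in IR.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical vector-extension types: a 256-bit source (8 x u32)
   and a 128-bit destination (8 x u16). */
typedef uint32_t v8su __attribute__((vector_size(32)));
typedef uint16_t v8hu __attribute__((vector_size(16)));

/* Truncate eight 32-bit lanes to 16 bits each. __builtin_convertvector
   converts element-wise; with unsigned element types this keeps the low
   16 bits of every lane, the same result as an IR trunc. Arrays are used
   at the interface so no vector is passed or returned by value. */
static void trunc_8x32_to_8x16(const uint32_t in[8], uint16_t out[8]) {
    v8su src;
    memcpy(&src, in, sizeof src);
    v8hu dst = __builtin_convertvector(src, v8hu);
    memcpy(out, &dst, sizeof dst);
}
```

Because the destination here is exactly 128 bits wide, the truncated vector is already the final result and no extra padding or shuffling is needed.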
Details
Diff Detail
- Repository: rC Clang
- Build Status: Buildable 17981, Build 17981: arc lint + arc unit
Event Timeline
There are four other similar intrinsics which convert to 128/256-bit vectors:
__m128i _mm256_cvtepi32_epi8 (__m256i a)
__m128i _mm256_cvtepi64_epi16 (__m256i a)
__m128i _mm256_cvtepi64_epi8 (__m256i a)
__m128i _mm512_cvtepi64_epi8 (__m512i a)
Can you also include them?
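These four intrinsics differ from the ones in the patch in that the truncated result is narrower than 128 bits. For _mm256_cvtepi32_epi8, for example, eight 8-bit lanes fill only the low 64 bits of the __m128i result, and the upper 64 bits must be zero. The sketch below models those lane semantics in portable C. The names are hypothetical, and the zero-padding is done with memcpy purely for illustration, not as a suggestion of how the header should express it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical vector-extension types. */
typedef uint32_t v8su  __attribute__((vector_size(32))); /* 8 x u32, 256 bits */
typedef uint8_t  v8qu  __attribute__((vector_size(8)));  /* 8 x u8,   64 bits */
typedef uint8_t  v16qu __attribute__((vector_size(16))); /* 16 x u8, 128 bits */

/* Models the lane semantics of _mm256_cvtepi32_epi8: truncate each of the
   eight 32-bit lanes to 8 bits, place the results in the low 64 bits of a
   128-bit vector, and leave the upper 64 bits zero. */
static void cvtepi32_epi8_model(const uint32_t in[8], uint8_t out[16]) {
    v8su src;
    memcpy(&src, in, sizeof src);
    v8qu narrow = __builtin_convertvector(src, v8qu); /* trunc to a 64-bit vector */
    v16qu wide = {0};                                 /* all 16 bytes start at zero */
    memcpy(&wide, &narrow, sizeof narrow);            /* low 8 bytes = truncated lanes */
    memcpy(out, &wide, sizeof wide);
}
```

The extra widening step is what distinguishes these cases: the trunc alone yields a 64-bit vector, so something must still build the zero-padded 128-bit result.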
These should probably be possible, but for the _mm256_cvtepi32_epi8 case, for example, I can only get this far:
    vpmovdw %ymm0, %xmm0
    vpshufb .LCPI2_0(%rip), %xmm0, %xmm0 # xmm0 = xmm0[0,2,4,6,8,10,12,14],zero,zero,zero,zero,zero,zero,zero,zero
    vzeroupper
    retq
The expected result, however, is a single vpmovdb instruction, without the extra shuffle.
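The two-instruction sequence is correct, only suboptimal. vpmovdw keeps the low 16 bits of each 32-bit lane. The vpshufb mask shown in the asm comment then keeps the low byte of each word and zeroes the upper eight bytes, which is exactly what a single vpmovdb produces. The scalar models below are hypothetical helpers, not LLVM code, and they check that equivalence. The 0x80 mask bytes are an assumption: any mask byte with its top bit set makes vpshufb write zero, so the real .LCPI2_0 constant may differ in those positions.

```c
#include <stdint.h>

/* Scalar model of vpmovdw ymm -> xmm: truncate eight 32-bit lanes to
   16 bits and lay them out little-endian, filling all 16 bytes. */
static void model_vpmovdw(const uint32_t src[8], uint8_t xmm[16]) {
    for (int j = 0; j < 8; j++) {
        uint16_t w = (uint16_t)src[j];
        xmm[2 * j]     = (uint8_t)(w & 0xFF);
        xmm[2 * j + 1] = (uint8_t)(w >> 8);
    }
}

/* Scalar model of 128-bit vpshufb: a mask byte with its top bit set
   yields zero; otherwise its low four bits select a source byte. */
static void model_vpshufb(const uint8_t src[16], const uint8_t mask[16],
                          uint8_t dst[16]) {
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* Mask reconstructed from the asm comment: bytes 0,2,...,14, then zeros. */
static const uint8_t kEvenBytesMask[16] = {
    0, 2, 4, 6, 8, 10, 12, 14,
    0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
};

/* The vpmovdw + vpshufb sequence from the comment above. */
static void two_step(const uint32_t src[8], uint8_t out[16]) {
    uint8_t tmp[16];
    model_vpmovdw(src, tmp);
    model_vpshufb(tmp, kEvenBytesMask, out);
}

/* Scalar model of vpmovdb ymm -> xmm: low byte of each lane in the low
   8 bytes, upper 8 bytes zeroed. */
static void model_vpmovdb(const uint32_t src[8], uint8_t out[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = (i < 8) ? (uint8_t)src[i] : 0;
}
```

Both paths produce identical bytes for any input, which is why the shuffle is pure overhead that the backend ideally folds away.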