This is an archive of the discontinued LLVM Phabricator instance.

[X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins.
ClosedPublic

Authored by craig.topper on May 10 2018, 11:44 PM.

Details

Summary

As long as the destination type is a 256-bit or 128-bit vector, we can use __builtin_convertvector to directly generate a trunc IR instruction, which the backend handles natively.
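A minimal sketch of the idea, not the actual header change from this revision: with the GCC/Clang vector extensions, __builtin_convertvector converts lane by lane, so an 8 x i32 to 8 x i16 conversion fills exactly 128 bits. The typedef and function names below are hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical typedef names for this sketch; not taken from the Clang headers. */
typedef int32_t wide_v8i32 __attribute__((vector_size(32)));   /* 8 x i32, 256 bits */
typedef int16_t narrow_v8i16 __attribute__((vector_size(16))); /* 8 x i16, 128 bits */

/* Truncate each 32-bit lane to its low 16 bits.
   Arrays are used at the interface so the sketch does not depend on
   the vector calling convention of the target. */
static void trunc_8xi32_to_8xi16(const int32_t in[8], int16_t out[8]) {
    wide_v8i32 wide;
    narrow_v8i16 narrow;

    memcpy(&wide, in, sizeof wide);
    /* Element-wise conversion; the result is a full 128-bit vector. */
    narrow = __builtin_convertvector(wide, narrow_v8i16);
    memcpy(out, &narrow, sizeof narrow);
}
```

Because the eight 16-bit results fill the whole 128-bit destination, no padding lanes are needed, which is the condition the summary describes.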

Diff Detail

Repository
rC Clang

Event Timeline

craig.topper created this revision. May 10 2018, 11:44 PM

There are four other similar intrinsics which convert to 128/256-bit vectors:

__m128i _mm256_cvtepi32_epi8 (__m256i a)
__m128i _mm256_cvtepi64_epi16 (__m256i a)
__m128i _mm256_cvtepi64_epi8 (__m256i a)
__m128i _mm512_cvtepi64_epi8 (__m512i a)

Can you also include them?

Never mind - these four are not strictly truncating. Sorry for the confusion.

There are four other similar intrinsics which convert to 128/256-bit vectors:

__m128i _mm256_cvtepi32_epi8 (__m256i a)
__m128i _mm256_cvtepi64_epi16 (__m256i a)
__m128i _mm256_cvtepi64_epi8 (__m256i a)
__m128i _mm512_cvtepi64_epi8 (__m512i a)

Can you also include them?

Probably these should be possible, but e.g. with the _mm256_cvtepi32_epi8 case, I can only get this far:

vpmovdw %ymm0, %xmm0
vpshufb .LCPI2_0(%rip), %xmm0, %xmm0 # xmm0 = xmm0[0,2,4,6,8,10,12,14],zero,zero,zero,zero,zero,zero,zero,zero
vzeroupper
retq

The expected result is a single vpmovdb instruction, without the extra shuffle.

Yeah the others will need codegen work. So I'm starting with the easy cases.
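For illustration, a rough sketch of why a case like _mm256_cvtepi32_epi8 is not a plain truncation: eight 8-bit results occupy only 64 bits, so the upper half of the 128-bit result has to be zero-filled as a separate step. The typedef and function names are hypothetical, and this models the semantics only; it is not the header implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical typedef names for this sketch. */
typedef int32_t src_v8i32 __attribute__((vector_size(32)));  /* 8 x i32, 256 bits */
typedef int8_t  half_v8i8 __attribute__((vector_size(8)));   /* 8 x i8,   64 bits */
typedef int8_t  full_v16i8 __attribute__((vector_size(16))); /* 16 x i8, 128 bits */

/* Truncate 8 x i32 to 8 x i8, then widen to 128 bits with zeroed upper lanes. */
static void trunc_8xi32_to_16xi8_zeroed(const int32_t in[8], int8_t out[16]) {
    src_v8i32 wide;
    half_v8i8 narrow;
    full_v16i8 padded = {0};  /* all 16 lanes start as zero */

    memcpy(&wide, in, sizeof wide);
    /* The truncation itself yields only a 64-bit vector. */
    narrow = __builtin_convertvector(wide, half_v8i8);
    /* Extra step: place the 8 bytes in the low half, leaving the rest zero. */
    memcpy(&padded, &narrow, sizeof narrow);
    memcpy(out, &padded, sizeof padded);
}
```

The second step corresponds to the zero lanes visible in the vpshufb comment above, which is the part that would need the codegen work mentioned in the reply.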

GBuella accepted this revision. May 14 2018, 3:46 AM

LGTM

This revision is now accepted and ready to land. May 14 2018, 3:46 AM
This revision was automatically updated to reflect the committed changes.
GBuella added a subscriber: ashlykov.