Page MenuHomePhabricator

[X86] Lowering integer truncation intrinsics to native IR
ClosedPublic

Authored by mike.dvoretsky on Jun 28 2018, 2:12 AM.

Details

Summary

This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to native IR in cases where the result's length is less than 128 bits.

The resulting IR for 256-bit inputs is folded into VPMOV instructions in D46957, while for 128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps) instructions are generated instead. D48822 adds fast-isel tests that demonstrate generated instructions.

Diff Detail

Repository
rC Clang

Event Timeline

mike.dvoretsky created this revision.Jun 28 2018, 2:12 AM

Uploaded the correct diff.

craig.topper added inline comments.Jun 28 2018, 10:17 AM
clang/lib/Headers/avx512vlintrin.h
7421 ↗(On Diff #153277)

If you need to add more zeroes than the width of the input, can you just repeat the zero vector in order as many times as necessary. It should look more like a concatentation.

so 0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7

craig.topper added inline comments.Jun 28 2018, 12:12 PM
clang/lib/Headers/avx512vlintrin.h
33 ↗(On Diff #153277)

Can you just do a local typedef in the functions that need them? That way we don't expose them to all users of the header.

mike.dvoretsky marked 2 inline comments as done.

Updated per comments. Typedefs for intermediate short vectors moved into the bodies of the functions using them.

RKSimon added inline comments.Jun 29 2018, 5:36 AM
clang/lib/Headers/avx512vlbwintrin.h
1501 ↗(On Diff #153471)

Are we happy with using illegal types like this? What about flipping the shuffle and convert?

return (__m128i)__builtin_convertvector(
                  __builtin_shufflevector((__v8hi)__A,
                                          (__v8hi){0, 0, 0, 0, 0, 0, 0, 0},
                                          0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), __v16qi);
mike.dvoretsky edited the summary of this revision. (Show Details)Jun 29 2018, 9:11 AM
mike.dvoretsky added inline comments.Jun 29 2018, 9:16 AM
clang/lib/Headers/avx512vlbwintrin.h
1501 ↗(On Diff #153471)

This would bring its own issues, since in the cvtepi64_epi8 cases the inner shuffle would produce vectors of 16 64-bit values. There would be no extra typedef, but in the back-end these would be split in type legalization, making it harder to fold them into VPMOV instructions.

Please can you create a llvm side parallel patch that updates the relevant fast-isel tests

clang/lib/Headers/avx512vlbwintrin.h
1501 ↗(On Diff #153471)

Yeah, neither solution is particularly clean. Please keep it as is.

mike.dvoretsky edited the summary of this revision. (Show Details)Jul 2 2018, 2:31 AM
mike.dvoretsky marked 3 inline comments as done.
mike.dvoretsky edited the summary of this revision. (Show Details)

LGTM - @craig.topper any comments?

This revision is now accepted and ready to land.Jul 8 2018, 10:38 AM
This revision was automatically updated to reflect the committed changes.