This is an archive of the discontinued LLVM Phabricator instance.

[X86] Lower the cost of v4i64->v4i32 and v8i64->v8i32 truncate with AVX
Closed · Public

Authored by craig.topper on Apr 29 2020, 11:33 AM.

Details

Summary

We generate much better code for these truncates than we used to, and we use the same sequence for both AVX1 and AVX2.

For v4i64->v4i32 we generate

vextractf128    xmm1, ymm0, 1
vshufps         xmm0, xmm0, xmm1, 136   # xmm0 = xmm0[0,2],xmm1[0,2]
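
As a rough illustration of why this two-instruction sequence is a truncate, the sketch below models the vector as eight little-endian 32-bit lanes in plain Python. It is not LLVM code; the helper names (split_i32, shufps128, trunc_v4i64_to_v4i32) are hypothetical and exist only for this example. The immediate 136 (0b10001000) selects lanes 0 and 2 from each source, which are the low halves of the 64-bit elements.

```python
def split_i32(qwords):
    """Split 64-bit elements into little-endian 32-bit lanes (low half first)."""
    lanes = []
    for q in qwords:
        q &= (1 << 64) - 1           # wrap negatives to 64-bit two's complement
        lanes.append(q & 0xFFFFFFFF)  # low 32 bits
        lanes.append(q >> 32)         # high 32 bits
    return lanes


def shufps128(a, b, imm):
    """Model 128-bit SHUFPS: lanes 0-1 come from a, lanes 2-3 from b,
    each chosen by a 2-bit field of the immediate."""
    sel = [(imm >> (2 * i)) & 3 for i in range(4)]
    return [a[sel[0]], a[sel[1]], b[sel[2]], b[sel[3]]]


def trunc_v4i64_to_v4i32(v):
    """Hypothetical model of the v4i64->v4i32 sequence from the summary."""
    ymm0 = split_i32(v)              # 8 x 32-bit lanes
    xmm0 = ymm0[:4]                  # low 128 bits of ymm0
    xmm1 = ymm0[4:]                  # vextractf128 xmm1, ymm0, 1
    return shufps128(xmm0, xmm1, 136)  # xmm0[0,2],xmm1[0,2]


if __name__ == "__main__":
    v = [0x1111111100000001, 0x2222222200000002,
         0x3333333300000003, 0x4444444400000004]
    print([hex(x) for x in trunc_v4i64_to_v4i32(v)])
```

The model only tracks lane movement; it says nothing about latency or throughput, which is what the cost table change is actually about.
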

And for v8i64->v8i32 we generate

vperm2f128      ymm2, ymm0, ymm1, 49    # ymm2 = ymm0[2,3],ymm1[2,3]
vinsertf128     ymm0, ymm0, xmm1, 1
vshufps         ymm0, ymm0, ymm2, 136   # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6]
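
The three-instruction v8i64->v8i32 sequence can be checked the same way. The sketch below is a plain-Python lane model, not LLVM code, and every helper name in it (split_i32, shufps128, vshufps256, vperm2f128, vinsertf128, trunc_v8i64_to_v8i32) is a hypothetical stand-in for the corresponding instruction. It assumes the eight input elements arrive as two registers, ymm0 holding elements 0-3 and ymm1 holding elements 4-7.

```python
def split_i32(qwords):
    """Split 64-bit elements into little-endian 32-bit lanes (low half first)."""
    lanes = []
    for q in qwords:
        q &= (1 << 64) - 1
        lanes.extend([q & 0xFFFFFFFF, q >> 32])
    return lanes


def shufps128(a, b, imm):
    """128-bit SHUFPS: lanes 0-1 from a, lanes 2-3 from b."""
    sel = [(imm >> (2 * i)) & 3 for i in range(4)]
    return [a[sel[0]], a[sel[1]], b[sel[2]], b[sel[3]]]


def vshufps256(a, b, imm):
    """256-bit VSHUFPS applies the same shuffle to each 128-bit half."""
    return shufps128(a[:4], b[:4], imm) + shufps128(a[4:], b[4:], imm)


def vperm2f128(a, b, imm):
    """Pick each 128-bit half of the result from the four source halves;
    bits 3 and 7 of the immediate zero the corresponding half."""
    halves = [a[:4], a[4:], b[:4], b[4:]]
    lo = [0] * 4 if imm & 0x08 else halves[imm & 3]
    hi = [0] * 4 if imm & 0x80 else halves[(imm >> 4) & 3]
    return lo + hi


def vinsertf128(a, x, idx):
    """Replace one 128-bit half of a with x."""
    return a[:4] + x if idx == 1 else x + a[4:]


def trunc_v8i64_to_v8i32(v):
    """Hypothetical model of the v8i64->v8i32 sequence from the summary."""
    ymm0, ymm1 = split_i32(v[:4]), split_i32(v[4:])
    ymm2 = vperm2f128(ymm0, ymm1, 49)       # high halves of ymm0 and ymm1
    ymm0 = vinsertf128(ymm0, ymm1[:4], 1)   # low halves of ymm0 and ymm1
    return vshufps256(ymm0, ymm2, 136)      # even 32-bit lanes of each half


if __name__ == "__main__":
    v = [((i + 1) << 32) | (0xA0 + i) for i in range(8)]
    print([hex(x) for x in trunc_v8i64_to_v8i32(v)])
```

Regrouping the 128-bit halves first means the single in-lane vshufps leaves the results already in element order, so no cross-lane fixup is needed afterwards.
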

Event Timeline

craig.topper created this revision. Apr 29 2020, 11:33 AM
Herald added a project: Restricted Project. Apr 29 2020, 11:33 AM
Herald added a subscriber: hiraditya.
craig.topper edited the summary of this revision. Apr 29 2020, 11:33 AM
spatel accepted this revision. Apr 29 2020, 12:35 PM

LGTM

This revision is now accepted and ready to land. Apr 29 2020, 12:35 PM
This revision was automatically updated to reflect the committed changes.