This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8 truncate when v8i64 isn't legal
ClosedPublic

Authored by craig.topper on Oct 3 2019, 3:01 PM.

Details

Summary

The default legalization for v16i64->v16i8 creates a multi-stage truncate, concatenating after each stage and truncating again. But AVX512 implements truncates with multiple uops, so it should be better to truncate each piece all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2-uop truncates; the unpckl instructions are all single-uop.
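As a rough illustration of the trade-off described above, the following sketch models both strategies with plain Python lists standing in for vector registers. It is not LLVM code. The helper names (`trunc`, `concat`, `staged`, `direct`) are hypothetical, and the exact shape of the default staged sequence is an assumption made for the sake of the comparison.

```python
# Illustrative model only: Python lists stand in for vector registers.
# All names here are hypothetical and do not come from the patch.
MASK = {8: 0xFF, 16: 0xFFFF, 32: 0xFFFFFFFF}

def trunc(vec, bits, counter):
    """Truncate every element to `bits` bits; counted as one truncate op."""
    counter["trunc"] += 1
    return [x & MASK[bits] for x in vec]

def concat(lo, hi, counter):
    """Join two partial results; stands in for a single-uop unpack."""
    counter["concat"] += 1
    return lo + hi

def staged(v16i64):
    """Assumed default shape: truncate partway, concatenate, truncate again."""
    c = {"trunc": 0, "concat": 0}
    lo, hi = v16i64[:8], v16i64[8:]
    wide = concat(trunc(lo, 32, c), trunc(hi, 32, c), c)
    return trunc(wide, 8, c), c

def direct(v16i64):
    """Truncate each half straight to 8-bit elements, then concatenate once."""
    c = {"trunc": 0, "concat": 0}
    lo, hi = v16i64[:8], v16i64[8:]
    return concat(trunc(lo, 8, c), trunc(hi, 8, c), c), c
```

Both functions return the same sixteen bytes, but under this model the direct form needs one fewer truncate, which is the operation the summary describes as costing 2 uops.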

I tried to handle this by just custom splitting the v16i64->v16i8 truncate, hoping that the DAG combiner would leave the two halves in the state needed for D68374 to do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8, used when v8i64 needs to be split, that produces the VTRUNCs directly.
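The split v8i64->v8i8 case can be sketched the same way. The model below assumes each half is truncated by a VTRUNC-style operation that leaves its four result bytes in the low 32 bits of a 16-byte register with the remaining bytes zeroed, and that the halves are joined by an unpack of the low 32-bit lanes. The function names are hypothetical, and the specific unpack variant is an assumption, since the summary only says "unpckl instructions".

```python
# Illustrative model only: a register is a list of 16 byte values.
# Function names are hypothetical, not taken from the patch.

def vtrunc_qb(v4i64):
    """4 x i64 -> 4 x i8 in the low bytes of a 16-byte register, rest zero."""
    return [x & 0xFF for x in v4i64] + [0] * 12

def unpack_low_dwords(a, b):
    """Interleave the low two 32-bit lanes of a and b: [a0, b0, a1, b1]."""
    def lanes(reg):
        return [reg[i:i + 4] for i in range(0, 16, 4)]
    la, lb = lanes(a), lanes(b)
    return la[0] + lb[0] + la[1] + lb[1]

def trunc_v8i64_to_v8i8(v8i64):
    """Split into two v4i64 halves, truncate each, then unpack once."""
    lo = vtrunc_qb(v8i64[:4])
    hi = vtrunc_qb(v8i64[4:])
    # The 8 result bytes occupy the low half of the combined register.
    return unpack_low_dwords(lo, hi)[:8]
```

In this model the two truncates are independent of each other, and a single unpack places the low half's bytes ahead of the high half's bytes in element order.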

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Oct 3 2019, 3:01 PM
Herald added a project: Restricted Project. Oct 3 2019, 3:01 PM
Herald added a subscriber: hiraditya.
craig.topper marked an inline comment as done. Oct 3 2019, 3:05 PM
craig.topper added inline comments.
llvm/test/CodeGen/X86/min-legal-vector-width.ll
836 ↗(On Diff #223103)

The loss of the VPERMI2B here is a regression, but the VTRUNC form should allow us to create saturating VTRUNCs for the cases in vector-trunc-ssat.ll and vector-trunc-usat.ll. So maybe we need a late shuffle combine to VPERMI2B?

Rebase after landing the VTRUNCUS/VTRUNCS patch.
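For readers unfamiliar with the saturating forms mentioned in this thread, the sketch below contrasts a plain (wrapping) truncate with signed and unsigned saturating truncates on a single element. It is a scalar model with hypothetical function names, not the X86 backend's implementation, and the unsigned variant assumes its input is already interpreted as a non-negative value.

```python
# Scalar model of the three truncation flavours; names are hypothetical.

def trunc_wrap(x, bits=8):
    """Plain truncate: keep only the low `bits` bits."""
    return x & ((1 << bits) - 1)

def trunc_ssat(x, bits=8):
    """Signed saturating truncate: clamp to the signed range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))

def trunc_usat(x, bits=8):
    """Unsigned saturating truncate: clamp to the unsigned maximum.

    Assumes x is a non-negative (unsigned) value.
    """
    return min(x, (1 << bits) - 1)
```

For example, the value 300 wraps to 44 under a plain truncate but clamps to 127 or 255 under the signed or unsigned saturating forms.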

RKSimon added inline comments. Oct 5 2019, 9:37 AM
llvm/test/CodeGen/X86/min-legal-vector-width.ll
836 ↗(On Diff #223103)

We don't currently support VTRUNC in shuffle combining as we're still weak at handling conflicting vector sizes - is that limitation ok for now?

craig.topper marked an inline comment as done. Oct 5 2019, 9:52 AM
craig.topper added inline comments.
llvm/test/CodeGen/X86/min-legal-vector-width.ll
836 ↗(On Diff #223103)

I think so. CPUs that support VPERMI2B only started shipping last month, I believe, so they aren't very widespread yet. If it becomes a problem, we can probably match this specific pattern in a DAG combine. I'll open a Bugzilla entry when this patch lands.

RKSimon accepted this revision. Oct 6 2019, 9:58 AM

LGTM along with raising a bug about the VPERMI2B regression

This revision is now accepted and ready to land. Oct 6 2019, 9:58 AM
This revision was automatically updated to reflect the committed changes.