This is an archive of the discontinued LLVM Phabricator instance.

[X86][AVX] Lower v16i8/v8i16 shuffles using VTRUNC/TRUNCATE
ClosedPublic

Authored by RKSimon on Aug 17 2020, 10:50 AM.

Details

Summary

This patch extends the existing lowerShuffleWithVPMOV with a lowerShuffleWithVTRUNC wrapper that handles basic binary shuffles that can be lowered either as a pure ISD::TRUNCATE or as an X86ISD::VTRUNC (with undef/zero values in the remaining upper elements).

We concat the binary sources together into a single 256-bit source vector. To avoid regressions we perform this after we've tried to lower with PACKS/PACKUS which typically does a cleaner job than a concat.
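To illustrate why such a shuffle is a truncation, here is a minimal standalone sketch (plain C++, not the LLVM code from this patch). It assumes the v16i8 binary shuffle with mask <0,2,4,...,30> and little-endian lane layout; the helper names isTruncationMask, shuffleBytes and truncateConcat are hypothetical and exist only for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using V16i8 = std::array<std::uint8_t, 16>;

// Hypothetical helper (not LLVM's matcher): returns true if every defined
// mask element I selects source element I * Scale. Negative entries are
// treated as undef and match anything.
bool isTruncationMask(const std::vector<int> &Mask, int Scale) {
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] < 0)
      continue;
    if (Mask[I] != static_cast<int>(I) * Scale)
      return false;
  }
  return true;
}

// Reference binary byte shuffle: indices 0..15 read V1, 16..31 read V2.
// Undef lanes are left as zero in this model.
V16i8 shuffleBytes(const V16i8 &V1, const V16i8 &V2,
                   const std::vector<int> &Mask) {
  assert(Mask.size() == 16 && "expected a v16i8 mask");
  V16i8 Out{};
  for (std::size_t I = 0; I < 16; ++I) {
    int M = Mask[I];
    if (M < 0)
      continue;
    assert(M < 32 && "mask index out of range");
    Out[I] = M < 16 ? V1[M] : V2[M - 16];
  }
  return Out;
}

// Concatenate V1 and V2 into 32 bytes, view them as 16 little-endian
// 16-bit lanes, and keep the low byte of each lane.
V16i8 truncateConcat(const V16i8 &V1, const V16i8 &V2) {
  auto Byte = [&](std::size_t B) { return B < 16 ? V1[B] : V2[B - 16]; };
  V16i8 Out{};
  for (std::size_t I = 0; I < 16; ++I) {
    std::uint16_t Lane =
        static_cast<std::uint16_t>(Byte(2 * I) | (Byte(2 * I + 1) << 8));
    Out[I] = static_cast<std::uint8_t>(Lane & 0xFF);
  }
  return Out;
}
```

In this model the even-element mask and the truncation of the concatenated source produce the same bytes, which is the equivalence the lowering relies on.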

For non-AVX512VL cases we have to canonicalize VTRUNC cases to use 512-bit source vectors (inserting undefs/zeros in the upper elements as necessary), truncate, and then (possibly) extract the 128-bit result.
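The widen/truncate/extract sequence can be modelled with a short standalone sketch (plain C++, not the code in X86ISelLowering.cpp). It assumes 16-bit source lanes truncated to 8 bits and zero padding in the upper lanes; truncLanes and truncViaWide are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Truncate every 16-bit lane to its low 8 bits.
std::vector<std::uint8_t> truncLanes(const std::vector<std::uint16_t> &Src) {
  std::vector<std::uint8_t> Out;
  Out.reserve(Src.size());
  for (std::uint16_t Lane : Src)
    Out.push_back(static_cast<std::uint8_t>(Lane & 0xFF));
  return Out;
}

// Hypothetical model of the non-VL path: pad the source with zero lanes up
// to WideLanes, truncate the wide vector, then keep only the low lanes that
// correspond to the original source.
std::vector<std::uint8_t> truncViaWide(const std::vector<std::uint16_t> &Src,
                                       std::size_t WideLanes) {
  assert(Src.size() <= WideLanes && "source wider than the padded vector");
  std::vector<std::uint16_t> Wide(Src);
  Wide.resize(WideLanes, 0);                       // insert zeros in the upper lanes
  std::vector<std::uint8_t> Result = truncLanes(Wide);
  Result.resize(Src.size());                       // extract the low subvector
  return Result;
}
```

For a 256-bit source of 16 x i16, calling truncViaWide(Src, 32) models padding to 512 bits, truncating, and extracting the low 128 bits; the padding lanes never reach the extracted result.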

This should address the last regressions in D66004.

Diff Detail

Event Timeline

RKSimon created this revision. Aug 17 2020, 10:50 AM
Herald added a project: Restricted Project. Aug 17 2020, 10:50 AM
Herald added a subscriber: hiraditya.
RKSimon requested review of this revision. Aug 17 2020, 10:50 AM

It looks like we may already be doing it in some cases, but is a VTRUNC for xmm->xmm really better than VPSHUFB? VTRUNC is 2 port 5 uops. VPSHUFB is 1 port 5 uop.

llvm/lib/Target/X86/X86ISelLowering.cpp
11396

When does this condition happen? Doesn't Mask always follow VT in shuffle lowering?

It looks like we may already be doing it in some cases, but is a VTRUNC for xmm->xmm really better than VPSHUFB? VTRUNC is 2 port 5 uops. VPSHUFB is 1 port 5 uop.

That sounds reasonable to me, although there's the inevitable question of the cost of loading the shuffle mask. I'll limit it to just binary shuffles, which are the cause of the regressions in D66004.

llvm/lib/Target/X86/X86ISelLowering.cpp
11396

This is just copy+paste from lowerShuffleWithVPMOV; neither actually needs it.

RKSimon updated this revision to Diff 286110. Aug 17 2020, 12:37 PM

Remove unnecessary mask length sanity checks

RKSimon updated this revision to Diff 286120. Aug 17 2020, 1:00 PM
RKSimon edited the summary of this revision.

Only match binary shuffles

This revision is now accepted and ready to land. Aug 17 2020, 3:00 PM