This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] lowerShuffleAsDecomposedShuffleBlend - support decomposed unpacks for some vXi8/vXi16 cases
Closed · Public

Authored by RKSimon on Sep 9 2020, 11:31 AM.

Details

Summary

Follow-up to D86429 to handle the remaining regressions.

This patch generalizes lowerShuffleAsDecomposedShuffleBlend into lowerShuffleAsDecomposedShuffleMerge. For vXi8/vXi16 cases where the inputs come from alternating sources, it attempts to use an UNPCKL shuffle mask instead of a blend. Technically the sources don't have to alternate (it is enough that each input's elements fit into the lower half of the lane for the unpack), but I didn't find as many general cases, and supporting them would have required altering much more of the function.

For vXi32/vXi64 cases this could still be beneficial, but in most cases the existing permute+blend approach was better.
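The two decompositions described in the summary can be sketched at the level of shuffle masks. The following is a minimal Python model under stated assumptions, not the actual X86 lowering code: the names `shuffle`, `evaluate`, `decompose_as_blend` and `decompose_as_unpacklo` are hypothetical, the model covers a single 128-bit lane (for example v8i16), it handles only the strictly alternating case (even result elements from V1, odd ones from V2), and it says nothing about instruction cost.

```python
from typing import List, Optional, Tuple

# Mask convention (as in LLVM IR shufflevector): for N-element inputs,
# m in [0, N) selects V1[m], m in [N, 2N) selects V2[m - N], -1 is undefined.
Mask = List[int]
Decomposed = Tuple[Mask, Mask, Mask]  # (V1 permute, V2 permute, merge mask)


def shuffle(a, b, mask: Mask) -> List[Optional[int]]:
    """Reference two-input shuffle; undefined (-1) lanes become None."""
    n = len(a)
    return [None if m < 0 else (a[m] if m < n else b[m - n]) for m in mask]


def evaluate(v1, v2, d: Decomposed) -> List[Optional[int]]:
    """Run a decomposed form: permute each input, then merge the two results."""
    v1_mask, v2_mask, merge = d
    return shuffle(shuffle(v1, v1, v1_mask), shuffle(v2, v2, v2_mask), merge)


def decompose_as_blend(mask: Mask) -> Decomposed:
    """Permute each input so its elements land at their final positions,
    then blend: result lane i comes from lane i of one permuted input."""
    n = len(mask)
    v1_mask, v2_mask, merge = [-1] * n, [-1] * n, [-1] * n
    for i, m in enumerate(mask):
        if m < 0:
            continue
        if m < n:
            v1_mask[i], merge[i] = m, i
        else:
            v2_mask[i], merge[i] = m - n, i + n
    return v1_mask, v2_mask, merge


def decompose_as_unpacklo(mask: Mask) -> Optional[Decomposed]:
    """If even lanes come from V1 and odd lanes from V2, pack each input's
    elements into its low half and interleave with {0, N, 1, N+1, ...}."""
    n = len(mask)
    v1_mask, v2_mask = [-1] * n, [-1] * n
    merge = [i // 2 + (n if i % 2 else 0) for i in range(n)]  # UNPCKL pattern
    for i, m in enumerate(mask):
        if m < 0:
            continue
        if i % 2 == 0 and m < n:
            v1_mask[i // 2] = m
        elif i % 2 == 1 and m >= n:
            v2_mask[i // 2] = m - n
        else:
            return None  # sources don't alternate; keep the blend form
    return v1_mask, v2_mask, merge


v1 = list(range(8))                  # one 128-bit lane of v8i16
v2 = [100 + i for i in range(8)]
mask = [3, 9, 0, 12, 5, -1, 7, 15]   # even lanes from V1, odd lanes from V2
expected = shuffle(v1, v2, mask)
assert evaluate(v1, v2, decompose_as_blend(mask)) == expected
unpack = decompose_as_unpacklo(mask)
assert unpack is not None and evaluate(v1, v2, unpack) == expected
```

Both forms reproduce the original shuffle; the difference is where the permutes put each input's elements. The blend form scatters them to their final positions, while the unpack form packs them into the low half of each input so that a single UNPCKL-style merge interleaves them. A mask that does not alternate at this element width, such as `[0, 1, 8, 9, 2, 3, 10, 11]`, makes `decompose_as_unpacklo` return `None` in this sketch.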


Event Timeline

RKSimon created this revision. · Sep 9 2020, 11:31 AM
Herald added a project: Restricted Project. · Sep 9 2020, 11:31 AM
Herald added a subscriber: hiraditya.
TellowKrinkle accepted this revision. · Sep 11 2020, 1:52 PM

LGTM, nice to see the remaining regressions from D86429 cleared up

This revision is now accepted and ready to land. · Sep 11 2020, 1:52 PM
This revision was landed with ongoing or failed builds. · Sep 12 2020, 5:41 AM
This revision was automatically updated to reflect the committed changes.