This is an archive of the discontinued LLVM Phabricator instance.

[x86] split more v8f32/v8i32 shuffles in lowering
Closed · Public

Authored by spatel on Feb 13 2019, 7:30 AM.

Details

Summary

Similar to D57867 - this is a 1-line patch with lots of test diffs.
In most cases with half-vector-width narrowing potential, using an extract + 128-bit vshufps is a win because it replaces a 256-bit shuffle with a 128-bit shuffle.

There is one potentially controversial diff pattern, for targets with "fast-variable-shuffle".
We are changing:

vmovaps {{.*#+}} ymm1 = [load 256-bit constant permute mask]
vpermps %ymm0, %ymm1, %ymm0

to:

vextractf128 $1, %ymm0, %xmm1
vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]

That could be a regression if the permute mask load can be hoisted out of a loop and the 256-bit op executes at the same speed/power as a 128-bit op. But I think the extract+shufps combo is the right default choice at this level because it removes a ymm instruction. If it would be profitable within a loop, a later optimization should re-form the 256-bit vpermps from the extract+shufps.
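A minimal Python scalar model, not part of the patch, of why the two sequences agree. It assumes the permute mask starts with indices 0, 2, 4, 6 (what the vshufps comment implies) and treats the upper four mask entries as don't-care, since the replacement sequence produces only a 128-bit result. The helper names (`vpermps`, `vextractf128_hi`, `shufps`, `narrowed`) and the mask values are hypothetical illustrations, not LLVM code.

```python
# Scalar models (hypothetical helpers, not LLVM code) of the three x86 instructions.

def vpermps(src, idx):
    """256-bit VPERMPS: result lane i is src[idx[i] & 7]; can cross 128-bit lanes."""
    return [src[i & 7] for i in idx]


def vextractf128_hi(src):
    """VEXTRACTF128 $1: the upper 128-bit lane of a 256-bit vector (elements 4..7)."""
    return src[4:8]


def shufps(a, b, imm):
    """128-bit SHUFPS: two elements of a, then two of b, picked by 2-bit fields of imm."""
    return [a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]]


# Immediate for xmm0[0,2],xmm1[0,2]: 2-bit fields (low to high) are 0, 2, 0, 2.
IMM_EVEN = 0 | (2 << 2) | (0 << 4) | (2 << 6)  # == 0x88


def narrowed(ymm0):
    """vextractf128 $1 + vshufps: gather the even elements using 128-bit ops only."""
    xmm1 = vextractf128_hi(ymm0)
    xmm0 = ymm0[:4]  # the low lane is just the xmm view of ymm0
    return shufps(xmm0, xmm1, IMM_EVEN)


# Assumed mask: only the first four indices matter for the 128-bit result.
MASK = [0, 2, 4, 6, 0, 0, 0, 0]

ymm0 = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
print(vpermps(ymm0, MASK)[:4])  # [10.0, 12.0, 14.0, 16.0]
print(narrowed(ymm0))           # [10.0, 12.0, 14.0, 16.0]
```

In this model, vpermps needs its index vector in a register (hence the 256-bit constant load), whereas the narrowed form encodes the same selection in the vshufps immediate.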

Event Timeline

spatel created this revision. Feb 13 2019, 7:30 AM
Herald added a project: Restricted Project. Feb 13 2019, 7:30 AM

Better to stage this in 2 parts?
I can add a check for fast-variable-shuffle, so we get the clear improvements. Then, a follow-up can remove that check and see if that results in any real-world fallout.

Yes, please can you update this patch for just the slow path?

spatel updated this revision to Diff 187069. Feb 15 2019, 12:46 PM

Patch updated:
Restrict the change to targets without fast-variable-shuffle.

RKSimon accepted this revision. Feb 18 2019, 3:36 AM

LGTM - cheers.

This revision is now accepted and ready to land. Feb 18 2019, 3:36 AM
This revision was automatically updated to reflect the committed changes.