This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] recognize shuffle (shuffle X, Mask0), Mask --> splat X
ClosedPublic

Authored by spatel on Feb 28 2020, 5:46 AM.

Details

Summary

We get the simple cases of this via demanded elements and other folds, but that doesn't work if the values have >1 use, so add a dedicated match for the pattern.

We already have this transform in IR, but it doesn't help the motivating x86 tests (based on PR42024) because the shuffles don't exist until after legalization and other combines have happened. The AArch64 test shows a minimal IR example of the problem.

Diff Detail

Event Timeline

spatel created this revision.Feb 28 2020, 5:46 AM
RKSimon added inline comments.Feb 28 2020, 6:03 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19298

Check for isShuffleMaskLegal() ?

Sounds good to me modulo legality check.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19264

I'd suggest OuterShuf and InnerShuf

19277

CombinedMask? Not sure.

19282–19286

Might it be more obvious to do

int MaskElt = Mask[i];
if (MaskElt == -1)
  continue;
// Peek through the shuffle masks to get the underlying source element.
int MaskOp0Elt = MaskOp0[MaskElt];
if (MaskOp0Elt == -1)
  continue;

?

spatel marked an inline comment as done.Feb 28 2020, 8:43 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19298

That's the conservatively correct approach, but I'm not sure if we want that limit. We know that the shuffle that we are creating is the same type that we started with, and it's a splat, so we treat that as a special-case that any target should handle efficiently:
https://llvm.org/docs/CodeGenerator.html#selectiondag-legalize-phase

I don't have a real-world example of this, but it's easy to hack an x86 test to show an improvement without the check:

define void @disguised_splat(<8 x i32> %x, <8 x i32>* %p0, <8 x i32>* %p1) {
  %shuf0 = shufflevector <8 x i32> %x, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 0, i32 3, i32 3, i32 4>
  store <8 x i32> %shuf0, <8 x i32>* %p0
  %shuf1 = shufflevector <8 x i32> %shuf0, <8 x i32> undef, <8 x i32> <i32 4, i32 3, i32 4, i32 3, i32 4, i32 4, i32 3, i32 3>
  store <8 x i32> %shuf1, <8 x i32>* %p1
  ret void
}

With mask legality check, we'll fail to detect the splat before the op gets split:

$ llc -o - shuf.ll
	shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
	movaps	%xmm0, %xmm2
	shufps	$44, %xmm1, %xmm2       ## xmm2 = xmm2[0,3],xmm1[2,0]
	pshufd	$36, %xmm0, %xmm1       ## xmm1 = xmm0[0,1,2,0]
	movdqa	%xmm1, (%rdi)
	movaps	%xmm2, 16(%rdi)
	pshufd	$0, %xmm0, %xmm2        ## xmm2 = xmm0[0,0,0,0]
	shufps	$240, %xmm1, %xmm0      ## xmm0 = xmm0[0,0],xmm1[3,3]
	movdqa	%xmm2, (%rsi)
	movaps	%xmm0, 16(%rsi)

Without mask legality check, we can simplify before splitting:

$ llc -o - shuf.ll 
	shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
	pshufd	$36, %xmm0, %xmm2       ## xmm2 = xmm0[0,1,2,0]
	pshufd	$0, %xmm0, %xmm3        ## xmm3 = xmm0[0,0,0,0]
	shufps	$44, %xmm1, %xmm0       ## xmm0 = xmm0[0,3],xmm1[2,0]
	movdqa	%xmm2, (%rdi)
	movaps	%xmm0, 16(%rdi)
	movdqa	%xmm3, (%rsi)
	movdqa	%xmm3, 16(%rsi)

The safe thing would be to add the mask legality predicate here, then remove it as a follow-up (after adding a test like above to show a diff)...then wait to see if there are any complaints. :)

spatel updated this revision to Diff 247285.Feb 28 2020, 9:01 AM
spatel marked 4 inline comments as done.

Patch updated:

  1. Added shuffle mask legality check.
  2. Improved variable names/logic.
  3. Added assert for types.
This revision is now accepted and ready to land.Feb 28 2020, 9:09 AM
RKSimon accepted this revision.Feb 28 2020, 9:24 AM

LGTM - cheers

This revision was automatically updated to reflect the committed changes.