This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] transform more extract/insert pairs into shuffles (PR2109)
Closed, Public

Authored by spatel on Nov 30 2015, 3:52 PM.

Details

Summary

This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2
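
To make the widening step concrete, here is a minimal Python model of the two shuffles above. It is only a sketch, not InstCombine's implementation: the helper names (`shufflevector`, `insert_via_scalars`, `insert_via_shuffles`) are hypothetical, `None` stands in for undef, and the scalar extract/insert sequence is an assumed reading of what the shuffles replace (elements 0 and 1 of the loaded vector going into lanes 2 and 3 of %A).

```python
UNDEF = None  # stand-in for an LLVM undef lane (assumption of this model)


def shufflevector(v1, v2, mask):
    """Model of shufflevector: each mask index selects from v1 ++ v2."""
    assert len(v1) == len(v2), "both inputs must have the same length"
    concat = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else concat[m] for m in mask]


def insert_via_scalars(a, ld):
    """Assumed original pattern: extract each short-vector lane, insert into a."""
    out = list(a)
    out[2] = ld[0]  # extractelement ld, 0 -> insertelement ..., 2
    out[3] = ld[1]  # extractelement ld, 1 -> insertelement ..., 3
    return out


def insert_via_shuffles(a, ld):
    """Transformed pattern: widen ld with undef lanes, then blend with a."""
    undef2 = [UNDEF] * len(ld)
    # %2 = shufflevector <2 x float> %ld1, undef, <0, 1, undef, undef>
    widened = shufflevector(ld, undef2, [0, 1, UNDEF, UNDEF])
    # %i2 = shufflevector <4 x float> %A, %2, <0, 1, 4, 5>
    return shufflevector(a, widened, [0, 1, 4, 5])


a = [1.0, 2.0, 3.0, 4.0]
ld = [9.0, 8.0]
assert insert_via_scalars(a, ld) == insert_via_shuffles(a, ld)
```

The undef lanes of the widened vector are never selected by the second mask, which is why padding with undef is harmless in this model.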

And x86 SSE output improves from:

movq	(%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa	%xmm1, %xmm2
shufps	$229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps	$132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps	$32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps	$36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

to the nearly optimal:

movhpd	(%rdi), %xmm0

Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples.

Note 2: The 2x shufflevector mask limitation is not in the IR Language Reference shufflevector instruction definition, but it is encoded in ShuffleVectorInst::isValidOperands().

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 41443. Nov 30 2015, 3:52 PM
spatel retitled this revision to [InstCombine] transform more extract/insert pairs into shuffles (PR2109).
spatel updated this object.
spatel added reviewers: t.p.northover, hfinkel, RKSimon.
spatel added a subscriber: llvm-commits.

> Note 2: The 2x shufflevector mask limitation is not in the IR Language Reference shufflevector instruction definition, but it is encoded in ShuffleVectorInst::isValidOperands().

Disregard that comment. I mistook a bug in an earlier draft of this patch as that limitation. I'll update the patch to remove that check.

spatel updated this revision to Diff 41452. Nov 30 2015, 4:24 PM

Patch updated:

  1. Removed check for 2x shuffle.
  2. Updated 'too_wide' test case because it's not too wide!

RKSimon accepted this revision. Dec 19 2015, 9:12 AM
RKSimon edited edge metadata.

LGTM

This revision is now accepted and ready to land. Dec 19 2015, 9:12 AM
This revision was automatically updated to reflect the committed changes.