This is an archive of the discontinued LLVM Phabricator instance.

try to lowerVectorShuffleAsElementInsertion() for all 256-bit vector sub-types [X86, AVX]
ClosedPublic

Authored by spatel on Mar 14 2015, 8:02 AM.

Details

Summary

I suggested this change in D7898: hoist the lowerVectorShuffleAsElementInsertion() call into lower256BitVectorShuffle().

It improves the v4i64 case although not optimally. This AVX codegen:

vmovq {{.*#+}} xmm0 = mem[0],zero
vxorpd %ymm1, %ymm1, %ymm1
vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]

Becomes:

vmovsd {{.*#+}} xmm0 = mem[0],zero

Unfortunately, this doesn't completely solve PR22685. There are still at least two problems lurking here:

  1. We're not handling v32i8 / v16i16.
  2. We're not getting the FP / int domains right for instruction selection.

But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64, I'm submitting it ahead of fixing the above.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 21985.Mar 14 2015, 8:02 AM
spatel retitled this revision from to try to lowerVectorShuffleAsElementInsertion() for all 256-bit vector sub-types [X86, AVX].
spatel updated this object.
spatel edited the test plan for this revision.
spatel added a subscriber: Unknown Object (MLST).
andreadb edited edge metadata.Mar 30 2015, 9:06 AM

Hi Sanjay,

test/CodeGen/X86/vector-shuffle-256-v4.ll
830 (On Diff #21985)

So, this is what you meant when you said that we don't get the correct fp/int domain.
In X86InstrSSE.td we have patterns like this:

def : Pat<(v4i64 (X86vzmovl (insert_subvector undef,
                  (v2i64 (scalar_to_vector (loadi64 addr:$src))),
                  (iPTR 0)))),
          (SUBREG_TO_REG (i32 0), (VMOVSDrm addr:$src), sub_xmm)>;

Do you plan to send a follow-up patch to fix the tablegen patterns so that VMOVQI2PQIrm is used instead of VMOVSDrm for the integer domain? If that's the case, then it makes sense to commit this patch first and fix the fp/int domain issue in a separate patch.

test/CodeGen/X86/vector-shuffle-256-v8.ll
134–137 (On Diff #21985)

This has nothing to do with your patch; however, I am surprised that we get this long sequence of instructions on AVX2 instead of just a single 'vmovaps' plus a 'vpermd'.
Here, %ymm1 is used to store the 'vpermd' permute mask. That mask is known at compile time (it is the vector <7,0,0,0,0,0,0,0>), so we could load it from the constant pool instead of computing it at runtime. I think we could replace this entire sequence with a load from the constant pool followed by a 'vpermd'.

963–967 (On Diff #21985)

Same here.

spatel added inline comments.Mar 30 2015, 10:22 AM
test/CodeGen/X86/vector-shuffle-256-v4.ll
830 (On Diff #21985)

Hi Andrea -

That's correct. I saw a couple of places where we didn't have the right tablegen patterns, and I had a patch for it somewhere... but I'm not finding it now. It was just simple replacements to substitute the right type, like what you've noted here.

test/CodeGen/X86/vector-shuffle-256-v8.ll
134–137 (On Diff #21985)

Interesting - it's not entirely unrelated because the permute mask itself could be viewed as a zero-extended vector, right? I've filed this as:
https://llvm.org/bugs/show_bug.cgi?id=23073

andreadb added inline comments.Mar 30 2015, 10:41 AM
test/CodeGen/X86/vector-shuffle-256-v8.ll
134–137 (On Diff #21985)

Right,

movl $7, %eax
vmovd %eax, %xmm1
vxorps %ymm2, %ymm2, %ymm2
vblendps {{.*#+}} ymm1 = ymm1[0],ymm2[1,2,3,4,5,6,7]

is basically equivalent to:

movl $7, %eax
vmovd %eax, %xmm1

Bits [VLMAX-1:32] would be implicitly zeroed.

andreadb accepted this revision.Mar 31 2015, 8:04 AM
andreadb edited edge metadata.
This revision is now accepted and ready to land.Mar 31 2015, 8:04 AM
This revision was automatically updated to reflect the committed changes.