This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] fold insertelement of constant into shuffle with constant operand (PR29126)
ClosedPublic

Authored by spatel on Aug 25 2016, 12:03 PM.

Details

Summary

We can see chains of insertelement instructions before SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector.

This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 69278.Aug 25 2016, 12:03 PM
spatel retitled this revision from to [InstCombine] fold insertelement of constant into shuffle with constant operand (PR29126).
spatel updated this object.
spatel added reviewers: majnemer, efriedma, hfinkel.
spatel added subscribers: RKSimon, ABataev.
efriedma edited edge metadata.Aug 26 2016, 10:27 AM

This seems vaguely risky: if the backend isn't smart enough to pattern-match an arbitrary shuffle into a cheap shuffle + insert, you could end up making the code slower. (For example, I'm not sure the x86 backend can decompose an <8 x i16> shuffle into a punpcklwd+pinsrw in general.)

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
602 ↗(On Diff #69278)

I'm not following this logic... don't you need to prove that the chosen element of the original constant vector isn't used? Consider a shuffle mask like "<8 x i16> <0, 1, 2, 3, 8, 9, 10, 11>" followed by an insertion at index 0.

This seems vaguely risky: if the backend isn't smart enough to pattern-match an arbitrary shuffle into a cheap shuffle + insert, you could end up making the code slower. (For example, I'm not sure the x86 backend can decompose an <8 x i16> shuffle into a punpcklwd+pinsrw in general.)

Let me write up some more tests and pass them on to a few different targets. For the x86 cases I looked at, this appeared to always be a win, but I didn't try any i16 tests.

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
602 ↗(On Diff #69278)

Yes, this is just wrong. I was only thinking about the same-lane / blend pattern in the motivating example.

This leads to a question that was raised in D22114. Which is canonical: a shufflevector with a select-equivalent mask or a select with a constant condition operand?

Given:

define <4 x i8> @hoo(<4 x i8> %x) {
  %y = shufflevector <4 x i8> %x, <4 x i8> <i8 undef, i8 5, i8 6, i8 7>, <4 x i32><i32 0, i32 7, i32 6, i32 5> ; lane-changing
  ret <4 x i8> %y
}

Should we transform to:

define <4 x i8> @hoo(<4 x i8> %x) {
  %y = shufflevector <4 x i8> %x, <4 x i8> <i8 undef, i8 7, i8 6, i8 5>, <4 x i32><i32 0, i32 5, i32 6, i32 7> ; lane-preserving
  ret <4 x i8> %y
}

or:

define <4 x i8> @hoo(<4 x i8> %x) {
  %y = select <4 x i1> <i1 true, i1 false, i1 false, i1 false>, <4 x i8> %x, <4 x i8> <i8 undef, i8 7, i8 6, i8 5>
  ret <4 x i8> %y
}

IR should be canonical, so we should add transforms to make one of the latter the preferred form?

Probably a good idea to bring up the shuffle vs select discussion on llvmdev, to get more visibility; it will have a substantial impact on backends.

spatel updated this revision to Diff 70065.Sep 1 2016, 2:06 PM
spatel edited edge metadata.

Patch updated:
We seem to have consensus about canonicalizing a vector select with a constant condition operand to a shuffle, but I'll give it a bit more time before making a patch for that.

Even without that step in place, I think we can push this patch forward by limiting the transform to select-equivalent shuffles.

  1. We still get the motivating case.
  2. The previous logic was incorrect for the general shuffle case, but it should work for this limited form of shuffle.
  3. We add a bit of shuffle <--> select plumbing via isShuffleEquivalentToSelect() which may be useful for subsequent patches too.

This is looking much better.

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
590 ↗(On Diff #70065)

I think you can write the body of this loop as "int EltVal = Shuf.getMaskValue(i); if (EltVal != -1 && EltVal != i && EltVal != i + Vecsize) return false;".

spatel added inline comments.Sep 2 2016, 6:49 AM
lib/Transforms/InstCombine/InstCombineVectorOps.cpp
590 ↗(On Diff #70065)

Nice - not sure how I missed that method up to now!

spatel updated this revision to Diff 70162.Sep 2 2016, 7:28 AM

Patch updated:
Use Shuf.getMaskValue(i) to simplify the code.

efriedma accepted this revision.Sep 2 2016, 9:12 AM
efriedma edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Sep 2 2016, 9:12 AM
This revision was automatically updated to reflect the committed changes.