This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] extend vector select matching for non-splat constants
ClosedPublic

Authored by spatel on Jul 7 2016, 3:02 PM.

Details

Summary

In D21740, we discussed trying to make this a more general matcher. However, I didn't see a clean way to handle the regular m_Not cases and these non-splat vector patterns, so I've just opted for the direct approach here. If there are other potential uses of areInverseVectorBitmasks(), we could move that to a higher level.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 63144.Jul 7 2016, 3:02 PM
spatel retitled this revision from to [InstCombine] extend vector select matching for non-splat constants.
spatel updated this object.
spatel added reviewers: eli.friedman, majnemer, RKSimon.
spatel added a subscriber: llvm-commits.
spatel updated this revision to Diff 63145.Jul 7 2016, 3:14 PM

Patch updated:
The previous version that I uploaded was an incorrect local draft.

eli.friedman edited edge metadata.Jul 7 2016, 3:29 PM

Do we care at all about the number of uses of the operands?

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1601 ↗(On Diff #63145)

What if one is undef?

spatel added a comment.Jul 7 2016, 3:49 PM

Do we care at all about the number of uses of the operands?

Good question. I was assuming one-use based on my motivating examples, but since we're potentially creating an xor here and bitcasts above this, we probably do need to check that.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1601 ↗(On Diff #63145)

Ah, right. Or even both elts could be undef?
Ok to make that a FIXME comment for this patch? I'll follow-up with the extra logic and test cases.

eli.friedman added inline comments.Jul 7 2016, 4:41 PM
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1601 ↗(On Diff #63145)

Sure.

spatel updated this revision to Diff 63327.Jul 8 2016, 3:14 PM
spatel edited edge metadata.

Patch updated:

  1. Added FIXME to handle undef elements for non-splat constant vectors.
  2. Added testcase for multiple-uses of the non-splat mask vectors.
  3. Ahead of this patch, limited the transform for multiple-use scenarios in:

http://reviews.llvm.org/rL274926
http://reviews.llvm.org/rL274932

eli.friedman accepted this revision.Jul 12 2016, 1:21 PM
eli.friedman edited edge metadata.

LGTM.

test/Transforms/InstCombine/logical-select.ll
374 ↗(On Diff #63327)

Is a select actually the canonical form for this? It seems like we should prefer a shufflevector.

This revision is now accepted and ready to land.Jul 12 2016, 1:21 PM
spatel added inline comments.Jul 12 2016, 3:23 PM
test/Transforms/InstCombine/logical-select.ll
374 ↗(On Diff #63327)

I saw that x86 SSE or AVX didn't care, eg:

vpblendw	$60, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3,4,5],xmm0[6,7]

..so I figured one was as good as the other, but this is not true in general. AArch64 and PPC+VSX generate different code:

define <4 x i32> @foo(<4 x i32> %a, <4 x i32> %b) {
  %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
  ret <4 x i32> %sel
}

define <4 x i32> @goo(<4 x i32> %a, <4 x i32> %b) {
  %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
  ret <4 x i32> %shuf
}

$ ./llc shufsel.ll -o - -mtriple=aarch64

adrp	x8, .LCPI0_0
ldr	q2, [x8, :lo12:.LCPI0_0]
bsl	v2.16b, v0.16b, v1.16b
mov		v0.16b, v2.16b
ret

ext	v1.16b, v0.16b, v1.16b, #12
ext	v0.16b, v1.16b, v0.16b, #4
ext	v1.16b, v1.16b, v1.16b, #8
ext	v0.16b, v0.16b, v1.16b, #12
ret

$ ./llc shufsel.ll -o - -mtriple=powerpc64 -mattr=vsx

addis 3, 2, .LCPI0_0@toc@ha
addi 3, 3, .LCPI0_0@toc@l
lxvw4x 0, 0, 3
xxsel 34, 35, 34, 0
blr

addis 3, 2, .LCPI1_0@toc@ha
addi 3, 3, .LCPI1_0@toc@l
lxvw4x 36, 0, 3
vperm 2, 2, 3, 4
blr

I don't know AArch at all, but bsl seems better than 4 exts. Is there a better way than either of those? PPC xxsel vs. vperm seem equivalent, but I have no knowledge of any recent PPC HW.

Does the current codegen affect what we do here in IR?

I would lean towards making shufflevector canonical. It interacts better with existing transforms involving shuffles and vector insert/extract. I can't think of any benefit to making the select form canonical.

It's good to be aware of how specific targets handle it, but it doesn't really matter that much given that we can just fix the target to pattern-match this kind of shufflevector.

I would lean towards making shufflevector canonical. It interacts better with existing transforms involving shuffles and vector insert/extract. I can't think of any benefit to making the select form canonical.

I don't have a strong opinion on what we do here in IR, but IMO, a bitwise or element-level vector select is a required feature of any vector ISA - although x86 took 5+ ISA revs to get it :) . The more general vector shuffle/permute is not required.

I filed PR28530 and PR28531 to make at least ARM and PPC aware of their existing behavior and the proposal:
https://llvm.org/bugs/show_bug.cgi?id=28530
https://llvm.org/bugs/show_bug.cgi?id=28531 (note the VMX tragedy - we eliminated the logic ops in IR, and they came back in codegen!)

The question isn't really what instructions we expect targets to have, but rather which version makes LLVM simpler overall. That said, if we assume almost every vector target has some sort of blend instruction, and therefore almost every target would need to pattern match the shuffle into a select, that would be a reason to prefer the select form, I guess.

The question isn't really what instructions we expect targets to have, but rather which version makes LLVM simpler overall. That said, if we assume almost every vector target has some sort of blend instruction, and therefore almost every target would need to pattern match the shuffle into a select, that would be a reason to prefer the select form, I guess.

Agree - that was the point I was trying to ineffectively make: if targets are going to prefer the select/blend anyway, we save some slight codegen churn from producing that form here in IR directly.

This revision was automatically updated to reflect the committed changes.