In D21740, we discussed trying to make this a more general matcher. However, I didn't see a clean way to handle the regular m_Not cases and these non-splat vector patterns, so I've just opted for the direct approach here. If there are other potential uses of areInverseVectorBitmasks(), we could move that to a higher level.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Do we care at all about the number of uses of the operands?
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1601 ↗ | (On Diff #63145) | What if one is undef? |
Good question. I was assuming one-use based on my motivating examples, but since we're potentially creating an xor here and bitcasts above this, we probably do need to check that.
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1601 ↗ | (On Diff #63145) | Ah, right. Or even both elts could be undef? |
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1601 ↗ | (On Diff #63145) | Sure. |
Patch updated:
- Added FIXME to handle undef elements for non-splat constant vectors.
- Added testcase for multiple-uses of the non-splat mask vectors.
- Ahead of this patch, limited the transform for multiple-use scenarios in:
http://reviews.llvm.org/rL274926
http://reviews.llvm.org/rL274932
LGTM.
test/Transforms/InstCombine/logical-select.ll | ||
---|---|---|
374 ↗ | (On Diff #63327) | Is a select actually the canonical form for this? It seems like we should prefer a shufflevector. |
test/Transforms/InstCombine/logical-select.ll | ||
---|---|---|
374 ↗ | (On Diff #63327) | I saw that x86 SSE or AVX didn't care, eg: vpblendw $60, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3,4,5],xmm0[6,7] ..so I figured one was as good as the other, but this is not true in general. AArch64 and PPC+VSX generate different code: define <4 x i32> @foo(<4 x i32> %a, <4 x i32> %b) { %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b ret <4 x i32> %sel } define <4 x i32> @goo(<4 x i32> %a, <4 x i32> %b) { %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3> ret <4 x i32> %shuf } $ ./llc shufsel.ll -o - -mtriple=aarch64 adrp x8, .LCPI0_0 ldr q2, [x8, :lo12:.LCPI0_0] bsl v2.16b, v0.16b, v1.16b mov v0.16b, v2.16b ret ext v1.16b, v0.16b, v1.16b, #12 ext v0.16b, v1.16b, v0.16b, #4 ext v1.16b, v1.16b, v1.16b, #8 ext v0.16b, v0.16b, v1.16b, #12 ret $ ./llc shufsel.ll -o - -mtriple=powerpc64 -mattr=vsx addis 3, 2, .LCPI0_0@toc@ha addi 3, 3, .LCPI0_0@toc@l lxvw4x 0, 0, 3 xxsel 34, 35, 34, 0 blr addis 3, 2, .LCPI1_0@toc@ha addi 3, 3, .LCPI1_0@toc@l lxvw4x 36, 0, 3 vperm 2, 2, 3, 4 blr I don't know AArch at all, but bsl seems better than 4 exts. Is there a better way than either of those? PPC xxsel vs. vperm seem equivalent, but I have no knowledge of any recent PPC HW. Does the current codegen affect what we do here in IR? |
I would lean towards making shufflevector canonical. It interacts better with existing transforms involving shuffles and vector insert/extract. I can't think of any benefit to making the select form canonical.
It's good to be aware of how specific targets handle it, but it doesn't really matter that much given that we can just fix the target to pattern-match this kind of shufflevector.
I don't have a strong opinion on what we do here in IR, but IMO, a bitwise or element-level vector select is a required feature of any vector ISA - although x86 took 5+ ISA revs to get it :) . The more general vector shuffle/permute is not required.
I filed PR28530 and PR28531 to make at least ARM and PPC aware of their existing behavior and the proposal:
https://llvm.org/bugs/show_bug.cgi?id=28530
https://llvm.org/bugs/show_bug.cgi?id=28531 (note the VMX tragedy - we eliminated the logic ops in IR, and they came back in codegen!)
The question isn't really what instructions we expect targets to have, but rather which version makes LLVM simpler overall. That said, if we assume almost every vector target has some sort of blend instruction, and therefore almost every target would need to pattern match the shuffle into a select, that would be a reason to prefer the select form, I guess.
Agree - that was the point I was trying to ineffectively make: if targets are going to prefer the select/blend anyway, we save some slight codegen churn from producing that form here in IR directly.