Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Do we need negative tests - maybe add a negative test for zero masks containing undefs (e.g. <4 x i32> <i32 0, i32 undef, i32 0, i32 0>)?
@spatel Any comments?
If the motivating case always ends in a splat, could we do this in instcombine instead?
This is an improvement for the specific patterns where it is allowed, but it needs to account for possibly several other patterns where it is changing code in ways that the original transform did not anticipate. For example, this can currently miscompile (tested with aarch64 and x86 targets):
define <4 x i32> @insert_splat_sub(i32 %x, i32 %y) {
  %x111 = insertelement <4 x i32> <i32 1, i32 1, i32 1, i32 1>, i32 %x, i64 0
  %y567 = insertelement <4 x i32> <i32 4, i32 5, i32 6, i32 7>, i32 %y, i64 0
  %yyyy = shufflevector <4 x i32> %y567, <4 x i32> poison, <4 x i32> zeroinitializer
  %r = sub <4 x i32> %x111, %yyyy
  ret <4 x i32> %r
}Moved VectorCombine tests that are no longer necessary into InstCombine
Deleted all logic from VectorCombine, and have instead moved BinOp scalarizing logic into InstCombine. I've only seen this pattern emerge in InstCombine for scalable vectors, so this InstCombine is currently guarded by a scalable vector check.
Hi @spatel,
Thank you very much for the review, it's much appreciated.
I've moved this patch away from VectorCombine and into Instcombine instead.
For the miscompiling example you provided, this should no longer miscompile as I've gated the combine behind a check for scalable vectors. We've only seen the redundant inserts and splats appear for scalable vectors as late as Instcombine, which is why I originally focused on VectorCombine. The Instcombine looks backwards from shuffles now as well, which means the example you provided wont miscompile as it's no longer blindly throwing insertelements a level higher.
@RKSimon Given the move to InstCombine I think the combine is now specific enough not to warrant negative tests. Please let me know if you disagree,
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2631 | Please use PoisonValue here if possible. We are trying to get rid of undef. | |
Aha - I was wondering if the motivating case was just a scalable transform that escaped.
I stepped through some of the examples, and I think we can add a more general fold that will work for both scalable and fixed.
Here's a test example:
define <4 x i8> @splat_binop_splat(<4 x i8> %x, <4 x i8> %y) {
  %xsplat = shufflevector <4 x i8> %x, <4 x i8> poison, <4 x i32> zeroinitializer
  ; call void @use(<4 x i8> %xsplat)
  %b = add <4 x i8> %xsplat, %y
  %bsplat = shufflevector <4 x i8> %b, <4 x i8> poison, <4 x i32> zeroinitializer
  ret <4 x i8> %bsplat
}With fixed vectors, this is reduced via InstCombinerImpl::SimplifyDemandedVectorElts(): the final splat means we don't care about anything but the zero elements of x and y, so the splat of x is irrelevant. But InstCombinerImpl::SimplifyDemandedVectorElts() bails on scalable vectors, and I'm not sure what is needed to relax that limitation. There's also an earlier bail out on all shuffles of scalable vectors in InstCombinerImpl::visitShuffleVectorInst().
But this pattern won't fold even with fixed vectors if we uncomment the extra use of xsplat, so that seems like an opportunity to do better by just matching this set of patterns since it's probably fairly common.
So I think we want to do something like this for any vector:
splat (binop splat(X), Y) --> splat (binop X, Y)
splat (binop X, splat(Y)) --> splat (binop X, Y)
...and that should allow existing folds to reduce your examples to the ideal (scalarized binop) IR.
I've made the combine more generic to only remove the redundant BinOp splats. The vscale tests added in previous revisions did not benefit from this change, and so I've removed them. The original vscale case where redundant splats weren't being removed has been resolved by this change, as VectorCombine recognizes it can change the vector BinOps to scalar BinOps once the shuffles have been removed and only InsertElt's remain in their place. I believe the newly added tests from the discussion on this review provide sufficient coverage.
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2605–2606 | Use ShuffleVectorInst::isZeroEltSplat() to shorten this? We don't necessarily have to match a splat of the 0-element for this transform to work. We can deal with that variant as a follow-up patch to keep things simple for now, but we should eventually handle it. | |
| 2615–2619 | I don't think this sequence works the way you are expecting. The 2nd match could overwrite the Y value that was set by the preceding statement. | |
| 2623 | This will drop fast-math and other flags, but we should be able to safely propagate those? Try examples with Alive2 to verify that that is correct. | |
| llvm/test/Transforms/InstCombine/vscale-insert-shuffle-binop.ll | ||
| 4 ↗ | (On Diff #469326) | Remove the target specifier. This is target-independent canonicalization, so we don't want target restrictions (and that would likely cause testing bots to fail). | 
| 6 ↗ | (On Diff #469326) | The fixed vector tests without an extra use are already folded, right? We should add extra uses to all fixed vector tests to show the improvement from this patch. Once that's done, please pre-commit the baseline tests, so we'll just see the test diffs in this patch. You can post that as a separate Phab review if there are questions or rebase this patch on top of the preliminary test commit without posting it. | 
| 25 ↗ | (On Diff #469326) | To get a bit more value from this set of tests, I'd use a unique binop opcode in each test. That way, we know that we're handling non-commutative opcodes, FP opcodes, propagating no-wrap and fast-math flags, etc. as expected. | 
| 81 ↗ | (On Diff #469326) | We still need some negative tests to make sure the transform isn't going overboard: 
 | 
Precommited tests to llvm/test/Transforms/InstCombine/shuffle-binop.ll
Fixed up broken match logic in simplifyBinOpSplats.
Added a type equality comparision between BinOp operands for the case where a shuffle changes a vector's length
Added fast-math flag if the BinOP is an FPMathOperator
Added test splat_binop_non_splat_x to check non-splat inputs do not get simplified.
Added test non_splat_binop_splat_x to check non-splat outputs do not get simplified.
Added test splat_binop_splat_changes_x_length to check inputs that reduce their length do not get simplified.
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2609 | It's easier to spot a coding mistake if we don't initialize these; that's because the compiler can detect a bogus use of an uninitialized value and put up a warning. | |
| 2620 | This naked cast<> isn't safe (although I'm not sure how to create a test case for it). You probably want to do this instead: Value *NewBO = Builder.CreateBinOp(cast<BinaryOperator>(Op0)->getOpcode(), X, Y); if (auto *NewBOI = dyn_cast<Instruction>(NewBO)) NewBOI->copyIRFlags(&I); But we should have tests that include FMF, nsw, nuw etc to verify this is happening as expected. | |
| llvm/test/Transforms/InstCombine/shuffle-binop.ll | ||
| 75 | Alter this test to use at least one poison/undef element in the mask for %bsplat, so we'll verify that the mask propagates as expected. | |
| 148 | Oops - that's actually a miscompile. (You can confirm with Alive2 and a simpler fixed vector type.) We can't do this transform in general with integer division or remainder opcodes because those have the potential for immediate UB. Keep the tests, so we know we don't have this bug, but add tests with other opcodes (and IR flags) to positively show the transform. The predicate needed to guard the transform is likely: isSafeToSpeculativelyExecute(BinOp)) | |
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2620 | Why is this cast unsafe? I was under the impression casting to a BinaryOperator was ok here due to the earlier match against m_BinOp | |
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2620 | To be clear, this cast is ok because we matched Op0 with m_BinOp: auto BinOp = cast<BinaryOperator>(Op0); (but prefer auto * according to the coding standards) This cast is not because the Builder can fold its result to a constant Value: cast<BinaryOperator>(Builder.CreateBinOp(BinOp->getOpcode(), X, Y)); | |
Added check for isSafeToSpeculativelyExecute
Added fast-math flags and nuw/nsw flags to tests to check flag propagation
Added dyn_cast check for binop to instruction cast
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2624–2625 | I prefer the form that I suggested (and was copied from nearby code): if (auto *NewBOI = dyn_cast<Instruction>(NewBO)) NewBOI->copyIRFlags(&I); Ie, it's better to continue here rather than give up. If we've somehow encountered a constant-folding opportunity, then we'll create the final expected (shuffled) constant right here instead of leaving it to some other transforms. (And again, I'm not sure how to write a regression test for this.) | |
| 2628–2629 | I'd prefer to use the raw creator for the final instruction: return new ShuffleVectorInst(NewBOI, SVI.getShuffleMask()); This gives a cosmetic/continuity improvement to the text output because InstCombine will transfer the name of the existing value to the new value if we use this code pattern. | |
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2624–2625 | Ah, sorry, I missed the fact that the flags would still propagate here when called from NewBOI. I'll update this with your suggestion. | |
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2624–2625 | Sorry - that was a copy/paste leftover. We want to transfer flags from the existing binop, so the current code in the patch is right about that part. My suggestion should have been: if (auto *NewBOI = dyn_cast<Instruction>(NewBO)) NewBOI->copyIRFlags(BinOp); | |
Don't bail on the added dyn_cast.
Use return new ShuffleVectorInst instead of replaceInstUsesWith
Will merge after 24h if there are no other comments. Thanks a lot for your help @spatel.
Hi @MattDevereau, one of our internal tests started to fail and I bisected the failure back to this change. I have filed issue 58777 for the issue. Can you take a look?
Hi @dyung, thanks for the report and reproducer. Matt's away today so I've applied a revert in e1790c8c290d773cd5b1fc79f80b7a23dceb7589.
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2605 | I'm seeing instcombine fire on this node: IC: Visiting: %shuffle = shufflevector <2 x double> %sub, <2 x double> %sub, <2 x i32> <i32 2, i32 2> Where I understand the intent is to only match the first element. The combine then replaces the input with: shufflevector <2 x double> %3, <2 x double> poison, <2 x i32> <i32 2, i32 2> ... which is selecting elements from poison. The reason the combine is firing is that isZeroEltSplat will return true if the mask is selecting the zeroth lane of any input operand. | |
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2605 | Normally, we wouldn't have to consider that case because it gets canonicalized (see around line 2696), but this transform can happen before shuffle mask canonicalization, so we'll need to check for a real zero mask. That check was included in an earlier draft, but I suggested isZeroEltSplat as a shortcut - oops. The header comments for that function clearly state that it works for the 0-elt of the 2nd shuffle operand, but I don't remember why I defined it that way... | |
Reverted the addition of SVI.isZeroEltSplat() to a more primitive and explicit check. This check had the unintended side effect of accepting splats from the first element of the second splat operand, which caused incorrectness as poison elements were being splatted.
Added test shuffle_op2_0th_element_mask
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2638–2641 | LLVM generally omits brackets on a one-line if body: | |
| llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
|---|---|---|
| 2638–2641 | I forgot to remove this when doing some local changes. I'll remove this on push. | |
I'm seeing instcombine fire on this node:
Where I understand the intent is to only match the first element. The combine then replaces the input with:
... which is selecting elements from poison.
The reason the combine is firing is that isZeroEltSplat will return true if the mask is selecting the zeroth lane of any input operand.