This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Remove redundant splats in InstCombineVectorOps
ClosedPublic

Authored by MattDevereau on Oct 13 2022, 7:27 AM.

Details

Summary

Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp

Diff Detail

Event Timeline

MattDevereau created this revision.Oct 13 2022, 7:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 7:27 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
MattDevereau requested review of this revision.Oct 13 2022, 7:27 AM
MattDevereau edited the summary of this revision. (Show Details)Oct 13 2022, 7:32 AM
MattDevereau added reviewers: spatel, RKSimon.

Do we need negative tests - maybe add a negative test for zero masks containing undefs (e.g. <4 x i32> <i32 0, i32 undef, i32 0, i32 0>)?

@spatel Any comments?

spatel requested changes to this revision.Oct 17 2022, 11:04 AM

If the motivating case always ends in a splat, could we do this in instcombine instead?

This is an improvement for the specific patterns where it is allowed, but it needs to account for possibly several other patterns where it is changing code in ways that the original transform did not anticipate. For example, this can currently miscompile (tested with aarch64 and x86 targets):

define <4 x i32> @insert_splat_sub(i32 %x, i32 %y) {
  %x111 = insertelement <4 x i32> <i32 1, i32 1, i32 1, i32 1>, i32 %x, i64 0
  %y567 = insertelement <4 x i32> <i32 4, i32 5, i32 6, i32 7>, i32 %y, i64 0
  %yyyy = shufflevector <4 x i32> %y567, <4 x i32> poison, <4 x i32> zeroinitializer
  %r = sub <4 x i32> %x111, %yyyy
  ret <4 x i32> %r
}
This revision now requires changes to proceed.Oct 17 2022, 11:04 AM
MattDevereau retitled this revision from [VectorCombine] Scalarize binops with insertelt nested inside splats to [Instcombine] Scalarize vscale splats of vectorized BinOps.
MattDevereau edited the summary of this revision. (Show Details)

Moved VectorCombine tests that are no longer necessary into InstCombine
Deleted all logic from VectorCombine, and have instead moved BinOp scalarizing logic into InstCombine. I've only seen this pattern emerge in InstCombine for scalable vectors, so this InstCombine is currently guarded by a scalable vector check.

Hi @spatel,
Thank you very much for the review, it's much appreciated.

I've moved this patch away from VectorCombine and into Instcombine instead.

For the miscompiling example you provided, this should no longer miscompile as I've gated the combine behind a check for scalable vectors. We've only seen the redundant inserts and splats appear for scalable vectors as late as Instcombine, which is why I originally focused on VectorCombine. The Instcombine looks backwards from shuffles now as well, which means the example you provided wont miscompile as it's no longer blindly throwing insertelements a level higher.

@RKSimon Given the move to InstCombine I think the combine is now specific enough not to warrant negative tests. Please let me know if you disagree,

nlopes added inline comments.Oct 19 2022, 8:59 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2631

Please use PoisonValue here if possible. We are trying to get rid of undef.
Thank you!

Moved VectorCombine tests that are no longer necessary into InstCombine
Deleted all logic from VectorCombine, and have instead moved BinOp scalarizing logic into InstCombine. I've only seen this pattern emerge in InstCombine for scalable vectors, so this InstCombine is currently guarded by a scalable vector check.

Aha - I was wondering if the motivating case was just a scalable transform that escaped.

I stepped through some of the examples, and I think we can add a more general fold that will work for both scalable and fixed.

Here's a test example:

define <4 x i8> @splat_binop_splat(<4 x i8> %x, <4 x i8> %y) {
  %xsplat = shufflevector <4 x i8> %x, <4 x i8> poison, <4 x i32> zeroinitializer
  ; call void @use(<4 x i8> %xsplat)
  %b = add <4 x i8> %xsplat, %y
  %bsplat = shufflevector <4 x i8> %b, <4 x i8> poison, <4 x i32> zeroinitializer
  ret <4 x i8> %bsplat
}

With fixed vectors, this is reduced via InstCombinerImpl::SimplifyDemandedVectorElts(): the final splat means we don't care about anything but the zero elements of x and y, so the splat of x is irrelevant. But InstCombinerImpl::SimplifyDemandedVectorElts() bails on scalable vectors, and I'm not sure what is needed to relax that limitation. There's also an earlier bail out on all shuffles of scalable vectors in InstCombinerImpl::visitShuffleVectorInst().

But this pattern won't fold even with fixed vectors if we uncomment the extra use of xsplat, so that seems like an opportunity to do better by just matching this set of patterns since it's probably fairly common.
So I think we want to do something like this for any vector:
splat (binop splat(X), Y) --> splat (binop X, Y)
splat (binop X, splat(Y)) --> splat (binop X, Y)
...and that should allow existing folds to reduce your examples to the ideal (scalarized binop) IR.

MattDevereau retitled this revision from [Instcombine] Scalarize vscale splats of vectorized BinOps to [InstCombine] Remove redundant splats in InstCombineVectorOps.
MattDevereau edited the summary of this revision. (Show Details)

I've made the combine more generic to only remove the redundant BinOp splats. The vscale tests added in previous revisions did not benefit from this change, and so I've removed them. The original vscale case where redundant splats weren't being removed has been resolved by this change, as VectorCombine recognizes it can change the vector BinOps to scalar BinOps once the shuffles have been removed and only InsertElt's remain in their place. I believe the newly added tests from the discussion on this review provide sufficient coverage.

spatel added inline comments.Oct 21 2022, 5:35 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2605–2606

Use ShuffleVectorInst::isZeroEltSplat() to shorten this?

We don't necessarily have to match a splat of the 0-element for this transform to work. We can deal with that variant as a follow-up patch to keep things simple for now, but we should eventually handle it.

2615–2619

I don't think this sequence works the way you are expecting. The 2nd match could overwrite the Y value that was set by the preceding statement.

2623

This will drop fast-math and other flags, but we should be able to safely propagate those? Try examples with Alive2 to verify that that is correct.

llvm/test/Transforms/InstCombine/vscale-insert-shuffle-binop.ll
4 ↗(On Diff #469326)

Remove the target specifier.

This is target-independent canonicalization, so we don't want target restrictions (and that would likely cause testing bots to fail).

6 ↗(On Diff #469326)

The fixed vector tests without an extra use are already folded, right? We should add extra uses to all fixed vector tests to show the improvement from this patch.

Once that's done, please pre-commit the baseline tests, so we'll just see the test diffs in this patch. You can post that as a separate Phab review if there are questions or rebase this patch on top of the preliminary test commit without posting it.

25 ↗(On Diff #469326)

To get a bit more value from this set of tests, I'd use a unique binop opcode in each test. That way, we know that we're handling non-commutative opcodes, FP opcodes, propagating no-wrap and fast-math flags, etc. as expected.

81 ↗(On Diff #469326)

We still need some negative tests to make sure the transform isn't going overboard:

  1. A non-splat shuffle of an input.
  2. A non-splat shuffle of the output.
  3. A splat that changes the input vector length?
MattDevereau planned changes to this revision.Oct 26 2022, 3:11 AM

I don't know if it changes anything for the motivating cases, but note that there are recent proposals to improve analysis of scalable vectors:
D136470
D136475

Precommited tests to llvm/test/Transforms/InstCombine/shuffle-binop.ll

Fixed up broken match logic in simplifyBinOpSplats.
Added a type equality comparision between BinOp operands for the case where a shuffle changes a vector's length
Added fast-math flag if the BinOP is an FPMathOperator
Added test splat_binop_non_splat_x to check non-splat inputs do not get simplified.
Added test non_splat_binop_splat_x to check non-splat outputs do not get simplified.
Added test splat_binop_splat_changes_x_length to check inputs that reduce their length do not get simplified.

spatel added inline comments.Oct 28 2022, 12:02 PM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2609

It's easier to spot a coding mistake if we don't initialize these; that's because the compiler can detect a bogus use of an uninitialized value and put up a warning.

2620

This naked cast<> isn't safe (although I'm not sure how to create a test case for it).

You probably want to do this instead:

Value *NewBO = Builder.CreateBinOp(cast<BinaryOperator>(Op0)->getOpcode(), X, Y);
if (auto *NewBOI = dyn_cast<Instruction>(NewBO))
  NewBOI->copyIRFlags(&I);

But we should have tests that include FMF, nsw, nuw etc to verify this is happening as expected.

llvm/test/Transforms/InstCombine/shuffle-binop.ll
75

Alter this test to use at least one poison/undef element in the mask for %bsplat, so we'll verify that the mask propagates as expected.

148

Oops - that's actually a miscompile. (You can confirm with Alive2 and a simpler fixed vector type.)

We can't do this transform in general with integer division or remainder opcodes because those have the potential for immediate UB.

Keep the tests, so we know we don't have this bug, but add tests with other opcodes (and IR flags) to positively show the transform.

The predicate needed to guard the transform is likely:

isSafeToSpeculativelyExecute(BinOp))
MattDevereau added inline comments.Oct 31 2022, 4:49 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2620

Why is this cast unsafe? I was under the impression casting to a BinaryOperator was ok here due to the earlier match against m_BinOp

spatel added inline comments.Oct 31 2022, 5:10 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2620

To be clear, this cast is ok because we matched Op0 with m_BinOp:

auto BinOp = cast<BinaryOperator>(Op0);

(but prefer auto * according to the coding standards)

This cast is not because the Builder can fold its result to a constant Value:

cast<BinaryOperator>(Builder.CreateBinOp(BinOp->getOpcode(), X, Y));

Added check for isSafeToSpeculativelyExecute
Added fast-math flags and nuw/nsw flags to tests to check flag propagation
Added dyn_cast check for binop to instruction cast

spatel added inline comments.Oct 31 2022, 7:15 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2624–2625

I prefer the form that I suggested (and was copied from nearby code):

if (auto *NewBOI = dyn_cast<Instruction>(NewBO))
  NewBOI->copyIRFlags(&I);

Ie, it's better to continue here rather than give up. If we've somehow encountered a constant-folding opportunity, then we'll create the final expected (shuffled) constant right here instead of leaving it to some other transforms.

(And again, I'm not sure how to write a regression test for this.)

2628–2629

I'd prefer to use the raw creator for the final instruction:

return new ShuffleVectorInst(NewBOI, SVI.getShuffleMask());

This gives a cosmetic/continuity improvement to the text output because InstCombine will transfer the name of the existing value to the new value if we use this code pattern.

MattDevereau added inline comments.Oct 31 2022, 7:45 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2624–2625

Ah, sorry, I missed the fact that the flags would still propagate here when called from NewBOI. I'll update this with your suggestion.
I'm assuming that NewBOI->copyIRFlags(&I); here should be NewBOI->copyIRFlags(NewBO);? &I does not exist in this scope, and if you meant &SVI this would not copy the IR flags on the binops.

spatel added inline comments.Oct 31 2022, 7:52 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2624–2625

Sorry - that was a copy/paste leftover. We want to transfer flags from the existing binop, so the current code in the patch is right about that part.

My suggestion should have been:

if (auto *NewBOI = dyn_cast<Instruction>(NewBO))
  NewBOI->copyIRFlags(BinOp);

Don't bail on the added dyn_cast.
Use return new ShuffleVectorInst instead of replaceInstUsesWith

spatel accepted this revision.Oct 31 2022, 8:13 AM

LGTM

This revision is now accepted and ready to land.Oct 31 2022, 8:13 AM

Will merge after 24h if there are no other comments. Thanks a lot for your help @spatel.

dyung added a subscriber: dyung.Nov 3 2022, 12:22 AM

Hi @MattDevereau, one of our internal tests started to fail and I bisected the failure back to this change. I have filed issue 58777 for the issue. Can you take a look?

Hi @dyung, thanks for the report and reproducer. Matt's away today so I've applied a revert in e1790c8c290d773cd5b1fc79f80b7a23dceb7589.

RKSimon reopened this revision.Nov 3 2022, 1:21 AM
This revision is now accepted and ready to land.Nov 3 2022, 1:21 AM
RKSimon requested changes to this revision.Nov 3 2022, 1:22 AM
This revision now requires changes to proceed.Nov 3 2022, 1:22 AM
peterwaller-arm added inline comments.Nov 3 2022, 3:54 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2605

I'm seeing instcombine fire on this node:

IC: Visiting:   %shuffle = shufflevector <2 x double> %sub, <2 x double> %sub, <2 x i32> <i32 2, i32 2>

Where I understand the intent is to only match the first element. The combine then replaces the input with:

shufflevector <2 x double> %3, <2 x double> poison, <2 x i32> <i32 2, i32 2>

... which is selecting elements from poison.

The reason the combine is firing is that isZeroEltSplat will return true if the mask is selecting the zeroth lane of any input operand.

spatel added inline comments.Nov 3 2022, 5:21 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2605

Normally, we wouldn't have to consider that case because it gets canonicalized (see around line 2696), but this transform can happen before shuffle mask canonicalization, so we'll need to check for a real zero mask.

That check was included in an earlier draft, but I suggested isZeroEltSplat as a shortcut - oops.

The header comments for that function clearly state that it works for the 0-elt of the 2nd shuffle operand, but I don't remember why I defined it that way...

Reverted the addition of SVI.isZeroEltSplat() to a more primitive and explicit check. This check had the unintended side effect of accepting splats from the first element of the second splat operand, which caused incorrectness as poison elements were being splatted.
Added test shuffle_op2_0th_element_mask

Hi @dyung, thanks for spotting this and drawing it to our attention. I've updated the revision and have verified that the test given here outputs -52509.586960 -52510.392997 with this change. Thank you.

spatel accepted this revision.Nov 7 2022, 6:09 AM
spatel added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2638–2640
RKSimon accepted this revision.Nov 7 2022, 6:13 AM

LGTM to unblock

This revision is now accepted and ready to land.Nov 7 2022, 6:13 AM
MattDevereau added inline comments.Nov 7 2022, 6:15 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2638–2640

I forgot to remove this when doing some local changes. I'll remove this on push.

This revision was landed with ongoing or failed builds.Nov 7 2022, 7:47 AM
This revision was automatically updated to reflect the committed changes.
llvm/test/Transforms/InstCombine/shuffle-binop.ll