This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/Vectorize/
-
Transforms/
-
Vectorize/
-
VectorCombine.cpp
-
test/Transforms/VectorCombine/AArch64/
-
Transforms/
-
VectorCombine/
-
AArch64/
-
insert-shuffle-binop.ll

Differential D135876

[InstCombine] Remove redundant splats in InstCombineVectorOps
ClosedPublic

Authored by MattDevereau on Oct 13 2022, 7:27 AM.

Download Raw Diff

Details

Reviewers

peterwaller-arm
c-rhodes
dtemirbulatov
paulwalker-arm
david-arm
benmxwl-arm
spatel
RKSimon

Commits

rGa8c24d57b817: [InstCombine] Remove redundant splats in InstCombineVectorOps
rG957eed0b1af2: [InstCombine] Remove redundant splats in InstCombineVectorOps

Summary

Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MattDevereau created this revision.Oct 13 2022, 7:27 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 7:27 AM

Herald added a subscriber: hiraditya. · View Herald Transcript

MattDevereau requested review of this revision.Oct 13 2022, 7:27 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 7:27 AM

Herald added subscribers: llvm-commits, • pcwang-thead. · View Herald Transcript

MattDevereau edited the summary of this revision. (Show Details)Oct 13 2022, 7:32 AM

MattDevereau added reviewers: spatel, RKSimon.

MattDevereau updated this revision to Diff 467473.Oct 13 2022, 8:00 AM

Harbormaster completed remote builds in B191965: Diff 467473.Oct 13 2022, 8:48 AM

georges added a subscriber: georges.Oct 15 2022, 8:33 AM

Do we need negative tests - maybe add a negative test for zero masks containing undefs (e.g. <4 x i32> <i32 0, i32 undef, i32 0, i32 0>)?

@spatel Any comments?

If the motivating case always ends in a splat, could we do this in instcombine instead?

This is an improvement for the specific patterns where it is allowed, but it needs to account for possibly several other patterns where it is changing code in ways that the original transform did not anticipate. For example, this can currently miscompile (tested with aarch64 and x86 targets):

define <4 x i32> @insert_splat_sub(i32 %x, i32 %y) {
  %x111 = insertelement <4 x i32> <i32 1, i32 1, i32 1, i32 1>, i32 %x, i64 0
  %y567 = insertelement <4 x i32> <i32 4, i32 5, i32 6, i32 7>, i32 %y, i64 0
  %yyyy = shufflevector <4 x i32> %y567, <4 x i32> poison, <4 x i32> zeroinitializer
  %r = sub <4 x i32> %x111, %yyyy
  ret <4 x i32> %r
}

This revision now requires changes to proceed.Oct 17 2022, 11:04 AM

Moved VectorCombine tests that are no longer necessary into InstCombine
Deleted all logic from VectorCombine, and have instead moved BinOp scalarizing logic into InstCombine. I've only seen this pattern emerge in InstCombine for scalable vectors, so this InstCombine is currently guarded by a scalable vector check.

Herald added a subscriber: nlopes. · View Herald TranscriptOct 19 2022, 8:15 AM

Hi @spatel,
Thank you very much for the review, it's much appreciated.

I've moved this patch away from VectorCombine and into Instcombine instead.

For the miscompiling example you provided, this should no longer miscompile as I've gated the combine behind a check for scalable vectors. We've only seen the redundant inserts and splats appear for scalable vectors as late as Instcombine, which is why I originally focused on VectorCombine. The Instcombine looks backwards from shuffles now as well, which means the example you provided wont miscompile as it's no longer blindly throwing insertelements a level higher.

@RKSimon Given the move to InstCombine I think the combine is now specific enough not to warrant negative tests. Please let me know if you disagree,

nlopes added inline comments.Oct 19 2022, 8:59 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2633 ↗	(On Diff #468921)	Please use PoisonValue here if possible. We are trying to get rid of undef. Thank you!

Harbormaster completed remote builds in B193010: Diff 468921.Oct 19 2022, 9:10 AM

In D135876#3868467, @MattDevereau wrote:

Moved VectorCombine tests that are no longer necessary into InstCombine
Deleted all logic from VectorCombine, and have instead moved BinOp scalarizing logic into InstCombine. I've only seen this pattern emerge in InstCombine for scalable vectors, so this InstCombine is currently guarded by a scalable vector check.

Aha - I was wondering if the motivating case was just a scalable transform that escaped.

I stepped through some of the examples, and I think we can add a more general fold that will work for both scalable and fixed.

Here's a test example:

define <4 x i8> @splat_binop_splat(<4 x i8> %x, <4 x i8> %y) {
  %xsplat = shufflevector <4 x i8> %x, <4 x i8> poison, <4 x i32> zeroinitializer
  ; call void @use(<4 x i8> %xsplat)
  %b = add <4 x i8> %xsplat, %y
  %bsplat = shufflevector <4 x i8> %b, <4 x i8> poison, <4 x i32> zeroinitializer
  ret <4 x i8> %bsplat
}

With fixed vectors, this is reduced via InstCombinerImpl::SimplifyDemandedVectorElts(): the final splat means we don't care about anything but the zero elements of x and y, so the splat of x is irrelevant. But InstCombinerImpl::SimplifyDemandedVectorElts() bails on scalable vectors, and I'm not sure what is needed to relax that limitation. There's also an earlier bail out on all shuffles of scalable vectors in InstCombinerImpl::visitShuffleVectorInst().

But this pattern won't fold even with fixed vectors if we uncomment the extra use of xsplat, so that seems like an opportunity to do better by just matching this set of patterns since it's probably fairly common.
So I think we want to do something like this for any vector:
splat (binop splat(X), Y) --> splat (binop X, Y)
splat (binop X, splat(Y)) --> splat (binop X, Y)
...and that should allow existing folds to reduce your examples to the ideal (scalarized binop) IR.

I've made the combine more generic to only remove the redundant BinOp splats. The vscale tests added in previous revisions did not benefit from this change, and so I've removed them. The original vscale case where redundant splats weren't being removed has been resolved by this change, as VectorCombine recognizes it can change the vector BinOps to scalar BinOps once the shuffles have been removed and only InsertElt's remain in their place. I believe the newly added tests from the discussion on this review provide sufficient coverage.

Harbormaster completed remote builds in B193303: Diff 469326.Oct 20 2022, 3:29 PM

spatel added inline comments.Oct 21 2022, 5:35 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2607–2608 ↗	(On Diff #469326)	Use ShuffleVectorInst::isZeroEltSplat() to shorten this? We don't necessarily have to match a splat of the 0-element for this transform to work. We can deal with that variant as a follow-up patch to keep things simple for now, but we should eventually handle it.
2617–2621 ↗	(On Diff #469326)	I don't think this sequence works the way you are expecting. The 2nd match could overwrite the Y value that was set by the preceding statement.
2625 ↗	(On Diff #469326)	This will drop fast-math and other flags, but we should be able to safely propagate those? Try examples with Alive2 to verify that that is correct.
llvm/test/Transforms/InstCombine/vscale-insert-shuffle-binop.ll
4 ↗	(On Diff #469326)	Remove the target specifier. This is target-independent canonicalization, so we don't want target restrictions (and that would likely cause testing bots to fail).
6 ↗	(On Diff #469326)	The fixed vector tests without an extra use are already folded, right? We should add extra uses to all fixed vector tests to show the improvement from this patch. Once that's done, please pre-commit the baseline tests, so we'll just see the test diffs in this patch. You can post that as a separate Phab review if there are questions or rebase this patch on top of the preliminary test commit without posting it.
25 ↗	(On Diff #469326)	To get a bit more value from this set of tests, I'd use a unique binop opcode in each test. That way, we know that we're handling non-commutative opcodes, FP opcodes, propagating no-wrap and fast-math flags, etc. as expected.
81 ↗	(On Diff #469326)	We still need some negative tests to make sure the transform isn't going overboard: A non-splat shuffle of an input. A non-splat shuffle of the output. A splat that changes the input vector length?

MattDevereau planned changes to this revision.Oct 26 2022, 3:11 AM

I don't know if it changes anything for the motivating cases, but note that there are recent proposals to improve analysis of scalable vectors:
D136470
D136475

MattDevereau mentioned this in rGc1b4d7f09127: [InstCombine] Add shuffle-binop tests.Oct 28 2022, 9:02 AM

Precommited tests to llvm/test/Transforms/InstCombine/shuffle-binop.ll

Fixed up broken match logic in simplifyBinOpSplats.
Added a type equality comparision between BinOp operands for the case where a shuffle changes a vector's length
Added fast-math flag if the BinOP is an FPMathOperator
Added test splat_binop_non_splat_x to check non-splat inputs do not get simplified.
Added test non_splat_binop_splat_x to check non-splat outputs do not get simplified.
Added test splat_binop_splat_changes_x_length to check inputs that reduce their length do not get simplified.

Harbormaster completed remote builds in B194958: Diff 471576.Oct 28 2022, 9:56 AM

spatel added inline comments.Oct 28 2022, 12:02 PM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2609 ↗	(On Diff #471576)	It's easier to spot a coding mistake if we don't initialize these; that's because the compiler can detect a bogus use of an uninitialized value and put up a warning.
2620 ↗	(On Diff #471576)	This naked cast<> isn't safe (although I'm not sure how to create a test case for it). You probably want to do this instead: Value NewBO = Builder.CreateBinOp(cast<BinaryOperator>(Op0)->getOpcode(), X, Y); if (auto NewBOI = dyn_cast<Instruction>(NewBO)) NewBOI->copyIRFlags(&I); But we should have tests that include FMF, nsw, nuw etc to verify this is happening as expected.
llvm/test/Transforms/InstCombine/shuffle-binop.ll
75 ↗	(On Diff #471576)	Alter this test to use at least one poison/undef element in the mask for %bsplat, so we'll verify that the mask propagates as expected.
112 ↗	(On Diff #471576)	Oops - that's actually a miscompile. (You can confirm with Alive2 and a simpler fixed vector type.) We can't do this transform in general with integer division or remainder opcodes because those have the potential for immediate UB. Keep the tests, so we know we don't have this bug, but add tests with other opcodes (and IR flags) to positively show the transform. The predicate needed to guard the transform is likely: isSafeToSpeculativelyExecute(BinOp))

MattDevereau added inline comments.Oct 31 2022, 4:49 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2620 ↗	(On Diff #471576)	Why is this cast unsafe? I was under the impression casting to a BinaryOperator was ok here due to the earlier match against m_BinOp

spatel added inline comments.Oct 31 2022, 5:10 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2620 ↗	(On Diff #471576)	To be clear, this cast is ok because we matched Op0 with m_BinOp: auto BinOp = cast<BinaryOperator>(Op0); (but prefer `auto *` according to the coding standards) This cast is not because the Builder can fold its result to a constant Value: cast<BinaryOperator>(Builder.CreateBinOp(BinOp->getOpcode(), X, Y));

Added check for isSafeToSpeculativelyExecute
Added fast-math flags and nuw/nsw flags to tests to check flag propagation
Added dyn_cast check for binop to instruction cast

spatel added inline comments.Oct 31 2022, 7:15 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2624–2625 ↗	(On Diff #471985)	I prefer the form that I suggested (and was copied from nearby code): if (auto *NewBOI = dyn_cast<Instruction>(NewBO)) NewBOI->copyIRFlags(&I); Ie, it's better to continue here rather than give up. If we've somehow encountered a constant-folding opportunity, then we'll create the final expected (shuffled) constant right here instead of leaving it to some other transforms. (And again, I'm not sure how to write a regression test for this.)
2628–2629 ↗	(On Diff #471985)	I'd prefer to use the raw creator for the final instruction: return new ShuffleVectorInst(NewBOI, SVI.getShuffleMask()); This gives a cosmetic/continuity improvement to the text output because InstCombine will transfer the name of the existing value to the new value if we use this code pattern.

MattDevereau added inline comments.Oct 31 2022, 7:45 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2624–2625 ↗	(On Diff #471985)	Ah, sorry, I missed the fact that the flags would still propagate here when called from NewBOI. I'll update this with your suggestion. I'm assuming that `NewBOI->copyIRFlags(&I);` here should be `NewBOI->copyIRFlags(NewBO);`? `&I` does not exist in this scope, and if you meant `&SVI` this would not copy the IR flags on the binops.

spatel added inline comments.Oct 31 2022, 7:52 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2624–2625 ↗	(On Diff #471985)	Sorry - that was a copy/paste leftover. We want to transfer flags from the existing binop, so the current code in the patch is right about that part. My suggestion should have been: if (auto *NewBOI = dyn_cast<Instruction>(NewBO)) NewBOI->copyIRFlags(BinOp);

Harbormaster completed remote builds in B195251: Diff 471985.Oct 31 2022, 7:56 AM

Don't bail on the added dyn_cast.
Use return new ShuffleVectorInst instead of replaceInstUsesWith

LGTM

This revision is now accepted and ready to land.Oct 31 2022, 8:13 AM

Will merge after 24h if there are no other comments. Thanks a lot for your help @spatel.

Harbormaster completed remote builds in B195267: Diff 472010.Oct 31 2022, 8:51 AM

Closed by commit rG957eed0b1af2: [InstCombine] Remove redundant splats in InstCombineVectorOps (authored by MattDevereau). · Explain WhyNov 2 2022, 4:57 AM

This revision was automatically updated to reflect the committed changes.

MattDevereau added a commit: rG957eed0b1af2: [InstCombine] Remove redundant splats in InstCombineVectorOps.

Hi @MattDevereau, one of our internal tests started to fail and I bisected the failure back to this change. I have filed issue 58777 for the issue. Can you take a look?

Hi @dyung, thanks for the report and reproducer. Matt's away today so I've applied a revert in e1790c8c290d773cd5b1fc79f80b7a23dceb7589.

peterwaller-arm added a reverting change: rGe1790c8c290d: Revert "[InstCombine] Remove redundant splats in InstCombineVectorOps".Nov 3 2022, 1:01 AM

RKSimon reopened this revision.Nov 3 2022, 1:21 AM

This revision is now accepted and ready to land.Nov 3 2022, 1:21 AM

RKSimon requested changes to this revision.Nov 3 2022, 1:22 AM

This revision now requires changes to proceed.Nov 3 2022, 1:22 AM

peterwaller-arm added inline comments.Nov 3 2022, 3:54 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2605 ↗	(On Diff #472579)	I'm seeing instcombine fire on this node: IC: Visiting: %shuffle = shufflevector <2 x double> %sub, <2 x double> %sub, <2 x i32> <i32 2, i32 2> Where I understand the intent is to only match the first element. The combine then replaces the input with: shufflevector <2 x double> %3, <2 x double> poison, <2 x i32> <i32 2, i32 2> ... which is selecting elements from poison. The reason the combine is firing is that `isZeroEltSplat` will return true if the mask is selecting the zeroth lane of any input operand.

spatel added inline comments.Nov 3 2022, 5:21 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2605 ↗	(On Diff #472579)	Normally, we wouldn't have to consider that case because it gets canonicalized (see around line 2696), but this transform can happen before shuffle mask canonicalization, so we'll need to check for a real zero mask. That check was included in an earlier draft, but I suggested `isZeroEltSplat` as a shortcut - oops. The header comments for that function clearly state that it works for the 0-elt of the 2nd shuffle operand, but I don't remember why I defined it that way...

Reverted the addition of SVI.isZeroEltSplat() to a more primitive and explicit check. This check had the unintended side effect of accepting splats from the first element of the second splat operand, which caused incorrectness as poison elements were being splatted.
Added test shuffle_op2_0th_element_mask

Hi @dyung, thanks for spotting this and drawing it to our attention. I've updated the revision and have verified that the test given here outputs -52509.586960 -52510.392997 with this change. Thank you.

spatel accepted this revision.Nov 7 2022, 6:09 AM

spatel added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2638–2640 ↗	(On Diff #473639)	LLVM generally omits brackets on a one-line `if` body: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

LGTM to unblock

This revision is now accepted and ready to land.Nov 7 2022, 6:13 AM

MattDevereau added inline comments.Nov 7 2022, 6:15 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2638–2640 ↗	(On Diff #473639)	I forgot to remove this when doing some local changes. I'll remove this on push.

Harbormaster completed remote builds in B196463: Diff 473639.Nov 7 2022, 7:31 AM

This revision was landed with ongoing or failed builds.Nov 7 2022, 7:47 AM

Closed by commit rGa8c24d57b817: [InstCombine] Remove redundant splats in InstCombineVectorOps (authored by MattDevereau). · Explain Why

This revision was automatically updated to reflect the committed changes.

MattDevereau added a commit: rGa8c24d57b817: [InstCombine] Remove redundant splats in InstCombineVectorOps.

MattDevereau mentioned this in rG87debdadaf18: [VectorCombine] check instruction type before dispatching to folds.Nov 22 2022, 8:58 AM

spatel mentioned this in rG40d772c642f7: [InstCombine] add one-use check to prevent creating an instruction in shuffle….Feb 22 2023, 4:24 PM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Vectorize/

VectorCombine.cpp

6 lines

test/

Transforms/

VectorCombine/

AArch64/

insert-shuffle-binop.ll

76 lines

Diff 467463

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

Context not available.
	if (match(U, m_Select(m_Specific(&I), m_Value(), m_Value())))	if (match(U, m_Select(m_Specific(&I), m_Value(), m_Value())))
	return false;	return false;

		Value *ShuffleOp0;
		if (match(Ins0, m_Shuffle(m_Value(ShuffleOp0), m_Undef(), m_ZeroMask())))
		Ins0 = ShuffleOp0;
		if (match(Ins1, m_Shuffle(m_Value(ShuffleOp0), m_Undef(), m_ZeroMask())))
		Ins1 = ShuffleOp0;

	// Match against one or both scalar values being inserted into constant	// Match against one or both scalar values being inserted into constant
	// vectors:	// vectors:
	// vec_op VecC0, (inselt VecC1, V1, Index)	// vec_op VecC0, (inselt VecC1, V1, Index)
Context not available.

llvm/test/Transforms/VectorCombine/AArch64/insert-shuffle-binop.ll

Context not available.

	define <vscale x 4 x float> @fadd_vscale_insertelt_a_shuffle_insert_b(float %0, float %1) {	define <vscale x 4 x float> @fadd_vscale_insertelt_a_shuffle_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fadd_vscale_insertelt_a_shuffle_insert_b(	; CHECK-LABEL: @fadd_vscale_insertelt_a_shuffle_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fadd fast float [[TMP1:%.]], [[TMP0:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x float> [[BROADCAST_SPLATINSERT]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <vscale x 4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[R:%.*]] = fadd fast <vscale x 4 x float> [[BROADCAST_SPLATINSERT2]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]
	;	;
Context not available.

	define <4 x float> @fadd_fixed_insertelt_a_shuffle_insert_b(float %0, float %1) {	define <4 x float> @fadd_fixed_insertelt_a_shuffle_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fadd_fixed_insertelt_a_shuffle_insert_b(	; CHECK-LABEL: @fadd_fixed_insertelt_a_shuffle_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fadd fast float [[TMP1:%.]], [[TMP0:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[R:%.*]] = fadd fast <4 x float> [[BROADCAST_SPLATINSERT2]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: ret <4 x float> [[TMP3]]	; CHECK-NEXT: ret <4 x float> [[TMP3]]
	;	;
Context not available.

	define <vscale x 4 x float> @fsub_vscale_insertelt_a_shuffle_insert_b(float %0, float %1) {	define <vscale x 4 x float> @fsub_vscale_insertelt_a_shuffle_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fsub_vscale_insertelt_a_shuffle_insert_b(	; CHECK-LABEL: @fsub_vscale_insertelt_a_shuffle_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fsub fast float [[TMP1:%.]], [[TMP0:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x float> [[BROADCAST_SPLATINSERT]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <vscale x 4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[R:%.*]] = fsub fast <vscale x 4 x float> [[BROADCAST_SPLATINSERT2]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]
	;	;
Context not available.

	define <4 x float> @fsub_fixed_insertelt_a_shuffle_insert_b(float %0, float %1) {	define <4 x float> @fsub_fixed_insertelt_a_shuffle_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fsub_fixed_insertelt_a_shuffle_insert_b(	; CHECK-LABEL: @fsub_fixed_insertelt_a_shuffle_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fsub fast float [[TMP1:%.]], [[TMP0:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[R:%.*]] = fsub fast <4 x float> [[BROADCAST_SPLATINSERT2]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: ret <4 x float> [[TMP3]]	; CHECK-NEXT: ret <4 x float> [[TMP3]]
	;	;
Context not available.

	define <vscale x 4 x float> @fadd_vscale_shuffle_insert_a_insert_b(float %0, float %1) {	define <vscale x 4 x float> @fadd_vscale_shuffle_insert_a_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fadd_vscale_shuffle_insert_a_insert_b(	; CHECK-LABEL: @fadd_vscale_shuffle_insert_a_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fadd fast float [[TMP0:%.]], [[TMP1:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x float> [[BROADCAST_SPLATINSERT]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <vscale x 4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[R:%.*]] = fadd fast <vscale x 4 x float> [[BROADCAST_SPLAT]], [[BROADCAST_SPLATINSERT2]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]
	;	;
Context not available.

	define <4 x float> @fadd_fixed_shuffle_insert_a_insert_b(float %0, float %1) {	define <4 x float> @fadd_fixed_shuffle_insert_a_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fadd_fixed_shuffle_insert_a_insert_b(	; CHECK-LABEL: @fadd_fixed_shuffle_insert_a_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fadd fast float [[TMP0:%.]], [[TMP1:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[R:%.*]] = fadd fast <4 x float> [[BROADCAST_SPLAT]], [[BROADCAST_SPLATINSERT2]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: ret <4 x float> [[TMP3]]	; CHECK-NEXT: ret <4 x float> [[TMP3]]
	;	;
Context not available.

	define <vscale x 4 x float> @fsub_vscale_shuffle_insert_a_insert_b(float %0, float %1) {	define <vscale x 4 x float> @fsub_vscale_shuffle_insert_a_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fsub_vscale_shuffle_insert_a_insert_b(	; CHECK-LABEL: @fsub_vscale_shuffle_insert_a_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fsub fast float [[TMP0:%.]], [[TMP1:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x float> [[BROADCAST_SPLATINSERT]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <vscale x 4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[R:%.*]] = fsub fast <vscale x 4 x float> [[BROADCAST_SPLAT]], [[BROADCAST_SPLATINSERT2]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]
	;	;
Context not available.

	define <4 x float> @fsub_fixed_shuffle_insert_a_insert_b(float %0, float %1) {	define <4 x float> @fsub_fixed_shuffle_insert_a_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fsub_fixed_shuffle_insert_a_insert_b(	; CHECK-LABEL: @fsub_fixed_shuffle_insert_a_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fsub fast float [[TMP0:%.]], [[TMP1:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[R:%.*]] = fsub fast <4 x float> [[BROADCAST_SPLAT]], [[BROADCAST_SPLATINSERT2]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: ret <4 x float> [[TMP3]]	; CHECK-NEXT: ret <4 x float> [[TMP3]]
	;	;
Context not available.

	define <vscale x 4 x float> @fadd_vscale_shuffle_insert_a_shuffle_insert_b(float %0, float %1) {	define <vscale x 4 x float> @fadd_vscale_shuffle_insert_a_shuffle_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fadd_vscale_shuffle_insert_a_shuffle_insert_b(	; CHECK-LABEL: @fadd_vscale_shuffle_insert_a_shuffle_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fadd fast float [[TMP0:%.]], [[TMP1:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x float> [[BROADCAST_SPLATINSERT]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <vscale x 4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x float> [[BROADCAST_SPLATINSERT2]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[R:%.*]] = fadd fast <vscale x 4 x float> [[BROADCAST_SPLAT]], [[BROADCAST_SPLAT2]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]
	;	;
Context not available.

	define <4 x float> @fadd_fixed_shuffle_insert_a_shuffle_insert_b(float %0, float %1) {	define <4 x float> @fadd_fixed_shuffle_insert_a_shuffle_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fadd_fixed_shuffle_insert_a_shuffle_insert_b(	; CHECK-LABEL: @fadd_fixed_shuffle_insert_a_shuffle_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fadd fast float [[TMP0:%.]], [[TMP1:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT2]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[R:%.*]] = fadd fast <4 x float> [[BROADCAST_SPLAT]], [[BROADCAST_SPLAT2]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: ret <4 x float> [[TMP3]]	; CHECK-NEXT: ret <4 x float> [[TMP3]]
	;	;
Context not available.

	define <vscale x 4 x float> @fsub_vscale_shuffle_insert_a_shuffle_insert_b(float %0, float %1) {	define <vscale x 4 x float> @fsub_vscale_shuffle_insert_a_shuffle_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fsub_vscale_shuffle_insert_a_shuffle_insert_b(	; CHECK-LABEL: @fsub_vscale_shuffle_insert_a_shuffle_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fsub fast float [[TMP0:%.]], [[TMP1:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x float> [[BROADCAST_SPLATINSERT]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <vscale x 4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <vscale x 4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x float> [[BROADCAST_SPLATINSERT2]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[R:%.*]] = fsub fast <vscale x 4 x float> [[BROADCAST_SPLAT]], [[BROADCAST_SPLAT2]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <vscale x 4 x float> [[R]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]	; CHECK-NEXT: ret <vscale x 4 x float> [[TMP3]]
	;	;
Context not available.

	define <4 x float> @fsub_fixed_shuffle_insert_a_shuffle_insert_b(float %0, float %1) {	define <4 x float> @fsub_fixed_shuffle_insert_a_shuffle_insert_b(float %0, float %1) {
	; CHECK-LABEL: @fsub_fixed_shuffle_insert_a_shuffle_insert_b(	; CHECK-LABEL: @fsub_fixed_shuffle_insert_a_shuffle_insert_b(
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <4 x float> poison, float [[TMP0:%.]], i64 0	; CHECK-NEXT: [[R_SCALAR:%.]] = fsub fast float [[TMP0:%.]], [[TMP1:%.*]]
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[R:%.*]] = insertelement <4 x float> poison, float [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.]] = insertelement <4 x float> poison, float [[TMP1:%.]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT2]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[R:%.*]] = fsub fast <4 x float> [[BROADCAST_SPLAT]], [[BROADCAST_SPLAT2]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[R]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: ret <4 x float> [[TMP3]]	; CHECK-NEXT: ret <4 x float> [[TMP3]]
	;	;
Context not available.