This is an archive of the discontinued LLVM Phabricator instance.

Differential D155218

[InstCombine] Optimize addition/subtraction operations of splats of vscale multiplied by a constant
Accepted · Public

Authored by igor.kirillov on Jul 13 2023, 9:43 AM.

Details

Reviewers

paulwalker-arm
huntergr
mgabka
nikic
goldstein.w.n

Summary

This patch enhances the InstCombine pass to optimize addition and
subtraction operations involving splat values optionally multiplied
by a constant or shifted by a constant. The transformation combines
the operations as follows:

(A +/- splat(B)) +/- splat(C) -> A +/- splat(B +/- C)

This optimization improves the performance of vectorized code with
interleave factor > 1, where indices are part of an expression.
For example, in cases such as the following loop:

for (int i = A; i < B; ++i)
  out[i + offset] = (i + X) * Y;
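
The four sign combinations behind the rule above can be sanity-checked on scalars. The following is a minimal sketch, not code from the patch: `Op`, `apply`, `original`, `folded` and `checkAllCombinations` are hypothetical names, and a single wrapping `uint64_t` stands in for one vector lane. It assumes the combined operation on B and C is an add when the inner and outer operations agree and a sub otherwise, which is consistent with the formula in the summary.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical stand-in for the two opcodes the fold handles.
enum class Op { Add, Sub };

// One wrapping add/sub on a single unsigned 64-bit "lane".
constexpr std::uint64_t apply(Op op, std::uint64_t x, std::uint64_t y) {
  return op == Op::Add ? x + y : x - y;
}

// Original shape: (A inner B) outer C.
constexpr std::uint64_t original(Op inner, Op outer, std::uint64_t a,
                                 std::uint64_t b, std::uint64_t c) {
  return apply(outer, apply(inner, a, b), c);
}

// Folded shape: A inner (B combined C), where combined is Add when the
// inner and outer operations agree and Sub when they differ.
constexpr std::uint64_t folded(Op inner, Op outer, std::uint64_t a,
                               std::uint64_t b, std::uint64_t c) {
  return apply(inner, a, apply(inner == outer ? Op::Add : Op::Sub, b, c));
}

// Asserts that both shapes agree for all four inner/outer combinations.
inline void checkAllCombinations(std::uint64_t a, std::uint64_t b,
                                 std::uint64_t c) {
  for (Op inner : {Op::Add, Op::Sub})
    for (Op outer : {Op::Add, Op::Sub})
      assert(original(inner, outer, a, b, c) == folded(inner, outer, a, b, c));
}
```

Because unsigned arithmetic wraps, the two shapes agree even when an intermediate value such as B - C would be negative.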

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

igor.kirillov created this revision. Jul 13 2023, 9:43 AM

Herald added a project: Restricted Project. Jul 13 2023, 9:43 AM

Herald added subscribers: mgabka, hiraditya.

igor.kirillov requested review of this revision. Jul 13 2023, 9:43 AM

Herald added a project: Restricted Project. Jul 13 2023, 9:43 AM

Herald added subscribers: llvm-commits, wangpc.

Harbormaster completed remote builds in B245161: Diff 540084. Jul 13 2023, 12:26 PM

igor.kirillov added reviewers: paulwalker-arm, huntergr, mgabka. Jul 14 2023, 1:43 AM

paulwalker-arm added inline comments. Jul 18 2023, 4:42 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1684	This function seems to handle cases where the opcode is not relevant. However, you only care about two specific opcodes, so this doesn't look like the correct resting place for this code. Placing it somewhere more specific to `add` and `sub` might allow you to simplify the logic.
1689	Should this be `C = getSplatValue(LHS);`? If so, then perhaps this highlights some missing tests?
1705–1706	Not sure this naming is correct because the opcode says nothing about the signedness of the data.

goldstein.w.n added a subscriber: goldstein.w.n. Jul 18 2023, 10:31 AM

goldstein.w.n added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1684	Although you could imagine extending this function to any associative binop, i.e. xor, or, and, mul... It would probably be cleaner to split this into a helper function, so that if you don't have supported opcodes you can just return `nullptr`. Then there won't be so much nesting.
1709	Can you add a comment explaining this? It's not exactly clear how/why the positive bools imply the opcodes they do.

Move code to a separate function and completely refactor it

igor.kirillov marked 2 inline comments as done. Jul 24 2023, 3:49 PM

igor.kirillov added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1684	> This function seems to handle cases where the opcode is not relevant. However, you only care about two specific opcodes, so this doesn't look like the correct resting place for this code. Placing it somewhere more specific to `add` and `sub` might allow you to simplify the logic.
@paulwalker-arm, yes, that was my first idea, but I took a look at InstCombineAddSub.cpp, and it has a minimal amount of code that is vector specific, so I decided to put the logic into foldVectorBinop.
@goldstein.w.n, I put everything into a separate function. I am unsure whether LoopVectorize could generate such code (with xor etc.).
1689	Yes, indeed! But the algorithm now is completely different.
1705–1706	No more signedness

Harbormaster completed remote builds in B247803: Diff 543717. Jul 24 2023, 11:19 PM

paulwalker-arm added inline comments. Aug 3 2023, 9:09 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1617	It's worth verifying but I don't think you need the commutative version here because constants should always be canonicalised to the RHS.
1621	If I'm nit picking I think you should use `VScale` consistently and not `Vscale`.
1629–1632	This seems wasteful given the number of times you're matching `m_SplatVscale`. Perhaps drop the first `match` from all the `if` blocks and then, once you've figured out the add/sub combo, you can have a single block that does: `if (!match(SplatB, m_SplatVscale) || !match(SplatC, m_SplatVscale)) return nullptr;` Perhaps you don't even need `m_SplatVscale`, because `getSplatValue` returns null on failure, so the check could be: `if (!B || !match(B, m_ConstMultipliedVscale) || !C || !match(C, m_ConstMultipliedVscale)) return nullptr;`
1682–1683	I still think this is wasteful as we're going to run through several redundant match calls before eventually hitting the null return for every bin op other than Add and Sub. To me it seems better to call this directly within `visitAdd` and `visitSub`. You'll see there's related precedent here with `foldBinOpShiftWithShift` amongst others.

Vscale -> VScale
Move foldVScaleSplatAddSub call to visitAdd and visitSub
Refactor pattern detection algorithm

igor.kirillov marked 2 inline comments as done. Aug 7 2023, 5:36 AM

igor.kirillov added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1629–1632	What do you think about this version? My first version used `getSplatValue`, but then I had to decide the exact order of A, B and C. With the current approach, we can first match A, SplatB and SplatC, and then match the exact add/sub operations without calling the splat matcher several times.

Harbormaster completed remote builds in B250750: Diff 547740. Aug 7 2023, 7:40 AM

Matt added a subscriber: Matt. Aug 7 2023, 3:44 PM

A couple of minor requests but otherwise looks good.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1618	by
1629–1632	Seems like a reasonable compromise to me.
1660–1661	Better to move the initialisation here (i.e. `Value *B =` ) since there's no other uses.

This revision is now accepted and ready to land. Aug 9 2023, 9:37 AM

Please make sure to pre-commit test coverage (https://llvm.org/docs/TestingGuide.html#precommit-workflow-for-tests). This is also missing some tests, in particular multi-use and negative tests.

Why are the LoopVectorize changes seen here not regressions (caused by a multi-use transform)? Why does this transform only operate on multiplies of vscale?

Herald added a subscriber: StephenFan. Aug 9 2023, 9:57 AM

The rationale for making these transformations vscale-specific is that vscale is effectively a constant, and targets that support scalable vectors typically have instructions where vscale is implied. This is the same reason why there are no multi-use checks. Whilst the changes to the LoopVectorize output look like regressions, the extra IR is loop invariant, so it will be hoisted, and it generally leads to better generated code. This is especially true for cases where vector loops are interleaved.

In D155218#4573684, @paulwalker-arm wrote:

The rationale for making these transformations vscale-specific is that vscale is effectively a constant, and targets that support scalable vectors typically have instructions where vscale is implied. This is the same reason why there are no multi-use checks. Whilst the changes to the LoopVectorize output look like regressions, the extra IR is loop invariant, so it will be hoisted, and it generally leads to better generated code. This is especially true for cases where vector loops are interleaved.

Is this something that we expect to hold true for all architectures with scalable vectors? I tried to look at sve and rvv codegen here: https://llvm.godbolt.org/z/9xdWrbPKv I can see how the sve case is always beneficial, even for multi-use cases, but the rvv case looks like it would only be an improvement for the one-use case.

I guess I'm making a generalisation here based on scalable vectors needing to calculate addresses and the like based on vscale, and so am assuming specialised add/sub instructions will always exist. My eyes are not tuned in to reading RVV asm; can @craig.topper or @reames help clarify whether this combine also makes sense from a RISC-V point of view?

I should add that there's a growing need to sink some of the vscale-related logic back into loops to help with instruction selection, but regardless of this I still think it's preferable for this logic to be scalar rather than vector computations.

Added test with @use applied to nested add
Minor fixes

Please make sure to pre-commit test coverage (https://llvm.org/docs/TestingGuide.html#precommit-workflow-for-tests). This is also missing some tests, in particular multi-use and negative tests.

@nikic Added a test with use. Could you clarify what you mean by negative tests in this context?
I am aware of pre-commits. I just wasn't sure we wanted this patch at all. That's why I will pre-commit them just before landing the patch.

Harbormaster completed remote builds in B252576: Diff 550245. Aug 15 2023, 4:37 AM

Ping about the RISC-V conversation and more test requirements

In D155218#4587835, @igor.kirillov wrote:

Please make sure to pre-commit test coverage (https://llvm.org/docs/TestingGuide.html#precommit-workflow-for-tests). This is also missing some tests, in particular multi-use and negative tests.

@nikic Added a test with use. Could you clarify what you mean by negative tests in this context?
I am aware of pre-commits. I just wasn't sure we wanted this patch at all. That's why I will pre-commit them just before landing the patch.

Negative test = test where no transform happens. Generally speaking, you want one negative test for every condition in the transform, such that each test fails exactly one condition.

In D155218#4576394, @paulwalker-arm wrote:

I guess I'm making a generalisation here based on scalable vectors needing to calculate addresses and the like based on vscale and so am assuming specialised add/sub instructions will always exist. My eyes are not tuned in to reading rvv asm, can @craig.topper or @reames help clarify whether this combine also makes sense from a RISCV point of view.

To make the question more precise: If you have splat(vscale * C1) + splat(vscale * C2), does it make sense to transform it into splat(vscale * (C1+C2)) if neither splat(vscale * C1) nor splat(vscale * C2) is going away?
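
The "one negative test per condition" guideline above can be pictured with a toy predicate. This is only a sketch: `FoldInput`, `shouldFold` and `runFoldTests` are hypothetical names, and the four conditions are assumptions loosely modelled on what this patch checks, not the actual InstCombine logic. Real negative tests for this patch would be IR functions in vscale.ll whose CHECK lines show the input unchanged.

```cpp
#include <cassert>

// Hypothetical summary of the properties a fold might check.
struct FoldInput {
  bool isVectorType;   // result type is a vector
  bool opsAreAddOrSub; // inner and outer operations are add/sub
  bool bIsVScaleSplat; // first splat operand is a multiple of vscale
  bool cIsVScaleSplat; // second splat operand is a multiple of vscale
};

// The transform fires only when every condition holds.
constexpr bool shouldFold(const FoldInput &in) {
  return in.isVectorType && in.opsAreAddOrSub && in.bIsVScaleSplat &&
         in.cIsVScaleSplat;
}

// One positive case, then one negative case per condition: each negative
// input flips exactly one field of the positive input.
inline void runFoldTests() {
  assert(shouldFold({true, true, true, true}));

  assert(!shouldFold({false, true, true, true})); // fails only the type check
  assert(!shouldFold({true, false, true, true})); // fails only the opcode check
  assert(!shouldFold({true, true, false, true})); // fails only the B check
  assert(!shouldFold({true, true, true, false})); // fails only the C check
}
```

Keeping each negative input one flip away from the positive one means a failing test points at a single condition.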

In D155218#4624956, @nikic wrote:

To make the question more precise: If you have splat(vscale * C1) + splat(vscale * C2), does it make sense to transform it into splat(vscale * (C1+C2)) if neither splat(vscale * C1) nor splat(vscale * C2) is going away?

RISC-V does not have any special add/sub instructions for vscale values. We define vscale as vector length in bytes divided by 8 and the vector length should always be divisible by 8. We have a read only status register called vlenb that returns the vector length in bytes. In the base case vscale is always expanded to a read of vlenb followed by dividing by 8 (shifted right by 3). If vscale is multiplied by a constant, we will try to combine the right shift with the multiplier so that we don't shift right only to immediately shift left again.

So I think @nikic is correct that for RISC-V this only makes sense for the one-use case.
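
The shift combining described above can be illustrated with plain integer arithmetic. This is a sketch under assumptions, not the RISC-V backend's actual lowering: the function names are hypothetical, only power-of-two multipliers are modelled, `vlenb` is assumed to be a multiple of 8, and shift amounts are assumed small enough that nothing overflows.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// vscale as defined in the comment above: vector length in bytes divided by 8.
constexpr std::uint64_t vscaleFromVlenb(std::uint64_t vlenb) {
  return vlenb >> 3;
}

// Naive expansion of vscale * 2^log2Mul: shift right by 3, then shift left.
constexpr std::uint64_t scaledNaive(std::uint64_t vlenb, unsigned log2Mul) {
  return vscaleFromVlenb(vlenb) << log2Mul;
}

// Combined form: a single shift by the net amount, so the value is never
// shifted right only to be shifted left again.
constexpr std::uint64_t scaledCombined(std::uint64_t vlenb, unsigned log2Mul) {
  return log2Mul >= 3 ? vlenb << (log2Mul - 3) : vlenb >> (3 - log2Mul);
}

// Asserts that both forms agree for a few vlenb values that are multiples of 8.
inline void checkShiftCombining() {
  for (std::uint64_t vlenb : {8u, 16u, 32u, 64u})
    for (unsigned k = 0; k < 8; ++k)
      assert(scaledNaive(vlenb, k) == scaledCombined(vlenb, k));
}
```

The two forms agree only because the low three bits of `vlenb` are zero; either way, producing a vscale multiple costs ordinary scalar instructions, which matches the observation that there is no add/sub with vscale implied.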

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp	6 lines
llvm/lib/Transforms/InstCombine/InstCombineInternal.h	1 line
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp	63 lines
llvm/test/Transforms/InstCombine/vscale.ll	101 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll	14 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-inductions.ll	26 lines

Diff 550245

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

[Lines elided; enclosing context: if (Value *V = simplifyAddInst(I.getOperand(0), I.getOperand(1),]
     return replaceInstUsesWith(I, V);

   if (SimplifyAssociativeOrCommutative(I))
     return &I;

   if (Instruction *X = foldVectorBinop(I))
     return X;

+  if (Instruction *X = foldVScaleSplatAddSub(I))
+    return X;
+
   if (Instruction *Phi = foldBinopWithPhiOperands(I))
     return Phi;

   // (A*B)+(A*C) -> A*(B+C) etc
   if (Value *V = foldUsingDistributiveLaws(I))
     return replaceInstUsesWith(I, V);

   if (Instruction *R = foldBoxMultiply(I))
[Lines elided; enclosing context: Instruction *InstCombinerImpl::visitSub(BinaryOperator &I) {]
   if (Value *V = simplifySubInst(I.getOperand(0), I.getOperand(1),
                                  I.hasNoSignedWrap(), I.hasNoUnsignedWrap(),
                                  SQ.getWithInstruction(&I)))
     return replaceInstUsesWith(I, V);

   if (Instruction *X = foldVectorBinop(I))
     return X;

+  if (Instruction *X = foldVScaleSplatAddSub(I))
+    return X;
+
   if (Instruction *Phi = foldBinopWithPhiOperands(I))
     return Phi;

   Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);

   // If this is a 'B = x-(-A)', change to B = x+A.
   // We deal with this without involving Negator to preserve NSW flag.
   if (Value *V = dyn_castNegVal(Op1)) {
[Lines elided]

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

[Lines elided; enclosing context: public:]
   bool SimplifyDemandedInstructionBits(Instruction &Inst);

   Value *SimplifyDemandedVectorElts(Value *V, APInt DemandedElts,
                                     APInt &UndefElts, unsigned Depth = 0,
                                     bool AllowMultipleUsers = false) override;

   /// Canonicalize the position of binops relative to shufflevector.
   Instruction *foldVectorBinop(BinaryOperator &Inst);
+  Instruction *foldVScaleSplatAddSub(BinaryOperator &Inst);
   Instruction *foldVectorSelect(SelectInst &Sel);
   Instruction *foldSelectShuffle(ShuffleVectorInst &Shuf);

   /// Given a binary operator, cast instruction, or select which has a PHI node
   /// as operand #0, see if we can fold the instruction into the PHI (which is
   /// only possible if all operands to the PHI are constants).
   Instruction *foldOpIntoPhi(Instruction &I, PHINode *PN);

[Lines elided]

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

[Lines elided; enclosing context: static bool shouldMergeGEPs(GEPOperator &GEP, GEPOperator &Src) {]
   // Src. If Src is not a trivial GEP too, don't combine
   // the indices.
   if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&
       !Src.hasOneUse())
     return false;
   return true;
 }

+// Combine two Add/Sub operations of the following structure:
+// (A +/- splat(B)) +/- splat(C) -> A +/- splat(B +/- C)
+// where B and C are splats of VScale multiplied by a number
+Instruction *InstCombinerImpl::foldVScaleSplatAddSub(BinaryOperator &Inst) {
+  if (!isa<VectorType>(Inst.getType()))
+    return nullptr;
+
+  // Matches Value when it is either of:
+  // 1) VScale
+  // 2) A multiplication of a constant and VScale
+  // 3) A shift left of VScale by a constant value
+  auto m_ConstMultipliedVScale =
+      m_CombineOr(m_CombineOr(m_VScale(), m_Mul(m_VScale(), m_Constant())),
+                  m_Shl(m_VScale(), m_Constant()));
+
+  // Splat of the expression from above
+  auto m_SplatVScale =
+      m_Shuffle(m_InsertElt(m_Value(), m_ConstMultipliedVScale, m_ZeroInt()),
+                m_Value(), m_ZeroMask());
+
+  Instruction *SplatB, *SplatC;
+  Value *A;
+  BinaryOperator::BinaryOps NewOpcode1, NewOpcode2;
+
+  if (!match(&Inst,
+             m_c_BinOp(m_c_BinOp(m_Value(A), m_SplatVScale), m_SplatVScale)))
+    return nullptr;
+
+  if (match(&Inst, m_c_Add(m_c_Add(m_Specific(A), m_Instruction(SplatB)),
+                           m_Instruction(SplatC)))) {
+    // (A + splat(B)) + splat(C) -> A + splat(C + B)
+    NewOpcode1 = Instruction::Add;
+    NewOpcode2 = Instruction::Add;
+  } else if (match(&Inst, m_c_Add(m_Sub(m_Specific(A), m_Instruction(SplatB)),
+                                  m_Instruction(SplatC)))) {
+    // (A - splat(B)) + splat(C) -> A - splat(B - C)
+    NewOpcode1 = Instruction::Sub;
+    NewOpcode2 = Instruction::Sub;
+  } else if (match(&Inst, m_Sub(m_c_Add(m_Specific(A), m_Instruction(SplatB)),
+                                m_Instruction(SplatC)))) {
+    // (A + splat(B)) - splat(C) -> A + splat(B - C)
+    NewOpcode1 = Instruction::Sub;
+    NewOpcode2 = Instruction::Add;
+  } else if (match(&Inst, m_Sub(m_Sub(m_Specific(A), m_Instruction(SplatB)),
+                                m_Instruction(SplatC)))) {
+    // (A - splat(B)) - splat(C) -> A - splat(B + C)
+    NewOpcode1 = Instruction::Add;
+    NewOpcode2 = Instruction::Sub;
+  } else {
+    return nullptr;
+  }
+
+  Value *B = getSplatValue(SplatB);
+  Value *C = getSplatValue(SplatC);
+
+  // Combine the two splat operations, create a new vector splat and new
+  // binary operations
+  auto *NewOp = Builder.CreateBinOp(NewOpcode1, B, C);
+  auto EC = cast<VectorType>(Inst.getType())->getElementCount();
+  auto *SplatNewOp = Builder.CreateVectorSplat(EC, NewOp);
+  return BinaryOperator::Create(NewOpcode2, A, SplatNewOp);
+}
+
 Instruction *InstCombinerImpl::foldVectorBinop(BinaryOperator &Inst) {
   if (!isa<VectorType>(Inst.getType()))
     return nullptr;

   BinaryOperator::BinaryOps Opcode = Inst.getOpcode();
   Value *LHS = Inst.getOperand(0), *RHS = Inst.getOperand(1);
   assert(cast<VectorType>(LHS->getType())->getElementCount() ==
          cast<VectorType>(Inst.getType())->getElementCount());
   assert(cast<VectorType>(RHS->getType())->getElementCount() ==
          cast<VectorType>(Inst.getType())->getElementCount());

   // If both operands of the binop are vector concatenations, then perform the
   // narrow binop on each pair of the source operands followed by concatenation
   // of the results.
   Value *L0, *L1, *R0, *R1;
   ArrayRef<int> Mask;
   if (match(LHS, m_Shuffle(m_Value(L0), m_Value(L1), m_Mask(Mask))) &&
       match(RHS, m_Shuffle(m_Value(R0), m_Value(R1), m_SpecificMask(Mask))) &&
       LHS->hasOneUse() && RHS->hasOneUse() &&
       cast<ShuffleVectorInst>(LHS)->isConcat() &&
       cast<ShuffleVectorInst>(RHS)->isConcat()) {
     // This transform does not have the speculative execution constraint as
     // below because the shuffle is a concatenation. The new binops are
     // operating on exactly the same elements as the existing binop.
     // TODO: We could ease the mask requirement to allow different undef lanes,
     // but that requires an analysis of the binop-with-undef output value.
     Value *NewBO0 = Builder.CreateBinOp(Opcode, L0, R0);
     if (auto *BO = dyn_cast<BinaryOperator>(NewBO0))
       BO->copyIRFlags(&Inst);
     Value *NewBO1 = Builder.CreateBinOp(Opcode, L1, R1);
     if (auto *BO = dyn_cast<BinaryOperator>(NewBO1))
       BO->copyIRFlags(&Inst);
     return new ShuffleVectorInst(NewBO0, NewBO1, Mask);
   }

   auto createBinOpReverse = [&](Value *X, Value *Y) {
     Value *V = Builder.CreateBinOp(Opcode, X, Y, Inst.getName());
     if (auto *BO = dyn_cast<BinaryOperator>(V))
       BO->copyIRFlags(&Inst);
     Module *M = Inst.getModule();
     Function *F = Intrinsic::getDeclaration(
         M, Intrinsic::experimental_vector_reverse, V->getType());
     return CallInst::Create(F, V);
   };

   // NOTE: Reverse shuffles don't require the speculative execution protection
   // below because they don't affect which lanes take part in the computation.
[Lines elided]

llvm/test/Transforms/InstCombine/vscale.ll

	Show All 34 Lines
	; CHECK-NEXT: ret i64 [[SHL]]			; CHECK-NEXT: ret i64 [[SHL]]
	;			;
	%vscale = call i32 @llvm.vscale.i32()			%vscale = call i32 @llvm.vscale.i32()
	%shl = shl i32 %vscale, 3			%shl = shl i32 %vscale, 3
	%ext = zext i32 %shl to i64			%ext = zext i32 %shl to i64
	ret i64 %ext			ret i64 %ext
	}			}

				define <vscale x 2 x i64> @test_add_add_splat_vscale(<vscale x 2 x i64> %A) {
				; CHECK-LABEL: @test_add_add_splat_vscale(
				; CHECK-NEXT: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[VSCALE]], 1
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP1]], i64 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[RESULT:%.]] = add <vscale x 2 x i64> [[DOTSPLAT]], [[A:%.]]
				; CHECK-NEXT: ret <vscale x 2 x i64> [[RESULT]]
				;
				%vscale = call i64 @llvm.vscale.i64()
				%splatinsert = insertelement <vscale x 2 x i64> poison, i64 %vscale, i64 0
				%splat = shufflevector <vscale x 2 x i64> %splatinsert, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				%nested.operaton = add <vscale x 2 x i64> %A, %splat
				%result = add <vscale x 2 x i64> %nested.operaton, %splat
				ret <vscale x 2 x i64> %result
				}

				define <vscale x 2 x i64> @test_add_sub_splat_vscale(<vscale x 2 x i64> %A) {
				; CHECK-LABEL: @test_add_sub_splat_vscale(
				; CHECK-NEXT: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[VSCALE]], 1
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP1]], i64 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[RESULT:%.]] = add <vscale x 2 x i64> [[DOTSPLAT]], [[A:%.]]
				; CHECK-NEXT: ret <vscale x 2 x i64> [[RESULT]]
				;
				%vscale = call i64 @llvm.vscale.i64()
				%splatinsert = insertelement <vscale x 2 x i64> poison, i64 %vscale, i64 0
				%splat = shufflevector <vscale x 2 x i64> %splatinsert, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				%1 = mul i64 %vscale, 3
				%splatinsert.3 = insertelement <vscale x 2 x i64> poison, i64 %1, i64 0
				%splat.3 = shufflevector <vscale x 2 x i64> %splatinsert.3, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				%nested.operaton = add <vscale x 2 x i64> %A, %splat.3
				%result = sub <vscale x 2 x i64> %nested.operaton, %splat
				ret <vscale x 2 x i64> %result
				}

				define <vscale x 2 x i64> @test_sub_add_splat_vscale(<vscale x 2 x i64> %A) {
				; CHECK-LABEL: @test_sub_add_splat_vscale(
				; CHECK-NEXT: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[DOTSPLATINSERT_NEG:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[VSCALE]], i64 0
				; CHECK-NEXT: [[DOTSPLAT_NEG:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT_NEG]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[RESULT:%.*]] = add <vscale x 2 x i64> [[DOTSPLAT_NEG]], [[A:%.*]]
				; CHECK-NEXT: ret <vscale x 2 x i64> [[RESULT]]
				;
				%vscale = call i64 @llvm.vscale.i64()
				%splatinsert = insertelement <vscale x 2 x i64> poison, i64 %vscale, i64 0
				%splat = shufflevector <vscale x 2 x i64> %splatinsert, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				%1 = shl i64 %vscale, 1
				%splatinsert.2 = insertelement <vscale x 2 x i64> poison, i64 %1, i64 0
				%splat.2 = shufflevector <vscale x 2 x i64> %splatinsert.2, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				%nested.operaton = sub <vscale x 2 x i64> %A, %splat
				%result = add <vscale x 2 x i64> %nested.operaton, %splat.2
				ret <vscale x 2 x i64> %result
				}

				define <vscale x 2 x i64> @test_sub_sub_splat_vscale(<vscale x 2 x i64> %A) {
				; CHECK-LABEL: @test_sub_sub_splat_vscale(
				; CHECK-NEXT: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[DOTNEG:%.*]] = mul i64 [[VSCALE]], -3
				; CHECK-NEXT: [[DOTSPLATINSERT_NEG:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[DOTNEG]], i64 0
				; CHECK-NEXT: [[DOTSPLAT_NEG:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT_NEG]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[RESULT:%.*]] = add <vscale x 2 x i64> [[DOTSPLAT_NEG]], [[A:%.*]]
				; CHECK-NEXT: ret <vscale x 2 x i64> [[RESULT]]
				;
				%vscale = call i64 @llvm.vscale.i64()
				%splatinsert = insertelement <vscale x 2 x i64> poison, i64 %vscale, i64 0
				%splat = shufflevector <vscale x 2 x i64> %splatinsert, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				%1 = shl i64 %vscale, 1
				%splatinsert.2 = insertelement <vscale x 2 x i64> poison, i64 %1, i64 0
				%splat.2 = shufflevector <vscale x 2 x i64> %splatinsert.2, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				%nested.operaton = sub <vscale x 2 x i64> %A, %splat
				%result = sub <vscale x 2 x i64> %nested.operaton, %splat.2
				ret <vscale x 2 x i64> %result
				}

				define <vscale x 2 x i64> @test_add_use_add_splat_vscale(<vscale x 2 x i64> %A) {
				; CHECK-LABEL: @test_add_use_add_splat_vscale(
				; CHECK-NEXT: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[SPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[VSCALE]], i64 0
				; CHECK-NEXT: [[SPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[SPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[NESTED_OPERATON:%.*]] = add <vscale x 2 x i64> [[SPLAT]], [[A:%.*]]
				; CHECK-NEXT: call void @use(<vscale x 2 x i64> [[NESTED_OPERATON]])
				; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[VSCALE]], 1
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP1]], i64 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[RESULT:%.*]] = add <vscale x 2 x i64> [[DOTSPLAT]], [[A]]
				; CHECK-NEXT: ret <vscale x 2 x i64> [[RESULT]]
				;
				%vscale = call i64 @llvm.vscale.i64()
				%splatinsert = insertelement <vscale x 2 x i64> poison, i64 %vscale, i64 0
				%splat = shufflevector <vscale x 2 x i64> %splatinsert, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				%nested.operaton = add <vscale x 2 x i64> %A, %splat
				call void @use(<vscale x 2 x i64> %nested.operaton)
				%result = add <vscale x 2 x i64> %nested.operaton, %splat
				ret <vscale x 2 x i64> %result
				}

	declare i32 @llvm.vscale.i32()			declare i32 @llvm.vscale.i32()
				declare i64 @llvm.vscale.i64()

				declare void @use(<vscale x 2 x i64>)

	attributes #0 = { vscale_range(1,16) }			attributes #0 = { vscale_range(1,16) }

llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll

	; CHECK-NEXT: store <vscale x 4 x float> [[WIDE_MASKED_GATHER]], ptr [[TMP10]], align 4			; CHECK-NEXT: store <vscale x 4 x float> [[WIDE_MASKED_GATHER]], ptr [[TMP10]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP11:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP12:%.*]] = shl nuw nsw i64 [[TMP11]], 2			; CHECK-NEXT: [[TMP12:%.*]] = shl nuw nsw i64 [[TMP11]], 2
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, ptr [[TMP10]], i64 [[TMP12]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, ptr [[TMP10]], i64 [[TMP12]]
	; CHECK-NEXT: store <vscale x 4 x float> [[WIDE_MASKED_GATHER2]], ptr [[TMP13]], align 4			; CHECK-NEXT: store <vscale x 4 x float> [[WIDE_MASKED_GATHER2]], ptr [[TMP13]], align 4
	; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP15:%.*]] = shl nuw nsw i64 [[TMP14]], 3			; CHECK-NEXT: [[TMP15:%.*]] = shl nuw nsw i64 [[TMP14]], 3
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP15]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP15]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[STEP_ADD]], [[DOTSPLAT]]			; CHECK-NEXT: [[TMP16:%.*]] = shl nuw nsw i64 [[TMP4]], 3
	; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[DOTSPLATINSERT3:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP16]], i64 0
	; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]			; CHECK-NEXT: [[DOTSPLAT4:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT3]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT4]]
				; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[INDVARS_IV_STRIDE2:%.*]] = shl i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_STRIDE2:%.*]] = shl i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[INDVARS_IV_STRIDE2]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[INDVARS_IV_STRIDE2]]
	; CHECK-NEXT: [[TMP17:%.*]] = load float, ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP18:%.*]] = load float, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: store float [[TMP17]], ptr [[ARRAYIDX2]], align 4			; CHECK-NEXT: store float [[TMP18]], ptr [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%indvars.iv.stride2 = mul i64 %indvars.iv, 2			%indvars.iv.stride2 = mul i64 %indvars.iv, 2
	%arrayidx = getelementptr inbounds float, ptr %b, i64 %indvars.iv.stride2			%arrayidx = getelementptr inbounds float, ptr %b, i64 %indvars.iv.stride2
	%0 = load float, ptr %arrayidx, align 4			%0 = load float, ptr %arrayidx, align 4

llvm/test/Transforms/LoopVectorize/scalable-inductions.ll

	; CHECK-NEXT: store <vscale x 2 x i64> [[TMP11]], ptr [[TMP13]], align 8			; CHECK-NEXT: store <vscale x 2 x i64> [[TMP11]], ptr [[TMP13]], align 8
	; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP15:%.*]] = shl i64 [[TMP14]], 1			; CHECK-NEXT: [[TMP15:%.*]] = shl i64 [[TMP14]], 1
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i64, ptr [[TMP13]], i64 [[TMP15]]			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i64, ptr [[TMP13]], i64 [[TMP15]]
	; CHECK-NEXT: store <vscale x 2 x i64> [[TMP12]], ptr [[TMP16]], align 8			; CHECK-NEXT: store <vscale x 2 x i64> [[TMP12]], ptr [[TMP16]], align 8
	; CHECK-NEXT: [[TMP17:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP17:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP18:%.*]] = shl i64 [[TMP17]], 2			; CHECK-NEXT: [[TMP18:%.*]] = shl i64 [[TMP17]], 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP18]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP18]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 2 x i64> [[STEP_ADD]], [[DOTSPLAT]]			; CHECK-NEXT: [[TMP19:%.*]] = shl i64 [[TMP5]], 2
	; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[DOTSPLATINSERT3:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP19]], i64 0
	; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: [[DOTSPLAT4:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT3]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 2 x i64> [[VEC_IND]], [[DOTSPLAT4]]
				; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_08:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I_08:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[I_08]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[I_08]]
	; CHECK-NEXT: [[TMP20:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: [[TMP21:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i64 [[TMP20]], [[I_08]]			; CHECK-NEXT: [[ADD:%.*]] = add nsw i64 [[TMP21]], [[I_08]]
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[I_08]]			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[I_08]]
	; CHECK-NEXT: store i64 [[ADD]], ptr [[ARRAYIDX1]], align 8			; CHECK-NEXT: store i64 [[ADD]], ptr [[ARRAYIDX1]], align 8
	; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_08]], 1			; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_08]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
	; CHECK-NEXT: store <vscale x 1 x i64> [[TMP9]], ptr [[TMP11]], align 8			; CHECK-NEXT: store <vscale x 1 x i64> [[TMP9]], ptr [[TMP11]], align 8
	; CHECK-NEXT: [[TMP12:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP12:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[TMP11]], i64 [[TMP12]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[TMP11]], i64 [[TMP12]]
	; CHECK-NEXT: store <vscale x 1 x i64> [[TMP10]], ptr [[TMP13]], align 8			; CHECK-NEXT: store <vscale x 1 x i64> [[TMP10]], ptr [[TMP13]], align 8
	; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP15:%.*]] = shl i64 [[TMP14]], 1			; CHECK-NEXT: [[TMP15:%.*]] = shl i64 [[TMP14]], 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP15]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP15]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 1 x i64> [[STEP_ADD]], [[DOTSPLAT]]			; CHECK-NEXT: [[TMP16:%.*]] = shl i64 [[TMP5]], 1
	; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[DOTSPLATINSERT3:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[TMP16]], i64 0
	; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; CHECK-NEXT: [[DOTSPLAT4:%.*]] = shufflevector <vscale x 1 x i64> [[DOTSPLATINSERT3]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 1 x i64> [[VEC_IND]], [[DOTSPLAT4]]
				; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_08:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I_08:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[I_08]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[I_08]]
	; CHECK-NEXT: [[TMP17:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: [[TMP18:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i64 [[TMP17]], [[I_08]]			; CHECK-NEXT: [[ADD:%.*]] = add nsw i64 [[TMP18]], [[I_08]]
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[I_08]]			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[I_08]]
	; CHECK-NEXT: store i64 [[ADD]], ptr [[ARRAYIDX1]], align 8			; CHECK-NEXT: store i64 [[ADD]], ptr [[ARRAYIDX1]], align 8
	; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_08]], 1			; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_08]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;