This is an archive of the discontinued LLVM Phabricator instance.


Differential D145163

Add support for vectorization of interleaved memory accesses for scalable VF
Closed, Public

Authored by huntergr on Mar 2 2023, 7:24 AM.


Details

Reviewers

paulwalker-arm
reames
luke
mgabka
fhahn

Commits

rG95bfb1902db9: [LV][AArch64] Allow (limited) interleaving for scalable vectors

Summary

This patch uses the new intrinsics introduced in
https://reviews.llvm.org/D141924
to enable vectorization of interleaved accesses for scalable VFs.
Targets need to implement a proper cost model for the supported operations
to ensure that the generated IR can be code-generated.
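For readers unfamiliar with the D141924 intrinsics, here is a rough scalar model of what `llvm.experimental.vector.interleave2` and `llvm.experimental.vector.deinterleave2` compute, with `std::vector` standing in for a (possibly scalable) vector whose length is only known at run time. This is a sketch of the semantics under that assumption, not LLVM code; `interleave2` and `deinterleave2` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical scalar model of llvm.experimental.vector.interleave2:
// Result[2*i] = A[i], Result[2*i + 1] = B[i].
template <typename T>
std::vector<T> interleave2(const std::vector<T> &A, const std::vector<T> &B) {
  assert(A.size() == B.size() && "operands must have the same length");
  std::vector<T> Result;
  Result.reserve(A.size() * 2);
  for (std::size_t I = 0; I < A.size(); ++I) {
    Result.push_back(A[I]);
    Result.push_back(B[I]);
  }
  return Result;
}

// Hypothetical scalar model of llvm.experimental.vector.deinterleave2:
// returns {even-indexed elements, odd-indexed elements}.
template <typename T>
std::pair<std::vector<T>, std::vector<T>>
deinterleave2(const std::vector<T> &V) {
  assert(V.size() % 2 == 0 && "input must have an even number of elements");
  std::pair<std::vector<T>, std::vector<T>> Parts;
  for (std::size_t I = 0; I < V.size(); I += 2) {
    Parts.first.push_back(V[I]);
    Parts.second.push_back(V[I + 1]);
  }
  return Parts;
}
```

For a loop reading `struct { int re, im; }` pairs, one wide load followed by `deinterleave2` yields the `re` lanes and the `im` lanes separately, which is the factor-2 pattern this patch targets for scalable VFs.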

Diff Detail

Unit Tests: Failed

Time	Test
60,030 ms	x64 debian > MLIR.Examples/standalone::test.toy

Event Timeline

mgabka created this revision. Mar 2 2023, 7:24 AM

Herald added a project: Restricted Project. Mar 2 2023, 7:24 AM

Herald added subscribers: nlopes, hiraditya.

mgabka requested review of this revision. Mar 2 2023, 7:24 AM

Herald added a project: Restricted Project. Mar 2 2023, 7:24 AM

Herald added subscribers: llvm-commits, pcwang-thead, alextsao1999.

mgabka mentioned this in D134438: POC patch to demonstrate how new intrinsics for interleaved load/store could be used in LoopVectorize. Mar 2 2023, 7:25 AM

Matt added a subscriber: Matt. Mar 2 2023, 7:27 AM

nlopes added inline comments. Mar 2 2023, 8:09 AM

llvm/lib/IR/IRBuilder.cpp
596 ↗	(On Diff #501861)	Please use PoisonValue here and whenever possible as we are trying to get rid of UndefValue. Thank you!

Harbormaster completed remote builds in B216974: Diff 501861. Mar 2 2023, 8:36 AM

Thanks for adding this! I'm currently plugging in the hooks for RISC-V and will let you know what I run into.

llvm/lib/IR/IRBuilder.cpp
587 ↗	(On Diff #501861)	Maybe this should be an assertion
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2711	Need to check that `Group->getFactor() == 2` here or that the call to CreateMaskedInterleavedLoad succeeds
2826	Need to check `Group->getFactor() == 2` here too

luke added inline comments. Mar 2 2023, 9:58 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2771	It's somehow possible to reach here with a scalable vector type if `TII->hasInterleavedLoad` returns false. Can we check somewhere inside the vectorizer cost model that if `hasInterleavedLoad` is false then we rule out any recipe with an interleave group for a scalable VF?

mgabka added inline comments. Mar 10 2023, 5:47 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
835–839 ↗	(On Diff #501861)	I think these functions could actually be joined into one, something like: supportsInterleaving(VectorType *VecTy, uint32_t Factor, bool IsMasked). It seems unlikely that a target supports stores but not loads for the same vector type. @paulwalker-arm, what do you think?
llvm/lib/IR/IRBuilder.cpp
587 ↗	(On Diff #501861)	Hi Luke, I could remove the Factor argument entirely and make this function specific to Factor=2 (with assertions), which makes sense for now since the experimental interleaving intrinsics only support factor 2.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2711	My idea was that it would be up to the hasInterleavedLoad function to return true only when Factor is 2, so I don't think any extra checks are needed.
2771	This is actually guarded by the LV cost model: LoopVectorizationCostModel::getInterleaveGroupCost calls TTI.getInterleavedMemoryOpCost, which should return an invalid cost for factors other than 2.
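To illustrate the point about the cost model guarding code generation, the sketch below loosely models the per-VF choice for an interleave group: only options with a valid cost compete, so a target that returns an invalid cost for an unsupported factor simply removes interleaving from the candidates. `std::optional<unsigned>` stands in for `InstructionCost`, and `chooseDecision` / `interleaveCostHook` are hypothetical simplifications, not the actual `LoopVectorizationCostModel` logic.

```cpp
#include <cassert>
#include <optional>

// std::nullopt plays the role of InstructionCost::getInvalid().
using Cost = std::optional<unsigned>;

enum class Decision { Interleave, GatherScatter, Scalarize, None };

// Loose model of a per-VF widening decision for an interleave group: only
// options with a valid cost compete, so an invalid interleave cost can never
// lead to interleaved IR being emitted.
Decision chooseDecision(Cost InterleaveCost, Cost GatherScatterCost,
                        Cost ScalarCost) {
  Decision Best = Decision::None;
  Cost BestCost;
  auto Consider = [&](Cost C, Decision D) {
    if (C && (!BestCost || *C < *BestCost)) {
      BestCost = C;
      Best = D;
    }
  };
  Consider(InterleaveCost, Decision::Interleave);
  Consider(GatherScatterCost, Decision::GatherScatter);
  Consider(ScalarCost, Decision::Scalarize);
  return Best;
}

// Hypothetical AArch64-like hook: scalable groups need SVE and factor 2.
Cost interleaveCostHook(bool Scalable, bool HasSVE, unsigned Factor) {
  if (Scalable && (!HasSVE || Factor != 2))
    return std::nullopt;
  return Factor; // placeholder value, not a real cost
}
```

With a scalable VF and factor 3, the hook reports an invalid cost, so in this model the group falls back to gather/scatter if that is valid, or to no viable widening at all.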

Replaced use of Undef with Poison value

mgabka marked an inline comment as done. Mar 10 2023, 6:07 AM

Harbormaster completed remote builds in B218667: Diff 504122. Mar 10 2023, 7:20 AM

reames added inline comments. Mar 10 2023, 7:35 AM

llvm/include/llvm/IR/IRBuilder.h
771 ↗	(On Diff #504122)	This is the wrong interface. The IRBuilder interface should provide a way to create the interleave and deinterleave intrinsic calls. That interface should generate shuffles for fixed vectors. Then the calling logic in the vectorizer should worry about emitting the load/store. (That's the existing structure in fact.)
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2951	The changes to this function are NFC for fixed length vectors, and a generally useful scalable cleanup. Please separate and land this change without the need for further review. This applies only to the changes in this function so as to shrink the diff for future review.
llvm/test/Transforms/LoopVectorize/sve-interleaved-accesses.ll
1 ↗	(On Diff #504122)	This should be in the AArch64 sub-tree, and probably precommitted. Depending on your confidence in the AArch64 code, you may want to separate that into its own review.
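For fixed-length vectors, interleaving and deinterleaving can be expressed as `shufflevector` masks, which is why the builder helpers can emit shuffles there and reserve the intrinsics for scalable types. The sketch below reproduces, as standalone code, the index patterns I understand `createInterleaveMask` and `createStrideMask` from VectorUtils to produce; `interleaveMask`, `strideMask` and `shuffle` are hypothetical names for this illustration, not the LLVM implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mask interleaving NumVecs concatenated vectors of VF elements each:
// <0, VF, 2*VF, ..., 1, VF+1, 2*VF+1, ...>.
std::vector<int> interleaveMask(unsigned VF, unsigned NumVecs) {
  std::vector<int> Mask;
  for (unsigned I = 0; I < VF; ++I)
    for (unsigned J = 0; J < NumVecs; ++J)
      Mask.push_back(static_cast<int>(J * VF + I));
  return Mask;
}

// Mask selecting every Stride-th element starting at Start (deinterleave).
std::vector<int> strideMask(unsigned Start, unsigned Stride, unsigned VF) {
  std::vector<int> Mask;
  for (unsigned I = 0; I < VF; ++I)
    Mask.push_back(static_cast<int>(Start + I * Stride));
  return Mask;
}

// Single-source shufflevector model: Result[i] = Src[Mask[i]].
template <typename T>
std::vector<T> shuffle(const std::vector<T> &Src, const std::vector<int> &Mask) {
  std::vector<T> Result;
  for (int Idx : Mask)
    Result.push_back(Src[static_cast<std::size_t>(Idx)]);
  return Result;
}
```

For VF = 4 and factor 2, `interleaveMask(4, 2)` is `<0, 4, 1, 5, 2, 6, 3, 7>`, and the two stride masks starting at 0 and 1 recover the original members. Scalable vectors cannot express such masks (only splats), hence the intrinsics.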

huntergr mentioned this in rG9aa01c4e8917: [LV] Remove scalable constraints on creating bitcasts. Mar 17 2023, 9:20 AM

huntergr mentioned this in rGfba2a7c6958b: [LV][AArch64] Precommit interleaved access tests. Mar 29 2023, 2:26 AM

Taking this one over from @mgabka

Separated out the bitcast fix and committed. Precommitted tests.

Changed IRBuilder interface to focus on the intrinsics (and fixed-length shuffle equivalents) instead of mixing in loads/stores.

There are a few unit tests which will fail with the new interface -- although we generate the same IR instructions, the order is different. Assuming the interface is suitable, I'll update the tests before posting the next patch revision.

Herald added subscribers: frasercrmck, luismarques, apazos and 20 others. Apr 5 2023, 5:48 AM

huntergr marked 2 inline comments as done. Apr 5 2023, 5:55 AM

huntergr added inline comments.

llvm/tools/llvm-profdata/CMakeLists.txt
7 ↗	(On Diff #511067)	This is due to the fixed-length mask generation code being in VectorUtils. This isn't the only tool affected, though oddly enough I've only observed build failures on X86 and not AArch64 hosts. I've included it as a representative. I would prefer not to make changes to a bunch of cmake files for this, so I'm currently leaning towards either duplicating the mask generation code or moving it into Core. Any preferences from a reviewer?

Harbormaster completed remote builds in B223778: Diff 511067. Apr 5 2023, 6:29 AM

mgabka added inline comments. Apr 6 2023, 3:47 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
190–192	Hi @huntergr, thanks for your changes to this patch! I have one question: the interface you proposed looks clean and nice, but doesn't it force code generation for the deinterleaving/interleaving intrinsics to be implemented before this patch can be merged? The reason I had this option here is that it would allow us to merge this patch before the other pieces are implemented.

huntergr added inline comments. Apr 6 2023, 3:57 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
190–192	Hi @mgabka, we do have code generation for these intrinsics already; they just get lowered to zips/uzps. See D141924. D146218 will match to ld2/st2 where possible (which is what we want), and should perhaps land first. The changes to isLegalInterleavedAccessType will also be needed there, so the next version of this patch can just rely on that.
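As a rough model of the zip/uzp lowering mentioned above, the sketch below shows how a factor-2 interleave or deinterleave maps onto ZIP1/ZIP2 and UZP1/UZP2 when the wide value occupies two registers. Each `std::vector<int>` stands for one register; the helper names are hypothetical, and the sketch ignores element types, predication and the LD2/ST2 matching that D146218 is meant to provide.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<int>;

// ZIP1 model: interleave the low halves of A and B.
Vec zip1(const Vec &A, const Vec &B) {
  Vec R;
  for (std::size_t I = 0; I < A.size() / 2; ++I) {
    R.push_back(A[I]);
    R.push_back(B[I]);
  }
  return R;
}

// ZIP2 model: interleave the high halves of A and B.
Vec zip2(const Vec &A, const Vec &B) {
  Vec R;
  for (std::size_t I = A.size() / 2; I < A.size(); ++I) {
    R.push_back(A[I]);
    R.push_back(B[I]);
  }
  return R;
}

// UZP1/UZP2 model: even (First == 0) or odd (First == 1) elements of A:B.
Vec uzp(const Vec &A, const Vec &B, std::size_t First) {
  Vec Cat(A);
  Cat.insert(Cat.end(), B.begin(), B.end());
  Vec R;
  for (std::size_t I = First; I < Cat.size(); I += 2)
    R.push_back(Cat[I]);
  return R;
}

// interleave2(A, B), split across two registers: {ZIP1(A, B), ZIP2(A, B)}.
std::pair<Vec, Vec> lowerInterleave2(const Vec &A, const Vec &B) {
  return {zip1(A, B), zip2(A, B)};
}

// deinterleave2 of a value held in two registers {Lo, Hi}:
// {UZP1(Lo, Hi), UZP2(Lo, Hi)}.
std::pair<Vec, Vec> lowerDeinterleave2(const Vec &Lo, const Vec &Hi) {
  return {uzp(Lo, Hi, 0), uzp(Lo, Hi, 1)};
}
```

Round-tripping `{1, 3, 5, 7}` and `{2, 4, 6, 8}` through `lowerInterleave2` gives `{1, 2, 3, 4}` and `{5, 6, 7, 8}`, and `lowerDeinterleave2` recovers the inputs.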

mgabka mentioned this in D136153: [AArch64] Allow cost computation for interleaved accesses. Apr 6 2023, 7:02 AM

reames added inline comments. Apr 6 2023, 6:01 PM

llvm/lib/IR/IRBuilder.cpp
1285 ↗	(On Diff #511067)	cast<>
1326 ↗	(On Diff #511067)	Given this assert, we shouldn't need to pass Factor in here at all.
1329 ↗	(On Diff #511067)	cast<>
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2714	Having this be only in the normal load path seems unlikely to be correct. Surely we must also handle masked loads as well?
2779–2780	It looks like you're changing the handling for gaps in the deinterleave. This seems surprising and worth some discussion?

Implemented requested changes to utility functions (casts, removing redundant parameter).
Moved utility functions to VectorUtils instead of IRBuilder; this removes the dependency on the Analysis component that the previous revision introduced for several tools which don't need the functionality.
Updated affected tests.
Full check output included for the strict fadd tests.

Herald added a subscriber: dmgreen. Apr 14 2023, 7:53 AM

huntergr marked 5 inline comments as done. Apr 14 2023, 7:59 AM

huntergr added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2714	This does handle masked loads -- 'NewLoad = Builder.CreateAligned....' is a standalone statement on the else with no opening brace. I've added a blank line to perhaps make that a little more obvious. Unless there's something else I've missed?
2779–2780	That was the result of a bit of overzealous cleanup on my part when removing some code from the original patch; I missed the 'continue'. Reverted.

reames added inline comments. Apr 14 2023, 8:36 AM

llvm/include/llvm/Analysis/VectorUtils.h
596 ↗	(On Diff #513595)	Unless you have plans to reuse these, this is just an implementation detail of the vectorizer. As such, these would be better as static functions in LoopVectorize.cpp
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2714	Yeah, I got confused by the brace style in the code above.
2766	The interface here feels really awkward for fixed length vectors. We have to create this dummy struct type, construct it, destruct it, and we lose the ability to slice out the inactive lanes. I almost wonder if this code would be clearer without the helper function at all. With an explicit version based on scalable type here, we could do a simplified version of this loop with an early return and leave the fixed length codegen unchanged. I'd be tempted to try that and see if the overall code quality looked reasonable. You could also try a lambda which enumerates the active lanes (i.e. doing the shuffle or extract as required), and move the handling of the bitcast and reverse to a callback. This might be too much complexity though.
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
2	Please submit a separate change to autogen this file, and then rebase. Same with the other file you switched to autogen.

Harbormaster completed remote builds in B225622: Diff 513595. Apr 14 2023, 8:59 AM

huntergr mentioned this in rGd8c49d2ac9dd: [LV][AArch64] Autogenerate checks for scalable-strict-fadd.ll (NFC). Apr 18 2023, 2:25 AM

Moved interleaveVectors to a static function in LoopVectorize
Removed deinterleaveVector, inlined intrinsic creation. This means the shuffles for fixed-length loads won't be changed, though we do end up with a little duplication as a result.
Precommitted autogen checks for scalable-strict-fadd.ll

huntergr marked 2 inline comments as done. Apr 18 2023, 4:01 AM

Harbormaster completed remote builds in B226354: Diff 514596. Apr 18 2023, 4:40 AM

Ping?

igor.kirillov added a subscriber: igor.kirillov. May 15 2023, 10:32 AM

igor.kirillov added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14732	Looks like we can use MinElts and avoid duplicating EC.getKnownMinValue()

Rebased, removed redundant call as requested.

huntergr marked an inline comment as done. May 19 2023, 1:50 AM

Harbormaster completed remote builds in B233118: Diff 523695. May 19 2023, 2:41 AM

I can't really comment on the AArch64 parts of this, but the LoopVectorizer bits look entirely reasonable to me at this point.

fhahn added a subscriber: fhahn. May 28 2023, 11:29 AM

fhahn added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14742	Nit: can just return the condition
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
458	Should we have this assert already when constructing the interleave recipe?

Simplified size checking code, changed assert on interleave factor to occur before creating recipes.

The recipe constructor doesn't actually know the VF, so just confirming that the decision for a given scalable VF is to interleave only for factors of 2 should suffice. The code that performs interleaving is still effectively guarded by asserts in call construction that it has the correct number of arguments.

huntergr marked 2 inline comments as done. Jun 1 2023, 1:41 AM

Harbormaster completed remote builds in B235766: Diff 527321. Jun 1 2023, 2:54 AM

igor.kirillov added a child revision: D152258: [LV] Add mask support for vectorizing interleaved groups. Jun 6 2023, 4:30 AM

LGTM, thanks!

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
452	Could you make sure there's a test case for RISCV that covers this case before landing?

This revision is now accepted and ready to land. Jun 6 2023, 1:18 PM

Herald added a subscriber: StephenFan. Jun 6 2023, 1:18 PM

mgabka added inline comments. Jun 7 2023, 1:25 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
461	can you pass here Vals directly?

huntergr added inline comments. Jun 8 2023, 3:11 AM

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
452	I added that check because both interleaved-accesses.ll and strided-accesses.ll (in llvm/test/Transforms/LoopVectorize/RISCV/) crash on the cast below when vplan tries to get the cost of the interleaving group with a scalable VF. Is that sufficient?

This revision was landed with ongoing or failed builds. Jun 9 2023, 3:43 AM

Closed by commit rG95bfb1902db9: [LV][AArch64] Allow (limited) interleaving for scalable vectors (authored by huntergr).

This revision was automatically updated to reflect the committed changes.

huntergr added a commit: rG95bfb1902db9: [LV][AArch64] Allow (limited) interleaving for scalable vectors.

luke mentioned this in D145485: [PoC][IR] Generalize interleave/deinterleave intrinsics to factors > 2. Jun 27 2023, 3:14 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	22 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	12 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp	2 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	98 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll	126 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll	512 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll	85 lines

Diff 523695

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


@@ 14,698 lines not shown @@ bool AArch64TargetLowering::hasPairedLoad(EVT LoadedType,
   return NumBits == 32 || NumBits == 64;
 }

 /// A helper function for determining the number of interleaved accesses we
 /// will generate when lowering accesses of the given type.
 unsigned AArch64TargetLowering::getNumInterleavedAccesses(
     VectorType *VecTy, const DataLayout &DL, bool UseScalable) const {
   unsigned VecSize = 128;
+  unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
+  unsigned MinElts = VecTy->getElementCount().getKnownMinValue();
   if (UseScalable)
     VecSize = std::max(Subtarget->getMinSVEVectorSizeInBits(), 128u);
-  return std::max<unsigned>(1, (DL.getTypeSizeInBits(VecTy) + 127) / VecSize);
+  return std::max<unsigned>(1, (MinElts * ElSize + 127) / VecSize);
 }

 MachineMemOperand::Flags
 AArch64TargetLowering::getTargetMMOFlags(const Instruction &I) const {
   if (Subtarget->getProcFamily() == AArch64Subtarget::Falkor &&
       I.getMetadata(FALKOR_STRIDED_ACCESS_MD) != nullptr)
     return MOStridedAccess;
   return MachineMemOperand::MONone;
 }

 bool AArch64TargetLowering::isLegalInterleavedAccessType(
     VectorType *VecTy, const DataLayout &DL, bool &UseScalable) const {
-
-  unsigned VecSize = DL.getTypeSizeInBits(VecTy);
   unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
-  unsigned NumElements = cast<FixedVectorType>(VecTy)->getNumElements();
+  auto EC = VecTy->getElementCount();
+  unsigned MinElts = EC.getKnownMinValue();

   UseScalable = false;

   // Ensure that the predicate for this number of elements is available.
-  if (Subtarget->hasSVE() && !getSVEPredPatternFromNumElements(NumElements))
+  if (Subtarget->hasSVE() && !getSVEPredPatternFromNumElements(MinElts))
     return false;

   // Ensure the number of vector elements is greater than 1.
-  if (NumElements < 2)
+  if (MinElts < 2)
     return false;

   // Ensure the element type is legal.
   if (ElSize != 8 && ElSize != 16 && ElSize != 32 && ElSize != 64)
     return false;

+  if (EC.isScalable()) {
+    if (MinElts * ElSize == 128)
+      return true;
+    return false;
+  }
+
+  unsigned VecSize = DL.getTypeSizeInBits(VecTy);
   if (Subtarget->forceStreamingCompatibleSVE() ||
       (Subtarget->useSVEForFixedLengthVectors() &&
        (VecSize % Subtarget->getMinSVEVectorSizeInBits() == 0 ||
         (VecSize < Subtarget->getMinSVEVectorSizeInBits() &&
-         isPowerOf2_32(NumElements) && VecSize > 128)))) {
+         isPowerOf2_32(MinElts) && VecSize > 128)))) {
     UseScalable = true;
     return true;
   }

   // Ensure the total vector size is 64 or a multiple of 128. Types larger than
   // 128 will be split into multiple interleaved accesses.
   return VecSize == 64 || VecSize % 128 == 0;
 }
@@ 10,628 lines not shown @@
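The updated `getNumInterleavedAccesses` computes the size from the known-minimum element count rather than from `DL.getTypeSizeInBits(VecTy)`, so the same arithmetic applies to fixed and scalable types. A minimal sketch of that arithmetic, with `MinSVEBits` as a hypothetical stand-in for `Subtarget->getMinSVEVectorSizeInBits()` and `numInterleavedAccesses` as a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>

// Mirrors the updated formula: the size in bits is MinElts * ElSize, divided
// (rounding up) by the register size, with a floor of one access.
unsigned numInterleavedAccesses(unsigned MinElts, unsigned ElSize,
                                bool UseScalable, unsigned MinSVEBits) {
  unsigned VecSize = 128;
  if (UseScalable)
    VecSize = std::max(MinSVEBits, 128u);
  return std::max<unsigned>(1, (MinElts * ElSize + 127) / VecSize);
}
```

For example, a fixed `<8 x i32>` (256 bits) with `UseScalable` false is split into two 128-bit accesses, while with a 256-bit minimum SVE vector length it needs only one.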

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

@@ 181 lines not shown @@

 // Experimental option that will only be fully functional when the cost-model
 // and code-generator have been changed to avoid using scalable vector
 // instructions that are not legal in streaming SVE mode.
 static cl::opt<bool> EnableScalableAutovecInStreamingMode(
     "enable-scalable-autovec-in-streaming-mode", cl::init(false), cl::Hidden);

 bool AArch64TTIImpl::areInlineCompatible(const Function *Caller,
                                          const Function *Callee) const {
   SMEAttrs CallerAttrs(*Caller);
   SMEAttrs CalleeAttrs(*Callee);
   if (CallerAttrs.requiresSMChange(CalleeAttrs,
                                    /*BodyOverridesInterface=*/true) ||
       CallerAttrs.requiresLazySave(CalleeAttrs) ||
       CalleeAttrs.hasNewZAInterface())
     return false;

   const TargetMachine &TM = getTLI()->getTargetMachine();

@@ 2,570 lines not shown @@ InstructionCost AArch64TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Ty,
   return LT.first;
 }

 InstructionCost AArch64TTIImpl::getInterleavedMemoryOpCost(
     unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
     Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
     bool UseMaskForCond, bool UseMaskForGaps) {
   assert(Factor >= 2 && "Invalid interleave factor");
-  auto *VecVTy = cast<FixedVectorType>(VecTy);
+  auto *VecVTy = cast<VectorType>(VecTy);
+
+  if (VecTy->isScalableTy() && (!ST->hasSVE() || Factor != 2))
+    return InstructionCost::getInvalid();

   if (!UseMaskForCond && !UseMaskForGaps &&
       Factor <= TLI->getMaxSupportedInterleaveFactor()) {
-    unsigned NumElts = VecVTy->getNumElements();
+    unsigned MinElts = VecVTy->getElementCount().getKnownMinValue();
     auto *SubVecTy =
-        FixedVectorType::get(VecTy->getScalarType(), NumElts / Factor);
+        VectorType::get(VecVTy->getElementType(),
+                        VecVTy->getElementCount().divideCoefficientBy(Factor));

     // ldN/stN only support legal vector types of size 64 or 128 in bits.
     // Accesses having vector types that are a multiple of 128 bits can be
     // matched to more than one ldN/stN instruction.
     bool UseScalable;
-    if (NumElts % Factor == 0 &&
+    if (MinElts % Factor == 0 &&
         TLI->isLegalInterleavedAccessType(SubVecTy, DL, UseScalable))
       return Factor * TLI->getNumInterleavedAccesses(SubVecTy, DL, UseScalable);
   }

   return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
                                            Alignment, AddressSpace, CostKind,
                                            UseMaskForCond, UseMaskForGaps);
 }
@@ 801 lines not shown @@
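To make the scalable path through the updated `AArch64TTIImpl::getInterleavedMemoryOpCost` concrete, here is a simplified sketch for the non-masked case. `VecType` is a hypothetical stand-in for `VectorType`, the wide type holds `VF * Factor` elements, the SVE predicate-pattern check is left out, and the `Factor <= getMaxSupportedInterleaveFactor()` check is omitted on the assumption that factor 2 always passes it.

```cpp
#include <cassert>
#include <optional>

// Hypothetical vector type description: known-minimum element count,
// element width in bits, and whether the count is scaled by vscale.
struct VecType {
  unsigned MinElts;
  unsigned ElSize;
  bool Scalable;
};

// Models ElementCount::divideCoefficientBy: the known-minimum count is
// divided while the scalable flag is preserved.
VecType subVectorType(const VecType &Wide, unsigned Factor) {
  return {Wide.MinElts / Factor, Wide.ElSize, Wide.Scalable};
}

// std::nullopt means "not costed as ldN/stN": either an invalid cost or the
// generic BaseT fallback, which this sketch does not model.
std::optional<unsigned> scalableLdNStNCost(const VecType &Wide,
                                           unsigned Factor, bool HasSVE) {
  assert(Wide.Scalable && "sketch only covers scalable types");
  if (!HasSVE || Factor != 2)
    return std::nullopt; // InstructionCost::getInvalid()
  if (Wide.MinElts % Factor != 0)
    return std::nullopt;
  VecType Sub = subVectorType(Wide, Factor);
  // Scalable legality in this revision: at least 2 elements, a legal
  // element width, and exactly 128 bits known minimum.
  bool LegalEl = Sub.ElSize == 8 || Sub.ElSize == 16 || Sub.ElSize == 32 ||
                 Sub.ElSize == 64;
  if (Sub.MinElts < 2 || !LegalEl || Sub.MinElts * Sub.ElSize != 128)
    return std::nullopt;
  // getNumInterleavedAccesses with a 128-bit register size.
  unsigned NumAccesses = (Sub.MinElts * Sub.ElSize + 127) / 128;
  return Factor * NumAccesses;
}
```

For `<vscale x 8 x i32>` with factor 2, the sub-vector type is `<vscale x 4 x i32>` (128 bits known minimum), so the access count is 1 and the cost is `Factor * 1 = 2`; `<vscale x 16 x i32>` would need a 256-bit sub-vector and is not costed through this path in this revision.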

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

@@ 442 lines not shown @@ RISCVTTIImpl::getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,

   return getMemoryOpCost(Opcode, Src, Alignment, AddressSpace, CostKind);
 }

 InstructionCost RISCVTTIImpl::getInterleavedMemoryOpCost(
     unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
     Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
     bool UseMaskForCond, bool UseMaskForGaps) {
+  if (isa<ScalableVectorType>(VecTy))
+    return InstructionCost::getInvalid();
   auto *FVTy = cast<FixedVectorType>(VecTy);
   InstructionCost MemCost =
       getMemoryOpCost(Opcode, VecTy, Alignment, AddressSpace, CostKind);
   unsigned VF = FVTy->getNumElements() / Factor;

   // The interleaved memory access pass will lower interleaved memory ops (i.e
   // a load and store followed by a specific shuffle) to vlseg/vsseg
   // intrinsics. In those cases then we can treat it as if it's just one (legal)
@@ 1,254 lines not shown @@

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


Show First 20 Lines • Show All 433 Lines • ▼ Show 20 Lines	static std::optional<unsigned> getSmallBestKnownTC(ScalarEvolution &SE,

// Check if upper bound estimate is known.		// Check if upper bound estimate is known.
if (unsigned ExpectedTC = SE.getSmallConstantMaxTripCount(L))		if (unsigned ExpectedTC = SE.getSmallConstantMaxTripCount(L))
return ExpectedTC;		return ExpectedTC;

return std::nullopt;		return std::nullopt;
}		}

		/// Return a vector containing interleaved elements from multiple
		/// smaller input vectors.
		static Value interleaveVectors(IRBuilderBase &Builder, ArrayRef<Value > Vals,
		const Twine &Name) {
		unsigned Factor = Vals.size();
		assert(Factor > 1 && "Tried to interleave invalid number of vectors");

		VectorType *VecTy = cast<VectorType>(Vals[0]->getType());
		#ifndef NDEBUG
		for (Value *Val : Vals)
		assert(Val->getType() == VecTy && "Tried to interleave mismatched types");
		#endif

		// Scalable vectors cannot use arbitrary shufflevectors (only splats), so
		// must use intrinsics to interleave.
		if (VecTy->isScalableTy()) {
		assert(Factor == 2 && "Unsupported interleave factor for scalable vectors");
		fhahnUnsubmitted Done Reply Inline Actions Should we have this assert already when constructing the interleave recipe? fhahn: Should we have this assert already when constructing the interleave recipe?
		VectorType *WideVecTy = VectorType::getDoubleElementsVectorType(VecTy);
		SmallVector<Value *, 2> Ops(Vals.begin(), Vals.end());
		return Builder.CreateIntrinsic(
		mgabkaUnsubmitted Not Done Reply Inline Actions can you pass here Vals directly? mgabka: can you pass here Vals directly?
		WideVecTy, Intrinsic::experimental_vector_interleave2, Ops,
		/FMFSource=/nullptr, Name);
		}

		// Fixed length. Start by concatenating all vectors into a wide vector.
		Value *WideVec = concatenateVectors(Builder, Vals);

		// Interleave the elements into the wide vector.
		const unsigned NumElts = VecTy->getElementCount().getFixedValue();
		return Builder.CreateShuffleVector(
		WideVec, createInterleaveMask(NumElts, Factor), Name);
		}

namespace {		namespace {
// Forward declare GeneratedRTChecks.		// Forward declare GeneratedRTChecks.
class GeneratedRTChecks;		class GeneratedRTChecks;

using SCEV2ValueTy = DenseMap<const SCEV , Value >;		using SCEV2ValueTy = DenseMap<const SCEV , Value >;
} // namespace		} // namespace

namespace llvm {		namespace llvm {
▲ Show 20 Lines • Show All 2,131 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::vectorizeInterleaveGroup(
VPTransformState &State, VPValue Addr, ArrayRef<VPValue > StoredValues,		VPTransformState &State, VPValue Addr, ArrayRef<VPValue > StoredValues,
VPValue *BlockInMask, bool NeedsMaskForGaps) {		VPValue *BlockInMask, bool NeedsMaskForGaps) {
Instruction *Instr = Group->getInsertPos();		Instruction *Instr = Group->getInsertPos();
const DataLayout &DL = Instr->getModule()->getDataLayout();		const DataLayout &DL = Instr->getModule()->getDataLayout();

// Prepare for the vector type of the interleaved load/store.		// Prepare for the vector type of the interleaved load/store.
Type *ScalarTy = getLoadStoreType(Instr);		Type *ScalarTy = getLoadStoreType(Instr);
unsigned InterleaveFactor = Group->getFactor();		unsigned InterleaveFactor = Group->getFactor();
assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto VecTy = VectorType::get(ScalarTy, VF InterleaveFactor);		auto VecTy = VectorType::get(ScalarTy, VF InterleaveFactor);

// Prepare for the new pointers.		// Prepare for the new pointers.
SmallVector<Value *, 2> AddrParts;		SmallVector<Value *, 2> AddrParts;
unsigned Index = Group->getIndex(Instr);		unsigned Index = Group->getIndex(Instr);

// TODO: extend the masked interleaved-group support to reversed access.		// TODO: extend the masked interleaved-group support to reversed access.
assert((!BlockInMask \|\| !Group->isReverse()) &&		assert((!BlockInMask \|\| !Group->isReverse()) &&
"Reversed masked interleave-group not supported.");		"Reversed masked interleave-group not supported.");

		Value *Idx;
// If the group is reverse, adjust the index to refer to the last vector lane		// If the group is reverse, adjust the index to refer to the last vector lane
// instead of the first. We adjust the index from the first vector lane,		// instead of the first. We adjust the index from the first vector lane,
// rather than directly getting the pointer for lane VF - 1, because the		// rather than directly getting the pointer for lane VF - 1, because the
// pointer operand of the interleaved access is supposed to be uniform. For		// pointer operand of the interleaved access is supposed to be uniform. For
// uniform instructions, we're only required to generate a value for the		// uniform instructions, we're only required to generate a value for the
// first vector lane in each unroll iteration.		// first vector lane in each unroll iteration.
if (Group->isReverse())		if (Group->isReverse()) {
Index += (VF.getKnownMinValue() - 1) * Group->getFactor();		Value *RuntimeVF = getRuntimeVF(Builder, Builder.getInt32Ty(), VF);
		Idx = Builder.CreateSub(RuntimeVF, Builder.getInt32(1));
		Idx = Builder.CreateMul(Idx, Builder.getInt32(Group->getFactor()));
		Idx = Builder.CreateAdd(Idx, Builder.getInt32(Index));
		Idx = Builder.CreateNeg(Idx);
		} else
		Idx = Builder.getInt32(-Index);
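In the new version the reverse-group adjustment is emitted as IR (CreateSub, CreateMul, CreateAdd and CreateNeg on the result of getRuntimeVF) instead of being folded into the constant Index, because the lane count of a scalable VF is only known at run time. The arithmetic is the same in both versions. The sketch below restates it with plain integers; `groupStartOffset` is a hypothetical helper for illustration, not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM API): element offset, in units of the scalar
// type, that rewinds the current member's pointer to member 0 of the group.
//   RuntimeVF: lane count at run time (vscale * known-min VF when scalable)
//   Factor:    interleave factor of the group
//   Index:     position of the current member within the group
std::int64_t groupStartOffset(std::int64_t RuntimeVF, std::int64_t Factor,
                              std::int64_t Index, bool IsReverse) {
  if (!IsReverse)
    return -Index; // e.g. member 2 at A[i+2] is rewound to A[i]
  // Reverse groups are addressed from the last vector lane, so also skip
  // back over (RuntimeVF - 1) whole groups.
  return -((RuntimeVF - 1) * Factor + Index);
}
```

For a fixed VF, RuntimeVF is simply the known-minimum value, which reproduces the old constant `-(Index + (VF - 1) * Factor)`.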

for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
Value *AddrPart = State.get(Addr, VPIteration(Part, 0));		Value *AddrPart = State.get(Addr, VPIteration(Part, 0));
State.setDebugLocFromInst(AddrPart);		State.setDebugLocFromInst(AddrPart);

// Notice current instruction could be any index. Need to adjust the address		// Notice current instruction could be any index. Need to adjust the address
// to the member of index 0.		// to the member of index 0.
//		//
// E.g. a = A[i+1]; // Member of index 1 (Current instruction)		// E.g. a = A[i+1]; // Member of index 1 (Current instruction)
// b = A[i]; // Member of index 0		// b = A[i]; // Member of index 0
// Current pointer is pointed to A[i+1], adjust it to A[i].		// Current pointer is pointed to A[i+1], adjust it to A[i].
//		//
// E.g. A[i+1] = a; // Member of index 1		// E.g. A[i+1] = a; // Member of index 1
// A[i] = b; // Member of index 0		// A[i] = b; // Member of index 0
// A[i+2] = c; // Member of index 2 (Current instruction)		// A[i+2] = c; // Member of index 2 (Current instruction)
// Current pointer is pointed to A[i+2], adjust it to A[i].		// Current pointer is pointed to A[i+2], adjust it to A[i].

bool InBounds = false;		bool InBounds = false;
if (auto *gep = dyn_cast<GetElementPtrInst>(AddrPart->stripPointerCasts()))		if (auto *gep = dyn_cast<GetElementPtrInst>(AddrPart->stripPointerCasts()))
InBounds = gep->isInBounds();		InBounds = gep->isInBounds();
AddrPart = Builder.CreateGEP(ScalarTy, AddrPart, Builder.getInt32(-Index),		AddrPart = Builder.CreateGEP(ScalarTy, AddrPart, Idx, "", InBounds);
"", InBounds);

// Cast to the vector pointer type.		// Cast to the vector pointer type.
unsigned AddressSpace = AddrPart->getType()->getPointerAddressSpace();		unsigned AddressSpace = AddrPart->getType()->getPointerAddressSpace();
Type *PtrTy = VecTy->getPointerTo(AddressSpace);		Type *PtrTy = VecTy->getPointerTo(AddressSpace);
AddrParts.push_back(Builder.CreateBitCast(AddrPart, PtrTy));		AddrParts.push_back(Builder.CreateBitCast(AddrPart, PtrTy));
}		}

State.setDebugLocFromInst(Instr);		State.setDebugLocFromInst(Instr);
[27 lines not shown]	for (unsigned Part = 0; Part < UF; Part++) {
MaskForGaps)		MaskForGaps)
: ShuffledMask;		: ShuffledMask;
}		}
NewLoad =		NewLoad =
Builder.CreateMaskedLoad(VecTy, AddrParts[Part], Group->getAlign(),		Builder.CreateMaskedLoad(VecTy, AddrParts[Part], Group->getAlign(),
GroupMask, PoisonVec, "wide.masked.vec");		GroupMask, PoisonVec, "wide.masked.vec");
}		}
else		else
NewLoad = Builder.CreateAlignedLoad(VecTy, AddrParts[Part],		NewLoad = Builder.CreateAlignedLoad(VecTy, AddrParts[Part],
		luke: Need to check that `Group->getFactor() == 2` here, or that the call to CreateMaskedInterleavedLoad succeeds.
		mgabka: My idea was that it would be up to the hasInterleavedLoad function to make sure that it returns true only when Factor is 2, so I think no extra check is needed.
Group->getAlign(), "wide.vec");		Group->getAlign(), "wide.vec");
Group->addMetadata(NewLoad);		Group->addMetadata(NewLoad);
NewLoads.push_back(NewLoad);		NewLoads.push_back(NewLoad);
		reames: Having this be only in the normal load path seems unlikely to be correct. Surely we must handle masked loads as well?
		huntergr (author): This does handle masked loads: 'NewLoad = Builder.CreateAligned....' is a standalone statement on the else with no opening brace. I've added a blank line to make that a little more obvious. Unless there's something else I've missed?
		reames: Yeah, I got confused by the brace style in the code above.
}		}

		if (VecTy->isScalableTy()) {
		assert(InterleaveFactor == 2 &&
		"Unsupported deinterleave factor for scalable vectors");

		for (unsigned Part = 0; Part < UF; ++Part) {
		// Scalable vectors cannot use arbitrary shufflevectors (only splats),
		// so must use intrinsics to deinterleave.
		Value *DI = Builder.CreateIntrinsic(
		Intrinsic::experimental_vector_deinterleave2, VecTy, NewLoads[Part],
		/*FMFSource=*/nullptr, "strided.vec");
		unsigned J = 0;
		for (unsigned I = 0; I < InterleaveFactor; ++I) {
		Instruction *Member = Group->getMember(I);

		if (!Member)
		continue;

		Value *StridedVec = Builder.CreateExtractValue(DI, I);
		// If this member has different type, cast the result type.
		if (Member->getType() != ScalarTy) {
		VectorType *OtherVTy = VectorType::get(Member->getType(), VF);
		StridedVec = createBitOrPointerCast(StridedVec, OtherVTy, DL);
		}

		if (Group->isReverse())
		StridedVec = Builder.CreateVectorReverse(StridedVec, "reverse");

		State.set(VPDefs[J], StridedVec, Part);
		++J;
		}
		}

		return;
		}
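For a scalable VF the load path cannot build a shufflevector stride mask, so with factor 2 it loads one wide vector with twice as many lanes as VF and splits it with the deinterleave2 intrinsic; member I of the group is then taken with extractvalue index I, and gaps are skipped. The sketch below models the intrinsic's lane semantics on a `std::vector`; `deinterleave2` here is a hypothetical illustration, not the intrinsic's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model (not LLVM API) of llvm.experimental.vector.deinterleave2:
// even lanes of the wide vector form result 0, odd lanes form result 1.
template <typename T>
std::pair<std::vector<T>, std::vector<T>>
deinterleave2(const std::vector<T> &Wide) {
  assert(Wide.size() % 2 == 0 && "wide vector needs an even lane count");
  std::pair<std::vector<T>, std::vector<T>> Res;
  for (std::size_t I = 0; I + 1 < Wide.size(); I += 2) {
    Res.first.push_back(Wide[I]);      // member 0 -> extractvalue index 0
    Res.second.push_back(Wide[I + 1]); // member 1 -> extractvalue index 1
  }
  return Res;
}
```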

// For each member in the group, shuffle out the appropriate data from the		// For each member in the group, shuffle out the appropriate data from the
// wide loads.		// wide loads.
unsigned J = 0;		unsigned J = 0;
for (unsigned I = 0; I < InterleaveFactor; ++I) {		for (unsigned I = 0; I < InterleaveFactor; ++I) {
Instruction *Member = Group->getMember(I);		Instruction *Member = Group->getMember(I);

// Skip the gaps in the group.		// Skip the gaps in the group.
if (!Member)		if (!Member)
continue;		continue;

auto StrideMask =		auto StrideMask =
createStrideMask(I, InterleaveFactor, VF.getKnownMinValue());		createStrideMask(I, InterleaveFactor, VF.getKnownMinValue());
for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
Value *StridedVec = Builder.CreateShuffleVector(		Value *StridedVec = Builder.CreateShuffleVector(
NewLoads[Part], StrideMask, "strided.vec");		NewLoads[Part], StrideMask, "strided.vec");
		reames: The interface here feels really awkward for fixed-length vectors. We have to create this dummy struct type, construct it, destruct it, and we lose the ability to slice out the inactive lanes. I almost wonder if this code would be clearer without the helper function at all. With an explicit version based on the scalable type here, we could do a simplified version of this loop with an early return and leave the fixed-length codegen unchanged. I'd be tempted to try that and see if the overall code quality looks reasonable. You could also try a lambda which enumerates the active lanes (i.e. doing the shuffle or extract as required), and move the handling of the bitcast and reverse to a callback. This might be too much complexity, though.

// If this member has different type, cast the result type.		// If this member has different type, cast the result type.
if (Member->getType() != ScalarTy) {		if (Member->getType() != ScalarTy) {
assert(!VF.isScalable() && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
VectorType *OtherVTy = VectorType::get(Member->getType(), VF);		VectorType *OtherVTy = VectorType::get(Member->getType(), VF);
		luke: It's somehow possible to reach here with a scalable vector type if `TII->hasInterleavedLoad` returns false. Can we check somewhere inside the vectorizer cost model that, if `hasInterleavedLoad` is false, we rule out any recipe with an interleave group for a scalable VF?
		mgabka: This is actually covered by the LV cost model: LoopVectorizationCostModel::getInterleaveGroupCost calls TTI.getInterleavedMemoryOpCost, which should return an invalid cost for factors other than 2.
StridedVec = createBitOrPointerCast(StridedVec, OtherVTy, DL);		StridedVec = createBitOrPointerCast(StridedVec, OtherVTy, DL);
}		}

if (Group->isReverse())		if (Group->isReverse())
StridedVec = Builder.CreateVectorReverse(StridedVec, "reverse");		StridedVec = Builder.CreateVectorReverse(StridedVec, "reverse");

State.set(VPDefs[J], StridedVec, Part);		State.set(VPDefs[J], StridedVec, Part);
}		}
++J;		++J;
		reames: It looks like you're changing the handling for gaps in the deinterleave. This seems surprising and worth some discussion.
		huntergr (author): That was the result of a bit of overzealous cleanup on my part when removing some code from the original patch; I missed the 'continue'. Reverted.
}		}
return;		return;
}		}
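The fixed-length path, by contrast, pulls member I out of each wide load with a shufflevector whose constant mask comes from createStrideMask(I, InterleaveFactor, VF.getKnownMinValue()). Such a mask needs one entry per lane, which a scalable vector cannot provide at compile time; that is why the scalable case takes the separate intrinsic path. A hedged model of the mask and the shuffle follows; both helpers are hypothetical stand-ins, not LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone re-implementation of the index list that LLVM's
// createStrideMask(Start, Stride, VF) produces:
//   <Start, Start + Stride, ..., Start + (VF - 1) * Stride>
std::vector<int> strideMask(unsigned Start, unsigned Stride, unsigned VF) {
  std::vector<int> Mask;
  for (unsigned I = 0; I < VF; ++I)
    Mask.push_back(static_cast<int>(Start + I * Stride));
  return Mask;
}

// Models a single-source shufflevector (second operand poison): result lane K
// is Src[Mask[K]]. All mask indices are assumed to be in range.
template <typename T>
std::vector<T> applyShuffleMask(const std::vector<T> &Src,
                                const std::vector<int> &Mask) {
  std::vector<T> Out;
  for (int M : Mask)
    Out.push_back(Src[static_cast<std::size_t>(M)]);
  return Out;
}
```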

// The sub vector type for current instruction.		// The sub vector type for current instruction.
auto *SubVT = VectorType::get(ScalarTy, VF);		auto *SubVT = VectorType::get(ScalarTy, VF);

// Vectorize the interleaved store group.		// Vectorize the interleaved store group.
[28 lines not shown]	for (unsigned i = 0; i < InterleaveFactor; i++) {
// If this member has different type, cast it to a unified type.		// If this member has different type, cast it to a unified type.

if (StoredVec->getType() != SubVT)		if (StoredVec->getType() != SubVT)
StoredVec = createBitOrPointerCast(StoredVec, SubVT, DL);		StoredVec = createBitOrPointerCast(StoredVec, SubVT, DL);

StoredVecs.push_back(StoredVec);		StoredVecs.push_back(StoredVec);
}		}

// Concatenate all vectors into a wide vector.		// Interleave all the smaller vectors into one wider vector.
Value *WideVec = concatenateVectors(Builder, StoredVecs);		Value *IVec = interleaveVectors(Builder, StoredVecs, "interleaved.vec");
		luke: Need to check `Group->getFactor() == 2` here too.

// Interleave the elements in the wide vector.
Value *IVec = Builder.CreateShuffleVector(
WideVec, createInterleaveMask(VF.getKnownMinValue(), InterleaveFactor),
"interleaved.vec");

Instruction *NewStoreInstr;		Instruction *NewStoreInstr;
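On the store side the diff replaces "concatenate, then shufflevector with createInterleaveMask" by a single interleaveVectors call, whose body is not shown here (presumably it uses the interleave intrinsics from D141924 for scalable types). Whatever the lowering, the result must contain lane I of every member in turn. The sketch reproduces the removed fixed-length form; `interleaveMask` mirrors what createInterleaveMask(VF, NumVecs) returns, but both helpers are hypothetical standalone re-implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone re-implementation of the indices LLVM's
// createInterleaveMask(VF, NumVecs) produces:
//   <0, VF, 2*VF, ..., 1, VF+1, 2*VF+1, ...>
// i.e. lane I of every input vector in turn.
std::vector<int> interleaveMask(unsigned VF, unsigned NumVecs) {
  std::vector<int> Mask;
  for (unsigned I = 0; I < VF; ++I)
    for (unsigned J = 0; J < NumVecs; ++J)
      Mask.push_back(static_cast<int>(J * VF + I));
  return Mask;
}

// The removed fixed-length sequence: concatenate all member vectors (each VF
// lanes long), then shuffle the wide vector with the interleave mask.
template <typename T>
std::vector<T>
interleaveByConcatAndShuffle(const std::vector<std::vector<T>> &Vecs) {
  assert(!Vecs.empty() && "need at least one member vector");
  std::vector<T> Wide;
  for (const auto &V : Vecs)
    Wide.insert(Wide.end(), V.begin(), V.end());
  const auto VF = static_cast<unsigned>(Vecs.front().size());
  std::vector<T> Out;
  for (int M : interleaveMask(VF, static_cast<unsigned>(Vecs.size())))
    Out.push_back(Wide[static_cast<std::size_t>(M)]);
  return Out;
}
```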
if (BlockInMask || MaskForGaps) {		if (BlockInMask || MaskForGaps) {
Value *GroupMask = MaskForGaps;		Value *GroupMask = MaskForGaps;
if (BlockInMask) {		if (BlockInMask) {
Value *BlockInMaskPart = State.get(BlockInMask, Part);		Value *BlockInMaskPart = State.get(BlockInMask, Part);
Value *ShuffledMask = Builder.CreateShuffleVector(		Value *ShuffledMask = Builder.CreateShuffleVector(
BlockInMaskPart,		BlockInMaskPart,
createReplicatedMask(InterleaveFactor, VF.getKnownMinValue()),		createReplicatedMask(InterleaveFactor, VF.getKnownMinValue()),
[108 lines not shown]	if (Cost->requiresScalarEpilogue(VF)) {
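When the store group is predicated, each per-iteration bit of BlockInMask has to cover all InterleaveFactor members stored for that iteration, so the mask is widened with a shufflevector whose indices come from createReplicatedMask(InterleaveFactor, VF.getKnownMinValue()); note that the length of that mask is a compile-time constant. A hypothetical standalone model of the replicated mask and the resulting widened predicate (not LLVM API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone re-implementation of the indices LLVM's
// createReplicatedMask(ReplicationFactor, VF) produces: each lane index
// 0..VF-1 repeated ReplicationFactor times, e.g. <0, 0, 1, 1, 2, 2>.
std::vector<int> replicatedMask(unsigned ReplicationFactor, unsigned VF) {
  std::vector<int> Mask;
  for (unsigned I = 0; I < VF; ++I)
    for (unsigned R = 0; R < ReplicationFactor; ++R)
      Mask.push_back(static_cast<int>(I));
  return Mask;
}

// Widens a per-iteration predicate so every member of the group stored in
// that iteration shares the iteration's bit (the ShuffledMask in the diff).
std::vector<bool> widenBlockMask(const std::vector<bool> &BlockMask,
                                 unsigned Factor) {
  std::vector<bool> Out;
  for (int M : replicatedMask(Factor, static_cast<unsigned>(BlockMask.size())))
    Out.push_back(BlockMask[static_cast<std::size_t>(M)]);
  return Out;
}
```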
R = Builder.CreateSelect(IsZero, Step, R);		R = Builder.CreateSelect(IsZero, Step, R);
}		}

VectorTripCount = Builder.CreateSub(TC, R, "n.vec");		VectorTripCount = Builder.CreateSub(TC, R, "n.vec");

return VectorTripCount;		return VectorTripCount;
}		}
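The visible tail of the trip-count computation rounds TC down to a multiple of Step, and when the cost model requires a scalar epilogue it turns a zero remainder into a full Step so the epilogue still runs at least once. The urem that produces R is in the lines not shown; the sketch assumes R = TC % Step, which matches the N_MOD_VF and N_VEC CHECK lines in the test file. `vectorTripCount` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the vector trip count computation. Assumes
// R = TC % Step (the urem itself is not shown in this diff excerpt).
std::uint64_t vectorTripCount(std::uint64_t TC, std::uint64_t Step,
                              bool RequiresScalarEpilogue) {
  std::uint64_t R = TC % Step;
  // With a required scalar epilogue, a zero remainder becomes a full Step so
  // the epilogue still executes at least one iteration.
  if (RequiresScalarEpilogue && R == 0)
    R = Step;
  return TC - R; // "n.vec"
}
```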

Value *InnerLoopVectorizer::createBitOrPointerCast(Value *V, VectorType *DstVTy,		Value *InnerLoopVectorizer::createBitOrPointerCast(Value *V, VectorType *DstVTy,
		reames: The changes to this function are NFC for fixed-length vectors and are a generally useful scalable cleanup. Please split them out and land them separately without the need for further review. This applies only to the changes in this function, so as to shrink the diff for future review.
const DataLayout &DL) {		const DataLayout &DL) {
// Verify that V is a vector type with same number of elements as DstVTy.		// Verify that V is a vector type with same number of elements as DstVTy.
auto *DstFVTy = cast<VectorType>(DstVTy);		auto *DstFVTy = cast<VectorType>(DstVTy);
auto VF = DstFVTy->getElementCount();		auto VF = DstFVTy->getElementCount();
auto *SrcVecTy = cast<VectorType>(V->getType());		auto *SrcVecTy = cast<VectorType>(V->getType());
assert(VF == SrcVecTy->getElementCount() && "Vector dimensions do not match");		assert(VF == SrcVecTy->getElementCount() && "Vector dimensions do not match");
Type *SrcElemTy = SrcVecTy->getElementType();		Type *SrcElemTy = SrcVecTy->getElementType();
Type *DstElemTy = DstFVTy->getElementType();		Type *DstElemTy = DstFVTy->getElementType();
[3,645 lines not shown]	return TTI.getAddressComputationCost(VectorTy) +
TTI.getGatherScatterOpCost(		TTI.getGatherScatterOpCost(
I->getOpcode(), VectorTy, Ptr, Legal->isMaskRequired(I), Alignment,		I->getOpcode(), VectorTy, Ptr, Legal->isMaskRequired(I), Alignment,
TargetTransformInfo::TCK_RecipThroughput, I);		TargetTransformInfo::TCK_RecipThroughput, I);
}		}

InstructionCost		InstructionCost
LoopVectorizationCostModel::getInterleaveGroupCost(Instruction *I,		LoopVectorizationCostModel::getInterleaveGroupCost(Instruction *I,
ElementCount VF) {		ElementCount VF) {
// TODO: Once we have support for interleaving with scalable vectors
// we can calculate the cost properly here.
if (VF.isScalable())
return InstructionCost::getInvalid();

Type *ValTy = getLoadStoreType(I);		Type *ValTy = getLoadStoreType(I);
auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));		auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));
unsigned AS = getLoadStoreAddressSpace(I);		unsigned AS = getLoadStoreAddressSpace(I);
enum TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;		enum TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

auto Group = getInterleavedAccessGroup(I);		auto Group = getInterleavedAccessGroup(I);
assert(Group && "Fail to get an interleaved access group.");		assert(Group && "Fail to get an interleaved access group.");

[4,066 lines not shown]
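With the early return for scalable VFs removed from getInterleaveGroupCost, the review thread relies on the target's TTI.getInterleavedMemoryOpCost to return an invalid cost for factors other than 2, so that the assert(InterleaveFactor == 2) in the scalable load path should never fire. A minimal sketch of that gating, with `std::optional` standing in for InstructionCost; the function name and the BaseCost parameter are hypothetical, not the AArch64 or RISC-V implementation.

```cpp
#include <cassert>
#include <optional>

// std::nullopt stands in for InstructionCost::getInvalid() (assumption).
using Cost = std::optional<unsigned>;

// Hypothetical sketch of target-side gating as described in review: for
// scalable VFs only factor-2 groups get a valid cost, so the vectorizer never
// picks an interleave recipe the scalable codegen path cannot lower.
// BaseCost is an assumed input, not a real cost formula.
Cost interleavedMemoryOpCostSketch(bool IsScalable, unsigned Factor,
                                   unsigned BaseCost) {
  if (IsScalable && Factor != 2)
    return std::nullopt;
  return BaseCost;
}
```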

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
	; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \			; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \
				reames: Please submit a separate change to autogen this file, and then rebase. Same with the other file you switched to autogen.
	; RUN: -force-ordered-reductions=false -hints-allow-reordering=false -S | FileCheck %s --check-prefix=CHECK-NOT-VECTORIZED			; RUN: -force-ordered-reductions=false -hints-allow-reordering=false -S | FileCheck %s --check-prefix=CHECK-NOT-VECTORIZED
	; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \			; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \
	; RUN: -force-ordered-reductions=false -hints-allow-reordering=true -S | FileCheck %s --check-prefix=CHECK-UNORDERED			; RUN: -force-ordered-reductions=false -hints-allow-reordering=true -S | FileCheck %s --check-prefix=CHECK-UNORDERED
	; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \			; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \
	; RUN: -force-ordered-reductions=true -hints-allow-reordering=false -S | FileCheck %s --check-prefix=CHECK-ORDERED			; RUN: -force-ordered-reductions=true -hints-allow-reordering=false -S | FileCheck %s --check-prefix=CHECK-ORDERED
	; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \			; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \
	; RUN: -force-ordered-reductions=true -hints-allow-reordering=true -S | FileCheck %s --check-prefix=CHECK-UNORDERED			; RUN: -force-ordered-reductions=true -hints-allow-reordering=true -S | FileCheck %s --check-prefix=CHECK-UNORDERED
	; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \			; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \
	[590 lines not shown]
	; CHECK-UNORDERED: vector.ph:			; CHECK-UNORDERED: vector.ph:
	; CHECK-UNORDERED-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-UNORDERED-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-UNORDERED-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 4			; CHECK-UNORDERED-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 4
	; CHECK-UNORDERED-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], [[TMP6]]			; CHECK-UNORDERED-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], [[TMP6]]
	; CHECK-UNORDERED-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]			; CHECK-UNORDERED-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
	; CHECK-UNORDERED-NEXT: [[IND_END:%.*]] = mul i64 [[N_VEC]], 2			; CHECK-UNORDERED-NEXT: [[IND_END:%.*]] = mul i64 [[N_VEC]], 2
	; CHECK-UNORDERED-NEXT: [[TMP7:%.*]] = insertelement <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float -0.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), float [[A2]], i32 0			; CHECK-UNORDERED-NEXT: [[TMP7:%.*]] = insertelement <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float -0.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), float [[A2]], i32 0
	; CHECK-UNORDERED-NEXT: [[TMP8:%.*]] = insertelement <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float -0.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), float [[A1]], i32 0			; CHECK-UNORDERED-NEXT: [[TMP8:%.*]] = insertelement <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float -0.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), float [[A1]], i32 0
	; CHECK-UNORDERED-NEXT: [[TMP9:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-UNORDERED-NEXT: [[TMP10:%.*]] = add <vscale x 4 x i64> [[TMP9]], zeroinitializer
	; CHECK-UNORDERED-NEXT: [[TMP11:%.*]] = mul <vscale x 4 x i64> [[TMP10]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 2, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-UNORDERED-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> zeroinitializer, [[TMP11]]
	; CHECK-UNORDERED-NEXT: [[TMP12:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-UNORDERED-NEXT: [[TMP13:%.*]] = mul i64 [[TMP12]], 4
	; CHECK-UNORDERED-NEXT: [[TMP14:%.*]] = mul i64 2, [[TMP13]]
	; CHECK-UNORDERED-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP14]], i64 0
	; CHECK-UNORDERED-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-UNORDERED-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-UNORDERED-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK-UNORDERED: vector.body:			; CHECK-UNORDERED: vector.body:
	; CHECK-UNORDERED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-UNORDERED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-UNORDERED-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x float> [ [[TMP7]], [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]			; CHECK-UNORDERED-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x float> [ [[TMP7]], [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
	; CHECK-UNORDERED-NEXT: [[VEC_PHI1:%.*]] = phi <vscale x 4 x float> [ [[TMP8]], [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]			; CHECK-UNORDERED-NEXT: [[VEC_PHI1:%.*]] = phi <vscale x 4 x float> [ [[TMP8]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
	; CHECK-UNORDERED-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-UNORDERED-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], 2
	; CHECK-UNORDERED-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[B]], <vscale x 4 x i64> [[VEC_IND]]			; CHECK-UNORDERED-NEXT: [[TMP9:%.*]] = add i64 [[OFFSET_IDX]], 0
	; CHECK-UNORDERED-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> [[TMP15]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> poison)			; CHECK-UNORDERED-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[TMP9]]
	; CHECK-UNORDERED-NEXT: [[TMP16]] = fadd <vscale x 4 x float> [[WIDE_MASKED_GATHER]], [[VEC_PHI1]]			; CHECK-UNORDERED-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float* [[TMP10]], i32 0
	; CHECK-UNORDERED-NEXT: [[TMP17:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-UNORDERED-NEXT: [[TMP12:%.*]] = bitcast float* [[TMP11]] to <vscale x 8 x float>*
	; CHECK-UNORDERED-NEXT: [[TMP18:%.*]] = getelementptr inbounds float, float* [[B]], <vscale x 4 x i64> [[TMP17]]			; CHECK-UNORDERED-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x float>, <vscale x 8 x float>* [[TMP12]], align 4
	; CHECK-UNORDERED-NEXT: [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> [[TMP18]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> poison)			; CHECK-UNORDERED-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float> [[WIDE_VEC]])
	; CHECK-UNORDERED-NEXT: [[TMP19]] = fadd <vscale x 4 x float> [[WIDE_MASKED_GATHER2]], [[VEC_PHI]]			; CHECK-UNORDERED-NEXT: [[TMP13:%.*]] = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } [[STRIDED_VEC]], 0
	; CHECK-UNORDERED-NEXT: [[TMP20:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-UNORDERED-NEXT: [[TMP14:%.*]] = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } [[STRIDED_VEC]], 1
	; CHECK-UNORDERED-NEXT: [[TMP21:%.*]] = mul i64 [[TMP20]], 4			; CHECK-UNORDERED-NEXT: [[TMP15]] = fadd <vscale x 4 x float> [[TMP13]], [[VEC_PHI1]]
	; CHECK-UNORDERED-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP21]]			; CHECK-UNORDERED-NEXT: [[TMP16]] = fadd <vscale x 4 x float> [[TMP14]], [[VEC_PHI]]
	; CHECK-UNORDERED-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-UNORDERED-NEXT: [[TMP17:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-UNORDERED-NEXT: [[TMP22:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-UNORDERED-NEXT: [[TMP18:%.*]] = mul i64 [[TMP17]], 4
	; CHECK-UNORDERED-NEXT: br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; CHECK-UNORDERED-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP18]]
				; CHECK-UNORDERED-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-UNORDERED-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; CHECK-UNORDERED: middle.block:			; CHECK-UNORDERED: middle.block:
	; CHECK-UNORDERED-NEXT: [[TMP23:%.*]] = call float @llvm.vector.reduce.fadd.nxv4f32(float -0.000000e+00, <vscale x 4 x float> [[TMP16]])			; CHECK-UNORDERED-NEXT: [[TMP20:%.*]] = call float @llvm.vector.reduce.fadd.nxv4f32(float -0.000000e+00, <vscale x 4 x float> [[TMP15]])
	; CHECK-UNORDERED-NEXT: [[TMP24:%.*]] = call float @llvm.vector.reduce.fadd.nxv4f32(float -0.000000e+00, <vscale x 4 x float> [[TMP19]])			; CHECK-UNORDERED-NEXT: [[TMP21:%.*]] = call float @llvm.vector.reduce.fadd.nxv4f32(float -0.000000e+00, <vscale x 4 x float> [[TMP16]])
	; CHECK-UNORDERED-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; CHECK-UNORDERED-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; CHECK-UNORDERED-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-UNORDERED-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK-UNORDERED: scalar.ph:			; CHECK-UNORDERED: scalar.ph:
	; CHECK-UNORDERED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-UNORDERED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-UNORDERED-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ [[A2]], [[ENTRY]] ], [ [[TMP24]], [[MIDDLE_BLOCK]] ]			; CHECK-UNORDERED-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ [[A2]], [[ENTRY]] ], [ [[TMP21]], [[MIDDLE_BLOCK]] ]
	; CHECK-UNORDERED-NEXT: [[BC_MERGE_RDX3:%.*]] = phi float [ [[A1]], [[ENTRY]] ], [ [[TMP23]], [[MIDDLE_BLOCK]] ]			; CHECK-UNORDERED-NEXT: [[BC_MERGE_RDX2:%.*]] = phi float [ [[A1]], [[ENTRY]] ], [ [[TMP20]], [[MIDDLE_BLOCK]] ]
	; CHECK-UNORDERED-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-UNORDERED-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK-UNORDERED: for.body:			; CHECK-UNORDERED: for.body:
	; CHECK-UNORDERED-NEXT: [[ADD_PHI1:%.*]] = phi float [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ADD2:%.*]], [[FOR_BODY]] ]			; CHECK-UNORDERED-NEXT: [[ADD_PHI1:%.*]] = phi float [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ADD2:%.*]], [[FOR_BODY]] ]
	; CHECK-UNORDERED-NEXT: [[ADD_PHI2:%.*]] = phi float [ [[BC_MERGE_RDX3]], [[SCALAR_PH]] ], [ [[ADD1:%.*]], [[FOR_BODY]] ]			; CHECK-UNORDERED-NEXT: [[ADD_PHI2:%.*]] = phi float [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[ADD1:%.*]], [[FOR_BODY]] ]
	; CHECK-UNORDERED-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-UNORDERED-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-UNORDERED-NEXT: [[ARRAYIDXB1:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[IV]]			; CHECK-UNORDERED-NEXT: [[ARRAYIDXB1:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[IV]]
	; CHECK-UNORDERED-NEXT: [[TMP25:%.*]] = load float, float* [[ARRAYIDXB1]], align 4			; CHECK-UNORDERED-NEXT: [[TMP22:%.*]] = load float, float* [[ARRAYIDXB1]], align 4
	; CHECK-UNORDERED-NEXT: [[ADD1]] = fadd float [[TMP25]], [[ADD_PHI2]]			; CHECK-UNORDERED-NEXT: [[ADD1]] = fadd float [[TMP22]], [[ADD_PHI2]]
	; CHECK-UNORDERED-NEXT: [[OR:%.*]] = or i64 [[IV]], 1			; CHECK-UNORDERED-NEXT: [[OR:%.*]] = or i64 [[IV]], 1
	; CHECK-UNORDERED-NEXT: [[ARRAYIDXB2:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[OR]]			; CHECK-UNORDERED-NEXT: [[ARRAYIDXB2:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[OR]]
	; CHECK-UNORDERED-NEXT: [[TMP26:%.*]] = load float, float* [[ARRAYIDXB2]], align 4			; CHECK-UNORDERED-NEXT: [[TMP23:%.*]] = load float, float* [[ARRAYIDXB2]], align 4
	; CHECK-UNORDERED-NEXT: [[ADD2]] = fadd float [[TMP26]], [[ADD_PHI1]]			; CHECK-UNORDERED-NEXT: [[ADD2]] = fadd float [[TMP23]], [[ADD_PHI1]]
	; CHECK-UNORDERED-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 2			; CHECK-UNORDERED-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 2
	; CHECK-UNORDERED-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]			; CHECK-UNORDERED-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
	; CHECK-UNORDERED-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; CHECK-UNORDERED-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; CHECK-UNORDERED: for.end:			; CHECK-UNORDERED: for.end:
	; CHECK-UNORDERED-NEXT: [[ADD1_LCSSA:%.*]] = phi float [ [[ADD1]], [[FOR_BODY]] ], [ [[TMP23]], [[MIDDLE_BLOCK]] ]			; CHECK-UNORDERED-NEXT: [[ADD1_LCSSA:%.*]] = phi float [ [[ADD1]], [[FOR_BODY]] ], [ [[TMP20]], [[MIDDLE_BLOCK]] ]
	; CHECK-UNORDERED-NEXT: [[ADD2_LCSSA:%.*]] = phi float [ [[ADD2]], [[FOR_BODY]] ], [ [[TMP24]], [[MIDDLE_BLOCK]] ]			; CHECK-UNORDERED-NEXT: [[ADD2_LCSSA:%.*]] = phi float [ [[ADD2]], [[FOR_BODY]] ], [ [[TMP21]], [[MIDDLE_BLOCK]] ]
	; CHECK-UNORDERED-NEXT: store float [[ADD1_LCSSA]], float* [[A]], align 4			; CHECK-UNORDERED-NEXT: store float [[ADD1_LCSSA]], float* [[A]], align 4
	; CHECK-UNORDERED-NEXT: store float [[ADD2_LCSSA]], float* [[ARRAYIDXA]], align 4			; CHECK-UNORDERED-NEXT: store float [[ADD2_LCSSA]], float* [[ARRAYIDXA]], align 4
	; CHECK-UNORDERED-NEXT: ret void			; CHECK-UNORDERED-NEXT: ret void
	;			;
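In the new CHECK-UNORDERED output the two masked gathers are replaced by one wide load of <vscale x 8 x float> at OFFSET_IDX = INDEX * 2, a deinterleave2 call, and one fadd per extracted half. The sketch below replays that loop body in plain C++ with vscale fixed at 1 (so VF = 4) to show why the even elements of B end up in A[0] and the odd ones in A[1]; `sumEvenOdd` and the fixed vscale are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar replay of the CHECK-UNORDERED vector loop for
// @fadd_strict_interleave with vscale fixed at 1, so VF = 4 and each iteration
// loads 2 * VF = 8 floats. Assumes B.size() is a multiple of 2 * VF, so the
// scalar remainder loop never runs.
std::array<float, 2> sumEvenOdd(const std::vector<float> &B, float A1, float A2) {
  constexpr std::size_t VF = 4;
  std::array<float, VF> AccEven; // VEC_PHI1: splat(-0.0) with A1 in lane 0
  std::array<float, VF> AccOdd;  // VEC_PHI:  splat(-0.0) with A2 in lane 0
  AccEven.fill(-0.0f);
  AccOdd.fill(-0.0f);
  AccEven[0] = A1;
  AccOdd[0] = A2;
  for (std::size_t Index = 0; Index * 2 < B.size(); Index += VF) {
    const float *Wide = &B[Index * 2]; // WIDE_VEC at OFFSET_IDX = INDEX * 2
    for (std::size_t L = 0; L < VF; ++L) {
      AccEven[L] += Wide[2 * L];    // extractvalue 0 of deinterleave2
      AccOdd[L] += Wide[2 * L + 1]; // extractvalue 1 of deinterleave2
    }
  }
  // middle.block: llvm.vector.reduce.fadd with start value -0.0.
  float SumEven = -0.0f, SumOdd = -0.0f;
  for (std::size_t L = 0; L < VF; ++L) {
    SumEven += AccEven[L];
    SumOdd += AccOdd[L];
  }
  return {SumEven, SumOdd}; // stored to A[0] and A[1] respectively
}
```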
	; CHECK-ORDERED-LABEL: define void @fadd_strict_interleave			; CHECK-ORDERED-LABEL: define void @fadd_strict_interleave
	; CHECK-ORDERED-SAME: (float* noalias nocapture readonly [[A:%.*]], float* noalias nocapture readonly [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {			; CHECK-ORDERED-SAME: (float* noalias nocapture readonly [[A:%.*]], float* noalias nocapture readonly [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
	; CHECK-ORDERED-NEXT: entry:			; CHECK-ORDERED-NEXT: entry:
	; CHECK-ORDERED-NEXT: [[ARRAYIDXA:%.*]] = getelementptr inbounds float, float* [[A]], i64 1			; CHECK-ORDERED-NEXT: [[ARRAYIDXA:%.*]] = getelementptr inbounds float, float* [[A]], i64 1
	; CHECK-ORDERED-NEXT: [[A1:%.*]] = load float, float* [[A]], align 4			; CHECK-ORDERED-NEXT: [[A1:%.*]] = load float, float* [[A]], align 4
	; CHECK-ORDERED-NEXT: [[A2:%.*]] = load float, float* [[ARRAYIDXA]], align 4			; CHECK-ORDERED-NEXT: [[A2:%.*]] = load float, float* [[ARRAYIDXA]], align 4
	; CHECK-ORDERED-NEXT: [[TMP0:%.*]] = add i64 [[N]], -2			; CHECK-ORDERED-NEXT: [[TMP0:%.*]] = add i64 [[N]], -2
	; CHECK-ORDERED-NEXT: [[TMP1:%.*]] = lshr i64 [[TMP0]], 1			; CHECK-ORDERED-NEXT: [[TMP1:%.*]] = lshr i64 [[TMP0]], 1
	; CHECK-ORDERED-NEXT: [[TMP2:%.*]] = add nuw i64 [[TMP1]], 1			; CHECK-ORDERED-NEXT: [[TMP2:%.*]] = add nuw i64 [[TMP1]], 1
	; CHECK-ORDERED-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-ORDERED-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-ORDERED-NEXT: [[TMP4:%.*]] = mul i64 [[TMP3]], 4			; CHECK-ORDERED-NEXT: [[TMP4:%.*]] = mul i64 [[TMP3]], 4
	; CHECK-ORDERED-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], [[TMP4]]			; CHECK-ORDERED-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], [[TMP4]]
	; CHECK-ORDERED-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-ORDERED-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK-ORDERED: vector.ph:			; CHECK-ORDERED: vector.ph:
	; CHECK-ORDERED-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-ORDERED-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-ORDERED-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 4			; CHECK-ORDERED-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 4
	; CHECK-ORDERED-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], [[TMP6]]			; CHECK-ORDERED-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], [[TMP6]]
	; CHECK-ORDERED-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]			; CHECK-ORDERED-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
	; CHECK-ORDERED-NEXT: [[IND_END:%.*]] = mul i64 [[N_VEC]], 2			; CHECK-ORDERED-NEXT: [[IND_END:%.*]] = mul i64 [[N_VEC]], 2
	; CHECK-ORDERED-NEXT: [[TMP7:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-ORDERED-NEXT: [[TMP8:%.*]] = add <vscale x 4 x i64> [[TMP7]], zeroinitializer
	; CHECK-ORDERED-NEXT: [[TMP9:%.*]] = mul <vscale x 4 x i64> [[TMP8]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 2, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-ORDERED-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> zeroinitializer, [[TMP9]]
	; CHECK-ORDERED-NEXT: [[TMP10:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-ORDERED-NEXT: [[TMP11:%.*]] = mul i64 [[TMP10]], 4
	; CHECK-ORDERED-NEXT: [[TMP12:%.*]] = mul i64 2, [[TMP11]]
	; CHECK-ORDERED-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP12]], i64 0
	; CHECK-ORDERED-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-ORDERED-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-ORDERED-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK-ORDERED: vector.body:			; CHECK-ORDERED: vector.body:
	; CHECK-ORDERED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-ORDERED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-ORDERED-NEXT: [[VEC_PHI:%.*]] = phi float [ [[A2]], [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]			; CHECK-ORDERED-NEXT: [[VEC_PHI:%.*]] = phi float [ [[A2]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
	; CHECK-ORDERED-NEXT: [[VEC_PHI1:%.*]] = phi float [ [[A1]], [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]			; CHECK-ORDERED-NEXT: [[VEC_PHI1:%.*]] = phi float [ [[A1]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
	; CHECK-ORDERED-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-ORDERED-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], 2
	; CHECK-ORDERED-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[B]], <vscale x 4 x i64> [[VEC_IND]]			; CHECK-ORDERED-NEXT: [[TMP7:%.*]] = add i64 [[OFFSET_IDX]], 0
	; CHECK-ORDERED-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> [[TMP13]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> poison)			; CHECK-ORDERED-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[TMP7]]
	; CHECK-ORDERED-NEXT: [[TMP14:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-ORDERED-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP8]], i32 0
	; CHECK-ORDERED-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[B]], <vscale x 4 x i64> [[TMP14]]			; CHECK-ORDERED-NEXT: [[TMP10:%.*]] = bitcast float* [[TMP9]] to <vscale x 8 x float>*
	; CHECK-ORDERED-NEXT: [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> [[TMP15]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> poison)			; CHECK-ORDERED-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x float>, <vscale x 8 x float>* [[TMP10]], align 4
	; CHECK-ORDERED-NEXT: [[TMP16]] = call float @llvm.vector.reduce.fadd.nxv4f32(float [[VEC_PHI]], <vscale x 4 x float> [[WIDE_MASKED_GATHER2]])			; CHECK-ORDERED-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float> [[WIDE_VEC]])
	; CHECK-ORDERED-NEXT: [[TMP17]] = call float @llvm.vector.reduce.fadd.nxv4f32(float [[VEC_PHI1]], <vscale x 4 x float> [[WIDE_MASKED_GATHER]])			; CHECK-ORDERED-NEXT: [[TMP11:%.*]] = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } [[STRIDED_VEC]], 0
	; CHECK-ORDERED-NEXT: [[TMP18:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-ORDERED-NEXT: [[TMP12:%.*]] = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } [[STRIDED_VEC]], 1
	; CHECK-ORDERED-NEXT: [[TMP19:%.*]] = mul i64 [[TMP18]], 4			; CHECK-ORDERED-NEXT: [[TMP13]] = call float @llvm.vector.reduce.fadd.nxv4f32(float [[VEC_PHI]], <vscale x 4 x float> [[TMP12]])
	; CHECK-ORDERED-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP19]]			; CHECK-ORDERED-NEXT: [[TMP14]] = call float @llvm.vector.reduce.fadd.nxv4f32(float [[VEC_PHI1]], <vscale x 4 x float> [[TMP11]])
	; CHECK-ORDERED-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-ORDERED-NEXT: [[TMP15:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-ORDERED-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-ORDERED-NEXT: [[TMP16:%.*]] = mul i64 [[TMP15]], 4
	; CHECK-ORDERED-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; CHECK-ORDERED-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP16]]
				; CHECK-ORDERED-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-ORDERED-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; CHECK-ORDERED: middle.block:			; CHECK-ORDERED: middle.block:
	; CHECK-ORDERED-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; CHECK-ORDERED-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; CHECK-ORDERED-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-ORDERED-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK-ORDERED: scalar.ph:			; CHECK-ORDERED: scalar.ph:
	; CHECK-ORDERED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-ORDERED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-ORDERED-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ [[A2]], [[ENTRY]] ], [ [[TMP16]], [[MIDDLE_BLOCK]] ]			; CHECK-ORDERED-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ [[A2]], [[ENTRY]] ], [ [[TMP13]], [[MIDDLE_BLOCK]] ]
	; CHECK-ORDERED-NEXT: [[BC_MERGE_RDX3:%.*]] = phi float [ [[A1]], [[ENTRY]] ], [ [[TMP17]], [[MIDDLE_BLOCK]] ]			; CHECK-ORDERED-NEXT: [[BC_MERGE_RDX2:%.*]] = phi float [ [[A1]], [[ENTRY]] ], [ [[TMP14]], [[MIDDLE_BLOCK]] ]
	; CHECK-ORDERED-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-ORDERED-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK-ORDERED: for.body:			; CHECK-ORDERED: for.body:
	; CHECK-ORDERED-NEXT: [[ADD_PHI1:%.*]] = phi float [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ADD2:%.*]], [[FOR_BODY]] ]			; CHECK-ORDERED-NEXT: [[ADD_PHI1:%.*]] = phi float [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ADD2:%.*]], [[FOR_BODY]] ]
	; CHECK-ORDERED-NEXT: [[ADD_PHI2:%.*]] = phi float [ [[BC_MERGE_RDX3]], [[SCALAR_PH]] ], [ [[ADD1:%.*]], [[FOR_BODY]] ]			; CHECK-ORDERED-NEXT: [[ADD_PHI2:%.*]] = phi float [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[ADD1:%.*]], [[FOR_BODY]] ]
	; CHECK-ORDERED-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-ORDERED-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-ORDERED-NEXT: [[ARRAYIDXB1:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[IV]]			; CHECK-ORDERED-NEXT: [[ARRAYIDXB1:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[IV]]
	; CHECK-ORDERED-NEXT: [[TMP21:%.*]] = load float, float* [[ARRAYIDXB1]], align 4			; CHECK-ORDERED-NEXT: [[TMP18:%.*]] = load float, float* [[ARRAYIDXB1]], align 4
	; CHECK-ORDERED-NEXT: [[ADD1]] = fadd float [[TMP21]], [[ADD_PHI2]]			; CHECK-ORDERED-NEXT: [[ADD1]] = fadd float [[TMP18]], [[ADD_PHI2]]
	; CHECK-ORDERED-NEXT: [[OR:%.*]] = or i64 [[IV]], 1			; CHECK-ORDERED-NEXT: [[OR:%.*]] = or i64 [[IV]], 1
	; CHECK-ORDERED-NEXT: [[ARRAYIDXB2:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[OR]]			; CHECK-ORDERED-NEXT: [[ARRAYIDXB2:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[OR]]
	; CHECK-ORDERED-NEXT: [[TMP22:%.*]] = load float, float* [[ARRAYIDXB2]], align 4			; CHECK-ORDERED-NEXT: [[TMP19:%.*]] = load float, float* [[ARRAYIDXB2]], align 4
	; CHECK-ORDERED-NEXT: [[ADD2]] = fadd float [[TMP22]], [[ADD_PHI1]]			; CHECK-ORDERED-NEXT: [[ADD2]] = fadd float [[TMP19]], [[ADD_PHI1]]
	; CHECK-ORDERED-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 2			; CHECK-ORDERED-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 2
	; CHECK-ORDERED-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]			; CHECK-ORDERED-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
	; CHECK-ORDERED-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; CHECK-ORDERED-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; CHECK-ORDERED: for.end:			; CHECK-ORDERED: for.end:
	; CHECK-ORDERED-NEXT: [[ADD1_LCSSA:%.*]] = phi float [ [[ADD1]], [[FOR_BODY]] ], [ [[TMP17]], [[MIDDLE_BLOCK]] ]			; CHECK-ORDERED-NEXT: [[ADD1_LCSSA:%.*]] = phi float [ [[ADD1]], [[FOR_BODY]] ], [ [[TMP14]], [[MIDDLE_BLOCK]] ]
	; CHECK-ORDERED-NEXT: [[ADD2_LCSSA:%.*]] = phi float [ [[ADD2]], [[FOR_BODY]] ], [ [[TMP16]], [[MIDDLE_BLOCK]] ]			; CHECK-ORDERED-NEXT: [[ADD2_LCSSA:%.*]] = phi float [ [[ADD2]], [[FOR_BODY]] ], [ [[TMP13]], [[MIDDLE_BLOCK]] ]
	; CHECK-ORDERED-NEXT: store float [[ADD1_LCSSA]], float* [[A]], align 4			; CHECK-ORDERED-NEXT: store float [[ADD1_LCSSA]], float* [[A]], align 4
	; CHECK-ORDERED-NEXT: store float [[ADD2_LCSSA]], float* [[ARRAYIDXA]], align 4			; CHECK-ORDERED-NEXT: store float [[ADD2_LCSSA]], float* [[ARRAYIDXA]], align 4
	; CHECK-ORDERED-NEXT: ret void			; CHECK-ORDERED-NEXT: ret void
	;			;
	; CHECK-ORDERED-TF-LABEL: define void @fadd_strict_interleave			; CHECK-ORDERED-TF-LABEL: define void @fadd_strict_interleave
	; CHECK-ORDERED-TF-SAME: (float* noalias nocapture readonly [[A:%.*]], float* noalias nocapture readonly [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {			; CHECK-ORDERED-TF-SAME: (float* noalias nocapture readonly [[A:%.*]], float* noalias nocapture readonly [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
	; CHECK-ORDERED-TF-NEXT: entry:			; CHECK-ORDERED-TF-NEXT: entry:
	; CHECK-ORDERED-TF-NEXT: [[ARRAYIDXA:%.*]] = getelementptr inbounds float, float* [[A]], i64 1			; CHECK-ORDERED-TF-NEXT: [[ARRAYIDXA:%.*]] = getelementptr inbounds float, float* [[A]], i64 1
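In the CHECK-ORDERED output above, the two masked gathers per iteration are replaced by one contiguous `<vscale x 8 x float>` load that `llvm.experimental.vector.deinterleave2` splits into even lanes (result 0) and odd lanes (result 1). Each half is still reduced with an in-order `llvm.vector.reduce.fadd`, so the additions happen in the same order as in the scalar loop. The following is a minimal C++ scalar model of that equivalence, not LLVM code: the helper names are hypothetical, `vscale` is an ordinary parameter, the source loop is reconstructed from the scalar `for.body` CHECK lines, and `N` is assumed even.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Model of llvm.experimental.vector.deinterleave2: splits a 2*N-element
// vector into its even-indexed lanes (first) and odd-indexed lanes (second).
template <typename T>
std::pair<std::vector<T>, std::vector<T>> deinterleave2(const std::vector<T> &wide) {
  std::pair<std::vector<T>, std::vector<T>> halves;
  for (std::size_t i = 0; i + 1 < wide.size(); i += 2) {
    halves.first.push_back(wide[i]);
    halves.second.push_back(wide[i + 1]);
  }
  return halves;
}

// Model of an ordered (non-reassociating) fadd reduction: lanes are added
// strictly left to right, starting from the incoming accumulator.
float reduceFAddOrdered(float acc, const std::vector<float> &lanes) {
  for (float x : lanes) acc += x;
  return acc;
}

// Hypothetical scalar reference: even elements of b accumulate into a1
// (A[0]), odd elements into a2 (A[1]).
std::pair<float, float> scalarFaddInterleave(const std::vector<float> &b, float a1, float a2) {
  for (std::size_t iv = 0; iv + 1 < b.size(); iv += 2) {
    a1 += b[iv];
    a2 += b[iv | 1];
  }
  return {a1, a2};
}

// Hypothetical model of the new vector.body: VF = 4 * vscale groups per
// iteration, one wide load, deinterleave, two ordered reductions, then a
// scalar epilogue for the groups left over after N_VEC.
std::pair<float, float> vectorFaddInterleave(const std::vector<float> &b, float a1, float a2,
                                             std::size_t vscale) {
  const std::size_t vf = 4 * vscale;
  const std::size_t groups = b.size() / 2;       // scalar trip count
  const std::size_t nVec = groups - groups % vf; // N_VEC
  for (std::size_t index = 0; index < nVec; index += vf) {
    const std::vector<float> wide(b.begin() + 2 * index, b.begin() + 2 * (index + vf));
    const auto halves = deinterleave2(wide);
    a1 = reduceFAddOrdered(a1, halves.first);  // extractvalue ..., 0
    a2 = reduceFAddOrdered(a2, halves.second); // extractvalue ..., 1
  }
  for (std::size_t iv = 2 * nVec; iv + 1 < b.size(); iv += 2) { // scalar epilogue
    a1 += b[iv];
    a2 += b[iv | 1];
  }
  return {a1, a2};
}
```

Because every value is added in the same sequence in both versions, the results can be compared with exact `==` rather than a tolerance, which is the property the strict (ordered) reduction has to keep.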

llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll

	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2			; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 512, [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 512, [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 512, [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 512, [[N_MOD_VF]]
	; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1			; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP3:%.*]] = shl <vscale x 4 x i64> [[TMP2]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[TMP4]], 3
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP5]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[C:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[C:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[D:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[D:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT2]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP3]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP6]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP2]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, <vscale x 4 x i64> [[TMP7]]			; CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP8]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 1
	; CHECK-NEXT: [[TMP10:%.*]] = mul nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER1]], [[BROADCAST_SPLAT3]]			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <vscale x 4 x i32> [[TMP3]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[TMP7:%.*]] = mul nsw <vscale x 4 x i32> [[TMP4]], [[BROADCAST_SPLAT2]]
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP9]], <vscale x 4 x ptr> [[TMP11]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[TMP5]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, <vscale x 4 x i64> [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP8]], i64 -1
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP10]], <vscale x 4 x ptr> [[TMP12]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[TMP6]], <vscale x 4 x i32> [[TMP7]])
	; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP9]], align 4
	; CHECK-NEXT: [[TMP14:%.*]] = shl nuw nsw i64 [[TMP13]], 2			; CHECK-NEXT: [[TMP10:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP14]]			; CHECK-NEXT: [[TMP11:%.*]] = shl nuw nsw i64 [[TMP10]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP11]]
	; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
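In the first sve-interleaved-accesses.ll hunk, the two i32 masked scatters are replaced by `llvm.experimental.vector.interleave2` followed by a single wide store. The store address is computed from the index of the second group member (`OFFSET_IDX | 1`) and then stepped back one element with `getelementptr ..., i64 -1`; because `OFFSET_IDX` is even, this lands on `OFFSET_IDX`, the start of the group. Below is a minimal C++ model of that store path, assuming the loop computes `CD[i] = AB[i] + C; CD[i + 1] = AB[i + 1] * D` for even `i` over 1024 elements (reconstructed from the vector.body CHECK lines and the `urem i64 512` trip count). The helper names are hypothetical and `vscale` is passed as a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of llvm.experimental.vector.interleave2: merges two N-element vectors
// into one 2*N-element vector with lanes a0, b0, a1, b1, ...
template <typename T>
std::vector<T> interleave2(const std::vector<T> &a, const std::vector<T> &b) {
  std::vector<T> wide;
  for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
    wide.push_back(a[i]);
    wide.push_back(b[i]);
  }
  return wide;
}

// Hypothetical scalar reference: CD[i] = AB[i] + c; CD[i + 1] = AB[i + 1] * d.
void scalarStore2(const std::vector<int32_t> &ab, std::vector<int32_t> &cd,
                  int32_t c, int32_t d) {
  for (std::size_t i = 0; i + 1 < ab.size(); i += 2) {
    cd[i] = ab[i] + c;
    cd[i + 1] = ab[i + 1] * d;
  }
}

// Hypothetical model of the new vector.body: VF = 4 * vscale groups per
// iteration, one wide store per iteration, scalar epilogue for the rest.
void vectorStore2(const std::vector<int32_t> &ab, std::vector<int32_t> &cd,
                  int32_t c, int32_t d, std::size_t vscale) {
  const std::size_t vf = 4 * vscale;
  const std::size_t groups = ab.size() / 2;      // 512 in the test
  const std::size_t nVec = groups - groups % vf; // N_VEC
  for (std::size_t index = 0; index < nVec; index += vf) {
    const std::size_t offsetIdx = index << 1; // OFFSET_IDX = INDEX << 1
    // Even/odd members, as deinterleave2 of the wide load would produce,
    // with the per-member add and mul applied.
    std::vector<int32_t> even, odd;
    for (std::size_t lane = 0; lane < vf; ++lane) {
      even.push_back(ab[offsetIdx + 2 * lane] + c);
      odd.push_back(ab[offsetIdx + 2 * lane + 1] * d);
    }
    // Index of member 1 (OFFSET_IDX | 1) stepped back by one element.
    const std::size_t base = (offsetIdx | 1) - 1;
    assert(base == offsetIdx);
    const std::vector<int32_t> wide = interleave2(even, odd);
    for (std::size_t k = 0; k < wide.size(); ++k)
      cd[base + k] = wide[k];
  }
  for (std::size_t i = 2 * nVec; i + 1 < ab.size(); i += 2) { // scalar epilogue
    cd[i] = ab[i] + c;
    cd[i + 1] = ab[i + 1] * d;
  }
}
```

Comparing `scalarStore2` and `vectorStore2` over several `vscale` values exercises both the case where 512 is a multiple of VF and the case where the scalar epilogue runs.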
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP3]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP3]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, <vscale x 4 x i64> [[VEC_IND]]
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP6]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i16> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP6]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i16> poison)
	; CHECK-NEXT: [[TMP7:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP7:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, <vscale x 4 x i64> [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, <vscale x 4 x i64> [[TMP7]]
	; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP8]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i16> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP8]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i16> poison)
	; CHECK-NEXT: [[TMP9:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER]] to <vscale x 4 x i32>			; CHECK-NEXT: [[TMP9:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER]] to <vscale x 4 x i32>
	; CHECK-NEXT: [[TMP10:%.*]] = add nsw <vscale x 4 x i32> [[BROADCAST_SPLAT]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add nsw <vscale x 4 x i32> [[BROADCAST_SPLAT]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[TMP11:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER1]] to <vscale x 4 x i32>
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP10]], <vscale x 4 x ptr> [[TMP11]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[TMP12:%.*]] = mul nsw <vscale x 4 x i32> [[BROADCAST_SPLAT3]], [[TMP11]]
	; CHECK-NEXT: [[TMP12:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER1]] to <vscale x 4 x i32>			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <vscale x 4 x i64> [[TMP7]], i64 0
	; CHECK-NEXT: [[TMP13:%.*]] = mul nsw <vscale x 4 x i32> [[BROADCAST_SPLAT3]], [[TMP12]]			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[TMP13]]
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, <vscale x 4 x i64> [[TMP7]]			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i64 -1
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP13]], <vscale x 4 x ptr> [[TMP14]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[TMP10]], <vscale x 4 x i32> [[TMP12]])
	; CHECK-NEXT: [[TMP15:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP15]], align 4
	; CHECK-NEXT: [[TMP16:%.*]] = shl nuw nsw i64 [[TMP15]], 2			; CHECK-NEXT: [[TMP16:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP16]]			; CHECK-NEXT: [[TMP17:%.*]] = shl nuw nsw i64 [[TMP16]], 2
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP17]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP18:%.*]] = load i16, ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP19:%.*]] = load i16, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[TMP19:%.*]] = or i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP20:%.*]] = or i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, i64 [[TMP19]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, i64 [[TMP20]]
	; CHECK-NEXT: [[TMP20:%.*]] = load i16, ptr [[ARRAYIDX2]], align 2			; CHECK-NEXT: [[TMP21:%.*]] = load i16, ptr [[ARRAYIDX2]], align 2
	; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[TMP18]] to i32			; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[TMP19]] to i32
	; CHECK-NEXT: [[ADD3:%.*]] = add nsw i32 [[CONV]], [[C]]			; CHECK-NEXT: [[ADD3:%.*]] = add nsw i32 [[CONV]], [[C]]
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i32 [[ADD3]], ptr [[ARRAYIDX5]], align 4			; CHECK-NEXT: store i32 [[ADD3]], ptr [[ARRAYIDX5]], align 4
	; CHECK-NEXT: [[CONV6:%.*]] = sext i16 [[TMP20]] to i32			; CHECK-NEXT: [[CONV6:%.*]] = sext i16 [[TMP21]] to i32
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[CONV6]], [[D]]			; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[CONV6]], [[D]]
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[TMP19]]			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[TMP20]]
	; CHECK-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX9]], align 4			; CHECK-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX9]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV]], 1022			; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV]], 1022
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP5:![0-9]+]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()			; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP3:%.*]] = shl <vscale x 4 x i64> [[TMP2]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP3:%.*]] = shl <vscale x 4 x i64> [[TMP2]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[TMP4]], 3			; CHECK-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[TMP4]], 3
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP5]], i64 0			; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP5]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[C:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[C:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[D:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[D:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT2]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP3]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP3]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP6]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[TMP7:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP6]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, <vscale x 4 x i64> [[TMP7]]			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP8]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP7:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP8:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[TMP10:%.*]] = trunc <vscale x 4 x i32> [[TMP9]] to <vscale x 4 x i16>			; CHECK-NEXT: [[TMP9:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[TMP10:%.*]] = add nsw <vscale x 4 x i32> [[TMP7]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i16.nxv4p0(<vscale x 4 x i16> [[TMP10]], <vscale x 4 x ptr> [[TMP11]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[TMP11:%.*]] = trunc <vscale x 4 x i32> [[TMP10]] to <vscale x 4 x i16>
	; CHECK-NEXT: [[TMP12:%.*]] = mul nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER1]], [[BROADCAST_SPLAT3]]			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, <vscale x 4 x i64> [[VEC_IND]]
	; CHECK-NEXT: [[TMP13:%.*]] = trunc <vscale x 4 x i32> [[TMP12]] to <vscale x 4 x i16>			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i16.nxv4p0(<vscale x 4 x i16> [[TMP11]], <vscale x 4 x ptr> [[TMP12]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, <vscale x 4 x i64> [[TMP7]]			; CHECK-NEXT: [[TMP13:%.*]] = mul nsw <vscale x 4 x i32> [[TMP8]], [[BROADCAST_SPLAT2]]
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i16.nxv4p0(<vscale x 4 x i16> [[TMP13]], <vscale x 4 x ptr> [[TMP14]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[TMP14:%.*]] = trunc <vscale x 4 x i32> [[TMP13]] to <vscale x 4 x i16>
	; CHECK-NEXT: [[TMP15:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, <vscale x 4 x i64> [[TMP9]]
	; CHECK-NEXT: [[TMP16:%.*]] = shl nuw nsw i64 [[TMP15]], 2			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i16.nxv4p0(<vscale x 4 x i16> [[TMP14]], <vscale x 4 x ptr> [[TMP15]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP16]]			; CHECK-NEXT: [[TMP16:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP17:%.*]] = shl nuw nsw i64 [[TMP16]], 2
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP17]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP18:%.*]] = load i32, ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP19:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[TMP19:%.*]] = or i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP20:%.*]] = or i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[TMP19]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[TMP20]]
	; CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[ARRAYIDX2]], align 4			; CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[ADD3:%.*]] = add nsw i32 [[TMP18]], [[C]]			; CHECK-NEXT: [[ADD3:%.*]] = add nsw i32 [[TMP19]], [[C]]
	; CHECK-NEXT: [[CONV:%.*]] = trunc i32 [[ADD3]] to i16			; CHECK-NEXT: [[CONV:%.*]] = trunc i32 [[ADD3]] to i16
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i16 [[CONV]], ptr [[ARRAYIDX5]], align 2			; CHECK-NEXT: store i16 [[CONV]], ptr [[ARRAYIDX5]], align 2
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP20]], [[D]]			; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP21]], [[D]]
	; CHECK-NEXT: [[CONV6:%.*]] = trunc i32 [[MUL]] to i16			; CHECK-NEXT: [[CONV6:%.*]] = trunc i32 [[MUL]] to i16
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, i64 [[TMP19]]			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, i64 [[TMP20]]
	; CHECK-NEXT: store i16 [[CONV6]], ptr [[ARRAYIDX9]], align 2			; CHECK-NEXT: store i16 [[CONV6]], ptr [[ARRAYIDX9]], align 2
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV]], 1022			; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV]], 1022
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP7:![0-9]+]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP7:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
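In this hunk the two masked gathers from @AB (through VEC_IND and VEC_IND | 1) are replaced by one wide <vscale x 8 x i32> load followed by llvm.experimental.vector.deinterleave2, while the stores to @CD_i16 remain scatters. The sketch below is only an illustrative model of the intrinsic's lane semantics, assuming a fixed-length std::vector as a stand-in for a scalable vector; the helper name deinterleave2 is hypothetical and is not LLVM code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative model only (hypothetical helper, not an LLVM API): a
// std::vector stands in for a <vscale x 8 x i32> value. Lanes with an even
// index go to the first result and lanes with an odd index to the second,
// mirroring the extractvalue indices 0 and 1 in the checks.
std::pair<std::vector<int32_t>, std::vector<int32_t>>
deinterleave2(const std::vector<int32_t>& wide) {
  std::vector<int32_t> even;
  std::vector<int32_t> odd;
  for (std::size_t i = 0; i < wide.size(); ++i) {
    if (i % 2 == 0)
      even.push_back(wide[i]);
    else
      odd.push_back(wide[i]);
  }
  return {even, odd};
}
```

For an @AB layout of {a0, b0, a1, b1, ...}, the first result holds the values the old code gathered through VEC_IND and the second those gathered through VEC_IND | 1.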
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2			; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 1024, [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 1024, [[N_MOD_VF]]
	; CHECK-NEXT: [[IND_END:%.*]] = add nsw i64 [[N_MOD_VF]], -1			; CHECK-NEXT: [[IND_END:%.*]] = add nsw i64 [[N_MOD_VF]], -1
	; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()			; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
	; CHECK-NEXT: [[INDUCTION:%.*]] = sub <vscale x 4 x i64> shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1023, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer), [[TMP2]]			; CHECK-NEXT: [[INDUCTION:%.*]] = sub <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1023, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), [[TMP2]]
	; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vscale.i32()
	; CHECK-NEXT: [[DOTNEG:%.*]] = mul nsw i64 [[TMP3]], -4			; CHECK-NEXT: [[DOTNEG:%.*]] = mul nsw i32 [[TMP3]], -4
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[DOTNEG]], i64 0			; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[DOTNEG]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[DOTSPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
	; CHECK-NEXT: [[INDUCTION1:%.*]] = sub <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1023, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vscale.i32()
	; CHECK-NEXT: [[DOTNEG7:%.*]] = mul nsw i32 [[TMP5]], -4
	; CHECK-NEXT: [[DOTSPLATINSERT2:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[DOTNEG7]], i64 0
	; CHECK-NEXT: [[DOTSPLAT3:%.*]] = shufflevector <vscale x 4 x i32> [[DOTSPLATINSERT2]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i32> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND4:%.*]] = phi <vscale x 4 x i32> [ [[INDUCTION1]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = sub i64 1023, [[INDEX]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT_ST2:%.*]], ptr [[A:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_ST2:%.*]], ptr [[A:%.*]], i64 [[OFFSET_IDX]], i32 0
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP6]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vscale.i32()
	; CHECK-NEXT: [[TMP7:%.*]] = add nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER]], [[VEC_IND4]]			; CHECK-NEXT: [[TMP6:%.*]] = shl nuw nsw i32 [[TMP5]], 3
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_ST2]], ptr [[A]], <vscale x 4 x i64> [[VEC_IND]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = sub nsw i32 2, [[TMP6]]
	; CHECK-NEXT: [[WIDE_MASKED_GATHER6:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP8]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64
	; CHECK-NEXT: [[TMP9:%.*]] = sub nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER6]], [[VEC_IND4]]			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP4]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[STRUCT_ST2]], ptr [[B:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP9]], align 4
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP7]], <vscale x 4 x ptr> [[TMP10]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [[STRUCT_ST2]], ptr [[B]], <vscale x 4 x i64> [[VEC_IND]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP9]], <vscale x 4 x ptr> [[TMP11]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[REVERSE:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP10]])
	; CHECK-NEXT: [[TMP12:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP11:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[TMP13:%.*]] = shl nuw nsw i64 [[TMP12]], 2			; CHECK-NEXT: [[REVERSE1:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP11]])
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP13]]			; CHECK-NEXT: [[TMP12:%.*]] = add nsw <vscale x 4 x i32> [[REVERSE]], [[VEC_IND]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[TMP13:%.*]] = sub nsw <vscale x 4 x i32> [[REVERSE1]], [[VEC_IND]]
	; CHECK-NEXT: [[VEC_IND_NEXT5]] = add <vscale x 4 x i32> [[VEC_IND4]], [[DOTSPLAT3]]			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds [[STRUCT_ST2]], ptr [[B:%.*]], i64 [[OFFSET_IDX]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vscale.i32()
	; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]			; CHECK-NEXT: [[TMP16:%.*]] = shl nuw nsw i32 [[TMP15]], 3
				; CHECK-NEXT: [[TMP17:%.*]] = sub nsw i32 1, [[TMP16]]
				; CHECK-NEXT: [[TMP18:%.*]] = sext i32 [[TMP17]] to i64
				; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i64 [[TMP18]]
				; CHECK-NEXT: [[REVERSE2:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP12]])
				; CHECK-NEXT: [[REVERSE3:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP13]])
				; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[REVERSE2]], <vscale x 4 x i32> [[REVERSE3]])
				; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP19]], align 4
				; CHECK-NEXT: [[TMP20:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP21:%.*]] = shl nuw nsw i64 [[TMP20]], 2
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP21]]
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i32> [[VEC_IND]], [[DOTSPLAT]]
				; CHECK-NEXT: [[TMP22:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1023, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1023, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
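In this reversed-access hunk, the wide load and the interleaved store must start at the lowest address the group touches, so the member pointer is moved back by `sub nsw i32 2, (vscale << 3)` elements for the load (based on member 0 of A) and by `sub nsw i32 1, (vscale << 3)` for the store (based on member 1 of B). The sketch below reproduces that arithmetic, assuming VF = 4 * vscale and a struct of two i32 members; the helper name is hypothetical and the code is not taken from LoopVectorize.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, written only to reproduce the offsets in the checks.
// Returns the element offset (in i32 units) to add to a member pointer so
// that a reversed interleave group is loaded or stored from its lowest
// address.
//   factor: interleave factor (2 for a struct of two i32 members)
//   member: member index of the base pointer (0 for the load from A,
//           1 for the store to B in the checks)
//   vf:     number of lanes, assumed to be 4 * vscale
int64_t reversedGroupStartOffset(int64_t factor, int64_t member, int64_t vf) {
  // Lane 0 covers struct index OFFSET_IDX and lane vf - 1 covers
  // OFFSET_IDX - (vf - 1), which lies (vf - 1) * factor elements lower.
  // Subtracting `member` then steps back to member 0 of that struct.
  return -(vf - 1) * factor - member;
}
```

Because memory is read in ascending order while lane 0 corresponds to the highest index, each deinterleaved member goes through llvm.experimental.vector.reverse after the load, and each result is reversed again before llvm.experimental.vector.interleave2 on the store path.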
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2			; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 512, [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 512, [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP2]], i64 [[TMP1]], i64 [[N_MOD_VF]]			; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP2]], i64 [[TMP1]], i64 [[N_MOD_VF]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 512, [[TMP3]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 512, [[TMP3]]
	; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1			; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1
	; CHECK-NEXT: [[TMP4:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP5:%.*]] = shl <vscale x 4 x i64> [[TMP4]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP6]], 3
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP7]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP5]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = and i64 [[INDEX]], 9223372036854775804			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP4]], align 4
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP8]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP9:%.*]] = shl nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP6:%.*]] = shl nsw <vscale x 4 x i32> [[TMP5]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: store <vscale x 4 x i32> [[TMP9]], ptr [[TMP10]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = and i64 [[INDEX]], 9223372036854775804
	; CHECK-NEXT: [[TMP11:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP12:%.*]] = shl nuw nsw i64 [[TMP11]], 2			; CHECK-NEXT: store <vscale x 4 x i32> [[TMP6]], ptr [[TMP8]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP12]]			; CHECK-NEXT: [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[TMP10:%.*]] = shl nuw nsw i64 [[TMP9]], 2
	; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP10]]
	; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
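In this hunk only member 0 of the interleave group is used, so the group has a gap. On both sides of the diff, vector.ph replaces a zero N_MOD_VF with a full VF before computing N_VEC, and middle.block branches unconditionally to scalar.ph, so at least one iteration is always left to the scalar loop. Our reading (an interpretation, not stated in the patch) is that this stops the new wide load in the final vector iteration from touching the unused trailing element beyond what the scalar loop would access. The sketch below mirrors only the vector.ph arithmetic; the names are hypothetical and VF is assumed to be 4 * vscale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the vector.ph arithmetic in the checks above.
// n is the trip count (512 in this test) and vf the number of lanes,
// assumed to be 4 * vscale.
struct TripSplit {
  uint64_t nVec;   // N_VEC: iterations handled by vector.body
  uint64_t indEnd; // IND_END: induction value at which scalar.ph resumes
};

TripSplit splitWithScalarEpilogue(uint64_t n, uint64_t vf) {
  uint64_t nModVf = n % vf;                    // urem n, vf
  uint64_t held = (nModVf == 0) ? vf : nModVf; // select on icmp eq N_MOD_VF, 0
  uint64_t nVec = n - held;                    // sub nuw nsw n, held
  return {nVec, nVec << 1};                    // IV advances by 2 per iteration
}
```

For example, with n = 512 and vf = 4 the vector loop runs 508 iterations and the scalar loop resumes at induction value 1016, rather than the vector loop consuming all 512 iterations.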
	; CHECK-NEXT: [[TMP4:%.*]] = add nuw i64 [[TMP1]], 1			; CHECK-NEXT: [[TMP4:%.*]] = add nuw i64 [[TMP1]], 1
	; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP6:%.*]] = shl nuw nsw i64 [[TMP5]], 2			; CHECK-NEXT: [[TMP6:%.*]] = shl nuw nsw i64 [[TMP5]], 2
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP4]], [[TMP6]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP4]], [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i64 [[TMP6]], i64 [[N_MOD_VF]]			; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i64 [[TMP6]], i64 [[N_MOD_VF]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP4]], [[TMP8]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP4]], [[TMP8]]
	; CHECK-NEXT: [[IND_END:%.*]] = shl i64 [[N_VEC]], 1			; CHECK-NEXT: [[IND_END:%.*]] = shl i64 [[N_VEC]], 1
	; CHECK-NEXT: [[TMP9:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP10:%.*]] = shl <vscale x 4 x i64> [[TMP9]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP11:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP12:%.*]] = shl nuw nsw i64 [[TMP11]], 3
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP12]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP10]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = and i64 [[INDEX]], 9223372036854775804			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP9]], align 4
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP13]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP14:%.*]] = shl nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP10:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP11:%.*]] = shl nsw <vscale x 4 x i32> [[TMP10]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: store <vscale x 4 x i32> [[TMP14]], ptr [[TMP15]], align 4			; CHECK-NEXT: [[TMP12:%.*]] = and i64 [[INDEX]], 9223372036854775804
	; CHECK-NEXT: [[TMP16:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP12]]
	; CHECK-NEXT: [[TMP17:%.*]] = shl nuw nsw i64 [[TMP16]], 2			; CHECK-NEXT: store <vscale x 4 x i32> [[TMP11]], ptr [[TMP13]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP17]]			; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[TMP15:%.*]] = shl nuw nsw i64 [[TMP14]], 2
	; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP15]]
	; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]			; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2			; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 512, [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 512, [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 512, [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 512, [[N_MOD_VF]]
	; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1			; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP3:%.*]] = shl <vscale x 4 x i64> [[TMP2]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[TMP4]], 3
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP5]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP3]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP6]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP2]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[A]], <vscale x 4 x i64> [[TMP7]]			; CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP8]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER1]], [[WIDE_MASKED_GATHER]]			; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 1
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[TMP6:%.*]] = mul nsw <vscale x 4 x i32> [[TMP4]], [[TMP3]]
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP9]], <vscale x 4 x ptr> [[TMP10]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP6]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP7:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC2]], 0
	; CHECK-NEXT: [[WIDE_MASKED_GATHER3:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP8]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP8:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC2]], 1
	; CHECK-NEXT: [[TMP11:%.*]] = add nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER3]], [[WIDE_MASKED_GATHER2]]			; CHECK-NEXT: [[TMP9:%.*]] = add nsw <vscale x 4 x i32> [[TMP8]], [[TMP7]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[B]], <vscale x 4 x i64> [[TMP7]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP5]]
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP11]], <vscale x 4 x ptr> [[TMP12]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[TMP10]], i64 -1
	; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[TMP6]], <vscale x 4 x i32> [[TMP9]])
	; CHECK-NEXT: [[TMP14:%.*]] = shl nuw nsw i64 [[TMP13]], 2			; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP11]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP14]]			; CHECK-NEXT: [[TMP12:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[TMP13:%.*]] = shl nuw nsw i64 [[TMP12]], 2
	; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP13]]
	; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]			; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	(74 unchanged lines not shown)
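In the first function of the hunk above, the new (right-hand) CHECK lines replace the stride-2 `llvm.masked.gather` calls with one contiguous `load <vscale x 8 x i32>` followed by `llvm.experimental.vector.deinterleave2`, and replace the two `llvm.masked.scatter` calls with `llvm.experimental.vector.interleave2` plus one contiguous store. The Python sketch below models only the lane semantics, not LLVM itself: it assumes vscale is 1 (so VF = 4), ignores i32 overflow, and all helper names are hypothetical. It checks that the interleaved form writes the same `B` as a scalar loop computing `B[i] = A[i] * A[i+1]` and `B[i+1] = A[i] + A[i+1]` for even `i`, which is what both columns compute.

```python
VF = 4  # assumption: vscale == 1, so <vscale x 4 x i32> has 4 lanes


def deinterleave2(wide):
    """Model of llvm.experimental.vector.deinterleave2: (even lanes, odd lanes)."""
    assert len(wide) % 2 == 0
    return wide[0::2], wide[1::2]


def interleave2(a, b):
    """Model of llvm.experimental.vector.interleave2: a[0], b[0], a[1], b[1], ..."""
    assert len(a) == len(b)
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def scalar_loop(A, B):
    """Reference loop: i steps by 2 over adjacent pairs of A."""
    for i in range(0, len(A), 2):
        B[i] = A[i] * A[i + 1]
        B[i + 1] = A[i] + A[i + 1]


def vector_loop(A, B):
    """Interleaved form; assumes len(A) is a multiple of 2 * VF (no scalar tail)."""
    assert len(A) % (2 * VF) == 0
    for index in range(0, len(A) // 2, VF):
        offset = 2 * index                                    # OFFSET_IDX = INDEX << 1
        even, odd = deinterleave2(A[offset:offset + 2 * VF])  # one wide load
        prod = [o * e for e, o in zip(even, odd)]             # mul nsw
        total = [o + e for e, o in zip(even, odd)]            # add nsw
        B[offset:offset + 2 * VF] = interleave2(prod, total)  # one wide store


A = list(range(1, 17))
B_ref, B_vec = [0] * 16, [0] * 16
scalar_loop(A, B_ref)
vector_loop(A, B_vec)
print(B_ref == B_vec)  # True
```

The `getelementptr inbounds i32, ptr [[TMP10]], i64 -1` in the new column steps back from `B[2i+1]` to `B[2i]`, the first member of the store group, which is where the single wide store begins.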
	; CHECK-LABEL: @int_float_struct(			; CHECK-LABEL: @int_float_struct(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2			; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 1024, [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 1024, [[N_MOD_VF]]
	; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i64 [[TMP3]], 2
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP4]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP2]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x float> [ insertelement (<vscale x 4 x float> zeroinitializer, float undef, i32 0), [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x float> [ insertelement (<vscale x 4 x float> zeroinitializer, float undef, i32 0), [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <vscale x 4 x i32> [ insertelement (<vscale x 4 x i32> zeroinitializer, i32 undef, i32 0), [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <vscale x 4 x i32> [ insertelement (<vscale x 4 x i32> zeroinitializer, i32 undef, i32 0), [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT:%.*]], ptr [[P:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT:%.*]], ptr [[P:%.*]], i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP5]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP2]], align 4
	; CHECK-NEXT: [[TMP6]] = add <vscale x 4 x i32> [[WIDE_MASKED_GATHER]], [[VEC_PHI1]]			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT]], ptr [[P]], <vscale x 4 x i64> [[VEC_IND]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0(<vscale x 4 x ptr> [[TMP7]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> poison)			; CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[TMP8]] = fadd fast <vscale x 4 x float> [[VEC_PHI]], [[WIDE_MASKED_GATHER2]]			; CHECK-NEXT: [[TMP5:%.*]] = bitcast <vscale x 4 x i32> [[TMP4]] to <vscale x 4 x float>
	; CHECK-NEXT: [[TMP9:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP6]] = add <vscale x 4 x i32> [[TMP3]], [[VEC_PHI1]]
	; CHECK-NEXT: [[TMP10:%.*]] = shl nuw nsw i64 [[TMP9]], 2			; CHECK-NEXT: [[TMP7]] = fadd fast <vscale x 4 x float> [[VEC_PHI]], [[TMP5]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP10]]			; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[TMP9:%.*]] = shl nuw nsw i64 [[TMP8]], 2
	; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP9]]
	; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP6]])			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP6]])
	; CHECK-NEXT: [[TMP13:%.*]] = call fast float @llvm.vector.reduce.fadd.nxv4f32(float -0.000000e+00, <vscale x 4 x float> [[TMP8]])			; CHECK-NEXT: [[TMP12:%.*]] = call fast float @llvm.vector.reduce.fadd.nxv4f32(float -0.000000e+00, <vscale x 4 x float> [[TMP7]])
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ [[TMP13]], [[MIDDLE_BLOCK]] ], [ undef, [[ENTRY]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ [[TMP12]], [[MIDDLE_BLOCK]] ], [ undef, [[ENTRY]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX3:%.*]] = phi i32 [ [[TMP12]], [[MIDDLE_BLOCK]] ], [ undef, [[ENTRY]] ]			; CHECK-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i32 [ [[TMP11]], [[MIDDLE_BLOCK]] ], [ undef, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD:%.*]], [[FOR_BODY]] ], [ [[TMP12]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD:%.*]], [[FOR_BODY]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: [[ADD3_LCSSA:%.*]] = phi float [ [[ADD3:%.*]], [[FOR_BODY]] ], [ [[TMP13]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[ADD3_LCSSA:%.*]] = phi float [ [[ADD3:%.*]], [[FOR_BODY]] ], [ [[TMP12]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: store i32 [[ADD_LCSSA]], ptr @SA, align 4			; CHECK-NEXT: store i32 [[ADD_LCSSA]], ptr @SA, align 4
	; CHECK-NEXT: store float [[ADD3_LCSSA]], ptr @SB, align 4			; CHECK-NEXT: store float [[ADD3_LCSSA]], ptr @SB, align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[SUMB_014:%.*]] = phi float [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ADD3]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUMB_014:%.*]] = phi float [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ADD3]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[SUMA_013:%.*]] = phi i32 [ [[BC_MERGE_RDX3]], [[SCALAR_PH]] ], [ [[ADD]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUMA_013:%.*]] = phi i32 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[ADD]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[A:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT]], ptr [[P]], i64 [[INDVARS_IV]], i32 0			; CHECK-NEXT: [[A:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT]], ptr [[P]], i64 [[INDVARS_IV]], i32 0
	; CHECK-NEXT: [[LOAD1:%.*]] = load i32, ptr [[A]], align 4			; CHECK-NEXT: [[LOAD1:%.*]] = load i32, ptr [[A]], align 4
	; CHECK-NEXT: [[ADD]] = add nsw i32 [[LOAD1]], [[SUMA_013]]			; CHECK-NEXT: [[ADD]] = add nsw i32 [[LOAD1]], [[SUMA_013]]
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT]], ptr [[P]], i64 [[INDVARS_IV]], i32 1			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT]], ptr [[P]], i64 [[INDVARS_IV]], i32 1
	; CHECK-NEXT: [[LOAD2:%.*]] = load float, ptr [[B]], align 4			; CHECK-NEXT: [[LOAD2:%.*]] = load float, ptr [[B]], align 4
	; CHECK-NEXT: [[ADD3]] = fadd fast float [[SUMB_014]], [[LOAD2]]			; CHECK-NEXT: [[ADD3]] = fadd fast float [[SUMB_014]], [[LOAD2]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	(64 unchanged lines not shown)
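In `@int_float_struct` above, the `{i32, float}` group is loaded as a single `<vscale x 8 x i32>` even though member 1 is a `float`, so the new column adds a `bitcast` from `<vscale x 4 x i32>` to `<vscale x 4 x float>` after `deinterleave2`. The following Python sketch models that step. It assumes vscale is 1, non-negative int fields, and IEEE-754 single-precision floats, and it uses the standard `struct` module to reinterpret bits; the helper names are hypothetical.

```python
import struct

VF = 4  # assumption: vscale == 1


def f32_bits(x):
    """Bit pattern of x as an IEEE-754 single, returned as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bitcast_i32_to_f32(word):
    """Reinterpret a 32-bit word as a float, like `bitcast i32 ... to float`."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


def sums_vectorized(words):
    """words: flat i32 view of an array of {int, float} pairs (non-negative ints)."""
    acc_int = [0] * VF      # VEC_PHI1
    acc_flt = [0.0] * VF    # VEC_PHI
    for offset in range(0, len(words), 2 * VF):
        wide = words[offset:offset + 2 * VF]                  # load <vscale x 8 x i32>
        ints, float_bits = wide[0::2], wide[1::2]             # deinterleave2
        floats = [bitcast_i32_to_f32(w) for w in float_bits]  # bitcast to float
        acc_int = [a + v for a, v in zip(acc_int, ints)]
        acc_flt = [a + v for a, v in zip(acc_flt, floats)]
    return sum(acc_int), sum(acc_flt)  # vector.reduce.add / vector.reduce.fadd


words = []
for k in range(8):
    words += [k, f32_bits(k * 0.5)]
print(sums_vectorized(words))  # (28, 14.0)
```

The float values are chosen to be exactly representable, so the order of the partial sums does not affect the result; real `fast` reductions are free to reassociate.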
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP6]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP6]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [[PAIR_I32:%.*]], ptr [[P:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [[PAIR_I32:%.*]], ptr [[P:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], <vscale x 4 x i64> [[VEC_IND]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], <vscale x 4 x i64> [[VEC_IND]], i32 1
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT]], <vscale x 4 x ptr> [[TMP9]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT]], <vscale x 4 x ptr> [[TMP9]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP9]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <vscale x 4 x ptr> [[TMP9]], i64 0
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[WIDE_MASKED_GATHER]], <vscale x 4 x ptr> [[TMP10]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP11]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP12:%.*]] = shl nuw nsw i64 [[TMP11]], 2			; CHECK-NEXT: [[TMP12:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP12]]			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP12]], <vscale x 4 x ptr> [[TMP10]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
				; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP14:%.*]] = shl nuw nsw i64 [[TMP13]], 2
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP14]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 0			; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 0
	(54 unchanged lines not shown)
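In the hunk above, only the load side becomes an interleave group. The store of `z` to `p[i].x` stays a `masked.scatter`, and the wide load is taken from lane 0 of that same pointer vector, after the scatter, so the lanes it reads already hold `z`. Only member 0 of the deinterleaved result is used; the old `y` values are loaded and discarded. The sketch below models one vector iteration, assuming vscale is 1 and treating `p` as a flat i32 view of `{x, y}` pairs; the function name is hypothetical.

```python
VF = 4  # assumption: vscale == 1


def vector_body(p, index, z):
    """One vector iteration; p is a flat i32 view of an array of {x, y} pairs."""
    x_addrs = [2 * (index + lane) for lane in range(VF)]      # lanes of &p[i].x
    y_addrs = [2 * (index + lane) + 1 for lane in range(VF)]  # lanes of &p[i].y
    for a in x_addrs:                      # masked.scatter of z to p[i].x
        p[a] = z
    base = x_addrs[0]                      # extractelement ..., i64 0
    wide = p[base:base + 2 * VF]           # wide load, issued after the scatter
    member0 = wide[0::2]                   # deinterleave2, field 0 (field 1 unused)
    for a, v in zip(y_addrs, member0):     # masked.scatter to p[i].y
        p[a] = v


p = list(range(100, 116))   # 8 pairs
vector_body(p, 0, 7)
print(p[:8])  # [7, 7, 7, 7, 7, 7, 7, 7]
```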
	; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP8:%.*]] = shl nuw nsw i64 [[TMP7]], 2			; CHECK-NEXT: [[TMP8:%.*]] = shl nuw nsw i64 [[TMP7]], 2
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP8]], i64 0			; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP8]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP6]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP6]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [[PAIR_I32:%.*]], ptr [[P:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [[PAIR_I32:%.*]], ptr [[P:%.*]], i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], <vscale x 4 x i64> [[VEC_IND]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], <vscale x 4 x i64> [[VEC_IND]], i32 1
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP9]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP9]], align 4
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[WIDE_MASKED_GATHER]], <vscale x 4 x ptr> [[TMP10]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP10]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP11:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP11]] = add <vscale x 4 x i32> [[WIDE_MASKED_GATHER1]], [[VEC_PHI]]			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP11]], <vscale x 4 x ptr> [[TMP10]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: [[TMP12:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <vscale x 4 x ptr> [[TMP10]], i64 0
	; CHECK-NEXT: [[TMP13:%.*]] = shl nuw nsw i64 [[TMP12]], 2			; CHECK-NEXT: [[WIDE_VEC1:%.*]] = load <vscale x 8 x i32>, ptr [[TMP12]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP13]]			; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC1]])
				; CHECK-NEXT: [[TMP13:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC2]], 0
				; CHECK-NEXT: [[TMP14]] = add <vscale x 4 x i32> [[TMP13]], [[VEC_PHI]]
				; CHECK-NEXT: [[TMP15:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP16:%.*]] = shl nuw nsw i64 [[TMP15]], 2
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP16]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP11]])			; CHECK-NEXT: [[TMP18:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP14]])
	; CHECK-NEXT: br label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP15]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP18]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[S:%.*]] = phi i32 [ [[TMP17:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[S:%.*]] = phi i32 [ [[TMP20:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 0			; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 0
	; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 1			; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 1
	; CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[P_I_X]], align 4			; CHECK-NEXT: [[TMP19:%.*]] = load i32, ptr [[P_I_X]], align 4
	; CHECK-NEXT: store i32 [[TMP16]], ptr [[P_I_Y]], align 4			; CHECK-NEXT: store i32 [[TMP19]], ptr [[P_I_Y]], align 4
	; CHECK-NEXT: [[TMP17]] = add nsw i32 [[TMP16]], [[S]]			; CHECK-NEXT: [[TMP20]] = add nsw i32 [[TMP19]], [[S]]
	; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1			; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
	; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]			; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END:%.*]], !llvm.loop [[LOOP25:![0-9]+]]			; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END:%.*]], !llvm.loop [[LOOP25:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[TMP17]]			; CHECK-NEXT: ret i32 [[TMP20]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]			%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
	%s = phi i32 [ %2, %for.body ], [ 0, %entry ]			%s = phi i32 [ %2, %for.body ], [ 0, %entry ]
	%p_i.x = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 0			%p_i.x = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 0
	(43 unchanged lines not shown)
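The reduction hunk above forms two single-member groups: one wide load at `&p[index].x` and, after the scatter to `p[i].y`, a second wide load starting at lane 0 of the `p[i].y` pointer vector. Because that second window starts one element later, its last element is `p[index + VF].x`, one past the final `y` it uses. In this hunk `middle.block` branches unconditionally to `scalar.ph`, so at least one scalar iteration follows the vector loop, which presumably keeps that extra element in bounds. A Python model, assuming vscale is 1 and using a hypothetical function name:

```python
VF = 4  # assumption: vscale == 1


def vector_loop(p, n_vec):
    """p: flat i32 view of {x, y} pairs; n_vec: pairs handled by the vector loop."""
    acc = [0] * VF                                     # VEC_PHI
    for index in range(0, n_vec, VF):
        x_base = 2 * index                             # &p[index].x
        xs = p[x_base:x_base + 2 * VF][0::2]           # wide load + deinterleave2, field 0
        for lane, v in enumerate(xs):                  # masked.scatter to p[i].y
            p[x_base + 2 * lane + 1] = v
        y_base = x_base + 1                            # lane 0 of the &p[i].y vector
        window = p[y_base:y_base + 2 * VF]             # last element is p[index + VF].x
        assert len(window) == 2 * VF, "window would run past the array"
        acc = [a + y for a, y in zip(acc, window[0::2])]
    return sum(acc)                                    # vector.reduce.add


p = []
for k in range(12):          # 12 pairs; the last 4 are left for the scalar loop
    p += [k + 1, 0]
print(vector_loop(p, 8))     # 36
```

If `n_vec` covered every pair, the final window would be one element short and the assertion would fire, which is the situation the scalar epilogue avoids here.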
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[Z:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[Z:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP6]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP6]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [[PAIR_I32:%.*]], ptr [[P:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [[PAIR_I32:%.*]], ptr [[P:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], <vscale x 4 x i64> shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 -1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer), i32 0			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 -1, i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], <vscale x 4 x i64> [[VEC_IND]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], <vscale x 4 x i64> [[VEC_IND]], i32 1
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT]], <vscale x 4 x ptr> [[TMP9]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT]], <vscale x 4 x ptr> [[TMP9]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP10]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP10]], align 4
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[WIDE_MASKED_GATHER]], <vscale x 4 x ptr> [[TMP11]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP12:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP12:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP13:%.*]] = shl nuw nsw i64 [[TMP12]], 2			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP12]], <vscale x 4 x ptr> [[TMP11]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP13]]			; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP14:%.*]] = shl nuw nsw i64 [[TMP13]], 2
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP14]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 0			; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 0
	; CHECK-NEXT: [[P_I_MINUS_1_X:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 -1, i32 0			; CHECK-NEXT: [[P_I_MINUS_1_X:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 -1, i32 0
	; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 1			; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 1
	; CHECK-NEXT: store i32 [[Z]], ptr [[P_I_X]], align 4			; CHECK-NEXT: store i32 [[Z]], ptr [[P_I_X]], align 4
	; CHECK-NEXT: [[TMP15:%.*]] = load i32, ptr [[P_I_MINUS_1_X]], align 4			; CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[P_I_MINUS_1_X]], align 4
	; CHECK-NEXT: store i32 [[TMP15]], ptr [[P_I_Y]], align 4			; CHECK-NEXT: store i32 [[TMP16]], ptr [[P_I_Y]], align 4
	; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1			; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
	; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]			; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END:%.*]], !llvm.loop [[LOOP27:![0-9]+]]			; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END:%.*]], !llvm.loop [[LOOP27:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body
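In the new column of this hunk, the per-lane `masked.gather` is replaced by a single `<vscale x 8 x i32>` load, a call to `llvm.experimental.vector.deinterleave2`, and an `extractvalue ..., 0`. The Python snippet below is an illustrative model, not LLVM code. It follows the LangRef description, under which result 0 holds the even lanes and result 1 holds the odd lanes. It also fixes vscale = 1, so the wide load covers 8 `i32` lanes, or 4 `{x, y}` pairs. The helper name `deinterleave2` is hypothetical.

```python
def deinterleave2(wide):
    """Model of llvm.experimental.vector.deinterleave2 on a list:
    even-indexed lanes go to result 0, odd-indexed lanes to result 1."""
    if len(wide) % 2:
        raise ValueError("deinterleave2 expects an even number of lanes")
    return wide[0::2], wide[1::2]


# An array of {i32 x, i32 y} pairs, flattened the way memory lays it out:
# x0, y0, x1, y1, ...
pairs = [(10, 11), (20, 21), (30, 31), (40, 41)]
flat = [field for pair in pairs for field in pair]

# One <vscale x 8 x i32> load with vscale = 1 reads 8 lanes (VF = 4 pairs).
xs, ys = deinterleave2(flat[0:8])
print(xs)  # [10, 20, 30, 40]
print(ys)  # [11, 21, 31, 41]
```

Compared with the old gather, this needs one contiguous load instead of one address per lane. That is why the target cost model matters for whether the vectorizer picks this form.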
	(44 unchanged lines not shown)
	; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP8:%.*]] = shl nuw nsw i64 [[TMP7]], 2			; CHECK-NEXT: [[TMP8:%.*]] = shl nuw nsw i64 [[TMP7]], 2
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP8]], i64 0			; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP8]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP6]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP6]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP9:%.*]] = add nuw nsw <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP9:%.*]] = add nuw nsw <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[PAIR_I32:%.*]], ptr [[P:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[PAIR_I32:%.*]], ptr [[P:%.*]], i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], <vscale x 4 x i64> [[VEC_IND]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[INDEX]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], <vscale x 4 x i64> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], <vscale x 4 x i64> [[TMP9]], i32 1
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP10]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP10]], align 4
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[WIDE_MASKED_GATHER]], <vscale x 4 x ptr> [[TMP12]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP11]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP13:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP13]] = add <vscale x 4 x i32> [[WIDE_MASKED_GATHER1]], [[VEC_PHI]]			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[TMP13]], <vscale x 4 x ptr> [[TMP12]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[WIDE_VEC1:%.*]] = load <vscale x 8 x i32>, ptr [[TMP11]], align 4
	; CHECK-NEXT: [[TMP15:%.*]] = shl nuw nsw i64 [[TMP14]], 2			; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC1]])
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP15]]			; CHECK-NEXT: [[TMP14:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC2]], 0
				; CHECK-NEXT: [[TMP15]] = add <vscale x 4 x i32> [[TMP14]], [[VEC_PHI]]
				; CHECK-NEXT: [[TMP16:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP17:%.*]] = shl nuw nsw i64 [[TMP16]], 2
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP17]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP28:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP28:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[TMP17:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP13]])			; CHECK-NEXT: [[TMP19:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP15]])
	; CHECK-NEXT: br label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP17]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP19]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[S:%.*]] = phi i32 [ [[TMP20:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[S:%.*]] = phi i32 [ [[TMP22:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[I_PLUS_1:%.*]] = add nuw nsw i64 [[I]], 1			; CHECK-NEXT: [[I_PLUS_1:%.*]] = add nuw nsw i64 [[I]], 1
	; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 0			; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 0
	; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 1			; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I]], i32 1
	; CHECK-NEXT: [[P_I_PLUS_1_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I_PLUS_1]], i32 1			; CHECK-NEXT: [[P_I_PLUS_1_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], ptr [[P]], i64 [[I_PLUS_1]], i32 1
	; CHECK-NEXT: [[TMP18:%.*]] = load i32, ptr [[P_I_X]], align 4			; CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[P_I_X]], align 4
	; CHECK-NEXT: store i32 [[TMP18]], ptr [[P_I_PLUS_1_Y]], align 4			; CHECK-NEXT: store i32 [[TMP20]], ptr [[P_I_PLUS_1_Y]], align 4
	; CHECK-NEXT: [[TMP19:%.*]] = load i32, ptr [[P_I_Y]], align 4			; CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[P_I_Y]], align 4
	; CHECK-NEXT: [[TMP20]] = add nsw i32 [[TMP19]], [[S]]			; CHECK-NEXT: [[TMP22]] = add nsw i32 [[TMP21]], [[S]]
	; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1			; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
	; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]			; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END:%.*]], !llvm.loop [[LOOP29:![0-9]+]]			; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END:%.*]], !llvm.loop [[LOOP29:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[TMP20]]			; CHECK-NEXT: ret i32 [[TMP22]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]			%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
	%s = phi i32 [ %2, %for.body ], [ 0, %entry ]			%s = phi i32 [ %2, %for.body ], [ 0, %entry ]
	%i_plus_1 = add nuw nsw i64 %i, 1			%i_plus_1 = add nuw nsw i64 %i, 1
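The reduction in this hunk has the same shape before and after the patch. Only the loads that feed it change, from a gather to a wide load plus `deinterleave2`. That shape is:

- a vector accumulator (`VEC_PHI`, starting at `zeroinitializer`) updated lane-wise each iteration;
- a horizontal add in `middle.block` (`llvm.vector.reduce.add`);
- a scalar loop that resumes from `BC_MERGE_RDX`.

The snippet below is a minimal Python model of that shape. Its assumptions:

- `vectorized_sum` is a hypothetical helper.
- A fixed `vf` stands in for `vscale x 4`.
- Python integers do not wrap like `i32`.
- The split uses `N - N % VF`, which may not match how this particular test chooses `N_VEC`.

```python
def vectorized_sum(values, vf):
    """Model of a vectorized add-reduction followed by a scalar epilogue."""
    n_vec = len(values) - len(values) % vf    # N_VEC = N - N % VF
    acc = [0] * vf                            # VEC_PHI starts at zeroinitializer
    for index in range(0, n_vec, vf):         # vector.body
        chunk = values[index:index + vf]
        acc = [a + c for a, c in zip(acc, chunk)]  # lane-wise add into VEC_PHI
    s = sum(acc)                              # middle.block: vector.reduce.add
    for i in range(n_vec, len(values)):       # scalar loop seeded by BC_MERGE_RDX
        s += values[i]
    return s


print(vectorized_sum(list(range(10)), 4))  # 45
```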
	(53 unchanged lines not shown)
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[Y:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[Y:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[Z:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[Z:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP8]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP8]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP11:%.*]] = add <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP11:%.*]] = or i64 [[OFFSET_IDX]], 1
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], <vscale x 4 x i64> [[VEC_IND]]
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[A]], <vscale x 4 x i64> [[TMP11]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP11]]
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT]], <vscale x 4 x ptr> [[TMP12]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT]], <vscale x 4 x ptr> [[TMP12]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT2]], <vscale x 4 x ptr> [[TMP12]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[TMP13]], i64 -1
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT4]], <vscale x 4 x ptr> [[TMP13]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[BROADCAST_SPLAT2]], <vscale x 4 x i32> [[BROADCAST_SPLAT4]])
	; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP14]], align 4
	; CHECK-NEXT: [[TMP15:%.*]] = shl nuw nsw i64 [[TMP14]], 2			; CHECK-NEXT: [[TMP15:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP15]]			; CHECK-NEXT: [[TMP16:%.*]] = shl nuw nsw i64 [[TMP15]], 2
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP16]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
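In this hunk the two scatters of the `Y` and `Z` splats are replaced by one wide store. The new CHECK lines compute:

- `OFFSET_IDX = INDEX << 1`;
- `TMP11 = OFFSET_IDX | 1`;
- a GEP to `A + TMP11`, then a GEP of `-1` back from it;
- a store of `interleave2(BROADCAST_SPLAT2, BROADCAST_SPLAT4)` at that pointer.

Below is a minimal Python model of that address arithmetic and of `llvm.experimental.vector.interleave2`. In the intrinsic, lane i of the first operand goes to lane 2i and lane i of the second goes to lane 2i + 1. The helper names, the fixed VF of 4 and the stand-in values 7 and 9 are assumptions for illustration.

```python
def interleave2(a, b):
    """Model of llvm.experimental.vector.interleave2: lane i of `a` goes to
    result lane 2*i and lane i of `b` goes to result lane 2*i + 1."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def store_group_start(index):
    """Element offset (in i32 units from A) where the wide store begins.

    OFFSET_IDX = INDEX << 1 is always even, so `| 1` behaves like `+ 1`,
    and the following GEP of -1 steps back to 2 * INDEX.
    """
    offset_idx = index << 1
    tmp11 = offset_idx | 1
    return tmp11 - 1


vf = 4          # assumed VF: <vscale x 4 x i32> with vscale = 1
y, z = 7, 9     # stand-ins for the broadcast values Y and Z
a = [0] * 16
start = store_group_start(4)   # second vector iteration: INDEX = 4
a[start:start + 2 * vf] = interleave2([y] * vf, [z] * vf)
print(a)  # [0, 0, 0, 0, 0, 0, 0, 0, 7, 9, 7, 9, 7, 9, 7, 9]
```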
	(271 unchanged lines not shown)

llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

	(24 unchanged lines not shown)
	; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 3			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP3]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP3]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
	; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[N_VEC]], 3			; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[N_VEC]], 3
	; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[C:%.*]], i64 [[TMP4]]			; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[C:%.*]], i64 [[TMP4]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi ptr [ [[C]], [[VECTOR_PH]] ], [ [[PTR_IND:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP5:%.*]] = shl i64 [[INDEX]], 3
	; CHECK-NEXT: [[TMP6:%.*]] = shl nuw nsw i64 [[TMP5]], 2			; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[C]], i64 [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP5]], 6			; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP8:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()			; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP6]], 5
	; CHECK-NEXT: [[VECTOR_GEP:%.*]] = shl <vscale x 4 x i64> [[TMP8]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 3, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP8:%.*]] = shl i64 [[INDEX]], 3
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr i8, ptr [[POINTER_PHI]], <vscale x 4 x i64> [[VECTOR_GEP]]			; CHECK-NEXT: [[TMP9:%.*]] = add i64 [[TMP7]], [[TMP8]]
	; CHECK-NEXT: [[DOTSPLATINSERT2:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP6]], i64 0			; CHECK-NEXT: [[NEXT_GEP2:%.*]] = getelementptr i8, ptr [[C]], i64 [[TMP9]]
	; CHECK-NEXT: [[DOTSPLAT3:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT2]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[NEXT_GEP]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()			; CHECK-NEXT: [[WIDE_VEC3:%.*]] = load <vscale x 8 x i32>, ptr [[NEXT_GEP2]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = add <vscale x 4 x i64> [[DOTSPLAT3]], [[TMP10]]			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[VECTOR_GEP4:%.*]] = shl <vscale x 4 x i64> [[TMP11]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 3, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP10:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr i8, ptr [[POINTER_PHI]], <vscale x 4 x i64> [[VECTOR_GEP4]]			; CHECK-NEXT: [[TMP11:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, <vscale x 4 x ptr> [[TMP9]], i64 1			; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC3]])
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, <vscale x 4 x ptr> [[TMP12]], i64 1			; CHECK-NEXT: [[TMP12:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC4]], 0
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP9]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP13:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC4]], 1
	; CHECK-NEXT: [[WIDE_MASKED_GATHER5:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP12]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP14:%.*]] = add nsw <vscale x 4 x i32> [[TMP10]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[WIDE_MASKED_GATHER6:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP13]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP15:%.*]] = add nsw <vscale x 4 x i32> [[TMP12]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[WIDE_MASKED_GATHER7:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP14]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP15:%.*]] = add nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: store <vscale x 4 x i32> [[TMP14]], ptr [[TMP16]], align 4
	; CHECK-NEXT: [[TMP16:%.*]] = add nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER5]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP17:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP18:%.*]] = shl nuw nsw i64 [[TMP17]], 2
	; CHECK-NEXT: store <vscale x 4 x i32> [[TMP15]], ptr [[TMP17]], align 4			; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, ptr [[TMP16]], i64 [[TMP18]]
	; CHECK-NEXT: [[TMP18:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: store <vscale x 4 x i32> [[TMP15]], ptr [[TMP19]], align 4
	; CHECK-NEXT: [[TMP19:%.*]] = shl nuw nsw i64 [[TMP18]], 2			; CHECK-NEXT: [[TMP20:%.*]] = add nsw <vscale x 4 x i32> [[TMP11]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, ptr [[TMP17]], i64 [[TMP19]]			; CHECK-NEXT: [[TMP21:%.*]] = add nsw <vscale x 4 x i32> [[TMP13]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: store <vscale x 4 x i32> [[TMP16]], ptr [[TMP20]], align 4			; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP21:%.*]] = add nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER6]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: store <vscale x 4 x i32> [[TMP20]], ptr [[TMP22]], align 4
	; CHECK-NEXT: [[TMP22:%.*]] = add nsw <vscale x 4 x i32> [[WIDE_MASKED_GATHER7]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP23:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP24:%.*]] = shl nuw nsw i64 [[TMP23]], 2
	; CHECK-NEXT: store <vscale x 4 x i32> [[TMP21]], ptr [[TMP23]], align 4			; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, ptr [[TMP22]], i64 [[TMP24]]
	; CHECK-NEXT: [[TMP24:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: store <vscale x 4 x i32> [[TMP21]], ptr [[TMP25]], align 4
	; CHECK-NEXT: [[TMP25:%.*]] = shl nuw nsw i64 [[TMP24]], 2			; CHECK-NEXT: [[TMP26:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds i32, ptr [[TMP23]], i64 [[TMP25]]			; CHECK-NEXT: [[TMP27:%.*]] = shl nuw nsw i64 [[TMP26]], 3
	; CHECK-NEXT: store <vscale x 4 x i32> [[TMP22]], ptr [[TMP26]], align 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP27]]
	; CHECK-NEXT: [[TMP27:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: [[TMP28:%.*]] = shl nuw nsw i64 [[TMP27]], 3			; CHECK-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP28]]
	; CHECK-NEXT: [[PTR_IND]] = getelementptr i8, ptr [[POINTER_PHI]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP29:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP29]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[C]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[C]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[PTR_014:%.*]] = phi ptr [ [[INCDEC_PTR1:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[PTR_014:%.*]] = phi ptr [ [[INCDEC_PTR1:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[I_013:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I_013:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, ptr [[PTR_014]], i64 1			; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, ptr [[PTR_014]], i64 1
	; CHECK-NEXT: [[TMP30:%.*]] = load i32, ptr [[PTR_014]], align 4			; CHECK-NEXT: [[TMP29:%.*]] = load i32, ptr [[PTR_014]], align 4
	; CHECK-NEXT: [[INCDEC_PTR1]] = getelementptr inbounds i32, ptr [[PTR_014]], i64 2			; CHECK-NEXT: [[INCDEC_PTR1]] = getelementptr inbounds i32, ptr [[PTR_014]], i64 2
	; CHECK-NEXT: [[TMP31:%.*]] = load i32, ptr [[INCDEC_PTR]], align 4			; CHECK-NEXT: [[TMP30:%.*]] = load i32, ptr [[INCDEC_PTR]], align 4
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP30]], 1			; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP29]], 1
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I_013]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I_013]]
	; CHECK-NEXT: store i32 [[ADD]], ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[ADD2:%.*]] = add nsw i32 [[TMP31]], 1			; CHECK-NEXT: [[ADD2:%.*]] = add nsw i32 [[TMP30]], 1
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[I_013]]			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[I_013]]
	; CHECK-NEXT: store i32 [[ADD2]], ptr [[ARRAYIDX3]], align 4			; CHECK-NEXT: store i32 [[ADD2]], ptr [[ARRAYIDX3]], align 4
	; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_013]], 1			; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_013]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; CHECK: for.exit:			; CHECK: for.exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
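In the updated `sve-widen-phi.ll` CHECK lines, the vector pointer induction is gone. That covers `POINTER_PHI`/`PTR_IND` and the `stepvector`-based vector GEPs. Instead, each of the two wide loads gets a scalar `i8` offset from `C`:

- `NEXT_GEP` uses `INDEX << 3`;
- `NEXT_GEP2` uses `(vscale << 5) + (INDEX << 3)`;
- `INDEX` advances by `vscale << 3` per iteration.

The Python sketch below reproduces that arithmetic for an assumed `vscale`. The function names are hypothetical. The 8-byte element size is read off the scalar loop, which advances the pointer by two `i32`s per iteration. The sketch shows that consecutive iterations tile the input without gaps or overlap.

```python
def part_offsets(index, vscale):
    """Byte offsets from C for the two wide loads in one vector iteration.

    Mirrors TMP5 = INDEX << 3 (NEXT_GEP) and
    TMP9 = (vscale << 5) + (INDEX << 3) (NEXT_GEP2). Each scalar iteration
    consumes 8 bytes, and one part covers 4 * vscale scalar iterations,
    so the second part starts 4 * vscale * 8 = vscale << 5 bytes later.
    """
    part0 = index << 3
    part1 = (vscale << 5) + (index << 3)
    return part0, part1


def index_step(vscale):
    """INDEX_NEXT increment: TMP27 = vscale << 3, i.e. 2 parts x 4 x vscale."""
    return vscale << 3


vscale = 2  # assumed runtime value, e.g. a 256-bit SVE implementation
index = 0
covered = []
for _ in range(3):
    covered.append(part_offsets(index, vscale))
    index += index_step(vscale)

print(covered)  # [(0, 64), (128, 192), (256, 320)]
```

Each part spans `vscale << 5` bytes (64 here), so every offset in the list is exactly one part length after the previous one.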
	(302 unchanged lines not shown)