This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][CodeGen] Lower (de)interleave2 intrinsics to ld2/st2

Authored by huntergr on Mar 16 2023, 4:53 AM.



The InterleavedAccess pass currently matches (de)interleaving
shufflevector instructions with loads or stores, and calls into
target lowering to generate ldN or stN instructions.

Since we can't use shufflevector for scalable vectors (besides a
splat with zeroinitializer), we have interleave2 and deinterleave2
intrinsics. This patch extends InterleavedAccess to recognize those
intrinsics and if possible replace them with ld2/st2 via target lowering.
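As a rough illustration of the intrinsic semantics involved (a toy model in plain C++, not LLVM code, with made-up names): deinterleave2 splits an interleaved vector into its even and odd lanes, which is exactly the pair of results a single ld2 instruction produces.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model of llvm.vector.deinterleave2: split an interleaved vector
// [a0, b0, a1, b1, ...] into its even lanes [a0, a1, ...] and odd
// lanes [b0, b1, ...] -- the two results an ld2 instruction produces.
static std::pair<std::vector<int>, std::vector<int>>
deinterleave2(const std::vector<int> &V) {
  std::vector<int> Even, Odd;
  for (std::size_t I = 0; I < V.size(); I += 2) {
    Even.push_back(V[I]);
    Odd.push_back(V[I + 1]);
  }
  return {Even, Odd};
}
```

The interleave2 intrinsic (lowered to st2) is simply the inverse operation, zipping the two lane vectors back together.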

Unlike the fixed-length version, we currently cannot 'legalize' the
operation in IR because we don't have a way of concatenating or
splitting vectors at a scalable point, so for now we just bail
out if the types won't match the actual hardware instructions.

Diff Detail

Event Timeline

huntergr created this revision.Mar 16 2023, 4:53 AM
Herald added a project: Restricted Project.Mar 16 2023, 4:53 AM

huntergr requested review of this revision.Mar 16 2023, 4:53 AM

Probably best to also check for isSimple() to match the behaviour of lowerInterleavedLoad.


Probably best to also check for isSimple() to match the behaviour of lowerInterleavedStore.


I'm happy with incremental bring-up, but I would like to see fixed-length vectors supported sooner rather than later so that we have the option to use the new shuffle intrinsics without losing this feature. I'm specifically interested in the potential to simplify SVE VLS so that it mirrors SVE VLA.


This will have the effect of "moving" the load to where the deinterleaving is happening, won't it? That has the potential to break the original IR's load/store ordering.


This is not sufficient because it'll allow <vscale x 128 x i1>, <vscale x 64 x i2> etc. Perhaps it's better to explicitly check for the types we do support?


As above, this might create the new "store" in the wrong place.

161 ↗(On Diff #505768)

Please can you add tests for ptr vectors as well.

Matt added a subscriber: Matt.Mar 25 2023, 1:25 PM
huntergr updated this revision to Diff 515641.Apr 21 2023, 1:41 AM
  • Stricter checking before trying to replace load/store+intrinsic pairs
  • Supports fixed-length vectors
  • Will now 'legalize' (de)interleave operations on larger (power-of-two) vectors into multiple ld2/st2 operations with appropriate insert/extract subvector operations.
  • More tests.
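The legalization step in the third bullet can be modelled in miniature (again plain C++ with invented names, not the patch's code): deinterleaving a double-width vector is equivalent to deinterleaving each half separately and concatenating the partial results, which is what emitting multiple ld2 operations plus insert/extract-subvector operations achieves.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model of the legalization: one wide deinterleave equals a
// deinterleave of each half (one "ld2" per half), with the partial
// even/odd results concatenated in order.
static std::pair<std::vector<int>, std::vector<int>>
deinterleaveByHalves(const std::vector<int> &V) {
  std::vector<int> Even, Odd;
  std::size_t Half = V.size() / 2;
  for (std::size_t Base : {std::size_t(0), Half}) // one "ld2" per half
    for (std::size_t I = Base; I < Base + Half; I += 2) {
      Even.push_back(V[I]);
      Odd.push_back(V[I + 1]);
    }
  return {Even, Odd};
}
```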
huntergr marked 7 inline comments as done.Apr 21 2023, 1:43 AM
mgabka added a subscriber: mgabka.Apr 24 2023, 1:50 AM
paulwalker-arm added inline comments.Jun 1 2023, 9:34 AM

Function names should start with a lower case letter?


As above.


Perhaps this is better done in isLegalInterleavedAccessType, as the type cannot really be considered legal if the necessary target features are not available? You likely also want to use hasSVEorSME, because SME supports these instructions as well.


I suspect this patch will ICE the compiler when faced with SVE fixed-length vectors, and there are no tests to prove otherwise. The nuance here is that the result vectors are fixed length but the target intrinsics will generate scalable vectors.

If you look at lowerInterleavedLoad you'll see the extra complexity relating to working with a container type.

With that said, I'm happy for this patch to not support SVE fixed-length vectors; it just cannot trigger an ICE. This might be as simple as adding a bail-out for UseScalable != isa<ScalableVectorType>(VTy) and adding a few tests.


I think just ConstantInt::getTrue(LdTy->getContext()) should work here?


Same comment as with lowerDeinterleaveIntrinsicToLoad.


I don't really see what you're gaining by passing in Address rather than just passing in SI? I guess similar is true for lowerDeinterleaveIntrinsicToLoad.



18–19 ↗(On Diff #515641)

How about returning %deinterleave instead of the floating extractvalues, or do they exist to test something specific?

190–192 ↗(On Diff #515641)

I think it's better for NEON to have its own test file. Likewise for SVE fixed length support, which I suspect doesn't work.

huntergr updated this revision to Diff 529536.Jun 8 2023, 1:57 AM

Made recommended changes to the lowering code.

Split the test file into scalable and fixed test files, added a RUN line to the fixed test file that uses -force-streaming-compatible-sve in order to test fixed SVE. The actual transformation doesn't occur yet but we have coverage now.

huntergr marked 8 inline comments as done.Jun 8 2023, 2:00 AM

I thought opaque pointers meant we didn't need to bitcast pointers anymore? I mainly mention it because I wasn't sure what type the following CreateGEP call returns and so figured we might either not need any bitcasting or we might need a little more :)


Is this correct? VTy is the result type for each of the two results from the deinterleave intrinsic. LdTy is this divided by the number of calls to ld2 that you'll need, which represents the result type for each of the two results from the ld2 intrinsic. However, ld2 reads 2 * sizeof(LdTy) bytes, so I think Offset is half the amount it needs to be.

I wonder if you can simplify the addressing logic by just using LdTy directly. That way the offset can just be I * Factor for both fixed and scalable vectors, thus no need for vscale. What do you think?

I've not looked but I'm assuming the store function has the same problem.
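A small arithmetic sketch of the point being made here (plain C++, hypothetical names, not the patch's code): each ldN consumes Factor contiguous chunks of LdTy, so the I-th load must start I * Factor chunks past the base pointer rather than I chunks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the addressing logic under discussion: the
// I-th ld2 reads Factor * sizeof(LdTy) bytes, so its byte offset from
// the base pointer is I * Factor * LdTySizeBytes. Using only
// I * LdTySizeBytes (as in the version being reviewed) gives half the
// required stride.
static std::size_t loadByteOffset(std::size_t I, std::size_t Factor,
                                  std::size_t LdTySizeBytes) {
  return I * Factor * LdTySizeBytes;
}
```

Stepping in whole LdTy units like this also works identically for fixed and scalable vectors, which is the simplification suggested above.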

15222 says that for LLVM 17 "Typed pointers are not supported", so I think the bitcasts can be removed?

huntergr updated this revision to Diff 532846.Jun 20 2023, 3:15 AM
  • Removed pointer casts. I based the new code a little too closely on the existing shufflevector lowering code.
  • Fixed offset calculation to include the interleave Factor.
  • Also fixed another bug: BaseAddr was overwritten with the new GEP value, which compounded the offsets of later memory operations instead of stepping by the per-load/store size. All GEPs now use the original pointer as the base.
huntergr marked 3 inline comments as done.Jun 20 2023, 3:17 AM
paulwalker-arm accepted this revision.Jun 23 2023, 4:40 AM

A few recommendations but otherwise looks good.


Not sure there's a perfect way to write this but I think to be a little more accurate you want something like:

if (!VecTy->isScalableTy() && !Subtarget->hasNEON())
  return false;

if (VecTy->isScalableTy() && !Subtarget->hasSVEorSME())
  return false;

Probably worth an assert(Factor >= 2 && Factor <= 4);. Same goes for getStructuredStoreFunction.


Up to you but given you've got an IRBuilder you could use Builder.getInt64() to reduce the line wrapping for all the places where you're using ConstantInt::get(Type::getInt64Ty(VTy->getContext()),.....


Is this code still relevant? It looks like leftovers from before you started to pass SI as an operand.

This revision is now accepted and ready to land.Jun 23 2023, 4:40 AM
mgabka added inline comments.Jun 26 2023, 3:07 AM

Perhaps adding a comment to this and the other function below would be a good idea, at least to be consistent with the other lowerInterleaved* functions.


Could you explain why you are checking it here but not for lowerDeinterleaveIntrinsic?


I think it is worth adding a comment here explaining why we restrict this operation to VF=2 only.


Is it correct to set it to true here, or only if the condition below is true?

From what I can see, UseScalable was previously only set to true if this function was returning true.


Is "-force-streaming-compatible-sve" needed here? I thought this transformation should always happen for SVE; the tests for scalable vectorization only have "target-features"="+sve" added. Am I missing something?

huntergr marked 7 inline comments as done.Jun 26 2023, 6:41 AM
huntergr added inline comments.

For deinterleave, I check that the load has a single use so that I can replace both it and the intrinsic; the deinterleave intrinsic itself (or a target-specific intrinsic which produces the same result) can then have multiple uses without problems.

Here, we want to make sure that the store is the only use of the interleave so that both can be replaced.


It's fine -- if the type is not a legal shuffle type, no transformation will be performed.


This flag allows us to explicitly force the use of SVE ldN/stN instructions for fixed-width vectors. If you look at AArch64TargetLowering::isLegalInterleavedAccessType(), you will see that it calls Subtarget->forceStreamingCompatibleSVE() as one of the conditions for setting UseScalable = true;

I think the other way to do it would be to set the minimum SVE vector size to 256b, but that only works if the vector types used in the test are >128b. So using the flag means I can force it for all fixed-width tests.

This revision was landed with ongoing or failed builds.Jun 26 2023, 6:41 AM
This revision was automatically updated to reflect the committed changes.
huntergr marked 2 inline comments as done.
mgabka added inline comments.Jun 26 2023, 6:50 AM

That is true if callers correctly check the return value of this function. However, UseScalable is already set to false even when the function returns false; my point is that this function should modify the state of UseScalable only if it returns true. But that is just my personal preference, I guess.

paulwalker-arm added inline comments.Jun 26 2023, 7:00 AM

The first part is the function's design (i.e. UseScalable only contains meaningful data when the function returns true). That said, this function is not great, as it tries to do two different jobs. Certainly worth a redesign if there's a chance to refactor the current way fixed-length vectors are implemented.

huntergr added inline comments.Jun 26 2023, 7:20 AM

I agree with Paul that it conflates two jobs, but I did just think of a nicer way to do it -- return either a pair of bools or an optional<bool> instead of passing in a reference to the variable. I can make that a fixup commit if you'd like.
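A minimal sketch of that alternative (hypothetical signature and stubbed inputs, not the committed code): folding the legality result and the UseScalable flag into a single std::optional<bool>, so the flag simply has no value when the type is illegal.

```cpp
#include <optional>

// Hypothetical refactor: instead of returning bool and writing through
// a UseScalable out-parameter, return std::nullopt for an illegal type
// and the use-scalable decision otherwise.
static std::optional<bool>
isLegalInterleavedAccessType(bool TypeIsLegal, bool NeedsScalable) {
  if (!TypeIsLegal)
    return std::nullopt;  // Illegal: no UseScalable value to report.
  return NeedsScalable;   // Legal: true => SVE ld2/st2, false => NEON.
}
```

This removes the possibility of a caller reading UseScalable without first checking the return value, which was the concern raised above.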