This is an archive of the discontinued LLVM Phabricator instance.


Differential D147451

[CodeGen] Enable AArch64 SVE FCMLA/FCADD instruction generation in ComplexDeinterleaving
ClosedPublic

Authored by igor.kirillov on Apr 3 2023, 9:41 AM.


Details

Reviewers

NickGuy
huntergr
mgabka
dmgreen

Commits

rG6850bc35c6b5: [CodeGen] Enable AArch64 SVE FCMLA/FCADD instruction generation in…

Summary

This commit adds support for scalable vector types in the ComplexDeinterleaving
pass, allowing it to recognize and handle llvm.vector.interleave2 and
llvm.vector.deinterleave2 intrinsics for both fixed and scalable vectors.
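
As a rough illustration of what the two intrinsics compute, here is a plain C++ model on `std::vector`. It does not use any LLVM API, and the helper names `deinterleave2` and `interleave2` are hypothetical stand-ins chosen only to mirror the intrinsic names. Complex data stored as real/imaginary pairs is split into even and odd lanes and can then be merged back:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of llvm.vector.deinterleave2: even lanes go to the first
// result (real parts), odd lanes go to the second result (imaginary parts).
std::pair<std::vector<float>, std::vector<float>>
deinterleave2(const std::vector<float> &V) {
  assert(V.size() % 2 == 0 && "expected an even number of lanes");
  std::vector<float> Real, Imag;
  for (std::size_t I = 0; I < V.size(); I += 2) {
    Real.push_back(V[I]);
    Imag.push_back(V[I + 1]);
  }
  return {Real, Imag};
}

// Hypothetical model of llvm.vector.interleave2: the inverse operation.
std::vector<float> interleave2(const std::vector<float> &Real,
                               const std::vector<float> &Imag) {
  assert(Real.size() == Imag.size() && "operands must have equal length");
  std::vector<float> Out;
  for (std::size_t I = 0; I < Real.size(); ++I) {
    Out.push_back(Real[I]);
    Out.push_back(Imag[I]);
  }
  return Out;
}
```

In this model, interleaving the two results of a deinterleave gives back the original vector, which is the round trip the pass looks for around a complex operation.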

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

igor.kirillov created this revision. Apr 3 2023, 9:41 AM

Herald added a project: Restricted Project. Apr 3 2023, 9:41 AM

Herald added subscribers: hiraditya, kristof.beyls, tschuett.

igor.kirillov requested review of this revision. Apr 3 2023, 9:41 AM

Herald added a project: Restricted Project. Apr 3 2023, 9:41 AM

Herald added subscribers: llvm-commits, alextsao1999.

Harbormaster completed remote builds in B223382: Diff 510538. Apr 3 2023, 11:05 AM

Matt added a subscriber: Matt. Apr 3 2023, 1:56 PM

Clean up

igor.kirillov added reviewers: NickGuy, huntergr, mgabka. Apr 4 2023, 6:42 AM

Harbormaster completed remote builds in B223550: Diff 510763. Apr 4 2023, 8:01 AM

igor.kirillov added inline comments. Apr 4 2023, 8:13 AM

llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll
102	I am not sure if I should duplicate all fixed-width vector tests to reflect the fact that `llvm.experimental.vector.deinterleave2` is now supported there.

NickGuy added inline comments. Apr 4 2023, 8:51 AM

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
259–260	Might be worth adding a comment here saying what a "Deinterleave" is; It's not immediately clear that a deinterleave is either a specific intrinsic, or a shufflevector instruction.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24641–24644	When working with scalable vectors, they don't have the same restriction of bit width. Treating them with a max width of 128 bits seems wasteful and inefficient, is there any way to get the vector width at compile time (is there a `target->getMaxVectorWidth()` or something)?
llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll
102	It can't hurt, more coverage is always good. On the other hand, this is probably enough to test that the intrinsics work with fixed width cases.
103–108	Is the top comment correct? Is this case actually expected to not transform? Given the `fcadd` in the output, I'd assume that this is expected to transform.

dmgreen added a subscriber: dmgreen. Apr 5 2023, 7:41 AM

dmgreen added inline comments.

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
352	Does this need to run twice with and without IsScalable? It doubles the scanning of instructions, and seems unnecessary if it only modifies the shuffles/intrinsic matches, which are either both valid or mutually exclusive.

mgabka added inline comments. Apr 6 2023, 4:35 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24641–24644	For scalable vectors I don't think we want to use a min or max vector width; we should rather operate on the ElementCount and the size of the element types. IIUC, for scalable vectors the condition we want to check is whether we are operating on packed vector types (in which case all are supported) or on the set of unpacked vectors we support, am I correct? In that case, maybe it is worth having dedicated sections for fixed-width and scalable vectors in this function?

mgabka added inline comments. Apr 6 2023, 4:57 AM

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
23	nit: typo
25	nit: probably does not need to start with capital letter, maybe using "shufflevector instruction" would be more clear
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24628	I think it could be worth adding extra run lines to the existing aarch64 tests which operate on fixed width vectors, to make sure that adding +sve does not stop generation of fcmla there. What do you think?

Address the most recent comments

igor.kirillov marked an inline comment as not done. Apr 11 2023, 9:17 AM

igor.kirillov added inline comments.

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
352	I thought about it, but theoretically a target could support only one of them (scalable or fixed-width vectors). Alternatively, I can pass both flags to `ComplexDeinterleavingGraph` and check them each time a root node is found, but I am not sure that is a better approach.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24628	Added to `complex-deinterleaving-f16-add.ll` where we have both shufflevectors and deinterleave2 intrinsics applied to fixed-width vectors.
24641–24644	@NickGuy Actually, this function returns false if VTyWidth is less than 128 bits, so any vector of 128 bits or more is supported. @mgabka We support any unpacked type with size 2X if 2X >= 128; there is code in `AArch64TargetLowering::createComplexDeinterleavingIR` that splits those vectors until they have a minimum size of 128 and then merges them back. We don't support vectors with a minimum size of 64 bits (unlike Neon), and that's the condition I added to the `if` statement.
llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll
103–108	Yes, indeed. Also fixed in the other places.

igor.kirillov added a reviewer: dmgreen. Apr 11 2023, 9:17 AM

NickGuy added inline comments. Apr 11 2023, 9:30 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24641–24644	Not sure why I added the comment here, it was supposed to be on the `if (TyWidth > 128) {` below, oops... How resource-efficient is this splitting with scalable vectors though, my concern is that we'd split the operation across numerous 256+ vectors while only using the lower 128 bits of each.

Harbormaster completed remote builds in B224798: Diff 512484. Apr 11 2023, 9:56 AM

igor.kirillov added inline comments. Apr 11 2023, 10:05 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24641–24644	Not sure if I understood your concern correctly. We split any vector instruction with a minimum size of 256 bits or more into instructions with a minimum size of 128 bits. How many bits each one actually processes depends on the CPU, but the generated code works fine without knowing that. For example, for <vscale x 8 x double> we'll get 4 instructions working on vectors of type <vscale x 2 x double>.
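
The splitting arithmetic in this reply can be modelled in a few lines. The sketch below is not the code from `AArch64TargetLowering::createComplexDeinterleavingIR`; `ScalableTy` and `countSplitParts` are hypothetical names. It only assumes, as stated in the thread, that a type whose minimum size exceeds 128 bits is halved until each part has a minimum size of 128 bits, and that sizes are powers of two:

```cpp
#include <cassert>

// Hypothetical description of a <vscale x MinElts x T> vector type.
struct ScalableTy {
  unsigned MinElts;    // minimum element count (the factor multiplied by vscale)
  unsigned ScalarBits; // size of one element in bits
  unsigned minBits() const { return MinElts * ScalarBits; }
};

// Number of 128-bit-minimum parts produced by repeatedly halving the type.
unsigned countSplitParts(ScalableTy Ty) {
  assert(Ty.minBits() >= 128 && "types below 128 bits are not handled here");
  unsigned Parts = 1;
  while (Ty.minBits() > 128) {
    Ty.MinElts /= 2; // each half keeps the element type, with half the lanes
    Parts *= 2;
  }
  return Parts;
}
```

For `<vscale x 8 x double>` (minimum size 512 bits) the model yields four parts of type `<vscale x 2 x double>`, matching the example in the comment.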

dmgreen added inline comments. Apr 12 2023, 9:59 AM

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
352	I'm not sure why that matters. Can you explain more? I would expect that we just match starting from whatever we find (shuffle or intrinsic) and isComplexDeinterleavingOperationSupported handles whether the actual type is supported (be it scalable or fixed length). The shuffle won't ever match a scalable vector, but that shouldn't be a problem as far as I understand.

Make function evaluateBasicBlock to run once by moving scalable/fixed width support check to isComplexDeinterleavingOperationSupported.
Some minor clean ups.

igor.kirillov added inline comments. Apr 13 2023, 6:20 AM

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
352	Never mind, I think you are right. Now `evaluateBasicBlock` runs only once and the check is in `isComplexDeinterleavingOperationSupported`.
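
The single-scan design agreed on here can be sketched in isolation. This is a simplified model, not the pass itself: `RootKind`, `Candidate`, `Target`, and `countRoots` are hypothetical names, and the target hook is reduced to two booleans. Every shuffle or intrinsic root is visited once, and the per-type support check decides whether it is kept:

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of an instruction as a potential graph root.
enum class RootKind { InterleaveShuffle, InterleaveIntrinsic, Other };

struct Candidate {
  RootKind Kind;
  bool IsScalable; // whether the root produces a scalable vector
};

// Hypothetical stand-in for the target's per-operation support query.
struct Target {
  bool SupportsFixed;
  bool SupportsScalable;
  bool isOperationSupported(bool IsScalable) const {
    return IsScalable ? SupportsScalable : SupportsFixed;
  }
};

// One scan over the block: no separate fixed-width and scalable runs.
unsigned countRoots(const std::vector<Candidate> &Block, const Target &TL) {
  unsigned Roots = 0;
  for (const Candidate &C : Block) {
    if (C.Kind == RootKind::Other)
      continue;
    // In this model a shufflevector root only ever describes a fixed vector.
    assert(!(C.Kind == RootKind::InterleaveShuffle && C.IsScalable) &&
           "shufflevector roots are fixed-width in this model");
    if (TL.isOperationSupported(C.IsScalable))
      ++Roots;
  }
  return Roots;
}
```

A target that supports only one vector kind simply rejects the other kind at the support check, so a second scan with a different flag is not needed.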

Harbormaster completed remote builds in B225336: Diff 513209. Apr 13 2023, 7:19 AM

Thanks. I have some minor suggestions but otherwise LGTM.

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
263	instruciton -> instruction
341	I don't think this needs to call with a Scalable flag. The AArch64TargetLowering::isComplexDeinterleavingSupported routine can just return true if it has sve or complexnums.
923	This could be turned into `if (!RealShuffle || !ImagShuffle) return nullptr;`. LLVM often likes returning early to reduce indentation.

This revision is now accepted and ready to land. Apr 14 2023, 9:10 AM

mgabka added inline comments. Apr 17 2023, 1:44 AM

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
341	The isComplexDeinterleavingSupported with a flag is used inside isComplexDeinterleavingOperationSupported to check whether, for a given vector type (scalable or fixed width), the required architecture extensions are available. For scalable vectors we need just SVE, while for fixed width we need the ComplxNum feature. However, I realized that there is probably a bug here, as the SVE feature enables only scalable FCMLA, while scalable CMLA is available only when the SVE2 feature is enabled, so I think we should make sure that isComplexDeinterleavingOperationSupported takes that into account, and add extra testing. I guess what David suggests is to make the "isComplexDeinterleavingSupported" function as generic as possible and leave all the detailed checks to isComplexDeinterleavingOperationSupported, which looks reasonable to me.

Remove UseScalable param from isComplexDeinterleavingSupported

Fix type, swap if condition

igor.kirillov added inline comments. Apr 17 2023, 4:39 AM

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
341	CMLA are not a problem - we are checking that scalar type is not Int in AArch64TargetLowering::isComplexDeinterleavingOperationSupported
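
The two-level check discussed in this thread can be summarised with a small sketch. It is a simplified model rather than the AArch64 backend code: `Features`, `VecDesc`, and both function names are hypothetical, and the rules are only the ones stated by the reviewers (SVE for scalable vectors, ComplxNum for fixed-width vectors, integer element types rejected):

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for subtarget features and a vector type.
struct Features {
  bool HasSVE;
  bool HasComplxNum;
};

struct VecDesc {
  bool IsScalable;
  bool IsIntegerElt;
};

// Coarse pass-level gate: either feature is enough to run the pass at all.
bool isDeinterleavingSupported(const Features &F) {
  return F.HasSVE || F.HasComplxNum;
}

// Per-operation gate: pick the feature that matches the vector kind and
// reject integer element types, so CMLA is never considered.
bool isOperationSupported(const Features &F, const VecDesc &Ty) {
  if (Ty.IsIntegerElt)
    return false;
  return Ty.IsScalable ? F.HasSVE : F.HasComplxNum;
}

// Usage example for an SVE-only configuration.
void checkSVEOnlyExample() {
  Features SVEOnly{true, false};
  assert(isDeinterleavingSupported(SVEOnly));
  assert(isOperationSupported(SVEOnly, VecDesc{true, false}));
  assert(!isOperationSupported(SVEOnly, VecDesc{false, false}));
}
```

Keeping the coarse gate generic and pushing type-specific decisions into the per-operation query is the split the reviewers settled on.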

Harbormaster completed remote builds in B226049: Diff 514159. Apr 17 2023, 5:17 AM

igor.kirillov added a child revision: D148550: [CodeGen] Add pre-commit tests for D148558. Apr 17 2023, 11:19 AM

Update commit message
Swap if-condition to reduce diff

Harbormaster completed remote builds in B226375: Diff 514636. Apr 18 2023, 8:21 AM

This revision was landed with ongoing or failed builds. Apr 21 2023, 3:29 AM

Closed by commit rG6850bc35c6b5: [CodeGen] Enable AArch64 SVE FCMLA/FCADD instruction generation in… (authored by igor.kirillov).

This revision was automatically updated to reflect the committed changes.

igor.kirillov added a commit: rG6850bc35c6b5: [CodeGen] Enable AArch64 SVE FCMLA/FCADD instruction generation in….

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/ComplexDeinterleavingPass.h	2 lines
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp	288 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	76 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add-scalable.ll	120 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll	68 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-mul-scalable.ll	153 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-f32-add-scalable.ll	83 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-f32-mul-scalable.ll	111 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-f64-add-scalable.ll	84 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-f64-mul-scalable.ll	110 lines

Diff 515670

llvm/include/llvm/CodeGen/ComplexDeinterleavingPass.h

Show All 32 Lines	public:
PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);		PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
};		};

enum class ComplexDeinterleavingOperation {		enum class ComplexDeinterleavingOperation {
CAdd,		CAdd,
CMulPartial,		CMulPartial,
// The following 'operations' are used to represent internal states. Backends		// The following 'operations' are used to represent internal states. Backends
// are not expected to try and support these in any capacity.		// are not expected to try and support these in any capacity.
Shuffle,		Deinterleave,
Symmetric		Symmetric
};		};

enum class ComplexDeinterleavingRotation {		enum class ComplexDeinterleavingRotation {
Rotation_0 = 0,		Rotation_0 = 0,
Rotation_90 = 1,		Rotation_90 = 1,
Rotation_180 = 2,		Rotation_180 = 2,
Rotation_270 = 3,		Rotation_270 = 3,
};		};

} // namespace llvm		} // namespace llvm

#endif // LLVM_CODEGEN_COMPLEXDEINTERLEAVING_H		#endif // LLVM_CODEGEN_COMPLEXDEINTERLEAVING_H

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp

Show All 12 Lines
// reinterleaves the complex components, with a mask of <0, 2, 1, 3>), the		// reinterleaves the complex components, with a mask of <0, 2, 1, 3>), the
// operands are evaluated and identified as "Composite Nodes" (collections of		// operands are evaluated and identified as "Composite Nodes" (collections of
// instructions that can potentially be lowered to a single complex		// instructions that can potentially be lowered to a single complex
// instruction). This is performed by checking the real and imaginary components		// instruction). This is performed by checking the real and imaginary components
// and tracking the data flow for each component while following the operand		// and tracking the data flow for each component while following the operand
// pairs. Validity of each node is expected to be done upon creation, and any		// pairs. Validity of each node is expected to be done upon creation, and any
// validation errors should halt traversal and prevent further graph		// validation errors should halt traversal and prevent further graph
// construction.		// construction.
		// Instead of relying on Shuffle operations, vector interleaving and
		// deinterleaving can be represented by vector.interleave2 and
		// vector.deinterleave2 intrinsics. Scalable vectors can be represented only by
		// these intrinsics, whereas, fixed-width vectors are recognized for both
		// shufflevector instruction and intrinsics.
//		//
// Replacement:		// Replacement:
// This step traverses the graph built up by identification, delegating to the		// This step traverses the graph built up by identification, delegating to the
// target to validate and generate the correct intrinsics, and plumbs them		// target to validate and generate the correct intrinsics, and plumbs them
// together connecting each end of the new intrinsics graph to the existing		// together connecting each end of the new intrinsics graph to the existing
// use-def chain. This step is assumed to finish successfully, as all		// use-def chain. This step is assumed to finish successfully, as all
// information is expected to be correct by this point.		// information is expected to be correct by this point.
//		//
▲ Show 20 Lines • Show All 216 Lines • ▼ Show 20 Lines	private:
/// i: ai + br		/// i: ai + br
/// 270: r: ar + bi		/// 270: r: ar + bi
/// i: ai - br		/// i: ai - br
NodePtr identifyAdd(Instruction *Real, Instruction *Imag);		NodePtr identifyAdd(Instruction *Real, Instruction *Imag);
NodePtr identifySymmetricOperation(Instruction *Real, Instruction *Imag);		NodePtr identifySymmetricOperation(Instruction *Real, Instruction *Imag);

NodePtr identifyNode(Instruction *I, Instruction *J);		NodePtr identifyNode(Instruction *I, Instruction *J);

		NodePtr identifyRoot(Instruction *I);

		/// Identifies the Deinterleave operation applied to a vector containing
		/// complex numbers. There are two ways to represent the Deinterleave
		/// operation:
		/// * Using two shufflevectors with even indices for /pReal instruction and
		/// odd indices for /pImag instructions (only for fixed-width vectors)
		/// * Using two extractvalue instructions applied to `vector.deinterleave2`
		/// intrinsic (for both fixed and scalable vectors)
		NodePtr identifyDeinterleave(Instruction *Real, Instruction *Imag);

Value *replaceNode(RawNodePtr Node);		Value *replaceNode(RawNodePtr Node);

public:		public:
void dump() { dump(dbgs()); }		void dump() { dump(dbgs()); }
void dump(raw_ostream &OS) {		void dump(raw_ostream &OS) {
for (const auto &Node : CompositeNodes)		for (const auto &Node : CompositeNodes)
Node->dump(OS);		Node->dump(OS);
}		}
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines

bool ComplexDeinterleaving::runOnFunction(Function &F) {		bool ComplexDeinterleaving::runOnFunction(Function &F) {
if (!ComplexDeinterleavingEnabled) {		if (!ComplexDeinterleavingEnabled) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "Complex deinterleaving has been explicitly disabled.\n");		dbgs() << "Complex deinterleaving has been explicitly disabled.\n");
return false;		return false;
}		}

if (!TL->isComplexDeinterleavingSupported()) {		if (!TL->isComplexDeinterleavingSupported()) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "Complex deinterleaving has been disabled, target does "		dbgs() << "Complex deinterleaving has been disabled, target does "
"not support lowering of complex number operations.\n");		"not support lowering of complex number operations.\n");
return false;		return false;
}		}

bool Changed = false;		bool Changed = false;
for (auto &B : F)		for (auto &B : F)
Changed |= evaluateBasicBlock(&B);		Changed |= evaluateBasicBlock(&B);

return Changed;		return Changed;
}		}

static bool isInterleavingMask(ArrayRef<int> Mask) {		static bool isInterleavingMask(ArrayRef<int> Mask) {
// If the size is not even, it's not an interleaving mask		// If the size is not even, it's not an interleaving mask
if ((Mask.size() & 1))		if ((Mask.size() & 1))
return false;		return false;

int HalfNumElements = Mask.size() / 2;		int HalfNumElements = Mask.size() / 2;
Show All 15 Lines	if (Mask[Idx] != (Idx * 2) + Offset)
return false;		return false;
}		}

return true;		return true;
}		}

bool ComplexDeinterleaving::evaluateBasicBlock(BasicBlock *B) {		bool ComplexDeinterleaving::evaluateBasicBlock(BasicBlock *B) {
ComplexDeinterleavingGraph Graph(TL, TLI);		ComplexDeinterleavingGraph Graph(TL, TLI);
		for (auto &I : *B)
for (auto &I : *B) {		Graph.identifyNodes(&I);
auto *SVI = dyn_cast<ShuffleVectorInst>(&I);
if (!SVI)
continue;

// Look for a shufflevector that takes separate vectors of the real and
// imaginary components and recombines them into a single vector.
if (!isInterleavingMask(SVI->getShuffleMask()))
continue;

Graph.identifyNodes(SVI);
}

if (Graph.checkNodes()) {		if (Graph.checkNodes()) {
Graph.replaceNodes();		Graph.replaceNodes();
return true;		return true;
}		}

return false;		return false;
}		}
▲ Show 20 Lines • Show All 354 Lines • ▼ Show 20 Lines
ComplexDeinterleavingGraph::NodePtr		ComplexDeinterleavingGraph::NodePtr
ComplexDeinterleavingGraph::identifyNode(Instruction *Real, Instruction *Imag) {		ComplexDeinterleavingGraph::identifyNode(Instruction *Real, Instruction *Imag) {
LLVM_DEBUG(dbgs() << "identifyNode on " << *Real << " / " << *Imag << "\n");		LLVM_DEBUG(dbgs() << "identifyNode on " << *Real << " / " << *Imag << "\n");
if (NodePtr CN = getContainingComposite(Real, Imag)) {		if (NodePtr CN = getContainingComposite(Real, Imag)) {
LLVM_DEBUG(dbgs() << " - Folding to existing node\n");		LLVM_DEBUG(dbgs() << " - Folding to existing node\n");
return CN;		return CN;
}		}

auto *RealShuffle = dyn_cast<ShuffleVectorInst>(Real);		NodePtr Node = identifyDeinterleave(Real, Imag);
auto *ImagShuffle = dyn_cast<ShuffleVectorInst>(Imag);		if (Node)
if (RealShuffle && ImagShuffle) {		return Node;
Value *RealOp1 = RealShuffle->getOperand(1);
if (!isa<UndefValue>(RealOp1) && !isa<ConstantAggregateZero>(RealOp1)) {
LLVM_DEBUG(dbgs() << " - RealOp1 is not undef or zero.\n");
return nullptr;
}
Value *ImagOp1 = ImagShuffle->getOperand(1);
if (!isa<UndefValue>(ImagOp1) && !isa<ConstantAggregateZero>(ImagOp1)) {
LLVM_DEBUG(dbgs() << " - ImagOp1 is not undef or zero.\n");
return nullptr;
}

Value *RealOp0 = RealShuffle->getOperand(0);
Value *ImagOp0 = ImagShuffle->getOperand(0);

if (RealOp0 != ImagOp0) {
LLVM_DEBUG(dbgs() << " - Shuffle operands are not equal.\n");
return nullptr;
}

ArrayRef<int> RealMask = RealShuffle->getShuffleMask();
ArrayRef<int> ImagMask = ImagShuffle->getShuffleMask();
if (!isDeinterleavingMask(RealMask) || !isDeinterleavingMask(ImagMask)) {
LLVM_DEBUG(dbgs() << " - Masks are not deinterleaving.\n");
return nullptr;
}

if (RealMask[0] != 0 || ImagMask[0] != 1) {
LLVM_DEBUG(dbgs() << " - Masks do not have the correct initial value.\n");
return nullptr;
}

// Type checking, the shuffle type should be a vector type of the same
// scalar type, but half the size
auto CheckType = [&](ShuffleVectorInst *Shuffle) {
Value *Op = Shuffle->getOperand(0);
auto *ShuffleTy = cast<FixedVectorType>(Shuffle->getType());
auto *OpTy = cast<FixedVectorType>(Op->getType());

if (OpTy->getScalarType() != ShuffleTy->getScalarType())
return false;
if ((ShuffleTy->getNumElements() * 2) != OpTy->getNumElements())
return false;

return true;
};

auto CheckDeinterleavingShuffle = [&](ShuffleVectorInst *Shuffle) -> bool {
if (!CheckType(Shuffle))
return false;

ArrayRef<int> Mask = Shuffle->getShuffleMask();
int Last = *Mask.rbegin();

Value *Op = Shuffle->getOperand(0);
auto *OpTy = cast<FixedVectorType>(Op->getType());
int NumElements = OpTy->getNumElements();

// Ensure that the deinterleaving shuffle only pulls from the first
// shuffle operand.
return Last < NumElements;
};

if (RealShuffle->getType() != ImagShuffle->getType()) {
LLVM_DEBUG(dbgs() << " - Shuffle types aren't equal.\n");
return nullptr;
}
if (!CheckDeinterleavingShuffle(RealShuffle)) {
LLVM_DEBUG(dbgs() << " - RealShuffle is invalid type.\n");
return nullptr;
}
if (!CheckDeinterleavingShuffle(ImagShuffle)) {
LLVM_DEBUG(dbgs() << " - ImagShuffle is invalid type.\n");
return nullptr;
}

NodePtr PlaceholderNode =
prepareCompositeNode(llvm::ComplexDeinterleavingOperation::Shuffle,
RealShuffle, ImagShuffle);
PlaceholderNode->ReplacementNode = RealShuffle->getOperand(0);
FinalInstructions.insert(RealShuffle);
FinalInstructions.insert(ImagShuffle);
return submitCompositeNode(PlaceholderNode);
}
if (RealShuffle || ImagShuffle) {
LLVM_DEBUG(dbgs() << " - There's a shuffle where there shouldn't be.\n");
return nullptr;
}

auto *VTy = cast<FixedVectorType>(Real->getType());		auto *VTy = cast<VectorType>(Real->getType());
auto *NewVTy =		auto *NewVTy = VectorType::getDoubleElementsVectorType(VTy);
FixedVectorType::get(VTy->getScalarType(), VTy->getNumElements() * 2);

if (TL->isComplexDeinterleavingOperationSupported(		if (TL->isComplexDeinterleavingOperationSupported(
ComplexDeinterleavingOperation::CMulPartial, NewVTy) &&		ComplexDeinterleavingOperation::CMulPartial, NewVTy) &&
isInstructionPairMul(Real, Imag)) {		isInstructionPairMul(Real, Imag)) {
return identifyPartialMul(Real, Imag);		return identifyPartialMul(Real, Imag);
}		}

if (TL->isComplexDeinterleavingOperationSupported(		if (TL->isComplexDeinterleavingOperationSupported(
ComplexDeinterleavingOperation::CAdd, NewVTy) &&		ComplexDeinterleavingOperation::CAdd, NewVTy) &&
isInstructionPairAdd(Real, Imag)) {		isInstructionPairAdd(Real, Imag)) {
return identifyAdd(Real, Imag);		return identifyAdd(Real, Imag);
}		}

auto Symmetric = identifySymmetricOperation(Real, Imag);		auto Symmetric = identifySymmetricOperation(Real, Imag);
LLVM_DEBUG(if (Symmetric == nullptr) dbgs()		LLVM_DEBUG(if (Symmetric == nullptr) dbgs()
<< " - Not recognised as a valid pattern.\n");		<< " - Not recognised as a valid pattern.\n");
return Symmetric;		return Symmetric;
}		}

bool ComplexDeinterleavingGraph::identifyNodes(Instruction *RootI) {		bool ComplexDeinterleavingGraph::identifyNodes(Instruction *RootI) {
Instruction *Real;		auto RootNode = identifyRoot(RootI);
Instruction *Imag;		if (!RootNode)
if (!match(RootI, m_Shuffle(m_Instruction(Real), m_Instruction(Imag))))
return false;		return false;

auto RootNode = identifyNode(Real, Imag);

LLVM_DEBUG({		LLVM_DEBUG({
Function *F = RootI->getFunction();		Function *F = RootI->getFunction();
BasicBlock *B = RootI->getParent();		BasicBlock *B = RootI->getParent();
dbgs() << "Complex deinterleaving graph for " << F->getName()		dbgs() << "Complex deinterleaving graph for " << F->getName()
<< "::" << B->getName() << ".\n";		<< "::" << B->getName() << ".\n";
dump(dbgs());		dump(dbgs());
dbgs() << "\n";		dbgs() << "\n";
});		});

if (RootNode) {
RootToNode[RootI] = RootNode;		RootToNode[RootI] = RootNode;
OrderedRoots.push_back(RootI);		OrderedRoots.push_back(RootI);
return true;		return true;
}		}

return false;
}

bool ComplexDeinterleavingGraph::checkNodes() {		bool ComplexDeinterleavingGraph::checkNodes() {
// Collect all instructions from roots to leaves		// Collect all instructions from roots to leaves
SmallPtrSet<Instruction *, 16> AllInstructions;		SmallPtrSet<Instruction *, 16> AllInstructions;
SmallVector<Instruction *, 8> Worklist;		SmallVector<Instruction *, 8> Worklist;
for (auto *I : OrderedRoots)		for (auto *I : OrderedRoots)
Worklist.push_back(I);		Worklist.push_back(I);

// Extract all instructions that are used by all XCMLA/XCADD/ADD/SUB/NEG		// Extract all instructions that are used by all XCMLA/XCADD/ADD/SUB/NEG
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	while (!Worklist.empty()) {
for (Value *Op : I->operands()) {		for (Value *Op : I->operands()) {
if (auto *OpI = dyn_cast<Instruction>(Op))		if (auto *OpI = dyn_cast<Instruction>(Op))
Worklist.emplace_back(OpI);		Worklist.emplace_back(OpI);
}		}
}		}
return !RootToNode.empty();		return !RootToNode.empty();
}		}

		ComplexDeinterleavingGraph::NodePtr
		ComplexDeinterleavingGraph::identifyRoot(Instruction *RootI) {
		if (auto *Intrinsic = dyn_cast<IntrinsicInst>(RootI)) {
		if (Intrinsic->getIntrinsicID() !=
		Intrinsic::experimental_vector_interleave2)
		return nullptr;

		auto *Real = dyn_cast<Instruction>(Intrinsic->getOperand(0));
		auto *Imag = dyn_cast<Instruction>(Intrinsic->getOperand(1));
		if (!Real || !Imag)
		return nullptr;

		return identifyNode(Real, Imag);
		}

		auto *SVI = dyn_cast<ShuffleVectorInst>(RootI);
		if (!SVI)
		return nullptr;

		// Look for a shufflevector that takes separate vectors of the real and
		// imaginary components and recombines them into a single vector.
		if (!isInterleavingMask(SVI->getShuffleMask()))
		return nullptr;

		Instruction *Real;
		Instruction *Imag;
		if (!match(RootI, m_Shuffle(m_Instruction(Real), m_Instruction(Imag))))
		return nullptr;

		return identifyNode(Real, Imag);
		}

		ComplexDeinterleavingGraph::NodePtr
		ComplexDeinterleavingGraph::identifyDeinterleave(Instruction *Real,
		Instruction *Imag) {
		Instruction *I = nullptr;
		Value *FinalValue = nullptr;
		if (match(Real, m_ExtractValue<0>(m_Instruction(I))) &&
		match(Imag, m_ExtractValue<1>(m_Specific(I))) &&
		match(I, m_Intrinsic<Intrinsic::experimental_vector_deinterleave2>(
		m_Value(FinalValue)))) {
		NodePtr PlaceholderNode = prepareCompositeNode(
		llvm::ComplexDeinterleavingOperation::Deinterleave, Real, Imag);
		PlaceholderNode->ReplacementNode = FinalValue;
		FinalInstructions.insert(Real);
		FinalInstructions.insert(Imag);
		return submitCompositeNode(PlaceholderNode);
		}

		auto *RealShuffle = dyn_cast<ShuffleVectorInst>(Real);
		auto *ImagShuffle = dyn_cast<ShuffleVectorInst>(Imag);
		if (!RealShuffle || !ImagShuffle) {
		if (RealShuffle || ImagShuffle)
		LLVM_DEBUG(dbgs() << " - There's a shuffle where there shouldn't be.\n");
		return nullptr;
		}

		Value *RealOp1 = RealShuffle->getOperand(1);
		if (!isa<UndefValue>(RealOp1) && !isa<ConstantAggregateZero>(RealOp1)) {
		LLVM_DEBUG(dbgs() << " - RealOp1 is not undef or zero.\n");
		return nullptr;
		}
		Value *ImagOp1 = ImagShuffle->getOperand(1);
		if (!isa<UndefValue>(ImagOp1) && !isa<ConstantAggregateZero>(ImagOp1)) {
		LLVM_DEBUG(dbgs() << " - ImagOp1 is not undef or zero.\n");
		return nullptr;
		}

		Value *RealOp0 = RealShuffle->getOperand(0);
		Value *ImagOp0 = ImagShuffle->getOperand(0);

		if (RealOp0 != ImagOp0) {
		LLVM_DEBUG(dbgs() << " - Shuffle operands are not equal.\n");
		return nullptr;
		}

		ArrayRef<int> RealMask = RealShuffle->getShuffleMask();
		ArrayRef<int> ImagMask = ImagShuffle->getShuffleMask();
		if (!isDeinterleavingMask(RealMask) || !isDeinterleavingMask(ImagMask)) {
		LLVM_DEBUG(dbgs() << " - Masks are not deinterleaving.\n");
		return nullptr;
		}

		if (RealMask[0] != 0 || ImagMask[0] != 1) {
		LLVM_DEBUG(dbgs() << " - Masks do not have the correct initial value.\n");
		return nullptr;
		}

		// Type checking, the shuffle type should be a vector type of the same
		// scalar type, but half the size
		auto CheckType = [&](ShuffleVectorInst *Shuffle) {
		Value *Op = Shuffle->getOperand(0);
		auto *ShuffleTy = cast<FixedVectorType>(Shuffle->getType());
		auto *OpTy = cast<FixedVectorType>(Op->getType());

		if (OpTy->getScalarType() != ShuffleTy->getScalarType())
		return false;
		if ((ShuffleTy->getNumElements() * 2) != OpTy->getNumElements())
		return false;

		return true;
		};

		auto CheckDeinterleavingShuffle = [&](ShuffleVectorInst *Shuffle) -> bool {
		if (!CheckType(Shuffle))
		return false;

		ArrayRef<int> Mask = Shuffle->getShuffleMask();
		int Last = *Mask.rbegin();

		Value *Op = Shuffle->getOperand(0);
		auto *OpTy = cast<FixedVectorType>(Op->getType());
		int NumElements = OpTy->getNumElements();

		// Ensure that the deinterleaving shuffle only pulls from the first
		// shuffle operand.
		return Last < NumElements;
		};

		if (RealShuffle->getType() != ImagShuffle->getType()) {
		LLVM_DEBUG(dbgs() << " - Shuffle types aren't equal.\n");
		return nullptr;
		}
		if (!CheckDeinterleavingShuffle(RealShuffle)) {
		LLVM_DEBUG(dbgs() << " - RealShuffle is invalid type.\n");
		return nullptr;
		}
		if (!CheckDeinterleavingShuffle(ImagShuffle)) {
		LLVM_DEBUG(dbgs() << " - ImagShuffle is invalid type.\n");
		return nullptr;
		}

		NodePtr PlaceholderNode =
		prepareCompositeNode(llvm::ComplexDeinterleavingOperation::Deinterleave,
		RealShuffle, ImagShuffle);
		PlaceholderNode->ReplacementNode = RealShuffle->getOperand(0);
		FinalInstructions.insert(RealShuffle);
		FinalInstructions.insert(ImagShuffle);
		return submitCompositeNode(PlaceholderNode);
		}
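
The shuffle-based checks in `identifyDeinterleave` can be sketched on plain integer masks. This is a minimal standalone illustration, not the pass's code: `isStride2Mask` is a hypothetical stand-in for `isDeinterleavingMask`, and it assumes that a deinterleaving mask is one whose entries advance by 2 from the first element. `isRealImagPair` is likewise a made-up name that bundles the mask, initial-value, type-size and bounds checks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for isDeinterleavingMask (assumption): true when every
// entry advances by 2 from the first one (0, 2, 4, ... or 1, 3, 5, ...).
bool isStride2Mask(const std::vector<int> &Mask) {
  assert(!Mask.empty() && "mask must have at least one element");
  for (std::size_t I = 1; I < Mask.size(); ++I)
    if (Mask[I] != Mask[0] + 2 * static_cast<int>(I))
      return false;
  return true;
}

// Mirrors the checks on a (real, imag) shuffle pair reading from one source
// vector with NumSrcElements lanes: both masks are stride-2, the real mask
// starts at lane 0 and the imaginary mask at lane 1, each result has half the
// source lanes, and the last index stays inside the first shuffle operand.
bool isRealImagPair(const std::vector<int> &RealMask,
                    const std::vector<int> &ImagMask, int NumSrcElements) {
  if (RealMask.empty() || ImagMask.empty())
    return false;
  if (!isStride2Mask(RealMask) || !isStride2Mask(ImagMask))
    return false;
  if (RealMask[0] != 0 || ImagMask[0] != 1)
    return false;
  if (RealMask.size() != ImagMask.size() ||
      static_cast<int>(RealMask.size()) * 2 != NumSrcElements)
    return false;
  return RealMask.back() < NumSrcElements && ImagMask.back() < NumSrcElements;
}
```

For an 8-lane source, the masks `<0, 2, 4, 6>` and `<1, 3, 5, 7>` pass; swapping them fails the initial-value check.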

static Value *replaceSymmetricNode(ComplexDeinterleavingGraph::RawNodePtr Node,		static Value *replaceSymmetricNode(ComplexDeinterleavingGraph::RawNodePtr Node,
Value *InputA, Value *InputB) {		Value *InputA, Value *InputB) {
Instruction *I = Node->Real;		Instruction *I = Node->Real;
if (I->isUnaryOp())		if (I->isUnaryOp())
assert(!InputB &&		assert(!InputB &&
"Unary symmetric operations need one input, but two were provided.");		"Unary symmetric operations need one input, but two were provided.");
else if (I->isBinaryOp())		else if (I->isBinaryOp())
assert(InputB && "Binary symmetric operations need two inputs, only one "		assert(InputB && "Binary symmetric operations need two inputs, only one "
[... 64 lines not shown ...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

	[... 24,619 lines not shown ...]
	}			}

	bool AArch64TargetLowering::isConstantUnsignedBitfieldExtractLegal(			bool AArch64TargetLowering::isConstantUnsignedBitfieldExtractLegal(
	unsigned Opc, LLT Ty1, LLT Ty2) const {			unsigned Opc, LLT Ty1, LLT Ty2) const {
	return Ty1 == Ty2 && (Ty1 == LLT::scalar(32) || Ty1 == LLT::scalar(64));			return Ty1 == Ty2 && (Ty1 == LLT::scalar(32) || Ty1 == LLT::scalar(64));
	}			}

	bool AArch64TargetLowering::isComplexDeinterleavingSupported() const {			bool AArch64TargetLowering::isComplexDeinterleavingSupported() const {
	return Subtarget->hasComplxNum();			return Subtarget->hasSVE() || Subtarget->hasComplxNum();
				mgabka: I think it would be worth adding extra RUN lines to the existing AArch64 tests that operate on fixed-width vectors, to make sure that adding +sve does not stop generation of fcmla there. What do you think?
				igor.kirillov (Author): Added to `complex-deinterleaving-f16-add.ll`, where we have both shufflevectors and deinterleave2 intrinsics applied to fixed-width vectors.
	}			}

	bool AArch64TargetLowering::isComplexDeinterleavingOperationSupported(			bool AArch64TargetLowering::isComplexDeinterleavingOperationSupported(
	ComplexDeinterleavingOperation Operation, Type *Ty) const {			ComplexDeinterleavingOperation Operation, Type *Ty) const {
	auto *VTy = dyn_cast<FixedVectorType>(Ty);			auto *VTy = dyn_cast<VectorType>(Ty);
	if (!VTy)			if (!VTy)
	return false;			return false;

				// If the vector is scalable, SVE is enabled, implying support for complex
				// numbers. Otherwise, we need to ensure complex number support is available
				if (!VTy->isScalableTy() && !Subtarget->hasComplxNum())
				return false;

	auto *ScalarTy = VTy->getScalarType();			auto *ScalarTy = VTy->getScalarType();
	unsigned NumElements = VTy->getNumElements();			unsigned NumElements = VTy->getElementCount().getKnownMinValue();

				// We can only process vectors that have a bit size of 128 or higher (with an
				// additional 64 bits for Neon). Additionally, these vectors must have a
				// power-of-2 size, as we later split them into the smallest supported size
				// and merge them back together after applying the complex operation.
	unsigned VTyWidth = VTy->getScalarSizeInBits() * NumElements;			unsigned VTyWidth = VTy->getScalarSizeInBits() * NumElements;
	if ((VTyWidth < 128 && VTyWidth != 64) || !llvm::isPowerOf2_32(VTyWidth))			if ((VTyWidth < 128 && (VTy->isScalableTy() || VTyWidth != 64)) ||
				!llvm::isPowerOf2_32(VTyWidth))
	return false;			return false;

	return (ScalarTy->isHalfTy() && Subtarget->hasFullFP16()) ||			return (ScalarTy->isHalfTy() && Subtarget->hasFullFP16()) ||
	NickGuy: When working with scalable vectors, they don't have the same restriction of bit width. Treating them with a max width of 128 bits seems wasteful and inefficient. Is there any way to get the vector width at compile time (is there a `target->getMaxVectorWidth()` or something)?
	mgabka: For scalable vectors I don't think we want to use a min or max vector width; we should rather operate on the ElementCount and the size of the element types. IIUC, for scalable vectors the condition we want to check is whether we are operating on packed vector types (in which case all are supported) or on the set of unpacked vectors we support. Am I correct? If so, maybe it is worth having dedicated sections for fixed-width and scalable vectors in this function.
	igor.kirillov (Author): @NickGuy Actually, this function returns false if VTyWidth is less than 128 bits, so any vector of 128 bits or more is supported. @mgabka We support any unpacked type with size 2X if 2X >= 128; there is code in `AArch64TargetLowering::createComplexDeinterleavingIR` that splits those vectors until they have a minimal size of 128 and then merges them back. We don't support min-64-bit vectors (unlike Neon), and that's the condition I added to the `if` statement.
	NickGuy: Not sure why I added the comment here; it was supposed to be on the `if (TyWidth > 128) {` below, oops... How resource-efficient is this splitting with scalable vectors, though? My concern is that we'd split the operation across numerous 256+ vectors while only using the lower 128 bits of each.
	igor.kirillov (Author): Not sure if I understood your concern correctly. We are splitting any 256+ min-sized vector instructions into ones that have a minimum of 128 bits. How many bits there actually are depends on the CPU, but the generated code would work just fine even without knowing that. For example, for <vscale x 8 x double> we'll get 4 instructions working on vectors of size <vscale x 2 x double>.
	ScalarTy->isFloatTy() || ScalarTy->isDoubleTy();			ScalarTy->isFloatTy() || ScalarTy->isDoubleTy();
	}			}
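
The width test in the new version of `isComplexDeinterleavingOperationSupported` can be restated as a small standalone predicate. This is only a sketch under stated assumptions: `isWidthSupported` and `isPow2` are hypothetical names, the scalar-type and subtarget-feature checks are left out, and `MinElements` stands for the known-minimum element count (the count at vscale = 1 for scalable vectors).

```cpp
#include <cassert>

// Hypothetical helper: true when N is a non-zero power of two.
bool isPow2(unsigned N) { return N != 0 && (N & (N - 1)) == 0; }

// Standalone restatement of the width condition only.
bool isWidthSupported(unsigned ScalarBits, unsigned MinElements,
                      bool Scalable) {
  assert(ScalarBits != 0 && MinElements != 0 && "empty vector type");
  unsigned Width = ScalarBits * MinElements;
  // Below 128 bits, only the 64-bit fixed-width (Neon) case is accepted.
  if (Width < 128 && (Scalable || Width != 64))
    return false;
  // Wider vectors must be a power of two so they can be halved repeatedly.
  return isPow2(Width);
}
```

Under this reading, `<4 x half>` (64 bits, fixed) is accepted while `<vscale x 4 x half>` (64 bits minimum, scalable) is rejected, which matches the "Expected to not transform" case in the scalable add test.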

	Value *AArch64TargetLowering::createComplexDeinterleavingIR(			Value *AArch64TargetLowering::createComplexDeinterleavingIR(
	Instruction *I, ComplexDeinterleavingOperation OperationType,			Instruction *I, ComplexDeinterleavingOperation OperationType,
	ComplexDeinterleavingRotation Rotation, Value *InputA, Value *InputB,			ComplexDeinterleavingRotation Rotation, Value *InputA, Value *InputB,
	Value *Accumulator) const {			Value *Accumulator) const {
	FixedVectorType *Ty = cast<FixedVectorType>(InputA->getType());			VectorType *Ty = cast<VectorType>(InputA->getType());
				bool IsScalable = Ty->isScalableTy();

	IRBuilder<> B(I);			IRBuilder<> B(I);

	unsigned TyWidth = Ty->getScalarSizeInBits() * Ty->getNumElements();			unsigned TyWidth =
				Ty->getScalarSizeInBits() * Ty->getElementCount().getKnownMinValue();

	assert(((TyWidth >= 128 && llvm::isPowerOf2_32(TyWidth)) || TyWidth == 64) &&			assert(((TyWidth >= 128 && llvm::isPowerOf2_32(TyWidth)) || TyWidth == 64) &&
	"Vector type must be either 64 or a power of 2 that is at least 128");			"Vector type must be either 64 or a power of 2 that is at least 128");

	if (TyWidth > 128) {			if (TyWidth > 128) {
	int Stride = Ty->getNumElements() / 2;			int Stride = Ty->getElementCount().getKnownMinValue() / 2;
	auto SplitSeq = llvm::seq<int>(0, Ty->getNumElements());			auto *HalfTy = VectorType::getHalfElementsVectorType(Ty);
	auto SplitSeqVec = llvm::to_vector(SplitSeq);			auto *LowerSplitA = B.CreateExtractVector(HalfTy, InputA, B.getInt64(0));
	ArrayRef<int> LowerSplitMask(&SplitSeqVec[0], Stride);			auto *LowerSplitB = B.CreateExtractVector(HalfTy, InputB, B.getInt64(0));
	ArrayRef<int> UpperSplitMask(&SplitSeqVec[Stride], Stride);			auto *UpperSplitA =
				B.CreateExtractVector(HalfTy, InputA, B.getInt64(Stride));
	auto *LowerSplitA = B.CreateShuffleVector(InputA, LowerSplitMask);			auto *UpperSplitB =
	auto *LowerSplitB = B.CreateShuffleVector(InputB, LowerSplitMask);			B.CreateExtractVector(HalfTy, InputB, B.getInt64(Stride));
	auto *UpperSplitA = B.CreateShuffleVector(InputA, UpperSplitMask);
	auto *UpperSplitB = B.CreateShuffleVector(InputB, UpperSplitMask);
	Value *LowerSplitAcc = nullptr;			Value *LowerSplitAcc = nullptr;
	Value *UpperSplitAcc = nullptr;			Value *UpperSplitAcc = nullptr;

	if (Accumulator) {			if (Accumulator) {
	LowerSplitAcc = B.CreateShuffleVector(Accumulator, LowerSplitMask);			LowerSplitAcc = B.CreateExtractVector(HalfTy, Accumulator, B.getInt64(0));
	UpperSplitAcc = B.CreateShuffleVector(Accumulator, UpperSplitMask);			UpperSplitAcc =
				B.CreateExtractVector(HalfTy, Accumulator, B.getInt64(Stride));
	}			}

	auto *LowerSplitInt = createComplexDeinterleavingIR(			auto *LowerSplitInt = createComplexDeinterleavingIR(
	I, OperationType, Rotation, LowerSplitA, LowerSplitB, LowerSplitAcc);			I, OperationType, Rotation, LowerSplitA, LowerSplitB, LowerSplitAcc);
	auto *UpperSplitInt = createComplexDeinterleavingIR(			auto *UpperSplitInt = createComplexDeinterleavingIR(
	I, OperationType, Rotation, UpperSplitA, UpperSplitB, UpperSplitAcc);			I, OperationType, Rotation, UpperSplitA, UpperSplitB, UpperSplitAcc);

	ArrayRef<int> JoinMask(&SplitSeqVec[0], Ty->getNumElements());			auto *Result = B.CreateInsertVector(Ty, PoisonValue::get(Ty), LowerSplitInt,
	return B.CreateShuffleVector(LowerSplitInt, UpperSplitInt, JoinMask);			B.getInt64(0));
				return B.CreateInsertVector(Ty, Result, UpperSplitInt, B.getInt64(Stride));
	}			}

	if (OperationType == ComplexDeinterleavingOperation::CMulPartial) {			if (OperationType == ComplexDeinterleavingOperation::CMulPartial) {
				if (Accumulator == nullptr)
				Accumulator = ConstantFP::get(Ty, 0);

				if (IsScalable) {
				auto *Mask = B.CreateVectorSplat(Ty->getElementCount(), B.getInt1(true));
				return B.CreateIntrinsic(
				Intrinsic::aarch64_sve_fcmla, Ty,
				{Mask, Accumulator, InputA, InputB, B.getInt32((int)Rotation * 90)});
				}

	Intrinsic::ID IdMap[4] = {Intrinsic::aarch64_neon_vcmla_rot0,			Intrinsic::ID IdMap[4] = {Intrinsic::aarch64_neon_vcmla_rot0,
	Intrinsic::aarch64_neon_vcmla_rot90,			Intrinsic::aarch64_neon_vcmla_rot90,
	Intrinsic::aarch64_neon_vcmla_rot180,			Intrinsic::aarch64_neon_vcmla_rot180,
	Intrinsic::aarch64_neon_vcmla_rot270};			Intrinsic::aarch64_neon_vcmla_rot270};

	if (Accumulator == nullptr)
	Accumulator = ConstantFP::get(Ty, 0);

	return B.CreateIntrinsic(IdMap[(int)Rotation], Ty,			return B.CreateIntrinsic(IdMap[(int)Rotation], Ty,
	{Accumulator, InputB, InputA});			{Accumulator, InputB, InputA});
	}			}

	if (OperationType == ComplexDeinterleavingOperation::CAdd) {			if (OperationType == ComplexDeinterleavingOperation::CAdd) {
				if (IsScalable) {
				auto *Mask = B.CreateVectorSplat(Ty->getElementCount(), B.getInt1(true));
				if (Rotation == ComplexDeinterleavingRotation::Rotation_90 ||
				Rotation == ComplexDeinterleavingRotation::Rotation_270)
				return B.CreateIntrinsic(
				Intrinsic::aarch64_sve_fcadd, Ty,
				{Mask, InputA, InputB, B.getInt32((int)Rotation * 90)});
				return nullptr;
				}

	Intrinsic::ID IntId = Intrinsic::not_intrinsic;			Intrinsic::ID IntId = Intrinsic::not_intrinsic;
	if (Rotation == ComplexDeinterleavingRotation::Rotation_90)			if (Rotation == ComplexDeinterleavingRotation::Rotation_90)
	IntId = Intrinsic::aarch64_neon_vcadd_rot90;			IntId = Intrinsic::aarch64_neon_vcadd_rot90;
	else if (Rotation == ComplexDeinterleavingRotation::Rotation_270)			else if (Rotation == ComplexDeinterleavingRotation::Rotation_270)
	IntId = Intrinsic::aarch64_neon_vcadd_rot270;			IntId = Intrinsic::aarch64_neon_vcadd_rot270;

	if (IntId == Intrinsic::not_intrinsic)			if (IntId == Intrinsic::not_intrinsic)
	return nullptr;			return nullptr;
	[... 17 lines not shown ...]
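
The split-and-join recursion in `createComplexDeinterleavingIR` (extract the lower and upper halves, recurse, insert the results back) can be modelled on ordinary vectors. This is a sketch rather than the lowering itself: `splitApplyJoin`, `Lanes` and the `LeafCalls` counter are hypothetical, lane counts are assumed to be powers of two, and `MaxLanes` plays the role of the 128-bit limit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Lanes = std::vector<double>;
using LeafFn = std::function<Lanes(const Lanes &, const Lanes &)>;

// Applies LeafOp to pieces of at most MaxLanes lanes. Wider inputs are split
// into a lower and an upper half, processed recursively, and joined again.
// LeafCalls counts how many leaf operations were performed.
Lanes splitApplyJoin(const Lanes &A, const Lanes &B, std::size_t MaxLanes,
                     const LeafFn &LeafOp, int &LeafCalls) {
  assert(A.size() == B.size() && "operands must have the same lane count");
  if (A.size() <= MaxLanes) {
    ++LeafCalls;
    return LeafOp(A, B);
  }
  auto Stride = static_cast<std::ptrdiff_t>(A.size() / 2);
  Lanes LoA(A.begin(), A.begin() + Stride), HiA(A.begin() + Stride, A.end());
  Lanes LoB(B.begin(), B.begin() + Stride), HiB(B.begin() + Stride, B.end());
  Lanes Lo = splitApplyJoin(LoA, LoB, MaxLanes, LeafOp, LeafCalls);
  Lanes Hi = splitApplyJoin(HiA, HiB, MaxLanes, LeafOp, LeafCalls);
  Lo.insert(Lo.end(), Hi.begin(), Hi.end()); // join: lower half, then upper
  return Lo;
}
```

With 8 lanes and `MaxLanes = 2`, the model performs 4 leaf operations, the same count as the `<vscale x 8 x double>` to `<vscale x 2 x double>` example given in the review discussion.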

llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add-scalable.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s --mattr=+sve -o - | FileCheck %s

				target triple = "aarch64-arm-none-eabi"

				; Expected to not transform
				define <vscale x 4 x half> @complex_add_v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b) {
				; CHECK-LABEL: complex_add_v4f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: uunpkhi z2.d, z0.s
				; CHECK-NEXT: uunpklo z0.d, z0.s
				; CHECK-NEXT: uunpkhi z3.d, z1.s
				; CHECK-NEXT: uunpklo z1.d, z1.s
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: uzp1 z4.d, z0.d, z2.d
				; CHECK-NEXT: uzp2 z0.d, z0.d, z2.d
				; CHECK-NEXT: uzp2 z2.d, z1.d, z3.d
				; CHECK-NEXT: uzp1 z1.d, z1.d, z3.d
				; CHECK-NEXT: fsubr z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: movprfx z1, z2
				; CHECK-NEXT: fadd z1.h, p0/m, z1.h, z4.h
				; CHECK-NEXT: zip2 z2.d, z0.d, z1.d
				; CHECK-NEXT: zip1 z0.d, z0.d, z1.d
				; CHECK-NEXT: uzp1 z0.s, z0.s, z2.s
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 2 x half>, <vscale x 2 x half> } @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half> %a)
				%a.real = extractvalue { <vscale x 2 x half>, <vscale x 2 x half> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 2 x half>, <vscale x 2 x half> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 2 x half>, <vscale x 2 x half> } @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half> %b)
				%b.real = extractvalue { <vscale x 2 x half>, <vscale x 2 x half> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 2 x half>, <vscale x 2 x half> } %b.deinterleaved, 1
				%0 = fsub fast <vscale x 2 x half> %b.real, %a.imag
				%1 = fadd fast <vscale x 2 x half> %b.imag, %a.real
				%interleaved.vec = tail call <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half> %0, <vscale x 2 x half> %1)
				ret <vscale x 4 x half> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 8 x half> @complex_add_v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: complex_add_v8f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: fcadd z1.h, p0/m, z1.h, z0.h, #90
				; CHECK-NEXT: mov z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 4 x half>, <vscale x 4 x half> } @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half> %a)
				%a.real = extractvalue { <vscale x 4 x half>, <vscale x 4 x half> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 4 x half>, <vscale x 4 x half> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 4 x half>, <vscale x 4 x half> } @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half> %b)
				%b.real = extractvalue { <vscale x 4 x half>, <vscale x 4 x half> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 4 x half>, <vscale x 4 x half> } %b.deinterleaved, 1
				%0 = fsub fast <vscale x 4 x half> %b.real, %a.imag
				%1 = fadd fast <vscale x 4 x half> %b.imag, %a.real
				%interleaved.vec = tail call <vscale x 8 x half> @llvm.experimental.vector.interleave2.nxv8f16(<vscale x 4 x half> %0, <vscale x 4 x half> %1)
				ret <vscale x 8 x half> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 16 x half> @complex_add_v16f16(<vscale x 16 x half> %a, <vscale x 16 x half> %b) {
				; CHECK-LABEL: complex_add_v16f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: fcadd z2.h, p0/m, z2.h, z0.h, #90
				; CHECK-NEXT: fcadd z3.h, p0/m, z3.h, z1.h, #90
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: mov z1.d, z3.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %a)
				%a.real = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %b)
				%b.real = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %b.deinterleaved, 1
				%0 = fsub fast <vscale x 8 x half> %b.real, %a.imag
				%1 = fadd fast <vscale x 8 x half> %b.imag, %a.real
				%interleaved.vec = tail call <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half> %0, <vscale x 8 x half> %1)
				ret <vscale x 16 x half> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 32 x half> @complex_add_v32f16(<vscale x 32 x half> %a, <vscale x 32 x half> %b) {
				; CHECK-LABEL: complex_add_v32f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: fcadd z6.h, p0/m, z6.h, z2.h, #90
				; CHECK-NEXT: fcadd z4.h, p0/m, z4.h, z0.h, #90
				; CHECK-NEXT: fcadd z5.h, p0/m, z5.h, z1.h, #90
				; CHECK-NEXT: fcadd z7.h, p0/m, z7.h, z3.h, #90
				; CHECK-NEXT: mov z0.d, z4.d
				; CHECK-NEXT: mov z1.d, z5.d
				; CHECK-NEXT: mov z2.d, z6.d
				; CHECK-NEXT: mov z3.d, z7.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 16 x half>, <vscale x 16 x half> } @llvm.experimental.vector.deinterleave2.nxv32f16(<vscale x 32 x half> %a)
				%a.real = extractvalue { <vscale x 16 x half>, <vscale x 16 x half> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 16 x half>, <vscale x 16 x half> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 16 x half>, <vscale x 16 x half> } @llvm.experimental.vector.deinterleave2.nxv32f16(<vscale x 32 x half> %b)
				%b.real = extractvalue { <vscale x 16 x half>, <vscale x 16 x half> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 16 x half>, <vscale x 16 x half> } %b.deinterleaved, 1
				%0 = fsub fast <vscale x 16 x half> %b.real, %a.imag
				%1 = fadd fast <vscale x 16 x half> %b.imag, %a.real
				%interleaved.vec = tail call <vscale x 32 x half> @llvm.experimental.vector.interleave2.nxv32f16(<vscale x 16 x half> %0, <vscale x 16 x half> %1)
				ret <vscale x 32 x half> %interleaved.vec
				}

				declare { <vscale x 2 x half>, <vscale x 2 x half> } @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half>)
				declare <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half>, <vscale x 2 x half>)

				declare { <vscale x 4 x half>, <vscale x 4 x half> } @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half>)
				declare <vscale x 8 x half> @llvm.experimental.vector.interleave2.nxv8f16(<vscale x 4 x half>, <vscale x 4 x half>)

				declare { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half>)
				declare <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half>, <vscale x 8 x half>)

				declare { <vscale x 16 x half>, <vscale x 16 x half> } @llvm.experimental.vector.deinterleave2.nxv32f16(<vscale x 32 x half>)
				declare <vscale x 32 x half> @llvm.experimental.vector.interleave2.nxv32f16(<vscale x 16 x half>, <vscale x 16 x half>)
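
The IR pattern used throughout this test file (`real = b.real - a.imag`, `imag = b.imag + a.real`) is a complex addition in which `a` is rotated by 90 degrees, that is, `b + i*a`. The following scalar model is only an illustration of that identity on interleaved data; `addRot90` and `addRot90Ref` are hypothetical names, and nothing here models predication or vector length.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Interleaved layout: {re0, im0, re1, im1, ...}.
using Interleaved = std::vector<float>;

// Scalar model of the IR pattern in the tests:
//   real = b.real - a.imag, imag = b.imag + a.real
Interleaved addRot90(const Interleaved &A, const Interleaved &B) {
  assert(A.size() == B.size() && A.size() % 2 == 0);
  Interleaved R(A.size());
  for (std::size_t I = 0; I < A.size(); I += 2) {
    R[I] = B[I] - A[I + 1];
    R[I + 1] = B[I + 1] + A[I];
  }
  return R;
}

// The same value expressed with std::complex: b + i * a.
std::complex<float> addRot90Ref(std::complex<float> A, std::complex<float> B) {
  return B + std::complex<float>(0.0f, 1.0f) * A;
}
```

For `a = 1 + 2i` and `b = 10 + 20i`, both functions give `8 + 21i`.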

llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s --mattr=+complxnum,+neon,+fullfp16 -o - | FileCheck %s		; RUN: llc < %s --mattr=+complxnum,+neon,+fullfp16 -o - | FileCheck %s
		; RUN: llc < %s --mattr=+complxnum,+neon,+fullfp16,+sve -o - | FileCheck %s

target triple = "aarch64-arm-none-eabi"		target triple = "aarch64-arm-none-eabi"

; Expected to not transform		; Expected to not transform
define <2 x half> @complex_add_v2f16(<2 x half> %a, <2 x half> %b) {		define <2 x half> @complex_add_v2f16(<2 x half> %a, <2 x half> %b) {
; CHECK-LABEL: complex_add_v2f16:		; CHECK-LABEL: complex_add_v2f16:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1		; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
[... 82 lines not shown ...]
%a.imag = shufflevector <32 x half> %a, <32 x half> zeroinitializer, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>		%a.imag = shufflevector <32 x half> %a, <32 x half> zeroinitializer, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>
%b.real = shufflevector <32 x half> %b, <32 x half> zeroinitializer, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>		%b.real = shufflevector <32 x half> %b, <32 x half> zeroinitializer, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>
%b.imag = shufflevector <32 x half> %b, <32 x half> zeroinitializer, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>		%b.imag = shufflevector <32 x half> %b, <32 x half> zeroinitializer, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>
%0 = fsub fast <16 x half> %b.real, %a.imag		%0 = fsub fast <16 x half> %b.real, %a.imag
%1 = fadd fast <16 x half> %b.imag, %a.real		%1 = fadd fast <16 x half> %b.imag, %a.real
%interleaved.vec = shufflevector <16 x half> %0, <16 x half> %1, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>		%interleaved.vec = shufflevector <16 x half> %0, <16 x half> %1, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
ret <32 x half> %interleaved.vec		ret <32 x half> %interleaved.vec
}		}

		igor.kirillov (Author): I am not sure if I should duplicate all fixed-width vector tests to reflect the fact that `llvm.experimental.vector.deinterleave2` is now supported there.
		NickGuy: It can't hurt; more coverage is always good. On the other hand, this is probably enough to test that the intrinsics work with fixed-width cases.
		; Expected to transform
		define <4 x half> @complex_add_v4f16_with_intrinsic(<4 x half> %a, <4 x half> %b) {
		; CHECK-LABEL: complex_add_v4f16_with_intrinsic:
		; CHECK: // %bb.0: // %entry
		; CHECK-NEXT: fcadd v0.4h, v1.4h, v0.4h, #90
		; CHECK-NEXT: ret
		NickGuy: Is the top comment correct? Is this case actually expected to not transform? Given the `fcadd` in the output, I'd assume that this is expected to transform.
		igor.kirillov (Author): Yes, indeed. Also fixed in the other places.
		entry:
		%a.deinterleaved = tail call { <2 x half>, <2 x half> } @llvm.experimental.vector.deinterleave2.v4f16(<4 x half> %a)
		%a.real = extractvalue { <2 x half>, <2 x half> } %a.deinterleaved, 0
		%a.imag = extractvalue { <2 x half>, <2 x half> } %a.deinterleaved, 1
		%b.deinterleaved = tail call { <2 x half>, <2 x half> } @llvm.experimental.vector.deinterleave2.v4f16(<4 x half> %b)
		%b.real = extractvalue { <2 x half>, <2 x half> } %b.deinterleaved, 0
		%b.imag = extractvalue { <2 x half>, <2 x half> } %b.deinterleaved, 1
		%0 = fsub fast <2 x half> %b.real, %a.imag
		%1 = fadd fast <2 x half> %b.imag, %a.real
		%interleaved.vec = tail call <4 x half> @llvm.experimental.vector.interleave2.v4f16(<2 x half> %0, <2 x half> %1)
		ret <4 x half> %interleaved.vec
		}

		; Expected to transform
		define <8 x half> @complex_add_v8f16_with_intrinsic(<8 x half> %a, <8 x half> %b) {
		; CHECK-LABEL: complex_add_v8f16_with_intrinsic:
		; CHECK: // %bb.0: // %entry
		; CHECK-NEXT: fcadd v0.8h, v1.8h, v0.8h, #90
		; CHECK-NEXT: ret
		entry:
		%a.deinterleaved = tail call { <4 x half>, <4 x half> } @llvm.experimental.vector.deinterleave2.v8f16(<8 x half> %a)
		%a.real = extractvalue { <4 x half>, <4 x half> } %a.deinterleaved, 0
		%a.imag = extractvalue { <4 x half>, <4 x half> } %a.deinterleaved, 1
		%b.deinterleaved = tail call { <4 x half>, <4 x half> } @llvm.experimental.vector.deinterleave2.v8f16(<8 x half> %b)
		%b.real = extractvalue { <4 x half>, <4 x half> } %b.deinterleaved, 0
		%b.imag = extractvalue { <4 x half>, <4 x half> } %b.deinterleaved, 1
		%0 = fsub fast <4 x half> %b.real, %a.imag
		%1 = fadd fast <4 x half> %b.imag, %a.real
		%interleaved.vec = tail call <8 x half> @llvm.experimental.vector.interleave2.v8f16(<4 x half> %0, <4 x half> %1)
		ret <8 x half> %interleaved.vec
		}

		; Expected to transform
		define <16 x half> @complex_add_v16f16_with_intrinsic(<16 x half> %a, <16 x half> %b) {
		; CHECK-LABEL: complex_add_v16f16_with_intrinsic:
		; CHECK: // %bb.0: // %entry
		; CHECK-NEXT: fcadd v0.8h, v2.8h, v0.8h, #90
		; CHECK-NEXT: fcadd v1.8h, v3.8h, v1.8h, #90
		; CHECK-NEXT: ret
		entry:
		%a.deinterleaved = tail call { <8 x half>, <8 x half> } @llvm.experimental.vector.deinterleave2.v16f16(<16 x half> %a)
		%a.real = extractvalue { <8 x half>, <8 x half> } %a.deinterleaved, 0
		%a.imag = extractvalue { <8 x half>, <8 x half> } %a.deinterleaved, 1
		%b.deinterleaved = tail call { <8 x half>, <8 x half> } @llvm.experimental.vector.deinterleave2.v16f16(<16 x half> %b)
		%b.real = extractvalue { <8 x half>, <8 x half> } %b.deinterleaved, 0
		%b.imag = extractvalue { <8 x half>, <8 x half> } %b.deinterleaved, 1
		%0 = fsub fast <8 x half> %b.real, %a.imag
		%1 = fadd fast <8 x half> %b.imag, %a.real
		%interleaved.vec = tail call <16 x half> @llvm.experimental.vector.interleave2.v16f16(<8 x half> %0, <8 x half> %1)
		ret <16 x half> %interleaved.vec
		}

		declare { <2 x half>, <2 x half> } @llvm.experimental.vector.deinterleave2.v4f16(<4 x half>)
		declare <4 x half> @llvm.experimental.vector.interleave2.v4f16(<2 x half>, <2 x half>)

		declare { <4 x half>, <4 x half> } @llvm.experimental.vector.deinterleave2.v8f16(<8 x half>)
		declare <8 x half> @llvm.experimental.vector.interleave2.v8f16(<4 x half>, <4 x half>)

		declare { <8 x half>, <8 x half> } @llvm.experimental.vector.deinterleave2.v16f16(<16 x half>)
		declare <16 x half> @llvm.experimental.vector.interleave2.v16f16(<8 x half>, <8 x half>)
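
The fixed-width tests added above use `llvm.experimental.vector.deinterleave2` and `llvm.experimental.vector.interleave2` in place of shufflevector masks. As a rough model of the lane movement only, deinterleaving separates even and odd lanes and interleaving is its inverse. The functions below are hypothetical C++ stand-ins operating on `std::vector<int>`; they are not LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Model of deinterleave2: even lanes go to the first result, odd lanes to
// the second (for complex data: real parts and imaginary parts).
std::pair<std::vector<int>, std::vector<int>>
deinterleave2(const std::vector<int> &V) {
  assert(V.size() % 2 == 0 && "need an even number of lanes");
  std::vector<int> Even, Odd;
  for (std::size_t I = 0; I < V.size(); I += 2) {
    Even.push_back(V[I]);
    Odd.push_back(V[I + 1]);
  }
  return {Even, Odd};
}

// Model of interleave2: alternates lanes from A and B.
std::vector<int> interleave2(const std::vector<int> &A,
                             const std::vector<int> &B) {
  assert(A.size() == B.size() && "operands must have the same lane count");
  std::vector<int> R;
  for (std::size_t I = 0; I < A.size(); ++I) {
    R.push_back(A[I]);
    R.push_back(B[I]);
  }
  return R;
}
```

In this model the even/odd split corresponds to the `<0, 2, 4, ...>` and `<1, 3, 5, ...>` shuffle masks used by the older fixed-width tests in the same file.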

llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-mul-scalable.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s --mattr=+sve -o - | FileCheck %s

				target triple = "aarch64-arm-none-eabi"

				; Expected to not transform
				define <vscale x 4 x half> @complex_mul_v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b) {
				; CHECK-LABEL: complex_mul_v4f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: uunpkhi z2.d, z0.s
				; CHECK-NEXT: uunpklo z0.d, z0.s
				; CHECK-NEXT: uunpkhi z3.d, z1.s
				; CHECK-NEXT: uunpklo z1.d, z1.s
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: uzp2 z4.d, z0.d, z2.d
				; CHECK-NEXT: uzp1 z0.d, z0.d, z2.d
				; CHECK-NEXT: uzp2 z2.d, z1.d, z3.d
				; CHECK-NEXT: uzp1 z1.d, z1.d, z3.d
				; CHECK-NEXT: movprfx z3, z2
				; CHECK-NEXT: fmul z3.h, p0/m, z3.h, z0.h
				; CHECK-NEXT: fmla z3.h, p0/m, z1.h, z4.h
				; CHECK-NEXT: fmul z2.h, p0/m, z2.h, z4.h
				; CHECK-NEXT: fnmsb z0.h, p0/m, z1.h, z2.h
				; CHECK-NEXT: zip2 z1.d, z0.d, z3.d
				; CHECK-NEXT: zip1 z0.d, z0.d, z3.d
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 2 x half>, <vscale x 2 x half> } @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half> %a)
				%a.real = extractvalue { <vscale x 2 x half>, <vscale x 2 x half> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 2 x half>, <vscale x 2 x half> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 2 x half>, <vscale x 2 x half> } @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half> %b)
				%b.real = extractvalue { <vscale x 2 x half>, <vscale x 2 x half> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 2 x half>, <vscale x 2 x half> } %b.deinterleaved, 1
				%0 = fmul fast <vscale x 2 x half> %b.imag, %a.real
				%1 = fmul fast <vscale x 2 x half> %b.real, %a.imag
				%2 = fadd fast <vscale x 2 x half> %1, %0
				%3 = fmul fast <vscale x 2 x half> %b.real, %a.real
				%4 = fmul fast <vscale x 2 x half> %a.imag, %b.imag
				%5 = fsub fast <vscale x 2 x half> %3, %4
				%interleaved.vec = tail call <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half> %5, <vscale x 2 x half> %2)
				ret <vscale x 4 x half> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 8 x half> @complex_mul_v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: complex_mul_v8f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z2.h, #0 // =0x0
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: fcmla z2.h, p0/m, z1.h, z0.h, #0
				; CHECK-NEXT: fcmla z2.h, p0/m, z1.h, z0.h, #90
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 4 x half>, <vscale x 4 x half> } @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half> %a)
				%a.real = extractvalue { <vscale x 4 x half>, <vscale x 4 x half> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 4 x half>, <vscale x 4 x half> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 4 x half>, <vscale x 4 x half> } @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half> %b)
				%b.real = extractvalue { <vscale x 4 x half>, <vscale x 4 x half> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 4 x half>, <vscale x 4 x half> } %b.deinterleaved, 1
				%0 = fmul fast <vscale x 4 x half> %b.imag, %a.real
				%1 = fmul fast <vscale x 4 x half> %b.real, %a.imag
				%2 = fadd fast <vscale x 4 x half> %1, %0
				%3 = fmul fast <vscale x 4 x half> %b.real, %a.real
				%4 = fmul fast <vscale x 4 x half> %a.imag, %b.imag
				%5 = fsub fast <vscale x 4 x half> %3, %4
				%interleaved.vec = tail call <vscale x 8 x half> @llvm.experimental.vector.interleave2.nxv8f16(<vscale x 4 x half> %5, <vscale x 4 x half> %2)
				ret <vscale x 8 x half> %interleaved.vec
				}
				; Expected to transform
				define <vscale x 16 x half> @complex_mul_v16f16(<vscale x 16 x half> %a, <vscale x 16 x half> %b) {
				; CHECK-LABEL: complex_mul_v16f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z4.h, #0 // =0x0
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov z5.d, z4.d
				; CHECK-NEXT: fcmla z4.h, p0/m, z3.h, z1.h, #0
				; CHECK-NEXT: fcmla z5.h, p0/m, z2.h, z0.h, #0
				; CHECK-NEXT: fcmla z4.h, p0/m, z3.h, z1.h, #90
				; CHECK-NEXT: fcmla z5.h, p0/m, z2.h, z0.h, #90
				; CHECK-NEXT: mov z1.d, z4.d
				; CHECK-NEXT: mov z0.d, z5.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %a)
				%a.real = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %b)
				%b.real = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 8 x half>, <vscale x 8 x half> } %b.deinterleaved, 1
				%0 = fmul fast <vscale x 8 x half> %b.imag, %a.real
				%1 = fmul fast <vscale x 8 x half> %b.real, %a.imag
				%2 = fadd fast <vscale x 8 x half> %1, %0
				%3 = fmul fast <vscale x 8 x half> %b.real, %a.real
				%4 = fmul fast <vscale x 8 x half> %a.imag, %b.imag
				%5 = fsub fast <vscale x 8 x half> %3, %4
				%interleaved.vec = tail call <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half> %5, <vscale x 8 x half> %2)
				ret <vscale x 16 x half> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 32 x half> @complex_mul_v32f16(<vscale x 32 x half> %a, <vscale x 32 x half> %b) {
				; CHECK-LABEL: complex_mul_v32f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z24.h, #0 // =0x0
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov z25.d, z24.d
				; CHECK-NEXT: mov z26.d, z24.d
				; CHECK-NEXT: mov z27.d, z24.d
				; CHECK-NEXT: fcmla z25.h, p0/m, z4.h, z0.h, #0
				; CHECK-NEXT: fcmla z26.h, p0/m, z5.h, z1.h, #0
				; CHECK-NEXT: fcmla z27.h, p0/m, z6.h, z2.h, #0
				; CHECK-NEXT: fcmla z24.h, p0/m, z7.h, z3.h, #0
				; CHECK-NEXT: fcmla z25.h, p0/m, z4.h, z0.h, #90
				; CHECK-NEXT: fcmla z26.h, p0/m, z5.h, z1.h, #90
				; CHECK-NEXT: fcmla z27.h, p0/m, z6.h, z2.h, #90
				; CHECK-NEXT: fcmla z24.h, p0/m, z7.h, z3.h, #90
				; CHECK-NEXT: mov z0.d, z25.d
				; CHECK-NEXT: mov z1.d, z26.d
				; CHECK-NEXT: mov z2.d, z27.d
				; CHECK-NEXT: mov z3.d, z24.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 16 x half>, <vscale x 16 x half> } @llvm.experimental.vector.deinterleave2.nxv32f16(<vscale x 32 x half> %a)
				%a.real = extractvalue { <vscale x 16 x half>, <vscale x 16 x half> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 16 x half>, <vscale x 16 x half> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 16 x half>, <vscale x 16 x half> } @llvm.experimental.vector.deinterleave2.nxv32f16(<vscale x 32 x half> %b)
				%b.real = extractvalue { <vscale x 16 x half>, <vscale x 16 x half> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 16 x half>, <vscale x 16 x half> } %b.deinterleaved, 1
				%0 = fmul fast <vscale x 16 x half> %b.imag, %a.real
				%1 = fmul fast <vscale x 16 x half> %b.real, %a.imag
				%2 = fadd fast <vscale x 16 x half> %1, %0
				%3 = fmul fast <vscale x 16 x half> %b.real, %a.real
				%4 = fmul fast <vscale x 16 x half> %a.imag, %b.imag
				%5 = fsub fast <vscale x 16 x half> %3, %4
				%interleaved.vec = tail call <vscale x 32 x half> @llvm.experimental.vector.interleave2.nxv32f16(<vscale x 16 x half> %5, <vscale x 16 x half> %2)
				ret <vscale x 32 x half> %interleaved.vec
				}

				declare { <vscale x 2 x half>, <vscale x 2 x half> } @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half>)
				declare <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half>, <vscale x 2 x half>)

				declare { <vscale x 4 x half>, <vscale x 4 x half> } @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half>)
				declare <vscale x 8 x half> @llvm.experimental.vector.interleave2.nxv8f16(<vscale x 4 x half>, <vscale x 4 x half>)

				declare { <vscale x 8 x half>, <vscale x 8 x half> } @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half>)
				declare <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half>, <vscale x 8 x half>)

				declare { <vscale x 16 x half>, <vscale x 16 x half> } @llvm.experimental.vector.deinterleave2.nxv32f16(<vscale x 32 x half>)
				declare <vscale x 32 x half> @llvm.experimental.vector.interleave2.nxv32f16(<vscale x 16 x half>, <vscale x 16 x half>)
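
The multiplication tests above all use the same IR body: each operand is deinterleaved into real and imaginary lanes, four partial products are combined with one `fadd` and one `fsub`, and the results are re-interleaved. In the CHECK lines, the nxv8f16 and wider cases become a pair of `fcmla` instructions with rotations #0 and #90, while complex_mul_v4f16 shows an unpack/`fmul`/`fmla` sequence instead. Below is a minimal Python sketch of the lane-level arithmetic only. It assumes plain lists stand in for the vectors, and the helper names are hypothetical, not LLVM APIs.

```python
def deinterleave2(v):
    """Split [re0, im0, re1, im1, ...] into (even lanes, odd lanes)."""
    return v[0::2], v[1::2]


def interleave2(even, odd):
    """Inverse of deinterleave2: zip two lane lists back together."""
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out


def complex_mul(a, b):
    """Mirror the IR pattern %0..%5 from the tests (illustrative only)."""
    a_real, a_imag = deinterleave2(a)
    b_real, b_imag = deinterleave2(b)
    # %2 = b.real * a.imag + b.imag * a.real  (imaginary part)
    imag = [br * ai + bi * ar
            for ar, ai, br, bi in zip(a_real, a_imag, b_real, b_imag)]
    # %5 = b.real * a.real - a.imag * b.imag  (real part)
    real = [br * ar - ai * bi
            for ar, ai, br, bi in zip(a_real, a_imag, b_real, b_imag)]
    return interleave2(real, imag)


a = [1.0, 2.0, 3.0, 4.0]   # (1+2j), (3+4j)
b = [5.0, 6.0, 7.0, 8.0]   # (5+6j), (7+8j)
result = complex_mul(a, b)
```

With these inputs the result agrees with Python's built-in complex product, `(1+2j)*(5+6j)` and `(3+4j)*(7+8j)`, laid out as interleaved real/imaginary lanes.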

llvm/test/CodeGen/AArch64/complex-deinterleaving-f32-add-scalable.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s --mattr=+sve -o - | FileCheck %s

				target triple = "aarch64-arm-none-eabi"

				; Expected to transform
				define <vscale x 4 x float> @complex_add_v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: complex_add_v4f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: fcadd z1.s, p0/m, z1.s, z0.s, #90
				; CHECK-NEXT: mov z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 2 x float>, <vscale x 2 x float> } @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float> %a)
				%a.real = extractvalue { <vscale x 2 x float>, <vscale x 2 x float> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 2 x float>, <vscale x 2 x float> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 2 x float>, <vscale x 2 x float> } @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float> %b)
				%b.real = extractvalue { <vscale x 2 x float>, <vscale x 2 x float> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 2 x float>, <vscale x 2 x float> } %b.deinterleaved, 1
				%0 = fsub fast <vscale x 2 x float> %b.real, %a.imag
				%1 = fadd fast <vscale x 2 x float> %b.imag, %a.real
				%interleaved.vec = tail call <vscale x 4 x float> @llvm.experimental.vector.interleave2.nxv4f32(<vscale x 2 x float> %0, <vscale x 2 x float> %1)
				ret <vscale x 4 x float> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 8 x float> @complex_add_v8f32(<vscale x 8 x float> %a, <vscale x 8 x float> %b) {
				; CHECK-LABEL: complex_add_v8f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: fcadd z2.s, p0/m, z2.s, z0.s, #90
				; CHECK-NEXT: fcadd z3.s, p0/m, z3.s, z1.s, #90
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: mov z1.d, z3.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float> %a)
				%a.real = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float> %b)
				%b.real = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } %b.deinterleaved, 1
				%0 = fsub fast <vscale x 4 x float> %b.real, %a.imag
				%1 = fadd fast <vscale x 4 x float> %b.imag, %a.real
				%interleaved.vec = tail call <vscale x 8 x float> @llvm.experimental.vector.interleave2.nxv8f32(<vscale x 4 x float> %0, <vscale x 4 x float> %1)
				ret <vscale x 8 x float> %interleaved.vec
				}
				; Expected to transform
				define <vscale x 16 x float> @complex_add_v16f32(<vscale x 16 x float> %a, <vscale x 16 x float> %b) {
				; CHECK-LABEL: complex_add_v16f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: fcadd z6.s, p0/m, z6.s, z2.s, #90
				; CHECK-NEXT: fcadd z4.s, p0/m, z4.s, z0.s, #90
				; CHECK-NEXT: fcadd z5.s, p0/m, z5.s, z1.s, #90
				; CHECK-NEXT: fcadd z7.s, p0/m, z7.s, z3.s, #90
				; CHECK-NEXT: mov z0.d, z4.d
				; CHECK-NEXT: mov z1.d, z5.d
				; CHECK-NEXT: mov z2.d, z6.d
				; CHECK-NEXT: mov z3.d, z7.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 8 x float>, <vscale x 8 x float> } @llvm.experimental.vector.deinterleave2.nxv16f32(<vscale x 16 x float> %a)
				%a.real = extractvalue { <vscale x 8 x float>, <vscale x 8 x float> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 8 x float>, <vscale x 8 x float> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 8 x float>, <vscale x 8 x float> } @llvm.experimental.vector.deinterleave2.nxv16f32(<vscale x 16 x float> %b)
				%b.real = extractvalue { <vscale x 8 x float>, <vscale x 8 x float> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 8 x float>, <vscale x 8 x float> } %b.deinterleaved, 1
				%0 = fsub fast <vscale x 8 x float> %b.real, %a.imag
				%1 = fadd fast <vscale x 8 x float> %b.imag, %a.real
				%interleaved.vec = tail call <vscale x 16 x float> @llvm.experimental.vector.interleave2.nxv16f32(<vscale x 8 x float> %0, <vscale x 8 x float> %1)
				ret <vscale x 16 x float> %interleaved.vec
				}

				declare { <vscale x 2 x float>, <vscale x 2 x float> } @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float>)
				declare <vscale x 4 x float> @llvm.experimental.vector.interleave2.nxv4f32(<vscale x 2 x float>, <vscale x 2 x float>)

				declare { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float>)
				declare <vscale x 8 x float> @llvm.experimental.vector.interleave2.nxv8f32(<vscale x 4 x float>, <vscale x 4 x float>)

				declare { <vscale x 8 x float>, <vscale x 8 x float> } @llvm.experimental.vector.deinterleave2.nxv16f32(<vscale x 16 x float>)
				declare <vscale x 16 x float> @llvm.experimental.vector.interleave2.nxv16f32(<vscale x 8 x float>, <vscale x 8 x float>)
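
The addition tests above compute `b.real - a.imag` for the real lanes and `b.imag + a.real` for the imaginary lanes, which is `b + i*a`: `a` is rotated by 90 degrees and then added to `b`. The CHECK lines show this becoming a single `fcadd ... #90` per register. Below is a minimal Python sketch of that arithmetic. It assumes plain lists stand in for the vectors, and the helper names are hypothetical, not LLVM APIs.

```python
def deinterleave2(v):
    """Split [re0, im0, re1, im1, ...] into (even lanes, odd lanes)."""
    return v[0::2], v[1::2]


def interleave2(even, odd):
    """Inverse of deinterleave2: zip two lane lists back together."""
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out


def complex_add_rot90(a, b):
    """Mirror the IR pattern %0/%1 from the tests (illustrative only)."""
    a_real, a_imag = deinterleave2(a)
    b_real, b_imag = deinterleave2(b)
    real = [br - ai for br, ai in zip(b_real, a_imag)]  # %0 = fsub b.real, a.imag
    imag = [bi + ar for bi, ar in zip(b_imag, a_real)]  # %1 = fadd b.imag, a.real
    return interleave2(real, imag)


a = [1.0, 2.0, 3.0, 4.0]       # (1+2j), (3+4j)
b = [10.0, 20.0, 30.0, 40.0]   # (10+20j), (30+40j)
result = complex_add_rot90(a, b)
```

For the first element this is `(10+20j) + 1j*(1+2j) = 8+21j`, which matches the first two lanes of `result`.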

llvm/test/CodeGen/AArch64/complex-deinterleaving-f32-mul-scalable.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s --mattr=+sve -o - | FileCheck %s

				target triple = "aarch64-arm-none-eabi"

				; Expected to transform
				define <vscale x 4 x float> @complex_mul_v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: complex_mul_v4f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z2.s, #0 // =0x0
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: fcmla z2.s, p0/m, z1.s, z0.s, #0
				; CHECK-NEXT: fcmla z2.s, p0/m, z1.s, z0.s, #90
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 2 x float>, <vscale x 2 x float> } @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float> %a)
				%a.real = extractvalue { <vscale x 2 x float>, <vscale x 2 x float> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 2 x float>, <vscale x 2 x float> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 2 x float>, <vscale x 2 x float> } @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float> %b)
				%b.real = extractvalue { <vscale x 2 x float>, <vscale x 2 x float> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 2 x float>, <vscale x 2 x float> } %b.deinterleaved, 1
				%0 = fmul fast <vscale x 2 x float> %b.imag, %a.real
				%1 = fmul fast <vscale x 2 x float> %b.real, %a.imag
				%2 = fadd fast <vscale x 2 x float> %1, %0
				%3 = fmul fast <vscale x 2 x float> %b.real, %a.real
				%4 = fmul fast <vscale x 2 x float> %a.imag, %b.imag
				%5 = fsub fast <vscale x 2 x float> %3, %4
				%interleaved.vec = tail call <vscale x 4 x float> @llvm.experimental.vector.interleave2.nxv4f32(<vscale x 2 x float> %5, <vscale x 2 x float> %2)
				ret <vscale x 4 x float> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 8 x float> @complex_mul_v8f32(<vscale x 8 x float> %a, <vscale x 8 x float> %b) {
				; CHECK-LABEL: complex_mul_v8f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z4.s, #0 // =0x0
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z5.d, z4.d
				; CHECK-NEXT: fcmla z4.s, p0/m, z3.s, z1.s, #0
				; CHECK-NEXT: fcmla z5.s, p0/m, z2.s, z0.s, #0
				; CHECK-NEXT: fcmla z4.s, p0/m, z3.s, z1.s, #90
				; CHECK-NEXT: fcmla z5.s, p0/m, z2.s, z0.s, #90
				; CHECK-NEXT: mov z1.d, z4.d
				; CHECK-NEXT: mov z0.d, z5.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float> %a)
				%a.real = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float> %b)
				%b.real = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 4 x float>, <vscale x 4 x float> } %b.deinterleaved, 1
				%0 = fmul fast <vscale x 4 x float> %b.imag, %a.real
				%1 = fmul fast <vscale x 4 x float> %b.real, %a.imag
				%2 = fadd fast <vscale x 4 x float> %1, %0
				%3 = fmul fast <vscale x 4 x float> %b.real, %a.real
				%4 = fmul fast <vscale x 4 x float> %a.imag, %b.imag
				%5 = fsub fast <vscale x 4 x float> %3, %4
				%interleaved.vec = tail call <vscale x 8 x float> @llvm.experimental.vector.interleave2.nxv8f32(<vscale x 4 x float> %5, <vscale x 4 x float> %2)
				ret <vscale x 8 x float> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 16 x float> @complex_mul_v16f32(<vscale x 16 x float> %a, <vscale x 16 x float> %b) {
				; CHECK-LABEL: complex_mul_v16f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z24.s, #0 // =0x0
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z25.d, z24.d
				; CHECK-NEXT: mov z26.d, z24.d
				; CHECK-NEXT: mov z27.d, z24.d
				; CHECK-NEXT: fcmla z25.s, p0/m, z4.s, z0.s, #0
				; CHECK-NEXT: fcmla z26.s, p0/m, z5.s, z1.s, #0
				; CHECK-NEXT: fcmla z27.s, p0/m, z6.s, z2.s, #0
				; CHECK-NEXT: fcmla z24.s, p0/m, z7.s, z3.s, #0
				; CHECK-NEXT: fcmla z25.s, p0/m, z4.s, z0.s, #90
				; CHECK-NEXT: fcmla z26.s, p0/m, z5.s, z1.s, #90
				; CHECK-NEXT: fcmla z27.s, p0/m, z6.s, z2.s, #90
				; CHECK-NEXT: fcmla z24.s, p0/m, z7.s, z3.s, #90
				; CHECK-NEXT: mov z0.d, z25.d
				; CHECK-NEXT: mov z1.d, z26.d
				; CHECK-NEXT: mov z2.d, z27.d
				; CHECK-NEXT: mov z3.d, z24.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 8 x float>, <vscale x 8 x float> } @llvm.experimental.vector.deinterleave2.nxv16f32(<vscale x 16 x float> %a)
				%a.real = extractvalue { <vscale x 8 x float>, <vscale x 8 x float> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 8 x float>, <vscale x 8 x float> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 8 x float>, <vscale x 8 x float> } @llvm.experimental.vector.deinterleave2.nxv16f32(<vscale x 16 x float> %b)
				%b.real = extractvalue { <vscale x 8 x float>, <vscale x 8 x float> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 8 x float>, <vscale x 8 x float> } %b.deinterleaved, 1
				%0 = fmul fast <vscale x 8 x float> %b.imag, %a.real
				%1 = fmul fast <vscale x 8 x float> %b.real, %a.imag
				%2 = fadd fast <vscale x 8 x float> %1, %0
				%3 = fmul fast <vscale x 8 x float> %b.real, %a.real
				%4 = fmul fast <vscale x 8 x float> %a.imag, %b.imag
				%5 = fsub fast <vscale x 8 x float> %3, %4
				%interleaved.vec = tail call <vscale x 16 x float> @llvm.experimental.vector.interleave2.nxv16f32(<vscale x 8 x float> %5, <vscale x 8 x float> %2)
				ret <vscale x 16 x float> %interleaved.vec
				}

				declare { <vscale x 2 x float>, <vscale x 2 x float> } @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float>)
				declare <vscale x 4 x float> @llvm.experimental.vector.interleave2.nxv4f32(<vscale x 2 x float>, <vscale x 2 x float>)

				declare { <vscale x 4 x float>, <vscale x 4 x float> } @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float>)
				declare <vscale x 8 x float> @llvm.experimental.vector.interleave2.nxv8f32(<vscale x 4 x float>, <vscale x 4 x float>)

				declare { <vscale x 8 x float>, <vscale x 8 x float> } @llvm.experimental.vector.deinterleave2.nxv16f32(<vscale x 16 x float>)
				declare <vscale x 16 x float> @llvm.experimental.vector.interleave2.nxv16f32(<vscale x 8 x float>, <vscale x 8 x float>)

llvm/test/CodeGen/AArch64/complex-deinterleaving-f64-add-scalable.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s --mattr=+sve -o - | FileCheck %s

				target triple = "aarch64-arm-none-eabi"

				; Expected to transform
				define <vscale x 2 x double> @complex_add_v2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: complex_add_v2f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: fcadd z1.d, p0/m, z1.d, z0.d, #90
				; CHECK-NEXT: mov z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 1 x double>, <vscale x 1 x double> } @llvm.experimental.vector.deinterleave2.nxv2f64(<vscale x 2 x double> %a)
				%a.real = extractvalue { <vscale x 1 x double>, <vscale x 1 x double> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 1 x double>, <vscale x 1 x double> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 1 x double>, <vscale x 1 x double> } @llvm.experimental.vector.deinterleave2.nxv2f64(<vscale x 2 x double> %b)
				%b.real = extractvalue { <vscale x 1 x double>, <vscale x 1 x double> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 1 x double>, <vscale x 1 x double> } %b.deinterleaved, 1
				%0 = fsub fast <vscale x 1 x double> %b.real, %a.imag
				%1 = fadd fast <vscale x 1 x double> %b.imag, %a.real
				%interleaved.vec = tail call <vscale x 2 x double> @llvm.experimental.vector.interleave2.nxv2f64(<vscale x 1 x double> %0, <vscale x 1 x double> %1)
				ret <vscale x 2 x double> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 4 x double> @complex_add_v4f64(<vscale x 4 x double> %a, <vscale x 4 x double> %b) {
				; CHECK-LABEL: complex_add_v4f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: fcadd z2.d, p0/m, z2.d, z0.d, #90
				; CHECK-NEXT: fcadd z3.d, p0/m, z3.d, z1.d, #90
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: mov z1.d, z3.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %a)
				%a.real = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %b)
				%b.real = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %b.deinterleaved, 1
				%0 = fsub fast <vscale x 2 x double> %b.real, %a.imag
				%1 = fadd fast <vscale x 2 x double> %b.imag, %a.real
				%interleaved.vec = tail call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %0, <vscale x 2 x double> %1)
				ret <vscale x 4 x double> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 8 x double> @complex_add_v8f64(<vscale x 8 x double> %a, <vscale x 8 x double> %b) {
				; CHECK-LABEL: complex_add_v8f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: fcadd z6.d, p0/m, z6.d, z2.d, #90
				; CHECK-NEXT: fcadd z4.d, p0/m, z4.d, z0.d, #90
				; CHECK-NEXT: fcadd z5.d, p0/m, z5.d, z1.d, #90
				; CHECK-NEXT: fcadd z7.d, p0/m, z7.d, z3.d, #90
				; CHECK-NEXT: mov z0.d, z4.d
				; CHECK-NEXT: mov z1.d, z5.d
				; CHECK-NEXT: mov z2.d, z6.d
				; CHECK-NEXT: mov z3.d, z7.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 4 x double>, <vscale x 4 x double> } @llvm.experimental.vector.deinterleave2.nxv8f64(<vscale x 8 x double> %a)
				%a.real = extractvalue { <vscale x 4 x double>, <vscale x 4 x double> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 4 x double>, <vscale x 4 x double> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 4 x double>, <vscale x 4 x double> } @llvm.experimental.vector.deinterleave2.nxv8f64(<vscale x 8 x double> %b)
				%b.real = extractvalue { <vscale x 4 x double>, <vscale x 4 x double> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 4 x double>, <vscale x 4 x double> } %b.deinterleaved, 1
				%0 = fsub fast <vscale x 4 x double> %b.real, %a.imag
				%1 = fadd fast <vscale x 4 x double> %b.imag, %a.real
				%interleaved.vec = tail call <vscale x 8 x double> @llvm.experimental.vector.interleave2.nxv8f64(<vscale x 4 x double> %0, <vscale x 4 x double> %1)
				ret <vscale x 8 x double> %interleaved.vec
				}

				declare { <vscale x 1 x double>, <vscale x 1 x double> } @llvm.experimental.vector.deinterleave2.nxv2f64(<vscale x 2 x double>)
				declare <vscale x 2 x double> @llvm.experimental.vector.interleave2.nxv2f64(<vscale x 1 x double>, <vscale x 1 x double>)

				declare { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double>)
				declare <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double>, <vscale x 2 x double>)

				declare { <vscale x 4 x double>, <vscale x 4 x double> } @llvm.experimental.vector.deinterleave2.nxv8f64(<vscale x 8 x double>)
				declare <vscale x 8 x double> @llvm.experimental.vector.interleave2.nxv8f64(<vscale x 4 x double>, <vscale x 4 x double>)

llvm/test/CodeGen/AArch64/complex-deinterleaving-f64-mul-scalable.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s --mattr=+sve -o - | FileCheck %s

				target triple = "aarch64-arm-none-eabi"

				; Expected to transform
				define <vscale x 2 x double> @complex_mul_v2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: complex_mul_v2f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z2.d, #0 // =0x0
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: fcmla z2.d, p0/m, z1.d, z0.d, #0
				; CHECK-NEXT: fcmla z2.d, p0/m, z1.d, z0.d, #90
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 1 x double>, <vscale x 1 x double> } @llvm.experimental.vector.deinterleave2.nxv2f64(<vscale x 2 x double> %a)
				%a.real = extractvalue { <vscale x 1 x double>, <vscale x 1 x double> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 1 x double>, <vscale x 1 x double> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 1 x double>, <vscale x 1 x double> } @llvm.experimental.vector.deinterleave2.nxv2f64(<vscale x 2 x double> %b)
				%b.real = extractvalue { <vscale x 1 x double>, <vscale x 1 x double> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 1 x double>, <vscale x 1 x double> } %b.deinterleaved, 1
				%0 = fmul fast <vscale x 1 x double> %b.imag, %a.real
				%1 = fmul fast <vscale x 1 x double> %b.real, %a.imag
				%2 = fadd fast <vscale x 1 x double> %1, %0
				%3 = fmul fast <vscale x 1 x double> %b.real, %a.real
				%4 = fmul fast <vscale x 1 x double> %a.imag, %b.imag
				%5 = fsub fast <vscale x 1 x double> %3, %4
				%interleaved.vec = tail call <vscale x 2 x double> @llvm.experimental.vector.interleave2.nxv2f64(<vscale x 1 x double> %5, <vscale x 1 x double> %2)
				ret <vscale x 2 x double> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 4 x double> @complex_mul_v4f64(<vscale x 4 x double> %a, <vscale x 4 x double> %b) {
				; CHECK-LABEL: complex_mul_v4f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z4.d, #0 // =0x0
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z5.d, z4.d
				; CHECK-NEXT: fcmla z4.d, p0/m, z3.d, z1.d, #0
				; CHECK-NEXT: fcmla z5.d, p0/m, z2.d, z0.d, #0
				; CHECK-NEXT: fcmla z4.d, p0/m, z3.d, z1.d, #90
				; CHECK-NEXT: fcmla z5.d, p0/m, z2.d, z0.d, #90
				; CHECK-NEXT: mov z1.d, z4.d
				; CHECK-NEXT: mov z0.d, z5.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %a)
				%a.real = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %b)
				%b.real = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %b.deinterleaved, 1
				%0 = fmul fast <vscale x 2 x double> %b.imag, %a.real
				%1 = fmul fast <vscale x 2 x double> %b.real, %a.imag
				%2 = fadd fast <vscale x 2 x double> %1, %0
				%3 = fmul fast <vscale x 2 x double> %b.real, %a.real
				%4 = fmul fast <vscale x 2 x double> %a.imag, %b.imag
				%5 = fsub fast <vscale x 2 x double> %3, %4
				%interleaved.vec = tail call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %5, <vscale x 2 x double> %2)
				ret <vscale x 4 x double> %interleaved.vec
				}

				; Expected to transform
				define <vscale x 8 x double> @complex_mul_v8f64(<vscale x 8 x double> %a, <vscale x 8 x double> %b) {
				; CHECK-LABEL: complex_mul_v8f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z24.d, #0 // =0x0
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z25.d, z24.d
				; CHECK-NEXT: mov z26.d, z24.d
				; CHECK-NEXT: mov z27.d, z24.d
				; CHECK-NEXT: fcmla z25.d, p0/m, z4.d, z0.d, #0
				; CHECK-NEXT: fcmla z26.d, p0/m, z5.d, z1.d, #0
				; CHECK-NEXT: fcmla z27.d, p0/m, z6.d, z2.d, #0
				; CHECK-NEXT: fcmla z24.d, p0/m, z7.d, z3.d, #0
				; CHECK-NEXT: fcmla z25.d, p0/m, z4.d, z0.d, #90
				; CHECK-NEXT: fcmla z26.d, p0/m, z5.d, z1.d, #90
				; CHECK-NEXT: fcmla z27.d, p0/m, z6.d, z2.d, #90
				; CHECK-NEXT: fcmla z24.d, p0/m, z7.d, z3.d, #90
				; CHECK-NEXT: mov z0.d, z25.d
				; CHECK-NEXT: mov z1.d, z26.d
				; CHECK-NEXT: mov z2.d, z27.d
				; CHECK-NEXT: mov z3.d, z24.d
				; CHECK-NEXT: ret
				entry:
				%a.deinterleaved = tail call { <vscale x 4 x double>, <vscale x 4 x double> } @llvm.experimental.vector.deinterleave2.nxv8f64(<vscale x 8 x double> %a)
				%a.real = extractvalue { <vscale x 4 x double>, <vscale x 4 x double> } %a.deinterleaved, 0
				%a.imag = extractvalue { <vscale x 4 x double>, <vscale x 4 x double> } %a.deinterleaved, 1
				%b.deinterleaved = tail call { <vscale x 4 x double>, <vscale x 4 x double> } @llvm.experimental.vector.deinterleave2.nxv8f64(<vscale x 8 x double> %b)
				%b.real = extractvalue { <vscale x 4 x double>, <vscale x 4 x double> } %b.deinterleaved, 0
				%b.imag = extractvalue { <vscale x 4 x double>, <vscale x 4 x double> } %b.deinterleaved, 1
				%0 = fmul fast <vscale x 4 x double> %b.imag, %a.real
				%1 = fmul fast <vscale x 4 x double> %b.real, %a.imag
				%2 = fadd fast <vscale x 4 x double> %1, %0
				%3 = fmul fast <vscale x 4 x double> %b.real, %a.real
				%4 = fmul fast <vscale x 4 x double> %a.imag, %b.imag
				%5 = fsub fast <vscale x 4 x double> %3, %4
				%interleaved.vec = tail call <vscale x 8 x double> @llvm.experimental.vector.interleave2.nxv8f64(<vscale x 4 x double> %5, <vscale x 4 x double> %2)
				ret <vscale x 8 x double> %interleaved.vec
				}

				declare { <vscale x 1 x double>, <vscale x 1 x double> } @llvm.experimental.vector.deinterleave2.nxv2f64(<vscale x 2 x double>)
				declare <vscale x 2 x double> @llvm.experimental.vector.interleave2.nxv2f64(<vscale x 1 x double>, <vscale x 1 x double>)

				declare { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double>)
				declare <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double>, <vscale x 2 x double>)

				declare { <vscale x 4 x double>, <vscale x 4 x double> } @llvm.experimental.vector.deinterleave2.nxv8f64(<vscale x 8 x double>)
				declare <vscale x 8 x double> @llvm.experimental.vector.interleave2.nxv8f64(<vscale x 4 x double>, <vscale x 4 x double>)
