This is an archive of the discontinued LLVM Phabricator instance.

Differential D148558

[CodeGen] Improve handling -Ofast generated code by ComplexDeinterleaving pass
ClosedPublic

Authored by igor.kirillov on Apr 17 2023, 11:58 AM.

Details

Reviewers

dmgreen
NickGuy
mgabka
huntergr

Commits

rG1a1e76100e3f: [CodeGen] Improve handling -Ofast generated code by ComplexDeinterleaving pass

Summary

Code generated with -Ofast and -O3 -ffp-contract=fast (add
-ffinite-math-only to enable vectorization) can differ significantly.
Code compiled with -O3 can be deinterleaved using patterns as the
instruction order is preserved. However, with the -Ofast flag, there
can be multiple changes in the computation sequence, and even the real
and imaginary parts may not be calculated in parallel.
For more details, refer to
llvm/test/CodeGen/AArch64/complex-deinterleaving-*-fast.ll and
llvm/test/CodeGen/AArch64/complex-deinterleaving-*-contract.ll tests.
This patch implements a more general approach and enables handling most
-Ofast cases.
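
The more general approach is described in the patch's own comments as first gathering all addends and multiplicands and then constructing a complex expression from them. A minimal sketch of that gathering step follows, assuming a toy expression tree in place of LLVM IR. `Expr`, `Term`, `leaf`, `node` and `collect` are hypothetical names introduced for illustration only; the real pass additionally tracks visited values, multi-use sub-expressions and fast-math flags.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression tree standing in for LLVM IR (hypothetical, not LLVM API).
struct Expr {
  enum Kind { Leaf, Add, Sub, Mul, Neg } K;
  std::string Name;           // used by Leaf only
  std::shared_ptr<Expr> L, R; // R is unused by Neg and Leaf
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(std::string N) {
  return std::make_shared<Expr>(Expr{Expr::Leaf, std::move(N), nullptr, nullptr});
}
ExprPtr node(Expr::Kind K, ExprPtr L, ExprPtr R = nullptr) {
  return std::make_shared<Expr>(Expr{K, "", std::move(L), std::move(R)});
}

struct Term {
  std::string A, B; // B is empty for a plain addend
  bool IsPositive;
};

// Flatten a tree of Add/Sub/Neg into signed products and signed addends.
// Returns false when a product operand is not a (possibly negated) leaf.
bool collect(const ExprPtr &Root, std::vector<Term> &Muls,
             std::vector<Term> &Addends) {
  assert(Root != nullptr);
  std::vector<std::pair<const Expr *, bool>> Worklist = {{Root.get(), true}};
  while (!Worklist.empty()) {
    auto [E, IsPositive] = Worklist.back();
    Worklist.pop_back();
    switch (E->K) {
    case Expr::Add:
      Worklist.push_back({E->R.get(), IsPositive});
      Worklist.push_back({E->L.get(), IsPositive});
      break;
    case Expr::Sub: // the right-hand side flips sign
      Worklist.push_back({E->R.get(), !IsPositive});
      Worklist.push_back({E->L.get(), IsPositive});
      break;
    case Expr::Neg:
      Worklist.push_back({E->L.get(), !IsPositive});
      break;
    case Expr::Mul: {
      const Expr *A = E->L.get();
      const Expr *B = E->R.get();
      if (A->K == Expr::Neg) { A = A->L.get(); IsPositive = !IsPositive; }
      if (B->K == Expr::Neg) { B = B->L.get(); IsPositive = !IsPositive; }
      if (A->K != Expr::Leaf || B->K != Expr::Leaf)
        return false;
      Muls.push_back({A->Name, B->Name, IsPositive});
      break;
    }
    case Expr::Leaf:
      Addends.push_back({E->Name, "", IsPositive});
      break;
    }
  }
  return true;
}
```

Because only the sign of each term is recorded, the result does not depend on how the additions and subtractions were reassociated, which is why this shape suits -Ofast output.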

Depends on D148703

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

igor.kirillov created this revision. Apr 17 2023, 11:58 AM

Herald added a project: Restricted Project. Apr 17 2023, 11:58 AM

Herald added subscribers: hiraditya, kristof.beyls.

igor.kirillov requested review of this revision. Apr 17 2023, 11:58 AM

Herald added a project: Restricted Project. Apr 17 2023, 11:58 AM

Herald added subscribers: llvm-commits, pcwang-thead.

I would like to bring up some questions:

What is your opinion on the approach taken to handle -Ofast? While it may seem complicated, it attempts to address a broad range of scenarios, and I'm not sure how it could be simplified.
Due to the lack of a direct link between ComplexDeinterleavingNode and IR instructions when using -Ofast, I decided to use IRBuilder to create all new instructions in an ordered way. This code can be extracted into a separate patch.
Do we need additional tests? I have already included tests that demonstrate the difference between -Ofast and -O3 -ffp-contract=fast -ffinite-math-only in D148550. However, we now have some redundant tests in other files.

igor.kirillov added reviewers: dmgreen, NickGuy, mgabka, huntergr. Apr 17 2023, 12:11 PM

Harbormaster completed remote builds in B226179: Diff 514349. Apr 17 2023, 1:46 PM

Due to the lack of a direct link between ComplexDeinterleavingNode and IR instructions when using -Ofast, I decided to use IRBuilder to create all new instructions in an ordered way. This code can be extracted into a separate patch.

A separate patch would be ideal. There's a lot of changes, and it would help to include only what is related to the -Ofast handling here.

igor.kirillov mentioned this in D148703: [CodeGen] Refactor IR generation functions to use IRBuilder in ComplexDeinterleaving pass. Apr 19 2023, 3:27 AM

Re-adjust on top of D148703 and D148550. Add a complicated test to show that a sophisticated common sub-expression is processed well.

Harbormaster completed remote builds in B226597: Diff 514945. Apr 19 2023, 6:52 AM

A separate patch would be ideal. There's a lot of changes, and it would help to include only what is related to the -Ofast handling here.

Done. We now have a pre-commit patch (D148550) and a node replacement refactoring (D148703).

Ping

Can you precommit the test changes in complex-deinterleaving-multiuses.ll?

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
902	Nit: Don't think I've ever seen this way to negate a bool. It certainly works, but I'd argue that `!IsPositive` is a more conventional/readable way
970	One of the original design goals was to not assume full multiplications, instead representing them as 2 partial multiplies. Is that something that is possible to preserve while still accounting for `reassoc` and `-Ofast`?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24710 ↗	(On Diff #514945)	Do we want to be discarding the fastmath flags here?
llvm/lib/Target/AArch64/AArch64ISelLowering.h
23–24 ↗	(On Diff #514945)	Is this still necessary?
llvm/lib/Target/ARM/ARMISelLowering.cpp
22105 ↗	(On Diff #514945)	Do we want to be discarding the fastmath flags here?

Address some of the recent comments

Harbormaster completed remote builds in B228828: Diff 517924. Apr 28 2023, 7:52 AM

igor.kirillov added inline comments. Apr 28 2023, 8:25 AM

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
970	Consider the case of complex number multiplication, such as (x, y) * (u, v). From each partial multiplication, we can only obtain three out of the four values (x, y, u, v). Consequently, it appears that performing a single partial multiplication without considering the second part is impossible. Also, it seems that the current code for handling contract flag cases is capable of extracting only full multiplications, despite the detection process being divided into two methods: identifyPartialMul and identifyNodeWithImplicitAdd. Long story short, I don't know how to separate multiplications from each other.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24710 ↗	(On Diff #514945)	Yeah, you're right. Even though the code works fine without the fastmath flags and they might not be important at this point in the optimization pipeline, it's probably better to keep them. But it's not that simple. First of all, we don't know which instruction to get the flags from when dealing with -Ofast code. My idea is to use the Collect lambda function from identifyReassocNodes and AND all the flags together and store them in the ComplexDeinterleavingNode. The second issue is that currently, replaceSymmetricNode only uses flags from the Real Instruction. But what if the flags for the Imag Instruction are different? The third thing to consider is whether we should add flags to the architecture-specific intrinsics generated by createComplexDeinterleavingIR. I think the best way to handle all this is to add a Flags member to ComplexDeinterleavingNode and then apply those flags when using replaceNode. What do you think?
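
The idea of AND-ing the flags of all collected instructions, and of checking that the Real and Imag sides agree, can be sketched as follows. This is only an illustration under stated assumptions: the flags are modelled as a plain bitmask rather than `llvm::FastMathFlags`, and the constants and the `intersectFlags` and `flagsForNode` helpers are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for fast-math flags: one bit per flag.
constexpr unsigned Reassoc = 1u << 0;
constexpr unsigned NoNaNs = 1u << 1;
constexpr unsigned NoInfs = 1u << 2;
constexpr unsigned Contract = 1u << 3;

// Keep only the flags that every contributing instruction carries.
unsigned intersectFlags(const std::vector<unsigned> &InstFlags) {
  assert(!InstFlags.empty() && "need at least one instruction");
  unsigned Common = ~0u;
  for (unsigned F : InstFlags)
    Common &= F;
  return Common;
}

// Conservative alternative: refuse to build a node when the two halves
// disagree, otherwise reuse the shared flags.
std::optional<unsigned> flagsForNode(unsigned RealFlags, unsigned ImagFlags) {
  if (RealFlags != ImagFlags)
    return std::nullopt;
  return RealFlags;
}
```

Intersecting is the safe direction because a replacement instruction may only claim a relaxation that every instruction it replaces already allowed.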

Matt added a subscriber: Matt. Apr 28 2023, 3:17 PM

NickGuy added inline comments. May 3 2023, 3:39 AM

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
970	Due to how complex multiplication works, the second partial multiply will always succeed the first; however, the first does not always indicate the presence of a second. While not currently implemented, the design does allow for the partial multiplies to be decoupled, theoretically matching cases like `o = a * b.real()`. The reason why it wasn't implemented initially is down to the arrangement of the shuffles (see below). Given this, I'm happy to defer the implementation for now, but as this patch aims to resolve that, I'd like to revisit the idea once this has landed :)
  %strided.vec26 = shufflevector <8 x float> %wide.vec25, <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %14 = fmul fast <8 x float> %wide.vec25, %wide.vec
  %15 = shufflevector <8 x float> %14, <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %16 = fmul fast <4 x float> %strided.vec26, %strided.vec24
  %interleaved.vec = shufflevector <4 x float> %15, <4 x float> %16, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24710 ↗	(On Diff #514945)
> My idea is to use the Collect lambda function from identifyReassocNodes and AND all the flags together and store them in the ComplexDeinterleavingNode.
ANDing them feels like the most sensible to me, either that or ensuring they're all equal during identification.
> I think the best way to handle all this is to add a Flags member to ComplexDeinterleavingNode and then apply those flags when using replaceNode.
That makes sense to me, but I'd be wary of how to apply the flags themselves. We'd either need to change the target hook to provide the flags, or have some way for a target to specify which instructions get which flags. If we do decide to apply the flags on the common side, there's also the problem of applying them to all relevant instructions, we only get the one returned from `createComplexDeinterleavingIR`, but it could create an entire subgraph of instructions if the target deems it necessary (try the existing implementation with an unrealistic <32 x double> for an example of this).
> The second issue is that currently, replaceSymmetricNode only uses flags from the Real Instruction. But what if the flags for the Imag Instruction are different?
That's a fair point. I'd err on the side of caution, and only perform the replacement if the flags are consistent on both the Real and Imag sides.
> The third thing to consider is whether we should add flags to the architecture-specific intrinsics generated by createComplexDeinterleavingIR.
That is something I don't feel qualified to answer :D My instinct says that it wouldn't be a problem, as nothing would be checking the flags on intrinsics, but can't say for sure.
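
To make the partial multiplies from the comments on line 970 concrete, here is a small sketch of how two of them compose one full complex multiplication. It follows the rotation table in the diff's `identifyMultiplications` comment for (X + iY) * (U + iV). `partialMul` and `fullMul` are hypothetical helpers using `std::complex`, not the target intrinsics the pass actually emits.

```cpp
#include <cassert>
#include <complex>

using C = std::complex<double>;

// Accumulate one partial product of A = X + iY and B = U + iV.
// Each rotation reads only one component of A, as in the rotation table.
C partialMul(C Acc, C A, C B, int Rot) {
  const double X = A.real(), Y = A.imag();
  const double U = B.real(), V = B.imag();
  switch (Rot) {
  case 0:
    return Acc + C(X * U, X * V);
  case 90:
    return Acc + C(-Y * V, Y * U);
  case 180:
    return Acc + C(-X * U, -X * V);
  case 270:
    return Acc + C(Y * V, -Y * U);
  }
  assert(false && "rotation must be 0, 90, 180 or 270");
  return Acc;
}

// A full multiplication is the rotation-0 partial followed by rotation-90.
C fullMul(C A, C B) {
  return partialMul(partialMul(C(0.0, 0.0), A, B, 0), A, B, 90);
}
```

A decoupled case such as `o = a * b.real()` would correspond to a single rotation-0 call with no matching rotation-90 partner, which is the situation the patch leaves as a TODO.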

Updated the algorithm for detecting partial multiplications. While it still supports only full complex number multiplications, it has been restructured so that partial multiplications can be handled in the future.
Resolved the issue of flag loss by using the Symmetric operation.

Harbormaster completed remote builds in B231213: Diff 521149. May 10 2023, 4:26 PM

igor.kirillov added inline comments. May 11 2023, 4:14 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24710 ↗	(On Diff #514945)	Added a flag check to identifySymmetricOperation. Also added two fields to ComplexDeinterleavingCompositeNode, Opcode and Flags, which are used only by replaceSymmetricNode. In addition, I now generate Symmetric nodes rather than CAdd with 0 and 180 degree rotation.

igor.kirillov added inline comments. May 11 2023, 4:23 AM

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
970	I made some changes to the algorithm. It now collects all possible partial multiplications and identifies complex numbers by using common instructions across different partial multiplications. I also added a TODO where we could handle independent (non-paired) partial multiplications in the future.
1018	I lost this assertion because we no longer have an Instruction, and there is no isUnaryOp / isBinaryOp alternative for an opcode.
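
The candidate collection described in the comment on line 970 can be sketched roughly as below. It mirrors the shape of `collectPartialMuls` in the diff but uses strings in place of `Instruction *` and omits the `identifyNode` check on the non-common operands. `findCommon`, `Candidate` and `collectCandidates` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Product {
  std::string Multiplier, Multiplicand;
  bool IsPositive;
};

// Operand shared by a real-part and an imaginary-part product, or "" if none.
std::string findCommon(const Product &Real, const Product &Imag) {
  if (Real.Multiplicand == Imag.Multiplicand ||
      Real.Multiplicand == Imag.Multiplier)
    return Real.Multiplicand;
  if (Real.Multiplier == Imag.Multiplicand ||
      Real.Multiplier == Imag.Multiplier)
    return Real.Multiplier;
  return "";
}

struct Candidate {
  std::string Common;
  std::size_t RealIdx, ImagIdx;
};

// Pair each real product with every imaginary product sharing an operand.
// Returns false as soon as some real product has no partner at all.
bool collectCandidates(const std::vector<Product> &RealMuls,
                       const std::vector<Product> &ImagMuls,
                       std::vector<Candidate> &Out) {
  for (std::size_t I = 0; I < RealMuls.size(); ++I) {
    bool Found = false;
    for (std::size_t J = 0; J < ImagMuls.size(); ++J) {
      std::string Common = findCommon(RealMuls[I], ImagMuls[J]);
      if (Common.empty())
        continue;
      Out.push_back({Common, I, J});
      Found = true;
    }
    if (!Found)
      return false;
  }
  return true;
}
```

For (x + iy) * (u + iv), with real part x*u - y*v and imaginary part x*v + y*u, every real product shares an operand with both imaginary products, so four candidates are produced; later stages decide which pairs form valid rotations.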

Ping

LGTM

This revision is now accepted and ready to land. May 23 2023, 8:02 AM

igor.kirillov mentioned this in rG48339d0fbbdb: [CodeGen] Add pre-commit tests for D148558. May 30 2023, 4:51 AM

igor.kirillov mentioned this in rG40a81d3100b4: [CodeGen] Refactor IR generation functions to use IRBuilder in…. May 30 2023, 9:19 AM

Rebase, small refactoring of replaceNode

Harbormaster completed remote builds in B235587: Diff 527070. May 31 2023, 9:54 AM

Closed by commit rG1a1e76100e3f: [CodeGen] Improve handling -Ofast generated code by ComplexDeinterleaving pass (authored by igor.kirillov). May 31 2023, 11:32 AM

This revision was automatically updated to reflect the committed changes.

igor.kirillov added a commit: rG1a1e76100e3f: [CodeGen] Improve handling -Ofast generated code by ComplexDeinterleaving pass.

Revision Contents

Path	Size
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp	582 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-add-mull-fixed-fast.ll	112 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-add-mull-scalable-fast.ll	114 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-mixed-cases.ll	6 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-multiuses.ll	72 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-uniform-cases.ll	10 lines
llvm/test/CodeGen/Thumb2/mve-complex-deinterleaving-mixed-cases.ll	9 lines
llvm/test/CodeGen/Thumb2/mve-complex-deinterleaving-uniform-cases.ll	15 lines

Diff 527070

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp

@@ 137 lines not shown @@ private:
   using NodePtr = std::shared_ptr<ComplexDeinterleavingCompositeNode>;
   using RawNodePtr = ComplexDeinterleavingCompositeNode *;

 public:
   ComplexDeinterleavingOperation Operation;
   Instruction *Real;
   Instruction *Imag;

+  // These two members are required exclusively for generating
+  // ComplexDeinterleavingOperation::Symmetric operations.
+  unsigned Opcode;
+  FastMathFlags Flags;
+
   ComplexDeinterleavingRotation Rotation =
       ComplexDeinterleavingRotation::Rotation_0;
   SmallVector<RawNodePtr> Operands;
   Value *ReplacementNode = nullptr;

   void addOperand(NodePtr Node) { Operands.push_back(Node.get()); }

   void dump() { dump(dbgs()); }
@@ 27 lines not shown @@ for (const auto &Op : Operands) {
       OS << " - ";
       PrintNodeRef(Op);
     }
   }
 };

 class ComplexDeinterleavingGraph {
 public:
+  struct Product {
+    Instruction *Multiplier;
+    Instruction *Multiplicand;
+    bool IsPositive;
+  };
+
+  using Addend = std::pair<Instruction *, bool>;
   using NodePtr = ComplexDeinterleavingCompositeNode::NodePtr;
   using RawNodePtr = ComplexDeinterleavingCompositeNode::RawNodePtr;

+  // Helper struct for holding info about potential partial multiplication
+  // candidates
+  struct PartialMulCandidate {
+    Instruction *Common;
+    NodePtr Node;
+    unsigned RealIdx;
+    unsigned ImagIdx;
+    bool IsNodeInverted;
+  };
+
   explicit ComplexDeinterleavingGraph(const TargetLowering *TL,
                                       const TargetLibraryInfo *TLI)
       : TL(TL), TLI(TLI) {}

 private:
   const TargetLowering *TL = nullptr;
   const TargetLibraryInfo *TLI = nullptr;
   SmallVector<NodePtr> CompositeNodes;
@@ 52 lines not shown @@ private:
   ///       i: ai + br
   /// 270:  r: ar + bi
   ///       i: ai - br
   NodePtr identifyAdd(Instruction *Real, Instruction *Imag);
   NodePtr identifySymmetricOperation(Instruction *Real, Instruction *Imag);

   NodePtr identifyNode(Instruction *I, Instruction *J);

+  /// Determine if a sum of complex numbers can be formed from \p RealAddends
+  /// and \p ImagAddends. If \p Accumulator is not null, add the result to it.
+  /// Return nullptr if it is not possible to construct a complex number.
+  /// \p Flags are needed to generate symmetric Add and Sub operations.
+  NodePtr identifyAdditions(std::list<Addend> &RealAddends,
+                            std::list<Addend> &ImagAddends, FastMathFlags Flags,
+                            NodePtr Accumulator);
+
+  /// Extract one addend that has both real and imaginary parts positive.
+  NodePtr extractPositiveAddend(std::list<Addend> &RealAddends,
+                                std::list<Addend> &ImagAddends);
+
+  /// Determine if sum of multiplications of complex numbers can be formed from
+  /// \p RealMuls and \p ImagMuls. If \p Accumulator is not null, add the result
+  /// to it. Return nullptr if it is not possible to construct a complex number.
+  NodePtr identifyMultiplications(std::vector<Product> &RealMuls,
+                                  std::vector<Product> &ImagMuls,
+                                  NodePtr Accumulator);
+
+  /// Go through pairs of multiplication (one Real and one Imag) and find all
+  /// possible candidates for partial multiplication and put them into \p
+  /// Candidates. Returns true if every Product has a pair with a common operand.
+  bool collectPartialMuls(const std::vector<Product> &RealMuls,
+                          const std::vector<Product> &ImagMuls,
+                          std::vector<PartialMulCandidate> &Candidates);
+
+  /// If the code is compiled with -Ofast or expressions have `reassoc` flag,
+  /// the order of complex computation operations may be significantly altered,
+  /// and the real and imaginary parts may not be executed in parallel. This
+  /// function takes this into consideration and employs a more general approach
+  /// to identify complex computations. Initially, it gathers all the addends
+  /// and multiplicands and then constructs a complex expression from them.
+  NodePtr identifyReassocNodes(Instruction *I, Instruction *J);
+
   NodePtr identifyRoot(Instruction *I);

   /// Identifies the Deinterleave operation applied to a vector containing
   /// complex numbers. There are two ways to represent the Deinterleave
   /// operation:
   /// * Using two shufflevectors with even indices for \p Real instruction and
   ///   odd indices for \p Imag instructions (only for fixed-width vectors)
   /// * Using two extractvalue instructions applied to `vector.deinterleave2`
@@ 465 lines not shown @@ if (Real->isBinaryOp()) {
     if (!R1 || !I1)
       return nullptr;

     Op1 = identifyNode(R1, I1);
     if (Op1 == nullptr)
       return nullptr;
   }

+  if (isa<FPMathOperator>(Real) &&
+      Real->getFastMathFlags() != Imag->getFastMathFlags())
+    return nullptr;
+
   auto Node = prepareCompositeNode(ComplexDeinterleavingOperation::Symmetric,
                                    Real, Imag);
+  Node->Opcode = Real->getOpcode();
+  if (isa<FPMathOperator>(Real))
+    Node->Flags = Real->getFastMathFlags();
+
   Node->addOperand(Op0);
   if (Real->isBinaryOp())
     Node->addOperand(Op1);

   return submitCompositeNode(Node);
 }

 ComplexDeinterleavingGraph::NodePtr
 ComplexDeinterleavingGraph::identifyNode(Instruction *Real, Instruction *Imag) {
   LLVM_DEBUG(dbgs() << "identifyNode on " << *Real << " / " << *Imag << "\n");
   if (NodePtr CN = getContainingComposite(Real, Imag)) {
     LLVM_DEBUG(dbgs() << " - Folding to existing node\n");
     return CN;
   }

-  NodePtr Node = identifyDeinterleave(Real, Imag);
-  if (Node)
-    return Node;
+  if (NodePtr CN = identifyDeinterleave(Real, Imag))
+    return CN;

   auto *VTy = cast<VectorType>(Real->getType());
   auto *NewVTy = VectorType::getDoubleElementsVectorType(VTy);

-  if (TL->isComplexDeinterleavingOperationSupported(
-          ComplexDeinterleavingOperation::CMulPartial, NewVTy) &&
-      isInstructionPairMul(Real, Imag)) {
-    return identifyPartialMul(Real, Imag);
-  }
-
-  if (TL->isComplexDeinterleavingOperationSupported(
-          ComplexDeinterleavingOperation::CAdd, NewVTy) &&
-      isInstructionPairAdd(Real, Imag)) {
-    return identifyAdd(Real, Imag);
-  }
-
-  auto Symmetric = identifySymmetricOperation(Real, Imag);
-  LLVM_DEBUG(if (Symmetric == nullptr) dbgs()
-             << " - Not recognised as a valid pattern.\n");
-  return Symmetric;
+  bool HasCMulSupport = TL->isComplexDeinterleavingOperationSupported(
+      ComplexDeinterleavingOperation::CMulPartial, NewVTy);
+  bool HasCAddSupport = TL->isComplexDeinterleavingOperationSupported(
+      ComplexDeinterleavingOperation::CAdd, NewVTy);
+
+  if (HasCMulSupport && isInstructionPairMul(Real, Imag)) {
+    if (NodePtr CN = identifyPartialMul(Real, Imag))
+      return CN;
+  }
+
+  if (HasCAddSupport && isInstructionPairAdd(Real, Imag)) {
+    if (NodePtr CN = identifyAdd(Real, Imag))
+      return CN;
+  }
+
+  if (HasCMulSupport && HasCAddSupport) {
+    if (NodePtr CN = identifyReassocNodes(Real, Imag))
+      return CN;
+  }
+
+  if (NodePtr CN = identifySymmetricOperation(Real, Imag))
+    return CN;
+
+  LLVM_DEBUG(dbgs() << " - Not recognised as a valid pattern.\n");
+  return nullptr;
		}

		ComplexDeinterleavingGraph::NodePtr
		ComplexDeinterleavingGraph::identifyReassocNodes(Instruction *Real,
		Instruction *Imag) {
		if ((Real->getOpcode() != Instruction::FAdd &&
		Real->getOpcode() != Instruction::FSub &&
		Real->getOpcode() != Instruction::FNeg) ||
		(Imag->getOpcode() != Instruction::FAdd &&
		Imag->getOpcode() != Instruction::FSub &&
		Imag->getOpcode() != Instruction::FNeg))
		return nullptr;

		if (Real->getFastMathFlags() != Imag->getFastMathFlags()) {
		LLVM_DEBUG(
		dbgs()
		<< "The flags in Real and Imaginary instructions are not identical\n");
		return nullptr;
		}

		FastMathFlags Flags = Real->getFastMathFlags();
		if (!Flags.allowReassoc()) {
		LLVM_DEBUG(
		dbgs() << "the 'Reassoc' attribute is missing in the FastMath flags\n");
		return nullptr;
		}

		// Collect multiplications and addend instructions from the given instruction
		// while traversing its operands. Additionally, verify that all instructions
		// have the same fast math flags.
		auto Collect = [&Flags](Instruction *Insn, std::vector<Product> &Muls,
		std::list<Addend> &Addends) -> bool {
		SmallVector<PointerIntPair<Value *, 1, bool>> Worklist = {{Insn, true}};
		SmallPtrSet<Value *, 8> Visited;
		while (!Worklist.empty()) {
		auto [V, IsPositive] = Worklist.back();
		Worklist.pop_back();
		if (!Visited.insert(V).second)
		continue;

		Instruction *I = dyn_cast<Instruction>(V);
		if (!I)
		return false;

		// If an instruction has more than one user, it indicates that it either
		// has an external user, which will be later checked by the checkNodes
		// function, or it is a subexpression utilized by multiple expressions. In
		// the latter case, we will attempt to separately identify the complex
		// operation from here in order to create a shared
		// ComplexDeinterleavingCompositeNode.
		if (I != Insn && I->getNumUses() > 1) {
		LLVM_DEBUG(dbgs() << "Found potential sub-expression: " << *I << "\n");
		Addends.emplace_back(I, IsPositive);
		continue;
		}

		if (I->getOpcode() == Instruction::FAdd) {
		Worklist.emplace_back(I->getOperand(1), IsPositive);
		Worklist.emplace_back(I->getOperand(0), IsPositive);
		} else if (I->getOpcode() == Instruction::FSub) {
		Worklist.emplace_back(I->getOperand(1), !IsPositive);
		Worklist.emplace_back(I->getOperand(0), IsPositive);
		} else if (I->getOpcode() == Instruction::FMul) {
		auto *A = dyn_cast<Instruction>(I->getOperand(0));
		if (A && A->getOpcode() == Instruction::FNeg) {
		A = dyn_cast<Instruction>(A->getOperand(0));
		IsPositive = !IsPositive;
		}
		if (!A)
		return false;
		auto *B = dyn_cast<Instruction>(I->getOperand(1));
		if (B && B->getOpcode() == Instruction::FNeg) {
		B = dyn_cast<Instruction>(B->getOperand(0));
		IsPositive = !IsPositive;
		}
		if (!B)
		return false;
		Muls.push_back(Product{A, B, IsPositive});
		} else if (I->getOpcode() == Instruction::FNeg) {
		Worklist.emplace_back(I->getOperand(0), !IsPositive);
		} else {
		Addends.emplace_back(I, IsPositive);
		continue;
		}

		if (I->getFastMathFlags() != Flags) {
		LLVM_DEBUG(dbgs() << "The instruction's fast math flags are "
		"inconsistent with the root instructions' flags: "
		<< *I << "\n");
		return false;
		}
		}
		return true;
		};

		std::vector<Product> RealMuls, ImagMuls;
		std::list<Addend> RealAddends, ImagAddends;
		if (!Collect(Real, RealMuls, RealAddends) ||
		!Collect(Imag, ImagMuls, ImagAddends))
		return nullptr;

		if (RealAddends.size() != ImagAddends.size())
		return nullptr;

		NodePtr FinalNode;
		if (!RealMuls.empty() || !ImagMuls.empty()) {
		// If there are multiplicands, extract positive addend and use it as an
		// accumulator
		FinalNode = extractPositiveAddend(RealAddends, ImagAddends);
		FinalNode = identifyMultiplications(RealMuls, ImagMuls, FinalNode);
		if (!FinalNode)
		return nullptr;
		}

		// Identify and process remaining additions
		if (!RealAddends.empty() || !ImagAddends.empty()) {
		FinalNode = identifyAdditions(RealAddends, ImagAddends, Flags, FinalNode);
		if (!FinalNode)
		return nullptr;
		}

		// Set the Real and Imag fields of the final node and submit it
		FinalNode->Real = Real;
		FinalNode->Imag = Imag;
		submitCompositeNode(FinalNode);
		return FinalNode;
		}

		bool ComplexDeinterleavingGraph::collectPartialMuls(
		const std::vector<Product> &RealMuls, const std::vector<Product> &ImagMuls,
		std::vector<PartialMulCandidate> &PartialMulCandidates) {
		// Helper function to extract a common operand from two products
		auto FindCommonInstruction = [](const Product &Real,
		const Product &Imag) -> Instruction * {
		if (Real.Multiplicand == Imag.Multiplicand ||
		Real.Multiplicand == Imag.Multiplier)
		return Real.Multiplicand;

		if (Real.Multiplier == Imag.Multiplicand ||
		Real.Multiplier == Imag.Multiplier)
		return Real.Multiplier;

		return nullptr;
		};

		// Iterate over the real and imaginary multiplications to find common
		// operands. If a common operand is found, a partial multiplication candidate
		// is created and added to the candidates vector. The function returns false
		// if no common operand is found for some real product.
		for (unsigned i = 0; i < RealMuls.size(); ++i) {
		bool FoundCommon = false;
		for (unsigned j = 0; j < ImagMuls.size(); ++j) {
		auto *Common = FindCommonInstruction(RealMuls[i], ImagMuls[j]);
		if (!Common)
		continue;

		auto *A = RealMuls[i].Multiplicand == Common ? RealMuls[i].Multiplier
		: RealMuls[i].Multiplicand;
		auto *B = ImagMuls[j].Multiplicand == Common ? ImagMuls[j].Multiplier
		: ImagMuls[j].Multiplicand;

		bool Inverted = false;
		auto Node = identifyNode(A, B);
		if (!Node) {
		std::swap(A, B);
		Inverted = true;
		Node = identifyNode(A, B);
		}
		if (!Node)
		continue;

		FoundCommon = true;
		PartialMulCandidates.push_back({Common, Node, i, j, Inverted});
		}
		if (!FoundCommon)
		return false;
		}
		return true;
		}

		ComplexDeinterleavingGraph::NodePtr
		ComplexDeinterleavingGraph::identifyMultiplications(
		std::vector<Product> &RealMuls, std::vector<Product> &ImagMuls,
		NodePtr Accumulator = nullptr) {
		if (RealMuls.size() != ImagMuls.size())
		return nullptr;

		std::vector<PartialMulCandidate> Info;
		if (!collectPartialMuls(RealMuls, ImagMuls, Info))
		return nullptr;

		// Map to store common instruction to node pointers
		std::map<Instruction *, NodePtr> CommonToNode;
		std::vector<bool> Processed(Info.size(), false);
		for (unsigned I = 0; I < Info.size(); ++I) {
		if (Processed[I])
		continue;

		PartialMulCandidate &InfoA = Info[I];
		for (unsigned J = I + 1; J < Info.size(); ++J) {
		if (Processed[J])
		continue;

		PartialMulCandidate &InfoB = Info[J];
		auto *InfoReal = &InfoA;
		auto *InfoImag = &InfoB;

		auto NodeFromCommon = identifyNode(InfoReal->Common, InfoImag->Common);
		if (!NodeFromCommon) {
		std::swap(InfoReal, InfoImag);
		NodeFromCommon = identifyNode(InfoReal->Common, InfoImag->Common);
		}
		if (!NodeFromCommon)
		continue;

		CommonToNode[InfoReal->Common] = NodeFromCommon;
		CommonToNode[InfoImag->Common] = NodeFromCommon;
		Processed[I] = true;
		Processed[J] = true;
		}
		}

		std::vector<bool> ProcessedReal(RealMuls.size(), false);
		std::vector<bool> ProcessedImag(ImagMuls.size(), false);
		NodePtr Result = Accumulator;
		for (auto &PMI : Info) {
		if (ProcessedReal[PMI.RealIdx] || ProcessedImag[PMI.ImagIdx])
		continue;

		auto It = CommonToNode.find(PMI.Common);
		// TODO: Process independent complex multiplications. Cases like this:
		// A.real() * B where both A and B are complex numbers.
		if (It == CommonToNode.end()) {
		LLVM_DEBUG({
		dbgs() << "Unprocessed independent partial multiplication:\n";
		for (auto *Mul : {&RealMuls[PMI.RealIdx], &ImagMuls[PMI.ImagIdx]})
		dbgs().indent(4) << (Mul->IsPositive ? "+" : "-") << *Mul->Multiplier
		<< " multiplied by " << *Mul->Multiplicand << "\n";
		});
		return nullptr;
		}

		auto &RealMul = RealMuls[PMI.RealIdx];
		auto &ImagMul = ImagMuls[PMI.ImagIdx];

		auto NodeA = It->second;
		auto NodeB = PMI.Node;
		auto IsMultiplicandReal = PMI.Common == NodeA->Real;
		// The following table illustrates the relationship between multiplications
		// and rotations. If we consider the multiplication (X + iY) * (U + iV), we
		// can see:
		//
		// Rotation |   Real |   Imag |
		// ---------+--------+--------+
		//        0 |  x * u |  x * v |
		//       90 | -y * v |  y * u |
		//      180 | -x * u | -x * v |
		//      270 |  y * v | -y * u |
		//
		// Check if the candidate can indeed be represented by partial
		// multiplication
		// TODO: Add support for multiplication by complex one
		if ((IsMultiplicandReal && PMI.IsNodeInverted) ||
		(!IsMultiplicandReal && !PMI.IsNodeInverted))
		continue;

		// Determine the rotation based on the multiplications
		ComplexDeinterleavingRotation Rotation;
		if (IsMultiplicandReal) {
		// Detect 0 and 180 degrees rotation
		if (RealMul.IsPositive && ImagMul.IsPositive)
		Rotation = llvm::ComplexDeinterleavingRotation::Rotation_0;
		else if (!RealMul.IsPositive && !ImagMul.IsPositive)
		Rotation = llvm::ComplexDeinterleavingRotation::Rotation_180;
		else
		continue;

		} else {
		// Detect 90 and 270 degrees rotation
		if (!RealMul.IsPositive && ImagMul.IsPositive)
		Rotation = llvm::ComplexDeinterleavingRotation::Rotation_90;
		else if (RealMul.IsPositive && !ImagMul.IsPositive)
		Rotation = llvm::ComplexDeinterleavingRotation::Rotation_270;
		else
		continue;
		}

		LLVM_DEBUG({
		dbgs() << "Identified partial multiplication (X, Y) * (U, V):\n";
		dbgs().indent(4) << "X: " << *NodeA->Real << "\n";
		dbgs().indent(4) << "Y: " << *NodeA->Imag << "\n";
		dbgs().indent(4) << "U: " << *NodeB->Real << "\n";
		dbgs().indent(4) << "V: " << *NodeB->Imag << "\n";
		dbgs().indent(4) << "Rotation - " << (int)Rotation * 90 << "\n";
		});

		NodePtr NodeMul = prepareCompositeNode(
		ComplexDeinterleavingOperation::CMulPartial, nullptr, nullptr);
		NodeMul->Rotation = Rotation;
		NodeMul->addOperand(NodeA);
		NodeMul->addOperand(NodeB);
		if (Result)
		NodeMul->addOperand(Result);
		submitCompositeNode(NodeMul);
		Result = NodeMul;
		ProcessedReal[PMI.RealIdx] = true;
		ProcessedImag[PMI.ImagIdx] = true;
		}

		// Ensure all products have been processed, if not return nullptr.
		if (!all_of(ProcessedReal, [](bool V) { return V; }) ||
		!all_of(ProcessedImag, [](bool V) { return V; })) {

		// Dump debug information about which partial multiplications are not
		// processed.
		LLVM_DEBUG({
		dbgs() << "Unprocessed products (Real):\n";
		for (size_t i = 0; i < ProcessedReal.size(); ++i) {
		if (!ProcessedReal[i])
		dbgs().indent(4) << (RealMuls[i].IsPositive ? "+" : "-")
		<< *RealMuls[i].Multiplier << " multiplied by "
		<< *RealMuls[i].Multiplicand << "\n";
		}
		dbgs() << "Unprocessed products (Imag):\n";
		for (size_t i = 0; i < ProcessedImag.size(); ++i) {
		if (!ProcessedImag[i])
		dbgs().indent(4) << (ImagMuls[i].IsPositive ? "+" : "-")
		<< *ImagMuls[i].Multiplier << " multiplied by "
		<< *ImagMuls[i].Multiplicand << "\n";
		}
		});
		return nullptr;
		}

		return Result;
		}
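
The rotation table in identifyMultiplications can be checked numerically outside LLVM. The following is a minimal standalone sketch, not code from the patch: `cmulPartial` and `fullMul` are hypothetical names, and the assumption is that each partial multiplication adds the Real/Imag pair from the table row for its rotation to an accumulator. Under that assumption, rotations 0 and 90 together give the full product (X + iY) * (U + iV), and rotations 180 and 270 together give its negation.

```cpp
#include <cassert>
#include <complex>

using Complex = std::complex<double>;

// Hypothetical helper (not an LLVM API): one partial multiply-accumulate
// step for A = X + iY and B = U + iV, following the rotation table.
Complex cmulPartial(Complex Acc, Complex A, Complex B, int RotationDeg) {
  const double X = A.real(), Y = A.imag();
  const double U = B.real(), V = B.imag();
  switch (RotationDeg) {
  case 0:
    return Acc + Complex(X * U, X * V);
  case 90:
    return Acc + Complex(-Y * V, Y * U);
  case 180:
    return Acc + Complex(-X * U, -X * V);
  case 270:
    return Acc + Complex(Y * V, -Y * U);
  default:
    assert(false && "rotation must be 0, 90, 180 or 270");
    return Acc;
  }
}

// Hypothetical helper: Acc + A * B built from the 0 and 90 degree steps.
Complex fullMul(Complex A, Complex B, Complex Acc = Complex(0.0, 0.0)) {
  return cmulPartial(cmulPartial(Acc, A, B, 0), A, B, 90);
}
```

This pairing is consistent with the updated CHECK lines for mull_add in this diff, where each accumulator register receives one fcmla with #0 and one with #90.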

		ComplexDeinterleavingGraph::NodePtr
		ComplexDeinterleavingGraph::identifyAdditions(std::list<Addend> &RealAddends,
		std::list<Addend> &ImagAddends,
		FastMathFlags Flags,
		NodePtr Accumulator = nullptr) {
		if (RealAddends.size() != ImagAddends.size())
		return nullptr;

		NodePtr Result;
		// If we have an accumulator, use it as the first addend.
		if (Accumulator)
		Result = Accumulator;
		// Otherwise find an element with both positive real and imaginary parts.
		else
		Result = extractPositiveAddend(RealAddends, ImagAddends);

		if (!Result)
		return nullptr;

		while (!RealAddends.empty()) {
		auto ItR = RealAddends.begin();
		auto [R, IsPositiveR] = *ItR;

		bool FoundImag = false;
		for (auto ItI = ImagAddends.begin(); ItI != ImagAddends.end(); ++ItI) {
		auto [I, IsPositiveI] = *ItI;
		ComplexDeinterleavingRotation Rotation;
		if (IsPositiveR && IsPositiveI)
		Rotation = ComplexDeinterleavingRotation::Rotation_0;
		else if (!IsPositiveR && IsPositiveI)
		Rotation = ComplexDeinterleavingRotation::Rotation_90;
		else if (!IsPositiveR && !IsPositiveI)
		Rotation = ComplexDeinterleavingRotation::Rotation_180;
		else
		Rotation = ComplexDeinterleavingRotation::Rotation_270;

		NodePtr AddNode;
		if (Rotation == ComplexDeinterleavingRotation::Rotation_0 ||
		Rotation == ComplexDeinterleavingRotation::Rotation_180) {
		AddNode = identifyNode(R, I);
		} else {
		AddNode = identifyNode(I, R);
		}
		if (AddNode) {
		LLVM_DEBUG({
		dbgs() << "Identified addition:\n";
		dbgs().indent(4) << "X: " << *R << "\n";
		dbgs().indent(4) << "Y: " << *I << "\n";
		dbgs().indent(4) << "Rotation - " << (int)Rotation * 90 << "\n";
		});

		NodePtr TmpNode;
		if (Rotation == llvm::ComplexDeinterleavingRotation::Rotation_0) {
		TmpNode = prepareCompositeNode(
		ComplexDeinterleavingOperation::Symmetric, nullptr, nullptr);
		TmpNode->Opcode = Instruction::FAdd;
		TmpNode->Flags = Flags;
		} else if (Rotation ==
		llvm::ComplexDeinterleavingRotation::Rotation_180) {
		TmpNode = prepareCompositeNode(
		ComplexDeinterleavingOperation::Symmetric, nullptr, nullptr);
		TmpNode->Opcode = Instruction::FSub;
		TmpNode->Flags = Flags;
		} else {
		TmpNode = prepareCompositeNode(ComplexDeinterleavingOperation::CAdd,
		nullptr, nullptr);
		TmpNode->Rotation = Rotation;
		}

		TmpNode->addOperand(Result);
		TmpNode->addOperand(AddNode);
		submitCompositeNode(TmpNode);
		Result = TmpNode;
		RealAddends.erase(ItR);
		ImagAddends.erase(ItI);
		FoundImag = true;
		break;
		}
		}
		if (!FoundImag)
		return nullptr;
		}
		return Result;
		}
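
The sign-to-rotation choice in identifyAdditions can be illustrated with plain scalars. This is a minimal standalone sketch under stated assumptions, not code from the patch: `rotationFromSigns` and `applyAddend` are hypothetical names, and a rotation of 90 or 270 degrees is modelled as multiplying the swapped pair (I, R) by +i or -i. In all four cases the result's real part changes by plus or minus R and its imaginary part by plus or minus I, matching the signs of the addends.

```cpp
#include <cassert>
#include <complex>

using Complex = std::complex<double>;

// Hypothetical mirror of the if/else chain that picks the rotation from
// the signs of the real and imaginary addends.
int rotationFromSigns(bool IsPositiveR, bool IsPositiveI) {
  if (IsPositiveR && IsPositiveI)
    return 0;
  if (!IsPositiveR && IsPositiveI)
    return 90;
  if (!IsPositiveR && !IsPositiveI)
    return 180;
  return 270;
}

// Hypothetical helper: folds one addend pair (R, I) into Result.
Complex applyAddend(Complex Result, double R, double I, bool IsPositiveR,
                    bool IsPositiveI) {
  // For 90 and 270 the operands are swapped, as in identifyNode(I, R).
  const Complex Swapped(I, R);
  // Multiplying Swapped by i yields (-R, I).
  const Complex TimesI(-Swapped.imag(), Swapped.real());
  switch (rotationFromSigns(IsPositiveR, IsPositiveI)) {
  case 0:
    return Result + Complex(R, I); // Symmetric FAdd
  case 180:
    return Result - Complex(R, I); // Symmetric FSub
  case 90:
    return Result + TimesI; // complex add rotated by 90 degrees
  default:
    return Result - TimesI; // complex add rotated by 270 degrees
  }
}
```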

		ComplexDeinterleavingGraph::NodePtr
		ComplexDeinterleavingGraph::extractPositiveAddend(
		std::list<Addend> &RealAddends, std::list<Addend> &ImagAddends) {
		for (auto ItR = RealAddends.begin(); ItR != RealAddends.end(); ++ItR) {
		for (auto ItI = ImagAddends.begin(); ItI != ImagAddends.end(); ++ItI) {
		auto [R, IsPositiveR] = *ItR;
		auto [I, IsPositiveI] = *ItI;
		if (IsPositiveR && IsPositiveI) {
		auto Result = identifyNode(R, I);
		if (Result) {
		RealAddends.erase(ItR);
		ImagAddends.erase(ItI);
		return Result;
		}
		}
		}
		}
		return nullptr;
}		}

bool ComplexDeinterleavingGraph::identifyNodes(Instruction *RootI) {		bool ComplexDeinterleavingGraph::identifyNodes(Instruction *RootI) {
auto RootNode = identifyRoot(RootI);		auto RootNode = identifyRoot(RootI);
if (!RootNode)		if (!RootNode)
return false;		return false;

LLVM_DEBUG({		LLVM_DEBUG({
[... 218 lines hidden ...]	NodePtr PlaceholderNode =
prepareCompositeNode(llvm::ComplexDeinterleavingOperation::Deinterleave,		prepareCompositeNode(llvm::ComplexDeinterleavingOperation::Deinterleave,
RealShuffle, ImagShuffle);		RealShuffle, ImagShuffle);
PlaceholderNode->ReplacementNode = RealShuffle->getOperand(0);		PlaceholderNode->ReplacementNode = RealShuffle->getOperand(0);
FinalInstructions.insert(RealShuffle);		FinalInstructions.insert(RealShuffle);
FinalInstructions.insert(ImagShuffle);		FinalInstructions.insert(ImagShuffle);
return submitCompositeNode(PlaceholderNode);		return submitCompositeNode(PlaceholderNode);
}		}

static Value *replaceSymmetricNode(IRBuilderBase &B,		static Value *replaceSymmetricNode(IRBuilderBase &B, unsigned Opcode,
ComplexDeinterleavingGraph::RawNodePtr Node,		FastMathFlags Flags, Value *InputA,
Value *InputA, Value *InputB) {		Value *InputB) {
Instruction *I = Node->Real;		Value *I;
if (I->isUnaryOp())		switch (Opcode) {
igor.kirillov (Author, inline comment): I lost this assertion as we don't have Instruction anymore and there are no isUnaryOp / isBinaryOp alternative for opcode
assert(!InputB &&
"Unary symmetric operations need one input, but two were provided.");
else if (I->isBinaryOp())
assert(InputB && "Binary symmetric operations need two inputs, only one "
"was provided.");

switch (I->getOpcode()) {
case Instruction::FNeg:		case Instruction::FNeg:
return B.CreateFNegFMF(InputA, I);		I = B.CreateFNeg(InputA);
		break;
case Instruction::FAdd:		case Instruction::FAdd:
return B.CreateFAddFMF(InputA, InputB, I);		I = B.CreateFAdd(InputA, InputB);
		break;
case Instruction::FSub:		case Instruction::FSub:
return B.CreateFSubFMF(InputA, InputB, I);		I = B.CreateFSub(InputA, InputB);
		break;
case Instruction::FMul:		case Instruction::FMul:
return B.CreateFMulFMF(InputA, InputB, I);		I = B.CreateFMul(InputA, InputB);
		break;
		default:
		llvm_unreachable("Incorrect symmetric opcode");
}		}
		cast<Instruction>(I)->setFastMathFlags(Flags);
return nullptr;		return I;
}		}

Value *ComplexDeinterleavingGraph::replaceNode(IRBuilderBase &Builder,		Value *ComplexDeinterleavingGraph::replaceNode(IRBuilderBase &Builder,
RawNodePtr Node) {		RawNodePtr Node) {
if (Node->ReplacementNode)		if (Node->ReplacementNode)
return Node->ReplacementNode;		return Node->ReplacementNode;

Value *Input0 = replaceNode(Builder, Node->Operands[0]);		Value *Input0 = replaceNode(Builder, Node->Operands[0]);
Value *Input1 = Node->Operands.size() > 1		Value *Input1 = Node->Operands.size() > 1
? replaceNode(Builder, Node->Operands[1])		? replaceNode(Builder, Node->Operands[1])
: nullptr;		: nullptr;
Value *Accumulator = Node->Operands.size() > 2		Value *Accumulator = Node->Operands.size() > 2
? replaceNode(Builder, Node->Operands[2])		? replaceNode(Builder, Node->Operands[2])
: nullptr;		: nullptr;

if (Input1)		if (Input1)
assert(Input0->getType() == Input1->getType() &&		assert(Input0->getType() == Input1->getType() &&
"Node inputs need to be of the same type");		"Node inputs need to be of the same type");

if (Node->Operation == ComplexDeinterleavingOperation::Symmetric)		if (Node->Operation == ComplexDeinterleavingOperation::Symmetric)
Node->ReplacementNode = replaceSymmetricNode(Builder, Node, Input0, Input1);		Node->ReplacementNode = replaceSymmetricNode(Builder, Node->Opcode,
		Node->Flags, Input0, Input1);
else		else
Node->ReplacementNode = TL->createComplexDeinterleavingIR(		Node->ReplacementNode = TL->createComplexDeinterleavingIR(
Builder, Node->Operation, Node->Rotation, Input0, Input1, Accumulator);		Builder, Node->Operation, Node->Rotation, Input0, Input1, Accumulator);

assert(Node->ReplacementNode && "Target failed to create Intrinsic call.");		assert(Node->ReplacementNode && "Target failed to create Intrinsic call.");
NumComplexTransformations += 1;		NumComplexTransformations += 1;
return Node->ReplacementNode;		return Node->ReplacementNode;
}		}
[... 20 lines hidden ...]

llvm/test/CodeGen/AArch64/complex-deinterleaving-add-mull-fixed-fast.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s --mattr=+complxnum,+neon -o - | FileCheck %s		; RUN: llc < %s --mattr=+complxnum,+neon -o - | FileCheck %s

target triple = "aarch64-arm-none-eabi"		target triple = "aarch64-arm-none-eabi"

; a * b + c		; a * b + c
define <4 x double> @mull_add(<4 x double> %a, <4 x double> %b, <4 x double> %c) {		define <4 x double> @mull_add(<4 x double> %a, <4 x double> %b, <4 x double> %c) {
; CHECK-LABEL: mull_add:		; CHECK-LABEL: mull_add:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: zip2 v6.2d, v4.2d, v5.2d		; CHECK-NEXT: fcmla v4.2d, v2.2d, v0.2d, #0
; CHECK-NEXT: zip1 v7.2d, v0.2d, v1.2d		; CHECK-NEXT: fcmla v5.2d, v3.2d, v1.2d, #0
; CHECK-NEXT: zip2 v0.2d, v0.2d, v1.2d		; CHECK-NEXT: fcmla v4.2d, v2.2d, v0.2d, #90
; CHECK-NEXT: zip1 v1.2d, v4.2d, v5.2d		; CHECK-NEXT: fcmla v5.2d, v3.2d, v1.2d, #90
; CHECK-NEXT: zip1 v4.2d, v2.2d, v3.2d		; CHECK-NEXT: mov v0.16b, v4.16b
; CHECK-NEXT: zip2 v2.2d, v2.2d, v3.2d		; CHECK-NEXT: mov v1.16b, v5.16b
; CHECK-NEXT: fmla v6.2d, v0.2d, v4.2d
; CHECK-NEXT: fmla v1.2d, v7.2d, v4.2d
; CHECK-NEXT: fmla v6.2d, v7.2d, v2.2d
; CHECK-NEXT: fmls v1.2d, v0.2d, v2.2d
; CHECK-NEXT: zip1 v0.2d, v1.2d, v6.2d
; CHECK-NEXT: zip2 v1.2d, v1.2d, v6.2d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%strided.vec = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 0, i32 2>		%strided.vec = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec28 = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 1, i32 3>		%strided.vec28 = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 1, i32 3>
%strided.vec30 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 0, i32 2>		%strided.vec30 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec31 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 1, i32 3>		%strided.vec31 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 1, i32 3>
%0 = fmul fast <2 x double> %strided.vec31, %strided.vec		%0 = fmul fast <2 x double> %strided.vec31, %strided.vec
%1 = fmul fast <2 x double> %strided.vec30, %strided.vec28		%1 = fmul fast <2 x double> %strided.vec30, %strided.vec28
%2 = fadd fast <2 x double> %0, %1		%2 = fadd fast <2 x double> %0, %1
%3 = fmul fast <2 x double> %strided.vec30, %strided.vec		%3 = fmul fast <2 x double> %strided.vec30, %strided.vec
%strided.vec33 = shufflevector <4 x double> %c, <4 x double> poison, <2 x i32> <i32 0, i32 2>		%strided.vec33 = shufflevector <4 x double> %c, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec34 = shufflevector <4 x double> %c, <4 x double> poison, <2 x i32> <i32 1, i32 3>		%strided.vec34 = shufflevector <4 x double> %c, <4 x double> poison, <2 x i32> <i32 1, i32 3>
%4 = fadd fast <2 x double> %strided.vec33, %3		%4 = fadd fast <2 x double> %strided.vec33, %3
%5 = fmul fast <2 x double> %strided.vec31, %strided.vec28		%5 = fmul fast <2 x double> %strided.vec31, %strided.vec28
%6 = fsub fast <2 x double> %4, %5		%6 = fsub fast <2 x double> %4, %5
%7 = fadd fast <2 x double> %2, %strided.vec34		%7 = fadd fast <2 x double> %2, %strided.vec34
%interleaved.vec = shufflevector <2 x double> %6, <2 x double> %7, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%interleaved.vec = shufflevector <2 x double> %6, <2 x double> %7, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
ret <4 x double> %interleaved.vec		ret <4 x double> %interleaved.vec
}		}

; a * b + c * d		; a * b + c * d
define <4 x double> @mul_add_mull(<4 x double> %a, <4 x double> %b, <4 x double> %c, <4 x double> %d) {		define <4 x double> @mul_add_mull(<4 x double> %a, <4 x double> %b, <4 x double> %c, <4 x double> %d) {
; CHECK-LABEL: mul_add_mull:		; CHECK-LABEL: mul_add_mull:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: zip1 v16.2d, v2.2d, v3.2d		; CHECK-NEXT: movi v16.2d, #0000000000000000
; CHECK-NEXT: zip1 v17.2d, v0.2d, v1.2d		; CHECK-NEXT: movi v17.2d, #0000000000000000
; CHECK-NEXT: zip2 v0.2d, v0.2d, v1.2d		; CHECK-NEXT: fcmla v16.2d, v4.2d, v6.2d, #0
; CHECK-NEXT: zip2 v1.2d, v2.2d, v3.2d		; CHECK-NEXT: fcmla v17.2d, v5.2d, v7.2d, #0
; CHECK-NEXT: zip1 v2.2d, v4.2d, v5.2d		; CHECK-NEXT: fcmla v16.2d, v2.2d, v0.2d, #0
; CHECK-NEXT: zip2 v3.2d, v4.2d, v5.2d		; CHECK-NEXT: fcmla v17.2d, v3.2d, v1.2d, #0
; CHECK-NEXT: fmul v4.2d, v16.2d, v0.2d		; CHECK-NEXT: fcmla v16.2d, v4.2d, v6.2d, #90
; CHECK-NEXT: zip1 v5.2d, v6.2d, v7.2d		; CHECK-NEXT: fcmla v17.2d, v5.2d, v7.2d, #90
; CHECK-NEXT: zip2 v6.2d, v6.2d, v7.2d		; CHECK-NEXT: fcmla v16.2d, v2.2d, v0.2d, #90
; CHECK-NEXT: fmul v0.2d, v1.2d, v0.2d		; CHECK-NEXT: fcmla v17.2d, v3.2d, v1.2d, #90
; CHECK-NEXT: fmul v7.2d, v16.2d, v17.2d		; CHECK-NEXT: mov v0.16b, v16.16b
; CHECK-NEXT: fmla v4.2d, v17.2d, v1.2d		; CHECK-NEXT: mov v1.16b, v17.16b
; CHECK-NEXT: fmla v0.2d, v3.2d, v6.2d
; CHECK-NEXT: fmla v7.2d, v2.2d, v5.2d
; CHECK-NEXT: fmla v4.2d, v3.2d, v5.2d
; CHECK-NEXT: fsub v1.2d, v7.2d, v0.2d
; CHECK-NEXT: fmla v4.2d, v2.2d, v6.2d
; CHECK-NEXT: zip1 v0.2d, v1.2d, v4.2d
; CHECK-NEXT: zip2 v1.2d, v1.2d, v4.2d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%strided.vec = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 0, i32 2>		%strided.vec = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec51 = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 1, i32 3>		%strided.vec51 = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 1, i32 3>
%strided.vec53 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 0, i32 2>		%strided.vec53 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec54 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 1, i32 3>		%strided.vec54 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 1, i32 3>
%0 = fmul fast <2 x double> %strided.vec54, %strided.vec		%0 = fmul fast <2 x double> %strided.vec54, %strided.vec
%1 = fmul fast <2 x double> %strided.vec53, %strided.vec51		%1 = fmul fast <2 x double> %strided.vec53, %strided.vec51
[... 16 lines hidden ...]	entry:
%interleaved.vec = shufflevector <2 x double> %10, <2 x double> %13, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%interleaved.vec = shufflevector <2 x double> %10, <2 x double> %13, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
ret <4 x double> %interleaved.vec		ret <4 x double> %interleaved.vec
}		}

; a * b - c * d		; a * b - c * d
define <4 x double> @mul_sub_mull(<4 x double> %a, <4 x double> %b, <4 x double> %c, <4 x double> %d) {		define <4 x double> @mul_sub_mull(<4 x double> %a, <4 x double> %b, <4 x double> %c, <4 x double> %d) {
; CHECK-LABEL: mul_sub_mull:		; CHECK-LABEL: mul_sub_mull:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: zip1 v17.2d, v2.2d, v3.2d		; CHECK-NEXT: movi v16.2d, #0000000000000000
; CHECK-NEXT: zip1 v18.2d, v0.2d, v1.2d		; CHECK-NEXT: movi v17.2d, #0000000000000000
; CHECK-NEXT: zip2 v0.2d, v0.2d, v1.2d		; CHECK-NEXT: fcmla v16.2d, v4.2d, v6.2d, #270
; CHECK-NEXT: zip2 v1.2d, v2.2d, v3.2d		; CHECK-NEXT: fcmla v17.2d, v5.2d, v7.2d, #270
; CHECK-NEXT: zip2 v2.2d, v4.2d, v5.2d		; CHECK-NEXT: fcmla v16.2d, v2.2d, v0.2d, #0
; CHECK-NEXT: zip1 v3.2d, v6.2d, v7.2d		; CHECK-NEXT: fcmla v17.2d, v3.2d, v1.2d, #0
; CHECK-NEXT: zip1 v16.2d, v4.2d, v5.2d		; CHECK-NEXT: fcmla v16.2d, v4.2d, v6.2d, #180
; CHECK-NEXT: fmul v4.2d, v17.2d, v0.2d		; CHECK-NEXT: fcmla v17.2d, v5.2d, v7.2d, #180
; CHECK-NEXT: fmul v5.2d, v17.2d, v18.2d		; CHECK-NEXT: fcmla v16.2d, v2.2d, v0.2d, #90
; CHECK-NEXT: fmul v0.2d, v1.2d, v0.2d		; CHECK-NEXT: fcmla v17.2d, v3.2d, v1.2d, #90
; CHECK-NEXT: zip2 v6.2d, v6.2d, v7.2d		; CHECK-NEXT: mov v0.16b, v16.16b
; CHECK-NEXT: fmul v7.2d, v3.2d, v2.2d		; CHECK-NEXT: mov v1.16b, v17.16b
; CHECK-NEXT: fmla v4.2d, v18.2d, v1.2d
; CHECK-NEXT: fmla v0.2d, v16.2d, v3.2d
; CHECK-NEXT: fmla v5.2d, v2.2d, v6.2d
; CHECK-NEXT: fmla v7.2d, v16.2d, v6.2d
; CHECK-NEXT: fsub v1.2d, v5.2d, v0.2d
; CHECK-NEXT: fsub v2.2d, v4.2d, v7.2d
; CHECK-NEXT: zip1 v0.2d, v1.2d, v2.2d
; CHECK-NEXT: zip2 v1.2d, v1.2d, v2.2d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%strided.vec = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 0, i32 2>		%strided.vec = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec53 = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 1, i32 3>		%strided.vec53 = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 1, i32 3>
%strided.vec55 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 0, i32 2>		%strided.vec55 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec56 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 1, i32 3>		%strided.vec56 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 1, i32 3>
%0 = fmul fast <2 x double> %strided.vec56, %strided.vec		%0 = fmul fast <2 x double> %strided.vec56, %strided.vec
%1 = fmul fast <2 x double> %strided.vec55, %strided.vec53		%1 = fmul fast <2 x double> %strided.vec55, %strided.vec53
[... 16 lines hidden ...]	entry:
%interleaved.vec = shufflevector <2 x double> %8, <2 x double> %13, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%interleaved.vec = shufflevector <2 x double> %8, <2 x double> %13, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
ret <4 x double> %interleaved.vec		ret <4 x double> %interleaved.vec
}		}

; a * b + conj(c) * d		; a * b + conj(c) * d
define <4 x double> @mul_conj_mull(<4 x double> %a, <4 x double> %b, <4 x double> %c, <4 x double> %d) {		define <4 x double> @mul_conj_mull(<4 x double> %a, <4 x double> %b, <4 x double> %c, <4 x double> %d) {
; CHECK-LABEL: mul_conj_mull:		; CHECK-LABEL: mul_conj_mull:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: zip2 v16.2d, v2.2d, v3.2d		; CHECK-NEXT: movi v16.2d, #0000000000000000
; CHECK-NEXT: zip2 v17.2d, v0.2d, v1.2d		; CHECK-NEXT: movi v17.2d, #0000000000000000
; CHECK-NEXT: zip1 v2.2d, v2.2d, v3.2d		; CHECK-NEXT: fcmla v16.2d, v2.2d, v0.2d, #0
; CHECK-NEXT: zip1 v0.2d, v0.2d, v1.2d		; CHECK-NEXT: fcmla v17.2d, v3.2d, v1.2d, #0
; CHECK-NEXT: fmul v3.2d, v16.2d, v17.2d		; CHECK-NEXT: fcmla v16.2d, v2.2d, v0.2d, #90
; CHECK-NEXT: fmul v1.2d, v2.2d, v17.2d		; CHECK-NEXT: fcmla v17.2d, v3.2d, v1.2d, #90
; CHECK-NEXT: zip1 v17.2d, v4.2d, v5.2d		; CHECK-NEXT: fcmla v16.2d, v6.2d, v4.2d, #0
; CHECK-NEXT: zip2 v4.2d, v4.2d, v5.2d		; CHECK-NEXT: fcmla v17.2d, v7.2d, v5.2d, #0
; CHECK-NEXT: fneg v3.2d, v3.2d		; CHECK-NEXT: fcmla v16.2d, v6.2d, v4.2d, #270
; CHECK-NEXT: zip1 v5.2d, v6.2d, v7.2d		; CHECK-NEXT: fcmla v17.2d, v7.2d, v5.2d, #270
; CHECK-NEXT: fmla v1.2d, v0.2d, v16.2d		; CHECK-NEXT: mov v0.16b, v16.16b
; CHECK-NEXT: fmla v3.2d, v0.2d, v2.2d		; CHECK-NEXT: mov v1.16b, v17.16b
; CHECK-NEXT: zip2 v0.2d, v6.2d, v7.2d
; CHECK-NEXT: fmls v1.2d, v4.2d, v5.2d
; CHECK-NEXT: fmla v3.2d, v17.2d, v5.2d
; CHECK-NEXT: fmla v1.2d, v17.2d, v0.2d
; CHECK-NEXT: fmla v3.2d, v4.2d, v0.2d
; CHECK-NEXT: zip1 v0.2d, v3.2d, v1.2d
; CHECK-NEXT: zip2 v1.2d, v3.2d, v1.2d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%strided.vec = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 0, i32 2>		%strided.vec = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec59 = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 1, i32 3>		%strided.vec59 = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 1, i32 3>
%strided.vec61 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 0, i32 2>		%strided.vec61 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec62 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 1, i32 3>		%strided.vec62 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 1, i32 3>
%0 = fmul fast <2 x double> %strided.vec62, %strided.vec		%0 = fmul fast <2 x double> %strided.vec62, %strided.vec
%1 = fmul fast <2 x double> %strided.vec61, %strided.vec59		%1 = fmul fast <2 x double> %strided.vec61, %strided.vec59
[... 70 lines hidden ...]

llvm/test/CodeGen/AArch64/complex-deinterleaving-add-mull-scalable-fast.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s --mattr=+sve -o - | FileCheck %s		; RUN: llc < %s --mattr=+sve -o - | FileCheck %s

target triple = "aarch64-arm-none-eabi"		target triple = "aarch64-arm-none-eabi"

; a * b + c		; a * b + c
define <vscale x 4 x double> @mull_add(<vscale x 4 x double> %a, <vscale x 4 x double> %b, <vscale x 4 x double> %c) {		define <vscale x 4 x double> @mull_add(<vscale x 4 x double> %a, <vscale x 4 x double> %b, <vscale x 4 x double> %c) {
; CHECK-LABEL: mull_add:		; CHECK-LABEL: mull_add:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: uzp2 z6.d, z4.d, z5.d
; CHECK-NEXT: uzp1 z7.d, z0.d, z1.d
; CHECK-NEXT: uzp2 z0.d, z0.d, z1.d
; CHECK-NEXT: uzp1 z1.d, z4.d, z5.d
; CHECK-NEXT: uzp1 z4.d, z2.d, z3.d
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: fmla z1.d, p0/m, z4.d, z7.d		; CHECK-NEXT: fcmla z4.d, p0/m, z0.d, z2.d, #0
; CHECK-NEXT: uzp2 z2.d, z2.d, z3.d		; CHECK-NEXT: fcmla z5.d, p0/m, z1.d, z3.d, #0
; CHECK-NEXT: movprfx z5, z6		; CHECK-NEXT: fcmla z4.d, p0/m, z0.d, z2.d, #90
; CHECK-NEXT: fmla z5.d, p0/m, z4.d, z0.d		; CHECK-NEXT: fcmla z5.d, p0/m, z1.d, z3.d, #90
; CHECK-NEXT: movprfx z3, z5		; CHECK-NEXT: mov z0.d, z4.d
; CHECK-NEXT: fmla z3.d, p0/m, z2.d, z7.d		; CHECK-NEXT: mov z1.d, z5.d
; CHECK-NEXT: fmls z1.d, p0/m, z2.d, z0.d
; CHECK-NEXT: zip1 z0.d, z1.d, z3.d
; CHECK-NEXT: zip2 z1.d, z1.d, z3.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%strided.vec = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %a)		%strided.vec = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %a)
%0 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 0		%0 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 0
%1 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 1		%1 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 1
%strided.vec29 = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %b)		%strided.vec29 = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %b)
%2 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec29, 0		%2 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec29, 0
%3 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec29, 1		%3 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec29, 1
[... 11 lines hidden ...]	entry:
%interleaved.vec = tail call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %12, <vscale x 2 x double> %13)		%interleaved.vec = tail call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %12, <vscale x 2 x double> %13)
ret <vscale x 4 x double> %interleaved.vec		ret <vscale x 4 x double> %interleaved.vec
}		}

; a * b + c * d		; a * b + c * d
define <vscale x 4 x double> @mul_add_mull(<vscale x 4 x double> %a, <vscale x 4 x double> %b, <vscale x 4 x double> %c, <vscale x 4 x double> %d) {		define <vscale x 4 x double> @mul_add_mull(<vscale x 4 x double> %a, <vscale x 4 x double> %b, <vscale x 4 x double> %c, <vscale x 4 x double> %d) {
; CHECK-LABEL: mul_add_mull:		; CHECK-LABEL: mul_add_mull:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: uzp1 z25.d, z0.d, z1.d		; CHECK-NEXT: mov z24.d, #0 // =0x0
; CHECK-NEXT: uzp2 z0.d, z0.d, z1.d
; CHECK-NEXT: uzp1 z1.d, z2.d, z3.d
; CHECK-NEXT: uzp2 z24.d, z2.d, z3.d
; CHECK-NEXT: fmul z2.d, z1.d, z0.d
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: fmla z2.d, p0/m, z24.d, z25.d		; CHECK-NEXT: mov z25.d, z24.d
; CHECK-NEXT: uzp2 z3.d, z4.d, z5.d		; CHECK-NEXT: fcmla z24.d, p0/m, z7.d, z5.d, #0
; CHECK-NEXT: uzp1 z26.d, z6.d, z7.d		; CHECK-NEXT: fcmla z25.d, p0/m, z6.d, z4.d, #0
; CHECK-NEXT: fmul z1.d, z1.d, z25.d		; CHECK-NEXT: fcmla z24.d, p0/m, z1.d, z3.d, #0
; CHECK-NEXT: fmul z0.d, z24.d, z0.d		; CHECK-NEXT: fcmla z25.d, p0/m, z0.d, z2.d, #0
; CHECK-NEXT: uzp1 z4.d, z4.d, z5.d		; CHECK-NEXT: fcmla z24.d, p0/m, z7.d, z5.d, #90
; CHECK-NEXT: uzp2 z5.d, z6.d, z7.d		; CHECK-NEXT: fcmla z25.d, p0/m, z6.d, z4.d, #90
; CHECK-NEXT: fmla z1.d, p0/m, z26.d, z4.d		; CHECK-NEXT: fcmla z24.d, p0/m, z1.d, z3.d, #90
; CHECK-NEXT: fmla z2.d, p0/m, z26.d, z3.d		; CHECK-NEXT: fcmla z25.d, p0/m, z0.d, z2.d, #90
; CHECK-NEXT: fmla z0.d, p0/m, z5.d, z3.d		; CHECK-NEXT: mov z1.d, z24.d
; CHECK-NEXT: fmla z2.d, p0/m, z5.d, z4.d		; CHECK-NEXT: mov z0.d, z25.d
; CHECK-NEXT: fsub z1.d, z1.d, z0.d
; CHECK-NEXT: zip1 z0.d, z1.d, z2.d
; CHECK-NEXT: zip2 z1.d, z1.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%strided.vec = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %a)		%strided.vec = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %a)
%0 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 0		%0 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 0
%1 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 1		%1 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 1
%strided.vec52 = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %b)		%strided.vec52 = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %b)
%2 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec52, 0		%2 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec52, 0
%3 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec52, 1		%3 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec52, 1
[... 20 lines hidden ...]	entry:
%interleaved.vec = tail call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %18, <vscale x 2 x double> %21)		%interleaved.vec = tail call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %18, <vscale x 2 x double> %21)
ret <vscale x 4 x double> %interleaved.vec		ret <vscale x 4 x double> %interleaved.vec
}		}

; a * b - c * d		; a * b - c * d
define <vscale x 4 x double> @mul_sub_mull(<vscale x 4 x double> %a, <vscale x 4 x double> %b, <vscale x 4 x double> %c, <vscale x 4 x double> %d) {		define <vscale x 4 x double> @mul_sub_mull(<vscale x 4 x double> %a, <vscale x 4 x double> %b, <vscale x 4 x double> %c, <vscale x 4 x double> %d) {
; CHECK-LABEL: mul_sub_mull:		; CHECK-LABEL: mul_sub_mull:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: uzp1 z25.d, z0.d, z1.d		; CHECK-NEXT: mov z24.d, #0 // =0x0
; CHECK-NEXT: uzp2 z0.d, z0.d, z1.d
; CHECK-NEXT: uzp1 z1.d, z2.d, z3.d
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: uzp2 z24.d, z2.d, z3.d		; CHECK-NEXT: mov z25.d, z24.d
; CHECK-NEXT: fmul z2.d, z1.d, z0.d		; CHECK-NEXT: fcmla z24.d, p0/m, z7.d, z5.d, #270
; CHECK-NEXT: fmul z1.d, z1.d, z25.d		; CHECK-NEXT: fcmla z25.d, p0/m, z6.d, z4.d, #270
; CHECK-NEXT: uzp2 z3.d, z4.d, z5.d		; CHECK-NEXT: fcmla z24.d, p0/m, z1.d, z3.d, #0
; CHECK-NEXT: uzp1 z4.d, z4.d, z5.d		; CHECK-NEXT: fcmla z25.d, p0/m, z0.d, z2.d, #0
; CHECK-NEXT: uzp1 z5.d, z6.d, z7.d		; CHECK-NEXT: fcmla z24.d, p0/m, z7.d, z5.d, #180
; CHECK-NEXT: uzp2 z6.d, z6.d, z7.d		; CHECK-NEXT: fcmla z25.d, p0/m, z6.d, z4.d, #180
; CHECK-NEXT: fmul z0.d, z24.d, z0.d		; CHECK-NEXT: fcmla z24.d, p0/m, z1.d, z3.d, #90
; CHECK-NEXT: fmla z1.d, p0/m, z6.d, z3.d		; CHECK-NEXT: fcmla z25.d, p0/m, z0.d, z2.d, #90
; CHECK-NEXT: fmul z3.d, z5.d, z3.d		; CHECK-NEXT: mov z1.d, z24.d
; CHECK-NEXT: fmla z0.d, p0/m, z5.d, z4.d		; CHECK-NEXT: mov z0.d, z25.d
; CHECK-NEXT: fmla z3.d, p0/m, z6.d, z4.d
; CHECK-NEXT: fmla z2.d, p0/m, z24.d, z25.d
; CHECK-NEXT: fsub z1.d, z1.d, z0.d
; CHECK-NEXT: fsub z2.d, z2.d, z3.d
; CHECK-NEXT: zip1 z0.d, z1.d, z2.d
; CHECK-NEXT: zip2 z1.d, z1.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%strided.vec = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %a)		%strided.vec = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %a)
%0 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 0		%0 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 0
%1 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 1		%1 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 1
%strided.vec54 = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %b)		%strided.vec54 = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %b)
%2 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec54, 0		%2 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec54, 0
%3 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec54, 1		%3 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec54, 1
[20 lines not shown]
%interleaved.vec = tail call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %16, <vscale x 2 x double> %21)		%interleaved.vec = tail call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %16, <vscale x 2 x double> %21)
ret <vscale x 4 x double> %interleaved.vec		ret <vscale x 4 x double> %interleaved.vec
}		}

; a * b + conj(c) * d		; a * b + conj(c) * d
define <vscale x 4 x double> @mul_conj_mull(<vscale x 4 x double> %a, <vscale x 4 x double> %b, <vscale x 4 x double> %c, <vscale x 4 x double> %d) {		define <vscale x 4 x double> @mul_conj_mull(<vscale x 4 x double> %a, <vscale x 4 x double> %b, <vscale x 4 x double> %c, <vscale x 4 x double> %d) {
; CHECK-LABEL: mul_conj_mull:		; CHECK-LABEL: mul_conj_mull:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: uzp2 z24.d, z2.d, z3.d		; CHECK-NEXT: mov z24.d, #0 // =0x0
; CHECK-NEXT: uzp1 z25.d, z0.d, z1.d
; CHECK-NEXT: uzp2 z0.d, z0.d, z1.d
; CHECK-NEXT: uzp1 z1.d, z2.d, z3.d
; CHECK-NEXT: fmul z2.d, z1.d, z0.d
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: fmul z0.d, z24.d, z0.d		; CHECK-NEXT: mov z25.d, z24.d
; CHECK-NEXT: fmla z2.d, p0/m, z24.d, z25.d		; CHECK-NEXT: fcmla z24.d, p0/m, z1.d, z3.d, #0
; CHECK-NEXT: uzp2 z3.d, z4.d, z5.d		; CHECK-NEXT: fcmla z25.d, p0/m, z0.d, z2.d, #0
; CHECK-NEXT: uzp1 z4.d, z4.d, z5.d		; CHECK-NEXT: fcmla z24.d, p0/m, z1.d, z3.d, #90
; CHECK-NEXT: uzp1 z5.d, z6.d, z7.d		; CHECK-NEXT: fcmla z25.d, p0/m, z0.d, z2.d, #90
; CHECK-NEXT: fnmls z0.d, p0/m, z1.d, z25.d		; CHECK-NEXT: fcmla z24.d, p0/m, z5.d, z7.d, #0
; CHECK-NEXT: fmla z0.d, p0/m, z5.d, z4.d		; CHECK-NEXT: fcmla z25.d, p0/m, z4.d, z6.d, #0
; CHECK-NEXT: movprfx z1, z2		; CHECK-NEXT: fcmla z24.d, p0/m, z5.d, z7.d, #270
; CHECK-NEXT: fmls z1.d, p0/m, z5.d, z3.d		; CHECK-NEXT: fcmla z25.d, p0/m, z4.d, z6.d, #270
; CHECK-NEXT: uzp2 z2.d, z6.d, z7.d		; CHECK-NEXT: mov z1.d, z24.d
; CHECK-NEXT: fmla z1.d, p0/m, z2.d, z4.d		; CHECK-NEXT: mov z0.d, z25.d
; CHECK-NEXT: fmad z3.d, p0/m, z2.d, z0.d
; CHECK-NEXT: zip1 z0.d, z3.d, z1.d
; CHECK-NEXT: zip2 z1.d, z3.d, z1.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%strided.vec = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %a)		%strided.vec = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %a)
%0 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 0		%0 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 0
%1 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 1		%1 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec, 1
%strided.vec60 = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %b)		%strided.vec60 = tail call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %b)
%2 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec60, 0		%2 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec60, 0
%3 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec60, 1		%3 = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } %strided.vec60, 1
[81 lines not shown]
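The new CHECK lines above replace the uzp/fmul/fmla/zip sequences with pairs of fcmla instructions at different rotations. As a rough aid for reading them, here is a minimal single-lane model in Python, assuming the usual Arm FCMLA rotation semantics. The names `fcmla`, `mul_conj_mull` and `mul_sub_mull` are illustrative helpers written for this sketch and are not LLVM code. The rotation order follows the right-hand column; that `mul_sub_mull` computes a * b - c * d is inferred from that sequence, since the test's own comment is not visible in this excerpt.

```python
def fcmla(acc, n, m, rot):
    """Model one complex lane of FCMLA: accumulate a partial product of n and m.

    acc, n and m are (real, imag) tuples; rot is 0, 90, 180 or 270.
    """
    acc_re, acc_im = acc
    n_re, n_im = n
    m_re, m_im = m
    if rot == 0:
        return (acc_re + n_re * m_re, acc_im + n_re * m_im)
    if rot == 90:
        return (acc_re - n_im * m_im, acc_im + n_im * m_re)
    if rot == 180:
        return (acc_re - n_re * m_re, acc_im - n_re * m_im)
    if rot == 270:
        return (acc_re + n_im * m_im, acc_im - n_im * m_re)
    raise ValueError("rotation must be 0, 90, 180 or 270")


def mul_conj_mull(a, b, c, d):
    """a * b + conj(c) * d: rotations 0/90 on (a, b), then 0/270 on (c, d)."""
    acc = (0.0, 0.0)
    acc = fcmla(acc, a, b, 0)
    acc = fcmla(acc, a, b, 90)
    acc = fcmla(acc, c, d, 0)
    acc = fcmla(acc, c, d, 270)
    return acc


def mul_sub_mull(a, b, c, d):
    """a * b - c * d: the (d, c) pair uses rotations 270 and 180."""
    acc = (0.0, 0.0)
    acc = fcmla(acc, d, c, 270)
    acc = fcmla(acc, a, b, 0)
    acc = fcmla(acc, d, c, 180)
    acc = fcmla(acc, a, b, 90)
    return acc
```

Rotations 0 and 90 together give a full complex product, 0 and 270 give the product with the first operand conjugated, and 180 and 270 give the negated product, so a subtraction can be folded into the accumulation instead of needing a separate fsub.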

llvm/test/CodeGen/AArch64/complex-deinterleaving-mixed-cases.ll

[478 lines not shown]
ret <4 x float> %interleaved.vec		ret <4 x float> %interleaved.vec
}		}

; Expected to transform		; Expected to transform
define <4 x float> @mul_negequal(<4 x float> %a, <4 x float> %b) {		define <4 x float> @mul_negequal(<4 x float> %a, <4 x float> %b) {
; CHECK-LABEL: mul_negequal:		; CHECK-LABEL: mul_negequal:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: movi v2.2d, #0000000000000000		; CHECK-NEXT: movi v2.2d, #0000000000000000
; CHECK-NEXT: fcmla v2.4s, v0.4s, v1.4s, #0		; CHECK-NEXT: fcmla v2.4s, v0.4s, v1.4s, #180
; CHECK-NEXT: fcmla v2.4s, v0.4s, v1.4s, #90		; CHECK-NEXT: fcmla v2.4s, v0.4s, v1.4s, #270
; CHECK-NEXT: fneg v0.4s, v2.4s		; CHECK-NEXT: mov v0.16b, v2.16b
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%strided.vec = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 0, i32 2>		%strided.vec = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 0, i32 2>
%a.imag = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 1, i32 3>		%a.imag = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 1, i32 3>
%b.real = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 0, i32 2>		%b.real = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 0, i32 2>
%b.imag = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 1, i32 3>		%b.imag = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 1, i32 3>
%0 = fmul fast <2 x float> %b.imag, %strided.vec		%0 = fmul fast <2 x float> %b.imag, %strided.vec
%1 = fmul fast <2 x float> %b.real, %a.imag		%1 = fmul fast <2 x float> %b.real, %a.imag
[9 lines not shown]
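In mul_negequal the trailing fneg disappears because the rotations change from #0/#90 to #180/#270. A small Python sketch of why the two forms agree, again assuming the usual Arm FCMLA rotation semantics (the helper names are hypothetical and exist only for this illustration):

```python
def fcmla(acc, n, m, rot):
    """One complex lane of FCMLA on (real, imag) tuples, rot in {0, 90, 180, 270}."""
    n_re, n_im = n
    m_re, m_im = m
    # Partial product contributed by each rotation.
    delta_re, delta_im = {
        0: (n_re * m_re, n_re * m_im),
        90: (-n_im * m_im, n_im * m_re),
        180: (-n_re * m_re, -n_re * m_im),
        270: (n_im * m_im, -n_im * m_re),
    }[rot]
    return (acc[0] + delta_re, acc[1] + delta_im)


def mul_then_negate(a, b):
    """Old expectation: rotations 0 and 90, followed by an explicit negation."""
    acc = fcmla((0.0, 0.0), a, b, 0)
    acc = fcmla(acc, a, b, 90)
    return (-acc[0], -acc[1])


def negated_mul(a, b):
    """New expectation: rotations 180 and 270 produce -(a * b) directly."""
    acc = fcmla((0.0, 0.0), a, b, 180)
    return fcmla(acc, a, b, 270)
```

Both functions return the same value for any input, which is why the new output can drop one instruction per complex multiply.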

llvm/test/CodeGen/AArch64/complex-deinterleaving-multiuses.ll

[293 lines not shown]
	}			}

	; Expected to transform. Shows that composite common subexpression is not generated twice.			; Expected to transform. Shows that composite common subexpression is not generated twice.
	; u[i] = a[i] * b[i] - (c[i] * d[i] + g[i] * h[i]);			; u[i] = a[i] * b[i] - (c[i] * d[i] + g[i] * h[i]);
	; v[i] = e[i] * f[i] + (c[i] * d[i] + g[i] * h[i]);			; v[i] = e[i] * f[i] + (c[i] * d[i] + g[i] * h[i]);
	define void @mul_add_common_mul_add_mul(<4 x double> %a, <4 x double> %b, <4 x double> %c, <4 x double> %d, <4 x double> %e, <4 x double> %f, <4 x double> %g, <4 x double> %h, ptr %p1, ptr %p2) {			define void @mul_add_common_mul_add_mul(<4 x double> %a, <4 x double> %b, <4 x double> %c, <4 x double> %d, <4 x double> %e, <4 x double> %f, <4 x double> %g, <4 x double> %h, ptr %p1, ptr %p2) {
	; CHECK-LABEL: mul_add_common_mul_add_mul:			; CHECK-LABEL: mul_add_common_mul_add_mul:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ldp q17, q16, [sp, #96]			; CHECK-NEXT: ldp q17, q16, [sp, #64]
	; CHECK-NEXT: zip2 v20.2d, v4.2d, v5.2d			; CHECK-NEXT: movi v20.2d, #0000000000000000
	; CHECK-NEXT: zip2 v21.2d, v6.2d, v7.2d			; CHECK-NEXT: movi v21.2d, #0000000000000000
	; CHECK-NEXT: zip1 v4.2d, v4.2d, v5.2d			; CHECK-NEXT: movi v24.2d, #0000000000000000
	; CHECK-NEXT: zip1 v5.2d, v6.2d, v7.2d			; CHECK-NEXT: movi v25.2d, #0000000000000000
	; CHECK-NEXT: ldp q19, q18, [sp, #64]			; CHECK-NEXT: ldp q19, q18, [sp, #96]
	; CHECK-NEXT: zip2 v23.2d, v17.2d, v16.2d			; CHECK-NEXT: fcmla v24.2d, v2.2d, v0.2d, #0
	; CHECK-NEXT: fmul v6.2d, v21.2d, v20.2d			; CHECK-NEXT: fcmla v25.2d, v3.2d, v1.2d, #0
	; CHECK-NEXT: zip1 v16.2d, v17.2d, v16.2d			; CHECK-NEXT: fcmla v20.2d, v19.2d, v17.2d, #0
	; CHECK-NEXT: zip2 v22.2d, v19.2d, v18.2d			; CHECK-NEXT: fcmla v24.2d, v2.2d, v0.2d, #90
	; CHECK-NEXT: zip1 v18.2d, v19.2d, v18.2d			; CHECK-NEXT: fcmla v21.2d, v18.2d, v16.2d, #0
	; CHECK-NEXT: fneg v6.2d, v6.2d			; CHECK-NEXT: ldp q23, q22, [sp, #32]
	; CHECK-NEXT: fmul v20.2d, v5.2d, v20.2d			; CHECK-NEXT: fcmla v20.2d, v19.2d, v17.2d, #90
	; CHECK-NEXT: fmul v7.2d, v22.2d, v23.2d			; CHECK-NEXT: fcmla v25.2d, v3.2d, v1.2d, #90
	; CHECK-NEXT: fmla v6.2d, v4.2d, v5.2d			; CHECK-NEXT: fcmla v21.2d, v18.2d, v16.2d, #90
	; CHECK-NEXT: zip2 v5.2d, v2.2d, v3.2d			; CHECK-NEXT: fcmla v20.2d, v6.2d, v4.2d, #0
	; CHECK-NEXT: fneg v7.2d, v7.2d			; CHECK-NEXT: ldp q1, q0, [sp]
	; CHECK-NEXT: zip1 v2.2d, v2.2d, v3.2d			; CHECK-NEXT: fcmla v21.2d, v7.2d, v5.2d, #0
	; CHECK-NEXT: fmla v7.2d, v18.2d, v16.2d			; CHECK-NEXT: fcmla v20.2d, v6.2d, v4.2d, #90
	; CHECK-NEXT: fadd v19.2d, v7.2d, v6.2d			; CHECK-NEXT: fcmla v21.2d, v7.2d, v5.2d, #90
	; CHECK-NEXT: fmla v20.2d, v4.2d, v21.2d			; CHECK-NEXT: fsub v2.2d, v24.2d, v20.2d
	; CHECK-NEXT: zip2 v4.2d, v0.2d, v1.2d			; CHECK-NEXT: fcmla v20.2d, v1.2d, v23.2d, #0
	; CHECK-NEXT: ldp q7, q6, [sp]			; CHECK-NEXT: fsub v3.2d, v25.2d, v21.2d
	; CHECK-NEXT: zip1 v0.2d, v0.2d, v1.2d			; CHECK-NEXT: fcmla v21.2d, v0.2d, v22.2d, #0
	; CHECK-NEXT: fmla v20.2d, v18.2d, v23.2d			; CHECK-NEXT: fcmla v20.2d, v1.2d, v23.2d, #90
	; CHECK-NEXT: fmul v1.2d, v2.2d, v4.2d			; CHECK-NEXT: stp q2, q3, [x0]
	; CHECK-NEXT: fmla v20.2d, v22.2d, v16.2d			; CHECK-NEXT: fcmla v21.2d, v0.2d, v22.2d, #90
	; CHECK-NEXT: mov v3.16b, v19.16b			; CHECK-NEXT: stp q20, q21, [x1]
	; CHECK-NEXT: fmla v1.2d, v0.2d, v5.2d
	; CHECK-NEXT: fmla v3.2d, v4.2d, v5.2d
	; CHECK-NEXT: ldp q16, q4, [sp, #32]
	; CHECK-NEXT: fneg v17.2d, v3.2d
	; CHECK-NEXT: zip1 v3.2d, v7.2d, v6.2d
	; CHECK-NEXT: zip2 v6.2d, v7.2d, v6.2d
	; CHECK-NEXT: zip1 v5.2d, v16.2d, v4.2d
	; CHECK-NEXT: fmla v17.2d, v0.2d, v2.2d
	; CHECK-NEXT: fsub v18.2d, v1.2d, v20.2d
	; CHECK-NEXT: zip2 v0.2d, v16.2d, v4.2d
	; CHECK-NEXT: fmla v19.2d, v3.2d, v5.2d
	; CHECK-NEXT: st2 { v17.2d, v18.2d }, [x0]
	; CHECK-NEXT: fmls v19.2d, v6.2d, v0.2d
	; CHECK-NEXT: fmla v20.2d, v6.2d, v5.2d
	; CHECK-NEXT: fmla v20.2d, v3.2d, v0.2d
	; CHECK-NEXT: st2 { v19.2d, v20.2d }, [x1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%strided.vec = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 0, i32 2>			%strided.vec = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 0, i32 2>
	%strided.vec123 = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 1, i32 3>			%strided.vec123 = shufflevector <4 x double> %a, <4 x double> poison, <2 x i32> <i32 1, i32 3>
	%strided.vec125 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 0, i32 2>			%strided.vec125 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 0, i32 2>
	%strided.vec126 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 1, i32 3>			%strided.vec126 = shufflevector <4 x double> %b, <4 x double> poison, <2 x i32> <i32 1, i32 3>
	%0 = fmul fast <2 x double> %strided.vec125, %strided.vec			%0 = fmul fast <2 x double> %strided.vec125, %strided.vec
	%1 = fmul fast <2 x double> %strided.vec126, %strided.vec			%1 = fmul fast <2 x double> %strided.vec126, %strided.vec
[46 lines not shown]
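The test comment for mul_add_common_mul_add_mul says the shared term c * d + g * h should be built only once. In the new CHECK lines this appears as one accumulator pair (v20/v21) that is first subtracted from a * b and then keeps accumulating e * f. A minimal Python sketch of that dataflow, using Python's built-in complex type and a hypothetical function name:

```python
def mul_add_common_mul_add_mul(a, b, c, d, e, f, g, h):
    """Compute u = a*b - (c*d + g*h) and v = e*f + (c*d + g*h).

    The shared term is formed once and reused for both results.
    """
    common = c * d + g * h   # built once, like the v20/v21 accumulators
    u = a * b - common       # corresponds to the fsub in the new output
    v = common + e * f       # the same accumulator keeps accumulating e * f
    return u, v
```

Reusing the accumulator for v is what lets the new expected output avoid a second set of multiplies for c * d and g * h.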

llvm/test/CodeGen/AArch64/complex-deinterleaving-uniform-cases.ll

[109 lines not shown]
%interleaved.vec = shufflevector <2 x float> %0, <2 x float> %1, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%interleaved.vec = shufflevector <2 x float> %0, <2 x float> %1, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
ret <4 x float> %interleaved.vec		ret <4 x float> %interleaved.vec
}		}

; Expected to not transform, fadd commutativity is not yet implemented		; Expected to not transform, fadd commutativity is not yet implemented
define <4 x float> @simple_add_270_false(<4 x float> %a, <4 x float> %b) {		define <4 x float> @simple_add_270_false(<4 x float> %a, <4 x float> %b) {
; CHECK-LABEL: simple_add_270_false:		; CHECK-LABEL: simple_add_270_false:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ext v2.16b, v0.16b, v0.16b, #8		; CHECK-NEXT: fcadd v0.4s, v0.4s, v1.4s, #270
; CHECK-NEXT: ext v3.16b, v1.16b, v1.16b, #8
; CHECK-NEXT: zip1 v4.2s, v0.2s, v2.2s
; CHECK-NEXT: zip2 v0.2s, v0.2s, v2.2s
; CHECK-NEXT: zip1 v2.2s, v1.2s, v3.2s
; CHECK-NEXT: zip2 v1.2s, v1.2s, v3.2s
; CHECK-NEXT: fadd v1.2s, v1.2s, v4.2s
; CHECK-NEXT: fsub v0.2s, v0.2s, v2.2s
; CHECK-NEXT: zip1 v0.4s, v1.4s, v0.4s
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%strided.vec = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 0, i32 2>		%strided.vec = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 0, i32 2>
%strided.vec17 = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 1, i32 3>		%strided.vec17 = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 1, i32 3>
%strided.vec19 = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 0, i32 2>		%strided.vec19 = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 0, i32 2>
%strided.vec20 = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 1, i32 3>		%strided.vec20 = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 1, i32 3>
%0 = fadd fast <2 x float> %strided.vec20, %strided.vec		%0 = fadd fast <2 x float> %strided.vec20, %strided.vec
%1 = fsub fast <2 x float> %strided.vec17, %strided.vec19		%1 = fsub fast <2 x float> %strided.vec17, %strided.vec19
[185 lines not shown]
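Although the test comment still reads "Expected to not transform", the right-hand column now expects a single fcadd with rotation #270 for simple_add_270_false. The IR adds b.imag + a.real with the fadd operands in the opposite order from the usual pattern. A short Python sketch, assuming the usual Arm FCADD semantics and assuming %0 and %1 are interleaved as the real and imaginary lanes (that part of the IR is not shown here), checks that the swapped form still matches rotation 270. The function names are illustrative only.

```python
def fcadd(a, b, rot):
    """One complex lane of FCADD: a plus b rotated by 90 or 270 degrees."""
    a_re, a_im = a
    b_re, b_im = b
    if rot == 90:    # a + i*b
        return (a_re - b_im, a_im + b_re)
    if rot == 270:   # a - i*b
        return (a_re + b_im, a_im - b_re)
    raise ValueError("rotation must be 90 or 270")


def simple_add_270_ir(a, b):
    """Mirror the IR: %0 = b.imag + a.real (operands swapped), %1 = a.imag - b.real."""
    a_re, a_im = a
    b_re, b_im = b
    real = b_im + a_re
    imag = a_im - b_re
    return (real, imag)
```

Because floating-point addition is commutative, the swapped fadd operands do not change the result, so the pattern can be matched once the pass accepts either operand order.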

llvm/test/CodeGen/Thumb2/mve-complex-deinterleaving-mixed-cases.ll

[547 lines not shown]
	; Expected to transform			; Expected to transform
	define <4 x float> @mul_negequal(<4 x float> %a, <4 x float> %b) {			define <4 x float> @mul_negequal(<4 x float> %a, <4 x float> %b) {
	; CHECK-LABEL: mul_negequal:			; CHECK-LABEL: mul_negequal:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vmov d0, r0, r1			; CHECK-NEXT: vmov d0, r0, r1
	; CHECK-NEXT: mov r0, sp			; CHECK-NEXT: mov r0, sp
	; CHECK-NEXT: vldrw.u32 q1, [r0]			; CHECK-NEXT: vldrw.u32 q1, [r0]
	; CHECK-NEXT: vmov d1, r2, r3			; CHECK-NEXT: vmov d1, r2, r3
	; CHECK-NEXT: vcmul.f32 q2, q0, q1, #0			; CHECK-NEXT: vcmul.f32 q2, q0, q1, #180
	; CHECK-NEXT: vcmla.f32 q2, q0, q1, #90			; CHECK-NEXT: vcmla.f32 q2, q0, q1, #270
	; CHECK-NEXT: vneg.f32 q0, q2			; CHECK-NEXT: vmov r0, r1, d4
	; CHECK-NEXT: vmov r0, r1, d0			; CHECK-NEXT: vmov r2, r3, d5
	; CHECK-NEXT: vmov r2, r3, d1
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%strided.vec = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 0, i32 2>			%strided.vec = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 0, i32 2>
	%a.imag = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 1, i32 3>			%a.imag = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 1, i32 3>
	%b.real = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 0, i32 2>			%b.real = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 0, i32 2>
	%b.imag = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 1, i32 3>			%b.imag = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 1, i32 3>
	%0 = fmul fast <2 x float> %b.imag, %strided.vec			%0 = fmul fast <2 x float> %b.imag, %strided.vec
	%1 = fmul fast <2 x float> %b.real, %a.imag			%1 = fmul fast <2 x float> %b.real, %a.imag
[9 lines not shown]

llvm/test/CodeGen/Thumb2/mve-complex-deinterleaving-uniform-cases.ll

[112 lines not shown]
%interleaved.vec = shufflevector <2 x float> %0, <2 x float> %1, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%interleaved.vec = shufflevector <2 x float> %0, <2 x float> %1, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
ret <4 x float> %interleaved.vec		ret <4 x float> %interleaved.vec
}		}

; Expected to not transform, fadd commutativity is not yet implemented		; Expected to not transform, fadd commutativity is not yet implemented
define arm_aapcs_vfpcc <4 x float> @simple_add_270_false(<4 x float> %a, <4 x float> %b) {		define arm_aapcs_vfpcc <4 x float> @simple_add_270_false(<4 x float> %a, <4 x float> %b) {
; CHECK-LABEL: simple_add_270_false:		; CHECK-LABEL: simple_add_270_false:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmov.f32 s8, s4		; CHECK-NEXT: vcadd.f32 q2, q0, q1, #270
; CHECK-NEXT: vmov.f32 s12, s1		; CHECK-NEXT: vmov q0, q2
; CHECK-NEXT: vmov.f32 s4, s5
; CHECK-NEXT: vmov.f32 s9, s6
; CHECK-NEXT: vmov.f32 s13, s3
; CHECK-NEXT: vmov.f32 s1, s2
; CHECK-NEXT: vsub.f32 q2, q3, q2
; CHECK-NEXT: vmov.f32 s5, s7
; CHECK-NEXT: vadd.f32 q1, q1, q0
; CHECK-NEXT: vmov.f32 s1, s8
; CHECK-NEXT: vmov.f32 s0, s4
; CHECK-NEXT: vmov.f32 s2, s5
; CHECK-NEXT: vmov.f32 s3, s9
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%strided.vec = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 0, i32 2>		%strided.vec = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 0, i32 2>
%strided.vec17 = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 1, i32 3>		%strided.vec17 = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> <i32 1, i32 3>
%strided.vec19 = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 0, i32 2>		%strided.vec19 = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 0, i32 2>
%strided.vec20 = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 1, i32 3>		%strided.vec20 = shufflevector <4 x float> %b, <4 x float> poison, <2 x i32> <i32 1, i32 3>
%0 = fadd fast <2 x float> %strided.vec20, %strided.vec		%0 = fadd fast <2 x float> %strided.vec20, %strided.vec
%1 = fsub fast <2 x float> %strided.vec17, %strided.vec19		%1 = fsub fast <2 x float> %strided.vec17, %strided.vec19
[175 lines not shown]