This is an archive of the discontinued LLVM Phabricator instance.

Differential D128642

[AArch64][SVE] Use SVE for VLS fcopysign for wide vectors
ClosedPublic

Authored by DavidTruby on Jun 27 2022, 7:02 AM.

Details

Reviewers

efriedma
paulwalker-arm
peterwaller-arm
bsmith
c-rhodes
dtemirbulatov
MattDevereau

Commits

rGb1b9c39629b5: [AArch64][SVE] Use SVE for VLS fcopysign for wide vectors

Summary

Currently fcopysign for VLS vectors lowers through NEON even when the
vector width is wider than a NEON vector, causing bad codegen as the
vectors are split. This patch causes SVE to be used for these vectors
instead, giving much better codegen on wide VLS vectors.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,120 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
1,600 ms	x64 debian > LLVM.CodeGen/AArch64::sve-fcopysign.ll
1,910 ms	x64 debian > LLVM.CodeGen/AArch64::sve-fixed-length-fcopysign.ll
1,450 ms	x64 debian > LLVM.CodeGen/AArch64::sve2-fcopysign.ll
1,660 ms	x64 debian > LLVM.CodeGen/AArch64::sve2-fixed-length-fcopysign.ll

Event Timeline

DavidTruby created this revision. Jun 27 2022, 7:02 AM

Herald added a reviewer: efriedma. Jun 27 2022, 7:02 AM

Herald added a project: Restricted Project.

Herald added subscribers: ctetreau, psnobl, hiraditya and 2 others.

DavidTruby requested review of this revision. Jun 27 2022, 7:02 AM

Herald added a project: Restricted Project. Jun 27 2022, 7:02 AM

Herald added a subscriber: llvm-commits.

DavidTruby added reviewers: paulwalker-arm, peterwaller-arm, bsmith, c-rhodes, dtemirbulatov, MattDevereau. Jun 27 2022, 7:03 AM

FYI, if I add -mattr=+sve2 to your test arguments, I get:

LLVM ERROR: Cannot select: t17: v16i16 = AArch64ISD::BSP t43, t35, t32

Harbormaster completed remote builds in B172183: Diff 440205. Jun 27 2022, 7:52 AM

Fix expansion for VLS on SVE2

Harbormaster completed remote builds in B172248: Diff 440295. Jun 27 2022, 11:57 AM

When checking the output for SVE2 I see no difference, which means we're missing out on the BSL optimisation we get for scalable vectors. I think this is because you're handling the fixed->scalable lowering too late. I think you really need to edit LowerFCOPYSIGN to first convert the fixed length ISD::FCOPYSIGN to a scalable one, then let the existing scalable vector code decide how best to lower it.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1593	Rather than have this dangling there's a large ordered/sorted block further down.
19426	I think you mean `VT.isScalableVector()` here. However... Given this bug fix it makes me wonder if the following code was ever exercised before this patch? Given my SVE2 comment, I think we can in fact keep the original code and just remove the `fixedSVEVectorVT` code?
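
For readers unfamiliar with the `AArch64ISD::BSP` node named in the error above and the BSL optimisation mentioned in the review comment, the following is a minimal sketch of a bitwise select applied to one `f32` lane. It is a scalar model written for illustration, not code from the patch: the function names are hypothetical, and the mask constant (every bit except the sign bit) is an assumption about how the select is driven.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

// Bitwise select: take bits from A where Mask is 1 and from B where Mask is 0.
// This is the AND / NOT / OR form that a target without a select instruction
// would have to emit.
inline uint32_t bitwiseSelect(uint32_t Mask, uint32_t A, uint32_t B) {
  return (Mask & A) | (~Mask & B);
}

// copysign for one f32 lane: magnitude bits from Mag, sign bit from Sgn.
inline float copySignViaSelect(float Mag, float Sgn) {
  uint32_t MagBits, SgnBits;
  std::memcpy(&MagBits, &Mag, sizeof(MagBits));
  std::memcpy(&SgnBits, &Sgn, sizeof(SgnBits));
  // Assumption: the mask has every bit set except the sign bit.
  const uint32_t NotSignMask = 0x7FFFFFFFu;
  uint32_t ResBits = bitwiseSelect(NotSignMask, MagBits, SgnBits);
  float Res;
  std::memcpy(&Res, &ResBits, sizeof(Res));
  return Res;
}

// Hypothetical usage check for the two helpers above.
inline void checkCopySignExamples() {
  assert(copySignViaSelect(1.5f, -2.0f) == -1.5f);
  assert(copySignViaSelect(-3.0f, 4.0f) == 3.0f);
}
```

A single select instruction does the work of the three bitwise operations, which is why missing it on SVE2 is described as a lost optimisation.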

Matt added a subscriber: Matt. Jun 28 2022, 2:03 PM

Rework patch to use VLA lowering for the VLS types.

Harbormaster completed remote builds in B172742: Diff 440991. Jun 29 2022, 8:09 AM

paulwalker-arm added inline comments. Jun 30 2022, 9:47 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7738–7745	This doesn't look safe with respect to the extend/rounding code just below. When faced with differing types, the results of both `convertToScalableVector` calls will be types of the same size. However, their element counts will be different. For example, take the case `fcopysign v8f64, v8f32`: this will result in `In1 = nxv2f64` and `In2 = nxv4f32`, which I doubt the remaining logic will handle properly. The most likely effect is a getNode assert firing for invalid operands. My guess is that you're not seeing this because `In1` and `In2` always have the same type, and indeed I couldn't immediately see a way to exercise this logic. I think this means your "mixtype" tests are likely exercising nothing new and are redundant. This is likely also true for your original patch, when you added the initial scalable vector support. If they are not exercising this code, as I suspect, then you either need to rewrite them or just remove them if there's no actual route to test this logic. Personally I think the safest route is to simply rewrite the fixed length fcopysign into a scalable vector one after any necessary extending/rounding of the input has taken place. For what it's worth, I also think the use of FP_EXTEND/FP_ROUND is not the most efficient way to get the sign bits to align, but that can be changed later.
19404–19406	Isn't this original code now fine and you instead just need to remove the following
	// Don't expand for NEON
	if (VT.isFixedLengthVector())
	  return SDValue();
block because that is covered by the `!VT.isScalableVector()` check?
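
The element-count mismatch described in the 7738–7745 comment can be sketched with a small model. This is only an illustration under stated assumptions: `FixedVT`, `ScalableVT` and `containerFor` are hypothetical stand-ins, and the container is assumed to be a packed type whose minimum element count is 128 bits divided by the element width, which matches the `nxv2f64` and `nxv4f32` types named in the comment.

```cpp
#include <cassert>

// Hypothetical stand-ins for fixed-length and scalable vector types.
struct FixedVT {
  unsigned NumElts;
  unsigned EltBits;
};
struct ScalableVT {
  unsigned MinElts; // element count per 128-bit granule
  unsigned EltBits;
};

// Assumption: the container is the packed scalable type for the element width.
inline ScalableVT containerFor(FixedVT VT) {
  const unsigned GranuleBits = 128;
  return ScalableVT{GranuleBits / VT.EltBits, VT.EltBits};
}

inline bool sameType(ScalableVT A, ScalableVT B) {
  return A.MinElts == B.MinElts && A.EltBits == B.EltBits;
}

// Converting before extending gives mismatched operand types.
inline bool convertFirstMatches() {
  ScalableVT In1 = containerFor(FixedVT{8, 64}); // v8f64 -> nxv2f64
  ScalableVT In2 = containerFor(FixedVT{8, 32}); // v8f32 -> nxv4f32
  return sameType(In1, In2);
}

// Extending In2 to the element width of In1 first gives matching types.
inline bool extendFirstMatches() {
  FixedVT In2Extended{8, 64}; // v8f32 extended to v8f64
  return sameType(containerFor(FixedVT{8, 64}), containerFor(In2Extended));
}
```

In this model only the extend-then-convert order yields two operands of the same type, which is the ordering the reviewer recommends.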

Move VLS handling after ROUND/EXTEND

DavidTruby added inline comments. Jul 4 2022, 3:52 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7738–7745	I believe I've corrected this now; I think you're right that the inputs will always be the same type anyway though. I agree that it is safer to leave the handling in just in case that does get triggered. I think it's better to leave the mixed type tests in as is: if something changes in future and the types coming into this function can differ, we want to make sure we don't regress in that case.

Looking generally good but I see some possible minor improvements/cleanup.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7754	Nit: Does `isFixedSVE` want to move down with the use?
7824	Is this line necessary or could it be pushed up? At a glance it appears it should already be an integer VT derived from VT. Same question for the VT assignment.

This revision is now accepted and ready to land. Jul 4 2022, 4:14 AM

Harbormaster completed remote builds in B173535: Diff 442061. Jul 4 2022, 4:49 AM

DavidTruby added inline comments. Jul 4 2022, 5:24 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7824	From 7593-4: VT and IntVT will be scalable containers for the fixed length vector types. Here we need to get the original VTs back.

Requesting changes to deal with the mixed-type combine/tests, since we have found a case where the types can be different.

This revision now requires changes to proceed. Jul 18 2022, 5:50 AM

Add flag to test FCOPYSIGN nodes with differing argument types.

This patch now depends on D130370 as a result.

Herald added a subscriber: ecnelises. Jul 28 2022, 5:40 AM

Harbormaster completed remote builds in B178059: Diff 448315. Jul 28 2022, 6:24 AM

DavidTruby added a parent revision: D130370: [llvm] Always use TargetConstant for FP_ROUND ISD Nodes. Jul 28 2022, 7:33 AM

paulwalker-arm added inline comments. Jul 28 2022, 4:30 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15405	What about `return EnableVectorFcopysignExtendRound;`?
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3597	By this point we know the result type is legal because results are legalised before operands. What's important here is the result type remains legal after splitting the operands. Given the result and first operands have the same type this means ensuring the types of `LHSLo` and `LHSHi` are legal after splitting. There's a function `GetSplitDestVTs` which returns the types expected from splitting. I mention this because I think it's better to query the expected types are legal before performing the actual splitting.
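
The DAGCombiner.cpp suggestion at line 15405 is a pure control-flow simplification. The sketch below models both shapes of the check with plain booleans and confirms they agree for every input; the function names are hypothetical, and `EnableFlag` stands in for the command-line option.

```cpp
#include <cassert>

// Shape used in the diff: reject vectors unless the flag is set.
inline bool canCombineOriginal(bool IsVector, bool EnableFlag) {
  if (IsVector && !EnableFlag)
    return false;
  return true;
}

// Shape suggested in review: for vectors, the flag is the answer.
inline bool canCombineSuggested(bool IsVector, bool EnableFlag) {
  if (IsVector)
    return EnableFlag;
  return true;
}

// Exhaustively compare the two forms over all four input combinations.
inline bool formsAgree() {
  for (int V = 0; V < 2; ++V)
    for (int F = 0; F < 2; ++F)
      if (canCombineOriginal(V != 0, F != 0) !=
          canCombineSuggested(V != 0, F != 0))
        return false;
  return true;
}
```

The suggested form only reads more directly; it does not change behaviour, provided no other check follows the vector test before the final `return true`.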

DavidTruby added inline comments. Jul 28 2022, 4:36 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3597	Ah ok, I think I was considering this wrong. I thought that the result type of the concat (which is the result type of the original FCOPYSIGN) needed to be legal for us to do the transform. If that's already legal, is there a problem? Is there a case where splitting an already legal vector in two would make a vector illegal? (Genuine question, I'm not sure when this would pop up.) Or do we need RHSLo to be legal?

paulwalker-arm added inline comments. Jul 28 2022, 4:51 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3597	You can have multiple legal types for the same vector element type. For NEON `v4f32` and `v2f32` are legal. So it is possible for the result type to be legal and yet still be legal after splitting. Likewise `v1f32` is not legal for NEON and so it is possible to enter with a legal type that would become illegal when split. For the former case we can split the operation in two as you've done. For the latter we're better reverting to the original code path of calling `UnrollVector`. So generally what you've done is fine, it is just you're checking the wrong type (i.e. N's result type rather than the expected result type of the new `FCOPYSIGN` operations). Plus my comment that you probably want to use `GetSplitDestVTs` so you only call `SplitVector` for the cases that are safe.
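
The decision described in this comment can be sketched as a toy model. It is not the legalizer code: `VecVT`, `isLegalNEONFloat32` and `chooseAction` are hypothetical names, and the legality rule encodes only what the comment states for NEON `f32` vectors (`v2f32` and `v4f32` legal, `v1f32` not).

```cpp
#include <cassert>

// Hypothetical stand-in for a fixed-length vector type.
struct VecVT {
  unsigned NumElts;
  unsigned EltBits;
};

// Assumption taken from the comment: v2f32 and v4f32 are legal, v1f32 is not.
inline bool isLegalNEONFloat32(VecVT VT) {
  return VT.EltBits == 32 && (VT.NumElts == 2 || VT.NumElts == 4);
}

enum class SplitAction { SplitInTwo, Unroll };

// Check the type the halves WOULD have before doing any splitting, rather
// than the (already legal) result type of the original node.
inline SplitAction chooseAction(VecVT ResultVT) {
  assert(ResultVT.NumElts % 2 == 0 && "sketch expects an even element count");
  VecVT HalfVT{ResultVT.NumElts / 2, ResultVT.EltBits};
  return isLegalNEONFloat32(HalfVT) ? SplitAction::SplitInTwo
                                    : SplitAction::Unroll;
}
```

Under these assumptions a `v4f32` result is split into two `v2f32` operations, while a `v2f32` result falls back to unrolling because its halves would be `v1f32`.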

Fix validity check for FCOPYSIGN legalization

Harbormaster completed remote builds in B178292: Diff 448640. Jul 29 2022, 9:14 AM

paulwalker-arm added inline comments. Aug 1 2022, 10:14 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
138	Up to you but I think `EnableVectorFCopySignExtendRound` looks better.
140	for?
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3612	LHSLoVT?
3614	LHSHiVT?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7751–7752	Not new but can this be removed? as it can never happen given the `SrcVT.bitsLT/SrcVT.bitsGT` code above.
7755	This can be assumed, plus `getContainerForFixedLengthVector` will ensure the type is legal anyway.
7822–7826	Bookending the fixed length lowering like this has pitfalls and can complicate the code. It's better to just rewrite the fixed length operations using scalable vector types and then let the scalable vector lowering handle any complexity. Towards the start of the function you can do:
	EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
	In1 = convertToScalableVector(DAG, ContainerVT, In1);
	In2 = convertToScalableVector(DAG, ContainerVT, In2);
	Res = getNode(ISD::FCOPYSIGN, ContainerVT, In1, In2)
	return convertFromScalableVector(DAG, ContainerVT, Res);
This way it doesn't matter how complicated the scalable vector lowering gets. Doing this also means you no longer need sve2-fixed-length-fcopysign.ll because there's nothing SVE2 special about the lowering code you've added (i.e. the original sve2-fcopysign.ll tests are good enough to protect that functionality).
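
The convert, operate, convert-back shape suggested above can be illustrated outside LLVM with a plain C++ model. Everything here is an assumption made for the sketch: the container is modelled as a `std::vector<float>` whose size stands for the hardware vector length, the helper names are hypothetical, and unused lanes are zero-filled, which is not necessarily what the real conversion helpers do.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Model of a scalable container: its size is only known at run time.
using Container = std::vector<float>;

// Place a fixed-length vector into the low lanes of a wider container.
template <std::size_t N>
Container toContainer(const std::array<float, N> &V, std::size_t HWLanes) {
  assert(HWLanes >= N && "container must hold the whole fixed vector");
  Container C(HWLanes, 0.0f);
  std::copy(V.begin(), V.end(), C.begin());
  return C;
}

// Extract the low N lanes back out of the container.
template <std::size_t N>
std::array<float, N> fromContainer(const Container &C) {
  assert(C.size() >= N);
  std::array<float, N> V{};
  std::copy_n(C.begin(), N, V.begin());
  return V;
}

// The "scalable" operation: written once, unaware of any fixed-length caller.
inline Container scalableCopySign(const Container &Mag, const Container &Sgn) {
  assert(Mag.size() == Sgn.size());
  Container R(Mag.size());
  for (std::size_t I = 0; I < Mag.size(); ++I)
    R[I] = std::copysign(Mag[I], Sgn[I]);
  return R;
}

// Fixed-length entry point: a thin wrapper around the scalable operation.
template <std::size_t N>
std::array<float, N> fixedCopySign(const std::array<float, N> &A,
                                   const std::array<float, N> &B,
                                   std::size_t HWLanes) {
  return fromContainer<N>(
      scalableCopySign(toContainer(A, HWLanes), toContainer(B, HWLanes)));
}
```

The point of the design is that the fixed-length wrapper never needs to change when the scalable implementation gains new special cases.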

Changed fixed-length lowering to rely on scalable lowering.
Removed redundant code.

Harbormaster completed remote builds in B178742: Diff 449254. Aug 2 2022, 5:53 AM

Documentation for combiner-vector-fcopysign-extend-round needs updating but otherwise looks good.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
141	Please drop this part of the documentation. Although this is why you've added the flag, it is not the only reason somebody might want to use it (i.e. somebody might actually want to enable the optimisation).
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3615–3618	You could just `return DAG.getNode(...`.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7734	Bogus blank line.

peterwaller-arm accepted this revision. Aug 9 2022, 4:00 AM

This revision is now accepted and ready to land. Aug 9 2022, 4:00 AM

Closed by commit rGb1b9c39629b5: [AArch64][SVE] Use SVE for VLS fcopysign for wide vectors (authored by DavidTruby). Aug 10 2022, 3:17 AM

This revision was automatically updated to reflect the committed changes.

DavidTruby added a commit: rGb1b9c39629b5: [AArch64][SVE] Use SVE for VLS fcopysign for wide vectors.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	6 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp	22 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	33 lines
llvm/test/CodeGen/AArch64/sve-fcopysign.ll	112 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-fcopysign.ll	558 lines
llvm/test/CodeGen/AArch64/sve2-fcopysign.ll	139 lines
llvm/test/CodeGen/AArch64/sve2-fixed-length-fcopysign.ll	546 lines

Diff 448315

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[129 lines hidden]	static cl::opt<bool> EnableReduceLoadOpStoreWidth(
cl::desc("DAG combiner enable reducing the width of load/op/store "		cl::desc("DAG combiner enable reducing the width of load/op/store "
"sequence"));		"sequence"));

static cl::opt<bool> EnableShrinkLoadReplaceStoreWithStore(		static cl::opt<bool> EnableShrinkLoadReplaceStoreWithStore(
"combiner-shrink-load-replace-store-with-store", cl::Hidden, cl::init(true),		"combiner-shrink-load-replace-store-with-store", cl::Hidden, cl::init(true),
cl::desc("DAG combiner enable load/<replace bytes>/store with "		cl::desc("DAG combiner enable load/<replace bytes>/store with "
"a narrower store"));		"a narrower store"));

		static cl::opt<bool> EnableVectorFcopysignExtendRound(
		paulwalker-arm: Up to you but I think `EnableVectorFCopySignExtendRound` looks better.
		"combiner-vector-fcopysign-extend-round", cl::Hidden, cl::init(false),
		cl::desc("Enable merging extends and rounds into FCOPYSIGN on vector types"));
		paulwalker-arm: for?

		paulwalker-arm: Please drop this part of the documentation. Although this is why you've added the flag, it is not the only reason somebody might want to use it (i.e. somebody might actually want to enable the optimisation).
namespace {		namespace {

class DAGCombiner {		class DAGCombiner {
SelectionDAG &DAG;		SelectionDAG &DAG;
const TargetLowering &TLI;		const TargetLowering &TLI;
const SelectionDAGTargetInfo *STI;		const SelectionDAGTargetInfo *STI;
CombineLevel Level = BeforeLegalizeTypes;		CombineLevel Level = BeforeLegalizeTypes;
CodeGenOpt::Level OptLevel;		CodeGenOpt::Level OptLevel;
[15,246 lines hidden]	if ((N1.getOpcode() == ISD::FP_EXTEND ||
// Do not optimize out type conversion of f128 type yet.		// Do not optimize out type conversion of f128 type yet.
// For some targets like x86_64, configuration is changed to keep one f128		// For some targets like x86_64, configuration is changed to keep one f128
// value in one SSE register, but instruction selection cannot handle		// value in one SSE register, but instruction selection cannot handle
// FCOPYSIGN on SSE registers yet.		// FCOPYSIGN on SSE registers yet.
if (N1Op0VT == MVT::f128)		if (N1Op0VT == MVT::f128)
return false;		return false;

// Avoid mismatched vector operand types, for better instruction selection.		// Avoid mismatched vector operand types, for better instruction selection.
if (N1Op0VT.isVector())		if (N1Op0VT.isVector() && !EnableVectorFcopysignExtendRound)
return false;		return false;
		paulwalker-arm: What about `return EnableVectorFcopysignExtendRound;`?

return true;		return true;
}		}
return false;		return false;
}		}

SDValue DAGCombiner::visitFCOPYSIGN(SDNode *N) {		SDValue DAGCombiner::visitFCOPYSIGN(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
[9,576 lines hidden]

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

[3,587 lines hidden]	SDValue DAGTypeLegalizer::SplitVecOp_FP_ROUND(SDNode *N) {
}		}

return DAG.getNode(ISD::CONCAT_VECTORS, DL, ResVT, Lo, Hi);		return DAG.getNode(ISD::CONCAT_VECTORS, DL, ResVT, Lo, Hi);
}		}

SDValue DAGTypeLegalizer::SplitVecOp_FCOPYSIGN(SDNode *N) {		SDValue DAGTypeLegalizer::SplitVecOp_FCOPYSIGN(SDNode *N) {
// The result (and the first input) has a legal vector type, but the second		// The result (and the first input) has a legal vector type, but the second
// input needs splitting.		// input needs splitting.

		if (!isTypeLegal(N->getValueType(0)))
		paulwalker-arm: By this point we know the result type is legal because results are legalised before operands. What's important here is the result type remains legal after splitting the operands. Given the result and first operands have the same type this means ensuring the types of `LHSLo` and `LHSHi` are legal after splitting. There's a function `GetSplitDestVTs` which returns the types expected from splitting. I mention this because I think it's better to query the expected types are legal before performing the actual splitting.
		DavidTruby (Author): Ah ok, I think I was considering this wrong. I thought that the result type of the concat (which is the result type of the original FCOPYSIGN) needed to be legal for us to do the transform. If that's already legal, is there a problem? Is there a case where splitting an already legal vector in two would make a vector illegal? (Genuine question, I'm not sure when this would pop up.) Or do we need RHSLo to be legal?
		paulwalker-arm: You can have multiple legal types for the same vector element type. For NEON `v4f32` and `v2f32` are legal. So it is possible for the result type to be legal and yet still be legal after splitting. Likewise `v1f32` is not legal for NEON and so it is possible to enter with a legal type that would become illegal when split. For the former case we can split the operation in two as you've done. For the latter we're better reverting to the original code path of calling `UnrollVector`. So generally what you've done is fine, it is just you're checking the wrong type (i.e. N's result type rather than the expected result type of the new `FCOPYSIGN` operations). Plus my comment that you probably want to use `GetSplitDestVTs` so you only call `SplitVector` for the cases that are safe.
return DAG.UnrollVectorOp(N, N->getValueType(0).getVectorNumElements());		return DAG.UnrollVectorOp(N, N->getValueType(0).getVectorNumElements());

		SDLoc DL(N);

		SDValue LHSLo, LHSHi;
		std::tie(LHSLo, LHSHi) = DAG.SplitVector(N->getOperand(0), DL);

		SDValue RHSLo, RHSHi;
		std::tie(RHSLo, RHSHi) = DAG.SplitVector(N->getOperand(1), DL);

		SDValue Lo =
		DAG.getNode(ISD::FCOPYSIGN, DL, LHSLo.getValueType(), LHSLo, RHSLo);
		SDValue Hi =
		DAG.getNode(ISD::FCOPYSIGN, DL, LHSHi.getValueType(), LHSHi, RHSHi);

		paulwalker-arm: LHSLoVT?
		SDValue Concat =
		DAG.getNode(ISD::CONCAT_VECTORS, DL, N->getValueType(0), Lo, Hi);
		paulwalker-arm: LHSHiVT?

		return Concat;
}		}

		paulwalker-arm: You could just `return DAG.getNode(...`.
SDValue DAGTypeLegalizer::SplitVecOp_FP_TO_XINT_SAT(SDNode *N) {		SDValue DAGTypeLegalizer::SplitVecOp_FP_TO_XINT_SAT(SDNode *N) {
EVT ResVT = N->getValueType(0);		EVT ResVT = N->getValueType(0);
SDValue Lo, Hi;		SDValue Lo, Hi;
SDLoc dl(N);		SDLoc dl(N);
GetSplitVector(N->getOperand(0), Lo, Hi);		GetSplitVector(N->getOperand(0), Lo, Hi);
EVT InVT = Lo.getValueType();		EVT InVT = Lo.getValueType();

EVT NewResVT =		EVT NewResVT =
[3,096 lines hidden]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[1,584 lines hidden]	void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
// By default everything must be expanded.		// By default everything must be expanded.
for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op)		for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op)
setOperationAction(Op, VT, Expand);		setOperationAction(Op, VT, Expand);

// We use EXTRACT_SUBVECTOR to "cast" a scalable vector to a fixed length one.		// We use EXTRACT_SUBVECTOR to "cast" a scalable vector to a fixed length one.
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);

if (VT.isFloatingPoint()) {		if (VT.isFloatingPoint()) {
setCondCodeAction(ISD::SETO, VT, Expand);		setCondCodeAction(ISD::SETO, VT, Expand);
		paulwalker-arm: Rather than have this dangling there's a large ordered/sorted block further down.
setCondCodeAction(ISD::SETOLT, VT, Expand);		setCondCodeAction(ISD::SETOLT, VT, Expand);
setCondCodeAction(ISD::SETLT, VT, Expand);		setCondCodeAction(ISD::SETLT, VT, Expand);
setCondCodeAction(ISD::SETOLE, VT, Expand);		setCondCodeAction(ISD::SETOLE, VT, Expand);
setCondCodeAction(ISD::SETLE, VT, Expand);		setCondCodeAction(ISD::SETLE, VT, Expand);
setCondCodeAction(ISD::SETULT, VT, Expand);		setCondCodeAction(ISD::SETULT, VT, Expand);
setCondCodeAction(ISD::SETULE, VT, Expand);		setCondCodeAction(ISD::SETULE, VT, Expand);
setCondCodeAction(ISD::SETUGE, VT, Expand);		setCondCodeAction(ISD::SETUGE, VT, Expand);
setCondCodeAction(ISD::SETUGT, VT, Expand);		setCondCodeAction(ISD::SETUGT, VT, Expand);
[37 lines hidden]	void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);		setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
setOperationAction(ISD::CTLZ, VT, Custom);		setOperationAction(ISD::CTLZ, VT, Custom);
setOperationAction(ISD::CTPOP, VT, Custom);		setOperationAction(ISD::CTPOP, VT, Custom);
setOperationAction(ISD::CTTZ, VT, Custom);		setOperationAction(ISD::CTTZ, VT, Custom);
setOperationAction(ISD::FABS, VT, Custom);		setOperationAction(ISD::FABS, VT, Custom);
setOperationAction(ISD::FADD, VT, Custom);		setOperationAction(ISD::FADD, VT, Custom);
setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);		setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
setOperationAction(ISD::FCEIL, VT, Custom);		setOperationAction(ISD::FCEIL, VT, Custom);
		setOperationAction(ISD::FCOPYSIGN, VT, Custom);
setOperationAction(ISD::FDIV, VT, Custom);		setOperationAction(ISD::FDIV, VT, Custom);
setOperationAction(ISD::FFLOOR, VT, Custom);		setOperationAction(ISD::FFLOOR, VT, Custom);
setOperationAction(ISD::FMA, VT, Custom);		setOperationAction(ISD::FMA, VT, Custom);
setOperationAction(ISD::FMAXIMUM, VT, Custom);		setOperationAction(ISD::FMAXIMUM, VT, Custom);
setOperationAction(ISD::FMAXNUM, VT, Custom);		setOperationAction(ISD::FMAXNUM, VT, Custom);
setOperationAction(ISD::FMINIMUM, VT, Custom);		setOperationAction(ISD::FMINIMUM, VT, Custom);
setOperationAction(ISD::FMINNUM, VT, Custom);		setOperationAction(ISD::FMINNUM, VT, Custom);
setOperationAction(ISD::FMUL, VT, Custom);		setOperationAction(ISD::FMUL, VT, Custom);
[6,070 lines hidden]	if (!Subtarget->hasNEON())
return SDValue();		return SDValue();

EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
EVT IntVT = VT.changeTypeToInteger();		EVT IntVT = VT.changeTypeToInteger();
SDLoc DL(Op);		SDLoc DL(Op);

SDValue In1 = Op.getOperand(0);		SDValue In1 = Op.getOperand(0);
SDValue In2 = Op.getOperand(1);		SDValue In2 = Op.getOperand(1);

		paulwalker-arm: Bogus blank line.
		const bool isFixedSVE =
		VT.isFixedLengthVector() && useSVEForFixedLengthVectorVT(VT);

EVT SrcVT = In2.getValueType();		EVT SrcVT = In2.getValueType();

if (SrcVT.bitsLT(VT))		if (SrcVT.bitsLT(VT))
In2 = DAG.getNode(ISD::FP_EXTEND, DL, VT, In2);		In2 = DAG.getNode(ISD::FP_EXTEND, DL, VT, In2);
else if (SrcVT.bitsGT(VT))		else if (SrcVT.bitsGT(VT))
In2 = DAG.getNode(		In2 = DAG.getNode(
ISD::FP_ROUND, DL, VT, In2,		ISD::FP_ROUND, DL, VT, In2,
DAG.getTargetConstant(0, DL, getPointerTy(DAG.getDataLayout())));		DAG.getTargetConstant(0, DL, getPointerTy(DAG.getDataLayout())));
		paulwalker-arm: This doesn't look safe with respect to the extend/rounding code just below. When faced with differing types, the results of both `convertToScalableVector` calls will be types of the same size. However, their element counts will be different. For example, take the case `fcopysign v8f64, v8f32`: this will result in `In1 = nxv2f64` and `In2 = nxv4f32`, which I doubt the remaining logic will handle properly. The most likely effect is a getNode assert firing for invalid operands. My guess is that you're not seeing this because `In1` and `In2` always have the same type, and indeed I couldn't immediately see a way to exercise this logic. I think this means your "mixtype" tests are likely exercising nothing new and are redundant. This is likely also true for your original patch, when you added the initial scalable vector support. If they are not exercising this code, as I suspect, then you either need to rewrite them or just remove them if there's no actual route to test this logic. Personally I think the safest route is to simply rewrite the fixed length fcopysign into a scalable vector one after any necessary extending/rounding of the input has taken place. For what it's worth, I also think the use of FP_EXTEND/FP_ROUND is not the most efficient way to get the sign bits to align, but that can be changed later.
		DavidTruby (Author): I believe I've corrected this now; I think you're right that the inputs will always be the same type anyway though. I agree that it is safer to leave the handling in just in case that does get triggered. I think it's better to leave the mixed type tests in as is: if something changes in future and the types coming into this function can differ, we want to make sure we don't regress in that case.

if (VT.isScalableVector())		if (VT.isScalableVector())
IntVT =		IntVT =
getPackedSVEVectorVT(VT.getVectorElementType().changeTypeToInteger());		getPackedSVEVectorVT(VT.getVectorElementType().changeTypeToInteger());

if (VT != In2.getValueType())		if (VT != In2.getValueType())
return SDValue();		return SDValue();
		paulwalker-arm: Not new but can this be removed? as it can never happen given the `SrcVT.bitsLT/SrcVT.bitsGT` code above.

		if (isFixedSVE) {
		peterwaller-arm: Nit: Does `isFixedSVE` want to move down with the use?
		assert(isTypeLegal(VT) && "Expected only legal fixed-width types");
		paulwalker-arm: This can be assumed, plus `getContainerForFixedLengthVector` will ensure the type is legal anyway.
		VT = getContainerForFixedLengthVector(DAG, VT);
		IntVT = getContainerForFixedLengthVector(DAG, IntVT);

		In1 = convertToScalableVector(DAG, VT, In1);
		In2 = convertToScalableVector(DAG, VT, In2);
		}


auto BitCast = [this](EVT VT, SDValue Op, SelectionDAG &DAG) {		auto BitCast = [this](EVT VT, SDValue Op, SelectionDAG &DAG) {
if (VT.isScalableVector())		if (VT.isScalableVector())
return getSVESafeBitCast(VT, Op, DAG);		return getSVESafeBitCast(VT, Op, DAG);

return DAG.getBitcast(VT, Op);		return DAG.getBitcast(VT, Op);
};		};

SDValue VecVal1, VecVal2;		SDValue VecVal1, VecVal2;
[42 lines hidden]	SDValue BSP =
DAG.getNode(AArch64ISD::BSP, DL, VecVT, SignMaskV, VecVal1, VecVal2);		DAG.getNode(AArch64ISD::BSP, DL, VecVT, SignMaskV, VecVal1, VecVal2);
if (VT == MVT::f16)		if (VT == MVT::f16)
return DAG.getTargetExtractSubreg(AArch64::hsub, DL, VT, BSP);		return DAG.getTargetExtractSubreg(AArch64::hsub, DL, VT, BSP);
if (VT == MVT::f32)		if (VT == MVT::f32)
return DAG.getTargetExtractSubreg(AArch64::ssub, DL, VT, BSP);		return DAG.getTargetExtractSubreg(AArch64::ssub, DL, VT, BSP);
if (VT == MVT::f64)		if (VT == MVT::f64)
return DAG.getTargetExtractSubreg(AArch64::dsub, DL, VT, BSP);		return DAG.getTargetExtractSubreg(AArch64::dsub, DL, VT, BSP);

		if (isFixedSVE) {
		VT = Op.getValueType();
		IntVT = VT.changeTypeToInteger();
		peterwaller-arm: Is this line necessary or could it be pushed up? At a glance it appears it should already be an integer VT derived from VT. Same question for the VT assignment.
		DavidTruby (Author): From 7593-4: VT and IntVT will be scalable containers for the fixed length vector types. Here we need to get the original VTs back.
		BSP = convertFromScalableVector(DAG, IntVT, BSP);
		}
		paulwalker-arm: Bookending the fixed length lowering like this has pitfalls and can complicate the code. It's better to just rewrite the fixed length operations using scalable vector types and then let the scalable vector lowering handle any complexity. Towards the start of the function you can do:
			EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
			In1 = convertToScalableVector(DAG, ContainerVT, In1);
			In2 = convertToScalableVector(DAG, ContainerVT, In2);
			Res = getNode(ISD::FCOPYSIGN, ContainerVT, In1, In2)
			return convertFromScalableVector(DAG, ContainerVT, Res);
		This way it doesn't matter how complicated the scalable vector lowering gets. Doing this also means you no longer need sve2-fixed-length-fcopysign.ll because there's nothing SVE2 special about the lowering code you've added (i.e. the original sve2-fcopysign.ll tests are good enough to protect that functionality).

return BitCast(VT, BSP, DAG);		return BitCast(VT, BSP, DAG);
}		}

SDValue AArch64TargetLowering::LowerCTPOP_PARITY(SDValue Op,		SDValue AArch64TargetLowering::LowerCTPOP_PARITY(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
if (DAG.getMachineFunction().getFunction().hasFnAttribute(		if (DAG.getMachineFunction().getFunction().hasFnAttribute(
Attribute::NoImplicitFloat))		Attribute::NoImplicitFloat))
return SDValue();		return SDValue();
[11,577 lines hidden]	DCI.CombineTo(
ExtLoad.getValue(1));		ExtLoad.getValue(1));
return SDValue(N, 0); // Return N so it doesn't get rechecked!		return SDValue(N, 0); // Return N so it doesn't get rechecked!
}		}

return SDValue();		return SDValue();
}		}

static SDValue performBSPExpandForSVE(SDNode *N, SelectionDAG &DAG,		static SDValue performBSPExpandForSVE(SDNode *N, SelectionDAG &DAG,
const AArch64Subtarget *Subtarget,		const AArch64Subtarget *Subtarget) {
bool fixedSVEVectorVT) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

// Don't expand for SVE2		// Don't expand for NEON, SVE2 or SME
if (!VT.isScalableVector() || Subtarget->hasSVE2() || Subtarget->hasSME())		if (!VT.isScalableVector() || Subtarget->hasSVE2() || Subtarget->hasSME())
return SDValue();		return SDValue();
paulwalker-arm [Not Done]: Isn't this original code now fine, so that you instead just need to remove the following
    // Don't expand for NEON
    if (VT.isFixedLengthVector())
      return SDValue();
block, because that is covered by the `!VT.isScalableVector()` check?
		paulwalker-arm [Not Done]: I think you mean `VT.isScalableVector()` here. However... Given this bug fix it makes me wonder if the following code was ever exercised before this patch? Given my SVE2 comment, I think we can in fact keep the original code and just remove the `fixedSVEVectorVT` code?

// Don't expand for NEON
if (VT.isFixedLengthVector() && !fixedSVEVectorVT)
return SDValue();

SDLoc DL(N);		SDLoc DL(N);

SDValue Mask = N->getOperand(0);		SDValue Mask = N->getOperand(0);
SDValue In1 = N->getOperand(1);		SDValue In1 = N->getOperand(1);
SDValue In2 = N->getOperand(2);		SDValue In2 = N->getOperand(2);

SDValue InvMask = DAG.getNOT(DL, Mask, VT);		SDValue InvMask = DAG.getNOT(DL, Mask, VT);
SDValue Sel = DAG.getNode(ISD::AND, DL, VT, Mask, In1);		SDValue Sel = DAG.getNode(ISD::AND, DL, VT, Mask, In1);
[... 153 lines not shown ...]	SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
case AArch64ISD::GLD1S_IMM_MERGE_ZERO:		case AArch64ISD::GLD1S_IMM_MERGE_ZERO:
return performGLD1Combine(N, DAG);		return performGLD1Combine(N, DAG);
case AArch64ISD::VASHR:		case AArch64ISD::VASHR:
case AArch64ISD::VLSHR:		case AArch64ISD::VLSHR:
return performVectorShiftCombine(N, *this, DCI);		return performVectorShiftCombine(N, *this, DCI);
case AArch64ISD::SUNPKLO:		case AArch64ISD::SUNPKLO:
return performSunpkloCombine(N, DAG);		return performSunpkloCombine(N, DAG);
case AArch64ISD::BSP:		case AArch64ISD::BSP:
return performBSPExpandForSVE(		return performBSPExpandForSVE(N, DAG, Subtarget);
N, DAG, Subtarget, useSVEForFixedLengthVectorVT(N->getValueType(0)));
case ISD::INSERT_VECTOR_ELT:		case ISD::INSERT_VECTOR_ELT:
return performInsertVectorEltCombine(N, DCI);		return performInsertVectorEltCombine(N, DCI);
case ISD::EXTRACT_VECTOR_ELT:		case ISD::EXTRACT_VECTOR_ELT:
return performExtractVectorEltCombine(N, DCI, Subtarget);		return performExtractVectorEltCombine(N, DCI, Subtarget);
case ISD::VECREDUCE_ADD:		case ISD::VECREDUCE_ADD:
return performVecReduceAddCombine(N, DCI.DAG, Subtarget);		return performVecReduceAddCombine(N, DCI.DAG, Subtarget);
case AArch64ISD::UADDV:		case AArch64ISD::UADDV:
return performUADDVCombine(N, DAG);		return performUADDVCombine(N, DAG);
[... 2,334 lines not shown ...]
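For readers unfamiliar with the BSP node, the visible part of `performBSPExpandForSVE` builds `Mask & In1` and an inverted mask; the rest of the function is not shown here, so the sketch below assumes the usual bitwise-select identity `(Mask & In1) | (~Mask & In2)`. It models one f32 lane in plain C++. The helper names are hypothetical, and the code assumes a 32-bit IEEE-754 `float`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Bitwise select: bits of `a` where the mask is 1, bits of `b` elsewhere.
inline std::uint32_t bitwiseSelect(std::uint32_t mask, std::uint32_t a,
                                   std::uint32_t b) {
  return (mask & a) | (~mask & b);
}

// Reinterpret a float as its bit pattern and back, without aliasing issues.
inline std::uint32_t toBits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

inline float fromBits(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// copysign for one lane: the sign bit comes from `sgn`, every other bit
// comes from `mag`. The two masks correspond to the #0x80000000 and
// #0x7fffffff immediates in the SVE test output.
inline float copysignViaSelect(float mag, float sgn) {
  const std::uint32_t signMask = 0x80000000u;
  float result = fromBits(bitwiseSelect(signMask, toBits(sgn), toBits(mag)));
  assert(std::signbit(result) == std::signbit(sgn));
  return result;
}
```

Expanding the select into two ANDs and an ORR is what makes it usable on targets without a single bit-select instruction, which appears to be the purpose of the SVE-only expansion.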

llvm/test/CodeGen/AArch64/sve-fcopysign.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple aarch64-eabi -mattr=+sve -o - | FileCheck --check-prefixes=CHECK %s			; RUN: llc < %s -mtriple aarch64-eabi -mattr=+sve -o - | FileCheck --check-prefixes=CHECK,CHECK-NO-EXTEND-ROUND %s
				; RUN: llc < %s -mtriple aarch64-eabi -mattr=+sve --combiner-vector-fcopysign-extend-round -o - | FileCheck --check-prefixes=CHECK,CHECK-EXTEND-ROUND %s
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

	;============ v2f32			;============ v2f32

	define <vscale x 2 x float> @test_copysign_v2f32_v2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {			define <vscale x 2 x float> @test_copysign_v2f32_v2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {
	; CHECK-LABEL: test_copysign_v2f32_v2f32:			; CHECK-LABEL: test_copysign_v2f32_v2f32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: and z1.s, z1.s, #0x80000000			; CHECK-NEXT: and z1.s, z1.s, #0x80000000
	[... 30 lines not shown ...]
	; CHECK-NEXT: orr z0.d, z0.d, z1.d			; CHECK-NEXT: orr z0.d, z0.d, z1.d
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = call <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b)			%r = call <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b)
	ret <vscale x 4 x float> %r			ret <vscale x 4 x float> %r
	}			}

	; SplitVecOp #1			; SplitVecOp #1
	define <vscale x 4 x float> @test_copysign_v4f32_v4f64(<vscale x 4 x float> %a, <vscale x 4 x double> %b) #0 {			define <vscale x 4 x float> @test_copysign_v4f32_v4f64(<vscale x 4 x float> %a, <vscale x 4 x double> %b) #0 {
	; CHECK-LABEL: test_copysign_v4f32_v4f64:			; CHECK-NO-EXTEND-ROUND-LABEL: test_copysign_v4f32_v4f64:
	; CHECK: // %bb.0:			; CHECK-NO-EXTEND-ROUND: // %bb.0:
	; CHECK-NEXT: ptrue p0.d			; CHECK-NO-EXTEND-ROUND-NEXT: ptrue p0.d
	; CHECK-NEXT: and z0.s, z0.s, #0x7fffffff			; CHECK-NO-EXTEND-ROUND-NEXT: and z0.s, z0.s, #0x7fffffff
	; CHECK-NEXT: fcvt z2.s, p0/m, z2.d			; CHECK-NO-EXTEND-ROUND-NEXT: fcvt z2.s, p0/m, z2.d
	; CHECK-NEXT: fcvt z1.s, p0/m, z1.d			; CHECK-NO-EXTEND-ROUND-NEXT: fcvt z1.s, p0/m, z1.d
	; CHECK-NEXT: uzp1 z1.s, z1.s, z2.s			; CHECK-NO-EXTEND-ROUND-NEXT: uzp1 z1.s, z1.s, z2.s
	; CHECK-NEXT: and z1.s, z1.s, #0x80000000			; CHECK-NO-EXTEND-ROUND-NEXT: and z1.s, z1.s, #0x80000000
	; CHECK-NEXT: orr z0.d, z0.d, z1.d			; CHECK-NO-EXTEND-ROUND-NEXT: orr z0.d, z0.d, z1.d
	; CHECK-NEXT: ret			; CHECK-NO-EXTEND-ROUND-NEXT: ret
				;
				; CHECK-EXTEND-ROUND-LABEL: test_copysign_v4f32_v4f64:
				; CHECK-EXTEND-ROUND: // %bb.0:
				; CHECK-EXTEND-ROUND-NEXT: ptrue p0.d
				; CHECK-EXTEND-ROUND-NEXT: uunpkhi z3.d, z0.s
				; CHECK-EXTEND-ROUND-NEXT: fcvt z2.s, p0/m, z2.d
				; CHECK-EXTEND-ROUND-NEXT: fcvt z1.s, p0/m, z1.d
				; CHECK-EXTEND-ROUND-NEXT: uunpklo z0.d, z0.s
				; CHECK-EXTEND-ROUND-NEXT: and z2.s, z2.s, #0x80000000
				; CHECK-EXTEND-ROUND-NEXT: and z3.s, z3.s, #0x7fffffff
				; CHECK-EXTEND-ROUND-NEXT: and z1.s, z1.s, #0x80000000
				; CHECK-EXTEND-ROUND-NEXT: and z0.s, z0.s, #0x7fffffff
				; CHECK-EXTEND-ROUND-NEXT: orr z2.d, z3.d, z2.d
				; CHECK-EXTEND-ROUND-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-EXTEND-ROUND-NEXT: uzp1 z0.s, z0.s, z2.s
				; CHECK-EXTEND-ROUND-NEXT: ret
	%tmp0 = fptrunc <vscale x 4 x double> %b to <vscale x 4 x float>			%tmp0 = fptrunc <vscale x 4 x double> %b to <vscale x 4 x float>
	%r = call <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %tmp0)			%r = call <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %tmp0)
	ret <vscale x 4 x float> %r			ret <vscale x 4 x float> %r
	}			}

	declare <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0			declare <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0

	;============ v2f64			;============ v2f64
	[... 88 lines not shown ...]
	; CHECK-NEXT: orr z0.d, z0.d, z1.d			; CHECK-NEXT: orr z0.d, z0.d, z1.d
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp0 = fptrunc <vscale x 4 x float> %b to <vscale x 4 x half>			%tmp0 = fptrunc <vscale x 4 x float> %b to <vscale x 4 x half>
	%r = call <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %tmp0)			%r = call <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %tmp0)
	ret <vscale x 4 x half> %r			ret <vscale x 4 x half> %r
	}			}

	define <vscale x 4 x half> @test_copysign_v4f16_v4f64(<vscale x 4 x half> %a, <vscale x 4 x double> %b) #0 {			define <vscale x 4 x half> @test_copysign_v4f16_v4f64(<vscale x 4 x half> %a, <vscale x 4 x double> %b) #0 {
	; CHECK-LABEL: test_copysign_v4f16_v4f64:			; CHECK-NO-EXTEND-ROUND-LABEL: test_copysign_v4f16_v4f64:
	; CHECK: // %bb.0:			; CHECK-NO-EXTEND-ROUND: // %bb.0:
	; CHECK-NEXT: ptrue p0.d			; CHECK-NO-EXTEND-ROUND-NEXT: ptrue p0.d
	; CHECK-NEXT: and z0.h, z0.h, #0x7fff			; CHECK-NO-EXTEND-ROUND-NEXT: and z0.h, z0.h, #0x7fff
	; CHECK-NEXT: fcvt z2.h, p0/m, z2.d			; CHECK-NO-EXTEND-ROUND-NEXT: fcvt z2.h, p0/m, z2.d
	; CHECK-NEXT: fcvt z1.h, p0/m, z1.d			; CHECK-NO-EXTEND-ROUND-NEXT: fcvt z1.h, p0/m, z1.d
	; CHECK-NEXT: uzp1 z1.s, z1.s, z2.s			; CHECK-NO-EXTEND-ROUND-NEXT: uzp1 z1.s, z1.s, z2.s
	; CHECK-NEXT: and z1.h, z1.h, #0x8000			; CHECK-NO-EXTEND-ROUND-NEXT: and z1.h, z1.h, #0x8000
	; CHECK-NEXT: orr z0.d, z0.d, z1.d			; CHECK-NO-EXTEND-ROUND-NEXT: orr z0.d, z0.d, z1.d
	; CHECK-NEXT: ret			; CHECK-NO-EXTEND-ROUND-NEXT: ret
				;
				; CHECK-EXTEND-ROUND-LABEL: test_copysign_v4f16_v4f64:
				; CHECK-EXTEND-ROUND: // %bb.0:
				; CHECK-EXTEND-ROUND-NEXT: ptrue p0.d
				; CHECK-EXTEND-ROUND-NEXT: uunpkhi z3.d, z0.s
				; CHECK-EXTEND-ROUND-NEXT: fcvt z2.h, p0/m, z2.d
				; CHECK-EXTEND-ROUND-NEXT: fcvt z1.h, p0/m, z1.d
				; CHECK-EXTEND-ROUND-NEXT: uunpklo z0.d, z0.s
				; CHECK-EXTEND-ROUND-NEXT: and z2.h, z2.h, #0x8000
				; CHECK-EXTEND-ROUND-NEXT: and z3.h, z3.h, #0x7fff
				; CHECK-EXTEND-ROUND-NEXT: and z1.h, z1.h, #0x8000
				; CHECK-EXTEND-ROUND-NEXT: and z0.h, z0.h, #0x7fff
				; CHECK-EXTEND-ROUND-NEXT: orr z2.d, z3.d, z2.d
				; CHECK-EXTEND-ROUND-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-EXTEND-ROUND-NEXT: uzp1 z0.s, z0.s, z2.s
				; CHECK-EXTEND-ROUND-NEXT: ret
	%tmp0 = fptrunc <vscale x 4 x double> %b to <vscale x 4 x half>			%tmp0 = fptrunc <vscale x 4 x double> %b to <vscale x 4 x half>
	%r = call <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %tmp0)			%r = call <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %tmp0)
	ret <vscale x 4 x half> %r			ret <vscale x 4 x half> %r
	}			}

	declare <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0			declare <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0

	;============ v8f16			;============ v8f16

	define <vscale x 8 x half> @test_copysign_v8f16_v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {			define <vscale x 8 x half> @test_copysign_v8f16_v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
	; CHECK-LABEL: test_copysign_v8f16_v8f16:			; CHECK-LABEL: test_copysign_v8f16_v8f16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: and z1.h, z1.h, #0x8000			; CHECK-NEXT: and z1.h, z1.h, #0x8000
	; CHECK-NEXT: and z0.h, z0.h, #0x7fff			; CHECK-NEXT: and z0.h, z0.h, #0x7fff
	; CHECK-NEXT: orr z0.d, z0.d, z1.d			; CHECK-NEXT: orr z0.d, z0.d, z1.d
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = call <vscale x 8 x half> @llvm.copysign.v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b)			%r = call <vscale x 8 x half> @llvm.copysign.v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b)
	ret <vscale x 8 x half> %r			ret <vscale x 8 x half> %r
	}			}

	define <vscale x 8 x half> @test_copysign_v8f16_v8f32(<vscale x 8 x half> %a, <vscale x 8 x float> %b) #0 {			define <vscale x 8 x half> @test_copysign_v8f16_v8f32(<vscale x 8 x half> %a, <vscale x 8 x float> %b) #0 {
	; CHECK-LABEL: test_copysign_v8f16_v8f32:			; CHECK-NO-EXTEND-ROUND-LABEL: test_copysign_v8f16_v8f32:
	; CHECK: // %bb.0:			; CHECK-NO-EXTEND-ROUND: // %bb.0:
	; CHECK-NEXT: ptrue p0.s			; CHECK-NO-EXTEND-ROUND-NEXT: ptrue p0.s
	; CHECK-NEXT: and z0.h, z0.h, #0x7fff			; CHECK-NO-EXTEND-ROUND-NEXT: and z0.h, z0.h, #0x7fff
	; CHECK-NEXT: fcvt z2.h, p0/m, z2.s			; CHECK-NO-EXTEND-ROUND-NEXT: fcvt z2.h, p0/m, z2.s
	; CHECK-NEXT: fcvt z1.h, p0/m, z1.s			; CHECK-NO-EXTEND-ROUND-NEXT: fcvt z1.h, p0/m, z1.s
	; CHECK-NEXT: uzp1 z1.h, z1.h, z2.h			; CHECK-NO-EXTEND-ROUND-NEXT: uzp1 z1.h, z1.h, z2.h
	; CHECK-NEXT: and z1.h, z1.h, #0x8000			; CHECK-NO-EXTEND-ROUND-NEXT: and z1.h, z1.h, #0x8000
	; CHECK-NEXT: orr z0.d, z0.d, z1.d			; CHECK-NO-EXTEND-ROUND-NEXT: orr z0.d, z0.d, z1.d
	; CHECK-NEXT: ret			; CHECK-NO-EXTEND-ROUND-NEXT: ret
				;
				; CHECK-EXTEND-ROUND-LABEL: test_copysign_v8f16_v8f32:
				; CHECK-EXTEND-ROUND: // %bb.0:
				; CHECK-EXTEND-ROUND-NEXT: ptrue p0.s
				; CHECK-EXTEND-ROUND-NEXT: uunpkhi z3.s, z0.h
				; CHECK-EXTEND-ROUND-NEXT: fcvt z2.h, p0/m, z2.s
				; CHECK-EXTEND-ROUND-NEXT: fcvt z1.h, p0/m, z1.s
				; CHECK-EXTEND-ROUND-NEXT: uunpklo z0.s, z0.h
				; CHECK-EXTEND-ROUND-NEXT: and z2.h, z2.h, #0x8000
				; CHECK-EXTEND-ROUND-NEXT: and z3.h, z3.h, #0x7fff
				; CHECK-EXTEND-ROUND-NEXT: and z1.h, z1.h, #0x8000
				; CHECK-EXTEND-ROUND-NEXT: and z0.h, z0.h, #0x7fff
				; CHECK-EXTEND-ROUND-NEXT: orr z2.d, z3.d, z2.d
				; CHECK-EXTEND-ROUND-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-EXTEND-ROUND-NEXT: uzp1 z0.h, z0.h, z2.h
				; CHECK-EXTEND-ROUND-NEXT: ret
	%tmp0 = fptrunc <vscale x 8 x float> %b to <vscale x 8 x half>			%tmp0 = fptrunc <vscale x 8 x float> %b to <vscale x 8 x half>
	%r = call <vscale x 8 x half> @llvm.copysign.v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %tmp0)			%r = call <vscale x 8 x half> @llvm.copysign.v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %tmp0)
	ret <vscale x 8 x half> %r			ret <vscale x 8 x half> %r
	}			}


	;========== FCOPYSIGN_EXTEND_ROUND			;========== FCOPYSIGN_EXTEND_ROUND

	[... 33 lines not shown ...]
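The tests above that differ between the two RUN configurations all feed `copysign` a sign operand that has just been narrowed with `fptrunc`. The flag name suggests that the combiner may then take the sign from the value before it is rounded; that reading is an assumption, not something this page states. The scalar sketch below only illustrates why such a shortcut would be value-preserving: narrowing an in-range `double` to `float` keeps its sign bit. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Reference form: round the sign operand to float first, then copy its sign.
inline float copysignRoundFirst(float mag, double sgn) {
  return std::copysign(mag, static_cast<float>(sgn));
}

// Shortcut form: look only at the sign bit of the unrounded operand.
inline float copysignSignOnly(float mag, double sgn) {
  return std::signbit(sgn) ? -std::fabs(mag) : std::fabs(mag);
}

// Asserts that both forms agree for one input pair. `mag` must not be NaN
// (NaN never compares equal), and `sgn` should be within float range so
// that the narrowing conversion is well defined.
inline void checkAgreement(float mag, double sgn) {
  assert(copysignRoundFirst(mag, sgn) == copysignSignOnly(mag, sgn));
}
```

A value such as `-1e-300` is the interesting case: it rounds to a negative zero or tiny negative number in `float`, yet its sign bit survives, so both forms still agree.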

llvm/test/CodeGen/AArch64/sve-fixed-length-fcopysign.ll

This file was added.
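The expected output in this file falls into three shapes: NEON code for vectors of at most 128 bits, a single SVE operation when the vector fits in the guaranteed minimum SVE register width, and a split into several SVE operations otherwise. The sketch below reproduces that classification arithmetically. The 128-bit cutoff and the three-way split are inferred from the CHECK lines rather than taken from the lowering code, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class Lowering { Neon, SveSingle, SveSplit };

// Total width of a fixed-length vector in bits.
inline std::uint64_t vectorBits(std::uint64_t numElts, std::uint64_t eltBits) {
  return numElts * eltBits;
}

// Classify a fixed-length vector against the minimum SVE register width
// (the value passed as -aarch64-sve-vector-bits-min, or 128 * the lower
// bound of vscale_range).
inline Lowering classify(std::uint64_t numElts, std::uint64_t eltBits,
                         std::uint64_t minSveBits) {
  assert(minSveBits >= 128 && "SVE registers are at least 128 bits wide");
  const std::uint64_t bits = vectorBits(numElts, eltBits);
  if (bits <= 128)
    return Lowering::Neon;      // fits a NEON D or Q register
  if (bits <= minSveBits)
    return Lowering::SveSingle; // one predicated SVE operation
  return Lowering::SveSplit;    // must be split across several registers
}

// Number of minimum-width SVE registers needed to hold the vector.
inline std::uint64_t partsNeeded(std::uint64_t numElts, std::uint64_t eltBits,
                                 std::uint64_t minSveBits) {
  assert(minSveBits > 0);
  return (vectorBits(numElts, eltBits) + minSveBits - 1) / minSveBits;
}
```

For example, `<32 x half>` is 512 bits, so it needs two registers under the 256-bit RUN lines (the `VBITS_GE_256` checks load and store two halves) and one under the 512-bit and wider RUN lines.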

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_256,CHECK_NO_EXTEND_ROUND
				; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,CHECK_NO_EXTEND_ROUND
				; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,CHECK_NO_EXTEND_ROUND
				; RUN: llc -aarch64-sve-vector-bits-min=256 --combiner-vector-fcopysign-extend-round < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_256,CHECK_EXTEND_ROUND
				; RUN: llc -aarch64-sve-vector-bits-min=512 --combiner-vector-fcopysign-extend-round < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,CHECK_EXTEND_ROUND
				; RUN: llc -aarch64-sve-vector-bits-min=2048 --combiner-vector-fcopysign-extend-round < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,CHECK_EXTEND_ROUND

				target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

				target triple = "aarch64-unknown-linux-gnu"

				;============ f16

				define void @test_copysign_v4f16_v4f16(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f16_v4f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: mvni v2.4h, #128, lsl #8
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x half>, ptr %ap
				%b = load <4 x half>, ptr %bp
				%r = call <4 x half> @llvm.copysign.v4f16(<4 x half> %a, <4 x half> %b)
				store <4 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v8f16_v8f16(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v8f16_v8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ldr q1, [x1]
				; CHECK-NEXT: mvni v2.8h, #128, lsl #8
				; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <8 x half>, ptr %ap
				%b = load <8 x half>, ptr %bp
				%r = call <8 x half> @llvm.copysign.v8f16(<8 x half> %a, <8 x half> %b)
				store <8 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v16f16_v16f16(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v16f16_v16f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h, vl16
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.h }, p0/z, [x1]
				; CHECK-NEXT: and z1.h, z1.h, #0x8000
				; CHECK-NEXT: and z0.h, z0.h, #0x7fff
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <16 x half>, ptr %ap
				%b = load <16 x half>, ptr %bp
				%r = call <16 x half> @llvm.copysign.v16f16(<16 x half> %a, <16 x half> %b)
				store <16 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v32f16_v32f16(ptr %ap, ptr %bp) #0 {
				; VBITS_GE_256-LABEL: test_copysign_v32f16_v32f16:
				; VBITS_GE_256: // %bb.0:
				; VBITS_GE_256-NEXT: mov x8, #16
				; VBITS_GE_256-NEXT: ptrue p0.h, vl16
				; VBITS_GE_256-NEXT: ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
				; VBITS_GE_256-NEXT: ld1h { z1.h }, p0/z, [x0]
				; VBITS_GE_256-NEXT: ld1h { z2.h }, p0/z, [x1, x8, lsl #1]
				; VBITS_GE_256-NEXT: ld1h { z3.h }, p0/z, [x1]
				; VBITS_GE_256-NEXT: and z0.h, z0.h, #0x7fff
				; VBITS_GE_256-NEXT: and z1.h, z1.h, #0x7fff
				; VBITS_GE_256-NEXT: and z2.h, z2.h, #0x8000
				; VBITS_GE_256-NEXT: and z3.h, z3.h, #0x8000
				; VBITS_GE_256-NEXT: orr z0.d, z0.d, z2.d
				; VBITS_GE_256-NEXT: orr z1.d, z1.d, z3.d
				; VBITS_GE_256-NEXT: st1h { z0.h }, p0, [x0, x8, lsl #1]
				; VBITS_GE_256-NEXT: st1h { z1.h }, p0, [x0]
				; VBITS_GE_256-NEXT: ret
				;
				; VBITS_GE_512-LABEL: test_copysign_v32f16_v32f16:
				; VBITS_GE_512: // %bb.0:
				; VBITS_GE_512-NEXT: ptrue p0.h, vl32
				; VBITS_GE_512-NEXT: ld1h { z0.h }, p0/z, [x0]
				; VBITS_GE_512-NEXT: ld1h { z1.h }, p0/z, [x1]
				; VBITS_GE_512-NEXT: and z1.h, z1.h, #0x8000
				; VBITS_GE_512-NEXT: and z0.h, z0.h, #0x7fff
				; VBITS_GE_512-NEXT: orr z0.d, z0.d, z1.d
				; VBITS_GE_512-NEXT: st1h { z0.h }, p0, [x0]
				; VBITS_GE_512-NEXT: ret
				%a = load <32 x half>, ptr %ap
				%b = load <32 x half>, ptr %bp
				%r = call <32 x half> @llvm.copysign.v32f16(<32 x half> %a, <32 x half> %b)
				store <32 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v64f16_v64f16(ptr %ap, ptr %bp) vscale_range(8,0) #0 {
				; CHECK-LABEL: test_copysign_v64f16_v64f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h, vl64
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.h }, p0/z, [x1]
				; CHECK-NEXT: and z1.h, z1.h, #0x8000
				; CHECK-NEXT: and z0.h, z0.h, #0x7fff
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <64 x half>, ptr %ap
				%b = load <64 x half>, ptr %bp
				%r = call <64 x half> @llvm.copysign.v64f16(<64 x half> %a, <64 x half> %b)
				store <64 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v128f16_v128f16(ptr %ap, ptr %bp) vscale_range(16,0) #0 {
				; CHECK-LABEL: test_copysign_v128f16_v128f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h, vl128
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.h }, p0/z, [x1]
				; CHECK-NEXT: and z1.h, z1.h, #0x8000
				; CHECK-NEXT: and z0.h, z0.h, #0x7fff
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <128 x half>, ptr %ap
				%b = load <128 x half>, ptr %bp
				%r = call <128 x half> @llvm.copysign.v128f16(<128 x half> %a, <128 x half> %b)
				store <128 x half> %r, ptr %ap
				ret void
				}

				;============ f32

				define void @test_copysign_v2f32_v2f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v2f32_v2f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: mvni v2.2s, #128, lsl #24
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%a = load <2 x float>, ptr %ap
				%b = load <2 x float>, ptr %bp
				%r = call <2 x float> @llvm.copysign.v2f32(<2 x float> %a, <2 x float> %b)
				store <2 x float> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v4f32_v4f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f32_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ldr q1, [x1]
				; CHECK-NEXT: mvni v2.4s, #128, lsl #24
				; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x float>, ptr %ap
				%b = load <4 x float>, ptr %bp
				%r = call <4 x float> @llvm.copysign.v4f32(<4 x float> %a, <4 x float> %b)
				store <4 x float> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v8f32_v8f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v8f32_v8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl8
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x1]
				; CHECK-NEXT: and z1.s, z1.s, #0x80000000
				; CHECK-NEXT: and z0.s, z0.s, #0x7fffffff
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <8 x float>, ptr %ap
				%b = load <8 x float>, ptr %bp
				%r = call <8 x float> @llvm.copysign.v8f32(<8 x float> %a, <8 x float> %b)
				store <8 x float> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v16f32_v16f32(ptr %ap, ptr %bp) #0 {
				; VBITS_GE_256-LABEL: test_copysign_v16f32_v16f32:
				; VBITS_GE_256: // %bb.0:
				; VBITS_GE_256-NEXT: mov x8, #8
				; VBITS_GE_256-NEXT: ptrue p0.s, vl8
				; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
				; VBITS_GE_256-NEXT: ld1w { z1.s }, p0/z, [x0]
				; VBITS_GE_256-NEXT: ld1w { z2.s }, p0/z, [x1, x8, lsl #2]
				; VBITS_GE_256-NEXT: ld1w { z3.s }, p0/z, [x1]
				; VBITS_GE_256-NEXT: and z0.s, z0.s, #0x7fffffff
				; VBITS_GE_256-NEXT: and z1.s, z1.s, #0x7fffffff
				; VBITS_GE_256-NEXT: and z2.s, z2.s, #0x80000000
				; VBITS_GE_256-NEXT: and z3.s, z3.s, #0x80000000
				; VBITS_GE_256-NEXT: orr z0.d, z0.d, z2.d
				; VBITS_GE_256-NEXT: orr z1.d, z1.d, z3.d
				; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0, x8, lsl #2]
				; VBITS_GE_256-NEXT: st1w { z1.s }, p0, [x0]
				; VBITS_GE_256-NEXT: ret
				;
				; VBITS_GE_512-LABEL: test_copysign_v16f32_v16f32:
				; VBITS_GE_512: // %bb.0:
				; VBITS_GE_512-NEXT: ptrue p0.s, vl16
				; VBITS_GE_512-NEXT: ld1w { z0.s }, p0/z, [x0]
				; VBITS_GE_512-NEXT: ld1w { z1.s }, p0/z, [x1]
				; VBITS_GE_512-NEXT: and z1.s, z1.s, #0x80000000
				; VBITS_GE_512-NEXT: and z0.s, z0.s, #0x7fffffff
				; VBITS_GE_512-NEXT: orr z0.d, z0.d, z1.d
				; VBITS_GE_512-NEXT: st1w { z0.s }, p0, [x0]
				; VBITS_GE_512-NEXT: ret
				%a = load <16 x float>, ptr %ap
				%b = load <16 x float>, ptr %bp
				%r = call <16 x float> @llvm.copysign.v16f32(<16 x float> %a, <16 x float> %b)
				store <16 x float> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v32f32_v32f32(ptr %ap, ptr %bp) vscale_range(8,0) #0 {
				; CHECK-LABEL: test_copysign_v32f32_v32f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl32
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x1]
				; CHECK-NEXT: and z1.s, z1.s, #0x80000000
				; CHECK-NEXT: and z0.s, z0.s, #0x7fffffff
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <32 x float>, ptr %ap
				%b = load <32 x float>, ptr %bp
				%r = call <32 x float> @llvm.copysign.v32f32(<32 x float> %a, <32 x float> %b)
				store <32 x float> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v64f32_v64f32(ptr %ap, ptr %bp) vscale_range(16,0) #0 {
				; CHECK-LABEL: test_copysign_v64f32_v64f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl64
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x1]
				; CHECK-NEXT: and z1.s, z1.s, #0x80000000
				; CHECK-NEXT: and z0.s, z0.s, #0x7fffffff
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <64 x float>, ptr %ap
				%b = load <64 x float>, ptr %bp
				%r = call <64 x float> @llvm.copysign.v64f32(<64 x float> %a, <64 x float> %b)
				store <64 x float> %r, ptr %ap
				ret void
				}

				;============ f64

				define void @test_copysign_v2f64_v2f64(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v2f64_v2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: movi v0.2d, #0xffffffffffffffff
				; CHECK-NEXT: ldr q1, [x0]
				; CHECK-NEXT: ldr q2, [x1]
				; CHECK-NEXT: fneg v0.2d, v0.2d
				; CHECK-NEXT: bsl v0.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <2 x double>, ptr %ap
				%b = load <2 x double>, ptr %bp
				%r = call <2 x double> @llvm.copysign.v2f64(<2 x double> %a, <2 x double> %b)
				store <2 x double> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v4f64_v4f64(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f64_v4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d, vl4
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1d { z1.d }, p0/z, [x1]
				; CHECK-NEXT: and z1.d, z1.d, #0x8000000000000000
				; CHECK-NEXT: and z0.d, z0.d, #0x7fffffffffffffff
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x double>, ptr %ap
				%b = load <4 x double>, ptr %bp
				%r = call <4 x double> @llvm.copysign.v4f64(<4 x double> %a, <4 x double> %b)
				store <4 x double> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v8f64_v8f64(ptr %ap, ptr %bp) #0 {
				; VBITS_GE_256-LABEL: test_copysign_v8f64_v8f64:
				; VBITS_GE_256: // %bb.0:
				; VBITS_GE_256-NEXT: mov x8, #4
				; VBITS_GE_256-NEXT: ptrue p0.d, vl4
				; VBITS_GE_256-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
				; VBITS_GE_256-NEXT: ld1d { z1.d }, p0/z, [x0]
				; VBITS_GE_256-NEXT: ld1d { z2.d }, p0/z, [x1, x8, lsl #3]
				; VBITS_GE_256-NEXT: ld1d { z3.d }, p0/z, [x1]
				; VBITS_GE_256-NEXT: and z0.d, z0.d, #0x7fffffffffffffff
				; VBITS_GE_256-NEXT: and z1.d, z1.d, #0x7fffffffffffffff
				; VBITS_GE_256-NEXT: and z2.d, z2.d, #0x8000000000000000
				; VBITS_GE_256-NEXT: and z3.d, z3.d, #0x8000000000000000
				; VBITS_GE_256-NEXT: orr z0.d, z0.d, z2.d
				; VBITS_GE_256-NEXT: orr z1.d, z1.d, z3.d
				; VBITS_GE_256-NEXT: st1d { z0.d }, p0, [x0, x8, lsl #3]
				; VBITS_GE_256-NEXT: st1d { z1.d }, p0, [x0]
				; VBITS_GE_256-NEXT: ret
				;
				; VBITS_GE_512-LABEL: test_copysign_v8f64_v8f64:
				; VBITS_GE_512: // %bb.0:
				; VBITS_GE_512-NEXT: ptrue p0.d, vl8
				; VBITS_GE_512-NEXT: ld1d { z0.d }, p0/z, [x0]
				; VBITS_GE_512-NEXT: ld1d { z1.d }, p0/z, [x1]
				; VBITS_GE_512-NEXT: and z1.d, z1.d, #0x8000000000000000
				; VBITS_GE_512-NEXT: and z0.d, z0.d, #0x7fffffffffffffff
				; VBITS_GE_512-NEXT: orr z0.d, z0.d, z1.d
				; VBITS_GE_512-NEXT: st1d { z0.d }, p0, [x0]
				; VBITS_GE_512-NEXT: ret
				%a = load <8 x double>, ptr %ap
				%b = load <8 x double>, ptr %bp
				%r = call <8 x double> @llvm.copysign.v8f64(<8 x double> %a, <8 x double> %b)
				store <8 x double> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v16f64_v16f64(ptr %ap, ptr %bp) vscale_range(8,0) #0 {
				; CHECK-LABEL: test_copysign_v16f64_v16f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d, vl16
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1d { z1.d }, p0/z, [x1]
				; CHECK-NEXT: and z1.d, z1.d, #0x8000000000000000
				; CHECK-NEXT: and z0.d, z0.d, #0x7fffffffffffffff
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <16 x double>, ptr %ap
				%b = load <16 x double>, ptr %bp
				%r = call <16 x double> @llvm.copysign.v16f64(<16 x double> %a, <16 x double> %b)
				store <16 x double> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v32f64_v32f64(ptr %ap, ptr %bp) vscale_range(16,0) #0 {
				; CHECK-LABEL: test_copysign_v32f64_v32f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d, vl32
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1d { z1.d }, p0/z, [x1]
				; CHECK-NEXT: and z1.d, z1.d, #0x8000000000000000
				; CHECK-NEXT: and z0.d, z0.d, #0x7fffffffffffffff
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <32 x double>, ptr %ap
				%b = load <32 x double>, ptr %bp
				%r = call <32 x double> @llvm.copysign.v32f64(<32 x double> %a, <32 x double> %b)
				store <32 x double> %r, ptr %ap
				ret void
				}

				;============ v2f32

				define void @test_copysign_v2f32_v2f64(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v2f32_v2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x1]
				; CHECK-NEXT: mvni v2.2s, #128, lsl #24
				; CHECK-NEXT: ldr d1, [x0]
				; CHECK-NEXT: fcvtn v0.2s, v0.2d
				; CHECK-NEXT: bit v0.8b, v1.8b, v2.8b
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%a = load <2 x float>, ptr %ap
				%b = load <2 x double>, ptr %bp
				%tmp0 = fptrunc <2 x double> %b to <2 x float>
				%r = call <2 x float> @llvm.copysign.v2f32(<2 x float> %a, <2 x float> %tmp0)
				store <2 x float> %r, ptr %ap
				ret void
				}

				;============ v4f32

				; SplitVecOp #1
				define void @test_copysign_v4f32_v4f64(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f32_v4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d, vl4
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ld1d { z1.d }, p0/z, [x1]
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mvni v2.4s, #128, lsl #24
				; CHECK-NEXT: fcvt z1.s, p0/m, z1.d
				; CHECK-NEXT: uzp1 z1.s, z1.s, z1.s
				; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x float>, ptr %ap
				%b = load <4 x double>, ptr %bp
				%tmp0 = fptrunc <4 x double> %b to <4 x float>
				%r = call <4 x float> @llvm.copysign.v4f32(<4 x float> %a, <4 x float> %tmp0)
				store <4 x float> %r, ptr %ap
				ret void
				}

				;============ v2f64

				define void @test_copysign_v2f64_v2f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v2f64_v2f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: movi v0.2d, #0xffffffffffffffff
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: ldr q2, [x0]
				; CHECK-NEXT: fcvtl v1.2d, v1.2s
				; CHECK-NEXT: fneg v0.2d, v0.2d
				; CHECK-NEXT: bsl v0.16b, v2.16b, v1.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <2 x double>, ptr %ap
				%b = load < 2 x float>, ptr %bp
				%tmp0 = fpext <2 x float> %b to <2 x double>
				%r = call <2 x double> @llvm.copysign.v2f64(<2 x double> %a, <2 x double> %tmp0)
				store <2 x double> %r, ptr %ap
				ret void
				}

				;============ v4f64

				; SplitVecRes mismatched
				define void @test_copysign_v4f64_v4f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK_NO_EXTEND_ROUND-LABEL: test_copysign_v4f64_v4f32:
				; CHECK_NO_EXTEND_ROUND: // %bb.0:
				; CHECK_NO_EXTEND_ROUND-NEXT: ptrue p0.d, vl4
				; CHECK_NO_EXTEND_ROUND-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK_NO_EXTEND_ROUND-NEXT: ld1w { z1.d }, p0/z, [x1]
				; CHECK_NO_EXTEND_ROUND-NEXT: fcvt z1.d, p0/m, z1.s
				; CHECK_NO_EXTEND_ROUND-NEXT: and z0.d, z0.d, #0x7fffffffffffffff
				; CHECK_NO_EXTEND_ROUND-NEXT: and z1.d, z1.d, #0x8000000000000000
				; CHECK_NO_EXTEND_ROUND-NEXT: orr z0.d, z0.d, z1.d
				; CHECK_NO_EXTEND_ROUND-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK_NO_EXTEND_ROUND-NEXT: ret
				;
				; CHECK_EXTEND_ROUND-LABEL: test_copysign_v4f64_v4f32:
				; CHECK_EXTEND_ROUND: // %bb.0:
				; CHECK_EXTEND_ROUND-NEXT: ptrue p0.d, vl4
				; CHECK_EXTEND_ROUND-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK_EXTEND_ROUND-NEXT: ldr q1, [x1]
				; CHECK_EXTEND_ROUND-NEXT: uunpklo z1.d, z1.s
				; CHECK_EXTEND_ROUND-NEXT: fcvt z1.d, p0/m, z1.s
				; CHECK_EXTEND_ROUND-NEXT: and z0.d, z0.d, #0x7fffffffffffffff
				; CHECK_EXTEND_ROUND-NEXT: and z1.d, z1.d, #0x8000000000000000
				; CHECK_EXTEND_ROUND-NEXT: orr z0.d, z0.d, z1.d
				; CHECK_EXTEND_ROUND-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK_EXTEND_ROUND-NEXT: ret
				%a = load <4 x double>, ptr %ap
				%b = load <4 x float>, ptr %bp
				%tmp0 = fpext <4 x float> %b to <4 x double>
				%r = call <4 x double> @llvm.copysign.v4f64(<4 x double> %a, <4 x double> %tmp0)
				store <4 x double> %r, ptr %ap
				ret void
				}

				;============ v4f16

				define void @test_copysign_v4f16_v4f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f16_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x1]
				; CHECK-NEXT: mvni v2.4h, #128, lsl #8
				; CHECK-NEXT: ldr d1, [x0]
				; CHECK-NEXT: fcvtn v0.4h, v0.4s
				; CHECK-NEXT: bit v0.8b, v1.8b, v2.8b
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x half>, ptr %ap
				%b = load <4 x float>, ptr %bp
				%tmp0 = fptrunc <4 x float> %b to <4 x half>
				%r = call <4 x half> @llvm.copysign.v4f16(<4 x half> %a, <4 x half> %tmp0)
				store <4 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v4f16_v4f64(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f16_v4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d, vl4
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: ld1d { z1.d }, p0/z, [x1]
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mvni v2.4h, #128, lsl #8
				; CHECK-NEXT: fcvt z1.h, p0/m, z1.d
				; CHECK-NEXT: uzp1 z1.s, z1.s, z1.s
				; CHECK-NEXT: uzp1 z1.h, z1.h, z1.h
				; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x half>, ptr %ap
				%b = load <4 x double>, ptr %bp
				%tmp0 = fptrunc <4 x double> %b to <4 x half>
				%r = call <4 x half> @llvm.copysign.v4f16(<4 x half> %a, <4 x half> %tmp0)
				store <4 x half> %r, ptr %ap
				ret void
				}

				declare <4 x half> @llvm.copysign.v4f16(<4 x half> %a, <4 x half> %b) #0

				;============ v8f16


				define void @test_copysign_v8f16_v8f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v8f16_v8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl8
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x1]
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mvni v2.8h, #128, lsl #8
				; CHECK-NEXT: fcvt z1.h, p0/m, z1.s
				; CHECK-NEXT: uzp1 z1.h, z1.h, z1.h
				; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <8 x half>, ptr %ap
				%b = load <8 x float>, ptr %bp
				%tmp0 = fptrunc <8 x float> %b to <8 x half>
				%r = call <8 x half> @llvm.copysign.v8f16(<8 x half> %a, <8 x half> %tmp0)
				store <8 x half> %r, ptr %ap
				ret void
				}

				declare <8 x half> @llvm.copysign.v8f16(<8 x half> %a, <8 x half> %b) #0
				declare <16 x half> @llvm.copysign.v16f16(<16 x half> %a, <16 x half> %b) #0
				declare <32 x half> @llvm.copysign.v32f16(<32 x half> %a, <32 x half> %b) #0
				declare <64 x half> @llvm.copysign.v64f16(<64 x half> %a, <64 x half> %b) #0
				declare <128 x half> @llvm.copysign.v128f16(<128 x half> %a, <128 x half> %b) #0

				declare <2 x float> @llvm.copysign.v2f32(<2 x float> %a, <2 x float> %b) #0
				declare <4 x float> @llvm.copysign.v4f32(<4 x float> %a, <4 x float> %b) #0
				declare <8 x float> @llvm.copysign.v8f32(<8 x float> %a, <8 x float> %b) #0
				declare <16 x float> @llvm.copysign.v16f32(<16 x float> %a, <16 x float> %b) #0
				declare <32 x float> @llvm.copysign.v32f32(<32 x float> %a, <32 x float> %b) #0
				declare <64 x float> @llvm.copysign.v64f32(<64 x float> %a, <64 x float> %b) #0

				declare <2 x double> @llvm.copysign.v2f64(<2 x double> %a, <2 x double> %b) #0
				declare <4 x double> @llvm.copysign.v4f64(<4 x double> %a, <4 x double> %b) #0
				declare <8 x double> @llvm.copysign.v8f64(<8 x double> %a, <8 x double> %b) #0
				declare <16 x double> @llvm.copysign.v16f64(<16 x double> %a, <16 x double> %b) #0
				declare <32 x double> @llvm.copysign.v32f64(<32 x double> %a, <32 x double> %b) #0

				attributes #0 = { "target-features"="+sve" }

llvm/test/CodeGen/AArch64/sve2-fcopysign.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple aarch64-eabi -mattr=+sve2 -o - | FileCheck --check-prefixes=CHECK %s			; RUN: llc < %s -mtriple aarch64-eabi -mattr=+sve2 -o - | FileCheck --check-prefixes=CHECK,CHECK_NO_EXTEND_ROUND %s
				; RUN: llc < %s -mtriple aarch64-eabi -mattr=+sve2 --combiner-vector-fcopysign-extend-round -o - | FileCheck --check-prefixes=CHECK,CHECK_EXTEND_ROUND %s

	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

	;============ v2f32			;============ v2f32

	define <vscale x 2 x float> @test_copysign_v2f32_v2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {			define <vscale x 2 x float> @test_copysign_v2f32_v2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {
	; CHECK-LABEL: test_copysign_v2f32_v2f32:			; CHECK-LABEL: test_copysign_v2f32_v2f32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	Show All 31 Lines
	; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d			; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = call <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b)			%r = call <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b)
	ret <vscale x 4 x float> %r			ret <vscale x 4 x float> %r
	}			}

	; SplitVecOp #1			; SplitVecOp #1
	define <vscale x 4 x float> @test_copysign_v4f32_v4f64(<vscale x 4 x float> %a, <vscale x 4 x double> %b) #0 {			define <vscale x 4 x float> @test_copysign_v4f32_v4f64(<vscale x 4 x float> %a, <vscale x 4 x double> %b) #0 {
	; CHECK-LABEL: test_copysign_v4f32_v4f64:			; CHECK_NO_EXTEND_ROUND-LABEL: test_copysign_v4f32_v4f64:
	; CHECK: // %bb.0:			; CHECK_NO_EXTEND_ROUND: // %bb.0:
	; CHECK-NEXT: mov w8, #2147483647			; CHECK_NO_EXTEND_ROUND-NEXT: mov w8, #2147483647
	; CHECK-NEXT: ptrue p0.d			; CHECK_NO_EXTEND_ROUND-NEXT: ptrue p0.d
	; CHECK-NEXT: fcvt z2.s, p0/m, z2.d			; CHECK_NO_EXTEND_ROUND-NEXT: fcvt z2.s, p0/m, z2.d
	; CHECK-NEXT: fcvt z1.s, p0/m, z1.d			; CHECK_NO_EXTEND_ROUND-NEXT: fcvt z1.s, p0/m, z1.d
	; CHECK-NEXT: uzp1 z1.s, z1.s, z2.s			; CHECK_NO_EXTEND_ROUND-NEXT: uzp1 z1.s, z1.s, z2.s
	; CHECK-NEXT: mov z2.s, w8			; CHECK_NO_EXTEND_ROUND-NEXT: mov z2.s, w8
	; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d			; CHECK_NO_EXTEND_ROUND-NEXT: bsl z0.d, z0.d, z1.d, z2.d
	; CHECK-NEXT: ret			; CHECK_NO_EXTEND_ROUND-NEXT: ret
				;
				; CHECK_EXTEND_ROUND-LABEL: test_copysign_v4f32_v4f64:
				; CHECK_EXTEND_ROUND: // %bb.0:
				; CHECK_EXTEND_ROUND-NEXT: mov w8, #2147483647
				; CHECK_EXTEND_ROUND-NEXT: ptrue p0.d
				; CHECK_EXTEND_ROUND-NEXT: fcvt z2.s, p0/m, z2.d
				; CHECK_EXTEND_ROUND-NEXT: uunpkhi z4.d, z0.s
				; CHECK_EXTEND_ROUND-NEXT: fcvt z1.s, p0/m, z1.d
				; CHECK_EXTEND_ROUND-NEXT: uunpklo z0.d, z0.s
				; CHECK_EXTEND_ROUND-NEXT: mov z3.s, w8
				; CHECK_EXTEND_ROUND-NEXT: bsl z4.d, z4.d, z2.d, z3.d
				; CHECK_EXTEND_ROUND-NEXT: bsl z0.d, z0.d, z1.d, z3.d
				; CHECK_EXTEND_ROUND-NEXT: uzp1 z0.s, z0.s, z4.s
				; CHECK_EXTEND_ROUND-NEXT: ret
	%tmp0 = fptrunc <vscale x 4 x double> %b to <vscale x 4 x float>			%tmp0 = fptrunc <vscale x 4 x double> %b to <vscale x 4 x float>
	%r = call <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %tmp0)			%r = call <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %tmp0)
	ret <vscale x 4 x float> %r			ret <vscale x 4 x float> %r
	}			}

	declare <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0			declare <vscale x 4 x float> @llvm.copysign.v4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0

	;============ v2f64			;============ v2f64
	Show All 22 Lines
	}			}

	declare <vscale x 2 x double> @llvm.copysign.v2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0			declare <vscale x 2 x double> @llvm.copysign.v2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0

	;============ v4f64			;============ v4f64

	; SplitVecRes mismatched			; SplitVecRes mismatched
	define <vscale x 4 x double> @test_copysign_v4f64_v4f32(<vscale x 4 x double> %a, <vscale x 4 x float> %b) #0 {			define <vscale x 4 x double> @test_copysign_v4f64_v4f32(<vscale x 4 x double> %a, <vscale x 4 x float> %b) #0 {
	; CHECK-LABEL: test_copysign_v4f64_v4f32:			; CHECK_NO_EXTEND_ROUND-LABEL: test_copysign_v4f64_v4f32:
	; CHECK: // %bb.0:			; CHECK_NO_EXTEND_ROUND: // %bb.0:
	; CHECK-NEXT: ptrue p0.d			; CHECK_NO_EXTEND_ROUND-NEXT: ptrue p0.d
	; CHECK-NEXT: uunpkhi z3.d, z2.s			; CHECK_NO_EXTEND_ROUND-NEXT: uunpkhi z3.d, z2.s
	; CHECK-NEXT: uunpklo z2.d, z2.s			; CHECK_NO_EXTEND_ROUND-NEXT: uunpklo z2.d, z2.s
	; CHECK-NEXT: fcvt z3.d, p0/m, z3.s			; CHECK_NO_EXTEND_ROUND-NEXT: fcvt z3.d, p0/m, z3.s
	; CHECK-NEXT: fcvt z2.d, p0/m, z2.s			; CHECK_NO_EXTEND_ROUND-NEXT: fcvt z2.d, p0/m, z2.s
	; CHECK-NEXT: mov z4.d, #0x7fffffffffffffff			; CHECK_NO_EXTEND_ROUND-NEXT: mov z4.d, #0x7fffffffffffffff
	; CHECK-NEXT: bsl z0.d, z0.d, z2.d, z4.d			; CHECK_NO_EXTEND_ROUND-NEXT: bsl z0.d, z0.d, z2.d, z4.d
	; CHECK-NEXT: bsl z1.d, z1.d, z3.d, z4.d			; CHECK_NO_EXTEND_ROUND-NEXT: bsl z1.d, z1.d, z3.d, z4.d
	; CHECK-NEXT: ret			; CHECK_NO_EXTEND_ROUND-NEXT: ret
				;
				; CHECK_EXTEND_ROUND-LABEL: test_copysign_v4f64_v4f32:
				; CHECK_EXTEND_ROUND: // %bb.0:
				; CHECK_EXTEND_ROUND-NEXT: ptrue p0.d
				; CHECK_EXTEND_ROUND-NEXT: uunpklo z3.d, z2.s
				; CHECK_EXTEND_ROUND-NEXT: uunpkhi z2.d, z2.s
				; CHECK_EXTEND_ROUND-NEXT: fcvt z3.d, p0/m, z3.s
				; CHECK_EXTEND_ROUND-NEXT: mov z4.d, #0x7fffffffffffffff
				; CHECK_EXTEND_ROUND-NEXT: fcvt z2.d, p0/m, z2.s
				; CHECK_EXTEND_ROUND-NEXT: bsl z0.d, z0.d, z3.d, z4.d
				; CHECK_EXTEND_ROUND-NEXT: bsl z1.d, z1.d, z2.d, z4.d
				; CHECK_EXTEND_ROUND-NEXT: ret
	%tmp0 = fpext <vscale x 4 x float> %b to <vscale x 4 x double>			%tmp0 = fpext <vscale x 4 x float> %b to <vscale x 4 x double>
	%r = call <vscale x 4 x double> @llvm.copysign.v4f64(<vscale x 4 x double> %a, <vscale x 4 x double> %tmp0)			%r = call <vscale x 4 x double> @llvm.copysign.v4f64(<vscale x 4 x double> %a, <vscale x 4 x double> %tmp0)
	ret <vscale x 4 x double> %r			ret <vscale x 4 x double> %r
	}			}

	; SplitVecRes same			; SplitVecRes same
	define <vscale x 4 x double> @test_copysign_v4f64_v4f64(<vscale x 4 x double> %a, <vscale x 4 x double> %b) #0 {			define <vscale x 4 x double> @test_copysign_v4f64_v4f64(<vscale x 4 x double> %a, <vscale x 4 x double> %b) #0 {
	; CHECK-LABEL: test_copysign_v4f64_v4f64:			; CHECK-LABEL: test_copysign_v4f64_v4f64:
	Show All 31 Lines
	; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d			; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp0 = fptrunc <vscale x 4 x float> %b to <vscale x 4 x half>			%tmp0 = fptrunc <vscale x 4 x float> %b to <vscale x 4 x half>
	%r = call <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %tmp0)			%r = call <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %tmp0)
	ret <vscale x 4 x half> %r			ret <vscale x 4 x half> %r
	}			}

	define <vscale x 4 x half> @test_copysign_v4f16_v4f64(<vscale x 4 x half> %a, <vscale x 4 x double> %b) #0 {			define <vscale x 4 x half> @test_copysign_v4f16_v4f64(<vscale x 4 x half> %a, <vscale x 4 x double> %b) #0 {
	; CHECK-LABEL: test_copysign_v4f16_v4f64:			; CHECK_NO_EXTEND_ROUND-LABEL: test_copysign_v4f16_v4f64:
	; CHECK: // %bb.0:			; CHECK_NO_EXTEND_ROUND: // %bb.0:
	; CHECK-NEXT: mov w8, #32767			; CHECK_NO_EXTEND_ROUND-NEXT: mov w8, #32767
	; CHECK-NEXT: ptrue p0.d			; CHECK_NO_EXTEND_ROUND-NEXT: ptrue p0.d
	; CHECK-NEXT: fcvt z2.h, p0/m, z2.d			; CHECK_NO_EXTEND_ROUND-NEXT: fcvt z2.h, p0/m, z2.d
	; CHECK-NEXT: fcvt z1.h, p0/m, z1.d			; CHECK_NO_EXTEND_ROUND-NEXT: fcvt z1.h, p0/m, z1.d
	; CHECK-NEXT: uzp1 z1.s, z1.s, z2.s			; CHECK_NO_EXTEND_ROUND-NEXT: uzp1 z1.s, z1.s, z2.s
	; CHECK-NEXT: mov z2.h, w8			; CHECK_NO_EXTEND_ROUND-NEXT: mov z2.h, w8
	; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d			; CHECK_NO_EXTEND_ROUND-NEXT: bsl z0.d, z0.d, z1.d, z2.d
	; CHECK-NEXT: ret			; CHECK_NO_EXTEND_ROUND-NEXT: ret
				;
				; CHECK_EXTEND_ROUND-LABEL: test_copysign_v4f16_v4f64:
				; CHECK_EXTEND_ROUND: // %bb.0:
				; CHECK_EXTEND_ROUND-NEXT: mov w8, #32767
				; CHECK_EXTEND_ROUND-NEXT: ptrue p0.d
				; CHECK_EXTEND_ROUND-NEXT: fcvt z2.h, p0/m, z2.d
				; CHECK_EXTEND_ROUND-NEXT: uunpkhi z4.d, z0.s
				; CHECK_EXTEND_ROUND-NEXT: fcvt z1.h, p0/m, z1.d
				; CHECK_EXTEND_ROUND-NEXT: uunpklo z0.d, z0.s
				; CHECK_EXTEND_ROUND-NEXT: mov z3.h, w8
				; CHECK_EXTEND_ROUND-NEXT: bsl z4.d, z4.d, z2.d, z3.d
				; CHECK_EXTEND_ROUND-NEXT: bsl z0.d, z0.d, z1.d, z3.d
				; CHECK_EXTEND_ROUND-NEXT: uzp1 z0.s, z0.s, z4.s
				; CHECK_EXTEND_ROUND-NEXT: ret
	%tmp0 = fptrunc <vscale x 4 x double> %b to <vscale x 4 x half>			%tmp0 = fptrunc <vscale x 4 x double> %b to <vscale x 4 x half>
	%r = call <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %tmp0)			%r = call <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %tmp0)
	ret <vscale x 4 x half> %r			ret <vscale x 4 x half> %r
	}			}

	declare <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0			declare <vscale x 4 x half> @llvm.copysign.v4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0

	;============ v8f16			;============ v8f16

	define <vscale x 8 x half> @test_copysign_v8f16_v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {			define <vscale x 8 x half> @test_copysign_v8f16_v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
	; CHECK-LABEL: test_copysign_v8f16_v8f16:			; CHECK-LABEL: test_copysign_v8f16_v8f16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov w8, #32767			; CHECK-NEXT: mov w8, #32767
	; CHECK-NEXT: mov z2.h, w8			; CHECK-NEXT: mov z2.h, w8
	; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d			; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = call <vscale x 8 x half> @llvm.copysign.v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b)			%r = call <vscale x 8 x half> @llvm.copysign.v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b)
	ret <vscale x 8 x half> %r			ret <vscale x 8 x half> %r
	}			}

	define <vscale x 8 x half> @test_copysign_v8f16_v8f32(<vscale x 8 x half> %a, <vscale x 8 x float> %b) #0 {			define <vscale x 8 x half> @test_copysign_v8f16_v8f32(<vscale x 8 x half> %a, <vscale x 8 x float> %b) #0 {
	; CHECK-LABEL: test_copysign_v8f16_v8f32:			; CHECK_NO_EXTEND_ROUND-LABEL: test_copysign_v8f16_v8f32:
	; CHECK: // %bb.0:			; CHECK_NO_EXTEND_ROUND: // %bb.0:
	; CHECK-NEXT: mov w8, #32767			; CHECK_NO_EXTEND_ROUND-NEXT: mov w8, #32767
	; CHECK-NEXT: ptrue p0.s			; CHECK_NO_EXTEND_ROUND-NEXT: ptrue p0.s
	; CHECK-NEXT: fcvt z2.h, p0/m, z2.s			; CHECK_NO_EXTEND_ROUND-NEXT: fcvt z2.h, p0/m, z2.s
	; CHECK-NEXT: fcvt z1.h, p0/m, z1.s			; CHECK_NO_EXTEND_ROUND-NEXT: fcvt z1.h, p0/m, z1.s
	; CHECK-NEXT: uzp1 z1.h, z1.h, z2.h			; CHECK_NO_EXTEND_ROUND-NEXT: uzp1 z1.h, z1.h, z2.h
	; CHECK-NEXT: mov z2.h, w8			; CHECK_NO_EXTEND_ROUND-NEXT: mov z2.h, w8
	; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d			; CHECK_NO_EXTEND_ROUND-NEXT: bsl z0.d, z0.d, z1.d, z2.d
	; CHECK-NEXT: ret			; CHECK_NO_EXTEND_ROUND-NEXT: ret
				;
				; CHECK_EXTEND_ROUND-LABEL: test_copysign_v8f16_v8f32:
				; CHECK_EXTEND_ROUND: // %bb.0:
				; CHECK_EXTEND_ROUND-NEXT: mov w8, #32767
				; CHECK_EXTEND_ROUND-NEXT: ptrue p0.s
				; CHECK_EXTEND_ROUND-NEXT: fcvt z2.h, p0/m, z2.s
				; CHECK_EXTEND_ROUND-NEXT: uunpkhi z4.s, z0.h
				; CHECK_EXTEND_ROUND-NEXT: fcvt z1.h, p0/m, z1.s
				; CHECK_EXTEND_ROUND-NEXT: uunpklo z0.s, z0.h
				; CHECK_EXTEND_ROUND-NEXT: mov z3.h, w8
				; CHECK_EXTEND_ROUND-NEXT: bsl z4.d, z4.d, z2.d, z3.d
				; CHECK_EXTEND_ROUND-NEXT: bsl z0.d, z0.d, z1.d, z3.d
				; CHECK_EXTEND_ROUND-NEXT: uzp1 z0.h, z0.h, z4.h
				; CHECK_EXTEND_ROUND-NEXT: ret
	%tmp0 = fptrunc <vscale x 8 x float> %b to <vscale x 8 x half>			%tmp0 = fptrunc <vscale x 8 x float> %b to <vscale x 8 x half>
	%r = call <vscale x 8 x half> @llvm.copysign.v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %tmp0)			%r = call <vscale x 8 x half> @llvm.copysign.v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %tmp0)
	ret <vscale x 8 x half> %r			ret <vscale x 8 x half> %r
	}			}

	declare <vscale x 8 x half> @llvm.copysign.v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0			declare <vscale x 8 x half> @llvm.copysign.v8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/test/CodeGen/AArch64/sve2-fixed-length-fcopysign.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_256,CHECK_NO_EXTEND_ROUND
				; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,CHECK_NO_EXTEND_ROUND
				; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,CHECK_NO_EXTEND_ROUND
				; RUN: llc -aarch64-sve-vector-bits-min=256 --combiner-vector-fcopysign-extend-round < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_256,CHECK_EXTEND_ROUND
				; RUN: llc -aarch64-sve-vector-bits-min=512 --combiner-vector-fcopysign-extend-round < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,CHECK_EXTEND_ROUND
				; RUN: llc -aarch64-sve-vector-bits-min=2048 --combiner-vector-fcopysign-extend-round < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,CHECK_EXTEND_ROUND


				target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

				target triple = "aarch64-unknown-linux-gnu"

				;============ f16

				define void @test_copysign_v4f16_v4f16(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f16_v4f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: mvni v2.4h, #128, lsl #8
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x half>, ptr %ap
				%b = load <4 x half>, ptr %bp
				%r = call <4 x half> @llvm.copysign.v4f16(<4 x half> %a, <4 x half> %b)
				store <4 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v8f16_v8f16(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v8f16_v8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ldr q1, [x1]
				; CHECK-NEXT: mvni v2.8h, #128, lsl #8
				; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <8 x half>, ptr %ap
				%b = load <8 x half>, ptr %bp
				%r = call <8 x half> @llvm.copysign.v8f16(<8 x half> %a, <8 x half> %b)
				store <8 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v16f16_v16f16(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v16f16_v16f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h, vl16
				; CHECK-NEXT: mov w8, #32767
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.h }, p0/z, [x1]
				; CHECK-NEXT: mov z2.h, w8
				; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <16 x half>, ptr %ap
				%b = load <16 x half>, ptr %bp
				%r = call <16 x half> @llvm.copysign.v16f16(<16 x half> %a, <16 x half> %b)
				store <16 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v32f16_v32f16(ptr %ap, ptr %bp) #0 {
				; VBITS_GE_256-LABEL: test_copysign_v32f16_v32f16:
				; VBITS_GE_256: // %bb.0:
				; VBITS_GE_256-NEXT: mov x8, #16
				; VBITS_GE_256-NEXT: ptrue p0.h, vl16
				; VBITS_GE_256-NEXT: mov w9, #32767
				; VBITS_GE_256-NEXT: ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
				; VBITS_GE_256-NEXT: ld1h { z1.h }, p0/z, [x0]
				; VBITS_GE_256-NEXT: ld1h { z2.h }, p0/z, [x1, x8, lsl #1]
				; VBITS_GE_256-NEXT: ld1h { z3.h }, p0/z, [x1]
				; VBITS_GE_256-NEXT: mov z4.h, w9
				; VBITS_GE_256-NEXT: bsl z0.d, z0.d, z2.d, z4.d
				; VBITS_GE_256-NEXT: bsl z1.d, z1.d, z3.d, z4.d
				; VBITS_GE_256-NEXT: st1h { z0.h }, p0, [x0, x8, lsl #1]
				; VBITS_GE_256-NEXT: st1h { z1.h }, p0, [x0]
				; VBITS_GE_256-NEXT: ret
				;
				; VBITS_GE_512-LABEL: test_copysign_v32f16_v32f16:
				; VBITS_GE_512: // %bb.0:
				; VBITS_GE_512-NEXT: ptrue p0.h, vl32
				; VBITS_GE_512-NEXT: mov w8, #32767
				; VBITS_GE_512-NEXT: ld1h { z0.h }, p0/z, [x0]
				; VBITS_GE_512-NEXT: ld1h { z1.h }, p0/z, [x1]
				; VBITS_GE_512-NEXT: mov z2.h, w8
				; VBITS_GE_512-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; VBITS_GE_512-NEXT: st1h { z0.h }, p0, [x0]
				; VBITS_GE_512-NEXT: ret
				%a = load <32 x half>, ptr %ap
				%b = load <32 x half>, ptr %bp
				%r = call <32 x half> @llvm.copysign.v32f16(<32 x half> %a, <32 x half> %b)
				store <32 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v64f16_v64f16(ptr %ap, ptr %bp) vscale_range(8,0) #0 {
				; CHECK-LABEL: test_copysign_v64f16_v64f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h, vl64
				; CHECK-NEXT: mov w8, #32767
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.h }, p0/z, [x1]
				; CHECK-NEXT: mov z2.h, w8
				; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <64 x half>, ptr %ap
				%b = load <64 x half>, ptr %bp
				%r = call <64 x half> @llvm.copysign.v64f16(<64 x half> %a, <64 x half> %b)
				store <64 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v128f16_v128f16(ptr %ap, ptr %bp) vscale_range(16,0) #0 {
				; CHECK-LABEL: test_copysign_v128f16_v128f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h, vl128
				; CHECK-NEXT: mov w8, #32767
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.h }, p0/z, [x1]
				; CHECK-NEXT: mov z2.h, w8
				; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <128 x half>, ptr %ap
				%b = load <128 x half>, ptr %bp
				%r = call <128 x half> @llvm.copysign.v128f16(<128 x half> %a, <128 x half> %b)
				store <128 x half> %r, ptr %ap
				ret void
				}

				;============ f32

				define void @test_copysign_v2f32_v2f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v2f32_v2f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: mvni v2.2s, #128, lsl #24
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%a = load <2 x float>, ptr %ap
				%b = load <2 x float>, ptr %bp
				%r = call <2 x float> @llvm.copysign.v2f32(<2 x float> %a, <2 x float> %b)
				store <2 x float> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v4f32_v4f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f32_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ldr q1, [x1]
				; CHECK-NEXT: mvni v2.4s, #128, lsl #24
				; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x float>, ptr %ap
				%b = load <4 x float>, ptr %bp
				%r = call <4 x float> @llvm.copysign.v4f32(<4 x float> %a, <4 x float> %b)
				store <4 x float> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v8f32_v8f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v8f32_v8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl8
				; CHECK-NEXT: mov w8, #2147483647
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x1]
				; CHECK-NEXT: mov z2.s, w8
				; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <8 x float>, ptr %ap
				%b = load <8 x float>, ptr %bp
				%r = call <8 x float> @llvm.copysign.v8f32(<8 x float> %a, <8 x float> %b)
				store <8 x float> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v16f32_v16f32(ptr %ap, ptr %bp) #0 {
				; VBITS_GE_256-LABEL: test_copysign_v16f32_v16f32:
				; VBITS_GE_256: // %bb.0:
				; VBITS_GE_256-NEXT: mov x8, #8
				; VBITS_GE_256-NEXT: ptrue p0.s, vl8
				; VBITS_GE_256-NEXT: mov w9, #2147483647
				; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
				; VBITS_GE_256-NEXT: ld1w { z1.s }, p0/z, [x0]
				; VBITS_GE_256-NEXT: ld1w { z2.s }, p0/z, [x1, x8, lsl #2]
				; VBITS_GE_256-NEXT: ld1w { z3.s }, p0/z, [x1]
				; VBITS_GE_256-NEXT: mov z4.s, w9
				; VBITS_GE_256-NEXT: bsl z0.d, z0.d, z2.d, z4.d
				; VBITS_GE_256-NEXT: bsl z1.d, z1.d, z3.d, z4.d
				; VBITS_GE_256-NEXT: st1w { z0.s }, p0, [x0, x8, lsl #2]
				; VBITS_GE_256-NEXT: st1w { z1.s }, p0, [x0]
				; VBITS_GE_256-NEXT: ret
				;
				; VBITS_GE_512-LABEL: test_copysign_v16f32_v16f32:
				; VBITS_GE_512: // %bb.0:
				; VBITS_GE_512-NEXT: ptrue p0.s, vl16
				; VBITS_GE_512-NEXT: mov w8, #2147483647
				; VBITS_GE_512-NEXT: ld1w { z0.s }, p0/z, [x0]
				; VBITS_GE_512-NEXT: ld1w { z1.s }, p0/z, [x1]
				; VBITS_GE_512-NEXT: mov z2.s, w8
				; VBITS_GE_512-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; VBITS_GE_512-NEXT: st1w { z0.s }, p0, [x0]
				; VBITS_GE_512-NEXT: ret
				%a = load <16 x float>, ptr %ap
				%b = load <16 x float>, ptr %bp
				%r = call <16 x float> @llvm.copysign.v16f32(<16 x float> %a, <16 x float> %b)
				store <16 x float> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v32f32_v32f32(ptr %ap, ptr %bp) vscale_range(8,0) #0 {
				; CHECK-LABEL: test_copysign_v32f32_v32f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl32
				; CHECK-NEXT: mov w8, #2147483647
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x1]
				; CHECK-NEXT: mov z2.s, w8
				; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <32 x float>, ptr %ap
				%b = load <32 x float>, ptr %bp
				%r = call <32 x float> @llvm.copysign.v32f32(<32 x float> %a, <32 x float> %b)
				store <32 x float> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v64f32_v64f32(ptr %ap, ptr %bp) vscale_range(16,0) #0 {
				; CHECK-LABEL: test_copysign_v64f32_v64f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl64
				; CHECK-NEXT: mov w8, #2147483647
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x1]
				; CHECK-NEXT: mov z2.s, w8
				; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <64 x float>, ptr %ap
				%b = load <64 x float>, ptr %bp
				%r = call <64 x float> @llvm.copysign.v64f32(<64 x float> %a, <64 x float> %b)
				store <64 x float> %r, ptr %ap
				ret void
				}

				;============ f64

				define void @test_copysign_v2f64_v2f64(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v2f64_v2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: movi v0.2d, #0xffffffffffffffff
				; CHECK-NEXT: ldr q1, [x0]
				; CHECK-NEXT: ldr q2, [x1]
				; CHECK-NEXT: fneg v0.2d, v0.2d
				; CHECK-NEXT: bsl v0.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <2 x double>, ptr %ap
				%b = load <2 x double>, ptr %bp
				%r = call <2 x double> @llvm.copysign.v2f64(<2 x double> %a, <2 x double> %b)
				store <2 x double> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v4f64_v4f64(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f64_v4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d, vl4
				; CHECK-NEXT: mov z2.d, #0x7fffffffffffffff
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1d { z1.d }, p0/z, [x1]
				; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x double>, ptr %ap
				%b = load <4 x double>, ptr %bp
				%r = call <4 x double> @llvm.copysign.v4f64(<4 x double> %a, <4 x double> %b)
				store <4 x double> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v8f64_v8f64(ptr %ap, ptr %bp) #0 {
				; VBITS_GE_256-LABEL: test_copysign_v8f64_v8f64:
				; VBITS_GE_256: // %bb.0:
				; VBITS_GE_256-NEXT: mov x8, #4
				; VBITS_GE_256-NEXT: ptrue p0.d, vl4
				; VBITS_GE_256-NEXT: mov z4.d, #0x7fffffffffffffff
				; VBITS_GE_256-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
				; VBITS_GE_256-NEXT: ld1d { z1.d }, p0/z, [x0]
				; VBITS_GE_256-NEXT: ld1d { z2.d }, p0/z, [x1, x8, lsl #3]
				; VBITS_GE_256-NEXT: ld1d { z3.d }, p0/z, [x1]
				; VBITS_GE_256-NEXT: bsl z0.d, z0.d, z2.d, z4.d
				; VBITS_GE_256-NEXT: bsl z1.d, z1.d, z3.d, z4.d
				; VBITS_GE_256-NEXT: st1d { z0.d }, p0, [x0, x8, lsl #3]
				; VBITS_GE_256-NEXT: st1d { z1.d }, p0, [x0]
				; VBITS_GE_256-NEXT: ret
				;
				; VBITS_GE_512-LABEL: test_copysign_v8f64_v8f64:
				; VBITS_GE_512: // %bb.0:
				; VBITS_GE_512-NEXT: ptrue p0.d, vl8
				; VBITS_GE_512-NEXT: mov z2.d, #0x7fffffffffffffff
				; VBITS_GE_512-NEXT: ld1d { z0.d }, p0/z, [x0]
				; VBITS_GE_512-NEXT: ld1d { z1.d }, p0/z, [x1]
				; VBITS_GE_512-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; VBITS_GE_512-NEXT: st1d { z0.d }, p0, [x0]
				; VBITS_GE_512-NEXT: ret
				%a = load <8 x double>, ptr %ap
				%b = load <8 x double>, ptr %bp
				%r = call <8 x double> @llvm.copysign.v8f64(<8 x double> %a, <8 x double> %b)
				store <8 x double> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v16f64_v16f64(ptr %ap, ptr %bp) vscale_range(8,0) #0 {
				; CHECK-LABEL: test_copysign_v16f64_v16f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d, vl16
				; CHECK-NEXT: mov z2.d, #0x7fffffffffffffff
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1d { z1.d }, p0/z, [x1]
				; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <16 x double>, ptr %ap
				%b = load <16 x double>, ptr %bp
				%r = call <16 x double> @llvm.copysign.v16f64(<16 x double> %a, <16 x double> %b)
				store <16 x double> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v32f64_v32f64(ptr %ap, ptr %bp) vscale_range(16,0) #0 {
				; CHECK-LABEL: test_copysign_v32f64_v32f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d, vl32
				; CHECK-NEXT: mov z2.d, #0x7fffffffffffffff
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1d { z1.d }, p0/z, [x1]
				; CHECK-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK-NEXT: ret
				%a = load <32 x double>, ptr %ap
				%b = load <32 x double>, ptr %bp
				%r = call <32 x double> @llvm.copysign.v32f64(<32 x double> %a, <32 x double> %b)
				store <32 x double> %r, ptr %ap
				ret void
				}

				;============ v2f32

				define void @test_copysign_v2f32_v2f64(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v2f32_v2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x1]
				; CHECK-NEXT: mvni v2.2s, #128, lsl #24
				; CHECK-NEXT: ldr d1, [x0]
				; CHECK-NEXT: fcvtn v0.2s, v0.2d
				; CHECK-NEXT: bit v0.8b, v1.8b, v2.8b
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%a = load <2 x float>, ptr %ap
				%b = load <2 x double>, ptr %bp
				%tmp0 = fptrunc <2 x double> %b to <2 x float>
				%r = call <2 x float> @llvm.copysign.v2f32(<2 x float> %a, <2 x float> %tmp0)
				store <2 x float> %r, ptr %ap
				ret void
				}

				;============ v4f32

				; SplitVecOp #1
				define void @test_copysign_v4f32_v4f64(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f32_v4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d, vl4
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ld1d { z1.d }, p0/z, [x1]
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mvni v2.4s, #128, lsl #24
				; CHECK-NEXT: fcvt z1.s, p0/m, z1.d
				; CHECK-NEXT: uzp1 z1.s, z1.s, z1.s
				; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x float>, ptr %ap
				%b = load <4 x double>, ptr %bp
				%tmp0 = fptrunc <4 x double> %b to <4 x float>
				%r = call <4 x float> @llvm.copysign.v4f32(<4 x float> %a, <4 x float> %tmp0)
				store <4 x float> %r, ptr %ap
				ret void
				}

				;============ v2f64

				define void @test_copysign_v2f64_v2f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v2f64_v2f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: movi v0.2d, #0xffffffffffffffff
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: ldr q2, [x0]
				; CHECK-NEXT: fcvtl v1.2d, v1.2s
				; CHECK-NEXT: fneg v0.2d, v0.2d
				; CHECK-NEXT: bsl v0.16b, v2.16b, v1.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <2 x double>, ptr %ap
				%b = load <2 x float>, ptr %bp
				%tmp0 = fpext <2 x float> %b to <2 x double>
				%r = call <2 x double> @llvm.copysign.v2f64(<2 x double> %a, <2 x double> %tmp0)
				store <2 x double> %r, ptr %ap
				ret void
				}

				;============ v4f64

				; SplitVecRes mismatched
				define void @test_copysign_v4f64_v4f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK_NO_EXTEND_ROUND-LABEL: test_copysign_v4f64_v4f32:
				; CHECK_NO_EXTEND_ROUND: // %bb.0:
				; CHECK_NO_EXTEND_ROUND-NEXT: ptrue p0.d, vl4
				; CHECK_NO_EXTEND_ROUND-NEXT: mov z2.d, #0x7fffffffffffffff
				; CHECK_NO_EXTEND_ROUND-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK_NO_EXTEND_ROUND-NEXT: ld1w { z1.d }, p0/z, [x1]
				; CHECK_NO_EXTEND_ROUND-NEXT: fcvt z1.d, p0/m, z1.s
				; CHECK_NO_EXTEND_ROUND-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK_NO_EXTEND_ROUND-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK_NO_EXTEND_ROUND-NEXT: ret
				;
				; CHECK_EXTEND_ROUND-LABEL: test_copysign_v4f64_v4f32:
				; CHECK_EXTEND_ROUND: // %bb.0:
				; CHECK_EXTEND_ROUND-NEXT: ptrue p0.d, vl4
				; CHECK_EXTEND_ROUND-NEXT: mov z2.d, #0x7fffffffffffffff
				; CHECK_EXTEND_ROUND-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK_EXTEND_ROUND-NEXT: ldr q1, [x1]
				; CHECK_EXTEND_ROUND-NEXT: uunpklo z1.d, z1.s
				; CHECK_EXTEND_ROUND-NEXT: fcvt z1.d, p0/m, z1.s
				; CHECK_EXTEND_ROUND-NEXT: bsl z0.d, z0.d, z1.d, z2.d
				; CHECK_EXTEND_ROUND-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK_EXTEND_ROUND-NEXT: ret
				%a = load <4 x double>, ptr %ap
				%b = load <4 x float>, ptr %bp
				%tmp0 = fpext <4 x float> %b to <4 x double>
				%r = call <4 x double> @llvm.copysign.v4f64(<4 x double> %a, <4 x double> %tmp0)
				store <4 x double> %r, ptr %ap
				ret void
				}

				;============ v4f16

				define void @test_copysign_v4f16_v4f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f16_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x1]
				; CHECK-NEXT: mvni v2.4h, #128, lsl #8
				; CHECK-NEXT: ldr d1, [x0]
				; CHECK-NEXT: fcvtn v0.4h, v0.4s
				; CHECK-NEXT: bit v0.8b, v1.8b, v2.8b
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x half>, ptr %ap
				%b = load <4 x float>, ptr %bp
				%tmp0 = fptrunc <4 x float> %b to <4 x half>
				%r = call <4 x half> @llvm.copysign.v4f16(<4 x half> %a, <4 x half> %tmp0)
				store <4 x half> %r, ptr %ap
				ret void
				}

				define void @test_copysign_v4f16_v4f64(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v4f16_v4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d, vl4
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: ld1d { z1.d }, p0/z, [x1]
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mvni v2.4h, #128, lsl #8
				; CHECK-NEXT: fcvt z1.h, p0/m, z1.d
				; CHECK-NEXT: uzp1 z1.s, z1.s, z1.s
				; CHECK-NEXT: uzp1 z1.h, z1.h, z1.h
				; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%a = load <4 x half>, ptr %ap
				%b = load <4 x double>, ptr %bp
				%tmp0 = fptrunc <4 x double> %b to <4 x half>
				%r = call <4 x half> @llvm.copysign.v4f16(<4 x half> %a, <4 x half> %tmp0)
				store <4 x half> %r, ptr %ap
				ret void
				}

				declare <4 x half> @llvm.copysign.v4f16(<4 x half> %a, <4 x half> %b) #0

				;============ v8f16

				define void @test_copysign_v8f16_v8f32(ptr %ap, ptr %bp) vscale_range(2,0) #0 {
				; CHECK-LABEL: test_copysign_v8f16_v8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl8
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x1]
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mvni v2.8h, #128, lsl #8
				; CHECK-NEXT: fcvt z1.h, p0/m, z1.s
				; CHECK-NEXT: uzp1 z1.h, z1.h, z1.h
				; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%a = load <8 x half>, ptr %ap
				%b = load <8 x float>, ptr %bp
				%tmp0 = fptrunc <8 x float> %b to <8 x half>
				%r = call <8 x half> @llvm.copysign.v8f16(<8 x half> %a, <8 x half> %tmp0)
				store <8 x half> %r, ptr %ap
				ret void
				}

				declare <8 x half> @llvm.copysign.v8f16(<8 x half> %a, <8 x half> %b) #0
				declare <16 x half> @llvm.copysign.v16f16(<16 x half> %a, <16 x half> %b) #0
				declare <32 x half> @llvm.copysign.v32f16(<32 x half> %a, <32 x half> %b) #0
				declare <64 x half> @llvm.copysign.v64f16(<64 x half> %a, <64 x half> %b) #0
				declare <128 x half> @llvm.copysign.v128f16(<128 x half> %a, <128 x half> %b) #0

				declare <2 x float> @llvm.copysign.v2f32(<2 x float> %a, <2 x float> %b) #0
				declare <4 x float> @llvm.copysign.v4f32(<4 x float> %a, <4 x float> %b) #0
				declare <8 x float> @llvm.copysign.v8f32(<8 x float> %a, <8 x float> %b) #0
				declare <16 x float> @llvm.copysign.v16f32(<16 x float> %a, <16 x float> %b) #0
				declare <32 x float> @llvm.copysign.v32f32(<32 x float> %a, <32 x float> %b) #0
				declare <64 x float> @llvm.copysign.v64f32(<64 x float> %a, <64 x float> %b) #0

				declare <2 x double> @llvm.copysign.v2f64(<2 x double> %a, <2 x double> %b) #0
				declare <4 x double> @llvm.copysign.v4f64(<4 x double> %a, <4 x double> %b) #0
				declare <8 x double> @llvm.copysign.v8f64(<8 x double> %a, <8 x double> %b) #0
				declare <16 x double> @llvm.copysign.v16f64(<16 x double> %a, <16 x double> %b) #0
				declare <32 x double> @llvm.copysign.v32f64(<32 x double> %a, <32 x double> %b) #0

				attributes #0 = { "target-features"="+sve2" }