This is an archive of the discontinued LLVM Phabricator instance.

Differential D152005

[SVE ACLE] Implement IR combines to convert intrinsics used for _m C/C++ builtins
ClosedPublic

Authored by jolanta.jensen on Jun 2 2023, 9:50 AM.

Details

Reviewers

mgabka
paulwalker-arm

Commits

rGecb07f481be9: [SVE ACLE] Implement IR combines to convert intrinsics used for _m C/C++…

Summary

This patch implements IR combines to convert intrinsics used for _m C/C++ builtins
which take an all active predicate to their equivalent _u intrinsic.
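
The equivalence the summary relies on can be sketched with a toy lane-wise model. This is not LLVM code: `Lanes`, `addMerging` and `addUndef` are hypothetical names, and `std::nullopt` is only an assumed way to represent a lane whose value is unspecified. Under those assumptions, the merging form and the _u form agree whenever every lane of the predicate is active.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy lane-wise model, not LLVM code. A lane holding std::nullopt stands for a
// value that the operation leaves unspecified.
using Lanes = std::vector<std::optional<int>>;

// Models a merging operation such as sve.add: inactive lanes take the value
// of the first data operand.
Lanes addMerging(const std::vector<bool> &Pg, const std::vector<int> &A,
                 const std::vector<int> &B) {
  assert(Pg.size() == A.size() && A.size() == B.size());
  Lanes R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = Pg[I] ? A[I] + B[I] : A[I];
  return R;
}

// Models a _u operation such as sve.add_u: inactive lanes stay unspecified.
Lanes addUndef(const std::vector<bool> &Pg, const std::vector<int> &A,
               const std::vector<int> &B) {
  assert(Pg.size() == A.size() && A.size() == B.size());
  Lanes R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    if (Pg[I])
      R[I] = A[I] + B[I];
  return R;
}
```

With a partially active predicate the two results differ in the inactive lanes, which is why the combine is restricted to an all active predicate.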

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jolanta.jensen created this revision. Jun 2 2023, 9:50 AM

Herald added a project: Restricted Project. Jun 2 2023, 9:50 AM

Herald added subscribers: hiraditya, tschuett.

jolanta.jensen requested review of this revision. Jun 2 2023, 9:50 AM

Herald added a project: Restricted Project. Jun 2 2023, 9:50 AM

Herald added a subscriber: llvm-commits. Jun 2 2023, 9:50 AM

Harbormaster completed remote builds in B236199: Diff 527886. Jun 2 2023, 10:27 AM

Matt added a subscriber: Matt. Jun 2 2023, 3:35 PM

paulwalker-arm added inline comments. Jun 5 2023, 5:11 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1399–1418	Rather than have nested switch statements perhaps you can pass the new `IntrinsicID` into this function? Since both this function and `instCombineSVEAllActive3VA` don't change the original call's operands, I think you should be able to make use of II.args(). I'm not sure if there's a variant of `CreateIntrinsic` that'll take this directly but if not, you should be able to create a `SmallVector` from it and pass that to `CreateIntrinsic`.
1406–1407	`InstCombiner` comes with its own `IRBuilder` already set up. It should be as simple as replacing your new `Builder.Create...` lines with `IC.Builder.Create...`. You'll see the existing SVE functions have been updated similarly.

Review comments addressed.
Corrected a bug where arguments to fused intrinsics were unnecessarily swapped.

Still missing conversions for mul, fmul, add, fadd, sub and fsub to their _u variants.

jolanta.jensen added inline comments. Jun 6 2023, 10:05 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1399–1418	Fixed. So much better now :)
1406–1407	Changed to use InstCombiner's own IRBuilder.

Harbormaster completed remote builds in B236987: Diff 528915. Jun 6 2023, 11:22 AM

I think it would be good to explain in the commit message why conversion to _u is a good thing, otherwise it is not clear why we are doing it.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1406–1410	This isn't efficient (you are copying element by element and doing an unnecessary clear), and it is a lot of lines of code. SmallVector has a dedicated constructor that takes an iterator_range and creates a vector from it; please use it instead.
1783–1842	It would be good if the order of intrinsics here matched the order in the test file, so it is easier to check that we have full coverage.
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-m-to-x.ll
1 ↗	(On Diff #528915)	I am not a big fan of the name of this file. It suggests that intrinsics with an _x or _m suffix are being used, but there are none. I think it would also be good to sort the tests in alphabetical order by the name of the intrinsic being tested; that would make navigating this long file easier.
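
The point made for lines 1406–1410 can be sketched with the standard library. This is only an analogy: `std::vector` stands in for `SmallVector`, its iterator-pair constructor stands in for the `iterator_range` constructor mentioned in the comment, and both function names are hypothetical.

```cpp
#include <vector>

// The criticised pattern: clear a fresh vector, then copy one element at a
// time.
std::vector<int> copyByLoop(const std::vector<int> &Args) {
  std::vector<int> Out;
  Out.clear(); // Redundant: a freshly constructed vector is already empty.
  for (int A : Args)
    Out.push_back(A);
  return Out;
}

// The suggested pattern: build the copy directly from the range.
std::vector<int> copyByRange(const std::vector<int> &Args) {
  return std::vector<int>(Args.begin(), Args.end());
}
```

Both functions return the same contents; the second states the intent in a single expression.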

jolanta.jensen added inline comments. Jun 7 2023, 8:55 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1783–1842	I think they are, but I could have made a mistake. I'll check.
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-m-to-x.ll
1 ↗	(On Diff #528915)	The _x and _m suffixes were inspired by the ACLE docs. I am happy to take suggestions for renaming the file. Should the alphabetical sorting apply to the whole file or within the sections, i.e. within Float arithmetic, Integer arithmetic, Shifts and Logical operations?

mgabka added inline comments. Jun 7 2023, 9:36 AM

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-m-to-x.ll
1 ↗	(On Diff #528915)	Both options are fine; I think the one you proposed sounds better. Yes, I know where the _x and _m come from. I just think that in this case it would be easier to use a file name like sve-intrinsics-combine-to-u-forms.ll or something similar.

Review comments addressed.
Bug fix for not propagating fast math flags and attributes.

jolanta.jensen added inline comments. Jun 8 2023, 10:03 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1406–1410	Fixed.
1783–1842	They should now be in the same order as in the test file. The test file has also been renamed and is alphabetically ordered within the sections.
1841	mul and mul_u execute identical code. Please advise if this can be solved in some better way.

Harbormaster completed remote builds in B237533: Diff 529648. Jun 8 2023, 11:11 AM

Minor refactoring.
All IR combines are now implemented.

Harbormaster completed remote builds in B237713: Diff 529891. Jun 9 2023, 4:32 AM

paulwalker-arm added inline comments. Jun 13 2023, 5:01 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1408–1411	Having to manually choose which flags and attributes to copy seems fragile. The current set doesn't include things like fast math flags (as can be seen from `sve-intrinsics-combine-to-u-forms.ll: replace_fabd_intrinsic_half`) and debug metadata. This instcombine is essentially changing the name of the function called, with everything else expected to remain as it was, so I think you should be able to use `II.setCalledFunction()`, which takes the new function declaration but will leave everything else (operands, flags, attributes...) as they are. NOTE: This means you're not creating a new call node, which removes the need to call `replaceInstUsesWith()`.
1785	I think my `setCalledFunction()` suggestion hopefully means this will no longer be required.
1787–1789	For this and other cases, can you try to push the logic into the main function of the operation? In this case it would be better to have all the add-related combines within `instCombineSVEVectorAdd`.
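
The fragility described for lines 1408–1411 can be illustrated with a small stand-in. Everything here is hypothetical (`ToyCall`, `rebuildWithCallee`, `retargetInPlace`) and only models the idea; it does not use or reproduce the LLVM API. The rebuild path deliberately forgets two fields to show how hand-copied state can go missing, while the in-place path changes the callee and nothing else.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a call instruction; not an LLVM type.
struct ToyCall {
  std::string Callee;
  std::vector<int> Operands;
  bool FastMath = false;
  std::string DebugLoc;
};

// Rebuild approach: create a new call and copy selected state by hand.
// FastMath and DebugLoc are left out on purpose to model a forgotten field.
ToyCall rebuildWithCallee(const ToyCall &Old, const std::string &NewCallee) {
  ToyCall New;
  New.Callee = NewCallee;
  New.Operands = Old.Operands;
  return New;
}

// In-place approach: only the callee changes, so all other state is kept.
void retargetInPlace(ToyCall &Call, const std::string &NewCallee) {
  Call.Callee = NewCallee;
}
```

In this model the in-place version cannot lose state, because there is no second object to keep in sync.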

Refactoring after review comments.
Bug fix where flags and attributes were not propagated correctly.

jolanta.jensen added inline comments. Jun 15 2023, 7:28 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1408–1411	Fixed. And the test corrected.
1785	It is no longer required. Removed.
1787–1789	Fixed.

mgabka added inline comments. Jun 15 2023, 7:29 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1283	It looks like this is not used anymore, so it can be removed.

mgabka added inline comments. Jun 15 2023, 8:42 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1291	I think the preference is to use auto.
1292	I think the preference is to use auto.
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll
67	I have a question here: would it make more sense to have a separate test file for the cases where an intrinsic is combined into LLVM instructions? Or maybe it is worth adding a comment to clarify why we do not expect a _u intrinsic in this case?

paulwalker-arm added inline comments. Jun 15 2023, 8:56 AM

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll
67	These tests do exercise the new code so I think they belong in this file. Adding a comment to acknowledge the perhaps unexpected output sounds reasonable.

Harbormaster completed remote builds in B239127: Diff 531749. Jun 15 2023, 10:41 AM

Addressed review comments.

jolanta.jensen added inline comments. Jun 16 2023, 6:19 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1283	Removed.
1291	Changed to auto.
1292	Changed to auto.
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll
67	Added comments for fadd, fmul and fsub.

Harbormaster completed remote builds in B239397: Diff 532119. Jun 16 2023, 7:28 AM

paulwalker-arm added inline comments. Jun 16 2023, 8:41 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1283	Please add a function description comment along the lines of "Canonicalise operations that take an all active predicate (e.g. sve.add -> sve.add_u).".
1302	I guess this test makes the functions easier to use. Perhaps move the test into `instCombineSVEAllActive` so the slight weirdness is hidden?
1353–1357	I'm pretty sure it's not safe to move this code here because `instCombineSVEVectorAdd` is called for both `sve.add` and `sve.add_u`. Consider `sve.add(a, sve.mul.u(b, c))`: here the inactive lanes of the result are defined to come from `a`. However, this code will combine the IR into `sve.mla.u(a, b, c)`, where the result for inactive lanes will be undefined. This is why the combine originally lived outside of `instCombineSVEVectorAdd` and must remain outside after this work. This also means https://reviews.llvm.org/D150768 has introduced a bug that I missed during review.
1395–1399	Same comment as with `instCombineSVEVectorAdd`.
1765–1767	Based on my comment above, I think this code has to stay here and you'll want to add a call to `instCombineSVEAllActive`.
1775–1777	As above.
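
The safety concern raised for lines 1353–1357 can be checked in a toy lane-wise model. This is a sketch under stated assumptions, not LLVM code: `Lanes`, `Pred`, `mulUndef`, `addMerging` and `mlaUndef` are hypothetical names, the operand order is simplified to (predicate, operands...), and `std::nullopt` marks a lane whose value is unspecified.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy lane-wise model, not LLVM code. std::nullopt marks a lane whose value
// is unspecified.
using Lanes = std::vector<std::optional<int>>;
using Pred = std::vector<bool>;

// Models sve.mul.u: active lanes hold B * C, inactive lanes are unspecified.
Lanes mulUndef(const Pred &Pg, const Lanes &B, const Lanes &C) {
  assert(Pg.size() == B.size() && B.size() == C.size());
  Lanes R(B.size());
  for (std::size_t I = 0; I < B.size(); ++I)
    if (Pg[I] && B[I] && C[I])
      R[I] = *B[I] * *C[I];
  return R;
}

// Models sve.add: active lanes hold A + B, inactive lanes come from A.
Lanes addMerging(const Pred &Pg, const Lanes &A, const Lanes &B) {
  assert(Pg.size() == A.size() && A.size() == B.size());
  Lanes R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I) {
    if (!Pg[I])
      R[I] = A[I];
    else if (A[I] && B[I])
      R[I] = *A[I] + *B[I];
  }
  return R;
}

// Models sve.mla.u: active lanes hold A + B * C, inactive lanes are
// unspecified.
Lanes mlaUndef(const Pred &Pg, const Lanes &A, const Lanes &B,
               const Lanes &C) {
  assert(Pg.size() == A.size() && A.size() == B.size() &&
         B.size() == C.size());
  Lanes R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    if (Pg[I] && A[I] && B[I] && C[I])
      R[I] = *A[I] + *B[I] * *C[I];
  return R;
}
```

In this model the unfused and fused forms agree when every lane is active, but with an inactive lane the merging add still yields the value from `a` while the fused _u form leaves that lane unspecified, which matches the reviewer's objection.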

paulwalker-arm added inline comments. Jun 18 2023, 3:46 PM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1353–1357	I've committed https://reviews.llvm.org/rGb7287a82d33b to fix the bug introduced by D150768.

Rebase on top of rGb7287a82d33b.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1283	Added.
1302	If I move the test to `instCombineSVEAllActive()`, I will not know whether the returned intrinsic is the renamed one or the same _u intrinsic I got via the argument list here. If I already have a _u intrinsic, I want to execute the remaining code in this function rather than return the intrinsic. However, I have removed this test for now because `instCombineSVEVectorAdd()` will never be called for a _u intrinsic. I still have this check in `instCombineSVEVectorMul()`, which is called with both _u and non _u intrinsics. Happy for any suggestions on how it could be rewritten.
1353–1357	Right, I now recall a comment in https://reviews.llvm.org/D144413 about the issue. Corrected by rebase on top of rGb7287a82d33b
1353–1357	Thank you!
1395–1399	Corrected by rebase on top of rGb7287a82d33b
1765–1767	Corrected by rebase on top of rGb7287a82d33b
1775–1777	Fixed by rebase on top of rGb7287a82d33b.
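
The reply for line 1302 describes a guard needed in a handler that is reached for both the non _u and the _u form. A minimal sketch of that control flow follows; all names (`ToyID`, `ToyIntrinsicCall`, `canonicaliseAllActive`, `combineMul`) are hypothetical and the model ignores everything except the intrinsic ID and whether the predicate is all active.

```cpp
#include <optional>

// Hypothetical stand-ins; not LLVM types.
enum class ToyID { Mul, MulU };

struct ToyIntrinsicCall {
  ToyID ID;
  bool AllActivePredicate;
};

// Models the canonicalisation step: retarget the call to NewID when the
// predicate is all active, otherwise report that nothing changed.
std::optional<ToyIntrinsicCall *>
canonicaliseAllActive(ToyIntrinsicCall &Call, ToyID NewID) {
  if (!Call.AllActivePredicate)
    return std::nullopt;
  Call.ID = NewID;
  return &Call;
}

// Models a handler that is reached for both Mul and MulU. Returns true when
// the later, operation-specific combines were reached.
bool combineMul(ToyIntrinsicCall &Call, ToyID UForm) {
  // Canonicalise a non _u call only.
  if (Call.ID != UForm)
    if (canonicaliseAllActive(Call, UForm))
      return false; // Canonicalised; the remaining combines are skipped.
  return true; // Fall through to the remaining combines.
}
```

Without the ID check, a call that is already in _u form with an all active predicate would be "canonicalised" on every visit in this model and would never reach the remaining combines.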

paulwalker-arm added inline comments. Jun 19 2023, 9:50 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1302	Thanks for investigating. As you say, the refactoring means there's now only a single instance so not worth worrying about.

Harbormaster completed remote builds in B239837: Diff 532695. Jun 19 2023, 10:36 AM

mgabka added inline comments. Jun 20 2023, 2:13 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1292	To me this does not look right. What are you trying to do here? Why not use the getModule function?
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll
12	Typo, should be "replace with".
14	To me the singular form is more correct in this context.
727	To me the singular form is more correct in this context.
2013	If these are SVE2 instructions (and that is how they are currently implemented in LLVM), then the attribute attached to these functions isn't correct. It should have the sve2 attribute attached, otherwise those intrinsics cannot be code generated. You can try to run llc on this file and check it yourself; you will observe a compiler crash with the current version.

Addressed review comments.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1292	Changed to getModule(). It's not a function defined in llvm::IntrinsicInst but inherited, so I did not find it when looking for something suitable and reused an existing snippet of code.
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll
12	Fixed.
14	Fixed.
727	Fixed.
2013	Added +sve2 to the attributes.

Harbormaster completed remote builds in B239974: Diff 532869. Jun 20 2023, 6:35 AM

A minor test issue to fix, but otherwise looks good.

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll
254	Should this be `fmaxnm` to match the function name?

This revision is now accepted and ready to land. Jun 20 2023, 7:04 AM

Minor test correction after review comment.

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll
254	Yes, it should. Corrected.

Harbormaster completed remote builds in B240042: Diff 532959. Jun 20 2023, 10:56 AM

Closed by commit rGecb07f481be9: [SVE ACLE] Implement IR combines to convert intrinsics used for _m C/C++… (authored by jolanta.jensen). Jun 21 2023, 3:35 AM

This revision was automatically updated to reflect the committed changes.

jolanta.jensen added a commit: rGecb07f481be9: [SVE ACLE] Implement IR combines to convert intrinsics used for _m C/C++….

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	111 lines
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-strictfp.ll	12 lines
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll	2142 lines

Diff 533203

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

[1,274 lines not shown; context: if (BinOpCode == Instruction::BinaryOpsEnd ||]
     return std::nullopt;
   IRBuilderBase::FastMathFlagGuard FMFGuard(IC.Builder);
   IC.Builder.setFastMathFlags(II.getFastMathFlags());
   auto BinOp =
       IC.Builder.CreateBinOp(BinOpCode, II.getOperand(1), II.getOperand(2));
   return IC.replaceInstUsesWith(II, BinOp);
 }

+// Canonicalise operations that take an all active predicate (e.g. sve.add ->
+// sve.add_u).
+static std::optional<Instruction *> instCombineSVEAllActive(IntrinsicInst &II,
+                                                            Intrinsic::ID IID) {
+  auto *OpPredicate = II.getOperand(0);
+  if (!match(OpPredicate, m_Intrinsic<Intrinsic::aarch64_sve_ptrue>(
+                              m_ConstantInt<AArch64SVEPredPattern::all>())))
+    return std::nullopt;
+
+  auto *Mod = II.getModule();
+  auto *NewDecl = Intrinsic::getDeclaration(Mod, IID, {II.getType()});
+  II.setCalledFunction(NewDecl);
+
+  return &II;
+}

 static std::optional<Instruction *> instCombineSVEVectorAdd(InstCombiner &IC,
                                                             IntrinsicInst &II) {
+  if (auto II_U = instCombineSVEAllActive(II, Intrinsic::aarch64_sve_add_u))
+    return II_U;
   if (auto MLA = instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_mul,
                                                    Intrinsic::aarch64_sve_mla>(
           IC, II, true))
     return MLA;
   if (auto MAD = instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_mul,
                                                    Intrinsic::aarch64_sve_mad>(
           IC, II, false))
     return MAD;
   return std::nullopt;
 }

 static std::optional<Instruction *>
 instCombineSVEVectorFAdd(InstCombiner &IC, IntrinsicInst &II) {
+  if (auto II_U = instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fadd_u))
+    return II_U;
   if (auto FMLA =
           instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_fmul,
                                             Intrinsic::aarch64_sve_fmla>(IC, II,
                                                                          true))
     return FMLA;
   if (auto FMAD =
           instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_fmul,
                                             Intrinsic::aarch64_sve_fmad>(IC, II,
[19 lines not shown; context: if (auto FMAD =]
                                             Intrinsic::aarch64_sve_fmad>(IC, II,
                                                                          false))
     return FMAD;
   if (auto FMLA_U =
           instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_fmul_u,
                                             Intrinsic::aarch64_sve_fmla_u>(
               IC, II, true))
     return FMLA_U;
   return instCombineSVEVectorBinOp(IC, II);
 }

 static std::optional<Instruction *>
 instCombineSVEVectorFSub(InstCombiner &IC, IntrinsicInst &II) {
+  if (auto II_U = instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fsub_u))
+    return II_U;
   if (auto FMLS =
           instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_fmul,
                                             Intrinsic::aarch64_sve_fmls>(IC, II,
                                                                          true))
     return FMLS;
   if (auto FMSB =
           instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_fmul,
                                             Intrinsic::aarch64_sve_fnmsb>(
[19 lines not shown; context: if (auto FMSB =]
                                             Intrinsic::aarch64_sve_fnmsb>(
               IC, II, false))
     return FMSB;
   if (auto FMLS_U =
           instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_fmul_u,
                                             Intrinsic::aarch64_sve_fmls_u>(
               IC, II, true))
     return FMLS_U;
   return instCombineSVEVectorBinOp(IC, II);
 }

 static std::optional<Instruction *> instCombineSVEVectorSub(InstCombiner &IC,
                                                             IntrinsicInst &II) {
+  if (auto II_U = instCombineSVEAllActive(II, Intrinsic::aarch64_sve_sub_u))
+    return II_U;
   if (auto MLS = instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_mul,
                                                    Intrinsic::aarch64_sve_mls>(
           IC, II, true))
     return MLS;
   return std::nullopt;
 }

 static std::optional<Instruction *> instCombineSVEVectorMul(InstCombiner &IC,
-                                                            IntrinsicInst &II) {
+                                                            IntrinsicInst &II,
+                                                            Intrinsic::ID IID) {
   auto *OpPredicate = II.getOperand(0);
   auto *OpMultiplicand = II.getOperand(1);
   auto *OpMultiplier = II.getOperand(2);
 
+  // Canonicalise a non _u intrinsic only.
+  if (II.getIntrinsicID() != IID)
+    if (auto II_U = instCombineSVEAllActive(II, IID))
+      return II_U;
+
   // Return true if a given instruction is a unit splat value, false otherwise.
   auto IsUnitSplat = [](auto *I) {
     auto *SplatValue = getSplatValue(I);
     if (!SplatValue)
       return false;
     return match(SplatValue, m_FPOne()) || match(SplatValue, m_One());
   };

▲ Show 20 Lines • Show All 346 Lines • ▼ Show 20 Lines	AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
case Intrinsic::aarch64_sve_cnth:		case Intrinsic::aarch64_sve_cnth:
return instCombineSVECntElts(IC, II, 8);		return instCombineSVECntElts(IC, II, 8);
case Intrinsic::aarch64_sve_cntb:		case Intrinsic::aarch64_sve_cntb:
return instCombineSVECntElts(IC, II, 16);		return instCombineSVECntElts(IC, II, 16);
case Intrinsic::aarch64_sve_ptest_any:		case Intrinsic::aarch64_sve_ptest_any:
case Intrinsic::aarch64_sve_ptest_first:		case Intrinsic::aarch64_sve_ptest_first:
case Intrinsic::aarch64_sve_ptest_last:		case Intrinsic::aarch64_sve_ptest_last:
return instCombineSVEPTest(IC, II);		return instCombineSVEPTest(IC, II);
case Intrinsic::aarch64_sve_mul:		case Intrinsic::aarch64_sve_fabd:
case Intrinsic::aarch64_sve_mul_u:		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fabd_u);
case Intrinsic::aarch64_sve_fmul:
case Intrinsic::aarch64_sve_fmul_u:
return instCombineSVEVectorMul(IC, II);
case Intrinsic::aarch64_sve_fadd:		case Intrinsic::aarch64_sve_fadd:
		paulwalker-armUnsubmitted Not Done Reply Inline Actions I think my `setCalledFunction()` suggestion hopefully means this will no longer be required. paulwalker-arm: I think my `setCalledFunction()` suggestion hopefully means this will no longer be required.
		jolanta.jensenAuthorUnsubmitted Done Reply Inline Actions It is no longer required. Removed. jolanta.jensen: It is no longer required. Removed.
return instCombineSVEVectorFAdd(IC, II);		return instCombineSVEVectorFAdd(IC, II);
case Intrinsic::aarch64_sve_fadd_u:		case Intrinsic::aarch64_sve_fadd_u:
return instCombineSVEVectorFAddU(IC, II);		return instCombineSVEVectorFAddU(IC, II);
		case Intrinsic::aarch64_sve_fdiv:
		paulwalker-armUnsubmitted Not Done Reply Inline Actions For this and other cases can you try to push the logic into the main function of the operation. So in this case it would be better to have all the add related combines within `instCombineSVEVectorAdd`. paulwalker-arm: For this and other cases can you try to push the logic into the main function of the operation.
		jolanta.jensenAuthorUnsubmitted Done Reply Inline Actions Fixed. jolanta.jensen: Fixed.
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fdiv_u);
		case Intrinsic::aarch64_sve_fmax:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fmax_u);
		case Intrinsic::aarch64_sve_fmaxnm:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fmaxnm_u);
		case Intrinsic::aarch64_sve_fmin:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fmin_u);
		case Intrinsic::aarch64_sve_fminnm:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fminnm_u);
		case Intrinsic::aarch64_sve_fmla:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fmla_u);
		case Intrinsic::aarch64_sve_fmls:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fmls_u);
		case Intrinsic::aarch64_sve_fmul:
		case Intrinsic::aarch64_sve_fmul_u:
		return instCombineSVEVectorMul(IC, II, Intrinsic::aarch64_sve_fmul_u);
		case Intrinsic::aarch64_sve_fmulx:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fmulx_u);
		case Intrinsic::aarch64_sve_fnmla:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fnmla_u);
		case Intrinsic::aarch64_sve_fnmls:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_fnmls_u);
		case Intrinsic::aarch64_sve_fsub:
		return instCombineSVEVectorFSub(IC, II);
		case Intrinsic::aarch64_sve_fsub_u:
		return instCombineSVEVectorFSubU(IC, II);
case Intrinsic::aarch64_sve_add:		case Intrinsic::aarch64_sve_add:
return instCombineSVEVectorAdd(IC, II);		return instCombineSVEVectorAdd(IC, II);
case Intrinsic::aarch64_sve_add_u:		case Intrinsic::aarch64_sve_add_u:
return instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_mul_u,		return instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_mul_u,
Intrinsic::aarch64_sve_mla_u>(		Intrinsic::aarch64_sve_mla_u>(
IC, II, true);		IC, II, true);
paulwalker-armUnsubmitted Not Done Reply Inline Actions Based on my comment above, I think this code has to say here and you'll want to add a call to `instCombineSVEAllActive`. paulwalker-arm: Based on my comment above, I think this code has to say here and you'll want to add a call to…
jolanta.jensenAuthorUnsubmitted Done Reply Inline Actions Corrected by rebase on top of rGb7287a82d33b jolanta.jensen: Corrected by rebase on top of rGb7287a82d33b
case Intrinsic::aarch64_sve_fsub:		case Intrinsic::aarch64_sve_mla:
return instCombineSVEVectorFSub(IC, II);		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_mla_u);
case Intrinsic::aarch64_sve_fsub_u:		case Intrinsic::aarch64_sve_mls:
return instCombineSVEVectorFSubU(IC, II);		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_mls_u);
		case Intrinsic::aarch64_sve_mul:
		case Intrinsic::aarch64_sve_mul_u:
		return instCombineSVEVectorMul(IC, II, Intrinsic::aarch64_sve_mul_u);
		case Intrinsic::aarch64_sve_sabd:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_sabd_u);
		case Intrinsic::aarch64_sve_smax:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_smax_u);
		case Intrinsic::aarch64_sve_smin:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_smin_u);
		case Intrinsic::aarch64_sve_smulh:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_smulh_u);
case Intrinsic::aarch64_sve_sub:		case Intrinsic::aarch64_sve_sub:
return instCombineSVEVectorSub(IC, II);		return instCombineSVEVectorSub(IC, II);
case Intrinsic::aarch64_sve_sub_u:		case Intrinsic::aarch64_sve_sub_u:
return instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_mul_u,		return instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_mul_u,
Intrinsic::aarch64_sve_mls_u>(		Intrinsic::aarch64_sve_mls_u>(
jolanta.jensen (Author): mul and mul_u execute identical code. Please advise if there is a better way to handle this.
IC, II, true);		IC, II, true);
paulwalker-arm: As above.
jolanta.jensen (Author): Fixed by rebase on top of rGb7287a82d33b.
mgabka: It would be good if the order of intrinsics here aligned with the order in the test file, so it would be easier to check that we have full coverage.
jolanta.jensen (Author): I think they are, but I could have made a mistake. I'll check.
jolanta.jensen (Author): They should now be in the same order as in the test file. The test file has also been renamed and is alphabetically ordered within its sections.
		case Intrinsic::aarch64_sve_uabd:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_uabd_u);
		case Intrinsic::aarch64_sve_umax:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_umax_u);
		case Intrinsic::aarch64_sve_umin:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_umin_u);
		case Intrinsic::aarch64_sve_umulh:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_umulh_u);
		case Intrinsic::aarch64_sve_asr:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_asr_u);
		case Intrinsic::aarch64_sve_lsl:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_lsl_u);
		case Intrinsic::aarch64_sve_lsr:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_lsr_u);
		case Intrinsic::aarch64_sve_and:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_and_u);
		case Intrinsic::aarch64_sve_bic:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_bic_u);
		case Intrinsic::aarch64_sve_eor:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_eor_u);
		case Intrinsic::aarch64_sve_orr:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_orr_u);
		case Intrinsic::aarch64_sve_sqsub:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_sqsub_u);
		case Intrinsic::aarch64_sve_uqsub:
		return instCombineSVEAllActive(II, Intrinsic::aarch64_sve_uqsub_u);
case Intrinsic::aarch64_sve_tbl:		case Intrinsic::aarch64_sve_tbl:
return instCombineSVETBL(IC, II);		return instCombineSVETBL(IC, II);
case Intrinsic::aarch64_sve_uunpkhi:		case Intrinsic::aarch64_sve_uunpkhi:
case Intrinsic::aarch64_sve_uunpklo:		case Intrinsic::aarch64_sve_uunpklo:
case Intrinsic::aarch64_sve_sunpkhi:		case Intrinsic::aarch64_sve_sunpkhi:
case Intrinsic::aarch64_sve_sunpklo:		case Intrinsic::aarch64_sve_sunpklo:
return instCombineSVEUnpack(IC, II);		return instCombineSVEUnpack(IC, II);
case Intrinsic::aarch64_sve_zip1:		case Intrinsic::aarch64_sve_zip1:
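Most of the cases added above follow one pattern: when the governing predicate of a merging (`_m`) intrinsic is all active, the call is retargeted to its `_u` counterpart and the operands are left unchanged. The following is a minimal, self-contained sketch of that idea only. It models intrinsics as a plain enum instead of LLVM's real `Intrinsic::ID` and `IntrinsicInst`, and the names `Op`, `Call`, `toUndefForm`, and `combineAllActive` are hypothetical and not part of the patch.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-ins for a few SVE intrinsic IDs (not LLVM's real enum).
enum class Op { fmin, fmin_u, smax, smax_u, tbl };

// Hypothetical model of an intrinsic call: an ID, a flag saying whether the
// governing predicate is known to be all active, and the data operands.
struct Call {
  Op id;
  bool predicateAllActive;
  std::vector<int> args;
};

// Map a merging intrinsic to its _u form, if this toy table has one.
std::optional<Op> toUndefForm(Op id) {
  switch (id) {
  case Op::fmin:
    return Op::fmin_u;
  case Op::smax:
    return Op::smax_u;
  default:
    return std::nullopt;
  }
}

// Retarget the call to the _u form only when every lane is active; the
// operands are left untouched. Returns true if the call was changed.
bool combineAllActive(Call &c) {
  if (!c.predicateAllActive)
    return false;
  std::optional<Op> u = toUndefForm(c.id);
  if (!u)
    return false;
  c.id = *u;
  return true;
}
```

In this sketch a call with a partial predicate, or an intrinsic with no `_u` entry in the table, is left as it is, which mirrors the `no_replace_*` tests in the new test file.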

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-strictfp.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --check-globals			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --check-globals
	; RUN: opt -S -passes=inline,instcombine < %s | FileCheck %s			; RUN: opt -S -passes=inline,instcombine < %s | FileCheck %s

	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	; TODO: We can only lower to constrained intrinsics when the necessary code			; TODO: We can only lower to constrained intrinsics when the necessary code
	; generation support for scalable vector strict operations exists.			; generation support for scalable vector strict operations exists.
	define <vscale x 2 x double> @replace_fadd_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {			define <vscale x 2 x double> @replace_fadd_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
	; CHECK: Function Attrs: strictfp			; CHECK: Function Attrs: strictfp
	; CHECK-LABEL: @replace_fadd_intrinsic_double_strictfp(			; CHECK-LABEL: @replace_fadd_intrinsic_double_strictfp(
	; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2:[0-9]+]]			; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2:[0-9]+]]
	; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x double> @llvm.aarch64.sve.fadd.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]			; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x double> @llvm.aarch64.sve.fadd.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]
	; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]			; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
	;			;
	%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #1			%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #1
	%2 = tail call <vscale x 2 x double> @llvm.aarch64.sve.fadd.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b) #1			%2 = tail call <vscale x 2 x double> @llvm.aarch64.sve.fadd.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b) #1
	ret <vscale x 2 x double> %2			ret <vscale x 2 x double> %2
	}			}

	; NOTE: IRBuilder::CreateBinOp doesn't emit constrained operations directly so			; NOTE: IRBuilder::CreateBinOp doesn't emit constrained operations directly so
	; rely on function inlining to showcase the problematic transformation.			; rely on function inlining to showcase the problematic transformation.
	define <vscale x 2 x double> @call_replace_fadd_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {			define <vscale x 2 x double> @call_replace_fadd_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
	; CHECK: Function Attrs: strictfp			; CHECK: Function Attrs: strictfp
	; CHECK-LABEL: @call_replace_fadd_intrinsic_double_strictfp(			; CHECK-LABEL: @call_replace_fadd_intrinsic_double_strictfp(
	; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2]]			; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2]]
	; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x double> @llvm.aarch64.sve.fadd.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]			; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x double> @llvm.aarch64.sve.fadd.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]
	; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]			; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
	;			;
	%1 = call <vscale x 2 x double> @replace_fadd_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #1			%1 = call <vscale x 2 x double> @replace_fadd_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #1
	ret <vscale x 2 x double> %1			ret <vscale x 2 x double> %1
	}			}

	; TODO: We can only lower to constrained intrinsics when the necessary code			; TODO: We can only lower to constrained intrinsics when the necessary code
	; generation support for scalable vector strict operations exists.			; generation support for scalable vector strict operations exists.
	define <vscale x 2 x double> @replace_fmul_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {			define <vscale x 2 x double> @replace_fmul_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
	; CHECK: Function Attrs: strictfp			; CHECK: Function Attrs: strictfp
	; CHECK-LABEL: @replace_fmul_intrinsic_double_strictfp(			; CHECK-LABEL: @replace_fmul_intrinsic_double_strictfp(
	; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2]]			; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2]]
	; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x double> @llvm.aarch64.sve.fmul.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]			; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x double> @llvm.aarch64.sve.fmul.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]
	; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]			; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
	;			;
	%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #1			%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #1
	%2 = tail call <vscale x 2 x double> @llvm.aarch64.sve.fmul.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b) #1			%2 = tail call <vscale x 2 x double> @llvm.aarch64.sve.fmul.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b) #1
	ret <vscale x 2 x double> %2			ret <vscale x 2 x double> %2
	}			}

	; NOTE: IRBuilder::CreateBinOp doesn't emit constrained operations directly so			; NOTE: IRBuilder::CreateBinOp doesn't emit constrained operations directly so
	; rely on function inlining to showcase the problematic transformation.			; rely on function inlining to showcase the problematic transformation.
	define <vscale x 2 x double> @call_replace_fmul_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {			define <vscale x 2 x double> @call_replace_fmul_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
	; CHECK: Function Attrs: strictfp			; CHECK: Function Attrs: strictfp
	; CHECK-LABEL: @call_replace_fmul_intrinsic_double_strictfp(			; CHECK-LABEL: @call_replace_fmul_intrinsic_double_strictfp(
	; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2]]			; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2]]
	; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x double> @llvm.aarch64.sve.fmul.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]			; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x double> @llvm.aarch64.sve.fmul.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]
	; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]			; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
	;			;
	%1 = call <vscale x 2 x double> @replace_fmul_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #1			%1 = call <vscale x 2 x double> @replace_fmul_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #1
	ret <vscale x 2 x double> %1			ret <vscale x 2 x double> %1
	}			}

	; TODO: We can only lower to constrained intrinsics when the necessary code			; TODO: We can only lower to constrained intrinsics when the necessary code
	; generation support for scalable vector strict operations exists.			; generation support for scalable vector strict operations exists.
	define <vscale x 2 x double> @replace_fsub_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {			define <vscale x 2 x double> @replace_fsub_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
	; CHECK: Function Attrs: strictfp			; CHECK: Function Attrs: strictfp
	; CHECK-LABEL: @replace_fsub_intrinsic_double_strictfp(			; CHECK-LABEL: @replace_fsub_intrinsic_double_strictfp(
	; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2]]			; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2]]
	; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x double> @llvm.aarch64.sve.fsub.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]			; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x double> @llvm.aarch64.sve.fsub.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]
	; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]			; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
	;			;
	%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #1			%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #1
	%2 = tail call <vscale x 2 x double> @llvm.aarch64.sve.fsub.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b) #1			%2 = tail call <vscale x 2 x double> @llvm.aarch64.sve.fsub.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b) #1
	ret <vscale x 2 x double> %2			ret <vscale x 2 x double> %2
	}			}

	; NOTE: IRBuilder::CreateBinOp doesn't emit constrained operations directly so			; NOTE: IRBuilder::CreateBinOp doesn't emit constrained operations directly so
	; rely on function inlining to showcase the problematic transformation.			; rely on function inlining to showcase the problematic transformation.
	define <vscale x 2 x double> @call_replace_fsub_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {			define <vscale x 2 x double> @call_replace_fsub_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
	; CHECK: Function Attrs: strictfp			; CHECK: Function Attrs: strictfp
	; CHECK-LABEL: @call_replace_fsub_intrinsic_double_strictfp(			; CHECK-LABEL: @call_replace_fsub_intrinsic_double_strictfp(
	; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2]]			; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31) #[[ATTR2]]
	; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x double> @llvm.aarch64.sve.fsub.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]			; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x double> @llvm.aarch64.sve.fsub.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR2]]
	; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]			; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
	;			;
	%1 = call <vscale x 2 x double> @replace_fsub_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #1			%1 = call <vscale x 2 x double> @replace_fsub_intrinsic_double_strictfp(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #1
	ret <vscale x 2 x double> %1			ret <vscale x 2 x double> %1
	}			}

	declare <vscale x 2 x double> @llvm.aarch64.sve.fadd.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)			declare <vscale x 2 x double> @llvm.aarch64.sve.fadd.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
	declare <vscale x 2 x double> @llvm.aarch64.sve.fmul.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)			declare <vscale x 2 x double> @llvm.aarch64.sve.fmul.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
				; RUN: opt -S -passes=instcombine < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				declare <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32)
				declare <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32)
				declare <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32)
				declare <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32)

				; Replace with SVE merging intrinsics to their equivalent undef (_u) variants when they take an all active predicate.

mgabka: Typo, should be "replace with".
jolanta.jensen (Author): Fixed.
				; Float arithmetic

mgabka: To me the singular form is more correct in this context.
jolanta.jensen (Author): Fixed.
				declare <vscale x 8 x half> @llvm.aarch64.sve.fabd.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fabd_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fabd_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]]) #[[ATTR1:[0-9]+]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fabd.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fabd.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fabd.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fabd_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fabd_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fabd.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fabd.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fabd.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fabd_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fabd_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fabd.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fabd.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fabd_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fabd_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fabd.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fabd.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				; aarch64_sve_fadd intrinsic combines to a LLVM instruction fadd.

mgabka: I have a question here: would it make more sense to have a separate test file for the cases where an intrinsic is combined to an LLVM instruction? Or maybe it is worth adding a comment to clarify why we do not expect a _u intrinsic in this case?
paulwalker-arm: These tests do exercise the new code so I think they belong in this file. Adding a comment to acknowledge the perhaps unexpected output sounds reasonable.
jolanta.jensen (Author): Added comments for fadd, fmul and fsub.
				declare <vscale x 8 x half> @llvm.aarch64.sve.fadd.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fadd_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fadd_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = fadd fast <vscale x 8 x half> [[A]], [[B]]
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP1]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fadd.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fadd.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fadd_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fadd_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = fadd fast <vscale x 4 x float> [[A]], [[B]]
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP1]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fadd.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fadd.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fadd_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fadd_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = fadd fast <vscale x 2 x double> [[A]], [[B]]
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP1]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fadd.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fadd_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fadd_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fadd.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fadd.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				declare <vscale x 8 x half> @llvm.aarch64.sve.fdiv.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fdiv_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fdiv_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fdiv.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fdiv.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fdiv.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fdiv_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fdiv_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fdiv.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fdiv.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fdiv.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fdiv_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fdiv_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fdiv.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fdiv.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fdiv_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fdiv_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fdiv.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fdiv.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				declare <vscale x 8 x half> @llvm.aarch64.sve.fmax.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fmax_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fmax_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmax.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmax.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fmax.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fmax_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fmax_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmax.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmax.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fmax.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fmax_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fmax_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmax.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmax.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fmax_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fmax_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmax.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmax.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				declare <vscale x 8 x half> @llvm.aarch64.sve.fmaxnm.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fmaxnm_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fmaxnm_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmaxnm.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmaxnm.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fmaxnm.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fmaxnm_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fmaxnm_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmaxnm.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmaxnm.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fmaxnm.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fmaxnm_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fmaxnm_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmaxnm.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmaxnm.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				paulwalker-arm: Should this be `fmaxnm` to match the function name?
				jolanta.jensen (Author): Yes, it should. Corrected.
				}

				define <vscale x 2 x double> @no_replace_fmaxnm_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fmaxnm_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmaxnm.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmaxnm.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				declare <vscale x 8 x half> @llvm.aarch64.sve.fmin.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fmin_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fmin_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmin.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmin.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fmin.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fmin_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fmin_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmin.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmin.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fmin.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fmin_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fmin_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmin.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmin.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fmin_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fmin_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmin.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmin.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				declare <vscale x 8 x half> @llvm.aarch64.sve.fminnm.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fminnm_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fminnm_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fminnm.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fminnm.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fminnm.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fminnm_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fminnm_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fminnm.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fminnm.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fminnm.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fminnm_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fminnm_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fminnm.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fminnm.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fminnm_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fminnm_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fminnm.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fminnm.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				declare <vscale x 8 x half> @llvm.aarch64.sve.fmla.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fmla_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fmla_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]], <vscale x 8 x half> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmla.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]], <vscale x 8 x half> [[C]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmla.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fmla.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fmla_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fmla_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]], <vscale x 4 x float> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmla.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]], <vscale x 4 x float> [[C]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmla.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fmla.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fmla_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fmla_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]], <vscale x 2 x double> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmla.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]], <vscale x 2 x double> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmla.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fmla_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fmla_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]], <vscale x 2 x double> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmla.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]], <vscale x 2 x double> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmla.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c)
				ret <vscale x 2 x double> %2
				}

				declare <vscale x 8 x half> @llvm.aarch64.sve.fmls.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fmls_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fmls_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]], <vscale x 8 x half> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmls.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]], <vscale x 8 x half> [[C]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmls.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fmls.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fmls_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fmls_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]], <vscale x 4 x float> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmls.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]], <vscale x 4 x float> [[C]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmls.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fmls.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fmls_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fmls_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]], <vscale x 2 x double> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmls.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]], <vscale x 2 x double> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmls.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fmls_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fmls_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]], <vscale x 2 x double> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmls.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]], <vscale x 2 x double> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmls.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c)
				ret <vscale x 2 x double> %2
				}

				; The aarch64_sve_fmul intrinsic combines to the LLVM fmul instruction.

				declare <vscale x 8 x half> @llvm.aarch64.sve.fmul.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fmul_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fmul_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <vscale x 8 x half> [[A]], [[B]]
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP1]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmul.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fmul.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fmul_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fmul_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <vscale x 4 x float> [[A]], [[B]]
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP1]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmul.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fmul.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fmul_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fmul_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <vscale x 2 x double> [[A]], [[B]]
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP1]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmul.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fmul_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fmul_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmul.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmul.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				declare <vscale x 8 x half> @llvm.aarch64.sve.fmulx.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fmulx_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fmulx_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmulx.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmulx.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fmulx.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fmulx_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fmulx_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmulx.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fmulx.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fmulx.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fmulx_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fmulx_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmulx.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmulx.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fmulx_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fmulx_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmulx.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fmulx.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				declare <vscale x 8 x half> @llvm.aarch64.sve.fnmla.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fnmla_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fnmla_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]], <vscale x 8 x half> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fnmla.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]], <vscale x 8 x half> [[C]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fnmla.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fnmla.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fnmla_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fnmla_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]], <vscale x 4 x float> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fnmla.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]], <vscale x 4 x float> [[C]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fnmla.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fnmla.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fnmla_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fnmla_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]], <vscale x 2 x double> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fnmla.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]], <vscale x 2 x double> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fnmla.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fnmla_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fnmla_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]], <vscale x 2 x double> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fnmla.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]], <vscale x 2 x double> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fnmla.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c)
				ret <vscale x 2 x double> %2
				}

				declare <vscale x 8 x half> @llvm.aarch64.sve.fnmls.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fnmls_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fnmls_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]], <vscale x 8 x half> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fnmls.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[A]], <vscale x 8 x half> [[B]], <vscale x 8 x half> [[C]])
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fnmls.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fnmls.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fnmls_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fnmls_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]], <vscale x 4 x float> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fnmls.u.nxv4f32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> [[A]], <vscale x 4 x float> [[B]], <vscale x 4 x float> [[C]])
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fnmls.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fnmls.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fnmls_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fnmls_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]], <vscale x 2 x double> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fnmls.u.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]], <vscale x 2 x double> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fnmls.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fnmls_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fnmls_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]], <vscale x 2 x double> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fnmls.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]], <vscale x 2 x double> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fnmls.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c)
				ret <vscale x 2 x double> %2
				}

				; aarch64_sve_fsub intrinsic combines to a LLVM instruction fsub.

				declare <vscale x 8 x half> @llvm.aarch64.sve.fsub.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				define <vscale x 8 x half> @replace_fsub_intrinsic_half(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x half> @replace_fsub_intrinsic_half
				; CHECK-SAME: (<vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = fsub fast <vscale x 8 x half> [[A]], [[B]]
				; CHECK-NEXT: ret <vscale x 8 x half> [[TMP1]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fsub.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 8 x half> %2
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.fsub.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				define <vscale x 4 x float> @replace_fsub_intrinsic_float(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x float> @replace_fsub_intrinsic_float
				; CHECK-SAME: (<vscale x 4 x float> [[A:%.*]], <vscale x 4 x float> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = fsub fast <vscale x 4 x float> [[A]], [[B]]
				; CHECK-NEXT: ret <vscale x 4 x float> [[TMP1]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.fsub.nxv4f32(<vscale x 4 x i1> %1, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 4 x float> %2
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fsub.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				define <vscale x 2 x double> @replace_fsub_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @replace_fsub_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = fsub fast <vscale x 2 x double> [[A]], [[B]]
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP1]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fsub.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				define <vscale x 2 x double> @no_replace_fsub_intrinsic_double(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x double> @no_replace_fsub_intrinsic_double
				; CHECK-SAME: (<vscale x 2 x double> [[A:%.*]], <vscale x 2 x double> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fsub.nxv2f64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x double> [[A]], <vscale x 2 x double> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x double> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.fsub.nxv2f64(<vscale x 2 x i1> %1, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 2 x double> %2
				}

				; Integer arithmetic

				mgabka: To me the singular form is more correct in this context.
				jolanta.jensen: Fixed.
				declare <vscale x 16 x i8> @llvm.aarch64.sve.add.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_add_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_add_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.add.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.add.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.add.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_add_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_add_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.add.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.add.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.add.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_add_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_add_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.add.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.add.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.add.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_add_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_add_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.add.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.add.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_add_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_add_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.add.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.add.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.mla.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_mla_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_mla_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]], <vscale x 16 x i8> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.mla.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]], <vscale x 16 x i8> [[C]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.mla.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.mla.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_mla_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_mla_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]], <vscale x 8 x i16> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.mla.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]], <vscale x 8 x i16> [[C]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.mla.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.mla.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_mla_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_mla_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]], <vscale x 4 x i32> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.mla.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]], <vscale x 4 x i32> [[C]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.mla.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.mla.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_mla_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_mla_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]], <vscale x 2 x i64> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mla.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]], <vscale x 2 x i64> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mla.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_mla_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_mla_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]], <vscale x 2 x i64> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mla.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]], <vscale x 2 x i64> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mla.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.mls.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_mls_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_mls_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]], <vscale x 16 x i8> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.mls.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]], <vscale x 16 x i8> [[C]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.mls.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.mls.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_mls_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_mls_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]], <vscale x 8 x i16> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.mls.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]], <vscale x 8 x i16> [[C]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.mls.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.mls.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_mls_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_mls_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]], <vscale x 4 x i32> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.mls.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]], <vscale x 4 x i32> [[C]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.mls.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.mls.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_mls_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_mls_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]], <vscale x 2 x i64> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mls.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]], <vscale x 2 x i64> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mls.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_mls_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_mls_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]], <vscale x 2 x i64> [[C:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mls.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]], <vscale x 2 x i64> [[C]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mls.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.mul.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_mul_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_mul_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.mul.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.mul.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.mul.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_mul_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_mul_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.mul.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.mul.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.mul.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_mul_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_mul_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.mul.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.mul.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.mul.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_mul_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_mul_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mul.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mul.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_mul_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_mul_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mul.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.mul.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.sabd.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_sabd_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_sabd_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.sabd.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.sabd.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.sabd.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_sabd_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_sabd_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sabd.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sabd.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.sabd.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_sabd_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_sabd_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.sabd.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.sabd.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.sabd.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_sabd_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_sabd_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sabd.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sabd.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_sabd_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_sabd_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sabd.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sabd.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.smax.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_smax_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_smax_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.smax.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.smax.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.smax.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_smax_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_smax_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.smax.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.smax.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.smax.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_smax_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_smax_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.smax.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.smax.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.smax.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_smax_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_smax_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smax.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smax.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_smax_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_smax_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smax.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smax.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.smin.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_smin_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_smin_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.smin.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.smin.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.smin.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_smin_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_smin_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.smin.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.smin.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.smin.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_smin_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_smin_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.smin.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.smin.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.smin.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_smin_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_smin_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smin.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smin.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_smin_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_smin_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smin.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smin.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.smulh.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_smulh_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_smulh_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.smulh.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.smulh.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.smulh.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_smulh_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_smulh_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.smulh.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.smulh.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.smulh.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_smulh_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_smulh_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.smulh.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.smulh.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.smulh.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_smulh_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_smulh_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smulh.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smulh.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_smulh_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_smulh_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smulh.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.smulh.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.sub.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_sub_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_sub_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.sub.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.sub.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.sub.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_sub_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_sub_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sub.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sub.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.sub.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_sub_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_sub_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.sub.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.sub.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.sub.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_sub_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_sub_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sub.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sub.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_sub_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_sub_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sub.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sub.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.uabd.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_uabd_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_uabd_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.uabd.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.uabd.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.uabd.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_uabd_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_uabd_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uabd.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uabd.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.uabd.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_uabd_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_uabd_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uabd.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uabd.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.uabd.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_uabd_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_uabd_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uabd.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uabd.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_uabd_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_uabd_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uabd.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uabd.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.umax.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_umax_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_umax_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.umax.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.umax.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.umax.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_umax_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_umax_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.umax.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.umax.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.umax.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_umax_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_umax_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.umax.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.umax.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.umax.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_umax_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_umax_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umax.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umax.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_umax_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_umax_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umax.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umax.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.umin.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_umin_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_umin_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.umin.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.umin.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.umin.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_umin_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_umin_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.umin.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.umin.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.umin.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_umin_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_umin_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.umin.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.umin.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.umin.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_umin_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_umin_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umin.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umin.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_umin_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_umin_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umin.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umin.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.umulh.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_umulh_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_umulh_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.umulh.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.umulh.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.umulh.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_umulh_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_umulh_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.umulh.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.umulh.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.umulh.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_umulh_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_umulh_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.umulh.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.umulh.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.umulh.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_umulh_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_umulh_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umulh.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umulh.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_umulh_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_umulh_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umulh.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.umulh.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				; Shifts

				declare <vscale x 16 x i8> @llvm.aarch64.sve.asr.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_asr_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_asr_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.asr.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.asr.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.asr.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_asr_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_asr_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.asr.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.asr.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.asr.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_asr_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_asr_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.asr.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.asr.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.asr.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_asr_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_asr_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.asr.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.asr.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_asr_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_asr_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.asr.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.asr.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.lsl.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_lsl_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_lsl_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.lsl.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.lsl.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.lsl.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_lsl_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_lsl_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.lsl.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.lsl.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.lsl.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_lsl_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_lsl_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.lsl.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.lsl.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.lsl.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_lsl_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_lsl_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.lsl.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.lsl.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_lsl_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_lsl_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.lsl.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.lsl.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.lsr.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_lsr_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_lsr_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.lsr.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.lsr.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.lsr.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_lsr_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_lsr_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.lsr.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.lsr.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.lsr.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_lsr_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_lsr_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.lsr.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.lsr.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.lsr.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_lsr_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_lsr_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.lsr.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.lsr.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_lsr_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_lsr_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.lsr.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.lsr.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				; Logical operations

				declare <vscale x 16 x i8> @llvm.aarch64.sve.and.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_and_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_and_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.and.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.and.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.and.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_and_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_and_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.and.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.and.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.and.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_and_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_and_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.and.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.and.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.and.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_and_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_and_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.and.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.and.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_and_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_and_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.and.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.and.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.bic.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_bic_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_bic_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.bic.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.bic.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.bic.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_bic_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_bic_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.bic.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.bic.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.bic.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_bic_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_bic_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.bic.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.bic.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.bic.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_bic_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_bic_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.bic.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.bic.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_bic_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_bic_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.bic.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.bic.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.eor.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_eor_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_eor_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.eor.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.eor.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.eor.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_eor_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_eor_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.eor.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.eor.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.eor.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_eor_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_eor_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.eor.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.eor.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.eor.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_eor_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_eor_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.eor.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.eor.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_eor_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_eor_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.eor.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.eor.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.orr.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_orr_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_orr_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.orr.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.orr.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.orr.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_orr_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_orr_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.orr.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.orr.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.orr.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_orr_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_orr_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.orr.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.orr.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.orr.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_orr_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_orr_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.orr.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.orr.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_orr_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_orr_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.orr.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.orr.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				; SVE2 - Uniform DSP operations

				mgabka: If these are SVE2 instructions (and that is how they are currently implemented in LLVM), then the attribute attached to these functions isn't correct. It should include the sve2 attribute; otherwise these intrinsics cannot be code generated. You can run llc on this file to check this yourself: with the current version you will observe a compiler crash.
				jolanta.jensen (author): Added +sve2 to the attributes.
				declare <vscale x 16 x i8> @llvm.aarch64.sve.sqsub.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_sqsub_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_sqsub_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.sqsub.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.sqsub.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.sqsub.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_sqsub_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_sqsub_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sqsub.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sqsub.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.sqsub.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_sqsub_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_sqsub_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.sqsub.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.sqsub.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.sqsub.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_sqsub_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_sqsub_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sqsub.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sqsub.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_sqsub_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_sqsub_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sqsub.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.sqsub.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.uqsub.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				define <vscale x 16 x i8> @replace_uqsub_intrinsic_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: define <vscale x 16 x i8> @replace_uqsub_intrinsic_i8
				; CHECK-SAME: (<vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 16 x i8> @llvm.aarch64.sve.uqsub.u.nxv16i8(<vscale x 16 x i1> [[TMP1]], <vscale x 16 x i8> [[A]], <vscale x 16 x i8> [[B]])
				; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP2]]
				;
				%1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
				%2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.uqsub.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %2
				}

				declare <vscale x 8 x i16> @llvm.aarch64.sve.uqsub.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				define <vscale x 8 x i16> @replace_uqsub_intrinsic_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: define <vscale x 8 x i16> @replace_uqsub_intrinsic_i16
				; CHECK-SAME: (<vscale x 8 x i16> [[A:%.*]], <vscale x 8 x i16> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uqsub.u.nxv8i16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x i16> [[A]], <vscale x 8 x i16> [[B]])
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
				;
				%1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uqsub.nxv8i16(<vscale x 8 x i1> %1, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %2
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.uqsub.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				define <vscale x 4 x i32> @replace_uqsub_intrinsic_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: define <vscale x 4 x i32> @replace_uqsub_intrinsic_i32
				; CHECK-SAME: (<vscale x 4 x i32> [[A:%.*]], <vscale x 4 x i32> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uqsub.u.nxv4i32(<vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> [[A]], <vscale x 4 x i32> [[B]])
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
				;
				%1 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
				%2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uqsub.nxv4i32(<vscale x 4 x i1> %1, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %2
				}

				declare <vscale x 2 x i64> @llvm.aarch64.sve.uqsub.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				define <vscale x 2 x i64> @replace_uqsub_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @replace_uqsub_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uqsub.u.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 31)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uqsub.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @no_replace_uqsub_intrinsic_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: define <vscale x 2 x i64> @no_replace_uqsub_intrinsic_i64
				; CHECK-SAME: (<vscale x 2 x i64> [[A:%.*]], <vscale x 2 x i64> [[B:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				; CHECK-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uqsub.nxv2i64(<vscale x 2 x i1> [[TMP1]], <vscale x 2 x i64> [[A]], <vscale x 2 x i64> [[B]])
				; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
				;
				%1 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.ptrue.nxv2i1(i32 5)
				%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uqsub.nxv2i64(<vscale x 2 x i1> %1, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %2
				}

				attributes #0 = { "target-features"="+sve,+sve2" }