This is an archive of the discontinued LLVM Phabricator instance.


Differential D101369

[AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to zero lanes 1-N
Closed · Public

Authored by bsmith on Apr 27 2021, 8:28 AM.

Details

Reviewers

paulwalker-arm
peterwaller-arm
joechrisellis
david-arm
efriedma
dmgreen

Commits

rG9f37980d45c7: [AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to…

Summary

Specifically, this allows us to rely on the lane-zeroing behaviour of
SVE reduction instructions.

Co-authored-by: Paul Walker <paul.walker@arm.com>
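To make the fold concrete, here is a minimal value-level sketch of why it is safe. It is not LLVM code: `Lanes`, `reduceAnd` and `insertLane0IntoZero` are hypothetical names invented for illustration, and the sketch assumes a fixed lane count with at least one lane rather than a scalable vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a vector register: a fixed number of 64-bit lanes.
// (Assumption: real SVE registers are scalable; this model is not.)
using Lanes = std::vector<std::uint64_t>;

// Models an SVE-style AND reduction over the active lanes: the scalar
// result lands in lane 0 and every other destination lane is written as 0.
Lanes reduceAnd(const std::vector<bool> &Pred, const Lanes &In) {
  std::uint64_t Acc = ~std::uint64_t{0};
  for (std::size_t I = 0; I < In.size(); ++I)
    if (Pred[I])
      Acc &= In[I];
  Lanes Out(In.size(), 0);
  Out[0] = Acc;
  return Out;
}

// Models insert(zero, extract(X, 0), 0): keep lane 0 and zero the rest.
Lanes insertLane0IntoZero(const Lanes &X) {
  Lanes Out(X.size(), 0);
  Out[0] = X[0];
  return Out;
}
```

For any `X` produced by `reduceAnd`, `insertLane0IntoZero(X)` compares equal to `X`, which is the property the fold relies on. For a vector whose upper lanes may be non-zero the two differ, so the explicit zeroing has to stay.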

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bsmith created this revision. Apr 27 2021, 8:28 AM

Herald added a reviewer: efriedma. Apr 27 2021, 8:28 AM

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

bsmith requested review of this revision. Apr 27 2021, 8:28 AM

Herald added a project: Restricted Project. Apr 27 2021, 8:28 AM

Herald added a subscriber: llvm-commits.

LGTM, modulo nit.

Test suggestion: what about testing an insert into a non zeroinitializer vector?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16056	Nit: indent. Also suggestion: `if (Res != SDValue()) return Res;`, that way the code can be extended to handle other combines that come in the future with minimal further changes.

This revision is now accepted and ready to land. Apr 27 2021, 8:58 AM

peterwaller-arm added inline comments. Apr 27 2021, 9:00 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16056	Erratum: not sure what I was seeing with the whitespace. I think I misread phabricator's diff whitespace hints here.

Harbormaster completed remote builds in B101172: Diff 340856. Apr 27 2021, 9:09 AM

The documentation for the nodes whose implicit zeroing you are relying on says "// Only the lower result lane is defined." Should that be changed to explain that all lanes other than lane 0 are zero?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15931	Can this use isConstantSplatVectorAllZeros?
16055–16058	I would create a function for this, to keep PerformDAGCombine straightforward. It is common to do: `if (SDValue Res = performInsertVectorEltCombine(N, DAG)) return Res;` followed by `return performPostLD1Combine(N, DCI, true);`
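The "try one combine, otherwise fall through to the next" idiom suggested here can be sketched outside LLVM as follows. Everything in it is an assumption made for illustration: `Result` is a hypothetical stand-in for `SDValue` (empty by default, testable in an `if`), and `foldAddZero`, `foldAddSelf` and `combineAdd` are invented toy combines over plain integers.

```cpp
#include <cassert>

// Hypothetical stand-in for SDValue: default-constructed means "no match",
// and the object converts to bool so it can be tested inside an if.
struct Result {
  bool Valid;
  int Value;
  Result() : Valid(false), Value(0) {}
  explicit Result(int V) : Valid(true), Value(V) {}
  explicit operator bool() const { return Valid; }
};

// Toy combine: x + 0 -> x.
Result foldAddZero(int Lhs, int Rhs) {
  if (Rhs == 0)
    return Result(Lhs);
  return Result();
}

// Toy combine: x + x -> 2 * x.
Result foldAddSelf(int Lhs, int Rhs) {
  if (Lhs == Rhs)
    return Result(2 * Lhs);
  return Result();
}

// Try the newer combine first; fall through to the existing one otherwise.
Result combineAdd(int Lhs, int Rhs) {
  if (Result Res = foldAddZero(Lhs, Rhs))
    return Res;
  return foldAddSelf(Lhs, Rhs);
}
```

Keeping the validity flag separate from the value means a legitimate result of 0 is still treated as a match, and adding a further combine only needs one more `if` ahead of the final `return`.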

david-arm added inline comments. Apr 28 2021, 1:31 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15906	Sorry to chip in here as I realise I'm not a reviewer. :) However, the ANDV instruction has a scalar SIMD&FP register for its result, so it doesn't feel right to say all the other lanes >0 in the equivalent SVE register are zero. I'd have expected something more like isOnlyFirstLaneDefined? I understand you named this function because currently it's only called from performInsertVectorEltCombine where the insert vector is a null splat, but other users may call it elsewhere in future and it feels a bit dangerous to give it a misleading name, that's all.

paulwalker-arm added inline comments. Apr 28 2021, 2:11 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15906	This is not true, @david-arm. The SVE reduction nodes are defined to return a vector result to match the instructions' behaviour of modifying all bits of their destination Z register. You can see this in LowerReductionToSVE where, after the reduction, we extract the scalar required to match the definition of the common `VECREDUCE` nodes. This patch is one of the reasons we do this: the explicit behaviour is captured and can thus be taken advantage of.

david-arm added inline comments. Apr 28 2021, 2:32 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15906	Ah ok, fair enough. I was just looking at the actual ANDV instruction in the developer manual that's all, whereas from what you're saying we've defined ANDV_PRED to zero all lanes > 0. Sorry for the confusion!

Move combine switch logic into a separate function
Reuse existing isConstantSplatVectorAllZeros function

Harbormaster completed remote builds in B101434: Diff 341229. Apr 28 2021, 10:41 AM

Matt added a subscriber: Matt. Apr 29 2021, 1:16 PM

Thanks. This LGTM.

This revision was landed with ongoing or failed builds. May 4 2021, 7:05 AM

Closed by commit rG9f37980d45c7: [AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to… (authored by bsmith).

This revision was automatically updated to reflect the committed changes.

bsmith added a commit: rG9f37980d45c7: [AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to….

Revision Contents

Path                                                      Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp           71 lines
llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll    239 lines

Diff 342720

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

(15,895 lines not shown; the hunk resumes inside combineSVEPrefetchVecBaseImmOff)
   // `aarch64_sve_prfb_gather_uxtw_index`.
   SDLoc DL(N);
   Ops[1] = DAG.getConstant(Intrinsic::aarch64_sve_prfb_gather_uxtw_index, DL,
                            MVT::i64);
 
   return DAG.getNode(N->getOpcode(), DL, DAG.getVTList(MVT::Other), Ops);
 }

+// Return true if the vector operation can guarantee only the first lane of its
+// result contains data, with all bits in other lanes set to zero.
+static bool isLanes1toNKnownZero(SDValue Op) {
+  switch (Op.getOpcode()) {
+  default:
+    return false;
+  case AArch64ISD::ANDV_PRED:
+  case AArch64ISD::EORV_PRED:
+  case AArch64ISD::FADDA_PRED:
+  case AArch64ISD::FADDV_PRED:
+  case AArch64ISD::FMAXNMV_PRED:
+  case AArch64ISD::FMAXV_PRED:
+  case AArch64ISD::FMINNMV_PRED:
+  case AArch64ISD::FMINV_PRED:
+  case AArch64ISD::ORV_PRED:
+  case AArch64ISD::SADDV_PRED:
+  case AArch64ISD::SMAXV_PRED:
+  case AArch64ISD::SMINV_PRED:
+  case AArch64ISD::UADDV_PRED:
+  case AArch64ISD::UMAXV_PRED:
+  case AArch64ISD::UMINV_PRED:
+    return true;
+  }
+}
+
+static SDValue removeRedundantInsertVectorElt(SDNode *N) {
+  assert(N->getOpcode() == ISD::INSERT_VECTOR_ELT && "Unexpected node!");
+  SDValue InsertVec = N->getOperand(0);
dmgreen (inline comment, Done): Can this use isConstantSplatVectorAllZeros?
+  SDValue InsertElt = N->getOperand(1);
+  SDValue InsertIdx = N->getOperand(2);
+
+  // We only care about inserts into the first element...
+  if (!isNullConstant(InsertIdx))
+    return SDValue();
+  // ...of a zero'd vector...
+  if (!ISD::isConstantSplatVectorAllZeros(InsertVec.getNode()))
+    return SDValue();
+  // ...where the inserted data was previously extracted...
+  if (InsertElt.getOpcode() != ISD::EXTRACT_VECTOR_ELT)
+    return SDValue();
+
+  SDValue ExtractVec = InsertElt.getOperand(0);
+  SDValue ExtractIdx = InsertElt.getOperand(1);
+
+  // ...from the first element of a vector.
+  if (!isNullConstant(ExtractIdx))
+    return SDValue();
+
+  // If we get here we are effectively trying to zero lanes 1-N of a vector.
+
+  // Ensure there's no type conversion going on.
+  if (N->getValueType(0) != ExtractVec.getValueType())
+    return SDValue();
+
+  if (!isLanes1toNKnownZero(ExtractVec))
+    return SDValue();
+
+  // The explicit zeroing is redundant.
+  return ExtractVec;
+}
+
+static SDValue
+performInsertVectorEltCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
+  if (SDValue Res = removeRedundantInsertVectorElt(N))
+    return Res;
+
+  return performPostLD1Combine(N, DCI, true);
+}

 SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
                                                  DAGCombinerInfo &DCI) const {
   SelectionDAG &DAG = DCI.DAG;
   switch (N->getOpcode()) {
   default:
     LLVM_DEBUG(dbgs() << "Custom combining: skipping\n");
     break;
   case ISD::ABS:
(65 lines not shown; still inside AArch64TargetLowering::PerformDAGCombine)
   case AArch64ISD::GLD1S_SCALED_MERGE_ZERO:
   case AArch64ISD::GLD1S_UXTW_MERGE_ZERO:
   case AArch64ISD::GLD1S_SXTW_MERGE_ZERO:
   case AArch64ISD::GLD1S_UXTW_SCALED_MERGE_ZERO:
   case AArch64ISD::GLD1S_SXTW_SCALED_MERGE_ZERO:
   case AArch64ISD::GLD1S_IMM_MERGE_ZERO:
     return performGLD1Combine(N, DAG);
   case ISD::INSERT_VECTOR_ELT:
-    return performPostLD1Combine(N, DCI, true);
+    return performInsertVectorEltCombine(N, DCI);
   case ISD::EXTRACT_VECTOR_ELT:
     return performExtractVectorEltCombine(N, DAG);
peterwaller-arm (inline comment, Done): Nit: indent. Also suggestion: `if (Res != SDValue()) return Res;`, that way the code can be extended to handle other combines that come in the future with minimal further changes.
peterwaller-arm (inline comment, Done): Erratum: not sure what I was seeing with the whitespace. I think I misread phabricator's diff whitespace hints here.
   case ISD::VECREDUCE_ADD:
     return performVecReduceAddCombine(N, DCI.DAG, Subtarget);
dmgreen (inline comment, Done): I would create a function for this, to keep PerformDAGCombine straightforward. It is common to do: `if (SDValue Res = performInsertVectorEltCombine(N, DAG)) return Res;` followed by `return performPostLD1Combine(N, DCI, true);`
   case ISD::STEP_VECTOR:
     return performStepVectorCombine(N, DCI, DAG);
   case ISD::INTRINSIC_VOID:
   case ISD::INTRINSIC_W_CHAIN:
     switch (cast<ConstantSDNode>(N->getOperand(1))->getZExtValue()) {
     case Intrinsic::aarch64_sve_prfb_gather_scalar_offset:
       return combineSVEPrefetchVecBaseImmOff(N, DAG, 1 /*=ScalarSizeInBytes*/);
     case Intrinsic::aarch64_sve_prfh_gather_scalar_offset:
(1,659 lines not shown)
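The guard chain in removeRedundantInsertVectorElt can be mirrored on a toy expression graph, which may help when reading the negative tests in the .ll file. This is only a sketch under stated assumptions: `Opcode`, `Node`, `lanes1toNKnownZero`, `isZeroConstant` and `removeRedundantInsert` are hypothetical names, a lane count stands in for the value type, and none of it uses the SelectionDAG API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcodes for a toy expression graph; not LLVM's ISD enum.
enum class Opcode { ZeroSplat, Constant, Reduce, Other, ExtractElt, InsertElt };

// Hypothetical node: an opcode, a lane count standing in for the value type,
// an integer payload (used by Constant), and operand pointers.
struct Node {
  Opcode Op;
  unsigned NumLanes;
  long Value;
  std::vector<const Node *> Ops;
};

// Stand-in for isLanes1toNKnownZero: only the toy Reduce node qualifies.
bool lanes1toNKnownZero(const Node &N) { return N.Op == Opcode::Reduce; }

bool isZeroConstant(const Node &N) {
  return N.Op == Opcode::Constant && N.Value == 0;
}

// Returns the vector that makes the insert redundant, or nullptr when the
// pattern insert(zero, extract(X, 0), 0) does not apply.
const Node *removeRedundantInsert(const Node &N) {
  if (N.Op != Opcode::InsertElt || N.Ops.size() != 3)
    return nullptr;
  const Node &Vec = *N.Ops[0];
  const Node &Elt = *N.Ops[1];
  const Node &Idx = *N.Ops[2];
  if (!isZeroConstant(Idx)) // Only inserts into lane 0...
    return nullptr;
  if (Vec.Op != Opcode::ZeroSplat) // ...of an all-zero vector...
    return nullptr;
  if (Elt.Op != Opcode::ExtractElt || Elt.Ops.size() != 2)
    return nullptr; // ...where the data was previously extracted...
  const Node &Src = *Elt.Ops[0];
  if (!isZeroConstant(*Elt.Ops[1])) // ...from lane 0 of a vector.
    return nullptr;
  if (N.NumLanes != Src.NumLanes) // No change of vector type.
    return nullptr;
  return lanes1toNKnownZero(Src) ? &Src : nullptr;
}
```

In this model the three ways the fold can be rejected line up with the negative tests in the added file: a non-zero insert index, a result type that differs from the reduction's type, and a source operation that does not guarantee zeroed upper lanes.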

llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll

This file was added.

				; RUN: llc < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 16 x i8> @andv_zero_fill(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) #0 {
				; CHECK-LABEL: andv_zero_fill:
				; CHECK: andv b0, p0, z0.b
				; CHECK-NEXT: ret
				%t1 = call i8 @llvm.aarch64.sve.andv.nxv16i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a)
				%t2 = insertelement <vscale x 16 x i8> zeroinitializer, i8 %t1, i64 0
				ret <vscale x 16 x i8> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 8 x i16> @eorv_zero_fill(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) #0 {
				; CHECK-LABEL: eorv_zero_fill:
				; CHECK: eorv h0, p0, z0.h
				; CHECK-NEXT: ret
				%t1 = call i16 @llvm.aarch64.sve.eorv.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%t2 = insertelement <vscale x 8 x i16> zeroinitializer, i16 %t1, i64 0
				ret <vscale x 8 x i16> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 2 x double> @fadda_zero_fill(<vscale x 2 x i1> %pg, double %init, <vscale x 2 x double> %a) #0 {
				; CHECK-LABEL: fadda_zero_fill:
				; CHECK: fadda d0, p0, d0, z1.d
				; CHECK-NEXT: ret
				%t1 = call double @llvm.aarch64.sve.fadda.nxv2f64(<vscale x 2 x i1> %pg, double %init, <vscale x 2 x double> %a)
				%t2 = insertelement <vscale x 2 x double> zeroinitializer, double %t1, i64 0
				ret <vscale x 2 x double> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 4 x float> @faddv_zero_fill(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a) #0 {
				; CHECK-LABEL: faddv_zero_fill:
				; CHECK: faddv s0, p0, z0.s
				; CHECK-NEXT: ret
				%t1 = call float @llvm.aarch64.sve.faddv.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a)
				%t2 = insertelement <vscale x 4 x float> zeroinitializer, float %t1, i64 0
				ret <vscale x 4 x float> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 8 x half> @fmaxv_zero_fill(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a) #0 {
				; CHECK-LABEL: fmaxv_zero_fill:
				; CHECK: fmaxv h0, p0, z0.h
				; CHECK-NEXT: ret
				%t1 = call half @llvm.aarch64.sve.fmaxv.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a)
				%t2 = insertelement <vscale x 8 x half> zeroinitializer, half %t1, i64 0
				ret <vscale x 8 x half> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 2 x float> @fmaxnmv_zero_fill(<vscale x 2 x i1> %pg, <vscale x 2 x float> %a) #0 {
				; CHECK-LABEL: fmaxnmv_zero_fill:
				; CHECK: fmaxnmv s0, p0, z0.s
				; CHECK-NEXT: ret
				%t1 = call float @llvm.aarch64.sve.fmaxnmv.nxv2f32(<vscale x 2 x i1> %pg, <vscale x 2 x float> %a)
				%t2 = insertelement <vscale x 2 x float> zeroinitializer, float %t1, i64 0
				ret <vscale x 2 x float> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 2 x float> @fminnmv_zero_fill(<vscale x 2 x i1> %pg, <vscale x 2 x float> %a) #0 {
				; CHECK-LABEL: fminnmv_zero_fill:
				; CHECK: fminnmv s0, p0, z0.s
				; CHECK-NEXT: ret
				%t1 = call float @llvm.aarch64.sve.fminnmv.nxv2f32(<vscale x 2 x i1> %pg, <vscale x 2 x float> %a)
				%t2 = insertelement <vscale x 2 x float> zeroinitializer, float %t1, i64 0
				ret <vscale x 2 x float> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 2 x float> @fminv_zero_fill(<vscale x 2 x i1> %pg, <vscale x 2 x float> %a) #0 {
				; CHECK-LABEL: fminv_zero_fill:
				; CHECK: fminv s0, p0, z0.s
				; CHECK-NEXT: ret
				%t1 = call float @llvm.aarch64.sve.fminv.nxv2f32(<vscale x 2 x i1> %pg, <vscale x 2 x float> %a)
				%t2 = insertelement <vscale x 2 x float> zeroinitializer, float %t1, i64 0
				ret <vscale x 2 x float> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 4 x i32> @orv_zero_fill(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) #0 {
				; CHECK-LABEL: orv_zero_fill:
				; CHECK: orv s0, p0, z0.s
				; CHECK-NEXT: ret
				%t1 = call i32 @llvm.aarch64.sve.orv.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a)
				%t2 = insertelement <vscale x 4 x i32> zeroinitializer, i32 %t1, i64 0
				ret <vscale x 4 x i32> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 2 x i64> @saddv_zero_fill(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) #0 {
				; CHECK-LABEL: saddv_zero_fill:
				; CHECK: saddv d0, p0, z0.b
				; CHECK-NEXT: ret
				%t1 = call i64 @llvm.aarch64.sve.saddv.nxv16i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a)
				%t2 = insertelement <vscale x 2 x i64> zeroinitializer, i64 %t1, i64 0
				ret <vscale x 2 x i64> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 2 x i64> @smaxv_zero_fill(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) #0 {
				; CHECK-LABEL: smaxv_zero_fill:
				; CHECK: smaxv d0, p0, z0.d
				; CHECK-NEXT: ret
				%t1 = call i64 @llvm.aarch64.sve.smaxv.nxv2i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a)
				%t2 = insertelement <vscale x 2 x i64> zeroinitializer, i64 %t1, i64 0
				ret <vscale x 2 x i64> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 4 x i32> @sminv_zero_fill(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) #0 {
				; CHECK-LABEL: sminv_zero_fill:
				; CHECK: sminv s0, p0, z0.s
				; CHECK-NEXT: ret
				%t1 = call i32 @llvm.aarch64.sve.sminv.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a)
				%t2 = insertelement <vscale x 4 x i32> zeroinitializer, i32 %t1, i64 0
				ret <vscale x 4 x i32> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 2 x i64> @uaddv_zero_fill(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) #0 {
				; CHECK-LABEL: uaddv_zero_fill:
				; CHECK: uaddv d0, p0, z0.h
				; CHECK-NEXT: ret
				%t1 = call i64 @llvm.aarch64.sve.uaddv.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%t2 = insertelement <vscale x 2 x i64> zeroinitializer, i64 %t1, i64 0
				ret <vscale x 2 x i64> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 16 x i8> @umaxv_zero_fill(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) #0 {
				; CHECK-LABEL: umaxv_zero_fill:
				; CHECK: umaxv b0, p0, z0.b
				; CHECK-NEXT: ret
				%t1 = call i8 @llvm.aarch64.sve.umaxv.nxv16i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a)
				%t2 = insertelement <vscale x 16 x i8> zeroinitializer, i8 %t1, i64 0
				ret <vscale x 16 x i8> %t2
				}

				; Ensure we rely on the reduction's implicit zero filling.
				define <vscale x 2 x i64> @uminv_zero_fill(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) #0 {
				; CHECK-LABEL: uminv_zero_fill:
				; CHECK: uminv d0, p0, z0.d
				; CHECK-NEXT: ret
				%t1 = call i64 @llvm.aarch64.sve.uminv.nxv2i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a)
				%t2 = insertelement <vscale x 2 x i64> zeroinitializer, i64 %t1, i64 0
				ret <vscale x 2 x i64> %t2
				}

				; Ensure explicit zeroing when inserting into a lane other than 0.
				; NOTE: This test doesn't care about the exact way an insert is code generated,
				; so only checks the presence of one instruction from the expected chain.
				define <vscale x 2 x i64> @zero_fill_non_zero_index(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) #0 {
				; CHECK-LABEL: zero_fill_non_zero_index:
				; CHECK: uminv d{{[0-9]+}}, p0, z0.d
				; CHECK: mov z{{[0-9]+}}.d, p{{[0-9]+}}/m, x{{[0-9]+}}
				; CHECK: ret
				%t1 = call i64 @llvm.aarch64.sve.uminv.nxv2i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a)
				%t2 = insertelement <vscale x 2 x i64> zeroinitializer, i64 %t1, i64 1
				ret <vscale x 2 x i64> %t2
				}

				; Ensure explicit zeroing when the result vector is larger than that produced by
				; the reduction instruction.
				define <vscale x 4 x i64> @zero_fill_type_mismatch(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) #0 {
				; CHECK-LABEL: zero_fill_type_mismatch:
				; CHECK: uminv d0, p0, z0.d
				; CHECK-NEXT: mov z1.d, #0
				; CHECK-NEXT: ret
				%t1 = call i64 @llvm.aarch64.sve.uminv.nxv2i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a)
				%t2 = insertelement <vscale x 4 x i64> zeroinitializer, i64 %t1, i64 0
				ret <vscale x 4 x i64> %t2
				}

				; Ensure explicit zeroing when extracting an element from an operation that
				; cannot guarantee lanes 1-N are zero.
				; NOTE: This test doesn't care about the exact way an insert is code generated,
				; so only checks the presence of one instruction from the expected chain.
				define <vscale x 2 x i64> @zero_fill_no_zero_upper_lanes(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) #0 {
				; CHECK-LABEL: zero_fill_no_zero_upper_lanes:
				; CHECK: umin z{{[0-9]+}}.d, p0/m, z0.d, z0.d
				; CHECK: mov z{{[0-9]+}}.d, p{{[0-9]+}}/m, x{{[0-9]+}}
				; CHECK: ret
				%t1 = call <vscale x 2 x i64> @llvm.aarch64.sve.umin.nxv2i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> %a)
				%t2 = extractelement <vscale x 2 x i64> %t1, i64 0
				%t3 = insertelement <vscale x 2 x i64> zeroinitializer, i64 %t2, i64 0
				ret <vscale x 2 x i64> %t3
				}

				declare i8 @llvm.aarch64.sve.andv.nxv2i8(<vscale x 2 x i1>, <vscale x 2 x i8>)
				declare i8 @llvm.aarch64.sve.andv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)

				declare i8 @llvm.aarch64.sve.eorv.nxv2i8(<vscale x 2 x i1>, <vscale x 2 x i8>)
				declare i16 @llvm.aarch64.sve.eorv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)

				declare float @llvm.aarch64.sve.fadda.nxv2f32(<vscale x 2 x i1>, float, <vscale x 2 x float>)
				declare double @llvm.aarch64.sve.fadda.nxv2f64(<vscale x 2 x i1>, double, <vscale x 2 x double>)

				declare float @llvm.aarch64.sve.faddv.nxv2f32(<vscale x 2 x i1>, <vscale x 2 x float>)
				declare float @llvm.aarch64.sve.faddv.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>)

				declare float @llvm.aarch64.sve.fmaxnmv.nxv2f32(<vscale x 2 x i1>, <vscale x 2 x float>)

				declare half @llvm.aarch64.sve.fmaxv.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>)
				declare float @llvm.aarch64.sve.fmaxv.nxv2f32(<vscale x 2 x i1>, <vscale x 2 x float>)

				declare float @llvm.aarch64.sve.fminv.nxv2f32(<vscale x 2 x i1>, <vscale x 2 x float>)

				declare float @llvm.aarch64.sve.fminnmv.nxv2f32(<vscale x 2 x i1>, <vscale x 2 x float>)

				declare i8 @llvm.aarch64.sve.orv.nxv2i8(<vscale x 2 x i1>, <vscale x 2 x i8>)
				declare i32 @llvm.aarch64.sve.orv.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>)

				declare i64 @llvm.aarch64.sve.saddv.nxv2i8(<vscale x 2 x i1>, <vscale x 2 x i8>)
				declare i64 @llvm.aarch64.sve.saddv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)

				declare i8 @llvm.aarch64.sve.smaxv.nxv2i8(<vscale x 2 x i1>, <vscale x 2 x i8>)
				declare i64 @llvm.aarch64.sve.smaxv.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>)

				declare i8 @llvm.aarch64.sve.sminv.nxv2i8(<vscale x 2 x i1>, <vscale x 2 x i8>)
				declare i32 @llvm.aarch64.sve.sminv.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>)

				declare i64 @llvm.aarch64.sve.uaddv.nxv2i8(<vscale x 2 x i1>, <vscale x 2 x i8>)
				declare i64 @llvm.aarch64.sve.uaddv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)

				declare i8 @llvm.aarch64.sve.umaxv.nxv2i8(<vscale x 2 x i1>, <vscale x 2 x i8>)
				declare i8 @llvm.aarch64.sve.umaxv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)

				declare i8 @llvm.aarch64.sve.uminv.nxv2i8(<vscale x 2 x i1>, <vscale x 2 x i8>)
				declare i64 @llvm.aarch64.sve.uminv.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>)

				declare <vscale x 2 x i64> @llvm.aarch64.sve.umin.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)

				attributes #0 = { "target-features"="+sve" }