This is an archive of the discontinued LLVM Phabricator instance.

Differential D117574

[AArch64][SVE] POC: Use predicate registers for <N x i1> expression trees.
Changes Planned · Public

Authored by sdesmalen on Jan 18 2022, 9:30 AM.

Details

Reviewers

efriedma

Summary

By default fixed-width i1 vectors are promoted, but when SVE is available,
some expression trees can be rewritten to use <vscale x M x i1> types,
such that all operations are performed on predicate registers, thus
avoiding unnecessary sign-extends and truncates.

The example chosen in this patch is to optimise an OR reduction
of a <N x i1> type, which can be implemented directly with a PTEST
instruction.
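To make the OR-reduction claim concrete, here is a minimal standalone C++ sketch (a toy model, not LLVM code). It treats a predicate register as one boolean per lane and compares a plain OR reduction with a test for "any active lane is set", which is what `ptest` followed by `cset w0, ne` computes in the tests of this patch. The names `Pred`, `orReduce`, and `anyActive` are assumptions introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a predicate register: one bool per lane.
using Pred = std::vector<bool>;

// Reference semantics of an OR reduction over <N x i1>.
bool orReduce(const Pred &Lanes) {
  bool Result = false;
  for (bool Lane : Lanes)
    Result = Result || Lane;
  return Result;
}

// Hypothetical stand-in for "ptest Pg, Pn" followed by "cset w0, ne":
// true if any lane that is active in Governing is also set in Value.
bool anyActive(const Pred &Governing, const Pred &Value) {
  assert(Governing.size() == Value.size() && "lane counts must match");
  for (std::size_t I = 0; I < Value.size(); ++I)
    if (Governing[I] && Value[I])
      return true;
  return false;
}
```

With an all-true governing predicate, `anyActive` agrees with `orReduce` on every input, which is why no sign-extension or truncation is needed once the lanes live in a predicate register.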

Note: this patch also contains a few other improvements that can be
split out into individual patches.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sdesmalen created this revision. · Jan 18 2022, 9:30 AM

Herald added a reviewer: efriedma. · Jan 18 2022, 9:30 AM

Herald added subscribers: ctetreau, ecnelises, psnobl and 3 others.

sdesmalen requested review of this revision. · Jan 18 2022, 9:30 AM

Herald added a project: Restricted Project. · Jan 18 2022, 9:30 AM

Herald added a subscriber: llvm-commits.

rscottmanley added a subscriber: rscottmanley. · Jan 18 2022, 9:59 AM

Harbormaster completed remote builds in B144039: Diff 400879. · Jan 18 2022, 11:09 AM

The other possible approach I can think of is to reconsider the way legalization works for i1 vectors. This transform is basically reversing work done by type legalization: the legalizer promotes i1 vectors because they aren't legal. We could, instead, use some sort of custom legalization for i1 vectors: instead of promoting the element type, convert them directly to scalable vectors. Probably more work to implement initially. But it might be easier to reason about the profitability if we avoid generating sign-extensions that shouldn't exist in the first place.

Which approach is better depends on how complex propagatePredicateTy gets, I guess. If we just have 100 lines of code to reverse sign-extensions, fine; if we end up with 1000 lines, probably we should reconsider the approach.
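As a rough illustration of what "reversing sign-extensions" means here, the following standalone C++ sketch (not LLVM code) models a promoted mask and shows that reducing it gives the same answer as reducing the i1 lanes directly. It assumes, for illustration only, that each promoted lane is an i8 holding 0 or -1, and that the scalar result is obtained by masking the OR-reduced value with 1. The helper names `promoteMask`, `reducePromoted`, and `reducePredicate` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Promote each i1 lane to an i8 lane holding 0 or -1 (all ones). This is
// the representation this sketch ASSUMES for a legalized <N x i1> compare.
std::vector<std::int8_t> promoteMask(const std::vector<bool> &Mask) {
  std::vector<std::int8_t> Promoted;
  Promoted.reserve(Mask.size());
  for (bool Lane : Mask)
    Promoted.push_back(Lane ? std::int8_t(-1) : std::int8_t(0));
  return Promoted;
}

// OR-reduce the promoted lanes, then keep only bit 0 ("and ..., 1").
bool reducePromoted(const std::vector<std::int8_t> &Promoted) {
  std::uint8_t Acc = 0;
  for (std::int8_t Lane : Promoted)
    Acc |= static_cast<std::uint8_t>(Lane);
  return (Acc & 1u) != 0;
}

// The same reduction performed directly on the i1 lanes.
bool reducePredicate(const std::vector<bool> &Mask) {
  for (bool Lane : Mask)
    if (Lane)
      return true;
  return false;
}
```

Because both paths compute the same boolean, the promoted form carries no extra information; the open question in the review is only where in the pipeline the cheaper form should be chosen.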

Herald added a reviewer: efriedma. · Jan 18 2022, 1:56 PM

peterwaller-arm added a subscriber: peterwaller-arm. · Jan 19 2022, 4:30 AM

Matt added a subscriber: Matt. · Jan 25 2022, 3:16 PM

sdesmalen mentioned this in D119346: [AArch64][SVE] Perform fixed-width predicate OR reduction on SVE predicate vectors. · Feb 9 2022, 8:29 AM

In D117574#3252523, @efriedma wrote:

The other possible approach I can think of is to reconsider the way legalization works for i1 vectors. This transform is basically reversing work done by type legalization: the legalizer promotes i1 vectors because they aren't legal. We could, instead, use some sort of custom legalization for i1 vectors: instead of promoting the element type, convert them directly to scalable vectors. Probably more work to implement initially. But it might be easier to reason about the profitability if we avoid generating sign-extensions that shouldn't exist in the first place.

Which approach is better depends on how complex propagatePredicateTy gets, I guess. If we just have 100 lines of code to reverse sign-extensions, fine; if we end up with 1000 lines, probably we should reconsider the approach.

My understanding is that decisions have been made for NEON on how to represent fixed-width vectors of i1 (i.e. through promotion) and we're kind of bound to those choices going forward. This avoids mixing the two representations (or better: their definition of whether they are illegal/legal types) based on the number of elements in the vector or on where the vectors are used. It seems doable to undo the type legalisation for certain cases, such as vecreduce_or. From what I've seen so far, I expect we'll need to support only a handful of cases to bubble up the sign-extend + extract_subvector.

I've simplified the approach and put up a new patch here: D119346

sdesmalen planned changes to this revision. · Feb 9 2022, 8:31 AM

For NEON, we're obviously forced to promote, sure. And that means even if we have SVE, we're forced to promote across call boundaries. That doesn't necessarily constrain what we do within a function; there's space to prefer SVE operations more aggressively. But it makes things more complicated, sure. In particular, we probably don't want to try to deal with all the side-effects if we try to mark the types "legal".

I think there are still potential alternatives to consider, though. Maybe instead of actually using the type legalization machinery, we could DAGCombine operations involving fixed-width predicates before type legalization. Or we can use custom lowering to generate some sequence that eventually produces a value of the right type, but is easier to analyze. There's some space to explore here for representations of fixed-width SETCC that aren't just SETCC with a promoted result type.

Oh, hmm, that's basically what you're doing in D119346. Okay. :)

Revision Contents

Path                                                   Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp          6 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp        114 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-ptest.ll    97 lines

Diff 400879

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ SDValue DAGCombiner::visitINSERT_SUBVECTOR(SDNode *N) {
   // INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx )
   // --> INSERT_SUBVECTOR( Vec, SubNew, Idx )
   if (N0.getOpcode() == ISD::INSERT_SUBVECTOR &&
       N0.getOperand(1).getValueType() == N1.getValueType() &&
       N0.getOperand(2) == N2)
     return DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), VT, N0.getOperand(0),
                        N1, N2);

+  // Combine INSERT_SUBVECTOR(UNDEF, SPLAT(X)) -> SPLAT(X)
+  if (N0.isUndef() && N1.getOpcode() == ISD::SPLAT_VECTOR) {
+    SDValue Scalar = DAG.getSplatValue(N1, true);
+    return DAG.getSplatVector(VT, SDLoc(N), Scalar);
+  }
+
   // Eliminate an intermediate insert into an undef vector:
   // insert_subvector undef, (insert_subvector undef, X, 0), N2 -->
   // insert_subvector undef, X, N2
   if (N0.isUndef() && N1.getOpcode() == ISD::INSERT_SUBVECTOR &&
       N1.getOperand(0).isUndef() && isNullConstant(N1.getOperand(2)))
     return DAG.getNode(ISD::INSERT_SUBVECTOR, SDLoc(N), VT, N0,
                        N1.getOperand(1), N2);

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ if (useSVEForFixedLengthVectorVT(InVT)) {
     SDValue Splice = DAG.getNode(ISD::VECTOR_SPLICE, DL, ContainerVT, NewInVec,
                                  NewInVec, DAG.getConstant(Idx, DL, MVT::i64));
     return convertFromScalableVector(DAG, Op.getValueType(), Splice);
   }

   return SDValue();
 }

+static SDValue getPredicateForFixedLengthVector(SelectionDAG &DAG, SDLoc &DL,
+                                                EVT VT);
+
 SDValue AArch64TargetLowering::LowerINSERT_SUBVECTOR(SDValue Op,
                                                      SelectionDAG &DAG) const {
   assert(Op.getValueType().isScalableVector() &&
          "Only expect to lower inserts into scalable vectors!");

   EVT InVT = Op.getOperand(1).getValueType();
   unsigned Idx = cast<ConstantSDNode>(Op.getOperand(2))->getZExtValue();

@@ if (InVT.isScalableVector()) {
     return SDValue();
   }

   if (Idx == 0 && isPackedVectorType(VT, DAG)) {
     // This will be matched by custom code during ISelDAGToDAG.
     if (Vec0.isUndef())
       return Op;

-    Optional<unsigned> PredPattern =
-        getSVEPredPatternFromNumElements(InVT.getVectorNumElements());
-    auto PredTy = VT.changeVectorElementType(MVT::i1);
-    SDValue PTrue = getPTrue(DAG, DL, PredTy, *PredPattern);
+    SDValue PTrue = getPredicateForFixedLengthVector(DAG, DL, InVT);
     SDValue ScalableVec1 = convertToScalableVector(DAG, VT, Vec1);
     return DAG.getNode(ISD::VSELECT, DL, VT, PTrue, ScalableVec1, Vec0);
   }

   return SDValue();
 }

 static bool isPow2Splat(SDValue Op, uint64_t &SplatVal, bool &Negated) {
@@ static SDValue performSVEAndCombine(SDNode *N,
   }

   if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))
     return Src;

   return SDValue();
 }

+// Recursively propagate the scalable predicate type to the leaves.
+/// \p PassthruVal is the value for lanes that are disabled under the active
+/// vector length (i.e. when sizeof(SVE register) > sizeof(fixed-width vector)).
+/// \p Gaps tells whether it is allowed to have gaps in the result
+/// vector that weren't there when the vectors were all fixed-width.
+/// Gaps can be inserted when concatenating two scalable vectors that
+/// were previously fixed-width vectors, if the active vector length
+/// does not match the fixed-width vector length.
+static SDValue propagatePredicateTy(TargetLowering::DAGCombinerInfo &DCI,
+                                    SDValue V, bool PassthruVal, bool Gaps) {
+  SelectionDAG &DAG = DCI.DAG;
+  SDLoc DL(V.getNode());
+
+  const auto &Subtarget =
+      static_cast<const AArch64Subtarget &>(DAG.getSubtarget());
+  if (!Subtarget.hasSVE() || !V->hasOneUse())
+    return SDValue();
+
+  // End of recursion.
+  if (V.getValueType().isScalableVector() || isa<ConstantSDNode>(V.getNode()))
+    return V;
+
+  switch (V.getOpcode()) {
+  case ISD::SETCC: {
+    SDValue CmpOp0 = V.getOperand(0);
+    SDValue CmpOp1 = V.getOperand(1);
+    SDValue CmpPred = V.getOperand(2);
+    EVT FixedVT = V.getOperand(0).getValueType();
+    EVT ScalableVT = getContainerForFixedLengthVector(DAG, FixedVT);
+    EVT PredVT = ScalableVT.changeVectorElementType(MVT::i1);
+
+    SDValue ZeroIdx = DAG.getConstant(0, DL, MVT::i64);
+    SDValue Passthru = ScalableVT.isInteger()
+                           ? DAG.getConstant(PassthruVal, DL, ScalableVT)
+                           : DAG.getConstantFP(PassthruVal, DL, ScalableVT);
+    CmpOp0 = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, ScalableVT, Passthru,
+                         CmpOp0, ZeroIdx);
+    CmpOp1 = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, ScalableVT, Passthru,
+                         CmpOp1, ZeroIdx);
+    return DAG.getNode(ISD::SETCC, DL, PredVT, {CmpOp0, CmpOp1, CmpPred});
+  }
+  case ISD::OR:
+  case ISD::XOR:
+    if (SDValue Op0 =
+            propagatePredicateTy(DCI, V.getOperand(0), PassthruVal, Gaps))
+      if (SDValue Op1 =
+              propagatePredicateTy(DCI, V.getOperand(1), PassthruVal, Gaps))
+        return DAG.getNode(V.getOpcode(), DL, Op0.getValueType(), Op0, Op1);
+    break;
+  case ISD::TRUNCATE:
+    if (SDValue Op =
+            propagatePredicateTy(DCI, V.getOperand(0), PassthruVal, Gaps))
+      return Op;
+    break;
+  case ISD::CONCAT_VECTORS: {
+    if (!Gaps && Subtarget.getMinSVEVectorSizeInBits() !=
+                     Subtarget.getMaxSVEVectorSizeInBits())
+      return SDValue();
+
+    SmallVector<SDValue, 4> ConcatOps;
+    for (unsigned I = 0; I < V.getNumOperands(); ++I) {
+      SDValue Op =
+          propagatePredicateTy(DCI, V.getOperand(I), PassthruVal, Gaps);
+      if (!Op)
+        return SDValue();
+      ConcatOps.push_back(Op);
+    }
+
+    // Now generate a new concat_vectors with the new scalable types.
+    ElementCount NewConcatEC =
+        ConcatOps[0].getValueType().getVectorElementCount() * ConcatOps.size();
+    EVT NewConcatVT = EVT::getVectorVT(
+        *DAG.getContext(), ConcatOps[0].getValueType().getVectorElementType(),
+        NewConcatEC);
+    return DAG.getNode(ISD::CONCAT_VECTORS, DL, NewConcatVT, ConcatOps);
+  }
+  case ISD::VECREDUCE_OR: {
+    if (SDValue NewOp =
+            propagatePredicateTy(DCI, V.getOperand(0), PassthruVal, Gaps))
+      return DAG.getNode(ISD::VECREDUCE_OR, DL, V.getValueType(), NewOp);
+    break;
+  }
+  default:
+    break;
+  }
+
+  // Can't propagate scalable predicate any further.
+  return SDValue();
+}
+
 static SDValue performANDCombine(SDNode *N,
                                  TargetLowering::DAGCombinerInfo &DCI) {
   SelectionDAG &DAG = DCI.DAG;
   SDValue LHS = N->getOperand(0);
   EVT VT = N->getValueType(0);
-  if (!VT.isVector() || !DAG.getTargetLoweringInfo().isTypeLegal(VT))
+  if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))
+    return SDValue();
+
+  if (auto *C = dyn_cast<ConstantSDNode>(N->getOperand(1))) {
+    if (C->getZExtValue() == 1 &&
+        N->getOperand(0)->getOpcode() == ISD::VECREDUCE_OR)
+      return propagatePredicateTy(DCI, N->getOperand(0), /*PassthruVal=*/false,
+                                  /*Gaps=*/true);
+  }
+
+  if (!VT.isVector())
     return SDValue();

   if (VT.isScalableVector())
     return performSVEAndCombine(N, DCI);

   // The combining code below works only for NEON vectors. In particular, it
   // does not work for SVE when dealing with vectors wider than 128 bits.
   if (!(VT.is64BitVector() || VT.is128BitVector()))
@@
 // vselect (v1iXX setcc) (XX is the size of the compared operand type)
 // FIXME: Currently the type legalizer can't handle VSELECT having v1i1 as
 // condition. If it can legalize "VSELECT v1i1" correctly, no need to combine
 // such VSELECT.
 static SDValue performVSelectCombine(SDNode *N, SelectionDAG &DAG) {
   SDValue N0 = N->getOperand(0);
   EVT CCVT = N0.getValueType();

+  if (isAllActivePredicate(N0))
+    return N->getOperand(1);
+
   // Check for sign pattern (VSELECT setgt, iN lhs, -1, 1, -1) and transform
   // into (OR (ASR lhs, N-1), 1), which requires less instructions for the
   // supported types.
   SDValue SetCC = N->getOperand(0);
   if (SetCC.getOpcode() == ISD::SETCC &&
       SetCC.getOperand(2) == DAG.getCondCode(ISD::SETGT)) {
     SDValue CmpLHS = SetCC.getOperand(0);
     EVT VT = CmpLHS.getValueType();
@@ static SDValue getPredicateForFixedLengthVector(SelectionDAG &DAG, SDLoc &DL,
   switch (VT.getVectorElementType().getSimpleVT().SimpleTy) {
   default:
     llvm_unreachable("unexpected element type for SVE predicate");
   case MVT::i8:
     MaskVT = MVT::nxv16i1;
     break;
   case MVT::i16:
   case MVT::f16:
+  case MVT::bf16:
     MaskVT = MVT::nxv8i1;
     break;
   case MVT::i32:
   case MVT::f32:
     MaskVT = MVT::nxv4i1;
     break;
   case MVT::i64:
   case MVT::f64:
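
The control flow of propagatePredicateTy can be hard to follow inside a diff, so here is a simplified standalone C++ sketch of the same idea (a toy model, not the LLVM implementation). It walks a small expression tree and succeeds only if every node has a single use and a supported kind; one failing operand aborts the whole rewrite. The types `Kind` and `Node`, the `NumUses` field, and the textual result are all hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical node kinds, loosely mirroring opcodes handled by the patch.
enum class Kind { SetCC, Or, Xor, Truncate, Concat, Other };

struct Node {
  Kind K;
  unsigned NumUses = 1;                   // stand-in for hasOneUse()
  std::vector<std::shared_ptr<Node>> Ops; // operands
};

// Returns a textual form of the tree rewritten to predicate types, or
// std::nullopt if any node on the way blocks the rewrite.
std::optional<std::string> propagate(const Node &N) {
  if (N.NumUses != 1)
    return std::nullopt; // another user still needs the promoted value

  switch (N.K) {
  case Kind::SetCC:
    return std::string("setcc.pred"); // leaf: the compare yields a predicate
  case Kind::Truncate:
    // In this model a truncate is simply looked through.
    assert(N.Ops.size() == 1 && "truncate has exactly one operand");
    return propagate(*N.Ops[0]);
  case Kind::Or:
  case Kind::Xor:
  case Kind::Concat: {
    std::string Text = N.K == Kind::Or    ? "or("
                       : N.K == Kind::Xor ? "xor("
                                          : "concat(";
    for (std::size_t I = 0; I < N.Ops.size(); ++I) {
      std::optional<std::string> Op = propagate(*N.Ops[I]);
      if (!Op)
        return std::nullopt; // one failing operand aborts the rewrite
      Text += (I ? ", " : "") + *Op;
    }
    return Text + ")";
  }
  case Kind::Other:
    break;
  }
  return std::nullopt; // unsupported node kind
}
```

The all-or-nothing result mirrors how the patch returns an empty SDValue whenever any operand cannot be converted, so a partially rewritten tree is never produced.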

llvm/test/CodeGen/AArch64/sve-fixed-length-ptest.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64 -mattr=+sve < %s | FileCheck %s

				define i1 @ptest_v16i1_256bit_min_sve(float* %a, float * %b) vscale_range(2, 0) {
				; CHECK-LABEL: ptest_v16i1_256bit_min_sve:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov x8, #8
				; CHECK-NEXT: ptrue p0.s, vl8
				; CHECK-NEXT: mov z2.s, #0 // =0x0
				; CHECK-NEXT: ptrue p1.s
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x0]
				; CHECK-NEXT: sel z0.s, p0, z0.s, z2.s
				; CHECK-NEXT: sel z1.s, p0, z1.s, z2.s
				; CHECK-NEXT: fcmeq p0.s, p1/z, z0.s, #0.0
				; CHECK-NEXT: fcmeq p2.s, p1/z, z1.s, #0.0
				; CHECK-NEXT: not p0.b, p1/z, p0.b
				; CHECK-NEXT: not p1.b, p1/z, p2.b
				; CHECK-NEXT: uzp1 p0.h, p1.h, p0.h
				; CHECK-NEXT: ptrue p1.h
				; CHECK-NEXT: ptest p1, p0.b
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%v0 = bitcast float* %a to <16 x float>*
				%v1 = load <16 x float>, <16 x float>* %v0, align 4
				%v2 = fcmp une <16 x float> %v1, zeroinitializer
				%v3 = call i1 @llvm.vector.reduce.or.i1.v16i1 (<16 x i1> %v2)
				ret i1 %v3
				}

				define i1 @ptest_v16i1_512bit_min_sve(float* %a, float * %b) vscale_range(4, 0) {
				; CHECK-LABEL: ptest_v16i1_512bit_min_sve:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl16
				; CHECK-NEXT: mov z1.s, #0 // =0x0
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: sel z0.s, p0, z0.s, z1.s
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: fcmeq p1.s, p0/z, z0.s, #0.0
				; CHECK-NEXT: not p1.b, p0/z, p1.b
				; CHECK-NEXT: ptest p0, p1.b
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%v0 = bitcast float* %a to <16 x float>*
				%v1 = load <16 x float>, <16 x float>* %v0, align 4
				%v2 = fcmp une <16 x float> %v1, zeroinitializer
				%v3 = call i1 @llvm.vector.reduce.or.i1.v16i1 (<16 x i1> %v2)
				ret i1 %v3
				}

				define i1 @ptest_v16i1_512bit_sve(float* %a, float * %b) vscale_range(4, 4) {
				; CHECK-LABEL: ptest_v16i1_512bit_sve:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: fcmeq p1.s, p0/z, z0.s, #0.0
				; CHECK-NEXT: not p1.b, p0/z, p1.b
				; CHECK-NEXT: ptest p0, p1.b
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%v0 = bitcast float* %a to <16 x float>*
				%v1 = load <16 x float>, <16 x float>* %v0, align 4
				%v2 = fcmp une <16 x float> %v1, zeroinitializer
				%v3 = call i1 @llvm.vector.reduce.or.i1.v16i1 (<16 x i1> %v2)
				ret i1 %v3
				}

				define i1 @ptest_or_v16i1_512bit_min_sve(float* %a, float * %b) vscale_range(4, 0) {
				; CHECK-LABEL: ptest_or_v16i1_512bit_min_sve:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s, vl16
				; CHECK-NEXT: mov z2.s, #0 // =0x0
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x1]
				; CHECK-NEXT: sel z0.s, p0, z0.s, z2.s
				; CHECK-NEXT: sel z1.s, p0, z1.s, z2.s
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: fcmeq p1.s, p0/z, z0.s, #0.0
				; CHECK-NEXT: fcmeq p2.s, p0/z, z1.s, #0.0
				; CHECK-NEXT: not p1.b, p0/z, p1.b
				; CHECK-NEXT: not p2.b, p0/z, p2.b
				; CHECK-NEXT: orr p1.b, p0/z, p1.b, p2.b
				; CHECK-NEXT: ptest p0, p1.b
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%v0 = bitcast float* %a to <16 x float>*
				%v1 = load <16 x float>, <16 x float>* %v0, align 4
				%v2 = fcmp une <16 x float> %v1, zeroinitializer
				%v3 = bitcast float* %b to <16 x float>*
				%v4 = load <16 x float>, <16 x float>* %v3, align 4
				%v5 = fcmp une <16 x float> %v4, zeroinitializer
				%v6 = or <16 x i1> %v2, %v5
				%v7 = call i1 @llvm.vector.reduce.or.i1.v16i1 (<16 x i1> %v6)
				ret i1 %v7
				}

				declare i1 @llvm.vector.reduce.or.i1.v16i1(<16 x i1>)
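
The Gaps parameter documented on propagatePredicateTy can be pictured with a small standalone C++ sketch (a toy model, not LLVM code). It assumes a fixed-width mask is placed in the low lanes of a wider register and the remaining lanes are filled with a passthru value; concatenating two such registers then leaves unused lanes between the two halves. The names `Lanes`, `widen`, `concatLanes`, and `anySet` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<bool>;

// Place a fixed-width mask in the low lanes of a register with RegLanes
// lanes, filling the remaining lanes with Passthru.
Lanes widen(const Lanes &Fixed, std::size_t RegLanes, bool Passthru) {
  assert(Fixed.size() <= RegLanes && "fixed vector must fit in the register");
  Lanes Reg(RegLanes, Passthru);
  for (std::size_t I = 0; I < Fixed.size(); ++I)
    Reg[I] = Fixed[I];
  return Reg;
}

// Concatenate two widened registers: all lanes of Lo, then all lanes of Hi.
Lanes concatLanes(const Lanes &Lo, const Lanes &Hi) {
  Lanes Out(Lo);
  Out.insert(Out.end(), Hi.begin(), Hi.end());
  return Out;
}

// OR reduction over every lane, including any gap lanes.
bool anySet(const Lanes &V) {
  for (bool Lane : V)
    if (Lane)
      return true;
  return false;
}
```

When the register is exactly as wide as the fixed vector there are no gaps. When it is wider, the gap lanes hold the passthru value, and since false is the neutral element of OR, a passthru of false leaves the reduction unchanged; this is consistent with performANDCombine passing /*PassthruVal=*/false and /*Gaps=*/true for the vecreduce_or case.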