
Details

Reviewers

david-arm
paulwalker-arm
eli.friedman
craig.topper

Summary

This patch introduces an experimental intrinsic for counting the
trailing zero elements in a vector. The intrinsic has generic expansion
in SelectionDAGBuilder, and for AArch64 there is a pattern which matches
to brkb & cntp instructions where SVE is enabled.

The intrinsic has a second operand to indicate whether it returns a valid
result if the first argument is all zero, similar to the existing cttz intrinsic.
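As a reading aid, the intended semantics can be modelled in scalar C++. `cttzElts` is a hypothetical reference model, not an LLVM API. It counts zero elements starting from element 0 (the least significant lane). When every element is zero, it returns the element count unless zero-is-poison is set. Poison is modelled here as `std::nullopt`. In IR the flag only permits a poison result; a lowering may still return any value in that case.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical scalar reference model of @llvm.experimental.cttz.elts.
// Counts the zero elements from lane 0 upwards until the first non-zero one.
// For an all-zero input the result is Vec.size(), unless ZeroIsPoison is set,
// in which case the IR result would be poison (modelled as std::nullopt).
template <typename T>
std::optional<uint64_t> cttzElts(const std::vector<T> &Vec, bool ZeroIsPoison) {
  for (uint64_t I = 0; I < Vec.size(); ++I)
    if (Vec[I] != T(0))
      return I;
  if (ZeroIsPoison)
    return std::nullopt;
  return static_cast<uint64_t>(Vec.size());
}
```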

These changes have been split out from D158291.


Event Timeline

kmclaughlin created this revision. Aug 31 2023, 7:24 AM

Herald added a project: Restricted Project. Aug 31 2023, 7:24 AM

Herald added subscribers: sunshaoce, ctetreau, hiraditya, kristof.beyls.

kmclaughlin requested review of this revision. Aug 31 2023, 7:24 AM


Herald added subscribers: llvm-commits, jdoerfert.

kmclaughlin mentioned this in D158291: [PoC][WIP] Add an AArch64 specific pass for loop idiom recognition. Aug 31 2023, 7:49 AM

Harbormaster completed remote builds in B256021: Diff 555033. Aug 31 2023, 7:58 AM

What's the use case for the poison behavior? If it wasn't for X86's weird bsr/bsf instruction behaviour would we have this on the regular llvm.cttz/ctlz intrinsics?

craig.topper added inline comments. Aug 31 2023, 8:09 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5337	Why AnyExt?

Matt added a subscriber: Matt. Aug 31 2023, 9:28 AM

Replaced getAnyExtOrTrunc in LowerINTRINSIC_WO_CHAIN with getZExtOrTrunc

In D159283#4631473, @craig.topper wrote:

What's the use case for the poison behavior? If it wasn't for X86's weird bsr/bsf instruction behaviour would we have this on the regular llvm.cttz/ctlz intrinsics?

Hi @craig.topper, it seemed sensible to add the poison behaviour to be consistent with the cttz intrinsic. Some targets seemed to require it, so I thought it made sense to return undef, at least in the generic lowering, if the input is all zero.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5337	This should be getZExtOrTrunc

Harbormaster completed remote builds in B256085: Diff 555127. Aug 31 2023, 12:08 PM

Hi all,
I just wanted to leave a note to say that I will be away for a few weeks, but I will be replying to any questions and review comments on my return at the beginning of October.
Thanks!

The specific rationale for mirroring cttz's second operand, beyond consistency, is that it can reduce the range of the result enough to enable better code generation. I'll use SVE as an example, although I know that for SVE we don't rely on the common expansion. The largest supported SVE vector register is 256 bytes long, which means cttz.elts has a return value in the range 0 <= ret <= 256. In this scenario the common expansion requires 16-bit element types. However, when an all-zero input results in poison, the range shrinks by one (0 <= ret <= 255), so 8-bit element types can be used.
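The range argument can be checked with a small scalar sketch. `expansionEltWidth` is a hypothetical helper. It assumes the expansion does three things:

- takes the narrower of the return width and the bits needed for the largest possible result;
- rounds that up to a power of two, with a floor of 8 bits (the `std::min`, `llvm::bit_ceil` and `std::max` steps in the diff);
- when zero-is-poison is set, treats the largest result as one less than the lane count.

For an nxv16i1 predicate at the maximum vscale of 16 there are 256 lanes. That gives 16-bit elements without the flag and 8-bit elements with it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of bits needed to represent V as an unsigned value (0 for V == 0).
static unsigned activeBits(uint64_t V) {
  unsigned Bits = 0;
  while (V) {
    ++Bits;
    V >>= 1;
  }
  return Bits;
}

// Smallest power of two >= V (returns 1 for V <= 1).
static unsigned bitCeil(unsigned V) {
  unsigned P = 1;
  while (P < V)
    P <<= 1;
  return P;
}

// Hypothetical helper: element width for the generic expansion, given the
// maximum number of lanes (assumed >= 1), the scalar width of the return
// type, and the zero-is-poison flag. With zero-is-poison the all-zero result
// (== MaxElts) need not be representable, so the largest result is MaxElts - 1.
unsigned expansionEltWidth(uint64_t MaxElts, unsigned RetBits,
                           bool ZeroIsPoison) {
  uint64_t MaxResult = ZeroIsPoison ? MaxElts - 1 : MaxElts;
  unsigned Width = std::min(RetBits, activeBits(MaxResult));
  return std::max(bitCeil(Width), 8u);
}
```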

The RISC-V instruction that we'd want to use for this is vfirst.m, which returns -1 if no bits in a mask are set and the bit position of the first set bit otherwise. Determining that no bits are set is simply a check for whether the result is less than 0. This is the same instruction we use for reduce.or on mask vectors.
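A scalar sketch of this point follows. It assumes a simplified vfirst.m that ignores `vl` and masking. Without zero-is-poison, the all-zero case needs an extra compare and select to produce the element count. With zero-is-poison, the raw vfirst.m result (including its -1) can be returned directly. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of RISC-V vfirst.m: the index of the first set
// mask bit, or -1 if no bit is set.
int64_t vfirstM(const std::vector<bool> &Mask) {
  for (size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return static_cast<int64_t>(I);
  return -1;
}

// cttz.elts built on vfirst.m. Without zero-is-poison, the all-zero case must
// yield the element count, which costs an extra compare and select. With
// zero-is-poison, the raw vfirst.m result can be used as-is, since any value
// (including -1) is acceptable where the result is poison.
int64_t cttzEltsViaVfirst(const std::vector<bool> &Mask, bool ZeroIsPoison) {
  int64_t First = vfirstM(Mask);
  if (ZeroIsPoison)
    return First;
  return First < 0 ? static_cast<int64_t>(Mask.size()) : First;
}
```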

Which sounds like mirroring cttz's second operand is useful then because you can omit the need for the extra compare?

In D159283#4634754, @paulwalker-arm wrote:

Which sounds like mirroring cttz's second operand is useful then because you can omit the need for the extra compare?

Yes. I think I'm missing when the bit will be set. Does D158291 intend to set the zero-is-poison bit? I see this patch doesn't use ARM-specific instructions when zero-is-poison is set, which confused me.

llvm/include/llvm/IR/Intrinsics.td
2176	why i32 instead of i1 like the existing cttz intrinsic?

Yes, I believe D158291 will set the zero-is-poison bit because the transformation already has to handle the zero case for its quick loop path. At that point I'd expect this patch to be updated accordingly so the SVE code generation remains unaffected.

SjoerdMeijer added a subscriber: SjoerdMeijer. Sep 13 2023, 1:40 AM

The semantics of this new intrinsic should be described in LangRef.

And tests are missing: non-AArch64 tests to test generic lowering, and also AArch64 without SVE.

Added a description of @llvm.experimental.cttz.elts to LangRef.
Split the AArch64 test file so that some tests are not using SVE.
Added tests for other targets to test generic lowering.
Added the new tests & patterns from D158291 for this intrinsic to this patch.

Herald added subscribers: wangpc, luke, frasercrmck and 22 others. Oct 23 2023, 8:27 AM

kmclaughlin added inline comments. Oct 23 2023, 8:33 AM

llvm/include/llvm/IR/Intrinsics.td
2176	Hi @craig.topper, I tried adding this argument as an i1, but ran into problems when trying to build the patterns in AArch64SVEInstrInfo.td if I tried to match this type. I tried to find some similar intrinsics but didn't find any with patterns that match an i1, which is why I chose to use an i32 with `timm32_0_1`.

Harbormaster completed remote builds in B257913: Diff 557843. Oct 23 2023, 10:14 AM

nikic added a subscriber: nikic. Oct 23 2023, 11:39 AM

nikic added inline comments.

llvm/include/llvm/IR/Intrinsics.td
2176	Can't comment on how to do this in tablegen, but as this is a target-independent intrinsic, we'd definitely want the argument to be i1 and not i32.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7541	I don't think this bit makes sense. The zero-is-poison flag allows you to return poison if this input is zero, but you don't have to return it. Setting this kind of flag should never result in more complex code, as you are generating here.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5327	I don't get what guarantees this assertion (or why it would make sense to assert this at all).

Changed the second argument of the @llvm.experimental.cttz.elts intrinsic to an i1.
Added an AArch64ISD node, CTTZ_ELTS, which is used when lowering the intrinsic for AArch64 if shouldExpandCttzElements returns false.
Changed the generic lowering to no longer check if the result is 0 and return undef if zero-is-poison is set.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7541	Thanks @nikic, I misunderstood what the zero-is-poison flag was used for. I've removed the getSelectCC here and instead only use the flag above when computing the smallest vector element type that can be chosen for the expansion.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5327	Removed - there was nothing to guarantee that the flag would be zero here and we would want to lower the intrinsic for both possible values.

Harbormaster completed remote builds in B257932: Diff 557880. Oct 25 2023, 10:35 AM

nikic added inline comments. Oct 26 2023, 1:16 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.h
342	If you need an ISD opcode anyway, maybe make it a generic one right away? Having a generic ISD opcode and expanding it during legalization is generally preferred over doing an expansion in SDAGBuilder, as this allows for more optimal legalizations if the operation is supported only for some types.

kmclaughlin added inline comments. Oct 26 2023, 8:13 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.h
342	Adding a generic opcode for this would introduce a lot of additional complexity with legalisation, and splitting/promoting the operand would likely result in poor CodeGen. It seemed better to avoid this by expanding the intrinsic in SelectionDAGBuilder as we have for others such as get_active_lane_mask, at least until other targets really need full legalisation support.

A few comments, but otherwise this patch looks good. I agree with not making CTTZ_ELTS common; it's really just a target-specific convenience, and given there's no use case for it during code generation, I don't think it's worth the cost of implementing all the plumbing required for a common node. We can cross that bridge if/when the operation proves its worth.

llvm/docs/LangRef.rst
18347–18348 ↗	(On Diff #557880)	Is this last part true? The cases not directly handled by the target are converted to common DAG nodes so there's nothing here that should be functionally broken for other targets?
18377 ↗	(On Diff #557880)	I think you can drop the 'then'?
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7517	Perhaps just `!cast<ConstantInt>(I.getOperand(1))->isZero()`?
7542	For the case where `I.getType()->getScalarSizeInBits()` is a non-power-of-two we're going to round up to the next power-of-two bit-width and so I think it's possible for `RetTy` to be smaller than `NewEltTy`? If true this should use `getZExtOrTrunc()`.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1761	Can this be `hasSVEorSME()`? This function could be written as `return !Subtarget->hasSVE() || VT != MVT::nxv16i1;`
5329–5330	Up to you but you likely don't need this given `getZExtOrTrunc` will do the right thing (i.e. nothing) for a nop cast.
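The following hypothetical sketch illustrates the non-power-of-two concern. It models two things: the width the expansion picks (the `std::min` / `llvm::bit_ceil` / `std::max` steps in the diff), and the choice `getZExtOrTrunc()` makes. Take an `i7` result from a 16-lane input. The largest result (16) needs 5 bits, so the expansion works in `i8`, which is wider than the return type. An unconditional `ZERO_EXTEND` would be invalid there, while a truncate is correct, and equal widths need no node at all. The helper names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical model of the choice getZExtOrTrunc() makes: widen with a zero
// extension, narrow with a truncation, or do nothing if the widths match.
std::string zextOrTruncKind(unsigned FromBits, unsigned ToBits) {
  if (FromBits < ToBits)
    return "zext";
  if (FromBits > ToBits)
    return "trunc";
  return "none";
}

// Element width the expansion picks from the return width and the number of
// bits needed for the largest possible result: take the minimum, round up to
// a power of two, and never go below 8 bits.
unsigned pickEltWidth(unsigned RetBits, unsigned ResultBits) {
  unsigned W = std::min(RetBits, ResultBits);
  unsigned P = 1;
  while (P < W)
    P <<= 1;
  return std::max(P, 8u);
}
```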

Changed shouldExpandCttzElements to query hasSVEorSME() and added a RUN line to AArch64/intrinsic-cttz-elts-sve.ll with -mattr=+sme
Removed unnecessary getZExtOrTrunc from lowering of intrinsic in AArch64ISelLowering.cpp
Added a test where the return type of the intrinsic is not a power of two

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1761	We can use `hasSVEorSME()` here, I've changed this and added a RUN line to `AArch64/intrinsic-cttz-elts-sve.ll`

paulwalker-arm accepted this revision. Oct 27 2023, 9:08 AM

This revision is now accepted and ready to land. Oct 27 2023, 9:08 AM

Harbormaster completed remote builds in B257953: Diff 557916. Oct 27 2023, 11:41 AM

Closed by commit rG3b786f2c7608

Thanks all for reviewing this patch!

GitHub <noreply@github.com> mentioned this in rGc7148467fc08: [AArch64] Add an AArch64 pass for loop idiom transformations (#72273). Mon, Jan 15, 1:22 PM

Diff 555033

llvm/include/llvm/CodeGen/TargetLowering.h

@@ ... @@ virtual bool shouldExpandGetActiveLaneMask(EVT VT, EVT OpVT) const {
     return true;
   }

   virtual bool shouldExpandGetVectorLength(EVT CountVT, unsigned VF,
                                            bool IsScalable) const {
     return true;
   }

+  /// Return true if the @llvm.experimental.cttz.elts intrinsic should be
+  /// expanded using generic code in SelectionDAGBuilder.
+  virtual bool shouldExpandCttzElements(EVT VT) const { return true; }
+
   // Return true if op(vecreduce(x), vecreduce(y)) should be reassociated to
   // vecreduce(op(x, y)) for the reduction opcode RedOpc.
   virtual bool shouldReassociateReduction(unsigned RedOpc, EVT VT) const {
     return true;
   }

   /// Return true if it is profitable to convert a select of FP constants into
   /// a constant pool load whose address depends on the select condition. The

llvm/include/llvm/IR/Intrinsics.td

@@ ... @@ DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                           [IntrNoMem, IntrNoSync, IntrWillReturn]>;

 def int_experimental_get_vector_length:
   DefaultAttrsIntrinsic<[llvm_i32_ty],
                         [llvm_anyint_ty, llvm_i32_ty, llvm_i1_ty],
                         [IntrNoMem, IntrNoSync, IntrWillReturn,
                          ImmArg<ArgIndex<1>>, ImmArg<ArgIndex<2>>]>;

+def int_experimental_cttz_elts:
+  DefaultAttrsIntrinsic<[llvm_anyint_ty],
+                        [llvm_anyvector_ty, llvm_i32_ty],
+                        [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<1>>]>;
+
 def int_experimental_vp_splice:
   DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                         [LLVMMatchType<0>,
                          LLVMMatchType<0>,
                          llvm_i32_ty,
                          LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
                          llvm_i32_ty, llvm_i32_ty],
                         [IntrNoMem, ImmArg<ArgIndex<2>>]>;

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


@@ ... @@ case Intrinsic::experimental_get_vector_length: {

     SDValue UMin = DAG.getNode(ISD::UMIN, sdl, CountVT, Count, MaxEVL);
     // Clip to the result type if needed.
     SDValue Trunc = DAG.getNode(ISD::TRUNCATE, sdl, VT, UMin);

     setValue(&I, Trunc);
     return;
   }
+  case Intrinsic::experimental_cttz_elts: {
+    auto DL = getCurSDLoc();
+    SDValue Op = getValue(I.getOperand(0));
+    EVT OpVT = Op.getValueType();
+
+    if (!TLI.shouldExpandCttzElements(OpVT)) {
+      visitTargetIntrinsic(I, Intrinsic);
+      return;
+    }
+
+    if (OpVT.getScalarType() != MVT::i1) {
+      // Compare the input vector elements to zero & use to count trailing zeros
+      SDValue AllZero = DAG.getConstant(0, DL, OpVT);
+      OpVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1,
+                              OpVT.getVectorElementCount());
+      Op = DAG.getSetCC(DL, OpVT, Op, AllZero, ISD::SETNE);
+    }
+
+    // Find the smallest "sensible" element type to use for the expansion.
+    ConstantRange CR(
+        APInt(64, OpVT.getVectorElementCount().getKnownMinValue()));
+    if (OpVT.isScalableVT())
+      CR = CR.umul_sat(getVScaleRange(I.getCaller(), 64));
+
+    unsigned EltWidth = I.getType()->getScalarSizeInBits();
+    EltWidth = std::min(EltWidth, (unsigned)CR.getActiveBits());
+    EltWidth = std::max(llvm::bit_ceil(EltWidth), (unsigned)8);
+
+    MVT NewEltTy = MVT::getIntegerVT(EltWidth);
+
+    // Create the new vector type & get the vector length
+    EVT NewVT = EVT::getVectorVT(*DAG.getContext(), NewEltTy,
+                                 OpVT.getVectorElementCount());
+
+    SDValue VL =
+        DAG.getElementCount(DL, NewEltTy, OpVT.getVectorElementCount());
+
+    SDValue StepVec = DAG.getStepVector(DL, NewVT);
+    SDValue SplatVL = DAG.getSplat(NewVT, DL, VL);
+    SDValue StepVL = DAG.getNode(ISD::SUB, DL, NewVT, SplatVL, StepVec);
+    SDValue Ext = DAG.getNode(ISD::SIGN_EXTEND, DL, NewVT, Op);
+    SDValue And = DAG.getNode(ISD::AND, DL, NewVT, StepVL, Ext);
+    SDValue Max = DAG.getNode(ISD::VECREDUCE_UMAX, DL, NewEltTy, And);
+    SDValue Sub = DAG.getNode(ISD::SUB, DL, NewEltTy, VL, Max);
+
+    // If the result is VL, then the input was all zero. Return UNDEF in this
+    // case if zero-is-poison is set.
+    if (cast<ConstantSDNode>(getValue(I.getOperand(1)))->getZExtValue() != 0) {
+      Sub = DAG.getSelectCC(DL, Sub, VL, DAG.getUNDEF(NewEltTy), Sub,
+                            ISD::SETEQ);
+    }
+
+    EVT RetTy = TLI.getValueType(DAG.getDataLayout(), I.getType());
+    SDValue Ret = DAG.getNode(ISD::ZERO_EXTEND, DL, RetTy, Sub);
+
+    setValue(&I, Ret);
+    return;
+  }
   case Intrinsic::vector_insert: {
     SDValue Vec = getValue(I.getOperand(0));
     SDValue SubVec = getValue(I.getOperand(1));
     SDValue Index = getValue(I.getOperand(2));

     // The intrinsic's index type is i64, but the SDNode requires an index type
     // suitable for the target. Convert the index as required.
     MVT VectorIdxTy = TLI.getVectorIdxTy(DAG.getDataLayout());
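The generic expansion above can be hard to follow in DAG form. The following hypothetical scalar model is not LLVM code, and it uses 64-bit arithmetic rather than the narrowed element type. It performs the same steps per lane: weight each lane by `VL - i`, zero the weights of inactive lanes, take the unsigned maximum, and subtract it from `VL`. The weights `VL, VL-1, ..., 1` correspond to the constant-pool values (`.byte 8` down to `.byte 1`) in the `ctz_v8i1` test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar model of the generic expansion. Mask[i] is the i1 "lane is non-zero"
// value and VL is the lane count. Lane i gets weight VL - i, so lane 0 has the
// largest weight. Inactive lanes are zeroed by the AND with the sign-extended
// mask, and the unsigned max picks the weight of the lowest active lane.
// VL minus that weight is the lane's index, or VL when no lane is active.
uint64_t cttzEltsExpansion(const std::vector<bool> &Mask) {
  const uint64_t VL = Mask.size();
  uint64_t Max = 0;
  for (uint64_t I = 0; I < VL; ++I) {
    uint64_t StepVL = VL - I;                  // SplatVL - StepVec
    uint64_t Ext = Mask[I] ? ~uint64_t(0) : 0; // SIGN_EXTEND of an i1
    Max = std::max(Max, StepVL & Ext);         // AND, then VECREDUCE_UMAX
  }
  return VL - Max;                             // SUB VL, Max
}
```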

llvm/lib/Target/AArch64/AArch64ISelLowering.h

@@ ... @@ enum NodeType : unsigned {
   FMINV_PRED,
   FMINNMV_PRED,

   INSR,
   PTEST,
   PTEST_ANY,
   PTRUE,

   BITREVERSE_MERGE_PASSTHRU,
   BSWAP_MERGE_PASSTHRU,
   REVH_MERGE_PASSTHRU,
   REVW_MERGE_PASSTHRU,
   CTLZ_MERGE_PASSTHRU,
   CTPOP_MERGE_PASSTHRU,
   DUP_MERGE_PASSTHRU,
   INDEX_VECTOR,

@@ ... @@ public:
   bool isAllActivePredicate(SelectionDAG &DAG, SDValue N) const;
   EVT getPromotedVTForPredicate(EVT VT) const;

   EVT getAsmOperandValueType(const DataLayout &DL, Type *Ty,
                              bool AllowUnknown = false) const override;

   bool shouldExpandGetActiveLaneMask(EVT VT, EVT OpVT) const override;

+  bool shouldExpandCttzElements(EVT VT) const override;
+
   /// If a change in streaming mode is required on entry to/return from a
   /// function call it emits and returns the corresponding SMSTART or SMSTOP node.
   /// \p Entry tells whether this is before/after the Call, which is necessary
   /// because PSTATE.SM is only queried once.
   SDValue changeStreamingMode(SelectionDAG &DAG, SDLoc DL, bool Enable,
                               SDValue Chain, SDValue InGlue,
                               SDValue PStateSM, bool Entry) const;

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


@@ ... @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,

   // The whilelo instruction only works with i32 or i64 scalar inputs.
   if (OpVT != MVT::i32 && OpVT != MVT::i64)
     return true;

   return false;
 }

+bool AArch64TargetLowering::shouldExpandCttzElements(EVT VT) const {
+  if (!Subtarget->hasSVE() || VT != MVT::nxv16i1)
+    return true;
+
+  return false;
+}
+
 void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT,
                                                      bool StreamingSVE) {
   assert(VT.isFixedLengthVector() && "Expected fixed length vector type!");

   // By default everything must be expanded.
   for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op)
     setOperationAction(Op, VT, Expand);

@@ ... @@ return DAG.getNode(Opcode, dl, Op.getValueType(), Op.getOperand(1),
                        Op.getOperand(2), Op.getOperand(3));
   }
   case Intrinsic::get_active_lane_mask: {
     SDValue ID =
         DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, dl, MVT::i64);
     return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, Op.getValueType(), ID,
                        Op.getOperand(1), Op.getOperand(2));
   }
+  case Intrinsic::experimental_cttz_elts: {
+    // Only lower this when zero-is-poison is false
+    assert(cast<ConstantSDNode>(getValue(I.getOperand(1)))->getZExtValue() !=
+           0);
+
+    EVT Ty = Op.getValueType();
+    if (Ty == MVT::i64)
+      return Op;
+
+    SDValue NewCttzElts =
+        DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, MVT::i64, Op.getOperand(0),
+                    Op.getOperand(1), Op.getOperand(2));
+
+    return DAG.getAnyExtOrTrunc(NewCttzElts, dl, Ty);
+  }
   }
 }

 bool AArch64TargetLowering::shouldExtendGSIndex(EVT VT, EVT &EltTy) const {
   if (VT.getVectorElementType() == MVT::i8 ||
       VT.getVectorElementType() == MVT::i16) {
     EltTy = MVT::i32;
     return true;

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td


@@ ... @@ let Predicates = [HasSVEorSME] in {
   def ADDVL_XXI : sve_int_arith_vl<0b0, "addvl">;
   def ADDPL_XXI : sve_int_arith_vl<0b1, "addpl">;

   defm CNTB_XPiI : sve_int_count<0b000, "cntb", int_aarch64_sve_cntb>;
   defm CNTH_XPiI : sve_int_count<0b010, "cnth", int_aarch64_sve_cnth>;
   defm CNTW_XPiI : sve_int_count<0b100, "cntw", int_aarch64_sve_cntw>;
   defm CNTD_XPiI : sve_int_count<0b110, "cntd", int_aarch64_sve_cntd>;
   defm CNTP_XPP : sve_int_pcount_pred<0b0000, "cntp", int_aarch64_sve_cntp>;
+
+  def : Pat<(i64 (int_experimental_cttz_elts nxv16i1:$Op1, (i32 0))),
+            (i64 (!cast<Instruction>(CNTP_XPP_B)
+                    (nxv16i1 (!cast<Instruction>(BRKB_PPzP) (PTRUE_B 31), nxv16i1:$Op1)),
+                    (nxv16i1 (!cast<Instruction>(BRKB_PPzP) (PTRUE_B 31), nxv16i1:$Op1))))>;
 }

   defm INCB_XPiI : sve_int_pred_pattern_a<0b000, "incb", add, int_aarch64_sve_cntb>;
   defm DECB_XPiI : sve_int_pred_pattern_a<0b001, "decb", sub, int_aarch64_sve_cntb>;
   defm INCH_XPiI : sve_int_pred_pattern_a<0b010, "inch", add, int_aarch64_sve_cnth>;
   defm DECH_XPiI : sve_int_pred_pattern_a<0b011, "dech", sub, int_aarch64_sve_cnth>;
   defm INCW_XPiI : sve_int_pred_pattern_a<0b100, "incw", add, int_aarch64_sve_cntw>;
   defm DECW_XPiI : sve_int_pred_pattern_a<0b101, "decw", sub, int_aarch64_sve_cntw>;
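The SVE pattern above relies on BRKB and CNTP. As a hedged illustration, the sketch below models them on predicates represented as `std::vector<bool>`, one bool per lane, ignoring element-size granularity. BRKB with an all-true governing predicate (`PTRUE_B 31`, the "all" pattern) keeps the lanes strictly before the first true lane of `$Op1`. CNTP then counts those lanes, which gives the trailing-zero-element count, or the lane count for an all-false input. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Pred = std::vector<bool>; // one bool per predicate lane

// Hypothetical model of SVE BRKB (zeroing form): within the governing
// predicate Pg, set every active lane before the first active lane that is
// true in Pn. That lane, all later lanes and all inactive lanes are cleared.
Pred brkbZ(const Pred &Pg, const Pred &Pn) {
  Pred Out(Pn.size(), false);
  bool Break = false;
  for (size_t I = 0; I < Pn.size(); ++I) {
    if (!Pg[I])
      continue;
    if (Pn[I])
      Break = true;
    Out[I] = !Break;
  }
  return Out;
}

// Hypothetical model of CNTP: count lanes that are true in both Pg and Pn.
size_t cntp(const Pred &Pg, const Pred &Pn) {
  size_t N = 0;
  for (size_t I = 0; I < Pn.size(); ++I)
    N += (Pg[I] && Pn[I]) ? 1 : 0;
  return N;
}

// Mirrors the pattern: CNTP(BRKB(PTRUE all, Op), BRKB(PTRUE all, Op)).
size_t cttzEltsSve(const Pred &Op) {
  Pred AllTrue(Op.size(), true);
  Pred B = brkbZ(AllTrue, Op);
  return cntp(B, B);
}
```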

llvm/test/CodeGen/AArch64/intrinsic-cttz-elts.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

				; FIXED WIDTH

				define i8 @ctz_v8i1(<8 x i1> %a) {
				; CHECK-LABEL: .LCPI0_0:
				; CHECK-NEXT: .byte 8
				; CHECK-NEXT: .byte 7
				; CHECK-NEXT: .byte 6
				; CHECK-NEXT: .byte 5
				; CHECK-NEXT: .byte 4
				; CHECK-NEXT: .byte 3
				; CHECK-NEXT: .byte 2
				; CHECK-NEXT: .byte 1
				; CHECK-LABEL: ctz_v8i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: shl v0.8b, v0.8b, #7
				; CHECK-NEXT: adrp x8, .LCPI0_0
				; CHECK-NEXT: mov w9, #8 // =0x8
				; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI0_0]
				; CHECK-NEXT: cmlt v0.8b, v0.8b, #0
				; CHECK-NEXT: and v0.8b, v0.8b, v1.8b
				; CHECK-NEXT: umaxv b0, v0.8b
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: sub w0, w9, w8
				; CHECK-NEXT: ret
				%res = call i8 @llvm.experimental.cttz.elts.i8.v8i1(<8 x i1> %a, i32 0)
				ret i8 %res
				}

				define i32 @ctz_v16i1(<16 x i1> %a) {
				; CHECK-LABEL: .LCPI1_0:
				; CHECK-NEXT: .byte 16
				; CHECK-NEXT: .byte 15
				; CHECK-NEXT: .byte 14
				; CHECK-NEXT: .byte 13
				; CHECK-NEXT: .byte 12
				; CHECK-NEXT: .byte 11
				; CHECK-NEXT: .byte 10
				; CHECK-NEXT: .byte 9
				; CHECK-NEXT: .byte 8
				; CHECK-NEXT: .byte 7
				; CHECK-NEXT: .byte 6
				; CHECK-NEXT: .byte 5
				; CHECK-NEXT: .byte 4
				; CHECK-NEXT: .byte 3
				; CHECK-NEXT: .byte 2
				; CHECK-NEXT: .byte 1
				; CHECK-LABEL: ctz_v16i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: shl v0.16b, v0.16b, #7
				; CHECK-NEXT: adrp x8, .LCPI1_0
				; CHECK-NEXT: mov w9, #16 // =0x10
				; CHECK-NEXT: ldr q1, [x8, :lo12:.LCPI1_0]
				; CHECK-NEXT: cmlt v0.16b, v0.16b, #0
				; CHECK-NEXT: and v0.16b, v0.16b, v1.16b
				; CHECK-NEXT: umaxv b0, v0.16b
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: sub w8, w9, w8
				; CHECK-NEXT: and w0, w8, #0xff
				; CHECK-NEXT: ret
				%res = call i32 @llvm.experimental.cttz.elts.i32.v16i1(<16 x i1> %a, i32 0)
				ret i32 %res
				}

				define i16 @ctz_v4i32(<4 x i32> %a) {
				; CHECK-LABEL: .LCPI2_0:
				; CHECK-NEXT: .hword 4
				; CHECK-NEXT: .hword 3
				; CHECK-NEXT: .hword 2
				; CHECK-NEXT: .hword 1
				; CHECK-LABEL: ctz_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: cmtst v0.4s, v0.4s, v0.4s
				; CHECK-NEXT: adrp x8, .LCPI2_0
				; CHECK-NEXT: mov w9, #4 // =0x4
				; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI2_0]
				; CHECK-NEXT: xtn v0.4h, v0.4s
				; CHECK-NEXT: and v0.8b, v0.8b, v1.8b
				; CHECK-NEXT: umaxv h0, v0.4h
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: sub w8, w9, w8
				; CHECK-NEXT: and w0, w8, #0xff
				; CHECK-NEXT: ret
				%res = call i16 @llvm.experimental.cttz.elts.i16.v4i32(<4 x i32> %a, i32 0)
				ret i16 %res
				}

				; ZERO IS POISON

				define i8 @ctz_v8i1_poison(<8 x i1> %a) {
				; CHECK-LABEL: .LCPI3_0:
				; CHECK-NEXT: .byte 8
				; CHECK-NEXT: .byte 7
				; CHECK-NEXT: .byte 6
				; CHECK-NEXT: .byte 5
				; CHECK-NEXT: .byte 4
				; CHECK-NEXT: .byte 3
				; CHECK-NEXT: .byte 2
				; CHECK-NEXT: .byte 1
				; CHECK-LABEL: ctz_v8i1_poison:
				; CHECK: // %bb.0:
				; CHECK-NEXT: shl v0.8b, v0.8b, #7
				; CHECK-NEXT: adrp x8, .LCPI3_0
				; CHECK-NEXT: mov w9, #8 // =0x8
				; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI3_0]
				; CHECK-NEXT: cmlt v0.8b, v0.8b, #0
				; CHECK-NEXT: and v0.8b, v0.8b, v1.8b
				; CHECK-NEXT: umaxv b0, v0.8b
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: sub w8, w9, w8
				; CHECK-NEXT: and w9, w8, #0xff
				; CHECK-NEXT: cmp w9, #8
				; CHECK-NEXT: csel w0, w8, w8, eq
				; CHECK-NEXT: ret
				%res = call i8 @llvm.experimental.cttz.elts.i8.v8i1(<8 x i1> %a, i32 1)
				ret i8 %res
				}

				; SCALABLE, WITH VSCALE RANGE

				define i64 @ctz_nxv8i1(<vscale x 8 x i1> %a) #0 {
				; CHECK-LABEL: ctz_nxv8i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: index z0.h, #0, #-1
				; CHECK-NEXT: mov z1.h, p0/z, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: cnth x9
				; CHECK-NEXT: inch z0.h
				; CHECK-NEXT: and z0.d, z0.d, z1.d
				; CHECK-NEXT: and z0.h, z0.h, #0xff
				; CHECK-NEXT: umaxv h0, p0, z0.h
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: sub w8, w9, w8
				; CHECK-NEXT: and x0, x8, #0xff
				; CHECK-NEXT: ret
				%res = call i64 @llvm.experimental.cttz.elts.i64.nxv8i1(<vscale x 8 x i1> %a, i32 0)
				ret i64 %res
				}

				define i32 @ctz_nxv32i1(<vscale x 32 x i1> %a) #0 {
				; CHECK-LABEL: ctz_nxv32i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: index z0.h, #0, #-1
				; CHECK-NEXT: cnth x8
				; CHECK-NEXT: punpklo p2.h, p0.b
				; CHECK-NEXT: neg x8, x8
				; CHECK-NEXT: punpklo p3.h, p1.b
				; CHECK-NEXT: rdvl x9, #2
				; CHECK-NEXT: punpkhi p0.h, p0.b
				; CHECK-NEXT: mov z1.h, w8
				; CHECK-NEXT: rdvl x8, #-1
				; CHECK-NEXT: punpkhi p1.h, p1.b
				; CHECK-NEXT: mov z2.h, w8
				; CHECK-NEXT: inch z0.h, all, mul #4
				; CHECK-NEXT: mov z3.h, p2/z, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: ptrue p2.h
				; CHECK-NEXT: mov z5.h, p3/z, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: add z1.h, z0.h, z1.h
				; CHECK-NEXT: add z4.h, z0.h, z2.h
				; CHECK-NEXT: mov z6.h, p0/z, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: mov z7.h, p1/z, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: and z0.d, z0.d, z3.d
				; CHECK-NEXT: add z2.h, z1.h, z2.h
				; CHECK-NEXT: and z3.d, z4.d, z5.d
				; CHECK-NEXT: and z1.d, z1.d, z6.d
				; CHECK-NEXT: and z2.d, z2.d, z7.d
				; CHECK-NEXT: umax z0.h, p2/m, z0.h, z3.h
				; CHECK-NEXT: umax z1.h, p2/m, z1.h, z2.h
				; CHECK-NEXT: umax z0.h, p2/m, z0.h, z1.h
				; CHECK-NEXT: umaxv h0, p2, z0.h
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: sub w8, w9, w8
				; CHECK-NEXT: and w0, w8, #0xffff
				; CHECK-NEXT: ret
				%res = call i32 @llvm.experimental.cttz.elts.i32.nxv32i1(<vscale x 32 x i1> %a, i32 0)
				ret i32 %res
				}

				define i32 @ctz_nxv4i32(<vscale x 4 x i32> %a) #0 {
				; CHECK-LABEL: ctz_nxv4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: index z1.s, #0, #-1
				; CHECK-NEXT: cntw x9
				; CHECK-NEXT: incw z1.s
				; CHECK-NEXT: cmpne p1.s, p0/z, z0.s, #0
				; CHECK-NEXT: mov z0.s, p1/z, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: and z0.d, z1.d, z0.d
				; CHECK-NEXT: and z0.s, z0.s, #0xff
				; CHECK-NEXT: umaxv s0, p0, z0.s
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: sub w8, w9, w8
				; CHECK-NEXT: and w0, w8, #0xff
				; CHECK-NEXT: ret
				%res = call i32 @llvm.experimental.cttz.elts.i32.nxv4i32(<vscale x 4 x i32> %a, i32 0)
				ret i32 %res
				}

				; SCALABLE, NO VSCALE RANGE

				define i32 @ctz_nxv8i1_no_range(<vscale x 8 x i1> %a) {
				; CHECK-LABEL: ctz_nxv8i1_no_range:
				; CHECK: // %bb.0:
				; CHECK-NEXT: index z0.s, #0, #-1
				; CHECK-NEXT: punpklo p1.h, p0.b
				; CHECK-NEXT: cntw x8
				; CHECK-NEXT: punpkhi p0.h, p0.b
				; CHECK-NEXT: neg x8, x8
				; CHECK-NEXT: cnth x9
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: incw z0.s, all, mul #2
				; CHECK-NEXT: mov z2.s, p1/z, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: mov z3.s, p0/z, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: add z1.s, z0.s, z1.s
				; CHECK-NEXT: and z0.d, z0.d, z2.d
				; CHECK-NEXT: and z1.d, z1.d, z3.d
				; CHECK-NEXT: umax z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: umaxv s0, p0, z0.s
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: sub w0, w9, w8
				; CHECK-NEXT: ret
				%res = call i32 @llvm.experimental.cttz.elts.i32.nxv8i1(<vscale x 8 x i1> %a, i32 0)
				ret i32 %res
				}

				; MATCH WITH BRKB + CNTP

				define i32 @ctz_nxv16i1(<vscale x 16 x i1> %pg, <vscale x 16 x i1> %a) {
				; CHECK-LABEL: ctz_nxv16i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: brkb p0.b, p0/z, p1.b
				; CHECK-NEXT: cntp x0, p0, p0.b
				; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
				; CHECK-NEXT: ret
				%res = call i32 @llvm.experimental.cttz.elts.i32.nxv16i1(<vscale x 16 x i1> %a, i32 0)
				ret i32 %res
				}

				define i32 @ctz_and_nxv16i1(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
				; CHECK-LABEL: ctz_and_nxv16i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p1.b
				; CHECK-NEXT: cmpne p0.b, p0/z, z0.b, z1.b
				; CHECK-NEXT: brkb p0.b, p1/z, p0.b
				; CHECK-NEXT: cntp x0, p0, p0.b
				; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
				; CHECK-NEXT: ret
				%cmp = icmp ne <vscale x 16 x i8> %a, %b
				%select = select <vscale x 16 x i1> %pg, <vscale x 16 x i1> %cmp, <vscale x 16 x i1> zeroinitializer
				%and = and <vscale x 16 x i1> %pg, %select
				%res = call i32 @llvm.experimental.cttz.elts.i32.nxv16i1(<vscale x 16 x i1> %and, i32 0)
				ret i32 %res
				}

				declare i8 @llvm.experimental.cttz.elts.i8.v8i1(<8 x i1>, i32)
				declare i32 @llvm.experimental.cttz.elts.i32.v16i1(<16 x i1>, i32)
				declare i16 @llvm.experimental.cttz.elts.i16.v4i32(<4 x i32>, i32)

				declare i32 @llvm.experimental.cttz.elts.i32.nxv8i1(<vscale x 8 x i1>, i32)
				declare i64 @llvm.experimental.cttz.elts.i64.nxv8i1(<vscale x 8 x i1>, i32)
				declare i32 @llvm.experimental.cttz.elts.i32.nxv16i1(<vscale x 16 x i1>, i32)
				declare i32 @llvm.experimental.cttz.elts.i32.nxv32i1(<vscale x 32 x i1>, i32)
				declare i32 @llvm.experimental.cttz.elts.i32.nxv4i32(<vscale x 4 x i32>, i32)

				attributes #0 = { vscale_range(1,16) }
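The fixed-width and vscale-range CHECK lines above all follow the same generic expansion. A descending step vector is built: the `.byte 16, 15, ..., 1` constant pool for `ctz_v16i1`, or `index` followed by `inch`/`incw` for scalable vectors. That vector is ANDed with the sign-extended mask, reduced with an unsigned max (`umaxv`), and subtracted from the element count. The C++ sketch below is a hypothetical scalar model of that arithmetic, shown next to the `brkb` + `cntp` form used in `ctz_nxv16i1`. The function names are illustrative and are not LLVM APIs. The sketch also ignores the narrow (i8/i16) step-vector element type that the lowering selects when a `vscale_range` bounds the element count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of the generic cttz.elts expansion (not LLVM code).
// Lane i of the step vector holds n - i; inactive lanes are zeroed, the
// survivors are reduced with an unsigned max, and the max is subtracted from n.
std::uint64_t cttzEltsGeneric(const std::vector<bool> &mask) {
  const std::uint64_t n = mask.size();
  std::uint64_t maxVal = 0;
  for (std::uint64_t i = 0; i < n; ++i) {
    // AND of the step vector with the all-ones/all-zeros lane mask.
    std::uint64_t lane = mask[i] ? n - i : 0;
    maxVal = std::max(maxVal, lane);  // models umaxv
  }
  return n - maxVal;  // models "sub w8, w9, w8"; yields n for an all-false mask
}

// Hypothetical model of the SVE match: brkb keeps only the lanes strictly
// before the first true lane (all lanes if none is true), and cntp counts them.
std::uint64_t cttzEltsBrkbCntp(const std::vector<bool> &mask) {
  std::uint64_t count = 0;
  for (bool active : mask) {
    if (active)
      break;
    ++count;
  }
  return count;
}
```

For an all-false mask both models return the element count, which corresponds to calling the intrinsic with `i32 0` as the second operand; with `i32 1` (the `ctz_v8i1_poison` case) the result for that input is poison instead.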

This is an archive of the discontinued LLVM Phabricator instance.

Add intrinsic to count trailing zero elements in a vector
Closed, Public

Revision Contents

Diff 555033

llvm/include/llvm/CodeGen/TargetLowering.h

llvm/include/llvm/IR/Intrinsics.td

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.h

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

llvm/test/CodeGen/AArch64/intrinsic-cttz-elts.ll
