This is an archive of the discontinued LLVM Phabricator instance.

I'd prefer to handle legalization in a separate patch from handling legal sdiv/udiv operations, so we actually have some context to discuss the legalization strategy.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7677	If we're going to support these operations, we might as well just add isel patterns; that's what we've been doing for other arithmetic operations.

sdesmalen added inline comments. Apr 21 2020, 12:44 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7677	Just to provide a bit of context for this approach: for unpredicated ISD nodes for which there is no unpredicated instruction, the predicate needs to be generated. For scalable vectors this will be a `ptrue all`, but for fixed-width vectors it may be some other predicate, such as VL8 for a fixed `8` elements. Rather than creating new predicated AArch64 ISD nodes for each operation, such as `AArch64ISD::UDIV_PRED`, the idea is to reuse the intrinsic layer we already added to support the ACLE (which is predicated and for which we already have the patterns) and map directly onto it. By doing the expansion in ISelLowering, the patterns stay simple and we can generalise the `getPTrue` method so that it generates the right predicate for any scalable or fixed vector size, as done in D71760, avoiding the need to write multiple patterns for different vector lengths. This patch was meant as a proof of concept of that idea (as discussed in the sync-up call of Apr 2).
llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll
29	This test should use CHECK-DAG instead of CHECK-NEXT, as the sdiv instructions are independent. (same for some of the other tests)
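
The predicate choice described in the comment on AArch64ISelLowering.cpp above (an all-true predicate for scalable vectors, a fixed-length pattern such as VL8 for fixed-width vectors) might look roughly like the following self-contained C++ sketch. The names `PredPattern`, `VecType` and `choosePredPattern` are hypothetical stand-ins; this is neither LLVM's API nor the generalised `getPTrue` from D71760, and only a handful of pattern values are modelled.

```cpp
#include <cassert>

// Hypothetical subset of SVE predicate patterns (toy model, not LLVM's enum).
enum class PredPattern { VL1, VL2, VL4, VL8, VL16, All };

// Hypothetical vector type: an element count plus a "scalable" flag.
struct VecType {
  unsigned NumElts; // minimum element count when Scalable is true
  bool Scalable;
};

// Choose the governing predicate for an operation that carries none.
// Returns false when this toy model has no matching pattern.
bool choosePredPattern(VecType VT, PredPattern &Out) {
  assert(VT.NumElts != 0 && "vector must have at least one element");
  if (VT.Scalable) {
    Out = PredPattern::All; // every lane active, i.e. `ptrue all`
    return true;
  }
  switch (VT.NumElts) {
  case 1:  Out = PredPattern::VL1;  return true;
  case 2:  Out = PredPattern::VL2;  return true;
  case 4:  Out = PredPattern::VL4;  return true;
  case 8:  Out = PredPattern::VL8;  return true;
  case 16: Out = PredPattern::VL16; return true;
  default: return false; // not modelled in this sketch
  }
}
```

Keeping this decision in one helper is what would let the selection patterns stay independent of the vector length, which is the motivation given in the comment.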

efriedma added inline comments. Apr 21 2020, 1:37 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7677	Using INTRINSIC_WO_CHAIN is a little annoying; it's hard to read in DAG dumps, and it gives weird error messages if we fail in selection. But there aren't really any other immediate downsides I can think of, vs. doing it the other way (converting the intrinsic to AArch64ISD::UDIV_PRED). Long-term, we're going to have a target-independent ISD::UDIV_PRED. We probably want to start using those nodes at some point, to get target-independent optimizations. Not sure if that impacts what we want to do right now.

sdesmalen added inline comments. Apr 23 2020, 3:32 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7677	I agree that using INTRINSIC_WO_CHAIN will be a bit annoying for more complicated patterns. The reuse of the intrinsics was merely for convenience because we already have the patterns, and was not a critical part of the design. It shouldn't be a big effort to create AArch64ISD nodes and use these for the intrinsics as well. If we use AArch64-specific nodes we can implement what's needed now for SVE codegen and I expect we can easily migrate to target-independent nodes when they get added.

Removed the changes that handle legalisation from this patch (these will be included in a follow-up patch)
Added AArch64ISD nodes for SDIV_PRED & UDIV_PRED
Changed LowerDIV to use the new ISD nodes rather than lowering to SVE intrinsics
Updated tests to use CHECK-DAG
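
The design these changes arrive at, where both the generic division opcode and the ACLE intrinsic end up as the same predicated node, can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Opcode`, `Node`, `lowerDiv` and `lowerIntrinsic` are hypothetical names, operands are plain strings, and nothing here uses LLVM's SelectionDAG API. The operand layout of the intrinsic (ID first, then predicate and the two inputs) follows the `getOperand(1..3)` calls in the diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy opcodes; the names echo the patch but this is not LLVM's ISD namespace.
enum class Opcode { SDiv, UDiv, IntrinsicSDiv, IntrinsicUDiv, SDivPred, UDivPred };

// Toy DAG node: an opcode and its operands, represented as names.
struct Node {
  Opcode Op;
  std::vector<std::string> Operands;
};

// Generic division carries no predicate, so an all-true one is synthesised.
Node lowerDiv(const Node &N) {
  assert((N.Op == Opcode::SDiv || N.Op == Opcode::UDiv) &&
         N.Operands.size() == 2 && "expected a two-operand division");
  Opcode NewOp = N.Op == Opcode::SDiv ? Opcode::SDivPred : Opcode::UDivPred;
  return {NewOp, {"ptrue(all)", N.Operands[0], N.Operands[1]}};
}

// The intrinsic already has a predicate: operand 0 is the intrinsic ID,
// operands 1..3 are the predicate and the two inputs, forwarded unchanged.
Node lowerIntrinsic(const Node &N) {
  assert((N.Op == Opcode::IntrinsicSDiv || N.Op == Opcode::IntrinsicUDiv) &&
         N.Operands.size() == 4 && "expected ID, predicate and two inputs");
  Opcode NewOp =
      N.Op == Opcode::IntrinsicSDiv ? Opcode::SDivPred : Opcode::UDivPred;
  return {NewOp, {N.Operands[1], N.Operands[2], N.Operands[3]}};
}
```

With both entry points producing one node kind, a single selection pattern per instruction is enough, which matches the `.td` change in this revision where `SDIV_ZPmZ` and `UDIV_ZPmZ` switch from the intrinsics to the new nodes.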

LGTM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11388	Whitespace.

This revision is now accepted and ready to land. Apr 23 2020, 12:35 PM

Closed by commit rG53dd72a87aeb: [SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics (authored by kmclaughlin). Apr 24 2020, 3:45 AM

This revision was automatically updated to reflect the committed changes.

kmclaughlin marked an inline comment as done.

Thank you both for your comments on this patch, @efriedma & @sdesmalen!

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.h	6 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	34 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	12 lines
llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll	45 lines

Diff 259852

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[...] enum NodeType : unsigned {
CSINC, // Conditional select increment.		CSINC, // Conditional select increment.

// Pointer to the thread's local storage area. Materialised from TPIDR_EL0 on		// Pointer to the thread's local storage area. Materialised from TPIDR_EL0 on
// ELF.		// ELF.
THREAD_POINTER,		THREAD_POINTER,
ADC,		ADC,
SBC, // adc, sbc instructions		SBC, // adc, sbc instructions

		// Arithmetic instructions
		SDIV_PRED,
		UDIV_PRED,

// Arithmetic instructions which write flags.		// Arithmetic instructions which write flags.
ADDS,		ADDS,
SUBS,		SUBS,
ADCS,		ADCS,
SBCS,		SBCS,
ANDS,		ANDS,

// Conditional compares. Operands: left,right,falsecc,cc,flags		// Conditional compares. Operands: left,right,falsecc,cc,flags
[...] private:
SDValue LowerFLT_ROUNDS_(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFLT_ROUNDS_(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDUPQLane(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDUPQLane(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerDIV(SDValue Op, SelectionDAG &DAG,
		unsigned NewOp) const;
SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerShiftRightParts(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerShiftRightParts(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVSETCC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVSETCC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCTPOP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCTPOP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerF128Call(SDValue Op, SelectionDAG &DAG,		SDValue LowerF128Call(SDValue Op, SelectionDAG &DAG,
RTLIB::Libcall Call) const;		RTLIB::Libcall Call) const;

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


[...] if (Subtarget->hasNEON()) {
setTruncStoreAction(MVT::v4i16, MVT::v4i8, Custom);		setTruncStoreAction(MVT::v4i16, MVT::v4i8, Custom);
}		}

if (Subtarget->hasSVE()) {		if (Subtarget->hasSVE()) {
// FIXME: Add custom lowering of MLOAD to handle different passthrus (not a		// FIXME: Add custom lowering of MLOAD to handle different passthrus (not a
// splat of 0 or undef) once vector selects supported in SVE codegen. See		// splat of 0 or undef) once vector selects supported in SVE codegen. See
// D68877 for more details.		// D68877 for more details.
for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {		for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {
if (isTypeLegal(VT))		if (isTypeLegal(VT)) {
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
		setOperationAction(ISD::SDIV, VT, Custom);
		setOperationAction(ISD::UDIV, VT, Custom);
		}
}		}
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);

for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {		for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {
if (isTypeLegal(VT)) {		if (isTypeLegal(VT)) {
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
}		}
[...] const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
case AArch64ISD::BRCOND: return "AArch64ISD::BRCOND";		case AArch64ISD::BRCOND: return "AArch64ISD::BRCOND";
case AArch64ISD::CSEL: return "AArch64ISD::CSEL";		case AArch64ISD::CSEL: return "AArch64ISD::CSEL";
case AArch64ISD::FCSEL: return "AArch64ISD::FCSEL";		case AArch64ISD::FCSEL: return "AArch64ISD::FCSEL";
case AArch64ISD::CSINV: return "AArch64ISD::CSINV";		case AArch64ISD::CSINV: return "AArch64ISD::CSINV";
case AArch64ISD::CSNEG: return "AArch64ISD::CSNEG";		case AArch64ISD::CSNEG: return "AArch64ISD::CSNEG";
case AArch64ISD::CSINC: return "AArch64ISD::CSINC";		case AArch64ISD::CSINC: return "AArch64ISD::CSINC";
case AArch64ISD::THREAD_POINTER: return "AArch64ISD::THREAD_POINTER";		case AArch64ISD::THREAD_POINTER: return "AArch64ISD::THREAD_POINTER";
case AArch64ISD::TLSDESC_CALLSEQ: return "AArch64ISD::TLSDESC_CALLSEQ";		case AArch64ISD::TLSDESC_CALLSEQ: return "AArch64ISD::TLSDESC_CALLSEQ";
		case AArch64ISD::SDIV_PRED: return "AArch64ISD::SDIV_PRED";
		case AArch64ISD::UDIV_PRED: return "AArch64ISD::UDIV_PRED";
case AArch64ISD::ADC: return "AArch64ISD::ADC";		case AArch64ISD::ADC: return "AArch64ISD::ADC";
case AArch64ISD::SBC: return "AArch64ISD::SBC";		case AArch64ISD::SBC: return "AArch64ISD::SBC";
case AArch64ISD::ADDS: return "AArch64ISD::ADDS";		case AArch64ISD::ADDS: return "AArch64ISD::ADDS";
case AArch64ISD::SUBS: return "AArch64ISD::SUBS";		case AArch64ISD::SUBS: return "AArch64ISD::SUBS";
case AArch64ISD::ADCS: return "AArch64ISD::ADCS";		case AArch64ISD::ADCS: return "AArch64ISD::ADCS";
case AArch64ISD::SBCS: return "AArch64ISD::SBCS";		case AArch64ISD::SBCS: return "AArch64ISD::SBCS";
case AArch64ISD::ANDS: return "AArch64ISD::ANDS";		case AArch64ISD::ANDS: return "AArch64ISD::ANDS";
case AArch64ISD::CCMP: return "AArch64ISD::CCMP";		case AArch64ISD::CCMP: return "AArch64ISD::CCMP";
[...] SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
case ISD::BUILD_VECTOR:		case ISD::BUILD_VECTOR:
return LowerBUILD_VECTOR(Op, DAG);		return LowerBUILD_VECTOR(Op, DAG);
case ISD::VECTOR_SHUFFLE:		case ISD::VECTOR_SHUFFLE:
return LowerVECTOR_SHUFFLE(Op, DAG);		return LowerVECTOR_SHUFFLE(Op, DAG);
case ISD::SPLAT_VECTOR:		case ISD::SPLAT_VECTOR:
return LowerSPLAT_VECTOR(Op, DAG);		return LowerSPLAT_VECTOR(Op, DAG);
case ISD::EXTRACT_SUBVECTOR:		case ISD::EXTRACT_SUBVECTOR:
return LowerEXTRACT_SUBVECTOR(Op, DAG);		return LowerEXTRACT_SUBVECTOR(Op, DAG);
		case ISD::SDIV:
		return LowerDIV(Op, DAG, AArch64ISD::SDIV_PRED);
		case ISD::UDIV:
		return LowerDIV(Op, DAG, AArch64ISD::UDIV_PRED);
case ISD::SRA:		case ISD::SRA:
case ISD::SRL:		case ISD::SRL:
case ISD::SHL:		case ISD::SHL:
return LowerVectorSRA_SRL_SHL(Op, DAG);		return LowerVectorSRA_SRL_SHL(Op, DAG);
case ISD::SHL_PARTS:		case ISD::SHL_PARTS:
return LowerShiftLeftParts(Op, DAG);		return LowerShiftLeftParts(Op, DAG);
case ISD::SRL_PARTS:		case ISD::SRL_PARTS:
case ISD::SRA_PARTS:		case ISD::SRA_PARTS:
[...] SDValue AArch64TargetLowering::LowerDUPQLane(SDValue Op,
SDValue SplatIdx64 = DAG.getNode(ISD::SPLAT_VECTOR, DL, MVT::nxv2i64, Idx64);		SDValue SplatIdx64 = DAG.getNode(ISD::SPLAT_VECTOR, DL, MVT::nxv2i64, Idx64);
SDValue ShuffleMask = DAG.getNode(ISD::ADD, DL, MVT::nxv2i64, SV, SplatIdx64);		SDValue ShuffleMask = DAG.getNode(ISD::ADD, DL, MVT::nxv2i64, SV, SplatIdx64);

// create the vector Val[idx64],Val[idx64+1],Val[idx64],Val[idx64+1],...		// create the vector Val[idx64],Val[idx64+1],Val[idx64],Val[idx64+1],...
SDValue TBL = DAG.getNode(AArch64ISD::TBL, DL, MVT::nxv2i64, V, ShuffleMask);		SDValue TBL = DAG.getNode(AArch64ISD::TBL, DL, MVT::nxv2i64, V, ShuffleMask);
return DAG.getNode(ISD::BITCAST, DL, VT, TBL);		return DAG.getNode(ISD::BITCAST, DL, VT, TBL);
}		}

		SDValue AArch64TargetLowering::LowerDIV(SDValue Op,
		SelectionDAG &DAG,
		unsigned NewOp) const {
		EVT VT = Op.getValueType();
		SDLoc DL(Op);

		assert(Op.getOperand(0).getValueType().isScalableVector() &&
		Op.getOperand(1).getValueType().isScalableVector() &&
		"Only scalable vectors are supported");

		auto PredTy = VT.getVectorVT(*DAG.getContext(), MVT::i1,
		VT.getVectorNumElements(), true);
		SDValue Mask = getPTrue(DAG, DL, PredTy, AArch64SVEPredPattern::all);

		return DAG.getNode(NewOp, DL, VT, Mask, Op.getOperand(0), Op.getOperand(1));
		}

static bool resolveBuildVector(BuildVectorSDNode *BVN, APInt &CnstBits,		static bool resolveBuildVector(BuildVectorSDNode *BVN, APInt &CnstBits,
APInt &UndefBits) {		APInt &UndefBits) {
EVT VT = BVN->getValueType(0);		EVT VT = BVN->getValueType(0);
APInt SplatBits, SplatUndef;		APInt SplatBits, SplatUndef;
unsigned SplatBitSize;		unsigned SplatBitSize;
bool HasAnyUndefs;		bool HasAnyUndefs;
if (BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs)) {		if (BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs)) {
unsigned NumSplats = VT.getSizeInBits() / SplatBitSize;		unsigned NumSplats = VT.getSizeInBits() / SplatBitSize;

[...] case Intrinsic::aarch64_sve_index:
return LowerSVEIntrinsicIndex(N, DAG);		return LowerSVEIntrinsicIndex(N, DAG);
case Intrinsic::aarch64_sve_dup:		case Intrinsic::aarch64_sve_dup:
return LowerSVEIntrinsicDUP(N, DAG);		return LowerSVEIntrinsicDUP(N, DAG);
case Intrinsic::aarch64_sve_dup_x:		case Intrinsic::aarch64_sve_dup_x:
return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), N->getValueType(0),		return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), N->getValueType(0),
N->getOperand(1));		N->getOperand(1));
case Intrinsic::aarch64_sve_ext:		case Intrinsic::aarch64_sve_ext:
return LowerSVEIntrinsicEXT(N, DAG);		return LowerSVEIntrinsicEXT(N, DAG);
		case Intrinsic::aarch64_sve_sdiv:
		return DAG.getNode(AArch64ISD::SDIV_PRED, SDLoc(N), N->getValueType(0),
		N->getOperand(1), N->getOperand(2), N->getOperand(3));
		case Intrinsic::aarch64_sve_udiv:
		return DAG.getNode(AArch64ISD::UDIV_PRED, SDLoc(N), N->getValueType(0),
		N->getOperand(1), N->getOperand(2), N->getOperand(3));
case Intrinsic::aarch64_sve_sel:		case Intrinsic::aarch64_sve_sel:
return DAG.getNode(ISD::VSELECT, SDLoc(N), N->getValueType(0),		return DAG.getNode(ISD::VSELECT, SDLoc(N), N->getValueType(0),
N->getOperand(1), N->getOperand(2), N->getOperand(3));		N->getOperand(1), N->getOperand(2), N->getOperand(3));
case Intrinsic::aarch64_sve_cmpeq_wide:		case Intrinsic::aarch64_sve_cmpeq_wide:
return tryConvertSVEWideCompare(N, Intrinsic::aarch64_sve_cmpeq,		return tryConvertSVEWideCompare(N, Intrinsic::aarch64_sve_cmpeq,
false, DCI, DAG);		false, DCI, DAG);
case Intrinsic::aarch64_sve_cmpne_wide:		case Intrinsic::aarch64_sve_cmpne_wide:
return tryConvertSVEWideCompare(N, Intrinsic::aarch64_sve_cmpne,		return tryConvertSVEWideCompare(N, Intrinsic::aarch64_sve_cmpne,

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[...]
def AArch64sminv_pred : SDNode<"AArch64ISD::SMINV_PRED", SDT_AArch64Reduce>;		def AArch64sminv_pred : SDNode<"AArch64ISD::SMINV_PRED", SDT_AArch64Reduce>;
def AArch64uminv_pred : SDNode<"AArch64ISD::UMINV_PRED", SDT_AArch64Reduce>;		def AArch64uminv_pred : SDNode<"AArch64ISD::UMINV_PRED", SDT_AArch64Reduce>;
def AArch64orv_pred : SDNode<"AArch64ISD::ORV_PRED", SDT_AArch64Reduce>;		def AArch64orv_pred : SDNode<"AArch64ISD::ORV_PRED", SDT_AArch64Reduce>;
def AArch64eorv_pred : SDNode<"AArch64ISD::EORV_PRED", SDT_AArch64Reduce>;		def AArch64eorv_pred : SDNode<"AArch64ISD::EORV_PRED", SDT_AArch64Reduce>;
def AArch64andv_pred : SDNode<"AArch64ISD::ANDV_PRED", SDT_AArch64Reduce>;		def AArch64andv_pred : SDNode<"AArch64ISD::ANDV_PRED", SDT_AArch64Reduce>;
def AArch64lasta : SDNode<"AArch64ISD::LASTA", SDT_AArch64Reduce>;		def AArch64lasta : SDNode<"AArch64ISD::LASTA", SDT_AArch64Reduce>;
def AArch64lastb : SDNode<"AArch64ISD::LASTB", SDT_AArch64Reduce>;		def AArch64lastb : SDNode<"AArch64ISD::LASTB", SDT_AArch64Reduce>;

		def SDT_AArch64DIV : SDTypeProfile<1, 3, [
		SDTCisVec<0>, SDTCisVec<1>, SDTCisVec<2>, SDTCisVec<3>,
		SDTCVecEltisVT<1,i1>, SDTCisSameAs<2,3>
		]>;

		def AArch64sdiv_pred : SDNode<"AArch64ISD::SDIV_PRED", SDT_AArch64DIV>;
		def AArch64udiv_pred : SDNode<"AArch64ISD::UDIV_PRED", SDT_AArch64DIV>;

def SDT_AArch64ReduceWithInit : SDTypeProfile<1, 3, [SDTCisVec<1>, SDTCisVec<3>]>;		def SDT_AArch64ReduceWithInit : SDTypeProfile<1, 3, [SDTCisVec<1>, SDTCisVec<3>]>;
def AArch64clasta_n : SDNode<"AArch64ISD::CLASTA_N", SDT_AArch64ReduceWithInit>;		def AArch64clasta_n : SDNode<"AArch64ISD::CLASTA_N", SDT_AArch64ReduceWithInit>;
def AArch64clastb_n : SDNode<"AArch64ISD::CLASTB_N", SDT_AArch64ReduceWithInit>;		def AArch64clastb_n : SDNode<"AArch64ISD::CLASTB_N", SDT_AArch64ReduceWithInit>;

def SDT_AArch64Rev : SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisSameAs<0,1>]>;		def SDT_AArch64Rev : SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisSameAs<0,1>]>;
def AArch64rev : SDNode<"AArch64ISD::REV", SDT_AArch64Rev>;		def AArch64rev : SDNode<"AArch64ISD::REV", SDT_AArch64Rev>;

def SDT_AArch64PTest : SDTypeProfile<0, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>]>;		def SDT_AArch64PTest : SDTypeProfile<0, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>]>;
[...] def : Pat<(mul nxv16i8:$Op1, nxv16i8:$Op2),
(MUL_ZPmZ_B (PTRUE_B 31), $Op1, $Op2)>;		(MUL_ZPmZ_B (PTRUE_B 31), $Op1, $Op2)>;
def : Pat<(mul nxv8i16:$Op1, nxv8i16:$Op2),		def : Pat<(mul nxv8i16:$Op1, nxv8i16:$Op2),
(MUL_ZPmZ_H (PTRUE_H 31), $Op1, $Op2)>;		(MUL_ZPmZ_H (PTRUE_H 31), $Op1, $Op2)>;
def : Pat<(mul nxv4i32:$Op1, nxv4i32:$Op2),		def : Pat<(mul nxv4i32:$Op1, nxv4i32:$Op2),
(MUL_ZPmZ_S (PTRUE_S 31), $Op1, $Op2)>;		(MUL_ZPmZ_S (PTRUE_S 31), $Op1, $Op2)>;
def : Pat<(mul nxv2i64:$Op1, nxv2i64:$Op2),		def : Pat<(mul nxv2i64:$Op1, nxv2i64:$Op2),
(MUL_ZPmZ_D (PTRUE_D 31), $Op1, $Op2)>;		(MUL_ZPmZ_D (PTRUE_D 31), $Op1, $Op2)>;

defm SDIV_ZPmZ : sve_int_bin_pred_arit_2_div<0b100, "sdiv", int_aarch64_sve_sdiv>;		defm SDIV_ZPmZ : sve_int_bin_pred_arit_2_div<0b100, "sdiv", AArch64sdiv_pred>;
defm UDIV_ZPmZ : sve_int_bin_pred_arit_2_div<0b101, "udiv", int_aarch64_sve_udiv>;		defm UDIV_ZPmZ : sve_int_bin_pred_arit_2_div<0b101, "udiv", AArch64udiv_pred>;
defm SDIVR_ZPmZ : sve_int_bin_pred_arit_2_div<0b110, "sdivr", int_aarch64_sve_sdivr>;		defm SDIVR_ZPmZ : sve_int_bin_pred_arit_2_div<0b110, "sdivr", int_aarch64_sve_sdivr>;
defm UDIVR_ZPmZ : sve_int_bin_pred_arit_2_div<0b111, "udivr", int_aarch64_sve_udivr>;		defm UDIVR_ZPmZ : sve_int_bin_pred_arit_2_div<0b111, "udivr", int_aarch64_sve_udivr>;

defm SDOT_ZZZ : sve_intx_dot<0b0, "sdot", int_aarch64_sve_sdot>;		defm SDOT_ZZZ : sve_intx_dot<0b0, "sdot", int_aarch64_sve_sdot>;
defm UDOT_ZZZ : sve_intx_dot<0b1, "udot", int_aarch64_sve_udot>;		defm UDOT_ZZZ : sve_intx_dot<0b1, "udot", int_aarch64_sve_udot>;

defm SDOT_ZZZI : sve_intx_dot_by_indexed_elem<0b0, "sdot", int_aarch64_sve_sdot_lane>;		defm SDOT_ZZZI : sve_intx_dot_by_indexed_elem<0b0, "sdot", int_aarch64_sve_sdot_lane>;
defm UDOT_ZZZI : sve_intx_dot_by_indexed_elem<0b1, "udot", int_aarch64_sve_udot_lane>;		defm UDOT_ZZZI : sve_intx_dot_by_indexed_elem<0b1, "udot", int_aarch64_sve_udot_lane>;

llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

				;
				; SDIV
				;

				define <vscale x 4 x i32> @sdiv_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: @sdiv_i32
				; CHECK-DAG: ptrue p0.s
				; CHECK-DAG: sdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				%div = sdiv <vscale x 4 x i32> %a, %b
				ret <vscale x 4 x i32> %div
				}

				define <vscale x 2 x i64> @sdiv_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: @sdiv_i64
				; CHECK-DAG: ptrue p0.d
				; CHECK-DAG: sdiv z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				%div = sdiv <vscale x 2 x i64> %a, %b
				ret <vscale x 2 x i64> %div
				}

				;
				; UDIV
				;

				define <vscale x 4 x i32> @udiv_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: @udiv_i32
				; CHECK-DAG: ptrue p0.s
				; CHECK-DAG: udiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				%div = udiv <vscale x 4 x i32> %a, %b
				ret <vscale x 4 x i32> %div
				}

				define <vscale x 2 x i64> @udiv_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: @udiv_i64
				; CHECK-DAG: ptrue p0.d
				; CHECK-DAG: udiv z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				%div = udiv <vscale x 2 x i64> %a, %b
				ret <vscale x 2 x i64> %div
				}

[SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics (Closed, Public)
