This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AArch64/SVEInstrFormats.td
5570	This is depending on hasSideEffects to preserve the correct ordering with instructions that read/write FFR? That probably works. I guess the alternative is to insert an IMPLICIT_DEF of FFR in the entry block of each function. What are the calling convention rules for FFR? Is it callee-save? If not, we might need to do some work to make FFR reads/writes do something sane across calls inserted by the compiler.

kmclaughlin added inline comments. Dec 20 2019, 9:16 AM

llvm/lib/Target/AArch64/SVEInstrFormats.td
5570	The FFR is not callee-saved. We will need to add support to save & restore it where appropriate at the point the compiler starts generating reads to the FFR, but for the purpose of the ACLE the user will be required to do this if necessary.

efriedma added inline comments. Dec 20 2019, 2:00 PM

llvm/lib/Target/AArch64/SVEInstrFormats.td
5570	How can the user write correct code to save/restore the FFR? The compiler can move arbitrary readnone/argmemonly calls between the definition and the use.

andwar added inline comments. Jan 2 2020, 6:27 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10230–10232	Could you replace `GLD1*` with `Load`? I believe that will still be correct, with the added bonus of covering the new case :)
11306	You could use `getSVEContainerType` here instead. You'll need to extend it a wee bit.
12561	The following `switch` statement will now cover more than just Gather nodes. Maybe `SVE load nodes` instead?
12605–12608	Why not: `SmallVector<SDValue, 4> Ops = {Src->getOperand(0), Src->getOperand(1), Src->getOperand(2), Src->getOperand(3), Src->getOperand(4)};`?
12609	Could you add a comment explaining what the underlying difference between `LDNF1S` and `GLD1S` is? Otherwise it's not clear why this `if` statement is needed. IIUC, `GLD1S` has an extra argument for the offsets (hence 5 args vs 4).

sdesmalen added inline comments. Jan 8 2020, 9:39 AM

llvm/lib/Target/AArch64/SVEInstrFormats.td
5570	There are separate intrinsics for loading/writing the FFR (svrdffr, svsetffr, svwrffr), which use a `svbool_t` to keep the value of the FFR. These intrinsics are implemented in the same way with a Pseudo with `hasSideEffects = 1` set. I thought this flag would prevent other calls from being scheduled/moved over these intrinsics, as they have unknown/unmodelled side-effects and would thus act kind of like a barrier?

efriedma added inline comments. Jan 8 2020, 12:39 PM

llvm/lib/Target/AArch64/SVEInstrFormats.td
5570	The issue would be transforms at the IR/SelectionDAG level. We can probably model calls at the MIR level correctly, like you're describing.
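
To make the ordering concern in this thread concrete, here is a minimal toy model of the FFR as hidden state that a non-faulting load updates. It is a sketch under stated assumptions, not the ACLE API or the patch's implementation: `ToySVE` and its member names are hypothetical, "inaccessible memory" is simulated by running off the end of a `std::vector`, and lanes at or after the first would-be fault are simply left as zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: a fixed number of lanes and an FFR held as hidden state.
// All names here are hypothetical and do not match the real ACLE signatures.
struct ToySVE {
  std::vector<bool> FFR;

  explicit ToySVE(std::size_t Lanes) : FFR(Lanes, false) {}

  // Stand-in for svsetffr: mark every lane as "no fault seen yet".
  void setffr() { FFR.assign(FFR.size(), true); }

  // Stand-in for svrdffr: copy the FFR out into an ordinary predicate value.
  std::vector<bool> rdffr() const { return FFR; }

  // Stand-in for svwrffr: overwrite the FFR from a predicate value.
  void wrffr(const std::vector<bool> &P) {
    assert(P.size() == FFR.size() && "predicate must have one bit per lane");
    FFR = P;
  }

  // Stand-in for a non-faulting load of consecutive ints from Mem[Base].
  // Inactive lanes are zeroed. The first active lane that would touch
  // memory past the end of Mem clears the FFR from that lane onwards,
  // and the load stops without raising any error.
  std::vector<int> ldnf1(const std::vector<bool> &Pg,
                         const std::vector<int> &Mem, std::size_t Base) {
    assert(Pg.size() == FFR.size() && "predicate must have one bit per lane");
    std::vector<int> Result(FFR.size(), 0);
    for (std::size_t I = 0; I < FFR.size(); ++I) {
      if (!Pg[I])
        continue; // Inactive lane: keep the zero.
      if (Base + I >= Mem.size()) {
        for (std::size_t J = I; J < FFR.size(); ++J)
          FFR[J] = false; // Record the would-be fault in the hidden state.
        break;
      }
      Result[I] = Mem[Base + I];
    }
    return Result;
  }
};
```

In this model, a caller who wants to preserve the FFR across some region has to bracket it with `rdffr()` and `wrffr()`. Nothing in the load's signature mentions the FFR, so anything that reorders a call relative to those two operations silently changes the result, which is the hazard being discussed for transforms at the IR/SelectionDAG level.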

Rebased patch
Updated comments and extended getSVEContainerType to handle nxv8i16 & nxv16i8

Thanks for your suggestions, @andwar!
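
The pattern behind the extended `getSVEContainerType` can be sketched as follows: every content type with N lanes maps to the integer type whose N lanes exactly fill one SVE block. This is a standalone illustration, not the LLVM code. The helper names are hypothetical, and the block size is assumed to be 128 bits (the value of `AArch64::SVEBitsPerBlock`).

```cpp
#include <cassert>
#include <string>

// Assumed value of AArch64::SVEBitsPerBlock; hypothetical stand-in constant.
constexpr unsigned kSVEBitsPerBlock = 128;

// Width in bits of each lane of the container for a scalable vector with
// MinNumElts lanes: lanes are widened until they fill one block.
unsigned containerElementBits(unsigned MinNumElts) {
  assert(MinNumElts != 0 && kSVEBitsPerBlock % MinNumElts == 0 &&
         "lane count must divide the block size");
  return kSVEBitsPerBlock / MinNumElts;
}

// Spelling of the integer container type, e.g. 2 lanes -> "nxv2i64".
std::string containerTypeName(unsigned MinNumElts) {
  return "nxv" + std::to_string(MinNumElts) + "i" +
         std::to_string(containerElementBits(MinNumElts));
}
```

For the lane counts handled by the switch in the diff (2, 4, 8 and 16), this yields `nxv2i64`, `nxv4i32`, `nxv8i16` and `nxv16i8`, matching the return values of each case group.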

kmclaughlin added a child revision: D73025: [AArch64][SVE] Add first-faulting load intrinsic. Jan 20 2020, 3:23 AM

sdesmalen added inline comments. Jan 20 2020, 5:07 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12598	Move the assignment of `MemVTOpNum` to the switch statement above instead of special-casing it here?
12599	nit: `s/LD1SrcMemVT/SrcMemVT/`
12604	Better make the default '5' if there is a large likelihood of there being 5 default values.
12604	Instead of special-casing LDNF1S below, you can write this as: `SmallVector<SDValue, 5> Ops; for (unsigned I = 0; I < Src->getNumOperands(); ++I) Ops.push_back(Src->getOperand(I));`
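
The reason the generic loop removes the special case can be shown with a small standalone model. It assumes, as the diff and the earlier comment suggest, that an `LDNF1` node carries four operands with the memory VT at index 3, while the `GLD1` family carries five with the memory VT at index 4. `ToyNode`, the operand labels and the helper names are hypothetical and stand in for `SDNode` and `SDValue`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class LoadKind { LDNF1, GLD1 };

// Hypothetical stand-in for an SDNode: operands are just labels here.
struct ToyNode {
  LoadKind Kind;
  std::vector<std::string> Operands;
};

// Build a node with the operand layout assumed above. The "Offset" label
// for the extra gather operand is an illustrative assumption.
ToyNode makeLoad(LoadKind Kind) {
  if (Kind == LoadKind::LDNF1)
    return {Kind, {"Chain", "Pg", "Base", "MemVT"}};
  return {Kind, {"Chain", "Pg", "Base", "Offset", "MemVT"}};
}

// Index of the memory-VT operand; only this lookup depends on the node kind.
unsigned memVTOperandIndex(LoadKind Kind) {
  return Kind == LoadKind::LDNF1 ? 3 : 4;
}

// Copy every operand regardless of how many the node has, mirroring the
// loop over Src->getNumOperands() suggested in the review.
std::vector<std::string> copyOperands(const ToyNode &Src) {
  std::vector<std::string> Ops;
  Ops.reserve(Src.Operands.size());
  for (std::size_t I = 0; I < Src.Operands.size(); ++I)
    Ops.push_back(Src.Operands[I]);
  return Ops;
}
```

With this shape the copy never needs to know the opcode, and the opcode-dependent part shrinks to the single index that the switch statement already sets.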

Some minor changes to performSignExtendInRegCombine to address comments from @sdesmalen

LGTM [with the caveat that we need to revisit the modelling of the FFR register and get rid of the PseudoInstExpansion at a later point, as discussed during the previous sync-up call]

This revision is now accepted and ready to land. Jan 21 2020, 1:00 AM

Closed by commit rGcdcc4f2a44b5: [AArch64][SVE] Add intrinsic for non-faulting loads (authored by kmclaughlin). Jan 22 2020, 3:40 AM

This revision was automatically updated to reflect the committed changes.

kmclaughlin mentioned this in D73097: [AArch64][SVE] Add intrinsics for FFR manipulation. Jan 24 2020, 2:57 AM

efriedma mentioned this in D102617: [llvm][AArch64][SVE] Model FFR-using intrinsics with inaccessiblemem. May 17 2021, 10:42 AM

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicsAArch64.td	8 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	3 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	109 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	7 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	35 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td	15 lines
llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll	182 lines

Diff 239531

llvm/include/llvm/IR/IntrinsicsAArch64.td

[...]
let TargetPrefix = "aarch64" in { // All intrinsics start with "llvm.aarch64.".		let TargetPrefix = "aarch64" in { // All intrinsics start with "llvm.aarch64.".

class AdvSIMD_1Vec_PredLoad_Intrinsic		class AdvSIMD_1Vec_PredLoad_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,		[LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
LLVMPointerTo<0>],		LLVMPointerTo<0>],
[IntrReadMem, IntrArgMemOnly]>;		[IntrReadMem, IntrArgMemOnly]>;

		class AdvSIMD_1Vec_PredFaultingLoad_Intrinsic
		: Intrinsic<[llvm_anyvector_ty],
		[LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
		LLVMPointerToElt<0>],
		[IntrReadMem, IntrArgMemOnly]>;

class AdvSIMD_1Vec_PredStore_Intrinsic		class AdvSIMD_1Vec_PredStore_Intrinsic
: Intrinsic<[],		: Intrinsic<[],
[llvm_anyvector_ty,		[llvm_anyvector_ty,
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,		LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
LLVMPointerTo<0>],		LLVMPointerTo<0>],
[IntrArgMemOnly, NoCapture<2>]>;		[IntrArgMemOnly, NoCapture<2>]>;

class AdvSIMD_Merged1VectorArg_Intrinsic		class AdvSIMD_Merged1VectorArg_Intrinsic
[...]	: Intrinsic<[],
[IntrWriteMem, IntrArgMemOnly]>;		[IntrWriteMem, IntrArgMemOnly]>;

//		//
// Loads		// Loads
//		//

def int_aarch64_sve_ldnt1 : AdvSIMD_1Vec_PredLoad_Intrinsic;		def int_aarch64_sve_ldnt1 : AdvSIMD_1Vec_PredLoad_Intrinsic;

		def int_aarch64_sve_ldnf1 : AdvSIMD_1Vec_PredFaultingLoad_Intrinsic;

//		//
// Stores		// Stores
//		//

def int_aarch64_sve_stnt1 : AdvSIMD_1Vec_PredStore_Intrinsic;		def int_aarch64_sve_stnt1 : AdvSIMD_1Vec_PredStore_Intrinsic;

//		//
// Integer arithmetic		// Integer arithmetic
[...]

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[...]	enum NodeType : unsigned {
LASTB,		LASTB,
REV,		REV,
TBL,		TBL,

INSR,		INSR,
PTEST,		PTEST,
PTRUE,		PTRUE,

		LDNF1,
		LDNF1S,

// Unsigned gather loads.		// Unsigned gather loads.
GLD1,		GLD1,
GLD1_SCALED,		GLD1_SCALED,
GLD1_UXTW,		GLD1_UXTW,
GLD1_SXTW,		GLD1_SXTW,
GLD1_UXTW_SCALED,		GLD1_UXTW_SCALED,
GLD1_SXTW_SCALED,		GLD1_SXTW_SCALED,
GLD1_IMM,		GLD1_IMM,
[...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[...]	const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
case AArch64ISD::STZ2G: return "AArch64ISD::STZ2G";		case AArch64ISD::STZ2G: return "AArch64ISD::STZ2G";
case AArch64ISD::SUNPKHI: return "AArch64ISD::SUNPKHI";		case AArch64ISD::SUNPKHI: return "AArch64ISD::SUNPKHI";
case AArch64ISD::SUNPKLO: return "AArch64ISD::SUNPKLO";		case AArch64ISD::SUNPKLO: return "AArch64ISD::SUNPKLO";
case AArch64ISD::UUNPKHI: return "AArch64ISD::UUNPKHI";		case AArch64ISD::UUNPKHI: return "AArch64ISD::UUNPKHI";
case AArch64ISD::UUNPKLO: return "AArch64ISD::UUNPKLO";		case AArch64ISD::UUNPKLO: return "AArch64ISD::UUNPKLO";
case AArch64ISD::INSR: return "AArch64ISD::INSR";		case AArch64ISD::INSR: return "AArch64ISD::INSR";
case AArch64ISD::PTEST: return "AArch64ISD::PTEST";		case AArch64ISD::PTEST: return "AArch64ISD::PTEST";
case AArch64ISD::PTRUE: return "AArch64ISD::PTRUE";		case AArch64ISD::PTRUE: return "AArch64ISD::PTRUE";
		case AArch64ISD::LDNF1: return "AArch64ISD::LDNF1";
		case AArch64ISD::LDNF1S: return "AArch64ISD::LDNF1S";
case AArch64ISD::GLD1: return "AArch64ISD::GLD1";		case AArch64ISD::GLD1: return "AArch64ISD::GLD1";
case AArch64ISD::GLD1_SCALED: return "AArch64ISD::GLD1_SCALED";		case AArch64ISD::GLD1_SCALED: return "AArch64ISD::GLD1_SCALED";
case AArch64ISD::GLD1_SXTW: return "AArch64ISD::GLD1_SXTW";		case AArch64ISD::GLD1_SXTW: return "AArch64ISD::GLD1_SXTW";
case AArch64ISD::GLD1_UXTW: return "AArch64ISD::GLD1_UXTW";		case AArch64ISD::GLD1_UXTW: return "AArch64ISD::GLD1_UXTW";
case AArch64ISD::GLD1_SXTW_SCALED: return "AArch64ISD::GLD1_SXTW_SCALED";		case AArch64ISD::GLD1_SXTW_SCALED: return "AArch64ISD::GLD1_SXTW_SCALED";
case AArch64ISD::GLD1_UXTW_SCALED: return "AArch64ISD::GLD1_UXTW_SCALED";		case AArch64ISD::GLD1_UXTW_SCALED: return "AArch64ISD::GLD1_UXTW_SCALED";
case AArch64ISD::GLD1_IMM: return "AArch64ISD::GLD1_IMM";		case AArch64ISD::GLD1_IMM: return "AArch64ISD::GLD1_IMM";
case AArch64ISD::GLD1S: return "AArch64ISD::GLD1S";		case AArch64ISD::GLD1S: return "AArch64ISD::GLD1S";
[...]	if (DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();

SDValue Src = N->getOperand(0);		SDValue Src = N->getOperand(0);
SDValue Mask = N->getOperand(1);		SDValue Mask = N->getOperand(1);

if (!Src.hasOneUse())		if (!Src.hasOneUse())
return SDValue();		return SDValue();

// GLD1* instructions perform an implicit zero-extend, which makes them		EVT MemVT;

		// SVE load instructions perform an implicit zero-extend, which makes them
// perfect candidates for combining.		// perfect candidates for combining.
switch (Src->getOpcode()) {		switch (Src->getOpcode()) {
		case AArch64ISD::LDNF1:
		MemVT = cast<VTSDNode>(Src->getOperand(3))->getVT();
		break;
case AArch64ISD::GLD1:		case AArch64ISD::GLD1:
case AArch64ISD::GLD1_SCALED:		case AArch64ISD::GLD1_SCALED:
case AArch64ISD::GLD1_SXTW:		case AArch64ISD::GLD1_SXTW:
case AArch64ISD::GLD1_SXTW_SCALED:		case AArch64ISD::GLD1_SXTW_SCALED:
case AArch64ISD::GLD1_UXTW:		case AArch64ISD::GLD1_UXTW:
case AArch64ISD::GLD1_UXTW_SCALED:		case AArch64ISD::GLD1_UXTW_SCALED:
case AArch64ISD::GLD1_IMM:		case AArch64ISD::GLD1_IMM:
		MemVT = cast<VTSDNode>(Src->getOperand(4))->getVT();
break;		break;
default:		default:
return SDValue();		return SDValue();
}		}

EVT MemVT = cast<VTSDNode>(Src->getOperand(4))->getVT();

if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))		if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))
return Src;		return Src;

return SDValue();		return SDValue();
}		}

static SDValue performANDCombine(SDNode *N,		static SDValue performANDCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI) {
[...]	while (--NumVecElts) {
NewST1 = DAG.getStore(NewST1.getValue(0), DL, SplatVal, OffsetPtr,		NewST1 = DAG.getStore(NewST1.getValue(0), DL, SplatVal, OffsetPtr,
PtrInfo.getWithOffset(Offset), Alignment,		PtrInfo.getWithOffset(Offset), Alignment,
St.getMemOperand()->getFlags());		St.getMemOperand()->getFlags());
Offset += EltOffset;		Offset += EltOffset;
}		}
return NewST1;		return NewST1;
}		}

		// Returns an SVE type that ContentTy can be trivially sign or zero extended
		// into.
		static MVT getSVEContainerType(EVT ContentTy) {
		assert(ContentTy.isSimple() && "No SVE containers for extended types");

		switch (ContentTy.getSimpleVT().SimpleTy) {
		default:
		llvm_unreachable("No known SVE container for this MVT type");
		case MVT::nxv2i8:
		case MVT::nxv2i16:
		case MVT::nxv2i32:
		case MVT::nxv2i64:
		case MVT::nxv2f32:
		case MVT::nxv2f64:
		return MVT::nxv2i64;
		case MVT::nxv4i8:
		case MVT::nxv4i16:
		case MVT::nxv4i32:
		case MVT::nxv4f32:
		return MVT::nxv4i32;
		case MVT::nxv8i8:
		case MVT::nxv8i16:
		case MVT::nxv8f16:
		return MVT::nxv8i16;
		case MVT::nxv16i8:
		return MVT::nxv16i8;
		}
		}

static SDValue performLDNT1Combine(SDNode *N, SelectionDAG &DAG) {		static SDValue performLDNT1Combine(SDNode *N, SelectionDAG &DAG) {
SDLoc DL(N);		SDLoc DL(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT PtrTy = N->getOperand(3).getValueType();		EVT PtrTy = N->getOperand(3).getValueType();

EVT LoadVT = VT;		EVT LoadVT = VT;
if (VT.isFloatingPoint())		if (VT.isFloatingPoint())
LoadVT = VT.changeTypeToInteger();		LoadVT = VT.changeTypeToInteger();
[...]	static SDValue performSTNT1Combine(SDNode *N, SelectionDAG &DAG) {

auto *MINode = cast<MemIntrinsicSDNode>(N);		auto *MINode = cast<MemIntrinsicSDNode>(N);
return DAG.getMaskedStore(MINode->getChain(), DL, Data, MINode->getOperand(4),		return DAG.getMaskedStore(MINode->getChain(), DL, Data, MINode->getOperand(4),
DAG.getUNDEF(PtrTy), MINode->getOperand(3),		DAG.getUNDEF(PtrTy), MINode->getOperand(3),
MINode->getMemoryVT(), MINode->getMemOperand(),		MINode->getMemoryVT(), MINode->getMemOperand(),
ISD::UNINDEXED, false, false);		ISD::UNINDEXED, false, false);
}		}

		static SDValue performLDNF1Combine(SDNode *N, SelectionDAG &DAG) {
		SDLoc DL(N);
		EVT VT = N->getValueType(0);

		if (VT.getSizeInBits().getKnownMinSize() > AArch64::SVEBitsPerBlock)
		return SDValue();

		EVT ContainerVT = VT;
		if (ContainerVT.isInteger())
		ContainerVT = getSVEContainerType(ContainerVT);

		SDVTList VTs = DAG.getVTList(ContainerVT, MVT::Other);
		SDValue Ops[] = { N->getOperand(0), // Chain
		N->getOperand(2), // Pg
		N->getOperand(3), // Base
		DAG.getValueType(VT) };

		SDValue Load = DAG.getNode(AArch64ISD::LDNF1, DL, VTs, Ops);
		SDValue LoadChain = SDValue(Load.getNode(), 1);

		if (ContainerVT.isInteger() && (VT != ContainerVT))
		Load = DAG.getNode(ISD::TRUNCATE, DL, VT, Load.getValue(0));

		return DAG.getMergeValues({ Load, LoadChain }, DL);
		}

/// Replace a splat of zeros to a vector store by scalar stores of WZR/XZR. The		/// Replace a splat of zeros to a vector store by scalar stores of WZR/XZR. The
/// load store optimizer pass will merge them to store pair stores. This should		/// load store optimizer pass will merge them to store pair stores. This should
/// be better than a movi to create the vector zero followed by a vector store		/// be better than a movi to create the vector zero followed by a vector store
/// if the zero constant is not re-used, since one instructions and one register		/// if the zero constant is not re-used, since one instructions and one register
/// live range will be removed.		/// live range will be removed.
///		///
/// For example, the final generated code should be:		/// For example, the final generated code should be:
///		///
[...]	if (!T->isSized() ||
return SDValue();		return SDValue();

SDLoc DL(GN);		SDLoc DL(GN);
SDValue Result = DAG.getGlobalAddress(GV, DL, MVT::i64, Offset);		SDValue Result = DAG.getGlobalAddress(GV, DL, MVT::i64, Offset);
return DAG.getNode(ISD::SUB, DL, MVT::i64, Result,		return DAG.getNode(ISD::SUB, DL, MVT::i64, Result,
DAG.getConstant(MinOffset, DL, MVT::i64));		DAG.getConstant(MinOffset, DL, MVT::i64));
}		}

// Returns an SVE type that ContentTy can be trivially sign or zero extended
// into.
static MVT getSVEContainerType(EVT ContentTy) {
assert(ContentTy.isSimple() && "No SVE containers for extended types");

switch (ContentTy.getSimpleVT().SimpleTy) {
default:
llvm_unreachable("No known SVE container for this MVT type");
case MVT::nxv2i8:
case MVT::nxv2i16:
case MVT::nxv2i32:
case MVT::nxv2i64:
case MVT::nxv2f32:
case MVT::nxv2f64:
return MVT::nxv2i64;
case MVT::nxv4i8:
case MVT::nxv4i16:
case MVT::nxv4i32:
case MVT::nxv4f32:
return MVT::nxv4i32;
}
}

static SDValue performST1ScatterCombine(SDNode *N, SelectionDAG &DAG,		static SDValue performST1ScatterCombine(SDNode *N, SelectionDAG &DAG,
unsigned Opcode,		unsigned Opcode,
bool OnlyPackedOffsets = true) {		bool OnlyPackedOffsets = true) {
const SDValue Src = N->getOperand(2);		const SDValue Src = N->getOperand(2);
const EVT SrcVT = Src->getValueType(0);		const EVT SrcVT = Src->getValueType(0);
assert(SrcVT.isScalableVector() &&		assert(SrcVT.isScalableVector() &&
"Scatter stores are only possible for SVE vectors");		"Scatter stores are only possible for SVE vectors");

[...]
performSignExtendInRegCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,		performSignExtendInRegCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
if (DCI.isBeforeLegalizeOps())		if (DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();

SDValue Src = N->getOperand(0);		SDValue Src = N->getOperand(0);
unsigned Opc = Src->getOpcode();		unsigned Opc = Src->getOpcode();

// Gather load nodes (e.g. AArch64ISD::GLD1) are straightforward candidates		// SVE load nodes (e.g. AArch64ISD::GLD1) are straightforward candidates
// for DAG Combine with SIGN_EXTEND_INREG. Bail out for all other nodes.		// for DAG Combine with SIGN_EXTEND_INREG. Bail out for all other nodes.
unsigned NewOpc;		unsigned NewOpc;
		unsigned MemVTOpNum = 4;
switch (Opc) {		switch (Opc) {
		case AArch64ISD::LDNF1:
		NewOpc = AArch64ISD::LDNF1S;
		MemVTOpNum = 3;
		break;
case AArch64ISD::GLD1:		case AArch64ISD::GLD1:
NewOpc = AArch64ISD::GLD1S;		NewOpc = AArch64ISD::GLD1S;
break;		break;
case AArch64ISD::GLD1_SCALED:		case AArch64ISD::GLD1_SCALED:
NewOpc = AArch64ISD::GLD1S_SCALED;		NewOpc = AArch64ISD::GLD1S_SCALED;
break;		break;
case AArch64ISD::GLD1_SXTW:		case AArch64ISD::GLD1_SXTW:
NewOpc = AArch64ISD::GLD1S_SXTW;		NewOpc = AArch64ISD::GLD1S_SXTW;
[...]	performSignExtendInRegCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
case AArch64ISD::GLD1_IMM:		case AArch64ISD::GLD1_IMM:
NewOpc = AArch64ISD::GLD1S_IMM;		NewOpc = AArch64ISD::GLD1S_IMM;
break;		break;
default:		default:
return SDValue();		return SDValue();
}		}

EVT SignExtSrcVT = cast<VTSDNode>(N->getOperand(1))->getVT();		EVT SignExtSrcVT = cast<VTSDNode>(N->getOperand(1))->getVT();
EVT GLD1SrcMemVT = cast<VTSDNode>(Src->getOperand(4))->getVT();		EVT SrcMemVT = cast<VTSDNode>(Src->getOperand(MemVTOpNum))->getVT();

if ((SignExtSrcVT != GLD1SrcMemVT) || !Src.hasOneUse())		if ((SignExtSrcVT != SrcMemVT) || !Src.hasOneUse())
return SDValue();		return SDValue();

EVT DstVT = N->getValueType(0);		EVT DstVT = N->getValueType(0);
SDVTList VTs = DAG.getVTList(DstVT, MVT::Other);		SDVTList VTs = DAG.getVTList(DstVT, MVT::Other);
SDValue Ops[] = {Src->getOperand(0), Src->getOperand(1), Src->getOperand(2),
Src->getOperand(3), Src->getOperand(4)};		SmallVector<SDValue, 5> Ops;
		for (unsigned I = 0; I < Src->getNumOperands(); ++I)
		Ops.push_back(Src->getOperand(I));

SDValue ExtLoad = DAG.getNode(NewOpc, SDLoc(N), VTs, Ops);		SDValue ExtLoad = DAG.getNode(NewOpc, SDLoc(N), VTs, Ops);
DCI.CombineTo(N, ExtLoad);		DCI.CombineTo(N, ExtLoad);
DCI.CombineTo(Src.getNode(), ExtLoad, ExtLoad.getValue(1));		DCI.CombineTo(Src.getNode(), ExtLoad, ExtLoad.getValue(1));

// Return N so it doesn't get rechecked		// Return N so it doesn't get rechecked
return SDValue(N, 0);		return SDValue(N, 0);
}		}

SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,		SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
[...]	case ISD::INTRINSIC_W_CHAIN:
case Intrinsic::aarch64_neon_st1x3:		case Intrinsic::aarch64_neon_st1x3:
case Intrinsic::aarch64_neon_st1x4:		case Intrinsic::aarch64_neon_st1x4:
case Intrinsic::aarch64_neon_st2lane:		case Intrinsic::aarch64_neon_st2lane:
case Intrinsic::aarch64_neon_st3lane:		case Intrinsic::aarch64_neon_st3lane:
case Intrinsic::aarch64_neon_st4lane:		case Intrinsic::aarch64_neon_st4lane:
return performNEONPostLDSTCombine(N, DCI, DAG);		return performNEONPostLDSTCombine(N, DCI, DAG);
case Intrinsic::aarch64_sve_ldnt1:		case Intrinsic::aarch64_sve_ldnt1:
return performLDNT1Combine(N, DAG);		return performLDNT1Combine(N, DAG);
		case Intrinsic::aarch64_sve_ldnf1:
		return performLDNF1Combine(N, DAG);
case Intrinsic::aarch64_sve_stnt1:		case Intrinsic::aarch64_sve_stnt1:
return performSTNT1Combine(N, DAG);		return performSTNT1Combine(N, DAG);
case Intrinsic::aarch64_sve_ld1_gather:		case Intrinsic::aarch64_sve_ld1_gather:
return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1);		return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1);
case Intrinsic::aarch64_sve_ld1_gather_index:		case Intrinsic::aarch64_sve_ld1_gather_index:
return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SCALED);		return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SCALED);
case Intrinsic::aarch64_sve_ld1_gather_sxtw:		case Intrinsic::aarch64_sve_ld1_gather_sxtw:
return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SXTW,		return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SXTW,
[...]

llvm/lib/Target/AArch64/AArch64InstrInfo.td

[...]
	def AArch64uunpklo : SDNode<"AArch64ISD::UUNPKLO", SDT_AArch64unpk>;			def AArch64uunpklo : SDNode<"AArch64ISD::UUNPKLO", SDT_AArch64unpk>;

	def AArch64ldp : SDNode<"AArch64ISD::LDP", SDT_AArch64ldp, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;			def AArch64ldp : SDNode<"AArch64ISD::LDP", SDT_AArch64ldp, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
	def AArch64stp : SDNode<"AArch64ISD::STP", SDT_AArch64stp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;			def AArch64stp : SDNode<"AArch64ISD::STP", SDT_AArch64stp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
	def AArch64stnp : SDNode<"AArch64ISD::STNP", SDT_AArch64stnp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;			def AArch64stnp : SDNode<"AArch64ISD::STNP", SDT_AArch64stnp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;

	def AArch64tbl : SDNode<"AArch64ISD::TBL", SDT_AArch64TBL>;			def AArch64tbl : SDNode<"AArch64ISD::TBL", SDT_AArch64TBL>;

				def SDT_AArch64_LDNF1 : SDTypeProfile<1, 3, [
				SDTCisVec<0>, SDTCisVec<1>, SDTCisPtrTy<2>,
				SDTCVecEltisVT<1,i1>, SDTCisSameNumEltsAs<0,1>
				]>;

				def AArch64ldnf1 : SDNode<"AArch64ISD::LDNF1", SDT_AArch64_LDNF1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	// AArch64 Instruction Predicate Definitions.			// AArch64 Instruction Predicate Definitions.
	// We could compute these on a per-module basis but doing so requires accessing			// We could compute these on a per-module basis but doing so requires accessing
	// the Function object through the <Target>Subtarget and objections were raised			// the Function object through the <Target>Subtarget and objections were raised
	// to that (see post-commit review comments for r301750).			// to that (see post-commit review comments for r301750).
[...]

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[...]
	def sve_cntw_imm : ComplexPattern<i32, 1, "SelectRDVLImm<1, 16, 4>">;			def sve_cntw_imm : ComplexPattern<i32, 1, "SelectRDVLImm<1, 16, 4>">;
	def sve_cntd_imm : ComplexPattern<i32, 1, "SelectRDVLImm<1, 16, 2>">;			def sve_cntd_imm : ComplexPattern<i32, 1, "SelectRDVLImm<1, 16, 2>">;

	// SVE DEC			// SVE DEC
	def sve_cnth_imm_neg : ComplexPattern<i32, 1, "SelectRDVLImm<1, 16, -8>">;			def sve_cnth_imm_neg : ComplexPattern<i32, 1, "SelectRDVLImm<1, 16, -8>">;
	def sve_cntw_imm_neg : ComplexPattern<i32, 1, "SelectRDVLImm<1, 16, -4>">;			def sve_cntw_imm_neg : ComplexPattern<i32, 1, "SelectRDVLImm<1, 16, -4>">;
	def sve_cntd_imm_neg : ComplexPattern<i32, 1, "SelectRDVLImm<1, 16, -2>">;			def sve_cntd_imm_neg : ComplexPattern<i32, 1, "SelectRDVLImm<1, 16, -2>">;

				def AArch64ldnf1s : SDNode<"AArch64ISD::LDNF1S", SDT_AArch64_LDNF1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather : SDNode<"AArch64ISD::GLD1S", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather : SDNode<"AArch64ISD::GLD1S", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_scaled : SDNode<"AArch64ISD::GLD1S_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_scaled : SDNode<"AArch64ISD::GLD1S_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_uxtw : SDNode<"AArch64ISD::GLD1S_UXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_uxtw : SDNode<"AArch64ISD::GLD1S_UXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_sxtw : SDNode<"AArch64ISD::GLD1S_SXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_sxtw : SDNode<"AArch64ISD::GLD1S_SXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_uxtw_scaled : SDNode<"AArch64ISD::GLD1S_UXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_uxtw_scaled : SDNode<"AArch64ISD::GLD1S_UXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_sxtw_scaled : SDNode<"AArch64ISD::GLD1S_SXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_sxtw_scaled : SDNode<"AArch64ISD::GLD1S_SXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_imm : SDNode<"AArch64ISD::GLD1S_IMM", SDT_AArch64_GLD1_IMM, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_imm : SDNode<"AArch64ISD::GLD1S_IMM", SDT_AArch64_GLD1_IMM, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;

[...]
	defm : pred_load<nxv8i16, nxv8i1, non_temporal_load, LDNT1H_ZRI>;			defm : pred_load<nxv8i16, nxv8i1, non_temporal_load, LDNT1H_ZRI>;
	defm : pred_load<nxv4i32, nxv4i1, non_temporal_load, LDNT1W_ZRI>;			defm : pred_load<nxv4i32, nxv4i1, non_temporal_load, LDNT1W_ZRI>;
	defm : pred_load<nxv2i64, nxv2i1, non_temporal_load, LDNT1D_ZRI>;			defm : pred_load<nxv2i64, nxv2i1, non_temporal_load, LDNT1D_ZRI>;

	defm : pred_store<nxv16i8, nxv16i1, non_temporal_store, STNT1B_ZRI>;			defm : pred_store<nxv16i8, nxv16i1, non_temporal_store, STNT1B_ZRI>;
	defm : pred_store<nxv8i16, nxv8i1, non_temporal_store, STNT1H_ZRI>;			defm : pred_store<nxv8i16, nxv8i1, non_temporal_store, STNT1H_ZRI>;
	defm : pred_store<nxv4i32, nxv4i1, non_temporal_store, STNT1W_ZRI>;			defm : pred_store<nxv4i32, nxv4i1, non_temporal_store, STNT1W_ZRI>;
	defm : pred_store<nxv2i64, nxv2i1, non_temporal_store, STNT1D_ZRI>;			defm : pred_store<nxv2i64, nxv2i1, non_temporal_store, STNT1D_ZRI>;

				multiclass ldnf1<Instruction I, ValueType Ty, SDPatternOperator Load, ValueType PredTy, ValueType MemVT> {
				// base
				def : Pat<(Ty (Load (PredTy PPR:$gp), GPR64:$base, MemVT)),
				(I PPR:$gp, GPR64sp:$base, (i64 0))>;
				}

				// 2-element contiguous non-faulting loads
				defm : ldnf1<LDNF1B_D_IMM, nxv2i64, AArch64ldnf1, nxv2i1, nxv2i8>;
				defm : ldnf1<LDNF1SB_D_IMM, nxv2i64, AArch64ldnf1s, nxv2i1, nxv2i8>;
				defm : ldnf1<LDNF1H_D_IMM, nxv2i64, AArch64ldnf1, nxv2i1, nxv2i16>;
				defm : ldnf1<LDNF1SH_D_IMM, nxv2i64, AArch64ldnf1s, nxv2i1, nxv2i16>;
				defm : ldnf1<LDNF1W_D_IMM, nxv2i64, AArch64ldnf1, nxv2i1, nxv2i32>;
				defm : ldnf1<LDNF1SW_D_IMM, nxv2i64, AArch64ldnf1s, nxv2i1, nxv2i32>;
				defm : ldnf1<LDNF1D_IMM, nxv2i64, AArch64ldnf1, nxv2i1, nxv2i64>;
				defm : ldnf1<LDNF1D_IMM, nxv2f64, AArch64ldnf1, nxv2i1, nxv2f64>;

				// 4-element contiguous non-faulting loads
				defm : ldnf1<LDNF1B_S_IMM, nxv4i32, AArch64ldnf1, nxv4i1, nxv4i8>;
				defm : ldnf1<LDNF1SB_S_IMM, nxv4i32, AArch64ldnf1s, nxv4i1, nxv4i8>;
				defm : ldnf1<LDNF1H_S_IMM, nxv4i32, AArch64ldnf1, nxv4i1, nxv4i16>;
				defm : ldnf1<LDNF1SH_S_IMM, nxv4i32, AArch64ldnf1s, nxv4i1, nxv4i16>;
				defm : ldnf1<LDNF1W_IMM, nxv4i32, AArch64ldnf1, nxv4i1, nxv4i32>;
				defm : ldnf1<LDNF1W_IMM, nxv4f32, AArch64ldnf1, nxv4i1, nxv4f32>;

				// 8-element contiguous non-faulting loads
				defm : ldnf1<LDNF1B_H_IMM, nxv8i16, AArch64ldnf1, nxv8i1, nxv8i8>;
				defm : ldnf1<LDNF1SB_H_IMM, nxv8i16, AArch64ldnf1s, nxv8i1, nxv8i8>;
				defm : ldnf1<LDNF1H_IMM, nxv8i16, AArch64ldnf1, nxv8i1, nxv8i16>;
				defm : ldnf1<LDNF1H_IMM, nxv8f16, AArch64ldnf1, nxv8i1, nxv8f16>;

				// 16-element contiguous non-faulting loads
				defm : ldnf1<LDNF1B_IMM, nxv16i8, AArch64ldnf1, nxv16i1, nxv16i8>;

	}			}

	let Predicates = [HasSVE2] in {			let Predicates = [HasSVE2] in {
	// SVE2 integer multiply-add (indexed)			// SVE2 integer multiply-add (indexed)
	defm MLA_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b0, "mla">;			defm MLA_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b0, "mla">;
	defm MLS_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b1, "mls">;			defm MLS_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b1, "mls">;

	// SVE2 saturating multiply-add high (indexed)			// SVE2 saturating multiply-add high (indexed)

llvm/lib/Target/AArch64/SVEInstrFormats.td

: I<(outs VecList:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4),

let mayLoad = 1;
let Uses = !if(!eq(nf, 1), [FFR], []);
let Defs = !if(!eq(nf, 1), [FFR], []);
}

multiclass sve_mem_cld_si_base<bits<4> dtype, bit nf, string asm,
RegisterOperand listty, ZPRRegOp zprty> {
def "" : sve_mem_cld_si_base<dtype, nf, asm, listty>;		def _REAL : sve_mem_cld_si_base<dtype, nf, asm, listty>;

def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",		def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",
(!cast<Instruction>(NAME) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 0>;		(!cast<Instruction>(NAME # _REAL) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 0>;
def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $imm4, mul vl]",		def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $imm4, mul vl]",
(!cast<Instruction>(NAME) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4), 0>;		(!cast<Instruction>(NAME # _REAL) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4), 0>;
def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",		def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",
(!cast<Instruction>(NAME) listty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 1>;		(!cast<Instruction>(NAME # _REAL) listty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 1>;

		// We need a layer of indirection because early machine code passes balk at
		// physical register (i.e. FFR) uses that have no previous definition.
		efriedma: This is depending on hasSideEffects to preserve the correct ordering with instructions that read/write FFR? That probably works. I guess the alternative is to insert an IMPLICIT_DEF of FFR in the entry block of each function. What are the calling convention rules for FFR? Is it callee-save? If not, we might need to do some work to make FFR reads/writes do something sane across calls inserted by the compiler.
		kmclaughlin (author): The FFR is not callee-saved. We will need to add support to save & restore it where appropriate at the point the compiler starts generating reads to the FFR, but for the purpose of the ACLE the user will be required to do this if necessary.
		efriedma: How can the user write correct code to save/restore the FFR? The compiler can move arbitrary readnone/argmemonly calls between the definition and the use.
		sdesmalen: There are separate intrinsics for reading/writing the FFR (svrdffr, svsetffr, svwrffr), which use a `svbool_t` to keep the value of the FFR. These intrinsics are implemented in the same way, with a Pseudo that has `hasSideEffects = 1` set. I thought this flag would prevent other calls from being scheduled/moved over these intrinsics, as they have unknown/unmodelled side-effects and would thus act kind of like a barrier?
		efriedma: The issue would be transforms at the IR/SelectionDAG level. We can probably model calls at the MIR level correctly, like you're describing.
		let hasSideEffects = 1, hasNoSchedulingInfo = 1, mayLoad = 1 in {
		def "" : Pseudo<(outs listty:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4), []>,
		PseudoInstExpansion<(!cast<Instruction>(NAME # _REAL) listty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4)>;
		}
}
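
To make the FFR discussion in the inline comments above more concrete, here is a minimal scalar sketch in C++ of how a predicated non-faulting load interacts with the FFR. It is an illustration only, not LLVM or hardware code: `NfLoad`, `set_ffr`, `ldnf1_model` and the `readable` mask (standing in for "this element can be accessed without faulting") are hypothetical names. The model zeroes lanes whose FFR bit is cleared as a simplification; real code should treat such lanes as not valid.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (not LLVM or hardware code): N lanes of i32.
template <std::size_t N>
struct NfLoad {
    std::array<int32_t, N> data{};  // zero-initialised; inactive lanes stay 0 ("/z")
    std::array<bool, N> ffr{};      // FFR value after the load
};

// Models the effect of svsetffr: every FFR lane becomes true.
template <std::size_t N>
std::array<bool, N> set_ffr() {
    std::array<bool, N> f{};
    f.fill(true);
    return f;
}

// `readable[i]` is an assumption of this sketch: it stands in for
// "accessing element i would not fault".
template <std::size_t N>
NfLoad<N> ldnf1_model(const std::array<bool, N>& pg,
                      const std::array<bool, N>& ffr_in,
                      const int32_t* base,
                      const std::array<bool, N>& readable) {
    assert(base != nullptr);
    NfLoad<N> r;
    r.ffr = ffr_in;
    bool faulted = false;
    for (std::size_t i = 0; i < N; ++i) {
        // Only active lanes can fault; inactive lanes are never accessed.
        if (pg[i] && !readable[i]) faulted = true;
        if (faulted) {
            r.ffr[i] = false;     // this lane and all later lanes are marked invalid
        } else if (pg[i]) {
            r.data[i] = base[i];  // successful load of an active lane
        }
    }
    return r;
}
```

In the model the FFR is passed in and returned explicitly. In the real ISA it is a hidden register, which is why the thread worries about anything that clobbers the FFR being moved between the load and the later read of the FFR.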

multiclass sve_mem_cld_si<bits<4> dtype, string asm, RegisterOperand listty,
ZPRRegOp zprty>
: sve_mem_cld_si_base<dtype, 0, asm, listty, zprty>;

class sve_mem_cldnt_si_base<bits<2> msz, string asm, RegisterOperand VecList>
: I<(outs VecList:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4),

llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

				define <vscale x 16 x i8> @ldnf1b(<vscale x 16 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b:
				; CHECK: ldnf1b { z0.b }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 16 x i8> @llvm.aarch64.sve.ldnf1.nxv16i8(<vscale x 16 x i1> %pg, i8* %a)
				ret <vscale x 16 x i8> %load
				}

				define <vscale x 8 x i16> @ldnf1b_h(<vscale x 8 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b_h:
				; CHECK: ldnf1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x i8> @llvm.aarch64.sve.ldnf1.nxv8i8(<vscale x 8 x i1> %pg, i8* %a)
				%res = zext <vscale x 8 x i8> %load to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 8 x i16> @ldnf1sb_h(<vscale x 8 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1sb_h:
				; CHECK: ldnf1sb { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x i8> @llvm.aarch64.sve.ldnf1.nxv8i8(<vscale x 8 x i1> %pg, i8* %a)
				%res = sext <vscale x 8 x i8> %load to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 8 x i16> @ldnf1h(<vscale x 8 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1h:
				; CHECK: ldnf1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x i16> @llvm.aarch64.sve.ldnf1.nxv8i16(<vscale x 8 x i1> %pg, i16* %a)
				ret <vscale x 8 x i16> %load
				}

				define <vscale x 8 x half> @ldnf1h_f16(<vscale x 8 x i1> %pg, half* %a) {
				; CHECK-LABEL: ldnf1h_f16:
				; CHECK: ldnf1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x half> @llvm.aarch64.sve.ldnf1.nxv8f16(<vscale x 8 x i1> %pg, half* %a)
				ret <vscale x 8 x half> %load
				}

				define <vscale x 4 x i32> @ldnf1b_s(<vscale x 4 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b_s:
				; CHECK: ldnf1b { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i8> @llvm.aarch64.sve.ldnf1.nxv4i8(<vscale x 4 x i1> %pg, i8* %a)
				%res = zext <vscale x 4 x i8> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1sb_s(<vscale x 4 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1sb_s:
				; CHECK: ldnf1sb { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i8> @llvm.aarch64.sve.ldnf1.nxv4i8(<vscale x 4 x i1> %pg, i8* %a)
				%res = sext <vscale x 4 x i8> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1h_s(<vscale x 4 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1h_s:
				; CHECK: ldnf1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i16> @llvm.aarch64.sve.ldnf1.nxv4i16(<vscale x 4 x i1> %pg, i16* %a)
				%res = zext <vscale x 4 x i16> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1sh_s(<vscale x 4 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1sh_s:
				; CHECK: ldnf1sh { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i16> @llvm.aarch64.sve.ldnf1.nxv4i16(<vscale x 4 x i1> %pg, i16* %a)
				%res = sext <vscale x 4 x i16> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1w(<vscale x 4 x i1> %pg, i32* %a) {
				; CHECK-LABEL: ldnf1w:
				; CHECK: ldnf1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i32> @llvm.aarch64.sve.ldnf1.nxv4i32(<vscale x 4 x i1> %pg, i32* %a)
				ret <vscale x 4 x i32> %load
				}

				define <vscale x 4 x float> @ldnf1w_f32(<vscale x 4 x i1> %pg, float* %a) {
				; CHECK-LABEL: ldnf1w_f32:
				; CHECK: ldnf1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x float> @llvm.aarch64.sve.ldnf1.nxv4f32(<vscale x 4 x i1> %pg, float* %a)
				ret <vscale x 4 x float> %load
				}

				define <vscale x 2 x i64> @ldnf1b_d(<vscale x 2 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b_d:
				; CHECK: ldnf1b { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i8> @llvm.aarch64.sve.ldnf1.nxv2i8(<vscale x 2 x i1> %pg, i8* %a)
				%res = zext <vscale x 2 x i8> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1sb_d(<vscale x 2 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1sb_d:
				; CHECK: ldnf1sb { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i8> @llvm.aarch64.sve.ldnf1.nxv2i8(<vscale x 2 x i1> %pg, i8* %a)
				%res = sext <vscale x 2 x i8> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1h_d(<vscale x 2 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1h_d:
				; CHECK: ldnf1h { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i16> @llvm.aarch64.sve.ldnf1.nxv2i16(<vscale x 2 x i1> %pg, i16* %a)
				%res = zext <vscale x 2 x i16> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1sh_d(<vscale x 2 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1sh_d:
				; CHECK: ldnf1sh { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i16> @llvm.aarch64.sve.ldnf1.nxv2i16(<vscale x 2 x i1> %pg, i16* %a)
				%res = sext <vscale x 2 x i16> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1w_d(<vscale x 2 x i1> %pg, i32* %a) {
				; CHECK-LABEL: ldnf1w_d:
				; CHECK: ldnf1w { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i32> @llvm.aarch64.sve.ldnf1.nxv2i32(<vscale x 2 x i1> %pg, i32* %a)
				%res = zext <vscale x 2 x i32> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1sw_d(<vscale x 2 x i1> %pg, i32* %a) {
				; CHECK-LABEL: ldnf1sw_d:
				; CHECK: ldnf1sw { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i32> @llvm.aarch64.sve.ldnf1.nxv2i32(<vscale x 2 x i1> %pg, i32* %a)
				%res = sext <vscale x 2 x i32> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1d(<vscale x 2 x i1> %pg, i64* %a) {
				; CHECK-LABEL: ldnf1d:
				; CHECK: ldnf1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i64> @llvm.aarch64.sve.ldnf1.nxv2i64(<vscale x 2 x i1> %pg, i64* %a)
				ret <vscale x 2 x i64> %load
				}

				define <vscale x 2 x double> @ldnf1d_f64(<vscale x 2 x i1> %pg, double* %a) {
				; CHECK-LABEL: ldnf1d_f64:
				; CHECK: ldnf1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x double> @llvm.aarch64.sve.ldnf1.nxv2f64(<vscale x 2 x i1> %pg, double* %a)
				ret <vscale x 2 x double> %load
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.ldnf1.nxv16i8(<vscale x 16 x i1>, i8*)

				declare <vscale x 8 x i8> @llvm.aarch64.sve.ldnf1.nxv8i8(<vscale x 8 x i1>, i8*)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.ldnf1.nxv8i16(<vscale x 8 x i1>, i16*)
				declare <vscale x 8 x half> @llvm.aarch64.sve.ldnf1.nxv8f16(<vscale x 8 x i1>, half*)

				declare <vscale x 4 x i8> @llvm.aarch64.sve.ldnf1.nxv4i8(<vscale x 4 x i1>, i8*)
				declare <vscale x 4 x i16> @llvm.aarch64.sve.ldnf1.nxv4i16(<vscale x 4 x i1>, i16*)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.ldnf1.nxv4i32(<vscale x 4 x i1>, i32*)
				declare <vscale x 4 x float> @llvm.aarch64.sve.ldnf1.nxv4f32(<vscale x 4 x i1>, float*)

				declare <vscale x 2 x i8> @llvm.aarch64.sve.ldnf1.nxv2i8(<vscale x 2 x i1>, i8*)
				declare <vscale x 2 x i16> @llvm.aarch64.sve.ldnf1.nxv2i16(<vscale x 2 x i1>, i16*)
				declare <vscale x 2 x i32> @llvm.aarch64.sve.ldnf1.nxv2i32(<vscale x 2 x i1>, i32*)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.ldnf1.nxv2i64(<vscale x 2 x i1>, i64*)
				declare <vscale x 2 x double> @llvm.aarch64.sve.ldnf1.nxv2f64(<vscale x 2 x i1>, double*)
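
The tests above pair a `zext` of the narrow load result with the unsigned mnemonics (`ldnf1b`, `ldnf1h`, `ldnf1w`) and a `sext` with the signed ones (`ldnf1sb`, `ldnf1sh`, `ldnf1sw`). As a rough per-lane illustration of that difference, the following C++ sketch widens one 8-bit memory element into a 16-bit lane both ways. The helper names `widen_zext` and `widen_sext` are hypothetical and exist only for this example.

```cpp
#include <cstdint>

// Hypothetical per-lane helpers; not part of LLVM.

// Zero-extension: the upper bits of the lane are cleared
// (the pattern the tests expect to select ldnf1b).
constexpr uint16_t widen_zext(uint8_t m) { return m; }

// Sign-extension: the upper bits copy bit 7 of the memory element
// (the pattern the tests expect to select ldnf1sb).
constexpr uint16_t widen_sext(uint8_t m) {
    return (m & 0x80) ? static_cast<uint16_t>(0xFF00u | m)
                      : static_cast<uint16_t>(m);
}
```

The two functions agree whenever the top bit of the memory element is clear, so the choice of instruction only matters for elements that would be negative when read as signed values.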

[AArch64][SVE] Add intrinsic for non-faulting loads (Closed, Public)
