This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AArch64/SVEInstrFormats.td
5333	This is depending on hasSideEffects to preserve the correct ordering with instructions that read/write FFR? That probably works. I guess the alternative is to insert an IMPLICIT_DEF of FFR in the entry block of each function. What are the calling convention rules for FFR? Is it callee-save? If not, we might need to do some work to make FFR reads/writes do something sane across calls inserted by the compiler.

kmclaughlin added inline comments. Dec 20 2019, 9:16 AM

llvm/lib/Target/AArch64/SVEInstrFormats.td
5333	The FFR is not callee-saved. We will need to add support to save & restore it where appropriate at the point the compiler starts generating reads to the FFR, but for the purpose of the ACLE the user will be required to do this if necessary.

efriedma added inline comments. Dec 20 2019, 2:00 PM

llvm/lib/Target/AArch64/SVEInstrFormats.td
5333	How can the user write correct code to save/restore the FFR? The compiler can move arbitrary readnone/argmemonly calls between the definition and the use.

andwar added inline comments. Jan 2 2020, 6:27 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9998	Could you replace `GLD1*` with `Load`? I believe that will still be correct, with the added bonus of covering the new case :)
11051	You could use `getSVEContainerType` here instead. You'll need to extend it a wee bit.
12284	The following `switch` statement will now cover more than just Gather nodes. Maybe `SVE load nodes` instead?
12328–12331	Why not: `SmallVector<SDValue, 4> Ops = {Src->getOperand(0), Src->getOperand(1), Src->getOperand(2), Src->getOperand(3), Src->getOperand(4)};`?
12332	Could you add a comment explaining what the underlying difference between `LDNF1S` and `GLD1S` is? Otherwise it's not clear why this `if` statement is needed. IIUC, `GLD1S` has an extra argument for the offsets (hence 5 args vs 4).
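
The operand-count difference raised in the last comment can be sketched in isolation. This is a standalone illustration, not LLVM code: the `Opcode` enum, `operandLayout` and `memVTOperandIndex` are hypothetical names, and strings stand in for `SDValue` operands. The layouts are assumed from this diff, where `performLDNF1Combine` builds `LDNF1` from {Chain, Pg, Base, MemVT} and the gather nodes read their memory type from operand 4.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the AArch64ISD opcodes discussed in the review.
enum class Opcode { LDNF1, GLD1 };

// Operand names in the order assumed from the diff. The gather node carries
// one extra operand (the offsets), which shifts MemVT from index 3 to 4.
std::vector<std::string> operandLayout(Opcode Opc) {
  switch (Opc) {
  case Opcode::LDNF1:
    return {"Chain", "Pg", "Base", "MemVT"};
  case Opcode::GLD1:
    return {"Chain", "Pg", "Base", "Offset", "MemVT"};
  }
  return {};
}

// In both assumed layouts the memory type is the last operand.
unsigned memVTOperandIndex(Opcode Opc) {
  return static_cast<unsigned>(operandLayout(Opc).size()) - 1;
}
```

Under these assumptions `memVTOperandIndex` gives 3 for `LDNF1` and 4 for `GLD1`, which is why the patch cannot unconditionally read or copy operand 4.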

sdesmalen added inline comments. Jan 8 2020, 9:39 AM

llvm/lib/Target/AArch64/SVEInstrFormats.td
5333	There are separate intrinsics for loading/writing the FFR (svrdffr, svsetffr, svwrffr), which use a `svbool_t` to keep the value of the FFR. These intrinsics are implemented in the same way with a Pseudo with `hasSideEffects = 1` set. I thought this flag would prevent other calls from being scheduled/moved over these intrinsics, as they have unknown/unmodelled side-effects and would thus act kind of like a barrier?

efriedma added inline comments. Jan 8 2020, 12:39 PM

llvm/lib/Target/AArch64/SVEInstrFormats.td
5333	The issue would be transforms at the IR/SelectionDAG level. We can probably model calls at the MIR level correctly, like you're describing.
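
The distinction drawn in this exchange can be illustrated with a deliberately simplified toy model. None of the types or functions below exist in LLVM, and the two predicates are not LLVM's real legality rules. They only capture the point under discussion: a pseudo marked `hasSideEffects = 1` acts as a barrier once the code is in MIR, while an IR-level transform sees only the declared memory attributes, which say nothing about the FFR.

```cpp
#include <cassert>

// Toy model only: hypothetical types, not part of LLVM.

// MIR level: an instruction flagged as having unmodelled side effects is
// treated as a barrier, so nothing is reordered across it.
struct MachineOp {
  bool HasSideEffects;
};

bool mayReorderAtMIR(const MachineOp &A, const MachineOp &B) {
  return !A.HasSideEffects && !B.HasSideEffects;
}

// IR level: only the declared memory behaviour is visible. The FFR is not
// memory, so attributes such as readnone or argmemonly do not mention it.
struct IRCall {
  bool WritesMemory; // false for readnone and read-only calls
};

bool mayReorderAtIR(const IRCall &A, const IRCall &B) {
  return !A.WritesMemory && !B.WritesMemory;
}
```

In this model a non-faulting load pseudo cannot be moved past another machine instruction, yet the corresponding intrinsic call (declared `IntrReadMem, IntrArgMemOnly` in this patch) and an arbitrary readnone call look freely reorderable before instruction selection. That gap is the concern raised about a user-written FFR save and restore.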

Rebased patch
Updated comments and extended getSVEContainerType to handle nxv8i16 & nxv16i8
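
The mapping that the extended helper has to cover is the one spelled out by the element-count `switch` in the first version of `performLDNF1Combine` in this diff. A minimal standalone sketch follows, with a hypothetical function name and plain strings standing in for `MVT` values. It is not the code that landed in `getSVEContainerType`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps the element count of an integer vector type to
// the name of the full-width container type, or nullptr when unsupported.
// Each choice multiplies out to 128 bits (for example 8 x 16 or 16 x 8).
const char *containerTypeForIntegerElts(unsigned NumElts) {
  switch (NumElts) {
  case 16:
    return "nxv16i8";
  case 8:
    return "nxv8i16";
  case 4:
    return "nxv4i32";
  case 2:
    return "nxv2i64";
  default:
    return nullptr;
  }
}
```

The `nxv8i16` and `nxv16i8` cases are the two that the update note says were added. A caller would bail out of the combine when the helper reports no container, as the `default: return SDValue();` case does in the diff.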

Thanks for your suggestions, @andwar!

kmclaughlin added a child revision: D73025: [AArch64][SVE] Add first-faulting load intrinsic. Jan 20 2020, 3:23 AM

sdesmalen added inline comments. Jan 20 2020, 5:07 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12318–12321	Move the assignment of `MemVTOpNum` to the switch statement above instead of special-casing it here?
12319	nit: `s/LD1SrcMemVT/SrcMemVT/`
12327	Better make the default '5' if there is a large likelihood of there being 5 default values.
12327	Instead of special-casing LDNF1S below, you can write this as: `SmallVector<SDValue, 5> Ops; for (unsigned I = 0; I < Src->getNumOperands(); ++I) Ops.push_back(Src->getOperand(I));`
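
Two of the suggestions above, choosing the memory-type operand index inside the opcode `switch` and copying every operand in a loop, can be sketched together. This is a standalone approximation rather than the committed code: `Opcode`, `Node`, `selectSignExtendingLoad` and `copyOperands` are hypothetical, and `int` values stand in for `SDValue` operands.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the AArch64ISD opcodes in the review.
enum class Opcode { LDNF1, LDNF1S, GLD1, GLD1S, Other };

struct Node {
  Opcode Opc;
  std::vector<int> Operands; // ints stand in for SDValue operands
};

// Pick the sign-extending opcode and the MemVT operand index in one place,
// so no later code needs to special-case LDNF1S.
bool selectSignExtendingLoad(Opcode Opc, Opcode &NewOpc, unsigned &MemVTOpNum) {
  switch (Opc) {
  case Opcode::LDNF1:
    NewOpc = Opcode::LDNF1S;
    MemVTOpNum = 3;
    return true;
  case Opcode::GLD1:
    NewOpc = Opcode::GLD1S;
    MemVTOpNum = 4;
    return true;
  default:
    return false; // not a candidate for the combine
  }
}

// Copy all operands regardless of how many the source node has.
std::vector<int> copyOperands(const Node &Src) {
  std::vector<int> Ops;
  Ops.reserve(Src.Operands.size());
  for (unsigned I = 0; I < Src.Operands.size(); ++I)
    Ops.push_back(Src.Operands[I]);
  return Ops;
}
```

With this shape a four-operand node and a five-operand node go through the same path, and supporting another load node only means adding a `case` to the `switch`.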

Some minor changes to performSignExtendInRegCombine to address comments from @sdesmalen

LGTM [with the caveat that we need to revisit the modelling of the FFR register and get rid of the PseudoInstExpansion at a later point, as discussed during the previous sync-up call]

This revision is now accepted and ready to land. Jan 21 2020, 1:00 AM

Closed by commit rGcdcc4f2a44b5: [AArch64][SVE] Add intrinsic for non-faulting loads (authored by kmclaughlin). Jan 22 2020, 3:40 AM

This revision was automatically updated to reflect the committed changes.

kmclaughlin mentioned this in D73097: [AArch64][SVE] Add intrinsics for FFR manipulation. Jan 24 2020, 2:57 AM

efriedma mentioned this in D102617: [llvm][AArch64][SVE] Model FFR-using intrinsics with inaccessiblemem. May 17 2021, 10:42 AM

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicsAArch64.td	8 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	3 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	64 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	7 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	35 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td	15 lines
llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll	182 lines

Diff 234695

llvm/include/llvm/IR/IntrinsicsAArch64.td

[769 lines not shown]
let TargetPrefix = "aarch64" in { // All intrinsics start with "llvm.aarch64.".		let TargetPrefix = "aarch64" in { // All intrinsics start with "llvm.aarch64.".

class AdvSIMD_1Vec_PredLoad_Intrinsic		class AdvSIMD_1Vec_PredLoad_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,		[LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
LLVMPointerTo<0>],		LLVMPointerTo<0>],
[IntrReadMem, IntrArgMemOnly]>;		[IntrReadMem, IntrArgMemOnly]>;

		class AdvSIMD_1Vec_PredFaultingLoad_Intrinsic
		: Intrinsic<[llvm_anyvector_ty],
		[LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
		LLVMPointerToElt<0>],
		[IntrReadMem, IntrArgMemOnly]>;

class AdvSIMD_1Vec_PredStore_Intrinsic		class AdvSIMD_1Vec_PredStore_Intrinsic
: Intrinsic<[],		: Intrinsic<[],
[llvm_anyvector_ty,		[llvm_anyvector_ty,
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,		LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
LLVMPointerTo<0>],		LLVMPointerTo<0>],
[IntrArgMemOnly, NoCapture<2>]>;		[IntrArgMemOnly, NoCapture<2>]>;

class AdvSIMD_Merged1VectorArg_Intrinsic		class AdvSIMD_Merged1VectorArg_Intrinsic
[325 lines not shown]	: Intrinsic<[llvm_anyvector_ty],
[IntrNoMem, ImmArg<1>]>;		[IntrNoMem, ImmArg<1>]>;

//		//
// Loads		// Loads
//		//

def int_aarch64_sve_ldnt1 : AdvSIMD_1Vec_PredLoad_Intrinsic;		def int_aarch64_sve_ldnt1 : AdvSIMD_1Vec_PredLoad_Intrinsic;

		def int_aarch64_sve_ldnf1 : AdvSIMD_1Vec_PredFaultingLoad_Intrinsic;

//		//
// Stores		// Stores
//		//

def int_aarch64_sve_stnt1 : AdvSIMD_1Vec_PredStore_Intrinsic;		def int_aarch64_sve_stnt1 : AdvSIMD_1Vec_PredStore_Intrinsic;

//		//
// Integer arithmetic		// Integer arithmetic
[417 lines not shown]

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[200 lines not shown]	enum NodeType : unsigned {

SUNPKHI,		SUNPKHI,
SUNPKLO,		SUNPKLO,
UUNPKHI,		UUNPKHI,
UUNPKLO,		UUNPKLO,

INSR,		INSR,

		LDNF1,
		LDNF1S,

// Unsigned gather loads.		// Unsigned gather loads.
GLD1,		GLD1,
GLD1_SCALED,		GLD1_SCALED,
GLD1_UXTW,		GLD1_UXTW,
GLD1_SXTW,		GLD1_SXTW,
GLD1_UXTW_SCALED,		GLD1_UXTW_SCALED,
GLD1_SXTW_SCALED,		GLD1_SXTW_SCALED,
GLD1_IMM,		GLD1_IMM,
[596 lines not shown]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[1,341 lines not shown]	const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
case AArch64ISD::STZG: return "AArch64ISD::STZG";		case AArch64ISD::STZG: return "AArch64ISD::STZG";
case AArch64ISD::ST2G: return "AArch64ISD::ST2G";		case AArch64ISD::ST2G: return "AArch64ISD::ST2G";
case AArch64ISD::STZ2G: return "AArch64ISD::STZ2G";		case AArch64ISD::STZ2G: return "AArch64ISD::STZ2G";
case AArch64ISD::SUNPKHI: return "AArch64ISD::SUNPKHI";		case AArch64ISD::SUNPKHI: return "AArch64ISD::SUNPKHI";
case AArch64ISD::SUNPKLO: return "AArch64ISD::SUNPKLO";		case AArch64ISD::SUNPKLO: return "AArch64ISD::SUNPKLO";
case AArch64ISD::UUNPKHI: return "AArch64ISD::UUNPKHI";		case AArch64ISD::UUNPKHI: return "AArch64ISD::UUNPKHI";
case AArch64ISD::UUNPKLO: return "AArch64ISD::UUNPKLO";		case AArch64ISD::UUNPKLO: return "AArch64ISD::UUNPKLO";
case AArch64ISD::INSR: return "AArch64ISD::INSR";		case AArch64ISD::INSR: return "AArch64ISD::INSR";
		case AArch64ISD::LDNF1: return "AArch64ISD::LDNF1";
		case AArch64ISD::LDNF1S: return "AArch64ISD::LDNF1S";
case AArch64ISD::GLD1: return "AArch64ISD::GLD1";		case AArch64ISD::GLD1: return "AArch64ISD::GLD1";
case AArch64ISD::GLD1_SCALED: return "AArch64ISD::GLD1_SCALED";		case AArch64ISD::GLD1_SCALED: return "AArch64ISD::GLD1_SCALED";
case AArch64ISD::GLD1_SXTW: return "AArch64ISD::GLD1_SXTW";		case AArch64ISD::GLD1_SXTW: return "AArch64ISD::GLD1_SXTW";
case AArch64ISD::GLD1_UXTW: return "AArch64ISD::GLD1_UXTW";		case AArch64ISD::GLD1_UXTW: return "AArch64ISD::GLD1_UXTW";
case AArch64ISD::GLD1_SXTW_SCALED: return "AArch64ISD::GLD1_SXTW_SCALED";		case AArch64ISD::GLD1_SXTW_SCALED: return "AArch64ISD::GLD1_SXTW_SCALED";
case AArch64ISD::GLD1_UXTW_SCALED: return "AArch64ISD::GLD1_UXTW_SCALED";		case AArch64ISD::GLD1_UXTW_SCALED: return "AArch64ISD::GLD1_UXTW_SCALED";
case AArch64ISD::GLD1_IMM: return "AArch64ISD::GLD1_IMM";		case AArch64ISD::GLD1_IMM: return "AArch64ISD::GLD1_IMM";
case AArch64ISD::GLD1S: return "AArch64ISD::GLD1S";		case AArch64ISD::GLD1S: return "AArch64ISD::GLD1S";
[8,628 lines not shown]	if (DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();

SDValue Src = N->getOperand(0);		SDValue Src = N->getOperand(0);
SDValue Mask = N->getOperand(1);		SDValue Mask = N->getOperand(1);

if (!Src.hasOneUse())		if (!Src.hasOneUse())
return SDValue();		return SDValue();

		EVT MemVT;

// GLD1* instructions perform an implicit zero-extend, which makes them		// GLD1* instructions perform an implicit zero-extend, which makes them
// perfect candidates for combining.		// perfect candidates for combining.
switch (Src->getOpcode()) {		switch (Src->getOpcode()) {
		case AArch64ISD::LDNF1:
		MemVT = cast<VTSDNode>(Src->getOperand(3))->getVT();
		break;
case AArch64ISD::GLD1:		case AArch64ISD::GLD1:
case AArch64ISD::GLD1_SCALED:		case AArch64ISD::GLD1_SCALED:
case AArch64ISD::GLD1_SXTW:		case AArch64ISD::GLD1_SXTW:
case AArch64ISD::GLD1_SXTW_SCALED:		case AArch64ISD::GLD1_SXTW_SCALED:
case AArch64ISD::GLD1_UXTW:		case AArch64ISD::GLD1_UXTW:
case AArch64ISD::GLD1_UXTW_SCALED:		case AArch64ISD::GLD1_UXTW_SCALED:
case AArch64ISD::GLD1_IMM:		case AArch64ISD::GLD1_IMM:
		MemVT = cast<VTSDNode>(Src->getOperand(4))->getVT();
break;		break;
default:		default:
return SDValue();		return SDValue();
}		}

EVT MemVT = cast<VTSDNode>(Src->getOperand(4))->getVT();

if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))		if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))
return Src;		return Src;

return SDValue();		return SDValue();
}		}

static SDValue performANDCombine(SDNode *N,		static SDValue performANDCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI) {
[1,009 lines not shown]	static SDValue performSTNT1Combine(SDNode *N, SelectionDAG &DAG) {

auto *MINode = cast<MemIntrinsicSDNode>(N);		auto *MINode = cast<MemIntrinsicSDNode>(N);
return DAG.getMaskedStore(MINode->getChain(), DL, Data, MINode->getOperand(4),		return DAG.getMaskedStore(MINode->getChain(), DL, Data, MINode->getOperand(4),
DAG.getUNDEF(PtrTy), MINode->getOperand(3),		DAG.getUNDEF(PtrTy), MINode->getOperand(3),
MINode->getMemoryVT(), MINode->getMemOperand(),		MINode->getMemoryVT(), MINode->getMemOperand(),
ISD::UNINDEXED, false, false);		ISD::UNINDEXED, false, false);
}		}

		static SDValue performLDNF1Combine(SDNode *N, SelectionDAG &DAG) {
		SDLoc DL(N);
		EVT VT = N->getValueType(0);

		if (VT.getSizeInBits().getKnownMinSize() > AArch64::SVEBitsPerBlock)
		return SDValue();

		EVT ContainerVT = VT;
		if (ContainerVT.isInteger()) {
		switch (VT.getVectorNumElements()) {
		default: return SDValue();
		case 16: ContainerVT = MVT::nxv16i8; break;
		case 8: ContainerVT = MVT::nxv8i16; break;
		case 4: ContainerVT = MVT::nxv4i32; break;
		case 2: ContainerVT = MVT::nxv2i64; break;
		}
		}

		SDVTList VTs = DAG.getVTList(ContainerVT, MVT::Other);
		SDValue Ops[] = { N->getOperand(0), // Chain
		N->getOperand(2), // Pg
		N->getOperand(3), // Base
		DAG.getValueType(VT) };

		SDValue Load = DAG.getNode(AArch64ISD::LDNF1, DL, VTs, Ops);
		SDValue LoadChain = SDValue(Load.getNode(), 1);

		if (ContainerVT.isInteger() && (VT != ContainerVT))
		Load = DAG.getNode(ISD::TRUNCATE, DL, VT, Load.getValue(0));

		return DAG.getMergeValues({ Load, LoadChain }, DL);
		}

/// Replace a splat of zeros to a vector store by scalar stores of WZR/XZR. The		/// Replace a splat of zeros to a vector store by scalar stores of WZR/XZR. The
/// load store optimizer pass will merge them to store pair stores. This should		/// load store optimizer pass will merge them to store pair stores. This should
/// be better than a movi to create the vector zero followed by a vector store		/// be better than a movi to create the vector zero followed by a vector store
/// if the zero constant is not re-used, since one instructions and one register		/// if the zero constant is not re-used, since one instructions and one register
/// live range will be removed.		/// live range will be removed.
///		///
/// For example, the final generated code should be:		/// For example, the final generated code should be:
///		///
[1,193 lines not shown]
performSignExtendInRegCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,		performSignExtendInRegCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
if (DCI.isBeforeLegalizeOps())		if (DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();

SDValue Src = N->getOperand(0);		SDValue Src = N->getOperand(0);
unsigned Opc = Src->getOpcode();		unsigned Opc = Src->getOpcode();

// Gather load nodes (e.g. AArch64ISD::GLD1) are straightforward candidates		// Gather load nodes (e.g. AArch64ISD::GLD1) are straightforward candidates
// for DAG Combine with SIGN_EXTEND_INREG. Bail out for all other nodes.		// for DAG Combine with SIGN_EXTEND_INREG. Bail out for all other nodes.
unsigned NewOpc;		unsigned NewOpc;
switch (Opc) {		switch (Opc) {
		case AArch64ISD::LDNF1:
		NewOpc = AArch64ISD::LDNF1S;
		break;
case AArch64ISD::GLD1:		case AArch64ISD::GLD1:
NewOpc = AArch64ISD::GLD1S;		NewOpc = AArch64ISD::GLD1S;
break;		break;
case AArch64ISD::GLD1_SCALED:		case AArch64ISD::GLD1_SCALED:
NewOpc = AArch64ISD::GLD1S_SCALED;		NewOpc = AArch64ISD::GLD1S_SCALED;
break;		break;
case AArch64ISD::GLD1_SXTW:		case AArch64ISD::GLD1_SXTW:
NewOpc = AArch64ISD::GLD1S_SXTW;		NewOpc = AArch64ISD::GLD1S_SXTW;
[10 lines not shown]	performSignExtendInRegCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
case AArch64ISD::GLD1_IMM:		case AArch64ISD::GLD1_IMM:
NewOpc = AArch64ISD::GLD1S_IMM;		NewOpc = AArch64ISD::GLD1S_IMM;
break;		break;
default:		default:
return SDValue();		return SDValue();
}		}

EVT SignExtSrcVT = cast<VTSDNode>(N->getOperand(1))->getVT();		EVT SignExtSrcVT = cast<VTSDNode>(N->getOperand(1))->getVT();
EVT GLD1SrcMemVT = cast<VTSDNode>(Src->getOperand(4))->getVT();
		unsigned OpNum = NewOpc == AArch64ISD::LDNF1S ? 3 : 4;
		EVT LD1SrcMemVT = cast<VTSDNode>(Src->getOperand(OpNum))->getVT();

if ((SignExtSrcVT != GLD1SrcMemVT) || !Src.hasOneUse())		if ((SignExtSrcVT != LD1SrcMemVT) || !Src.hasOneUse())
return SDValue();		return SDValue();

EVT DstVT = N->getValueType(0);		EVT DstVT = N->getValueType(0);
SDVTList VTs = DAG.getVTList(DstVT, MVT::Other);		SDVTList VTs = DAG.getVTList(DstVT, MVT::Other);
SDValue Ops[] = {Src->getOperand(0), Src->getOperand(1), Src->getOperand(2),
Src->getOperand(3), Src->getOperand(4)};
		SmallVector<SDValue, 4> Ops;
		Ops.push_back(Src->getOperand(0));
		Ops.push_back(Src->getOperand(1));
		Ops.push_back(Src->getOperand(2));
		Ops.push_back(Src->getOperand(3));
		if (NewOpc != AArch64ISD::LDNF1S)
		Ops.push_back(Src->getOperand(4));

SDValue ExtLoad = DAG.getNode(NewOpc, SDLoc(N), VTs, Ops);		SDValue ExtLoad = DAG.getNode(NewOpc, SDLoc(N), VTs, Ops);
DCI.CombineTo(N, ExtLoad);		DCI.CombineTo(N, ExtLoad);
DCI.CombineTo(Src.getNode(), ExtLoad, ExtLoad.getValue(1));		DCI.CombineTo(Src.getNode(), ExtLoad, ExtLoad.getValue(1));

// Return N so it doesn't get rechecked		// Return N so it doesn't get rechecked
return SDValue(N, 0);		return SDValue(N, 0);
}		}
[83 lines not shown]	case ISD::INTRINSIC_W_CHAIN:
case Intrinsic::aarch64_neon_st1x3:		case Intrinsic::aarch64_neon_st1x3:
case Intrinsic::aarch64_neon_st1x4:		case Intrinsic::aarch64_neon_st1x4:
case Intrinsic::aarch64_neon_st2lane:		case Intrinsic::aarch64_neon_st2lane:
case Intrinsic::aarch64_neon_st3lane:		case Intrinsic::aarch64_neon_st3lane:
case Intrinsic::aarch64_neon_st4lane:		case Intrinsic::aarch64_neon_st4lane:
return performNEONPostLDSTCombine(N, DCI, DAG);		return performNEONPostLDSTCombine(N, DCI, DAG);
case Intrinsic::aarch64_sve_ldnt1:		case Intrinsic::aarch64_sve_ldnt1:
return performLDNT1Combine(N, DAG);		return performLDNT1Combine(N, DAG);
		case Intrinsic::aarch64_sve_ldnf1:
		return performLDNF1Combine(N, DAG);
case Intrinsic::aarch64_sve_stnt1:		case Intrinsic::aarch64_sve_stnt1:
return performSTNT1Combine(N, DAG);		return performSTNT1Combine(N, DAG);
case Intrinsic::aarch64_sve_ld1_gather:		case Intrinsic::aarch64_sve_ld1_gather:
return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1);		return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1);
case Intrinsic::aarch64_sve_ld1_gather_index:		case Intrinsic::aarch64_sve_ld1_gather_index:
return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SCALED);		return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SCALED);
case Intrinsic::aarch64_sve_ld1_gather_sxtw:		case Intrinsic::aarch64_sve_ld1_gather_sxtw:
return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SXTW);		return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SXTW);
[700 lines not shown]

llvm/lib/Target/AArch64/AArch64InstrInfo.td

	[535 lines not shown]
	def AArch64sunpkhi : SDNode<"AArch64ISD::SUNPKHI", SDT_AArch64unpk>;			def AArch64sunpkhi : SDNode<"AArch64ISD::SUNPKHI", SDT_AArch64unpk>;
	def AArch64sunpklo : SDNode<"AArch64ISD::SUNPKLO", SDT_AArch64unpk>;			def AArch64sunpklo : SDNode<"AArch64ISD::SUNPKLO", SDT_AArch64unpk>;
	def AArch64uunpkhi : SDNode<"AArch64ISD::UUNPKHI", SDT_AArch64unpk>;			def AArch64uunpkhi : SDNode<"AArch64ISD::UUNPKHI", SDT_AArch64unpk>;
	def AArch64uunpklo : SDNode<"AArch64ISD::UUNPKLO", SDT_AArch64unpk>;			def AArch64uunpklo : SDNode<"AArch64ISD::UUNPKLO", SDT_AArch64unpk>;

	def AArch64ldp : SDNode<"AArch64ISD::LDP", SDT_AArch64ldp, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;			def AArch64ldp : SDNode<"AArch64ISD::LDP", SDT_AArch64ldp, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
	def AArch64stp : SDNode<"AArch64ISD::STP", SDT_AArch64stp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;			def AArch64stp : SDNode<"AArch64ISD::STP", SDT_AArch64stp, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;

				def SDT_AArch64_LDNF1 : SDTypeProfile<1, 3, [
				SDTCisVec<0>, SDTCisVec<1>, SDTCisPtrTy<2>,
				SDTCVecEltisVT<1,i1>, SDTCisSameNumEltsAs<0,1>
				]>;

				def AArch64ldnf1 : SDNode<"AArch64ISD::LDNF1", SDT_AArch64_LDNF1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	// AArch64 Instruction Predicate Definitions.			// AArch64 Instruction Predicate Definitions.
	// We could compute these on a per-module basis but doing so requires accessing			// We could compute these on a per-module basis but doing so requires accessing
	// the Function object through the <Target>Subtarget and objections were raised			// the Function object through the <Target>Subtarget and objections were raised
	// to that (see post-commit review comments for r301750).			// to that (see post-commit review comments for r301750).
	[6,738 lines not shown]

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

	[40 lines not shown]
	def AArch64ld1_gather : SDNode<"AArch64ISD::GLD1", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1_gather : SDNode<"AArch64ISD::GLD1", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1_gather_scaled : SDNode<"AArch64ISD::GLD1_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1_gather_scaled : SDNode<"AArch64ISD::GLD1_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1_gather_uxtw : SDNode<"AArch64ISD::GLD1_UXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1_gather_uxtw : SDNode<"AArch64ISD::GLD1_UXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1_gather_sxtw : SDNode<"AArch64ISD::GLD1_SXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1_gather_sxtw : SDNode<"AArch64ISD::GLD1_SXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1_gather_uxtw_scaled : SDNode<"AArch64ISD::GLD1_UXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1_gather_uxtw_scaled : SDNode<"AArch64ISD::GLD1_UXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1_gather_sxtw_scaled : SDNode<"AArch64ISD::GLD1_SXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1_gather_sxtw_scaled : SDNode<"AArch64ISD::GLD1_SXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1_gather_imm : SDNode<"AArch64ISD::GLD1_IMM", SDT_AArch64_GLD1_IMM, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1_gather_imm : SDNode<"AArch64ISD::GLD1_IMM", SDT_AArch64_GLD1_IMM, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;

				def AArch64ldnf1s : SDNode<"AArch64ISD::LDNF1S", SDT_AArch64_LDNF1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather : SDNode<"AArch64ISD::GLD1S", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather : SDNode<"AArch64ISD::GLD1S", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_scaled : SDNode<"AArch64ISD::GLD1S_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_scaled : SDNode<"AArch64ISD::GLD1S_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_uxtw : SDNode<"AArch64ISD::GLD1S_UXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_uxtw : SDNode<"AArch64ISD::GLD1S_UXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_sxtw : SDNode<"AArch64ISD::GLD1S_SXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_sxtw : SDNode<"AArch64ISD::GLD1S_SXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_uxtw_scaled : SDNode<"AArch64ISD::GLD1S_UXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_uxtw_scaled : SDNode<"AArch64ISD::GLD1S_UXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_sxtw_scaled : SDNode<"AArch64ISD::GLD1S_SXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_sxtw_scaled : SDNode<"AArch64ISD::GLD1S_SXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
	def AArch64ld1s_gather_imm : SDNode<"AArch64ISD::GLD1S_IMM", SDT_AArch64_GLD1_IMM, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;			def AArch64ld1s_gather_imm : SDNode<"AArch64ISD::GLD1S_IMM", SDT_AArch64_GLD1_IMM, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;

	[1,148 lines not shown]
	defm : pred_load<nxv8i16, nxv8i1, non_temporal_load, LDNT1H_ZRI>;			defm : pred_load<nxv8i16, nxv8i1, non_temporal_load, LDNT1H_ZRI>;
	defm : pred_load<nxv4i32, nxv4i1, non_temporal_load, LDNT1W_ZRI>;			defm : pred_load<nxv4i32, nxv4i1, non_temporal_load, LDNT1W_ZRI>;
	defm : pred_load<nxv2i64, nxv2i1, non_temporal_load, LDNT1D_ZRI>;			defm : pred_load<nxv2i64, nxv2i1, non_temporal_load, LDNT1D_ZRI>;

	defm : pred_store<nxv16i8, nxv16i1, non_temporal_store, STNT1B_ZRI>;			defm : pred_store<nxv16i8, nxv16i1, non_temporal_store, STNT1B_ZRI>;
	defm : pred_store<nxv8i16, nxv8i1, non_temporal_store, STNT1H_ZRI>;			defm : pred_store<nxv8i16, nxv8i1, non_temporal_store, STNT1H_ZRI>;
	defm : pred_store<nxv4i32, nxv4i1, non_temporal_store, STNT1W_ZRI>;			defm : pred_store<nxv4i32, nxv4i1, non_temporal_store, STNT1W_ZRI>;
	defm : pred_store<nxv2i64, nxv2i1, non_temporal_store, STNT1D_ZRI>;			defm : pred_store<nxv2i64, nxv2i1, non_temporal_store, STNT1D_ZRI>;

				multiclass ldnf1<Instruction I, ValueType Ty, SDPatternOperator Load, ValueType PredTy, ValueType MemVT> {
				// base
				def : Pat<(Ty (Load (PredTy PPR:$gp), GPR64:$base, MemVT)),
				(I PPR:$gp, GPR64sp:$base, (i64 0))>;
				}

				// 2-element contiguous non-faulting loads
				defm : ldnf1<LDNF1B_D_IMM, nxv2i64, AArch64ldnf1, nxv2i1, nxv2i8>;
				defm : ldnf1<LDNF1SB_D_IMM, nxv2i64, AArch64ldnf1s, nxv2i1, nxv2i8>;
				defm : ldnf1<LDNF1H_D_IMM, nxv2i64, AArch64ldnf1, nxv2i1, nxv2i16>;
				defm : ldnf1<LDNF1SH_D_IMM, nxv2i64, AArch64ldnf1s, nxv2i1, nxv2i16>;
				defm : ldnf1<LDNF1W_D_IMM, nxv2i64, AArch64ldnf1, nxv2i1, nxv2i32>;
				defm : ldnf1<LDNF1SW_D_IMM, nxv2i64, AArch64ldnf1s, nxv2i1, nxv2i32>;
				defm : ldnf1<LDNF1D_IMM, nxv2i64, AArch64ldnf1, nxv2i1, nxv2i64>;
				defm : ldnf1<LDNF1D_IMM, nxv2f64, AArch64ldnf1, nxv2i1, nxv2f64>;

				// 4-element contiguous non-faulting loads
				defm : ldnf1<LDNF1B_S_IMM, nxv4i32, AArch64ldnf1, nxv4i1, nxv4i8>;
				defm : ldnf1<LDNF1SB_S_IMM, nxv4i32, AArch64ldnf1s, nxv4i1, nxv4i8>;
				defm : ldnf1<LDNF1H_S_IMM, nxv4i32, AArch64ldnf1, nxv4i1, nxv4i16>;
				defm : ldnf1<LDNF1SH_S_IMM, nxv4i32, AArch64ldnf1s, nxv4i1, nxv4i16>;
				defm : ldnf1<LDNF1W_IMM, nxv4i32, AArch64ldnf1, nxv4i1, nxv4i32>;
				defm : ldnf1<LDNF1W_IMM, nxv4f32, AArch64ldnf1, nxv4i1, nxv4f32>;

				// 8-element contiguous non-faulting loads
				defm : ldnf1<LDNF1B_H_IMM, nxv8i16, AArch64ldnf1, nxv8i1, nxv8i8>;
				defm : ldnf1<LDNF1SB_H_IMM, nxv8i16, AArch64ldnf1s, nxv8i1, nxv8i8>;
				defm : ldnf1<LDNF1H_IMM, nxv8i16, AArch64ldnf1, nxv8i1, nxv8i16>;
				defm : ldnf1<LDNF1H_IMM, nxv8f16, AArch64ldnf1, nxv8i1, nxv8f16>;

				// 16-element contiguous non-faulting loads
				defm : ldnf1<LDNF1B_IMM, nxv16i8, AArch64ldnf1, nxv16i1, nxv16i8>;

	}			}

	let Predicates = [HasSVE2] in {			let Predicates = [HasSVE2] in {
	// SVE2 integer multiply-add (indexed)			// SVE2 integer multiply-add (indexed)
	defm MLA_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b0, "mla">;			defm MLA_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b0, "mla">;
	defm MLS_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b1, "mls">;			defm MLS_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b1, "mls">;

	// SVE2 saturating multiply-add high (indexed)			// SVE2 saturating multiply-add high (indexed)
	[400 lines not shown]

llvm/lib/Target/AArch64/SVEInstrFormats.td

[5,314 lines not shown]	: I<(outs VecList:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4),

let mayLoad = 1;		let mayLoad = 1;
let Uses = !if(!eq(nf, 1), [FFR], []);		let Uses = !if(!eq(nf, 1), [FFR], []);
let Defs = !if(!eq(nf, 1), [FFR], []);		let Defs = !if(!eq(nf, 1), [FFR], []);
}		}

multiclass sve_mem_cld_si_base<bits<4> dtype, bit nf, string asm,		multiclass sve_mem_cld_si_base<bits<4> dtype, bit nf, string asm,
RegisterOperand listty, ZPRRegOp zprty> {		RegisterOperand listty, ZPRRegOp zprty> {
def "" : sve_mem_cld_si_base<dtype, nf, asm, listty>;		def _REAL : sve_mem_cld_si_base<dtype, nf, asm, listty>;

def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",		def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",
(!cast<Instruction>(NAME) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 0>;		(!cast<Instruction>(NAME # _REAL) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 0>;
def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $imm4, mul vl]",		def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $imm4, mul vl]",
(!cast<Instruction>(NAME) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4), 0>;		(!cast<Instruction>(NAME # _REAL) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4), 0>;
def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",		def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",
(!cast<Instruction>(NAME) listty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 1>;		(!cast<Instruction>(NAME # _REAL) listty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 1>;

		// We need a layer of indirection because early machine code passes balk at
		// physical register (i.e. FFR) uses that have no previous definition.
		let hasSideEffects = 1, hasNoSchedulingInfo = 1, mayLoad = 1 in {
		def "" : Pseudo<(outs listty:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4), []>,
		PseudoInstExpansion<(!cast<Instruction>(NAME # _REAL) listty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4)>;
		}
}		}
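The inline thread above turns on whether FFR reads and writes stay ordered relative to the non-faulting load. As a rough illustration of why that ordering matters, here is a small portable C++ model of the save / set / load / read / restore sequence. It is only a sketch under stated assumptions: `FfrModel`, `load_and_report`, the fixed lane count and the `readable` parameter are all hypothetical names invented for this example, and lanes that were not loaded are modelled as zero for simplicity. It is neither the patch's implementation nor the ACLE itself; the member functions merely mirror what `svsetffr`, `svrdffr` and `svwrffr` are described as doing in the thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical software model; none of these names exist in LLVM or the ACLE.
constexpr std::size_t kLanes = 4;
using Pred = std::array<bool, kLanes>;
using Vec = std::array<std::int32_t, kLanes>;

struct FfrModel {
  Pred ffr{};  // models the first-fault register, one bit per lane

  void setffr() { ffr.fill(true); }       // mirrors svsetffr
  Pred rdffr() const { return ffr; }      // mirrors svrdffr
  void wrffr(const Pred &p) { ffr = p; }  // mirrors svwrffr

  // Models a non-faulting load. `readable` is the number of elements at
  // `base` that can be read without faulting (an assumption of this model).
  Vec ldnf1(const Pred &pg, const std::int32_t *base, std::size_t readable) {
    Vec out{};  // inactive and unloaded lanes are modelled as zero
    for (std::size_t i = 0; i < kLanes; ++i) {
      if (!pg[i])
        continue;  // inactive lane: no memory access
      if (i >= readable) {
        // This access would fault: clear the FFR from this lane onwards
        // and stop loading instead of raising the fault.
        for (std::size_t j = i; j < kLanes; ++j)
          ffr[j] = false;
        break;
      }
      out[i] = base[i];
    }
    return out;
  }
};

// Saves the FFR, runs one non-faulting load, then restores the FFR.
// Returns the predicate of lanes that were loaded successfully.
Pred load_and_report(FfrModel &m, const Pred &pg, const std::int32_t *base,
                     std::size_t readable, Vec &out) {
  Pred saved = m.rdffr();
  m.setffr();
  out = m.ldnf1(pg, base, readable);
  Pred ok = m.rdffr();  // must run before anything else touches the FFR
  m.wrffr(saved);
  return ok;
}
```

In this model, any operation that clobbers `ffr` between `ldnf1` and the following `rdffr` silently changes `ok`, which is the hazard raised in the thread for calls moved at the IR/SelectionDAG level.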

multiclass sve_mem_cld_si<bits<4> dtype, string asm, RegisterOperand listty,		multiclass sve_mem_cld_si<bits<4> dtype, string asm, RegisterOperand listty,
ZPRRegOp zprty>		ZPRRegOp zprty>
: sve_mem_cld_si_base<dtype, 0, asm, listty, zprty>;		: sve_mem_cld_si_base<dtype, 0, asm, listty, zprty>;

class sve_mem_cldnt_si_base<bits<2> msz, string asm, RegisterOperand VecList>		class sve_mem_cldnt_si_base<bits<2> msz, string asm, RegisterOperand VecList>
: I<(outs VecList:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4),		: I<(outs VecList:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4),

llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

				define <vscale x 16 x i8> @ldnf1b(<vscale x 16 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b:
				; CHECK: ldnf1b { z0.b }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 16 x i8> @llvm.aarch64.sve.ldnf1.nxv16i8(<vscale x 16 x i1> %pg, i8* %a)
				ret <vscale x 16 x i8> %load
				}

				define <vscale x 8 x i16> @ldnf1b_h(<vscale x 8 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b_h:
				; CHECK: ldnf1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x i8> @llvm.aarch64.sve.ldnf1.nxv8i8(<vscale x 8 x i1> %pg, i8* %a)
				%res = zext <vscale x 8 x i8> %load to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 8 x i16> @ldnf1sb_h(<vscale x 8 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1sb_h:
				; CHECK: ldnf1sb { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x i8> @llvm.aarch64.sve.ldnf1.nxv8i8(<vscale x 8 x i1> %pg, i8* %a)
				%res = sext <vscale x 8 x i8> %load to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 8 x i16> @ldnf1h(<vscale x 8 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1h:
				; CHECK: ldnf1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x i16> @llvm.aarch64.sve.ldnf1.nxv8i16(<vscale x 8 x i1> %pg, i16* %a)
				ret <vscale x 8 x i16> %load
				}

				define <vscale x 8 x half> @ldnf1h_f16(<vscale x 8 x i1> %pg, half* %a) {
				; CHECK-LABEL: ldnf1h_f16:
				; CHECK: ldnf1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x half> @llvm.aarch64.sve.ldnf1.nxv8f16(<vscale x 8 x i1> %pg, half* %a)
				ret <vscale x 8 x half> %load
				}

				define <vscale x 4 x i32> @ldnf1b_s(<vscale x 4 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b_s:
				; CHECK: ldnf1b { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i8> @llvm.aarch64.sve.ldnf1.nxv4i8(<vscale x 4 x i1> %pg, i8* %a)
				%res = zext <vscale x 4 x i8> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1sb_s(<vscale x 4 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1sb_s:
				; CHECK: ldnf1sb { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i8> @llvm.aarch64.sve.ldnf1.nxv4i8(<vscale x 4 x i1> %pg, i8* %a)
				%res = sext <vscale x 4 x i8> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1h_s(<vscale x 4 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1h_s:
				; CHECK: ldnf1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i16> @llvm.aarch64.sve.ldnf1.nxv4i16(<vscale x 4 x i1> %pg, i16* %a)
				%res = zext <vscale x 4 x i16> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1sh_s(<vscale x 4 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1sh_s:
				; CHECK: ldnf1sh { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i16> @llvm.aarch64.sve.ldnf1.nxv4i16(<vscale x 4 x i1> %pg, i16* %a)
				%res = sext <vscale x 4 x i16> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1w(<vscale x 4 x i1> %pg, i32* %a) {
				; CHECK-LABEL: ldnf1w:
				; CHECK: ldnf1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i32> @llvm.aarch64.sve.ldnf1.nxv4i32(<vscale x 4 x i1> %pg, i32* %a)
				ret <vscale x 4 x i32> %load
				}

				define <vscale x 4 x float> @ldnf1w_f32(<vscale x 4 x i1> %pg, float* %a) {
				; CHECK-LABEL: ldnf1w_f32:
				; CHECK: ldnf1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x float> @llvm.aarch64.sve.ldnf1.nxv4f32(<vscale x 4 x i1> %pg, float* %a)
				ret <vscale x 4 x float> %load
				}

				define <vscale x 2 x i64> @ldnf1b_d(<vscale x 2 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b_d:
				; CHECK: ldnf1b { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i8> @llvm.aarch64.sve.ldnf1.nxv2i8(<vscale x 2 x i1> %pg, i8* %a)
				%res = zext <vscale x 2 x i8> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1sb_d(<vscale x 2 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1sb_d:
				; CHECK: ldnf1sb { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i8> @llvm.aarch64.sve.ldnf1.nxv2i8(<vscale x 2 x i1> %pg, i8* %a)
				%res = sext <vscale x 2 x i8> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1h_d(<vscale x 2 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1h_d:
				; CHECK: ldnf1h { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i16> @llvm.aarch64.sve.ldnf1.nxv2i16(<vscale x 2 x i1> %pg, i16* %a)
				%res = zext <vscale x 2 x i16> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1sh_d(<vscale x 2 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1sh_d:
				; CHECK: ldnf1sh { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i16> @llvm.aarch64.sve.ldnf1.nxv2i16(<vscale x 2 x i1> %pg, i16* %a)
				%res = sext <vscale x 2 x i16> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1w_d(<vscale x 2 x i1> %pg, i32* %a) {
				; CHECK-LABEL: ldnf1w_d:
				; CHECK: ldnf1w { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i32> @llvm.aarch64.sve.ldnf1.nxv2i32(<vscale x 2 x i1> %pg, i32* %a)
				%res = zext <vscale x 2 x i32> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1sw_d(<vscale x 2 x i1> %pg, i32* %a) {
				; CHECK-LABEL: ldnf1sw_d:
				; CHECK: ldnf1sw { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i32> @llvm.aarch64.sve.ldnf1.nxv2i32(<vscale x 2 x i1> %pg, i32* %a)
				%res = sext <vscale x 2 x i32> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1d(<vscale x 2 x i1> %pg, i64* %a) {
				; CHECK-LABEL: ldnf1d:
				; CHECK: ldnf1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i64> @llvm.aarch64.sve.ldnf1.nxv2i64(<vscale x 2 x i1> %pg, i64* %a)
				ret <vscale x 2 x i64> %load
				}

				define <vscale x 2 x double> @ldnf1d_f64(<vscale x 2 x i1> %pg, double* %a) {
				; CHECK-LABEL: ldnf1d_f64:
				; CHECK: ldnf1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x double> @llvm.aarch64.sve.ldnf1.nxv2f64(<vscale x 2 x i1> %pg, double* %a)
				ret <vscale x 2 x double> %load
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.ldnf1.nxv16i8(<vscale x 16 x i1>, i8*)

				declare <vscale x 8 x i8> @llvm.aarch64.sve.ldnf1.nxv8i8(<vscale x 8 x i1>, i8*)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.ldnf1.nxv8i16(<vscale x 8 x i1>, i16*)
				declare <vscale x 8 x half> @llvm.aarch64.sve.ldnf1.nxv8f16(<vscale x 8 x i1>, half*)

				declare <vscale x 4 x i8> @llvm.aarch64.sve.ldnf1.nxv4i8(<vscale x 4 x i1>, i8*)
				declare <vscale x 4 x i16> @llvm.aarch64.sve.ldnf1.nxv4i16(<vscale x 4 x i1>, i16*)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.ldnf1.nxv4i32(<vscale x 4 x i1>, i32*)
				declare <vscale x 4 x float> @llvm.aarch64.sve.ldnf1.nxv4f32(<vscale x 4 x i1>, float*)

				declare <vscale x 2 x i8> @llvm.aarch64.sve.ldnf1.nxv2i8(<vscale x 2 x i1>, i8*)
				declare <vscale x 2 x i16> @llvm.aarch64.sve.ldnf1.nxv2i16(<vscale x 2 x i1>, i16*)
				declare <vscale x 2 x i32> @llvm.aarch64.sve.ldnf1.nxv2i32(<vscale x 2 x i1>, i32*)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.ldnf1.nxv2i64(<vscale x 2 x i1>, i64*)
				declare <vscale x 2 x double> @llvm.aarch64.sve.ldnf1.nxv2f64(<vscale x 2 x i1>, double*)


[AArch64][SVE] Add intrinsic for non-faulting loads (Closed, Public)

Details

Diff Detail

Event Timeline