This is an archive of the discontinued LLVM Phabricator instance.


Differential D71556

[AArch64][SVE] Implement intrinsic for non-faulting loads
Changes Planned · Public

Authored by kmclaughlin on Dec 16 2019, 9:15 AM.


Details

Reviewers

sdesmalen
paulwalker-arm
efriedma
dancgr
mgudim
rengolin

Summary

Adds the llvm.aarch64.sve.ldnf1 intrinsic, together with a new
MachineMemOperand flag (MONonFaulting).

Diff Detail

Event Timeline

kmclaughlin created this revision. Dec 16 2019, 9:15 AM

Herald added a reviewer: rengolin. Dec 16 2019, 9:15 AM

Herald added a project: Restricted Project.

Herald added subscribers: psnobl, rkruppe, hiraditya and 2 others.

efriedma added a comment.

I'm not sure it's legal to transform a non-faulting load to a load with a non-faulting flag? At least, we'd need to consider the implications of that very carefully. In particular, I'm concerned about the interaction with intrinsics that read/write FFR. I mean, you could specify that loads marked MONonFaulting actually write to the FFR register, but that seems confusing.

It seems simpler to preserve the intrinsic until isel, at least for now.
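
To make the FFR concern concrete, here is a minimal sketch of a predicated non-faulting load as a toy C++ model. It is not LLVM code and not a precise description of the hardware: `modelLdnf1`, `NonFaultingLoadResult`, and the vector-backed "memory" are hypothetical names chosen for illustration, and lanes at or after the first suppressed access are simply zeroed here. The point it illustrates is that the operation has a second output (the updated FFR) in addition to the loaded data, which an ordinary load flagged on its memory operand would not express.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model only: all names are hypothetical, not LLVM or Arm APIs.
struct NonFaultingLoadResult {
  std::vector<int32_t> Data; // one value per lane
  std::vector<bool> FFR;     // first-fault register after the load
};

// Memory is modelled as a vector; any index >= Memory.size() "would fault".
NonFaultingLoadResult modelLdnf1(const std::vector<int32_t> &Memory,
                                 std::size_t Base,
                                 const std::vector<bool> &Pred,
                                 std::vector<bool> FFR) {
  assert(Pred.size() == FFR.size() && "predicate and FFR lane counts differ");
  NonFaultingLoadResult R;
  R.Data.assign(Pred.size(), 0); // inactive lanes read as zero
  bool Suppressed = false;
  for (std::size_t Lane = 0; Lane < Pred.size(); ++Lane) {
    // The first active lane whose access would fault suppresses the rest.
    if (!Suppressed && Pred[Lane] && Base + Lane >= Memory.size())
      Suppressed = true;
    if (Suppressed) {
      FFR[Lane] = false; // side effect: this lane and later ones are invalid
      continue;
    }
    if (Pred[Lane])
      R.Data[Lane] = Memory[Base + Lane];
  }
  R.FFR = std::move(FFR);
  return R;
}
```

In this model, calling `modelLdnf1` on a three-element memory with `Base` of 1 and four active lanes loads two lanes and clears the last two FFR bits; no fault is raised, but the caller must read the FFR to know which lanes are valid.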

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4490

I'm not sure how this is related.

sdesmalen added inline comments. Dec 17 2019, 3:02 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td

329

This duplicates a lot of code, maybe it makes sense to combine this into a multiclass. I'm thinking of something along the lines of:

multiclass load_store_fragments<code pred> {
  def : PatFrag<(ops node:$ptr, node:$pred, node:$def),
                (masked_ld node:$ptr, undef, node:$pred, node:$def),
                pred>;

  def _i8 : PatFrag<(ops ...),
                    ((cast<SDPatternOperator>(NAME) ...),
                    [{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i8; }]>;
  def _i16 : PatFrag<(ops ...),
                    ((cast<SDPatternOperator>(NAME) ....),
                    [{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i16; }]>;
  def _i32 : PatFrag<(ops ...),
                    ((cast<SDPatternOperator>(NAME) ....),
                    [{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i32; }]>;
  def _i64 : PatFrag<(ops ...),
                    ((cast<SDPatternOperator>(NAME) ....),
                    [{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i64; }]>;
}

defm non_temporal_load : load_store_fragments<[{
   return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::NON_EXTLOAD &&
          cast<MaskedLoadSDNode>(N)->isUnindexed() &&
          cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
          !cast<MaskedLoadSDNode>(N)->isNonFaulting();
}]>;

defm sext_non_temporal_load : load_store_fragments<[{
   return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::SEXTLOAD &&
          cast<MaskedLoadSDNode>(N)->isUnindexed() &&
          cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
          !cast<MaskedLoadSDNode>(N)->isNonFaulting();
}]>;

defm zext_non_faulting_load : load_store_fragments<[{
   return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::ZEXTLOAD &&
          cast<MaskedLoadSDNode>(N)->isUnindexed() &&
          !cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
          cast<MaskedLoadSDNode>(N)->isNonFaulting();
}]>;
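
The predicates handed to each `defm` in the sketch above can be read as ordinary boolean functions over a masked load's properties. The following C++ sketch restates two of them outside TableGen; `MaskedLoadModel`, `ExtKind`, and the function names are hypothetical stand-ins, not LLVM types. It shows that the non-temporal and non-faulting fragments are written to be mutually exclusive, so at most one of them can match a given node.

```cpp
#include <cassert>

// Hypothetical stand-ins for the properties the PatFrag predicates query.
enum class ExtKind { NonExt, AnyExt, SignExt, ZeroExt };

struct MaskedLoadModel {
  ExtKind Ext;
  bool Unindexed;
  bool NonTemporal;
  bool NonFaulting;
};

// Restates the predicate passed to `defm non_temporal_load`.
bool isNonTemporalLoad(const MaskedLoadModel &L) {
  return L.Ext == ExtKind::NonExt && L.Unindexed && L.NonTemporal &&
         !L.NonFaulting;
}

// Restates the predicate passed to `defm zext_non_faulting_load`.
bool isZextNonFaultingLoad(const MaskedLoadModel &L) {
  return L.Ext == ExtKind::ZeroExt && L.Unindexed && !L.NonTemporal &&
         L.NonFaulting;
}
```

Each `_i8`, `_i16`, `_i32`, and `_i64` sub-fragment in the multiclass would then add one further check on the memory element type on top of the shared predicate.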

338

nit: we should probably add _masked_ to the name (although I realise that I forgot to spot that on the non-temporal patch)

341

Shall we support any-extend as well? (thus making asext_non_faulting_load)

In D71556#1786465, @efriedma wrote:

I'm not sure it's legal to transform a non-faulting load to a load with a non-faulting flag? At least, we'd need to consider the implications of that very carefully. In particular, I'm concerned about the interaction with intrinsics that read/write FFR. I mean, you could specify that loads marked MONonFaulting actually write to the FFR register, but that seems confusing.

It seems simpler to preserve the intrinsic until isel, at least for now.

I missed this comment earlier, but that's a valid point. For SVE having side-effects is assumed from the non-faulting flag. We hoped to latch on to the MLOAD here to reuse code and benefit from legalization in case we want to add a more generic mechanism in the future to use such loads directly in the loop-vectorizer.

Perhaps we can clarify the intent that the non-faulting mode may have side-effects by renaming the flag to something like NonFaultingWithSideEffects? Otherwise we can stick with the intrinsics as you suggest.

Perhaps we can clarify the intent that the non-faulting mode may have side-effects by renaming the flag to something like NonFaultingWithSideEffects?

We could specify something like that... but that probably requires some changes to target-independent DAGCombine, and it's sort of a weird edge case given that no other loads have that kind of side-effect. If we are going to make a change like that to MachineMemOperand, we'd probably want to propose it on llvmdev. I'd prefer a little extra code here to avoid that rabbit hole. We're busy enough already with other substantial changes to target-independent code.

Thanks for the feedback on this patch, @efriedma & @sdesmalen!
I think there is still value in adding a NonFaulting flag to MachineMemOperand so that we can benefit from legalisation, but as this is not a requirement for the ACLE I have created a new patch which implements the non-faulting load intrinsic explicitly: https://reviews.llvm.org/D71698
I will leave this patch in the 'plan changes' state so that it can be referred to in future discussions on the mailing list.

Revision Contents

Path                                                      Size
llvm/include/llvm/CodeGen/MachineMemOperand.h             6 lines
llvm/include/llvm/CodeGen/SelectionDAGNodes.h             4 lines
llvm/include/llvm/IR/IntrinsicsAArch64.td                 8 lines
llvm/lib/CodeGen/MachineOperand.cpp                       2 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp            17 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp           11 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td               75 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td            24 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td                15 lines
llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll      182 lines

Diff 234089

llvm/include/llvm/CodeGen/MachineMemOperand.h

@@ enum Flags : uint16_t {
     // Reserved for use by target-specific passes.
     // Targets may override getSerializableMachineMemOperandTargetFlags() to
     // enable MIR serialization/parsing of these flags. If more of these flags
     // are added, the MIR printing/parsing code will need to be updated as well.
     MOTargetFlag1 = 1u << 6,
     MOTargetFlag2 = 1u << 7,
     MOTargetFlag3 = 1u << 8,

-    LLVM_MARK_AS_BITMASK_ENUM(/* LargestFlag = */ MOTargetFlag3)
+    // The memory access is non-faulting
+    MONonFaulting = 1u << 9,
+
+    LLVM_MARK_AS_BITMASK_ENUM(/* LargestFlag = */ MONonFaulting)
   };

 private:
   /// Atomic information for this memory operation.
   struct MachineAtomicInfo {
     /// Synchronization scope ID for this memory operation.
     unsigned SSID : 8; // SyncScope::ID
     /// Atomic ordering requirements for this memory operation. For cmpxchg
@@ public:
   }

   bool isLoad() const { return FlagVals & MOLoad; }
   bool isStore() const { return FlagVals & MOStore; }
   bool isVolatile() const { return FlagVals & MOVolatile; }
   bool isNonTemporal() const { return FlagVals & MONonTemporal; }
   bool isDereferenceable() const { return FlagVals & MODereferenceable; }
   bool isInvariant() const { return FlagVals & MOInvariant; }
+  bool isNonFaulting() const { return FlagVals & MONonFaulting; }

   /// Returns true if this operation has an atomic ordering requirement of
   /// unordered or higher, false otherwise.
   bool isAtomic() const { return getOrdering() != AtomicOrdering::NotAtomic; }

   /// Returns true if this memory operation doesn't have any ordering
   /// constraints other than normal aliasing. Volatile and (ordered) atomic
   /// memory operations can't be reordered.
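
The change to this header follows the usual bitmask-enum pattern: claim the next free bit and add an accessor that tests it. A self-contained sketch of that pattern is below. `MemOperandModel` is a hypothetical stand-in for `MachineMemOperand`, and the bit positions given for `MOLoad` and `MONonTemporal` are assumptions for illustration; only the `MOTargetFlag*` and `MONonFaulting` positions are taken from the hunk.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the flag enum; not the real LLVM definition.
enum Flags : uint16_t {
  MONone = 0,
  MOLoad = 1u << 0,        // assumed position, for illustration
  MONonTemporal = 1u << 3, // assumed position, for illustration
  MOTargetFlag1 = 1u << 6,
  MOTargetFlag2 = 1u << 7,
  MOTargetFlag3 = 1u << 8,
  MONonFaulting = 1u << 9, // the bit this revision proposes
};

// Hypothetical holder mirroring the accessor style in the hunk.
struct MemOperandModel {
  uint16_t FlagVals;
  bool isLoad() const { return FlagVals & MOLoad; }
  bool isNonTemporal() const { return FlagVals & MONonTemporal; }
  bool isNonFaulting() const { return FlagVals & MONonFaulting; }
};

// Bit 9 still fits the 16-bit underlying type.
static_assert(MONonFaulting <= UINT16_MAX, "flag must fit in uint16_t");
```

A value built from `MOLoad | MONonFaulting` then reports true from `isLoad()` and `isNonFaulting()` and false from `isNonTemporal()`, which matches the flags set for the ldnf1 intrinsic in AArch64ISelLowering.cpp.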

llvm/include/llvm/CodeGen/SelectionDAGNodes.h

@@ class MemSDNodeBitfields {
     friend class AtomicSDNode;

     uint16_t : NumSDNodeBits;

     uint16_t IsVolatile : 1;
     uint16_t IsNonTemporal : 1;
     uint16_t IsDereferenceable : 1;
     uint16_t IsInvariant : 1;
+    uint16_t IsNonFaulting : 1;
   };
-  enum { NumMemSDNodeBits = NumSDNodeBits + 4 };
+  enum { NumMemSDNodeBits = NumSDNodeBits + 5 };

   class LSBaseSDNodeBitfields {
     friend class LSBaseSDNode;
     friend class MaskedLoadStoreSDNode;
     friend class MaskedGatherScatterSDNode;

     uint16_t : NumMemSDNodeBits;

@@ unsigned getRawSubclassData() const {
     memcpy(&Data, &RawSDNodeBits, sizeof(RawSDNodeBits));
     return Data;
   }

   bool isVolatile() const { return MemSDNodeBits.IsVolatile; }
   bool isNonTemporal() const { return MemSDNodeBits.IsNonTemporal; }
   bool isDereferenceable() const { return MemSDNodeBits.IsDereferenceable; }
   bool isInvariant() const { return MemSDNodeBits.IsInvariant; }
+  bool isNonFaulting() const { return MemSDNodeBits.IsNonFaulting; }

   // Returns the offset from the location of the access.
   int64_t getSrcValueOffset() const { return MMO->getOffset(); }

   /// Returns the AA info that describes the dereference.
   AAMDNodes getAAInfo() const { return MMO->getAAInfo(); }

   /// Returns the Ranges that describes the dereference.

llvm/include/llvm/IR/IntrinsicsAArch64.td

@@
 let TargetPrefix = "aarch64" in { // All intrinsics start with "llvm.aarch64.".

   class AdvSIMD_1Vec_PredLoad_Intrinsic
     : Intrinsic<[llvm_anyvector_ty],
                 [LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
                  LLVMPointerTo<0>],
                 [IntrReadMem, IntrArgMemOnly]>;

+  class AdvSIMD_1Vec_PredFaultingLoad_Intrinsic
+    : Intrinsic<[llvm_anyvector_ty],
+                [LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                 LLVMPointerToElt<0>],
+                [IntrReadMem, IntrArgMemOnly]>;
+
   class AdvSIMD_1Vec_PredStore_Intrinsic
     : Intrinsic<[],
                 [llvm_anyvector_ty,
                  LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
                  LLVMPointerTo<0>],
                 [IntrArgMemOnly, NoCapture<2>]>;

   class AdvSIMD_Merged1VectorArg_Intrinsic
@@ : Intrinsic<[llvm_anyvector_ty],
                 [IntrNoMem, ImmArg<1>]>;

 //
 // Loads
 //

 def int_aarch64_sve_ldnt1 : AdvSIMD_1Vec_PredLoad_Intrinsic;

+def int_aarch64_sve_ldnf1 : AdvSIMD_1Vec_PredFaultingLoad_Intrinsic;
+
 //
 // Stores
 //

 def int_aarch64_sve_stnt1 : AdvSIMD_1Vec_PredStore_Intrinsic;

 //
 // Integer arithmetic

llvm/lib/CodeGen/MachineOperand.cpp

@@ if (getFlags() & MachineMemOperand::MOTargetFlag1)
     OS << '"' << getTargetMMOFlagName(*TII, MachineMemOperand::MOTargetFlag1)
        << "\" ";
   if (getFlags() & MachineMemOperand::MOTargetFlag2)
     OS << '"' << getTargetMMOFlagName(*TII, MachineMemOperand::MOTargetFlag2)
        << "\" ";
   if (getFlags() & MachineMemOperand::MOTargetFlag3)
     OS << '"' << getTargetMMOFlagName(*TII, MachineMemOperand::MOTargetFlag3)
        << "\" ";
+  if (isNonFaulting())
+    OS << "non-faulting ";

   assert((isLoad() || isStore()) &&
          "machine memory operand must be a load or store (or both)");
   if (isLoad())
     OS << "load ";
   if (isStore())
     OS << "store ";

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

@@ if (BV->isConstant()) {
           SDValue Ops = { Operand };
           if (SDValue Fold = FoldConstantVectorArithmetic(Opcode, DL, VT, Ops))
             return Fold;
         }
       }
     }
   }

+  if (Operand.getOpcode() == ISD::SPLAT_VECTOR) {
+    if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Operand.getOperand(0))) {
+      const APInt &Val = C->getAPIntValue();
+      switch (Opcode) {
+      default: break;
+      case ISD::ANY_EXTEND:
+      case ISD::ZERO_EXTEND:
+        return getConstant(Val.zextOrTrunc(VT.getScalarSizeInBits()), DL, VT,
+                           C->isTargetOpcode(), C->isOpaque());
+      case ISD::SIGN_EXTEND:
+        return getConstant(Val.sextOrTrunc(VT.getScalarSizeInBits()), DL, VT,
+                           C->isTargetOpcode(), C->isOpaque());
+      }
+    }
+  }
+
   unsigned OpOpcode = Operand.getNode()->getOpcode();
   switch (Opcode) {
   case ISD::TokenFactor:
   case ISD::MERGE_VALUES:
   case ISD::CONCAT_VECTORS:
     return Operand; // Factor, merge or concat of one node? No need.
   case ISD::BUILD_VECTOR: {
     // Attempt to simplify BUILD_VECTOR.
@@

 MemSDNode::MemSDNode(unsigned Opc, unsigned Order, const DebugLoc &dl,
                      SDVTList VTs, EVT memvt, MachineMemOperand *mmo)
     : SDNode(Opc, Order, dl, VTs), MemoryVT(memvt), MMO(mmo) {
   MemSDNodeBits.IsVolatile = MMO->isVolatile();
   MemSDNodeBits.IsNonTemporal = MMO->isNonTemporal();
   MemSDNodeBits.IsDereferenceable = MMO->isDereferenceable();
   MemSDNodeBits.IsInvariant = MMO->isInvariant();
+  MemSDNodeBits.IsNonFaulting = MMO->isNonFaulting();

   // We check here that the size of the memory operand fits within the size of
   // the MMO. This is because the MMO might indicate only a possible address
   // range instead of specifying the affected memory addresses precisely.
   // TODO: Make MachineMemOperands aware of scalable vectors.
   assert(memvt.getStoreSize().getKnownMinSize() <= MMO->getSize() &&
          "Size mismatch!");
 }
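
The SPLAT_VECTOR hunk in this file folds an extend of a splatted constant by extending the scalar itself: ANY_EXTEND and ZERO_EXTEND go through `zextOrTrunc`, SIGN_EXTEND through `sextOrTrunc`. As a rough illustration of the difference between the two, the sketch below performs the same widening on plain 64-bit integers. `zeroExtend` and `signExtend` are hypothetical helpers written for this example, not the `APInt` methods, and they assume `SrcBits` is between 1 and 64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep the low SrcBits of Val and fill the rest with 0.
uint64_t zeroExtend(uint64_t Val, unsigned SrcBits) {
  assert(SrcBits >= 1 && SrcBits <= 64 && "unsupported source width");
  uint64_t Mask = SrcBits == 64 ? ~0ULL : ((1ULL << SrcBits) - 1);
  return Val & Mask;
}

// Hypothetical helper: keep the low SrcBits of Val and replicate the top bit.
int64_t signExtend(uint64_t Val, unsigned SrcBits) {
  uint64_t Low = zeroExtend(Val, SrcBits);
  uint64_t SignBit = 1ULL << (SrcBits - 1);
  // Flipping and then subtracting the sign bit propagates it upwards.
  return static_cast<int64_t>((Low ^ SignBit) - SignBit);
}
```

For an 8-bit source value of 0xFF, `zeroExtend` gives 255 while `signExtend` gives -1, which is why the two opcodes cannot share a single folding path.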

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ case Intrinsic::aarch64_sve_ldnt1: {
     Info.opc = ISD::INTRINSIC_W_CHAIN;
     Info.memVT = MVT::getVT(PtrTy->getElementType());
     Info.ptrVal = I.getArgOperand(1);
     Info.offset = 0;
     Info.align = MaybeAlign(DL.getABITypeAlignment(PtrTy->getElementType()));
     Info.flags = MachineMemOperand::MOLoad | MachineMemOperand::MONonTemporal;
     return true;
   }
+  case Intrinsic::aarch64_sve_ldnf1: {
+    PointerType *PtrTy = cast<PointerType>(I.getArgOperand(1)->getType());
+    Info.opc = ISD::INTRINSIC_W_CHAIN;
+    Info.memVT = MVT::getVT(PtrTy->getElementType());
+    Info.ptrVal = I.getArgOperand(1);
+    Info.offset = 0;
+    Info.align = MaybeAlign(DL.getABITypeAlignment(PtrTy->getElementType()));
+    Info.flags = MachineMemOperand::MOLoad | MachineMemOperand::MONonFaulting;
+    return true;
+  }
   case Intrinsic::aarch64_sve_stnt1: {
     PointerType *PtrTy = cast<PointerType>(I.getArgOperand(2)->getType());
     Info.opc = ISD::INTRINSIC_W_CHAIN;
     Info.memVT = MVT::getVT(PtrTy->getElementType());
     Info.ptrVal = I.getArgOperand(2);
     Info.offset = 0;
     Info.align = MaybeAlign(DL.getABITypeAlignment(PtrTy->getElementType()));
     Info.flags = MachineMemOperand::MOStore | MachineMemOperand::MONonTemporal;
@@ case ISD::INTRINSIC_W_CHAIN:
     case Intrinsic::aarch64_neon_st1x2:
     case Intrinsic::aarch64_neon_st1x3:
     case Intrinsic::aarch64_neon_st1x4:
     case Intrinsic::aarch64_neon_st2lane:
     case Intrinsic::aarch64_neon_st3lane:
     case Intrinsic::aarch64_neon_st4lane:
       return performNEONPostLDSTCombine(N, DCI, DAG);
     case Intrinsic::aarch64_sve_ldnt1:
+    case Intrinsic::aarch64_sve_ldnf1:
       return performLDNT1Combine(N, DAG);
     case Intrinsic::aarch64_sve_stnt1:
       return performSTNT1Combine(N, DAG);
     case Intrinsic::aarch64_sve_ld1_gather:
       return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1);
     case Intrinsic::aarch64_sve_ld1_gather_index:
       return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SCALED);
     case Intrinsic::aarch64_sve_ld1_gather_sxtw:

llvm/lib/Target/AArch64/AArch64InstrInfo.td

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 260 Lines • ▼ Show 20 Lines	def SDT_AArch64WrapperLarge : SDTypeProfile<1, 4,
SDTCisSameAs<1, 4>]>;		SDTCisSameAs<1, 4>]>;

// non-extending masked load fragment.		// non-extending masked load fragment.
def nonext_masked_load :		def nonext_masked_load :
PatFrag<(ops node:$ptr, node:$pred, node:$def),		PatFrag<(ops node:$ptr, node:$pred, node:$def),
(masked_ld node:$ptr, undef, node:$pred, node:$def), [{		(masked_ld node:$ptr, undef, node:$pred, node:$def), [{
return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::NON_EXTLOAD &&		return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::NON_EXTLOAD &&
cast<MaskedLoadSDNode>(N)->isUnindexed() &&		cast<MaskedLoadSDNode>(N)->isUnindexed() &&
!cast<MaskedLoadSDNode>(N)->isNonTemporal();		!cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
		!cast<MaskedLoadSDNode>(N)->isNonFaulting();
}]>;		}]>;
// sign extending masked load fragments.		// sign extending masked load fragments.
def asext_masked_load :		def asext_masked_load :
PatFrag<(ops node:$ptr, node:$pred, node:$def),		PatFrag<(ops node:$ptr, node:$pred, node:$def),
(masked_ld node:$ptr, undef, node:$pred, node:$def),[{		(masked_ld node:$ptr, undef, node:$pred, node:$def),[{
return (cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::EXTLOAD \|\|		return (cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::EXTLOAD \|\|
cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::SEXTLOAD) &&		cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::SEXTLOAD) &&
cast<MaskedLoadSDNode>(N)->isUnindexed();		cast<MaskedLoadSDNode>(N)->isUnindexed() &&
		!cast<MaskedLoadSDNode>(N)->isNonFaulting();
}]>;		}]>;
def asext_masked_load_i8 :		def asext_masked_load_i8 :
PatFrag<(ops node:$ptr, node:$pred, node:$def),		PatFrag<(ops node:$ptr, node:$pred, node:$def),
(asext_masked_load node:$ptr, node:$pred, node:$def), [{		(asext_masked_load node:$ptr, node:$pred, node:$def), [{
return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i8;		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i8;
}]>;		}]>;
def asext_masked_load_i16 :		def asext_masked_load_i16 :
PatFrag<(ops node:$ptr, node:$pred, node:$def),		PatFrag<(ops node:$ptr, node:$pred, node:$def),
(asext_masked_load node:$ptr, node:$pred, node:$def), [{		(asext_masked_load node:$ptr, node:$pred, node:$def), [{
return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i16;		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i16;
}]>;		}]>;
def asext_masked_load_i32 :		def asext_masked_load_i32 :
PatFrag<(ops node:$ptr, node:$pred, node:$def),		PatFrag<(ops node:$ptr, node:$pred, node:$def),
(asext_masked_load node:$ptr, node:$pred, node:$def), [{		(asext_masked_load node:$ptr, node:$pred, node:$def), [{
return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i32;		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i32;
}]>;		}]>;
// zero extending masked load fragments.		// zero extending masked load fragments.
def zext_masked_load :		def zext_masked_load :
PatFrag<(ops node:$ptr, node:$pred, node:$def),		PatFrag<(ops node:$ptr, node:$pred, node:$def),
(masked_ld node:$ptr, undef, node:$pred, node:$def), [{		(masked_ld node:$ptr, undef, node:$pred, node:$def), [{
return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::ZEXTLOAD &&		return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::ZEXTLOAD &&
cast<MaskedLoadSDNode>(N)->isUnindexed();		cast<MaskedLoadSDNode>(N)->isUnindexed() &&
		!cast<MaskedLoadSDNode>(N)->isNonFaulting();
}]>;		}]>;
def zext_masked_load_i8 :		def zext_masked_load_i8 :
PatFrag<(ops node:$ptr, node:$pred, node:$def),		PatFrag<(ops node:$ptr, node:$pred, node:$def),
(zext_masked_load node:$ptr, node:$pred, node:$def), [{		(zext_masked_load node:$ptr, node:$pred, node:$def), [{
return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i8;		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i8;
}]>;		}]>;
def zext_masked_load_i16 :		def zext_masked_load_i16 :
PatFrag<(ops node:$ptr, node:$pred, node:$def),		PatFrag<(ops node:$ptr, node:$pred, node:$def),
(zext_masked_load node:$ptr, node:$pred, node:$def), [{		(zext_masked_load node:$ptr, node:$pred, node:$def), [{
return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i16;		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i16;
}]>;		}]>;
def zext_masked_load_i32 :		def zext_masked_load_i32 :
PatFrag<(ops node:$ptr, node:$pred, node:$def),		PatFrag<(ops node:$ptr, node:$pred, node:$def),
(zext_masked_load node:$ptr, node:$pred, node:$def), [{		(zext_masked_load node:$ptr, node:$pred, node:$def), [{
return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i32;		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i32;
}]>;		}]>;

def non_temporal_load :		def non_temporal_load :
PatFrag<(ops node:$ptr, node:$pred, node:$def),		PatFrag<(ops node:$ptr, node:$pred, node:$def),
(masked_ld node:$ptr, undef, node:$pred, node:$def), [{		(masked_ld node:$ptr, undef, node:$pred, node:$def), [{
return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::NON_EXTLOAD &&		return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::NON_EXTLOAD &&
cast<MaskedLoadSDNode>(N)->isUnindexed() &&		cast<MaskedLoadSDNode>(N)->isUnindexed() &&
cast<MaskedLoadSDNode>(N)->isNonTemporal();		cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
		!cast<MaskedLoadSDNode>(N)->isNonFaulting();
		}]>;

		def non_faulting_load :
		sdesmalenUnsubmitted Not Done Reply Inline Actions This duplicates a lot of code, maybe it makes sense to combine this into a multiclass. I'm thinking of something along the lines of: multiclass load_store_fragments<code pred> { def : PatFrag<(ops node:$ptr, node:$pred, node:$def), (masked_ld node:$ptr, undef, node:$pred, node:$def), pred>; def _i8 : PatFrag<(ops ...), ((cast<SDPatternOperator>(NAME) ...), [{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i8; }]>; def _i16 : PatFrag<(ops ...), ((cast<SDPatternOperator>(NAME) ....), [{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i16; }]>; def _i32 : PatFrag<(ops ...), ((cast<SDPatternOperator>(NAME) ....), [{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i32; }]>; def _i64 : PatFrag<(ops ...), ((cast<SDPatternOperator>(NAME) ....), [{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i64; }]>; } defm non_temporal_load : load_store_fragments<[{ return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::NON_EXTLOAD && cast<MaskedLoadSDNode>(N)->isUnindexed() && cast<MaskedLoadSDNode>(N)->isNonTemporal() && !cast<MaskedLoadSDNode>(N)->isNonFaulting(); }]>; defm sext_non_temporal_load : load_store_fragments<[{ return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::SEXTLOAD && cast<MaskedLoadSDNode>(N)->isUnindexed() && cast<MaskedLoadSDNode>(N)->isNonTemporal() && !cast<MaskedLoadSDNode>(N)->isNonFaulting(); }]>; defm zext_non_faulting_load : load_store_fragments<[{ return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::ZEXTLOAD && cast<MaskedLoadSDNode>(N)->isUnindexed() && !cast<MaskedLoadSDNode>(N)->isNonTemporal() && cast<MaskedLoadSDNode>(N)->isNonFaulting(); }]>; sdesmalen: This duplicates a lot of code, maybe it makes sense to combine this into a multiclass. I'm…
		PatFrag<(ops node:$ptr, node:$pred, node:$def),
		(masked_ld node:$ptr, undef, node:$pred, node:$def), [{
		return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::NON_EXTLOAD &&
		cast<MaskedLoadSDNode>(N)->isUnindexed() &&
		!cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
		cast<MaskedLoadSDNode>(N)->isNonFaulting();
		}]>;

		def sext_non_faulting_load :
		sdesmalenUnsubmitted Not Done Reply Inline Actions nit: we should probably add `_masked_` to the name (although I realise that I forgot to spot that on the non-temporal patch) sdesmalen: nit: we should probably add `_masked_` to the name (although I realise that I forgot to spot…
		PatFrag<(ops node:$ptr, node:$pred, node:$def),
		(masked_ld node:$ptr, undef, node:$pred, node:$def), [{
		return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::SEXTLOAD &&
		sdesmalenUnsubmitted Not Done Reply Inline Actions Shall we support any-extend as well? (thus making `asext_non_faulting_load`) sdesmalen: Shall we support any-extend as well? (thus making `asext_non_faulting_load`)
		cast<MaskedLoadSDNode>(N)->isUnindexed() &&
		!cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
		cast<MaskedLoadSDNode>(N)->isNonFaulting();
		}]>;

		def sext_non_faulting_load_i8 :
		PatFrag<(ops node:$ptr, node:$pred, node:$def),
		(sext_non_faulting_load node:$ptr, node:$pred, node:$def), [{
		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i8;
		}]>;

		def sext_non_faulting_load_i16 :
		PatFrag<(ops node:$ptr, node:$pred, node:$def),
		(sext_non_faulting_load node:$ptr, node:$pred, node:$def), [{
		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i16;
		}]>;

		def sext_non_faulting_load_i32 :
		PatFrag<(ops node:$ptr, node:$pred, node:$def),
		(sext_non_faulting_load node:$ptr, node:$pred, node:$def), [{
		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i32;
		}]>;

		def zext_non_faulting_load :
		PatFrag<(ops node:$ptr, node:$pred, node:$def),
		(masked_ld node:$ptr, undef, node:$pred, node:$def), [{
		return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::ZEXTLOAD &&
		cast<MaskedLoadSDNode>(N)->isUnindexed() &&
		!cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
		cast<MaskedLoadSDNode>(N)->isNonFaulting();
		}]>;

		def zext_non_faulting_load_i8 :
		PatFrag<(ops node:$ptr, node:$pred, node:$def),
		(zext_non_faulting_load node:$ptr, node:$pred, node:$def), [{
		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i8;
		}]>;

		def zext_non_faulting_load_i16 :
		PatFrag<(ops node:$ptr, node:$pred, node:$def),
		(zext_non_faulting_load node:$ptr, node:$pred, node:$def), [{
		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i16;
		}]>;

		def zext_non_faulting_load_i32 :
		PatFrag<(ops node:$ptr, node:$pred, node:$def),
		(zext_non_faulting_load node:$ptr, node:$pred, node:$def), [{
		return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i32;
		}]>;

		// non-truncating masked store fragment.
		def nontrunc_masked_store :
		PatFrag<(ops node:$val, node:$ptr, node:$pred),
		(masked_st node:$val, node:$ptr, undef, node:$pred), [{
		return !cast<MaskedStoreSDNode>(N)->isTruncatingStore() &&
		cast<MaskedStoreSDNode>(N)->isUnindexed() &&
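The fragment predicates in this hunk can be modeled outside TableGen. The following is a minimal C++ sketch, not code from this patch: `ExtType`, `MaskedLoadModel`, and the helper names are hypothetical stand-ins for `ISD::LoadExtType`, `MaskedLoadSDNode`, and its accessors (including the `isNonFaulting()` accessor that this revision proposes). It is meant to show how each `*_non_faulting_load_iN` fragment narrows its parent fragment with one extra memory-type check.

```cpp
// Hypothetical stand-in for ISD::LoadExtType (only the cases used here).
enum class ExtType { NonExt, SExt, ZExt };

// Hypothetical stand-in for the MaskedLoadSDNode properties the predicates read.
struct MaskedLoadModel {
  ExtType Ext;
  bool Unindexed;
  bool NonTemporal;
  bool NonFaulting;
  unsigned MemScalarBits; // scalar width of the in-memory type, e.g. 8 for i8
};

// Conditions shared by every *_non_faulting_load fragment.
bool hasNonFaultingShape(const MaskedLoadModel &N) {
  return N.Unindexed && !N.NonTemporal && N.NonFaulting;
}

// Models non_faulting_load (no extension).
bool isNonExtNonFaulting(const MaskedLoadModel &N) {
  return N.Ext == ExtType::NonExt && hasNonFaultingShape(N);
}

// Models sext_non_faulting_load.
bool isSExtNonFaulting(const MaskedLoadModel &N) {
  return N.Ext == ExtType::SExt && hasNonFaultingShape(N);
}

// Models zext_non_faulting_load.
bool isZExtNonFaulting(const MaskedLoadModel &N) {
  return N.Ext == ExtType::ZExt && hasNonFaultingShape(N);
}

// Models sext_non_faulting_load_i8/_i16/_i32: parent predicate plus a
// check on the scalar type held in memory.
bool isSExtNonFaultingOfWidth(const MaskedLoadModel &N, unsigned Bits) {
  return isSExtNonFaulting(N) && N.MemScalarBits == Bits;
}
```

In this model a sign-extending, unindexed, non-faulting load of `i8` elements satisfies `isSExtNonFaultingOfWidth(N, 8)` but none of the zero-extending or non-extending predicates, which is the property the selection patterns rely on.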

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

				defm : pred_load<nxv8i16, nxv8i1, non_temporal_load, LDNT1H_ZRI>;
				defm : pred_load<nxv4i32, nxv4i1, non_temporal_load, LDNT1W_ZRI>;
				defm : pred_load<nxv2i64, nxv2i1, non_temporal_load, LDNT1D_ZRI>;

				defm : pred_store<nxv16i8, nxv16i1, non_temporal_store, STNT1B_ZRI>;
				defm : pred_store<nxv8i16, nxv8i1, non_temporal_store, STNT1H_ZRI>;
				defm : pred_store<nxv4i32, nxv4i1, non_temporal_store, STNT1W_ZRI>;
				defm : pred_store<nxv2i64, nxv2i1, non_temporal_store, STNT1D_ZRI>;

				// 2-element contiguous non-faulting loads
				defm : pred_load<nxv2i64, nxv2i1, zext_non_faulting_load_i8, LDNF1B_D_IMM>;
				defm : pred_load<nxv2i64, nxv2i1, zext_non_faulting_load_i16, LDNF1H_D_IMM>;
				defm : pred_load<nxv2i64, nxv2i1, zext_non_faulting_load_i32, LDNF1W_D_IMM>;
				defm : pred_load<nxv2i64, nxv2i1, sext_non_faulting_load_i8, LDNF1SB_D_IMM>;
				defm : pred_load<nxv2i64, nxv2i1, sext_non_faulting_load_i16, LDNF1SH_D_IMM>;
				defm : pred_load<nxv2i64, nxv2i1, sext_non_faulting_load_i32, LDNF1SW_D_IMM>;
				defm : pred_load<nxv2i64, nxv2i1, non_faulting_load, LDNF1D_IMM>;

				// 4-element contiguous non-faulting loads
				defm : pred_load<nxv4i32, nxv4i1, zext_non_faulting_load_i8, LDNF1B_S_IMM>;
				defm : pred_load<nxv4i32, nxv4i1, zext_non_faulting_load_i16, LDNF1H_S_IMM>;
				defm : pred_load<nxv4i32, nxv4i1, sext_non_faulting_load_i8, LDNF1SB_S_IMM>;
				defm : pred_load<nxv4i32, nxv4i1, sext_non_faulting_load_i16, LDNF1SH_S_IMM>;
				defm : pred_load<nxv4i32, nxv4i1, non_faulting_load, LDNF1W_IMM>;

				// 8-element contiguous non-faulting loads
				defm : pred_load<nxv8i16, nxv8i1, zext_non_faulting_load_i8, LDNF1B_H_IMM>;
				defm : pred_load<nxv8i16, nxv8i1, sext_non_faulting_load_i8, LDNF1SB_H_IMM>;
				defm : pred_load<nxv8i16, nxv8i1, non_faulting_load, LDNF1H_IMM>;

				// 16-element contiguous non-faulting loads
				defm : pred_load<nxv16i8, nxv16i1, non_faulting_load, LDNF1B_IMM>;
				}

				let Predicates = [HasSVE2] in {
				// SVE2 integer multiply-add (indexed)
				defm MLA_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b0, "mla">;
				defm MLS_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b1, "mls">;

				// SVE2 saturating multiply-add high (indexed)
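The non-faulting load patterns added in this hunk follow a regular naming rule, which the sketch below tries to capture. It is an illustration under assumptions, not part of the patch: `sizeLetter` and `ldnf1Mnemonic` are hypothetical helpers, and the rule is inferred only from the `pred_load` patterns listed above (memory element no wider than the result element, and an `s` infix only for sign-extending loads).

```cpp
#include <string>

// Size letter used in the LDNF1 mnemonics for an element width in bits.
// Returns '\0' for widths that are not one of the four SVE element sizes.
char sizeLetter(unsigned Bits) {
  switch (Bits) {
  case 8:  return 'b';
  case 16: return 'h';
  case 32: return 'w';
  case 64: return 'd';
  default: return '\0';
  }
}

// Returns the lower-case mnemonic matching the patterns in this hunk, or an
// empty string when no pattern covers the combination.
//   ResultBits: element width of the result vector (e.g. 64 for nxv2i64)
//   MemBits:    element width loaded from memory
//   SignExtend: true for the sext_* fragments, false for zext_* / plain loads
std::string ldnf1Mnemonic(unsigned ResultBits, unsigned MemBits,
                          bool SignExtend) {
  char Mem = sizeLetter(MemBits);
  if (Mem == '\0' || sizeLetter(ResultBits) == '\0' || MemBits > ResultBits)
    return "";
  std::string Name = "ldnf1";
  // The 's' infix appears only when the load actually widens with sign.
  if (SignExtend && MemBits < ResultBits)
    Name += 's';
  Name += Mem;
  return Name;
}
```

For example, a sign-extending load of `i8` elements into `nxv8i16` maps to `ldnf1sb`, which corresponds to the `LDNF1SB_H_IMM` pattern, while a plain `nxv2i64` load maps to `ldnf1d`.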

llvm/lib/Target/AArch64/SVEInstrFormats.td

  : I<(outs VecList:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4),

  let mayLoad = 1;
  let Uses = !if(!eq(nf, 1), [FFR], []);
  let Defs = !if(!eq(nf, 1), [FFR], []);
  }

  multiclass sve_mem_cld_si_base<bits<4> dtype, bit nf, string asm,
  RegisterOperand listty, ZPRRegOp zprty> {
- def "" : sve_mem_cld_si_base<dtype, nf, asm, listty>;
+ def _REAL : sve_mem_cld_si_base<dtype, nf, asm, listty>;

  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",
- (!cast<Instruction>(NAME) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 0>;
+ (!cast<Instruction>(NAME # _REAL) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 0>;
  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $imm4, mul vl]",
- (!cast<Instruction>(NAME) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4), 0>;
+ (!cast<Instruction>(NAME # _REAL) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4), 0>;
  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",
- (!cast<Instruction>(NAME) listty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 1>;
+ (!cast<Instruction>(NAME # _REAL) listty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 1>;
+
+ // We need a layer of indirection because early machine code passes balk at
+ // physical register (i.e. FFR) uses that have no previous definition.
+ let hasSideEffects = 1, hasNoSchedulingInfo = 1, mayLoad = 1 in {
+ def "" : Pseudo<(outs listty:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4), []>,
+ PseudoInstExpansion<(!cast<Instruction>(NAME # _REAL) listty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4)>;
+ }
  }

  multiclass sve_mem_cld_si<bits<4> dtype, string asm, RegisterOperand listty,
  ZPRRegOp zprty>
  : sve_mem_cld_si_base<dtype, 0, asm, listty, zprty>;

  class sve_mem_cldnt_si_base<bits<2> msz, string asm, RegisterOperand VecList>
  : I<(outs VecList:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4),

llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

				define <vscale x 16 x i8> @ldnf1b(<vscale x 16 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b:
				; CHECK: ldnf1b { z0.b }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 16 x i8> @llvm.aarch64.sve.ldnf1.nxv16i8(<vscale x 16 x i1> %pg, i8* %a)
				ret <vscale x 16 x i8> %load
				}

				define <vscale x 8 x i16> @ldnf1b_h(<vscale x 8 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b_h:
				; CHECK: ldnf1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x i8> @llvm.aarch64.sve.ldnf1.nxv8i8(<vscale x 8 x i1> %pg, i8* %a)
				%res = zext <vscale x 8 x i8> %load to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 8 x i16> @ldnf1sb_h(<vscale x 8 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1sb_h:
				; CHECK: ldnf1sb { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x i8> @llvm.aarch64.sve.ldnf1.nxv8i8(<vscale x 8 x i1> %pg, i8* %a)
				%res = sext <vscale x 8 x i8> %load to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 8 x i16> @ldnf1h(<vscale x 8 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1h:
				; CHECK: ldnf1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x i16> @llvm.aarch64.sve.ldnf1.nxv8i16(<vscale x 8 x i1> %pg, i16* %a)
				ret <vscale x 8 x i16> %load
				}

				define <vscale x 8 x half> @ldnf1h_f16(<vscale x 8 x i1> %pg, half* %a) {
				; CHECK-LABEL: ldnf1h_f16:
				; CHECK: ldnf1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 8 x half> @llvm.aarch64.sve.ldnf1.nxv8f16(<vscale x 8 x i1> %pg, half* %a)
				ret <vscale x 8 x half> %load
				}

				define <vscale x 4 x i32> @ldnf1b_s(<vscale x 4 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b_s:
				; CHECK: ldnf1b { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i8> @llvm.aarch64.sve.ldnf1.nxv4i8(<vscale x 4 x i1> %pg, i8* %a)
				%res = zext <vscale x 4 x i8> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1sb_s(<vscale x 4 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1sb_s:
				; CHECK: ldnf1sb { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i8> @llvm.aarch64.sve.ldnf1.nxv4i8(<vscale x 4 x i1> %pg, i8* %a)
				%res = sext <vscale x 4 x i8> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1h_s(<vscale x 4 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1h_s:
				; CHECK: ldnf1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i16> @llvm.aarch64.sve.ldnf1.nxv4i16(<vscale x 4 x i1> %pg, i16* %a)
				%res = zext <vscale x 4 x i16> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1sh_s(<vscale x 4 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1sh_s:
				; CHECK: ldnf1sh { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i16> @llvm.aarch64.sve.ldnf1.nxv4i16(<vscale x 4 x i1> %pg, i16* %a)
				%res = sext <vscale x 4 x i16> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @ldnf1w(<vscale x 4 x i1> %pg, i32* %a) {
				; CHECK-LABEL: ldnf1w:
				; CHECK: ldnf1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x i32> @llvm.aarch64.sve.ldnf1.nxv4i32(<vscale x 4 x i1> %pg, i32* %a)
				ret <vscale x 4 x i32> %load
				}

				define <vscale x 4 x float> @ldnf1w_f32(<vscale x 4 x i1> %pg, float* %a) {
				; CHECK-LABEL: ldnf1w_f32:
				; CHECK: ldnf1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 4 x float> @llvm.aarch64.sve.ldnf1.nxv4f32(<vscale x 4 x i1> %pg, float* %a)
				ret <vscale x 4 x float> %load
				}

				define <vscale x 2 x i64> @ldnf1b_d(<vscale x 2 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1b_d:
				; CHECK: ldnf1b { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i8> @llvm.aarch64.sve.ldnf1.nxv2i8(<vscale x 2 x i1> %pg, i8* %a)
				%res = zext <vscale x 2 x i8> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1sb_d(<vscale x 2 x i1> %pg, i8* %a) {
				; CHECK-LABEL: ldnf1sb_d:
				; CHECK: ldnf1sb { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i8> @llvm.aarch64.sve.ldnf1.nxv2i8(<vscale x 2 x i1> %pg, i8* %a)
				%res = sext <vscale x 2 x i8> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1h_d(<vscale x 2 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1h_d:
				; CHECK: ldnf1h { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i16> @llvm.aarch64.sve.ldnf1.nxv2i16(<vscale x 2 x i1> %pg, i16* %a)
				%res = zext <vscale x 2 x i16> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1sh_d(<vscale x 2 x i1> %pg, i16* %a) {
				; CHECK-LABEL: ldnf1sh_d:
				; CHECK: ldnf1sh { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i16> @llvm.aarch64.sve.ldnf1.nxv2i16(<vscale x 2 x i1> %pg, i16* %a)
				%res = sext <vscale x 2 x i16> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1w_d(<vscale x 2 x i1> %pg, i32* %a) {
				; CHECK-LABEL: ldnf1w_d:
				; CHECK: ldnf1w { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i32> @llvm.aarch64.sve.ldnf1.nxv2i32(<vscale x 2 x i1> %pg, i32* %a)
				%res = zext <vscale x 2 x i32> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1sw_d(<vscale x 2 x i1> %pg, i32* %a) {
				; CHECK-LABEL: ldnf1sw_d:
				; CHECK: ldnf1sw { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i32> @llvm.aarch64.sve.ldnf1.nxv2i32(<vscale x 2 x i1> %pg, i32* %a)
				%res = sext <vscale x 2 x i32> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @ldnf1d(<vscale x 2 x i1> %pg, i64* %a) {
				; CHECK-LABEL: ldnf1d:
				; CHECK: ldnf1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x i64> @llvm.aarch64.sve.ldnf1.nxv2i64(<vscale x 2 x i1> %pg, i64* %a)
				ret <vscale x 2 x i64> %load
				}

				define <vscale x 2 x double> @ldnf1d_f64(<vscale x 2 x i1> %pg, double* %a) {
				; CHECK-LABEL: ldnf1d_f64:
				; CHECK: ldnf1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ret
				%load = call <vscale x 2 x double> @llvm.aarch64.sve.ldnf1.nxv2f64(<vscale x 2 x i1> %pg, double* %a)
				ret <vscale x 2 x double> %load
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.ldnf1.nxv16i8(<vscale x 16 x i1>, i8*)

				declare <vscale x 8 x i8> @llvm.aarch64.sve.ldnf1.nxv8i8(<vscale x 8 x i1>, i8*)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.ldnf1.nxv8i16(<vscale x 8 x i1>, i16*)
				declare <vscale x 8 x half> @llvm.aarch64.sve.ldnf1.nxv8f16(<vscale x 8 x i1>, half*)

				declare <vscale x 4 x i8> @llvm.aarch64.sve.ldnf1.nxv4i8(<vscale x 4 x i1>, i8*)
				declare <vscale x 4 x i16> @llvm.aarch64.sve.ldnf1.nxv4i16(<vscale x 4 x i1>, i16*)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.ldnf1.nxv4i32(<vscale x 4 x i1>, i32*)
				declare <vscale x 4 x float> @llvm.aarch64.sve.ldnf1.nxv4f32(<vscale x 4 x i1>, float*)

				declare <vscale x 2 x i8> @llvm.aarch64.sve.ldnf1.nxv2i8(<vscale x 2 x i1>, i8*)
				declare <vscale x 2 x i16> @llvm.aarch64.sve.ldnf1.nxv2i16(<vscale x 2 x i1>, i16*)
				declare <vscale x 2 x i32> @llvm.aarch64.sve.ldnf1.nxv2i32(<vscale x 2 x i1>, i32*)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.ldnf1.nxv2i64(<vscale x 2 x i1>, i64*)
				declare <vscale x 2 x double> @llvm.aarch64.sve.ldnf1.nxv2f64(<vscale x 2 x i1>, double*)


Revision Contents

Diff 234089

llvm/include/llvm/CodeGen/MachineMemOperand.h

llvm/include/llvm/CodeGen/SelectionDAGNodes.h

llvm/include/llvm/IR/IntrinsicsAArch64.td

llvm/lib/CodeGen/MachineOperand.cpp

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/lib/Target/AArch64/AArch64InstrInfo.td

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

llvm/lib/Target/AArch64/SVEInstrFormats.td

llvm/test/CodeGen/AArch64/sve-intrinsics-loads-nf.ll
