This is an archive of the discontinued LLVM Phabricator instance.

[llvm][CodeGen] Addressing modes for SVE ldN.
ClosedPublic

Authored by fpetrogalli on Apr 1 2020, 4:02 PM.


Event Timeline

fpetrogalli created this revision. Apr 1 2020, 4:02 PM
Herald added a project: Restricted Project. Apr 1 2020, 4:02 PM

This code is based on https://reviews.llvm.org/D75751, which in turn depends on https://reviews.llvm.org/D75672 and https://reviews.llvm.org/D75674.

I could split the patch in two patches, one for stN and one for ldN. The stN one would not have any dependencies on the aforementioned patches, but I'd rather not do the split because the code uses the common method: AArch64DAGToDAGISel::findAddrModeSVELoadStore.

Of course, I'm not married to this patch. :) If you think it would be better to split it so I can submit the stN part without waiting for the dependencies, I'd happily do it.

Thank you!

Francesco

fpetrogalli added a comment (edited). Apr 3 2020, 3:53 PM

I have extracted the stores into a separate patch, as they do not rely on any of the dependencies that are specific to the loads. I will update this patch to remove the stores bits.

The stores' patch is at https://reviews.llvm.org/D77435

fpetrogalli retitled this revision from [llvm][CodeGen] Addressing modes for SVE ldN/stN. to [llvm][CodeGen] Addressing modes for SVE ldN..

I have removed the stores as they are separately implemented in D77435.

[NFC] Rebased on top of master, after the change for the stN went in (https://github.com/llvm/llvm-project/commit/897fdec586d9ad4c101738caa723bacdda15a769).

Hey @fpetrogalli @sdesmalen, any updates on this patch? This should be the last one to commit for structured loads? Or are we using a different approach?

I have rebased on top of master and added the bfloat test cases.

sdesmalen added inline comments. Jul 20 2020, 9:52 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4808

Why is this a template argument instead of a regular argument with default = 1?

4820–4821

remove line?

4845

How come that ST2..ST4 are not covered here?

llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll
453

nit: when testing these pairs, can you at least make sure to test the min/max values?

llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+reg-addr-mode.ll
10

nit: s/gp/pg/ (in both tests)

fpetrogalli marked 7 inline comments as done.

Thank you for your review @sdesmalen!

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4808

It is a template because I don't foresee NumVec becoming a runtime value. I have used this in other places, for example in setInfoSVEStN. I could set the default to be 1, but I'd rather call it out in the template invocation.

Your comment made me realize that I should also make sure the values of NumVecs are architecturally valid, which is why I have added a static assert in the latest version of the patch.
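For illustration, here is a minimal, hypothetical sketch (the function name and signature are invented, not the actual LLVM code) of how a static assert can reject architecturally invalid NumVecs values at compile time, since NumVecs is a template parameter rather than a runtime value:

```cpp
#include <cassert>

// Hypothetical sketch: because NumVecs is a compile-time parameter, an
// architecturally invalid value (SVE structured loads exist only for
// N = 2, 3, 4) is rejected at compile time rather than at run time.
template <unsigned NumVecs>
unsigned selectPredicatedLoadSketch(unsigned BaseOpcode) {
  static_assert(NumVecs >= 2 && NumVecs <= 4,
                "SVE structured loads exist only for N = 2, 3, 4");
  // Stand-in for real selection logic: derive an opcode from the base.
  return BaseOpcode + NumVecs;
}
```

With this in place, an instantiation like selectPredicatedLoadSketch<5> fails to compile, which is the guarantee the static assert is meant to provide.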

4845

We cannot do the same for the structured loads as we do for the stores, because for the structured stores the address optimization is done on the intrinsics:

case Intrinsic::aarch64_sve_st2: {
  if (VT == MVT::nxv16i8) {
    SelectPredicatedStore</*Scale=*/0>(Node, 2, AArch64::ST2B,
                                       AArch64::ST2B_IMM);
    return;
  }
  ...

While for the loads the address optimization is done on the custom ISD nodes:

case AArch64ISD::SVE_LD2_MERGE_ZERO: {
  if (VT == MVT::nxv16i8) {
    SelectPredicatedLoad</*Scale=*/0>(Node, 2, AArch64::LD2B_IMM,
                                      AArch64::LD2B);
    return;
  }
  ...

This means that the method getMemVTFromNode determines the memory type with the following generic code for the stores:

if (isa<MemIntrinsicSDNode>(Root))
  return cast<MemIntrinsicSDNode>(Root)->getMemoryVT();

We might do the same for the loads, but we would have to teach AArch64TargetLowering::getTgtMemIntrinsic how to extract the memory type from the intrinsic call, like we do for the stores:

switch (Intrinsic) {
case Intrinsic::aarch64_sve_st2:
  return setInfoSVEStN<2>(Info, I);
case Intrinsic::aarch64_sve_st3:
  return setInfoSVEStN<3>(Info, I);
case Intrinsic::aarch64_sve_st4:
  return setInfoSVEStN<4>(Info, I);

But that would probably require changing all the code that converts the load intrinsics into the SVE_LD*_MERGE_ZERO ISD nodes...

If we want to do this cleanup, maybe it is better to do it for the whole set of structured load intrinsics/ISD nodes in a separate patch? As far as I understand it, this would be quite an extensive change, because we would have to make sure that the nodes of the structured loads still use the intrinsic and not the custom ISD node when we enter this method, which is not what happens now. I believe there is a reason for converting the intrinsic nodes into custom ISD nodes before getting here in address optimization?

llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll
453

I can do that; I'm just making sure you didn't miss the comment at the top of the file that explains why I haven't added range checks for all cases:

; NOTE: invalid, upper and lower bound immediate values of the regimm
; addressing mode are checked only for the byte version of each
; instruction (`ld<N>b`), as the code for detecting the immediate is
; common to all instructions, and varies only for the number of
; elements of the structure store, which is <N> = 2, 3, 4.

Let me know if you think this is not a sufficient explanation!

sdesmalen added inline comments. Jul 22 2020, 8:14 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4808

This means the compiler will have to generate two versions of getPackedVectorTypeFromPredicateType because you expect NumVec to be a constant value. You should instead just pass it as a regular parameter.

One of the reasons that this file contains various templated functions is so that you can target them in ComplexPatterns through TableGen by specifying a specific variant of that function, e.g. SelectSVEShiftImm64<0, 64> for ComplexPattern SVEShiftImm64.
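As an illustrative sketch of the trade-off being discussed (the names below are invented, not the actual LLVM functions): a template parameter forces one instantiation per distinct value, while a regular parameter with a default compiles to a single function:

```cpp
// Template version: sketchTemplated<1> and sketchTemplated<2> are two
// distinct functions in the object file, one per NumVec value used.
template <unsigned NumVec>
unsigned sketchTemplated(unsigned EltBits) {
  // Lanes per 128-bit vector granule, scaled by the structure size.
  return NumVec * 128 / EltBits;
}

// Regular-parameter version with default = 1: a single instantiation,
// with NumVec passed as an ordinary runtime value.
unsigned sketchRegular(unsigned EltBits, unsigned NumVec = 1) {
  return NumVec * 128 / EltBits;
}
```

Both forms compute the same result; the difference is purely in code size and in whether the value can be targeted from TableGen, which is why templates are reserved for the ComplexPattern entry points.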

4845

Thanks for explaining, I guess I'm happy keeping it as it is now.

llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll
453

I meant using #-32 and #28 instead of #16 and #-16 for ld4.nxv8i64 and nxv8f64 respectively.
(and similar for other tests in this file)
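For context, my reading of the bounds in question (consistent with the #-32/#28 values suggested above for ld4): the reg+imm immediate for ld<N> must be a multiple of N in the range [-8*N, 7*N], in vector-length units. A hypothetical sketch of such a check, not code from the patch:

```cpp
// Sketch of the ld<N> reg+imm legality rule as discussed in this
// thread: the immediate must be a multiple of NumVecs and lie in
// [-8 * NumVecs, 7 * NumVecs] (units of one vector length, MUL VL).
bool isValidLdNImm(int Imm, int NumVecs) {
  if (Imm % NumVecs != 0)
    return false;
  return Imm >= -8 * NumVecs && Imm <= 7 * NumVecs;
}
```

For ld2 this gives the range [-16, 14] and for ld4 the range [-32, 28], matching the min/max values requested in the review.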

I have addressed your feedback. Thank you @sdesmalen.

fpetrogalli marked 2 inline comments as done. Jul 23 2020, 3:06 PM
fpetrogalli added inline comments.
llvm/test/CodeGen/AArch64/sve-intrinsics-ldN-reg+imm-addr-mode.ll
453

Sorry, I don't understand why -32/28 would be better than -16/16. The values you are suggesting are tested as the lower and upper bounds for ld4<T> in the byte case, ld4b, at lines 305-323.

The range of the immediate does not depend on the scalar data type being loaded, but only on the number of vectors loaded by the instruction, so I think we are good testing it only for ld<N>b.

Or am I missing something?

I have modified the testing of the reg+imm addressing mode to use the upper and lower bounds instead of strictly in-range values.

Thank you,

Francesco

sdesmalen added inline comments. Jul 27 2020, 7:20 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
248–249

Sorry, one more thing: can you make Scale a normal argument rather than a template argument?
I see that D77435 also uses a template argument for the stores, rather than a regular argument. I guess you can fix that one separately in an NFC patch.

fpetrogalli marked an inline comment as done.

Address last feedback on template parameters.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
248–249
fpetrogalli marked an inline comment as done. Jul 27 2020, 10:11 AM
fpetrogalli marked 3 inline comments as done.
sdesmalen accepted this revision. Jul 27 2020, 10:13 AM

LGTM, thanks!

This revision is now accepted and ready to land. Jul 27 2020, 10:13 AM
This revision was automatically updated to reflect the committed changes.