This commit adds a DAGCombine optimization that, where possible, folds sign- or zero-extended offsets into SVE gather loads, so the extension is performed by the load's addressing mode instead of by a separate instruction.
As an example, the following code:
```c
#include <arm_sve.h>

svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) {
  return svld1sw_gather_s64offset_u64(
    pred, base, svextw_s64_x(pred, offsets)
  );
}
```
would previously lower to the following assembly:
```
sxtw  z0.d, p0/m, z0.d
ld1sw { z0.d }, p0/z, [x0, z0.d]
ret
```
but now lowers to:
```
ld1sw { z0.d }, p0/z, [x0, z0.d, sxtw]
ret
```
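For context, a rough sketch of the shape such a combine might take follows. This is not the actual patch: the function name is made up, the opcode name (`AArch64ISD::GLD1_SXTW_MERGE_ZERO`) follows recent LLVM and may differ by version, and the operand layout (chain, predicate, base, offset, type) is assumed from AArch64's GLD1 nodes.

```cpp
// Illustrative sketch only -- conceptually this would live in
// llvm/lib/Target/AArch64/AArch64ISelLowering.cpp, where SelectionDAG
// and the AArch64ISD opcodes are already in scope.
static SDValue performGatherExtOffsetCombine(SDNode *N, SelectionDAG &DAG) {
  SDValue Chain = N->getOperand(0);  // incoming chain
  SDValue Pred = N->getOperand(1);   // governing predicate
  SDValue Base = N->getOperand(2);   // scalar base address
  SDValue Offset = N->getOperand(3); // vector of offsets
  SDValue Ty = N->getOperand(4);     // memory element type

  // Only offsets sign-extended from 32 bits can use the
  // [x0, z0.d, sxtw] addressing mode.
  if (Offset.getOpcode() != ISD::SIGN_EXTEND_INREG ||
      cast<VTSDNode>(Offset.getOperand(1))->getVT() != MVT::i32)
    return SDValue();

  // Rebuild the gather with the unextended offsets, switching to the
  // opcode variant that sign-extends the offsets inside the load.
  return DAG.getNode(AArch64ISD::GLD1_SXTW_MERGE_ZERO, SDLoc(N),
                     N->getVTList(),
                     {Chain, Pred, Base, Offset.getOperand(0), Ty});
}
```

The key idea is that the extend node is dropped and its input feeds the gather directly, with the extension now encoded in the gather's opcode and hence in its addressing mode.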
nit: I think it might make the code below a bit clearer if these variables were named similarly to the comments accompanying them, e.g. Chain, Pred, Base, etc.