This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10525–10526	As it stands, `tryLowerLD1RQ` is returning the `BUILD_VECTOR` as an in-band signalling mechanism, reusing the 'slot' that would ordinarily be used to communicate the result of the lowering to instead communicate that further lowering should stop for now. This seems hazardous to me. I'd be interested if other reviewers have an opinion on what is a better way to achieve this. `tryLowerLD1RQ` wants its hands on the address of the load, which begins life as an `ISD::BUILD_VECTOR` and the address is not available until after `BUILD_VECTOR` has been lowered. However, `LowerDUPQLane` after this point can match `Op` and replace it with something else. The logic for converting the `BUILD_VECTOR` to a load cannot be directly called out to since it is within the lowering of `BUILD_VECTOR`, and reimplementing or refactoring that doesn't seem the right way either. On balance I think the current approach is OK, although I'd prefer to see the signalling be out-of-band, perhaps via a pass-by-pointer bool.
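To make the out-of-band suggestion concrete, here is a minimal sketch of the pass-by-pointer bool idea. It uses hypothetical stand-in types (`Node`, `Opcode`, `Outcome`) and hypothetical function names rather than the real SelectionDAG API, and it assumes C++17 for `std::optional`. The return value only ever carries a lowered result, while the "not ready yet, retry later" condition travels through a separate flag.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for SelectionDAG concepts; none of this is LLVM API.
enum class Opcode { BuildVector, Load, Other };
struct Node {
  Opcode Op;
  std::string Name;
};

enum class Outcome { Deferred, LoweredToLD1RQ, FellBackToDUPQ };

// Out-of-band variant: the optional holds a lowered result or nothing.
// *Defer is set only when the operand is still a BUILD_VECTOR, i.e. the
// constant has not yet been turned into a load whose address we can use.
std::optional<std::string> tryLowerLD1RQSketch(const Node &Src, bool *Defer) {
  *Defer = false;
  if (Src.Op == Opcode::BuildVector) {
    *Defer = true; // ask the caller to stop lowering for now
    return std::nullopt;
  }
  if (Src.Op != Opcode::Load)
    return std::nullopt; // pattern does not apply; caller may fall back
  return "ld1rq(" + Src.Name + ")";
}

// The caller can now tell the three outcomes apart without inspecting the
// opcode of a returned node.
Outcome lowerDUPQLaneSketch(const Node &Src) {
  bool Defer = false;
  if (tryLowerLD1RQSketch(Src, &Defer))
    return Outcome::LoweredToLD1RQ;
  return Defer ? Outcome::Deferred : Outcome::FellBackToDUPQ;
}
```

With this shape, a future change that legitimately returns a `BUILD_VECTOR` from the helper could not be mistaken for the "defer" signal.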

paulwalker-arm added inline comments. Jun 28 2022, 7:36 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10525–10526	I believe the problem here is that the current implementation of `LowerDUPQLane` is kicking out a MachineNode (i.e. `AArch64::DUP_ZZI_Q`), which means you cannot implement the fix by looking for the expected `splatq(load(q))` pattern. I recommend first creating an `AArch64ISD` node for `DUP_ZZI_Q`, or something akin to it if you can think of something more generic. Then `LowerDUPQLane` will only emit normal DAG nodes and thus you can implement a simpler DAG combine. With that said, I believe once the `AArch64ISD` node exists you should be able to implement the optimisation within `AArch64SVEInstrInfo.td`, much like how we target the other variants of `LD1R`, albeit with a different pattern to match against. It's probably worth breaking this into two patches: the first to create and use the new ISD node, and the second to add the patterns to target `LD1RQ`.
10535	Not your fault, but I believe this `BITCAST` is a bug waiting to hit big-endian targets. I recommend ensuring your new ISD node can be selected for all packed legal scalable vector types, so the need for the bitcast is removed from the DUPQ path.
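The big-endian concern can be illustrated with a simplified model. This is only a sketch under the assumption that a bitcast behaves like "store as the old type, reload as the new type"; the helper names are hypothetical and it does not model real SVE register layout. It shows that the same two 32-bit lanes land in opposite halves of a 64-bit lane depending on byte order, so normalising every element type to i64 is not layout-neutral.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Endian { Little, Big };

// Write two 32-bit lanes to an 8-byte buffer, lane 0 at the lower address,
// using the requested byte order within each lane.
std::array<std::uint8_t, 8> storeLanes32(std::uint32_t L0, std::uint32_t L1,
                                         Endian E) {
  std::array<std::uint8_t, 8> Mem{};
  const std::uint32_t Lanes[2] = {L0, L1};
  for (std::size_t Lane = 0; Lane < 2; ++Lane)
    for (std::size_t B = 0; B < 4; ++B) {
      std::size_t Shift = (E == Endian::Little) ? 8 * B : 8 * (3 - B);
      Mem[4 * Lane + B] = static_cast<std::uint8_t>(Lanes[Lane] >> Shift);
    }
  return Mem;
}

// Read the same 8 bytes back as a single 64-bit lane in the same byte order.
std::uint64_t loadLane64(const std::array<std::uint8_t, 8> &Mem, Endian E) {
  std::uint64_t V = 0;
  for (std::size_t B = 0; B < 8; ++B) {
    std::size_t Shift = (E == Endian::Little) ? 8 * B : 8 * (7 - B);
    V |= static_cast<std::uint64_t>(Mem[B]) << Shift;
  }
  return V;
}

// Model of a memory-defined bitcast from 2 x i32 to 1 x i64.
std::uint64_t bitcastViaMemory(std::uint32_t L0, std::uint32_t L1, Endian E) {
  return loadLane64(storeLanes32(L0, L1, E), E);
}
```

In this model lane 0 ends up in the low half of the 64-bit value for little-endian and in the high half for big-endian, which is the kind of mismatch a type-agnostic node would avoid.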

Matt added a subscriber: Matt. Jun 28 2022, 2:23 PM

Abandoning in favour of https://reviews.llvm.org/D128902, which should result in a simpler combine across two patches.
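As a rough illustration of why emitting a generic node makes the combine simpler, the toy DAG below matches a quadword splat of a load and rewrites it to a replicating load. Everything here is hypothetical (the `Kind` enum, `make`, `combineDupQ`); it is not the code from D128902 and does not use LLVM types. The `Opaque` kind stands in for an already-selected machine node, which the combine cannot see through.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy DAG; all names are hypothetical and unrelated to the LLVM API.
enum class Kind { Load, DupQLane128, LD1RQ, Opaque };
struct Node {
  Kind K = Kind::Opaque;
  std::vector<std::shared_ptr<Node>> Ops;
  unsigned Lane = 0;
};
using NodeRef = std::shared_ptr<Node>;

NodeRef make(Kind K, std::vector<NodeRef> Ops = {}, unsigned Lane = 0) {
  auto N = std::make_shared<Node>();
  N->K = K;
  N->Ops = std::move(Ops);
  N->Lane = Lane;
  return N;
}

// Combine sketch: dupq_lane128(load(addr), lane 0) -> ld1rq(addr).
// Any other shape is returned unchanged.
NodeRef combineDupQ(const NodeRef &N) {
  if (N->K != Kind::DupQLane128 || N->Lane != 0 || N->Ops.size() != 1)
    return N;
  const NodeRef &Src = N->Ops[0];
  if (Src->K != Kind::Load)
    return N;
  // Reuse the load's address operand for the replicating load.
  return make(Kind::LD1RQ, Src->Ops);
}
```

Because the splat is an ordinary node in this model, the match is a plain structural check on opcode and operand, with no need to signal "retry later" through the return value.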

MattDevereau mentioned this in rG018a0dd5c88b: [AArch64][SVE] Create AArch64ISD node for DUPQLANE128. Jul 1 2022, 4:47 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	46 lines
llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll	52 lines

Diff 440114

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


@@ SDValue AArch64TargetLowering::LowerSPLAT_VECTOR(SDValue Op,
   SplatVal = DAG.getNode(ISD::SIGN_EXTEND_INREG, DL, MVT::i64, SplatVal,
                          DAG.getValueType(MVT::i1));
   SDValue ID =
       DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, DL, MVT::i64);
   return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, VT, ID,
                      DAG.getConstant(0, DL, MVT::i64), SplatVal);
 }

+static SDValue tryLowerLD1RQ(SDValue Op, SelectionDAG &DAG, SDLoc &DL,
+                             EVT &VT) {
+  SDValue Op1 = Op.getOperand(1);
+  if (Op1.getOpcode() != ISD::INSERT_SUBVECTOR)
+    return SDValue();
+
+  // Bail on BUILD_VECTOR, come back later once vector constant is available
+  // in constant pool and its load result can be used in LD1RQ
+  if (Op1.getOperand(1).getOpcode() == ISD::BUILD_VECTOR)
+    return Op1.getOperand(1);
+
+  SDValue Load = Op1.getOperand(1);
+  if (Load.getOpcode() == ISD::BITCAST)
+    Load = Load.getOperand(0);
+  if (Load.getOpcode() != ISD::LOAD)
+    return SDValue();
+
+  int Opcode;
+  EVT VecElTy = VT.getVectorElementType();
+  if (VecElTy == MVT::f64 || VecElTy == MVT::i64)
+    Opcode = AArch64::LD1RQ_D_IMM;
+  else if (VecElTy == MVT::f32 || VecElTy == MVT::i32)
+    Opcode = AArch64::LD1RQ_W_IMM;
+  else if (VecElTy == MVT::f16 || VecElTy == MVT::i16 || VecElTy == MVT::bf16)
+    Opcode = AArch64::LD1RQ_H_IMM;
+  else if (VecElTy == MVT::i8)
+    Opcode = AArch64::LD1RQ_B_IMM;
+  else
+    return SDValue();
+
+  SDValue Label = Load.getOperand(1);
+  SDNode *LD1RQ = DAG.getMachineNode(
+      Opcode, DL, Op1.getValueType(),
+      getPTrue(DAG, DL, MVT::nxv16i1, AArch64SVEPredPattern::all), Label,
+      DAG.getTargetConstant(0, DL, MVT::i64));
+  return DAG.getNode(ISD::BITCAST, DL, VT, SDValue(LD1RQ, 0));
+}
+
 SDValue AArch64TargetLowering::LowerDUPQLane(SDValue Op,
                                              SelectionDAG &DAG) const {
   SDLoc DL(Op);

   EVT VT = Op.getValueType();
   if (!isTypeLegal(VT) || !VT.isScalableVector())
     return SDValue();

+  SDValue LD1RQResult = tryLowerLD1RQ(Op, DAG, DL, VT);
+  if (LD1RQResult) {
+    if (LD1RQResult.getOpcode() == ISD::BUILD_VECTOR)
+      return SDValue();
[inline comment] peterwaller-arm: As it stands, `tryLowerLD1RQ` is returning the `BUILD_VECTOR` as an in-band signalling…
[inline comment] paulwalker-arm: I believe the problem here is that the current implementation of LowerDUPQLane is kicking out a…
+    return LD1RQResult;
+  }
+
   // Current lowering only supports the SVE-ACLE types.
   if (VT.getSizeInBits().getKnownMinSize() != AArch64::SVEBitsPerBlock)
     return SDValue();

   // The DUPQ operation is indepedent of element type so normalise to i64s.
   SDValue V = DAG.getNode(ISD::BITCAST, DL, MVT::nxv2i64, Op.getOperand(1));
[inline comment] paulwalker-arm: Not your fault but I believe this `BITCAST` is a bug waiting to hit big endian targets. I…
   SDValue Idx128 = Op.getOperand(2);

   // DUPQ can be used when idx is in range.
   auto *CIdx = dyn_cast<ConstantSDNode>(Idx128);
   if (CIdx && (CIdx->getZExtValue() <= 3)) {
     SDValue CI = DAG.getTargetConstant(CIdx->getZExtValue(), DL, MVT::i64);
     SDNode *DUPQ =
         DAG.getMachineNode(AArch64::DUP_ZZI_Q, DL, MVT::nxv2i64, V, CI);

llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll

@@ ; CHECK-NEXT: ret
   %out = call <vscale x 2 x i64> @llvm.aarch64.sve.dupq.lane.nxv2i64(<vscale x 2 x i64> %a, i64 4)
   ret <vscale x 2 x i64> %out
 }

 define dso_local <vscale x 2 x double> @dupq_ld1rqd_f64() {
 ; CHECK-LABEL: dupq_ld1rqd_f64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: adrp x8, .LCPI49_0
-; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI49_0]
-; CHECK-NEXT: mov z0.q, q0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI49_0
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: ld1rqd { z0.d }, p0/z, [x8]
 ; CHECK-NEXT: ret
   %1 = tail call fast <vscale x 2 x double> @llvm.experimental.vector.insert.nxv2f64.v2f64(<vscale x 2 x double> undef, <2 x double> <double 1.000000e+00, double 2.000000e+00>, i64 0)
   %2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.dupq.lane.nxv2f64(<vscale x 2 x double> %1, i64 0)
   ret <vscale x 2 x double> %2
 }

 define dso_local <vscale x 4 x float> @dupq_ld1rqw_f32() {
 ; CHECK-LABEL: dupq_ld1rqw_f32:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: adrp x8, .LCPI50_0
-; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI50_0]
-; CHECK-NEXT: mov z0.q, q0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI50_0
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: ld1rqw { z0.s }, p0/z, [x8]
 ; CHECK-NEXT: ret
   %1 = tail call fast <vscale x 4 x float> @llvm.experimental.vector.insert.nxv4f32.v4f32(<vscale x 4 x float> undef, <4 x float> <float 1.000000e+00, float 2.000000e+00, float 3.000000e+00, float 4.000000e+00>, i64 0)
   %2 = tail call fast <vscale x 4 x float> @llvm.aarch64.sve.dupq.lane.nxv4f32(<vscale x 4 x float> %1, i64 0)
   ret <vscale x 4 x float> %2
 }

 define dso_local <vscale x 8 x half> @dupq_ld1rqh_f16() {
 ; CHECK-LABEL: dupq_ld1rqh_f16:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: adrp x8, .LCPI51_0
-; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI51_0]
-; CHECK-NEXT: mov z0.q, q0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI51_0
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: ld1rqh { z0.h }, p0/z, [x8]
 ; CHECK-NEXT: ret
   %1 = tail call fast <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.v8f16(<vscale x 8 x half> undef, <8 x half> <half 0xH3C00, half 0xH4000, half 0xH4200, half 0xH4400, half 0xH4500, half 0xH4600, half 0xH4700, half 0xH4800>, i64 0)
   %2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.dupq.lane.nxv8f16(<vscale x 8 x half> %1, i64 0)
   ret <vscale x 8 x half> %2
 }

 define dso_local <vscale x 8 x bfloat> @dupq_ld1rqh_bf16() #0 {
 ; CHECK-LABEL: dupq_ld1rqh_bf16:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: adrp x8, .LCPI52_0
-; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI52_0]
-; CHECK-NEXT: mov z0.q, q0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI52_0
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: ld1rqh { z0.h }, p0/z, [x8]
 ; CHECK-NEXT: ret
   %1 = call <vscale x 8 x bfloat> @llvm.experimental.vector.insert.nxv8bf16.v8bf16(<vscale x 8 x bfloat> undef, <8 x bfloat> <bfloat 1.000e+00, bfloat 2.000e+00, bfloat 3.000e+00, bfloat 4.000e+00, bfloat 5.000e+00, bfloat 6.000e+00, bfloat 7.000e+00, bfloat 8.000e+00>, i64 0)
   %2 = call <vscale x 8 x bfloat> @llvm.aarch64.sve.dupq.lane.nxv8bf16(<vscale x 8 x bfloat> %1, i64 0)
   ret <vscale x 8 x bfloat> %2
 }

 define dso_local <vscale x 2 x i64> @dupq_ld1rqd_i64() {
 ; CHECK-LABEL: dupq_ld1rqd_i64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: adrp x8, .LCPI53_0
-; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI53_0]
-; CHECK-NEXT: mov z0.q, q0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI53_0
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: ld1rqd { z0.d }, p0/z, [x8]
 ; CHECK-NEXT: ret
   %1 = tail call <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v2i64(<vscale x 2 x i64> undef, <2 x i64> <i64 1, i64 2>, i64 0)
   %2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.dupq.lane.nxv2i64(<vscale x 2 x i64> %1, i64 0)
   ret <vscale x 2 x i64> %2
 }

-define dso_local <vscale x 4 x i32> @dupq_ld1rqd_i32() {
-; CHECK-LABEL: dupq_ld1rqd_i32:
+define dso_local <vscale x 4 x i32> @dupq_ld1rqw_i32() {
+; CHECK-LABEL: dupq_ld1rqw_i32:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: adrp x8, .LCPI54_0
-; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI54_0]
-; CHECK-NEXT: mov z0.q, q0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI54_0
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: ld1rqw { z0.s }, p0/z, [x8]
 ; CHECK-NEXT: ret
   %1 = tail call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> <i32 1, i32 2, i32 3, i32 4>, i64 0)
   %2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.dupq.lane.nxv4i32(<vscale x 4 x i32> %1, i64 0)
   ret <vscale x 4 x i32> %2
 }

-define dso_local <vscale x 8 x i16> @dupq_ld1rqd_i16() {
-; CHECK-LABEL: dupq_ld1rqd_i16:
+define dso_local <vscale x 8 x i16> @dupq_ld1rqh_i16() {
+; CHECK-LABEL: dupq_ld1rqh_i16:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: adrp x8, .LCPI55_0
-; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI55_0]
-; CHECK-NEXT: mov z0.q, q0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI55_0
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: ld1rqh { z0.h }, p0/z, [x8]
 ; CHECK-NEXT: ret
   %1 = tail call <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.v8i16(<vscale x 8 x i16> undef, <8 x i16> <i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7, i16 8>, i64 0)
   %2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dupq.lane.nxv8i16(<vscale x 8 x i16> %1, i64 0)
   ret <vscale x 8 x i16> %2
 }

-define dso_local <vscale x 16 x i8> @dupq_ld1rqd_i8() {
-; CHECK-LABEL: dupq_ld1rqd_i8:
+define dso_local <vscale x 16 x i8> @dupq_ld1rqb_i8() {
+; CHECK-LABEL: dupq_ld1rqb_i8:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: adrp x8, .LCPI56_0
-; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI56_0]
-; CHECK-NEXT: mov z0.q, q0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI56_0
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: ld1rqb { z0.b }, p0/z, [x8]
 ; CHECK-NEXT: ret
   %1 = tail call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv16i8.v16i8(<vscale x 16 x i8> undef, <16 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16>, i64 0)
   %2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.dupq.lane.nxv16i8(<vscale x 16 x i8> %1, i64 0)
   ret <vscale x 16 x i8> %2
 }

 ;
 ; EXT


[AArch64][SVE] Lower aarch64_sve_dupq_lane to ld1rq (Abandoned, Public)
