This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9051	I assume we could handle nxv2f16 if we wanted to? I guess the indexing gets a little more complicated.
9074	Is there some reason to do this as custom lowering, as opposed to just writing this directly as an isel pattern?

paulwalker-arm added inline comments. Sep 17 2020, 12:26 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9051	We can but currently EXTRACT_VECTOR_ELT doesn't support any of the unpacked types so I didn't bother considering them because I'm mainly concerned with maintaining code quality for D87843. Although this means the code is wrong because it's incorrectly reporting the nodes as legal, so I'll just remove this block.
9074	I just figured it was better to reuse the existing patterns?
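The packed/unpacked distinction raised in the comments above can be sketched in standalone C++. This is only an illustration, not code from the patch: `ScalableVT`, `isPacked`, and `neonLaneFor` are hypothetical names, and the container layout (each unpacked element sitting in the low bits of a wider, evenly sized container) is an assumption made for the sketch. Under that assumption it shows why indexing nxv2f16 would need a scaled lane number.

```cpp
#include <cassert>

// Hypothetical stand-in for a scalable type <vscale x MinElts x iN/fN>.
struct ScalableVT {
  unsigned MinElts; // known minimum element count (the N in nxvN...)
  unsigned EltBits; // element size in bits
};

// SVE registers are a multiple of 128 bits wide.
constexpr unsigned GranuleBits = 128;

// A type is "packed" when its known-minimum size fills a 128-bit granule.
bool isPacked(ScalableVT VT) { return VT.MinElts * VT.EltBits == GranuleBits; }

// Assumption: element Idx lives in the low bits of a container that is
// GranuleBits / MinElts wide, so its element-sized lane number is scaled.
unsigned neonLaneFor(ScalableVT VT, unsigned Idx) {
  unsigned ContainerBits = GranuleBits / VT.MinElts;
  return Idx * (ContainerBits / VT.EltBits);
}
```

With these definitions nxv8f16 (`{8, 16}`) is packed and lane numbers map one to one, while nxv2f16 (`{2, 16}`) is unpacked and element 1 would sit at half-word lane 4.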

Remove block that is incorrectly reporting unpacked and predicate EXTRACT_VECTOR_ELT as legal.

Harbormaster completed remote builds in B72062: Diff 292591. Sep 17 2020, 1:00 PM

Please make sure we have test coverage for the SVE-only extracts (e.g. extracting the third element of a <vscale x 2 x i64>).

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9074	The EXTRACT_SUBVECTOR doesn't really have any semantic meaning here: it's just to model the fact that the underlying instructions are "NEON" instructions. From a DAGCombine perspective, that isn't really useful information. Given that, I'd lean towards adding more patterns, even if they're sort of redundant. That said, the current approach is okay, I guess. Is there any reason to prefer NEON extract instructions over SVE ones if we're producing a floating-point value?
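The split between NEON-capable and SVE-only extracts discussed above comes down to one range check, which can be modeled in standalone C++. This is a rough sketch rather than the patch itself: `LaneIndex`, `ExtractKind`, and `classifyExtract` are hypothetical names, and the rule simply mirrors the `CI->getZExtValue() >= KnownMinNumElts` test in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the lane operand of an extract.
struct LaneIndex {
  bool IsConstant; // false models a variable index such as %x
  uint64_t Value;  // only meaningful when IsConstant is true
};

enum class ExtractKind { Neon, Sve };

// Only lanes below the known minimum element count are guaranteed to live
// in the low 128 bits that the NEON view of the register can address.
ExtractKind classifyExtract(unsigned KnownMinNumElts, LaneIndex Idx) {
  if (Idx.IsConstant && Idx.Value < KnownMinNumElts)
    return ExtractKind::Neon;
  return ExtractKind::Sve;
}
```

For `<vscale x 2 x i64>` the known minimum is 2, so lane 1 is classified as NEON while the third element (lane 2) is an SVE-only extract, which is the case the review asks to have tested.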

Replaced custom selection with isel patterns. Since we're going the isel route I figured I may as well add the missing patterns for unpacked floating point types.

paulwalker-arm added inline comments. Sep 18 2020, 9:43 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9074	> Is there any reason to prefer NEON extract instructions over SVE ones if we're producing a floating-point value?
I was thinking about it the other way round, namely why would we prefer SVE over NEON? This was based on an assumption that a NEON extract might be cheaper than a full register SVE dup. At this stage though I'm happy to wait for proof one way or the other, so have just omitted the floating-point patterns for this patch.

Harbormaster completed remote builds in B72195: Diff 292827. Sep 18 2020, 10:03 AM

LGTM with one minor fix.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
2202 ↗	(On Diff #292827)	Typo for i64? Should be nxv2i64. I think this affects one of the tests.

This revision is now accepted and ready to land. Sep 18 2020, 11:23 AM

Closed by commit rG6457455248d5: [SVE] Use NEON for extract_vector_elt when the index is in range. (authored by paulwalker-arm). Sep 21 2020, 5:15 AM

This revision was automatically updated to reflect the committed changes.

paulwalker-arm added a commit: rG6457455248d5: [SVE] Use NEON for extract_vector_elt when the index is in range.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	29 lines
llvm/test/CodeGen/AArch64/sve-extract-element.ll	75 lines
llvm/test/CodeGen/AArch64/sve-insert-element.ll	16 lines
llvm/test/CodeGen/AArch64/sve-split-extract-elt.ll	6 lines

Diff 292591

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


[938 lines not shown]	AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
}		}

if (Subtarget->hasSVE()) {		if (Subtarget->hasSVE()) {
// FIXME: Add custom lowering of MLOAD to handle different passthrus (not a		// FIXME: Add custom lowering of MLOAD to handle different passthrus (not a
// splat of 0 or undef) once vector selects supported in SVE codegen. See		// splat of 0 or undef) once vector selects supported in SVE codegen. See
// D68877 for more details.		// D68877 for more details.
for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {		for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {
if (isTypeLegal(VT)) {		if (isTypeLegal(VT)) {
		setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
setOperationAction(ISD::MUL, VT, Custom);		setOperationAction(ISD::MUL, VT, Custom);
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);		setOperationAction(ISD::SELECT, VT, Custom);
setOperationAction(ISD::SDIV, VT, Custom);		setOperationAction(ISD::SDIV, VT, Custom);
setOperationAction(ISD::UDIV, VT, Custom);		setOperationAction(ISD::UDIV, VT, Custom);
setOperationAction(ISD::SMIN, VT, Custom);		setOperationAction(ISD::SMIN, VT, Custom);
setOperationAction(ISD::UMIN, VT, Custom);		setOperationAction(ISD::UMIN, VT, Custom);
[15 lines not shown]	for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) {
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
}		}

setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);

for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {		for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {
if (isTypeLegal(VT)) {		if (isTypeLegal(VT)) {
		setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);		setOperationAction(ISD::SELECT, VT, Custom);
setOperationAction(ISD::FADD, VT, Custom);		setOperationAction(ISD::FADD, VT, Custom);
setOperationAction(ISD::FDIV, VT, Custom);		setOperationAction(ISD::FDIV, VT, Custom);
setOperationAction(ISD::FMA, VT, Custom);		setOperationAction(ISD::FMA, VT, Custom);
setOperationAction(ISD::FMUL, VT, Custom);		setOperationAction(ISD::FMUL, VT, Custom);
setOperationAction(ISD::FNEG, VT, Custom);		setOperationAction(ISD::FNEG, VT, Custom);
[8,050 lines not shown]
SDValue		SDValue
AArch64TargetLowering::LowerEXTRACT_VECTOR_ELT(SDValue Op,		AArch64TargetLowering::LowerEXTRACT_VECTOR_ELT(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
assert(Op.getOpcode() == ISD::EXTRACT_VECTOR_ELT && "Unknown opcode!");		assert(Op.getOpcode() == ISD::EXTRACT_VECTOR_ELT && "Unknown opcode!");

// Check for non-constant or out of range lane.		// Check for non-constant or out of range lane.
EVT VT = Op.getOperand(0).getValueType();		EVT VT = Op.getOperand(0).getValueType();
ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Op.getOperand(1));		ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Op.getOperand(1));
if (!CI || CI->getZExtValue() >= VT.getVectorNumElements())
		if (VT.isScalableVector()) {
		// ISel patterns only exist for the following types.
		if (VT != MVT::nxv16i8 && VT != MVT::nxv8i16 && VT != MVT::nxv4i32 &&
		VT != MVT::nxv2i64 && VT != MVT::nxv8f16 && VT != MVT::nxv4f32 &&
		VT != MVT::nxv2f64)
return SDValue();		return SDValue();

		// If the requested element is within the NEON part of an SVE register we
		// can use more capable NEON instructions to do the work.
		unsigned KnownMinNumElts = VT.getVectorElementCount().getKnownMinValue();
		if (!CI || CI->getZExtValue() >= KnownMinNumElts)
		return Op;

		SDLoc DL(Op);
		// ValueType for NEON part of the SVE input.
		EVT SubVT = EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
		KnownMinNumElts);
		assert(isTypeLegal(SubVT) && "Unexpected illegal subtype for extract!");
		SDValue Bottom128 =
		DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, Op.getOperand(0),
		DAG.getConstant(0, DL, MVT::i64));
		return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, Op.getValueType(),
		Bottom128, Op.getOperand(1));
		}

		if (!CI || CI->getZExtValue() >= VT.getVectorNumElements())
		return SDValue();

// Insertion/extraction are legal for V128 types.		// Insertion/extraction are legal for V128 types.
if (VT == MVT::v16i8 || VT == MVT::v8i16 || VT == MVT::v4i32 ||		if (VT == MVT::v16i8 || VT == MVT::v8i16 || VT == MVT::v4i32 ||
VT == MVT::v2i64 || VT == MVT::v4f32 || VT == MVT::v2f64 ||		VT == MVT::v2i64 || VT == MVT::v4f32 || VT == MVT::v2f64 ||
VT == MVT::v8f16 || VT == MVT::v8bf16)		VT == MVT::v8f16 || VT == MVT::v8bf16)
return Op;		return Op;

if (VT != MVT::v8i8 && VT != MVT::v4i16 && VT != MVT::v2i32 &&		if (VT != MVT::v8i8 && VT != MVT::v4i16 && VT != MVT::v2i32 &&
VT != MVT::v1i64 && VT != MVT::v2f32 && VT != MVT::v4f16 &&		VT != MVT::v1i64 && VT != MVT::v2f32 && VT != MVT::v4f16 &&
[6,771 lines not shown]

llvm/test/CodeGen/AArch64/sve-extract-element.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s		; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s
; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t		; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t

; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.		; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
; WARN-NOT: warning		; WARN-NOT: warning

define i8 @test_lane0_16xi8(<vscale x 16 x i8> %a) {		define i8 @test_lane0_16xi8(<vscale x 16 x i8> %a) {
; CHECK-LABEL: test_lane0_16xi8:		; CHECK-LABEL: test_lane0_16xi8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.b, b0		; CHECK-NEXT: umov w0, v0.b[0]
; CHECK-NEXT: fmov w0, s0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = extractelement <vscale x 16 x i8> %a, i32 0		%b = extractelement <vscale x 16 x i8> %a, i32 0
ret i8 %b		ret i8 %b
}		}

define i16 @test_lane0_8xi16(<vscale x 8 x i16> %a) {		define i16 @test_lane0_8xi16(<vscale x 8 x i16> %a) {
; CHECK-LABEL: test_lane0_8xi16:		; CHECK-LABEL: test_lane0_8xi16:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.h, h0		; CHECK-NEXT: umov w0, v0.h[0]
; CHECK-NEXT: fmov w0, s0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = extractelement <vscale x 8 x i16> %a, i32 0		%b = extractelement <vscale x 8 x i16> %a, i32 0
ret i16 %b		ret i16 %b
}		}

define i32 @test_lane0_4xi32(<vscale x 4 x i32> %a) {		define i32 @test_lane0_4xi32(<vscale x 4 x i32> %a) {
; CHECK-LABEL: test_lane0_4xi32:		; CHECK-LABEL: test_lane0_4xi32:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.s, s0
; CHECK-NEXT: fmov w0, s0		; CHECK-NEXT: fmov w0, s0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = extractelement <vscale x 4 x i32> %a, i32 0		%b = extractelement <vscale x 4 x i32> %a, i32 0
ret i32 %b		ret i32 %b
}		}

define i64 @test_lane0_2xi64(<vscale x 2 x i64> %a) {		define i64 @test_lane0_2xi64(<vscale x 2 x i64> %a) {
; CHECK-LABEL: test_lane0_2xi64:		; CHECK-LABEL: test_lane0_2xi64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.d, d0
; CHECK-NEXT: fmov x0, d0		; CHECK-NEXT: fmov x0, d0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = extractelement <vscale x 2 x i64> %a, i32 0		%b = extractelement <vscale x 2 x i64> %a, i32 0
ret i64 %b		ret i64 %b
}		}

define double @test_lane0_2xf64(<vscale x 2 x double> %a) {		define double @test_lane0_2xf64(<vscale x 2 x double> %a) {
; CHECK-LABEL: test_lane0_2xf64:		; CHECK-LABEL: test_lane0_2xf64:
[17 lines not shown]
; CHECK-LABEL: test_lane0_8xf16:		; CHECK-LABEL: test_lane0_8xf16:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $h0 killed $h0 killed $z0		; CHECK-NEXT: // kill: def $h0 killed $h0 killed $z0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = extractelement <vscale x 8 x half> %a, i32 0		%b = extractelement <vscale x 8 x half> %a, i32 0
ret half %b		ret half %b
}		}

		define i8 @test_lane15_16xi8(<vscale x 16 x i8> %a) {
		; CHECK-LABEL: test_lane15_16xi8:
		; CHECK: // %bb.0:
		; CHECK-NEXT: umov w0, v0.b[15]
		; CHECK-NEXT: ret
		%b = extractelement <vscale x 16 x i8> %a, i32 15
		ret i8 %b
		}

		define i16 @test_lane7_8xi16(<vscale x 8 x i16> %a) {
		; CHECK-LABEL: test_lane7_8xi16:
		; CHECK: // %bb.0:
		; CHECK-NEXT: umov w0, v0.h[7]
		; CHECK-NEXT: ret
		%b = extractelement <vscale x 8 x i16> %a, i32 7
		ret i16 %b
		}

		define i32 @test_lane3_4xi32(<vscale x 4 x i32> %a) {
		; CHECK-LABEL: test_lane3_4xi32:
		; CHECK: // %bb.0:
		; CHECK-NEXT: mov w0, v0.s[3]
		; CHECK-NEXT: ret
		%b = extractelement <vscale x 4 x i32> %a, i32 3
		ret i32 %b
		}

		define i64 @test_lane1_2xi64(<vscale x 2 x i64> %a) {
		; CHECK-LABEL: test_lane1_2xi64:
		; CHECK: // %bb.0:
		; CHECK-NEXT: mov x0, v0.d[1]
		; CHECK-NEXT: ret
		%b = extractelement <vscale x 2 x i64> %a, i32 1
		ret i64 %b
		}

		define double @test_lane1_2xf64(<vscale x 2 x double> %a) {
		; CHECK-LABEL: test_lane1_2xf64:
		; CHECK: // %bb.0:
		; CHECK-NEXT: mov d0, v0.d[1]
		; CHECK-NEXT: ret
		%b = extractelement <vscale x 2 x double> %a, i32 1
		ret double %b
		}

		define float @test_lane3_4xf32(<vscale x 4 x float> %a) {
		; CHECK-LABEL: test_lane3_4xf32:
		; CHECK: // %bb.0:
		; CHECK-NEXT: mov s0, v0.s[3]
		; CHECK-NEXT: ret
		%b = extractelement <vscale x 4 x float> %a, i32 3
		ret float %b
		}

		define half @test_lane7_8xf16(<vscale x 8 x half> %a) {
		; CHECK-LABEL: test_lane7_8xf16:
		; CHECK: // %bb.0:
		; CHECK-NEXT: mov h0, v0.h[7]
		; CHECK-NEXT: ret
		%b = extractelement <vscale x 8 x half> %a, i32 7
		ret half %b
		}

define i8 @test_lanex_16xi8(<vscale x 16 x i8> %a, i32 %x) {		define i8 @test_lanex_16xi8(<vscale x 16 x i8> %a, i32 %x) {
; CHECK-LABEL: test_lanex_16xi8:		; CHECK-LABEL: test_lanex_16xi8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0		; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
; CHECK-NEXT: sxtw x8, w0		; CHECK-NEXT: sxtw x8, w0
; CHECK-NEXT: whilels p0.b, xzr, x8		; CHECK-NEXT: whilels p0.b, xzr, x8
; CHECK-NEXT: lastb w0, p0, z0.b		; CHECK-NEXT: lastb w0, p0, z0.b
; CHECK-NEXT: ret		; CHECK-NEXT: ret
[95 lines not shown]	; CHECK-NEXT: ret
%b = extractelement <vscale x 2 x double> %a, i32 9		%b = extractelement <vscale x 2 x double> %a, i32 9
ret double %b		ret double %b
}		}

; Deliberately choose an index that is undefined		; Deliberately choose an index that is undefined
define i32 @test_lane64_4xi32(<vscale x 4 x i32> %a) {		define i32 @test_lane64_4xi32(<vscale x 4 x i32> %a) {
; CHECK-LABEL: test_lane64_4xi32:		; CHECK-LABEL: test_lane64_4xi32:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.s, s0
; CHECK-NEXT: fmov w0, s0		; CHECK-NEXT: fmov w0, s0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = extractelement <vscale x 4 x i32> %a, i32 undef		%b = extractelement <vscale x 4 x i32> %a, i32 undef
ret i32 %b		ret i32 %b
}		}

define i8 @extract_of_insert_undef_16xi8(i8 %a) {		define i8 @extract_of_insert_undef_16xi8(i8 %a) {
; CHECK-LABEL: extract_of_insert_undef_16xi8:		; CHECK-LABEL: extract_of_insert_undef_16xi8:
[20 lines not shown]	; CHECK-NEXT: ret
%c = insertelement <vscale x 16 x i8> %a, i8 %b, i32 64		%c = insertelement <vscale x 16 x i8> %a, i8 %b, i32 64
%d = extractelement <vscale x 16 x i8> %c, i32 64		%d = extractelement <vscale x 16 x i8> %c, i32 64
ret i8 %d		ret i8 %d
}		}

define i8 @extract_of_insert_diff_lanes_16xi8(<vscale x 16 x i8> %a, i8 %b) {		define i8 @extract_of_insert_diff_lanes_16xi8(<vscale x 16 x i8> %a, i8 %b) {
; CHECK-LABEL: extract_of_insert_diff_lanes_16xi8:		; CHECK-LABEL: extract_of_insert_diff_lanes_16xi8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.b, z0.b[3]		; CHECK-NEXT: umov w0, v0.b[3]
; CHECK-NEXT: fmov w0, s0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%c = insertelement <vscale x 16 x i8> %a, i8 %b, i32 0		%c = insertelement <vscale x 16 x i8> %a, i8 %b, i32 0
%d = extractelement <vscale x 16 x i8> %c, i32 3		%d = extractelement <vscale x 16 x i8> %c, i32 3
ret i8 %d		ret i8 %d
}		}

define i8 @test_lane0_zero_16xi8(<vscale x 16 x i8> %a) {		define i8 @test_lane0_zero_16xi8(<vscale x 16 x i8> %a) {
; CHECK-LABEL: test_lane0_zero_16xi8:		; CHECK-LABEL: test_lane0_zero_16xi8:
[19 lines not shown]

llvm/test/CodeGen/AArch64/sve-insert-element.ll

[176 lines not shown]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = insertelement <vscale x 16 x i8> undef, i8 %a, i32 0		%b = insertelement <vscale x 16 x i8> undef, i8 %a, i32 0
ret <vscale x 16 x i8> %b		ret <vscale x 16 x i8> %b
}		}

define <vscale x 16 x i8> @test_insert0_of_extract0_16xi8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {		define <vscale x 16 x i8> @test_insert0_of_extract0_16xi8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
; CHECK-LABEL: test_insert0_of_extract0_16xi8:		; CHECK-LABEL: test_insert0_of_extract0_16xi8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z1.b, b1		; CHECK-NEXT: umov w8, v1.b[0]
; CHECK-NEXT: ptrue p0.b, vl1		; CHECK-NEXT: ptrue p0.b, vl1
; CHECK-NEXT: fmov w8, s1
; CHECK-NEXT: mov z0.b, p0/m, w8		; CHECK-NEXT: mov z0.b, p0/m, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%c = extractelement <vscale x 16 x i8> %b, i32 0		%c = extractelement <vscale x 16 x i8> %b, i32 0
%d = insertelement <vscale x 16 x i8> %a, i8 %c, i32 0		%d = insertelement <vscale x 16 x i8> %a, i8 %c, i32 0
ret <vscale x 16 x i8> %d		ret <vscale x 16 x i8> %d
}		}

define <vscale x 16 x i8> @test_insert64_of_extract64_16xi8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {		define <vscale x 16 x i8> @test_insert64_of_extract64_16xi8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
[11 lines not shown]	; CHECK-NEXT: ret
%c = extractelement <vscale x 16 x i8> %b, i32 64		%c = extractelement <vscale x 16 x i8> %b, i32 64
%d = insertelement <vscale x 16 x i8> %a, i8 %c, i32 64		%d = insertelement <vscale x 16 x i8> %a, i8 %c, i32 64
ret <vscale x 16 x i8> %d		ret <vscale x 16 x i8> %d
}		}

define <vscale x 16 x i8> @test_insert3_of_extract1_16xi8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {		define <vscale x 16 x i8> @test_insert3_of_extract1_16xi8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
; CHECK-LABEL: test_insert3_of_extract1_16xi8:		; CHECK-LABEL: test_insert3_of_extract1_16xi8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z1.b, z1.b[1]		; CHECK-NEXT: mov w9, #3
; CHECK-NEXT: mov w8, #3		; CHECK-NEXT: umov w8, v1.b[1]
; CHECK-NEXT: index z2.b, #0, #1		; CHECK-NEXT: index z1.b, #0, #1
; CHECK-NEXT: fmov w9, s1		; CHECK-NEXT: mov z2.b, w9
; CHECK-NEXT: mov z1.b, w8
; CHECK-NEXT: ptrue p0.b		; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: cmpeq p0.b, p0/z, z2.b, z1.b		; CHECK-NEXT: cmpeq p0.b, p0/z, z1.b, z2.b
; CHECK-NEXT: mov z0.b, p0/m, w9		; CHECK-NEXT: mov z0.b, p0/m, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%c = extractelement <vscale x 16 x i8> %b, i32 1		%c = extractelement <vscale x 16 x i8> %b, i32 1
%d = insertelement <vscale x 16 x i8> %a, i8 %c, i32 3		%d = insertelement <vscale x 16 x i8> %a, i8 %c, i32 3
ret <vscale x 16 x i8> %d		ret <vscale x 16 x i8> %d
}		}

llvm/test/CodeGen/AArch64/sve-split-extract-elt.ll

[121 lines not shown]	; CHECK-NEXT: ret
ret i64 %ext		ret i64 %ext
}		}

; EXTRACT VECTOR ELT, CONSTANT IDX		; EXTRACT VECTOR ELT, CONSTANT IDX

define i16 @promote_extract_4i16(<vscale x 4 x i16> %a) {		define i16 @promote_extract_4i16(<vscale x 4 x i16> %a) {
; CHECK-LABEL: promote_extract_4i16:		; CHECK-LABEL: promote_extract_4i16:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.s, z0.s[1]		; CHECK-NEXT: mov w0, v0.s[1]
; CHECK-NEXT: fmov w0, s0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%ext = extractelement <vscale x 4 x i16> %a, i32 1		%ext = extractelement <vscale x 4 x i16> %a, i32 1
ret i16 %ext		ret i16 %ext
}		}

define i8 @split_extract_32i8(<vscale x 32 x i8> %a) {		define i8 @split_extract_32i8(<vscale x 32 x i8> %a) {
; CHECK-LABEL: split_extract_32i8:		; CHECK-LABEL: split_extract_32i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.b, z0.b[3]		; CHECK-NEXT: umov w0, v0.b[3]
; CHECK-NEXT: fmov w0, s0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%ext = extractelement <vscale x 32 x i8> %a, i32 3		%ext = extractelement <vscale x 32 x i8> %a, i32 3
ret i8 %ext		ret i8 %ext
}		}

define i16 @split_extract_16i16(<vscale x 16 x i16> %a) {		define i16 @split_extract_16i16(<vscale x 16 x i16> %a) {
; CHECK-LABEL: split_extract_16i16:		; CHECK-LABEL: split_extract_16i16:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
[71 lines not shown]


[SVE] Use NEON for extract_vector_elt when the index is in range. (Closed, Public)
