This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7638–7639	I think this is better done within getNode() so as to eliminate the possibility as early as possible.
7641	getMinSVEVectorSizeInBits doesn't look relevant here. The restriction is linked to our expected isel to either INSR or EXT instructions. So I think we can be explicit here and use `2048/Ty.getVectorElementType().getSizeInBits()` along with a suitable comment about the range of EXT's index operand.
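To check the suggested bound, the sketch below computes the largest splice index whose byte offset still fits EXT's 8-bit immediate for each element width. The helper names are hypothetical. Treating index -1 as the INSR case comes from the comment above and is not modeled. The results match the `sve_ext_imm_0_255/127/63/31` operands used by the final patterns.

```cpp
#include <cstdint>

// SVE EXT takes its byte offset as an 8-bit immediate (0..255), so it can
// start the extracted window at any byte of a vector of up to 256 bytes,
// i.e. 2048 bits, which is also the largest SVE vector length.
constexpr int64_t kEXTWindowBits = 2048;

// Hypothetical helper: largest element index whose byte offset still fits
// the EXT immediate, i.e. 2048 / EltBits - 1.
constexpr int64_t maxEXTSpliceIndex(int64_t EltBits) {
  return kEXTWindowBits / EltBits - 1;
}

// Hypothetical mirror of the check in LowerVECTOR_SPLICE: -1 (expected to
// become INSR) and 0..maxEXTSpliceIndex (expected to become EXT) are kept;
// anything else is left to the generic expansion.
constexpr bool isSpliceIndexLowerable(int64_t Idx, int64_t EltBits) {
  return Idx >= -1 && Idx < kEXTWindowBits / EltBits;
}

static_assert(maxEXTSpliceIndex(8) == 255, "nxv16i8 -> sve_ext_imm_0_255");
static_assert(maxEXTSpliceIndex(16) == 127, "nxv8i16 -> sve_ext_imm_0_127");
static_assert(maxEXTSpliceIndex(32) == 63, "nxv4i32 -> sve_ext_imm_0_63");
static_assert(maxEXTSpliceIndex(64) == 31, "nxv2i64 -> sve_ext_imm_0_31");
```

The bound depends only on the element width. The runtime vector length does not appear in it, which is why `getMinSVEVectorSizeInBits` is not needed.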

bsmith added inline comments. Oct 5 2021, 8:47 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7641	Is that really correct? If we just allow all vector sizes then surely you would end up with incorrect ext instructions? For example, if you had a VL of 128, a splice of a `<vscale x 16 x i8>` with an index of 31 would give an ext with an immediate of 31, which is only correct if your VL is >= 256, or are you suggesting that such a DAG node is malformed and hence undef?

bsmith added a child revision: D111165: [AArch64][SVE] Add fixed type lowering for EXTRACT_SUBVECTOR. Oct 5 2021, 9:00 AM

paulwalker-arm added inline comments. Oct 5 2021, 9:03 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7641	The latter. We can assume the index in range and just need to ensure that when out of range we don't do anything bad (like access random memory). In our case we'll always emit an EXT instruction that will either do the correct thing when the index is in range (i.e. the defined case) or return the first vector when the index is out of range (i.e. the undefined case).
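The argument above depends on how EXT treats an out-of-range immediate. The following byte-level model assumes the architectural rule for SVE EXT that paulwalker-arm relies on: an immediate at or beyond the vector length in bytes behaves like #0, so the first operand is returned. `ZReg`, `ext` and `iota` are hypothetical names. The model reproduces bsmith's example, where `#31` is a real shift at VL=256 but just returns the first vector at VL=128.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxVLBytes = 256;  // 2048-bit maximum SVE vector length

// Hypothetical register model: only the first VLBytes bytes are meaningful.
struct ZReg {
  uint8_t B[kMaxVLBytes];
};

// Byte-level model of `EXT zdn.b, zdn.b, zm.b, #Imm` for a vector length of
// VLBytes bytes: take VLBytes bytes of the concatenation Zn:Zm starting at
// byte Imm. An immediate >= VLBytes is treated as 0, so Zn is returned.
constexpr ZReg ext(ZReg Zn, ZReg Zm, std::size_t Imm, std::size_t VLBytes) {
  std::size_t Pos = Imm >= VLBytes ? 0 : Imm;
  ZReg Result{};
  for (std::size_t I = 0; I < VLBytes; ++I) {
    std::size_t Src = Pos + I;
    Result.B[I] = Src < VLBytes ? Zn.B[Src] : Zm.B[Src - VLBytes];
  }
  return Result;
}

// Fills the first VLBytes bytes with Base, Base + 1, ... (mod 256).
constexpr ZReg iota(uint8_t Base, std::size_t VLBytes) {
  ZReg R{};
  for (std::size_t I = 0; I < VLBytes; ++I)
    R.B[I] = static_cast<uint8_t>(Base + I);
  return R;
}
```

For example, with 16-byte vectors (VL=128) `ext(iota(0, 16), iota(100, 16), 31, 16)` returns the first vector unchanged. With 32-byte vectors (VL=256), `ext(iota(0, 32), iota(100, 32), 31, 32)` starts with bytes 31 and 100. In both cases nothing outside the two registers is read.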

Move idx==0 optimization to getNode
Allow lowering for all VL lengths, not just the current one.

Harbormaster completed remote builds in B127102: Diff 377280. Oct 5 2021, 10:21 AM

paulwalker-arm added inline comments. Oct 6 2021, 3:27 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6162	Up to you, but N3 must be a constant, so you can do `if (cast<ConstantSDNode>(N3)->isNullValue())`.
llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
18–22	I think the intent of all the `first_idx` tests is to validate the lower bound of the `EXT` isel patterns and so think they should all pass `1` to the `vector.splice` and then perhaps just have a single `zero_idx` test for the NOP case.
46–48	Is this safe? You've not touched this code so I've got a feeling the common expand code is incorrectly assuming the largest legal index+1 represents a safe index to use when expanding the code. I think it is safer to force an out-of-range index to `0`.
221–222	Perhaps the `_1_` in the function name should now be `_31_`?
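The `zero_idx` test and the getNode() fold both rest on the IR-level meaning of `llvm.experimental.vector.splice`. The operands are concatenated and one vector's worth of elements is taken, starting at the index, or at N + index for a negative index. The small reference model below makes the NOP case explicit. Its names are hypothetical, and a fixed element count stands in for the scalable one.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxElts = 64;

// Hypothetical fixed-capacity vector; only the first NumElts lanes are used.
struct Vec {
  int32_t E[kMaxElts];
};

// Reference model of vector.splice for NumElts lanes. Idx >= 0 starts at lane
// Idx of the concatenation A:B; Idx < 0 starts at lane NumElts + Idx, i.e.
// keeps the trailing -Idx lanes of A. Idx outside [-NumElts, NumElts - 1] is
// not modeled.
constexpr Vec splice(Vec A, Vec B, int64_t Idx, std::size_t NumElts) {
  std::size_t Start = Idx >= 0 ? static_cast<std::size_t>(Idx)
                               : NumElts - static_cast<std::size_t>(-Idx);
  Vec R{};
  for (std::size_t I = 0; I < NumElts; ++I) {
    std::size_t Src = Start + I;
    R.E[I] = Src < NumElts ? A.E[Src] : B.E[Src - NumElts];
  }
  return R;
}

// True when the first NumElts lanes of X and Y match.
constexpr bool sameLanes(Vec X, Vec Y, std::size_t NumElts) {
  for (std::size_t I = 0; I < NumElts; ++I)
    if (X.E[I] != Y.E[I])
      return false;
  return true;
}
```

With `A = {1, 2, 3, 4}` and `B = {5, 6, 7, 8}`, `splice(A, B, 1, 4)` yields `{2, 3, 4, 5}`, which is the one-element shift the `first_idx` tests exercise. `splice(A, B, 0, 4)` is simply `A`, which is why getNode() can return N1 directly.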

paulwalker-arm added inline comments. Oct 6 2021, 3:46 AM

llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
46–48	Please ignore this. I didn't spot the ll test change and so got my wires crossed.
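For reference, the `clamped_idx` CHECK lines show the fallback expansion for an index that EXT cannot encode. Both operands are stored back to back on the stack, the start lane is clamped to min(Idx, N - 1) with `cmp`/`csel ..., lo`, and one vector is reloaded from there. The sketch below models that sequence for bytes, reading it off the nxv16i8 CHECK lines rather than off the expansion code itself. The names are hypothetical, and N stands for the runtime element count. The clamp is what keeps the reload inside the two-vector stack slot.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxElts = 256;  // 2048-bit vector of bytes

// Hypothetical byte vector; only the first NumElts lanes are used.
struct Vec {
  uint8_t E[kMaxElts];
};

// Models the stack-based expansion seen in splice_nxv16i8_clamped_idx.
constexpr Vec expandSplice(Vec A, Vec B, uint64_t Idx, std::size_t NumElts) {
  uint8_t Stack[2 * kMaxElts] = {};
  for (std::size_t I = 0; I < NumElts; ++I) {
    Stack[I] = A.E[I];            // st1b { z0.b }, p0, [sp]
    Stack[NumElts + I] = B.E[I];  // st1b { z1.b }, p0, [x8, #1, mul vl]
  }
  uint64_t Last = NumElts - 1;                // rdvl x9, #1; sub x9, x9, #1
  uint64_t Offset = Last < Idx ? Last : Idx;  // cmp x9, #Idx; csel x9, x9, x10, lo
  Vec R{};
  for (std::size_t I = 0; I < NumElts; ++I)
    R.E[I] = Stack[Offset + I];               // ld1b { z0.b }, p0/z, [x8, x9]
  return R;
}

// Fills the first NumElts lanes with Base, Base + 1, ... (mod 256).
constexpr Vec iota(uint8_t Base, std::size_t NumElts) {
  Vec R{};
  for (std::size_t I = 0; I < NumElts; ++I)
    R.E[I] = static_cast<uint8_t>(Base + I);
  return R;
}
```

The largest SVE vector holds 256 bytes, so N - 1 never exceeds 255. For the `i32 256` test the csel therefore always selects N - 1, and the result is the last lane of `%a` followed by the start of `%b`.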

Change getNode to assume constant value for 3rd operand of VECTOR_SPLICE
Change #0 index tests to #1 to check minimum ext range
Add single #0 index test

paulwalker-arm accepted this revision. Oct 6 2021, 4:10 AM

This revision is now accepted and ready to land. Oct 6 2021, 4:10 AM

peterwaller-arm added inline comments. Oct 6 2021, 4:38 AM

llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
216	I expected to find a #2 here (1 x f16's worth of bytes), but there is a #8, which I can't quickly explain?
227	Continuing the logic of the earlier comment I make the immediate maximum to be 255 (bytes) - 2 (1 x f16) = 253, but we can't access byte 253 because we need a whole number of 2-byte halves, so I'm expecting to see 252/2 = 126 as input and #252 as the immediate.

Harbormaster completed remote builds in B127259: Diff 377494. Oct 6 2021, 4:38 AM

peterwaller-arm accepted this revision. Oct 6 2021, 5:45 AM

peterwaller-arm added inline comments.

llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
216	A colleague reminded me that this is an unpacked type, so the container fits in 8 bytes. Now it all makes sense.
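The resolution of this thread comes down to arithmetic. For the legal SVE types, a `<vscale x N x T>` vector places one element per (128 / N)-bit container, so `<vscale x 2 x half>` uses 8-byte containers. The EXT byte offset for splice index Idx is therefore Idx times the container size, not the element size. The helper names in this sketch are hypothetical.

```cpp
#include <cstdint>

// SVE vectors are built from 128-bit granules; a <vscale x N x T> value
// places one element per (128 / N)-bit container, whatever the size of T.
constexpr unsigned kGranuleBits = 128;

constexpr unsigned containerBytes(unsigned MinNumElts) {
  return kGranuleBits / MinNumElts / 8;
}

// EXT byte immediate needed to splice by Idx lanes.
constexpr unsigned extByteImm(unsigned Idx, unsigned MinNumElts) {
  return Idx * containerBytes(MinNumElts);
}

// Largest index whose byte offset still fits EXT's 0..255 immediate.
constexpr unsigned maxIdx(unsigned MinNumElts) {
  return 255 / containerBytes(MinNumElts);
}
```

For nxv2f16 this gives `#8` for index 1 and a last index of 31 (`#248`), as in the updated tests. A packed nxv8f16 would instead give `#2` and a last index of 127 (`#254`), the same as the nxv8i16 tests.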

Closed by commit rG5be266db7ab2: [AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit (authored by bsmith). Oct 7 2021, 8:29 AM

This revision was automatically updated to reflect the committed changes.

bsmith added a commit: rG5be266db7ab2: [AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit.

Revision Contents

Path                                                    Size
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp          5 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp         7 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td          16 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td              8 lines
llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll  193 lines

Diff 377865

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

@@ 6,151 lines not shown @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
 }
 case ISD::SELECT:
 case ISD::VSELECT:
 if (SDValue V = simplifySelect(N1, N2, N3))
 return V;
 break;
 case ISD::VECTOR_SHUFFLE:
 llvm_unreachable("should use getVectorShuffle constructor!");
+case ISD::VECTOR_SPLICE: {
+if (cast<ConstantSDNode>(N3)->isNullValue())
+return N1;
+break;
+}
 case ISD::INSERT_VECTOR_ELT: {
 ConstantSDNode *N3C = dyn_cast<ConstantSDNode>(N3);
 // INSERT_VECTOR_ELT into out-of-bounds element is an UNDEF, except
 // for scalable vectors where we will generate appropriate code to
 // deal with out-of-bounds cases correctly.
 if (N3C && N1.getValueType().isFixedLengthVector() &&
 N3C->getZExtValue() >= N1.getValueType().getVectorNumElements())
 return getUNDEF(VT);
@@ 4,922 lines not shown @@

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ 7,624 lines not shown @@ SDValue AArch64TargetLowering::LowerSELECT_CC(ISD::CondCode CC, SDValue LHS,
 }

 // Otherwise, return the output of the first CSEL.
 return CS1;
 }

 SDValue AArch64TargetLowering::LowerVECTOR_SPLICE(SDValue Op,
 SelectionDAG &DAG) const {

 EVT Ty = Op.getValueType();
 auto Idx = Op.getConstantOperandAPInt(2);
-if (Idx.sge(-1) && Idx.slt(Ty.getVectorMinNumElements()))
+// This will select to an EXT instruction, which has a maximum immediate
+// value of 255, hence 2048-bits is the maximum value we can lower.
+if (Idx.sge(-1) && Idx.slt(2048 / Ty.getVectorElementType().getSizeInBits()))
 return Op;

 return SDValue();
 }

 SDValue AArch64TargetLowering::LowerSELECT_CC(SDValue Op,
 SelectionDAG &DAG) const {
 ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(4))->get();
 SDValue LHS = Op.getOperand(0);
 SDValue RHS = Op.getOperand(1);
 SDValue TVal = Op.getOperand(2);
@@ 11,360 lines not shown @@

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

@@ 2,592 lines not shown @@ def : Pat<(vector_extract (nxv4f32 ZPR:$Zs), (i64 0)),
 (f32 (EXTRACT_SUBREG ZPR:$Zs, ssub))>;
 def : Pat<(vector_extract (nxv2f32 ZPR:$Zs), (i64 0)),
 (f32 (EXTRACT_SUBREG ZPR:$Zs, ssub))>;
 def : Pat<(vector_extract (nxv2f64 ZPR:$Zs), (i64 0)),
 (f64 (EXTRACT_SUBREG ZPR:$Zs, dsub))>;
 }

 // Splice with lane bigger or equal to 0
-def : Pat<(nxv16i8 (vector_splice (nxv16i8 ZPR:$Z1), (nxv16i8 ZPR:$Z2), (i64 (sve_ext_imm_0_15 i32:$index)))),
-(EXT_ZZI ZPR:$Z1, ZPR:$Z2, sve_ext_imm_0_15:$index)>;
-def : Pat<(nxv8i16 (vector_splice (nxv8i16 ZPR:$Z1), (nxv8i16 ZPR:$Z2), (i64 (sve_ext_imm_0_7 i32:$index)))),
-(EXT_ZZI ZPR:$Z1, ZPR:$Z2, sve_ext_imm_0_7:$index)>;
-def : Pat<(nxv4i32 (vector_splice (nxv4i32 ZPR:$Z1), (nxv4i32 ZPR:$Z2), (i64 (sve_ext_imm_0_3 i32:$index)))),
-(EXT_ZZI ZPR:$Z1, ZPR:$Z2, sve_ext_imm_0_3:$index)>;
-def : Pat<(nxv2i64 (vector_splice (nxv2i64 ZPR:$Z1), (nxv2i64 ZPR:$Z2), (i64 (sve_ext_imm_0_1 i32:$index)))),
-(EXT_ZZI ZPR:$Z1, ZPR:$Z2, sve_ext_imm_0_1:$index)>;
+def : Pat<(nxv16i8 (vector_splice (nxv16i8 ZPR:$Z1), (nxv16i8 ZPR:$Z2), (i64 (sve_ext_imm_0_255 i32:$index)))),
+(EXT_ZZI ZPR:$Z1, ZPR:$Z2, sve_ext_imm_0_255:$index)>;
+def : Pat<(nxv8i16 (vector_splice (nxv8i16 ZPR:$Z1), (nxv8i16 ZPR:$Z2), (i64 (sve_ext_imm_0_127 i32:$index)))),
+(EXT_ZZI ZPR:$Z1, ZPR:$Z2, sve_ext_imm_0_127:$index)>;
+def : Pat<(nxv4i32 (vector_splice (nxv4i32 ZPR:$Z1), (nxv4i32 ZPR:$Z2), (i64 (sve_ext_imm_0_63 i32:$index)))),
+(EXT_ZZI ZPR:$Z1, ZPR:$Z2, sve_ext_imm_0_63:$index)>;
+def : Pat<(nxv2i64 (vector_splice (nxv2i64 ZPR:$Z1), (nxv2i64 ZPR:$Z2), (i64 (sve_ext_imm_0_31 i32:$index)))),
+(EXT_ZZI ZPR:$Z1, ZPR:$Z2, sve_ext_imm_0_31:$index)>;

 } // End HasSVEorStreamingSVE

 let Predicates = [HasSVE, HasMatMulInt8] in {
 defm SMMLA_ZZZ : sve_int_matmul<0b00, "smmla", int_aarch64_sve_smmla>;
 defm UMMLA_ZZZ : sve_int_matmul<0b11, "ummla", int_aarch64_sve_ummla>;
 defm USMMLA_ZZZ : sve_int_matmul<0b10, "usmmla", int_aarch64_sve_usmmla>;
 } // End HasSVE, HasMatMulInt8
@@ 469 lines not shown @@

llvm/lib/Target/AArch64/SVEInstrFormats.td

@@ 258 lines not shown @@ def sve_incdec_imm : Operand<i32>, TImmLeaf<i32, [{
 let DecoderMethod = "DecodeSVEIncDecImm";
 }

 // This allows i32 immediate extraction from i64 based arithmetic.
 def sve_cnt_mul_imm : ComplexPattern<i32, 1, "SelectCntImm<1, 16, 1, false>">;
 def sve_cnt_shl_imm : ComplexPattern<i32, 1, "SelectCntImm<1, 16, 1, true>">;


-def sve_ext_imm_0_1 : ComplexPattern<i32, 1, "SelectEXTImm<1, 8>">;
-def sve_ext_imm_0_3 : ComplexPattern<i32, 1, "SelectEXTImm<3, 4>">;
-def sve_ext_imm_0_7 : ComplexPattern<i32, 1, "SelectEXTImm<7, 2>">;
-def sve_ext_imm_0_15 : ComplexPattern<i32, 1, "SelectEXTImm<15, 1>">;
+def sve_ext_imm_0_31 : ComplexPattern<i32, 1, "SelectEXTImm<31, 8>">;
+def sve_ext_imm_0_63 : ComplexPattern<i32, 1, "SelectEXTImm<63, 4>">;
+def sve_ext_imm_0_127 : ComplexPattern<i32, 1, "SelectEXTImm<127, 2>">;
+def sve_ext_imm_0_255 : ComplexPattern<i32, 1, "SelectEXTImm<255, 1>">;

 def int_aarch64_sve_cntp_oneuse : PatFrag<(ops node:$pred, node:$src2),
 (int_aarch64_sve_cntp node:$pred, node:$src2), [{
 return N->hasOneUse();
 }]>;

 def step_vector_oneuse : PatFrag<(ops node:$idx),
 (step_vector node:$idx), [{
@@ 8,065 lines not shown @@
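The renamed operands pair a maximum index with a scale: `SelectEXTImm<255, 1>`, `<127, 2>`, `<63, 4>` and `<31, 8>`. The sketch below assumes such a selector accepts a constant index in [0, Max] and emits Index * Scale as the EXT byte immediate. Under that assumption it reproduces the immediates in the updated `first_idx`/`last_idx` CHECK lines. It is a hypothetical re-statement, not LLVM's actual `SelectEXTImm` code, and it reports a failed match as -1.

```cpp
#include <cstdint>

// Hypothetical model of the ComplexPattern operand SelectEXTImm<Max, Scale>:
// a splice index in [0, Max] becomes the EXT byte immediate Idx * Scale,
// where Scale is the element (container) size in bytes. -1 means "no match".
template <int64_t Max, int64_t Scale>
constexpr int64_t selectEXTImm(int64_t Idx) {
  return (Idx >= 0 && Idx <= Max) ? Idx * Scale : -1;
}

// The four instantiations used by the updated patterns.
constexpr int64_t extImmB(int64_t Idx) { return selectEXTImm<255, 1>(Idx); }
constexpr int64_t extImmH(int64_t Idx) { return selectEXTImm<127, 2>(Idx); }
constexpr int64_t extImmS(int64_t Idx) { return selectEXTImm<63, 4>(Idx); }
constexpr int64_t extImmD(int64_t Idx) { return selectEXTImm<31, 8>(Idx); }
```

Each maximum maps to a byte offset of at most 255 (255, 254, 252 and 248), which is exactly the EXT immediate range. One more element would need an offset of 256 in every case.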

llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -verify-machineinstrs < %s | FileCheck %s

 target triple = "aarch64-unknown-linux-gnu"

 ;
 ; VECTOR_SPLICE (index)
 ;

+define <vscale x 16 x i8> @splice_nxv16i8_zero_idx(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
+; CHECK-LABEL: splice_nxv16i8_zero_idx:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ret
+%res = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 0)
+ret <vscale x 16 x i8> %res
+}
+
 define <vscale x 16 x i8> @splice_nxv16i8_first_idx(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
 ; CHECK-LABEL: splice_nxv16i8_first_idx:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext z0.b, z0.b, z1.b, #0
+; CHECK-NEXT: ext z0.b, z0.b, z1.b, #1
 ; CHECK-NEXT: ret
-%res = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 0)
+%res = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 1)
 ret <vscale x 16 x i8> %res
 }

 define <vscale x 16 x i8> @splice_nxv16i8_last_idx(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
 ; CHECK-LABEL: splice_nxv16i8_last_idx:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext z0.b, z0.b, z1.b, #15
+; CHECK-NEXT: ext z0.b, z0.b, z1.b, #255
 ; CHECK-NEXT: ret
-%res = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 15)
+%res = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 255)
 ret <vscale x 16 x i8> %res
 }

-; Ensure index is clamped when we cannot prove it's less than VL-1.
+; Ensure index is clamped when we cannot prove it's less than 2048-bit.
 define <vscale x 16 x i8> @splice_nxv16i8_clamped_idx(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
 ; CHECK-LABEL: splice_nxv16i8_clamped_idx:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: addvl sp, sp, #-2
 ; CHECK-NEXT: rdvl x9, #1
 ; CHECK-NEXT: sub x9, x9, #1
 ; CHECK-NEXT: ptrue p0.b
 ; CHECK-NEXT: mov x8, sp
-; CHECK-NEXT: mov w10, #16
-; CHECK-NEXT: cmp x9, #16
+; CHECK-NEXT: mov w10, #256
+; CHECK-NEXT: cmp x9, #256
 ; CHECK-NEXT: st1b { z0.b }, p0, [sp]
 ; CHECK-NEXT: st1b { z1.b }, p0, [x8, #1, mul vl]
 ; CHECK-NEXT: csel x9, x9, x10, lo
 ; CHECK-NEXT: ld1b { z0.b }, p0/z, [x8, x9]
 ; CHECK-NEXT: addvl sp, sp, #2
 ; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: ret
-%res = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 16)
+%res = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 256)
 ret <vscale x 16 x i8> %res
 }

 define <vscale x 8 x i16> @splice_nxv8i16_first_idx(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
 ; CHECK-LABEL: splice_nxv8i16_first_idx:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext z0.b, z0.b, z1.b, #0
+; CHECK-NEXT: ext z0.b, z0.b, z1.b, #2
 ; CHECK-NEXT: ret
-%res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 0)
+%res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 1)
 ret <vscale x 8 x i16> %res
 }

 define <vscale x 8 x i16> @splice_nxv8i16_last_idx(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
 ; CHECK-LABEL: splice_nxv8i16_last_idx:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext z0.b, z0.b, z1.b, #14
+; CHECK-NEXT: ext z0.b, z0.b, z1.b, #254
 ; CHECK-NEXT: ret
-%res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 7)
+%res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 127)
 ret <vscale x 8 x i16> %res
 }

-; Ensure index is clamped when we cannot prove it's less than VL-1.
+; Ensure index is clamped when we cannot prove it's less than 2048-bit.
 define <vscale x 8 x i16> @splice_nxv8i16_clamped_idx(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
 ; CHECK-LABEL: splice_nxv8i16_clamped_idx:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: addvl sp, sp, #-2
 ; CHECK-NEXT: cnth x10
 ; CHECK-NEXT: sub x10, x10, #1
 ; CHECK-NEXT: ptrue p0.h
 ; CHECK-NEXT: mov x8, sp
-; CHECK-NEXT: mov w9, #8
-; CHECK-NEXT: cmp x10, #8
+; CHECK-NEXT: mov w9, #128
+; CHECK-NEXT: cmp x10, #128
 ; CHECK-NEXT: st1h { z0.h }, p0, [sp]
 ; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
 ; CHECK-NEXT: csel x9, x10, x9, lo
 ; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8, x9, lsl #1]
 ; CHECK-NEXT: addvl sp, sp, #2
 ; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: ret
-%res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 8)
+%res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 128)
 ret <vscale x 8 x i16> %res
 }

 define <vscale x 4 x i32> @splice_nxv4i32_first_idx(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
 ; CHECK-LABEL: splice_nxv4i32_first_idx:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext z0.b, z0.b, z1.b, #0
+; CHECK-NEXT: ext z0.b, z0.b, z1.b, #4
 ; CHECK-NEXT: ret
-%res = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 0)
+%res = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 1)
 ret <vscale x 4 x i32> %res
 }

 define <vscale x 4 x i32> @splice_nxv4i32_last_idx(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
 ; CHECK-LABEL: splice_nxv4i32_last_idx:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext z0.b, z0.b, z1.b, #12
+; CHECK-NEXT: ext z0.b, z0.b, z1.b, #252
 ; CHECK-NEXT: ret
-%res = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 3)
+%res = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 63)
 ret <vscale x 4 x i32> %res
 }

-; Ensure index is clamped when we cannot prove it's less than VL-1.
+; Ensure index is clamped when we cannot prove it's less than 2048-bit.
 define <vscale x 4 x i32> @splice_nxv4i32_clamped_idx(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
 ; CHECK-LABEL: splice_nxv4i32_clamped_idx:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: addvl sp, sp, #-2
 ; CHECK-NEXT: cntw x10
 ; CHECK-NEXT: sub x10, x10, #1
 ; CHECK-NEXT: ptrue p0.s
 ; CHECK-NEXT: mov x8, sp
-; CHECK-NEXT: mov w9, #4
-; CHECK-NEXT: cmp x10, #4
+; CHECK-NEXT: mov w9, #64
+; CHECK-NEXT: cmp x10, #64
 ; CHECK-NEXT: st1w { z0.s }, p0, [sp]
 ; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
 ; CHECK-NEXT: csel x9, x10, x9, lo
 ; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8, x9, lsl #2]
 ; CHECK-NEXT: addvl sp, sp, #2
 ; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: ret
-%res = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 4)
+%res = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 64)
 ret <vscale x 4 x i32> %res
 }

 define <vscale x 2 x i64> @splice_nxv2i64_first_idx(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
 ; CHECK-LABEL: splice_nxv2i64_first_idx:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext z0.b, z0.b, z1.b, #0
+; CHECK-NEXT: ext z0.b, z0.b, z1.b, #8
 ; CHECK-NEXT: ret
-%res = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, i32 0)
+%res = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, i32 1)
 ret <vscale x 2 x i64> %res
 }

 define <vscale x 2 x i64> @splice_nxv2i64_last_idx(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
 ; CHECK-LABEL: splice_nxv2i64_last_idx:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext z0.b, z0.b, z1.b, #8
+; CHECK-NEXT: ext z0.b, z0.b, z1.b, #248
 ; CHECK-NEXT: ret
-%res = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, i32 1)
+%res = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, i32 31)
 ret <vscale x 2 x i64> %res
 }

-; Ensure index is clamped when we cannot prove it's less than VL-1.
+; Ensure index is clamped when we cannot prove it's less than 2048-bit.
 define <vscale x 2 x i64> @splice_nxv2i64_clamped_idx(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
 ; CHECK-LABEL: splice_nxv2i64_clamped_idx:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: addvl sp, sp, #-2
 ; CHECK-NEXT: cntd x10
 ; CHECK-NEXT: sub x10, x10, #1
 ; CHECK-NEXT: ptrue p0.d
 ; CHECK-NEXT: mov x8, sp
-; CHECK-NEXT: mov w9, #2
-; CHECK-NEXT: cmp x10, #2
+; CHECK-NEXT: mov w9, #32
+; CHECK-NEXT: cmp x10, #32
 ; CHECK-NEXT: st1d { z0.d }, p0, [sp]
 ; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]
 ; CHECK-NEXT: csel x9, x10, x9, lo
 ; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8, x9, lsl #3]
 ; CHECK-NEXT: addvl sp, sp, #2
 ; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: ret
-%res = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, i32 2)
+%res = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, i32 32)
 ret <vscale x 2 x i64> %res
 }

	define <vscale x 2 x half> @splice_nxv2f16_neg_idx(<vscale x 2 x half> %a, <vscale x 2 x half> %b) #0 {			define <vscale x 2 x half> @splice_nxv2f16_neg_idx(<vscale x 2 x half> %a, <vscale x 2 x half> %b) #0 {
	; CHECK-LABEL: splice_nxv2f16_neg_idx:			; CHECK-LABEL: splice_nxv2f16_neg_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ptrue p0.d			; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: lastb d0, p0, z0.d			; CHECK-NEXT: lastb d0, p0, z0.d
	Show All 21 Lines
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, i32 -2)			%res = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, i32 -2)
	ret <vscale x 2 x half> %res			ret <vscale x 2 x half> %res
	}			}

	define <vscale x 2 x half> @splice_nxv2f16_first_idx(<vscale x 2 x half> %a, <vscale x 2 x half> %b) #0 {			define <vscale x 2 x half> @splice_nxv2f16_first_idx(<vscale x 2 x half> %a, <vscale x 2 x half> %b) #0 {
	; CHECK-LABEL: splice_nxv2f16_first_idx:			; CHECK-LABEL: splice_nxv2f16_first_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #0			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #8
				peterwaller-armUnsubmitted Not Done Reply Inline Actions I expected to find a #2 here (1 x f16's worth of bytes), but there is a #8, which I can't quickly explain? peterwaller-arm: I expected to find a #2 here (1 x f16's worth of bytes), but there is a #8, which I can't…
				peterwaller-armUnsubmitted Not Done Reply Inline Actions A colleague reminded me that this is an unpacked type, so the container fits in 8 bytes. Now it all makes sense. peterwaller-arm: A colleague reminded me that this is an unpacked type, so the container fits in 8 bytes. Now it…
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, i32 0)			%res = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, i32 1)
	ret <vscale x 2 x half> %res			ret <vscale x 2 x half> %res
	}			}
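The `#8` questioned in the comments above follows from the container size of unpacked types. A sketch, assuming only legal SVE types, each of which fills a whole 128-bit granule per vscale; `containerBits` and `extByteImmediate` are hypothetical names, not LLVM functions:

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): bits each element occupies in a
// register for a legal SVE type <vscale x MinNumElts x T>. Every legal
// scalable type spans one 128-bit granule per vscale, so an unpacked type
// such as nxv2f16 keeps each f16 in a 64-bit container.
constexpr unsigned containerBits(unsigned MinNumElts) {
  return 128 / MinNumElts;
}

// Byte immediate an EXT needs so the result starts at element Idx.
constexpr unsigned extByteImmediate(unsigned MinNumElts, unsigned Idx) {
  return Idx * containerBits(MinNumElts) / 8;
}
```

With index 1 this gives `#8` for the nxv2 types, `#4` for the nxv4 types and `#2` for nxv8f16, matching the updated `first_idx` CHECK lines.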

	define <vscale x 2 x half> @splice_nxv2f16_1_idx(<vscale x 2 x half> %a, <vscale x 2 x half> %b) #0 {			define <vscale x 2 x half> @splice_nxv2f16_last_idx(<vscale x 2 x half> %a, <vscale x 2 x half> %b) #0 {
				paulwalker-arm: Perhaps the `_1_` in the function name should now be `_31_`?
	; CHECK-LABEL: splice_nxv2f16_1_idx:			; CHECK-LABEL: splice_nxv2f16_last_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #8			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #248
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, i32 1)			%res = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, i32 31)
				peterwaller-arm: Continuing the logic of the earlier comment, I make the maximum immediate 255 (bytes) - 2 (1 x f16) = 253. We can't start at byte 253, though, because the offset must be a whole number of 2-byte halves, so I expected to see 252/2 = 126 as the index and #252 as the immediate.
	ret <vscale x 2 x half> %res			ret <vscale x 2 x half> %res
	}			}
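The bound in the comment above is reasoned in 2-byte halves; with 64-bit containers it comes out differently for nxv2f16. EXT's byte immediate covers 0 to 255, which corresponds to the architectural maximum SVE vector length of 2048 bits, so the last index EXT can encode is 2048 / container bits - 1. A sketch under that assumption; `maxExtIndex` is a hypothetical name:

```cpp
#include <cassert>

// Largest vector length the SVE architecture permits, in bits.
constexpr unsigned kMaxSVEVectorBits = 2048;

// Hypothetical helper: largest splice index whose byte offset still fits
// EXT's 8-bit immediate (0..255). ContainerBits is the per-element
// container size (64 for nxv2f16, 32 for nxv4f16, 16 for nxv8f16).
constexpr unsigned maxExtIndex(unsigned ContainerBits) {
  return kMaxSVEVectorBits / ContainerBits - 1;
}
```

This gives 31 (`#248`), 63 (`#252`) and 127 (`#254`), matching the `_last_idx` tests; the next index up (32, 64, 128) is what the `_clamped_idx` tests use.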

	; Ensure index is clamped when we cannot prove it's less than VL-1.			; Ensure index is clamped when we cannot prove it's less than 2048-bit.
	define <vscale x 2 x half> @splice_nxv2f16_last_idx(<vscale x 2 x half> %a, <vscale x 2 x half> %b) #0 {			define <vscale x 2 x half> @splice_nxv2f16_clamped_idx(<vscale x 2 x half> %a, <vscale x 2 x half> %b) #0 {
	; CHECK-LABEL: splice_nxv2f16_last_idx:			; CHECK-LABEL: splice_nxv2f16_clamped_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: addvl sp, sp, #-2			; CHECK-NEXT: addvl sp, sp, #-2
	; CHECK-NEXT: cntd x10			; CHECK-NEXT: cntd x10
	; CHECK-NEXT: sub x10, x10, #1			; CHECK-NEXT: sub x10, x10, #1
	; CHECK-NEXT: mov w9, #2			; CHECK-NEXT: mov w9, #32
	; CHECK-NEXT: cmp x10, #2			; CHECK-NEXT: cmp x10, #32
	; CHECK-NEXT: ptrue p0.h			; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: mov x8, sp			; CHECK-NEXT: mov x8, sp
	; CHECK-NEXT: csel x9, x10, x9, lo			; CHECK-NEXT: csel x9, x10, x9, lo
	; CHECK-NEXT: ptrue p1.b			; CHECK-NEXT: ptrue p1.b
	; CHECK-NEXT: st1h { z0.h }, p0, [sp]			; CHECK-NEXT: st1h { z0.h }, p0, [sp]
	; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]			; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
	; CHECK-NEXT: lsl x9, x9, #3			; CHECK-NEXT: lsl x9, x9, #3
	; CHECK-NEXT: ld1b { z0.b }, p1/z, [x8, x9]			; CHECK-NEXT: ld1b { z0.b }, p1/z, [x8, x9]
	; CHECK-NEXT: addvl sp, sp, #2			; CHECK-NEXT: addvl sp, sp, #2
	; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, i32 2)			%res = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, i32 32)
	ret <vscale x 2 x half> %res			ret <vscale x 2 x half> %res
	}			}
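When the index is beyond the 2048-bit bound named in the test comment above, the CHECK lines spill both operands to the stack and reload from a clamped offset. The sketch below mirrors that address arithmetic with plain integers; `VLBits` stands in for the runtime vector length that `cntd`/`cntw`/`cnth` read, and `clampedSpliceByteOffset` is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sketch of the clamped fallback's offset computation (hypothetical helper,
// not an LLVM API). VLBits models the runtime vector length in bits.
std::uint64_t clampedSpliceByteOffset(std::uint64_t VLBits,
                                      std::uint64_t ContainerBits,
                                      std::uint64_t Idx) {
  std::uint64_t NumElts = VLBits / ContainerBits;  // cntd / cntw / cnth
  std::uint64_t LastIdx = NumElts - 1;             // sub x10, x10, #1
  std::uint64_t Clamped = std::min(LastIdx, Idx);  // cmp + csel ..., lo
  return Clamped * (ContainerBits / 8);            // scale to bytes (lsl #3 for 64-bit)
}
```

Because the offset never exceeds the last element of the first spilled vector, the full-vector reload stays inside the two spilled operands at any VL.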

	define <vscale x 4 x half> @splice_nxv4f16_neg_idx(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0 {			define <vscale x 4 x half> @splice_nxv4f16_neg_idx(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0 {
	; CHECK-LABEL: splice_nxv4f16_neg_idx:			; CHECK-LABEL: splice_nxv4f16_neg_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ptrue p0.s			; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: lastb s0, p0, z0.s			; CHECK-NEXT: lastb s0, p0, z0.s
	(21 lines not shown)
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b, i32 -3)			%res = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b, i32 -3)
	ret <vscale x 4 x half> %res			ret <vscale x 4 x half> %res
	}			}

	define <vscale x 4 x half> @splice_nxv4f16_first_idx(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0 {			define <vscale x 4 x half> @splice_nxv4f16_first_idx(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0 {
	; CHECK-LABEL: splice_nxv4f16_first_idx:			; CHECK-LABEL: splice_nxv4f16_first_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #0			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #4
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b, i32 0)			%res = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b, i32 1)
	ret <vscale x 4 x half> %res			ret <vscale x 4 x half> %res
	}			}

	define <vscale x 4 x half> @splice_nxv4f16_3_idx(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0 {			define <vscale x 4 x half> @splice_nxv4f16_last_idx(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0 {
	; CHECK-LABEL: splice_nxv4f16_3_idx:			; CHECK-LABEL: splice_nxv4f16_last_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #12			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #252
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b, i32 3)			%res = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b, i32 63)
	ret <vscale x 4 x half> %res			ret <vscale x 4 x half> %res
	}			}

	; Ensure index is clamped when we cannot prove it's less than VL-1.			; Ensure index is clamped when we cannot prove it's less than 2048-bit.
	define <vscale x 4 x half> @splice_nxv4f16_last_idx(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0 {			define <vscale x 4 x half> @splice_nxv4f16_clamped_idx(<vscale x 4 x half> %a, <vscale x 4 x half> %b) #0 {
	; CHECK-LABEL: splice_nxv4f16_last_idx:			; CHECK-LABEL: splice_nxv4f16_clamped_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: addvl sp, sp, #-2			; CHECK-NEXT: addvl sp, sp, #-2
	; CHECK-NEXT: cntw x10			; CHECK-NEXT: cntw x10
	; CHECK-NEXT: sub x10, x10, #1			; CHECK-NEXT: sub x10, x10, #1
	; CHECK-NEXT: mov w9, #4			; CHECK-NEXT: mov w9, #64
	; CHECK-NEXT: cmp x10, #4			; CHECK-NEXT: cmp x10, #64
	; CHECK-NEXT: ptrue p0.h			; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: mov x8, sp			; CHECK-NEXT: mov x8, sp
	; CHECK-NEXT: csel x9, x10, x9, lo			; CHECK-NEXT: csel x9, x10, x9, lo
	; CHECK-NEXT: ptrue p1.b			; CHECK-NEXT: ptrue p1.b
	; CHECK-NEXT: st1h { z0.h }, p0, [sp]			; CHECK-NEXT: st1h { z0.h }, p0, [sp]
	; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]			; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
	; CHECK-NEXT: lsl x9, x9, #2			; CHECK-NEXT: lsl x9, x9, #2
	; CHECK-NEXT: ld1b { z0.b }, p1/z, [x8, x9]			; CHECK-NEXT: ld1b { z0.b }, p1/z, [x8, x9]
	; CHECK-NEXT: addvl sp, sp, #2			; CHECK-NEXT: addvl sp, sp, #2
	; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b, i32 4)			%res = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b, i32 64)
	ret <vscale x 4 x half> %res			ret <vscale x 4 x half> %res
	}			}


	define <vscale x 8 x half> @splice_nxv8f16_first_idx(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {			define <vscale x 8 x half> @splice_nxv8f16_first_idx(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
	; CHECK-LABEL: splice_nxv8f16_first_idx:			; CHECK-LABEL: splice_nxv8f16_first_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #0			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #2
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 0)			%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 1)
	ret <vscale x 8 x half> %res			ret <vscale x 8 x half> %res
	}			}

	define <vscale x 8 x half> @splice_nxv8f16_last_idx(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {			define <vscale x 8 x half> @splice_nxv8f16_last_idx(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
	; CHECK-LABEL: splice_nxv8f16_last_idx:			; CHECK-LABEL: splice_nxv8f16_last_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #14			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #254
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 7)			%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 127)
	ret <vscale x 8 x half> %res			ret <vscale x 8 x half> %res
	}			}

	; Ensure index is clamped when we cannot prove it's less than VL-1.			; Ensure index is clamped when we cannot prove it's less than 2048-bit.
	define <vscale x 8 x half> @splice_nxv8f16_clamped_idx(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {			define <vscale x 8 x half> @splice_nxv8f16_clamped_idx(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
	; CHECK-LABEL: splice_nxv8f16_clamped_idx:			; CHECK-LABEL: splice_nxv8f16_clamped_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: addvl sp, sp, #-2			; CHECK-NEXT: addvl sp, sp, #-2
	; CHECK-NEXT: cnth x10			; CHECK-NEXT: cnth x10
	; CHECK-NEXT: sub x10, x10, #1			; CHECK-NEXT: sub x10, x10, #1
	; CHECK-NEXT: ptrue p0.h			; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: mov x8, sp			; CHECK-NEXT: mov x8, sp
	; CHECK-NEXT: mov w9, #8			; CHECK-NEXT: mov w9, #128
	; CHECK-NEXT: cmp x10, #8			; CHECK-NEXT: cmp x10, #128
	; CHECK-NEXT: st1h { z0.h }, p0, [sp]			; CHECK-NEXT: st1h { z0.h }, p0, [sp]
	; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]			; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
	; CHECK-NEXT: csel x9, x10, x9, lo			; CHECK-NEXT: csel x9, x10, x9, lo
	; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8, x9, lsl #1]			; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8, x9, lsl #1]
	; CHECK-NEXT: addvl sp, sp, #2			; CHECK-NEXT: addvl sp, sp, #2
	; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 8)			%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 128)
	ret <vscale x 8 x half> %res			ret <vscale x 8 x half> %res
	}			}

	define <vscale x 2 x float> @splice_nxv2f32_neg_idx(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {			define <vscale x 2 x float> @splice_nxv2f32_neg_idx(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {
	; CHECK-LABEL: splice_nxv2f32_neg_idx:			; CHECK-LABEL: splice_nxv2f32_neg_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ptrue p0.d			; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: lastb d0, p0, z0.d			; CHECK-NEXT: lastb d0, p0, z0.d
	(21 lines not shown)
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b, i32 -2)			%res = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b, i32 -2)
	ret <vscale x 2 x float> %res			ret <vscale x 2 x float> %res
	}			}

	define <vscale x 2 x float> @splice_nxv2f32_first_idx(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {			define <vscale x 2 x float> @splice_nxv2f32_first_idx(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {
	; CHECK-LABEL: splice_nxv2f32_first_idx:			; CHECK-LABEL: splice_nxv2f32_first_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #0			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b, i32 0)			%res = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b, i32 1)
	ret <vscale x 2 x float> %res			ret <vscale x 2 x float> %res
	}			}

	define <vscale x 2 x float> @splice_nxv2f32_1_idx(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {			define <vscale x 2 x float> @splice_nxv2f32_last_idx(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {
	; CHECK-LABEL: splice_nxv2f32_1_idx:			; CHECK-LABEL: splice_nxv2f32_last_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #8			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #248
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b, i32 1)			%res = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b, i32 31)
	ret <vscale x 2 x float> %res			ret <vscale x 2 x float> %res
	}			}

	; Ensure index is clamped when we cannot prove it's less than VL-1.			; Ensure index is clamped when we cannot prove it's less than 2048-bit.
	define <vscale x 2 x float> @splice_nxv2f32_last_idx(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {			define <vscale x 2 x float> @splice_nxv2f32_clamped_idx(<vscale x 2 x float> %a, <vscale x 2 x float> %b) #0 {
	; CHECK-LABEL: splice_nxv2f32_last_idx:			; CHECK-LABEL: splice_nxv2f32_clamped_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: addvl sp, sp, #-2			; CHECK-NEXT: addvl sp, sp, #-2
	; CHECK-NEXT: cntd x10			; CHECK-NEXT: cntd x10
	; CHECK-NEXT: sub x10, x10, #1			; CHECK-NEXT: sub x10, x10, #1
	; CHECK-NEXT: mov w9, #2			; CHECK-NEXT: mov w9, #32
	; CHECK-NEXT: cmp x10, #2			; CHECK-NEXT: cmp x10, #32
	; CHECK-NEXT: ptrue p0.s			; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: mov x8, sp			; CHECK-NEXT: mov x8, sp
	; CHECK-NEXT: csel x9, x10, x9, lo			; CHECK-NEXT: csel x9, x10, x9, lo
	; CHECK-NEXT: ptrue p1.b			; CHECK-NEXT: ptrue p1.b
	; CHECK-NEXT: st1w { z0.s }, p0, [sp]			; CHECK-NEXT: st1w { z0.s }, p0, [sp]
	; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]			; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
	; CHECK-NEXT: lsl x9, x9, #3			; CHECK-NEXT: lsl x9, x9, #3
	; CHECK-NEXT: ld1b { z0.b }, p1/z, [x8, x9]			; CHECK-NEXT: ld1b { z0.b }, p1/z, [x8, x9]
	; CHECK-NEXT: addvl sp, sp, #2			; CHECK-NEXT: addvl sp, sp, #2
	; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b, i32 2)			%res = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b, i32 32)
	ret <vscale x 2 x float> %res			ret <vscale x 2 x float> %res
	}			}

	define <vscale x 4 x float> @splice_nxv4f32_first_idx(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {			define <vscale x 4 x float> @splice_nxv4f32_first_idx(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
	; CHECK-LABEL: splice_nxv4f32_first_idx:			; CHECK-LABEL: splice_nxv4f32_first_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #0			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #4
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b, i32 0)			%res = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b, i32 1)
	ret <vscale x 4 x float> %res			ret <vscale x 4 x float> %res
	}			}

	define <vscale x 4 x float> @splice_nxv4f32_last_idx(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {			define <vscale x 4 x float> @splice_nxv4f32_last_idx(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
	; CHECK-LABEL: splice_nxv4f32_last_idx:			; CHECK-LABEL: splice_nxv4f32_last_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #12			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #252
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b, i32 3)			%res = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b, i32 63)
	ret <vscale x 4 x float> %res			ret <vscale x 4 x float> %res
	}			}

	; Ensure index is clamped when we cannot prove it's less than VL-1.			; Ensure index is clamped when we cannot prove it's less than 2048-bit.
	define <vscale x 4 x float> @splice_nxv4f32_clamped_idx(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {			define <vscale x 4 x float> @splice_nxv4f32_clamped_idx(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
	; CHECK-LABEL: splice_nxv4f32_clamped_idx:			; CHECK-LABEL: splice_nxv4f32_clamped_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: addvl sp, sp, #-2			; CHECK-NEXT: addvl sp, sp, #-2
	; CHECK-NEXT: cntw x10			; CHECK-NEXT: cntw x10
	; CHECK-NEXT: sub x10, x10, #1			; CHECK-NEXT: sub x10, x10, #1
	; CHECK-NEXT: ptrue p0.s			; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: mov x8, sp			; CHECK-NEXT: mov x8, sp
	; CHECK-NEXT: mov w9, #4			; CHECK-NEXT: mov w9, #64
	; CHECK-NEXT: cmp x10, #4			; CHECK-NEXT: cmp x10, #64
	; CHECK-NEXT: st1w { z0.s }, p0, [sp]			; CHECK-NEXT: st1w { z0.s }, p0, [sp]
	; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]			; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
	; CHECK-NEXT: csel x9, x10, x9, lo			; CHECK-NEXT: csel x9, x10, x9, lo
	; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8, x9, lsl #2]			; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8, x9, lsl #2]
	; CHECK-NEXT: addvl sp, sp, #2			; CHECK-NEXT: addvl sp, sp, #2
	; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b, i32 4)			%res = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b, i32 64)
	ret <vscale x 4 x float> %res			ret <vscale x 4 x float> %res
	}			}

	define <vscale x 2 x double> @splice_nxv2f64_first_idx(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {			define <vscale x 2 x double> @splice_nxv2f64_first_idx(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
	; CHECK-LABEL: splice_nxv2f64_first_idx:			; CHECK-LABEL: splice_nxv2f64_first_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #0			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 0)			%res = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 1)
	ret <vscale x 2 x double> %res			ret <vscale x 2 x double> %res
	}			}

	define <vscale x 2 x double> @splice_nxv2f64_last_idx(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {			define <vscale x 2 x double> @splice_nxv2f64_last_idx(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
	; CHECK-LABEL: splice_nxv2f64_last_idx:			; CHECK-LABEL: splice_nxv2f64_last_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ext z0.b, z0.b, z1.b, #8			; CHECK-NEXT: ext z0.b, z0.b, z1.b, #248
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 1)			%res = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 31)
	ret <vscale x 2 x double> %res			ret <vscale x 2 x double> %res
	}			}

	; Ensure index is clamped when we cannot prove it's less than VL-1.			; Ensure index is clamped when we cannot prove it's less than 2048-bit.
	define <vscale x 2 x double> @splice_nxv2f64_clamped_idx(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {			define <vscale x 2 x double> @splice_nxv2f64_clamped_idx(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
	; CHECK-LABEL: splice_nxv2f64_clamped_idx:			; CHECK-LABEL: splice_nxv2f64_clamped_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: addvl sp, sp, #-2			; CHECK-NEXT: addvl sp, sp, #-2
	; CHECK-NEXT: cntd x10			; CHECK-NEXT: cntd x10
	; CHECK-NEXT: sub x10, x10, #1			; CHECK-NEXT: sub x10, x10, #1
	; CHECK-NEXT: ptrue p0.d			; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: mov x8, sp			; CHECK-NEXT: mov x8, sp
	; CHECK-NEXT: mov w9, #2			; CHECK-NEXT: mov w9, #32
	; CHECK-NEXT: cmp x10, #2			; CHECK-NEXT: cmp x10, #32
	; CHECK-NEXT: st1d { z0.d }, p0, [sp]			; CHECK-NEXT: st1d { z0.d }, p0, [sp]
	; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]			; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]
	; CHECK-NEXT: csel x9, x10, x9, lo			; CHECK-NEXT: csel x9, x10, x9, lo
	; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8, x9, lsl #3]			; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8, x9, lsl #3]
	; CHECK-NEXT: addvl sp, sp, #2			; CHECK-NEXT: addvl sp, sp, #2
	; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 2)			%res = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 32)
	ret <vscale x 2 x double> %res			ret <vscale x 2 x double> %res
	}			}

	; Ensure predicate based splice is promoted to use ZPRs.			; Ensure predicate based splice is promoted to use ZPRs.
	define <vscale x 2 x i1> @splice_nxv2i1_idx(<vscale x 2 x i1> %a, <vscale x 2 x i1> %b) #0 {			define <vscale x 2 x i1> @splice_nxv2i1_idx(<vscale x 2 x i1> %a, <vscale x 2 x i1> %b) #0 {
	; CHECK-LABEL: splice_nxv2i1_idx:			; CHECK-LABEL: splice_nxv2i1_idx:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov z0.d, p1/z, #1 // =0x1			; CHECK-NEXT: mov z0.d, p1/z, #1 // =0x1
	(669 lines not shown)

[AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit (Closed, Public)

Details

Diff Detail

Event Timeline