This is an archive of the discontinued LLVM Phabricator instance.

Differential D126487

[SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR.
Closed · Public

Authored by paulwalker-arm on May 26 2022, 9:49 AM.

Details

Reviewers

efriedma
CarolineConcatto
david-arm
sdesmalen
peterwaller-arm

Commits

rG48ea26a3878f: [SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR.

Summary

LowerINSERT_SUBVECTOR emits AArch64ISD::UUNPK## when lowering
scalable vector floating point INSERT_SUBVECTOR. However, these
nodes only make sense for integer types and thus isel patterns do
not exist for floating point, which leads to isel failures.

This patch ensures floating point operands are cast to integer
before the core lowering takes place.

Fixes: #55037
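
The idea of casting floating point operands to integer before the core lowering can be illustrated outside of LLVM. The following is a minimal scalar sketch in plain C++, not LLVM code: `floatToBits`, `bitsToFloat`, `swapHalves` and `swapHalvesFloat` are hypothetical names, and `swapHalves` merely stands in for any lane-moving operation that is only defined for integer elements. It assumes a 32-bit IEEE `float`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Reinterpret the bits of a float as an integer (no value conversion).
inline std::uint32_t floatToBits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Reinterpret an integer bit pattern as a float (inverse of floatToBits).
inline float bitsToFloat(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Hypothetical integer-only lane operation: swaps the two halves of a
// four-lane vector. It only moves lanes, it never inspects their values.
inline std::array<std::uint32_t, 4>
swapHalves(const std::array<std::uint32_t, 4> &v) {
  return {v[2], v[3], v[0], v[1]};
}

// Apply the integer-only operation to floats: cast in, operate, cast out.
inline std::array<float, 4> swapHalvesFloat(const std::array<float, 4> &v) {
  std::array<std::uint32_t, 4> bits{};
  for (std::size_t i = 0; i < v.size(); ++i)
    bits[i] = floatToBits(v[i]);
  const std::array<std::uint32_t, 4> moved = swapHalves(bits);
  std::array<float, 4> out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = bitsToFloat(moved[i]);
  return out;
}
```

Because the operation only rearranges lanes, the bit patterns survive the round trip unchanged, which is why a cast on either side of an integer-only node is sufficient.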

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,030 ms	x64 debian > MLIR.Examples/standalone::test.toy

Event Timeline

paulwalker-arm created this revision. · May 26 2022, 9:49 AM

Herald added a reviewer: efriedma. · May 26 2022, 9:49 AM

Herald added a project: Restricted Project.

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

paulwalker-arm requested review of this revision. · May 26 2022, 9:49 AM

Herald added a project: Restricted Project. · May 26 2022, 9:49 AM

Herald added subscribers: llvm-commits, alextsao1999.

paulwalker-arm added reviewers: CarolineConcatto, david-arm, sdesmalen, peterwaller-arm. · May 26 2022, 9:55 AM

Harbormaster completed remote builds in B166499: Diff 432316. · May 26 2022, 10:21 AM

LGTM.

An observation: nounwind only appears to affect output for 5 tests, in case there is an appetite to remove it elsewhere: insert_v2i64_nxv2i64_idx2 / insert_v4i32_nxv4i32_idx4 / insert_v16i8_nxv16i8_idx16 / insert_nxv8f16_nxv2f16 / insert_nxv4bf16_v4bf16.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11496	nit: s/concatinate/concatenate/ s/dependant/dependent/ or perhaps s/dependant/depending/.

This revision is now accepted and ready to land. · May 30 2022, 4:35 AM

david-arm added inline comments. · May 30 2022, 5:49 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11480–11484	Are these names the wrong way around? I would have expected the wider `VT` to be called `WideVT`. We're inserting a narrower InVT subvector into a wider VT vector.

paulwalker-arm added inline comments. · May 30 2022, 5:54 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11480–11484	It's wider in the context of the element type since both types have the same total bit length. `WideVT` is the wider VT because it's created based on the element count. `InVT` (i.e. the subvector) has fewer elements than `VT` and thus its element type will need to be wider in order to match the total bit length of `NarrowVT`.

david-arm added inline comments. · May 30 2022, 5:58 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11480–11484	OK, fair enough. So the width or narrowness refers to the element types then? I just found it really confusing that's all, as intuitively I was expecting an insert subvector operation to insert a narrower VT into a wider one. I guess what you actually mean here is WideElementVT and NarrowElementVT. I was thinking of widening in the legalisation sense, i.e. widen a <vscale x 3 x f32> -> <vscale x 4 x f32>.

paulwalker-arm added inline comments. · May 30 2022, 6:04 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11480–11484	Although true for the original insert subvector operation the confusion here is that when lowering we're actually merging two vectors of equal length.

david-arm added inline comments. · May 30 2022, 6:10 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11480–11484	Yeah I see. In that case would it be possible to leave a simple comment before merging this patch, just explaining a bit of what you said earlier, i.e. that narrow and wide here refer to the element types?

paulwalker-arm added inline comments. · May 30 2022, 6:24 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11480–11484	Sure, will do.
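
The naming discussed in this thread can be checked with a little arithmetic. The sketch below is a hypothetical model in plain C++, not the LLVM helper itself: it assumes a packed SVE vector covers 128 bits per unit of vscale, so the element width follows from the minimum element count alone. `packedElementBits`, `InsertElementBits` and `elementBitsForInsert` are illustrative names.

```cpp
#include <cassert>

// Assumed number of bits in a packed SVE vector per unit of vscale.
constexpr unsigned kGranuleBits = 128;

// Hypothetical model of picking a packed integer type from a minimum
// element count: the element width is whatever fills the 128-bit granule.
inline unsigned packedElementBits(unsigned minElementCount) {
  assert((minElementCount == 2 || minElementCount == 4 ||
          minElementCount == 8 || minElementCount == 16) &&
         "no packed type modeled for this element count");
  return kGranuleBits / minElementCount;
}

struct InsertElementBits {
  unsigned narrowBits; // element width for the full vector (NarrowVT)
  unsigned wideBits;   // element width for the subvector (WideVT)
};

// The full vector has twice as many elements as the subvector, so its
// elements are half as wide once both are cast to the same bit length.
inline InsertElementBits elementBitsForInsert(unsigned vecCount,
                                              unsigned subvecCount) {
  assert(vecCount == subvecCount * 2 && "subvector must be half the size");
  return {packedElementBits(vecCount), packedElementBits(subvecCount)};
}
```

For example, inserting a two-element subvector into a four-element vector gives 32-bit "narrow" elements and 64-bit "wide" elements under this model, and both vectors span the same 128 bits.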

Allen added a subscriber: Allen. · May 30 2022, 6:27 AM

Fixed typos and improved comments.

Harbormaster completed remote builds in B166940: Diff 432952. · May 30 2022, 10:44 AM

LGTM! Thanks @paulwalker-arm. :)

Matt added a subscriber: Matt. · Jun 1 2022, 7:30 PM

This revision was landed with ongoing or failed builds. · Jun 2 2022, 7:07 AM

Closed by commit rG48ea26a3878f: [SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR. (authored by paulwalker-arm).

This revision was automatically updated to reflect the committed changes.

paulwalker-arm added a commit: rG48ea26a3878f: [SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	33 lines
llvm/test/CodeGen/AArch64/sve-insert-vector.ll	152 lines

Diff 432952

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

(11,471 preceding lines not shown; enclosing context: if (VT.getVectorElementType() == MVT::i1) {)
         return DAG.getNode(AArch64ISD::UZP1, DL, VT, Lo, NewHi);
       }
     }

     // Ensure the subvector is half the size of the main vector.
     if (VT.getVectorElementCount() != (InVT.getVectorElementCount() * 2))
       return SDValue();

-    EVT WideVT;
-    SDValue ExtVec;
+    // Here narrow and wide refers to the vector element types. After "casting"
+    // both vectors must have the same bit length and so because the subvector
+    // has fewer elements, those elements need to be bigger.
+    EVT NarrowVT = getPackedSVEVectorVT(VT.getVectorElementCount());
+    EVT WideVT = getPackedSVEVectorVT(InVT.getVectorElementCount());

+    // NOP cast operands to the largest legal vector of the same element count.
     if (VT.isFloatingPoint()) {
-      // The InVT type should be legal. We can safely cast the unpacked
-      // subvector from InVT -> VT.
-      WideVT = VT;
-      ExtVec = getSVESafeBitCast(VT, Vec1, DAG);
+      Vec0 = getSVESafeBitCast(NarrowVT, Vec0, DAG);
+      Vec1 = getSVESafeBitCast(WideVT, Vec1, DAG);
     } else {
-      // Extend elements of smaller vector...
-      WideVT = InVT.widenIntegerVectorElementType(*(DAG.getContext()));
-      ExtVec = DAG.getNode(ISD::ANY_EXTEND, DL, WideVT, Vec1);
+      // Legal integer vectors are already their largest so Vec0 is fine as is.
+      Vec1 = DAG.getNode(ISD::ANY_EXTEND, DL, WideVT, Vec1);
     }

+    // To replace the top/bottom half of vector V with vector SubV we widen the
+    // preserved half of V, concatenate this to SubV (the order depending on the
+    // half being replaced) and then narrow the result.
+    SDValue Narrow;
     if (Idx == 0) {
       SDValue HiVec0 = DAG.getNode(AArch64ISD::UUNPKHI, DL, WideVT, Vec0);
-      return DAG.getNode(AArch64ISD::UZP1, DL, VT, ExtVec, HiVec0);
-    } else if (Idx == InVT.getVectorMinNumElements()) {
+      Narrow = DAG.getNode(AArch64ISD::UZP1, DL, NarrowVT, Vec1, HiVec0);
+    } else {
+      assert(Idx == InVT.getVectorMinNumElements() &&
+             "Invalid subvector index!");
       SDValue LoVec0 = DAG.getNode(AArch64ISD::UUNPKLO, DL, WideVT, Vec0);
-      return DAG.getNode(AArch64ISD::UZP1, DL, VT, LoVec0, ExtVec);
+      Narrow = DAG.getNode(AArch64ISD::UZP1, DL, NarrowVT, LoVec0, Vec1);
     }

-    return SDValue();
+    return getSVESafeBitCast(VT, Narrow, DAG);
   }

   if (Idx == 0 && isPackedVectorType(VT, DAG)) {
     // This will be matched by custom code during ISelDAGToDAG.
     if (Vec0.isUndef())
       return Op;

     Optional<unsigned> PredPattern =
(9,742 following lines not shown)
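
The widen, concatenate and narrow sequence described in the code comment can be modeled with ordinary arrays. This is a minimal sketch in plain C++, not LLVM or SVE code: it assumes vscale is 1 (a single 128-bit vector of four 32-bit lanes), and `unpackLo`, `unpackHi`, `asNarrow`, `uzp1` and `insertHalf` are hypothetical helpers that only imitate the lane behaviour of UUNPKLO, UUNPKHI, a bitcast and UZP1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: vscale == 1, so one vector holds four 32-bit lanes.
constexpr std::size_t kNarrowCount = 4;
constexpr std::size_t kWideCount = kNarrowCount / 2;

using NarrowVec = std::array<std::uint32_t, kNarrowCount>;
using WideVec = std::array<std::uint64_t, kWideCount>;

// Models UUNPKLO: zero-extend the low half of the narrow lanes.
inline WideVec unpackLo(const NarrowVec &v) {
  WideVec out{};
  for (std::size_t i = 0; i < kWideCount; ++i)
    out[i] = v[i];
  return out;
}

// Models UUNPKHI: zero-extend the high half of the narrow lanes.
inline WideVec unpackHi(const NarrowVec &v) {
  WideVec out{};
  for (std::size_t i = 0; i < kWideCount; ++i)
    out[i] = v[kWideCount + i];
  return out;
}

// Views wide lanes as narrow lanes, low 32 bits of each wide lane first.
inline NarrowVec asNarrow(const WideVec &w) {
  NarrowVec out{};
  for (std::size_t i = 0; i < kWideCount; ++i) {
    out[2 * i] = static_cast<std::uint32_t>(w[i]);
    out[2 * i + 1] = static_cast<std::uint32_t>(w[i] >> 32);
  }
  return out;
}

// Models UZP1: the even-numbered lanes of a, followed by those of b.
inline NarrowVec uzp1(const NarrowVec &a, const NarrowVec &b) {
  NarrowVec out{};
  for (std::size_t i = 0; i < kWideCount; ++i) {
    out[i] = a[2 * i];
    out[kWideCount + i] = b[2 * i];
  }
  return out;
}

// Replaces the low half (idx == 0) or high half (idx == kWideCount) of vec
// with sub. Each element of sub sits in the low 32 bits of a wide lane; the
// upper 32 bits are ignored, as they would be after an any-extend.
inline NarrowVec insertHalf(const NarrowVec &vec, const WideVec &sub,
                            std::size_t idx) {
  if (idx == 0)
    return uzp1(asNarrow(sub), asNarrow(unpackHi(vec)));
  assert(idx == kWideCount && "Invalid subvector index!");
  return uzp1(asNarrow(unpackLo(vec)), asNarrow(sub));
}
```

In this model the preserved half of the main vector is widened so that its elements line up with those of the subvector, and the even-lane selection then drops the padding from both operands in a single step.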

llvm/test/CodeGen/AArch64/sve-insert-vector.ll

	Show First 20 Lines • Show All 293 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: uunpklo z2.d, z2.s			; CHECK-NEXT: uunpklo z2.d, z2.s
	; CHECK-NEXT: uzp1 z1.s, z2.s, z1.s			; CHECK-NEXT: uzp1 z1.s, z2.s, z1.s
	; CHECK-NEXT: uzp1 z0.h, z1.h, z0.h			; CHECK-NEXT: uzp1 z0.h, z1.h, z0.h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = call <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.nxv2i16(<vscale x 8 x i16> %vec, <vscale x 2 x i16> %in, i64 2)			%r = call <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.nxv2i16(<vscale x 8 x i16> %vec, <vscale x 2 x i16> %in, i64 2)
	ret <vscale x 8 x i16> %r			ret <vscale x 8 x i16> %r
	}			}

				define <vscale x 4 x half> @insert_nxv4f16_nxv2f16_0(<vscale x 4 x half> %sv0, <vscale x 2 x half> %sv1) nounwind {
				; CHECK-LABEL: insert_nxv4f16_nxv2f16_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpkhi z0.d, z0.s
				; CHECK-NEXT: uzp1 z0.s, z1.s, z0.s
				; CHECK-NEXT: ret
				%v0 = call <vscale x 4 x half> @llvm.experimental.vector.insert.nxv4f16.nxv2f16(<vscale x 4 x half> %sv0, <vscale x 2 x half> %sv1, i64 0)
				ret <vscale x 4 x half> %v0
				}

				define <vscale x 4 x half> @insert_nxv4f16_nxv2f16_2(<vscale x 4 x half> %sv0, <vscale x 2 x half> %sv1) nounwind {
				; CHECK-LABEL: insert_nxv4f16_nxv2f16_2:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpklo z0.d, z0.s
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%v0 = call <vscale x 4 x half> @llvm.experimental.vector.insert.nxv4f16.nxv2f16(<vscale x 4 x half> %sv0, <vscale x 2 x half> %sv1, i64 2)
				ret <vscale x 4 x half> %v0
				}

	; Test that the index is scaled by vscale if the subvector is scalable.			; Test that the index is scaled by vscale if the subvector is scalable.
	define <vscale x 8 x half> @insert_nxv8f16_nxv2f16(<vscale x 8 x half> %vec, <vscale x 2 x half> %in) nounwind {			define <vscale x 8 x half> @insert_nxv8f16_nxv2f16(<vscale x 8 x half> %vec, <vscale x 2 x half> %in) nounwind {
	; CHECK-LABEL: insert_nxv8f16_nxv2f16:			; CHECK-LABEL: insert_nxv8f16_nxv2f16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: addvl sp, sp, #-1			; CHECK-NEXT: addvl sp, sp, #-1
	; CHECK-NEXT: ptrue p0.h			; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: ptrue p1.d			; CHECK-NEXT: ptrue p1.d
	; CHECK-NEXT: st1h { z0.h }, p0, [sp]			; CHECK-NEXT: st1h { z0.h }, p0, [sp]
	; CHECK-NEXT: st1h { z1.d }, p1, [sp, #1, mul vl]			; CHECK-NEXT: st1h { z1.d }, p1, [sp, #1, mul vl]
	; CHECK-NEXT: ld1h { z0.h }, p0/z, [sp]			; CHECK-NEXT: ld1h { z0.h }, p0/z, [sp]
	; CHECK-NEXT: addvl sp, sp, #1			; CHECK-NEXT: addvl sp, sp, #1
	; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%r = call <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv2f16(<vscale x 8 x half> %vec, <vscale x 2 x half> %in, i64 2)			%r = call <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv2f16(<vscale x 8 x half> %vec, <vscale x 2 x half> %in, i64 2)
	ret <vscale x 8 x half> %r			ret <vscale x 8 x half> %r
	}			}

				define <vscale x 8 x half> @insert_nxv8f16_nxv4f16_0(<vscale x 8 x half> %sv0, <vscale x 4 x half> %sv1) nounwind {
				; CHECK-LABEL: insert_nxv8f16_nxv4f16_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpkhi z0.s, z0.h
				; CHECK-NEXT: uzp1 z0.h, z1.h, z0.h
				; CHECK-NEXT: ret
				%v0 = call <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv4f16(<vscale x 8 x half> %sv0, <vscale x 4 x half> %sv1, i64 0)
				ret <vscale x 8 x half> %v0
				}

				define <vscale x 8 x half> @insert_nxv8f16_nxv4f16_4(<vscale x 8 x half> %sv0, <vscale x 4 x half> %sv1) nounwind {
				; CHECK-LABEL: insert_nxv8f16_nxv4f16_4:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpklo z0.s, z0.h
				; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%v0 = call <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv4f16(<vscale x 8 x half> %sv0, <vscale x 4 x half> %sv1, i64 4)
				ret <vscale x 8 x half> %v0
				}

	; Fixed length clamping			; Fixed length clamping

	define <vscale x 2 x i64> @insert_fixed_v2i64_nxv2i64(<vscale x 2 x i64> %vec, <2 x i64> %subvec) nounwind #0 {			define <vscale x 2 x i64> @insert_fixed_v2i64_nxv2i64(<vscale x 2 x i64> %vec, <2 x i64> %subvec) nounwind #0 {
	; CHECK-LABEL: insert_fixed_v2i64_nxv2i64:			; CHECK-LABEL: insert_fixed_v2i64_nxv2i64:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: addvl sp, sp, #-1			; CHECK-NEXT: addvl sp, sp, #-1
	; CHECK-NEXT: cntd x8			; CHECK-NEXT: cntd x8
	Show All 34 Lines
	; CHECK-NEXT: addvl sp, sp, #1			; CHECK-NEXT: addvl sp, sp, #1
	; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%subvec = load <4 x i64>, <4 x i64>* %ptr			%subvec = load <4 x i64>, <4 x i64>* %ptr
	%retval = call <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v4i64(<vscale x 2 x i64> %vec, <4 x i64> %subvec, i64 4)			%retval = call <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v4i64(<vscale x 2 x i64> %vec, <4 x i64> %subvec, i64 4)
	ret <vscale x 2 x i64> %retval			ret <vscale x 2 x i64> %retval
	}			}

	attributes #0 = { vscale_range(2,2) }

	declare <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v2i64(<vscale x 2 x i64>, <2 x i64>, i64)
	declare <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32>, <4 x i32>, i64)
	declare <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.v8i16(<vscale x 8 x i16>, <8 x i16>, i64)
	declare <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv16i8.v16i8(<vscale x 16 x i8>, <16 x i8>, i64)

	declare <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v4i64(<vscale x 2 x i64>, <4 x i64>, i64)

	declare <vscale x 16 x i64> @llvm.experimental.vector.insert.nxv8i64.nxv16i64(<vscale x 16 x i64>, <vscale x 8 x i64>, i64)
	declare <vscale x 16 x i64> @llvm.experimental.vector.insert.v2i64.nxv16i64(<vscale x 16 x i64>, <2 x i64>, i64)
	declare <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv1i32(<vscale x 4 x i32>, <vscale x 1 x i32>, i64)
	declare <vscale x 6 x i16> @llvm.experimental.vector.insert.nxv6i16.nxv1i16(<vscale x 6 x i16>, <vscale x 1 x i16>, i64)

	declare <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.nxv2i16(<vscale x 8 x i16>, <vscale x 2 x i16>, i64)

	declare <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv2f16(<vscale x 8 x half>, <vscale x 2 x half>, i64)

	;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;			;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
	;; Upacked types that need result widening			;; Upacked types that need result widening
	;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;			;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

	define <vscale x 3 x i32> @insert_nxv3i32_nxv2i32(<vscale x 2 x i32> %sv0) {			define <vscale x 3 x i32> @insert_nxv3i32_nxv2i32(<vscale x 2 x i32> %sv0) {
	; CHECK-LABEL: insert_nxv3i32_nxv2i32:			; CHECK-LABEL: insert_nxv3i32_nxv2i32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: uzp1 z0.s, z0.s, z0.s			; CHECK-NEXT: uzp1 z0.s, z0.s, z0.s
	Show All 17 Lines
	; CHECK-LABEL: insert_nxv3f32_nxv2f32:			; CHECK-LABEL: insert_nxv3f32_nxv2f32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: uzp1 z0.s, z0.s, z0.s			; CHECK-NEXT: uzp1 z0.s, z0.s, z0.s
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v0 = call <vscale x 3 x float> @llvm.experimental.vector.insert.nxv3f32.nxv2f32(<vscale x 3 x float> undef, <vscale x 2 x float> %sv0, i64 0)			%v0 = call <vscale x 3 x float> @llvm.experimental.vector.insert.nxv3f32.nxv2f32(<vscale x 3 x float> undef, <vscale x 2 x float> %sv0, i64 0)
	ret <vscale x 3 x float> %v0			ret <vscale x 3 x float> %v0
	}			}

				define <vscale x 4 x float> @insert_nxv4f32_nxv2f32_0(<vscale x 4 x float> %sv0, <vscale x 2 x float> %sv1) nounwind {
				; CHECK-LABEL: insert_nxv4f32_nxv2f32_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpkhi z0.d, z0.s
				; CHECK-NEXT: uzp1 z0.s, z1.s, z0.s
				; CHECK-NEXT: ret
				%v0 = call <vscale x 4 x float> @llvm.experimental.vector.insert.nxv4f32.nxv2f32(<vscale x 4 x float> %sv0, <vscale x 2 x float> %sv1, i64 0)
				ret <vscale x 4 x float> %v0
				}

				define <vscale x 4 x float> @insert_nxv4f32_nxv2f32_2(<vscale x 4 x float> %sv0, <vscale x 2 x float> %sv1) nounwind {
				; CHECK-LABEL: insert_nxv4f32_nxv2f32_2:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpklo z0.d, z0.s
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%v0 = call <vscale x 4 x float> @llvm.experimental.vector.insert.nxv4f32.nxv2f32(<vscale x 4 x float> %sv0, <vscale x 2 x float> %sv1, i64 2)
				ret <vscale x 4 x float> %v0
				}

	define <vscale x 6 x i32> @insert_nxv6i32_nxv2i32(<vscale x 2 x i32> %sv0, <vscale x 2 x i32> %sv1) nounwind {			define <vscale x 6 x i32> @insert_nxv6i32_nxv2i32(<vscale x 2 x i32> %sv0, <vscale x 2 x i32> %sv1) nounwind {
	; CHECK-LABEL: insert_nxv6i32_nxv2i32:			; CHECK-LABEL: insert_nxv6i32_nxv2i32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: addvl sp, sp, #-2			; CHECK-NEXT: addvl sp, sp, #-2
	; CHECK-NEXT: ptrue p0.s			; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s			; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
	; CHECK-NEXT: st1w { z0.s }, p0, [sp]			; CHECK-NEXT: st1w { z0.s }, p0, [sp]
	▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: ptrue p0.h, vl8			; CHECK-NEXT: ptrue p0.h, vl8
	; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1			; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
	; CHECK-NEXT: mov z0.h, p0/m, z1.h			; CHECK-NEXT: mov z0.h, p0/m, z1.h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v0 = call <vscale x 8 x bfloat> @llvm.experimental.vector.insert.nxv8bf16.v8bf16(<vscale x 8 x bfloat> %sv0, <8 x bfloat> %v1, i64 0)			%v0 = call <vscale x 8 x bfloat> @llvm.experimental.vector.insert.nxv8bf16.v8bf16(<vscale x 8 x bfloat> %sv0, <8 x bfloat> %v1, i64 0)
	ret <vscale x 8 x bfloat> %v0			ret <vscale x 8 x bfloat> %v0
	}			}

				define <vscale x 8 x bfloat> @insert_nxv8bf16_nxv4bf16_0(<vscale x 8 x bfloat> %sv0, <vscale x 4 x bfloat> %sv1) nounwind {
				; CHECK-LABEL: insert_nxv8bf16_nxv4bf16_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpkhi z0.s, z0.h
				; CHECK-NEXT: uzp1 z0.h, z1.h, z0.h
				; CHECK-NEXT: ret
				%v0 = call <vscale x 8 x bfloat> @llvm.experimental.vector.insert.nxv8bf16.nxv4bf16(<vscale x 8 x bfloat> %sv0, <vscale x 4 x bfloat> %sv1, i64 0)
				ret <vscale x 8 x bfloat> %v0
				}

				define <vscale x 8 x bfloat> @insert_nxv8bf16_nxv4bf16_4(<vscale x 8 x bfloat> %sv0, <vscale x 4 x bfloat> %sv1) nounwind {
				; CHECK-LABEL: insert_nxv8bf16_nxv4bf16_4:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpklo z0.s, z0.h
				; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%v0 = call <vscale x 8 x bfloat> @llvm.experimental.vector.insert.nxv8bf16.nxv4bf16(<vscale x 8 x bfloat> %sv0, <vscale x 4 x bfloat> %sv1, i64 4)
				ret <vscale x 8 x bfloat> %v0
				}

				define <vscale x 4 x bfloat> @insert_nxv4bf16_nxv2bf16_0(<vscale x 4 x bfloat> %sv0, <vscale x 2 x bfloat> %sv1) nounwind {
				; CHECK-LABEL: insert_nxv4bf16_nxv2bf16_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpkhi z0.d, z0.s
				; CHECK-NEXT: uzp1 z0.s, z1.s, z0.s
				; CHECK-NEXT: ret
				%v0 = call <vscale x 4 x bfloat> @llvm.experimental.vector.insert.nxv4bf16.nxv2bf16(<vscale x 4 x bfloat> %sv0, <vscale x 2 x bfloat> %sv1, i64 0)
				ret <vscale x 4 x bfloat> %v0
				}

				define <vscale x 4 x bfloat> @insert_nxv4bf16_nxv2bf16_2(<vscale x 4 x bfloat> %sv0, <vscale x 2 x bfloat> %sv1) nounwind {
				; CHECK-LABEL: insert_nxv4bf16_nxv2bf16_2:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpklo z0.d, z0.s
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%v0 = call <vscale x 4 x bfloat> @llvm.experimental.vector.insert.nxv4bf16.nxv2bf16(<vscale x 4 x bfloat> %sv0, <vscale x 2 x bfloat> %sv1, i64 2)
				ret <vscale x 4 x bfloat> %v0
				}

	; Test predicate inserts of half size.			; Test predicate inserts of half size.
	define <vscale x 16 x i1> @insert_nxv16i1_nxv8i1_0(<vscale x 16 x i1> %vec, <vscale x 8 x i1> %sv) {			define <vscale x 16 x i1> @insert_nxv16i1_nxv8i1_0(<vscale x 16 x i1> %vec, <vscale x 8 x i1> %sv) {
	; CHECK-LABEL: insert_nxv16i1_nxv8i1_0:			; CHECK-LABEL: insert_nxv16i1_nxv8i1_0:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: punpkhi p0.h, p0.b			; CHECK-NEXT: punpkhi p0.h, p0.b
	; CHECK-NEXT: uzp1 p0.b, p1.b, p0.b			; CHECK-NEXT: uzp1 p0.b, p1.b, p0.b
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v0 = call <vscale x 16 x i1> @llvm.experimental.vector.insert.nx16i1.nxv8i1(<vscale x 16 x i1> %vec, <vscale x 8 x i1> %sv, i64 0)			%v0 = call <vscale x 16 x i1> @llvm.experimental.vector.insert.nx16i1.nxv8i1(<vscale x 16 x i1> %vec, <vscale x 8 x i1> %sv, i64 0)
	▲ Show 20 Lines • Show All 91 Lines • ▼ Show 20 Lines
	; CHECK-LABEL: insert_nxv16i1_v64i1_const_true_into_undef:			; CHECK-LABEL: insert_nxv16i1_v64i1_const_true_into_undef:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ptrue p0.b			; CHECK-NEXT: ptrue p0.b
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v0 = call <vscale x 16 x i1> @llvm.experimental.vector.insert.nxv16i1.v64i1 (<vscale x 16 x i1> undef, <64 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i64 0)			%v0 = call <vscale x 16 x i1> @llvm.experimental.vector.insert.nxv16i1.v64i1 (<vscale x 16 x i1> undef, <64 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i64 0)
	ret <vscale x 16 x i1> %v0			ret <vscale x 16 x i1> %v0
	}			}

				attributes #0 = { vscale_range(2,2) }

				declare <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv16i8.v16i8(<vscale x 16 x i8>, <16 x i8>, i64)

				declare <vscale x 6 x i16> @llvm.experimental.vector.insert.nxv6i16.nxv1i16(<vscale x 6 x i16>, <vscale x 1 x i16>, i64)
				declare <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.nxv2i16(<vscale x 8 x i16>, <vscale x 2 x i16>, i64)
				declare <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.v8i16(<vscale x 8 x i16>, <8 x i16>, i64)

	declare <vscale x 3 x i32> @llvm.experimental.vector.insert.nxv3i32.nxv2i32(<vscale x 3 x i32>, <vscale x 2 x i32>, i64)			declare <vscale x 3 x i32> @llvm.experimental.vector.insert.nxv3i32.nxv2i32(<vscale x 3 x i32>, <vscale x 2 x i32>, i64)
	declare <vscale x 3 x float> @llvm.experimental.vector.insert.nxv3f32.nxv2f32(<vscale x 3 x float>, <vscale x 2 x float>, i64)			declare <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv1i32(<vscale x 4 x i32>, <vscale x 1 x i32>, i64)
				declare <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32>, <4 x i32>, i64)
				declare <vscale x 12 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv12i32(<vscale x 12 x i32>, <vscale x 4 x i32>, i64)
	declare <vscale x 6 x i32> @llvm.experimental.vector.insert.nxv6i32.nxv2i32(<vscale x 6 x i32>, <vscale x 2 x i32>, i64)			declare <vscale x 6 x i32> @llvm.experimental.vector.insert.nxv6i32.nxv2i32(<vscale x 6 x i32>, <vscale x 2 x i32>, i64)
	declare <vscale x 6 x i32> @llvm.experimental.vector.insert.nxv6i32.nxv3i32(<vscale x 6 x i32>, <vscale x 3 x i32>, i64)			declare <vscale x 6 x i32> @llvm.experimental.vector.insert.nxv6i32.nxv3i32(<vscale x 6 x i32>, <vscale x 3 x i32>, i64)
	declare <vscale x 12 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv12i32(<vscale x 12 x i32>, <vscale x 4 x i32>, i64)
	declare <vscale x 8 x bfloat> @llvm.experimental.vector.insert.nxv8bf16.nxv8bf16(<vscale x 8 x bfloat>, <vscale x 8 x bfloat>, i64)			declare <vscale x 2 x bfloat> @llvm.experimental.vector.insert.nxv2bf16.nxv2bf16(<vscale x 2 x bfloat>, <vscale x 2 x bfloat>, i64)
	declare <vscale x 8 x bfloat> @llvm.experimental.vector.insert.nxv8bf16.v8bf16(<vscale x 8 x bfloat>, <8 x bfloat>, i64)			declare <vscale x 4 x bfloat> @llvm.experimental.vector.insert.nxv4bf16.nxv2bf16(<vscale x 4 x bfloat>, <vscale x 2 x bfloat>, i64)
	declare <vscale x 4 x bfloat> @llvm.experimental.vector.insert.nxv4bf16.nxv4bf16(<vscale x 4 x bfloat>, <vscale x 4 x bfloat>, i64)			declare <vscale x 4 x bfloat> @llvm.experimental.vector.insert.nxv4bf16.nxv4bf16(<vscale x 4 x bfloat>, <vscale x 4 x bfloat>, i64)
	declare <vscale x 4 x bfloat> @llvm.experimental.vector.insert.nxv4bf16.v4bf16(<vscale x 4 x bfloat>, <4 x bfloat>, i64)			declare <vscale x 4 x bfloat> @llvm.experimental.vector.insert.nxv4bf16.v4bf16(<vscale x 4 x bfloat>, <4 x bfloat>, i64)
	declare <vscale x 2 x bfloat> @llvm.experimental.vector.insert.nxv2bf16.nxv2bf16(<vscale x 2 x bfloat>, <vscale x 2 x bfloat>, i64)			declare <vscale x 8 x bfloat> @llvm.experimental.vector.insert.nxv8bf16.nxv8bf16(<vscale x 8 x bfloat>, <vscale x 8 x bfloat>, i64)
				declare <vscale x 8 x bfloat> @llvm.experimental.vector.insert.nxv8bf16.nxv4bf16(<vscale x 8 x bfloat>, <vscale x 4 x bfloat>, i64)
				declare <vscale x 8 x bfloat> @llvm.experimental.vector.insert.nxv8bf16.v8bf16(<vscale x 8 x bfloat>, <8 x bfloat>, i64)

				declare <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v2i64(<vscale x 2 x i64>, <2 x i64>, i64)
				declare <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v4i64(<vscale x 2 x i64>, <4 x i64>, i64)
				declare <vscale x 16 x i64> @llvm.experimental.vector.insert.nxv8i64.nxv16i64(<vscale x 16 x i64>, <vscale x 8 x i64>, i64)
				declare <vscale x 16 x i64> @llvm.experimental.vector.insert.v2i64.nxv16i64(<vscale x 16 x i64>, <2 x i64>, i64)

				declare <vscale x 4 x half> @llvm.experimental.vector.insert.nxv4f16.nxv2f16(<vscale x 4 x half>, <vscale x 2 x half>, i64)
				declare <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv2f16(<vscale x 8 x half>, <vscale x 2 x half>, i64)
				declare <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv4f16(<vscale x 8 x half>, <vscale x 4 x half>, i64)

				declare <vscale x 3 x float> @llvm.experimental.vector.insert.nxv3f32.nxv2f32(<vscale x 3 x float>, <vscale x 2 x float>, i64)
				declare <vscale x 4 x float> @llvm.experimental.vector.insert.nxv4f32.nxv2f32(<vscale x 4 x float>, <vscale x 2 x float>, i64)

	declare <vscale x 2 x i1> @llvm.experimental.vector.insert.nxv2i1.v8i1(<vscale x 2 x i1>, <8 x i1>, i64)			declare <vscale x 2 x i1> @llvm.experimental.vector.insert.nxv2i1.v8i1(<vscale x 2 x i1>, <8 x i1>, i64)
	declare <vscale x 4 x i1> @llvm.experimental.vector.insert.nxv4i1.v16i1(<vscale x 4 x i1>, <16 x i1>, i64)			declare <vscale x 4 x i1> @llvm.experimental.vector.insert.nxv4i1.v16i1(<vscale x 4 x i1>, <16 x i1>, i64)
	declare <vscale x 8 x i1> @llvm.experimental.vector.insert.nxv8i1.v32i1(<vscale x 8 x i1>, <32 x i1>, i64)			declare <vscale x 8 x i1> @llvm.experimental.vector.insert.nxv8i1.v32i1(<vscale x 8 x i1>, <32 x i1>, i64)
	declare <vscale x 16 x i1> @llvm.experimental.vector.insert.nx16i1.nxv4i1(<vscale x 16 x i1>, <vscale x 4 x i1>, i64)			declare <vscale x 16 x i1> @llvm.experimental.vector.insert.nx16i1.nxv4i1(<vscale x 16 x i1>, <vscale x 4 x i1>, i64)
	declare <vscale x 16 x i1> @llvm.experimental.vector.insert.nx16i1.nxv8i1(<vscale x 16 x i1>, <vscale x 8 x i1>, i64)			declare <vscale x 16 x i1> @llvm.experimental.vector.insert.nx16i1.nxv8i1(<vscale x 16 x i1>, <vscale x 8 x i1>, i64)
	declare <vscale x 16 x i1> @llvm.experimental.vector.insert.nxv16i1.v64i1(<vscale x 16 x i1>, <64 x i1>, i64)			declare <vscale x 16 x i1> @llvm.experimental.vector.insert.nxv16i1.v64i1(<vscale x 16 x i1>, <64 x i1>, i64)