This is an archive of the discontinued LLVM Phabricator instance.

Differential D116664

[AArch64] Improve codegen for get.active.lane.mask when SVE is available
Closed, Public

Authored by david-arm on Jan 5 2022, 7:52 AM.


Details

Reviewers

sdesmalen
kmclaughlin
CarolineConcatto
dmgreen

Commits

rG8b58494cea78: [AArch64] Improve codegen for get.active.lane.mask when SVE is available

Summary

When lowering the get.active.lane.mask intrinsic with a fixed-width
predicate vector result, we can actually make use of the SVE whilelo
instruction when SVE is enabled. We do this by carefully choosing
a sensible VT for the whilelo instruction, then promoting it to an
integer vector, i.e. nxv16i1 -> nxv16i8. We can then extract a v16i8
subvector and truncate back to the original return type, i.e. v16i1.
This leads to a significant improvement in code quality.
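
The sequence described in this summary can be modelled on plain arrays. What follows is only a minimal sketch, not the LLVM implementation: it assumes a hypothetical SVE vector length of 256 bits (32 byte lanes), and the names `whilelo`, `referenceMask` and `loweredMask` are made up for illustration. It is meant to show that generating the predicate over the scalable lanes, sign-extending it, extracting the low 16 lanes and truncating gives the same lanes as computing the fixed-width mask directly.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption for illustration: one SVE register holds 32 byte lanes (VL = 256 bits).
constexpr std::size_t kScalableLanes = 32;
// Lanes in the fixed-width result (v16i1).
constexpr std::size_t kFixedLanes = 16;

// Model of whilelo: lane i is active while (index + i) < tc, compared as
// unsigned values and without wrapping.
std::vector<bool> whilelo(std::uint64_t index, std::uint64_t tc,
                          std::size_t lanes) {
  std::vector<bool> pred(lanes);
  for (std::size_t i = 0; i < lanes; ++i)
    pred[i] = index < tc && i < tc - index;
  return pred;
}

// Reference semantics of get.active.lane.mask for a 16-lane result, written
// with an explicit overflow check so it is independent of the model above.
std::array<bool, kFixedLanes> referenceMask(std::uint64_t index,
                                            std::uint64_t tc) {
  std::array<bool, kFixedLanes> mask{};
  for (std::size_t i = 0; i < kFixedLanes; ++i) {
    std::uint64_t lane = index + i;
    bool wrapped = lane < index;
    mask[i] = !wrapped && lane < tc;
  }
  return mask;
}

// The four steps from the summary, applied to plain arrays.
std::array<bool, kFixedLanes> loweredMask(std::uint64_t index,
                                          std::uint64_t tc) {
  // 1. whilelo produces a scalable predicate (nxv16i1).
  std::vector<bool> pred = whilelo(index, tc, kScalableLanes);

  // 2. Sign-extend each i1 to i8: true becomes -1 (0xFF), false becomes 0.
  std::vector<std::int8_t> prom(kScalableLanes);
  for (std::size_t i = 0; i < kScalableLanes; ++i)
    prom[i] = static_cast<std::int8_t>(pred[i] ? -1 : 0);

  // 3. Extract the low fixed-width subvector (v16i8).
  std::array<std::int8_t, kFixedLanes> ext{};
  for (std::size_t i = 0; i < kFixedLanes; ++i)
    ext[i] = prom[i];

  // 4. Truncate i8 -> i1 by keeping the lowest bit (v16i1).
  std::array<bool, kFixedLanes> res{};
  for (std::size_t i = 0; i < kFixedLanes; ++i)
    res[i] = (ext[i] & 1) != 0;
  return res;
}
```

Under these assumptions, `loweredMask(index, tc)` and `referenceMask(index, tc)` agree for any pair of inputs, including values of `index` close to the maximum 64-bit value.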

Diff Detail

Unit Tests: Failed

	Time	Test
	100 ms	x64 debian > LLVM.Bindings/Go::go.test

Event Timeline

david-arm created this revision. Jan 5 2022, 7:52 AM

Herald added subscribers: ctetreau, hiraditya, kristof.beyls, tschuett. Jan 5 2022, 7:52 AM

david-arm requested review of this revision. Jan 5 2022, 7:52 AM

Herald added a project: Restricted Project. Jan 5 2022, 7:52 AM

Herald added a subscriber: llvm-commits.

david-arm added parent revisions: D116644: [NFC][AArch64][CodeGen] Add fixed-width vector tests for get.active.lane.mask, D116602: [SVE][CodeGen] Bail out for scalable vectors in AArch64TargetLowering::ReconstructShuffle. Jan 5 2022, 7:52 AM

Harbormaster completed remote builds in B141679: Diff 397584. Jan 5 2022, 8:30 AM

I'm not sure the testcases actually illustrate the cases we care about. Generally, I would expect the result of llvm.get.active.lane.mask() to be used in a select instruction, or a masked load, or something like that. And in that case, I'm not sure the way you're choosing the VT is appropriate; the instruction using the mask is probably not going to expect a 64-bit vector.

Matt added a subscriber: Matt. Jan 7 2022, 7:30 AM

In D116664#3223018, @efriedma wrote:

I'm not sure the testcases actually illustrate the cases we care about. Generally, I would expect the result of llvm.get.active.lane.mask() to be used in a select instruction, or a masked load, or something like that. And in that case, I'm not sure the way you're choosing the VT is appropriate; the instruction using the mask is probably not going to expect a 64-bit vector.

Hi @efriedma, I think these testcases are still useful by themselves because they are succinct and make it easy to see how one IR instruction maps to assembly. At the moment if I add more complex test cases involving a select, for example, the code quality ends up being awful regardless of what promoted VT I choose. I think there is still a codegen issue somewhere because I see loads of pointless lane moves whenever I add something like a select. So for now, I'd like to leave the tests as they are.

However, I do take your point about trying to second guess how the masks are going to be used, and perhaps I can make the choice of promoted VT simpler for now, and leave the xtn instructions in.

Okay, sounds good.

Removed code to optimise VT for NEON, since we don't know how the result is actually going to be used.

Harbormaster completed remote builds in B142888: Diff 399284. Jan 12 2022, 5:12 AM

Nice improvement!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15069	nit: avoid indentation by doing `if (!VT.isFixedLengthVector()) return SDValue();` ?
15095	nit: add comment `truncate v4i32 -> v4i1`

This revision is now accepted and ready to land. Jan 12 2022, 9:26 AM

dmgreen added inline comments. Jan 13 2022, 12:33 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15069	It looks like it is worth adding an assert that SVE is available too.

david-arm added a child revision: D117109: [LoopVectorize][AArch64] Use get.active.lane.mask intrinsic when SVE is enabled. Jan 13 2022, 3:16 AM

This revision was landed with ongoing or failed builds. Feb 10 2022, 8:02 AM

Closed by commit rG8b58494cea78: [AArch64] Improve codegen for get.active.lane.mask when SVE is available (authored by david-arm).

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rG8b58494cea78: [AArch64] Improve codegen for get.active.lane.mask when SVE is available.

Revision Contents

	Path	Size
	llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	56 lines
	llvm/test/CodeGen/AArch64/active_lane_mask.ll	152 lines

Diff 397584

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

	[1,509 lines not shown]
	}			}

	bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,			bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
	EVT OpVT) const {			EVT OpVT) const {
	// Only SVE has a 1:1 mapping from intrinsic -> instruction (whilelo).			// Only SVE has a 1:1 mapping from intrinsic -> instruction (whilelo).
	if (!Subtarget->hasSVE())			if (!Subtarget->hasSVE())
	return true;			return true;

	// We can only support legal predicate result types.			// We can only support legal predicate result types. We can use the SVE
				// whilelo instruction for generating fixed-width predicates too.
	if (ResVT != MVT::nxv2i1 && ResVT != MVT::nxv4i1 && ResVT != MVT::nxv8i1 &&			if (ResVT != MVT::nxv2i1 && ResVT != MVT::nxv4i1 && ResVT != MVT::nxv8i1 &&
	ResVT != MVT::nxv16i1)			ResVT != MVT::nxv16i1 && ResVT != MVT::v2i1 && ResVT != MVT::v4i1 &&
				ResVT != MVT::v8i1 && ResVT != MVT::v16i1)
	return true;			return true;

	// The whilelo instruction only works with i32 or i64 scalar inputs.			// The whilelo instruction only works with i32 or i64 scalar inputs.
	if (OpVT != MVT::i32 && OpVT != MVT::i64)			if (OpVT != MVT::i32 && OpVT != MVT::i64)
	return true;			return true;

	return false;			return false;
	}			}
	[13,527 lines not shown]
	static SDValue performIntrinsicCombine(SDNode *N,			static SDValue performIntrinsicCombine(SDNode *N,
	TargetLowering::DAGCombinerInfo &DCI,			TargetLowering::DAGCombinerInfo &DCI,
	const AArch64Subtarget *Subtarget) {			const AArch64Subtarget *Subtarget) {
	SelectionDAG &DAG = DCI.DAG;			SelectionDAG &DAG = DCI.DAG;
	unsigned IID = getIntrinsicID(N);			unsigned IID = getIntrinsicID(N);
	switch (IID) {			switch (IID) {
	default:			default:
	break;			break;
				case Intrinsic::get_active_lane_mask: {
				SDValue Res = SDValue();
				EVT VT = N->getValueType(0);
				if (VT.isFixedLengthVector()) {
				// We can use the SVE whilelo instruction to lower this intrinsic by
				// creating the appropriate sequence of scalable vector operations and
				// then extracting a fixed-width subvector from the scalable vector.

				SDLoc DL(N);
				SDValue ID =
				DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, DL, MVT::i64);

				// == Choose a sensible scalable VT for the whilelo instruction. ==
				// For NEON v16i1 gets promoted to v16i8, i.e. a 128-bit vector. However,
				// all other predicate vectors get promoted to 64-bit integer vectors,
				// i.e. v8i1 -> v8i8, v4i1 -> v4i16 and v2i1 -> v2i32. This means we
				// should choose a legal scalable VT for the whilelo instruction such that
				// when promoted it has the same integer element type as the promoted NEON
				// predicate. For example, for VT=v4i1 it gets promoted to VT=v4i16, so
				// we choose a while predicate of nxv8i1, since this will get promoted to
				// nxv8i16. We can then simply extract a fixed-width subvector of type
				// v4i16 from that.
				EVT WhileVT;
				if (VT == MVT::v16i1 || VT == MVT::v8i1)
				WhileVT = MVT::nxv16i1;
				else if (VT == MVT::v4i1)
				WhileVT = MVT::nxv8i1;
				else {
				assert(VT == MVT::v2i1 &&
				"Unexpected fixed-width predicate type for intrinsic");
				WhileVT = MVT::nxv4i1;
				}

				// Get promoted scalable vector VT, i.e. promote nxv4i1 -> nxv4i32.
				EVT PromVT = getPromotedVTForPredicate(WhileVT);

				// Get the fixed-width equivalent of PromVT for extraction.
				EVT ExtVT =
				EVT::getVectorVT(*DAG.getContext(), PromVT.getVectorElementType(),
				VT.getVectorElementCount());

				Res = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, WhileVT, ID,
				N->getOperand(1), N->getOperand(2));
				Res = DAG.getNode(ISD::SIGN_EXTEND, DL, PromVT, Res);
				Res = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ExtVT, Res,
				DAG.getConstant(0, DL, MVT::i64));
				Res = DAG.getNode(ISD::TRUNCATE, DL, VT, Res);
				}
				return Res;
				}
	case Intrinsic::aarch64_neon_vcvtfxs2fp:			case Intrinsic::aarch64_neon_vcvtfxs2fp:
	case Intrinsic::aarch64_neon_vcvtfxu2fp:			case Intrinsic::aarch64_neon_vcvtfxu2fp:
	return tryCombineFixedPointConvert(N, DCI, DAG);			return tryCombineFixedPointConvert(N, DCI, DAG);
	case Intrinsic::aarch64_neon_saddv:			case Intrinsic::aarch64_neon_saddv:
	return combineAcrossLanesIntrinsic(AArch64ISD::SADDV, N, DAG);			return combineAcrossLanesIntrinsic(AArch64ISD::SADDV, N, DAG);
	case Intrinsic::aarch64_neon_uaddv:			case Intrinsic::aarch64_neon_uaddv:
	return combineAcrossLanesIntrinsic(AArch64ISD::UADDV, N, DAG);			return combineAcrossLanesIntrinsic(AArch64ISD::UADDV, N, DAG);
	case Intrinsic::aarch64_neon_sminv:			case Intrinsic::aarch64_neon_sminv:
	[4,715 lines not shown]
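
The "Choose a sensible scalable VT" comment in Diff 397584 comes down to a little arithmetic on element widths. The sketch below restates it in standalone C++. It is an illustration only: the function names are hypothetical, the NEON promotion rule is taken from that comment, and it assumes that a legal scalable predicate nxvMi1 corresponds to a promoted integer type that exactly fills a 128-bit SVE granule.

```cpp
// Element width in bits that NEON promotion gives a fixed-width predicate
// vNi1, following the comment in the diff: v16i1 -> v16i8 (a 128-bit vector),
// while v8i1 -> v8i8, v4i1 -> v4i16 and v2i1 -> v2i32 (64-bit vectors).
constexpr unsigned neonPromotedElementBits(unsigned numElts) {
  return numElts == 16 ? 8u : 64u / numElts;
}

// Minimum lane count M of the scalable predicate nxvMi1 whose promoted integer
// type has the given element width (assumes a fully packed 128-bit granule).
constexpr unsigned scalableLanesFor(unsigned elementBits) {
  return 128u / elementBits;
}

// Lane count of the whilelo predicate type chosen for a fixed-width vNi1.
constexpr unsigned whileLanesForFixed(unsigned numElts) {
  return scalableLanesFor(neonPromotedElementBits(numElts));
}

// These match the WhileVT selection in the diff.
static_assert(whileLanesForFixed(16) == 16, "v16i1 uses nxv16i1");
static_assert(whileLanesForFixed(8) == 16, "v8i1 uses nxv16i1");
static_assert(whileLanesForFixed(4) == 8, "v4i1 uses nxv8i1");
static_assert(whileLanesForFixed(2) == 4, "v2i1 uses nxv4i1");
```

The same widths show up in the updated CHECK lines of the test file: `whilelo p0.b` for the v16i1 and v8i1 cases, `p0.h` for v4i1 and `p0.s` for v2i1.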

llvm/test/CodeGen/AArch64/active_lane_mask.ll

	[318 lines not shown]
	}			}


	; == Fixed width ==			; == Fixed width ==

	define <16 x i1> @lane_mask_v16i1_i32(i32 %index, i32 %TC) {			define <16 x i1> @lane_mask_v16i1_i32(i32 %index, i32 %TC) {
	; CHECK-LABEL: lane_mask_v16i1_i32:			; CHECK-LABEL: lane_mask_v16i1_i32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: adrp x8, .LCPI15_0			; CHECK-NEXT: whilelo p0.b, w0, w1
	; CHECK-NEXT: adrp x9, .LCPI15_3			; CHECK-NEXT: mov z0.b, p0/z, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: adrp x10, .LCPI15_2			; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
	; CHECK-NEXT: dup v2.4s, w0
	; CHECK-NEXT: dup v5.4s, w1
	; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI15_0]
	; CHECK-NEXT: adrp x8, .LCPI15_1
	; CHECK-NEXT: ldr q1, [x9, :lo12:.LCPI15_3]
	; CHECK-NEXT: ldr q3, [x10, :lo12:.LCPI15_2]
	; CHECK-NEXT: ldr q4, [x8, :lo12:.LCPI15_1]
	; CHECK-NEXT: uqadd v1.4s, v2.4s, v1.4s
	; CHECK-NEXT: uqadd v3.4s, v2.4s, v3.4s
	; CHECK-NEXT: uqadd v4.4s, v2.4s, v4.4s
	; CHECK-NEXT: uqadd v0.4s, v2.4s, v0.4s
	; CHECK-NEXT: cmhi v1.4s, v5.4s, v1.4s
	; CHECK-NEXT: cmhi v2.4s, v5.4s, v3.4s
	; CHECK-NEXT: cmhi v3.4s, v5.4s, v4.4s
	; CHECK-NEXT: cmhi v0.4s, v5.4s, v0.4s
	; CHECK-NEXT: uzp1 v1.8h, v2.8h, v1.8h
	; CHECK-NEXT: uzp1 v0.8h, v0.8h, v3.8h
	; CHECK-NEXT: uzp1 v0.16b, v0.16b, v1.16b
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%active.lane.mask = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 %index, i32 %TC)			%active.lane.mask = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 %index, i32 %TC)
	ret <16 x i1> %active.lane.mask			ret <16 x i1> %active.lane.mask
	}			}

	define <8 x i1> @lane_mask_v8i1_i32(i32 %index, i32 %TC) {			define <8 x i1> @lane_mask_v8i1_i32(i32 %index, i32 %TC) {
	; CHECK-LABEL: lane_mask_v8i1_i32:			; CHECK-LABEL: lane_mask_v8i1_i32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: adrp x8, .LCPI16_1			; CHECK-NEXT: whilelo p0.b, w0, w1
	; CHECK-NEXT: adrp x9, .LCPI16_0			; CHECK-NEXT: mov z0.b, p0/z, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: dup v2.4s, w0			; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
	; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI16_1]
	; CHECK-NEXT: ldr q1, [x9, :lo12:.LCPI16_0]
	; CHECK-NEXT: uqadd v0.4s, v2.4s, v0.4s
	; CHECK-NEXT: uqadd v1.4s, v2.4s, v1.4s
	; CHECK-NEXT: dup v2.4s, w1
	; CHECK-NEXT: cmhi v0.4s, v2.4s, v0.4s
	; CHECK-NEXT: cmhi v1.4s, v2.4s, v1.4s
	; CHECK-NEXT: uzp1 v0.8h, v1.8h, v0.8h
	; CHECK-NEXT: xtn v0.8b, v0.8h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%active.lane.mask = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i32(i32 %index, i32 %TC)			%active.lane.mask = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i32(i32 %index, i32 %TC)
	ret <8 x i1> %active.lane.mask			ret <8 x i1> %active.lane.mask
	}			}

	define <4 x i1> @lane_mask_v4i1_i32(i32 %index, i32 %TC) {			define <4 x i1> @lane_mask_v4i1_i32(i32 %index, i32 %TC) {
	; CHECK-LABEL: lane_mask_v4i1_i32:			; CHECK-LABEL: lane_mask_v4i1_i32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: adrp x8, .LCPI17_0			; CHECK-NEXT: whilelo p0.h, w0, w1
	; CHECK-NEXT: dup v1.4s, w0			; CHECK-NEXT: mov z0.h, p0/z, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI17_0]			; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
	; CHECK-NEXT: uqadd v0.4s, v1.4s, v0.4s
	; CHECK-NEXT: dup v1.4s, w1
	; CHECK-NEXT: cmhi v0.4s, v1.4s, v0.4s
	; CHECK-NEXT: xtn v0.4h, v0.4s
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %TC)			%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %TC)
	ret <4 x i1> %active.lane.mask			ret <4 x i1> %active.lane.mask
	}			}

	define <2 x i1> @lane_mask_v2i1_i32(i32 %index, i32 %TC) {			define <2 x i1> @lane_mask_v2i1_i32(i32 %index, i32 %TC) {
	; CHECK-LABEL: lane_mask_v2i1_i32:			; CHECK-LABEL: lane_mask_v2i1_i32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: adrp x8, .LCPI18_0			; CHECK-NEXT: whilelo p0.s, w0, w1
	; CHECK-NEXT: dup v0.2s, w0			; CHECK-NEXT: mov z0.s, p0/z, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI18_0]			; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
	; CHECK-NEXT: uqadd v0.2s, v0.2s, v1.2s
	; CHECK-NEXT: dup v1.2s, w1
	; CHECK-NEXT: cmhi v0.2s, v1.2s, v0.2s
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%active.lane.mask = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i32(i32 %index, i32 %TC)			%active.lane.mask = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i32(i32 %index, i32 %TC)
	ret <2 x i1> %active.lane.mask			ret <2 x i1> %active.lane.mask
	}			}

	define <16 x i1> @lane_mask_v16i1_i64(i64 %index, i64 %TC) {			define <16 x i1> @lane_mask_v16i1_i64(i64 %index, i64 %TC) {
	; CHECK-LABEL: lane_mask_v16i1_i64:			; CHECK-LABEL: lane_mask_v16i1_i64:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: adrp x8, .LCPI19_0			; CHECK-NEXT: whilelo p0.b, x0, x1
	; CHECK-NEXT: adrp x9, .LCPI19_1			; CHECK-NEXT: mov z0.b, p0/z, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: adrp x10, .LCPI19_2			; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
	; CHECK-NEXT: dup v1.2d, x0
	; CHECK-NEXT: dup v17.2d, x1
	; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI19_0]
	; CHECK-NEXT: adrp x8, .LCPI19_3
	; CHECK-NEXT: ldr q2, [x9, :lo12:.LCPI19_1]
	; CHECK-NEXT: adrp x9, .LCPI19_4
	; CHECK-NEXT: ldr q3, [x10, :lo12:.LCPI19_2]
	; CHECK-NEXT: ldr q4, [x8, :lo12:.LCPI19_3]
	; CHECK-NEXT: adrp x8, .LCPI19_5
	; CHECK-NEXT: ldr q5, [x9, :lo12:.LCPI19_4]
	; CHECK-NEXT: adrp x9, .LCPI19_7
	; CHECK-NEXT: uqadd v0.2d, v1.2d, v0.2d
	; CHECK-NEXT: ldr q6, [x8, :lo12:.LCPI19_5]
	; CHECK-NEXT: adrp x8, .LCPI19_6
	; CHECK-NEXT: ldr q7, [x9, :lo12:.LCPI19_7]
	; CHECK-NEXT: uqadd v2.2d, v1.2d, v2.2d
	; CHECK-NEXT: ldr q16, [x8, :lo12:.LCPI19_6]
	; CHECK-NEXT: uqadd v3.2d, v1.2d, v3.2d
	; CHECK-NEXT: uqadd v4.2d, v1.2d, v4.2d
	; CHECK-NEXT: uqadd v6.2d, v1.2d, v6.2d
	; CHECK-NEXT: uqadd v7.2d, v1.2d, v7.2d
	; CHECK-NEXT: uqadd v16.2d, v1.2d, v16.2d
	; CHECK-NEXT: uqadd v1.2d, v1.2d, v5.2d
	; CHECK-NEXT: cmhi v6.2d, v17.2d, v6.2d
	; CHECK-NEXT: cmhi v5.2d, v17.2d, v7.2d
	; CHECK-NEXT: cmhi v7.2d, v17.2d, v16.2d
	; CHECK-NEXT: cmhi v1.2d, v17.2d, v1.2d
	; CHECK-NEXT: cmhi v4.2d, v17.2d, v4.2d
	; CHECK-NEXT: cmhi v3.2d, v17.2d, v3.2d
	; CHECK-NEXT: cmhi v2.2d, v17.2d, v2.2d
	; CHECK-NEXT: cmhi v0.2d, v17.2d, v0.2d
	; CHECK-NEXT: uzp1 v5.4s, v7.4s, v5.4s
	; CHECK-NEXT: uzp1 v1.4s, v1.4s, v6.4s
	; CHECK-NEXT: uzp1 v3.4s, v3.4s, v4.4s
	; CHECK-NEXT: uzp1 v0.4s, v0.4s, v2.4s
	; CHECK-NEXT: uzp1 v1.8h, v1.8h, v5.8h
	; CHECK-NEXT: uzp1 v0.8h, v0.8h, v3.8h
	; CHECK-NEXT: uzp1 v0.16b, v0.16b, v1.16b
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%active.lane.mask = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 %index, i64 %TC)			%active.lane.mask = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 %index, i64 %TC)
	ret <16 x i1> %active.lane.mask			ret <16 x i1> %active.lane.mask
	}			}

	define <8 x i1> @lane_mask_v8i1_i64(i64 %index, i64 %TC) {			define <8 x i1> @lane_mask_v8i1_i64(i64 %index, i64 %TC) {
	; CHECK-LABEL: lane_mask_v8i1_i64:			; CHECK-LABEL: lane_mask_v8i1_i64:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: adrp x8, .LCPI20_0			; CHECK-NEXT: whilelo p0.b, x0, x1
	; CHECK-NEXT: adrp x9, .LCPI20_3			; CHECK-NEXT: mov z0.b, p0/z, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: adrp x10, .LCPI20_2			; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
	; CHECK-NEXT: dup v2.2d, x0
	; CHECK-NEXT: dup v5.2d, x1
	; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI20_0]
	; CHECK-NEXT: adrp x8, .LCPI20_1
	; CHECK-NEXT: ldr q1, [x9, :lo12:.LCPI20_3]
	; CHECK-NEXT: ldr q3, [x10, :lo12:.LCPI20_2]
	; CHECK-NEXT: ldr q4, [x8, :lo12:.LCPI20_1]
	; CHECK-NEXT: uqadd v1.2d, v2.2d, v1.2d
	; CHECK-NEXT: uqadd v3.2d, v2.2d, v3.2d
	; CHECK-NEXT: uqadd v4.2d, v2.2d, v4.2d
	; CHECK-NEXT: uqadd v0.2d, v2.2d, v0.2d
	; CHECK-NEXT: cmhi v1.2d, v5.2d, v1.2d
	; CHECK-NEXT: cmhi v2.2d, v5.2d, v3.2d
	; CHECK-NEXT: cmhi v3.2d, v5.2d, v4.2d
	; CHECK-NEXT: cmhi v0.2d, v5.2d, v0.2d
	; CHECK-NEXT: uzp1 v1.4s, v2.4s, v1.4s
	; CHECK-NEXT: uzp1 v0.4s, v0.4s, v3.4s
	; CHECK-NEXT: uzp1 v0.8h, v0.8h, v1.8h
	; CHECK-NEXT: xtn v0.8b, v0.8h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%active.lane.mask = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 %index, i64 %TC)			%active.lane.mask = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 %index, i64 %TC)
	ret <8 x i1> %active.lane.mask			ret <8 x i1> %active.lane.mask
	}			}

	define <4 x i1> @lane_mask_v4i1_i64(i64 %index, i64 %TC) {			define <4 x i1> @lane_mask_v4i1_i64(i64 %index, i64 %TC) {
	; CHECK-LABEL: lane_mask_v4i1_i64:			; CHECK-LABEL: lane_mask_v4i1_i64:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: adrp x8, .LCPI21_1			; CHECK-NEXT: whilelo p0.h, x0, x1
	; CHECK-NEXT: adrp x9, .LCPI21_0			; CHECK-NEXT: mov z0.h, p0/z, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: dup v2.2d, x0			; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
	; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI21_1]
	; CHECK-NEXT: ldr q1, [x9, :lo12:.LCPI21_0]
	; CHECK-NEXT: uqadd v0.2d, v2.2d, v0.2d
	; CHECK-NEXT: uqadd v1.2d, v2.2d, v1.2d
	; CHECK-NEXT: dup v2.2d, x1
	; CHECK-NEXT: cmhi v0.2d, v2.2d, v0.2d
	; CHECK-NEXT: cmhi v1.2d, v2.2d, v1.2d
	; CHECK-NEXT: uzp1 v0.4s, v1.4s, v0.4s
	; CHECK-NEXT: xtn v0.4h, v0.4s
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %index, i64 %TC)			%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %index, i64 %TC)
	ret <4 x i1> %active.lane.mask			ret <4 x i1> %active.lane.mask
	}			}

	define <2 x i1> @lane_mask_v2i1_i64(i64 %index, i64 %TC) {			define <2 x i1> @lane_mask_v2i1_i64(i64 %index, i64 %TC) {
	; CHECK-LABEL: lane_mask_v2i1_i64:			; CHECK-LABEL: lane_mask_v2i1_i64:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: adrp x8, .LCPI22_0			; CHECK-NEXT: whilelo p0.s, x0, x1
	; CHECK-NEXT: dup v1.2d, x0			; CHECK-NEXT: mov z0.s, p0/z, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI22_0]			; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
	; CHECK-NEXT: uqadd v0.2d, v1.2d, v0.2d
	; CHECK-NEXT: dup v1.2d, x1
	; CHECK-NEXT: cmhi v0.2d, v1.2d, v0.2d
	; CHECK-NEXT: xtn v0.2s, v0.2d
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%active.lane.mask = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 %index, i64 %TC)			%active.lane.mask = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 %index, i64 %TC)
	ret <2 x i1> %active.lane.mask			ret <2 x i1> %active.lane.mask
	}			}

	define <16 x i1> @lane_mask_v16i1_i8(i8 %index, i8 %TC) {			define <16 x i1> @lane_mask_v16i1_i8(i8 %index, i8 %TC) {
	; CHECK-LABEL: lane_mask_v16i1_i8:			; CHECK-LABEL: lane_mask_v16i1_i8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	[95 lines not shown]
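
For comparison, the pre-patch CHECK lines in the left-hand column expand the intrinsic with NEON: `dup` broadcasts the index, `uqadd` adds a constant vector loaded from the constant pool, and `cmhi` compares the trip count against the result. The sketch below models that for the `<4 x i1>` / `i32` case. It assumes the constant vector holds the lane numbers 0..3 (the test does not show its contents), and the function names are hypothetical. A wrapping variant is included only to show why a saturating add is the safe choice near the top of the range.

```cpp
#include <array>
#include <cstdint>
#include <limits>

// Unsigned saturating add, modelling one 32-bit lane of NEON uqadd.
constexpr std::uint32_t uqadd(std::uint32_t a, std::uint32_t b) {
  return a > std::numeric_limits<std::uint32_t>::max() - b
             ? std::numeric_limits<std::uint32_t>::max()
             : a + b;
}

// Model of the NEON expansion for a <4 x i1> result with i32 operands.
std::array<bool, 4> expandedMask(std::uint32_t index, std::uint32_t tc) {
  std::array<bool, 4> mask{};
  for (std::uint32_t lane = 0; lane < 4; ++lane) {
    // dup + uqadd: index + lane, clamped at UINT32_MAX instead of wrapping.
    std::uint32_t v = uqadd(index, lane);
    // cmhi: unsigned "higher than" comparison, tc > v.
    mask[lane] = tc > v;
  }
  return mask;
}

// Same computation with a wrapping add, for contrast only.
std::array<bool, 4> wrappingMask(std::uint32_t index, std::uint32_t tc) {
  std::array<bool, 4> mask{};
  for (std::uint32_t lane = 0; lane < 4; ++lane) {
    std::uint32_t v = index + lane; // wraps modulo 2^32
    mask[lane] = tc > v;
  }
  return mask;
}
```

With `index = 0xFFFFFFFE` and `tc = 0xFFFFFFFF`, only lane 0 is active in `expandedMask`, whereas `wrappingMask` also reports lanes 2 and 3 as active because their sums wrap around to 0 and 1.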