This is an archive of the discontinued LLVM Phabricator instance.

Differential D149916

[VP][SelectionDAG][RISCV] Add get_vector_length intrinsics and generic SelectionDAG support.
ClosedPublic

Authored by craig.topper on May 4 2023, 3:54 PM.

Details

Reviewers

reames
ABataev
frasercrmck
rogfer01
simoll
hiraditya

Commits

rGc5e6c886aabb: [VP][SelectionDAG][RISCV] Add get_vector_length intrinsics and generic…

Summary

The generic implementation is umin(TC, VF * vscale).
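As a rough illustration of the formula above, the following plain C++ sketch models the generic expansion outside of LLVM. The function name `genericGetVectorLength` and passing `vscale` as an ordinary argument are assumptions made for this example; in the patch the value is built from SelectionDAG nodes (zero-extend, vscale, umin, truncate) rather than computed directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of the generic expansion umin(TC, VF * vscale).
// The trip count is treated as unsigned and the i32 result is obtained by
// truncating the minimum, mirroring the extend/umin/truncate sequence.
std::uint32_t genericGetVectorLength(std::uint64_t tripCount, std::uint32_t vf,
                                     std::uint64_t vscale) {
  assert(vf > 0 && vscale > 0 && "VF and vscale are assumed to be positive");
  const std::uint64_t maxLanes = static_cast<std::uint64_t>(vf) * vscale;
  const std::uint64_t evl = std::min(tripCount, maxLanes); // unsigned minimum
  return static_cast<std::uint32_t>(evl);                  // clip to i32
}
```

With `vf = 2` and `vscale = 4`, a trip count of 3 yields 3, while any trip count of 8 or more yields 8.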

Lowering to vsetvli for RISC-V will come in a future patch.

This patch is a pre-requisite to be able to CodeGen vectorized code from
D99750.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. May 4 2023, 3:54 PM

Herald added a project: Restricted Project. May 4 2023, 3:54 PM

Herald added subscribers: jobnoorman, luke, VincentWu and 28 others.

craig.topper requested review of this revision. May 4 2023, 3:54 PM

Herald added a project: Restricted Project. May 4 2023, 3:54 PM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

craig.topper added a child revision: D99750: [LV, VP]VP intrinsics support for the Loop Vectorizer. May 4 2023, 3:56 PM

Harbormaster completed remote builds in B230125: Diff 519687. May 4 2023, 5:38 PM

I suppose this makes sense for ARM MVE/SVE as well?
Generally, we should simplify this in IR for other targets and/or trip counts with known multiples; otherwise the VP expansion pass will create %evl-to-%mask expansion code without knowing that the %evl is ineffective (e.g. %evl == vector_length).
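To make the concern concrete, here is a small C++ model of what an %evl-to-%mask expansion computes. The name `expandEvlToMask` and the use of `std::vector<bool>` for the mask are illustrative assumptions; this is not the code of the VP expansion pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: lane i stays active only if it was active in the
// original mask and its index is below the explicit vector length.
std::vector<bool> expandEvlToMask(const std::vector<bool> &mask,
                                  std::uint32_t evl) {
  assert(evl <= mask.size() && "evl is assumed not to exceed the lane count");
  std::vector<bool> out(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i)
    out[i] = mask[i] && i < evl;
  return out;
}
```

When `evl` equals the number of lanes, the result is identical to the input mask, so the extra compare and AND would be wasted work unless something has proven that equality beforehand.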

-Add SVE test.
-Updates to LangRef. Reflect that this intrinsic is not required to return VF*vscale for count > VF*vscale. We need this to make RISC-V's vsetvli a valid implementation of this intrinsic.
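The relaxed wording matters because a strip-mined loop only needs each call to make forward progress without overshooting. The sketch below models such a loop in plain C++; `incrementAll` and the `VectorLengthFn` callback are hypothetical names standing in for a vectorized loop body and for the intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for the intrinsic: maps the remaining element count to the number
// of elements to process in this iteration.
using VectorLengthFn = std::function<std::uint32_t(std::uint64_t)>;

// Adds 1 to every element, processing getVL(remaining) elements per
// iteration. Returns the number of iterations taken.
std::size_t incrementAll(std::vector<int> &data, const VectorLengthFn &getVL) {
  std::size_t done = 0;
  std::size_t iterations = 0;
  while (done < data.size()) {
    const std::uint64_t remaining = data.size() - done;
    const std::uint32_t evl = getVL(remaining);
    assert(evl > 0 && evl <= remaining && "loop needs bounded forward progress");
    for (std::uint32_t i = 0; i < evl; ++i) // models one vector operation
      data[done + i] += 1;
    done += evl;
    ++iterations;
  }
  return iterations;
}
```

Any callback that returns a value in the range (0, remaining] produces the same final data. Only the iteration count differs between a strict umin policy and a policy that sometimes returns fewer elements.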

Harbormaster completed remote builds in B230915: Diff 520761. May 9 2023, 1:35 PM

Matt added a subscriber: Matt. May 9 2023, 1:37 PM

frasercrmck added inline comments. May 11 2023, 1:00 AM

llvm/docs/LangRef.rst
18072	Double comma here
18073	Should we explicitly denote this intrinsic with `immarg` parameters in the LangRef? I only see the one doing it, but it sounds kind of helpful.
18087	Maybe also saying that the element width is in bits would clarify things a little better.
18087	If it's just a hint, do we need to restrict it to being at least 8? It might feel a little artificial.

Address comment. Fix dangling node in SelectionDAGBuilder

Harbormaster completed remote builds in B231525: Diff 521551. May 11 2023, 11:34 PM

It looks good to me, but I'd like to leave it open for a bit so that other more active reviewers can take a look.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7315	Do we want to sanitize negative immediates here or elsewhere, or do we just let them become large positive ones?

craig.topper added inline comments. May 12 2023, 9:17 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7315	Good question. Do you have a suggestion how to sanitize it?

frasercrmck added inline comments. May 13 2023, 12:40 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7315	Would adding something to the verifier be sufficient? Then we may be able to assert here?
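The question in this thread can be illustrated with a small C++ model of how a negative i32 immediate looks once it is read as a zero-extended value, plus a verifier-style check that would justify an assertion at the point of use. The helper names `zeroExtendImm`, `isValidVF`, and `checkedVF` are hypothetical, and the exact rule the patch ended up enforcing is not shown in this review text.

```cpp
#include <cassert>
#include <cstdint>

// Models reading a 32-bit immediate as an unsigned, zero-extended value:
// a negative immediate turns into a large positive number.
std::uint64_t zeroExtendImm(std::int32_t imm) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(imm));
}

// Hypothetical verifier-style rule: the vectorization factor must be positive.
bool isValidVF(std::int32_t vf) { return vf > 0; }

// Once a verifier rejects bad immediates, later code can simply assert.
std::uint64_t checkedVF(std::int32_t vf) {
  assert(isValidVF(vf) && "verifier should have rejected this VF");
  return zeroExtendImm(vf);
}
```

For instance, an immediate of -1 would be read as 4294967295 if nothing rejected it earlier.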

Add verifier check

Harbormaster completed remote builds in B232666: Diff 523131. May 17 2023, 12:57 PM

Pass VF to shouldExpandGetVectorLength

Harbormaster completed remote builds in B232684: Diff 523151. May 17 2023, 2:15 PM

evandro removed a subscriber: evandro. May 17 2023, 3:47 PM

loralb added a subscriber: loralb. May 19 2023, 2:47 AM

Thanks @craig.topper. This went under our radar. Apologies.

In general this makes sense, but if we want this to be useful for the VE target we need to address the fixed vector case. Being able to set the vector length is orthogonal to scalable vectors.

We assume that a call to llvm.experimental.get.vector.length for VE (or any other fixed-vector target with vector length) would set the cnt parameter, might optionally use the element_width parameter and then the vector factor would not be scaled by vscale. The user of the intrinsic, like the loop vectorizer, would have already chosen a meaningful value for vf for VE (VE for instance can use vf=256 when operating with 64-bit element width).

However this would render the intrinsic a bit ambiguous between fixed and scalable.

Maybe we can add an additional parameter stating if the VF is actually scaled by vscale or not (e.g., an immarg i1 or maybe a metadata string operand)?
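A sketch of how such a flag could disambiguate the two cases, written as a plain C++ model. The function `runtimeLanes` and its `scalable` parameter are assumptions that follow the suggestion in this comment; they are not taken from the diff under review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: number of lanes described by (vf, scalable) at run time.
// For a fixed-vector target the factor is used as-is; for a scalable one it
// is multiplied by vscale.
std::uint64_t runtimeLanes(std::uint32_t vf, bool scalable,
                           std::uint64_t vscale) {
  assert(vf > 0 && vscale > 0 && "VF and vscale are assumed to be positive");
  return scalable ? static_cast<std::uint64_t>(vf) * vscale : vf;
}
```

Under this model, the VE case mentioned above (`vf = 256`, not scalable) always describes 256 lanes, whereas `vf = 2` with the scalable flag set describes 8 lanes when `vscale` is 4.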

llvm/docs/LangRef.rst
18094	I'd emphasise the fact that we always return an `i32`, this may be easy to miss from the two instances in the Syntax section.
18099	I think we want to explain that when `cnt ≤ (vscale * VF)` (for now, assume `vscale = 1` if we're dealing with fixed vectors). We understand in this case the result of this intrinsic is `cnt`. Also the case for `cnt > (vscale * VF)` is worded in a bit of an obscure way and might not be true as we understand it. For the specific case of RISC-V, if `VLMAX=64`, `cnt=64` will return `64` but `cnt=65` may return a value `x` ∊ (32, 64], like 33. That would not be a result "at least as large as the result for any value less than count" (we understand "any value less than count" as a value ∊ [0, 64]). Maybe we got it wrong.
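The RISC-V example in this comment can be reproduced with a small model. It assumes the rule from the RISC-V vector specification that for VLMAX < AVL < 2*VLMAX an implementation may return any vl with ceil(AVL/2) <= vl <= VLMAX. The function `evenSplitVL` is a hypothetical policy that always picks the lower bound in that range; real hardware may choose differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vsetvli-like policy that splits the tail evenly.
std::uint64_t evenSplitVL(std::uint64_t avl, std::uint64_t vlmax) {
  assert(vlmax > 0 && "VLMAX is assumed to be positive");
  if (avl <= vlmax)
    return avl;           // everything fits in one iteration
  if (avl < 2 * vlmax)
    return (avl + 1) / 2; // ceil(avl / 2), the smallest permitted value
  return vlmax;           // at least two full iterations remain
}
```

With `vlmax = 64`, a count of 64 returns 64 but a count of 65 returns 33, so the result for the larger count is smaller than the result for a smaller count. That is the situation the quoted "at least as large as" wording would rule out.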

asb mentioned this in D150824: [RISCV] Lower experimental_get_vector_length intrinsic to vsetvli for some cases. May 19 2023, 7:15 AM

craig.topper added a child revision: D150824: [RISCV] Lower experimental_get_vector_length intrinsic to vsetvli for some cases. May 19 2023, 9:39 AM

Attempt to address Roger's comments.

Harbormaster completed remote builds in B233254: Diff 523874. May 19 2023, 2:45 PM

@kaz7 do you have any thoughts on the way we intend to define this generic intrinsic? I think it may be useful for VE as well.

llvm/docs/LangRef.rst
18072	Typo `scalable`? (maybe this is an abbreviation but only one letter was left out)

This looks reasonable to me, minor comments only.

llvm/docs/LangRef.rst
18085	First argument is interpreted as an unsigned integer, correct? Should state that.
18105	reword: larger than the maximum legal vectorization factor. Also, should probably add a requirement here that zero is only returned when the requested trip count is zero.

llvm/lib/IR/Verifier.cpp
5463 ↗	(On Diff #523874)	I don't think the element width can be zero or negative either, can it?

reames added inline comments. May 25 2023, 12:30 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7322	I believe this is just computing the number of elements in the ElementCount constructed from VF and the scalable bit. On the IR level, we have CreateElementCount on IRBuilder. We probably need something analogous on DAG. It looks like we've got a couple examples already, so it'd be good to pull that out and rebase. One generic optimization we should probably add (in a separate patch), is to fold get_active_vector_length to TC when TC can be proven less than EC.
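The fold suggested at the end of this comment could be reasoned about as in the sketch below. `ElementCountModel`, `minRuntimeLanes`, and `canFoldToTripCount` are hypothetical names for this illustration and are not LLVM APIs. The model also assumes that a known lower bound on vscale is available and that the intrinsic returns the count unchanged whenever the count does not exceed the lane count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an element count: a minimum lane count plus a
// flag saying whether it is multiplied by vscale at run time.
struct ElementCountModel {
  std::uint32_t minLanes;
  bool scalable;
};

// Smallest number of lanes the type can have, given a lower bound on vscale.
std::uint64_t minRuntimeLanes(ElementCountModel ec, std::uint64_t vscaleMin) {
  assert(ec.minLanes > 0 && vscaleMin > 0 && "inputs are assumed positive");
  return ec.scalable ? std::uint64_t{ec.minLanes} * vscaleMin : ec.minLanes;
}

// The call may be replaced by the trip count when the trip count's known
// upper bound can never exceed the number of lanes.
bool canFoldToTripCount(std::uint64_t tripCountUpperBound, ElementCountModel ec,
                        std::uint64_t vscaleMin) {
  return tripCountUpperBound <= minRuntimeLanes(ec, vscaleMin);
}
```

For example, with VF = 2, the scalable bit set, and vscale known to be at least 2, a trip count bounded by 4 can be folded, while one bounded by 5 cannot.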

craig.topper added inline comments. May 25 2023, 4:33 PM

llvm/docs/LangRef.rst
18105	I'm not sure what you mean by "maximum legalization factor"? My statement here was intended to only refer to the vectorization factor passed in. If the vectorization factor is for a type that isn't supported legally by hardware, the intrinsic will still return a vector length that utilizes the whole type.

reames added inline comments. May 25 2023, 4:45 PM

llvm/docs/LangRef.rst
18105	How about something like this? (For clarity, this is going in a different direction than my original comment.) If the count is larger than the number of lanes in the type described by the last two arguments, then this intrinsic may return a value less than the number of lanes implied by the type. Basically, what if we were explicit about the VF and scalable bit mapping to a type, and then described the behavior in terms of the number of runtime lanes in that type?

Address review comments

LGTM

p.s. It would be helpful to land this even well in advance of the vectorizer changes. That would let us write code gen test cases for both variants of VP predication being discussed, and see what we can get code generation to look like.

This revision is now accepted and ready to land. May 25 2023, 6:32 PM

Harbormaster completed remote builds in B234722: Diff 525892. May 25 2023, 7:28 PM

Remove the element width hint. I don't think it's worthwhile right now.

LGTM too, one typo

llvm/docs/LangRef.rst
18085	typo: `specifieso`?

Fix typo

Returning to review since I dropped the element width after @reames review

Harbormaster completed remote builds in B234770: Diff 525945. May 26 2023, 12:00 AM

In D149916#4375221, @craig.topper wrote:

Returning to review since I dropped the element width after @reames review

I think we are going to need that, but I'm happy to take advantage of this being an experimental intrinsic and iterating in tree, so LGTM either way.

This revision was not accepted when it landed; it landed in state Needs Review. May 26 2023, 9:06 AM

This revision was landed with ongoing or failed builds.

Closed by commit rGc5e6c886aabb: [VP][SelectionDAG][RISCV] Add get_vector_length intrinsics and generic… (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rGc5e6c886aabb: [VP][SelectionDAG][RISCV] Add get_vector_length intrinsics and generic….

Revision Contents

Path

Size

llvm/

docs/

LangRef.rst

41 lines

include/

llvm/

CodeGen/

TargetLowering.h

4 lines

IR/

Intrinsics.td

6 lines

lib/

CodeGen/

SelectionDAG/

SelectionDAGBuilder.cpp

27 lines

test/

CodeGen/

AArch64/

get_vector_length.ll

40 lines

RISCV/

rvv/

get_vector_length.ll

71 lines

Diff 521551

llvm/docs/LangRef.rst



	Arguments:			Arguments:
	""""""""""			""""""""""

	None.			None.


				'``llvm.experimental.get.vector.length``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""
				This is an overloaded intrinsic.

				::

				declare i32 @llvm.experimental.get.vector.length.i32(i32 %cnt, i32 immarg %element_width, i32 immarg %vf)
				frasercrmck: Double comma here
				rogfer01: Typo `scalable`? (maybe this is an abbreviation but only one letter was left out)
				declare i32 @llvm.experimental.get.vector.length.i64(i64 %cnt, i32 immarg %element_width, i32 immarg %vf)
				frasercrmck: Should we explicitly denote this intrinsic with `immarg` parameters in the LangRef? I only see the one doing it, but it sounds kind of helpful.

				Overview:
				"""""""""

				The '``llvm.experimental.get.vector.length.*``' intrinsics take a number of
				elements to process and returns how many of the elements can be processed
				with the requested vectorization factor.

				Arguments:
				""""""""""

				The first operand is of any integer type and specifies total number of elements
				reames: First argument is interpreted as an unsigned integer, correct? Should state that.
				frasercrmck: typo: `specifieso`?
				to be processed. The second argument is an i32 immediate for the element width
				in bits of the vector type. This serves as a hint to the target about the
				frasercrmck: Maybe also saying that the element width is in bits would clarify things a little better.
				frasercrmck: If it's just a hint, do we need to restrict it to being at least 8? It might feel a little artificial.
				element types involved in the loop. The third parameter is an i32 immediate for
				the vectorization factor. This factor is treated as a multiple of vscale.

				Semantics:
				""""""""""

				Returns a positive value (explicit vector length) that is unknown at compile
				rogfer01: I'd emphasise the fact that we always return an `i32`, this may be easy to miss from the two instances in the Syntax section.
				time and depends on the hardware specification.
				If the result value does not fit in the result type, then the result is
				a :ref:`poison value <poisonvalues>`.

				If the total count is larger than VF*vscale, this intrinsic may not return
				rogfer01: I think we want to explain that when `cnt ≤ (vscale * VF)` (for now, assume `vscale = 1` if we're dealing with fixed vectors). We understand in this case the result of this intrinsic is `cnt`. Also the case for `cnt > (vscale * VF)` is worded in a bit of an obscure way and might not be true as we understand it. For the specific case of RISC-V, if `VLMAX=64`, `cnt=64` will return `64` but `cnt=65` may return a value `x` ∊ (32, 64], like 33. That would not be a result "at least as large as the result for any value less than count" (we understand "any value less than count" as a value ∊ [0, 64]). Maybe we got it wrong.
				VF*vscale. The result will be at least as large as the result for any value
				less than count. This ensures that calling it for the total count will return
				the largest value any later loop iteration will see.

	Matrix Intrinsics			Matrix Intrinsics
	-----------------			-----------------
				reames: reword: larger than the maximum legal vectorization factor. Also, should probably add a requirement here that zero is only returned when the requested trip count is zero.
				craig.topper (author): I'm not sure what you mean by "maximum legalization factor"? My statement here was intended to only refer to the vectorization factor passed in. If the vectorization factor is for a type that isn't supported legally by hardware, the intrinsic will still return a vector length that utilizes the whole type.
				reames: How about something like this? (For clarity, this is going in a different direction than my original comment.) If the count is larger than the number of lanes in the type described by the last two arguments, then this intrinsic may return a value less than the number of lanes implied by the type. Basically, what if we were explicit about the VF and scalable bit mapping to a type, and then described the behavior in terms of the number of runtime lanes in that type?

	Operations on matrixes requiring shape information (like number of rows/columns			Operations on matrixes requiring shape information (like number of rows/columns
	or the memory layout) can be expressed using the matrix intrinsics. These			or the memory layout) can be expressed using the matrix intrinsics. These
	intrinsics require matrix dimensions to be passed as immediate arguments, and			intrinsics require matrix dimensions to be passed as immediate arguments, and
	matrixes are passed and returned as vectors. This means that for a ``R`` x			matrixes are passed and returned as vectors. This means that for a ``R`` x
	``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in the			``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in the
	corresponding vector, with indices starting at 0. Currently column-major layout			corresponding vector, with indices starting at 0. Currently column-major layout
	is assumed. The intrinsics support both integer and floating point matrixes.			is assumed. The intrinsics support both integer and floating point matrixes.

llvm/include/llvm/CodeGen/TargetLowering.h

//===- llvm/CodeGen/TargetLowering.h - Target Lowering Info ------ C++ --===//		//===- llvm/CodeGen/TargetLowering.h - Target Lowering Info ------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
///		///
▲ Show 20 Lines • Show All 445 Lines • ▼ Show 20 Lines	public:
}		}

/// Return true if the @llvm.get.active.lane.mask intrinsic should be expanded		/// Return true if the @llvm.get.active.lane.mask intrinsic should be expanded
/// using generic code in SelectionDAGBuilder.		/// using generic code in SelectionDAGBuilder.
virtual bool shouldExpandGetActiveLaneMask(EVT VT, EVT OpVT) const {		virtual bool shouldExpandGetActiveLaneMask(EVT VT, EVT OpVT) const {
return true;		return true;
}		}

		virtual bool shouldExpandGetVectorLength() const {
		return true;
		}

// Return true if op(vecreduce(x), vecreduce(y)) should be reassociated to		// Return true if op(vecreduce(x), vecreduce(y)) should be reassociated to
// vecreduce(op(x, y)) for the reduction opcode RedOpc.		// vecreduce(op(x, y)) for the reduction opcode RedOpc.
virtual bool shouldReassociateReduction(unsigned RedOpc, EVT VT) const {		virtual bool shouldReassociateReduction(unsigned RedOpc, EVT VT) const {
return true;		return true;
}		}

/// Return true if it is profitable to convert a select of FP constants into		/// Return true if it is profitable to convert a select of FP constants into
/// a constant pool load whose address depends on the select condition. The		/// a constant pool load whose address depends on the select condition. The

llvm/include/llvm/IR/Intrinsics.td

Show First 20 Lines • Show All 2,140 Lines • ▼ Show 20 Lines	def int_vp_cttz : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
llvm_i32_ty]>;		llvm_i32_ty]>;
}		}

def int_get_active_lane_mask:		def int_get_active_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],		DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyint_ty, LLVMMatchType<1>],		[llvm_anyint_ty, LLVMMatchType<1>],
[IntrNoMem, IntrNoSync, IntrWillReturn]>;		[IntrNoMem, IntrNoSync, IntrWillReturn]>;

		def int_experimental_get_vector_length:
		DefaultAttrsIntrinsic<[llvm_i32_ty],
		[llvm_anyint_ty, llvm_i32_ty, llvm_i32_ty],
		[IntrNoMem, IntrNoSync, IntrWillReturn,
		ImmArg<ArgIndex<1>>, ImmArg<ArgIndex<2>>]>;

def int_experimental_vp_splice:		def int_experimental_vp_splice:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],		DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>,		[LLVMMatchType<0>,
LLVMMatchType<0>,		LLVMMatchType<0>,
llvm_i32_ty,		llvm_i32_ty,
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,		LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
llvm_i32_ty, llvm_i32_ty],		llvm_i32_ty, llvm_i32_ty],
[IntrNoMem, ImmArg<ArgIndex<2>>]>;		[IntrNoMem, ImmArg<ArgIndex<2>>]>;

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

//===- SelectionDAGBuilder.cpp - Selection-DAG building -------------------===//		//===- SelectionDAGBuilder.cpp - Selection-DAG building -------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
▲ Show 20 Lines • Show All 7,281 Lines • ▼ Show 20 Lines	case Intrinsic::get_active_lane_mask: {
SDValue VectorStep = DAG.getStepVector(sdl, VecTy);		SDValue VectorStep = DAG.getStepVector(sdl, VecTy);
SDValue VectorInduction = DAG.getNode(		SDValue VectorInduction = DAG.getNode(
ISD::UADDSAT, sdl, VecTy, VectorIndex, VectorStep);		ISD::UADDSAT, sdl, VecTy, VectorIndex, VectorStep);
SDValue SetCC = DAG.getSetCC(sdl, CCVT, VectorInduction,		SDValue SetCC = DAG.getSetCC(sdl, CCVT, VectorInduction,
VectorTripCount, ISD::CondCode::SETULT);		VectorTripCount, ISD::CondCode::SETULT);
setValue(&I, SetCC);		setValue(&I, SetCC);
return;		return;
}		}
		case Intrinsic::experimental_get_vector_length: {
		if (!TLI.shouldExpandGetVectorLength()) {
		visitTargetIntrinsic(I, Intrinsic);
		return;
		}

		// Expand to a umin between the trip count and the maximum elements the type
		// can hold.
		EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
		SDValue TripCount = getValue(I.getOperand(0));

		// Extend the trip count to at least the result VT.
		if (TripCount.getValueType().bitsLT(VT))
		TripCount = DAG.getNode(ISD::ZERO_EXTEND, sdl, VT, TripCount);

		EVT TripCountVT = TripCount.getValueType();

		uint64_t VF = cast<ConstantInt>(I.getOperand(2))->getZExtValue();
		frasercrmck: Do we want to sanitize negative immediates here or elsewhere, or do we just let them become large positive ones?
		craig.topper (author): Good question. Do you have a suggestion how to sanitize it?
		frasercrmck: Would adding something to the verifier be sufficient? Then we may be able to assert here?
		SDValue MaxEVL = DAG.getVScale(sdl, TripCountVT,
		APInt(TripCountVT.getSizeInBits(), VF));
		SDValue UMin = DAG.getNode(ISD::UMIN, sdl, TripCountVT, TripCount, MaxEVL);
		// Clip to the result type if needed.
		SDValue Trunc = DAG.getNode(ISD::TRUNCATE, sdl, VT, UMin);

		setValue(&I, Trunc);
		reames: I believe this is just computing the number of elements in the ElementCount constructed from VF and the scalable bit. On the IR level, we have CreateElementCount on IRBuilder. We probably need something analogous on DAG. It looks like we've got a couple examples already, so it'd be good to pull that out and rebase. One generic optimization we should probably add (in a separate patch), is to fold get_active_vector_length to TC when TC can be proven less than EC.
		return;
		}
case Intrinsic::vector_insert: {		case Intrinsic::vector_insert: {
SDValue Vec = getValue(I.getOperand(0));		SDValue Vec = getValue(I.getOperand(0));
SDValue SubVec = getValue(I.getOperand(1));		SDValue SubVec = getValue(I.getOperand(1));
SDValue Index = getValue(I.getOperand(2));		SDValue Index = getValue(I.getOperand(2));

// The intrinsic's index type is i64, but the SDNode requires an index type		// The intrinsic's index type is i64, but the SDNode requires an index type
// suitable for the target. Convert the index as required.		// suitable for the target. Convert the index as required.
MVT VectorIdxTy = TLI.getVectorIdxTy(DAG.getDataLayout());		MVT VectorIdxTy = TLI.getVectorIdxTy(DAG.getDataLayout());

llvm/test/CodeGen/AArch64/get_vector_length.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
				; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s

				declare i32 @llvm.experimental.get.vector.length.i16(i16, i32, i32)
				declare i32 @llvm.experimental.get.vector.length.i32(i32, i32, i32)
				declare i32 @llvm.experimental.get.vector.length.i64(i64, i32, i32)

				define i32 @vector_length_i16(i16 zeroext %tc) {
				; CHECK-LABEL: vector_length_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: cntd x8
				; CHECK-NEXT: cmp w0, w8
				; CHECK-NEXT: csel w0, w0, w8, lo
				; CHECK-NEXT: ret
				%a = call i32 @llvm.experimental.get.vector.length.i16(i16 %tc, i32 8, i32 2)
				ret i32 %a
				}

				define i32 @vector_length_i32(i32 zeroext %tc) {
				; CHECK-LABEL: vector_length_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: cntd x8
				; CHECK-NEXT: cmp w0, w8
				; CHECK-NEXT: csel w0, w0, w8, lo
				; CHECK-NEXT: ret
				%a = call i32 @llvm.experimental.get.vector.length.i32(i32 %tc, i32 8, i32 2)
				ret i32 %a
				}

				define i32 @vector_length_i64(i64 %tc) {
				; CHECK-LABEL: vector_length_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: cntd x8
				; CHECK-NEXT: cmp x0, x8
				; CHECK-NEXT: csel x0, x0, x8, lo
				; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
				; CHECK-NEXT: ret
				%a = call i32 @llvm.experimental.get.vector.length.i64(i64 %tc, i32 8, i32 2)
				ret i32 %a
				}

llvm/test/CodeGen/RISCV/rvv/get_vector_length.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
				; RUN: sed 's/iXLen/i32/g' %s | llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,RV32
				; RUN: sed 's/iXLen/i32/g' %s | llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,RV64

				declare i32 @llvm.experimental.get.vector.length.i16(i16, i32, i32)
				declare i32 @llvm.experimental.get.vector.length.i32(i32, i32, i32)
				declare i32 @llvm.experimental.get.vector.length.i64(i64, i32, i32)

				define i32 @vector_length_i16(i16 zeroext %tc) {
				; CHECK-LABEL: vector_length_i16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: csrr a1, vlenb
				; CHECK-NEXT: srli a1, a1, 2
				; CHECK-NEXT: bltu a0, a1, .LBB0_2
				; CHECK-NEXT: # %bb.1:
				; CHECK-NEXT: mv a0, a1
				; CHECK-NEXT: .LBB0_2:
				; CHECK-NEXT: ret
				%a = call i32 @llvm.experimental.get.vector.length.i16(i16 %tc, i32 8, i32 2)
				ret i32 %a
				}

				define i32 @vector_length_i32(i32 zeroext %tc) {
				; RV32-LABEL: vector_length_i32:
				; RV32: # %bb.0:
				; RV32-NEXT: csrr a1, vlenb
				; RV32-NEXT: srli a1, a1, 2
				; RV32-NEXT: bltu a0, a1, .LBB1_2
				; RV32-NEXT: # %bb.1:
				; RV32-NEXT: mv a0, a1
				; RV32-NEXT: .LBB1_2:
				; RV32-NEXT: ret
				;
				; RV64-LABEL: vector_length_i32:
				; RV64: # %bb.0:
				; RV64-NEXT: sext.w a0, a0
				; RV64-NEXT: csrr a1, vlenb
				; RV64-NEXT: srli a1, a1, 2
				; RV64-NEXT: bltu a0, a1, .LBB1_2
				; RV64-NEXT: # %bb.1:
				; RV64-NEXT: mv a0, a1
				; RV64-NEXT: .LBB1_2:
				; RV64-NEXT: ret
				%a = call i32 @llvm.experimental.get.vector.length.i32(i32 %tc, i32 8, i32 2)
				ret i32 %a
				}

				define i32 @vector_length_XLen(iXLen zeroext %tc) {
				; RV32-LABEL: vector_length_XLen:
				; RV32: # %bb.0:
				; RV32-NEXT: csrr a1, vlenb
				; RV32-NEXT: srli a1, a1, 2
				; RV32-NEXT: bltu a0, a1, .LBB2_2
				; RV32-NEXT: # %bb.1:
				; RV32-NEXT: mv a0, a1
				; RV32-NEXT: .LBB2_2:
				; RV32-NEXT: ret
				;
				; RV64-LABEL: vector_length_XLen:
				; RV64: # %bb.0:
				; RV64-NEXT: sext.w a0, a0
				; RV64-NEXT: csrr a1, vlenb
				; RV64-NEXT: srli a1, a1, 2
				; RV64-NEXT: bltu a0, a1, .LBB2_2
				; RV64-NEXT: # %bb.1:
				; RV64-NEXT: mv a0, a1
				; RV64-NEXT: .LBB2_2:
				; RV64-NEXT: ret
				%a = call i32 @llvm.experimental.get.vector.length.iXLen(iXLen %tc, i32 8, i32 2)
				ret i32 %a
				}
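The `csrr a1, vlenb` / `srli a1, a1, 2` pair in the RISC-V checks above can be related to VF * vscale with a little arithmetic. The sketch assumes LLVM's RISC-V convention that vscale equals VLEN / 64 and that `vlenb` holds VLEN / 8; the function names are hypothetical and exist only for this check.

```cpp
#include <cassert>
#include <cstdint>

// VF * vscale, assuming vscale = VLEN / 64 on RISC-V.
std::uint64_t riscvMaxLanes(std::uint64_t vlenBits, std::uint32_t vf) {
  assert(vlenBits >= 64 && vlenBits % 64 == 0 && "VLEN assumed multiple of 64");
  const std::uint64_t vscale = vlenBits / 64;
  return vf * vscale;
}

// What the checked instruction pair computes.
std::uint64_t riscvCheckedSequence(std::uint64_t vlenBits) {
  const std::uint64_t vlenb = vlenBits / 8; // csrr a1, vlenb
  return vlenb >> 2;                        // srli a1, a1, 2
}
```

For VF = 2 the two agree: 2 * (VLEN / 64) equals (VLEN / 8) / 4. A VLEN of 128 bits gives 4 lanes either way.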
