This is an archive of the discontinued LLVM Phabricator instance.

[VP][SelectionDAG][RISCV] Add get_vector_length intrinsics and generic SelectionDAG support.
ClosedPublic

Authored by craig.topper on May 4 2023, 3:54 PM.

Details

Summary

The generic implementation is umin(TC, VF * vscale).

Lowering to vsetvli for RISC-V will come in a future patch.

This patch is a pre-requisite to be able to CodeGen vectorized code from
D99750.

Diff Detail

Event Timeline

craig.topper created this revision.May 4 2023, 3:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2023, 3:54 PM
craig.topper requested review of this revision.May 4 2023, 3:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2023, 3:54 PM
simoll added a comment.May 9 2023, 8:19 AM

I suppose this makes sense for ARM MVE/SVE as well?
Generally, we should simplify this in IR for other targets and/or tripcounts with known multiples: otherwise the VP expansion pass will create %evl-to-%mask expansion code not knowing that the %evl is ineffective (eg %evl == vector_length).

-Add SVE test.
-Updates to LangRef. Reflect that this intrinsic is not require to return VF*vscale for count > VF*vscale. We need this to make RISC-V's vsetvli a valid implementation of this intrinsic.

Matt added a subscriber: Matt.May 9 2023, 1:37 PM
frasercrmck added inline comments.May 11 2023, 1:00 AM
llvm/docs/LangRef.rst
18081

Double comma here

18082

Should we explicitly denote this intrinsic with immarg parameters in the LangRef? I only see the one doing it but it sounds kind helpful

18096

Maybe also saying that the element width is in bits would clarify things a little better.

18096

If it's just a hint, do we need to restrict it to being at least 8? It might feel a little artificial.

Address comment. Fix dangling node in SelectionDAGBuilder

It looks good to me, but I'd like to leave it open for a bit so that other more active reviewers can take a look.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7339

Do we want to sanitize negative immediates here or elsewhere, or do we just let them become large positive ones?

craig.topper added inline comments.May 12 2023, 9:17 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7339

Good question. Do you have a suggestion how to sanitize it?

frasercrmck added inline comments.May 13 2023, 12:40 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7339

Would adding something to the verifier be sufficient? Then we may be able to assert here?

Add verifier check

Pass VF to shouldExpandGetVectorLength

evandro removed a subscriber: evandro.May 17 2023, 3:47 PM
loralb added a subscriber: loralb.May 19 2023, 2:47 AM

Thanks @craig.topper. This went under our radar. Apologies.

In general this makes sense, but if we want this to be useful for the VE target we need to address the fixed vector case. Being able to set the vector length is orthogonal to scalable vectors.

We assume that a call to llvm.experimental.get.vector.length for VE (or any other fixed-vector target with vector length) would set the cnt parameter, might optionally use the element_width parameter and then the vector factor would not be scaled by vscale. The user of the intrinsic, like the loop vectorizer, would have already chosen a meaningful value for vf for VE (VE for instance can use vf=256 when operating with 64-bit element width).

However this would render the intrinsic a bit ambiguous between fixed and scalable.

Maybe we can add an additional parameter stating if the VF is actually scaled by vscale or not (e.g., an immarg i1 or maybe a metadata string operand)?

llvm/docs/LangRef.rst
18103

I'd emphasise the fact that we always return an i32, this may be easy to miss from the two instances in the Syntax section.

18108

I think we want to explain that when cnt ≤ (vscale * VF) (for now, assume vscale = 1 if we're dealing with fixed vectors). We understand in this case the result of this intrinsic is cnt.

Also the case for cnt > (vscale * VF) is worded in a bit of an obscure way and might not be true as we understand it. For the specific case of RISC-V, if VLMAX=64, cnt=64 will return 64 but cnt=65 may return a value x ∊ (32, 64], like 33. That would not be a result "at least as large as the result for any value less than count" (we understand "any value less than count" as a value ∊ [0, 64]). Maybe we got it wrong.

Attempt to address Roger's comments.

rogfer01 added a subscriber: kaz7.EditedMay 23 2023, 3:27 AM

@kaz7 do you have any thoughts on the way we intend to define this generic intrinsic? I think it may be useful for VE as well.

llvm/docs/LangRef.rst
18081

Typo scalable? (maybe this is an abbreviation but only one letter was left out)

This looks reasonable to me, minor comments only.

llvm/docs/LangRef.rst
18094

First argument is interpreted an unsigned integer correct? Should state that.

18114

reword: larger than the maximum legal vectorization factor.

Also, should probably add a requirement here that zero is only returned when the requested trip count is zero.

llvm/lib/IR/Verifier.cpp
5473

I don't think the element width can be zero or negative either can it?

reames added inline comments.May 25 2023, 12:30 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7346

I believe this is just computing the number of elements in the ElementCount constructed from VF and the scaleable bit.

On the IR level, we have CreateElementCount on IRBuilder. We probably need something analogous on DAG. It looks like we've got a couple examples already, so it'd be good to pull that out and rebase.

One generic optimization we should probably add (in a separate patch), is to fold get_active_vector_length to TC when TC can be proven less than EC.

craig.topper added inline comments.May 25 2023, 4:33 PM
llvm/docs/LangRef.rst
18114

I'm not sure what you mean by "maximum legalization factor"? My statement here was intended to only refer to the vectorization factor passed in. If the vectorization factor is for a type that isn't supported legally by hardware, the intrinsic will still return a vector length that utilizes the whole type.

reames added inline comments.May 25 2023, 4:45 PM
llvm/docs/LangRef.rst
18114

How about something like this? (For clarity, this is going in a different direction than my original comment.)

If the count is larger than the number of lanes in the type described by the last two arguments, then this intrinsic may return a value less than the number of lanes implied by the type.

Basically, what if we were explicit about the VF and scalable bit mapping to a type, and then described the behavior in terms of the number of runtime lanes in that type?

Address review comments

reames accepted this revision.May 25 2023, 6:32 PM

LGTM

p.s. It would be helpful to land this even well in advance of the vectorizer changes. That would lets us write code gen test cases for both variants of VP predication being discussed, and see what we can get code generation to look like.

This revision is now accepted and ready to land.May 25 2023, 6:32 PM

Remove the element width hint. I don't think it's worth while right now.

frasercrmck accepted this revision.May 25 2023, 10:43 PM

LGTM too, one typo

llvm/docs/LangRef.rst
18094

typo: specifieso?

craig.topper requested review of this revision.May 25 2023, 10:48 PM

Returning to review since I dropped the element width after @reames review

Returning to review since I dropped the element width after @reames review

I think we are going to need that, but I'm happy to take advantage of this being an experimental intrinsic and iterating in tree, so LGTM either way.

This revision was not accepted when it landed; it landed in state Needs Review.May 26 2023, 9:06 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.