Page MenuHomePhabricator

[CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse
ClosedPublic

Authored by CarolineConcatto on Jan 17 2021, 11:45 AM.

Details

Summary

This patch adds a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
reversed. For example:

vector.reverse(<A,B,C,D>) ==> <D,C,B,A>

The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.

This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker (@paulwalker-arm).

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Diff Detail

Event Timeline

CarolineConcatto requested review of this revision.Jan 17 2021, 11:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2021, 11:45 AM
craig.topper added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1640

Should this use DefaultAttrsIntrinsic?

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
99

Can we keep consistent formatting with the rest of this switch?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
10858

Put VT.getVectorNumElements() in a variable since its used here and in the loop body.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
251–252

Why not put this in TargetSelectionDAG.td?

david-arm added inline comments.Jan 18 2021, 1:21 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
10848

I think we can drop the "experimental" here now?

10850

Perhaps worth explaining that we're using VECTOR_SHUFFLE to implement the VECTOR_REVERSE operation for fixed length vectors? It confused me a bit at first trying to understand why we jump suddenly into VECTOR_SHUFFLE that's all.

10856

VECTOR_REVERSE?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
293

I think this can just be "vector_reverse" because it's a generic opcode and not experimental?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5891

Just for reference, if all the lowering operation does is create the ISD_VECTOR_REVERSE with the same arguments then you can actually just mark the operation as Legal and avoid creating the LowerVECTOR_REVERSE function as it should happen automatically. If we have plans to use SVE for fixed length vectors in future then it might be useful to keep this custom function anyway.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
251–252

Yeah, I think you need to add something to llvm/include/llvm/Target/TargetSelectionDAG.td along the lines of

def vector_reverse : SDNode<"ISD::VECTOR_REVERSE",

SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisSameAs<0,1>]>,[]>;

because you've added a generic ISD opcode in this patch. In places where you've got patterns using "AArch64rev" you can just then use "vector_reverse" instead.

david-arm added inline comments.Jan 18 2021, 1:23 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
10856

Sorry, ignore the comment above!

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
252

nit: whitespace

CarolineConcatto edited the summary of this revision. (Show Details)Jan 19 2021, 7:56 AM

-remove AArch64 custom lowering
-move SNode vector_reverse to TargetSelectionDAG.td

CarolineConcatto marked an inline comment as done.

-fix style on LegalizeIntegerTypes.cpp

CarolineConcatto marked 8 inline comments as done.

-fix table gen style for TargetSelectionDAG.td

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
10850

@david-arm
I've updated the commit message and the comments as well.
Is that good?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5891

Thank you @david-arm for pointing that.
It is true that we can remove the custom lowering for aarch64 and have it as legal.
This simplifies the code.

Matt added a subscriber: Matt.Jan 19 2021, 9:14 AM
This revision is now accepted and ready to land.Jan 21 2021, 8:23 AM
fhahn added a subscriber: fhahn.Jan 21 2021, 8:40 AM

Mirroring my comments for D94708: if the intrinsics needs to support fixed vectors, it would be good to have some tests for platforms other than AArch64 and also support in GlobalISel, which is the default on AArch64 with -O0 IIRC (or do the transform to shuffles as an IR transform).

Mirroring my comments for D94708: if the intrinsics needs to support fixed vectors, it would be good to have some tests for platforms other than AArch64 and also support in GlobalISel, which is the default on AArch64 with -O0 IIRC (or do the transform to shuffles as an IR transform).

Hi @fhahn, I agree with your point about -O0, but I'm not sure why we need tests for other platforms? Carol has an extensive set of tests for both fixed width and scalable vectors. The lowering is identical for fixed width vectors regardless of the target so all it would be testing is the different codegen of vector shuffles.

Mirroring my comments for D94708: if the intrinsics needs to support fixed vectors, it would be good to have some tests for platforms other than AArch64 and also support in GlobalISel, which is the default on AArch64 with -O0 IIRC (or do the transform to shuffles as an IR transform).

Hi @fhahn, I agree with your point about -O0, but I'm not sure why we need tests for other platforms? Carol has an extensive set of tests for both fixed width and scalable vectors. The lowering is identical for fixed width vectors regardless of the target so all it would be testing is the different codegen of vector shuffles.

I agree, such extra tests seem wasteful as I'd expect ISD::VECTOR_SHUFFLE to already be sufficiently tested for all targets.

fhahn added a comment.Jan 22 2021, 4:03 AM

Mirroring my comments for D94708: if the intrinsics needs to support fixed vectors, it would be good to have some tests for platforms other than AArch64 and also support in GlobalISel, which is the default on AArch64 with -O0 IIRC (or do the transform to shuffles as an IR transform).

Hi @fhahn, I agree with your point about -O0, but I'm not sure why we need tests for other platforms? Carol has an extensive set of tests for both fixed width and scalable vectors. The lowering is identical for fixed width vectors regardless of the target so all it would be testing is the different codegen of vector shuffles.

I agree, such extra tests seem wasteful as I'd expect ISD::VECTOR_SHUFFLE to already be sufficiently tested for all targets.

Given the current implementation details that's true, but the implementation could change and having a few tests for other targets guards against the implementation changing in a way that makes it only work on a certain target for example. I'm not saying that is very likely, but something like that could get missed as backends transition to GlobalISel at different times, for example.

I'm not saying we need extensive tests for all targets, just suggesting to add a small sanity test for a different commonly used backend, like X86. Adding such a test seems cheap, but please feel free to ignore the suggestion.

Taking a step back, are there plans to use the intrinsics for fixed vectors?

Taking a step back, are there plans to use the intrinsics for fixed vectors?

I'd like to say yes but at this stage it is too early to say. What we definitely need is a unified interface so that transforms can be written without needing to worry about the type of vector. If a pass uses an IRBuilder then great, we can hide the "do I create a shufflevector or intrinsic" code behind suitably names functions (e.g. craeteVectorSplice()...). For passes that want to create a node directly then we'd recommend just creating an intrinsic call with the expectation that those working with fixed length vectors are transformed to shufflevector sufficiently early to maintain existing code quality.

Once we're at a stage where the expected shuffle optimisations apply equally well to the intrinsic variants, there's consensus of how LLVM will represent step vector like constants and ultimately the final decision on if/how shufflevector can operate on scalable vectors, we should be in a position to determine whether it is worth unifying shuffles paths (i.e. minimise the use of shufflevector).

Just an extra note to say that whilst the above is true for the intrinsics, it doesn't necessarily hold for the ISD nodes, where I believe the common case will be to rely on VECTOR_SHUFFLE to implement the vector intrinsics that operate on fixed length vectors.

-add -O0 and X86 test for vector.reverse

Thank you all for the comments.
I've added tests for -O0 in AArch64 and fixed-width test for X86.
ATM the X86 test has no specific target feature.

craig.topper added inline comments.Jan 27 2021, 12:49 AM
llvm/test/CodeGen/X86/named-vector-shuffles.ll
65 ↗(On Diff #319320)

You can probably drop this. Vectors of f16 aren't legal on X86.

fhahn added a comment.Jan 27 2021, 1:14 AM

Taking a step back, are there plans to use the intrinsics for fixed vectors?

I'd like to say yes but at this stage it is too early to say. What we definitely need is a unified interface so that transforms can be written without needing to worry about the type of vector. If a pass uses an IRBuilder then great, we can hide the "do I create a shufflevector or intrinsic" code behind suitably names functions (e.g. craeteVectorSplice()...). For passes that want to create a node directly then we'd recommend just creating an intrinsic call with the expectation that those working with fixed length vectors are transformed to shufflevector sufficiently early to maintain existing code quality.

Hm, making things slightly easier for passes not using IRBuilder doesn't seem like the strongest motivation to me, especially if it also comes with new pass ordering constraints (doing the conversion during instruction selection seems like it would mean we potentially miss existing folds in InstCombine & co). Also, even if passes not using IRBuilder, creating an intrinsic call without it is probably more work than just instantiating IRBuilder and using it directly?

It seems to me like only allowing them for scalable vectors initially would define away a couple of potential problems (like introducing new pass-ordering constraints, uncertainty whether to use shuffle vector or the intrinsics). Personally I'd prefer one way of doing things rather than having to chose whether to use an intrinsic or a shuffle for fixed vectors. If it is uncertain whether the intrinsic will actually be used for fixed vectors, having it first restricted to the actual use case (scalable vectors) and extend it to fixed vectors once it is needed seems like a slightly more incremental/conservative approach.

But if others think having them for fixed with vectors is useful, that's fine by me, especially because they are still experimental and we can remove fixed vector support again if it is not used.

-remove test for f16 on X86 as they are not legal.

CarolineConcatto marked an inline comment as done.Jan 28 2021, 2:49 AM
CarolineConcatto added inline comments.
llvm/test/CodeGen/X86/named-vector-shuffles.ll
65 ↗(On Diff #319320)

Thank you @craig.topper. I've removed the test for f16 tests on X86

It seems to me like only allowing them for scalable vectors initially would define away a couple of potential problems (like introducing new pass-ordering constraints, uncertainty whether to use shuffle vector or the intrinsics). Personally I'd prefer one way of doing things rather than having to chose whether to use an intrinsic or a shuffle for fixed vectors. If it is uncertain whether the intrinsic will actually be used for fixed vectors, having it first restricted to the actual use case (scalable vectors) and extend it to fixed vectors once it is needed seems like a slightly more incremental/conservative approach.

But if others think having them for fixed with vectors is useful, that's fine by me, especially because they are still experimental and we can remove fixed vector support again if it is not used.

While these intrinsics are marked experimental they probably shouldn't be created for fixed-width vectors by passes or front-ends until we have a better understanding what direction to take for shuffles. At the same time, the name doesn't suggest the intrinsic is limited to scalable-vectors only, and it is quite trivial to make this work for fixed-width vectors, so I don't see a reason not to support this. Perhaps the LangRef description can mention this consideration?

llvm/docs/LangRef.rst
16246

nxv4i32

16254

Maybe add:

These intrinsics work for both fixed and scalable vectors. While this intrinsic is marked as experimental the recommended way to express reverse operations for fixed-width vectors is still to use a shufflevector, as that may allow for more optimization opportunities.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
10850–10852

nit: remove comment.

10859

nit: s/behaviours/behavior/

llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
1 ↗(On Diff #319807)

nit: can you rename this file (and the one for neon and x86) to: named-vector-shuffle-reverse-sve.ll, named-vector-shuffle-reverse-neon.ll and named-vector-shuffle-reverse.ll respectively?

CarolineConcatto marked an inline comment as done.

-change the names of the tests to named-vector-shuffle-reverse
-add text in LangRef about the use of experimental.vector.reverse

CarolineConcatto marked 2 inline comments as done.Feb 11 2021, 4:44 AM
sdesmalen accepted this revision.Feb 11 2021, 5:51 AM

Cheers, it looks good to me now!

fhahn accepted this revision.Feb 11 2021, 2:36 PM

It seems to me like only allowing them for scalable vectors initially would define away a couple of potential problems (like introducing new pass-ordering constraints, uncertainty whether to use shuffle vector or the intrinsics). Personally I'd prefer one way of doing things rather than having to chose whether to use an intrinsic or a shuffle for fixed vectors. If it is uncertain whether the intrinsic will actually be used for fixed vectors, having it first restricted to the actual use case (scalable vectors) and extend it to fixed vectors once it is needed seems like a slightly more incremental/conservative approach.

But if others think having them for fixed with vectors is useful, that's fine by me, especially because they are still experimental and we can remove fixed vector support again if it is not used.

While these intrinsics are marked experimental they probably shouldn't be created for fixed-width vectors by passes or front-ends until we have a better understanding what direction to take for shuffles. At the same time, the name doesn't suggest the intrinsic is limited to scalable-vectors only, and it is quite trivial to make this work for fixed-width vectors, so I don't see a reason not to support this. Perhaps the LangRef description can mention this consideration?

Making it explicit in the langref sounds good to me, thanks!

LGTM.

paulwalker-arm accepted this revision.Feb 12 2021, 3:36 AM

One minor issue that can be fixed when merging.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
293

Stylistically this should be on the same line as the case to match all the other entries, even those with longer names.

This revision was landed with ongoing or failed builds.Feb 15 2021, 5:41 AM
This revision was automatically updated to reflect the committed changes.