This is an archive of the discontinued LLVM Phabricator instance.

[PoC][IR] Generalize interleave/deinterleave intrinsics to factors > 2
AbandonedPublic

Authored by luke on Mar 7 2023, 3:32 AM.

Details

Summary

This replaces the @llvm.experimental.vector.interleave2
and @llvm.experimental.vector.deinterleave2 intrinsics with the more
general @llvm.experimental.vector.interleave
@llvm.experimental.vector.deinterleave intrinsics that can
interleave/deinterleave an arbitrary number of lanes.

This also extends the vector_interleave/vector_deinterleave SelectionDAG
nodes to match.

Rather than creating N different intrinsics for
@llvm.experimental.vector.interleaveN, this changes the type to accept a
variadic number of arguments, which determines the interleave factor.
Conversely, the interleave factor for the deinterleave intrinsics is
determined by the number of return types.

This does not add support for code generation for factors > 2 yet, which
will be added in a later patch.

Diff Detail

Event Timeline

luke created this revision.Mar 7 2023, 3:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2023, 3:32 AM
luke requested review of this revision.Mar 7 2023, 3:32 AM
luke added inline comments.Mar 7 2023, 3:39 AM
llvm/docs/LangRef.rst
17791

Even though there is really only overloaded type in this intrinsic, I've suffixed the result type as well to disambiguate the various interleave factors:
e.g. a deinterleave of <vscale x 8 x i32> could either decompose to 2 x <vscale x 4 x i32> or 4 x <vscale x 2 x i32>, and we need separate declarations for them.

luke updated this revision to Diff 502978.Mar 7 2023, 3:47 AM

Update ISDOpcodes documentation and fix integer conversion

luke added inline comments.Mar 7 2023, 3:54 AM
llvm/include/llvm/IR/Intrinsics.td
2120–2128

This is the main disadvantage of using just one intrinsic to represent all factors: We have to use variadic arguments which complicates the type signature, and we need to do the verification ourselves in Verifier.cpp.

I also considered just creating separate intrinsics for each interleave factor, but didn't like the duplication that would be required in Legalise*Types.cpp/SelectionDAGBuilder.cpp.

luke updated this revision to Diff 502985.Mar 7 2023, 4:24 AM

Fix deinterleave mask on fixed length vectors with NF>2

luke updated this revision to Diff 503004.Mar 7 2023, 5:34 AM

Add tests for fixed length vectors on RISC-V

luke added inline comments.Mar 7 2023, 5:42 AM
llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll
371

I was aiming to improve the generated code for deinterleaves in a follow up patch, but can do it in this patch if preferred.
I would have expected it to have loaded a gather mask from memory here like in the interleave case.

paulwalker-arm added inline comments.Mar 7 2023, 9:50 AM
llvm/include/llvm/IR/Intrinsics.td
2120–2128

I don't like this change because I feel it makes the intrinsics cumbersome to work with due to them being so open-ended when compare to the current form.

That said, based on the rational I don't think the change is necessary because when designing the initial support we concluded there was no need for the instrinsics and ISD nodes to follow the same interface and in fact good reasons for them to differ. This means it's perfectly acceptable to add more intrinsics (i.e. int_experimental_vector_interleave3, int_experimental_vector_interleave4...) that all lower to the same ISD node. This is why the intrinsic is numbered and the ISD is not.

llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll
405

This is not how the intrinsic will be written because the overloaded types is now a struct. You can see this by passing the test files through opt where you'll see the function name will be llvm.experimental.vector.deinterleave.sl_v2i64v2i64v2i64s.v6i64.

luke added inline comments.Mar 7 2023, 10:49 AM
llvm/include/llvm/IR/Intrinsics.td
2120–2128

Thanks for the review. I’m not very strongly opinionated on this, and would be happy to rework this to use separate distinct intrinsics.

In RISC-V the maximum number of interleaving fields in a load/store is 8, so as long as we have up to interleave8/deinterleave8 that should be fine. Does adding those intrinsics sound like a good way forward?

llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll
405

Thanks, that explains a lot of what I was seeing. Shouldn’t be a worry any more though if we rework the intrinsic

paulwalker-arm added inline comments.Mar 8 2023, 2:49 AM
llvm/include/llvm/IR/Intrinsics.td
2120–2128

That would be consistent with the current design but the answer really depends on whether such intrinsics will actually be used. This is why for AArch64 we've started with the 2-way variant only. This allows us to teach loop vectorise how to use them, along with implementing any necessary combines etc.., and thus understand the pitfalls before moving on to other interleaving factors.

luke added inline comments.Mar 8 2023, 5:00 AM
llvm/include/llvm/IR/Intrinsics.td
2120–2128

One thing I've ran into in D145495 is how you can get non-power-of-two element counts in your types like nxv6i32 quite easily with other factors, which need a little bit of attention since they aren't valid MVTs.

paulwalker-arm added inline comments.Mar 8 2023, 5:06 AM
llvm/include/llvm/IR/Intrinsics.td
2120–2128

You likely want to use EVTs during the early stages of code generation. Once type legalisation has been carried out, I'd expect MVTs to then be sufficient.

luke added inline comments.Mar 8 2023, 6:01 AM
llvm/include/llvm/IR/Intrinsics.td
2120–2128

The snag with using EVTs (IIUC) is that we end up introducing illegal types during the LegalizeDAG phase, which throws an assertion.

For the ISD node
t8: nxv2i32,nxv2i32,nxv2i32 = vector_interleave t2, t4, t6
t2,t4 and t6 already come in as legal nxv2i32 types.

When vector_interleave is then lowered during LegalizeDAG, if we just use EVTs and generate the "logical" nxv6i32 type then we emit:
t38: nxv6i32 = RISCVISD::VRGATHEREI16_VV_VL t24, t34, undef:nxv6i32, t36, Register:i64 $x0

Which is now illegal, and asserts at LegalizeDAG.cpp:975.
I presume the design of LegalizeDAG is such that it relies on LegalizeTypes to make sure the types are legal coming in, and presumes that any further legalisation/target lowering will preserve the types.

luke planned changes to this revision.Mar 8 2023, 9:23 AM
luke retitled this revision from [IR] Generalize interleave/deinterleave intrinsics to factors > 2 to [PoC][IR] Generalize interleave/deinterleave intrinsics to factors > 2.Mar 8 2023, 9:47 AM

@luke are you still working on this?

luke added a comment.Jun 27 2023, 3:14 AM

@luke are you still working on this?

Not actively. Is there interest in this now that D145163 is landed? I think the design of the intrinsics in this patch need reworked to be explicit rather than open ended

luke abandoned this revision.Jul 5 2023, 9:22 AM

Marking this as abandoned since I'm not actively working on this anymore, but it would be still good to see this someday

A potential design for the intrinsics could be something like:

llvm
def int_experimental_vector_interleave2   : DefaultAttrsIntrinsic<[LLVMNTimesElts<0, 2>],
                                                                  [llvm_anyvector_ty,
                                                                   LLVMMatchType<0>],
                                                                  [IntrNoMem]>;

def int_experimental_vector_interleave3   : DefaultAttrsIntrinsic<[LLVMNTimesElts<0, 3>],
                                                                  [llvm_anyvector_ty,
                                                                   LLVMMatchType<0>,
                                                                   LLVMMatchType<0>],
                                                                  [IntrNoMem]>;

def int_experimental_vector_deinterleave2 : DefaultAttrsIntrinsic<[llvm_anyvector_ty<0>,
                                                                   LLVMMatchType<0>],
                                                                  [LLVMNTimesElts<0, 2>],
                                                                  [IntrNoMem]>;

def int_experimental_vector_deinterleave3   : DefaultAttrsIntrinsic<[llvm_anyvector_ty,
                                                                     LLVMMatchType<0>,
                                                                     LLVMMatchType<0>],
                                                                     [LLVMNTimesElts<0, 3>],
                                                                    [IntrNoMem]>;

Where LLVMNTimesElts<I, N> is a descriptor for vectors that have N times as many elements as the I'th argument