This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Lower interleave and deinterleave intrinsics
ClosedPublic

Authored by luke on Feb 15 2023, 4:06 AM.

Details

Summary

Lower the two intrinsics introduced in D141924.

These intrinsics can be combined with loads and stores into the much more efficient segmented load and store instructions in a following patch.

Diff Detail

Event Timeline

luke created this revision.Feb 15 2023, 4:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2023, 4:06 AM
luke requested review of this revision.Feb 15 2023, 4:06 AM
luke added inline comments.Feb 15 2023, 4:10 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6654–6659

I'm not that happy with how this mask is generated.
Ideally we would just vmv.v.i v0, 0x55 directly, but I can't seem to do it in a way that keeps SelectionDAG happy. Namely, it doesn't allow bitcasting <vscale x n x i8> -> <vscale x n i1>

luke added a comment.Feb 15 2023, 8:03 AM

The herald failures should be fixed now that https://reviews.llvm.org/D141924#inline-1391060 has been updated

These should be implementable with clever use of vwaddu and vwmaccu. To avoid the vrgather.

craig.topper added inline comments.Feb 15 2023, 9:02 AM
llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll
2

I don't think this test even uses your patch and shows the strategy for how your patch can be improved. Only i64 and double test case use vrgather.

9

Missing CHECK lines

For interleave, a couple options to consider:

  1. Concatenate vectors into LMUL+1 vector, then gather using {0,VLMAX,1, VLMAX+1,...]. Forming that index vector takes some care for the scalable case, but I think we can do id()/2 +{masked} VLMAX where mask = {0,1,0,1}. The mask can in turn be computed as {id()&1 == 1) Using a segment store to the stack followed by a whole vector reload.

p.s. This was written before @craig.topper's comment, and I haven't read his in detail.

Here's an excerpt from the code we already have for interleave on fixed vectors. Need to see why it doesn't trigger here.

// Detect an interleave shuffle and lower to                                   
// (vmaccu.vx (vwaddu.vx lohalf(V1), lohalf(V2)), lohalf(V2), (2^eltbits - 1)) 
bool SwapSources;                                                              
if (isInterleaveShuffle(Mask, VT, SwapSources, Subtarget)) {
luke added a comment.Feb 16 2023, 2:42 AM

These should be implementable with clever use of vwaddu and vwmaccu. To avoid the vrgather.

That's a much better approach. But it won't work for i64 though right? In https://reviews.llvm.org/D144143 it falls back to a vrgather with a constant index vector

; RV32-V128-LABEL: unary_interleave_v4i64:
; RV32-V128:       # %bb.0:
; RV32-V128-NEXT:    lui a0, %hi(.LCPI19_0)
; RV32-V128-NEXT:    addi a0, a0, %lo(.LCPI19_0)
; RV32-V128-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
; RV32-V128-NEXT:    vle16.v v12, (a0)
; RV32-V128-NEXT:    vrgatherei16.vv v10, v8, v12
; RV32-V128-NEXT:    vmv.v.v v8, v10
; RV32-V128-NEXT:    ret

We can't load the indices for a scalable vector because it'll be of unknown size.

luke updated this revision to Diff 498005.Feb 16 2023, 6:57 AM

Rework lowering to use narrowing shifts in the deinterleave case where possible, and use a single gather that doesn't require viota for interleaving

luke updated this revision to Diff 498006.Feb 16 2023, 7:04 AM

Fix missing check lines

luke edited the summary of this revision. (Show Details)Feb 16 2023, 7:26 AM
luke marked 2 inline comments as done.
luke updated this revision to Diff 498085.EditedFeb 16 2023, 10:59 AM
  1. Be smarter about generating masks by using generating vmsne.vi vx, 0 directly, avoiding a redundant vadd.vi
  2. When SEW < ELEN interleave vectors with vwaddu.vv/vwmacc.vx
luke updated this revision to Diff 498342.Feb 17 2023, 5:30 AM

Remove convertToMask

luke updated this revision to Diff 498343.Feb 17 2023, 5:34 AM

Update tests

At a high level, I'm unhappy with the amount of duplication between the scalable path with the new nodes and the fixed vector path. I think this is a great POC - it largely convinces me we *can* lower these reasonably - but we need to rethink this a bit for landing.

I'm wondering whether we should canonicalize fixed length shuffles for interleave and deinterleave to the new nodes, and then have the lowering as you roughly structured it. This would avoid some of the code duplication.

The other approach would be to have a set of utility functions, and call them appropriately from both places.

I will note I'm open to being convinced this can be done in tree, but in that case, I'd probably want to see a more minimal lowering with incremental improvement and sharing in follow up changes.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6626

This case looks to be missing from the fixed length lowering, we should find a way to pick it up there as well.

6700

This bit of code should be shared with the isInterleaveShuffle above. This will require a bit of restructuring.

llvm/lib/Target/RISCV/RISCVISelLowering.h
273

Unrelated, commit separately please.

luke added a comment.Feb 17 2023, 11:40 AM

I'm wondering whether we should canonicalize fixed length shuffles for interleave and deinterleave to the new nodes, and then have the lowering as you roughly structured it. This would avoid some of the code duplication.

I agree, the thought crossed my mind as I was reimplementing all of this. It’s exactly the opposite of what SelectionDAGBuilder does in the first place though: it explicitly generates shuffle_vectors from the intrinsics for fixed length vectors.

I know it’s a question that’s been raised before but perhaps it’s worthwhile discussing with the AArch64 folks if shuffle_vectors should be canonically combined into vector_interleave across all targets.
Presumably a lot of work.
And I wonder if they have the same duplication problem too.

I'm wondering whether we should canonicalize fixed length shuffles for interleave and deinterleave to the new nodes, and then have the lowering as you roughly structured it. This would avoid some of the code duplication.

I agree, the thought crossed my mind as I was reimplementing all of this. It’s exactly the opposite of what SelectionDAGBuilder does in the first place though: it explicitly generates shuffle_vectors from the intrinsics for fixed length vectors.

I know it’s a question that’s been raised before but perhaps it’s worthwhile discussing with the AArch64 folks if shuffle_vectors should be canonically combined into vector_interleave across all targets.
Presumably a lot of work.
And I wonder if they have the same duplication problem too.

Can we extract the code generation from LowerVECTOR_SHUFFLE into a function that we call in two places? That should reduce the duplication.

luke updated this revision to Diff 498810.Feb 20 2023, 5:38 AM

Share code between fixed length and scalable vectors

craig.topper added inline comments.Feb 20 2023, 5:31 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6572

Subtarget is unused

6731

Need to pass Idx instead of UNDEF to get a mask undisturbed vadd.vv.

llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
72

This is a mask agnostic vadd.vx. You need mask undisturbed.

luke updated this revision to Diff 499089.Feb 21 2023, 2:30 AM
luke marked an inline comment as done.

Fix i64 interleave case not using mask undisturbed

luke marked 4 inline comments as done.Feb 21 2023, 2:31 AM
luke added inline comments.Feb 21 2023, 2:37 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6626

I think this is handled in lowerVECTOR_SHUFFLEAsVNSRL

reames added inline comments.Feb 21 2023, 9:17 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6587

WideN.getNode()->getNumValues() should just be NumVals right?

6619

Please use getDefaultVLOps here.

6622

I strongly request we drop the optimized case from the initial patch and come back to these in follow up changes. I want to focus on having a correct base lowering first.

6652

The index type here doesn't look right. Say we have two vectors of i8. On a VLEN>2048 machine, we need the index type to be larger than i8.

6690

I think this is the same as WideVT above, but with different naming and computation. Can you standardize on one or the other please?

6696

Please drop the optimized case from the initial patch.

6706

See changeVectorElementType

6725

I think the comments are backwards here. I believe the value you're actually computing is:
// 0,0,1,1,2,2,.. etc.

This could simply be a "which order do we write vector lanes in" confusion, but this ordered doesn't match the deinterleave comment above either.

6728

Same here

reames added inline comments.Feb 21 2023, 9:58 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6720

See newly added computeVLMax.

reames added inline comments.Feb 21 2023, 10:08 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6652

Realized the case I raised is unreachable until the fast path above is removed (as I suggested).

reames added inline comments.Feb 21 2023, 10:23 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6622

Staring at this code and the analogous code in fixed length vector lowering, I'm going to drop this request. Instead, I'm going to explore a pre-change to make sharing that logic possible.

6696

Same here.

luke added inline comments.Feb 21 2023, 3:52 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6652

Come think of it this could just be replaced with i16 and VRGATHEREI16 instead, saving some space.

luke added inline comments.Feb 21 2023, 4:12 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6720

It looks like computeVLMAX computes the maximum VLMAX statically, not the actual VLMAX on the current hardware.
Although there's a bunch of other places where VLMAX is computed.
Maybe we can rename computeVLMAX to computeMaxVLMAX, and then add a helper function getVLMAX

luke added inline comments.Feb 21 2023, 4:14 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6725

I've been staring at the spec too long, they use backwards notation. I agree, keeping it in LLVM order makes more sense

craig.topper added inline comments.Feb 21 2023, 4:14 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6720

It's using ISD::VSCALE which will be expanded to something like csrr a0, vlenb; srli a0, 3. We define "vscale" as vlen/64. vlenb is already in bytes so we divide by with a shift.

craig.topper added inline comments.Feb 21 2023, 4:16 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6720

Oops that should have said "We define "vscale" as vlen/64. vlenb is already in bytes so we divide by another 8 with a shift right by 3."

luke added inline comments.Feb 21 2023, 4:39 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6652

Never mind, this causes an extra vsetvli

luke updated this revision to Diff 499324.Feb 21 2023, 4:43 PM

Address some (but not all) review comments

luke marked 2 inline comments as done.Feb 21 2023, 4:52 PM
luke added inline comments.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6720

My bad, I didn't realise how new it actually was and hadn't pulled in 9168c98553ac9a1f8e8b87006f9b1b3f23955beb, I thought you were referring to RISCVTargetLowering::computeVLMAX

luke updated this revision to Diff 499326.Feb 21 2023, 5:01 PM
luke marked 5 inline comments as done.

Use computeVLMax

luke marked 2 inline comments as done.Feb 21 2023, 5:01 PM
luke updated this revision to Diff 499327.Feb 21 2023, 5:02 PM

Use computeVLMax

reames accepted this revision.Feb 21 2023, 6:05 PM

LGTM with two test comments addressed.

llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll
40

Missing check lines here, probably due to conflict with autogen.

117

Same problem here.

This revision is now accepted and ready to land.Feb 21 2023, 6:05 PM
luke updated this revision to Diff 499556.Feb 22 2023, 9:30 AM

Fix check lines

luke marked 2 inline comments as done.Feb 22 2023, 9:39 AM
craig.topper added inline comments.Feb 22 2023, 9:43 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6589

operator-> works on SDValue. You don't need to call getNode.

@reames getDeinterleaveViaVNSRL needs some massaging to be reused here with the scalable vectors, I'll do that in a follow up patch to keep this diff clean

luke updated this revision to Diff 499567.Feb 22 2023, 10:05 AM

Use -> overload

luke marked 8 inline comments as done.Feb 22 2023, 10:06 AM
luke updated this revision to Diff 499571.Feb 22 2023, 10:08 AM

Remove accidental extra commit

Reverse ping - anything preventing this from landing?

luke added a comment.Feb 23 2023, 7:51 AM

Reverse ping - anything preventing this from landing?

Was intending to land it with https://reviews.llvm.org/D144584

This revision was landed with ongoing or failed builds.Feb 23 2023, 8:23 AM
This revision was automatically updated to reflect the committed changes.