Lower the two intrinsics introduced in D141924.
These intrinsics can be combined with loads and stores into the much more efficient segmented load and store instructions in a following patch.
Paths
| Differential D144092
[RISCV] Lower interleave and deinterleave intrinsics ClosedPublic Authored by luke on Feb 15 2023, 4:06 AM.
Details Summary Lower the two intrinsics introduced in D141924. These intrinsics can be combined with loads and stores into the much more efficient segmented load and store instructions in a following patch.
Diff Detail
Event Timeline
Comment Actions The herald failures should be fixed now that https://reviews.llvm.org/D141924#inline-1391060 has been updated Comment Actions These should be implementable with clever use of vwaddu and vwmaccu. To avoid the vrgather. Comment Actions For interleave, a couple options to consider:
p.s. This was written before @craig.topper's comment, and I haven't read his in detail. Comment Actions Here's an excerpt from the code we already have for interleave on fixed vectors. Need to see why it doesn't trigger here. // Detect an interleave shuffle and lower to // (vmaccu.vx (vwaddu.vx lohalf(V1), lohalf(V2)), lohalf(V2), (2^eltbits - 1)) bool SwapSources; if (isInterleaveShuffle(Mask, VT, SwapSources, Subtarget)) { Comment Actions
That's a much better approach. But it won't work for i64 though right? In https://reviews.llvm.org/D144143 it falls back to a vrgather with a constant index vector ; RV32-V128-LABEL: unary_interleave_v4i64: ; RV32-V128: # %bb.0: ; RV32-V128-NEXT: lui a0, %hi(.LCPI19_0) ; RV32-V128-NEXT: addi a0, a0, %lo(.LCPI19_0) ; RV32-V128-NEXT: vsetivli zero, 4, e64, m2, ta, ma ; RV32-V128-NEXT: vle16.v v12, (a0) ; RV32-V128-NEXT: vrgatherei16.vv v10, v8, v12 ; RV32-V128-NEXT: vmv.v.v v8, v10 ; RV32-V128-NEXT: ret We can't load the indices for a scalable vector because it'll be of unknown size. Comment Actions Rework lowering to use narrowing shifts in the deinterleave case where possible, and use a single gather that doesn't require viota for interleaving luke marked 2 inline comments as done. Comment Actions
luke added a child revision: D144175: [RISCV] Combine (store/load interleave,deinterleave) into vsseg2/vlseg2.Feb 17 2023, 4:09 AM Comment Actions At a high level, I'm unhappy with the amount of duplication between the scalable path with the new nodes and the fixed vector path. I think this is a great POC - it largely convinces me we *can* lower these reasonably - but we need to rethink this a bit for landing. I'm wondering whether we should canonicalize fixed length shuffles for interleave and deinterleave to the new nodes, and then have the lowering as you roughly structured it. This would avoid some of the code duplication. The other approach would be to have a set of utility functions, and call them appropriately from both places. I will note I'm open to being convinced this can be done in tree, but in that case, I'd probably want to see a more minimal lowering with incremental improvement and sharing in follow up changes.
Comment Actions
I agree, the thought crossed my mind as I was reimplementing all of this. It’s exactly the opposite of what SelectionDAGBuilder does in the first place though: it explicitly generates shuffle_vectors from the intrinsics for fixed length vectors. I know it’s a question that’s been raised before but perhaps it’s worthwhile discussing with the AArch64 folks if shuffle_vectors should be canonically combined into vector_interleave across all targets. Comment Actions
Can we extract the code generation from LowerVECTOR_SHUFFLE into a function that we call in two places? That should reduce the duplication. luke edited parent revisions, added: D144386: [RISCV] Use a smaller VL when interleaving fixed vectors; removed: D144091: [RISCV][NFC] Add VIOTA_VL node.Feb 20 2023, 5:39 AM luke marked an inline comment as done. Comment ActionsFix i64 interleave case not using mask undisturbed
luke added inline comments.
This revision is now accepted and ready to land.Feb 21 2023, 6:05 PM
luke added a parent revision: D144532: [RISCV] Reorganize deinterleave lowering for reuse [nfc].Feb 22 2023, 9:53 AM luke removed a parent revision: D144532: [RISCV] Reorganize deinterleave lowering for reuse [nfc].Feb 22 2023, 9:59 AM Comment Actions@reames getDeinterleaveViaVNSRL needs some massaging to be reused here with the scalable vectors, I'll do that in a follow up patch to keep this diff clean Comment Actions
Was intending to land it with https://reviews.llvm.org/D144584 This revision was landed with ongoing or failed builds.Feb 23 2023, 8:23 AM Closed by commit rG8d15e7275fe1: [RISCV] Lower interleave and deinterleave intrinsics (authored by luke). · Explain Why This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 499863 llvm/lib/Target/RISCV/RISCVISelLowering.h
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll
llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll
llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll
llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
|
Unrelated, commit separately please.