Followup to D99355: SDAG support for vector-predicated load/store/gather/scatter.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
The existing masked/gather/scatter SDNodes have seen some changes since the VP nodes were basically forked from them. I am not familiar with the indexing in masked load, for example. Maybe some of the RISC-V V folks can chime in whether any of the recent masked-sdnode updates should be applied to the vp nodes too.
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
2373 | Since expansion/compression is unsupported I don't think we need the is.*Load/Store functions. |
I feel that it would be best if the SDNodes matched the non-VP ones as closely as possible. RISC-V doesn't have use for ISD::MemIndexedMode but some target may have one. I'm thinking even expanding loads and truncating stores should be an option. The only complexity is having to legalize it, but we can hopefully piggy-back on the logic for non-VP nodes.
This patch also needs to extend SelectionDAGBuilder to custom-build the "mem" nodes from the intrinsics. Otherwise we won't be creating the SDNodes correctly, e.g. there'll be an invalid MachineMemOperand.
As far as I can tell, this is being done correctly through SelectionDAGBuilder::visitVectorPredicationIntrinsic. Is this not the case?
No I don't believe that's the case. There's quite a bit that needs to change for these nodes: we need an in/out chain and we need to create a MachineMemOperand.
Yesterday I tried to build on top of this patch to support the VP nodes in RISC-V but quickly hit segfaults in AddNodeIDCustom.
I believe this change in the reference patch shows the kind of think we need.
Addressing review comments: updating sdnodes with indexing, expanding, compressing support; bringing in line with load/store sdnodes.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
7359 ↗ | (On Diff #363804) | I think we're missing VP_GATHER and VP_SCATTER here? |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
7385 ↗ | (On Diff #365640) | I don't think you can use PtrOperand to initialize the MachineMemOperand for VP_GATHER because it's not a pointer - it's a vector of pointers. The regular llvm.masked.gather intrinsic creates a MachinePointerInfo using only the address space. |
7427 ↗ | (On Diff #365640) | Same concern about MachinePointerInfo as with VP_GATHER. |
Sorry that my review feedback is piecemeal. What I'm doing is running the following test (declares omitted for brevity) with RISC-V. I think we want to see "cannot select" errors for each function, but I'm currently only getting that with vp.load and fixed-length vp.gather. The rest assert.
; RUN: llc -mtriple=riscv64 -mattr=+experimental-v -riscv-v-vector-bits-min=128 < %s define <vscale x 2 x i32> @vp_gather_nxv2i32(<vscale x 2 x i32*> %ptrs, <vscale x 2 x i1> %mask, i32 zeroext %evl) { %ld = call <vscale x 2 x i32> @llvm.vp.gather.nxv2i32.p0nxv2i32(<vscale x 2 x i32*> %ptrs, <vscale x 2 x i1> %mask, i32 %evl) ret <vscale x 2 x i32> %ld } define <4 x i32> @vp_gather_v4i32(<4 x i32*> %ptrs, <4 x i1> %mask, i32 zeroext %evl) { %ld = call <4 x i32> @llvm.vp.gather.v4i32.p0v4i32(<4 x i32*> %ptrs, <4 x i1> %mask, i32 %evl) ret <4 x i32> %ld } define <vscale x 2 x i32> @vp_load_nxv2i32(<vscale x 2 x i32>* %ptr, <vscale x 2 x i1> %mask, i32 zeroext %evl) { %ld = call <vscale x 2 x i32> @llvm.vp.load.nxv2i32.p0nxv2i32(<vscale x 2 x i32>* %ptr, <vscale x 2 x i1> %mask, i32 %evl) ret <vscale x 2 x i32> %ld } define <4 x i32> @vp_load_v4i32(<4 x i32>* %ptr, <4 x i1> %mask, i32 zeroext %evl) { %ld = call <4 x i32> @llvm.vp.load.v4i32.p0v4i32(<4 x i32>* %ptr, <4 x i1> %mask, i32 %evl) ret <4 x i32> %ld } define void @vp_store_nxv2i32(<vscale x 2 x i32> %val, <vscale x 2 x i32>* %ptr, <vscale x 2 x i1> %mask, i32 zeroext %evl) { call void @llvm.vp.store.nxv2i32.p0nxv2i32(<vscale x 2 x i32> %val, <vscale x 2 x i32>* %ptr, <vscale x 2 x i1> %mask, i32 %evl) ret void } define void @vp_store_v4i32(<4 x i32> %val, <4 x i32>* %ptr, <4 x i1> %mask, i32 zeroext %evl) { call void @llvm.vp.store.v4i32.p0v4i32(<4 x i32> %val, <4 x i32>* %ptr, <4 x i1> %mask, i32 %evl) ret void } define void @vp_scatter_v4i32(<4 x i32> %val, <4 x i32*> %ptrs, <4 x i1> %mask, i32 zeroext %evl) { call void @llvm.vp.scatter.v4i32.p0v4i32(<4 x i32> %val, <4 x i32*> %ptrs, <4 x i1> %mask, i32 %evl) ret void } define void @vp_scatter_nxv2i32(<vscale x 2 x i32> %val, <vscale x 2 x i32*> %ptrs, <vscale x 2 x i1> %mask, i32 zeroext %evl) { call void @llvm.vp.scatter.nxv2i32.p0nxv2i32(<vscale x 2 x i32> %val, <vscale x 2 x i32*> %ptrs, <vscale x 2 x i1> %mask, i32 %evl) ret void }
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp | ||
---|---|---|
7898 | This should be getVectorElementCount so it doesn't crash on scalable vectors | |
7936 | getVectorElementCount for scalable vectors. | |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
7375 ↗ | (On Diff #366034) | nit: empty line |
7431 ↗ | (On Diff #366034) | This is an out-of-bounds access because vp.store/vp.gather return void and so ValueVTs is empty. |
@simoll @frasercrmck All the mentioned tests are now passing, and I don't think there are any other outstanding issues. Are we good to accept and merge?
The visitVectorPredicationIntrinsic is pretty crammed now. Probably a good idea to extract the SDNode buildling for load/store/scatter/gather into their own functions as done for regular (masked) load/store/gather/scatter.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
7395–7416 ↗ | (On Diff #366709) | This is the same code as .. |
7443–7464 ↗ | (On Diff #366709) | .. this code. Can we de-duplicate and move this into its own function? |
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
2336 | nit: inconsistent capitalization, plus VLen is maybe better known at EVL. Also it doesn't have to be i32. It may be widened to a target-specific value, e.g., MVT EVLParamVT = TLI.getVPExplicitVectorLengthTy(); Not sure how best to describe that in the comments. | |
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp | ||
7596 | I think all of these VLen params may be better as EVL since that's how they're known in the rest of the code. Any thoughts, @simoll? |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
7486 ↗ | (On Diff #369128) | I think this should be Opcode == ISD::VP_SCATTER? |
I don't think the commit message title needs to mention the fact it's a "Followup to D99355" - that can be in the body.
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
2341 | Sorry for reviving, this (I was working on a patch for strided load store). Shouldn't this have been ISD::VP_LOAD instead? |
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
2341 | Oh, good spot! I don't think anyone's using the shared base class (everyone uses the concrete subclasses which have their own getOffset. Do you want to submit a patch or shall I? |
llvm/include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
2341 | I created https://reviews.llvm.org/D110930 Thanks for the prompt reply! |
clang-tidy: warning: invalid case style for parameter 'dl' [readability-identifier-naming]
not useful