Enable interleaved access vectorization.
Repository: rG LLVM Github Monorepo
Please upload patches with full context using -U999999, as documented here: https://releases.llvm.org/11.0.0/docs/Phabricator.html#requesting-a-review-via-the-web-interface
Do you plan to map these to segment load/store instructions in the future?
| Line | llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll |
|---|---|
| 4 | Is -enable-interleaved-mem-accesses=true needed if TTI enableInterleavedAccessVectorization() returns true? |
Yeah, segment load/store instructions are naturally suitable for mapping these. Do we need to create a new RISCVISD?
| Line | llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll |
|---|---|
| 4 | Right, -enable-interleaved-mem-accesses=true is no longer needed. |
If we aren't using segment load/store, what does the backend codegen for this look like?
I believe we need to run the InterleavedAccessPass and implement TargetLowering::LowerInterleavedLoad/Store to create IR intrinsics. That's how it is done on ARM for their vldX and vstX instructions.
It looks like this:
%wide.vec = load <8 x i32>, <8 x i32>* %1, align 4
%strided.vec = shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%strided.vec1 = shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
...
%interleaved.vec = shufflevector <4 x i32> %3, <4 x i32> %4, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %7, align 4
InnerLoopVectorizer::vectorizeInterleaveGroup() will generate shufflevector instructions for interleaved accesses.
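To make the effect of those shuffle masks concrete, here is a minimal C++ model of what the masks in the IR above select. It is only an illustration: the `shuffle` helper is a hypothetical stand-in for shufflevector semantics, not an LLVM API, and the computation elided by `...` in the IR is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper modeling shufflevector on i32 lanes: lane i of the
// result is element mask[i] of the concatenation of a and b.
// Passing an empty b stands in for the poison second operand.
std::vector<int> shuffle(const std::vector<int>& a,
                         const std::vector<int>& b,
                         const std::vector<std::size_t>& mask) {
  std::vector<int> out;
  out.reserve(mask.size());
  for (std::size_t idx : mask) {
    assert(idx < a.size() + b.size() && "mask index out of range");
    out.push_back(idx < a.size() ? a[idx] : b[idx - a.size()]);
  }
  return out;
}

// De-interleave a factor-2 group of 8 lanes and re-interleave it, using the
// same masks as %strided.vec, %strided.vec1 and %interleaved.vec.
std::vector<int> roundTrip(const std::vector<int>& wide) {
  assert(wide.size() == 8);
  std::vector<int> even = shuffle(wide, {}, {0, 2, 4, 6});  // member 0
  std::vector<int> odd = shuffle(wide, {}, {1, 3, 5, 7});   // member 1
  return shuffle(even, odd, {0, 4, 1, 5, 2, 6, 3, 7});
}
```

With no work done between the two steps, `roundTrip` returns its input unchanged, which shows that the strided masks and the interleave mask are inverses of each other.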
Yeah, I think that is the right direction. Thanks. It seems I should submit a patch implementing TargetLowering::LowerInterleavedLoad/Store before this one.
I was asking what the RISCV assembly looks like. We don't have a two-input shuffle instruction, so this has to be broken down into something like two vrgathers and a vmerge, but I'm not sure.
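As a rough sketch of that guess, the two-input interleave shuffle can be emulated in C++ with two gathers and a merge. This models lane semantics only and is not actual backend output: the `vrgather` and `vmerge` functions are simplified stand-ins named after the instructions, the source vector's length stands in for VLMAX, and `interleave2` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<std::int32_t>;

// Simplified model of a vrgather: out[i] = src[idx[i]], or 0 when the index
// is out of range (src.size() stands in for VLMAX here).
Vec vrgather(const Vec& src, const std::vector<std::size_t>& idx) {
  Vec out(idx.size());
  for (std::size_t i = 0; i < idx.size(); ++i)
    out[i] = idx[i] < src.size() ? src[idx[i]] : 0;
  return out;
}

// Simplified model of a vmerge: out[i] = mask[i] ? onTrue[i] : onFalse[i].
Vec vmerge(const Vec& onFalse, const Vec& onTrue,
           const std::vector<bool>& mask) {
  assert(onFalse.size() == mask.size() && onTrue.size() == mask.size());
  Vec out(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i)
    out[i] = mask[i] ? onTrue[i] : onFalse[i];
  return out;
}

// Interleave a and b as <a0, b0, a1, b1, ...> using two gathers and a merge.
Vec interleave2(const Vec& a, const Vec& b) {
  assert(a.size() == b.size());
  const std::size_t n = 2 * a.size();
  std::vector<std::size_t> idx(n);
  std::vector<bool> oddLane(n);
  for (std::size_t i = 0; i < n; ++i) {
    idx[i] = i / 2;              // lanes 2k and 2k+1 both read element k
    oddLane[i] = (i % 2) == 1;   // odd result lanes take their value from b
  }
  Vec fromA = vrgather(a, idx);
  Vec fromB = vrgather(b, idx);
  return vmerge(fromA, fromB, oddLane);
}
```

Calling `interleave2` on two 4-lane vectors yields the same lane order as the `<0, 4, 1, 5, 2, 6, 3, 7>` mask in the IR. Whether the backend would actually pick this sequence is exactly the open question here.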
clang-format: please reformat the code