This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Enable interleaved access vectorization
Needs Review · Public

Authored by luke957 on Jun 29 2021, 9:31 AM.

Details

Summary

Enable interleaved access vectorization.

Event Timeline

luke957 created this revision. Jun 29 2021, 9:31 AM
luke957 requested review of this revision. Jun 29 2021, 9:31 AM
Herald added a project: Restricted Project. Jun 29 2021, 9:31 AM

Please upload patches with full context using -U999999, as documented here: https://releases.llvm.org/11.0.0/docs/Phabricator.html#requesting-a-review-via-the-web-interface

Do you plan to map these to segment load/store instructions in the future?

llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (line 5)

Is -enable-interleaved-mem-accesses=true still needed if the TTI hook enableInterleavedAccessVectorization() returns true?

Do you plan to map these to segment load/store instructions in the future?

Yeah, segment load/store instructions are naturally suitable for mapping these. Do we need to create a new RISCVISD?

luke957 added inline comments. Jul 24 2021, 1:52 AM
llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (line 5)

Right, -enable-interleaved-mem-accesses=true is no longer needed.

If we aren't using segment load/store, what does the backend codegen for this look like?

Do you plan to map these to segment load/store instructions in the future?

Yeah, segment load/store instructions are naturally suitable for mapping these. Do we need to create a new RISCVISD?

I believe we need to run the InterleavedAccessPass and implement TargetLowering::LowerInterleavedLoad/Store to create IR intrinsics. That's how it is done on ARM for their vldX and vstX instructions.

If we aren't using segment load/store, what does the backend codegen for this look like?

It looks like this:

%wide.vec = load <8 x i32>, <8 x i32>* %1, align 4
%strided.vec = shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%strided.vec1 = shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
...
%interleaved.vec = shufflevector <4 x i32> %3, <4 x i32> %4, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %7, align 4

InnerLoopVectorizer::vectorizeInterleaveGroup() will generate shufflevector instructions for interleaved accesses.
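
To make the shuffle masks in the IR above easier to follow, here is a minimal scalar C++ model of them. It is only a sketch: shuffle() and processPairs() are hypothetical helper names, and the add/subtract loop body is an assumption standing in for the part elided by "..." in the IR. The masks <0, 2, 4, 6> and <1, 3, 5, 7> split the wide load into even and odd lanes, and <0, 4, 1, 5, 2, 6, 3, 7> re-interleaves the two results before the wide store.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of shufflevector: lane i of the result is element mask[i] of
// the concatenation of a and b (0..N-1 pick from a, N..2N-1 pick from b).
template <std::size_t N, std::size_t M>
std::array<int, M> shuffle(const std::array<int, N> &a,
                           const std::array<int, N> &b,
                           const std::array<std::size_t, M> &mask) {
  std::array<int, M> out{};
  for (std::size_t i = 0; i < M; ++i) {
    assert(mask[i] < 2 * N && "mask index out of range");
    out[i] = mask[i] < N ? a[mask[i]] : b[mask[i] - N];
  }
  return out;
}

// Hypothetical loop body (an assumption, not taken from the patch): for each
// adjacent pair (x, y) in memory, write back (x + y, x - y).
std::array<int, 8> processPairs(const std::array<int, 8> &wide) {
  // Placeholder for the poison operand; the masks below never select from it.
  const std::array<int, 8> poison{};
  const std::array<std::size_t, 4> evenMask{0, 2, 4, 6};
  const std::array<std::size_t, 4> oddMask{1, 3, 5, 7};
  const std::array<std::size_t, 8> interleaveMask{0, 4, 1, 5, 2, 6, 3, 7};

  // De-interleave: the counterparts of %strided.vec and %strided.vec1.
  const std::array<int, 4> x = shuffle(wide, poison, evenMask);
  const std::array<int, 4> y = shuffle(wide, poison, oddMask);

  std::array<int, 4> sum{};
  std::array<int, 4> diff{};
  for (std::size_t i = 0; i < 4; ++i) {
    sum[i] = x[i] + y[i];
    diff[i] = x[i] - y[i];
  }

  // Re-interleave: the counterpart of %interleaved.vec.
  return shuffle(sum, diff, interleaveMask);
}
```

Calling processPairs on the eight values 1 through 8 treats them as the pairs (1, 2), (3, 4), (5, 6), (7, 8) and returns the sums and differences in interleaved order.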

Do you plan to map these to segment load/store instructions in the future?

Yeah, segment load/store instructions are naturally suitable for mapping these. Do we need to create a new RISCVISD?

I believe we need to run the InterleavedAccessPass and implement TargetLowering::LowerInterleavedLoad/Store to create IR intrinsics. That's how it is done on ARM for their vldX and vstX instructions.

Yeah, I think that is the right direction. Thanks. It seems I should submit a patch implementing TargetLowering::LowerInterleavedLoad/Store before this one.

If we aren't using segment load/store, what does the backend codegen for this look like?

It looks like this:

%wide.vec = load <8 x i32>, <8 x i32>* %1, align 4
%strided.vec = shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%strided.vec1 = shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
...
%interleaved.vec = shufflevector <4 x i32> %3, <4 x i32> %4, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %7, align 4

InnerLoopVectorizer::vectorizeInterleaveGroup() will generate shufflevector instructions for interleaved accesses.

I was asking what the RISCV assembly looks like. We don't have a two-input shuffle instruction, so this has to be broken down into something like 2 vrgathers and a vmerge, but I'm not sure.
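
As a rough illustration of that decomposition, the following scalar C++ sketch models the element-wise behaviour of vrgather.vv and vmerge.vvm and uses two gathers plus one merge to interleave two vectors. It is not actual compiler output. The fixed length VL = 8 and the helper name interleaveLow are assumptions made for this example, and the sources are modelled as full-width registers whose upper four lanes are ignored.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t VL = 8; // hypothetical vector length for this sketch

using Vec = std::array<int, VL>;
using Idx = std::array<std::size_t, VL>;
using Mask = std::array<bool, VL>;

// Model of vrgather.vv: out[i] = src[idx[i]], or 0 if idx[i] is out of range.
Vec vrgather(const Vec &src, const Idx &idx) {
  Vec out{};
  for (std::size_t i = 0; i < VL; ++i)
    out[i] = idx[i] < VL ? src[idx[i]] : 0;
  return out;
}

// Model of vmerge.vvm: out[i] = mask[i] ? onTrue[i] : onFalse[i].
Vec vmerge(const Vec &onFalse, const Vec &onTrue, const Mask &mask) {
  Vec out{};
  for (std::size_t i = 0; i < VL; ++i)
    out[i] = mask[i] ? onTrue[i] : onFalse[i];
  return out;
}

// Interleave the low four lanes of a and b into a0, b0, a1, b1, ...
// Each source is spread across the register with the same index vector,
// then a mask on the odd lanes picks b's elements over a's.
Vec interleaveLow(const Vec &a, const Vec &b) {
  const Idx spread{0, 0, 1, 1, 2, 2, 3, 3};
  const Mask oddLanes{false, true, false, true, false, true, false, true};
  return vmerge(vrgather(a, spread), vrgather(b, spread), oddLanes);
}
```

In this model the same index vector serves both gathers, so only one index vector and one mask need to be materialized. Whether the backend would actually pick this sequence is exactly the open question in the comment above.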