This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Enable interleaved access vectorization
ClosedPublic

Authored by luke on Mar 2 2023, 4:35 AM.

Details

Summary

The loop vectorizer supports generating interleaved loads and stores via
shuffle patterns for fixed length vectors.
This enables it for RISC-V, since interleaved shuffle patterns can be
lowered to vlseg/vsseg in https://reviews.llvm.org/D145022

Diff Detail

Event Timeline

luke created this revision.Mar 2 2023, 4:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 4:35 AM
luke requested review of this revision.Mar 2 2023, 4:35 AM
luke added inline comments.Mar 2 2023, 4:39 AM
llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses-expensive.ll
4 ↗(On Diff #501831)

We don't need to check if the type is legal in getInterleavedMemoryOpCost, since the loop vectorizer already checks if the type needs to be split by calling TTI.getNumberOfParts

reames requested changes to this revision.Mar 6 2023, 8:28 AM
reames added inline comments.
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
375

This doesn't look right. We need to account for the cost of the actual memory op, plus the interweave cost (if any).

At a minimum, we need to have the full cost of the wide memory op as a baseline. I can't imagine hardware with an optimized segment-2 which beats the cost of a normal load/store op of the same width.

The only question left is whether we need to explicitly model the shuffle cost. Depending on the hardware, we may or may not have an optimized segment load/store.

I think it's probably safest to cost model this as if we're going to do a wide load followed by a shuffle. We can reduce that cost if we have a target which a) actually has faster segment-2, and b) cares about the cost difference.

llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll
6

Can you add a couple tests for SEW < 64 bits?

Also, you should probably add an actual CostModel test rather than relying on indirectly testing this through the vectorizer.

This revision now requires changes to proceed.Mar 6 2023, 8:28 AM
luke updated this revision to Diff 503775.Mar 9 2023, 8:11 AM

Model wide load and shuffle

luke added a reviewer: arcbbb.Mar 9 2023, 8:19 AM
luke updated this revision to Diff 503785.Mar 9 2023, 8:20 AM

Split out test case into precommit

luke added inline comments.Mar 9 2023, 8:21 AM
llvm/test/Transforms/LoopVectorize/RISCV/interleaved-cost.ll
2

Unfortunately there doesn't seem to be a way to print out the result of getInterleavedMemoryOpCost other than via -debug-only=loop-vectorize. i.e. It's not called in getInstructionCost

luke updated this revision to Diff 504582.Mar 13 2023, 3:57 AM

Update zvl32b test case

craig.topper added inline comments.Mar 13 2023, 4:43 PM
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
411

"Opcode must be a store"?

412

load -> store?

luke updated this revision to Diff 505485.Mar 15 2023, 7:41 AM

Fix outdated comment

luke marked 2 inline comments as done.Mar 15 2023, 7:41 AM
reames accepted this revision.Mar 15 2023, 7:58 AM

LGTM

This revision is now accepted and ready to land.Mar 15 2023, 7:58 AM
luke updated this revision to Diff 505586.Mar 15 2023, 12:04 PM

Fix deinterleaves being picked up as interleaves in cost model

luke marked 2 inline comments as done.Mar 15 2023, 12:04 PM

Fix deinterleaves being picked up as interleaves in cost model

I don't understand this comment, and the code change appears to be unrelated. My LGTM does not hold with the change until this is separated and/or explained.

This revision was automatically updated to reflect the committed changes.
luke reopened this revision.Mar 15 2023, 4:11 PM
This revision is now accepted and ready to land.Mar 15 2023, 4:11 PM
luke updated this revision to Diff 505650.Mar 15 2023, 4:11 PM

Split out getShuffleCost change

luke added inline comments.Mar 15 2023, 4:17 PM
llvm/test/Transforms/LoopVectorize/RISCV/interleaved-cost.ll
12

@reames Apologies for jumping the gun there, this is the line that fails without that change to getShuffleCost.

I've split it out into D146176, but the gist is that this deinterleaved load ends up getting costed as an interleave because it has a mask of <0, 2>

This test case covers it but I wasn't able to find a way to test it in the aforementioned patch due to how getInstructionCost doesn't cost shuffles that change length. (Happy to take a look into that though if needed)

reames accepted this revision.Mar 16 2023, 7:34 AM

LGTM (again)

p.s. In this case, leaving the code in this patch would have been fine. It was the lack of explanation/example in your rebase which was problematic.