This is an archive of the discontinued LLVM Phabricator instance.

Add support for vectorization of interleaved memory accesses for scalable VF
ClosedPublic

Authored by huntergr on Mar 2 2023, 7:24 AM.

Details

Summary

This patch is using the new intrinsics introduced in
https://reviews.llvm.org/D141924
to enable vecorization of interleaved accesses for scalable VF.
Targets need to implement a proper cost model for supported operations
to make sure that generated IR can be code generated.

Diff Detail

Event Timeline

mgabka created this revision.Mar 2 2023, 7:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 7:24 AM
mgabka requested review of this revision.Mar 2 2023, 7:24 AM
Matt added a subscriber: Matt.Mar 2 2023, 7:27 AM
nlopes added inline comments.Mar 2 2023, 8:09 AM
llvm/lib/IR/IRBuilder.cpp
596

Please use PoisonValue here and whenever possible as we are trying to get rid of UndefValue.
Thank you!

luke added a comment.Mar 2 2023, 9:27 AM

Thanks for adding this! I'm currently plugging in the hooks for RISC-V and will let you know what I run into.

llvm/lib/IR/IRBuilder.cpp
587

Maybe this should be an assertion

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2721

Need to check that Group->getFactor() == 2 here or that the call to CreateMaskedInterleavedLoad succeeds

2809

Need to check Group->getFactor() == 2 here too

luke added inline comments.Mar 2 2023, 9:58 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2751

It's somehow possible to reach here with a scalable vector type if TII->hasInterleavedLoad returns false.
Can we check somewhere inside the vectorizer cost model that if hasInterleavedLoad is false then we rule out any recipe with an interleave group for a scalable VF?

mgabka added inline comments.Mar 10 2023, 5:47 AM
llvm/include/llvm/Analysis/TargetTransformInfo.h
835–839

I think these functions could actually be joined into 1, something like:

supportsInterleaving(VectorType *VecTy, uint32_t Factor, bool IsMasked)

I think it is going to be unlikely that target supports store but not load for the same vector type, @paulwalker-arm what do you think?

llvm/lib/IR/IRBuilder.cpp
587

Hi Luke,
so I could remove the Factor argument entirely and make this function specific for just Factor=2 (and use assertions) what makes sense for now as the experimental interleaving intrinsics are only for factor 2.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2721

So my idea was that it would be up to the hasInterleavedLoad function to make sure that it returns true only when Factor is 2, so no extra checks is needed I think.

2751

So it is actually connected by the LV cost model, the LoopVectorizationCostModel::getInterleaveGroupCost is calling TTI.getInterleavedMemoryOpCost which should return invalid cost for factors different than 2.

mgabka updated this revision to Diff 504122.Mar 10 2023, 6:07 AM

Replaced use of Undef with Poison value

mgabka marked an inline comment as done.Mar 10 2023, 6:07 AM
reames added inline comments.Mar 10 2023, 7:35 AM
llvm/include/llvm/IR/IRBuilder.h
771

This is the wrong interface. The IRBuilder interface should provide a way to create the interleave and deinterleave instrinsic calls. That interface should generate shuffles for fixed vectors. Then the calling logic in the vectorizer should worry about emitting the load/store. (That's the existing structure in fact.)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2987

The changes to this function are NFC for fixed length vectors, and a generally useful scalable cleanup. Please separate and land this change without the need for further review.

This applies *only* to the changes in this function so as to shrink the diff for future review.

llvm/test/Transforms/LoopVectorize/sve-interleaved-accesses.ll
1

This should be in the AArch64 sub-tree, and probably precommited. Depending on your confidence in the AArch64 code, you may want to separate that into it's own review.

huntergr commandeered this revision.Apr 5 2023, 5:44 AM
huntergr added a reviewer: mgabka.
huntergr added a subscriber: huntergr.

Taking this one over from @mgabka

huntergr updated this revision to Diff 511067.Apr 5 2023, 5:48 AM

Separated out the bitcast fix and committed. Precommitted tests.

Changed IRBuilder interface to focus on the intrinsics (and fixed-length shuffle equivalents) instead of mixing in loads/stores.

There's a few unit tests which will fail with the new interface -- although we generate the same IR instructions, the order is different. Assuming the interface is suitable I'll update the tests before posting the next patch revision.

huntergr marked 2 inline comments as done.Apr 5 2023, 5:55 AM
huntergr added inline comments.
llvm/tools/llvm-profdata/CMakeLists.txt
7 ↗(On Diff #511067)

This is due to the fixed-length mask generation code being in VectorUtils.

This isn't the only tool affected, though oddly enough I've only observed build failures on X86 and not AArch64 hosts. I've included it as a representative.

I would prefer not to make changes to a bunch of cmake files for this, so I'm currently leaning towards either duplicating the mask generation code or moving it into Core. Any preferences from a reviewer?

mgabka added inline comments.Apr 6 2023, 3:47 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
125–127

Hi @huntergr ,
Thanks for your changes to this patch!

I have one question, the interface you proposed looks clean and nice, however it forces code generation for the deinterleaving/interleaving intrinsics to be implemented before merging this patch, am I correct?

The reason why I had this option here is that it would allow us to merge this patch before other pieces are implemented.

huntergr added inline comments.Apr 6 2023, 3:57 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
125–127

Hi @mgabka ,

We do have code generation for these intrinsics already, they just get lowered to zips/uzips. See D141924.

D146218 will match to ld2/st2 where possible (which is what we want), and should perhaps land first. The changes to isLegalInterleavedAccessType will also be needed there, so the next version of this patch can just rely on that.

reames added inline comments.Apr 6 2023, 6:01 PM
llvm/lib/IR/IRBuilder.cpp
1343

cast<>

1384

Given this assert, we shouldn't need to pass Factor in here at all.

1387

cast<>

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2730

Having this be only in the normal load path seems unlikely to be correct. Surely we must also handle masked loads as well?

2763–2764

It looks like you're changing the handling for gaps in the deinterleave. This seems surprising and worth some discussion?

huntergr updated this revision to Diff 513595.Apr 14 2023, 7:53 AM
  • Implemented requested changes to utility functions (casts, removing redundant parameter)
  • Moved utility functions to VectorUtils instead of IRBuilder; this removes the problem that introduced a dependency on the Analysis component for several tools which don't need the funtionality.
  • Updated affected tests.
  • Full check output included for the strict fadd tests.
huntergr marked 5 inline comments as done.Apr 14 2023, 7:59 AM
huntergr added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2730

This does handle masked loads -- 'NewLoad = Builder.CreateAligned....' is a standalone statement on the else with no opening brace. I've added a blank line to perhaps make that a little more obvious.

Unless there's something else I've missed?

2763–2764

That was the result of a bit of overzealous cleanup on my part when removing some code from the original patch; I missed the 'continue'. Reverted.

reames added inline comments.Apr 14 2023, 8:36 AM
llvm/include/llvm/Analysis/VectorUtils.h
596 ↗(On Diff #513595)

Unless you have plans to reuse these, this is just an implementation detail of the vectorizer. As such, these would be better as static functions in LoopVectorize.cpp

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2730

Yeah, I got confused by the brace style in the code above.

2746–2751

The interface here feels really awkward for fixed length vectors. We have to create this dummy struct type, construct it, destruct it, and we loose the ability to slice out the inactive lanes.

I almost wonder if this code would be clearer without the helper function at all. With an explicit version based on scalable type here, we could do a simplified version of this loop with an early return and leave the fixed length codegen unchanged.

I'd be tempted to try that and see if the overall code quality looked reasonable.

You could also try a lambda which enumerate the active lanes (i.e. doing the shuffle or extract as required), and move the handling of the bitcast and reverse to a callback. This might be too much complexity though.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
1 ↗(On Diff #513595)

Please submit a separate change to autogen this file, and then rebase. Same with the other file you switched to autogen.

huntergr updated this revision to Diff 514596.Apr 18 2023, 3:59 AM
huntergr marked 2 inline comments as done.
  • Moved interleaveVectors to a static function in LoopVectorize
  • Removed deinterleaveVector, inlined intrinsic creation. This means the shuffles for fixed-length loads won't be changed, though we do end up with a little duplication as a result.
  • Precommitted autogen checks for scalable-strict-fadd.ll
huntergr marked 2 inline comments as done.Apr 18 2023, 4:01 AM
igor.kirillov added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14373

Looks like we can use MinElts and avoid duplicating EC.getKnownMinValue()

huntergr updated this revision to Diff 523695.May 19 2023, 1:50 AM

Rebased, removed redundant call as requested.

huntergr marked an inline comment as done.May 19 2023, 1:50 AM

I can't really comment on the AArch64 parts of this, but the LoopVectorizer bits look entirely reasonable to me at this point.

fhahn added a subscriber: fhahn.May 28 2023, 11:29 AM
fhahn added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14384

Nit: can just return the condition

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
441

Should we have this assert already when constructing the interleave recipe?

huntergr updated this revision to Diff 527321.Jun 1 2023, 1:41 AM

Simplified size checking code, changed assert on interleave factor to occur before creating recipes.

The recipe constructor doesn't actually know the VF, so just confirming that the decision for a given scalable VF is to interleave only for factors of 2 should suffice. The code that performs interleaving is still effectively guarded by asserts in call construction that it has the correct number of arguments.

huntergr marked 2 inline comments as done.Jun 1 2023, 1:41 AM
fhahn accepted this revision.Jun 6 2023, 1:18 PM

LGTM, thanks!

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
449 ↗(On Diff #527321)

Could you make sure there's a test case for RISCV that covers this case before landing?

This revision is now accepted and ready to land.Jun 6 2023, 1:18 PM
mgabka added inline comments.Jun 7 2023, 1:25 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
444

can you pass here Vals directly?

huntergr added inline comments.Jun 8 2023, 3:11 AM
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
449 ↗(On Diff #527321)

I added that check because both interleaved-accesses.ll and strided-accesses.ll (in llvm/test/Transforms/LoopVectorize/RISCV/) crash on the cast below when vplan tries to get the cost of the interleaving group with a scalable VF. Is that sufficient?

This revision was landed with ongoing or failed builds.Jun 9 2023, 3:43 AM
This revision was automatically updated to reflect the committed changes.