Page MenuHomePhabricator

[AArch64][CodeGen] Lower (de)interleave2 intrinsics to ld2/st2
Needs ReviewPublic

Authored by huntergr on Mar 16 2023, 4:53 AM.

Details

Summary

The InterleavedAccess pass currently matches (de)interleaving
shufflevector instructions with loads or stores, and calls into
target lowering to generate ldN or stN instructions.

Since we can't use shufflevector for scalable vectors (besides a
splat with zeroinitializer), we have interleave2 and deinterleave2
intrinsics. This patch extends InterleavedAccess to recognize those
intrinsics and if possible replace them with ld2/st2 via target lowering.

Unlike the fixed-length version, we currently cannot 'legalize' the
operation in IR because we don't have a way of concatenating or
splitting vectors at a scalable point, so for now we just bail
out if the types won't match the actual hardware instructions.

Diff Detail

Event Timeline

huntergr created this revision.Mar 16 2023, 4:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2023, 4:53 AM
huntergr requested review of this revision.Mar 16 2023, 4:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2023, 4:53 AM
llvm/lib/CodeGen/InterleavedAccessPass.cpp
459

Probably best to also check for isSimple() to match the behaviour of lowerInterleavedLoad.

481

Probably best to also check for isSimple() to match the behaviour of lowerInterleavedStore.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14795–14797

I'm happy for incremental bring up but I would like to see fixed length vectors supported sooner rather than later so that we have the option to use the new shuffle intrinsics without losing this feature. I'm specifically interested in the potential to simplify SVE VLS so that it mirrors SVE VLA

14807

This will have the effect of "moving" the load to where the deinterleaving is happening? which has the potential to break the original IR's load/store order.

14835–14837

This is not sufficient because it'll allow <vscale x 128 x i1>, <vscale x 64 x i2> etc. Perhaps it's better to explicitly check for the types we do support?

14840

As above, this might create the new "store" in the wrong place.

llvm/test/Transforms/InterleavedAccess/AArch64/sve-deinterleave-intrinsics.ll
162

Please can you add tests for ptr vectors as well.

Matt added a subscriber: Matt.Mar 25 2023, 1:25 PM
huntergr updated this revision to Diff 515641.Apr 21 2023, 1:41 AM
  • Stricter checking before trying to replace load/store+intrinsic pairs
  • Supports fixed-length vectors
  • Will now 'legalize' (de)interleave operations on larger (power-of-two) vectors into multiple ld2/st2 operations with appropriate insert/extract subvector operations.
  • More tests.
huntergr marked 7 inline comments as done.Apr 21 2023, 1:43 AM
mgabka added a subscriber: mgabka.Apr 24 2023, 1:50 AM
paulwalker-arm added inline comments.Thu, Jun 1, 9:34 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14428

Function names should start with a lower case letter?

14443

As above.

14794–14795

Perhaps this is better done in isLegalInterleavedAccessType? as it cannot really be considered legal if the necessary target features are not available. You likely also want to use hasSVEorSME because SME supports these instructions as well.

14810–14812

I suspect this patch will ICE the compiler when faced with SVE fixed length vectors and there's no tests to prove otherwise. The nuance here is the result vectors are fixed length but the target intrinsics will generate scalable vectors.

If you look at lowerInterleavedLoad you'll see the extra complexity relating to working with a container type.

With that said, I'm happy for this patch to not support SVE fixed length vectors, it just cannot trigger an ICE. This might be as simple as adding a bail out for UseScalable != isa<ScalableVector>(VTy) and adding a few tests.

14828

I think just ConstantInt::getTrue(LdTy->getContext()) should work here?

14877

Same comment as with lowerDeinterleaveIntrinsicToLoad.

14890–14892

I don't really see what you're gaining by passing in Address rather than just passing in SI? I guess similar is true for lowerDeinterleaveIntrinsicToLoad.

14916

ConstantInt::getTrue(StTy->getContext())?

llvm/test/Transforms/InterleavedAccess/AArch64/sve-deinterleave-intrinsics.ll
18–19

How about returning %deinterleave instead of the floating extractvalue's, or do they exist to test something specific?

190–192

I think it's better for NEON to have its own test file. Likewise for SVE fixed length support, which I suspect doesn't work.