This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Enable AArch64 SVE FCMLA/FCADD instruction generation in ComplexDeinterleaving
ClosedPublic

Authored by igor.kirillov on Apr 3 2023, 9:41 AM.

Details

Summary

This commit adds support for scalable vector types in theComplexDeinterleaving
pass, allowing it to recognize and handle llvm.vector.interleave2 and
llvm.vector.deinterleave2 intrinsics for both fixed and scalable vectors

Diff Detail

Event Timeline

igor.kirillov created this revision.Apr 3 2023, 9:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 3 2023, 9:41 AM
igor.kirillov requested review of this revision.Apr 3 2023, 9:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 3 2023, 9:41 AM
Matt added a subscriber: Matt.Apr 3 2023, 1:56 PM
igor.kirillov added inline comments.Apr 4 2023, 8:13 AM
llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll
102

I am not sure if I should duplicate all fixed-width vector tests to reflect the fact that llvm.experimental.vector.deinterleave2 is now supported there.

NickGuy added inline comments.Apr 4 2023, 8:51 AM
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
259–260

Might be worth adding a comment here saying what a "Deinterleave" is; It's not immediately clear that a deinterleave is either a specific intrinsic, or a shufflevector instruction.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24641–24644

When working with scalable vectors, they don't have the same restriction of bit width. Treating them with a max width of 128 bits seems wasteful and inefficient, is there any way to get the vector width at compile time (is there a target->getMaxVectorWidth() or something)?

llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll
102

It can't hurt, more coverage is always good. On the other hand, this is probably enough to test that the intrinsics work with fixed width cases.

103–108

Is the top comment correct? Is this case actually expected to not transform?
Given the fcadd in the output, I'd assume that this is expected to transform.

dmgreen added a subscriber: dmgreen.Apr 5 2023, 7:41 AM
dmgreen added inline comments.
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
352

Does this need to run twice with and without IsScalable? It doubles the scanning of instructions, and seems unnecessary if it only modifies the shuffles/intrinsic matches, which are either both valid or mutually exclusive.

mgabka added inline comments.Apr 6 2023, 4:35 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24641–24644

For the scalable vectors I don't think we want to use a min or max vector width, we should rather operate on the ElementCount and size of the ElementTypes I think.
IIUC for the scalable vectors the condition we want to check is if we are operating on the packed vector types (in that case all are supported) or on the set of unpacked vectors we are supporting, am I correct?

in that case maybe it is worth to have dedicated section for fixed width and scalable width vectors in this function?

mgabka added inline comments.Apr 6 2023, 4:57 AM
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
23

nit: typo

25

nit: probably does not need to start with capital letter, maybe using "shufflevector instruction" would be more clear

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24628

I think it could be worth to add extra run lines to the existing aarch64 tests which operate on fixed width vectors, to make sure that adding +sve does not stop generation of fcmla there. What do you think?

igor.kirillov marked 4 inline comments as done.

Address the most recent comments

igor.kirillov marked an inline comment as not done.Apr 11 2023, 9:17 AM
igor.kirillov added inline comments.
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
352

I thought about it, but theoretically target could have only one (scalable or fixed vector) support.
Alternatively I can pass both flags ComplexDeinterleavingGraph and check them each time root node is found, but I am not sure if it is a better approach.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24628

Added to complex-deinterleaving-f16-add.ll where we have both shufflevectors and deinterleave2 intrinsics applied to fixed-width vectors

24641–24644

@NickGuy Actually, this functions returns false if VTyWidth is less than 128 bit, so any 128+ bit sized vectors are supported.

@mgabka We support any unpacked type with size 2X if 2X >= 128, there is code in AArch64TargetLowering::createComplexDeinterleavingIR that splits those vectors until they have minimal size of 128 and then merges them back. We don't support min-64bit sized vectors (unlike Neon) and that's the condition I added to the if statement.

llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll
103–108

Yes, indeed. Also fixed in the other places.

NickGuy added inline comments.Apr 11 2023, 9:30 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24641–24644

Not sure why I added the comment here, it was supposed to be on the if (TyWidth > 128) { below, oops...

How resource-efficient is this splitting with scalable vectors though, my concern is that we'd split the operation across numerous 256+ vectors while only using the lower 128 bits of each.

igor.kirillov added inline comments.Apr 11 2023, 10:05 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24641–24644

Not sure if I understood your concern correctly. We are splitting any 256+ min-sized vector instructions to those that have minimum 128 bits. How many bits are going to be there depends on actual CPU, but generated code would work just fine even without knowing that information.

For example, for <vscale x 8 x double> we'll get 4 instructions working on vectors of size <vscale x 2 x double>

dmgreen added inline comments.Apr 12 2023, 9:59 AM
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
352

I'm not sure why that matters. Can you explain more? I would expect that we just match starting from whatever we find (shuffle or intrinsic) and isComplexDeinterleavingOperationSupported handles whether the actual type is supported (be it scalable or fixed length). The shuffle won't ever match a scalable vector, but that shouldn't be a problem as far as I understand.

Make function evaluateBasicBlock to run once by moving scalable/fixed width support check to isComplexDeinterleavingOperationSupported.
Some minor clean ups.

igor.kirillov added inline comments.Apr 13 2023, 6:20 AM
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
352

Nevermind, I think you are right. Now evaluateBasicBlock runs only once an the check is isComplexDeinterleavingOperationSupported

dmgreen accepted this revision.Apr 14 2023, 9:10 AM

Thanks. I have some minor suggestions but otherwise LGTM.

llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
263

instruciton -> instruction

341

I don't think this needs to call with a Scalable flag. The AArch64TargetLowering::isComplexDeinterleavingSupported routine can just return true if it has sve or complexnums.

923

This could be turned into

if (!RealShuffle || !ImagShuffle)
  return nullptr;

LLVM often likes returning early to reduce indentation.

This revision is now accepted and ready to land.Apr 14 2023, 9:10 AM
mgabka added inline comments.Apr 17 2023, 1:44 AM
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
341

The isComplexDeinterleavingSupported with a flag is used inside isComplexDeinterleavingOperationSupported to check if for given vector type (scalable or fixed width) we have required architecture extensions avaialble.
For scalable vectors we need just sve, while for fixed width we need to have the ComplxNum feature.

However I realized that there is probably a bug here, as the SVE feature enables only scalable FCMLA, while scalable CMLA are available only when SVE2 feature is enabled, so I think we should make sure that isComplexDeinterleavingOperationSupported takes that into account and add extra testing.

I guess what David suggest is to make "isComplexDeinterleavingSupported" function as generic as possible and leave all the detailed checks to isComplexDeinterleavingOperationSupported , what looks reasonable to me.

Remove UseScalable param from isComplexDeinterleavingSupported

igor.kirillov marked 6 inline comments as done.

Fix type, swap if condition

igor.kirillov added inline comments.Apr 17 2023, 4:39 AM
llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp
341

CMLA are not a problem - we are checking that scalar type is not Int in AArch64TargetLowering::isComplexDeinterleavingOperationSupported

igor.kirillov edited the summary of this revision. (Show Details)

Update commit message
Swap if-condition to reduce diff

This revision was landed with ongoing or failed builds.Apr 21 2023, 3:29 AM
This revision was automatically updated to reflect the committed changes.