This is an archive of the discontinued LLVM Phabricator instance.

[TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions
ClosedPublic

Authored by pawosm01 on Mar 24 2023, 12:19 PM.

Details

Summary

This commit extends D134719 "[AArch64] Enable libm vectorized
functions via SLEEF" with the mappings for the scalable functions.

It also introduces all the necessary changes needed to support masked
interfaces.

Diff Detail

Event Timeline

pawosm01 created this revision.Mar 24 2023, 12:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2023, 12:20 PM
pawosm01 requested review of this revision.Mar 24 2023, 12:20 PM
pawosm01 updated this revision to Diff 508498.Mar 27 2023, 12:19 AM

formatting update only

pawosm01 updated this revision to Diff 508527.Mar 27 2023, 1:44 AM

one more formatting change

Hi @pawosm01, the title of this patch peaked my interest so I had a look and left some comments. Thanks for working on this.

nit: you'll probably want to remove the [PATCH] from the title :)

llvm/lib/Transforms/Utils/InjectTLIMappings.cpp
43

nit: Is there a reason you've named this GlobalPredicate instead of just Predicate ?

113–119

nit: Is it worth doing:

for (bool Predicated : { false, true }) {
  for (ElementCount VF = ElementCount::getFixed(2), ...)
    AddVariantDecl(VF, Predicated)
  for (ElementCount VF = ElementCount::getScalable(2), ...)
    AddVariantDecl(VF, Predicated)
}
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
645 ↗(On Diff #508498)

I feel a bit uncomfortable about this code, because there may be two variants available, a masked and an unmasked variant, where one may be vectorizable and the other one is not. I wonder if isFunctionVectorizable should return true when false is passed there is a predicated variant available. That should be safe, because it is always valid to use a predicated vector function when the vector loop itself doesn't require predication (although the opposite doesn't hold). It would require changing the interface a bit to also return whether the vectorizable function is predicated.

llvm/test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
16

Given that these functions are described as masked:

TLI_DEFINE_VECFUNC("acos", "_ZGVsMxv_acos",  SCALABLE(2), MASKED)
TLI_DEFINE_VECFUNC("acosf", "_ZGVsMxv_acosf", SCALABLE(4), MASKED)

Should these calls not pass a predicate mask operand?

I'd add note to the documentation too when scalable vectors are used the sleef library is expect to be compiled with it.

llvm/include/llvm/Analysis/VecFuncs.def
21–23

I won't introduce a new argument here, since MASKED is just indicate this interface is scalable so far.
Can we make the declarations less verbose?
e.g.:
#define FIXED(NL) ElementCount::getFixed(NL), false
#define SCALABLE(NL) ElementCount::getScalable(NL), true

llvm/lib/Analysis/VectorUtils.cpp
1532

for me the "Global" naming is a bit misleading or am I missing something?
Masked maybe?

pawosm01 retitled this revision from [PATCH] [TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions to [TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions.Mar 27 2023, 4:00 AM
pawosm01 updated this revision to Diff 508768.Mar 27 2023, 1:01 PM

post-review changes

pawosm01 marked 5 inline comments as done.Mar 27 2023, 1:02 PM
pawosm01 marked an inline comment as done.Mar 27 2023, 1:04 PM
pawosm01 added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
645 ↗(On Diff #508498)

Considering this is the only place where this overload is used, there is no point in changing or extending the interface. I've reverted to old parameter list and handle your idea inside of the overload of isFunctionVectorizable() called from here.

pawosm01 updated this revision to Diff 508793.Mar 27 2023, 2:14 PM
pawosm01 marked an inline comment as done.

formatting changes again

sdesmalen added inline comments.Mar 28 2023, 5:36 AM
llvm/include/llvm/Analysis/VecFuncs.def
21–23

Sorry I missed this comment yesterday, but I'm not sure I agree with the suggestion to let SCALABLE imply MASKED. The two concepts are orthogonal, they just happen to have the same value for the set of vector function mappings added in this patch. It's perfectly possible to have an unmasked scalable interface and we may want to add those in the future.

The cost of adding the mask separately seemed quite low, is there another reason you prefer not to add this now?

llvm/lib/CodeGen/ReplaceWithVeclib.cpp
159 ↗(On Diff #508793)

Can you make false a default option for this argument, so that you don't explicitly have to pass it?

pawosm01 updated this revision to Diff 509032.Mar 28 2023, 8:38 AM

Restored to the previous idea. And added default param value as suggested.

pawosm01 added inline comments.Mar 28 2023, 8:40 AM
llvm/include/llvm/Analysis/VecFuncs.def
21–23

I tend to agree with @sdesmalen so I'm reverting to the previous shape.

pawosm01 marked an inline comment as done.Mar 28 2023, 8:41 AM
pawosm01 added inline comments.
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
159 ↗(On Diff #508793)

Yes

pawosm01 updated this revision to Diff 509035.Mar 28 2023, 8:46 AM
pawosm01 marked an inline comment as done.

removed extra spaces that suddenly appeared in two lines.

sdesmalen added inline comments.Mar 28 2023, 9:42 AM
llvm/include/llvm/Analysis/VecFuncs.def
702

Does this need:

#undef MASKED
#undef NOMASK

?

llvm/lib/Analysis/TargetLibraryInfo.cpp
1184

When you #undef MASK/NOMASK in the .def file, this needs changing to something like:

/*MASK=*/false
pawosm01 updated this revision to Diff 509074.Mar 28 2023, 10:56 AM

post-review changes again

pawosm01 marked 3 inline comments as done.Mar 28 2023, 10:58 AM
pawosm01 added inline comments.
llvm/include/llvm/Analysis/VecFuncs.def
702

It does need it indeed.

sdesmalen accepted this revision.Mar 28 2023, 2:13 PM

LGTM, thanks for addressing all comments.

llvm/test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
175

When enabling tail-folding (enabled with -sve-tail-folding=all) the loop no longer vectorizes because there is an Invalid cost for the call to atan2. This is unexpected, because intrinsic is predicated and so could handle the tail-folded loop.

Nothing that needs fixing in this patch, I'm just pointing out that there is more work to follow-on from this patch to benefit from the masked vector-function variants.

This revision is now accepted and ready to land.Mar 28 2023, 2:13 PM
danielkiss accepted this revision.Mar 28 2023, 3:13 PM

LGTM, thanks!

llvm/include/llvm/Analysis/VecFuncs.def
21–23

I have no objection since the interface is the struct VecDesc, here the macro would just save some repetition.

pawosm01 marked 2 inline comments as done.Mar 29 2023, 3:29 AM