Page MenuHomePhabricator

[TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions

Authored by pawosm01 on Mar 24 2023, 12:19 PM.



This commit extends D134719 "[AArch64] Enable libm vectorized
functions via SLEEF" with the mappings for the scalable functions.

It also introduces all the necessary changes needed to support masked

Diff Detail

Event Timeline

pawosm01 created this revision.Mar 24 2023, 12:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2023, 12:20 PM
pawosm01 requested review of this revision.Mar 24 2023, 12:20 PM
pawosm01 updated this revision to Diff 508498.Mar 27 2023, 12:19 AM

formatting update only

pawosm01 updated this revision to Diff 508527.Mar 27 2023, 1:44 AM

one more formatting change

Hi @pawosm01, the title of this patch peaked my interest so I had a look and left some comments. Thanks for working on this.

nit: you'll probably want to remove the [PATCH] from the title :)


nit: Is there a reason you've named this GlobalPredicate instead of just Predicate ?


nit: Is it worth doing:

for (bool Predicated : { false, true }) {
  for (ElementCount VF = ElementCount::getFixed(2), ...)
    AddVariantDecl(VF, Predicated)
  for (ElementCount VF = ElementCount::getScalable(2), ...)
    AddVariantDecl(VF, Predicated)
645 ↗(On Diff #508498)

I feel a bit uncomfortable about this code, because there may be two variants available, a masked and an unmasked variant, where one may be vectorizable and the other one is not. I wonder if isFunctionVectorizable should return true when false is passed there is a predicated variant available. That should be safe, because it is always valid to use a predicated vector function when the vector loop itself doesn't require predication (although the opposite doesn't hold). It would require changing the interface a bit to also return whether the vectorizable function is predicated.


Given that these functions are described as masked:


Should these calls not pass a predicate mask operand?

I'd add note to the documentation too when scalable vectors are used the sleef library is expect to be compiled with it.


I won't introduce a new argument here, since MASKED is just indicate this interface is scalable so far.
Can we make the declarations less verbose?
#define FIXED(NL) ElementCount::getFixed(NL), false
#define SCALABLE(NL) ElementCount::getScalable(NL), true


for me the "Global" naming is a bit misleading or am I missing something?
Masked maybe?

pawosm01 retitled this revision from [PATCH] [TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions to [TLI][AArch64] Extend SLEEF vectorized functions mapping with VLA functions.Mar 27 2023, 4:00 AM
pawosm01 updated this revision to Diff 508768.Mar 27 2023, 1:01 PM

post-review changes

pawosm01 marked 5 inline comments as done.Mar 27 2023, 1:02 PM
pawosm01 marked an inline comment as done.Mar 27 2023, 1:04 PM
pawosm01 added inline comments.
645 ↗(On Diff #508498)

Considering this is the only place where this overload is used, there is no point in changing or extending the interface. I've reverted to old parameter list and handle your idea inside of the overload of isFunctionVectorizable() called from here.

pawosm01 updated this revision to Diff 508793.Mar 27 2023, 2:14 PM
pawosm01 marked an inline comment as done.

formatting changes again

sdesmalen added inline comments.Mar 28 2023, 5:36 AM

Sorry I missed this comment yesterday, but I'm not sure I agree with the suggestion to let SCALABLE imply MASKED. The two concepts are orthogonal, they just happen to have the same value for the set of vector function mappings added in this patch. It's perfectly possible to have an unmasked scalable interface and we may want to add those in the future.

The cost of adding the mask separately seemed quite low, is there another reason you prefer not to add this now?

159 ↗(On Diff #508793)

Can you make false a default option for this argument, so that you don't explicitly have to pass it?

pawosm01 updated this revision to Diff 509032.Mar 28 2023, 8:38 AM

Restored to the previous idea. And added default param value as suggested.

pawosm01 added inline comments.Mar 28 2023, 8:40 AM

I tend to agree with @sdesmalen so I'm reverting to the previous shape.

pawosm01 marked an inline comment as done.Mar 28 2023, 8:41 AM
pawosm01 added inline comments.
159 ↗(On Diff #508793)


pawosm01 updated this revision to Diff 509035.Mar 28 2023, 8:46 AM
pawosm01 marked an inline comment as done.

removed extra spaces that suddenly appeared in two lines.

sdesmalen added inline comments.Mar 28 2023, 9:42 AM

Does this need:

#undef MASKED
#undef NOMASK



When you #undef MASK/NOMASK in the .def file, this needs changing to something like:

pawosm01 updated this revision to Diff 509074.Mar 28 2023, 10:56 AM

post-review changes again

pawosm01 marked 3 inline comments as done.Mar 28 2023, 10:58 AM
pawosm01 added inline comments.

It does need it indeed.

sdesmalen accepted this revision.Mar 28 2023, 2:13 PM

LGTM, thanks for addressing all comments.


When enabling tail-folding (enabled with -sve-tail-folding=all) the loop no longer vectorizes because there is an Invalid cost for the call to atan2. This is unexpected, because intrinsic is predicated and so could handle the tail-folded loop.

Nothing that needs fixing in this patch, I'm just pointing out that there is more work to follow-on from this patch to benefit from the masked vector-function variants.

This revision is now accepted and ready to land.Mar 28 2023, 2:13 PM
danielkiss accepted this revision.Mar 28 2023, 3:13 PM

LGTM, thanks!


I have no objection since the interface is the struct VecDesc, here the macro would just save some repetition.

pawosm01 marked 2 inline comments as done.Mar 29 2023, 3:29 AM