This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Scalar IBM MASS library conversion pass
ClosedPublic

Authored by masoud.ataei on May 3 2021, 7:46 AM.

Details

Reviewers
etiotto
pjeeva01
renenkel
bmahjour
qiucf
shchenz
spatel
efriedma
Group Reviewers
Restricted Project
Summary

This patch introduces an option to enable conversions from math function calls
to MASS library calls. To resolves calls generated with these conversions, one
need to link libxlopt.a library.

This patch is tested on PowerPC Linux and AIX.

Diff Detail

Unit TestsFailed

Event Timeline

masoud.ataei created this revision.May 3 2021, 7:46 AM
masoud.ataei requested review of this revision.May 3 2021, 7:46 AM
steven.zhang added a reviewer: Restricted Project.May 17 2021, 10:24 PM
bmahjour added inline comments.May 18 2021, 4:28 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1367

why are these being handled here instead of PPCGenScalarMASSEntries.cpp?

bmahjour added inline comments.May 18 2021, 4:28 PM
llvm/include/llvm/Analysis/ScalarFuncs.def
18

shouldn't these map from llvm.* intrinsics to mass entry points as well?

masoud.ataei added inline comments.May 19 2021, 1:07 PM
llvm/include/llvm/Analysis/ScalarFuncs.def
18

llvm intrinsics is handled in PPCISelLowering.cpp.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1367

We are not handling llvm intrinsics in PPCGenScalarMASSEntries.cpp because we don't want to block any type of existing optimizations (like pow(x,0.5) --> sqrt(x)) and future optimizations (like https://reviews.llvm.org/D94543 ?).

bmahjour added inline comments.May 25 2021, 12:42 PM
llvm/lib/Target/PowerPC/PPCGenScalarMASSEntries.cpp
71

There should be a todo comment to handle non-finite entries using fewer fast-math flags.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1367

I see, could you please put a comment in the code to explain that? Alternatively you can put the comment at the top of llvm/include/llvm/Analysis/ScalarFuncs.def.

1367

Instead of TM.Options.UnsafeFPMath we should test for the individual fast-math flags that are required for safety. Checking for "unsafe-fp-math" has a few drawbacks:

  1. To make clang enable that flag it is necessary but not enough to specify -funsafe-math-optimizations! You'd have to specify -fno-math-errno as well.
  2. Clang sets the "unsafe-fp-math" flag when all four of -fno-math-errno -fassociative-math -freciprocal-math -fno-signed-zeros are specified, regardless of other flags... For example this command does the conversion to the _finite calls despite the user request to honor NaNs. clang t.c -c -O3 -fno-math-errno -fassociative-math -freciprocal-math -fno-signed-zeros -fhonor-nans

Even if the clang inconsistencies/issues are resolved, it would still be better to check for the individual flags for finer control and for consistency with other front-ends.

llvm/test/CodeGen/PowerPC/lower-intrinsics-mass-aix.ll
2

why not just use the default CHECK prefix? CHECK-ALL and CHECK-LWR don't distinguish anything based on this run command.

20

CHECK-DFLT is not in the list of prefixes defined.

llvm/test/CodeGen/PowerPC/lower-intrinsics-nofast-mass.ll
148

Remove this line, #1 is unused.

Sorry it took me so long to update this patch -- I think I addressed all reviews till now.

masoud.ataei marked 8 inline comments as done.Jun 29 2021, 1:28 PM
bmahjour added inline comments.Jul 7 2021, 2:04 PM
llvm/include/llvm/Analysis/ScalarFuncs.def
11

[nit] ISelLowing -> PPCISelLowering

llvm/lib/Target/PowerPC/PPCGenScalarMASSEntries.cpp
9

Since LLVM math intrinsic lowerings are done in ISellLowering, this comment should not say "and LLVM math intrinsics".

13

llvm.cos.f32 is an intrinsic and not handled by this transformation.

72

remove this line

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1370

Why do you still check for TM.Options.UnsafeFPMath ? If you do it out of concerns for -fno-math-errno, then it's not needed. Note that these llvm intrinsics already mention that their semantics are identical to their libm counter parts but "without trapping or setting errno".

llvm/test/CodeGen/PowerPC/lower-intrinsics-fast-mass.ll
147

See above comment and remove "unsafe-fp-math".

llvm/test/CodeGen/PowerPC/pow-025-075-intrinsic-scalar-mass.ll
302

See above comment and remove "unsafe-fp-math".

Removed dependency to unsafe-fp-math and added clang option to
control afn flag.

Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2021, 2:07 PM
bmahjour added inline comments.Jul 15 2021, 10:37 AM
clang/include/clang/Driver/Options.td
1726 ↗(On Diff #358756)

I think we should separate out the clang driver interface into its own patch and ask for feedback on the mailing list. One key question would be, should this option assume no-errno and no-trapping-math or not (given that there is no IR representation for them).

There should also be LIT tests dedicated to this to verify the clang interface. I only see llc interface being tested in this patch.

llvm/include/llvm/Target/TargetOptions.h
179

We already have the PPCGenScalarMASSEntries bit, why do we need another one? Perhaps we can remove PPCGenScalarMASSEntries, but we should not have to turn on two options to get one transformation enabled.

llvm/lib/Target/PowerPC/PPCGenScalarMASSEntries.cpp
73

...but errno and trapping-math would be an issue for non-finite entries as well.

Again, I think this function should just check for nnan/ninf/afn flags. We need to find out (with the help of the wider community) how to deal with the concerns surrounding errno and traps separately. One way to do that would be to broaden the definition of the afn flag to include no-errno and no-trapping semantics. Another way might be to make clang FE set the afn bit only if -fno-math-errno and -fno-trapping-math options are enabled (less desirable). A third way might be to add corresponding function attributes to the IR for -fno-math-errno and -fno-trapping-math.

Once these issues are sorted out, we can add the appropriate constraints to the isCandidateSafeToLower function.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1370

if someone compiles with -Ofast without any extra options, would TM.Options.ApproxFuncFPMath be true here?

Removed clang changes from this PR.
Removed extra option for MASS pass.
Now MASS pass is active with -O3 and approx-func option.

Adding another PR for clang changes on approx-func option.

masoud.ataei marked 9 inline comments as done.Jul 16 2021, 11:10 AM
masoud.ataei added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1370

In clang changes, I had Options.ApproxFuncFPMath = LangOpts.ApproxFunc; in clang/lib/CodeGen/BackendUtil.cpp. That was responsible to update this TM option based on the clang approximate func option. And clang approximate func option will be set with -Ofast.

Then, the answer for your question is yes.

Do we *really* need -enable-approx-func-fp-math?
I'm pretty sure we are moving away from such global options, onto relying only on the per-instruction fast-math flags.

masoud.ataei marked an inline comment as done.Aug 26 2021, 8:22 AM

Do we *really* need -enable-approx-func-fp-math?
I'm pretty sure we are moving away from such global options, onto relying only on the per-instruction fast-math flags.

I am handling LLVM intrinsic math functions in PPCISelLowering.cpp, so I need to check for TM.Options.ApproxFuncFPMath. This is the only place that I think I need it.
Currently, I am updating TM.Options.ApproxFuncFPMath in llvm/lib/CodeGen/CommandFlags.cpp using the global option. Please let me know if there is a better way to update TM.Options.ApproxFuncFPMath based on the local fast-math flag.

Do we *really* need -enable-approx-func-fp-math?
I'm pretty sure we are moving away from such global options, onto relying only on the per-instruction fast-math flags.

I am handling LLVM intrinsic math functions in PPCISelLowering.cpp, so I need to check for TM.Options.ApproxFuncFPMath. This is the only place that I think I need it.

How is this going to work e.g. in LTO when not all TU's are compiled with fast-math flags?

I'm not familiar with those llc flags, but i'm quite sure that e.g. DAGCombiner
is transitioned away from using them, so i'm wary of adding new ones.

Currently, I am updating TM.Options.ApproxFuncFPMath in llvm/lib/CodeGen/CommandFlags.cpp using the global option. Please let me know if there is a better way to update TM.Options.ApproxFuncFPMath based on the local fast-math flag.

Removing dependency to the global option to convert math functions to MASS.

I'm not familiar with this library, and I haven't looked at current state of how we enable/map optional libs in a while...
We definitely want to avoid adding another target option/debug flag, and if we can avoid relying on a function parameter too, that would be even better.
Ie, the "afn" fast-math-flag (possibly in combination with some other IR- or node-level flags) seems like it should be enough to allow this transform/lowering.
Scanning the earlier review comments, there was some concern about the semantics wrt errno. If we need to adjust the "afn" definition, it's probably fine. There haven't been many uses of that flag AFAIK.

errno handling for math library functions is a mess. Currently, we don't model it properly; we just mark the calls "readnone" and hope for the best. If you don't want to fix that, just check for readnone for now.

I don't think we want to be querying function attributes or options here; afn plus enabling MASS should be enough. The function attributes are the old mechanism; we just haven't completely migrated some parts of SelectionDAG yet.

llvm/include/llvm/Analysis/ScalarFuncs.def
20

Do "__acosf_finite" etc. actually exist on AIX? I thought they only existed on glibc, and the glibc functions are all deprecated.

I think I'd prefer to track this information in TargetLibraryInfo, like we do for the vector functions, so we can more easily generalize this mechanism in the future.

errno handling for math library functions is a mess. Currently, we don't model it properly; we just mark the calls "readnone" and hope for the best. If you don't want to fix that, just check for readnone for now.

I think using readnone would work fine. It seems that clang marks math functions with that attribute when -fno-math-errno is in effect. To get the non-finite MASS lowerings at -O3 one would have to compile with both -fapprox-func and -fno-math-errno, which seems reasonable to me.

I don't think we want to be querying function attributes or options here; afn plus enabling MASS should be enough. The function attributes are the old mechanism; we just haven't completely migrated some parts of SelectionDAG yet.

I agree. I think the problem is that this patch is trying to decide on a global lowering strategy for llvm.* math intrinsics in llvm/lib/Target/PowerPC/PPCISelLowering.cpp but such global decision making does not go well with finer granularity of fast-math flags. My understanding is that the reason we need to handle intrinsic math functions later is because of strength-reduction transformations like pow(x,0.5) --> sqrt(x) that currently operate on intrinsic calls only. If we could apply those operations on things like __xl_pow_finite and produce calls to __xl_sqrt_finite then we wouldn't have this problem. Another possibility might be to have two versions of PPCGenScalarMASSEntries one that handles non-intrinsics and runs earlier, and another one that handles intrinsics after transformations likes pow(x,0.5) --> sqrt(x) are done.

I agree. I think the problem is that this patch is trying to decide on a global lowering strategy for llvm.* math intrinsics in llvm/lib/Target/PowerPC/PPCISelLowering.cpp but such global decision making does not go well with finer granularity of fast-math flags.

Hmm. Instead of using setLibcallName() and letting the legalizer generate the calls, it should be possible to use custom lowering to generate the appropriate calls, at the cost of writing a little more code.

My understanding is that the reason we need to handle intrinsic math functions later is because of strength-reduction transformations like pow(x,0.5) --> sqrt(x) that currently operate on intrinsic calls only.

instcombine should be primarily responsible for this sort of optimization. See LibCallSimplifier::optimizePow. I guess a few transforms (D51630 etc.) landed in DAGCombine; probably we could move them earlier.

As suggested before, I removed dependency to the global option to convert math functions to MASS for all intrinsic and non-intrinsic functions.
The main changes here with respect to the last proposal is in PPCIselLowing.cpp file, about how to handle llvm intrinsic math function.

  • and sorry for taking so long to update the patch.
masoud.ataei added inline comments.Jan 7 2022, 10:54 AM
llvm/include/llvm/Analysis/ScalarFuncs.def
20

Some machines still have the old glibc, so I kept them for compatibility.

ormris removed a subscriber: ormris.Jan 18 2022, 10:08 AM

This update will fix the type of arguments passing to the converted math function in PPCISelLowing.cpp.

masoud.ataei marked an inline comment as done.Jan 24 2022, 7:00 AM
bmahjour added inline comments.Jan 27 2022, 2:04 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
381

what about tan, acos, and the others?

llvm/test/CodeGen/PowerPC/lower-intrinsics-afn-mass.ll
148 ↗(On Diff #402508)

All the calls have afn....why do we need this attribute?

llvm/test/CodeGen/PowerPC/lower-intrinsics-fast-mass.ll
149

do we need this attribute? Can we remove it or have separate tests for functions with attributes?

llvm/test/CodeGen/PowerPC/lower-intrinsics-mass-aix.ll
2

We don't really need a separate aix file. Can we just add a run line with the aix triple to llvm/test/CodeGen/PowerPC/lower-intrinsics-nofast-mass.ll?

llvm/test/CodeGen/PowerPC/lower-scalar-mass-fast.ll
797

shouldn't the tests starting from here move to a different file? This test file is called ...mass-fast.ll so one would expect it only contains tests with fast-math flag on.

llvm/test/CodeGen/PowerPC/pow-025-075-intrinsic-scalar-mass-fast.ll
246 ↗(On Diff #402508)

How come pow -> sqrt conversion didn't happen here?

llvm/test/CodeGen/PowerPC/pow-025-075-nointrinsic-scalar-mass-fast.ll
22 ↗(On Diff #402508)

so pow->sqrt translation never happens for non-intrinsic pow. Is that expected? If so, are we planning to recognize these patterns inside PPCGenScalarMASSEntries in the future and do the translation as part of that transform?

masoud.ataei marked 7 inline comments as done.Jan 28 2022, 10:25 AM
masoud.ataei added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
381

These are the list of math functions that llvm creates intrinsic call for them. There is no llvm intrinsic for tan, acos and other math functions which (exist in MASS and) are not in this list.

llvm/test/CodeGen/PowerPC/lower-intrinsics-afn-mass.ll
148 ↗(On Diff #402508)

Removed

llvm/test/CodeGen/PowerPC/lower-intrinsics-fast-mass.ll
149

Removed

llvm/test/CodeGen/PowerPC/lower-intrinsics-mass-aix.ll
2

Done

llvm/test/CodeGen/PowerPC/lower-scalar-mass-fast.ll
797

Done

llvm/test/CodeGen/PowerPC/pow-025-075-intrinsic-scalar-mass-fast.ll
246 ↗(On Diff #402508)

Honestly, I am not sure why the conversion is not happening in this case. But without this patch we will get powf call (the conversion is not happening again). So this is a separate issue that someone needs to look at independent of this patch.

llvm/test/CodeGen/PowerPC/pow-025-075-nointrinsic-scalar-mass-fast.ll
22 ↗(On Diff #402508)

Correct, pow->sqrt translation is not happening for none intrinsic cases. It is the case independent of this patch. I guess the reason is DAGCombiner only apply this optimization on llvm intrinsics. This is an issue that either we need to handle it in DAGCombiner (same as intrinsic one) or in MASS pass. I feel DAGCombiner is a better option and I think this is also a separate issue.

masoud.ataei marked 7 inline comments as done.

Fix test cases.

Changing function name: lowerLibCall() -> lowerLibCallType()

Ready for another round of review.

bmahjour accepted this revision.Feb 1 2022, 11:21 AM

Apart from some minor inline comments this revision addresses all my outstanding comments. LGTM.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
17377

[nit] a better name would be lowerLibCallBasedOnType

llvm/test/CodeGen/PowerPC/pow-025-075-intrinsic-scalar-mass-fast.ll
246 ↗(On Diff #402508)

Could you please make a note of this as a todo comment in each test that is affected?

llvm/test/CodeGen/PowerPC/pow-025-075-nointrinsic-scalar-mass-fast.ll
22 ↗(On Diff #402508)

Ok, I understand now. We'll have to come back to this later at some point.

This revision is now accepted and ready to land.Feb 1 2022, 11:21 AM
masoud.ataei closed this revision.Feb 2 2022, 8:35 AM