This is an archive of the discontinued LLVM Phabricator instance.

[IR] Add llvm.tan.* intrinsic
Abandoned · Public

Authored by junaire on Mar 26 2023, 3:04 AM.

Details

Summary

This patch adds the llvm.tan.* intrinsic and corresponding DAG node support for the X86 target.

The motivation for the patch is that even though tan is uncommon in real code, having uniform trig intrinsics makes optimizations and folds a lot easier.
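
For illustration, a minimal sketch of how the proposed intrinsic would appear in IR, by analogy with the existing llvm.sin.*/llvm.cos.* overloads (the @use function is just an example):

declare float @llvm.tan.f32(float)
declare double @llvm.tan.f64(double)

define double @use(double %x) {
  ; One node the middle end can match uniformly alongside llvm.sin/llvm.cos,
  ; instead of special-casing the tan libcall.
  %r = call double @llvm.tan.f64(double %x)
  ret double %r
}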

Related issue: https://github.com/llvm/llvm-project/issues/34950

Signed-off-by: Jun Zhang <jun@junz.org>

Diff Detail

Event Timeline

junaire created this revision.Mar 26 2023, 3:04 AM
Herald added a project: Restricted Project.Mar 26 2023, 3:04 AM
junaire requested review of this revision.Mar 26 2023, 3:04 AM
RKSimon added inline comments.Mar 26 2023, 7:38 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
124

We probably want to add STRICT_FTAN at the same time.

llvm/test/CodeGen/X86/llvm.tan.ll
39

It'd be better to put these in the existing test files with other libm calls (fp128-libcalls.ll etc.). I'm not sure how thorough our existing tests are for these, though; grepping for sin.f32 doesn't bring up much, for instance.

junaire updated this revision to Diff 508431.Mar 26 2023, 10:22 AM
  1. Add STRICT_FTAN
  2. Add more tests
junaire marked an inline comment as done.Mar 26 2023, 10:31 AM
junaire added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
124

I have to admit I know nothing about it; I simply grepped for STRICT_FSIN and added the corresponding cases. I'm confused about where it comes from, since there's no occurrence in SelectionDAGBuilder.cpp.

It'd be helpful if you could elaborate a bit on how these things work under the hood! :)

llvm/test/CodeGen/X86/llvm.tan.ll
39

I'm not sure what the right way to go is. The tests for these trig intrinsics seem pretty messy, and we're lacking test coverage for the X86 target. I created this new file since I saw a similar test in llvm/test/CodeGen/AMDGPU/llvm.sin.ll.

I'd propose just leaving it as is, but if you insist I can move them. (Then what about llvm.tan.f32 and llvm.tan.f64?)

Adding some people who might have opinions on more FP intrinsics.

I voiced this before, and while I will repeat it here, I won't block this patch all by myself:

This is a bad idea.

We are playing whack-a-mole, and assuming we don't want to commit to all of them (math intrinsics), this is always going to be a source of "weird behavior," to say the least.
Even if we had all of them, there are still unsolved problems as soon as you leave the space of systems with a libm that is linked in late, since not all intrinsics lower to target instructions but rather to libm calls.

What we should do is remove all math intrinsics and instead work on the math functions themselves, as we do with other library functions (malloc, free, ...).
A math intrinsic is literally a regular call to the corresponding math function with a no_errno flag attached to it (and the type encoded in the name).
We probably even have all the attributes today to express that:

declare double @tan(double) memory(readonly)

[Not to mention that this makes tan, which is represented as a target-independent math intrinsic, special only for X86...]

Now there have been arguments for intrinsics, e.g., that libcall matching is bad. I don't disagree, but we can fix that.
It was also raised that we treat intrinsics as cheap and calls as expensive in heuristics, but that is just not right to begin with.
A math call costs the same if we end up calling the math function as part of the lowering anyway, no matter whether it was a call or an intrinsic call in the IR.
That said, we can make smarter choices in heuristics fairly easily; e.g., a readnone call is probably not going to be too expensive.

All that said, I think we should recognize math functions and annotate them, just not as intrinsics. Maybe something like this:

define double @tan(double) libm(tan, double)
define float @tanf(float) libm(tan, float)
...
pengfei added inline comments.Mar 26 2023, 6:08 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
124

The strict nodes come from constrained intrinsics emitted by the FE under strict mode; see D70256.
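
For context, a sketch of the existing constrained form for sin, which is what selects to the STRICT_ nodes; the tan analogue in the comment is hypothetical until this patch plus frontend support lands:

; Emitted by the FE under e.g. -ffp-exception-behavior=strict:
declare double @llvm.experimental.constrained.sin.f64(double, metadata, metadata)

; Hypothetical analogue that would select to STRICT_FTAN:
; %t = call double @llvm.experimental.constrained.tan.f64(double %x, metadata !"round.dynamic", metadata !"fpexcept.strict")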

craig.topper added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
124

They aren't directly referenced in SelectionDAGBuilder.cpp because they go through this code:

#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC)                         \
  case Intrinsic::INTRINSIC:
#include "llvm/IR/ConstrainedOps.def"
    visitConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(I));
    return;
junaire marked an inline comment as not done.Mar 26 2023, 9:33 PM

[Quoting @jdoerfert's comment above in full.]

Hi @jdoerfert, thanks for the heads up. I've read the whole thread you linked and believe what you proposed is reasonable. However, it looks like there's no clear consensus in the community so far, and everything is still under discussion? The one thing I agree with is that the whole thing is a mess. I'd suggest you start an RFC about how we handle math intrinsics and libcall matching. But until then, I think the best choice for now is to keep these trig calls consistent so it's easier to do folds in the middle end; that's the primary reason I created this patch.

llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
124

Thanks @pengfei & @craig.topper for the explanation; that works for me!

[Quoting @jdoerfert's comment and junaire's reply above.]

We already do math folds in the middle end. See LibCallSimplifier::optimizeFloatingPointLibCall. We even recognize tan there. Adding an intrinsic would disable the existing optimizations for tan.
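
For example (a sketch): LibCallSimplifier folds tan(atan(x)) back to x for fast-math calls, and it matches the libcall by name, so an intrinsic call would bypass this:

define float @fold(float %x) {
  ; LibCallSimplifier: tanf(atanf(x)) --> x (requires fast-math flags)
  %a = call fast float @atanf(float %x)
  %t = call fast float @tanf(float %a)
  ret float %t
}

declare float @atanf(float)
declare float @tanf(float)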

[Quoting the preceding exchange in full.]

Yes and no. I mean those 10th-grade trigonometric identities like https://godbolt.org/z/nxvPe5en5; those should be handled in InstCombine. Yes, you can argue that it's unnecessary to add an intrinsic to do these folds, but it will obviously make things a lot easier and clearer if we have consistent intrinsics. In fact, there's a patch from long ago that tried to fix the problem: https://reviews.llvm.org/D41283
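
Concretely, a sketch of the kind of identity meant here, assuming the intrinsic from this patch: with fast-math, sin(x)/cos(x) could be folded to a single tan(x) once all three operations are uniform intrinsics:

define double @identity(double %x) {
  %s = call fast double @llvm.sin.f64(double %x)
  %c = call fast double @llvm.cos.f64(double %x)
  ; With llvm.tan.f64 available, InstCombine could fold this to:
  ;   %r = call fast double @llvm.tan.f64(double %x)
  %r = fdiv fast double %s, %c
  ret double %r
}

declare double @llvm.sin.f64(double)
declare double @llvm.cos.f64(double)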

[Quoting the preceding exchange in full.]

LibCallSimplifier::optimizeFloatingPointLibCall should be called during InstCombine via the use of LibCallSimplifier in InstCombinerImpl::tryOptimizeCall.

Adding an intrinsic for tan will require adding intrinsic versions of the existing tan optimizations in optimizeFloatingPointLibCall to avoid regressions there.

Need to add it to llvm/include/llvm/Analysis/VecFuncs.def so the vectorizer knows it can vectorize it to a vector math library. Weirdly, some parts of that file assume llvm.tan and some other intrinsics already exist.

I suppose llvm.tan should also be added to llvm::isTriviallyVectorizable in VectorUtils.cpp.
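
For illustration, a sketch of what that enables (the __svml_tanf4 name is an assumption about the VecFuncs.def entries): once llvm.tan is trivially vectorizable, the loop vectorizer can widen scalar calls into a vector intrinsic call that the TLI maps onto a vector math routine:

define <4 x float> @vec_tan(<4 x float> %vx) {
  ; Widened from four scalar llvm.tan.f32 calls; the TLI can then map this
  ; to a vector math library routine (e.g. __svml_tanf4 -- assumed entry).
  %vt = call <4 x float> @llvm.tan.v4f32(<4 x float> %vx)
  ret <4 x float> %vt
}

declare <4 x float> @llvm.tan.v4f32(<4 x float>)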

Is there a corresponding patch for the clang part?

craig.topper added inline comments.Mar 27 2023, 1:14 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
650

We should default this to Expand in llvm/lib/CodeGen/TargetLoweringBase.cpp like we do for ISD::CBRT. That will be needed to properly support targets other than X86.

650

Oops ISD::FCBRT

junaire updated this revision to Diff 508927.Mar 28 2023, 1:24 AM

Address comments

junaire marked 2 inline comments as done.Mar 28 2023, 1:39 AM

Is there a corresponding patch for the clang part?

Not yet. But I'll create one later.

LibCallSimplifier::optimizeFloatingPointLibCall should be called during InstCombine via the use of LibCallSimplifier in InstCombinerImpl::tryOptimizeCall.
Adding an intrinsic for tan will require adding intrinsic versions of the existing tan optimizations in optimizeFloatingPointLibCall to avoid regressions there.

Ugh, should it be handled in InstCombinerImpl::visitCallInst? It looks like, if it's an intrinsic, it won't take that code path? (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1311-L1312)

Need to add it to llvm/include/llvm/Analysis/VecFuncs.def so the vectorizer knows it can vectorize it to a vector math library. Weirdly, some parts of that file assume llvm.tan and some other intrinsics already exist.
I suppose llvm.tan should also be added to llvm::isTriviallyVectorizable in VectorUtils.cpp.

Done.

kpn added a subscriber: kpn.Mar 29 2023, 10:42 AM
kpn added a comment.Mar 29 2023, 10:49 AM

I don't see any tests for a constrained/STRICT tan. We do have tests for constrained sin and cos, so adding tan tests where sin and cos are tested is probably appropriate.
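
A sketch of what such a test might look like, modeled on the existing constrained sin/cos tests (the constrained tan intrinsic name is the assumed analogue introduced by this patch):

define double @tan_strict(double %x) strictfp {
  ; Constrained form with dynamic rounding and strict exception semantics.
  %t = call double @llvm.experimental.constrained.tan.f64(double %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
  ret double %t
}

declare double @llvm.experimental.constrained.tan.f64(double, metadata, metadata)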

llvm/docs/LangRef.rst
14560

If you are adding a constrained/STRICT tan then you need to document it.

FWIW, I agree with @jdoerfert that we should do this on the scalar libcall. The amount of code touched here to make the intended combine simpler doesn't seem justified.

junaire abandoned this revision.Mar 29 2023, 5:50 PM

Alright, thanks everyone for the comments. I think I'll just abandon the patch and try to do the folds on the libcall directly.