This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x)
ClosedPublic

Authored by Quolyk on Dec 15 2017, 5:55 AM.

Details

Summary

This patch enables folding sin(x) / cos(x) -> tan(x), cos(x) / sin(x) -> 1 / tan(x) under -ffast-math flag

Diff Detail

Event Timeline

Quolyk created this revision.Dec 15 2017, 5:55 AM
Quolyk retitled this revision from [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x) to [WIP][InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).
Quolyk added a reviewer: davide.
hfinkel added inline comments.Dec 19 2017, 8:10 PM
lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1474

Use match here?

1481

For the name here...

  1. Predicate the transform on TLI->has(LibFunc_tan) (or tanf, tanl, depending on the type).
  2. Use TLI->getName(LibFunc_tan) (or tanf, tanl, depending on the type).
1482

Needs

BuilderTy::FastMathFlagGuard Guard(Builder);

above this.

Quolyk updated this revision to Diff 127683.Dec 20 2017, 4:29 AM
Quolyk retitled this revision from [WIP][InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x) to [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x).

@hfinkel I'd like to thank you for your help and patience reviewing my patches, as I'm new to the community. I hope it's ok when I include you as a reviewer.

Quolyk marked 3 inline comments as done.Dec 20 2017, 4:32 AM

@hfinkel I'd like to thank you for your help and patience reviewing my patches, as I'm new to the community. I hope it's ok when I include you as a reviewer.

It's certainly fine to include me as a review. Welcome to the community!

Quolyk updated this revision to Diff 128599.Jan 4 2018, 12:52 AM
Quolyk edited the summary of this revision. (Show Details)
spatel added inline comments.Jan 5 2018, 10:43 AM
lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
561–564

Can you move and reuse the very similar function that's currently in SimplifyLibCalls.cpp?

static bool hasUnaryFloatFn(const TargetLibraryInfo *TLI, Type *Ty,
                          LibFunc DoubleFn, LibFunc FloatFn,
                          LibFunc LongDoubleFn) {

See also:

/// Emit a call to the unary function named 'Name' (e.g.  'floor'). This
/// function is known to take a single of type matching 'Op' and returns one
/// value with the same type. If 'Op' is a long double, 'l' is added as the
/// suffix of name, if 'Op' is a float, we add a 'f' suffix.
Value *emitUnaryFloatFnCall(Value *Op, StringRef Name, IRBuilder<> &B,
                            const AttributeList &Attrs);

...in BuildLibCalls.h

1475–1478

Use m_Specific to simplify this:

// sin(a) / cos(a) -> tan(a)
Value *A;
if (match(Op0, m_Intrinsic<Intrinsic::sin>(m_Value(A))) &&
    match(Op1, m_Intrinsic<Intrinsic::cos>(m_Specific(A)))) {
test/Transforms/InstCombine/fdiv.ll
90–92

Please vary the fast-ness in these tests or add test(s) that show some variation. I'd prefer to see at least one test where we show the minimum case as we showed in one of the related patches - only the fdiv has relaxed math while the trig calls are strict.

Quolyk updated this revision to Diff 128867.Jan 7 2018, 12:49 AM
Quolyk marked 3 inline comments as done.
spatel added inline comments.Jan 8 2018, 8:05 AM
lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1472

Should we also handle:

cos(a) / sin(a) -> 1 / tan(a)

?

test/Transforms/InstCombine/fdiv.ll
161–163

Do we need to declare tan functions for these tests?

davide requested changes to this revision.Jan 8 2018, 8:07 AM
davide added a subscriber: scanon.

@scanon should sign off this.

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1472

Please wait for @scanon opinion before implementing every possible 10th grade trigonometrical identity.

This revision now requires changes to proceed.Jan 8 2018, 8:07 AM
davide added a comment.Jan 8 2018, 8:12 AM

For reference. this is what GCC generates (although it's unclear whether it's a good idea to follow them)
https://godbolt.org/g/YUUKGE

spatel added inline comments.Jan 8 2018, 10:57 AM
lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1472

Do you have some specific numerical concern here? As you've noted, this is a well-known math transform.

We can make cos(a) / sin(a) a 'TODO' if you think we should use a different transform.
https://stackoverflow.com/questions/3738384/stable-cotangent

davide added inline comments.Jan 8 2018, 11:02 AM
lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1472

The concern is that those transformation can overflow quite dramatically, even for -ffast-math.
The other concern is that I'm not a numerical expert, so I'd love to have this signed off from somebody who knows better than me.
The last concern is, again, we shouldn't pattern match every possible thing just because, it slows down the compiler without real benefit, so, do you know how this pattern is frequent?

spatel added inline comments.Jan 8 2018, 11:55 AM
lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1472

The compile-time concern is misguided.

This pattern, like every other "10th grade trigonometrical identity", should be optimized by an optimizing compiler because that's the job of an optimizing compiler.

These patterns can occur by way of templated code, inlining, or because the programmer may not be a computer performance expert. Think: scientists who are trying to model/simulate some math problem, but don't know much about perf...because again: that's the optimizing compiler's job.

If this patch or pass is causing a compile-time problem for you, please point to or file a bug. Obstructing patches like this is doubly bad when you're undermining the efforts of new contributors.

davide added inline comments.Jan 8 2018, 12:28 PM
lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1472

I'm afraid this is not entirely correct. The job of an optimizing compiler is that of making tradeoff on what to optimize, based on cost. I'm not obstructing this patch, I'm asking for a second opinion.
If you have a numerical explanation of why this patch can go in, I'll be happy to accept, otherwise I'll defer the review to @scanon.

spatel added inline comments.Jan 8 2018, 1:34 PM
lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1475–1476

We probably don't want to do this transform if both of the existing values have more than one use; we'd be trading a division for a libcall. If only one value has >1 use, it's probably still ok. Add more tests. :)

spatel added a comment.Jan 8 2018, 2:46 PM

If you have a numerical explanation of why this patch can go in, I'll be happy to accept,

I'm not sure what you're looking for. Brute force sin/cos is equal to tan for every possible value?

#include <math.h>
#include <stdio.h>
#include <string.h>

int main() {
  unsigned int i;
  for (i = 0x00000000; i<0x7f800002; i++) {
    float f;
    memcpy(&f, &i, 4);
    float slow = sin(f) / cos(f);
    float fast = tan(f);
    if (slow != fast) {
      printf("\nsin(%f)/cos(%f) = %f, tan(%f) = %f\n", f, f, slow, f, fast);
    }
    if (i % (1024*1024*256) == 0) printf("0x%x...\n", i); 
  }

  return 0;
}
$  clang -O2 tan_checker.c ; ./a.out 
0x0...
0x10000000...
0x20000000...
0x30000000...
0x40000000...
0x50000000...
0x60000000...
0x70000000...

sin(inf)/cos(inf) = nan, tan(inf) = nan

sin(nan)/cos(nan) = nan, tan(nan) = nan

Note that I'm testing on macOS x86 with strict math (the sin and cos calls are replaced by __sincos_stret).
Double-check to make sure I haven't screwed anything up there, but this suggests we could do this transform without -ffast-math?

davide added a comment.Jan 8 2018, 2:52 PM

If you have a numerical explanation of why this patch can go in, I'll be happy to accept,

I'm not sure what you're looking for. Brute force sin/cos is equal to tan for every possible value?

I don't think this is feasible (for 64-bit values). -ffast-math is a little fun and complicated, e.g.

log(pow(x, y)) -> y*log(x)

which seems perfectly fine on paper, for x = -1, y = 4

log(pow(-1, 4)) -> 0
4*log(-1) -> NaN.

(courtesy of Steven)

My point is that reasoning about seemingly innocuous algebraic simplification turns out to be harder than expected, and therefore we should make a conscious choice on whether implement these after careful numerical analysis.

log(pow(x, y)) -> y*log(x)

How is this relevant? Log is only defined for positive numbers, while sin/cos/tan are valid across all numerical inputs.

I don't think there's anything I more I can say here; sorry @Quolyk , I tried. If @hfinkel , @efriedma , @andrew.w.kaylor or anyone else would like to comment that would be great.

davide resigned from this revision.Jan 8 2018, 3:43 PM

I don't feel qualified enough to say whether this can go in, somebody with fast-math experience should comment.

davide added a comment.Jan 8 2018, 3:43 PM

log(pow(x, y)) -> y*log(x)

How is this relevant? Log is only defined for positive numbers, while sin/cos/tan are valid across all numerical inputs.

My point is that fast-math implications can be non-trivial.

davide removed a reviewer: davide.Jan 8 2018, 3:50 PM
Quolyk updated this revision to Diff 129042.Jan 9 2018, 1:04 AM
Quolyk edited the summary of this revision. (Show Details)
spatel accepted this revision.Jan 10 2018, 12:45 PM

I changed the brute force float checker to test 1/tan(x), and it matches cos(x)/sin(x) in all cases on macOS 10.13. This is also true on Ubuntu 17.10 x86-64.

LGTM.

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1492–1493

Could hoist this check to the first 'if' to reduce the duplication.

This revision is now accepted and ready to land.Jan 10 2018, 12:45 PM
Quolyk closed this revision.Jan 10 2018, 10:34 PM
Quolyk added a subscriber: davide.Jan 10 2018, 10:39 PM

@spatel @davide thanks for review and comments

spatel added a subscriber: bkramer.Jan 11 2018, 8:01 AM

For reference, @bkramer improved (thanks!) the attribute propagation:
rL322284
rL322285