This patch enables folding sin(x) / cos(x) -> tan(x), cos(x) / sin(x) -> 1 / tan(x) under -ffast-math flag

# Details

# Diff Detail

### Event Timeline

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp | ||
---|---|---|

1474 | Use | |

1481 | For the name here... - Predicate the transform on TLI->has(LibFunc_tan) (or tanf, tanl, depending on the type).
- Use TLI->getName(LibFunc_tan) (or tanf, tanl, depending on the type).
| |

1482 | Needs BuilderTy::FastMathFlagGuard Guard(Builder); above this. |

@hfinkel I'd like to thank you for your help and patience reviewing my patches, as I'm new to the community. I hope it's ok when I include you as a reviewer.

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp | ||
---|---|---|

561–564 | Can you move and reuse the very similar function that's currently in SimplifyLibCalls.cpp? static bool hasUnaryFloatFn(const TargetLibraryInfo *TLI, Type *Ty, LibFunc DoubleFn, LibFunc FloatFn, LibFunc LongDoubleFn) { See also: /// Emit a call to the unary function named 'Name' (e.g. 'floor'). This /// function is known to take a single of type matching 'Op' and returns one /// value with the same type. If 'Op' is a long double, 'l' is added as the /// suffix of name, if 'Op' is a float, we add a 'f' suffix. Value *emitUnaryFloatFnCall(Value *Op, StringRef Name, IRBuilder<> &B, const AttributeList &Attrs); ...in BuildLibCalls.h | |

1475–1478 | Use m_Specific to simplify this: // sin(a) / cos(a) -> tan(a) Value *A; if (match(Op0, m_Intrinsic<Intrinsic::sin>(m_Value(A))) && match(Op1, m_Intrinsic<Intrinsic::cos>(m_Specific(A)))) { | |

test/Transforms/InstCombine/fdiv.ll | ||

90–92 ↗ | (On Diff #128599) | Please vary the fast-ness in these tests or add test(s) that show some variation. I'd prefer to see at least one test where we show the minimum case as we showed in one of the related patches - only the fdiv has relaxed math while the trig calls are strict. |

For reference. this is what GCC generates (although it's unclear whether it's a good idea to follow them)

https://godbolt.org/g/YUUKGE

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp | ||
---|---|---|

1472 | Do you have some specific numerical concern here? As you've noted, this is a well-known math transform. We can make cos(a) / sin(a) a 'TODO' if you think we should use a different transform. |

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp | ||
---|---|---|

1472 | The concern is that those transformation can overflow quite dramatically, even for |

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp | ||
---|---|---|

1472 | The compile-time concern is misguided. This pattern, like every other "10th grade trigonometrical identity", should be optimized by an optimizing compiler because that's the job of an optimizing compiler. These patterns can occur by way of templated code, inlining, or because the programmer may not be a computer performance expert. Think: scientists who are trying to model/simulate some math problem, but don't know much about perf...because again: that's the optimizing compiler's job. If this patch or pass is causing a compile-time problem for you, please point to or file a bug. Obstructing patches like this is doubly bad when you're undermining the efforts of new contributors. |

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp | ||
---|---|---|

1472 | I'm afraid this is not entirely correct. The job of an optimizing compiler is that of making tradeoff on what to optimize, based on cost. I'm not obstructing this patch, I'm asking for a second opinion. |

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp | ||
---|---|---|

1475–1476 | We probably don't want to do this transform if both of the existing values have more than one use; we'd be trading a division for a libcall. If only one value has >1 use, it's probably still ok. Add more tests. :) |

If you have a numerical explanation of why this patch can go in, I'll be happy to accept,

I'm not sure what you're looking for. Brute force sin/cos is equal to tan for every possible value?

#include <math.h> #include <stdio.h> #include <string.h> int main() { unsigned int i; for (i = 0x00000000; i<0x7f800002; i++) { float f; memcpy(&f, &i, 4); float slow = sin(f) / cos(f); float fast = tan(f); if (slow != fast) { printf("\nsin(%f)/cos(%f) = %f, tan(%f) = %f\n", f, f, slow, f, fast); } if (i % (1024*1024*256) == 0) printf("0x%x...\n", i); } return 0; }

$ clang -O2 tan_checker.c ; ./a.out 0x0... 0x10000000... 0x20000000... 0x30000000... 0x40000000... 0x50000000... 0x60000000... 0x70000000... sin(inf)/cos(inf) = nan, tan(inf) = nan sin(nan)/cos(nan) = nan, tan(nan) = nan

Note that I'm testing on macOS x86 with strict math (the sin and cos calls are replaced by __sincos_stret).

Double-check to make sure I haven't screwed anything up there, but this suggests we could do this transform without -ffast-math?

I don't think this is feasible (for 64-bit values). `-ffast-math` is a little fun and complicated, e.g.

log(pow(x, y)) -> y*log(x)

which seems perfectly fine on paper, for `x = -1, y = 4`

log(pow(-1, 4)) -> 0 4*log(-1) -> NaN.

(courtesy of Steven)

My point is that reasoning about seemingly innocuous algebraic simplification turns out to be harder than expected, and therefore we should make a conscious choice on whether implement these after careful numerical analysis.

How is this relevant? Log is only defined for positive numbers, while sin/cos/tan are valid across all numerical inputs.

I don't think there's anything I more I can say here; sorry @Quolyk , I tried. If @hfinkel , @efriedma , @andrew.w.kaylor or anyone else would like to comment that would be great.

I don't feel qualified enough to say whether this can go in, somebody with fast-math experience should comment.

I changed the brute force float checker to test 1/tan(x), and it matches cos(x)/sin(x) in all cases on macOS 10.13. This is also true on Ubuntu 17.10 x86-64.

LGTM.

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp | ||
---|---|---|

1492–1493 | Could hoist this check to the first 'if' to reduce the duplication. |