Partially solves https://bugs.llvm.org/show_bug.cgi?id=42190
- Repository
- rG LLVM Github Monorepo
lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
1470 ↗ | (On Diff #203657) | We have no pow(int, int) intrinsic, right? So there is no easy way to do it as a follow-up... |
lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
1468 ↗ | (On Diff #203657) | Maybe I should move the transformation before "return Exp;"? |
Can you please provide some more information about the circumstances under which powf(x, (float) y) will produce a different result than powi(x, y), and which fast-math flags specifically are necessary to make that transform legal?
Removed "isFast" requirement for powf(x, sitofp(n)) -> powi(x, n).
New: powf(x, C) -> powi(x, C) iff C is a constant integer value
I don't see how this is valid without some kind of fast-math. What if the integer exponent is not exactly representable as an FP value?
$ cat powi.c
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  float base = atof(argv[1]);
  printf("base as float = %.8f\n", base);
  int exponent = atoi(argv[2]);
  printf("exponent = %d\n", exponent);
  printf("exponent as float = %.8f\n", (float)exponent);
  float d = powf(base, exponent);
  float i = __builtin_powif(base, exponent);
  printf("powf = %f\n", d);
  printf("powif = %f\n", i);
  return 0;
}
$ ./a.out 1.0000001 16777217
base as float = 1.00000012
exponent = 16777217
exponent as float = 16777216.00000000
powf = 7.389055
powif = 7.385338
We definitely need afn for this; powi performs multiple intermediate rounding steps, so it can be significantly less accurate than pow.
Beyond that, we might need nsz in some cases? Probably worth writing a bunch of tests for zero/inf/nan base with zero/positive/negative exponents to figure out exactly which cases are different.
lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
1454 ↗ | (On Diff #203691) | I think you're missing some checks here. |
1534 ↗ | (On Diff #203691) | powi takes a signed exponent. |
lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
1454 ↗ | (On Diff #203691) | I think the isFast (-Ofast) check is good enough for now. I wrote some tests with various bases, https://pastebin.com/xpysEY0f. |
1534 ↗ | (On Diff #203691) | false means isUnsigned = false. Or if you meant a comment - I added it there more explicitly. If you meant something else, I don't know what is wrong :( |
lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
1454 ↗ | (On Diff #203691) | You are right, we need to check it. |
test/Transforms/InstCombine/pow_fp_int.ll | ||
---|---|---|
176 ↗ | (On Diff #204367) | I don't think this is right; consider, for example, pow(.999999999,4000000000). |
test/Transforms/InstCombine/pow_fp_int.ll | ||
---|---|---|
176 ↗ | (On Diff #204367) | This is a negative test, nothing was changed here. |
test/Transforms/InstCombine/pow_fp_int.ll | ||
---|---|---|
176 ↗ | (On Diff #204367) | Yeah, I should change the variable naming in negative tests. |
test/Transforms/InstCombine/pow_fp_int.ll | ||
---|---|---|
90 ↗ | (On Diff #204370) | Ah, right. Can we still do this at least for some "unsigned" cases, up to i16 (i31?) ? |
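The range concern behind the 4000000000 example and the i16/i31 question can be sketched in C as follows. The helper names are hypothetical and only illustrate the arithmetic; they are not part of the patch. The sketch assumes a 32-bit signed powi exponent and IEEE single precision, whose 24-bit significand limits which integers convert to float exactly.

```c
#include <stdbool.h>
#include <stdint.h>

/* The signed 32-bit value a powi-style call would receive if an unsigned
   32-bit exponent were passed through unchanged. The out-of-range
   conversion is implementation-defined in C; GCC and Clang wrap modulo
   2^32. */
int32_t as_signed_exponent(uint32_t n) { return (int32_t)n; }

/* True when every value of an unsigned integer that is `bits` wide is a
   non-negative int32_t, so zero-extending to i32 cannot flip the sign. */
bool unsigned_fits_signed_i32(unsigned bits) { return bits <= 31; }

/* True when n survives a round trip through float unchanged. The compare
   is done in 64 bits so that a float value of 2^31 cannot overflow. */
bool exact_in_float(int32_t n) { return (int64_t)(float)n == (int64_t)n; }
```

Under these assumptions, as_signed_exponent(4000000000u) comes out negative, which would turn a result near zero into a reciprocal. Unsigned sources of 31 bits or fewer avoid that sign flip, but only exponents whose magnitude fits in 24 bits (which includes every i16 value) are guaranteed to pass exact_in_float.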
It would be nice to use the exact necessary fast-math flags here, while we're thinking about it, instead of just "isFast()". From the discussion, it seems like we only need "afn"?
Yes, I think 'afn' gives us the freedom for this sort of thing. As a practical matter, I'm not sure if clang has the means to turn on 'afn' without the entirety of "-ffast-math", but that may change in the future.
It looks like you didn't change all the uses of isFast() in optimizePow?
Also, I'd like to see some performance numbers; I assume powi is faster, but it would be nice to confirm, particularly for larger exponents.
clang -O3 pw.c -lm
xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
real 0m0,195s
user 0m0,195s
sys 0m0,000s
xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
real 0m0,195s
user 0m0,195s
sys 0m0,000s
xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
real 0m0,195s
user 0m0,195s
sys 0m0,000s
clang -Ofast pw.c -lm / clang -O3 -ffast-math pw.c -lm
xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
real 0m0,053s
user 0m0,049s
sys 0m0,004s
xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
real 0m0,050s
user 0m0,050s
sys 0m0,000s
xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
real 0m0,051s
user 0m0,051s
sys 0m0,000s
"Benchmark": https://pastebin.com/Z0yZD4qU