Partially solves https://bugs.llvm.org/show_bug.cgi?id=42190

### Event Timeline

lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|

1483 | We have no pow(int, int) intrinsic, right? So there is no easy way to do it as a follow-up... |

lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|

1481 | Maybe I should move the transformation before `return Exp;`? |

Can you please provide some more information under which circumstances `powf(x, (float) y)` will provide a different result than `powi(x, y)`, and which fast-math flags specifically are necessary to make that transform legal?

Removed "isFast" requirement for powf(x, sitofp(n)) -> powi(x, n).

New: powf(x, C) -> powi(x, C) iff C is a constant integer value

I don't see how this is valid without some kind of fast-math. What if the integer exponent is not exactly representable as an FP value?

    $ cat powi.c
    #include <stdio.h>
    #include <math.h>
    #include <stdlib.h>
    int main(int argc, char *argv[]) {
      float base = atof(argv[1]);
      printf("base as float = %.8f\n", base);
      int exponent = atoi(argv[2]);
      printf("exponent = %d\n", exponent);
      printf("exponent as float = %.8f\n", (float)exponent);
      float d = powf(base, exponent);
      float i = __builtin_powif(base, exponent);
      printf("powf = %f\n", d);
      printf("powif = %f\n", i);
      return 0;
    }
    $ ./a.out 1.0000001 16777217
    base as float = 1.00000012
    exponent = 16777217
    exponent as float = 16777216.00000000
    powf = 7.389055
    powif = 7.385338

We definitely need `afn` for this; powi performs multiple intermediate rounding steps, so it can be significantly less accurate than pow.

Beyond that, we might need `nsz` in some cases? Probably worth writing a bunch of tests for zero/inf/nan base with zero/positive/negative exponents to figure out exactly which cases are different.

lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|

1454 | I think you're missing some checks here. | |

1532 | powi takes a signed exponent. |

lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|

1454 | I think the isFast (-Ofast) check is good enough for now. I wrote some tests with various bases: https://pastebin.com/xpysEY0f. | |

1532 | false means isUnsigned = false. Or, if you meant a comment, I added it there more explicitly. If you meant something else, I don't know what is wrong :( |

lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|

1454 | You are right, we need to check it. |

test/Transforms/InstCombine/pow_fp_int.ll | ||
---|---|---|

75 | I don't think this is right; consider, for example , |

test/Transforms/InstCombine/pow_fp_int.ll | ||
---|---|---|

75 | This is a negative test, nothing was changed here. |

test/Transforms/InstCombine/pow_fp_int.ll | ||
---|---|---|

75 | Yeah, I should change variable naming in negative tests |

test/Transforms/InstCombine/pow_fp_int.ll | ||
---|---|---|

54 | Ah, right. Can we still do this at least for some "unsigned" cases, up to i16 (i31?)? |

It would be nice to use the exact necessary fast-math flags here, while we're thinking about it, instead of just "isFast()". From the discussion, it seems like we only need "afn"?

Yes, I think 'afn' gives us the freedom for this sort of thing. As a practical matter, I'm not sure if clang has the means to turn on 'afn' without the entirety of "-ffast-math", but that may change in the future.

It looks like you didn't change all the uses of isFast() in optimizePow?

Also, I'd like to see some performance numbers; I assume powi is faster, but it would be nice to confirm, particularly for larger exponents.

clang -O3 pw.c -lm

    xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
    real 0m0,195s
    user 0m0,195s
    sys 0m0,000s
    xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
    real 0m0,195s
    user 0m0,195s
    sys 0m0,000s
    xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
    real 0m0,195s
    user 0m0,195s
    sys 0m0,000s

clang -Ofast pw.c -lm / clang -O3 -ffast-math pw.c -lm

    xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
    real 0m0,053s
    user 0m0,049s
    sys 0m0,004s
    xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
    real 0m0,050s
    user 0m0,050s
    sys 0m0,000s
    xbolva00@xbolva00-G551JW:~$ time ./a.out &> log
    real 0m0,051s
    user 0m0,051s
    sys 0m0,000s

"Benchmark": https://pastebin.com/Z0yZD4qU