Hi David, this should save a function call. The code passes basic testing. This is my first dive into SimplifyLibCalls, I may miss some details, please bear with me.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I'm not sure what the implications of this transform are with respect to correctness. My naïve understanding is that this could produce different results because it removes a step where rounding was introduced. I'd feel more comfortable with this if @scanon, @resistor and/or @escha could take a look.
lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
1096 ↗ | (On Diff #38342) | Please use auto *OpC here. |
1102 ↗ | (On Diff #38342) | Any reason in particular why we must limit this to isDoubleTy? |
1103–1105 ↗ | (On Diff #38342) | Is this clang-format'd? |
1105 ↗ | (On Diff #38342) | Wouldn't TLI->getName(LibFunc::exp) be equivalent to FuncName ? |
Hi, thanks for the review. I addressed the comments in this patch.
That said, I also share your concerns about correctness, that's mainly why I asked for review. On a second thought, maybe this should be enabled only under -ffast-math ? Anyway, looking forward to hear feedback from other reviewers.
As suggested by David, this should be fast-math only. It's roughly equivalent to re-association of multiplication. Besides rounding differences, this changes overflow and underflow behavior quite dramatically. Consider x = 1000, y = 0.001. pow(exp(x), y) = pow(inf, 0.001) = inf, whereas exp(x*y) = exp(1).
Also, this should really be generalized to apply to exp2 and exp10 as well as exp.
Addressed Stephen's comments. Thank you for the explanation.
I added also some tests to ensure it's handled correctly with and without fast-math. I included a case for exp2, but not exp10 as it doesn't seem to be available on FreeBSD (GNU extension). I think we can leave that for a subsequent patch.
For completeness, this is the code we emit with the patch (only under fast-math):
00000000004007b0 <mypow>:
4007b0: f2 0f 59 c1 mulsd %xmm1,%xmm0 4007b4: e9 87 fd ff ff jmpq 400540 <exp2@plt> 4007b9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
and this was the code we emit without:
00000000004007f0 <mypow>:
4007f0: 50 push %rax 4007f1: f2 0f 11 0c 24 movsd %xmm1,(%rsp) 4007f6: e8 75 fd ff ff callq 400570 <exp2@plt> 4007fb: f2 0f 10 0c 24 movsd (%rsp),%xmm1 400800: 58 pop %rax 400801: e9 7a fd ff ff jmpq 400580 <pow@plt> 400806: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 40080d: 00 00 00
David, as InstCombine owner -- do you have any objections to get this in (now that Stephen reviewed it)?
LGTM once the tests have been cleaned up.
test/Transforms/InstCombine/pow-exp2.ll | ||
---|---|---|
3–14 ↗ | (On Diff #38752) | Please simplify this test case, you don't need the alloca/store/load sequence. I believe the following should work: define double @mypow(double %x, double %y) #0 { entry: %call = call double @exp2(double %x) #2 %pow = call double @llvm.pow.f64(double %call, double %y) ret double %pow } |
10 ↗ | (On Diff #38752) | You don't define attribute #2, please remove this reference to it. |
17 ↗ | (On Diff #38752) | You don't define attribute #2, please remove it. |
20–26 ↗ | (On Diff #38752) | Please move these check directives immediately after the function. |
21 ↗ | (On Diff #38752) | I would use the following: ; CHECK-LABEL: define double @mypow( |
22 ↗ | (On Diff #38752) | This check line is a little superfluous. |