Hi David, this should save a function call. The code passes basic testing. This is my first dive into SimplifyLibCalls, I may miss some details, please bear with me.
Details
Diff Detail
Event Timeline
I'm not sure what the implications of this transform are with respect to correctness. My naïve understanding is that this could produce different results because it removes a step where rounding was introduced. I'd feel more comfortable with this if @scanon, @resistor and/or @escha could take a look.
lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
1096 | Please use auto *OpC here. | |
1102 | Any reason in particular why we must limit this to isDoubleTy? | |
1103–1105 | Is this clang-format'd? | |
1105 | Wouldn't TLI->getName(LibFunc::exp) be equivalent to FuncName ? |
Hi, thanks for the review. I addressed the comments in this patch.
That said, I also share your concerns about correctness, that's mainly why I asked for review. On a second thought, maybe this should be enabled only under -ffast-math ? Anyway, looking forward to hear feedback from other reviewers.
As suggested by David, this should be fast-math only. It's roughly equivalent to re-association of multiplication. Besides rounding differences, this changes overflow and underflow behavior quite dramatically. Consider x = 1000, y = 0.001. pow(exp(x), y) = pow(inf, 0.001) = inf, whereas exp(x*y) = exp(1).
Also, this should really be generalized to apply to exp2 and exp10 as well as exp.
Addressed Stephen's comments. Thank you for the explanation.
I added also some tests to ensure it's handled correctly with and without fast-math. I included a case for exp2, but not exp10 as it doesn't seem to be available on FreeBSD (GNU extension). I think we can leave that for a subsequent patch.
For completeness, this is the code we emit with the patch (only under fast-math):
00000000004007b0 <mypow>:
4007b0: f2 0f 59 c1 mulsd %xmm1,%xmm0 4007b4: e9 87 fd ff ff jmpq 400540 <exp2@plt> 4007b9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
and this was the code we emit without:
00000000004007f0 <mypow>:
4007f0: 50 push %rax 4007f1: f2 0f 11 0c 24 movsd %xmm1,(%rsp) 4007f6: e8 75 fd ff ff callq 400570 <exp2@plt> 4007fb: f2 0f 10 0c 24 movsd (%rsp),%xmm1 400800: 58 pop %rax 400801: e9 7a fd ff ff jmpq 400580 <pow@plt> 400806: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 40080d: 00 00 00
David, as InstCombine owner -- do you have any objections to get this in (now that Stephen reviewed it)?
LGTM once the tests have been cleaned up.
test/Transforms/InstCombine/pow-exp2.ll | ||
---|---|---|
3–14 | Please simplify this test case, you don't need the alloca/store/load sequence. I believe the following should work: define double @mypow(double %x, double %y) #0 { entry: %call = call double @exp2(double %x) #2 %pow = call double @llvm.pow.f64(double %call, double %y) ret double %pow } | |
10 | You don't define attribute #2, please remove this reference to it. | |
17 | You don't define attribute #2, please remove it. | |
20–26 | Please move these check directives immediately after the function. | |
21 | I would use the following: ; CHECK-LABEL: define double @mypow( | |
22 | This check line is a little superfluous. |
Please use auto *OpC here.