For x86 targets, when in PIC mode, do not optimize tail calls.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
2759 ↗ | (On Diff #25891) | We can't do this, musttail guarantees that a tail call will occur. I think the old behavior of forcing relocations to be resolved eagerly was preferable. If we cannot honor musttail, we must report an error somehow. |
You removed the musttail call change, but also removed the supporting code that arranges to jump through ECX, so I don't think this will work either.
I commented on the bug, I'm trying to understand why the GOT reference causes a problem. I don't yet see a codegen bug here.
The purpose of this change is to remove the jump (through ECX or not) completely since the use of tailcallee@GOT is incorrect under lazy dynamic linking.
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
2929 ↗ | (On Diff #26338) | It is a valid optimization with eager binding, so it seems the behavior should be controlled by an option. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
2929 ↗ | (On Diff #26338) | David, I suppose you prefer the default to have tail call optimization. I would rather llvm for x86 to play safe as gcc, with no tail call optimization by default. On the other hand, I think I should not remove line 2928 to 2943, |
This code really should be under an option. As mentioned in the original PR, the origin of the problem is the misdesigned XFree86/Xorg driver interface. We should not pessimise everyone who doesn't create brain dead code. The executive summary of the X bug is:
(1) A module is loaded with a reference to foo. foo is not yet defined.
(2) Another module is lodead that defines foo. X requires the first module to call this.
Ignoring must-tailcall would break all languages that require the optimisation.
I read the new discussions in https://llvm.org/bugs/show_bug.cgi?id=15086.
- If it is not considered an optimization bug, we need to change semantics of dlopen() with RTLD_LAZY.
- A fix to have both tail call optimization and RTLD_LAZY seems infeasible, to setup %ebx before and restore %ebx after the call.
- Both X.org and Android hit this problem. I have spent weeks to trace down this problem in Android and hence my motivation to change llvm and save future debug time.
- This problem has been worked around with a global -fno-optimize-sibling-calls flag. So it is not urgent to have a fix. But that flag also disables other valid cases such as non-PIC mode.
In my new diff3, I keep current isMustTail feature and disable isTailCall only under PIC mode.
That should not lose performance compared with gcc.
If we want to keep this optimization selectable under PIC mode,
then we need a new flag and the question is whether the optimization is on or off
under PIC mode by default. Considering the difficulty of debugging this problem
and current competition from gcc, I would rather the default to be off.
Suggestions?
Changing the default may affect users who depend on the current behavior which has been the case for a couple of releases. Another possible solution is to emit a warning for cases like this so that debugging similar issues in the future will become easier.
I'm coming around to the position that we should simply disable this
optimization for PIC calls to symbols with default visibility. The current
change disables all tail calls when generating PIC code, which is
undesirable.
We can add a flag later if someone asks, and call it
-f[no-]optimize-pic-sibling-calls.
Please review my new diff 4. This should allow tail call optimizations for protected and hidden symbols.
AFAIK, jump to those symbols do not use GOT.
On the other hand, I found some oddity about "sin" and "sin2" symbols in one test case,
which made me to use the strange condition:
if (!(G && (.... || ...))) isTailCall = false;
Thanks for the patch! I simplified the conditional a bit and tweaked the tests to try to cover some more corner cases that I could think of. During testing I found that this disabled tail calling of internal (think static in C) functions, so I added hasLocalLinkage() to the conditional. I committed the result as r238487.