As shown in https://llvm.org/bugs/show_bug.cgi?id=30868 the current assumption that ld will use local calls for global symbols is incorrect. For the nop elimination, we must use the much stricter ELF local symbol definition. At least GNU ld does assume that the call is otherwise interposable.
Diff Detail
Event Timeline
As shown in https://llvm.org/bugs/show_bug.cgi?id=30868 the current assumption that ld will use local calls for global symbols is incorrect.
But the PR does not actually show anything. You simply assert that, "The SysV ABI on PPC64 is supposed to be implicitly PIC all the time." As I stated in D26326, Please demonstrate where the ABI requires this; I'm not sure that your premise is correct.
I really want to understand what situation actually motivated you to look at this? Was it some large binary where some long-call stub was needed?
http://math-atlas.sourceforge.net/devel/assembly/PPC-elf64abi-1.7.pdf
3.5.11 Function Calls
Programs use the 64-bit PowerPC bl instruction to make direct function calls. The bl instruction must be
followed by a nop instruction.
The definition we currently use for whether a call is interposable doesn't agree with the logic in GNU ld or GCC. While it doesn't matter for inlined functoins, a call must contain the nop as the target can be replaced.
Alright; I think we're confusing several issues here. This does not have anything to do with PIC vs. non-PIC code. Yes, the bl will require a following nop so that we can patch in the TOC-restore when required. This will be necessary for shared libraries and other multi-TOC configurations.
However, I think that we really need to understand why you're running into a problem here. The range of bl is the same as b, and so you don't gain any extra range by stub creation here. The only thing I've been able to think of is that you're using -ffunction-sections and the linker is re-arranging things so that the direct branch needs a stub, and the linker requires the nop patch for the stub. If I'm right, then I'm fine with doing this when using -ffunction-sections.
I don't use -ffunction-sections or similar. It is a combination of two factors. The default assumption of PIC and the incorrect modelling of interposition rules in LLVM. GCC will emit calls the PLT for f on x86 when using PIC. That's the same behavior GNU ld implements for the relocation on PPC64.
It is most visible with C++ libraries.
Can you please provide a test case which fails to link (offline if necessary)? I'm still failing to understand the underlying problem and have not run into issues myself. Thanks!