llvm-dev message: https://lists.llvm.org/pipermail/llvm-dev/2021-May/150465.html
In an ELF shared object, a default visibility defined symbol is preemptible by
default. This creates some missed optimization opportunities.
-Bsymbolic-functions is more aggressive than our current -fvisibility-inlines-hidden
(present since 2012) as it applies to all function definitions. It can
- avoid PLT for cross-TU function calls && reduce dynamic symbol lookup
- reduce dynamic symbol lookup for taking function addresses and optimize out GOT/TOC on x86-64/ppc64
In a -DLLVM_TARGETS_TO_BUILD=X86 build, the number of JUMP_SLOT decreases from 12716 to 1628, and the number of GLOB_DAT decreases from 1918 to 1313
The built clang with -DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on is significantly faster.
See the Linux kernel build result https://bugs.archlinux.org/task/70697
Note: the performance of -fno-semantic-interposition -Bsymbolic-functions
libLLVM.so and libclang-cpp.so is close to a PIE binary linking against
libLLVM*.a and libclang*.a. -Bsymbolic-functions is the major contributor.
On x86-64 (with GOTPCRELX) and ppc64 ELFv2, the GOT/TOC relocations can be
optimized.
Some implication:
Interposing a subset of functions is no longer supported.
(This is fragile on ELF and unsupported on Mach-O at all. For Mach-O we don't
use ld -interpose or -flat_namespace)
Compiling a program which takes the address of any LLVM function with
{gcc,clang} -fno-pic and expects the address to equal to the address taken
from libLLVM.so or libclang-cpp.so is unsupported. I am fairly confident that
llvm-project shouldn't have different behaviors depending on such pointer
equality (as we've been using -fvisibility-inlines-hidden which applies to
inline functions for a long time), but if we accidentally do, users should be
aware that they should not make assumption on pointer equality in -fno-pic
mode.
See more on https://maskray.me/blog/2021-05-09-fno-semantic-interposition
I'd really like to avoid modifying CMAKE_SHARED_LINKER_FLAGS, modifying these global variables is an antipattern in modern CMake. We should always set these directly on the targets that need them. What's the problem with the approach you used in the first patch?