This is an archive of the discontinued LLVM Phabricator instance.

[CMake][ELF] Link libLLVM.so and libclang-cpp.so with -Bsymbolic-functions
ClosedPublic

Authored by MaskRay on May 7 2021, 1:06 PM.

Details

Summary

llvm-dev message: https://lists.llvm.org/pipermail/llvm-dev/2021-May/150465.html

In an ELF shared object, a default visibility defined symbol is preemptible by
default. This creates some missed optimization opportunities.
-Bsymbolic-functions is more aggressive than our current -fvisibility-inlines-hidden
(present since 2012) as it applies to all function definitions. It can

  • avoid PLT for cross-TU function calls && reduce dynamic symbol lookup
  • reduce dynamic symbol lookup for taking function addresses and optimize out GOT/TOC on x86-64/ppc64

In a -DLLVM_TARGETS_TO_BUILD=X86 build, the number of JUMP_SLOT decreases from 12716 to 1628, and the number of GLOB_DAT decreases from 1918 to 1313
The built clang with -DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on is significantly faster.
See the Linux kernel build result https://bugs.archlinux.org/task/70697

Note: the performance of -fno-semantic-interposition -Bsymbolic-functions
libLLVM.so and libclang-cpp.so is close to a PIE binary linking against
libLLVM*.a and libclang*.a. -Bsymbolic-functions is the major contributor.
On x86-64 (with GOTPCRELX) and ppc64 ELFv2, the GOT/TOC relocations can be
optimized.

Some implication:

Interposing a subset of functions is no longer supported.
(This is fragile on ELF and unsupported on Mach-O at all. For Mach-O we don't
use ld -interpose or -flat_namespace)

Compiling a program which takes the address of any LLVM function with
{gcc,clang} -fno-pic and expects the address to equal to the address taken
from libLLVM.so or libclang-cpp.so is unsupported. I am fairly confident that
llvm-project shouldn't have different behaviors depending on such pointer
equality (as we've been using -fvisibility-inlines-hidden which applies to
inline functions for a long time), but if we accidentally do, users should be
aware that they should not make assumption on pointer equality in -fno-pic
mode.

See more on https://maskray.me/blog/2021-05-09-fno-semantic-interposition

Diff Detail

Event Timeline

MaskRay created this revision.May 7 2021, 1:06 PM
MaskRay requested review of this revision.May 7 2021, 1:06 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 7 2021, 1:06 PM
MaskRay edited the summary of this revision. (Show Details)May 7 2021, 1:12 PM
MaskRay edited the summary of this revision. (Show Details)May 7 2021, 1:18 PM
MaskRay edited the summary of this revision. (Show Details)May 7 2021, 1:30 PM
MaskRay edited the summary of this revision. (Show Details)
phosek accepted this revision.May 7 2021, 1:39 PM

LGTM

This revision is now accepted and ready to land.May 7 2021, 1:39 PM

Does this change mean that LD_PRELOAD will no longer work? Are there any other downsides to adding these flags?

MaskRay updated this revision to Diff 343755.May 7 2021, 1:52 PM

Set global CMAKE_SHARED_LINKER_FLAGS instead.

This additional makes -DBUILD_SHARED_LIBS=on builds (dev only; not recommended for packagers) faster.

phosek added inline comments.May 7 2021, 1:55 PM
llvm/cmake/modules/HandleLLVMOptions.cmake
243

I'd really like to avoid modifying CMAKE_SHARED_LINKER_FLAGS, modifying these global variables is an antipattern in modern CMake. We should always set these directly on the targets that need them. What's the problem with the approach you used in the first patch?

MaskRay added a comment.EditedMay 7 2021, 1:59 PM

Does this change mean that LD_PRELOAD will no longer work? Are there any other downsides to adding these flags?

The downside is that it will behave more like Windows dll and Mach-O dylib using two-level namespace lookup.
No runtime preemption.

There are several types of LD_PRELOAD usage.

First, use LD_PRELOAD=same_soname.so to replace a DT_NEEDED entry with the same SONAME. Both -fno-semantic-interposition and -Bsymbolic are compatible with such usage.

Second, use LD_PRELOAD=malloc.so to intercept some functions not defined in the application or any of its shared object dependencies. Both -fno-semantic-interposition and -Bsymbolic are compatible.

c
void *f() { return malloc(0xb612); }

Third, use LD_PRELOAD=different_soname.so to replace a function defined in a shared object dependency and the SONAME is different.
Such usage is incompatible with -Bsymbolic.
If the function is referenced in its definiting translation unit, the call sites are statically bound with -fno-semantic-interposition;
otherwise the usage is still compatible.

MaskRay added inline comments.May 7 2021, 2:00 PM
llvm/cmake/modules/HandleLLVMOptions.cmake
243

With this, we can make libLLVM*.so libclang*.so (-DBUILD_SHARED_LIBS=on) and other lib*.so faster as well.

phosek requested changes to this revision.May 7 2021, 2:08 PM
phosek added inline comments.
llvm/cmake/modules/HandleLLVMOptions.cmake
243

I'd prefer to setting that flag inside add_llvm_library and add_clang_library when SHARED is set. HandleLLVMOptions is also used from runtimes where setting this flag is really undesirable (that's why using global variables is really problematic).

This revision now requires changes to proceed.May 7 2021, 2:08 PM
MaskRay updated this revision to Diff 343758.May 7 2021, 2:08 PM

Don't affect -DBUILD_SHARED_LIBS=on

MaskRay marked an inline comment as done.May 7 2021, 2:10 PM
MaskRay added inline comments.
llvm/cmake/modules/HandleLLVMOptions.cmake
243

add_llvm_library and add_clang_library may be used by downstream projects which don't want -Bsymbolic-functions.

Switched back. I'll just not touch -DBUILD_SHARED_LIBS=on for now.

phosek accepted this revision.May 7 2021, 2:42 PM
This revision is now accepted and ready to land.May 7 2021, 2:42 PM
bcain added a subscriber: bcain.May 10 2021, 11:32 AM
rnk added a subscriber: rnk.May 10 2021, 6:12 PM

+1 to the idea, but I have no idea if this is the right cmake spot

If we've decided to actually care about the shared library build, should we also consider using -fvisibility-inlines-hidden?

MaskRay marked an inline comment as done.May 11 2021, 1:01 AM

+1 to the idea, but I have no idea if this is the right cmake spot

If we've decided to actually care about the shared library build, should we also consider using -fvisibility-inlines-hidden?

Line 329: check_cxx_compiler_flag("-fvisibility-inlines-hidden" SUPPORTS_FVISIBILITY_INLINES_HIDDEN_FLAG).
llvm-project has been using this since 2012. With this flag, taking the address of an inline function may be different in different modules.
So -fno-semantic-interposition and -Bsymbolic-functions will now complement the existing option:)

I'll push this tomorrow.

MaskRay edited the summary of this revision. (Show Details)May 11 2021, 5:37 PM
MaskRay edited the summary of this revision. (Show Details)

I was originally skeptical of this change, because of some negative feedback about these flags in the past on the Fedora development list. However, the people I've talked to more recently seem to be in favor of this change, so no objection from me.

I tried to build the libc++ benchmarks locally and the benchmark with target to_chars_libcxx fails to properly execute. Bi-section let to this commit. Can the change in the linker flags affect libc++? I also see the scheduled builds of libc++ fail at Buildkite https://buildkite.com/llvm-project/libcxx-ci.
Can you have a look at the issue?

There is also a test failure on the aarch64 2-stage bot (https://lab.llvm.org/buildbot/#/builders/7/builds/2720) which I've bisected to this change, so I'll revert it for now.

MaskRay added a subscriber: marco-c.EditedMay 13 2021, 10:15 AM

I tried to build the libc++ benchmarks locally and the benchmark with target to_chars_libcxx fails to properly execute. Bi-section let to this commit. Can the change in the linker flags affect libc++? I also see the scheduled builds of libc++ fail at Buildkite https://buildkite.com/llvm-project/libcxx-ci.
Can you have a look at the issue?

libcxx/src/locale.cpp calls install(&make<_VSTD::ctype<char> >(nullptr, false, 1u)); to initialize std::__1::ctype<char>::id.
In libcxx/include/__locale (which may be transitively included by the executable), the std::__1::ctype<char>::id reference is expected to be bound to std::__1::ctype<char>::id defined in libc++.so.1.
If std::__1::ctype<char>::id is copy relocated by the executable (gcc/clang -fno-pic), the std::__1::ctype<char>::id reference from the executable will be bound to the uninitialized copy in the executable, causing a no-facet failure.

cpp
// _Facet is std::__1::ctype<char>
template <class _Facet>
inline _LIBCPP_INLINE_VISIBILITY
const _Facet&
use_facet(const locale& __l)
{
    return static_cast<const _Facet&>(*__l.use_facet(_Facet::id));
}

locale.cpp thus cannot be compiled with -fno-semantic-interposition

There is also a test failure on the aarch64 2-stage bot (https://lab.llvm.org/buildbot/#/builders/7/builds/2720) which I've bisected to this change, so I'll revert it for now.

The way @marco-c made compiler-rt/test/profile/Posix/gcov-shared-flush.c work on Linux in rL336678 is to use interposition of the global variables writeout_fn_list/etc.
This doesn't work on Darwin, thus https://bugs.llvm.org/show_bug.cgi?id=38134
-fno-semantic-interposition disables variable interposition on Linux as well, therefore a similar failure.

I think -fno-semantic-interposition should be specific to llvm/clang for now. We should ensure they work even under copy relocations.

MaskRay reopened this revision.EditedMay 13 2021, 1:23 PM

I'll go simple - just drop -fno-semantic-interposition from this patch, as I cannot figure out how to make -fno-semantic-interposition specific to llvm/ and clang/.
-Bsymbolic-functions claims most of the optimizations anyway (and more so on x86-64 with GOTPCRELX) for Clang built clang.

Unfortunately gcc -fpic -fsemantic-interposition disables IPO so we should figure out how to reinstate -fno-semantic-interposition to make distribution clang faster.

This revision is now accepted and ready to land.May 13 2021, 1:23 PM
MaskRay updated this revision to Diff 345264.May 13 2021, 1:26 PM
MaskRay retitled this revision from [CMake][ELF] Add -fno-semantic-interposition and -Bsymbolic-functions to [CMake][ELF] Link libLLVM.so and libclang-cpp.so with -Bsymbolic-functions.
MaskRay edited the summary of this revision. (Show Details)

Drop -fno-semantic-interposition

MaskRay updated this revision to Diff 345268.May 13 2021, 1:36 PM
MaskRay edited the summary of this revision. (Show Details)

Fix comment

This revision was landed with ongoing or failed builds.May 13 2021, 1:45 PM
This revision was automatically updated to reflect the committed changes.

I cannot figure out how to make -fno-semantic-interposition specific to llvm/ and clang/.

Maybe @nikic knows? I think he is interested in this work, since the base (reverted) commit introduced 4% compile time improvement - really nice.

I cannot figure out how to make -fno-semantic-interposition specific to llvm/ and clang/.

Maybe @nikic knows? I think he is interested in this work, since the base (reverted) commit introduced 4% compile time improvement - really nice.

Thanks for showing interesting:) (At least let me know I am not wasting my energy.)
Posted D102453 for the GCC optimization.

Personally I always build llvm-project with a stable Clang so D102453 doesn't really make my life better.
D102453 is for those distributions which build llvm-project with GCC...

When building Linux kernel, I don't see additional improvement by adding -fno-semantic-interposition,
but it does make libLLVM.so 2% smaller (notably, GCC -fPIC -fno-semantic-interposition can optimize some
functions which call other non-inline functions so in various places the function body can be smaller.
Some vtable can be optimized out as well.)

While -Bsymbolic-funtions brings nice performance improvements it also changes symbol resolution order. That means we effectively disabled preemption for functions and all references from inside libLLVM*.so will be resolved locally. But references to global data can still be interposed by external definitions. Do I understand correctly that main argument against using -Bsymbolic is potential issue with equality comparison of address of global? Or anything else?

rnk added a comment.Dec 3 2021, 7:16 AM

While -Bsymbolic-funtions brings nice performance improvements it also changes symbol resolution order. That means we effectively disabled preemption for functions and all references from inside libLLVM*.so will be resolved locally. But references to global data can still be interposed by external definitions. Do I understand correctly that main argument against using -Bsymbolic is potential issue with equality comparison of address of global? Or anything else?

I think it has less to do with the uniqueness of the addresses and more to do with the deduplication of the global storage. If you have an inline C++ global variable present in LLVM's headers (think a static data member of a class template instantiation), things go wrong quickly if there are two copies of this global, one in LLVM, and the other in the user's binary. Updates from one DSO will not be visible in the other. If you arrange the same situation with functions, they are usually functionally equivalent even if there are minor code differences.

Generally, users are not trying to preempt LLVM's own function definitions. The typical use cases are to override C library functionality such as malloc. The performance benefit of -Bsymbolic-functions is worth making LLVM's own functions non-interposable.

To add on rnk's comment (deduplication of vague linkage data which may be included in multiple shared objects), another big reason is -Bsymbolic data does not work with copy relocations. -fno-pic programs generally cannot avoid copy relocations (except mips; mips did this thing correctly). GCC 5 x86-64 has a regression that -fpie code can have copy relocations, too.

While -Bsymbolic-funtions brings nice performance improvements it also changes symbol resolution order. That means we effectively disabled preemption for functions and all references from inside libLLVM*.so will be resolved locally. But references to global data can still be interposed by external definitions. Do I understand correctly that main argument against using -Bsymbolic is potential issue with equality comparison of address of global? Or anything else?

I think it has less to do with the uniqueness of the addresses and more to do with the deduplication of the global storage. If you have an inline C++ global variable present in LLVM's headers (think a static data member of a class template instantiation), things go wrong quickly if there are two copies of this global, one in LLVM, and the other in the user's binary. Updates from one DSO will not be visible in the other.

I don't see what's wrong here (maybe you have specific example). IMHO, quite opposite, LLVM will consistently use it's own instance while user's binary it's own as if there were two globals with different names in the first place. I have exactly that real life case where LLVM's function (coming from libLLVM*.so since functions are not interposed) refers to a global from the user's binary with the same name by completely different semantics.

If you arrange the same situation with functions, they are usually functionally equivalent even if there are minor code differences.

I don't think it is safe to assume that. Maybe in most cases, but not always.

Generally, users are not trying to preempt LLVM's own function definitions. The typical use cases are to override C library functionality such as malloc. The performance benefit of -Bsymbolic-functions is worth making LLVM's own functions non-interposable.

I don't think this is really relevant to the subject since performance is outside of the scope of my question. The question is if -Bsymbolic can protect us from unintentional preemption by user defined globals.

To add on rnk's comment (deduplication of vague linkage data which may be included in multiple shared objects), another big reason is -Bsymbolic data does not work with copy relocations. -fno-pic programs generally cannot avoid copy relocations (except mips; mips did this thing correctly).

Do you (by "doesn't work") mean if executable refers to a global defined in LLVM.*so then each of them will use their own copy?

GCC 5 x86-64 has a regression that -fpie code can have copy relocations, too.

Looks like the bug itself is still opened. I guess you meant to refer to the the following change: After "x86-64: Optimize access to globals in PIE with copy reloc", GCC x86-64 asks the assembler to produce an R_X86_64_PC32 for an external data access." Looks like intended change ... why do you call it a regression?

rnk added a comment.Dec 6 2021, 10:58 AM

I don't see what's wrong here (maybe you have specific example). IMHO, quite opposite, LLVM will consistently use it's own instance while user's binary it's own as if there were two globals with different names in the first place. I have exactly that real life case where LLVM's function (coming from libLLVM*.so since functions are not interposed) refers to a global from the user's binary with the same name by completely different semantics.

In my experience, in C++, the problem you describe is uncommon because globals are typically namespaced, so there are rarely clashes between user code and LLVM. The problem you describe arises more often when two different versions of LLVM are linked into different DSOs. Yes, one could probably attempt to use -Bsymbolic to defend against these symbol conflicts, but there may still be conflicts over other process global resources such as signal handlers.

If you arrange the same situation with functions, they are usually functionally equivalent even if there are minor code differences.

I don't think it is safe to assume that. Maybe in most cases, but not always.

If you have two function definitions with the same name that semantically differ, that's a C++ ODR violation, so it's usually a pretty safe assumption.

Generally, users are not trying to preempt LLVM's own function definitions. The typical use cases are to override C library functionality such as malloc. The performance benefit of -Bsymbolic-functions is worth making LLVM's own functions non-interposable.

I don't think this is really relevant to the subject since performance is outside of the scope of my question. The question is if -Bsymbolic can protect us from unintentional preemption by user defined globals.

I think you could certainly try, but it risks creating the first issue that I described, duplicate copies of vague linkage globals. A concrete example of one of these issues is the llvm::Any template:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/Any.h#L28

If the user instantiates llvm::Any outside of the main LLVM DSO, there will be a second copy of this typeid global, and Bsymbolic will stop the loader from coalescing them. Depending on how you use LLVM's C++ APIs, your risk of these issues may vary.

In any case, -Bsymbolic is not a good default for the project because of issues like this.

! In D102090#3174187, @rnk wrote:
In any case, -Bsymbolic is not a good default for the project because of issues like this.

I see. Thank you.