This is an archive of the discontinued LLVM Phabricator instance.

[lld][ELF] Add --no-search-static-libs-for-shlib-undefined flag
AbandonedPublic

Authored by pirama on Aug 12 2021, 6:01 PM.

Details

Summary

This change addresses PR43554.

Android platform builds have two properties:

  • Always build executables and libraries with -Wl,--no-undefined
  • Several util libraries linked as statically or as a shared object

across executables and libraries.

This flag will make ld.lld not open static libraries to provide
undefined symbols from a shared library. In Android, those symbols are
guaranteed to be available at runtime (because of -Wl,--no-undefined).

Diff Detail

Event Timeline

pirama created this revision.Aug 12 2021, 6:01 PM
pirama requested review of this revision.Aug 12 2021, 6:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 12 2021, 6:01 PM

I still need to add tests but wanted to upload it to get early thoughts on this approach.

MaskRay added a comment.EditedAug 12 2021, 10:08 PM

It's important to have symmetry. a) ld.lld def.a ref.so b) ld.lld ref.so def.a.
The patch addresses a) but not b).

(
I have noticed missing test coverage. The following diff doesn't cause a test failure.

--- i/lld/ELF/Symbols.cpp
+++ w/lld/ELF/Symbols.cpp
@@ -744,3 +744,4 @@ template <class LazyT> void Symbol::resolveLazy(const LazyT &other) {
 
-  other.fetch();
+  if (!isShared())
+    other.fetch();
 }

)


I am trying to understand the rationale behind this symbol resolution rule.
On one side, it resembles Windows link.exe and Mach-O ld64: AFAIK they don't use undefined symbols in DLL/dylib for resolution.
(

# On macOS
echo '.globl _main; _main: call bb' > a.s
echo '.globl bb; bb: call cc' > b.s
echo '.globl cc; cc: nop' > c.s
clang --target=x86_64-apple-darwin -c a.s b.s c.s
rm -f c.a && ar rc c.a c.o
 
clang -dynamiclib c.o -o c.dylib
clang -dynamiclib b.o c.dylib -o b.dylib
clang a.o b.dylib c.a
# a.out doesn't have cc

)

On the other side, I strive to think of some scenarios where this rule is both sensible and mandatory, but cannot think of one.
The closest thing I can think of:

echo '.globl _start; _start: call bb' > a.s
echo '.globl bb; bb: call cc' > b.s
echo '.globl cc; cc: nop' > c.s
cc -c a.s b.s c.s
rm -f c.a && ar rc c.a c.o
ld -shared -z defs c.o -o c.so
ld -shared -z defs b.o ./c.so -o b.so

ld -rpath=. a.o b.so c.a
# a.out defines cc. ld.lld, ld.bfd, and gold have the same behavior.

c.so and c.a are built from the same set of object files.
b.so is linked with -Bsymbolic and depends on c.so

For ld ... b.so c.a, if ld extracts c.a to satisfy b.so, the c.a definition may conflict (ODR violation) with c.so.
Keeping c.a unextracted is probably the intention.

I'd say the bug is that c.so and c.a cannot be used together.
The user should not do both things: (1) let b.so link against c.so (2) let the executable link against c.a .
Using --no-undefined (-z defs) for shared objects is not an exuse.

If the executable really needs to use c.a, the shared objects cannot use c.so.
Currently it is difficult to make --no-undefined happy because we don't support an option listing all allow undefined symbols.
(If we want to add such an option, the option can be modeled after --just-symbols.)

It's important to have symmetry. a) ld.lld def.a ref.so b) ld.lld ref.so def.a.
The patch addresses a) but not b).

I'll update to handle the case.

(
I have noticed missing test coverage. The following diff doesn't cause a test failure.

--- i/lld/ELF/Symbols.cpp
+++ w/lld/ELF/Symbols.cpp
@@ -744,3 +744,4 @@ template <class LazyT> void Symbol::resolveLazy(const LazyT &other) {
 
-  other.fetch();
+  if (!isShared())
+    other.fetch();
 }

Test coverage is also missing for the lines I changed in resolveUndefined. Will improve test coverage here as well.


I am trying to understand the rationale behind this symbol resolution rule.
On one side, it resembles Windows link.exe and Mach-O ld64: AFAIK they don't use undefined symbols in DLL/dylib for resolution.
(

# On macOS
echo '.globl _main; _main: call bb' > a.s
echo '.globl bb; bb: call cc' > b.s
echo '.globl cc; cc: nop' > c.s
clang --target=x86_64-apple-darwin -c a.s b.s c.s
rm -f c.a && ar rc c.a c.o
 
clang -dynamiclib c.o -o c.dylib
clang -dynamiclib b.o c.dylib -o b.dylib
clang a.o b.dylib c.a
# a.out doesn't have cc

)

On the other side, I strive to think of some scenarios where this rule is both sensible and mandatory, but cannot think of one.
The closest thing I can think of:

echo '.globl _start; _start: call bb' > a.s
echo '.globl bb; bb: call cc' > b.s
echo '.globl cc; cc: nop' > c.s
cc -c a.s b.s c.s
rm -f c.a && ar rc c.a c.o
ld -shared -z defs c.o -o c.so
ld -shared -z defs b.o ./c.so -o b.so

ld -rpath=. a.o b.so c.a
# a.out defines cc. ld.lld, ld.bfd, and gold have the same behavior.

c.so and c.a are built from the same set of object files.
b.so is linked with -Bsymbolic and depends on c.so

For ld ... b.so c.a, if ld extracts c.a to satisfy b.so, the c.a definition may conflict (ODR violation) with c.so.
Keeping c.a unextracted is probably the intention.

I'd say the bug is that c.so and c.a cannot be used together.
The user should not do both things: (1) let b.so link against c.so (2) let the executable link against c.a .
Using --no-undefined (-z defs) for shared objects is not an exuse.

If the executable really needs to use c.a, the shared objects cannot use c.so.

If c.so is a utility library, then one client's (a.out's) usage of c dictates how another library (b.so) _and_ the users of that library are built: any user of b.so would now want to link with c.so/c.a as well.

In Android build, ODR violations are limited by disallowing static linking of libraries with global state. The breaking cases for this usage are with utility functions that are pooled together into a single library for historic reasons.

Currently it is difficult to make --no-undefined happy because we don't support an option listing all allow undefined symbols.
(If we want to add such an option, the option can be modeled after --just-symbols.)

MaskRay added a comment.EditedAug 13 2021, 11:19 AM

Ideally test coverage improvement should be a separate patch.

I find a patch in https://android.googlesource.com/toolchain/llvm_android/+/refs/heads/master/patches/Revert-two-changes-that-break-Android-builds.v6.patch which may be related to the intention of this patch.
Can you verify whether the proposed --no-search-static-libs-for-shlib-undefined can replace it?
The patch touches shouldDefineSym, which means there is probably another missing test coverage: a symbol assignment which is referenced by a shared object and defined by an unextracted archive.

Ideally test coverage improvement should be a separate patch.

Ack.

I find a patch in https://android.googlesource.com/toolchain/llvm_android/+/refs/heads/master/patches/Revert-two-changes-that-break-Android-builds.v6.patch which may be related to the intention of this patch.
Can you verify whether the proposed --no-search-static-libs-for-shlib-undefined can replace it?
The patch touches shouldDefineSym, which means there is probably another missing test coverage: a symbol assignment which is referenced by a shared object and defined by an unextracted archive.

This goal of this patch is indeed to replace/remove the "Revert-two-changes" downstream patch. I verified that [this patch + remove downstream patch] successfully builds the cuttlefish Android target. I'll also build a few more targets to validate.

MaskRay added a comment.EditedAug 19 2021, 2:17 PM

I want to hear what Peter thinks of this.

https://sourceware.org/pipermail/binutils/2021-August/117696.html has an argument "would expect libfoo.so undefined symbols to similarly cause elements to be extracted from libbar.a. Anything else is surprising."
I can refute that with https://maskray.me/blog/2021-08-15-extract-archive-member-to-satisfy-dso-undef#did-dynamic-linking-should-be-similar-to-static-linking-affect-the-design

However, https://groups.google.com/g/generic-abi/c/NyH6f470Cuc has an argument (somewhat convincing to me) that "Indirect dependencies (dependencies of dependencies) don't count" which makes sense to me,
so I am still struggling whether this really belongs to the upstream...

For the Android problem, it looks like a workaround as well. Do you know why gold linked executables were fine when Android was still using gold?
That "working fine" may be a gold bug.

Linking libc.so into libb.so, then linking libc.a into the executable looks like an erroneous usage to me, so I am unsure an option should be added to work around it.

I'm probably not going to be of much help here. I think this falls into implementation defined behaviour. My understanding is that the intention from the ELF dynamic library designers was to mimic the semantics of static libraries, we can all argue to what degree they succeeded by that, but anyway; if we follow that through we'd expect that a programmer ought to be able to substitute a static library for a dynamic library and not to have to make any modifications. Under that argument then undefined symbols from shared libraries should extract symbols from static libraries.

A possible counter argument could run from DT_NEEDED. Consider the following example:

  • dso1.so has a reference to symbol foo
  • dso1.so has a DT_NEEDED tag for dso2.so
  • dso2.so defines symbol foo
  • static.a contains an object that defines foo
  • If we static link in the DT_NEEDED order dso1.so dso2.so then we'd expect dso2.so to define foo and no object will be extracted from static.a
  • If we don't follow the DT_NEEDED order dso1.so static.a dso2.so or if we elide dso2.so dso1.so static.a then the linker will extract foo from the object in static.a

In a static linker with no back references then this would be solvable by the programmer putting dso2.so after dso1.so. With back references it gets to be a bit more complicated. For example consider static.a dso1.so dso2.so it may be more difficult to reason about the order. I'm not sure I can come up with an example where it is not possible to put the libraries on the command line on the right order to get the right answer. However I'm not sure that this is a good argument for not extracting symbols from static libraries.

To me this is similar to the argument around back-references. If someone has a program that depends on no back-references, then they are going to be upset when a different linker says my implementation choice is to support back-references. I think we are looking at a request to support Gold's implementation choice (assuming Gold does not search static libraries for dynamic dependencies). We've got two possible responses:

  • Yes, the change is small and is not likely to be a maintenance burden. It helps people migrate from Gold. While changing link order should be easy, working with build systems is hard.
  • No, this affects a small number of programs, we don't want the maintenance burden, they should alter their program structure or maintain a downstream patch.

FWIW in Arm's old proprietary linker we had to support both as we were trying to support Symbian and ELF (Linux) dynamic linking, and Symbian required no searching and some ELF programs required searching.

If you're leaning towards no, is there anything that we can do to help Android find the programs that may need altering. If we find a case where no amount of link order tweaking will work then it strengthens the case supporting the Gold behaviour, if we can fix all cases by reordering the link line then we may be able to get away without it.

To clarify, gold is not the odd one. GNU ld, gold, and ld.lld have the similar behavior. The original description Revert-two-changes-that-break-Android-builds.v6.patch probably made an error connecting the previous LLD behavior with the gold behavior.

Say, b.so depends on a.so. The linker command line is ld main.o a.a b.so and a.a defines an undef in b.so.

a.a may be dropped by GNU ld/gold. ld.lld may extract some a.a members but this would be flagged by --warn-backrefs.

From binutils, generic-abi, and your responses, I feel that there is not sufficient argument for having --no-search-static-libs-for-shlib-undefined.
It won't be used by others and the option could just cause confusion.

For the Android problem, it looks like a workaround as well. Do you know why gold linked executables were fine when Android was still using gold?

This worked with gold because static libraries were always ahead of shared libraries in the linker input list.

To preface, I _am_ trying to replace static libs with shared libraries for the failing cases. In the past, we only considered adding the closure for the static library to the failing cases which quickly got out of hand. Replacing with shared libraries doesn't cause any bloat but there may be other constraints with vendor libraries and tests which force them to use the static library variant instead of the shared variant.

However, I don't agree that the static linker should consider/handle static libraries and shared libraries equivalently. Like discussed earlier, DT_NEEDED makes defining undefined symbols in shared libraries a "soft" requirement. OTOH, if all shared libraries are built with --no-undefined, the linker is not required at all to include any definition of undef symbols from shared libraries to produce a working program. Each shared library defines its own closure via DT_NEEDED.

With a single-pass linker, not defining undefs from shared libraries can be achieved with linker input ordering. With lld, IIUC, we should do the following to achieve this:

`ld.lld --start-lib <objects> <static-libs> --end-lib <shared-libs> --warn-backrefs`

If this is correct, does it make sense to add a flag to opt into this behavior? In particular, the flag name --warn-backrefs doesn't capture the fact that lld is not trying to resolve some references at all in this configuration. Such a flag would be more general than the one proposed here and is probably less confusing (and have more users than just Android).

pcc added a subscriber: pcc.Aug 27 2021, 10:30 AM

Could we perhaps put something in the binary if it was linked with --no-undefined (e.g. a dynamic table entry)? Then if we're linking against such a binary, ignore any undefined symbols.

pzheng added a subscriber: pzheng.Oct 22 2021, 4:08 PM
MaskRay requested changes to this revision.Oct 27 2021, 4:05 PM

(I assume that Android will fix the issues so I can request changes to remove this from my very long review queue :)

This revision now requires changes to proceed.Oct 27 2021, 4:05 PM

(I assume that Android will fix the issues so I can request changes to remove this from my very long review queue :)

Yes, that's the plan. I made partial progress towards that but still have many build rules to fix.

pirama abandoned this revision.Jun 30 2022, 12:26 PM

To close the loop here, we were able to fix the build rules in Android without needing this flag.

Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2022, 12:26 PM
Herald added a subscriber: StephenFan. · View Herald Transcript