This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Support --{,no-}allow-shlib-undefined
ClosedPublic

Authored by MaskRay on Jan 29 2019, 7:24 AM.

Details

Summary

In ld.bfd/gold, --no-allow-shlib-undefined is the default when linking
an executable. This patch implements a check to error on undefined
symbols in a shared object, if all of its DT_NEEDED entries are seen.

Our approach resembles the one used in gold, achieves a good balance to
be useful but not too smart (ld.bfd traces all DSOs and emulates the
behavior of a dynamic linker to catch more cases).

The error is issued based on the symbol table, different from undefined
reference errors issued for relocations. It is most effective when there
are DSOs that were not linked with -z defs (e.g. when static sanitizers
runtime is used).

gold has a comment that some system libraries on GNU/Linux may have
spurious undefined references and thus system libraries should be
excluded (https://sourceware.org/bugzilla/show_bug.cgi?id=6811). The
story may have changed now but we make --allow-shlib-undefined the
default for now. Its interaction with -shared can be discussed in the
future.

Diff Detail

Repository
rL LLVM

Event Timeline

MaskRay created this revision.Jan 29 2019, 7:24 AM
ruiu added a comment.Jan 29 2019, 3:25 PM

ld.bfd traces all DSOs using the same logic as the runtime to try to resolve all undefined symbols in all DSOs. The linker even reads from /etc/ld.so.conf to see how DSOs would be resolved at runtime. That's I believe the original concept of --no-allow-shlib-undefines, as you can find any DSO symbols that can fail to be resolved at runtime. But that's apparently too much for us.

ld.gold instead does what you implemented in this patch. For a DSO X, if we know all DSOs that X depends on, then we emit a warning if there is an undefined symbol in X that cannot be resolved using any DSO. That perhaps achieves the right balance between being useful and being not too smart.

So I think this patch essentially looks good. The only concern is that the patch doesn't contain any comment about the design decision as described above.

ELF/InputFiles.h
335 ↗(On Diff #184086)

We already have IsNeeded bit, so Needed would be confusing. DtNeeded is better.

ELF/Writer.cpp
1689 ↗(On Diff #184086)

This code is added without what this is supposed to do, so it would be hard to understand in the future.

Is this the best place to add this error check? Can you run another for loop to separate this new logic from another?

1694 ↗(On Diff #184086)

The way you check whether we've seen all needed libraries seems a bit too complicated. I think that can be implemented much simpler code. At least you shouldn't mix two independent checks, undefined symbol check and needed lib check in a single loop. The latter can be done in a separate loop.

MaskRay updated this revision to Diff 184231.Jan 29 2019, 6:31 PM
MaskRay edited the summary of this revision. (Show Details)

Add design decision (we take an approach resembling gold)

Delete an unnecessary check in the test in allow_shlib_undefined.s and a check line for the error message

MaskRay updated this revision to Diff 184233.Jan 29 2019, 6:40 PM
MaskRay marked an inline comment as done.

Split the loop to two: 1) check if all DT_NEEDED are seen 2) --no-allow-shlib-undefined check

MaskRay marked 3 inline comments as done.Jan 29 2019, 6:43 PM
MaskRay added inline comments.
ELF/Writer.cpp
1689 ↗(On Diff #184086)

After splitting the "whether all DT_NEEDED entries are seen for a SharedFile" logic, the actual error checking code in the loop body is simple enough. I think it does not hurt to be placed beside addSymbol, but I'd like to put it into another loop if you prefer.

MaskRay updated this revision to Diff 184234.Jan 29 2019, 6:50 PM
MaskRay marked an inline comment as done.

Put !AllowShlibUndefined code together.

This does what the comment says:

Is this the best place to add this error check? Can you run another for loop to separate this new logic from another?

MaskRay updated this revision to Diff 184237.Jan 29 2019, 6:58 PM

Change the order of AND clauses to be more efficient

It looks good to me. Few comments about the code/style are below.

ELF/Writer.cpp
1692 ↗(On Diff #184237)

nit: gold->ld.gold for consistency?

1705 ↗(On Diff #184237)

It would not change anything, but probably be slightly more consistent to use toString(File) here:
error(toString(Sym->File) + ": undefined reference to " + toString(*Sym));

Actually, I would rewrite this block with the use of early returns, it feels a bit better readable to me, but that is up to you:

for (Symbol *Sym : Symtab->getSymbols()) {
  if (!Sym->isUndefined() || Sym->isWeak())
    continue;
  if (!Sym->File || !isa<SharedFile<ELFT>>(Sym->File))
    continue;
  if (AllNeededIsKnown.count(Sym->File))
    error(toString(Sym->File) + ": undefined reference to " + toString(*Sym));
}
joerg added a subscriber: joerg.Jan 30 2019, 6:05 AM

Neither GNU ld nor gold implement really useful behavior here. Ideally, every library would be linked with -z defs, but that normally doesn't work because on many systems certain symbols are imported from the main program by default. A good example are the entry points of the dynamic linker. A bad example historically speaking is libwrap. From a user perspective it would be desirable to have a way to say "This symbol will be supplied by someone else" when linking a shared library or potentially even the main binary. That would allow whitelisting intentionally undefined symbols and allow avoiding any symbol resolution dance for shared libraries. A single pass over the symbol tables is still necessary for the sake of knowing the defined symbols and for potential copy relocations, but no further work should be needed.

ruiu added inline comments.Jan 30 2019, 9:05 AM
ELF/Writer.cpp
1704 ↗(On Diff #184237)

Isn't looking up a hash table for each symbol too slow? Maybe you should add a bit to SharedFile?

MaskRay updated this revision to Diff 184394.Jan 30 2019, 3:42 PM

Make AllNeededIsKnown a bool in SharedFile

MaskRay marked 4 inline comments as done.Jan 30 2019, 3:45 PM
MaskRay added inline comments.
ELF/Writer.cpp
1705 ↗(On Diff #184237)

Changed File->getName() to toString(F).

As of the early-return pattern, the if condition used here is simple.. So I'd prefer keeping the current form.

ruiu added inline comments.Jan 30 2019, 3:48 PM
ELF/Writer.cpp
1705 ↗(On Diff #184237)

I think the current style is fine. Probably the following style is more common in lld though:

for (...)
  if (...)
    if (auto *F = dyn_cast_or_null<SharedFile<ELFT>>(Sym->File))
      if (F->AllNeededIsKnown)
        error(...)
MaskRay updated this revision to Diff 184430.Jan 30 2019, 6:03 PM
MaskRay marked 2 inline comments as done.

Address the comment "I think the current style is fine. Probably the following style is more common in lld though:"

// (A) D57385 will not enable the check ("allow shlib undefined")
Config->AllowShlibUndefined = Args.hasFlag(OPT_allow_shlib_undefined, OPT_no_allow_shlib_undefined, true);

// (B) In a future change, check errors when linking executables
Config->AllowShlibUndefined = Args.hasFlag(OPT_allow_shlib_undefined, OPT_no_allow_shlib_undefined, Args.hasFlag(OPT_shared));

I've tested (B) on our (numerous) internal targets and only 0.025% targets fail to build after the (B) change. Most if not all broken targets share the same root cause which is not lld's fault. So I believe (B) is also safe but I'll postpone it to a future change.

ruiu accepted this revision.Jan 31 2019, 11:09 AM

LGTM

ELF/InputFiles.h
362 ↗(On Diff #184430)

Thank you for adding the comment.

This revision is now accepted and ready to land.Jan 31 2019, 11:09 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2019, 6:24 PM