This is an archive of the discontinued LLVM Phabricator instance.

Add --check-library-dependency to maintain compatibility with other linkers
ClosedPublic

Authored by ruiu on Apr 2 2018, 7:00 PM.

Details

Summary

I'm proposing a new command line flag, --check-library-dependency, in
this patch. The flag and the feature proposed below doesn't exist in
GNU linkers nor the current lld.

--check-library-dependency is an option to detect reverse or cyclic
dependencies between static archives, and it can be used to keep
your program compatible with GNU linkers after you switch to lld.
I'll explain the feature and why you may find it useful below.

lld's symbol resolution semantics is more relaxed than traditional
Unix linkers. Therefore,

ld.lld foo.a bar.o

succeeds even if bar.o contains an undefined symbol that have to be
resolved by some object file in foo.a. Traditional Unix linkers
don't allow this kind of backward reference, as they visit each
file only once from left to right in the command line while
resolving all undefined symbol at the moment of visiting.

In the above case, since there's no undefined symbol when a linker
visits foo.a, no files are pulled out from foo.a, and because the
linker forgets about foo.a after visiting, it can't resolve
undefined symbols that could have been resolved otherwise.

That lld accepts more relaxed form means (besides it makes more
sense) that you can accidentally write a command line or a build
file that works only with lld, even if you have a plan to
distribute it to wider users who may be using GNU linkers. With
--check-library-dependency, you can detect a library order that
doesn't work with other Unix linkers.

The option is also useful to detect cyclic dependencies between
static archives. Again, lld accepts

ld.lld foo.a bar.a

even if foo.a and bar.a depend on each other. With
--check-library-dependency, it is handled as an error.

Here is how the option works. We assign a group ID to each file. A
file with a smaller group ID can pull out object files from an
archive file with an equal or greater group ID. Otherwise, it is a
reverse dependency and an error.

A file outside --{start,end}-group gets a fresh ID when
instantiated. All files within the same --{start,end}-group get the
same group ID. E.g.

ld.lld A B --start-group C D --end-group E

A and B form group 0, C, D and their member object files form group
1, and E forms group 2. I think that you can see how this group
assignment rule simulates the traditional linker's semantics.

Diff Detail

Repository
rL LLVM

Event Timeline

ruiu created this revision.Apr 2 2018, 7:00 PM
ruiu updated this revision to Diff 140729.Apr 2 2018, 8:01 PM
  • rebase
ruiu updated this revision to Diff 140730.Apr 2 2018, 8:02 PM
  • revert unrelated change
smeenai added a subscriber: smeenai.Apr 2 2018, 9:22 PM
grimar added inline comments.Apr 3 2018, 4:50 AM
lld/ELF/Driver.cpp
934 ↗(On Diff #140730)

I think GroupDepth should not be a member of InputFile.
It could be a local variable it seems:

  int32_t GroupDepth = 0;
  for (auto *Arg : Args) {
    switch (Arg->getOption().getUnaliasedOption().getID()) {
...
    case OPT_start_group:
      ++GroupDepth;
      break;
    case OPT_end_group:
      if (GroupDepth > 0)
        --GroupDepth;
      else
        error("stray --end-group");
      break;
...
    }

    // All files within the same --start-group and --end-group get the
    // same group ID. Otherwise, a new file will get a new group ID.
    if (!GroupDepth)
      ++InputFile::CurrentGroupId;
  }
lld/ELF/SymbolTable.cpp
295 ↗(On Diff #140730)

I think our comments are usually impersonal.

339 ↗(On Diff #140730)

I would end the sentence after "group 0."
Otherwise "0, C, D" looks like a single list.

341 ↗(On Diff #140730)

"I think that you can see how this group assignment rule simulates the traditional linker's semantics."

Is also personal and does not make sense in the comment. It brings no new information about the feature.
I would remove this sentence completely.

344 ↗(On Diff #140730)

Why is it "bad"? In the policy of LLD it is valid and hence neutral. It just that old linkers
does not support such references, though that do not make them "bad". I would drop this word.

344 ↗(On Diff #140730)

Also, should it be a warning? It is a bit more flexible probably.

613 ↗(On Diff #140730)

checkDependency duplicates in few places, I would suggest
to land D45083 first and I think you should be able to use SymbolTable::fetchIfLazy here and
move checkDependency there to avoid duplications.

ruiu added a comment.Apr 3 2018, 10:21 AM

I'd like to not reply on each comment as I want reviewers focus on design and feature themselves first.

Guys, do you think the new command line option name make sense? Do you think it's actually useful?

dblaikie added a subscriber: ruiu.Apr 3 2018, 10:39 AM

Makes sense to me - I haven't applied/tested the patch against some recent
commits to LLVM I've made that broke gold/binutils ld based builders but
didn't break for me locally (using lld), but assuming it catches those -
yeah, it'd be useful to me & I'd totally turn it on locally & suggest
anyone else using lld with LLVM to do so as well.

ruiu updated this revision to Diff 140881.Apr 3 2018, 4:10 PM
  • rebased
  • recognize GROUP linker script command

The option and general algorithm makes sense to me.

I am just not sure about the name since cyclic dependencies are still supported with --start-group/--end-group for example.

BTW, I think this is a conservative heuristic since it would warn on "foo.a --start-group bar.a foo.a --end-group" if bar.a fetches a member of foo.a. That is probably fine.

Taking a look at the implementation now.

espindola added inline comments.Apr 3 2018, 7:32 PM
lld/ELF/Options.td
354 ↗(On Diff #140881)

To fetch archive members. Otherwise it sounds like bar.o cannot refer to foo.o in "foo.o bar.o".

lld/ELF/SymbolTable.cpp
292 ↗(On Diff #140881)

This seems to be over selling it. It will not detect cyclic reference if the members that cause it are not used.

I think it is clear to just say that the option prevents an undefined reference from fetching a member written earlier in the command line.

342 ↗(On Diff #140881)

I agree that "bad" is out of place. The user asked to find backwards references and we found one.

From the error it is also not clear that Name is a symbol name. How about something like

Undefined reference Name in Old refers to New.

378 ↗(On Diff #140881)

I think this is the only call to checkDependency that you need. We never find a backward reference when adding a lazy symbol.

ruiu updated this revision to Diff 141072.Apr 4 2018, 3:22 PM
  • address review comments
ruiu added inline comments.Apr 4 2018, 3:29 PM
lld/ELF/Driver.cpp
934 ↗(On Diff #140730)

We can't do that because we have to handle linker scripts as well.

lld/ELF/Options.td
354 ↗(On Diff #140881)

Done

lld/ELF/SymbolTable.cpp
292 ↗(On Diff #140881)

Done

342 ↗(On Diff #140881)

Changed the wording.

378 ↗(On Diff #140881)

Done

295 ↗(On Diff #140730)

I think "I" in this comment (in particular in this sentence) is fine.

ruiu added a comment.Apr 4 2018, 3:40 PM

Overall I'm satisfied with this patch. It's small and I _believe_ it works for the purpose to keep your build file compatible with GNU. That said, I want to be careful when I add a new command line flag (in particular something important like this), so I'll start a discussion on llvm-dev to get feedback from those who are using lld and interested in this feature.

Perhaps --check-library-dependency is too strong a name for the cases that this can catch. I think it is really a --find-library-backreferences that can be helpful as part of resolving differences in behaviour between lld and other unix linkers, but there are limitations.

Looking at this from a perspective of someone trying to migrate from GNU to LLD. Consider these cases for which --check-library-dependency will detect a back reference, but the outcomes are different in each case:
Library f1.a containing f1.o that defines f()
Library f2.a containing f2.o that defines f()
Library g.a containing g.o that defines g(), g.o references f()
Object m.o containing main() that references g()

The link lines below all result in the same backward reference warning:
m.o f1.a g.a : result in an undefined symbol error on a GNU linker, succeed with LLD with f1.o from f1.a.
m.o f1.a g.a f1.a : result in GNU Linker and LLD extracting the same library, f1.o from f1.a
m.o f1.a g.a f2.a f1.a : result in different outputs as the GNU Linker will select f2.o from f2.a to define f and LLD will detect f1.a

The next example does not give a backward reference warning but does give different results on GNU and LLD
m.o --start-group f1.a g.a f2.a f1.a --end-group : result in different outputs as the GNU Linker will select f2.o from f2.a to define f and LLD will detect f1.a

The last example could be problematic for users wanting to start a new project and guarantee that it will work with LLD and GNU assuming --check-library-dependency. I think that we may need a two level index, file level and group level, so that we can check for back-references with the group.

I am OK with this patch. My only concern is the name. I like the --find-library-backreferences suggestion.

ruiu added a comment.Apr 5 2018, 1:43 PM

--find-library-backreferences might be indeed better than --check-library-dependency, but both options are perhaps too long as a Unix linker option? How about --detect-backrefs? Maybe just --backref?

--find-library-backreferences might be indeed better than --check-library-dependency, but both options are perhaps too long as a Unix linker option? How about --detect-backrefs? Maybe just --backref?

+1 for --detect-backrefs.

--find-library-backreferences might be indeed better than --check-library-dependency, but both options are perhaps too long as a Unix linker option? How about --detect-backrefs? Maybe just --backref?

I think --backref is a bit too abstract, from looking at the option in a command line I would have no idea what it actually did without looking at the help. I think --detect-backrefs is better as it at least hints that it is of a diagnostic nature. I've got a mild preference for adding library or lib so we know what backrefs are referring to, but if everyone else is happy with --detect-backrefs then go with that. As an aside, Looking at the length of options in ld.bfd --help I think it has long given up on short option names, particularly for infrequently used ones.

grimar added a comment.Apr 6 2018, 2:21 AM

FWIW, --detect-backrefs sounds better for me than --backrefs.

Also can suggest -no-backrefs.

lld/ELF/Driver.cpp
629 ↗(On Diff #141072)

Does it make sense to enable it by default?

grimar added a comment.Apr 6 2018, 2:36 AM

Or maybe warn-backrefs (similarly to warn-common, warn-symbol-ordering and warn-unresolved-symbols).

Or maybe warn-backrefs (similarly to warn-common, warn-symbol-ordering and warn-unresolved-symbols).

True, that is a nice precedent.

I would leave the warning off at least initially.

MaskRay added a subscriber: MaskRay.Apr 9 2018, 2:44 PM
MaskRay added inline comments.
lld/ELF/Driver.cpp
934 ↗(On Diff #140730)

It seems gold does not support nested --start-group --end-group

% ld.gold -\( -\( a.o -\) -\)     
ld.gold: fatal error: May not nest groups
lld/ELF/SymbolTable.cpp
336 ↗(On Diff #141072)

Does A forms group 0 while B forms group 1?

ruiu added inline comments.Apr 9 2018, 2:48 PM
lld/ELF/Driver.cpp
934 ↗(On Diff #140730)

Oh, I didn't know that. That means we can replace GroupDepth with a boolean variable.

lld/ELF/SymbolTable.cpp
336 ↗(On Diff #141072)

You are right. I'll update this comment.

ruiu updated this revision to Diff 141748.Apr 9 2018, 3:01 PM
  • renamed the option --warn-backrefs
  • address other comments
ruiu updated this revision to Diff 141749.Apr 9 2018, 3:05 PM
  • update comments
espindola accepted this revision.Apr 9 2018, 3:09 PM

Fine by me.

This revision is now accepted and ready to land.Apr 9 2018, 3:09 PM
This revision was automatically updated to reflect the committed changes.
emaste added inline comments.Apr 28 2018, 7:16 PM
lld/trunk/ELF/Options.td
333

I'm not sure what fetch archive members means in this context. What about "warn about undefined symbol references satisifed by earlier archive members" or something like that?

lld/trunk/ELF/SymbolTable.cpp
292

Does the option prevent this case or (in the absence of -fatal-warnings) only warn about it?