This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Add support for weak references
ClosedPublic

Authored by int3 on Dec 15 2020, 9:36 PM.

Details

Reviewers
thakis
Group Reviewers
Restricted Project
Commits
rG811444d7a173: [lld-macho] Add support for weak references
Summary

Weak references need not necessarily be satisfied at runtime (but they must
still be satisfied at link time). So symbol resolution still works as per usual,
but we now pass around a flag -- ultimately emitting it in the bind table -- to
indicate if a given dylib symbol is a weak reference.

ld64's behavior for symbols that have both weak and strong references is
a bit bizarre. For non-function symbols, it will emit a weak import. For
function symbols (those referenced by BRANCH relocs), it will emit a
regular import. I'm not sure what value there is in that behavior, and
since emulating it will make our implementation more complex, I've
decided to treat regular weakrefs like function symbol ones for now.

Fixes PR48511.

Diff Detail

Event Timeline

int3 requested review of this revision.Dec 15 2020, 9:36 PM
int3 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2020, 9:36 PM
thakis accepted this revision.Dec 16 2020, 5:45 AM
thakis added a subscriber: thakis.

Nice! I fumbled around a bit with this yesterday and wrote almost exactly the same change, except I didn't have the critical bind opcode bits (just the symbol table) since I don't know that part well yet. And in my local diff I made it so that a single non-weak ref makes the whole symbol non-weak while here a single weak ref makes the whole symbol weak (see first comment below). But in any case, this is great progress and definitely going in the right direction :)

lld/MachO/SymbolTable.cpp
77

Is that how it works? One weak ref makes all refs weak? I would've expected that one non-weak ref makes the whole symbol non-weak, but I haven't checked. Semantically it kind of makes sense: If your program has a single non-weak call to a symbol, there can't be a need for the symbol to be weak. Maybe that's what explains the ld64 behavior for functions?

118

Maybe combine these two nested ifs using the spiffy && operator? :)

lld/test/MachO/symtab.s
61–62

(a bit weird that llvm-readobj output doesn't include weakness)

This revision is now accepted and ready to land.Dec 16 2020, 5:45 AM
int3 marked an inline comment as done.Dec 16 2020, 7:36 AM
int3 added inline comments.
lld/MachO/SymbolTable.cpp
77

I would've expected that one non-weak ref makes the whole symbol non-weak

I expected this too, but ld64 doesn't do this for non-function symbols. I can't see a good reason for the inconsistency between functions and non-functions, so I guess we have to make a choice. Making weakness take priority has the advantage of being more permissive -- if ld64 succeeds in linking a given binary, so should we. But there's a good argument for not implementing dubious behavior until we have a concrete use case.

tl;dr I'll switch to having non-weak take priority

lld/test/MachO/symtab.s
61–62

it does -- these particular symbols are just not weak references. I modified this test case because we're now marking all dylib symbols as extern, which I believe is the right behavior. And that was done because nm doesn't interpret the weakref flag correctly if extern isn't set.

int3 marked an inline comment as done.Dec 16 2020, 7:55 AM

Re bind opcodes: I found http://www.m4b.io/reverse/engineering/mach/binaries/2015/03/29/mach-binaries.html to be a good resource when I was starting out. Particularly the "Mach Import Bind FSA" section.

int3 updated this revision to Diff 312218.Dec 16 2020, 8:19 AM
int3 marked 2 inline comments as done.

make non-weakness take priority

thakis added inline comments.Dec 16 2020, 9:58 AM
lld/MachO/SymbolTable.cpp
124

Can it happen that a dylib symbol is added before the first undefined? If so, then the initial dylib has isWeakRef set to false, and it can then never become true. If I read Driver.cpp, this can happen if a dylib is passed before the .o referring to it is passed. (In my local patch I gave DylibSymbol an

+  enum WeakState { Unknown, SawWeakRefOnly, SawStrongRef,  };
+  WeakState weakRefState;

and a

+  void sawUndefined(bool weakRef) {
+    switch (weakRefState) {
+    case Unknown:
+      weakRefState = weakRef ? SawWeakRefOnly : SawStrongRef;
+      break;
+    case SawWeakRefOnly:
+      if (!weakRef)
+        weakRefState = SawStrongRef;
+      break;
+    case SawStrongRef:
+      break;
+    }
+  }

and made DylibSymbol start with Unknown and then in addUndefined did

+  else if (auto *dylib = dyn_cast<DylibSymbol>(s))
+    dylib->sawUndefined(isWeakRef);

I only sketched it locally and hadn't written tests yet so maybe this is overkill, but my read of Driver is that this is necessary. (It's noticeably more annoying to write than making one weak ref turn the whole symbol weak.)

int3 planned changes to this revision.Dec 16 2020, 10:16 AM
int3 added inline comments.
lld/MachO/SymbolTable.cpp
124

oh yes. I thought I was missing something this morning in my implementation but couldn't put my finger on it... thanks!

int3 updated this revision to Diff 312344.Dec 16 2020, 5:19 PM
int3 marked an inline comment as done.
int3 edited the summary of this revision. (Show Details)

use enum for weakref state

This revision is now accepted and ready to land.Dec 16 2020, 5:19 PM
int3 added inline comments.Dec 16 2020, 5:21 PM
lld/MachO/SymbolTable.cpp
124

Okay, so I think the enum approach is the right one, though the reasons are a bit more complicated. I think handling the "dylib symbol before undefined symbol" case by itself could have been handled by initializing DylibSymbols with weakRef = true. However, there are two other cases for which having an "unreferenced" state is valuable:

  1. Determining which symbols to emit in LC_SYMTAB. The current implementation emits DylibSymbols if they are in the GOT/PLT, but that misses out symbols which aren't in either (e.g. referenced by .quad sym). I'm addressing that in this diff since it is naturally covered within the weak-references.s test.
  2. We should emit LC_LOAD_WEAK_DYLIB for libraries whose symbols are only referenced weakly. But dylibs whose symbols are not used at all should still be loaded strongly. I'm addressing that in D93435: [lld-macho] Use LC_LOAD_WEAK_DYLIB for dylibs with only weakrefs.
int3 requested review of this revision.Dec 16 2020, 5:22 PM

made quite a few changes, so please have another look :)

thakis accepted this revision.Dec 16 2020, 5:48 PM

Nice!

lld/MachO/Symbols.h
126

s/refState::Unreferenced/RefState::Unreferenced/

This revision is now accepted and ready to land.Dec 16 2020, 5:48 PM
int3 updated this revision to Diff 312356.Dec 16 2020, 6:29 PM
int3 marked an inline comment as done.

unbreak debug build

This revision was landed with ongoing or failed builds.Dec 17 2020, 5:49 AM
This revision was automatically updated to reflect the committed changes.

FYI: I saw that ld64 has the flag -weak_reference_mismatches to kind of configure its behavior around this. (I don't need that flag and no action necessary, just happened to see it in the man page and figured I'd mention it.)