This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Only allow the binding of SharedSymbol to change for the first undef ref
ClosedPublic

Authored by MaskRay on Jun 29 2019, 5:40 AM.

Details

Summary

Fixes PR42442

t.o has a STB_GLOBAL undef ref to f
t2.so has a STB_WEAK undef ref to f
t1.so defines f

ld.lld t.o t1.so t2.so currently sets the binding of f to STB_WEAK.
This is not correct because there exists a STB_GLOBAL undef ref from a
regular object. The problem is that resolveUndefined() doesn't check
if the undef ref is seen for the first time:

if (isShared() || isLazy() || (isUndefined() && Other.Binding != STB_WEAK))
  Binding = Other.Binding;

The isShared() condition should be isShared() && !Referenced
where Referenced is set to true after an undef ref is seen.

In practice, when linking a pthread program with glibc:

// a.o
#include <pthread.h>
pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
int main() { pthread_mutex_unlock(&mu); }

{clang,gcc} -fuse-ld=lld a.o -lpthread # libpthread.so is linked before libgcc_s.so.1

The weak undef pthread_mutex_unlock in libgcc_s.so.1 makes the result
weak, which diverges from GNU linkers where STB_DEFAULT is used:

23: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND pthread_mutex_lock

(Note, if -pthread is used instead, libpthread.so will be linked after
libgcc_s.so.1 . lld sets the binding to the expected STB_GLOBAL)

Similar linking sequences (ld.lld t.o t1.so t2.so) appear to be used by
Go, which cause a build error https://github.com/golang/go/issues/31912.

Diff Detail

Repository
rL LLVM

Event Timeline

MaskRay created this revision.Jun 29 2019, 5:40 AM
MaskRay updated this revision to Diff 207199.Jun 29 2019, 6:07 AM
MaskRay edited the summary of this revision. (Show Details)

Update test

MaskRay updated this revision to Diff 207200.Jun 29 2019, 6:11 AM
MaskRay edited the summary of this revision. (Show Details)
MaskRay removed a subscriber: jfb.

Update description

MaskRay updated this revision to Diff 207201.Jun 29 2019, 6:16 AM
MaskRay edited the summary of this revision. (Show Details)

Update description

MaskRay added a subscriber: jfb.Jun 29 2019, 6:16 AM
peter.smith added inline comments.
ELF/Symbols.cpp
413 ↗(On Diff #207201)

Coming at this part of the code with fresh eyes, I think a comment, or reference to a comment elsewhere, explaining why we are setting the binding of an existing shared symbol to weak would be helpful here. I managed to find one in Symbols.h that helped but I had to search for it:

// Symbol binding. This is not overwritten by replace() to track
// changes during resolution. In particular:
//  - An undefined weak is still weak when it resolves to a shared library.
//  - An undefined weak will not fetch archive members, but we have to
//    remember it is weak.

One other thing that might be worth mentioning is why lazy symbols don't need the same treatment. As I understand it a non-weak reference would immediately fetch the member to define the symbol so there isn't a case where the undefined weak reference changes the binding.

ELF/Symbols.h
368 ↗(On Diff #207201)

I'd have thought it was false if no undefined reference has been seen yet?

370 ↗(On Diff #207201)

I think BindingFinalized implies that it never changes but it can if we see a weak refrence first and then see a non-weak reference. As I understand the implementation, this will be set to true on first reference of the shared symbol. Some possible alternatives:

// This is true if no undefined reference to the symbol has been seen. We use this 
bool Unreferenced = true;

Note: this would need the logic reversed in the rest of the code.

if (Other.Binding != STB_WEAK || S->Unreferenced)
      Binding = Other.Binding;
S->Unreferenced = false;

Alternative retaining the existing logic.

// This is true if there has been at least one undefined reference to the symbol 
bool Referenced = false;
MaskRay updated this revision to Diff 207355.Jul 1 2019, 9:32 AM
MaskRay edited the summary of this revision. (Show Details)
MaskRay added a reviewer: peter.smith.
MaskRay removed a subscriber: peter.smith.

Rename BindingFinalized to Referenced.
Add comments

MaskRay marked 4 inline comments as done.Jul 1 2019, 9:36 AM
MaskRay added inline comments.
ELF/Symbols.cpp
413 ↗(On Diff #207201)

Just refactored the code a bit to dispatch on symbol kinds:

if (undefined) {
} else if (SharedSymbol) {
} else if (lazy) {
}

I hope the logic will become clearer.

ELF/Symbols.h
370 ↗(On Diff #207201)

Thanks for the suggestions! I'll go with Referenced (preferring positive forms to avoid double negative).

MaskRay updated this revision to Diff 207359.Jul 1 2019, 9:41 AM
MaskRay marked an inline comment as done.

Simplify

Thanks for the update, looks easier to understand to now.

grimar accepted this revision.Jul 2 2019, 2:24 AM

LGTM.

ELF/Symbols.h
370 ↗(On Diff #207201)

FWIW

This is false if no undefined reference has been seen yet

Peter's version which avoided using negation looks a bit simpler to me:
"This is true if there has been at least one undefined reference to the symbol"

This revision is now accepted and ready to land.Jul 2 2019, 2:24 AM
ruiu added inline comments.Jul 2 2019, 3:05 AM
ELF/Symbols.h
370 ↗(On Diff #207359)

Does this increase the size of the symbol?

MaskRay updated this revision to Diff 207508.Jul 2 2019, 4:01 AM

Reorder SharedSymbols fields to make it obvious there is no size increase
Reword the comment of Referenced.

MaskRay marked 3 inline comments as done.Jul 2 2019, 4:04 AM
MaskRay added inline comments.
ELF/Symbols.h
370 ↗(On Diff #207359)

sizeof(Defined) == 80
sizeof(SharedSymbol) == 80. Alignment occupied bytes [52,56). This was not obvious, so I reordered the fields to make it obvious.

sizeof(SharedSymbol) is still 80 after reordering.

ruiu accepted this revision.Jul 2 2019, 4:07 AM

LGTM

ruiu added a comment.Jul 2 2019, 4:08 AM

Maybe it's good to have a static_assert to ensure that the symbol size is 80 bytes.

MaskRay marked an inline comment as done.Jul 2 2019, 4:33 AM

Maybe it's good to have a static_assert to ensure that the symbol size is 80 bytes.

The static_assert change may be a bit risky (32-bit 64-bit ELF, Windows, etc). Let me do it in another change.

This revision was automatically updated to reflect the committed changes.