This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Emit weak-undef symbols in .dynsym of a PIE only if linked against shared libs.
ClosedPublic

Authored by sivachandra on Mar 12 2019, 3:21 PM.

Event Timeline

sivachandra created this revision.Mar 12 2019, 3:21 PM
sivachandra edited reviewers, added: ruiu; removed: espindola.Mar 12 2019, 3:24 PM
pcc added a subscriber: pcc.Mar 12 2019, 3:58 PM

It looks like our handling of weak undef in executables is a little unusual. At least it does not match the other two linkers. Taking ELF/weak-undef.s as an example:

$ ld.lld ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie 
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp
                 w foo
$ ld.gold ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie 
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp
nm: ra/obj/lld/test/ELF/Output/weak-undef.s.tmp: no symbols
$ ld.bfd ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie 
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp
nm: ra/obj/lld/test/ELF/Output/weak-undef.s.tmp: no symbols

$ echo 'void foo() {}' | clang -shared -o foo.so -x c -
$ ld.bfd ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie foo.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
                 w foo
$ ld.gold ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie foo.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
                 w foo
$ ld.lld ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie foo.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
                 w foo

$ echo 'void bar() {}' | clang -shared -o bar.so -x c -
$ ld.bfd ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie bar.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
$ ld.gold ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie bar.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
$ ld.lld ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie bar.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
                 w foo

So what the other linkers appear to implement is that a shared object defining the weak symbol must be available at link time in order for the symbol to be resolved at runtime, which is essentially what you've implemented here for static PIE. Maybe we should do the same thing for other executables?

lld/test/ELF/Inputs/static-pie-weak-undef.s
2

I don't think you need this test file.

sivachandra marked an inline comment as done.

Prevent emitting weak-undef symbols in .dynsym under -pie --no-dynamic-linker.

In D59275#1426949, @pcc wrote:

It looks like our handling of weak undef in executables is a little unusual. At least it does not match the other two linkers. Taking ELF/weak-undef.s as an example:

$ ld.lld ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie 
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp
                 w foo
$ ld.gold ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie 
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp
nm: ra/obj/lld/test/ELF/Output/weak-undef.s.tmp: no symbols
$ ld.bfd ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie 
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp
nm: ra/obj/lld/test/ELF/Output/weak-undef.s.tmp: no symbols

$ echo 'void foo() {}' | clang -shared -o foo.so -x c -
$ ld.bfd ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie foo.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
                 w foo
$ ld.gold ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie foo.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
                 w foo
$ ld.lld ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie foo.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
                 w foo

$ echo 'void bar() {}' | clang -shared -o bar.so -x c -
$ ld.bfd ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie bar.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
$ ld.gold ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie bar.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
$ ld.lld ra/obj/lld/test/ELF/Output/weak-undef.s.tmp.o -o ra/obj/lld/test/ELF/Output/weak-undef.s.tmp -pie bar.so
$ nm -D ra/obj/lld/test/ELF/Output/weak-undef.s.tmp |grep foo
                 w foo

So what the other linkers appear to implement is that a shared object defining the weak symbol must be available at link time in order for the symbol to be resolved at runtime, which is essentially what you've implemented here for static PIE. Maybe we should do the same thing for other executables?

Hmmm, I think ld.bfd atleast, is doing something more "involved" if I may say so.

In the example you have chosen, the weak symbol is being referenced in this manner:

.dc.a foo // foo is the weak symbol.

If you replace it with this:

.long foo@gotpcrel

Then, you will find that ld.bfd does emit the symbol for it in .dynsym.

So, it seems to me that ld.bfd is making the decision based on whether a symbol can be resolved at run time or not . Another observation: ld.bfd does not emit weak undef symbols in .dynsym if we pass --no-dynamic-linker. In my case, I am happy if ld.lld also did this. So, I modified this change accordingly. I had to add a dummy path to a dynamic linker in few of the existing tests. The change to mimic ld.bfd is all cases is probably beyond the scope of this change.

sivachandra retitled this revision from [ELF] Do not emit weak-undef symbols in .dynsym if config is -static -pie. to [ELF] Do not emit weak-undef symbols in .dynsym under -pie --no-dynamic-linker..Mar 13 2019, 1:13 PM
sivachandra added a reviewer: pcc.
ruiu added a comment.Mar 13 2019, 1:40 PM

It seems to basically match what bfd linker does, and looks like it makes sense to me. When you are creating a statically-linked executable, weak undefined symbols cannot be resolved at runtime, therefore it should be resolved as value 0 at static-link-time.

ruiu added inline comments.Mar 13 2019, 1:41 PM
lld/ELF/Symbols.cpp
270–271

This really needs comment. Also, why do you have to check if DynamicLinker was not set? In lld, --dynamic-linker is used only for setting the name of the dynamic linker. Shouldn't this be --static?

Add a comment.

sivachandra marked an inline comment as done.Mar 13 2019, 2:04 PM
sivachandra added inline comments.
lld/ELF/Symbols.cpp
270–271

Reading pcc's comment from yesterday, it seemed to me that aligning with what other linkers do would be nice. So, ld.bfd for example, does not just really use "-pie -static" to prevent weak-undef symbols from showing up in .dynsym. It does so if --no-dynamic-linker is specified. Hence, the check for the dynamic linker name as above.

ruiu added inline comments.Mar 13 2019, 2:14 PM
lld/ELF/Symbols.cpp
270–271

What we really should do is to the thing that makes the most sense. Aligning behavior with other linkers makes sense in many cases, but we don't necessarily blindly copy all corner-case behaviors. To me, the logic that "if an executable is not dynamically linked, there's no point of exporting weak symbols" makes the most sense -- thus, checking the value of --dynamic-linker looks weird.

sivachandra marked an inline comment as done.Mar 13 2019, 2:27 PM
sivachandra added inline comments.
lld/ELF/Symbols.cpp
270–271

To avoid more round trips: how does diff 2 look? It is as per what you are expecting and that is what I started with.

pcc added inline comments.Mar 13 2019, 2:32 PM
lld/ELF/Symbols.cpp
270–271

-static is an alias for -Bstatic (i.e. don't search for .a files for -l). It doesn't necessarily imply that the executable is statically linked because the user could have either passed a .so file directly or enabled -Bdynamic at some point.

A more accurate test for whether the executable is dynamically linked is whether SharedFiles is non-empty. This is also part of the test that we use when deciding whether to emit a .dynsym: http://llvm-cs.pcc.me.uk/tools/lld/ELF/Driver.cpp#1498

Would it make sense to use something similar here?

ruiu added inline comments.Mar 13 2019, 2:34 PM
lld/ELF/Symbols.cpp
270–271

Ah, yeah, checking SharedFiles makes sense to me. Thank you for pointing that out.

sivachandra added inline comments.Mar 13 2019, 10:31 PM
lld/ELF/Symbols.cpp
270–271

Few of the existing tests will start requiring that we pass a shared object on the lld command line Is this OK?

ruiu added inline comments.Mar 14 2019, 9:44 AM
lld/ELF/Symbols.cpp
270–271

Can you elaborate? Did you mean that some tests started failing with this change?

Use presense of shared files to know whether a PIE is being dynmically linked or not.

sivachandra marked an inline comment as done.Mar 14 2019, 10:27 AM
sivachandra added inline comments.
lld/ELF/Symbols.cpp
270–271

Look at the change to weak-undef.s test that I have now added to this change. There are couple of tests also failing for the same reason, but I have not added them to this change yet.

ruiu added inline comments.Mar 15 2019, 9:57 AM
lld/ELF/Symbols.cpp
270–271

That change to the test file seems legitimate. Can you also make changes to other tests so that tests pass with this patch? How many tests are failing by the way?

Add the remaining tests affected by the change. This change is now complete and
ready for review.

sivachandra retitled this revision from [ELF] Do not emit weak-undef symbols in .dynsym under -pie --no-dynamic-linker. to [ELF] Emit weak-undef symbols in .dynsym of a PIE only if linked against shared libs..Mar 15 2019, 11:10 AM
ruiu accepted this revision.Mar 15 2019, 3:44 PM

LGTM

lld/ELF/Symbols.cpp
271

safely drop weak undef symbols from .dynsym?

This revision is now accepted and ready to land.Mar 15 2019, 3:44 PM
arichardson added inline comments.Mar 16 2019, 4:45 AM
lld/test/ELF/Inputs/pie-weak.s
2 ↗(On Diff #190856)

Since all these files are identical, would it make sense to just one in all of the tests? Or is it more important to have the names match the test?

Address comment and remove redundant test input files.

This revision was automatically updated to reflect the committed changes.