This is an archive of the discontinued LLVM Phabricator instance.

Fix direct calls to __wrap_sym when it is relocated
ClosedPublic

Authored by matthew.koontz on Jun 22 2018, 11:47 AM.

Details

Summary

Before, direct calls to __wrap_sym would not map to valid PLT entries, so they would crash at runtime. This change maps such calls to the same PLT entry as calls to sym that are then wrapped.

Diff Detail

Repository
rL LLVM

Event Timeline

ruiu added inline comments.Jun 24 2018, 10:57 PM
ELF/SymbolTable.cpp
213–214 ↗(On Diff #152529)

This seems a bit hacky to me, as it always overwrites W.Wrap even though we wrote some data to W.Wrap before in another function. Can you remove that code to initialize W.Wrap?

matthew.koontz added inline comments.Jun 25 2018, 1:26 PM
ELF/SymbolTable.cpp
213–214 ↗(On Diff #152529)

I'm confused as to what you mean. Are you talking about line 202? That is still needed to ensure the change I made to Relocations.cpp works properly.

I agree this is kinda hacky. The problem is the relocation code affects these symbols in a way that needs to be shared between them, but the relocation code also needs to know about the wrapping. We could avoid the memcpy here and just copy the fields that have been changed, which would be more clear but prone to breaking if fields are added. I'm open to other suggestions as well.

grimar added a subscriber: grimar.Jun 26 2018, 12:08 AM
grimar added inline comments.Jun 26 2018, 12:13 AM
ELF/SymbolTable.cpp
213–214 ↗(On Diff #152529)

How many fields need to be copied now if you avoid doing memcpy? I guess it might be a bit cleaner way.

grimar added inline comments.Jun 26 2018, 12:15 AM
test/ELF/wrap-plt.s
35 ↗(On Diff #152529)

mov 1 and mov 2 does not look useful. Please use nop instead, that is more common and consistent with other tests.

ruiu added a comment.Jun 26 2018, 2:48 AM

This problem is tricky. The issue is that when we rename wrap_foo -> foo by overwriting foo with wrap_foo's contents, we now have two copies of the same symbol, foo and wrap_foo. There might be some way to avoid this duplication, but I'm not completely sure how we can do that. Maybe this patch is the best way to fix it, but I want to think harder about it. Let me think more about it tomorrow.

ruiu added a comment.Jun 27 2018, 12:44 AM

I thought about this patch for a bit today. Could you give a program that crashes without this patch? I don't think I understand if we really need this.

My understanding of this patch is this:

  1. When you add --wrap=foo, foo is renamed real_foo, and wrap_foo renamed foo. After the renaming operation, wrap_foo and foo points to the same function. We do that by overwriting foo with wrap_foo's symbol contents.
  2. In some cases, only one of foo or wrap_foo got a PLT entry.
  3. If you call a function that didn't get a PLT entry, that call will be a direct call (i.e. without going through the PLT entry).

I don't know how (3) can lead to a crash. Could you explain it for me?

I've attached a basic program that will crash. It looks like this only happens when creating a shared object file, otherwise it gets a direct call like you describe. The problem is the applySymbolWrap duplicates __wrap_sym, and then sets the IsUsedInRegularObj in one of them to false. From then on, these get treated as two separate symbols. I think the cause of the runtime crash is setting the IsUsedInRegularObj prevents adding an entry in the dynamic symbol table, so the lookup fails. Also, the former code creates two PLT entries for __wrap_sym, when they should share a PLT entry since they really are the same symbol.

Ideally, we wouldn't do this memcpy and somehow have all of the references to sym to now point to __wrap_sym, but I can't think of a good way of doing that with the current code.

Thank you for the test files. For convenience, I copy the contents here.

$ cat main.c
void test();
int main() { test(); return 0; }

$ cat test.c
int foo() { return 1; }
int wrap_foo() { return 0; }
void test() { foo();
wrap_foo(); }

$ cat run-test
#! /bin/bash
set -x
set -e
clang -fuse-ld=lld -Wl,--wrap=strstr -glldb -Wl,--wrap=foo -shared -o
test.so -fpic test.c
clang -fuse-ld=lld -Wl,--wrap=strstr -glldb -Wl,--wrap=foo -dynamic -o main
-fpic main.c ./test.so
./main

I'm not still very happy about this, but it seems like this is indeed the
only thing we can do with the current architecture. LGTM with some nits.
Thank you for finding and fixing this one!

ruiu accepted this revision.Jul 5 2018, 12:26 PM

LGTM

ELF/SymbolTable.cpp
211 ↗(On Diff #152529)

This function doesn't really apply relocations, so the name doesn't seem right. I'd name this applySymbolWrapReloc.

This revision is now accepted and ready to land.Jul 5 2018, 12:26 PM

Addressed code review comments

  • Renamed applyRelocationToWrappedSymbols -> applySymbolWrapReloc
  • Updated test to use nop instead of mov
ruiu added a comment.Jul 9 2018, 1:18 PM

Do you have commit access?

No, I do not have commit access

ruiu added a comment.Jul 9 2018, 1:56 PM

I'll commit this patch for you.

This revision was automatically updated to reflect the committed changes.
phosek added a subscriber: phosek.Jul 17 2018, 9:01 PM

This has introduced a regression when linking ASan instrumented code on AArch64: https://bugs.llvm.org/show_bug.cgi?id=38170. This is currently preventing us from rolling a new toolchain into Fuchsia. Can anyone take a look? I haven't yet figured out what's the cause for that issue.