Before, direct calls to __wrap_sym would not map to valid PLT entries, so they would crash at runtime. This change maps such calls to the same PLT entry as calls to sym that are then wrapped.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
ELF/SymbolTable.cpp | ||
---|---|---|
213–214 ↗ | (On Diff #152529) | This seems a bit hacky to me, as it always overwrites W.Wrap even though we wrote some data to W.Wrap before in another function. Can you remove that code to initialize W.Wrap? |
ELF/SymbolTable.cpp | ||
---|---|---|
213–214 ↗ | (On Diff #152529) | I'm confused as to what you mean. Are you talking about line 202? That is still needed to ensure the change I made to Relocations.cpp works properly. I agree this is kinda hacky. The problem is the relocation code affects these symbols in a way that needs to be shared between them, but the relocation code also needs to know about the wrapping. We could avoid the memcpy here and just copy the fields that have been changed, which would be more clear but prone to breaking if fields are added. I'm open to other suggestions as well. |
ELF/SymbolTable.cpp | ||
---|---|---|
213–214 ↗ | (On Diff #152529) | How many fields need to be copied now if you avoid doing memcpy? I guess it might be a bit cleaner way. |
test/ELF/wrap-plt.s | ||
---|---|---|
35 ↗ | (On Diff #152529) | mov 1 and mov 2 does not look useful. Please use nop instead, that is more common and consistent with other tests. |
This problem is tricky. The issue is that when we rename wrap_foo -> foo by overwriting foo with wrap_foo's contents, we now have two copies of the same symbol, foo and wrap_foo. There might be some way to avoid this duplication, but I'm not completely sure how we can do that. Maybe this patch is the best way to fix it, but I want to think harder about it. Let me think more about it tomorrow.
I thought about this patch for a bit today. Could you give a program that crashes without this patch? I don't think I understand if we really need this.
My understanding of this patch is this:
- When you add --wrap=foo, foo is renamed real_foo, and wrap_foo renamed foo. After the renaming operation, wrap_foo and foo points to the same function. We do that by overwriting foo with wrap_foo's symbol contents.
- In some cases, only one of foo or wrap_foo got a PLT entry.
- If you call a function that didn't get a PLT entry, that call will be a direct call (i.e. without going through the PLT entry).
I don't know how (3) can lead to a crash. Could you explain it for me?
I've attached a basic program that will crash. It looks like this only happens when creating a shared object file, otherwise it gets a direct call like you describe. The problem is the applySymbolWrap duplicates __wrap_sym, and then sets the IsUsedInRegularObj in one of them to false. From then on, these get treated as two separate symbols. I think the cause of the runtime crash is setting the IsUsedInRegularObj prevents adding an entry in the dynamic symbol table, so the lookup fails. Also, the former code creates two PLT entries for __wrap_sym, when they should share a PLT entry since they really are the same symbol.
Ideally, we wouldn't do this memcpy and somehow have all of the references to sym to now point to __wrap_sym, but I can't think of a good way of doing that with the current code.
Thank you for the test files. For convenience, I copy the contents here.
$ cat main.c
void test();
int main() { test(); return 0; }
$ cat test.c
int foo() { return 1; }
int wrap_foo() { return 0; }
void test() { foo(); wrap_foo(); }
$ cat run-test
#! /bin/bash
set -x
set -e
clang -fuse-ld=lld -Wl,--wrap=strstr -glldb -Wl,--wrap=foo -shared -o
test.so -fpic test.c
clang -fuse-ld=lld -Wl,--wrap=strstr -glldb -Wl,--wrap=foo -dynamic -o main
-fpic main.c ./test.so
./main
I'm not still very happy about this, but it seems like this is indeed the
only thing we can do with the current architecture. LGTM with some nits.
Thank you for finding and fixing this one!
LGTM
ELF/SymbolTable.cpp | ||
---|---|---|
211 ↗ | (On Diff #152529) | This function doesn't really apply relocations, so the name doesn't seem right. I'd name this applySymbolWrapReloc. |
Addressed code review comments
- Renamed applyRelocationToWrappedSymbols -> applySymbolWrapReloc
- Updated test to use nop instead of mov
This has introduced a regression when linking ASan instrumented code on AArch64: https://bugs.llvm.org/show_bug.cgi?id=38170. This is currently preventing us from rolling a new toolchain into Fuchsia. Can anyone take a look? I haven't yet figured out what's the cause for that issue.