This is an archive of the discontinued LLVM Phabricator instance.

Refine our --wrap implementation
ClosedPublic

Authored by rafael on Jul 4 2017, 7:07 PM.

Details

Reviewers
ruiu
Summary

Before this patch we copy foo into real_foo and wrap_foo into foo. The net result is that __wrap_foo shows up twice in the symbol table.

With this patch we save a copy of real_foo and put it in the symbol table slow that was used by the original wrap_foo. We also omit it from the final output if it was undefined.

The net result is that

  • Anything using foo now uses __wrap_foo
  • Anything using __real_foo now uses foo.
  • Anything using __wrap_foo still does.

And

  • The slot for foo now has __wrap_foo
  • The slot for __real_foo now has foo
  • The slot for wrap_foo now has real_foo

Which I think is the desired behavior.

Diff Detail

Event Timeline

rafael created this revision.Jul 4 2017, 7:07 PM

Could we get the full diff context, please?

rafael updated this revision to Diff 105439.Jul 6 2017, 8:34 AM

full context

grimar added a subscriber: grimar.Jul 7 2017, 1:01 AM

I'm not sure I understand the motivation behind this change (or @ruiu's original change to add the symbol). Why are we creating __real_foo at all?

test/ELF/wrap.s
20

This doesn't look right to me at all. __real_foo, __wrap_foo and foo can't all have different values. Surely there should be only two different values between these three, because there are only two functions in the code?

ruiu added inline comments.Jul 10 2017, 10:47 AM
test/ELF/wrap.s
20

__wrap_foo and foo should have the same value, but __real_foo shouldn't, because it should have a value of the original function instead of your wrapper function. So I think this is correct.

jhenderson added inline comments.Jul 11 2017, 2:48 AM
test/ELF/wrap.s
20

I disagree with this. The current meaning of --wrap in other linkers is to redirect references to symbols, not to change the symbols themselves, so if anything, __real_foo and foo should have the same value and __wrap_foo a second, different value. (As an aside, the current test has three different values, not two, so is definitely wrong).

I don't understand why we need __real_foo at all - it's not actually a function in the user's source for example.

There is a serious danger that this could confuse users when they try to debug their code. By having multiple symbols for the same address, there is a potentially negative impact on the call stack displayed by the debugger - it has the right to pick whichever symbol it feels like (it may not even be deterministic about this). For example, this could mean that when the code crashes in the __wrap_foo() function, it could say that the crash occurs in foo(). The user then stares at the foo() function in their source code and cannot see anything wrong with it, because the problem isn't there.

jhenderson added inline comments.Jul 12 2017, 1:51 AM
test/ELF/wrap.s
20

Rafael posted the following on the mailing list, so I've included it here for completeness:

This actually implements the same behavior as bfd. The symbol references
become:

201000:       ba 10 10 01 00          mov    $0x11010,%edx
 201005:       ba 10 10 01 00          mov    $0x11010,%edx
 20100a:       ba 00 10 01 00          mov    $0x11000,%edx

And the symbols in the symbol table are unchanged:

0000000000011000 A foo
0000000000011020 A __real_foo
0000000000011010 A __wrap_foo

Cheers,
Rafael

Oh, I see what's going on now! I'd completely missed the second input to the test that defined all three symbols, so I was incorrectly assuming that only __wrap_foo and foo were defined. My apologies for the confusion.

FWIW, if __real_foo isn't defined in the test input, then it's not included in the bfd output, which was the case I was thinking of. It feels like this test case is missing.

I still disagree with @ruiu about __wrap_foo and foo having the same value in the symbol table (I don't think they should, and neither does bfd), although of course references to them in the code should be patched to the same value (i.e. to __wrap_foo) when using --wrap.

I now have no objection to this test change.

rafael updated this revision to Diff 117840.Oct 5 2017, 9:46 AM

Drop __real_foo from the symbol table if it was originally undefined.

Context is missing again.

rafael updated this revision to Diff 117842.Oct 5 2017, 9:50 AM
rafael retitled this revision from Hack to keep __real_foo to Refine our --wrap implementation.
rafael edited the summary of this revision. (Show Details)

now with context

ruiu added inline comments.Oct 5 2017, 12:43 PM
ELF/SymbolTable.cpp
169

Foo?

170

Real is already a Symbol*.

190–192

Is this correct? I thought foo becomes real_foo and wrap_foo becomes foo.

rafael updated this revision to Diff 117877.Oct 5 2017, 1:09 PM

Fix silly variable name.

ruiu added inline comments.Oct 5 2017, 7:24 PM
ELF/SymbolTable.cpp
190–192

I think I figured out why you want to rotate three symbols instead of just overwriting two symbols foo and wrap_foo. So, you want to keep these three symbols somehow in the symbol table so that they'll be emitted to the output symbol table, and you are reusing the symbol table slot for wrap_foo to keep real_foo, right?

What confused me was, after rotating three symbols, find("wrap_foo") returns real_foo. That is counter-intuitive because I didn't expect "--wrap foo" changes not only foo and real_foo but wrap_foo. Can you fix this?

Use a new slot for __real_foo.

ruiu accepted this revision.Oct 6 2017, 12:16 PM

LGTM

This revision is now accepted and ready to land.Oct 6 2017, 12:16 PM
espindola closed this revision.Mar 14 2018, 3:40 PM
espindola added a subscriber: espindola.

315097