Page MenuHomePhabricator

ThinLTO+CFI: short-circuit direct calls to jump table entries
ClosedPublic

Authored by dmikulin on May 1 2018, 11:34 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

No testcase?

llvm/lib/IR/Value.cpp
506 ↗(On Diff #144756)

This looks like it doesn't handle the case of a function with a function pointer argument correctly.

Ditto Eli's comments. Also adding a FDecl->replaceDirectCalls(F) in the else block after line 1016 seems like it would also correctly rewrite direct calls for the other two cases (external functions, and local functions declared in the current TU.)

llvm/include/llvm/IR/Value.h
309 ↗(On Diff #144756)

s/make/replace/

llvm/lib/Transforms/IPO/LowerTypeTests.cpp
977 ↗(On Diff #144756)

LLVM style nit: same as line as the brace above.

Also, definitions in the merged module are set to have internal linkage which will not work if we plan to call them directly in other modules. We'll need to change their linkage.

dmikulin marked 2 inline comments as done.May 1 2018, 7:57 PM

Yeah, need a test case.

Also, definitions in the merged module are set to have internal linkage which will not work if we plan to call them directly in other modules. We'll need to change their linkage.

Can you give an example? I'm not sure I completely follow...

llvm/lib/IR/Value.cpp
506 ↗(On Diff #144756)

I thought getCalledFunction on a CallInst takes care of it...

Also adding a FDecl->replaceDirectCalls(F) in the else block after line 1016 seems like it would also correctly rewrite direct calls for the other two cases (external functions, and local functions declared in the current TU.)

Aren't the other cases captured by F->replaceUsesExceptBlockAddr(FDecl) at line 1016?

Also adding a FDecl->replaceDirectCalls(F) in the else block after line 1016 seems like it would also correctly rewrite direct calls for the other two cases (external functions, and local functions declared in the current TU.)

Aren't the other cases captured by F->replaceUsesExceptBlockAddr(FDecl) at line 1016?

No, that statement is doing the reverse of what you're doing. It's replacing directs calls to a function to calls to the jumptable entry. (F/FDecl have different meanings in the 2 other cases than how you're using them.)

llvm/lib/IR/Value.cpp
506 ↗(On Diff #144756)

It checks that the call is a direct call, but the use might be an argument of the function not the called function, e.g.

void bar();
foo(&bar);

Iterating through the uses of bar you'll hit the call instruction and replace foo(&bar) with foo(&bar.cfi) though it shouldn't be. You need to check that getCalledFunction is the one you actually expect.

Yeah, need a test case.

Also, definitions in the merged module are set to have internal linkage which will not work if we plan to call them directly in other modules. We'll need to change their linkage.

Can you give an example? I'm not sure I completely follow...

Sorry, that was a pretty terse response--I'd tried a build of Chrome with a variant of this change and was reporting the way the build failed without providing much insight as to how or why. buildBitSetsFromFunctionsNative() will set function definitions in the merged module to have internal linkage; function definitions end up in the merged module when compiled with ThinLTO if they're compiled in a translation unit with no externally visible symbols (like in ex2.cpp below.) If that's the case, the definition of a symbol (X<double>.cfi in this example) will be set to have internal linkage so a direct call to it from another translation unit will fail.

ex1.cpp:

#include <stdio.h>

template<class C> void X();

int main() {
  printf("addr %p\n", &X<double>);
  X<double>();
  ((void(*)())40000)();
  return 0;
}

ex2.cpp:

#include <stdio.h>

template<class C> void X() { printf("X\n"); }

namespace { 
  __attribute__((section(".preinit_array"), used)) void (*__preinit)(void) = X<double>;
}
$ clang++ -fuse-ld=lld -flto=thin -fsanitize=cfi-icall -fvisibility=hidden -o ex ex1.cpp ex2.cpp
lld: error: undefined symbol: void X<double>() (.cfi)
>>> referenced by ex1.cpp
>>>               lto.tmp:(main)
dmikulin marked 3 inline comments as done.May 3 2018, 8:49 AM

No, that statement is doing the reverse of what you're doing. It's replacing directs calls to a function to calls to the jumptable entry. (F/FDecl have different meanings in the 2 other cases than how you're using them.)

Sounds like we need to skip direct calls in replaceUsesExceptBlockAddr(). Make it replaceUsesExceptBlockAddrOrDirectCalls()

llvm/lib/IR/Value.cpp
506 ↗(On Diff #144756)

I see... I thought there'd be a more straightforward direct call check, but I can't seem to find anything in the code. Changed to check if the getCalledFunction() is the same as what we're trying to replace, i.e. ''this'.

dmikulin marked an inline comment as done.May 3 2018, 11:30 AM

No, that statement is doing the reverse of what you're doing. It's replacing directs calls to a function to calls to the jumptable entry. (F/FDecl have different meanings in the 2 other cases than how you're using them.)

Sounds like we need to skip direct calls in replaceUsesExceptBlockAddr(). Make it replaceUsesExceptBlockAddrOrDirectCalls()

Hmm... This broke some tests, test/Transforms/LowerTypeTests/import-icall.ll to be specific. A direct call to @external, which is a CfiFunctionDecl, is expected to go through a jump table entry. With my change it doesn't.
Playing a bit more with thin and full LTO, it looks like a direct external call to a function with the same type as function pointers used in the program, will go through a jump table entry even in full LTO. Is this by design?

$ clang++ -fuse-ld=lld -flto=thin -fsanitize=cfi-icall -fvisibility=hidden -o ex ex1.cpp ex2.cpp
lld: error: undefined symbol: void X<double>() (.cfi)

referenced by ex1.cpp

lto.tmp:(main)

I get the same result with full LTO and without my changes, ToT from a few days ago. It may be a separate problem.

I get the same result with full LTO and without my changes, ToT from a few days ago. It may be a separate problem.

ToT LTO and ThinLTO builds both work fine for me. Can you post the error you get? I'm not sure how ToT would produce a direct call to a function with a jumptable entry.

Hmm... This broke some tests, test/Transforms/LowerTypeTests/import-icall.ll to be specific. A direct call to @external, which is a CfiFunctionDecl, is expected to go through a jump table entry. With my change it doesn't.

Presumably it's only expected to because the test just checks the old behavior.

Playing a bit more with thin and full LTO, it looks like a direct external call to a function with the same type as function pointers used in the program, will go through a jump table entry even in full LTO. Is this by design?

It's the same behavior as done for ThinLTO--it can be replaced with direct calls without impacting the correctness of the CFI checking.

ToT LTO and ThinLTO builds both work fine for me. Can you post the error you get? I'm not sure how ToT would produce a direct call to a function with a jumptable entry.

Never mind: too many builds, something got mixed up. I need to take another look.

dmikulin updated this revision to Diff 145500.May 7 2018, 10:51 AM

Added a test case.
Skip direct calls in addition to blockaddr uses when dealing with CFI.
Don't set linkage to internal for .cfi symbols as they may be needed by direct calls.

It would be nice to also add a direct call monolithic LTO test as monolithic LTO has a slightly different code path for renaming functions and based on a quick look above it looks like we might only correctly rename direct calls to local, not external, functions from the code above.

llvm/include/llvm/IR/Value.h
307 ↗(On Diff #145500)

Given that this function usefulness is now likely specific to LowerTypeTests it's probably worth just including it there directly instead of polluting all of Value's children.

llvm/lib/IR/Value.cpp
506 ↗(On Diff #144756)

I haven't tested it but I think this will not work for foo(&foo), e.g. it will change it to foo.cfi(&foo.cfi) instead of foo.cfi(&foo).

llvm/test/Transforms/LowerTypeTests/cfi-direct-call.ll
1 ↗(On Diff #145500)

I'd like to see this test test the correct direct call behavior for 3 types of functions (matching the 3 cases in ::importFunction): local function declared in the current translation unit, local function declared in another TU, and an external function.

6 ↗(On Diff #145500)

This variable is named @external but is included in CfiFunctionDefs instead of CfiFunctionsDecls in the summary so it's treated like a local function. e.g. for direct calls it's re-written to @external.cfi. An direct call to an external function would be left as @external (instead of being re-written to @external.cfi_jt to point to the jump table entry.)

llvm/test/Transforms/LowerTypeTests/import-icall.ll
31 ↗(On Diff #145500)

For @external and @local_a below we should add an instruction where both address-taken so that we can still test the original behavior (e.g. that non-direct calls point to the jump table aliases as expected.)

dmikulin marked an inline comment as done.May 8 2018, 5:53 PM
dmikulin added inline comments.
llvm/lib/IR/Value.cpp
506 ↗(On Diff #144756)

I can't seem to reproduce it without at least having the fptr argument wrapped in a bitcast, which makes this check work. I'm thinking I'd need a recursively defined fptr type to avoid the cast.
I don't necessarily like the roundabout way of checking if it's a direct call. Is there a better way?

llvm/test/Transforms/LowerTypeTests/cfi-direct-call.ll
1 ↗(On Diff #145500)

I'll work on the test cases.

6 ↗(On Diff #145500)

Direct calls to external functions may still go through a jump table entry, if those functions are of the same type as functions called indirectly. I'll change the test case and add more test points.

efriedma added inline comments.May 9 2018, 3:07 PM
llvm/lib/IR/Value.cpp
506 ↗(On Diff #144756)

Yes, it's impossible to have a function with itself as an argument... at least for now (it might become possible with the typeless pointer work, but no idea if/when that will happen).

Looking around a bit, it looks like there's a helper CallSite::isCallee which is meant for this sort of check; it takes a Use*.

llvm/test/Transforms/LowerTypeTests/cfi-direct-call.ll
6 ↗(On Diff #145500)

Shouldn't the logic in this change prevent that from happening? (modulo the thin versus monolithic LTO logic I have already mentioned.)

dmikulin updated this revision to Diff 146260.May 10 2018, 5:24 PM

Fixed a few missed cases where direct calls were routed through jump tables.
Added more test points for thin and full LTO.

Otherwise looks good to me.

llvm/lib/Transforms/IPO/LowerTypeTests.cpp
1221 ↗(On Diff #146260)

This logic is slightly different because it skips block addresses. Is it possible that a local weak function would exist that would cause an exception by not being replaced here?

1451 ↗(On Diff #146260)

Ditto the above. Because of !IsDefinition I can't think of a case where a blockaddress would be valid though, so this might be fine.

dmikulin added inline comments.May 15 2018, 3:42 PM
llvm/lib/Transforms/IPO/LowerTypeTests.cpp
1221 ↗(On Diff #146260)

We had to special case blockaddr before and I felt that since they don't refer to function entry points (correct?), they should not be routed through jump tables and should be skipped here as well. Or, if this case never occurs, at least it should be safe to skip.

1451 ↗(On Diff #146260)

Same here.

vlad.tsyrklevich accepted this revision.May 15 2018, 4:30 PM

LGTM but I'd like to see what @pcc has to say.

llvm/lib/Transforms/IPO/LowerTypeTests.cpp
1221 ↗(On Diff #146260)

I think you're right, the RAUW would replace the expression with a conditional so blockaddr would not work as is.

This revision is now accepted and ready to land.May 15 2018, 4:30 PM
pcc accepted this revision.May 15 2018, 4:43 PM

LGTM

llvm/lib/Transforms/IPO/LowerTypeTests.cpp
982 ↗(On Diff #146260)

It's a little confusing that FDecl is used for a different purpose here than in the rest of the function (where it is used to refer to a jump table entry). Maybe make this declare a separate RealF variable, and then once you take care of the else after return below you can move the declaration of FDecl after the if statement.

987 ↗(On Diff #146260)

No else after return.

This revision was automatically updated to reflect the committed changes.

Various chromium test cases started failing on the first build with this change, and I don't see anything suspicious chromium-side: https://ci.chromium.org/buildbot/chromium.clang/CFI%20Linux%20%28icall%29/9886 Any chance that could be due to this?

Based on a quick look at a couple of stack traces it looks like some direct
calls now call symbols that are no longer overriden by their replacements
in chromium, e.g. calls to realloc() from a DSO previously resolved to the
tcmalloc implementation in chromium instead of realloc.cfi(), the locally
linked tcmalloc implementation in that library.

pcc added a comment.May 19 2018, 2:40 PM

Based on a quick look at a couple of stack traces it looks like some direct
calls now call symbols that are no longer overriden by their replacements
in chromium, e.g. calls to realloc() from a DSO previously resolved to the
tcmalloc implementation in chromium instead of realloc.cfi(), the locally
linked tcmalloc implementation in that library.

I imagine that we will need to limit this optimization to dso_local functions.

Reverted in r332838. Please don't keep trunk broken for days.

Sorry, I'm traveling this week, didn't see this until now. I didn't get any bot failure notifications after the commit, so I assumed it was good. Do you have a test case I can use to reproduce it?

Based on a quick look at a couple of stack traces it looks like some direct
calls now call symbols that are no longer overriden by their replacements
in chromium, e.g. calls to realloc() from a DSO previously resolved to the
tcmalloc implementation in chromium instead of realloc.cfi(), the locally
linked tcmalloc implementation in that library.

If I understand correctly, you have a DSO linked to the main chromium binary. The DSO calls realloc(), which normally resolves at runtime to the realloc() in the main binary. The DSO also defines realloc(). After my changes instead of dynamically resolving realloc(), it's statically resolved to realloc.cfi(). How do you build your DSO? I can't get defined symbols to resolve externally with CFI enabled...

Oh, this must be with cross-DSO CFI. I see what the problem is. We need to skip symbols not resolved internally from replacing direct calls to calls to <name>.cfi. An extra test case wound't hurt, so if you have it please send it to me.

Sorry again for breaking top-of-tree, didn't realize it happened. Usually the bots send auto notification to authors of suspected commits. I was keeping an eye on the bots, checked reported breakages, but none of them were mine. Are chromium bots set up the same way or do I need to manually monitor their status?

pcc added a comment.May 21 2018, 9:12 PM

You don't need cross-DSO CFI to reproduce this, the issue has to do with regular ELF interposition.

Here is a standalone reproducer:

$ cat main.c
void baz();
void foo() {
  puts("main");
}

int main() {
  direct();
  indirect();
}
$ cat dso.c
void foo() {
  puts("dso");
}
typedef void (*fp)();

fp bar() { return foo; }
void indirect() {
  bar()();
}
void direct() {
  foo();
}
$ ~/src2/llvm-project4/ra/bin/clang -fsanitize=cfi-icall -shared -o dso.so dso.c -flto -fPIC
$ ~/src2/llvm-project4/ra/bin/clang -fsanitize=cfi-icall  -o main main.c -flto dso.so
$ LD_LIBRARY_PATH=. ./main
dso
main

If I rebuild dso.so without cfi-icall I see the correct result:

$ ~/src2/llvm-project4/ra/bin/clang  -shared -o dso.so dso.c -flto -fPIC
$ LD_LIBRARY_PATH=. ./main
main
main
dmikulin updated this revision to Diff 148358.May 24 2018, 2:22 AM

Changed the code not to short circuit functions that can be overridden at run time and not to expose <func>.cfi names to the runtime.
Added a test point to check for that.

Would it be possible to run this patch through Chromium build/test before I commit?

Dmitry, I can run the tests for you locally; however, it's probably worth having @pcc take a look first.

llvm/lib/Transforms/IPO/LowerTypeTests.cpp
425 ↗(On Diff #148358)

clang-format

436 ↗(On Diff #148358)

Update

1469 ↗(On Diff #148358)

Why change the visibility to hidden only for !hasLocalLinkage ?

1655 ↗(On Diff #148358)

Did new changes cause this to be required?

llvm/test/Transforms/LowerTypeTests/cfi-direct-call.ll
8 ↗(On Diff #148358)

remove?

This revision is now accepted and ready to land.May 24 2018, 10:16 AM
dmikulin marked 4 inline comments as done.May 24 2018, 2:33 PM
dmikulin added inline comments.
llvm/lib/Transforms/IPO/LowerTypeTests.cpp
1469 ↗(On Diff #148358)

setVisibility() asserts if hasLocalLinkage()

1655 ↗(On Diff #148358)

Handling of Constants? No, it's been there before my changes.

dmikulin marked 3 inline comments as done.May 24 2018, 2:40 PM
pcc added inline comments.May 30 2018, 12:18 PM
llvm/lib/Transforms/IPO/LowerTypeTests.cpp
985 ↗(On Diff #148358)

Can you check for dso_local instead of default visibility here? I think that should be exactly the property that you want to check for.

1009 ↗(On Diff #148358)

Is there a reason to set the visibility indirectly via this variable here?

1651 ↗(On Diff #148358)

Similarly I think you can check for dso_local here.

dmikulin marked 2 inline comments as done.May 30 2018, 4:30 PM

Can you guys please give it a try with Chromium? Do you get any measurable performance improvements with this change?

llvm/lib/Transforms/IPO/LowerTypeTests.cpp
985 ↗(On Diff #148358)

Thanks, that's better.

1009 ↗(On Diff #148358)

I need to set visibility late, after replaceCfiUses(). If I do it sooner, the symbol becomes dso_local, and it breaks the logic in replaceCfiUses().

Did you forget to update the change? I'll run it against the failing chrome tests after. CFI-icall presents a <=1% performance regression for chrome so it would be hard to pick up a difference. From my past look at it, most of the regression appeared to because of cfi-icall checks at every malloc/free (because we override their implementations and require a function pointer call.)

dmikulin updated this revision to Diff 149350.May 31 2018, 1:37 PM
dmikulin marked an inline comment as done.

Changed the checks to look for dso_local instread of visibility.

The browser/media/cronet tests that failed previously now pass.

pcc accepted this revision.Jun 1 2018, 12:29 PM

LGTM

llvm/lib/Transforms/IPO/LowerTypeTests.cpp
1009 ↗(On Diff #148358)

Okay, please mention that in a comment.

llvm/test/Transforms/LowerTypeTests/cfi-direct-call.ll
15 ↗(On Diff #149350)

You might also want to test the case where a default visibility function is dso_local.

dmikulin marked 2 inline comments as done.Jun 1 2018, 3:39 PM
This revision was automatically updated to reflect the committed changes.