This is an archive of the discontinued LLVM Phabricator instance.

Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue
ClosedPublic

Authored by cycheng on Apr 14 2016, 10:04 PM.

Details

Summary

The real problem is that sanitizer_print_stack_trace obtains current PC and expects the PC to be in the stack trace after function calls. We don't prevent tail calls in sanitizer runtimes, so this assumption does not necessary hold.

We add "always inline" attribute on PrintCurrentStackSlow to address this issue, however this solution is not reliable enough, but unfortunately, we don't see any simple, reliable solution.

Diff Detail

Event Timeline

cycheng updated this revision to Diff 53842.Apr 14 2016, 10:04 PM
cycheng retitled this revision from to Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.
cycheng updated this object.
cycheng added reviewers: samsonov, hfinkel, kbarton, tjablin.
cycheng added a subscriber: llvm-commits.

This patch replace D19001.

I have bootstrap tested the patch on x86 and ppc64. The result as expectation.

On PPC64:

Program stack trace:
#0 0x100cec8c __sanitizer::BufferedStackTrace::SlowUnwindStack
#1 0x100ca86c __sanitizer::BufferedStackTrace::Unwind
#2 0x100a5358 __sanitizer_print_stack_trace
#3 0x100d542c FooBarBaz()
#4 0x100d53c4 main

target pc = 0x100a5308 (by GetCurrentPc)

On X86:

Program stack trace:
#0 0x494693 __sanitizer::BufferedStackTrace::SlowUnwindStack
#1 0x478ff9 __sanitizer_print_stack_trace
#3 0x49b222 FooBarBaz()
#4 0x49b1fd main
#5 0x7fa265caea40 __libc_start_main
#6 0x41a2f9 _start

target pc = 0x478f9c (by GetCurrentPc)
cycheng updated this revision to Diff 54167.Apr 19 2016, 1:00 AM

I just found there is ALWAYS_INLINE be defined in "sanitizer_internal_defs.h", use that instead of LLVM_ALWAYS_INLINE to avoid introducing llvm dependency.

kcc added a reviewer: dvyukov.Apr 19 2016, 4:41 PM
kcc added a subscriber: kcc.

Maybe you should instead inhibit the tail call optimization (by e.g. adding some dummy code after the call)?
Also, is a test possible for this change?

cycheng updated this revision to Diff 54325.Apr 20 2016, 1:12 AM

Add test case to check whether PrintCurrentStackSlow is inlined as our expectation.

cycheng added a comment.EditedApr 20 2016, 1:19 AM
In D19148#405951, @kcc wrote:

Maybe you should instead inhibit the tail call optimization (by e.g. adding some dummy code after the call)?
Also, is a test possible for this change?

To inhibit tail call optimization, I tested something like this:

int dummy() { volatile int i = 0; return i; }
SANITIZER_INTERFACE_ATTRIBUTE
void __sanitizer_print_stack_trace() {
  PrintCurrentStackSlow(StackTrace::GetCurrentPc());
  dummy();
}

I thought maybe always inline is simpler?

cycheng updated this revision to Diff 54326.Apr 20 2016, 1:25 AM

Rename test file name.

dvyukov edited edge metadata.Apr 20 2016, 1:27 AM

Also, is a test possible for this change?

print-stack-trace.cc already catches it

dvyukov accepted this revision.Apr 20 2016, 1:38 AM
dvyukov edited edge metadata.

It's all messy. The real problem is that sanitizer_print_stack_trace obtains current PC and expects the PC to be in the stack trace after function calls. We don't prevent tail calls in sanitizer runtimes, so this assumption does not necessary hold. For example, even if we add the ALWAYS_INLINE but introduce another function between sanitizer_print_stack_trace and PrintCurrentStackSlow (sanitizer_print_stack_trace calls Foo and Foo calls PrintCurrentStackSlow), then the test will become flaky again (sanitizer_print_stack_trace tail calls Foo and the PC disappears from the stack trace).
Another problem is that asan/msan does not turn off tail calls during compilation, so __sanitizer_print_stack_trace can be tail called, and then we get broken stack trace again.

Having said that, I don't see any simple, reliable solution. And anything complex probably does not worth it. So let's submit this fix.

Thanks for fixing this, btw.

This revision is now accepted and ready to land.Apr 20 2016, 1:38 AM

It's all messy. The real problem is that sanitizer_print_stack_trace obtains current PC and expects the PC to be in the stack trace after function calls. We don't prevent tail calls in sanitizer runtimes, so this assumption does not necessary hold. For example, even if we add the ALWAYS_INLINE but introduce another function between sanitizer_print_stack_trace and PrintCurrentStackSlow (sanitizer_print_stack_trace calls Foo and Foo calls PrintCurrentStackSlow), then the test will become flaky again (sanitizer_print_stack_trace tail calls Foo and the PC disappears from the stack trace).
Another problem is that asan/msan does not turn off tail calls during compilation, so __sanitizer_print_stack_trace can be tail called, and then we get broken stack trace again.

Having said that, I don't see any simple, reliable solution. And anything complex probably does not worth it. So let's submit this fix.

Thanks for fixing this, btw.

Agree :D
Thanks for the comments, may I borrow it as my commit message?
By the way, whether I should submit the test case or not?

Thanks for the comments, may I borrow it as my commit message?

sure

By the way, whether I should submit the test case or not?

no, I don't see what it adds
the original test checks precise contents and order of frames, it will catch if PrintCurrentStackSlow appears in the trace

cycheng updated this revision to Diff 54338.Apr 20 2016, 3:22 AM
cycheng updated this object.
cycheng edited edge metadata.
cycheng closed this revision.Apr 20 2016, 3:35 AM

Committed r266869