This is an archive of the discontinued LLVM Phabricator instance.

[ppc64] Disable sibling-call-optimization when building tsan library by clang
AbandonedPublic

Authored by cycheng on Apr 11 2016, 6:53 PM.

Details

Summary

Use "-mllvm -disable-ppc-sco=true" for clang when building tsan library, because there is a distance (in terms of instruction count) assumption between StackTrace::GetCurrentPc and BufferedStackTrace::Unwind.

Tail-call to "PrintCurrentStackSlow" break this assumption on PPC64.

See: https://llvm.org/bugs/show_bug.cgi?id=27280#c1

Diff Detail

Event Timeline

cycheng updated this revision to Diff 53347.Apr 11 2016, 6:53 PM
cycheng retitled this revision from to [ppc64] Disable sibling-call-optimization when building tsan library by clang.
cycheng updated this object.
cycheng added reviewers: samsonov, hfinkel, kbarton, tjablin.
cycheng added a subscriber: llvm-commits.
hfinkel edited edge metadata.Apr 11 2016, 9:19 PM

I'd prefer that we not fix the problem this way. You mention in the bug report that we could increase the scan distance:

uptr BufferedStackTrace::LocatePcInTrace(uptr pc) {
#ifdef __powerpc__
  const int kPcThreshold = 800;
#else
  const int kPcThreshold = 320;
#endif

why not do that?

Hi Hal,

Because I found that threshold is only workable for gcc, not for clang.

If I build compiler-rt with clang, distance between sanitizer_print_stack_trace and tsan::PrintCurrentStackSlow is 0x2df8 = 11768

So it looks like feasible work around is disable sco when building tsan library.

(Run-time data when building compiler-rt by clang: https://llvm.org/bugs/show_bug.cgi?id=27280#c2)

CY

Hi Hal,

Because I found that threshold is only workable for gcc, not for clang.

If I build compiler-rt with clang, distance between sanitizer_print_stack_trace and tsan::PrintCurrentStackSlow is 0x2df8 = 11768

So it looks like feasible work around is disable sco when building tsan library.

(Run-time data when building compiler-rt by clang: https://llvm.org/bugs/show_bug.cgi?id=27280#c2)

CY

As a general rule, I don't think it is acceptable to have a build-system dependency on an LLVM-level backend parameter. If this is something we need, then we need to add a Clang-level parameter for this. Alternatively, we might add an attribute applied to the function which must not be called as a tail call. On the other hand, before we consider that, is there a more-principled solution to actually fix this problem?

Looking at the code in BufferedStackTrace::LocatePcInTrace and MatchPc, the threshold used does not actually affect runtime, it is simply used to control matching variance. As a result, so long as it is not set large enough to pick up functions coming from other object files, it seems like setting a significantly-larger threshold would have have negative side-effect?

Agree!

  1. First, I would like to update my investigations:
    • Current mechanism of "BufferedStackTrace::LocatePcInTrace and MatchPc" is unable to handle tail-call to "__tsan::PrintCurrentStackSlow" case.
    • I used clang 3.9.0 on x86 to build tsan library, of course it tail-call to "__tsan::PrintCurrentStackSlow", and I found it passed print-stack-trace.cc test was because of lucky. see: https://llvm.org/bugs/show_bug.cgi?id=27280#c3
    • Possible solutions:
      1. Use "always_inline" attribute on "__tsan::PrintCurrentStackSlow"
      2. New mechanism? I'm still thinking @@
  2. No, the threshold does not affect runtime, but it depend on compiler's codegen decision, I feel it is a magic number, use large number can cause poping incorrect stack count.

Agree!

  1. First, I would like to update my investigations:
    • Current mechanism of "BufferedStackTrace::LocatePcInTrace and MatchPc" is unable to handle tail-call to "__tsan::PrintCurrentStackSlow" case.
    • I used clang 3.9.0 on x86 to build tsan library, of course it tail-call to "__tsan::PrintCurrentStackSlow", and I found it passed print-stack-trace.cc test was because of lucky. see: https://llvm.org/bugs/show_bug.cgi?id=27280#c3
    • Possible solutions:
      1. Use "always_inline" attribute on "__tsan::PrintCurrentStackSlow"
      2. New mechanism? I'm still thinking @@
  2. No, the threshold does not affect runtime, but it depend on compiler's codegen decision, I feel it is a magic number, use large number can cause poping incorrect stack count.

If adding always_inline to __tsan::PrintCurrentStackSlow will fix the problem, that sounds like the best solution (as it will work across compilers). I suggest proposing a patch adding that attribute (and adding a comment explaining why the function must be always_inline).

cycheng abandoned this revision.Apr 14 2016, 10:06 PM

I submit a new patch http://reviews.llvm.org/D19148 to replace this one.
Thanks Hal!