This is an archive of the discontinued LLVM Phabricator instance.

[XRay][compiler-rt] Remove reliance on C++ ABI features
ClosedPublic

Authored by dberris on May 16 2018, 11:26 PM.

Details

Summary

This fixes http://llvm.org/PR32274.

This change adds a test to ensure that we're able to link XRay modes and
the runtime to binaries that don't need to depend on the C++ standard
library or a C++ ABI library. In particular, we ensure that this will work
with C programs compiled+linked with XRay.

To make the test pass, we need to change a few things in the XRay
runtime implementations to remove the reliance on C++ ABI features. In
particular, we change the thread-safe function-local-static
initialisation to use pthread_* instead of the C++ features that ensure
non-trivial thread-local/function-local-static initialisation.

Depends on D47696.

Event Timeline

dberris created this revision.May 16 2018, 11:26 PM

"In the future we can revisit this when we have a better idea as to why not depending on the C++ ABI functionality is a better solution." - this was discussed previously in the thread linked from the bug.

A big thing, so far as I understand it, is that Clang doesn't require some specific C++ library, so wouldn't this break for users who didn't have the specific C++ library installed that compiler-rt for XRay was built against?

"In the future we can revisit this when we have a better idea as to why not depending on the C++ ABI functionality is a better solution." - this was discussed previously in the thread linked from the bug.

A big thing, so far as I understand it, is that Clang doesn't require some specific C++ library, so wouldn't this break for users who didn't have the specific C++ library installed that compiler-rt for XRay was built against?

We don't use anything from the C++ library directly, we depend on some features provided by the implementation. In some cases, this isn't a library:

extern int g();

int f() {
  static int val = g();
  return val;
}

This will emit calls to guards for the thread-safe initialisation of val.

The alternatives here are that XRay just avoids these completely, and use pthreads instead to do a pthread_once_init(...).

The other alternative is to avoid pthreads and use the sanitizer_common mutexes too, to make it self-contained in compiler-rt.

The other alternative is to just link in the ABI library (libcxxabi or something else) instead of the standard library.

I can make one of those changes work, but that's also a lot of work, which I suppose is worth doing anyway (unless I'm missing something simpler).

Any preferences/stack-ranking of which solution might be more palatable?

dberris updated this revision to Diff 147451.May 18 2018, 2:29 AM
dberris retitled this revision from [XRay][clang+compiler-rt] Make XRay depend on a C++ standard lib to [XRay][compiler-rt] Limit reliance on C++ ABI features.
dberris edited the summary of this revision. (Show Details)

Retitle, and update to limit only to compiler-rt changes.

This is still a work in progress, but working towards a cleaner solution.

dberris updated this revision to Diff 147453.May 18 2018, 2:52 AM
  • fixup: revert unnecessary change to clang
dberris updated this revision to Diff 147455.May 18 2018, 2:56 AM
dberris removed a subscriber: cfe-commits.

Revert unnecessary change in clang.

dberris planned changes to this revision.May 18 2018, 3:04 AM

It turns out it's not that bad to actually just go and remove the dependency to C++ ABI specific requirements in the XRay implementation, so I'll continue down this path and remove as much of the interface as possible. In particular, the remaining issues are now in the FDR implementation, and we should be able to continue refactoring that to remove the C++ ABI dependencies in a similar fashion there.

dberris updated this revision to Diff 147530.May 18 2018, 9:16 AM
  • fixup: update fdr implementation to remove C++ ABI reliance
jfb added inline comments.May 18 2018, 9:58 AM
compiler-rt/lib/xray/xray_fdr_logging.cc
602

If you want to guard signal handler recursion you have to use an atomic exchange, not the current if (set) set = true; pattern you have.

dberris updated this revision to Diff 147550.May 18 2018, 10:47 AM
dberris marked an inline comment as done.
  • fixup: Use atomic_exchange for signal-safety
compiler-rt/lib/xray/xray_fdr_logging.cc
602

Good catch! Fixed.

dberris updated this revision to Diff 147730.May 20 2018, 8:43 PM
  • fixup: Remove RTTI from XRay builds
  • fixup: Remove dependency on array-placement new in buffer_queue

This is now ready for another look @dblaikie and/or @echristo -- apologies for the huge change.

One general inline comment. Otherwise, yes, any decoupling you can do from the C++ standard library is goodness.

compiler-rt/lib/xray/xray_basic_logging.cc
164

Mild preference to avoid u64 and still use uint64_t here.

dberris updated this revision to Diff 147759.May 21 2018, 3:50 AM
dberris marked an inline comment as done.
  • Rebase
  • fixup: use uint64_t instead of u64
dberris updated this revision to Diff 147763.May 21 2018, 4:09 AM
  • fixup: run binary linked with xray to check it actually works
dberris retitled this revision from [XRay][compiler-rt] Limit reliance on C++ ABI features to [XRay][compiler-rt] Remove reliance on C++ ABI features.May 22 2018, 11:39 PM
dberris edited the summary of this revision. (Show Details)
dberris updated this revision to Diff 148167.May 23 2018, 12:27 AM

Rebase and use PTHREAD_ONCE_INIT.

No strong opinions. I'm probably not the right person to review this :)

dberris edited reviewers, added: kpw, eizan; removed: echristo.May 23 2018, 4:15 PM
dberris added subscribers: kpw, eizan, echristo.

Thanks, @echristo -- adding in @kpw who may have a bit more context on the XRay side and @eizan who had been looking at doing this a while back.

dberris updated this revision to Diff 148779.May 27 2018, 11:32 PM
  • fixup: use static instead of inline
kpw added inline comments.May 30 2018, 10:48 PM
compiler-rt/lib/xray/xray_basic_logging.cc
427–430

The pthread once block that initialized UseRealTSC doesn't have to be this thread. Don't we need synchronization (memory_order_acquire) to make sure it's visible?

compiler-rt/lib/xray/xray_buffer_queue.cc
24 ↗(On Diff #148779)

I've reviewed up to here. Will come back to this.

eizan added inline comments.May 31 2018, 5:58 AM
compiler-rt/lib/xray/xray_basic_logging.cc
86

Why do we need to use atomic_load with a custom memory order rather than just using the default provided by operator=?

411

Is this a bugfix?

dberris updated this revision to Diff 149289.May 31 2018, 7:20 AM
dberris marked 3 inline comments as done.
  • fixup: use memory_order_acquire instead of relaxed
  • fixup: Move out recursion guard, use in FDR and Basic mode
dberris added inline comments.May 31 2018, 7:21 AM
compiler-rt/lib/xray/xray_basic_logging.cc
86

UseRealTSC isn't a std::atomic<...> which will have the operator= overload. We're using the types in sanitizer_common/sanitizer_atomic.h here, which require that loads and stores have an explicit memory order.

411

You could call it that, though we don't have failing tests that indicate we've hit this. :)

427–430

I think a stronger memory order here doesn't hurt -- but thinking about this a little, the global synchronisation on BasicInitialized just allows us to have one of these functions running at the same time. These loads will only happen in either:

  • The first thread that gets to run this, in which case we get the once initialisation done before any of these loads.
  • Another thread that will be able to run this function will always see the release stores commit.

Note that this assumes x86, which is a bad assumption to make. So you're right, I should be using stronger loads here.

dberris planned changes to this revision.Jun 3 2018, 8:13 PM

Breaking this up into smaller changes now. Some of the smaller changes are NFC (namespace cleanup, using internal_* instead of std::*, DCHECK instead of assert, etc.) and are largely OK to commit in isolation. I'll update the patch once the NFC changes have landed.

dberris updated this revision to Diff 149675.Jun 3 2018, 11:09 PM
dberris edited the summary of this revision. (Show Details)

Rebase, reword, and split up into smaller changes.

dberris updated this revision to Diff 149941.Jun 5 2018, 4:42 AM

Rebase, on top of D47696.

dberris updated this revision to Diff 150081.Jun 6 2018, 12:28 AM
  • fixup: use atomic_exchange instead of pthread once for error latch; also clang-format
kpw added inline comments.Jun 6 2018, 10:24 AM
compiler-rt/lib/xray/xray_basic_logging.cc
500–501

I don't get why there is a mix of static FakeTLD and thread_local variables. Won't this only access and subsequently destroy the thread_locals for only a single thread?

compiler-rt/lib/xray/xray_fdr_logging.cc
660

I'm a bit surprised that using static variables with their initialized exactly once by the first caller semantics doesn't depend on the ABI. Are you sure these are OK?

714–721

What's the ABI reason to change this to aligned_storage instead of just using the type?

789–790

Are you adding the ability to cycle through states? I'd say that's a separate patch than the ABI stuff.

789–791

Removing this does not look related to ABI. This should be a separate patch if it's a behavior change. The ABI change would be to translate to destructor + internal free if I'm not mistaken.

compiler-rt/test/xray/TestCases/Posix/c-test.cc
4–5

Shouldn't a different prefix run FDR to check that that mode works as well?

dberris updated this revision to Diff 150245.Jun 6 2018, 7:22 PM
dberris marked 6 inline comments as done.
  • fixup: address comments by @kpw
compiler-rt/lib/xray/xray_basic_logging.cc
500–501

Note, that this happens at program exit -- we want to make sure that at program exit we're cleaning up the data for the final exiting thread. This code only runs when the mode installed at premain init. In that case we can't rely on the finalisation routine being invoked, so we force the cleanup and flush one way or another.

compiler-rt/lib/xray/xray_fdr_logging.cc
660

Yes, because this is initialised with a constant -- __xray::NanosecondsPerSecond is a constexpr.

714–721

Because XRayFileHeader is not trivially constructible, initialising it is guarded to be thread-safe -- that thread-safe init calls C++ ABI specific functions. Using aligned storage here ensures that the storage is static, but that initialisation happens in the pthread_once(...) region.

compiler-rt/test/xray/TestCases/Posix/c-test.cc
4–5

Technically unnecessary, since we link in the FDR mode runtime anyway -- the test is just ensuring that we can link and run a C program.

kpw accepted this revision.Jun 6 2018, 11:53 PM
kpw added inline comments.
compiler-rt/lib/xray/xray_basic_logging.cc
500–501

I see. This is to salvage something to log when exiting abnormally. It's not obvious that TLDDestructor calls similar routines to finalize and flush instead of just freeing memory and stuff like that.

If you can come up with a way to make this more obvious, be my guest, otherwise the comment will have to suffice.

compiler-rt/lib/xray/xray_fdr_logging.cc
660

So we rely on clang turning these into pre-populated addresses (in bss?) at compilation? Is it the standard or the implementation that lets us rely on that?

714–721

It would be great if you could add a writeup of the common ABI pitfalls you discovered and the xray way of accomplishing similar idioms. I'm not sure where, but otherwise, these details are not going to be obvious to future maintainers/contributors etc.

844

Doesn't the "\_\_xray" namespace have an effective "using namespace \_\_sanitizer" everywhere? <--- differential formatting turns these into underlines.

Are you just doing this to make it obvious this is not a stdlib atomic?

This revision is now accepted and ready to land.Jun 6 2018, 11:53 PM
dberris updated this revision to Diff 150445.Jun 7 2018, 8:40 PM
dberris marked an inline comment as done.
  • fixup: remove more unncessary namespace qualifications
compiler-rt/lib/xray/xray_fdr_logging.cc
660

It's the standard that says, that they must be be constant-initialised.

In C++11 this is [stmt.decl] p4 (6.7.4) which references [basic.start.init]p2 bullet 3.

714–721

That's a good idea. The test is supposed to guard against those, because it will turn into a link failure for missing ABI functions.

844

Nope, bad rebase/merge. :)

This revision was automatically updated to reflect the committed changes.