This is an archive of the discontinued LLVM Phabricator instance.

Correct atexit(3) support in TSan/NetBSD
ClosedPublic

Authored by krytarowski on Nov 3 2017, 2:27 PM.

Details

Summary

The NetBSD specific implementation of cxa_atexit() does not
preserve the 2nd argument if dso is equal to NULL.

Changes:

  • Split paths of handling intercepted __cxa_atexit() and atexit(3). This affects all supported Operating Systems.
  • Add a local stack-like structure to hold the __cxa_atexit() context. atexit(3) is documented in the C standard as calling callback from the earliest to the oldest entry. This path also fixes potential ABI problem of passing an argument to a function from the atexit(3) callback mechanism.
  • Add new test to ensure LIFO style of atexit(3) callbacks: atexit3.cc

Proposal to change the behavior of __cxa_atexit() in NetBSD has been rejected.

With the above changes TSan/NetBSD with the current tsan_interceptors.cc
can bootstrap into operation.

Sponsored by <The NetBSD Foundation>

Diff Detail

Repository
rL LLVM

Event Timeline

krytarowski created this revision.Nov 3 2017, 2:27 PM
krytarowski edited the summary of this revision. (Show Details)Nov 3 2017, 2:29 PM
vitalybuka edited edge metadata.Nov 3 2017, 3:15 PM

thanks for splitting.
I don't have particular opinion on the patch, just some nits.
Maybe wait for @dvyukov input?

lib/sanitizer_common/sanitizer_allocator.cc
196 ↗(On Diff #121552)

!naddr

lib/tsan/rtl/tsan_interceptors.cc
420

could you please move both vars closed to the use?

432

Dmitry, can this be just "Acquire(cur_thread(), 0, (uptr)ctx);" ?

468

maybe

int res = REAL(__cxa_atexit)((void (*)(void *a))at_exit_wrapper, dso ? ctx : 0, dso);
if (!dso) {
  ...
}

Maybe wait for @dvyukov input?

It would be nice to get feedback from !NetBSD tests whether it works correctly.

This code in the past and after my change levites over OS-specific implementations and assumptions.

lib/tsan/rtl/tsan_interceptors.cc
468

There are now two at_exit_wrapper functions so I would need to inline the ?: operator into the first argument as well.

joerg edited edge metadata.Nov 3 2017, 4:01 PM

Is there any reason why keeping at_exit and __cxa_atexit handling merged? They are pretty much disjunct code paths, especially since the at_exit stack means that the real at_exit can be used.

atexit(3)/NetBSD calls internally __cxa_atexit() and it can be intercepted.

Certainly there is some way to support real atexit(3), but it has been disabled explicitly perhaps for the same reason.

Another reason is that with two more distinct paths we can get more code duplication for no good reason.

For the sake of purity (like removal of casting function pointers) we generate extra new code, new structs (or unions) etc.

I've decided go this path in this revision when I noted that NetBSD casts function pointers in the atexit(3) implementation.

dvyukov edited edge metadata.Nov 6 2017, 12:58 AM

Doesn't cxa_atexit call callbacks in the same order as atexit? You said that atexit is implemented by means of cxa_atexit.
As far as I remember, __cxa_atexit is called with dso=0 for destructors in main executable. If so, order of execution will depend on if the dtor is in main executable or not, which looks wrong.
Is it really necessary to use different call mechanisms? Please add a test for LIFO atexit order, if it works then we can keep the current code.

Re InternalReallocArr, we already have Vector<T> class in tsan_vector.h for this. So if this code is really necessary, we should use Vector (probably will require declaring ctor as constexpr if we need a global instance).

joerg added a comment.Nov 6 2017, 5:36 AM

No, __cxa_atexit will always reference the DSO handle. That exists even in the main executable.

No, __cxa_atexit will always reference the DSO handle. That exists even in the main executable.

Ack. Probably confused it with __cxa_finalize.

The behavior of atexit(3) LIFO has been defined in the C standard.

I'm going to switch to the internal vector container and add a dedicated test to verify that this is really LIFO.

The behavior of atexit(3) LIFO has been defined in the C standard.

C++ also defines order of destruction of static and thread-local objects as LIFO:

3.6.3/1
If the completion of the constructor or dynamic initialization of an object with static storage
duration is sequenced before that of another, the completion of the destructor of the second is sequenced
before the initiation of the destructor of the first.

So I think we can assume they all are LIFO.

Hmm, I might be doing something incorrectly, but there is a problem with Vector. We fire the destructor for this LIFO container.. before actually calling atexit(3) functions from libc.

diff --git a/lib/tsan/rtl/tsan_interceptors.cc b/lib/tsan/rtl/tsan_interceptors.cc
index 0b4e873a0..106a3fd0e 100644
--- a/lib/tsan/rtl/tsan_interceptors.cc
+++ b/lib/tsan/rtl/tsan_interceptors.cc
@@ -391,7 +391,22 @@ struct AtExitCtx {
   void *arg;
 };
 
-static void at_exit_wrapper(void *arg) {
+Vector<struct AtExitCtx *> AtExitStack(MBlockAtExit);
+
+static void at_exit_wrapper() {
+  ThreadState *thr = cur_thread();
+  uptr pc = 0;
+
+  // Pop AtExitCtx from the top of the stack of callback functions
+  AtExitCtx *ctx = AtExitStack[AtExitStack.Size() - 1];
+  AtExitStack.PopBack();
+
+  Acquire(thr, pc, (uptr)ctx);
+  ((void(*)())ctx->f)();
+  InternalFree(ctx);
+}
+
+static void cxa_at_exit_wrapper(void *arg) {
   ThreadState *thr = cur_thread();
   uptr pc = 0;
   Acquire(thr, pc, (uptr)arg);
@@ -430,7 +445,18 @@ static int setup_at_exit_wrapper(ThreadState *thr, uptr pc, void(*f)(),
   // Memory allocation in __cxa_atexit will race with free during exit,
   // because we do not see synchronization around atexit callback list.
   ThreadIgnoreBegin(thr, pc);
-  int res = REAL(__cxa_atexit)(at_exit_wrapper, ctx, dso);
+  int res;
+  if (dso == 0) {
+    // NetBSD does not preserve the 2nd argument if dso is equal to 0
+    // Store ctx in a local stack-like structure (LIFO order)
+    res = REAL(__cxa_atexit)((void (*)(void *a))at_exit_wrapper, 0, 0);
+    // Push AtExitCtx on the top of the stack of callback functions
+    if (res == 0) {
+      AtExitStack.PushBack(ctx);
+    }
+  } else {
+    res = REAL(__cxa_atexit)(cxa_at_exit_wrapper, ctx, dso);
+  }
   ThreadIgnoreEnd(thr, pc);
   return res;
 }

This means that we need to use internal_malloc() like in the proposed patch.

krytarowski edited the summary of this revision. (Show Details)

Hmm, I might be doing something incorrectly, but there is a problem with Vector. We fire the destructor for this LIFO container.. before actually calling atexit(3) functions from libc.

Right, there is also the dtor.
I've mailed https://reviews.llvm.org/D39721 which should enable this and all other similar use-cases.
Spreading this "linker-initialized ctors/no dtors plague" throughout the code base will lead to maintenance pain long term. This is just a small, local functionality and you had to write a custom container for it.

joerg added a comment.Nov 7 2017, 4:14 AM

No need for a custom container, just allocate the vector dynamically and free it when it becomes empty.

dvyukov added inline comments.Nov 7 2017, 4:23 AM
lib/tsan/rtl/tsan_interceptors.cc
472

This needs a mutex.
at_exit_wrapper too. C++ allows concurrent callbacks, not sure if anybody does it, but at least it races with setup_at_exit_wrapper.

krytarowski edited the summary of this revision. (Show Details)
krytarowski marked an inline comment as done.
dvyukov added inline comments.Nov 7 2017, 11:55 PM
lib/tsan/rtl/tsan_interceptors.cc
413

This also needs the mutex for 2 reasons:

  1. It can race with setup_at_exit_wrapper in other threads.
  2. You push onto the stack _after_ calling __cxa_atexit, so if another thread calls exit in between, at_exit_wrapper can see an empty stack.
krytarowski added inline comments.Nov 8 2017, 7:39 AM
lib/tsan/rtl/tsan_interceptors.cc
413

atexit(3) are global, not per-thread.

I'm not convinced that races for 1 can impact execution of at_exit_wrapper().

I agree that someone might try to keep registering new atexit(3) callbacks inside atexit(3) callback functions. I don't know whether this behavior is even defined or real-world. I think that function-scoped mutex can cause deadlock with callback registration mutex, so I will protect only the access to AtExitStack.

dvyukov added a comment.EditedNov 8 2017, 7:59 AM
This comment has been deleted.
lib/tsan/rtl/tsan_interceptors.cc
413

I'm not convinced that races for 1 can impact execution of at_exit_wrapper().

Why? One thread executes callbacks, another registers new callbacks. AtExitStack is corrupted as the result. Everything explodes.

I agree that someone might try to keep registering new atexit(3) callbacks inside atexit(3) callback functions.

That's less of an issue. The problem are other threads that know nothing about process existing yet and continue registering callbacks.

dvyukov added inline comments.Nov 8 2017, 8:01 AM
lib/tsan/rtl/tsan_interceptors.cc
413

And the second problem is: thread registers callback with __cxa_atexit but don't push onto AtExitStack yes; another thread calls exit and at_exit_wrapper() discovers empty AtExitStack.

Protect AtExitStack with a mutex in at_exit_wrapper().

dvyukov accepted this revision.Nov 8 2017, 8:22 AM
This revision is now accepted and ready to land.Nov 8 2017, 8:22 AM

Thanks for review, I will commit it tonight!

The remaining problems with TSan/NetBSD:

  • Mutex tracking is broken, they seem to be not registered (in ScopedReport::AddMemoryAccess we always get mset->Size() equal to 0)
  • Thread Deatch and Thread Joined can race and assert that they go in the wrong order (we destroy a thread before marking it Finished)

Less important ones:

  • setjmp/longjmp unimplemented for NetBSD
  • StartBackgroundThread() is not executed (need to be deferred)
krytarowski closed this revision.Nov 8 2017, 2:34 PM