This is an archive of the discontinued LLVM Phabricator instance.

[tsan] Zero out the shadow memory for the stack and TLS in ThreadFinish
Needs Review · Public

Authored by kubamracek on Apr 13 2018, 4:48 PM.

Details

Summary

It turns out that DontNeedShadowFor() on Darwin doesn't actually zero out or release the memory, and this can lead to a crash when the memory is reused (by a new thread). This is a particular problem on Darwin because we actually use the shadow memory to store the ThreadState * (as fake thread-local storage), and if this slot contains a stale value, we'll crash.
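(For context, a minimal sketch of the fake-TLS scheme described above; the identifiers are illustrative rather than the exact tsan-internal names, and the real code lives in tsan_platform_mac.cc:)

  #include <pthread.h>

  typedef unsigned long uptr;
  struct ThreadState;
  uptr MemToShadow(uptr addr);  // tsan's app-address -> shadow-address mapping

  // Darwin's runtime cannot rely on __thread, so the per-thread
  // ThreadState* lives in the shadow of a per-thread address.
  ThreadState **CurThreadSlot() {
    uptr thread_identity = (uptr)pthread_self();  // unique per live thread
    return (ThreadState **)MemToShadow(thread_identity);
  }

  ThreadState *cur_thread() { return *CurThreadSlot(); }
  // If a new thread reuses memory whose shadow still holds a stale
  // ThreadState* from a dead thread, cur_thread() returns garbage.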

I'll try to add a test case, but it seems it's pretty hard to trigger.

Diff Detail

Event Timeline

kubamracek created this revision. · Apr 13 2018, 4:48 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript · Apr 13 2018, 4:48 PM

Two things that I don't like here:

  1. This imposes a cost of zeroing up to 32MB (a standard 8MB stack x the 4x shadow multiplier) per thread creation/destruction for all OSes. Some programs create threads like crazy.
  2. I don't think this fixes the actual root cause, it only makes it even harder to localize. Note that cur_thread_finalize already clears the shadow slot, so if pthread reuses the stack/tls wholesale, the slot should already be zero. However, tsan does not generally keep the shadow clear (e.g. munmap does not clear the shadow either, and most likely a bunch of other things don't). So if the slot reuses memory from a previous mmap, it will crash the same way.

I wonder if moving the slot to the _meta_ shadow is the right thing to do. We actually clear the meta shadow on unmap. I don't see where we clear it for stacks, but we should, otherwise we can leak lots of sync objects on stacks.

Is it performance or memory usage you're worried about? Don't we *already* fill the shadow memory for the entire stack at thread creation? Shouldn't MemoryResetRange followed by DontNeedShadowFor release the pages back to the OS anyway?
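(For concreteness, a hedged sketch of that combination in ThreadFinish; the helpers and ThreadState fields below mirror tsan_rtl's, but treat this as an illustration of the question, not the literal patch:)

  // Assumed tsan-internal declarations (cf. tsan_rtl.h); sketch only.
  typedef unsigned long uptr;
  struct ThreadState { uptr stk_addr, stk_size, tls_addr, tls_size; };
  void MemoryResetRange(ThreadState *thr, uptr pc, uptr addr, uptr size);
  void DontNeedShadowFor(uptr addr, uptr size);

  // Zero the dying thread's stack/TLS shadow, then let the OS reclaim
  // the shadow pages. On Darwin this also wipes the fake-TLS slot, so a
  // future thread reusing this memory cannot read a stale ThreadState*.
  void ThreadFinishSketch(ThreadState *thr) {
    MemoryResetRange(thr, /*pc=*/0, thr->stk_addr, thr->stk_size);
    DontNeedShadowFor(thr->stk_addr, thr->stk_size);
    MemoryResetRange(thr, /*pc=*/0, thr->tls_addr, thr->tls_size);
    DontNeedShadowFor(thr->tls_addr, thr->tls_size);
  }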

The second problem is that I still don't know the exact root cause, nor am I able to reliably reproduce this in a small example. We have a giant app that triggers this after a long period of time, and so far my theory is that a new thread's stack is allocated where a previously-existing thread's stack used to be, not as exactly the same region but as an overlapping one. In that case, the shadow slot still holds the value from the previous thread's MemoryRangeImitateWrite in ThreadStart.

Let's concentrate on correctness for now.
Shadow memory is not guaranteed to be zeroed for new memory. At least munmap does not clear it, and probably some other things don't either. Perhaps if you use munmap in the test, it will be easier to catch this (e.g. mmap a large chunk of memory, write to it, munmap, create a thread).
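(A minimal sketch of such a reproducer; the sizes, and whether the new thread's stack actually reuses the unmapped range, are not guaranteed, so this may need tuning or a loop to trigger:)

  #include <pthread.h>
  #include <string.h>
  #include <sys/mman.h>

  static void *Worker(void *arg) {
    char buf[1024];
    memset(buf, 1, sizeof(buf));  // touch the fresh stack
    return 0;
  }

  int main() {
    // Dirty the shadow of a large region, then unmap it; per the
    // observation above, munmap leaves the shadow as-is.
    const size_t kSize = 64 << 20;
    void *m = mmap(0, kSize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (m == MAP_FAILED) return 1;
    char *p = (char *)m;
    for (size_t i = 0; i < kSize; i += 4096) p[i] = 1;
    munmap(p, kSize);
    // A new thread's stack may now be allocated inside [p, p+kSize)
    // and observe the stale shadow.
    pthread_t th;
    pthread_create(&th, 0, Worker, 0);
    pthread_join(th, 0);
    return 0;
  }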

Okay. I can reproduce the same crash with pthreads and mmap'ing/munmap'ing memory at the region where threads' stacks are usually created. However, this is a slightly different problem from what I'm trying to solve here. There are two variations of this:

  1. User code uses memory that will later be re-used as a thread's stack. This can be handled with interceptors (we can catch mmap and munmap).
  2. The kernel spawns and destroys wq threads (which happens spontaneously, without the process's direct control). We get notifications about these events via my_pthread_introspection_hook in tsan_platform_mac.cc; see the sketch after this list. (As a side note, we also have to work with the fact that the thread creation event is not sent "soon enough", i.e. many interceptors get called before we receive the event, and the thread destruction event is not sent "late enough", i.e. interceptors get called after that event.)
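(For reference, the notification mechanism in item 2 is Darwin's pthread introspection API; the install call and event constants below are the real <pthread/introspection.h> interface, while the handler body is only an illustrative sketch of how tsan_platform_mac.cc uses it:)

  #include <pthread.h>
  #include <pthread/introspection.h>

  static pthread_introspection_hook_t prev_hook;

  // Illustrative handler: kernel-spawned wq threads trigger these events
  // even though no pthread_create interceptor ever fires for them.
  static void MyHook(unsigned int event, pthread_t thread, void *addr,
                     size_t size) {
    if (event == PTHREAD_INTROSPECTION_THREAD_CREATE) {
      // Thread now exists; some interceptors may already have run on it
      // by the time this fires (the "not soon enough" problem above).
    } else if (event == PTHREAD_INTROSPECTION_THREAD_TERMINATE) {
      // Thread is going away; interceptors can still fire after this
      // (the "not late enough" problem above).
    }
    if (prev_hook) prev_hook(event, thread, addr, size);  // keep the chain
  }

  // Installed once at startup, e.g. from the runtime's initializer:
  //   prev_hook = pthread_introspection_hook_install(MyHook);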

I can reproduce both issues (however, the reproducer for the second one is not suitable as a lit test, as it takes 30-60 seconds to trigger). I'm not too worried about the first variation. It's very unlikely that user code will try to mmap(MAP_FIXED) into the area of thread stacks, and a non-fixed mmap will avoid this area. Mostly I'm worried about the situation where one thread doesn't clean up and a new thread gets its stack allocated over the previous thread's.

Is moving the backing store of our "fake TLS" to the metamap going to help? I would need to add clearing of the metamap to ThreadFinish. Is that any better than clearing the normal shadow memory of the stack?

Shadow memory is not guaranteed to be zeroed for new memory.

Doesn't that cause some problems already?

I'm not too worried about the first variation. It's very unlikely that user code will try to mmap(MAP_FIXED)

Are you sure that the first variation requires MAP_FIXED? On Linux all mmaps go to the same memory range (0x7ff...), and as far as I remember that was the case for mac too. Maybe you tested only gcd threads or pthread threads? But are stacks for the other kind allocated from the normal mmap range?

and the thread destruction event is not sent "late enough", i.e. interceptors get called after that event.

This means that we have another bug. Late interceptors will recreate the thread descriptor, right? That means a new thread can reuse this descriptor, which can lead to both false negatives (as these 2 threads are now effectively a single thread from tsan's point of view) and false positives (as the second thread is not synchronized with its thread creation point).

Is moving the backing store of our "fake TLS" to the metamap going to help?

There are other good reasons to clean the meta shadow. If we don't clean it, we leak sync objects and/or induce false negatives (as new sync objects created at the same address reuse the old one). We already pay the cost of cleaning it in munmap. And we should clean it on thread destruction too.
So if we use the meta map, we don't pay double the price for cleaning both shadows, and any bug fixes for mac will also help all other OSes, and vice versa -- any fixes that clean up stray meta shadow to prevent sync object leaks will also help mac.
If we use the normal shadow, we pay double the price, and you are on your own with the long tail of cases where we forgot to clean the normal shadow.

Doesn't that cause some problems already?

No. I don't think there is anything that requires it now.

But neither the normal shadow nor the meta shadow solves the problem with the early thread destruction callback...

george.karpenkov resigned from this revision. · May 30 2018, 4:30 PM
george.karpenkov added a subscriber: george.karpenkov.
yln added a subscriber: yln. · Nov 30 2021, 3:07 PM

(Hopefully) obsoleted by: D110236