Depends on D114495.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Do I understand it correctly that the point of splitting store and post_process was to move compression out of critical sections?
If yes, then it's not relevant with the background thread compression. It seems that the store/post_process logic can be simpler and we just Post to the semaphore whenever a block becomes full (or compress directly in tests).
Problem is that only tsan, asan, and memprof define DEFINE_REAL_PTHREAD_FUNCTIONS to have internal_start_thread. It should be trivial to define them of msan/lsan/hwasan.
Then there are platforms without internal_start_thread. we can fix some and disabled compression for the rest
WDYT?
This sounds perfectly fine. If somebody will find async compression is critical for another sanitizer, they can add DEFINE_REAL_PTHREAD_FUNCTIONS.
Or maybe we could define real pthread_create in the sanitizer library (interceptor library may provide a way to query the "real" function w/o doing the rest)? I think overall it's better if sanitizer_common will provide fully working pieces  rather than half-working. But I know it may be non-trivial. I tried to do stack printing in sanitizer_common and gave up because of all of _nolibc variants, etc.
But for other cases does it make sense to do compression inline instead (it happens episodically after start and compressing 8MB shouldn't take seconds). One environment w/o pthread is Go, it does not have/link pthread library.
But also note that background threads create problems with sandboxes which require programs to be single-threaded. At least this thread should be supported in the sandboxing callback. But I found that our sandbox2 does not call it after doing clone, so that may break (or turn into a latent bug since it executes relatively small amount of code after clone).
| compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp | ||
|---|---|---|
| 115 | s/weak/strong/ | |
Probably simpler just fire-and-forget a thread when work is available
with just-in-case fallback to in-thread compression
| compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp | ||
|---|---|---|
| 107 | Shouldn't we call CompressStackStore first time before the sleep? | |
| compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp | ||
|---|---|---|
| 107 | Now it will call compress twice w/o sleep because of: if (CompressStackStore()) i = 0; | |
| compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp | ||
|---|---|---|
| 107 | The loop looks ok now (probably, such non-standard loops are hard to analyze). But we also need integration with __sanitizer_sandbox_on_notify to stop the thread (all threads) and something for Go which don't link in pthread. ... do we really need to do this in separate thread? how long does compression take? | |
| compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp | ||
|---|---|---|
| 107 | Just noticed sandbox2 only calls __sanitizer_sandbox_on_notify if defined(THREAD_SANITIZER): while chromium does it properly: | |
| compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp | ||
|---|---|---|
| 107 | 
 That's why I intentionally had the sleep even on the first run in the beginning. 
 I am not sure I understand this comment. If we have empty thread internal_start_thread for SANITIZER_GO 
 Actually not sure. In most cases it should called just a couple of times. I have real single threaded unittest which behave like this, and it regress about 20% with asan, without the thread. No regression with the thread. If we go no-thread we need at least do that out of the hash table lock. | |
| 107 | 
 I guess I detected some test which fail like this. | |
| compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp | ||
|---|---|---|
| 107 | 
 so after this patch SANITIZER_GO will have no compression. So how GO is relevant here? Oh, I did not expect that there is a broken internal_start_thread defined for Go. Maybe it's better to enable it synchronously for Go then? 
 I guess we need the thread then. | |
switch back to semaphore and remove sleep/heartbeat
wrap into class
support __sanitizer_sandbox_on_notify
I like this version more than then previous 5 retries with sleeps in between.
| compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp | ||
|---|---|---|
| 156 | This runs before fork, right? Why do we need to stop the thread before fork? | |
| 166 | It seems that Wait needs to be before CompressStackStore because NewWorkNotify always notifies newly started thread. | |
| compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp | ||
|---|---|---|
| 156 | Because forked process will have no this thread, and we need to add some other stuff to solve restart it the new process or avoid deadlock in join of StackDepotStopBackgroundThread | |
| 166 | Yes. I didn't this for readability. | |
| compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp | ||
|---|---|---|
| 166 | 
 Sorry, added draft comment before actual change. | |
We started seeing build failure on our builders after this change landed:
ld.lld: error: undefined symbol: __sanitizer::internal_start_thread(void* (*)(void*), void*) >>> referenced by sanitizer_stackdepot.cpp:133 (compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp:133) >>> compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stackdepot.cpp.obj:(__sanitizer::(anonymous namespace)::CompressThread::NewWorkNotify())
Would it be possible to take a look and revert the change if it cannot be resolved quickly?
+1, this is also visible on https://lab.llvm.org/buildbot/#/builders/127.
I see an attempt to revert something to fix this in rG02997febe61597b7c3f215534c5adce851ba8963 by @browneee, but that seems to have reverted the wrong commit.
Thanks for revert.
Something wrong with the bot. Real error which connects breakage with this patch is in the step marked as green. I never looked into them.
I checked our builders and they're green as well. Thanks for looking into the issue.
@vitalybuka Hi, this change caused some test failures on green dragon apple CI. This is the bot: https://green.lab.llvm.org/green/job/clang-stage1-RA/. The following tests are failing:
SanitizerCommon-asan-x86_64-Darwin.SanitizerCommon-asan-x86_64-Darwin.compress_stack_depot.cpp
 SanitizerCommon-lsan-x86_64-Darwin.SanitizerCommon-lsan-x86_64-Darwin.compress_stack_depot.cpp	
 SanitizerCommon-tsan-x86_64-Darwin.SanitizerCommon-tsan-x86_64-Darwin.compress_stack_depot.cpp
I was able to fix the tests by also allowing SANITIZER_MAC target to call StackDepotStopBackgroundThread inside of sanitize_sanbox_on_notify. Do you think that would be the proper fix to commit?
| compiler-rt/lib/sanitizer_common/sanitizer_common_libcdep.cpp | ||
|---|---|---|
| 207 ↗ | (On Diff #393219) | I think this should not have been guarded on platforms here. | 
Fixes the failing tests pointed out by Alex on Darwin: https://reviews.llvm.org/D116061
Shouldn't we call CompressStackStore first time before the sleep?
Will be surprising during debugging/looking at logs that compression is not happening after filling a block.