This is an archive of the discontinued LLVM Phabricator instance.

[sanitizer] Run Stack compression in background thread
ClosedPublic

Authored by vitalybuka on Nov 23 2021, 8:33 PM.

Diff Detail

Event Timeline

vitalybuka created this revision.Nov 23 2021, 8:33 PM
vitalybuka requested review of this revision.Nov 23 2021, 8:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2021, 8:34 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

Do I understand it correctly that the point of splitting store and post_process was to move compression out of critical sections?
If yes, then it's not relevant with the background thread compression. It seems that the store/post_process logic can be simpler and we just Post to the semaphore whenever a block becomes full (or compress directly in tests).

Do I understand it correctly that the point of splitting store and post_process was to move compression out of critical sections?
If yes, then it's not relevant with the background thread compression. It seems that the store/post_process logic can be simpler and we just Post to the semaphore whenever a block becomes full (or compress directly in tests).

Problem is that only tsan, asan, and memprof define DEFINE_REAL_PTHREAD_FUNCTIONS to have internal_start_thread. It should be trivial to define them of msan/lsan/hwasan.

Then there are platforms without internal_start_thread. we can fix some and disabled compression for the rest
WDYT?

Do I understand it correctly that the point of splitting store and post_process was to move compression out of critical sections?
If yes, then it's not relevant with the background thread compression. It seems that the store/post_process logic can be simpler and we just Post to the semaphore whenever a block becomes full (or compress directly in tests).

Problem is that only tsan, asan, and memprof define DEFINE_REAL_PTHREAD_FUNCTIONS to have internal_start_thread. It should be trivial to define them of msan/lsan/hwasan.

Then there are platforms without internal_start_thread. we can fix some and disabled compression for the rest
WDYT?

This sounds perfectly fine. If somebody will find async compression is critical for another sanitizer, they can add DEFINE_REAL_PTHREAD_FUNCTIONS.
Or maybe we could define real pthread_create in the sanitizer library (interceptor library may provide a way to query the "real" function w/o doing the rest)? I think overall it's better if sanitizer_common will provide fully working pieces rather than half-working. But I know it may be non-trivial. I tried to do stack printing in sanitizer_common and gave up because of all of _nolibc variants, etc.

But for other cases does it make sense to do compression inline instead (it happens episodically after start and compressing 8MB shouldn't take seconds). One environment w/o pthread is Go, it does not have/link pthread library.

But also note that background threads create problems with sandboxes which require programs to be single-threaded. At least this thread should be supported in the sandboxing callback. But I found that our sandbox2 does not call it after doing clone, so that may break (or turn into a latent bug since it executes relatively small amount of code after clone).

compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
103

s/weak/strong/
weak can fail spuriously

Probably simpler just fire-and-forget a thread when work is available
with just-in-case fallback to in-thread compression

vitalybuka planned changes to this revision.Dec 2 2021, 8:46 PM

another version

refrase comment

dvyukov added inline comments.Dec 2 2021, 11:50 PM
compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
95

Shouldn't we call CompressStackStore first time before the sleep?
Will be surprising during debugging/looking at logs that compression is not happening after filling a block.

vitalybuka marked an inline comment as done.

no first sleep

dvyukov added inline comments.Dec 3 2021, 12:36 AM
compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
95

Now it will call compress twice w/o sleep because of:

if (CompressStackStore())
  i = 0;
vitalybuka updated this revision to Diff 391578.Dec 3 2021, 1:15 AM
vitalybuka marked an inline comment as done.

fix loop

dvyukov added inline comments.Dec 3 2021, 1:33 AM
compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
95

The loop looks ok now (probably, such non-standard loops are hard to analyze).

But we also need integration with __sanitizer_sandbox_on_notify to stop the thread (all threads) and something for Go which don't link in pthread.

... do we really need to do this in separate thread? how long does compression take?

dvyukov added inline comments.Dec 3 2021, 1:35 AM
compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
95
vitalybuka added inline comments.Dec 3 2021, 10:49 AM
compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
95

The loop looks ok now (probably, such non-standard loops are hard to analyze).

That's why I intentionally had the sleep even on the first run in the beginning.

But we also need integration with __sanitizer_sandbox_on_notify to stop the thread (all threads) and something for Go which don't link in pthread.

I am not sure I understand this comment. If we have empty thread internal_start_thread for SANITIZER_GO
so after this patch SANITIZER_GO will have no compression. So how GO is relevant here?

... do we really need to do this in separate thread? how long does compression take?

Actually not sure.
It takes ~60ms on my Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz

In most cases it should called just a couple of times.
But on bad cases when the stack depot is 5GB+ it called frequently blocking everything.

I have real single threaded unittest which behave like this, and it regress about 20% with asan, without the thread. No regression with the thread.

If we go no-thread we need at least do that out of the hash table lock.

95
dvyukov added inline comments.Dec 6 2021, 4:25 AM
compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
95

I am not sure I understand this comment. If we have empty thread internal_start_thread for SANITIZER_GO

so after this patch SANITIZER_GO will have no compression. So how GO is relevant here?

Oh, I did not expect that there is a broken internal_start_thread defined for Go. Maybe it's better to enable it synchronously for Go then?

I have real single threaded unittest which behave like this, and it regress about 20% with asan

I guess we need the thread then.

vitalybuka updated this revision to Diff 392251.Dec 6 2021, 6:21 PM

switch back to semaphore and remove sleep/heartbeat
wrap into class
support __sanitizer_sandbox_on_notify

vitalybuka updated this revision to Diff 392252.Dec 6 2021, 6:24 PM

restore comment about negative flag

vitalybuka updated this revision to Diff 392254.Dec 6 2021, 6:27 PM

fix typo in name

I like this version more than then previous 5 retries with sleeps in between.

compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
154

It seems that Wait needs to be before CompressStackStore because NewWorkNotify always notifies newly started thread.
Or NewWorkNotify must not notify new thread.

218

This runs before fork, right? Why do we need to stop the thread before fork?

improved loop

vitalybuka added inline comments.Dec 7 2021, 10:13 AM
compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
154

Yes. I didn't this for readability.
Removed Post on start.

218

Because forked process will have no this thread, and we need to add some other stuff to solve restart it the new process or avoid deadlock in join of StackDepotStopBackgroundThread

vitalybuka added inline comments.Dec 7 2021, 10:14 AM
compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp
154

Removed Post on start.

Sorry, added draft comment before actual change.
I removed unnececary CompressStackStore by moving out some stuff into WaitForWork

dvyukov accepted this revision.Dec 8 2021, 2:47 AM
This revision is now accepted and ready to land.Dec 8 2021, 2:47 AM
This revision was automatically updated to reflect the committed changes.
phosek added a subscriber: phosek.Dec 9 2021, 12:09 AM

We started seeing build failure on our builders after this change landed:

ld.lld: error: undefined symbol: __sanitizer::internal_start_thread(void* (*)(void*), void*)
>>> referenced by sanitizer_stackdepot.cpp:133 (compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp:133)
>>>               compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stackdepot.cpp.obj:(__sanitizer::(anonymous namespace)::CompressThread::NewWorkNotify())

Would it be possible to take a look and revert the change if it cannot be resolved quickly?

We started seeing build failure on our builders after this change landed:

ld.lld: error: undefined symbol: __sanitizer::internal_start_thread(void* (*)(void*), void*)
>>> referenced by sanitizer_stackdepot.cpp:133 (compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp:133)
>>>               compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stackdepot.cpp.obj:(__sanitizer::(anonymous namespace)::CompressThread::NewWorkNotify())

Would it be possible to take a look and revert the change if it cannot be resolved quickly?

+1, this is also visible on https://lab.llvm.org/buildbot/#/builders/127.

I see an attempt to revert something to fix this in rG02997febe61597b7c3f215534c5adce851ba8963 by @browneee, but that seems to have reverted the wrong commit.

phosek added a comment.Dec 9 2021, 9:58 AM

We started seeing build failure on our builders after this change landed:

ld.lld: error: undefined symbol: __sanitizer::internal_start_thread(void* (*)(void*), void*)
>>> referenced by sanitizer_stackdepot.cpp:133 (compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp:133)
>>>               compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stackdepot.cpp.obj:(__sanitizer::(anonymous namespace)::CompressThread::NewWorkNotify())

Would it be possible to take a look and revert the change if it cannot be resolved quickly?

+1, this is also visible on https://lab.llvm.org/buildbot/#/builders/127.

I see an attempt to revert something to fix this in rG02997febe61597b7c3f215534c5adce851ba8963 by @browneee, but that seems to have reverted the wrong commit.

I have reverted the change in rGa1e7f6280104.

vitalybuka reopened this revision.Dec 9 2021, 10:18 AM

Thanks for revert.
Something wrong with the bot. Real error which connects breakage with this patch is in the step marked as green. I never looked into them.

This revision is now accepted and ready to land.Dec 9 2021, 10:18 AM
This revision was automatically updated to reflect the committed changes.

Windows bot accepted the updated revision. I don't know where to track Fuchsia.

Windows bot accepted the updated revision. I don't know where to track Fuchsia.

I checked our builders and they're green as well. Thanks for looking into the issue.

@vitalybuka Hi, this change caused some test failures on green dragon apple CI. This is the bot: https://green.lab.llvm.org/green/job/clang-stage1-RA/. The following tests are failing:

SanitizerCommon-asan-x86_64-Darwin.SanitizerCommon-asan-x86_64-Darwin.compress_stack_depot.cpp
SanitizerCommon-lsan-x86_64-Darwin.SanitizerCommon-lsan-x86_64-Darwin.compress_stack_depot.cpp
SanitizerCommon-tsan-x86_64-Darwin.SanitizerCommon-tsan-x86_64-Darwin.compress_stack_depot.cpp

I was able to fix the tests by also allowing SANITIZER_MAC target to call StackDepotStopBackgroundThread inside of sanitize_sanbox_on_notify. Do you think that would be the proper fix to commit?

yln added inline comments.Dec 20 2021, 3:26 PM
compiler-rt/lib/sanitizer_common/sanitizer_common_libcdep.cpp
213

I think this should not have been guarded on platforms here.

yln added a comment.Dec 20 2021, 3:51 PM

Fixes the failing tests pointed out by Alex on Darwin: https://reviews.llvm.org/D116061