There are cases when thread local quarantine drains almost empty
quarantine batches into the global quarantine. The current approach leaves
them almost empty, which might create a huge memory overhead (each batch
is 4K/8K, depends on bitness).
Details
Diff Detail
- Build Status
Buildable 2428 Build 2428: arc lint + arc unit
Event Timeline
Is there any noticeable gain from this change other than in the case where quarantine_size_mb is 0?
I don't want to introduce this extra complexity unless there is a real gain.
And quarantine_size_mb=0 should simply bypass the quarantine (if we care about this corner case)
As I understand, small-but-not-zero local quarantine cache size would trigger the bad behavior without this change.
Not just the small thread local quarantine size can create a lot of almost empty quarantine blocks, it can also happen joining a lot of threads which happen to make only a few allocations during their lifetime.
lib/sanitizer_common/sanitizer_quarantine.h | ||
---|---|---|
151 | is it possible to make a separate function for this part of code and then unit test it? | |
test/asan/TestCases/Linux/thread_local_quarantine_size_kb.cc | ||
1 | I would suggest to leave this test (mostly?) unchanged and create a new one. |
- Address comments.
- Add a test to verify "one allocation per thread" scenario.
- Add unit test for SanitizerQuarantine batch transfers.
- Address comments.
- Add a test to verify "one allocation per thread" scenario.
- Add unit test for SanitizerQuarantine batch transfers.
- Add dropped out line back.
- Merge quarantine blocks on Recycle, not on Put.
PTAL
Doing even minimal work under cache spin lock on Put proven to be too slow in certain conditions (many threads, many allocations). The spin lock contention hits slow path (__sanitizer::internal_sched_yield()) and context switch results in dramatically slower execution (up to 20x slowdown observed in tests).
lib/sanitizer_common/sanitizer_quarantine.h | ||
---|---|---|
38 | I guess, you mean to do not account for the quarantine batch size here and calculate the overhead every time we check whether the quarantine limit is reached? It will simplify the code a little bit, but we will have to perform this calculation on every Put() call or, if we still going to roll it into size_, EnqueueBatch, DequeueBatch and MergeBatches will have to be aware of it (not that great too). |
would it make the code simpler if we removed sizeof(QuarantineBatch) from here?