This is an archive of the discontinued LLVM Phabricator instance.

tsan: prevent pathological slowdown for spurious races
ClosedPublic

Authored by dvyukov on Jul 21 2022, 6:51 AM.

Details

Summary

Prevent the following pathological behavior:
Since memory access handling is not synchronized with DoReset,
a thread running concurrently with DoReset can leave a bogus shadow value
that will later be falsely detected as a race. For such false races
RestoreStack will return false and we will not report them.
However, consider that a thread leaves a whole lot of such bogus values
and these values are later read by a whole lot of threads.
This causes massive amounts of ReportRace calls and lots of
serialization. In very pathological cases the resulting slowdown
can be >100x. This is very unlikely, but it was presumably observed
in practice: https://github.com/google/sanitizers/issues/1552
If this happens, the previous access sid+epoch will be the same for all of
these false races, because if the thread tries to increment its epoch, it
will notice that DoReset has happened and will stop producing bogus shadow
values. So last_spurious_race is used to remember the last sid+epoch
for which RestoreStack returned false. It is then used to filter out
races with the same sid+epoch very early and quickly.
It is of course possible that multiple threads left multiple bogus shadow
values and all of them are read by lots of threads at the same time.
In that case last_spurious_race will only be able to deduplicate a few
races from one thread, then a few from another, and so on. An alternative
would be to hold an array of such sid+epoch pairs, but we consider that
scenario even less likely.
Note: this can lead to some rare false negatives as well:

  1. When a legit access with the same sid+epoch participates in a race
     as the "previous" memory access, it will be wrongly filtered out.
  2. When RestoreStack returns false for a legit memory access because it
     was already evicted from the thread trace, we will still remember it in
     last_spurious_race. Then, if there is another racing memory access from
     the same thread that happened in the same epoch but was stored in the
     next thread trace part (which is still preserved in the thread trace),
     we will also wrongly filter it out, even though RestoreStack would
     actually succeed for that second memory access.

Diff Detail

Event Timeline

dvyukov created this revision. · Jul 21 2022, 6:51 AM
Herald added a project: Restricted Project. · View Herald Transcript · Jul 21 2022, 6:51 AM
Herald added a subscriber: Enna1. · View Herald Transcript
dvyukov requested review of this revision. · Jul 21 2022, 6:51 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
dvyukov added inline comments. · Jul 21 2022, 7:02 AM
compiler-rt/lib/tsan/rtl/tsan_rtl.h
338

Thinking about whether we should use an array just to be sure... I would assume the size of the array would be 0/1 in most cases, rarely 2.

340

I am not super happy about this fix, but I also don't see any other reasonable way to fix this.
I did not find any reasonable way to synchronize memory accesses with DoReset,
and I don't see an easy way to radically speed up RestoreStack. Plus, I am now leaning towards
stabilizing the current algorithms rather than radically reworking everything again.

melver accepted this revision. · Jul 25 2022, 12:53 AM
melver added inline comments.
compiler-rt/lib/tsan/rtl/tsan_rtl.h
338

Just one looks fine until there's evidence it needs more than 1.

340

Is the biggest problem not wanting to do more work on memory accesses?

Which, I assume, means that on an access writing the shadow it cannot revalidate at the end (à la transaction and retry).

This revision is now accepted and ready to land. · Jul 25 2022, 12:53 AM
dvyukov added inline comments. · Jul 25 2022, 1:09 AM
compiler-rt/lib/tsan/rtl/tsan_rtl.h
340

Yes.

The memory access part also needs to synchronize with another thread, so it either needs to use an atomic RMW/expensive fence, or use some kind of asymmetric synchronization. I've considered doing asymmetric synchronization, but that still adds instructions on the fast path and has portability issues.
I've also tried to use signals to synchronize with the fast paths. It was a nightmare to develop and debug.

This revision was automatically updated to reflect the committed changes.