This is an archive of the discontinued LLVM Phabricator instance.

tsan: refactor fork handling
ClosedPublic

Authored by dvyukov on Apr 29 2021, 3:10 AM.

Details

Summary

Commit efd254b6362 ("tsan: fix deadlock in pthread_atfork callbacks")
fixed another deadlock related to atfork handling.
But builders with DCHECKs enabled reported failures of
pthread_atfork_deadlock2.c and pthread_atfork_deadlock3.c tests
related to the fact that we hold runtime locks on interceptor exit:
https://lab.llvm.org/buildbot/#/builders/70/builds/6727
This issue is somewhat inherent to the current approach,
we indeed execute user code (atfork callbacks) with runtime lock held.

Refactor fork handling to not run user code (atfork callbacks)
with runtime locks held. This change does this by installing
own atfork callbacks during runtime initialization.
Atfork callbacks run in LIFO order, so the expectation is that
our callbacks run last, right before the actual fork.
This way we lock runtime mutexes around fork, but not around
user callbacks.

Extend tests to also install after fork callbacks just to cover
more scenarios. Some tests also started reporting real races
that we previously suppressed.

Also extend tests to cover fork syscall support.

Diff Detail

Event Timeline

dvyukov created this revision.Apr 29 2021, 3:10 AM
dvyukov requested review of this revision.Apr 29 2021, 3:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2021, 3:10 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript

Third attempt...
Problem with the previous version manifested on an internal codebase and was related to fork syscall hooks. I've fixed it and added the test (test/tsan/Linux/fork_syscall.cpp).

vitalybuka accepted this revision.Apr 29 2021, 5:42 PM

Third attempt...
Problem with the previous version manifested on an internal codebase and was related to fork syscall hooks. I've fixed it and added the test (test/tsan/Linux/fork_syscall.cpp).

I usually reopen previous review after revert and upload new patch there. This way easier to see incremental changes.
However I applied the patch and it looks very different from the reverted one.

compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
528

why now you don't need to update ignore_reads_and_writes as before?

compiler-rt/lib/tsan/rtl/tsan_rtl_report.cpp
145

I don't see this function being used anywhere

This revision is now accepted and ready to land.Apr 29 2021, 5:42 PM

Third attempt...
Problem with the previous version manifested on an internal codebase and was related to fork syscall hooks. I've fixed it and added the test (test/tsan/Linux/fork_syscall.cpp).

I usually reopen previous review after revert and upload new patch there. This way easier to see incremental changes.
However I applied the patch and it looks very different from the reverted one.

Thanks for the tip, will try next time.
It was reverted twice, maybe you diffed with the first one.
The last one is https://reviews.llvm.org/D101385 it should be close.

dvyukov added inline comments.Apr 29 2021, 11:28 PM
compiler-rt/lib/tsan/rtl/tsan_rtl_report.cpp
145

You are right. I messed something re-applying the patch...

I've uploaded the reverted patch version as Diff 2, and then new changes as Diff 3.

dvyukov added inline comments.Apr 29 2021, 11:47 PM
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
528

The problem was when memory accesses actually tried to report a race, that deadlocked, and that's why we enabled ignore_reads_and_writes.
However, the same deadlock can happen on any other reports (mutex misuses, signal spoiling errno, etc), so instead we now disable all reports with thr->suppress_reports++. Thus disabling memory accesses separately becomes unnecessary.
At least all existing and new tests pass.

This revision was landed with ongoing or failed builds.Apr 29 2021, 11:48 PM
This revision was automatically updated to reflect the committed changes.