This is an archive of the discontinued LLVM Phabricator instance.

[scudo][standalone] Fork support
ClosedPublic

Authored by cryptoad on Jan 9 2020, 11:43 AM.

Details

Summary

fork() wasn't well (or at all) supported in Scudo. This materialized
in deadlocks in children.

In order to properly support fork, we will lock the allocator pre-fork
and unlock it post-fork in parent and child. This is done via a
pthread_atfork call installing the necessary handlers.

A couple of things suck here: this function allocates - so this has to
be done post initialization as our init path is not reentrant, and it
doesn't allow for an extra pointer - so we can't pass the allocator we
are currently working with.

In order to work around this, I added a post-init template parameter
that gets executed once the allocator is initialized for the current
thread. Its job for the C wrappers is to install the atfork handlers.

I reorganized a bit the impacted area and added some tests, courtesy
of cferris@ that were deadlocking prior to this fix.

Edit: checking if this triggers presubmit.

Diff Detail

Event Timeline

cryptoad created this revision.Jan 9 2020, 11:43 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 9 2020, 11:43 AM
Herald added subscribers: llvm-commits, Restricted Project, jfb. · View Herald Transcript
cryptoad updated this revision to Diff 237143.Jan 9 2020, 11:49 AM

Corrections.

cryptoad edited the summary of this revision. (Show Details)Jan 9 2020, 12:00 PM
cryptoad updated this revision to Diff 237160.Jan 9 2020, 12:26 PM

Adding Quarantine & ByteMap to the things we disable/enable.

cryptoad updated this revision to Diff 237198.Jan 9 2020, 2:19 PM

Correct several inconsistencies wrt locking orders.

cryptoad updated this revision to Diff 237199.Jan 9 2020, 2:22 PM

Wrong version of the makefiles.

cryptoad updated this revision to Diff 237206.Jan 9 2020, 2:39 PM

Disable GWP-ASan for now.

eugenis added inline comments.Jan 9 2020, 5:08 PM
compiler-rt/lib/scudo/standalone/wrappers_c.inc
151

why is this signalling necessary?
disable() while disabled should block on any of the allocator mutexes just fine.

cryptoad added inline comments.Jan 10 2020, 6:41 AM
compiler-rt/lib/scudo/standalone/wrappers_c.inc
151

Yes I might have overthought it.
I was thinking about a situation where one thread would do disable() then one fork() then another enable().
In my head that was going to block somewhere, but in the end I think you are right and that it's not necessary.
I will remove it.

cryptoad updated this revision to Diff 237343.Jan 10 2020, 8:36 AM

Adding a disable()/fork()/enable() test.

cryptoad marked 3 inline comments as done.Jan 10 2020, 8:38 AM
cryptoad added inline comments.
compiler-rt/lib/scudo/standalone/wrappers_c.inc
151

Coming back to this.
This is actually needed for the scenario I was describing.
I added a test to illustrate it: doing enable() in a thread while another has done disable()/fork() should enable the allocator again and allow the fork() to go on.

SCUDO_PREFIX(Mutex) can be acquired in the child process, and in the parent process, and is not held during fork. That's a potential deadlock.
The condvar code reduces the window for it, but does not eliminate it completely.
Why can't malloc_disable() simply directly call allocator.disable(), and the same in malloc_enable?

cryptoad marked an inline comment as done.Jan 10 2020, 9:47 AM

SCUDO_PREFIX(Mutex) can be acquired in the child process, and in the parent process, and is not held during fork. That's a potential deadlock.
The condvar code reduces the window for it, but does not eliminate it completely.
Why can't malloc_disable() simply directly call allocator.disable(), and the same in malloc_enable?

You are right. Everything still works with just disable/enable. I will change this.

cryptoad updated this revision to Diff 237369.Jan 10 2020, 9:55 AM

Applying simplification suggested by eugenis@: malloc_disable just
calls disable, same for enable.

I apologize for being pushy, but could I get a yay or nay on this to unblock Android?
Thanks!

cryptoad updated this revision to Diff 237695.Jan 13 2020, 8:40 AM

Remove the NOINLINE for enable & disable, they were used to debug
some deadlocks.

I'm concerned about the removal of gwp-asan.
Could you check if https://reviews.llvm.org/D72546 helps?

cryptoad updated this revision to Diff 237735.Jan 13 2020, 10:55 AM

Re-enable GWP-ASan, add a comment about the potentially failing test.
We will skip it if this occurs upstream.

This revision is now accepted and ready to land.Jan 13 2020, 11:04 AM
cryptoad edited the summary of this revision. (Show Details)Jan 13 2020, 11:09 AM
cryptoad updated this revision to Diff 237742.Jan 13 2020, 11:32 AM

Adding a newline at end of file, to try and trigger presubmit.

Unit tests: unknown.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt

Unit tests: pass. 61794 tests passed, 0 failed and 781 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

eugenis added inline comments.Jan 13 2020, 1:41 PM
compiler-rt/lib/scudo/standalone/combined.h
31

A static global in a header is not a great idea - every translation unit that includes this will get its own instance of gwp-asan, and only one of those will be initialized.

This needs to be moved to the wrappers, probably.

FWIW I'm ok with this landing in the previous version, disabling GWP-ASan, because its integration in scudo is so broken right now.

hctim added a comment.Jan 13 2020, 5:51 PM

FWIW I'm ok with this landing in the previous version, disabling GWP-ASan, because its integration in scudo is so broken right now.

+1 - we can fix GWP-ASan-in-scudo-standalone later.

eugenis added inline comments.Jan 13 2020, 6:06 PM
compiler-rt/lib/scudo/standalone/combined.h
31

I'm not sure I completely understand this comment about constexpr initialization.
But if the problem is that gwp-asan is not zero initialized, then it seems really easy to fix. There are only two non-zero fields:

  • AdjustedSampleRate can be stored in a pre-decremented form. It is not used on the hot path.
  • GuardedPagePool - I don't see why it can not start as 0

Then we make GuardedAlloc a member of Allocator, and we are done. What am I missing?

cryptoad updated this revision to Diff 237974.Jan 14 2020, 7:31 AM

Disable GWP-ASan, FIXME added.

Unit tests: pass. 61822 tests passed, 0 failed and 781 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

This revision was automatically updated to reflect the committed changes.