This is an archive of the discontinued LLVM Phabricator instance.

[sanitizers_common] when spawning a subprocess for symbolizers, use posix_spawn instead of fork()
Needs RevisionPublic

Authored by alex on Jun 21 2018, 12:51 PM.

Details

Summary

Fixes: https://github.com/google/sanitizers/issues/979

Note that I can't seem to get the tests running locally, so I haven't been able to properly verify this!

Event Timeline

alex created this revision.Jun 21 2018, 12:51 PM
eugenis requested changes to this revision.Jun 21 2018, 5:16 PM

It appears that Android plans to support posix_spawn in P release:

https://android.googlesource.com/platform/bionic/+/master/libc/libc.map.txt#1385

This code needs to work in M, at least.

Sorry I did not notice this before. Try vfork?

This revision now requires changes to proceed.Jun 21 2018, 5:16 PM
alex added a comment.Jun 22 2018, 8:09 AM

Sorry I did not notice this before. Try vfork?

Simply using vfork instead of fork is probably a no-go: the vfork's man page (on macOS, I haven't verified the panoply of other BSDs) says that EINVAL is returned when "A system call other than _exit() or execve() (or libc functions that make no system calls other than those) is called following calling a vfork() call." That is, on platforms with this constraint using vfork where we currently use fork will fail, because we use other syscalls to do things like setup the file descriptor table.

  • One option would be to simply use vfork on Linux only (where it does not have this restriction), and fork on all other platforms; this probably has the smallest diff.
  • Another option would be to use posix_spawn on every platform except Android older than P and retain the existing fork-based implementation for them; this has a larger diff, but maybe in some glorious future we can delete the fall-back implementation for Android and end up with less code. This also let's us avoid calling vfork directly, which I have confidence is safe here, but which nonetheless has relatively fiddly semantics.

Do you have a preference between these two?

Simply using vfork instead of fork is probably a no-go: the vfork's man page (on macOS, I haven't verified the panoply of other BSDs) says that EINVAL is returned when "A system call other than _exit() or execve() (or libc functions that make no system calls other than those) is called following calling a vfork() call." That is, on platforms with this constraint using vfork where we currently use fork will fail, because we use other syscalls to do things like setup the file descriptor table.

Really? Is that actually the case in practice, or is the man page too cautious? Cloning file descriptor tables should not make vfork any more expensive, and without that it is kinda useless. That's a pity.

I'd prefer to go with the smallest diff for now.

alex added a comment.Jun 22 2018, 8:45 AM

Good call actually checking the manpage... both dup2 and close seem to work perfectly fine in the child after vfork. I'll update the patch accordingly.

alex updated this revision to Diff 152483.Jun 22 2018, 8:59 AM
  • Simply use vfork everywhere, the macOS manpage does not describe it's semantics correctly.
alex updated this revision to Diff 152485.Jun 22 2018, 9:01 AM
  • Query for sysconf open max before vforking
eugenis accepted this revision.Jun 22 2018, 9:04 AM

Looks great.

This revision is now accepted and ready to land.Jun 22 2018, 9:04 AM
alex added a comment.Jun 22 2018, 9:11 AM

Fantastic, thanks for the review. (As an FYI I'm not an llvm committer, so I can't land this myself).

No problem, I can land it for you. Thanks for the patch!

eugenis requested changes to this revision.Jun 22 2018, 9:36 AM

Tests are crashing with corrupted stack.

lib/sanitizer_common/sanitizer_linux.cc
800 ↗(On Diff #152485)

You can't return from a function that calls vfork() in the child process. It will destroy the stack frame in the parent process.

This revision now requires changes to proceed.Jun 22 2018, 9:36 AM
kubamracek requested changes to this revision.Jun 22 2018, 10:26 AM

I've talked to our runtime and kernel folks and we should not be using vfork on Darwin. Even when the behavior you tested seems to work today, it might break tomorrow. Please opt out Darwin from vfork.

Also, it seems best to use posix_spawn wherever it's available. If Android doesn't have it, we should only fallback to fork/vfork on Android.

OK. It looks like internal_vfork has to be implemented in platform-specific assembly, which is was to much hassle.
Let's do posix_spawn, and fall back to fork on android < P.

Also, FYI, this is a related attempt of using posix_spawn on Darwin: https://reviews.llvm.org/D40032.

alex updated this revision to Diff 152516.Jun 22 2018, 10:53 AM
  • Switch back to using posix_spawn everywhere but Android

Thanks!

As a suggestion (perhaps for a future patch, not required to fix now): Could we detect the Android version at runtime and call posix_spawn on newer Android versions (where we know it's available)?

alex added a comment.Jun 22 2018, 12:21 PM

Yes, I imagine such a thing is possible. It may also be detect at compile time which version of the SDK it's targetting, but I don't know the Android SDK (NDK?) well enough to say how.

alex updated this revision to Diff 152689.Jun 25 2018, 7:33 AM
  • Use posix_spawn based implementation on r28+ for Android
eugenis added inline comments.Jun 25 2018, 1:00 PM
lib/sanitizer_common/sanitizer_posix_libcdep.cc
443

This check can be done at runtime by using weak declarations.

459

Do you need to destroy the file_actions object?

alex marked an inline comment as done.Jun 26 2018, 10:35 AM
alex added inline comments.
lib/sanitizer_common/sanitizer_posix_libcdep.cc
443

Hmm, it's kind of a pain because there's several functions involved, and also struct declarations. How important is it to do this at runtime?

alex updated this revision to Diff 152918.Jun 26 2018, 10:35 AM
  • Simply use vfork everywhere, the macOS manpage does not describe it's semantics correctly.
  • Query for sysconf open max before vforking
  • Switch back to using posix_spawn everywhere but Android
  • Use posix_spawn based implementation on r28+ for Android
  • Destroy file_actions

These are pointers to opaque structs, aka void *.
NDK packs a single asan library built for the lowest supported API level. It will basically never reach 28.

On the other hand, no one else has complained about fork() so far.

alex added a comment.Jun 26 2018, 2:05 PM

Would you be ok landing this with the current approach (+/- any other feedback of course), and filing a follow-up bug to switch it to a runtime check?

eugenis accepted this revision.Jun 26 2018, 3:57 PM

OK, sounds fine.

lib/sanitizer_common/sanitizer_posix_libcdep.cc
496

this could be moved to file_closer

alex updated this revision to Diff 153084.Jun 27 2018, 7:26 AM
  • Use at_scope_exit to simplify destroying the posix_spawn_file_actions
alex marked an inline comment as done.Jun 27 2018, 7:27 AM
alex added a comment.Jun 29 2018, 10:21 AM

Do you need any additional changes from me?

I'm fine with this.
Kuba?

kubamracek accepted this revision.Jun 29 2018, 10:26 AM

LGTM. Thanks for addressing the comments!

This revision is now accepted and ready to land.Jun 29 2018, 10:26 AM
eugenis requested changes to this revision.Jun 29 2018, 10:55 AM

Tests hang with this change when output is redirected to a pipe. The test binary exits OK, but it looks like the process on the receiving end never gets an EOF.
Sorry, I don't have time to think about this now.
Please learn how to run tests and debug this.

This revision now requires changes to proceed.Jun 29 2018, 10:55 AM