This is an archive of the discontinued LLVM Phabricator instance.

posix_spawn should retry upon EINTR
ClosedPublic

Authored by jfb on Apr 24 2019, 2:58 PM.

Details

Summary

We've seen cases of bots failing with:

clang: error: unable to execute command: posix_spawn failed: Interrupted system call

Add a small retry loop to posix_spawn in case this happens. Don't retry too much in case there's some systemic problem going on, but retry a few times.
rdar://problem/50181448

Event Timeline

jfb created this revision. Apr 24 2019, 2:58 PM
jkorous accepted this revision. Apr 24 2019, 3:22 PM

Hmm, we have this utility function RetryAfterSignal() but it doesn't have maxRetries.

http://llvm.org/doxygen/namespacellvm_1_1sys.html#aea8b04d954b2cebd8a26d5d712634312

If you feel like it, you could make it more generic and use it here.

Anyway, the current form LGTM.

lib/Support/Unix/Program.inc
257

Nit: shouldn't it be <= or s/retries/tries/?

This revision is now accepted and ready to land. Apr 24 2019, 3:22 PM
jfb marked 2 inline comments as done. Apr 24 2019, 3:37 PM

> Hmm, we have this utility function RetryAfterSignal() but it doesn't have maxRetries.
>
> http://llvm.org/doxygen/namespacellvm_1_1sys.html#aea8b04d954b2cebd8a26d5d712634312

That one won't work because posix_spawn doesn't set errno; it returns the error code directly.
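
To illustrate the errno point, here is a minimal sketch of an errno-style retry helper. The name retryOnErrno is hypothetical, and the body only approximates what a helper such as RetryAfterSignal() does; it is not copied from LLVM. It retries while the call returns a failure sentinel and errno is EINTR.

```cpp
#include <cerrno>

// Hypothetical errno-style helper (an assumption for illustration, not
// LLVM's code): keeps calling while the result equals the failure
// sentinel AND errno reports EINTR.
template <typename Fn>
int retryOnErrno(int FailValue, Fn Call) {
  int Res;
  do {
    errno = 0;      // Clear stale errno before each attempt.
    Res = Call();
  } while (Res == FailValue && errno == EINTR);
  return Res;
}
```

A callable that follows the posix_spawn convention (returning EINTR as its result and leaving errno alone) is invoked exactly once by this helper: the result is not the sentinel and errno is still 0, so the loop condition is false and EINTR is handed straight back to the caller without a retry.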

lib/Support/Unix/Program.inc
257

Say maxRetries is 1. First we try; if it fails, we retry because 0 < 1; if it fails again, we don't retry because 1 < 1 is false. So we got one try and one retry. I think you're counting the initial try as a retry whereas I'm not :-)
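
The counting described above can be sketched as follows. This is an illustration under assumptions, not the committed patch: callWithBoundedRetry and the Attempts out-parameter are hypothetical names, and the callable is assumed to return 0 on success or an errno-style code directly, as posix_spawn does.

```cpp
#include <cerrno>

// Hypothetical helper: Call returns 0 on success or an error code
// directly. Only EINTR is retried, at most MaxRetries times after the
// initial attempt, so the total number of calls is at most
// 1 + MaxRetries. Attempts reports how many calls were made.
template <typename Fn>
int callWithBoundedRetry(Fn Call, int MaxRetries, int &Attempts) {
  int Err;
  int Retries = 0;
  Attempts = 0;
  do {
    Err = Call();
    ++Attempts;
    // Post-increment: the comparison sees the number of retries
    // already performed (0 < 1 on the first failure, then 1 < 1).
  } while (Err == EINTR && Retries++ < MaxRetries);
  return Err;
}
```

With MaxRetries set to 1 and a call that always returns EINTR, this makes two calls in total (the initial try plus one retry) and then returns EINTR. With MaxRetries set to 0 it makes a single call and never retries.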

Bigcheese accepted this revision. Apr 24 2019, 3:41 PM
This revision was automatically updated to reflect the committed changes.
jfb marked an inline comment as done.