This is an archive of the discontinued LLVM Phabricator instance.

Make lit.py abort_now() kill the worker process more precisely
AbandonedPublic

Authored by porglezomp on Jul 29 2019, 3:25 PM.

Details

Summary

Previously, abort_now in lit.py was killing the entire process group if you weren't on Windows, using os.kill(0, 9). This seems to immediately kill the main process in addition to the worker, at least on macOS. Just killing the worker process explicitly by PID still has the process cleaning up without printing everything, and now it exits with the expected exit code 2 instead of the 137 from SIGKILL.

I'm interested in this change as a first step to letting lit print out a partial report when it's interrupted mid-run, which I've wanted while making changes to particularly slow-running test setups.

Diff Detail

Repository
rL LLVM

Event Timeline

porglezomp created this revision.Jul 29 2019, 3:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2019, 3:25 PM

@porglezomp I think killing the process group is the intention. Shell tests can spawn many child processes and we want to make sure they get killed when abort_now() gets called. If you just kill the worker process you won't kill the child processes spawned by shell tests. This could cause zombie processes to be left lying around. For this reason I don't think we can land this patch.

delcypher requested changes to this revision.Jul 29 2019, 3:50 PM
delcypher added a reviewer: delcypher.
This revision now requires changes to proceed.Jul 29 2019, 3:51 PM

@porglezomp Just to add a little more context here killing processes in lit has been a source of trouble for me and is part of why I never landed https://reviews.llvm.org/D47210 . We don't have a mechanism for tracking all the processes spawned during tests and when I implemented the --timeout flag I had to implement a hack where we try walk the process tree to kill all the processes spawned by a shell test.
Multiprocessing seems to complicate things because the parent process will just re-spawn a worker process if it's killed.

porglezomp abandoned this revision.Jan 20 2021, 12:25 PM