The race boiled down to this:
If a test worker queue is able to run the test inferior and clean up before the dosep.py listener socket is spun up, and the worker queue is the last one (as would be the case when there's only one test rerunning in the rerun queue), then the test suite will exit the main loop before having a chance to process any test events coming from the test inferior or the worker queue job control. I found this race to be far more likely on fast hardware. Our Linux CI is one such example. While it will show up primarily during meta test events generated by a worker thread when a test inferior times out or exits with an exceptional exit (e.g. seg fault), it only requires that the OS takes longer to hook up the listener socket than it takes for the final test inferior and worker thread to shut down.