There were a number of issues about the way I have designed this test originally:
- it relied on single-stepping through large parts of code, which was slow and unreliable
- the threading libraries interfered with the exact thing we wanted to test
For this reason, I have rewritted the test using low-level linux api, which allows the test to be
much more focused. The functionality for other platforms will need to be tested separately.