Page MenuHomePhabricator

[Windows] Fix race condition between state changes
ClosedPublic

Authored by martin on May 21 2019, 3:06 AM.

Details

Summary

If the process is resumed before the state is changed to "running" there is a possibility (when single stepping) that the debugger stops and changes the state to "stopped" before it is first changed to "running". This causes the process to ignore the stop event (since the state did not change) which in turn leads the DebuggerThread to wait indefinitely for the exception predicate in HandleExceptionEvent.

Diff Detail

Repository
rL LLVM

Event Timeline

martin created this revision.May 21 2019, 3:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 21 2019, 3:07 AM

zturner is not a regular here anymore. I think he pops in from time-to-time, but I wouldn't depend on just him for a review.

Are you still wanting to land this? If you rebase and add me as a reviewer, I'd be happy to take a look.

martin updated this revision to Diff 210365.Jul 17 2019, 10:29 AM
martin added a reviewer: amccarth.

Rebased against master

zturner is not a regular here anymore. I think he pops in from time-to-time, but I wouldn't depend on just him for a review.

Are you still wanting to land this? If you rebase and add me as a reviewer, I'd be happy to take a look.

I unfortunately missed your reply since it was somehow marked as spam. But yes I would like to see this land in some shape or form. I was a bit unsure on how much to change, I thought about also moving the call to LLDB_LOG for instance, but I opted for the least disruptive change I could manage.

This change looks fine to me, but I wish there was a reliable way to test it. Is it a true race condition (e.g., because the unpredictability of thread scheduling), or is there a way to write a test that would always fail without this fix?

This change looks fine to me, but I wish there was a reliable way to test it. Is it a true race condition (e.g., because the unpredictability of thread scheduling), or is there a way to write a test that would always fail without this fix?

My understanding is that this is a race between the call to SetPrivateState(eStateRunning) in DoResume and the call to SetPrivateState(eStateStopped) in OnDebugException (on line 694). The former is supposed to take place before the latter and it often is, but when the latter is run first it leads the debugger to wait indefinitely. Because the second state change is ignored because the process is already in the stopped state.

I found this issue when I was using Visual Studio Code with the lldb-vscode debug adapter. Sometimes when I single stepped I would hit this issue. I could reliably trigger it by holding down F10 (step) in Visual Studio Code, it would only take a couple of seconds of rapid single stepping (roughly between 10 steps and 100 steps) before I hit it. However, I cannot trigger it with any code. For example, a simple while loop with an integer increment does not trigger it. But if I replace the integer increment with a function call I can trigger it within a couple of seconds. However, I am not familiar enough with LLDB to know if it would be possible to write a test that would reliably trigger it. My hunch is that it would be difficult.

amccarth accepted this revision.Jul 17 2019, 4:16 PM
This revision is now accepted and ready to land.Jul 17 2019, 4:16 PM

What is the next step to get this commited? Can you commit it?

Yes, I can submit it for you, probably in the next hour or two. Thanks for the patch!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 22 2019, 10:02 AM