If the process is resumed before the state is changed to "running" there is a possibility (when single stepping) that the debugger stops and changes the state to "stopped" before it is first changed to "running". This causes the process to ignore the stop event (since the state did not change) which in turn leads the DebuggerThread to wait indefinitely for the exception predicate in HandleExceptionEvent.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
zturner is not a regular here anymore. I think he pops in from time-to-time, but I wouldn't depend on just him for a review.
Are you still wanting to land this? If you rebase and add me as a reviewer, I'd be happy to take a look.
I unfortunately missed your reply since it was somehow marked as spam. But yes I would like to see this land in some shape or form. I was a bit unsure on how much to change, I thought about also moving the call to LLDB_LOG for instance, but I opted for the least disruptive change I could manage.
This change looks fine to me, but I wish there was a reliable way to test it. Is it a true race condition (e.g., because the unpredictability of thread scheduling), or is there a way to write a test that would always fail without this fix?
My understanding is that this is a race between the call to SetPrivateState(eStateRunning) in DoResume and the call to SetPrivateState(eStateStopped) in OnDebugException (on line 694). The former is supposed to take place before the latter and it often is, but when the latter is run first it leads the debugger to wait indefinitely. Because the second state change is ignored because the process is already in the stopped state.
I found this issue when I was using Visual Studio Code with the lldb-vscode debug adapter. Sometimes when I single stepped I would hit this issue. I could reliably trigger it by holding down F10 (step) in Visual Studio Code, it would only take a couple of seconds of rapid single stepping (roughly between 10 steps and 100 steps) before I hit it. However, I cannot trigger it with any code. For example, a simple while loop with an integer increment does not trigger it. But if I replace the integer increment with a function call I can trigger it within a couple of seconds. However, I am not familiar enough with LLDB to know if it would be possible to write a test that would reliably trigger it. My hunch is that it would be difficult.
Yes, I can submit it for you, probably in the next hour or two. Thanks for the patch!