This is an archive of the discontinued LLVM Phabricator instance.

[lldb-vscode] Fix a race in test_extra_launch_commands
ClosedPublic

Authored by labath on Nov 12 2019, 8:43 AM.

Details

Summary

The test used a non-stopping "run" command to launch the process. This
is different from the regular launch with no extra launch commands,
which uses eLaunchFlagStopAtEntry to ensure that the process stops
straight away.

I'm not really sure what's supposed to happen in non-stop-at-entry mode,
or if that's even supported, but what ended up happening was the launch
packet got a reply while the process was running. Then the test case did
a continue_to_next_stop(), which queued a *second* resume request
(along with the internal "resumes" which were being issued as a part of
normal process startup). These two resumes ended up chasing each other's
tails inside lldb in a way which produced hilarious log traces.
Surprisingly, the test ended up passing most of the time, but it did
cause spurious failures when the test seemed to miss a breakpoint.

This changes the test to use stop-at-entry mode in the manual launch
sequence too, which seems to be enough to make the test pass reliably.

Event Timeline

labath created this revision. Nov 12 2019, 8:43 AM
Herald added a project: Restricted Project. Nov 12 2019, 8:43 AM
jankratochvil accepted this revision. Nov 12 2019, 9:16 AM

I agree - as the test currently does:

self.verify_commands('launchCommands', output, launchCommands)
# Verify the "stopCommands" here
self.continue_to_next_stop()

BTW I did not have a problem with this testcase; my problem was with TestVSCode_attach, but I never finished that patch: https://people.redhat.com/jkratoch/TestVSCode_attach.patch

This revision is now accepted and ready to land. Nov 12 2019, 9:16 AM

It looks like all the continue routines in lldbvscode_testcase.py use self.vscode.wait_for_stopped() after setting the target going. Would that fix the problem here? Without knowing anything about the protocol VSCode is using, I can't say that's right. But if launch is asynchronous, as it looks like it is, it seems reasonable to test whether wait_for_stopped() also works for launch. Be weird if it didn't...

--stop-at-entry only really works for Darwin since its posix_spawn has a flag that will stop it at the entry point before anything executes. On linux, does this flag do anything?

lldb/packages/Python/lldbsuite/test/tools/lldb-vscode/launch/TestVSCode_launch.py:355

This can probably still be racy on linux, since I believe --stop-at-entry will launch the process and stop it as soon as possible (unlike a real posix_spawn flag on Darwin...). I wonder if we could do a python call like:

script lldb.target.Launch(...)

since I believe the python interpreter is synchronous by default?

--stop-at-entry only really works for Darwin

That's not true. :) The reason this works on linux (and elsewhere) is that we don't use posix_spawn, but instead we launch the process (fork+exec) already under the control of ptrace. Ptrace automatically stops the process on execve(), so we are able to catch the process before it executes any (userspace) instructions.

It looks like all the continue routines in lldbvscode_testcase.py use self.vscode.wait_for_stopped() after setting the target going. Would that fix the problem here? Without knowing anything about the protocol VSCode is using, I can't say that's right. But if launch is asynchronous, as it looks like it is, it seems reasonable to test whether wait_for_stopped() also works for launch. Be weird if it didn't...

I think that would fix it, but I don't know if that would test the thing that it should test (I don't know much about the LSP protocol either). I made this fix going off of the fact that lldb-vscode uses eLaunchFlagStopAtEntry in a "regular" launch, and that the "extra launch commands" launch is just supposed to be a fancier form of a "regular" launch.

BTW I did not have a problem with this testcase; my problem was with TestVSCode_attach, but I never finished that patch: https://people.redhat.com/jkratoch/TestVSCode_attach.patch

Interesting. I have seen that test fail in the past, but it doesn't seem to fail now. I don't think anything substantial has changed, so likely just random perturbations made it not fail on my end. I'll keep that patch in mind if I do see it fail again.

This revision was automatically updated to reflect the committed changes.