This is an archive of the discontinued LLVM Phabricator instance.

[lldb/Reproducers] Fix replay for process attach workflows
ClosedPublic

Authored by JDevlieghere on Mar 9 2020, 3:02 PM.

Details

Summary

Support replaying debug sessions that attach to an existing process instead of lldb launching the inferior. Bypass the logic that looks for a process with the given name and use an arbitrary PID. The value of the PID doesn't matter as the gdb remote replay infrastructure intercepts the attach and pretends that we're connected to the original process.

Diff Detail

Event Timeline

JDevlieghere created this revision.Mar 9 2020, 3:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2020, 3:02 PM
Herald added a subscriber: teemperor. · View Herald Transcript

A more principled way to make this work would be to intercept (record) the Host::FindProcesses api. That way other functionality pertaining to running processes (e.g. the "platform process list" command) would also work. But if this is all you care about right now, then maybe this is fine...

The part that worries me more is the test. There are a lot of subtleties involved in making attach tests (and attach-by-name tests in particular) work reliably everywhere. I think this should be a dotest test, as there we already have some machinery to do these kinds of things (lldb_enable_attach, wait_for_file_on_target), and python is generally much better at complex control flow (I am very much against subshells and background processes in lit tests). Reproducers make this somewhat complicated because you cannot use the liblldb instance already loaded into the python process. But maybe you could run lldb in a subprocess similar to how the pexpect tests do it?

lldb/test/Shell/Reproducer/Inputs/sleep.c
5 ↗(On Diff #249226)

For attach to work reliably on linux, you need to ensure the process declares itself willing to be attached to. This is what the lldb_enable_attach macro in dotest inferiors does. Then you also need to ensure that the process has executed that statement before you attempt to attach. This is usually done via some pid file synchronization.

lldb/test/Shell/Reproducer/TestAttach.test
7 ↗(On Diff #249226)

How is this different from a plain %t/attach.out & ?

9 ↗(On Diff #249226)

Though normally determinism is good, in this case I think it is actually better to generate an unpredictable name for the process to avoid having the test be impacted by parallel test suite runs or leftover zombies. Other attach-by-name tests usually embed some a pid or a timestamp into the process name.

JDevlieghere planned changes to this revision.EditedMar 10 2020, 8:13 AM
JDevlieghere marked 2 inline comments as done.

A more principled way to make this work would be to intercept (record) the Host::FindProcesses api. That way other functionality pertaining to running processes (e.g. the "platform process list" command) would also work. But if this is all you care about right now, then maybe this is fine...

We could totally add a provider for that. I didn't because it seemed like overkill but if you're on board I also prefer that over a random PID.

The part that worries me more is the test. There are a lot of subtleties involved in making attach tests (and attach-by-name tests in particular) work reliably everywhere. I think this should be a dotest test, as there we already have some machinery to do these kinds of things (lldb_enable_attach, wait_for_file_on_target), and python is generally much better at complex control flow (I am very much against subshells and background processes in lit tests). Reproducers make this somewhat complicated because you cannot use the liblldb instance already loaded into the python process. But maybe you could run lldb in a subprocess similar to how the pexpect tests do it?

The problem is the SBDebugger::Initialize() that's called from the SWIG bindings, as soon as you import lldb it's already too late for the reproducers. I'm working on being able to capture/replay the test suite and currently I have the following hack:

if 'DOTEST_CAPTURE_PATH' in os.environ:
   SBReproducer.Capture(os.environ['DOTEST_CAPTURE_PATH'])
SBDebugger.Initialize()

If you're fine with having that in python.swig unconditionally we could make a dotest-test work.

lldb/test/Shell/Reproducer/TestAttach.test
7 ↗(On Diff #249226)

Should that work? That's the first thing I tried and lit complained about invalid syntax.

A more principled way to make this work would be to intercept (record) the Host::FindProcesses api. That way other functionality pertaining to running processes (e.g. the "platform process list" command) would also work. But if this is all you care about right now, then maybe this is fine...

We could totally add a provider for that. I didn't because it seemed like overkill but if you're on board I also prefer that over a random PID.

In general, I am in favor of doing the capture at the lowest level possible. For this particular feature/bug, it is overkill, but OTOH, this will also make it possible to support things other things without adding hacks into random pieces of code.

The part that worries me more is the test. There are a lot of subtleties involved in making attach tests (and attach-by-name tests in particular) work reliably everywhere. I think this should be a dotest test, as there we already have some machinery to do these kinds of things (lldb_enable_attach, wait_for_file_on_target), and python is generally much better at complex control flow (I am very much against subshells and background processes in lit tests). Reproducers make this somewhat complicated because you cannot use the liblldb instance already loaded into the python process. But maybe you could run lldb in a subprocess similar to how the pexpect tests do it?

The problem is the SBDebugger::Initialize() that's called from the SWIG bindings, as soon as you import lldb it's already too late for the reproducers. I'm working on being able to capture/replay the test suite and currently I have the following hack:

if 'DOTEST_CAPTURE_PATH' in os.environ:
   SBReproducer.Capture(os.environ['DOTEST_CAPTURE_PATH'])
SBDebugger.Initialize()

If you're fine with having that in python.swig unconditionally we could make a dotest-test work.

I'm not sure how that would help because for the test you need to run lldb both in capture and replay mode, and I don't think you can currently do that within a single process. It would be cool if that was possible, but even then we'd have the impendance mismatch because we'd need to run SBDebugger.Initialize inside a specific test method, whereas normally it gets run much earlier.

That's why I was talking about subprocesses in the previous patch. The test would only be responsible for building the inferior and driving the whole thing, while capture/replay would happen inside separate processes:

self.spawnSubproces(randomized_inferior_name, [token_path])
lldbutil.wait_for_file_on_target(token_path)
self.spawnSubprocess(lldbtest_config.lldbExec, ['--capture', reproducer, '-n', randomized_inferior_name, ...])
...
self.spawnSubprocess(lldbtest_config.lldbExec, ['--replay', reproducer])
lldb/test/Shell/Reproducer/TestAttach.test
7 ↗(On Diff #249226)

I've seen tests do that (RUN: setsid %run %t/LFSIGUSR -merge=1 -merge_control_file=%t/MCF %t/C1 %t/C2 2>%t/log & export PID=$! in ./compiler-rt/test/fuzzer/merge-sigusr.test), but as I said, I don't think that is a good idea, so I don't really want to encourange it...

A more principled way to make this work would be to intercept (record) the Host::FindProcesses api. That way other functionality pertaining to running processes (e.g. the "platform process list" command) would also work. But if this is all you care about right now, then maybe this is fine...

We could totally add a provider for that. I didn't because it seemed like overkill but if you're on board I also prefer that over a random PID.

In general, I am in favor of doing the capture at the lowest level possible. For this particular feature/bug, it is overkill, but OTOH, this will also make it possible to support things other things without adding hacks into random pieces of code.

The part that worries me more is the test. There are a lot of subtleties involved in making attach tests (and attach-by-name tests in particular) work reliably everywhere. I think this should be a dotest test, as there we already have some machinery to do these kinds of things (lldb_enable_attach, wait_for_file_on_target), and python is generally much better at complex control flow (I am very much against subshells and background processes in lit tests). Reproducers make this somewhat complicated because you cannot use the liblldb instance already loaded into the python process. But maybe you could run lldb in a subprocess similar to how the pexpect tests do it?

The problem is the SBDebugger::Initialize() that's called from the SWIG bindings, as soon as you import lldb it's already too late for the reproducers. I'm working on being able to capture/replay the test suite and currently I have the following hack:

if 'DOTEST_CAPTURE_PATH' in os.environ:
   SBReproducer.Capture(os.environ['DOTEST_CAPTURE_PATH'])
SBDebugger.Initialize()

If you're fine with having that in python.swig unconditionally we could make a dotest-test work.

I'm not sure how that would help because for the test you need to run lldb both in capture and replay mode, and I don't think you can currently do that within a single process. It would be cool if that was possible, but even then we'd have the impendance mismatch because we'd need to run SBDebugger.Initialize inside a specific test method, whereas normally it gets run much earlier.

That's why I was talking about subprocesses in the previous patch. The test would only be responsible for building the inferior and driving the whole thing, while capture/replay would happen inside separate processes:

Ah, I misunderstood subprocess as another Python process, yeah launching the driver should work. Thanks for the clarification.

  • Add ProcessInfo provider.
  • Rewrite test as an dotest-test.

A bunch of comments but nothing really major. Maybe it would be nice to put the code for yamlification of ProcessInfo into a separate patch?

lldb/source/Commands/CommandObjectReproducer.cpp
544–557

Maybe some kind of a utility function to convert a file to an object?
template<typename T> Expected<T> readAsYaml(StringRef filename) ?

lldb/source/Host/macosx/objcxx/Host.mm
595

This means that every implementation of FindProcesses will need to introduce this bolierplate. We should put this into common code somehow. One way to do that would be to rename all the platform-specific implementations to something like DoFindProcesses, and then implement FindProcesses source/Host/common/Host.cpp to handle the delegation & reproducer logic.

lldb/source/Utility/FileSpec.cpp
543 ↗(On Diff #249555)

There's more to FileSpecs than just the path -- they also hold the path syntax and the "case-sensitive" bit. Kind of not needed for your current goal, but maybe we should add those too while we're here?

lldb/source/Utility/ProcessInfo.cpp
400–403

You don't actually have to provide these functions if they are not going to do anything.

417–430

random thought: Would any of this be simpler if this wasn't a "multi" provider but rather stored all of the responses as a sequence in a single file?

436

what's the type of this?

lldb/test/API/functionalities/reproducers/attach/TestReproducerAttach.py
31

You still need to do the wait_for_file_on_target dance here to ensure that lldb_enable_attach is executed before we actually attach. One example of that is in test/API/python_api/hello_world/main.c.

lldb/test/API/functionalities/reproducers/attach/main.cpp
11

You probably copied this from some existing test, but I'd say this is putting unnecessary load on the system. For this use case even a 1-second sleep would be perfectly fine.

Address code review feedback.

JDevlieghere marked 7 inline comments as done.Mar 11 2020, 2:49 PM
JDevlieghere added inline comments.
lldb/source/Utility/ProcessInfo.cpp
417–430

Maybe/Probably? I'm not sure. But even if it were a bit simpler, I think it's better to reuse the existing multi-provider for consistency.

labath accepted this revision.Mar 13 2020, 4:38 AM
labath added inline comments.
lldb/test/API/functionalities/reproducers/attach/TestReproducerAttach.py
35

s/patch/path

53–54

self.assertIn(needle, haystack)

66–75

Would it be possible to run this in the context of the current process (via self.expect?)

74–75

I guess you meant assertIn here too

77–82

The reproducer dir will still linger on if the test fails for any reason. If you just put this in the build directory (by (ab)using self.getBuildArtifact), then maybe you don't need to clean it up, as it will be automatically deleted the next time the test suite runs...

This revision is now accepted and ready to land.Mar 13 2020, 4:38 AM
This revision was automatically updated to reflect the committed changes.
JDevlieghere marked an inline comment as done.