This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Move copying of reproducer out of process
ClosedPublic

Authored by JDevlieghere on Oct 16 2020, 4:01 PM.

Details

Summary

For performance reasons the reproducers don't copy the files captured by the file collector until we decide that the reproducer needs to be kept (i.e. generated). This is a problematic when LLDB crashes and we need to do this signal-unsafe work in the signal handler. This patch uses the same trick as clang and reinvokes itself to copy the files out of process.

Diff Detail

Event Timeline

JDevlieghere created this revision.Oct 16 2020, 4:01 PM
JDevlieghere requested review of this revision.Oct 16 2020, 4:01 PM

Hm... Are you sure that this will help with anything? IIUC, the serialization of the VFS file map still happens in the context of the crashing process. This requires walking a bunch of data structures with liberally-asserting code, which will abort on any inconsistency it runs into. If *that* succeeds, I'd expect that the copying of the files will succeed too.

I think it would be better if we moved the vfs map construction into the other process too. Maybe via something like this:

  • The first process opens a "log" file, which will contain the list of files to be copied. For added safety the file should be opened append-only.
  • As it works, it writes file names to this file.
  • Upon a crash, the re-execed process reads this file, constructs the vfs map, and copies all necessary files. It should be prepared to handle/ignore any garbled entries, and the format of the file should make this easy. (Since the log file must by definition contain enough information to recreate the vfs map, it might be simpler to not write out the map, but just always recreate it in memory when replaying.)
lldb/tools/driver/Driver.cpp
857

I guess this won't work if there are spaces anywhere in the path..

lldb/tools/driver/Options.td
235

Should this be a hidden argument?

JDevlieghere planned changes to this revision.Oct 19 2020, 8:45 AM

Hm... Are you sure that this will help with anything? IIUC, the serialization of the VFS file map still happens in the context of the crashing process. This requires walking a bunch of data structures with liberally-asserting code, which will abort on any inconsistency it runs into. If *that* succeeds, I'd expect that the copying of the files will succeed too.

I only have a handful of crash logs to base myself off. Currently the walking and the copying are intertwined, so I can't say for sure. My idea was to give this a try and see where that got us and make more invasive changes (like what you proposed) if needed. But since you sound pretty convinced that we'll need it anyway I will extend the patch.

I think it would be better if we moved the vfs map construction into the other process too. Maybe via something like this:

  • The first process opens a "log" file, which will contain the list of files to be copied. For added safety the file should be opened append-only.
  • As it works, it writes file names to this file.
  • Upon a crash, the re-execed process reads this file, constructs the vfs map, and copies all necessary files. It should be prepared to handle/ignore any garbled entries, and the format of the file should make this easy. (Since the log file must by definition contain enough information to recreate the vfs map, it might be simpler to not write out the map, but just always recreate it in memory when replaying.)

Yes, that would be similar to how we immediately flush GDB remote packets to disk. It would require adding some "immediate" mode to the FileCollector and an API to convert between the formats. I think the new format could be just host paths separated by newlines.

lldb/tools/driver/Options.td
235

Yep

  • Implement a FlushingFileCollector that appends paths to a file
  • Create the mapping out-of-process
  • Add an InProcessFinalizer that uses the current reproducer instance to copy over the files in-process

Hm... Are you sure that this will help with anything? IIUC, the serialization of the VFS file map still happens in the context of the crashing process. This requires walking a bunch of data structures with liberally-asserting code, which will abort on any inconsistency it runs into. If *that* succeeds, I'd expect that the copying of the files will succeed too.

I only have a handful of crash logs to base myself off. Currently the walking and the copying are intertwined, so I can't say for sure. My idea was to give this a try and see where that got us and make more invasive changes (like what you proposed) if needed. But since you sound pretty convinced that we'll need it anyway I will extend the patch.

Well.. I'd expect that the reason why saving of the reproducer fails is because we corrupt the heap/stack before we finally crash. Copying files is basically a bunch of syscalls (though some of our abstractions get in the way), so it wouldn't normally be affected by this. Our ability to _find_ the files that we're supposed to copy is a different matter...

That said, this doesn't seem significantly more invasive than the previous approach.

I think it would be better if we moved the vfs map construction into the other process too. Maybe via something like this:

  • The first process opens a "log" file, which will contain the list of files to be copied. For added safety the file should be opened append-only.
  • As it works, it writes file names to this file.
  • Upon a crash, the re-execed process reads this file, constructs the vfs map, and copies all necessary files. It should be prepared to handle/ignore any garbled entries, and the format of the file should make this easy. (Since the log file must by definition contain enough information to recreate the vfs map, it might be simpler to not write out the map, but just always recreate it in memory when replaying.)

Yes, that would be similar to how we immediately flush GDB remote packets to disk. It would require adding some "immediate" mode to the FileCollector and an API to convert between the formats. I think the new format could be just host paths separated by newlines.

A nul character is better than a newline, as paths can contain newline characters.

lldb/include/lldb/Utility/Reproducer.h
241–256

This appears to be over-engineered. Is there a reason plain functions won't do?

llvm::Error FinalizeReproducer(Loader&);
llvm::Error FinalizeReproducerInProcess(FileSpec); // calls the first one

That said, I'm not sure "InProcess" captures the distinction very well, as both are "in process" at the point where they are used. Maybe name them according to what they do (as in, whether they generate a full reproducer, or one that needs to be fixed up afterwards)

lldb/source/Utility/Reproducer.cpp
388

Is this deliberately ignoring a possible error?

lldb/source/Utility/ReproducerProvider.cpp
53 ↗(On Diff #299214)

So, this reopens the file every time a file is added? That is suboptimal... Why not just open it once and keep it that way? That would also enable handling of error during opening of files.

65–67 ↗(On Diff #299214)

if(!ec) os<<dir; ?

lldb/test/Shell/Reproducer/TestCaptureEnvOverride.test
1

Any chance of testing the finalization code by letting it loose on an hand-generated unfinalized reproducer?

lldb/tools/driver/Driver.cpp
734–735

this is really cryptic. Maybe renaming cmd to finalize_cmd would help?

JDevlieghere marked 6 inline comments as done.

Address Pavel's code review feedback

labath accepted this revision.Oct 23 2020, 1:48 AM

Looks good.

lldb/source/Utility/Reproducer.cpp
371

I'd expect this to contain more than 20 paths in a typical session, so maybe just use zero?

lldb/source/Utility/ReproducerProvider.cpp
67 ↗(On Diff #300051)

Flushing also makes things slower, so there's a tradeoff between speed an accuracy to consider. Theoretically we could play with the size of the stream buffer to fine tune that...

lldb/test/Shell/Reproducer/TestFinalize.test
5–6 ↗(On Diff #300051)

s/printf/echo. printf is not needed here, and I'm not sure it would work on windows.

This revision is now accepted and ready to land.Oct 23 2020, 1:48 AM
JDevlieghere marked an inline comment as done.Oct 23 2020, 8:58 AM
JDevlieghere added inline comments.
lldb/source/Utility/Reproducer.cpp
388

It was because we were only opening/creating the file when adding to it but that's no longer the case.

lldb/test/Shell/Reproducer/TestFinalize.test
5–6 ↗(On Diff #300051)

It was to avoid the newline, but I can pass -n.

This revision was automatically updated to reflect the committed changes.
JDevlieghere marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptOct 23 2020, 12:34 PM