For performance reasons the reproducers don't copy the files captured by the file collector until we decide that the reproducer needs to be kept (i.e. generated). This is a problematic when LLDB crashes and we need to do this signal-unsafe work in the signal handler. This patch uses the same trick as clang and reinvokes itself to copy the files out of process.
Details
Diff Detail
Event Timeline
Hm... Are you sure that this will help with anything? IIUC, the serialization of the VFS file map still happens in the context of the crashing process. This requires walking a bunch of data structures with liberally-asserting code, which will abort on any inconsistency it runs into. If *that* succeeds, I'd expect that the copying of the files will succeed too.
I think it would be better if we moved the vfs map construction into the other process too. Maybe via something like this:
- The first process opens a "log" file, which will contain the list of files to be copied. For added safety the file should be opened append-only.
- As it works, it writes file names to this file.
- Upon a crash, the re-execed process reads this file, constructs the vfs map, and copies all necessary files. It should be prepared to handle/ignore any garbled entries, and the format of the file should make this easy. (Since the log file must by definition contain enough information to recreate the vfs map, it might be simpler to not write out the map, but just always recreate it in memory when replaying.)
lldb/tools/driver/Driver.cpp | ||
---|---|---|
857 | I guess this won't work if there are spaces anywhere in the path.. | |
lldb/tools/driver/Options.td | ||
235 | Should this be a hidden argument? |
I only have a handful of crash logs to base myself off. Currently the walking and the copying are intertwined, so I can't say for sure. My idea was to give this a try and see where that got us and make more invasive changes (like what you proposed) if needed. But since you sound pretty convinced that we'll need it anyway I will extend the patch.
I think it would be better if we moved the vfs map construction into the other process too. Maybe via something like this:
- The first process opens a "log" file, which will contain the list of files to be copied. For added safety the file should be opened append-only.
- As it works, it writes file names to this file.
- Upon a crash, the re-execed process reads this file, constructs the vfs map, and copies all necessary files. It should be prepared to handle/ignore any garbled entries, and the format of the file should make this easy. (Since the log file must by definition contain enough information to recreate the vfs map, it might be simpler to not write out the map, but just always recreate it in memory when replaying.)
Yes, that would be similar to how we immediately flush GDB remote packets to disk. It would require adding some "immediate" mode to the FileCollector and an API to convert between the formats. I think the new format could be just host paths separated by newlines.
lldb/tools/driver/Options.td | ||
---|---|---|
235 | Yep |
- Implement a FlushingFileCollector that appends paths to a file
- Create the mapping out-of-process
- Add an InProcessFinalizer that uses the current reproducer instance to copy over the files in-process
Well.. I'd expect that the reason why saving of the reproducer fails is because we corrupt the heap/stack before we finally crash. Copying files is basically a bunch of syscalls (though some of our abstractions get in the way), so it wouldn't normally be affected by this. Our ability to _find_ the files that we're supposed to copy is a different matter...
That said, this doesn't seem significantly more invasive than the previous approach.
I think it would be better if we moved the vfs map construction into the other process too. Maybe via something like this:
- The first process opens a "log" file, which will contain the list of files to be copied. For added safety the file should be opened append-only.
- As it works, it writes file names to this file.
- Upon a crash, the re-execed process reads this file, constructs the vfs map, and copies all necessary files. It should be prepared to handle/ignore any garbled entries, and the format of the file should make this easy. (Since the log file must by definition contain enough information to recreate the vfs map, it might be simpler to not write out the map, but just always recreate it in memory when replaying.)
Yes, that would be similar to how we immediately flush GDB remote packets to disk. It would require adding some "immediate" mode to the FileCollector and an API to convert between the formats. I think the new format could be just host paths separated by newlines.
A nul character is better than a newline, as paths can contain newline characters.
lldb/include/lldb/Utility/Reproducer.h | ||
---|---|---|
241–256 | This appears to be over-engineered. Is there a reason plain functions won't do? llvm::Error FinalizeReproducer(Loader&); llvm::Error FinalizeReproducerInProcess(FileSpec); // calls the first one That said, I'm not sure "InProcess" captures the distinction very well, as both are "in process" at the point where they are used. Maybe name them according to what they do (as in, whether they generate a full reproducer, or one that needs to be fixed up afterwards) | |
lldb/source/Utility/Reproducer.cpp | ||
388 | Is this deliberately ignoring a possible error? | |
lldb/source/Utility/ReproducerProvider.cpp | ||
53 ↗ | (On Diff #299214) | So, this reopens the file every time a file is added? That is suboptimal... Why not just open it once and keep it that way? That would also enable handling of error during opening of files. |
65–67 ↗ | (On Diff #299214) | if(!ec) os<<dir; ? |
lldb/test/Shell/Reproducer/TestCaptureEnvOverride.test | ||
1 | Any chance of testing the finalization code by letting it loose on an hand-generated unfinalized reproducer? | |
lldb/tools/driver/Driver.cpp | ||
734–735 | this is really cryptic. Maybe renaming cmd to finalize_cmd would help? |
Looks good.
lldb/source/Utility/Reproducer.cpp | ||
---|---|---|
371 | I'd expect this to contain more than 20 paths in a typical session, so maybe just use zero? | |
lldb/source/Utility/ReproducerProvider.cpp | ||
67 ↗ | (On Diff #300051) | Flushing also makes things slower, so there's a tradeoff between speed an accuracy to consider. Theoretically we could play with the size of the stream buffer to fine tune that... |
lldb/test/Shell/Reproducer/TestFinalize.test | ||
5–6 ↗ | (On Diff #300051) | s/printf/echo. printf is not needed here, and I'm not sure it would work on windows. |
This appears to be over-engineered. Is there a reason plain functions won't do?
That said, I'm not sure "InProcess" captures the distinction very well, as both are "in process" at the point where they are used. Maybe name them according to what they do (as in, whether they generate a full reproducer, or one that needs to be fixed up afterwards)