Page MenuHomePhabricator

Allow partial UUID matching in Minidump core file plug-in

Authored by clayborg on Mar 29 2019, 11:00 AM.



Breakpad had bugs in earlier versions where it would take a 20 byte ELF build ID and put it into the minidump file as a 16 byte PDB70 UUID with an age of zero. This would make it impossible to do postmortem debugging with one of these older minidump files.

This fix allows partial matching of UUIDs. To do this we first try and match with the full UUID value, and then fall back to removing the original directory path from the module specification and we remove the UUID requirement, and then manually do the matching ourselves. This allows scripts to find symbols files using a symbol server, place them all in a directory, use the "setting set target.exec-search-paths" setting to specify the directory, and then load the core file. The Target::GetSharedModule() can then find the correct file without doing any other matching and load it.

Tests were added to cover a partial UUID match where the breakpad file has a 16 byte UUID and the actual file on disk has a 20 byte UUID, both where the first 16 bytes match, and don't match.

Diff Detail


Event Timeline

clayborg created this revision.Mar 29 2019, 11:00 AM

+mark, in case he has any thoughts about this. It's unfortunate that we need to add workarounds like this, but it seems that's what the situation is.

155–156 ↗(On Diff #192868)

The yamlization of minidumps is unfortunately not ready yet, but it should be possible to check in the so files in yaml form. Could you please try that out. You should be able to remove almost everything from the yaml file except the .build-id section.

394 ↗(On Diff #192868)

Under what circumstances can GetSharedModule fail, and still produce a valid module_sp ?

397 ↗(On Diff #192868)

I am wondering if there's any way to narrow down the scope of this so we don't do this second check for every module out there. If I understood the description correctly, it should be sufficient to do this for linux minidumps with PDB70 uuid records. Since PDBs aren't a thing on linux, and later breakpad versions used used ElfBuildId records, this should be a pretty exact match for the situation we want to capture. WDYT?

(Getting the platform of the minidump should be easy, for the UUID record type, we might need a small refactor of the MinidumpParser class.)

416–435 ↗(On Diff #192868)

I don't find this piecewise checking helps readability. What would really help IMO is if the booleans were inverted, so we check for the case when we *have* a match, instead of when we haven't.
Maybe something like

bool module_matches = dmp_bytes.empty() || mod_bytes.empty() || (dmp_bytes.size() < mod_bytes.size() && mod_bytes.take_front(dmp_bytes.size()) == dmp_bytes);
if (!module_matches) { GetTarget().GetImages().Remove(module_sp); module_sp.reset(); }
clayborg marked 2 inline comments as done.Mar 29 2019, 2:55 PM
clayborg added inline comments.
394 ↗(On Diff #192868)

Sometimes you can create a module with a specification, but it won't have a valid object file is my understanding.

397 ↗(On Diff #192868)

We really do want this. Why? Because many times with minidumps you have to go out and find the symbol files in some alternate location, put them in a directory, then we want lldb to find them automatically. If you have a module spec with a full path and no UUID, the path will need to match. If you just specify a basename with no UUID, it allows us to use that file. Also with our UUID story being a bit weak with ELF 20 bytes build IDs versus 4 bytes GNU build IDs,. we might need to pull some tricks and strip out a UUID from a binary so we can load it in LLDB. So this basename only search allows us to use a symbol file with no UUID or if the minidump didn't contain a UUID and somehow we knew where the right symbol file was. So this does catch my case, but will also find any files in the exec search paths.

435 ↗(On Diff #192868)

I can change this.

labath added inline comments.Apr 1 2019, 2:49 AM
394 ↗(On Diff #192868)

Ok, but if that's the what the function is supposed to do in that case, then why should it return an error?

To me, this looks like a bug in that function (I don't see any justification for GetSharedModule to modify the module list of a target, but still return an error), and it should be fixed there (and not worked around here). Or, if you don't know how to trigger this condition, we can just drop this code here.

397 ↗(On Diff #192868)

Ok, so I see that this patch does a bit more than it may seem at first glance. I think it would be good to enumerate the list of scenarios that you're interested in here, because there are multiple quirks of different components that we are working with here, and each may require a fix elsewhere. The scenarios I know about (but there may be others) are:

  1. breakpad truncating long build-ids
  2. LLDB using a a crc checksum as a build-id, when the real build-id is missing
  3. breakpad using some other arbitrary checksum when a real build-id is missing

Of these, 1. and 3. are a problem in the producer, and we cannot do anything about that except work around them. Doing that close to the source (i.e. ProcessMinidump) is reasonable. However, 2. is an lldb problem (I would say even a "bug"), and we certainly can do something to fix it instead of just covering it out. So, if there's any way to cut complexity here by fixing elf UUID computation, I think we should do that instead.

Now, cases 1. and 3. do result in UUIDs that are incompatible with how we use UUIDs elsewhere in lldb, so working around this by retrying the search without a UUID and then doing some auxiliary matching seems like a reasonable compromise. However, it's not clear to me why you need to strip the path too. Based on my reading of ModuleList::GetSharedModule, the directory component is ignored when searching "exec-search-paths", and so we shouldn't need to do anything special to make this case work. And if that's the case (if it isn't, then I'd like to understand why), then it seems reasonable to restrict the UUID-less retry to the cases where we know breakpad might have generated incompatible UUIDs. Restricting it to ElfBuildId-records seems like a good first-order approximation, but if there are other clues we could use for that, I'd say we should use them too.

So I'd propose to split this patch into two (or more?) independent changes:

  1. See what it takes to make exec-search-paths lookup work in the "normal" (i.e., the file has a regular build-id, and everybody agrees on what it is) case. I did some simple experiments, and it seemed to work for me without any special modifications. So if there's a case that doesn't work it could be a bug elsewhere that needs fixing.
  2. implement the retry logic to help the cases when breakpad produces "wrong" uuid.
  3. (?) If any of this is hampered by lldb crc "uuid" (I don't think it should be because if you specify that you don't wan't to search by uuid, then it really doesn't matter what lldb thinks the uuid's module is), then let's see if we can change that logic.
clayborg updated this revision to Diff 193125.Apr 1 2019, 10:46 AM

Fixed the requested issues. This patch is small and self contained and fixes the issue with partial UUID matching so I would like to start with this and iterate on the other features you mentioned with subsequent patches if possible. Let me know.

clayborg marked 2 inline comments as done.Apr 1 2019, 10:47 AM
labath added inline comments.Apr 1 2019, 12:48 PM
390–391 ↗(On Diff #193125)

This comment (the part about removing in case of error) is no longer true. Please delete it.

396–397 ↗(On Diff #193125)

Same here.

401 ↗(On Diff #193125)

What will happen if you just delete this line? Based on my experiments, I believe we should still end up looking through the exec-search-paths list. However, the test does not seem to be modifying this setting, so it's not clear to me how it is even finding the binary in the first place. What's the mechanism you're relying on for the lookup in the test?

407–408 ↗(On Diff #193125)

These comments no longer make sense like this as the checking happens all at once. Please rephrase them. Maybe something like "We consider the module to be a match if the minidump UUID is a prefix of the module UUID, or if one of the uuid's is empty". ?

clayborg updated this revision to Diff 193198.Apr 1 2019, 4:25 PM
clayborg marked 2 inline comments as done.

Fix comments and address review issues.

clayborg marked 3 inline comments as done.Apr 1 2019, 4:26 PM
clayborg added inline comments.
391 ↗(On Diff #193125)

Will fix

401 ↗(On Diff #193125)

It doesn't work without clearing the fullpath here and I didn't want to modify the current behavior of the exec search paths in here just in case it would affect others that were depending on a certain behavior.

408 ↗(On Diff #193125)

I will change it

labath accepted this revision.Apr 2 2019, 1:27 AM

lgtm. Thanks for the patience.

401 ↗(On Diff #193125)

Ok, I don't think we have to block on that, but I'd still like to understand how the files are found in the first place. Right now, I don't understand it because:
a) they are not in exec-search-path
b) (I think) they are not in CWD
So the fact that they are found may be accidental and prone to breakage.

This revision is now accepted and ready to land.Apr 2 2019, 1:27 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2019, 8:41 AM
clayborg marked an inline comment as done.Apr 2 2019, 9:01 AM
clayborg added inline comments.
401 ↗(On Diff #193125)

I added the build directory to the "settings set target.exec-search-paths" in the tests in case this changes in the future.

Since this commit, TestMiniDumpUUID_py is failing on green dragon.

Could you please revert the change or commit a fix?

The bots were broken for more than six hours so I took the liberty to revert the change in r357534. Sorry for the inconvenience!

labath added inline comments.Apr 3 2019, 5:59 AM

The second failure I had was due to the full module showing up when the module was found, so this check failed.

The interesting part (and the reason why I checked out this patch) was that the module was still being found even when I removed the basename_module_spec.GetFileSpec().GetDirectory().Clear(); line, which was my last uncertainty about this patch. So now I even more strongly believe that the clearing of the directory file spec is unnecessary and should be removed.


I think this is one of the causes of failure. Changing this to libuuidmismatch.yaml made it work for me.

clayborg marked 2 inline comments as done.Apr 3 2019, 8:57 AM
clayborg added inline comments.

And that _didn't_ work for me. If I removed it, it doesn't work on macOS. I wonder if we might be setting some image search paths up as part of the test. I just tested manually and if I do the command in a fresh LLDB, I get the fullpath, even on mac. If I run through the test suite, then I just get the basename???


Ok, I will fix this.

ok, so I think I figured out what was going on: I had the .so files still in my build packages/Python/lldbsuite/test/functionalities/postmortem/minidump-new directory. They didn't show up in the "svn stat" command because they are ignored!!! arg! That is what was causing problems when testing on my machine.

labath added a comment.Apr 4 2019, 3:09 AM

ok, so I think I figured out what was going on: I had the .so files still in my build packages/Python/lldbsuite/test/functionalities/postmortem/minidump-new directory. They didn't show up in the "svn stat" command because they are ignored!!! arg! That is what was causing problems when testing on my machine.

Yeah, I can imagine this being very annoying to track down. This is very unfortunate. I remember looking at this problem a some time ago because it was causing failures on one of the bots, and not being able to conclude where this ignore is coming from. It looks like svn just doesn't like .so files altogether. Well.. I guess that's one more thing that will be solved be the impending switch to github.

Does this mean the direcotory-clearing part can be removed now? I've tried the test on a mac and it still succeeds for me. If it doesn't work for you, it looks like we still have some kind of nondeterminism in here, and I'd like to get to the bottom of that. FTR: this is the stack trace of the point where lldb find the module for me:

(lldb) bt
* thread #1, queue = '', stop reason = step over
  * frame #0: 0x00000001062ce59e`lldb_private::ModuleList::GetSharedModule(module_spec=0x00007ffeefbfbef0, module_sp=nullptr, module_search_paths_ptr=0x0000000100204c80, old_module_sp_ptr=0x00007ffeefbfc9f0, did_create_ptr=0x00007ffeefbfc9ef, always_create=false) at ModuleList.cpp:862:19
    frame #1: 0x00000001067432c9`lldb_private::Platform::GetSharedModule(this=0x0000000100606f38, spec=0x00007ffeefbfcbe0)::$_0::operator()(lldb_private::ModuleSpec const&) const at Platform.cpp:245:15
    frame #2: 0x00000001067430d4`decltype(__f=0x0000000100606f38, __args=0x00007ffeefbfcbe0)::$_0&>(fp)(std::__1::forward<lldb_private::ModuleSpec const&>(fp0))) std::__1::__invoke<lldb_private::Platform::GetSharedModule(lldb_private::ModuleSpec const&, lldb_private::Process*, std::__1::shared_ptr<lldb_private::Module>&, lldb_private::FileSpecList const*, std::__1::shared_ptr<lldb_private::Module>*, bool*)::$_0&, lldb_private::ModuleSpec const&>(lldb_private::Platform::GetSharedModule(lldb_private::ModuleSpec const&, lldb_private::Process*, std::__1::shared_ptr<lldb_private::Module>&, lldb_private::FileSpecList const*, std::__1::shared_ptr<lldb_private::Module>*, bool*)::$_0&, lldb_private::ModuleSpec const&) at type_traits:4339:1
    frame #3: 0x0000000106743054`lldb_private::Status std::__1::__invoke_void_return_wrapper<lldb_private::Status>::__call<lldb_private::Platform::GetSharedModule(__args=0x0000000100606f38, __args=0x00007ffeefbfcbe0)::$_0&, lldb_private::ModuleSpec const&>(lldb_private::Platform::GetSharedModule(lldb_private::ModuleSpec const&, lldb_private::Process*, std::__1::shared_ptr<lldb_private::Module>&, lldb_private::FileSpecList const*, std::__1::shared_ptr<lldb_private::Module>*, bool*)::$_0&, lldb_private::ModuleSpec const&) at __functional_base:318:16
    frame #4: 0x0000000106742218`std::__1::__function::__func<lldb_private::Platform::GetSharedModule(lldb_private::ModuleSpec const&, lldb_private::Process*, std::__1::shared_ptr<lldb_private::Module>&, lldb_private::FileSpecList const*, std::__1::shared_ptr<lldb_private::Module>*, bool*)::$_0, std::__1::allocator<lldb_private::Platform::GetSharedModule(lldb_private::ModuleSpec const&, lldb_private::Process*, std::__1::shared_ptr<lldb_private::Module>&, lldb_private::FileSpecList const*, std::__1::shared_ptr<lldb_private::Module>*, bool*)::$_0>, lldb_private::Status (lldb_private::ModuleSpec const&)>::operator(this=0x0000000100606f30, __arg=0x00007ffeefbfcbe0)(lldb_private::ModuleSpec const&) at functional:1562:12
    frame #5: 0x0000000106739f37`std::__1::function<lldb_private::Status (lldb_private::ModuleSpec const&)>::operator(this=0x00007ffeefbfc570, __arg=0x00007ffeefbfcbe0)(lldb_private::ModuleSpec const&) const at functional:1913:12
    frame #6: 0x000000010673301f`lldb_private::Platform::GetRemoteSharedModule(this=0x000000010045b210, module_spec=0x00007ffeefbfcbe0, process=0x0000000101063418, module_sp=nullptr, module_resolver=0x00007ffeefbfc570, did_create_ptr=0x00007ffeefbfc9ef)> const&, bool*) at Platform.cpp:1593:16
    frame #7: 0x0000000106732c7f`lldb_private::Platform::GetSharedModule(this=0x000000010045b210, module_spec=0x00007ffeefbfcbe0, process=0x0000000101063418, module_sp=nullptr, module_search_paths_ptr=0x0000000100204c80, old_module_sp_ptr=0x00007ffeefbfc9f0, did_create_ptr=0x00007ffeefbfc9ef) at Platform.cpp:254:10
    frame #8: 0x00000001067faed1`lldb_private::Target::GetSharedModule(this=0x0000000100806800, module_spec=0x00007ffeefbfcbe0, error_ptr=0x00007ffeefbfccf8) at Target.cpp:2040:34
    frame #9: 0x0000000107061f36`lldb_private::minidump::ProcessMinidump::ReadModuleList(this=0x0000000101063418) at ProcessMinidump.cpp:402:31
    frame #10: 0x00000001070617cc`lldb_private::minidump::ProcessMinidump::DoLoadCore(this=0x0000000101063418) at ProcessMinidump.cpp:218:3
    frame #11: 0x000000010675aec5`lldb_private::Process::LoadCore(this=0x0000000101063418) at Process.cpp:2629:18
    frame #12: 0x0000000105e045e7`lldb::SBTarget::LoadCore(this=0x000000010020b590, core_file="linux-arm-partial-uuids-match.dmp", error=0x00007ffeefbfd430) at SBTarget.cpp:277:34
    frame #13: 0x0000000105e041a2`lldb::SBTarget::LoadCore(this=0x000000010020b590, core_file="linux-arm-partial-uuids-match.dmp") at SBTarget.cpp:262:10
    frame #14: 0x000000010616a9ed`_wrap_SBTarget_LoadCore__SWIG_0((null)=0x0000000000000000, args=0x000000011ddc8710) at LLDBWrapPython.cpp:51384:22
    frame #15: 0x0000000106106d31`_wrap_SBTarget_LoadCore(self=0x0000000000000000, args=0x000000011ddc8710) at LLDBWrapPython.cpp:51467:16
(lldb) fr var module_spec.m_file
(lldb_private::FileSpec) module_spec.m_file = {
  m_directory = (m_string = "/invalid/path/on/current/system")
  m_filename = (m_string = "")
  m_is_resolved = true
  m_style = posix
(lldb) fr var resolved_module_spec.m_file
(lldb_private::FileSpec) resolved_module_spec.m_file = {
  m_directory = (m_string = "/Users/labath/ll/build/dbg/lldb-test-build.noindex/functionalities/postmortem/minidump-new/TestMiniDumpUUID.test_partial_uuid_match")
  m_filename = (m_string = "")
  m_is_resolved = false
  m_style = posix

As you can see, the code happily constructs the right path based on the exec-search-path setting even though we are setting the full path on the input file spec. Can you please check why your lldb fails to reach this place?