This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Add more ways to find split DWARF files
ClosedPublic

Authored by DavidSpickett on Aug 10 2023, 6:06 AM.

Details

Summary

Fixes #28667

There's a bunch of ways to end up building split DWARF where the
DWO file is not next to the program file. On top of that you may
distribute the program in various ways, move files about, switch
machines, flatten the directories, etc.

This change adds a few more strategies to find DWO files:

  • Appending the DW_AT_COMP_DIR and DWO name to all the debug search paths.
  • Appending the same to the binary's dir.
  • Appending the DWO name (e.g. a/b/foo.dwo) to all the debug search paths.
  • Appending the DWO name to the binary's location.
  • Appending the DWO filename (e.g. foo.dwo) to the debug search paths.
  • Appending the DWO filename to the binary's location.

They are applied in that order and some will be skipped
if the DW_AT_COMP_DIR is relative or absolute, same for
the DWO name (though that seems to always be relative).

This uses the setting target.debug-file-search-paths, which
is used for DWP files already.

The added tests likely do not cover every part of the
strategies listed, it's a best effort.

Diff Detail

Event Timeline

DavidSpickett created this revision.Aug 10 2023, 6:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2023, 6:06 AM
DavidSpickett requested review of this revision.Aug 10 2023, 6:06 AM
Herald added a project: Restricted Project. · View Herald Transcript
DavidSpickett planned changes to this revision.Aug 10 2023, 6:06 AM

Cleanup, add testing.

DavidSpickett retitled this revision from WIP: [lldb] Search debug file paths when looking for split DWARF files to [lldb] Search debug file paths when looking for split DWARF files.Aug 14 2023, 4:25 AM
DavidSpickett edited the summary of this revision. (Show Details)
DavidSpickett added reviewers: aprantl, clayborg, lenary.
DavidSpickett added a subscriber: lenary.

https://github.com/llvm/llvm-project/issues/28667 is the github issue.

@lenary This fixes your reproducer, if you have any more ways to confuse lldb please try them with this.

clayborg added inline comments.Aug 14 2023, 4:03 PM
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1735

Maybe we can also check the debug info search paths to see if they contain just the basename in case someone sends you a binary and any needed .dwo files?

So if we have a DWO path of "/users/someone/build/foo.dwo", we would check each of the debug info search paths for:

path + dwo_file.GetFilename()

This would allow someone to send a binary to someone else and allow them to load it up.

But any such search might belong down below this entire if/else statement when the file isn't found:

if (!found) {
  // Search for debug info path + basename...
}
1844

Here is where we might think about doing the debug info search path + basename searches since it doesn't matter if the .dwo file original path was relative or not.

clayborg added a comment.EditedAug 14 2023, 4:16 PM

We are also looking for a way to locate .dwo files at Meta. One idea we came up with is to add a setting that can be set:

(lldb) settings set symbols.locate-dwo-script /path/to/locate-dwo.py

Any specified scripts would be expect to contain a python function:

def locate_dwo(exe_path: lldb.SBFileSpec, dwo_path: lldb.SBFileSpec, dwo_id: str) -> lldb.SBFileSpec
  # Add code to locate the .dwo file here...

All of our binaries are built in a "buck-out" directory and the .dwo file paths are relative this to folder regardless of the DW_AT_comp_dir value. If we pass the executable into this function, we can search for "/buck-out/" in the path, and then use the first part of the path as a prefix and return a valid path.

This kind of solution might allow distributed build systems to not have to download all of the .dwo files until they are requested and the script could download them and return a path.

The current solution in this patch is nice and simple and would also work, but it would require us to know up front where these directories are so that we can set the setting correctly which is what we are trying to avoid as if someone tries to debug and doesn't see variables due to not being able to find a .dwo file, then some turn the "printf pro".

DavidSpickett added inline comments.Aug 15 2023, 1:40 AM
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1735

Good idea, I'll write a test for that scenario and see if it makes sense here or in a follow up.

We are also looking for a way to locate .dwo files at Meta. One idea we came up with is to add a setting that can be set:
(lldb) settings set symbols.locate-dwo-script /path/to/locate-dwo.py

Probably best bring this up on Discourse for a wider audience as I'm not up to date on all the search methods, I just dived into this one in particular. First thought is it sounds a bit like what debuginfod is supposed to do, but I don't know the details of that at the moment.

DavidSpickett marked an inline comment as done.
DavidSpickett added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1735

Just one question if we should still try all of your new code if we have a full path to a DW_AT_comp_dir, but we don't find the .dwo file, should be allow your code to run and still try to just append the relative "dwo_name" (without comp dir) to each of the search paths?

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1801

If "found == false" here, do we want to fall through to the "else" clause below? This would mean we have a full path to a DW_AT_comp_dir, but we didn't find the .dwo file and "dwo_name" is relative, so maybe we should try to use the relative "dwo_name" below without the comp dir if not found?

1804–1805

On major issue we have here is if someone creates a target first, then sets the debug file search paths, then this will always fail because Target::GetDefaultDebugFileSearchPaths() gets the global target settings. This is because before we have a target, if any target settings get set, they get set in this global target settings object. Then each time you create a target, it will get a copy of these global settings. So this would work:

(lldb) settings set target.debug-file-search-paths /my/objfiles
# Global target settings are set above
(lldb) target create a.out
(lldb) run
...

This wouldn't:

(lldb) target create a.out
(lldb) settings set target.debug-file-search-paths /my/objfiles
# Current target settings are set above, not the global settings
(lldb) run
...

Because the global settings have nothing for the "target.debug-file-search-paths", but the current lldb_private::Target that was created by "target create a.out" does have it set, but that isn't what we are getting here.

We don't have access to the current target anywhere in lldb_private::Module and its lldb_private::ObjectFile or lldb_private::SymbolFile because lldb_private::Module can be used in multiple targets, so the module doesn't know what target it is in. This isn't something that needs to be solved here, but we should understand the limitations.

In the interests of my own sanity I've squashed the subsequent review into this one
and applied the suggestions from both.

Also I changed the tests to look for a global so we don't have to update the
line number all the time.

DavidSpickett retitled this revision from [lldb] Search debug file paths when looking for split DWARF files to [lldb] Add more ways to find split DWARF files.Sep 1 2023, 4:36 AM
DavidSpickett edited the summary of this revision. (Show Details)
DavidSpickett marked 2 inline comments as done.Sep 1 2023, 4:38 AM
DavidSpickett added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1801

I've done this in this latest revision.

1844

This is now done above, either relative or not will end up trying paths + dwoname and paths + filename.

DavidSpickett marked 2 inline comments as done.

Use *.dwo in commands since I'm not sure the DWO name will be the
same everywhere. The specific name isn't important, we know there's
only going to be one.

clayborg added inline comments.Sep 1 2023, 9:48 AM
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1830

Rename to "dwo_name_spec"? I think we can use this down below in following inline comments

1833

Rename to "binary_directory"?

1837–1839

Should we only do this if "dwo_name" is relative? Also if "comp_dir" is relative, do we want to add another path to search here?

Rename as suggested.

DavidSpickett marked 2 inline comments as done.Sep 5 2023, 1:37 AM

In the fallback, only check binary dir + dwo name if the dwo name is relative.

DavidSpickett added inline comments.Sep 5 2023, 1:59 AM
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1837–1839

Should we only do this if "dwo_name" is relative?

Done.

Also if "comp_dir" is relative, do we want to add another path to search here?

binary dir + relative comp dir is covered in the first set of checks around:

// if DW_AT_comp_dir is relative, it should be relative to the location...
clayborg added inline comments.Sep 5 2023, 2:03 PM
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1732–1796

If we have no comp_dir, then we should still search for the file right? We are returning early here without trying to use the search paths?

1808

Will this return false for a symlink to a directory?

Log when comp dir not found and carry on to fallback search.

Add a test for the search path being a symlink.

DavidSpickett marked 2 inline comments as done.Sep 6 2023, 6:05 AM
DavidSpickett added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1732–1796

Refactored this to log and carry on to the later searches.

We don't test for this situation at all so it's likely very rare, which means logging it is fine. Standard advice on unexplained failures to find things is check the logs.

1808

In this case no because comp_dir was always being added to it, unless the result of that was also pointing to a symlink.

I have added a test case to check you can set the search path to a symlink, I'm not sure of an easy way to do that for comp_dir.

DavidSpickett marked 2 inline comments as done.Sep 6 2023, 6:07 AM
DavidSpickett added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1808

The underlying filesystem does have different types for a directory and a symlink, so the general answer is yes it appears that IsDirectory would return false for a symlink.

Which you'd think would affect the later searches using the search path only, but it doesn't. So maybe resolving helps there, not sure. End result is good anyway.

clayborg accepted this revision.Sep 6 2023, 9:50 AM

Thanks a lot for acting on all feedback and getting all of the edge cases! LGTM

This revision is now accepted and ready to land.Sep 6 2023, 9:50 AM

No problem. I doubt this is all of them but at least users can now workaround the gaps using the setting, and hack up one of these test cases to make reproducers.

This revision was automatically updated to reflect the committed changes.

From what I gather, split dwarf isn't a thing on Darwin so I've relanded with the tests skipped (https://github.com/llvm/llvm-project/commit/b1f14d647300b0ed003fa7c24af311b11605d009).

Any chance we could simplify this situation and have dwo searches use exactly the same/shared logic as source file searches? It seems like the dwarf spec intent was for that to be the case.

I don't mind adding more ways to find things, but think it'd be useful if we only had one set of rules instead of two, to prevent divergence.

(The script idea from Greg sounds ok too (but ideally, also in the context of one search system for both source and dwo) - but yeah, maybe invasive enough that it merits a separate/broader design discussion)

Any chance we could simplify this situation and have dwo searches use exactly the same/shared logic as source file searches?

I'm not sure what exactly this means, can you clarify?

I know that DWP and DWO have different search functions, that could certainly be unified. Not sure how source files work.

Any chance we could simplify this situation and have dwo searches use exactly the same/shared logic as source file searches?

I'm not sure what exactly this means, can you clarify?

I know that DWP and DWO have different search functions, that could certainly be unified. Not sure how source files work.

Source file search paths and debug info search paths are different in LLDB. GDB had one setting IIRC that handled both. These settings have been in LLDB for a while so I wouldn't recommend changing this up at this point.

Any chance we could simplify this situation and have dwo searches use exactly the same/shared logic as source file searches?

I'm not sure what exactly this means, can you clarify?

I know that DWP and DWO have different search functions, that could certainly be unified. Not sure how source files work.

Source file search paths and debug info search paths are different in LLDB. GDB had one setting IIRC that handled both. These settings have been in LLDB for a while so I wouldn't recommend changing this up at this point.

I don't think that's a great motivation to diverge them further.

We could make the DWO search fallback to the source search - and add new features to the source search if needed. That'd at least align things a bit closer, rather than diverging further.