Page MenuHomePhabricator

Resolve DW_AT_comp_dir path if it contains a symlink
ClosedPublic

Authored by ovyalov on Jul 20 2015, 9:24 AM.

Details

Summary

Value of DW_AT_comp_dir entry can be enforced by PWD environment variable and set with symbolic link.
If DW_AT_comp_dir is a symlink use FileSystem::Readlink to get full path.

Diff Detail

Event Timeline

ovyalov updated this revision to Diff 30164.Jul 20 2015, 9:24 AM
ovyalov retitled this revision from to Resolve DW_AT_comp_dir path if it contains a symlink .
ovyalov updated this object.
ovyalov added reviewers: clayborg, chaoren.
ovyalov added a subscriber: lldb-commits.
chaoren accepted this revision.Jul 20 2015, 10:25 AM
chaoren edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Jul 20 2015, 10:25 AM

Why do you need to do this? Between the comp dir and the file name you should have an absolute path. If the debug information is not resolved to an absolute path, that's a compiler bug and should be fixed there.

The debugger does not require that the source tree still exists, so you can't base any logic on this ACTUALLY existing to avoid a potentially large number of pointless stat's. And in general, unless there's some clear advantage, I'd rather we not touch the file system for source files until somebody actually wants to find a particular source file.

Why do you need to do this? Between the comp dir and the file name you should have an absolute path. If the debug information is not resolved to an absolute path, that's a compiler bug and should be fixed there.

Binary can be built on remote machine (in case of distributed build) and in this case DW_AT_comp_dir entry contains useless and invalid path for a host which initiated a build.
Problem similar to the issue described here - http://marc.info/?l=cfe-dev&m=131916863602489. GCC puts /proc/self/cwd as a value of DW_AT_comp_dir so host can deduce it locally to own path.

The debugger does not require that the source tree still exists, so you can't base any logic on this ACTUALLY existing to avoid a potentially large number of pointless stat's. And in general, unless there's some clear advantage, I'd rather we not touch the file system for source files until somebody actually wants to find a particular source file.

I agree that numeric stat calls may pose a significant IO overhead - as a workaround, we can introduce a new property which enables option symbolic link resolution.

clayborg requested changes to this revision.Jul 20 2015, 3:19 PM
clayborg edited edge metadata.

Maybe we can look for these magic values and only resolve the symlink if they match? Is it currently only "/proc/self/cwd" that we would need to look for? stat-ing a ton of file can be very expensive as Jim said and we should strive to avoid it if at all possible.

This revision now requires changes to proceed.Jul 20 2015, 3:19 PM
ovyalov updated this revision to Diff 30208.Jul 20 2015, 4:00 PM
ovyalov edited edge metadata.

Added static array of possible symbolic link values initialized with "/proc/self/cwd".

clayborg accepted this revision.Jul 20 2015, 5:08 PM
clayborg edited edge metadata.

Looks good.

This revision is now accepted and ready to land.Jul 20 2015, 5:08 PM
ovyalov closed this revision.Jul 20 2015, 7:10 PM

Files:

/lldb/trunk/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp

Users:

ovyalov (Author)

http://reviews.llvm.org/rL242757

As a followup, can you make this non-exported static array of names that are pretty much only interesting on Linux into something queried from the platform? That way we could provide a way for the user to augment the list to add a new directory thats relevant, or even turn the stat behavior off by clearing the array, etc...

There should also be a test for this if you can manage to make one, even if it only runs on Linux, or this is likely to break.

emaste added a subscriber: emaste.Jul 27 2015, 12:41 PM
emaste added inline comments.
source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
153

This is Linux-specific, it shouldn't be done unconditionally in here

As a followup, can you make this non-exported static array of names that are pretty much only interesting on Linux into something queried from the platform? That way we could provide a way for the user to augment the list to add a new directory thats relevant, or even turn the stat behavior off by clearing the array, etc...

Oh, I didn't notice this when I wrote my note. Agreed.

Thanks for suggestions - will address them shortly.

So a few rules that we must follow:

  • lldb_private::Module and the contained lldb_private::ObjectFile and lldb_private::SymbolVendor/lldb_private::SymbolFile are never associated with a Target or Process unless they are loaded from memory and there is no backing file. This would be the typical way to get a platform, but that options isn't available. Why? We put all modules into a global cache that means multiple processes can all use the same module for libc.so. So there can be no ties to a target/process in any fix we do.
  • Since we have no target/process we must rely on static function calls in the platforms or settings to fix this. One option is to make a setting, something like "plugin.symbol-file.dwarf.symlink-paths" which could be an array that could contain these items. This would allow people to add new ones, but this doesn't limit the checks to linux only. The setting could be a dictionary where the key is a triple (like "*-*-linux") and the value is a string which is the path to convert.
  • Another option is to find the platform by using the triple from the symbol file and then asking the platform for the list to use.

I think I like the last option, but this options doesn't guarantee that we get the Platform that you might have created for your process.

Thinking about this a little more, it seems like this should really be a host facility, not a platform one. After all, in order for this bit to do its job, a realpath or its equivalent must succeed on the local system. So it really is a host feature. That makes figuring out where to put it a lot easier, but I also think it reflects better what is going on here.