This is an archive of the discontinued LLVM Phabricator instance.

LLDB: Use path relative to binary, not relative to debugger CWD, for finding .dwo files.
ClosedPublic

Authored by cmtice on Mar 2 2021, 10:50 AM.

Details

Summary

DWARF allows .dwo file paths to be relative rather than absolute. When they are relative, DWARF uses DW_AT_comp_dir to find the .dwo file. DW_AT_comp_dir can also be relative, making the entire search patch for the .dwo file relative. In this case, LLDB currently searches relative to its current working directory, i.e. the directory from which the debugger was launched. This is not right, as the compiler, which generated the relative paths, can have no idea where the debugger will be launched. The correct thing is to search relative to the location of the executable binary. That is what this patch does.

Diff Detail

Event Timeline

cmtice created this revision.Mar 2 2021, 10:50 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
cmtice requested review of this revision.Mar 2 2021, 10:50 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript

Is this rebased on main? The LLD for Mach-O changes look like they belong to a different diff.

Something went very wrong on the patch upload. This does NOT look like the
patch I intended!

DO NOT REVIEW THIS UNTIL I CAN UPLOAD THE RIGHT PATCH!

  • Caroline

cmtice@google.com

cmtice updated this revision to Diff 327523.Mar 2 2021, 11:04 AM

Upload the correct patch this time. (sorry!)

int3 removed reviewers: int3, Restricted Project.Mar 2 2021, 11:07 AM
int3 removed subscribers: int3, gkm.
labath added a subscriber: labath.Mar 2 2021, 11:22 AM

In general, a compiler also does not know where the final executable will be placed -- this can be only (maybe) be said for a sufficiently strict build system (like the one at google). I wouldn't say that searching relative to the cwd is really that much better, but at least it is consistent with what gdb does.

labath added a subscriber: thakis.Mar 2 2021, 11:27 AM

Also, cc @thakis, who floated the idea of using a special path token ($ORIGIN) to achieve the same thing.

It is true that the compiler will not know where the final executable will be placed, and if the executable gets moved then debugging with .dwo files will probably not work anyway, However as it is currently written, LLDB will fail to find the .dwo files, *even when the binary has not been moved*, if the debugger is launched from any directory other than the one containing the binary.

E.g. we have a team where the standard workflow is 'cd src'; build binary in src/out/Debug/.; from src, do 'lldb out/Debug/binary'. lldb cannot find the .dwo files.

re "GDB does it this way" -- I am also writing/submitting a patch to get GDB to change this behavior.

re "GDB does it this way" -- I am also writing/submitting a patch to get GDB to change this behavior.

Ah, I see. That changes the picture, I'd say. If gdb folks go along with that, then I don't see any problems here. I see just one small issue to nail down -- what exactly is the path relative to? The executable file, or the file containing the debug info (in the objcopy --strip-debug/--only-keep-debug scenario)? I think the latter would be /slightly/ more useful, as one can imagine shipping the dwos in /usr/lib/debug, next to the .debug file (though I hope that in reality, everyone will just ship the dwp file). I think this is actually what this patch implements.

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1646–1652

All of this is basically just dwo_file.PrependPathComponent(m_objfile_sp->GetFileSpec().GetDirectory().GetStringRef()), is it not?

While next to the binary seems like a better guess than the cwd, it does seem awkward that if the binary & dwo's aren't in the same place, you have to move files to get that heuristic to work, whereas you could just "cd" into wherever the dwo's got dumped to make the cwd version work.

I'm don't really have a strong opinion one way or the other. But this does seem like the sort of thing that should be helped out by one of the debug info path settings. That way the user could point lldb at the dwo's no matter where they ended up.

labath added a comment.Mar 4 2021, 5:36 AM

Having a setting for this would not be unreasonable.

But what about reusing an existing setting for this purpose? I thinking of the (global) target.debug-file-search-paths setting and the associated logic (Symbols::LocateExecutableSymbolFile and friends). If we used this mechanism, then dwo files would be found automatically, as (I believe) the directory containing the executable file is being always searched, even with an empty setting. And this function is already being used to locate the dwp file, which is why we are able to find the dwp file next to the executable, regardless of the cwd.

cmtice added a comment.Mar 4 2021, 8:01 AM

I'm not sure about using target.debug-file-search-paths, and what the changes Pavel is suggesting would entail. But the real question I am trying to resolve here is not "what are all the places we should be searching for the .dwo files?" but "When we're given a *relative path* in the DWARF for finding the files, what should it be relative TO?".

I still think the most correct answer to that question is "relative to the location of the binary" (I know, this is assuming that the binary has not been moved since it was built, or all bets are off). If you like, I can update this patch so that it continues to search relative to the cwd (where the debugger was launched), and IN ADDITION, will search relative to where the binary is.

cmtice updated this revision to Diff 328191.Mar 4 2021, 8:46 AM

Update to incorporate Pavel's suggested simplification.

I'm not sure about using target.debug-file-search-paths, and what the changes Pavel is suggesting would entail.

I imagine it would involve calling Symbols::LocateExecutableSymbolFile to locate the dwo file, similar to how we do that for dwp files (see SymbolFileDWARF::GetDwpSymbolFile). (Disclaimer: I have not tried doing this, so I don't know if that function would work out-of-the-box.)
Note that I myself am not convinced that this is the right solution (that function is rather heavy), but it does solve the problem of "how do we let the user specify/choose the location of the dwo files" (answer: put the path in the target.debug-file-search-paths setting), and it would search the cwd and the module directory automatically. And the "heavy-ness" of the function is moderated by the fact that it is a no-op for absolute paths to existing files.

But the real question I am trying to resolve here is not "what are all the places we should be searching for the .dwo files?" but "When we're given a *relative path* in the DWARF for finding the files, what should it be relative TO?".

I'm sorry, but what's the difference between those two questions? Is it about the fact that the second question sort of implies that there should only be one correct location where we should be searching?

I still think the most correct answer to that question is "relative to the location of the binary" (I know, this is assuming that the binary has not been moved since it was built, or all bets are off). If you like, I can update this patch so that it continues to search relative to the cwd (where the debugger was launched), and IN ADDITION, will search relative to where the binary is.

I don't really have a strong opinion either way -- the thing I'm mostly interested in is some sort of consistency between lldb & gdb. Have you already discussed this with the gdb folks? If they want to move to the binary-relative mechanism, then I don't see a problem with us doing the same. If they want to keep the existing behavior, then I think it would be good to preserve that in lldb as well (but I don't have a problem with searching in other places too; and LocateExecutableSymbolFile is one way to achieve that).

cmtice added a comment.Mar 7 2021, 8:53 AM

I'm not sure about using target.debug-file-search-paths, and what the changes Pavel is suggesting would entail.

I imagine it would involve calling Symbols::LocateExecutableSymbolFile to locate the dwo file, similar to how we do that for dwp files (see SymbolFileDWARF::GetDwpSymbolFile). (Disclaimer: I have not tried doing this, so I don't know if that function would work out-of-the-box.)
Note that I myself am not convinced that this is the right solution (that function is rather heavy), but it does solve the problem of "how do we let the user specify/choose the location of the dwo files" (answer: put the path in the target.debug-file-search-paths setting), and it would search the cwd and the module directory automatically. And the "heavy-ness" of the function is moderated by the fact that it is a no-op for absolute paths to existing files.

But the real question I am trying to resolve here is not "what are all the places we should be searching for the .dwo files?" but "When we're given a *relative path* in the DWARF for finding the files, what should it be relative TO?".

I'm sorry, but what's the difference between those two questions? Is it about the fact that the second question sort of implies that there should only be one correct location where we should be searching?

I think I'm saying this poorly (I apologize). Let me try a bit differently. It's a matter of defaults. If the user builds with split debug and relative paths, and does not ever move the built binary: 1). it seems wrong to me that the debugger will sometimes find the .dwo files and sometimes will not, depending on where the debugger was launched from. It ought, at the very least, to behave consistently no matter where the debugger was launched from. 2).It is not possible that the compiler could ever have intended that paths to be relative to the location from where the debugger was launched because the compiler could have absolutely no idea where that would be. So when the compiler generated the relative paths, whatever it was truly intended to be relative to, the location from which the debugger was launched could not be it. 3). Given statements 1 & 2, it seems reasonable that the *default* behavior of the debugger is to search relative to the location of the binary.

Yes, the user could always use debugger settings to add or update search paths, but I don't think that should be required as part of the default behavior for finding the .dwo files.

I still think the most correct answer to that question is "relative to the location of the binary" (I know, this is assuming that the binary has not been moved since it was built, or all bets are off). If you like, I can update this patch so that it continues to search relative to the cwd (where the debugger was launched), and IN ADDITION, will search relative to where the binary is.

I don't really have a strong opinion either way -- the thing I'm mostly interested in is some sort of consistency between lldb & gdb. Have you already discussed this with the gdb folks? If they want to move to the binary-relative mechanism, then I don't see a problem with us doing the same. If they want to keep the existing behavior, then I think it would be good to preserve that in lldb as well (but I don't have a problem with searching in other places too; and LocateExecutableSymbolFile is one way to achieve that).

I have submitted a patch for this to GDB; it is currently under review. I am ok with waiting for this (LLDB) patch to be resolved until the GDB one is.

I bet gdb uses the debugger's CWD because that's the practice working ON gdb (at least back in the day when I worked on it). You built gdb, cd'ed into the gdb build directory and ran the debugger on it from there. gdb's build also produced a debugging-gdb specific .gdbinit file in the build directory that would also get read in if you did it that way. When using command-line debuggers, running from the build directory is probably a fairly common practice. So there are merits to the cwd way of doing things. These merits fade when using IDE's which generally support debugging more than one thing at a time, so changing the working directory of the IDE makes little sense.

Anyway, I agree with Pavel that the surprise of doing it differently from gdb is probably more important than the relative merits of one way or the other. If you can convince the gdb folks to change to next to the binary, lldb should follow that.

I don't know if people do this with dwo's but I've seen in other instances where you get symbols from some sort of federated build system and just dump them all in some debug info directory. If people also do this with dwo's then it would make sense to use the debug symbols path to specify this. That does seem like a separate patch, however.

cmtice added a comment.Apr 7 2021, 9:09 AM

It has taken a bit of time to get through the GDB reviews, but the change to GDB was accepted and committed: https://sourceware.org/pipermail/gdb-cvs/2021-April/050267.html

May I commit this change to LLDB now?

labath added a comment.Apr 8 2021, 1:58 AM

If gdb does it, then I don't have any issues with this functionality. It could use a test case though. You can try rewriting that gdb test case for lldb -- we don't have fancy dwarf assemblers (cool stuff, btw), we just use asm (you could look at test/Shell/SymbolFile/DWARF/dwo-type-in-main-file.s for inspiration). However, generating the inputs via clang should be fine as well..

cmtice updated this revision to Diff 336276.Apr 8 2021, 5:15 PM

Add a test case.

One more thing I forgot to mention: In the all-paths-are-relative scenario, lldb will need some help with finding the source files. I believe that currently these will still be resolved relative to CWD, and that's something you may not want (?) Unlike dwos, this can be worked around currently by manually setting the target.source-map setting. It's not for this patch, but I think this is something you may want to happen automatically as well (?)

lldb/test/Shell/SymbolFile/DWARF/dwo-relative-path.s
11 ↗(On Diff #336276)

Could you make the test case not involve running (so that it can work on non-linux systems too)? You should be able to simply observe the value of a global variable (target variable x) or something like that. Also, that means the executable does not need to contain the main function (and everything that goes with it), which would significantly reduce the size of the test input.

Pavel, your last comment lost me completely. How can I test the code I added to lldb if I don't run lldb? I am a complete newbie at writing test cases so there's probably something basic I'm missing? How would I observe the value of the variable without running lldb? Also, if the program doesn't contain 'main', won't the compiler complain? Is there an existing test that does what you're suggesting that I could look at?

Pavel, your last comment lost me completely. How can I test the code I added to lldb if I don't run lldb? I am a complete newbie at writing test cases so there's probably something basic I'm missing? How would I observe the value of the variable without running lldb? Also, if the program doesn't contain 'main', won't the compiler complain? Is there an existing test that does what you're suggesting that I could look at?

Sorry, I meant running the inferior -- you do definitely need to run lldb :)

And we can avoid the need for the main function by pointing lldb directly at the object file. As long as you don't try to run the file, lldb (mostly) does not care whether it's dealing with a full binary or just an object file. A lot of the .s tests in this directory use that approach, but most of them don't use split dwarf, so I don't know of a file you could copy verbatim. However, you could try to look at dwarf5-split.s, which does something pretty similar with a dwp file.

I'd imagine the test could look something like:

## As before
# RUN: llvm-mc --filetype=obj --triple x86_64-pc-linux %s -o %t.o
# RUN: llvm-objcopy --split-dwo=%T/dwo-relative-path.dwo %t.o

## No linking

# RUN: cd ../..

# RUN: %lldb %t.o -o "target variable x" -b 2>&1 | FileCheck %s

# CHECK: x = 10

## Debug info for "int x = 10;"
cmtice updated this revision to Diff 337622.Apr 14 2021, 9:01 PM

Update test case to use lldb on .o file and not run the inferior.

I did leave 'main' in the .s file, but it's not very big. Is this ok now?

labath accepted this revision.Apr 15 2021, 12:23 PM

Yes, thanks for bearing with me.

This revision is now accepted and ready to land.Apr 15 2021, 12:23 PM
This revision was landed with ongoing or failed builds.Apr 15 2021, 2:44 PM
This revision was automatically updated to reflect the committed changes.

This broke the windows LLDB bot. Please fix it soon or revert.

https://lab.llvm.org/buildbot/#/builders/83/builds/5685

I had to revert this change because the test case broke the windows builder. What's the right way to update/mark the test case so that it is only run on appropriate architectures & operating systems?

I had to revert this change because the test case broke the windows builder. What's the right way to update/mark the test case so that it is only run on appropriate architectures & operating systems?

You add the python skipIfWindows "decorator" before each test case definition. Just grep around in the test/API directory and you'll see plenty of instances of its use.

This does seem like it should work on Windows, however. Maybe some path handling problem?

I had to revert this change because the test case broke the windows builder. What's the right way to update/mark the test case so that it is only run on appropriate architectures & operating systems?

You add the python skipIfWindows "decorator" before each test case definition. Just grep around in the test/API directory and you'll see plenty of instances of its use.

Since this is a shell (lit) test, I think you mean adding # REQUIRES: windows at the top of the test?

This does seem like it should work on Windows, however. Maybe some path handling problem?

This would be my guess too. But the APIs used here seem to handle file paths generically...

It looks like you've managed to find a bug in lldb. Congratulations. ;)

I don't yet fully understand what's going on, but the rough idea is this: Lldb relies heavily on knowing the boundaries of every function. For this reason, our ELF code contains logic to parse the unwind info for an object file, and use it to create fake (synthetic) symbols if we don't have real ones (ObjectFileELF::ParseUnwindSymbols). The first bug is that it tries to create a new symbol for the unwind info of main. This is probably due to incomplete handling of unlinked files. The second bug is that the code assigns symtab.size() as the ID of the new symbol. This is a problem, because the symtab can contain real symbols with IDs larger than that (because we skip some useless symbols when creating lldb Symbol objects). This seems to be the case with this test file which contains an undefined STT_NOTYPE symbol as a first entry. (I'm not sure why is that the case, nor if this is specific to this test file). In any case, the result of all this is that we end up with two symbols with ID 12 ("x", and the synthetic one) and we end up picking one nondeterministically when resolving relocations for x.

In this case, windows actually chooses the right symbol, which is why it correctly ends up printing x = 10. On linux, we pick the wrong one, and x ends up pointing to a random block of zeroes.

I'm going to look into these issues, but for the time being, you can just work around the issue by removing the main function, which will stop all of these bad things from happening. I have attached a reduced version of this test file, which should work correctly everywhere. (Strictly speaking, it would've been enough to remove the .cfi directives, but while I was in there, I took the opportunity to remove other unnecessary cruft as well)

Hi Pavel,

Thank you for both the detailed explanation and the updated test case (and thank you Jim for your recommendations as well!). I'm testing the updated test case now. Assuming the tests pass, is it ok for me to re-land this CL using the updated test case? and if so, do I need to upload the updated patch to this code review?

Pavel's test case passed on my local machine (x86_64 linux workstation).

Since you just mainly want to run this past the bots again, probably best to just resubmit. You could update the patch here or not at your convenience, but I don't see it would serve any purpose to wait on approval.

Sorry for not noticing it was a shell test, BTW...

Hi Pavel,

Thank you for both the detailed explanation and the updated test case (and thank you Jim for your recommendations as well!). I'm testing the updated test case now. Assuming the tests pass, is it ok for me to re-land this CL using the updated test case? and if so, do I need to upload the updated patch to this code review?

pfaffe added a subscriber: pfaffe.Aug 30 2021, 6:04 AM

This change breaks all existing uses of relative comp_dirs that don't accidentally make all of them relative to the executable's directory already.

It's easy to construct broken use cases: Consider compiling your program like clang -g ./src/src.c -gsplit-dwarf -o ./out/src -fdebug-prefix-map=$(pwd)=.. This will create src.dwo in $(pwd) and set the DW_AT_comp_dir of src to "."; the change makes it impossible to load src.dwo.

The problem is easy to see in this slightly constructed example and could be fixed here by moving around the dwo files or setting the prefix to '..' to make it point to the executable. But that's not always possible! What if we have a static library that gets linked into multiple executables in different locations? What do we set comp_dir to in the library objects?

The change also introduces different behavior for loading dwos vs. source files. The latter will still be loaded relative to the CWD.

kimanh added a subscriber: kimanh.Aug 30 2021, 6:05 AM

This change breaks all existing uses of relative comp_dirs that don't accidentally make all of them relative to the executable's directory already.

It's easy to construct broken use cases: Consider compiling your program like clang -g ./src/src.c -gsplit-dwarf -o ./out/src -fdebug-prefix-map=$(pwd)=.. This will create src.dwo in $(pwd) and set the DW_AT_comp_dir of src to "."; the change makes it impossible to load src.dwo.

The problem is easy to see in this slightly constructed example and could be fixed here by moving around the dwo files or setting the prefix to '..' to make it point to the executable. But that's not always possible! What if we have a static library that gets linked into multiple executables in different locations? What do we set comp_dir to in the library objects?

The change also introduces different behavior for loading dwos vs. source files. The latter will still be loaded relative to the CWD.

This doesn't solve all use cases/it's not a terribly general purpose situation - using -fdebug-prefix-map/-fdebug-compilation-dir requires knowledge of these limitations/how to use these features together. (though I agree that changes in this area should apply to all relative comp_dir lookups - source and dwo files alike, rather than handling one group differently from the other when they're all defined as comp_dir relative)

This doesn't solve all use cases/it's not a terribly general purpose situation - using -fdebug-prefix-map/-fdebug-compilation-dir requires knowledge of these limitations/how to use these features together. (though I agree that changes in this area should apply to all relative comp_dir lookups - source and dwo files alike, rather than handling one group differently from the other when they're all defined as comp_dir relative)

My main concern is that even with knowledge of how the debug-compilation-dir features work, there are use cases that stop working after this change and are unfixable on the user's end. Anything that involves linking shared objects into multiple executables for example.

This doesn't solve all use cases/it's not a terribly general purpose situation - using -fdebug-prefix-map/-fdebug-compilation-dir requires knowledge of these limitations/how to use these features together. (though I agree that changes in this area should apply to all relative comp_dir lookups - source and dwo files alike, rather than handling one group differently from the other when they're all defined as comp_dir relative)

My main concern is that even with knowledge of how the debug-compilation-dir features work, there are use cases that stop working after this change and are unfixable on the user's end. Anything that involves linking shared objects into multiple executables for example.

Not fixable? Not sure I follow. In the case of cwd-relative (old behavior) you can fix the behavior by changing where you invoke the debugger from, and in the case of exe-relative you can fix the behavior by moving the executable? (once both source and dwo lookup is implemented consistently, at least)

Do you know anyone using the feature that would be challenged by this change in behavior? The only place I know using the feature is Chromium. (Google's internal build system uses -fdebug-compilation-dir=/proc/cwd so it really gets cwd-relative always, even with this change in lldb (but it's not a portable solution, since the path is unix-ish-specific, whereas Chromium probably needs a more portable solution that works on Windows, etc) - and then some local patches to gdb I think to do change the search paths so comp_dir gets ignored anyway, I think? not 100% sure, it's all a bit arcane)

pfaffe added a comment.Sep 1 2021, 4:27 AM

Not fixable? Not sure I follow. In the case of cwd-relative (old behavior) you can fix the behavior by changing where you invoke the debugger from, and in the case of exe-relative you can fix the behavior by moving the executable? (once both source and dwo lookup is implemented consistently, at least)

Thanks, I totally missed the fact that the patch intended to keep cwd-relative lookup working! Kudos for that! I overlooked it being mentioned in the discussion, and it's not in the commit message and the implementation is quite subtle. That eliminates most of my concerns!

That being said the implementation looks like it only does this partially. If I understand this correctly, the cwd-relative check happens in line 1646, but that only checks if the dwo_name exists. Below, comp_dir/dwo_name is however not checked against cwd. That means that we only get the old behavior if comp_dir happens to be .; any non-trivial relative comp_dir will behave differently, in that I need to cd into the comp_dir manually first to make things work. That's much better than I originally feared (chromium, e.g., uses ., so they're fine), but I still think we should retain the old behavior for non-trivial comp_dirs as well.

Not fixable? Not sure I follow. In the case of cwd-relative (old behavior) you can fix the behavior by changing where you invoke the debugger from, and in the case of exe-relative you can fix the behavior by moving the executable? (once both source and dwo lookup is implemented consistently, at least)

Thanks, I totally missed the fact that the patch intended to keep cwd-relative lookup working!

Hmm, sorry, I didn't mean to imply that/didn't know the patch did that (perhaps it only does by accident, I'm not sure).

What I meant was - no matter which interpretation (binary relative or cwd relative) is provided by a DWARF consumer, there's a way to make it work (assuming the paths in the DWARF/across all the CUs are consistently relative to the same thing) - either you move the binary, or you move where you invoke the binary from/your cwd. So I don't think either interpretation makes something unworkable/unusable, depending on where you define the constraints of the problem (if you define it as "there's something you can do when compiling/linking the binary to get a certain behavior to work" (ie: you can always get cwd-relative if that's what you want), then yes, there's no solution if the change is made to make it binary-relative - but if you define the problem as "there's a way to debug this binary given these relative paths" - then yes, I think there's an answer either way: move the binary, or move the cwd). Then it's a question of backwards compatibility (I don't know of anyone relying on the cwd-relative behavior, do you? the only place using this functionality that I know of is Chromium) and a general "which is the better behavior", which, again, if it's basically only Chromium using it/built for that situation, I think it's fair to say they probably get to say which is more suitable for their use case.

(all this said: Chromium already had a debugger configuration (at least for gdb?) to set the directories property ( https://sourceware.org/gdb/onlinedocs/gdb/Source-Path.html ) - which basically overrides/implements the comp_dir relative searching (by initializing the directories property, by default, with $cdir;$cwd (so it searches a given DWARF-specified-comp_dir-relative-path relative to DW_AT_comp_dir, and if that fails, searches relative to $cwd) - which meant that it really didn't matter what DW_AT_comp_dir they used, it could've been complete garbage and wouldn't've mattered) - but that hadn't extended to dwo searching, so they had a problem with dwo searching, which, imho, the right solution would've been to extend gdb's use of its directories property to be used also for dwo searching, since it's the mechanism for comp_dir relative path resolution and dwo paths are comp_dir relative - then there wouldn't've been much need to change anything, so far as I know... (and some similar functionality for lldb would be nice too))