This is an archive of the discontinued LLVM Phabricator instance.

Add new breakpoint partial match settings
AbandonedPublic

Authored by yinghuitan on Jul 7 2022, 10:02 AM.

Details

Summary

Some build systems (like Buck) normalize file paths into relative paths
in debug info to support hermetic/stable build caching.
This requires IDE/debugger users to configure a correct source mapping if they
are using full paths for file line breakpoints.

We are seeing many users fail to bind/resolve breakpoints due to
incorrect/missing source map.

This patch adds a new partial match setting (target.breakpoint-partial-match)
which enables matching breakpoint request by partial suffix.
The number of suffix directories required to match is controlled by another
setting (target.breakpoint-partial-match-dir-count). The default value is zero
which means matching base file name only.
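
As an illustration only, the suffix matching described above could be sketched in Python roughly as follows. The function names and the handling of paths shorter than the requested depth are assumptions made for this sketch, not taken from the patch.

```python
def path_components(path):
    """Split a POSIX-style path into components, dropping empty and '.' parts."""
    return [part for part in path.split("/") if part not in ("", ".")]


def partial_match(request_path, debug_path, dir_count=0):
    """Return True if the base name and the last `dir_count` directories agree.

    Assumption: a path with fewer components than required never matches.
    """
    wanted = path_components(request_path)
    found = path_components(debug_path)
    needed = dir_count + 1  # trailing directories plus the base file name
    if len(wanted) < needed or len(found) < needed:
        return False
    return wanted[-needed:] == found[-needed:]
```

With dir_count left at zero, a full path sent by an IDE and a relative path recorded in debug info match as long as the base file names are equal.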

This mimics what most command line lldb users/testcases are doing -- they use
the base file name for setting file line breakpoints.

This setting will greatly improve breakpoint reliability in lldb-vscode usage
and can help post-mortem/off-host debugging where the file path in debug info
may not match the source location on the debug machine.

Diff Detail

Event Timeline

yinghuitan created this revision. Jul 7 2022, 10:02 AM
Herald added a project: Restricted Project. Jul 7 2022, 10:03 AM
yinghuitan requested review of this revision. Jul 7 2022, 10:03 AM

Remove unnecessary format changes caused by IDE.

jingham requested changes to this revision. Edited Jul 7 2022, 11:18 AM

I'm not entirely clear what problem this is solving. Any actor setting breakpoints can already choose the match depth by simply providing that many directory components. I.e. if I know I have subdir1/foo.cpp and subdir2/foo.cpp I can set a breakpoint on only one of them by doing:

(lldb) break set -f subdir1/foo.cpp -l 10

So an IDE could implement this by simply passing either the base name or the base name plus one sub directory, etc. As you say, that's what command line users do anyway, they seldom type out full paths. So this is really about IDE's running lldb, and this seems more like a feature the IDE should offer, not lldb.

It also seems awkward to do all this work as part of the breakpoint filtering - which as you have seen is somewhat delicate code. After all, you can implement "only match the last N components" by only submitting the last N components when looking up the files. So it would be much more straightforward to strip the leading path components from the file name when you create the breakpoint and you don't have to muck with the filtering at all. That would also help remove some confusion when you see that the path submitted was /foo/bar/baz.cpp but I matched /some/other/bar/baz.cpp (but not /some/other/different/baz.cpp) because I had set the match suffix count to 1. It would be nice to have break list show me exactly what it was going to match on.
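
A minimal sketch of the pre-processing alternative described above: trim the path once, when the breakpoint is created, and submit only the trimmed form. The helper name is hypothetical.

```python
def trim_to_suffix(path, dir_count=0):
    """Keep only the base name and up to `dir_count` parent directories."""
    parts = [part for part in path.split("/") if part]
    keep = dir_count + 1
    # Slicing past the start of the list simply returns every component.
    return "/".join(parts[-keep:])
```

For example, trim_to_suffix("/foo/bar/baz.cpp", 1) yields "bar/baz.cpp", which is the string that would then be passed to break set -f and shown by break list.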

I am also not sure doing this as a general setting for values other than 0 is really helpful. So far as I can tell, you would use a value of 1 because you know that though you might have files with duplicate names they are always disambiguated by their parent directory name. Using a value of 2 says the names are disambiguated by the two parent directory names. But there's no guarantee that one value is going to work for all the files in your project and having to change a global setting from breakpoint to breakpoint is awkward.

Also, you are supposed to be able to save breakpoints & restore them and provided your binaries haven't changed you will get the same breakpoints. Now, for that to work, you also have to restore some target setting that wasn't saved with the breakpoints.

I wouldn't mind so much if this were passed in to the breakpoint setting directly, though again, I don't really see the point since this is mostly for IDE's and they can strip however much they want off a path before submitting it w/o involving lldb.

This revision now requires changes to proceed. Jul 7 2022, 11:18 AM

I'm not entirely clear what problem this is solving. Any actor setting breakpoints can already choose the match depth by simply providing that many directory components. I.e. if I know I have subdir1/foo.cpp and subdir2/foo.cpp I can set a breakpoint on only one of them by doing:

(lldb) break set -f subdir1/foo.cpp -l 10

So an IDE could implement this by simply passing either the base name or the base name plus one sub directory, etc. As you say, that's what command line users do anyway, they seldom type out full paths. So this is really about IDE's running lldb, and this seems more like a feature the IDE should offer, not lldb.

Xcode has this same issue if you download a dSYM file from the Apple build infrastructure if that dSYM doesn't contain path re-mappings in the UUID plists. Xcode, the IDE, will send down full paths to the source files that it knows about if you set breakpoints. Granted Xcode could work around this, but we added the ability to remap things at the module level in LLDB to work around this issue. So yes, it would be great if all IDEs would do this, but currently none do.

It also seems awkward to do all this work as part of the breakpoint filtering - which as you have seen is somewhat delicate code. After all, you can implement "only match the last N components" by only submitting the last N components when looking up the files. So it would be much more straightforward to strip the leading path components from the file name when you create the breakpoint and you don't have to muck with the filtering at all. That would also help remove some confusion when you see that the path submitted was /foo/bar/baz.cpp but I matched /some/other/bar/baz.cpp (but not /some/other/different/baz.cpp) because I had set the match suffix count to 1. It would be nice to have break list show me exactly what it was going to match on.

We might need to always store the path that was given in source file and line breakpoints and then have the breakpoint update its locations when/if either of these two settings is modified. This would also help with your example below where you comment that breakpoint settings depend on the history of the session (they shouldn't).

I am also not sure doing this as a general setting for values other than 0 is really helpful. So far as I can tell, you would use a value of 1 because you know that though you might have files with duplicate names they are always disambiguated by their parent directory name.

Exactly the case I was worried about. Usually zero would be the default IMHO, but if you have two different binaries that both have a "main.cpp" it could help. We can remove the depth and have this setting only set breakpoints by basename if needed if no one wants the extra complexity.

Using a value of 2 says the names are disambiguated by the two parent directory names. But there's no guarantee that one value is going to work for all the files in your project and having to change a global setting from breakpoint to breakpoint is awkward.

I also don't like that this makes breakpoint settings depend on the history of the session. For instance:

a) Set the partial match to on and the depth to 2
b) Set a file & line breakpoint
c) Change the partial match depth to 0
d) a library gets loaded with a file that wouldn't have matched with depth of 2 but does with 0

We would need to store the full path in the source file and line breakpoints all the time and then set all of the source breakpoints again if this setting changes (enabled/disabled or dir depth changes).

Also, you are supposed to be able to save breakpoints & restore them and provided your binaries haven't changed you will get the same breakpoints. Now, for that to work, you also have to restore some target setting that wasn't saved with the breakpoints.

We wouldn't actually need to if we always store the path we are given when the breakpoint is originally set, and then update the breakpoint locations when/if the settings are changed (which might cause more locations to show up, or less).

I wouldn't mind so much if this were passed in to the breakpoint setting directly, though again, I don't really see the point since this is mostly for IDE's and they can strip however much they want off a path before submitting it w/o involving lldb.

So neither Xcode nor Visual Studio Code make any attempt to remap sources for anyone. The IDEs always pass down the full path currently.

Just to let everyone know where we are going with this: we want to implement an auto source map feature after this patch where if the user sets a breakpoint with "/some/build/path/<src-root>/bar/baz/foo.cpp", and the debug info contains "<any-path>/<src-root>/bar/baz/foo.cpp", we can have LLDB automatically create a source mapping for "/some/build/path" -> "<any-path>" where "<any-path>" can be a relative path like "." or an absolute path like "/local/file/path". The BUCK build system uses relative paths in the DWARF for all source files so that cached compiled versions of .o and .a files can be downloaded from the build infrastructure if no one has changed sources in a particular directory. This lets build servers build and cache intermediate files on another machine and send them over to user machines to speed up local builds. Apple's B&I builds on some remote build server using different paths and then uploads results to other machines in different directories and also archives the sources in other locations. So having this feature will help users to be able to set breakpoints more reliably without having to set the source mapping settings manually.
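
To make the idea concrete, here is a rough Python sketch of how the two differing prefixes might be derived from one matched pair of paths. It is only an illustration under assumptions: the function name is hypothetical, "src" stands in for <src-root>, and nothing here is the planned LLDB implementation.

```python
def deduce_prefix_pair(request_path, debug_path):
    """Return (request_prefix, debug_prefix), or None if the base names differ.

    Trailing components are compared one by one; whatever remains in front
    of the shared suffix is the candidate pair for a source mapping.
    """
    req = request_path.split("/")
    dbg = debug_path.split("/")
    common = 0
    while (common < len(req) and common < len(dbg)
           and req[-1 - common] == dbg[-1 - common]):
        common += 1
    if common == 0:
        return None

    def prefix(path, parts):
        joined = "/".join(parts[:len(parts) - common])
        if joined:
            return joined
        # Nothing left in front of the shared suffix: root or current directory.
        return "/" if path.startswith("/") else "."

    return prefix(request_path, req), prefix(debug_path, dbg)
```

Calling it with "/some/build/path/src/bar/baz/foo.cpp" and "./src/bar/baz/foo.cpp" gives ("/some/build/path", "."). Note that when the directories just above the source root happen to share a name, the shared suffix grows and the returned prefixes become shorter but still equivalent.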

We can implement this feature in lldb-vscode, but I would much rather have a solution in LLDB itself if this is useful to anyone other than just lldb-vscode. I am not aware of any IDEs that do source remapping for the debugger in any way, so I think this feature would be valuable to just about any IDE, but we can just put the feature in lldb-vscode and let everyone else fend for themselves if no one wants this feature in LLDB.

We would welcome feedback from anyone on the review or subscriber list.

I'm not entirely clear what problem this is solving. Any actor setting breakpoints can already choose the match depth by simply providing that many directory components. I.e. if I know I have subdir1/foo.cpp and subdir2/foo.cpp I can set a breakpoint on only one of them by doing:

(lldb) break set -f subdir1/foo.cpp -l 10

So an IDE could implement this by simply passing either the base name or the base name plus one sub directory, etc. As you say, that's what command line users do anyway, they seldom type out full paths. So this is really about IDE's running lldb, and this seems more like a feature the IDE should offer, not lldb.

Xcode has this same issue if you download a dSYM file from the Apple build infrastructure if that dSYM doesn't contain path re-mappings in the UUID plists. Xcode, the IDE, will send down full paths to the source files that it knows about if you set breakpoints. Granted Xcode could work around this, but we added the ability to remap things at the module level in LLDB to work around this issue. So yes, it would be great if all IDEs would do this, but currently none do.

It also seems awkward to do all this work as part of the breakpoint filtering - which as you have seen is somewhat delicate code. After all, you can implement "only match the last N components" by only submitting the last N components when looking up the files. So it would be much more straightforward to strip the leading path components from the file name when you create the breakpoint and you don't have to muck with the filtering at all. That would also help remove some confusion when you see that the path submitted was /foo/bar/baz.cpp but I matched /some/other/bar/baz.cpp (but not /some/other/different/baz.cpp) because I had set the match suffix count to 1. It would be nice to have break list show me exactly what it was going to match on.

We might need to always store the path that was given in source file and line breakpoints and then have the breakpoint update its locations when/if either of these two settings is modified. This would also help with your example below where you comment that breakpoint settings depend on the history of the session (they shouldn't).

I took that bit out because in fact the implementation stores the depth in the resolver when made. So any given breakpoint isn't history dependent, but rather to understand why some breakpoints take and some don't you have to know the history of your settings in the session - which is not recorded anywhere you can get your hands on. That's a source of confusion for sure but TTTT probably not a huge one.
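
The behavior described here, where the depth is captured when the resolver is made, can be sketched as below. The class, the settings dictionary and its keys are hypothetical stand-ins for illustration and do not mirror LLDB's actual resolver classes.

```python
class FileLineResolver:
    """Hypothetical stand-in for a file/line breakpoint resolver."""

    def __init__(self, path, line, settings):
        self.path = path
        self.line = line
        # Snapshot the settings now; later changes do not affect this resolver.
        self.partial_match = settings["partial-match"]
        self.dir_count = settings["partial-match-dir-count"]

    def matches(self, debug_path):
        """Check a path from debug info against the stored request path."""
        if not self.partial_match:
            return self.path == debug_path
        wanted = [p for p in self.path.split("/") if p not in ("", ".")]
        found = [p for p in debug_path.split("/") if p not in ("", ".")]
        needed = self.dir_count + 1
        if len(wanted) < needed or len(found) < needed:
            return False
        return wanted[-needed:] == found[-needed:]


settings = {"partial-match": True, "partial-match-dir-count": 2}
first = FileLineResolver("/work/app/ui/view.cpp", 10, settings)
settings["partial-match-dir-count"] = 0  # setting changed mid-session
second = FileLineResolver("/work/app/ui/view.cpp", 10, settings)
```

A library loaded later with "./other/gui/view.cpp" in its debug info is matched by the second resolver but not by the first, even though both were created from the same request. Explaining that difference requires knowing the setting history of the session.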

I am also not sure doing this as a general setting for values other than 0 is really helpful. So far as I can tell, you would use a value of 1 because you know that though you might have files with duplicate names they are always disambiguated by their parent directory name.

Exactly the case I was worried about. Usually zero would be the default IMHO, but if you have two different binaries that both have a "main.cpp" it could help. We can remove the depth and have this setting only set breakpoints by basename if needed if no one wants the extra complexity.

Certainly just doing base name will find all the matches regardless of what the debug info says about the path, and the only problem is extra matches from files with the same name in different directories. That's only slightly annoying and not a fatal problem because you can always disable locations you don't want, though it would be good to have some way to handle this. But it seems to me to disambiguate effectively, you really have to know something about the structure of your project. You could probably get away with 1 directory as a global setting safely since most of the time the sources move rigidly and the source root will be above everything. But anything above that will require you know that all the source files are that many levels below the source root or your setting will throw out some matches when the source root is moved. So it seems to me any higher setting is going to be too fiddly to be worth the effort.

Using a value of 2 says the names are disambiguated by the two parent directory names. But there's no guarantee that one value is going to work for all the files in your project and having to change a global setting from breakpoint to breakpoint is awkward.

I also don't like that this makes breakpoint settings depend on the history of the session. For instance:

a) Set the partial match to on and the depth to 2
b) Set a file & line breakpoint
c) Change the partial match depth to 0
d) a library gets loaded with a file that wouldn't have matched with depth of 2 but does with 0

We would need to store the full path in the source file and line breakpoints all the time and then set all of the source breakpoints again if this setting changes (enabled/disabled or dir depth changes).

I actually cut this part out of the comment (but after you responded) because the current implementation stores the setting when the breakpoint is made. So it wouldn't change with changes in the setting - which I think is the right behavior. There is no other instance where the resolver changes its mind about how to find matches after the breakpoint is made. That is useful because people can set intentions on individual locations (commands, conditions, etc.) and anything but changes in the underlying binaries that removes locations seems likely to undo the user's work and reasonably upset them.

Since I don't think letting the locations change over time is a good idea, it follows that if you are going to do this it's better to have lldb trim the incoming file path based on what it plans to pay attention to rather than to keep a path with implicitly ignored components around till the very end of the breakpoint setting. It is weird for a breakpoint to show "/foo/bar/bar/blah.c" for the file path even though the breakpoint only plans to match on "bar/blah.c". These settings really just implement: "I can't be bothered to strip all but the last <N> components from a file path before submitting it for matching, please do that for me." Given that's what you are doing, then doing it in the most straightforward way seems best, i.e. just pre-process the file path based on what the user told you and record that in the breakpoint.

Also, you are supposed to be able to save breakpoints & restore them and provided your binaries haven't changed you will get the same breakpoints. Now, for that to work, you also have to restore some target setting that wasn't saved with the breakpoints.

We wouldn't actually need to if we always store the path we are given when the breakpoint is originally set, and then update the breakpoint locations when/if the settings are changed (which might cause more locations to show up, or less).

I agree with the first part of this, I don't think changing the resolver as the setting changes is a good idea (see above) but that's an orthogonal concern.

I wouldn't mind so much if this were passed in to the breakpoint setting directly, though again, I don't really see the point since this is mostly for IDE's and they can strip however much they want off a path before submitting it w/o involving lldb.

So neither Xcode nor Visual Studio Code make any attempt to remap sources for anyone. The IDEs always pass down the full path currently.

Xcode only does this because for the longest time lldb only supported "base filename" and "full path" matches, we didn't do partial path matches. They have a little bit of a problem figuring out the "root directory" of a project so they know what to strip from the path, but this setting wouldn't help with that anyway.

Just to let everyone know where we are going with this: we want to implement an auto source map feature after this patch where if the user sets a breakpoint with "/some/build/path/<src-root>/bar/baz/foo.cpp", and the debug info contains "<any-path>/<src-root>/bar/baz/foo.cpp", we can have LLDB automatically create a source mapping for "/some/build/path" -> "<any-path>" where "<any-path>" can be a relative path like "." or an absolute path like "/local/file/path". The BUCK build system uses relative paths in the DWARF for all source files so that cached compiled versions of .o and .a files can be downloaded from the build infrastructure if no one has changed sources in a particular directory. This lets build servers build and cache intermediate files on another machine and send them over to user machines to speed up local builds. Apple's B&I builds on some remote build server using different paths and then uploads results to other machines in different directories and also archives the sources in other locations. So having this feature will help users to be able to set breakpoints more reliably without having to set the source mapping settings manually.

I wonder about this a bit. Seems to me there are two separate problems that need solving. There's the main problem that source-maps are supposed to solve, namely finding actual source files on disk so that we can show source lines as you step. But the breakpoint system (except for break set -p) only needs to match paths, it doesn't care if any of the elements of the path actually exists. And really, the only reason why the breakpoint system needs paths at all is to disambiguate files with the same basename in different directories. That's all it cares about. Tying "file name disambiguation" too closely with actually ensuring that you can find real files on disk will mean you only solve the disambiguation problem if you have sources locally, which has never been a requirement of breakpoints.

Resolving breakpoint disambiguation also seems to me to be pretty different from the job of building these auto path maps. In building a path map, you are asking a question about equivalent prefixes: "how much should I chop off the beginning of these two paths to get an equivalence". For disambiguation you are asking "how many directories above the base name do I need to look at". It is awkward to try to use a specification for "N directories above the base name for any path I might find in the debug information" as a way to determine "where should I start looking for the equivalent prefix to strip to make a source map". The source paths in a complex project are at all different levels so that seems the wrong way to specify it.
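
The disambiguation question ("how many directories above the base name do I need to look at") can be illustrated with a small sketch. The function name and the sample paths in the note after it are made up for the example.

```python
def min_disambiguating_dirs(target, all_paths):
    """Smallest number of parent directories that singles out `target`.

    Returns None if no suffix of `target` is unique among `all_paths`.
    """
    def parts(path):
        return [p for p in path.split("/") if p not in ("", ".")]

    wanted = parts(target)
    others = [parts(p) for p in all_paths if parts(p) != wanted]
    for dir_count in range(len(wanted)):
        suffix = wanted[-(dir_count + 1):]
        if not any(o[-(dir_count + 1):] == suffix for o in others):
            return dir_count
    return None
```

Given the paths lib/a/util.cpp, lib/b/util.cpp, tools/x/y/a/util.cpp and main.cpp, the answers are 2, 1, 1 and 0 respectively, so no single global depth is the natural fit for every file in this small tree.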

And of course, you will have to do this in some way that's independent of breakpoints, since it also has to work with source list, and the frame printing, which are the places where we actually care about the source map directly. So whatever you do it can't be strictly linked to setting breakpoints.

We can implement this feature in lldb-vscode, but I would much rather have a solution in LLDB itself if this is useful to anyone other than just lldb-vscode. I am not aware of any IDEs that do source remapping for the debugger in any way, so I think this feature would be valuable to just about any IDE, but we can just put the feature in lldb-vscode and let everyone else fend for themselves if no one wants this feature in LLDB.

We would welcome feedback from anyone on the review or subscriber list.

I'm not entirely clear what problem this is solving. Any actor setting breakpoints can already choose the match depth by simply providing that many directory components. I.e. if I know I have subdir1/foo.cpp and subdir2/foo.cpp I can set a breakpoint on only one of them by doing:

(lldb) break set -f subdir1/foo.cpp -l 10

So an IDE could implement this by simply passing either the base name or the base name plus one sub directory, etc. As you say, that's what command line users do anyway, they seldom type out full paths. So this is really about IDE's running lldb, and this seems more like a feature the IDE should offer, not lldb.

Xcode has this same issue if you download a dSYM file from the Apple build infrastructure if that dSYM doesn't contain path re-mappings in the UUID plists. Xcode, the IDE, will send down full paths to the source files that it knows about if you set breakpoints. Granted Xcode could work around this, but we added the ability to remap things at the module level in LLDB to work around this issue. So yes, it would be great if all IDEs would do this, but currently none do.

It also seems awkward to do all this work as part of the breakpoint filtering - which as you have seen is somewhat delicate code. After all, you can implement "only match the last N components" by only submitting the last N components when looking up the files. So it would be much more straightforward to strip the leading path components from the file name when you create the breakpoint and you don't have to muck with the filtering at all. That would also help remove some confusion when you see that the path submitted was /foo/bar/baz.cpp but I matched /some/other/bar/baz.cpp (but not /some/other/different/baz.cpp) because I had set the match suffix count to 1. It would be nice to have break list show me exactly what it was going to match on.

We might need to always store the path that was given in source file and line breakpoints and then have the breakpoint update its locations when/if either of these two settings is modified. This would also help with your example below where you comment that breakpoint settings depend on the history of the session (they shouldn't).

I took that bit out because in fact the implementation stores the depth in the resolver when made. So any given breakpoint isn't history dependent, but rather to understand why some breakpoints take and some don't you have to know the history of your settings in the session - which is not recorded anywhere you can get your hands on. That's a source of confusion for sure but TTTT probably not a huge one.

I would love for people to be able to enable/disable this setting to see if it fixes their debug session. If we go the route you are suggesting, we will need to set the setting and then remove the breakpoint and try to set it again. But I would love it if people could just enable this setting and see if it fixes their breakpoint issues. If the breakpoints always stored the full path that was supplied, and if the breakpoint always listened to the target settings, then nothing would need to change on the breakpoint storage end. We might need to add a "resolved path" to the source file and line breakpoints which would be the path that was used to set the breakpoint which we could show if it differs from the specified path. The breakpoints would always re-resolve if the settings were changed.

I am also not sure doing this as a general setting for values other than 0 is really helpful. So far as I can tell, you would use a value of 1 because you know that though you might have files with duplicate names they are always disambiguated by their parent directory name.

Exactly the case I was worried about. Usually zero would be the default IMHO, but if you have two different binaries that both have a "main.cpp" it could help. We can remove the depth and have this setting only set breakpoints by basename if needed if no one wants the extra complexity.

Certainly just doing base name will find all the matches regardless of what the debug info says about the path, and the only problem is extra matches from files with the same name in different directories. That's only slightly annoying and not a fatal problem because you can always disable locations you don't want, though it would be good to have some way to handle this. But it seems to me to disambiguate effectively, you really have to know something about the structure of your project. You could probably get away with 1 directory as a global setting safely since most of the time the sources move rigidly and the source root will be above everything. But anything above that will require you know that all the source files are that many levels below the source root or your setting will throw out some matches when the source root is moved. So it seems to me any higher setting is going to be too fiddly to be worth the effort.

Yeah, I would be fine getting rid of the directory depth setting as it does seem too fiddly and could cause problems, and I am not sure anyone will know what to set it to.

Using a value of 2 says the names are disambiguated by the two parent directory names. But there's no guarantee that one value is going to work for all the files in your project and having to change a global setting from breakpoint to breakpoint is awkward.

I also don't like that this makes breakpoint settings depend on the history of the session. For instance:

a) Set the partial match to on and the depth to 2
b) Set a file & line breakpoint
c) Change the partial match depth to 0
d) a library gets loaded with a file that wouldn't have matched with depth of 2 but does with 0

We would need to store the full path in the source file and line breakpoints all the time and then set all of the source breakpoints again if this setting changes (enabled/disabled or dir depth changes).

I actually cut this part out of the comment (but after you responded) because the current implementation stores the setting when the breakpoint is made.

Yeah, I would propose we always store the full path, and maybe add a "resolved path" to the source file and line breakpoint which can be shorter. It could show up in the "breakpoint list" output only if it differs from the specified path.
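
A tiny sketch of the display rule proposed here: show the resolved path only when it differs from the specified one. The function and the output format are invented for illustration and are not the real "breakpoint list" output.

```python
def describe_file_line_breakpoint(specified_path, resolved_path, line):
    """Build a one-line description; the format is hypothetical."""
    text = f"file = '{specified_path}', line = {line}"
    if resolved_path != specified_path:
        # Only mention the resolved path when it adds information.
        text += f", resolved path = '{resolved_path}'"
    return text
```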

So it wouldn't change with changes in the setting - which I think is the right behavior. There is no other instance where the resolver changes its mind about how to find matches after the breakpoint is made. That is useful because people can set intentions on individual locations (commands, conditions, etc.) and anything but changes in the underlying binaries that removes locations seems likely to undo the user's work and reasonably upset them.

I still go back to the situation where breakpoints are not working and would love to be able to enable this setting and see if breakpoints end up resolving. Right now we always have to tell people to run "image dump line-table Foo.cpp" and tell us if they see anything. I think having a global setting that helps breakpoints be set would be fine as long as it would re-resolve any needed breakpoints if it were changed. Most people would enable this setting by default in their init files and not change it, so it wouldn't affect users on a day to day basis, but it would help with getting breakpoints working for people.

Since I don't think letting the locations change over time is a good idea, it follows that if you are going to do this it's better to have lldb trim the incoming file path based on what it plans to pay attention to rather than to keep a path with implicitly ignored components around till the very end of the breakpoint setting. It is weird for a breakpoint to show "/foo/bar/bar/blah.c" for the file path even though the breakpoint only plans to match on "bar/blah.c". These settings really just implement: "I can't be bothered to strip all but the last <N> components from a file path before submitting it for matching, please do that for me." Given that's what you are doing, then doing it in the most straightforward way seems best, i.e. just pre-process the file path based on what the user told you and record that in the breakpoint.

It would be less intrusive of a change yes I agree. Not sure if users will then wonder why the breakpoint doesn't match their input. Like if they do:

(lldb) settings set target.breakpoint-partial-match true
(lldb) breakpoint set --file /path/to/foo.cpp --line 12
(lldb) settings set target.breakpoint-partial-match false
(lldb) breakpoint set --file /path/to/foo.cpp --line 12
(lldb) breakpoint list

I wonder if the user will know why one has "foo.cpp" and one has "/path/to/foo.cpp". I can see your point that it is easier to implement this way, but would love to be able to have people enable this feature and just see if it fixes everything if possible without too many intrusive code changes.

Also, you are supposed to be able to save breakpoints & restore them and provided your binaries haven't changed you will get the same breakpoints. Now, for that to work, you also have to restore some target setting that wasn't saved with the breakpoints.

We wouldn't actually need to if we always store the path we are given when the breakpoint is originally set, and then update the breakpoint locations when/if the settings are changed (which might cause more locations to show up, or less).

I agree with the first part of this, I don't think changing the resolver as the setting changes is a good idea (see above) but that's an orthogonal concern.

I see your point, not sure what the user will expect from the above command sequence I mentioned. The auto source map will become harder to implement if we go the route you suggest.

I wouldn't mind so much if this were passed in to the breakpoint setting directly, though again, I don't really see the point since this is mostly for IDE's and they can strip however much they want off a path before submitting it w/o involving lldb.

So neither Xcode nor Visual Studio Code make any attempt to remap sources for anyone. The IDEs always pass down the full path currently.

Xcode only does this because for the longest time lldb only supported "base filename" and "full path" matches, we didn't do partial path matches. They have a little bit of a problem figuring out the "root directory" of a project so they know what to strip from the path, but this setting wouldn't help with that anyway.

Just to let everyone know where we are going with this: we want to implement an auto source map feature after this patch where if the user sets a breakpoint with "/some/build/path/<src-root>/bar/baz/foo.cpp", and the debug info contains "<any-path>/<src-root>/bar/baz/foo.cpp", we can have LLDB automatically create a source mapping for "/some/build/path" -> "<any-path>" where "<any-path>" can be a relative path like "." or an absolute path like "/local/file/path". The BUCK build system uses relative paths in the DWARF for all source files so that cached compiled versions of .o and .a files can be downloaded from the build infrastructure if no one has changed sources in a particular directory. This lets build servers build and cache intermediate files on another machine and send them over to user machines to speed up local builds. Apple's B&I builds on some remote build server using different paths and then uploads results to other machines in different directories and also archives the sources in other locations. So having this feature will help users to be able to set breakpoints more reliably without having to set the source mapping settings manually.

I wonder about this a bit. Seems to me there are two separate problems that need solving. There's the main problem that source-maps are supposed to solve, namely finding actual source files on disk so that we can show source lines as you step. But the breakpoint system (except for break set -p) only needs to match paths, it doesn't care if any of the elements of the path actually exists. And really, the only reason why the breakpoint system needs paths at all is to disambiguate files with the same basename in different directories. That's all it cares about. Tying "file name disambiguation" too closely with actually ensuring that you can find real files on disk will mean you only solve the disambiguation problem if you have sources locally, which has never been a requirement of breakpoints.

Source maps are the only way to get breakpoints working right now if the full path specified by the IDE doesn't match the debug info. If the user doesn't set them correctly, then no breakpoints resolve at all. By setting breakpoints by basename, we can deduce the required source mappings by matching as many source directories as possible and then adding a source map to the settings so that everything works when stopped at the breakpoint. We can easily locate the root directories because there are usually a few extra directories that match in the path.

Resolving breakpoint disambiguation also seems to me to be pretty different from the job of building these auto path maps. In building a path map, you are asking a question about equivalent prefixes: "how much should I chop off the beginning of these two paths to get an equivalence". For disambiguation you are asking "how many directories above the base name do I need to look at". It is awkward to try to use a specification for "N directories above the base name for any path I might find in the debug information" as a way to determine "where should I start looking for the equivalent prefix to strip to make a source map". The source paths in a complex project are at all different levels so that seems the wrong way to specify it.

We don't need the directory depth at all to make this work; using basenames is all we need. Then it is very easy to set a breakpoint at "/foo/bar/<root-dir>/src/foo.cpp", see that a location was matched at "./<root-dir>/src/foo.cpp", and make the auto source map entry of "/foo/bar" -> ".". The "." can also be any other directory. Since most projects have a source root and the debug info usually covers everything compiled under that root, it is easy to deduce the auto source map regardless of how this breakpoint setting strategy is done.
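The deduction described above can be sketched as a common-suffix comparison. This is only an illustration, not LLDB code: the function name `deduce_prefix_pair` is hypothetical, paths are assumed to be POSIX-style, and the pair is returned in the same order as the arrow in the comment (breakpoint path prefix first, debug info prefix second).

```python
from pathlib import PurePosixPath


def deduce_prefix_pair(request_path, debug_path):
    """Return (request_prefix, debug_prefix), or None if nothing can be deduced.

    Hypothetical helper: strips the longest run of trailing path components
    that both paths share and returns what is left of each path.
    """
    req = PurePosixPath(request_path).parts
    dbg = PurePosixPath(debug_path).parts
    # The basenames must agree, otherwise the two paths are unrelated.
    if not req or not dbg or req[-1] != dbg[-1]:
        return None
    # Count matching components, walking backwards from the basename.
    n = 0
    while n < len(req) and n < len(dbg) and req[-1 - n] == dbg[-1 - n]:
        n += 1
    # An empty prefix is rendered as "." by pathlib.
    req_prefix = str(PurePosixPath(*req[: len(req) - n]))
    dbg_prefix = str(PurePosixPath(*dbg[: len(dbg) - n]))
    if req_prefix == dbg_prefix:
        return None  # Paths already agree, no mapping needed.
    return req_prefix, dbg_prefix
```

With the paths from the comment, and "root" standing in for "<root-dir>", the breakpoint path "/foo/bar/root/src/foo.cpp" and the debug info path "./root/src/foo.cpp" yield the pair ("/foo/bar", ".").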

And of course, you will have to do this in some way that's independent of breakpoints, since it also has to work with source list and frame printing, which are the places where we actually care about the source map directly. So whatever you do, it can't be strictly linked to setting breakpoints.

We are again requiring people to set source maps correctly before they can debug effectively, which causes many debug sessions to fail. Most people debug projects that are relative to one root directory, so if people are already setting their source maps correctly, good for them: nothing will happen and things will just work. If they aren't, they now have a chance to debug instead of just running their code. Any system where the build system is not integrated into the IDE, like VS Code, now requires the user to know the details of how the build system works and how to set the debugger up just to be able to hit a breakpoint. Xcode doesn't have this problem because it is well integrated with the build system. VS Code doesn't have a build system plug-in, so everyone rolls their own (cmake + ninja, make, buck, etc.) and each has its own ways of doing things. It isn't easy to ask the build system what it is doing.

So while it is true that setting breakpoints won't always fix everything, the truth is that users in IDEs set file and line breakpoints 99% of the time, and this can help users debug more effectively more of the time without having to know the in-depth details of how LLDB uses source maps.

jingham added a comment.EditedJul 7 2022, 9:08 PM

I'm out tomorrow, so I won't get a chance for a detailed reply till Monday. But my short reactions are:

  1. Setting the breakpoint search to only check base names will of course make all your file and line breakpoints work, but they will also pick up extra hits. In command line lldb that doesn't look nearly as weird as clicking on one source file window & seeing a breakpoint marker show up in a completely different window. If we didn't care about that problem and expected people to manage these extra locations by hand, then indeed just saying "only set using base-names" is fine. But it still seems weird to me that we would add an lldb setting and code to support that rather than have IDEs just only pass in the base name if that's their strategy.
  2. I really don't want changing a setting to add or remove locations from an extant breakpoint. That's not how breakpoints work. If I go to the trouble of inputting a command on a location, and then I change a setting and the location goes away, I would rightly be ticked off. So I really think the locations should record whatever search they are going to do when they are created, and stick to it. However, I have no problem with having the breakpoint store "original path/matched path" pairs if that makes it easier to figure out what is going on.
  3. The behind your back creation of source maps gives me the willies. Particularly if you have more than one module with debug information, all built originally in the phony locations so they all look like the original directories have the same root, it seems like you could start getting source maps that map everything to everything and after a while this will get hard to reason about. Maybe I'm being pessimistic, however, and if you are excited to try, more power to you.

But I don't think you can drive this from breakpoints alone. Suppose I set no breakpoints, and the program crashes at /OriginalDirectory/SourceRoot/MyProject/subdir/foo.c, lldb is still going to have to figure out what to show the user, so you're still going to have to come up with the real source file. And you can't use your breakpoint tricks to help you because it's lldb providing the information and at the point where you are doing this it only knows the original location. The IDE is the only agent that knows where to look for other possible locations.

It seems like it would be cleaner to have some kind of callback when a binary with debug information gets loaded where the UI could have a look at the CUs in the module that got loaded, and see if it knows about any of those files locally (because it has them in directory-equivalent places in its project). The UI can then construct a source map from there. That way this would happen predictably and for all consumers, rather than relying on the user's path through breakpoint setting to set the source mapping world back to rights.

@jingham, thanks for sharing the thoughts.

Setting the breakpoint search to only check base names will of course make all your file and line breakpoints work, but they will also pick up extra hits. In command line lldb that doesn't look nearly as weird as clicking on one source file window & seeing a breakpoint marker show up in a completely different window. If we didn't care about that problem and expected people to manage these extra locations by hand, then indeed just saying "only set using base-names" is fine. But it still seems weird to me that we would add an lldb setting and code to support that rather than have IDE's just only pass in the base name if that's their strategy.

One thought to mitigate this: for all the symbol contexts matched by base file name, we could iterate and check whether any of them is an exact match for the requested breakpoint path. We would prefer this exact-match symbol context and throw away the other partial matches. However, if no exact match exists, we have to keep all of them so as not to miss breakpoints, because there is no way to know which one the user wants. This should give 90% (my guess) of breakpoint cases (those with exact matching) the same behavior as before.
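A minimal sketch of that filtering rule, assuming the candidate locations are available as plain POSIX path strings (in LLDB they would be symbol contexts; the function name `select_matches` is made up for this illustration):

```python
import posixpath


def select_matches(requested, candidates):
    """Prefer exact path matches; otherwise keep every basename match."""
    base = posixpath.basename(requested)
    # Step 1: basename-only matching, as proposed for the new setting.
    by_base = [c for c in candidates if posixpath.basename(c) == base]
    # Step 2: if any candidate equals the requested path exactly,
    # keep only those and drop the looser matches.
    wanted = posixpath.normpath(requested)
    exact = [c for c in by_base if posixpath.normpath(c) == wanted]
    return exact if exact else by_base
```

When the requested path exists verbatim in the debug info, only that entry survives, which preserves the old behavior. When it does not, every file with the same base name is kept, which is where the extra hits discussed above come from.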

I really don't want changing a setting to add or remove locations from an extant breakpoint. That's not how breakpoints work. If I go to the trouble of inputting a command on a location, and then I change a setting and the location goes away, I would rightly be ticked off. So I really think the locations should record whatever search they are going to do when they are created, and stick to it. However, I have no problem with having the breakpoint store "original path/matched path" pairs if that makes it easier to figure out what is going on.

I do not have much preference on this one. In 99% of the use cases, the client/IDE/user would set this setting during startup and not change it during the lifetime of the debug session.

The behind your back creation of source maps gives me the willies. Particularly if you have more than one module with debug information, all built originally in the phony locations so they all look like the original directories have the same root, it seems like you could start getting source maps that map everything to everything and after a while this will get hard to reason about. Maybe I'm being pessimistic, however, and if you are excited to try, more power to you.

Ideally, if compiler/linker can emit checksum for each source file into debug info we can verify each matched source file to filter noise. I know Microsoft toolchain does so but seems like llvm does not?

But I don't think you can drive this from breakpoints alone. Suppose I set no breakpoints, and the program crashes at /OriginalDirectory/SourceRoot/MyProject/subdir/foo.c, lldb is still going to have to figure out what to show the user, so you're still going to have to come up with the real source file. And you can't use your breakpoint tricks to help you because it's lldb providing the information and at the point where you are doing this it only knows the original location. The IDE is the only agent that knows where to look for other possible locations.

I agree that breakpoint auto source mapping only helps if users/IDEs set file line breakpoints, and I have given some thought to the non-breakpoint cases. Ideally, we want to have a target.source-paths setting through which users/IDE/CLI can tell lldb where to look for source files. The IDE can even pop up a dialog asking the user to browse to the source file for a selected frame that has no valid source file. I know both windbg and the Visual Studio debugger provide this option. I would like to improve this part, but currently unverified breakpoints due to incorrect/missing source map settings are the #1 pain point for our users at Meta.

It seems like it would be cleaner to have some kind of callback when a binary with debug information gets loaded where the UI could have a look at the CUs in the module that got loaded, and see if it knows about any of those files locally (because it has them in directory-equivalent places in its project). The UI can then construct a source map from there. That way this would happen predictably and for all consumers, rather than relying on the user's path through breakpoint setting to set the source mapping world back to rights.

We can look into this to further improve the auto source mapping feature. My concern is that it requires relatively complex interaction from the IDE/client, which raises the barrier to wide adoption.

From a high level, I agree that only users/IDEs know the true physical source locations. Actually, the design of "breakpoint guided auto source map" uses exactly this ground-truth information from the IDE, the breakpoint file path, to guide auto source mapping. I agree that it does not completely fix all source mapping situations, like the non-breakpoint cases you mentioned. It seems natural to improve incrementally. Based on our user study, breakpoint guided auto source mapping is one of the most important first steps.

Regarding implementing in IDE vs lldb, we could implement the logic in lldb-vscode but:

  • That would not benefit other IDE(s).
  • Many IDE(s) are hard to customize. They are providing a general debugging protocol shared by many language engines.
  • It would not help command line cases (I bet some command line lldb users would try to use a full path to set a breakpoint. I wish there were telemetry for it).

My goal with this and follow-up patches is trying to make lldb breakpoint/source-mapping working by default (command line or IDE) without caring about source map settings as much as possible. That would make lldb easier to use/adopt. Happy to work out a solution to balance all the concerns.

jingham added a comment.EditedJul 11 2022, 6:40 PM

@jingham, thanks for sharing the thoughts.

First off, I am supportive of your project, this is a pain point for some folks for sure. I am in favor of trying to automate "I built my binaries with sources in one location but locally they are in another" scenario.

Having breakpoint misses be the point where this gets checked is not a great final design, since there are other places where lldb needs to find source files. But if it does the 90% job, it's okay to start there.

OTOH, framing the feature as "you pass me a path, and I'm only going to check the last N path components regardless of how many you entered" is just weird. I'd rather we not expose that. That choice, after all, is trivial in the IDE, just send the parts of the path you care about. I think that one is fine to leave on the IDE side if you want it explicitly, since it would be trivial for an IDE to implement.

I also don't like the original framing as it mixes the problem you are actually trying to solve: "match the full paths from IDE project info with those in debug info" with the general problem of breakpoint setting. For instance, while I might very well want you to try to match up full paths from a project this way, if I type:

(lldb) break set -f foo/bar.c -l 10

I never want you to set a breakpoint on baz/bar.c. That doesn't make any sense. So I don't think we should make that possible.

What the user cares about is that you are going to auto-deduce out-of-place source file mappings for them. That's the piece that really matters; having users have to turn this on by specifying the parameters for finding inexact matches seems backwards. So the patch would make a lot more sense to me if you were adding target.auto-deduce-source-maps, and doing your work behind that setting however you need to.

For that feature, it would be clear that this is only to be done when the input path is a full path, since you can't set up a path mapping if the "real file location" isn't fully specified. That will limit the scope for accidental matchings and keep it out of all the CLI entered breakpoint setting (except for people that enter full paths, presumably to trigger this auto-deduction since I've actually never seen somebody do that IRL...) And you can use whatever algorithm to search for potential matching patterns makes sense for the job you are actually trying to do: see if there's a rigid translation between a source tree the IDE knows about, and one from Debug Information.

I originally wanted to stay out of this, but then we got a path-resolving bug report which got me thinking about all of this.

Generally I would say I agree with Jim that this is an important problem to solve, but the implementation is somewhat unusual. I don't really have much in the way of concrete suggestions, but I have a couple of observations I'd like to share.

  • The fact that the breakpoint path /src/foo/bar.cc could match debug info path /build/foo/bar.cc is definitely strange. However, you said that for you the debug info paths are relative. Having /src/foo/bar.cc match foo/bar.cc does not seem nearly as strange as the first example. In fact, it feels kind of natural as it makes the "matches" relation symmetric given that foo/bar.cc would already match /build/foo/bar.cc.
  • I am wondering if we could not make use of the fact that the debug info already comes in two components (the compilation directory + (relative) path). The split between the two presents a natural point at which we can try to do some kind of remapping or other fancy logic, and it would have the advantage of being more predictable than greedily matching as many components as we can.

    The reason I started thinking about this is that, for this bug report, gdb was actually able to find the relevant source file, while lldb could not. In our build system, the build paths are also normalized (although to /proc/self/cwd instead of .), but because this binary was built in an unusual way, the normalization process did not occur, and the file ended up with a weird path prefix (in fact, if the binary consisted of multiple compile units, each unit could have gotten a different prefix). However, this prefix was only present inside the compilation directory attribute. And since the way gdb locates files is to take the path suffix and iteratively append it to the directories in the search path (the compilation directory is just one of the entries), it was able to find the file with no problem.

    I don't really know how to apply that to lldb, but the fact that we're not using the compilation directory seems like a missed opportunity.
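The lookup strategy attributed to gdb above (append the relative part of the path to each search directory until a file is found) can be sketched as follows. This is a simplified illustration of the description in the comment, not a reproduction of gdb's actual rules. The name `find_source` and the injected `exists` predicate are assumptions made so the example does not touch the real file system.

```python
import posixpath


def find_source(relative_path, search_dirs, exists):
    """Return the first search_dirs entry that contains relative_path.

    relative_path: the path suffix recorded in the debug info.
    search_dirs:   directories to try in order; the compilation
                   directory would be just one of these entries.
    exists:        predicate telling whether a candidate file exists
                   (os.path.exists in a real tool).
    """
    for directory in search_dirs:
        candidate = posixpath.normpath(posixpath.join(directory, relative_path))
        if exists(candidate):
            return candidate
    return None


# Example: the recorded compilation directory is bogus, but a
# user-supplied search directory still resolves the file.
local_files = {"/home/me/proj/src/foo.cc"}
resolved = find_source(
    "src/foo.cc",
    ["/weird/prefix", "/home/me/proj"],
    local_files.__contains__,
)
```

Because the odd prefix lives only in the compilation directory entry, a bad prefix costs one failed probe rather than a failed lookup.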

@jingham, thanks for sharing the thoughts.

First off, I am supportive of your project, this is a pain point for some folks for sure. I am in favor of trying to automate "I built my binaries with sources in one location but locally they are in another" scenario.

Having breakpoint misses be the point where this gets checked is not a great final design, since there are other places where lldb needs to find source files. But if it does the 90% job, it's okay to start there.

OTOH, framing the feature as "you pass me a path, and I'm only going to check the last N path components regardless of how many you entered" is just weird. I'd rather we not expose that. That choice, after all, is trivial in the IDE, just send the parts of the path you care about. I think that one is fine to leave on the IDE side if you want it explicitly, since it would be trivial for an IDE to implement.

I also don't like the original framing as it mixes the problem you are actually trying to solve: "match the full paths from IDE project info with those in debug info" with the general problem of breakpoint setting. For instance, while I might very well want you to try to match up full paths from a project this way, if I type:

(lldb) break set -f foo/bar.c -l 10

I never want you to set a breakpoint on baz/bar.c. That doesn't make any sense. So I don't think we should make that possible.

What the user cares about is that you are going to auto-deduce out-of-place source file mappings for them. That's the piece that really matters; having users have to turn this on by specifying the parameters for finding inexact matches seems backwards. So the patch would make a lot more sense to me if you were adding target.auto-deduce-source-maps, and doing your work behind that setting however you need to.

That is what we originally were going to do, but then we had the breakpoint changes we needed for this and we were going to try and break up the patch into smaller sections.

For that feature, it would be clear that this is only to be done when the input path is a full path, since you can't set up a path mapping if the "real file location" isn't fully specified. That will limit the scope for accidental matchings and keep it out of all the CLI entered breakpoint setting (except for people that enter full paths, presumably to trigger this auto-deduction since I've actually never seen somebody do that IRL...) And you can use whatever algorithm to search for potential matching patterns makes sense for the job you are actually trying to do: see if there's a rigid translation between a source tree the IDE knows about, and one from Debug Information.

So to clarify your approach would be:

  • add a "target.auto-deduce-source-maps" setting that can be enabled
  • if this feature is enabled, we essentially enable breakpoint setting by basename only and then auto deduce source mappings by matching as much as we can from the specified path and the debug info path, but only for source breakpoints where full paths were specified

Let me know if this is what you were thinking of or if I missed anything

I originally wanted to stay out of this, but then we got a path-resolving bug report which got me thinking about all of this.

Generally I would say I agree with Jim that this is an important problem to solve, but the implementation is somewhat unusual. I don't really have much in the way of concrete suggestions, but I have a couple of observations I'd like to share.

  • The fact that the breakpoint path /src/foo/bar.cc could match debug info path /build/foo/bar.cc is definitely strange.

Not strange at all for production builds where you build on a server or download the debug info for a release build. Then you open the sources in your IDE and you want to debug using the IDE. The paths will rarely match in those cases, so this is a very important aspect of this patch.

  • However, you said that for you the debug info paths are relative. Having /src/foo/bar.cc match foo/bar.cc does not seem nearly as strange as the first example. In fact, it feels kind of natural as it makes the "matches" relation symmetric given that foo/bar.cc would already match /build/foo/bar.cc.

I am going to make a separate patch for relative paths in line tables. I agree that if you specify a full path to a binary and the line tables have relative paths, that something like "/foo/bar/baz.c" should match a line table that contains "bar/baz.c" as long as all parts of the relative paths match from the debug info. This should be a feature that just works in LLDB and is always enabled.
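The matching rule proposed here, where a full breakpoint path matches a relative line table path when every component of the relative path lines up, could look roughly like this. It is a sketch only: `relative_suffix_match` is a hypothetical name, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def relative_suffix_match(full_path, line_table_path):
    """True if line_table_path matches the tail of full_path.

    A relative line table path must match component by component,
    starting at the basename. An absolute line table path must match
    the full path exactly.
    """
    full = PurePosixPath(full_path).parts
    table = PurePosixPath(line_table_path)
    if table.is_absolute():
        return full == table.parts
    rel = table.parts
    if not rel or len(rel) > len(full):
        return False
    return full[-len(rel):] == rel
```

Under this rule "/foo/bar/baz.c" matches "bar/baz.c" and "baz.c", but not "qux/baz.c", so a relative path in the debug info never matches a file in a differently named directory.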

  • I am wondering if we could not make use of the fact that the debug info already come in two components (the compilation directory + (relative) path). The split between the two presents a natural point at which we can try to do some kind of remapping or other fancy logic, and it would have the advantage of being more predictable than greedily matching as many components as we can.

We could easily make the compilation directory available from the lldb_private::CompileUnit class if this would help. It would return an invalid path if the DWARF doesn't contain a DW_AT_comp_dir, but it could surely be useful for some logic in LLDB. Our issue is that our DW_AT_comp_dir is a relative path, if we have one, but that doesn't mean LLDB can't make good use of a full path in DW_AT_comp_dir when there is one.

The reason I started thinking about this is that, for this bug report, gdb was actually able to find the relevant source file, while lldb could not. In our build system, the build paths are also normalized (although to `/proc/self/cwd` instead of `.`), but because this binary was built in an unusual way, the normalization process did not occur, and the file ended up with a weird path prefix (in fact, if the binary consisted of multiple compile units, each unit could have gotten a different prefix). However, this prefix was only present inside the compilation directory attribute. And since the way gdb locates files is to take the path suffix and iteratively append it to the directories in the search path (the compilation directory is just one of the entries), it was able to find the file with no problem.

Nice. As I said, I'd be happy to add the "FileSpec CompileUnit::GetCompilationDirectory()" function to the compile unit class if this can help us locate sources more effectively.

I don't really know how to apply that to lldb, but the fact that we're not using the compilation directory seems like a missed opportunity.

Indeed it does. I will submit a patch for this.

  • The fact that the breakpoint path /src/foo/bar.cc could match debug info path /build/foo/bar.cc is definitely strange.

Not strange at all for production builds where you build on a server or download the debug info for a release build. Then you open the sources in your IDE and you want to debug using the IDE. The paths will rarely match in those cases, so this is a very important aspect of this patch.

I get your use case, and understand why this is desirable there. My point is that it can be confusing if it's applied too broadly and misfires. Suppose that I have a local build, I do a b bar/foo.cc:47 and it ends up picking baz/foo.cc, which is a completely unrelated file. If I didn't know anything about this feature, I would definitely be confused.

  • However, you said that for you the debug info paths are relative. Having /src/foo/bar.cc match foo/bar.cc does not seem nearly as strange as the first example. In fact, it feels kind of natural as it makes the "matches" relation symmetric given that foo/bar.cc would already match /build/foo/bar.cc.

I am going to make a separate patch for relative paths in line tables. I agree that if you specify a full path to a binary and the line tables have relative paths, that something like "/foo/bar/baz.c" should match a line table that contains "bar/baz.c" as long as all parts of the relative paths match from the debug info. This should be a feature that just works in LLDB and is always enabled.

SGTM

  • I am wondering if we could not make use of the fact that the debug info already come in two components (the compilation directory + (relative) path). The split between the two presents a natural point at which we can try to do some kind of remapping or other fancy logic, and it would have the advantage of being more predictable than greedily matching as many components as we can.

We could easily make the compilation directory available from the lldb_private::CompileUnit class if this would help. It would return an invalid path if the DWARF doesn't contain a DW_AT_comp_dir, but it could surely be useful for some logic in LLDB. Our issue is that our DW_AT_comp_dir is a relative path, if we have one, but that doesn't mean LLDB can't make good use of a full path in DW_AT_comp_dir when there is one.

I don't think that the fact that the comp_dir is empty prevents us from using it in this logic. It can still be used to determine a point in the path at which to apply the remapping. An empty comp_dir would mean we remap the whole (relative) path. This would be easier to explain to the user than attempting to match/remap at every possible point. And it would be similar to what gdb is doing. Of course, this assumes that remapping the full relative path would work for your use case. I don't know whether that's the case, but I am hoping (:fingers_crossed) that it is.

yinghuitan added a comment.EditedJul 13 2022, 2:47 PM

From Jim, First off, I am supportive of your project, this is a pain point for some folks for sure. I am in favor of trying to automate the "I built my binaries with sources in one location but locally they are in another" scenario.

Great to hear that! Seems all three companies are on the same page/goal to improve this area.

From Pavel, I get your use case, and understand why this is desirable there. My point is that it can be confusing if it's applied too broadly and misfires. Suppose that I have a local build, I do a b bar/foo.cc:47, and it ends up picking baz/foo.cc, which is a completely unrelated file. If I didn't know anything about this feature, I would definitely be confused.

I see your concern. That's why I suggest we prefer exact matches and throw away the others. For example, if dwarf has /build1/baz/foo.cc and /build2/bar/foo.cc, the command b bar/foo.cc:47 would prefer /build2/bar/foo.cc and throw away /build1/baz/foo.cc. However, if dwarf only has /build1/baz/foo.cc, we could aggressively match it under the new config setting because that is better than missing a potential breakpoint. There is a potential issue that a static library loaded later at runtime may contain an exact match, though. I personally like this aggressive matching because I am tired of dealing with missing breakpoint issues. In practice, I do not feel strongly about this, because absolute paths from the IDE are the most common use case we want to fix/improve.
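The "prefer exact matches, fall back to an aggressive match only when nothing else binds" policy could be sketched roughly as follows. The function `select_files`, the `aggressive` flag, and the paths in the usage lines are all hypothetical and only illustrate the idea; they are not the patch's code.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def _is_suffix(request: Tuple[str, ...], candidate: Tuple[str, ...]) -> bool:
    """True if the request components are the trailing components of candidate."""
    n = len(request)
    return 0 < n <= len(candidate) and candidate[-n:] == request


def select_files(request: str, debug_info_paths: List[str],
                 aggressive: bool = False) -> List[str]:
    """Hypothetical policy: exact suffix matches win; base-name matches are
    used only when `aggressive` is set and no exact match exists."""
    req = PurePosixPath(request).parts
    exact = [p for p in debug_info_paths
             if _is_suffix(req, PurePosixPath(p).parts)]
    if exact or not aggressive:
        return exact
    name = PurePosixPath(request).name
    return [p for p in debug_info_paths if PurePosixPath(p).name == name]


# Hypothetical debug-info paths, for illustration only.
paths = ["/srv/a/lib/util.cc", "/srv/b/core/util.cc"]
print(select_files("core/util.cc", paths, aggressive=True))
print(select_files("core/util.cc", ["/srv/a/lib/util.cc"], aggressive=True))
```

The runtime-loading concern still applies to a sketch like this: a module loaded later may contain an exact match after a base-name match has already been accepted.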

From Greg, I am going to make a separate patch for relative paths in line tables. I agree that if you specify a full path to a binary and the line tables have relative paths, that something like "/foo/bar/baz.c" should match a line table that contains "bar/baz.c" as long as all parts of the relative paths match from the debug info. This should be a feature that just works in LLDB and is always enabled.

I agree this should be enabled by default. I just want to point out that this default behavior may not solve the BUCK relative path cases in Meta, because BUCK generates ./buck-out/dev/gen/..../bar/foo.cpp in dwarf. I suspect the leading ./ will cause problems? We could special-case it as a relative path in your implementation, though.
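To illustrate the concern about the leading ./, here is a small sketch that normalizes the debug-info path before comparing. `normalized_suffix_match` and the paths used are hypothetical; `posixpath.normpath` is the standard-library call that collapses `./` components.

```python
import posixpath


def normalized_suffix_match(request_path: str, debug_info_path: str) -> bool:
    """Hypothetical helper: normalize both paths, then match a relative
    debug-info path against the tail of the requested path."""
    entry = posixpath.normpath(debug_info_path)   # "./a/b.cpp" -> "a/b.cpp"
    request = posixpath.normpath(request_path)
    if posixpath.isabs(entry):
        return request == entry
    # Require a "/" boundary so partial directory names do not match.
    return request == entry or request.endswith("/" + entry)


request = "/root/cell/bar/foo.cpp"        # hypothetical IDE breakpoint path
entry = "./cell/bar/foo.cpp"              # hypothetical path stored in dwarf

print(request.endswith("/" + entry))      # naive comparison on the raw string
print(normalized_suffix_match(request, entry))
```

The naive string comparison fails only because of the `./` prefix; once the path is normalized, the same suffix rule applies.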

I would like to share here a list of issues/bug reports I discovered at Meta which I hope the new design can address. For all the examples I am focusing on IDE cases like VSCode, which use absolute paths to set breakpoints.

  1. The Buck build system generates relative paths. For example, for source file /data/users/username/buckRoot/cell/mode/bar/foo.cpp, ./cell/mode/bar/foo.cpp is stored in dwarf (with DW_AT_comp_dir being .). Without a source mapping setting, an IDE like VSCode would fail to bind a breakpoint request for /data/users/username/buckRoot/cell/mode/bar/foo.cpp.
  2. Post-mortem/off-host debugging scenarios: the app is built on a build server with /buildserver/foo/bar/a.cpp in dwarf. The user is trying to debug an app crash dump on a different machine, with source downloaded from some source server to /sourceServer/cache/foo/bar/a.cpp. The IDE won't work without an explicit source map setting.
  3. In a customer bug report, the user wanted to debug a header file in the file tree at /data/users/username/buckRoot/cell/mode/bar/foo.h. One build system decided to copy this header file into the build artifacts folder ./build/<hash>/app/headers/components/config_bundle/support/bar/foo.h for cloud building, so the latter path is embedded into dwarf. Even with the normal source mapping from the IDE, "." => "/data/users/username/buckRoot", it still fails to match in the IDE. To fix this, the idea is that, under a new target.auto-deduce-source-maps setting, the IDE's breakpoint /data/users/username/buckRoot/cell/mode/bar/foo.h would be reverse mapped to ./cell/mode/bar/foo.h (by the existing source map setting) as the breakpoint request, and then we can auto-generate the delta of "./build/<hash>/app/headers/components/config_bundle/support/" => "./cell/mode/". Combining this with the original source mapping, we get the new source mapping of "./build/<hash>/app/headers/components/config_bundle/support/" => "/data/users/username/buckRoot/cell/mode/". This new mapping helps to map the dwarf path back to the user's IDE breakpoint file in the source tree.
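The delta computation in example 3 can be sketched by stripping the longest run of common trailing components from the two paths. The function `deduce_source_map` is hypothetical and not part of any patch. For brevity it compares the dwarf path directly against the absolute breakpoint path, which yields the combined mapping in one step rather than going through the reverse-mapped relative path first.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def deduce_source_map(debug_info_path: str,
                      breakpoint_path: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (debug_info_prefix, source_prefix) after
    removing the trailing components the two paths have in common, or None
    if even the base names differ."""
    dbg = PurePosixPath(debug_info_path).parts   # pathlib drops a leading "./"
    req = PurePosixPath(breakpoint_path).parts
    common = 0
    while (common < min(len(dbg), len(req))
           and dbg[-1 - common] == req[-1 - common]):
        common += 1
    if common == 0:
        return None
    dbg_prefix = PurePosixPath(*dbg[:len(dbg) - common])
    req_prefix = PurePosixPath(*req[:len(req) - common])
    # An empty prefix is rendered as "." by pathlib.
    return str(dbg_prefix), str(req_prefix)


mapping = deduce_source_map(
    "./build/<hash>/app/headers/components/config_bundle/support/bar/foo.h",
    "/data/users/username/buckRoot/cell/mode/bar/foo.h",
)
print(mapping)
```

Because the sketch is greedy, a coincidentally shared directory name just above the common suffix would be absorbed into the suffix; a real implementation would need to decide how to handle that.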

The #3 example above is the most complex one, but in essence it is no different from #2, in that the binary is built from one source location (the build artifacts directory) and debugged from a different source location (the source file tree). It just needs extra path logic to work with the existing source map settings.

yinghuitan abandoned this revision.Feb 12 2023, 10:45 PM