This is an archive of the discontinued LLVM Phabricator instance.

Implement better path matching in FileSpecList::FindFileIndex(...).
AbandonedPublic

Authored by clayborg on Jul 18 2022, 3:04 PM.

Details

Summary

Currently a FileSpecList will only match the specified FileSpec if:

  • it has filename and directory and both match exactly
  • if has a filename only and any filename in the list matches

Because of this, we modify our breakpoint resolving so it can handle relative paths by doing some extra code that removes the directory from the FileSpec when searching if the path is relative.

This patch is intended to fix breakpoints so they work as users expect them to by adding the following features:

  • allow matches to relative paths in the file list to match as long as the relative path is at the end of the specified path
  • allow matches to paths to match if the specified path is relative and shorter than the file paths in the list

This allows us to remove the extra logic from BreakpointResolverFileLine.cpp that added support for setting breakpoints with relative paths.

This means we can still set breakpoints with relative paths when the debug info contains full paths. We add the ability to set breakpoints with full paths when the debug info contains relative paths.

Debug info contains "./a/b/c/main.cpp", the following will set breakpoints successfully:

  • /build/a/b/c/main.cpp
  • a/b/c/main.cpp
  • b/c/main.cpp
  • c/main.cpp
  • main.cpp
  • ./c/main.cpp
  • ./a/b/c/main.cpp
  • ./b/c/main.cpp
  • ./main.cpp

This also means that all users of FileSpecList::FindFileIndex(...) will get this extra matching, which currently is used for matching modules, but I believe we want this extra matching ability there as well.

Diff Detail

Event Timeline

clayborg created this revision.Jul 18 2022, 3:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2022, 3:04 PM
Herald added a subscriber: arphaman. · View Herald Transcript
clayborg requested review of this revision.Jul 18 2022, 3:04 PM
Herald added a project: Restricted Project. · View Herald Transcript
clayborg updated this revision to Diff 445638.Jul 18 2022, 3:10 PM

Didn't submit all changes at first because I had two commits. Squashed the commits to fix the diff.

clayborg updated this revision to Diff 445650.Jul 18 2022, 4:16 PM

Add absolute/relative path caching into FileSpec since we will be using it more for breakpoints if this patch gets approved. FileSpecList objects are used by the line table classes and the support file lists don't change.

I added relative/absolute path caching to FileSpec without increasing the size of the FileSpec objects.

The relative/absolute caching seems like a nice improvement. Splitting it into a separate patch would make it easier to review.

I'm slightly worried about the change to make the new "fuzzy" matching the default. While it makes sense for the breakpoints, I wouldn't generally expect ./a/b/c/main.cpp to match /build/a/b/c/main.cpp, at least not unless my current working directory is /build. While doing the reproducers, the CWD was probably the hardest thing to get right consistently. I'm worried that this new matching is going to lead to really subtle bugs that are going to be tricky to figure out.

To make this more obvious and explicit: would it make sense to have another matches in FileSpec and have FindFileIndex take a std::function or llvm::FunctionRef with the desired matcher. Something like:

static bool FuzzyMatch(const FileSpec &pattern, const FileSpec &file);
size_t FileSpecList::FindFileIndex(size_t start_idx, const FileSpec &file_spec,
                                   bool full, std::function<bool(const FileSpec &, const FileSpec &)> matcher = FileSpec::Match) const {
  ...
}
lldb/include/lldb/Utility/FileSpec.h
408–409

Should this be an llvm::Optional<bool> instead or maybe even better an enum value with values absolute, relative, unknown or something?

sounds good, I will split out the caching into a separate patch. I will try to use the llvm::Optional<bool> as long as it doesn't increase the size of FileSpec which is currently at 24 bytes.

I will also split out the file index matching into a new function that allows relative matches and only switch the line table search stuff over to use it so we don't affect any other clients of FileSpecList.

I submitted https://reviews.llvm.org/D130309 to fix issues with FileSpec before I do a separate patch to add caching of the absolute/relative value. This patch is needed so we can control all locations where a string is modified so we can clear the cached value when things are updated inside of a FileSpec.

Submitted the absolute caching stuff in FileSpec with https://reviews.llvm.org/D130396. I will rebase this on top of that diff when it gets checked in.

I will also revert changes to FindFileIndex, and then add a new FindContainedFileIndex(...) with the contents of this patch's FindFileIndex and explain that it will allow the specified file to be relative and have it match any files those suffix matches, or any file in the file list can be relative and as along as the suffix matches the specified file.

clayborg abandoned this revision.Jul 22 2022, 3:17 PM

Abandoning since I am going to add FileSpecList::FindCompatibleIndex(...) which doesn't match the original description of this patch. I will submit a new one.

I'm slightly worried about the change to make the new "fuzzy" matching the default. While it makes sense for the breakpoints, I wouldn't generally expect ./a/b/c/main.cpp to match /build/a/b/c/main.cpp,

Would you expect that main.cpp "generally" matches /build/a/b/c/main.cpp?

(I'm not arguing for/against anything (yet, at least), but I would like to hear your reasoning if the answer to the question is "yes".)

I'm slightly worried about the change to make the new "fuzzy" matching the default. While it makes sense for the breakpoints, I wouldn't generally expect ./a/b/c/main.cpp to match /build/a/b/c/main.cpp,

Would you expect that main.cpp "generally" matches /build/a/b/c/main.cpp?

(I'm not arguing for/against anything (yet, at least), but I would like to hear your reasoning if the answer to the question is "yes".)

I would say it should match. If FindFileIndex is currently called with a FileSpec that only has "main.cpp" as the m_filename, then it will fall back to only matching by filename even if "full = true;". I would expect it to work the other way around too if we have any files in the file list that are base name only. Does that make sense?

I'm slightly worried about the change to make the new "fuzzy" matching the default. While it makes sense for the breakpoints, I wouldn't generally expect ./a/b/c/main.cpp to match /build/a/b/c/main.cpp,

Would you expect that main.cpp "generally" matches /build/a/b/c/main.cpp?

(I'm not arguing for/against anything (yet, at least), but I would like to hear your reasoning if the answer to the question is "yes".)

I would say it should match. If FindFileIndex is currently called with a FileSpec that only has "main.cpp" as the m_filename, then it will fall back to only matching by filename even if "full = true;". I would expect it to work the other way around too if we have any files in the file list that are base name only. Does that make sense?

It makes sense to me, but the question was mainly for Jonas -- I'm sorry if that wasn't clear. I'm wondering if he sees a fundamental difference between "./a/b/c/main.cpp matching /build/a/b/c/main.cpp" and "main.cpp matching /build/a/b/c/main.cpp". The second I believe is true in the status quo as well.

Because I don't see much of a difference -- the former seems like an extension of a general principle of "fuzzy" matching that guides the latter.