This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Introduce the FileSpecBuilder abstraction
Changes PlannedPublic

Authored by bulbazord on May 30 2023, 6:14 PM.

Details

Summary

Mutating a FileSpec is quite expensive. Every time you add or remove a
component of a path, we need to re-parse the entire path and insert a
new directory and filename into the ConstString StringPool. In many
cases, we take an existing FilePath and slowly morph it into the one we
want to actually use.

In order to improve performance, I want to introduce a new abstraction
called the FileSpecBuilder. The idea is that it maintains a
SmallString to store the entire path and a path style. It uses llvm's
path manipulation code to quickly append and remove things so we don't
have to re-parse the path every single time. When you're done
manipulating paths and want a FileSpec for sure, you can invoke
FileSpecBuilder::CreateFileSpec to create a new FileSpec, only parsing
the path once.

This patch primarily is to gather feedback about the idea. I wanted to
keep this patch small and targeted to get specific feedback, and for
that reason I avoided doing any kind of refactoring as most changes will
be quite invasive. Instead I opted to add unit tests to show how I want
FileSpecBuilder to be used in code. If this abstraction goes into LLDB,
I will refactor every place where we mutate a FileSpec until
FileSpecBuilder is the default way of manipulating paths.

For further motivation of this change, please see:
https://reviews.llvm.org/D149096

Diff Detail

Event Timeline

bulbazord created this revision.May 30 2023, 6:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2023, 6:14 PM
bulbazord requested review of this revision.May 30 2023, 6:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2023, 6:14 PM

Why did you choose to have a separate FileSpecBuilder class, rather than making FileSpec smarter about the structure of the path (e.g. have an array of StringRef's into the paths for each component.) We could do the parse once the first time a path element was requested, and then operations like "RemovePathComponent" would be trivial. We could maybe even get rid of the distinction between "path" and "filename" since "filename" is really just "last path component".

bulbazord planned changes to this revision.EditedMay 31 2023, 6:05 PM

Why did you choose to have a separate FileSpecBuilder class, rather than making FileSpec smarter about the structure of the path (e.g. have an array of StringRef's into the paths for each component.) We could do the parse once the first time a path element was requested, and then operations like "RemovePathComponent" would be trivial. We could maybe even get rid of the distinction between "path" and "filename" since "filename" is really just "last path component".

You mean have FileSpec replace its ConstString m_directory and ConstString m_filename fields with llvm::SmallString m_path or std::string m_path? That would indeed avoid something like FileSpecBuilder (and probably be simpler in the end). As long as the API of FileSpec stays the same, the changes to call sites should be minimal. The thing I'm mostly concerned about is lifetimes... How often do we grab a ConstString from a FileSpec and not think about the lifetime? Hopefully not many places.

I'll try this out and see what the fallout is. If it's the better way to go, then I'll just refactor all the places where those assumptions are problematic.

The idea behind FileSpec containing two ConstStrings, one for the directory and one for he filename is for lookup performance when searching thousands of line tables for a given file and line when setting breakpoints.

Currently we pass in a FileSpec + line number and we expect fast searching. This is currently done by asking each CompileUnit to find a matching file in its support files array and to return a matching index. If just the filename is filled in, then we compare the just the ConstString for m_filename which is really quick for ConstString values. Then we compare all or part of the m_directory if the path is a full path or if it is relative. But at least the filename comparison is very fast. If we get a valid index back, then we know to search the compile unit's full line table for any matching entries by looking for any line entries with a matching file index. IF the file index is invalid, we don't materialize the line table at all for a compile unit.

Maybe we want the notion of a "FileSpec" (which could contain a llvm::Vector class) and a ConstFileSpec (which would contain two ConstString objects like FileSpec currently has). And we make anyone that needs quick searching, like the support files for a compile unit, use the new ConstFileSpec class.

If you change anything in the FileSpec class, please test setting breakpoints with a really large codebase to ensure we don't regress performance.

If we go the FileSpecBuilder route, we probably want to remove all FileSpec path modification methods and force people to use this class by converting a FileSpec to a FileSpecBuilder, manipulating the path with the FileSpecBuilder calls, and then extracting the FileSpec object from the FileSpecBuilder at the end.