Page MenuHomePhabricator

[index-while-building] PathIndexer

Authored by jkorous on Aug 27 2019, 6:09 PM.



Part of i-w-b upstreaming

This is actually a bit polished PathStorage.

Diff Detail

Event Timeline

jkorous created this revision.Aug 27 2019, 6:09 PM

I tried my best to provide meaningful comments, however, I can't find users of PathIndexer in, so I don't think I understand this abstraction at all. I would appreciate some big-picture description.

20 ↗(On Diff #217551)

"namespace clang"?

36 ↗(On Diff #217551)

s/Represents// (throughout the patch)

"Represents" is generally a filler word -- of course types represent some concept, they rarely *are* that concept itself, except for concepts that we invent ourselves.

39 ↗(On Diff #217551)

Would be best to not refer to one concept with two terms -- root and prefix.

40 ↗(On Diff #217551)

"Path relative to the root specified by PrefixKind."

41 ↗(On Diff #217551)

Const members in structs are not great because they make structs not act like values -- the assignment operator becomes deleted. (Throughout the patch.)

56 ↗(On Diff #217551)

Builds => stores, or maintains. It does not walk the filesystem itself.

66 ↗(On Diff #217551)

In this context, "\param".

80 ↗(On Diff #217551)

Add an assert for uniqueness?

Rename to storeNewPath?

15 ↗(On Diff #217551)

It is OK to shadow member variables with parameters. Shadowing does the right thing in the member initializer list.

Oh, it is called PathStorage there. Looking.

gribozavr added inline comments.Sep 2 2019, 7:54 AM
34 ↗(On Diff #217551)

I believe it would be helpful to build a separate class that just does string interning, and let BitPathComponent be a reference to a string in that string pool. That would simplify code in PathIndexer, and separate the concerns of path handling and string storage. PathIndexer would expose a reference to the interned storage through a getter.

54 ↗(On Diff #217551)

WDYT about moving DirBitPath and FileBitPath to be nested types in PathIndexer? You should be able to also remove "Bit" from their names then.

57 ↗(On Diff #217551)

SerializablePaths? SerializablePathCollection? OnDiskPathCollection? (like clang/lib/Serialization/MultiOnDiskHashTable.h)

I'm not a fan of using the word "Indexer" for a passive entity.

76 ↗(On Diff #217551)

WDYT about using optional instead of a special marker value?

22 ↗(On Diff #217551)

s/FileBitPaths.size()/FileToIndex.size()/ would probably be more intuitive.

jkorous marked 17 inline comments as done.Sep 20 2019, 11:35 AM
jkorous added inline comments.
20 ↗(On Diff #217551)


41 ↗(On Diff #217551)

I don't have a particularly strong opinion but in general I like immutability as an invariant.

76 ↗(On Diff #217551)

Good point! Originally the -1 convention for both return values and indices being passed as parameters was used throughout the i-w-b. It avoided explicit error handling which allowed code to look simpler but was more difficult to reason about. This seems to be the last surviving specimen.

jkorous updated this revision to Diff 221084.Sep 20 2019, 11:44 AM
jkorous marked 3 inline comments as done.

Addressed the comments and tried to simplify things by splitting into multiple smaller classes.

@arphaman can you please take a look at this patch?

Sure, will do.

jkorous updated this revision to Diff 279310.Jul 20 2020, 11:02 AM

Restore the patch. Previous update wasn't meant for this review - I confused the review IDs.

arphaman added inline comments.Jul 20 2020, 8:03 PM

I feel like StringOffSz isn't very informative. Can this be renamed to something like StringPool::OffsetSize, or maybe StringPool::Index.


Do you think you can use llvm::sys::path::starts_with instead? why is it important to return false when dir is equivalent to rest. Would we be able to store paths a path like /path/to/xyz_ab, when the CWD/Sysroot is /path/to/xyz when we don't have this check?


I looked through the uses of tryStoreFilePath in the full patch, and it doesn't seem like the use of try in the name is justified, since it could imply that some form of error could be produced as a result of a call. Can this be named more precisely, perhaps something like getStoreIndexForFile()? Also, uses seem to imply that it's always converted to uint64_t, can we avoid the conversion and return uint64_t here instead?


Same name critique, could this be named something like storeDirPath, without the try?


Please use is_separator instead of comparing to /

jkorous added inline comments.Aug 4 2020, 2:38 PM

I'll try to get rid of this. It's only used in tryStoreDirPath(). The logic is probably sound but maybe it could be implemented in a more readable fashion.


The semantics of the name are the same as STL containers - say std::map::try_emplace. I'm open to renaming but would like to have a name that conveys the fact that this is actually the method used for gathering the data for serialization.


Maybe storeUniqueDirPath or storeNewDirPath? Because we aren't always storing the input.


That's a good point!

jkorous updated this revision to Diff 283111.Aug 4 2020, 7:44 PM

Addressed comments.

jkorous marked an inline comment as done.Aug 4 2020, 7:45 PM

Friendly ping

arphaman accepted this revision.Aug 18 2020, 11:38 AM
This revision is now accepted and ready to land.Aug 18 2020, 11:38 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2020, 11:25 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript