This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Take into account what is in the index (symbols, references, etc.) at indexes merge
ClosedPublic

Authored by ArcsinX on Jan 19 2021, 12:40 AM.

Details

Summary

Current indexes merge logic skip data from the static index if the file is in the dynamic index, but sometimes the dynamic index does not contain references (e.g. preamble (dynamic) index vs background (static) index).
This problem is masked with the fact, that the preamble index file list consists of file URI's and other indexes file lists consist of file paths.
This patch introduces the index contents (symbols, references, etc.), which makes indexes merge more flexible and makes it able to use URI's for the index file list.

Diff Detail

Event Timeline

ArcsinX created this revision.Jan 19 2021, 12:40 AM
ArcsinX requested review of this revision.Jan 19 2021, 12:40 AM
ArcsinX updated this revision to Diff 317521.Jan 19 2021, 4:19 AM

Prevent crash in debug mode.

Thanks for following up on this!

clang-tools-extra/clangd/index/FileIndex.cpp
264–265

I don't think it makes sense to specify this at index-build-time, should it be a constructor parameter instead?

This value describes all the data previously passed to update(), so we must always update() with the same type of data. So this is really tied to the identify of the FileSymbols, rather than the buildIndex operation.

433–434

Is this change related? It changes the key scheme for the preamble index from URIs to paths, but I'm not sure why.

Do we have multiple URIs pointing to the same path? What are they concretely?

clang-tools-extra/clangd/index/Index.h
85

This deserves a comment to motivate it... e.g.

Describes what data is covered by an index.
Indexes may contain symbols but not references from a file, etc.
This affects merging: if a staler index contains a reference but a
fresher one does not, we want to trust the fresher index *only*
if it actually includes references in general!

85

This name is a bit vague for my taste (because both "data" and "kind" tend to get attached to everything, they lose their meaning)

Maybe IndexContents? This is still vague but at least describes the relationship between the index and the e.g. Symbols.

98

I'd also consider explicit operator bool() so you can write if (Indexed(File) & IndexContents::References), but a question of taste.

ArcsinX added inline comments.Jan 25 2021, 10:30 AM
clang-tools-extra/clangd/index/FileIndex.cpp
433–434

This is the main thing in this patch. I will try to explain.
We use these keys to create the file list, which is used by indexedFiles().
Currently, the preamble index contains URI's instead of paths (as a file list), that leads to the function returned by PreambleIndex::indexedFiles() always return false (because we pass to this function paths, not URI's). So, we always take data from the preamble index (but maybe we should not in some cases).

ArcsinX added inline comments.Jan 25 2021, 10:22 PM
clang-tools-extra/clangd/index/FileIndex.cpp
433–434

that leads to the function returned by PreambleIndex::indexedFiles() always return false (because we pass to this function paths, not URI's)

This is a bit incorrect.
We pass to this function URI, but this URI is converted to path. i.e. MemIndex::indexedFiles(), Dex::indexedFiles() expect that Files are paths, but they are URI's for the preamble index. That's why PreambleIndex::indexedFiles() always return false.

sammccall added inline comments.Jan 25 2021, 11:45 PM
clang-tools-extra/clangd/index/FileIndex.cpp
433–434

Oooh... I'm not sure how I misunderstood the original so much :-( And I missed it in this patch description as well, apologies.

My impression was that the file list was derived from the index data, rather than from the keys, which were always intended to be opaque/arbitrary.
(At various times, these have been filenames, URIs, and other things IIRC. And until relatively recently, the preamble index keys were the file the preamble was built from, not the file containing the symbol!)

It feels like using URIs extracted from symbols might not be *completely* robust. Because having CanonicalDeclaration etc set to a file might not line up exactly with the idea that we indexed the file. But we do use this partitioning for FileShardedIndex, so it has to work well enough.

The advantage of using the extracted URIs would be: also works for non-file-sharded indexes like --index-file, avoid a bunch of conversion between URI and path, and we get to keep the simpler/flexible design for FileSymbols where the key is opaque.

Does this seem feasible to you?

ArcsinX added inline comments.Jan 26 2021, 12:33 AM
clang-tools-extra/clangd/index/FileIndex.cpp
433–434

I also do not like these path <=> URI conversions. But what about empty files?, e.g.:

  • open a file
  • remove everything from this file
  • the dynamic index has no symbols with definition/declaration from this file, so we do not have this file in the dynamic index file list.
  • the static index has symbols with definition/declaration from this file, so we have this file in the static index file list.
  • we will show stale results from the static index for this file.

Unsure, maybe it's ok to ignore the problem with empty files, seems this is the only case when the file was indexed, but we have no symbols located there.

Overall, I like the idea to use URI's instead of paths. I think we could implement it first as a separate patch and after that return to this one.

What do you think?

sammccall added inline comments.Jan 26 2021, 4:16 AM
clang-tools-extra/clangd/index/FileIndex.cpp
433–434

Right, completely empty files are going to be mishandled., in the same way that before your changes we mishandled files that had no results from the query.

I do still think this is the way to go though, because:

  • completely empty files are much rarer
  • again this is a peer to an FileShardedIndex issue where empty files get no shards. Therefore *within* the dynamic index, we'll never clear out the symbols for a file while the file is emptied
  • having SymbolCollector explicitly the files indexed is the principled solution to both problems, and that seems feasible for us to do at some point.

Overall, I like the idea to use URI's instead of paths. I think we could implement it first as a separate patch and after that return to this one.

That sounds great if you don't mind!
(Is there a reason we can't land the rest of this patch as-is already?)

ArcsinX updated this revision to Diff 319602.Jan 27 2021, 9:25 AM
  • Set IdxContents at FileSymbols object breation instead of at FileSymbols::buildIndex() call.
  • Revert change of the preamble index key scheme
  • Add comment for IndexContents
  • IndexDataKind => IndexContents
ArcsinX marked 4 inline comments as done.Jan 27 2021, 9:36 AM
ArcsinX added inline comments.
clang-tools-extra/clangd/index/FileIndex.cpp
433–434

(Is there a reason we can't land the rest of this patch as-is already?)

Seems we have no reason =)

clang-tools-extra/clangd/index/Index.h
98

Unsure how I can do this =(
I could not add operator bool() as a method for enum class.
I see these solutions:

  • keep it as is
  • implement bool operator !(IndexContents) and use !!
  • make IndexContents to be struct/class
  • add a function with a simple name instead of operator bool
ArcsinX edited the summary of this revision. (Show Details)Feb 4 2021, 2:51 AM
ArcsinX added inline comments.Feb 4 2021, 2:59 AM
clang-tools-extra/clangd/index/FileIndex.cpp
433–434

Overall, I like the idea to use URI's instead of paths. I think we could implement it first as a separate patch and after that return to this one.

Seems we need to land this patch first. Without index contents it's impossible to use URI's everywhere in the index file list (because the preamble index does not contain references, but without this patch we do not have this information at indexes merge).

sammccall accepted this revision.Feb 4 2021, 4:59 AM

Sorry, forgot to stamp this!

clang-tools-extra/clangd/index/FileIndex.cpp
415

seems we include relations too?

416

(this should take into account CollectMainFileRefs, but actually if you sync to HEAD it's always true now)

clang-tools-extra/clangd/index/FileIndex.h
74

It seems error-prone to have a default here.
(I guess it's for making tests shorter? Are there enough callsites for it to matter?)

If we must have a default, All seems to make more sense than None.

This revision is now accepted and ready to land.Feb 4 2021, 4:59 AM
ArcsinX updated this revision to Diff 321442.Feb 4 2021, 8:17 AM
  • Remove default value of IdxContents in FileSymbols constructor.
  • Fix index contents for the preamble index (Symbols => Symbols|Relations).
  • Rebase
ArcsinX marked 3 inline comments as done.Feb 4 2021, 8:19 AM

Thank you for review.

ArcsinX edited the summary of this revision. (Show Details)Feb 5 2021, 1:30 AM
This revision was landed with ongoing or failed builds.Feb 5 2021, 2:38 AM
This revision was automatically updated to reflect the committed changes.