This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Collect definitions when indexing.
ClosedPublic

Authored by sammccall on Feb 5 2018, 6:05 PM.

Details

Summary

Within a TU:

  • as now, collect a declaration from the first occurrence of a symbol (taking clang's canonical declaration)
  • when we first see a definition occurrence, copy the symbol and add it

Across TUs/sources:

  • mergeSymbol in Merge.h is responsible for combining matching Symbols. This covers dynamic/static merges and cross-TU merges in the static index.
  • it prefers declarations from Symbols that have a definition.
  • GlobalSymbolBuilderMain is modified to use mergeSymbol as a reduce step.

Random cleanups (can be pulled out):

  • SymbolFromYAML -> SymbolsFromYAML, new singular SymbolFromYAML added
  • avoid uninit'd SymbolLocations. Add an idiomatic way to check "absent".
  • CanonicalDeclaration (as well as Definition) are mapped as optional in YAML.
  • added operator<< for Symbol & SymbolLocation, for debugging

Diff Detail

Repository
rL LLVM

Event Timeline

sammccall created this revision.Feb 5 2018, 6:05 PM

This does seem to get at least the simple cases right:

ID:              0EF8ACCCCF4D08B11EBF3FFB8004CE702991B15F
Name:            SymbolsFromYAML
Scope:           'clang::clangd::'
SymInfo:         
  Kind:            Function
  Lang:            C
CanonicalDeclaration: 
  StartOffset:     956
  EndOffset:       1010
  FilePath:        /usr/local/google/home/sammccall/src/llvm/tools/clang/tools/extra/clangd/index/SymbolYAML.h
Definition:      
  StartOffset:     4999
  EndOffset:       5348
  FilePath:        /usr/local/google/home/sammccall/src/llvm/tools/clang/tools/extra/clangd/index/SymbolYAML.cpp
CompletionLabel: 'SymbolsFromYAML(llvm::StringRef YAMLContent)'
CompletionFilterText: SymbolsFromYAML
CompletionPlainInsertText: SymbolsFromYAML
CompletionSnippetInsertText: 'SymbolsFromYAML(${1:llvm::StringRef YAMLContent})'
Detail:          
  Documentation:   ''
  CompletionDetail: SymbolSlab
ioeric added a comment.Feb 6 2018, 1:32 AM

looks good

clangd/index/Index.h
25 ↗(On Diff #132927)

It might be worth mentioning here whether the range covers the entire declaration/definition body, or just the symbol identifier.

clangd/index/Merge.cpp
71 ↗(On Diff #132927)

Do we also need to copy other information such as completion detail, since forward declarations usually have minimum symbol information?

hokein added a comment.Feb 6 2018, 1:41 AM

I like where the patch is going now.

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
67 ↗(On Diff #132927)

It is fine for debugging, but I think we don't want this behavior by default.

global-symbol-builder also supports running a single TU (global-symbol-builder /path/to/file), which is sufficient for debugging, I think?

clangd/index/Index.h
131 ↗(On Diff #132927)

We might want to update the comment above.

sammccall updated this revision to Diff 132985.Feb 6 2018, 6:20 AM
sammccall marked 3 inline comments as done.

[clangd] Prefer data from TUs with symbol defn to data from TUs without.

Thanks for comments! (Still not done, adding tests)

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
67 ↗(On Diff #132927)

Yeah, I'm not going to check this in, thus the XXX comment :-)

Single TU isn't enough - it doesn't test the reducer. On the other hand the full compilation database is too big. So this option would actually be useful! But it doesn't belong here.

clangd/index/Index.h
131 ↗(On Diff #132927)

I think the comment above is still valid, but added one for definition.

clangd/index/Merge.cpp
71 ↗(On Diff #132927)

Done. Now if one of the symbols has a definition and the other doesn't, we prefer the one that does. Otherwise we fall back to preferring L over R.

This means we prefer a definition from the global index over a declaration from the local one, which seems OK for now.

sammccall updated this revision to Diff 133024.Feb 6 2018, 9:07 AM

Add tests for SymbolCollector (finding def/decl locations) and merge.

Found some bugs in SymbolCollector's locations - added fixmes.

sammccall edited the summary of this revision. (Show Details)Feb 6 2018, 9:09 AM

OK, I think this is good to go now. Rebased against Eric's URI change and added tests.

ioeric added inline comments.Feb 6 2018, 1:18 PM
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
67 ↗(On Diff #132927)

Drive-by: you could also run the tool in the default standalone mode and provide a list of files.

clangd/index/Merge.cpp
93 ↗(On Diff #133024)

else if (S.Detail)?

/*Current state: S.Detail = S.Detail*/?

clangd/index/SymbolCollector.cpp
210 ↗(On Diff #133024)

It seems that we store definition even if it's the same as declaration, which might be true for most classes? This is probably fine, but it would be worth documenting. Or maybe consider not storing the same location twice?

236 ↗(On Diff #133024)

Could you elaborate a bit more on this one? What is the current heuristic, and what would we prefer?

unittests/clangd/SymbolCollectorTests.cpp
49 ↗(On Diff #133024)

Fallout of a git merge?

108 ↗(On Diff #133024)

neat!

hokein added inline comments.Feb 7 2018, 1:45 AM
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
159 ↗(On Diff #133024)

nit: we could get rid of the loop by using SymbolsToYAML(UniqueSymbols) .

67 ↗(On Diff #132927)

However the default standalone mode is not multiple thread :(.

clangd/index/SymbolCollector.cpp
137 ↗(On Diff #133024)

That's bad, thanks for finding it.

210 ↗(On Diff #133024)

I think it is fine to store the same location twice. We can't tell whether the CanonicalLoc is a definition or not.

sammccall updated this revision to Diff 133501.Feb 8 2018, 3:11 PM
sammccall marked 6 inline comments as done.

Address review comments.
Make SymbolsToYAML take an ostream instead of returning a string.
Define SymbolSlab::iterator so googletest will print it as a container.

sammccall updated this revision to Diff 133502.Feb 8 2018, 3:13 PM

Revert hack in global-symbol-builder to filter the input files.

Thanks for the comments!
And sorry for the delay, I was haunted by useless gtest messages, which D43091 should fix.

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
159 ↗(On Diff #133024)

Yeah, but we'd put the whole text form of the index into a single in-memory string, which seems pretty wasteful. Building a YAML output for each seems silly too though.

Changed SymbolsToYAML to take a raw_ostream ref parameter to write to.

clangd/index/Merge.cpp
93 ↗(On Diff #133024)

Good catch.
Reworked the structure here to avoid needing to name this case, since it was awkward.

clangd/index/SymbolCollector.cpp
210 ↗(On Diff #133024)

Documented that these may be the same.
We wouldn't actually save any memory by avoiding saving this twice - the filename is a stringref to the same data, and the offsets get stored even if they're null.

236 ↗(On Diff #133024)

Done a bit, though I don't know the detailed answer to either question.
Using forward declarations of classes as canonical indicates that we probably want something better.

unittests/clangd/SymbolCollectorTests.cpp
108 ↗(On Diff #133024)

Indeed :-)
This is needed because otherwise all the offsets in the annotated testcase are off.
I got it slightly wrong though: need the full path to the header name because it's not resolved relative to the main file.

hokein accepted this revision.Feb 9 2018, 1:14 AM

LGTM.

clangd/index/Index.h
129 ↗(On Diff #133502)

a typo here? multiple?

This revision is now accepted and ready to land.Feb 9 2018, 1:14 AM
ioeric accepted this revision.Feb 9 2018, 1:48 AM

Lg

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
67 ↗(On Diff #132927)

You are right :) But I don't think this matters much if you only want to test reduction.

clangd/index/SymbolCollector.cpp
210 ↗(On Diff #133024)

I'm less concerned about space. I think this would also make it easy to check whether a symbol has a separate definition (e.g. we might want to do this for go-to-definition/go-to-declaration). A comparison would also work but seems less natural. But I'm not sure, up tp you :)

sammccall marked an inline comment as done.Feb 9 2018, 6:40 AM
sammccall added inline comments.
clangd/index/SymbolCollector.cpp
210 ↗(On Diff #133024)

Yeah sometimes we'll definitely want to check if they're different, but other times we'll want to use one without caring if they are. Comparison seems less surprising than either of the fields sometimes being empty even though data is available - that seems likely to lead to bugs I think.

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.