This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Add a tool to build YAML-format global symbols.
ClosedPublic

Authored by hokein on Dec 21 2017, 7:31 AM.

Details

Summary

The tools is used to generate global symbols for clangd (global code completion),
The format is YAML, which is only for experiment.

TEST: used the tool to generate global symbols for LLVM (~72MB).

Diff Detail

Repository
rL LLVM

Event Timeline

hokein created this revision.Dec 21 2017, 7:31 AM
sammccall added inline comments.Dec 22 2017, 12:03 AM
clangd/CMakeLists.txt
50 ↗(On Diff #127881)

I think generally we run check-clang-tools before committing - I guess it's OK not to have tests for this experimental tool, but we should at least keep it building. Can you make check-clang-tools build it?

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
106 ↗(On Diff #127881)

FYI D41506 changes this API, adding SymbolSlab::Builder. Shouldn't be a problem, but we'll need to merge.

112 ↗(On Diff #127881)

This seems like you might end up overwriting good declarations with bad ones (e.g. forward decls). Picking one with a simple heuristic like "longest range" would get classes right.

175 ↗(On Diff #127881)

I don't understand why we need to plumb callbacks through SymbolCollector.
Can't we just write the slab after this line?

clangd/global-symbol-builder/run-global-symbol-builder.py
1 ↗(On Diff #127881)

So we now have three copies of this script...
I'm probably missing something, but what's here that's hard to do in a single binary?
I'd have to think it'd be faster if we weren't constantly spawning new processes and roundtripping all the data through YAML-on-disk.

(If this is just expedience because we have an existing template to copy, I'm happy to try to combine it into one binary myself)

This looks like it does the right thing, ISTM it has a few too many moving parts.
I might be missing why they're necessary, or we might not want to bother polishing this - maybe you can give me more context offline.

hokein updated this revision to Diff 128010.Dec 22 2017, 4:00 AM
hokein marked 4 inline comments as done.

Write all map-reduce logic into the global-symbol-builder binary, which makes
the patch much simpler.

Thanks for the comments.

clangd/CMakeLists.txt
50 ↗(On Diff #127881)

Yeah, it builds.

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
106 ↗(On Diff #127881)

Will do it after that patch is landed.

112 ↗(On Diff #127881)

Add a FIXME here, will do it in a follow-up -- as the symbol structure would be changed.

175 ↗(On Diff #127881)

ToolExecutor takes over the lifetime of FrontendActionFactory, there is no straightforward way to get the internal SymbolSlab from the indexAction.

Changed to use ClangTool, since it would simplify the code a lot, and no changes in SymbolCollector.

clangd/global-symbol-builder/run-global-symbol-builder.py
1 ↗(On Diff #127881)

Yeah, the script is borrowed from other places for the fast prototype.

We can combine it into the global-symbol-builder binary. Done.

sammccall accepted this revision.Dec 22 2017, 5:06 AM

Thanks for the simplification!
How long does it take to run over LLVM on your machine?

clangd/CMakeLists.txt
50 ↗(On Diff #127881)

Can you add it to the deps of the testsuite, so we don't break it?

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
45 ↗(On Diff #128010)

out of curiosity, what does this do?

57 ↗(On Diff #128010)

you could avoid opt parsing, opening files and dealing with errors etc by just writing to llvm::outs() and running with > foo.yaml - up to you

90 ↗(On Diff #128010)

if this isn't the typical use case, consider
llvm::errs() << "No compilation database found, processing individual files with flags from command-line\n"

This revision is now accepted and ready to land.Dec 22 2017, 5:06 AM
hokein updated this revision to Diff 128020.Dec 22 2017, 6:07 AM
hokein marked 4 inline comments as done.

Address remaining review comments.

Thanks for the simplification!
How long does it take to run over LLVM on your machine?

It took < 10 minutes (vs 20 minutes with python script) to run over LLVM.

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
45 ↗(On Diff #128010)

No official document on this :(
IIUC, this is a filter option to control whether we get symbols from the system code (like std library). All means we index all of them.

57 ↗(On Diff #128010)

Good idea. Done.

sammccall added inline comments.Dec 22 2017, 6:12 AM
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
57 ↗(On Diff #128020)

(remove this and use GeneralCategory as default?)

45 ↗(On Diff #128010)

Ah, that sounds fine. Later we may actually want to index stdlib separately.

hokein updated this revision to Diff 128021.Dec 22 2017, 6:22 AM
hokein marked an inline comment as done.

Use GeneralCateogry.

This revision was automatically updated to reflect the committed changes.