This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Merge symbols in global-sym-builder on the go
AbandonedPublic

Authored by ilya-biryukov on Apr 10 2018, 2:12 AM.

Details

Summary

This avoids storing intermediate symbols in memory, most of which are
duplicates.
The resulting .yaml file is ~120MB, while intermediate symbols takes
more than 20GB.

Event Timeline

ilya-biryukov created this revision.Apr 10 2018, 2:12 AM

Thanks for digging it out!

In upstream, we use InMemoryToolResults which saves all duplicated "std::string"s into the memory, I think we could optimize InMemoryToolResults by using Arena to keep the memory low, I will try it to see whether it can reduce the memory. A bonus point is that we don't need to change any client code.

Is this patch still relevant after haojian's string deduplication?

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
53

Seems reasonably likely we would actually have contention here? merging per-thread (combiner) and then globally at the end (reducer) might be the way to go (might be significantly faster). But not sure how big the impact is.

Is this patch still relevant after haojian's string deduplication?

Apparently it does. It has an advantage of distributing the work more evenly between the program runs.
Currently the tool reports progress based on the number of files it parsed. But then it takes a lot of time to actually merge all the symbols at the end and the progress is not reported during that time.

The perfect behavior would be to remove duplicates, not just intern them :-) That shouldn't be too hard and it probably even makes sense to have a ToolExecutor that does that.

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
53

I haven't seen any noticeable performance degradation after this patch.
There should be contention, but it doesn't seem to matter much, as parsing still takes way more time.

Still makes sense to rewrite the code in order to avoid contention, will do that.

Is this patch still relevant after haojian's string deduplication?

Apparently it does. It has an advantage of distributing the work more evenly between the program runs.
Currently the tool reports progress based on the number of files it parsed. But then it takes a lot of time to actually merge all the symbols at the end and the progress is not reported during that time.

The perfect behavior would be to remove duplicates, not just intern them :-) That shouldn't be too hard and it probably even makes sense to have a ToolExecutor that does that.

OK, can we sync up offline with @hokein and decide what to do here? I don't think it makes sense to have both these solutions to the same issue, but I don't have a strong opinion on exactly what we do.
(Other than clearly the right solution is to have a proper MR framework <half-trolling>)

ilya-biryukov added inline comments.Apr 19 2018, 6:00 AM
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
53

Looking at the traces, the threads spend ~6% of mergeSymbols time between Lock(Mut) and for(const Sym....
Improving it might be a good idea, but from my point of view it's not too bad to commit it as is.

Doing it properly is a little complicated in the current setup, since we don't have knobs to create per-thread structs and merge them later.

I think this implementation will have problems in a "real" multi-machine MR framework. The lifetime of the Merger is the whole program, with output only coming at the end.
With N workers, each will store >1/N of the symbols (more because of overlap), and the streaming nature of a MR is important to its scaling.

As you know, we rely on this code in such a framework :-)
I would be much happier if we could express these constraints/patterns in the open-source code, I think the most promising approach would be to define a more complete clang mapreduce API in Tooling.
But that's way out of scope in this patch. I'm not really sure what to suggest here, happy to talk through it more tomorrow.

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
56

consider calling this consume, and the member in SymbolIndexActionFactory etc Sink, to emphasize the data flow

ilya-biryukov abandoned this revision.Nov 26 2018, 7:24 AM

This landed as a different change.