This avoids storing intermediate symbols in memory, most of which are
duplicates.
The resulting .yaml file is ~120MB, while intermediate symbols takes
more than 20GB.
Details
Diff Detail
- Repository
- rCTE Clang Tools Extra
- Build Status
Buildable 16927 Build 16927: arc lint + arc unit
Event Timeline
Thanks for digging it out!
In upstream, we use InMemoryToolResults which saves all duplicated "std::string"s into the memory, I think we could optimize InMemoryToolResults by using Arena to keep the memory low, I will try it to see whether it can reduce the memory. A bonus point is that we don't need to change any client code.
Is this patch still relevant after haojian's string deduplication?
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp | ||
---|---|---|
53 | Seems reasonably likely we would actually have contention here? merging per-thread (combiner) and then globally at the end (reducer) might be the way to go (might be significantly faster). But not sure how big the impact is. |
Apparently it does. It has an advantage of distributing the work more evenly between the program runs.
Currently the tool reports progress based on the number of files it parsed. But then it takes a lot of time to actually merge all the symbols at the end and the progress is not reported during that time.
The perfect behavior would be to remove duplicates, not just intern them :-) That shouldn't be too hard and it probably even makes sense to have a ToolExecutor that does that.
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp | ||
---|---|---|
53 | I haven't seen any noticeable performance degradation after this patch. Still makes sense to rewrite the code in order to avoid contention, will do that. |
OK, can we sync up offline with @hokein and decide what to do here? I don't think it makes sense to have both these solutions to the same issue, but I don't have a strong opinion on exactly what we do.
(Other than clearly the right solution is to have a proper MR framework <half-trolling>)
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp | ||
---|---|---|
53 | Looking at the traces, the threads spend ~6% of mergeSymbols time between Lock(Mut) and for(const Sym.... Doing it properly is a little complicated in the current setup, since we don't have knobs to create per-thread structs and merge them later. |
I think this implementation will have problems in a "real" multi-machine MR framework. The lifetime of the Merger is the whole program, with output only coming at the end.
With N workers, each will store >1/N of the symbols (more because of overlap), and the streaming nature of a MR is important to its scaling.
As you know, we rely on this code in such a framework :-)
I would be much happier if we could express these constraints/patterns in the open-source code, I think the most promising approach would be to define a more complete clang mapreduce API in Tooling.
But that's way out of scope in this patch. I'm not really sure what to suggest here, happy to talk through it more tomorrow.
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp | ||
---|---|---|
56 | consider calling this consume, and the member in SymbolIndexActionFactory etc Sink, to emphasize the data flow |
Seems reasonably likely we would actually have contention here? merging per-thread (combiner) and then globally at the end (reducer) might be the way to go (might be significantly faster). But not sure how big the impact is.