This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Allow to merge symbols on-the-fly in global-symbol-builder
ClosedPublic

Authored by ilya-biryukov on Aug 23 2018, 2:11 AM.

Details

Summary

The new mode avoids serializing and deserializing YAML.
This results in better performance and less memory usage. Reduce phase
is now almost instant.

The default is to use the old mode going through YAML serialization to
allow migrating MapReduce clients that require the old mode to operate
properly. After we migrate the clients, we can switch the default to
the new mode.

Diff Detail

Repository
rL LLVM

Event Timeline

ilya-biryukov created this revision.Aug 23 2018, 2:11 AM

Collecting the timings to build the index for LLVM now. Will publish when they're ready

Overall looks good to me.

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
61 ↗(On Diff #162142)

AFAIK, this flag would basically depend on the executor being used. If executor can provide information like whether all tasks share memory, we can make the decision totally based on the selected executor (i.e. one fewer option).

  • New mode

./bin/global-symbol-builder -merge-on-the-fly -executor=all-TUs > /dev/null

40079.41s user 1874.40s system 5340% cpu 13:05.56 total
  • Old mode

Still running, expect it to be more than 40minutes... Will update when I have the numbers.

ilya-biryukov added inline comments.Aug 23 2018, 3:47 AM
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
61 ↗(On Diff #162142)

That SG. So we could add a virtual method to executor that would provide us with this information and then we can get rid of that option, does that LG?
However, do we have non-single-binary executor implementations upstream? We need to leave a way to run the code that actually uses YAML serialization to make sure we can experiment when things break.

The numbers are finally there
New mode

./bin/global-symbol-builder -merge-on-the-fly -executor=all-TUs > /dev/null
40079.41s user 1874.40s system 5340% cpu 13:05.56 total

Old mode

./bin/global-symbol-builder -merge-on-the-fly=false -executor=all-TUs  > /dev/null
39789.70s user 2014.45s system 1385% cpu 50:17.37 total

So it's almost 5 times faster now: 50 min to 13 min.

ioeric added inline comments.Aug 23 2018, 5:54 AM
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
61 ↗(On Diff #162142)

Sounds good.

However, do we have non-single-binary executor implementations upstream? We need to leave a way to run the code that actually uses YAML serialization to make sure we can experiment when things break.

The framework allows this, so I'm not sure if there is other downstream implementations that do this (probably not?). But I don't think we can get rid of the YAML serialization at this point because we still dump everything to yaml in the end?

ilya-biryukov added inline comments.Aug 23 2018, 6:12 AM
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
61 ↗(On Diff #162142)

I meant the intermediate YAML serialization-deserialization parts.

I'd keep an option to force using the consumer that accumulates YAML symbols, and only allow to use on-the-fly merge when the tool executor reports that it is single-process. Would allow to:

  • Avoid running on-the-fly merge when the executor does not support it (multi-binary MRs, etc)
  • Allow to use on-the-fly by default for executors that run in the single binary.
  • Allow to run with intermediate serialization even in a single binary if -merge-on-the-fly is specified.

Does that LG?

ioeric added inline comments.Aug 23 2018, 6:25 AM
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
61 ↗(On Diff #162142)

IIUC, there will basically be a flag that allows us to force using yaml intermediate serialization, even when the executor supports on-the-fly merging. By default, we use on-the-fly merging whenever possible. If that's the case, SGTM.

  • Don't merge on-the-fly for multi-process executors
ilya-biryukov added inline comments.Aug 23 2018, 9:20 AM
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
61 ↗(On Diff #162142)

The flag is still called -merge-on-the-fly, but it's true by default and can be set to false to reenable intermediate YAML serialization

ioeric accepted this revision.Aug 23 2018, 9:30 AM

lgtm

clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp
60 ↗(On Diff #162194)

Maybe make this hidden?

This revision is now accepted and ready to land.Aug 23 2018, 9:30 AM
  • Make the option hidden
ilya-biryukov marked an inline comment as done.Aug 24 2018, 1:35 AM
This revision was automatically updated to reflect the committed changes.