This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Don't create as much garbage while building Dex index.
ClosedPublic

Authored by sammccall on May 13 2020, 5:46 PM.

Details

Summary

The Token objects are relatively expensive and we were spending a lot of
CPU creating them for each trigram emitted. Instead, use a tiny trigram
structure until we're ready to finalize the index.

This improves the new BuildDex benchmark by 20%. This code is hot and on
the critical path in clangd: it runs after a new preamble is built.

Diff Detail

Event Timeline

sammccall created this revision.May 13 2020, 5:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 13 2020, 5:46 PM

Wow, nice work, thank you!

clang-tools-extra/clangd/index/dex/Dex.cpp
51

No longer "returns": "adds/generates"?

clang-tools-extra/clangd/index/dex/Trigram.cpp
84

It isn't immediately obvious why processing different id lengths is different. I'm assuming it's less expensive to find duplicates in the small vector as we go than to find the unique elements in the end in case of the short identifier, but I'd be happy to have a comment here.

Also, 14 is a magic number:

  • Move it to Trigram::SMALL_IDENTIFIER_SIZE or some other constant?
  • How did you find this number? Did you try running benchmark with different values and figured this one is the best for the LLVM index? If not, it would make sense to do that.
clang-tools-extra/clangd/index/dex/Trigram.h
35–55

Also doesn't "return" anymore

49

nit: maybe call it UID or something similar, "unsigned int" is somewhat surprising in this context

sammccall updated this revision to Diff 263961.May 14 2020, 3:32 AM

Address comments, add more docs.

sammccall updated this revision to Diff 263967.May 14 2020, 4:08 AM

Use a faster hash function, which seems to be worth something in the neighbourhood of 4%.

kbobyrev accepted this revision.May 14 2020, 4:13 AM

LGTM, thank you for the patch!

This revision is now accepted and ready to land.May 14 2020, 4:13 AM
This revision was automatically updated to reflect the committed changes.