Page MenuHomePhabricator

[clang][Modules] Serialize decl to comment mapping to speed up code completion.
Needs ReviewPublic

Authored by jkorous on Aug 13 2019, 2:34 PM.

Details

Summary

The original motivation for this use-case was code-completion invoked via libclang.

Currently what we do is this:

  1. We build a module and construct the the comment -> decl mapping.
  2. We serialize the module to disk (without preserving the mapping).
  3. We deserialize the module and reconstruct the mapping.

Construction of the mapping isn't exactly cheap - there are expensive SourceLocation decompositions and SourceLocation-based search. While we managed to turn the search into bisection in recent patch, preserving the information still seems a reasonable speed:module size tradeoff.

Original commit by Michael Spencer.

Diff Detail

Event Timeline

jkorous created this revision.Aug 13 2019, 2:34 PM
Bigcheese added inline comments.Aug 13 2019, 3:03 PM
clang/include/clang/Serialization/ASTBitCodes.h
794

COMMENTLESS

clang/lib/Serialization/ASTReader.cpp
9739–9752

I think these two cases need tests.

+ilya because I know he's wrestled with these before, and because I'm OOO sick

ilya-biryukov added a comment.EditedAug 14 2019, 2:36 AM

Overall this makes sense. If we choose to have this mapping in the ASTContext, serializing and deserializing it is probably a good idea (unless it means deserializing more decls eagerly...)

This patch mentions there is code currently that "reconstructs the mapping". Why not remove it in this patch? Where is it used if we actually serialize the mapping?
Do we have any measurements? How faster is this in particular use-cases? How bigger is the module now?

ilya-biryukov added inline comments.Aug 14 2019, 2:39 AM
clang/lib/Serialization/ASTReader.cpp
9733

GetDecl can cause deserialization of a corresponding decl.

Does that mean we are deserializing more decls solely because they had comments after this patch?
If not, why?

gribozavr added inline comments.Aug 14 2019, 2:59 AM
clang/include/clang/Serialization/ASTBitCodes.h
793

80 columns, please.

clang/lib/Serialization/ASTReader.cpp
9750

s/RedeclChainComments/CommentlessRedeclChains/

If this code passed tests as-is, it definitely needs more tests.

9750

I'm not certain that it is valid to serialize these caches, especially the CommentlessRedeclChains one. Here's why I think so. The result stored in CommentlessRedeclChains depends on all declarations that preceded LastCommentlessRedecl, that is, the cache relies on the fact that the redeclaration chain is fixed up to that point.

However, when we deserialize a module, that is not true -- I don't know how exactly redeclaration chains are merged, but since modules are compiled once and can be imported in arbitrary order, the redeclaration chain is definitely not fixed.

bruno added a comment.Aug 16 2019, 2:43 PM

I'm curious to know if you measured the impact on the PCM size after this change (example: PCMs from building macOS public SDK)? For instance, PPD_SKIPPED_RANGES currently explodes for some specific inputs, just wonder what's the story here.

clang/lib/Serialization/ASTReader.cpp
9739–9752

+1