Page MenuHomePhabricator

[WIP] Index-while-building
Needs ReviewPublic

Authored by jkorous on Jul 8 2019, 8:20 PM.

Diff Detail

Event Timeline

jkorous created this revision.Jul 8 2019, 8:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2019, 8:20 PM

Thanks for sharing this! I hope I will have time to dig into more detail.

At a high level, my main question is around layering.
Until now, lib/Index has mostly been a pretty abstract framework: it gathers information from ASTs and exposes it to callers, who then decide what to do with it.
This has served clangd pretty well: we've been able to hook into it, use our own storage, fix the occasional bug, and avoid a bunch of custom tree traversals.
My understanding is that historically XCode has used it in similar ways by out-of-tree consumers (possibly via libclang?)

This patch introduces another concrete consumer in-tree, which is great - making uses more visible helps everyone understand and change the API.
It's also well-layered! But much of it's in the Index library, which makes the layering more fragile:

  • it's unclear to me whether strict layering is intended here - there seem to be a couple of violations (though this is preview code)
  • it's not going to be clear to people arriving at the code which dependencies are OK, and whether IWB is special or one client of many
  • as it stands, there's no technical enforcement of layering (so it may rot), nor a way to depend on the abstract part but not the concrete part. This makes changes to the concrete part scarier.

If the currently-abstract parts Index becomes coupled to IWB over time, it'll be a regression for how clangd is using the library. Maybe we can converge on the concrete consumers too, but carefully (and not necessarily!)

Can we move the concrete parts to IndexStore directories (both header and impls)? (EmitIndexAction, DeclOccurrenceGenerator, IndexDataFormat, IndexDataStore, IndexRecordWriter, PathStorage, half of IndexOptions, BitstreamUtil, FSUtil, FileIndexRecord)

clang/include/clang/Index/GenerateIndexAction.h
34 ↗(On Diff #208580)

(this seems like suspicious layering, but also seems to be unused)

clang/include/clang/Index/IndexOptions.h
39 ↗(On Diff #208580)

This file mixes abstract indexing stuff (IndexingOptions) with concrete (RecordingOptions)

clang/lib/Index/GenerateIndexAction.cpp
11 ↗(On Diff #208580)

I can't tell where this is used - it's suspicious if required

MaskRay removed a subscriber: MaskRay.
jkorous marked 4 inline comments as done.Jul 22 2019, 3:06 PM

Thanks for taking a look Sam!

I finished porting of all the changes on top of llvm.org master in the meantime. The updated patch contains all the code we'd like to land. Build (on macOS) and tests are fine.
I'll leave it as a single patch for now so we can finish discussion about layering and I'll split it into couple smaller patches once this is done.

I've put the actual index-while-building part into separate subdirectories for now. For now I went with IndexWhileBuilding name but I'm open to discussion.

clang/include/clang/Index/GenerateIndexAction.h
34 ↗(On Diff #208580)

Right. Removed.

clang/lib/Index/GenerateIndexAction.cpp
11 ↗(On Diff #208580)

Removed. Thanks.

jkorous updated this revision to Diff 211205.Jul 22 2019, 3:16 PM
jkorous marked 2 inline comments as done.
jkorous edited the summary of this revision. (Show Details)

Uploaded the full diff.

For now I went with IndexWhileBuilding name but I'm open to discussion.

I'd prefer 'IndexSerialization'.

jkorous updated this revision to Diff 211321.Jul 23 2019, 9:45 AM

clang-format

Not a full review, just some comments that would help reduce the size of the patch, plus code organization concerns.

Could we avoid nested libraries and rename clang/include/clang/Index/IndexWhileBuilding to clang/include/clang/IndexWhileBuilding ?

clang-tools-extra/clangd/ClangdUnit.cpp
30 ↗(On Diff #211321)

Please submit changes like these this #include removal separately to reduce the size of the patch.

clang-tools-extra/clangd/FindSymbols.cpp
20 ↗(On Diff #211321)

Please submit changes like these this #include removal separately to reduce the size of the patch.

clang-tools-extra/clangd/index/IndexAction.cpp
128 ↗(On Diff #211321)

Could you move index action into a separate header and inline index::createIndexingAction into its callers in a separate commit?

clang/include/clang/Index/IndexAction.h
22 ↗(On Diff #211321)

+1. Instead of WrapperFrontendAction, could you implement a general frontend action multiplexer in clang/include/clang/Frontend/FrontendAction.h ? You can do that as a separate commit.

clang/include/clang/Index/IndexDataConsumer.h
64 ↗(On Diff #211321)

Changes to this file can be submitted as a separate patch.

clang/include/clang/Index/IndexOptions.h
19 ↗(On Diff #211321)

Are these the declarations moved from clang/include/clang/Index/IndexingAction.h? I'd suggest to commit this header splitting separately to reduce the size of the patch.

clang/include/clang/Index/IndexWhileBuilding/PathStorage.h
74 ↗(On Diff #211321)

Please add newlines (everywhere in the patch).

clang/include/clang/Index/USRGeneration.h
100 ↗(On Diff #211321)

Unintended edit?

gribozavr added inline comments.Aug 6 2019, 8:42 AM
clang/include/clang/Driver/Options.td
337

Could you add a more detailed description? Even as a compiler engineer, I can't really tell what index-record-codegen-name really does or why I would want to turn it on.

clang/include/clang/Frontend/CompilerInstance.h
180 ↗(On Diff #211321)

No need to write "\brief", it is the default.

180 ↗(On Diff #211321)

Please don't write what something is used for, write what something is.

"A factory for wrappers around FrontendActions that compile imported modules."

810 ↗(On Diff #211321)

I don't see anything actually using GenModuleActionWrapper.

clang/lib/Frontend/CompilerInstance.cpp
1125

Please end comments with a period.

gribozavr added inline comments.Aug 9 2019, 6:23 AM
clang/include/clang/Frontend/FrontendOptions.h
369

Could we change "ignore system symbols" into a positive option ("record system symbols")? Throughout the patch please.

clang/include/clang/Index/CodegenNameGenerator.h
46 ↗(On Diff #211321)

Why change to a raw pointer?

We can use a unique_ptr to an incomplete type, as long as the destructor ~CodegenNameGenerator is defined in a file where the type is complete.

clang/lib/Index/CodegenNameGenerator.cpp
31 ↗(On Diff #211321)

This code looks like a straight copy from clang/lib/AST/Mangle.cpp -- can it be de-duplicated?

clang/unittests/Index/IndexTests.cpp
74

This change can be a separate patch.

106

Unclear why this test overrides these methods, if the implementation is identical to the base class.

112

No need to add virtual when we already have override.

jkorous marked 2 inline comments as done.Aug 12 2019, 6:53 PM
jkorous added inline comments.
clang/include/clang/Index/IndexAction.h
22 ↗(On Diff #211321)

Sent for review: https://reviews.llvm.org/D66125
I'll rebase this patch once/if that lands.

jkorous marked 10 inline comments as done.Aug 13 2019, 6:55 PM

Thanks for the review! I addressed just some minor details so far.

clang/include/clang/Frontend/CompilerInstance.h
810 ↗(On Diff #211321)

IIUC it's both used (lambda on CompilerInstance.cpp:L1136) and passed (CompilerInstance.cpp:L1112) to CompilerInstance of module dependencies in compileModuleImpl.

clang/unittests/Index/IndexTests.cpp
106

Removed these overrides.

jkorous updated this revision to Diff 215009.Aug 13 2019, 6:58 PM
jkorous marked an inline comment as done.

Addressed some of the comments.

gribozavr added inline comments.Aug 14 2019, 3:08 AM
clang/lib/Frontend/CompilerInstance.cpp
1143

My suggestion would be to change GenerateModuleAction::CreateASTConsumer to perform the indexing. Doing so will avoid the problem of having to compose FrontendActions (which we found in another patch to be tricky), and also will make the code a lot more straightforward, I think.

WDYT?

jkorous updated this revision to Diff 215660.Aug 16 2019, 1:12 PM

Added ASTConsumerInjector and changed `Emit|Generate-IndexAction to -ASTConsumer.

jkorous marked 3 inline comments as done.Aug 21 2019, 2:38 PM
jkorous added inline comments.
clang/lib/Index/CodegenNameGenerator.cpp
31 ↗(On Diff #211321)

Seems like you're right: https://reviews.llvm.org/D63535

jkorous updated this revision to Diff 216519.Aug 21 2019, 4:27 PM
jkorous marked an inline comment as done.
  • addressed more comments
  • rebased
  • clang-format
jkorous marked 3 inline comments as done.Aug 21 2019, 5:32 PM
jkorous added inline comments.
clang/include/clang/Frontend/CompilerInstance.h
810 ↗(On Diff #211321)

I was wrong! I thought we are passing some data in the wrapper. Turned out we aren't and I was able to completely remove it. Thanks!

jkorous updated this revision to Diff 216533.Aug 21 2019, 5:33 PM
  • addressed some comments
  • removed GenModuleActionWrapper
jkorous marked an inline comment as done.Aug 21 2019, 6:08 PM
jkorous added inline comments.
clang/include/clang/Frontend/FrontendOptions.h
369

Surprisingly it seems that the name is actually mostly determined by the fact that we want to index system symbols by default and that natural CLI for boolean argument with default positive value is optional flag with negative semantics.

Specifically - the semantics/name of the flag OPT_index_ignore_system_symbols is pretty much given (or ?).

I could rename just IndexIgnoreSystemSymbols and keep OPT_index_ignore_system_symbols but it feels less clear than current state.

WDYT?

jkorous updated this revision to Diff 216726.Aug 22 2019, 3:36 PM
  • renamed "codegen name generator" -> "mangled name generator"
  • ranemed flag index-record-codegen-name -> index-record-mangled-name
  • removed wrapper over ASTNameGenerator from libIndex
jkorous marked an inline comment as done.Aug 22 2019, 3:41 PM

I addressed the rest of the comments (excluding comments about splitting the patch).
Next step is to follow up on the discussion about FrontendActions/ASTConsumers from https://reviews.llvm.org/D66125 and then I'd like to start splitting off separate commits.
I'll keep this patch rebased on top of what has already landed to have some sense what still needs to be done.

jkorous added a comment.EditedAug 26 2019, 1:48 PM

I'm now taking a look at suggestions from https://reviews.llvm.org/D66125 regarding IndexingFrontendAction in libclang.

IndexingFrontendAction is a lot more complex -- not only it needs a complex factory that creates the ASTConsumer, but it also overrides getTranslationUnitKind, which changes the behavior of Sema and Preprocessor in significant ways. So it does not seem like IndexFrontendAction can become an ASTConsumer.

However, IndexFrontendAction does not seem like a standalone ASTConsumer anyway. Its primary reason to exist is the logic in IndexingConsumer::shouldSkipFunctionBody. That logic seems interesting for general-purpose indexing, not just for libclang, so it would make sense to fold it into IndexASTConsumer.

To summarize:

Fold shouldSkipBody from IndexingConsumer in libclang into IndexASTConsumer in libIndex.
Make libIndex expose an ASTConsumer factory function, that encapsulates the logic currently in IndexingFrontendAction::CreateASTConsumer. Clients who need to combine indexing with some other FrontendAction can use that factory to create a consumer that will index whatever comes out of Sema.
Make a FrontendAction that implements the getTranslationUnitKind function to implement the corresponding optimization. If this FrontendAction is only needed in libclang, it can live there, otherwise, it can be in libIndex.
What do you think?

IIUC the goal was to move as much of interesting logic to libIndex as possible and hopefully implement the rest of the IndexingFrontendAction as an ASTConsumer instead of FrontendAction.

Now, it seems to me:

  • IndexingFrontendAction::getTranslationUnitKind() method just returns two different results based on configuration, there's no logic - we just pass the configuration via DataConsumer. I am not quite sure this is interesting enough to create a self-standing FrontendAction. (To be completely honest - this actually seems more like a hack but I am not familiar enough with how does TUKind affect Preprocessor and Sema to assess this.)
  • We could factor out an ASTConsumer factory from IndexingFrontendAction::CreateASTConsumer but there are also preprocessor callbacks created, DataConsumer set up with CompilerInstance and CXIndexDataConsumer::importedPCH called. This means we'd probably end up with having a FrontendAction with CXIndexDataConsumer in libclang anyway as this doesn't seem like a code that would quite fit into the hypothetical ASTConsumer factory.

Overall - it doesn't seem to me we could make IndexingFrontendAction significantly simpler or could reuse much of its code elsewhere so I suggest we leave it as it is.

WDYT?

jkorous marked 3 inline comments as done.Aug 27 2019, 3:03 PM

I'll start splitting off some self-standing parts that won't be affected by hypothetical changes at the FrontendAction/ASTConsumer level.

clang/include/clang/Index/IndexDataConsumer.h
64 ↗(On Diff #211321)

Done in cfd641d84a5 [clang][Index][NFC] Move IndexDataConsumer default implementation

jkorous updated this revision to Diff 217506.Aug 27 2019, 3:05 PM
jkorous marked an inline comment as done.
  • Split off change to IndexDataConsumer.
  • Rebase.
jkorous updated this revision to Diff 219159.Sep 6 2019, 1:13 PM
jkorous edited the summary of this revision. (Show Details)

Rebased on TOT and removed noise from the patch as I've already landed couple refactoring patches.

jkorous updated this revision to Diff 222057.Sep 26 2019, 5:10 PM

Rebased on top of current master and also tentatively on top of SerializablePathCollection (previously PathStorage, PathIndexer) from https://reviews.llvm.org/D66854