This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Put direct headers into srcs section.
ClosedPublic

Authored by kadircet on Nov 21 2018, 3:19 PM.

Details

Summary

Currently, there's no way of knowing about header files
using compilation database, since it doesn't contain header files as entries.

Using this information, restoring from cache using compile commands becomes
possible instead of doing directory traversal. Also, we can issue indexing
actions for out-of-date headers even if source files depending on them haven't
changed.

Diff Detail

Event Timeline

kadircet created this revision.Nov 21 2018, 3:19 PM
kadircet updated this revision to Diff 175032.Nov 22 2018, 3:25 AM
  • Use include depth to get includes for all files, rather than just main file.

Going to leave awkward comments suggesting we expand the scope a bit.
Full disclosure: file dependency information is something that's come up as useful in lots of contexts.

clangd/index/Serialization.cpp
504

This seems like unneccesary coupling between layers. On the contrary, I think it's going to be really useful to have file and dependency information in the index data and in the in-memory index.
(e.g. for #include completion, more accurately guessing compile commands for headers, deciding which CC file corresponds to a .h file and vice versa...)

clangd/index/Serialization.h
45

This seems a little off to me conceptually: an index contains symbols that might be from many files. (Also true of Digest, which I missed).

I'd suggest a richer structure which can represent one or many files:

std::unordered_map</*uri*/string, SourceFile>; // keys are URIs
struct SourceFile {
  bool IsTU;
  StringRef URI; // points back into key
  FileDigest Digest;
  vector<StringRef> Headers; // points back into keys
  // maybe later: compile command
}

(though probably wrap the map into a class to ensure the string ownership doesn't get messed up)

For auto-index shards we could include full SourceFile info for the main file (with IsTU = true, and headers) and each of the headers would have a skeletal entry with IsTU = false and no headers stored.

But clangd-indexer could spit out an index that contains full information for all the transitive include of all indexed TUs.

kadircet updated this revision to Diff 175087.Nov 23 2018, 12:55 AM
kadircet marked an inline comment as done.
  • Change srcs to store multiple source file information rathet than one.
kadircet marked an inline comment as done.Nov 23 2018, 12:55 AM
kadircet updated this revision to Diff 175097.Nov 23 2018, 2:37 AM
  • rebased and changed URI convertion patterns, since it changed in rL347467
kadircet updated this revision to Diff 175412.Nov 27 2018, 12:51 AM
  • Rebase to get changes in background index.

structure and serialization stuff looks great!
still some questions around the auto-index logic and organization of the extraction code, let's chat offline. May make sense to split.

clangd/index/Background.cpp
292

nit: lowercase

304

It looks like you're re-reading the files here. this won't reuse in-memory copies of the file, and won't respect overridden contents outside the VFS (ugh). The SourceManager would be better I think.

314

I don't think this is going to work:

  • you're outputting the non-transitive dependencies from the TU, but I think you want the *transitive* ones, which should then be partitioned by file (same as symbols). Otherwise the headers from the shard for foo.h aren't going to be right, are they?
  • I don't think the current IncludeStructure includes all the info you want for transitive includes. e.g. for include edges A->B where there is a shorter path from Main->B, we don't record anything.
  • I think we will want the include information for other forms of index too. I think this means the logic should be next to symbolcollector, or in Headers.h, where it can be shared
clangd/index/Serialization.cpp
401

these string reassignments deserve a comment

clangd/index/Serialization.h
41

Does this belong here, vs in Index, or SymbolCollector, or so?

44

I think the StringMap could also use a typedef near SourceFile - that's where we could mention that FileURI and DirectIncludes point back at the keys.

To summarize offline discussion: I think we need to split this patch up, as the scope is growing. (In a really useful direction I think!)

The first patch should add the include graph data structure. (I think the right place for it is probably Headers.h, as we want to reuse it). It can also include the code to serialize/deserialize it in index files, which is in good shape.

Then we have at least 3 independent dependent patches :-)

  • populate the transitive include graph structure from a PPCallbacks. Probably hook it up to the static index FrontendAction too, since that's simple. This can be tested directly.
  • write the include graph appropriately in auto-index. This isn't quite trivial, as it needs to be partitioned by file.
  • read shards in auto-index. This could be all-at-once, or a simple version that didn't follow #include edges first, and then modify to consume the include graph.

This is a bunch more work than originally envisioned, but having dependency graph info in the index enables/improves a bunch of features. Making this reusable across indexes seems worth it.

kadircet updated this revision to Diff 175469.Nov 27 2018, 6:25 AM
kadircet marked 2 inline comments as done.
  • Only keep the build graph structures and {de,}serialization logic.
  • Rename a few structures, move to more relavant places.
sammccall accepted this revision.Nov 27 2018, 7:02 AM

Only really significant thing is I think the name "build graph" is confusing, and we should specifically mention includes.
Rest is just nits to take or leave...

clangd/Headers.h
52

May be worth explicitly noting:
Does not own the strings it references (BuildGraphInclusions is self-contained)

53

I think we should be more specific and call this the "include graph".
Build graph can suggest other build-system related things (bazel and ninja have a very formal concept of a build graph, other systems have less formalized but may still use the term)

54

this is worth a comment (e.g. A "main file" as opposed to a header.)

55

This is slightly ambiguous: URI of the file, or URI with the file scheme?
I think you mean the former.
I think URI is actually the best name for this field, even though it's taken by the class. As a member of a struct, there's not really any ambiguous cases.

57

nit: I don't think "File" actually adds anything here, the whole struct is scoped to a file.
I'd add a comment that this is only direct inclusions, if you choose not to mention it in the name.
(The comment seems more useful here than it does on the struct)

clangd/SourceCode.h
28

Probably worth a comment, especially since there are no functions here using it.

This revision is now accepted and ready to land.Nov 27 2018, 7:02 AM
kadircet updated this revision to Diff 175483.Nov 27 2018, 7:53 AM
kadircet marked 6 inline comments as done.
  • Address comments.
  • Move digest and digestFile into SourceCode.h
This revision was automatically updated to reflect the committed changes.