This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Populate include graph during static indexing action.
ClosedPublic

Authored by kadircet on Nov 28 2018, 4:19 AM.

Details

Summary

This is the second part for introducing include hierarchy into index
files produced by clangd. You can see the base patch that introduces structures
and discusses the future of the patches in D54817

Event Timeline

kadircet created this revision.Nov 28 2018, 4:19 AM
ilya-biryukov added inline comments.Nov 28 2018, 5:30 AM
clangd/index/IndexAction.cpp
23

Could we add an assertion that the final graph does not contain uninitialized nodes?

28

Maybe expand a bit on what a node is? I.e. mention file digests, URIs, etc.

33

s/enteries/entries

44

Is there a way to check the node has already been populated? Can we do this and avoid recomputing the hashes for the same file/assert they're the same in assertion mode?
E.g. by looking at URI.empty()?

45

What happens if we can't compute a digest for a file? Are we left with an uninitialized memory?

63

Maybe convert both URIs before adding anything to IG?
To make sure we don't change IG unless both URIs can be parsed

72

Maybe move the if block into the function body? To make sure we're be producing compiler errors on changes to the base function signature regardless of the compile configuration

125

NIT: maybe remove != nullptr? the callback is a function, not a pointer, so nullptr might be a bit confusing

149

NIT: maybe remove != nullptr?

unittests/clangd/IndexActionTests.cpp
80

Why 3 headers on each level? 1 or 2, would probably read a but simpler.

105

Directly looking at the graph to capture the include structure might be more straightforward, e.g.

EXPECT_THAT(graph, 
 UnorderedElementsAre(Pair(Main, IncludersAre("level1", ....)))

This won't work on StringMap, but converting to std::map<string, node> should be simple. Other invariants can be tested separately (e.g. values of IsTU)
Maybe try that?

127

Could we also test a few corner cases?

  • Self-includes (with and without include guards).
  • Skipped files (multiple includes of the same file with #pragma once or include guards)
kadircet updated this revision to Diff 175684.Nov 28 2018, 7:02 AM
kadircet marked 11 inline comments as done.
  • Address comments.
clangd/index/IndexAction.cpp
45

Yes, we'll be left with some random digest in the node.

But I believe it is as good as any sentinel value, this field is used for consistency checks. Absence of digest implies we will always(well almost always) have an inconsistent(~stale) version of the file in the current structure and will re-trigger indexing action next time.

Therefore, having this digest initialized to some sentinel or random gibberish has the same probability of colliding with the actual hash. So I believe it is OK to leave it like that.

125

yeah but this was the convention in the file, is it ok if I change the other usages as well?

unittests/clangd/IndexActionTests.cpp
127

I've added a self-include test with header guards, I don't think it is possible to do that without a guard. Wouldn't it cause an infinite loop? I end up getting abort with:
/clangd-test/header.h:1:10: error: #include nested too deeply
Did you mean something else?

ilya-biryukov added inline comments.Nov 28 2018, 7:56 AM
clangd/index/IndexAction.cpp
45

Reading those values is undefined behavior, which is not the same as reading a random digest.
But anyway, the best way to avoid UB is to make the default constructor initialize it to some value, which can be done independently of this patch.

125

Keeping it for consistency LG. Not sure changing the whole file is worth the trouble

unittests/clangd/IndexActionTests.cpp
127

Yeah, we'll need some cut-off, in might be in a form that's different from the include guard to test this.
The idea was to test a different behavior in the compiler, I believe it will optimize away include-guarded files and call FileSkipped, while it would actually visit the same file twice when it cannot detect an include guard. Something like this should trigger this behavior:

#ifndef FOO
#define FOO "main.cpp"
#else
#define FOO "header.h"
#endif

#include FOO

Granted, the test-case is obscure, but there's so much C++ code out there, that someone will eventually run clangd on the code doing something like this.

To be clear, I don't think testing anything other than clangd does not crash on this is useful, not saying we should think too much about those cases :-)

ilya-biryukov added inline comments.Nov 28 2018, 9:31 AM
clangd/index/IndexAction.cpp
12

NIT: maybe call it toURI?

23

I think this comment would also be useful on the IncludeGraph definition itself, most of the readers won't see the code building the graph

34

NIT: maybe add a note mentioning why we have to do this separately from InclusionDirective?

152

NIT: maybe add a comment that we are checking for the absence of the uninitialized values here?

unittests/clangd/IndexActionTests.cpp
28

Maybe inline this? This looks simple enough.

32

Why not simply return arg.IsTU?

38

Why not return arg.Digest == Digest?

88

Maybe return all outputs in runIndexingAction in a dedicated struct?
To make the inputs for the tests more explicit.

Unless this has also been copied from some other test, in which case you could argue it's consistent with how we did things before.

124

NIT: Maybe use StringRef? i.e. for (StringRef Path : ...)

149

Maybe directly store map<string, IncludeGraphNode> to avoid the flattenning code in every test?

162

Maybe move to the common code? Repeating these in every test does not seem ideal, it's a common invariant.

kadircet updated this revision to Diff 175831.Nov 29 2018, 2:48 AM
kadircet marked 17 inline comments as done.
  • Address comments.
kadircet added inline comments.Nov 29 2018, 2:48 AM
unittests/clangd/IndexActionTests.cpp
28

yeah but repeating all of that over and over again looks really ugly.

ilya-biryukov added inline comments.Nov 29 2018, 8:15 AM
clangd/index/IndexAction.cpp
31

No need to duplicate the comments here, one can always look at the struct definition.
Mentioning it populates everything except the include structure should be enough.

36

No need to mention IO, something like the following should convey the same message: we cannot populate the fields in InclusionDirective because it does not have access to the contents of the target file

unittests/clangd/IndexActionTests.cpp
28

Maybe call it toUri to make it even shorther then?
Or at least pathToURI to follow the naming conventions.

53

NIT: add a comment that uninitialized nodes will have an empty URI?

110

Do we ever use anything but the include graph?
Maybe create a helper function that return vector<pair<string, IncludeGraphNode>> (or even map<string, IncludeGraphNode>) to avoid repeating the conversion code in every test?

129

Could we move these checks into the body of runIndexingAction to avoid repeating them in every test?

kadircet updated this revision to Diff 175878.Nov 29 2018, 8:59 AM
kadircet marked 8 inline comments as done.
  • Address comments
ilya-biryukov added inline comments.Nov 29 2018, 9:55 AM
clangd/Headers.h
64 ↗(On Diff #175878)

And multi-edges too, right? Even though they're not useful.

unittests/clangd/IndexActionTests.cpp
169

Maybe have two variables instead? Would format better and HeaderPath is arguably more readable than Header.first

212

Is there a way to make this format nicer?

226

Why not auto Nodes = toMap(*runIndexingAction(MainFilePath).Sources)?
Optional has an assertion, so we'll catch empty results anyway, no need to make the test more verbose

kadircet updated this revision to Diff 175899.Nov 29 2018, 10:19 AM
kadircet marked 4 inline comments as done.
  • Address comments.
kadircet added inline comments.Nov 29 2018, 10:27 AM
unittests/clangd/IndexActionTests.cpp
169

Is it that important?

226

because toMap doesn't modify the contents of the nodes(to make sure we don't change behaviour during the process), and all the strings are still referring to the keys of the graph. Therefore the graph needs to be kept alive.

ilya-biryukov added inline comments.Nov 30 2018, 2:15 AM
unittests/clangd/IndexActionTests.cpp
169

I think it is of minor importance.

Creating a pair that is deconstructed at every use site anyway hurts readability.
If the purpose is grouping the variables together, putting them close to each other and giving them similar names solves the same purpose.

I would also expect the raw strings to format better and not have a very long indent if we put them into separate variables.

226

I missed this. It makes sense, thanks for clarifying. It's ok to have the boilerplate then.
Would still remove ASSERT_TRUE(IndexFile.Sources), asserts in optional will catch those anyway. But up to you.

kadircet updated this revision to Diff 176065.Nov 30 2018, 2:37 AM
kadircet marked 4 inline comments as done.
  • Address comments.
ilya-biryukov accepted this revision.Nov 30 2018, 7:49 AM

Thank, LGTM. A few NITs.

clangd/index/IndexAction.h
31

We have only two call sites, maybe leave out the default arg and specify null explicitly where needed?

unittests/clangd/IndexActionTests.cpp
53

Maybe also check the size of IndexFile.Sources is the same as Paths? To make sure we didn't miss any

This revision is now accepted and ready to land.Nov 30 2018, 7:49 AM
kadircet updated this revision to Diff 176144.Nov 30 2018, 9:00 AM
kadircet marked 2 inline comments as done.
  • Address comments
This revision was automatically updated to reflect the committed changes.