o Collect suitable #include paths for index symbols. This also does smart mapping
for STL symbols and IWYU pragma (code borrowed from include-fixer).
o For global code completion, add a command for inserting new #include in each code
completion item.
Details
Diff Detail
- Repository
- rCTE Clang Tools Extra
Event Timeline
TL;DR:
- I think we can expose fewer types/abstractions in the header mapping, and put it all in one header. If we want some functionality to be global-index only, we should make it an option to the symbolcollector, not stash the logic somewhere else I think.
- regex seems like a big hammer, but we can revisit if this is on the hot path
- the overridden path should probably be an "include string" rather than a URI
- lots of naming nits as usual :-)
- I haven't reviewed the actual preprocessor/include-path madness yet :-)
clangd/ClangdLSPServer.cpp | ||
---|---|---|
204 | The caveat from apply fix also applies here. Maybe we should combine the tail of this case and the tail of apply-fix (the part from WorkspaceEdit)? They're simple (and wrong) now, but will need the same changes. | |
clangd/ClangdServer.cpp | ||
314 | (I still want to look at this bit... on monday :-) | |
clangd/CodeComplete.cpp | ||
255–256 | contents is unused | |
290–291 | not sure where this is shown, but maybe add "if needed"? | |
297 | LSP says we should use additionalTextEdits instead. | |
859 | What do you think about (redundantly) passing filename to the constructor, and stashing it in a member, instead of passing to all the methods here? If you don't like that, we could also pass the SemaCCInput struct to the constructor itself. The distinction between constrcutor params and run() params is pretty artificial. (as noted above, I think/hope Contents is unused in any case) | |
clangd/Protocol.cpp | ||
410 | nit: drop braces consistent with the other code around here | |
clangd/Protocol.h | ||
432 | I would call this "header". At least in the current implementation, it's not a URI, and we we really don't want anyone trying to parse it. | |
433 | I don't think this comment applies here? | |
452 | nit: remove blank line (and below). These form a list. | |
463 | Hmm, you could consider struct Command : public ExecuteCommandParams { std::string title; } I guess it's a hack, but this repetition seems a little silly. You could then reuse some of the JSON stuff too. | |
clangd/global-symbol-builder/PragmaCommentHandler.h | ||
1 ↗ | (On Diff #131782) | can this also be put in HeaderMap.h? |
28 ↗ | (On Diff #131782) | I'm not sure this class needs to be exposed, can we just expose a function unique_ptr<CommentHandler> collectHeaderMaps(HeaderMapCollector*)? |
clangd/global-symbol-builder/STLPostfixHeaderMap.h | ||
1 ↗ | (On Diff #131782) | STL isn't really a name we want to be using :-) Why isn't this in HeaderMapCollector.h? |
clangd/index/HeaderMapCollector.h | ||
9 ↗ | (On Diff #131782) | context! what are header mappings, why do we use them (example) |
23 ↗ | (On Diff #131782) | I'm not sure "header mapping" is as descriptive as it should be, and "collector':
"Collector" doesn't really describe the class - it both records and applies rules. Maybe CanonicalIncludes? |
25 ↗ | (On Diff #131782) | these public typedefs and the explicit constructor seem leaky. Maybe just expose a public function addStandardLibraryMapping(CanonicalIncludes*)? |
31 ↗ | (On Diff #131782) | nit: just addMapping? |
41 ↗ | (On Diff #131782) | nit: just mapHeader? |
41 ↗ | (On Diff #131782) | what's the input? absolute path? working-directory-relative? include-path relative? Similarly, what's the output? (This also relates to addHeaderMapping, so might belong in the class desc) |
48 ↗ | (On Diff #131782) | This comment and the mutable are scary. We don't need to be threadsafe I think? |
50 ↗ | (On Diff #131782) | all the rules we have so far are suffix matches. It seems a little weird to allow arbitrary regexes (and prevents fast lookup by sorting the reversed strings). But it's a small amount of code and I guess the table will stay small. Maybe a FIXME that performance could be improved here? |
clangd/index/Index.h | ||
160 | I think we concluded this shouldn't be a URI but a literal include string. If absent, we should fall back to CanonicalDeclaration. The reason is that we choose to interpret the target of IWYU directives as a literal string to include. Alternatively we could choose to resolve it somehow and store the URI to the resolved target, but that would be more work. | |
clangd/index/SymbolCollector.cpp | ||
134 | shouldn't this just be return SK != Namespace, to avoid duplication? Since we only get here if we're indexing this symbol? |
clangd/ClangdServer.cpp | ||
---|---|---|
321 | Pull out a function to compute ToInclude from a uri/abspath? | |
328 | Need high-level comments explaining what we're doing to determine the right include path. | |
336 | temporary logs? | |
348 | explain why? (this has implications for direct vs transitive includes I think) | |
350 | hmm, why are we actually going to run the compiler/preprocessor? | |
368 | do we handle the case that the suggestion is already included? | |
371 | leftover? | |
393 | maybe this fixme should just be to have more consistent style support in clangdserver? |
- Merge with origin/master
- Renamed and moved a bunch files according to review comments.
- Merge remote-tracking branch 'origin/master' into include
- Half way
- Merge with uri changes.
- merged with origin/master
- Merge with origin/master
- s/HeaderUri/IncludeURI/
Thanks for the comments! I addressed comments in the symbol collect part (I think?). Will add tests in a followup patch.
clangd/index/HeaderMapCollector.h | ||
---|---|---|
48 ↗ | (On Diff #131782) | mutable was here because of llvm::Regex. Matching again a llvm::Regex is not a constant operation, but we would still like the lookup function to be const. |
clangd/index/Index.h | ||
160 | In our latest offline discussion, we concluded that this would be either URI of a file (in .inc case) or a literal string that is suitable to be inserted. I left resolving IWYU targets as a FIXME in this patch as it is more work and doesn't seem to cause any trouble at this point. Will revisit when it actually becomes a problem. | |
clangd/index/SymbolCollector.cpp | ||
134 | I think it'd be easier to justify the behavior with a white list. There are other symbols like NamespaceAlias etc, which we might or might not be collecting. |
Thanks for the comments!
Sorry that I didn't clean up the code before sending out the prototype. I planned to deal with code structure and style issues after getting some early feedback, but I think the patch is ready for review now.
clangd/ClangdServer.cpp | ||
---|---|---|
348 | Since we only need predecessor for HeaderSearch and don't really care about the actual code, I set this to false in the hope of speeding up the code. But in the latest revision, I simply use an empty file (as we only care about header search), so this option is no longer necessary. | |
350 | I couldn't find an easy way to use ApplyHeaderSearchOptions... It requires an instance of HeaderSearch, which needs a preprocessor and a bunch of other objects (SourceManager, Target etc). And these objects are mostly initialized in BeginSourceFile. We could probably pull out the code that only initializes up to proprocessor, but this is not very trivial :( | |
368 | Yes and no. The current implementation only does textual matching against existing #includes in the current file and inserts the header if no same header was found. This complies with IWYU. But we are not handling the case where the same header is included by different names. I added a FIXME for this. | |
clangd/CodeComplete.cpp | ||
859 | Passing filename to the constructor sounds good. Thanks! |
Some comments on the insert side, which looks pretty good. I'll take another look at indexing tomorrow.
clangd/ClangdServer.cpp | ||
---|---|---|
314 | This has a fair amount of logic (well, plumbing :-D) and uses relatively little from ClangdServer. I'd suggest maybe Headers.h - The formatting part of insertInclude could also fit there, as well as probably the logic from switchSourceHeader I think (the latter not in this patch of course). This won't really help much with testing, I just find it gets hard to navigate files that have lots of details of unrelated features. ClangdUnit and ClangdServer both have this potential to grow without bound, though they're in reasonable shape now. Interested what you think! | |
316 | maybe drop "via clang::HeaderSearch" (it's doc'd in the implementation) and add an example? It might be worth explicitly stating that the result is an #include string (with quotes) - you just call it a "path" here. "shortest" makes sense as a first implementation, but we may want to document something more like "best" - I know there are codebases we care about where file-relative paths are banned. Also I suspect we don't give "../../../../Foo.h" even if it's shortest :-) | |
325 | This comment helps a lot. The subtext is: HeaderSearch is hard to construct directly, so we're doing this weird dance. | |
366 | consume the optional IsSystem and use it to quote appropriately? | |
368 | I can't see the FIXME? (There's one in the header, but it doesn't seem to really cover this case) So this doesn't seem so hard: we can pass the file content, turn off recursive PP, and add PP callbacks to capture the names of each included file (the include-scanner patch does this). But up to you, I'd suggest putting the fixme where we expect to fix it. | |
unittests/clangd/ClangdTests.cpp | ||
821 | Similarly, it'd be nice to pull these tests out into a test file parallel to the header. |
Thanks! PTAL
clangd/ClangdServer.cpp | ||
---|---|---|
314 | Done. I didn't move the formatting code, as half of the code is pulling out the style, which we might want to share/change depending on other clangd logics that might use style. | |
368 | Since you prefer, I have included the change in the patch. I wanted to get to this as soon as this patch is landed. | |
unittests/clangd/ClangdTests.cpp | ||
821 | Done. I left a simple test for replacements. |
Insertion side LGTM, feel free to split and land.
Sorry I need to take off and will need to get to indexing on monday :(
clangd/ClangdServer.cpp | ||
---|---|---|
314 | I'd still think pulling out Expected<tooling::Replacement> insertInclude(File, Code, Header, VFS, Style) would be worthwhile here - the formatting isn't a lot of code, but it's a bit magic, plus the quote handling... it's a bit of code. It'd make it more obvious what the interactions with ClangdServer's state are. But up to you, we can always do this later. | |
316 |
I think this part wasn't addressed | |
clangd/Headers.cpp | ||
75 | comment is outdated now | |
115 | can you include the Header in this log message? (and possibly File, but that might add more noise than signal) | |
clangd/Headers.h | ||
31 | header-already-included is not an error condition. Suggest returning llvm::Expected<Optional<String>>, or returning "" for this case. |
- Addressed review comments.
clangd/ClangdServer.cpp | ||
---|---|---|
314 | So it's just 2 lines for the replacement magic (+1 for comment) now after removing some redundant code. I like shortenIncludePath better because it's more self-contained and easier to write tests against, and insertInclude doesn't seem to carry much more weight while we would need to handle replacements logic which has been tested in the tests. | |
316 | I added a comment for this in the header. But I might be misunderstanding you suggestion. Did you mean we need a better name for the function? |
Insertion still LG (couple of nits, inline).
For indexing, my biggest questions:
- I worry CanonicalIncludes doesn't get enough information to make good decisions - passing the include stack in some form may be better
- CanonicalIncludes has slightly weird patterns of reads/writes - most writes are permanent and a few are transient, and it's not totally obvious how it works in a multithreading context (though your code is correct). I'm not sure whether this is worth fixing.
clangd/ClangdServer.cpp | ||
---|---|---|
316 | I think the function name is fine as a shorthand, just that the comment is a bit overspecified and possibly inaccurate compared to e.g. "Determines the preferred way to #include a file, taking into account the search path. Usually this will prefer a shorter representation like 'Foo/Bar.h' over a longer one like 'Baz/include/Foo/Bar.h'". I don't think the ../../../Foo.h case is worth explicitly calling out - I just meant it as an example of why we *don't* want an overly-specific contract here. | |
clangd/CodeComplete.cpp | ||
290 | whether | |
301 | user-facing. unwrap? | |
clangd/clients/clangd-vscode/package.json | ||
1 ↗ | (On Diff #133635) | phab says you have ws-only changes in this file, which you might want to revert |
clangd/global-symbol-builder/CMakeLists.txt | ||
1 ↗ | (On Diff #133635) | (and here - ws-only changes?) |
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp | ||
102 | Do we create one of these per TU or per thread? The former is "clean" but seems potentially wasteful (compiling all those system header regexes for each TU). The latter is "fast" but potentially non-hermetic (can't think of a triggering case though). Maybe we should have a split between transient mappings (IWYU) and permanent ones? | |
clangd/index/CanonicalIncludes.cpp | ||
52 | I think this is if (!Text.consume_front(IWYUPragma)) | |
73 | "STL" :-) | |
75 | these aren't standard library - deserves a comment? in general it looks like there's a bunch of standard library stuff, also posix stuff, and some compiler intrinsics. Can we document approximately which standard libraries, which compiler extensions, and other standards (posix, but I guess windows one day) are included? | |
clangd/index/CanonicalIncludes.h | ||
11 | Can we split out the main ideas a bit? I think these are: a) what include mapping is, b) IWYU pragmas, c) standard library. e.g. At indexing time, we decide which file to #included for a symbol. Usually this is the file with the canonical decl, but there are exceptions: - private headers may have pragmas pointing to the matching public header. (These are "IWYU" pragmas, named after the include-what-you-use tool). - the standard library is implemented in many files, without any pragmas. We have a lookup table for common standard library implementations. libstdc++ puts char_traits in bits/char_traits.h, but we #include <string>. The insert-time logic in clang/Headers.h chooses how to spell the filename we emit here; this depends on the search path at the insertion site. (I think .inc files are conceptually similar and should also be handled and mentioned here, commented below) | |
33 | nit: \brief needed? I'm not sure this class actually collects anything - that's the handler returned by collectIWYUHeaderMaps. Maybe "Maps a definition location onto an #include file, based on a set of filename rules."? | |
45 | Maps all files matching \p RE to \p CanonicalPath? | |
51 | Just this comment is probably enough for the whole function... | |
53 | So I'm a bit concerned this is too narrow an interface, and we really want to deal with SourceLocation here which would give us the include stack. Evidence #1: .inc handling really is the same job, but because this class has a single-file interface, we have to push it into the caller. One compromise would be to pass in a stack<StringRef> or something. Efficiency doesn't really matter because the caller can cache based on the top element. | |
57 | do we need both tables? it seems the system headers (which are regex-based) will typically outnumber the IWYU pragmas by a bunch. | |
68 | Explicitly mention that the mappings are registered with *Includes? (interaction isn't totally obvious here because Includes has read and write methods) | |
clangd/index/SymbolCollector.h | ||
46 | const? |
- Addressed some review comments.
clangd/ClangdServer.cpp | ||
---|---|---|
316 | I see. Thanks! | |
clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp | ||
102 | This is created per TU now. In an earlier revision, this was one-per-program because we statically constructed a regex map and passed the map into the CanonicalIncludes via the constructor, like we do in include-fixer (https://github.com/llvm-mirror/clang-tools-extra/blob/master/include-fixer/find-all-symbols/STLPostfixHeaderMap.cpp#L15). But with the current design, I didn't do this because one-per-TU approach seems to be cleaner, and the regex construction time seems to be relatively small comparing to the time spent on actually compiling a TU. If we really want to squeeze performance here, we would probably need either an interface that takes a static regex mapping or one that takes pre-computed llvm::Regex. But I'm not really sure if it would be worth it since this is not a performance critical code path. | |
clangd/index/CanonicalIncludes.cpp | ||
75 | Switched both function and variable to "system headers" and updated the documentation. | |
clangd/index/CanonicalIncludes.h | ||
11 | Thanks for the suggestion!
I'm a bit inclined to not call out the relationship here as this doesn't seem to be a better place than the code where they interact, and the doc could easily be outdated if the interaction is changed from other libraries. | |
53 |
I think this would depend on how you define the scope of this class. .inc handling is a subtle case that I'm a bit hesitated to build into the interface here.
This is actually very similar to how the hardcoded mapping was generated. I had a tool that examined include stacks for a library (e.g. STL) and applied a similar heuristic - treat the last header in the include stack within the library boundary as the "exporting" public header for a leaf include header, if there is no other public header that has shorter distance to that include. For example, if we see a stack like stl/bits/internal.h -> stl/bits/another.h -> stl/public_1.h -> user_code.cc, we say public_1.h exports bits/internal.h and add a mapping from bits/internal.h$ to public_1.h. But if we see another (shorter) include stack like stl/bits/internal.h -> stl/public_2.h -> user_code.cc, we say stl/public_2.h exports stl/bits/internal.h. This heuristic works well for many cases. However, this may produce wrong mappings when an internal header is used by multiple public headers. For example, if we have two include stacks with the same length e.g. bits/internal.h -> public_1.h -> user.cc and bits/inernal.h -> public_2.h -> user.cc, the result mapping would depend on the order in which we see these two stacks; thus, we had to do some manual adjustment to make sure bits/internal.h is mapped to the correct header according to the standard. I am happy to discuss better solution here. But within the scope of this patch, I'd prefer to stick to interfaces that work well for the current working solution instead of designing for potential future solutions. I should be easy to iterate on the interfaces as these interfaces aren't going to be widely used in clangd after all. WDYT? | |
57 | Makes sense. |
LG apart from the .inc handling (happy to chat more)
clangd/index/CanonicalIncludes.h | ||
---|---|---|
53 |
Sure it's subtle, but it's clearly in the scope of determining what the canonical header is for a symbol, which is the job of this class. We wouldn't be building it into the interface - on the contrary, the *current* proposed interface codifies *not* handling .inc files. But you're right that we should check in something that handles most cases. My preference would be to drop .inc from this patch until we can incorporate it into the design, but I'm also OK with a FIXME to move it. | |
unittests/clangd/HeadersTests.cpp | ||
30 | This test would be clearer to me if you removed this helper and just did FS.Files["sub/bar.h"] = ... in the test. Can we change buildTestFS in TestFS.cpp to call getVirtualTestFilePath on relative paths to allow this? (I can do this as a followup if you like, but it seems like a trivial change) | |
unittests/clangd/SymbolCollectorTests.cpp | ||
50 | nit: this is only used once, as Not(HasIncludeHeader()). Just use IncludeHeader("")? | |
620 | Took me a while to understand this test, and still not sure I get it. Maybe "class string" here? |
unittests/clangd/HeadersTests.cpp | ||
---|---|---|
30 | I thought better of that change to TestFS, but did some renames in r325326. So this would be FS.Files[testPath("sub/bar.h")) = ... which still seems more transparent - up to you. |
Hi @ioeric:
Just to let you know that your submission seems to break a Windows test.
FAIL: Extra Tools Unit Tests :: clangd/Checking/./ClangdTests.exe/SymbolCollectorTest.IWYUPragma (15474 of 36599) C:\xxx\llvm\tools\clang\tools\extra\unittests\clangd\SymbolCollectorTests.cpp(620): error : Value of: Symbols [C:\xxx\build\check-all.vcxproj] Expected: has 1 element and that element (q name "Foo") and ((decl u r i "file:///C%3a/clangd-test/symbol.h") and (include header "\"the/good/header.h\"")) Actual: { Foo }, where the following matchers don't match any elements: matcher #0: (q name "Foo") and ((decl u r i "file:///C%3a/clangd-test/symbol.h") and (include header "\"the/good/header.h\"")) and where the following elements don't match any matchers: element #0: Foo [ FAILED ] SymbolCollectorTest.IWYUPragma (113 ms) [----------] 1 test from SymbolCollectorTest (113 ms total)
Thanks
The caveat from apply fix also applies here. Maybe we should combine the tail of this case and the tail of apply-fix (the part from WorkspaceEdit)? They're simple (and wrong) now, but will need the same changes.