Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Address review comments. Make diagnostics symref-centered and attach them to each reference.
Thanks for the extensive explanations! See the updated version, it should also take the discussions on the document into account.
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
697 | Oh yeah makes sense. | |
698 | Ok. | |
706 | Ok, got it. We only use the include conversion for matching, and matching seems not to use line numbers. I guess it's the reason nothing fails. | |
708 | Ok I will skip unresolved includes. But I am not sure I fully understand. We do the following:
How does it matter in this scenario if the include is resolved? AFAIU as long as the header is spelled in the main file + it's matched with a symbol provider, we should say that the symbol reference is satisfied. Otherwise, it seems like we'll say that the header is missing, although it's there in the main file and unresolved. I don't know if this is in any way a realistic scenario. I am just approaching it with general logic, and in this sense having more "satisfied" symbols seems better than having less => leads to less false positives. It can lead to false negatives, too, but AFAIU false negatives are much less of a risk for missing include management. | |
716 | Oh this is cool. Didn't realize we can do straight from a resolved include to the ID. Thanks. | |
720 | Sure. The new design does this, as well as skipping the header name. | |
734 | Ok, sure. What's a PP-disabled region? Are you talking about #ifdef's and such? | |
737 | Thank you for the great explanation! | |
742 | Yes, storing per Symbol totally makes sense. Let's discuss specifics in the corresponding document. | |
748 | Yes, this comment goes along the lines of the design discussion we are having at the moment.
this is AFAIU in conflict with the suggestion that the diagnostic should only be attached to the first reference. | |
748 |
I was primarily avoiding clang::Range since it requires llvm::Code to build ranges, and didn't want computeIncludeCleanerFindings to depend on anything but the AST. But syntax::FileRange sounds good. | |
759 | sure. | |
775 | Ok, great. Didn't know it was Ok to test std::optional directly in an if clause. |
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
370 | unfortunately getFile returns an llvm::Expected which requires explicit error handling (or it'll trigger a crash). you can simply elog the issue: if (!FE) { elog("IncludeCleaner: Failed to get an entry for resolved path {0}: {1}", Inc.Resolved, FE.takeError()); continue; } | |
683 | this should also be either static or put into anon namespace | |
704 | generateMissingIncludeDiagnostics should also be either static or put into anon namespace | |
705 | i think it's better to use an llvm::ArrayRef instead of llvm::SmallVector for diag infos here, we don't need to copy. | |
708 |
It doesn't matter at a high-level view. But since the implementation recognizes headers based on the HeaderID and it's only defined for resolved includes, if we were to have any unresolved include matches somehow (e.g. it has spelling "foo/bar.h", but is unresolved, and we do a match based on spelling because some IWYU pragma pointed at this header), we would hit the assertion around HeaderID always having value. | |
709 | nit: Result.push_back(std::move(D)), as D has lots of strings in it, it might be expensive to copy it around. Even better if you construct the Diag in place via Diag &D = Result.emplace_back(), you can achieve this by moving all the logic that might bail out (e.g. replacement generation) to the top of the loop, and start forming the diagnostic only after we're sure that there'll be one. | |
713 | we've got Config::Diagnostics::Includes::IgnoreHeader that disables include-cleaner analysis on headers that match a pattern. we should be respecting those config options here too. | |
720 | i feel like there's actually value in keeping the header name around, i.e. the user will have some idea about the action, without triggering an extra interaction. this helps especially in cases where the finding is wrong, they'll discover this sooner, hence we'll be more likely to receive bug reports. but don't really have a strong preference, so feel free to keep it that way. | |
732 | could you add a comment here, as this is subtle, something like We might suggest insertion of an existing include in edge cases, e.g. include is present in a PP-disabled region, or spelling of the header turns out to be the same as one of the unresolved includes in the main file | |
734 |
Yes, I was trying to say "preprocessor disabled region". e.g. in a piece of code like: #if 0 #include "foo.h" #endif preprocessor won't actually trigger inclusion of "foo.h", but most of the heuristic parsers (most importantly the logic in HeaderIncludes) will treat this include as usual. | |
755 | this also needs to be static or put into anon namespace | |
755–763 | can you restore std::move? | |
756 | no need to copy the vector by taking a std::vector here, you can take an llvm::ArrayRef instead. | |
764 | s/AST.getSourceManager()/SM | |
766 | nit: it might be worth re-writing the following section as: std::vector<Diag> Result = generateUnusedIncludeDiagnostics(AST.tuPath(), Cfg.Diagnostics.UnusedIncludes == Strict ? computeUnusedIncludes(AST) : Findings.UnusedIncludes, Code); llvm::move(generateMissingIncludeDiagnostics(AST, MissingIncludes, Code), std::back_inserter(Result)); and move the checks like if (Cfg.Diagnostics.MissingIncludes == Config::IncludesPolicy::Strict && !Cfg.Diagnostics.Suppress.contains("missing-includes")) into the specific function, e.g. generateUnusedIncludeDiagnostics, as they already do some of the diagnostic filtering logic. | |
771 | can you introduce a trace::Span wrapping the call to walkUsed with name IncludeCleanerAnalysis so that we can collect some stats about latency here? | |
783 | nit: drop either * or & (preferably &), having a reference vs a pointer doesn't make any differences performance wise, but creates a confusion (as we don't realy need a reference to a pointer here) | |
793 | nit: we prefer early exits to extra nesting, e.g. rewriting this as: if (Satisfied || Providers.empty() || Ref.RT != Explicit) continue; const auto &TB = AST.getTokens(); auto SpelledTokens = TB.spelledForExpanded(...); if (!SpelledTokens) continue; ... increases readability by:
| |
797 | nit: auto Range = syntax::Token::range(SM, SpelledForExpanded->front(), SpelledForExpanded->back()); | |
800–803 | you don't need to explicitly copy Providers into ProviderHeaders, you can pass it directly to DiagInfo below. | |
812 | we use llvm casts, specifically llvm::dyn_cast<NamedDecl*>(&Ref.Target.declaration())->getQualifiedNameAsString() | |
813 | getQualifiedNameAsString is going to print names that are really ugly at certain times, but unfortunately that's a problem we don't have a great solution to. so no action needed ATM, but we might want to switch between qualified and unqualified name depending on the length at the very least (e.g. symbol is coming from a templated class, which has a nasty nested instantiation). | |
817 | nit: MissingIncludes.emplace_back(std::move(SymbolName), Range, Providers); | |
823 | nit: you'd want std::moves here, around both of them | |
clang-tools-extra/clangd/IncludeCleaner.h | ||
40 | it seems unfortunate that we're duplicating these strings for each diag we want to emit. it might be better to just store a Symbol here (similar to Header) and delay spelling until needed. | |
45 | nit: std::tie(SymbolName, SymRefRange, Providers) == std::tie(Other.SymbolName, ...); I'd also put the SymbolName match to be the last, as it's a string match and might be more costly (if we can bail out early) | |
52 | I don't think we've much to gain by using SmallVector here, instead of std::vector | |
clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp | ||
451 | i don't think there's much value in testing out analysis here, we should rather focus on diagnostics generation, which isn't part of computeIncludeCleanerFindings. existing tests were focused on analysis, because legacy implementation for include-cleaner was actually performing these analysis itself. so I'd rather suggest having trivial test cases (from include-cleaner analysis perspective, no need for complicated directory/file layouts) and rather test things out through calls to generateMissingIncludeDiagnostics to make sure diagnostics has the right ranges, text and fix contents. right now we're not testing:
and these are the main functionality we're adding on top of include-cleaner analysis. you can take a look at the tests in llvm/llvm-project/clang-tools-extra/clangd/unittests/DiagnosticsTests.cpp to see how we're testing out diagnostics ranges, messages, fixes and what kind of helpers/matchers we have for them. |
Thanks for the comments! I should have addressed everything apart from testing.
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
370 | It returns llvm::ErrorOr, if I am not mistaken. There was explicit error handling already (if (!FE) continue below), just without the elog. Are you trying to say it will crash without the logging? Not sure that's feasible :) | |
683 | sure, thanks. | |
704 | Well then generateUnusedIncludeDiagnostics too, I guess. | |
708 | Ah ok, makes sense. | |
709 | Sure. Doing the in-place construction now. | |
713 | Ok, added that. Please have a look. At the moment it's trying to run filters on the output of the spellHeader method. | |
720 | Hm I'd expect we'll actually have a higher probability to receive a bug report if the user clicks on the "Quick fix" and gets a wrong header included, because that's annoying :) But this is a pure speculation ofc. | |
732 | Ok sure. | |
734 | thanks. | |
755 | Ah didn't realize that you also left a comment here when replying to the identical comment on generateMissingIncludeDiagnostics. Should be done. | |
755–763 | I'd rather do the emplacing, so that it's the same as in the generateMissingIncludeDiagnostics. | |
756 | Oh this isn't even my code, but as long as it's a small change, sure :) | |
766 |
This code seems to ignore the option Config::IncludesPolicy::None. It's saying to either return the old-style clangd results in case of Strict or include-cleaner results otherwise (incl. in case of None). Am I missing something?
Ok, moved into generateMissingIncludeDiagnostics. | |
771 | Sure. | |
813 | Ok so for now no action, IIUC. | |
817 | This does not compile. Seems like it needs a certain type of constructor to be present in MissingIncludeDiagInfo, whereas atm it's just a struct. | |
clang-tools-extra/clangd/IncludeCleaner.h | ||
40 | I'm not sure I can see your point re delaying spelling until needed. Whether we resolve the Symbol to the SymbolName during analysis (i.e., walkUsed) or in diagnostic generation (i.e., generateMissingIncludeDiagnostics), does not change the fact that the same symbol will be resolved to its name multiple times. As discussed in the doc, this results in simpler code than the version that maps a single Symbol object to multiple Ranges and Providers. | |
45 | Ok, sure. I can see your point with the string comparisons. But I don't fully see the point of using std::tie here. What is so great about creating an extra object, even if it only stores references? Is it a purely stylistic suggestion? | |
52 | Sure. I'm not so clear on the preferences yet. AFAIU your point, stdlib is to be preferred unless the llvm alternative is clearly beneficial. Is that the case? |
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
291 | can you also change the logic here to use isFilteredByConfig (we need the native call inside isFilteredByConfig as well to make sure it works on windows) | |
370 |
Ah you're right, I confused it with getFileRef. So in theory ErrorOr doesn't require explicit checking hence it won't trigger a crash if destroyed while containing an error.
Right, it isn't the logging that'll prevent the crash but rather a combination of the call to takeError and the way elog consumes Error objects. But it isn't relevant here, as ErrorOr doesn't require mandatory handling. | |
384 | nit: you can directly define SpelledHeader at line 377 | |
419 | this shouldn't be spelling, it should be the resolved path of the include. | |
444 | can you prefix this with IncludeCleaner: and rather say not diagnosing missing include {0}, filtered by config to add a bit more context about what specific interaction the log is coming from | |
472 | nit: auto &F = D.Fixes.emplace_back(); F.Message = ...; F.Edits.push_back(replacementToEdit(...)); | |
476 | nit: it'd put this next to D.File above | |
720 |
Well, it's unclear when we'll get there, and moreover my suggestion was actually to still mention the header name when there's only a single provider (which will be the case most of the time).
But yeah, my point of view is also mostly speculation. So feel free to keep it this way, I'll just be grumpy :P | |
749 | auto Range = syntax::Token::range(SM, SpelledForExpanded->front(), SpelledForExpanded->back()); | |
751 | as mentioned elsewhere, i think we should delay this symbol name spelling to diagnostic generation. to make sure core analysis we perform don't do work that might not get re-used (e.g. if we're not going to diagnose missing includes, or in the future when we don't care about spelling of all the symbols) | |
756 | well, this is our code in the end :D | |
766 | nit: I think logically it makes more sense for us to return set of Used includes here, and let the interaction that issues unused include diagnostics to derive this information from the set of used includes, and change the the missingincludes to a vector< tuple<Symbol, Ref, Providers> > (not only the unsatisfied ones) would represent the analysis better and make it more usable in the future (i.e. when we want to augment Hover responses, we can't re-use all the logic in here, we really need to implement another call to walkUsed because the analysis we get out of this call won't contain information for satisfied symbols. no need to do it now though, we can perform that kind of refactoring as we're adding the features too (or maybe it'll actually look neater to just have another call in those features rather than try and re-use the logic here) | |
766 |
Well that was to be addressed by second part of the comment And move the checks like if (Cfg.Diagnostics.MissingIncludes == Config::IncludesPolicy::Strict && !Cfg.Diagnostics.Suppress.contains("missing-includes")) into the specific function, e.g. generateUnusedIncludeDiagnostics, as they already do some of the diagnostic filtering logic. | |
823 | oops, i forgot to put the surrounding {} it should've been MissingIncludes.emplace_back({...}); | |
clang-tools-extra/clangd/IncludeCleaner.h | ||
40 |
we might not generate those diagnostics always, e.g. missing-includes is disabled, but unused-includes is on, or maybe we're going to use these results for something else like Hover responses.
I wasn't trying to suggest having a map here, I was suggesting just storing a Symbol S instead of string Name.
Well, generating strings are usually expensive (not asymptotically but in practice, as they tend to require lots of memory allocations). | |
45 | right, the comments with nit: prefix are usually things that won't matter much in practice, but reflects my (well, in the general case the reviewer's) or codebase's preference. | |
52 |
Right. llvm::SmallVector and std::vector have different use cases, we usually go for the former if we're sure that number of elements we want to store are going to be handful (literally less than 10) most of the time and the size of the objects themselves is not too big. As SmallVector chooses to store objects internally (until it grows too much), rather than allocating a bunch of memory elsewhere and just storing a pointer (as std::vector does). This has benefits when your vector isn't going to grow beyond smallvector's limits (you don't need to pay for memory allocations, which are expensive), but has other costs (e.g. moving a std::vector is trivial as it's just assignment of a pointer and size, but moving a smallvector might not be as trivial (it at least needs to move all the elements) also the initial memory cost for smallvector is higher than std::vector (as it takes up the same space independent of it's fullness). So in this example, we're unlikely to have a small number of MissingIncludes, MissingIncludeDiagInfo is a big enough struct (vector and filerange add up to more than 30 bytes). Hence std::vector feels like the better choice. |
Thanks for the comments!
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
370 | thanks for the explanation. | |
419 | Ok thanks. | |
756 | Sorry, wrong wording. I meant to say that this is not the code that has been touched in this patch. It might sometimes get annoying when comments on the patch dig too deep into code that's not in the diff. | |
766 | Ok, I've refactored more of the config checking logic inside generate.. functions. | |
823 | No this seems to be even more wrong. This is for the version with braces. And this is for no braces: /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/new_allocator.h:175:23: error: no matching constructor for initialization of 'clang::clangd::MissingIncludeDiagInfo' { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); } It seems that it just doesn't cooperate with structs. | |
clang-tools-extra/clangd/IncludeCleaner.h | ||
52 | thanks! | |
clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp | ||
451 | Thank you, this makes sense. However, I believe we need to use issueIncludeCleanerDiagnosticsrather than generateMissingIncludeDiagnostics, since the latter is private. |
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
751 | Ok should be done now. | |
766 | Thanks. This might very well be the case, but this comment also seems to suggest some premature optimization (in a way). It totally makes sense to re-use what's re-usable, but this sort of refactoring really only makes sense once we have a clear use case (and get there :) | |
clang-tools-extra/clangd/IncludeCleaner.h | ||
40 | Ok, agreed. Storing symbols now. Having only unused include analysis on is a convincing use case. |
thanks! looks amazing, we're missing a little bit of test coverage though
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
284 | s/HeaderSpelling/HeaderPath | |
286 | s/Path/NormalizedPath | |
418 | what about just resolvedPath, if you'd rather keep the verb, i think get makes more sense than find. we're not really searching anything. | |
422 | nit: you can directly return SymProvider.physical()->tryGetRealPathName(); (same for other 2 cases) and have an llvm_unreachable("Unknown symbol kind"); after the switch statement. | |
425 | in this and the next case we need to trim <>" | |
434 | same as above, either just symbolName or get | |
438 | again you can just return here and below | |
438 | getName is a StringRef, and unfortunately there are some platforms (like darwin) that don't support implicit conversion from stringrefs to std::string. so can you call .str() explicitly in the end? | |
760 | i think for now this should be if (Cfg.Diagnostics.MissingIncludes == Config::IncludesPolicy::Strict || Cfg.Diagnostics.UnusedIncludes == Config::IncludesPolicy::Experiment) { otherwise we'll run both legacy and new analysis for UnusedIncludes == Strict | |
clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp | ||
450 | this is pointing at the declaration inside b.h not to the reference inside the main file. are you sure this test passes? | |
470 | can you also add a reference (and declaration) for std::vector, and have an IWYU private pragma in one of the headers to test code paths that spell verbatim and standard headers? also having some diagnostic suppressed via IgnoreHeaders is important to check | |
482 | can you make one of these names qualified? e.g. namespace ns { struct Bar { void f(); }; } |
Thank you for all the thoughtful comments!
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
418 | Ok, let's call it get. I do prefer verbs for methods, that's correct. | |
clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp | ||
450 | Yes, all the tests pass. | |
470 | Thank you for the great tips on improving test coverage! In fact, I had to also introduce support for private pragmas, as they were not taken care of. Hopefully, the solution will make sense to you. |
thanks, looks great!
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
453 | we should respect the style configurations (sorry for missing this in previous iterations). you can get the relevant style with: clang::format::getStyle, which has an IncludeStyle. in case the getStyle fails, we should fallback to clang::format::getLLVMStyle as we do in other places. you can get at the relevant VFS instance through sourcemanager. | |
731 | you can directly use !Pragmas->isPrivate(Inc->Resolved) here, instead of getpublic | |
731 | this check seems to be new. what's the reason for rejecting private providers? I can see that we might want to be conservative by not inserting private providers, but treating symbols as unsatisfied when a private provider is already included doesn't feel right. e.g. the code being analyzed might be allowed to depend on this private header, because it's also part of the library, or it's the public header that's exposing this private header. in such a scenario we shouldn't try to insert the public header again. is there a more concrete issue this code is trying to address? | |
clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp | ||
445 | nit: braces | |
450 | this is passing because bool BDeclFound; is uninitialized above, if you set it to bool BDeclFound = false; you should see the test fail. there's no declaration for b inside the main file, it's declared in b.h and referenced inside the main file. you still need to search for the decl (without the constraint of being written in main file), use it to build an include_cleaner::Symbol, and use a clangd::Annotation range for the range of the reference. it might be easer to write this as: const NamedDecl* B = nullptr; for (...) { ... B = D; } ASSERT_TRUE(B); // build expected diagnostic info based on B and check that it's equal to what we've produced | |
458 | i think the example for std::vector is solid, and IWYU pragma private needs a little adjustment. | |
471 | we should include private.h through some indirection (not public.h) to check IWYU pragma private spellings are respected. | |
477 | name this range as bar instead of d? | |
481 | could you add a comment here saying this shouldn't be diagnosed? |
Thanks for the comments!
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
731 | Ok makes sense. No, I guess I was just confused, because I understood that you wanted a test that includes "private.h" with a diagnostic generated saying that "public.h" should be included instead, so I assumed that was expected behaviour. But that's not what you meant, so I misunderstood. | |
clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp | ||
450 | Didn't know there was a difference between uninitialized and false.. Thanks for the idea with ASSERT_TRUE(Decl). Please check out the new version. |
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
456 | creating a copy of LLVM style unnecessarily all the time is not really great, can you move this into the failure case instead? also you can drop the clang:: here and elsewhere, as this code is already part of clang:: namespace. | |
457 | as mentioned above we also need to make sure we're passing the relevant VFS instance inside the source manager, rather than using the real file system (as some clients rely on the VFS). | |
458 | s/MainFile->getName()/AST.tuPath()/ to be consistent with other places. | |
460 | can you also elog this error? as it should be rare and when this goes wrong, having this mentioned in the logs are really useful for debugging (since the failure is actually outside of clangd, it usually means a malformed config file somewhere) | |
clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp | ||
428 | nit: instead of using a point, can you use a range here instead (i.e. [[b]])? afterwards you can have a FileRange pointing at both offsets, rather than relying on the length of the identifier. | |
448 | rest of the code here doesn't really belong to the for loop, can you take them out? |
thanks for bearing with me, let's ship it!
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
456 | nit: this could be shorter with auto FileStyle = format::getStyle(..); if (!FileStyle) { elog("..."); FileStyle = format::getLLVMStyle(); } tooling::HeaderIncludes HeaderIncludes(AST.tuPath(), Code, FileStyle->IncludeStyle); | |
clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp | ||
450 | nit: size_t Start = llvm::cantFail(positionToOffset(MainFile.code(), Range.start)); size_t End = llvm::cantFail(positionToOffset(MainFile.code(), Range.end)); no need for EXPECT_FALSE(..takeError())s as llvm::cantFail will fail (no pun intended :P), static_casts are also redundant | |
460 | it'd be better to ASSERT_TRUE(BDecl); right after the for loop, as rest of the code will crash (and even trigger undefined behavior because we're dereferencing nullptr in failure case). difference between ASSERT_X and EXPECT_X macros are, the former will stop execution of the particular test (hence we'll never trigger a nullptr deref with ASSERT_TRUE), whereas the latter just prints the failure, but doesn't abort the execution of test (hence helps print multiple failures at once, when they're non fatal). |
This change broke regression, none-one read: "This revision was landed with ongoing or failed builds."
./ClangdTests.exe/IncludeCleaner/GenerateMissingHeaderDiags tests fails on windows
This breaks tests on windows: http://45.33.8.238/win/75486/step_9.txt
Please take a look and revert for now if it takes a while to fix.
Oh, that was reported a while ago already. Reverted in 2eb5ac99a76dbbf8ac68c538211fabeaa5ac0bfd for now.
clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp | ||
---|---|---|
464 | Looks like this filter doesn't work on windows (the / vs \ path separator might be the root cause here), I think a fix can be
|
rather than duplicating, what about renaming UnusedIncludesPolicy to IncludesPolicy and use it for both UnusedIncludes and MissingIncludes options below?