This is an archive of the discontinued LLVM Phabricator instance.

[clangd] IncludeCleaner: Add filtering mechanism
ClosedPublic

Authored by kbobyrev on Apr 11 2022, 1:42 AM.

Details

Summary

This introduces filtering out inclusions based on the resolved path. This
mechanism will be important for disabling warnings for headers that we can not
diagnose correctly yet.

Diff Detail

Event Timeline

kbobyrev created this revision.Apr 11 2022, 1:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 11 2022, 1:42 AM
kbobyrev requested review of this revision.Apr 11 2022, 1:42 AM
sammccall added inline comments.Apr 11 2022, 7:49 AM
clang-tools-extra/clangd/Config.h
106

Config is supposed to be cheap to create.
The regex should be created during compilation and shared across all usages.

Maybe store criteria here as vector<function<bool(StringRef)>> here and have each function capture a shared_ptr<vector<Regex>> by value?

clang-tools-extra/clangd/ConfigCompile.cpp
526

nit: invalid, not incorrect (incorrect suggests it's a regular expression that expresses the wrong condition)

531

you're capturing a reference to a local variable

533

Each time the resulting fragment is evaluated to create a config, we're going to move out of CompiledRegex. After the first time, there's going to be nothing there to move

(apart from the local-variable thing)

clang-tools-extra/clangd/ConfigFragment.h
237

... as unused or missing.

Second sentence is a bit technical.
"These can match any suffix of the header file in question."?

clang-tools-extra/clangd/IncludeCleaner.cpp
267

what are we doing about slashes?
As written, I think we're matching the regex against native paths, which doesn't really make sense for checked-in configs.

268

filterred -> filtered

not enough context in this dlog to be useful I think: filtered out from what?

Cool stuff! Chiming in randomly here, but I've been keeping a loose eye on the progress of IncludeCleaner.

I've been using IncludeCleaner to power include cleanup in the Chromium code base for the past few months, and have built Python scripts around it so I can more efficiently use it at a code base level. You can find all that fun stuff on my repo: https://github.com/dsanders11/chromium-include-cleanup

In my usages, I've found there are three categories of ignores I need (you can see my ongoing list of them in the config file: https://github.com/dsanders11/chromium-include-cleanup/blob/main/configs/chromium.json):

  • Skip over a file entirely and don't provide any diagnostics for it. This is useful to ignore umbrella headers, without needing to ignore the headers included in that umbrella header.
  • Ignore diagnostics for a particular header, which is what is implemented here so far.
  • Ignore an include edge (filename, header). This is useful for cases where it's a spurious diagnostic, like when the usage is inside of a macro definition, so IncludeCleaner thinks it is unused but it is not.

This isn't entirely applicable here, but could be insightful: I initially I had my filtering mechanism in the script which ran clangd over the source files, but I found that it was better for my needs to decouple the filtering from the generating the diagnostics. Main reason being that generating the diagnostics for all 80k files in the Chromium code base takes hours to run, so if I found some new ignores I wanted to add, I'd have to do another 7 hour run (on the hardware I'm using), which wasn't efficient. So I let the many script output all diagnostics, and then I filter after the fact, letting me tweak the filtering without needing to re-run the whole thing. So for my own needs I'll probably stick with doing the filtering after the fact, even if it's built-in. Not too applicable for IncludeCleaner usage through say VS Code, but there are different challenges when doing it at code base level.

kbobyrev updated this revision to Diff 422227.Apr 12 2022, 7:32 AM
kbobyrev marked 7 inline comments as done.

Resolve comments.

kbobyrev updated this revision to Diff 422228.Apr 12 2022, 7:33 AM

Fix the comment.

Cool stuff! Chiming in randomly here, but I've been keeping a loose eye on the progress of IncludeCleaner.

I've been using IncludeCleaner to power include cleanup in the Chromium code base for the past few months, and have built Python scripts around it so I can more efficiently use it at a code base level. You can find all that fun stuff on my repo: https://github.com/dsanders11/chromium-include-cleanup

In my usages, I've found there are three categories of ignores I need (you can see my ongoing list of them in the config file: https://github.com/dsanders11/chromium-include-cleanup/blob/main/configs/chromium.json):

  • Skip over a file entirely and don't provide any diagnostics for it. This is useful to ignore umbrella headers, without needing to ignore the headers included in that umbrella header.
  • Ignore diagnostics for a particular header, which is what is implemented here so far.
  • Ignore an include edge (filename, header). This is useful for cases where it's a spurious diagnostic, like when the usage is inside of a macro definition, so IncludeCleaner thinks it is unused but it is not.

This isn't entirely applicable here, but could be insightful: I initially I had my filtering mechanism in the script which ran clangd over the source files, but I found that it was better for my needs to decouple the filtering from the generating the diagnostics. Main reason being that generating the diagnostics for all 80k files in the Chromium code base takes hours to run, so if I found some new ignores I wanted to add, I'd have to do another 7 hour run (on the hardware I'm using), which wasn't efficient. So I let the many script output all diagnostics, and then I filter after the fact, letting me tweak the filtering without needing to re-run the whole thing. So for my own needs I'll probably stick with doing the filtering after the fact, even if it's built-in. Not too applicable for IncludeCleaner usage through say VS Code, but there are different challenges when doing it at code base level.

Hi David! This is really interesting, thank you very much for sharing! This is a really exciting use case and I'll look into the code that you have!

I think you might be interested in a more generic effort we're currently investing into: we want to make Include Cleaner a library and make it possible for the users to run it from Clang-Tidy and standalone tools (https://reviews.llvm.org/D121593). Is that something that would be potentially interesting to you?

Oops, sorry, I linked the wrong revision; here's the prototype: we plan to start rolling it out gradually https://reviews.llvm.org/D122677 and then have a common library that we could share between both clangd and Clang-Tidy (and potentially other tools).

Not to fix in this patch, but the move constructor of llvm::Regex is suspicious - there's no move assignment operator.
This putting them in a vector seems likely to copy them when the vector reallocates. Think you could fix this? :-)

clang-tools-extra/clangd/ConfigCompile.cpp
527

missing the error message

clang-tools-extra/clangd/IncludeCleaner.cpp
242

maybe just pass in config?

sammccall accepted this revision.Apr 19 2022, 1:59 AM

Oops, LG with comments

This revision is now accepted and ready to land.Apr 19 2022, 1:59 AM
kbobyrev updated this revision to Diff 423601.Apr 19 2022, 5:55 AM
kbobyrev marked an inline comment as done.

Resolve comments.

kbobyrev marked an inline comment as done.Apr 19 2022, 5:55 AM
This revision was landed with ongoing or failed builds.Apr 19 2022, 5:56 AM
This revision was automatically updated to reflect the committed changes.