Page MenuHomePhabricator

[clangd][WIP] Provide clang-include-cleaner
Needs ReviewPublic

Authored by serge-sans-paille on Mar 14 2022, 4:26 AM.

Details

Reviewers
sammccall
Summary

This is just a wrapper around clangd include cleaner's capability to be used in CLI mode.

I'm not sure why include-cleaner wasn't integrated in clang-tidy, but I suspect there's a good reason.
This revision is a support for discussing a potential tool that would leverage on include cleaner from clangd: it would be great to be able to run such a tool on LLVM database and make sure we don't regress.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2022, 4:26 AM
serge-sans-paille requested review of this revision.Mar 14 2022, 4:26 AM

Hi! I'm glad you're excited about IncludeCleaner and having a tool to try it out seems like a cool idea!

I'm not sure why include-cleaner wasn't integrated in clang-tidy, but I suspect there's a good reason.

The problem with IncludeCleaner as of right now is that it depends on some clangd internals. I've started reducing the dependency recently but there's still some way to go before it can really be used in Clang-Tidy or in external tooling. Detaching the most important primitives from clangd is also some work that is often orthogonal to just getting the feature done but I would agree that it is a good idea and a promising venue for future implementation efforts.

Here's what I did so far:

Apart from what I've mentioned, Clang-Tidy and clangd have different performance trade-offs and policies. In clangd we need to make everything really fast to keep the editor responsive; in Clang-Tidy we can allow expensive checks and disable them by default/provide guidance on how to do large-scale analysis. It's most likely possible to keep a large chunk of the implementation common and implement adapters for both clangd and Clang-Tidy, each of which will target the specific use-case; but that would require some thought, design and effort on making existing infrastructure commonly available (most likely under clang/Tooling where I moved stdlib handlers).

I've seen the discussion you have started (https://discourse.llvm.org/t/include-what-you-use-include-cleanup/5831). I'd like to read it thorough-fully and reply but I don't think that would happen until next week (at the earliest).

I left some comments but it's really a quasi-review, not a real one. Unfortunately, I'm mostly not working over the last three weeks for personal reasons, so I'm not as useful or up-to-date as I would usually be. @sammccall is a great person to talk to about these changes until I'm back but I'm expecting to start doing _some_ work reasonably soon and getting back to your discussion and this patch is one of the first things I'd like to do.

clang-tools-extra/clangd/CMakeLists.txt
183

I think it would be better to just put it into tool/.

clang-tools-extra/clangd/include-cleaner/CMakeLists.txt
6

nit:

clang-tools-extra/clangd/include-cleaner/ClangIncludeCleanerMain.cpp
1

Please use the LLVM style heading and explain what this file & tooling is about.

7

nit: naming should be Report; Clang-Tidy should catch this.

9

nit: here and elsewhere there are one-to-three letter variable names and it's hard to read the code without going to the definition; it should probably be documented and expanded for clarity.

14

Nit: I had a similar tool and it printed both used and unused includes which I found rather useful; maybe it would be good to include/add an option to report both for clarity.

You don't have to do this but maybe add a FIXME/mention it somewhere.

25

This seems to be unused.

31

Normally, we don't use plain argc and argv. LLVM has a nice Command Line library (llvm::cl), please use it instead. It provides a nice abstraction over the primitives and offers useful diagnostics.

Also, please provide a useful --help message through this library, this will be super nice for those without access to sources.

68

nit: return 0; at the end of int main() isn't necessary in C++

This is exciting! We weren't sure how much effort to put into decoupling this from clangd, whether there'd be interest in such a standalone tool. Some thoughts at a high level:


Eventually we should separate this from clangd. I don't think it necessarily needs to be right away, but we should avoid hopelessly entangling it.
In particular, I don't think we should be going through ClangdServer. As well as pulling in ~all of clangd, this is going to spawn threads, write PCHes to disk, and limit flexibility in how we process files.


I'm not sure why include-cleaner wasn't integrated in clang-tidy, but I suspect there's a good reason.

The TL;DR is I think it could be, and we should consider whether to do this instead of adding a separate tool.

Our initial goal was to get this functionality in clangd. While clangd has support for tidy checks, it's limited: the checks only see AST and PP events from the main file (for performance reasons).
This means if you implemented a clang-tidy check in clangd in the obvious way (collect #include information in PPCallbacks, query it while traversing) this check would not work in clangd.
Instead, we built it into clangd in a way that's aware of our preamble optimization - most of the PP events are reused across reparses.

However the capture-PP-info part is fairly simple, and the traverse-AST-and-gather-locations part is independent of clangd.
We could "easily" share this part as a library and build a clang-tidy check out of the same core.

Advantages to making this a clang-tidy check:

  • for users: fits into existing workflows and integrations
  • for maintainers: existing framework takes care of parsing ASTs, emitting diagnostics, applying fixes

Advantages to a separate tool:

  • more control over how code is processed, e.g. traversing headers exactly once
  • the check abstraction may not be a perfect fit, e.g. maybe produce a report instead of diagnostics, or treatment of alternate fixes
  • probably some minor efficiency gains as we're not really using matchers etc

Overall I suspect making this a tidy check would make it more useful to more people. It would mean sorting out the dependency issue sooner rather than later though.


I think it's important to have some spelled-out scope of what we want this tool to do and what workflows it's aiming for.

  • applying fixes vs reporting problems? (If applying, what to do when multiple fixes are possible)
  • adding vs removing includes, or both? (We do plan to support adding includes in clangd)
  • being conservative vs being complete (feeling safe to run this unattended is a big feature, c.f. IWYU)
  • do you run it over files or codebases? (important to consider how headers are treated)
  • assume "well-behaved" code, or does it try to understand everything?
  • does it make use of forward declarations, or just includes?

It's OK to make some of these configurable, but the answers are important to default behavior and also where effort is focused.

Hopefully many of the answers are aligned with what we want for clangd. If not that's OK, but we should plan for this rather than be surprised when it happens :-)

clang-tools-extra/clangd/CMakeLists.txt
183

LLVM's CMake rules make it hard to have multiple binaries in the same directory, so I think this is correct as-is.

clang-tools-extra/clangd/include-cleaner/CMakeLists.txt
1

Until this is a bit more ready for real use, I think this should be add_clang_executable so it doesn't get packaged

nridge added a subscriber: nridge.Mar 19 2022, 5:11 PM

This sounds really cool!

Overall I suspect making this a tidy check would make it more useful to more people.

Yup, it would be super useful addition.

sammccall added a comment.EditedMar 23 2022, 5:36 AM

We'd like to proceed here by extracting the include-cleaner functionality from clangd into a library.
It would be used by:

  • clangd
  • a new clang-tidy check
  • possibly a standalone tool, if clang-tidy doesn't fit well into the cleanup workflow
  • some out-of-tree cleanup tools internal to google (nothing that would interest upstream, just integrating with our source control/code review etc)

The design would be a little different than what's currently in clangd as the scope is larger: it would support suggesting insertions as well as removals, and some limited policy knobs.

Does this sound OK? I'll post a proposal on discourse shortly.