This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] New feature --skip-headers, part 1, setTraversalScope
AbandonedPublic

Authored by chh on Mar 16 2021, 7:28 AM.

Details

Summary

[clang-tidy] New feature --skip-headers, part 1, setTraversalScope

  • This depends on https://reviews.llvm.org/D104071.
  • This new feature has impacts similar to --header-filter and --system-headers.
    • Clang-tidy by default checks header files, reports the number of warnings found in header files but not those warning messages.
    • With --header-filter, clang-tidy shows warnings in selected header files, but not system header files.
    • With --system-headers, clang-tidy shows also warnings in system header files.
    • With --skip-headers, clang-tidy does not check header files, and does not know or count warnings in header files. However, if --header-filter or --system-headers are used, those header files will be checked and warnings be counted and displayed.
  • For users who do not need to know the number of not-shown warnings in header files, this feature saves maybe 60% to 80% of clang-tidy run-time, by skipping checks of header files. When building Android system with clang-tidy, repeated checks on commonly included header files in multiple compilation units took significant part of build time, which could be saved by this new feature.
  • Note that this is not the same as PCH, which usually saves parsing time, but not semantic checks or code generation time.
  • This part 1 implementation only affects the MatchFinder-based checks, not the PPCallback-based or the static analyzer checks. Follow up changes could make the other checks to skip header files and save more time.
  • Add a --show-all-warnings flag. It is hidden, not shown by --help. When it is used, ClangTidyDiagnosticConsumer::checkFilters will not suppress any diagnostic message. This is useful to find tidy checks that have not yet handled the --skip-headers flag.
  • Implementation Methods:
    • Depending on new changes to set/getTraversalScope in https://reviews.llvm.org/D104071, the AST was modified to exclude header file Decls during the call to MatchFinder's ASTConsumer's HandleTranslationUnit.
      • Note that this could impact a MatchFinder-based check, if it is performed on a non-header file, but needs some Decl in headers. Luckily, we haven't seen such a case. If there is such a case in the future, we might need some other method to skip checks without cutting out Decls in AST.
      • Note that this method does not apply to PPCallback-based checks, as many of those checks could give a warning on one file but depend on Decl or macros in other header files. Similarly static analysis checks usually depend on calls to functions in the header files. Not seeing header file Decl in AST could affect such checks on non-header files.
    • ClangTidyASTConsumer overrides two member functions of MultiplexConsumer and accesses the Consumers data member.
      • in HandleTopLevelDecl to collect filtered Decls.
      • in HandleTranslationUnit to call get/setTraversalScope before calling the Finder ASTConcumser, which calls matchAST.
    • DeclFilter::skipDecl checks/skips top level Decls.
    • Use simple cache of the last skipped/accepted FileID in DeclFilter to minimize the overhead of skipLocation.
  • Tests:
    • Add skip-headers*.cpp tests to show the effects of new flags. Some tests need to skip nested Decls and Stmts, even if their top-level Decls should not be skipped.
    • Add bugprone-forward-declaration-namespace-header.cpp test to show that a MatchFinder-based check needs to see all header Decls to issue valid warnings on the main file.
    • Add test cases into misc-unused-using-decls.cpp to show that a MatchFinder-based check needs to see all header Decls to avoid issuing invalid warnings on the main file.
    • Extend some clang-tidy checker tests to make sure that --skip-headers skips checks/warnings in the header files: google-namespaces.cpp modernize-pass-by-value-header.cpp
    • Add runs with --skip-headers in some clang-tidy checker tests to make sure that the option does not hide warnings in the main file:
      • about 18 other clang-tools-extra/test/clang-tidy/checkers/*.cpp
    • Some tests were added --skip-headers=0 flag to get expected warning messages in test header files: checkers/abseil-no-internal-dependencies.cpp checkers/abseil-upgrade-duration-conversions.cpp checkers/google-namespaces.cpp infrastructure/file-filter-symlinks.cpp infrastructure/file-filter.cpp infrastructure/line-filter.cpp

Diff Detail

Event Timeline

chh created this revision.Mar 16 2021, 7:28 AM
chh requested review of this revision.Mar 16 2021, 7:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2021, 7:28 AM
chh updated this revision to Diff 339066.Apr 20 2021, 6:13 PM

hide the --show-all-warnings flag from --help

chh updated this revision to Diff 351616.Jun 11 2021, 6:36 PM
chh retitled this revision from WIP; clang-tidy skip-headers; Alt#2; need more changes to [clang-tidy] Add --skip-headers, part 1.
chh edited the summary of this revision. (Show Details)

sync with latest clang/llvm

chh retitled this revision from [clang-tidy] Add --skip-headers, part 1 to [clang-tidy] Add --skip-headers, part 1 (use setTraversalScope).Jun 11 2021, 7:58 PM
chh added a comment.Jun 16 2021, 7:00 PM

Friendly ping reviewers. Now this change can pass all clang-tidy tests.
Please feel free to comment and/or add other reviewers. Thanks.

This comment was removed by sammccall.

Sorry, previous sent too soon. +hokein as I'll be out next week.

I think we should be careful about which layers the filtering logic is added to, and how much surface area it has.

The minimal thing seems to be adding the clang-tidy options, and modifying the AST-consumer bits of the clang-tidy driver to compute and set the traversal scope based on these options.
I don't think we should bake the "filter" concept into other layers in clang unless there's a strong reason we need to do so.
(As discussed offline, providing a "filter" API requires all the candidates to be enumerated, which can be an expensive operation in the presence of modules/PCH - that's why the model in common layers today is "optional list of top-level decls that should be traversed").

clang-tools-extra/clang-tidy/ClangTidyCheck.h
31 ↗(On Diff #351616)

The layering in clang-tidy is... not particularly great or clear, but there is a distinction between driver-y type code (the clang-tidy application) vs the checks themselves.

As I understand our plan, this filter is "just" a way to configure the traversal scope before running, i.e. what information the AST will expose to the checks.
That is, this is a driver concern and should not be visible to the checks - probably should go in ClangTidy.h or even just ClangTidy.cpp as part of the ClangTidyASTConsumer. My litmus test here is that we do have other drivers (at least clangd that I know of), and they shouldn't be seeing/reusing this mechanism, and so neither should checks.

(Similarly if we do something with PP in the future, hopefully this can *also* be transparent to checks)

clang/include/clang/ASTMatchers/ASTMatchFinder.h
148 ↗(On Diff #351616)

I don't think the filter belongs here.
The design of traversal scope is that it's a property of the AST that affects all traversals, so it shouldn't be a configuration property of particular traversals like ASTMatchFinder.

There's a question of what to do about MatchFinder::newASTConsumer(), which runs the finder immediately after an AST is created, and so covers the point at which we might set a scope. There are several good options, e.g.:

  • Don't provide any facility for setting traversal scope when newASTConsumer() is used - it's not commonly needed, and the ASTConsumer is trivial to reimplement where that's actually needed
  • Accept an optional function<void(ASTContext&)> which should run just before matching starts

This seems a bit subtle, but the difference between having this in MatchFinder proper vs just in newASTConsumer() is important I think, precisely because it's common to run the matchers directly on an existing AST.

chh updated this revision to Diff 353224.Jun 19 2021, 11:27 PM
chh marked an inline comment as done.
chh edited the summary of this revision. (Show Details)

Sync up; moved DeclFilter to ClangTidy.cpp.

chh added inline comments.Jun 19 2021, 11:29 PM
clang-tools-extra/clang-tidy/ClangTidyCheck.h
31 ↗(On Diff #351616)

Thanks. Moved this class to ClangTidy.cpp.

clang/include/clang/ASTMatchers/ASTMatchFinder.h
148 ↗(On Diff #351616)

I have some similar concerns too.

clangd calls MatchFinder::matchAST explicitly after setTraversalScope,
but clang-tidy uses MultiplexConsumer and MatchFinder::newASTConsumer
is just one of the two consumers.
(1) I am not sure if it would be safe or even work to make big changes in
ClangTidy.cpp's CreateASTConsumer to call MatchFinder::matchAST explicitly.
(2) Similarly, I wonder if setTraversalScope will work for both MatchFinder
and other consumers in the same MultiplexConsumer.

Could you check if my current change to MatchFinder::HandleTranslationUnit
is right, especially in the saving/restoring of traversal scope?

I am assuming that each consumer in a MultiplexConsumer will have its own
chance to HandleTranslationUnit and HandleTopLevelDecl.
If that's correct, it seems to me that changing those two handlers in
MatchFinder is right for clang-tidy. Then they will need the optional
MatchFinder::MatchFinderOptions::DeclFilter.

hokein added inline comments.Jun 23 2021, 7:39 AM
clang/include/clang/ASTMatchers/ASTMatchFinder.h
148 ↗(On Diff #351616)

but clang-tidy uses MultiplexConsumer and MatchFinder::newASTConsumer is just one of the two consumers.

Yeah, MultiplexConsumer is a separate interface in clang, and I don't think we have a strong reason to modify it.

However, we could refine the bit in clang-tidy -- clang-tidy uses MultiplexConsumer to dispatch all events to two consumers: the MatcherFinder, the clang's static analyzer, we can get rid of the MultiplexConsumer by implementing a new ASTConsumer, so that we have enough flexibility to affect all traversals without touching all clang areas, so the API will look like

class ClangTidyASTConsumer : public ASTConsumer {

public:
  void HandleTopLevelDecl(...) override {
     // collect all main file decl
  }
  void HandleTranslationUnit(ASTContext &Context) override {
    // set traversal scope.
    MatcherFinderConsumer->HandleTranslationUnit(Context);
    StaticAnalyzer->HandleTranslationUnit(Context);
  }
  
  // implement other necessary Handle* overrides, and dispatch to StaticAnalyzer consumers;

private:
  std::unique_ptr<ASTConsumer> MatcherFinderConsumer;
  std::unique_ptr<...> StaticAnalyzer;
  ... MainFileDecl;
};

Back from vacation!

clang-tools-extra/clang-tidy/ClangTidy.cpp
461

this is probably worth some explanation

472

this comment didn't match the next line.
Did you intend to check Location.isFile() instead?

475

maybe add a comment explaning why the expansion loc is chosen?

475

note that this means we're going to traverse up the macro expansion edges even if we're hitting the cache. You may want to cache under the original file ID instead?

477

caching all results in a DenseMap<FileID, bool> seems likely to perform better, no?

clang/include/clang/ASTMatchers/ASTMatchFinder.h
148 ↗(On Diff #351616)

MultiplexConsumer is just used as a helper to glue together two concrete ASTConsumers, not arbitrary consumers, so we can reason about whether it's safe by reading the code.

The static analyzer's consumer appears to already track top-level decls rather than traversing from the root, exactly to avoid traversing stuff from the preamble. So I would expect setTraversalContext to work fine there.

The approach Haojian suggests of avoiding MultiplexConsumer, and making the tight coupling to the particular consumers explicit, also seems fine.

If that's correct, it seems to me that changing those two handlers in MatchFinder is right for clang-tidy

This might be the most convenient thing for clang-tidy, but I don't think it's the right thing for MatchFinder, which is also important.

chh updated this revision to Diff 354737.Jun 27 2021, 2:59 AM
chh marked 7 inline comments as done.

update some comments and sync again with the latest clang/llvm source

chh added inline comments.Jun 27 2021, 3:01 AM
clang-tools-extra/clang-tidy/ClangTidy.cpp
472

I just wanted to make sure that we can get valid FileID.
Updated the comment.

475

This is to match the ClangTidyDiagnosticConsumer::checkFilters.
Do you think some other get*Location method should be used in both places?

475

I would rather explore various cache performance improvements in follow up CLs.
In my performance tests, current simple check of LastAcceptedFileID and LastSkippedFileID already has very high hit rate due to sequential check of Decls in the included header files. More caching improvements should produce relatively smaller difference, which will be shown/visible easier in their own small change CLs.

477

I am sure most calls hit the last FileID. For the missed calls, I was planning to add other caching mechanism in skipFileID function. That could be in follow up CLs with more accurate measurements to compare the overhead vs hit rate.

clang/include/clang/ASTMatchers/ASTMatchFinder.h
148 ↗(On Diff #351616)

I looked into Hokein's suggestion. Although in general correctly used inheritance is better, there are special cases that copying functionality from another class could be better. I wondered if it could work better by changing ClangTidyASTConsumer to behave like MultiplexConsumer without inheritance.

After looking into MultiplexConsumer's many methods, what are used and could be used in the future by MatchFinder's MatchASTConsumer and static analyzer's AnalysisConsumer, the dependency on MultiplexConsumer to pass through all Consumer methods is obvious. So, ClangTidyASTConsumer really needs to behave like MultiplexConsumer, duplicating all existing and future methods, to keep MatchFinder and AnalysisConsumer working. This looks like a real inheritance use case and difficult to make it better without inheritance.

If MatchFinder users really don't want an optional filter for HandleTranslationUnit and HandleTopLevelDecl, I think an alternative is to define a child class maybe called TidyMatchFinder with the extra features needed now only by clang-tidy. But, there is also advantage in adding these into MatchFinder now for future non-clang-tidy users. That will be similar to the current optional MatchFinder's CheckProfiling and the set/getTraversalScope. Those features/interfaces started with one user/tool and now we see more users/tools. If they were defined in a child class specific to clangd or some profiler, we will now need to rename or move those features to a common base class to share.

Please feel free to suggest any alternative, or let me know if you are okay with current optional filter in MatchFinder, or if you insist on defining a new child class like TidyMatchFinder.
Thanks.

sammccall added inline comments.Jun 28 2021, 8:23 AM
clang/include/clang/ASTMatchers/ASTMatchFinder.h
148 ↗(On Diff #351616)

TL;DR is I think this is a clang-tidy feature that should not be visible in the matcher layer in any way.
If it's going to touch ASTMatcher I think we should have a strong design reason or a very strong implementation reason. That it's kind of awkward to code and might one day be useful isn't sufficient.

There are four layers here: AST nodes themselves, RecursiveASTVisitor (which defines a tree via parent/child relationships), the matcher framework, and finally clang-tidy.
The mechanics of filtering using a list are solved in RecursiveASTVisitor.
There seem to be two issues here against solving the rest in the clang-tidy layer.


First is conceptual: is having filter APIs in the matcher framework good? I don't think so, because:

  • it's duplicative: it mirrors the setTraversalScope functionality and it's unclear when to use which. Having two ways to accomplish a rare task seems excessive.
  • it conflicts with the design of this layer: matchers are all about non-destructive reading, but setting traversal scope has side effects. So it's only really usable with the ASTConsumer, which is a corner of the interface.
  • it's not necessary: it's always possible to get at the underlying ASTContext wherever a MatchFinder will be used.

You suggest this might be good for "future non-clang-tidy users" but I don't think we should add it speculatively for this purpose. It seems at least as likely that such users will never materialize, or that the solution that is right for clang-tidy is not right for them.


Second is practical. That ClangTidyASTConsumerFactory::CreateASTConsumer needs to dispatch both to MatchFinder and the static analyzer's (non-trivial) ASTConsumer. We have to add a call to setTraversalScope somewhere, and none of the approaches are perfectly natural.

However there are several that are OK:

  • subclass MultiplexingASTConsumer to inject the call in HandleTranslationUnit
  • subclass MultiplexingASTConsumer and inject the call and also the matcher-based-checks logic in HandleTranslationUnit (so MAC delegates only to the static analyzer)
  • write an ASTConsumer without MultiplexingASTConsumer that manually forwards only the actually needed methods to the static analyzer consumer
  • write an unrelated ASTConsumer that forwards every method to the static analyzer consumer
  • refactor static analyzer to expose its consumer class, and subclass that (yuck!)

These do require making *some* assumptions about how these consumers work. But that's OK - there are two of them, not thousands, and these interfaces change rarely. We don't need to architect defensively against any possible future change, and that's not culturally how LLVM operates either. I'd lean towards e.g. the first option, which requires fewer assumptions.

chh updated this revision to Diff 355699.Jun 30 2021, 3:12 PM
chh edited the summary of this revision. (Show Details)

Move almost all changes to MatchFinder into ClangTidy.cpp.

chh added a comment.Jun 30 2021, 4:36 PM

I reproduced the ClangdTests::TargetDeclTest.Concept failure locally without this change,
using llvm git commit 0c400e8953. Will wait for upstream fix of the failed tests.

chh updated this revision to Diff 355827.Jul 1 2021, 4:12 AM

with clang-format suggested change

Thanks, this looks much closer.

Customizing the MatchFinder::matchAST() to set the scope is a neat try but I don't think the design really fits (and we have to abuse inheritance to make it work).
Ultimately you need to customize both near HandleTranslationUnit and near HandleTopLevelDecl, and especially the latter really is an ASTConsumer concern.
So implementing an ASTConsumer in some form seems necessary.

I reproduced the ClangdTests::TargetDeclTest.Concept failure locally without this change,

I'll have a look, can't reproduce it locally. Have you seen it on any buildbots?

clang-tools-extra/clang-tidy/ClangTidy.cpp
475

Do you think some other get*Location method should be used in both places?

It depends on what you want to achieve. Matching the check-filters behavior elsewhere makes sense.
(It does raise the question though why the code can't be shared, even if the line filter doesn't apply per se)

475

Is restoring the traversal scope strictly needed by the static analyzer? I would expect not, but I might be wrong.

clang/include/clang/ASTMatchers/ASTMatchFinder.h
144 ↗(On Diff #355827)

Matchfinder specifically says "not intended to be subclassed".

147 ↗(On Diff #355827)

This function doesn't make sense at this layer - calling it does nothing and the class is intended to be used directly, not to be subclassed. And MatchFinder itself doesn't "consume" top-level decls - that's an ASTConsumer concern.

I understand you use it here as a hook to listen for decls seen by the MatchFinder::newASTConsumer().
Better would be to pass a HandleTopLevelDecl callback to newASTConsumer() instead.
But given that MatchFinderASTConsumer is completely trivial and can be built on the public MatchFinder API, better still would just be to reimplement it with the unusual functionality ClangTidy needs.

I reproduced the ClangdTests::TargetDeclTest.Concept failure locally without this change,

I'll have a look, can't reproduce it locally. Have you seen it on any buildbots?

I can't find it on any buildbots or the premerge tests, and can't see it locally either at the commit you suggest nor with this patch applied.
So there might be something going on, but it seems unrelated for sure.

chh updated this revision to Diff 356307.Jul 2 2021, 6:39 PM
chh marked an inline comment as done.
chh edited the summary of this revision. (Show Details)
chh marked an inline comment as done and an inline comment as not done.Jul 2 2021, 6:58 PM

Last build tests were green. Let's see if the new updated diff still passes all the tests.

Looks like all choices have some risk/overhead in future maintenance and we need to pick one with the least cost.
I think cloning classes is most expensive and risky to keep future compatibility.
Adding a new subclass or optional members into existing class are more conventional solutions.

clang-tools-extra/clang-tidy/ClangTidy.cpp
475

I think it is safer to assume low or acceptable overhead in get/setTraversalScope and keep the impact only to the MatchFinder consumer, rather than depending on static analyzer working now or in the future with the changed AST.

clang/include/clang/ASTMatchers/ASTMatchFinder.h
144 ↗(On Diff #355827)

Do you know the reason it cannot have subclass?
All our current implementation choices need to change its matchAST behavior, either by overriding it, or changing the AST before it, or skip Decl in the recursive walk of AST.
For any reason not to have subclass, changing AST outside matchAST seems riskier than changing inside matchAST in a child class.

147 ↗(On Diff #355827)

Please review the updated diff. Now HandleTopLevelDecl is all in clang-tidy.
For both the MultiplexConsumer and MatchFinder, I think it is more feasible to resolve all subclass issues now and maintain the inheritance relation than making clone implementations and trying to maintain the same behavior in the future.

chh updated this revision to Diff 356319.Jul 2 2021, 9:36 PM
chh marked an inline comment as done.

fix clang-format name style issue

chh updated this revision to Diff 356321.Jul 2 2021, 10:11 PM

fix clang-format coding style issue

chh updated this revision to Diff 356589.Jul 5 2021, 7:41 PM
chh edited the summary of this revision. (Show Details)

No more change to or derivation from MatchFinder, but override two member functions in MultiplexConsumer.

This generally looks good now, mostly nits left.

The exception is the remaining complexity around stashing and restoring the traversal scope state.
If it's needed, it points to a deeper problem in static analyzer and we should understand it a bit (and whether it must be fixed as it will undo the performance characteristics of this feature in some circumstances).

clang-tools-extra/clang-tidy/ClangTidy.cpp
310

Ctx is ultimately unused, just take the regex instead?

346

please find a different name for this function or merge the two together, having two different functions differing only by case is confusing

348

this function isn't needed - it's a trivial accessor called once in a place where we can access the field directly.
Similarly with hasFilter. The code accessing the fields directly reads just as clearly.

428

can you move the impl of the members next to the class?

475

I think it is safer to assume low or acceptable overhead in get/setTraversalScope

We have reasons to believe this is not cheap. The first usage of getParent() after changing traversal scope incurs a full AST traversal.

rather than depending on static analyzer working now or in the future with the changed AST.

There are a fairly small number of reasons that setTraversalScope could break static analyzer at this point, and I think they're all going to cause the static analyzer to be slow when used with PCHes. However the static analyzer's implementation and comments say that it goes to effort to be fast with PCHes. So I do believe this is a more stable assumption, and in the long run we may be able to simplify its implementation.

514

This adds significant conceptual complexity and possible runtime costs to guard against the possibility that the static analyzer would not be compatible with simply setting the traversal scope.

Please do not do this. It may appear that you're cheaply eliminating a class of bugs, but complexity is not cheap, and in the long run can also introduce bugs. Instead investigate whether this is a real problem and why.

If it's not then Context.setTraversalScope() + MultiplexingASTConsumer::HandleTranslationUnit is enough here, and FinderASTConsumerPtr can go away.

608

move SharedDeclFilter

clang-tools-extra/clang-tidy/ClangTidyOptions.cpp
99

Maybe an explicit comment that ShowAllHeaders, SkipHeaders are runtime/optimization options that are configured at the command-line only, and therefore not mapped here

clang-tools-extra/clang-tidy/ClangTidyOptions.h
79

This overrides (and is therefore grouped with) HeaderFilterRegex, SystemHeaders.
So the order should probably be
HeaderFilterRegex
SystemHeaders
ShowAllWarnings
SkipHeaders

Comment should comment should explicitly say both "This overrides HeaderFilterRegex, ..." and "This is an option intended for testing/debugging clang-tidy".

84

This is easy to confuse with e.g. HeaderFilterRegex and SystemHeaders, which control policy rather than just implementation strategy.
I'd suggest calling this something like PruneTraversal which hints at the implementation strategy it controls but most importantly that it's *not* about whether you want warnings from the headers or not.

(similarly for the flag)

clang/include/clang/ASTMatchers/ASTMatchFinder.h
144 ↗(On Diff #355827)

Do you know the reason it cannot have subclass?

Because widely-used classes need to decide whether to accept the maintenance cost of supporting inheritance or not, and the author judged that the (conceptual) complexity of supporting it outweighs the things enabled by doing so.

chh updated this revision to Diff 357824.Jul 11 2021, 6:53 PM
chh marked 7 inline comments as done.

sync up latest clang/llvm source; more format/style changes

chh added inline comments.Jul 11 2021, 6:56 PM
clang-tools-extra/clang-tidy/ClangTidy.cpp
310

Context is used in skipFileID.

475

Okay, won't assume low overhead in set/getTraversalScope.
So far their impact is limited to skip-headers, which in this step is limited to only MatchFinder. Please see also my reply in the other comment.

514

For this first step implementation of skip-headers, could we limit the risk and impact to only AST MatchFinder based checks? If it works fine, we can add more performance improvements, handle PPCallback-based checks, and static analyzer checks. We can turn skip-headers to default or revert any follow up step.

At this time, we only know that PPCallback-based checks can save some more time with skip-headers, but do not know if any static analyzer check can benefit from skip-headers easily. In other words, we are sure to deliver big saving of runtime for clang-tidy users that do not enable static analyzer checks, but not sure about static analyzer check users.

The overhead and complexity of set/getTraversalScope and checking filters will be on the users of skip-header, but not on the static analyzer checkers.

If we apply the same change to AST for all consumers, the saving of code complexity is smaller than the risk of impact to static analyzer checks, and the saving of runtime is also much smaller than the already saved time in MatchFinder.

clang-tools-extra/clang-tidy/ClangTidyOptions.cpp
99

Why SystemHeaders is not mapped?
I agree that ShowAllWarnings is not important here,
but SkipHeaders seems as important to be included here as HeaderFilterRegex and SystemHeaders.

clang-tools-extra/clang-tidy/ClangTidyOptions.h
84

Are you suggesting a change of flag name from --skip-headers to --prune-traversal and internal option name from SkipHeaders to PruneTraversal?

So far, "traversal" seems to be an implementation term not yet used in clang-tidy flags. But there is TraversalKind used in some clang-tidy checks. They seem to mean different things from HeaderFilterRegex, and SystemHeaders.

The confusion exists between HeaderFilterRegex and SystemHeaders. Both should really do skip-check-of-headers. Their "implementation" is like a short-cut to only suppress output but not skipping the checks. When skip-headers is enabled and maybe become the default, both HeaderFilterRegex and SystemHeaders will make more sense.

sammccall added inline comments.Jul 12 2021, 2:22 AM
clang-tools-extra/clang-tidy/ClangTidy.cpp
310

Oops, I did miss this. Since we extract the header regex, can we also pull out the one boolean that we check later into a field? This would make it clearer how the context is used, since it's a large object.

514

OK, fair enough. However you seem to both be saying "let's do this for now" and "if traversal scope is unimportant for static-analyzer performance, we need never change this".
The main point of applying traversal scope uniformly is to avoid complexity and special cases, not to improve performance.

If we want to have this fairly unusual mechanism and require maintainers to understand it, then needs to be mitigated by a significant comment somewhere explaining:

  • what is being done (wrapping HandleTranslationUnitDecl in traversalscope load/restore logic for one consumer only)
  • how it's being done (keep an extra pointer to the "special" consumer so we can identify it)
  • why it's being done (ideally concrete problems in applying the scope everywhere, but at least why we don't expect this to be safe)

Coming up with a good answer to the "why" question is part of this change.

clang-tools-extra/clang-tidy/ClangTidyOptions.cpp
99

Why SystemHeaders is not mapped?

No idea, sorry.

but SkipHeaders seems as important to be included here as HeaderFilterRegex and SystemHeaders.

I don't think so.

HeaderFilterRegex and SystemHeaders control policy, and it's reasonable for that to depend on the specific code (and therefore be dynamically configurable).
SkipHeaders only controls the mechanism/implementation strategy - why do we need to support different strategies for different files in the same clang-tidy invocation, rather than just rolling it out with a flag?

Moreover, the plan AIUI is for SkipHeaders to become the default and for the option to go away. We should try to avoid supporting it in config files if that's the case.

clang-tools-extra/clang-tidy/ClangTidyOptions.h
84

Are you suggesting a change of flag name from --skip-headers to --prune-traversal and internal option name from SkipHeaders to PruneTraversal?

Yes, sorry I left the comment at the wrong spot.

So far, "traversal" seems to be an implementation term not yet used in clang-tidy flags.

That's right, because clang-tidy flags generally control behavior rather than implementation. But "skip headers" is different - it's intended to produce the same set of warnings, but faster by doing the filtering at a different step. In the long run I would prefer not to have this flag, but if we need it then it should have a name that emphasizes the implementation strategy, which is what it controls.

But there is TraversalKind used in some clang-tidy checks. They seem to mean different things from HeaderFilterRegex, and SystemHeaders.

Yes, TraversalKind is unrelated (and IMO also poorly named).

The confusion exists between HeaderFilterRegex and SystemHeaders. Both should really do skip-check-of-headers. Their "implementation" is like a short-cut to only suppress output but not skipping the checks. When skip-headers is enabled and maybe become the default, both HeaderFilterRegex and SystemHeaders will make more sense.

Sorry, I don't understand this paragraph. HeaderFilterRegex and SystemHeaders both help define which files are "in scope" for clang-tidy checks. This is the same whether --skip-headers=1 or --skip-headers=0, the only difference is at what stage the filtering is applied. I agree the --show-all-warnings flag should disable post-filtering for both.

chh retitled this revision from [clang-tidy] Add --skip-headers, part 1 (use setTraversalScope) to [clang-tidy] New feature --skip-headers, part 1.Jul 12 2021, 10:33 PM
chh edited the summary of this revision. (Show Details)
chh added a comment.Jul 12 2021, 11:11 PM

Sam, the revision summary is updated. Could you review it again?

The new summary would clarify a few points:

  • This --skip-headers is a new feature, not just a transparent run-time saving. It should be used carefully with header-filter and system-headers, and similarly handled in IO.mapOptional.
  • It's not PCH.
  • It uses set/getTraversalScope as an implementation method, but the goal of skip-headers is not to limit clang-tidy to see only non-header Decls.
  • Improvements on PPCallback-based and static analyzer checks could be in follow up changes. The gain might not be as large as the MatchFinder-based checks and the implementation could be more complicated than using set/getTraversalScope.

I am glad that with your help on set/getTraversalScope,
many part of this revision has become much simpler.
I am looking forward to applying this new feature, even with only part 1,
to save very significant Android build time, but I am also extremely cautious
not to combine many risky changes into one big revision.

In D98710#2873141, @chh wrote:

Sam, the revision summary is updated. Could you review it again?

Just to clarify - the code hasn't changed since last review pass right? I think the comments there still apply. Happy to review the patch description though!

This new feature has impacts similar to --header-filter and --system-headers.

I don't think it does a similar thing to those flags, rather it mostly affects how those flags work. (The stuff about diagnostic counts is details that aren't really the "top-line" of this feature).
I'd say rather something like:

The --skip-headers mode changes how the flags --header-filter and --system-headers limit the scope of checks.`
With --skip-headers=0 (old behaviour; default), checkers inspect all code, but warnings in files out of scope are not reported.
With --skip-headers=1, checkers do not inspect code from files that are out of scope. This can be a significant performance improvement.

Add a --show-all-warnings flag. It is hidden, not shown by --help. When it is used, ClangTidyDiagnosticConsumer::checkFilters will not suppress any diagnostic message. This is useful to find tidy checks that have not yet handled the --skip-headers flag.

This doesn't look right - an important part of the design is that tidy checks shouldn't need to be modified. (Though it's possible to imagine checks that wouldn't work properly in this mode)
Maybe rather "useful in conjunction with --skip-headers to find checks that may be processing code that should rather be skipped"?

If there is such a case in the future, we might need some other method to skip checks without cutting out Decls in AST

Realistically, I think we should probably just say such cases are unsupported. We believe no such checks exist in-tree, and clang-tidy upstream can't really be constrained by what people are doing out-of-tree.

chh updated this revision to Diff 359506.Jul 16 2021, 6:04 PM
chh retitled this revision from [clang-tidy] New feature --skip-headers, part 1 to [clang-tidy] New feature --skip-headers, part 1, (tested as default).
chh edited the summary of this revision. (Show Details)

to test skip-headers as default, do not submit;
show that setTraversalScope failed some static analyzer checks,
see #if 0 comments in ClangTidy.cpp.

chh updated this revision to Diff 359532.Jul 16 2021, 10:45 PM
chh retitled this revision from [clang-tidy] New feature --skip-headers, part 1, (tested as default) to [clang-tidy] New feature --skip-headers, part 1, setTraversalScope.
chh edited the summary of this revision. (Show Details)

--skip-headers is set to false by default;
many tidy tests runs are augmented with explicit --skip-headers or --skip-headers=0
to test this feature as on or off.

chh added a comment.Jul 19 2021, 6:12 PM

Sam,

The latest tested change contained one of your suggestions:

using setTraversalScope for all consumers in the
ClangTidyASTConsumer::HandleTranslationUnit

It is good to pass all existing tests,
although not a proof of correctness yet.

The failed libarcher.* tests have been there for days.
We can be sure that those are unrelated to this change.

We still have some different views of the purpose and requirements of
these new flags, skip-headers and show-all-warnings. That's why it is
important to describe them correctly in the summary.
We can try to include more use cases or applications now or later,
however, the following has been our requirement for a successful story
for the first Android deployment:

--skip-headers should work correctly, even if not at optimal speed.

Correctness means no more or fewer "displayed" warnings with or without
this flag, although it could report fewer "suppressed" header file warnings.

For desired performance gain, experiment data showed that it is more than
enough to get savings from only MatchFinder-based checks.

We are less critical on "implementation" methods, code complexity, or efficiency.
Using set/getTraversalScope or not, cutting Decls in advance or on the fly,
cutting only top-level Decls or all Decls at any level, are all acceptable
alternatives if they produce the same "displayed" warnings as before.

Please also take a look of https://reviews.llvm.org/D98709,
which skips Decls at all levels on-the-fly.
That is a different implementation with 50% less changes in ClangTidy.cpp
than this one. The changes in other files are identical or tiny.
It is a little slower, but D98709 is smaller and simpler to maintain,
and it limits the impact to only MatchFinder, not static analyzer.

Now the bad news is that when tested against the whole Android source,
I found failed cases, which are missing clang-tidy warning messages
by both implementations.

The missed warnings were bugprone-forward-declaration-namespace.
There could be other misses undetected.
According to bugprone/ForwardDeclarationNamespaceCheck.cpp,
it's a MatchFinder-based check that can report warning on
Decl in the main file, but the check needs Decls in an
included file. For this kind of checks, skipping their
matchers for Decls in header files is wrong.
This is similar to what we found before that some checks
need the root TranslationUnit node in AST, so we have to
change set/getTraversalScope to keep the root node.

Now we realized that we cannot simply skip or cut out Decls
of headers files for ALL MatchFinder-based checks. Some checks
or some of their matchers need to see ALL Decls in ALL source files.

I will try to update this D98710 and D98709 with new test cases
to show the failed bugprone-forward-declaration-namespace checks.
Then, I will try some implementation changes to work also for
those tidy checks. The implementation probably won't be as simple
as this one to cut all top-level header file Decls for all checks.

chh updated this revision to Diff 360339.Jul 20 2021, 6:42 PM
chh edited the summary of this revision. (Show Details)

Add bugprone-forward-declaration-namespace-header.cpp test to
show that MatchFinder-based checks can also depend on header Decls.

Please before sending any more patches, let's resolve the design questions here (or via mail).

It's clear that it's possible to write a clang-tidy check whose behavior on the main file cannot be preserved without traversing the whole AST.
Silly example: the main file should have a constant like int TotalNumberOfVarDeclsInThisTranslationUnit = 61523;, warn if it's missing or incorrect.

Possibilities for dealing with this include:

  1. traversing the whole AST
  2. traversing a partial AST, and accepting some divergence in behavior
  3. allowing each check to customize the strategy for subsetting the AST

So far we've been discussing #2, and hoping the divergences are small/acceptable. Are you saying that unless the divergence is zero, this feature is no use to you? And at a high level, what's your idea to address this?

Compared to the gold standard of traversing the whole AST, we can go wrong either by adding warnings we don't want (false positives), or missing warnings we want (false negatives). False positives are really bad for clang-tidy and I think we can't allow these to be common in practice. However false negatives with certain checks (e.g. the forward-declaration-namespace check) seem like an acceptable functionality vs performance tradeoff that's more than offset by the ability to run clang-tidy in more contexts.

chh updated this revision to Diff 360651.Jul 21 2021, 5:06 PM
chh edited the summary of this revision. (Show Details)

Add one more skip-headers-2.cpp test, which shows desired check at a top-level Decl but not at nested Decl. Checking locations of only top-level Decls will miss the opportunity to skip nested Decls.

chh added a comment.Jul 22 2021, 3:37 PM

Some Android developers and legacy code care less about clang-tidy warnings.
Newer developers spend a lot of time to get lint-free new code.
So Android source tree has a lot of clang-tidy flags like
header-filter, checks, and warnings-as-errors to select checks
for different modules. People are even asking for new features like
making header-filter to accept a list of regexp like the checks flag.

If a new change to header-filter or skip-headers lose existing valid warnings,
maybe it will be perfectly acceptable to some but not to people who want to
make code lint-free. To them, these changes are regressions.
A clang-tidy check's owner would not like to lose capability for
some minor performance gain, either.

bugprone-forward-declaration-namespace is just one valid warning found
in a couple of days. Android tree is huge and has disabled many checks,
so we won't know the impact to all checks any time soon.
If we have a mechanism to keep any special checks from the impact,
like the PPCallback-based checks, it will be much safer to release
the first phase skip-headers and know that we can fix any bad impact quickly.

I am trying such a mechanism to support "see-all-file" MatchFinder-based checks.
Those checks will be very few, so skip-headers still saves a lot of runtime.
After more tests, I will upload it to D98709.

Please note that I added a new test skip-headers-2.cpp.
It is extremely simplified, but reflects the need to check a top-level
Decl and ignore warnings in included nested Decls in other files.
If we implement skip-headers based on only top-level Decls,
we won't be able to support such a use case.
In D98709, we check/skip not only top-level Decls.

chh updated this revision to Diff 361871.Jul 26 2021, 6:00 PM

add skip-headers-3.cpp test to show that checking only Decl locations is not enough to skip some warnings in header files

chh updated this revision to Diff 362239.Jul 27 2021, 5:14 PM
chh edited the summary of this revision. (Show Details)

Add new test cases into misc-unused-using-decls.cpp.

chh updated this revision to Diff 362704.Jul 29 2021, 3:38 AM

sync to the latest source; apply clang-format; add new skip-headers-4.cpp test

MTC added a subscriber: MTC.Aug 16 2021, 6:43 PM
chh abandoned this revision.Sep 14 2022, 1:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 14 2022, 1:10 PM