This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] misc-include-cleaner: avoid duplicated fixes
AbandonedPublic

Authored by danix800 on Aug 31 2023, 2:47 AM.

Details

Summary

Do deduplication when inserting missing headers when diags are not self contained.

Fixes https://github.com/llvm/llvm-project/issues/65285

Diff Detail

Event Timeline

danix800 created this revision.Aug 31 2023, 2:47 AM
danix800 requested review of this revision.Aug 31 2023, 2:47 AM
danix800 edited the summary of this revision. (Show Details)Aug 31 2023, 2:52 AM

Thanks! @kadircet should make the call on these features (they look good to me), I have some implementation notes.

This is making two independent changes, one is a policy change that should be applied to the include-cleaner library, and one is a bugfix to the clang-tidy check. I think they should be separate patches.

clang-tools-extra/clang-tidy/misc/IncludeCleanerCheck.cpp
168

If we want this policy, it should be provided at the include-cleaner library level so it applies to all clients, not just the clang-tidy check.

I think the clearest way is to encapsulate this in include_cleaner::Includes, so that match() never matches a Header requirement against a duplicative Include. This will naturally lead to the directives being marked as unused.

(In practice, the result will work in clangd, and will respect configuration of headers where analysis should be ignored, keep directives, etc)

219

This means that if you're looking at individual diagnostics rather than applying all fixes to a file, only the first diagnostic will have any fix available at all.

I believe the preferred solution is to do this conditionally based on areDiagsSelfContained(). clang_tidy::utils::IncludeInserter encapsulates this, but isn't used here.

I don't know whether we would want to use it here, or how carefully it's already been considered. (It definitely contains a lot of logic that is dubious to apply when the include to insert has already been precisely calculated, but also some things that would be helpful). Will defer to Kadir on that.

danix800 added inline comments.Aug 31 2023, 8:27 AM
clang-tools-extra/clang-tidy/misc/IncludeCleanerCheck.cpp
168

I'll leave the policy part to another revision (maybe by others).

219

Thanks for mentioning the clang_tidy::utils::IncludeInserter. I took a further look and found that IncludeInserter is
more suitable for this checker.

danix800 updated this revision to Diff 555059.Aug 31 2023, 8:30 AM
danix800 retitled this revision from [clang-tidy] misc-include-cleaner: remove duplicated includes & fixes to [clang-tidy] misc-include-cleaner: fix duplicated fixes.
danix800 edited the summary of this revision. (Show Details)
  1. Use clang::tidy::utils::IncludeInserter to avoid repeated insertion;
  2. Leave out the deduplication policy part.
danix800 retitled this revision from [clang-tidy] misc-include-cleaner: fix duplicated fixes to [clang-tidy] misc-include-cleaner: avoid duplicated fixes.Aug 31 2023, 8:37 AM

Ping @kadircet~, could you please take a look at this revision?

Thanks a lot for the patch @danix800 !

I initially was rather focused on behavior of this check in workflows that require seeing "self-contained-diags", but also I rather see the bulk-apply use cases as always requiring clang-format afterwards. Hence didn't really try to polish that use case a lot, but I believe changes in this patch improve those use cases reasonably. But users do still need to run clang-format afterwards, e.g. if we first generate a finding that inserts "b.h" and then "a.h", they'll be in wrong order. So this is only fixing some cases. If you'd like to work on a more complete approach, we can prepare all the edits in a single tooling::Replacements and run clang::format::cleanupAroundReplacements on top of it to generate proper edits, and emit FixIts with those merged replacements. Note that this still won't be enough in all cases, as there can be other checks that emit fixes that are touching includes :/

Regarding usage of IncludeInserter; we were deliberately not using IncludeInserter here, as it has slightly different semantics to HeaderIncludes, which is used by all the other applications of include-cleaner when producing edits. That way it's easier for us to reason about the behavior in various places (and also to fix them). But moreover, HeaderIncludes uses clang-format config to figure out include-styles and works with a bigger set of projects without requiring extra configurations (hence this patch will actually be a regression for those). Therefore can you keep using HeaderIncludes, while skipping generation of duplicate fixits when we're in non-self-contained-diags mode (assuming you don't want to generalize the approach as I explained above).

Thanks a lot for the patch @danix800 !

I initially was rather focused on behavior of this check in workflows that require seeing "self-contained-diags", but also I rather see the bulk-apply use cases as always requiring clang-format afterwards. Hence didn't really try to polish that use case a lot, but I believe changes in this patch improve those use cases reasonably. But users do still need to run clang-format afterwards, e.g. if we first generate a finding that inserts "b.h" and then "a.h", they'll be in wrong order. So this is only fixing some cases. If you'd like to work on a more complete approach, we can prepare all the edits in a single tooling::Replacements and run clang::format::cleanupAroundReplacements on top of it to generate proper edits, and emit FixIts with those merged replacements. Note that this still won't be enough in all cases, as there can be other checks that emit fixes that are touching includes :/

Regarding usage of IncludeInserter; we were deliberately not using IncludeInserter here, as it has slightly different semantics to HeaderIncludes, which is used by all the other applications of include-cleaner when producing edits. That way it's easier for us to reason about the behavior in various places (and also to fix them). But moreover, HeaderIncludes uses clang-format config to figure out include-styles and works with a bigger set of projects without requiring extra configurations (hence this patch will actually be a regression for those). Therefore can you keep using HeaderIncludes, while skipping generation of duplicate fixits when we're in non-self-contained-diags mode (assuming you don't want to generalize the approach as I explained above).

Thanks for the background!

If we have further plans from upstream on these checker/clang-format related logic, I can wait and abandon this revision.

Or if such issue does need fixing then coud you take a look at the previous diff 1
which uses an internal set to prevent this issue?

If we have further plans from upstream on these checker/clang-format related logic, I can wait and abandon this revision.

Not at the moment, so please don't!

Or if such issue does need fixing then coud you take a look at the previous diff 1
which uses an internal set to prevent this issue?

Yes, I think that approach makes sense. But we should make it conditional on areDiagsSelfContained (basically de-duplicate only when diags are not self contained).

danix800 updated this revision to Diff 555867.Sep 5 2023, 8:23 AM
danix800 edited the summary of this revision. (Show Details)
  1. Revert to internal set (not using IncludeCleaner);
  2. Only do deduplication when not areDiagsSelfContained().
danix800 edited the summary of this revision. (Show Details)Sep 5 2023, 8:24 AM
danix800 abandoned this revision.Sep 5 2023, 7:10 PM