This is an archive of the discontinued LLVM Phabricator instance.

Allow passing a regex for headers to exclude from clang-tidy
Needs RevisionPublic

Authored by toddlipcon on Jun 26 2017, 5:06 PM.
Tokens
"Party Time" token, awarded by aminya."Love" token, awarded by Ashimaru."Love" token, awarded by renefritze."Love" token, awarded by marcpawl."Love" token, awarded by saimusdev."Love" token, awarded by djpeinado.

Details

Summary

This patch adds the ability to specify a regex of headers to exclude from clang-tidy diagnostics. This is the inverse of the existing header regex.

In our project we want to do something like include src/.*.h but exclude src/some-thirdparty/.*.h. Given only a positive regex configuration, we would have had to list all directories aside from 'some-thirdparty' such as src/(foo|bar|baz|...)/, which isn't practical or maintainable.

A while back there was a thread http://lists.llvm.org/pipermail/cfe-dev/2015-November/046203.html which suggested using extended regexes which offer negative lookahead assertions to achieve this, but it was rejected due to poor stdlib support. I think an "exclude" regex is also more familiar to users since it shows up commonly in tools like rsync/tar/etc.

(note: I previously posted this as D34415 but was asked to re-post it with cfe-commits as a subscriber. Sorry for the double-post)

Diff Detail

Repository
rL LLVM

Event Timeline

toddlipcon created this revision.Jun 26 2017, 5:06 PM
JonasToth added a subscriber: JonasToth.

i added reviewers, since it seems nobody takes care of this check.
remove if this was bad.

alexfh edited edge metadata.Jul 14 2017, 7:11 PM

Please regenerate the patch with full context (e.g. as described in llvm.org/docs/Phabricator.html).

alexfh requested changes to this revision.Jul 17 2017, 8:14 AM
This revision now requires changes to proceed.Jul 17 2017, 8:14 AM
toddlipcon edited edge metadata.

Regenerated patch with full context

I wonder whether anyone uses file patterns that need anything from regular expressions beyond | and .*. If not, globs (as used in -checks=) would be a better solution.

One problem with a header-filter + exclude-header-filter is that it doesn't make it easier to express restrictions similar to "everything under a/ (except for everything under a/b/ (except for everything under a/b/c/))".

In our project we want to do something like include src/.*.h but exclude src/some-thirdparty/.*.h.

There are at least two other possibilities to handle this use case.

The first one is to extend the way clang-tidy handles per-directory configs. Currently, for each translation unit clang-tidy takes the configuration for the main file of the translation unit and uses the Checks option found in this configuration to filter diagnostics coming from the main file as well as from the headers included by the translation unit. In a better world, it would take the configuration for the header where the diagnostic originates and thus allow a more source-centric way of controlling the set of diagnostics it displays. For example, consider this setup:

a/.clang-tidy: Checks=-*,check1,check2
a/a.cpp: #include a/a.h, #include b/b.h, #include third-party/c.h
a/a.h
b/.clang-tidy: Checks=-*,check2,check3
b/b.h
third-party/.clang-tidy: Checks=-*
third-party/c.h

Let's assume that each of a/a.h, b/b.h, third-party/c.h and a/a.cpp contain code that triggers check1, check2 and check3. Currently, when run on a.cpp without -header-filter, clang-tidy would only output check1 and check2 from a.cpp. -header-filter=.* will result in check1 and check2 diagnostics from all the files.

If we change clang-tidy to apply the most relevant local configuration to the generated diagnostics (and get rid of -header-filter altogether, or set it to .* by default), in the case above it would output check1 and check2 from a/a.cpp and a/a.h, check2 from b/b.h (since we don't even run check3 on the code, and check1 gets filtered out), and no diagnostics from third-party/c.h, since we filter out check1 and check2 according to the local configuration. This seems like a more logical and useful behavior, however, when implementing this we'll have to make sure configuration retrieval doesn't become a bottleneck (FileOptionsProvider already implements some sort of caching, but we'd have to carefully benchmark it on large translation units with tons of diagnostics).

The second way to handle this use case is possible on a different level: one can run clang-tidy over the whole project with -header-filter=.* and then filter the results and use a specialized tool to display them, e.g. https://github.com/Ericsson/codechecker. I'm not sure though whether this suits your needs or is an overkill.

alexfh requested changes to this revision.Jul 24 2017, 9:07 AM
This revision now requires changes to proceed.Jul 24 2017, 9:07 AM

The second way to handle this use case is possible on a different level: one can run clang-tidy over the whole project with -header-filter=.* and then filter the results and use a specialized tool to display them, e.g. https://github.com/Ericsson/codechecker. I'm not sure though whether this suits your needs or is an overkill.

This approach does not work for us, as our external headers generate a lot of warnings. It's not so much the warnings that are an issue, but generating the notes is taking a lot of time. We implemented this patch in our local installation, and it dramatically reduces the time it takes to run clang-tidy on a "component", where we can focus on the source and header files in a subdirectory tree, and exclude headers from other subdirectories. In the worst case, we're really comparing 1 hour to a few minutes.

Even the first approach is not so interesting, as we would like warnings for another subdirectory tree to be visible in this subdirectory. What we really want is to be able to segregate warnings per component (a subdirectory tree) and don't bother people who import headers from other subdirectory trees with local warnings. So the owner of a subdirectory tree should see all warnings for their tree, but consumers of headers should not be bothered.

estan added a subscriber: estan.Mar 22 2020, 2:10 AM

It would be great to get something like this in.

In our project we're currently using a kludge of overriding the RULE_LAUNCH_COMPILE CMake variable to insert our own shell script as the compiler. The script strips the clang-tidy invocation inserted by CMake if the source path matches any of a set of regexes we have in a .clang-tidy-exclude file.

The original suggested solution with an exclude regex would be sufficient for us, but the source-centric approach described by @alexfh would also work. Just something would be good.

srj added a subscriber: srj.May 2 2020, 4:58 PM
saimusdev added a subscriber: saimusdev.
renefritze added a subscriber: renefritze.

Hello,

Any news on accepting this feature?

srj added a comment.Oct 13 2020, 6:29 PM

This is marked as "needs revision", but it's not clear to me what the requested changes actually are.

The proposed fix here may not be perfect, but AFAICT, there is simply no good way to exclude headers that are beyond your control at present.

Hiralo added a subscriber: Hiralo.Dec 9 2020, 2:02 AM
Hiralo added a comment.Dec 9 2020, 2:07 AM

We are dependent on this patch to exclude headers in regex while running clang-tidy.
Can we have this patch merged with upcoming GA versions? or
Can someone summarize what is pending in this review or what could be good alternative in v11 clang-tidy?
Thank you in advance.

MartinG added a subscriber: MartinG.EditedFeb 12 2021, 12:00 PM

More than 3.5 years later and this extremely essential feature hasn't been merged yet?

MTC added a subscriber: MTC.Aug 16 2021, 6:57 PM
Ashimaru added a subscriber: Ashimaru.

In my company we are using a lot of standard checks and it becomes time consuming - this patch would be extremely useful to us and non of proposals montioned @alexfh seem to solve the issue.
We would like to select which directories containing headers would be checked and using filtering after checking still has problem of running all checks on file that will later discarded - and we do not like paying for something we do not use ;)

Repeating my question from an earlier comment: would a header glob list (similar to how the format of the -checks= flag) be enough for the use cases folks have? On the one hand glob list doesn't support all the features of regular expressions, but are they all useful for matching paths? Glob list in clang-tidy currently supports set union (similar to regex |) and <any subsequence> - * (regex .*). If needed, the support can be expanded with ?, character classes, character ranges and/or other features of POSIX globs (https://man7.org/linux/man-pages/man7/glob.7.html). On the flipside, glob list has a cleaner syntax (no need to quote characters common in paths - like .), and allows to easily express exclusion of subsets. It should be a convenient tool to represent a set of files / directories. In comparison to the proposed header-filter + exclude-header-filter glob list makes it possible to naturally express restrictions similar to "everything under a/ (except for everything under a/b/ (except for everything under a/b/c/))" - a/,-a/b/,a/b/c/.

What do folks think?

aco added a subscriber: aco.Sep 30 2021, 1:46 AM

Hi, first of all I want to say that the name "HeaderFilterRegex" is a bit ambiguous, we don't know intuitively if we define what is filtered or what passes.

in my humble opinion alexfh's proposal would be much easier to use than 2 regex.

would a header glob list (similar to how the format of the -checks= flag) be enough for the use cases folks have?

For me yes it's enough, and will be welcome.
255 added a subscriber: 255.Oct 12 2021, 8:56 AM

Repeating my question from an earlier comment: would a header glob list (similar to how the format of the -checks= flag) be enough for the use cases folks have? On the one hand glob list doesn't support all the features of regular expressions, but are they all useful for matching paths? Glob list in clang-tidy currently supports set union (similar to regex |) and <any subsequence> - * (regex .*). If needed, the support can be expanded with ?, character classes, character ranges and/or other features of POSIX globs (https://man7.org/linux/man-pages/man7/glob.7.html). On the flipside, glob list has a cleaner syntax (no need to quote characters common in paths - like .), and allows to easily express exclusion of subsets. It should be a convenient tool to represent a set of files / directories. In comparison to the proposed header-filter + exclude-header-filter glob list makes it possible to naturally express restrictions similar to "everything under a/ (except for everything under a/b/ (except for everything under a/b/c/))" - a/,-a/b/,a/b/c/.

What do folks think?

Yes in my case this would be enough :)

I've recently fiddled with the GlobList to enable globbing in NOLINT expressions, so I have it quite fresh and I think it could easily be re-used here. I'd be happy to implement this feature, should I (can I?) continue on this patch or create a brand new one?

carlosgalvezp added a comment.EditedOct 28 2021, 4:51 AM

On the other hand, switching from Regex to Glob will be quite a breaking change - everyone has .* in their config file, which will silently stop to work when switching to Glob and suddenly people won't get their headers linted. What do you think @alexfh ?

carlosgalvezp added a comment.EditedOct 28 2021, 5:41 AM

Also, the new header filter expresion would need to be: HeaderFilter: '*/mydir/*' instead of HeaderFilter: 'mydir'. Perhaps it's clearer since it's more explicit, but it's yet another breaking change.

everyone has .* in their config file, which will silently stop to work

Nevermind this. .* treated as a glob will turn into the regex ..* which is equivalent.

Here's a patch that uses Globs, let me know how you like it!
https://reviews.llvm.org/D112720

aminya added a subscriber: aminya.Feb 23 2022, 5:32 PM

Clang-tidy is very slow if you use it on a medium or large project. Something like this feature, even if it is not perfect, will definitely improve the experience with clang-tidy. The low performance of clang-tidy is one of the biggest frictions for using it.

Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2022, 4:57 AM

This feature appears to have strong user support throughout this review, and a lot of people (including myself) seem interested in having it landed.

cc'ing some clang-tidy owners: @LegalizeAdulthood @carlosgalvezp @njames93

Is there anything blocking this from moving forward? Can we work to get it submitted?