This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Support IgnoredRegexp configuration to selectively suppress identifier naming checks
ClosedPublic

Authored by smhc on Oct 27 2020, 7:31 PM.

Details

Summary

The idea of suppressing naming checks for variables is to support code bases that allow short variables named e.g 'x' and 'i' without prefix/suffixes or casing styles. This was originally proposed as a 'ShortSizeThreshold' however has been made more generic with a regex to suppress identifier naming checks for those that match.

Diff Detail

Event Timeline

smhc requested review of this revision.Oct 27 2020, 7:31 PM
smhc created this revision.
smhc retitled this revision from Add IgnoreShortNames config to identifier naming checks to [clang-tidy] Add IgnoreShortNames config to identifier naming checks.Nov 4 2020, 6:38 PM
smhc added reviewers: hokein, njames93.

Overall pretty good with this, but i feel the name should be ShortNameThreshold, WDYT?

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
150

Best to use the literal here to make sure we use the unsigned version of Options.get

smhc updated this revision to Diff 303072.Nov 5 2020, 3:36 AM

Thanks for the review! Agreed with suggestions, submitted updated diff.
Please note I do not have commit access to llvm. Regards.

Oh can you also update the release notes, there is already a largish list of changes to this check there.

Oh to run the test locally, run ninja check-clang-extra-clang-tidy think that's right.

smhc updated this revision to Diff 303311.Nov 5 2020, 6:44 PM
smhc marked an inline comment as done.

I've updated the release notes and corrected the unit tests to be passing. Thanks again.

smhc set the repository for this revision to rG LLVM Github Monorepo.Nov 12 2020, 3:30 AM
smhc added a project: Restricted Project.
Herald added a project: Restricted Project. · View Herald TranscriptNov 12 2020, 3:30 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Should this be a NamingStyle option instead.
{key: readability-identifier-naming.ParameterShortSizeThreshold, value: 2}
WDYT?

Should this be a NamingStyle option instead.
{key: readability-identifier-naming.ParameterShortSizeThreshold, value: 2}
WDYT?

I think that makes a lot of sense -- I can imagine wanting to enforce different identifier lengths depending on whether we're spelling a type name vs a local variable name, etc.

smhc added a comment.Nov 12 2020, 10:15 PM

Should this be a NamingStyle option instead.
{key: readability-identifier-naming.ParameterShortSizeThreshold, value: 2}
WDYT?

I think that makes a lot of sense -- I can imagine wanting to enforce different identifier lengths depending on whether we're spelling a type name vs a local variable name, etc.

I considered this but thought the configuration and documentation would be quite cumbersome. We could provide a global setting (as already done) and allow refining it further by type if needed? For what it's worth I only need it for local variables, I imagine that would be the main use case.

Or should we simply add this threshold to every type of name, similar to how suffix, prefix and case style have been done?

Should this be a NamingStyle option instead.
{key: readability-identifier-naming.ParameterShortSizeThreshold, value: 2}
WDYT?

I think that makes a lot of sense -- I can imagine wanting to enforce different identifier lengths depending on whether we're spelling a type name vs a local variable name, etc.

I considered this but thought the configuration and documentation would be quite cumbersome. We could provide a global setting (as already done) and allow refining it further by type if needed? For what it's worth I only need it for local variables, I imagine that would be the main use case.

Or should we simply add this threshold to every type of name, similar to how suffix, prefix and case style have been done?

I think length of names matters differently in different contexts. I think three-letter identifiers as a class name should probably be a rarity, but may be more reasonable as a local variable. I think one-letter local variables are generally a bad thing, except for loop induction variables. That sort of thing. (And we may someday want to add more specific naming categories like "loop induction variable".) So I think adding the threshold to all the different name kinds is a more flexible approach.

smhc updated this revision to Diff 305341.Nov 14 2020, 9:02 PM
smhc added a comment.Nov 14 2020, 9:13 PM

I updated it to use XXShortSizeThreshold.

But I was thinking.. perhaps a more generic approach could be "xxIgnoreRegex", this would allow you to do things such as :

VariableIgnoreRegex: ([a-z]{2}|index)
ClassIgnoreRegex: (Special_CASE|CommonOther.*)
etc.

it would be more generic while also allowing for excluding short names.

smhc updated this revision to Diff 305395.Nov 15 2020, 4:02 PM

Re-based to llvm master and re-diffed.

I updated it to use XXShortSizeThreshold.

But I was thinking.. perhaps a more generic approach could be "xxIgnoreRegex", this would allow you to do things such as :

VariableIgnoreRegex: ([a-z]{2}|index)
ClassIgnoreRegex: (Special_CASE|CommonOther.*)
etc.

it would be more generic while also allowing for excluding short names.

I don't have a strong opinion on that approach. It would be a more flexible approach and that's appealing, but it also means writing regular expressions which is a bit unappealing because those aren't obvious to everyone and it's unclear what the performance characteristics would be of running those regexes on every identifier in the program. I'm curious if @alexfh or @njames93 (or others) have opinions on this.

I don't have a strong opinion on that approach. It would be a more flexible approach and that's appealing, but it also means writing regular expressions which is a bit unappealing because those aren't obvious to everyone and it's unclear what the performance characteristics would be of running those regexes on every identifier in the program. I'm curious if @alexfh or @njames93 (or others) have opinions on this.

For the record, when profiling this (and other RenamerClangTidy based checks) are the slowest(by some margin) checks to run.
This check also uses regex to determine if identifiers are correctly named and to split words in identifiers if they need changing.
Based off that I don't think there is huge performance concerns for using a regex for a couple of styles.

smhc added a comment.Nov 19 2020, 8:13 PM

Regexes may be a bit of a slippery slope as any identifier scheme could be represented as a regex (casing, prefix, suffix etc) but it's not practical to provide a fix or suggestion with a regex.
For example with a 'ShortThreshold' a fix could offer to change the casing/xffix or to trim the variable length. With a regex it's less about making a naming rule and more about providing a workaround for problematic edge cases. As such should be used sparingly.

I think the difficulty in writing them is not too much of a deal breaker. Providing an example for the equivalent config options would be just as useful as the config option itself. e.g

"For example, to ignore names with a length less than 3 use: \w{1,3}' "

I assume the ^ and $ would be implied by the matcher. If they want anything more complex they'd be effectively asking for regex support anyway.

smhc updated this revision to Diff 306686.Nov 20 2020, 7:21 AM

Updated diff to use 'IgnoredRegexp'
I see that a regexp has been used elsewhere for some other checks (macro usage, AllowedRegexp).

(as a side note, I see that AllowedRegexp string is getting compiled on every match rather than pre-compiling, the lack of a copy constructor on llvm::Regex makes it trickier to work with)

smhc updated this revision to Diff 306687.Nov 20 2020, 7:26 AM

removed notes on short length threshold

njames93 added inline comments.Nov 20 2020, 9:50 AM
clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
124–125

Making this change removes the need to NamingStyle to be copy constructable/assignable.

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.h
62

Potentially save an allocation.

Also is it wise to warn a user on an invalid regex, unfortunately clang-tidy doesnt support diags with no source location but could still output to llvm::errs.

smhc updated this revision to Diff 306815.Nov 20 2020, 6:15 PM

Updated after review comments.

njames93 added inline comments.Nov 22 2020, 5:22 AM
clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
134–137

Elide braces on single statement if.

138–141

IgnoredRegexp will be default constructed anyway, so this doesn't add anything.

smhc updated this revision to Diff 306942.Nov 22 2020, 3:01 PM
smhc marked 4 inline comments as done.

Updated after review comments.

smhc added inline comments.Nov 22 2020, 3:03 PM
clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
124–125

Thanks - was looking for a solution to this.

aaron.ballman accepted this revision.Nov 23 2020, 6:19 AM

LGTM, but please wait for @njames93 in case they have additional feedback.

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.h
62
This revision is now accepted and ready to land.Nov 23 2020, 6:19 AM
njames93 accepted this revision.Nov 23 2020, 9:20 AM

LGTM, before you push this, can you edit the summary and title as its no longer about just the length of the name. Otherwise the commit message will likely lead to confusion on whats being added.

smhc updated this revision to Diff 307157.Nov 23 2020, 12:06 PM
smhc retitled this revision from [clang-tidy] Add IgnoreShortNames config to identifier naming checks to [clang-tidy] Support IgnoredRegexp configuration to selectively suppress identifier naming checks.
smhc edited the summary of this revision. (Show Details)
smhc marked an inline comment as done.

Fixed uk/us spelling difference as suggested

smhc added a comment.Nov 23 2020, 12:16 PM

Thanks for the reviews everyone, I have updated the title and summary.
Unfortunately I don't have push access to llvm - could someone help to push this please?

Just one last small nit.

// You can ignore this, nothing to do with this patch.
All these changes in the documentation tell me we should probably restructure the documentation for this check.
Right now we have NxM lines of documentation for each combination of style kind(N) and style option(M).
This could probably be changed so we could describe each style option at the top of the documentation (<StyleKind>Case, <StyleKind> Prefix, ...)
Then we could describe each StyleKind afterwards.

.. option:: ClassCase, option::ClassPrefix...

Transforms class names
.. code-block:: c++

    class <Ident> {
    };

.. option:: FunctionCase, option::FunctionPrefix...

Transforms class names
.. code-block:: c++

    void <Ident>(int Param0);
clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
126

Dont need to use str() here

smhc updated this revision to Diff 307177.EditedNov 23 2020, 12:54 PM
smhc marked an inline comment as done.

Removed unnecessary str()

smhc added a comment.Nov 23 2020, 12:58 PM

Yes I thought the same wrt the doco. There is also a hungarian notation config option in the works which does the same thing.

I nearly reworked it but figured it would be best done under a separate patch

It does have the potential to be less searchable through google for specific options. Automatic generation could be used instead if that is a problem.

smhc added a comment.EditedNov 24 2020, 6:06 PM

There's a build failure from this merge, looks like a typo:

diff --git a/clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst b/clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst
index b744d21b6ff8..68c0abfcbdf6 100644

  • a/clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst

+++ b/clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst
@@ -680,7 +680,7 @@ After:

When defined, the check will ensure enumeration names will add the
prefixed with the given value (regardless of casing).

-.. option:: EnumConstantIgnoredRegexp
+.. option:: EnumIgnoredRegexp

Identifier naming checks won't be enforced for enumeration names
matching this regular expression.
smhc added inline comments.Nov 25 2020, 1:23 AM
clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst
624
smhc marked an inline comment as not done.Nov 25 2020, 1:24 AM

There's a build failure from this merge, looks like a typo:

diff --git a/clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst b/clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst
index b744d21b6ff8..68c0abfcbdf6 100644

  • a/clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst

+++ b/clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst
@@ -680,7 +680,7 @@ After:

When defined, the check will ensure enumeration names will add the
prefixed with the given value (regardless of casing).

-.. option:: EnumConstantIgnoredRegexp
+.. option:: EnumIgnoredRegexp

Identifier naming checks won't be enforced for enumeration names
matching this regular expression.

I've corrected this in ed242da0ffa28493d8a5ee6b80ecbe2441ca48a7, thanks!