Page MenuHomePhabricator

[clang-tidy] Add 'experimental-cppcoreguidelines-avoid-adjacent-arguments-of-the-same-type' check
Needs ReviewPublic

Authored by whisperity on Oct 29 2019, 6:13 AM.

Details

Summary

Finds function definitions where parameters of the same type follow each other directly, making call sites prone to calling the function with swapped or badly ordered arguments.

The following construct might be the result of insufficient design, as capabilities of the type system are not used to their full extent to ensure that at a potential call site no possible argument swap happens.

file open(string_view directory, string_view filename, string_view extension)

The relevant I.24 C++ Core Guidelines rule to which conformity is checked can be found at http://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#i24-avoid-adjacent-unrelated-parameters-of-the-same-type

Diff Detail

Event Timeline

whisperity created this revision.Oct 29 2019, 6:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 29 2019, 6:13 AM
whisperity added a comment.EditedOct 29 2019, 6:16 AM

A few interesting "true positive" findings:

  • (Such cases with many bools are being discussed on enhancing type strength)
  • (Personally, I found this case a rather big find.)

Of course, an "interface conformity check" will find "false positive" cases which cannot be proved or done away by a heuristic easily:

I attach a render of the check's documentation so it's easy to read if you don't want to manually build the docs:

Eugene.Zelenko added inline comments.Oct 29 2019, 7:23 AM
clang-tools-extra/clang-tidy/cppcoreguidelines/AdjacentArgumentsOfSameTypeCheck.cpp
1 ↗(On Diff #226883)

Please adjust length to 80 characters.

9 ↗(On Diff #226883)

Please C++ headers aster LLVM ones.

35 ↗(On Diff #226883)

Functions should be static. Anonymous namespace only for types definitions.

488 ↗(On Diff #226883)

Check seems to be useful for C and probably for Objective-C.

clang-tools-extra/docs/ReleaseNotes.rst
147

Please remove This check.

clang-tools-extra/docs/clang-tidy/checks/cppcoreguidelines-avoid-adjacent-arguments-of-same-type.rst
6 ↗(On Diff #226883)

Please remove This check.

56 ↗(On Diff #226883)

Please use single back-tick for option values.

65 ↗(On Diff #226883)

Please use single back-tick for option values.

125 ↗(On Diff #226883)

Please add new line.

whisperity edited subscribers, added: baloghadamsoftware, NoQ; removed: Szelethus.

@Szelethus and @baloghadamsoftware are colleagues to me whom are far more knowledgeable about check development and I want them to see that I want a review from them. I specifically didn't do an "internal with colleagues" downstream review with regards to this code.

clang-tools-extra/clang-tidy/cppcoreguidelines/AdjacentArgumentsOfSameTypeCheck.cpp
488 ↗(On Diff #226883)

I'm not knowledgeable about Objective-C at all to make a decision on how the "adjacent argument ranges" could be calculated and what mixups are possible. As for C, should a cppcoreguidelines- check be enabled for C? Or you mean we should allow it to work, and the user will toggle how they see fit.

whisperity marked 11 inline comments as done.Oct 31 2019, 3:00 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/cppcoreguidelines/AdjacentArgumentsOfSameTypeCheck.cpp
488 ↗(On Diff #226883)

I've added a FIXME for ObjC as I'm really not qualified in that language. C support has been added.

  • Fix some comments, formatting and documentation
  • Organised test files to be in the same directory as others on the Monorepo structure
  • Helper functions moved from namespace (anonymous) to static.
  • Added test for C and enabled check for C code.
whisperity edited the summary of this revision. (Show Details)
  • Removed This check from the documentation comment of the check's class too.
NoQ added inline comments.Nov 4 2019, 5:00 PM
clang-tools-extra/clang-tidy/cppcoreguidelines/AdjacentArgumentsOfSameTypeCheck.cpp
488 ↗(On Diff #226883)

The check should keep working on Objective-C to check C-style calls, and it should also keep working on Objective-C++ to check C++-style method calls. You should be able to test this by renaming any of your .c/.cpp test files to .m/.mm (of course you don't *have* to duplicate your tests; just keep a couple of functions to see that the warning is still there).

However, the check shouldn't try to check Objective-C message expressions / method declarations, because Objective-C message syntax has so-called "parameter labels": the caller is forced to spell out which parameter it's passing, which effectively mitigates the problem.

Note that ObjCMethodDecl is not a sub-class of FunctionDecl, so your checker is already working correctly (as long as you remove the explicit suppression).

whisperity planned changes to this revision.Jan 1 2020, 12:38 PM

There are a few really minor bug fixes, test additions, documentation update, etc. coming along soon, but I've some more pressing matters. However, please feel free to review the patch as-is!

Nice! Do you have any numbers on the amount of false positives this check produces, or the ratio of false and true positives?
I recently started work on this check myself but didn't get very far before I discovered this patch and abandoned mine.
Before abandoning I got some feedback though; there were concerns that there would be way to many false positives.

There were requests both to add heuristics to minimize false positives, as I see you have done nicely,
and to provide some data as to how the check performs with regard to false positives.

clang-tools-extra/clang-tidy/cppcoreguidelines/AdjacentArgumentsOfSameTypeCheck.cpp
474 ↗(On Diff #227254)

Am I getting this right, is this the number of consecutive arguments of the same type which is required for this check to print a diagnostic? If so: why not set the default value to 2? -I think that's what a user who just read the C++ Core Guidelines would expect.

whisperity marked an inline comment as done.Feb 24 2020, 8:44 AM

I'd rather not call them false positive because the definition of false positive is ugly and mushy with regards to this check. This check attempts to enforce an interface rule, whether you(r project) wants to adhere the rule is a concise decision. A type equivalence (or convertibility in case of the follow-up patch D75041 that considers implicit conversions) should be a "true positive" in every case, as an indicator of potentially bad design.

However, there's the question whether or not the similar argument-ness is intentional, i.e. a (from one PoV flawed) but deliberate design decision by the project. We've tested a few projects. The ugliest I've found is OpenCV where every variable of a function argument is either an _InputArray or an _OutputArray...

Another question is, how fragmented you want your type system to be...
I hope my research into this topic will be at least something fruitful. It's all about compromises, one end if using any for all your variables, the other extreme is having a specific type for every single occurrence (stuff like fabs_return_t, create_schema_name_t and stuff).

There's a similar patch D20689 but that uses names (and thus requires that arguments and parameters to be "named" in some way).

clang-tools-extra/clang-tidy/cppcoreguidelines/AdjacentArgumentsOfSameTypeCheck.cpp
474 ↗(On Diff #227254)

Well, it is to reduce the number of nagging errors. Setting it to 2 would result in every "min" and "pow" and similar functions to match. It's a compromise.

I am also concerned about the false positives from this check because I don't think there's going to be an easy heuristic for determining whether two identifiers are "related" to one another.

I agree, and this is an unfortunate thing, but I do not think there could be a sensible heuristic here. Unless we somehow flesh out D20689 which uses name checks. (The two checks could actually complement each other - this one is for users who want to design their APIs defensively and want to see the changes during their code's evolution.)

There is no obvious way to silence any of the diagnostics generated as false positives short of requiring a naming convention for users to follow, which is not a particularly clean solution.

There is, you decide not to enable the checker?

I'd rather not call them false positive because the definition of false positive is ugly and mushy with regards to this check. This check attempts to enforce an interface rule, whether you(r project) wants to adhere the rule is a concise decision. A type equivalence (or convertibility in case of the follow-up patch D75041 that considers implicit conversions) should be a "true positive" in every case, as an indicator of potentially bad design.

The problem is that there *are* false positives, logically. The rule is about "unrelated" parameters, so the false positives come from not having any way to distinguish between related and unrelated arguments. My intuition is that the simple enforcement suggested by the rule is unlikely to be acceptable, but that's why I'm asking for data over large, real-world projects.

However, there's the question whether or not the similar argument-ness is intentional, i.e. a (from one PoV flawed) but deliberate design decision by the project. We've tested a few projects. The ugliest I've found is OpenCV where every variable of a function argument is either an _InputArray or an _OutputArray...

Another question is, how fragmented you want your type system to be...
I hope my research into this topic will be at least something fruitful. It's all about compromises, one end if using any for all your variables, the other extreme is having a specific type for every single occurrence (stuff like fabs_return_t, create_schema_name_t and stuff).

Agreed -- if we can find a heuristic that makes this work, the check has a lot of value.

In D74463#1888270, @aaron.ballman wrote:
There is no obvious way to silence any of the diagnostics generated as false positives short of requiring a naming convention for users to follow, which is not a particularly clean solution.

There is, you decide not to enable the checker?

That is not a useful level of granularity given that this check is likely to be incredibly noisy as part of our other C++ Core Guideline enforcement. We typically have NOLINT comments that we can fall back on for silencing individual instances of a false positive, but that's not likely to be a workable solution by itself for this check in particular because of system and third-party headers which are outside of the user's control (and unlikely to be willing or able to modify).

vingeldal added a comment.EditedFeb 24 2020, 12:06 PM

I'd rather not call them false positive because the definition of false positive is ugly and mushy with regards to this check. This check attempts to enforce an interface rule, whether you(r project) wants to adhere the rule is a concise decision. A type equivalence (or convertibility in case of the follow-up patch D75041 that considers implicit conversions) should be a "true positive" in every case, as an indicator of potentially bad design.

The problem is that there *are* false positives, logically. The rule is about "unrelated" parameters, so the false positives come from not having any way to distinguish between related and unrelated arguments. My intuition is that the simple enforcement suggested by the rule is unlikely to be acceptable, but that's why I'm asking for data over large, real-world projects.

I agree with this, the way I see it there are false positives and there is nothing "mushy" about the definitions of those.
Though I would prefer that, even if the implementation suggested in the guideline would be to crude, any deviations from what the C++ Core Guidelines specifies must be explicitly chosen by the user and not an option default.

However, there's the question whether or not the similar argument-ness is intentional, i.e. a (from one PoV flawed) but deliberate design decision by the project. We've tested a few projects. The ugliest I've found is OpenCV where every variable of a function argument is either an _InputArray or an _OutputArray...

Another question is, how fragmented you want your type system to be...
I hope my research into this topic will be at least something fruitful. It's all about compromises, one end if using any for all your variables, the other extreme is having a specific type for every single occurrence (stuff like fabs_return_t, create_schema_name_t and stuff).

Agreed -- if we can find a heuristic that makes this work, the check has a lot of value.

In D74463#1888270, @aaron.ballman wrote:
There is no obvious way to silence any of the diagnostics generated as false positives short of requiring a naming convention for users to follow, which is not a particularly clean solution.

There is, you decide not to enable the checker?

That is not a useful level of granularity given that this check is likely to be incredibly noisy as part of our other C++ Core Guideline enforcement. We typically have NOLINT comments that we can fall back on for silencing individual instances of a false positive, but that's not likely to be a workable solution by itself for this check in particular because of system and third-party headers which are outside of the user's control (and unlikely to be willing or able to modify).

Doesn't clang-tidy exclude warnings from system headers by default though?
And there is always the possibility of using multiple .clang-tidy files in the code base to use this check for only selected portions of the code base.
Also I'm convinced this check would already be of use in many code bases, as it is today, (but sure, as you said, supporting data for that would be good) and therefor it would be a clear improvement to clang-tidy.
It would be better to use this check until we can do even better, rather than to hold of the check completely until we can implement it perfectly for every code base.

Would it not be a reasonable approach to accept this implementation and improve NOLINT to work on entire files or lists of files rather than squeezing more heuristics (prone to false negatives) to the check?

clang-tools-extra/clang-tidy/cppcoreguidelines/AdjacentArgumentsOfSameTypeCheck.cpp
474 ↗(On Diff #227254)

I think this value very well may strike a good balance in that compromise but I still think it is a clear deviation from the guideline as specified.
IMO all deviations from the guideline should be explicitly requested by the user, not set by default, no matter how useful that deviation is.

Doesn't clang-tidy exclude warnings from system headers by default though?

Third-party SDKs are not always brought in as system headers, unfortunately. Even ignoring system and third-party headers, having two parameters of the same type is a natural occurrence for some data types (like bool) where it is going to be extremely hard to tell whether they're related or unrelated parameters.

And there is always the possibility of using multiple .clang-tidy files in the code base to use this check for only selected portions of the code base.

That is an option, but it would be better for the check to be usable without requiring a lot of custom configuration.

Also I'm convinced this check would already be of use in many code bases, as it is today, (but sure, as you said, supporting data for that would be good) and therefor it would be a clear improvement to clang-tidy.

I'm not as convinced, tbh.

It would be better to use this check until we can do even better, rather than to hold of the check completely until we can implement it perfectly for every code base.

Incremental improvement is always a good thing, but it has to start from a stable base. I'm not yet convinced this check meets the bar for inclusion in clang-tidy. There is no proposed way to tell how two parameters relate to one another, and the simple enforcement seems dangerously simple.

Would it not be a reasonable approach to accept this implementation and improve NOLINT to work on entire files or lists of files rather than squeezing more heuristics (prone to false negatives) to the check?

AFAICT, that would be a novel new use of NOLINT, so I'm wary of it. I think the first thing is to gather some data as to how this check performs as-is. If the behavior is unreasonable, we can see what ways in which it is unreasonable with the dataset and see if we can tease out either heuristics or configuration options that would make it more palatable. We may also want to rope some of the C++ Core Guidelines authors in on the discussion at that point because that would be data suggesting their suggested simple enforcement is untenable.

Doesn't clang-tidy exclude warnings from system headers by default though?

Third-party SDKs are not always brought in as system headers, unfortunately. Even ignoring system and third-party headers, having two parameters of the same type is a natural occurrence for some data types (like bool) where it is going to be extremely hard to tell whether they're related or unrelated parameters.

And there is always the possibility of using multiple .clang-tidy files in the code base to use this check for only selected portions of the code base.

That is an option, but it would be better for the check to be usable without requiring a lot of custom configuration.

One can't really expect a ones own .clang-tidy file to be applicable to all third party code used in ones project. It is a fact, with or without this check, that if you use third party libraries you can't expect the third party libraries to comply with your own .clang-tidy file. You should expect to set up a separate .clang-tidy file for all third party libraries used by your project with or without this check -and that's ok because it is arguably not a big effort to do so.

Incremental improvement is always a good thing, but it has to start from a stable base. I'm not yet convinced this check meets the bar for inclusion in clang-tidy. There is no proposed way to tell how two parameters relate to one another, and the simple enforcement seems dangerously simple.

Would it not be a reasonable approach to accept this implementation and improve NOLINT to work on entire files or lists of files rather than squeezing more heuristics (prone to false negatives) to the check?

AFAICT, that would be a novel new use of NOLINT, so I'm wary of it. I think the first thing is to gather some data as to how this check performs as-is. If the behavior is unreasonable, we can see what ways in which it is unreasonable with the dataset and see if we can tease out either heuristics or configuration options that would make it more palatable. We may also want to rope some of the C++ Core Guidelines authors in on the discussion at that point because that would be data suggesting their suggested simple enforcement is untenable.

There is also the --header-filter command line option to configure the tool to skip some headers.

To sum up my position:
I'm all for investigating the performance of this check and basing any decisions on data. Until we have that data I just want to challenge the claim that there are no suitable ways to turn the check of for select portions of the code base,
there are definitely ways to do that which both work satisfyingly and are in no way novel or unconventional.

Doesn't clang-tidy exclude warnings from system headers by default though?

Third-party SDKs are not always brought in as system headers, unfortunately. Even ignoring system and third-party headers, having two parameters of the same type is a natural occurrence for some data types (like bool) where it is going to be extremely hard to tell whether they're related or unrelated parameters.

I have specifically applied the matcher in this patch in a way that it only matches for the definition of the function symbol. If you never create a build database out of the third party "package" and run the check on the third-party library's source code, you will not get a warning for these.

While it's somewhat wonky that an "interface" rule is matching the "implementation" nodes, it is a natural restriction to only warn the user about things they have a reliable way of fixing. Hopefully, they will not forget to align their headers after potentially fixing the implementation code.

whisperity added a comment.EditedFeb 25 2020, 1:08 AM

I think this value very well may strike a good balance in that compromise but I still think it is a clear deviation from the guideline as specified.
IMO all deviations from the guideline should be explicitly requested by the user, not set by default, no matter how useful that deviation is.

While I am still planning to compile a measurement with potentially tagging false and true positives -- albeit classifying this on random projects I have no domain knowledge about is hard at first glance...

@aaron.ballman Could it be a useful compromise to not say we are trying to solve the cppcoreguidelines- rule, but rather introduce this checker in another "namespace" and maybe mention in the docs that it could, although with caveats, apply "parts" of the Core Guidelines rule?

That is an option, but it would be better for the check to be usable without requiring a lot of custom configuration.

One can't really expect a ones own .clang-tidy file to be applicable to all third party code used in ones project. It is a fact, with or without this check, that if you use third party libraries you can't expect the third party libraries to comply with your own .clang-tidy file. You should expect to set up a separate .clang-tidy file for all third party libraries used by your project with or without this check -and that's ok because it is arguably not a big effort to do so.

We typically aim for checks to require as little configuration as possible by tailoring the default check behavior to be as reasonable as we can make it.

Incremental improvement is always a good thing, but it has to start from a stable base. I'm not yet convinced this check meets the bar for inclusion in clang-tidy. There is no proposed way to tell how two parameters relate to one another, and the simple enforcement seems dangerously simple.

Would it not be a reasonable approach to accept this implementation and improve NOLINT to work on entire files or lists of files rather than squeezing more heuristics (prone to false negatives) to the check?

AFAICT, that would be a novel new use of NOLINT, so I'm wary of it. I think the first thing is to gather some data as to how this check performs as-is. If the behavior is unreasonable, we can see what ways in which it is unreasonable with the dataset and see if we can tease out either heuristics or configuration options that would make it more palatable. We may also want to rope some of the C++ Core Guidelines authors in on the discussion at that point because that would be data suggesting their suggested simple enforcement is untenable.

There is also the --header-filter command line option to configure the tool to skip some headers.

That is true.

To sum up my position:
I'm all for investigating the performance of this check and basing any decisions on data. Until we have that data I just want to challenge the claim that there are no suitable ways to turn the check of for select portions of the code base,
there are definitely ways to do that which both work satisfyingly and are in no way novel or unconventional.

Using the header filters would be a reasonable approach, if a bit heavy-handed. Ideally, the check would warn about what the rule wants it to warn about. I think the unsatisfying bit is the lack of clarity in the rule itself. Trying to write a checker for a rule with that much subjectivity typically leads to a lot of configuration needs.

I think this value very well may strike a good balance in that compromise but I still think it is a clear deviation from the guideline as specified.
IMO all deviations from the guideline should be explicitly requested by the user, not set by default, no matter how useful that deviation is.

While I am still planning to compile a measurement with potentially tagging false and true positives -- albeit classifying this on random projects I have no domain knowledge about is hard at first glance...

LLVM itself might be a good starting point since we've all got some passing familiarity with the project.

@aaron.ballman Could it be a useful compromise to not say we are trying to solve the cppcoreguidelines- rule, but rather introduce this checker in another "namespace" and maybe mention in the docs that it could, although with caveats, apply "parts" of the Core Guidelines rule?

I think that could be a useful compromise, or name it cppcoreguidelines-experimental-avoid-adjacent-parameters-of-same-type to make it clear that we're trying things out to see what works in practice and the behavior may not match the guideline. I think the most important thing we do will be to eventually coordinate with the C++ Core Guidelines folks though. The eventual goal is for the clang-tidy check to cover what the guideline wants it to cover as its default behavior, but in a way that's going to give acceptable behavior for our users. This may mean we have a slightly chattier diagnostic than I'd like, and it may also mean that some updates to the core guideline are needed to help clarify what constitutes a "related" parameter in a way that's acceptable for analysis tools. What I want to avoid is having users say "this check doesn't match the guideline" or "we want to enable all the guidelines but we always have to disable this one because of the false positives".

Btw, we should update the terminology in the patch to use parameter instead of argument (parameters are what the function declares, arguments are what are passed in at the call site and they bind to parameters -- the issue this check is trying to solve is on the declaration side, not the caller side).

@aaron.ballman @vingeldal I'll go over the findings (as I've run this on LLVM and various other projects, a few examples are uploaded in my first comment as HTML renders of the bug report!) soon, but I want to wait until a research paper I have based on this topic gets a final decision. (It should happen soon.) I also plan to refurbish D20689 and have it upstreamed as a "co-checker" of this (one for the interface rule, one for the call sites, they kinda want to solve different aspects of the same issue, let it be up for the project how much they wish to enforce).

@aaron.ballman @vingeldal I'll go over the findings (as I've run this on LLVM and various other projects, a few examples are uploaded in my first comment as HTML renders of the bug report!) soon, but I want to wait until a research paper I have based on this topic gets a final decision. (It should happen soon.) I also plan to refurbish D20689 and have it upstreamed as a "co-checker" of this (one for the interface rule, one for the call sites, they kinda want to solve different aspects of the same issue, let it be up for the project how much they wish to enforce).

Sounds great, thank you!

Btw, we should update the terminology in the patch to use parameter instead of argument (parameters are what the function declares, arguments are what are passed in at the call site and they bind to parameters -- the issue this check is trying to solve is on the declaration side, not the caller side).

Indeed, I think I got confused because the rule heading says "parameter" but immediately the line following says "argument". I'll rename the check, also potentially rebase and all that, it should be a trivial thing to do.
My bigger concerns are about the subsequent patch which makes this rule matching order of magnitude more powerful is implicit conversions. See D75041 for that.

(Originally, the entire idea we came up with was more of a local brainstorming, and it was only during development that I realised that there is a (somewhat?) matching guideline rule about this.)

Agree that we could make an attempt to detect "related" parameters.
A possible heuristic to detect "related arguments" would be to look for interactions between these arguments, e.g. check whether we use them in specific operations, e.g.

// tipical for related iterators
par1 == par2
par1 != par2
etc

// tipical for pointers
par1 - par2
*par1 = *par2

and similar.

@aaron.ballman I've gone over LLVM (and a few other projects). Some general observations:

  • Length of 2 is vile. I understand that the C++CG rule says even lengths of 2 should be matched, but that is industrially infeasible unless one introduces such a rule incrementally to their project. Findings of length 2 are , in general, an order of magnitude more than... basically the rest of the findings.
    • On big projects, even the current "default" of "length 3" seems to be too low. In reality, one should consider (this is likely to be out of scope for Tidy) how often these functions are called, and various other metrics on how serious an offender is.
    • LLVM: 7206 len-2 findings, 767 len-3, 194 len-4, 59 len-5, etc.)
    • OpenCV: 4391 len-2, 987 len-3, 507, 115, ...
      • I did not manually evaluate OpenCV, and I think OpenCV is a project on which - due to their seeming design principle of type erasure - this check is useless.
      • (Similar could be said about LLVM, we are cutting a lot of corners on "niceness" for absolute bleeding-edge performance!)
  • Most of the findings come from low-level types, such as bool or int or unsigned or some project or library-specific (such as a few cases of SDVal in LLVM) types.
  • Strings and string-like types (such as StringRef, char *, etc.) are also huge offenders.
    • Some of these offenders can be side-stepped with semantic typedefs (D66148)
  • Further heuristics are definitely needed, but this patch is huge enough as is, I'd introduce further heuristics in future patches:
    • Allow the user to configure the set of ignored types and variable names from the command-line (I think this already exists as a FIXME in the code as of now)
    • Relatedness check for parameter names (more or less like in D20689!) if we can fetch parameter names -- I am thinking about trying to parse the variable names to find if they have a common prefix and number.
    • If the parameters have a sufficient (question is, what is sufficient) common ancestor in the AST (like as @zporky mentioned, assignment, comparison, etc., but also should consider passing them to the same function -- this would ignore forwarding and factory methods nicely on the first run --), they should be silenced.
      • Forwarding and trampoline functions I've put into the False-positive category, as they will be ignored once such a heuristic is complete.
      • This contributed a huge chunk of the false-positive ratio. Due to the trampolines and factories, and "forwarding overloads", it's around 40% on Xerces and LLVM. Without it, around 15%.
        • This more or less follows naturally, one offending function is one true positive, but every trampoline, pimple, overload, etc. on it is basically an additional false positive.
  • Making decisions on code you're not familiar with is hard ;)

This is most likely not possible purely from Tidy as this requires the wrangler tooling support, but I believe people who wish to use this check should be encouraged to **only** take the introduced differences into consideration on their project.

whisperity retitled this revision from [clang-tidy] Add 'cppcoreguidelines-avoid-adjacent-arguments-of-same-type' check to [clang-tidy] Add 'experimental-cppcoreguidelines-avoid-adjacent-arguments-of-the-same-type' check.
whisperity edited the summary of this revision. (Show Details)
whisperity set the repository for this revision to rG LLVM Github Monorepo.
  • Renamed check to experimental-cppcoreguidelines-avoid-adjacent-parameters-of-the-same-type.
  • s/argument/parameter/g in the code and output where appropriate.

@aaron.ballman I've gone over LLVM (and a few other projects). Some general observations:

  • Length of 2 is vile. I understand that the C++CG rule says even lengths of 2 should be matched, but that is industrially infeasible unless one introduces such a rule incrementally to their project. Findings of length 2 are , in general, an order of magnitude more than... basically the rest of the findings.
    • On big projects, even the current "default" of "length 3" seems to be too low. In reality, one should consider (this is likely to be out of scope for Tidy) how often these functions are called, and various other metrics on how serious an offender is.

Not a problem since existing projects can just use the option and change to whatever level suits them best. My opinion is still that the default shouldn't be set to whatever we think is most useful for the majority of existing projects but to what reflects the actual guideline.

Consider how this situation is similar to the guidelines regarding pointers. The guidelines assume that a project isn't using old standards of C++. Where one has a huge legacy code base it will be impractical at best to try to apply the C++ Core Guidelines for pointer handling on the old code.
The expectation is that new projects will use the new libraries and language features available to do things differently, in a way that makes it possible and practical to follow the guidelines; while legacy code remains unchecked or incrementally improved.
For any new code base this guideline shouldn't be a problem and existing projects can just adapt the usage of this rule to fit their situation by applying it selectively in the code base and/or changing the options.

@aaron.ballman I've gone over LLVM (and a few other projects). Some general observations:

  • Length of 2 is vile. I understand that the C++CG rule says even lengths of 2 should be matched, but that is industrially infeasible unless one introduces such a rule incrementally to their project. Findings of length 2 are , in general, an order of magnitude more than... basically the rest of the findings.
    • On big projects, even the current "default" of "length 3" seems to be too low. In reality, one should consider (this is likely to be out of scope for Tidy) how often these functions are called, and various other metrics on how serious an offender is.

Not a problem since existing projects can just use the option and change to whatever level suits them best. My opinion is still that the default shouldn't be set to whatever we think is most useful for the majority of existing projects but to what reflects the actual guideline.

That's what we try to aim for, but ultimately it means we have to decide whether that means we're willing to abandon a check over it or not (and we traditionally have not abandoned checks, but instead gone with more reasonable defaults). This is a case where I would probably rather abandon the check than go with that poor of a default behavior.

I think a more productive way forward is to collaborate with the rule authors until they have a rule that can be checked and reasonably complied with (which I imagine would require some heuristic choices in the rule that we could then apply in the check).

Consider how this situation is similar to the guidelines regarding pointers. The guidelines assume that a project isn't using old standards of C++. Where one has a huge legacy code base it will be impractical at best to try to apply the C++ Core Guidelines for pointer handling on the old code.

The expectation is that new projects will use the new libraries and language features available to do things differently, in a way that makes it possible and practical to follow the guidelines; while legacy code remains unchecked or incrementally improved.
For any new code base this guideline shouldn't be a problem and existing projects can just adapt the usage of this rule to fit their situation by applying it selectively in the code base and/or changing the options.

This is a very different situation, IMO. This is a check that is almost impossible to comply with when developing new code because it is insufficiently clear on what constitutes "related" parameters. Two adjacent types in a function parameter list is an incredibly common situation that arises naturally when writing code (especially for constructors, where your parameter order has to match the declaration order within the class) and designing around that from scratch is not always possible. Retroactively applying this rule to an existing code base would be nearly impossible for a project of any size.

whisperity edited the summary of this revision. (Show Details)Apr 22 2020, 9:24 AM
  • Better organisation of code, cleanup of debug prints, fix nomenclature of functions
  • Explicitly mention the first and last param of the range for clarity
  • Downlift the logic of joining and breaking ranges to main patch (this isn't terribly useful at this point, but both subsequent improvements to the check depend on this change)
    • Basically no longer assume that if param N can be swapped with param 1, then it can also be swapped with 2, 3, etc. without checking.
  • Made the list of ignored parameter and type names configurable as a checker option
  • Changed the default value of MinimumLength to 2, as prescribed in the guideline's text.
whisperity marked an inline comment as done.
whisperity added inline comments.
clang-tools-extra/docs/clang-tidy/checks/experimental-cppcoreguidelines-avoid-adjacent-parameters-of-the-same-type.rst
136–139

Why are these typos still here, typedef??? instead of typename.

whisperity marked an inline comment as not done.Apr 22 2020, 9:58 AM