Page MenuHomePhabricator

[clang-tidy] Suspicious Call Argument checker
Needs RevisionPublic

Authored by barancsuk on May 26 2016, 10:38 AM.

Details

Summary

This checker finds those function calls where the function arguments are provided in an incorrect order. It compares the name of the given variable to the argument name in the function definition.
It issues a message if the given variable name is similar to an another function argument in a function call. It uses case insensitive comparison.
If the warnings from this checker are false positive in a sense that no semantic issue is found, it can still indicate bad naming convention for some variables

Diff Detail

Event Timeline

varjujan updated this revision to Diff 58644.May 26 2016, 10:38 AM
varjujan retitled this revision from to [clang-tidy] Suspicious Call Argument checker.
varjujan updated this object.
varjujan added a reviewer: alexfh.
varjujan added subscribers: xazax.hun, cfe-commits.
varjujan updated this object.May 27 2016, 5:10 AM
alexfh edited edge metadata.May 27 2016, 6:54 AM

Thank you for the new check!

Before starting with the review, I'd like to clarify one important thing. It's not immediately obvious that the pattern the check detects is actually a good indicator of a programming mistake. Did you try to run the check on a large enough codebase (at least, on the whole LLVM project) and analyze the results? http://clang.llvm.org/extra/clang-tidy/#running-clang-tidy-on-llvm describes the recommended way to run clang-tidy on LLVM.

Yes, I did. The results from running the checker on LLVM are in the attached file. Sadly, I could'nt find any real mistakes but as I wrote in the summary, false positives can still indicate bad naming convention for some variables.

alexfh added a comment.Jun 3 2016, 3:09 PM

Yes, I did. The results from running the checker on LLVM are in the attached file. Sadly, I could'nt find any real mistakes but as I wrote in the summary, false positives can still indicate bad naming convention for some variables.

It looks like the check doesn't meet the quality bar, at least, in its current form. "Bad naming convention" is a very arguable thing and I'm not sure we can claim that the pattern detected by the check is somehow an indication of a bad naming convention.

alexfh requested changes to this revision.Jun 3 2016, 3:10 PM
alexfh edited edge metadata.
This revision now requires changes to proceed.Jun 3 2016, 3:10 PM
varjujan updated this revision to Diff 82859.Jan 3 2017, 3:49 AM
varjujan edited edge metadata.

I have implemented some more heuristics to achieve better results.

I ran the check on multiple projects and tried to categorize the warnings: real errors, false positives, naming errors and coincidences. The results are attached. I got no warnings on LLVM.

alexfh requested changes to this revision.Jan 3 2017, 7:46 AM
alexfh edited edge metadata.

I ran the check on multiple projects and tried to categorize the warnings: real errors, false positives, naming errors and coincidences. The results are attached. I got no warnings on LLVM.

I didn't find any "real errors" in the files you posted, which means that either all of these projects are of extreme code quality or that the error is rather unlikely to happen.

Another concern is that the check seems to treat argument types in a rather primitive way, e.g. it doesn't consider any type conversions or promotions.

This revision now requires changes to proceed.Jan 3 2017, 7:46 AM

I think this might be better as a readability checker to find misleading variable or parameter names.

It would also be great to consider types. Unfortunately it probably means reimplementing some of the logic from Sema, since that information is not available at this point.

@varjujan
Do you actually use all of the heuristics that are implemented?

@xazax.hun
Yes I do. Obviously some of them seem to be better than the others so I can remove a couple if needed.

whisperity added a project: Restricted Project.May 12 2017, 2:40 AM
whisperity added subscribers: gsd, dkrupp, whisperity, o.gyorgy.
barancsuk commandeered this revision.Aug 31 2017, 3:12 AM
barancsuk added a reviewer: varjujan.
barancsuk added a subscriber: barancsuk.

Since the previous author, @varjujan is not available anymore, I take over the role of the author of this revision.

barancsuk updated this revision to Diff 113376.EditedAug 31 2017, 4:58 AM
barancsuk edited edge metadata.

Major changes that have been made since the last update are as follows:

  • The checker is moved from the module misc to readability
  • It is checked, whether implicit type conversion is possible from the argument to the other parameter. The following conversion rules are considered:
    • Arithmetic type conversions
    • Pointer to pointer conversion (for multilevel pointers and also base/derived class pointers)
    • Array to pointer conversion
    • Function to function-pointer conversion
    • CV-qualifier compatibility is checked as well.
  • The calculation of a heuristic`s result and the comparison of this result with the corresponding threshold value are performed by the same method, the heuristic`s function.
  • The heuristic`s function is called only if the heuristic itself is turned on by the configuration settings.

Remark:
Implicit conversion rules of C are not checked separately, because, as I experienced when testing the checker on a larger code base, deeming all pointers convertible results in several false positives.

barancsuk updated this revision to Diff 113830.Sep 5 2017, 4:07 AM

Check if argument and parameter numbers differ, add test cases for functions with default parameters

barancsuk added a comment.EditedSep 15 2017, 6:32 AM

@alexfh, would you mind taking a look at the changes that have been introduced in the new patch?

The main improvements are:

  • The checker has been shifted to the module readability.
  • It is checked, whether implicit type conversion is possible from the argument to the parameter.

I have run the modified checker on some large code bases with the following results:
(The errors were categorized subjectively.)

  • PostgreSQL: 32 renaming opportunities of 39 warnings
  • Cpython: 10 renaming opportunities of 15 warnings
  • Xerces: 6 renaming opportunities of 8 warnings
  • FFmpeg: 5 renaming opportunities of 9 warnings
  • OpenSSL: 3 renaming opportunities of 4 warnings
  • LLVM: 20 renaming opportunities of 44 warnings

This article provides some evidence to support the feasibility of the checker as well.


The authors have proven that argument names are generally very similar to the corresponding parameters' names.
The presented empirical evidence also shows, that argument and parameter name dissimilarities are strong indicators of incorrect argument usages, or they identify renaming opportunities to improve code readability.
Moreover, the authors have even found 3 existing bugs in open source projects.

@alexfh, would you mind taking a look at the changes that have been introduced in the new patch?

The main improvements are:

    • The checker has been shifted to the module readability.
    • It is checked, whether implicit type conversion is possible from the argument to the parameter.

      I have run the modified checker on some large code bases with the following results: (The errors were categorized subjectively.)
  • PostgreSQL: 32 renaming opportunities of 39 warnings
  • Cpython: 10 renaming opportunities of 15 warnings
  • Xerces: 6 renaming opportunities of 8 warnings
  • FFmpeg: 5 renaming opportunities of 9 warnings
  • OpenSSL: 3 renaming opportunities of 4 warnings
  • LLVM: 20 renaming opportunities of 44 warnings

Is there a list of all the warnings? I'd like to take a closer look.

This article provides some evidence to support the feasibility of the checker as well.


The authors have proven that argument names are generally very similar to the corresponding parameters' names.
The presented empirical evidence also shows, that argument and parameter name dissimilarities are strong indicators of incorrect argument usages, or they identify renaming opportunities to improve code readability.
Moreover, the authors have even found 3 existing bugs in open source projects.

I attached the results of the tests.
The warnings are categorized into false positives and renaming opportunities.

@alexfh, have you had a chance to look at the results yet?

szepet added a subscriber: szepet.Oct 8 2017, 10:06 AM

Sorry, I lost this patch. I've looked at the results and it still seems that the signal-to-noise ratio is quite low. There's definitely potential in using parameter name and argument spelling to detect possibly swapped arguments, and there's a recent research on this topic, where authors claim to have reached the true positive rate of 85% with decent recall. See https://research.google.com/pubs/pub46317.html. If you're interested in working on this check, I would suggest at least looking at the techniques described in that paper.

alexfh requested changes to this revision.Jan 5 2018, 6:25 AM
This revision now requires changes to proceed.Jan 5 2018, 6:25 AM