This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add 'readability-suspicious-call-argument' check
ClosedPublic

Authored by whisperity on May 26 2016, 10:38 AM.

Details

Summary

Find function calls where the call arguments might be provided in an incorrect order.
The check works by comparing the name of the arguments with the name of the parameters in the visible function declaration called.
A diagnostic is emitted if an argument name is similar to another parameter more than the one it is passed to currently, while also being dissimilar enough from the current one.

Several string metrics are implemented, and each has thresholds configurable by the user.

As this is a heuristics-based check, no FixIts are generated, on purpose.
False-positive (in the sense that the diagnostic isn't indicating an actual swap that was done) warnings from this check are still useful for developers as the findings indicate potential bad naming conventions used for variables and parameters.


Originally implemented by @varjujan as his Master's Thesis work.
The check was subsequently taken over by @barancsuk who added type conformity checks to silence false positive matches.

Diff Detail

Event Timeline

varjujan updated this revision to Diff 58644.May 26 2016, 10:38 AM
varjujan retitled this revision from to [clang-tidy] Suspicious Call Argument checker.
varjujan updated this object.
varjujan added a reviewer: alexfh.
varjujan added subscribers: xazax.hun, cfe-commits.
varjujan updated this object.May 27 2016, 5:10 AM
alexfh edited edge metadata.May 27 2016, 6:54 AM

Thank you for the new check!

Before starting with the review, I'd like to clarify one important thing. It's not immediately obvious that the pattern the check detects is actually a good indicator of a programming mistake. Did you try to run the check on a large enough codebase (at least, on the whole LLVM project) and analyze the results? http://clang.llvm.org/extra/clang-tidy/#running-clang-tidy-on-llvm describes the recommended way to run clang-tidy on LLVM.

Yes, I did. The results from running the checker on LLVM are in the attached file. Sadly, I could'nt find any real mistakes but as I wrote in the summary, false positives can still indicate bad naming convention for some variables.

alexfh added a comment.Jun 3 2016, 3:09 PM

Yes, I did. The results from running the checker on LLVM are in the attached file. Sadly, I could'nt find any real mistakes but as I wrote in the summary, false positives can still indicate bad naming convention for some variables.

It looks like the check doesn't meet the quality bar, at least, in its current form. "Bad naming convention" is a very arguable thing and I'm not sure we can claim that the pattern detected by the check is somehow an indication of a bad naming convention.

alexfh requested changes to this revision.Jun 3 2016, 3:10 PM
alexfh edited edge metadata.
This revision now requires changes to proceed.Jun 3 2016, 3:10 PM
varjujan updated this revision to Diff 82859.Jan 3 2017, 3:49 AM
varjujan edited edge metadata.

I have implemented some more heuristics to achieve better results.

I ran the check on multiple projects and tried to categorize the warnings: real errors, false positives, naming errors and coincidences. The results are attached. I got no warnings on LLVM.

alexfh requested changes to this revision.Jan 3 2017, 7:46 AM
alexfh edited edge metadata.

I ran the check on multiple projects and tried to categorize the warnings: real errors, false positives, naming errors and coincidences. The results are attached. I got no warnings on LLVM.

I didn't find any "real errors" in the files you posted, which means that either all of these projects are of extreme code quality or that the error is rather unlikely to happen.

Another concern is that the check seems to treat argument types in a rather primitive way, e.g. it doesn't consider any type conversions or promotions.

This revision now requires changes to proceed.Jan 3 2017, 7:46 AM

I think this might be better as a readability checker to find misleading variable or parameter names.

It would also be great to consider types. Unfortunately it probably means reimplementing some of the logic from Sema, since that information is not available at this point.

@varjujan
Do you actually use all of the heuristics that are implemented?

@xazax.hun
Yes I do. Obviously some of them seem to be better than the others so I can remove a couple if needed.

whisperity added a project: Restricted Project.May 12 2017, 2:40 AM
whisperity added subscribers: gsd, dkrupp, whisperity, o.gyorgy.
barancsuk commandeered this revision.Aug 31 2017, 3:12 AM
barancsuk added a reviewer: varjujan.
barancsuk added a subscriber: barancsuk.

Since the previous author, @varjujan is not available anymore, I take over the role of the author of this revision.

barancsuk updated this revision to Diff 113376.EditedAug 31 2017, 4:58 AM
barancsuk edited edge metadata.

Major changes that have been made since the last update are as follows:

  • The checker is moved from the module misc to readability
  • It is checked, whether implicit type conversion is possible from the argument to the other parameter. The following conversion rules are considered:
    • Arithmetic type conversions
    • Pointer to pointer conversion (for multilevel pointers and also base/derived class pointers)
    • Array to pointer conversion
    • Function to function-pointer conversion
    • CV-qualifier compatibility is checked as well.
  • The calculation of a heuristic`s result and the comparison of this result with the corresponding threshold value are performed by the same method, the heuristic`s function.
  • The heuristic`s function is called only if the heuristic itself is turned on by the configuration settings.

Remark:
Implicit conversion rules of C are not checked separately, because, as I experienced when testing the checker on a larger code base, deeming all pointers convertible results in several false positives.

barancsuk updated this revision to Diff 113830.Sep 5 2017, 4:07 AM

Check if argument and parameter numbers differ, add test cases for functions with default parameters

barancsuk added a comment.EditedSep 15 2017, 6:32 AM

@alexfh, would you mind taking a look at the changes that have been introduced in the new patch?

The main improvements are:

  • The checker has been shifted to the module readability.
  • It is checked, whether implicit type conversion is possible from the argument to the parameter.

I have run the modified checker on some large code bases with the following results:
(The errors were categorized subjectively.)

  • PostgreSQL: 32 renaming opportunities of 39 warnings
  • Cpython: 10 renaming opportunities of 15 warnings
  • Xerces: 6 renaming opportunities of 8 warnings
  • FFmpeg: 5 renaming opportunities of 9 warnings
  • OpenSSL: 3 renaming opportunities of 4 warnings
  • LLVM: 20 renaming opportunities of 44 warnings

This article provides some evidence to support the feasibility of the checker as well.


The authors have proven that argument names are generally very similar to the corresponding parameters' names.
The presented empirical evidence also shows, that argument and parameter name dissimilarities are strong indicators of incorrect argument usages, or they identify renaming opportunities to improve code readability.
Moreover, the authors have even found 3 existing bugs in open source projects.

@alexfh, would you mind taking a look at the changes that have been introduced in the new patch?

The main improvements are:

  • The checker has been shifted to the module readability.
  • It is checked, whether implicit type conversion is possible from the argument to the parameter.

I have run the modified checker on some large code bases with the following results:
(The errors were categorized subjectively.)

  • PostgreSQL: 32 renaming opportunities of 39 warnings
  • Cpython: 10 renaming opportunities of 15 warnings
  • Xerces: 6 renaming opportunities of 8 warnings
  • FFmpeg: 5 renaming opportunities of 9 warnings
  • OpenSSL: 3 renaming opportunities of 4 warnings
  • LLVM: 20 renaming opportunities of 44 warnings

Is there a list of all the warnings? I'd like to take a closer look.

This article provides some evidence to support the feasibility of the checker as well.


The authors have proven that argument names are generally very similar to the corresponding parameters' names.
The presented empirical evidence also shows, that argument and parameter name dissimilarities are strong indicators of incorrect argument usages, or they identify renaming opportunities to improve code readability.
Moreover, the authors have even found 3 existing bugs in open source projects.

I attached the results of the tests.
The warnings are categorized into false positives and renaming opportunities.

@alexfh, have you had a chance to look at the results yet?

szepet added a subscriber: szepet.Oct 8 2017, 10:06 AM

Sorry, I lost this patch. I've looked at the results and it still seems that the signal-to-noise ratio is quite low. There's definitely potential in using parameter name and argument spelling to detect possibly swapped arguments, and there's a recent research on this topic, where authors claim to have reached the true positive rate of 85% with decent recall. See https://research.google.com/pubs/pub46317.html. If you're interested in working on this check, I would suggest at least looking at the techniques described in that paper.

alexfh requested changes to this revision.Jan 5 2018, 6:25 AM
This revision now requires changes to proceed.Jan 5 2018, 6:25 AM

I have developed a related check in D69560. That one considers types, but is an interface rule checker, and does not consider (any) potential call sites. Moreover, it does not consider "swaps" that happen across a function call, only, as the name implies, adjacent similar-type ranges.

Maybe one could lift the "is-similar-type", or rather, "is-accidentally-mixable-type" related ruling to some common location, and use type similarity as a precondition gate in the reports of this check?

docs/clang-tidy/checks/misc-redundant-expression.rst
18 ↗(On Diff #113830)

This seems to be an unrelated diff.

test/clang-tidy/misc-redundant-expression.cpp
20 ↗(On Diff #113830)

This entire file seems to be unrelated to the discussion at hand, perhaps a rebase went sideways?

whisperity commandeered this revision.Apr 23 2020, 3:33 AM
whisperity added a reviewer: barancsuk.

Assuming direct control. Previous colleagues and university mates departed for snowier pastures, time to try to do something with this check.

whisperity retitled this revision from [clang-tidy] Suspicious Call Argument checker to [clang-tidy] Add 'readability-suspicious-call-argument' check.
whisperity edited the summary of this revision. (Show Details)
whisperity removed reviewers: varjujan, barancsuk.
whisperity set the repository for this revision to rG LLVM Github Monorepo.
whisperity edited subscribers, added: varjujan, zporky; removed: gsd.

First things first, we were 50 thousand (!) patches behind reality. Rebased to master. Fixed it to compile, too. Otherwise, NFC so far.

Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2020, 3:40 AM
whisperity planned changes to this revision.Apr 23 2020, 3:41 AM
whisperity edited the summary of this revision. (Show Details)

Right, let's bump. I've checked the output of the checker on a set of test projects (our usual testbed, I should say) which contains around 15 projects, half of which was C and the other half C++. All projects were analysed independently in a virtual machine. All projects were analysed after checking out the latest release tag that was closest to Jan 2019. (I know this sounds dated, but the whole reproduction image was originally constructed for another checker I'm working on, and I did not wish to re-do the whole "What is needed to install this particular version?" pipeline.) The more important thing is that releases were analysed, which should, in theory, under-approximate the findings because releases are generally more polished than in-the-works code. The check was run as-is (sans a minor rebase issue that does not affect functionality) currently in the patch, namely, Diff 259508.

In total, I had received 194 reports (129 unique (1)). From this, I had found 8 (7 unique) true positives (marked Confirmed bug in CodeChecker terms), where something is visibly wrong with the call.
From this, there were 3 in a project where the called function's comment said that "The order of arguments <> and <> does not matter.", however, because this can never be figured out totally from the checker, I regarded the swapped call as a true positive. The fact that they had to seemingly create, inside the called function, some logic to detect and handle the swap shows that something was going wrong with the code's design for a long time.

In addition to these findings, I have identified 122 (75 unique) function call sites where the report about the potential swap (and the similarity to a different parameter name) is justifiable because the understandability of the code (especially to someone who is an outsider from virtually all of the projects analysed (2)) is hindered by the poor choice of argument or parameter names. The conclusions from these cases (marked Intentional in the CodeChecker database) are consistent with those drawn in [Pradel2013] and [Liu2016].

Now, onto the false positives. There were 64 (47 unique) cases. However, these cases can be further broken down into different categories, which I wasn't able to tally easily as CodeChecker only supports 3 unique categories, not infinite ones.

  • Some of the false positives are what one would say "borderline": if the person reading the code reads it accurately, the reported "swap" does not fall into the understandability issue category. However, a famous case of this is from Postgres: reloid (IIRC, meaning relation owner id) and roleid (role (user) id) are the names of args/params of some functions. They are not swapped in the calls that exist in the code, but the similarity (swapping only 2 letters) makes it very easy to typo or misread the thing. In Postgres, there were 7 such cases.
  • Approximately 5+5 cases are false positives but can be dealt with in heuristics. However, across the 17 projects, they do not account for a sizeable amount of cases.
    • Recursive function calls should be ignored. (This was mentioned in [Rice2017].)
    • Swapped and not-swapped calls appearing close to one another (i.e. constructs like b ? f(x,y) : f(y,x)) should be ignored too. This seems a bit harder to implement.
  • There is a bug in the check's current implementation when calls from/to operator() is handled. (3) I'll look into fixing this.
  • Binary operators should be ignored from reporting in general. Their parameters tend to have generic names, and the reports created from them is confusing.

Another generic observation is that the check's output is pretty wonky and hard to read at a glance, but this should be easy to fix. In addition, the observation of [Rice2017] about ignoring "patterned names" (i.e. arg1, arg2, ...) seems like a useful thing to add to the implementation, even though I had no findings at all where "ignoring patterned names" would've squelched a false positive report.

Agreeably, this check is limited compared to the previously linked [Rice2017], as it only checks the names in the call, not all variables in the surrounding context.


(1): I'm not exactly sure as to what "report uniqueing" in CodeChecker precisely does these days, but basically it uses the context of the bug and the checker message and whatnot to create a hash for the bug - "unique reports" mode shows each report belonging to the same hash only once.
(2): From the set of test projects, I only have some hands-on experience with parts of LLVM and Apache Xerces.
(3): Codes similar in nature to the following example of exact value forwarding to the call operator seems to still trigger the check. I have yet to actually pin down what causes this.

struct S
{
    int operator()(int a, int b, const char* c, double d);
};

struct T
{
    S* s;
    int operator(int a, int b, const char* c, double d)
    {
        return (*s)(a, b, c, d);
    }
};

[Pradel2013]: Michael Pradel, and Thomas R. Gross: Name-based analysis of equally typed method arguments. In: IEEE Transactions on Software Engineering, 39(8), pp. 1127-1143, 2013.
[Liu2016]: Hiu Liu, et al.: Nomen est Omen: Exploring and exploiting similarities between argument and parameter names. In: 38th IEEE International Conference on Software Engineering, pp. 1063-1073, 2016.
[Rice2017]: Andrew Rice, et al.: Detecting argument selection defects. In: Proceedings of the ACM on Programming Languages, 1, pp. 104:1-104:23, 2017.

But of course, after having written all of that, I forgot to upload the results themselves... 😲

Right, let's bump.

Thank you for all of the detailed information on the performance of the check! I worked on a similar check in a recent past life and my intuition is that over a large corpus of code, this will still have quite a bit of false positives and false negatives. However, I think those can likely be handled by post-commit improvements like noticing abbreviations (def vs define) or synonyms (number vs numeral), being smarter about patterned names, etc. I think this check is worth moving forward with, but I'm not certain how @alexfh feels about it.

(Typing this in here so it is not forgotten.) @aaron.ballman suggested in http://reviews.llvm.org/D69560#inline-893813 that a case of argument and parameter swaps could be discovered between forward declarations and the definitions, i.e.

void fn(int x, int y);

void fn(int y, int x) { ... } // Proparly mistakenly swapped names!

The conclusion was that this check (given this one does string distance, metrics, etc.) should be extended with such functionality. However, we need a better name for the check in that case!

whisperity edited the summary of this revision. (Show Details)
whisperity added reviewers: hokein, njames93.
  • Massively refactored and modernised the implementation
  • Removed spaghetti code related to the check options and made their storage and defaults clearer
  • Fixed modelling issues that caused false positives around lambdas, overloaded call operators, recursive calls, enumconstant arguments
  • Made the abbreviation dictionary for the Abbreviations heuristic a check-option
  • Made the check message much more legible
  • Fixed a severe modelling issue about not respecting or handling record types passed by copy (CXXConstructExpr)
  • Wrote the documentation for real, including all heuristics, default values, options, etc. The original documentation was basically empty. For the docs, I used @varjujan's thesis (which subject was this check), which is unfortunately only available in Hungarian, but it helped me understand what some of the enums and the Bound doing on the inside.
  • Thanks to having written a proper documentation, I also renamed a few symbols and check options to better tell what they are about.

Pinging @alexfh for opinions about the check (especially any concerns about true or false positive rates). I continue to think this is a good check that's worth moving forward on.

clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
31–32 ↗(On Diff #329666)

signed char since we're doing > -1 below? Or better yet, int8_t because these aren't really characters?

51 ↗(On Diff #329666)

We should probably document where all these numbers come from, but 66 definitely jumps out at me as being a bit strange. :-D

93 ↗(On Diff #329666)
102–108 ↗(On Diff #329666)
271–272 ↗(On Diff #329666)
296 ↗(On Diff #329666)
301–302 ↗(On Diff #329666)

Can re-flow the comments.

341 ↗(On Diff #329666)

Elsewhere we're using isPointerType() which is subtly different because it excludes ObjC object pointers. We should be consistent about the usage.

347 ↗(On Diff #329666)

It seems like we're doing an awful lot of the same work as ASTContext::typesAreCompatible() and type compatibility rules are pretty complex, so I worry about this implementation being different than the ASTContext implementation. Have you explored whether we can reuse more of the logic from ASTContext here, or are they doing fundamentally different kinds of type compatibility checks?

462 ↗(On Diff #329666)
467 ↗(On Diff #329666)
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.h
38 ↗(On Diff #329666)

The comment is somewhat confusing because the enumerators have the values 0 and 1, which are valid percentages but not likely what the comment means.

whisperity marked 8 inline comments as done.Mar 18 2021, 2:45 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
51 ↗(On Diff #329666)

Unfortunately, I have absolutely no idea. All these values are percentages between 0 and 100 (-1 is just saying that "This heuristic doesn't accept percentages"), and this is written in the documentation now. However, the answer to "why 66%?", unless @varjujan can say something, I think is lost to history...

I'll read over his thesis once again, maybe I can find anything with regards to this.

Either way, I've detailed from both the code and the thesis how the percentages are meant. In some cases, the % is calculated as "% of the longer string's length". In the Leventhstein's case, it's actually inverted:

Dist = (1 - Dist / LongerLength) * 100;

So what this says is that if the current arg1-param1 arg2-param2 pairing has less than the inverse of 50% (which is more than 50%) of the longer string's edit distance, but the arg2-param1 and arg1-param2 (the suggested swapped order) has more than the inverse of 66% (which is less than 33%), then the swap will be suggested.

Originally these values were called LowerBound and UpperBound, respectively, which was saying even less about what they mean...

271–272 ↗(On Diff #329666)

Good catch! I believe this is what happens when you write LaTeX and code at the same time? I didn't notice this when I was tidying up the code...

whisperity added inline comments.Mar 18 2021, 2:45 AM
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
347 ↗(On Diff #329666)

No, I didn't know that function even existed. This check must be older than that function.

varjujan added inline comments.Mar 18 2021, 3:36 AM
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
51 ↗(On Diff #329666)

Sadly, I think there isn't any scientific reason behind these numbers. They just looked ok after a couple of test runs. (Maybe they make sense for shorter arg names, like the 66 for 3 char long names.)

whisperity marked 3 inline comments as done.

NFC Made the code more legible, updated and clarified some comments, fixed grammar issues.

clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
31–32 ↗(On Diff #329666)

Oh right, I always forget char isn't guaranteed. int8_t seems like a better idea anyways (until we have int6_t...)

51 ↗(On Diff #329666)

In [Rice2017], the inflexion point of the precision/recall plot is at around 0.55-ish threshold. They express this threshold as the distance of distances, i.e. 0 would mean the (a1, p1) - (a2, p2) pair is good as it is, and 1 would mean that it should definitely be (a2, p1) - (a1, p2) instead.

347 ↗(On Diff #329666)

Actually, no, that function is pretty old... However, that function, and all the function it subsequently calls, require a non-const ASTContext. I have changed ASTContext:

╰─ git diff --cached --stat
 clang/include/clang/AST/ASTContext.h                                     | 67 ++++++++++++++++++++++++++++++++++++-------------------------------
 clang/lib/AST/ASTContext.cpp                                             | 72 +++++++++++++++++++++++++++++++++++++-----------------------------------
 2 files changed, 73 insertions(+), 66 deletions(-)

making related member functions and internal static functions take const ASTContext &/*.

This, by itself, did not break any of the tests of check-clang check-clang-unit check-clang-tools check-clang-extra-unit!

347–349 ↗(On Diff #329666)

@aaron.ballman Changing the function to be

return Ctx.typesAreCompatible(ArgType, ParamType);

will make the checker miss the test case about T/const T& mixup.

void value_const_reference(int llllll, const int& kkkkkk);
void const_ref_value_swapped() {
  const int& kkkkkk = 42;
  const int& llllll = 42;
  value_const_reference(kkkkkk, llllll);
  // error: CHECK-MESSAGES: expected string not found in input:
  // warning: 1st argument 'kkkkkk' (passed to 'llllll') looks like it might be swapped with the 2nd, 'llllll' (passed to 'kkkkkk')
}

Setting bool CompareUnqualified = true (3rd argument to typesAreCompatible) doesn't help either.
Which is valid from typesAreCompatible's perspective... that function answer the question, applied to the context of the above test: "Is llllll = kkkkkk; valid?", which is obviously false as both are const T&s.

clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.h
38 ↗(On Diff #329666)

I'll rename it to BoundKind and update the comments.

aaron.ballman added inline comments.Mar 18 2021, 5:40 AM
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
51 ↗(On Diff #329666)

Thank you for the explanations! I'm fine with the values as they are (they're defaults that can be changed anyway).

164 ↗(On Diff #329666)
347 ↗(On Diff #329666)

Doubtful -- typesAreCompatible() is critical for checking the semantics of assignment in C, overloading in C++, etc. It may have simply been overlooked when writing this check. Given the complexities of type checking, my intuition is that we should be leaning on ASTContext for as much of this functionality as we can get away with. That will also get us nice "extras" like caring about address spaces, ARC, etc which are what got me worried when I started looking at this implementation.

624 ↗(On Diff #329666)

TIL about %ordinal in diagnostics, thanks for that! :-D

632–634 ↗(On Diff #329666)
644 ↗(On Diff #329666)
652–656 ↗(On Diff #329666)
657–660 ↗(On Diff #329666)

Can this case happen?

707–708 ↗(On Diff #329666)

Should the length of what's considered "too short" be a configuration option? I think 3 is a good default.

739 ↗(On Diff #329666)

Otherwise it looks like we could read an uninitalized value later (if GotBound was None).

739–741 ↗(On Diff #329666)

There's some type confusion going on here between unsigned char and char -- I think all of these uses should switch to uint8_t or int8_t (or possibly just int given that these values all wind up being promoted to int anyway).

743–744 ↗(On Diff #329666)

Any reason not to move this below the switch and use = instead of |= within the cases? (Or return from the cases directly?)

clang-tools-extra/docs/clang-tidy/checks/readability-suspicious-call-argument.rst
61–68 ↗(On Diff #329666)

I wonder how Hungarian notation impacts this heuristic -- I would imagine a lot of similar prefixes in such a code base, and things like lpstr as a prefix could be a pretty large chunk of some identifiers.

75 ↗(On Diff #329666)

Similar to above, I wonder how numeric digits impact this heuristic -- do the defaults consider this to be a swap?

void foo(int frobble1, int frobble2);
foo(frobble2, frobble1); // Hopefully identified as a swap
foo(bar2, bar1); // How about this?
whisperity marked an inline comment as done.Mar 18 2021, 6:00 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
657–660 ↗(On Diff #329666)

Oops... It seems I posted the updated patch right where you were writing more comments and we got into a data race. Which case are you referring to? It's now affixed to a diag( call for me...

clang-tools-extra/docs/clang-tidy/checks/readability-suspicious-call-argument.rst
61–68 ↗(On Diff #329666)

The switch is only warned if it would be type-safe. If the HN prefix is in both the same way, then it could be ignored. Thus, given f(const char* lpszFoo, const char* lpszBar, uint16_t psnzXXX) {}, if I do a f(lpszX, lpszA, ...);, it should consider in both cases that the prefix is common and matches. Note that to produce a diagnostic, two things has to be proven: first, that the current ordering is dissimilar (below threshold A), and second, that the potential swapped ordering is more similar (above threshold B).

75 ↗(On Diff #329666)

Currently, neither of these are matched. I have to look into why the first isn't... it really should, based on the "equality" heuristic. It's too trivial.

The second... well... that's trickier. I would say it shouldn't match, because if it did, we would be swamped with false positives. The suffix is only 1 character, and we need 25/30% based on the string's length.

aaron.ballman added inline comments.Mar 18 2021, 6:17 AM
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
347–349 ↗(On Diff #329666)

Yeah, I expect there to be a delta between the work this check is doing and the existing work done by typesAreCompatible(). However, given the complexity of type compatibility checking, I'd say it's better for us to try to refactor the ASTContext functionality so that we can share as much of the implementation as plausible rather than duplicate some really difficult logic in the tidy check.

657–660 ↗(On Diff #329666)

Hehe, it's "fun" when the comments move around like this, isn't it? :-D I meant the else clause in:

for (std::size_t I = 0, E = CalleeFuncDecl->getNumParams(); I != E; ++I) {
  if (const ParmVarDecl *Param = CalleeFuncDecl->getParamDecl(I)) {
    ParamTypes.push_back(Param->getType());

    if (IdentifierInfo *II = Param->getIdentifier()) {
      ParamNames.push_back(II->getName());
    } else {
      ParamNames.push_back(StringRef());
    }
  } else { // This seems like it should be impossible, no?
    ParamTypes.push_back(QualType());
    ParamNames.push_back(StringRef());
  }
}
clang-tools-extra/docs/clang-tidy/checks/readability-suspicious-call-argument.rst
75 ↗(On Diff #329666)

I agree that the first one should be caught and I also agree that the second one is tricky but that matching it would likely increase false positives.

whisperity marked 12 inline comments as done.Mar 18 2021, 6:32 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
657–660 ↗(On Diff #329666)

Oh, nevermind, there is a button that shows me the older diff where it's aligned properly.

And yeah, it seems it can't, getParamDecl always returns a ParmVarDecl. Weird issues might arise when the vectors that are built here get out of sync (such as the issue we had with operator() calls before I fixed it!), so I understood the reason behind keeping the two functions parallel with each other in terms of pure visuals, even.

743–744 ↗(On Diff #329666)

Returning from the cases directly is a bad idea because we want to try all heuristics and only say false if none of them matches.
But this break in a very bad location, I agree.

whisperity marked 2 inline comments as done.Mar 18 2021, 6:33 AM
whisperity marked 2 inline comments as done.
  • NFC Fixed some nits
  • Added a new check option, MinimumIdentifierNameLength instead of a hardcoded 3 value. Defaults to 3.
  • Fixed an issue with heuristics matching only in one direction but not in the other direction silenced a warning that clearly should have been there.
whisperity added inline comments.Mar 19 2021, 7:03 AM
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
347 ↗(On Diff #329666)

I talked with @aaron.ballman in private about this a little, and unfortunately, the route to call typesAreCompatible() is no dice. First things first, in C++ mode, that function just early returns (essentially) == on the type. Otherwise, it would call QualType ASTContext::mergeTypes(QualType, QualType, ...). Okay, let's delve into this thing. Now, first things first, somewhere down the line if you happen to give it something C++-specific (e.g. LValueReferenceType), it will just assert into your face. (Because C++ things are only compatible if the types are equal, most likely. In C++ mode, mergeType shouldn't be, and doesn't get, called.)

But there are even more issues with this function. Put simply, it does not perform any "implicit conversions". Sic!, I'm putting it in between quotes, but the fact is that if one type is const and the other isn't, it will just say "Nope, these are not compatible.".
So modelling the whole idea of "is passing an argument of type T acceptable to a parameter of type U?" is not possible with these functions.

So even though typesAreCompatible()'s documentation comment says:

Compatibility predicates used to check assignment expressions.

by far and large, we are not having the right kinds of types in our hands (in terms of this check) to fall back to this logic.

I've done a skim of the codebase to find call sites for these functions and try to explore my way back and forth from there to see how this function should be used, and the results are daunting.

The image is from lib/Sema/SemaExpr.cpp somewhere along the lines of checking address spaces (for Obj-C, the vast majority of this function seems to only deal with Obj-C-specific things!), from checkConditionalPointerCompatibility. The call to mergeTypes() is in the bottom left, the right pane is the continuation below on the result here is used.

It seems to me (and many other call sites to this function tend to behave the same way) that Sema does a lot of casting-away-things (like constness) before it can reasonably call this function, and if the result comes back valid, this function (and many others) then just re-apply manually the things that were originally cast off.

Thus, I see that if we would call this function in the check, a lot (if not all, or maybe even more than now!) of the logic that is currently here would still have to remain to prefix and suffix the call to mergeTypes().

Just for the sake of trying it all out, I put together a version where I leave the handling of the "C++ stuff" (like references) to the checker, but the rest back to the ASTContext, but it didn't fall into place:

ParamType = PointerType 0x5627819d0930 'const int *'
`-QualType 0x5627819cfef1 'const int' const
  `-BuiltinType 0x5627819cfef0 'int'
ArgType = PointerType 0x562781a49290 'int *'
`-BuiltinType 0x5627819cfef0 'int'
mergeType(Param, Arg) = <<<NULL>>>
mergeType(Arg, Param) = <<<NULL>>>
347 ↗(On Diff #329666)
╰─ git diff --cached --stat
 clang/include/clang/AST/ASTContext.h                                     | 67 ++++++++++++++++++++++++++++++++++++-------------------------------
 clang/lib/AST/ASTContext.cpp                                             | 72 +++++++++++++++++++++++++++++++++++++-----------------------------------
 2 files changed, 73 insertions(+), 66 deletions(-)

whisperity edited the summary of this revision. (Show Details)

Rebase over D98635. Highlighting only the parameter's name, not the entire range of the parameter.

whisperity marked 4 inline comments as done.Apr 10 2021, 11:45 AM

@aaron.ballman @alexfh Bump! How shall we move further?

whisperity edited the summary of this revision. (Show Details)
  • NFC Fix lint of header guard nomenclature
  • NFC Tear out obsolete LLVM_DEBUG calls from the implementation

NFC Fix the header guard name lint again, hopefully to the proper value this time. I forgot the CHECK out of it...

NFC Rebase and fix things broken by D104819.

aaron.ballman added inline comments.Jun 25 2021, 10:08 AM
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
185 ↗(On Diff #354467)

Should this be a case insensitive comparison?

207 ↗(On Diff #354467)
268 ↗(On Diff #354467)
299 ↗(On Diff #354467)
371 ↗(On Diff #354467)
403 ↗(On Diff #354467)
408–409 ↗(On Diff #354467)

Should this move down closer to where it's used?

425–427 ↗(On Diff #354467)
437 ↗(On Diff #354467)
453 ↗(On Diff #354467)
467 ↗(On Diff #354467)

It would be good to have a test case involving private inheritance.

579 ↗(On Diff #354467)
588 ↗(On Diff #354467)
694–695 ↗(On Diff #354467)

Range-based for loop over CalledFuncDecl->params()?

801 ↗(On Diff #354467)

This looks pretty reachable to me in the case where there's no bound.

whisperity marked 12 inline comments as done.Jun 28 2021, 3:04 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
408–409 ↗(On Diff #354467)

Sure!

801 ↗(On Diff #354467)

I'm not sure if that is the case. I added the llvm_unreachable so we don't get a warning about the function not having a return value on every code path. The switch covers all potential heuristics that are in the check right now, but if we add a new heuristic (to the enum) and forget to write it in, we will get a -Wswitch warning here. A default case doesn't apply here, further developers should be encouraged to wire new heuristics in properly.

whisperity marked an inline comment as done.Jun 28 2021, 3:07 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
694–695 ↗(On Diff #354467)

(Note: it's parameters() now, not params()!)

whisperity marked 2 inline comments as done.
  • Fix comments and code according to comments
  • Turned Substring into a case-insensitive heuristic.
  • Added some missing tests.
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2021, 5:34 AM

NFC Fix ReleaseNotes entry getting out of place during rebases earlier.

I think this looks good to me. @alexfh, you had raised questions about this meeting the quality bar. I think those concerns are valid in the abstract (swapped argument checking requires heuristics), but I'd argue that most existing code bases that actually have swapped args finds those bugs in production, so the rate of true positives on existing code is often small. I see the value from this check coming from checking new code as it's written, as part of a CI pipeline for instance. With that in mind, do you still have concerns about this check?

clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp
801 ↗(On Diff #354467)

Oops, ignore my think-o! I somehow read the switch as switching over Threshold and not H and confused myself.

whisperity marked an inline comment as done.Jun 30 2021, 5:28 AM

Bump. 🙂 I would prefer this check to be able to go into Clang-Tidy 13 if it's okay, together with the other check I implemented.

aaron.ballman accepted this revision.Jul 13 2021, 12:43 PM

Bump. 🙂 I would prefer this check to be able to go into Clang-Tidy 13 if it's okay, together with the other check I implemented.

@alexfh has had a few weeks to respond, so I'm going to give my explicit LG just in case he isn't available to perform a review. Please give Alex until the end of the week to comment on here before landing, but given the considerable review and discussion on this already, I think we can address remaining concerns post-commit if he's unavailable for the moment.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 19 2021, 1:19 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.