This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add 'bugprone-easily-swappable-parameters' check
ClosedPublic

Authored by whisperity on Oct 29 2019, 6:13 AM.

Details

Summary

Finds function definitions where parameters of convertible types follow each other directly, making call sites prone to calling the function with swapped (or badly ordered) arguments.

Such constructs are usually the result of insufficient design and lack the exploitation of strong type capabilities that are possible in the language.
This check finds and flags function definitions, and not call sites. The reasoning is that while there are several ways to heuristically detect a potential swapped call, in the long term, ensuring a safe interface is more beneficial.

As an interface rule, it is understandable that it goes nuts over large projects that have yet to make their interfaces stricter and stronger.

Some of the most trivial flagged examples might be:

FILE *open(std::string_view Directory, std::string_view Filename, std::string_view Extension) { /* ... */ }

Originally, the check was meant to implement C++ Core Guideline rule I.24 "Avoid adjacent parameters of the same type when changing the argument order would change meaning" here, however, discussion about the rule's applicability is ongoing. For now, introducing the check outside the cppcoreguidelines- namespace so that people can still benefit from it.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I have re-analysed Bitcoin and Xerces with the new version (not yet published to Phab), and the results are looking really good! There is an insignificant change in the number of reports, 1 or 2 new ones compared to the previous, all explained by the fact that "unnamed parameters" are marked as ignored, i.e. the previous report (which reported 3 swappable parameters, one unnamed) is marked "Resolved" (1), but there is the (2 swappable one without the unnamed at the start/end) as "New". Due to this, I will spare you from showing the essentially same table from above again: the total number of results are unchanged, only the details of each (and they are changed in a positive (towards more sensible) direction).

In addition, I've implemented the "patterned name filtering" (a really rudimentary, but seemingly rather powerful version), with the rule "if two parameters are similar to at most 1 letter at the beginning or the end" (2), they should be scrapped from the result set.

Throwing out one-way implicit conversions (which I admit was a huge and utterly useless misunderstanding and misdesign from our part...) is also a good update. In addition, I've did the analysis with the explicit "do not report bool, bool, ... sequences" rule enabled. This silenced (as bool swaps are a more special problem wrt. "strong typing" - and even a separate guideline rule exists for them) 30 and 27 reports from the strictest case, for Bitcoing and Xerces, respectively.

There're still a few bugs (which I discovered but doesn't affect the above statements) to fix before I can split the patches and upload them to Phab.


(1): In CodeChecker, if you store a new result set over an already existing named result set, the bug reports that disappeared are marked "Resolved".
(2): lhs, rhs, text1, text2, text3, qmat, rmat, tmat, etc. are covered by this.

whisperity retitled this revision from [clang-tidy] Add 'experimental-cppcoreguidelines-avoid-adjacent-parameters-of-the-same-type' check to [clang-tidy] Add 'bugprone-easily-swappable-parameters' check.
whisperity edited the summary of this revision. (Show Details)
whisperity removed a reviewer: baloghadamsoftware.
whisperity set the repository for this revision to rG LLVM Github Monorepo.
whisperity removed a subscriber: o.gyorgy.

Refactored the check and rebased it against "current" master. This version now has the tests rewritten and fixed. In addition, this patch only introduces the very basic frame of the check (i.e. only strict type equality), nothing more. This is so that review can be done in a more digestible way.

whisperity updated this revision to Diff 319315.EditedJan 26 2021, 8:56 AM
  • Added a fallback case to also diagnose when Clang parsed the two types to be canonically the same (we will elaborate a few cases of such equivalences later, such as typedefs)
  • Fixed Clang-Format and Clang-Tidy warnings in the patch
aaron.ballman added inline comments.Jan 27 2021, 5:49 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
58 ↗(On Diff #319315)

Is there a need for the anonymous namespace? (We typically only use them when defining a class and try to make them as narrow in scope as possible, and use static for function declarations instead.)

67 ↗(On Diff #319315)

Does using the fixed underlying type buy us something? As best I can tell, this will introduce a fair amount of implicit promotions to int anyway.

90–91 ↗(On Diff #319315)

Heh, I don't know if we have any guidance on US English vs British English spellings. ;-)

105 ↗(On Diff #319315)
176–177 ↗(On Diff #319315)

Outside of field declarations, we typically only const-qualify pointer/reference types.

177 ↗(On Diff #319315)

Some interesting test cases to consider: varargs functions and K&R C functions

196 ↗(On Diff #319315)

Preferable to spell this type out to make it easier on someone reading the code to understand the type.

255 ↗(On Diff #319315)

Hmm, one downside to using the printing policy to get the node name is that there can be alternative spellings for the same type that the user might reasonably expect to be applied. e.g., if the user expects to ignore bool datatypes and the printing policy wants to spell the type as _Bool, this won't behave as expected.

284 ↗(On Diff #319315)

It might be good to move this to a more prominent location since it's a default value.

301 ↗(On Diff #319315)

If we're going to include ForwardIt, we probably want things like RandomIt as well?

302 ↗(On Diff #319315)

reverse_iterator and reverse_const_iterator too?

How about ranges?

402–403 ↗(On Diff #319315)

Can you post a screenshot of what you mean?

clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.h
30 ↗(On Diff #319315)

Do these data members need to be public?

clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

I think this is a case where we could warn when the declaration is outside of a system header (perhaps as an option).

Thinking about it a bit more, declarations and definitions provide a novel way to get another kind of swap:

void func(int x, int y);
void func(int y, int x) {} // Oops, the swap was probably not intentional

which may or may not be interesting for a check (or its heuristics).

14 ↗(On Diff #319315)
whisperity added inline comments.Jan 27 2021, 6:28 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
58 ↗(On Diff #319315)

There will be two major parts, the modelling (and which is extended down the line with new models like implicit conversions) and the filtering (which throws away or breaks up results). Should I cut out the anonymous namespace? How about the inner namespaces, may they stay?

67 ↗(On Diff #319315)

Will it? I thought if I use the proper LLVM bitmask/flag enum thing it will work properly. For now, it is good for the bit flag debug print, because it only prints 8 numbers, not 32.

90–91 ↗(On Diff #319315)

Reflexes, my bad. (Similarly how I put the pointer * to the left side, where Stroustrup intended. At least my local pre-commit hooks take care of that.) Will try to remember this.

255 ↗(On Diff #319315)

But then the diagnostic is inconsistent with what the compiler is configured to output as diagnostics, isn't it? Can I stringify through Clang with some "default" printing policy?

402–403 ↗(On Diff #319315)

Given

Diag << SourceRange{First->getSourceRange().getBegin(), Last->getSourceRange().getEnd()};

The diagnostic is still produced with only a ^ at the original diag(Location), ignoring the fed SourceRange:

/home/whisperity/LLVM/Build/../llvm-project/clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp:21:18: warning: 3 adjacent parameters of 'redeclChain' of similar type ('int') are easily swapped by mistake [bugprone-easily-swappable-parameters]
void redeclChain(int I, int J, int K) {}
                 ^

Instead of putting all the squigglies in as the range-location highlight, like how Sema can diagnose:

/home/whisperity/LLVM/Build/../llvm-project/clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp:21:18: warning: 3 adjacent parameters of 'redeclChain' of similar type ('int') are easily swapped by mistake [bugprone-easily-swappable-parameters]
void redeclChain(int I, int J, int K) {}
                 ^~~~~~~~~~~~~~~~~~~

I would not want to put additional note tags in an otherwise already verbose output.

Back in 2019 I was investigating this issue specifically for another checker I was working on, but hit the wall... Somewhere deep inside where Tidy diagnostic stuff is translated and consumed to Clang stuff, it goes away. It shouldn't, because there are statements that seemingly copy the Diag.Ranges array over, but it still does not come out.

... EDIT: I found the relevant mailing list post.

clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.h
30 ↗(On Diff #319315)

Yes. The modelling heuristics will have a bunch of recursive functions which all need the config variables (okay, not the MinimumLength one, but the other 4 or 5 that will be introduced) and giving them every time by copy would introduce its own adjacent swappable bool, bool parameter sequence issue.

The previous full version of the check which I'm incrementally reworking has signatures like:

static MixupData HowPossibleToMixUpAtCallSite(const QualType LType,
                                              const QualType RType,
                                              const ASTContext &Ctx,
                                              const bool CVRMixPossible,
                                              const bool ImplicitConversion,
                                              const bool IsUserDefinedConvertersOtherwisePossible);

static MixupData RefBindsToSameType(const LValueReferenceType *LRef,
                                    const Type *T, const ASTContext &Ctx,
                                    const bool IsRefRightType,
                                    const bool CVRMixPossible,
                                    const bool ImplicitConversionEnabled);
clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
14 ↗(On Diff #319315)

Good catch! This is what you get when you copy-paste. 😁

whisperity added inline comments.Jan 27 2021, 11:09 AM
clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

I gave this some thought. It is a very good idea, but I believe not for this check, but D20689. What do you think of that? Maybe simply saying "call site v. function node that was called" could be extended with a completely analogous, string distance function based "function definition node v. redecl chain" case.

whisperity marked 9 inline comments as done.
  • NFC Code style and organisation fixes
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
284 ↗(On Diff #319315)

Moved it to the top.

302 ↗(On Diff #319315)

How about ranges?

I would like to say that having an f(RangeTy, RangeTy) is exactly as problematic (especially if both are non-const) as having f(int, int) or f(void*, void*), and should be taken care of the same way (relatedness heuristic, name heuristic).
The idea behind ignoring iterator-ly named things was that "Iterators more often than not comes in pairs and you can't(*) do anything about it".

(*): Use ranges. Similarly how draw(int x, int y) is draw(Point2D) if we are enhancing type safety.

whisperity marked an inline comment as done.Jan 28 2021, 5:29 AM
whisperity marked 7 inline comments as not done.Jan 28 2021, 5:41 AM
aaron.ballman added inline comments.Feb 2 2021, 6:42 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
58 ↗(On Diff #319315)

The inner namespaces are totally fine. As for the anon namespace, I personally don't have a strong opinion against it, I brought it up mostly as a coding style guide nit, but I think removing it is helpful so that it's more obvious when a function has internal vs external linkage from its signature.

67 ↗(On Diff #319315)

Ah, perhaps that's the case -- I'm not super familiar with the LLVM bitmask/flag macros. That said, if you think there's good utility for it, I'm happy to leave it as-is.

90–91 ↗(On Diff #319315)

Heh, no worries, I'm betting if we did some code searches, we'd find plenty of British English spellings as well. :-D

255 ↗(On Diff #319315)

The situation I'm worried about is something like this in C code:

#include <stdbool.h>

void func(bool b, bool c) {}

because the bool in the function signature will expand to _Bool after the preprocessor gets to it (because of the contents of <stdbool.h>). If the user writes bool in their configuration file because that's what they've written in their function signatures, will this fail because the printing policy says it should be _Bool?

Along similar lines, I wonder about things like struct S vs S (in C), signed char vs char, types within anonymous namespaces (where we print <anonymous namespace>), etc. Basically, any time when the printing policy may differ from what the user lexically writes in code.

302 ↗(On Diff #319315)

I was thinking about ranges in situations where you also need them to come in pairs, like range union, range intersection, range difference, etc. Those cases feel like they suffer from the same problem as swap() where the interface is intentionally operating on two of the same types.

That said, I think paired range operations are *way* less common than paired iterator operations, so I think ranges should probably be left out.

402–403 ↗(On Diff #319315)

... EDIT: I found the relevant mailing list post.

Oh wow, I had no idea! That's a rather unfortunate bug. :-(

clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.h
30 ↗(On Diff #319315)

That's understandable logic, but then we're exposing essentially what amount to be implementation details. So it sounds like we're introducing a new code maintenance issue in order to solve another code maintenance issue. :-(

That said, I don't think leaving these as public is a deal-breaker.

clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

Hmmm, my impression is that this check is fundamentally about *parameter* ordering whereas D20689 is about *argument* ordering. Given that the example is about the ordering of parameters and doesn't have anything to do with call sites, I kind of think the functionality would be better in a parameter-oriented check.

That said, it doesn't feel entirely natural to add it to this check because this check is about parameters that are easily swappable *at call sites*, and this situation is not exactly that. However, it could probably be argued that it is appropriate to add it to this check given that swapped parameters between the declaration and the definition are likely to result in swapped arguments at call sites.

Don't feel obligated to add this functionality to this check (or invent a new check), though.

whisperity marked 6 inline comments as done.Feb 2 2021, 6:55 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
255 ↗(On Diff #319315)

Meanwhile, I realised we are talking of two entirely distinct things. I will look into how this PrintingPolicy stuff works... I agree that the ignore rules (where the comment was originally posted) should respect the text written into the code as-is. The diagnostics printing type names could continue using the proper printing policy (to be in line with how Sema diagnoses, AFAIK).

402–403 ↗(On Diff #319315)

Speaking of bugs, shall I put it up on Bugzilla, or see if anyone reported already? Would require me digging into that "where are the data structures copied" kind of thing to figure it out again.

clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.h
30 ↗(On Diff #319315)

These variables are const and "directly" created from the user-facing configuration arguments. But in the end once we are done with the implementation and everything is good to go and we see how many functions and what configuration variables are actually needed where, we can look into moving them into some internal struct again. But it feels rather unlikely.

clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

It just seems incredibly easier to put it into the other check, as that deals with names. But yes, then it would be a bit weird for the "suspicious call argument" check to say something about the definition... This check (and the relevant Core Guideline) was originally meant to consider only types (this is something I'll have to emphasise more in the research paper too!). Everything that considers names is only for making the check less noisy and make the actually emitted results more useful. However, I feel we should specifically not rely on the names and the logic too much.

But(!) your suggestion is otherwise a good idea. I am not sure if the previous research (with regards to names) consider the whole "forward declaration" situation, even where they did analysis for C projects, not just Java.

whisperity marked 3 inline comments as done.Feb 2 2021, 7:13 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
402–403 ↗(On Diff #319315)

Nevermind, this is worth a bug report just so we can track it in the proper location.

I haven't read through all the comments, but the word 'easily' implies 'desirable'. This check seems to be for finding params which are undesirably swappable, right?

aaron.ballman added inline comments.Feb 2 2021, 7:17 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
402–403 ↗(On Diff #319315)

If you don't mind doing that effort, it'd be appreciated! Even if you don't root cause the issue, having the report (or pinging an existing one) is still useful.

clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

However, I feel we should specifically not rely on the names and the logic too much.

I agree, to a point. I don't think the basic check logic should be name-sensitive, but I do think we need to rely on names to tweak the true/false positive/negative ratios. I think most of the time when we're relying on names should wind up being configuration options so that users can tune the algorithm to their needs.

We could put the logic into the suspicious call argument check with roughly the same logic -- the call site looks like it has potentially swapped arguments because the function redeclaration chain (which includes the function definition, as that's also a declaration) has inconsistent parameter names. So long as the diagnostic appears on the call site, and then refers back to the inconsistent declarations, it could probably work. However, I think it'd be a bit strange to diagnose the call site because the user of the API didn't really do anything wrong (and they may not even be able to change the declaration or the definition where the problem really lives).

I haven't read through all the comments, but the word 'easily' implies 'desirable'. This check seems to be for finding params which are undesirably swappable, right?

The easily was to mean that the swap can be done with little effort (i.e. "in an easy fashion"?) and by accident. We want to find function signatures where that is the case, yes. TL;DR: "This function will be misused by a careless client, please try designing a better interface!"

whisperity marked an inline comment as done.Feb 2 2021, 7:22 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
402–403 ↗(On Diff #319315)

And I managed to post a broken link originally... Here's the right one: https://bugs.llvm.org/show_bug.cgi?id=49000

clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

But wait... suppose there is a project A, which sees the header for f(int x, int y) and has a call f(y, x). That will be diagnosed. But if project A is only a client of f, it should be highly unlikely that project A has the definition of f in their tree, right? So people of A will not get the warning for that. Only what is part of the compilation process will be part of the analysis process.

Similarly to how this check matches only function definitions, and never a prototype. The latter one would have bazillion of warnings from every header ever, with the user not having a chance to fix it anyways.

aaron.ballman added inline comments.Feb 2 2021, 7:45 AM
clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

But wait... suppose there is a project A

That's the point I was trying to make about it being strange to diagnose on the call site -- the user of the API didn't use it wrong, the designer of the API did something bad.

It's sounding more like this functionality is a good idea for a new check rather than an existing check.

whisperity added inline comments.Feb 2 2021, 7:50 AM
clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

Okay, now I understand! But I did not intend to diagnose it on the call site. The other check itself would share the same configuration parameters and decision-making for string distance, and the "fwd decl vs. definition" diagnostic would come emitted to the definition's location! And the "call site vs. called function" would be diagnosed at the call site. And if there isn't a definition in the analysed TU for some function, the former logic is simply skipped.

aaron.ballman added inline comments.Feb 2 2021, 7:54 AM
clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

But I did not intend to diagnose it on the call site.

I'd find that to be a bit strange given that the other check is called readability-suspicious-call-argument, because there would be no call in that situation. Or am I misunderstanding? (Sorry if I'm being dense!)

whisperity added inline comments.Feb 2 2021, 8:02 AM
clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

No, it is strange! And I think we will come up with a better idea for this because the check logic itself sounds very plausible, and needed. Either factor out the string distance logic somewhere and have two checks with separate matchers, but the same underlying logic then, or rename the other check... The two diagnostics would have a different message either way. The latter option (finding a new name for the check) sounds like the path of least resistance.

aaron.ballman added inline comments.Feb 2 2021, 8:07 AM
clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
10 ↗(On Diff #319315)

Ah, yes! I think we're on the same page now. :-)

whisperity marked 6 inline comments as done.Feb 5 2021, 10:47 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
177 ↗(On Diff #319315)

K&R C functions

Call me too young, but I had to look up what a "K&R C function" is, and I am absolutely baffled how this unholy construct is still supported! But: thanks to Clang supporting it properly in the AST, the checker works out of the box!

Given

int foo(a, b)
  int a;
  int b;
{
  return a + b;
}

We get the following output:

/tmp/knr.c:2:3: warning: 2 adjacent parameters of 'foo' of similar type ('int') are easily swapped by mistake [bugprone-easily-swappable-parameters]
  int a;
  ^
/tmp/knr.c:2:7: note: the first parameter in the range is 'a'
  int a;
      ^
/tmp/knr.c:3:7: note: the last parameter in the range is 'b'
  int b;
      ^

(even the locations are consistent!)

Should I add a test case for this? We could use a specifically C test case either way eventually...


varargs functions

This is a bit of terminology, but something tells me you meant the variadic function here, right? As opposed to type parameter packs.

Given

int sum(int ints...) {
  return 0;
}

the AST looks something like this:

`-FunctionDecl 0x56372e29e258 <variadic.cpp:1:1, line:3:1> line:1:5 sum 'int (int, ...)'
  |-ParmVarDecl 0x56372e29e188 <col:9, col:13> col:13 ints 'int'

Should we diagnose this? And if so, how? The variadic part is not represented (at least not at first glance?) in the AST. Understanding the insides of such a function would require either overapproximatic stuff and doing a looot of extra handling, or becoming flow sensitive... and we'd still need to understand all the va_ standard functions' semantics either way.

riccibruno added inline comments.
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
337 ↗(On Diff #319840)

I think that this will become unreachable after D84658 and D85033. I may be missing an argument, but in general I would argue that getNameForDiagnostic should never return an empty name for unnamed entities since this is not useful in diagnostics.

aaron.ballman added inline comments.Feb 8 2021, 7:31 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
177 ↗(On Diff #319315)

Call me too young, but I had to look up what a "K&R C function" is, and I am absolutely baffled how this unholy construct is still supported!

Ah, to be innocent again, how I miss those days. :-D

Should I add a test case for this? We could use a specifically C test case either way eventually...

I think it'd be a useful case, but the one I was specifically more concerned with is:

// N.B.: this is C-specific and does not apply to C++.
void f();

int main(void) {
  f(1, 2, 3.4, "this is why we can't have nice things");
}

where the function has no prototype and so is treated as a varargs call.

This is a bit of terminology, but something tells me you meant the variadic function here, right? As opposed to type parameter packs.

Yes, sorry for being unclear, I am talking about variadic functions.

Should we diagnose this? And if so, how? The variadic part is not represented (at least not at first glance?) in the AST. Understanding the insides of such a function would require either overapproximatic stuff and doing a looot of extra handling, or becoming flow sensitive... and we'd still need to understand all the va_ standard functions' semantics either way.

Well, that's what I'm wondering, really. The arguments are certainly easy to swap because the type system can't help the user to identify swaps without further information (like format specifier strings). However, the checking code would be... highly unpleasant, I suspect. My intuition is to say that we don't support functions without prototypes at all (we just silently ignore them) and that we only check the typed parameters in a variadic function declaration (e.g., we'll diagnose void foo(int i, int j, ...); because of the sequential int parameters, but we won't diagnose void foo(int i, ...); even if call sites look like foo(1, 2);). WDYT?

whisperity added inline comments.Feb 8 2021, 7:57 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
177 ↗(On Diff #319315)

It is definitely highly unpleasant, at one point I remember just glancing into the logic behind printf() related warnings in Sema and it was... odd, to say the least.

That is how the checker works right now. It will not diagnose int f() {} because from the looks of the function definition, it is a 0-parameter function. Same with the variadic ..., it is a single parameter function (which has a rather weird parameter type).

But yes, I think I'll make this explicit in the documentation (and FWIW, the research paper, too).

However, the checking code would be... highly unpleasant, I suspect.

Incredibly, because this check purposefully targets the function definition nodes only, like a cruise missile. For that logic, we would need to gather call sites for variadic functions and start doing some more magic with it.

FYI, templates are also handled in a way that only what is clear from definitions (and not instantiations... I think we can kind of call a particular "overload" call to a variadic like a template instantiation, just for the sake of the argument here) only:

template <typename T, typename U>
void f(T t1, T t2, U u) {} // [t1, t2] same type, WARN.

void g() { f(1, 2, 3); }} // NO-WARN: Even though "U" in this call is also "int", this is not fixed "easily enough" in the definition, so to prevent more false positives, we shut up.

template <>
void f(int i, int j, int k) {} // [i, k] WARN.

Anything that is only known through template instantiations is context-sensitive (the subsequent patch about typedef extends the tests and diagnostics about this), and thus, we forego trying to find the way.

template <typename T> struct vector { using value_type = T; };

template <typename T>
void f(T t, typename vector<T>::element_type u) {} // NO-WARN, dependent.

template <>
void f(int i, vector<int>::element_type i2) {} // WARN, can explicitly unfold the typedef and see "int == int".

template <typename T>
void g(typename vector<T>::element_type e1, typename vector<T>::element_type e2) {} // WARN, not dependent.
177 ↗(On Diff #319315)

I definitely shall create and add a C file with test cases like this, marking them // NO-WARN, and explaining the reasoning. If for nothing else, maybe to know in the far future what can be improved upon.

aaron.ballman added inline comments.Feb 8 2021, 12:34 PM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
177 ↗(On Diff #319315)

I definitely shall create and add a C file with test cases like this, marking them // NO-WARN, and explaining the reasoning. If for nothing else, maybe to know in the far future what can be improved upon.

Excellent, that's the solution I was hoping we'd come to! I find it's useful to add the C test for a function without a prototype (void f()) because some code that expects all function declarations to have a function prototype will assert or crash, so it helps shake out some bugs.

Also, I agree with your interpretation of the template case -- it's another intractable problem at this stage, so best to document how we handle it and move on.

I haven't read through all the comments, but the word 'easily' implies 'desirable'. This check seems to be for finding params which are undesirably swappable, right?

The easily was to mean that the swap can be done with little effort (i.e. "in an easy fashion"?) and by accident.

I understand that. The problem is the name. The signal-to-kill-thread check is bad. The erase check is inaccurate. The roundings check is incorrect. The operator-in-strlen-in-alloc check is misplaced. Those are all words that indicate the negative. Your name indicates the positive ("easily"). The name indicates that the swappable params are desirable.

Why "easily" instead of "suspicious", "spuriously" or any of the other words that are already used?

Why "easily" instead of "suspicious", "spuriously" or any of the other words that are already used?

The name was @aaron.ballman's idea after we postponed the experimental-cppcoreguidelines-yaddayaddayadda....
The best I could come up with is potentially. The suspicious is a problem because this check is an interface check, not a call-site check. (The sister check is currently called suspiciously-swapped-argument or something like that.)

The original check idea from C++ Core Guidelines also uses the phrase:

Reason: Adjacent arguments of the same type are easily swapped by mistake.

However, potentially-swappable-parameters just sounds... wonky. It does not carry the weight the rule otherwise meant to.
weakly-typed-parameters?

I am not good with names, sadly.


I am conflicted about saying that "easily" always refers to some "positive aspect", however. Use a [[unnamed trademarked safety device]] because you can end up easily cutting off your finger during work. Do not touch a stove/hot water fountain, because you can easily get burnt. Leaving Christmas candles unattended could easily lead to a house fire. Does anyone want to get hurt, burnt, their house destroyed, etc.? There is nothing "positive" in these suggestions or regulations either. And we are doing a tool that is practically the same thing, trying to prevent a (different kind of) fire.

whisperity updated this revision to Diff 322339.Feb 9 2021, 4:04 AM
  • Added a C test file with some C-specific language constructs that we may or may not match.
  • Explained limitations with regards to variadics, templates, etc. better in the docs, and added test cases for these too.
  • When comparing the type names against the ignored type name list, do not consider PrintingPolicy, but take exactly what's in the source code. This mostly concerns bool and _Bool at face value, so this is tested in the C file too.
whisperity marked 4 inline comments as done.Feb 9 2021, 4:16 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
255 ↗(On Diff #319315)

Right, @aaron.ballman I think I got this tested. Please check the C test file. bool becomes _Bool in C code by the preprocessor, but the printing policy still says both should be printed as bool. I added some test cases about this. It is weird, because without doing some extra magic specifically against macros (which might break detecting type names in other locations), I can't really test ignoring _Bool but not ignoring bool. Now, the diagnostics respect the printing policy, but the type names are taken with an empty printing policy, which should give us the result as-is in the code. Unfortunately, "as-is" still means _Bool for bool's case, so even if you ignore bool, you will still get results where bool is written, because bool is a macro (Tidy automatically puts a note about this!) and the real type is _Bool, as per the macro expansion.

Wow... okay, that explanation got convoluted.
Anyways, it's really weird. But: we no longer take the actual printing policy anymore, but the default, which should eliminate all these cases.
The bool/_Bool case doesn't fail because of the printing policy, it fails because bool is a macro in <stdbool.h>.

Also, the type name ignore list not a full match, but a substring match ignore list. So if you have a type named struct Foo, if you ignore Foo, both struct Foo, Foo, MyFoo, etc. will be also ignored. And it's case sensitive.

whisperity updated this revision to Diff 322357.Feb 9 2021, 5:51 AM

Made detection of macros as "parameter type names" more robust.

whisperity marked an inline comment as done.Feb 9 2021, 5:52 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
255 ↗(On Diff #319315)

Nevermind, I think I figured this out... or at least managed to figure it out partially... This "SpellingLoc" and "ExpansionLoc" and all this stuff is really convoluted, and there doesn't seem to be any good documentation for them.

whisperity updated this revision to Diff 322363.Feb 9 2021, 5:58 AM

NFC (With all the restructuring and trying to figure out how to implement the latest change, I managed to upload the wrong, unclean diff...)

Make sure that overloadable binary operators are ignored properly.

  • [NFC] Reformat the documentation so option values are in single backticks.

@aaron.ballman I've reworked and uploaded the final addition to this line, the crown jewel, the very reason this checker was developed (D75041). So, AFAIK, the "let's rewrite the whole thing with better code quality" work can be considered done, and we may move on with the review.

aaron.ballman added inline comments.Mar 16 2021, 8:38 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
54 ↗(On Diff #326439)

const_reverse_iterator seems plausible here for the same reason as const_iterator.

56 ↗(On Diff #326439)

Are you planning to remove the debugging code now that the check is approaching its final form?

363 ↗(On Diff #326439)

Is this still needed in light of the comments from @riccibruno?

whisperity added inline comments.Mar 16 2021, 10:21 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
56 ↗(On Diff #326439)

Actually, I would prefer not to. I removed every debug thing possible. However, and this is speaking from experience because I wrote this check two times already from basically scratch... the rest of the debug code that is part of the patch has to be there. If anything goes nuts, especially if there would be false positives later... it would be impossible to figure out what is going on during the modelling without seeing all the steps being taken.

whisperity added inline comments.Mar 16 2021, 10:25 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
363 ↗(On Diff #326439)

It doesn't look like to me as if those patches they commented were ever merged, but I'll check it out.

aaron.ballman accepted this revision.Mar 16 2021, 10:34 AM

In general, I'm happy with this check (and the follow-ups), so I'm giving it my LG. However, it's complex enough that I think having some extra review from @alexfh, @hokein, or one of the other reviewers would be a good idea, so please hold off on landing it for a bit in case any of the other reviewers want to weigh in.

clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
56 ↗(On Diff #326439)

We don't often leave debugging statements in because they tend to cause a maintenance burden that doesn't justify their benefit. However, I agree with your observation that this code is quite difficult to reason through without debugging aids.

I don't insist on removing the code yet, but would say we should remain open to the idea if it causes a burden in practice. (Either in terms of keeping the debugging code up to date as the check evolves, but also in terms of compile time performance for the check if the debugging code paths turn out to be expensive on build times.)

This revision is now accepted and ready to land.Mar 16 2021, 10:34 AM

[...] so please hold off on landing it for a bit in case any of the other reviewers want to weigh in.

Due to how the patch itself only being useful in practice if all the pieces fall into place, I will definitely keep circling about until the entire patch tree is ✔. (Including the filtering heuristics, etc.)

clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
56 ↗(On Diff #326439)

All debugging code is wrapped into the LLVM_DEBUG macro specifically (that's why I even put this little "print bits" function behind NDEBUG) so they are not part of the builds.

I think in general if someone puts the effort into reading the code and has an interactive debugger they could figure it out (especially if they keep track of the recursion constantly), I tried to make everything nicely padded and all the variable names and control flow to make sense. (Of course the wealth of complexity comes in the follow-up patches.)

[...] so please hold off on landing it for a bit in case any of the other reviewers want to weigh in.

Due to how the patch itself only being useful in practice if all the pieces fall into place, I will definitely keep circling about until the entire patch tree is ✔. (Including the filtering heuristics, etc.)

Good idea. :-)

clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
56 ↗(On Diff #326439)

All debugging code is wrapped into the LLVM_DEBUG macro specifically (that's why I even put this little "print bits" function behind NDEBUG) so they are not part of the builds.

They're part of debug builds still (which is the build configuration I tend to use most often). That said, I think it's fine to leave the debugging code in for now, especially as the check is being actively worked on.

whisperity edited reviewers, added: njames93; removed: Szelethus.Mar 16 2021, 11:50 AM
whisperity marked 6 inline comments as done.
whisperity added a subscriber: Szelethus.
  • Add const reverse iterator to the default-ignored typename list.
whisperity added inline comments.Mar 17 2021, 5:31 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
363 ↗(On Diff #326439)

I checked at they haven't been merged yet. Nonetheless, we can remove this logic later once that logic is in.

I added a test case for unnamed parameters with the current logic printing <unnamed>, so if the underlying call will return something like <unnamed parameter @ 24:48> we will know because that test will blow up.

aaron.ballman added inline comments.Mar 17 2021, 5:42 AM
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
363 ↗(On Diff #326439)

Thank you for checking and adding some test coverage for it!

whisperity marked an inline comment as done.Mar 18 2021, 7:37 AM
whisperity set the repository for this revision to rG LLVM Github Monorepo.

Rebased over D98635, allowing us to properly highlight the found "mixable adjacent parameter range" (assuming it's on the same line, of course):

Made sure that the parameter's name is highlighted promptly in the "first|last parameter of the range is..." note.

alexfh added a comment.May 9 2021, 8:34 PM

Thanks for the great work! This is mostly looking fine, but I've added a few comments. I could only make a quite superficial review so far, but I'll try to take a deeper look into this soon.

clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
70 ↗(On Diff #336622)

I think, this function is not needed (see the comment below), but if it is, a single std::string and in-place std::reverse() would be better, imo.

84–87 ↗(On Diff #336622)

I suppose that textual representation of the flags would be easier to understand in debug logs than just bits. Maybe single characters or other abbreviations, if this needs to be compact.

113–120 ↗(On Diff #336622)

I'd switch to scoped enums (enum class), remove the macro, and use simpler code like None = 0x1, Trivial = 0x2, Canonical = 0x4, etc.

133 ↗(On Diff #336622)

What should this method do and how it should be used? Same for Mix::sanitize()

295–297 ↗(On Diff #336622)

Would llvm::find(Check.IgnoredParameterNames, NodeName) != Check.IgnoredParameterNames.end() work?

302 ↗(On Diff #336622)

No need for parentheses here.

313–320 ↗(On Diff #336622)

Is there a chance that you're trying to reimplement a part of Lexer::makeFileCharRange functionality here?

clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
129 ↗(On Diff #336622)

Does the top-level const change anything with regard to the "mixability"? It's a purely implementation-side difference, callers could still swap parameters.

alexfh added a comment.May 9 2021, 8:38 PM

BTW, any ideas why "patch application failed"? (See the Build status in the Diff Detail section)

It would be really nice to get pre-merge checks to work for the patch.

BTW, any ideas why "patch application failed"? (See the Build status in the Diff Detail section)

It would be really nice to get pre-merge checks to work for the patch.

Put simply, I think I haven't rebased the patch since a few changes ago. The past few changes were NFC/lint stuff only.

clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
113–120 ↗(On Diff #336622)

enum class is a good idea!

Unfortunately, I'm not really sold on the hexa literals... It makes the code confusing the read, because you have to keep in mind which range the hexa literals mean, and when there is a shift in position, e.g. if you read 0x20 you have to count out that okay, that's 2nd position, so at least 16, so at least the 4th bit, but it is two, so you shift it again to the right, so in the bit pattern it turns on the 5th bit actually.
It makes it very easy to misread or misinterpret things, especially when trying to work on the source code.
(In the last extension patch, the number of elements in the bit mask go up to the 8th or the 9th bit, I don't remember exactly at the top of my head.)

Compared to this, the current text when read immediately tells you which "decimal" position you should expect the appropriate bit to be toggled on, if the flag is there.

Binary literals, on the other hand, would be amazing to use here, sadly we're only in C++14.

133 ↗(On Diff #336622)

Once there are more flags in the bitmask (not just "None" and "Trivial"), this method deals with sanitising the flags that are toggled on and off by the recursive descent on the type tree.

For example, if there is a typedef involved (see D95736), it will toggle on the typedef flag, and start investigating the aliased type. This subsequent investigation might figure out that the aliased type is trivial, or it is distinct. In case it is distinct, that level of the recursion will turn on the "Not mixable" flag. In this case, when the investigation returns completely, we will have "Typedef" and "None" both turned on. This method will throw away everything but the "None" bit, so when the data is returned to the diagnostic-generating part of the check, it will know what to do.

313–320 ↗(On Diff #336622)

Does that expand the token length properly? I took this idiom of getLocForEndOfToken from various other Tidy checks... basically, anything that deals with replacing some sort of textual range in the code calls to this function. I need to have the exact ending location so I can obtain the full text, as appears in the source file. In this case, to compare against the user's input.

E.g. here is one example from utils/UsingInserter.cpp:

  32   │ Optional<FixItHint> UsingInserter::createUsingDeclaration(
  33   │     ASTContext &Context, const Stmt &Statement, StringRef QualifiedName) {
/* ... */
  42   │   SourceLocation InsertLoc = Lexer::getLocForEndOfToken(
  43   │       Function->getBody()->getBeginLoc(), 0, SourceMgr, Context.getLangOpts());

A quick grep for makeCharRange in the repository didn't turn up any significant result, apart from one unit test.

clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
129 ↗(On Diff #336622)

Unfortunately, coding styles differ whether people mark local variables const or not, across projects. Even with pointers: in some cases, a swapped call to memcpy might be caught if the user made one of the pointers passed already const void* instead of just void*.

D95736 deals with implementing the logic for this, it is exposed as a user-configurable option whether they want to consider const stuff the same as non-const stuff.

There was a bit of talk on Twitter recently about strong typing which reminded me of this checker! 😁 I'll fix all issues in one go once we're through with all the comments, I'd like to see the full picture of what needs to be done.

whisperity marked 8 inline comments as done.

All changes are NFC and styling only.

  • Remove the FB macro in favour of explicitly specifying the bit flag value
  • Change debug printout from bit pattern to individual significant letters

BTW, any ideas why "patch application failed"? (See the Build status in the Diff Detail section)

I think I figured out why the pre-build CI isn't working... it seems that the fact that this patch depends on an abandoned revision D76545 misleads the CI logic as it sees the dependency, and even though it is abandoned, tries to download and apply the contents of D76545, which makes it fail...

INFO    D76545#251840 commit 1e9b6b89a7b5c49612018b120c2c142106056f82 has a later commit date then7cdbf1ed4b94259a3aad2a7575e928fa61b0e57e
[...]
ERROR   error: patch failed: clang-tools-extra/docs/clang-tidy/index.rst:62
error: clang-tools-extra/docs/clang-tidy/index.rst: patch does not apply
error: patch failed: clang-tools-extra/clang-tidy/ClangTidyForceLinker.h:15
error: clang-tools-extra/clang-tidy/ClangTidyForceLinker.h: patch does not apply

(Try nudging the CI to only apply this commit after removal of parent links.)

Attempt to fix build failure on Windows buildbot (operator new was taking unsigned long instead of size_t in a test which resulted in hard error).

aaron.ballman accepted this revision.Jun 25 2021, 5:18 AM

This is a very involved check that I think is going to produce some interesting results for users, thank you for working so diligently on it! I think all of the patches in the series have now been accepted and this is ready to land. @whisperity, are you aware of any changes that are left to be made?

This is a very involved check that I think is going to produce some interesting results for users, thank you for working so diligently on it! I think all of the patches in the series have now been accepted and this is ready to land. @whisperity, are you aware of any changes that are left to be made?

I'll ping @martong privately if they have additional comments about D75041, but otherwise, no, I think no more things left to do. Landing checklist complete.

This is a very involved check that I think is going to produce some interesting results for users, thank you for working so diligently on it! I think all of the patches in the series have now been accepted and this is ready to land. @whisperity, are you aware of any changes that are left to be made?

I'll ping @martong privately if they have additional comments about D75041, but otherwise, no, I think no more things left to do. Landing checklist complete.

Great! Unless you hear from @alexfh, @martong, or someone else with additional concerns, I think you're set to land this!

I thought I sent this batch of comments loong ago =\

Better late than never.

clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
113–120 ↗(On Diff #336622)

First of all, do we ever need to know the position of the set bit in these flag enumerators? Isn't it enough that each enumerator has a distinct bit set? If so, there's not much complexity in reading and expanding sequences like 0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020, 0x0040, 0x0080, 0x0100, 0x0200, ....
However, if that's not to your taste, there's another frequently used style of defining flag enumerations in LLVM: 1 << 0, 1 << 1, 1 << 2, 1 << 3, ....

Either of these is fine, but using macros for this purpose is not necessary and doesn't make the code any easier to read or to modify.

Compared to this, the current text when read immediately tells you which "decimal" position you should expect the appropriate bit to be toggled on, if the flag is there.

Do I understand correctly that you need this exclusively for the purpose of reading debug logs? If so, it seems to be better to decode the bits before logging them. See the comment on line 87 above.

133 ↗(On Diff #336622)

Thanks for the explanation! But it would be very helpful to have this information in the code, not only in a code review comment.

clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
129 ↗(On Diff #336622)

The const in const void * is not top-level, it applies to the pointed at type. A top-level const here would be const void * const (a constant pointer to a constant void) or void * const - a const pointer to a non-const void. Now getting back to the original question. Consider this function:

void qualified1(int I, const int CI) {}

The top-level const for the second parameter isn't a part of the function signature and doesn't help with prevention of the unintentionally swapped parameters. I guess, the same goes for top-level volatile.

Whatever arguments x and y are passed to qualified1 they can be swapped without being noticed.

whisperity marked 2 inline comments as done.Jun 29 2021, 1:59 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
113–120 ↗(On Diff #336622)

Yes, there individual positions themselves don't matter! The only thing that matters is that some flags are < than None and the rest are >=, but that's only needed because of the ugly workaround.

I turned the debug printouts to use a manual formatting with "reasonable" characters being set (e.g. T for Trivial or & for reference). It's much more readable and straight-forward, because you actually do not need to know the position of the bits themselves, those don't matter.

133 ↗(On Diff #336622)

The full patch stack landed, and this function has a much more elaborated implementation (with more comments), I believe. In this patch, there was indeed nothing to add here.

clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp
129 ↗(On Diff #336622)

True. We can think about this later. The modelling logic would be made even more complicated by handling this, however, and the reports themselves would become asymmetric, because to understand the reports, users would need to actively think about this particular example (whether something is a "top-level const" or a "non-top-level const").

I know the check has almost been made into a CppCG checker (which in the strictest case says to only warn if something has the same type) and then we backtracked from it, some of these design decisions are hereditary from the previous reviews and development process.

I think the fact that there is a useful way from the user's perspective to toggle this is beneficial for two reasons:

  • By default, the number of warnings are smaller. To an already verbose checker as it is, this is a clear upside.
  • Even if mathematically there is no difference between I and CI from a language standpoint, if the const is there, it could be indicative of the user meaning something (similar to how I tried indicating a few things by making local variables const, and then during the review of the patch, I was told to not do that. It seems Tidy uses this coding style (something I do not personally agree with, local const variables prevent me from mistakes too!), where there are other project that use a different style). So in essence, this still allows us to respect coding styles in some way. By making one of the variables const, you wanted to indicate something - something that the checker can pick up on, and give you the choice to not start yelling immediately. At least when you're using the default settings and not tuning the individual options.
RKSimon added inline comments.
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
57 ↗(On Diff #354803)

@whisperity There is a missing comma at the end of the "const_reverse_iterator" line causing implicit concatenation with the following "Constreverseiterator" line.

whisperity marked 3 inline comments as done.Aug 24 2021, 7:13 AM
whisperity added inline comments.
clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp
57 ↗(On Diff #354803)

Thanks... I'll get around to hotfixing this ASAP, and hope it can be backported to 13.0...

(I wish there was a checker for this? 🙄)

whisperity marked an inline comment as done.Aug 28 2021, 7:03 AM