This is an archive of the discontinued LLVM Phabricator instance.

[analyser] different.CallArgsOrder checker implementation
Needs ReviewPublic

Authored by alexey.knyshev on Dec 11 2017, 10:57 AM.

Details

Summary

CallArgsOrderChecker which looks for accidental swap or skip of arguments in function, methods, constructors and operators calls

Diff Detail

Repository
rC Clang

Event Timeline

anna resigned from this revision.Dec 14 2017, 6:13 PM

Perhaps added me incorrectly as reviewer?

szepet requested changes to this revision.Dec 18 2017, 2:51 PM
szepet edited reviewers, added: xazax.hun, szepet; removed: dergachev.a.
szepet added subscribers: alexfh, szepet.

Hi Alexey!

Thank you for working on this!

Some general comments on the patch:

  1. Please upload the changes with the context included (git flag -U99999) which could help the review process.
  2. Use the LLVM Coding Guidelines on the tests as well (Start variable names with a capital letter, etc )
  3. Ping here, not necessary on the cfe-dev mailing list ;)
  4. FYI: There is a similar check under review which uses only the AST provided information and implemented as a tidy-checker: https://reviews.llvm.org/D20689 (As I see your checker does not uses symbolic execution provided features. So, it would probably worth thinking about if the analyzer is the project where the checker should be implemented. However, @dcoughlin and @alexfh have more insight on the answer to this question. What do you think? )
  5. In overall, please add comments to the function which describes its purpose. Mainly on heuristic functions, it can help to understand it more easily. Also, could you provide some results of the current heuristics, what are the false positive rates on real projects like LLVM, FFmpeg, etc? I am quite interested in that.

Some comments added inline but they are basically the above mentioned things.

lib/StaticAnalyzer/Checkers/CallArgsOrderChecker.cpp
57

It looks like that Levenshtein edit distance could come handy in cases like this. It is already implemented as a method edit_distance on StringRef.

164

Please add a comment to this function which describes its input, output, purpose.

test/Analysis/call-args-order.c
4

Please use capital starting letter on variables. On the other test file as well.

  1. FYI: There is a similar check under review which uses only the AST provided information and implemented as a tidy-checker: https://reviews.llvm.org/D20689 (As I see your checker does not uses symbolic execution provided features. So, it would probably worth thinking about if the analyzer is the project where the checker should be implemented. However, @dcoughlin and @alexfh have more insight on the answer to this question. What do you think? )

The main problem with the check proposed in https://reviews.llvm.org/D20689 (which uses, IIUC, very similar algorithm to this one) is that the false positive rate on real projects is too high. The comment I made on the other review thread applies here as well:

There's definitely potential in using parameter name and argument spelling to detect possibly swapped arguments, and there's a recent research on this topic, where authors claim to have reached the true positive rate of 85% with decent recall. See https://research.google.com/pubs/pub46317.html. If you're interested in working on this check, I would suggest at least looking at the techniques described in that paper.

As for whether this analysis should be implemented in the static analyzer or as a clang-tidy check: I think, clang-tidy would be more suitable, since

  1. this analysis doesn't need the path analysis machinery provided by the static analyzer
  2. this analysis may benefit from suggesting automated fixes
  3. the false positive rate of the state-of-the-art algorithm implementing similar analysis have a false positive rate of at least 15-30%, and a large fraction of false positives could potentially be treated as "code smells" which can be fixed by giving function parameters or local variables proper names. I don't know whether this kind of analysis meets the bar the static analyzer sets for its checkers. Anna or Devin can tell more here.
  1. In overall, please add comments to the function which describes its purpose. Mainly on heuristic functions, it can help to understand it more easily. Also, could you provide some results of the current heuristics, what are the false positive rates on real projects like LLVM, FFmpeg, etc? I am quite interested in that.

You can take the list of projects from https://reviews.llvm.org/D20689#878803, run the analysis on them, and see whether the results are any better.

Hi Alexey,

This commit strongly needs testing on some real code. I cannot predict the TP rate of this checker now.
Regarding implementation, you can find some remarks inline.

lib/StaticAnalyzer/Checkers/CallArgsOrderChecker.cpp
31

We have a class without any private parts. This means that we should declare it as 'struct' or do some redesign - the way it is constructed (with expressions like PGL.back().ParamsVec.push_back(std::move(p)); clearly indicates some design issues. I suggest at least adding a wrapper above ParamsGroupVec with methods:
ParamGroup &addParam(const ParmVarDecl *ParamVD); // adds a variable into existing group or creates new one
void reclaimLastGroupIfSingle(); // deletes last added group if it has only a single element
and moving some logic here.

33

You don't need to reserve the amount of items same or less then SmallVector default size - they are already stack-allocated.

112

else if?

119

trim* functions should be moved out of ParamsGroup because they are not related.

139

We can the code:

SamePrefix = SamePrefix.take_front(StrSize);
SamePrefixSize = SamePrefix.size();

instead. The behaviour of StringRef::take_front(size_t N) is well-defined if N >= size().

Same below: drop_front() can be replaced with take_back().

144

What you are trying to do here (find common prefix of two strings) can be done easier using std::mismatch().

157

llvm::transform(StrVec, StrVec.begin(), ...)

164

With names like removeCommon[Pre/Suf]fix, the intention of these functions will be much cleaner. Also, please do not pass vectors by value.

185

We can use std::mismatch with std::reverse_iterator (std::rbegin(), std::rend()).

208

We can use try_emplace() to construct the new item in-place.

215

This is already check in the caller. I think, this can be replaced with an assertion.

257

Could you please explain what kind of ExprName are you trying to get?

259

As I remember, IgnoreParenCasts() can not return nullptr. So, we can turn this into assertion.

262

OrigE is not nullptr so we can use dyn_cast.

278

We have ASTContext so we can use getPrintingPolicy() and getLangOpts() instead of default values.

279

We can use Lexer::getSourceText() here.

311

Personally, I think that writing:

if (const FunctionDecl *FD = CE->getConstructor())
  checkCall(FD, CE, /*FirstActualArgOffset=*/0);

is shorter and clearer. But it is so minor that I don't insist on this change.

319

The function is pretty long. I think we should split it.

323

Should we check for NumArgs <= FirstActualArgOffset + 1, like in condition below?

334

This will look nicer with C++11-style loop.

363

Should we return or just continue?

415

Instead of eachOf(forEachDescendant(), forEachDescendant()) it is better to use forEachDescendant(anyOf()). This will avoid double iteration on descendant tree (the memoizing size for descendant tree is limited).

test/Analysis/call-args-order.c
13

Please mark tests for true negatives as // no warning.