CallArgsOrderChecker which looks for accidental swap or skip of arguments in function, methods, constructors and operators calls
Diff Detail
- Repository
- rC Clang
Event Timeline
Hi Alexey!
Thank you for working on this!
Some general comments on the patch:
- Please upload the changes with the context included (git flag -U99999) which could help the review process.
- Use the LLVM Coding Guidelines on the tests as well (Start variable names with a capital letter, etc )
- Ping here, not necessary on the cfe-dev mailing list ;)
- FYI: There is a similar check under review which uses only the AST provided information and implemented as a tidy-checker: https://reviews.llvm.org/D20689 (As I see your checker does not uses symbolic execution provided features. So, it would probably worth thinking about if the analyzer is the project where the checker should be implemented. However, @dcoughlin and @alexfh have more insight on the answer to this question. What do you think? )
- In overall, please add comments to the function which describes its purpose. Mainly on heuristic functions, it can help to understand it more easily. Also, could you provide some results of the current heuristics, what are the false positive rates on real projects like LLVM, FFmpeg, etc? I am quite interested in that.
Some comments added inline but they are basically the above mentioned things.
lib/StaticAnalyzer/Checkers/CallArgsOrderChecker.cpp | ||
---|---|---|
57 | It looks like that Levenshtein edit distance could come handy in cases like this. It is already implemented as a method edit_distance on StringRef. | |
164 | Please add a comment to this function which describes its input, output, purpose. | |
test/Analysis/call-args-order.c | ||
4 | Please use capital starting letter on variables. On the other test file as well. |
The main problem with the check proposed in https://reviews.llvm.org/D20689 (which uses, IIUC, very similar algorithm to this one) is that the false positive rate on real projects is too high. The comment I made on the other review thread applies here as well:
There's definitely potential in using parameter name and argument spelling to detect possibly swapped arguments, and there's a recent research on this topic, where authors claim to have reached the true positive rate of 85% with decent recall. See https://research.google.com/pubs/pub46317.html. If you're interested in working on this check, I would suggest at least looking at the techniques described in that paper. |
As for whether this analysis should be implemented in the static analyzer or as a clang-tidy check: I think, clang-tidy would be more suitable, since
- this analysis doesn't need the path analysis machinery provided by the static analyzer
- this analysis may benefit from suggesting automated fixes
- the false positive rate of the state-of-the-art algorithm implementing similar analysis have a false positive rate of at least 15-30%, and a large fraction of false positives could potentially be treated as "code smells" which can be fixed by giving function parameters or local variables proper names. I don't know whether this kind of analysis meets the bar the static analyzer sets for its checkers. Anna or Devin can tell more here.
- In overall, please add comments to the function which describes its purpose. Mainly on heuristic functions, it can help to understand it more easily. Also, could you provide some results of the current heuristics, what are the false positive rates on real projects like LLVM, FFmpeg, etc? I am quite interested in that.
You can take the list of projects from https://reviews.llvm.org/D20689#878803, run the analysis on them, and see whether the results are any better.
Hi Alexey,
This commit strongly needs testing on some real code. I cannot predict the TP rate of this checker now.
Regarding implementation, you can find some remarks inline.
lib/StaticAnalyzer/Checkers/CallArgsOrderChecker.cpp | ||
---|---|---|
31 | We have a class without any private parts. This means that we should declare it as 'struct' or do some redesign - the way it is constructed (with expressions like PGL.back().ParamsVec.push_back(std::move(p)); clearly indicates some design issues. I suggest at least adding a wrapper above ParamsGroupVec with methods: | |
33 | You don't need to reserve the amount of items same or less then SmallVector default size - they are already stack-allocated. | |
112 | else if? | |
119 | trim* functions should be moved out of ParamsGroup because they are not related. | |
139 | We can the code: SamePrefix = SamePrefix.take_front(StrSize); SamePrefixSize = SamePrefix.size(); instead. The behaviour of StringRef::take_front(size_t N) is well-defined if N >= size(). Same below: drop_front() can be replaced with take_back(). | |
144 | What you are trying to do here (find common prefix of two strings) can be done easier using std::mismatch(). | |
157 | llvm::transform(StrVec, StrVec.begin(), ...) | |
164 | With names like removeCommon[Pre/Suf]fix, the intention of these functions will be much cleaner. Also, please do not pass vectors by value. | |
185 | We can use std::mismatch with std::reverse_iterator (std::rbegin(), std::rend()). | |
208 | We can use try_emplace() to construct the new item in-place. | |
215 | This is already check in the caller. I think, this can be replaced with an assertion. | |
257 | Could you please explain what kind of ExprName are you trying to get? | |
259 | As I remember, IgnoreParenCasts() can not return nullptr. So, we can turn this into assertion. | |
262 | OrigE is not nullptr so we can use dyn_cast. | |
278 | We have ASTContext so we can use getPrintingPolicy() and getLangOpts() instead of default values. | |
279 | We can use Lexer::getSourceText() here. | |
311 | Personally, I think that writing: if (const FunctionDecl *FD = CE->getConstructor()) checkCall(FD, CE, /*FirstActualArgOffset=*/0); is shorter and clearer. But it is so minor that I don't insist on this change. | |
319 | The function is pretty long. I think we should split it. | |
323 | Should we check for NumArgs <= FirstActualArgOffset + 1, like in condition below? | |
334 | This will look nicer with C++11-style loop. | |
363 | Should we return or just continue? | |
415 | Instead of eachOf(forEachDescendant(), forEachDescendant()) it is better to use forEachDescendant(anyOf()). This will avoid double iteration on descendant tree (the memoizing size for descendant tree is limited). | |
test/Analysis/call-args-order.c | ||
13 | Please mark tests for true negatives as // no warning. |
We have a class without any private parts. This means that we should declare it as 'struct' or do some redesign - the way it is constructed (with expressions like PGL.back().ParamsVec.push_back(std::move(p)); clearly indicates some design issues. I suggest at least adding a wrapper above ParamsGroupVec with methods:
ParamGroup &addParam(const ParmVarDecl *ParamVD); // adds a variable into existing group or creates new one
void reclaimLastGroupIfSingle(); // deletes last added group if it has only a single element
and moving some logic here.