I propose this new readability checker.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
My thoughts on the check added.
Have you run it over a big codebase? What is the turnout?
And please add the check to the ReleaseNotes.
clang-tidy/readability/StrlenArgumentCheck.cpp | ||
---|---|---|
24 | please make it ::strlen since its in the global namespace in c++. AFAIK it detects in C correctly. | |
docs/clang-tidy/checks/readability-strlen-argument.rst | ||
7 | Could you please add a little more motivational text to it? const char* = "Some super nice string"; .... | |
test/clang-tidy/readability-strlen-argument.cpp | ||
2 | Same as documentation, maybe a little more telling examples to test on. |
Please mention this check in docs/ReleaseNotes.rst (in alphabetical order).
docs/clang-tidy/checks/readability-strlen-argument.rst | ||
---|---|---|
7 | Please use check, not checker. Please enclose strlen() into ``. |
It may be good idea to add check for arguments which taken from C++ containers like std::string, std::string_view, etc.
Thanks for all comments. I am working on fixing them. Updated patch will be uploaded soon.
Have you run it over a big codebase? What is the turnout?
I did not see this warning yet. Maybe after fixing the comments (::strlen) there will be a difference.
Not sure how you want. Do you envision something like:
std::string Par; strlen(Par.c_str() + 1);
test/clang-tidy/readability-strlen-argument.cpp | ||
---|---|---|
2 | how do you mean with char**. If you access those one by one it will look something like this right? char **A; strlen(*(A+N)); such code is not matched as far as I see. |
Not sure how you want. Do you envision something like:
std::string Par; strlen(Par.c_str() + 1);
Even strlen(Par.c_str()) will be reasonable.
test/clang-tidy/readability-strlen-argument.cpp | ||
---|---|---|
12 | but if any of the first ten chars in Str can be NUL, then the change is bogus? (I guess I fail to see the reason for this check) |
test/clang-tidy/readability-strlen-argument.cpp | ||
---|---|---|
2 | Iam not sure if I understood the goal of the check. your example is correctly what i thought of and i think as well it would not be matched . strlen(A+N)``` would not compile? The types do not match, but could there weird things happen? | |
12 | intersting point. If the Array holds a long list of string concatenated and seperated by 0, the offsetting would be a valid usecase. But i have no idea if that is actually a scenario. usually i use std::string :) |
docs/clang-tidy/checks/readability-strlen-argument.rst | ||
---|---|---|
8 | when the intend was to allocate one more char, he would need to do strlen(s) + 1, why is it changed to subtraction then? | |
20 | isnt that an overflow? the copy operation will then copy the content of s into p, therefore copying 10 characters into a buffer of length 9. as i understand it strcpy(p, s + 1) would be correct with the sizes. |
I am thinking about making my check more strict so it only warns in allocations. I believe the example code is much more motivating when there is allocation.
I have a script that runs clang/clang-tidy on all debian source code. It basically grabs all packages and if it has a configure script it runs: ./configure && bear make && clang-tidy .. Running that right now.
It goes slowly I have run clang-tidy on 22 packages (735 files) so far and got 13 warnings.
Unfortunately most warnings so far are false positives as in this example code:
#include <string.h> void dostuff(const char *P) { if (strncmp(P+2,"xyz",3)==0) {} }
Without -O2 I get no warning. With -O2 I get a false positive:
danielm@debian:~$ ~/llvm/build/bin/clang-tidy -checks=-*,readability-strlen-argument strlen.c -- -O2 1 warning generated. /home/danielm/strlen.c:3:16: warning: strlen() argument has pointer addition, it is recommended to subtract the result instead. [readability-strlen-argument] if (strncmp(P+2,"xyz",3)==0) {} ^
When the -O2 flag is used then on my machine the strncmp() function call is expanded to lots of code. and in that code, there are calls to strlen().
I should probably avoid these, I guess skipping all warnings in macro code sounds ok to me.
#include <string.h>
void dostuff(const char *P) {if (strncmp(P+2,"xyz",3)==0) {}}
Iam not super familiar with C, but the intend of the function is to check the following:
P = "foxyz", P2 = "abxyz", P3 = "opxyz", ...
And if P matches this kind of string pattern.
/home/danielm/strlen.c:3:16: warning: strlen() argument has pointer addition, it is recommended to subtract the result instead. [readability-strlen-argument]
if (strncmp(P+2,"xyz",3)==0) {} ^
Why is it matched? This is the code transformation from -O2? I dont know a way to surround that, and i think you should not call clang-tidy with optimization.
I should probably avoid these, I guess skipping all warnings in macro code sounds ok to me.
You can check if the code is a result of macro expansion, i think that would be enough.
docs/clang-tidy/checks/readability-strlen-argument.rst | ||
---|---|---|
8 | If I change it to strlen(s) + 1 then the logic of the program is changed. If I change it to subtraction then the logic is the same and the program is still wrong, but imho it is easier to see. | |
20 | yes it is overflow. My intention was to show that strlen(s+1) syntax is dangerous. | |
test/clang-tidy/readability-strlen-argument.cpp | ||
12 | I think that in theory, you have a point. It would be best that such users don't use this check. I doubt that is a big problem in practice. We don't need to speculate much, I will test this on all debian source code.. It's possible that strings in program executables are stored like that, but I'd say it's ub to calculate the address Str+10 and then dereference that if Str is a string that is 10 bytes long. |
docs/clang-tidy/checks/readability-strlen-argument.rst | ||
---|---|---|
8 | it should be made clear in the docs, that the code is bad, especially if its UB. the dangers of copy & paste ;) (even its unlikely that this will be copy&pasted). | |
20 | ok. please state that the overflow in a comment, its better to make that explicit. |
clang-tidy/readability/StrlenArgumentCheck.cpp | ||
---|---|---|
23–24 | anyOf(callee(functionDecl(hasName(x))), callee(functionDecl(hasName(y)))) unnecessarily duplicates callee and functionDecl. A better option is callee(functionDecl(hasAnyName(x, y))). | |
docs/clang-tidy/checks/readability-strlen-argument.rst | ||
8 | If we have good reasons to think the original code is wrong, so it seems that it's better to fix it, than to retain the behavior and just hide the bug? |
docs/clang-tidy/checks/readability-strlen-argument.rst | ||
---|---|---|
20 | BTW, strlen(x) - N is not only prone to overflows, but also less efficient (in case it's intentional). Did you run the check on real projects to see how likely this pattern is a bug? |
please make it ::strlen since its in the global namespace in c++. AFAIK it detects in C correctly.