[clang-tidy] New readability check for strlen argument
Needs ReviewPublic

Authored by danielmarjamaki on Fri, Apr 21, 3:45 AM.

Details

Summary

I propose this new readability checker.

Diff Detail

Repository
rL LLVM
JonasToth added a subscriber: JonasToth.EditedFri, Apr 21, 3:57 AM

My thoughts on the check added.
Have you run it over a big codebase? What is the turnout?

And please add the check to the ReleaseNotes.

clang-tidy/readability/StrlenArgumentCheck.cpp
24

please make it ::strlen since its in the global namespace in c++. AFAIK it detects in C correctly.

docs/clang-tidy/checks/readability-strlen-argument.rst
7

Could you please add a little more motivational text to it?
As I understand it, the wanted goal is to get the length of a substring, denoted as char*. Am I right?
You could give a more fully code example showing the equivalence.

const char* = "Some super nice string"; ....

test/clang-tidy/readability-strlen-argument.cpp
2

Same as documentation, maybe a little more telling examples to test on.
What happens with char**, an array of strings? Accessing those one by one would be possible with an addition or subscriptoperation.

Please mention this check in docs/ReleaseNotes.rst (in alphabetical order).

docs/clang-tidy/checks/readability-strlen-argument.rst
7

Please use check, not checker. Please enclose strlen() into ``.

It may be good idea to add check for arguments which taken from C++ containers like std::string, std::string_view, etc.

Thanks for all comments. I am working on fixing them. Updated patch will be uploaded soon.

Have you run it over a big codebase? What is the turnout?

I did not see this warning yet. Maybe after fixing the comments (::strlen) there will be a difference.

It may be good idea to add check for arguments which taken from C++ containers like std::string, std::string_view, etc.

Not sure how you want. Do you envision something like:

std::string Par;
strlen(Par.c_str() + 1);
test/clang-tidy/readability-strlen-argument.cpp
2

how do you mean with char**. If you access those one by one it will look something like this right?

char **A;
strlen(*(A+N));

such code is not matched as far as I see.

It may be good idea to add check for arguments which taken from C++ containers like std::string, std::string_view, etc.

Not sure how you want. Do you envision something like:

std::string Par;
strlen(Par.c_str() + 1);

Even strlen(Par.c_str()) will be reasonable.

sberg added a subscriber: sberg.Mon, Apr 24, 7:16 AM
sberg added inline comments.
test/clang-tidy/readability-strlen-argument.cpp
12

but if any of the first ten chars in Str can be NUL, then the change is bogus? (I guess I fail to see the reason for this check)

JonasToth added inline comments.Mon, Apr 24, 7:26 AM
test/clang-tidy/readability-strlen-argument.cpp
2

Iam not sure if I understood the goal of the check.
For me, it looks like the determination of the length of a substring, say we skip the first 10 characters, who long would the end still be.

your example is correctly what i thought of and i think as well it would not be matched .

strlen(A+N)``` would not compile? The types do not match, but could there weird things happen?
12

intersting point. If the Array holds a long list of string concatenated and seperated by 0, the offsetting would be a valid usecase.
(are strings in programm executables stored like this?)

But i have no idea if that is actually a scenario. usually i use std::string :)

Fixed review comments. Made code examples and documentation more motivational.

JonasToth added inline comments.Tue, Apr 25, 3:36 AM
docs/clang-tidy/checks/readability-strlen-argument.rst
8

when the intend was to allocate one more char, he would need to do strlen(s) + 1, why is it changed to subtraction then?

20

isnt that an overflow?
an example:
strlen(s) == 10 -> p will be 9 characters long, since its substracted with 1.

the copy operation will then copy the content of s into p, therefore copying 10 characters into a buffer of length 9.

as i understand it strcpy(p, s + 1) would be correct with the sizes.

I am thinking about making my check more strict so it only warns in allocations. I believe the example code is much more motivating when there is allocation.

My thoughts on the check added.
Have you run it over a big codebase? What is the turnout?

And please add the check to the ReleaseNotes.

I have a script that runs clang/clang-tidy on all debian source code. It basically grabs all packages and if it has a configure script it runs: ./configure && bear make && clang-tidy .. Running that right now.

It goes slowly I have run clang-tidy on 22 packages (735 files) so far and got 13 warnings.

Unfortunately most warnings so far are false positives as in this example code:

#include <string.h>
void dostuff(const char *P) {
  if (strncmp(P+2,"xyz",3)==0) {}
}

Without -O2 I get no warning. With -O2 I get a false positive:

danielm@debian:~$ ~/llvm/build/bin/clang-tidy -checks=-*,readability-strlen-argument strlen.c -- -O2
1 warning generated.
/home/danielm/strlen.c:3:16: warning: strlen() argument has pointer addition, it is recommended to subtract the result instead. [readability-strlen-argument]
  if (strncmp(P+2,"xyz",3)==0) {}
               ^

When the -O2 flag is used then on my machine the strncmp() function call is expanded to lots of code. and in that code, there are calls to strlen().

I should probably avoid these, I guess skipping all warnings in macro code sounds ok to me.

danielmarjamaki marked 4 inline comments as done.Tue, Apr 25, 3:46 AM

#include <string.h>
void dostuff(const char *P) {

if (strncmp(P+2,"xyz",3)==0) {}

}

Iam not super familiar with C, but the intend of the function is to check the following:

P = "foxyz", P2 = "abxyz", P3 = "opxyz", ...

And if P matches this kind of string pattern.

/home/danielm/strlen.c:3:16: warning: strlen() argument has pointer addition, it is recommended to subtract the result instead. [readability-strlen-argument]

if (strncmp(P+2,"xyz",3)==0) {}
             ^

Why is it matched? This is the code transformation from -O2? I dont know a way to surround that, and i think you should not call clang-tidy with optimization.

I should probably avoid these, I guess skipping all warnings in macro code sounds ok to me.

You can check if the code is a result of macro expansion, i think that would be enough.

docs/clang-tidy/checks/readability-strlen-argument.rst
8

If I change it to strlen(s) + 1 then the logic of the program is changed.

If I change it to subtraction then the logic is the same and the program is still wrong, but imho it is easier to see.

20

yes it is overflow. My intention was to show that strlen(s+1) syntax is dangerous.

test/clang-tidy/readability-strlen-argument.cpp
12

I think that in theory, you have a point. It would be best that such users don't use this check. I doubt that is a big problem in practice. We don't need to speculate much, I will test this on all debian source code..

It's possible that strings in program executables are stored like that, but I'd say it's ub to calculate the address Str+10 and then dereference that if Str is a string that is 10 bytes long.

JonasToth added inline comments.Tue, Apr 25, 5:30 AM
docs/clang-tidy/checks/readability-strlen-argument.rst
8

it should be made clear in the docs, that the code is bad, especially if its UB. the dangers of copy & paste ;) (even its unlikely that this will be copy&pasted).

20

ok. please state that the overflow in a comment, its better to make that explicit.