A possible error is to write `malloc(strlen(s+1)) instead of malloc(strlen(s)+1). Unfortunately the former is also valid syntactically, but allocates less memory by two bytes (if s` is at least one character long, undefined behavior otherwise) which may result in overflow cases. This check detects such cases and also suggests the fix for them.
Details
Diff Detail
Event Timeline
Consider the use of a function pointer:
void* malloc(int); int strlen(char*); auto fp = malloc; void bad_malloc(char *str) { char *c = (char *)fp(strlen(str + 1)); }
I think, the checker will not match in this case.
One might use allocation functions via a function pointer in case of more possible allocation strategies (e.g having a different strategy for a shared memory allocation).
We might get false positives in case of certain substring operations.
Consider the case of copying a substring, pseudo code below:
const char * s = "abcdefg"; int offset = my_find('d', s); // I want to copy "defg" char *new_subststring = (char*) malloc(strlen(s + offset)); strcpy(...);
clang-tidy/misc/MisplacedOperatorInStrlenInAllocCheck.cpp | ||
---|---|---|
30 ↗ | (On Diff #119650) | Maybe it is worth to have a configurable list of allocation functions? Maybe it would be worth to support`alloca` as well? |
44 ↗ | (On Diff #119650) | What is this fixme? |
clang-tidy/misc/MisplacedOperatorInStrlenInAllocCheck.h | ||
19 ↗ | (On Diff #119650) | There is a missing description. |
docs/clang-tidy/checks/misc-misplaced-operator-in-strlen-in-alloc.rst | ||
16 ↗ | (On Diff #119650) | What if this code is intentional for some reason? char *c = (char*) malloc(strlen(str)-1); I think it might be a good idea to mention this possibility as a way to suppress this warning in the documentation. |
docs/clang-tidy/checks/misc-misplaced-operator-in-strlen-in-alloc.rst | ||
---|---|---|
16 ↗ | (On Diff #119650) | This is my concern as well -- I'd like to know how chatty this diagnostic is on real-world code bases, especially ones that rely on C rather than C++. A somewhat common code pattern in Win32 coding are double-null-terminated lists of strings, where you have null terminated strings at adjacent memory locations with two null characters for the end of the list. This could result in reasonable code like malloc(strlen(str + offset) + 1). |
Apart from all other comments, I think, this check would better be placed under bugprone/.
clang-tidy/misc/MisplacedOperatorInStrlenInAllocCheck.cpp | ||
---|---|---|
30 ↗ | (On Diff #119650) | Support for `alloca` added. I agree, it is worth to have a configurable list, but I think it is better to be done in a separate patch. |
44 ↗ | (On Diff #119650) | Sorry, I forgot to remove it. |
clang-tidy/misc/MisplacedOperatorInStrlenInAllocCheck.h | ||
19 ↗ | (On Diff #119650) | No longer. Sorry, I forgot it. |
docs/clang-tidy/checks/list.rst | ||
10 | I agree, but I did not do it. It was the script that created the new check. I changed it back to the wrong order. | |
docs/clang-tidy/checks/misc-misplaced-operator-in-strlen-in-alloc.rst | ||
16 ↗ | (On Diff #119650) | It is now more conservative: only `+ 1 counts, and only if there is no additional + 1` outside. |
Good point! But I think we should keep this first patch small, so I will do it in a follow-up patch.
Yes! However, it seems that I need a new matcher for it so I would do it in a follow-up patch.
I'm still a bit concerned about how to silence this diagnostic if the code is actually correct. Would it make sense to diagnose malloc(strlen(s + 1)) but silence the diagnostic if the argument to strlen() is explicitly parenthesized? That means a user could silence the diagnostic by writing malloc(strlen((s + 1))).
clang-tidy/bugprone/MisplacedOperatorInStrlenInAllocCheck.cpp | ||
---|---|---|
24 | This should be checking for ::strlen instead. What about other strlen-like functions (strnlen and strnlen_s come to mind)? | |
38 | ::malloc (etc), here and below. The reason is because a pathological case might have these functions inside of a namespace, so you want to ensure this only checks against the global namespace. | |
55 | Assign into a StringRef to avoid a needless copy operation (same below). | |
64 | This should use a Twine to avoid needless allocations and copies. | |
65 | BinOp->getOpcode() will always be BO_Add, correct? That's the only binary operator you're matching against. | |
67 | I'm not keen on this wording as it doesn't tell the user what's wrong with their code; further, it doesn't tell the user how to silence the warning if the code is correct. Perhaps: "addition operator is applied to the argument to strlen instead of the result; surround the addition subexpression with parentheses to silence this warning" | |
68 | + is the only binop you match. | |
docs/ReleaseNotes.rst | ||
63 | This comment is no longer accurate and should be reworded. | |
docs/clang-tidy/checks/bugprone-misplaced-operator-in-strlen-in-alloc.rst | ||
7 | This comment is no longer accurate and should be reworded. | |
test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.cpp | ||
37 | This should have some comments that explain why it's valid and it should be clearly spelled out in the docs. | |
40 | Spurious newline. | |
41 | This should have some comments that explain why it's valid and it should be clearly spelled out in the docs. |
clang-tidy/bugprone/MisplacedOperatorInStrlenInAllocCheck.cpp | ||
---|---|---|
64–67 | Please don't use auto as the type is not spelled out in the initialization. Same elsewhere as well. | |
docs/ReleaseNotes.rst | ||
63 | Still not quite right because it's talking about subtraction. | |
docs/clang-tidy/checks/bugprone-misplaced-operator-in-strlen-in-alloc.rst | ||
7 | Same comment about subtraction. | |
12 | similarly to -> as are | |
23 | You should also add an example showing how to silence the warning with parens. |
Many of the comments are marked done, but the code does not reflect that it actually is done, so I'm not certain what's happened there.
clang-tidy/bugprone/MisplacedOperatorInStrlenInAllocCheck.cpp | ||
---|---|---|
64–67 | You marked this as done but I still see auto used improperly on this line and many others. | |
87 | Addition -> addition (Diagnostics are never complete sentences with capitalization and punctuation, unlike comments.) | |
88–89 | The downside to this new wording is that it contradicts the fixit hint that's being supplied. The user sees "surround the addition with parens" as the textual advice, but the fixit shows a different way to silence the warning. I'm not certain of the best way to address that, however. Basically, there are two ways this could be fixed and both could come with fixits, but I don't believe we have a way to do an either/or pair of fixits. We could add "or hoist the addition" to the diagnostic, but that may be too cryptic and the diagnostic would be quite long. @alexfh -- do you have ideas? | |
docs/ReleaseNotes.rst | ||
63 | It's still talking about subtraction, so this does not appear to be done. | |
docs/clang-tidy/checks/bugprone-misplaced-operator-in-strlen-in-alloc.rst | ||
7 | This is also not done. |
This upload looks considerably better, thank you!
docs/ReleaseNotes.rst | ||
---|---|---|
63 | parameter of -> argument to | |
64 | and -> , and (Add the Oxford comma.) | |
65 | functions instead of to the result -> instead of the result use its return value -> the value is used | |
66 | of a -> to a | |
docs/clang-tidy/checks/bugprone-misplaced-operator-in-strlen-in-alloc.rst | ||
7 | Same wording changes from the release notes apply here as well. | |
32 | bug report -> diagnostic |
My only remaining concern is with the diagnostic message/fixit interaction itself. Let's see if @alexfh has any suggestions there, or we think of an improvement ourselves.
The diagnostic tells the user that you surround the arg to strlen with parens to silence the diagnostic, but the fixit doesn't do that -- it moves the addition to the result. That's confusing behavior. However, if you add another fixit to surround the arg with parens (and leave in the existing one), then there are conflicting fixits (you don't want to apply them both) which would also be confusing.
Then, I think we should rephrase the diagnostic message again. I am confident that in at least 99.9% of the cases adding 1 to the argument is a mistake. So the primary suggestion should be to move the addition to the result instead. The suggestion to put the argument in extra parentheses should be secondary, maybe also put in parentheses.
Agreed -- we want to keep the fixit with moving the +1 out to the result. However, I'm still worried that there'd be no way for the user to know how to silence the warning if the code is already correct -- I don't want users to disable a useful check because they don't have guidance on how to silence the diagnostic.
Do you have concrete numbers of how many diagnostics are triggered on large code bases, and the false positive rate?
I thought on something like this, but I still do not like my phrasing:
"Addition operator is applied to the argument of strlen(). instead of its result; move the '+ 1' outside of the call. (Or, if it is intentional then surround the addition subexpression with parentheses to silence this warning)."
Any better ideas?
Unfortunately, that is way too long of a diagnostic. We don't have hard and fast rules about diagnostic lengths, but (a) we don't use complete sentences for diagnostics, and (b) I would guesstimate that we usually try to keep them 80 characters or less (excluding expanded identifiers) and this one is 219 characters long (not counting strlen()).
Once we have some concrete numbers about how often this diagnostic triggers (true and false positives), that may help us pick a better approach. For instance, if this doesn't trigger any diagnostics across several large code bases, we may decide to not have any fix-it and simply tell the user addition operator is applied to the argument of 'strlen()' instead of its result. It'd be unfortunate not to tell the user what to do about it, but since it likely doesn't come up in practice all that often we can point any confused users to the documentation. However, if the numbers show that this comes up frequently (with true or false positives), we may go a different route.
Can you give me please an example where malloc(strlen(s+1)) is intentional. It allocates 2 byte less than needed to store s. If it is really the goal (e.g. s has a 2 character prefix which we do not want to copy) then the correct solution is to use malloc(strlen(s+2)+1) instead of putting s+1 into extra parentheses. So I think that we are overcomplicating things here. I would simply delete the suggestion about the extra parentheses from the error message and leave them only in the documentation, at least until no real example comes to my mind where taking the length of s+1 and not adding +1 is the fully correct solution. (Please note the malloc(strlen(s+1)+1) is already ignored.
As I pointed out earlier in the thread, it is common to have double-null-terminated strings in Win32 APIs. This is a case where strlen(s + N) is valid. Since 1-byte strings would also be a valid value of N, strlen(s + 1) is feasible, though unlikely. If you're okay dropping the fixit from your check and rewording the diagnostic to remove the "surround with parens" bit, I think the check would be fine. However, fix-its are generally only used when we know the transformation is correct. We have no way to know that in this case.
Yes, you pointed it out, but even in the example you wrote a +1 to the result as well. I a double-null terminated string in Win32 you must have at least 1-byte strings, which are 2-bytes long together with their null terminator. So the minimum offset is 2, since 1 would mean 0-byte string, but then we have two null terminators after each other which is impossible since double null is the terminator of the whole list. So s+1 cannot be valid. Furthermore, even if it would be valid, we must also allocate memory for the zero terminator of the string list item at the given offest, so we have an extra +1 outside of strlen as well.
Hmm, this is a good point -- I was thinking of the generic +N case with the original example, but with an explicit +1, you can't run into that situation with Win32 APIs. I will think on this a bit further and report back when I have a spare moment.
I think I've convinced myself that because we're limiting this check to just +1 and not +N, the current approach is fine. We should leave in the parens-to-silence behavior and the existing fixit to move the addition to the result, but we don't need to call it out in the diagnostic text (the documentation should be sufficient).
I would still appreciate verification that this doesn't have a high fp rate against some real world code bases just to be sure of my intuition.
I can test this on our repo, Mozilla, since it's a large code-base I think we will have a better understanding of the false-positive ratio.
I figured there wouldn't be, but was curious just the same.
clang-tidy/bugprone/MisplacedOperatorInStrlenInAllocCheck.cpp | ||
---|---|---|
87 | This change still needs to be applied. | |
test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.c | ||
2 | Please clang-format this file. | |
test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.cpp | ||
2 | Please clang-format this file. |
test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.c | ||
---|---|---|
2 | I omitted it intantionally. No I did it, but it meant lots of editing back the CHECK-MESSAGES and CHECK-FIXES comments. They were broken were they should not have been, and merged where they should not have been. | |
test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.cpp | ||
2 | Same as above. |
LGTM!
test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.c | ||
---|---|---|
2 | Ugh, it's unfortunate that clang-format messed with the CHECK-* stuff, but thank you for ensuring the code is formatted according to the community standards. |
This should be checking for ::strlen instead.
What about other strlen-like functions (strnlen and strnlen_s come to mind)?
What about wide-character strings using wcslen? (This might be another good check because those should be wcslen(str) * sizeof(wchar_t) instead of just wcslen(str).)