This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Misplaced Operator in Strlen in Alloc
ClosedPublic

Authored by baloghadamsoftware on Oct 20 2017, 5:49 AM.

Details

Summary

A possible error is to write `malloc(strlen(s+1)) instead of malloc(strlen(s)+1). Unfortunately the former is also valid syntactically, but allocates less memory by two bytes (if s` is at least one character long, undefined behavior otherwise) which may result in overflow cases. This check detects such cases and also suggests the fix for them.

Diff Detail

Event Timeline

Consider the use of a function pointer:

void* malloc(int);
int strlen(char*);
auto fp = malloc;
void bad_malloc(char *str) { char *c = (char *)fp(strlen(str + 1)); }

I think, the checker will not match in this case.

One might use allocation functions via a function pointer in case of more possible allocation strategies (e.g having a different strategy for a shared memory allocation).

We might get false positives in case of certain substring operations.
Consider the case of copying a substring, pseudo code below:

const char * s = "abcdefg";
int offset = my_find('d', s);
// I want to copy "defg"
char *new_subststring = (char*) malloc(strlen(s + offset));
strcpy(...);
xazax.hun added inline comments.Oct 20 2017, 7:10 AM
clang-tidy/misc/MisplacedOperatorInStrlenInAllocCheck.cpp
30 ↗(On Diff #119650)

Maybe it is worth to have a configurable list of allocation functions?

Maybe it would be worth to support`alloca` as well?

44 ↗(On Diff #119650)

What is this fixme?

clang-tidy/misc/MisplacedOperatorInStrlenInAllocCheck.h
19 ↗(On Diff #119650)

There is a missing description.

docs/clang-tidy/checks/misc-misplaced-operator-in-strlen-in-alloc.rst
16 ↗(On Diff #119650)

What if this code is intentional for some reason?
I think in thase case it could be rewritten like the following (which is much cleaner):

char *c = (char*) malloc(strlen(str)-1);

I think it might be a good idea to mention this possibility as a way to suppress this warning in the documentation.

aaron.ballman added inline comments.
docs/clang-tidy/checks/misc-misplaced-operator-in-strlen-in-alloc.rst
16 ↗(On Diff #119650)

This is my concern as well -- I'd like to know how chatty this diagnostic is on real-world code bases, especially ones that rely on C rather than C++. A somewhat common code pattern in Win32 coding are double-null-terminated lists of strings, where you have null terminated strings at adjacent memory locations with two null characters for the end of the list. This could result in reasonable code like malloc(strlen(str + offset) + 1).

Same mistake could be made with new[] operator.

docs/clang-tidy/checks/list.rst
10

I think will be better to fix order of other checks separately from this check.

docs/clang-tidy/checks/misc-misplaced-operator-in-strlen-in-alloc.rst
7 ↗(On Diff #119650)

strlen() is function, not method.

Eugene.Zelenko set the repository for this revision to rL LLVM.Oct 20 2017, 11:22 AM
Eugene.Zelenko added a project: Restricted Project.
alexfh edited edge metadata.Oct 20 2017, 4:59 PM

Apart from all other comments, I think, this check would better be placed under bugprone/.

Updated according to comments.

baloghadamsoftware marked 5 inline comments as done.Oct 25 2017, 2:18 AM
baloghadamsoftware added inline comments.
clang-tidy/misc/MisplacedOperatorInStrlenInAllocCheck.cpp
30 ↗(On Diff #119650)

Support for `alloca` added.

I agree, it is worth to have a configurable list, but I think it is better to be done in a separate patch.

44 ↗(On Diff #119650)

Sorry, I forgot to remove it.

clang-tidy/misc/MisplacedOperatorInStrlenInAllocCheck.h
19 ↗(On Diff #119650)

No longer. Sorry, I forgot it.

docs/clang-tidy/checks/list.rst
10

I agree, but I did not do it. It was the script that created the new check. I changed it back to the wrong order.

docs/clang-tidy/checks/misc-misplaced-operator-in-strlen-in-alloc.rst
16 ↗(On Diff #119650)

It is now more conservative: only `+ 1 counts, and only if there is no additional + 1` outside.

baloghadamsoftware marked 4 inline comments as done.Oct 25 2017, 2:18 AM

Apart from all other comments, I think, this check would better be placed under bugprone/.

I moved it there.

Consider the use of a function pointer:

void* malloc(int);
int strlen(char*);
auto fp = malloc;
void bad_malloc(char *str) { char *c = (char *)fp(strlen(str + 1)); }

I think, the checker will not match in this case.

One might use allocation functions via a function pointer in case of more possible allocation strategies (e.g having a different strategy for a shared memory allocation).

Good point! But I think we should keep this first patch small, so I will do it in a follow-up patch.

Same mistake could be made with new[] operator.

Yes! However, it seems that I need a new matcher for it so I would do it in a follow-up patch.

I'm still a bit concerned about how to silence this diagnostic if the code is actually correct. Would it make sense to diagnose malloc(strlen(s + 1)) but silence the diagnostic if the argument to strlen() is explicitly parenthesized? That means a user could silence the diagnostic by writing malloc(strlen((s + 1))).

clang-tidy/bugprone/MisplacedOperatorInStrlenInAllocCheck.cpp
23

This should be checking for ::strlen instead.

What about other strlen-like functions (strnlen and strnlen_s come to mind)?
What about wide-character strings using wcslen? (This might be another good check because those should be wcslen(str) * sizeof(wchar_t) instead of just wcslen(str).)

37

::malloc (etc), here and below.

The reason is because a pathological case might have these functions inside of a namespace, so you want to ensure this only checks against the global namespace.

54

Assign into a StringRef to avoid a needless copy operation (same below).

63

This should use a Twine to avoid needless allocations and copies.

64

BinOp->getOpcode() will always be BO_Add, correct? That's the only binary operator you're matching against.

66

I'm not keen on this wording as it doesn't tell the user what's wrong with their code; further, it doesn't tell the user how to silence the warning if the code is correct. Perhaps:

"addition operator is applied to the argument to strlen instead of the result; surround the addition subexpression with parentheses to silence this warning"

67

+ is the only binop you match.

docs/ReleaseNotes.rst
63

This comment is no longer accurate and should be reworded.

docs/clang-tidy/checks/bugprone-misplaced-operator-in-strlen-in-alloc.rst
6

This comment is no longer accurate and should be reworded.

test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.cpp
36

This should have some comments that explain why it's valid and it should be clearly spelled out in the docs.

39

Spurious newline.

40

This should have some comments that explain why it's valid and it should be clearly spelled out in the docs.

Updated according to comments.

baloghadamsoftware marked 11 inline comments as done.Oct 26 2017, 6:46 AM
aaron.ballman added inline comments.Oct 26 2017, 6:56 AM
clang-tidy/bugprone/MisplacedOperatorInStrlenInAllocCheck.cpp
65–68

Please don't use auto as the type is not spelled out in the initialization. Same elsewhere as well.

docs/ReleaseNotes.rst
63

Still not quite right because it's talking about subtraction.

docs/clang-tidy/checks/bugprone-misplaced-operator-in-strlen-in-alloc.rst
6

Same comment about subtraction.

13

similarly to -> as are

24

You should also add an example showing how to silence the warning with parens.

Updated according to the comments.

baloghadamsoftware marked 6 inline comments as done.Oct 27 2017, 12:02 AM

Many of the comments are marked done, but the code does not reflect that it actually is done, so I'm not certain what's happened there.

clang-tidy/bugprone/MisplacedOperatorInStrlenInAllocCheck.cpp
65–68

You marked this as done but I still see auto used improperly on this line and many others.

87

Addition -> addition

(Diagnostics are never complete sentences with capitalization and punctuation, unlike comments.)

88–89

The downside to this new wording is that it contradicts the fixit hint that's being supplied. The user sees "surround the addition with parens" as the textual advice, but the fixit shows a different way to silence the warning.

I'm not certain of the best way to address that, however. Basically, there are two ways this could be fixed and both could come with fixits, but I don't believe we have a way to do an either/or pair of fixits.

We could add "or hoist the addition" to the diagnostic, but that may be too cryptic and the diagnostic would be quite long. @alexfh -- do you have ideas?

docs/ReleaseNotes.rst
63

It's still talking about subtraction, so this does not appear to be done.

docs/clang-tidy/checks/bugprone-misplaced-operator-in-strlen-in-alloc.rst
6

This is also not done.

Many of the comments are marked done, but the code does not reflect that it actually is done, so I'm not certain what's happened there.

Neither am I. I will try to re-upload the diff.

Second try to upload the correct diff...

This upload looks considerably better, thank you!

docs/ReleaseNotes.rst
63

parameter of -> argument to

64

and -> , and

(Add the Oxford comma.)

65

functions instead of to the result -> instead of the result

use its return value -> the value is used

66

of a -> to a

docs/clang-tidy/checks/bugprone-misplaced-operator-in-strlen-in-alloc.rst
7

Same wording changes from the release notes apply here as well.

32

bug report -> diagnostic

Docs rephrased according to the comments.

My only remaining concern is with the diagnostic message/fixit interaction itself. Let's see if @alexfh has any suggestions there, or we think of an improvement ourselves.

My only remaining concern is with the diagnostic message/fixit interaction itself. Let's see if @alexfh has any suggestions there, or we think of an improvement ourselves.

What do you mean by message/fixit interaction and what is your concern there?

My only remaining concern is with the diagnostic message/fixit interaction itself. Let's see if @alexfh has any suggestions there, or we think of an improvement ourselves.

What do you mean by message/fixit interaction and what is your concern there?

The diagnostic tells the user that you surround the arg to strlen with parens to silence the diagnostic, but the fixit doesn't do that -- it moves the addition to the result. That's confusing behavior. However, if you add another fixit to surround the arg with parens (and leave in the existing one), then there are conflicting fixits (you don't want to apply them both) which would also be confusing.

The diagnostic tells the user that you surround the arg to strlen with parens to silence the diagnostic, but the fixit doesn't do that -- it moves the addition to the result. That's confusing behavior. However, if you add another fixit to surround the arg with parens (and leave in the existing one), then there are conflicting fixits (you don't want to apply them both) which would also be confusing.

Then, I think we should rephrase the diagnostic message again. I am confident that in at least 99.9% of the cases adding 1 to the argument is a mistake. So the primary suggestion should be to move the addition to the result instead. The suggestion to put the argument in extra parentheses should be secondary, maybe also put in parentheses.

The diagnostic tells the user that you surround the arg to strlen with parens to silence the diagnostic, but the fixit doesn't do that -- it moves the addition to the result. That's confusing behavior. However, if you add another fixit to surround the arg with parens (and leave in the existing one), then there are conflicting fixits (you don't want to apply them both) which would also be confusing.

Then, I think we should rephrase the diagnostic message again. I am confident that in at least 99.9% of the cases adding 1 to the argument is a mistake. So the primary suggestion should be to move the addition to the result instead. The suggestion to put the argument in extra parentheses should be secondary, maybe also put in parentheses.

Agreed -- we want to keep the fixit with moving the +1 out to the result. However, I'm still worried that there'd be no way for the user to know how to silence the warning if the code is already correct -- I don't want users to disable a useful check because they don't have guidance on how to silence the diagnostic.

Do you have concrete numbers of how many diagnostics are triggered on large code bases, and the false positive rate?

I thought on something like this, but I still do not like my phrasing:

"Addition operator is applied to the argument of strlen(). instead of its result; move the '+ 1' outside of the call. (Or, if it is intentional then surround the addition subexpression with parentheses to silence this warning)."

Any better ideas?

I thought on something like this, but I still do not like my phrasing:

"Addition operator is applied to the argument of strlen(). instead of its result; move the '+ 1' outside of the call. (Or, if it is intentional then surround the addition subexpression with parentheses to silence this warning)."

Any better ideas?

Unfortunately, that is way too long of a diagnostic. We don't have hard and fast rules about diagnostic lengths, but (a) we don't use complete sentences for diagnostics, and (b) I would guesstimate that we usually try to keep them 80 characters or less (excluding expanded identifiers) and this one is 219 characters long (not counting strlen()).

Once we have some concrete numbers about how often this diagnostic triggers (true and false positives), that may help us pick a better approach. For instance, if this doesn't trigger any diagnostics across several large code bases, we may decide to not have any fix-it and simply tell the user addition operator is applied to the argument of 'strlen()' instead of its result. It'd be unfortunate not to tell the user what to do about it, but since it likely doesn't come up in practice all that often we can point any confused users to the documentation. However, if the numbers show that this comes up frequently (with true or false positives), we may go a different route.

Can you give me please an example where malloc(strlen(s+1)) is intentional. It allocates 2 byte less than needed to store s. If it is really the goal (e.g. s has a 2 character prefix which we do not want to copy) then the correct solution is to use malloc(strlen(s+2)+1) instead of putting s+1 into extra parentheses. So I think that we are overcomplicating things here. I would simply delete the suggestion about the extra parentheses from the error message and leave them only in the documentation, at least until no real example comes to my mind where taking the length of s+1 and not adding +1 is the fully correct solution. (Please note the malloc(strlen(s+1)+1) is already ignored.

Can you give me please an example where malloc(strlen(s+1)) is intentional. It allocates 2 byte less than needed to store s. If it is really the goal (e.g. s has a 2 character prefix which we do not want to copy) then the correct solution is to use malloc(strlen(s+2)+1) instead of putting s+1 into extra parentheses. So I think that we are overcomplicating things here. I would simply delete the suggestion about the extra parentheses from the error message and leave them only in the documentation, at least until no real example comes to my mind where taking the length of s+1 and not adding +1 is the fully correct solution. (Please note the malloc(strlen(s+1)+1) is already ignored.

As I pointed out earlier in the thread, it is common to have double-null-terminated strings in Win32 APIs. This is a case where strlen(s + N) is valid. Since 1-byte strings would also be a valid value of N, strlen(s + 1) is feasible, though unlikely. If you're okay dropping the fixit from your check and rewording the diagnostic to remove the "surround with parens" bit, I think the check would be fine. However, fix-its are generally only used when we know the transformation is correct. We have no way to know that in this case.

As I pointed out earlier in the thread, it is common to have double-null-terminated strings in Win32 APIs. This is a case where strlen(s + N) is valid. Since 1-byte strings would also be a valid value of N, strlen(s + 1) is feasible, though unlikely. If you're okay dropping the fixit from your check and rewording the diagnostic to remove the "surround with parens" bit, I think the check would be fine. However, fix-its are generally only used when we know the transformation is correct. We have no way to know that in this case.

Yes, you pointed it out, but even in the example you wrote a +1 to the result as well. I a double-null terminated string in Win32 you must have at least 1-byte strings, which are 2-bytes long together with their null terminator. So the minimum offset is 2, since 1 would mean 0-byte string, but then we have two null terminators after each other which is impossible since double null is the terminator of the whole list. So s+1 cannot be valid. Furthermore, even if it would be valid, we must also allocate memory for the zero terminator of the string list item at the given offest, so we have an extra +1 outside of strlen as well.

As I pointed out earlier in the thread, it is common to have double-null-terminated strings in Win32 APIs. This is a case where strlen(s + N) is valid. Since 1-byte strings would also be a valid value of N, strlen(s + 1) is feasible, though unlikely. If you're okay dropping the fixit from your check and rewording the diagnostic to remove the "surround with parens" bit, I think the check would be fine. However, fix-its are generally only used when we know the transformation is correct. We have no way to know that in this case.

Yes, you pointed it out, but even in the example you wrote a +1 to the result as well. I a double-null terminated string in Win32 you must have at least 1-byte strings, which are 2-bytes long together with their null terminator. So the minimum offset is 2, since 1 would mean 0-byte string, but then we have two null terminators after each other which is impossible since double null is the terminator of the whole list. So s+1 cannot be valid. Furthermore, even if it would be valid, we must also allocate memory for the zero terminator of the string list item at the given offest, so we have an extra +1 outside of strlen as well.

Hmm, this is a good point -- I was thinking of the generic +N case with the original example, but with an explicit +1, you can't run into that situation with Win32 APIs. I will think on this a bit further and report back when I have a spare moment.

Hmm, this is a good point -- I was thinking of the generic +N case with the original example, but with an explicit +1, you can't run into that situation with Win32 APIs. I will think on this a bit further and report back when I have a spare moment.

I think I've convinced myself that because we're limiting this check to just +1 and not +N, the current approach is fine. We should leave in the parens-to-silence behavior and the existing fixit to move the addition to the result, but we don't need to call it out in the diagnostic text (the documentation should be sufficient).

I would still appreciate verification that this doesn't have a high fp rate against some real world code bases just to be sure of my intuition.

Diagnostic message changed.

OK, I will test it on some real code tomorrow, but today is a holiday here.

I tested it on a C project (Postgres) and found no false positives.

Abpostelnicu added a comment.EditedNov 3 2017, 1:13 AM

I can test this on our repo, Mozilla, since it's a large code-base I think we will have a better understanding of the false-positive ratio.

I can test this on our repo, Mozilla, since it's a large code-base I think we you will have a better understanding of the false-positive ratio.

Thank you in advance!

I tested it on a C project (Postgres) and found no false positives.

Out of curiosity, were there any true positives, either?

Out of curiosity, were there any true positives, either?

No, in a release version there should be no true positives of this kind, I think.

Out of curiosity, were there any true positives, either?

No, in a release version there should be no true positives of this kind, I think.

I figured there wouldn't be, but was curious just the same.

clang-tidy/bugprone/MisplacedOperatorInStrlenInAllocCheck.cpp
87

This change still needs to be applied.

test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.c
1 ↗(On Diff #121102)

Please clang-format this file.

test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.cpp
2

Please clang-format this file.

Updated according to the comments.

baloghadamsoftware marked 12 inline comments as done.Nov 6 2017, 5:57 AM
baloghadamsoftware added inline comments.
test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.c
1 ↗(On Diff #121102)

I omitted it intantionally. No I did it, but it meant lots of editing back the CHECK-MESSAGES and CHECK-FIXES comments. They were broken were they should not have been, and merged where they should not have been.

test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.cpp
2

Same as above.

aaron.ballman accepted this revision.Nov 6 2017, 7:28 AM

LGTM!

test/clang-tidy/bugprone-misplaced-operator-in-strlen-in-alloc.c
1 ↗(On Diff #121102)

Ugh, it's unfortunate that clang-format messed with the CHECK-* stuff, but thank you for ensuring the code is formatted according to the community standards.

This revision is now accepted and ready to land.Nov 6 2017, 7:28 AM
baloghadamsoftware marked 2 inline comments as done.