This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Check for sizeof that call functions
ClosedPublic

Authored by pfultz2 on Mar 7 2018, 3:10 PM.

Details

Summary

A common mistake that I have found in our codebase is calling a function to get an integer or enum that represents the type such as:

int numBytes = numElements * sizeof(x.GetType());

So this extends the sizeof check to check for these cases. There is also a WarnOnSizeOfCall option so it can be disabled.

Diff Detail

Repository
rL LLVM

Event Timeline

pfultz2 created this revision.Mar 7 2018, 3:10 PM
pfultz2 updated this revision to Diff 137496.Mar 7 2018, 3:12 PM
pfultz2 updated this revision to Diff 137497.
pfultz2 updated this revision to Diff 137499.Mar 7 2018, 3:15 PM
lebedev.ri retitled this revision from Check for sizeof that call functions to [clang-tidy] Check for sizeof that call functions.Mar 8 2018, 3:21 AM
lebedev.ri set the repository for this revision to rCTE Clang Tools Extra.
lebedev.ri added a project: Restricted Project.
lebedev.ri removed a reviewer: Restricted Project.Mar 8 2018, 3:21 AM
lebedev.ri added a subscriber: lebedev.ri.

Please do bother to prefix with the appropriate [prefix], set the appropriate repo, and tags!

hokein added a comment.Mar 8 2018, 3:29 AM

Can you elaborate a bit more about this? I think we also need to update the check document (adding proper section of this new behavior, and the new option).

alexfh added inline comments.Mar 8 2018, 4:29 AM
clang-tidy/misc/SizeofExpressionCheck.cpp
220 ↗(On Diff #137499)

I'm not sure I understand the message "suspicious usage of ... to an integer". Specifically, what does the "to an integer" part mean?

Can you elaborate a bit more about this?

This catches problems when calling sizeof(f()) when f returns an integer(or enum). This is because, most likely, the integer represents the type to be chosen, however, sizeof is being computed for an integer and not the type the integer represents. That is, the user has an enum for the data_type:

enum data_type {
    float_type,
    double_type
};

At some point the user may call a function to get the data type(perhaps x.GetType()) and pass it on to sizeof, like sizeof(x.GetType()), which is incorrect.

I think we also need to update the check document (adding proper section of this new behavior, and the new option).

I will update the doc.

clang-tidy/misc/SizeofExpressionCheck.cpp
220 ↗(On Diff #137499)

That could probably be worded better. It means the expr is an integer type. Maybe I should say suspicious usage of 'sizeof() on an expression that results in an integer?

Can you elaborate a bit more about this?

This catches problems when calling sizeof(f()) when f returns an integer(or enum). This is because, most likely, the integer represents the type to be chosen, however, sizeof is being computed for an integer and not the type the integer represents. That is, the user has an enum for the data_type:

enum data_type {
    float_type,
    double_type
};

At some point the user may call a function to get the data type(perhaps x.GetType()) and pass it on to sizeof, like sizeof(x.GetType()), which is incorrect.

Can you point to some real world code that would benefit from this check? I don't see this as being a common issue with code and I am concerned about false positives. For instance, it's reasonable to write sizeof(func_call()) in a context where you don't want to repeat the type name in multiple places. I've seen this construct used in LLVM's code base (see Prologue::getLength() in DWARFDebugLine.h for one such instance).

Can you point to some real world code that would benefit from this check?

Yes, here:

https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/src/convolution.cpp#L184

It is erroneously calling sizeof(yDesc.GetType()). The GetType function returns an enum that represents the float type. In this case it wants to compute the size of the float type not the size of the enum that represents the float type. There's actually many instances of this error in the codebase.

I don't see this as being a common issue with code and I am concerned about false positives.

I have seen it as a common issue. It only runs for a function that returns an integer type. Another possibility is to not warn for function calls that are dependent on a template parameter.

Of course, there is still the option to disable it.

For instance, it's reasonable to write sizeof(func_call()) in a context where you don't want to repeat the type name in multiple places.

Generally, a typedef is used to avoid repeating the type.

I've seen this construct used in LLVM's code base (see Prologue::getLength() in DWARFDebugLine.h for one such instance).

This is the first time I have seen this used legitimately outside of template metaprogramming. Alternatively, you could also write sizeof(decltype(expr)).

Can you point to some real world code that would benefit from this check?

Yes, here:

https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/src/convolution.cpp#L184

It is erroneously calling sizeof(yDesc.GetType()). The GetType function returns an enum that represents the float type. In this case it wants to compute the size of the float type not the size of the enum that represents the float type. There's actually many instances of this error in the codebase.

Thank you for the example.

I don't see this as being a common issue with code and I am concerned about false positives.

I have seen it as a common issue. It only runs for a function that returns an integer type. Another possibility is to not warn for function calls that are dependent on a template parameter.

Of course, there is still the option to disable it.

For instance, it's reasonable to write sizeof(func_call()) in a context where you don't want to repeat the type name in multiple places.

Generally, a typedef is used to avoid repeating the type.

The typedef doesn't avoid the repeating type.

uint16_t foo();
...
size_t s1 = sizeof(foo()); // No repeated type
size_t s2 = sizeof(uint16_t); // Repeated type

If the return type of foo() is changed, then the use for s1 will be automatically updated while the use for s2 will not. Your example of how to work around it using decltype isn't obvious and won't work for C users.

I've seen this construct used in LLVM's code base (see Prologue::getLength() in DWARFDebugLine.h for one such instance).

This is the first time I have seen this used legitimately outside of template metaprogramming. Alternatively, you could also write sizeof(decltype(expr)).

I think we need some data measurements over several large (>500k LoC) C and C++ code bases to see how frequently this check yields false positives. I have a hard time imagining that the true positives will outweigh the false positives, but without concrete data, it's just guesswork.

If the return type of foo() is changed, then the use for s1 will be automatically updated

But usually you write it as:

using foo_return = uint16_t;
foo_return foo();
...
size_t s1 = sizeof(foo());
size_t s2 = sizeof(foo_return);

So you just update the foo_return typedef, and it will update it everywhere that type is used.

I think we need some data measurements over several large (>500k LoC) C and C++ code bases to see how frequently this check yields false positives.

So I ran it on llvm. It mainly complained about areas where sizeof is used for building traits. I should add an extra check that the function is not overloaded.

It did complain once about sizeof in a detection idiom(in BitmaskEnum.h) and once with sizeof(getVersion()), like you mentioned.

If the return type of foo() is changed, then the use for s1 will be automatically updated

But usually you write it as:

using foo_return = uint16_t;
foo_return foo();
...
size_t s1 = sizeof(foo());
size_t s2 = sizeof(foo_return);

So you just update the foo_return typedef, and it will update it everywhere that type is used.

Again, that only works for C++ and not C.

I think we need some data measurements over several large (>500k LoC) C and C++ code bases to see how frequently this check yields false positives.

So I ran it on llvm. It mainly complained about areas where sizeof is used for building traits. I should add an extra check that the function is not overloaded.

It did complain once about sizeof in a detection idiom(in BitmaskEnum.h) and once with sizeof(getVersion()), like you mentioned.

Did it report any true positives that would need correcting?

LLVM is a good start, but hardly representative. Can you check some other large repos (both C++ and C), such as: Qt, cocos2d, rethinkdb, redis, and php-src? Did they produce true positives? How was the false positive rate?

Again, that only works for C++ and not C.

Typedef has always worked in C.

Did it report any true positives that would need correcting?

Not for LLVM, but it has in other projects like I mentioned.

Can you check some other large repos (both C++ and C), such as: Qt, cocos2d, rethinkdb, redis, and php-src?

None of those projects use cmake, so it doesn't look easy to run clang-tidy on them. Do you have a script that will run clang-tidy on these repos?

Actually, PVS-Studio has a more general check for sizeof(expr):

https://www.viva64.com/en/examples/v568/

It shows an example of FCEUX doing sizeof(buf - 1), Asterisk doing sizeof(data[0] * 2) and ReactOS doing sizeof(UnknownError [0] - 20). I think it might be a good idea to expand this from just a function call to any integer expression.

Again, that only works for C++ and not C.

Typedef has always worked in C.

This is true.

I think what I'm struggling to say is: I don't think this is a common pattern in either C or C++. It's also unnecessary because you can avoid repeating the type by just using sizeof on the function call result.

Did it report any true positives that would need correcting?

Not for LLVM, but it has in other projects like I mentioned.

Can you check some other large repos (both C++ and C), such as: Qt, cocos2d, rethinkdb, redis, and php-src?

None of those projects use cmake, so it doesn't look easy to run clang-tidy on them. Do you have a script that will run clang-tidy on these repos?

I don't have a script for it. I've used "bear" with at least some of those projects because they use makefiles rather than cmake (https://github.com/rizsotto/Bear). I'm not tied to those projects specifically, either, so if you have a different corpus you'd prefer to use, that's fine. The important bit is testing it across a considerable amount of C code and C++ code to see whether this diagnostic is too chatty or not.

Actually, PVS-Studio has a more general check for sizeof(expr):

https://www.viva64.com/en/examples/v568/

It shows an example of FCEUX doing sizeof(buf - 1), Asterisk doing sizeof(data[0] * 2) and ReactOS doing sizeof(UnknownError [0] - 20). I think it might be a good idea to expand this from just a function call to any integer expression.

That won't catch many (most?) of the issues demonstrated by PVS-Studio; the rule their check follows are to warn on side-effecting operations (which Clang already does with -Wunevaluated-expression) and arithmetic expressions in sizeof. The latter might be a reasonable extension to the check -- I have less concerns about the false positive rate with that than I do with the function call case. Another possible extension to consider is sizeof(sizeof(foo)), which it seems PVS-Studio will diagnose as well.

I don't have a script for it. I've used "bear" with at least some of those projects because they use makefiles rather than cmake (https://github.com/rizsotto/Bear). I'm not tied to those projects specifically, either, so if you have a different corpus you'd prefer to use, that's fine. The important bit is testing it across a considerable amount of C code and C++ code to see whether this diagnostic is too chatty or not.

So I did a grep over these codebases(with grep -E '\bsizeof\([^()"*]+\([^()]*\)'). Most of them are macros which access elements(ie no function call) or are used in type traits. The only false positive I saw was here:

https://github.com/rethinkdb/rethinkdb/blob/v2.3.x/external/re2_20140111/re2/prog.cc#L317

So I dont think it will be too chatty.

That won't catch many (most?) of the issues demonstrated by PVS-Studio; the rule their check follows are to warn on side-effecting operations (which Clang already does with -Wunevaluated-expression) and arithmetic expressions in sizeof.

It finds function calls as well. I tried on MIOpen and it caught the errors like I mentioned earlier here:

https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/src/convolution.cpp#L184

pfultz2 updated this revision to Diff 137822.Mar 9 2018, 1:22 PM
pfultz2 marked an inline comment as done.Mar 9 2018, 4:18 PM
pfultz2 marked an inline comment as done.Mar 11 2018, 2:57 PM

So I've updated the documentation on this. Are there any other updates needed for this to get it merged in?

As this patch can catch some mistakes, I'm fine with checking it in. I agree that it is reasonable to write normal code like sizeof(func_call()) (not false positive), maybe set the option to false by default?

As this patch can catch some mistakes, I'm fine with checking it in. I agree that it is reasonable to write normal code like sizeof(func_call()) (not false positive), maybe set the option to false by default?

I am okay with requiring people to enable this part of the check explicitly.

clang-tidy/misc/SizeofExpressionCheck.cpp
220 ↗(On Diff #137499)

Missing a terminating quote around sizeof.

docs/clang-tidy/checks/misc-sizeof-expression.rst
28 ↗(On Diff #137822)

It's not obvious what "the type" refers to in this sentence. Also, it is incorrect to state that "A common mistake is to query the size of an integer" -- that's an incredibly common practice. You may need a bit more setup in the explanation to give the reader context.

pfultz2 updated this revision to Diff 138791.Mar 16 2018, 3:44 PM
pfultz2 marked an inline comment as done.
pfultz2 marked an inline comment as done.

I have reworded the documentation. Hopefully it is clearer.

As this patch can catch some mistakes, I'm fine with checking it in. I agree that it is reasonable to write normal code like sizeof(func_call()) (not false positive), maybe set the option to false by default?

I have disabled it by default. We can decide later if we want it to enabled by default.

aaron.ballman added inline comments.Mar 16 2018, 4:48 PM
test/clang-tidy/misc-sizeof-expression.cpp
17 ↗(On Diff #138791)

Can you add a C++11 test case using enum class -- I am worried that the isInteger() matcher will return false for that type.

Also, I think you should have a test case for bool; I think the check will diagnose bool types but I'm not certain there's value in diagnosing those. What do you think?

alexfh accepted this revision.Mar 21 2018, 7:53 AM

LG modulo other reviewers' open comments.

This revision is now accepted and ready to land.Mar 21 2018, 7:53 AM
pfultz2 updated this revision to Diff 139649.Mar 23 2018, 2:01 PM
pfultz2 marked an inline comment as done.

So, I've added tests for class enums and bols.

alexfh accepted this revision.Mar 26 2018, 8:46 AM

LG. Do you need someone to submit this for you?

test/clang-tidy/misc-sizeof-expression.cpp
17 ↗(On Diff #138791)

I think the check will diagnose bool types but I'm not certain there's value in diagnosing those

It will be easier to decide once someone finds a few examples of those in real life code.

Do you need someone to submit this for you?

Yes

pfultz2 marked an inline comment as done.Mar 26 2018, 8:52 PM

Is someone able to merge in my changes?

hokein added a comment.Apr 3 2018, 1:27 AM

Sorry for the delay -- everyone was out because of the long Easter weekend. I'll commit for you.

hokein added a comment.Apr 3 2018, 2:16 AM

pfultz2@, could you rebase this patch? The check has been moved to bugprone.

pfultz2 updated this revision to Diff 140790.Apr 3 2018, 7:54 AM

I have rebased.

This revision was automatically updated to reflect the committed changes.
hokein added a comment.Apr 3 2018, 8:13 AM

Thanks, I have committed for you.