This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][VLASizeChecker] Check for VLA size overflow.
ClosedPublic

Authored by balazske on May 4 2020, 7:58 AM.

Details

Summary

Variable-length array (VLA) should have a size that fits into
a size_t value. According to the standard: "std::size_t can
store the maximum size of a theoretically possible object of
any type (including array)" (this is applied to C too).

The size expression is evaluated at the definition of the
VLA type even if this is a typedef.
The evaluation of the size expression in itself might cause
problems if it overflows.

Diff Detail

Event Timeline

balazske created this revision.May 4 2020, 7:58 AM
martong added inline comments.May 4 2020, 8:27 AM
clang/test/Analysis/vla.c
107 ↗(On Diff #261822)

I think we could make the arithmetic more clear here:
x = BIGINDEX 65536 (2^16) and char[x][x][x][x] would be the first to overflow.
And char[x][x][x][x-1] should not overflow.

And if we are at it, then size_t's range is target dependent, so I think we must extend the RUN line with -target.

Question: Is there a way to check for overflow in the symbolic domain (that is the ArrSize value)? Or get the maximal possible value of a SVal if it is constrained? (The ArrSize that is a multiplication of every dimension size can not be checked simply for too big value symbolically because calculation overflows are not detected.)

Would it be possible to split this patch into two?

  1. The refactoring part where you move code out from checkPreStmt to checkVLA. This should be an NFC.
  2. Handling of the overflow.

Would be much cleaner I guess.

balazske added a comment.EditedMay 8 2020, 12:05 AM

I do not know if it would me much cleaner.

  • First part: Move calculation of ArraySize into checkVLA and rename checkVLASize to checkVLAIndexSize.
  • Second part: Add the check for total array size to checkVLA.

The first part in itself does not make sense necessarily until we see why this change is needed (this is, because the total size is checked and that calculation needs the same data and the total size should be checked both places where checkVLA is used). The original code is not much better than after this "part 1" refactoring, computation of ArraySize is needed only once not at both places where checkVLAis called (but it is needed to implement "part 2").

gamesh411 accepted this revision.May 11 2020, 12:41 AM

With the minor adjustment in the one test case this LGTM.

clang/test/Analysis/vla.c
107 ↗(On Diff #261822)

I would also find the extra comment and the extra passing test case helpful.

This revision is now accepted and ready to land.May 11 2020, 12:41 AM
balazske updated this revision to Diff 263151.May 11 2020, 5:37 AM

Added target dependent tests.

Szelethus added inline comments.May 13 2020, 7:06 AM
clang/test/Analysis/vla-overflow.c
11

Let's not trim be checker name here.

Also, we could mention what the specific size is.

Szelethus requested changes to this revision.EditedMay 13 2020, 7:48 AM

Variable-length array (VLA) should have a size that fits into a size_t value. At least if the size is queried with sizeof, but it is better (and more simple) to check it always

So creating VLA larger than sizeof(size_t) isn't a bug, bur rather a sign of code smell? Then we shouldn't create a fatal error node for it, unless we're trying to fit it in a variable that isn't sufficiently large. The fact that sizeofing it is a bug wasn't immediately obvious to me either, so a quote from the standard as comments would be appreciated:

§6.5.3.4.4, about operator sizeof: The value of the result is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).

This revision now requires changes to proceed.May 13 2020, 7:48 AM

I do not know if it would me much cleaner.

  • First part: Move calculation of ArraySize into checkVLA and rename checkVLASize to checkVLAIndexSize.
  • Second part: Add the check for total array size to checkVLA.

The first part in itself does not make sense necessarily until we see why this change is needed (this is, because the total size is checked and that calculation needs the same data and the total size should be checked both places where checkVLA is used). The original code is not much better than after this "part 1" refactoring, computation of ArraySize is needed only once not at both places where checkVLAis called (but it is needed to implement "part 2").

Alright, makes sense, so let's just disregard the refactoring comment.

martong accepted this revision.May 13 2020, 10:11 AM

Variable-length array (VLA) should have a size that fits into a size_t value. At least if the size is queried with sizeof, but it is better (and more simple) to check it always

So creating VLA larger than sizeof(size_t) isn't a bug, bur rather a sign of code smell? Then we shouldn't create a fatal error node for it, unless we're trying to fit it in a variable that isn't sufficiently large. The fact that sizeofing it is a bug wasn't immediately obvious to me either, so a quote from the standard as comments would be appreciated:

§6.5.3.4.4, about operator sizeof: The value of the result is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).

I am not sure if I can follow your concern here.
sizeof(size_t) is typically 8, so that is not a bug, neither a code smell to have char VLA[sizeof(size_t)];. The problem is when the size is bigger than the maximum value of size_t, that ix 0xff...ff, as we can see that in the new tests.
Besides, not having the size printed out in the warning is not a blocker for me, this looks good enough.

clang/test/Analysis/vla-overflow.c
11

Yes, that's a good idea to print the actual size of the VLA when we have that info. But I think we cannot just print that when it overflows! :D We can, however, print the maximum allowed value in case of the overflow.

Szelethus accepted this revision.May 13 2020, 11:11 AM

I am not sure if I can follow your concern here.
sizeof(size_t) is typically 8, so that is not a bug, neither a code smell to have char VLA[sizeof(size_t)];. The problem is when the size is bigger than the maximum value of size_t, that ix 0xff...ff, as we can see that in the new tests.
Besides, not having the size printed out in the warning is not a blocker for me, this looks good enough.

Silly me. The size would be nice, but if we don't explain how we calculated that size, it wouldn't make the bug report much better.

This revision is now accepted and ready to land.May 13 2020, 11:11 AM

Variable-length array (VLA) should have a size that fits into a size_t value. At least if the size is queried with sizeof, but it is better (and more simple) to check it always

So creating VLA larger than sizeof(size_t) isn't a bug, bur rather a sign of code smell? Then we shouldn't create a fatal error node for it, unless we're trying to fit it in a variable that isn't sufficiently large. The fact that sizeofing it is a bug wasn't immediately obvious to me either, so a quote from the standard as comments would be appreciated:

§6.5.3.4.4, about operator sizeof: The value of the result is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).

I was looking at CERT ARR32-C "Ensure size arguments for variable length arrays are in a valid range". The VLA should not have a size that is larger than std::numeric_limits<size_t>::max(), in other words "fit into a size_t value", or not?

Yes creating the too large VLA in itself is not a bug, only when sizeof is called on it because it can not return the correct size. A non-fatal error is a better option, or delay the check until the sizeof call. But probably the create of such a big array in itself is sign of code smell. The array actually does not need to be created to make the problem happen, only a sizeof call on a typedef-ed and too large VLA. (What does mean that "result of sizeof is implementation-defined"? Probably it can return not the size in bytes or "chars" but something other? In such a case the checker would be not correct.)

Is it worth to improve the checker by emitting a warning only if a sizeof is used on a typedef-ed VLA type where the size is too large (and at a VLA declaration with size overflow)? Or can this be done in a later change?

whisperity added inline comments.
clang/test/Analysis/vla-overflow.c
9

While it's generally true nowadays, instead of a comment, perhaps turn this (sizeof(size_t) >= 8) into an assert, or a condition.

I was looking at CERT ARR32-C "Ensure size arguments for variable length arrays are in a valid range". The VLA should not have a size that is larger than std::numeric_limits<size_t>::max(), in other words "fit into a size_t value", or not?

Yes creating the too large VLA in itself is not a bug, only when sizeof is called on it because it can not return the correct size. A non-fatal error is a better option, or delay the check until the sizeof call. But probably the create of such a big array in itself is sign of code smell. The array actually does not need to be created to make the problem happen, only a sizeof call on a typedef-ed and too large VLA. (What does mean that "result of sizeof is implementation-defined"? Probably it can return not the size in bytes or "chars" but something other? In such a case the checker would be not correct.)

Is it worth to improve the checker by emitting a warning only if a sizeof is used on a typedef-ed VLA type where the size is too large (and at a VLA declaration with size overflow)? Or can this be done in a later change?

The result of sizeof is implementation-defined because the size of types is implementation-defined in the first place (obeying a few other rules). This covers the part that the standard does not say that the result value will be 5 or 7 or 92. The type which sizeof() returns is a size_t, which in itself is an implementation-defined type, depending on target architecture). This is why the result of sizeof is implementation-defined.

However, the following is said about size_t:

std::size_t can store the maximum size of a theoretically possible object of any type (including array). A type whose size cannot be represented by std::size_t is ill-formed (since C++14)

(The first sentence applies for C, too.)

I would reason from this that in case we know there is a type which cannot be reasoned about with sizeof then this type is bad. This conforms to what ARR32-C says.
The problem is that if you have an overflown sized array, at the point of allocation, the code generated will use this overflown size, which opens the door for a plethora of vulnerabilities.

I think we should aim for making the project safe and proper by eliminating bad code, instead of hunting purely the word of the law. If the intention visible in the code can lead to the system blowing up, let's keep the warning there where it is easiest to fix.

balazske updated this revision to Diff 264572.May 18 2020, 3:15 AM

Rebase, added checker name to overflow test.

balazske edited the summary of this revision. (Show Details)May 18 2020, 11:20 PM
This revision was automatically updated to reflect the committed changes.
NoQ added a subscriber: NoQ.May 29 2020, 5:49 AM

https://bugs.llvm.org/show_bug.cgi?id=46128 looks like a crash caused by this patch.

Strange: Even if the assume for zero index size does not indicate problem (assume that index length is not zero) the getKnownValue returns 0 for it.