This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Retrieve a value from list initialization of constant array declaration in a global scope.
ClosedPublic

Authored by ASDenysPetrov on Jun 15 2021, 3:32 AM.

Details

Summary

Fix the point that we didn't take into account array's dimension. Retrieve a value of global constant array by iterating through its initializer list.
Example:

const int arr[4] = {1, 2};
const int *ptr = arr;
int x0 = ptr[0]; // 1
int x1 = ptr[1]; // 2
int x2 = ptr[2]; // 0
int x3 = ptr[3]; // 0
int x4 = ptr[4]; // UB

Fixes: https://bugs.llvm.org/show_bug.cgi?id=50604

TODO: Support multidimensional arrays as well:

const int arr[4][2] = {1, 2};
const int *ptr = arr[0];
int x0 = ptr[0]; // 1
int x1 = ptr[1]; // 2
int x2 = ptr[2]; // UB
int x3 = ptr[3]; // UB
int x4 = ptr[4]; // UB

Diff Detail

Event Timeline

ASDenysPetrov created this revision.Jun 15 2021, 3:32 AM
ASDenysPetrov requested review of this revision.Jun 15 2021, 3:32 AM

I'm not sure about whether or not this patch would only work for constant arrays with initializer lists. If it does only work for such arrays, then I wonder whether the fix is broad enough -- I haven't tested (yet), but I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings about garbage/undefined values. I'll try it tomorrow morning...

I'm not sure about whether or not this patch would only work for constant arrays with initializer lists. If it does only work for such arrays, then I wonder whether the fix is broad enough -- I haven't tested (yet), but I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings about garbage/undefined values. I'll try it tomorrow morning...

I tested with an unpatched build using a reproducer without an initializer list, and didn't get the spurious warnings -- so your approach seems correct to me now. I've also tested your patch and I believe it gives correct behavior.

clang/lib/AST/Type.cpp
149
clang/include/clang/AST/Expr.h
4971

I think in most (all?) other methods in this class, array indices are unsigned in the API. If the array index itself comes from an expression that is negative (i.e., a literal negative integer, or an constant expression that evaluates to a negative number), that has to be handled correctly, but I'm not sure this is the right place to do it. As this code stands, if an integer literal used used, which is greater than LONG_MAX, but less than ULONG_MAX, it will be end up being treated as invalid in this method, won't it?

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1670–1679

I see where you got the int64_t from -- that's what getSExtValue() returns. So, if the literal const index value in the expr is greater than LONG_MAX (but less than ULONG_MAX, of course), this would assert. That seems undesirable....

I've looked this over and tested it locally, and I'm pretty sure it's a good patch. If it were solely up to me, I'd accept this patch as-is. I don't think I should assume I have enough experience in this area though... @NoQ , could you take a look over this, and accept it if you think it's safe and reasonable?

@chrish_ericsson_atx

Sorry for the late reply. Thank you for reveiwing this.

I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings

Could you please provide some test cases that you think will uncover other issues. I'll add them to the test set.

I also have to mention one point of what this patch do more. Consider next:

int const arr[2][2] = {{1, 2}, {3, 4}}; // global space
int const *ptr = &arr[0][0];
ptr[3]; // represented as ConcreteInt(0) 
arr[1][1]; // represented as reg_$0<int Element{Element{arr,1 S64b,int [2]},1 S64b,int}>

As you can see, now the access through the raw pointer is more presice as through the multi-level indexing. I didn't want to synchronyze those retrieved values both to be reg_$0. I've seen a way to handle it more sophisticatedly.
I'm gonna do the same for the multi-level indexing (aka arr[1][2][3]).

clang/lib/AST/Type.cpp
149

+1

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1670–1679

That's a great catch! I'll make changes soon.

I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings

Could you please provide some test cases that you think will uncover other issues. I'll add them to the test set.

I tested locally with this patch and found that my guess was incorrect-- I couldn't trigger the incorrect behavior without an initializer list. So I think your code and testcases are good as they are!

I also have to mention one point of what this patch do more. Consider next:

int const arr[2][2] = {{1, 2}, {3, 4}}; // global space
int const *ptr = &arr[0][0];
ptr[3]; // represented as ConcreteInt(0) 
arr[1][1]; // represented as reg_$0<int Element{Element{arr,1 S64b,int [2]},1 S64b,int}>

As you can see, now the access through the raw pointer is more presice as through the multi-level indexing. I didn't want to synchronyze those retrieved values both to be reg_$0. I've seen a way to handle it more sophisticatedly.
I'm gonna do the same for the multi-level indexing (aka arr[1][2][3]).

I don't understand -- probably I don't have enough experience with analyzer state dumps to know what I should find surprising or better in this example.

@chrish_ericsson_atx

I don't understand -- probably I don't have enough experience with analyzer state dumps to know what I should find surprising or better in this example.

Simply saying, now ptr[3] returns value 4 as expected, but arr[1][1] still returns unknown symbol.
Previously ptr[3] also returned unknown symbol, basically matched with what arr[1][1] returns.
After my patch ptr[3] started to return 'undefined' and therefore asserted.
I've made a fix which improved the behavior for ptr[3] but remained arr[1][1] as is.

chrish_ericsson_atx added a comment.EditedJun 25 2021, 1:00 PM

While this patch resolves the issue captured in https://bugs.llvm.org/show_bug.cgi?id=50604, it actually introduces a *new* bug. Perhaps this is what you were alluding to? Here's a reproducer which doesn't fail on main (with or without the problematic b30521c28a4d commit), but it *does* fail with this proposed patch:

eahcmrh@seroius03977[21:50][repo/eahcmrh/ltebb]$ cat ~/pr50604-newbug.c 
static float const dt[12] =
{
  0.0000, 0.0235, 0.0470, 0.0706, 0.0941, 0.1176,
  0.1411, 0.1646, 0.1881, 0.2117, 0.2352, 0.2587
};
void foo(float s)
{
  (void)( s + dt[0]) ;
}
eahcmrh@seroius03977[21:57][repo/eahcmrh/ltebb]$ /proj/bbi/eahcmrh/arcpatch-D104285/compiler-clang/bin/clang -Xanalyzer -analyzer-werror --analyze ~/pr50604-newbug.c 
/home/eahcmrh/pr50604-newbug.c:8:13: error: The right operand of '+' is a garbage value [core.UndefinedBinaryOperatorResult]
  (void)( s + dt[0]) ;
            ^ ~~~~~
1 error generated.

I'll upload this reproducer to the bug report as well.

To be clear, neither this new reproducer nor the one I originally posted fail if commit b30521c28a4d is reverted. Is it worth considering reverting that commit until a patch that addresses the original problem and doesn't introduce these new regressions is available?

@chrish_ericsson_atx
Thanks for the new test case. I'll handle it ASAP.

To be clear, neither this new reproducer nor the one I originally posted fail if commit b30521c28a4d is reverted. Is it worth considering reverting that commit until a patch that addresses the original problem and doesn't introduce these new regressions is available?

I don't think we should revert b30521c28a4d because it corrects symbol representation in CSA and fixes two bugs: https://bugs.llvm.org/show_bug.cgi?id=37503 and https://bugs.llvm.org/show_bug.cgi?id=49007. Another point of non-revert is that your cases were previously hidden in CSA core and that's good to find them. I'm afraid it's a dubious idea to return back old bugs in favor of not seeing new ones.

I don't think we should revert b30521c28a4d because it corrects symbol representation in CSA and fixes two bugs: https://bugs.llvm.org/show_bug.cgi?id=37503 and https://bugs.llvm.org/show_bug.cgi?id=49007. Another point of non-revert is that your cases were previously hidden in CSA core and that's good to find them. I'm afraid it's a dubious idea to return back old bugs in favor of not seeing new ones.

Valid point. Thanks for considering the question. I agree it's better to move forward and get this patch working. :) I just wish I had the bandwidth to help more...

Fixed a case mentioned by @chrish_ericsson_atx. Added the cases to the common bunch.

@chrish_ericsson_atx
OK. I think I found the issue. Could you please check whether it works for you?

Fixed concern about index type being either int64_t or uint64_t.

I like the idea and I think this is a valuable patch. However, because of the changes under lib/AST we need to add other reviewers who are responsible for those parts (e.g. aaronballman or rsmith). Is there really no way to workaround those changes? E.g. could we have a free function outside of the InitListExpr to implement getExprForConstArrayByRawIndex?

ASDenysPetrov retitled this revision from [analyzer] Retrieve value by direct index from list initialization of constant array declaration. to [analyzer][AST] Retrieve value by direct index from list initialization of constant array declaration..
ASDenysPetrov added a comment.EditedJul 29 2021, 3:32 PM

@martong
I've added new reviewers, thanks for the prompt.

E.g. could we have a free function outside of the InitListExpr to implement getExprForConstArrayByRawIndex

It is possible, but I think this is more natural for the instance of InitListExpr to be responsible for such traversion.

aaron.ballman added inline comments.Aug 5 2021, 6:13 AM
clang/include/clang/AST/Expr.h
4959
4970

i cannot be < 0 because the index here is unsigned anyway.

4973–4975

I don't think this overload adds enough value -- the indexes are naturally unsigned, and the caller should validate the behavior if the source expression is signed.

clang/lib/AST/Expr.cpp
2354–2358

Hmm, generally speaking, you should not cast an arbitrary type to an array type because that won't do the correct thing for qualifiers. Instead, you'd usually use ASTContext::getAsConstantArrayType() to get the correct type. However, because you're just getting the array extent, I don't believe that can be impacted. However, isa followed by cast is a code smell, and that should at least be using a dyn_cast.

@rsmith, do you have thoughts on this?

@aaron.ballman Thanks for the review and comments. I'll update it ASAP.

clang/include/clang/AST/Expr.h
4959

+1

4970

Aha, I see.

4973–4975

Sounds reasonable.

clang/lib/AST/Expr.cpp
2354–2358

I'll rewrite this part.

Changed according to comments.

Fixed smell code: from isa'n'cast to dyn-cast.

aaron.ballman added a comment.EditedAug 13 2021, 4:30 AM

One thing I think is worth asking in this thread is whether what you're analyzing is undefined behavior?

Array subscripting is defined in terms of pointer addition per: http://eel.is/c++draft/expr.sub#1
Pointer addition has a special behavior for arrays: http://eel.is/c++draft/expr.add#4 ("Otherwise, if P points to an array element i of an array object x with n elements ... Otherwise, the behavior is undefined.")

I am pretty sure that at least in C++, treating a multidimensional array as a single dimensional array is UB (depending on the index used and the declaration of the array) because of the strength of the type system around arrays. And when you turn some of these examples into constant expressions, we reject them based on the bounds. e.g., https://godbolt.org/z/nYPcY14a8

MTC added a subscriber: MTC.Aug 13 2021, 4:33 AM

One thing I think is worth asking in this thread is whether what you're analyzing is undefined behavior?

Technically you are right. Every exit out of an array extent is UB according to the Standard.
But in practice we can rely on the fact that multidimensional arrays have a continuous layout in memory on stack.
Also every compiler treats int[2][2] and int** differently. E.g.:

int arr[6][7];
arr[2][3]; // *(arr + (2*7 + 3)) = *(arr + 17)

int *ptr = arr;
ptr[17]; //  *(arr + 17)

int **ptr;
ptr[2][3] // *(*(ptr + 2) + 3)

Many engineers expoit this fact and treat multidimensional arrays on stack through a raw pointer ((int*)arr). We can foresee their intentions and treat a multidimensional array as a single one instead of a warning about UB.

And when you turn some of these examples into constant expressions, we reject them based on the bounds. e.g., https://godbolt.org/z/nYPcY14a8

Yes, when we use expicit constants there we can catch such a warning, because AST parser can timely recognize the issue. The parser is not smart enough to treat variables. Static Analyzer is in charge of this and executes after the parser. I think AST parser shall also ignore the Standard in this particular case with an eye on a real use cases and developers' intentions. As you can see there is a bit modified version which doesn't emit the warning https://godbolt.org/z/Mdhhe6Eo9.

One thing I think is worth asking in this thread is whether what you're analyzing is undefined behavior?

Technically you are right. Every exit out of an array extent is UB according to the Standard.

At least in C++; I'd have to double-check for C.

But in practice we can rely on the fact that multidimensional arrays have a continuous layout in memory on stack.

"But in practice we can rely on <this UB behavior>" is a very dangerous assumption for users to make, and I think we shouldn't codify that in the design of the static analyzer. We might be able to make that guarantee for *clang*, but we can't make that guarantee for all implementations. One of the big uses of a static analyzer is with pointing out UB due to portability concerns.

Also every compiler treats int[2][2] and int** differently. E.g.:

int arr[6][7];
arr[2][3]; // *(arr + (2*7 + 3)) = *(arr + 17)

int *ptr = arr;
ptr[17]; //  *(arr + 17)

int **ptr;
ptr[2][3] // *(*(ptr + 2) + 3)

Many engineers expoit this fact and treat multidimensional arrays on stack through a raw pointer ((int*)arr). We can foresee their intentions and treat a multidimensional array as a single one instead of a warning about UB.

We can do that only if we're convinced that's a sound static analysis (and I'm not convinced). Optimizers can optimize based on the inference that code must be UB free, so I worry that there are optimization situations where the analyzer will fail to warn the user because we're assuming this is safe based purely on memory layout. However, things like TBAA, vectorization, etc may have a different analysis than strictly the memory layout.

And when you turn some of these examples into constant expressions, we reject them based on the bounds. e.g., https://godbolt.org/z/nYPcY14a8

Yes, when we use expicit constants there we can catch such a warning, because AST parser can timely recognize the issue. The parser is not smart enough to treat variables. Static Analyzer is in charge of this and executes after the parser.

I'm aware.

I think AST parser shall also ignore the Standard in this particular case with an eye on a real use cases and developers' intentions.

It would be a bug in Clang to do so; the standard requires a diagnostic if a constant evaluation cannot be performed due to UB: http://eel.is/c++draft/expr.const#5.7

As you can see there is a bit modified version which doesn't emit the warning https://godbolt.org/z/Mdhhe6Eo9.

Correct; I would not expect Clang to diagnose that because it doesn't require constant evaluation. I was pointing out the constexpr diagnostics because it demonstrates that this code has undefined behavior and you're modelling it as though the behavior were concretely defined.

@aaron.ballman
Ok, I got your concerns. As I can see we shall only reason about objects within the bounds. Otherwise, we shall return UndefinedVal.
E.g.:

int arr[2][5];
int* ptr1= (int*)arr; // Valid indexing for `ptr` is in range [0,4].
int* ptr2 = &arr[0][0]; // Same as above.
ptr1[4]; // Valid object.
ptr2[5]; // Out of bound. UB. UndefinedVal.

Would it be correct?

@aaron.ballman
Ok, I got your concerns.

Thanks for sticking with me!

As I can see we shall only reason about objects within the bounds. Otherwise, we shall return UndefinedVal.
E.g.:

int arr[2][5];
int* ptr1= (int*)arr; // Valid indexing for `ptr` is in range [0,4].
int* ptr2 = &arr[0][0]; // Same as above.
ptr1[4]; // Valid object.
ptr2[5]; // Out of bound. UB. UndefinedVal.

Would it be correct?

I believe so, yes (with a caveat below). I also believe this holds (reversing the pointer bases):

ptr2[4]; // valid object
ptr1[5]; // out of bounds

I've been staring at the C standard for a while, and I think the situation is also UB in C. As with C++, the array subscript operators are rewritten to be pointer arithmetic using addition (6.5.2.1p2). Additive operators says (6.5.6p9) in part: ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise the behavior is undefined. If the result points to one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated. I believe we run afoul of "the same array object" and "one past the last element" clauses because multidimensional arrays are defined to be arrays of arrays (6.7.6.2).

Complicating matters somewhat, I would also say that your use of [5] is not technically out of bounds, but is a one-past-the-end that's then dereferenced as part of the subscript rewriting. So it's technically fine to form the pointer to the one-past-the-end element, but it's not okay to dereference it. That matters for things like:

int arr[2][5] = {0};
const int* ptr2 = &arr[0][0];
const int* end = ptr2 + 5;

for (; ptr2 < end; ++ptr2) {
  int whatever = *ptr2;
}

where end is fine because it's never dereferenced. This distinction may matter to the static analyzer because a one-past-the-end pointer is valid for performing arithmetic on, but an out-of-bounds pointer is not.

@aaron.ballman
Let me speak some thoughts. Consider next:

int arr[2][5];
int *ptr1 = &arr[0][0];
int *ptr2 = &arr[1][0];

The Standard tells that ptr1[5] is UB and ptr2[0] is a valid object. In practice ptr1 and ptr2 usually are equal. But the Standard does not garantee them to be equal and this depends on a particular implementation. So we should rely on that there might be a compiler such that creates every subarray disjointed. I think this is an exact excerpt from what our arguing actually starts from.

@aaron.ballman
Let me speak some thoughts. Consider next:

int arr[2][5];
int *ptr1 = &arr[0][0];
int *ptr2 = &arr[1][0];

The Standard tells that ptr1[5] is UB and ptr2[0] is a valid object.

Agreed.

In practice ptr1 and ptr2 usually are equal.

Do you mean &ptr1[5] and &ptr2[0]? If so, I agree they are usually going to have the same pointer address at runtime.

But the Standard does not garantee them to be equal and this depends on a particular implementation. So we should rely on that there might be a compiler such that creates every subarray disjointed. I think this is an exact excerpt from what our arguing actually starts from.

I don't think that compilers will create a disjointed multidimensional array, as that would waste space at runtime. However, I do think that *optimizers* are getting much smarter about UB situations, saying "that can't happen", and basing decisions on it. For example, this touches on pointer provenance which is an open area of discussion in LLVM that's still being hammered out (it also relates to the C restrict keyword). In a provenance world, the pointer has more information than just its address; it also knows from where the pointer was derived, so you can tell (in the backend) that &ptr1[5] and &ptr2[0] point to *different* objects even if the pointer values are identical. So while the runtime layout of the array object may *allow* for these sort of type shenanigans with the most obvious implementation strategies for multidimensional arrays, the programming language's object model does not allow for them and optimizers may do unexpected things.

I don't think that compilers will create a disjointed multidimensional array, as that would waste space at runtime. However, I do think that *optimizers* are getting much smarter about UB situations, saying "that can't happen", and basing decisions on it. For example, this touches on pointer provenance which is an open area of discussion in LLVM that's still being hammered out (it also relates to the C restrict keyword). In a provenance world, the pointer has more information than just its address; it also knows from where the pointer was derived, so you can tell (in the backend) that &ptr1[5] and &ptr2[0] point to *different* objects even if the pointer values are identical. So while the runtime layout of the array object may *allow* for these sort of type shenanigans with the most obvious implementation strategies for multidimensional arrays, the programming language's object model does not allow for them and optimizers may do unexpected things.

This is really significant obstructions. As what I see the only thing left for us is to wait until the Standard transforms this shenanigans into legal operations and becomes closer to developers.

ASDenysPetrov added a comment.EditedAug 18 2021, 4:38 AM

@aaron.ballman
Now I'm going to rework this patch according to our disscussion. This is the first patch in the stack as you can see. And I don't want to lose the series of improvements so I will adjust it to save further patches.

This is really significant obstructions. As what I see the only thing left for us is to wait until the Standard transforms this shenanigans into legal operations and becomes closer to developers.

I don't know if either committee is considering weakening their type system rules in this area, but I'm certain the topic will come up in the WG14 Memory Object Model study group as they try to tighten up the C memory model.

@aaron.ballman
Now I'm going to rework this patch according to our disscussion. This is the first patch in the stack as you can see. And I don't want to lose the series of improvements so I will adjust it to save further patches.

Thank you, sorry this isn't quite the direction you were hoping to go in.

ASDenysPetrov edited the summary of this revision. (Show Details)

Reworked the patch according to the discussion and taking UB into account. Moved Expr::getExprForConstArrayByRawIndex to RegionStoreManager.

@ASDenysPetrov Denis, do you think it would make sense to handle the non-multi-dimensional cases first? I see that you have useful patches in the stack that depends on this change (i.e handling a StringLiteral or a CompoundLiteralExpr) but perhaps they would be meaningful even without solving the mult-array case here (?).

ASDenysPetrov added a comment.EditedAug 30 2021, 7:58 AM

@ASDenysPetrov Denis, do you think it would make sense to handle the non-multi-dimensional cases first? I see that you have useful patches in the stack that depends on this change (i.e handling a StringLiteral or a CompoundLiteralExpr) but perhaps they would be meaningful even without solving the mult-array case here (?).

I think you are right. I've been wandering the Standards for a week. I can't find the proof whether it is even a legal cast or not const int arr[1][2][3]; const int ptr* = (const int*)arr;. Descriptions about this are really unclear with poor examples.
I'll try to rewrite this patch for simple-dimensional arrays for a start.

ASDenysPetrov retitled this revision from [analyzer][AST] Retrieve value by direct index from list initialization of constant array declaration. to [analyzer] Retrieve a value from list initialization of constant array declaration in a global scope..
ASDenysPetrov edited the summary of this revision. (Show Details)

Changed the Title. Changed the Summary.

Now this patch supports only one-dimensional arrays. Previously there were a bug when didn't take into account array's dimension.

Some minor drive-by nits, but this looks sensible to me.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698
1700–1702
martong added inline comments.Sep 9 2021, 9:33 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698

+1 for Aaron's suggestion, but then it would be really helpful to have an explanatory comment. E.g.:

if (!isa<ConstantArrayType>(CAT->getElementType())) { // This is a one dimensional array.
1700

This static_cast seems to be dangerous to me, it might overflow. Can't we compare Idx directly to Extent? I see that Idx is an APSint and Extent is an APInt, but I think we should be able to handle the comparison on the APInt level b/c that takes care of the signedness. And the overflow situation should be handled as well properly with APInt, given from it's name "arbitrary precision int". In this sense I don't see why do we need I at all.

clang/test/Analysis/initialization.c
1 ↗(On Diff #370284)

I don't see how this change is related. How could the tests work before having the uninitialized.Assign enabled before?

ASDenysPetrov added inline comments.Sep 20 2021, 9:55 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698

I think that self-descriptive code is better than comments nearby. And it does not affect anything in terms of performance.

1700

We can't get rid of I because we use it below anyway in I >= InitList->getNumInits() and InitList->getInit(I).
I couldn't find any appropriate function to compare without additional checking for signedness or bit-width adjusting.
I'll try to improve this snippet.

clang/test/Analysis/initialization.c
1 ↗(On Diff #370284)

I've added glob_invalid_index1 and glob_invalid_index2 functions which were not there before.
core.uninitialized.Assign produces warnings for them.

ASDenysPetrov added inline comments.Sep 21 2021, 3:56 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1700

This is not dangerous because we check for negatives separately in Idx < 0, so we can be sure that I is positive while I >= Extent. Unfortunately, I didn't find any suitable way to compare APSint of unknown sign and bitwidth with signless APInt without adding another checks for sign and bitwidth conversions. In this way I prefer the currect condition I >= Extent.

aaron.ballman added inline comments.Sep 21 2021, 5:19 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698

FWIW, I found the condensed form more readable (I don't have to wonder what's going to care about that variable later in the function with lots of nesting).

ASDenysPetrov added inline comments.Sep 21 2021, 6:34 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698

I see what you mean. That's fair in terms of variables caring. I think it's just the other hand of the approach. I don't have strong preferences here, it's just my personal flavor because it doesn't need to introspect the expression to undersand what it means. But I'll make an update using your proposition.

aaron.ballman added inline comments.Sep 21 2021, 6:35 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698

FWIW, my preference is weak as well -- the code is readable either way. :-)

martong added inline comments.Sep 21 2021, 7:28 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1700

I think it would be possible to use bool llvm::APInt::uge that does an Unsigned greater or equal comparison. Or you could use sge for the signed comparison. Also, both have overload that take another APInt as parameter.

ASDenysPetrov added inline comments.Sep 22 2021, 1:55 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1700

I considered them. First of all choosing between uge and sge we additionally need to check for signedness. Moreover, these functions require bitwidth to be equal. Thus we need additional checks and transformations. I found this approach too verbose. Mine one seems to me simpler and works under natural rules of comparison.

martong accepted this revision.Sep 22 2021, 5:05 AM

LGTM!

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1700

Okay, thanks for thinking it through and answering my concerns!

1700–1701

Do you think it would make sense to assert(CAT->getSize().isSigned())?

This revision is now accepted and ready to land.Sep 22 2021, 5:05 AM
ASDenysPetrov added inline comments.Sep 22 2021, 10:17 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1700–1701

getSize return APInt which is signless and has no isSigned method. But we know that an array extent shall be of type std​::​size_­t (http://eel.is/c++draft/dcl.array#1) which is unsigned (http://eel.is/c++draft/support.types.layout#3). So we can confidently get the size with getZExtValue.

@martong
Thank you for your time!

@martong
BTW, this patch is the first one in the stack. There are also D107339 and D108032. You could also express your opinion there.

@martong
BTW, this patch is the first one in the stack. There are also D107339 and D108032. You could also express your opinion there.

Okay, I am going to have a look, in the meanwhile let's land this first and see if the build bots are happy.

This revision was landed with ongoing or failed builds.Sep 24 2021, 2:38 AM
This revision was automatically updated to reflect the committed changes.
NoQ added a comment.Oct 5 2021, 4:09 PM

Hey, I brought you some regressions!

const int arr[];

const int arr[3] = {1, 2, 3};

void foo() {
  if (arr[0] < 3) {
  }
}
test.c:6:14: warning: The left operand of '<' is a garbage value [core.UndefinedBinaryOperatorResult]
  if (arr[0] < 3) {
      ~~~~~~ ^
1 warning generated.

According to the -ast-dump these are redeclarations of the same variable:

|-VarDecl 0x7fd1ed8844e0 <test.c:1:1, col:11> col:7 used arr 'const int []'
|-VarDecl 0x7fd1ed884670 prev 0x7fd1ed8844e0 <line:3:1, col:24> col:7 used arr 'const int [3]' cinit
| `-InitListExpr 0x7fd1ed8847a0 <col:16, col:24> 'const int [3]'
|   |-IntegerLiteral 0x7fd1ed8846d8 <col:17> 'int' 1
|   |-IntegerLiteral 0x7fd1ed8846f8 <col:20> 'int' 2
|   `-IntegerLiteral 0x7fd1ed884718 <col:23> 'int' 3

So I suspect that you need to pick the redeclaration with the initializer before invoking the new machinery.

Good catch Artem, thanks for the report!

Maybe a single line change could solve this?

const VarDecl *VD = VR->getDecl()->getCanonicalDecl();
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1663

const VarDecl *VD = VR->getDecl()->getCanonicalDecl();

Hey, I brought you some regressions!

! In D104285#3044804, @martong wrote:

const VarDecl *VD = VR->getDecl()->getCanonicalDecl();

Wow, nice! I've been recently working on some improvements in this scope. I'll take this into account and present a fix soon.