This is an archive of the discontinued LLVM Phabricator instance.

Differential D104285

[analyzer] Retrieve a value from list initialization of constant array declaration in a global scope.
ClosedPublic

Authored by ASDenysPetrov on Jun 15 2021, 3:32 AM.

Details

Reviewers

NoQ
xazax.hun
r.stahl
rsmith
Lekensteyn
aaron.ballman
steveire
vsavchenko
martong

Commits

rG98a95d4844ca: [analyzer] Retrieve a value from list initialization of constant array…

Summary

Fix the case where the array's dimension was not taken into account. Retrieve the value of a global constant array element by iterating through the array's initializer list.
Example:

const int arr[4] = {1, 2};
const int *ptr = arr;
int x0 = ptr[0]; // 1
int x1 = ptr[1]; // 2
int x2 = ptr[2]; // 0
int x3 = ptr[3]; // 0
int x4 = ptr[4]; // UB

Fixes: https://bugs.llvm.org/show_bug.cgi?id=50604

TODO: Support multidimensional arrays as well:

const int arr[4][2] = {1, 2};
const int *ptr = arr[0];
int x0 = ptr[0]; // 1
int x1 = ptr[1]; // 2
int x2 = ptr[2]; // UB
int x3 = ptr[3]; // UB
int x4 = ptr[4]; // UB
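
To make the one-dimensional rule from the Summary concrete, here is a minimal sketch in plain C++ that models the lookup outside of Clang. It is not the patch code: `lookupConstArray` and its parameters are hypothetical names chosen for illustration, and `std::nullopt` merely stands in for the analyzer's undefined value.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: read element `index` of a constant array declared with
// `extent` elements and initialized from `inits` (inits.size() <= extent).
// std::nullopt plays the role of an undefined (out-of-bounds) read.
std::optional<int> lookupConstArray(const std::vector<int> &inits,
                                    std::size_t extent, long long index) {
  if (index < 0)
    return std::nullopt; // negative index: outside the array
  const auto i = static_cast<unsigned long long>(index);
  if (i >= extent)
    return std::nullopt; // at or past the declared extent
  if (i >= inits.size())
    return 0; // elements without an explicit initializer are zero
  return inits[i];
}
```

With `inits = {1, 2}` and `extent = 4`, indices 0 through 3 yield 1, 2, 0, 0 and index 4 yields no value, matching the comments in the first example.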

Event Timeline

ASDenysPetrov created this revision.Jun 15 2021, 3:32 AM

Herald added subscribers: manas, steakhal, martong and 9 others. · Jun 15 2021, 3:32 AM

ASDenysPetrov requested review of this revision.Jun 15 2021, 3:32 AM

Herald added a subscriber: cfe-commits. · Jun 15 2021, 3:32 AM

Harbormaster completed remote builds in B109255: Diff 352052.Jun 15 2021, 4:16 AM

I'm not sure about whether or not this patch would only work for constant arrays with initializer lists. If it does only work for such arrays, then I wonder whether the fix is broad enough -- I haven't tested (yet), but I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings about garbage/undefined values. I'll try it tomorrow morning...

In D104285#2823305, @chrish_ericsson_atx wrote:

I'm not sure about whether or not this patch would only work for constant arrays with initializer lists. If it does only work for such arrays, then I wonder whether the fix is broad enough -- I haven't tested (yet), but I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings about garbage/undefined values. I'll try it tomorrow morning...

I tested with an unpatched build using a reproducer without an initializer list, and didn't get the spurious warnings -- so your approach seems correct to me now. I've also tested your patch and I believe it gives correct behavior.

clang/lib/AST/Type.cpp
149

chrish_ericsson_atx added inline comments.Jun 18 2021, 6:39 AM

clang/include/clang/AST/Expr.h
4971	I think in most (all?) other methods in this class, array indices are unsigned in the API. If the array index itself comes from an expression that is negative (i.e., a literal negative integer, or a constant expression that evaluates to a negative number), that has to be handled correctly, but I'm not sure this is the right place to do it. As this code stands, if an integer literal is used that is greater than LONG_MAX but less than ULONG_MAX, it will end up being treated as invalid in this method, won't it?
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1670–1679	I see where you got the int64_t from -- that's what getSExtValue() returns. So, if the literal const index value in the expr is greater than LONG_MAX (but less than ULONG_MAX, of course), this would assert. That seems undesirable....
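
The concern about the signed 64-bit index can be illustrated without LLVM. The sketch below is a plain C++ model, not the patch code: `fitsInSigned64` and `isInBoundsUnsigned` are hypothetical helpers. An index above `INT64_MAX` has no `int64_t` representation, so an API that takes the index as a signed 64-bit value cannot carry it faithfully, whereas an unsigned comparison against the extent handles it directly.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if `index` is representable as int64_t.
bool fitsInSigned64(std::uint64_t index) {
  return index <= static_cast<std::uint64_t>(
                      std::numeric_limits<std::int64_t>::max());
}

// Hypothetical bounds check that keeps the index unsigned end to end,
// so very large values are simply compared against the extent.
bool isInBoundsUnsigned(std::uint64_t index, std::uint64_t extent) {
  return index < extent;
}
```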

I've looked this over and tested it locally, and I'm pretty sure it's a good patch. If it were solely up to me, I'd accept this patch as-is. I don't think I should assume I have enough experience in this area though... @NoQ , could you take a look over this, and accept it if you think it's safe and reasonable?

@chrish_ericsson_atx

Sorry for the late reply. Thank you for reviewing this.

I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings

Could you please provide some test cases that you think will uncover other issues. I'll add them to the test set.

I should also mention one more thing that this patch does. Consider the following:

int const arr[2][2] = {{1, 2}, {3, 4}}; // global space
int const *ptr = &arr[0][0];
ptr[3]; // represented as ConcreteInt(0) 
arr[1][1]; // represented as reg_$0<int Element{Element{arr,1 S64b,int [2]},1 S64b,int}>

As you can see, access through the raw pointer is now more precise than access through multi-level indexing. I didn't want to synchronize the retrieved values so that both become reg_$0. I've found a more sophisticated way to handle it.
I'm going to do the same for multi-level indexing (e.g. arr[1][2][3]).

clang/lib/AST/Type.cpp
149	+1
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1670–1679	That's a great catch! I'll make changes soon.

In D104285#2836190, @ASDenysPetrov wrote:

I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings

Could you please provide some test cases that you think will uncover other issues. I'll add them to the test set.

I tested locally with this patch and found that my guess was incorrect-- I couldn't trigger the incorrect behavior without an initializer list. So I think your code and testcases are good as they are!

I should also mention one more thing that this patch does. Consider the following:
int const arr[2][2] = {{1, 2}, {3, 4}}; // global space
int const *ptr = &arr[0][0];
ptr[3]; // represented as ConcreteInt(0) 
arr[1][1]; // represented as reg_$0<int Element{Element{arr,1 S64b,int [2]},1 S64b,int}>
As you can see, access through the raw pointer is now more precise than access through multi-level indexing. I didn't want to synchronize the retrieved values so that both become reg_$0. I've found a more sophisticated way to handle it.
I'm going to do the same for multi-level indexing (e.g. arr[1][2][3]).

I don't understand -- probably I don't have enough experience with analyzer state dumps to know what I should find surprising or better in this example.

@chrish_ericsson_atx

I don't understand -- probably I don't have enough experience with analyzer state dumps to know what I should find surprising or better in this example.

Simply put, ptr[3] now returns the value 4 as expected, but arr[1][1] still returns an unknown symbol.
Previously ptr[3] also returned an unknown symbol, which matched what arr[1][1] returns.
After my patch ptr[3] started to return 'undefined' and therefore asserted.
I've made a fix that improves the behavior for ptr[3] but leaves arr[1][1] as is.

While this patch resolves the issue captured in https://bugs.llvm.org/show_bug.cgi?id=50604, it actually introduces a *new* bug. Perhaps this is what you were alluding to? Here's a reproducer which doesn't fail on main (with or without the problematic b30521c28a4d commit), but it *does* fail with this proposed patch:

eahcmrh@seroius03977[21:50][repo/eahcmrh/ltebb]$ cat ~/pr50604-newbug.c 
static float const dt[12] =
{
  0.0000, 0.0235, 0.0470, 0.0706, 0.0941, 0.1176,
  0.1411, 0.1646, 0.1881, 0.2117, 0.2352, 0.2587
};
void foo(float s)
{
  (void)( s + dt[0]) ;
}
eahcmrh@seroius03977[21:57][repo/eahcmrh/ltebb]$ /proj/bbi/eahcmrh/arcpatch-D104285/compiler-clang/bin/clang -Xanalyzer -analyzer-werror --analyze ~/pr50604-newbug.c 
/home/eahcmrh/pr50604-newbug.c:8:13: error: The right operand of '+' is a garbage value [core.UndefinedBinaryOperatorResult]
  (void)( s + dt[0]) ;
            ^ ~~~~~
1 error generated.

I'll upload this reproducer to the bug report as well.

To be clear, neither this new reproducer nor the one I originally posted fail if commit b30521c28a4d is reverted. Is it worth considering reverting that commit until a patch that addresses the original problem and doesn't introduce these new regressions is available?

@chrish_ericsson_atx
Thanks for the new test case. I'll handle it ASAP.

To be clear, neither this new reproducer nor the one I originally posted fail if commit b30521c28a4d is reverted. Is it worth considering reverting that commit until a patch that addresses the original problem and doesn't introduce these new regressions is available?

I don't think we should revert b30521c28a4d because it corrects symbol representation in CSA and fixes two bugs: https://bugs.llvm.org/show_bug.cgi?id=37503 and https://bugs.llvm.org/show_bug.cgi?id=49007. Another reason not to revert is that your cases were previously hidden in the CSA core, and it's good that we have found them. I'm afraid it's a dubious idea to bring back old bugs in order to avoid seeing new ones.

In D104285#2847547, @ASDenysPetrov wrote:

I don't think we should revert b30521c28a4d because it corrects symbol representation in CSA and fixes two bugs: https://bugs.llvm.org/show_bug.cgi?id=37503 and https://bugs.llvm.org/show_bug.cgi?id=49007. Another reason not to revert is that your cases were previously hidden in the CSA core, and it's good that we have found them. I'm afraid it's a dubious idea to bring back old bugs in order to avoid seeing new ones.

Valid point. Thanks for considering the question. I agree it's better to move forward and get this patch working. :) I just wish I had the bandwidth to help more...

Fixed a case mentioned by @chrish_ericsson_atx. Added the cases to the common bunch.

@chrish_ericsson_atx
OK. I think I found the issue. Could you please check whether it works for you?

Harbormaster completed remote builds in B114844: Diff 359768.Jul 19 2021, 7:36 AM

Fixed concern about index type being either int64_t or uint64_t.

Harbormaster completed remote builds in B114869: Diff 359797.Jul 19 2021, 12:47 PM

ASDenysPetrov mentioned this in D106681: [analyzer][NFCI] Move a block from `getBindingForElement` to separate functions.Jul 23 2021, 9:52 AM

ASDenysPetrov added a child revision: D106681: [analyzer][NFCI] Move a block from `getBindingForElement` to separate functions.Jul 23 2021, 2:40 PM

I like the idea and I think this is a valuable patch. However, because of the changes under lib/AST we need to add other reviewers who are responsible for those parts (e.g. aaronballman or rsmith). Is there really no way to work around those changes? E.g. could we have a free function outside of the InitListExpr to implement getExprForConstArrayByRawIndex?

ASDenysPetrov added reviewers: rsmith, Lekensteyn, aaron.ballman, steveire.Jul 29 2021, 3:24 PM

ASDenysPetrov retitled this revision from [analyzer] Retrieve value by direct index from list initialization of constant array declaration. to [analyzer][AST] Retrieve value by direct index from list initialization of constant array declaration..

@martong
I've added new reviewers, thanks for the prompt.

E.g. could we have a free function outside of the InitListExpr to implement getExprForConstArrayByRawIndex

It is possible, but I think it is more natural for the instance of InitListExpr to be responsible for such a traversal.

ASDenysPetrov added a reviewer: vsavchenko.Aug 3 2021, 6:04 AM

aaron.ballman added inline comments.Aug 5 2021, 6:13 AM

clang/include/clang/AST/Expr.h
4959
4970	`i` cannot be `< 0` because the index here is unsigned anyway.
4973–4975	I don't think this overload adds enough value -- the indexes are naturally unsigned, and the caller should validate the behavior if the source expression is signed.
clang/lib/AST/Expr.cpp
2354–2358	Hmm, generally speaking, you should not cast an arbitrary type to an array type because that won't do the correct thing for qualifiers. Instead, you'd usually use `ASTContext::getAsConstantArrayType()` to get the correct type. However, because you're just getting the array extent, I don't believe that can be impacted. However, `isa` followed by `cast` is a code smell, and that should at least be using a `dyn_cast`. @rsmith, do you have thoughts on this?

@aaron.ballman Thanks for the review and comments. I'll update it ASAP.

clang/include/clang/AST/Expr.h
4959	+1
4970	Aha, I see.
4973–4975	Sounds reasonable.
clang/lib/AST/Expr.cpp
2354–2358	I'll rewrite this part.

Changed according to comments.

Harbormaster completed remote builds in B119269: Diff 366009.Aug 12 2021, 9:38 AM

Fixed the code smell: replaced `isa` followed by `cast` with `dyn_cast`.

Harbormaster completed remote builds in B119407: Diff 366209.Aug 13 2021, 1:47 AM

One thing I think is worth asking in this thread is whether what you're analyzing is undefined behavior?

Array subscripting is defined in terms of pointer addition per: http://eel.is/c++draft/expr.sub#1
Pointer addition has a special behavior for arrays: http://eel.is/c++draft/expr.add#4 ("Otherwise, if P points to an array element i of an array object x with n elements ... Otherwise, the behavior is undefined.")

I am pretty sure that at least in C++, treating a multidimensional array as a single dimensional array is UB (depending on the index used and the declaration of the array) because of the strength of the type system around arrays. And when you turn some of these examples into constant expressions, we reject them based on the bounds. e.g., https://godbolt.org/z/nYPcY14a8

MTC added a subscriber: MTC.Aug 13 2021, 4:33 AM

In D104285#2943449, @aaron.ballman wrote:

One thing I think is worth asking in this thread is whether what you're analyzing is undefined behavior?

Technically you are right. Every access outside an array's extent is UB according to the Standard.
But in practice we can rely on the fact that multidimensional arrays have a contiguous layout in memory on the stack.
Also every compiler treats int[2][2] and int** differently. E.g.:

int arr[6][7];
arr[2][3]; // *(arr + (2*7 + 3)) = *(arr + 17)

int *ptr = (int *)arr;
ptr[17]; //  *(arr + 17)

int **pptr;
pptr[2][3]; // *(*(pptr + 2) + 3)

Many engineers exploit this fact and treat multidimensional arrays on the stack through a raw pointer ((int*)arr). We can foresee their intentions and treat a multidimensional array as a single one instead of warning about UB.
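
The arithmetic in the comment above (2*7 + 3 = 17) is the usual row-major offset. A small sketch follows, with `flatOffset` as a hypothetical name. It only computes the offset and does not dereference anything, so it says nothing about whether accessing a multidimensional array through a flat `int *` is well-defined.

```cpp
#include <cstddef>

// Row-major offset of element [row][col] in an array with `cols` columns.
// Only the offset is computed here; no memory is accessed.
constexpr std::size_t flatOffset(std::size_t row, std::size_t col,
                                 std::size_t cols) {
  return row * cols + col;
}

// Matches the example: arr[2][3] in `int arr[6][7]`.
static_assert(flatOffset(2, 3, 7) == 17, "row-major offset of arr[2][3]");
```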

And when you turn some of these examples into constant expressions, we reject them based on the bounds. e.g., https://godbolt.org/z/nYPcY14a8

Yes, when we use explicit constants there we can catch such a warning, because the AST parser can recognize the issue in time. The parser is not smart enough to reason about variables. The Static Analyzer is in charge of this and runs after the parser. I think the AST parser should also ignore the Standard in this particular case, with an eye on real use cases and developers' intentions. As you can see, there is a slightly modified version which doesn't emit the warning: https://godbolt.org/z/Mdhhe6Eo9.

In D104285#2947255, @ASDenysPetrov wrote:

In D104285#2943449, @aaron.ballman wrote:

One thing I think is worth asking in this thread is whether what you're analyzing is undefined behavior?

Technically you are right. Every access outside an array's extent is UB according to the Standard.

At least in C++; I'd have to double-check for C.

But in practice we can rely on the fact that multidimensional arrays have a contiguous layout in memory on the stack.

"But in practice we can rely on <this UB behavior>" is a very dangerous assumption for users to make, and I think we shouldn't codify that in the design of the static analyzer. We might be able to make that guarantee for *clang*, but we can't make that guarantee for all implementations. One of the big uses of a static analyzer is with pointing out UB due to portability concerns.

Also every compiler treats int[2][2] and int** differently. E.g.:
int arr[6][7];
arr[2][3]; // *(arr + (2*7 + 3)) = *(arr + 17)

int *ptr = (int *)arr;
ptr[17]; //  *(arr + 17)

int **pptr;
pptr[2][3]; // *(*(pptr + 2) + 3)
Many engineers exploit this fact and treat multidimensional arrays on the stack through a raw pointer ((int*)arr). We can foresee their intentions and treat a multidimensional array as a single one instead of warning about UB.

We can do that only if we're convinced that's a sound static analysis (and I'm not convinced). Optimizers can optimize based on the inference that code must be UB free, so I worry that there are optimization situations where the analyzer will fail to warn the user because we're assuming this is safe based purely on memory layout. However, things like TBAA, vectorization, etc may have a different analysis than strictly the memory layout.

And when you turn some of these examples into constant expressions, we reject them based on the bounds. e.g., https://godbolt.org/z/nYPcY14a8

Yes, when we use explicit constants there we can catch such a warning, because the AST parser can recognize the issue in time. The parser is not smart enough to reason about variables. The Static Analyzer is in charge of this and runs after the parser.

I'm aware.

I think the AST parser should also ignore the Standard in this particular case, with an eye on real use cases and developers' intentions.

It would be a bug in Clang to do so; the standard requires a diagnostic if a constant evaluation cannot be performed due to UB: http://eel.is/c++draft/expr.const#5.7

As you can see, there is a slightly modified version which doesn't emit the warning: https://godbolt.org/z/Mdhhe6Eo9.

Correct; I would not expect Clang to diagnose that because it doesn't require constant evaluation. I was pointing out the constexpr diagnostics because it demonstrates that this code has undefined behavior and you're modelling it as though the behavior were concretely defined.

@aaron.ballman
Ok, I got your concerns. As I can see we shall only reason about objects within the bounds. Otherwise, we shall return UndefinedVal.
E.g.:

int arr[2][5];
int* ptr1 = (int*)arr; // Valid indexing for `ptr1` is in range [0,4].
int* ptr2 = &arr[0][0]; // Same as above.
ptr1[4]; // Valid object.
ptr2[5]; // Out of bound. UB. UndefinedVal.

Would it be correct?

In D104285#2949273, @ASDenysPetrov wrote:

@aaron.ballman
Ok, I got your concerns.

Thanks for sticking with me!

As I can see we shall only reason about objects within the bounds. Otherwise, we shall return UndefinedVal.
E.g.:
int arr[2][5];
int* ptr1 = (int*)arr; // Valid indexing for `ptr1` is in range [0,4].
int* ptr2 = &arr[0][0]; // Same as above.
ptr1[4]; // Valid object.
ptr2[5]; // Out of bound. UB. UndefinedVal.
Would it be correct?

I believe so, yes (with a caveat below). I also believe this holds (reversing the pointer bases):

ptr2[4]; // valid object
ptr1[5]; // out of bounds

I've been staring at the C standard for a while, and I think the situation is also UB in C. As with C++, the array subscript operators are rewritten to be pointer arithmetic using addition (6.5.2.1p2). Additive operators says (6.5.6p9) in part: ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise the behavior is undefined. If the result points to one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated. I believe we run afoul of "the same array object" and "one past the last element" clauses because multidimensional arrays are defined to be arrays of arrays (6.7.6.2).

Complicating matters somewhat, I would also say that your use of [5] is not technically out of bounds, but is a one-past-the-end that's then dereferenced as part of the subscript rewriting. So it's technically fine to form the pointer to the one-past-the-end element, but it's not okay to dereference it. That matters for things like:

int arr[2][5] = {0};
const int* ptr2 = &arr[0][0];
const int* end = ptr2 + 5;

for (; ptr2 < end; ++ptr2) {
  int whatever = *ptr2;
}

where end is fine because it's never dereferenced. This distinction may matter to the static analyzer because a one-past-the-end pointer is valid for performing arithmetic on, but an out-of-bounds pointer is not.
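
The three cases distinguished above (in bounds, one past the end, out of bounds) can be written down as a small classification. This is a hedged sketch in plain C++, not analyzer code: `IndexKind` and `classifyIndex` are hypothetical names, and `extent` is assumed to be the size of the innermost array object the base pointer points into (5 for a row of `int arr[2][5]`).

```cpp
#include <cstddef>

enum class IndexKind { InBounds, OnePastTheEnd, OutOfBounds };

// Hypothetical classification of `base + index`, where `base` points to the
// first element of an array object with `extent` elements.
IndexKind classifyIndex(std::ptrdiff_t index, std::size_t extent) {
  if (index < 0)
    return IndexKind::OutOfBounds;
  const auto i = static_cast<std::size_t>(index);
  if (i < extent)
    return IndexKind::InBounds; // pointer may be formed and dereferenced
  if (i == extent)
    return IndexKind::OnePastTheEnd; // may be formed, must not be dereferenced
  return IndexKind::OutOfBounds; // even forming the pointer is undefined
}
```

Under this model `ptr2 + 5` is `OnePastTheEnd`, which is why `end` in the loop above is fine while `ptr2[5]` is not.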

@aaron.ballman
Let me share some thoughts. Consider the following:

int arr[2][5];
int *ptr1 = &arr[0][0];
int *ptr2 = &arr[1][0];

The Standard says that ptr1[5] is UB and ptr2[0] is a valid object. In practice ptr1 and ptr2 usually are equal. But the Standard does not guarantee them to be equal, and this depends on the particular implementation. So we should allow for the possibility of a compiler that lays out every subarray disjointly. I think this is exactly the point our argument starts from.

In D104285#2949638, @ASDenysPetrov wrote:
@aaron.ballman
Let me share some thoughts. Consider the following:
int arr[2][5];
int *ptr1 = &arr[0][0];
int *ptr2 = &arr[1][0];
The Standard says that ptr1[5] is UB and ptr2[0] is a valid object.

Agreed.

In practice ptr1 and ptr2 usually are equal.

Do you mean &ptr1[5] and &ptr2[0]? If so, I agree they are usually going to have the same pointer address at runtime.

But the Standard does not guarantee them to be equal, and this depends on the particular implementation. So we should allow for the possibility of a compiler that lays out every subarray disjointly. I think this is exactly the point our argument starts from.

I don't think that compilers will create a disjointed multidimensional array, as that would waste space at runtime. However, I do think that *optimizers* are getting much smarter about UB situations, saying "that can't happen", and basing decisions on it. For example, this touches on pointer provenance which is an open area of discussion in LLVM that's still being hammered out (it also relates to the C restrict keyword). In a provenance world, the pointer has more information than just its address; it also knows from where the pointer was derived, so you can tell (in the backend) that &ptr1[5] and &ptr2[0] point to *different* objects even if the pointer values are identical. So while the runtime layout of the array object may *allow* for these sort of type shenanigans with the most obvious implementation strategies for multidimensional arrays, the programming language's object model does not allow for them and optimizers may do unexpected things.

In D104285#2949772, @aaron.ballman wrote:

I don't think that compilers will create a disjointed multidimensional array, as that would waste space at runtime. However, I do think that *optimizers* are getting much smarter about UB situations, saying "that can't happen", and basing decisions on it. For example, this touches on pointer provenance which is an open area of discussion in LLVM that's still being hammered out (it also relates to the C restrict keyword). In a provenance world, the pointer has more information than just its address; it also knows from where the pointer was derived, so you can tell (in the backend) that &ptr1[5] and &ptr2[0] point to *different* objects even if the pointer values are identical. So while the runtime layout of the array object may *allow* for these sort of type shenanigans with the most obvious implementation strategies for multidimensional arrays, the programming language's object model does not allow for them and optimizers may do unexpected things.

These are really significant obstacles. As I see it, the only thing left for us is to wait until the Standard makes these shenanigans legal and moves closer to developers.

@aaron.ballman
Now I'm going to rework this patch according to our discussion. This is the first patch in the stack, as you can see, and I don't want to lose the series of improvements, so I will adjust it to preserve the later patches.

In D104285#2951911, @ASDenysPetrov wrote:

These are really significant obstacles. As I see it, the only thing left for us is to wait until the Standard makes these shenanigans legal and moves closer to developers.

I don't know if either committee is considering weakening their type system rules in this area, but I'm certain the topic will come up in the WG14 Memory Object Model study group as they try to tighten up the C memory model.

In D104285#2951937, @ASDenysPetrov wrote:

@aaron.ballman
Now I'm going to rework this patch according to our discussion. This is the first patch in the stack, as you can see, and I don't want to lose the series of improvements, so I will adjust it to preserve the later patches.

Thank you, sorry this isn't quite the direction you were hoping to go in.

Reworked the patch according to the discussion and taking UB into account. Moved Expr::getExprForConstArrayByRawIndex to RegionStoreManager.

Harbormaster completed remote builds in B121362: Diff 368905.Aug 26 2021, 10:24 AM

@ASDenysPetrov Denis, do you think it would make sense to handle the non-multi-dimensional cases first? I see that you have useful patches in the stack that depend on this change (i.e. handling a StringLiteral or a CompoundLiteralExpr) but perhaps they would be meaningful even without solving the multi-dimensional array case here (?).

In D104285#2972215, @martong wrote:

@ASDenysPetrov Denis, do you think it would make sense to handle the non-multi-dimensional cases first? I see that you have useful patches in the stack that depend on this change (i.e. handling a StringLiteral or a CompoundLiteralExpr) but perhaps they would be meaningful even without solving the multi-dimensional array case here (?).

I think you are right. I've been wandering through the Standards for a week. I can't find proof of whether this is even a legal cast: const int arr[1][2][3]; const int *ptr = (const int*)arr;. The descriptions of this are really unclear, with poor examples.
I'll try to rewrite this patch for one-dimensional arrays for a start.

Changed the Title. Changed the Summary.

Now this patch supports only one-dimensional arrays. Previously there was a bug where the array's dimension was not taken into account.

Harbormaster completed remote builds in B122317: Diff 370284.Sep 2 2021, 9:29 AM

ASDenysPetrov removed a child revision: D106681: [analyzer][NFCI] Move a block from `getBindingForElement` to separate functions.Sep 3 2021, 4:24 AM

ASDenysPetrov mentioned this in D107339: [analyzer] Retrieve a character from StringLiteral as an initializer for constant arrays..Sep 3 2021, 9:53 AM

Some minor drive-by nits, but this looks sensible to me.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698
1700–1702

martong added inline comments.Sep 9 2021, 9:33 AM

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698	+1 for Aaron's suggestion, but then it would be really helpful to have an explanatory comment. E.g.: if (!isa<ConstantArrayType>(CAT->getElementType())) { // This is a one dimensional array.
1700	This `static_cast` seems dangerous to me; it might overflow. Can't we compare `Idx` directly to `Extent`? I see that `Idx` is an `APSInt` and `Extent` is an `APInt`, but I think we should be able to handle the comparison on the APInt level because that takes care of the signedness. And the overflow situation should be handled properly as well with `APInt`, given its name, "arbitrary precision int". In this sense I don't see why we need `I` at all.
clang/test/Analysis/initialization.c
1 ↗	(On Diff #370284)	I don't see how this change is related. How could the tests work before `uninitialized.Assign` was enabled?

ASDenysPetrov added inline comments.Sep 20 2021, 9:55 AM

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698	I think that self-descriptive code is better than comments nearby. And it does not affect anything in terms of performance.
1700	We can't get rid of `I` because we use it below anyway in `I >= InitList->getNumInits()` and `InitList->getInit(I)`. I couldn't find any appropriate function to compare without additional checking for signedness or bit-width adjusting. I'll try to improve this snippet.
clang/test/Analysis/initialization.c
1 ↗	(On Diff #370284)	I've added `glob_invalid_index1` and `glob_invalid_index2` functions which were not there before. `core.uninitialized.Assign` produces warnings for them.

ASDenysPetrov added inline comments.Sep 21 2021, 3:56 AM

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1700	This is not dangerous because we check for negatives separately in `Idx < 0`, so we can be sure that `I` is non-negative in `I >= Extent`. Unfortunately, I didn't find any suitable way to compare an `APSInt` of unknown sign and bitwidth with a signless `APInt` without adding further checks for sign and bitwidth conversions. For this reason I prefer the current condition `I >= Extent`.

aaron.ballman added inline comments.Sep 21 2021, 5:19 AM

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698	FWIW, I found the condensed form more readable (I don't have to wonder what's going to care about that variable later in the function with lots of nesting).

ASDenysPetrov added inline comments.Sep 21 2021, 6:34 AM

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698	I see what you mean. That's fair in terms of keeping track of variables. I think it's just the other side of the approach. I don't have a strong preference here; it's just my personal taste, because one doesn't need to inspect the expression to understand what it means. But I'll make an update using your proposal.

aaron.ballman added inline comments.Sep 21 2021, 6:35 AM

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1696–1698	FWIW, my preference is weak as well -- the code is readable either way. :-)

Fixed nits.

ASDenysPetrov added a child revision: D107339: [analyzer] Retrieve a character from StringLiteral as an initializer for constant arrays..Sep 21 2021, 6:58 AM

Harbormaster completed remote builds in B124890: Diff 373895.Sep 21 2021, 7:01 AM

martong added inline comments.Sep 21 2021, 7:28 AM

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1700	I think it would be possible to use `bool llvm::APInt::uge`, which does an unsigned greater-or-equal comparison. Or you could use `sge` for the signed comparison. Also, both have overloads that take another APInt as a parameter.

ASDenysPetrov added inline comments.Sep 22 2021, 1:55 AM

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1700	I considered them. First of all, to choose between `uge` and `sge` we additionally need to check for signedness. Moreover, these functions require the bitwidths to be equal. Thus we need additional checks and transformations. I found this approach too verbose. Mine seems simpler to me and works under the natural rules of comparison.
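
Why the signedness has to be decided before picking `uge` or `sge` can be seen on a single byte. This is a plain C++ model with hypothetical helper names (`asSigned8`, `ugeBits8`, `sgeBits8`), not LLVM's `APInt` implementation: the same bit pattern orders differently depending on how it is interpreted.

```cpp
#include <cstdint>

// Interpret an 8-bit pattern as a two's-complement signed value.
int asSigned8(std::uint8_t bits) {
  return bits >= 128 ? static_cast<int>(bits) - 256 : static_cast<int>(bits);
}

// Unsigned "greater or equal" on the raw bit patterns.
bool ugeBits8(std::uint8_t lhs, std::uint8_t rhs) { return lhs >= rhs; }

// Signed "greater or equal" on the same bit patterns.
bool sgeBits8(std::uint8_t lhs, std::uint8_t rhs) {
  return asSigned8(lhs) >= asSigned8(rhs);
}
```

For the pattern `0xFF` against an extent of 4, the unsigned comparison sees 255 >= 4 while the signed one sees -1 >= 4, so the two disagree; for small non-negative values such as 5 and 4 they agree.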

LGTM!

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1700	Okay, thanks for thinking it through and answering my concerns!
1700–1701	Do you think it would make sense to `assert(CAT->getSize().isSigned())`?

This revision is now accepted and ready to land.Sep 22 2021, 5:05 AM

ASDenysPetrov added inline comments.Sep 22 2021, 10:17 AM

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1700–1701	`getSize` returns an `APInt`, which is signless and has no `isSigned` method. But we know that an array extent shall be of type `std::size_t` (http://eel.is/c++draft/dcl.array#1), which is unsigned (http://eel.is/c++draft/support.types.layout#3). So we can confidently get the size with `getZExtValue`.
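
The difference between zero and sign extension, which is why a zero-extending accessor suits an unsigned extent, can be sketched in plain C++. `zeroExtend32` and `signExtend32` are hypothetical helpers, not LLVM functions, and the 32-bit source width is an assumption made only to keep the numbers small.

```cpp
#include <cstdint>

// Zero extension: the upper 32 bits are filled with zeros.
std::uint64_t zeroExtend32(std::uint32_t bits) { return bits; }

// Sign extension: the upper 32 bits are filled with copies of bit 31.
// Flipping bit 31 and subtracting 2^31 avoids any out-of-range conversion.
std::int64_t signExtend32(std::uint32_t bits) {
  const std::int64_t flipped = static_cast<std::int64_t>(bits ^ 0x80000000u);
  return flipped - static_cast<std::int64_t>(0x80000000u);
}
```

A value with the top bit set, such as `0x80000000`, stays 2147483648 under zero extension but becomes -2147483648 under sign extension, which would be a nonsensical array extent.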

@martong
Thank you for your time!

@martong
BTW, this patch is the first one in the stack. There are also D107339 and D108032. You could also express your opinion there.

In D104285#3018257, @ASDenysPetrov wrote:

@martong
BTW, this patch is the first one in the stack. There are also D107339 and D108032. You could also express your opinion there.

Okay, I am going to have a look, in the meanwhile let's land this first and see if the build bots are happy.

This revision was landed with ongoing or failed builds.Sep 24 2021, 2:38 AM

Closed by commit rG98a95d4844ca: [analyzer] Retrieve a value from list initialization of constant array… (authored by ASDenysPetrov).

This revision was automatically updated to reflect the committed changes.

ASDenysPetrov added a commit: rG98a95d4844ca: [analyzer] Retrieve a value from list initialization of constant array….

ASDenysPetrov mentioned this in D110927: [analyzer] Access stored value of a constant array through a pointer to another type.Oct 1 2021, 8:22 AM

ASDenysPetrov added a child revision: D110927: [analyzer] Access stored value of a constant array through a pointer to another type.

Hey, I brought you some regressions!

const int arr[];

const int arr[3] = {1, 2, 3};

void foo() {
  if (arr[0] < 3) {
  }
}

test.c:6:14: warning: The left operand of '<' is a garbage value [core.UndefinedBinaryOperatorResult]
  if (arr[0] < 3) {
      ~~~~~~ ^
1 warning generated.

According to the -ast-dump these are redeclarations of the same variable:

|-VarDecl 0x7fd1ed8844e0 <test.c:1:1, col:11> col:7 used arr 'const int []'
|-VarDecl 0x7fd1ed884670 prev 0x7fd1ed8844e0 <line:3:1, col:24> col:7 used arr 'const int [3]' cinit
| `-InitListExpr 0x7fd1ed8847a0 <col:16, col:24> 'const int [3]'
|   |-IntegerLiteral 0x7fd1ed8846d8 <col:17> 'int' 1
|   |-IntegerLiteral 0x7fd1ed8846f8 <col:20> 'int' 2
|   `-IntegerLiteral 0x7fd1ed884718 <col:23> 'int' 3

So I suspect that you need to pick the redeclaration with the initializer before invoking the new machinery.
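
The idea of picking the redeclaration that carries the initializer can be sketched on a toy model. `ToyDecl` and `findDeclWithInit` are hypothetical and only mimic a backward-linked redeclaration chain; they are not clang's `VarDecl` API.

```cpp
#include <optional>
#include <vector>

// Toy model of one declaration in a redeclaration chain (hypothetical type).
struct ToyDecl {
  const ToyDecl *Previous;              // earlier declaration, or nullptr
  std::optional<std::vector<int>> Init; // initializer, if this one has it
};

// Walk from the given declaration back to the first one and return the
// declaration that carries an initializer, or nullptr if none does.
const ToyDecl *findDeclWithInit(const ToyDecl *Latest) {
  for (const ToyDecl *D = Latest; D != nullptr; D = D->Previous)
    if (D->Init.has_value())
      return D;
  return nullptr;
}
```

In this model, starting from the declaration without an initializer finds nothing, which mirrors why the choice of starting declaration matters.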

Good catch Artem, thanks for the report!

Maybe a single line change could solve this?

const VarDecl *VD = VR->getDecl()->getCanonicalDecl();

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1663	const VarDecl *VD = VR->getDecl()->getCanonicalDecl();

In D104285#3044103, @NoQ wrote:
Hey, I brought you some regressions!

In D104285#3044804, @martong wrote:
const VarDecl *VD = VR->getDecl()->getCanonicalDecl();

Wow, nice! I've been recently working on some improvements in this scope. I'll take this into account and present a fix soon.

ASDenysPetrov added a child revision: D111542: [analyzer] Retrieve incomplete array extent from its redeclaration..Oct 11 2021, 6:29 AM

ASDenysPetrov mentioned this in D111542: [analyzer] Retrieve incomplete array extent from its redeclaration..Oct 11 2021, 6:35 AM

ASDenysPetrov added a child revision: D111654: [analyzer] Retrieve a value from list initialization of multi-dimensional array declaration..Oct 12 2021, 9:20 AM

ASDenysPetrov added a child revision: D106681: [analyzer][NFCI] Move a block from `getBindingForElement` to separate functions.Oct 18 2021, 9:26 AM

ASDenysPetrov removed a child revision: D107339: [analyzer] Retrieve a character from StringLiteral as an initializer for constant arrays..Oct 21 2021, 8:25 AM

ASDenysPetrov removed a child revision: D111542: [analyzer] Retrieve incomplete array extent from its redeclaration..

ASDenysPetrov removed a child revision: D111654: [analyzer] Retrieve a value from list initialization of multi-dimensional array declaration..Oct 28 2021, 9:38 AM

ASDenysPetrov removed a child revision: D110927: [analyzer] Access stored value of a constant array through a pointer to another type.Nov 1 2021, 9:16 AM

Revision Contents

Path                                             Size
clang/include/clang/AST/Expr.h                   18 lines
clang/include/clang/AST/Type.h                   1 line
clang/lib/AST/Expr.cpp                           54 lines
clang/lib/AST/Type.cpp                           12 lines
clang/lib/StaticAnalyzer/Core/RegionStore.cpp    28 lines
clang/test/Analysis/initialization.cpp           113 lines

Diff 359797

clang/include/clang/AST/Expr.h

public:

    bool hadArrayRangeDesignator() const {
      return InitListExprBits.HadArrayRangeDesignator != 0;
    }
    void sawArrayRangeDesignator(bool ARD = true) {
      InitListExprBits.HadArrayRangeDesignator = ARD;
    }
+   /// Return an value-expression under the given index.

aaron.ballman (suggested edit):
-   /// Return an value-expression under the given index.
+   /// Return a value-expression under the given index.

ASDenysPetrov: +1

+   ///
+   /// \param Idx Direct index beginning from the first value of the list.
+   /// It's a direct index as it would be in a one-dimentional array.
+   /// For instance for `Idx = 3`:
+   /// - `const T x[2][2] = {{1,2},{3,4}}` returns '4';
+   /// - `const T x[2][2] = {{1,},{3,4}}` returns '4';
+   /// - `const T x[3][2] = {{1,},{},{5,6}}` returns '0'.
+   /// \returns
+   /// - value-expression for the valid index;
+   /// - `this` if there's no expression for the valid index;
+   /// - `nullptr` for invalid index (`i < 0` or `i >= array_size`)

aaron.ballman: `i` cannot be `< 0` because the index here is unsigned anyway.

ASDenysPetrov: Aha, I see.

+   /// or if it is not a list for constant array type.

chrish_ericsson_atx: I think in most (all?) other methods in this class, array indices are unsigned in the API. If the array index itself comes from an expression that is negative (i.e., a literal negative integer, or a constant expression that evaluates to a negative number), that has to be handled correctly, but I'm not sure this is the right place to do it. As this code stands, if an integer literal is used which is greater than LONG_MAX but less than ULONG_MAX, it will end up being treated as invalid in this method, won't it?

+   const Expr *getExprForConstArrayByRawIndex(uint64_t Idx) const;
+   /// This version adapted to treat unsigned integer to distinguish between
+   /// -1 and ULONG_LONG_MAX.
+   const Expr *getExprForConstArrayByRawIndex(int64_t Idx) const;

aaron.ballman: I don't think this overload adds enough value -- the indexes are naturally unsigned, and the caller should validate the behavior if the source expression is signed.

ASDenysPetrov: Sounds reasonable.

    SourceLocation getBeginLoc() const LLVM_READONLY;
    SourceLocation getEndLoc() const LLVM_READONLY;
    static bool classof(const Stmt *T) {
      return T->getStmtClass() == InitListExprClass;
    }
    // Iterators

clang/include/clang/AST/Type.h

Show First 20 Lines • Show All 2,938 Lines • ▼ Show 20 Lines	class ConstantArrayType final
}		}

unsigned numTrailingObjects(OverloadToken<const Expr*>) const {		unsigned numTrailingObjects(OverloadToken<const Expr*>) const {
return ConstantArrayTypeBits.HasStoredSizeExpr;		return ConstantArrayTypeBits.HasStoredSizeExpr;
}		}

public:		public:
const llvm::APInt &getSize() const { return Size; }		const llvm::APInt &getSize() const { return Size; }
		SmallVector<uint64_t, 2> getAllExtents() const;
const Expr *getSizeExpr() const {		const Expr *getSizeExpr() const {
return ConstantArrayTypeBits.HasStoredSizeExpr		return ConstantArrayTypeBits.HasStoredSizeExpr
? getTrailingObjects<const Expr *>()		? getTrailingObjects<const Expr *>()
: nullptr;		: nullptr;
}		}
bool isSugared() const { return false; }		bool isSugared() const { return false; }
QualType desugar() const { return QualType(this, 0); }		QualType desugar() const { return QualType(this, 0); }

▲ Show 20 Lines • Show All 4,311 Lines • Show Last 20 Lines

clang/lib/AST/Expr.cpp

Show All 30 Lines
#include "clang/Basic/TargetInfo.h"		#include "clang/Basic/TargetInfo.h"
#include "clang/Lex/Lexer.h"		#include "clang/Lex/Lexer.h"
#include "clang/Lex/LiteralSupport.h"		#include "clang/Lex/LiteralSupport.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/Format.h"		#include "llvm/Support/Format.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <algorithm>		#include <algorithm>
#include <cstring>		#include <cstring>
		#include <numeric>
using namespace clang;		using namespace clang;

const Expr *Expr::getBestDynamicClassTypeExpr() const {		const Expr *Expr::getBestDynamicClassTypeExpr() const {
const Expr *E = this;		const Expr *E = this;
while (true) {		while (true) {
E = E->IgnoreParenBaseCasts();		E = E->IgnoreParenBaseCasts();

// Follow the RHS of a comma operator.		// Follow the RHS of a comma operator.
▲ Show 20 Lines • Show All 2,287 Lines • ▼ Show 20 Lines	bool InitListExpr::isIdiomaticZeroInitializer(const LangOptions &LangOpts) const {
if (LangOpts.CPlusPlus || getNumInits() != 1 || !getInit(0)) {		if (LangOpts.CPlusPlus || getNumInits() != 1 || !getInit(0)) {
return false;		return false;
}		}

const IntegerLiteral *Lit = dyn_cast<IntegerLiteral>(getInit(0)->IgnoreImplicit());		const IntegerLiteral *Lit = dyn_cast<IntegerLiteral>(getInit(0)->IgnoreImplicit());
return Lit && Lit->getValue() == 0;		return Lit && Lit->getValue() == 0;
}		}

		const Expr *InitListExpr::getExprForConstArrayByRawIndex(int64_t Idx) const {
		// Return null if index is invalid.
		if (Idx < 0)
		return nullptr;

		return getExprForConstArrayByRawIndex(static_cast<uint64_t>(Idx));
		}

		const Expr *InitListExpr::getExprForConstArrayByRawIndex(uint64_t Idx) const {
		// Make sure this is a list initialization for const array.
		QualType T = getType();
		if (!isa<ConstantArrayType>(T))
		return nullptr;

		SmallVector<uint64_t, 2> Extents =
		cast<ConstantArrayType>(T)->getAllExtents();

aaron.ballman: Hmm, generally speaking, you should not cast an arbitrary type to an array type because that won't do the correct thing for qualifiers. Instead, you'd usually use `ASTContext::getAsConstantArrayType()` to get the correct type. However, because you're just getting the array extent, I don't believe that can be impacted. However, `isa` followed by `cast` is a code smell, and that should at least be using a `dyn_cast`. @rsmith, do you have thoughts on this?

ASDenysPetrov: I'll rewrite this part.

		// Calculate array total size.
		uint64_t Size = std::accumulate(Extents.begin(), Extents.end(), uint64_t(1),
		std::multiplies<uint64_t>());

		// Return null if index is out of array bounds.
		if (Idx >= Size)
		return nullptr;

		// We can iterate through the multi-dimensional array the same as through
		// one-dimensional array, because of the solid memory allocation.
		const InitListExpr *InitList = this;
		uint64_t I = 0;
		for (uint64_t Ext : Extents) {
		Size /= Ext;
		I = Idx / Size;
		Idx -= I * Size;

		// If there is a list but no value, it must be zero.
		// Return `this` to notify the user of this particular case.
		// This should be handled by a caller.
		if (I >= InitList->InitExprs.size())
		return this;

		const Stmt *Stmt = InitList->InitExprs[I];
		// If it is not an InitListExpr, then we've reached the actual values.
		// Return this Expr.
		if (!isa<InitListExpr>(Stmt))
		return cast<Expr>(Stmt);

		InitList = cast<InitListExpr>(Stmt);
		}

		// We shouldn't reach here unless we missed some legal syntax while parsing.
		// Otherwise, fix parsing logic above.
		llvm_unreachable("List initialization is formed in an unusual way.");
		}

SourceLocation InitListExpr::getBeginLoc() const {		SourceLocation InitListExpr::getBeginLoc() const {
if (InitListExpr *SyntacticForm = getSyntacticForm())		if (InitListExpr *SyntacticForm = getSyntacticForm())
return SyntacticForm->getBeginLoc();		return SyntacticForm->getBeginLoc();
SourceLocation Beg = LBraceLoc;		SourceLocation Beg = LBraceLoc;
if (Beg.isInvalid()) {		if (Beg.isInvalid()) {
// Find the first non-null initializer.		// Find the first non-null initializer.
for (InitExprsTy::const_iterator I = InitExprs.begin(),		for (InitExprsTy::const_iterator I = InitExprs.begin(),
E = InitExprs.end();		E = InitExprs.end();
▲ Show 20 Lines • Show All 2,641 Lines • Show Last 20 Lines
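
To make the raw-index walk in `getExprForConstArrayByRawIndex` easier to follow, here is a minimal sketch of the same idea on a toy tree. `InitNode`, `leaf`, `list` and `valueAtRawIndex` are hypothetical names, and the sketch assumes a fully braced initializer (one nested list per dimension), not clang's AST.

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Toy initializer node: either a leaf value or a nested brace list.
struct InitNode {
  bool IsList = false;
  int Value = 0;                  // meaningful when !IsList
  std::vector<InitNode> Children; // meaningful when IsList
};

InitNode leaf(int V) {
  InitNode N;
  N.Value = V;
  return N;
}

InitNode list(std::vector<InitNode> C) {
  InitNode N;
  N.IsList = true;
  N.Children = std::move(C);
  return N;
}

// Returns the element at flat index Idx of an array with the given extents.
// std::nullopt means "out of bounds"; an omitted initializer yields 0.
std::optional<int> valueAtRawIndex(const InitNode &Root,
                                   const std::vector<std::uint64_t> &Extents,
                                   std::uint64_t Idx) {
  std::uint64_t Size = 1;
  for (std::uint64_t E : Extents)
    Size *= E; // total number of scalar elements
  if (Idx >= Size)
    return std::nullopt;

  const InitNode *Cur = &Root;
  for (std::uint64_t E : Extents) {
    Size /= E;                    // elements covered by one entry at this level
    std::uint64_t I = Idx / Size; // which entry at this level
    Idx -= I * Size;              // remaining offset inside that entry
    if (I >= Cur->Children.size())
      return 0;                   // entry omitted: value-initialized to zero
    const InitNode &Next = Cur->Children[I];
    if (!Next.IsList)
      return Next.Value;
    Cur = &Next;
  }
  return std::nullopt; // malformed tree: more nesting than dimensions
}
```

Unlike the patch, which returns `this` to signal "no expression, treat as zero", the sketch returns 0 directly; that is a simplification made for this example only.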

clang/lib/AST/Type.cpp

        : Type(tc, can,
               (tc == DependentSizedArray
                    ? TypeDependence::DependentInstantiation
                    : TypeDependence::None)),
          ElementType(et) {
      ArrayTypeBits.IndexTypeQuals = tq;
      ArrayTypeBits.SizeModifier = sm;
    }

+   /// Return an array with extents of the declared array type.
+   ///
+   /// E.g. for `const int x[1][2][3];` returns {1,2,3}.
+   SmallVector<uint64_t, 2> ConstantArrayType::getAllExtents() const {
+     SmallVector<uint64_t, 2> Extents;
+     const ConstantArrayType *CAT = this;
+     do {
+       Extents.push_back(*CAT->getSize().getRawData());
+     } while ((CAT = dyn_cast<ConstantArrayType>(CAT->getElementType())));

chrish_ericsson_atx (suggested edit):
-     } while (CAT = dyn_cast<ConstantArrayType>(CAT->getElementType()));
+     } while ((CAT = dyn_cast<ConstantArrayType>(CAT->getElementType())));

ASDenysPetrov: +1

+     return Extents;
+   }

    unsigned ConstantArrayType::getNumAddressingBits(const ASTContext &Context,
                                                     QualType ElementType,
                                                     const llvm::APInt &NumElements) {
      uint64_t ElementSize = Context.getTypeSizeInChars(ElementType).getQuantity();
      // Fast path the common cases so we can avoid the conservative computation
      // below, which in common cases allocates "large" APSInt values, which are
      // slow.
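
As a small, self-contained analogue of `getAllExtents`, the sketch below peels one array dimension at a time off a C++ type. `ExtentCollector` and `allExtents` are hypothetical names; the sketch works on compile-time types with known bounds, whereas the patch walks clang's `ConstantArrayType` nodes at run time.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Primary template: T is not a bounded array, so there is nothing to add.
template <typename T> struct ExtentCollector {
  static void collect(std::vector<std::uint64_t> &) {}
};

// Specialization for bounded arrays: record the outermost extent, then
// continue with the element type, which may itself be an array.
template <typename T, std::size_t N> struct ExtentCollector<T[N]> {
  static void collect(std::vector<std::uint64_t> &Out) {
    Out.push_back(N);
    ExtentCollector<T>::collect(Out);
  }
};

// Returns the extents of T from the outermost dimension to the innermost.
template <typename T> std::vector<std::uint64_t> allExtents() {
  std::vector<std::uint64_t> Out;
  ExtentCollector<T>::collect(Out);
  return Out;
}
```

For `const int[1][2][3]` this yields the same {1,2,3} that the doc comment of `getAllExtents` gives as its example.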

clang/lib/StaticAnalyzer/Core/RegionStore.cpp

if (Optional<nonloc::ConcreteInt> CI = Idx.getAs<nonloc::ConcreteInt>()) {

          // the only time such an access would be made is if a string literal was
          // used to initialize a larger array.
          char c = (i >= length) ? '\0' : Str->getCodeUnit(i);
          return svalBuilder.makeIntVal(c, T);
        }
      } else if (const VarRegion *VR = dyn_cast<VarRegion>(superR)) {
        // Check if the containing array has an initialized value that we can trust.
        // We can trust a const value or a value of a global initializer in main().
        const VarDecl *VD = VR->getDecl();

martong: const VarDecl *VD = VR->getDecl()->getCanonicalDecl();

        if (VD->getType().isConstQualified() ||
            R->getElementType().isConstQualified() ||
            (B.isMainAnalysis() && VD->hasGlobalStorage())) {
          if (const Expr *Init = VD->getAnyInitializer()) {
            if (const auto *InitList = dyn_cast<InitListExpr>(Init)) {
              // The array index has to be known.
              if (auto CI = R->getIndex().getAs<nonloc::ConcreteInt>()) {
-               int64_t i = CI->getValue().getSExtValue();
-               // If it is known that the index is out of bounds, we can return
-               // an undefined value.
-               if (i < 0)
-                 return UndefinedVal();
-               if (auto CAT = Ctx.getAsConstantArrayType(VD->getType()))
-                 if (CAT->getSize().sle(i))
+               const llvm::APSInt &Int = CI->getValue();
+               int64_t Idx = Int.getExtValue();
+               const Expr *E = Int.isSigned()
+                                   ? InitList->getExprForConstArrayByRawIndex(Idx)
+                                   : InitList->getExprForConstArrayByRawIndex(
+                                         static_cast<uint64_t>(Idx));
+               // If E is null, then the index is out of bounds. Return Undef.
+               if (!E)
                  return UndefinedVal();

chrish_ericsson_atx: I see where you got the `int64_t` from -- that's what `getSExtValue()` returns. So, if the literal const index value in the expr is greater than LONG_MAX (but less than ULONG_MAX, of course), this would assert. That seems undesirable....

ASDenysPetrov: That's a great catch! I'll make changes soon.

-               // If there is a list, but no init, it must be zero.
-               if (i >= InitList->getNumInits())
+               // If E is the same as InitList, then there is no value specified
+               // in the list and we shall return a zero value.
+               if (E == InitList)
                  return svalBuilder.makeZeroVal(R->getElementType());
-               if (const Expr *ElemInit = InitList->getInit(i))
-                 if (Optional<SVal> V = svalBuilder.getConstantVal(ElemInit))
+               // Return a constant value.
+               if (Optional<SVal> V = svalBuilder.getConstantVal(E))
                  return *V;
    }
    // Check for loads from a code text region. For such loads, just give up.
    if (isa<CodeTextRegion>(superR))
      return UnknownVal();

aaron.ballman (suggested edit):
    // TODO: Support multidimensional array.
-   const bool IsOneDimensionalArray =
-       !isa<ConstantArrayType>(CAT->getElementType());
-   if (IsOneDimensionalArray) {
+   if (!isa<ConstantArrayType>(CAT->getElementType())) {
    const llvm::APSInt &Idx = CI->getValue();

martong: +1 for Aaron's suggestion, but then it would be really helpful to have an explanatory comment. E.g.:
    if (!isa<ConstantArrayType>(CAT->getElementType())) { // This is a one dimensional array.

ASDenysPetrov: I think that self-descriptive code is better than comments nearby. And it does not affect anything in terms of performance.

aaron.ballman: FWIW, I found the condensed form more readable (I don't have to wonder what's going to care about that variable later in the function with lots of nesting).

ASDenysPetrov: I see what you mean. That's fair as far as tracking the variable is concerned. I think it's just the other side of the approach. I don't have strong preferences here; it's just my personal taste, because one doesn't need to inspect the expression to understand what it means. But I'll make an update using your proposal.

aaron.ballman: FWIW, my preference is weak as well -- the code is readable either way. :-)

    // Handle the case where we are indexing into a larger scalar object.

martong: This `static_cast` seems dangerous to me; it might overflow. Can't we compare `Idx` directly to `Extent`? I see that `Idx` is an `APSInt` and `Extent` is an `APInt`, but I think we should be able to handle the comparison at the `APInt` level because that takes care of the signedness. The overflow situation should also be handled properly with `APInt`, given its name, "arbitrary precision int". In this sense I don't see why we need `I` at all.

ASDenysPetrov: We can't get rid of `I` because we use it below anyway in `I >= InitList->getNumInits()` and `InitList->getInit(I)`. I couldn't find any appropriate function to compare without additional checking for signedness or bit-width adjusting. I'll try to improve this snippet.

martong: I think it would be possible to use `bool llvm::APInt::uge`, which does an unsigned greater-or-equal comparison. Or you could use `sge` for the signed comparison. Also, both have overloads that take another `APInt` as a parameter.

ASDenysPetrov: I considered them. First of all, to choose between `uge` and `sge` we would additionally need to check for signedness. Moreover, these functions require the bit widths to be equal, so we would need additional checks and transformations. I found this approach too verbose. My approach seems simpler to me and works under the natural rules of comparison.

ASDenysPetrov: This is not dangerous because we check for negatives separately in `Idx < 0`, so we can be sure that `I` is positive while `I >= Extent`. Unfortunately, I didn't find any suitable way to compare an `APSInt` of unknown sign and bit width with a signless `APInt` without adding further checks for sign and bit-width conversions. For this reason I prefer the current condition `I >= Extent`.

martong: Okay, thanks for thinking it through and answering my concerns!

    // For example, this handles:

martong: Do you think it would make sense to `assert(CAT->getSize().isSigned())`?

ASDenysPetrov: `getSize` returns an `APInt`, which is signless and has no `isSigned` method. But we know that an array extent shall be of type `std::size_t` (http://eel.is/c++draft/dcl.array#1), which is unsigned (http://eel.is/c++draft/support.types.layout#3). So we can confidently get the size with `getZExtValue`.

    // int x = ...

aaron.ballman (suggested edit):
    const llvm::APSInt &Idx = CI->getValue();
-   const uint64_t I = static_cast<uint64_t>(Idx.getExtValue());
+   const auto I = static_cast<uint64_t>(Idx.getExtValue());
    // Use `getZExtValue` because array extent can not be negative.
    const uint64_t Extent = CAT->getSize().getZExtValue();
    // Check for `Idx < 0`, NOT for `I < 0`, because `Idx` CAN be

    // char *y = &x;
    // return *y;
    // FIXME: This is a hack, and doesn't do anything really intelligent yet.
    const RegionRawOffset &O = R->getAsArrayOffset();
    // If we cannot reason about the offset, return an unknown value.
    if (!O.getRegion())
      return UnknownVal();

clang/test/Analysis/initialization.cpp

	// RUN: %clang_cc1 -std=c++14 -triple i386-apple-darwin10 -analyze -analyzer-checker=core.builtin,debug.ExprInspection -verify %s			// RUN: %clang_cc1 -std=c++14 -triple i386-apple-darwin10 -analyze -analyzer-checker=core.uninitialized.Assign,core.builtin,debug.ExprInspection,core.uninitialized.UndefReturn -verify %s

	void clang_analyzer_eval(int);			void clang_analyzer_eval(int);

	struct S {			struct S {
	int a = 3;			int a = 3;
	};			};
	S const sarr[2] = {};			S const sarr[2] = {};
	void definit() {			void definit() {
	int i = 1;			int i = 1;
	// FIXME: Should recognize that it is 3.			// FIXME: Should recognize that it is 3.
	clang_analyzer_eval(sarr[i].a); // expected-warning{{UNKNOWN}}			clang_analyzer_eval(sarr[i].a); // expected-warning{{UNKNOWN}}
	}			}

	int const arr[2][2] = {};			int const arr[2][2] = {};
	void arr2init() {			void arr2init() {
	int i = 1;			int i = 1;
	// FIXME: Should recognize that it is 0.			// FIXME: Should recognize that it is 0.
	clang_analyzer_eval(arr[i][0]); // expected-warning{{UNKNOWN}}			clang_analyzer_eval(arr[i][0]); // expected-warning{{UNKNOWN}}
	}			}

				void direct_index1() {
				int const arr[2][2][3] = {};
				int const *ptr = (int const *)arr;
				clang_analyzer_eval(ptr[0] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[1] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[2] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[3] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[4] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[5] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[6] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[7] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[8] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[9] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[10] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[11] == 0); // expected-warning{{TRUE}}
				}

				void direct_index2() {
				int const arr[2][2][3] = {{{1, 2, 3}, {4, 5, 6}}, {{7, 8, 9}, {10, 11, 12}}};
				int const *ptr = arr[0][0];
				clang_analyzer_eval(ptr[0] == 1); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[1] == 2); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[2] == 3); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[3] == 4); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[4] == 5); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[5] == 6); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[6] == 7); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[7] == 8); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[8] == 9); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[9] == 10); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[10] == 11); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[11] == 12); // expected-warning{{TRUE}}
				}

				void direct_index3() {
				int const arr[2][2][3] = {{{}, {}}, {{}, {}}};
				int const *ptr = &(arr[0][0][0]);
				clang_analyzer_eval(ptr[0] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[1] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[2] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[3] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[4] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[5] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[6] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[7] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[8] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[9] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[10] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[11] == 0); // expected-warning{{TRUE}}
				}

				void direct_index4() {
				int const arr[2][2][3] = {{{1, 2}, {}}, {{7, 8}, {10, 11, 12}}};
				int const *ptr = (int const *)arr[0];
				clang_analyzer_eval(ptr[0] == 1); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[1] == 2); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[2] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[3] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[4] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[5] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[6] == 7); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[7] == 8); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[8] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[9] == 10); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[10] == 11); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[11] == 12); // expected-warning{{TRUE}}
				}

				void direct_index5() {
				int arr[2][2][3] = {{{1, 2}}, {{7}}};
				int *ptr = (int *)arr;
				clang_analyzer_eval(ptr[0] == 1); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[1] == 2); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[2] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[3] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[4] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[5] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[6] == 7); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[7] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[8] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[9] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[10] == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(ptr[11] == 0); // expected-warning{{TRUE}}
				}

				void direct_invalid_index1() {
				const int arr[2][2][3] = {};
				const int *ptr = (const int *)arr;
				int idx = -1;
				auto x = ptr[idx]; // expected-warning{{garbage or undefined}}
				}

				void direct_invalid_index2() {
				const int arr[2][2][3] = {};
				const int *ptr = (const int *)arr;
				int idx = 42;
				auto x = ptr[idx]; // expected-warning{{garbage or undefined}}
				}

				static const unsigned RV[1][5] = {{1, 2, 3, 4, 5}};
				void PR50604() {
				const unsigned *rvp = &(RV[0][0]);
				auto buf_p = rvp[4]; // no-warning (garbage or undefined)
				}

				const float floats[] = {
				0.0000, 0.0235, 0.0470, 0.0706, 0.0941, 0.1176};
				float no_warn_garbage_value() {
				return floats[0]; // no-warning (garbage or undefined)
				}
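
The expected values in the `direct_index` tests follow from the storage order of a partially initialized multi-dimensional array. The sketch below reproduces the `direct_index4` layout at run time; `flatten` is a hypothetical helper, and it copies the bytes into a flat buffer so that the example itself does not index across sub-array bounds.

```cpp
#include <array>
#include <cstring>

// Copy a 2x2x3 array into a flat buffer so elements can be inspected in
// storage order without pointer arithmetic across sub-array boundaries.
std::array<int, 12> flatten(const int (&Arr)[2][2][3]) {
  static_assert(sizeof(int[2][2][3]) == sizeof(int) * 12,
                "array elements are stored contiguously");
  std::array<int, 12> Flat{};
  std::memcpy(Flat.data(), Arr, sizeof(int[2][2][3]));
  return Flat;
}
```

Elements omitted from a brace list are zero, so `{{{1, 2}, {}}, {{7, 8}, {10, 11, 12}}}` flattens to 1, 2, 0, 0, 0, 0, 7, 8, 0, 10, 11, 12, matching the test's expectations.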
