This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][NFCI] Move a block from `getBindingForElement` to separate functions
ClosedPublic

Authored by ASDenysPetrov on Jul 23 2021, 9:52 AM.

Details

Summary
  1. Improve readability by moving deeply nested block of code from RegionStoreManager::getBindingForElement to new separate functions:
    • getConstantValFromConstArrayInitializer;
    • getSValFromInitListExpr.
  2. Handle the case when index is a symbolic value. Write specific test cases.
  3. Add test cases when there is no initialization expression presented.

This patch implies to make next patches clearer and easier for review process.

Diff Detail

Event Timeline

ASDenysPetrov created this revision.Jul 23 2021, 9:52 AM
ASDenysPetrov requested review of this revision.Jul 23 2021, 9:52 AM
ASDenysPetrov abandoned this revision.Sep 3 2021, 4:24 AM

This patch is currently irrelevant after last changes in the parent revision. It's going to be reworked or permanently abandoned.

ASDenysPetrov retitled this revision from [analyzer] Retrieve a value from list initialization of constant multi-dimensional array. to [analyzer][NFC] Move a block from `getBindingForElement` to separate functions.
ASDenysPetrov edited the summary of this revision. (Show Details)
ASDenysPetrov added reviewers: martong, steakhal.
ASDenysPetrov set the repository for this revision to rG LLVM Github Monorepo.

@martong
Please, look. As you suggested in D111542#inline-1064497 I made an intermediate patch making next patches easer for review.

So, you return None instead of returning UndefinedVal.
All in all, it looks good, aside from a couple stuff inline.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1636

Please preserve the original comment.

1650

What if that is a typedef?

1686–1689

This should have been checked at L1642 to preserve the original behavior.
As of now, the code might return UndefinedVal at L1683 or Unknown at L1659, which is probably different than the original behavior - unless you prove otherwise.

1714

What other values could appear in this context other than constants?

@steakhal
All your comments are fair in terms of NFC. I also tried to improve some places. I think I should requalificate the revision to non-NFC, that I can be more untied to bring some improvements.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1636

OK.

1650

I'll add a test case.

1714

It's hard to say. I didn't do any investigation yet, but I think some symbolic or loc values.

ASDenysPetrov added inline comments.Oct 20 2021, 1:51 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1686–1689

Unknown at L1659 comes from RegionStoreManager::getBindingForFieldOrElementCommon. As you can see, there is a snippet in here:

if (hasSymbolicIndex)
  return UnknownVal();
ASDenysPetrov added inline comments.Oct 20 2021, 1:54 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1686–1689

UndefinedVal at L1683 preserves exactly the previous behaviour.

I think it's fine, maybe NFCi is would be slightly more accurate, while stating the minor behavior change and the reason for doing so in the patch summary could further improve the visibility of this issue.

That being said, since it actually changes some behavior, I'd like to request some tests covering the following (uncovered lines):
L1646, L1689, L1699

I think it's fine, maybe NFCi is would be slightly more accurate, while stating the minor behavior change and the reason for doing so in the patch summary could further improve the visibility of this issue.

That being said, since it actually changes some behavior, I'd like to request some tests covering the following (uncovered lines):
L1646, L1689, L1699

For L1646 currently I'm not sure about the exact test, since it is a heritage of the old code, so it just jumped here from the past. If you could give an example I would appreciate this.
For L1689 I'll add it.
For L1699 I've added TODO cases in D104285.

Make changes according to comments. Thanks to @steakhal

Minor comment update.

ASDenysPetrov retitled this revision from [analyzer][NFC] Move a block from `getBindingForElement` to separate functions to [analyzer][NFCI] Move a block from `getBindingForElement` to separate functions.Oct 20 2021, 5:15 AM
ASDenysPetrov edited the summary of this revision. (Show Details)
ASDenysPetrov edited the summary of this revision. (Show Details)
ASDenysPetrov added inline comments.Oct 20 2021, 5:18 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1666

Item 2 from the Summary. This was borrowed from RegionStoreManager::getBindingForFieldOrElementCommon :

if (hasSymbolicIndex)
   return UnknownVal();

Fixed redefinition in test file initialization.cpp.

I think it's fine, maybe NFCi is would be slightly more accurate, while stating the minor behavior change and the reason for doing so in the patch summary could further improve the visibility of this issue.

That being said, since it actually changes some behavior, I'd like to request some tests covering the following (uncovered lines):
L1646, L1689, L1699

For L1646 currently I'm not sure about the exact test, since it is a heritage of the old code, so it just jumped here from the past. If you could give an example I would appreciate this.
For L1689 I'll add it.
For L1699 I've added TODO cases in D104285.

I just wanted you to think about these cases.
BTW this should work for L1646:

extern const int arr[]; // Incomplete array type
void top() {
  (void)arr[42];
}

And I'm okay with the rest of the uncovered lines.

clang/test/Analysis/initialization.c
103 ↗(On Diff #380934)

I'm pretty sure we should not get Unknown for simply loading from a global variable.
That would imply that loading two times subsequently we could not prove that the value remained the same.
But that should remain the same, thus compare equal unless some other code modifier that memory region.

That could happen for two reasons:

  1. There is a racecondition, and another thread modified that location. But that would be UB, so that could not happen.
  2. The global variable is volatile, so the hardware might changed its content- but this global is not volatile so this does not apply.

That being said, this load should have resulted in a fresh conjured symbolic value instead of unknown.
Could you please check if it did result in unknown before your patch, or you did introduce this behavior?

I just wanted you to think about these cases.
BTW this should work for L1646:

extern const int arr[]; // Incomplete array type
void top() {
  (void)arr[42];
}

Yes this will work for L1646. But I've already added similar tests for this line: glob_array_index4 and struct_arr_index1 functions in initialization.cpp. I mean it doesn't matter whether it is an incomplete or complete type.

clang/test/Analysis/initialization.c
103 ↗(On Diff #380934)

I'm not sure I caught your thoughts.
But I think the things is much simplier:
clang_analyzer_eval can only produce UNKNOWN or TRUE or FALSE. If we know the constraint of glob_arr_no_init[2] we return TRUE or FALSE, and UNKNOWN otherwise.
But in fact I should use clang_analyzer_dump here instead of clang_analyzer_eval. This is actually my fault. I'll update this.

martong accepted this revision.Oct 20 2021, 9:17 AM

Thanks, this looks good to me! But, please wait for an approve from @steakhal as well.

clang/test/Analysis/initialization.c
103 ↗(On Diff #380934)

Could you please check if it did result in unknown before your patch, or you did introduce this behavior?

I've just checked it, it was Unknown before this patch as well.
And actually, that is wrong because the array has static storage duration and as such, we know that it is initialized with zeros according to the C standard. But, that should be addressed in a follow-up patch (if someone has the capacity).
https://stackoverflow.com/questions/32708161/value-of-uninitialized-elements-in-array-of-c-language/32708288

This revision is now accepted and ready to land.Oct 20 2021, 9:17 AM
steakhal added inline comments.Oct 20 2021, 9:18 AM
clang/test/Analysis/initialization.c
103 ↗(On Diff #380934)

Oh true. I was actually tricked by the initialization.cpp:38, where I actually caught this. And in that case, you use dump() yet you get Unknown as a result. But the issue remains the same.

ASDenysPetrov added inline comments.Oct 20 2021, 3:24 PM
clang/test/Analysis/initialization.c
103 ↗(On Diff #380934)

C++ also states about zero-initialization for static storage lifetime duration: http://eel.is/c++draft/basic.start.static#2
I think it will be among my next patches.

ASDenysPetrov added inline comments.Oct 20 2021, 3:40 PM
clang/test/Analysis/initialization.c
103 ↗(On Diff #380934)

And in that case, you use dump() yet you get Unknown as a result. But the issue remains the same.

I just realized that I was confused as well :) The dump returns a symbolic value like reg_$0<int Element{glob_arr_no_init,2 S64b,int}>, not Unknown. So my intention of using eval was deliberate. Anyway we should improve this to produce FALSE instead of UNKNOWN, since it has a static storage.

steakhal added inline comments.Oct 21 2021, 12:46 AM
clang/test/Analysis/initialization.c
103 ↗(On Diff #380934)

It's a really complex topic. I would highly recommend taking baby steps to improve this area.

In C, you might have a chance to accomplish something, but in C++ static globals might be initialized by running a constructor, which means arbitrary user-defined code. This is actually why we disabled similar logic in the RegionStore::getInitialStore(). I highly recommend taking a look.

Consider this code:

// TU 1:
#include <cstdio>
static int a;  // zero-initialized initially
int *p2a = &a; // escapes the address of 'a'

int main() {
  printf("%d\n", a); // reports 42
}

// TU 2:
extern int *p2a;
static bool sentinel = (*p2a = 42, false);

Added a comment to initialization.c, that we can reason about values of constant array which doesn't have an initializer. This is not such simple for C++, since there are much more ways to initialize the array.

ASDenysPetrov added inline comments.Oct 21 2021, 12:07 PM
clang/test/Analysis/initialization.c
103 ↗(On Diff #380934)

Yes, I see what you mean. I'm going to tough the part of C++ in the near future.

103 ↗(On Diff #380934)

Yes, I see what you mean. I'm going to tough the part of C++ in the near future.

*Yes, I see what you mean. I'm not going to tough the part of C++ in the near future.

steakhal accepted this revision.Oct 25 2021, 1:27 AM

Sorry for blocking the review of this one for so long.

clang/test/Analysis/initialization.c
103 ↗(On Diff #380934)

Okay, I see now.

Sorry for blocking the review of this one for so long.

Thank you for the review!