Page MenuHomePhabricator

[analyzer] Fix extraction of punned and known scalar SVals
Needs ReviewPublic

Authored by vabridgers on Dec 20 2020, 5:14 AM.

Details

Reviewers
NoQ
Summary

This change addresses punning of scalar types in static analysis, and is
a precursor to handling the following two cases:

https://bugs.llvm.org/show_bug.cgi?id=39032
https://bugs.llvm.org/show_bug.cgi?id=44114

This patch was co-authored with Balazs Benics (steakhal), and debug
assistance from Gabor Marton (martong).

Region store is taught to recognize punned concrete integers and extract
the data in an endianess correct way for scalars.

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>

Diff Detail

Event Timeline

vabridgers created this revision.Dec 20 2020, 5:14 AM
vabridgers requested review of this revision.Dec 20 2020, 5:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 20 2020, 5:14 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Clean up a few typos

vabridgers edited the summary of this revision. (Show Details)Dec 20 2020, 6:20 AM

Update commit header.

vabridgers retitled this revision from [analyzer] Prototype change to fix type punning of known SVals to [analyzer] Fix extraction of punned and known SVals.Dec 20 2020, 7:10 AM

This proposed change seems to address the referenced issues (https://bugs.llvm.org/show_bug.cgi?id=39032 and https://bugs.llvm.org/show_bug.cgi?id=44114). I believe the scalars test cases to be mostly complete, and the structures test cases to be at most a baseline (just based on covering specific cases mentioned in the bug reports). Please let me know how to improve through review comments and I'll work on that. Thanks to Balazs, Gabor, Artem and Denis for feedback in the email thread and help working through this.

Reduce scope of this initial change to just scalars

vabridgers retitled this revision from [analyzer] Fix extraction of punned and known SVals to [analyzer] Fix extraction of punned and known scalar SVals.Dec 20 2020, 12:56 PM
vabridgers edited the summary of this revision. (Show Details)

Based on a suggestion from Balazs, I reduced the scope of the initial change to just scalars. There is one issue I'd like to hear comments on, and that's how to handle the case of extracting a bit field outside of the represented APInt. Currently, I'm returning UnknownVal(), following the lead in RegionStore.cpp, line 1775, in method getBindingForElement. During development, I hit an assert in extractBits when encountering a "negative" case exposed by the LIT ptr-arith.cpp - the negative case being an index out of bounds of the punned scalar (see the LIT I added for the negative cases).

Based on this change, some of the basic structure cases are "working" (meaning not showing false results), but the change is probably not comprehensive enough, and definitely not covered well enough by tests, so focusing just on scalars for now. I'm willing to work on this collaboratively to solve these problems, since we encounter these analyzing our source code.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1657

I added this if to handle the case of something like this:

unsigned short sh = 0x1122;
unsigned char *p = (unsigned char *)&sh;

unsigned char aa = p[0]; // This is ok, and will yield 0x11 or 0x22 - depending on endianess.
unsigned char ch = p[3];
// This is an error and should be caught.

I opted with returning UnknownVal() following the lead in RegionStore.cpp:1775 (just below). The negative test cases I added assume this. The ptr-arith.cpp LIT exposed this issue. Otherwise, I hit an assert in extractBits that the bit field is outside of represented range in the APInt.

NoQ added a comment.Dec 22 2020, 10:59 PM

I think you've found a very nice and compact 50% solution to the problem. I didn't think of this while i was looking for a proper fix. Very nice.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1629–1631

Let's write down the contract of this method. Do i understand correctly that for a given region R and offset O, you're trying to lookup an element of type elemT at offset O while knowing that R has binding V at offset zero?

1634

Please feel free to add this as a method on SVal, with a comment that this method does not take constraints in the current program state into account.

1648

This assignment can overflow. Both because the raw offset can be very large and because it can be negative.

Say, the following code compiles just fine (and may even run) so we need to be able to analyze it but you dodge that by always checking that the base type is a scalar type (if you intend to keep it that way i'd rather assert that; the other reason to assert that would be because endianness logic is screwed without it):

int main() {
  int x[5000000000]; // Over 2³² elements.
  x[0] = 1;
  char *y = &x;
  char c = y[19999999999];
}

The following code is valid but you seem to be protected from it because you dodge SymbolicRegions by checking that R is a TypedValueRegion:

void foo(int *x) {
  long *y = (long *)(x - 1);
  y = 1;
  char c = *(char *)y;
}

The following code has an obvious UB but it still compiles so you have to make sure the analyzer behaves sanely:

void foo() {
  int x;
  long *y = (long *)(x - 1);
  y = 1;
  char c = *(char *)y;
}
1653–1654

The ConcreteValue->getBitWidth() part is the only thing that you can't generalize to arbitrary values for now. Fundamentally, any symbolic value has an intrinsic width (and, moreover, almost an intrinsic type) that you should be able to compute by just looking at the value. For now we can't rely on that though, because our values for integral casts over symbols aren't represented correctly, and we can't represent them correctly without a proper constraint solver to solve the resulting constraints.

So a good temporary solution would be to proactively store widths together with bindings. I.e., instead of "In region R at offset O we have value X" we could write down "In region R from offset O to offset O' we have value X". That'd bulk up the RegionStore and make some operations more difficult but it will solve the problem entirely.

clang/test/Analysis/concrete-endian.cpp
49

I suspect that exactly one of pps[0] in the little endian case or pps[3] in the big endian case should be 0x7788 instead. Like, they're in the opposite order, right?

NoQ added inline comments.Dec 22 2020, 11:05 PM
clang/test/Analysis/concrete-endian.cpp
49

Wait, loads are also in the opposite order. Nvm. Your code is correct and looks like there's indeed a nice and succint way to do that.

NoQ added inline comments.Dec 22 2020, 11:10 PM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1648

(I see that you brought this up and tested it but overflows are still scary. If you're consciously relying on an overflow you should probably at least comment that.)

clang/test/Analysis/concrete-endian.cpp
56

Ideally no they shouldn't, they should yield Undefined. I'd rather mark it as FIXME: because ultimately we could have a warning here. It's ok if we miss these cases for now though.

Thanks for the comments, @NoQ . I'll carefully review and update. BTW, I found an old Bugzilla case that seems to relate to this change directly -> https://bugs.llvm.org/show_bug.cgi?id=2820. Once this change is evolved and accepted, I'll update that Bugzilla issue and close. Cheers!