Page MenuHomePhabricator

[analyzer] Implement getType for SVal
ClosedPublic

Authored by vsavchenko on Jun 18 2021, 9:55 AM.

Details

Summary

This commit adds a function to the top-class of SVal hierarchy to
provide type information about the value. That can be extremely
useful when this is the only piece of information that the user is
actually caring about.

Additionally, this commit introduces a testing framework for writing
unit-tests for symbolic values.

Diff Detail

Event Timeline

vsavchenko created this revision.Jun 18 2021, 9:55 AM
vsavchenko requested review of this revision.Jun 18 2021, 9:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2021, 9:55 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Hey folks!

I was thinking that this method can be quite handy.
I think it would be great to have another pair of eyes (but more would be better) to look into this and:

  • Suggest other test cases to add
  • See if I missed any "typed" values from the hierarchy
  • See if getType for supported types does make sense

In particular, it would be great to hear your thoughts on SymolicRegion because it's not that obvious that we should return that type or not. On one hand, this is the best we've got, on the other hand, it can be misleading.

NoQ added a comment.Jun 18 2021, 7:35 PM

Excellent! This utility is the first step on a lot of paths such as:

  • Asserting that all expressions' values are of the right type. I expect this to uncover a lot of ridiculous mutually cancelling bugs.
  • Modeling extents of RegionStore bindings - they're simply widths of their respective types.
  • See if I missed any "typed" values from the hierarchy

All values in the hierarchy are typed. Well, ideally.

See inlined comment for ConcreteInts.

One particularly important one is nonloc::LocAsInteger (And, well, why doesn't it still have signedness? - I had a patch for it like 6 years ago. Why am I so bad at committing patches?).

I think you can also easily support pointers-to-members and hopefully goto labels.

  • See if getType for supported types does make sense

In particular, it would be great to hear your thoughts on SymolicRegion because it's not that obvious that we should return that type or not. On one hand, this is the best we've got, on the other hand, it can be misleading.

The type of the object it points to is indeed technically unknown. The type of the pointer represented by a loc::MemRegionVal(SymbolicRegion) is unambiguous and coincides with the type of the symbol.

In particular, if the symbol is of type void *, then it represents a void pointer value, which doesn't mean that its pointee is a void.

clang/lib/StaticAnalyzer/Core/SVals.cpp
152

This is correct except you need to get the value type, not the pointer type.

LazyCompoundVal is a prvalue and its parent region is the location in which this prvalue resided some time in the past. So the parent region is of the right type and it's always typed but you need the pointee type.

163

The biggest disappointment here is that this case is technically incorrect for the very basic case of integral types.

Because we drop casts, a symbol of an integral type may technically represent a value of a completely different integral type, the same symbol may represent multiple different values of different integral types, and so on.

This is what ruins the dream of modeling Region Store binding extents as value widths: for the single most important case we'd be getting incorrect answers.

clang/unittests/StaticAnalyzer/SValTest.cpp
145

nonloc::ConcreteInt is basically an APSInt. You can extract bit width and signedness and feed it to ASTContext::getIntTypeForBitwidth().

loc::ConcreteInt is always of the pointer type. Unless, of course, you have a target architecture with pointers of different widths, then whelp we aren't quite there yet. But when we get there, I guess it's still APSInt under the hood so a similar trick could be used.

Support GotoLabel

Support ConcreteInt, LocAsInteger, and GotoLabel

vsavchenko marked an inline comment as done.Jun 21 2021, 8:57 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/SVals.cpp
152

OK then, can you maybe hint how can I write a test where this is going to be a pointer type (or maybe then getType for regions works incorrectly).

163

I hope that the introduction of symbolic casts is around the corner.

clang/unittests/StaticAnalyzer/SValTest.cpp
145

OK, I guess having ASTContext as a parameter is the best choice we can have here and not such a high price to pay.

NoQ added inline comments.Jun 21 2021, 2:33 PM
clang/lib/StaticAnalyzer/Core/SVals.cpp
152

Regions have getLocationType() (the pointer type) and getValueType() (the pointee type). I think you need to call the latter directly in this case, bypassing recursion.

In order to obtain a live LazyCompoundVal specimen for testing purposes, you need to load an entire compound object (not necessarily represented by a CompoundVal) from Region Store.

Eg.,

struct MyStruct a;
// ...
struct MyStruct b = a; // Avoid C++ though, constructors are a different beast.

Or you could construct one directly. But that, of course, wouldn't give you any hints about the appropriate type.

maybe then getType for regions works incorrectly

Hmm that's actually a good separate question. How do you know if a region represents a prvalue of pointer type or a glvalue of pointee type (including, but not limited to, references)? This can't be figured out without more context just by looking at the SVal.

vsavchenko marked an inline comment as done.Jun 22 2021, 2:03 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/SVals.cpp
152

In order to obtain a live LazyCompoundVal specimen for testing purposes.

That's not a problem:

TestUnion d = {.c=b};

does produce LazyCompundVal and we don't get a pointer, but the value type. That's why I was asking how I can construct an example when this current implementation fails.

Hmm that's actually a good separate question. How do you know if a region represents a prvalue of pointer type or a glvalue of pointee type (including, but not limited to, references)? This can't be figured out without more context just by looking at the SVal.

Value categories are orthogonal to types, so I don't see why we should care for those in getType. How do you think it should affect this particular functionality?

NoQ added inline comments.Jun 22 2021, 5:01 AM
clang/lib/StaticAnalyzer/Core/SVals.cpp
152

TestUnion d = {.c=b};

(lldb) p D.dump()
compoundVal{ lazyCompoundVal{0x1110b3950,temp_object{struct TestStruct, S1276}}}

It's an eager compound value that contains a lazy compound value as the initializer for field .c.

You're still testing an eager compound value. You never visit the lazy compound value recursively.

MemRegion::getLocationType() is always a pointer type.

Value categories are orthogonal to types, so I don't see why we should care for those in getType. How do you think it should affect this particular functionality?

The static analyzer basically models operators * and & as no-op but from the perspective of the standard's formalism they jump across objects.

For example, a load from parameter int *p would produce a value &SymRegion{reg_$0<p>} that represents both the rvalue of p (which has the type int *) and the lvalue of *p (which is an entirely different object of type int).

This is not an urgent, but still a very important issue: SVal is likely among the first things a new developer in the analyzer sees. Its common around the university here that some students barely halfway through their BSc are given smaller tasks, like the implementation of a checker in the static analyzer. Instead of

// Try to get a reasonable type for the given value.

Maybe we should tell why this might fail, why its not just a type but a best effort, etc.

Despite being in the analyzer for so long, I don't have anything anything meaningful to add just yet :)

This is not an urgent, but still a very important issue: SVal is likely among the first things a new developer in the analyzer sees. Its common around the university here that some students barely halfway through their BSc are given smaller tasks, like the implementation of a checker in the static analyzer. Instead of

// Try to get a reasonable type for the given value.

Maybe we should tell why this might fail, why its not just a type but a best effort, etc.

Despite being in the analyzer for so long, I don't have anything anything meaningful to add just yet :)

That's a good point! I guess the main obstacle for a meaningful description of "why this might fail" is what @NoQ mentioned: "all values are supposed to be typed". But if we go that deep in one commit (even doing it for a rather simple pointer-to-member case is not so small of a change), we won't get this feature at all. Do you think that "for the sake of incremental approach"-like comment can help?

Do you think that "for the sake of incremental approach"-like comment can help?

A TODO: should be good enough for now. I see this is still a moving target.

NoQ added a comment.Jun 23 2021, 2:10 AM

Another thing to mention about SVal::getType() in its doxygen comment is that most of the time you don't need it. Similarly to how checking whether an SVal is a Loc or a NonLoc results in incorrect code 95% of the time (because such code is unable to discriminate between arbitrary lvalues and pointer rvalues which is typically crucial in order to get everything right but often forgotten / misunderstood), relying on the type of the SVal rather than its storage raises similar problems.

The use case in RegionStore that I mentioned in the beginning is the rare exception to this rule of thumb because there's no AST expressions to tell us the type of the storage and on top of that type punning makes the storage basically entirely untyped. There's really no other way to obtain such information. Bytes in memory really don't have a type; it's very much a matter of convenient abstraction for us to associate type with them.

I'm in favor of this patch. It will help simplify SValBuilder::evalCast, which takes an optional parameter OriginalTy and acts differently based on whether it has been passed or has not.

Since sizeof(SVal) became bigger (x1.5). I'm wondering of how much this reflects on memory consumption.

Another thing is that we can garantee returning QualType. I mean, we can replace Optional with QualType itself. QualType has a default ctor and isNull predicate, which is true when defaultly constructed.

clang/lib/StaticAnalyzer/Core/SVals.cpp
154

I'm not sure this is a correct type. I would expect here something like: class LabelType : public Type.

clang/unittests/StaticAnalyzer/SValTest.cpp
182–183

Add a cast case.

191–193

I'm in favor of this patch. It will help simplify SValBuilder::evalCast, which takes an optional parameter OriginalTy and acts differently based on whether it has been passed or has not.

@NoQ mentioned it before that it is always better to use direct types from AST. But the only participant who doesn't have AST is the Store, and when it calls evalCast it doesn't have a way to get that type. So, in this case this function van be pretty useful.

Since sizeof(SVal) became bigger (x1.5). I'm wondering of how much this reflects on memory consumption.

I'm not sure what you mean here. If it is about this particular patch, it simply adds a non-virtual method to SVal, it shouldn't affect sizeof(SVal) at all.

Another thing is that we can garantee returning QualType. I mean, we can replace Optional with QualType itself. QualType has a default ctor and isNull predicate, which is true when defaultly constructed.

I wanted to be more explicit here and convey to the user that the lack of type is normal. I can change that, it won't be a problem at all. @NoQ, @Szelethus what do you think?

clang/lib/StaticAnalyzer/Core/SVals.cpp
154

I don't think that I fully understood what you suggest here. Do you suggest to add a new type to Type.h?

clang/unittests/StaticAnalyzer/SValTest.cpp
182–183

Sure!

Another thing is that we can garantee returning QualType. I mean, we can replace Optional with QualType itself. QualType has a default ctor and isNull predicate, which is true when defaultly constructed.

I wanted to be more explicit here and convey to the user that the lack of type is normal. I can change that, it won't be a problem at all. @NoQ, @Szelethus what do you think?

// May return nullptr.

And its all good!

Another thing to mention about SVal::getType() in its doxygen comment is that most of the time you don't need it. Similarly to how checking whether an SVal is a Loc or a NonLoc results in incorrect code 95% of the time (because such code is unable to discriminate between arbitrary lvalues and pointer rvalues which is typically crucial in order to get everything right but often forgotten / misunderstood), relying on the type of the SVal rather than its storage raises similar problems.

Then I especially can't imagine a more important doxygen comment then this! We could even consider adding your point about Locs and NonLocs to SVal::getAs. The target audience are new to clang, static analysis, and C++ itself. I recall you had to beat some of this stuff into me for weeks when I implemented pointer chasing in UninitializedObjectChecker :)

I'm not sure what you mean here. If it is about this particular patch, it simply adds a non-virtual method to SVal, it shouldn't affect sizeof(SVal) at all.

My fault. For some reason I thought you added a QualType as a member to SVal declaration.

I wanted to be more explicit here and convey to the user that the lack of type is normal.

We need to think about what real usage would look and feel better.

My another proposition is to think about how we can rework ConcreteInt to Int which can keep ranges. Previosly we couldn't keep range sets inside Sval as they were ImmutableSets, but now they are just pointers to vectors. So we can pass the range set inside SVal. IMO now RangeConstraintManager is smart enough to do standart arithmetics with ranges as we do for concrete ints. These are just my recent thoughts out loud.

clang/lib/StaticAnalyzer/Core/SVals.cpp
154

Yes. As a user I expect to see some special type for labels, not a void*. But for the absence of such type let it be as is.

NoQ added inline comments.Jun 24 2021, 9:02 PM
clang/lib/StaticAnalyzer/Core/SVals.cpp
154

void * is the correct type for label values, as defined in the documentation of the respective GNU extension to C (which Clang mimics/supports):

https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

You can get the address of a label defined in the current function (or a containing function) with the unary operator '&&'. The value has type void *.

ASDenysPetrov added inline comments.Jun 25 2021, 3:50 AM
clang/lib/StaticAnalyzer/Core/SVals.cpp
154

OK, I see. As a MSVC user I never met this feature neither in a real code nor in the Standard. I've just checked the feature in Godbolt and MSVC is almost the only one which doesn't support it.
I'm OK with void*, since a user has additional information that this also is a GotoLabel to handle whatever way it want.

vsavchenko marked 9 inline comments as done.

Address comments from review

NoQ accepted this revision.Jun 28 2021, 11:25 AM

Amazing, thanks!!

clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
213

I'm also in favor of a note that says "Loc values are interpreted as pointer rvalues for the purposes of this method" or something like that.

This revision is now accepted and ready to land.Jun 28 2021, 11:25 AM
Szelethus accepted this revision.Jun 29 2021, 2:08 AM

Really appreciate the unit tests!

Add one more note to getType docstring

This revision was landed with ongoing or failed builds.Jun 29 2021, 2:44 AM
This revision was automatically updated to reflect the committed changes.

@vsavchenko One of the added tests is failing on our 32 bit Armv7 Thumb bot: https://lab.llvm.org/buildbot/#/builders/170/builds/61

/home/tcwg-buildslave/worker/clang-thumbv7-full-2stage/llvm/clang/unittests/StaticAnalyzer/SValTest.cpp:169: Failure
Expected equality of these values:
  Context.UnsignedLongTy
    Which is: unsigned long
  A.getType(Context)
    Which is: unsigned int
[  FAILED  ] SValTest.GetLocAsIntType (22 ms)
[----------] 1 test from SValTest (22 ms total)

A 32/64 bit issue?

@vsavchenko One of the added tests is failing on our 32 bit Armv7 Thumb bot: https://lab.llvm.org/buildbot/#/builders/170/builds/61

/home/tcwg-buildslave/worker/clang-thumbv7-full-2stage/llvm/clang/unittests/StaticAnalyzer/SValTest.cpp:169: Failure
Expected equality of these values:
  Context.UnsignedLongTy
    Which is: unsigned long
  A.getType(Context)
    Which is: unsigned int
[  FAILED  ] SValTest.GetLocAsIntType (22 ms)
[----------] 1 test from SValTest (22 ms total)

A 32/64 bit issue?

Hi @DavidSpickett , thanks for looking into this!
This patch was almost instantly followed by https://github.com/llvm/llvm-project/commit/b2842298cebf420ecb3750bf309021a7f37870c1 which fixed the issue. Please, let me know, if you still see it after that commit!

Cool. If you see a few more emails with the same thing ignore them, the armv7 bots are rather slow to catch up.