This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Prefer wrapping SymbolicRegions by ElementRegions
ClosedPublic

Authored by steakhal on Aug 18 2022, 8:42 AM.

Details

Summary

It turns out that in certain cases SymbolRegions are wrapped by
ElementRegions; in others, it's not. This discrepancy can cause the
analyzer not to recognize if the two regions are actually referring to
the same entity, which then can lead to unreachable paths discovered.

Consider this example:

struct Node { int* ptr; };
void with_structs(Node* n1) {
  Node c = *n1; // copy
  Node* n2 = &c;
  clang_analyzer_dump(*n1); // lazy...
  clang_analyzer_dump(*n2); // lazy...
  clang_analyzer_dump(n1->ptr); // rval(n1->ptr): reg_$2<int * SymRegion{reg_$0<struct Node * n1>}.ptr>
  clang_analyzer_dump(n2->ptr); // rval(n2->ptr): reg_$1<int * Element{SymRegion{reg_$0<struct Node * n1>},0 S64b,struct Node}.ptr>
  clang_analyzer_eval(n1->ptr != n2->ptr); // UNKNOWN, bad!
  (void)(*n1);
  (void)(*n2);
}

The copy of n1 will insert a new binding to the store; but for doing
that it actually must create a TypedValueRegion which it could pass to
the LazyCompoundVal. Since the memregion in question is a
SymbolicRegion - which is untyped, it needs to first wrap it into an
ElementRegion basically implementing this untyped -> typed conversion
for the sake of passing it to the LazyCompoundVal.
So, this is why we have Element{SymRegion{.}, 0,struct Node} for n1.

The problem appears if the analyzer evaluates a read from the expression
n1->ptr. The same logic won't apply for SymbolRegionValues, since
they accept raw SubRegions, hence the SymbolicRegion won't be
wrapped into an ElementRegion in that case.

Later when we arrive at the equality comparison, we cannot prove that
they are equal.

For more details check the corresponding thread on discourse:
https://discourse.llvm.org/t/are-symbolicregions-really-untyped/64406


In this patch, I'm eagerly wrapping each SymbolicRegion by an
ElementRegion; basically canonicalizing to this form.
It seems reasonable to do so since any object can be thought of as a single
array of that object; so this should not make much of a difference.

The tests also underpin this assumption, as only a few were broken by
this change; and actually fixed a FIXME along the way.

About the second example, which does the same copy operation - but on
the heap - it will be fixed by the next patch.

Diff Detail

Event Timeline

steakhal created this revision.Aug 18 2022, 8:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2022, 8:42 AM
steakhal requested review of this revision.Aug 18 2022, 8:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2022, 8:42 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
steakhal edited the summary of this revision. (Show Details)Aug 18 2022, 8:49 AM
NoQ added a comment.Aug 18 2022, 3:51 PM

Like I vaguely mentioned in the thread, I'm really curious whether it's possible to canonicalize the *absence* of element regions instead, as it allows modeling pointer casts as no-op and avoids problems like D38797 where results of even very basic operations are impossible to represent.

Like I vaguely mentioned in the thread, I'm really curious whether it's possible to canonicalize the *absence* of element regions instead, as it allows modeling pointer casts as no-op and avoids problems like D38797 where results of even very basic operations are impossible to represent.

Yes, that would make more sense if we were getting rid of the current memory model. However, I'm not feeling confident working on that before finalizing a detailed plan.
For that, we would need to discuss the potential pros and cons thoroughly and then approach how it should look & we could incrementally land it. So, it definitely does not feels like a cheap thing to do.

For now, I believe this change makes the modeling of trivial copies a bit better from the user's perspective. In the future, we could still get rid of the memory model and implement a new one if we want.

steakhal added inline comments.Aug 19 2022, 1:30 AM
clang/test/Analysis/trivial-copy-struct.cpp
27

and in the rest of the cases

I agree with @steakhal. Getting rid of element regions might be a lot of work and might have a lot of fallout that we need to deal with. I think we should not block a fix for something that we might not be able to land for a long time. Fixing this here, and independently start working on the long-term solutions sounds reasonable to me. At least once we switch over, we will have test coverage for all of the cases that used to be problematic :)

steakhal updated this revision to Diff 454009.Aug 19 2022, 8:13 AM
  • Use :digit: regex matching in the tests.
steakhal added inline comments.Aug 19 2022, 8:16 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
791–793

I'm not sure how much explanation I need to do here, but first, we should settle on the name of this member function if we decide to have it.

796–800

I guess, these two questions together form the answer. The ctor only asserts that it is either a any pointer, reference *or* a block pointer. However, I'd like to get this confirmed. @NoQ

steakhal updated this revision to Diff 454857.Aug 23 2022, 8:43 AM
  • Simplify getApproximatedType() to return sym->getType()->getPointeeType();
  • Add doc comments to getApproximatedType()
  • Removed the "Copied from RegionStoreManager::bind()" FIXME from the ExprEngine::VisitMemberExpr()
steakhal updated this revision to Diff 454859.Aug 23 2022, 8:44 AM

upload the same with context

martong accepted this revision.Aug 31 2022, 9:34 AM

I am okay with this change, it does give a proper canonical form, which is good. On the other hand, I agree that (on the long term) base regions and offsets would be a better memory model than what we have now with field and element regions.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
799

I think we should express that this returns the

  • type of the pointee
  • the type that is the "static" type, id est, the type of the declaration; void, char and base in your example above.

In this sense, what about getPointeeStaticType?

clang/test/Analysis/ctor.mm
221

Nice!

This revision is now accepted and ready to land.Aug 31 2022, 9:34 AM
steakhal added inline comments.Sep 1 2022, 12:45 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
799

I think it's important to discourage people from using this; hence the approximated calls attention that this might always be accurate.
I'm okay to add the pointee somewhere in the name.

I don't have a strong opinion about this.

martong added inline comments.Sep 1 2022, 4:16 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
799

Isn't precise enough to say that this returns the static type? I don't quite get the exact definition of the "approximation" we do here, so I guess others will not get that either.