This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Remove the loop from the exploded graph caused by missing information in program points
ClosedPublic

Authored by isuckatcs on Feb 4 2023, 2:57 PM.
Tokens
"Like" token, awarded by BoYanZh.

Details

Summary

This patch is meant to fix #60412.

When we falled back to conservative evaluation, CallEvent::getProgramPoint() was used to
create a new program point at the location of the call.

If an implicit destructor was invoked, the location of the call was set to the declaration of the
destructor, which lead to a loop in some cases, like the one described in the issue above.

This patch changes the way, the new program points are created and ensures that we no longer
have a loop in the egraph when not needed. This behaviour seems to also increase our coverage,
as the egraph no longer stops at the looped section.

Old egraphNew egraph

Diff Detail

Event Timeline

isuckatcs created this revision.Feb 4 2023, 2:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 4 2023, 2:57 PM
isuckatcs requested review of this revision.Feb 4 2023, 2:57 PM
isuckatcs updated this revision to Diff 494862.Feb 4 2023, 4:58 PM

Found a different solution by changing the location of the destructors.

In my oppinion none of these solution are 100% accurate. The previous one gives a local solution on one specific call site,
but doesn't solve the problem globally. This one changes the location of the destructor call, but as you can see it can affect
some checker outputs.

When reviewing please check the history and review the previous solution too (that's only 1 if statement).

I think this patch should also add a test that fails before your changes.

I am not sure if we ever want to make a distinction between program points based on actual source locations. I am not opposed to it as a last resort, but to me it feels a bit fragile. Do we want macro expansion location? Or spelling location? The former can be bad for diagnostic purposes. The latter can be bad when macros are involved (e.g. we can still have the same spelling location for multiple expansions, so we can still end up in a loop.)

The big difference between the automatic destructors is that separate objects are being destroyed. I think it would be great if we could somehow incorporate this knowledge into the program point (e.g., the target region of the dtor).

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
833

I think we want each \param in new lines.

isuckatcs edited the summary of this revision. (Show Details)Feb 7 2023, 4:06 AM
isuckatcs updated this revision to Diff 495467.Feb 7 2023, 4:13 AM

Added two new program points for destructors.

isuckatcs added a comment.EditedFeb 7 2023, 4:25 AM

The big difference between the automatic destructors is that separate objects are being destroyed. I think it would be great if we could somehow incorporate this knowledge into the program point (e.g., the target region of the dtor)

Using the location of the object being destructed instead of the location of the dtor decl was my initial way to incorporate this information to the program point. I just forgot about the different kinds of locations.

xazax.hun accepted this revision.Feb 7 2023, 8:34 AM

Thanks! This is exactly what I had in mind! Let's wait for @NoQ just in case he had something different in mind.

This revision is now accepted and ready to land.Feb 7 2023, 8:34 AM
NoQ added a comment.Feb 7 2023, 4:52 PM

Ooo, this looks amazing.

I think the right extra data to include is the CFGElementRef to the destructor element. This way destructors for different objects still receive different program points, but there's no longer any difference between these program points on different paths. This would also prevent program points from knowing anything about path-sensitive analysis, so they could be useful for building other CFG-based analyses even when path sensitivity is not desired. Also, CFGElementRef can be added to all implicit calls, so you no longer need a special subclass for destructors.

Actually it probably makes perfect sense to include CFGElementRef in most program points, and have getStmt() look up the statement from the CFG instead of storing the statement directly. But that's a much bigger refactoring pass.

clang/include/clang/Analysis/ProgramPoint.h
589–591

This isKind() implementation says that PreDestructorCall is not a sub-class of PreImplicitCall.

NoQ added a comment.Feb 7 2023, 4:59 PM

When reviewing please check the history and review the previous solution too (that's only 1 if statement).

Yeah I think I agree with your assessment, your initial solution probably covers this one test case but it doesn't actually help us discriminate between different destructors. It's really valuable that you found a way to have them have actual different program points.

I think the right extra data to include is the CFGElementRef to the destructor element.

Oooh, so the different implicit dtor calls are actually different elements in the CFG. This is even better; I fully agree with this approach.

isuckatcs added a comment.EditedFeb 8 2023, 1:53 PM

I think the right extra data to include is the CFGElementRef to the destructor element.

To construct a CFGElementRef we need the parent CFG block and the index of the element.

using CFGElementRef = ElementRefImpl<false>;

...
ElementRefImpl(CFGBlockPtr Parent, size_t Index)
        : Parent(Parent), Index(Index) {}
...

In ExpressionEngine both of these are stored in members currStmtIdx and currBldrCtx,
however CallEvent doesn't know about this information. CallEvent also needs this information
because CallEvent::getProgramPoint() creates implicit call related program points, so we need to
modify every call site of CallEvent::CallEvent() to pass this information. One of these call sites is
NoStateChangeFuncVisitor::VisitNode(), where at first glance we'll have a hard time to figure
out the element index.

I think that for this solution to work, we need to remove the program point creation from CallEvent
and every other class that does the same (if there's any other).

isuckatcs added a comment.EditedFeb 8 2023, 2:00 PM

Also one more thing I see here is that while implicit destructors are present in the CFG,
the member destructors are not.

struct Test {
  Test() {}
  ~Test();
};

struct A {
  Test m1, m2;
};

int main() {
  A a, b;

  return 0;
}
[B1 (ENTRY)]
   Succs (1): B0

 [B0 (EXIT)]
   Preds (1): B1

int main()
 [B2 (ENTRY)]
   Succs (1): B1

 [B1]
   1:  (CXXConstructExpr, [B1.2], A)
   2: A a;
   3:  (CXXConstructExpr, [B1.4], A)
   4: A b;
   5: 0
   6: return [B1.5];
   7: [B1.4].~A() (Implicit destructor)
   8: [B1.2].~A() (Implicit destructor)
   Preds (1): B2
   Succs (1): B0

 [B0 (EXIT)]
   Preds (1): B1

We have CFG entries for a and b but there are none for m1 and m2. I'm not sure if
it's a CFG dumping issue in debug.DumpCFG, or something else though. ExprEngine
visits CFGElements, so there must be one for both member destructors I just don't know if their lack of presence in the dump will affect the proposed solution or not.

NoQ added a comment.Feb 8 2023, 8:33 PM

CallEvent also needs this information because CallEvent::getProgramPoint() creates implicit call related program points, so we need to modify every call site of CallEvent::CallEvent() to pass this information.

Oh, yeah, right, they *also* need CFGElements, and they *also* already sometimes carry target memregions instead, as a hack. Yeah this may turn into a massive refactoring pass. I think we can try to land the current approach then.

One of these call sites is NoStateChangeFuncVisitor::VisitNode(), where at first glance we'll have a hard time to figure out the element index.

In this case SCtx is the callee context (that's what getCaller() consumes) which provides getCallSideBlock()/getIndex() to look up the CFG element.

We have CFG entries for a and b but there are none for m1 and m2.

Destructor calls for m1 and m2 aren't in the CFG for main(), they're in the CFG for ~A(). Because ~A() is the default destructor, it's not getting analyzed at top level, so there's no CFG dump for it. You'll see them if you define an explicit destructor for A.

isuckatcs updated this revision to Diff 496510.Feb 10 2023, 9:13 AM
  • Added CFGElementRef to ImplicitCallPoint
  • Added 2 test cases

I only added the element ref as a new field instead of replacing any existing field,
because extracting the Decl out of CFGElementRef requires a lot of boilerplate
code. Similar to what is seen inside ExprEngine::processCFGElement().

I didn't add CFGElementRef to StmtPoint because there are a lot of different
modules that can create them, and each of them needs to be refactored for the
change to properly work.

isuckatcs added inline comments.Feb 10 2023, 9:35 AM
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
3450

One currently known drawback is that in situations like this we need to create a dummy element ref.
Here the call event is only created to extract the declaration from the call expression.

BoYanZh added a subscriber: BoYanZh.
NoQ added a comment.Feb 16 2023, 12:57 PM

Dude, this is beautiful. I have like one nitpick, and I think we can land.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
833–834

I suspect that Target isn't necessary anymore, as it can be reconstructed from the CFG element. IIRC this could untie our hands in a few cases. I might be wrong. So I recommend adding a FIXME.

871–872

Same FIXME here, hopefully.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
164

You don't need to track it separately, it can be derived from the two previous member variables:

{currBldrCtx->getBlock(), currStmtIdx}
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
3450

Given that there's a statement, we could grep the CFG to find it. But I agree that it looks unnecessary.

isuckatcs marked 3 inline comments as done.Feb 16 2023, 4:41 PM
isuckatcs added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
833–834

I can add a FIXME, though if I see it correctly, deducing the region from the CFG element is not trivial. The ExprEngine::Process...Dtor() functions are responsible for finding the region, which region will later be passed to the call event. A lot of these functions have a different logic for creating the region and we need to deal with arrays too, where figuring out the current element relies on ProgramStateTraits. So removing the target will also involve some refactoring, and this call event will also need to know about a program state.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
164

This is actually deliberate. We use the CFGElementRef multiple places over ExpressionEngine and I think it's easier and more readable to just use currElementRef instead of using an initializer list. Also this way it's only set once in ExprEngine::processCFGElement() and is harder to misuse it.

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
3450

I was thinking about adding one more constructor for these kind of situations, which doesn't take an element ref as parameter, thought I think it would isntead result in forgetting about passing the element ref when they are needed.

isuckatcs marked 2 inline comments as done.Feb 16 2023, 4:42 PM
NoQ added inline comments.Feb 21 2023, 6:37 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
833–834

Yeah sounds like the array index will need to be supplied separately. But the rest of the procedure can probably be moved into CallEvent and then called back. The worst case is probably the temporary destructor, as it requires looking up object-under-construction for the respective CXXBindTemporaryExpr. In case of constructors it's easier, as it's already almost entirely in getObjectUnderConstruction(). So as long as the objects-under-construction trait and the element-under-construction trait are disconnected from ExprEngine (in a separate header like Taint.h), we can have CallEvents figure them out on their own.

So definitely doable, I still don't remember what I needed that for though. One way or another, I'm glad we're opening up this opportunity.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
164

You can make a method to avoid spelling out curly braces, like getCurrElementRef() or something like that. I think it's even harder to misuse it if it's set zero times.

isuckatcs updated this revision to Diff 499418.Feb 22 2023, 1:56 AM
isuckatcs marked 4 inline comments as done.

Addressed comments

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
833–834

I actually missed that a ProgramStateRef is already passed to this object in the constructor (on the line below this discussion), so we have access to the state traits. In this case we just need to port the logic from ExprEngine::ProcessImplicitDtor() and somehow separate it from ExprEngine completely.

If I see it correctly, we need to make ExprEngine::prepareStateForArrayDestruction() static and ExprEngine::makeElementRegion() public or accessible from CallEvent some other way. Anyway this will be a huge refactoring patch, so I added the FIXME.

isuckatcs updated this revision to Diff 499426.Feb 22 2023, 2:08 AM

Fixed artifacts cause by git clang-format

isuckatcs retitled this revision from [analyzer] Remove the loop from the exploded graph caused by conservative evaluation to [analyzer] Remove the loop from the exploded graph caused by missing information in program points.Feb 22 2023, 5:22 AM
NoQ accepted this revision.Feb 22 2023, 5:23 PM

Looks amazing now! Thank you for getting to the bottom of this and building this elaborate fix for the root cause!

isuckatcs updated this revision to Diff 499992.Feb 23 2023, 3:00 PM

Applied git clang-format

Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2023, 5:02 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript