This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Eval construction of non POD type arrays.
ClosedPublic

Authored by isuckatcs on Jun 16 2022, 7:53 AM.

Details

Summary

This patch is supposed to address the issue that we don't evaluate arrays if the contain non POD types.

The idea is to record the index of the current element being constructed, and keep calling the constructor
in a loop until we construct every element.

Diff Detail

Event Timeline

isuckatcs created this revision.Jun 16 2022, 7:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2022, 7:53 AM
isuckatcs requested review of this revision.Jun 16 2022, 7:53 AM
NoQ added a comment.Jun 17 2022, 5:23 PM

I'll need some more time to digest this but it looks awesome!

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
183–193

I think you can't avoid storing a LocationContext as part of the key (like ConstructedObjectKey does). This is because the code can potentially hit the same expression through recursion, and start processing it again, overwriting the old data while we still need it.

I also wonder if it would make sense to store the index as part of the usual ObjectsUnderConstruction map, simply as index of the ElementRegion that represents the object under construction. I suspect this is a bad idea because it can go wrong in situations like

S s[10];
new (s + 1) S[9];

where we probably don't want to construct an element of an element.

480–485

We really need to preserve this assertion in the general case. It's important as it catches missing state cleanups. Can you pattern-match the situation you're in more precisely?

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
103–109

Could you elaborate why you need this? It sounds like you're dealing with multi-dimensional arrays?

isuckatcs added inline comments.Jun 18 2022, 4:46 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
103–109

Yes, it is required for multi-dimensional arrays. I just haven't added the comments here yet.

isuckatcs updated this revision to Diff 438148.Jun 18 2022, 1:34 PM
  • Resolved merge conflicts
  • LocationContext is also part of the key
  • Testcases in ctor.mm and ctor-array.cpp pass
isuckatcs added inline comments.Jun 18 2022, 1:47 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
738

I fail to remove the element in ExprEngineCallAndReturn.cpp, and this is the only workaround I found is to put this check here, and if we shouldn't repeat the call, start the construction from the first element.

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
376–377

This line has no effect and will be removed. Also this should be inside the IF statement's true branch.

clang/test/Analysis/dtor.cpp
600

No warning here, this patch seems to be breaking malloc checker.

clang/test/Analysis/new-ctor-conservative.cpp
26–31

This testcase is breaking checkConstructorInlining() (new-ctor-conservative.cpp:12), which passes if this one is not present.

NoQ added inline comments.Jun 18 2022, 10:58 PM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
103–109

Oh I see, so when you encounter a multi-dimensional array, your solution is to flatten it and treat it as one big single-dimensional array.

I guess the only question I have is, did you double-check that multi-dimensional array constructors are actually called in this order???

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
376–377

Cleaning up state typically has no noticeable behavior effects but it's traditionally considered to be important for performance reasons. Accessing and modifying immutable sets and maps in the program state takes up most of the analysis time according to various profilers that focus on low-level (per-instruction) profiles. Keeping these maps small usually increases access time dramatically. Hard to tell whether this specific bit will have a large impact but the overall tradition is definitely justified.

Another reason to clean up state is, it's much easier to assert() things when we know what exactly the state should be at any point. It allows us to assert() that we're not overwriting any existing state when we're entering another constructor loop.

clang/test/Analysis/new-ctor-conservative.cpp
26–31

Hmm interesting, the only reason I know for unrelated analyses (i.e., unrelated top level function analyses) to affect each other is inlining budgets. I.e., if we spend too much time inlining a function in one analysis, we may refuse to do that further in another analysis. Inlinable functions here are the constructor and destructor of S; operator new isn't inlinable because the whole point of the test is to evaluate it conservatively. Also path-sensitive analyses are performed from bottom of the file to top of the file when no call graph relations are present, so it kinda makes sense that bottom functions may affect top functions. But overall I don't see why this would happen in this specific scenario, I guess you may have to debug.

isuckatcs updated this revision to Diff 438506.Jun 20 2022, 5:26 PM
  • Evaluation now depends on maxBlockVisitOnPath.
isuckatcs updated this revision to Diff 438717.EditedJun 21 2022, 8:34 AM
  • Supporting heap allocated array construction
  • Implemented some missing parts.
  • Fixed testcases
isuckatcs updated this revision to Diff 438805.Jun 21 2022, 1:04 PM
isuckatcs edited the summary of this revision. (Show Details)

The resulting egraph is now linear, when it needs to be.

The correct parent-child relation was seen when N->getFirstPred() was the parent node.
The egraph was split because the node I wanted to replace was explicitly marked as a
successor of it's predecessor (see ExprEngineCallAndReturn.cpp:322) and this edge hadn't
been removed later.

isuckatcs updated this revision to Diff 438980.Jun 22 2022, 5:00 AM

FIxed the issues which was slowing down the testcases.

I added every expression I encountered in handleConstructionContext() to a map,
but I only removed some of them later, which probably resulted in a huge tree that
was slow to manage. Now that I only store the expression and the index if it should
be repeated, the test times are the same as before.

I will also need some more time to understand what is going on. Added some small comments inline.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
717

I think this last argument might be worth mentioning in the docstring above. E.g. could someone use this to initialize an object in a nested array like NonPod arr[3][3]? If no, we should let the potential users know. If yes, what should the Idx be in that case?

735

Could use getValueOr.

737

Could someone use handleConstructionContext for variadic length arrays in the future? Not supporting VLAs here is fine for now, I'm wondering if we want a fixme for the future.

870–871

What is the role of the index here? Should we rename this function?

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
371

Instead of always adding one and sometimes decrementing back, we could use the inverse condition to sometimes add one.

I think it would be really useful to add something similar to printObjectsUnderConstructionJson so the indices can show up in the exploded graph. This could also help reviewers a lot to look at what is going on with small code examples :)

NoQ added a comment.Jun 23 2022, 1:08 AM

Looks like it's all coming together!

The variable length array case is indeed a beast of its own, we shouldn't aim to handle it in this patch but we also shouldn't crash or do something really broken. So I suspect that quite a few of those dyn_casts could use some critical thinking applied: in what ways can they fail?

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
734

Before I forget, this function really should boil down to computeObjectUnderConstruction() and updateObjectsUnderConstruction(). Sometimes we call these two separately and it should be totally equivalent to calling this function, this is part of this function's contract. The new code has to be placed in one of those, because everything should still work when they're called separately.

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
1097

Ok, can we have a constant length array of variable length arrays???

isuckatcs updated this revision to Diff 439362.Jun 23 2022, 5:51 AM

Addressed some comments.

isuckatcs marked 9 inline comments as done.Jun 23 2022, 6:26 AM
isuckatcs added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
870–871

The current solution is that we loop over the same CXXConstructExpr as many times as many elements the array contains. This method is responsible for returning the memory region in which we want to construct the object. The Idx property here corresponds to the index of the memory region we need.

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
103–109

The storage allocated for an array is a continous memory region regardless of it's dimensions. In case of a multi-dimensional array, the elements are laid out in a row major order. As a result T arr[4] = {1,2,3,4} and T arr[2][2] = {{1,2},{3,4}} have the same memory layout: | 1 | 2 | 3 | 4 |, and so the constructors are also called in the same order.

Added more testcases

I also found that in case of heap allocated (even multi-dimensional) arrays (POD/non-POD), we can't read an Undefined value (but we can read defined values).

int *arr = new int[2];
int x = arr[1]; <-- no warning

int (*arr)[2] = new int[2][2];
int x = arr[1][1]; <-- no warning
isuckatcs marked an inline comment as done.Jun 23 2022, 11:35 AM
isuckatcs added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
1097

It seems we can't.

int arr[2][n] is modelled as:

VariableArrayType 'int[2][n]' 
|-VariableArrayType 'int[n]'

Though, the other way around int arr[n][2] is modelled as:

VariableArrayType 'int[n][2]'
|-ConstantArrayType 'int[2]'

If the variable is a constexpr e.g.:

constexpr int n = 2;
int arr[2][n];

We modell it as:

ConstantArrayType 'int[2][2]' 
`-ConstantArrayType 'int[2]'
NoQ added a comment.Jun 24 2022, 10:08 PM

Ok at a glance this looks amazing. Are there any known issues with this patch on your side yet?

I think you should write a test for checker callback order, to see that pre-call/post-call/eval-call are all invoked in the right order for every constructor. We have test/Analysis/cxxctr-evalcall-analysis-order.cpp, you can build on it.

I'll also look into evaluating this patch on some real-world code.

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
450

Seems appropriate because the index isn't necessarily freshly added.

Could also assert that if the index is 0 then it's freshly added hmmm.

486

I think this could be made even stricter. The stricter the better (https://llvm.org/docs/CodingStandards.html#assert-liberally).

Maybe verify that there's an index tracked in the state?

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
1097

Nice!

1123–1124

Maybe have the function accept CXXConstructExpr from the start? It's clearly a misuse of this function to call it with any other type, so compile-time check would be nice.

isuckatcs added inline comments.Jun 25 2022, 1:46 PM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
486

Maybe verify that there's an index tracked in the state?

It is possible, though that requires a lot of pattern matching.

For example the function can look like this:

ProgramStateRef
ExprEngine::addObjectUnderConstruction(ProgramStateRef State,
                                       const ConstructionContextItem &Item,
                                       const LocationContext *LC, SVal V) {
  ConstructedObjectKey Key(Item, LC->getStackFrame());
  // FIXME: Currently the state might already contain the marker due to
  // incorrect handling of temporaries bound to default parameters.
  // The state will already contain the marker if we construct elements
  // in an array, as we visit the same statement multiple times before
  // the array declaration. The marker is removed when we visit the
  // DeclStmt.

  const Expr *E = nullptr;

  // 1.) try to extract the initializer from the DeclStmt
  if(auto DS = dyn_cast_or_null<DeclStmt>(Item.getStmtOrNull()))
  {
    if(auto VD = dyn_cast_or_null<VarDecl>(DS->getSingleDecl()))
      E = dyn_cast<CXXConstructExpr>(VD->getInit());
  }

  // 2.) if that fails, try to extract the initializer from CXXCtorInitializer
  if(!E && Item.getKind() == ConstructionContextItem::InitializerKind)
  {
    auto CtorInit = Item.getCXXCtorInitializer(); //!!! CRASH if there is no initializer
    E = CtorInit->getInit();
  }

  assert(!State->get<ObjectsUnderConstruction>(Key) ||
         Key.getItem().getKind() ==
             ConstructionContextItem::TemporaryDestructorKind ||
         State->contains<IndexOfElementToConstruct>({E,LC}));
  return State->set<ObjectsUnderConstruction>(Key, V);
}

Ok at a glance this looks amazing. Are there any known issues with this patch on your side yet?

I think you should write a test for checker callback order, to see that pre-call/post-call/eval-call are all invoked in the right order for every constructor. We have test/Analysis/cxxctr-evalcall-analysis-order.cpp, you can build on it.

I don't know about any issues unique to this patch.

The problem with checking whether each callback is invoked for every constructor, is that the call evaluation itself happens in ExprEngine::defaultEvalCall(), which is not invoked if a checker evaluates the call (see CheckerManager::runCheckersForEvalCall()).
If ExprEngine::defaultEvalCall() is not invoked, we don't enter the call, and the looping won't happen either (the looping happens in ExprEngine::processCallExit()). Recursive functions also face the same issue.

One workaround is to create a dummy checker that logs when it is entered, but never "evaluates" the call, or to modify the existing checker. At the moment I can check for PreCall and PostCall evaluation order.

isuckatcs marked 3 inline comments as done.

Addressed comments.

  • Added a function similar to printObjectsUnderConstructionJson so the indices can show up in the exploded graph.
  • Wrote a test for checker callback order.

The resulting exploded node after the indices are also printed:

isuckatcs updated this revision to Diff 440157.Jun 27 2022, 3:56 AM
isuckatcs marked 2 inline comments as done.

addressed comments

isuckatcs marked an inline comment as done.Jun 27 2022, 4:09 AM
isuckatcs added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
737

Technically this function is called every time a constructor is invoked regardless of it being an array, or not. As I see it, this function is only responsible for creating a region, in which the constructed object will be stored, and binding every information to the state that is necessary to know about for evaluating the constructor. You can use this function for anything being constructed, even an element of a VLA.

Though to handle VLAs in the future I don't think we have to touch this function. I would simply compute the size of the VLA in shouldInlineArrayConstruction, and inline it if I can. If inlining is not possible, conservative evaluation is used, as it used to be/as it happens now.

Overall, this is starting to take a really great shape :) I'm hopeful we can land this soon.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
870–871

I see. So does that mean we need to update the name and the documentation? Or does it still return the first element of the array and only use Idx to determine which array in case of multi-dimensional arrays?

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
188

I think Expr is too vague, a potential user might have a hard time to decide which Expr should this data correspond to. If it is always a CXXConstructExpr, I think we should make the type express that. I think it would be nice to have a small comment expressing in which cases will a CXXConstructExpr have an entry in a map. Is it only when a CXXConstructExpr is a descendant of an ArrayInitLoopExpr?

488

Will this getSingleDecl always be sufficient? I think this might return a nullptr when we have multiple declarations with the same declarator:

NonPod a[2][2], b[2][2];

I think it is fine to not support this scenario in the first iteration, but let's leave a FIXME if that is something we do not support at the moment.

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
230

ShouldLoopCall might not be expressive enough. How about something like ShouldInlineArrayCtors?

1082

We actually already have something like this: ASTContext::getConstantArrayElementCount

isuckatcs added inline comments.Jun 27 2022, 1:59 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
870–871

It returns the Idxth element, so we have to update the documentation and the name too.

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
188

A CXXConstructExpr will be a part of this map if it is array type, regardless of it's parent. I think this Expr is always a CXXConstructExpr, so I will change the parameter type.

488

I tested your example. Even though in the AST the DeclStmt has 2 children, inside this function, the ConstructionContextItem contains a DeclStmt which only has 1 child. So your example works fine.

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
230

That technically would be incorrect. I created a method with the same name, which is called, when the analyzer decides whether to inline the call, or fall back to conservative evaluation. At this point the ctor has already been inlined.

This variable is responsible for deciding whether the analyzer increments the index when it exits the call, which results in moving to the next statement, or it doesn't increment the index, so the same statement is evaluated again instead. It's true, that at the moment only array construction desires this behaviour, but in the future this might change. As a result I wouldn't rename the variable.

isuckatcs added inline comments.Jun 27 2022, 2:02 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
870–871

By the way, the definition of this function is also affected by this patch, you can find it in ExprEngineCXX.cpp:97.

xazax.hun added inline comments.Jun 27 2022, 2:13 PM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
488

If it works, that is great news! Let's add some tests to make sure we detect it when the representation changes.

NoQ added a comment.Jun 27 2022, 4:18 PM

Ok I don't have any further immediate concerns about this patch. I strongly recommend some testing on real-world code, there may be a lot of cornercases we missed. But other than that, I think it's good to go, great job!!

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
486

Oh right, that's because you're tracking it by construct-expression and I never found a compelling reason to include the construct-expression in the context. This also means that there's no easy way to match objects and indices in the exploded graph dump.

488

Yes, that's because we actively separate our DeclStmts while building the CFG. So they're synthetic.

clang/test/Analysis/dtor.cpp
596

At this point we might as well do an // expected-warning@-19 or something like that.

isuckatcs updated this revision to Diff 440515.Jun 28 2022, 1:06 AM
isuckatcs marked 10 inline comments as done.

Addressed comments

  • Renamed makeZeroElementRegion() to makeElementRegion() and updated the description.
  • Removed computeConstantArraySize() and using ASTContext::getConstantArrayElementCount() instead.
  • The new trait store CXXConstructExpr instead of Expr. I also added a description to it.
  • Added a testcase when we have multiple declarations with the same declarator.
isuckatcs added inline comments.Jun 28 2022, 1:08 AM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
486

I needed to find a representation that can be accessed from both processCallExit() and handleConstructor() and the expression is what both of them can see.

clang/test/Analysis/dtor.cpp
596

I wouldn't change this one. I think it is easier to find the appropriate line if the line number is explicitly there.

Szelethus retitled this revision from [Static Analyzer] Eval construction of non POD type arrays. to [analyzer] Eval construction of non POD type arrays..Jun 30 2022, 6:15 AM

So sorry, I know I took my sweet time -- the patch looks great, and currently I'm running some analysis with it. As soon as I have the results on my hand I'd be happy to share and accept. I added a couple nots here and there, none of them are particularly interesting, feel free to attend to them when we are sure that no meaningful changes are needed to be made.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
713–717

I suppose "parameter" instead of "property" would be better?

720

Please terminate sentences.

810
814
902
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
464–465

Ohhh nice! Gotta add that to my arsenal.

478–483

Maybe we should move this to the assert like before?

486

Ugh, is there a way to add an assert message as a poor man's documentation? This might only be obvious in the context of this patch.

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
230

Maybe what you just wrote here is missing from the code, because looking at only that, I find this variable a bit cryptic as well.

clang/test/Analysis/ctor-array.cpp
2

I prefer multi-line RUN: commands.

clang/test/Analysis/operator-calls.cpp
105

I suppose this function name no longer stands?

So, it seems like the original iteration variable warning is still not here (I traced this back to D1425, a fix for https://bugs.llvm.org/show_bug.cgi?id=16745), but what is assigned a garbage value is not really stated here -- can you confirm for sure that its not the iteration variable? Can we test it?

115

Same here.

isuckatcs updated this revision to Diff 443446.Jul 9 2022, 10:01 AM
isuckatcs marked 8 inline comments as done.
  • Addressed some comments
  • Added more test cases
  • Handling list initialization on the heap (e.g.: Non-POD *arr = new Non-POD[2]{{1, 2}, {5, 6}};)
  • Added a FIXME, as we're unable to handle member list initialization (e.g.: StructCtor() : arr{{1, 2}, {3, 4}} {})
clang/test/Analysis/ctor-array.cpp
2

I think it's easier to copy the commands if needed, when they are in a single line.
Also it's not unique to this file, we do it in several other files, like ctor.mm.

Some early results: https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=qtbase_v6.2.0_pod_array_bindings_baseline&run=protobuf_v3.13.0_pod_array_bindings_baseline&items-per-page=100&sort-by=runDate&sort-desc=true&newcheck=qtbase_v6.2.0_pod_array_bindings&newcheck=protobuf_v3.13.0_pod_array_bindings&is-unique=on&diff-type=New

These are the new reports as a result of this patch. Something stupid prevented me from getting data on LLVM+Clang, I'm trying that again now. Analyzed the following projects: memcached,tmux,curl,twin,vim,openssl,sqlite,ffmpeg,postgres,tinyxml2,libwebm,xerces,bitcoin,protobuf,qtbase,codechecker,contour,llvm-project

isuckatcs added a comment.EditedJul 12 2022, 8:47 AM

Some early results: https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=qtbase_v6.2.0_pod_array_bindings_baseline&run=protobuf_v3.13.0_pod_array_bindings_baseline&items-per-page=100&sort-by=runDate&sort-desc=true&newcheck=qtbase_v6.2.0_pod_array_bindings&newcheck=protobuf_v3.13.0_pod_array_bindings&is-unique=on&diff-type=New

These are the new reports as a result of this patch. Something stupid prevented me from getting data on LLVM+Clang, I'm trying that again now. Analyzed the following projects: memcached,tmux,curl,twin,vim,openssl,sqlite,ffmpeg,postgres,tinyxml2,libwebm,xerces,bitcoin,protobuf,qtbase,codechecker,contour,llvm-project

Looking at the results, I would say the 3rd (qeglfscursor.cpp) and the 4th (qicohandler.cpp) are definitely correct.

In the 2nd (repeated_field.h) the array they are trying to access is void* elements[1];. If we assume that current_size_ is not 0, then this report is also correct.

The 1st (blake2-impl.h) one is intersteing though. It appears that sizeof(blake2b_param[1]) == 64 and sizeof(S->h[i]) == 8. So basically they create a pointer that points to the front of the
object, and increment the pointer by 8 bytes in each iteration. Then load64() reads and returns 8 bytes past that pointer. On the last iteration the said function returns the last 8 bytes of the object.
This object on which they "iterate" over has been initialized, so I think this result is a false positive. blake2b_param is a packed struct, this might has to do something with it. The value flagged as
garbage is the first parameter of uint8_t reserved[14], which is initialized by memset(P->reserved, 0, sizeof(P->reserved)).

We fail to evaluate the memset calls. The result of each call is conj_$1{uint8_t, LC1, S15740, #1}. And later we read Undefined from it.

I replaced blake2b_param P[1]; with blake2b_param tst; blake2b_param *P = &tst; and analyzed the snippet with clang 14.0 release. I get the same false positive, so this has nothing to do with my patch.

Szelethus accepted this revision.Jul 13 2022, 4:11 AM

I suppose enough has been said! Well done! Please attend to the rest of the inlines and commit at your convenience.

This revision is now accepted and ready to land.Jul 13 2022, 4:11 AM
isuckatcs updated this revision to Diff 444800.Jul 14 2022, 2:28 PM
isuckatcs marked 5 inline comments as done.

Addressed remaining comments.

isuckatcs marked an inline comment as done.Jul 14 2022, 2:29 PM
isuckatcs added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
230

I renamed the variable and added a comment.

clang/test/Analysis/operator-calls.cpp
105

Since core.uninitialized.Assign doesn't report which field is being assigned an undefined value, in this automatically generated copy assignment, there is no way we can tell it for sure. The best I can do is I add -analyzer-output=text to the run commands and check the output result.

This revision was landed with ongoing or failed builds.Jul 14 2022, 2:32 PM
This revision was automatically updated to reflect the committed changes.

This patch broke the exploded-graph-rewriter.
It prints this stack-trace, when I'm using the b032e3ff6121a969b2e90ad7bf493c2d5d7ac3a2. The issue is still present at main.

Traceback (most recent call last):
  File "/home/user/git/llvm-project/clang/utils/analyzer/exploded-graph-rewriter.py", line 1054, in <module>
    main()
  File "/home/user/git/llvm-project/clang/utils/analyzer/exploded-graph-rewriter.py", line 1033, in main
    graph.add_raw_line(raw_line)
  File "/home/user/git/llvm-project/clang/utils/analyzer/exploded-graph-rewriter.py", line 395, in add_raw_line
    self.nodes[node_id].construct(node_id, json_node)
  File "/home/user/git/llvm-project/clang/utils/analyzer/exploded-graph-rewriter.py", line 322, in construct
    self.state = ProgramState(json_node['state_id'],
  File "/home/user/git/llvm-project/clang/utils/analyzer/exploded-graph-rewriter.py", line 303, in __init__
    if json_ps['index_of_element'] is not None else None
KeyError: 'index_of_element'

Unfortunately, I cannot share the egraph for reproduction, but I believe the trace should be enough for fixing this.

@steakhal

Can you send me a snippet please, which reproduces this issue? For me the egraph rewriter works fine.

@steakhal

Can you send me a snippet please, which reproduces this issue? For me the egraph rewriter works fine.

I had the time for reducing it. Luckily it wasn't that bad.

digraph "Exploded Graph" {
	label="Exploded Graph";

	Node0x7fda6c098a40 [shape=record,label="{\{ \"state_id\": 0,\l&nbsp;&nbsp;\"program_points\": [\l&nbsp;&nbsp;&nbsp;&nbsp;\{ \"kind\": \"Edge\", \"src_id\": 20, \"dst_id\": 19, \"terminator\": null, \"term_kind\": null, \"tag\": null, \"node_id\": 1, \"is_sink\": 0, \"has_report\": 0 \},\l&nbsp;&nbsp;&nbsp;&nbsp;\{ \"kind\": \"BlockEntrance\", \"block_id\": 19, \"tag\": null, \"node_id\": 2, \"is_sink\": 0, \"has_report\": 0 \},\l&nbsp;&nbsp;&nbsp;&nbsp;\{ \"kind\": \"Statement\", \"stmt_kind\": \"DeclRefExpr\", \"stmt_id\": 7381222, \"pointer\": \"0x7fda78fd1e50\", \"pretty\": \"storage\", \"location\": \{ \"line\": 12, \"column\": 28, \"file\": \"/redacted/file.cc\" \}, \"stmt_point_kind\": \"PreStmtPurgeDeadSymbols\", \"tag\": \"ExprEngine : Clean Node\", \"node_id\": 3, \"is_sink\": 0, \"has_report\": 0 \}\l&nbsp;&nbsp;],\l&nbsp;&nbsp;\"program_state\": \{\l&nbsp;&nbsp;&nbsp;&nbsp;\"store\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"environment\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"constraints\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"equivalence_classes\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"disequality_info\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"dynamic_types\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"dynamic_casts\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"constructing_objects\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"checker_messages\": null\l&nbsp;&nbsp;\}\l\}\l}"];
	Node0x7fda6c098a40 -> Node0x7fda6c098cb0;
	Node0x7fda6c098cb0 [shape=record,label="{\{ \"state_id\": 78,\l&nbsp;&nbsp;\"program_points\": [\l&nbsp;&nbsp;&nbsp;&nbsp;\{ \"kind\": \"Statement\", \"stmt_kind\": \"DeclRefExpr\", \"stmt_id\": 7381222, \"pointer\": \"0x7fda78fd1e50\", \"pretty\": \"storage\", \"location\": \{ \"line\": 12, \"column\": 28, \"file\": \"/redacted/file.cc\" \}, \"stmt_point_kind\": \"PostLValue\", \"tag\": null, \"node_id\": 4, \"is_sink\": 0, \"has_report\": 0 \},\l&nbsp;&nbsp;&nbsp;&nbsp;\{ \"kind\": \"Statement\", \"stmt_kind\": \"DeclRefExpr\", \"stmt_id\": 7381222, \"pointer\": \"0x7fda78fd1e50\", \"pretty\": \"storage\", \"location\": \{ \"line\": 12, \"column\": 28, \"file\": \"/redacted/file.cc\" \}, \"stmt_point_kind\": \"PostStmt\", \"tag\": null, \"node_id\": 5, \"is_sink\": 0, \"has_report\": 0 \}\l&nbsp;&nbsp;],\l&nbsp;&nbsp;\"program_state\": \{\l&nbsp;&nbsp;&nbsp;&nbsp;\"store\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"environment\": \{ \"pointer\": \"0x7fda6c02e890\", \"items\": [\l&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\{ \"lctx_id\": 1, \"location_context\": \"#0 Call\", \"calling\": \"redacted::Fun\", \"location\": null, \"items\": [\l&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\{ \"stmt_id\": 7381222, \"pretty\": \"storage\", \"value\": \"&storage\" \}\l&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]\}\l&nbsp;&nbsp;&nbsp;&nbsp;]\},\l&nbsp;&nbsp;&nbsp;&nbsp;\"constraints\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"equivalence_classes\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"disequality_info\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"dynamic_types\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"dynamic_casts\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"constructing_objects\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"checker_messages\": null\l&nbsp;&nbsp;\}\l\}\l}"];
	Node0x7fda6c098cb0 -> Node0x7fda6c099160;
	Node0x7fda6c099160 [shape=record,label="{\{ \"state_id\": 228,\l&nbsp;&nbsp;\"program_points\": [\l&nbsp;&nbsp;&nbsp;&nbsp;\{ \"kind\": \"Statement\", \"stmt_kind\": \"MemberExpr\", \"stmt_id\": 7381229, \"pointer\": \"0x7fda78fd1e88\", \"pretty\": \"storage-\>Comp\", \"location\": \{ \"line\": 12, \"column\": 28, \"file\": \"/redacted/file.cc\" \}, \"stmt_point_kind\": \"PostStmt\", \"tag\": null, \"node_id\": 8, \"is_sink\": 0, \"has_report\": 0 \}\l&nbsp;&nbsp;],\l&nbsp;&nbsp;\"program_state\": \{\l&nbsp;&nbsp;&nbsp;&nbsp;\"store\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"environment\": \{ \"pointer\": \"0x7fda6c02e890\", \"items\": [\l&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\{ \"lctx_id\": 1, \"location_context\": \"#0 Call\", \"calling\": \"redacted::Fun\", \"location\": null, \"items\": [\l&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\{ \"stmt_id\": 7381222, \"pretty\": \"storage\", \"value\": \"&storage\" \},\l&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\{ \"stmt_id\": 7381226, \"pretty\": \"storage\", \"value\": \"&SymRegion\{reg_$0\<class redacted::mfun * storage\>\}\" \},\l&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\{ \"stmt_id\": 7381229, \"pretty\": \"storage-\>Comp\", \"value\": \"&code\{Comp\}\" \}\l&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]\}\l&nbsp;&nbsp;&nbsp;&nbsp;]\},\l&nbsp;&nbsp;&nbsp;&nbsp;\"constraints\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"equivalence_classes\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"disequality_info\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"dynamic_types\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"dynamic_casts\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"constructing_objects\": null,\l&nbsp;&nbsp;&nbsp;&nbsp;\"checker_messages\": null\l&nbsp;&nbsp;\}\l\}\l}"];
}

I had the time for reducing it. Luckily it wasn't that bad.

This egraph doesn't contain an entry with key "index_of_element", which is strange.

I still feel like I need a code snippet, which generates such egraph.

I had the time for reducing it. Luckily it wasn't that bad.

This egraph doesn't contain an entry with key "index_of_element", which is strange.

I still feel like I need a code snippet, which generates such egraph.

Oh, now I see why. The problem is probably because the version of clang I'm using is not the same as the rewrite script. For the latter I believe I'm using the latest one, hence it hes this change. Unlike with the clang I'm using, which doesn't emit this metadata.
I'm wondering if we should have a window for backward compatibility; There are a couple of places already where we check for the presence of some metadata and display it only if it exists.
Do you think it would be useful for this change as well?
Anyway, my bad for not recognizing the situation sooner. So, I'll just use the matching version of the rewriter. Problem solved for me.

I'm wondering if we should have a window for backward compatibility; There are a couple of places already where we check for the presence of some metadata and display it only if it exists.
Do you think it would be useful for this change as well?

It has been fixed in D131187 as a part of another change.

xazax.hun added inline comments.Aug 29 2022, 8:58 AM
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
482

Hmm, when we are building our Cfgs, we explode DeclStatements with multiple declarations into separate DeclStatements, I'd expect getSingleDecl to never return nullptr. Maybe we can remove the or_null.