This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] print() JSONify: Store implementation
ClosedPublic

Authored by Charusso on May 14 2019, 12:41 PM.

Diff Detail

Repository
rC Clang

Event Timeline

Charusso created this revision.May 14 2019, 12:41 PM
NoQ added a comment.May 14 2019, 3:48 PM

Yay great!

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
128 ↗(On Diff #199491)

Let's make the caller responsible for this last comma.

2623 ↗(On Diff #199491)

Let's turn this whole thing into a valid JSON, i.e., say, { "title": "Store", "items": ... }.

NoQ added inline comments.May 14 2019, 3:50 PM
clang/include/clang/Basic/JsonSupport.h
17–21 ↗(On Diff #199491)

Sigh, that's roughly the ~5th implementation of this functionality. I also don't think there is going to be anything JSON-specific here. That said, i don't have an immediate better suggestion on where to put it.

Charusso updated this revision to Diff 199667.May 15 2019, 1:17 PM
Charusso marked 3 inline comments as done.
  • More JSONify.
Charusso planned changes to this revision.May 15 2019, 1:18 PM

PrettyStackTraceLocationContext.h definitely needs the default print() method (may other users too). Even it is a little bit silly but I think we have to keep them and create printJSON(). What do you think?

clang/include/clang/Basic/JsonSupport.h
17–21 ↗(On Diff #199491)

I think the CTU is will be one of the users here and may we too with HTML parsing. This place is all about spacing and new-lines.

Charusso marked an inline comment as done.May 15 2019, 1:25 PM
Charusso added inline comments.
clang/include/clang/Basic/JsonSupport.h
21 ↗(On Diff #199667)

E.g. something like that would be very helpful, but I will remove that by the next patch.

NoQ added a comment.May 15 2019, 4:09 PM

PrettyStackTraceLocationContext.h definitely needs the default print() method (may other users too). Even it is a little bit silly but I think we have to keep them and create printJSON(). What do you think?

Yeah, we'll have to inevitably duplicate this one, i guess. I'm not really sure how important our pretty stack traces are, but i guess let's not break them.

NoQ accepted this revision.May 15 2019, 4:11 PM

Generally this is good to go, i think. I like it!

clang/test/Analysis/expr-inspection.c
15 ↗(On Diff #199667)

<3

Charusso updated this revision to Diff 200096.May 17 2019, 2:21 PM
Charusso marked 2 inline comments as done.
  • Better header handling.
  • Less JSON.
This revision is now accepted and ready to land.May 17 2019, 2:21 PM

Thanks for the accept! Sorry for the extra noise but it came out it is tricky to create valid JSON for each sub-print and we do not need it for now. I wanted to create the most compact and necessary JSON without touching the JsonSupport.h.

clang/test/Analysis/expr-inspection.c
15 ↗(On Diff #199667)

If you have mentioned it, I would bet you do not <3 easter eggs. Reverted being a funny kid.

PS: Outcome of program state printing could be found in D62087 which is very similar to your idea in D61677.

NoQ added inline comments.May 17 2019, 3:59 PM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
2627 ↗(On Diff #200096)

I think mentioning Store twice is redundant. I'll be perfectly happy with either

"program_state": [
  { "title": "Store", "items": [ ... ]},
  { "title": "Environment", "items": [ ... ]},
  ...
]

or

"program_state": [
  "store": [ ... ],
  "environment": [ ... ],
  ...
]
clang/test/Analysis/expr-inspection.c
15 ↗(On Diff #199667)

^.^

Charusso marked an inline comment as done.May 17 2019, 4:08 PM
Charusso added inline comments.
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
2627 ↗(On Diff #200096)

You have to navigate inside the JSON tree so the second part is necessary, but the first part is the title-idea to put some info as a title. E.g. [Store (ID 666, Type: Region)] and do not create extra JSON-field where it does not need to be handled in HTML, nor CTU. It happens mostly with IDs.

Charusso planned changes to this revision.May 21 2019, 1:18 AM

After a while I think we could drop the IDs and the title part.

Here is a plan 2.0:

  • Drop every ID.
  • If any of the Store, Environment, etc. has not got any element make it null, so that drop the items part with the title.

This is the graph which is the list of nodes similar to ExplodedGraph:

{ graph: [
...
{ id: 13, "predecessors": [12], "successors": [14, 15],
  "program_points": [
    ...
  ],
  "program_state": {
    "store": null,
    "environment": [
      { "location_context": ... },
    ], ...
  }
},
{ id: 14, ...
  ...
},
...
]}

There is a concept of an empty array ("store": [],), but I think it is more precise if it is null instead.

Another plan: Because our representation will be pure JSON the Sep parameter will be useless, so I remove it.

NoQ added a comment.May 22 2019, 1:32 PM

After a while I think we could drop the IDs and the title part.

I'm using IDs, don't drop them^^ They are useful when you're viewing ExplodedGraph from a debugger: with IDs you can either seek the current node in the graph, or, more importantly, set conditional breakpoints with IDs as conditions.

Here is a plan 2.0:

There isn't much time left, so i think we should start "converging", i.e. produce something usable and stable, and then later follow up with further updates if necessary. It isn't worth it to perfect the format until we see how it works in practice.

Another plan: Because our representation will be pure JSON the Sep parameter will be useless, so I remove it.

You can't do that unless you remove the .dot dumper entirely, because for .dot files Sep has to be \\l. For now it's not done, so let's delay it.

In D61912#1512671, @NoQ wrote:

After a while I think we could drop the IDs and the title part.

I'm using IDs, don't drop them^^ They are useful when you're viewing ExplodedGraph from a debugger: with IDs you can either seek the current node in the graph, or, more importantly, set conditional breakpoints with IDs as conditions.

I meant the [Store (ID)] title business and the pointers which appears between < > like the ID of the Store.

If we write out some ProgramPoints it looks like that:

DeclRefExpr S2079258 <0x29c35c0> RM
line=2715 col=11
--------
ImplicitCastExpr S2079262 <0x29c35e0> RM
line=2715 col=11

Would you need the S or the < > version of IDs, or both? How would you call the latter?

Here is a plan 2.0:

There isn't much time left, so i think we should start "converging", i.e. produce something usable and stable, and then later follow up with further updates if necessary. It isn't worth it to perfect the format until we see how it works in practice.

I have planned this just today, so I will rewrite everything properly within the next 24h, do not worry about timings yet. I just wanted to see what would you inject or remove from the current ProgramState representation, that is why I have requested a review for each of its descendants. The ProgramPoint business is much simpler. The node pred and succ gathering is useful for creating a very smart JSON graph which is at most 30 min work with full review.

Another plan: Because our representation will be pure JSON the Sep parameter will be useless, so I remove it.

You can't do that unless you remove the .dot dumper entirely, because for .dot files Sep has to be \\l. For now it's not done, so let's delay it.

I will not remove the DOT format, just the separation will be useless as every node will be 100% pure JSON and the spacing will be the format itself.

However, the DOT-dump is broken because the spaces going to be trimmed which is fail. Because the JSON format it is not that big deal, here is a crazy example from the old version:

{ "title": "Environment", "items": [
{ "location_context": "#0 Call", "calling": "llvm::APInt::operator==", "call_line": "1743", "items": [
{ "lctx_id": 375, "stmt_id": 845291, "pretty": "this->BitWidth == RHS.BitWidth", "value": "1 U1b" },
{ "lctx_id": 375, "stmt_id": 845306, "pretty": ""Comparison requires equal bit widths"", "value": "1 U1b" },
]},
{ "location_context": "#1 Call", "calling": "llvm::APInt::udivrem", "call_line": "1847", "items": [
{ "lctx_id": 318, "stmt_id": 2048059, "pretty": "LHS", "value": "&SymRegion{reg_$1<const class llvm::APInt & A>}" },
{ "lctx_id": 318, "stmt_id": 2048063, "pretty": "RHS", "value": "&temp_object{class llvm::APInt, S2050862}" },
{ "lctx_id": 318, "stmt_id": 2048107, "pretty": "operator==", "value": "&code{operator==}" },
]},
]}

I have not seen any way to bypass this behaviour sadly.

NoQ added a comment.May 22 2019, 2:13 PM

I meant the [Store (ID)] title business and the pointers which appears between < > like the ID of the Store.

If we write out some ProgramPoints it looks like that:

DeclRefExpr S2079258 <0x29c35c0> RM
line=2715 col=11
--------
ImplicitCastExpr S2079262 <0x29c35e0> RM
line=2715 col=11

Would you need the S or the < > version of IDs, or both? How would you call the latter?

I use both. Let's call the latter "pointer" because it's literally a pointer; you can dereference it within a debugger if you're within the same process:

p ((DeclRefExpr *)0x29c35c0)->dump()

I just wanted to see what would you inject or remove from the current ProgramState representation, that is why I have requested a review for each of its descendants.

I'd rather not change anything for now. I'm using almost all of the information that's currently present in the graph.

I will not remove the DOT format, just the separation will be useless as every node will be 100% pure JSON and the spacing will be the format itself.

Emm, right, Sep is indeed useless, NL is not useless, sry^^

However, the DOT-dump is broken because the spaces going to be trimmed which is fail.
...
I have not seen any way to bypass this behaviour sadly.

Did you try indenting with &nbsp;?

Charusso updated this revision to Diff 201060.May 23 2019, 2:01 PM
  • Improved the JSON format.
This revision is now accepted and ready to land.May 23 2019, 2:01 PM

Thanks @NoQ! The &nbsp; works well, but it produces a huge SVG file, here is a complete example of node-printing:

NoQ added a comment.May 23 2019, 4:46 PM

Thanks @NoQ! The &nbsp; works well, but it produces a huge SVG file, here is a complete example of node-printing:

Looks fantastic. I can work with that.
I hope that we'll be able to remove the &nbsp; thing soon, as we go for a fancier viewer.

Charusso updated this revision to Diff 201564.May 27 2019, 1:23 PM
  • 100% pure JSON.
This revision was automatically updated to reflect the committed changes.

All these patches bypassed cfe-commits.

Why does this invent a yet another json formatter instead of using
"llvm/Support/JSON.h", in particular it's lightweight json::OStream ?

Charusso added a comment.EditedMay 29 2019, 9:09 AM

Hey @lebedev.ri, thanks for the review!

All these patches bypassed cfe-commits.

Bypassed? It is only added when you push your stuff, which is happened as expected.

Why does this invent a yet another json formatter instead of using
"llvm/Support/JSON.h", in particular it's lightweight json::OStream ?

We are doing our own JSON representation which is not really the job of json namespace. Also it is my personal feeling to avoid write::like::that for perfectly no reason.

Hey @lebedev.ri, thanks for the review!

All these patches bypassed cfe-commits.

Bypassed? It is only added when you push your stuff, which is happened as expected.

That is pretty much the opposite from what is exected, e.g.:
https://llvm.org/docs/Phabricator.html

While Phabricator is a useful tool for some, the relevant -commits mailing list
is the system of record for all LLVM code review. The mailing list should be added
as a subscriber on all reviews, and Phabricator users should be prepared to respond
to free-form comments in mail sent to the commits list.

Why does this invent a yet another json formatter instead of using
"llvm/Support/JSON.h", in particular it's lightweight json::OStream ?

We are doing our own JSON representation which is not really the job of json namespace. Also it is my personal feeling to avoid write::like::that for perfectly no reason.

That was precisely the question. Why is this not using the abstractions, but does everything on it's own?

All these patches bypassed cfe-commits.

Bypassed? It is only added when you push your stuff, which is happened as expected.

That is pretty much the opposite from what is exected, e.g.:
https://llvm.org/docs/Phabricator.html

While Phabricator is a useful tool for some, the relevant -commits mailing list
is the system of record for all LLVM code review. The mailing list should be added
as a subscriber on all reviews, and Phabricator users should be prepared to respond
to free-form comments in mail sent to the commits list.

Hm, in my mind it was in the proper order, as you could see that. Thanks for the clarification! I will add it.

Why does this invent a yet another json formatter instead of using
"llvm/Support/JSON.h", in particular it's lightweight json::OStream ?

We are doing our own JSON representation which is not really the job of json namespace. Also it is my personal feeling to avoid write::like::that for perfectly no reason.

That was precisely the question. Why is this not using the abstractions, but does everything on it's own?

There is that much information per node of the ExplodedGraph that cannot fit into a 8K monitor screen but you have to see the whole node and its predecessor to understand it. We are trying to compact and reduce it as small as possible. At the moment our representation is a little bit wider than it should be. Here is an ugly WIP example: D62553

All these patches bypassed cfe-commits.

Bypassed? It is only added when you push your stuff, which is happened as expected.

That is pretty much the opposite from what is exected, e.g.:
https://llvm.org/docs/Phabricator.html

While Phabricator is a useful tool for some, the relevant -commits mailing list
is the system of record for all LLVM code review. The mailing list should be added
as a subscriber on all reviews, and Phabricator users should be prepared to respond
to free-form comments in mail sent to the commits list.

Hm, in my mind it was in the proper order, as you could see that. Thanks for the clarification! I will add it.

I think cfe-commits is added automatically if you add "Clang" to the "Repository" field when creating a new Differential.

I think cfe-commits is added automatically if you add "Clang" to the "Repository" field when creating a new Differential.

Exactly, thanks!