This is an archive of the discontinued LLVM Phabricator instance.

[dataflow] add HTML logger: browse code/cfg/analysis timeline/state
ClosedPublic

Authored by sammccall on Mar 21 2023, 3:52 PM.

Details

Summary

With -dataflow-log=/dir we will write /dir/0.html etc for each
function analyzed.

These files show the function's code and CFG, and the path through
the CFG taken by the analysis. At each analysis point we can see the
lattice state.

Currently the lattice state dump is not terribly useful but we can
improve this: showing values associated with the current Expr,
simplifying flow condition, highlighting changes etc.

(Trying not to let this patch scope-creep too much, so I ripped out the
half-finished features)

Demo: https://htmlpreview.github.io/?https://gist.githubusercontent.com/sam-mccall/1746985bf13406bd19181af281aea9ff/raw/9718fdd48406dabccb3092acd983b4bd55da9dfa/analysis.html

Diff Detail

Event Timeline

sammccall created this revision.Mar 21 2023, 3:52 PM
Herald added a project: Restricted Project. · View Herald Transcript
sammccall requested review of this revision.Mar 21 2023, 3:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2023, 3:52 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Added a bunch of reviewers for feedback on how this should work.

Happy to have detailed comments on the JS/HTML generator implementation too of course, but it may inevitably be a bit of a mess (ugly, weird languages, undertested).
I've added things like this to a few clang-based tools, and they're useful but review is always a bit awkward.

sammccall updated this revision to Diff 507339.Mar 22 2023, 6:25 AM

Use a process-shared counter for HTML output filenames to avoid clobbering.

While I am OK with this approach, I wonder if it would be too much work to split the HTML generation and emitting data up. E.g., the logger could emit YAML or JSON that could be parsed by a python script to create the HTML (or the HTML itself could use JS to parse it and generate the DOM).
This would have a couple of advantages like:

  • People could add additional visualizations or other tools like querying program states more easily just consuming the JSON/YAML file
  • Other static analysis tools could produce similar YAML/JSON so your visualization could potentially work for other analysis tools. E.g., I think people were working on a dataflow analysis framework for MLIR.

What do you think?

While I am OK with this approach, I wonder if it would be too much work to split the HTML generation and emitting data up. E.g., the logger could emit YAML or JSON that could be parsed by a python script to create the HTML (or the HTML itself could use JS to parse it and generate the DOM).

This is definitely doable. I think the main downside is more JS complexity: emitting JSON is easy, turning it into a view means some sort of templating (micro)framework, probably.
(I find it important that the output is directly viewable without running extra tools, and we're not pulling in big deps. So this probably means some custom JS)

This would have a couple of advantages like:

  • People could add additional visualizations or other tools like querying program states more easily just consuming the JSON/YAML file
  • Other static analysis tools could produce similar YAML/JSON so your visualization could potentially work for other analysis tools. E.g., I think people were working on a dataflow analysis framework for MLIR.

Definitely.
To realize these advantages, the data model needs to be generic enough and stable enough that it's actually useful with different producers/consumers.
(e.g. does the MLIR dataflow framework have the same idea about how the analysis timeline relates to the CFG and elements within it? And if we want to show the clang AST or the StorageLocation/Value graph, what's the non-clang::dataflow-coupled way to model that?)

This seems useful but also a pretty large project in itself - I don't think I have the experience or bandwidth to take it on. (My primary goal is nullability analysis, I need to spend some more time on that directly to understand deeper requirements about debugging these things).

One halfway point is to move the data into JSON, move most of the HTML into a static file, and use JS to mash them together - without exposing the JSON separately or attempting to properly decouple it from the generator/visualization, but leaving us closer to doing that later. Do you think that's valuable?

(e.g. does the MLIR dataflow framework have the same idea about how the analysis timeline relates to the CFG and elements within it? And if we want to show the clang AST or the StorageLocation/Value graph, what's the non-clang::dataflow-coupled way to model that?)

These are good questions.

Do you think that's valuable?

In case it does not add too much complexity on your side I think it is valuable. In case someone wants to use the same visualization for their tool it would be on them to figure out how to generalize the visualization part. On the other hand, even if we never get new users for the visualization, a JSON file will make it easier to build new diagnostic tools for the dataflow framework. One could even do one-off python scripts to help debugging a specific issue.

gribozavr2 added inline comments.Mar 22 2023, 3:58 PM
clang/include/clang/Analysis/FlowSensitive/Logger.h
34

Please use triple slash for doc comments.

clang/lib/Analysis/FlowSensitive/CMakeLists.txt
22

Is this the right location for the Python script? Aren't they normally under llvm-project.git/clang/utils?

31

Please add a blank line.

clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp
14
119

Why not include the generated header before the class definition to avoid the need for forward declarations?

123
212

Did you mean to append ContextLogs here? (and then clear them)

clang/lib/Analysis/FlowSensitive/HTMLLogger.css
1

Please add a copyright comment.

clang/lib/Analysis/FlowSensitive/HTMLLogger.js
1

Please add a copyright comment.

75

Remove debug logging?

111

Remove debug logging?

128

Or was the order intentional?

clang/lib/Analysis/FlowSensitive/bundle_resources.py
1

Please add a copyright comment.

clang/unittests/Analysis/FlowSensitive/LoggerTest.cpp
93

const?

176

Please also verify that we include the custom log messages and pretty-printed lattice elements.

sammccall marked 15 inline comments as done.Mar 23 2023, 6:17 AM
sammccall added inline comments.
clang/lib/Analysis/FlowSensitive/CMakeLists.txt
22

I've moved it, though honestly I have no idea if that's the right place :-)

Clang/utils has lots of tools to invoke by hand, test helpers, the sources for clang-tablegen, and exactly one build-time python script that I could find.

Python scripts that are part of the build graph seem to usually live somewhere under the dir that uses them (libclang/linker-script-to-export-list.py, llvm-shlib/gen-msvc-exports.py, libclc/generic/lib/gen_convert.py, Tooling/DumpTool/generate_cxx_src_locs.py), but not always.

OTOH, this is fairly generic (there's also a copy under clang-tools-extra/pseudo) so probably makes sense to have somewhere sharable.

clang/lib/Analysis/FlowSensitive/HTMLLogger.js
128

Somehow ordering by {number of set bits, lexicographic} makes sense in my head but I've changed it because I can't explain why :-)

sammccall updated this revision to Diff 507718.Mar 23 2023, 6:18 AM
sammccall marked 2 inline comments as done.

Address all comments except moving data from HTML => JSON

sammccall planned changes to this revision.Mar 23 2023, 6:18 AM

Going to have a go at passing data via JSON

sammccall updated this revision to Diff 508272.Mar 24 2023, 8:05 PM

Move data into JSON object, render into DOM with templates.

This is cleaner in a few ways: less ugly C++ string manipulation, data is in
principle reusable, HTML layout is extracted and easier to inspect and edit.

However it's more complicated: we define and implement a (small) HTML
templating language, and the overall amount of metaprogramming/indirection is
higher.

I'm probably equally happy using this or going back to the previous
HTML-emitting version.

sammccall updated this revision to Diff 508273.Mar 24 2023, 8:07 PM

revert accidental edit

Could you upload an updated sample HTML file? It is easier to review the HTML generation and javascript code when an example is available.

clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
385

The name is too short for me. Also it is ambiguous whether "flag" is a noun or a verb.

How about makeLoggerFromCommandLine?

385

Since it is a file-local helper, we can make it static.

392
clang/lib/Analysis/FlowSensitive/HTMLLogger.html
19 ↗(On Diff #508273)

Could you add the file name and line number of the start location of the function?

It might be helpful not only for the reader (to confirm that they are debugging the right thing), but also when finding the right html file using grep.

sammccall updated this revision to Diff 508517.Mar 27 2023, 1:06 AM
sammccall marked 4 inline comments as done.
sammccall edited the summary of this revision. (Show Details)

Address Dmitri's comments
Update demo link

sammccall updated this revision to Diff 508518.Mar 27 2023, 1:07 AM

Only show the filename, not the full path, to avoid overflowing the box

sammccall added inline comments.Mar 27 2023, 4:24 AM
clang/lib/Analysis/FlowSensitive/HTMLLogger.html
19 ↗(On Diff #508273)

Added filename to the top of the code, and line numbers to each line.
(This seems nice for display, but doesn't allow you to grep for foo.cpp:42 - is that critical?)

gribozavr2 added inline comments.Mar 27 2023, 9:18 AM
clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp
35
129

Now that we have an HTML template file, you could inline css and js into the file - if you like.

clang/lib/Analysis/FlowSensitive/HTMLLogger.html
1 ↗(On Diff #508518)

Please add a copyright statement.

clang/lib/Analysis/FlowSensitive/HTMLLogger.js
12
12

"if we the selection is" - parse error :)

46

Should hidden = false be under if (changed)?

Or can it be the case that nothing changed, but we still need to show the element?

48–51

Would it make sense to move this removal logic into inflate itself?

(Of course the recursive part of inflate would stay as a separate function, inflateImpl or something.)

125

What's "A" in "(BB4)A"?

185

Could you add a doc comment that describes what cls is?

Specifically, I see that in the array case we assume that the dot is already present, and in the single-value case we prepend the dot - is that intentional?

gribozavr2 added inline comments.Mar 27 2023, 9:20 AM
clang/lib/Analysis/FlowSensitive/HTMLLogger.html
19 ↗(On Diff #508273)

Imagine you're debugging an analysis on a relatively big file. How would you find the html file that corresponds to a function where the analysis is misbehaving? What if the function has a relatively common name?

sammccall updated this revision to Diff 511126.Apr 5 2023, 9:19 AM
sammccall marked 8 inline comments as done.

address comments (sorry about long turnaround!)

doh, forgot to send

clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp
129

I find having them out of line much easier to navigate and edit.

clang/lib/Analysis/FlowSensitive/HTMLLogger.html
19 ↗(On Diff #508273)

Fair enough, added the function name and file:line to the <title> (in HTML rather than JS so they can be easily grepped)

clang/lib/Analysis/FlowSensitive/HTMLLogger.js
46

Moved it under the element, I'm not sure which is conceptually simpler.

It's not possible that we need to show but !changed, because either:

  • this is the first call (and changed is initialized to true), or
  • the previous call early-returned, in which case root.selection[key] was set to null for some key on the last call, and because we got here it was nonnull on this call, so again changed is true.
48–51

I think we end up with a 60 line function wrapped in a +4 line function, and the nesting causes more confusion than it solves. I don't think clearing the old content matches the name inflate either.

Happy to pull out reinflate() as a sibling if you think it helps - with one callsite I'm not sure it does.

125

a typo :-(

185

Could you add a doc comment that describes what cls is?

Added a comment.
I can also make this function a bit less magic at the cost of a bit more verbosity elsewhere (id="foo" => id="foo" class="foo" in the HTML, e => e.id => e => [e.id] in the JS).

Specifically, I see that in the array case we assume that the dot is already present, and in the single-value case we prepend the dot - is that intentional?

In the array case we call classSelector on each element recursively, so the dot gets added there.

xazax.hun added inline comments.Apr 10 2023, 12:51 PM
clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp
78

We already sort of have a way to escape HTML here: https://github.com/llvm/llvm-project/blob/132f1d31fd66c30baf9773bf8f37b36a40fa7039/clang/include/clang/Rewrite/Core/HTMLRewrite.h#L60

I think we should reuse that (and potentially extend it if it does not fit our needs).

sammccall updated this revision to Diff 512377.Apr 11 2023, 2:29 AM
sammccall marked an inline comment as done.

use llvm::printHTMLEscaped

clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp
78

Yeah, we have this in a bunch of places.

The canonical one looks like llvm::printHTMLEscaped in Support, switched to that.

gribozavr2 accepted this revision.Apr 11 2023, 7:23 AM
gribozavr2 added inline comments.
clang/lib/Analysis/FlowSensitive/HTMLLogger.js
48–51

Happy to pull out reinflate() as a sibling if you think it helps - with one callsite I'm not sure it does.

I'd prefer that. The reason is that you can't correctly call inflate without first deleting the previously expanded nodes (even though there's only one call site.) And that suggests to me that the callee should be responsible for that part of the work as well.

This revision is now accepted and ready to land.Apr 11 2023, 7:23 AM
sammccall updated this revision to Diff 513543.Apr 14 2023, 4:37 AM

Extract reinflate() function

sammccall updated this revision to Diff 513544.Apr 14 2023, 4:39 AM

oops, after testing this time

@xazax.hun do you still want to look at this again?
I'm itching a little to get some use out of it :-)

xazax.hun accepted this revision.Apr 18 2023, 10:31 AM

I am a bit overloaded at the moment, feel free to commit. I can still add comments later that could be addressed in a follow up.

This revision was landed with ongoing or failed builds.Apr 19 2023, 6:37 AM
This revision was automatically updated to reflect the committed changes.

I am a bit overloaded at the moment, feel free to commit. I can still add comments later that could be addressed in a follow up.

Thanks! Happy to address as they come up.

thakis added a subscriber: thakis.Apr 19 2023, 11:05 AM
thakis added inline comments.
clang/utils/bundle_resources.py
6 ↗(On Diff #514941)

This file looks very similar to clang-tools-extra/pseudo/tool/bundle_resources.py. Could we remove clang-tools-extra/pseudo/tool/bundle_resources.py and make c-t-e/pseudo/tools/CMakeLists.txt call this script instead?

sammccall added inline comments.Apr 19 2023, 11:35 AM
clang/utils/bundle_resources.py
6 ↗(On Diff #514941)

yup, I'll do that now, I don't remember why I didn't want to have it in the same commit...

FYI, I'm seeing occasional spurious build breakage from this change.

The issue is that even if we've got add_dependencies(clangAnalysisFlowSensitive clangAnalysisFlowSensitiveResources), it looks like clangAnalysisFlowSensitiveResources only is considered a dependency to the whole clangAnalysisFlowSensitive library target, but the individual HTMLLogger.cpp.o object file doesn't have a dependency on the generated HTMLLogger.inc file. So when building, it may start building HTMLLogger.cpp.o before HTMLLogger.inc has been generated.

To reproduce it:

$ cd llvm-project/llvm
$ mkdir build
$ cd build
$ cmake -G Ninja .. -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang"
$ ninja tools/clang/lib/Analysis/FlowSensitive/CMakeFiles/obj.clangAnalysisFlowSensitive.dir/HTMLLogger.cpp.o
../tools/clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp:72:10: fatal error: HTMLLogger.inc: No such file or directory
   72 | #include "HTMLLogger.inc"
      |          ^~~~~~~~~~~~~~~~

If you've built this successfully once, then ninja also has picked up the dependency between HTMLLogger.cpp.o and HTMLLogger.inc, but it's possible to reproduce this in a clean build dir with the commands above.

Looking at the generated build.ninja, we've got this:

build cmake_object_order_depends_target_obj.clangAnalysisFlowSensitive: phony || tools/clang/clang-tablegen-targets

build cmake_object_order_depends_target_clangAnalysisFlowSensitive: phony || cmake_object_order_depends_target_LLVMAggressiveInstCombine cmake_object_order_depends_target_LLVMAnalysis cmake_object_order_depends_target_LLVMAsmParser cmake_object_order_depends_target_LLVMBinaryFormat cmake_object_order_depends_target_LLVMBitReader cmake_object_order_depends_target_LLVMBitstreamReader cmake_object_order_depends_target_LLVMCore cmake_object_order_depends_target_LLVMDebugInfoCodeView cmake_object_order_depends_target_LLVMDebugInfoDWARF cmake_object_order_depends_target_LLVMDebugInfoMSF cmake_object_order_depends_target_LLVMDebugInfoPDB cmake_object_order_depends_target_LLVMDemangle cmake_object_order_depends_target_LLVMFrontendOpenMP cmake_object_order_depends_target_LLVMIRReader cmake_object_order_depends_target_LLVMInstCombine cmake_object_order_depends_target_LLVMMC cmake_object_order_depends_target_LLVMMCParser cmake_object_order_depends_target_LLVMObject cmake_object_order_depends_target_LLVMProfileData cmake_object_order_depends_target_LLVMRemarks cmake_object_order_depends_target_LLVMScalarOpts cmake_object_order_depends_target_LLVMSupport cmake_object_order_depends_target_LLVMSymbolize cmake_object_order_depends_target_LLVMTargetParser cmake_object_order_depends_target_LLVMTextAPI cmake_object_order_depends_target_LLVMTransformUtils cmake_object_order_depends_target_clangAST cmake_object_order_depends_target_clangASTMatchers cmake_object_order_depends_target_clangAnalysis cmake_object_order_depends_target_clangBasic cmake_object_order_depends_target_clangLex cmake_object_order_depends_target_obj.clangAnalysisFlowSensitive tools/clang/clang-tablegen-targets tools/clang/lib/Analysis/FlowSensitive/clangAnalysisFlowSensitiveResources

build tools/clang/lib/Analysis/FlowSensitive/CMakeFiles/obj.clangAnalysisFlowSensitive.dir/HTMLLogger.cpp.o: CXX_COMPILER__obj.2eclangAnalysisFlowSensitive_Release ../tools/clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp || cmake_object_order_depends_target_obj.clangAnalysisFlowSensitive

HTMLLogger.cpp.o depends on cmake_object_order_depends_target_obj.clangAnalysisFlowSensitive, but that one doesn't depend on clangAnalysisFlowSensitiveResources. Only cmake_object_order_depends_target_clangAnalysisFlowSensitive (note the missing obj. in the middle) depends on it.

Looking at the similarly handled HTMLForestResources.inc in clang-pseudo, that one does work correctly. That one is set up with add_clang_tool, where there's no intermediate object library between the target itself and the object files as for add_clang_library.

Not sure what the right CMake way to handle it is - should the dependency be added for the intermediate object library? (Looking in the cmake files, it doesn't seem like we always generate object libraries.) Or should we add an explicit dependency for the HTMLLogger.cpp source file?

It comes from handling of objlib.

  1. Use add_clang_library's DEPENDS It adds deps on both libs.
  2. Don't create a custom target. Just add to sources. add_library(HTMLLogger.inc)

Sorry about the delay. I hope this is fixed by 28114722baabb468732a9cc24784abafd6c47792. I wasn't able to reproduce the problem locally, so I'm not sure.

add_llvm_library says DEPENDS is the same as add_dependencies, but reading the code it appears to do the thing we want instead. Fingers crossed...