This is an archive of the discontinued LLVM Phabricator instance.

[FlowSensitive] Log analysis progress for debugging purposes
ClosedPublic

Authored by sammccall on Feb 24 2023, 5:50 AM.

Details

Summary

The goal is to be able to understand how the analysis executes, and what its
incremental and final findings are, by enabling logging and reading the logs.
This should include both framework and analysis-specific information.

Ad-hoc printf-debugging doesn't seem sufficient for my understanding, at least.
Being able to check in logging, turn it on in a production binary, and quickly
find particular analysis steps within complex functions seem important.

This can be enabled programmatically through DataflowAnalysisOptions, or
via the flag -dataflow-log. (Works in unittests, clang-tidy, standalone
tools...)

Important missing pieces here:

  • a logger implementation that produces an interactive report (HTML file) which can be navigated via timeline/code/CFG. (I think the Logger interface is sufficient for this, but need to prototype).
  • display of the application-specific lattice
  • more useful display for the built-in environment (e.g. meaningful & consistent names for values, hiding redundant variables in the flow condition, hiding unreachable expressions)

Diff Detail

Event Timeline

sammccall created this revision.Feb 24 2023, 5:50 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: wenlei. · View Herald Transcript
sammccall requested review of this revision.Feb 24 2023, 5:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2023, 5:50 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
sammccall updated this revision to Diff 500166.Feb 24 2023, 5:51 AM

A little more docs

Example textual log (though better with colors):

=== Beginning data flow analysis ===
int target(int x) {
    while (x > 0)
        {
            --x;
        }
    return x + 1;
}

FunctionDecl 0x2a9a300 </usr/local/google/home/sammccall/test.cc:1:1, line:6:1> line:1:5 target 'int (int)'
|-ParmVarDecl 0x2a9a238 <col:12, col:16> col:16 used x 'int'
`-CompoundStmt 0x2a9a568 <col:19, line:6:1>
  |-WhileStmt 0x2a9a4c0 <line:2:3, line:4:3>
  | |-BinaryOperator 0x2a9a450 <line:2:10, col:12> 'bool' '>'
  | | |-ImplicitCastExpr 0x2a9a438 <col:10> 'int' <LValueToRValue>
  | | | `-DeclRefExpr 0x2a9a3f8 <col:10> 'int' lvalue ParmVar 0x2a9a238 'x' 'int'
  | | `-IntegerLiteral 0x2a9a418 <col:12> 'int' 0
  | `-CompoundStmt 0x2a9a4a8 <col:15, line:4:3>
  |   `-UnaryOperator 0x2a9a490 <line:3:5, col:7> 'int' lvalue prefix '--'
  |     `-DeclRefExpr 0x2a9a470 <col:7> 'int' lvalue ParmVar 0x2a9a238 'x' 'int'
  `-ReturnStmt 0x2a9a558 <line:5:3, col:14>
    `-BinaryOperator 0x2a9a538 <col:10, col:14> 'int' '+'
      |-ImplicitCastExpr 0x2a9a520 <col:10> 'int' <LValueToRValue>
      | `-DeclRefExpr 0x2a9a4e0 <col:10> 'int' lvalue ParmVar 0x2a9a238 'x' 'int'
      `-IntegerLiteral 0x2a9a500 <col:14> 'int' 1

 [B5 (ENTRY)]
   Succs (1): B4

 [B1]
   1: x
   2: [B1.1] (ImplicitCastExpr, LValueToRValue, int)
   3: 1
   4: [B1.2] + [B1.3]
   5: return [B1.4];
   Preds (1): B4
   Succs (1): B0

 [B2]
   Preds (1): B3
   Succs (1): B4

 [B3]
   1: x
   2: --[B3.1]
   Preds (1): B4
   Succs (1): B2

 [B4]
   1: x
   2: [B4.1] (ImplicitCastExpr, LValueToRValue, int)
   3: 0
   4: [B4.2] > [B4.3]
   T: while [B4.4]
   Preds (2): B2 B5
   Succs (2): B3 B1

 [B0 (EXIT)]
   Preds (1): B1

=== Entering block B4 (iteration 1) ===

 [B4]
   1: x
   2: [B4.1] (ImplicitCastExpr, LValueToRValue, int)
   3: 0
   4: [B4.2] > [B4.3]
   T: while [B4.4]
   Preds (2): B2 B5
   Succs (2): B3 B1

Computed state for B4.0:
DeclToLoc:
ExprToLoc:
LocToVal:
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
B0
B2

Processing element B4.1: x
transfer!
Computed state for B4.1:
DeclToLoc:
ExprToLoc:
LocToVal:
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
B0
B2

Processing element B4.2: x (ImplicitCastExpr, LValueToRValue, int)
transfer!
Computed state for B4.2:
DeclToLoc:
ExprToLoc:
LocToVal:
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
B0
B2

Processing element B4.3: 0
transfer!
Computed state for B4.3:
DeclToLoc:
ExprToLoc:
LocToVal:
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
B0
B2

Processing element B4.4: x > 0
transfer!
Computed state for B4.4:
DeclToLoc:
ExprToLoc:
LocToVal:
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
B0
B2

=== Entering block B1 (iteration 1) ===

 [B1]
   1: x
   2: [B1.1] (ImplicitCastExpr, LValueToRValue, int)
   3: 1
   4: [B1.2] + [B1.3]
   5: return [B1.4];
   Preds (1): B4
   Succs (1): B0

Computed state for B1.0:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2ab7da0: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    (and
        B0
        (not
            B5)))
(=
    B6
    B4)
B2
B6

Processing element B1.1: x
transfer!
Computed state for B1.1:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2ab7da0: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    (and
        B0
        (not
            B5)))
(=
    B6
    B4)
B2
B6

Processing element B1.2: x (ImplicitCastExpr, LValueToRValue, int)
transfer!
Computed state for B1.2:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2ab7da0: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    (and
        B0
        (not
            B5)))
(=
    B6
    B4)
B2
B6

Processing element B1.3: 1
transfer!
Computed state for B1.3:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2ab7da0: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    (and
        B0
        (not
            B5)))
(=
    B6
    B4)
B2
B6

Processing element B1.4: x + 1
transfer!
Computed state for B1.4:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2ab7da0: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    (and
        B0
        (not
            B5)))
(=
    B6
    B4)
B2
B6

Processing element B1.5: return x + 1;
transfer!
Computed state for B1.5:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2ab7da0: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    (and
        B0
        (not
            B5)))
(=
    B6
    B4)
B2
B6

=== Entering block B0 (iteration 1) ===

 [B0 (EXIT)]
   Preds (1): B1

Computed state for B0.0:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2ab7da0: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    (and
        B0
        (not
            B5)))
(=
    B6
    B4)
(=
    B7
    B8)
(=
    B8
    B6)
B2
B7

=== Entering block B3 (iteration 1) ===

 [B3]
   1: x
   2: --[B3.1]
   Preds (1): B4
   Succs (1): B2

Computed state for B3.0:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2aba810: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    B5)
(=
    B5
    (and
        B0
        B6))
B2
B4

Processing element B3.1: x
transfer!
Computed state for B3.1:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2aba810: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    B5)
(=
    B5
    (and
        B0
        B6))
B2
B4

Processing element B3.2: --x
transfer!
Computed state for B3.2:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2aba810: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    B5)
(=
    B5
    (and
        B0
        B6))
B2
B4

=== Entering block B2 (iteration 1) ===

 [B2]
   Preds (1): B3
   Succs (1): B4

Computed state for B2.0:
DeclToLoc:
ExprToLoc:
  [0x2a9a450, 0x2ab7e60]
LocToVal:
  [0x2ab7e60, 0x2aba810: AtomicBool]
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B3)
(=
    B3
    B2)
(=
    B4
    B5)
(=
    B5
    B7)
(=
    B6
    B4)
(=
    B7
    (and
        B0
        B8))
B2
B6

=== Entering block B4 (iteration 2) ===

 [B4]
   1: x
   2: [B4.1] (ImplicitCastExpr, LValueToRValue, int)
   3: 0
   4: [B4.2] > [B4.3]
   T: while [B4.4]
   Preds (2): B2 B5
   Succs (2): B3 B1

Computed state for B4.0:
DeclToLoc:
ExprToLoc:
LocToVal:
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B7)
(=
    B10
    (and
        B2
        B12))
(=
    B11
    (or
        B0
        B5))
(=
    B2
    B3)
(=
    B3
    B1)
(=
    B4
    B11)
(=
    B5
    B6)
(=
    B6
    B8)
(=
    B8
    B9)
(=
    B9
    B10)
B4
B7

Processing element B4.1: x
transfer!
Computed state for B4.1:
DeclToLoc:
ExprToLoc:
LocToVal:
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B7)
(=
    B10
    (and
        B2
        B12))
(=
    B11
    (or
        B0
        B5))
(=
    B2
    B3)
(=
    B3
    B1)
(=
    B4
    B11)
(=
    B5
    B6)
(=
    B6
    B8)
(=
    B8
    B9)
(=
    B9
    B10)
B4
B7

Processing element B4.2: x (ImplicitCastExpr, LValueToRValue, int)
transfer!
Computed state for B4.2:
DeclToLoc:
ExprToLoc:
LocToVal:
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B7)
(=
    B10
    (and
        B2
        B12))
(=
    B11
    (or
        B0
        B5))
(=
    B2
    B3)
(=
    B3
    B1)
(=
    B4
    B11)
(=
    B5
    B6)
(=
    B6
    B8)
(=
    B8
    B9)
(=
    B9
    B10)
B4
B7

Processing element B4.3: 0
transfer!
Computed state for B4.3:
DeclToLoc:
ExprToLoc:
LocToVal:
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B7)
(=
    B10
    (and
        B2
        B12))
(=
    B11
    (or
        B0
        B5))
(=
    B2
    B3)
(=
    B3
    B1)
(=
    B4
    B11)
(=
    B5
    B6)
(=
    B6
    B8)
(=
    B8
    B9)
(=
    B9
    B10)
B4
B7

Processing element B4.4: x > 0
transfer!
Computed state for B4.4:
DeclToLoc:
ExprToLoc:
LocToVal:
FlowConditionToken:
(=
    B0
    B1)
(=
    B1
    B7)
(=
    B10
    (and
        B2
        B12))
(=
    B11
    (or
        B0
        B5))
(=
    B2
    B3)
(=
    B3
    B1)
(=
    B4
    B11)
(=
    B5
    B6)
(=
    B6
    B8)
(=
    B8
    B9)
(=
    B9
    B10)
B4
B7

B4 has converged!
=== Finished analysis: 5 blocks in 6 total steps ===
sammccall updated this revision to Diff 500172.Feb 24 2023, 6:04 AM

Fix colors, log source

I wanted to share this to get some initial thoughts as we'd discussed debuggability a while ago.
As mentioned I'd at least want to get a reasonable HTML report before moving forward with it though.

gribozavr2 added inline comments.Mar 9 2023, 1:26 PM
clang/include/clang/Analysis/FlowSensitive/Logger.h
22

Elsewhere under Analysis/FlowSensitive we use triple slash as a doc comment marker - could you make the new header follow the pattern?

23
38

Why not DataflowAnalysisContext instead of ControlFlowContext?

You can reach the latter from the former, but not vice versa.

sammccall marked 2 inline comments as done.
sammccall retitled this revision from [FlowSensitive][WIP] log analysis progress for debugging purposes to [FlowSensitive] Log analysis progress for debugging purposes.
sammccall edited the summary of this revision. (Show Details)

Address review comments
Added -dataflow-log flag to enable logging without code changes
Removed example tool from this patch

Thanks! I cleaned up a bit and added tests, as well as a -dataflow-log flag to make this easy to access when running unit tests, tidy checks etc.
The HTML experiments seem to have validated that the logging API is sufficient to produce something useful there.

I think this is ready for proper review now.

clang/include/clang/Analysis/FlowSensitive/Logger.h
38

The idea here is "we're starting to analyze a CFG", so passing in the CFG is what we need and seems natural.
DataflowAnalysisContext provides more facilities than we need, and none of the extra stuff seems useful.
It's also more awkward to use for our purposes (you can get the CFG, but only if you know the right FunctionDecl, and the framework is unclear about whether there's always a FunctionDecl).

NoQ added a comment.Mar 21 2023, 3:04 PM

I love such debugging facilities! They're a massive boost to developer productivity compared to ad-hoc debug prints, and they allow new contributors to study how the tool works. I'm excited to see how this thing turns out.

In the static analyzer's path-sensitive engine we've settled on a Graphviz printout of the analysis graph, with an extra python script to filter and "prettify" it. In our case these graphs are much larger than CFGs (on real-world code they often contain over 200000 nodes) so filtering and searching becomes essential. And we usually don't have an option to debug a reduced toy example, because it's typically very hard to define the notion of "false positive" in a way suitable for a creduce predicate, and even when reducing manually it's easy to miss the point entirely and end up with a completely different problem. So we had to learn how to deal with these massive graphs, and these days we're pretty good at that. In your case the graphs probably aren't that massive, but other considerations probably still apply.

I found textual CFG dumps of real-world functions to be very hard to read; it's annoying to figure out in which order you're supposed to read the blocks. Most of the time I prefer Graphviz dumps of the CFG. If you're stuck with real-world code as much as we are in the static analyzer, it might be very convenient to somehow include a graphical representation of the CFG alongside your reports. Given that CFGs are relatively small, you might be able to get away with an ad-hoc graph visualizer scripted into your HTML page, so that to avoid relying on Graphviz.

Then I guess you can try adding your abstract state information directly to the graphical CFG blocks, and build a timeline of such partially analyzed CFGs, from the starting state to the fully converged state, brightly highlighting changes made on every step 🤔Dunno, just dreaming at this point.

I love such debugging facilities! They're a massive boost to developer productivity compared to ad-hoc debug prints, and they allow new contributors to study how the tool works. I'm excited to see how this thing turns out.

Thanks! It's great to know this is a problem worth some attention, as any tools here will carry some complexity.

In the static analyzer's path-sensitive engine we've settled on a Graphviz printout of the analysis graph, with an extra python script to filter and "prettify" it. In our case these graphs are much larger than CFGs (on real-world code they often contain over 200000 nodes) so filtering and searching becomes essential. And we usually don't have an option to debug a reduced toy example, because it's typically very hard to define the notion of "false positive" in a way suitable for a creduce predicate, and even when reducing manually it's easy to miss the point entirely and end up with a completely different problem. So we had to learn how to deal with these massive graphs, and these days we're pretty good at that. In your case the graphs probably aren't that massive, but other considerations probably still apply.

Yeah, as I understand the dataflow analysis proper is scoped to a function, so dealing with larger CFGs but not on that scale. I'm hoping that graphviz renderings of the CFG + overlaying it on the code are enough. Text dumps are indeed too difficult to read until you're really zeroed in on the bit you want to look at.

If you're stuck with real-world code as much as we are in the static analyzer, it might be very convenient to somehow include a graphical representation of the CFG alongside your reports. Given that CFGs are relatively small, you might be able to get away with an ad-hoc graph visualizer scripted into your HTML page, so that to avoid relying on Graphviz.

Here's a prototype I put together: https://htmlpreview.github.io/?https://gist.githubusercontent.com/sam-mccall/c03943495d2ca493be8f3087c22e3b70/raw/986a1d7d9e1ff37e46fc9b6808f0650ae28cc8d3/analyze_optional.html

I tried the ad-hoc method like you suggest to avoid the dependency, but working out how to place the nodes & arcs reasonably quickly looked too much for me. Ended up going back to shelling out to graphviz but embedding the result in the page.

Then I guess you can try adding your abstract state information directly to the graphical CFG blocks, and build a timeline of such partially analyzed CFGs, from the starting state to the fully converged state, brightly highlighting changes made on every step 🤔Dunno, just dreaming at this point.

Yes! This is very much where I want to go. I'll have an initial version up soon with basic functionality (timeline, CFG, simple textual dump at each point) but showing the most relevant state and changes at each point would be really useful.

xazax.hun added inline comments.Mar 22 2023, 8:51 AM
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
186

If we already have a NullLogger, I wonder if making DACtx->getOptions().Log a reference that points to NullLogger when logging is disabled would be less confusing (and fewer branches).

xazax.hun added inline comments.Mar 22 2023, 9:10 AM
clang/lib/Analysis/FlowSensitive/Logger.cpp
18

Adding final? Just in case it can help with devirtualization.

24

final?

gribozavr2 accepted this revision.Mar 22 2023, 9:48 AM
This revision is now accepted and ready to land.Mar 22 2023, 9:48 AM
sammccall marked 3 inline comments as done.

address review comments

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
186

Made the DataAnalysisContext constructor fill in this field with NullLogger if there isn't anything else, to avoid the branch. (Added a comment on the field).

It's still a pointer because of the ergonomic problems with ref fields in structs like this (can't initialize incrementally), and because we need a value for "no programmatic logging requested, feel free to respect the -dataflow-log flag" and using Logger::null() as a sentinel seems pretty surprising.

xazax.hun accepted this revision.Mar 22 2023, 10:12 AM
ymandel accepted this revision.Mar 22 2023, 12:13 PM

Awesome, thanks!

clang/include/clang/Analysis/FlowSensitive/Logger.h
19

Maybe comment on the need for these vs including the headers.

clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
393 ↗(On Diff #507418)

nit: > 0?

sammccall marked 2 inline comments as done.Mar 23 2023, 3:35 AM