This is an archive of the discontinued LLVM Phabricator instance.

[DataFlow] Factor two worklist implementations out
ClosedPublic

Authored by xazax.hun on Jan 7 2020, 5:57 PM.

Details

Summary

In Clang, we have the tendency to have ad-hoc worklist implementations everywhere. This patch is a first attempt to try to make two of them reusable.
In case the flow-sensitive lifetime analysis gets upstreamed, it is also likely to use the FordwardDataflowWorklist type (it is already using something very similar).
This could also be useful for future dataflow algorithms.

The UninitializedValues analysis used to do something slightly different which should be functionally equivalent with the new implementation. To tell the truth, I did not measure the performance implications yet.
The number of visited blocks decreased after the change, so I assume the performance should not regress. The tests we have did not uncover any problems with decreased number of visits.

What do you think, is this something we should pursue?

Diff Detail

Event Timeline

xazax.hun created this revision.Jan 7 2020, 5:57 PM
xazax.hun updated this revision to Diff 236734.Jan 7 2020, 6:09 PM
  • Add missing new line at the end of file.
  • Populate the initial worklist in UninitializedValues more efficiently.
xazax.hun updated this revision to Diff 236873.Jan 8 2020, 11:32 AM
  • Prepopulating the worklist for UninitializedValues seems to be redundant.
xazax.hun added a comment.EditedJan 8 2020, 11:38 AM

Fortunately, UninitializedValues has some statistics. So I printed it for a big translation unit (SemaExpr.cpp) before and after this change.

Before:

*** Analysis Based Warnings Stats:
33023 functions analyzed (0 w/o CFGs).
  161696 CFG blocks built.
  4 average CFG blocks per function.
  1246 max CFG blocks per function.
4792 functions analyzed for uninitialiazed variables
  9189 variables analyzed.
  1 average variables per function.
  55 max variables per function.
  47089 block visits.
  9 average block visits per function.
  1039 max block visits per function.

After:

*** Analysis Based Warnings Stats:
33023 functions analyzed (0 w/o CFGs).
  161696 CFG blocks built.
  4 average CFG blocks per function.
  1246 max CFG blocks per function.
4792 functions analyzed for uninitialiazed variables
  9189 variables analyzed.
  1 average variables per function.
  55 max variables per function.
  45728 block visits.
  9 average block visits per function.
  1039 max block visits per function.

It looks like after getting rid of prepopulating the worklist we overall visit fewer blocks which sounds good. All the tests pass, so if those extra visits were necessary, we do not have test coverage for them. But this makes me think that performance-wise this change should not be bad.

mgehre added inline comments.Jan 8 2020, 11:47 AM
clang/include/clang/Analysis/FlowSensitive/DataflowWorklist.h
20

Should this class have a bit of doxygen and a unit test?

xazax.hun edited the summary of this revision. (Show Details)Jan 8 2020, 11:47 AM
xazax.hun marked an inline comment as done.Jan 8 2020, 11:50 AM
xazax.hun added inline comments.
clang/include/clang/Analysis/FlowSensitive/DataflowWorklist.h
20

We have two users and both users have regression tests. More tests are always good, but I am not sure if we would get much value in this case. Having some comments sound very useful though :)

xazax.hun updated this revision to Diff 236886.Jan 8 2020, 12:28 PM
  • Added doxygen comments and unit tests.
xazax.hun marked 2 inline comments as done.Jan 8 2020, 12:29 PM
xazax.hun added inline comments.
clang/include/clang/Analysis/FlowSensitive/DataflowWorklist.h
20

Added a unit test anyway :)

mgehre added inline comments.Jan 9 2020, 12:43 PM
clang/include/clang/Analysis/FlowSensitive/DataflowWorklist.h
20

;-)

21

nit: wwill

Ping for the other reviewers :)

NoQ added a comment.Jan 15 2020, 2:39 PM

I'm excited about this effort!~

The change in uninitialized values analysis gives me a bit of anxiety. Could you explain what exactly has changed that caused the change in the stats and why you think it doesn't make a difference, maybe give an example? (an example could be obtained by creduce-ing over "the stats have changed" criterion)

In D72380#1822927, @NoQ wrote:

The change in uninitialized values analysis gives me a bit of anxiety. Could you explain what exactly has changed that caused the change in the stats and why you think it doesn't make a difference, maybe give an example? (an example could be obtained by creduce-ing over "the stats have changed" criterion)

Yeah, this is the riskiest part, so I do understand the concerns. Basically, the former implementation was using PostOrderCFGView::iterators and a stack (popping from the back of a smallvector). When the stack was empty, it popped the next element from the iterator. Also, the analysis queues the successors of a node once it was processed (and the analysis state was changed).
So let's imagine a linear CFG. We will process it multiple times with the original implementation, we traverse the CFG both using DFS and using reverse post order.

The new implementation is only using a reverse post order (no stack, no DFS), and does not have an extra pass over the CFG like the previous one.

Since this is a fixed point iteration and we should stop iterating whenever we reached the fixed point. Reaching that with fewer iterations sounds good to me. Since we always queue the successors of a changed node I have hard time imagining how could we stop prematurely.

I can look into a minimal repro if that helps.

In D72380#1822927, @NoQ wrote:

The change in uninitialized values analysis gives me a bit of anxiety. Could you explain what exactly has changed that caused the change in the stats and why you think it doesn't make a difference, maybe give an example? (an example could be obtained by creduce-ing over "the stats have changed" criterion)

We will process it multiple times with the original implementation, we traverse the CFG both using DFS and using reverse post order.

Sorry, I just realized that the original code initially considers all the nodes queued. In this case there is no extra pass. But it is still true that the original is mostly using DFS while the new one is only using RPO. So it comes down to the order we visit the basic blocks. Working on a minimal repro.

Here is an example:

void f() {
  int i;
  while (i < 42 && i) {
    if (i)
      &i;
  }
}

This takes 17 visits before, 16 after.

xazax.hun updated this revision to Diff 238592.Jan 16 2020, 1:09 PM
  • Fix typo.
xazax.hun marked 2 inline comments as done.Jan 16 2020, 1:09 PM
gribozavr2 accepted this revision.Jan 17 2020, 5:07 AM

Thank you for factoring our this library!

I briefly read the UninitializedValues analysis and I also think that this change is a no-op.

clang/include/clang/Analysis/FlowSensitive/DataflowWorklist.h
60

s/data-flow/dataflow/ I think.

This revision is now accepted and ready to land.Jan 17 2020, 5:07 AM
This revision was automatically updated to reflect the committed changes.