This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Restructure loops to call widen on back edges
Accepted · Public

Authored by li.zhe.hua on Aug 10 2022, 9:58 PM.

Details

Summary

When navigating a loop block, we call the lattice's widen operator,
which gives a lattice of infinite height the opportunity to reach
convergence.

As we enter the loop, we store the block state in the back edge block,
which represents the state after the zeroth iteration. Then, after
each loop iteration, we widen the previous iteration's state with the
new iteration's state.
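As a sketch only (the names below are hypothetical, not the framework's actual API), the scheme amounts to seeding the back-edge state with a copy of the predecessor state and then widening the previous iteration's state with the new iteration's state until nothing changes:

```cpp
#include <climits>

// Toy lattice element: an integer interval [Lo, Hi], with INT_MIN/INT_MAX
// standing in for -inf/+inf. Purely illustrative; not the framework's
// type-erased lattice.
struct Interval {
  int Lo, Hi;
  bool operator==(const Interval &O) const { return Lo == O.Lo && Hi == O.Hi; }
};

// widen keeps a stable bound and pushes a still-moving bound to +/-inf,
// so repeated widening converges even on an infinite-height lattice.
Interval widen(const Interval &Prev, const Interval &Cur) {
  return {Cur.Lo < Prev.Lo ? INT_MIN : Prev.Lo,
          Cur.Hi > Prev.Hi ? INT_MAX : Prev.Hi};
}

// Hypothetical transfer function for a loop body `++i`.
Interval transferIncrement(const Interval &I) {
  return {I.Lo == INT_MIN ? INT_MIN : I.Lo + 1,
          I.Hi == INT_MAX ? INT_MAX : I.Hi + 1};
}

// The scheme described above: the "zeroth iteration" back-edge state is a
// copy of PredState; after each iteration, widen the previous state with
// the new one until a fixpoint is reached.
Interval analyzeBackEdge(Interval PredState) {
  Interval BackEdgeState = PredState;
  for (;;) {
    Interval Next = widen(BackEdgeState, transferIncrement(BackEdgeState));
    if (Next == BackEdgeState)
      return BackEdgeState;
    BackEdgeState = Next;
  }
}
```

Starting from `i : [0, 0]`, this converges in two steps to `[0, +inf]`: the first widening pushes the growing upper bound to +inf, and the second iteration is then stable.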

Tracking issue: #56931

Depends on D131645

Diff Detail

Event Timeline

li.zhe.hua created this revision.Aug 10 2022, 9:58 PM
Herald added a project: Restricted Project. · View Herald Transcript
li.zhe.hua requested review of this revision.Aug 10 2022, 9:58 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ymandel added a subscriber: gribozavr.
ymandel accepted this revision.Aug 11 2022, 5:47 AM

Nice!

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
180

Might it be worth simply returning the backedge when you find it? Or is the assertion (above) sufficiently important to keep it as is?

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
276

Might a (googletest) assertion here be better than llvm::cantFail? I would think that this line is the crux of checking whether it converges or not.

This revision is now accepted and ready to land.Aug 11 2022, 5:47 AM
li.zhe.hua marked 2 inline comments as done.

Address comments

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
180

So, from staring at the CFG code I couldn't prove to myself that a block can't have two back-edge predecessors, but it seems conceptually unlikely enough that I'm OK simplifying this.

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
276

Ah, yes. I hadn't thought to look for llvm::Expected matchers; good call.

xazax.hun added inline comments.Aug 11 2022, 11:18 AM
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
168–169

Is this also true when we have multiple continue statements in the loop?

226–232

Could you elaborate on this? Let's consider this loop:

Pred
  |
  v
LoopHeader <---BackEdge

Do we ignore the state coming from Pred on purpose? Is that sound?

I would expect the analysis to always compute join(PredState, BackEdgeState), and I would expect the widening to happen between the previous iteration of BackEdgeState and the current iteration of BackEdgeState. So, I wonder if we already invoke the widening operator along back edges, wouldn't the regular logic work just fine? Do I miss something?

Fix incorrect assumption that back edge blocks have an empty body.

li.zhe.hua marked an inline comment as done.Aug 11 2022, 12:15 PM
li.zhe.hua added inline comments.
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
168–169

Yes. The end of the loop, and each of the continue statements, target the back edge block. They all get funneled through that back edge to Block, such that Block only has two predecessors. However, I haven't verified this in the CFG code, only by not finding a counterexample.

226–232

Do we ignore the state coming from Pred on purpose? Is that sound?

We don't, and this is what the comment

// For the first iteration of loop, the "zeroth" iteration state is set up by
// `prepareBackEdges`.

failed to explain. After transferring PredState, we copy PredState into BackEdgeState, which is done in prepareBackEdges.

I would expect the analysis to always compute join(PredState, BackEdgeState)

I'm not sure I see that we should always join PredState and BackEdgeState. Execution doesn't go from Pred into the Nth iteration of the loop, it only goes from Pred into the first iteration of the loop, e.g. the predecessor for the 4th iteration of the loop is only the back-edge from the 3rd iteration of the loop, not Pred.

Let me know if this makes sense.

xazax.hun added inline comments.Aug 11 2022, 12:35 PM
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
226–232

I'm not sure I see that we should always join PredState and BackEdgeState. Execution doesn't go from Pred into the Nth iteration of the loop, it only goes from Pred into the first iteration of the loop, e.g. the predecessor for the 4th iteration of the loop is only the back-edge from the 3rd iteration of the loop, not Pred.

Let me know if this makes sense.

The analysis state of a dataflow analysis is supposed to overapproximate all of the execution paths. Consider the following loop, and imagine we want to calculate intervals for integer variables:

int i = 0;
while (...) {
  [[program point A]];
  ++i;
}

During the analysis of this loop, the value i == 0 flows to [[program point A]]. This is the motivation for joining the state from the back edge with PredState. As far as I understand, you want to achieve this by copying PredState to BackEdgeState before the first iteration. But this also means that we will use the widening operator to combine PredState with the state after N iterations instead of the regular join operation. I am not entirely sure whether these two approaches always produce the same results.

xazax.hun added inline comments.Aug 11 2022, 12:58 PM
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
226–232

A contrived example (I just came up with this in a couple of minutes, so feel free to double-check it):

int i = 0;
while (...) {
  if (i == 0)
    i = -2;
  ++i;
}

Some states:

PredState = i : [0, 0]
BackEdgeState (after first iteration) = i : [-1, -1]

And the results of join vs widen:

PredState.join(BackEdgeState) = i : [-1, 0]
PredState.widen(BackEdgeState) = i : [-inf, 0]

The extra widening with the PredState can result in extra imprecision.
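The arithmetic above can be checked mechanically with a toy interval domain (a sketch only; `join` and `widen` here are illustrative, not the framework's API, and `INT_MIN`/`INT_MAX` stand in for -inf/+inf):

```cpp
#include <climits>

// Toy lattice element: an integer interval [Lo, Hi].
struct Interval {
  int Lo, Hi;
  bool operator==(const Interval &O) const { return Lo == O.Lo && Hi == O.Hi; }
};

// join: least upper bound -- the smallest interval covering both operands.
Interval join(const Interval &A, const Interval &B) {
  return {A.Lo < B.Lo ? A.Lo : B.Lo, A.Hi > B.Hi ? A.Hi : B.Hi};
}

// widen: like join, but any bound that grew relative to Prev is pushed to
// +/-inf so that repeated application converges.
Interval widen(const Interval &Prev, const Interval &Cur) {
  return {Cur.Lo < Prev.Lo ? INT_MIN : Prev.Lo,
          Cur.Hi > Prev.Hi ? INT_MAX : Prev.Hi};
}
```

With `PredState = {0, 0}` and `BackEdgeState = {-1, -1}`, `join` yields `[-1, 0]` while `widen` yields `[-inf, 0]`: the lower bound moved, so widening discards it entirely, which is exactly the extra imprecision described above.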

sgatev added inline comments.Aug 12 2022, 4:08 AM
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
157

Let's start function comments with /// throughout the file.

168–169

Does that hold for back edges stemming from goto statements?

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
270

There's already a set of "widening" tests - http://google3/third_party/llvm/llvm-project/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp;l=527-712;rcl=462638952

What do you think about refactoring those so that we have tests that exercise the framework with both join and widen?

li.zhe.hua marked 4 inline comments as done.

Address comments

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
168–169

Ah, goto does throw a wrench in all of this. Specifically, it is problematic only for do ... while loops, where the labeled statement is the first statement of the do block.

I took out the sentence, took out the assert, and added a FIXME for prepareBackEdges.

226–232

So, my understanding is that widening introduces imprecision as a trade-off for convergence. Yes, in the contrived example, joining converges after a few iterations, but in other cases it never converges.


Looking at the Rival and Yi book, there are no intervening joins with the predecessor state as we analyze the loop. This is roughly what I drew from for this implementation.

analysis(iter{p}, a) = { R <- a;
                         repeat
                             T <- R;
                             R <- widen(R, analysis(p, R));
                         until inclusion(R, T)
                         return T; }

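Translated into runnable form (a sketch with a toy interval domain; `inclusion` plays the role of the abstract partial order, and none of these names are the framework's actual API):

```cpp
#include <climits>

// Toy lattice element: an integer interval [Lo, Hi], with INT_MIN/INT_MAX
// standing in for -inf/+inf.
struct Interval {
  int Lo, Hi;
  bool operator==(const Interval &O) const { return Lo == O.Lo && Hi == O.Hi; }
};

// inclusion(R, T): is R contained in T? (the abstract partial order)
bool inclusion(const Interval &R, const Interval &T) {
  return T.Lo <= R.Lo && R.Hi <= T.Hi;
}

// widen keeps stable bounds and pushes moving bounds to +/-inf.
Interval widen(const Interval &Prev, const Interval &Cur) {
  return {Cur.Lo < Prev.Lo ? INT_MIN : Prev.Lo,
          Cur.Hi > Prev.Hi ? INT_MAX : Prev.Hi};
}

// Direct transcription of the pseudocode: iterate the loop body's abstract
// semantics `analyzeBody` from the entry state `A`, widening until the
// result is included in the previous candidate.
template <typename Transfer>
Interval analyzeLoop(Transfer analyzeBody, Interval A) {
  Interval R = A;
  for (;;) {
    Interval T = R;
    R = widen(R, analyzeBody(R));
    if (inclusion(R, T))
      return T;
  }
}
```

For a body `++i` starting from `i : [0, 0]`, this terminates with `[0, +inf]`; note that the entry state `a` only seeds `R` and is never re-joined on later iterations, matching the point about the predecessor state.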
clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
270

This commit implements widening only for the lattice. Those tests use the NoopLattice, so their behavior is not expected to change with this commit. Splitting merge into a join and a widen is out of scope for this commit and is what I am hoping to work on next.

xazax.hun added inline comments.Aug 19 2022, 2:20 PM
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
226–232

So, my understanding is that widening introduces imprecision as a trade-off for convergence. Yes, in the contrived example, joining converges after a few iterations, but in other cases it never converges.

In my example I do not advocate for NOT doing widening. I am advocating for not applying the widening operator with the loop pre-state and the loop iteration state as operands. My example demonstrates how that introduces additional imprecision. Using the widening operator only between loop iterations, and not for those two states, would still ensure convergence but would give you more precision. My argument is that using widening in that step is not required for convergence, but we do pay for it in precision. Or do you have an example where you need widening for those two specific states?

Looking at the Rival and Yi book, there are no intervening joins with the predecessor for the first loop as we analyze the loop. This is roughly where I drew from in terms of this implementation.

Later in the book, they talk about loop unrolling, where we don't use the widening operator in the first couple of iterations. The same imprecision I was talking about here could also be solved by unrolling the first iteration. I believe the book might not talk about this technique because unrolling subsumes it.

The main reason I am advocating for this is that, if we are willing to accept the join there, we would no longer need to prepare the back edges, which would simplify the implementation while improving the precision of the analysis.