Since the analysis is described to be suitable for a forward data-flow analysis, maintaining the worklist as a queue is closer to the RPO ordering of block visits, thus reaching the fixpoint earlier.
For example,
// mlir-opt -mlir-disable-threading -allow-unregistered-dialect -pass-pipeline="builtin.func(sccp)" sccp-iterations.mlir func @simple_control_flow(%arg0 : i32, %arg1 : i1) -> i32 { cond_br %arg1, ^bb1, ^bb3 ^bb1: %1 = arith.constant 1 : i32 br ^bb2(%1 : i32) ^bb3: %3 = arith.constant 2 : i32 br ^bb2(%3 : i32) ^bb2(%arg : i32): %2 = arith.constant 3 : i32 %4 = arith.addi %arg, %2 : i32 return %4 : i32 }
Running SCCP on this code results in the following visits when using a stack (currently) to maintain the worklist (I modified visitOperation in SCCP.cpp to print the operation being visited):
(SCCP): Visiting %c2_i32 = arith.constant 2 : i32 (SCCP): Visiting %c3_i32 = arith.constant 3 : i32 (SCCP): Visiting %1 = arith.addi %0, %c3_i32 : i32 (SCCP): Visiting %c1_i32 = arith.constant 1 : i32 (SCCP): Visiting %1 = arith.addi %0, %c3_i32 : i32 module { func @simple_control_flow(%arg0: i32, %arg1: i1) -> i32 { %c3_i32 = arith.constant 3 : i32 %c2_i32 = arith.constant 2 : i32 %c1_i32 = arith.constant 1 : i32 cond_br %arg1, ^bb1, ^bb2 ^bb1: // pred: ^bb0 br ^bb3(%c1_i32 : i32) ^bb2: // pred: ^bb0 br ^bb3(%c2_i32 : i32) ^bb3(%0: i32): // 2 preds: ^bb1, ^bb2 %1 = arith.addi %0, %c3_i32 : i32 return %1 : i32 } }
But with the proposed change to use a queue instead, the visits are:
(SCCP): Visiting %c1_i32 = arith.constant 1 : i32 (SCCP): Visiting %c2_i32 = arith.constant 2 : i32 (SCCP): Visiting %c3_i32 = arith.constant 3 : i32 (SCCP): Visiting %1 = arith.addi %0, %c3_i32 : i32 module { func @simple_control_flow(%arg0: i32, %arg1: i1) -> i32 { %c3_i32 = arith.constant 3 : i32 %c2_i32 = arith.constant 2 : i32 %c1_i32 = arith.constant 1 : i32 cond_br %arg1, ^bb1, ^bb2 ^bb1: // pred: ^bb0 br ^bb3(%c1_i32 : i32) ^bb2: // pred: ^bb0 br ^bb3(%c2_i32 : i32) ^bb3(%0: i32): // 2 preds: ^bb1, ^bb2 %1 = arith.addi %0, %c3_i32 : i32 return %1 : i32 } }
On the longer run, we could explore more complex strategies (not to mention the simple improvement of checking if a node is already in the queue before inserting it) as is described in "Iterative Data-flow Analysis - Revisited" (Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy).
I'm not sure what's the right way to add a test for this. Perhaps add a statistic to count when an Operation is visited, print that and CHECK for that in the unit test?
System includes should go after the others:
https://llvm.org/docs/CodingStandards.html#include-style