This is an archive of the discontinued LLVM Phabricator instance.

[GreedyPatternRewriter] Introduce a config object that allows controlling internal parameters. NFC.
ClosedPublic

Authored by lattner on May 23 2021, 11:25 AM.

Details

Summary

This exposes the iterations and top-down processing as flags, and also
allows controlling whether region simplification is desirable for a client.
This allows deleting some duplicated entrypoints to
applyPatternsAndFoldGreedily.

This also deletes the Constant Preprocessing pass, which isn't worth it
on balance.

All defaults are all kept the same, so no one should see a behavior change.

Diff Detail

Event Timeline

lattner created this revision.May 23 2021, 11:25 AM
lattner requested review of this revision.May 23 2021, 11:25 AM
mehdi_amini accepted this revision.May 24 2021, 9:29 AM
mehdi_amini added inline comments.
mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
26

bottom-up?

Or "reverse-post-order" to be exact (we populate the worklist in post-order, but then pop-back from the worklist).

This revision is now accepted and ready to land.May 24 2021, 9:29 AM
lattner marked an inline comment as done.May 24 2021, 12:39 PM

Thank you for the review!

mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
26

This is the problem - it isn't RPO or bottom up. It is RPO on the region tree but it is bottom up on the operations within each block. This hybrid approach isn't actually good for the "bottom up" case either. Clarified the comment though. Thanks!

lattner updated this revision to Diff 347477.May 24 2021, 12:40 PM
lattner marked an inline comment as done.

Update comment, silence a warning.

mehdi_amini added inline comments.May 24 2021, 12:48 PM
mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
26

If you consider blocks part of the "walk tree", where the list of operations in the block are just "child", isn't it just RPO?
(inside a block, what is the difference between bottom up and RPO?)

This revision was landed with ongoing or failed builds.May 24 2021, 12:48 PM
This revision was automatically updated to reflect the committed changes.
lattner added inline comments.May 24 2021, 3:56 PM
mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
26

No, it isn't just RPO. RPO on a linear sequence is top-down. Consider that basic blocks in a CFG is just an optimization: it is also possible to model every operation as having successors (most operations would have one). in that case, RPO would be top down.

jpienaar added inline comments.
mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h
27

Note: this also changes more than just affecting ambigious patterns. We have cases such as:

%381 = X.opA (%arg1161 : !X.T) {
  ...
  X.yield %11842 : tensor<2304x3x96xf32>
}

%1108 = X.opA (%arg1161 : !X.T) {
  ...
  %11842 = X.opB 2 %381[%arg1161] : (tensor<2304x24x96xf32>, !X.T) -> tensor<2304x24x12xf32>
  ...
  X.yield %11846 : tensor<1x1024x24x12xf32>
}

In bottom up %1108 is seen as unused and so removed (along with all the ops inside its region) and so when %381 is encountered it is known dead, but in top down only the operands of the dead op %1108 is added back to worklist and %381 has already been processed so it doesn't get added back to worklist. This results in the bottom up finishing in 1 iteration, while top-down takes 157 iterations in canonicalizer (as the above form ends up being quite long chains of this pattern). So for removal of deadcode, bottom up is performing much better.

To have top down and bottom up behave the same here, one would need to enqueue all the operands of ops nested inside dead op that processes a value referenced directly (rather than passed in as argument) to worklist, as these ops can become dead now too.