Page MenuHomePhabricator

[mlir] Canonicalizer and CSE accept rewrite listener
AbandonedPublic

Authored by Mogball on Dec 22 2021, 2:40 PM.

Details

Summary

Create an API to allow the canonicalizer and CSE to be called directly on an operation. These APIs also accept a rewrite listener, so that callers can be notified when operations are replaced or erased by the passes.

Depends on D116142

Diff Detail

Event Timeline

Mogball created this revision.Dec 22 2021, 2:40 PM
Mogball requested review of this revision.Dec 22 2021, 2:40 PM
rriddle requested changes to this revision.Dec 22 2021, 2:59 PM

Marking as blocking because the design intent and goals here are not clear to me. This looks like it is going to create mega-passes, which is not something I'd like to support.

This revision now requires changes to proceed.Dec 22 2021, 2:59 PM

Marking as blocking because the design intent and goals here are not clear to me. This looks like it is going to create mega-passes, which is not something I'd like to support.

You're talking about the listener part right? The initializeCanonicalizer/canonicalizeOperations are just small wrappers around already public features I think.

mlir/include/mlir/Transforms/Passes.h
79

This will create false expectations that other sibling ops won't be modified, which isn't true for non-isolated operation at least.

mlir/include/mlir/Transforms/GreedyPatternRewriteDriver.h lines 83-95 for some doc on this (with an alternative API).

Marking as blocking because the design intent and goals here are not clear to me. This looks like it is going to create mega-passes, which is not something I'd like to support.

You're talking about the listener part right? The initializeCanonicalizer/canonicalizeOperations are just small wrappers around already public features I think.

The listeners yes, the contract there is non obvious. Also the exposing of passes as utilities. The general benefits of running a full canonicalize and CSE in the same pass aren't obvious to me, especially given the ability to run nested passes.

The general benefits of running a full canonicalize and CSE in the same pass aren't obvious to me, especially given the ability to run nested passes.

I agree, though I'd mention that if you do something like "loop unrolling" you may want to cleanup/canonicalize the loop body afterward, without reprocessing the entire module.

Mogball added a comment.EditedDec 22 2021, 9:06 PM

I agree, though I'd mention that if you do something like "loop unrolling" you may want to cleanup/canonicalize the loop body afterward, without reprocessing the entire module.

Mehdi's example is one of the specific use cases I have in mind. However, the purpose of the listeners is to keep tabs on certain operations between transformations and canonicalization/CSE. For example, if I want to tile a matmul and then vectorize it but in between these transformations I want to run CSE and canonicalizer.

The catch is trying not to apply the vectorize pattern indiscriminately, but on specific operations created by the tile. The workaround has been to tag these ops with a specific attribute but the discussion around non discardable attributes never solidified - they might get dropped by other passes. With listeners, the operations can be saved in a set and removed/added as the passes erase or replace them, to be fed into the next transformation.

> With listeners, the operations can be saved in a set and removed/added as the passes erase or replace them, to be fed into the next transformation.

This whole approach is actually bypassing the entire pass infrastructure, which is the crux of River's objection: I don't think there is an agreement that this is considered a good design in the first place.

Now I also understand that the traditional whole pass and pass ordering approach isn't suitable for modern codegen strategies, however that does not mean that we should just throw everything away and move forward without any design or strong foundations to built on. Right now I see what you're doing as enabling a feature but without providing the infra that will support it at scale.

@mehdi_amini Other potential issue is that canonicalization can open new opportunities for CSE and CSE can open new opportunities for canonicalization so canonicalize->CSE->canonicalize->CSE->... sequence must be run arbitrary number of times.
How this is supposed to be handled in existing pass infra?

> With listeners, the operations can be saved in a set and removed/added as the passes erase or replace them, to be fed into the next transformation.

This whole approach is actually bypassing the entire pass infrastructure, which is the crux of River's objection: I don't think there is an agreement that this is considered a good design in the first place.

Now I also understand that the traditional whole pass and pass ordering approach isn't suitable for modern codegen strategies, however that does not mean that we should just throw everything away and move forward without any design or strong foundations to built on. Right now I see what you're doing as enabling a feature but without providing the infra that will support it at scale.

I don't disagree with this assessment. I threw up these patches lowkey hoping it would start some discussion. We can kick this down the road until the holidays are over and talk about it in a meeting.

@mehdi_amini Other potential issue is that canonicalization can open new opportunities for CSE and CSE can open new opportunities for canonicalization so canonicalize->CSE->canonicalize->CSE->... sequence must be run arbitrary number of times.
How this is supposed to be handled in existing pass infra?

Using a dynamic pipeline: you create a pass that is CSECanonicalizerPass and this pass while. Take a hash of the current sub-IR it operates on, run a dynamic pipeline that contains canonicalize, cse, rehash the IR, if it changed just return the dynamic pipeline.

bondhugula added inline comments.
mlir/lib/Transforms/Canonicalizer.cpp
92

Won't this work only on ops that are IsolatedFromAbove?

Take a hash of the current sub-IR it operates on

This doesn't look very efficient :)
Other side question - why mlir passes doesn't return flag indicating if they actually changed something or not in IR (like llvm passes runOn return bool)? This would be quite useful for situations like this (and for other things like automatically preserving all analyses).

Take a hash of the current sub-IR it operates on

This doesn't look very efficient :)

Because everything is internalized in the context: we only ever have to hash some pointers. Most of the cost is basically the walk itself.

Other side question - why mlir passes doesn't return flag indicating if they actually changed something or not in IR (like llvm passes runOn return bool)? This would be quite useful for situations like this (and for other things like automatically preserving all analyses).

I believe the new pass manager in LLVM has passes with a signature that looks like this: PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);

That said it would be possible to model this with an analysis. For example the current IR hash could be an analysis in itself (or you could make any "fake" analysis actually) and passes calling markAllAnalysesPreserved(); would preserve it (CSE already calls this).

uenoku added a subscriber: uenoku.Dec 29 2021, 2:36 AM
Mogball abandoned this revision.Jan 28 2022, 3:04 PM