This is an archive of the discontinued LLVM Phabricator instance.

Allow dumping of OpToOpAdaptorPass.
Needs ReviewPublic

Authored by rainwoodman on Jun 2 2023, 4:47 PM.

Details

Summary

After this change, MLIR will trigger Instrumentation callbacks on OpToOpAdaptorPass itself with the module as GetOperation(). A new GetLongName() method is added to Pass to allow formatting an informative name (like OpToOpAdapterPass[[FunctionalToExecutorConversionPass]]) that is used for dumping. Note that OpToOpAdaptorPass is in mlir::detail and cannot be used in user subclasses of IR printing Config.

Before this change, MLIR dumping of OpToOpAdaptor passes are only conducted on individual container nodes (such as FuncOp). This becomes verbose when we dump DTensor MLIR rewrites of relatively large graphs from real TensorFlow models.

The user can distinguish between an OpToOpAdaptor full pass callback or a per FuncOp callback by checking if the Op is a module op. This behavior is not as straightforward as I'd like it to be (and I'd like to get some advises from the MLIR maintainers).

We have been using this patch with locally in debugging DTensor: the change reduced the number of dumps by a few thousand, which translated to a great time saving when the dumping was to a slower remote file system.

Diff Detail

Event Timeline

rainwoodman created this revision.Jun 2 2023, 4:47 PM
Herald added a project: Restricted Project. · View Herald Transcript
rainwoodman requested review of this revision.Jun 2 2023, 4:47 PM
rainwoodman retitled this revision from BEGIN_PUBLIC Allow dumping of OpToOpAdaptorPass. to Allow dumping of OpToOpAdaptorPass..Jun 4 2023, 10:18 AM
rainwoodman edited the summary of this revision. (Show Details)
rainwoodman added a subscriber: jpienaar.

Could there be one that's the format too? (More documentative than a test really)

mlir/lib/Pass/Pass.cpp
668

Could the format in the same way as the reproducer? (Ideally we'd have something that we could just copy and paste for testing)

705

Style in MLIR is to elide trivial braces.

793

Same

mlir/lib/Pass/PassDetail.h
81

Doxygen Comment?

rainwoodman edited the summary of this revision. (Show Details)
  • Update name style.
  • Add docstring to GetLongName.
rainwoodman marked 3 inline comments as done.Jul 6 2023, 3:45 PM
rainwoodman added inline comments.
mlir/lib/Pass/Pass.cpp
668

I've updated the unit tests with examples how the names look like. (And updated the pass's 'overall' name to OpToOpPassAdaptor.

Note that the unit test only covered the case where there is only one pass manager and one pass in the pass manager. Could you take another look if this is what you had in mind?

I don't get the problem this is trying to solve right now.

Before this change, MLIR dumping of OpToOpAdaptor passes are only conducted on individual container nodes (such as FuncOp). This becomes verbose when we dump DTensor MLIR rewrites of relatively large graphs from real TensorFlow models.

I don't understand what this means...

The user can distinguish between an OpToOpAdaptor full pass callback or a per FuncOp callback by checking if the Op is a module op. This behavior is not as straightforward as I'd like it to be (and I'd like to get some advises from the MLIR maintainers).

This does not make sense to me: won't there be an OpToOpAdaptor attached to the FuncOp as well?

The way OpToOpAdaptor works is that it runs the FuncOp pass on the FuncOp
nodes one by one.

Before this change the only dumping mechanism for nesting pass is dumping
for each FuncOp pass. For a complex graph with thousands of FuncOp this
creates thousands of almost identical dumps for every single nested pass.
Serval usability issues rise from the large number of files created. This
is the problem.

The change here offers an option to dump the OpToOpAdapter pass itself,
which runs on the ModuleOp and only dumps once per nested pass. Thus
effectively user now gains the freedom to choose dump per FuncOp pass, per
ModuleOp pass, or both for a OpToOpAdapter pass.

There is some added complication in GetLongName to handle the case where an
OpToOpAdaptor is fused from multiple FuncOp passes.

Thanks for elaborating, I understand how it could be useful from this point of view.

There is some added complication in GetLongName to handle the case where an OpToOpAdaptor is fused from multiple FuncOp passes.

This is something that is still not clear to me: how do you get to print "for every nested pass" since the pass manager will fuse everything in a single OpToOpAdaptor?

Another question I have is how to use this: seems like this won't mesh well with the printBeforeIfEnabled, printAfterIfEnabled, etc. mechanisms used in the IRPrinter.

wrengr added inline comments.Jul 20 2023, 1:00 PM
mlir/lib/Pass/IRPrinting.cpp
67

extra space seems like a typo?

mlir/lib/Pass/Pass.cpp
665

Nit: should use single-quotes instead. Ditto for the square brackets below.

rainwoodman added a comment.EditedJul 24 2023, 9:37 AM

Thanks for elaborating, I understand how it could be useful from this point of view.

There is some added complication in GetLongName to handle the case where an OpToOpAdaptor is fused from multiple FuncOp passes.

This is something that is still not clear to me: how do you get to print "for every nested pass" since the pass manager will fuse everything in a single OpToOpAdaptor?

Another question I have is how to use this: seems like this won't mesh well with the printBeforeIfEnabled, printAfterIfEnabled, etc. mechanisms used in the IRPrinter.

I guess this example explains both questions:

The combined pass would be dumping on "operation is mlir::ModuleOp"; the per object nested pass runs would dump on "operation is mlir::func::FuncOp" (the type used in AddNestedPass).

void printAfterIfEnabled(mlir::Pass *pass, mlir::Operation *operation,
                         PrintCallbackFn print_callback) override {
  mlir::ModuleOp module = mlir::dyn_cast<mlir::ModuleOp>(operation);
  // Only dumps the top level module and skips intermediate steps of nested
  // passes.
  if (!module) {
    return;
  }
  if (!module->hasAttr(dtensor::kDoNotLog) && !do_not_print_)
    BridgeLoggerConfig::printAfterIfEnabled(pass, operation, print_callback);
}

Even though the nested passes are fused/combined, I think it affects the concurrency more than the dumping logic. I couldn't quite find the code hooks for triggering the prints, but I remember the print hooks was either triggered (per instance, all combined passes) , or per (pass, instance) pair.

  • Use single quote.
  • Remove extra space.
rainwoodman marked 2 inline comments as done.Jul 24 2023, 9:46 AM
mehdi_amini added inline comments.Jul 24 2023, 10:59 PM
mlir/include/mlir/Pass/Pass.h
73

Instead of this, what about changing things this way:

  • Add a field std::string cachedName; to OpToOpPassAdaptor
  • Change getAdaptorName() to check if cachedName is empty and otherwise populate it.
  • Override getName() to return cachedName.
mlir/lib/Pass/IRPrinting.cpp
66

Maybe what we need is a different hook for the instrumentations: we have runBeforePass and runAfterPass but we could also have runBeforeAdaptor and runAfterAdaptor which would offer a cleaner separation of intent here.

rainwoodman added inline comments.Jul 26 2023, 1:02 PM
mlir/lib/Pass/IRPrinting.cpp
66

If we do that then the logic for skipping the per (nested pass, FuncOp) dump and for dumping the Adaptor whole module dump will be scattered in both functions. What about lifting OpToOpPassAdaptor out of details:: such that users can do a type check inside run[Before|After]Pass?

rriddle added inline comments.Jul 27 2023, 10:30 PM
mlir/lib/Pass/IRPrinting.cpp
66

We have runBeforePipeline/runAfterPipeline, I agree with Mehdi that we should look at expanding this, like runBeforePipelineCollection/runAfterPipelineCollection. The PassAdaptor classes are internal implementation details and we should keep them hidden as much as possible.