diff --git a/mlir/docs/DialectConversion.md b/mlir/docs/DialectConversion.md --- a/mlir/docs/DialectConversion.md +++ b/mlir/docs/DialectConversion.md @@ -151,12 +151,12 @@ ## Rewrite Pattern Specification After the conversion target has been defined, a set of legalization patterns -must be provided to transform illegal operations into legal ones. The structure -of the patterns supplied here is the same as those described in the -[quickstart rewrites guide](Tutorials/QuickstartRewrites.md#adding-patterns). -The patterns provided do not need to generate operations that are directly legal -on the target. The framework will automatically build a graph of conversions to -convert non-legal operations into a set of legal ones. +must be provided to transform illegal operations into legal ones. The patterns +supplied here have the same structure and restrictions as those described in the +main [Pattern](PatternRewriter.md) documentation. The patterns provided do not +need to generate operations that are directly legal on the target. The framework +will automatically build a graph of conversions to convert non-legal operations +into a set of legal ones. As an example, say you define a target that supports one operation: `foo.add`. When providing the following patterns: [`bar.add` -> `baz.add`, `baz.add` -> diff --git a/mlir/docs/GenericDAGRewriter.md b/mlir/docs/GenericDAGRewriter.md deleted file mode 100644 --- a/mlir/docs/GenericDAGRewriter.md +++ /dev/null @@ -1,415 +0,0 @@ -# Generic DAG Rewriter Infrastructure - -## Introduction and Motivation - -The goal of a compiler IR is to represent code - at various levels of -abstraction which pose different sets of tradeoffs in terms of representational -capabilities and ease of transformation. However, the ability to represent code -is not itself very useful - you also need to be able to implement those -transformations. 
- -There are many different sorts of compiler transformations, but this document -focuses on a particularly important class of transformation that comes up -repeatedly at scale, and is important for the immediate goals of MLIR: that of -pattern matching on a set of operations and replacing with another set. This is -the key algorithm required to implement the "op fission" algorithm used by the -tf2xla bridge, pattern matching rewrites from TF ops to TF/Lite, peephole -optimizations like "eliminate identity nodes" or "replace x+0 with x", as well -as a useful abstraction to implement optimization algorithms for MLIR graphs at -all levels. - -A particular strength of MLIR (and a major difference vs other compiler -infrastructures like LLVM, GCC, XLA, TensorFlow, etc) is that it uses a single -compiler IR to represent code at multiple levels of abstraction: an MLIR -operation can be a "TensorFlow operation", an "XLA HLO", a "TF Lite -FlatBufferModel op", a TPU LLO instruction, an LLVM IR instruction (transitively -including X86, Lanai, CUDA, and other target specific instructions), or anything -else that the MLIR type system can reasonably express. Because MLIR spans such a -wide range of different problems, a single infrastructure for performing -graph-to-graph rewrites can help solve many diverse domain challenges, including -TensorFlow graph level down to the machine code level. - -[Static single assignment](https://en.wikipedia.org/wiki/Static_single_assignment_form) -(SSA) representations like MLIR make it easy to access the operands and "users" -of an operation. As such, a natural abstraction for these graph-to-graph -rewrites is that of DAG pattern matching: clients define DAG tile patterns, and -each pattern includes a result DAG to produce and the cost of the result (or, -inversely, the benefit of doing the replacement). A common infrastructure -efficiently finds and perform the rewrites. - -While this concept is simple, the details are more nuanced. 
This proposal -defines and explores a set of abstractions that we feel can solve a wide range -of different problems, and can be applied to many different sorts of problems -that MLIR is - and is expected to - face over time. We do this by separating the -pattern definition and matching algorithm from the "driver" of the computation -loop, and make space for the patterns to be defined declaratively in the future. - -## Related Work - -There is a huge amount of related work to consider, given that pretty much every -compiler in existence has to solve this problem many times over. Here are a few -graph rewrite systems we have used, along with the pros and cons of this related -work. One unifying problem with all of these is that these systems are only -trying to solve one particular and usually narrow problem: our proposal would -like to solve many of these problems with a single infrastructure. Of these, the -most similar design to our proposal is the LLVM DAG-to-DAG instruction selection -algorithm at the end. - -### Constant folding - -A degenerate but pervasive case of DAG-to-DAG pattern matching is constant -folding: given an operation whose operands contain constants can often be folded -to a result constant value. - -MLIR already has constant folding routines which provide a simpler API than a -general DAG-to-DAG pattern matcher, and we expect it to remain because the -simpler contract makes it applicable in some cases that a generic matcher would -not. For example, a DAG-rewrite can remove arbitrary nodes in the current -function, which could invalidate iterators. Constant folding as an API does not -remove any nodes, it just provides a (list of) constant values and allows the -clients to update their data structures as necessary. - -### AST-Level Pattern Matchers - -The literature is full of source-to-source translators which transform -identities in order to improve performance (e.g. transforming `X*0` into `0`). 
-One large example that I'm aware of is the GCC `fold` function, which performs -[many optimizations](https://github.com/gcc-mirror/gcc/blob/master/gcc/fold-const.c) -on ASTs. Clang has -[similar routines](http://releases.llvm.org/3.5.0/tools/clang/docs/InternalsManual.html#constant-folding-in-the-clang-ast) -for simple constant folding of expressions (as required by the C++ standard) but -doesn't perform general optimizations on its ASTs. - -The primary downside of tree optimizers is that you can't see across operations -that have multiple uses. It is -[well known in literature](https://llvm.org/pubs/2008-06-LCTES-ISelUsingSSAGraphs.pdf) -that DAG pattern matching is more powerful than tree pattern matching, but OTOH, -DAG pattern matching can lead to duplication of computation which needs to be -checked for. - -### "Combiners" and other peephole optimizers - -Compilers end up with a lot of peephole optimizers for various things, e.g. the -GCC -["combine" routines](https://github.com/gcc-mirror/gcc/blob/master/gcc/combine.c) -(which try to merge two machine instructions into a single one), the LLVM -[Inst Combine](http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/) -[pass](https://llvm.org/docs/Passes.html#instcombine-combine-redundant-instructions), -LLVM's -[DAG Combiner](https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/DAGCombiner.cpp), -the Swift compiler's -[SIL Combiner](https://github.com/apple/swift/tree/master/lib/SILOptimizer/SILCombiner), -etc. These generally match one or more operations and produce zero or more -operations as a result. The LLVM -[Legalization](http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/) -infrastructure has a different outer loop but otherwise works the same way. - -These passes have a lot of diversity, but also have a unifying structure: they -mostly have a worklist outer loop which visits operations. 
They then use the C++ -visitor pattern (or equivalent) to switch over the class of operation and -dispatch to a method. That method contains a long list of hand-written C++ code -that pattern-matches various special cases. LLVM introduced a "match" function -that allows writing patterns in a somewhat more declarative style using template -metaprogramming (MLIR has similar facilities). Here's a simple example: - -```c++ - // Y - (X + 1) --> ~X + Y - if (match(Op1, m_OneUse(m_Add(m_Value(X), m_One())))) - return BinaryOperator::CreateAdd(Builder.CreateNot(X), Op0); -``` - -Here is a somewhat more complicated one (this is not the biggest or most -complicated :) - -```c++ - // C2 is ODD - // LHS = XOR(Y,C1), Y = AND(Z,C2), C1==(C2+1) => LHS == NEG(OR(Z, ~C2)) - // ADD(LHS, RHS) == SUB(RHS, OR(Z, ~C2)) - if (match(LHS, m_Xor(m_Value(Y), m_APInt(C1)))) - if (C1->countTrailingZeros() == 0) - if (match(Y, m_And(m_Value(Z), m_APInt(C2))) && *C1 == (*C2 + 1)) { - Value NewOr = Builder.CreateOr(Z, ~(*C2)); - return Builder.CreateSub(RHS, NewOr, "sub"); - } -``` - -These systems are simple to set up, and pattern matching templates have some -advantages (they are extensible for new sorts of sub-patterns, look compact at -point of use). OTOH, they have lots of well known problems, for example: - -* These patterns are very error prone to write, and contain lots of - redundancies. -* The IR being matched often has identities (e.g. when matching commutative - operators) and the C++ code has to handle it manually - take a look at - [the full code](http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp?view=markup#l775) - for checkForNegativeOperand that defines the second pattern). -* The matching code compiles slowly, both because it generates tons of code - and because the templates instantiate slowly. -* Adding new patterns (e.g. for count leading zeros in the example above) is - awkward and doesn't often happen. 
-* The cost model for these patterns is not really defined - it is emergent - based on the order the patterns are matched in code. -* They are non-extensible without rebuilding the compiler. -* It isn't practical to apply theorem provers and other tools to these - patterns - they cannot be reused for other purposes. - -In addition to structured "combiners" like these, there are lots of ad-hoc -systems like the -[LLVM Machine code peephole optimizer](http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/PeepholeOptimizer.cpp?view=markup) -which are related. - -### LLVM's DAG-to-DAG Instruction Selection Infrastructure - -The instruction selection subsystem in LLVM is the result of many years worth of -iteration and discovery, driven by the need for LLVM to support code generation -for lots of targets, the complexity of code generators for modern instruction -sets (e.g. X86), and the fanatical pursuit of reusing code across targets. Eli -wrote a -[nice short overview](https://eli.thegreenplace.net/2013/02/25/a-deeper-look-into-the-llvm-code-generator-part-1) -of how this works, and the -[LLVM documentation](https://llvm.org/docs/CodeGenerator.html#select-instructions-from-dag) -describes it in more depth including its advantages and limitations. It allows -writing patterns like this. - -``` -def : Pat<(or GR64:$src, (not (add GR64:$src, 1))), - (BLCI64rr GR64:$src)>; -``` - -This example defines a matcher for the -["blci" instruction](https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets#TBM_\(Trailing_Bit_Manipulation\)) -in the -[X86 target description](http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.td?view=markup), -there are many others in that file (look for `Pat<>` patterns, since they aren't -entangled in details of the compiler like assembler/disassembler generation -logic). - -For our purposes, there is much to like about this system, for example: - -* It is defined in a declarative format. 
-* It is extensible to target-defined operations. -* It automates matching across identities, like commutative patterns. -* It allows custom abstractions and intense factoring of target-specific - commonalities. -* It generates compact code - it compiles into a state machine, which is - interpreted. -* It allows the instruction patterns to be defined and reused for multiple - purposes. -* The patterns are "type checked" at compile time, detecting lots of bugs - early and eliminating redundancy from the pattern specifications. -* It allows the use of general C++ code for weird/complex cases. - -While there is a lot that is good here, there is also a lot of bad things: - -* All of this machinery is only applicable to instruction selection. Even - directly adjacent problems like the DAGCombiner and Legalizer can't use it. -* This isn't extensible at compiler runtime, you have to rebuild the compiler - to extend it. -* The error messages when failing to match a pattern - [are not exactly optimal](https://www.google.com/search?q=llvm+cannot+select). -* It has lots of implementation problems and limitations (e.g. can't write a - pattern for a multi-result operation) as a result of working with the - awkward SelectionDAG representation and being designed and implemented - lazily. -* This stuff all grew organically over time and has lots of sharp edges. - -### Summary - -MLIR will face a wide range of pattern matching and graph rewrite problems, and -one of the major advantages of having a common representation for code at -multiple levels that it allows us to invest in - and highly leverage - a single -infra for doing this sort of work. - -## Goals - -This proposal includes support for defining pattern matching and rewrite -algorithms on MLIR. We'd like these algorithms to encompass many problems in the -MLIR space, including 1-to-N expansions (e.g. 
as seen in the TF/XLA bridge when -lowering a "tf.AddN" to multiple "add" HLOs), M-to-1 patterns (as seen in -Grappler optimization passes, e.g. that convert multiple/add into a single -muladd op), as well as general M-to-N patterns (e.g. instruction selection for -target instructions). Patterns should have a cost associated with them, and the -common infrastructure should be responsible for sorting out the lowest cost -match for a given application. - -We separate the task of picking a particular locally optimal pattern from a -given root node, the algorithm used to rewrite an entire graph given a -particular set of goals, and the definition of the patterns themselves. We do -this because DAG tile pattern matching is NP complete, which means that there -are no known polynomial time algorithms to optimally solve this problem. -Additionally, we would like to support iterative rewrite algorithms that -progressively transform the input program through multiple steps. Furthermore, -we would like to support many different sorts of clients across the MLIR stack, -and they may have different tolerances for compile time cost, different demands -for optimality, and other algorithmic goals or constraints. - -We aim for MLIR transformations to be easy to implement and reduce the -likelihood for compiler bugs. We expect there to be a very very large number of -patterns that are defined over time, and we believe that these sorts of patterns -will have a very large number of legality/validity constraints - many of which -are difficult to reason about in a consistent way, may be target specific, and -whose implementation may be particularly bug-prone. As such, we aim to design the -API around pattern definition to be simple, resilient to programmer errors, and -allow separation of concerns between the legality of the nodes generated from -the idea of the pattern being defined. 
- -Finally, error handling is a topmost concern: in addition to allowing patterns -to be defined in a target-independent way that may not apply for all hardware, -we also want failure for any pattern to match to be diagnosable in a reasonable -way. To be clear, this is not a solvable problem in general - the space of -malfunction is too great to be fully enumerated and handled optimally, but there -are better and worse ways to handle the situation. MLIR is already designed to -represent the provenance of an operation well. This project aims to propagate -that provenance information precisely, as well as diagnose pattern match -failures with the rationale for why a set of patterns do not apply. - -### Non goals - -This proposal doesn't aim to solve all compiler problems, it is simply a -DAG-to-DAG pattern matching system, starting with a greedy driver algorithm. -Compiler algorithms that require global dataflow analysis (e.g. common -subexpression elimination, conditional constant propagation, and many many -others) will not be directly solved by this infrastructure. - -This proposal is limited to DAG patterns, which (by definition) prevent the -patterns from seeing across cycles in a graph. In an SSA-based IR like MLIR, -this means that these patterns don't see across PHI nodes / basic block -arguments. We consider this acceptable given the set of problems we are trying -to solve - we don't know of any other system that attempts to do so, and -consider the payoff of worrying about this to be low. - -This design includes the ability for DAG patterns to have associated costs -(benefits), but those costs are defined in terms of magic numbers (typically -equal to the number of nodes being replaced). For any given application, the -units of magic numbers will have to be defined. - -## Overall design - -We decompose the problem into four major pieces: - -1. the code that is used to define patterns to match, cost, and their - replacement actions -1. 
the driver logic to pick the best match for a given root node -1. the client that is implementing some transformation (e.g. a combiner) -1. (future) the subsystem that allows patterns to be described with a - declarative syntax, which sugars step #1. - -We sketch the first three of these pieces, each in turn. This is not intended to -be a concrete API proposal, merely to describe the design - -### Defining Patterns - -Each pattern will be an instance of a mlir::Pattern class, whose subclasses -implement methods like this. Note that this API is meant for exposition, the -actual details are different for efficiency and coding standards reasons (e.g. -the memory management of `PatternState` is not specified below, etc): - -```c++ -class Pattern { - /// Return the benefit (the inverse of "cost") of matching this pattern. The - /// benefit of a Pattern is always static - rewrites that may have dynamic - /// benefit can be instantiated multiple times (different Pattern instances) - /// for each benefit that they may return, and be guarded by different match - /// condition predicates. - PatternBenefit getBenefit() const { return benefit; } - - /// Return the root node that this pattern matches. Patterns that can - /// match multiple root types are instantiated once per root. - OperationName getRootKind() const { return rootKind; } - - /// Attempt to match against code rooted at the specified operation, - /// which is the same operation code as getRootKind(). On failure, this - /// returns a None value. On success it a (possibly null) pattern-specific - /// state wrapped in a Some. This state is passed back into its rewrite - /// function if this match is selected. - virtual Optional match(Operation *op) const = 0; - - /// Rewrite the IR rooted at the specified operation with the result of - /// this pattern, generating any new operations with the specified - /// rewriter. 
If an unexpected error is encountered (an internal - /// compiler error), it is emitted through the normal MLIR diagnostic - /// hooks and the IR is left in a valid state. - virtual void rewrite(Operation *op, PatternState *state, - PatternRewriter &rewriter) const; -}; -``` - -In practice, the first patterns we implement will directly subclass and -implement this stuff, but we will define some helpers to reduce boilerplate. -When we have a declarative way to describe patterns, this should be -automatically generated from the description. - -Instances of `Pattern` have a benefit that is static upon construction of the -pattern instance, but may be computed dynamically at pattern initialization -time, e.g. allowing the benefit to be derived from domain specific information, -like the target architecture). This limitation allows us MLIR to (eventually) -perform pattern fusion and compile patterns into an efficient state machine, and -[Thier, Ertl, and Krall](https://dl.acm.org/citation.cfm?id=3179501) have shown -that match predicates eliminate the need for dynamically computed costs in -almost all cases: you can simply instantiate the same pattern one time for each -possible cost and use the predicate to guard the match. - -The two-phase nature of this API (match separate from rewrite) is important for -two reasons: 1) some clients may want to explore different ways to tile the -graph, and only rewrite after committing to one tiling. 2) We want to support -runtime extensibility of the pattern sets, but want to be able to statically -compile the bulk of known patterns into a state machine at "compiler compile -time". Both of these reasons lead to us needing to match multiple patterns -before committing to an answer. - -### Picking and performing a replacement - -In the short term, this API can be very simple, something like this can work and -will be useful for many clients: - -```c++ -class PatternMatcher { - // Create a pattern matcher with a bunch of patterns. 
This constructor - // looks across all of the specified patterns, and builds an internal - // data structure that allows efficient matching. - PatternMatcher(ArrayRef patterns); - - // Given a specific operation, see if there is some rewrite that is - // interesting. If so, return success and return the list of new - // operations that were created. If not, return failure. - bool matchAndRewrite(Operation *op, - SmallVectorImpl &newlyCreatedOps); -}; -``` - -In practice the interesting part of this class is the acceleration structure it -builds internally. It buckets up the patterns by root operation, and sorts them -by their static benefit. When performing a match, it tests any dynamic patterns, -then tests statically known patterns from highest to lowest benefit. - -### First Client: A Greedy Worklist Combiner - -We expect that there will be lots of clients for this, but a simple greedy -worklist-driven combiner should be powerful enough to serve many important ones, -including the -[TF2XLA op expansion logic](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/tf2xla/kernels), -many of the pattern substitution passes of the -[TOCO compiler](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/toco) -for TF-Lite, many -[Grappler](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/grappler) -passes, and other general performance optimizations for applying identities. - -The structure of this algorithm is straight-forward, here is pseudo code: - -* Walk a function in preorder, adding each operation to a worklist. -* While the worklist is non-empty, pull something off the back (processing - things generally in postorder) - * Perform matchAndRewrite on the operation. If failed, continue to the - next operation. - * On success, add the newly created ops to the worklist and continue. 
-
-## Future directions
-
-It is important to get implementation and usage experience with this, and many
-patterns can be defined using this sort of framework. Over time, we can look to
-make it easier to declare patterns in a declarative form (e.g. with the LLVM
-tblgen tool or something newer/better). Once we have that, we can define an
-internal abstraction for describing the patterns to match, allowing better high
-level optimization of patterns (including fusion of the matching logic across
-patterns, which the LLVM instruction selector does) and allow the patterns to be
-defined without rebuilding the compiler itself.
diff --git a/mlir/docs/PatternRewriter.md b/mlir/docs/PatternRewriter.md
new file mode 100644
--- /dev/null
+++ b/mlir/docs/PatternRewriter.md
@@ -0,0 +1,246 @@
+# Pattern Rewriting : Generic DAG-to-DAG Rewriting
+
+[TOC]
+
+This document details the design and API of the pattern rewriting infrastructure
+present in MLIR, a general DAG-to-DAG transformation framework. This framework
+is widely used throughout MLIR for canonicalization, conversion, and general
+transformation.
+
+For an introduction to DAG-to-DAG transformation, and the rationale behind this
+framework, please take a look at the
+[Generic DAG Rewriter Rationale](Rationale/RationaleGenericDAGRewriter.md).
+
+## Introduction
+
+The pattern rewriting framework can largely be decomposed into two parts:
+Pattern Definition and Pattern Application.
+
+## Defining Patterns
+
+Patterns are defined by inheriting from the `RewritePattern` class. This class
+represents the base class of all rewrite patterns within MLIR, and is comprised
+of the following components:
+
+### Benefit
+
+This is the expected benefit of applying a given pattern. This benefit is static
+upon construction of the pattern, but may be computed dynamically at pattern
+initialization time, e.g. allowing the benefit to be derived from domain
+specific information, like the target architecture. 
This limitation allows for +performing pattern fusion and compiling patterns into an efficient state +machine, and +[Thier, Ertl, and Krall](https://dl.acm.org/citation.cfm?id=3179501) have shown +that match predicates eliminate the need for dynamically computed costs in +almost all cases: you can simply instantiate the same pattern one time for each +possible cost and use the predicate to guard the match. + +### Root Operation Name (Optional) + +The name of the root operation that this pattern matches against. If specified, +only operations with the given root name will be provided to the `match` and +`rewrite` implementation. If not specified, any operation type may be provided. +The root operation name should be provided whenever possible, because it +simplifies the analysis of patterns when applying a cost model. To match any +operation type, a special tag must be provided to make the intent explicit: +`MatchAnyOpTypeTag`. + +### `match` and `rewrite` implementation + +This is the chunk of code that matches a given root `Operation` and performs a +rewrite of the IR. A `RewritePattern` can specify this implementation either via +separate `match` and `rewrite` methods, or via a combined `matchAndRewrite` +method. When using the combined `matchAndRewrite` method, no IR mutation should +take place before the match is deemed successful. See below for examples: + +```c++ +class MyPattern : public RewritePattern { +public: + /// This overload constructs a pattern that only matches operations with the + /// root name of `MyOp`. + MyPattern(PatternBenefit benefit, MLIRContext *context) + : RewritePattern(MyOp::getOperationName(), benefit, context) {} + /// This overload constructs a pattern that matches any operation type. + MyPattern(PatternBenefit benefit) + : RewritePattern(benefit, MatchAnyOpTypeTag()) {} + + /// In this section, the `match` and `rewrite` implementation is specified + /// using the separate hooks. 
+  LogicalResult match(Operation *op) const override {
+    // The `match` method returns `success()` if the pattern is a match, and
+    // `failure()` otherwise.
+    // ...
+  }
+  void rewrite(Operation *op, PatternRewriter &rewriter) const override {
+    // The `rewrite` method performs mutations on the IR rooted at `op` using
+    // the provided rewriter. All mutations must go through the provided
+    // rewriter.
+  }
+
+  /// In this section, the `match` and `rewrite` implementation is specified
+  /// using a single hook.
+  LogicalResult matchAndRewrite(Operation *op,
+                                PatternRewriter &rewriter) const override {
+    // The `matchAndRewrite` method performs both the matching and the mutation.
+    // Note that the match must reach a successful point before IR mutation may
+    // take place.
+  }
+};
+```
+
+#### Restrictions
+
+Within the `match` section of a pattern, the following constraints apply:
+
+* No mutation of the IR is allowed.
+
+Within the `rewrite` section of a pattern, the following constraints apply:
+
+* All IR mutations, including creation, *must* be performed by the given
+  `PatternRewriter`. This class provides hooks for performing all of the
+  possible mutations that may take place within a pattern. For example, this
+  means that an operation should not be erased via its `erase` method. To
+  erase an operation, the appropriate `PatternRewriter` hook (in this case
+  `eraseOp`) should be used instead.
+* The root operation is required to either be: updated in-place, replaced, or
+  erased.
+
+### Pattern Rewriter
+
+A `PatternRewriter` is a special class that allows for a pattern to communicate
+with the driver of pattern application. As noted above, *all* IR mutations,
+including creations, are required to be performed via the `PatternRewriter`
+class. This is required because the underlying pattern driver may have state
+that would be invalidated when a mutation takes place. 
Some of the more prevalent
+`PatternRewriter` API is shown below; please refer to the
+[class documentation](https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/IR/PatternMatch.h#L235)
+for an up-to-date listing of the available API:
+
+* Erase an Operation : `eraseOp`
+
+This method erases an operation that either has no results, or whose results are
+all known to have no uses.
+
+* Notify why a `match` failed : `notifyMatchFailure`
+
+This method allows for providing a diagnostic message within a `matchAndRewrite`
+as to why a pattern failed to match. How this message is displayed back to the
+user is determined by the specific pattern driver.
+
+* Replace an Operation : `replaceOp`/`replaceOpWithNewOp`
+
+These methods replace an operation's results with a set of provided values, and
+erase the operation.
+
+* Update an Operation in-place : `(start|cancel|finalize)RootUpdate`
+
+This is a collection of methods that provide a transaction-like API for updating
+the attributes, location, operands, or successors of an operation in-place
+within a pattern. An in-place update transaction is started with
+`startRootUpdate`, and may either be canceled or finalized with
+`cancelRootUpdate` and `finalizeRootUpdate` respectively. A convenience wrapper,
+`updateRootInPlace`, is provided that wraps a `start` and `finalize` around a
+callback.
+
+* OpBuilder API
+
+The `PatternRewriter` inherits from the `OpBuilder` class, and thus provides all
+of the same functionality present within an `OpBuilder`. This includes operation
+creation, as well as many useful attribute and type construction methods.
+
+## Pattern Application
+
+After a set of patterns have been defined, they are collected and provided to a
+specific driver for application. A driver consists of several high-level parts:
+
+* Input `OwningRewritePatternList`
+
+The input patterns to a driver are provided in the form of an
+`OwningRewritePatternList`. 
This class provides a simplified API for building a
+list of patterns.
+
+* Driver-specific `PatternRewriter`
+
+To ensure that the driver state does not become invalidated by IR mutations
+within the pattern rewriter, a driver must provide a `PatternRewriter` instance
+with the necessary hooks overridden. If a driver does not need to hook into
+certain mutations, a default implementation is provided that will perform the
+mutation directly.
+
+* Pattern Application and Cost Model
+
+Each driver is responsible for defining its own operation visitation order as
+well as its pattern cost model, but the final application is performed via a
+`PatternApplicator` class. This class takes as input the
+`OwningRewritePatternList` and transforms the patterns based upon a provided
+cost model. This cost model computes a final benefit for a given rewrite
+pattern, using whatever driver-specific information is necessary. After a cost
+model has been computed, the driver may begin to match patterns against
+operations using `PatternApplicator::matchAndRewrite`.
+
+An example is shown below:
+
+```c++
+class MyPattern : public RewritePattern {
+public:
+  MyPattern(PatternBenefit benefit, MLIRContext *context)
+      : RewritePattern(MyOp::getOperationName(), benefit, context) {}
+};
+
+/// Populate the pattern list.
+void collectMyPatterns(OwningRewritePatternList &patterns, MLIRContext *ctx) {
+  patterns.insert<MyPattern>(/*benefit=*/1, ctx);
+}
+
+/// Define a custom PatternRewriter for use by the driver.
+class MyPatternRewriter : public PatternRewriter {
+public:
+  MyPatternRewriter(MLIRContext *ctx) : PatternRewriter(ctx) {}
+
+  /// Override the necessary PatternRewriter hooks here.
+};
+
+/// Apply the custom driver to `op`.
+void applyMyPatternDriver(Operation *op,
+                          const OwningRewritePatternList &patterns) {
+  // Initialize the custom PatternRewriter.
+  MyPatternRewriter rewriter(op->getContext());
+
+  // Create the applicator and apply our cost model.
+ PatternApplicator applicator(patterns); + applicator.applyCostModel([](const RewritePattern &pattern) { + // Apply a default cost model. + // Note: This is just for demonstration; if the default cost model is truly + // desired, `applicator.applyDefaultCostModel()` should be used + // instead. + return pattern.getBenefit(); + }); + + // Try to match and apply a pattern. + LogicalResult result = applicator.matchAndRewrite(op, rewriter); +} +``` + +## Common Pattern Drivers + +MLIR provides several common pattern drivers that serve a variety of different +use cases. + +### Dialect Conversion Driver + +This driver provides a framework in which to perform operation conversions +between and within dialects using a concept of "legality". This framework +allows for transforming illegal operations to those supported by a provided +conversion target, via a set of operation rewriting patterns. More +information on this driver can be found [here](DialectConversion.md). + +### Greedy Pattern Rewrite Driver + +This driver performs a post-order traversal over the provided operations and +greedily applies the patterns that locally have the most benefit. Patterns are +iteratively applied to operations until a fixed point is reached, at which point +the driver finishes. This driver may be used via the following: +`applyPatternsAndFoldGreedily` and `applyOpPatternsAndFold`. The latter only +applies patterns to the provided operation and will not traverse the IR. + +Note: This driver is the one used by the [canonicalization](Canonicalization.md) +[pass](Passes.md#-canonicalize-canonicalize-operations) in MLIR. 
diff --git a/mlir/docs/Rationale/MLIRForGraphAlgorithms.md b/mlir/docs/Rationale/MLIRForGraphAlgorithms.md --- a/mlir/docs/Rationale/MLIRForGraphAlgorithms.md +++ b/mlir/docs/Rationale/MLIRForGraphAlgorithms.md @@ -254,7 +254,7 @@ ### Unified Graph Rewriting Infrastructure This is still a work in progress, but we have sightlines towards a -[general rewriting infrastructure](GenericDAGRewriter.md) for transforming DAG +[general rewriting infrastructure](RationaleGenericDAGRewriter.md) for transforming DAG tiles into other DAG tiles, using a declarative pattern format. DAG to DAG rewriting is a generalized solution for many common compiler optimizations, lowerings, and other rewrites and having an IR enables us to invest in building diff --git a/mlir/docs/Rationale/RationaleGenericDAGRewriter.md b/mlir/docs/Rationale/RationaleGenericDAGRewriter.md new file mode 100644 --- /dev/null +++ b/mlir/docs/Rationale/RationaleGenericDAGRewriter.md @@ -0,0 +1,287 @@ +# Generic DAG Rewriter Infrastructure Rationale + +This document details the rationale behind a general DAG-to-DAG rewrite +infrastructure for MLIR. For up-to-date documentation on the user facing API, +please look at the main [Pattern Rewriting document](../PatternRewriter.md). + +## Introduction and Motivation + +The goal of a compiler IR is to represent code - at various levels of +abstraction which pose different sets of tradeoffs in terms of representational +capabilities and ease of transformation. However, the ability to represent code +is not itself very useful - you also need to be able to implement those +transformations. + +There are many different sorts of compiler transformations, but this document +focuses on a particularly important class of transformation that comes up +repeatedly at scale, and is important for the goals of MLIR: that of pattern +matching on a set of operations and replacing with another set. 
This is the key +algorithm required to implement the "op fission" algorithm used by the tf2xla +bridge, pattern matching rewrites from TF ops to TF/Lite, peephole optimizations +like "eliminate identity nodes" or "replace x+0 with x", as well as a useful +abstraction to implement optimization algorithms for MLIR graphs at all levels. + +A particular strength of MLIR (and a major difference vs other compiler +infrastructures like LLVM, GCC, XLA, TensorFlow, etc) is that it uses a single +compiler IR to represent code at multiple levels of abstraction: an MLIR +operation can be a "TensorFlow operation", an "XLA HLO", a "TF Lite +FlatBufferModel op", an LLVM IR instruction (transitively including X86, Lanai, +CUDA, and other target specific instructions), or anything else that the MLIR +operation system can reasonably express. Given that MLIR spans such a wide range +of different problem scopes, a single infrastructure for performing +graph-to-graph rewrites can help solve many diverse domain challenges, including +the TensorFlow graph level down to the machine code level. + +[Static single assignment](https://en.wikipedia.org/wiki/Static_single_assignment_form) +(SSA) representations like MLIR make it easy to access the operands and "users" +of an operation. As such, a natural abstraction for these graph-to-graph +rewrites is that of DAG pattern matching: clients define DAG tile patterns, and +each pattern includes a result DAG to produce and the cost of the result (or, +inversely, the benefit of doing the replacement). A common infrastructure +efficiently finds and performs the rewrites. + +While this concept is simple, the details are more nuanced. This document +defines and explores a set of abstractions that can solve a wide range of +different problems, and be applied to many different sorts of problems that MLIR +is - and is expected to - face over time. 
We do this by separating the pattern +definition and matching algorithm from the "driver" of the computation loop, and +make space for the patterns to be defined declaratively. + +## Related Work + +There is a huge amount of related work to consider, given that nearly every +compiler in existence has to solve this problem many times over. Here are a few +related graph rewrite systems, along with the pros and cons of their work. One +unifying problem is that each of these systems was designed to solve only one +particular, and usually narrow, problem: MLIR, on the other hand, would like to +solve many of these problems within a single infrastructure. Of these, the most +similar design to the infrastructure present in MLIR is the LLVM DAG-to-DAG +instruction selection algorithm discussed at the end. + +### Constant folding + +A degenerate but pervasive case of DAG-to-DAG pattern matching is constant +folding: an operation whose operands contain constants can often be folded to a +result constant value. + +MLIR operations may override a +[`fold`](../Canonicalization.md/#canonicalizing-with-fold) routine, which +exposes a simpler API compared to a general DAG-to-DAG pattern matcher, and +allows for it to be applicable in cases that a generic matcher would not be. For +example, a DAG-rewrite can remove arbitrary nodes in the current function, which +could invalidate iterators. Constant folding as an API does not remove any +nodes; it just provides a (list of) constant values and allows the clients to +update their data structures as necessary. + +### AST-Level Pattern Matchers + +The literature is full of source-to-source translators which transform +identities in order to improve performance (e.g. transforming `X*0` into `0`). +One large example is the GCC `fold` function, which performs +[many optimizations](https://github.com/gcc-mirror/gcc/blob/master/gcc/fold-const.c) +on ASTs. 
Clang has +[similar routines](http://releases.llvm.org/3.5.0/tools/clang/docs/InternalsManual.html#constant-folding-in-the-clang-ast) +for simple constant folding of expressions (as required by the C++ standard) but +doesn't perform general optimizations on its ASTs. + +The primary downside of tree optimizers is that you can't see across operations +that have multiple uses. It is +[well known in literature](https://llvm.org/pubs/2008-06-LCTES-ISelUsingSSAGraphs.pdf) +that DAG pattern matching is more powerful than tree pattern matching, but on +the other hand, DAG pattern matching can lead to duplication of computation +which needs to be checked for. + +### "Combiners" and other peephole optimizers + +Compilers end up with a lot of peephole optimizers for various things, e.g. the +GCC +["combine" routines](https://github.com/gcc-mirror/gcc/blob/master/gcc/combine.c) +(which try to merge two machine instructions into a single one), the LLVM +[Inst Combine](https://github.com/llvm/llvm-project/tree/master/llvm/lib/Transforms/InstCombine) +[pass](https://llvm.org/docs/Passes.html#instcombine-combine-redundant-instructions), +LLVM's +[DAG Combiner](https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/DAGCombiner.cpp), +the Swift compiler's +[SIL Combiner](https://github.com/apple/swift/tree/master/lib/SILOptimizer/SILCombiner), +etc. These generally match one or more operations and produce zero or more +operations as a result. The LLVM +[Legalization](https://github.com/llvm/llvm-project/tree/master/llvm/lib/CodeGen/SelectionDAG) +infrastructure has a different outer loop but otherwise works the same way. + +These passes have a lot of diversity, but also have a unifying structure: they +mostly have a worklist outer loop which visits operations. They then use the C++ +visitor pattern (or equivalent) to switch over the class of operation and +dispatch to a method. 
That method contains a long list of hand-written C++ code +that pattern-matches various special cases. LLVM introduced a "match" function +that allows writing patterns in a somewhat more declarative style using template +metaprogramming (MLIR has similar facilities). Here's a simple example: + +```c++ + // Y - (X + 1) --> ~X + Y + if (match(Op1, m_OneUse(m_Add(m_Value(X), m_One())))) + return BinaryOperator::CreateAdd(Builder.CreateNot(X), Op0); +``` + +Here is a somewhat more complicated one (this is not the biggest or most +complicated :) + +```c++ + // C2 is ODD + // LHS = XOR(Y,C1), Y = AND(Z,C2), C1==(C2+1) => LHS == NEG(OR(Z, ~C2)) + // ADD(LHS, RHS) == SUB(RHS, OR(Z, ~C2)) + if (match(LHS, m_Xor(m_Value(Y), m_APInt(C1)))) + if (C1->countTrailingZeros() == 0) + if (match(Y, m_And(m_Value(Z), m_APInt(C2))) && *C1 == (*C2 + 1)) { + Value NewOr = Builder.CreateOr(Z, ~(*C2)); + return Builder.CreateSub(RHS, NewOr, "sub"); + } +``` + +These systems are simple to set up, and pattern matching templates have some +advantages (they are extensible for new sorts of sub-patterns, look compact at +point of use). OTOH, they have lots of well known problems, for example: + +* These patterns are very error-prone to write, and contain lots of + redundancies. +* The IR being matched often has identities (e.g. when matching commutative + operators) and the C++ code has to handle them manually - take a look at + [the full code](https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp#L767) + for `checkForNegativeOperand`, which defines the second pattern. +* The matching code compiles slowly, both because it generates tons of code + and because the templates instantiate slowly. +* Adding new patterns (e.g. for count leading zeros in the example above) is + awkward and doesn't often happen. +* The cost model for these patterns is not really defined - it is emergent + based on the order the patterns are matched in code. 
+* They are non-extensible without rebuilding the compiler. +* It isn't practical to apply theorem provers and other tools to these + patterns - they cannot be reused for other purposes. + +In addition to structured "combiners" like these, there are lots of ad-hoc +systems like the +[LLVM Machine code peephole optimizer](http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/PeepholeOptimizer.cpp?view=markup) +which are related. + +### LLVM's DAG-to-DAG Instruction Selection Infrastructure + +The instruction selection subsystem in LLVM is the result of many years' worth of +iteration and discovery, driven by the need for LLVM to support code generation +for lots of targets, the complexity of code generators for modern instruction +sets (e.g. X86), and the fanatical pursuit of reusing code across targets. Eli +wrote a +[nice short overview](https://eli.thegreenplace.net/2013/02/25/a-deeper-look-into-the-llvm-code-generator-part-1) +of how this works, and the +[LLVM documentation](https://llvm.org/docs/CodeGenerator.html#select-instructions-from-dag) +describes it in more depth including its advantages and limitations. It allows +writing patterns like this: + +``` +def : Pat<(or GR64:$src, (not (add GR64:$src, 1))), + (BLCI64rr GR64:$src)>; +``` + +This example defines a matcher for the +["blci" instruction](https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets#TBM_\(Trailing_Bit_Manipulation\)) +in the +[X86 target description](http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.td?view=markup); +there are many others in that file (look for `Pat<>` patterns, since they aren't +entangled in details of the compiler like assembler/disassembler generation +logic). + +For the purposes of MLIR, there is much to like about this system, for example: + +* It is defined in a declarative format. +* It is extensible to target-defined operations. +* It automates matching across identities, like commutative patterns. 
+* It allows custom abstractions and intense factoring of target-specific + commonalities. +* It generates compact code - it compiles into a state machine, which is + interpreted. +* It allows the instruction patterns to be defined and reused for multiple + purposes. +* The patterns are "type checked" at compile time, detecting lots of bugs + early and eliminating redundancy from the pattern specifications. +* It allows the use of general C++ code for weird/complex cases. + +While there is a lot that is good here, there are also a few undesirable bits: + +* All of this machinery is only applicable to instruction selection. Even + directly adjacent problems like the DAGCombiner and Legalizer can't use it. +* This isn't extensible at compiler runtime; you have to rebuild the compiler + to extend it. +* The error messages when failing to match a pattern + [are not exactly optimal](https://www.google.com/search?q=llvm+cannot+select). +* It has lots of implementation problems and limitations (e.g. can't write a + pattern for a multi-result operation) as a result of working with the + awkward SelectionDAG representation and being designed and implemented on + demand. +* Organic growth over time has left lots of sharp edges. + +### Summary + +MLIR faces a wide range of pattern matching and graph rewrite problems, and one +of the major advantages of having a common representation for code at multiple +levels is that it allows for investing in - and highly leveraging - a single +infrastructure for doing this sort of work. + +## Goals + +We'd like these algorithms to encompass many problems in the MLIR space, +including 1-to-N expansions (e.g. as seen in the TF/XLA bridge when lowering a +"tf.AddN" to multiple "add" HLOs), M-to-1 patterns (as seen in Grappler +optimization passes, e.g. that convert multiply/add into a single muladd op), as +well as general M-to-N patterns (e.g. instruction selection for target +instructions). 
Patterns have a benefit associated with them, and the common +infrastructure should be responsible for sorting out the lowest cost match for a +given application. + +We separate the task of picking a particular locally optimal pattern from a +given root node, the algorithm used to rewrite an entire graph given a +particular set of goals, and the definition of the patterns themselves. We do +this because DAG tile pattern matching is NP-complete, which means that there +are no known polynomial time algorithms to optimally solve this problem. +Additionally, we would like to support iterative rewrite algorithms that +progressively transform the input program through multiple steps. Furthermore, +we would like to support many different sorts of clients across the MLIR stack, +and they may have different tolerances for compile time cost, different demands +for optimality, and other algorithmic goals or constraints. + +We aim for MLIR transformations to be easy to implement and to reduce the +likelihood of compiler bugs. We expect there to be a very large number of +patterns that are defined over time, and we believe that these sorts of patterns +will have a very large number of legality/validity constraints - many of which +are difficult to reason about in a consistent way, may be target specific, and +whose implementation may be particularly bug-prone. As such, we aim to design +the API around pattern definition to be simple, resilient to programmer errors, +and allow separation of the legality of the generated nodes from the idea of the +pattern being defined. + +Finally, error handling is a topmost concern: in addition to allowing patterns +to be defined in a target-independent way that may not apply for all hardware, +we also want failure for any pattern to match to be diagnosable in a reasonable +way. 
To be clear, this is not a solvable problem in general - the space of +malfunction is too great to be fully enumerated and handled optimally, but there +are better and worse ways to handle the situation. MLIR is already designed to +represent the provenance of an operation well. This project aims to propagate +that provenance information precisely, as well as diagnose pattern match +failures with the rationale for why a set of patterns does not apply. + +### Non goals + +The pattern infrastructure does not aim to solve all compiler problems; it is +simply a DAG-to-DAG pattern matching system. Compiler algorithms that require +global dataflow analysis (e.g. common subexpression elimination, conditional +constant propagation, and many many others) will not be directly solved by this +infrastructure. + +This infrastructure is limited to DAG patterns, which (by definition) prevent +the patterns from seeing across cycles in a graph. In an SSA-based IR like MLIR, +this means that these patterns don't see across PHI nodes / basic block +arguments. We consider this acceptable given the set of problems we are trying +to solve - we don't know of any other system that attempts to do so, and +consider the payoff of worrying about this to be low. + +This design includes the ability for DAG patterns to have associated costs +(benefits), but those costs are defined in terms of magic numbers (typically +equal to the number of nodes being replaced). For any given application, the +units of magic numbers will have to be defined. diff --git a/mlir/docs/Tutorials/Toy/Ch-3.md b/mlir/docs/Tutorials/Toy/Ch-3.md --- a/mlir/docs/Tutorials/Toy/Ch-3.md +++ b/mlir/docs/Tutorials/Toy/Ch-3.md @@ -13,7 +13,7 @@ this chapter, we focus on how to leverage the Toy Dialect and its high-level semantics to perform local pattern-match transformations that would be difficult in LLVM. For this, we use MLIR's -[Generic DAG Rewriter](../../GenericDAGRewriter.md). 
+[Generic DAG Rewriter](../../PatternRewriter.md).

There are two methods that can be used to implement pattern-match
transformations:

1. Imperative, C++ pattern-match and rewrite
2. Declarative,