diff --git a/mlir/docs/Dialects/Affine.md b/mlir/docs/Dialects/Affine.md --- a/mlir/docs/Dialects/Affine.md +++ b/mlir/docs/Dialects/Affine.md @@ -1,4 +1,4 @@ -# Affine Dialect +# `affine` Dialect This dialect provides a powerful abstraction for affine operations and analyses. @@ -295,140 +295,9 @@ ## Operations -#### 'affine.apply' operation +[include "Dialects/AffineOps.md"] -Syntax: - -``` -operation ::= ssa-id `=` `affine.apply` affine-map-attribute dim-and-symbol-use-list -``` - -The `affine.apply` operation applies an -[affine mapping](#affine-expressions) to a list of SSA values, -yielding a single SSA value. The number of dimension and symbol arguments to -affine.apply must be equal to the respective number of dimensional and symbolic -inputs to the affine mapping; the `affine.apply` operation always returns one -value. The input operands and result must all have 'index' type. - -Example: - -```mlir -#map10 = affine_map<(d0, d1) -> (d0 floordiv 8 + d1 floordiv 128)> -... -%1 = affine.apply #map10 (%s, %t) - -// Inline example. -%2 = affine.apply affine_map<(i)[s0] -> (i+s0)> (%42)[%n] -``` - -#### 'affine.for' operation - -Syntax: - -``` -operation ::= `affine.for` ssa-id `=` lower-bound `to` upper-bound - (`step` integer-literal)? `{` op* `}` - -lower-bound ::= `max`? affine-map-attribute dim-and-symbol-use-list | shorthand-bound -upper-bound ::= `min`? affine-map-attribute dim-and-symbol-use-list | shorthand-bound -shorthand-bound ::= ssa-id | `-`? integer-literal -``` - -The `affine.for` operation represents an affine loop nest. It has one region -containing its body. This region must contain one block that terminates with -[`affine.terminator`](#affineterminator-operation). *Note:* when `affine.for` is -printed in custom format, the terminator is omitted. The block has one argument -of [`index`](../LangRef.md#index-type) type that represents the induction -variable of the loop. - -The `affine.for` operation executes its body a number of times iterating from a -lower bound to an upper bound by a stride. The stride, represented by `step`, is -a positive constant integer which defaults to "1" if not present. The lower and -upper bounds specify a half-open range: the range includes the lower bound but -does not include the upper bound. - -The lower and upper bounds of a `affine.for` operation are represented as an -application of an affine mapping to a list of SSA values passed to the map. The -[same restrictions](#restrictions-on-dimensions-and-symbols) hold for these SSA -values as for all bindings of SSA values to dimensions and symbols. - -The affine mappings for the bounds may return multiple results, in which case -the `max`/`min` keywords are required (for the lower/upper bound respectively), -and the bound is the maximum/minimum of the returned values. There is no -semantic ambiguity, but MLIR syntax requires the use of these keywords to make -things more obvious to human readers. - -Many upper and lower bounds are simple, so MLIR accepts two custom form -syntaxes: the form that accepts a single 'ssa-id' (e.g. `%N`) is shorthand for -applying that SSA value to a function that maps a single symbol to itself, e.g., -`()[s]->(s)()[%N]`. The integer literal form (e.g. `-42`) is shorthand for a -nullary mapping function that returns the constant value (e.g. `()->(-42)()`). 
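As an illustration (not part of this patch), here is a minimal sketch of the shorthand-bound expansion described above; the map-form spellings are an assumption based on the bound grammar given earlier, not text from the patch:

```mlir
// Shorthand SSA-id bound: `%N` stands for the identity symbol map applied to %N.
affine.for %i = 0 to %N {
}
// Assumed equivalent map-form spelling:
affine.for %i = 0 to affine_map<()[s0] -> (s0)>()[%N] {
}

// Shorthand literal bound: `10` stands for a nullary map returning the constant.
affine.for %j = 0 to 10 {
}
// Assumed equivalent map-form spelling:
affine.for %j = 0 to affine_map<() -> (10)>() {
}
```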
- -Example showing reverse iteration of the inner loop: - -```mlir -#map57 = affine_map<(d0)[s0] -> (s0 - d0 - 1)> - -func @simple_example(%A: memref, %B: memref) { - %N = dim %A, 0 : memref - affine.for %i = 0 to %N step 1 { - affine.for %j = 0 to %N { // implicitly steps by 1 - %0 = affine.apply #map57(%j)[%N] - %tmp = call @F1(%A, %i, %0) : (memref, index, index)->(f32) - call @F2(%tmp, %B, %i, %0) : (f32, memref, index, index)->() - } - } - return -} -``` - -#### 'affine.if' operation - -Syntax: - -``` -operation ::= `affine.if` if-op-cond `{` op* `}` (`else` `{` op* `}`)? -if-op-cond ::= integer-set-attr dim-and-symbol-use-list -``` - -The `affine.if` operation restricts execution to a subset of the loop iteration -space defined by an integer set (a conjunction of affine constraints). A single -`affine.if` may end with an optional `else` clause. - -The condition of the `affine.if` is represented by an -[integer set](#integer-sets) (a conjunction of affine constraints), -and the SSA values bound to the dimensions and symbols in the integer set. The -[same restrictions](#restrictions-on-dimensions-and-symbols) hold for these SSA -values as for all bindings of SSA values to dimensions and symbols. - -The `affine.if` operation contains two regions for the "then" and "else" -clauses. The latter may be empty (i.e. contain no blocks), meaning the absence -of the else clause. When non-empty, both regions must contain exactly one block -terminating with [`affine.terminator`](#affineterminator-operation). *Note:* -when `affine.if` is printed in custom format, the terminator is omitted. These -blocks must not have any arguments. - -Example: - -```mlir -#set = affine_set<(d0, d1)[s0]: (d0 - 10 >= 0, s0 - d0 - 9 >= 0, - d1 - 10 >= 0, s0 - d1 - 9 >= 0)> -func @reduced_domain_example(%A, %X, %N) : (memref<10xi32>, i32, i32) { - affine.for %i = 0 to %N { - affine.for %j = 0 to %N { - %0 = affine.apply #map42(%j) - %tmp = call @S1(%X, %i, %0) - affine.if #set(%i, %j)[%N] { - %1 = affine.apply #map43(%i, %j) - call @S2(%tmp, %A, %i, %1) - } - } - } - return -} -``` - -#### 'affine.load' operation +### 'affine.load' operation Syntax: @@ -458,7 +327,7 @@ ``` -#### 'affine.store' operation +### 'affine.store' operation Syntax: @@ -488,7 +357,7 @@ ``` -#### 'affine.dma_start' operation +### 'affine.dma_start' operation Syntax: @@ -519,7 +388,6 @@ Example: ```mlir - For example, a DmaStartOp operation that transfers 256 elements of a memref '%src' in memory space 0 at indices [%i + 3, %j] to memref '%dst' in memory space 1 at indices [%k + 7, %l], would be specified as follows: @@ -537,10 +405,9 @@ affine.dma_start %src[%i, %j], %dst[%k, %l], %tag[%idx], %num_elements, %stride, %num_elt_per_stride : ... - ``` -#### 'affine.dma_wait' operation +### 'affine.dma_wait' operation Syntax: @@ -558,54 +425,9 @@ Example: ```mlir - - affine.dma_start %src[%i, %j], %dst[%k, %l], %tag[%index], %num_elements : - memref<2048xf32, 0>, memref<256xf32, 1>, memref<1xi32, 2> - ... - ... - affine.dma_wait %tag[%index], %num_elements : memref<1xi32, 2> - -``` - -#### 'affine.min' operation - -Syntax: - -``` -operation ::= ssa-id `=` `affine.min` affine-map-attribute dim-and-symbol-use-list -``` - -The `affine.min` operation applies an -[affine mapping](#affine-expressions) to a list of SSA values, and returns the -minimum value of all result expressions. 
The number of dimension and symbol -arguments to affine.min must be equal to the respective number of dimensional -and symbolic inputs to the affine mapping; the `affine.min` operation always -returns one value. The input operands and result must all have 'index' type. - -Example: - -```mlir - -%0 = affine.min affine_map<(d0)[s0] -> (1000, d0 + 512, s0)> (%arg0)[%arg1] - -``` - -#### `affine.terminator` operation - -Syntax: - -``` -operation ::= `"affine.terminator"() : () -> ()` +affine.dma_start %src[%i, %j], %dst[%k, %l], %tag[%index], %num_elements : + memref<2048xf32, 0>, memref<256xf32, 1>, memref<1xi32, 2> +... +... +affine.dma_wait %tag[%index], %num_elements : memref<1xi32, 2> ``` - -Affine terminator is a special terminator operation for blocks inside affine -loops ([`affine.for`](#affinefor-operation)) and branches -([`affine.if`](#affineif-operation)). It unconditionally transmits the control -flow to the successor of the operation enclosing the region. - -*Rationale*: bodies of affine operations are [blocks](../LangRef.md#blocks) that -must have terminators. Loops and branches represent structured control flow and -should not accept arbitrary branches as terminators. - -This operation does _not_ have a custom syntax. However, affine control -operations omit the terminator in their custom syntax for brevity. diff --git a/mlir/docs/Dialects/GPU.md b/mlir/docs/Dialects/GPU.md --- a/mlir/docs/Dialects/GPU.md +++ b/mlir/docs/Dialects/GPU.md @@ -1,4 +1,4 @@ -# GPU Dialect +# `gpu` Dialect Note: this dialect is more likely to change than others in the near future; use with caution. @@ -36,97 +36,4 @@ ## Operations -### `gpu.block_dim` - -Returns the number of threads in the thread block (aka the block size) along the -x, y, or z `dimension`. - -Example: - -```mlir - %bDimX = "gpu.block_dim"() {dimension = "x"} : () -> (index) -``` - -### `gpu.block_id` - -Returns the block id, i.e. the index of the current block within the grid along -the x, y, or z `dimension`. - -Example: - -```mlir - %bIdY = "gpu.block_id"() {dimension = "y"} : () -> (index) -``` - -### `gpu.grid_dim` - -Returns the number of thread blocks in the grid along the x, y, or z -`dimension`. - -Example: - -```mlir - %gDimZ = "gpu.grid_dim"() {dimension = "z"} : () -> (index) -``` - -### `gpu.thread_id` - -Returns the thread id, i.e. the index of the current thread within the block -along the x, y, or z `dimension`. - -Example: - -```mlir - %tIdX = "gpu.thread_id"() {dimension = "x"} : () -> (index) -``` - -### `gpu.yield` - -Is a special terminator operation for blocks inside regions in gpu ops. It -returns values to the immediately enclosing gpu op. - -Example: - -```mlir -gpu.yield %f0, %f1 : f32, f32 -``` - -### `gpu.all_reduce` - -The "all_reduce" op reduces the value of every work item across a local -workgroup. The result is equal for all work items of a workgroup. - -For example, both - -```mlir -%1 = "gpu.all_reduce"(%0) ({}) { op = "add" } : (f32) -> (f32) -%2 = "gpu.all_reduce"(%0) ({ -^bb(%lhs : f32, %rhs : f32): - %sum = addf %lhs, %rhs : f32 - "gpu.yield"(%sum) : (f32) -> () -}) : (f32) -> (f32) -``` - -compute the sum of each work item's %0 value. The first version specifies the -accumulation as operation, whereas the second version specifies the accumulation -as code region. The accumulation operation must either be `add` or `mul`. - -Either none or all work items of a workgroup need to execute this op -in convergence. - -### `gpu.barrier` - -The "barrier" op synchronizes all work items of a workgroup. 
It is used -to coordinate communication between the work items of the workgroup. - -```mlir -gpu.barrier -``` - -waits until all work items in the workgroup have reached this point and all -memory accesses made by these work items prior to the op are visible to all work -items in the workgroup. Data hazards between work items accessing the same -memory can be avoided by synchronizing work items in-between these accesses. - -Either none or all work items of a workgroup need to execute this op -in convergence. +[include "Dialects/GPUOps.md"] diff --git a/mlir/docs/Dialects/LLVM.md b/mlir/docs/Dialects/LLVM.md --- a/mlir/docs/Dialects/LLVM.md +++ b/mlir/docs/Dialects/LLVM.md @@ -1,4 +1,4 @@ -# LLVM IR Dialect +# `llvm` Dialect This dialect wraps the LLVM IR types and instructions into MLIR types and operations. It provides several additional operations that are necessary to diff --git a/mlir/docs/Dialects/Linalg.md b/mlir/docs/Dialects/Linalg.md --- a/mlir/docs/Dialects/Linalg.md +++ b/mlir/docs/Dialects/Linalg.md @@ -1,4 +1,4 @@ -# Linalg Dialect +# `linalg` Dialect [TOC] @@ -469,3 +469,7 @@ context of MLIR in which different levels of IR interoperate seamlessly. In practice, it is not necessary (or beneficial) to try and solve all problems in the same IR. + +## Operations + +[include "Dialects/LinalgOps.md"] diff --git a/mlir/include/mlir/Dialect/Affine/IR/AffineOps.td b/mlir/include/mlir/Dialect/Affine/IR/AffineOps.td --- a/mlir/include/mlir/Dialect/Affine/IR/AffineOps.td +++ b/mlir/include/mlir/Dialect/Affine/IR/AffineOps.td @@ -44,22 +44,23 @@ def AffineApplyOp : Affine_Op<"apply", [NoSideEffect]> { let summary = "affine apply operation"; let description = [{ - The affine.apply operation applies an affine mapping to a list of SSA - values, yielding a single SSA value. The number of dimension and symbol - arguments to affine.apply must be equal to the respective number of - dimensional and symbolic inputs to the affine mapping; the affine mapping - has to be one-dimensional, and so the affine.apply operation always returns - one value. The input operands and result must all have ‘index’ type. + The affine.apply operation applies an [affine mapping](#affine-expressions) + to a list of SSA values, yielding a single SSA value. The number of + dimension and symbol arguments to `affine.apply` must be equal to the + respective number of dimensional and symbolic inputs to the affine mapping; + the affine mapping has to be one-dimensional, and so the `affine.apply` + operation always returns one value. The input operands and result must all + have ‘index’ type. Example: ```mlir - #map10 = affine_map<(d0, d1) -> (d0 floordiv 8 + d1 floordiv 128)> - ... - %1 = affine.apply #map10 (%s, %t) + #map10 = affine_map<(d0, d1) -> (d0 floordiv 8 + d1 floordiv 128)> + ... + %1 = affine.apply #map10 (%s, %t) - // Inline example. - %2 = affine.apply affine_map<(i)[s0] -> (i+s0)> (%42)[%n] + // Inline example. + %2 = affine.apply affine_map<(i)[s0] -> (i+s0)> (%42)[%n] ``` }]; let arguments = (ins AffineMapAttr:$map, Variadic:$mapOperands); @@ -100,33 +101,66 @@ DeclareOpInterfaceMethods]> { let summary = "for operation"; let description = [{ - The "affine.for" operation represents an affine loop nest, defining an SSA - value for its induction variable. It has one region capturing the loop body. - The induction variable is represented as a argument of this region. This SSA - value always has type index, which is the size of the machine word. 
The - stride, represented by step, is a positive constant integer which defaults - to "1" if not present. The lower and upper bounds specify a half-open range: - the range includes the lower bound but does not include the upper bound. - - The body region must contain exactly one block that terminates with - "affine.terminator". Calling AffineForOp::build will create such region - and insert the terminator, so will the parsing even in cases if it is absent - from the custom format. - - The lower and upper bounds of a for operation are represented as an + Syntax: + + ``` + operation ::= `affine.for` ssa-id `=` lower-bound `to` upper-bound + (`step` integer-literal)? `{` op* `}` + + lower-bound ::= `max`? affine-map-attribute dim-and-symbol-use-list | shorthand-bound + upper-bound ::= `min`? affine-map-attribute dim-and-symbol-use-list | shorthand-bound + shorthand-bound ::= ssa-id | `-`? integer-literal + ``` + + The `affine.for` operation represents an affine loop nest. It has one region + containing its body. This region must contain one block that terminates with + [`affine.terminator`](#affineterminator-operation). *Note:* when + `affine.for` is printed in custom format, the terminator is omitted. The + block has one argument of [`index`](../LangRef.md#index-type) type that + represents the induction variable of the loop. + + The `affine.for` operation executes its body a number of times iterating + from a lower bound to an upper bound by a stride. The stride, represented by + `step`, is a positive constant integer which defaults to "1" if not present. + The lower and upper bounds specify a half-open range: the range includes the + lower bound but does not include the upper bound. + + The lower and upper bounds of a `affine.for` operation are represented as an application of an affine mapping to a list of SSA values passed to the map. - The same restrictions hold for these SSA values as for all bindings of SSA - values to dimensions and symbols. The affine mappings for the bounds may - return multiple results, in which case the max/min keywords are required - (for the lower/upper bound respectively), and the bound is the - maximum/minimum of the returned values. + The [same restrictions](#restrictions-on-dimensions-and-symbols) hold for + these SSA values as for all bindings of SSA values to dimensions and + symbols. - Example: + The affine mappings for the bounds may return multiple results, in which + case the `max`/`min` keywords are required (for the lower/upper bound + respectively), and the bound is the maximum/minimum of the returned values. + There is no semantic ambiguity, but MLIR syntax requires the use of these + keywords to make things more obvious to human readers. - affine.for %i = 1 to 10 { - ... - } + Many upper and lower bounds are simple, so MLIR accepts two custom form + syntaxes: the form that accepts a single 'ssa-id' (e.g. `%N`) is shorthand + for applying that SSA value to a function that maps a single symbol to + itself, e.g., `()[s]->(s)()[%N]`. The integer literal form (e.g. `-42`) is + shorthand for a nullary mapping function that returns the constant value + (e.g. `()->(-42)()`). 
+ Example showing reverse iteration of the inner loop: + + ```mlir + #map57 = affine_map<(d0)[s0] -> (s0 - d0 - 1)> + + func @simple_example(%A: memref, %B: memref) { + %N = dim %A, 0 : memref + affine.for %i = 0 to %N step 1 { + affine.for %j = 0 to %N { // implicitly steps by 1 + %0 = affine.apply #map57(%j)[%N] + %tmp = call @F1(%A, %i, %0) : (memref, index, index)->(f32) + call @F2(%tmp, %B, %i, %0) : (f32, memref, index, index)->() + } + } + return + } + ``` }]; let arguments = (ins Variadic); let regions = (region SizedRegion<1>:$region); @@ -236,23 +270,51 @@ [ImplicitAffineTerminator, RecursiveSideEffects]> { let summary = "if-then-else operation"; let description = [{ - The "if" operation represents an if-then-else construct for conditionally - executing two regions of code. The operands to an if operation are an - IntegerSet condition and a set of symbol/dimension operands to the - condition set. The operation produces no results. For example: - - affine.if #set(%i) { - ... - } else { - ... - } - - The 'else' blocks to the if operation are optional, and may be omitted. For - example: - - affine.if #set(%i) { - ... - } + Syntax: + + ``` + operation ::= `affine.if` if-op-cond `{` op* `}` (`else` `{` op* `}`)? + if-op-cond ::= integer-set-attr dim-and-symbol-use-list + ``` + + The `affine.if` operation restricts execution to a subset of the loop + iteration space defined by an integer set (a conjunction of affine + constraints). A single `affine.if` may end with an optional `else` clause. + + The condition of the `affine.if` is represented by an + [integer set](#integer-sets) (a conjunction of affine constraints), + and the SSA values bound to the dimensions and symbols in the integer set. + The [same restrictions](#restrictions-on-dimensions-and-symbols) hold for + these SSA values as for all bindings of SSA values to dimensions and + symbols. + + The `affine.if` operation contains two regions for the "then" and "else" + clauses. The latter may be empty (i.e. contain no blocks), meaning the + absence of the else clause. When non-empty, both regions must contain + exactly one block terminating with + [`affine.terminator`](#affineterminator-operation). *Note:* when `affine.if` + is printed in custom format, the terminator is omitted. These blocks must + not have any arguments. + + Example: + + ```mlir + #set = affine_set<(d0, d1)[s0]: (d0 - 10 >= 0, s0 - d0 - 9 >= 0, + d1 - 10 >= 0, s0 - d1 - 9 >= 0)> + func @reduced_domain_example(%A, %X, %N) : (memref<10xi32>, i32, i32) { + affine.for %i = 0 to %N { + affine.for %j = 0 to %N { + %0 = affine.apply #map42(%j) + %tmp = call @S1(%X, %i, %0) + affine.if #set(%i, %j)[%N] { + %1 = affine.apply #map43(%i, %j) + call @S2(%tmp, %A, %i, %1) + } + } + } + return + } + ``` }]; let arguments = (ins Variadic); let regions = (region SizedRegion<1>:$thenRegion, AnyRegion:$elseRegion); @@ -328,12 +390,24 @@ def AffineMinOp : AffineMinMaxOpBase<"min", [NoSideEffect]> { let summary = "min operation"; let description = [{ - The "min" operation computes the minimum value result from a multi-result - affine map. + Syntax: + + ``` + operation ::= ssa-id `=` `affine.min` affine-map-attribute dim-and-symbol-use-list + ``` + + The `affine.min` operation applies an [affine mapping](#affine-expressions) + to a list of SSA values, and returns the minimum value of all result + expressions. 
The number of dimension and symbol arguments to `affine.min` + must be equal to the respective number of dimensional and symbolic inputs to + the affine mapping; the `affine.min` operation always returns one value. The + input operands and result must all have 'index' type. Example: - %0 = affine.min (d0) -> (1000, d0 + 512) (%i0) : index + ```mlir + %0 = affine.min affine_map<(d0)[s0] -> (1000, d0 + 512, s0)> (%arg0)[%arg1] + ``` }]; } @@ -345,7 +419,9 @@ Example: - %0 = affine.max (d0) -> (1000, d0 + 512) (%i0) : index + ```mlir + %0 = affine.max (d0) -> (1000, d0 + 512) (%i0) : index + ``` }]; } @@ -375,9 +451,9 @@ Example: ```mlir - affine.parallel (%i, %j) = (0, 0) to (10, 10) step (1, 1) { - ... - } + affine.parallel (%i, %j) = (0, 0) to (10, 10) step (1, 1) { + ... + } ``` }]; @@ -430,8 +506,9 @@ a read/write specifier, a locality hint, and a cache type specifier as shown below: - affine.prefetch %0[%i, %j + 5], read, locality<3>, data - : memref<400x400xi32> + ```mlir + affine.prefetch %0[%i, %j + 5], read, locality<3>, data : memref<400x400xi32> + ``` The read/write specifier is either 'read' or 'write', the locality hint specifier ranges from locality<0> (no locality) to locality<3> (extremely @@ -501,9 +578,20 @@ Affine_Op<"terminator", [NoSideEffect, Terminator]> { let summary = "affine terminator operation"; let description = [{ + Syntax: + + ``` + operation ::= `"affine.terminator"() : () -> ()` + ``` + Affine terminator is a special terminator operation for blocks inside affine - loops and branches. It unconditionally transmits the control flow to the - successor of the operation enclosing the region. + loops ([`affine.for`](#affinefor-operation)) and branches + ([`affine.if`](#affineif-operation)). It unconditionally transmits the + control flow to the successor of the operation enclosing the region. + + *Rationale*: bodies of affine operations are [blocks](../LangRef.md#blocks) + that must have terminators. Loops and branches represent structured control + flow and should not accept arbitrary branches as terminators. This operation does _not_ have a custom syntax. However, affine control operations omit the terminator in their custom syntax for brevity. 
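As an aside (not part of this patch), a minimal sketch of what the implicit terminator looks like when spelled out; writing the generic-form op explicitly inside a custom-form loop body is assumed to parse, since `affine.terminator` has no custom syntax of its own:

```mlir
// The custom printer would omit this trailing terminator, but the block
// of the loop body region still ends with it.
affine.for %i = 0 to 10 {
  "affine.terminator"() : () -> ()
}
```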
diff --git a/mlir/include/mlir/Dialect/Affine/IR/CMakeLists.txt b/mlir/include/mlir/Dialect/Affine/IR/CMakeLists.txt --- a/mlir/include/mlir/Dialect/Affine/IR/CMakeLists.txt +++ b/mlir/include/mlir/Dialect/Affine/IR/CMakeLists.txt @@ -1,2 +1,2 @@ add_mlir_dialect(AffineOps affine) -add_mlir_doc(AffineOps -gen-dialect-doc AffineDialect Dialects/) +add_mlir_doc(AffineOps -gen-op-doc AffineOps Dialects/) diff --git a/mlir/include/mlir/Dialect/GPU/CMakeLists.txt b/mlir/include/mlir/Dialect/GPU/CMakeLists.txt --- a/mlir/include/mlir/Dialect/GPU/CMakeLists.txt +++ b/mlir/include/mlir/Dialect/GPU/CMakeLists.txt @@ -1,5 +1,5 @@ -add_mlir_dialect(GPUOps gpu GPUOps) -add_mlir_doc(GPUOps -gen-dialect-doc GPUDialect Dialects/) +add_mlir_dialect(GPUOps gpu) +add_mlir_doc(GPUOps -gen-op-doc GPUOps Dialects/) set(LLVM_TARGET_DEFINITIONS ParallelLoopMapperAttr.td) mlir_tablegen(ParallelLoopMapperAttr.h.inc -gen-struct-attr-decls) diff --git a/mlir/include/mlir/Dialect/GPU/GPUOps.td b/mlir/include/mlir/Dialect/GPU/GPUOps.td --- a/mlir/include/mlir/Dialect/GPU/GPUOps.td +++ b/mlir/include/mlir/Dialect/GPU/GPUOps.td @@ -36,10 +36,54 @@ let verifier = [{ return ::verifyIndexOp(*this); }]; } -def GPU_BlockDimOp : GPU_IndexOp<"block_dim">; -def GPU_BlockIdOp : GPU_IndexOp<"block_id">; -def GPU_GridDimOp : GPU_IndexOp<"grid_dim">; -def GPU_ThreadIdOp : GPU_IndexOp<"thread_id">; +def GPU_BlockDimOp : GPU_IndexOp<"block_dim"> { + let description = [{ + Returns the number of threads in the thread block (aka the block size) along + the x, y, or z `dimension`. + + Example: + + ```mlir + %bDimX = "gpu.block_dim"() {dimension = "x"} : () -> (index) + ``` + }]; +} +def GPU_BlockIdOp : GPU_IndexOp<"block_id"> { + let description = [{ + Returns the block id, i.e. the index of the current block within the grid + along the x, y, or z `dimension`. + + Example: + + ```mlir + %bIdY = "gpu.block_id"() {dimension = "y"} : () -> (index) + ``` + }]; +} +def GPU_GridDimOp : GPU_IndexOp<"grid_dim"> { + let description = [{ + Returns the number of thread blocks in the grid along the x, y, or z + `dimension`. + + Example: + + ```mlir + %gDimZ = "gpu.grid_dim"() {dimension = "z"} : () -> (index) + ``` + }]; +} +def GPU_ThreadIdOp : GPU_IndexOp<"thread_id"> { + let description = [{ + Returns the thread id, i.e. the index of the current thread within the block + along the x, y, or z `dimension`. + + Example: + + ```mlir + %tIdX = "gpu.thread_id"() {dimension = "x"} : () -> (index) + ``` + }]; +} def GPU_GPUFuncOp : GPU_Op<"func", [FunctionLike, IsolatedFromAbove, Symbol]> { let summary = "Function executable on a GPU"; @@ -471,13 +515,14 @@ Arguments<(ins Variadic:$values)> { let summary = "GPU yield operation"; let description = [{ - "gpu.yield" is a special terminator operation for blocks inside regions + gpu.yield` is a special terminator operation for blocks inside regions in gpu ops. It returns values to the immediately enclosing gpu op. Example: - ```gpu.yield %f0, %f1 : f32, f32 - ``` + ```mlir + gpu.yield %f0, %f1 : f32, f32 + ``` }]; } @@ -509,18 +554,20 @@ Results<(outs AnyType)> { let summary = "Reduce values among workgroup."; let description = [{ - The "all_reduce" op reduces the value of every work item across a local + The `all_reduce` op reduces the value of every work item across a local workgroup. The result is equal for all work items of a workgroup. 
For example, both + + ```mlir + %1 = "gpu.all_reduce"(%0) ({}) { op = "add" } : (f32) -> (f32) + %2 = "gpu.all_reduce"(%0) ({ + ^bb(%lhs : f32, %rhs : f32): + %sum = addf %lhs, %rhs : f32 + "gpu.yield"(%sum) : (f32) -> () + }) : (f32) -> (f32) ``` - %1 = "gpu.all_reduce"(%0) ({}) { op = "add" } : (f32) -> (f32) - %2 = "gpu.all_reduce"(%0) ({ - ^bb(%lhs : f32, %rhs : f32): - %sum = addf %lhs, %rhs : f32 - "gpu.yield"(%sum) : (f32) -> () - }) : (f32) -> (f32) - ``` + compute the sum of each work item's %0 value. The first version specifies the accumulation as operation, whereas the second version specifies the accumulation as code region. The accumulation operation must be one of: @@ -550,11 +597,13 @@ The "shuffle" op moves values to a different invocation within the same subgroup. - For example - ``` - %1, %2 = gpu.shuffle %0, %offset, %width xor : f32 + Example: + + ```mlir + %1, %2 = gpu.shuffle %0, %offset, %width xor : f32 ``` - for lane k returns the value from lane `k ^ offset` and `true` if that lane + + For lane k returns the value from lane `k ^ offset` and `true` if that lane is smaller than %width. Otherwise it returns an unspecified value and `false`. A lane is the index of an invocation relative to its subgroup. @@ -574,9 +623,10 @@ The "barrier" op synchronizes all work items of a workgroup. It is used to coordinate communication between the work items of the workgroup. + ```mlir + gpu.barrier ``` - gpu.barrier - ``` + waits until all work items in the workgroup have reached this point and all memory accesses made by these work items prior to the op are visible to all work items in the workgroup. Data hazards between work items diff --git a/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt b/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt --- a/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt +++ b/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt @@ -1,5 +1,6 @@ add_mlir_dialect(LinalgOps linalg) -add_mlir_doc(LinalgDoc -gen-dialect-doc LinalgDialect Dialects/) +add_mlir_doc(LinalgDoc -gen-op-doc LinalgOps Dialects/) + set(LLVM_TARGET_DEFINITIONS LinalgStructuredOps.td) mlir_tablegen(LinalgStructuredOps.h.inc -gen-op-decls) mlir_tablegen(LinalgStructuredOps.cpp.inc -gen-op-defs) diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td --- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td +++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td @@ -44,7 +44,7 @@ Example: ```mlir - %3 = linalg.range %0:%1:%2 : !linalg.range + %3 = linalg.range %0:%1:%2 : !linalg.range ```` }]; let builders = [OpBuilder< @@ -91,15 +91,15 @@ Examples: ```mlir - // Dimension collapse (i, j) -> i' and k -> k' - %1 = linalg.reshape %0 [(i, j, k) -> (i, j), (i, j, k) -> (k)] : - memref into memref + // Dimension collapse (i, j) -> i' and k -> k' + %1 = linalg.reshape %0 [(i, j, k) -> (i, j), (i, j, k) -> (k)] : + memref into memref ``` ```mlir - // Dimension expansion i -> (i', j') and (k) -> (k') - %1 = linalg.reshape %0 [(i, j, k) -> (i, j), (i, j, k) -> (k)] : - memref into memref + // Dimension expansion i -> (i', j') and (k) -> (k') + %1 = linalg.reshape %0 [(i, j, k) -> (i, j), (i, j, k) -> (k)] : + memref into memref ``` }]; @@ -151,22 +151,22 @@ 1. rank-preserving `slice`: ```mlir - %4 = linalg.slice %0[%1, %2] : memref, - !linalg.range, !linalg.range, memref - ``` + %4 = linalg.slice %0[%1, %2] : memref, + !linalg.range, !linalg.range, memref + ``` 2. 
rank-reducing `slice` (from 2-D to 1-D): ```mlir - %4 = linalg.slice %0[%1, %2] : memref, - index, !linalg.range, memref + %4 = linalg.slice %0[%1, %2] : memref, + index, !linalg.range, memref ``` 3. rank-reducing `slice` (from 2-D to 0-D): ```mlir - %4 = linalg.slice %0[%1, %2] : memref, - index, index, memref + %4 = linalg.slice %0[%1, %2] : memref, + index, index, memref ``` }]; @@ -210,7 +210,7 @@ Example: ```mlir - %1 = linalg.transpose %0 (i, j) -> (j, i) : memref + %1 = linalg.transpose %0 (i, j) -> (j, i) : memref ``` }]; @@ -245,7 +245,7 @@ Example: ```mlir - linalg.yield %f0, %f1 : f32, f32 + linalg.yield %f0, %f1 : f32, f32 ``` }]; } diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td --- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td +++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td @@ -61,44 +61,48 @@ Copies the data in the input view into the output view. Usage: - ```mlir - linalg.copy(%arg0, %arg1) : memref, - memref - ``` + + ```mlir + linalg.copy(%arg0, %arg1) : memref, + memref + ``` One possible lowering to loop form is: - ```mlir - %0 = linalg.dim %arg0, 0 : index - loop.for %i0 = %c0 to %0 step %c1 { - %1 = load %arg0[%i0] : memref - store %1, %arg1[%i0] : memref - } - ``` + + ```mlir + %0 = linalg.dim %arg0, 0 : index + loop.for %i0 = %c0 to %0 step %c1 { + %1 = load %arg0[%i0] : memref + store %1, %arg1[%i0] : memref + } + ``` Optionally, can take `input_permutation` and `output_permutation` attributes to reorder the dimensions of the input and output views. Usage: - ```mlir - linalg.copy(%arg0, %arg1) {inputPermutation : (i, j, k) -> (i, k, j), - outputPermutation : (i, j, k) -> (k, j, i)} : - memref, - memref - ``` + + ```mlir + linalg.copy(%arg0, %arg1) {inputPermutation : (i, j, k) -> (i, k, j), + outputPermutation : (i, j, k) -> (k, j, i)} : + memref, + memref + ``` One possible lowering to loop form is: - ```mlir - %0 = linalg.dim %arg0, 0 - %1 = linalg.dim %arg0, 1 - %2 = linalg.dim %arg0, 2 - loop.for %i0 = %c0 to %{{.*}} step %c1 { - loop.for %i1 = %c0 to %{{.*}} step %c1 { - loop.for %i2 = %c0 to %{{.*}} step %c1 { - %3 = load %arg0[%i0, %i2, %i1] : - memref - store %3, %arg1[%i2, %i1, %i0] : - memref - ``` + + ```mlir + %0 = linalg.dim %arg0, 0 + %1 = linalg.dim %arg0, 1 + %2 = linalg.dim %arg0, 2 + loop.for %i0 = %c0 to %{{.*}} step %c1 { + loop.for %i1 = %c0 to %{{.*}} step %c1 { + loop.for %i2 = %c0 to %{{.*}} step %c1 { + %3 = load %arg0[%i0, %i2, %i1] : + memref + store %3, %arg1[%i2, %i1, %i0] : + memref + ``` The views are expected to be compatible for correctness but this is not enforced at the moment. @@ -441,10 +445,10 @@ specified as attributes. 
In pretty form, a linalg.generic op is written as: ```mlir - linalg.generic #trait_attribute %A, %B, %C {other-attributes} : - memref, - memref, - memref + linalg.generic #trait_attribute %A, %B, %C {other-attributes} : + memref, + memref, + memref ``` Where #trait_attributes is an alias of a dictionary attribute containing: @@ -474,41 +478,41 @@ Example: Defining a #matmul_trait attribute in MLIR can be done as follows: ```mlir - func @fma(%a: f32, %b: f32, %c: f32) -> f32 { - %d = mulf %a, %b: f32 - %e = addf %c, %d: f32 - return %e: f32 - } - #matmul_accesses = [ - (m, n, k) -> (m, k), - (m, n, k) -> (k, n), - (m, n, k) -> (m, n) - ] - #matmul_trait = { - doc = "C(m, n) += A(m, k) * B(k, n)", - fun = @fma, - indexing_maps = #matmul_accesses, - library_call = "linalg_matmul", - n_views = [2, 1], - iterator_types = ["parallel", "parallel", "reduction"] - } + func @fma(%a: f32, %b: f32, %c: f32) -> f32 { + %d = mulf %a, %b: f32 + %e = addf %c, %d: f32 + return %e: f32 + } + #matmul_accesses = [ + (m, n, k) -> (m, k), + (m, n, k) -> (k, n), + (m, n, k) -> (m, n) + ] + #matmul_trait = { + doc = "C(m, n) += A(m, k) * B(k, n)", + fun = @fma, + indexing_maps = #matmul_accesses, + library_call = "linalg_matmul", + n_views = [2, 1], + iterator_types = ["parallel", "parallel", "reduction"] + } ``` And can be reused in multiple places as: ```mlir - linalg.generic #matmul_trait %A, %B, %C [other-attributes] : - memref, - memref, - memref + linalg.generic #matmul_trait %A, %B, %C [other-attributes] : + memref, + memref, + memref ``` This may lower to either: ```mlir - call @linalg_matmul(%A, %B, %C) : - (memref, - memref, - memref) - -> () + call @linalg_matmul(%A, %B, %C) : + (memref, + memref, + memref) + -> () ``` or IR resembling: @@ -532,10 +536,10 @@ mixing input and output ranked tensor values with input and output memrefs. 
```mlir - %C = linalg.generic #trait_attribute %A, %B {other-attributes} : - tensor, - memref - -> (tensor) + %C = linalg.generic #trait_attribute %A, %B {other-attributes} : + tensor, + memref + -> (tensor) ``` In this case, the number of outputs (args_out) must match the sum of (1) the @@ -568,10 +572,10 @@ written as: ```mlir - linalg.indexed_generic #trait_attribute %A, %B, %C {other-attributes} : - memref, - memref, - memref + linalg.indexed_generic #trait_attribute %A, %B, %C {other-attributes} : + memref, + memref, + memref ``` Where #trait_attributes is an alias of a dictionary attribute containing: @@ -600,49 +604,53 @@ Example: Defining a #matmul_trait attribute in MLIR can be done as follows: - ```mlir - func @fma(%offset_m: index, %offset_n: index, %offset_k: index, - %a: f32, %b: f32, %c: f32) - -> f32 - { - "some_optional_condition"(%offset_m, %offset_n, %offset_k) - %d = mulf %a, %b: f32 - %e = addf %c, %d: f32 - return %e: f32 - } - #matmul_accesses = [ - (m, n, k) -> (m, k), - (m, n, k) -> (k, n), - (m, n, k) -> (m, n) - ] - #matmul_trait = { - doc = "C(m, n) += A(m, k) * B(k, n)", - fun = @fma, - indexing_maps = #matmul_accesses, - library_call = "linalg_matmul", - n_views = [2, 1], - iterator_types = ["parallel", "parallel", "reduction"] - } - ``` + + ```mlir + func @fma(%offset_m: index, %offset_n: index, %offset_k: index, + %a: f32, %b: f32, %c: f32) + -> f32 + { + "some_optional_condition"(%offset_m, %offset_n, %offset_k) + %d = mulf %a, %b: f32 + %e = addf %c, %d: f32 + return %e: f32 + } + #matmul_accesses = [ + (m, n, k) -> (m, k), + (m, n, k) -> (k, n), + (m, n, k) -> (m, n) + ] + #matmul_trait = { + doc = "C(m, n) += A(m, k) * B(k, n)", + fun = @fma, + indexing_maps = #matmul_accesses, + library_call = "linalg_matmul", + n_views = [2, 1], + iterator_types = ["parallel", "parallel", "reduction"] + } + ``` And can be reused in multiple places as: - ```mlir - linalg.indexed_generic #matmul_trait %A, %B, %C [other-attributes] : - memref, - memref, - memref - ``` + + ```mlir + linalg.indexed_generic #matmul_trait %A, %B, %C [other-attributes] : + memref, + memref, + memref + ``` This may lower to either: - ```mlir - call @linalg_matmul(%offset_m, %offset_n, %offset_k, %A, %B, %C) : - (memref, - memref, - memref) - -> () - ``` + + ```mlir + call @linalg_matmul(%offset_m, %offset_n, %offset_k, %A, %B, %C) : + (memref, + memref, + memref) + -> () + ``` or IR resembling: + ```mlir loop.for %m = %c0 to %M step %c1 { loop.for %n = %c0 to %N step %c1 { @@ -664,10 +672,10 @@ memrefs. ```mlir - %C = linalg.indexed_generic #trait_attribute %A, %B {other-attributes} - : tensor, - memref - -> (tensor) + %C = linalg.indexed_generic #trait_attribute %A, %B {other-attributes} + : tensor, + memref + -> (tensor) ``` In this case, the number of outputs (args_out) must match the sum of (1) the diff --git a/mlir/include/mlir/Dialect/LoopOps/LoopOps.td b/mlir/include/mlir/Dialect/LoopOps/LoopOps.td --- a/mlir/include/mlir/Dialect/LoopOps/LoopOps.td +++ b/mlir/include/mlir/Dialect/LoopOps/LoopOps.td @@ -57,12 +57,12 @@ cases when it is absent from the custom format. For example: ```mlir - loop.for %iv = %lb to %ub step %step { - ... // body - } + loop.for %iv = %lb to %ub step %step { + ... // body + } ``` - "loop.for" can also operate on loop-carried variables and returns the final + `loop.for` can also operate on loop-carried variables and returns the final values after loop termination. 
The initial values of the variables are passed as additional SSA operands to the "loop.for" following the 3 loop control SSA values mentioned above (lower bound, upper bound and step). The @@ -120,7 +120,7 @@ } return %sum : f32 } - ``` + ``` }]; let arguments = (ins Index:$lowerBound, Index:$upperBound, @@ -174,44 +174,47 @@ [SingleBlockImplicitTerminator<"YieldOp">, RecursiveSideEffects]> { let summary = "if-then-else operation"; let description = [{ - The "loop.if" operation represents an if-then-else construct for + The `loop.if` operation represents an if-then-else construct for conditionally executing two regions of code. The operand to an if operation is a boolean value. For example: ```mlir - loop.if %b { - ... - } else { - ... - } + loop.if %b { + ... + } else { + ... + } ``` - "loop.if" may also return results that are defined in its regions. The + `loop.if` may also return results that are defined in its regions. The values defined are determined by which execution path is taken. - For example: + + Example: + ```mlir - %x, %y = loop.if %b -> (f32, f32) { - %x_true = ... - %y_true = ... - loop.yield %x_true, %y_true : f32, f32 - } else { - %x_false = ... - %y_false = ... - loop.yield %x_false, %y_false : f32, f32 - } + %x, %y = loop.if %b -> (f32, f32) { + %x_true = ... + %y_true = ... + loop.yield %x_true, %y_true : f32, f32 + } else { + %x_false = ... + %y_false = ... + loop.yield %x_false, %y_false : f32, f32 + } ``` - "loop.if" regions are always terminated with "loop.yield". If "loop.if" + `loop.if` regions are always terminated with "loop.yield". If "loop.if" defines no values, the "loop.yield" can be left out, and will be inserted implicitly. Otherwise, it must be explicit. Also, if "loop.if" defines one or more values, the 'else' block cannot be omitted. - For example: + Example: + ```mlir - loop.if %b { - ... - } + loop.if %b { + ... + } ``` }]; let arguments = (ins I1:$condition); @@ -256,7 +259,7 @@ The lower and upper bounds specify a half-open range: the range includes the lower bound but does not include the upper bound. The initial values have the same types as results of "loop.parallel". If there are no results, - the keyword `init` can be omitted. + the keyword `init` can be omitted. Semantically we require that the iteration space can be iterated in any order, and the loop body can be executed in parallel. If there are data @@ -274,19 +277,20 @@ The body region must contain exactly one block that terminates with "loop.yield" without operands. Parsing ParallelOp will create such a region and insert the terminator when it is absent from the custom format. 
- For example: + + Example: ```mlir - loop.parallel (%iv) = (%lb) to (%ub) step (%step) -> f32 { - %zero = constant 0.0 : f32 - loop.reduce(%zero) : f32 { - ^bb0(%lhs : f32, %rhs: f32): - %res = addf %lhs, %rhs : f32 - loop.reduce.return %res : f32 - } - } + loop.parallel (%iv) = (%lb) to (%ub) step (%step) -> f32 { + %zero = constant 0.0 : f32 + loop.reduce(%zero) : f32 { + ^bb0(%lhs : f32, %rhs: f32): + %res = addf %lhs, %rhs : f32 + loop.reduce.return %res : f32 + } + } ``` - }]; + }]; let arguments = (ins Variadic:$lowerBound, Variadic:$upperBound, @@ -343,14 +347,13 @@ Example: ```mlir - %operand = constant 1.0 : f32 - loop.reduce(%operand) : f32 { - ^bb0(%lhs : f32, %rhs: f32): - %res = addf %lhs, %rhs : f32 - loop.reduce.return %res : f32 - } + %operand = constant 1.0 : f32 + loop.reduce(%operand) : f32 { + ^bb0(%lhs : f32, %rhs: f32): + %res = addf %lhs, %rhs : f32 + loop.reduce.return %res : f32 + } ``` - }]; let skipDefaultBuilders = 1; @@ -373,7 +376,7 @@ the operand of "loop.reduce". Example for the custom format: ```mlir - loop.reduce.return %res : f32 + loop.reduce.return %res : f32 ``` }]; diff --git a/mlir/include/mlir/Dialect/Quant/QuantOps.td b/mlir/include/mlir/Dialect/Quant/QuantOps.td --- a/mlir/include/mlir/Dialect/Quant/QuantOps.td +++ b/mlir/include/mlir/Dialect/Quant/QuantOps.td @@ -92,7 +92,7 @@ IsolatedFromAbove, SingleBlockImplicitTerminator<"ReturnOp">]> { let summary = [{ - The `region operation wraps high-precision ops as a logical low-precision + The `region` operation wraps high-precision ops as a logical low-precision quantized kernel. }]; @@ -119,8 +119,9 @@ def quant_ConstFakeQuant : quant_Op<"const_fake_quant", [SameOperandsAndResultType, NoSideEffect]> { - let summary = - "Simulates the effect of uniform quantization with const range."; + let summary = [{ + Simulates the effect of uniform quantization with const range. + }]; let description = [{ Given a const min, max, num_bits and narrow_range attribute, applies the @@ -148,8 +149,9 @@ def quant_ConstFakeQuantPerAxis : quant_Op<"const_fake_quant_per_axis", [SameOperandsAndResultType, NoSideEffect]> { - let summary = - "Simulates the effect of per axis uniform quantization with const range."; + let summary = [{ + Simulates the effect of per axis uniform quantization with const range. + }]; let description = [{ Given a const min, max, num_bits and narrow_range attribute, applies the @@ -179,8 +181,7 @@ } def quant_StatisticsRefOp : quant_Op<"stats_ref", [SameOperandsAndResultType]> { - let summary = - "Indicates that statistics are resolved by reference."; + let summary = "Indicates that statistics are resolved by reference."; let description = [{ This op acts as an identity that, when encountered at runtime, should result @@ -198,8 +199,7 @@ } def quant_StatisticsOp : quant_Op<"stats", [SameOperandsAndResultType]> { - let summary = - "Identity op which associates statistics with the value."; + let summary = "Identity op which associates statistics with the value."; let description = [{ Associates statistics about the runtime ranges of values observed for @@ -213,8 +213,11 @@ `layerStats` must be a rank 1 tensor: [2] `axisStats` must be a rank 2 tensor: [N, 2], where N=the slice size splitted by the `axis` dimension. 
For example: - , axis=3 => N=2 - , axis=2 => N=6 + + ``` + , axis=3 => N=2 + , axis=2 => N=6 + ``` }]; let arguments = (ins @@ -263,8 +266,9 @@ } def quant_CoupledRefOp : quant_Op<"coupled_ref", [SameOperandsAndResultType]> { - let summary = - "Indicates that one point of the computation is coupled to another."; + let summary = [{ + Indicates that one point of the computation is coupled to another. + }]; let description = [{ Ordinarily, relationships between ops for the purposes of determining diff --git a/mlir/include/mlir/Dialect/Shape/IR/CMakeLists.txt b/mlir/include/mlir/Dialect/Shape/IR/CMakeLists.txt --- a/mlir/include/mlir/Dialect/Shape/IR/CMakeLists.txt +++ b/mlir/include/mlir/Dialect/Shape/IR/CMakeLists.txt @@ -1,5 +1,2 @@ -set(LLVM_TARGET_DEFINITIONS ShapeOps.td) -mlir_tablegen(ShapeOps.h.inc -gen-op-decls) -mlir_tablegen(ShapeOps.cpp.inc -gen-op-defs) -mlir_tablegen(ShapeOpsDialect.h.inc -gen-dialect-decls) -add_public_tablegen_target(MLIRShapeOpsIncGen) +add_mlir_dialect(ShapeOps shape) +add_mlir_doc(ShapeOps -gen-dialect-doc ShapeDialect Dialects/) diff --git a/mlir/include/mlir/Dialect/Vector/VectorOps.td b/mlir/include/mlir/Dialect/Vector/VectorOps.td --- a/mlir/include/mlir/Dialect/Vector/VectorOps.td +++ b/mlir/include/mlir/Dialect/Vector/VectorOps.td @@ -87,58 +87,59 @@ and acc arguments. An indexing map attribute specifies a mapping from each iterator in the iterator type list, to each dimension of an N-D vector. - Examples: + Example: + ```mlir - // Simple dot product (K = 0). - #contraction_accesses = [ - affine_map<(i) -> (i)>, - affine_map<(i) -> (i)>, - affine_map<(i) -> ()> - ] - #contraction_trait = { - indexing_maps = #contraction_accesses, - iterator_types = ["reduction"] - } - %3 = vector.contract #contraction_trait %0, %1, %2 - : vector<10xf32>, vector<10xf32> into f32 - - // 2D vector contraction with one contracting dimension (matmul, K = 2). - #contraction_accesses = [ - affine_map<(i, j, k) -> (i, k)>, - affine_map<(i, j, k) -> (k, j)>, - affine_map<(i, j, k) -> (i, j)> - ] - #contraction_trait = { - indexing_maps = #contraction_accesses, - iterator_types = ["parallel", "parallel", "reduction"] - } + // Simple dot product (K = 0). + #contraction_accesses = [ + affine_map<(i) -> (i)>, + affine_map<(i) -> (i)>, + affine_map<(i) -> ()> + ] + #contraction_trait = { + indexing_maps = #contraction_accesses, + iterator_types = ["reduction"] + } + %3 = vector.contract #contraction_trait %0, %1, %2 + : vector<10xf32>, vector<10xf32> into f32 + + // 2D vector contraction with one contracting dimension (matmul, K = 2). + #contraction_accesses = [ + affine_map<(i, j, k) -> (i, k)>, + affine_map<(i, j, k) -> (k, j)>, + affine_map<(i, j, k) -> (i, j)> + ] + #contraction_trait = { + indexing_maps = #contraction_accesses, + iterator_types = ["parallel", "parallel", "reduction"] + } - %3 = vector.contract #contraction_trait %0, %1, %2 - : vector<4x3xf32>, vector<3x7xf32> into vector<4x7xf32> - - // 4D to 3D vector contraction with two contracting dimensions and - // one batch dimension (K = 3). 
- #contraction_accesses = [ - affine_map<(b0, f0, f1, c0, c1) -> (c0, b0, c1, f0)>, - affine_map<(b0, f0, f1, c0, c1) -> (b0, c1, c0, f1)>, - affine_map<(b0, f0, f1, c0, c1) -> (b0, f0, f1)> - ] - #contraction_trait = { - indexing_maps = #contraction_accesses, - iterator_types = ["parallel", "parallel", "parallel", - "reduction", "reduction"] - } + %3 = vector.contract #contraction_trait %0, %1, %2 + : vector<4x3xf32>, vector<3x7xf32> into vector<4x7xf32> + + // 4D to 3D vector contraction with two contracting dimensions and + // one batch dimension (K = 3). + #contraction_accesses = [ + affine_map<(b0, f0, f1, c0, c1) -> (c0, b0, c1, f0)>, + affine_map<(b0, f0, f1, c0, c1) -> (b0, c1, c0, f1)>, + affine_map<(b0, f0, f1, c0, c1) -> (b0, f0, f1)> + ] + #contraction_trait = { + indexing_maps = #contraction_accesses, + iterator_types = ["parallel", "parallel", "parallel", + "reduction", "reduction"] + } - %4 = vector.contract #contraction_trait %0, %1, %2 - : vector<7x8x16x15xf32>, vector<8x16x7x5xf32> into vector<8x15x5xf32> + %4 = vector.contract #contraction_trait %0, %1, %2 + : vector<7x8x16x15xf32>, vector<8x16x7x5xf32> into vector<8x15x5xf32> - // 4D vector contraction with two contracting dimensions and optional - // vector mask arguments. - %lhs_mask = vector.constant_mask [7, 8, 16, 15] : vector<7x8x16x15xi1> - %rhs_mask = vector.constant_mask [8, 16, 7, 5] : vector<8x16x7x5xi1> + // 4D vector contraction with two contracting dimensions and optional + // vector mask arguments. + %lhs_mask = vector.constant_mask [7, 8, 16, 15] : vector<7x8x16x15xi1> + %rhs_mask = vector.constant_mask [8, 16, 7, 5] : vector<8x16x7x5xi1> - %5 = vector.contract #contraction_trait %0, %1, %2, %lhs_mask, %rhs_mask - : vector<7x8x16x15xf32>, vector<8x16x7x5xf32> into vector<8x15x8x5xf32> + %5 = vector.contract #contraction_trait %0, %1, %2, %lhs_mask, %rhs_mask + : vector<7x8x16x15xf32>, vector<8x16x7x5xf32> into vector<8x15x8x5xf32> ``` }]; let builders = [OpBuilder< @@ -203,13 +204,14 @@ http://llvm.org/docs/LangRef.html#experimental-vector-reduction-intrinsics - Examples: + Example: + ```mlir - %1 = vector.reduction "add", %0 : vector<16xf32> into f32 + %1 = vector.reduction "add", %0 : vector<16xf32> into f32 - %3 = vector.reduction "xor", %2 : vector<4xi32> into i32 + %3 = vector.reduction "xor", %2 : vector<4xi32> into i32 - %4 = vector.reduction "mul", %0, %1 : vector<16xf32> into f32 + %4 = vector.reduction "mul", %0, %1 : vector<16xf32> into f32 ``` }]; let extraClassDeclaration = [{ @@ -247,11 +249,12 @@ dimension of 1. These rules imply that any scalar broadcast (k=0) to any shaped vector with the same element type is always legal. - Examples: + Example: + ```mlir - %0 = constant 0.0 : f32 - %1 = vector.broadcast %0 : f32 to vector<16xf32> - %2 = vector.broadcast %1 : vector<16xf32> to vector<4x16xf32> + %0 = constant 0.0 : f32 + %1 = vector.broadcast %0 : f32 to vector<16xf32> + %2 = vector.broadcast %1 : vector<16xf32> to vector<4x16xf32> ``` }]; let extraClassDeclaration = [{ @@ -290,7 +293,8 @@ mask values must be within range, viz. 
given two k-D operands v1 and v2 above, all mask values are in the range [0,s_1+t_1) - Examples: + Example: + ```mlir %0 = vector.shuffle %a, %b[0, 3] : vector<2xf32>, vector<2xf32> ; yields vector<2xf32> @@ -298,7 +302,6 @@ : vector<2x16xf32>, vector<1x16xf32> ; yields vector<3x16xf32> %2 = vector.shuffle %a, %b[3, 2, 1, 0] : vector<2xf32>, vector<2xf32> ; yields vector<4xf32> - ``` }]; let builders = [OpBuilder<"Builder *builder, OperationState &result," @@ -333,9 +336,10 @@ https://llvm.org/docs/LangRef.html#extractelement-instruction Example: + ```mlir - %c = constant 15 : i32 - %1 = vector.extractelement %0[%c : i32]: vector<16xf32> + %c = constant 15 : i32 + %1 = vector.extractelement %0[%c : i32]: vector<16xf32> ``` }]; let extraClassDeclaration = [{ @@ -360,10 +364,11 @@ Takes an n-D vector and a k-D position and extracts the (n-k)-D vector at the proper position. Degenerates to an element type in the 0-D case. - Examples: + Example: + ```mlir - %1 = vector.extract %0[3]: vector<4x8x16xf32> - %2 = vector.extract %0[3, 3, 3]: vector<4x8x16xf32> + %1 = vector.extract %0[3]: vector<4x8x16xf32> + %2 = vector.extract %0[3, 3, 3]: vector<4x8x16xf32> ``` }]; let builders = [OpBuilder< @@ -396,19 +401,20 @@ linear index of the slice w.r.t the unrolling scheme represented by 'sizes'. Currently, only unit strides are supported. - Examples: + Example: + ```mlir - %0 = vector.transfer_read ...: vector<4x2xf32> + %0 = vector.transfer_read ...: vector<4x2xf32> - %1 = vector.extract_slices %0, [2, 2], [1, 1] - : vector<4x2xf32> into tuple, vector<2x2xf32>> + %1 = vector.extract_slices %0, [2, 2], [1, 1] + : vector<4x2xf32> into tuple, vector<2x2xf32>> - // Example with partial slices at dimension boundaries. - %2 = vector.transfer_read ...: vector<4x3xf32> + // Example with partial slices at dimension boundaries. + %2 = vector.transfer_read ...: vector<4x3xf32> - %3 = vector.extract_slices %2, [2, 2], [1, 1] - : vector<4x3xf32> into tuple, vector<2x1xf32>, - vector<2x2xf32>, vector<2x1xf32>> + %3 = vector.extract_slices %2, [2, 2], [1, 1] + : vector<4x3xf32> into tuple, vector<2x1xf32>, + vector<2x2xf32>, vector<2x1xf32>> ``` }]; let builders = [OpBuilder< @@ -449,8 +455,9 @@ to the `llvm.fma.*` intrinsic. Example: + ```mlir - %3 = vector.fma %0, %1, %2: vector<8x16xf32> + %3 = vector.fma %0, %1, %2: vector<8x16xf32> ``` }]; // Fully specified by traits. @@ -483,10 +490,11 @@ https://llvm.org/docs/LangRef.html#insertelement-instruction Example: + ```mlir - %c = constant 15 : i32 - %f = constant 0.0f : f32 - %1 = vector.insertelement %f, %0[%c : i32]: vector<16xf32> + %c = constant 15 : i32 + %f = constant 0.0f : f32 + %1 = vector.insertelement %f, %0[%c : i32]: vector<16xf32> ``` }]; let extraClassDeclaration = [{ @@ -515,12 +523,11 @@ and inserts the n-D source into the (n+k)-D destination at the proper position. Degenerates to a scalar source type when n = 0. - Examples: + Example: + ```mlir - %2 = vector.insert %0, %1[3]: - vector<8x16xf32> into vector<4x8x16xf32> - %5 = vector.insert %3, %4[3, 3, 3]: - f32 into vector<4x8x16xf32> + %2 = vector.insert %0, %1[3] : vector<8x16xf32> into vector<4x8x16xf32> + %5 = vector.insert %3, %4[3, 3, 3] : f32 into vector<4x8x16xf32> ``` }]; let assemblyFormat = [{ @@ -558,22 +565,23 @@ linear index of the slice w.r.t the unrolling scheme represented by 'sizes'. Currently, only unit strides are supported. 
- Examples: + Example: + ```mlir - %0 = vector.extract_slices %0, [2, 2], [1, 1] - : vector<4x2xf32> into tuple, vector<2x2xf32>> + %0 = vector.extract_slices %0, [2, 2], [1, 1] + : vector<4x2xf32> into tuple, vector<2x2xf32>> - %1 = vector.insert_slices %0, [2, 2], [1, 1] - : tuple, vector<2x2xf32>> into vector<4x2xf32> + %1 = vector.insert_slices %0, [2, 2], [1, 1] + : tuple, vector<2x2xf32>> into vector<4x2xf32> - // Example with partial slices at dimension boundaries. - %3 = vector.extract_slices %2, [2, 2], [1, 1] - : vector<4x3xf32> into tuple, vector<2x1xf32>, - vector<2x2xf32>, vector<2x1xf32>> + // Example with partial slices at dimension boundaries. + %3 = vector.extract_slices %2, [2, 2], [1, 1] + : vector<4x3xf32> into tuple, vector<2x1xf32>, + vector<2x2xf32>, vector<2x1xf32>> - %4 = vector.insert_slices %3, [2, 2], [1, 1] - : tuple, vector<2x1xf32>, - vector<2x2xf32>, vector<2x1xf32>> into vector<4x3xf32> + %4 = vector.insert_slices %3, [2, 2], [1, 1] + : tuple, vector<2x1xf32>, + vector<2x2xf32>, vector<2x1xf32>> into vector<4x3xf32> ``` }]; @@ -616,11 +624,12 @@ the last k-D dimensions contain the k-D source vector elements strided at the proper location as specified by the offsets. - Examples: + Example: + ```mlir - %2 = vector.insert_strided_slice %0, %1 - {offsets = [0, 0, 2], strides = [1, 1]}: - vector<2x4xf32> into vector<16x4x8xf32> + %2 = vector.insert_strided_slice %0, %1 + {offsets = [0, 0, 2], strides = [1, 1]}: + vector<2x4xf32> into vector<16x4x8xf32> ``` }]; @@ -658,14 +667,15 @@ the LLVMIR dialect, this form emits `llvm.intr.fma`, which is guaranteed to lower to actual `fma` instructions on x86. - Examples: - ```mlir - %2 = vector.outerproduct %0, %1: vector<4xf32>, vector<8xf32> - return %2: vector<4x8xf32> + Example: - %3 = vector.outerproduct %0, %1, %2: - vector<4xf32>, vector<8xf32>, vector<4x8xf32> - return %3: vector<4x8xf32> + ``` + %2 = vector.outerproduct %0, %1: vector<4xf32>, vector<8xf32> + return %2: vector<4x8xf32> + + %3 = vector.outerproduct %0, %1, %2: + vector<4xf32>, vector<8xf32>, vector<4x8xf32> + return %3: vector<4x8xf32> ``` }]; let extraClassDeclaration = [{ @@ -708,8 +718,8 @@ In the examples below, valid data elements are represented by an alphabetic character, and undefined data elements are represented by '-'. - Example: - ```mlir + Example + vector<1x8xf32> with valid data shape [6], fixed vector sizes [8] input: [a, b, c, d, e, f] @@ -718,9 +728,8 @@ vector layout: [a, b, c, d, e, f, -, -] - ``` - Example: - ```mlir + Example + vector<2x8xf32> with valid data shape [10], fixed vector sizes [8] input: [a, b, c, d, e, f, g, h, i, j] @@ -729,9 +738,9 @@ vector layout: [[a, b, c, d, e, f, g, h], [i, j, -, -, -, -, -, -]] - ``` - Example: - ```mlir + + Example + vector<2x2x2x3xf32> with valid data shape [3, 5], fixed vector sizes [2, 3] @@ -750,9 +759,9 @@ [-, -, -]] [[n, o, -], [-, -, -]]]] - ``` - Example: - ```mlir + + Example + %1 = vector.reshape %0, [%c3, %c6], [%c2, %c9], [4] : vector<3x2x4xf32> to vector<2x3x4xf32> @@ -776,7 +785,6 @@ [[j, k, l, m], [n, o, p, q], [r, -, -, -]]] - ``` }]; let extraClassDeclaration = [{ @@ -828,16 +836,17 @@ attribute. The returned subvector contains the elements starting at offset `offsets` and ending at `offsets + sizes`. 
- Examples: + Example: + ```mlir - %1 = vector.strided_slice %0 - {offsets = [0, 2], sizes = [2, 4], strides = [1, 1]}: - vector<4x8x16xf32> to vector<2x4x16xf32> - ``` + %1 = vector.strided_slice %0 + {offsets = [0, 2], sizes = [2, 4], strides = [1, 1]}: + vector<4x8x16xf32> to vector<2x4x16xf32> // TODO(ntv) Evolve to a range form syntax similar to: %1 = vector.strided_slice %0[0:2:1][2:4:1] vector<4x8x16xf32> to vector<2x4x16xf32> + ``` }]; let builders = [OpBuilder< "Builder *builder, OperationState &result, Value source, " # @@ -948,12 +957,13 @@ implemented using a warp-shuffle if loop `j` were mapped to `threadIdx.x`. Syntax - ```mlir + ``` operation ::= ssa-id `=` `vector.transfer_read` ssa-use-list `{` attribute-entry `} :` memref-type `,` vector-type ``` - Examples: + Example: + ```mlir // Read the slice `%A[%i0, %i1:%i1+256, %i2:%i2+32]` into vector<32x256xf32> // and pad with %f0 to handle the boundary case: @@ -1026,14 +1036,7 @@ valid. Different lowerings may be pertinent depending on the hardware support. - Syntax: - - ```mlir - operation ::= `vector.transfer_write` ssa-use-list `{` attribute-entry `} : - ` vector-type ', ' memref-type ' - ``` - - Examples: + Example: ```mlir // write vector<16x32x64xf32> into the slice @@ -1099,7 +1102,7 @@ 2-D MLIR vector to a 1-D flattened LLVM vector.shape_cast lowering to LLVM is supported in that particular case, for now. - Examples: + Example: ```mlir // Example casting to a lower vector rank. @@ -1139,7 +1142,7 @@ Syntax: - ```mlir + ``` operation ::= `vector.type_cast` ssa-use : memref-type to memref-type ``` @@ -1184,20 +1187,20 @@ (otherwise element values are set to 0). Example: - ``` - create a constant vector mask of size 4x3xi1 with elements in range - 0 <= row <= 2 and 0 <= col <= 1 are set to 1 (others to 0). - - %1 = vector.constant_mask [3, 2] : vector<4x3xi1> - - print %1 - columns - 0 1 2 - |------------ - 0 | 1 1 0 - rows 1 | 1 1 0 - 2 | 1 1 0 - 3 | 0 0 0 + + ```mlir + // create a constant vector mask of size 4x3xi1 with elements in range + // 0 <= row <= 2 and 0 <= col <= 1 are set to 1 (others to 0). + %1 = vector.constant_mask [3, 2] : vector<4x3xi1> + + print %1 + columns + 0 1 2 + |------------ + 0 | 1 1 0 + rows 1 | 1 1 0 + 2 | 1 1 0 + 3 | 0 0 0 ``` }]; @@ -1221,20 +1224,20 @@ (otherwise element values are set to 0). Example: - ``` - create a vector mask of size 4x3xi1 where elements in range - 0 <= row <= 2 and 0 <= col <= 1 are set to 1 (others to 0). - - %1 = vector.create_mask %c3, %c2 : vector<4x3xi1> - - print %1 - columns - 0 1 2 - |------------ - 0 | 1 1 0 - rows 1 | 1 1 0 - 2 | 1 1 0 - 3 | 0 0 0 + + ```mlir + // create a vector mask of size 4x3xi1 where elements in range + // 0 <= row <= 2 and 0 <= col <= 1 are set to 1 (others to 0). + %1 = vector.create_mask %c3, %c2 : vector<4x3xi1> + + print %1 + columns + 0 1 2 + |------------ + 0 | 1 1 0 + rows 1 | 1 1 0 + 2 | 1 1 0 + 3 | 0 0 0 ``` }]; @@ -1254,16 +1257,17 @@ transformation and should be removed before lowering to lower-level dialects. - Examples: - ```mlir - %0 = vector.transfer_read ... : vector<2x2xf32> - %1 = vector.transfer_read ... : vector<2x1xf32> - %2 = vector.transfer_read ... : vector<2x2xf32> - %3 = vector.transfer_read ... : vector<2x1xf32> - %4 = vector.tuple %0, %1, %2, %3 - : vector<2x2xf32>, vector<2x1xf32>, vector<2x2xf32>, vector<2x1xf32> + Example: + + ```mlir + %0 = vector.transfer_read ... : vector<2x2xf32> + %1 = vector.transfer_read ... : vector<2x1xf32> + %2 = vector.transfer_read ... 
: vector<2x2xf32> + %3 = vector.transfer_read ... : vector<2x1xf32> + %4 = vector.tuple %0, %1, %2, %3 + : vector<2x2xf32>, vector<2x1xf32>, vector<2x2xf32>, vector<2x1xf32> ``` }]; @@ -1285,14 +1289,17 @@ Takes a n-D vector and returns the transposed n-D vector defined by the permutation of ranks in the n-sized integer array attribute. In the operation + ```mlir - %1 = vector.transpose %0, [i_1, .., i_n] - : vector - to vector + %1 = vector.transpose %0, [i_1, .., i_n] + : vector + to vector ``` + the transp array [i_1, .., i_n] must be a permutation of [0, .., n-1]. Example: + ```mlir %1 = vector.transpose %0, [1, 0] : vector<2x3xf32> to vector<3x2xf32> @@ -1326,14 +1333,15 @@ transformation and should be removed before lowering to lower-level dialects. - Examples: + Example: + ```mlir - %4 = vector.tuple %0, %1, %2, %3 - : vector<2x2xf32>, vector<2x1xf32>, vector<2x2xf32>, vector<2x1xf32>> + %4 = vector.tuple %0, %1, %2, %3 + : vector<2x2xf32>, vector<2x1xf32>, vector<2x2xf32>, vector<2x1xf32>> - %5 = vector.tuple_get %4, 1 - : tuple, vector<2x1xf32>, - vector<2x2xf32>, vector<2x1xf32>> + %5 = vector.tuple_get %4, 1 + : tuple, vector<2x1xf32>, + vector<2x2xf32>, vector<2x1xf32>> ``` }]; @@ -1356,21 +1364,22 @@ Prints the source vector (or scalar) to stdout in human readable format (for testing and debugging). No return value. - Examples: + Example: + ```mlir - %0 = constant 0.0 : f32 - %1 = vector.broadcast %0 : f32 to vector<4xf32> - vector.print %1 : vector<4xf32> + %0 = constant 0.0 : f32 + %1 = vector.broadcast %0 : f32 to vector<4xf32> + vector.print %1 : vector<4xf32> - when lowered to LLVM, the vector print is unrolled into - elementary printing method calls that at runtime will yield + when lowered to LLVM, the vector print is unrolled into + elementary printing method calls that at runtime will yield - ( 0.0, 0.0, 0.0, 0.0 ) + ( 0.0, 0.0, 0.0, 0.0 ) - on stdout when linked with a small runtime support library, - which only needs to provide a few printing methods (single - value for all data types, opening/closing bracket, comma, - newline). + on stdout when linked with a small runtime support library, + which only needs to provide a few printing methods (single + value for all data types, opening/closing bracket, comma, + newline). ``` }]; let verifier = ?; @@ -1421,9 +1430,9 @@ Example: ```mlir - %C = vector.matrix_multiply %A, %B - { lhs_rows = 4: i32, lhs_columns = 16: i32 , rhs_columns = 3: i32 } : - (vector<64xf64>, vector<48xf64>) -> vector<12xf64> + %C = vector.matrix_multiply %A, %B + { lhs_rows = 4: i32, lhs_columns = 16: i32 , rhs_columns = 3: i32 } : + (vector<64xf64>, vector<48xf64>) -> vector<12xf64> ``` }]; let builders = [