diff --git a/mlir/docs/ConversionToLLVMDialect.md b/mlir/docs/ConversionToLLVMDialect.md deleted file mode 100644 --- a/mlir/docs/ConversionToLLVMDialect.md +++ /dev/null @@ -1,284 +0,0 @@ -# Conversion to the LLVM Dialect - -Conversion from several dialects that rely on -[built-in types](LangRef.md/#builtin-types) to the -[LLVM Dialect](Dialects/LLVM.md) is expected to be performed through the -[Dialect Conversion](DialectConversion.md) infrastructure. - -The conversion of types and that of the overall module structure is described in -this document. Individual conversion passes provide a set of conversion patterns -for ops in different dialects, such as `-convert-std-to-llvm` for ops in the -[Standard dialect](Dialects/Standard.md) and `-convert-vector-to-llvm` in the -[Vector dialect](Dialects/Vector.md). *Note that some conversions subsume the -others.* - -We use the terminology defined by the -[LLVM Dialect description](Dialects/LLVM.md) throughout this document. - -[TOC] - -## Type Conversion - -### Scalar Types - -Scalar types are converted to their LLVM counterparts if they exist. The -following conversions are currently implemented: - -- `i*` converts to `!llvm.i*` -- `bf16` converts to `bf16` -- `f16` converts to `f16` -- `f32` converts to `f32` -- `f64` converts to `f64` -- `f80` converts to `f80` -- `f128` converts to `f128` - -### Index Type - -Index type is converted to an LLVM dialect integer type with bitwidth equal to -the bitwidth of the pointer size as specified by the -[data layout](Dialects/LLVM.md/#data-layout-and-triple) of the closest module. -For example, on x86-64 CPUs it converts to `i64`. This behavior can be -overridden by the type converter configuration, which is often exposed as a pass -option by conversion passes. - -### Vector Types - -LLVM IR only supports *one-dimensional* vectors, unlike MLIR where vectors can -be multi-dimensional. Vector types cannot be nested in either IR. 
In the -one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the same -size with element type converted using these conversion rules. In the -n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types -of one-dimensional vectors. - -For example, `vector<4xf32>` converts to `vector<4xf32>` and `vector<4 x 8 x 16 -x f32>` converts to `!llvm.array<4 x array<8 x vec<16 x f32>>>`. - -### Ranked Memref Types - -Memref types in MLIR have both static and dynamic information associated with -them. In the general case, the dynamic information describes dynamic sizes in -the logical indexing space and any symbols bound to the memref. This dynamic -information must be present at runtime in the LLVM dialect equivalent type. - -In practice, the conversion supports two conventions: - -- the default convention for memrefs in the - **[strided form](Dialects/Builtin.md/#strided-memref)**; -- a "bare pointer" conversion for statically-shaped memrefs with default - layout. - -The choice between conventions is specified at type converter construction time -and is often exposed as an option by conversion passes. - -Memrefs with arbitrary layouts are not supported. Instead, these layouts can be -factored out of the type and used as part of index computation for operations -that read and write into a memref with the default layout. - -#### Default Convention - -The dynamic information comprises the buffer pointer as well as sizes and -strides of any dynamically-sized dimensions. Memref types are normalized and -converted to a _descriptor_ that is only dependent on the rank of the memref. -The descriptor contains the following fields in order: - -1. The pointer to the data buffer as allocated, referred to as "allocated - pointer". This is only useful for deallocating the memref. -2. The pointer to the properly aligned data pointer that the memref indexes, - referred to as "aligned pointer". -3. 
A lowered converted `index`-type integer containing the distance in number - of elements between the beginning of the (aligned) buffer and the first - element to be accessed through the memref, referred to as "offset". -4. An array containing as many converted `index`-type integers as the rank of - the memref: the array represents the size, in number of elements, of the - memref along the given dimension. For constant memref dimensions, the - corresponding size entry is a constant whose runtime value must match the - static value. -5. A second array containing as many converted `index`-type integers as the - rank of memref: the second array represents the "stride" (in tensor - abstraction sense), i.e. the number of consecutive elements of the - underlying buffer one needs to jump over to get to the next logically - indexed element. - -For constant memref dimensions, the corresponding size entry is a constant whose -runtime value matches the static value. This normalization serves as an ABI for -the memref type to interoperate with externally linked functions. In the -particular case of rank `0` memrefs, the size and stride arrays are omitted, -resulting in a struct containing two pointers + offset. - -Examples: - -```mlir -memref -> !llvm.struct<(ptr , ptr, i64)> -memref<1 x f32> -> !llvm.struct<(ptr, ptr, i64, - array<1 x 64>, array<1 x i64>)> -memref -> !llvm.struct<(ptr, ptr, i64 - array<1 x 64>, array<1 x i64>)> -memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr, ptr, i64 - array<5 x 64>, array<5 x i64>)> -memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr, ptr, i64 - array<5 x 64>, array<5 x i64>)> - -// Memref types can have vectors as element types -memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr>, - ptr>, i64, - array<1 x i64>, array<1 x i64>)> -``` - -#### Bare Pointer Convention - -Ranked memrefs with static shape and default layout can be converted into an -LLVM dialect pointer to their element type. 
Only the default alignment is -supported in such cases, e.g. the `alloc` operation cannot have an alignment -attribute. - -Examples: - -```mlir -memref -> !llvm.ptr -memref<10x42 x f32> -> !llvm.ptr - -// Memrefs with vector types are also supported. -memref<10x42 x vector<4xf32>> -> !llvm.ptr> -``` - -### Unranked Memref types - -Unranked memrefs are converted to an unranked descriptor that contains: - -1. a converted `index`-typed integer representing the dynamic rank of the - memref; -2. a type-erased pointer (`!llvm.ptr`) to a ranked memref descriptor with - the contents listed above. - -This descriptor is primarily intended for interfacing with rank-polymorphic -library functions. The pointer to the ranked memref descriptor points to memory -_allocated on stack_ of the function in which it is used. - -Note that stack allocations may be emitted at a location where the unranked -memref first appears, e.g., a cast operation, and remain live throughout the -lifetime of the function; this may lead to stack exhaustion if used in a loop. - -Examples: - -```mlir -// Unranked descriptor. -memref<*xf32> -> !llvm.struct<(i64, ptr)> -``` - -Bare pointer convention does not support unranked memrefs. - -### Function Types - -Function types get converted to LLVM dialect function types. The arguments are -converted individually according to these rules, except for `memref` types in -function arguments and high-order functions, which are described below. The -result types need to accommodate the fact that LLVM functions always have a -return type, which may be an `!llvm.void` type. The converted function always -has a single result type. If the original function type had no results, the -converted function will have one result of the `!llvm.void` type. If the -original function type had one result, the converted function will also have one -result converted using these rules. 
Otherwise, the result type will be an LLVM -dialect structure type where each element of the structure corresponds to one of -the results of the original function, converted using these rules. - -Examples: - -```mlir -// Zero-ary function type with no results: -() -> () -// is converted to a zero-ary function with `void` result. -!llvm.func - -// Unary function with one result: -(i32) -> (i64) -// has its argument and result type converted, before creating the LLVM dialect -// function type. -!llvm.func - -// Binary function with one result: -(i32, f32) -> (i64) -// has its arguments handled separately -!llvm.func - -// Binary function with two results: -(i32, f32) -> (i64, f64) -// has its result aggregated into a structure type. -!llvm.func (i32, f32)> -``` - -#### Functions as Function Arguments or Results - -High-order function types, i.e. types of functions that have other functions as -arguments or results, are converted differently to accommodate the fact that -LLVM IR does not allow for function-typed values. Instead, functions are -expected to be passed into and return from other functions _by pointer_. -Therefore, function-typed function arguments are results are converted to -pointer-to-the-function type. The pointee type is converted using these rules. - -Examples: - -```mlir -// Function-typed arguments or results in higher-order functions: -(() -> ()) -> (() -> ()) -// are converted into pointers to functions. -!llvm.func> (ptr>)> - -// These rules apply recursively: a function type taking a function that takes -// another function -( ( (i32) -> (i64) ) -> () ) -> () -// is converted into a function type taking a pointer-to-function that takes -// another point-to-function. -!llvm.func>)>>)> -``` - -#### Memrefs as Function Arguments - -When used as function arguments, both ranked and unranked memrefs are converted -into a list of arguments that represents each _scalar_ component of their -descriptor. 
This is intended for some compatibility with C ABI, in which -structure types would need to be passed by-pointer leading to the need for -allocations and related issues, as well as for aliasing annotations, which are -currently attached to pointer in function arguments. Having scalar components -means that each size and stride is passed as an individual value. - -When used as function results, memrefs are converted as usual, i.e. each memref -is converted to a descriptor struct (default convention) or to a pointer (bare -pointer convention). - -Examples: - -```mlir -// A memref descriptor appearing as function argument: -(memref) -> () -// gets converted into a list of individual scalar components of a descriptor. -!llvm.func, ptr, i64)> - -// The list of arguments is linearized and one can freely mix memref and other -// types in this list: -(memref, f32) -> () -// which gets converted into a flat list. -!llvm.func, ptr, i64, f32)> - -// For nD ranked memref descriptors: -(memref) -> () -// the converted signature will contain 2n+1 `index`-typed integer arguments, -// offset, n sizes and n strides, per memref argument type. -!llvm.func, ptr, i64, i64, i64, i64, i64)> - -// Same rules apply to unranked descriptors: -(memref<*xf32>) -> () -// which get converted into their components. -!llvm.func)> - -// However, returning a memref from a function is not affected: -() -> (memref) -// gets converted to a function returning a descriptor structure. -!llvm.func, ptr, i64, array<1xi64>, array<1xi64>)> ()> - -// If multiple memref-typed results are returned: -() -> (memref, memref) -// their descriptor structures are additionally packed into another structure, -// potentially with other non-memref typed results. 
-!llvm.func, ptr, i64)>, - struct<(ptr, ptr, i64)>)> ()> -``` diff --git a/mlir/docs/LLVMDialectMemRefConvention.md b/mlir/docs/LLVMDialectMemRefConvention.md deleted file mode 100644 --- a/mlir/docs/LLVMDialectMemRefConvention.md +++ /dev/null @@ -1,494 +0,0 @@ -# Built-in Function and MemRef Calling Convention - -This documents describes the calling convention implemented in the conversion of -built-in [function operation](Dialects/Builtin.md/#func-mlirfuncop), standard -[`call`](Dialects/Standard.md/#stdcall-callop) operations and the handling of -[`memref`](Dialects/Builtin.md#memreftype) type equivalents in the -[LLVM dialect](Dialects/LLVM.md). The conversion assumes the _default_ -convention was used when converting -[built-in to the LLVM dialect types](ConversionToLLVMDialect.md). - -## Function Result Packing - -In case of multi-result functions, the returned values are inserted into a -structure-typed value before being returned and extracted from it at the call -site. This transformation is a part of the conversion and is transparent to the -defines and uses of the values being returned. 
- -Example: - -```mlir -func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) { - return %arg0, %arg1 : i32, i64 -} -func @bar() { - %0 = constant 42 : i32 - %1 = constant 17 : i64 - %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64) - "use_i32"(%2#0) : (i32) -> () - "use_i64"(%2#1) : (i64) -> () -} - -// is transformed into - -llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> { - // insert the vales into a structure - %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)> - %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)> - %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)> - - // return the structure value - llvm.return %2 : !llvm.struct<(i32, i64)> -} -llvm.func @bar() { - %0 = llvm.mlir.constant(42 : i32) : i32 - %1 = llvm.mlir.constant(17) : i64 - - // call and extract the values from the structure - %2 = llvm.call @bar(%0, %1) - : (i32, i32) -> !llvm.struct<(i32, i64)> - %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)> - %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)> - - // use as before - "use_i32"(%3) : (i32) -> () - "use_i64"(%4) : (i64) -> () -} -``` - -## Calling Convention for Ranked `memref` - -Function _arguments_ of `memref` type, ranked or unranked, are _expanded_ into a -list of arguments of non-aggregate types that the memref descriptor defined -above comprises. That is, the outer struct type and the inner array types are -replaced with individual arguments. - -This convention is implemented in the conversion of `std.func` and `std.call` to -the LLVM dialect, with the former unpacking the descriptor into a set of -individual values and the latter packing those values back into a descriptor so -as to make it transparently usable by other operations. Conversions from other -dialects should take this convention into account. - -This specific convention is motivated by the necessity to specify alignment and -aliasing attributes on the raw pointers underpinning the memref. 
- -Examples: - -```mlir -func @foo(%arg0: memref) -> () { - "use"(%arg0) : (memref) -> () - return -} - -// Gets converted to the following -// (using type alias for brevity): -!llvm.memref_1d = type !llvm.struct<(ptr, ptr, i64, - array<1xi64>, array<1xi64>)> - -llvm.func @foo(%arg0: !llvm.ptr, // Allocated pointer. - %arg1: !llvm.ptr, // Aligned pointer. - %arg2: i64, // Offset. - %arg3: i64, // Size in dim 0. - %arg4: i64) { // Stride in dim 0. - // Populate memref descriptor structure. - %0 = llvm.mlir.undef : - %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d - %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d - %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d - %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d - %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d - - // Descriptor is now usable as a single value. - "use"(%5) : (!llvm.memref_1d) -> () - llvm.return -} -``` - -```mlir -func @bar() { - %0 = "get"() : () -> (memref) - call @foo(%0) : (memref) -> () - return -} - -// Gets converted to the following -// (using type alias for brevity): -!llvm.memref_1d = type !llvm.struct<(ptr, ptr, i64, - array<1xi64>, array<1xi64>)> - -llvm.func @bar() { - %0 = "get"() : () -> !llvm.memref_1d - - // Unpack the memref descriptor. - %1 = llvm.extractvalue %0[0] : !llvm.memref_1d - %2 = llvm.extractvalue %0[1] : !llvm.memref_1d - %3 = llvm.extractvalue %0[2] : !llvm.memref_1d - %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d - %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d - - // Pass individual values to the callee. - llvm.call @foo(%1, %2, %3, %4, %5) : (!llvm.memref_1d) -> () - llvm.return -} - -``` - -## Calling Convention for Unranked `memref` - -For unranked memrefs, the list of function arguments always contains two -elements, same as the unranked memref descriptor: an integer rank, and a -type-erased (`!llvm<"i8*">`) pointer to the ranked memref descriptor. 
Note that -while the _calling convention_ does not require stack allocation, _casting_ to -unranked memref does since one cannot take an address of an SSA value containing -the ranked memref. The caller is in charge of ensuring the thread safety and -eventually removing unnecessary stack allocations in cast operations. - -Example - -```mlir -llvm.func @foo(%arg0: memref<*xf32>) -> () { - "use"(%arg0) : (memref<*xf32>) -> () - return -} - -// Gets converted to the following. - -llvm.func @foo(%arg0: i64 // Rank. - %arg1: !llvm.ptr) { // Type-erased pointer to descriptor. - // Pack the unranked memref descriptor. - %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr)> - %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr)> - %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr)> - - "use"(%2) : (!llvm.struct<(i64, ptr)>) -> () - llvm.return -} -``` - -```mlir -llvm.func @bar() { - %0 = "get"() : () -> (memref<*xf32>) - call @foo(%0): (memref<*xf32>) -> () - return -} - -// Gets converted to the following. - -llvm.func @bar() { - %0 = "get"() : () -> (!llvm.struct<(i64, ptr)>) - - // Unpack the memref descriptor. - %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr)> - %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr)> - - // Pass individual values to the callee. - llvm.call @foo(%1, %2) : (i64, !llvm.ptr) - llvm.return -} -``` - -**Lifetime.** The second element of the unranked memref descriptor points to -some memory in which the ranked memref descriptor is stored. By convention, this -memory is allocated on stack and has the lifetime of the function. (*Note:* due -to function-length lifetime, creation of multiple unranked memref descriptors, -e.g., in a loop, may lead to stack overflows.) If an unranked descriptor has to -be returned from a function, the ranked descriptor it points to is copied into -dynamically allocated memory, and the pointer in the unranked descriptor is -updated accordingly. The allocation happens immediately before returning. 
It is -the responsibility of the caller to free the dynamically allocated memory. The -default conversion of `std.call` and `std.call_indirect` copies the ranked -descriptor to newly allocated memory on the caller's stack. Thus, the convention -of the ranked memref descriptor pointed to by an unranked memref descriptor -being stored on stack is respected. - -*This convention may or may not apply if the conversion of MemRef types is -overridden by the user.* - -## C-compatible wrapper emission - -In practical cases, it may be desirable to have externally-facing functions with -a single attribute corresponding to a MemRef argument. When interfacing with -LLVM IR produced from C, the code needs to respect the corresponding calling -convention. The conversion to the LLVM dialect provides an option to generate -wrapper functions that take memref descriptors as pointers-to-struct compatible -with data types produced by Clang when compiling C sources. The generation of -such wrapper functions can additionally be controlled at a function granularity -by setting the `llvm.emit_c_interface` unit attribute. - -More specifically, a memref argument is converted into a pointer-to-struct -argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function, where -`T` is the converted element type and `N` is the memref rank. This type is -compatible with that produced by Clang for the following C++ structure template -instantiations or their equivalents in C. - -```cpp -template -struct MemRefDescriptor { - T *allocated; - T *aligned; - intptr_t offset; - intptr_t sizes[N]; - intptr_t strides[N]; -}; -``` - -Furthermore, we also rewrite function results to pointer parameters if the -rewritten function result has a struct type. The special result parameter is -added as the first parameter and is of pointer-to-struct type. - -If enabled, the option will do the following. For _external_ functions declared -in the MLIR module. - -1. 
Declare a new function `_mlir_ciface_` where memref arguments - are converted to pointer-to-struct and the remaining arguments are converted - as usual. Results are converted to a special argument if they are of struct - type. -2. Add a body to the original function (making it non-external) that - 1. allocates memref descriptors, - 2. populates them, - 3. potentially allocates space for the result struct, and - 4. passes the pointers to these into the newly declared interface function, - then - 5. collects the result of the call (potentially from the result struct), - and - 6. returns it to the caller. - -For (non-external) functions defined in the MLIR module. - -1. Define a new function `_mlir_ciface_` where memref arguments - are converted to pointer-to-struct and the remaining arguments are converted - as usual. Results are converted to a special argument if they are of struct - type. -2. Populate the body of the newly defined function with IR that - 1. loads descriptors from pointers; - 2. unpacks descriptor into individual non-aggregate values; - 3. passes these values into the original function; - 4. collects the results of the call and - 5. either copies the results into the result struct or returns them to the - caller. - -Examples: - -```mlir - -func @qux(%arg0: memref) - -// Gets converted into the following -// (using type alias for brevity): -!llvm.memref_2d = type !llvm.struct<(ptr, ptr, i64, - array<2xi64>, array<2xi64>)> - -// Function with unpacked arguments. -llvm.func @qux(%arg0: !llvm.ptr, %arg1: !llvm.ptr, - %arg2: i64, %arg3: i64, %arg4: i64, - %arg5: i64, %arg6: i64) { - // Populate memref descriptor (as per calling convention). 
- %0 = llvm.mlir.undef : !llvm.memref_2d - %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d - %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d - %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d - %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d - %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d - %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d - %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d - - // Store the descriptor in a stack-allocated space. - %8 = llvm.mlir.constant(1 : index) : i64 - %9 = llvm.alloca %8 x !llvm.memref_2d - : (i64) -> !llvm.ptr, ptr, i64, - array<2xi64>, array<2xi64>)>> - llvm.store %7, %9 : !llvm.ptr, ptr, i64, - array<2xi64>, array<2xi64>)>> - - // Call the interface function. - llvm.call @_mlir_ciface_qux(%9) - : (!llvm.ptr, ptr, i64, - array<2xi64>, array<2xi64>)>>) -> () - - // The stored descriptor will be freed on return. - llvm.return -} - -// Interface function. -llvm.func @_mlir_ciface_qux(!llvm.ptr, ptr, i64, - array<2xi64>, array<2xi64>)>>) -``` - -```mlir -func @foo(%arg0: memref) { - return -} - -// Gets converted into the following -// (using type alias for brevity): -!llvm.memref_2d = type !llvm.struct<(ptr, ptr, i64, - array<2xi64>, array<2xi64>)> -!llvm.memref_2d_ptr = type !llvm.ptr, ptr, i64, - array<2xi64>, array<2xi64>)>> - -// Function with unpacked arguments. -llvm.func @foo(%arg0: !llvm.ptr, %arg1: !llvm.ptr, - %arg2: i64, %arg3: i64, %arg4: i64, - %arg5: i64, %arg6: i64) { - llvm.return -} - -// Interface function callable from C. -llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr) { - // Load the descriptor. - %0 = llvm.load %arg0 : !llvm.memref_2d_ptr - - // Unpack the descriptor as per calling convention. 
- %1 = llvm.extractvalue %0[0] : !llvm.memref_2d - %2 = llvm.extractvalue %0[1] : !llvm.memref_2d - %3 = llvm.extractvalue %0[2] : !llvm.memref_2d - %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d - %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d - %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d - %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d - llvm.call @foo(%1, %2, %3, %4, %5, %6, %7) - : (!llvm.ptr, !llvm.ptr, i64, i64, i64, - i64, i64) -> () - llvm.return -} -``` - -```mlir -func @foo(%arg0: memref) -> memref { - return %arg0 : memref -} - -// Gets converted into the following -// (using type alias for brevity): -!llvm.memref_2d = type !llvm.struct<(ptr, ptr, i64, - array<2xi64>, array<2xi64>)> -!llvm.memref_2d_ptr = type !llvm.ptr, ptr, i64, - array<2xi64>, array<2xi64>)>> - -// Function with unpacked arguments. -llvm.func @foo(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, - %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64) - -> !llvm.memref_2d { - %0 = llvm.mlir.undef : !llvm.memref_2d - %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d - %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d - %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d - %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d - %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d - %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d - %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d - llvm.return %7 : !llvm.memref_2d -} - -// Interface function callable from C. 
-llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr, %arg1: !llvm.memref_2d_ptr) { - %0 = llvm.load %arg1 : !llvm.memref_2d_ptr - %1 = llvm.extractvalue %0[0] : !llvm.memref_2d - %2 = llvm.extractvalue %0[1] : !llvm.memref_2d - %3 = llvm.extractvalue %0[2] : !llvm.memref_2d - %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d - %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d - %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d - %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d - %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7) - : (!llvm.ptr, !llvm.ptr, i64, i64, i64, i64, i64) -> !llvm.memref_2d - llvm.store %8, %arg0 : !llvm.memref_2d_ptr - llvm.return -} -``` - -Rationale: Introducing auxiliary functions for C-compatible interfaces is -preferred to modifying the calling convention since it will minimize the effect -of C compatibility on intra-module calls or calls between MLIR-generated -functions. In particular, when calling external functions from an MLIR module in -a (parallel) loop, the fact of storing a memref descriptor on stack can lead to -stack exhaustion and/or concurrent access to the same address. Auxiliary -interface function serves as an allocation scope in this case. Furthermore, when -targeting accelerators with separate memory spaces such as GPUs, stack-allocated -descriptors passed by pointer would have to be transferred to the device memory, -which introduces significant overhead. In such situations, auxiliary interface -functions are executed on host and only pass the values through device function -invocation mechanism. - -## Default Memref Model - -### Memref Descriptor - -Within a converted function, a `memref`-typed value is represented by a memref -_descriptor_, the type of which is the structure type obtained by converting -from the memref type. This descriptor holds all the necessary information to -produce an address of a specific element. 
In particular, it holds dynamic values -for static sizes, and they are expected to match at all times. - -It is created by the allocation operation and is updated by the conversion -operations that may change static dimensions into dynamic dimensions and vice -versa. - -**Note**: LLVM IR conversion does not support `memref`s with layouts that are -not amenable to the strided form. - -### Index Linearization - -Accesses to a memref element are transformed into an access to an element of the -buffer pointed to by the descriptor. The position of the element in the buffer -is calculated by linearizing memref indices in row-major order (lexically first -index is the slowest varying, similar to C, but accounting for strides). The -computation of the linear address is emitted as arithmetic operation in the LLVM -IR dialect. Strides are extracted from the memref descriptor. - -Examples: - -An access to a memref with indices: - -```mlir -%0 = load %m[%1,%2,%3,%4] : memref -``` - -is transformed into the equivalent of the following code: - -```mlir -// Compute the linearized index from strides. -// When strides or, in absence of explicit strides, the corresponding sizes are -// dynamic, extract the stride value from the descriptor. -%stride1 = llvm.extractvalue[4, 0] : !llvm.struct<(ptr, ptr, i64, - array<4xi64>, array<4xi64>)> -%addr1 = muli %stride1, %1 : i64 - -// When the stride or, in absence of explicit strides, the trailing sizes are -// known statically, this value is used as a constant. The natural value of -// strides is the product of all sizes following the current dimension. -%stride2 = llvm.mlir.constant(32 : index) : i64 -%addr2 = muli %stride2, %2 : i64 -%addr3 = addi %addr1, %addr2 : i64 - -%stride3 = llvm.mlir.constant(8 : index) : i64 -%addr4 = muli %stride3, %3 : i64 -%addr5 = addi %addr3, %addr4 : i64 - -// Multiplication with the known unit stride can be omitted. 
-%addr6 = addi %addr5, %4 : i64 - -// If the linear offset is known to be zero, it can also be omitted. If it is -// dynamic, it is extracted from the descriptor. -%offset = llvm.extractvalue[2] : !llvm.struct<(ptr, ptr, i64, - array<4xi64>, array<4xi64>)> -%addr7 = addi %addr6, %offset : i64 - -// All accesses are based on the aligned pointer. -%aligned = llvm.extractvalue[1] : !llvm.struct<(ptr, ptr, i64, - array<4xi64>, array<4xi64>)> - -// Get the address of the data pointer. -%ptr = llvm.getelementptr %aligned[%addr8] - : !llvm.struct<(ptr, ptr, i64, array<4xi64>, array<4xi64>)> - -> !llvm.ptr - -// Perform the actual load. -%0 = llvm.load %ptr : !llvm.ptr -``` - -For stores, the address computation code is identical and only the actual store -operation is different. - -Note: the conversion does not perform any sort of common subexpression -elimination when emitting memref accesses. diff --git a/mlir/docs/TargetLLVMIR.md b/mlir/docs/TargetLLVMIR.md new file mode 100644 --- /dev/null +++ b/mlir/docs/TargetLLVMIR.md @@ -0,0 +1,898 @@ +# LLVM IR Target + +This document describes the mechanisms of producing LLVM IR from MLIR. The +overall flow is two-stage: + +1. **conversion** of the IR to a set of dialects translatable to LLVM IR, for + example [LLVM Dialect](Dialects/LLVM.md) or one of the hardware-specific + dialects derived from LLVM IR intrinsics such as [AMX](Dialects/AMX.md), + [X86Vector](Dialects/X86Vector.md) or [ArmNeon](Dialects/ArmNeon.md); +2. **translation** of MLIR dialects to LLVM IR. + +This flow allows the non-trivial transformation to be performed within MLIR +using MLIR APIs and makes the translation between MLIR and LLVM IR *simple* and +potentially bidirectional. As a corollary, dialect ops translatable to LLVM IR +are expected to closely match the corresponding LLVM IR instructions and +intrinsics. This minimizes the dependency on LLVM IR libraries in MLIR as well +as reduces the churn in case of changes. 
+ +SPIR-V to LLVM dialect conversion has a +[dedicated document](SPIRVToLLVMDialectConversion.md). + +[TOC] + +## Conversion to the LLVM Dialect + +Conversion to the LLVM dialect from other dialects is the first step in +producing LLVM IR. All non-trivial IR modifications are expected to happen at +this stage or before. The conversion is *progressive*: most passes convert one +dialect to the LLVM dialect and keep operations from other dialects intact. For +example, the `-convert-memref-to-llvm` pass will only convert operations from +the `memref` dialect but will not convert operations from other dialects even if +they use or produce `memref`-typed values. + +The process relies on the [Dialect Conversion](DialectConversion.md) +infrastructure and, in particular, on the +[materialization](DialectConversion.md#type-conversion) hooks of `TypeConverter` +to support progressive lowering by injecting `unrealized_conversion_cast` +operations between converted and unconverted operations. After multiple partial +conversions to the LLVM dialect are performed, the cast operations that became +no-ops can be removed by the `-reconcile-unrealized-casts` pass. The latter pass +is not specific to the LLVM dialect and can remove any no-op casts. + +### Conversion of Built-in Types + +Built-in types have a default conversion to LLVM dialect types provided by the +`LLVMTypeConverter` class. Users targeting the LLVM dialect can reuse and extend +this type converter to support other types. Extra care must be taken if the +conversion rules for built-in types are overridden: all conversions must use the +same type converter. + +#### LLVM Dialect-compatible Types + +The types [compatible](Dialects/LLVM.md#built-in-type-compatibility) with the +LLVM dialect are kept as is. + +#### Complex Type + +Complex type is converted into an LLVM dialect literal structure type with two +elements: + +- real part; +- imaginary part. + +The element type is converted recursively using these rules. 
+ +Example: + +```mlir + complex + // -> + !llvm.struct<(f32, f32)> +``` + +#### Index Type + +Index type is converted into an LLVM dialect integer type with the bitwidth +specified by the [data layout](DataLayout.md) of the closest module. For +example, on x86-64 CPUs it converts to i64. This behavior can be overridden by +the type converter configuration, which is often exposed as a pass option by +conversion passes. + +Example: + +```mlir + index + // -> on x86_64 + i64 +``` + +#### Ranked MemRef Types + +Ranked memref types are converted into an LLVM dialect literal structure type +that contains the dynamic information associated with the memref object, +referred to as *descriptor*. Only memrefs in the +**[strided form](Dialects/Builtin.md/#strided-memref)** can be converted to the +LLVM dialect with the default descriptor format. Memrefs with other, less +trivial layouts should be converted into the strided form first, e.g., by +materializing the non-trivial address remapping due to layout as `affine.apply` +operations. + +The default memref descriptor is a struct with the following fields: + +1. The pointer to the data buffer as allocated, referred to as "allocated + pointer". This is only useful for deallocating the memref. +2. The pointer to the properly aligned data pointer that the memref indexes, + referred to as "aligned pointer". +3. A lowered converted `index`-type integer containing the distance in number + of elements between the beginning of the (aligned) buffer and the first + element to be accessed through the memref, referred to as "offset". +4. An array containing as many converted `index`-type integers as the rank of + the memref: the array represents the size, in number of elements, of the + memref along the given dimension. +5. A second array containing as many converted `index`-type integers as the + rank of memref: the second array represents the "stride" (in tensor + abstraction sense), i.e. 
the number of consecutive elements of the + underlying buffer one needs to jump over to get to the next logically + indexed element. + +For constant memref dimensions, the corresponding size entry is a constant whose +runtime value matches the static value. This normalization serves as an ABI for +the memref type to interoperate with externally linked functions. In the +particular case of rank `0` memrefs, the size and stride arrays are omitted, +resulting in a struct containing two pointers and the offset. + +Examples: + +```mlir +// Assuming index is converted to i64. + +memref<f32> -> !llvm.struct<(ptr, ptr, i64)> +memref<1 x f32> -> !llvm.struct<(ptr, ptr, i64, + array<1 x i64>, array<1 x i64>)> +memref<? x f32> -> !llvm.struct<(ptr, ptr, i64, + array<1 x i64>, array<1 x i64>)> +memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr, ptr, i64, + array<5 x i64>, array<5 x i64>)> +memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr, ptr, i64, + array<5 x i64>, array<5 x i64>)> + +// Memref types can have vectors as element types +memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr<vector<4 x f32>>, + ptr<vector<4 x f32>>, i64, + array<2 x i64>, array<2 x i64>)> +``` + +#### Unranked MemRef Types + +Unranked memref types are converted to an LLVM dialect literal structure type that +contains the dynamic information associated with the memref object, referred to +as *unranked descriptor*. It contains: + +1. a converted `index`-typed integer representing the dynamic rank of the + memref; +2. a type-erased pointer (`!llvm.ptr`) to a ranked memref descriptor with + the contents listed above. + +This descriptor is primarily intended for interfacing with rank-polymorphic +library functions. The pointer to the ranked memref descriptor points to some +*allocated* memory, which may reside on the stack of the current function or on +the heap. Conversion patterns for operations producing unranked memrefs are expected +to manage the allocation.
Note that this may lead to stack allocations +(`llvm.alloca`) being performed in a loop and not reclaimed until the end of the +current function. + +#### Function Types + +Function types are converted to LLVM dialect function types as follows: + +- function argument and result types are converted recursively using these + rules; +- if a function type has multiple results, they are wrapped into an LLVM + dialect literal structure type since LLVM function types must have exactly + one result; +- if a function type has no results, the corresponding LLVM dialect function + type will have one `!llvm.void` result since LLVM function types must have a + result; +- function types used in arguments of another function type are wrapped in an + LLVM dialect pointer type to comply with LLVM IR expectations; +- the structs corresponding to `memref` types, both ranked and unranked, + appearing as function arguments are unbundled into individual function + arguments to allow for specifying metadata such as aliasing information on + individual pointers; +- the conversion of `memref`-typed arguments is subject to + [calling conventions](TargetLLVMIR.md#calling-conventions). + +Examples: + +```mlir +// Zero-ary function type with no results: +() -> () +// is converted to a zero-ary function with `void` result. +!llvm.func + +// Unary function with one result: +(i32) -> (i64) +// has its argument and result type converted, before creating the LLVM dialect +// function type. +!llvm.func + +// Binary function with one result: +(i32, f32) -> (i64) +// has its arguments handled separately +!llvm.func + +// Binary function with two results: +(i32, f32) -> (i64, f64) +// has its result aggregated into a structure type. +!llvm.func (i32, f32)> + +// Function-typed arguments or results in higher-order functions: +(() -> ()) -> (() -> ()) +// are converted into pointers to functions. 
+!llvm.func> (ptr>)> + +// These rules apply recursively: a function type taking a function that takes +// another function +( ( (i32) -> (i64) ) -> () ) -> () +// is converted into a function type taking a pointer-to-function that takes +// another pointer-to-function. +!llvm.func>)>>)> + +// A memref descriptor appearing as function argument: +(memref) -> () +// gets converted into a list of individual scalar components of a descriptor. +!llvm.func, ptr, i64)> + +// The list of arguments is linearized and one can freely mix memref and other +// types in this list: +(memref, f32) -> () +// which gets converted into a flat list. +!llvm.func, ptr, i64, f32)> + +// For nD ranked memref descriptors: +(memref) -> () +// the converted signature will contain 2n+1 `index`-typed integer arguments, +// offset, n sizes and n strides, per memref argument type. +!llvm.func, ptr, i64, i64, i64, i64, i64)> + +// Same rules apply to unranked descriptors: +(memref<*xf32>) -> () +// which get converted into their components. +!llvm.func)> + +// However, returning a memref from a function is not affected: +() -> (memref) +// gets converted to a function returning a descriptor structure. +!llvm.func, ptr, i64, array<1xi64>, array<1xi64>)> ()> + +// If multiple memref-typed results are returned: +() -> (memref, memref) +// their descriptor structures are additionally packed into another structure, +// potentially with other non-memref typed results. +!llvm.func, ptr, i64)>, + struct<(ptr, ptr, i64)>)> ()> +``` + +Conversion patterns are available to convert built-in function operations and +standard call operations targeting those functions using these conversion rules. + +#### Multi-dimensional Vector Types + +LLVM IR only supports *one-dimensional* vectors, unlike MLIR where vectors can +be multi-dimensional. Vector types cannot be nested in either IR.
In the +one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the same +size with element type converted using these conversion rules. In the +n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types +of one-dimensional vectors. + +Examples: + +``` +vector<4x8 x f32> +// -> +!llvm.array<4 x vector<8 x f32>> + +memref<2 x vector<4x8 x f32>> +// -> +!llvm.struct<(ptr<array<4 x vector<8 x f32>>>, ptr<array<4 x vector<8 x f32>>>, + i64, array<1 x i64>, array<1 x i64>)> +``` + +#### Tensor Types + +Tensor types cannot be converted to the LLVM dialect. Operations on tensors must +be [bufferized](Bufferization.md) before being converted. + +### Calling Conventions + +Calling conventions provide a mechanism to customize the conversion of function +and function call operations without changing how individual types are handled +elsewhere. They are implemented simultaneously by the default type converter and +by the conversion patterns for the relevant operations. + +#### Function Result Packing + +In case of multi-result functions, the returned values are inserted into a +structure-typed value before being returned and extracted from it at the call +site. This transformation is a part of the conversion and is transparent to the +definitions and uses of the values being returned.
+ +Example: + +```mlir +func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) { + return %arg0, %arg1 : i32, i64 +} +func @bar() { + %0 = constant 42 : i32 + %1 = constant 17 : i64 + %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64) + "use_i32"(%2#0) : (i32) -> () + "use_i64"(%2#1) : (i64) -> () +} + +// is transformed into + +llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> { + // insert the values into a structure + %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)> + + // return the structure value + llvm.return %2 : !llvm.struct<(i32, i64)> +} +llvm.func @bar() { + %0 = llvm.mlir.constant(42 : i32) : i32 + %1 = llvm.mlir.constant(17 : i64) : i64 + + // call and extract the values from the structure + %2 = llvm.call @foo(%0, %1) + : (i32, i64) -> !llvm.struct<(i32, i64)> + %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)> + %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)> + + // use as before + "use_i32"(%3) : (i32) -> () + "use_i64"(%4) : (i64) -> () +} +``` + +#### Default Calling Convention for Ranked MemRef + +The default calling convention converts `memref`-typed function arguments to +LLVM dialect literal structs +[defined above](TargetLLVMIR.md#ranked-memref-types) before unbundling them into +individual scalar arguments. + +This convention is implemented in the conversion of `std.func` and `std.call` to +the LLVM dialect, with the former unpacking the descriptor into a set of +individual values and the latter packing those values back into a descriptor so +as to make it transparently usable by other operations. Conversions from other +dialects should take this convention into account. + +This specific convention is motivated by the necessity to specify alignment and +aliasing attributes on the raw pointers underpinning the memref.
+ +Examples: + +```mlir +func @foo(%arg0: memref) -> () { + "use"(%arg0) : (memref) -> () + return +} + +// Gets converted to the following +// (using type alias for brevity): +!llvm.memref_1d = type !llvm.struct<(ptr, ptr, i64, + array<1xi64>, array<1xi64>)> + +llvm.func @foo(%arg0: !llvm.ptr, // Allocated pointer. + %arg1: !llvm.ptr, // Aligned pointer. + %arg2: i64, // Offset. + %arg3: i64, // Size in dim 0. + %arg4: i64) { // Stride in dim 0. + // Populate memref descriptor structure. + %0 = llvm.mlir.undef : !llvm.memref_1d + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d + %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d + + // Descriptor is now usable as a single value. + "use"(%5) : (!llvm.memref_1d) -> () + llvm.return +} +``` + +```mlir +func @bar() { + %0 = "get"() : () -> (memref) + call @foo(%0) : (memref) -> () + return +} + +// Gets converted to the following +// (using type alias for brevity): +!llvm.memref_1d = type !llvm.struct<(ptr, ptr, i64, + array<1xi64>, array<1xi64>)> + +llvm.func @bar() { + %0 = "get"() : () -> !llvm.memref_1d + + // Unpack the memref descriptor. + %1 = llvm.extractvalue %0[0] : !llvm.memref_1d + %2 = llvm.extractvalue %0[1] : !llvm.memref_1d + %3 = llvm.extractvalue %0[2] : !llvm.memref_1d + %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d + %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d + + // Pass individual values to the callee. + llvm.call @foo(%1, %2, %3, %4, %5) + : (!llvm.ptr, !llvm.ptr, i64, i64, i64) -> () + llvm.return +} +``` + +#### Default Calling Convention for Unranked MemRef + +For unranked memrefs, the list of function arguments always contains two +elements, same as the unranked memref descriptor: an integer rank, and a +type-erased (`!llvm.ptr`) pointer to the ranked memref descriptor.
Note that +while the *calling convention* does not require allocation, *casting* to +unranked memref does since one cannot take an address of an SSA value containing +the ranked memref, which must be stored in some memory instead. The caller is in +charge of ensuring the thread safety and management of the allocated memory, in +particular the deallocation. + +Examples: + +```mlir +func @foo(%arg0: memref<*xf32>) -> () { + "use"(%arg0) : (memref<*xf32>) -> () + return +} + +// Gets converted to the following. + +llvm.func @foo(%arg0: i64, // Rank. + %arg1: !llvm.ptr) { // Type-erased pointer to descriptor. + // Pack the unranked memref descriptor. + %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr)> + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr)> + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr)> + + "use"(%2) : (!llvm.struct<(i64, ptr)>) -> () + llvm.return +} +``` + +```mlir +func @bar() { + %0 = "get"() : () -> (memref<*xf32>) + call @foo(%0) : (memref<*xf32>) -> () + return +} + +// Gets converted to the following. + +llvm.func @bar() { + %0 = "get"() : () -> (!llvm.struct<(i64, ptr)>) + + // Unpack the memref descriptor. + %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr)> + %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr)> + + // Pass individual values to the callee. + llvm.call @foo(%1, %2) : (i64, !llvm.ptr) -> () + llvm.return +} +``` + +**Lifetime.** The second element of the unranked memref descriptor points to +some memory in which the ranked memref descriptor is stored. By convention, this +memory is allocated on stack and has the lifetime of the function. (*Note:* due +to function-length lifetime, creation of multiple unranked memref descriptors, +e.g., in a loop, may lead to stack overflows.) If an unranked descriptor has to +be returned from a function, the ranked descriptor it points to is copied into +dynamically allocated memory, and the pointer in the unranked descriptor is +updated accordingly.
The allocation happens immediately before returning. It is +the responsibility of the caller to free the dynamically allocated memory. The +default conversion of `std.call` and `std.call_indirect` copies the ranked +descriptor to newly allocated memory on the caller's stack. Thus, the convention +of the ranked memref descriptor pointed to by an unranked memref descriptor +being stored on the stack is respected. + +#### Bare Pointer Calling Convention for Ranked MemRef + +The "bare pointer" calling convention converts `memref`-typed function arguments +to a *single* pointer to the aligned data. Note that this does *not* apply to +uses of `memref` outside of function signatures, where the default descriptor +structures are still used. This convention further restricts the supported cases +to the following. + +- `memref` types with default layout. +- `memref` types with all dimensions statically known. +- `memref` values allocated in such a way that the allocated and aligned + pointers match. Alternatively, the same function must handle allocation and + deallocation since only one pointer is passed to any callee. + +Examples: + +``` +func @callee(memref<2x4xf32>) + +func @caller(%0 : memref<2x4xf32>) { + call @callee(%0) : (memref<2x4xf32>) -> () +} + +// -> + +!descriptor = !llvm.struct<(ptr, ptr, i64, + array<2xi64>, array<2xi64>)> + +llvm.func @callee(!llvm.ptr) + +llvm.func @caller(%arg0: !llvm.ptr) { + // A descriptor value is defined at the function entry point. + %0 = llvm.mlir.undef : !descriptor + + // Both the allocated and aligned pointer are set up to the same value. + %1 = llvm.insertvalue %arg0, %0[0] : !descriptor + %2 = llvm.insertvalue %arg0, %1[1] : !descriptor + + // The offset is set up to zero. + %3 = llvm.mlir.constant(0 : index) : i64 + %4 = llvm.insertvalue %3, %2[2] : !descriptor + + // The sizes and strides are derived from the statically known values.
+ %5 = llvm.mlir.constant(2 : index) : i64 + %6 = llvm.mlir.constant(4 : index) : i64 + %7 = llvm.insertvalue %5, %4[3, 0] : !descriptor + %8 = llvm.insertvalue %6, %7[3, 1] : !descriptor + %9 = llvm.mlir.constant(1 : index) : i64 + %10 = llvm.insertvalue %6, %8[4, 0] : !descriptor + %11 = llvm.insertvalue %9, %10[4, 1] : !descriptor + + // The function call corresponds to extracting the aligned data pointer. + %12 = llvm.extractvalue %11[1] : !descriptor + llvm.call @callee(%12) : (!llvm.ptr) -> () +} +``` + +#### Bare Pointer Calling Convention For Unranked MemRef + +The "bare pointer" calling convention does not support unranked memrefs as their +shape cannot be known at compile time. + +### C-compatible wrapper emission + +In practical cases, it may be desirable to have externally-facing functions with +a single argument corresponding to each MemRef argument. When interfacing with +LLVM IR produced from C, the code needs to respect the corresponding calling +convention. The conversion to the LLVM dialect provides an option to generate +wrapper functions that take memref descriptors as pointers-to-struct compatible +with data types produced by Clang when compiling C sources. The generation of +such wrapper functions can additionally be controlled at a function granularity +by setting the `llvm.emit_c_interface` unit attribute. + +More specifically, a memref argument is converted into a pointer-to-struct +argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function, where +`T` is the converted element type and `N` is the memref rank. This type is +compatible with that produced by Clang for the following C++ structure template +instantiations or their equivalents in C. + +```cpp +template <typename T, size_t N> +struct MemRefDescriptor { + T *allocated; + T *aligned; + intptr_t offset; + intptr_t sizes[N]; + intptr_t strides[N]; +}; +``` + +Furthermore, we also rewrite function results to pointer parameters if the +rewritten function result has a struct type.
The special result parameter is +added as the first parameter and is of pointer-to-struct type. + +If enabled, the option will do the following. For *external* functions declared +in the MLIR module. + +1. Declare a new function `_mlir_ciface_` where memref arguments + are converted to pointer-to-struct and the remaining arguments are converted + as usual. Results are converted to a special argument if they are of struct + type. +2. Add a body to the original function (making it non-external) that + 1. allocates memref descriptors, + 2. populates them, + 3. potentially allocates space for the result struct, and + 4. passes the pointers to these into the newly declared interface function, + then + 5. collects the result of the call (potentially from the result struct), + and + 6. returns it to the caller. + +For (non-external) functions defined in the MLIR module. + +1. Define a new function `_mlir_ciface_` where memref arguments + are converted to pointer-to-struct and the remaining arguments are converted + as usual. Results are converted to a special argument if they are of struct + type. +2. Populate the body of the newly defined function with IR that + 1. loads descriptors from pointers; + 2. unpacks descriptor into individual non-aggregate values; + 3. passes these values into the original function; + 4. collects the results of the call and + 5. either copies the results into the result struct or returns them to the + caller. + +Examples: + +```mlir + +func @qux(%arg0: memref) + +// Gets converted into the following +// (using type alias for brevity): +!llvm.memref_2d = type !llvm.struct<(ptr, ptr, i64, + array<2xi64>, array<2xi64>)> + +// Function with unpacked arguments. +llvm.func @qux(%arg0: !llvm.ptr, %arg1: !llvm.ptr, + %arg2: i64, %arg3: i64, %arg4: i64, + %arg5: i64, %arg6: i64) { + // Populate memref descriptor (as per calling convention). 
+ %0 = llvm.mlir.undef : !llvm.memref_2d + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d + + // Store the descriptor in a stack-allocated space. + %8 = llvm.mlir.constant(1 : index) : i64 + %9 = llvm.alloca %8 x !llvm.memref_2d + : (i64) -> !llvm.ptr, ptr, i64, + array<2xi64>, array<2xi64>)>> + llvm.store %7, %9 : !llvm.ptr, ptr, i64, + array<2xi64>, array<2xi64>)>> + + // Call the interface function. + llvm.call @_mlir_ciface_qux(%9) + : (!llvm.ptr, ptr, i64, + array<2xi64>, array<2xi64>)>>) -> () + + // The stored descriptor will be freed on return. + llvm.return +} + +// Interface function. +llvm.func @_mlir_ciface_qux(!llvm.ptr, ptr, i64, + array<2xi64>, array<2xi64>)>>) +``` + +```mlir +func @foo(%arg0: memref) { + return +} + +// Gets converted into the following +// (using type alias for brevity): +!llvm.memref_2d = type !llvm.struct<(ptr, ptr, i64, + array<2xi64>, array<2xi64>)> +!llvm.memref_2d_ptr = type !llvm.ptr, ptr, i64, + array<2xi64>, array<2xi64>)>> + +// Function with unpacked arguments. +llvm.func @foo(%arg0: !llvm.ptr, %arg1: !llvm.ptr, + %arg2: i64, %arg3: i64, %arg4: i64, + %arg5: i64, %arg6: i64) { + llvm.return +} + +// Interface function callable from C. +llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr) { + // Load the descriptor. + %0 = llvm.load %arg0 : !llvm.memref_2d_ptr + + // Unpack the descriptor as per calling convention. 
+ %1 = llvm.extractvalue %0[0] : !llvm.memref_2d + %2 = llvm.extractvalue %0[1] : !llvm.memref_2d + %3 = llvm.extractvalue %0[2] : !llvm.memref_2d + %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d + %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d + %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d + %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d + llvm.call @foo(%1, %2, %3, %4, %5, %6, %7) + : (!llvm.ptr, !llvm.ptr, i64, i64, i64, + i64, i64) -> () + llvm.return +} +``` + +```mlir +func @foo(%arg0: memref) -> memref { + return %arg0 : memref +} + +// Gets converted into the following +// (using type alias for brevity): +!llvm.memref_2d = type !llvm.struct<(ptr, ptr, i64, + array<2xi64>, array<2xi64>)> +!llvm.memref_2d_ptr = type !llvm.ptr, ptr, i64, + array<2xi64>, array<2xi64>)>> + +// Function with unpacked arguments. +llvm.func @foo(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, + %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64) + -> !llvm.memref_2d { + %0 = llvm.mlir.undef : !llvm.memref_2d + %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d + %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d + %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d + %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d + %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d + %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d + %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d + llvm.return %7 : !llvm.memref_2d +} + +// Interface function callable from C. 
+llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr, %arg1: !llvm.memref_2d_ptr) { + %0 = llvm.load %arg1 : !llvm.memref_2d_ptr + %1 = llvm.extractvalue %0[0] : !llvm.memref_2d + %2 = llvm.extractvalue %0[1] : !llvm.memref_2d + %3 = llvm.extractvalue %0[2] : !llvm.memref_2d + %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d + %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d + %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d + %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d + %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7) + : (!llvm.ptr, !llvm.ptr, i64, i64, i64, i64, i64) -> !llvm.memref_2d + llvm.store %8, %arg0 : !llvm.memref_2d_ptr + llvm.return +} +``` + +Rationale: Introducing auxiliary functions for C-compatible interfaces is +preferred to modifying the calling convention since it will minimize the effect +of C compatibility on intra-module calls or calls between MLIR-generated +functions. In particular, when calling external functions from an MLIR module in +a (parallel) loop, the fact of storing a memref descriptor on stack can lead to +stack exhaustion and/or concurrent access to the same address. Auxiliary +interface function serves as an allocation scope in this case. Furthermore, when +targeting accelerators with separate memory spaces such as GPUs, stack-allocated +descriptors passed by pointer would have to be transferred to the device memory, +which introduces significant overhead. In such situations, auxiliary interface +functions are executed on host and only pass the values through device function +invocation mechanism. + +### Address Computation + +Accesses to a memref element are transformed into an access to an element of the +buffer pointed to by the descriptor. The position of the element in the buffer +is calculated by linearizing memref indices in row-major order (lexically first +index is the slowest varying, similar to C, but accounting for strides). 
The +computation of the linear address is emitted as arithmetic operations in the LLVM +dialect. Strides are extracted from the memref descriptor. + +Examples: + +An access to a memref with indices: + +```mlir +%0 = load %m[%1,%2,%3,%4] : memref +``` + +is transformed into the equivalent of the following code: + +```mlir +// Compute the linearized index from strides. +// When strides or, in absence of explicit strides, the corresponding sizes are +// dynamic, extract the stride value from the descriptor. +%stride1 = llvm.extractvalue[4, 0] : !llvm.struct<(ptr, ptr, i64, + array<4xi64>, array<4xi64>)> +%addr1 = muli %stride1, %1 : i64 + +// When the stride or, in absence of explicit strides, the trailing sizes are +// known statically, this value is used as a constant. The natural value of +// strides is the product of all sizes following the current dimension. +%stride2 = llvm.mlir.constant(32 : index) : i64 +%addr2 = muli %stride2, %2 : i64 +%addr3 = addi %addr1, %addr2 : i64 + +%stride3 = llvm.mlir.constant(8 : index) : i64 +%addr4 = muli %stride3, %3 : i64 +%addr5 = addi %addr3, %addr4 : i64 + +// Multiplication with the known unit stride can be omitted. +%addr6 = addi %addr5, %4 : i64 + +// If the linear offset is known to be zero, it can also be omitted. If it is +// dynamic, it is extracted from the descriptor. +%offset = llvm.extractvalue[2] : !llvm.struct<(ptr, ptr, i64, + array<4xi64>, array<4xi64>)> +%addr7 = addi %addr6, %offset : i64 + +// All accesses are based on the aligned pointer. +%aligned = llvm.extractvalue[1] : !llvm.struct<(ptr, ptr, i64, + array<4xi64>, array<4xi64>)> + +// Get the address of the data pointer. +%ptr = llvm.getelementptr %aligned[%addr7] + : !llvm.struct<(ptr, ptr, i64, array<4xi64>, array<4xi64>)> + -> !llvm.ptr + +// Perform the actual load. +%0 = llvm.load %ptr : !llvm.ptr +``` + +For stores, the address computation code is identical and only the actual store +operation is different.
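The same linearization can be written down on the host side. The following C++ is a minimal sketch, not part of the conversion itself: it reuses the `MemRefDescriptor` layout from the C-compatible wrapper section, and the helper names `linearIndex` and `elementAt` are hypothetical, introduced only for illustration. It computes `offset + sum(index[i] * stride[i])` and addresses the element relative to the aligned pointer, which mirrors the arithmetic emitted for a load or a store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Descriptor layout as described in the C-compatible wrapper section.
template <typename T, std::size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  std::intptr_t offset;
  std::intptr_t sizes[N];
  std::intptr_t strides[N];
};

// Hypothetical helper: linearized element position, counted in elements
// from the aligned pointer: offset + sum over i of indices[i] * strides[i].
template <typename T, std::size_t N>
std::intptr_t linearIndex(const MemRefDescriptor<T, N> &desc,
                          const std::intptr_t (&indices)[N]) {
  std::intptr_t linear = desc.offset;
  for (std::size_t i = 0; i < N; ++i) {
    // Each index must lie within the size of its dimension.
    assert(indices[i] >= 0 && indices[i] < desc.sizes[i]);
    linear += indices[i] * desc.strides[i];
  }
  return linear;
}

// Hypothetical helper: reference to the addressed element. All accesses
// are based on the aligned pointer, never on the allocated one.
template <typename T, std::size_t N>
T &elementAt(const MemRefDescriptor<T, N> &desc,
             const std::intptr_t (&indices)[N]) {
  return desc.aligned[linearIndex(desc, indices)];
}
```

For a contiguous row-major `2x3x4x8` buffer the strides are `{96, 32, 8, 1}`, so with a zero offset the indices `{1, 2, 3, 4}` select element `1*96 + 2*32 + 3*8 + 4 = 188`.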
+ +Note: the conversion does not perform any sort of common subexpression +elimination when emitting memref accesses. + +### Utility Classes + +Utility classes common to many conversions to the LLVM dialect can be found +under `lib/Conversion/LLVMCommon`. They include the following. + +- `LLVMConversionTarget` specifies all LLVM dialect operations as legal. +- `LLVMTypeConverter` implements the default type conversion as described + above. +- `ConvertOpToLLVMPattern` extends the conversion pattern class with LLVM + dialect-specific functionality. +- `VectorConvertOpToLLVMPattern` extends the previous class to automatically + unroll operations on higher-dimensional vectors into lists of operations on + one-dimensional vectors before converting them. +- `StructBuilder` provides a convenient API for building IR that creates or + accesses values of LLVM dialect structure types; it is derived by + `MemRefDescriptor`, `UnrankedMemRefDescriptor` and `ComplexBuilder` for the + built-in types convertible to LLVM dialect structure types. + +## Translation to LLVM IR + +MLIR modules containing `llvm.func`, `llvm.mlir.global` and `llvm.metadata` +operations can be translated to LLVM IR modules using the following scheme. + +- Module-level globals are translated to LLVM IR global values. +- Module-level metadata are translated to LLVM IR metadata, which can be later + augmented with additional metadata defined on specific ops. +- All functions are declared in the module so that they can be referenced. +- Each function is then translated separately and has access to the complete + mappings between MLIR and LLVM IR globals, metadata, and functions. +- Within a function, blocks are traversed in topological order and translated + to LLVM IR basic blocks. In each basic block, PHI nodes are created for each + of the block arguments, but not connected to their source blocks. +- Within each block, operations are translated in their order.
Each operation + has access to the same mappings as the function and additionally to the + mapping of values between MLIR and LLVM IR, including PHI nodes. Operations + with regions are responsible for translating the regions they contain. +- After operations in a function are translated, the PHI nodes of blocks in + this function are connected to their source values, which are now available. + +The translation mechanism provides extension hooks for translating custom +operations to LLVM IR via a dialect interface `LLVMTranslationDialectInterface`: + +- `convertOperation` translates an operation that belongs to the current + dialect to LLVM IR given an `IRBuilderBase` and various mappings; +- `amendOperation` performs additional actions on an operation if it contains + a dialect attribute that belongs to the current dialect, for example sets up + instruction-level metadata. + +Dialects containing operations or attributes that want to be translated to LLVM +IR must provide an implementation of this interface and register it with the +system. Note that registration may happen without creating the dialect, for +example, in a separate library to avoid the need for the "main" dialect library +to depend on LLVM IR libraries. The implementations of these methods may use +the +[`ModuleTranslation`](https://mlir.llvm.org/doxygen/classmlir_1_1LLVM_1_1ModuleTranslation.html) +object provided to them, which holds the state of the translation and contains +numerous utilities. + +Note that this extension mechanism is *intentionally restrictive*. LLVM IR has a +small, relatively stable set of instructions and types that MLIR intends to +model fully. Therefore, the extension mechanism is provided only for LLVM IR +constructs that are more often extended -- intrinsics and metadata. The primary +goal of the extension mechanism is to support sets of intrinsics, for example +those representing a particular instruction set.
The extension mechanism does +not allow for customizing type or block translation, nor does it support custom +module-level operations. Such transformations should be performed within MLIR +and target the corresponding MLIR constructs. + +## Translation from LLVM IR + +An experimental flow allows one to import a substantially limited subset of LLVM +IR into MLIR, producing LLVM dialect operations. + +``` + mlir-translate -import-llvm filename.ll +```