diff --git a/mlir/docs/ConversionToLLVMDialect.md b/mlir/docs/ConversionToLLVMDialect.md --- a/mlir/docs/ConversionToLLVMDialect.md +++ b/mlir/docs/ConversionToLLVMDialect.md @@ -1,16 +1,19 @@ # Conversion to the LLVM Dialect -Conversion from the Standard to the [LLVM Dialect](Dialects/LLVM.md) can be -performed by the specialized dialect conversion pass by running: +Conversion from several dialects that rely on +[built-in types](LangRef.md#builtin-types) to the +[LLVM Dialect](Dialects/LLVM.md) is expected to be performed through the +[Dialect Conversion](DialectConversion.md) infrastructure. -```shell -mlir-opt -convert-std-to-llvm -``` +The conversion of types and that of the overall module structure is described in +this document. Individual conversion passes provide a set of conversion patterns +for ops in different dialects, such as `-convert-std-to-llvm` for ops in the +[Standard dialect](Dialects/Standard.md) and `-convert-vector-to-llvm` in the +[Vector dialect](Dialects/Vector.md). *Note that some conversions subsume the +others.* -It performs type and operation conversions for a subset of operations from -standard dialect (operations on scalars and vectors, control flow operations) as -described in this document. We use the terminology defined by the -[LLVM IR Dialect description](Dialects/LLVM.md) throughout this document. +We use the terminology defined by the +[LLVM Dialect description](Dialects/LLVM.md) throughout this document. [TOC] @@ -22,19 +25,19 @@ following conversions are currently implemented: - `i*` converts to `!llvm.i*` +- `bf16` converts to `!llvm.bfloat` - `f16` converts to `!llvm.half` - `f32` converts to `!llvm.float` - `f64` converts to `!llvm.double` -Note: `bf16` type is not supported by LLVM IR and cannot be converted. 
- ### Index Type -Index type is converted to a wrapped LLVM IR integer with bitwidth equal to the -bitwidth of the pointer size as specified by the -[data layout](https://llvm.org/docs/LangRef.html#data-layout) of the LLVM module -[contained](Dialects/LLVM.md#context-and-module-association) in the LLVM Dialect -object. For example, on x86-64 CPUs it converts to `!llvm.i64`. +Index type is converted to an LLVM dialect integer type with bitwidth equal to +the bitwidth of the pointer size as specified by the +[data layout](Dialects/LLVM.md#data-layout-and-triple) of the closest module. +For example, on x86-64 CPUs it converts to `!llvm.i64`. This behavior can be +overridden by the type converter configuration, which is often exposed as a pass +option by conversion passes. ### Vector Types @@ -45,31 +48,54 @@ n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types of one-dimensional vectors. -For example, `vector<4 x f32>` converts to `!llvm<"<4 x float>">` and `vector<4 -x 8 x 16 x f32>` converts to `!llvm<"[4 x [8 x <16 x float>]]">`. +For example, `vector<4 x f32>` converts to `!llvm.vec<4 x float>` and `vector<4 +x 8 x 16 x f32>` converts to `!llvm.array<4 x array<8 x vec<16 x float>>>`. -### Memref Types +### Ranked Memref Types Memref types in MLIR have both static and dynamic information associated with -them. The dynamic information comprises the buffer pointer as well as sizes and +them. In the general case, the dynamic information describes dynamic sizes in +the logical indexing space and any symbols bound to the memref. This dynamic +information must be present at runtime in the LLVM dialect equivalent type. + +In practice, the conversion supports two conventions: + +- the default convention for memrefs in the + **[strided form](LangRef.md#strided-memref)**; +- a "bare pointer" conversion for statically-shaped memrefs with default + layout. 
+ +The choice between conventions is specified at type converter construction time +and is often exposed as an option by conversion passes. + +Memrefs with arbitrary layouts are not supported. Instead, these layouts can be +factored out of the type and used as part of index computation for operations +that read and write into a memref with the default layout. + +#### Default Convention + +The dynamic information comprises the buffer pointer as well as sizes and strides of any dynamically-sized dimensions. Memref types are normalized and -converted to a descriptor that is only dependent on the rank of the memref. The -descriptor contains: - -1. the pointer to the data buffer, followed by -2. the pointer to properly aligned data payload that the memref indexes, - followed by -3. a lowered `index`-type integer containing the distance between the beginning - of the buffer and the first element to be accessed through the memref, - followed by -4. an array containing as many `index`-type integers as the rank of the memref: - the array represents the size, in number of elements, of the memref along - the given dimension. For constant MemRef dimensions, the corresponding size - entry is a constant whose runtime value must match the static value, - followed by -5. a second array containing as many 64-bit integers as the rank of the MemRef: - the second array represents the "stride" (in tensor abstraction sense), i.e. - the number of consecutive elements of the underlying buffer. +converted to a _descriptor_ that is only dependent on the rank of the memref. +The descriptor contains the following fields in order. + +1. The pointer to the data buffer as allocated, referred to as "allocated + pointer". This is only useful for deallocating the memref. +2. The pointer to the properly aligned data payload that the memref indexes, + referred to as "aligned pointer". +3.
A converted `index`-type integer containing the distance in number + of elements between the beginning of the (aligned) buffer and the first + element to be accessed through the memref, referred to as "offset". +4. An array containing as many converted `index`-type integers as the rank of + the memref: the array represents the size, in number of elements, of the + memref along the given dimension. For constant memref dimensions, the + corresponding size entry is a constant whose runtime value must match the + static value. +5. A second array containing as many converted `index`-type integers as the + rank of the memref: the second array represents the "stride" (in tensor + abstraction sense), i.e. the number of consecutive elements of the + underlying buffer one needs to jump over to get to the next logically + indexed element. For constant memref dimensions, the corresponding size entry is a constant whose runtime value matches the static value. This normalization serves as an ABI for @@ -80,125 +106,187 @@ Examples: ```mlir -memref<f32> -> !llvm<"{ float*, float*, i64 }"> -memref<1 x f32> -> !llvm<"{ float*, float*, i64, [1 x i64], [1 x i64] }"> -memref<? x f32> -> !llvm<"{ float*, float*, i64, [1 x i64], [1 x i64] }"> -memref<10x42x42x43x123 x f32> -> !llvm<"{ float*, float*, i64, [5 x i64], [5 x i64] }"> -memref<10x?x42x?x123 x f32> -> !llvm<"{ float*, float*, i64, [5 x i64], [5 x i64] }"> +memref<f32> -> !llvm.struct<(ptr<float>, ptr<float>, i64)> +memref<1 x f32> -> !llvm.struct<(ptr<float>, ptr<float>, i64, + array<1 x i64>, array<1 x i64>)> +memref<? x f32> -> !llvm.struct<(ptr<float>, ptr<float>, i64, + array<1 x i64>, array<1 x i64>)> +memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr<float>, ptr<float>, i64, + array<5 x i64>, array<5 x i64>)> +memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr<float>, ptr<float>, i64, + array<5 x i64>, array<5 x i64>)> // Memref types can have vectors as element types -memref<1x? x vector<4xf32>> -> !llvm<"{ <4 x float>*, <4 x float>*, i64, [1 x i64], [1 x i64] }"> +memref<1x?
x vector<4xf32>> -> !llvm.struct<(ptr<vec<4 x float>>, + ptr<vec<4 x float>>, i64, + array<1 x i64>, array<1 x i64>)> ``` -If the rank of the memref is unknown at compile time, the memref is converted to -an unranked descriptor that contains: - -1. a 64-bit integer representing the dynamic rank of the memref, followed by -2. a pointer to a ranked memref descriptor with the contents listed above. +#### Bare Pointer Convention -Dynamic ranked memrefs should be used only to pass arguments to external library -calls that expect a unified memref type. The called functions can parse any -unranked memref descriptor by reading the rank and parsing the enclosed ranked -descriptor pointer. +Ranked memrefs with static shape and default layout can be converted into an +LLVM dialect pointer to their element type. Only the default alignment is +supported in such cases, e.g. the `alloc` operation cannot have an alignment +attribute. Examples: ```mlir -// unranked descriptor -memref<*xf32> -> !llvm<"{i64, i8*}"> +memref<f32> -> !llvm.ptr<float> +memref<10x42 x f32> -> !llvm.ptr<float> + +// Memrefs with vector types are also supported. +memref<10x42 x vector<4xf32>> -> !llvm.ptr<vec<4 x float>> ``` -**In function signatures,** `memref` is passed as a _pointer_ to the structured -defined above to comply with the calling convention. +### Unranked Memref Types -Example: +Unranked memrefs are converted to an unranked descriptor that contains: + +1. a converted `index`-typed integer representing the dynamic rank of the + memref; +2. a type-erased pointer (`!llvm.ptr<i8>`) to a ranked memref descriptor with + the contents listed above. + +This descriptor is primarily intended for interfacing with rank-polymorphic +library functions. The pointer to the ranked memref descriptor points to memory +_allocated on the stack_ of the function in which it is used.
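
As a rough illustration of how a rank-polymorphic library function might consume such a descriptor, the sketch below mirrors both layouts with Python's `ctypes`. It assumes a 64-bit converted `index` type and `f32` elements; the names `ranked_descriptor_type`, `UnrankedDescriptor`, and `read_sizes` are hypothetical and not part of MLIR.

```python
import ctypes

INDEX = ctypes.c_int64  # assumption: `index` converts to a 64-bit integer


def ranked_descriptor_type(rank, elem=ctypes.c_float):
    """Build a ctypes struct mirroring the default-convention descriptor."""
    fields = [
        ("allocated", ctypes.POINTER(elem)),  # allocated pointer
        ("aligned", ctypes.POINTER(elem)),    # aligned pointer
        ("offset", INDEX),                    # offset in number of elements
    ]
    if rank > 0:  # rank-0 memrefs carry no size or stride arrays
        fields += [("sizes", INDEX * rank), ("strides", INDEX * rank)]

    class RankedDescriptor(ctypes.Structure):
        _fields_ = fields

    return RankedDescriptor


class UnrankedDescriptor(ctypes.Structure):
    """Mirror of the unranked descriptor: rank plus a type-erased pointer."""
    _fields_ = [("rank", INDEX), ("descriptor", ctypes.c_void_p)]


def read_sizes(unranked, elem=ctypes.c_float):
    """Read the rank first, then reinterpret the erased pointer accordingly."""
    ranked_type = ranked_descriptor_type(unranked.rank, elem)
    ranked = ctypes.cast(unranked.descriptor, ctypes.POINTER(ranked_type)).contents
    return list(ranked.sizes) if unranked.rank > 0 else []


# Usage: wrap a 3x4 descriptor and recover its sizes through the unranked view.
desc = ranked_descriptor_type(2)()
desc.sizes[0], desc.sizes[1] = 3, 4
desc.strides[0], desc.strides[1] = 4, 1
wrapped = UnrankedDescriptor(rank=2, descriptor=ctypes.addressof(desc))
print(read_sizes(wrapped))
```

The sketch keeps `desc` alive for as long as `wrapped` is read. In converted code the ranked descriptor lives on the caller's stack, so a callee would presumably need to avoid retaining the pointer past the call.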
+ +Note that stack allocations may be emitted at a location where the unranked +memref first appears, e.g., a cast operation, and remain live throughout the +lifetime of the function; this may lead to stack exhaustion if used in a loop. + +Examples: ```mlir -// A function type with memref as argument -(memref<? x f32>) -> () -// is transformed into the LLVM function with pointer-to-structure argument. -!llvm<"void({ float*, float*, i64, [1 x i64], [1 x i64]}*) "> +// Unranked descriptor. +memref<*xf32> -> !llvm.struct<(i64, ptr<i8>)> ``` +The bare pointer convention does not support unranked memrefs. + ### Function Types -Function types get converted to LLVM function types. The arguments are converted -individually according to these rules. The result types need to accommodate the -fact that LLVM IR functions always have a return type, which may be a Void type. -The converted function always has a single result type. If the original function -type had no results, the converted function will have one result of the wrapped -`void` type. If the original function type had one result, the converted -function will also have one result converted using these rules. Otherwise, the result -type will be a wrapped LLVM IR structure type where each element of the -structure corresponds to one of the results of the original function, converted -using these rules. In high-order functions, function-typed arguments and results -are converted to a wrapped LLVM IR function pointer type (since LLVM IR does not -allow passing functions to functions without indirection) with the pointee type -converted using these rules. +Function types get converted to LLVM dialect function types. The arguments are +converted individually according to these rules, except for `memref` types in +function arguments and higher-order functions, which are described below. The +result types need to accommodate the fact that LLVM functions always have a +return type, which may be an `!llvm.void` type.
The converted function always +has a single result type. If the original function type had no results, the +converted function will have one result of the `!llvm.void` type. If the +original function type had one result, the converted function will also have one +result converted using these rules. Otherwise, the result type will be an LLVM +dialect structure type where each element of the structure corresponds to one of +the results of the original function, converted using these rules. Examples: ```mlir -// zero-ary function type with no results. +// Zero-ary function type with no results: () -> () -// is converted to a zero-ary function with `void` result -!llvm<"void ()"> +// is converted to a zero-ary function with `void` result. +!llvm.func<void ()> -// unary function with one result +// Unary function with one result: (i32) -> (i64) -// has its argument and result type converted, before creating the LLVM IR function type -!llvm<"i64 (i32)"> +// has its argument and result type converted, before creating the LLVM dialect +// function type. +!llvm.func<i64 (i32)> -// binary function with one result +// Binary function with one result: (i32, f32) -> (i64) // has its arguments handled separately -!llvm<"i64 (i32, float)"> +!llvm.func<i64 (i32, float)> -// binary function with two results +// Binary function with two results: (i32, f32) -> (i64, f64) -// has its result aggregated into a structure type -!llvm<"{i64, double} (i32, f32)"> +// has its result aggregated into a structure type. +!llvm.func<struct<(i64, double)> (i32, float)> +``` + +#### Functions as Function Arguments or Results -// function-typed arguments or results in higher-order functions +Higher-order function types, i.e. types of functions that have other functions as +arguments or results, are converted differently to accommodate the fact that +LLVM IR does not allow for function-typed values. Instead, functions are +expected to be passed into and returned from other functions _by pointer_.
+Therefore, function-typed function arguments and results are converted to the +pointer-to-function type. The pointee type is converted using these rules. + +Examples: + +```mlir +// Function-typed arguments or results in higher-order functions: (() -> ()) -> (() -> ()) -// are converted into pointers to functions -!llvm<"void ()* (void ()*)"> +// are converted into pointers to functions. +!llvm.func<ptr<func<void ()>> (ptr<func<void ()>>)> + +// These rules apply recursively: a function type taking a function that takes +// another function +( ( (i32) -> (i64) ) -> () ) -> () +// is converted into a function type taking a pointer-to-function that takes +// another pointer-to-function. +!llvm.func<void (ptr<func<void (ptr<func<i64 (i32)>>)>>)> ``` -## Calling Convention +#### Memrefs as Function Arguments -### Function Signature Conversion +When used as function arguments, both ranked and unranked memrefs are converted +into a list of arguments that represents each _scalar_ component of their +descriptor. This is intended for some compatibility with the C ABI, in which +structure types would need to be passed by pointer, leading to the need for +allocations and related issues, as well as for aliasing annotations, which are +currently attached to pointers in function arguments. Having scalar components +means that each size and stride is passed as an individual value. -LLVM IR functions are defined by a custom operation. The function itself has a -wrapped LLVM IR function type converted as described above. The function -definition operation uses MLIR syntax. +When used as function results, memrefs are converted as usual, i.e. each memref +is converted to a descriptor struct (default convention) or to a pointer (bare +pointer convention). Examples: ```mlir -// zero-ary function type with no results. -func @foo() -> () -// gets LLVM type void(). -llvm.func @foo() -> () - -// function with one result -func @bar(i32) -> (i64) -// gets converted to LLVM type i64(i32).
-func @bar(!llvm.i32) -> !llvm.i64 - -// function with two results -func @qux(i32, f32) -> (i64, f64) -// has its result aggregated into a structure type -func @qux(!llvm.i32, !llvm.float) -> !llvm<"{i64, double}"> - -// function-typed arguments or results in higher-order functions -func @quux(() -> ()) -> (() -> ()) -// are converted into pointers to functions -func @quux(!llvm<"void ()*">) -> !llvm<"void ()*"> -// the call flow is handled by the LLVM dialect `call` operation supporting both -// direct and indirect calls +// A memref descriptor appearing as function argument: +(memref<f32>) -> () +// gets converted into a list of individual scalar components of a descriptor. +!llvm.func<void (ptr<float>, ptr<float>, i64)> + +// The list of arguments is linearized and one can freely mix memref and other +// types in this list: +(memref<f32>, f32) -> () +// which gets converted into a flat list. +!llvm.func<void (ptr<float>, ptr<float>, i64, float)> + +// For nD ranked memref descriptors: +(memref<?x?xf32>) -> () +// the converted signature will contain 2n+1 `index`-typed integer arguments, +// offset, n sizes and n strides, per memref argument type. +!llvm.func<void (ptr<float>, ptr<float>, i64, i64, i64, i64, i64)> + +// Same rules apply to unranked descriptors: +(memref<*xf32>) -> () +// which get converted into their components. +!llvm.func<void (i64, ptr<i8>)> + +// However, returning a memref from a function is not affected: +() -> (memref<?xf32>) +// gets converted to a function returning a descriptor structure. +!llvm.func<struct<(ptr<float>, ptr<float>, i64, array<1xi64>, array<1xi64>)> ()> + +// If multiple memref-typed results are returned: +() -> (memref<f32>, memref<f64>) +// their descriptor structures are additionally packed into another structure, +// potentially with other non-memref typed results. +!llvm.func<struct<(struct<(ptr<float>, ptr<float>, i64)>, + struct<(ptr<double>, ptr<double>, i64)>)> ()> ``` +## Calling Convention for Standard Calls + + + ### Result Packing In case of multi-result functions, the returned values are inserted into a