This is an archive of the discontinued LLVM Phabricator instance.

Differential D110459

[MLIR] Improve calling convention for unranked memory descriptor results.
AbandonedPublic

Authored by frgossen on Sep 24 2021, 4:50 PM.

Details

Reviewers

ftynse
herhut
mehdi_amini
dcaballe

Summary

The size of an unranked memory descriptor result is not statically known because
its inner descriptor has dynamic rank. Returning such descriptors from
functions generally requires dynamic memory allocation, which involves calls to
malloc and can be expensive. To circumvent this problem, we allocate
buffers on the stack that are big enough to hold the inner descriptors
up to some supported rank (max-unranked-desc-buffer-rank). If the rank of the
unranked descriptor does not exceed this limit, we can always copy it to
stack-allocated memory and avoid heap allocation entirely. Otherwise, if the
rank of the returned descriptor is too big for the pre-allocated buffer, we fall
back to dynamic memory allocation. This is an optimization similar to the
implementation of llvm::SmallVector.
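
The scheme in the summary can be sketched in plain C++. This is only an illustration, not the code of the patch: `kMaxRankOnStack`, `UnrankedResult`, `returnDescriptor`, and `needsFree` are hypothetical names, and the layout assumed for the inner descriptor is the densely packed `{ ptr, ptr, index, index[rank], index[rank] }` form with a 64-bit index. The caller hands the callee a buffer sized for the maximum supported rank, and the callee falls back to `malloc` only when the rank exceeds it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical constant mirroring the default of max-unranked-desc-buffer-rank.
constexpr int64_t kMaxRankOnStack = 8;

// Bytes needed for an inner descriptor of the given rank, assuming a densely
// packed { ptr, ptr, index, index[rank], index[rank] } layout (an assumption).
constexpr std::size_t innerDescriptorSize(int64_t rank) {
  return 2 * sizeof(void *) +
         (1 + 2 * static_cast<std::size_t>(rank)) * sizeof(int64_t);
}

// Hypothetical stand-in for an unranked descriptor: a rank and an opaque pointer.
struct UnrankedResult {
  int64_t rank;
  void *descriptor; // points into the caller's buffer or to malloc'ed memory
};

// Callee side: copy the inner descriptor into the caller-provided buffer when
// the rank is small enough, otherwise into freshly heap-allocated memory.
UnrankedResult returnDescriptor(const void *inner, int64_t rank,
                                void *callerBuffer) {
  std::size_t bytes = innerDescriptorSize(rank);
  void *dst = rank <= kMaxRankOnStack ? callerBuffer : std::malloc(bytes);
  assert(dst != nullptr && "no destination for the inner descriptor");
  std::memcpy(dst, inner, bytes);
  return {rank, dst};
}

// Caller side: only results that exceeded the buffer live on the heap.
bool needsFree(const UnrankedResult &result) {
  return result.rank > kMaxRankOnStack;
}
```

Note that both sides consult the same `kMaxRankOnStack` here, which is exactly why the two lowerings have to agree on the setting.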

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

frgossen created this revision. Sep 24 2021, 4:50 PM

Herald added a reviewer: ftynse. Sep 24 2021, 4:50 PM

Herald added subscribers: wenzhicui, wrengr, Chia-hungDuan and 20 others.

frgossen requested review of this revision. Sep 24 2021, 4:50 PM

Herald added a project: Restricted Project. Sep 24 2021, 4:50 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

frgossen added a reviewer: herhut. Sep 24 2021, 4:50 PM

Harbormaster completed remote builds in B125673: Diff 374997. Sep 24 2021, 4:59 PM

bondhugula added a subscriber: bondhugula. Sep 24 2021, 6:07 PM

bondhugula added inline comments.

mlir/include/mlir/Conversion/LLVMCommon/LoweringOptions.h
37	Doc comment here.
mlir/lib/Conversion/LLVMCommon/Pattern.cpp
250	unsigned
271–276	I'm trying to understand the impact here when such descriptors are allocated on the stack inside loops: wouldn't one typically run out of stack space? (E.g., call ops inside loops.)

That's a great optimization!
Something that wasn't clear to me from the description was that this does not put a hard limit on the maximum rank: it is really just the same behavior as SmallVector. Can you maybe mention this in the description?

Something I'm not sure about yet is the impact on the calling convention: what you're doing here seems to make this lowering option part of the calling convention, which I think is risky and fragile. This is a nice optimization for private functions maybe, but unless we lower all the call sites and the function together in the same unit, we risk these getting out of sync.

What may be possible would be to pass the inline size as a separate argument. That way the lowering of the called function becomes entirely independent from the lowering of the callers, and various settings for max-unranked-desc-buffer-rank become possible.
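
One way to picture this suggestion is the following C++ sketch, which is not taken from the patch. `ReturnedDescriptor` and `returnInto` are hypothetical names. The callee receives the capacity of the caller's buffer at run time and compares it with the bytes it needs, so it does not depend on any compile-time setting shared with the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical result type: records where the descriptor ended up.
struct ReturnedDescriptor {
  void *data;
  bool onHeap; // true if the caller must free `data`
};

// Callee side: knows nothing about the caller's lowering options. It only
// compares the bytes it needs with the capacity it was handed.
ReturnedDescriptor returnInto(void *buffer, std::size_t capacity,
                              const void *desc, std::size_t bytes) {
  assert((desc != nullptr || bytes == 0) && "descriptor to copy is missing");
  bool fits = bytes <= capacity;
  void *dst = fits ? buffer : std::malloc(bytes);
  if (dst == nullptr)
    return {nullptr, false}; // no buffer given or allocation failed
  std::memcpy(dst, desc, bytes);
  return {dst, !fits};
}
```

With this shape, two callers compiled with different buffer sizes can call the same function: each passes its own capacity, and only the one whose buffer is too small gets a heap result back.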

Basically what I'm thinking here is changing the calling convention to be friendly/resilient to this small size optimization, instead of basically making the calling convention controlled by this compiler setting.

mlir/test/Conversion/StandardToLLVM/calling-convention.mlir
2	The previous test was testing two variants, with and without emit-c-wrappers; why did we lose the distinction? Actually, could we leave this test pristine by adding `max-unranked-desc-buffer-rank=-1` (and ensuring we disable the "small size optimization" in this case)? We could then add another file that just exercises the effect of `max-unranked-desc-buffer-rank` on a dedicated case. In general, I feel that we should shard files like this one into smaller tests that each exercise a particular feature of the calling convention. Reviewing a change like this diff is just not possible otherwise. That said, thanks for the nice documentation inline in the test! The way you made each section explicit is really helpful.

(Also we'll need to ensure up-to-date doc in https://mlir.llvm.org/docs/TargetLLVMIR )

This revision now requires changes to proceed. Sep 24 2021, 6:12 PM

Address comments

Thanks for the useful comments :) I see how the description did not make this clear; it is updated now.

Passing the rank (or buffer size) dynamically is in principle possible but complicates the calling convention quite a bit.
Internally, the buffers are currently passed as preceding arguments where the rank/size could be added relatively easily. For the external C functions, the buffer is currently passed within the memory descriptor, which I think is easier to read in C, but would not compose well with passing the additional rank/size argument.
Also here, I would advocate implementing this only once someone actually needs it. This optimization only requires the lowerings to be in sync if max-unranked-desc-buffer-rank is set to some non-default value. I see what you mean with "risky and fragile" if people play with max-unranked-desc-buffer-rank, in which case they should know what they're doing.

If you are happy with the above, I will update the documentation accordingly ofc. I wish I had seen https://mlir.llvm.org/docs/TargetLLVMIR/ before, which would have helped me understand all this :D.

mlir/lib/Conversion/LLVMCommon/Pattern.cpp
271–276	Yes, good point. This is a known issue with unranked memory descriptors in general. The previous calling convention and any other local unranked memory descriptor suffer from this issue.
mlir/test/Conversion/StandardToLLVM/calling-convention.mlir
2	Running the tests with and w/o `emit-c-wrappers=1` just switches between looking at the `llvm.emit_c_interface` attribute or assuming it everywhere. The new tests use the `emit_c_interface` attribute everywhere to generate C interfaces only where they are tested. Adding support to disable the desc buffer passing (`max-unranked-desc-buffer-rank=-1`) to fall back to the old behaviour would complicate the calling convention quite a bit imo. In one case, buffers are passed, in the other they aren't, etc. Until someone needs that, I would rather avoid this complexity. How important do you think this is? "shard files" - Done :) I know this is a big CL to review but I don't see a way to break it down into multiple smaller ones. If you prefer, I could land the tests separately.

frgossen edited the summary of this revision. Oct 4 2021, 10:13 AM

Harbormaster completed remote builds in B126854: Diff 376938. Oct 4 2021, 11:02 AM

Regarding @mehdi_amini comment: I agree that this adds another dimension to the ABI with C but unranked results already have a fairly complex calling convention and I don't think this makes it worse. If there are strong concerns that this will get out of sync, we could hard-wire this to a fixed rank to start with, which would prevent that issue.

If we really want to go down the route of dynamically sized pre-allocated descriptors, I would suggest we redesign the unranked descriptor to also contain the size of the pointed-to buffer. That is a refactoring in its own right and I suggest we do it as a follow-up.

mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp
136	Nit: used to avoid

In D110459#3041860, @herhut wrote:

Regarding @mehdi_amini comment: I agree that this adds another dimension to the ABI with C but unranked results already have a fairly complex calling convention and I don't think this makes it worse. If there are strong concerns that this will get out of sync, we could hard-wire this to a fixed rank to start with, which would prevent that issue.

If we really want to go down the route of dynamically sized pre-allocated descriptors, I would suggest we redesign the unranked descriptor to also contain the size of the pointed-to buffer. That is a refactoring in its own right and I suggest we do it as a follow-up.

+1 we discussed this a bunch in the past but we didn't get to it.

It should be reasonably mechanical (but lengthy) to make an UnrankedMemRefDescriptor flat (i.e. just the content of MemRefDescriptor + an int64_t for size, modulo some changes at the place of allocation).
It is unclear to me whether these should also support type-erasure (i.e. do we also want to put an enum for the data type?)
Codegen wouldn't need it but libraries / python interop could make use of it.

Address comments

Harbormaster completed remote builds in B127003: Diff 377127. Oct 5 2021, 2:53 AM

I don't think this makes it worse.

Well, I gave a very objective criterion that shows how this makes the calling convention work. I'd still like to look more into making this more robust; I need to get back to thinking about what @frgossen wrote about it:

Passing the rank (or buffer size) dynamically is in principle possible but complicates the calling convention quite a bit.
Internally, the buffers are currently passed as preceding arguments where the rank/size could be added relatively easily. For the external C functions, the buffer is currently passed within the memory descriptor, which I think is easier to read in C, but would not compose well with passing the additional rank/size argument.

But I haven't had time to do so yet, because I likely need to write an example to wrap my head around it.

This revision now requires changes to proceed. Oct 5 2021, 10:58 AM

frgossen added inline comments. Oct 5 2021, 3:39 PM

mlir/test/Conversion/StandardToLLVM/calling-convention-external-c-function-caller.mlir
307	@mehdi_amini , this could be an example that you're looking for.
311–313	@mehdi_amini , this is how the buffer is currently passed through the C interface as part of the pre-allocated result. The alternative is to pass buffer and (size or rank) as separate arguments, which I'd find less intuitive.

Add documentation

Harbormaster completed remote builds in B127281: Diff 377522. Oct 6 2021, 6:27 AM

Friendly ping

I had another look at this, and I am still convinced that we shouldn't make calling convention parametric this way: this just does not seem like a good thing to me from a system consistency point of view.

I dug into this, but I haven't gotten to the bottom of it yet.
In particular, it seems that the API for the c interface isn't changed (on the surface), as in the C++ signature stays the same. But I just got lost right now in the contract with the C++ code about alloc/free for these descriptors (both for when we call C++ from MLIR and when we have MLIR generated code that will be invoked from C++).

As an example, when we generate llvm.func @_mlir_ciface_bar(%arg0: !llvm.ptr<struct<(struct<(i64, ptr<i8>)>, f32, struct<(i64, ptr<i8>)>)>>) from func @bar() -> (memref<*xf32>, f32, memref<*xf32>) attributes { llvm.emit_c_interface } ; we would always malloc a new descriptor and the C++ code has to free it. Also the C++ code never needs to pass a valid pointer in I think (that is, before this revision).
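
For readers less used to the LLVM dialect types, the result struct in this example can be pictured from the C++ side roughly as below. This is a sketch under assumptions: `UnrankedMemRef`, `BarResult`, `mock_ciface_bar`, and `releaseBarResult` are hypothetical names, and the mock only imitates the contract as the comment above understands it before this revision (the callee allocates each inner descriptor with `malloc` and the C++ caller frees it). It is not generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// C++ view of !llvm.struct<(i64, ptr<i8>)>: a rank plus a type-erased pointer.
struct UnrankedMemRef {
  int64_t rank;
  void *descriptor;
};

// C++ view of the result struct passed by pointer to _mlir_ciface_bar.
struct BarResult {
  UnrankedMemRef first;
  float second;
  UnrankedMemRef third;
};

// Hypothetical mock standing in for the generated interface function. It
// heap-allocates each inner descriptor; 64 bytes is an arbitrary placeholder.
void mock_ciface_bar(BarResult *result) {
  assert(result != nullptr && "result struct must be provided by the caller");
  result->first = {1, std::malloc(64)};
  result->second = 1.0f;
  result->third = {2, std::malloc(64)};
}

// Caller side: the C++ code owns the inner descriptors and must free them.
void releaseBarResult(BarResult &result) {
  std::free(result.first.descriptor);
  std::free(result.third.descriptor);
  result.first.descriptor = nullptr;
  result.third.descriptor = nullptr;
}
```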

mlir/test/Conversion/StandardToLLVM/calling-convention-dbg.mlir
3	(drive by comment: please use the test dialect and remove the allow-unregistered-dialect option) But actually I think you didn't even intend to have this file here?

frgossen abandoned this revision. Jan 17 2023, 6:09 AM

Herald added a reviewer: dcaballe. Jan 17 2023, 6:09 AM

Herald added a project: Restricted Project.

Herald added subscribers: Moerafaat, zero9178, bzcheeseman and 2 others.

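Revision Contents

Path	Size
mlir/docs/TargetLLVMIR.md	57 lines
mlir/include/mlir/Conversion/LLVMCommon/LoweringOptions.h	4 lines
mlir/include/mlir/Conversion/LLVMCommon/MemRefBuilder.h	12 lines
mlir/include/mlir/Conversion/LLVMCommon/Pattern.h	32 lines
mlir/include/mlir/Conversion/Passes.td	5 lines
mlir/lib/Conversion/LLVMCommon/MemRefBuilder.cpp	65 lines
mlir/lib/Conversion/LLVMCommon/Pattern.cpp	225 lines
mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp	22 lines
mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp	7 lines
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp	178 lines
mlir/test/Conversion/StandardToLLVM/calling-convention-dbg.mlir	9 lines
mlir/test/Conversion/StandardToLLVM/calling-convention-external-c-function-callee.mlir	291 lines
mlir/test/Conversion/StandardToLLVM/calling-convention-external-c-function-caller.mlir	496 lines
mlir/test/Conversion/StandardToLLVM/calling-convention.mlir	812 lines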

Diff 377522

mlir/docs/TargetLLVMIR.md

Show First 20 Lines • Show All 414 Lines • ▼ Show 20 Lines	llvm.func @bar() {
llvm.call @foo(%1, %2, %3, %4, %5) : (!llvm.memref_1d) -> ()		llvm.call @foo(%1, %2, %3, %4, %5) : (!llvm.memref_1d) -> ()
llvm.return		llvm.return
}		}
```		```

#### Default Calling Convention for Unranked MemRef		#### Default Calling Convention for Unranked MemRef

For unranked memrefs, the list of function arguments always contains two		For unranked memrefs, the list of function arguments always contains two
elements, same as the unranked memref descriptor: an integer rank, and a		elements, the same as the unranked memref descriptor: an integer rank, and a
type-erased (`!llvm<"i8*">`) pointer to the ranked memref descriptor. Note that		type-erased (`!llvm<"i8*">`) pointer to the ranked memref descriptor. Note that
while the calling convention does not require allocation, casting to		while the calling convention does not require allocation, casting to
unranked memref does since one cannot take an address of an SSA value containing		unranked memref does since one cannot take an address of an SSA value containing
the ranked memref, which must be stored in some memory instead. The caller is in		the ranked memref, which must be stored in some memory instead. The caller is in
charge of ensuring the thread safety and management of the allocated memory, in		charge of ensuring the thread safety and management of the allocated memory, in
particular the deallocation.		particular the deallocation.

Example		Example
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines
dynamically allocated memory, and the pointer in the unranked descriptor is		dynamically allocated memory, and the pointer in the unranked descriptor is
updated accordingly. The allocation happens immediately before returning. It is		updated accordingly. The allocation happens immediately before returning. It is
the responsibility of the caller to free the dynamically allocated memory. The		the responsibility of the caller to free the dynamically allocated memory. The
default conversion of `std.call` and `std.call_indirect` copies the ranked		default conversion of `std.call` and `std.call_indirect` copies the ranked
descriptor to newly allocated memory on the caller's stack. Thus, the convention		descriptor to newly allocated memory on the caller's stack. Thus, the convention
of the ranked memref descriptor pointed to by an unranked memref descriptor		of the ranked memref descriptor pointed to by an unranked memref descriptor
being stored on stack is respected.		being stored on stack is respected.

		Descriptor buffer arguments. Functions that return unranked memref
		descriptors take one additional buffer argument per unranked result. On return,
		these are used to hold the results' inner memref descriptors for small ranks
		(up to 8 by default). This optimization avoids unnecessary calls to `malloc` and
		`free`, which are otherwise necessary at each function call site and return. In
		case the result is of greater rank (and does not fit into the buffer), the
		calling convention falls back to heap allocation.

		```mlir
		llvm.func @bar() {
		%0 = call @foo() : () -> (memref<*xf32>)
		"use"(%0) : (memref<*xf32>) -> ()
		return
		}

		// Gets converted to the following.

		llvm.func @bar() {
		%0 = llvm.mlir.constant(152 : index) : i64
		%1 = llvm.alloca %0 x i8 : (i64) -> !llvm.ptr<i8>
		%2 = llvm.call @foo(%1) : (!llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)>
		%3 = llvm.mlir.constant(8 : i64) : i64
		%4 = llvm.extractvalue %2[0] : !llvm.struct<(i64, ptr<i8>)>
		%5 = llvm.icmp "ule" %4, %3 : i64
		llvm.cond_br %5, ^bb1(%2 : !llvm.struct<(i64, ptr<i8>)>), ^bb2
		^bb1(%6: !llvm.struct<(i64, ptr<i8>)>):
		"use"(%6)
		llvm.return
		^bb2:
		%17 = ... // compute the size for the inner descriptor.
		%18 = llvm.alloca %17 x i8 : (i64) -> !llvm.ptr<i8>
		%19 = llvm.extractvalue %2[1] : !llvm.struct<(i64, ptr<i8>)>
		%20 = llvm.mlir.constant(false) : i1
		"llvm.intr.memcpy"(%18, %19, %17, %20) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> ()
		llvm.call @free(%19) : (!llvm.ptr<i8>) -> ()
		%21 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
		%22 = llvm.extractvalue %2[0] : !llvm.struct<(i64, ptr<i8>)>
		%23 = llvm.insertvalue %22, %21[0] : !llvm.struct<(i64, ptr<i8>)>
		%24 = llvm.insertvalue %18, %23[1] : !llvm.struct<(i64, ptr<i8>)>
		llvm.br ^bb1(%24 : !llvm.struct<(i64, ptr<i8>)>)
		}
		```

#### Bare Pointer Calling Convention for Ranked MemRef		#### Bare Pointer Calling Convention for Ranked MemRef

The "bare pointer" calling convention converts `memref`-typed function arguments		The "bare pointer" calling convention converts `memref`-typed function arguments
to a single pointer to the aligned data. Note that this does not apply to		to a single pointer to the aligned data. Note that this does not apply to
uses of `memref` outside of function signatures, the default descriptor		uses of `memref` outside of function signatures, the default descriptor
structures are still used. This convention further restricts the supported cases		structures are still used. This convention further restricts the supported cases
to the following.		to the following.

▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines

1. Declare a new function `_mlir_ciface_<original name>` where memref arguments		1. Declare a new function `_mlir_ciface_<original name>` where memref arguments
are converted to pointer-to-struct and the remaining arguments are converted		are converted to pointer-to-struct and the remaining arguments are converted
as usual. Results are converted to a special argument if they are of struct		as usual. Results are converted to a special argument if they are of struct
type.		type.
2. Add a body to the original function (making it non-external) that		2. Add a body to the original function (making it non-external) that
1. allocates memref descriptors,		1. allocates memref descriptors,
2. populates them,		2. populates them,
3. potentially allocates space for the result struct, and		3. potentially allocates space for the result struct (also holding any
		descriptor buffers for unranked memref results if needed), and
4. passes the pointers to these into the newly declared interface function,		4. passes the pointers to these into the newly declared interface function,
then		then
5. collects the result of the call (potentially from the result struct),		5. collects the result of the call (potentially from the result struct),
and		and
6. returns it to the caller.		6. returns it to the caller.

For (non-external) functions defined in the MLIR module.		For (non-external) functions defined in the MLIR module.

1. Define a new function `_mlir_ciface_<original name>` where memref arguments		1. Define a new function `_mlir_ciface_<original name>` where memref arguments
are converted to pointer-to-struct and the remaining arguments are converted		are converted to pointer-to-struct and the remaining arguments are converted
as usual. Results are converted to a special argument if they are of struct		as usual. Results are converted to a special argument if they are of struct
type.		type.
2. Populate the body of the newly defined function with IR that		2. Populate the body of the newly defined function with IR that
1. loads descriptors from pointers;		1. loads descriptors from pointers,
2. unpacks descriptor into individual non-aggregate values;		2. unpacks descriptor into individual non-aggregate values (also inner
3. passes these values into the original function;		descriptor buffer if needed),
4. collects the results of the call and		3. passes these values into the original function, then
		4. collects the results of the call, and
5. either copies the results into the result struct or returns them to the		5. either copies the results into the result struct or returns them to the
caller.		caller.

Examples:		Examples:

```mlir		```mlir

func @qux(%arg0: memref<?x?xf32>)		func @qux(%arg0: memref<?x?xf32>)
▲ Show 20 Lines • Show All 280 Lines • Show Last 20 Lines

mlir/include/mlir/Conversion/LLVMCommon/LoweringOptions.h

	Show All 28 Lines
	/// to share lowering options between passes, patterns, and type converter.			/// to share lowering options between passes, patterns, and type converter.
	class LowerToLLVMOptions {			class LowerToLLVMOptions {
	public:			public:
	explicit LowerToLLVMOptions(MLIRContext *ctx);			explicit LowerToLLVMOptions(MLIRContext *ctx);
	LowerToLLVMOptions(MLIRContext *ctx, const DataLayout &dl);			LowerToLLVMOptions(MLIRContext *ctx, const DataLayout &dl);

	bool useBarePtrCallConv = false;			bool useBarePtrCallConv = false;
	bool emitCWrappers = false;			bool emitCWrappers = false;

				bondhugula: Doc comment here.
				// Specifies the maximum rank for which the calling convention will realize
				// stack-allocated buffers for unranked memory descriptor results.
				int64_t maxUnrankedDescBufferRank = 8;

	enum class AllocLowering {			enum class AllocLowering {
	/// Use malloc for heap allocations.			/// Use malloc for heap allocations.
	Malloc,			Malloc,

	/// Use aligned_alloc for heap allocations.			/// Use aligned_alloc for heap allocations.
	AlignedAlloc,			AlignedAlloc,

	/// Do not lower heap allocations. Users must provide their own patterns for			/// Do not lower heap allocations. Users must provide their own patterns for
	Show All 28 Lines

mlir/include/mlir/Conversion/LLVMCommon/MemRefBuilder.h

Show First 20 Lines • Show All 167 Lines • ▼ Show 20 Lines	public:
/// descriptor and returns them as `results` list.		/// descriptor and returns them as `results` list.
static void unpack(OpBuilder &builder, Location loc, Value packed,		static void unpack(OpBuilder &builder, Location loc, Value packed,
SmallVectorImpl<Value> &results);		SmallVectorImpl<Value> &results);

/// Returns the number of non-aggregate values that would be produced by		/// Returns the number of non-aggregate values that would be produced by
/// `unpack`.		/// `unpack`.
static unsigned getNumUnpackedValues() { return 2; }		static unsigned getNumUnpackedValues() { return 2; }

/// Builds IR computing the sizes in bytes (suitable for opaque allocation)		/// Builds IR computing the size in bytes (suitable for opaque allocation).
/// and appends the corresponding values into `sizes`.		Value computeSize(OpBuilder &builder, Location loc,
static void computeSizes(OpBuilder &builder, Location loc,		LLVMTypeConverter &typeConverter);
LLVMTypeConverter &typeConverter,
ArrayRef<UnrankedMemRefDescriptor> values,		// Returns the size in bytes (suitable for opaque allocation).
SmallVectorImpl<Value> &sizes);		static int64_t getSize(LLVMTypeConverter &typeConverter, int64_t rank);

/// TODO: The following accessors don't take alignment rules between elements		/// TODO: The following accessors don't take alignment rules between elements
/// of the descriptor struct into account. For some architectures, it might be		/// of the descriptor struct into account. For some architectures, it might be
/// necessary to extend them and to use `llvm::DataLayout` contained in		/// necessary to extend them and to use `llvm::DataLayout` contained in
/// `LLVMTypeConverter`.		/// `LLVMTypeConverter`.

/// Builds IR extracting the allocated pointer from the descriptor.		/// Builds IR extracting the allocated pointer from the descriptor.
static Value allocatedPtr(OpBuilder &builder, Location loc,		static Value allocatedPtr(OpBuilder &builder, Location loc,
▲ Show 20 Lines • Show All 56 Lines • Show Last 20 Lines

mlir/include/mlir/Conversion/LLVMCommon/Pattern.h

Show First 20 Lines • Show All 110 Lines • ▼ Show 20 Lines	protected:

/// Creates and populates a canonical memref descriptor struct.		/// Creates and populates a canonical memref descriptor struct.
MemRefDescriptor		MemRefDescriptor
createMemRefDescriptor(Location loc, MemRefType memRefType,		createMemRefDescriptor(Location loc, MemRefType memRefType,
Value allocatedPtr, Value alignedPtr,		Value allocatedPtr, Value alignedPtr,
ArrayRef<Value> sizes, ArrayRef<Value> strides,		ArrayRef<Value> sizes, ArrayRef<Value> strides,
ConversionPatternRewriter &rewriter) const;		ConversionPatternRewriter &rewriter) const;

/// Copies the memory descriptor for any operands that were unranked		/// Ensures that all unranked memory descriptors are on the stack.
/// descriptors originally to heap-allocated memory (if toDynamic is true) or		/// This concerns the dynamically sized inner descriptors. If their rank is
/// to stack-allocated memory (otherwise). Also frees the previously used		/// sufficiently small, we know that they reside in stack-allocated buffers
/// memory (that is assumed to be heap-allocated) if toDynamic is false.		/// already. Otherwise, if they are of a rank greater than the maximum rank
LogicalResult copyUnrankedDescriptors(OpBuilder &builder, Location loc,		/// for stack-allocated descriptor buffers, they reside on the heap. In this
		/// case, we have to copy them over to a newly stack-allocated buffer of the
		/// right size and free the previously used buffer on the heap.
		void copyUnrankedDescriptorsToStack(ConversionPatternRewriter &rewriter,
		Location loc, int64_t maxRankOnStack,
TypeRange origTypes,		TypeRange origTypes,
SmallVectorImpl<Value> &operands,		SmallVectorImpl<Value> &operands) const;
bool toDynamic) const;
		/// Copies all unranked memory descriptors, using the given buffer arguments
		/// or newly heap-allocated memory for the inner descriptors. This is to let
		/// unranked memory descriptors escape a function. If their rank is
		/// sufficiently small, we assume that their inner descriptor fits into the
		/// provided buffer. Otherwise, if they are of a rank greater than the maximum
		/// rank for stack-allocated descriptor buffers, we allocate a new buffer on
		/// the heap. In both cases, we copy the inner descriptor and create a copy of
		/// the unranked outer descriptor.
		void copyUnrankedDescriptorsToBufferOrHeap(
		ConversionPatternRewriter &rewriter, Location loc, int64_t maxRankOnStack,
		TypeRange origTypes, ArrayRef<Value> descBuffers,
		SmallVectorImpl<Value> &operands) const;
};		};

/// Utility class for operation conversions targeting the LLVM dialect that		/// Utility class for operation conversions targeting the LLVM dialect that
/// match exactly one source operation.		/// match exactly one source operation.
template <typename SourceOp>		template <typename SourceOp>
class ConvertOpToLLVMPattern : public ConvertToLLVMPattern {		class ConvertOpToLLVMPattern : public ConvertToLLVMPattern {
public:		public:
using OpAdaptor = typename SourceOp::Adaptor;		using OpAdaptor = typename SourceOp::Adaptor;
▲ Show 20 Lines • Show All 96 Lines • Show Last 20 Lines

mlir/include/mlir/Conversion/Passes.td

Show First 20 Lines • Show All 516 Lines • ▼ Show 20 Lines	def ConvertStandardToLLVM : Pass<"convert-std-to-llvm", "ModuleOp"> {
let options = [		let options = [
Option<"useBarePtrCallConv", "use-bare-ptr-memref-call-conv", "bool",		Option<"useBarePtrCallConv", "use-bare-ptr-memref-call-conv", "bool",
/default=/"false",		/default=/"false",
"Replace FuncOp's MemRef arguments with bare pointers to the MemRef "		"Replace FuncOp's MemRef arguments with bare pointers to the MemRef "
"element types">,		"element types">,
Option<"emitCWrappers", "emit-c-wrappers", "bool", /default=/"false",		Option<"emitCWrappers", "emit-c-wrappers", "bool", /default=/"false",
"Emit wrappers for C-compatible pointer-to-struct memref "		"Emit wrappers for C-compatible pointer-to-struct memref "
"descriptors">,		"descriptors">,
		Option<"maxUnrankedDescBufferRank", "max-unranked-desc-buffer-rank",
		"int64_t", /default=/"8",
		"Specifies the maximum rank for which the calling convention will "
		"realize stack-allocated buffers for unranked memory descriptor "
		"results.">,
Option<"indexBitwidth", "index-bitwidth", "unsigned",		Option<"indexBitwidth", "index-bitwidth", "unsigned",
/default=kDeriveIndexBitwidthFromDataLayout/"0",		/default=kDeriveIndexBitwidthFromDataLayout/"0",
"Bitwidth of the index type, 0 to use size of machine word">,		"Bitwidth of the index type, 0 to use size of machine word">,
Option<"dataLayout", "data-layout", "std::string",		Option<"dataLayout", "data-layout", "std::string",
/default=/"\"\"",		/default=/"\"\"",
"String description (LLVM format) of the data layout that is "		"String description (LLVM format) of the data layout that is "
"expected on the produced module">		"expected on the produced module">
];		];
▲ Show 20 Lines • Show All 182 Lines • Show Last 20 Lines

mlir/lib/Conversion/LLVMCommon/MemRefBuilder.cpp

Show First 20 Lines • Show All 327 Lines • ▼ Show 20 Lines	void UnrankedMemRefDescriptor::unpack(OpBuilder &builder, Location loc,
Value packed,		Value packed,
SmallVectorImpl<Value> &results) {		SmallVectorImpl<Value> &results) {
UnrankedMemRefDescriptor d(packed);		UnrankedMemRefDescriptor d(packed);
results.reserve(results.size() + 2);		results.reserve(results.size() + 2);
results.push_back(d.rank(builder, loc));		results.push_back(d.rank(builder, loc));
results.push_back(d.memRefDescPtr(builder, loc));		results.push_back(d.memRefDescPtr(builder, loc));
}		}

void UnrankedMemRefDescriptor::computeSizes(		Value UnrankedMemRefDescriptor::computeSize(OpBuilder &builder, Location loc,
OpBuilder &builder, Location loc, LLVMTypeConverter &typeConverter,		LLVMTypeConverter &typeConverter) {
ArrayRef<UnrankedMemRefDescriptor> values, SmallVectorImpl<Value> &sizes) {
if (values.empty())
return;

// Cache the index type.		// Get constants.
Type indexType = typeConverter.getIndexType();		Type indexType = typeConverter.getIndexType();

// Initialize shared constants.
Value one = createIndexAttrConstant(builder, loc, indexType, 1);		Value one = createIndexAttrConstant(builder, loc, indexType, 1);
Value two = createIndexAttrConstant(builder, loc, indexType, 2);		Value two = createIndexAttrConstant(builder, loc, indexType, 2);
Value pointerSize = createIndexAttrConstant(		Value pointerSize = createIndexAttrConstant(
builder, loc, indexType, ceilDiv(typeConverter.getPointerBitwidth(), 8));		builder, loc, indexType, ceilDiv(typeConverter.getPointerBitwidth(), 8));
Value indexSize =		Value indexSize =
createIndexAttrConstant(builder, loc, indexType,		createIndexAttrConstant(builder, loc, indexType,
ceilDiv(typeConverter.getIndexTypeBitwidth(), 8));		ceilDiv(typeConverter.getIndexTypeBitwidth(), 8));

sizes.reserve(sizes.size() + values.size());
for (UnrankedMemRefDescriptor desc : values) {
// Emit IR computing the memory necessary to store the descriptor. This		// Emit IR computing the memory necessary to store the descriptor. This
// assumes the descriptor to be		// assumes the descriptor to be
// { type, type, index, index[rank], index[rank] }		// { type, type, index, index[rank], index[rank] }
// and densely packed, so the total size is		// and densely packed, so the total size is
// 2 * sizeof(pointer) + (1 + 2 * rank) * sizeof(index).		// 2 * sizeof(pointer) + (1 + 2 * rank) * sizeof(index).
// TODO: consider including the actual size (including eventual padding due		// TODO: consider including the actual size (including eventual padding due
// to data layout) into the unranked descriptor.		// to data layout) into the unranked descriptor.

		// 2 * sizeof(pointer)
Value doublePointerSize =		Value doublePointerSize =
builder.create<LLVM::MulOp>(loc, indexType, two, pointerSize);		builder.create<LLVM::MulOp>(loc, indexType, two, pointerSize);

// (1 + 2 * rank) * sizeof(index)		// (1 + 2 * rank) * sizeof(index)
Value rank = desc.rank(builder, loc);		Value rank = this->rank(builder, loc);
Value doubleRank = builder.create<LLVM::MulOp>(loc, indexType, two, rank);		Value doubleRank = builder.create<LLVM::MulOp>(loc, indexType, two, rank);
Value doubleRankIncremented =		Value doubleRankIncremented =
builder.create<LLVM::AddOp>(loc, indexType, doubleRank, one);		builder.create<LLVM::AddOp>(loc, indexType, doubleRank, one);
Value rankIndexSize = builder.create<LLVM::MulOp>(		Value rankIndexSize = builder.create<LLVM::MulOp>(
loc, indexType, doubleRankIncremented, indexSize);		loc, indexType, doubleRankIncremented, indexSize);

// Total allocation size.		return builder.create<LLVM::AddOp>(loc, indexType, doublePointerSize,
Value allocationSize = builder.create<LLVM::AddOp>(		rankIndexSize);
loc, indexType, doublePointerSize, rankIndexSize);
sizes.push_back(allocationSize);
}		}

		int64_t UnrankedMemRefDescriptor::getSize(LLVMTypeConverter &typeConverter,
		int64_t rank) {
		int64_t ptrSize = ceilDiv(typeConverter.getPointerBitwidth(), 8);
		int64_t indexSize = ceilDiv(typeConverter.getIndexTypeBitwidth(), 8);
		return 2 * ptrSize + (1 + 2 * rank) * indexSize;
}		}

Value UnrankedMemRefDescriptor::allocatedPtr(OpBuilder &builder, Location loc,		Value UnrankedMemRefDescriptor::allocatedPtr(OpBuilder &builder, Location loc,
Value memRefDescPtr,		Value memRefDescPtr,
Type elemPtrPtrType) {		Type elemPtrPtrType) {

Value elementPtrPtr =		Value elementPtrPtr =
builder.create<LLVM::BitcastOp>(loc, elemPtrPtrType, memRefDescPtr);		builder.create<LLVM::BitcastOp>(loc, elemPtrPtrType, memRefDescPtr);
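
For reference, the size formula that `computeSize` emits as IR and that `getSize` evaluates statically can be checked with a small standalone sketch. The names `bytesForBits` and `innerDescriptorSize` are hypothetical and exist only for this illustration; the sketch assumes the densely packed layout `{ ptr, ptr, index, index[rank], index[rank] }` described in the comments above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an MLIR API): round a bit width up to whole bytes.
inline int64_t bytesForBits(int64_t bits) { return (bits + 7) / 8; }

// Size in bytes of a densely packed inner descriptor
//   { ptr, ptr, index, index[rank], index[rank] }
// i.e. 2 * sizeof(pointer) + (1 + 2 * rank) * sizeof(index).
inline int64_t innerDescriptorSize(int64_t pointerBitwidth,
                                   int64_t indexBitwidth, int64_t rank) {
  assert(rank >= 0 && "rank must be non-negative");
  const int64_t ptrSize = bytesForBits(pointerBitwidth);
  const int64_t indexSize = bytesForBits(indexBitwidth);
  return 2 * ptrSize + (1 + 2 * rank) * indexSize;
}
```

With 64-bit pointers and indices, a rank-3 descriptor comes to 2 * 8 + 7 * 8 = 72 bytes under this layout assumption; padding introduced by the data layout is not accounted for, as the TODO above notes.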

mlir/lib/Conversion/LLVMCommon/Pattern.cpp

MemRefDescriptor ConvertToLLVMPattern::createMemRefDescriptor(

// Field 5: Strides.		// Field 5: Strides.
for (auto en : llvm::enumerate(strides))		for (auto en : llvm::enumerate(strides))
memRefDescriptor.setStride(rewriter, loc, en.index(), en.value());		memRefDescriptor.setStride(rewriter, loc, en.index(), en.value());

return memRefDescriptor;		return memRefDescriptor;
}		}

LogicalResult ConvertToLLVMPattern::copyUnrankedDescriptors(		void ConvertToLLVMPattern::copyUnrankedDescriptorsToStack(
OpBuilder &builder, Location loc, TypeRange origTypes,		ConversionPatternRewriter &rewriter, Location loc, int64_t maxRankOnStack,
SmallVectorImpl<Value> &operands, bool toDynamic) const {		TypeRange origTypes, SmallVectorImpl<Value> &operands) const {
assert(origTypes.size() == operands.size() &&
"expected as may original types as operands");

// Find operands of unranked memref type and store them.
SmallVector<UnrankedMemRefDescriptor, 4> unrankedMemrefs;
for (unsigned i = 0, e = operands.size(); i < e; ++i)
if (origTypes[i].isa<UnrankedMemRefType>())
unrankedMemrefs.emplace_back(operands[i]);

if (unrankedMemrefs.empty())		// Check if there is any unranked operand to avoid shared constants.
return success();		if (llvm::none_of(origTypes,
		[](Type ty) { return ty.isa<UnrankedMemRefType>(); })) {
		return;
		}

		OpBuilder::InsertionGuard guard(rewriter);

		// Find the free function.
		auto module = rewriter.getInsertionPoint()->getParentOfType<ModuleOp>();
		LLVM::LLVMFuncOp freeFunc = LLVM::lookupOrCreateFreeFn(module);

		// Get common types and constants.
		Type voidPtrTy = this->getVoidPtrType();
		Type i1Ty = rewriter.getI1Type();
		Value maxRankOnStackCst = rewriter.create<LLVM::ConstantOp>(
		loc, rewriter.getI64Type(), rewriter.getI64IntegerAttr(maxRankOnStack));

		for (unsigned i = 0; i < operands.size(); i++) {
		bondhugula: unsigned

		// Only copy unranked descriptors.
		if (!origTypes[i].isa<UnrankedMemRefType>())
		continue;

		// Split the block to insert descriptor copying logic.
		Block *origBlock = rewriter.getBlock();
		Block *continuationBlock =
		rewriter.splitBlock(origBlock, rewriter.getInsertionPoint());
		Type descTy = getTypeConverter()->convertType(origTypes[i]);
		continuationBlock->addArgument(descTy);

		// Generate the block for large ranks.
		// This is the case in which we expect the inner descriptor in dynamic
		// memory. We copy it to stack-allocated memory and free the original
		// inner descriptor before creating the outer descriptor copy.
		Block *largeRankBlock = rewriter.createBlock(origBlock->getParent());

		// Copy inner descriptor to stack.
		UnrankedMemRefDescriptor desc(operands[i]);
		Value allocationSize = desc.computeSize(rewriter, loc, *getTypeConverter());
		Value innerDescCpy = rewriter.create<LLVM::AllocaOp>(
		loc, voidPtrTy, allocationSize, /*alignment=*/0);
		Value innerDesc = desc.memRefDescPtr(rewriter, loc);
		Value zero = rewriter.create<LLVM::ConstantOp>(loc, i1Ty,
		rewriter.getBoolAttr(false));
		bondhugula: I'm trying to understand the impact here when such descriptors are being allocated on the stack inside of loops: wouldn't one typically run out of stack space? (E.g., call ops inside loops.)
		frgossen (author): Yes, good point. This is a known issue with unranked memory descriptors in general. The previous calling convention and any other local unranked memory descriptor suffer from this issue.
		rewriter.create<LLVM::MemcpyOp>(loc, innerDescCpy, innerDesc,
		allocationSize, zero);
		rewriter.create<LLVM::CallOp>(loc, freeFunc, innerDesc);

		// Create a new descriptor. The same descriptor can be returned multiple
		// times, attempting to modify its pointer can lead to memory leaks
		// (allocated twice and overwritten) or double frees (the caller does not
		// know if the descriptor points to the same memory).
		auto descCpy = UnrankedMemRefDescriptor::undef(rewriter, loc, descTy);
		descCpy.setRank(rewriter, loc, desc.rank(rewriter, loc));
		descCpy.setMemRefDescPtr(rewriter, loc, innerDescCpy);

		// Propagate the new descriptor.
		rewriter.create<LLVM::BrOp>(loc, Value(descCpy), continuationBlock);

		// Generate the condition to decide if the inner descriptor is already on
		// the stack (for small ranks) or if we have to copy it over (for large
		// ranks).
		rewriter.setInsertionPointToEnd(origBlock);
		Value rank = desc.rank(rewriter, loc);
		Value pred = rewriter.create<LLVM::ICmpOp>(loc, LLVM::ICmpPredicate::ule,
		rank, maxRankOnStackCst);
		rewriter.create<LLVM::CondBrOp>(loc, pred, continuationBlock, operands[i],
		largeRankBlock, ValueRange{});

		// Continue with the original descriptor or its on-stack copy, which are
		// passed as a block argument.
		rewriter.setInsertionPointToStart(continuationBlock);
		operands[i] = continuationBlock->getArgument(0);
		}
		}

// Compute allocation sizes.		void ConvertToLLVMPattern::copyUnrankedDescriptorsToBufferOrHeap(
SmallVector<Value, 4> sizes;		ConversionPatternRewriter &rewriter, Location loc, int64_t maxRankOnStack,
UnrankedMemRefDescriptor::computeSizes(builder, loc, *getTypeConverter(),		TypeRange origTypes, ArrayRef<Value> descBuffers,
unrankedMemrefs, sizes);		SmallVectorImpl<Value> &operands) const {

// Get frequently used types.		// Check if there is any unranked operand to avoid shared constants.
MLIRContext *context = builder.getContext();		if (llvm::none_of(origTypes,
Type voidPtrType = LLVM::LLVMPointerType::get(IntegerType::get(context, 8));		[](Type ty) { return ty.isa<UnrankedMemRefType>(); })) {
auto i1Type = IntegerType::get(context, 1);		return;
Type indexType = getTypeConverter()->getIndexType();		}

// Find the malloc and free, or declare them if necessary.		OpBuilder::InsertionGuard guard(rewriter);
auto module = builder.getInsertionPoint()->getParentOfType<ModuleOp>();
LLVM::LLVMFuncOp freeFunc, mallocFunc;		// Get common types and constants.
if (toDynamic)		Type indexTy = getTypeConverter()->getIndexType();
mallocFunc = LLVM::lookupOrCreateMallocFn(module, indexType);		Type voidPtrTy = LLVM::LLVMPointerType::get(rewriter.getI8Type());
if (!toDynamic)		Type i1Ty = rewriter.getI1Type();
freeFunc = LLVM::lookupOrCreateFreeFn(module);		Value maxRankOnStackCst = rewriter.create<LLVM::ConstantOp>(
		loc, rewriter.getI64Type(), rewriter.getI64IntegerAttr(maxRankOnStack));
// Initialize shared constants.
Value zero =		// Find the malloc function.
builder.create<LLVM::ConstantOp>(loc, i1Type, builder.getBoolAttr(false));		auto module = rewriter.getInsertionPoint()->getParentOfType<ModuleOp>();
		LLVM::LLVMFuncOp mallocFunc = LLVM::lookupOrCreateMallocFn(module, indexTy);
unsigned unrankedMemrefPos = 0;
for (unsigned i = 0, e = operands.size(); i < e; ++i) {		unsigned nextBuffer = 0;
Type type = origTypes[i];		for (unsigned i = 0; i < operands.size(); i++) {
if (!type.isa<UnrankedMemRefType>())
		// Only copy unranked descriptors.
		if (!origTypes[i].isa<UnrankedMemRefType>())
continue;		continue;
Value allocationSize = sizes[unrankedMemrefPos++];
		// Compute the size of the inner descriptor for allocation and copying.
UnrankedMemRefDescriptor desc(operands[i]);		UnrankedMemRefDescriptor desc(operands[i]);
		Value allocationSize = desc.computeSize(rewriter, loc, *getTypeConverter());

// Allocate memory, copy, and free the source if necessary.		// Split the block to insert descriptor copying logic.
Value memory =		Block *origBlock = rewriter.getBlock();
toDynamic		Block *continuationBlock =
? builder.create<LLVM::CallOp>(loc, mallocFunc, allocationSize)		rewriter.splitBlock(origBlock, rewriter.getInsertionPoint());
.getResult(0)		continuationBlock->addArgument(voidPtrTy);
: builder.create<LLVM::AllocaOp>(loc, voidPtrType, allocationSize,
/*alignment=*/0);		// Generate the block for small ranks.
Value source = desc.memRefDescPtr(builder, loc);		// This is the case in which we can copy the inner descriptor to the
builder.create<LLVM::MemcpyOp>(loc, memory, source, allocationSize, zero);		// available buffer.
if (!toDynamic)		Block *smallRankBlock = rewriter.createBlock(origBlock->getParent());
builder.create<LLVM::CallOp>(loc, freeFunc, source);		Value buffer = descBuffers[nextBuffer++];
		rewriter.create<LLVM::BrOp>(loc, buffer, continuationBlock);

		// Generate the block for large ranks.
		// This is the case in which we copy the inner descriptor to heap-allocated
		// memory as the available buffer is too small.
		Block *largeRankBlock = rewriter.createBlock(origBlock->getParent());
		Value newBuffer =
		rewriter.create<LLVM::CallOp>(loc, mallocFunc, allocationSize)
		.getResult(0);
		rewriter.create<LLVM::BrOp>(loc, newBuffer, continuationBlock);

		// Generate the condition to decide if the inner descriptor can be copied to
		// the available buffer (for small ranks) or if we need a bigger one (for
		// large ranks).
		rewriter.setInsertionPointToEnd(origBlock);
		Value rank = desc.rank(rewriter, loc);
		Value pred = rewriter.create<LLVM::ICmpOp>(loc, LLVM::ICmpPredicate::ule,
		rank, maxRankOnStackCst);
		rewriter.create<LLVM::CondBrOp>(loc, pred, smallRankBlock, largeRankBlock);

		// Continue with the selected buffer for the inner descriptor copy, which is
		// passed as a block argument.
		rewriter.setInsertionPointToStart(continuationBlock);
		Value innerDescCpy = continuationBlock->getArgument(0);

		// Copy the inner descriptor to the new buffer.
		Value innerDesc = desc.memRefDescPtr(rewriter, loc);
		Value zero = rewriter.create<LLVM::ConstantOp>(loc, i1Ty,
		rewriter.getBoolAttr(false));
		rewriter.create<LLVM::MemcpyOp>(loc, innerDescCpy, innerDesc,
		allocationSize, zero);

// Create a new descriptor. The same descriptor can be returned multiple		// Create a new descriptor. The same descriptor can be returned multiple
// times, attempting to modify its pointer can lead to memory leaks		// times, attempting to modify its pointer can lead to memory leaks
// (allocated twice and overwritten) or double frees (the caller does not		// (allocated twice and overwritten) or double frees (the caller does not
// know if the descriptor points to the same memory).		// know if the descriptor points to the same memory).
Type descriptorType = getTypeConverter()->convertType(type);		Type descTy = getTypeConverter()->convertType(origTypes[i]);
if (!descriptorType)		auto descCpy = UnrankedMemRefDescriptor::undef(rewriter, loc, descTy);
return failure();		descCpy.setRank(rewriter, loc, rank);
auto updatedDesc =		descCpy.setMemRefDescPtr(rewriter, loc, innerDescCpy);
UnrankedMemRefDescriptor::undef(builder, loc, descriptorType);
Value rank = desc.rank(builder, loc);
updatedDesc.setRank(builder, loc, rank);
updatedDesc.setMemRefDescPtr(builder, loc, memory);

operands[i] = updatedDesc;		operands[i] = descCpy;
}		}

return success();
}		}
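
The two helpers above implement a small-buffer-or-heap protocol between callee and caller. As a rough sketch of the runtime behaviour the generated IR is meant to have (not the lowering itself), the following standalone code mirrors it. `UnrankedDesc`, `returnDescriptor`, and `receiveDescriptor` are hypothetical names for this illustration, the caller-side `alloca` is replaced by a caller-provided scratch buffer, and ranks are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the unranked descriptor { rank, ptr-to-inner }.
struct UnrankedDesc {
  int64_t rank;
  void *inner;
};

// Callee side (cf. copyUnrankedDescriptorsToBufferOrHeap): copy the inner
// descriptor into the caller's buffer when the rank fits, otherwise malloc.
inline UnrankedDesc returnDescriptor(const UnrankedDesc &src,
                                     std::size_t innerSize, void *callerBuffer,
                                     int64_t maxRankOnStack) {
  assert(callerBuffer && "caller must pass a pre-allocated buffer");
  void *dst = src.rank <= maxRankOnStack ? callerBuffer : std::malloc(innerSize);
  if (!dst)
    std::abort(); // Allocation failure is not handled in this sketch.
  std::memcpy(dst, src.inner, innerSize);
  return {src.rank, dst};
}

// Caller side (cf. copyUnrankedDescriptorsToStack): small ranks already live
// in the caller's buffer; large ranks are moved off the heap and freed.
inline UnrankedDesc receiveDescriptor(const UnrankedDesc &result,
                                      std::size_t innerSize, void *scratch,
                                      int64_t maxRankOnStack) {
  if (result.rank <= maxRankOnStack)
    return result;
  std::memcpy(scratch, result.inner, innerSize);
  std::free(result.inner);
  return {result.rank, scratch};
}
```

In the actual patch the scratch space for the large-rank case is a fresh `alloca`, which is why the stack-growth concern raised in the inline comments applies to calls inside loops.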

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Detail methods		// Detail methods
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// Replaces the given operation "op" with a new operation of type "targetOp"		/// Replaces the given operation "op" with a new operation of type "targetOp"
/// and given operands.		/// and given operands.

mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp

	// pointer-to-function types.			// pointer-to-function types.
	Type LLVMTypeConverter::convertFunctionType(FunctionType type) {			Type LLVMTypeConverter::convertFunctionType(FunctionType type) {
	SignatureConversion conversion(type.getNumInputs());			SignatureConversion conversion(type.getNumInputs());
	Type converted =			Type converted =
	convertFunctionSignature(type, /*isVariadic=*/false, conversion);			convertFunctionSignature(type, /*isVariadic=*/false, conversion);
	return LLVM::LLVMPointerType::get(converted);			return LLVM::LLVMPointerType::get(converted);
	}			}

	// Function types are converted to LLVM Function types by recursively converting			// Function types are converted to LLVM function types by elementwise converting
	// argument and result types. If MLIR Function has zero results, the LLVM			// argument and result types. If the MLIR function has zero results, the LLVM
	// Function has one VoidType result. If MLIR Function has more than one result,			// function has one VoidType result. If the MLIR function has more than one
	// they are into an LLVM StructType in their order of appearance.			// result, they are packed into an LLVM StructType in their order of appearance.
				// For every unranked memref result of the MLIR function, the LLVM function
				// expects one preceding buffer argument. These are used to avoid dynamic
				herhut: Nit: used to avoid
				// memory allocation for the inner descriptors if their rank is sufficiently small
				// (see option max-unranked-desc-buffer-rank).
	Type LLVMTypeConverter::convertFunctionSignature(			Type LLVMTypeConverter::convertFunctionSignature(
	FunctionType funcTy, bool isVariadic,			FunctionType funcTy, bool isVariadic,
	LLVMTypeConverter::SignatureConversion &result) {			LLVMTypeConverter::SignatureConversion &result) {
	// Select the argument converter depending on the calling convention.			// Select the argument converter depending on the calling convention.
	auto funcArgConverter = options.useBarePtrCallConv			auto funcArgConverter = options.useBarePtrCallConv
	? barePtrFuncArgTypeConverter			? barePtrFuncArgTypeConverter
	: structFuncArgTypeConverter;			: structFuncArgTypeConverter;
	// Convert argument types one by one and check for errors.			// Convert argument types one by one and check for errors.
	for (auto &en : llvm::enumerate(funcTy.getInputs())) {			for (auto &en : llvm::enumerate(funcTy.getInputs())) {
	Type type = en.value();			Type type = en.value();
	SmallVector<Type, 8> converted;			SmallVector<Type, 8> converted;
	if (failed(funcArgConverter(*this, type, converted)))			if (failed(funcArgConverter(*this, type, converted)))
	return {};			return {};
	result.addInputs(en.index(), converted);			result.addInputs(en.index(), converted);
	}			}

	SmallVector<Type, 8> argTypes;			SmallVector<Type, 8> argTypes;
	argTypes.reserve(llvm::size(result.getConvertedTypes()));			argTypes.reserve(llvm::size(result.getConvertedTypes()));

				// Add one void ptr per unranked result. These are used to pass buffers for
				// the inner descriptors.
				auto voidPtrTy =
				LLVM::LLVMPointerType::get(IntegerType::get(&getContext(), 8));
				for (Type ty : funcTy.getResults()) {
				if (ty.isa<UnrankedMemRefType>())
				argTypes.push_back(voidPtrTy);
				}

	for (Type type : result.getConvertedTypes())			for (Type type : result.getConvertedTypes())
	argTypes.push_back(type);			argTypes.push_back(type);

	// If function does not return anything, create the void result type,			// If function does not return anything, create the void result type,
	// if it returns one element, convert it, otherwise pack the result types into			// if it returns one element, convert it, otherwise pack the result types into
	// a struct.			// a struct.
	Type resultType = funcTy.getNumResults() == 0			Type resultType = funcTy.getNumResults() == 0
	? LLVM::LLVMVoidType::get(&getContext())			? LLVM::LLVMVoidType::get(&getContext())
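
The argument ordering that this signature conversion produces can be sketched in isolation: one buffer pointer per unranked memref result comes first, followed by the converted inputs. The function `buildArgList` and the use of strings as type placeholders are assumptions made for this illustration only; the real code works on `Type` objects.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: build the LLVM-level argument list for a function.
// `resultIsUnranked[i]` says whether the i-th MLIR result is an unranked
// memref; `convertedInputs` holds the already converted input types.
inline std::vector<std::string>
buildArgList(const std::vector<std::string> &convertedInputs,
             const std::vector<bool> &resultIsUnranked) {
  std::vector<std::string> args;
  // One void pointer per unranked result, used as the inner-descriptor buffer.
  for (bool isUnranked : resultIsUnranked)
    if (isUnranked)
      args.push_back("!llvm.ptr<i8>");
  // The converted inputs follow the buffer arguments.
  args.insert(args.end(), convertedInputs.begin(), convertedInputs.end());
  return args;
}
```

A function with results (unranked, ranked, unranked) and a single `i64` input would thus take two leading buffer pointers followed by the `i64`.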

mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp

convertSourceMemRefToDescriptor(ConversionPatternRewriter &rewriter,
unsigned addressSpace = targetType.getMemorySpaceAsInt();		unsigned addressSpace = targetType.getMemorySpaceAsInt();
Type elementType = targetType.getElementType();		Type elementType = targetType.getElementType();

// Create the unranked memref descriptor that holds the ranked one. The		// Create the unranked memref descriptor that holds the ranked one. The
// inner descriptor is allocated on stack.		// inner descriptor is allocated on stack.
auto targetDesc = UnrankedMemRefDescriptor::undef(		auto targetDesc = UnrankedMemRefDescriptor::undef(
rewriter, loc, typeConverter->convertType(targetType));		rewriter, loc, typeConverter->convertType(targetType));
targetDesc.setRank(rewriter, loc, resultRank);		targetDesc.setRank(rewriter, loc, resultRank);
SmallVector<Value, 4> sizes;		Value allocationSize =
UnrankedMemRefDescriptor::computeSizes(rewriter, loc, *getTypeConverter(),		targetDesc.computeSize(rewriter, loc, *getTypeConverter());
targetDesc, sizes);
Value underlyingDescPtr = rewriter.create<LLVM::AllocaOp>(		Value underlyingDescPtr = rewriter.create<LLVM::AllocaOp>(
loc, getVoidPtrType(), sizes.front(), llvm::None);		loc, getVoidPtrType(), allocationSize, llvm::None);
targetDesc.setMemRefDescPtr(rewriter, loc, underlyingDescPtr);		targetDesc.setMemRefDescPtr(rewriter, loc, underlyingDescPtr);

// Extract pointers and offset from the source memref.		// Extract pointers and offset from the source memref.
Value allocatedPtr, alignedPtr, offset;		Value allocatedPtr, alignedPtr, offset;
extractPointersAndOffset(loc, rewriter, *getTypeConverter(),		extractPointersAndOffset(loc, rewriter, *getTypeConverter(),
reshapeOp.source(), adaptor.source(),		reshapeOp.source(), adaptor.source(),
&allocatedPtr, &alignedPtr, &offset);		&allocatedPtr, &alignedPtr, &offset);


mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp

#include "mlir/Transforms/Passes.h"		#include "mlir/Transforms/Passes.h"
#include "mlir/Transforms/Utils.h"		#include "mlir/Transforms/Utils.h"
#include "llvm/ADT/TypeSwitch.h"		#include "llvm/ADT/TypeSwitch.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/FormatVariadic.h"		#include "llvm/Support/FormatVariadic.h"
#include <functional>

using namespace mlir;		using namespace mlir;

#define PASS_NAME "convert-std-to-llvm"		#define PASS_NAME "convert-std-to-llvm"

/// Only retain those attributes that are not constructed by		/// Only retain those attributes that are not constructed by
/// `LLVMFuncOp::build`. If `filterArgAttrs` is set, also filter out argument		/// `LLVMFuncOp::build`. If `filterArgAttrs` is set, also filter out argument
/// attributes.		/// attributes.
std::tie(wrapperFuncType, resultIsNowArg) =
typeConverter.convertFunctionTypeCWrapper(type);		typeConverter.convertFunctionTypeCWrapper(type);
auto wrapperFuncOp = rewriter.create<LLVM::LLVMFuncOp>(		auto wrapperFuncOp = rewriter.create<LLVM::LLVMFuncOp>(
loc, llvm::formatv("_mlir_ciface_{0}", funcOp.getName()).str(),		loc, llvm::formatv("_mlir_ciface_{0}", funcOp.getName()).str(),
wrapperFuncType, LLVM::Linkage::External, /*dsoLocal*/ false, attributes);		wrapperFuncType, LLVM::Linkage::External, /*dsoLocal*/ false, attributes);

OpBuilder::InsertionGuard guard(rewriter);		OpBuilder::InsertionGuard guard(rewriter);
rewriter.setInsertionPointToStart(wrapperFuncOp.addEntryBlock());		rewriter.setInsertionPointToStart(wrapperFuncOp.addEntryBlock());

		// If any of the results is an unranked descriptor, extract the pre-allocated
		// buffers from the result ptrs and pass them on as individual preceding
		// arguments.
SmallVector<Value, 8> args;		SmallVector<Value, 8> args;
		if (resultIsNowArg) {
		Value resultPtr = wrapperFuncOp.getArgument(0);
		if (type.getNumResults() == 1 &&
		type.getResults().front().isa<UnrankedMemRefType>()) {
		Value loaded = rewriter.create<LLVM::LoadOp>(loc, resultPtr);
		UnrankedMemRefDescriptor unrankedDescr(loaded);
		Value innerDescrPtr = unrankedDescr.memRefDescPtr(rewriter, loc);
		args.push_back(innerDescrPtr);
		} else if (type.getNumResults() > 1 &&
		llvm::any_of(type.getResults(), [](Type ty) {
		return ty.isa<UnrankedMemRefType>();
		})) {
		Value loaded = rewriter.create<LLVM::LoadOp>(loc, resultPtr);
		for (auto it : llvm::enumerate(type.getResults())) {
		if (it.value().isa<UnrankedMemRefType>()) {
		Type resultTy = loaded.getType()
		.cast<LLVM::LLVMStructType>()
		.getBody()[it.index()];
		Value loadedResult = rewriter.create<LLVM::ExtractValueOp>(
		loc, resultTy, loaded, rewriter.getI64ArrayAttr(it.index()));
		UnrankedMemRefDescriptor unrankedDescr(loadedResult);
		Value innerDescrPtr = unrankedDescr.memRefDescPtr(rewriter, loc);
		args.push_back(innerDescrPtr);
		}
		}
		}
		}

size_t argOffset = resultIsNowArg ? 1 : 0;		size_t argOffset = resultIsNowArg ? 1 : 0;
for (auto &en : llvm::enumerate(type.getInputs())) {		for (auto &en : llvm::enumerate(type.getInputs())) {
Value arg = wrapperFuncOp.getArgument(en.index() + argOffset);		Value arg = wrapperFuncOp.getArgument(en.index() + argOffset);
if (auto memrefType = en.value().dyn_cast<MemRefType>()) {		if (auto memrefType = en.value().dyn_cast<MemRefType>()) {
Value loaded = rewriter.create<LLVM::LoadOp>(loc, arg);		Value loaded = rewriter.create<LLVM::LoadOp>(loc, arg);
MemRefDescriptor::unpack(rewriter, loc, loaded, memrefType, args);		MemRefDescriptor::unpack(rewriter, loc, loaded, memrefType, args);
continue;		continue;
}		}
if (resultIsNowArg) {
rewriter.create<LLVM::StoreOp>(loc, call.getResult(0),		rewriter.create<LLVM::StoreOp>(loc, call.getResult(0),
wrapperFuncOp.getArgument(0));		wrapperFuncOp.getArgument(0));
rewriter.create<LLVM::ReturnOp>(loc, ValueRange{});		rewriter.create<LLVM::ReturnOp>(loc, ValueRange{});
} else {		} else {
rewriter.create<LLVM::ReturnOp>(loc, call.getResults());		rewriter.create<LLVM::ReturnOp>(loc, call.getResults());
}		}
}		}

/// Creates an auxiliary function with pointer-to-memref-descriptor-struct		/// Creates an auxiliary function declaration with
/// arguments instead of unpacked arguments. Creates a body for the (external)		/// pointer-to-memref-descriptor-struct arguments instead of unpacked arguments.
/// `newFuncOp` that allocates a memref descriptor on stack, packs the		/// Creates a body for the (external) `newFuncOp` that allocates a memref
/// individual arguments into this descriptor and passes a pointer to it into		/// descriptor on stack, packs the individual arguments into this descriptor and
/// the auxiliary function. If the result of the function cannot be directly		/// passes a pointer to it into the auxiliary function. If the result of the
/// returned, we write it to a special first argument that provides a pointer		/// function cannot be directly returned, we write it to a special first
/// to a corresponding struct. This auxiliary external function is now		/// argument that provides a pointer to a corresponding struct. This auxiliary
/// compatible with functions defined in C using pointers to C structs		/// external function is now compatible with functions defined in C using
/// corresponding to a memref descriptor.		/// pointers to C structs corresponding to a memref descriptor.
static void wrapExternalFunction(OpBuilder &builder, Location loc,		static void wrapExternalFunction(OpBuilder &builder, Location loc,
LLVMTypeConverter &typeConverter,		LLVMTypeConverter &typeConverter,
FuncOp funcOp, LLVM::LLVMFuncOp newFuncOp) {		FuncOp funcOp, LLVM::LLVMFuncOp newFuncOp) {
OpBuilder::InsertionGuard guard(builder);		OpBuilder::InsertionGuard guard(builder);

Type wrapperType;		Type wrapperType;
bool resultIsNowArg;		bool resultIsNowArg;
std::tie(wrapperType, resultIsNowArg) =		std::tie(wrapperType, resultIsNowArg) =
auto wrapperFunc = builder.create<LLVM::LLVMFuncOp>(
wrapperType, LLVM::Linkage::External, /*dsoLocal*/ false, attributes);		wrapperType, LLVM::Linkage::External, /*dsoLocal*/ false, attributes);

builder.setInsertionPointToStart(newFuncOp.addEntryBlock());		builder.setInsertionPointToStart(newFuncOp.addEntryBlock());

// Get a ValueRange containing arguments.		// Get a ValueRange containing arguments.
FunctionType type = funcOp.getType();		FunctionType type = funcOp.getType();
SmallVector<Value, 8> args;		SmallVector<Value, 8> args;
args.reserve(type.getNumInputs());		args.reserve(type.getNumInputs());
ValueRange wrapperArgsRange(newFuncOp.getArguments());
		// Count the number of unranked results, which require special treatment.
		int numUnrankedResults = llvm::count_if(
		type.getResults(), [](Type ty) { return ty.isa<UnrankedMemRefType>(); });

if (resultIsNowArg) {		if (resultIsNowArg) {

// Allocate the struct on the stack and pass the pointer.		// Allocate the struct on the stack and pass the pointer.
Type resultType =		auto resultPtrTy =
wrapperType.cast<LLVM::LLVMFunctionType>().getParamType(0);		wrapperType.cast<LLVM::LLVMFunctionType>().getParamType(0);
Value one = builder.create<LLVM::ConstantOp>(		Value one = builder.create<LLVM::ConstantOp>(
loc, typeConverter.convertType(builder.getIndexType()),		loc, typeConverter.convertType(builder.getIndexType()),
builder.getIntegerAttr(builder.getIndexType(), 1));		builder.getIntegerAttr(builder.getIndexType(), 1));
Value result = builder.create<LLVM::AllocaOp>(loc, resultType, one);		Value resultPtr = builder.create<LLVM::AllocaOp>(loc, resultPtrTy, one);
args.push_back(result);		args.push_back(resultPtr);

		// If any of the results is an unranked descriptor, populate the
		// pre-allocated result with the descriptor buffers that were passed as
		// function arguments.
		if (type.getNumResults() == 1 &&
		type.getResults().front().isa<UnrankedMemRefType>()) {
		auto desc = UnrankedMemRefDescriptor::undef(
		builder, loc, newFuncOp.getType().getReturnType());
		Value buffer = newFuncOp.getArgument(0);
		desc.setMemRefDescPtr(builder, loc, buffer);
		builder.create<LLVM::StoreOp>(loc, desc, resultPtr);
		} else if (type.getNumResults() > 1 && numUnrankedResults > 0) {
		int bufferIdx = 0;
		Type resultTy = newFuncOp.getType().getReturnType();
		Value result = builder.create<LLVM::UndefOp>(loc, resultTy);
		for (auto it : llvm::enumerate(type.getResults())) {
		if (auto unrankedMemRefTy = it.value().dyn_cast<UnrankedMemRefType>()) {
		Type descTy = typeConverter.convertType(unrankedMemRefTy);
		auto desc = UnrankedMemRefDescriptor::undef(builder, loc, descTy);
		Value buffer = newFuncOp.getArgument(bufferIdx++);
		desc.setMemRefDescPtr(builder, loc, buffer);
		result = builder.create<LLVM::InsertValueOp>(
		loc, resultTy, result, desc, builder.getI64ArrayAttr(it.index()));
		}
		}
		builder.create<LLVM::StoreOp>(loc, result, resultPtr);
		}
}		}

// Iterate over the inputs of the original function and pack values into		// Iterate over the inputs of the original function and pack values into
// memref descriptors if the original type is a memref.		// memref descriptors if the original type is a memref.
		ValueRange wrapperArgsRange(
		newFuncOp.getArguments().drop_front(numUnrankedResults));
for (auto &en : llvm::enumerate(type.getInputs())) {		for (auto &en : llvm::enumerate(type.getInputs())) {
Value arg;		Value arg;
int numToDrop = 1;		int numToDrop = 1;
auto memRefType = en.value().dyn_cast<MemRefType>();		auto memRefType = en.value().dyn_cast<MemRefType>();
auto unrankedMemRefType = en.value().dyn_cast<UnrankedMemRefType>();		auto unrankedMemRefType = en.value().dyn_cast<UnrankedMemRefType>();
if (memRefType || unrankedMemRefType) {		if (memRefType || unrankedMemRefType) {
numToDrop = memRefType		numToDrop = memRefType
? MemRefDescriptor::getNumUnpackedValues(memRefType)		? MemRefDescriptor::getNumUnpackedValues(memRefType)
if (funcOp->hasAttr("llvm.linkage")) {
linkage = attr.getLinkage();		linkage = attr.getLinkage();
}		}
auto newFuncOp = rewriter.create<LLVM::LLVMFuncOp>(		auto newFuncOp = rewriter.create<LLVM::LLVMFuncOp>(
funcOp.getLoc(), funcOp.getName(), llvmType, linkage,		funcOp.getLoc(), funcOp.getName(), llvmType, linkage,
/*dsoLocal*/ false, attributes);		/*dsoLocal*/ false, attributes);
rewriter.inlineRegionBefore(funcOp.getBody(), newFuncOp.getBody(),		rewriter.inlineRegionBefore(funcOp.getBody(), newFuncOp.getBody(),
newFuncOp.end());		newFuncOp.end());
if (failed(rewriter.convertRegionTypes(&newFuncOp.getBody(), *typeConverter,		if (failed(rewriter.convertRegionTypes(&newFuncOp.getBody(), *typeConverter,
&result)))		&result))) {
return nullptr;		return nullptr;
		}

		// For every unranked result, add a preceding void ptr argument to pass the
		// descriptor buffer.
		if (!newFuncOp.getBody().empty()) {
		auto loc = funcOp.getLoc();
		Block &entryBlock = newFuncOp.getBody().front();
		auto voidPtrTy = getVoidPtrType();
		for (Type ty : funcOp.getType().getResults()) {
		if (ty.isa<UnrankedMemRefType>())
		entryBlock.insertArgument(static_cast<unsigned>(0), voidPtrTy, loc);
		}
		}

return newFuncOp;		return newFuncOp;
}		}
};		};

/// FuncOp legalization pattern that converts MemRef arguments to pointers to		/// FuncOp legalization pattern that converts MemRef arguments to pointers to
/// MemRef descriptors (LLVM struct data types) containing all the MemRef type		/// MemRef descriptors (LLVM struct data types) containing all the MemRef type
/// information.		/// information.
static constexpr StringRef kEmitIfaceAttrName = "llvm.emit_c_interface";		static constexpr StringRef kEmitIfaceAttrName = "llvm.emit_c_interface";
struct FuncOpConversion : public FuncOpConversionBase {		struct FuncOpConversion : public FuncOpConversionBase {
FuncOpConversion(LLVMTypeConverter &converter)		FuncOpConversion(LLVMTypeConverter &converter)
: FuncOpConversionBase(converter) {}		: FuncOpConversionBase(converter) {}

LogicalResult		LogicalResult
matchAndRewrite(FuncOp funcOp, OpAdaptor adaptor,		matchAndRewrite(FuncOp funcOp, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const override {		ConversionPatternRewriter &rewriter) const override {
auto newFuncOp = convertFuncOpToLLVMFuncOp(funcOp, rewriter);		auto newFuncOp = convertFuncOpToLLVMFuncOp(funcOp, rewriter);
if (!newFuncOp)		if (!newFuncOp)
return failure();		return failure();

if (getTypeConverter()->getOptions().emitCWrappers ||		if (getTypeConverter()->getOptions().emitCWrappers ||
funcOp->getAttrOfType<UnitAttr>(kEmitIfaceAttrName)) {		funcOp->getAttrOfType<UnitAttr>(kEmitIfaceAttrName)) {
if (newFuncOp.isExternal())		if (newFuncOp.isExternal()) {
wrapExternalFunction(rewriter, funcOp.getLoc(), *getTypeConverter(),		wrapExternalFunction(rewriter, funcOp.getLoc(), *getTypeConverter(),
funcOp, newFuncOp);		funcOp, newFuncOp);
else		} else {
wrapForExternalCallers(rewriter, funcOp.getLoc(), *getTypeConverter(),		wrapForExternalCallers(rewriter, funcOp.getLoc(), *getTypeConverter(),
funcOp, newFuncOp);		funcOp, newFuncOp);
}		}
		}

rewriter.eraseOp(funcOp);		rewriter.eraseOp(funcOp);
return success();		return success();
}		}
};		};

/// FuncOp legalization pattern that converts MemRef arguments to bare pointers		/// FuncOp legalization pattern that converts MemRef arguments to bare pointers
/// to the MemRef element type. This will impact the calling convention and ABI.		/// to the MemRef element type. This will impact the calling convention and ABI.
[... 198 lines not shown ...]
struct CallOpInterfaceLowering : public ConvertOpToLLVMPattern<CallOpType> {		struct CallOpInterfaceLowering : public ConvertOpToLLVMPattern<CallOpType> {
using ConvertOpToLLVMPattern<CallOpType>::ConvertOpToLLVMPattern;		using ConvertOpToLLVMPattern<CallOpType>::ConvertOpToLLVMPattern;
using Super = CallOpInterfaceLowering<CallOpType>;		using Super = CallOpInterfaceLowering<CallOpType>;
using Base = ConvertOpToLLVMPattern<CallOpType>;		using Base = ConvertOpToLLVMPattern<CallOpType>;

LogicalResult		LogicalResult
matchAndRewrite(CallOpType callOp, typename CallOpType::Adaptor adaptor,		matchAndRewrite(CallOpType callOp, typename CallOpType::Adaptor adaptor,
ConversionPatternRewriter &rewriter) const override {		ConversionPatternRewriter &rewriter) const override {
		auto &typeConverter = *this->getTypeConverter();
		int64_t maxUnrankedDescBufferRank =
		typeConverter.getOptions().maxUnrankedDescBufferRank;

// Pack the result types into a struct.		// Pack the result types into a struct.
Type packedResult = nullptr;		Type packedResult = nullptr;
unsigned numResults = callOp.getNumResults();		unsigned numResults = callOp.getNumResults();
auto resultTypes = llvm::to_vector<4>(callOp.getResultTypes());		auto resultTypes = llvm::to_vector<4>(callOp.getResultTypes());

if (numResults != 0) {		if (numResults != 0) {
if (!(packedResult =		if (!(packedResult = typeConverter.packFunctionResults(resultTypes)))
this->getTypeConverter()->packFunctionResults(resultTypes)))
return failure();		return failure();
}		}

auto promoted = this->getTypeConverter()->promoteOperands(		SmallVector<Value, 18> args;

		// Create and pass a stack-allocated buffer for every unranked result.
		int numUnrankedResults =
		llvm::count_if(callOp.getResultTypes(),
		[](Type ty) { return ty.isa<UnrankedMemRefType>(); });
		if (numUnrankedResults > 0) {
		auto loc = callOp.getLoc();
		Value bufferSize = this->createIndexConstant(
		rewriter, loc,
		UnrankedMemRefDescriptor::getSize(typeConverter,
		maxUnrankedDescBufferRank));
		for (int i = 0; i < numUnrankedResults; i++) {
		args.push_back(rewriter.create<LLVM::AllocaOp>(
		callOp.getLoc(), this->getVoidPtrType(), bufferSize));
		}
		}

		auto promoted = typeConverter.promoteOperands(
callOp.getLoc(), /*opOperands=*/callOp->getOperands(),		callOp.getLoc(), /*opOperands=*/callOp->getOperands(),
adaptor.getOperands(), rewriter);		adaptor.getOperands(), rewriter);
		args.append(promoted.begin(), promoted.end());
auto newOp = rewriter.create<LLVM::CallOp>(		auto newOp = rewriter.create<LLVM::CallOp>(
callOp.getLoc(), packedResult ? TypeRange(packedResult) : TypeRange(),		callOp.getLoc(), packedResult ? TypeRange(packedResult) : TypeRange(),
promoted, callOp->getAttrs());		args, callOp->getAttrs());

SmallVector<Value, 4> results;		SmallVector<Value, 4> results;
if (numResults < 2) {		if (numResults < 2) {
// If < 2 results, packing did not do anything and we can just return.		// If < 2 results, packing did not do anything and we can just return.
results.append(newOp.result_begin(), newOp.result_end());		results.append(newOp.result_begin(), newOp.result_end());
} else {		} else {
// Otherwise, it had been converted to an operation producing a structure.		// Otherwise, it had been converted to an operation producing a structure.
// Extract individual results from the structure and return them as list.		// Extract individual results from the structure and return them as list.
results.reserve(numResults);		results.reserve(numResults);
for (unsigned i = 0; i < numResults; ++i) {		for (unsigned i = 0; i < numResults; ++i) {
auto type =		auto type = typeConverter.convertType(callOp.getResult(i).getType());
this->typeConverter->convertType(callOp.getResult(i).getType());
results.push_back(rewriter.create<LLVM::ExtractValueOp>(		results.push_back(rewriter.create<LLVM::ExtractValueOp>(
callOp.getLoc(), type, newOp->getResult(0),		callOp.getLoc(), type, newOp->getResult(0),
rewriter.getI64ArrayAttr(i)));		rewriter.getI64ArrayAttr(i)));
}		}
}		}

if (this->getTypeConverter()->getOptions().useBarePtrCallConv) {		if (typeConverter.getOptions().useBarePtrCallConv) {
// For the bare-ptr calling convention, promote memref results to		// For the bare-ptr calling convention, promote memref results to
// descriptors.		// descriptors.
assert(results.size() == resultTypes.size() &&		assert(results.size() == resultTypes.size() &&
"The number of arguments and types doesn't match");		"The number of arguments and types doesn't match");
this->getTypeConverter()->promoteBarePtrsToDescriptors(		typeConverter.promoteBarePtrsToDescriptors(rewriter, callOp.getLoc(),
rewriter, callOp.getLoc(), resultTypes, results);		resultTypes, results);
} else if (failed(this->copyUnrankedDescriptors(rewriter, callOp.getLoc(),		} else {
resultTypes, results,		this->copyUnrankedDescriptorsToStack(rewriter, callOp.getLoc(),
/*toDynamic=*/false))) {		maxUnrankedDescBufferRank,
return failure();		resultTypes, results);
}		}

rewriter.replaceOp(callOp, results);		rewriter.replaceOp(callOp, results);
return success();		return success();
}		}
};		};

struct CallOpLowering : public CallOpInterfaceLowering<CallOp> {		struct CallOpLowering : public CallOpInterfaceLowering<CallOp> {
[... 225 lines not shown ...]	if (getTypeConverter()->getOptions().useBarePtrCallConv) {
// Unranked memref is not supported in the bare pointer calling		// Unranked memref is not supported in the bare pointer calling
// convention.		// convention.
return failure();		return failure();
}		}
updatedOperands.push_back(newOperand);		updatedOperands.push_back(newOperand);
}		}
} else {		} else {
updatedOperands = llvm::to_vector<4>(adaptor.getOperands());		updatedOperands = llvm::to_vector<4>(adaptor.getOperands());
(void)copyUnrankedDescriptors(rewriter, loc, op.getOperands().getTypes(),
updatedOperands,		auto funcOp = op->getParentOfType<LLVM::LLVMFuncOp>();
/*toDynamic=*/true);		auto descBuffers = llvm::to_vector<8>(llvm::map_range(
		funcOp.getArguments(), [](BlockArgument a) { return Value(a); }));
		copyUnrankedDescriptorsToBufferOrHeap(
		rewriter, loc,
		getTypeConverter()->getOptions().maxUnrankedDescBufferRank,
		op.getOperands().getTypes(), descBuffers, updatedOperands);
}		}

// If ReturnOp has 0 or 1 operand, create it and return immediately.		// If ReturnOp has 0 or 1 operand, create it and return immediately.
if (numArguments == 0) {		if (numArguments == 0) {
rewriter.replaceOpWithNewOp<LLVM::ReturnOp>(op, TypeRange(), ValueRange(),		rewriter.replaceOpWithNewOp<LLVM::ReturnOp>(op, TypeRange(), ValueRange(),
op->getAttrs());		op->getAttrs());
return success();		return success();
}		}
[... 358 lines not shown ...]	void mlir::populateStdToLLVMConversionPatterns(LLVMTypeConverter &converter,
// clang-format on		// clang-format on
}		}

namespace {		namespace {
/// A pass converting MLIR operations into the LLVM IR dialect.		/// A pass converting MLIR operations into the LLVM IR dialect.
struct LLVMLoweringPass : public ConvertStandardToLLVMBase<LLVMLoweringPass> {		struct LLVMLoweringPass : public ConvertStandardToLLVMBase<LLVMLoweringPass> {
LLVMLoweringPass() = default;		LLVMLoweringPass() = default;
LLVMLoweringPass(bool useBarePtrCallConv, bool emitCWrappers,		LLVMLoweringPass(bool useBarePtrCallConv, bool emitCWrappers,
unsigned indexBitwidth, bool useAlignedAlloc,		int64_t maxUnrankedDescBufferRank, unsigned indexBitwidth,
const llvm::DataLayout &dataLayout) {		bool useAlignedAlloc, const llvm::DataLayout &dataLayout) {
this->useBarePtrCallConv = useBarePtrCallConv;		this->useBarePtrCallConv = useBarePtrCallConv;
this->emitCWrappers = emitCWrappers;		this->emitCWrappers = emitCWrappers;
		this->maxUnrankedDescBufferRank = maxUnrankedDescBufferRank;
this->indexBitwidth = indexBitwidth;		this->indexBitwidth = indexBitwidth;
this->dataLayout = dataLayout.getStringRepresentation();		this->dataLayout = dataLayout.getStringRepresentation();
}		}

/// Run the dialect converter on the module.		/// Run the dialect converter on the module.
void runOnOperation() override {		void runOnOperation() override {
if (useBarePtrCallConv && emitCWrappers) {		if (useBarePtrCallConv && emitCWrappers) {
getOperation().emitError()		getOperation().emitError()
[... 12 lines not shown ...]	void runOnOperation() override {

ModuleOp m = getOperation();		ModuleOp m = getOperation();
const auto &dataLayoutAnalysis = getAnalysis<DataLayoutAnalysis>();		const auto &dataLayoutAnalysis = getAnalysis<DataLayoutAnalysis>();

LowerToLLVMOptions options(&getContext(),		LowerToLLVMOptions options(&getContext(),
dataLayoutAnalysis.getAtOrAbove(m));		dataLayoutAnalysis.getAtOrAbove(m));
options.useBarePtrCallConv = useBarePtrCallConv;		options.useBarePtrCallConv = useBarePtrCallConv;
options.emitCWrappers = emitCWrappers;		options.emitCWrappers = emitCWrappers;
		options.maxUnrankedDescBufferRank = maxUnrankedDescBufferRank;
if (indexBitwidth != kDeriveIndexBitwidthFromDataLayout)		if (indexBitwidth != kDeriveIndexBitwidthFromDataLayout)
options.overrideIndexBitwidth(indexBitwidth);		options.overrideIndexBitwidth(indexBitwidth);
options.dataLayout = llvm::DataLayout(this->dataLayout);		options.dataLayout = llvm::DataLayout(this->dataLayout);

LLVMTypeConverter typeConverter(&getContext(), options,		LLVMTypeConverter typeConverter(&getContext(), options,
&dataLayoutAnalysis);		&dataLayoutAnalysis);

RewritePatternSet patterns(&getContext());		RewritePatternSet patterns(&getContext());
[... 19 lines not shown ...]	mlir::createLowerToLLVMPass(const LowerToLLVMOptions &options) {
// There is no way to provide additional patterns for pass, so		// There is no way to provide additional patterns for pass, so
// AllocLowering::None will always fail.		// AllocLowering::None will always fail.
assert(allocLowering != LowerToLLVMOptions::AllocLowering::None &&		assert(allocLowering != LowerToLLVMOptions::AllocLowering::None &&
"LLVMLoweringPass doesn't support AllocLowering::None");		"LLVMLoweringPass doesn't support AllocLowering::None");
bool useAlignedAlloc =		bool useAlignedAlloc =
(allocLowering == LowerToLLVMOptions::AllocLowering::AlignedAlloc);		(allocLowering == LowerToLLVMOptions::AllocLowering::AlignedAlloc);
return std::make_unique<LLVMLoweringPass>(		return std::make_unique<LLVMLoweringPass>(
options.useBarePtrCallConv, options.emitCWrappers,		options.useBarePtrCallConv, options.emitCWrappers,
options.getIndexBitwidth(), useAlignedAlloc, options.dataLayout);		options.maxUnrankedDescBufferRank, options.getIndexBitwidth(),
		useAlignedAlloc, options.dataLayout);
}		}
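
As a reading aid for the ReturnOp hunk above, the callee-side choice between the caller-provided buffer and the heap can be sketched in plain C++. This is only an illustration of the idea stated in the summary, not the code the lowering emits: `UnrankedDesc`, `kMaxBufferRank` and `returnUnranked` are hypothetical names, and the inner descriptor is treated as an opaque run of bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the lowered unranked descriptor,
// i.e. struct<(i64, ptr<i8>)>: a rank plus a pointer to the inner descriptor.
struct UnrankedDesc {
  int64_t rank;
  void *inner;
};

// Assumed value of max-unranked-desc-buffer-rank for this sketch.
constexpr int64_t kMaxBufferRank = 5;

// Callee side: before returning, move the inner descriptor out of the
// callee's frame. If the rank fits, reuse the buffer the caller passed in;
// otherwise fall back to a heap allocation.
inline UnrankedDesc returnUnranked(void *callerBuffer, const void *inner,
                                   std::size_t innerBytes, int64_t rank) {
  void *dst = callerBuffer;
  if (rank > kMaxBufferRank) {
    dst = std::malloc(innerBytes);
    assert(dst && "heap fallback failed");
  }
  std::memcpy(dst, inner, innerBytes);
  return UnrankedDesc{rank, dst};
}
```

In this sketch the caller can tell which path was taken by comparing `inner` against the buffer it passed, and is assumed to release the heap copy itself in the fallback case; the diff does not show who owns that allocation.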

mlir/test/Conversion/StandardToLLVM/calling-convention-dbg.mlir

This file was added.

				// RUN: mlir-opt %s \
				// RUN: --convert-memref-to-llvm \
				// RUN: --convert-std-to-llvm --allow-unregistered-dialect
				mehdi_amini: (drive by comment: please use the test dialect and remove the allow-unregistered-dialect option) But actually I think you didn't even intend to have this file here?

				func @bar() -> memref<*xf32> attributes { llvm.emit_c_interface } {
				%0 = "get"() : () -> (memref<*xf32>)
				return %0 : memref<*xf32>
				}

mlir/test/Conversion/StandardToLLVM/calling-convention-external-c-function-callee.mlir

This file was added.

				// RUN: mlir-opt %s \
				// RUN: --convert-memref-to-llvm \
				// RUN: --convert-std-to-llvm='max-unranked-desc-buffer-rank=5' | FileCheck %s

				func private @external_no_result(%arg0 : memref<?x?xf32>)
				attributes { llvm.emit_c_interface }

				// CHECK-LABEL: llvm.func @external_no_result
				// CHECK-SAME: %[[ALLOC:.*]]: !llvm.ptr<f32>, %[[ALIGN:.*]]: !llvm.ptr<f32>, %[[OFFSET:.*]]: i64, %[[SIZE0:.*]]: i64, %[[SIZE1:.*]]: i64, %[[STRIDE0:.*]]: i64, %[[STRIDE1:.*]]: i64

				// Populate the descriptor for arg0.
				// CHECK: %[[DESC0:.*]] = llvm.mlir.undef : [[DESC_TY:!llvm.struct<\(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>\)>]]
				// CHECK: %[[DESC1:.*]] = llvm.insertvalue %[[ALLOC]], %[[DESC0]][0]
				// CHECK: %[[DESC2:.*]] = llvm.insertvalue %[[ALIGN]], %[[DESC1]][1]
				// CHECK: %[[DESC3:.*]] = llvm.insertvalue %[[OFFSET]], %[[DESC2]][2]
				// CHECK: %[[DESC4:.*]] = llvm.insertvalue %[[SIZE0]], %[[DESC3]][3, 0]
				// CHECK: %[[DESC5:.*]] = llvm.insertvalue %[[STRIDE0]], %[[DESC4]][4, 0]
				// CHECK: %[[DESC6:.*]] = llvm.insertvalue %[[SIZE1]], %[[DESC5]][3, 1]
				// CHECK: %[[DESC7:.*]] = llvm.insertvalue %[[STRIDE1]], %[[DESC6]][4, 1]

				// Allocate on stack and store to comply with C calling convention.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[ARG_PTR:.*]] = llvm.alloca %[[C1]] x [[DESC_TY]]
				// CHECK: llvm.store %[[DESC7]], %[[ARG_PTR]]

				// Call the interface function.
				// CHECK: llvm.call @_mlir_ciface_external_no_result(%[[ARG_PTR]])
				// CHECK: llvm.return

				// Verify that an interface function is emitted.
				// CHECK-LABEL: llvm.func @_mlir_ciface_external_no_result
				// CHECK-SAME: (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>>)


				func private @external_single_result(%arg0 : memref<?xf32>) -> memref<?xf32>
				attributes { llvm.emit_c_interface }

				// CHECK-LABEL: llvm.func @external_single_result
				// CHECK-SAME: %[[ALLOC:.*]]: !llvm.ptr<f32>, %[[ALIGN:.*]]: !llvm.ptr<f32>, %[[OFFSET:.*]]: i64, %[[SIZE0:.*]]: i64, %[[STRIDE0:.*]]: i64

				// Allocate result on stack.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[RESUT_PTR:.*]] = llvm.alloca %[[C1]] x [[RESULT_DESC_TY:!llvm.struct<\(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>\)>]]

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef : [[ARG_DESC_TY:!llvm.struct<\(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>\)>]]
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ALLOC]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ALIGN]], %[[ARG_DESC1]][1]
				// CHECK: %[[ARG_DESC3:.*]] = llvm.insertvalue %[[OFFSET]], %[[ARG_DESC2]][2]
				// CHECK: %[[ARG_DESC4:.*]] = llvm.insertvalue %[[SIZE0]], %[[ARG_DESC3]][3, 0]
				// CHECK: %[[ARG_DESC5:.*]] = llvm.insertvalue %[[STRIDE0]], %[[ARG_DESC4]][4, 0]

				// Allocate on stack and store to comply with C calling convention.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[ARG_PTR:.*]] = llvm.alloca %[[C1]] x [[ARG_DESC_TY]]
				// CHECK: llvm.store %[[ARG_DESC5]], %[[ARG_PTR]]

				// Call the interface function.
				// CHECK: llvm.call @_mlir_ciface_external_single_result(%[[RESUT_PTR]], %[[ARG_PTR]])

				// Load and return the result.
				// CHECK: %[[RESULT:.*]] = llvm.load %[[RESUT_PTR]]
				// CHECK: llvm.return %[[RESULT]]

				// Verify that an interface function is emitted.
				// CHECK-LABEL: llvm.func @_mlir_ciface_external_single_result
				// CHECK: (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>)


				func private @external_multiple_result(%arg0 : memref<?x?xf32>)
				-> (memref<?x?xf32>, memref<?xf32>, i64, f32)
				attributes { llvm.emit_c_interface }

				// CHECK-LABEL: llvm.func @external_multiple_result
				// CHECK-SAME: %[[ALLOC:.*]]: !llvm.ptr<f32>, %[[ALIGN:.*]]: !llvm.ptr<f32>, %[[OFFSET:.*]]: i64, %[[SIZE0:.*]]: i64, %[[SIZE1:.*]]: i64, %[[STRIDE0:.*]]: i64, %[[STRIDE1:.*]]: i64

				// Allocate result on stack.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[RESULT_PTR:.*]] = llvm.alloca %[[C1]] x [[RESULT_DESC_TY:!llvm.struct<\(struct<\(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>\)>, struct<\(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>\)>, i64, f32\)>]]

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef : [[ARG_DESC_TY:!llvm.struct<\(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>\)>]]
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ALLOC]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ALIGN]], %[[ARG_DESC1]][1]
				// CHECK: %[[ARG_DESC3:.*]] = llvm.insertvalue %[[OFFSET]], %[[ARG_DESC2]][2]
				// CHECK: %[[ARG_DESC4:.*]] = llvm.insertvalue %[[SIZE0]], %[[ARG_DESC3]][3, 0]
				// CHECK: %[[ARG_DESC5:.*]] = llvm.insertvalue %[[STRIDE0]], %[[ARG_DESC4]][4, 0]
				// CHECK: %[[ARG_DESC6:.*]] = llvm.insertvalue %[[SIZE1]], %[[ARG_DESC5]][3, 1]
				// CHECK: %[[ARG_DESC7:.*]] = llvm.insertvalue %[[STRIDE1]], %[[ARG_DESC6]][4, 1]

				// Allocate on stack and store to comply with C calling convention.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[ARG_PTR:.*]] = llvm.alloca %[[C1]] x [[ARG_DESC_TY]]
				// CHECK: llvm.store %[[ARG_DESC7]], %[[ARG_PTR]]

				// Call the interface function.
				// CHECK: llvm.call @_mlir_ciface_external_multiple_result(%[[RESULT_PTR]], %[[ARG_PTR]])

				// Load and return the result.
				// CHECK: %[[RESULT:.*]] = llvm.load %[[RESULT_PTR]]
				// CHECK: llvm.return %[[RESULT]]

				// Verify that an interface function is emitted.
				// CHECK-LABEL: llvm.func @_mlir_ciface_external_multiple_result
				// CHECK-SAME: (!llvm.ptr<struct<(struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>, struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>, i64, f32)>>, !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>>)

				func private @external_multiple_args(%arg0 : i64, %arg1 : memref<?x?xf32>,
				%arg2 : memref<?xf32>, %arg3 : f32) attributes { llvm.emit_c_interface }

				// CHECK-LABEL: llvm.func @external_multiple_args
				// CHECK-SAME: %[[IARG:arg0]]: i64,
				// CHECK-SAME: %[[ALLOC0:arg1]]: !llvm.ptr<f32>, %[[ALIGN0:arg2]]: !llvm.ptr<f32>, %[[OFFSET0:arg3]]: i64, %[[SIZE00:arg4]]: i64, %[[SIZE01:arg5]]: i64, %[[STRIDE00:arg6]]: i64, %[[STRIDE01:arg7]]: i64,
				// CHECK-SAME: %[[ALLOC1:arg8]]: !llvm.ptr<f32>, %[[ALIGN1:arg9]]: !llvm.ptr<f32>, %[[OFFSET1:arg10]]: i64, %[[SIZE10:arg11]]: i64, %[[STRIDE10:arg12]]: i64,
				// CHECK-SAME: %[[FARG:arg13]]: f32

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG0_DESC0:.*]] = llvm.mlir.undef : [[ARG0_DESC_TY:!llvm.struct<\(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>\)>]]
				// CHECK: %[[ARG0_DESC1:.*]] = llvm.insertvalue %[[ALLOC0]], %[[ARG0_DESC0]][0]
				// CHECK: %[[ARG0_DESC2:.*]] = llvm.insertvalue %[[ALIGN0]], %[[ARG0_DESC1]][1]
				// CHECK: %[[ARG0_DESC3:.*]] = llvm.insertvalue %[[OFFSET0]], %[[ARG0_DESC2]][2]
				// CHECK: %[[ARG0_DESC4:.*]] = llvm.insertvalue %[[SIZE00]], %[[ARG0_DESC3]][3, 0]
				// CHECK: %[[ARG0_DESC5:.*]] = llvm.insertvalue %[[STRIDE00]], %[[ARG0_DESC4]][4, 0]
				// CHECK: %[[ARG0_DESC6:.*]] = llvm.insertvalue %[[SIZE01]], %[[ARG0_DESC5]][3, 1]
				// CHECK: %[[ARG0_DESC7:.*]] = llvm.insertvalue %[[STRIDE01]], %[[ARG0_DESC6]][4, 1]

				// Allocate on stack and store to comply with C calling convention.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[ARG0_PTR:.*]] = llvm.alloca %[[C1]] x [[ARG0_DESC_TY]]
				// CHECK: llvm.store %[[ARG0_DESC7]], %[[ARG0_PTR]]

				// Populate the descriptor for arg1.
				// CHECK: %[[ARG1_DESC0:.*]] = llvm.mlir.undef : [[ARG1_DESC_TY:!llvm.struct<\(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>\)>]]
				// CHECK: %[[ARG1_DESC1:.*]] = llvm.insertvalue %[[ALLOC1]], %[[ARG1_DESC0]][0]
				// CHECK: %[[ARG1_DESC2:.*]] = llvm.insertvalue %[[ALIGN1]], %[[ARG1_DESC1]][1]
				// CHECK: %[[ARG1_DESC3:.*]] = llvm.insertvalue %[[OFFSET1]], %[[ARG1_DESC2]][2]
				// CHECK: %[[ARG1_DESC4:.*]] = llvm.insertvalue %[[SIZE10]], %[[ARG1_DESC3]][3, 0]
				// CHECK: %[[ARG1_DESC5:.*]] = llvm.insertvalue %[[STRIDE10]], %[[ARG1_DESC4]][4, 0]

				// Allocate on stack and store to comply with C calling convention.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[ARG1_PTR:.*]] = llvm.alloca %[[C1]] x [[ARG1_DESC_TY]]
				// CHECK: llvm.store %[[ARG1_DESC5]], %[[ARG1_PTR]]

				// Call the interface function.
				// CHECK: llvm.call @_mlir_ciface_external_multiple_args(%[[IARG]], %[[ARG0_PTR]], %[[ARG1_PTR]], %[[FARG]])
				// CHECK: llvm.return

				// Verify that an interface function is emitted.
				// CHECK-LABEL: llvm.func @_mlir_ciface_external_multiple_args
				// CHECK-SAME: (i64, !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>>, !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, f32)


				func private @external_no_result_unranked(%arg0 : memref<*xf32>)
				attributes { llvm.emit_c_interface }

				// CHECK-LABEL: llvm.func @external_no_result_unranked
				// CHECK-SAME: %[[RANK:.*]]: i64, %[[INNER_DESC:.*]]: !llvm.ptr<i8>

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef : [[ARG_DESC_TY:!llvm.struct<\(i64, ptr<i8>\)>]]
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[RANK]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[INNER_DESC]], %[[ARG_DESC1]][1]

				// Allocate on stack and store to comply with C calling convention.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[ARG_PTR:.*]] = llvm.alloca %[[C1]] x [[ARG_DESC_TY]]
				// CHECK: llvm.store %[[ARG_DESC2]], %[[ARG_PTR]]

				// Call the interface function.
				// CHECK: llvm.call @_mlir_ciface_external_no_result_unranked(%[[ARG_PTR]])
				// CHECK: llvm.return

				// Verify that an interface function is emitted.
				// CHECK-LABEL: llvm.func @_mlir_ciface_external_no_result_unranked
				// CHECK-SAME: (!llvm.ptr<struct<(i64, ptr<i8>)>>)


				func private @external_single_result_unranked(%arg0 : memref<*xf32>)
				-> memref<*xf32> attributes { llvm.emit_c_interface }

				// CHECK-LABEL: llvm.func @external_single_result_unranked
				// CHECK-SAME: %[[RESULT_INNER_DESC_BUFFER:.*]]: !llvm.ptr<i8>, %[[ARG_RANK:.*]]: i64, %[[ARG_INNER_DESC:.*]]: !llvm.ptr<i8>

				// Allocate result on stack and populate buffer for inner descriptor.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[RESULT_PTR:.*]] = llvm.alloca %[[C1]] x [[RESULT_DESC_TY:!llvm.struct<\(i64, ptr<i8>\)>]]
				// CHECK: %[[RESULT0:.*]] = llvm.mlir.undef : [[RESULT_DESC_TY]]
				// CHECK: %[[RESULT1:.*]] = llvm.insertvalue %[[RESULT_INNER_DESC_BUFFER]], %[[RESULT0]][1]
				// CHECK: llvm.store %[[RESULT1]], %[[RESULT_PTR]]

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef : [[ARG_DESC_TY:!llvm.struct<\(i64, ptr<i8>\)>]]
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ARG_RANK]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ARG_INNER_DESC]], %[[ARG_DESC1]][1]

				// Allocate on stack and store to comply with C calling convention.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[ARG_PTR:.*]] = llvm.alloca %[[C1]] x [[ARG_DESC_TY]]
				// CHECK: llvm.store %[[ARG_DESC2]], %[[ARG_PTR]]

				// Call the interface function.
				// CHECK: llvm.call @_mlir_ciface_external_single_result_unranked(%[[RESULT_PTR]], %[[ARG_PTR]])

				// Load and return the result.
				// CHECK: %[[RESULT:.*]] = llvm.load %[[RESULT_PTR]]
				// CHECK: llvm.return %[[RESULT]]

				// Verify that an interface function is emitted.
				// CHECK-LABEL: llvm.func @_mlir_ciface_external_single_result_unranked
				// CHECK-SAME: (!llvm.ptr<struct<(i64, ptr<i8>)>>, !llvm.ptr<struct<(i64, ptr<i8>)>>)


				func private @external_multiple_result_unranked(%arg0 : memref<*xf32>)
				-> (f32, i64, memref<*xf32>, memref<*xf32>)
				attributes { llvm.emit_c_interface }

				// CHECK-LABEL: llvm.func @external_multiple_result_unranked
				// CHECK-SAME: %[[RESULT_INNER_DESC_BUFFER0:.*]]: !llvm.ptr<i8>, %[[RESULT_INNER_DESC_BUFFER1:.*]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[ARG_RANK:.*]]: i64, %[[ARG_INNER_DESC:.*]]: !llvm.ptr<i8>

				// Allocate result on stack and populate buffers for inner descriptors.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[RESULT_PTR:.*]] = llvm.alloca %[[C1]] x [[RESULT_TY:!llvm.struct<\(f32, i64, struct<\(i64, ptr<i8>\)>, struct<\(i64, ptr<i8>\)>\)>]]
				// CHECK: %[[RESULT0:.*]] = llvm.mlir.undef : [[RESULT_TY]]
				// CHECK: %[[RESULT_DESC00:.*]] = llvm.mlir.undef : [[RESULT_DESC0_TY:!llvm.struct<\(i64, ptr<i8>\)>]]
				// CHECK: %[[RESULT_DESC01:.*]] = llvm.insertvalue %[[RESULT_INNER_DESC_BUFFER0]], %[[RESULT_DESC00]][1]
				// CHECK: %[[RESULT1:.*]] = llvm.insertvalue %[[RESULT_DESC01]], %[[RESULT0]][2]
				// CHECK: %[[RESULT_DESC10:.*]] = llvm.mlir.undef : [[RESULT_DESC1_TY:!llvm.struct<\(i64, ptr<i8>\)>]]
				// CHECK: %[[RESULT_DESC11:.*]] = llvm.insertvalue %[[RESULT_INNER_DESC_BUFFER1]], %[[RESULT_DESC10]][1]
				// CHECK: %[[RESULT2:.*]] = llvm.insertvalue %[[RESULT_DESC11]], %[[RESULT1]][3]
				// CHECK: llvm.store %[[RESULT2]], %[[RESULT_PTR]]

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef : [[ARG_DESC_TY:!llvm.struct<\(i64, ptr<i8>\)>]]
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ARG_RANK]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ARG_INNER_DESC]], %[[ARG_DESC1]][1]

				// Allocate on stack and store to comply with C calling convention.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[ARG_PTR:.*]] = llvm.alloca %[[C1]] x [[ARG_DESC_TY]]
				// CHECK: llvm.store %[[ARG_DESC2]], %[[ARG_PTR]]

				// Call the interface function.
				// CHECK: llvm.call @_mlir_ciface_external_multiple_result_unranked(%[[RESULT_PTR]], %[[ARG_PTR]])

				// Load and return the result.
				// CHECK: %[[RESULT:.*]] = llvm.load %[[RESULT_PTR]]
				// CHECK: llvm.return %[[RESULT]]

				// Verify that an interface function is emitted.
				// CHECK-LABEL: llvm.func @_mlir_ciface_external_multiple_result_unranked
				// CHECK-SAME: (!llvm.ptr<struct<(f32, i64, struct<(i64, ptr<i8>)>, struct<(i64, ptr<i8>)>)>>, !llvm.ptr<struct<(i64, ptr<i8>)>>)


				func private @external_multiple_args_unranked(%arg0 : memref<*xf32>,
				%arg1 : f32, %arg2 : memref<*xf32>, %arg3 : i64)
				attributes { llvm.emit_c_interface }

				// CHECK-LABEL: llvm.func @external_multiple_args_unranked
				// CHECK-SAME: %[[ARG0_RANK:.*]]: i64, %[[ARG0_INNER_DESC:arg1]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[FARG:arg2]]: f32,
				// CHECK-SAME: %[[ARG2_RANK:.*]]: i64, %[[ARG2_INNER_DESC:arg4]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[IARG:.*]]: i64

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG0_DESC0:.*]] = llvm.mlir.undef : [[ARG0_DESC_TY:!llvm.struct<\(i64, ptr<i8>\)>]]
				// CHECK: %[[ARG0_DESC1:.*]] = llvm.insertvalue %[[ARG0_RANK]], %[[ARG0_DESC0]][0]
				// CHECK: %[[ARG0_DESC2:.*]] = llvm.insertvalue %[[ARG0_INNER_DESC]], %[[ARG0_DESC1]][1]

				// Allocate on stack and store to comply with C calling convention.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[ARG0_PTR:.*]] = llvm.alloca %[[C1]] x [[ARG0_DESC_TY]]
				// CHECK: llvm.store %[[ARG0_DESC2]], %[[ARG0_PTR]]

				// Populate the descriptor for arg2.
				// CHECK: %[[ARG2_DESC0:.*]] = llvm.mlir.undef : [[ARG2_DESC_TY:!llvm.struct<\(i64, ptr<i8>\)>]]
				// CHECK: %[[ARG2_DESC1:.*]] = llvm.insertvalue %[[ARG2_RANK]], %[[ARG2_DESC0]][0]
				// CHECK: %[[ARG2_DESC2:.*]] = llvm.insertvalue %[[ARG2_INNER_DESC]], %[[ARG2_DESC1]][1]

				// Allocate on stack and store to comply with C calling convention.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[ARG2_PTR:.*]] = llvm.alloca %[[C1]] x [[ARG2_DESC_TY]]
				// CHECK: llvm.store %[[ARG2_DESC2]], %[[ARG2_PTR]]

				// Call the interface function.
				// CHECK: llvm.call @_mlir_ciface_external_multiple_args_unranked(%[[ARG0_PTR]], %[[FARG]], %[[ARG2_PTR]], %[[IARG]])
				// CHECK: llvm.return

				// Verify that an interface function is emitted.
				// CHECK-LABEL: llvm.func @_mlir_ciface_external_multiple_args_unranked
				// CHECK-SAME: (!llvm.ptr<struct<(i64, ptr<i8>)>>, f32, !llvm.ptr<struct<(i64, ptr<i8>)>>, i64)
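
For a sense of how large the per-result buffer is with `max-unranked-desc-buffer-rank=5`, the arithmetic can be sketched from the ranked descriptor type that appears in the CHECK lines above, `struct<(ptr<f32>, ptr<f32>, i64, array<N x i64>, array<N x i64>)>`. This assumes 8-byte pointers, an 8-byte index type and no padding; `rankedDescriptorBytes` is a hypothetical helper, and whether `UnrankedMemRefDescriptor::getSize` computes exactly this is an assumption, since its body is not part of the hunks shown.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a rank-N inner descriptor laid out as
//   { allocated ptr, aligned ptr, offset, sizes[N], strides[N] }
// under the assumptions stated above (no padding between fields).
constexpr int64_t rankedDescriptorBytes(int64_t rank,
                                        int64_t pointerBytes = 8,
                                        int64_t indexBytes = 8) {
  return 2 * pointerBytes + (1 + 2 * rank) * indexBytes;
}

// Rank 1 and rank 2 correspond to the descriptor types in the tests above;
// rank 5 is the buffer rank requested on the RUN line.
static_assert(rankedDescriptorBytes(1) == 40, "2 pointers + 3 index values");
static_assert(rankedDescriptorBytes(2) == 56, "2 pointers + 5 index values");
static_assert(rankedDescriptorBytes(5) == 104, "2 pointers + 11 index values");
```

Under these assumptions each unranked result costs the caller one stack buffer of about 104 bytes at rank 5, which is the trade-off the option exposes: a larger rank avoids more heap fallbacks at the price of more stack per call.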

mlir/test/Conversion/StandardToLLVM/calling-convention-external-c-function-caller.mlir

This file was added.

				// RUN: mlir-opt %s \
				// RUN: --convert-memref-to-llvm \
				// RUN: --convert-std-to-llvm='max-unranked-desc-buffer-rank=5' | FileCheck %s

				func @callee_no_result(%arg0 : memref<?x?xf32>)
				attributes { llvm.emit_c_interface } {
				%c0 = constant 0 : index
				%c1 = constant 1 : index
				%0 = memref.load %arg0[%c0, %c1] : memref<?x?xf32>
				return
				}

				// CHECK-LABEL: llvm.func @callee_no_result
				// CHECK-SAME: %[[ALLOC:.*]]: !llvm.ptr<f32>, %[[ALIGN:.*]]: !llvm.ptr<f32>, %[[OFFSET:.*]]: i64, %[[SIZE0:.*]]: i64, %[[SIZE1:.*]]: i64, %[[STRIDE0:.*]]: i64, %[[STRIDE1:.*]]: i64

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ALLOC]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ALIGN]], %[[ARG_DESC1]][1]
				// CHECK: %[[ARG_DESC3:.*]] = llvm.insertvalue %[[OFFSET]], %[[ARG_DESC2]][2]
				// CHECK: %[[ARG_DESC4:.*]] = llvm.insertvalue %[[SIZE0]], %[[ARG_DESC3]][3, 0]
				// CHECK: %[[ARG_DESC5:.*]] = llvm.insertvalue %[[STRIDE0]], %[[ARG_DESC4]][4, 0]
				// CHECK: %[[ARG_DESC6:.*]] = llvm.insertvalue %[[SIZE1]], %[[ARG_DESC5]][3, 1]
				// CHECK: %[[ARG_DESC7:.*]] = llvm.insertvalue %[[STRIDE1]], %[[ARG_DESC6]][4, 1]

				// CHECK: %{{.*}} = llvm.load %{{.*}}
				// CHECK: llvm.return

				// CHECK-LABEL: llvm.func @_mlir_ciface_callee_no_result
				// CHECK-SAME: %[[ARG_PTR:.*]]: !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>>

				// Unpack descriptor for arg0.
				// CHECK: %[[ARG_DESC:.*]] = llvm.load %[[ARG_PTR]]
				// CHECK: %[[ALLOC:.*]] = llvm.extractvalue %[[ARG_DESC]][0]
				// CHECK: %[[ALIGN:.*]] = llvm.extractvalue %[[ARG_DESC]][1]
				// CHECK: %[[OFFSET:.*]] = llvm.extractvalue %[[ARG_DESC]][2]
				// CHECK: %[[SIZE0:.*]] = llvm.extractvalue %[[ARG_DESC]][3, 0]
				// CHECK: %[[SIZE1:.*]] = llvm.extractvalue %[[ARG_DESC]][3, 1]
				// CHECK: %[[STRIDE0:.*]] = llvm.extractvalue %[[ARG_DESC]][4, 0]
				// CHECK: %[[STRIDE1:.*]] = llvm.extractvalue %[[ARG_DESC]][4, 1]

				// Call the function.
				// CHECK: llvm.call @callee_no_result(%[[ALLOC]], %[[ALIGN]], %[[OFFSET]], %[[SIZE0]], %[[SIZE1]], %[[STRIDE0]], %[[STRIDE1]])
				// CHECK: llvm.return
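
For readers less familiar with this lowering, the flattening that the checks above verify can be sketched in C++. This is only an illustration, assuming the descriptor layout shown in the CHECK-SAME lines; `RankedDescriptor` and `flattenedOperandCount` are hypothetical names and not part of MLIR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the ranked descriptor struct
// !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<N x i64>, array<N x i64>)>.
template <typename T, std::size_t Rank>
struct RankedDescriptor {
  T *allocated;                            // field [0]
  T *aligned;                              // field [1]
  std::int64_t offset;                     // field [2]
  std::array<std::int64_t, Rank> sizes;    // fields [3, i]
  std::array<std::int64_t, Rank> strides;  // fields [4, i]
};

// Number of scalar operands once the descriptor is flattened into a
// function signature: two pointers and one offset, then Rank sizes
// followed by Rank strides.
constexpr std::size_t flattenedOperandCount(std::size_t rank) {
  return 3 + 2 * rank;
}
```

For rank 2 this gives the seven operands passed to `@callee_no_result`, whereas the `_mlir_ciface_` wrapper takes a single pointer to the struct.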


				func @callee_single_result(%arg0 : memref<?xf32>) -> memref<?xf32>
				attributes { llvm.emit_c_interface } {
				return %arg0 : memref<?xf32>
				}

				// CHECK-LABEL: llvm.func @callee_single_result
				// CHECK-SAME: %[[ALLOC:.*]]: !llvm.ptr<f32>, %[[ALIGN:.*]]: !llvm.ptr<f32>, %[[OFFSET:.*]]: i64, %[[SIZE0:.*]]: i64, %[[STRIDE0:.*]]: i64

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ALLOC]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ALIGN]], %[[ARG_DESC1]][1]
				// CHECK: %[[ARG_DESC3:.*]] = llvm.insertvalue %[[OFFSET]], %[[ARG_DESC2]][2]
				// CHECK: %[[ARG_DESC4:.*]] = llvm.insertvalue %[[SIZE0]], %[[ARG_DESC3]][3, 0]
				// CHECK: %[[ARG_DESC5:.*]] = llvm.insertvalue %[[STRIDE0]], %[[ARG_DESC4]][4, 0]

				// CHECK: llvm.return %[[ARG_DESC5]]

				// CHECK-LABEL: llvm.func @_mlir_ciface_callee_single_result
				// CHECK-SAME: %[[RESULT_PTR:.*]]: !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>,
				// CHECK-SAME: %[[ARG_PTR:.*]]: !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>

				// Unpack descriptor for arg0.
				// CHECK: %[[ARG_DESC:.*]] = llvm.load %[[ARG_PTR]]
				// CHECK: %[[ALLOC:.*]] = llvm.extractvalue %[[ARG_DESC]][0]
				// CHECK: %[[ALIGN:.*]] = llvm.extractvalue %[[ARG_DESC]][1]
				// CHECK: %[[OFFSET:.*]] = llvm.extractvalue %[[ARG_DESC]][2]
				// CHECK: %[[SIZE0:.*]] = llvm.extractvalue %[[ARG_DESC]][3, 0]
				// CHECK: %[[STRIDE0:.*]] = llvm.extractvalue %[[ARG_DESC]][4, 0]

				// Call the function.
				// CHECK: %[[RESULT:.*]] = llvm.call @callee_single_result(%[[ALLOC]], %[[ALIGN]], %[[OFFSET]], %[[SIZE0]], %[[STRIDE0]])

				// Store the result and return.
				// CHECK: llvm.store %[[RESULT]], %[[RESULT_PTR]]
				// CHECK: llvm.return


				func @callee_multiple_result(%arg0 : memref<?x?xf32>,
				%arg1 : memref<?xf32>) -> (memref<?x?xf32>, memref<?xf32>, i64, f32)
				attributes { llvm.emit_c_interface } {
				%c3 = constant 3 : i64
				%pi = constant 3.141 : f32
				return %arg0, %arg1, %c3, %pi : memref<?x?xf32>, memref<?xf32>, i64, f32
				}

				// CHECK-LABEL: llvm.func @callee_multiple_result
				// CHECK-SAME: %[[ALLOC0:.*]]: !llvm.ptr<f32>, %[[ALIGN0:.*]]: !llvm.ptr<f32>, %[[OFFSET0:.*]]: i64, %[[SIZE00:.*]]: i64, %[[SIZE01:.*]]: i64, %[[STRIDE00:.*]]: i64, %[[STRIDE01:arg6]]: i64,
				// CHECK-SAME: %[[ALLOC1:.*]]: !llvm.ptr<f32>, %[[ALIGN1:.*]]: !llvm.ptr<f32>, %[[OFFSET1:.*]]: i64, %[[SIZE10:.*]]: i64, %[[STRIDE10:arg11]]: i64

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG0_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG0_DESC1:.*]] = llvm.insertvalue %[[ALLOC0]], %[[ARG0_DESC0]][0]
				// CHECK: %[[ARG0_DESC2:.*]] = llvm.insertvalue %[[ALIGN0]], %[[ARG0_DESC1]][1]
				// CHECK: %[[ARG0_DESC3:.*]] = llvm.insertvalue %[[OFFSET0]], %[[ARG0_DESC2]][2]
				// CHECK: %[[ARG0_DESC4:.*]] = llvm.insertvalue %[[SIZE00]], %[[ARG0_DESC3]][3, 0]
				// CHECK: %[[ARG0_DESC5:.*]] = llvm.insertvalue %[[STRIDE00]], %[[ARG0_DESC4]][4, 0]
				// CHECK: %[[ARG0_DESC6:.*]] = llvm.insertvalue %[[SIZE01]], %[[ARG0_DESC5]][3, 1]
				// CHECK: %[[ARG0_DESC7:.*]] = llvm.insertvalue %[[STRIDE01]], %[[ARG0_DESC6]][4, 1]

				// Populate the descriptor for arg1.
				// CHECK: %[[ARG1_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG1_DESC1:.*]] = llvm.insertvalue %[[ALLOC1]], %[[ARG1_DESC0]][0]
				// CHECK: %[[ARG1_DESC2:.*]] = llvm.insertvalue %[[ALIGN1]], %[[ARG1_DESC1]][1]
				// CHECK: %[[ARG1_DESC3:.*]] = llvm.insertvalue %[[OFFSET1]], %[[ARG1_DESC2]][2]
				// CHECK: %[[ARG1_DESC4:.*]] = llvm.insertvalue %[[SIZE10]], %[[ARG1_DESC3]][3, 0]
				// CHECK: %[[ARG1_DESC5:.*]] = llvm.insertvalue %[[STRIDE10]], %[[ARG1_DESC4]][4, 0]

				// Populate and return result.
				// CHECK: %[[RESULT0:.*]] = llvm.mlir.undef
				// CHECK: %[[RESULT1:.*]] = llvm.insertvalue %[[ARG0_DESC7]], %[[RESULT0]][0]
				// CHECK: %[[RESULT2:.*]] = llvm.insertvalue %[[ARG1_DESC5]], %[[RESULT1]][1]
				// CHECK: %[[RESULT3:.*]] = llvm.insertvalue %{{.*}}, %[[RESULT2]][2]
				// CHECK: %[[RESULT4:.*]] = llvm.insertvalue %{{.*}}, %[[RESULT3]][3]
				// CHECK: llvm.return %[[RESULT4]]

				// CHECK-LABEL: llvm.func @_mlir_ciface_callee_multiple_result
				// CHECK-SAME: %[[RESULT_PTR:.*]]: !llvm.ptr<struct<(struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>, struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>, i64, f32)>>,
				// CHECK-SAME: %[[ARG0_PTR:.*]]: !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>>,
				// CHECK-SAME: %[[ARG1_PTR:.*]]: !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>

				// Unpack descriptor for arg0.
				// CHECK: %[[ARG0_DESC:.*]] = llvm.load %[[ARG0_PTR]]
				// CHECK: %[[ALLOC0:.*]] = llvm.extractvalue %[[ARG0_DESC]][0]
				// CHECK: %[[ALIGN0:.*]] = llvm.extractvalue %[[ARG0_DESC]][1]
				// CHECK: %[[OFFSET0:.*]] = llvm.extractvalue %[[ARG0_DESC]][2]
				// CHECK: %[[SIZE00:.*]] = llvm.extractvalue %[[ARG0_DESC]][3, 0]
				// CHECK: %[[SIZE01:.*]] = llvm.extractvalue %[[ARG0_DESC]][3, 1]
				// CHECK: %[[STRIDE00:.*]] = llvm.extractvalue %[[ARG0_DESC]][4, 0]
				// CHECK: %[[STRIDE01:.*]] = llvm.extractvalue %[[ARG0_DESC]][4, 1]

				// Unpack descriptor for arg1.
				// CHECK: %[[ARG1_DESC:.*]] = llvm.load %[[ARG1_PTR]]
				// CHECK: %[[ALLOC1:.*]] = llvm.extractvalue %[[ARG1_DESC]][0]
				// CHECK: %[[ALIGN1:.*]] = llvm.extractvalue %[[ARG1_DESC]][1]
				// CHECK: %[[OFFSET1:.*]] = llvm.extractvalue %[[ARG1_DESC]][2]
				// CHECK: %[[SIZE10:.*]] = llvm.extractvalue %[[ARG1_DESC]][3, 0]
				// CHECK: %[[STRIDE10:.*]] = llvm.extractvalue %[[ARG1_DESC]][4, 0]

				// Call the function.
				// CHECK: %[[RESULT:.*]] = llvm.call @callee_multiple_result(%[[ALLOC0]], %[[ALIGN0]], %[[OFFSET0]], %[[SIZE00]], %[[SIZE01]], %[[STRIDE00]], %[[STRIDE01]], %[[ALLOC1]], %[[ALIGN1]], %[[OFFSET1]], %[[SIZE10]], %[[STRIDE10]])

				// Store the result and return.
				// CHECK: llvm.store %[[RESULT]], %[[RESULT_PTR]]
				// CHECK: llvm.return


				func @callee_multiple_args(%arg0 : index, %arg1 : memref<?x?xf32>,
				%arg2 : memref<?xf32>, %arg3 : f32) attributes { llvm.emit_c_interface } {
				%c0 = constant 0 : index
				%0 = memref.load %arg1[%c0, %arg0] : memref<?x?xf32>
				%1 = memref.load %arg2[%arg0] : memref<?xf32>
				return
				}

				// CHECK-LABEL: llvm.func @callee_multiple_args
				// CHECK-SAME: %[[IARG:arg0]]: i64,
				// CHECK-SAME: %[[ALLOC0:.*]]: !llvm.ptr<f32>, %[[ALIGN0:.*]]: !llvm.ptr<f32>, %[[OFFSET0:.*]]: i64, %[[SIZE00:.*]]: i64, %[[SIZE01:.*]]: i64, %[[STRIDE00:.*]]: i64, %[[STRIDE01:arg7]]: i64,
				// CHECK-SAME: %[[ALLOC1:.*]]: !llvm.ptr<f32>, %[[ALIGN1:.*]]: !llvm.ptr<f32>, %[[OFFSET1:.*]]: i64, %[[SIZE10:.*]]: i64, %[[STRIDE10:arg12]]: i64,
				// CHECK-SAME: %[[FARG:.*]]: f32

				// Populate the descriptor for arg1.
				// CHECK: %[[ARG0_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG0_DESC1:.*]] = llvm.insertvalue %[[ALLOC0]], %[[ARG0_DESC0]][0]
				// CHECK: %[[ARG0_DESC2:.*]] = llvm.insertvalue %[[ALIGN0]], %[[ARG0_DESC1]][1]
				// CHECK: %[[ARG0_DESC3:.*]] = llvm.insertvalue %[[OFFSET0]], %[[ARG0_DESC2]][2]
				// CHECK: %[[ARG0_DESC4:.*]] = llvm.insertvalue %[[SIZE00]], %[[ARG0_DESC3]][3, 0]
				// CHECK: %[[ARG0_DESC5:.*]] = llvm.insertvalue %[[STRIDE00]], %[[ARG0_DESC4]][4, 0]
				// CHECK: %[[ARG0_DESC6:.*]] = llvm.insertvalue %[[SIZE01]], %[[ARG0_DESC5]][3, 1]
				// CHECK: %[[ARG0_DESC7:.*]] = llvm.insertvalue %[[STRIDE01]], %[[ARG0_DESC6]][4, 1]

				// Populate the descriptor for arg2.
				// CHECK: %[[ARG1_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG1_DESC1:.*]] = llvm.insertvalue %[[ALLOC1]], %[[ARG1_DESC0]][0]
				// CHECK: %[[ARG1_DESC2:.*]] = llvm.insertvalue %[[ALIGN1]], %[[ARG1_DESC1]][1]
				// CHECK: %[[ARG1_DESC3:.*]] = llvm.insertvalue %[[OFFSET1]], %[[ARG1_DESC2]][2]
				// CHECK: %[[ARG1_DESC4:.*]] = llvm.insertvalue %[[SIZE10]], %[[ARG1_DESC3]][3, 0]
				// CHECK: %[[ARG1_DESC5:.*]] = llvm.insertvalue %[[STRIDE10]], %[[ARG1_DESC4]][4, 0]

				// CHECK: %{{.*}} = llvm.load %{{.*}}
				// CHECK: %{{.*}} = llvm.load %{{.*}}
				// CHECK: llvm.return

				// CHECK-LABEL: llvm.func @_mlir_ciface_callee_multiple_args
				// CHECK-SAME: %[[IARG:arg0]]: i64,
				// CHECK-SAME: %[[ARG1_PTR:.*]]: !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>>,
				// CHECK-SAME: %[[ARG2_PTR:.*]]: !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>,
				// CHECK-SAME: %[[FARG:.*]]: f32

				// Unpack descriptor for arg1.
				// CHECK: %[[ARG1_DESC:.*]] = llvm.load %[[ARG1_PTR]]
				// CHECK: %[[ALLOC0:.*]] = llvm.extractvalue %[[ARG1_DESC]][0]
				// CHECK: %[[ALIGN0:.*]] = llvm.extractvalue %[[ARG1_DESC]][1]
				// CHECK: %[[OFFSET0:.*]] = llvm.extractvalue %[[ARG1_DESC]][2]
				// CHECK: %[[SIZE00:.*]] = llvm.extractvalue %[[ARG1_DESC]][3, 0]
				// CHECK: %[[SIZE01:.*]] = llvm.extractvalue %[[ARG1_DESC]][3, 1]
				// CHECK: %[[STRIDE00:.*]] = llvm.extractvalue %[[ARG1_DESC]][4, 0]
				// CHECK: %[[STRIDE01:.*]] = llvm.extractvalue %[[ARG1_DESC]][4, 1]

				// Unpack descriptor for arg2.
				// CHECK: %[[ARG2_DESC:.*]] = llvm.load %[[ARG2_PTR]]
				// CHECK: %[[ALLOC1:.*]] = llvm.extractvalue %[[ARG2_DESC]][0]
				// CHECK: %[[ALIGN1:.*]] = llvm.extractvalue %[[ARG2_DESC]][1]
				// CHECK: %[[OFFSET1:.*]] = llvm.extractvalue %[[ARG2_DESC]][2]
				// CHECK: %[[SIZE10:.*]] = llvm.extractvalue %[[ARG2_DESC]][3, 0]
				// CHECK: %[[STRIDE10:.*]] = llvm.extractvalue %[[ARG2_DESC]][4, 0]

				// Call the function.
				// CHECK: llvm.call @callee_multiple_args(%[[IARG]], %[[ALLOC0]], %[[ALIGN0]], %[[OFFSET0]], %[[SIZE00]], %[[SIZE01]], %[[STRIDE00]], %[[STRIDE01]], %[[ALLOC1]], %[[ALIGN1]], %[[OFFSET1]], %[[SIZE10]], %[[STRIDE10]], %[[FARG]])
				// CHECK: llvm.return


				func @callee_no_result_unranked(%arg0 : memref<*xf32>)
				attributes { llvm.emit_c_interface } {
				%c0 = constant 0 : index
				%c1 = constant 1 : index
				%0 = memref.cast %arg0 : memref<*xf32> to memref<?x?xf32>
				%1 = memref.load %0[%c0, %c1] : memref<?x?xf32>
				return
				}

				// CHECK-LABEL: llvm.func @callee_no_result_unranked
				// CHECK-SAME: %[[ARG_RANK:.*]]: i64, %[[ARG_INNER_DESC:.*]]: !llvm.ptr<i8>

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ARG_RANK]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ARG_INNER_DESC]], %[[ARG_DESC1]][1]

				// CHECK: %{{.*}} = llvm.load %{{.*}}
				// CHECK: llvm.return

				// CHECK-LABEL: llvm.func @_mlir_ciface_callee_no_result_unranked
				// CHECK-SAME: %[[ARG_PTR:.*]]: !llvm.ptr<struct<(i64, ptr<i8>)>>

				// Unpack descriptor for arg0.
				// CHECK: %[[ARG_DESC:.*]] = llvm.load %[[ARG_PTR]]
				// CHECK: %[[ARG_RANK:.*]] = llvm.extractvalue %[[ARG_DESC]][0]
				// CHECK: %[[ARG_INNER_DESC:.*]] = llvm.extractvalue %[[ARG_DESC]][1]

				// Call the function.
				// CHECK: llvm.call @callee_no_result_unranked(%[[ARG_RANK]], %[[ARG_INNER_DESC]])
				// CHECK: llvm.return


				func @callee_single_result_unranked(%arg0 : memref<*xf32>) -> memref<*xf32>
				attributes { llvm.emit_c_interface } {
				return %arg0 : memref<*xf32>
				}

				// CHECK-LABEL: llvm.func @callee_single_result_unranked
				// CHECK-SAME: %[[RESULT_INNER_DESC_BUFFER:.*]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[ARG_RANK:.*]]: i64, %[[ARG_INNER_DESC:.*]]: !llvm.ptr<i8>

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ARG_RANK]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ARG_INNER_DESC]], %[[ARG_DESC1]][1]

				// Common constant.
				// CHECK: %[[MAX_SUPPORTED_RANK:.*]] = llvm.mlir.constant(5 : i64)

				// Compute the result's inner descriptor size.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[C2:.*]] = llvm.mlir.constant(2 : index)
				// CHECK: %[[C8:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[C8_:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[SIZE_PTRS:.*]] = llvm.mul %[[C2]], %[[C8]]
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[ARG_DESC2]][0]
				// CHECK: %[[RANK_TWICE:.*]] = llvm.mul %[[C2]], %[[RANK]]
				// CHECK: %[[NUM_I64_FIELDS:.*]] = llvm.add %[[RANK_TWICE]], %[[C1]]
				// CHECK: %[[SIZE_I64_FIELDS:.*]] = llvm.mul %[[NUM_I64_FIELDS]], %[[C8_]]
				// CHECK: %[[SIZE:.*]] = llvm.add %[[SIZE_PTRS]], %[[SIZE_I64_FIELDS]]

				// Check if the inner descriptor fits into the buffer argument.
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[ARG_DESC2]][0]
				// CHECK: %[[PRED:.*]] = llvm.icmp "ule" %[[RANK]], %[[MAX_SUPPORTED_RANK]]
				// CHECK: llvm.cond_br %[[PRED]], ^bb2, ^bb3

				// Copy the inner descriptor to the selected buffer and return a copy of the
				// unranked outer descriptor.
				// CHECK: ^bb1(%[[SELECTED_BUFFER:.*]]: !llvm.ptr<i8>):
				// CHECK: %[[ARG_INNER_DESC:.*]] = llvm.extractvalue %[[ARG_DESC2]][1]
				// CHECK: %[[C0:.*]] = llvm.mlir.constant(false)
				// CHECK: "llvm.intr.memcpy"(%[[SELECTED_BUFFER]], %[[ARG_INNER_DESC]], %[[SIZE]], %[[C0]])
				// CHECK: %[[RESULT_DESC0:.*]] = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
				// CHECK: %[[RESULT_DESC1:.*]] = llvm.insertvalue %[[RANK]], %[[RESULT_DESC0]][0]
				// CHECK: %[[RESULT_DESC2:.*]] = llvm.insertvalue %[[SELECTED_BUFFER]], %[[RESULT_DESC1]][1]
				// CHECK: llvm.return %[[RESULT_DESC2]]

				// Select the buffer argument to copy the inner descriptor to.
				// CHECK: ^bb2:
				// CHECK: llvm.br ^bb1(%[[RESULT_INNER_DESC_BUFFER]] : !llvm.ptr<i8>)

				// Allocate a new buffer to copy the inner descriptor to.
				// CHECK: ^bb3:
				// CHECK: %[[NEW_BUFFER:.*]] = llvm.call @malloc(%[[SIZE]])
				// CHECK: llvm.br ^bb1(%[[NEW_BUFFER]] : !llvm.ptr<i8>)

				// CHECK-LABEL: llvm.func @_mlir_ciface_callee_single_result_unranked
				// CHECK-SAME: %[[RESULT_PTR:.*]]: !llvm.ptr<struct<(i64, ptr<i8>)>>,
				frgossen (Author): @mehdi_amini, this could be an example that you're looking for.
				// CHECK-SAME: %[[ARG_PTR:.*]]: !llvm.ptr<struct<(i64, ptr<i8>)>>

				// Extract inner descriptor buffer from pre-allocated result.
				// CHECK: %[[RESULT:.*]] = llvm.load %[[RESULT_PTR]]
				// CHECK: %[[RESULT_INNER_DESC_BUFFER:.*]] = llvm.extractvalue %[[RESULT]][1]

				frgossen (Author): @mehdi_amini, this is how the buffer is currently passed through the C interface, as part of the pre-allocated result. The alternative is to pass the buffer and its size (or rank) as separate arguments, which I'd find less intuitive.
				// Unpack descriptor for arg0.
				// CHECK: %[[ARG_DESC:.*]] = llvm.load %[[ARG_PTR]]
				// CHECK: %[[ARG_RANK:.*]] = llvm.extractvalue %[[ARG_DESC]][0]
				// CHECK: %[[ARG_INNER_DESC:.*]] = llvm.extractvalue %[[ARG_DESC]][1]

				// Call the function.
				// CHECK: %[[RESULT:.*]] = llvm.call @callee_single_result_unranked(%[[RESULT_INNER_DESC_BUFFER]], %[[ARG_RANK]], %[[ARG_INNER_DESC]])

				// Store the result and return.
				// CHECK: llvm.store %[[RESULT]], %[[RESULT_PTR]]
				// CHECK: llvm.return
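
The size arithmetic and the fit check verified above can be restated in C++. This is a minimal sketch, assuming 8-byte pointers and 8-byte index values (the two constants 8 in the checks) and the option value `max-unranked-desc-buffer-rank=5` from the RUN line; all names are hypothetical and do not come from the patch.

```cpp
#include <cstdint>

// Assumed widths, matching the constants 8 in the CHECK lines.
constexpr std::uint64_t kPointerBytes = 8;
constexpr std::uint64_t kIndexBytes = 8;

// Inner descriptor: two pointers, then one offset plus `rank` sizes and
// `rank` strides, i.e. 2 * 8 + (2 * rank + 1) * 8 bytes.
constexpr std::uint64_t innerDescriptorBytes(std::uint64_t rank) {
  return 2 * kPointerBytes + (2 * rank + 1) * kIndexBytes;
}

enum class BufferSource { CallerProvided, HeapAllocated };

// Mirrors the unsigned "ule" comparison: use the caller's buffer when the
// rank does not exceed the supported maximum, otherwise fall back to malloc.
constexpr BufferSource selectBuffer(std::uint64_t rank, std::uint64_t maxRank) {
  return rank <= maxRank ? BufferSource::CallerProvided
                         : BufferSource::HeapAllocated;
}
```

Under these assumptions a rank-5 inner descriptor occupies 104 bytes, so a caller-provided buffer of that size covers every rank the option supports.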


				func @callee_multiple_result_unranked(%arg0 : memref<*xf32>) -> (f32, i64,
				memref<*xf32>, memref<*xf32>) attributes { llvm.emit_c_interface } {
				%pi = constant 3.141 : f32
				%c3 = constant 3 : i64
				return %pi, %c3, %arg0, %arg0 : f32, i64, memref<*xf32>, memref<*xf32>
				}

				// CHECK-LABEL: llvm.func @callee_multiple_result_unranked
				// CHECK-SAME: %[[RESULT_INNER_DESC_BUFFER0:.*]]: !llvm.ptr<i8>, %[[RESULT_INNER_DESC_BUFFER1:.*]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[ARG_RANK:.*]]: i64, %[[ARG_INNER_DESC:.*]]: !llvm.ptr<i8>

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ARG_RANK]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ARG_INNER_DESC]], %[[ARG_DESC1]][1]

				// Common constant.
				// CHECK: %[[MAX_SUPPORTED_RANK:.*]] = llvm.mlir.constant(5 : i64)

				// Compute first result's inner descriptor size.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[C2:.*]] = llvm.mlir.constant(2 : index)
				// CHECK: %[[C8:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[C8_:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[SIZE_PTRS:.*]] = llvm.mul %[[C2]], %[[C8]]
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[ARG_DESC2]][0]
				// CHECK: %[[RANK_TWICE:.*]] = llvm.mul %[[C2]], %[[RANK]]
				// CHECK: %[[NUM_I64_FIELDS:.*]] = llvm.add %[[RANK_TWICE]], %[[C1]]
				// CHECK: %[[SIZE_I64_FIELDS:.*]] = llvm.mul %[[NUM_I64_FIELDS]], %[[C8_]]
				// CHECK: %[[SIZE0:.*]] = llvm.add %[[SIZE_PTRS]], %[[SIZE_I64_FIELDS]]

				// Check if the inner descriptor fits into the buffer argument.
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[ARG_DESC2]][0]
				// CHECK: %[[PRED:.*]] = llvm.icmp "ule" %[[RANK]], %[[MAX_SUPPORTED_RANK]]
				// CHECK: llvm.cond_br %[[PRED]], ^bb3, ^bb4

				// Copy the inner descriptor to the selected buffer and create a copy of the
				// unranked outer descriptor.
				// CHECK: ^bb1(%[[SELECTED_BUFFER:.*]]: !llvm.ptr<i8>):
				// CHECK: %[[ARG_INNER_DESC:.*]] = llvm.extractvalue %[[ARG_DESC2]][1]
				// CHECK: %[[C0:.*]] = llvm.mlir.constant(false)
				// CHECK: "llvm.intr.memcpy"(%[[SELECTED_BUFFER]], %[[ARG_INNER_DESC]], %[[SIZE0]], %[[C0]])
				// CHECK: %[[RESULT0_DESC0:.*]] = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
				// CHECK: %[[RESULT0_DESC1:.*]] = llvm.insertvalue %[[RANK]], %[[RESULT0_DESC0]][0]
				// CHECK: %[[RESULT0_DESC2:.*]] = llvm.insertvalue %[[SELECTED_BUFFER]], %[[RESULT0_DESC1]][1]

				// Compute second result's inner descriptor size.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[C2:.*]] = llvm.mlir.constant(2 : index)
				// CHECK: %[[C8:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[C8_:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[SIZE_PTRS:.*]] = llvm.mul %[[C2]], %[[C8]]
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[ARG_DESC2]][0]
				// CHECK: %[[RANK_TWICE:.*]] = llvm.mul %[[C2]], %[[RANK]]
				// CHECK: %[[NUM_I64_FIELDS:.*]] = llvm.add %[[RANK_TWICE]], %[[C1]]
				// CHECK: %[[SIZE_I64_FIELDS:.*]] = llvm.mul %[[NUM_I64_FIELDS]], %[[C8_]]
				// CHECK: %[[SIZE1:.*]] = llvm.add %[[SIZE_PTRS]], %[[SIZE_I64_FIELDS]]

				// Check if the inner descriptor fits into the buffer argument.
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[ARG_DESC2]][0]
				// CHECK: %[[PRED:.*]] = llvm.icmp "ule" %[[RANK]], %[[MAX_SUPPORTED_RANK]]
				// CHECK: llvm.cond_br %[[PRED]], ^bb5, ^bb6

				// Copy the inner descriptor to the selected buffer and create a copy of the
				// unranked outer descriptor.
				// CHECK: ^bb2(%[[SELECTED_BUFFER:.*]]: !llvm.ptr<i8>):
				// CHECK: %[[ARG_INNER_DESC:.*]] = llvm.extractvalue %[[ARG_DESC2]][1]
				// CHECK: %[[C0:.*]] = llvm.mlir.constant(false)
				// CHECK: "llvm.intr.memcpy"(%[[SELECTED_BUFFER]], %[[ARG_INNER_DESC]], %[[SIZE1]], %[[C0]])
				// CHECK: %[[RESULT1_DESC0:.*]] = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
				// CHECK: %[[RESULT1_DESC1:.*]] = llvm.insertvalue %[[RANK]], %[[RESULT1_DESC0]][0]
				// CHECK: %[[RESULT1_DESC2:.*]] = llvm.insertvalue %[[SELECTED_BUFFER]], %[[RESULT1_DESC1]][1]

				// Populate and return result.
				// CHECK: %[[RESULT0:.*]] = llvm.mlir.undef
				// CHECK: %[[RESULT1:.*]] = llvm.insertvalue %{{.*}}, %[[RESULT0]][0]
				// CHECK: %[[RESULT2:.*]] = llvm.insertvalue %{{.*}}, %[[RESULT1]][1]
				// CHECK: %[[RESULT3:.*]] = llvm.insertvalue %[[RESULT0_DESC2]], %[[RESULT2]][2]
				// CHECK: %[[RESULT4:.*]] = llvm.insertvalue %[[RESULT1_DESC2]], %[[RESULT3]][3]
				// CHECK: llvm.return %[[RESULT4]]

				// Select the buffer argument to copy the inner descriptor to (first result).
				// CHECK: ^bb3:
				// CHECK: llvm.br ^bb1(%[[RESULT_INNER_DESC_BUFFER0]] : !llvm.ptr<i8>)

				// Allocate a new buffer to copy the inner descriptor to (first result).
				// CHECK: ^bb4:
				// CHECK: %[[NEW_BUFFER:.*]] = llvm.call @malloc(%[[SIZE0]])
				// CHECK: llvm.br ^bb1(%[[NEW_BUFFER]] : !llvm.ptr<i8>)

				// Select the buffer argument to copy the inner descriptor to (second result).
				// CHECK: ^bb5:
				// CHECK: llvm.br ^bb2(%[[RESULT_INNER_DESC_BUFFER1]] : !llvm.ptr<i8>)

				// Allocate a new buffer to copy the inner descriptor to (second result).
				// CHECK: ^bb6:
				// CHECK: %[[NEW_BUFFER:.*]] = llvm.call @malloc(%[[SIZE1]])
				// CHECK: llvm.br ^bb2(%[[NEW_BUFFER]] : !llvm.ptr<i8>)

				// CHECK-LABEL: llvm.func @_mlir_ciface_callee_multiple_result_unranked
				// CHECK-SAME: %[[RESULT_PTR:.*]]: !llvm.ptr<struct<(f32, i64, struct<(i64, ptr<i8>)>, struct<(i64, ptr<i8>)>)>>,
				// CHECK-SAME: %[[ARG_PTR:.*]]: !llvm.ptr<struct<(i64, ptr<i8>)>>

				// Extract inner descriptor buffers from the pre-allocated result.
				// CHECK: %[[RESULT:.*]] = llvm.load %[[RESULT_PTR]]
				// CHECK: %[[RESULT_DESC0:.*]] = llvm.extractvalue %[[RESULT]][2]
				// CHECK: %[[RESULT_INNER_DESC_BUFFER0:.*]] = llvm.extractvalue %[[RESULT_DESC0]][1]
				// CHECK: %[[RESULT_DESC1:.*]] = llvm.extractvalue %[[RESULT]][3]
				// CHECK: %[[RESULT_INNER_DESC_BUFFER1:.*]] = llvm.extractvalue %[[RESULT_DESC1]][1]

				// Unpack descriptor for arg0.
				// CHECK: %[[ARG_DESC:.*]] = llvm.load %[[ARG_PTR]]
				// CHECK: %[[ARG_RANK:.*]] = llvm.extractvalue %[[ARG_DESC]][0]
				// CHECK: %[[ARG_INNER_DESC:.*]] = llvm.extractvalue %[[ARG_DESC]][1]

				// Call the function.
				// CHECK: %[[RESULT:.*]] = llvm.call @callee_multiple_result_unranked(%[[RESULT_INNER_DESC_BUFFER0]], %[[RESULT_INNER_DESC_BUFFER1]], %[[ARG_RANK]], %[[ARG_INNER_DESC]])

				// Store the result and return.
				// CHECK: llvm.store %[[RESULT]], %[[RESULT_PTR]]
				// CHECK: llvm.return
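
The shape of the lowered (non-wrapper) signature checked above can be summarised as an operand count. The sketch below is an illustration only, assuming the convention visible in these tests: one buffer pointer is prepended per unranked memref result, an unranked argument expands to a rank and a pointer, and a ranked argument expands to 3 + 2 * rank operands. `Kind`, `Value`, and `loweredOperandCount` are hypothetical names.

```cpp
#include <cstddef>

enum class Kind { Scalar, RankedMemRef, UnrankedMemRef };

struct Value {
  Kind kind;
  std::size_t rank;  // Only meaningful for RankedMemRef.
};

// Operands contributed by a single function argument.
constexpr std::size_t operandsFor(Value v) {
  return v.kind == Kind::Scalar           ? std::size_t{1}
         : v.kind == Kind::UnrankedMemRef ? std::size_t{2}
                                          : 3 + 2 * v.rank;
}

// Total operands of the lowered function: one buffer pointer per unranked
// result, followed by the expanded arguments.
constexpr std::size_t loweredOperandCount(const Value *results,
                                          std::size_t numResults,
                                          const Value *args,
                                          std::size_t numArgs) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < numResults; ++i)
    if (results[i].kind == Kind::UnrankedMemRef)
      ++count;
  for (std::size_t i = 0; i < numArgs; ++i)
    count += operandsFor(args[i]);
  return count;
}

// @callee_multiple_result_unranked: (memref<*xf32>) -> (f32, i64, memref<*xf32>, memref<*xf32>)
constexpr Value kResults[] = {{Kind::Scalar, 0},
                              {Kind::Scalar, 0},
                              {Kind::UnrankedMemRef, 0},
                              {Kind::UnrankedMemRef, 0}};
constexpr Value kArgs[] = {{Kind::UnrankedMemRef, 0}};
constexpr std::size_t kLoweredOperands =
    loweredOperandCount(kResults, 4, kArgs, 1);
```

Here `kLoweredOperands` is 4, matching the two result buffers plus the rank and inner descriptor pointer in the `llvm.call` above.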


				func @callee_multiple_args_unranked(%arg0 : memref<*xf32>, %arg1 : f32,
				%arg2 : memref<*xf32>, %arg3 : index) attributes { llvm.emit_c_interface } {
				%c0 = constant 0 : index
				%0 = memref.cast %arg0 : memref<*xf32> to memref<?x?xf32>
				%1 = memref.load %0[%c0, %arg3] : memref<?x?xf32>
				%2 = memref.cast %arg2 : memref<*xf32> to memref<?xf32>
				%3 = memref.load %2[%arg3] : memref<?xf32>
				return
				}

				// CHECK-LABEL: llvm.func @callee_multiple_args_unranked
				// CHECK-SAME: %[[ARG0_RANK:.*]]: i64, %[[ARG0_INNER_DESC:arg1]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[FARG:arg2]]: f32,
				// CHECK-SAME: %[[ARG1_RANK:.*]]: i64, %[[ARG1_INNER_DESC:arg4]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[IARG:.*]]: i64

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG0_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG0_DESC1:.*]] = llvm.insertvalue %[[ARG0_RANK]], %[[ARG0_DESC0]][0]
				// CHECK: %[[ARG0_DESC2:.*]] = llvm.insertvalue %[[ARG0_INNER_DESC]], %[[ARG0_DESC1]][1]

				// Populate the descriptor for arg1.
				// CHECK: %[[ARG1_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG1_DESC1:.*]] = llvm.insertvalue %[[ARG1_RANK]], %[[ARG1_DESC0]][0]
				// CHECK: %[[ARG1_DESC2:.*]] = llvm.insertvalue %[[ARG1_INNER_DESC]], %[[ARG1_DESC1]][1]

				// CHECK: llvm.return

				// CHECK-LABEL: llvm.func @_mlir_ciface_callee_multiple_args_unranked
				// CHECK-SAME: %[[ARG0_PTR:arg0]]: !llvm.ptr<struct<(i64, ptr<i8>)>>,
				// CHECK-SAME: %[[FARG:arg1]]: f32,
				// CHECK-SAME: %[[ARG1_PTR:arg2]]: !llvm.ptr<struct<(i64, ptr<i8>)>>,
				// CHECK-SAME: %[[IARG:arg3]]: i64

				// Unpack descriptor for arg0.
				// CHECK: %[[ARG0_DESC:.*]] = llvm.load %[[ARG0_PTR]]
				// CHECK: %[[ARG0_RANK:.*]] = llvm.extractvalue %[[ARG0_DESC]][0]
				// CHECK: %[[ARG0_INNER_DESC:.*]] = llvm.extractvalue %[[ARG0_DESC]][1]

				// Unpack descriptor for arg1.
				// CHECK: %[[ARG1_DESC:.*]] = llvm.load %[[ARG1_PTR]]
				// CHECK: %[[ARG1_RANK:.*]] = llvm.extractvalue %[[ARG1_DESC]][0]
				// CHECK: %[[ARG1_INNER_DESC:.*]] = llvm.extractvalue %[[ARG1_DESC]][1]

				// Call the function.
				// CHECK: llvm.call @callee_multiple_args_unranked(%[[ARG0_RANK]], %[[ARG0_INNER_DESC]], %[[FARG]], %[[ARG1_RANK]], %[[ARG1_INNER_DESC]], %[[IARG]])
				// CHECK: llvm.return

mlir/test/Conversion/StandardToLLVM/calling-convention.mlir

	// RUN: mlir-opt -convert-memref-to-llvm -convert-std-to-llvm='emit-c-wrappers=1' -reconcile-unrealized-casts %s | FileCheck %s			// RUN: mlir-opt %s \
	// RUN: mlir-opt -convert-memref-to-llvm -convert-std-to-llvm -reconcile-unrealized-casts %s | FileCheck %s --check-prefix=EMIT_C_ATTRIBUTE			// RUN: --convert-memref-to-llvm \
	mehdi_amini: The previous test was testing two variants, with and without emit-c-wrappers; why did we lose the distinction?
	Actually, could we leave this test pristine by adding `max-unranked-desc-buffer-rank=-1` (and ensuring we disable the "small size optimization" in this case)? We could then add another file that just exercises the effect of `max-unranked-desc-buffer-rank` on a dedicated case.
	In general I feel that we should shard files like this one into smaller tests that each exercise a particular feature of the calling convention. Reviewing a change like this diff is just not possible otherwise.
	That said, thanks for the nice documentation inline in the test! The way you made each section explicit is really helpful.
	frgossen (Author): Running the tests with and without `emit-c-wrappers=1` just switches between looking at the `llvm.emit_c_interface` attribute and assuming it everywhere. The new tests use the `emit_c_interface` attribute to generate C interfaces only where they are tested.
	Adding support to disable the descriptor buffer passing (`max-unranked-desc-buffer-rank=-1`) and fall back to the old behaviour would complicate the calling convention quite a bit, in my opinion: in one case buffers are passed, in the other they aren't, and so on. Until someone needs that, I would rather avoid this complexity. How important do you think this is?
	"Shard files": done :)
	I know this is a big CL to review, but I don't see a way to break it down into multiple smaller ones. If you prefer, I could land the tests separately.
				// RUN: --convert-std-to-llvm='max-unranked-desc-buffer-rank=5' | FileCheck %s

				func @callee_no_result(%arg0 : memref<?x?xf32>) {
				%c0 = constant 0 : index
				%c1 = constant 1 : index
				%0 = memref.load %arg0[%c0, %c1] : memref<?x?xf32>
				return
				}

	// This tests the default memref calling convention and the emission of C			func @caller_no_result(%arg0 : memref<?x?xf32>) {
	// wrappers. We don't need to separate runs because the wrapper-emission			call @callee_no_result(%arg0) : (memref<?x?xf32>) -> ()
	// version subsumes the calling convention and only adds new functions, that we			return
	// can also file-check in the same run.			}

	// An external function is transformed into the glue around calling an interface function.
	// CHECK-LABEL: @external
	// CHECK: %[[ALLOC0:.*]]: !llvm.ptr<f32>, %[[ALIGN0:.*]]: !llvm.ptr<f32>, %[[OFFSET0:.*]]: i64, %[[SIZE00:.*]]: i64, %[[SIZE01:.*]]: i64, %[[STRIDE00:.*]]: i64, %[[STRIDE01:.*]]: i64,
	// CHECK: %[[ALLOC1:.*]]: !llvm.ptr<f32>, %[[ALIGN1:.*]]: !llvm.ptr<f32>, %[[OFFSET1:.*]]: i64)
	func private @external(%arg0: memref<?x?xf32>, %arg1: memref<f32>)
	// Populate the descriptor for arg0.
	// CHECK: %[[DESC00:.*]] = llvm.mlir.undef : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK: %[[DESC01:.*]] = llvm.insertvalue %arg0, %[[DESC00]][0]
	// CHECK: %[[DESC02:.*]] = llvm.insertvalue %arg1, %[[DESC01]][1]
	// CHECK: %[[DESC03:.*]] = llvm.insertvalue %arg2, %[[DESC02]][2]
	// CHECK: %[[DESC04:.*]] = llvm.insertvalue %arg3, %[[DESC03]][3, 0]
	// CHECK: %[[DESC05:.*]] = llvm.insertvalue %arg5, %[[DESC04]][4, 0]
	// CHECK: %[[DESC06:.*]] = llvm.insertvalue %arg4, %[[DESC05]][3, 1]
	// CHECK: %[[DESC07:.*]] = llvm.insertvalue %arg6, %[[DESC06]][4, 1]

	// Allocate on stack and store to comply with C calling convention.			// CHECK-LABEL: llvm.func @caller_no_result
	// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)			// CHECK-SAME: %[[ALLOC:.*]]: !llvm.ptr<f32>, %[[ALIGN:.*]]: !llvm.ptr<f32>, %[[OFFSET:.*]]: i64, %[[SIZE0:.*]]: i64, %[[SIZE1:.*]]: i64, %[[STRIDE0:.*]]: i64, %[[STRIDE1:.*]]: i64
	// CHECK: %[[DESC0_ALLOCA:.*]] = llvm.alloca %[[C1]] x !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK: llvm.store %[[DESC07]], %[[DESC0_ALLOCA]]			// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef : [[ARG_DESC_TY:!llvm.struct<\(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>\)>]]
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ALLOC]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ALIGN]], %[[ARG_DESC1]][1]
				// CHECK: %[[ARG_DESC3:.*]] = llvm.insertvalue %[[OFFSET]], %[[ARG_DESC2]][2]
				// CHECK: %[[ARG_DESC4:.*]] = llvm.insertvalue %[[SIZE0]], %[[ARG_DESC3]][3, 0]
				// CHECK: %[[ARG_DESC5:.*]] = llvm.insertvalue %[[STRIDE0]], %[[ARG_DESC4]][4, 0]
				// CHECK: %[[ARG_DESC6:.*]] = llvm.insertvalue %[[SIZE1]], %[[ARG_DESC5]][3, 1]
				// CHECK: %[[ARG_DESC7:.*]] = llvm.insertvalue %[[STRIDE1]], %[[ARG_DESC6]][4, 1]

				// Unpack descriptor.
				// CHECK: %[[ALLOC_:.*]] = llvm.extractvalue %[[ARG_DESC7]][0]
				// CHECK: %[[ALIGN_:.*]] = llvm.extractvalue %[[ARG_DESC7]][1]
				// CHECK: %[[OFFSET_:.*]] = llvm.extractvalue %[[ARG_DESC7]][2]
				// CHECK: %[[SIZE0_:.*]] = llvm.extractvalue %[[ARG_DESC7]][3, 0]
				// CHECK: %[[SIZE1_:.*]] = llvm.extractvalue %[[ARG_DESC7]][3, 1]
				// CHECK: %[[STRIDE0_:.*]] = llvm.extractvalue %[[ARG_DESC7]][4, 0]
				// CHECK: %[[STRIDE1_:.*]] = llvm.extractvalue %[[ARG_DESC7]][4, 1]

				// Call the function.
				// CHECK: llvm.call @callee_no_result(%[[ALLOC_]], %[[ALIGN_]], %[[OFFSET_]], %[[SIZE0_]], %[[SIZE1_]], %[[STRIDE0_]], %[[STRIDE1_]])
				// CHECK: llvm.return


				func @callee_single_result(%arg0 : memref<?xf32>) -> memref<?xf32> {
				return %arg0 : memref<?xf32>
				}

				func @caller_single_result(%arg0 : memref<?xf32>) -> memref<?xf32> {
				%0 = call @callee_single_result(%arg0) : (memref<?xf32>) -> memref<?xf32>
				return %0 : memref<?xf32>
				}

				// CHECK-LABEL: llvm.func @caller_single_result
				// CHECK-SAME: %[[ALLOC:.*]]: !llvm.ptr<f32>, %[[ALIGN:.*]]: !llvm.ptr<f32>, %[[OFFSET:.*]]: i64, %[[SIZE0:.*]]: i64, %[[STRIDE0:.*]]: i64

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ALLOC]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ALIGN]], %[[ARG_DESC1]][1]
				// CHECK: %[[ARG_DESC3:.*]] = llvm.insertvalue %[[OFFSET]], %[[ARG_DESC2]][2]
				// CHECK: %[[ARG_DESC4:.*]] = llvm.insertvalue %[[SIZE0]], %[[ARG_DESC3]][3, 0]
				// CHECK: %[[ARG_DESC5:.*]] = llvm.insertvalue %[[STRIDE0]], %[[ARG_DESC4]][4, 0]

				// Unpack descriptor.
				// CHECK: %[[ALLOC_:.*]] = llvm.extractvalue %[[ARG_DESC5]][0]
				// CHECK: %[[ALIGN_:.*]] = llvm.extractvalue %[[ARG_DESC5]][1]
				// CHECK: %[[OFFSET_:.*]] = llvm.extractvalue %[[ARG_DESC5]][2]
				// CHECK: %[[SIZE0_:.*]] = llvm.extractvalue %[[ARG_DESC5]][3, 0]
				// CHECK: %[[STRIDE0_:.*]] = llvm.extractvalue %[[ARG_DESC5]][4, 0]

				// Call the function.
				// CHECK: %[[RESULT:.*]] = llvm.call @callee_single_result(%[[ALLOC_]], %[[ALIGN_]], %[[OFFSET_]], %[[SIZE0_]], %[[STRIDE0_]])
				// CHECK: llvm.return %[[RESULT]]


				func @callee_multiple_result(%arg0 : memref<?x?xf32>,
				%arg1 : memref<?xf32>) -> (memref<?x?xf32>, memref<?xf32>, i64, f32) {
				%c3 = constant 3 : i64
				%pi = constant 3.141 : f32
				return %arg0, %arg1, %c3, %pi : memref<?x?xf32>, memref<?xf32>, i64, f32
				}

				func @caller_multiple_result(%arg0 : memref<?x?xf32>, %arg1 : memref<?xf32>)
				-> (memref<?x?xf32>, memref<?xf32>, i64, f32) {
				%0:4 = call @callee_multiple_result(%arg0, %arg1)
				: (memref<?x?xf32>, memref<?xf32>)
				-> (memref<?x?xf32>, memref<?xf32>, i64, f32)
				return %0#0, %0#1, %0#2, %0#3 : memref<?x?xf32>, memref<?xf32>, i64, f32
				}

				// CHECK-LABEL: llvm.func @caller_multiple_result
				// CHECK-SAME: %[[ALLOC0:.*]]: !llvm.ptr<f32>, %[[ALIGN0:.*]]: !llvm.ptr<f32>, %[[OFFSET0:.*]]: i64, %[[SIZE00:.*]]: i64, %[[SIZE01:.*]]: i64, %[[STRIDE00:.*]]: i64, %[[STRIDE01:arg6]]: i64,
				// CHECK-SAME: %[[ALLOC1:.*]]: !llvm.ptr<f32>, %[[ALIGN1:.*]]: !llvm.ptr<f32>, %[[OFFSET1:.*]]: i64, %[[SIZE10:.*]]: i64, %[[STRIDE10:arg11]]: i64

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG0_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG0_DESC1:.*]] = llvm.insertvalue %[[ALLOC0]], %[[ARG0_DESC0]][0]
				// CHECK: %[[ARG0_DESC2:.*]] = llvm.insertvalue %[[ALIGN0]], %[[ARG0_DESC1]][1]
				// CHECK: %[[ARG0_DESC3:.*]] = llvm.insertvalue %[[OFFSET0]], %[[ARG0_DESC2]][2]
				// CHECK: %[[ARG0_DESC4:.*]] = llvm.insertvalue %[[SIZE00]], %[[ARG0_DESC3]][3, 0]
				// CHECK: %[[ARG0_DESC5:.*]] = llvm.insertvalue %[[STRIDE00]], %[[ARG0_DESC4]][4, 0]
				// CHECK: %[[ARG0_DESC6:.*]] = llvm.insertvalue %[[SIZE01]], %[[ARG0_DESC5]][3, 1]
				// CHECK: %[[ARG0_DESC7:.*]] = llvm.insertvalue %[[STRIDE01]], %[[ARG0_DESC6]][4, 1]

				// Populate the descriptor for arg1.
				// CHECK: %[[ARG1_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG1_DESC1:.*]] = llvm.insertvalue %[[ALLOC1]], %[[ARG1_DESC0]][0]
				// CHECK: %[[ARG1_DESC2:.*]] = llvm.insertvalue %[[ALIGN1]], %[[ARG1_DESC1]][1]
				// CHECK: %[[ARG1_DESC3:.*]] = llvm.insertvalue %[[OFFSET1]], %[[ARG1_DESC2]][2]
				// CHECK: %[[ARG1_DESC4:.*]] = llvm.insertvalue %[[SIZE10]], %[[ARG1_DESC3]][3, 0]
				// CHECK: %[[ARG1_DESC5:.*]] = llvm.insertvalue %[[STRIDE10]], %[[ARG1_DESC4]][4, 0]

				// Unpack descriptor.
				// CHECK: %[[ALLOC0_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][0]
				// CHECK: %[[ALIGN0_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][1]
				// CHECK: %[[OFFSET0_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][2]
				// CHECK: %[[SIZE00_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][3, 0]
				// CHECK: %[[SIZE01_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][3, 1]
				// CHECK: %[[STRIDE00_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][4, 0]
				// CHECK: %[[STRIDE01_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][4, 1]

				// Unpack descriptor.
				// CHECK: %[[ALLOC1_:.*]] = llvm.extractvalue %[[ARG1_DESC5]][0]
				// CHECK: %[[ALIGN1_:.*]] = llvm.extractvalue %[[ARG1_DESC5]][1]
				// CHECK: %[[OFFSET1_:.*]] = llvm.extractvalue %[[ARG1_DESC5]][2]
				// CHECK: %[[SIZE10_:.*]] = llvm.extractvalue %[[ARG1_DESC5]][3, 0]
				// CHECK: %[[STRIDE10_:.*]] = llvm.extractvalue %[[ARG1_DESC5]][4, 0]

				// Call the function.
				// CHECK: %[[RESULT:.*]] = llvm.call @callee_multiple_result(%[[ALLOC0_]], %[[ALIGN0_]], %[[OFFSET0_]], %[[SIZE00_]], %[[SIZE01_]], %[[STRIDE00_]], %[[STRIDE01_]], %[[ALLOC1_]], %[[ALIGN1_]], %[[OFFSET1_]], %[[SIZE10_]], %[[STRIDE10_]])

				// Unpack results.
				// CHECK: %[[RESULT0:.*]] = llvm.extractvalue %[[RESULT]][0]
				// CHECK: %[[RESULT1:.*]] = llvm.extractvalue %[[RESULT]][1]
				// CHECK: %[[RESULT2:.*]] = llvm.extractvalue %[[RESULT]][2]
				// CHECK: %[[RESULT3:.*]] = llvm.extractvalue %[[RESULT]][3]

				// Re-pack results.
				// CHECK: %[[REPACKED0:.*]] = llvm.mlir.undef
				// CHECK: %[[REPACKED1:.*]] = llvm.insertvalue %[[RESULT0]], %[[REPACKED0]][0]
				// CHECK: %[[REPACKED2:.*]] = llvm.insertvalue %[[RESULT1]], %[[REPACKED1]][1]
				// CHECK: %[[REPACKED3:.*]] = llvm.insertvalue %[[RESULT2]], %[[REPACKED2]][2]
				// CHECK: %[[REPACKED4:.*]] = llvm.insertvalue %[[RESULT3]], %[[REPACKED3]][3]

				// CHECK: llvm.return %[[REPACKED4]]


				func @callee_multiple_args(%arg0 : index, %arg1 : memref<?x?xf32>,
				%arg2 : memref<?xf32>, %arg3 : f32) {
				%c0 = constant 0 : index
				%0 = memref.load %arg1[%c0, %arg0] : memref<?x?xf32>
				%1 = memref.load %arg2[%arg0] : memref<?xf32>
				return
				}

				func @caller_multiple_args(%arg0 : index, %arg1 : memref<?x?xf32>,
				%arg2 : memref<?xf32>, %arg3 : f32) {
				call @callee_multiple_args(%arg0, %arg1, %arg2, %arg3)
				: (index, memref<?x?xf32>, memref<?xf32>, f32) -> ()
				return
				}

				// CHECK-LABEL: llvm.func @caller_multiple_args
				// CHECK-SAME: %[[IARG:arg0]]: i64,
				// CHECK-SAME: %[[ALLOC0:.*]]: !llvm.ptr<f32>, %[[ALIGN0:.*]]: !llvm.ptr<f32>, %[[OFFSET0:.*]]: i64, %[[SIZE00:.*]]: i64, %[[SIZE01:.*]]: i64, %[[STRIDE00:.*]]: i64, %[[STRIDE01:arg7]]: i64,
				// CHECK-SAME: %[[ALLOC1:.*]]: !llvm.ptr<f32>, %[[ALIGN1:.*]]: !llvm.ptr<f32>, %[[OFFSET1:.*]]: i64, %[[SIZE10:.*]]: i64, %[[STRIDE10:arg12]]: i64,
				// CHECK-SAME: %[[FARG:arg13]]: f32

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG0_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG0_DESC1:.*]] = llvm.insertvalue %[[ALLOC0]], %[[ARG0_DESC0]][0]
				// CHECK: %[[ARG0_DESC2:.*]] = llvm.insertvalue %[[ALIGN0]], %[[ARG0_DESC1]][1]
				// CHECK: %[[ARG0_DESC3:.*]] = llvm.insertvalue %[[OFFSET0]], %[[ARG0_DESC2]][2]
				// CHECK: %[[ARG0_DESC4:.*]] = llvm.insertvalue %[[SIZE00]], %[[ARG0_DESC3]][3, 0]
				// CHECK: %[[ARG0_DESC5:.*]] = llvm.insertvalue %[[STRIDE00]], %[[ARG0_DESC4]][4, 0]
				// CHECK: %[[ARG0_DESC6:.*]] = llvm.insertvalue %[[SIZE01]], %[[ARG0_DESC5]][3, 1]
				// CHECK: %[[ARG0_DESC7:.*]] = llvm.insertvalue %[[STRIDE01]], %[[ARG0_DESC6]][4, 1]

	// Populate the descriptor for arg1.			// Populate the descriptor for arg1.
	// CHECK: %[[DESC10:.*]] = llvm.mlir.undef : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>			// CHECK: %[[ARG1_DESC0:.*]] = llvm.mlir.undef
	// CHECK: %[[DESC11:.*]] = llvm.insertvalue %arg7, %[[DESC10]][0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>			// CHECK: %[[ARG1_DESC1:.*]] = llvm.insertvalue %[[ALLOC1]], %[[ARG1_DESC0]][0]
	// CHECK: %[[DESC12:.*]] = llvm.insertvalue %arg8, %[[DESC11]][1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>			// CHECK: %[[ARG1_DESC2:.*]] = llvm.insertvalue %[[ALIGN1]], %[[ARG1_DESC1]][1]
	// CHECK: %[[DESC13:.*]] = llvm.insertvalue %arg9, %[[DESC12]][2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>			// CHECK: %[[ARG1_DESC3:.*]] = llvm.insertvalue %[[OFFSET1]], %[[ARG1_DESC2]][2]
				// CHECK: %[[ARG1_DESC4:.*]] = llvm.insertvalue %[[SIZE10]], %[[ARG1_DESC3]][3, 0]
	// Allocate on stack and store to comply with C calling convention.			// CHECK: %[[ARG1_DESC5:.*]] = llvm.insertvalue %[[STRIDE10]], %[[ARG1_DESC4]][4, 0]
	// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
	// CHECK: %[[DESC1_ALLOCA:.*]] = llvm.alloca %[[C1]] x !llvm.struct<(ptr<f32>, ptr<f32>, i64)>			// Unpack descriptor.
	// CHECK: llvm.store %[[DESC13]], %[[DESC1_ALLOCA]]			// CHECK: %[[ALLOC0_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][0]
				// CHECK: %[[ALIGN0_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][1]
	// Call the interface function.			// CHECK: %[[OFFSET0_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][2]
	// CHECK: llvm.call @_mlir_ciface_external			// CHECK: %[[SIZE00_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][3, 0]
				// CHECK: %[[SIZE01_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][3, 1]
	// Verify that an interface function is emitted.			// CHECK: %[[STRIDE00_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][4, 0]
	// CHECK-LABEL: llvm.func @_mlir_ciface_external			// CHECK: %[[STRIDE01_:.*]] = llvm.extractvalue %[[ARG0_DESC7]][4, 1]
	// CHECK: (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>>, !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64)>>)
				// Unpack descriptor.
	// Verify that the return value is not affected.			// CHECK: %[[ALLOC1_:.*]] = llvm.extractvalue %[[ARG1_DESC5]][0]
	// CHECK-LABEL: @returner			// CHECK: %[[ALIGN1_:.*]] = llvm.extractvalue %[[ARG1_DESC5]][1]
	// CHECK: -> !llvm.struct<(struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>, struct<(ptr<f32>, ptr<f32>, i64)>)>			// CHECK: %[[OFFSET1_:.*]] = llvm.extractvalue %[[ARG1_DESC5]][2]
	func private @returner() -> (memref<?x?xf32>, memref<f32>)			// CHECK: %[[SIZE10_:.*]] = llvm.extractvalue %[[ARG1_DESC5]][3, 0]
				// CHECK: %[[STRIDE10_:.*]] = llvm.extractvalue %[[ARG1_DESC5]][4, 0]
	// CHECK-LABEL: @caller
	func @caller() {			// Call the function.
	%0:2 = call @returner() : () -> (memref<?x?xf32>, memref<f32>)			// CHECK: llvm.call @callee_multiple_args(%[[IARG]], %[[ALLOC0_]], %[[ALIGN0_]], %[[OFFSET0_]], %[[SIZE00_]], %[[SIZE01_]], %[[STRIDE00_]], %[[STRIDE01_]], %[[ALLOC1_]], %[[ALIGN1_]], %[[OFFSET1_]], %[[SIZE10_]], %[[STRIDE10_]], %[[FARG]])
	// Extract individual values from the descriptor for the first memref.			// CHECK: llvm.return
	// CHECK: %[[ALLOC0:.*]] = llvm.extractvalue %[[DESC0:.*]][0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK: %[[ALIGN0:.*]] = llvm.extractvalue %[[DESC0]][1]
	// CHECK: %[[OFFSET0:.*]] = llvm.extractvalue %[[DESC0]][2]			func @callee_no_result_unranked(%arg0 : memref<*xf32>) {
	// CHECK: %[[SIZE00:.*]] = llvm.extractvalue %[[DESC0]][3, 0]			%c0 = constant 0 : index
	// CHECK: %[[SIZE01:.*]] = llvm.extractvalue %[[DESC0]][3, 1]			%c1 = constant 1 : index
	// CHECK: %[[STRIDE00:.*]] = llvm.extractvalue %[[DESC0]][4, 0]			%0 = memref.cast %arg0 : memref<*xf32> to memref<?x?xf32>
	// CHECK: %[[STRIDE01:.*]] = llvm.extractvalue %[[DESC0]][4, 1]			%1 = memref.load %0[%c0, %c1] : memref<?x?xf32>
				return
	// Extract individual values from the descriptor for the second memref.			}
	// CHECK: %[[ALLOC1:.*]] = llvm.extractvalue %[[DESC1:.*]][0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
	// CHECK: %[[ALIGN1:.*]] = llvm.extractvalue %[[DESC1]][1]			func @caller_no_result_unranked(%arg0 : memref<*xf32>) {
	// CHECK: %[[OFFSET1:.*]] = llvm.extractvalue %[[DESC1]][2]			call @callee_no_result_unranked(%arg0) : (memref<*xf32>) -> ()
				return
	// Forward the values to the call.			}
	// CHECK: llvm.call @external(%[[ALLOC0]], %[[ALIGN0]], %[[OFFSET0]], %[[SIZE00]], %[[SIZE01]], %[[STRIDE00]], %[[STRIDE01]], %[[ALLOC1]], %[[ALIGN1]], %[[OFFSET1]]) : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64, i64, i64, !llvm.ptr<f32>, !llvm.ptr<f32>, i64) -> ()
	call @external(%0#0, %0#1) : (memref<?x?xf32>, memref<f32>) -> ()			// CHECK-LABEL: llvm.func @caller_no_result_unranked
	return			// CHECK-SAME: %[[ARG_RANK:.*]]: i64, %[[ARG_INNER_DESC:.*]]: !llvm.ptr<i8>
	}
				// Populate the descriptor for arg0.
	// CHECK-LABEL: @callee			// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef
	// EMIT_C_ATTRIBUTE-LABEL: @callee			// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ARG_RANK]], %[[ARG_DESC0]][0]
	func @callee(%arg0: memref<?xf32>, %arg1: index) {			// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ARG_INNER_DESC]], %[[ARG_DESC1]][1]
	%0 = memref.load %arg0[%arg1] : memref<?xf32>
	return			// Unpack descriptor.
	}			// CHECK: %[[ARG_RANK_:.*]] = llvm.extractvalue %[[ARG_DESC2]][0]
				// CHECK: %[[ARG_INNER_DESC_:.*]] = llvm.extractvalue %[[ARG_DESC2]][1]
	// Verify that an interface function is emitted.
	// CHECK-LABEL: @_mlir_ciface_callee			// Call the function.
	// CHECK: %[[ARG0:.*]]: !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>			// CHECK: llvm.call @callee_no_result_unranked(%[[ARG_RANK_]], %[[ARG_INNER_DESC_]])
	// Load the memref descriptor pointer.			// CHECK: llvm.return
	// CHECK: %[[DESC:.*]] = llvm.load %[[ARG0]] : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>

	// Extract individual components of the descriptor.			func @callee_single_result_unranked(%arg0 : memref<*xf32>) -> memref<*xf32> {
	// CHECK: %[[ALLOC:.*]] = llvm.extractvalue %[[DESC]][0]			return %arg0 : memref<*xf32>
	// CHECK: %[[ALIGN:.*]] = llvm.extractvalue %[[DESC]][1]			}
	// CHECK: %[[OFFSET:.*]] = llvm.extractvalue %[[DESC]][2]
	// CHECK: %[[SIZE:.*]] = llvm.extractvalue %[[DESC]][3, 0]			func @caller_single_result_unranked(%arg0 : memref<*xf32>) -> memref<*xf32> {
	// CHECK: %[[STRIDE:.*]] = llvm.extractvalue %[[DESC]][4, 0]			%0 = call @callee_single_result_unranked(%arg0)
				: (memref<*xf32>) -> memref<*xf32>
	// Forward the descriptor components to the call.
	// CHECK: llvm.call @callee(%[[ALLOC]], %[[ALIGN]], %[[OFFSET]], %[[SIZE]], %[[STRIDE]], %{{.*}}) : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64, i64) -> ()

	// EMIT_C_ATTRIBUTE-NOT: @mlir_ciface_callee

	// CHECK-LABEL: @other_callee
	// EMIT_C_ATTRIBUTE-LABEL: @other_callee
	func @other_callee(%arg0: memref<?xf32>, %arg1: index) attributes { llvm.emit_c_interface } {
	%0 = memref.load %arg0[%arg1] : memref<?xf32>
	return
	}

	// CHECK: @_mlir_ciface_other_callee
	// CHECK: llvm.call @other_callee

	// EMIT_C_ATTRIBUTE: @_mlir_ciface_other_callee
	// EMIT_C_ATTRIBUTE: llvm.call @other_callee

	//===========================================================================//
	// Calling convention on returning unranked memrefs.
	//===========================================================================//

	// CHECK-LABEL: llvm.func @return_var_memref_caller
	func @return_var_memref_caller(%arg0: memref<4x3xf32>) {
	// CHECK: %[[CALL_RES:.*]] = llvm.call @return_var_memref
	%0 = call @return_var_memref(%arg0) : (memref<4x3xf32>) -> memref<*xf32>

	// CHECK: %[[ONE:.*]] = llvm.mlir.constant(1 : index)
	// CHECK: %[[TWO:.*]] = llvm.mlir.constant(2 : index)
	// These sizes may depend on the data layout, not matching specific values.
	// CHECK: %[[PTR_SIZE:.*]] = llvm.mlir.constant
	// CHECK: %[[IDX_SIZE:.*]] = llvm.mlir.constant

	// CHECK: %[[DOUBLE_PTR_SIZE:.*]] = llvm.mul %[[TWO]], %[[PTR_SIZE]]
	// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RES]][0] : !llvm.struct<(i64, ptr<i8>)>
	// CHECK: %[[DOUBLE_RANK:.*]] = llvm.mul %[[TWO]], %[[RANK]]
	// CHECK: %[[DOUBLE_RANK_INC:.*]] = llvm.add %[[DOUBLE_RANK]], %[[ONE]]
	// CHECK: %[[TABLES_SIZE:.*]] = llvm.mul %[[DOUBLE_RANK_INC]], %[[IDX_SIZE]]
	// CHECK: %[[ALLOC_SIZE:.*]] = llvm.add %[[DOUBLE_PTR_SIZE]], %[[TABLES_SIZE]]
	// CHECK: %[[FALSE:.*]] = llvm.mlir.constant(false)
	// CHECK: %[[ALLOCA:.*]] = llvm.alloca %[[ALLOC_SIZE]] x i8
	// CHECK: %[[SOURCE:.*]] = llvm.extractvalue %[[CALL_RES]][1]
	// CHECK: "llvm.intr.memcpy"(%[[ALLOCA]], %[[SOURCE]], %[[ALLOC_SIZE]], %[[FALSE]])
	// CHECK: llvm.call @free(%[[SOURCE]])
	// CHECK: %[[DESC:.*]] = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
	// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RES]][0] : !llvm.struct<(i64, ptr<i8>)>
	// CHECK: %[[DESC_1:.*]] = llvm.insertvalue %[[RANK]], %[[DESC]][0]
	// CHECK: llvm.insertvalue %[[ALLOCA]], %[[DESC_1]][1]
	return
	}

	// CHECK-LABEL: llvm.func @return_var_memref
	func @return_var_memref(%arg0: memref<4x3xf32>) -> memref<*xf32> attributes { llvm.emit_c_interface } {
	// Match the construction of the unranked descriptor.
	// CHECK: %[[ALLOCA:.*]] = llvm.alloca
	// CHECK: %[[MEMORY:.*]] = llvm.bitcast %[[ALLOCA]]
	// CHECK: %[[DESC_0:.*]] = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
	// CHECK: %[[DESC_1:.*]] = llvm.insertvalue %{{.*}}, %[[DESC_0]][0]
	// CHECK: %[[DESC_2:.*]] = llvm.insertvalue %[[MEMORY]], %[[DESC_1]][1]
	%0 = memref.cast %arg0: memref<4x3xf32> to memref<*xf32>

	// CHECK: %[[ONE:.*]] = llvm.mlir.constant(1 : index)
	// CHECK: %[[TWO:.*]] = llvm.mlir.constant(2 : index)
	// These sizes may depend on the data layout, not matching specific values.
	// CHECK: %[[PTR_SIZE:.*]] = llvm.mlir.constant
	// CHECK: %[[IDX_SIZE:.*]] = llvm.mlir.constant

	// CHECK: %[[DOUBLE_PTR_SIZE:.*]] = llvm.mul %[[TWO]], %[[PTR_SIZE]]
	// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[DESC_2]][0] : !llvm.struct<(i64, ptr<i8>)>
	// CHECK: %[[DOUBLE_RANK:.*]] = llvm.mul %[[TWO]], %[[RANK]]
	// CHECK: %[[DOUBLE_RANK_INC:.*]] = llvm.add %[[DOUBLE_RANK]], %[[ONE]]
	// CHECK: %[[TABLES_SIZE:.*]] = llvm.mul %[[DOUBLE_RANK_INC]], %[[IDX_SIZE]]
	// CHECK: %[[ALLOC_SIZE:.*]] = llvm.add %[[DOUBLE_PTR_SIZE]], %[[TABLES_SIZE]]
	// CHECK: %[[FALSE:.*]] = llvm.mlir.constant(false)
	// CHECK: %[[ALLOCATED:.*]] = llvm.call @malloc(%[[ALLOC_SIZE]])
	// CHECK: %[[SOURCE:.*]] = llvm.extractvalue %[[DESC_2]][1]
	// CHECK: "llvm.intr.memcpy"(%[[ALLOCATED]], %[[SOURCE]], %[[ALLOC_SIZE]], %[[FALSE]])
	// CHECK: %[[NEW_DESC:.*]] = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
	// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[DESC_2]][0] : !llvm.struct<(i64, ptr<i8>)>
	// CHECK: %[[NEW_DESC_1:.*]] = llvm.insertvalue %[[RANK]], %[[NEW_DESC]][0]
	// CHECK: %[[NEW_DESC_2:.*]] = llvm.insertvalue %[[ALLOCATED]], %[[NEW_DESC_1]][1]
	// CHECK: llvm.return %[[NEW_DESC_2]]
	return %0 : memref<*xf32>			return %0 : memref<*xf32>
	}			}

	// Check that the result memref is passed as parameter			// CHECK-LABEL: llvm.func @caller_single_result_unranked
	// CHECK-LABEL: @_mlir_ciface_return_var_memref			// CHECK-SAME: %[[RESULT_INNER_DESC_BUFFER:.*]]: !llvm.ptr<i8>, %[[ARG_RANK:.*]]: i64, %[[ARG_INNER_DESC:.*]]: !llvm.ptr<i8>
	// CHECK-SAME: (%{{.*}}: !llvm.ptr<struct<(i64, ptr<i8>)>>, %{{.*}}: !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>>)
				// Populate the descriptor for arg0.
	// CHECK-LABEL: llvm.func @return_two_var_memref_caller			// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef
	func @return_two_var_memref_caller(%arg0: memref<4x3xf32>) {			// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ARG_RANK]], %[[ARG_DESC0]][0]
	// Only check that we create two different descriptors using different			// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ARG_INNER_DESC]], %[[ARG_DESC1]][1]
	// memory, and deallocate both sources. The size computation is same as for
	// the single result.			// Allocate descriptor buffers on the stack.
	// CHECK: %[[CALL_RES:.*]] = llvm.call @return_two_var_memref			// CHECK: %[[DEFAULT_DESC_BUFFER_SIZE:.*]] = llvm.mlir.constant(104 : index)
	// CHECK: %[[RES_1:.*]] = llvm.extractvalue %[[CALL_RES]][0]			// CHECK: %[[CALL_INNER_DESC_BUFFER:.*]] = llvm.alloca %[[DEFAULT_DESC_BUFFER_SIZE]] x i8
	// CHECK: %[[RES_2:.*]] = llvm.extractvalue %[[CALL_RES]][1]
	%0:2 = call @return_two_var_memref(%arg0) : (memref<4x3xf32>) -> (memref<*xf32>, memref<*xf32>)			// Unpack descriptor.
				// CHECK: %[[ARG_RANK_:.*]] = llvm.extractvalue %[[ARG_DESC2]][0]
	// CHECK: %[[ALLOCA_1:.*]] = llvm.alloca %{{.*}} x i8			// CHECK: %[[ARG_INNER_DESC_:.*]] = llvm.extractvalue %[[ARG_DESC2]][1]
	// CHECK: %[[SOURCE_1:.*]] = llvm.extractvalue %[[RES_1:.*]][1] : ![[DESC_TYPE:.*]]
	// CHECK: "llvm.intr.memcpy"(%[[ALLOCA_1]], %[[SOURCE_1]], %{{.*}}, %[[FALSE:.*]])			// Call the function.
	// CHECK: llvm.call @free(%[[SOURCE_1]])			// CHECK: %[[CALL_RESULT_DESC:.*]] = llvm.call @callee_single_result_unranked(%[[CALL_INNER_DESC_BUFFER]], %[[ARG_RANK_]], %[[ARG_INNER_DESC_]])
	// CHECK: %[[DESC_1:.*]] = llvm.mlir.undef : ![[DESC_TYPE]]
	// CHECK: %[[DESC_11:.*]] = llvm.insertvalue %{{.*}}, %[[DESC_1]][0]			// Common constant.
	// CHECK: llvm.insertvalue %[[ALLOCA_1]], %[[DESC_11]][1]			// CHECK: %[[MAX_SUPPORTED_RANK:.*]] = llvm.mlir.constant(5 : i64)

	// CHECK: %[[ALLOCA_2:.*]] = llvm.alloca %{{.*}} x i8			// Check if the inner descriptor fits into the buffer argument.
	// CHECK: %[[SOURCE_2:.*]] = llvm.extractvalue %[[RES_2:.*]][1]			// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC]][0]
	// CHECK: "llvm.intr.memcpy"(%[[ALLOCA_2]], %[[SOURCE_2]], %{{.*}}, %[[FALSE]])			// CHECK: %[[PRED:.*]] = llvm.icmp "ule" %[[RANK]], %[[MAX_SUPPORTED_RANK]]
	// CHECK: llvm.call @free(%[[SOURCE_2]])			// CHECK: llvm.cond_br %[[PRED]], ^bb1(%[[CALL_RESULT_DESC]] : !llvm.struct<(i64, ptr<i8>)>), ^bb3
	// CHECK: %[[DESC_2:.*]] = llvm.mlir.undef : ![[DESC_TYPE]]
	// CHECK: %[[DESC_21:.*]] = llvm.insertvalue %{{.*}}, %[[DESC_2]][0]			// At this point, we have the call result descriptor or its copy. In both cases
	// CHECK: llvm.insertvalue %[[ALLOCA_2]], %[[DESC_21]][1]			// the descriptor, including its inner descriptor, is on the stack.
	return			// To return it, we still have to copy it to the descriptor buffer or to
	}			// dynamically allocated memory.
				// CHECK: ^bb1(%[[DESC_OR_CPY:.*]]: !llvm.struct<(i64, ptr<i8>)>):
	// CHECK-LABEL: llvm.func @return_two_var_memref
	func @return_two_var_memref(%arg0: memref<4x3xf32>) -> (memref<*xf32>, memref<*xf32>) attributes { llvm.emit_c_interface } {			// Common constant.
	// Match the construction of the unranked descriptor.			// CHECK: %[[MAX_SUPPORTED_RANK_:.*]] = llvm.mlir.constant(5 : i64)
	// CHECK: %[[ALLOCA:.*]] = llvm.alloca
	// CHECK: %[[MEMORY:.*]] = llvm.bitcast %[[ALLOCA]]			// Compute the final result's inner descriptor size.
	// CHECK: %[[DESC_0:.*]] = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>			// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
	// CHECK: %[[DESC_1:.*]] = llvm.insertvalue %{{.*}}, %[[DESC_0]][0]			// CHECK: %[[C2:.*]] = llvm.mlir.constant(2 : index)
	// CHECK: %[[DESC_2:.*]] = llvm.insertvalue %[[MEMORY]], %[[DESC_1]][1]			// CHECK: %[[C8:.*]] = llvm.mlir.constant(8 : index)
	%0 = memref.cast %arg0 : memref<4x3xf32> to memref<*xf32>			// CHECK: %[[C8_:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[SIZE_PTRS:.*]] = llvm.mul %[[C2]], %[[C8]]
	// Only check that we allocate the memory for each operand of the "return"			// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[DESC_OR_CPY]][0]
	// separately, even if both operands are the same value. The calling			// CHECK: %[[RANK_TWICE:.*]] = llvm.mul %[[C2]], %[[RANK]]
	// convention requires the caller to free them and the caller cannot know			// CHECK: %[[NUM_I64_FIELDS:.*]] = llvm.add %[[RANK_TWICE]], %[[C1]]
	// whether they are the same value or not.			// CHECK: %[[SIZE_I64_FIELDS:.*]] = llvm.mul %[[NUM_I64_FIELDS]], %[[C8_]]
	// CHECK: %[[ALLOCATED_1:.*]] = llvm.call @malloc(%{{.*}})			// CHECK: %[[RESULT_INNER_DESC_SIZE:.*]] = llvm.add %[[SIZE_PTRS]], %[[SIZE_I64_FIELDS]]
	// CHECK: %[[SOURCE_1:.*]] = llvm.extractvalue %[[DESC_2]][1]
	// CHECK: "llvm.intr.memcpy"(%[[ALLOCATED_1]], %[[SOURCE_1]], %{{.*}}, %[[FALSE:.*]])			// Check if the inner descriptor fits into the stack-allocated buffer argument.
	// CHECK: %[[RES_1:.*]] = llvm.mlir.undef			// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[DESC_OR_CPY]][0]
	// CHECK: %[[RES_11:.*]] = llvm.insertvalue %{{.*}}, %[[RES_1]][0]			// CHECK: %[[PRED:.*]] = llvm.icmp "ule" %[[RANK]], %[[MAX_SUPPORTED_RANK_]]
	// CHECK: %[[RES_12:.*]] = llvm.insertvalue %[[ALLOCATED_1]], %[[RES_11]][1]			// CHECK: llvm.cond_br %[[PRED]], ^bb4, ^bb5

	// CHECK: %[[ALLOCATED_2:.*]] = llvm.call @malloc(%{{.*}})			// Copy the inner descriptor to the selected buffer and return a copy of the
	// CHECK: %[[SOURCE_2:.*]] = llvm.extractvalue %[[DESC_2]][1]			// unranked outer descriptor.
	// CHECK: "llvm.intr.memcpy"(%[[ALLOCATED_2]], %[[SOURCE_2]], %{{.*}}, %[[FALSE]])			// CHECK: ^bb2(%[[SELECTED_BUFFER:.*]]: !llvm.ptr<i8>):
	// CHECK: %[[RES_2:.*]] = llvm.mlir.undef			// CHECK: %[[CALL_RESULT_INNER_DESC:.*]] = llvm.extractvalue %[[DESC_OR_CPY]][1]
	// CHECK: %[[RES_21:.*]] = llvm.insertvalue %{{.*}}, %[[RES_2]][0]			// CHECK: %[[C0:.*]] = llvm.mlir.constant(false)
	// CHECK: %[[RES_22:.*]] = llvm.insertvalue %[[ALLOCATED_2]], %[[RES_21]][1]			// CHECK: "llvm.intr.memcpy"(%[[SELECTED_BUFFER]], %[[CALL_RESULT_INNER_DESC]], %[[RESULT_INNER_DESC_SIZE]], %[[C0]])
				// CHECK: %[[RESULT_DESC0:.*]] = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
	// CHECK: %[[RESULTS:.*]] = llvm.mlir.undef : !llvm.struct<(struct<(i64, ptr<i8>)>, struct<(i64, ptr<i8>)>)>			// CHECK: %[[RESULT_DESC1:.*]] = llvm.insertvalue %[[RANK]], %[[RESULT_DESC0]][0]
	// CHECK: %[[RESULTS_1:.*]] = llvm.insertvalue %[[RES_12]], %[[RESULTS]]			// CHECK: %[[RESULT_DESC2:.*]] = llvm.insertvalue %[[SELECTED_BUFFER]], %[[RESULT_DESC1]][1]
	// CHECK: %[[RESULTS_2:.*]] = llvm.insertvalue %[[RES_22]], %[[RESULTS_1]]			// CHECK: llvm.return %[[RESULT_DESC2]]
	// CHECK: llvm.return %[[RESULTS_2]]
	return %0, %0 : memref<*xf32>, memref<*xf32>			// Copy the call result descriptor to stack-allocated memory.
	}			// This is the case in which it did not fit into the pre-allocated buffer. We
				// have to free the dynamically allocated inner descriptor and copy it over to
	// Check that the result memrefs are passed as parameter			// the stack.
	// CHECK-LABEL: @_mlir_ciface_return_two_var_memref			// CHECK: ^bb3:
	// CHECK-SAME: (%{{.*}}: !llvm.ptr<struct<(struct<(i64, ptr<i8>)>, struct<(i64, ptr<i8>)>)>>,
	// CHECK-SAME: %{{.*}}: !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>>)			// Compute the call result's inner descriptor size.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[C2:.*]] = llvm.mlir.constant(2 : index)
				// CHECK: %[[C8:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[C8_:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[SIZE_PTRS:.*]] = llvm.mul %[[C2]], %[[C8]]
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC]][0]
				// CHECK: %[[RANK_TWICE:.*]] = llvm.mul %[[C2]], %[[RANK]]
				// CHECK: %[[NUM_I64_FIELDS:.*]] = llvm.add %[[RANK_TWICE]], %[[C1]]
				// CHECK: %[[SIZE_I64_FIELDS:.*]] = llvm.mul %[[NUM_I64_FIELDS]], %[[C8_]]
				// CHECK: %[[CALL_RESULT_INNER_DESC_SIZE:.*]] = llvm.add %[[SIZE_PTRS]], %[[SIZE_I64_FIELDS]]

				// Stack-allocate a buffer for the call result's inner descriptor and copy it
				// over. Also, free the previously dynamically allocated inner descriptor.
				// CHECK: %[[INNER_DESC:.*]] = llvm.alloca %[[CALL_RESULT_INNER_DESC_SIZE]] x i8
				// CHECK: %[[DYN_INNER_DESC:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC]][1]
				// CHECK: %[[C0:.*]] = llvm.mlir.constant(false)
				// CHECK: "llvm.intr.memcpy"(%[[INNER_DESC]], %[[DYN_INNER_DESC]], %[[CALL_RESULT_INNER_DESC_SIZE]], %[[C0]])
				// CHECK: llvm.call @free(%[[DYN_INNER_DESC]])
				// CHECK: %[[CALL_RESULT_DESC_CPY0:.*]] = llvm.mlir.undef
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC]][0]
				// CHECK: %[[CALL_RESULT_DESC_CPY1:.*]] = llvm.insertvalue %[[RANK]], %[[CALL_RESULT_DESC_CPY0]][0]
				// CHECK: %[[CALL_RESULT_DESC_CPY2:.*]] = llvm.insertvalue %[[INNER_DESC]], %[[CALL_RESULT_DESC_CPY1]][1]
				// CHECK: llvm.br ^bb1(%[[CALL_RESULT_DESC_CPY2]] : !llvm.struct<(i64, ptr<i8>)>)

				// Select the buffer argument to copy the result's inner descriptor to.
				// CHECK: ^bb4:
				// CHECK: llvm.br ^bb2(%[[RESULT_INNER_DESC_BUFFER]] : !llvm.ptr<i8>)

				// Dynamically allocate a new buffer to copy the result's inner descriptor to.
				// CHECK: ^bb5:
				// CHECK: %[[NEW_BUFFER:.*]] = llvm.call @malloc(%[[RESULT_INNER_DESC_SIZE]])
				// CHECK: llvm.br ^bb2(%[[NEW_BUFFER]] : !llvm.ptr<i8>)


				func @callee_multiple_result_unranked(%arg0 : memref<*xf32>) -> (f32, i64,
				memref<*xf32>, memref<*xf32>) {
				%pi = constant 3.141 : f32
				%c3 = constant 3 : i64
				return %pi, %c3, %arg0, %arg0 : f32, i64, memref<*xf32>, memref<*xf32>
				}

				func @caller_multiple_result_unranked(%arg0 : memref<*xf32>)
				-> (f32, i64, memref<*xf32>, memref<*xf32>) {
				%0:4 = call @callee_multiple_result_unranked(%arg0) : (memref<*xf32>)
				-> (f32, i64, memref<*xf32>, memref<*xf32>)
				return %0#0, %0#1, %0#2, %0#3 : f32, i64, memref<*xf32>, memref<*xf32>
				}

				// CHECK-LABEL: llvm.func @caller_multiple_result_unranked
				// CHECK-SAME: %[[RESULT_INNER_DESC_BUFFER0:arg0]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[RESULT_INNER_DESC_BUFFER1:arg1]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[ARG_RANK:.*]]: i64, %[[ARG_INNER_DESC:.*]]: !llvm.ptr<i8>

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG_DESC1:.*]] = llvm.insertvalue %[[ARG_RANK]], %[[ARG_DESC0]][0]
				// CHECK: %[[ARG_DESC2:.*]] = llvm.insertvalue %[[ARG_INNER_DESC]], %[[ARG_DESC1]][1]

				// Allocate descriptor buffers on the stack.
				// CHECK: %[[DEFAULT_DESC_BUFFER_SIZE:.*]] = llvm.mlir.constant(104 : index)
				// CHECK: %[[CALL_INNER_DESC_BUFFER0:.*]] = llvm.alloca %[[DEFAULT_DESC_BUFFER_SIZE]] x i8
				// CHECK: %[[CALL_INNER_DESC_BUFFER1:.*]] = llvm.alloca %[[DEFAULT_DESC_BUFFER_SIZE]] x i8
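				// Worked size check (a sketch, assuming 8-byte pointers and 8-byte index
				// values, as in the size computations matched further down): an inner
				// descriptor of rank r holds two pointers plus an offset, r sizes and
				// r strides, i.e. 2 * 8 + (2 * r + 1) * 8 bytes. For the maximum supported
				// rank of 5 this gives 16 + 88 = 104, which matches the default buffer size
				// constant above.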

				// Unpack descriptor.
				// CHECK: %[[ARG_RANK_:.*]] = llvm.extractvalue %[[ARG_DESC2]][0]
				// CHECK: %[[ARG_INNER_DESC_:.*]] = llvm.extractvalue %[[ARG_DESC2]][1]

				// Call the function.
				// CHECK: %[[CALL_RESULT:.*]] = llvm.call @callee_multiple_result_unranked(%[[CALL_INNER_DESC_BUFFER0]], %[[CALL_INNER_DESC_BUFFER1]], %[[ARG_RANK_]], %[[ARG_INNER_DESC_]])

				// Unpack call result.
				// CHECK: %[[FRESULT:.*]] = llvm.extractvalue %[[CALL_RESULT]][0]
				// CHECK: %[[IRESULT:.*]] = llvm.extractvalue %[[CALL_RESULT]][1]
				// CHECK: %[[CALL_RESULT_DESC0:.*]] = llvm.extractvalue %[[CALL_RESULT]][2]
				// CHECK: %[[CALL_RESULT_DESC1:.*]] = llvm.extractvalue %[[CALL_RESULT]][3]

				// Common constant.
				// CHECK: %[[MAX_SUPPORTED_RANK:.*]] = llvm.mlir.constant(5 : i64)

				// Check if the first call result inner descriptor fits into its buffer argument
				// and copy it to a new stack-allocated buffer otherwise.
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC0]][0]
				// CHECK: %[[PRED:.*]] = llvm.icmp "ule" %[[RANK]], %[[MAX_SUPPORTED_RANK]]
				// CHECK: llvm.cond_br %[[PRED]], ^bb1(%[[CALL_RESULT_DESC0]] : !llvm.struct<(i64, ptr<i8>)>), ^bb5

				// At this point, we have the first call result descriptor or its copy.
				// CHECK: ^bb1(%[[DESC_OR_CPY0:.*]]: !llvm.struct<(i64, ptr<i8>)>):

				// Check if the second call result inner descriptor fits into its buffer
				// argument and copy it to a new stack-allocated buffer otherwise.
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC1]][0]
				// CHECK: %[[PRED:.*]] = llvm.icmp "ule" %[[RANK]], %[[MAX_SUPPORTED_RANK]]
				// CHECK: llvm.cond_br %[[PRED]], ^bb2(%[[CALL_RESULT_DESC1]] : !llvm.struct<(i64, ptr<i8>)>), ^bb6

				// At this point, we have the call result descriptors or their copies. In both
				// cases the descriptors, including their inner descriptors, are on the stack.
				// To return them, we still have to copy them to the argument buffers or to
				// dynamically allocated memory.
				// CHECK: ^bb2(%[[DESC_OR_CPY1:.*]]: !llvm.struct<(i64, ptr<i8>)>):

				// Common constant.
				// CHECK: %[[MAX_SUPPORTED_RANK_:.*]] = llvm.mlir.constant(5 : i64)

				// Compute the result's first inner descriptor size.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[C2:.*]] = llvm.mlir.constant(2 : index)
				// CHECK: %[[C8:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[C8_:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[SIZE_PTRS:.*]] = llvm.mul %[[C2]], %[[C8]]
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[DESC_OR_CPY0]][0]
				// CHECK: %[[RANK_TWICE:.*]] = llvm.mul %[[C2]], %[[RANK]]
				// CHECK: %[[NUM_I64_FIELDS:.*]] = llvm.add %[[RANK_TWICE]], %[[C1]]
				// CHECK: %[[SIZE_I64_FIELDS:.*]] = llvm.mul %[[NUM_I64_FIELDS]], %[[C8_]]
				// CHECK: %[[RESULT_INNER_DESC_SIZE0:.*]] = llvm.add %[[SIZE_PTRS]], %[[SIZE_I64_FIELDS]]

				// Check if the inner descriptor fits into the buffer argument.
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[DESC_OR_CPY0]][0]
				// CHECK: %[[PRED:.*]] = llvm.icmp "ule" %[[RANK]], %[[MAX_SUPPORTED_RANK_]]
				// CHECK: llvm.cond_br %[[PRED]], ^bb7, ^bb8

				// Copy the call result's first inner descriptor to the selected buffer and
				// create a copy of the unranked outer descriptor.
				// CHECK: ^bb3(%[[SELECTED_BUFFER0:.*]]: !llvm.ptr<i8>):
				// CHECK: %[[CALL_RESULT_INNER_DESC0:.*]] = llvm.extractvalue %[[DESC_OR_CPY0]][1]
				// CHECK: %[[C0:.*]] = llvm.mlir.constant(false)
				// CHECK: "llvm.intr.memcpy"(%[[SELECTED_BUFFER0]], %[[CALL_RESULT_INNER_DESC0]], %[[RESULT_INNER_DESC_SIZE0]], %[[C0]])
				// CHECK: %[[RESULT0_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[RESULT0_DESC1:.*]] = llvm.insertvalue %[[RANK]], %[[RESULT0_DESC0]][0]
				// CHECK: %[[RESULT0_DESC2:.*]] = llvm.insertvalue %[[SELECTED_BUFFER0]], %[[RESULT0_DESC1]][1]

				// Compute the result's second inner descriptor size.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[C2:.*]] = llvm.mlir.constant(2 : index)
				// CHECK: %[[C8:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[C8_:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[SIZE_PTRS:.*]] = llvm.mul %[[C2]], %[[C8]]
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[DESC_OR_CPY1]][0]
				// CHECK: %[[RANK_TWICE:.*]] = llvm.mul %[[C2]], %[[RANK]]
				// CHECK: %[[NUM_I64_FIELDS:.*]] = llvm.add %[[RANK_TWICE]], %[[C1]]
				// CHECK: %[[SIZE_I64_FIELDS:.*]] = llvm.mul %[[NUM_I64_FIELDS]], %[[C8_]]
				// CHECK: %[[RESULT_INNER_DESC_SIZE1:.*]] = llvm.add %[[SIZE_PTRS]], %[[SIZE_I64_FIELDS]]

				// Check if the inner descriptor fits into the buffer argument.
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[DESC_OR_CPY1]][0]
				// CHECK: %[[PRED:.*]] = llvm.icmp "ule" %[[RANK]], %[[MAX_SUPPORTED_RANK_]]
				// CHECK: llvm.cond_br %[[PRED]], ^bb9, ^bb10

				// Copy the call result's second inner descriptor to the selected buffer and
				// create a copy of the unranked outer descriptor.
				// CHECK: ^bb4(%[[SELECTED_BUFFER1:.*]]: !llvm.ptr<i8>):
				// CHECK: %[[CALL_RESULT_INNER_DESC1:.*]] = llvm.extractvalue %[[DESC_OR_CPY1]][1]
				// CHECK: %[[C0:.*]] = llvm.mlir.constant(false)
				// CHECK: "llvm.intr.memcpy"(%[[SELECTED_BUFFER1]], %[[CALL_RESULT_INNER_DESC1]], %[[RESULT_INNER_DESC_SIZE1]], %[[C0]])
				// CHECK: %[[RESULT1_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[RESULT1_DESC1:.*]] = llvm.insertvalue %[[RANK]], %[[RESULT1_DESC0]][0]
				// CHECK: %[[RESULT1_DESC2:.*]] = llvm.insertvalue %[[SELECTED_BUFFER1]], %[[RESULT1_DESC1]][1]

				// Pack the final result and return it.
				// CHECK: %[[RESULT0:.*]] = llvm.mlir.undef
				// CHECK: %[[RESULT1:.*]] = llvm.insertvalue %[[FRESULT]], %[[RESULT0]][0]
				// CHECK: %[[RESULT2:.*]] = llvm.insertvalue %[[IRESULT]], %[[RESULT1]][1]
				// CHECK: %[[RESULT3:.*]] = llvm.insertvalue %[[RESULT0_DESC2]], %[[RESULT2]][2]
				// CHECK: %[[RESULT4:.*]] = llvm.insertvalue %[[RESULT1_DESC2]], %[[RESULT3]][3]
				// CHECK: llvm.return %[[RESULT4]]

				// Copy the call result's first descriptor to stack-allocated memory.
				// This is the case in which it did not fit into the pre-allocated buffer.
				// CHECK: ^bb5:

				// Compute the descriptor size.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[C2:.*]] = llvm.mlir.constant(2 : index)
				// CHECK: %[[C8:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[C8_:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[SIZE_PTRS:.*]] = llvm.mul %[[C2]], %[[C8]]
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC0]][0]
				// CHECK: %[[RANK_TWICE:.*]] = llvm.mul %[[C2]], %[[RANK]]
				// CHECK: %[[NUM_I64_FIELDS:.*]] = llvm.add %[[RANK_TWICE]], %[[C1]]
				// CHECK: %[[SIZE_I64_FIELDS:.*]] = llvm.mul %[[NUM_I64_FIELDS]], %[[C8_]]
				// CHECK: %[[CALL_RESULT_INNER_DESC_SIZE0:.*]] = llvm.add %[[SIZE_PTRS]], %[[SIZE_I64_FIELDS]]

				// Stack-allocate a buffer for the call result's first inner descriptor and copy
				// it over. Also, free the previously dynamically allocated inner descriptor.
				// CHECK: %[[INNER_DESC:.*]] = llvm.alloca %[[CALL_RESULT_INNER_DESC_SIZE0]] x i8
				// CHECK: %[[DYN_INNER_DESC:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC0]][1]
				// CHECK: %[[C0:.*]] = llvm.mlir.constant(false)
				// CHECK: "llvm.intr.memcpy"(%[[INNER_DESC]], %[[DYN_INNER_DESC]], %[[CALL_RESULT_INNER_DESC_SIZE0]], %[[C0]])
				// CHECK: llvm.call @free(%[[DYN_INNER_DESC]])
				// CHECK: %[[CALL_RESULT_DESC0_CPY0:.*]] = llvm.mlir.undef
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC0]][0]
				// CHECK: %[[CALL_RESULT_DESC0_CPY1:.*]] = llvm.insertvalue %[[RANK]], %[[CALL_RESULT_DESC0_CPY0]][0]
				// CHECK: %[[CALL_RESULT_DESC0_CPY2:.*]] = llvm.insertvalue %[[INNER_DESC]], %[[CALL_RESULT_DESC0_CPY1]][1]
				// CHECK: llvm.br ^bb1(%[[CALL_RESULT_DESC0_CPY2]] : !llvm.struct<(i64, ptr<i8>)>)

				// Copy the call result's second descriptor to stack-allocated memory.
				// This is the case in which it did not fit into the pre-allocated buffer.
				// CHECK: ^bb6:

				// Compute the descriptor size.
				// CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index)
				// CHECK: %[[C2:.*]] = llvm.mlir.constant(2 : index)
				// CHECK: %[[C8:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[C8_:.*]] = llvm.mlir.constant(8 : index)
				// CHECK: %[[SIZE_PTRS:.*]] = llvm.mul %[[C2]], %[[C8]]
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC1]][0]
				// CHECK: %[[RANK_TWICE:.*]] = llvm.mul %[[C2]], %[[RANK]]
				// CHECK: %[[NUM_I64_FIELDS:.*]] = llvm.add %[[RANK_TWICE]], %[[C1]]
				// CHECK: %[[SIZE_I64_FIELDS:.*]] = llvm.mul %[[NUM_I64_FIELDS]], %[[C8_]]
				// CHECK: %[[CALL_RESULT_INNER_DESC_SIZE1:.*]] = llvm.add %[[SIZE_PTRS]], %[[SIZE_I64_FIELDS]]

				// Stack-allocate a buffer for the call result's second inner descriptor and
				// copy it over. Also, free the previously dynamically allocated inner
				// descriptor.
				// CHECK: %[[INNER_DESC:.*]] = llvm.alloca %[[CALL_RESULT_INNER_DESC_SIZE1]] x i8
				// CHECK: %[[DYN_INNER_DESC:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC1]][1]
				// CHECK: %[[C0:.*]] = llvm.mlir.constant(false)
				// CHECK: "llvm.intr.memcpy"(%[[INNER_DESC]], %[[DYN_INNER_DESC]], %[[CALL_RESULT_INNER_DESC_SIZE1]], %[[C0]])
				// CHECK: llvm.call @free(%[[DYN_INNER_DESC]])
				// CHECK: %[[CALL_RESULT_DESC1_CPY0:.*]] = llvm.mlir.undef
				// CHECK: %[[RANK:.*]] = llvm.extractvalue %[[CALL_RESULT_DESC1]][0]
				// CHECK: %[[CALL_RESULT_DESC1_CPY1:.*]] = llvm.insertvalue %[[RANK]], %[[CALL_RESULT_DESC1_CPY0]][0]
				// CHECK: %[[CALL_RESULT_DESC1_CPY2:.*]] = llvm.insertvalue %[[INNER_DESC]], %[[CALL_RESULT_DESC1_CPY1]][1]
				// CHECK: llvm.br ^bb2(%[[CALL_RESULT_DESC1_CPY2]] : !llvm.struct<(i64, ptr<i8>)>)

				// Select the buffer argument to copy the result's first inner descriptor to.
				// CHECK: ^bb7:
				// CHECK: llvm.br ^bb3(%[[RESULT_INNER_DESC_BUFFER0]] : !llvm.ptr<i8>)

				// Dynamically allocate a new buffer to copy the result's first inner descriptor
				// to.
				// CHECK: ^bb8:
				// CHECK: %[[NEW_BUFFER:.*]] = llvm.call @malloc(%[[RESULT_INNER_DESC_SIZE0]])
				// CHECK: llvm.br ^bb3(%[[NEW_BUFFER]] : !llvm.ptr<i8>)

				// Select the buffer argument to copy the result's second inner descriptor to.
				// CHECK: ^bb9:
				// CHECK: llvm.br ^bb4(%[[RESULT_INNER_DESC_BUFFER1]] : !llvm.ptr<i8>)

				// Dynamically allocate a new buffer to copy the result's second inner
				// descriptor to.
				// CHECK: ^bb10:
				// CHECK: %[[NEW_BUFFER:.*]] = llvm.call @malloc(%[[RESULT_INNER_DESC_SIZE1]])
				// CHECK: llvm.br ^bb4(%[[NEW_BUFFER]] : !llvm.ptr<i8>)
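
The size computation and buffer selection verified by the CHECK lines above can be sketched in plain C++. This is only an illustrative model, not the code in `MemRefBuilder.cpp`: the names `UnrankedDescriptor`, `innerDescriptorSize`, and `copyToResult` are hypothetical, and it assumes 8-byte pointers and 8-byte index values, as the constants in the test suggest.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical mirror of the lowered type !llvm.struct<(i64, ptr<i8>)>.
struct UnrankedDescriptor {
  std::int64_t rank;
  void *innerDescriptor;
};

// Size in bytes of the inner (ranked) descriptor, assuming 8-byte pointers and
// 8-byte integers: two pointers plus 2 * rank + 1 integer fields.
inline std::size_t innerDescriptorSize(std::int64_t rank) {
  const std::size_t sizePtrs = 2 * 8;
  const std::size_t numI64Fields = 2 * static_cast<std::size_t>(rank) + 1;
  return sizePtrs + numI64Fields * 8;
}

// Copies the inner descriptor into `buffer` if the rank does not exceed
// `maxSupportedRank` (unsigned comparison, like icmp "ule"); otherwise copies
// it into freshly malloc'ed memory. `usedHeap` reports which path was taken.
inline UnrankedDescriptor copyToResult(const UnrankedDescriptor &source,
                                       void *buffer,
                                       std::int64_t maxSupportedRank,
                                       bool &usedHeap) {
  assert(source.innerDescriptor != nullptr && "expected a valid inner descriptor");
  const std::size_t size = innerDescriptorSize(source.rank);
  const bool fits = static_cast<std::uint64_t>(source.rank) <=
                    static_cast<std::uint64_t>(maxSupportedRank);
  usedHeap = !fits;
  void *target = fits ? buffer : std::malloc(size);
  if (target == nullptr)
    std::abort(); // Allocation failure is not modelled in this sketch.
  std::memcpy(target, source.innerDescriptor, size);
  return UnrankedDescriptor{source.rank, target};
}
```

With the maximum supported rank of 5 used in this test, the buffer argument needs to hold 2 * 8 + (2 * 5 + 1) * 8 = 104 bytes. When the heap path is taken, the caller of this sketch owns the returned memory and has to `free` it.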


				func @callee_multiple_args_unranked(%arg0 : memref<*xf32>, %arg1 : f32,
				%arg2 : memref<*xf32>, %arg3 : index) {
				%c0 = constant 0 : index
				%0 = memref.cast %arg0 : memref<*xf32> to memref<?x?xf32>
				%1 = memref.load %0[%c0, %arg3] : memref<?x?xf32>
				%2 = memref.cast %arg2 : memref<*xf32> to memref<?xf32>
				%3 = memref.load %2[%arg3] : memref<?xf32>
				return
				}

				func @caller_multiple_args_unranked(%arg0 : memref<*xf32>, %arg1 : f32,
				%arg2 : memref<*xf32>, %arg3 : index) {
				call @callee_multiple_args_unranked(%arg0, %arg1, %arg2, %arg3)
				: (memref<*xf32>, f32, memref<*xf32>, index) -> ()
				return
				}

				// CHECK-LABEL: llvm.func @caller_multiple_args_unranked
				// CHECK-SAME: %[[ARG0_RANK:.*]]: i64, %[[ARG0_INNER_DESC:arg1]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[FARG:arg2]]: f32,
				// CHECK-SAME: %[[ARG1_RANK:.*]]: i64, %[[ARG1_INNER_DESC:arg4]]: !llvm.ptr<i8>,
				// CHECK-SAME: %[[IARG:.*]]: i64

				// Populate the descriptor for arg0.
				// CHECK: %[[ARG0_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG0_DESC1:.*]] = llvm.insertvalue %[[ARG0_RANK]], %[[ARG0_DESC0]][0]
				// CHECK: %[[ARG0_DESC2:.*]] = llvm.insertvalue %[[ARG0_INNER_DESC]], %[[ARG0_DESC1]][1]

				// Populate the descriptor for arg2.
				// CHECK: %[[ARG1_DESC0:.*]] = llvm.mlir.undef
				// CHECK: %[[ARG1_DESC1:.*]] = llvm.insertvalue %[[ARG1_RANK]], %[[ARG1_DESC0]][0]
				// CHECK: %[[ARG1_DESC2:.*]] = llvm.insertvalue %[[ARG1_INNER_DESC]], %[[ARG1_DESC1]][1]

				// Unpack descriptor for arg0.
				// CHECK: %[[ARG0_RANK:.*]] = llvm.extractvalue %[[ARG0_DESC2]][0]
				// CHECK: %[[ARG0_INNER_DESC:.*]] = llvm.extractvalue %[[ARG0_DESC2]][1]

				// Unpack descriptor for arg2.
				// CHECK: %[[ARG1_RANK:.*]] = llvm.extractvalue %[[ARG1_DESC2]][0]
				// CHECK: %[[ARG1_INNER_DESC:.*]] = llvm.extractvalue %[[ARG1_DESC2]][1]

				// Call the function and return.
				// CHECK: llvm.call @callee_multiple_args_unranked(%[[ARG0_RANK]], %[[ARG0_INNER_DESC]], %[[FARG]], %[[ARG1_RANK]], %[[ARG1_INNER_DESC]], %[[IARG]])
				// CHECK: llvm.return
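
The argument expansion checked above, in which each unranked memref becomes a rank and an inner-descriptor pointer, can be modelled roughly as follows. The names `UnrankedDescriptor`, `ExpandedArgs`, and `expandArguments` are hypothetical and exist only for this illustration; `index` is assumed to lower to `i64`, as in the CHECK-SAME lines.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical mirror of the lowered type !llvm.struct<(i64, ptr<i8>)>.
struct UnrankedDescriptor {
  std::int64_t rank;
  void *innerDescriptor;
};

// Flat argument list for (memref<*xf32>, f32, memref<*xf32>, index):
// (i64, ptr, f32, i64, ptr, i64).
using ExpandedArgs = std::tuple<std::int64_t, void *, float,
                                std::int64_t, void *, std::int64_t>;

// Unpacks both unranked descriptors into their rank and pointer fields and
// passes the scalar arguments through unchanged.
inline ExpandedArgs expandArguments(const UnrankedDescriptor &arg0, float arg1,
                                    const UnrankedDescriptor &arg2,
                                    std::int64_t arg3) {
  return ExpandedArgs(arg0.rank, arg0.innerDescriptor, arg1,
                      arg2.rank, arg2.innerDescriptor, arg3);
}
```

The order of the expanded values follows the order of the original arguments, which is what the final `llvm.call` check relies on.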

Revision Contents

Diff 377522
