Page MenuHomePhabricator

[mlir] support returning unranked memrefs
ClosedPublic

Authored by ftynse on Fri, Jun 26, 5:35 AM.

Details

Summary

Initially, unranked memref descriptors in the LLVM dialect were designed only
to be passed into functions. An assertion was guarding against returning
unranked memrefs from functions in the standard-to-LLVM conversion. This is
insufficient for functions that wish to return an unranked memref such that the
caller does not know the rank in advance, and hence cannot allocate the
descriptor and pass it in as an argument.

Introduce a calling convention for returning unranked memref descriptors as
follows. An unranked memref descriptor always points to a ranked memref
descriptor stored on stack of the current function. When an unranked memref
descriptor is returned from a function, the ranked memref descriptor it points
to is copied to dynamically allocated memory, the ownership of which is
transferred to the caller. The caller is responsible for deallocating the
dynamically allocated memory and for copying the pointed-to ranked memref
descriptor onto its stack.

Provide default lowerings for std.return, std.call and std.indirect_call that
maintain the conversion defined above.

This convention is additionally exercised by a runtime test to guard against
memory errors.

Diff Detail

Event Timeline

ftynse created this revision.Fri, Jun 26, 5:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptFri, Jun 26, 5:35 AM
herhut accepted this revision.Fri, Jun 26, 6:25 AM

Thanks for adding this and unblocking the use of unranked results!

This can be optimized in many ways to reduce the number of allocations that is required. Some ideas we discussed offline (just to document them):

  • only allocate once for all returned descriptors
  • the caller allocates some scratchpad memory on stack and passes it in as arg 0 if the called function returns unranked memrefs. The size could be so that it can hold e.g. rank-5 descriptors for all results. That memory is then used for return values and only if it is not large enough, we allocate.

However, that is future work. Getting it supported and correct is more important.

mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1918

nit: remove on descriptors.

1927

Nit: operands.

1966

Move out of loop?

2044

Not doing an early return would avoid duplication here.

This revision is now accepted and ready to land.Fri, Jun 26, 6:25 AM
This revision was automatically updated to reflect the committed changes.
ftynse marked 2 inline comments as done.
rriddle added inline comments.Fri, Jun 26, 12:01 PM
mlir/lib/Conversion/StandardToLLVM/StandardToLLVM.cpp
1930

nit: Flip the condition and drop the braces?

mehdi_amini added inline comments.Mon, Jun 29, 8:34 PM
mlir/docs/ConversionToLLVMDialect.md
323

Is this correct?
For unranked memrefs [...] same as the unranked memref descriptor

325

stack allocation threw me off at first because I was wondering what prevents a heap alloc.

328

This last sentence is really not clear to me. What's specific about thread safety here? What's about removing stack allocations? What should a caller do here? It this about optimization?

376

I don't understand how the fact that it is allocated on the stack or not is part of the convention? The lifetime aspect is also not clear to me: is the motivation to be able to reuse this stack memory in the function itself when it is done with the memref?

378

The part about stack overflows seems to indicate that the lifetime is about the caller and not the callee. This seems all implementation details of the lowering and not part of the calling convention.

381

Typo: accordingly

ftynse marked 7 inline comments as done.Tue, Jun 30, 12:20 PM
mlir/docs/ConversionToLLVMDialect.md
323

Yes, it's calling convention about about unpacking the { i64, i8* } struct into two separate arguments

325

If we allocate on heap when lowering the cast, we need some lifetime analysis to understand where to insert the corresponding deallocation. Allocation on stack does not have this problem.

328

This was here way before the current patch. Thread safety was a concern when we were passing a pointer to the descriptor that could be mutated, but I don't think it's a problem in the current model. This should be rephrased as something like "the caller is in charge of managing the allocation".

376

The motivation is that the caller guarantees to the callee that the pointer lives longer than the callee.

378

I can factor this out into some "unranked memref lifetime management" section, but it will only scatter the information that is relevant here.

mehdi_amini added inline comments.Tue, Jun 30, 1:58 PM
mlir/docs/ConversionToLLVMDialect.md
325

I guess you're trying to document some specific lowering while I'm reading "calling convention independently" of any lowering.

328

Makes sense.

376

OK but the "stack" aspect does not seem to belong still, and the sentence is overly restrictive: as written it says that the lifetime ends with the function instead of outlive the function. This is a big difference because inside the function it impacts whether I can clobber the memory or not.

378

The issue to me (across all these comments above) is really separating the specification of the "calling convention" from a particular lowering strategy.

ftynse marked an inline comment as done.Wed, Jul 1, 1:01 AM
ftynse added inline comments.
mlir/docs/ConversionToLLVMDialect.md
325

Well, this is a side effect of MLIR: a calling convention is nothing else but the set of compatible patterns lowering std.func, std.call, std.indirect_call and std.return. I don't think it's really separable, and there are users that use different patterns and get a different convention (eg "bare pointer")

mehdi_amini added inline comments.Sat, Jul 4, 2:46 PM
mlir/docs/ConversionToLLVMDialect.md
325

I am not sure I follow: the fact that the caller is turned into a stack allocation seems like a pure implementation details of a lowering pattern. The signature of the function (the actual calling convention) isn't: it describes how anyone could implement their lowering to call such a function, or to lower their own function definition so that they can be called by the standard call.

You seem to conflate both aspects together here, while they are fundamentally different to me.