Page MenuHomePhabricator

[mlir:Bytecode] Add shared_ptr<SourceMgr> overloads to allow safe mmap of data
ClosedPublic

Authored by rriddle on Dec 5 2022, 1:29 PM.

Details

Summary

The bytecode reader currently has no mechanism that allows for directly referencing
data from the input buffer safely. This commit adds shared_ptr<SourceMgr> overloads
that provide an explicit and safe way of extending the lifetime of the input. The usage of
these new overloads is adopted in all of our tooling, and is implicitly used in the filename
only parser methods.

Diff Detail

Event Timeline

rriddle created this revision.Dec 5 2022, 1:29 PM
rriddle requested review of this revision.Dec 5 2022, 1:29 PM

Yay for timing as I was just talking about mmap'd resources.

Do we need shared_ptr and do we need it with the IR? (I have a strong dislike with shared pointers as you know :)). We did talk about it before, but just to refresh.

Do we expect the SourceMgr to be scoped to a MLIRContext or be shared between many?

Yay for timing as I was just talking about mmap'd resources.

Do we need shared_ptr and do we need it with the IR? (I have a strong dislike with shared pointers as you know :)). We did talk about it before, but just to refresh.

We need something that allows for extending a reference, otherwise there is no way to correctly and properly extend the lifetime. shared_ptr was the most effective that required least amount of mucking about. Do you have an alternate suggestion?

Do we expect the SourceMgr to be scoped to a MLIRContext or be shared between many?

I would expect in most cases the SourceMgr is for that specific context (given that most cases only use a single source manager anyways).

jpienaar accepted this revision.Dec 6 2022, 5:06 PM

Makes sense, thanks for explaining / clarifying.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
567

Is this intentionally non-mutable only here?

mlir/lib/Target/LLVMIR/ConvertFromLLVMIR.cpp
1271 ↗(On Diff #480225)

Could we have an overload for original case? (to reduce number of changes needed for cases where this optimization isn't needed and folks opt-in to higher performance one). [not blocking]

This revision is now accepted and ready to land.Dec 6 2022, 5:06 PM

Yay for timing as I was just talking about mmap'd resources.

Do we need shared_ptr and do we need it with the IR? (I have a strong dislike with shared pointers as you know :)). We did talk about it before, but just to refresh.

We need something that allows for extending a reference, otherwise there is no way to correctly and properly extend the lifetime. shared_ptr was the most effective that required least amount of mucking about. Do you have an alternate suggestion?

An alternative way is to use parsing flags, so that instead of implicit extension of lifetime it could be just explicit on the client side: "here is an input buffer that I promise to keep alive beyond the IR".

Do we expect the SourceMgr to be scoped to a MLIRContext or be shared between many?

I would expect in most cases the SourceMgr is for that specific context (given that most cases only use a single source manager anyways).

But we should be allowed to share it right?

mlir/include/mlir/Parser/Parser.h
96

Missing an empty line before this comment block? Unless there is some implicit grouping I'm not getting? (same elsewhere)

219

Can you expand these comments with some context? That is include the "why" this is useful/important:

This is useful for example to avoid copying some large resources
into the MLIRContext and instead referencing the data directly from
the input buffers.

Yay for timing as I was just talking about mmap'd resources.

Do we need shared_ptr and do we need it with the IR? (I have a strong dislike with shared pointers as you know :)). We did talk about it before, but just to refresh.

We need something that allows for extending a reference, otherwise there is no way to correctly and properly extend the lifetime. shared_ptr was the most effective that required least amount of mucking about. Do you have an alternate suggestion?

An alternative way is to use parsing flags, so that instead of implicit extension of lifetime it could be just explicit on the client side: "here is an input buffer that I promise to keep alive beyond the IR".

Yeah I considered that, but the implicitness of it didn't feel great (well worse to me than using shared_ptr). I wanted it to be explicit in the API and difficult to mess up.

Do we expect the SourceMgr to be scoped to a MLIRContext or be shared between many?

I would expect in most cases the SourceMgr is for that specific context (given that most cases only use a single source manager anyways).

But we should be allowed to share it right?

Yeah, if desired it should be shareable. Just in most cases it is specific.

rriddle updated this revision to Diff 481870.Dec 10 2022, 11:29 AM
rriddle marked 3 inline comments as done.