This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Get array from the dense elements attribute with buffer protocol.
ClosedPublic

Authored by zhanghb97 on Nov 7 2020, 5:57 PM.

Details

Summary
  • Add mlirDenseElementsAttrGetRawData C API.
  • Add def_buffer binding to PyDenseElementsAttribute.
  • Implement the protocol to access the buffer.

Diff Detail

Event Timeline

zhanghb97 created this revision.Nov 7 2020, 5:57 PM
zhanghb97 requested review of this revision.Nov 7 2020, 5:57 PM
zhanghb97 added inline comments.Nov 7 2020, 6:01 PM
mlir/lib/Bindings/Python/IRModules.cpp
1695

I'm not sure about how to free the data pointer properly. Should we add a private member data and a destructor to class PyDenseElementsAttribute, or use a smart pointer to manage the dynamically allocated array? I think I need suggestions here.

stellaraccident added inline comments.Nov 7 2020, 6:57 PM
mlir/lib/Bindings/Python/IRModules.cpp
1695

I am pretty sure that we don't want to be creating this element-by-element copy and that we should instead have an API to get the const void* of the backing buffer of the attribute (basically the inverse of the mlirDenseElementsAttr*Get functions that take a pointer to a linear buffer). The C++ method to use is getRawData(). For known int/float data types, this is a pretty critical optimization, and then we can opt to use an element-by-element copy as a fallback if needed.

If you were to do that (just accessing the raw data), then you wouldn't have this allocated array to hang on to.

The other way to do this is to not have the attribute itself implement the buffer protocol but instead have a property array that returns an array for access. This is the approach I take in IREE (https://github.com/google/iree/blob/f9c11f20fc97154a7079a3809025241f5f467183/bindings/python/pyiree/rt/host_types.cc#L179). If doing that, then you get to set a base object (this attribute object) when constructing the array, thereby keeping the backing data alive.

zhanghb97 updated this revision to Diff 303722.Nov 8 2020, 7:30 AM
  • Add mlirDenseElementsAttrGetRawData C API.
  • Use the raw data to avoid the allocated array in the buffer protocol.
mlir/lib/Bindings/Python/IRModules.cpp
1695

At the very beginning, I saw the raw data way in the pybind11 doc, so I implement this first. Now I use the C API to avoid the allocated array, PTAL.

Also, I'm wondering that which method (raw data & property array) is better for our situation.

ftynse added inline comments.Nov 9 2020, 5:34 AM
mlir/include/mlir-c/StandardAttributes.h
297

There is already mlirAttributeGetType, I don't think we want to duplicate it on each subclass of MlirAttribute.

mlir/lib/Bindings/Python/IRModules.cpp
1628

Style nit: I wouldn't check for == 1 for "booleans" in C, the usual convention that 0 is false and non-zero (which may or may not be one) is true. And`mlirTypeIsAF32(elementType)` just reads better.

1636

Nit: braces must be symmetric, if one "block" uses them, all blocks should. Also, the code above isn't really trivial and would mandate braces anyway.

1692

Can't we extract rank from shapedType?

size looks unused in the function.

1698

Nit: why ssize_t? We use intptr_t for sizes in C bindings, let's be consistent.

1703

Why int64_t here? There are 3 different types in 5 lines, this just adds complexity.

zhanghb97 updated this revision to Diff 304020.Nov 9 2020, 5:29 PM
zhanghb97 marked 6 inline comments as done.
zhanghb97 retitled this revision from [WIP][mlir] Get array from the dense elements attribute. to [mlir] Get array from the dense elements attribute..
zhanghb97 edited the summary of this revision. (Show Details)
  • Rebase.
  • Remove the mlirElementsAttrGetType C API.
  • Modify checking conditions and unify the use of braces of the if and if else in accessBuffer function.
  • Unify the type into intptr_t in bufferInfo function.
zhanghb97 added inline comments.Nov 9 2020, 5:41 PM
mlir/include/mlir-c/StandardAttributes.h
297

Remove the mlirElementsAttrGetType and use mlirAttributeGetType in the accessBuffer function.

mlir/lib/Bindings/Python/IRModules.cpp
1628

Remove the ==1 for all the checking.

1636

Add braces to all the blocks.

1692

Remove the size and rank parameters, extract rank from shapeType in the function.

1703

Unify the above types into intptr_t.

zhanghb97 updated this revision to Diff 304698.Nov 11 2020, 6:20 PM
zhanghb97 retitled this revision from [mlir] Get array from the dense elements attribute. to [mlir] Get array from the dense elements attribute with buffer protocol..

Rebase.

mehdi_amini added inline comments.Nov 11 2020, 8:44 PM
mlir/include/mlir-c/StandardAttributes.h
409

Can you add C-unit-tests for these?

zhanghb97 updated this revision to Diff 304758.Nov 12 2020, 2:13 AM

Add unit tests for mlirDenseElementsAttrGetRawData C API.

According to Stella's suggestion, there are two methods to get the array from dense elements attribute:

  • Buffer protocol
  • Array property

This patch shows the buffer protocol method, and I also implement the array property method in D91326.
Now, we can compare those two methods and choose one. Personally, I prefer the buffer protocol method, because I think the use of np.array(attr) is more pythonic than attr.array for an array.

Thanks - I prefer this version, but have a material request wrt readonly access.

mlir/lib/Bindings/Python/IRModules.cpp
1697

This is a problem: by default, the buffer will be exposed as writable, which will corrupt the uniqued data in the context if ever done.

A readonly flag was added to pybind ~one year ago (11/23/2019), which means that we no longer can just build on an ancient pybind as from regular 2018 era LTS distros. Not the end of the world, but we should document it. I believe this puts the minimum version at v2.5.0. Since we haven't launched yet, we might as well advertise v2.6.0, since as far as I can tell, mainstream distros don't include either and going as recent as possible gives us headroom. Requiring v.2.6.0 will also let me resolve a hack elsewhere.

Can you:

  • In the docs/Bindings/Python.md file, add a note to the pre-requisite for pybind11 noting that v2.6.0 is required
  • Pass true to the readonly flag in the py::buffer_info constructor (it follows strides).
1699

Use llvm::SmallVector<intptr_t, 4> here and for strides.

1715

strides, /*readonly=*/true);

zhanghb97 updated this revision to Diff 305950.Nov 17 2020, 6:17 PM
  • Set the buffer to read-only.
  • Add note to Python binding doc to note that pybind11 v2.6.0 is required.
  • Use llvm::SmallVector instead of std::vector.
zhanghb97 marked 3 inline comments as done.Nov 17 2020, 6:18 PM
stellaraccident accepted this revision.Nov 17 2020, 8:54 PM

Thanks - a few nits. Maybe put a message on discord when you land indicating the new version requirement (and tag ftynse since he is working on this)

mlir/docs/Bindings/Python.md
10 ↗(On Diff #305950)

Maybe just: "minimum version required: :2.6.0"

mlir/lib/Bindings/Python/IRModules.cpp
1696

Please use static_cast when casting to our from void* (it is stricter than reinterpret and explicitly works for void casts)

1697

Since const_cast is "impolite", add a comment: buffer is configured for read-only access below

This revision is now accepted and ready to land.Nov 17 2020, 8:54 PM
  • Rebase.
  • Use static_cast and add a comment.
zhanghb97 marked 3 inline comments as done.Nov 17 2020, 11:50 PM
This revision was landed with ongoing or failed builds.Nov 17 2020, 11:51 PM
This revision was automatically updated to reflect the committed changes.