This is an archive of the discontinued LLVM Phabricator instance.

Avoid unnecessary output buffer allocation and initialization.
ClosedPublic

Authored by bixia on Dec 7 2021, 3:16 PM.

Details

Summary

The sparse tensor code generator allocates memory for the output tensor. As
such, we only need to allocate a MemRefDescriptor to receive the output tensor
and do not need to allocate and initialize the storage for the tensor.

Diff Detail

Event Timeline

bixia created this revision.Dec 7 2021, 3:16 PM
bixia requested review of this revision.Dec 7 2021, 3:16 PM

Can you refine:

"The sparse tensor code generator allocates memory for the output tensor."

in the description text? The sparse codegen does not do bufferization for dense tensors, but uses to_memref/to_tensor at the boundaries,
so the actual allocation comes from a later bufferization. Just to make sure the details are right.

mlir/test/Integration/Dialect/SparseTensor/python/test_SpMM.py
92–95

Can you split this into two lines, and assign an intuitive name to the memref descriptor?

something like

ref_out = rt.make ...
mem_out = ctypes.pointer( ...

aartbik added inline comments.Dec 8 2021, 2:35 PM
mlir/test/Integration/Dialect/SparseTensor/python/test_SpMM.py
92–95

And actually add what you have in the description as a comment:

  1. Allocate a MemRefDescriptor to receive the output tensor.
  2. The buffer itself is allocated inside the MLIR code.

ref_out = ...
mem_out = ...

or something like that
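The reviewer's suggested two-line split can be sketched in plain ctypes. This is a hypothetical illustration, not the actual test code: the real test uses MLIR's `mlir.runtime` helpers (e.g. `make_nd_memref_descriptor`), which are stood in for here by a hand-rolled rank-2 descriptor struct; the names `ref_out`/`mem_out` follow the reviewer's suggestion.

```python
import ctypes

# Hypothetical stand-in for mlir.runtime.make_nd_memref_descriptor(2, ctypes.c_double):
# a rank-2 MemRef descriptor. Only this descriptor is allocated in Python; the
# data buffer itself is allocated inside the compiled MLIR code.
class MemRef2DDescriptor(ctypes.Structure):
    _fields_ = [
        ("allocated", ctypes.POINTER(ctypes.c_double)),
        ("aligned", ctypes.POINTER(ctypes.c_double)),
        ("offset", ctypes.c_longlong),
        ("shape", ctypes.c_longlong * 2),
        ("strides", ctypes.c_longlong * 2),
    ]

# 1. Allocate a MemRefDescriptor to receive the output tensor.
# 2. The buffer itself is allocated inside the MLIR code.
ref_out = MemRef2DDescriptor()
mem_out = ctypes.pointer(ctypes.pointer(ref_out))
# The execution engine would then be invoked with mem_out as the result
# argument, e.g. engine.invoke("main", mem_out, ...)  # hypothetical entry point
```

No numpy array is created or zero-initialized on the Python side; the descriptor's pointers are filled in by the MLIR-allocated buffer on return.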

bixia updated this revision to Diff 392954.Dec 8 2021, 3:29 PM

Address review feedback.

bixia updated this revision to Diff 392956.Dec 8 2021, 3:31 PM

Update tree.

bixia marked 2 inline comments as done.Dec 8 2021, 3:31 PM
aartbik accepted this revision.Dec 8 2021, 4:39 PM
This revision is now accepted and ready to land.Dec 8 2021, 4:39 PM