The sparse tensor code generator allocates memory for the output tensor. As
such, we only need to allocate a MemRefDescriptor to receive the output tensor
and do not need to allocate and initialize the storage for the tensor.
Details

Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
Can you refine "The sparse tensor code generator allocates memory for the output tensor." in the description text? The sparse codegen does not do bufferization for dense tensors; it uses to_memref/to_tensor at the boundaries, so the actual allocation comes from a later bufferization pass. Just making sure the details are right.
mlir/test/Integration/Dialect/SparseTensor/python/test_SpMM.py, lines 92–95:
Can you split this into two lines, and assign an intuitive name to the memref descriptor, something like:
ref_out = rt.make ...
mem_out = ctypes.pointer( ...
mlir/test/Integration/Dialect/SparseTensor/python/test_SpMM.py, lines 92–95:
And actually add what you have in the description as a comment, ref_out = ... or something like that.
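A minimal sketch of what the requested split might look like, using the MLIR Python runtime helpers these integration tests import as rt (the rank, element type, and exact variable names here are assumptions for illustration, not the actual patch):

```python
import ctypes

from mlir import runtime as rt

# Allocate only an empty MemRef descriptor to receive the output tensor;
# rank 2 and f64 elements are assumed here.
ref_out = rt.make_nd_memref_descriptor(2, ctypes.c_double)()
# The execution engine fills in the descriptor through a pointer-to-pointer.
mem_out = ctypes.pointer(ctypes.pointer(ref_out))
```

After the compiled kernel runs, the filled-in descriptor can be converted back to a NumPy array with rt.ranked_memref_to_numpy(mem_out[0]) for checking the result.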