This is an archive of the discontinued LLVM Phabricator instance.

[StreamExecutor] Add DeviceMemory and kernel arg packing
ClosedPublic

Authored by jhen on Aug 5 2016, 10:21 AM.

Details

Summary

Add types for device memory and add the code that knows how to pack these
device memory types if they are passed as arguments to kernel launches.

Diff Detail

Repository
rL LLVM

Event Timeline

jhen updated this revision to Diff 66973.Aug 5 2016, 10:21 AM
jhen retitled this revision from to [StreamExecutor] Add DeviceMemory and kernel arg packing.
jhen updated this object.
jhen added reviewers: jlebar, tra.
jhen added a subscriber: parallel_libs-commits.
jlebar added inline comments.Aug 5 2016, 11:09 AM
streamexecutor/include/streamexecutor/DeviceMemory.h
72 ↗(On Diff #66973)

"Copy-assignable", same below.

82 ↗(On Diff #66973)

Should we have a non-const version of this too? (Not sure if that makes sense.)

152 ↗(On Diff #66973)

OpenCL, where

157 ↗(On Diff #66973)

s/and by/and/ (we already have a "by" earlier)

161 ↗(On Diff #66973)

Some high-level questions about this API:

  • Why does SharedDeviceMemory inherit from DeviceMemoryBase, when its handle is always null? It seems like SharedDeviceMemory is inherently a different thing because it never has a handle.
  • Do we need to allow null handles on DeviceMemoryBase (other than for shared memory)? In general if we can reduce the number of states in our state machine, that makes life easier for everyone.
  • Same for size-zero device/shared memory objects.
  • It seems inconsistent that we allow untyped device-memory handles, but do not allow untyped shared-memory handles. Why is that?
streamexecutor/include/streamexecutor/PackedKernelArgumentArray.h
32 ↗(On Diff #66973)

CUDA kernel launches take an array of argument addresses

The code in CGNVCUDARuntime::emitDeviceStubBody doesn't look like this, but maybe this is correct from the perspective of the CUDA driver API. It may be worth clarifying, as presumably people will be more familiar with the runtime API.

47 ↗(On Diff #66973)

Nit, maybe indent a bit?

streamexecutor/lib/unittests/PackedKernelArgumentArrayTest.cpp
33 ↗(On Diff #66973)

This seems useful and idiomatic (e.g. make_tuple) -- maybe it should be a public function?

55 ↗(On Diff #66973)

It's kind of confusing to me that we give these variables the same names as their types. Maybe if we abbreviated to "DeviceMemBase" that would be easier. Or perhaps "UntypedDeviceMem", "TypedDeviceMem", "SharedDeviceMem"?

67 ↗(On Diff #66973)

You could use SCOPED_TRACE, maybe it's a bit cleaner.

169 ↗(On Diff #66973)

It looks like we don't exercise all of the functions in the pack, particularly the ones to get the raw arrays (maybe others too, I didn't look hard).

jprice added a subscriber: jprice.Aug 5 2016, 12:35 PM
jprice added inline comments.
streamexecutor/include/streamexecutor/DeviceMemory.h
48 ↗(On Diff #66973)

platform-dependent?

135 ↗(On Diff #66973)

s/it's/its/
s/paramter/parameter/

137 ↗(On Diff #66973)

The DeviceMemoryBase constructor has its Handle parameter qualified as const - is there a reason why this one isn't? Same for the makeFromElementCount() function that calls this.

183 ↗(On Diff #66973)

s/it's/its/
s/paramter/parameter/

jhen updated this revision to Diff 66999.Aug 5 2016, 12:53 PM
jhen marked 15 inline comments as done.
  • Respond to jlebar's comments
  • Respond to comments from jprice.
streamexecutor/include/streamexecutor/DeviceMemory.h
82 ↗(On Diff #66973)

Since the handle is meant to be a pointer to opaque data, I don't think it makes sense to return a non-const pointer.

135 ↗(On Diff #66973)

*jhen hides his face in mortification* :) Thanks for catching these. I'm going to claim I only made the mistake once because the other one was copy-pasted. :)

137 ↗(On Diff #66973)

Thanks for catching this. It was just an oversight. Now I have changed them all to be const.

161 ↗(On Diff #66973)

Why does SharedDeviceMemory inherit from DeviceMemoryBase, when its handle is always null? It seems like SharedDeviceMemory is inherently a different thing because it never has a handle.

It seems inconsistent that we allow untyped device-memory handles, but do not allow untyped shared-memory handles. Why is that?

Good questions. The design was a bit sloppy. Now I have cleaned it up so that there are "global" and "shared" device memory types, where each has a base and a typed version, and the shared version does not inherit from the global version.

Do we need to allow null handles on DeviceMemoryBase (other than for shared memory)? In general if we can reduce the number of states in our state machine, that makes life easier for everyone.

I think we should allow them because folks might want to pass a null-like device pointer argument to their kernel (for all the reasons that people allow passing null pointers to functions in host code).

Same for size-zero device/shared memory objects.

I think we want to allow these too for cases when folks want to write general kernels that can operate with or without a scratch space, etc.

streamexecutor/include/streamexecutor/PackedKernelArgumentArray.h
32 ↗(On Diff #66973)

Right, my comment refers only to the CUDA driver API. I added that fact to the comment.

streamexecutor/lib/unittests/PackedKernelArgumentArrayTest.cpp
67 ↗(On Diff #66973)

Unfortunately I couldn't find a nice way to pass it a string with an integer like ("Index = " << Index). However, as long as the index information is in the message, I think it's OK if it's a little harder to read.

169 ↗(On Diff #66973)

Glad you mentioned it. It was just the raw array getters that were missing tests, but when I added those tests in, I found some const-correctness issues. Now the tests are in and the bugs are fixed.

jlebar accepted this revision.Aug 5 2016, 2:36 PM
jlebar edited edge metadata.
jlebar added inline comments.
streamexecutor/include/streamexecutor/DeviceMemory.h
162 ↗(On Diff #66999)

Thanks for your answers here; sounds good to me.

streamexecutor/include/streamexecutor/PackedKernelArgumentArray.h
175 ↗(On Diff #66999)

Are these FooType *const& Foo as opposed to FooType *Foo for a reason? It seems like maybe the original author was trying to "optimize" something.

streamexecutor/lib/unittests/PackedKernelArgumentArrayTest.cpp
68 ↗(On Diff #66999)

You could maybe use Twine or something, but I'm fine with it as-is.

This revision is now accepted and ready to land.Aug 5 2016, 2:36 PM
jhen updated this revision to Diff 67048.Aug 5 2016, 4:45 PM
jhen marked 2 inline comments as done.
jhen edited edge metadata.
  • Use twine for SCOPED_TRACE error reporting
  • Replace ugly (*const &) types with (*)
streamexecutor/include/streamexecutor/PackedKernelArgumentArray.h
175 ↗(On Diff #66999)

I was actually the author of that mess. The original code uses SFINAE stuff that I didn't like and during development of this code I thought I needed those ugly types to get the template type inference to work right. But I tried it with the simpler types just now and it worked fine, so I don't know what was going wrong for me before.

As a result, I got rid of the ugly types and am now just using the pointer types. Thanks pointing out that I could do that!

streamexecutor/lib/unittests/PackedKernelArgumentArrayTest.cpp
68 ↗(On Diff #66999)

The Twine works nicely. Thanks for the suggestion.

This revision was automatically updated to reflect the committed changes.