Add types for device memory and add the code that knows how to pack these
device memory types if they are passed as arguments to kernel launches.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
streamexecutor/include/streamexecutor/DeviceMemory.h | ||
---|---|---|
72 ↗ | (On Diff #66973) | "Copy-assignable", same below. |
82 ↗ | (On Diff #66973) | Should we have a non-const version of this too? (Not sure if that makes sense.) |
152 ↗ | (On Diff #66973) | OpenCL, where |
157 ↗ | (On Diff #66973) | s/and by/and/ (we already have a "by" earlier) |
161 ↗ | (On Diff #66973) | Some high-level questions about this API:
|
streamexecutor/include/streamexecutor/PackedKernelArgumentArray.h | ||
32 ↗ | (On Diff #66973) |
The code in CGNVCUDARuntime::emitDeviceStubBody doesn't look like this, but maybe this is correct from the perspective of the CUDA driver API. It may be worth clarifying, as presumably people will be more familiar with the runtime API. |
47 ↗ | (On Diff #66973) | Nit, maybe indent a bit? |
streamexecutor/lib/unittests/PackedKernelArgumentArrayTest.cpp | ||
33 ↗ | (On Diff #66973) | This seems useful and idiomatic (e.g. make_tuple) -- maybe it should be a public function? |
55 ↗ | (On Diff #66973) | It's kind of confusing to me that we give these variables the same names as their types. Maybe if we abbreviated to "DeviceMemBase" that would be easier. Or perhaps "UntypedDeviceMem", "TypedDeviceMem", "SharedDeviceMem"? |
67 ↗ | (On Diff #66973) | You could use SCOPED_TRACE, maybe it's a bit cleaner. |
169 ↗ | (On Diff #66973) | It looks like we don't exercise all of the functions in the pack, particularly the ones to get the raw arrays (maybe others too, I didn't look hard). |
streamexecutor/include/streamexecutor/DeviceMemory.h | ||
---|---|---|
48 ↗ | (On Diff #66973) | platform-dependent? |
135 ↗ | (On Diff #66973) | s/it's/its/ |
137 ↗ | (On Diff #66973) | The DeviceMemoryBase constructor has its Handle parameter qualified as const - is there a reason why this one isn't? Same for the makeFromElementCount() function that calls this. |
183 ↗ | (On Diff #66973) | s/it's/its/ |
- Respond to jlebar's comments
- Respond to comments from jprice.
streamexecutor/include/streamexecutor/DeviceMemory.h | ||
---|---|---|
82 ↗ | (On Diff #66973) | Since the handle is meant to be a pointer to opaque data, I don't think it makes sense to return a non-const pointer. |
135 ↗ | (On Diff #66973) | *jhen hides his face in mortification* :) Thanks for catching these. I'm going to claim I only made the mistake once because the other one was copy-pasted. :) |
137 ↗ | (On Diff #66973) | Thanks for catching this. It was just an oversight. Now I have changed them all to be const. |
161 ↗ | (On Diff #66973) |
Good questions. The design was a bit sloppy. Now I have cleaned it up so that there are "global" and "shared" device memory types, where each has a base and a typed version, and the shared version does not inherit from the global version.
I think we should allow them because folks might want to pass a null-like device pointer argument to their kernel (for all the reasons that people allow passing null pointers to functions in host code).
I think we want to allow these too for cases when folks want to write general kernels that can operate with or without a scratch space, etc. |
streamexecutor/include/streamexecutor/PackedKernelArgumentArray.h | ||
32 ↗ | (On Diff #66973) | Right, my comment refers only to the CUDA driver API. I added that fact to the comment. |
streamexecutor/lib/unittests/PackedKernelArgumentArrayTest.cpp | ||
67 ↗ | (On Diff #66973) | Unfortunately I couldn't find a nice way to pass it a string with an integer like ("Index = " << Index). However, as long as the index information is in the message, I think it's OK if it's a little harder to read. |
169 ↗ | (On Diff #66973) | Glad you mentioned it. It was just the raw array getters that were missing tests, but when I added those tests in, I found some const-correctness issues. Now the tests are in and the bugs are fixed. |
streamexecutor/include/streamexecutor/DeviceMemory.h | ||
---|---|---|
162 ↗ | (On Diff #66999) | Thanks for your answers here; sounds good to me. |
streamexecutor/include/streamexecutor/PackedKernelArgumentArray.h | ||
175 ↗ | (On Diff #66999) | Are these FooType *const& Foo as opposed to FooType *Foo for a reason? It seems like maybe the original author was trying to "optimize" something. |
streamexecutor/lib/unittests/PackedKernelArgumentArrayTest.cpp | ||
68 ↗ | (On Diff #66999) | You could maybe use Twine or something, but I'm fine with it as-is. |
- Use twine for SCOPED_TRACE error reporting
- Replace ugly (*const &) types with (*)
streamexecutor/include/streamexecutor/PackedKernelArgumentArray.h | ||
---|---|---|
175 ↗ | (On Diff #66999) | I was actually the author of that mess. The original code uses SFINAE stuff that I didn't like and during development of this code I thought I needed those ugly types to get the template type inference to work right. But I tried it with the simpler types just now and it worked fine, so I don't know what was going wrong for me before. As a result, I got rid of the ugly types and am now just using the pointer types. Thanks pointing out that I could do that! |
streamexecutor/lib/unittests/PackedKernelArgumentArrayTest.cpp | ||
68 ↗ | (On Diff #66999) | The Twine works nicely. Thanks for the suggestion. |