Page MenuHomePhabricator

[SE] Add CUDA platform
ClosedPublic

Authored by jhen on Sep 13 2016, 5:41 PM.

Details

Summary

Basic CUDA platform implementation and cmake infrastructur to control whether
it's used. A few important TODOs will be handled in later patches:

  • Log some error messages that can't easily be returned as Errors.
  • Cache modules and kernels to prevent reloading them if someone tries to reload a kernel that's already loaded.
  • Tolerate shared memory arguments for kernel launches.

Diff Detail

Repository
rL LLVM

Event Timeline

jhen updated this revision to Diff 71279.Sep 13 2016, 5:41 PM
jhen retitled this revision from to [SE] Add CUDA platform.
jhen updated this object.
jhen added a reviewer: jlebar.
jhen added subscribers: parallel_libs-commits, jprice.
jlebar added inline comments.Sep 13 2016, 9:18 PM
streamexecutor/CMakeLists.txt
6 ↗(On Diff #71279)

Not necessarily in this patch, but we should document how to configure this with CUDA enabled and (see below) how to point cmake at different CUDA installs.

streamexecutor/include/streamexecutor/PlatformOptions.h.in
4 ↗(On Diff #71279)

Maybe we should have a comment in this file explaining what it's for.

streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h
37 ↗(On Diff #71279)

Do we want the device number in the name?

streamexecutor/lib/CMakeLists.txt
41 ↗(On Diff #71279)

Note to self, need to patch this in and try it out with the in-tree build.

streamexecutor/lib/platforms/cuda/CUDAPlatform.cpp
44 ↗(On Diff #71279)

Do you want to wrap the cuInit(0) call in a helper function so we only ever call it once?

static CUResult ensureCudaInitialized() {
  static InitResult = ...;
  return InitResult;
}

I guess it doesn't make a big difference either way.

59 ↗(On Diff #71279)

Hm. Do we care if getDevice() is slow? We could probably do this without a lock without too much trouble (essentially using the same mechanism used to ensure that function-static variables are initialized only once, except we'd be using it for member variables). Maybe for another patch, if we care about performance. If we don't care, even better. :)

streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
37 ↗(On Diff #71279)

Hm, I wonder if we want to be more descriptive in our error handling and give some sort of "backtrace", indicating where we failed. e.g. something as simple as

return CUresultToError(result, "cuCtxSetCurrent");

Maybe not for this patch, though.

56 ↗(On Diff #71279)

Is this an "unused variable" warning? If so maybe do

(void) Result;
80 ↗(On Diff #71279)

Nit, "CUDA source" is probably not the right phrase, since this is all compiled CUDA code (PTX and SASS).

streamexecutor/lib/platforms/cuda/cmake/modules/FindLibcuda.cmake
7 ↗(On Diff #71279)

Can we override this on the command line somehow? If so, is that obvious, or is it worth documenting?

8 ↗(On Diff #71279)

The libcuda binary we use should come from the same cuda install as the headers we use, I think?

jprice added inline comments.Sep 14 2016, 5:03 AM
streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h
23 ↗(On Diff #71279)

Where is this defined? I get undefined references to this function when I try and build this patch.

37 ↗(On Diff #71279)

Or even the actual name of the device (cuDeviceGetName)? This is certainly more useful for client code (in my experience), particularly when working with systems that have more than one type of GPU.

It strikes me that "CUDA" is really the name of the platform, not the device. In which case, maybe it makes sense to also have a getPlatformName() or getPlatform()->getName() that gives you the "CUDA" vs "OpenCL" vs "Host" which is used for the error strings.

jhen updated this revision to Diff 71384.Sep 14 2016, 10:13 AM
jhen marked 12 inline comments as done.
  • Respond to review comments
streamexecutor/CMakeLists.txt
6 ↗(On Diff #71279)

Sounds good. I'll tackle this in the next patch when I enable pointing cmake at different CUDA installs.

streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h
23 ↗(On Diff #71279)

Oops, I failed at version control. The definition is now in CUDAPlatformDevice.cpp.

37 ↗(On Diff #71279)

I agree that more information is better here. Just as jprice suggested, I changed getName to return the device index and the name from cuDeviceGetName, and I made a new getPlatformName function for error messages..

streamexecutor/lib/platforms/cuda/CUDAPlatform.cpp
44 ↗(On Diff #71279)

Great idea! I like that much better.

59 ↗(On Diff #71279)

I think it doesn't matter for now, and we can come back to it later if it ever seems to matter.

streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
37 ↗(On Diff #71279)

Yes, I could have used that in debugging already. I did the simple method name reporting for now.

56 ↗(On Diff #71279)

Yes, it was a warning. Thanks for the suggestion. I've done this for all these cases where error handling has yet to be implemented.

80 ↗(On Diff #71279)

I changed it to "CUDA code". I think that should be correct.

streamexecutor/lib/platforms/cuda/cmake/modules/FindLibcuda.cmake
7 ↗(On Diff #71279)

I added a TODO to add this functionality. I'll plan to add it in the next patch.

8 ↗(On Diff #71279)

I agree. I added a TODO here to fix it in the next patch.

jlebar accepted this revision.Sep 14 2016, 10:27 AM
jlebar edited edge metadata.
jlebar added inline comments.
streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
51 ↗(On Diff #71384)

I wonder if we want to cache this instead of recomputing it every time.

This revision is now accepted and ready to land.Sep 14 2016, 10:27 AM
jhen updated this revision to Diff 71387.Sep 14 2016, 10:34 AM
jhen edited edge metadata.
  • Cache device name
jhen marked an inline comment as done.Sep 14 2016, 10:35 AM
jhen added inline comments.
streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
52 ↗(On Diff #71387)

Good idea. Let's do that.

This revision was automatically updated to reflect the committed changes.
jhen marked an inline comment as done.