This is an archive of the discontinued LLVM Phabricator instance.

Differential D24538

[SE] Add CUDA platform
ClosedPublic

Authored by jhen on Sep 13 2016, 5:41 PM.

Details

Reviewers

jlebar

Commits

rG6bfc863d741a: [SE] Add CUDA platform
rL281524: [SE] Add CUDA platform

Summary

Basic CUDA platform implementation and cmake infrastructure to control whether
it's used. A few important TODOs will be handled in later patches:

Log some error messages that can't easily be returned as Errors.
Cache modules and kernels to prevent reloading them if someone tries to reload a kernel that's already loaded.
Tolerate shared memory arguments for kernel launches.
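
The second TODO above (caching modules and kernels) could look roughly like the following. This is only a minimal sketch, not code from this patch: `KernelCacheSketch`, `loadKernelHandle`, and `LoadCount` are hypothetical names, and the loader is a stand-in for the real `cuModuleLoadData` / `cuModuleGetFunction` sequence so that the example runs without a CUDA driver. It is also not thread-safe as written.

```cpp
#include <map>
#include <string>

// Counts loader invocations so the caching effect is observable.
static int LoadCount = 0;

// Hypothetical stand-in for loading a module and looking up a kernel in it.
// It returns a fake integer handle instead of a CUfunction.
static int loadKernelHandle(const std::string &KernelName) {
  ++LoadCount;
  return static_cast<int>(KernelName.size());
}

// Remembers handles by kernel name so each kernel is loaded at most once.
class KernelCacheSketch {
public:
  int getOrLoad(const std::string &KernelName) {
    auto It = Handles.find(KernelName);
    if (It == Handles.end())
      It = Handles.emplace(KernelName, loadKernelHandle(KernelName)).first;
    return It->second;
  }

private:
  std::map<std::string, int> Handles;
};
```

Asking the cache for the same kernel name twice returns the same handle and invokes the loader only once.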

Diff Detail

Event Timeline

jhen updated this revision to Diff 71279.Sep 13 2016, 5:41 PM

jhen retitled this revision to [SE] Add CUDA platform.

jhen updated this object.

jhen added a reviewer: jlebar.

jhen added subscribers: parallel_libs-commits, jprice.

Herald added subscribers: jlebar, mgorny, beanz.Sep 13 2016, 5:41 PM

jlebar added inline comments.Sep 13 2016, 9:18 PM

streamexecutor/CMakeLists.txt
6	Not necessarily in this patch, but we should document how to configure this with CUDA enabled and (see below) how to point cmake at different CUDA installs.
streamexecutor/include/streamexecutor/PlatformOptions.h.in
4	Maybe we should have a comment in this file explaining what it's for.
streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h
37	Do we want the device number in the name?
streamexecutor/lib/CMakeLists.txt
41	Note to self, need to patch this in and try it out with the in-tree build.
streamexecutor/lib/platforms/cuda/CUDAPlatform.cpp
44	Do you want to wrap the cuInit(0) call in a helper function so we only ever call it once? static CUResult ensureCudaInitialized() { static InitResult = ...; return InitResult; } I guess it doesn't make a big difference either way.
59	Hm. Do we care if getDevice() is slow? We could probably do this without a lock without too much trouble (essentially using the same mechanism used to ensure that function-static variables are initialized only once, except we'd be using it for member variables). Maybe for another patch, if we care about performance. If we don't care, even better. :)
streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
37	Hm, I wonder if we want to be more descriptive in our error handling and give some sort of "backtrace", indicating where we failed. e.g. something as simple as return CUresultToError(result, "cuCtxSetCurrent"); Maybe not for this patch, though.
56	Is this an "unused variable" warning? If so maybe do (void) Result;
80	Nit, "CUDA source" is probably not the right phrase, since this is all compiled CUDA code (PTX and SASS).
streamexecutor/lib/platforms/cuda/cmake/modules/FindLibcuda.cmake
7	Can we override this on the command line somehow? If so, is that obvious, or is it worth documenting?
8	The libcuda binary we use should come from the same cuda install as the headers we use, I think?
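
jlebar's suggestion for CUDAPlatform.cpp line 44 (wrap the `cuInit(0)` call in a helper so it only ever runs once) relies on a function-local static. The sketch below illustrates that idea under stated assumptions: `fakeDriverInit`, `ensureInitialized`, `getDeviceCountSketch`, and `canGetDeviceSketch` are hypothetical names, and `fakeDriverInit` replaces `cuInit(0)` so the example runs without a CUDA driver.

```cpp
// Counts how often the initializer actually runs.
static int InitCallCount = 0;

// Hypothetical stand-in for cuInit(0). Returns 0 on success, mirroring the
// CUresult convention where CUDA_SUCCESS is zero.
static int fakeDriverInit() {
  ++InitCallCount;
  return 0;
}

// Runs the initializer on the first call and returns the cached result on
// every later call. Since C++11 the initialization of a function-local
// static is guaranteed to happen exactly once, even with concurrent callers.
static int ensureInitialized() {
  static int InitResult = fakeDriverInit();
  return InitResult;
}

// Two call sites that both need the driver to be initialized first.
static int getDeviceCountSketch() { return ensureInitialized() ? 0 : 2; }
static bool canGetDeviceSketch() { return ensureInitialized() == 0; }
```

Both call sites share one helper, so the initializer runs once no matter which of them is reached first.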

jprice added inline comments.Sep 14 2016, 5:03 AM

streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h
23	Where is this defined? I get undefined references to this function when I try and build this patch.
37	Or even the actual name of the device (`cuDeviceGetName`)? This is certainly more useful for client code (in my experience), particularly when working with systems that have more than one type of GPU. It strikes me that "CUDA" is really the name of the platform, not the device. In which case, maybe it makes sense to also have a `getPlatformName()` or `getPlatform()->getName()` that gives you the "CUDA" vs "OpenCL" vs "Host" which is used for the error strings.
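
The split jprice describes (a per-device name versus a platform name used in error strings) might be sketched as follows. `DeviceSketch` and the exact format of the returned name are assumptions made for illustration; `HardwareName` stands in for the string that `cuDeviceGetName` would report.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical device class, not the CUDAPlatformDevice from this patch.
class DeviceSketch {
public:
  DeviceSketch(std::size_t Index, std::string HardwareName)
      : Index(Index), HardwareName(std::move(HardwareName)) {}

  // Identifies one device: its index plus the hardware name. The format of
  // this string is an assumption.
  std::string getName() const {
    return "CUDA device " + std::to_string(Index) + ": " + HardwareName;
  }

  // Identifies the platform, e.g. for prefixing error strings.
  std::string getPlatformName() const { return "CUDA"; }

private:
  std::size_t Index;
  std::string HardwareName;
};
```

With this split, client code on a system with several GPU types can tell devices apart by `getName()`, while error messages can still be prefixed with the platform name.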

Respond to review comments

streamexecutor/CMakeLists.txt
6	Sounds good. I'll tackle this in the next patch when I enable pointing cmake at different CUDA installs.
streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h
23	Oops, I failed at version control. The definition is now in CUDAPlatformDevice.cpp.
37	I agree that more information is better here. Just as jprice suggested, I changed `getName` to return the device index and the name from `cuDeviceGetName`, and I made a new `getPlatformName` function for error messages.
streamexecutor/lib/platforms/cuda/CUDAPlatform.cpp
44	Great idea! I like that much better.
59	I think it doesn't matter for now, and we can come back to it later if it ever seems to matter.
streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
37	Yes, I could have used that in debugging already. I did the simple method name reporting for now.
56	Yes, it was a warning. Thanks for the suggestion. I've done this for all these cases where error handling has yet to be implemented.
80	I changed it to "CUDA code". I think that should be correct.
streamexecutor/lib/platforms/cuda/cmake/modules/FindLibcuda.cmake
7	I added a TODO to add this functionality. I'll plan to add it in the next patch.
8	I agree. I added a TODO here to fix it in the next patch.
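
The "simple method name reporting" mentioned in the response for CUDAPlatformDevice.cpp line 37 could take a shape like the one below. This is a sketch under assumptions, not the updated patch: `ErrorSketch` and `resultToError` are hypothetical, whereas the `CUresultToError` declared in this diff takes only the result code and returns a streamexecutor `Error`.

```cpp
#include <string>

// Hypothetical error type used only for this illustration.
struct ErrorSketch {
  bool Failed;
  std::string Message;
};

// Converts a driver result code into an error that names the failing call.
// A code of 0 means success, as with CUresult.
static ErrorSketch resultToError(int ResultCode,
                                 const std::string &FunctionName) {
  if (ResultCode == 0)
    return {false, ""};
  return {true,
          FunctionName + " failed with code " + std::to_string(ResultCode)};
}
```

A call site would then read something like `return resultToError(Result, "cuCtxSetCurrent");`, which is the form jlebar proposed in his comment.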

jlebar accepted this revision.Sep 14 2016, 10:27 AM

jlebar edited edge metadata.

jlebar added inline comments.

streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
52	I wonder if we want to cache this instead of recomputing it every time.

This revision is now accepted and ready to land.Sep 14 2016, 10:27 AM

Cache device name

jhen marked an inline comment as done.Sep 14 2016, 10:35 AM

jhen added inline comments.

streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
53	Good idea. Let's do that.
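
Caching the device name, as agreed above, amounts to querying it once at creation time and storing it in a member. The following is a minimal sketch with hypothetical names (`CachedNameDevice`, `queryHardwareName`, `NameQueryCount`); the query function stands in for a driver call such as `cuDeviceGetName` so the example runs without CUDA.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Counts how often the (simulated) driver query runs.
static int NameQueryCount = 0;

// Hypothetical stand-in for asking the driver for a device name.
static std::string queryHardwareName(std::size_t Index) {
  ++NameQueryCount;
  return "Example GPU " + std::to_string(Index);
}

class CachedNameDevice {
public:
  // The factory performs the query once and stores the result.
  static CachedNameDevice create(std::size_t Index) {
    return CachedNameDevice(Index, queryHardwareName(Index));
  }

  // Returns the stored name without touching the driver again.
  const std::string &getName() const { return Name; }
  std::size_t getIndex() const { return Index; }

private:
  CachedNameDevice(std::size_t Index, std::string Name)
      : Index(Index), Name(std::move(Name)) {}

  std::size_t Index;
  std::string Name;
};
```

Repeated `getName()` calls then cost a member read instead of a driver call.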

Closed by commit rL281524: [SE] Add CUDA platform (authored by jhen).Sep 14 2016, 1:07 PM

This revision was automatically updated to reflect the committed changes.

jhen marked an inline comment as done.

Revision Contents

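Path                                                                        Size
streamexecutor/CMakeLists.txt                                               5 lines
streamexecutor/include/streamexecutor/PlatformOptions.h.in                  6 lines
streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatform.h         42 lines
streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h   91 lines
streamexecutor/lib/CMakeLists.txt                                           24 lines
streamexecutor/lib/PlatformManager.cpp                                      10 lines
streamexecutor/lib/platforms/CMakeLists.txt                                 3 lines
streamexecutor/lib/platforms/cuda/CMakeLists.txt                            5 lines
streamexecutor/lib/platforms/cuda/CUDAPlatform.cpp                          63 lines
streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp                    227 lines
streamexecutor/lib/platforms/cuda/cmake/modules/FindLibcuda.cmake           19 lines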

Diff 71279

streamexecutor/CMakeLists.txt

	cmake_minimum_required(VERSION 3.1)			cmake_minimum_required(VERSION 3.1)

	option(STREAM_EXECUTOR_UNIT_TESTS "enable unit tests" ON)			option(STREAM_EXECUTOR_UNIT_TESTS "enable unit tests" ON)
	option(STREAM_EXECUTOR_ENABLE_DOXYGEN "enable StreamExecutor doxygen" ON)			option(STREAM_EXECUTOR_ENABLE_DOXYGEN "enable StreamExecutor doxygen" ON)
	option(STREAM_EXECUTOR_ENABLE_CONFIG_TOOL "enable building streamexecutor-config tool" ON)			option(STREAM_EXECUTOR_ENABLE_CONFIG_TOOL "enable building streamexecutor-config tool" ON)
				option(STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM "enable building the CUDA StreamExecutor platform" OFF)
				jlebar: Not necessarily in this patch, but we should document how to configure this with CUDA enabled and (see below) how to point cmake at different CUDA installs.
				jhen: Sounds good. I'll tackle this in the next patch when I enable pointing cmake at different CUDA installs.

				configure_file("include/streamexecutor/PlatformOptions.h.in" "include/streamexecutor/PlatformOptions.h")

	# First find includes relative to the streamexecutor top-level source path.			# First find includes relative to the streamexecutor top-level source path.
	include_directories(BEFORE ${CMAKE_CURRENT_SOURCE_DIR}/include)			include_directories(BEFORE ${CMAKE_CURRENT_SOURCE_DIR}/include)
				# Also look for configured headers in the top-level binary directory.
				include_directories(BEFORE ${CMAKE_CURRENT_BINARY_DIR}/include)

	# If we are not building as part of LLVM, build StreamExecutor as a standalone			# If we are not building as part of LLVM, build StreamExecutor as a standalone
	# project using LLVM as an external library:			# project using LLVM as an external library:
	string(			string(
	COMPARE			COMPARE
	EQUAL			EQUAL
	"${CMAKE_SOURCE_DIR}"			"${CMAKE_SOURCE_DIR}"
	"${CMAKE_CURRENT_SOURCE_DIR}"			"${CMAKE_CURRENT_SOURCE_DIR}"
	(71 unchanged lines not shown)

streamexecutor/include/streamexecutor/PlatformOptions.h.in

This file was added.

				#ifndef STREAMEXECUTOR_PLATFORMOPTIONS_H
				#define STREAMEXECUTOR_PLATFORMOPTIONS_H

				#cmakedefine STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM
				jlebar: Maybe we should have a comment in this file explaining what it's for.

				#endif // STREAMEXECUTOR_PLATFORMOPTIONS_H

streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatform.h

This file was added.

				//===-- CUDAPlatform.h - CUDA platform subclass -----------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Declaration of the CUDAPlatform class.
				///
				//===----------------------------------------------------------------------===//

				#ifndef STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORM_H
				#define STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORM_H

				#include "streamexecutor/Platform.h"
				#include "streamexecutor/platforms/cuda/CUDAPlatformDevice.h"

				#include "llvm/Support/Mutex.h"

				#include <map>

				namespace streamexecutor {
				namespace cuda {

				class CUDAPlatform : public Platform {
				public:
				size_t getDeviceCount() const override;

				Expected<Device> getDevice(size_t DeviceIndex) override;

				private:
				llvm::sys::Mutex Mutex;
				std::map<size_t, CUDAPlatformDevice> PlatformDevices;
				};

				} // namespace cuda
				} // namespace streamexecutor

				#endif // STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORM_H

streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h

This file was added.

				//===-- CUDAPlatformDevice.h - CUDAPlatformDevice class ---------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Declaration of the CUDAPlatformDevice class.
				///
				//===----------------------------------------------------------------------===//

				#ifndef STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORMDEVICE_H
				#define STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORMDEVICE_H

				#include "streamexecutor/PlatformDevice.h"

				namespace streamexecutor {
				namespace cuda {

				Error CUresultToError(int CUResult);
				jprice: Where is this defined? I get undefined references to this function when I try and build this patch.
				jhen: Oops, I failed at version control. The definition is now in CUDAPlatformDevice.cpp.

				class CUDAPlatformDevice : public PlatformDevice {
				public:
				static Expected<CUDAPlatformDevice> create(size_t DeviceIndex);

				CUDAPlatformDevice(const CUDAPlatformDevice &) = delete;
				CUDAPlatformDevice &operator=(const CUDAPlatformDevice &) = delete;

				CUDAPlatformDevice(CUDAPlatformDevice &&) noexcept;
				CUDAPlatformDevice &operator=(CUDAPlatformDevice &&) noexcept;

				~CUDAPlatformDevice() override;

				std::string getName() const override { return "CUDA"; }
				jlebar: Do we want the device number in the name?
				jprice: Or even the actual name of the device (`cuDeviceGetName`)? This is certainly more useful for client code (in my experience), particularly when working with systems that have more than one type of GPU. It strikes me that "CUDA" is really the name of the platform, not the device. In which case, maybe it makes sense to also have a `getPlatformName()` or `getPlatform()->getName()` that gives you the "CUDA" vs "OpenCL" vs "Host" which is used for the error strings.
				jhen: I agree that more information is better here. Just as jprice suggested, I changed `getName` to return the device index and the name from `cuDeviceGetName`, and I made a new `getPlatformName` function for error messages.

				Expected<const void *>
				createKernel(const MultiKernelLoaderSpec &Spec) override;
				Error destroyKernel(const void *Handle) override;

				Expected<const void *> createStream() override;
				Error destroyStream(const void *Handle) override;

				Error launch(const void *PlatformStreamHandle, BlockDimensions BlockSize,
				GridDimensions GridSize, const void *PKernelHandle,
				const PackedKernelArgumentArrayBase &ArgumentArray) override;

				Error copyD2H(const void *PlatformStreamHandle, const void *DeviceSrcHandle,
				size_t SrcByteOffset, void *HostDst, size_t DstByteOffset,
				size_t ByteCount) override;

				Error copyH2D(const void *PlatformStreamHandle, const void *HostSrc,
				size_t SrcByteOffset, const void *DeviceDstHandle,
				size_t DstByteOffset, size_t ByteCount) override;

				Error copyD2D(const void *PlatformStreamHandle, const void *DeviceSrcHandle,
				size_t SrcByteOffset, const void *DeviceDstHandle,
				size_t DstByteOffset, size_t ByteCount) override;

				Error blockHostUntilDone(const void *PlatformStreamHandle) override;

				Expected<void *> allocateDeviceMemory(size_t ByteCount) override;
				Error freeDeviceMemory(const void *Handle) override;

				Error registerHostMemory(void *Memory, size_t ByteCount) override;
				Error unregisterHostMemory(const void *Memory) override;

				Error synchronousCopyD2H(const void *DeviceSrcHandle, size_t SrcByteOffset,
				void *HostDst, size_t DstByteOffset,
				size_t ByteCount) override;

				Error synchronousCopyH2D(const void *HostSrc, size_t SrcByteOffset,
				const void *DeviceDstHandle, size_t DstByteOffset,
				size_t ByteCount) override;

				Error synchronousCopyD2D(const void *DeviceDstHandle, size_t DstByteOffset,
				const void *DeviceSrcHandle, size_t SrcByteOffset,
				size_t ByteCount) override;

				private:
				CUDAPlatformDevice(size_t DeviceIndex) : DeviceIndex(DeviceIndex) {}

				int DeviceIndex;
				};

				} // namespace cuda
				} // namespace streamexecutor

				#endif // STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORMDEVICE_H

streamexecutor/lib/CMakeLists.txt

	macro(add_se_library name)			macro(add_se_library name)
	add_llvm_library(${name} ${ARGN})			add_llvm_library(${name} ${ARGN})
	set_target_properties(${name} PROPERTIES FOLDER "streamexecutor libraries")			set_target_properties(${name} PROPERTIES FOLDER "streamexecutor libraries")
	endmacro(add_se_library)			endmacro(add_se_library)

				if(STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM)
				set(
				CMAKE_MODULE_PATH
				${CMAKE_MODULE_PATH}
				"${CMAKE_CURRENT_SOURCE_DIR}/platforms/cuda/cmake/modules/")

				find_package(Libcuda REQUIRED)
				include_directories(${LIBCUDA_INCLUDE_DIRS})

				set(
				STREAM_EXECUTOR_CUDA_PLATFORM_TARGET_OBJECT
				$<TARGET_OBJECTS:streamexecutor_cuda_platform>)

				set(
				STREAM_EXECUTOR_LIBCUDA_LIBRARIES
				${LIBCUDA_LIBRARIES})
				endif(STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM)

				add_subdirectory(platforms)

	add_se_library(			add_se_library(
	streamexecutor			streamexecutor
	Device.cpp			Device.cpp
	DeviceMemory.cpp			DeviceMemory.cpp
	Error.cpp			Error.cpp
	HostMemory.cpp			HostMemory.cpp
	Kernel.cpp			Kernel.cpp
	KernelSpec.cpp			KernelSpec.cpp
	PackedKernelArgumentArray.cpp			PackedKernelArgumentArray.cpp
	Platform.cpp			Platform.cpp
	PlatformDevice.cpp			PlatformDevice.cpp
	PlatformManager.cpp			PlatformManager.cpp
	Stream.cpp			Stream.cpp
	)			${STREAM_EXECUTOR_CUDA_PLATFORM_TARGET_OBJECT}
				LINK_LIBS
				${STREAM_EXECUTOR_LIBCUDA_LIBRARIES})
				jlebar: Note to self, need to patch this in and try it out with the in-tree build.

	install(TARGETS streamexecutor DESTINATION lib)			install(TARGETS streamexecutor DESTINATION lib)

streamexecutor/lib/PlatformManager.cpp

	//===-- PlatformManager.cpp - PlatformManager implementation --------------===//			//===-- PlatformManager.cpp - PlatformManager implementation --------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	///			///
	/// \file			/// \file
	/// Implementation of PlatformManager class internals.			/// Implementation of PlatformManager class internals.
	///			///
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "streamexecutor/PlatformManager.h"			#include "streamexecutor/PlatformManager.h"

				#include "streamexecutor/PlatformOptions.h"
	#include "streamexecutor/platforms/host/HostPlatform.h"			#include "streamexecutor/platforms/host/HostPlatform.h"

				#ifdef STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM
				#include "streamexecutor/platforms/cuda/CUDAPlatform.h"
				#endif

	namespace streamexecutor {			namespace streamexecutor {

	PlatformManager::PlatformManager() {			PlatformManager::PlatformManager() {
	// TODO(jhen): Register known platforms by name.			// TODO(jhen): Register known platforms by name.
	// We have a couple of options here:			// We have a couple of options here:
	// * Use build-system flags to set preprocessor macros that select the			// * Use build-system flags to set preprocessor macros that select the
	// appropriate code to include here.			// appropriate code to include here.
	// * Use static initialization tricks to have platform libraries register			// * Use static initialization tricks to have platform libraries register
	// themselves when they are loaded.			// themselves when they are loaded.

	PlatformsByName.emplace("host", llvm::make_unique<host::HostPlatform>());			PlatformsByName.emplace("host", llvm::make_unique<host::HostPlatform>());

				#ifdef STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM
				PlatformsByName.emplace("cuda", llvm::make_unique<cuda::CUDAPlatform>());
				#endif
	}			}

	Expected<Platform *> PlatformManager::getPlatformByName(llvm::StringRef Name) {			Expected<Platform *> PlatformManager::getPlatformByName(llvm::StringRef Name) {
	static PlatformManager Instance;			static PlatformManager Instance;
	auto Iterator = Instance.PlatformsByName.find(Name.lower());			auto Iterator = Instance.PlatformsByName.find(Name.lower());
	if (Iterator != Instance.PlatformsByName.end())			if (Iterator != Instance.PlatformsByName.end())
	return Iterator->second.get();			return Iterator->second.get();
	return make_error("no available platform with name " + Name);			return make_error("no available platform with name " + Name);
	}			}

	} // namespace streamexecutor			} // namespace streamexecutor

streamexecutor/lib/platforms/CMakeLists.txt

This file was added.

				if(STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM)
				add_subdirectory(cuda)
				endif()

streamexecutor/lib/platforms/cuda/CMakeLists.txt

This file was added.

				add_library(
				streamexecutor_cuda_platform
				OBJECT
				CUDAPlatform.cpp
				CUDAPlatformDevice.cpp)

streamexecutor/lib/platforms/cuda/CUDAPlatform.cpp

This file was added.

				//===-- CUDAPlatform.cpp - CUDA platform implementation -------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Implementation of CUDA platform internals.
				///
				//===----------------------------------------------------------------------===//

				#include "streamexecutor/platforms/cuda/CUDAPlatform.h"
				#include "streamexecutor/Device.h"
				#include "streamexecutor/Platform.h"
				#include "streamexecutor/platforms/cuda/CUDAPlatformDevice.h"

				#include "llvm/Support/Mutex.h"

				#include "cuda.h"

				#include <map>

				namespace streamexecutor {
				namespace cuda {

				size_t CUDAPlatform::getDeviceCount() const {
				static CUresult InitResult = []() { return cuInit(0); }();

				if (InitResult)
				// TODO(jhen): Log an error.
				return 0;

				int DeviceCount = 0;
				CUresult Result = cuDeviceGetCount(&DeviceCount);
				// TODO(jhen): Log an error.

				return DeviceCount;
				}

				Expected<Device> CUDAPlatform::getDevice(size_t DeviceIndex) {
				static CUresult InitResult = []() { return cuInit(0); }();
				jlebar: Do you want to wrap the cuInit(0) call in a helper function so we only ever call it once? static CUResult ensureCudaInitialized() { static InitResult = ...; return InitResult; } I guess it doesn't make a big difference either way.
				jhen: Great idea! I like that much better.

				if (InitResult)
				return CUresultToError(InitResult);

				llvm::sys::ScopedLock Lock(Mutex);
				auto Iterator = PlatformDevices.find(DeviceIndex);
				if (Iterator == PlatformDevices.end()) {
				if (auto MaybePDevice = CUDAPlatformDevice::create(DeviceIndex)) {
				Iterator =
				PlatformDevices.emplace(DeviceIndex, std::move(*MaybePDevice)).first;
				} else {
				return MaybePDevice.takeError();
				}
				}
				return Device(&Iterator->second);
				jlebar: Hm. Do we care if getDevice() is slow? We could probably do this without a lock without too much trouble (essentially using the same mechanism used to ensure that function-static variables are initialized only once, except we'd be using it for member variables). Maybe for another patch, if we care about performance. If we don't care, even better. :)
				jhen: I think it doesn't matter for now, and we can come back to it later if it ever seems to matter.
				}

				} // namespace cuda
				} // namespace streamexecutor

streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp

This file was added.

				//===-- CUDAPlatformDevice.cpp - CUDAPlatformDevice implementation --------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Implementation of CUDAPlatformDevice.
				///
				//===----------------------------------------------------------------------===//

				#include "streamexecutor/platforms/cuda/CUDAPlatformDevice.h"
				#include "streamexecutor/PlatformDevice.h"

				#include "cuda.h"

				namespace streamexecutor {
				namespace cuda {

				static void *offset(const void *Base, size_t Offset) {
				return const_cast<char *>(static_cast<const char *>(Base) + Offset);
				}

				Expected<CUDAPlatformDevice> CUDAPlatformDevice::create(size_t DeviceIndex) {
				CUdevice DeviceHandle;
				if (CUresult Result = cuDeviceGet(&DeviceHandle, DeviceIndex))
				return CUresultToError(Result);

				CUcontext ContextHandle;
				if (CUresult Result = cuDevicePrimaryCtxRetain(&ContextHandle, DeviceHandle))
				return CUresultToError(Result);

				if (CUresult Result = cuCtxSetCurrent(ContextHandle))
				return CUresultToError(Result);
				jlebar: Hm, I wonder if we want to be more descriptive in our error handling and give some sort of "backtrace", indicating where we failed. e.g. something as simple as return CUresultToError(result, "cuCtxSetCurrent"); Maybe not for this patch, though.
				jhen: Yes, I could have used that in debugging already. I did the simple method name reporting for now.

				return CUDAPlatformDevice(DeviceIndex);
				}

				CUDAPlatformDevice::CUDAPlatformDevice(CUDAPlatformDevice &&Other) noexcept
				: DeviceIndex(Other.DeviceIndex) {
				Other.DeviceIndex = -1;
				}

				CUDAPlatformDevice &CUDAPlatformDevice::
				operator=(CUDAPlatformDevice &&Other) noexcept {
				DeviceIndex = Other.DeviceIndex;
				Other.DeviceIndex = -1;
				return *this;
				}
				jlebar: I wonder if we want to cache this instead of recomputing it every time.
				jhen: Good idea. Let's do that.

				CUDAPlatformDevice::~CUDAPlatformDevice() {
				CUresult Result = cuDevicePrimaryCtxRelease(DeviceIndex);
				// TODO(jhen): Log error.
				jlebar: Is this an "unused variable" warning? If so maybe do (void) Result;
				jhen: Yes, it was a warning. Thanks for the suggestion. I've done this for all these cases where error handling has yet to be implemented.
				}

				Expected<const void *>
				CUDAPlatformDevice::createKernel(const MultiKernelLoaderSpec &Spec) {
				// TODO(jhen): Maybe first check loaded modules?
				if (!Spec.hasCUDAPTXInMemory())
				return make_error("no CUDA source available to create kernel");

				CUdevice Device = static_cast<int>(DeviceIndex);
				int ComputeCapabilityMajor = 0;
				int ComputeCapabilityMinor = 0;
				if (CUresult Result = cuDeviceGetAttribute(
				&ComputeCapabilityMajor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR,
				Device))
				return CUresultToError(Result);
				if (CUresult Result = cuDeviceGetAttribute(
				&ComputeCapabilityMinor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR,
				Device))
				return CUresultToError(Result);
				const char *Code = Spec.getCUDAPTXInMemory().getCode(ComputeCapabilityMajor,
				ComputeCapabilityMinor);

				if (!Code)
				return make_error("no suitable CUDA source found for compute capability " +
				llvm::Twine(ComputeCapabilityMajor) + "." +
				llvm::Twine(ComputeCapabilityMinor));
				jlebar: Nit, "CUDA source" is probably not the right phrase, since this is all compiled CUDA code (PTX and SASS).
				jhen: I changed it to "CUDA code". I think that should be correct.

				CUmodule Module;
				if (CUresult Result = cuModuleLoadData(&Module, Code))
				return CUresultToError(Result);

				CUfunction Function;
				if (CUresult Result =
				cuModuleGetFunction(&Function, Module, Spec.getKernelName().c_str()))
				return CUresultToError(Result);

				// TODO(jhen): Should I save this function pointer in case someone asks for
				// it again?

				// TODO(jhen): Should I save the module pointer so I can unload it when I
				// destroy this device?

				return static_cast<const void *>(Function);
				}

				Error CUDAPlatformDevice::destroyKernel(const void *Handle) {
				// TODO(jhen): Maybe keep track of kernels for each module and unload the
				// module after they are all destroyed.
				return Error::success();
				}

				Expected<const void *> CUDAPlatformDevice::createStream() {
				CUstream Stream;
				if (CUresult Result = cuStreamCreate(&Stream, CU_STREAM_DEFAULT))
				return CUresultToError(Result);
				return Stream;
				}

				Error CUDAPlatformDevice::destroyStream(const void *Handle) {
				return CUresultToError(
				cuStreamDestroy(static_cast<CUstream>(const_cast<void *>(Handle))));
				}

				Error CUDAPlatformDevice::launch(
				const void *PlatformStreamHandle, BlockDimensions BlockSize,
				GridDimensions GridSize, const void *PKernelHandle,
				const PackedKernelArgumentArrayBase &ArgumentArray) {
				CUfunction Function =
				reinterpret_cast<CUfunction>(const_cast<void *>(PKernelHandle));
				CUstream Stream =
				reinterpret_cast<CUstream>(const_cast<void *>(PlatformStreamHandle));
				// TODO(jhen): Deal with shared memory arguments.
				unsigned SharedMemoryBytes = 0;
				void **ArgumentAddresses = const_cast<void **>(ArgumentArray.getAddresses());
				return CUresultToError(cuLaunchKernel(
				Function, GridSize.X, GridSize.Y, GridSize.Z, BlockSize.X, BlockSize.Y,
				BlockSize.Z, SharedMemoryBytes, Stream, ArgumentAddresses, nullptr));
				}

				Error CUDAPlatformDevice::copyD2H(const void *PlatformStreamHandle,
				const void *DeviceSrcHandle,
				size_t SrcByteOffset, void *HostDst,
				size_t DstByteOffset, size_t ByteCount) {
				return CUresultToError(cuMemcpyDtoHAsync(
				offset(HostDst, DstByteOffset),
				reinterpret_cast<CUdeviceptr>(offset(DeviceSrcHandle, SrcByteOffset)),
				ByteCount,
				static_cast<CUstream>(const_cast<void *>(PlatformStreamHandle))));
				}

				Error CUDAPlatformDevice::copyH2D(const void *PlatformStreamHandle,
				const void *HostSrc, size_t SrcByteOffset,
				const void *DeviceDstHandle,
				size_t DstByteOffset, size_t ByteCount) {
				return CUresultToError(cuMemcpyHtoDAsync(
				reinterpret_cast<CUdeviceptr>(offset(DeviceDstHandle, DstByteOffset)),
				offset(HostSrc, SrcByteOffset), ByteCount,
				static_cast<CUstream>(const_cast<void *>(PlatformStreamHandle))));
				}

				Error CUDAPlatformDevice::copyD2D(const void *PlatformStreamHandle,
				const void *DeviceSrcHandle,
				size_t SrcByteOffset,
				const void *DeviceDstHandle,
				size_t DstByteOffset, size_t ByteCount) {
				return CUresultToError(cuMemcpyDtoDAsync(
				reinterpret_cast<CUdeviceptr>(offset(DeviceDstHandle, DstByteOffset)),
				reinterpret_cast<CUdeviceptr>(offset(DeviceSrcHandle, SrcByteOffset)),
				ByteCount,
				static_cast<CUstream>(const_cast<void *>(PlatformStreamHandle))));
				}

				Error CUDAPlatformDevice::blockHostUntilDone(const void *PlatformStreamHandle) {
				return CUresultToError(cuStreamSynchronize(
				static_cast<CUstream>(const_cast<void *>(PlatformStreamHandle))));
				}

				Expected<void *> CUDAPlatformDevice::allocateDeviceMemory(size_t ByteCount) {
				CUdeviceptr Pointer;
				if (CUresult Result = cuMemAlloc(&Pointer, ByteCount))
				return CUresultToError(Result);
				return reinterpret_cast<void *>(Pointer);
				}

				Error CUDAPlatformDevice::freeDeviceMemory(const void *Handle) {
				return CUresultToError(cuMemFree(reinterpret_cast<CUdeviceptr>(Handle)));
				}

				Error CUDAPlatformDevice::registerHostMemory(void *Memory, size_t ByteCount) {
				return CUresultToError(cuMemHostRegister(Memory, ByteCount, 0u));
				}

				Error CUDAPlatformDevice::unregisterHostMemory(const void *Memory) {
				return CUresultToError(cuMemHostUnregister(const_cast<void *>(Memory)));
				}

				Error CUDAPlatformDevice::synchronousCopyD2H(const void *DeviceSrcHandle,
				size_t SrcByteOffset,
				void *HostDst,
				size_t DstByteOffset,
				size_t ByteCount) {
				return CUresultToError(cuMemcpyDtoH(
				offset(HostDst, DstByteOffset),
				reinterpret_cast<CUdeviceptr>(offset(DeviceSrcHandle, SrcByteOffset)),
				ByteCount));
				}

				Error CUDAPlatformDevice::synchronousCopyH2D(const void *HostSrc,
				size_t SrcByteOffset,
				const void *DeviceDstHandle,
				size_t DstByteOffset,
				size_t ByteCount) {
				return CUresultToError(cuMemcpyHtoD(
				reinterpret_cast<CUdeviceptr>(offset(DeviceDstHandle, DstByteOffset)),
				offset(HostSrc, SrcByteOffset), ByteCount));
				}

				Error CUDAPlatformDevice::synchronousCopyD2D(const void *DeviceDstHandle,
				size_t DstByteOffset,
				const void *DeviceSrcHandle,
				size_t SrcByteOffset,
				size_t ByteCount) {
				return CUresultToError(cuMemcpyDtoD(
				reinterpret_cast<CUdeviceptr>(offset(DeviceDstHandle, DstByteOffset)),
				reinterpret_cast<CUdeviceptr>(offset(DeviceSrcHandle, SrcByteOffset)),
				ByteCount));
				}

				} // namespace cuda
				} // namespace streamexecutor
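
The copy wrappers above repeatedly turn an opaque `void *` handle plus a byte offset into a `CUdeviceptr`. The `offset` helper they call is defined earlier in this file and is not shown in this part of the diff, so the following is only a minimal sketch of what such a helper could look like, assuming it does plain byte arithmetic. The names `offsetSketch`, `toDeviceAddressSketch`, and `DevicePtrSketch` are hypothetical and are not part of the patch; `DevicePtrSketch` stands in for `CUdeviceptr` so the sketch compiles without the CUDA headers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CUdeviceptr: an unsigned integer wide enough to
// hold an address. The real type comes from cuda.h.
using DevicePtrSketch = std::uintptr_t;

// Assumed shape of the offset() helper: advance an opaque handle by a number
// of bytes. Two overloads preserve constness of the handle.
inline const void *offsetSketch(const void *Base, std::size_t ByteOffset) {
  return static_cast<const char *>(Base) + ByteOffset;
}

inline void *offsetSketch(void *Base, std::size_t ByteOffset) {
  return static_cast<char *>(Base) + ByteOffset;
}

// Mirrors the reinterpret_cast<CUdeviceptr>(offset(Handle, ByteOffset))
// pattern used by the copy wrappers: apply the byte offset first, then
// reinterpret the resulting pointer as an integer address.
inline DevicePtrSketch toDeviceAddressSketch(const void *Handle,
                                             std::size_t ByteOffset) {
  return reinterpret_cast<DevicePtrSketch>(offsetSketch(Handle, ByteOffset));
}
```

Doing the arithmetic on `char *` before the integer cast keeps the offset in bytes regardless of the element type the handle originally pointed to.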

streamexecutor/lib/platforms/cuda/cmake/modules/FindLibcuda.cmake

This file was added.

				# - Try to find the libcuda library
				# Once done this will define
				# LIBCUDA_FOUND - System has libcuda
				# LIBCUDA_INCLUDE_DIRS - The libcuda include directories
				# LIBCUDA_LIBRARIES - The libraries needed to use libcuda

				find_path(LIBCUDA_INCLUDE_DIR cuda.h /usr/local/cuda/include)
				jlebar: Can we override this on the command line somehow? If so, is that obvious, or is it worth documenting?
				jhen: I added a TODO to add this functionality. I'll plan to add it in the next patch.
				find_library(LIBCUDA_LIBRARY cuda)
				jlebar: The libcuda binary we use should come from the same cuda install as the headers we use, I think?
				jhen: I agree. I added a TODO here to fix it in the next patch.

				include(FindPackageHandleStandardArgs)
				# handle the QUIETLY and REQUIRED arguments and set LIBCUDA_FOUND to TRUE if
				# all listed variables are TRUE
				find_package_handle_standard_args(
				LIBCUDA DEFAULT_MSG LIBCUDA_INCLUDE_DIR LIBCUDA_LIBRARY)

				mark_as_advanced(LIBCUDA_INCLUDE_DIR LIBCUDA_LIBRARY)

				set(LIBCUDA_LIBRARIES ${LIBCUDA_LIBRARY})
				set(LIBCUDA_INCLUDE_DIRS ${LIBCUDA_INCLUDE_DIR})
