This is an archive of the discontinued LLVM Phabricator instance.


Differential D24538

[SE] Add CUDA platform
ClosedPublic

Authored by jhen on Sep 13 2016, 5:41 PM.

Details

Reviewers

jlebar

Commits

rG6bfc863d741a: [SE] Add CUDA platform
rL281524: [SE] Add CUDA platform

Summary

Basic CUDA platform implementation and cmake infrastructure to control whether
it's used. A few important TODOs will be handled in later patches:

Log some error messages that can't easily be returned as Errors.
Cache modules and kernels to prevent reloading them if someone tries to reload a kernel that's already loaded.
Tolerate shared memory arguments for kernel launches.
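
As a rough illustration of the second TODO, a kernel cache might look like the sketch below. This is not code from the patch: `KernelCacheSketch`, the integer `Handle` type, and the `Loader` callback are hypothetical names, and the sketch is not thread-safe.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical cache keyed by kernel name. The real platform would store
// driver module and function handles rather than plain integers.
class KernelCacheSketch {
public:
  using Handle = int;
  using Loader = std::function<Handle(const std::string &)>;

  // Returns the cached handle for KernelName, calling Load only on a miss.
  Handle getOrLoad(const std::string &KernelName, const Loader &Load) {
    auto It = Handles.find(KernelName);
    if (It != Handles.end())
      return It->second;
    Handle Loaded = Load(KernelName);
    Handles.emplace(KernelName, Loaded);
    return Loaded;
  }

private:
  std::unordered_map<std::string, Handle> Handles;
};
```

With a cache of this shape, a second request for the same kernel name returns the stored handle without invoking the loader again.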

Diff Detail

Repository: rL LLVM

Event Timeline

jhen updated this revision to Diff 71279. Sep 13 2016, 5:41 PM

jhen retitled this revision to [SE] Add CUDA platform.

jhen updated this object.

jhen added a reviewer: jlebar.

jhen added subscribers: parallel_libs-commits, jprice.

Herald added subscribers: jlebar, mgorny, beanz. Sep 13 2016, 5:41 PM

jlebar added inline comments. Sep 13 2016, 9:18 PM

streamexecutor/CMakeLists.txt
6 ↗	(On Diff #71279)	Not necessarily in this patch, but we should document how to configure this with CUDA enabled and (see below) how to point cmake at different CUDA installs.
streamexecutor/include/streamexecutor/PlatformOptions.h.in
4 ↗	(On Diff #71279)	Maybe we should have a comment in this file explaining what it's for.
streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h
37 ↗	(On Diff #71279)	Do we want the device number in the name?
streamexecutor/lib/CMakeLists.txt
41 ↗	(On Diff #71279)	Note to self, need to patch this in and try it out with the in-tree build.
streamexecutor/lib/platforms/cuda/CUDAPlatform.cpp
44 ↗	(On Diff #71279)	Do you want to wrap the cuInit(0) call in a helper function so we only ever call it once? static CUResult ensureCudaInitialized() { static InitResult = ...; return InitResult; } I guess it doesn't make a big difference either way.
59 ↗	(On Diff #71279)	Hm. Do we care if getDevice() is slow? We could probably do this without a lock without too much trouble (essentially using the same mechanism used to ensure that function-static variables are initialized only once, except we'd be using it for member variables). Maybe for another patch, if we care about performance. If we don't care, even better. :)
streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
37 ↗	(On Diff #71279)	Hm, I wonder if we want to be more descriptive in our error handling and give some sort of "backtrace", indicating where we failed. e.g. something as simple as return CUresultToError(result, "cuCtxSetCurrent"); Maybe not for this patch, though.
56 ↗	(On Diff #71279)	Is this an "unused variable" warning? If so maybe do (void) Result;
80 ↗	(On Diff #71279)	Nit, "CUDA source" is probably not the right phrase, since this is all compiled CUDA code (PTX and SASS).
streamexecutor/lib/platforms/cuda/cmake/modules/FindLibcuda.cmake
7 ↗	(On Diff #71279)	Can we override this on the command line somehow? If so, is that obvious, or is it worth documenting?
8 ↗	(On Diff #71279)	The libcuda binary we use should come from the same cuda install as the headers we use, I think?
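
The once-only initialization suggested in the `cuInit(0)` comment can be sketched without the CUDA driver. Here `fakeDriverInit` and `InitCalls` are hypothetical stand-ins for `cuInit(0)` and exist only to make the behavior observable; the pattern is a function-local static whose initializer runs exactly once.

```cpp
// Hypothetical stand-in for cuInit(0). Counts how many times it runs and
// returns 0 on success, mirroring CUDA_SUCCESS.
static int InitCalls = 0;
static int fakeDriverInit() {
  ++InitCalls;
  return 0;
}

// The first call runs fakeDriverInit(); every later call returns the cached
// result. Since C++11, a function-local static is initialized exactly once,
// even when several threads reach it concurrently.
static int ensureInitialized() {
  static const int InitResult = fakeDriverInit();
  return InitResult;
}
```

One consequence of this design is that a failed initialization is cached as well, so callers keep seeing the original error code rather than triggering a retry.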

jprice added inline comments. Sep 14 2016, 5:03 AM

streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h
23 ↗	(On Diff #71279)	Where is this defined? I get undefined references to this function when I try and build this patch.
37 ↗	(On Diff #71279)	Or even the actual name of the device (`cuDeviceGetName`)? This is certainly more useful for client code (in my experience), particularly when working with systems that have more than one type of GPU. It strikes me that "CUDA" is really the name of the platform, not the device. In which case, maybe it makes sense to also have a `getPlatformName()` or `getPlatform()->getName()` that gives you the "CUDA" vs "OpenCL" vs "Host" which is used for the error strings.

Respond to review comments

streamexecutor/CMakeLists.txt
6 ↗	(On Diff #71279)	Sounds good. I'll tackle this in the next patch when I enable pointing cmake at different CUDA installs.
streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h
23 ↗	(On Diff #71279)	Oops, I failed at version control. The definition is now in CUDAPlatformDevice.cpp.
37 ↗	(On Diff #71279)	I agree that more information is better here. Just as jprice suggested, I changed `getName` to return the device index and the name from `cuDeviceGetName`, and I made a new `getPlatformName` function for error messages.
streamexecutor/lib/platforms/cuda/CUDAPlatform.cpp
44 ↗	(On Diff #71279)	Great idea! I like that much better.
59 ↗	(On Diff #71279)	I think it doesn't matter for now, and we can come back to it later if it ever seems to matter.
streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
37 ↗	(On Diff #71279)	Yes, I could have used that in debugging already. I did the simple method name reporting for now.
56 ↗	(On Diff #71279)	Yes, it was a warning. Thanks for the suggestion. I've done this for all these cases where error handling has yet to be implemented.
80 ↗	(On Diff #71279)	I changed it to "CUDA code". I think that should be correct.
streamexecutor/lib/platforms/cuda/cmake/modules/FindLibcuda.cmake
7 ↗	(On Diff #71279)	I added a TODO to add this functionality. I'll plan to add it in the next patch.
8 ↗	(On Diff #71279)	I agree. I added a TODO here to fix it in the next patch.
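
The "method name reporting" discussed for CUDAPlatformDevice.cpp line 37 can be sketched as below. The real patch declares `CUresultToError(int, const llvm::Twine &)` returning a streamexecutor `Error`; `SketchError`, `resultToError`, and the message wording here are assumptions made so the example compiles without LLVM or CUDA.

```cpp
#include <string>

// Hypothetical error type standing in for streamexecutor::Error.
struct SketchError {
  bool Failed;
  std::string Message;
};

// Converts a driver-style result code into an error that names the call
// that produced it. A code of 0 is treated as success, like CUDA_SUCCESS.
inline SketchError resultToError(int Code, const std::string &Context) {
  if (Code == 0)
    return {false, ""};
  return {true,
          "CUDA driver error " + std::to_string(Code) + " from " + Context};
}
```

A call site would then read something like `return resultToError(Result, "cuCtxSetCurrent");`, which gives the reader of the error message the name of the failing driver call.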

jlebar accepted this revision. Sep 14 2016, 10:27 AM

jlebar edited edge metadata.

jlebar added inline comments.

streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
51 ↗	(On Diff #71384)	I wonder if we want to cache this instead of recomputing it every time.

This revision is now accepted and ready to land. Sep 14 2016, 10:27 AM

Cache device name

jhen marked an inline comment as done. Sep 14 2016, 10:35 AM

jhen added inline comments.

streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp
52 ↗	(On Diff #71387)	Good idea. Let's do that.
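
A minimal sketch of the caching idea, assuming the name is built once when the device object is created. `SketchDevice`, `queryDeviceName`, and the exact name format are hypothetical; the real class would query `cuDeviceGetName` instead of the stub used here.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a cuDeviceGetName query; counts its calls.
static int NameQueries = 0;
static std::string queryDeviceName(std::size_t /*DeviceIndex*/) {
  ++NameQueries;
  return "FakeGPU";
}

class SketchDevice {
public:
  // Queries the driver once and stores the formatted name in the object.
  static SketchDevice create(std::size_t DeviceIndex) {
    std::string Name = "CUDA device " + std::to_string(DeviceIndex) + ": " +
                       queryDeviceName(DeviceIndex);
    return SketchDevice(std::move(Name));
  }

  // Returns the cached name; no driver query happens here.
  const std::string &getName() const { return Name; }

  // The platform name is fixed, so it needs no caching.
  std::string getPlatformName() const { return "CUDA"; }

private:
  explicit SketchDevice(std::string Name) : Name(std::move(Name)) {}
  std::string Name;
};
```

Repeated `getName()` calls on such an object return the stored string, so the driver is consulted only during `create`.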

Closed by commit rL281524: [SE] Add CUDA platform (authored by jhen). Sep 14 2016, 1:07 PM

This revision was automatically updated to reflect the committed changes.

jhen marked an inline comment as done.

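Revision Contents

Path	Size
parallel-libs/trunk/streamexecutor/CMakeLists.txt	5 lines
parallel-libs/trunk/streamexecutor/include/streamexecutor/PlatformDevice.h	40 lines
parallel-libs/trunk/streamexecutor/include/streamexecutor/PlatformOptions.h.in	23 lines
parallel-libs/trunk/streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatform.h	42 lines
parallel-libs/trunk/streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h	93 lines
parallel-libs/trunk/streamexecutor/include/streamexecutor/platforms/host/HostPlatformDevice.h	2 lines
parallel-libs/trunk/streamexecutor/lib/CMakeLists.txt	24 lines
parallel-libs/trunk/streamexecutor/lib/PlatformManager.cpp	10 lines
parallel-libs/trunk/streamexecutor/lib/platforms/CMakeLists.txt	3 lines
parallel-libs/trunk/streamexecutor/lib/platforms/cuda/CMakeLists.txt	5 lines
parallel-libs/trunk/streamexecutor/lib/platforms/cuda/CUDAPlatform.cpp	65 lines
parallel-libs/trunk/streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp	280 lines
parallel-libs/trunk/streamexecutor/lib/platforms/cuda/cmake/modules/FindLibcuda.cmake	21 lines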

Diff 71413

parallel-libs/trunk/streamexecutor/CMakeLists.txt

	cmake_minimum_required(VERSION 3.1)			cmake_minimum_required(VERSION 3.1)

	option(STREAM_EXECUTOR_UNIT_TESTS "enable unit tests" ON)			option(STREAM_EXECUTOR_UNIT_TESTS "enable unit tests" ON)
	option(STREAM_EXECUTOR_ENABLE_DOXYGEN "enable StreamExecutor doxygen" ON)			option(STREAM_EXECUTOR_ENABLE_DOXYGEN "enable StreamExecutor doxygen" ON)
	option(STREAM_EXECUTOR_ENABLE_CONFIG_TOOL "enable building streamexecutor-config tool" ON)			option(STREAM_EXECUTOR_ENABLE_CONFIG_TOOL "enable building streamexecutor-config tool" ON)
				option(STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM "enable building the CUDA StreamExecutor platform" OFF)

				configure_file("include/streamexecutor/PlatformOptions.h.in" "include/streamexecutor/PlatformOptions.h")

	# First find includes relative to the streamexecutor top-level source path.			# First find includes relative to the streamexecutor top-level source path.
	include_directories(BEFORE ${CMAKE_CURRENT_SOURCE_DIR}/include)			include_directories(BEFORE ${CMAKE_CURRENT_SOURCE_DIR}/include)
				# Also look for configured headers in the top-level binary directory.
				include_directories(BEFORE ${CMAKE_CURRENT_BINARY_DIR}/include)

	# If we are not building as part of LLVM, build StreamExecutor as a standalone			# If we are not building as part of LLVM, build StreamExecutor as a standalone
	# project using LLVM as an external library:			# project using LLVM as an external library:
	string(			string(
	COMPARE			COMPARE
	EQUAL			EQUAL
	"${CMAKE_SOURCE_DIR}"			"${CMAKE_SOURCE_DIR}"
	"${CMAKE_CURRENT_SOURCE_DIR}"			"${CMAKE_CURRENT_SOURCE_DIR}"
	[71 lines not shown]

parallel-libs/trunk/streamexecutor/include/streamexecutor/PlatformDevice.h

	[31 lines not shown]
	/// The public Device and Stream classes have the type-safe versions of the			/// The public Device and Stream classes have the type-safe versions of the
	/// functions in this interface.			/// functions in this interface.
	class PlatformDevice {			class PlatformDevice {
	public:			public:
	virtual ~PlatformDevice();			virtual ~PlatformDevice();

	virtual std::string getName() const = 0;			virtual std::string getName() const = 0;

				virtual std::string getPlatformName() const = 0;

	/// Creates a platform-specific kernel.			/// Creates a platform-specific kernel.
	virtual Expected<const void *>			virtual Expected<const void *>
	createKernel(const MultiKernelLoaderSpec &Spec) {			createKernel(const MultiKernelLoaderSpec &Spec) {
	return make_error("createKernel not implemented for platform " + getName());			return make_error("createKernel not implemented for platform " +
				getPlatformName());
	}			}

	virtual Error destroyKernel(const void *Handle) {			virtual Error destroyKernel(const void *Handle) {
	return make_error("destroyKernel not implemented for platform " +			return make_error("destroyKernel not implemented for platform " +
	getName());			getPlatformName());
	}			}

	/// Creates a platform-specific stream.			/// Creates a platform-specific stream.
	virtual Expected<const void *> createStream() {			virtual Expected<const void *> createStream() {
	return make_error("createStream not implemented for platform " + getName());			return make_error("createStream not implemented for platform " +
				getPlatformName());
	}			}

	virtual Error destroyStream(const void *Handle) {			virtual Error destroyStream(const void *Handle) {
	return make_error("destroyStream not implemented for platform " +			return make_error("destroyStream not implemented for platform " +
	getName());			getPlatformName());
	}			}

	/// Launches a kernel on the given stream.			/// Launches a kernel on the given stream.
	virtual Error launch(const void *PlatformStreamHandle,			virtual Error launch(const void *PlatformStreamHandle,
	BlockDimensions BlockSize, GridDimensions GridSize,			BlockDimensions BlockSize, GridDimensions GridSize,
	const void *PKernelHandle,			const void *PKernelHandle,
	const PackedKernelArgumentArrayBase &ArgumentArray) {			const PackedKernelArgumentArrayBase &ArgumentArray) {
	return make_error("launch not implemented for platform " + getName());			return make_error("launch not implemented for platform " +
				getPlatformName());
	}			}

	/// Copies data from the device to the host.			/// Copies data from the device to the host.
	///			///
	/// HostDst should have been registered with registerHostMemory.			/// HostDst should have been registered with registerHostMemory.
	virtual Error copyD2H(const void *PlatformStreamHandle,			virtual Error copyD2H(const void *PlatformStreamHandle,
	const void *DeviceSrcHandle, size_t SrcByteOffset,			const void *DeviceSrcHandle, size_t SrcByteOffset,
	void *HostDst, size_t DstByteOffset, size_t ByteCount) {			void *HostDst, size_t DstByteOffset, size_t ByteCount) {
	return make_error("copyD2H not implemented for platform " + getName());			return make_error("copyD2H not implemented for platform " +
				getPlatformName());
	}			}

	/// Copies data from the host to the device.			/// Copies data from the host to the device.
	///			///
	/// HostSrc should have been registered with registerHostMemory.			/// HostSrc should have been registered with registerHostMemory.
	virtual Error copyH2D(const void *PlatformStreamHandle, const void *HostSrc,			virtual Error copyH2D(const void *PlatformStreamHandle, const void *HostSrc,
	size_t SrcByteOffset, const void *DeviceDstHandle,			size_t SrcByteOffset, const void *DeviceDstHandle,
	size_t DstByteOffset, size_t ByteCount) {			size_t DstByteOffset, size_t ByteCount) {
	return make_error("copyH2D not implemented for platform " + getName());			return make_error("copyH2D not implemented for platform " +
				getPlatformName());
	}			}

	/// Copies data from one device location to another.			/// Copies data from one device location to another.
	virtual Error copyD2D(const void *PlatformStreamHandle,			virtual Error copyD2D(const void *PlatformStreamHandle,
	const void *DeviceSrcHandle, size_t SrcByteOffset,			const void *DeviceSrcHandle, size_t SrcByteOffset,
	const void *DeviceDstHandle, size_t DstByteOffset,			const void *DeviceDstHandle, size_t DstByteOffset,
	size_t ByteCount) {			size_t ByteCount) {
	return make_error("copyD2D not implemented for platform " + getName());			return make_error("copyD2D not implemented for platform " +
				getPlatformName());
	}			}

	/// Blocks the host until the given stream completes all the work enqueued up			/// Blocks the host until the given stream completes all the work enqueued up
	/// to the point this function is called.			/// to the point this function is called.
	virtual Error blockHostUntilDone(const void *PlatformStreamHandle) {			virtual Error blockHostUntilDone(const void *PlatformStreamHandle) {
	return make_error("blockHostUntilDone not implemented for platform " +			return make_error("blockHostUntilDone not implemented for platform " +
	getName());			getPlatformName());
	}			}

	/// Allocates untyped device memory of a given size in bytes.			/// Allocates untyped device memory of a given size in bytes.
	virtual Expected<void *> allocateDeviceMemory(size_t ByteCount) {			virtual Expected<void *> allocateDeviceMemory(size_t ByteCount) {
	return make_error("allocateDeviceMemory not implemented for platform " +			return make_error("allocateDeviceMemory not implemented for platform " +
	getName());			getPlatformName());
	}			}

	/// Frees device memory previously allocated by allocateDeviceMemory.			/// Frees device memory previously allocated by allocateDeviceMemory.
	virtual Error freeDeviceMemory(const void *Handle) {			virtual Error freeDeviceMemory(const void *Handle) {
	return make_error("freeDeviceMemory not implemented for platform " +			return make_error("freeDeviceMemory not implemented for platform " +
	getName());			getPlatformName());
	}			}

	/// Registers previously allocated host memory so it can be used with copyH2D			/// Registers previously allocated host memory so it can be used with copyH2D
	/// and copyD2H.			/// and copyD2H.
	virtual Error registerHostMemory(void *Memory, size_t ByteCount) {			virtual Error registerHostMemory(void *Memory, size_t ByteCount) {
	return make_error("registerHostMemory not implemented for platform " +			return make_error("registerHostMemory not implemented for platform " +
	getName());			getPlatformName());
	}			}

	/// Unregisters host memory previously registered with registerHostMemory.			/// Unregisters host memory previously registered with registerHostMemory.
	virtual Error unregisterHostMemory(const void *Memory) {			virtual Error unregisterHostMemory(const void *Memory) {
	return make_error("unregisterHostMemory not implemented for platform " +			return make_error("unregisterHostMemory not implemented for platform " +
	getName());			getPlatformName());
	}			}

	/// Copies the given number of bytes from device memory to host memory.			/// Copies the given number of bytes from device memory to host memory.
	///			///
	/// Blocks the calling host thread until the copy is completed. Can operate on			/// Blocks the calling host thread until the copy is completed. Can operate on
	/// any host memory, not just registered host memory. Does not block any			/// any host memory, not just registered host memory. Does not block any
	/// ongoing device calls.			/// ongoing device calls.
	virtual Error synchronousCopyD2H(const void *DeviceSrcHandle,			virtual Error synchronousCopyD2H(const void *DeviceSrcHandle,
	size_t SrcByteOffset, void *HostDst,			size_t SrcByteOffset, void *HostDst,
	size_t DstByteOffset, size_t ByteCount) {			size_t DstByteOffset, size_t ByteCount) {
	return make_error("synchronousCopyD2H not implemented for platform " +			return make_error("synchronousCopyD2H not implemented for platform " +
	getName());			getPlatformName());
	}			}

	/// Similar to synchronousCopyD2H(const void *, size_t, void			/// Similar to synchronousCopyD2H(const void *, size_t, void
	/// *, size_t, size_t), but copies memory from host to device rather than			/// *, size_t, size_t), but copies memory from host to device rather than
	/// device to host.			/// device to host.
	virtual Error synchronousCopyH2D(const void *HostSrc, size_t SrcByteOffset,			virtual Error synchronousCopyH2D(const void *HostSrc, size_t SrcByteOffset,
	const void *DeviceDstHandle,			const void *DeviceDstHandle,
	size_t DstByteOffset, size_t ByteCount) {			size_t DstByteOffset, size_t ByteCount) {
	return make_error("synchronousCopyH2D not implemented for platform " +			return make_error("synchronousCopyH2D not implemented for platform " +
	getName());			getPlatformName());
	}			}

	/// Similar to synchronousCopyD2H(const void *, size_t, void			/// Similar to synchronousCopyD2H(const void *, size_t, void
	/// *, size_t, size_t), but copies memory from one location in device memory			/// *, size_t, size_t), but copies memory from one location in device memory
	/// to another rather than from device to host.			/// to another rather than from device to host.
	virtual Error synchronousCopyD2D(const void *DeviceSrcHandle,			virtual Error synchronousCopyD2D(const void *DeviceSrcHandle,
	size_t SrcByteOffset,			size_t SrcByteOffset,
	const void *DeviceDstHandle,			const void *DeviceDstHandle,
	size_t DstByteOffset, size_t ByteCount) {			size_t DstByteOffset, size_t ByteCount) {
	return make_error("synchronousCopyD2D not implemented for platform " +			return make_error("synchronousCopyD2D not implemented for platform " +
	getName());			getPlatformName());
	}			}
	};			};

	} // namespace streamexecutor			} // namespace streamexecutor

	#endif // STREAMEXECUTOR_PLATFORMDEVICE_H			#endif // STREAMEXECUTOR_PLATFORMDEVICE_H

parallel-libs/trunk/streamexecutor/include/streamexecutor/PlatformOptions.h.in

				//===-- PlatformOptions.h - Platform option macros --------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This contents of this file are filled in at configuration time. This file
				/// defines macros that represent the platform configuration state of the build,
				/// e.g. which platforms are enabled.
				///
				//===----------------------------------------------------------------------===//


				#ifndef STREAMEXECUTOR_PLATFORMOPTIONS_H
				#define STREAMEXECUTOR_PLATFORMOPTIONS_H

				#cmakedefine STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM

				#endif // STREAMEXECUTOR_PLATFORMOPTIONS_H

parallel-libs/trunk/streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatform.h

				//===-- CUDAPlatform.h - CUDA platform subclass -----------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Declaration of the CUDAPlatform class.
				///
				//===----------------------------------------------------------------------===//

				#ifndef STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORM_H
				#define STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORM_H

				#include "streamexecutor/Platform.h"
				#include "streamexecutor/platforms/cuda/CUDAPlatformDevice.h"

				#include "llvm/Support/Mutex.h"

				#include <map>

				namespace streamexecutor {
				namespace cuda {

				class CUDAPlatform : public Platform {
				public:
				size_t getDeviceCount() const override;

				Expected<Device> getDevice(size_t DeviceIndex) override;

				private:
				llvm::sys::Mutex Mutex;
				std::map<size_t, CUDAPlatformDevice> PlatformDevices;
				};

				} // namespace cuda
				} // namespace streamexecutor

				#endif // STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORM_H

parallel-libs/trunk/streamexecutor/include/streamexecutor/platforms/cuda/CUDAPlatformDevice.h

				//===-- CUDAPlatformDevice.h - CUDAPlatformDevice class ---------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Declaration of the CUDAPlatformDevice class.
				///
				//===----------------------------------------------------------------------===//

				#ifndef STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORMDEVICE_H
				#define STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORMDEVICE_H

				#include "streamexecutor/PlatformDevice.h"

				namespace streamexecutor {
				namespace cuda {

				Error CUresultToError(int CUResult, const llvm::Twine &Message);

				class CUDAPlatformDevice : public PlatformDevice {
				public:
				static Expected<CUDAPlatformDevice> create(size_t DeviceIndex);

				CUDAPlatformDevice(const CUDAPlatformDevice &) = delete;
				CUDAPlatformDevice &operator=(const CUDAPlatformDevice &) = delete;

				CUDAPlatformDevice(CUDAPlatformDevice &&) noexcept;
				CUDAPlatformDevice &operator=(CUDAPlatformDevice &&) noexcept;

				~CUDAPlatformDevice() override;

				std::string getName() const override;

				std::string getPlatformName() const override { return "CUDA"; }

				Expected<const void *>
				createKernel(const MultiKernelLoaderSpec &Spec) override;
				Error destroyKernel(const void *Handle) override;

				Expected<const void *> createStream() override;
				Error destroyStream(const void *Handle) override;

				Error launch(const void *PlatformStreamHandle, BlockDimensions BlockSize,
				GridDimensions GridSize, const void *PKernelHandle,
				const PackedKernelArgumentArrayBase &ArgumentArray) override;

				Error copyD2H(const void *PlatformStreamHandle, const void *DeviceSrcHandle,
				size_t SrcByteOffset, void *HostDst, size_t DstByteOffset,
				size_t ByteCount) override;

				Error copyH2D(const void *PlatformStreamHandle, const void *HostSrc,
				size_t SrcByteOffset, const void *DeviceDstHandle,
				size_t DstByteOffset, size_t ByteCount) override;

				Error copyD2D(const void *PlatformStreamHandle, const void *DeviceSrcHandle,
				size_t SrcByteOffset, const void *DeviceDstHandle,
				size_t DstByteOffset, size_t ByteCount) override;

				Error blockHostUntilDone(const void *PlatformStreamHandle) override;

				Expected<void *> allocateDeviceMemory(size_t ByteCount) override;
				Error freeDeviceMemory(const void *Handle) override;

				Error registerHostMemory(void *Memory, size_t ByteCount) override;
				Error unregisterHostMemory(const void *Memory) override;

				Error synchronousCopyD2H(const void *DeviceSrcHandle, size_t SrcByteOffset,
				void *HostDst, size_t DstByteOffset,
				size_t ByteCount) override;

				Error synchronousCopyH2D(const void *HostSrc, size_t SrcByteOffset,
				const void *DeviceDstHandle, size_t DstByteOffset,
				size_t ByteCount) override;

				Error synchronousCopyD2D(const void *DeviceDstHandle, size_t DstByteOffset,
				const void *DeviceSrcHandle, size_t SrcByteOffset,
				size_t ByteCount) override;

				private:
				CUDAPlatformDevice(size_t DeviceIndex) : DeviceIndex(DeviceIndex) {}

				int DeviceIndex;
				};

				} // namespace cuda
				} // namespace streamexecutor

				#endif // STREAMEXECUTOR_PLATFORMS_CUDA_CUDAPLATFORMDEVICE_H

parallel-libs/trunk/streamexecutor/include/streamexecutor/platforms/host/HostPlatformDevice.h

	[23 lines not shown]
	namespace host {			namespace host {

	/// A concrete PlatformDevice subclass that performs its work on the host rather			/// A concrete PlatformDevice subclass that performs its work on the host rather
	/// than offloading to an accelerator.			/// than offloading to an accelerator.
	class HostPlatformDevice : public PlatformDevice {			class HostPlatformDevice : public PlatformDevice {
	public:			public:
	std::string getName() const override { return "host"; }			std::string getName() const override { return "host"; }

				std::string getPlatformName() const override { return "host"; }

	Expected<const void *>			Expected<const void *>
	createKernel(const MultiKernelLoaderSpec &Spec) override {			createKernel(const MultiKernelLoaderSpec &Spec) override {
	if (!Spec.hasHostFunction()) {			if (!Spec.hasHostFunction()) {
	return make_error("no host implementation available for kernel " +			return make_error("no host implementation available for kernel " +
	Spec.getKernelName());			Spec.getKernelName());
	}			}
	return static_cast<const void *>(&Spec.getHostFunction());			return static_cast<const void *>(&Spec.getHostFunction());
	}			}
	[120 lines not shown]

parallel-libs/trunk/streamexecutor/lib/CMakeLists.txt

	macro(add_se_library name)			macro(add_se_library name)
	add_llvm_library(${name} ${ARGN})			add_llvm_library(${name} ${ARGN})
	set_target_properties(${name} PROPERTIES FOLDER "streamexecutor libraries")			set_target_properties(${name} PROPERTIES FOLDER "streamexecutor libraries")
	endmacro(add_se_library)			endmacro(add_se_library)

				if(STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM)
				set(
				CMAKE_MODULE_PATH
				${CMAKE_MODULE_PATH}
				"${CMAKE_CURRENT_SOURCE_DIR}/platforms/cuda/cmake/modules/")

				find_package(Libcuda REQUIRED)
				include_directories(${LIBCUDA_INCLUDE_DIRS})

				set(
				STREAM_EXECUTOR_CUDA_PLATFORM_TARGET_OBJECT
				$<TARGET_OBJECTS:streamexecutor_cuda_platform>)

				set(
				STREAM_EXECUTOR_LIBCUDA_LIBRARIES
				${LIBCUDA_LIBRARIES})
				endif(STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM)

				add_subdirectory(platforms)

	add_se_library(			add_se_library(
	streamexecutor			streamexecutor
	Device.cpp			Device.cpp
	DeviceMemory.cpp			DeviceMemory.cpp
	Error.cpp			Error.cpp
	HostMemory.cpp			HostMemory.cpp
	Kernel.cpp			Kernel.cpp
	KernelSpec.cpp			KernelSpec.cpp
	PackedKernelArgumentArray.cpp			PackedKernelArgumentArray.cpp
	Platform.cpp			Platform.cpp
	PlatformDevice.cpp			PlatformDevice.cpp
	PlatformManager.cpp			PlatformManager.cpp
	Stream.cpp			Stream.cpp
	)			${STREAM_EXECUTOR_CUDA_PLATFORM_TARGET_OBJECT}
				LINK_LIBS
				${STREAM_EXECUTOR_LIBCUDA_LIBRARIES})

	install(TARGETS streamexecutor DESTINATION lib)			install(TARGETS streamexecutor DESTINATION lib)

parallel-libs/trunk/streamexecutor/lib/PlatformManager.cpp

	//===-- PlatformManager.cpp - PlatformManager implementation --------------===//			//===-- PlatformManager.cpp - PlatformManager implementation --------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	///			///
	/// \file			/// \file
	/// Implementation of PlatformManager class internals.			/// Implementation of PlatformManager class internals.
	///			///
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "streamexecutor/PlatformManager.h"			#include "streamexecutor/PlatformManager.h"

				#include "streamexecutor/PlatformOptions.h"
	#include "streamexecutor/platforms/host/HostPlatform.h"			#include "streamexecutor/platforms/host/HostPlatform.h"

				#ifdef STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM
				#include "streamexecutor/platforms/cuda/CUDAPlatform.h"
				#endif

	namespace streamexecutor {			namespace streamexecutor {

	PlatformManager::PlatformManager() {			PlatformManager::PlatformManager() {
	// TODO(jhen): Register known platforms by name.			// TODO(jhen): Register known platforms by name.
	// We have a couple of options here:			// We have a couple of options here:
	// * Use build-system flags to set preprocessor macros that select the			// * Use build-system flags to set preprocessor macros that select the
	// appropriate code to include here.			// appropriate code to include here.
	// * Use static initialization tricks to have platform libraries register			// * Use static initialization tricks to have platform libraries register
	// themselves when they are loaded.			// themselves when they are loaded.

	PlatformsByName.emplace("host", llvm::make_unique<host::HostPlatform>());			PlatformsByName.emplace("host", llvm::make_unique<host::HostPlatform>());

				#ifdef STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM
				PlatformsByName.emplace("cuda", llvm::make_unique<cuda::CUDAPlatform>());
				#endif
	}			}

	Expected<Platform *> PlatformManager::getPlatformByName(llvm::StringRef Name) {			Expected<Platform *> PlatformManager::getPlatformByName(llvm::StringRef Name) {
	static PlatformManager Instance;			static PlatformManager Instance;
	auto Iterator = Instance.PlatformsByName.find(Name.lower());			auto Iterator = Instance.PlatformsByName.find(Name.lower());
	if (Iterator != Instance.PlatformsByName.end())			if (Iterator != Instance.PlatformsByName.end())
	return Iterator->second.get();			return Iterator->second.get();
	return make_error("no available platform with name " + Name);			return make_error("no available platform with name " + Name);
	}			}

	} // namespace streamexecutor			} // namespace streamexecutor

parallel-libs/trunk/streamexecutor/lib/platforms/CMakeLists.txt

				if(STREAM_EXECUTOR_ENABLE_CUDA_PLATFORM)
				add_subdirectory(cuda)
				endif()

parallel-libs/trunk/streamexecutor/lib/platforms/cuda/CMakeLists.txt

				add_library(
				streamexecutor_cuda_platform
				OBJECT
				CUDAPlatform.cpp
				CUDAPlatformDevice.cpp)

parallel-libs/trunk/streamexecutor/lib/platforms/cuda/CUDAPlatform.cpp

				//===-- CUDAPlatform.cpp - CUDA platform implementation -------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Implementation of CUDA platform internals.
				///
				//===----------------------------------------------------------------------===//

				#include "streamexecutor/platforms/cuda/CUDAPlatform.h"
				#include "streamexecutor/Device.h"
				#include "streamexecutor/Platform.h"
				#include "streamexecutor/platforms/cuda/CUDAPlatformDevice.h"

				#include "llvm/Support/Mutex.h"

				#include "cuda.h"

				#include <map>

				namespace streamexecutor {
				namespace cuda {

				static CUresult ensureCUDAInitialized() {
				static CUresult InitResult = []() { return cuInit(0); }();
				return InitResult;
				}

				size_t CUDAPlatform::getDeviceCount() const {
				if (ensureCUDAInitialized())
				// TODO(jhen): Log an error.
				return 0;

				int DeviceCount = 0;
				CUresult Result = cuDeviceGetCount(&DeviceCount);
				(void)Result;
				// TODO(jhen): Log an error.

				return DeviceCount;
				}

				Expected<Device> CUDAPlatform::getDevice(size_t DeviceIndex) {
				if (CUresult InitResult = ensureCUDAInitialized())
				return CUresultToError(InitResult, "cached cuInit return value");

				llvm::sys::ScopedLock Lock(Mutex);
				auto Iterator = PlatformDevices.find(DeviceIndex);
				if (Iterator == PlatformDevices.end()) {
				if (auto MaybePDevice = CUDAPlatformDevice::create(DeviceIndex)) {
				Iterator =
				PlatformDevices.emplace(DeviceIndex, std::move(*MaybePDevice)).first;
				} else {
				return MaybePDevice.takeError();
				}
				}
				return Device(&Iterator->second);
				}

				} // namespace cuda
				} // namespace streamexecutor
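
ensureCUDAInitialized relies on a function-local static whose initializer runs once and whose result is reused by every later call. A minimal sketch of that idiom without the CUDA driver follows. sketchDriverInit and SketchInitCalls are hypothetical names introduced only to make the single initialization observable; they stand in for cuInit(0).

```cpp
#include <cassert>

// Hypothetical counter recording how many times the initializer runs.
static int SketchInitCalls = 0;

// Hypothetical stand-in for cuInit(0); returns 0 to signal success.
static int sketchDriverInit() {
  ++SketchInitCalls;
  return 0;
}

// The function-local static is initialized on the first call only
// (thread-safe since C++11); later calls return the cached result.
static int sketchEnsureInitialized() {
  static int InitResult = []() { return sketchDriverInit(); }();
  return InitResult;
}
```

One consequence of caching the result is that a failed initialization is also cached, which may be why the error message in getDevice says "cached cuInit return value".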

parallel-libs/trunk/streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp

				//===-- CUDAPlatformDevice.cpp - CUDAPlatformDevice implementation --------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Implementation of CUDAPlatformDevice.
				///
				//===----------------------------------------------------------------------===//

				#include "streamexecutor/platforms/cuda/CUDAPlatformDevice.h"
				#include "streamexecutor/PlatformDevice.h"

				#include "cuda.h"

				namespace streamexecutor {
				namespace cuda {

				static void *offset(const void *Base, size_t Offset) {
				return const_cast<char *>(static_cast<const char *>(Base) + Offset);
				}

				Error CUresultToError(int CUResult, const llvm::Twine &Message) {
				CUresult Result = static_cast<CUresult>(CUResult);
				if (Result) {
				const char *ErrorName;
				if (cuGetErrorName(Result, &ErrorName))
				ErrorName = "UNKNOWN ERROR NAME";
				const char *ErrorString;
				if (cuGetErrorString(Result, &ErrorString))
				ErrorString = "UNKNOWN ERROR DESCRIPTION";
				return make_error("CUDA driver error: '" + Message + "', error code = " +
				llvm::Twine(static_cast<int>(Result)) + ", name = " +
				ErrorName + ", description = '" + ErrorString + "'");
				} else
				return Error::success();
				}

				std::string CUDAPlatformDevice::getName() const {
				static std::string CachedName = [](int DeviceIndex) {
				static constexpr size_t MAX_DRIVER_NAME_BYTES = 1024;
				std::string Name = "CUDA device " + std::to_string(DeviceIndex);
				char NameFromDriver[MAX_DRIVER_NAME_BYTES];
				if (!cuDeviceGetName(NameFromDriver, MAX_DRIVER_NAME_BYTES - 1,
				DeviceIndex)) {
				NameFromDriver[MAX_DRIVER_NAME_BYTES - 1] = '\0';
				Name.append(": ").append(NameFromDriver);
				}
				return Name;
				}(DeviceIndex);
				return CachedName;
				}

				Expected<CUDAPlatformDevice> CUDAPlatformDevice::create(size_t DeviceIndex) {
				CUdevice DeviceHandle;
				if (CUresult Result = cuDeviceGet(&DeviceHandle, DeviceIndex))
				return CUresultToError(Result, "cuDeviceGet");

				CUcontext ContextHandle;
				if (CUresult Result = cuDevicePrimaryCtxRetain(&ContextHandle, DeviceHandle))
				return CUresultToError(Result, "cuDevicePrimaryCtxRetain");

				if (CUresult Result = cuCtxSetCurrent(ContextHandle))
				return CUresultToError(Result, "cuCtxSetCurrent");

				return CUDAPlatformDevice(DeviceIndex);
				}

				CUDAPlatformDevice::CUDAPlatformDevice(CUDAPlatformDevice &&Other) noexcept
				: DeviceIndex(Other.DeviceIndex) {
				Other.DeviceIndex = -1;
				}

				CUDAPlatformDevice &CUDAPlatformDevice::
				operator=(CUDAPlatformDevice &&Other) noexcept {
				DeviceIndex = Other.DeviceIndex;
				Other.DeviceIndex = -1;
				return *this;
				}

				CUDAPlatformDevice::~CUDAPlatformDevice() {
				CUresult Result = cuDevicePrimaryCtxRelease(DeviceIndex);
				(void)Result;
				// TODO(jhen): Log error.
				}

				Expected<const void *>
				CUDAPlatformDevice::createKernel(const MultiKernelLoaderSpec &Spec) {
				// TODO(jhen): Maybe first check loaded modules?
				if (!Spec.hasCUDAPTXInMemory())
				return make_error("no CUDA code available to create kernel");

				CUdevice Device = static_cast<int>(DeviceIndex);
				int ComputeCapabilityMajor = 0;
				int ComputeCapabilityMinor = 0;
				if (CUresult Result = cuDeviceGetAttribute(
				&ComputeCapabilityMajor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR,
				Device))
				return CUresultToError(
				Result,
				"cuDeviceGetAttribute CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR");
				if (CUresult Result = cuDeviceGetAttribute(
				&ComputeCapabilityMinor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR,
				Device))
				return CUresultToError(
				Result,
				"cuDeviceGetAttribute CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR");
				const char *Code = Spec.getCUDAPTXInMemory().getCode(ComputeCapabilityMajor,
				ComputeCapabilityMinor);

				if (!Code)
				return make_error("no suitable CUDA source found for compute capability " +
				llvm::Twine(ComputeCapabilityMajor) + "." +
				llvm::Twine(ComputeCapabilityMinor));

				CUmodule Module;
				if (CUresult Result = cuModuleLoadData(&Module, Code))
				return CUresultToError(Result, "cuModuleLoadData");

				CUfunction Function;
				if (CUresult Result =
				cuModuleGetFunction(&Function, Module, Spec.getKernelName().c_str()))
				return CUresultToError(Result, "cuModuleGetFunction");

				// TODO(jhen): Should I save this function pointer in case someone asks for
				// it again?

				// TODO(jhen): Should I save the module pointer so I can unload it when I
				// destroy this device?

				return static_cast<const void *>(Function);
				}

				Error CUDAPlatformDevice::destroyKernel(const void *Handle) {
				// TODO(jhen): Maybe keep track of kernels for each module and unload the
				// module after they are all destroyed.
				return Error::success();
				}

				Expected<const void *> CUDAPlatformDevice::createStream() {
				CUstream Stream;
				if (CUresult Result = cuStreamCreate(&Stream, CU_STREAM_DEFAULT))
				return CUresultToError(Result, "cuStreamCreate");
				return Stream;
				}

				Error CUDAPlatformDevice::destroyStream(const void *Handle) {
				return CUresultToError(
				cuStreamDestroy(static_cast<CUstream>(const_cast<void *>(Handle))),
				"cuStreamDestroy");
				}

				Error CUDAPlatformDevice::launch(
				const void *PlatformStreamHandle, BlockDimensions BlockSize,
				GridDimensions GridSize, const void *PKernelHandle,
				const PackedKernelArgumentArrayBase &ArgumentArray) {
				CUfunction Function =
				reinterpret_cast<CUfunction>(const_cast<void *>(PKernelHandle));
				CUstream Stream =
				reinterpret_cast<CUstream>(const_cast<void *>(PlatformStreamHandle));
				// TODO(jhen): Deal with shared memory arguments.
				unsigned SharedMemoryBytes = 0;
				void **ArgumentAddresses = const_cast<void **>(ArgumentArray.getAddresses());
				return CUresultToError(cuLaunchKernel(Function, GridSize.X, GridSize.Y,
				GridSize.Z, BlockSize.X, BlockSize.Y,
				BlockSize.Z, SharedMemoryBytes, Stream,
				ArgumentAddresses, nullptr),
				"cuLaunchKernel");
				}

				Error CUDAPlatformDevice::copyD2H(const void *PlatformStreamHandle,
				const void *DeviceSrcHandle,
				size_t SrcByteOffset, void *HostDst,
				size_t DstByteOffset, size_t ByteCount) {
				return CUresultToError(
				cuMemcpyDtoHAsync(
				offset(HostDst, DstByteOffset),
				reinterpret_cast<CUdeviceptr>(offset(DeviceSrcHandle, SrcByteOffset)),
				ByteCount,
				static_cast<CUstream>(const_cast<void *>(PlatformStreamHandle))),
				"cuMemcpyDtoHAsync");
				}

				Error CUDAPlatformDevice::copyH2D(const void *PlatformStreamHandle,
				const void *HostSrc, size_t SrcByteOffset,
				const void *DeviceDstHandle,
				size_t DstByteOffset, size_t ByteCount) {
				return CUresultToError(
				cuMemcpyHtoDAsync(
				reinterpret_cast<CUdeviceptr>(offset(DeviceDstHandle, DstByteOffset)),
				offset(HostSrc, SrcByteOffset), ByteCount,
				static_cast<CUstream>(const_cast<void *>(PlatformStreamHandle))),
				"cuMemcpyHtoDAsync");
				}

				Error CUDAPlatformDevice::copyD2D(const void *PlatformStreamHandle,
				const void *DeviceSrcHandle,
				size_t SrcByteOffset,
				const void *DeviceDstHandle,
				size_t DstByteOffset, size_t ByteCount) {
				return CUresultToError(
				cuMemcpyDtoDAsync(
				reinterpret_cast<CUdeviceptr>(offset(DeviceDstHandle, DstByteOffset)),
				reinterpret_cast<CUdeviceptr>(offset(DeviceSrcHandle, SrcByteOffset)),
				ByteCount,
				static_cast<CUstream>(const_cast<void *>(PlatformStreamHandle))),
				"cuMemcpyDtoDAsync");
				}

				Error CUDAPlatformDevice::blockHostUntilDone(const void *PlatformStreamHandle) {
				return CUresultToError(cuStreamSynchronize(static_cast<CUstream>(
				const_cast<void *>(PlatformStreamHandle))),
				"cuStreamSynchronize");
				}

				Expected<void *> CUDAPlatformDevice::allocateDeviceMemory(size_t ByteCount) {
				CUdeviceptr Pointer;
				if (CUresult Result = cuMemAlloc(&Pointer, ByteCount))
				return CUresultToError(Result, "cuMemAlloc");
				return reinterpret_cast<void *>(Pointer);
				}

				Error CUDAPlatformDevice::freeDeviceMemory(const void *Handle) {
				return CUresultToError(cuMemFree(reinterpret_cast<CUdeviceptr>(Handle)),
				"cuMemFree");
				}

				Error CUDAPlatformDevice::registerHostMemory(void *Memory, size_t ByteCount) {
				return CUresultToError(cuMemHostRegister(Memory, ByteCount, 0u),
				"cuMemHostRegister");
				}

				Error CUDAPlatformDevice::unregisterHostMemory(const void *Memory) {
				return CUresultToError(cuMemHostUnregister(const_cast<void *>(Memory)),
				"cuMemHostUnregister");
				}

				Error CUDAPlatformDevice::synchronousCopyD2H(const void *DeviceSrcHandle,
				size_t SrcByteOffset,
				void *HostDst,
				size_t DstByteOffset,
				size_t ByteCount) {
				return CUresultToError(cuMemcpyDtoH(offset(HostDst, DstByteOffset),
				reinterpret_cast<CUdeviceptr>(offset(
				DeviceSrcHandle, SrcByteOffset)),
				ByteCount),
				"cuMemcpyDtoH");
				}

				Error CUDAPlatformDevice::synchronousCopyH2D(const void *HostSrc,
				size_t SrcByteOffset,
				const void *DeviceDstHandle,
				size_t DstByteOffset,
				size_t ByteCount) {
				return CUresultToError(
				cuMemcpyHtoD(
				reinterpret_cast<CUdeviceptr>(offset(DeviceDstHandle, DstByteOffset)),
				offset(HostSrc, SrcByteOffset), ByteCount),
				"cuMemcpyHtoD");
				}

				Error CUDAPlatformDevice::synchronousCopyD2D(const void *DeviceDstHandle,
				size_t DstByteOffset,
				const void *DeviceSrcHandle,
				size_t SrcByteOffset,
				size_t ByteCount) {
				return CUresultToError(
				cuMemcpyDtoD(
				reinterpret_cast<CUdeviceptr>(offset(DeviceDstHandle, DstByteOffset)),
				reinterpret_cast<CUdeviceptr>(offset(DeviceSrcHandle, SrcByteOffset)),
				ByteCount),
				"cuMemcpyDtoD");
				}

				} // namespace cuda
				} // namespace streamexecutor
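
The copy functions in CUDAPlatformDevice.cpp apply byte offsets to opaque handles through the offset helper. A minimal host-only sketch of that pointer arithmetic is below. sketchOffset is a hypothetical name, and the sketch assumes the helper is meant to advance a void pointer by a byte count.

```cpp
#include <cassert>
#include <cstddef>

// Advances an untyped pointer by Offset bytes. The pointer is viewed as
// const char * for the arithmetic because arithmetic on void * is not
// allowed in standard C++; const is then cast away for the return value.
static void *sketchOffset(const void *Base, std::size_t Offset) {
  return const_cast<char *>(static_cast<const char *>(Base) + Offset);
}
```

Because the result is a plain void *, callers still need a further cast (as the diff does with reinterpret_cast<CUdeviceptr>) before passing it to a driver call that expects a device pointer.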

parallel-libs/trunk/streamexecutor/lib/platforms/cuda/cmake/modules/FindLibcuda.cmake

				# - Try to find the libcuda library
				# Once done this will define
				# LIBCUDA_FOUND - System has libcuda
				# LIBCUDA_INCLUDE_DIRS - The libcuda include directories
				# LIBCUDA_LIBRARIES - The libraries needed to use libcuda

				# TODO(jhen): Allow users to specify a search path.
				find_path(LIBCUDA_INCLUDE_DIR cuda.h /usr/local/cuda/include)
				# TODO(jhen): Use the library that goes with the headers.
				find_library(LIBCUDA_LIBRARY cuda)

				include(FindPackageHandleStandardArgs)
				# handle the QUIETLY and REQUIRED arguments and set LIBCUDA_FOUND to TRUE if
				# all listed variables are TRUE
				find_package_handle_standard_args(
				LIBCUDA DEFAULT_MSG LIBCUDA_INCLUDE_DIR LIBCUDA_LIBRARY)

				mark_as_advanced(LIBCUDA_INCLUDE_DIR LIBCUDA_LIBRARY)

				set(LIBCUDA_LIBRARIES ${LIBCUDA_LIBRARY})
				set(LIBCUDA_INCLUDE_DIRS ${LIBCUDA_INCLUDE_DIR})
