This is an archive of the discontinued LLVM Phabricator instance.

[Acxxel] Remove setActiveDeviceForThread
Closed · Public

Authored by jhen on Oct 27 2016, 4:51 PM.


Details

Reviewers

jprice
jlebar

Commits

rGbdc410babaee: [Acxxel] Remove setActiveDeviceForThread
rL285372: [Acxxel] Remove setActiveDeviceForThread

Summary

After experimenting with CUDA, I realized that we really only need to
set the active context right before creating an object such as a stream
or a device memory allocation. When we use these objects later, it does
not matter whether the context that created them is still active;
operations on those objects succeed anyway.

Since it turns out that we don't have to check the active context for
every operation, it makes sense to hide this active context from users
(by removing the setActiveDeviceForThread setter and the
getActiveDeviceForThread getter) and to change the Acxxel API so that
functions that create objects take the device index explicitly.

This change improves the Acxxel API and greatly simplifies the CUDA and
OpenCL implementations because they no longer require thread_local data.
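The approach in the summary can be sketched without a GPU. The code below is a minimal illustration under stated assumptions, not Acxxel code. FakePlatform, FakeContext, FakeStream and Result are hypothetical names, and an integer "current device" stands in for the CUDA driver's current context. Its setContext follows the shape of CUDAPlatform::setContext in the diff. It validates the index and makes that device's context current only when an object is created. The created object records its own device index, so later operations do not need any per-thread state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-device driver context (not a CUDA type).
struct FakeContext {
  int DeviceIndex;
};

// Hypothetical stand-in for a created object such as a stream.
// It remembers the device it was created on.
struct FakeStream {
  int DeviceIndex;
};

// Minimal "value or error" result; Acxxel itself uses Expected<T>.
template <typename T> struct Result {
  bool Ok;
  T Value;
  std::string Error;
};

class FakePlatform {
public:
  explicit FakePlatform(int DeviceCount) {
    assert(DeviceCount >= 0);
    for (int I = 0; I < DeviceCount; ++I)
      Contexts.push_back(FakeContext{I});
  }

  // The device index is an explicit argument; there is no per-thread
  // "active device" setter or getter.
  Result<FakeStream> createStream(int DeviceIndex = 0) {
    std::string Error;
    if (!setContext(DeviceIndex, Error))
      return {false, FakeStream{-1}, Error};
    // A real backend would create the stream in the now-current context here.
    return {true, FakeStream{DeviceIndex}, ""};
  }

  // Device whose context was made current most recently, or -1 if none yet.
  int currentDevice() const { return CurrentDevice; }

private:
  // Validate the index, then make that device's context current.
  bool setContext(int DeviceIndex, std::string &Error) {
    if (DeviceIndex < 0 ||
        static_cast<std::size_t>(DeviceIndex) >= Contexts.size()) {
      Error = "invalid device index " + std::to_string(DeviceIndex);
      return false;
    }
    CurrentDevice = Contexts[static_cast<std::size_t>(DeviceIndex)].DeviceIndex;
    return true;
  }

  std::vector<FakeContext> Contexts;
  int CurrentDevice = -1;
};
```

Every creation call names its device, so the sketch needs no thread_local variable. This is the simplification the summary describes for the CUDA and OpenCL backends.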

Diff Detail

Build Status

Buildable 853
Build 853: arc lint + arc unit

Event Timeline

jhen updated this revision to Diff 76133. Oct 27 2016, 4:51 PM

jhen retitled this revision from to [Acxxel] Remove setActiveDeviceForThread.

jhen updated this object.

jhen added reviewers: jlebar, jprice.

jhen added a subscriber: parallel_libs-commits.

Herald added a subscriber: mgorny. Oct 27 2016, 4:51 PM

Well this is great news.

acxxel/acxxel.h, line 502:
Is there a reason we don't default DeviceIndex here as we do elsewhere, or is that just an oversight?

This revision is now accepted and ready to land. Oct 27 2016, 5:14 PM

Default DeviceIndex for getSymbolMemory

jhen marked an inline comment as done.Oct 27 2016, 5:25 PM

jhen added inline comments.

acxxel/acxxel.h, line 502:
Oops, I accidentally missed that one. Thanks for catching it!

Closed by commit rL285372: [Acxxel] Remove setActiveDeviceForThread (authored by jhen). Oct 27 2016, 6:03 PM

This revision was automatically updated to reflect the committed changes.

jhen marked an inline comment as done.
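The inline comment on acxxel.h line 502 is about default arguments. One C++ rule matters for this API. A default argument on a virtual function comes from the declaration visible through the static type at the call site, not from the overriding function. The sketch below uses hypothetical Base and Derived classes, not Acxxel types. They are shaped like Platform::createStream(int DeviceIndex = 0) and the CUDAPlatform override in the diff, which repeats no default.

```cpp
#include <cassert>

// Shaped like Platform::createStream(int DeviceIndex = 0) = 0.
class Base {
public:
  virtual ~Base() = default;
  virtual int createStream(int DeviceIndex = 0) = 0;
};

// Shaped like CUDAPlatform::createStream(int DeviceIndex) override:
// the override declares no default argument.
class Derived : public Base {
public:
  int createStream(int DeviceIndex) override {
    assert(DeviceIndex >= 0); // reject negative indices in debug builds
    return DeviceIndex;       // report which device was requested
  }
};

// A caller holding a Base reference may omit the index. The default of 0
// comes from Base's declaration, and the call still dispatches to Derived.
// Calling createStream() with no argument on a Derived object directly would
// not compile, because Derived's declaration has no default.
int createOnDefaultDevice(Base &B) { return B.createStream(); }
```

In the diff, the defaults appear on the Platform declarations and on template wrappers such as mallocD and getSymbolMemory. The CUDAPlatform overrides omit them, so code that calls createStream on a CUDAPlatform directly would have to pass the device index.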

Revision Contents

Path	Size

acxxel/ (individual file names not shown)
	1 line
	83 lines
	131 lines
	131 lines

tests/
	CMakeLists.txt	10 lines
	acxxel_test.cpp	38 lines
	multi_device_test.cpp	87 lines

Diff 76144

acxxel/CMakeLists.txt

	cmake_minimum_required(VERSION 3.1)			cmake_minimum_required(VERSION 3.1)

	option(ACXXEL_ENABLE_UNIT_TESTS "enable acxxel unit tests" ON)			option(ACXXEL_ENABLE_UNIT_TESTS "enable acxxel unit tests" ON)
				option(ACXXEL_ENABLE_MULTI_DEVICE_UNIT_TESTS "enable acxxel multi-device unit tests" OFF)
	option(ACXXEL_ENABLE_EXAMPLES "enable acxxel examples" OFF)			option(ACXXEL_ENABLE_EXAMPLES "enable acxxel examples" OFF)
	option(ACXXEL_ENABLE_DOXYGEN "enable Doxygen for acxxel" OFF)			option(ACXXEL_ENABLE_DOXYGEN "enable Doxygen for acxxel" OFF)
	option(ACXXEL_ENABLE_CUDA "enable CUDA for acxxel" ON)			option(ACXXEL_ENABLE_CUDA "enable CUDA for acxxel" ON)
	option(ACXXEL_ENABLE_OPENCL "enable OpenCL for acxxel" ON)			option(ACXXEL_ENABLE_OPENCL "enable OpenCL for acxxel" ON)

	project(acxxel)			project(acxxel)

	if(ACXXEL_ENABLE_CUDA)			if(ACXXEL_ENABLE_CUDA)
[64 unchanged lines not shown]

acxxel/acxxel.h

[223 unchanged lines not shown]	private:
std::unique_ptr<void, HandleDestructor> TheHandle;		std::unique_ptr<void, HandleDestructor> TheHandle;
};		};

/// A stream of computation.		/// A stream of computation.
///		///
/// All operations enqueued on a Stream are serialized, but operations enqueued		/// All operations enqueued on a Stream are serialized, but operations enqueued
/// on different Streams may run concurrently.		/// on different Streams may run concurrently.
///		///
/// Each Platform has a notion of the currently active device on a particular		/// Each Stream is associated with a specific, fixed device.
/// thread (see Platform::getActiveDeviceForThread and
/// Platform::setActiveDeviceForThread). Each Stream is associated with a
/// specific, fixed device, set to the current thread's active device when the
/// Stream is created. Whenever a thread enqueues commands onto a Stream, its
/// active device must match the Stream's device.
class Stream {		class Stream {
public:		public:
Stream(const Stream &) = delete;		Stream(const Stream &) = delete;
Stream &operator=(const Stream &) = delete;		Stream &operator=(const Stream &) = delete;
Stream(Stream &&) noexcept;		Stream(Stream &&) noexcept;
Stream &operator=(Stream &&) noexcept;		Stream &operator=(Stream &&) noexcept;
~Stream() = default;		~Stream() = default;

[196 unchanged lines not shown]	public:

/// Gets the time elapsed between the previous event's execution and this		/// Gets the time elapsed between the previous event's execution and this
/// event's execution.		/// event's execution.
Expected<float> getSecondsSince(const Event &Previous);		Expected<float> getSecondsSince(const Event &Previous);

private:		private:
// Only a platform can make an event.		// Only a platform can make an event.
friend class Platform;		friend class Platform;
Event(Platform *APlatform, void *AHandle, HandleDestructor Destructor)		Event(Platform *APlatform, int DeviceIndex, void *AHandle,
: ThePlatform(APlatform), TheHandle(AHandle, Destructor) {}		HandleDestructor Destructor)
		: ThePlatform(APlatform), TheDeviceIndex(DeviceIndex),
		TheHandle(AHandle, Destructor) {}

Platform *ThePlatform;		Platform *ThePlatform;

		// The index of the device on which the event can be enqueued.
		int TheDeviceIndex;

std::unique_ptr<void, HandleDestructor> TheHandle;		std::unique_ptr<void, HandleDestructor> TheHandle;
};		};

/// An accelerator platform.		/// An accelerator platform.
///		///
/// This is the base class for all platforms such as CUDA and OpenCL. It		/// This is the base class for all platforms such as CUDA and OpenCL. It
/// contains many virtual methods that must be overridden by each platform		/// contains many virtual methods that must be overridden by each platform
/// implementation.		/// implementation.
///		///
/// It also has some template wrapper functions that take care of type checking		/// It also has some template wrapper functions that take care of type checking
/// and then forward their arguments on to raw virtual functions that are		/// and then forward their arguments on to raw virtual functions that are
/// implemented by each specific platform.		/// implemented by each specific platform.
class Platform {		class Platform {
public:		public:
virtual ~Platform(){};		virtual ~Platform(){};

/// Gets the number of devices for this platform in this system.		/// Gets the number of devices for this platform in this system.
virtual Expected<int> getDeviceCount() = 0;		virtual Expected<int> getDeviceCount() = 0;

/// Sets the active device for this platform in this thread.		/// Creates a stream on the given device for the platform.
virtual Status setActiveDeviceForThread(int DeviceIndex) = 0;		virtual Expected<Stream> createStream(int DeviceIndex = 0) = 0;

/// Gets the currently active device for this platform in this thread.
virtual int getActiveDeviceForThread() = 0;

/// Creates a stream for the platform.
///
/// The created Stream is associated with the active device for this thread.
virtual Expected<Stream> createStream() = 0;

/// Creates an event for the platform.		/// Creates an event on the given device for the platform.
///		virtual Expected<Event> createEvent(int DeviceIndex = 0) = 0;
/// The created Event is associated with the active device for this thread.
virtual Expected<Event> createEvent() = 0;

/// Allocates owned device memory.		/// Allocates owned device memory.
///		///
/// \warning This function only allocates space in device memory, it does not		/// \warning This function only allocates space in device memory, it does not
/// call the constructor of T.		/// call the constructor of T.
template <typename T>		template <typename T>
Expected<DeviceMemory<T>> mallocD(ptrdiff_t ElementCount) {		Expected<DeviceMemory<T>> mallocD(ptrdiff_t ElementCount,
Expected<void *> MaybePointer = rawMallocD(ElementCount * sizeof(T));		int DeviceIndex = 0) {
		Expected<void *> MaybePointer =
		rawMallocD(ElementCount * sizeof(T), DeviceIndex);
if (MaybePointer.isError())		if (MaybePointer.isError())
return MaybePointer.getError();		return MaybePointer.getError();
return DeviceMemory<T>(this, MaybePointer.getValue(), ElementCount,		return DeviceMemory<T>(this, MaybePointer.getValue(), ElementCount,
this->getDeviceMemoryHandleDestructor());		this->getDeviceMemoryHandleDestructor());
}		}

/// Creates a DeviceMemorySpan for a device symbol.		/// Creates a DeviceMemorySpan for a device symbol.
///		///
/// This function is present to support __device__ variables in CUDA. Given a		/// This function is present to support __device__ variables in CUDA. Given a
/// pointer to a __device__ variable, this function returns a DeviceMemorySpan		/// pointer to a __device__ variable, this function returns a DeviceMemorySpan
/// referencing the device memory that stores that __device__ variable.		/// referencing the device memory that stores that __device__ variable.
template <typename ElementType>		template <typename ElementType>
Expected<DeviceMemorySpan<ElementType>> getSymbolMemory(ElementType *Symbol) {		Expected<DeviceMemorySpan<ElementType>> getSymbolMemory(ElementType *Symbol,
Expected<void *> MaybeAddress = rawGetDeviceSymbolAddress(Symbol);		int DeviceIndex = 0) {
		Expected<void *> MaybeAddress =
		rawGetDeviceSymbolAddress(Symbol, DeviceIndex);
if (MaybeAddress.isError())		if (MaybeAddress.isError())
return MaybeAddress.getError();		return MaybeAddress.getError();
ElementType *Address = static_cast<ElementType *>(MaybeAddress.getValue());		ElementType *Address = static_cast<ElementType *>(MaybeAddress.getValue());
Expected<ptrdiff_t> MaybeSize = rawGetDeviceSymbolSize(Symbol);		Expected<ptrdiff_t> MaybeSize = rawGetDeviceSymbolSize(Symbol, DeviceIndex);
if (MaybeSize.isError())		if (MaybeSize.isError())
return MaybeSize.getError();		return MaybeSize.getError();
ptrdiff_t Size = MaybeSize.getValue();		ptrdiff_t Size = MaybeSize.getValue();
return DeviceMemorySpan<ElementType>(this, Address,		return DeviceMemorySpan<ElementType>(this, Address,
Size / sizeof(ElementType), 0);		Size / sizeof(ElementType), 0);
}		}

/// \name Host memory registration functions.		/// \name Host memory registration functions.
[57 unchanged lines not shown]	Expected<OwnedAsyncHostMemory<T>> newAsyncHostMem(ptrdiff_t ElementCount) {
for (ptrdiff_t I = 0; I < ElementCount; ++I)		for (ptrdiff_t I = 0; I < ElementCount; ++I)
new (Memory + I) T;		new (Memory + I) T;
return OwnedAsyncHostMemory<T>(Memory, ElementCount,		return OwnedAsyncHostMemory<T>(Memory, ElementCount,
this->getFreeHostMemoryHandleDestructor());		this->getFreeHostMemoryHandleDestructor());
}		}

/// \}		/// \}

virtual Expected<Program>		virtual Expected<Program> createProgramFromSource(Span<const char> Source,
createProgramFromSource(Span<const char> Source) = 0;		int DeviceIndex = 0) = 0;

protected:		protected:
friend class Stream;		friend class Stream;
friend class Event;		friend class Event;
friend class Program;		friend class Program;
template <typename T> friend class DeviceMemorySpan;		template <typename T> friend class DeviceMemorySpan;

void *getStreamHandle(Stream &Stream) { return Stream.TheHandle.get(); }		void *getStreamHandle(Stream &Stream) { return Stream.TheHandle.get(); }
void *getEventHandle(Event &Event) { return Event.TheHandle.get(); }		void *getEventHandle(Event &Event) { return Event.TheHandle.get(); }

// Pass along access to Stream constructor to subclasses.		// Pass along access to Stream constructor to subclasses.
Stream constructStream(Platform *APlatform, void *AHandle,		Stream constructStream(Platform *APlatform, int DeviceIndex, void *AHandle,
HandleDestructor Destructor) {		HandleDestructor Destructor) {
return Stream(APlatform, getActiveDeviceForThread(), AHandle, Destructor);		return Stream(APlatform, DeviceIndex, AHandle, Destructor);
}		}

// Pass along access to Event constructor to subclasses.		// Pass along access to Event constructor to subclasses.
Event constructEvent(Platform *APlatform, void *AHandle,		Event constructEvent(Platform *APlatform, int DeviceIndex, void *AHandle,
HandleDestructor Destructor) {		HandleDestructor Destructor) {
return Event(APlatform, AHandle, Destructor);		return Event(APlatform, DeviceIndex, AHandle, Destructor);
}		}

// Pass along access to Program constructor to subclasses.		// Pass along access to Program constructor to subclasses.
Program constructProgram(Platform *APlatform, void *AHandle,		Program constructProgram(Platform *APlatform, void *AHandle,
HandleDestructor Destructor) {		HandleDestructor Destructor) {
return Program(APlatform, AHandle, Destructor);		return Program(APlatform, AHandle, Destructor);
}		}

virtual Status streamSync(void *Stream) = 0;		virtual Status streamSync(void *Stream) = 0;
virtual Status streamWaitOnEvent(void *Stream, void *Event) = 0;		virtual Status streamWaitOnEvent(void *Stream, void *Event) = 0;

virtual Status enqueueEvent(void *Event, void *Stream) = 0;		virtual Status enqueueEvent(void *Event, void *Stream) = 0;
virtual bool eventIsDone(void *Event) = 0;		virtual bool eventIsDone(void *Event) = 0;
virtual Status eventSync(void *Event) = 0;		virtual Status eventSync(void *Event) = 0;
virtual Expected<float> getSecondsBetweenEvents(void *StartEvent,		virtual Expected<float> getSecondsBetweenEvents(void *StartEvent,
void *EndEvent) = 0;		void *EndEvent) = 0;

virtual Expected<void *> rawMallocD(ptrdiff_t ByteCount) = 0;		virtual Expected<void *> rawMallocD(ptrdiff_t ByteCount, int DeviceIndex) = 0;
virtual HandleDestructor getDeviceMemoryHandleDestructor() = 0;		virtual HandleDestructor getDeviceMemoryHandleDestructor() = 0;
virtual void *getDeviceMemorySpanHandle(void *BaseHandle, size_t ByteSize,		virtual void *getDeviceMemorySpanHandle(void *BaseHandle, size_t ByteSize,
size_t ByteOffset) = 0;		size_t ByteOffset) = 0;
virtual void rawDestroyDeviceMemorySpanHandle(void *Handle) = 0;		virtual void rawDestroyDeviceMemorySpanHandle(void *Handle) = 0;

virtual Expected<void *> rawGetDeviceSymbolAddress(const void *Symbol) = 0;		virtual Expected<void *> rawGetDeviceSymbolAddress(const void *Symbol,
virtual Expected<ptrdiff_t> rawGetDeviceSymbolSize(const void *Symbol) = 0;		int DeviceIndex) = 0;
		virtual Expected<ptrdiff_t> rawGetDeviceSymbolSize(const void *Symbol,
virtual Status rawCopyDToD(const void *DeviceSrc,		int DeviceIndex) = 0;
ptrdiff_t DeviceSrcByteOffset, void *DeviceDst,
ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount) = 0;
virtual Status rawCopyDToH(const void *DeviceSrc,
ptrdiff_t DeviceSrcByteOffset, void *HostDst,
ptrdiff_t ByteCount) = 0;
virtual Status rawCopyHToD(const void *HostSrc, void *DeviceDst,
ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount) = 0;

virtual Status rawMemsetD(void *DeviceDst, ptrdiff_t ByteOffset,
ptrdiff_t ByteCount, char ByteValue) = 0;

virtual Status rawRegisterHostMem(const void *Memory,		virtual Status rawRegisterHostMem(const void *Memory,
ptrdiff_t ByteCount) = 0;		ptrdiff_t ByteCount) = 0;
virtual HandleDestructor getUnregisterHostMemoryHandleDestructor() = 0;		virtual HandleDestructor getUnregisterHostMemoryHandleDestructor() = 0;

virtual Expected<void *> rawMallocRegisteredH(ptrdiff_t ByteCount) = 0;		virtual Expected<void *> rawMallocRegisteredH(ptrdiff_t ByteCount) = 0;
virtual HandleDestructor getFreeHostMemoryHandleDestructor() = 0;		virtual HandleDestructor getFreeHostMemoryHandleDestructor() = 0;

[745 unchanged lines not shown]
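The Platform doc comment in acxxel.h describes template wrappers that do the type checking and then forward to raw virtual functions that each platform implements. After this change, wrappers such as mallocD also forward a DeviceIndex that defaults to 0. The following is a minimal sketch of that split, not Acxxel's code. PlatformBase and HostBackend are hypothetical names, and host malloc stands in for a device allocation. A raw pointer, null on failure or for zero bytes, replaces Acxxel's Expected<DeviceMemory<T>>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Typed front end, shaped like Platform::mallocD in the diff. It computes the
// byte count from T and forwards the byte count and the device index to an
// untyped virtual hook.
class PlatformBase {
public:
  virtual ~PlatformBase() = default;

  template <typename T>
  T *mallocD(std::ptrdiff_t ElementCount, int DeviceIndex = 0) {
    // As in the original, this reserves storage only; T's constructor is not run.
    void *Raw = rawMallocD(
        ElementCount * static_cast<std::ptrdiff_t>(sizeof(T)), DeviceIndex);
    return static_cast<T *>(Raw);
  }

  virtual void freeD(void *Ptr) = 0;

protected:
  // Implemented by each backend; no default argument at this level.
  virtual void *rawMallocD(std::ptrdiff_t ByteCount, int DeviceIndex) = 0;
};

// Hypothetical backend: host malloc stands in for a device allocation, and the
// last request is recorded so the forwarding can be observed.
class HostBackend : public PlatformBase {
public:
  std::ptrdiff_t LastByteCount = -1;
  int LastDeviceIndex = -1;

  void freeD(void *Ptr) override { std::free(Ptr); }

protected:
  void *rawMallocD(std::ptrdiff_t ByteCount, int DeviceIndex) override {
    assert(ByteCount >= 0);
    LastByteCount = ByteCount;
    LastDeviceIndex = DeviceIndex;
    // Zero-byte requests return null, as CUDAPlatform::rawMallocD does.
    if (ByteCount == 0)
      return nullptr;
    return std::malloc(static_cast<std::size_t>(ByteCount));
  }
};
```

In this split, type-dependent arithmetic and user-facing defaults stay in one non-virtual place. Each backend implements only an untyped hook that always receives an explicit device index.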

acxxel/cuda_acxxel.cpp

[19 unchanged lines not shown]
#include <cassert>		#include <cassert>
#include <sstream>		#include <sstream>
#include <vector>		#include <vector>

namespace acxxel {		namespace acxxel {

namespace {		namespace {

/// Index of active device for this thread.
thread_local int ActiveDeviceIndex = 0;

static std::string getCUErrorMessage(CUresult Result) {		static std::string getCUErrorMessage(CUresult Result) {
if (!Result)		if (!Result)
return "success";		return "success";
const char *ErrorName = "UNKNOWN_ERROR_NAME";		const char *ErrorName = "UNKNOWN_ERROR_NAME";
const char *ErrorDescription = "UNKNOWN_ERROR_DESCRIPTION";		const char *ErrorDescription = "UNKNOWN_ERROR_DESCRIPTION";
cuGetErrorName(Result, &ErrorName);		cuGetErrorName(Result, &ErrorName);
cuGetErrorString(Result, &ErrorDescription);		cuGetErrorString(Result, &ErrorDescription);
std::ostringstream OutStream;		std::ostringstream OutStream;
[41 unchanged lines not shown]
class CUDAPlatform : public Platform {		class CUDAPlatform : public Platform {
public:		public:
~CUDAPlatform() override = default;		~CUDAPlatform() override = default;

static Expected<CUDAPlatform> create();		static Expected<CUDAPlatform> create();

Expected<int> getDeviceCount() override;		Expected<int> getDeviceCount() override;

Status setActiveDeviceForThread(int DeviceIndex) override;		Expected<Stream> createStream(int DeviceIndex) override;

int getActiveDeviceForThread() override;

Expected<Stream> createStream() override;

Status streamSync(void *Stream) override;		Status streamSync(void *Stream) override;

Status streamWaitOnEvent(void *Stream, void *Event) override;		Status streamWaitOnEvent(void *Stream, void *Event) override;

Expected<Event> createEvent() override;		Expected<Event> createEvent(int DeviceIndex) override;

protected:		protected:
Expected<void *> rawMallocD(ptrdiff_t ByteCount) override;		Expected<void *> rawMallocD(ptrdiff_t ByteCount, int DeviceIndex) override;
HandleDestructor getDeviceMemoryHandleDestructor() override;		HandleDestructor getDeviceMemoryHandleDestructor() override;
void *getDeviceMemorySpanHandle(void *BaseHandle, size_t ByteSize,		void *getDeviceMemorySpanHandle(void *BaseHandle, size_t ByteSize,
size_t ByteOffset) override;		size_t ByteOffset) override;
virtual void rawDestroyDeviceMemorySpanHandle(void *Handle) override;		virtual void rawDestroyDeviceMemorySpanHandle(void *Handle) override;

Expected<void *> rawGetDeviceSymbolAddress(const void *Symbol) override;		Expected<void *> rawGetDeviceSymbolAddress(const void *Symbol,
Expected<ptrdiff_t> rawGetDeviceSymbolSize(const void *Symbol) override;		int DeviceIndex) override;
		Expected<ptrdiff_t> rawGetDeviceSymbolSize(const void *Symbol,
Status rawCopyDToD(const void *DeviceSrc, ptrdiff_t DeviceSrcByteOffset,		int DeviceIndex) override;
void *DeviceDst, ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount) override;
Status rawCopyDToH(const void *DeviceSrc, ptrdiff_t DeviceSrcByteOffset,
void *HostDst, ptrdiff_t ByteCount) override;
Status rawCopyHToD(const void *HostSrc, void *DeviceDst,
ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount) override;

Status rawMemsetD(void *DeviceDst, ptrdiff_t ByteOffset, ptrdiff_t ByteCount,
char ByteValue) override;

Status rawRegisterHostMem(const void *Memory, ptrdiff_t ByteCount) override;		Status rawRegisterHostMem(const void *Memory, ptrdiff_t ByteCount) override;
HandleDestructor getUnregisterHostMemoryHandleDestructor() override;		HandleDestructor getUnregisterHostMemoryHandleDestructor() override;

Expected<void *> rawMallocRegisteredH(ptrdiff_t ByteCount) override;		Expected<void *> rawMallocRegisteredH(ptrdiff_t ByteCount) override;
HandleDestructor getFreeHostMemoryHandleDestructor() override;		HandleDestructor getFreeHostMemoryHandleDestructor() override;

Status asyncCopyDToD(const void *DeviceSrc, ptrdiff_t DeviceSrcByteOffset,		Status asyncCopyDToD(const void *DeviceSrc, ptrdiff_t DeviceSrcByteOffset,
void *DeviceDst, ptrdiff_t DeviceDstByteOffset,		void *DeviceDst, ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount, void *Stream) override;		ptrdiff_t ByteCount, void *Stream) override;
Status asyncCopyDToH(const void *DeviceSrc, ptrdiff_t DeviceSrcByteOffset,		Status asyncCopyDToH(const void *DeviceSrc, ptrdiff_t DeviceSrcByteOffset,
void *HostDst, ptrdiff_t ByteCount,		void *HostDst, ptrdiff_t ByteCount,
void *Stream) override;		void *Stream) override;
Status asyncCopyHToD(const void *HostSrc, void *DeviceDst,		Status asyncCopyHToD(const void *HostSrc, void *DeviceDst,
ptrdiff_t DeviceDstByteOffset, ptrdiff_t ByteCount,		ptrdiff_t DeviceDstByteOffset, ptrdiff_t ByteCount,
void *Stream) override;		void *Stream) override;

Status asyncMemsetD(void *DeviceDst, ptrdiff_t ByteOffset,		Status asyncMemsetD(void *DeviceDst, ptrdiff_t ByteOffset,
ptrdiff_t ByteCount, char ByteValue,		ptrdiff_t ByteCount, char ByteValue,
void *Stream) override;		void *Stream) override;

Status addStreamCallback(Stream &Stream, StreamCallback Callback) override;		Status addStreamCallback(Stream &Stream, StreamCallback Callback) override;

Expected<Program> createProgramFromSource(Span<const char> Source) override;		Expected<Program> createProgramFromSource(Span<const char> Source,
		int DeviceIndex) override;

Status enqueueEvent(void *Event, void *Stream) override;		Status enqueueEvent(void *Event, void *Stream) override;
bool eventIsDone(void *Event) override;		bool eventIsDone(void *Event) override;
Status eventSync(void *Event) override;		Status eventSync(void *Event) override;
Expected<float> getSecondsBetweenEvents(void *StartEvent,		Expected<float> getSecondsBetweenEvents(void *StartEvent,
void *EndEvent) override;		void *EndEvent) override;

Expected<void *> rawCreateKernel(void *Program,		Expected<void *> rawCreateKernel(void *Program,
const std::string &Name) override;		const std::string &Name) override;
HandleDestructor getKernelHandleDestructor() override;		HandleDestructor getKernelHandleDestructor() override;

Status rawEnqueueKernelLaunch(void *Stream, void *Kernel,		Status rawEnqueueKernelLaunch(void *Stream, void *Kernel,
KernelLaunchDimensions LaunchDimensions,		KernelLaunchDimensions LaunchDimensions,
Span<void *> Arguments,		Span<void *> Arguments,
Span<size_t> ArgumentSizes,		Span<size_t> ArgumentSizes,
size_t SharedMemoryBytes) override;		size_t SharedMemoryBytes) override;

private:		private:
explicit CUDAPlatform(const std::vector<CUcontext> &Contexts)		explicit CUDAPlatform(const std::vector<CUcontext> &Contexts)
: TheContexts(Contexts) {}		: TheContexts(Contexts) {}

		Status setContext(int DeviceIndex) {
		if (DeviceIndex < 0 ||
		static_cast<size_t>(DeviceIndex) >= TheContexts.size())
		return Status("invalid device index " + std::to_string(DeviceIndex));
		return getCUError(cuCtxSetCurrent(TheContexts[DeviceIndex]),
		"cuCtxSetCurrent");
		}

// Vector of contexts for each device.		// Vector of contexts for each device.
std::vector<CUcontext> TheContexts;		std::vector<CUcontext> TheContexts;
};		};

Expected<CUDAPlatform> CUDAPlatform::create() {		Expected<CUDAPlatform> CUDAPlatform::create() {
std::vector<CUcontext> Contexts;		std::vector<CUcontext> Contexts;
if (CUresult Result = cuInit(0))		if (CUresult Result = cuInit(0))
return getCUError(Result, "cuInit");		return getCUError(Result, "cuInit");
[12 unchanged lines not shown]	for (int I = 0; I < DeviceCount; ++I) {
if (CUresult Result = cuCtxSetCurrent(Context))		if (CUresult Result = cuCtxSetCurrent(Context))
return getCUError(Result, "cuCtxSetCurrent");		return getCUError(Result, "cuCtxSetCurrent");
Contexts.emplace_back(Context);		Contexts.emplace_back(Context);
}		}

return CUDAPlatform(Contexts);		return CUDAPlatform(Contexts);
}		}

Status CUDAPlatform::setActiveDeviceForThread(int DeviceIndex) {
if (static_cast<size_t>(DeviceIndex) >= TheContexts.size())
return Status("invalid device index for SetActiveDevice: " +
std::to_string(DeviceIndex));
ActiveDeviceIndex = DeviceIndex;
return getCUError(cuCtxSetCurrent(TheContexts[DeviceIndex]),
"setActiveDeviceForThread cuCtxSetCurrent");
}

int CUDAPlatform::getActiveDeviceForThread() { return ActiveDeviceIndex; }

Expected<int> CUDAPlatform::getDeviceCount() {		Expected<int> CUDAPlatform::getDeviceCount() {
int Count = 0;		int Count = 0;
if (CUresult Result = cuDeviceGetCount(&Count))		if (CUresult Result = cuDeviceGetCount(&Count))
return getCUError(Result, "cuDeviceGetCount");		return getCUError(Result, "cuDeviceGetCount");
return Count;		return Count;
}		}

static void cudaDestroyStream(void *H) {		static void cudaDestroyStream(void *H) {
logCUWarning(cuStreamDestroy(static_cast<CUstream_st *>(H)),		logCUWarning(cuStreamDestroy(static_cast<CUstream_st *>(H)),
"cuStreamDestroy");		"cuStreamDestroy");
}		}

Expected<Stream> CUDAPlatform::createStream() {		Expected<Stream> CUDAPlatform::createStream(int DeviceIndex) {
		Status S = setContext(DeviceIndex);
		if (S.isError())
		return S;
unsigned int Flags = CU_STREAM_DEFAULT;		unsigned int Flags = CU_STREAM_DEFAULT;
CUstream Handle;		CUstream Handle;
if (CUresult Result = cuStreamCreate(&Handle, Flags))		if (CUresult Result = cuStreamCreate(&Handle, Flags))
return getCUError(Result, "cuStreamCreate");		return getCUError(Result, "cuStreamCreate");
return constructStream(this, Handle, cudaDestroyStream);		return constructStream(this, DeviceIndex, Handle, cudaDestroyStream);
}		}

Status CUDAPlatform::streamSync(void *Stream) {		Status CUDAPlatform::streamSync(void *Stream) {
return getCUError(cuStreamSynchronize(static_cast<CUstream_st *>(Stream)),		return getCUError(cuStreamSynchronize(static_cast<CUstream_st *>(Stream)),
"cuStreamSynchronize");		"cuStreamSynchronize");
}		}

Status CUDAPlatform::streamWaitOnEvent(void *Stream, void *Event) {		Status CUDAPlatform::streamWaitOnEvent(void *Stream, void *Event) {
// The CUDA docs say flags must be 0.		// The CUDA docs say flags must be 0.
unsigned int Flags = 0u;		unsigned int Flags = 0u;
return getCUError(cuStreamWaitEvent(static_cast<CUstream_st *>(Stream),		return getCUError(cuStreamWaitEvent(static_cast<CUstream_st *>(Stream),
static_cast<CUevent_st *>(Event), Flags),		static_cast<CUevent_st *>(Event), Flags),
"cuStreamWaitEvent");		"cuStreamWaitEvent");
}		}

static void cudaDestroyEvent(void *H) {		static void cudaDestroyEvent(void *H) {
logCUWarning(cuEventDestroy(static_cast<CUevent_st *>(H)), "cuEventDestroy");		logCUWarning(cuEventDestroy(static_cast<CUevent_st *>(H)), "cuEventDestroy");
}		}

Expected<Event> CUDAPlatform::createEvent() {		Expected<Event> CUDAPlatform::createEvent(int DeviceIndex) {
		Status S = setContext(DeviceIndex);
		if (S.isError())
		return S;
unsigned int Flags = CU_EVENT_DEFAULT;		unsigned int Flags = CU_EVENT_DEFAULT;
CUevent Handle;		CUevent Handle;
if (CUresult Result = cuEventCreate(&Handle, Flags))		if (CUresult Result = cuEventCreate(&Handle, Flags))
return getCUError(Result, "cuEventCreate");		return getCUError(Result, "cuEventCreate");
return constructEvent(this, Handle, cudaDestroyEvent);		return constructEvent(this, DeviceIndex, Handle, cudaDestroyEvent);
}		}

Status CUDAPlatform::enqueueEvent(void *Event, void *Stream) {		Status CUDAPlatform::enqueueEvent(void *Event, void *Stream) {
return getCUError(cuEventRecord(static_cast<CUevent_st *>(Event),		return getCUError(cuEventRecord(static_cast<CUevent_st *>(Event),
static_cast<CUstream_st *>(Stream)),		static_cast<CUstream_st *>(Stream)),
"cuEventRecord");		"cuEventRecord");
}		}

[11 unchanged lines not shown]	Expected<float> CUDAPlatform::getSecondsBetweenEvents(void *StartEvent,
float Milliseconds;		float Milliseconds;
if (CUresult Result = cuEventElapsedTime(		if (CUresult Result = cuEventElapsedTime(
&Milliseconds, static_cast<CUevent_st *>(StartEvent),		&Milliseconds, static_cast<CUevent_st *>(StartEvent),
static_cast<CUevent_st *>(EndEvent)))		static_cast<CUevent_st *>(EndEvent)))
return getCUError(Result, "cuEventElapsedTime");		return getCUError(Result, "cuEventElapsedTime");
return Milliseconds * 1e-3;		return Milliseconds * 1e-3;
}		}

Expected<void *> CUDAPlatform::rawMallocD(ptrdiff_t ByteCount) {		Expected<void *> CUDAPlatform::rawMallocD(ptrdiff_t ByteCount,
		int DeviceIndex) {
		Status S = setContext(DeviceIndex);
		if (S.isError())
		return S;
if (!ByteCount)		if (!ByteCount)
return nullptr;		return nullptr;
CUdeviceptr Pointer;		CUdeviceptr Pointer;
if (CUresult Result = cuMemAlloc(&Pointer, ByteCount))		if (CUresult Result = cuMemAlloc(&Pointer, ByteCount))
return getCUError(Result, "cuMemAlloc");		return getCUError(Result, "cuMemAlloc");
return reinterpret_cast<void *>(Pointer);		return reinterpret_cast<void *>(Pointer);
}		}

[9 unchanged lines not shown]	void *CUDAPlatform::getDeviceMemorySpanHandle(void *BaseHandle, size_t,
size_t ByteOffset) {		size_t ByteOffset) {
return static_cast<char *>(BaseHandle) + ByteOffset;		return static_cast<char *>(BaseHandle) + ByteOffset;
}		}

void CUDAPlatform::rawDestroyDeviceMemorySpanHandle(void *) {		void CUDAPlatform::rawDestroyDeviceMemorySpanHandle(void *) {
// Do nothing for this platform.		// Do nothing for this platform.
}		}

Expected<void *> CUDAPlatform::rawGetDeviceSymbolAddress(const void *Symbol) {		Expected<void *> CUDAPlatform::rawGetDeviceSymbolAddress(const void *Symbol,
		int DeviceIndex) {
		Status S = setContext(DeviceIndex);
		if (S.isError())
		return S;
void *Address;		void *Address;
if (cudaError_t Status = cudaGetSymbolAddress(&Address, Symbol))		if (cudaError_t Status = cudaGetSymbolAddress(&Address, Symbol))
return getCUDAError(Status, "cudaGetSymbolAddress");		return getCUDAError(Status, "cudaGetSymbolAddress");
return Address;		return Address;
}		}

Expected<ptrdiff_t> CUDAPlatform::rawGetDeviceSymbolSize(const void *Symbol) {		Expected<ptrdiff_t> CUDAPlatform::rawGetDeviceSymbolSize(const void *Symbol,
		int DeviceIndex) {
		Status S = setContext(DeviceIndex);
		if (S.isError())
		return S;
size_t Size;		size_t Size;
if (cudaError_t Status = cudaGetSymbolSize(&Size, Symbol))		if (cudaError_t Status = cudaGetSymbolSize(&Size, Symbol))
return getCUDAError(Status, "cudaGetSymbolSize");		return getCUDAError(Status, "cudaGetSymbolSize");
return Size;		return Size;
}		}

static const void *offsetVoidPtr(const void *Ptr, ptrdiff_t ByteOffset) {		static const void *offsetVoidPtr(const void *Ptr, ptrdiff_t ByteOffset) {
return static_cast<const void *>(static_cast<const char *>(Ptr) + ByteOffset);		return static_cast<const void *>(static_cast<const char *>(Ptr) + ByteOffset);
}		}

static void *offsetVoidPtr(void *Ptr, ptrdiff_t ByteOffset) {		static void *offsetVoidPtr(void *Ptr, ptrdiff_t ByteOffset) {
return static_cast<void *>(static_cast<char *>(Ptr) + ByteOffset);		return static_cast<void *>(static_cast<char *>(Ptr) + ByteOffset);
}		}

Status CUDAPlatform::rawCopyDToD(const void *DeviceSrc,
ptrdiff_t DeviceSrcByteOffset, void *DeviceDst,
ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount) {
return getCUError(cuMemcpyDtoD(reinterpret_cast<CUdeviceptr>(offsetVoidPtr(
DeviceDst, DeviceDstByteOffset)),
reinterpret_cast<CUdeviceptr>(offsetVoidPtr(
DeviceSrc, DeviceSrcByteOffset)),
ByteCount),
"cuMemcpyDtoD");
}

Status CUDAPlatform::rawCopyDToH(const void *DeviceSrc,
ptrdiff_t DeviceSrcByteOffset, void *HostDst,
ptrdiff_t ByteCount) {
return getCUError(
cuMemcpyDtoH(HostDst, reinterpret_cast<CUdeviceptr>(
offsetVoidPtr(DeviceSrc, DeviceSrcByteOffset)),
ByteCount),
"cuMemcpyDtoH");
}

Status CUDAPlatform::rawCopyHToD(const void *HostSrc, void *DeviceDst,
ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount) {
return getCUError(cuMemcpyHtoD(reinterpret_cast<CUdeviceptr>(offsetVoidPtr(
DeviceDst, DeviceDstByteOffset)),
HostSrc, ByteCount),
"cuMemcpyHtoD");
}

Status CUDAPlatform::rawMemsetD(void *DeviceDst, ptrdiff_t ByteOffset,
ptrdiff_t ByteCount, char ByteValue) {
return getCUError(cuMemsetD8(reinterpret_cast<CUdeviceptr>(
offsetVoidPtr(DeviceDst, ByteOffset)),
ByteValue, ByteCount),
"cuMemsetD8");
}

Status CUDAPlatform::rawRegisterHostMem(const void *Memory,		Status CUDAPlatform::rawRegisterHostMem(const void *Memory,
ptrdiff_t ByteCount) {		ptrdiff_t ByteCount) {
unsigned int Flags = 0;		unsigned int Flags = 0;
return getCUError(		return getCUError(
cuMemHostRegister(const_cast<void *>(Memory), ByteCount, Flags),		cuMemHostRegister(const_cast<void *>(Memory), ByteCount, Flags),
"cuMemHostRegister");		"cuMemHostRegister");
}		}

[... 93 unchanged lines not shown ...]	return getCUError(cuStreamAddCallback(Stream, cuStreamCallbackShim,
UserData.release(), Flags),		UserData.release(), Flags),
"cuStreamAddCallback");		"cuStreamAddCallback");
}		}

static void cudaDestroyProgram(void *H) {		static void cudaDestroyProgram(void *H) {
logCUWarning(cuModuleUnload(static_cast<CUmod_st *>(H)), "cuModuleUnload");		logCUWarning(cuModuleUnload(static_cast<CUmod_st *>(H)), "cuModuleUnload");
}		}

Expected<Program>		Expected<Program> CUDAPlatform::createProgramFromSource(Span<const char> Source,
CUDAPlatform::createProgramFromSource(Span<const char> Source) {		int DeviceIndex) {
		Status S = setContext(DeviceIndex);
		if (S.isError())
		return S;
CUmodule Module;		CUmodule Module;
constexpr int LogBufferSizeBytes = 1024;		constexpr int LogBufferSizeBytes = 1024;
char InfoLogBuffer[LogBufferSizeBytes];		char InfoLogBuffer[LogBufferSizeBytes];
char ErrorLogBuffer[LogBufferSizeBytes];		char ErrorLogBuffer[LogBufferSizeBytes];
constexpr size_t OptionsCount = 4;		constexpr size_t OptionsCount = 4;
std::array<CUjit_option, OptionsCount> OptionNames = {		std::array<CUjit_option, OptionsCount> OptionNames = {
{CU_JIT_INFO_LOG_BUFFER, CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES,		{CU_JIT_INFO_LOG_BUFFER, CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES,
CU_JIT_ERROR_LOG_BUFFER, CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES}};		CU_JIT_ERROR_LOG_BUFFER, CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES}};
[... 68 unchanged lines not shown ...]

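In the CUDA hunks above, the device becomes explicit only at creation time: `rawGetDeviceSymbolSize` and `createProgramFromSource` now call `setContext(DeviceIndex)` first and return early on error, and no thread-local "active device" is consulted afterwards. The following is a minimal, driver-free sketch of that flow. `FakeDriver`, `FakeStream`, and `createStream` here are hypothetical stand-ins, not the Acxxel or CUDA API; the one behavior carried over from the summary is that an object created under one context remains usable after another context becomes current.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a driver with a "current context".
// Not the CUDA or Acxxel API; it only models the creation-time pattern.
struct FakeDriver {
  int DeviceCount;
  int CurrentContext = -1; // -1 means no context has been made current yet.

  explicit FakeDriver(int Count) : DeviceCount(Count) {}

  // Plays the role of setContext(DeviceIndex): validate, then switch.
  // Returns an empty string on success, an error message otherwise.
  std::string setContext(int DeviceIndex) {
    if (DeviceIndex < 0 || DeviceIndex >= DeviceCount)
      return "invalid device index " + std::to_string(DeviceIndex);
    CurrentContext = DeviceIndex;
    return "";
  }
};

// A created object remembers the device it belongs to, so later operations
// do not depend on whichever context happens to be current.
struct FakeStream {
  int DeviceIndex;
  std::vector<int> Ops; // Device index recorded for each enqueued operation.

  void enqueue() { Ops.push_back(DeviceIndex); }
};

// Creation is the only step that touches the current context.
bool createStream(FakeDriver &Driver, int DeviceIndex, FakeStream &Out,
                  std::string &Error) {
  Error = Driver.setContext(DeviceIndex);
  if (!Error.empty())
    return false;
  Out = FakeStream{DeviceIndex, {}};
  return true;
}
```

With two devices, creating a stream on device 0 and then one on device 1 leaves device 1 current, yet the first stream still records device 0 for its work, which is the property the summary relies on when it drops the per-operation context check.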
acxxel/opencl_acxxel.cpp

[... 27 unchanged lines not shown ...]
struct FullDeviceID {		struct FullDeviceID {
cl_platform_id PlatformID;		cl_platform_id PlatformID;
cl_device_id DeviceID;		cl_device_id DeviceID;

FullDeviceID(cl_platform_id PlatformID, cl_device_id DeviceID)		FullDeviceID(cl_platform_id PlatformID, cl_device_id DeviceID)
: PlatformID(PlatformID), DeviceID(DeviceID) {}		: PlatformID(PlatformID), DeviceID(DeviceID) {}
};		};

thread_local int ActiveDeviceIndex = 0;

static std::string getOpenCLErrorMessage(cl_int Result) {		static std::string getOpenCLErrorMessage(cl_int Result) {
if (!Result)		if (!Result)
return "success";		return "success";
std::ostringstream OutStream;		std::ostringstream OutStream;
OutStream << "OpenCL error: code = " << Result;		OutStream << "OpenCL error: code = " << Result;
return OutStream.str();		return OutStream.str();
}		}

[... 16 unchanged lines not shown ...]
class OpenCLPlatform : public Platform {		class OpenCLPlatform : public Platform {
public:		public:
~OpenCLPlatform() override = default;		~OpenCLPlatform() override = default;

static Expected<OpenCLPlatform> create();		static Expected<OpenCLPlatform> create();

Expected<int> getDeviceCount() override;		Expected<int> getDeviceCount() override;

Status setActiveDeviceForThread(int DeviceIndex) override;		Expected<Stream> createStream(int DeviceIndex) override;

int getActiveDeviceForThread() override;

Expected<Stream> createStream() override;

Expected<Event> createEvent() override;		Expected<Event> createEvent(int DeviceIndex) override;

Expected<Program> createProgramFromSource(Span<const char> Source) override;		Expected<Program> createProgramFromSource(Span<const char> Source,
		int DeviceIndex) override;

protected:		protected:
Status streamSync(void *Stream) override;		Status streamSync(void *Stream) override;

Status streamWaitOnEvent(void *Stream, void *Event) override;		Status streamWaitOnEvent(void *Stream, void *Event) override;

Expected<void *> rawMallocD(ptrdiff_t ByteCount) override;		Expected<void *> rawMallocD(ptrdiff_t ByteCount, int DeviceIndex) override;
HandleDestructor getDeviceMemoryHandleDestructor() override;		HandleDestructor getDeviceMemoryHandleDestructor() override;
void *getDeviceMemorySpanHandle(void *BaseHandle, size_t ByteSize,		void *getDeviceMemorySpanHandle(void *BaseHandle, size_t ByteSize,
size_t ByteOffset) override;		size_t ByteOffset) override;
void rawDestroyDeviceMemorySpanHandle(void *Handle) override;		void rawDestroyDeviceMemorySpanHandle(void *Handle) override;

Expected<void *> rawGetDeviceSymbolAddress(const void *Symbol) override;		Expected<void *> rawGetDeviceSymbolAddress(const void *Symbol,
Expected<ptrdiff_t> rawGetDeviceSymbolSize(const void *Symbol) override;		int DeviceIndex) override;
		Expected<ptrdiff_t> rawGetDeviceSymbolSize(const void *Symbol,
Status rawCopyDToD(const void *DeviceSrc, ptrdiff_t DeviceSrcByteOffset,		int DeviceIndex) override;
void *DeviceDst, ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount) override;
Status rawCopyDToH(const void *DeviceSrc, ptrdiff_t DeviceSrcByteOffset,
void *HostDst, ptrdiff_t ByteCount) override;
Status rawCopyHToD(const void *HostSrc, void *DeviceDst,
ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount) override;

Status rawMemsetD(void *DeviceDst, ptrdiff_t ByteOffset, ptrdiff_t ByteCount,
char ByteValue) override;

Status rawRegisterHostMem(const void *Memory, ptrdiff_t ByteCount) override;		Status rawRegisterHostMem(const void *Memory, ptrdiff_t ByteCount) override;
HandleDestructor getUnregisterHostMemoryHandleDestructor() override;		HandleDestructor getUnregisterHostMemoryHandleDestructor() override;

Expected<void *> rawMallocRegisteredH(ptrdiff_t ByteCount) override;		Expected<void *> rawMallocRegisteredH(ptrdiff_t ByteCount) override;
HandleDestructor getFreeHostMemoryHandleDestructor() override;		HandleDestructor getFreeHostMemoryHandleDestructor() override;

Status asyncCopyDToD(const void *DeviceSrc, ptrdiff_t DeviceSrcByteOffset,		Status asyncCopyDToD(const void *DeviceSrc, ptrdiff_t DeviceSrcByteOffset,
[... 82 unchanged lines not shown ...]	Expected<OpenCLPlatform> OpenCLPlatform::create() {
}		}

return OpenCLPlatform(std::move(FullDeviceIDs), std::move(Contexts),		return OpenCLPlatform(std::move(FullDeviceIDs), std::move(Contexts),
std::move(CommandQueues));		std::move(CommandQueues));
}		}

Expected<int> OpenCLPlatform::getDeviceCount() { return FullDeviceIDs.size(); }		Expected<int> OpenCLPlatform::getDeviceCount() { return FullDeviceIDs.size(); }

Status OpenCLPlatform::setActiveDeviceForThread(int DeviceIndex) {
if (static_cast<size_t>(DeviceIndex) >= FullDeviceIDs.size())
return Status("Could not set active device index to " +
std::to_string(DeviceIndex) + " because there are only " +
std::to_string(FullDeviceIDs.size()) +
" devices in the system");
ActiveDeviceIndex = DeviceIndex;
return Status();
}

int OpenCLPlatform::getActiveDeviceForThread() { return ActiveDeviceIndex; }

static void openCLDestroyStream(void *H) {		static void openCLDestroyStream(void *H) {
logOpenCLWarning(clReleaseCommandQueue(static_cast<cl_command_queue>(H)),		logOpenCLWarning(clReleaseCommandQueue(static_cast<cl_command_queue>(H)),
"clReleaseCommandQueue");		"clReleaseCommandQueue");
}		}

Expected<Stream> OpenCLPlatform::createStream() {		Expected<Stream> OpenCLPlatform::createStream(int DeviceIndex) {
cl_int Result;		cl_int Result;
cl_command_queue Queue = clCreateCommandQueue(		cl_command_queue Queue = clCreateCommandQueue(
Contexts[ActiveDeviceIndex], FullDeviceIDs[ActiveDeviceIndex].DeviceID,		Contexts[DeviceIndex], FullDeviceIDs[DeviceIndex].DeviceID,
CL_QUEUE_PROFILING_ENABLE, &Result);		CL_QUEUE_PROFILING_ENABLE, &Result);
if (Result)		if (Result)
return getOpenCLError(Result, "clCreateCommandQueue");		return getOpenCLError(Result, "clCreateCommandQueue");
return constructStream(this, Queue, openCLDestroyStream);		return constructStream(this, DeviceIndex, Queue, openCLDestroyStream);
}		}

static void openCLEventDestroy(void *H) {		static void openCLEventDestroy(void *H) {
cl_event *CLEvent = static_cast<cl_event *>(H);		cl_event *CLEvent = static_cast<cl_event *>(H);
logOpenCLWarning(clReleaseEvent(*CLEvent), "clReleaseEvent");		logOpenCLWarning(clReleaseEvent(*CLEvent), "clReleaseEvent");
delete CLEvent;		delete CLEvent;
}		}

Status OpenCLPlatform::streamSync(void *Stream) {		Status OpenCLPlatform::streamSync(void *Stream) {
return getOpenCLError(clFinish(static_cast<cl_command_queue>(Stream)),		return getOpenCLError(clFinish(static_cast<cl_command_queue>(Stream)),
"clFinish");		"clFinish");
}		}

Status OpenCLPlatform::streamWaitOnEvent(void *Stream, void *Event) {		Status OpenCLPlatform::streamWaitOnEvent(void *Stream, void *Event) {
cl_event *CLEvent = static_cast<cl_event *>(Event);		cl_event *CLEvent = static_cast<cl_event *>(Event);
return getOpenCLError(		return getOpenCLError(
clEnqueueBarrierWithWaitList(static_cast<cl_command_queue>(Stream), 1,		clEnqueueBarrierWithWaitList(static_cast<cl_command_queue>(Stream), 1,
CLEvent, nullptr),		CLEvent, nullptr),
"clEnqueueBarrierWithWaitList");		"clEnqueueBarrierWithWaitList");
}		}

Expected<Event> OpenCLPlatform::createEvent() {		Expected<Event> OpenCLPlatform::createEvent(int DeviceIndex) {
cl_int Result;		cl_int Result;
cl_event Event = clCreateUserEvent(Contexts[ActiveDeviceIndex], &Result);		cl_event Event = clCreateUserEvent(Contexts[DeviceIndex], &Result);
if (Result)		if (Result)
return getOpenCLError(Result, "clCreateUserEvent");		return getOpenCLError(Result, "clCreateUserEvent");
if (cl_int Result = clSetUserEventStatus(Event, CL_COMPLETE))		if (cl_int Result = clSetUserEventStatus(Event, CL_COMPLETE))
return getOpenCLError(Result, "clSetUserEventStatus");		return getOpenCLError(Result, "clSetUserEventStatus");
return constructEvent(this, new cl_event(Event), openCLEventDestroy);		return constructEvent(this, DeviceIndex, new cl_event(Event),
		openCLEventDestroy);
}		}

static void openCLDestroyProgram(void *H) {		static void openCLDestroyProgram(void *H) {
logOpenCLWarning(clReleaseProgram(static_cast<cl_program>(H)),		logOpenCLWarning(clReleaseProgram(static_cast<cl_program>(H)),
"clReleaseProgram");		"clReleaseProgram");
}		}

Expected<Program>		Expected<Program>
OpenCLPlatform::createProgramFromSource(Span<const char> Source) {		OpenCLPlatform::createProgramFromSource(Span<const char> Source,
		int DeviceIndex) {
cl_int Error;		cl_int Error;
const char *CSource = Source.data();		const char *CSource = Source.data();
size_t SourceSize = Source.size();		size_t SourceSize = Source.size();
cl_program Program = clCreateProgramWithSource(Contexts[ActiveDeviceIndex], 1,		cl_program Program = clCreateProgramWithSource(Contexts[DeviceIndex], 1,
&CSource, &SourceSize, &Error);		&CSource, &SourceSize, &Error);
if (Error)		if (Error)
return getOpenCLError(Error, "clCreateProgramWithSource");		return getOpenCLError(Error, "clCreateProgramWithSource");
cl_device_id DeviceID = FullDeviceIDs[ActiveDeviceIndex].DeviceID;		cl_device_id DeviceID = FullDeviceIDs[DeviceIndex].DeviceID;
if (cl_int Error =		if (cl_int Error =
clBuildProgram(Program, 1, &DeviceID, nullptr, nullptr, nullptr))		clBuildProgram(Program, 1, &DeviceID, nullptr, nullptr, nullptr))
return getOpenCLError(Error, "clBuildProgram");		return getOpenCLError(Error, "clBuildProgram");
return constructProgram(this, Program, openCLDestroyProgram);		return constructProgram(this, Program, openCLDestroyProgram);
}		}

Expected<void *> OpenCLPlatform::rawMallocD(ptrdiff_t ByteCount) {		Expected<void *> OpenCLPlatform::rawMallocD(ptrdiff_t ByteCount,
		int DeviceIndex) {
cl_int Result;		cl_int Result;
cl_mem Memory = clCreateBuffer(Contexts[ActiveDeviceIndex], CL_MEM_READ_WRITE,		cl_mem Memory = clCreateBuffer(Contexts[DeviceIndex], CL_MEM_READ_WRITE,
ByteCount, nullptr, &Result);		ByteCount, nullptr, &Result);
if (Result)		if (Result)
return getOpenCLError(Result, "clCreateBuffer");		return getOpenCLError(Result, "clCreateBuffer");
return reinterpret_cast<void *>(Memory);		return reinterpret_cast<void *>(Memory);
}		}

static void openCLDestroyDeviceMemory(void *H) {		static void openCLDestroyDeviceMemory(void *H) {
logOpenCLWarning(clReleaseMemObject(static_cast<cl_mem>(H)),		logOpenCLWarning(clReleaseMemObject(static_cast<cl_mem>(H)),
[... 20 unchanged lines not shown ...]	void *OpenCLPlatform::getDeviceMemorySpanHandle(void *BaseHandle,
return SubBuffer;		return SubBuffer;
}		}

void OpenCLPlatform::rawDestroyDeviceMemorySpanHandle(void *Handle) {		void OpenCLPlatform::rawDestroyDeviceMemorySpanHandle(void *Handle) {
openCLDestroyDeviceMemory(Handle);		openCLDestroyDeviceMemory(Handle);
}		}

Expected<void *>		Expected<void *>
OpenCLPlatform::rawGetDeviceSymbolAddress(const void * /*Symbol*/) {		OpenCLPlatform::rawGetDeviceSymbolAddress(const void * /*Symbol*/,
		int /*DeviceIndex*/) {
// This doesn't seem to have any equivalent in OpenCL.		// This doesn't seem to have any equivalent in OpenCL.
return Status("not implemented");		return Status("not implemented");
}		}

Expected<ptrdiff_t>		Expected<ptrdiff_t>
OpenCLPlatform::rawGetDeviceSymbolSize(const void * /*Symbol*/) {		OpenCLPlatform::rawGetDeviceSymbolSize(const void * /*Symbol*/,
		int /*DeviceIndex*/) {
// This doesn't seem to have any equivalent in OpenCL.		// This doesn't seem to have any equivalent in OpenCL.
return Status("not implemented");		return Status("not implemented");
}		}

Status OpenCLPlatform::rawCopyDToD(const void *DeviceSrc,
ptrdiff_t DeviceSrcByteOffset,
void *DeviceDst,
ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount) {
cl_event DoneEvent;
if (cl_int Result = clEnqueueCopyBuffer(
CommandQueues[ActiveDeviceIndex],
static_cast<cl_mem>(const_cast<void *>(DeviceSrc)),
static_cast<cl_mem>(DeviceDst), DeviceSrcByteOffset,
DeviceDstByteOffset, ByteCount, 0, nullptr, &DoneEvent))
return getOpenCLError(Result, "clEnqueueCopyBuffer");
return getOpenCLError(clWaitForEvents(1, &DoneEvent), "clWaitForEvents");
}

Status OpenCLPlatform::rawCopyDToH(const void *DeviceSrc,
ptrdiff_t DeviceSrcByteOffset, void *HostDst,
ptrdiff_t ByteCount) {
cl_event DoneEvent;
if (cl_int Result = clEnqueueReadBuffer(
CommandQueues[ActiveDeviceIndex],
static_cast<cl_mem>(const_cast<void *>(DeviceSrc)), CL_TRUE,
DeviceSrcByteOffset, ByteCount, HostDst, 0, nullptr, &DoneEvent))
return getOpenCLError(Result, "clEnqueueReadBuffer");
return getOpenCLError(clWaitForEvents(1, &DoneEvent), "clWaitForEvents");
}

Status OpenCLPlatform::rawCopyHToD(const void *HostSrc, void *DeviceDst,
ptrdiff_t DeviceDstByteOffset,
ptrdiff_t ByteCount) {
cl_event DoneEvent;
if (cl_int Result = clEnqueueWriteBuffer(
CommandQueues[ActiveDeviceIndex], static_cast<cl_mem>(DeviceDst),
CL_TRUE, DeviceDstByteOffset, ByteCount, HostSrc, 0, nullptr,
&DoneEvent))
return getOpenCLError(Result, "clEnqueueWriteBuffer");
return getOpenCLError(clWaitForEvents(1, &DoneEvent), "clWaitForEvents");
}

Status OpenCLPlatform::rawMemsetD(void *DeviceDst, ptrdiff_t ByteOffset,
ptrdiff_t ByteCount, char ByteValue) {
cl_event DoneEvent;
if (cl_int Result = clEnqueueFillBuffer(
CommandQueues[ActiveDeviceIndex], static_cast<cl_mem>(DeviceDst),
&ByteValue, 1, ByteOffset, ByteCount, 0, nullptr, &DoneEvent))
return getOpenCLError(Result, "clEnqueueFillBuffer");
return getOpenCLError(clWaitForEvents(1, &DoneEvent), "clWaitForEvents");
}

static void noOpHandleDestructor(void *) {}		static void noOpHandleDestructor(void *) {}

Status OpenCLPlatform::rawRegisterHostMem(const void * /*Memory*/,		Status OpenCLPlatform::rawRegisterHostMem(const void * /*Memory*/,
ptrdiff_t /*ByteCount*/) {		ptrdiff_t /*ByteCount*/) {
// TODO(jhen): Do we want to do something to pin the memory here?		// TODO(jhen): Do we want to do something to pin the memory here?
return Status();		return Status();
}		}

[... 86 unchanged lines not shown ...]	if (cl_int Result = clSetUserEventStatus(Data->EndEvent, CL_COMPLETE))
logOpenCLWarning(Result, "clSetUserEventStatus");		logOpenCLWarning(Result, "clSetUserEventStatus");
if (cl_int Result = clReleaseEvent(Data->EndEvent))		if (cl_int Result = clReleaseEvent(Data->EndEvent))
logOpenCLWarning(Result, "clReleaseEvent");		logOpenCLWarning(Result, "clReleaseEvent");
}		}

Status OpenCLPlatform::addStreamCallback(Stream &TheStream,		Status OpenCLPlatform::addStreamCallback(Stream &TheStream,
StreamCallback Callback) {		StreamCallback Callback) {
cl_int Result;		cl_int Result;
cl_event StartEvent = clCreateUserEvent(Contexts[ActiveDeviceIndex], &Result);		cl_event StartEvent =
		clCreateUserEvent(Contexts[TheStream.getDeviceIndex()], &Result);
if (Result)		if (Result)
return getOpenCLError(Result, "clCreateUserEvent");		return getOpenCLError(Result, "clCreateUserEvent");
cl_event EndEvent = clCreateUserEvent(Contexts[ActiveDeviceIndex], &Result);		cl_event EndEvent =
		clCreateUserEvent(Contexts[TheStream.getDeviceIndex()], &Result);
if (Result)		if (Result)
return getOpenCLError(Result, "clCreateUserEvent");		return getOpenCLError(Result, "clCreateUserEvent");
cl_event StartBarrierEvent;		cl_event StartBarrierEvent;
if (cl_int Result = clEnqueueBarrierWithWaitList(		if (cl_int Result = clEnqueueBarrierWithWaitList(
static_cast<cl_command_queue>(getStreamHandle(TheStream)), 1,		static_cast<cl_command_queue>(getStreamHandle(TheStream)), 1,
&StartEvent, &StartBarrierEvent))		&StartEvent, &StartBarrierEvent))
return getOpenCLError(Result, "clEnqueueBarrierWithWaitList");		return getOpenCLError(Result, "clEnqueueBarrierWithWaitList");

[... 128 unchanged lines not shown ...]

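The OpenCL hunks replace the `thread_local int ActiveDeviceIndex` with an explicit `DeviceIndex` parameter (or, in `addStreamCallback`, with `TheStream.getDeviceIndex()`). The removed `setActiveDeviceForThread` was also where the index was range-checked; in the lines shown, the new `createStream`, `createEvent`, `createProgramFromSource`, and `rawMallocD` index `Contexts[DeviceIndex]` directly. If validation is wanted there, a check like the sketch below could run first. `checkDeviceIndex` and `lookupContext` are hypothetical helpers, not part of Acxxel, and `Contexts` is modeled as a plain `std::vector<int>`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Acxxel. It reproduces the range check that
// the removed OpenCLPlatform::setActiveDeviceForThread performed. Returns ""
// when the index is valid, otherwise an error message.
std::string checkDeviceIndex(int DeviceIndex, std::size_t DeviceCount) {
  // Casting to size_t turns a negative index into a very large value, so one
  // comparison rejects both negative and too-large indices.
  if (static_cast<std::size_t>(DeviceIndex) >= DeviceCount)
    return "invalid device index " + std::to_string(DeviceIndex) +
           ": there are only " + std::to_string(DeviceCount) +
           " devices in the system";
  return "";
}

// Guards a per-device table lookup with the check above.
// Contexts stands in for the platform's per-device context table.
bool lookupContext(const std::vector<int> &Contexts, int DeviceIndex, int &Out,
                   std::string &Error) {
  Error = checkDeviceIndex(DeviceIndex, Contexts.size());
  if (!Error.empty())
    return false;
  Out = Contexts[DeviceIndex];
  return true;
}
```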
acxxel/tests/CMakeLists.txt

[... 23 unchanged lines not shown ...]
	add_executable(opencl_test opencl_test.cpp)			add_executable(opencl_test opencl_test.cpp)
	target_link_libraries(			target_link_libraries(
	opencl_test			opencl_test
	acxxel			acxxel
	${GTEST_BOTH_LIBRARIES}			${GTEST_BOTH_LIBRARIES}
	${CMAKE_THREAD_LIBS_INIT})			${CMAKE_THREAD_LIBS_INIT})
	add_test(OpenCLTest opencl_test)			add_test(OpenCLTest opencl_test)
	endif()			endif()

				if(ACXXEL_ENABLE_MULTI_DEVICE_UNIT_TESTS)
				add_executable(multi_device_test multi_device_test.cpp)
				target_link_libraries(
				multi_device_test
				acxxel
				${GTEST_BOTH_LIBRARIES}
				${CMAKE_THREAD_LIBS_INIT})
				add_test(MultiDeviceTest multi_device_test)
				endif()

acxxel/tests/acxxel_test.cpp

[... 12 unchanged lines not shown ...]

#include <chrono>		#include <chrono>
#include <condition_variable>		#include <condition_variable>
#include <mutex>		#include <mutex>
#include <thread>		#include <thread>

namespace {		namespace {

template <typename T, size_t N> constexpr size_t size(T (&)[N]) { return N; }		template <typename T, size_t N> constexpr size_t arraySize(T (&)[N]) {
		return N;
		}

using PlatformGetter = acxxel::Expected<acxxel::Platform *> (*)();		using PlatformGetter = acxxel::Expected<acxxel::Platform *> (*)();
class AcxxelTest : public ::testing::TestWithParam<PlatformGetter> {};		class AcxxelTest : public ::testing::TestWithParam<PlatformGetter> {};

TEST_P(AcxxelTest, GetDeviceCount) {		TEST_P(AcxxelTest, GetDeviceCount) {
acxxel::Platform *Platform = GetParam()().takeValue();		acxxel::Platform *Platform = GetParam()().takeValue();
int DeviceCount = Platform->getDeviceCount().getValue();		int DeviceCount = Platform->getDeviceCount().getValue();
EXPECT_GE(DeviceCount, 0);		EXPECT_GE(DeviceCount, 0);
[... 132 unchanged lines not shown ...]	TEST_P(AcxxelTest, DeviceMemory) {
acxxel::DeviceMemorySpan<const int> ImmutableSpan = ConstMemoryRef.asSpan();		acxxel::DeviceMemorySpan<const int> ImmutableSpan = ConstMemoryRef.asSpan();
testFullDeviceMemorySpan(ImmutableSpan, 10, sizeof(int));		testFullDeviceMemorySpan(ImmutableSpan, 10, sizeof(int));
}		}

TEST_P(AcxxelTest, CopyHostAndDevice) {		TEST_P(AcxxelTest, CopyHostAndDevice) {
acxxel::Platform *Platform = GetParam()().takeValue();		acxxel::Platform *Platform = GetParam()().takeValue();
acxxel::Stream Stream = Platform->createStream().takeValue();		acxxel::Stream Stream = Platform->createStream().takeValue();
int A[] = {0, 1, 2};		int A[] = {0, 1, 2};
std::array<int, size(A)> B;		std::array<int, arraySize(A)> B;
acxxel::DeviceMemory<int> X = Platform->mallocD<int>(size(A)).takeValue();		acxxel::DeviceMemory<int> X =
		Platform->mallocD<int>(arraySize(A)).takeValue();
Stream.syncCopyHToD(A, X);		Stream.syncCopyHToD(A, X);
Stream.syncCopyDToH(X, B);		Stream.syncCopyDToH(X, B);
for (size_t I = 0; I < size(A); ++I)		for (size_t I = 0; I < arraySize(A); ++I)
EXPECT_EQ(A[I], B[I]);		EXPECT_EQ(A[I], B[I]);
EXPECT_FALSE(Stream.takeStatus().isError());		EXPECT_FALSE(Stream.takeStatus().isError());
}		}

TEST_P(AcxxelTest, CopyDToD) {		TEST_P(AcxxelTest, CopyDToD) {
acxxel::Platform *Platform = GetParam()().takeValue();		acxxel::Platform *Platform = GetParam()().takeValue();
acxxel::Stream Stream = Platform->createStream().takeValue();		acxxel::Stream Stream = Platform->createStream().takeValue();
int A[] = {0, 1, 2};		int A[] = {0, 1, 2};
std::array<int, size(A)> B;		std::array<int, arraySize(A)> B;
acxxel::DeviceMemory<int> X = Platform->mallocD<int>(size(A)).takeValue();		acxxel::DeviceMemory<int> X =
acxxel::DeviceMemory<int> Y = Platform->mallocD<int>(size(A)).takeValue();		Platform->mallocD<int>(arraySize(A)).takeValue();
		acxxel::DeviceMemory<int> Y =
		Platform->mallocD<int>(arraySize(A)).takeValue();
Stream.syncCopyHToD(A, X);		Stream.syncCopyHToD(A, X);
Stream.syncCopyDToD(X, Y);		Stream.syncCopyDToD(X, Y);
Stream.syncCopyDToH(Y, B);		Stream.syncCopyDToH(Y, B);
for (size_t I = 0; I < size(A); ++I)		for (size_t I = 0; I < arraySize(A); ++I)
EXPECT_EQ(A[I], B[I]);		EXPECT_EQ(A[I], B[I]);
EXPECT_FALSE(Stream.takeStatus().isError());		EXPECT_FALSE(Stream.takeStatus().isError());
}		}

TEST_P(AcxxelTest, AsyncCopyHostAndDevice) {		TEST_P(AcxxelTest, AsyncCopyHostAndDevice) {
acxxel::Platform *Platform = GetParam()().takeValue();		acxxel::Platform *Platform = GetParam()().takeValue();
int A[] = {0, 1, 2};		int A[] = {0, 1, 2};
std::array<int, size(A)> B;		std::array<int, arraySize(A)> B;
acxxel::DeviceMemory<int> X = Platform->mallocD<int>(size(A)).takeValue();		acxxel::DeviceMemory<int> X =
		Platform->mallocD<int>(arraySize(A)).takeValue();
acxxel::Stream Stream = Platform->createStream().takeValue();		acxxel::Stream Stream = Platform->createStream().takeValue();
acxxel::AsyncHostMemory<int> AsyncA =		acxxel::AsyncHostMemory<int> AsyncA =
Platform->registerHostMem(A).takeValue();		Platform->registerHostMem(A).takeValue();
acxxel::AsyncHostMemory<int> AsyncB =		acxxel::AsyncHostMemory<int> AsyncB =
Platform->registerHostMem(B).takeValue();		Platform->registerHostMem(B).takeValue();
EXPECT_FALSE(Stream.asyncCopyHToD(AsyncA, X).takeStatus().isError());		EXPECT_FALSE(Stream.asyncCopyHToD(AsyncA, X).takeStatus().isError());
EXPECT_FALSE(Stream.asyncCopyDToH(X, AsyncB).takeStatus().isError());		EXPECT_FALSE(Stream.asyncCopyDToH(X, AsyncB).takeStatus().isError());
EXPECT_FALSE(Stream.sync().isError());		EXPECT_FALSE(Stream.sync().isError());
for (size_t I = 0; I < size(A); ++I)		for (size_t I = 0; I < arraySize(A); ++I)
EXPECT_EQ(A[I], B[I]);		EXPECT_EQ(A[I], B[I]);
}		}

TEST_P(AcxxelTest, AsyncMemsetD) {		TEST_P(AcxxelTest, AsyncMemsetD) {
acxxel::Platform *Platform = GetParam()().takeValue();		acxxel::Platform *Platform = GetParam()().takeValue();
constexpr size_t ArrayLength = 10;		constexpr size_t ArrayLength = 10;
std::array<uint32_t, ArrayLength> Host;		std::array<uint32_t, ArrayLength> Host;
acxxel::DeviceMemory<uint32_t> X =		acxxel::DeviceMemory<uint32_t> X =
[... 59 unchanged lines not shown ...]	TEST_P(AcxxelTest, OwnedAsyncCopyHostAndDevice) {
EXPECT_FALSE(Stream.sync().isError());		EXPECT_FALSE(Stream.sync().isError());
for (size_t I = 0; I < Length; ++I)		for (size_t I = 0; I < Length; ++I)
EXPECT_EQ(A[I], B[I]);		EXPECT_EQ(A[I], B[I]);
}		}

TEST_P(AcxxelTest, AsyncCopyDToD) {		TEST_P(AcxxelTest, AsyncCopyDToD) {
acxxel::Platform *Platform = GetParam()().takeValue();		acxxel::Platform *Platform = GetParam()().takeValue();
int A[] = {0, 1, 2};		int A[] = {0, 1, 2};
std::array<int, size(A)> B;		std::array<int, arraySize(A)> B;
acxxel::DeviceMemory<int> X = Platform->mallocD<int>(size(A)).takeValue();		acxxel::DeviceMemory<int> X =
acxxel::DeviceMemory<int> Y = Platform->mallocD<int>(size(A)).takeValue();		Platform->mallocD<int>(arraySize(A)).takeValue();
		acxxel::DeviceMemory<int> Y =
		Platform->mallocD<int>(arraySize(A)).takeValue();
acxxel::Stream Stream = Platform->createStream().takeValue();		acxxel::Stream Stream = Platform->createStream().takeValue();
acxxel::AsyncHostMemory<int> AsyncA =		acxxel::AsyncHostMemory<int> AsyncA =
Platform->registerHostMem(A).takeValue();		Platform->registerHostMem(A).takeValue();
acxxel::AsyncHostMemory<int> AsyncB =		acxxel::AsyncHostMemory<int> AsyncB =
Platform->registerHostMem(B).takeValue();		Platform->registerHostMem(B).takeValue();
EXPECT_FALSE(Stream.asyncCopyHToD(AsyncA, X).takeStatus().isError());		EXPECT_FALSE(Stream.asyncCopyHToD(AsyncA, X).takeStatus().isError());
EXPECT_FALSE(Stream.asyncCopyDToD(X, Y).takeStatus().isError());		EXPECT_FALSE(Stream.asyncCopyDToD(X, Y).takeStatus().isError());
EXPECT_FALSE(Stream.asyncCopyDToH(Y, AsyncB).takeStatus().isError());		EXPECT_FALSE(Stream.asyncCopyDToH(Y, AsyncB).takeStatus().isError());
EXPECT_FALSE(Stream.sync().isError());		EXPECT_FALSE(Stream.sync().isError());
for (size_t I = 0; I < size(A); ++I)		for (size_t I = 0; I < arraySize(A); ++I)
EXPECT_EQ(A[I], B[I]);		EXPECT_EQ(A[I], B[I]);
}		}

TEST_P(AcxxelTest, Stream) {		TEST_P(AcxxelTest, Stream) {
acxxel::Platform *Platform = GetParam()().takeValue();		acxxel::Platform *Platform = GetParam()().takeValue();
acxxel::Stream Stream = Platform->createStream().takeValue();		acxxel::Stream Stream = Platform->createStream().takeValue();
EXPECT_FALSE(Stream.sync().isError());		EXPECT_FALSE(Stream.sync().isError());
}		}
[... 109 unchanged lines not shown ...]

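`acxxel_test.cpp` renames its array-length helper from `size` to `arraySize`. The helper deduces the extent `N` of a built-in array from a reference parameter and never reads the argument, so a call is a constant expression and can serve as a template argument, as in `std::array<int, arraySize(A)> B;`. A standalone sketch of the same helper in use; `copyOfThree` is illustrative only and does not appear in the diff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Same shape as the test's helper: the array extent N is deduced from the
// reference parameter, and the argument itself is never read.
template <typename T, std::size_t N>
constexpr std::size_t arraySize(T (&)[N]) {
  return N;
}

// Copies a built-in array into a std::array whose extent comes from
// arraySize, mirroring `std::array<int, arraySize(A)> B;` in the tests.
std::array<int, 3> copyOfThree() {
  int A[] = {0, 1, 2};
  std::array<int, arraySize(A)> B;
  for (std::size_t I = 0; I < arraySize(A); ++I)
    B[I] = A[I];
  return B;
}
```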
acxxel/tests/multi_device_test.cpp

This file was added.

				#include "acxxel.h"
				#include "config.h"
				#include "gtest/gtest.h"

				namespace {

				using PlatformGetter = acxxel::Expected<acxxel::Platform *> (*)();
				class MultiDeviceTest : public ::testing::TestWithParam<PlatformGetter> {};

				TEST_P(MultiDeviceTest, AsyncCopy) {
				acxxel::Platform *Platform = GetParam()().takeValue();
				int DeviceCount = Platform->getDeviceCount().getValue();
				ASSERT_GE(DeviceCount, 2);

				int Length = 3;
				auto A = std::unique_ptr<int[]>(new int[Length]);
				auto B0 = std::unique_ptr<int[]>(new int[Length]);
				auto B1 = std::unique_ptr<int[]>(new int[Length]);

				auto ASpan = acxxel::Span<int>(A.get(), Length);
				auto B0Span = acxxel::Span<int>(B0.get(), Length);
				auto B1Span = acxxel::Span<int>(B1.get(), Length);

				for (int I = 0; I < Length; ++I)
				A[I] = I;

				auto AsyncA = Platform->registerHostMem(ASpan).takeValue();
				auto AsyncB0 = Platform->registerHostMem(B0Span).takeValue();
				auto AsyncB1 = Platform->registerHostMem(B1Span).takeValue();

				acxxel::Stream Stream0 = Platform->createStream(0).takeValue();
				acxxel::Stream Stream1 = Platform->createStream(1).takeValue();
				auto Device0 = Platform->mallocD<int>(Length, 0).takeValue();
				auto Device1 = Platform->mallocD<int>(Length, 1).takeValue();

				EXPECT_FALSE(Stream0.asyncCopyHToD(AsyncA, Device0, Length)
				.asyncCopyDToH(Device0, AsyncB0, Length)
				.sync()
				.isError());

				EXPECT_FALSE(Stream1.asyncCopyHToD(AsyncA, Device1, Length)
				.asyncCopyDToH(Device1, AsyncB1, Length)
				.sync()
				.isError());

				for (int I = 0; I < Length; ++I) {
				EXPECT_EQ(B0[I], I);
				EXPECT_EQ(B1[I], I);
				}
				}

				TEST_P(MultiDeviceTest, Events) {
				acxxel::Platform *Platform = GetParam()().takeValue();
				int DeviceCount = Platform->getDeviceCount().getValue();
				ASSERT_GE(DeviceCount, 2);

				acxxel::Stream Stream0 = Platform->createStream(0).takeValue();
				acxxel::Stream Stream1 = Platform->createStream(1).takeValue();
				acxxel::Event Event0 = Platform->createEvent(0).takeValue();
				acxxel::Event Event1 = Platform->createEvent(1).takeValue();

				EXPECT_FALSE(Stream0.enqueueEvent(Event0).sync().isError());
				EXPECT_FALSE(Stream1.enqueueEvent(Event1).sync().isError());

				EXPECT_TRUE(Event0.isDone());
				EXPECT_TRUE(Event1.isDone());

				EXPECT_FALSE(Event0.sync().isError());
				EXPECT_FALSE(Event1.sync().isError());
				}

				#if defined(ACXXEL_ENABLE_CUDA) || defined(ACXXEL_ENABLE_OPENCL)
				INSTANTIATE_TEST_CASE_P(BothPlatformTest, MultiDeviceTest,
				::testing::Values(
				#ifdef ACXXEL_ENABLE_CUDA
				acxxel::getCUDAPlatform
				#ifdef ACXXEL_ENABLE_OPENCL
				,
				#endif
				#endif
				#ifdef ACXXEL_ENABLE_OPENCL
				acxxel::getOpenCLPlatform
				#endif
				));
				#endif

				} // namespace
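`MultiDeviceTest.AsyncCopy` runs the same host-to-device-to-host round trip on devices 0 and 1 by explicit index. The same pattern can be written as a loop over every index below the device count, so it covers however many devices are present. The sketch below is platform-independent: `MockDeviceBuffer` and `roundTripAllDevices` are hypothetical, and device memory is modeled with host vectors rather than Acxxel's `DeviceMemory`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one device allocation; not Acxxel API. The "device"
// buffer is just a host vector tagged with the device it belongs to.
struct MockDeviceBuffer {
  int DeviceIndex;
  std::vector<int> Data;
};

// Copies Host to a buffer on each device and back, mirroring the
// asyncCopyHToD / asyncCopyDToH pairs in MultiDeviceTest.AsyncCopy.
// Returns one host-side result per device index in [0, DeviceCount).
std::vector<std::vector<int>>
roundTripAllDevices(int DeviceCount, const std::vector<int> &Host) {
  std::vector<std::vector<int>> Results;
  for (int D = 0; D < DeviceCount; ++D) {
    // Stands in for mallocD<int>(Length, D).
    MockDeviceBuffer Buffer{D, std::vector<int>(Host.size())};
    for (std::size_t I = 0; I < Host.size(); ++I) // host-to-device copy
      Buffer.Data[I] = Host[I];
    std::vector<int> Back(Host.size());
    for (std::size_t I = 0; I < Host.size(); ++I) // device-to-host copy
      Back[I] = Buffer.Data[I];
    Results.push_back(Back);
  }
  return Results;
}
```

Against the real API, the loop body would correspond to the calls already used in the test: `Platform->createStream(D)` and `Platform->mallocD<int>(Length, D)`, followed by the chained `asyncCopyHToD(...).asyncCopyDToH(...).sync()`.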