This is an archive of the discontinued LLVM Phabricator instance.

Differential D143775

[Libomptarget] Implement the host memory allocator with fine grained memory
ClosedPublic

Authored by jhuber6 on Feb 10 2023, 12:40 PM.

Details

Reviewers

jdoerfert
kevinsala
JonChesterfield
ye-luo
tianshilei1992

Commits

rG5d560b6966b7: [Libomptarget] Implement the host memory allocator with fine grained memory

Summary

This patch enables the "host" allocation using fine-grained memory.
As far as I understand, this is HSA-managed memory that is available
to the host but can also be accessed by the device. The original patch
that introduced these extensions only stipulated that it is
"non-migratable" memory, which is most likely true here because the
memory is managed by the host but accessible by the device. This should
work sufficiently well for what we expect the "host" allocation to do.

Depends on D143771

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Feb 10 2023, 12:40 PM

Herald added a project: Restricted Project. Feb 10 2023, 12:40 PM

Herald added subscribers: kosarev, kerbowa, jvesely.

jhuber6 requested review of this revision. Feb 10 2023, 12:40 PM

Herald added a project: Restricted Project. Feb 10 2023, 12:40 PM

Herald added subscribers: openmp-commits, sstefan1.

Harbormaster completed remote builds in B213134: Diff 496583. Feb 10 2023, 12:44 PM

LGTM

This revision is now accepted and ready to land. Feb 13 2023, 1:21 AM

I believe coarse/fine grain and accessibility are orthogonal concepts. If the coarse grain pool happens to have the accessibility properties of "target_host", then it'll work. That may be target specific and even order of pool iteration specific.

Much safer would be to identify a memory pool with the properties that the given allocator specifies and return that. Coarse grain means synchronised on kernel boundaries iiuc.

So what is a target_host_mem allocator?
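
The suggestion above, picking a pool by the properties the allocator asks for instead of taking the first pool of a given granularity, could be sketched as follows. This is a minimal, self-contained model: `PoolInfo`, `PoolRequirements`, and `findPool` are hypothetical names for illustration and are not part of the AMDGPU plugin or the HSA API, where such properties would have to be queried from the runtime.

```cpp
#include <vector>

// Hypothetical description of a memory pool (not the plugin's
// AMDGPUMemoryPoolTy). Granularity and accessibility are kept as
// separate, independent properties.
struct PoolInfo {
  bool FineGrained;      // coherent while a kernel is running
  bool HostAccessible;   // the host can read and write the memory
  bool DeviceAccessible; // the device can read and write the memory
};

// Hypothetical set of properties that an allocator kind requires.
struct PoolRequirements {
  bool NeedsHostAccess;
  bool NeedsDeviceAccess;
  bool NeedsFineGrained;
};

// Return the first pool that satisfies every requirement, or nullptr if
// none does, so the caller can report an error.
const PoolInfo *findPool(const std::vector<PoolInfo> &Pools,
                         const PoolRequirements &Req) {
  for (const PoolInfo &Pool : Pools) {
    if (Req.NeedsHostAccess && !Pool.HostAccessible)
      continue;
    if (Req.NeedsDeviceAccess && !Pool.DeviceAccessible)
      continue;
    if (Req.NeedsFineGrained && !Pool.FineGrained)
      continue;
    return &Pool;
  }
  return nullptr;
}
```

With this shape, a "host" allocator would ask for host and device access and would not depend on which granularity list a pool landed in or on the order in which pools were discovered.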

In D143775#4122420, @JonChesterfield wrote:

I believe coarse/fine grain and accessibility are orthogonal concepts. If the coarse grain pool happens to have the accessibility properties of "target_host", then it'll work. That may be target specific and even order of pool iteration specific.

Much safer would be to identify a memory pool with the properties that the given allocator specifies and return that. Coarse grain means synchronised on kernel boundaries iiuc.

So what is a target_host_mem allocator?

Yeah, I think the problem is that target_host_mem_alloc was never fully defined by @grokos. I interpreted it to mean that it's pinned memory that the device can access. I'm pretty sure coarse memory fits the bill there, but I don't know what the exact intention was with the "non-migratable" sentence.

In D143775#4122610, @jhuber6 wrote:

I interpreted it to mean that it's pinned memory that the device can access. I'm pretty sure coarse memory fits the bill there

I don't think that's what coarse grain means.

In D143775#4122646, @JonChesterfield wrote:

In D143775#4122610, @jhuber6 wrote:

I interpreted it to mean that it's pinned memory that the device can access. I'm pretty sure coarse memory fits the bill there

I don't think that's what coarse grain means.

I think the CUDA plugin right now just allocates pinned memory. We could do the same here just using new and the hsa_amd_memory_lock. Would be consistent at least.

jhuber6 retitled this revision from [Libomptarget] Implement the host memory allocator with coarse memory to [Libomptarget] Implement the host memory allocator with fine grained memory. Feb 20 2023, 6:42 AM

jhuber6 edited the summary of this revision.

Changing this to just go back to fine-grained. It worked before so it's less controversial.

This revision was landed with ongoing or failed builds. Feb 20 2023, 6:44 AM

Closed by commit rG5d560b6966b7: [Libomptarget] Implement the host memory allocator with fine grained memory (authored by jhuber6).

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG5d560b6966b7: [Libomptarget] Implement the host memory allocator with fine grained memory.

Harbormaster completed remote builds in B214755: Diff 498834. Feb 20 2023, 6:47 AM

Revision Contents

Path                                                          Size
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp        8 lines
openmp/libomptarget/test/api/omp_host_pinned_memory.c         1 line
openmp/libomptarget/test/api/omp_host_pinned_memory_alloc.c   3 lines

Diff 498834

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp

[...] struct AMDHostDeviceTy : public AMDGenericDeviceTy {

   /// Get a memory pool for fine-grained allocations.
   AMDGPUMemoryPoolTy &getFineGrainedMemoryPool() {
     assert(!FineGrainedMemoryPools.empty() && "No fine-grained mempool");
     // Retrieve any memory pool.
     return *FineGrainedMemoryPools[0];
   }

+  AMDGPUMemoryPoolTy &getCoarseGrainedMemoryPool() {
+    assert(!CoarseGrainedMemoryPools.empty() && "No coarse-grained mempool");
+    // Retrieve any memory pool.
+    return *CoarseGrainedMemoryPools[0];
+  }
+
   /// Get a memory pool for kernel args allocations.
   AMDGPUMemoryPoolTy &getArgsMemoryPool() {
     assert(!ArgsMemoryPools.empty() && "No kernelargs mempool");
     // Retrieve any memory pool.
     return *ArgsMemoryPools[0];
   }

   /// Getters for kernel args and host pinned memory managers.

[...] int free(void *TgtPtr, TargetAllocTy Kind) override {

     AMDGPUMemoryPoolTy *MemoryPool = nullptr;
     switch (Kind) {
     case TARGET_ALLOC_DEFAULT:
     case TARGET_ALLOC_DEVICE:
       MemoryPool = CoarseGrainedMemoryPools[0];
       break;
     case TARGET_ALLOC_HOST:
+      MemoryPool = &HostDevice.getFineGrainedMemoryPool();
       break;
     case TARGET_ALLOC_SHARED:
       MemoryPool = &HostDevice.getFineGrainedMemoryPool();
       break;
     }

     if (!MemoryPool) {
       REPORT("No memory pool for the specified allocation kind\n");

[...] void *AMDGPUDeviceTy::allocate(size_t Size, void *, TargetAllocTy Kind) {
   // Find the correct memory pool.
   AMDGPUMemoryPoolTy *MemoryPool = nullptr;
   switch (Kind) {
   case TARGET_ALLOC_DEFAULT:
   case TARGET_ALLOC_DEVICE:
     MemoryPool = CoarseGrainedMemoryPools[0];
     break;
   case TARGET_ALLOC_HOST:
+    MemoryPool = &HostDevice.getFineGrainedMemoryPool();
     break;
   case TARGET_ALLOC_SHARED:
     MemoryPool = &HostDevice.getFineGrainedMemoryPool();
     break;
   }

   if (!MemoryPool) {
     REPORT("No memory pool for the specified allocation kind\n");
[...]
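
Both `free` and `allocate` in the diff repeat the same switch from allocation kind to memory pool. As a rough, self-contained model of that mapping after this revision, the dispatch could be written as below. This is not the plugin's code: `MemoryPool`, `HostDevice`, `Device`, and `selectPool` are simplified stand-ins, the enumerator order is an assumption, and the empty-list check on the coarse-grained pools is an addition that the diff does not have.

```cpp
#include <cassert>
#include <vector>

// Allocation kinds named in the diff; the enumerator order is assumed.
enum TargetAllocTy {
  TARGET_ALLOC_DEVICE,
  TARGET_ALLOC_HOST,
  TARGET_ALLOC_SHARED,
  TARGET_ALLOC_DEFAULT
};

// Simplified stand-in for AMDGPUMemoryPoolTy.
struct MemoryPool {
  const char *Name;
};

// Simplified stand-in for the host device, which owns fine-grained pools.
struct HostDevice {
  std::vector<MemoryPool *> FineGrainedMemoryPools;

  MemoryPool &getFineGrainedMemoryPool() {
    assert(!FineGrainedMemoryPools.empty() && "No fine-grained mempool");
    return *FineGrainedMemoryPools[0];
  }
};

// Simplified stand-in for the GPU device.
struct Device {
  HostDevice &Host;
  std::vector<MemoryPool *> CoarseGrainedMemoryPools;

  // One helper shared by allocation and deallocation, so that both paths
  // always agree on the pool used for a given kind.
  MemoryPool *selectPool(TargetAllocTy Kind) {
    switch (Kind) {
    case TARGET_ALLOC_DEFAULT:
    case TARGET_ALLOC_DEVICE:
      // Device memory comes from the device's coarse-grained pool.
      return CoarseGrainedMemoryPools.empty() ? nullptr
                                              : CoarseGrainedMemoryPools[0];
    case TARGET_ALLOC_HOST:
    case TARGET_ALLOC_SHARED:
      // After this revision, host and shared allocations both use the
      // host's fine-grained pool.
      return &Host.getFineGrainedMemoryPool();
    }
    return nullptr;
  }
};
```

Factoring the switch into a single helper is only one possible design; the point it illustrates is that memory allocated as `TARGET_ALLOC_HOST` must be freed through the same pool it was allocated from.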

openmp/libomptarget/test/api/omp_host_pinned_memory.c

 // RUN: %libomptarget-compile-run-and-check-generic
-// UNSUPPORTED: amdgcn-amd-amdhsa

 #include <omp.h>
 #include <stdio.h>

 // Allocate pinned memory on the host
 void *llvm_omp_target_alloc_host(size_t, int);
 void llvm_omp_target_free_host(void *, int);

[...]

openmp/libomptarget/test/api/omp_host_pinned_memory_alloc.c

 // RUN: %libomptarget-compile-run-and-check-generic
-// UNSUPPORTED: amdgcn-amd-amdhsa

 #include <omp.h>
 #include <stdio.h>

 int main() {
   const int N = 64;

   int *hst_ptr = omp_alloc(N * sizeof(int), llvm_omp_target_host_mem_alloc);

   for (int i = 0; i < N; ++i)
     hst_ptr[i] = 2;

 #pragma omp target teams distribute parallel for map(tofrom : hst_ptr[0 : N])
   for (int i = 0; i < N; ++i)
     hst_ptr[i] -= 1;

   int sum = 0;
   for (int i = 0; i < N; ++i)
     sum += hst_ptr[i];

-  omp_free(hst_ptr, llvm_omp_target_shared_mem_alloc);
+  omp_free(hst_ptr, llvm_omp_target_host_mem_alloc);
   // CHECK: PASS
   if (sum == N)
     printf("PASS\n");
 }