This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation)
ClosedPublic

Authored by carlo.bertolli on Dec 2 2022, 9:41 AM.

Details

Summary

Currently, the only way to obtain pinned memory with libomptarget is to use the custom allocator llvm_omp_target_alloc_host. This reflects the CUDA implementation of libomptarget well, but it does not correctly expose the AMDGPU runtime API, where any system-allocated page can be locked/unlocked through a call to hsa_amd_memory_lock/unlock.
This patch enables users to allocate memory through malloc (mmap, sbrk) and then pin the related memory pages with a dedicated libomptarget call. It provides base support in the AMDGPU libomptarget plugin for users to prelock their host memory pages, so that the runtime doesn't need to lock them itself for asynchronous memory transfers.

Diff Detail

Event Timeline

carlo.bertolli created this revision. Dec 2 2022, 9:41 AM
Herald added a project: Restricted Project. · View Herald Transcript Dec 2 2022, 9:41 AM
carlo.bertolli requested review of this revision. Dec 2 2022, 9:41 AM

It's interesting that locking locked memory succeeds but doesn't give you something that has to be unlocked twice. Not totally sure about the convenience vs. error-detection tradeoff there. What does the proposed user-facing interface look like?

openmp/libomptarget/plugins/amdgpu/impl/impl.cpp
27

Maybe return false here; "on failure, return is locked" reads as the opposite of the semantics.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2716

Could we keep this pattern in the implementation? If something goes wrong, return nullptr, as opposed to passing pointers to pointers that are sometimes assigned.

kevinsala added inline comments. Dec 3 2022, 8:38 AM
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2707

Could we add both the __tgt_rtl_data_lock and __tgt_rtl_data_unlock declarations to the include/omptargetplugin.h header?

carlo.bertolli marked 3 inline comments as done. Dec 5 2022, 7:45 AM

It's interesting that locking locked memory succeeds but doesn't give you something that has to be unlocked twice. Not totally sure about the convenience vs. error-detection tradeoff there. What does the proposed user-facing interface look like?

As in the example:

llvm_omp_target_lock_mem(locked, n * sizeof(int), omp_get_default_device());
llvm_omp_target_unlock_mem(locked, omp_get_default_device());

The parent patch of this one also implements OpenMP traits for pinned memory, and calls kmp_target_lock_mem and kmp_target_unlock_mem to implement the trait on malloc'ed memory. These will map to the targetLockExplicit and targetUnlockExplicit calls in this patch, once both patches are in.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2716

Great catch, thanks so much for the input!

carlo.bertolli marked an inline comment as done.

Apply comments

dhruvachak added inline comments. Dec 12 2022, 10:02 AM
openmp/libomptarget/plugins/amdgpu/impl/impl.cpp
28

You can remove the else since the if branch has a return.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1824

I think the 4th argument is num_agent. Please add it as a comment. In addition, is it always 0?

openmp/libomptarget/src/omptarget.cpp
425

What is this lock protecting? It appears to be PM->Devices. If so, why are accesses such as PM->Devices[DeviceNum] unprotected, both a few lines down and in targetLockExplicit()?

carlo.bertolli marked 3 inline comments as done.

[OpenMP][libomptarget][AMDGPU] Apply requested changes and merge against trunk.

openmp/libomptarget/src/omptarget.cpp
425

According to deviceIsReady in device.cpp, the Devices size can only change while registering a new runtime lib. If we have enough devices to cover the device_num passed in by the API caller, then we know there will always be an RTL object corresponding to that device, so we don't need to lock/unlock again: there is an object that can be dereferenced at position device_num in the array PM->Devices.
We do the same lock/unlock of the vector size in targetLockExplicit, but we call deviceIsReady to accomplish that. We don't do it here, in unlock, because we do not want to re-initialize the device in case we are in the process tear-down phase.
I've switched to using the object constructor/destructor mechanism used elsewhere in libomptarget.

jdoerfert added inline comments. Jan 3 2023, 3:18 PM
openmp/libomptarget/src/omptarget.cpp
425

The problem is that it's a vector: adding elements can cause a reallocation, which will race with PM->Devices[DeviceNum]. We should hold the lock until we have the device, here and elsewhere. This is really only an issue if we add new runtime libs, so locking longer will not affect performance.

carlo.bertolli marked an inline comment as done.

[OpenMP][libomptarget][AMDGPU] Add lock/unlock to prevent races on Devices vector in plugin manager.

carlo.bertolli added inline comments. Jan 4 2023, 2:55 PM
openmp/libomptarget/src/omptarget.cpp
425

Agreed; I've added locks for the Devices vector accesses. I will also write a new patch that does the same in all uses of that vector to prevent races.

jdoerfert accepted this revision. Jan 4 2023, 3:48 PM

Lots of nits. I think they can be addressed pre-merge. LG

openmp/libomptarget/include/omptargetplugin.h
196

keep it consistent, int32_t ID, also below.
Nit: Style (size)

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1835

I doubt we need the check here, but I don't mind keeping it (in the old plugin).

openmp/libomptarget/src/api.cpp
84

Mark the result as nodiscard to ensure people don't assume ptr is now locked.

openmp/libomptarget/src/omptarget.cpp
393

Nit: Style.

415

Technically you only need to get the Device pointer; then you can drop the lock. Not that it should ever matter much.

446

Same as above.

openmp/libomptarget/src/private.h
56

Nit: Style

openmp/libomptarget/test/mapping/prelock.cpp
29

You need to use the result.

This revision is now accepted and ready to land. Jan 4 2023, 3:48 PM
kevinsala added inline comments. Jan 5 2023, 7:42 AM
openmp/libomptarget/include/omptargetplugin.h
199

We could return an error (int32_t) like the rest of the plugin API functions do.

ye-luo requested changes to this revision. Jan 5 2023, 9:40 PM

All the lock functions should return pointers via an argument.
All the lock and unlock functions should return error codes.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2726

Apart from returning, there is no error handling. The return type should be changed.

openmp/libomptarget/test/mapping/prelock.cpp
29

Strange: trying to lock a nullptr?

39

Strange: trying to unlock a nullptr?

This revision now requires changes to proceed. Jan 5 2023, 9:40 PM

Ping. I'm looking forward to this patch landing.

carlo.bertolli marked 11 inline comments as done.

[OpenMP][libomptarget][AMDGPU] Address comments and use prelocked pointers in map clause implementation.

Thanks for being patient with this patch update. I hit a problem with the is_locked function when the prelocked pointer is passed to the map clause: the address offset calculation was based on the system-allocator (e.g., malloc) pointer, but it would not work if the agentBasePointer (locked) was passed in. Fixed now.

All the lock functions should return pointers via an argument.
All the lock and unlock functions should return error codes.

I only added return codes to the plugin API, not to the user-level API. I could not find another example where we do that, and I thought that breaking with the convention was not a good thing. Let me know if you think the user-level APIs should also return an error code and place the locked pointer in a parameter.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1835

Checking whether something is locked costs less than *locking* it. Because of that logic, we kept it the same way for *unlocking* here, but I have not run any tests to prove this.
For *locking*, if I remember correctly, there was an order of magnitude difference between checking if locked and locking, so it is definitely worth doing.
This has an impact if used every time we need to transfer data H2D or D2H.

ye-luo added inline comments. Jan 10 2023, 11:12 AM
openmp/libomptarget/test/mapping/prelock.cpp
29

Still strange to me. Shouldn't the map be on "unlocked[:n]"?

38

Still strange to me. Shouldn't the unlock be called on the "unlocked" ptr?

ye-luo added inline comments. Jan 10 2023, 11:16 AM
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2721

Not HostPtr?

carlo.bertolli marked 3 inline comments as done.

[OpenMP][libomptarget][AMDGPU] Addressed comments.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2721

Good catch. I changed it in the _lock function as well.

openmp/libomptarget/test/mapping/prelock.cpp
29

You can do either and it will work (this, incidentally, unveiled a bug in the support that took some time to find).
I added tests with unlocked and locked pointers.

ye-luo resigned from this revision. Jan 10 2023, 11:54 AM

I still don't feel safe about the AMD plugin implementation:

  1. how is_locked handles errors.
  2. lock_memory doesn't return an error code.
  3. lock_memory and unlock_memory behavior is not symmetric.

But that is only inside the plugin.

llvm_omp_target_lock_mem
llvm_omp_target_unlock_mem
are not well specified.

Once the renaming happens and the tests get fixed, it will be OK to merge.

openmp/libomptarget/include/omptargetplugin.h
196

TgtPtr -> HostPtr.

200

TgtPtr -> HostPtr.

openmp/libomptarget/plugins/amdgpu/impl/impl_runtime.h
17

The return value can only be meaningful if err_p is a success.

openmp/libomptarget/test/mapping/prelock.cpp
22

Better call it host_ptr; locked or not is a state.

This revision is now accepted and ready to land. Jan 10 2023, 11:54 AM

I still don't feel safe about the AMD plugin implementation:

  1. how is_locked handles errors.

Agreed, it can be improved. Any suggestions?

  2. lock_memory doesn't return an error code.

The __tgt_rtl_target_data_lock and unlock functions do have an error code now. I am not propagating it beyond libomptarget. Are you suggesting that I should?
Happy to do that; just trying to understand what you think is best.

  3. lock_memory and unlock_memory behavior is not symmetric.

In what sense?

But that is only inside the plugin.

llvm_omp_target_lock_mem
llvm_omp_target_unlock_mem
are not well specified.

Because of the missing error code? Happy to add that.

Once the renaming happens and the tests get fixed, it will be OK to merge.

I already changed the tgt parameter names to hst. Perhaps you are looking at an intermediate version.

Thanks!

ye-luo added inline comments. Jan 10 2023, 12:04 PM
openmp/libomptarget/test/mapping/prelock.cpp
29

I don't believe llvm_omp_target_lock_mem is well defined when OMP_TARGET_OFFLOAD=disabled.
locked can be nullptr.
I would refrain from using "locked" explicitly.

JonChesterfield requested changes to this revision. Jan 10 2023, 12:13 PM

Please fix the merge errors in is_locked

openmp/libomptarget/plugins/amdgpu/impl/impl_runtime.h
17

That's a scary interface choice. It reads like the comment above, but actually it returns false when things go wrong and unconditionally writes success through the out parameter, throwing away the actual return code. And then the call sites pass in a value and ignore the result anyway. Perhaps the result of git merge blowing up on you?

This revision now requires changes to proceed. Jan 10 2023, 12:13 PM

lock_memory and unlock_memory are not symmetric.

I mean
lock_memory(ptr) # lock happens
lock_memory(ptr) # no-op
unlock_memory(ptr) # unlock happens
unlock_memory(ptr) # no-op

I just feel unsafe. I would feel safer with:
lock_memory(ptr) # lock happens
lock_memory(ptr) # no-op
unlock_memory(ptr) # no-op
unlock_memory(ptr) # unlock happens
but this requires doing reference counting.

openmp/libomptarget/plugins/amdgpu/impl/impl.cpp
15

hsa_status_t is_locked(void *ptr, void **agentBaseAddress)

Only read the agentBaseAddress value if the return is a success.
Use agentBaseAddress == nullptr to tell whether it is locked.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1817

hsa_status_t lock_memory(void *mem, size_t size, void **locked_ptr)

JonChesterfield resigned from this revision. Edited Jan 10 2023, 12:17 PM

OK, I failed to parse it. One of the three call sites does something with the return code, and it is set by something that reads like a variable declaration. So maybe not a merge misfire.

^ I share that concern about lock/unlock silently succeeding in some interleavings

openmp/libomptarget/plugins/amdgpu/impl/impl.cpp
20

Oh, I missed this call as it looks like a variable declaration. So we do return different values through the out parameter; we just don't do anything with them.

87

Here we return the err from is_locked

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1821

Here we discard it

1834

Here we discard it

This revision is now accepted and ready to land. Jan 10 2023, 12:17 PM
JonChesterfield added a comment. Edited Jan 10 2023, 12:21 PM

hsa_amd_memory_lock gives the corresponding region coarse semantics: no update is visible until after a kernel exits. Is that right for this?

Specifically, if someone mmaps some host address space and then calls this on it, it'll succeed; but if they want to see the results of atomic operations on that memory while the kernel is running, they won't.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1824

I think this means lock for all GPUs. It's a deprecated interface, though. There are some calls around memory pools that offer more control, but they're harder to use.

lock_memory and unlock_memory not symmetric.

I mean
lock_memory(ptr) # lock happens
lock_memory(ptr) # no-op
unlock_memory(ptr) # unlock happens
unlock_memory(ptr) # no-op

I just feel unsafe. Feel more safe if
lock_memory(ptr) # lock happens
lock_memory(ptr) # no-op
unlock_memory(ptr) # no-op
unlock_memory(ptr) # unlock happens
but this requires doing reference counting.

I've just tested this in a pure C++ (no openmp) program using ROCr directly.
Here's the result of the operations above (I am not sure we are doing a no-op):
a is pinned after first lock
a is pinned after second lock
a is pinned after first unlock
a is not pinned after second unlock

Code (please fill in the missing pieces as necessary, or I can provide the actual file if you want to test locally):

double *a = (double *)aligned_alloc(4096, n*sizeof(double)); //new double[n];
double *b = (double *)aligned_alloc(4096, n*sizeof(double)); //new double[n];

double *pinned_a = nullptr;
err = hsa_amd_memory_lock((void *) a, n*sizeof(double), nullptr, 0, (void **) &pinned_a);
CHECK(err);

{
  bool already_pinned = false;
  hsa_amd_pointer_info_t info;
  info.size = sizeof(hsa_amd_pointer_info_t);
  err = hsa_amd_pointer_info(a, &info, nullptr, nullptr, nullptr);
  CHECK(err);
  already_pinned = (info.type == HSA_EXT_POINTER_TYPE_LOCKED);
  if (already_pinned) printf("a is pinned after first lock\n");
  else printf("a is not pinned after first lock\n");
}

double *repinned_a = nullptr;
err = hsa_amd_memory_lock((void *) a, n*sizeof(double), nullptr, 0, (void **) &repinned_a);
CHECK(err);

{
  bool already_pinned = false;
  hsa_amd_pointer_info_t info;
  info.size = sizeof(hsa_amd_pointer_info_t);
  err = hsa_amd_pointer_info(a, &info, nullptr, nullptr, nullptr);
  CHECK(err);
  already_pinned = (info.type == HSA_EXT_POINTER_TYPE_LOCKED);
  if (already_pinned) printf("a is pinned after second lock\n");
  else printf("a is not pinned after second lock\n");
}


hsa_amd_memory_unlock(pinned_a);
{
  bool already_pinned = false;
  hsa_amd_pointer_info_t info;
  info.size = sizeof(hsa_amd_pointer_info_t);
  err = hsa_amd_pointer_info(a, &info, nullptr, nullptr, nullptr);
  already_pinned = (info.type == HSA_EXT_POINTER_TYPE_LOCKED);
  if (already_pinned) printf("a is pinned after first unlock\n");
  else printf("a is not pinned after first unlock\n");
}

{
  bool already_pinned = false;
  hsa_amd_pointer_info_t info;
  info.size = sizeof(hsa_amd_pointer_info_t);
  hsa_amd_memory_unlock(a);
  err = hsa_amd_pointer_info(a, &info, nullptr, nullptr, nullptr);
  already_pinned = (info.type == HSA_EXT_POINTER_TYPE_LOCKED);
  if (already_pinned) printf("a is pinned after second unlock\n");
  else printf("a is not pinned after second unlock\n");
}

This looks like what you want, right?

Yes. ROCr does reference counting, and that is the desired behavior.

carlo.bertolli added inline comments. Jan 10 2023, 2:01 PM
openmp/libomptarget/plugins/amdgpu/impl/impl.cpp
20

Correct, here's the description in the header file:
@param[in, out] info Pointer to structure to be filled with allocation info.
Data member size must be set to the size of the structure prior to calling
hsa_amd_pointer_info. On return size will be set to the size of the
pointer info structure supported by the runtime, if smaller.

https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/a0d5e18e7752563daf4da970eae5ac8b6056a4c0/src/inc/hsa_ext_amd.h#L1844

I *believe* the size of the info struct could be smaller than what we use on older runtimes and/or GPUs. I will update as necessary. Thanks for the catch.

hsa_amd_memory_lock gives the corresponding region coarse semantics. No update visible until after a kernel exits. Is that right for this?

I do not *think* hsa_amd_memory_lock changes the granularity. There is a call for that, but it is not the locking one.
In general, up to OpenMP 5.2, there are no system-scope atomics available in the language. That makes the fine/coarse-grain memory distinction irrelevant. We will need to worry about it starting with OpenMP 6.0 TR1, I believe.

Specifically if someone mmaps some host address space then calls this on it, it'll succeed, but if they want to see the results of atomic operations on that memory while the kernel is running, they won't.

Correct, and in OpenMP there is no synchronization available during kernel execution, only at kernel boundaries. In any case, it is not something that "locking" host memory has to do with.

carlo.bertolli added inline comments. Jan 10 2023, 2:27 PM
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1824

You caught me on this one :-)
It is not deprecated, but in newer versions of ROCr it will need to be passed the list of agents that are supposed to access the locked pointer. I will send an update once this patch is in. The main problem is selecting an agent when the user does the locking themselves and we don't have a device ID. In that case, we may be forced to pass in all GPU agents in the system, but I am not sure whether this has performance or correctness implications. The alternative is making the llvm_omp_*lock APIs depend on a device num. Suggestions?

hsa_amd_memory_lock gives the corresponding region coarse semantics. No update visible until after a kernel exits. Is that right for this?

I do not *think* hsa_amd_memory_lock changes the granularity. There is a call for that, but it is not the locking one.
In general, up to OpenMP 5.2, there are no system-scope atomics available in the language. That makes the fine/coarse-grain memory distinction irrelevant. We will need to worry about it starting with OpenMP 6.0 TR1, I believe.

There's a call that locks memory to a pool, which inherits the properties of that pool. It can be used to get fine- or coarse-grain synchronization on an mmapped host pointer. Maybe OpenMP not having atomics means we can ignore it, though OpenCL-style atomic intrinsics do work if people use them in kernels today.

Specifically if someone mmaps some host address space then calls this on it, it'll succeed, but if they want to see the results of atomic operations on that memory while the kernel is running, they won't.

Correct, and in OpenMP there is no synchronization available during kernel execution, only at kernel boundaries. In any case, it is not something that "locking" host memory has to do with.

Huh, I did not realise this. Then, given this is an OpenMP call and OpenMP can't write code that can tell the difference, coarse grain seems fine.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1824

That would be a regression in ROCr: nullptr meaning all agents seems like a useful capability. I'd have guessed the right thing to do there is to talk ROCr's maintainers out of breaking backwards compatibility.

Otherwise, yeah, we can stash all agents in the system in an array and pass it in to the various calls, probably all CPU agents as well as all GPU agents.

On the other hand, this is an interface intended to let people write longer but faster code. Perhaps it should take a device ID and only lock for that device. That seems likely to be faster than locking for all of them.

carlo.bertolli marked 7 inline comments as done.

[OpenMP][libomptarget][AMDGPU] Address comments, including fix in error handling in is_locked function.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1824

DeviceNum is already part of the API. I changed the implementation to use the related HSA agent, so we are good. Thanks for noticing this.

openmp/libomptarget/test/mapping/prelock.cpp
29

Good point. Looking at the other APIs in api.cpp, none of them seems to be well defined for this case. Happy to take care of it if there is a suggestion here. Both pointers work fine.

ye-luo accepted this revision. Jan 13 2023, 9:28 AM
Herald added a project: Restricted Project. · View Herald Transcript Jan 13 2023, 10:19 AM

Hi @carlo.bertolli, I am getting the following error in my local build after this commit:

/mybuild/openmp/libomptarget/plugins/amdgpu/impl/impl.cpp:17:3: error: ‘hsa_amd_pointer_info_t’ was not declared in this scope; did you mean ‘hsa_amd_agent_info_t’?
   17 |   hsa_amd_pointer_info_t info;
      |   ^~~~~~~~~~~~~~~~~~~~~~
      |   hsa_amd_agent_info_t

Is there a missing definition in hsa_ext_amd.h?

Hi Carlo,

Thank you for the prompt response! Direct push works for me. I suppose posting a message in https://reviews.llvm.org/D139208 with the new commit id should be enough to collect any post-commit review comments for the new code.

Thanks,
Slava

I think that's because we (people without an AMD card) are all building the dynamic AMD plugin, which has all the definitions of the AMD HSA-related stuff. That one might be out of date. Please update it accordingly.

Pushed a fix to the non-amdgpu build:
7928d9e12d47fcc226d0c6984e11f5f463670f4a
Tested on a machine without a ROCm installation or an AMDGPU.


Hi Slava, Shilei

I've just pushed a fix, tested on a machine without AMD GPUs or ROCm. Please let me know if that fixes it for you:
7928d9e12d47fcc226d0c6984e11f5f463670f4a

Thanks for the notification and your patience.

  • Carlo

Thank you, Carlo! The fix works for me.

@carlo.bertolli is there documentation regarding llvm_omp_target_lock_mem and llvm_omp_target_unlock_mem? While reading the comments on this patch, I see that the API ensures that locked areas feature reference counting. But I have the following doubts: should the API support locking memory buffers that were already locked (e.g., completely or partially overlapping)? What's the behavior in such a case?


I didn't add documentation for those two APIs, but indeed I should have. Is the API description in the code enough, or is there a better/different place for it?
According to the ROCr implementation of lock/pin and unlock/unpin:
https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_amd.h#L1525
double locking just keeps things locked. There is a comment about two lock calls with overlapping pages, which leaves the pages locked.
I don't think we need to report double locks if ref counting works correctly, and I do think we should allow double locking.
My two cents. Thanks for the question!