Do not map a device-accessible pointer that cannot be accessed by the host. Use is_device_ptr instead. Re-enable the test for the NextGen plugins.
Now that I see it, I doubt my suggestion. I'll try to come up with some consensus on the API in 3 weeks. We don't really know what we want.
This test works correctly with the old plugin. A locked pointer is not a device pointer: it is a host pointer that has been pinned in memory and made accessible by ROCr to the device agent that will carry out the data transfers implementing the map clause. In abstract programming terms, the second part is AMD-specific and does not need a corresponding functionality on other targets.
I am guessing this is a problem only in the next gen plugin, which needs fixing. Leads for potential issues:
- Is the relevant device agent passed in to the hsa_amd_memory_async_copy?
- Is the device agent passed in to the hsa_amd_memory_lock function?
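The two bullets above can be summarized in a C-like pseudocode sketch (assuming ROCr's hsa.h/hsa_ext_amd.h declarations; gpu_agent, host_ptr, dev_ptr, and completion_signal are placeholders, not actual plugin code):

```
/* Sketch only: the GPU agent that will perform the copies must appear
 * in the agent list of the lock call, or the locked buffer is not
 * accessible to that agent. */
void *agent_ptr = NULL;
hsa_amd_memory_lock(host_ptr, size, &gpu_agent, /*num_agent=*/1,
                    &agent_ptr);

/* The same agent must then be named in the async copy that implements
 * the map clause's data transfer. */
hsa_amd_memory_async_copy(dev_ptr, gpu_agent, agent_ptr, gpu_agent,
                          size, /*num_dep_signals=*/0, /*dep_signals=*/NULL,
                          completion_signal);
```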
You could also have a look at my patch for the old plugin as another potential lead:
https://reviews.llvm.org/D139208
@carlo.bertolli Just tested the example below with the old plugins and it also segfaults:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

void *llvm_omp_target_lock_mem(void *ptr, size_t size, int device_num);
void llvm_omp_target_unlock_mem(void *ptr, int device_num);

int main(int argc, char **argv) {
  const size_t size = 1000 * sizeof(int);
  int *ptr1 = (int *)malloc(size);
  assert(ptr1 != NULL);
  memset(ptr1, 0, size);

  int *ptr2 = (int *)llvm_omp_target_lock_mem(ptr1, size, 0);
  assert(ptr2 != NULL);

  fprintf(stdout, "ptr1: %p, ptr2: %p\n", ptr1, ptr2);
  fprintf(stdout, "ptr1[0]: %d\n", ptr1[0]);
  fprintf(stdout, "ptr2[0]: %d\n", ptr2[0]);
  return 0;
}
When running the program, the access to ptr2[0] segfaults at the host side:
ptr1: 0x55b461f00ec0, ptr2: 0x7fc5576c8ec0
ptr1[0]: 0
Segmentation fault
So I don't think ptr2 is a locked host pointer: it is the pointer returned by hsa_amd_memory_lock through the agent_ptr output parameter, presumably the pointer that the GPU agents should use for asynchronous memory transfers involving the buffer [ptr1, ptr1 + size). I assumed the locked host pointer was simply ptr1 itself after hsa_amd_memory_lock executes. However, I'm not 100% sure, since the HSA AMD documentation of the lock operation is a bit confusing.
Agreed: a pointer returned by the lock function should not be used on the host. It can be used in a map clause, as it will be passed to the memory copy functions along with the device agent.
This test is different from prelock.cpp and I was not expecting this to work.
Thanks!
@kevinsala if the issue is with returning to the host thread a pointer that cannot be used on the host - except in a map clause and a target region - then perhaps we should not return the locked pointer at all. The lock function could operate in a "side effect" mode, where the user asks the runtime to lock memory and ROCr keeps track of it. Does this make sense?
I don't have a strong opinion about returning the agent pointer or not; I don't know which use cases it may have (perhaps use with is_device_ptr?). However, I have doubts about the correctness of passing a non-host pointer to a map clause and having it treated as part of the host data environment.
The lock function could operate in a "side effect" mode, where the user asks the runtime to lock memory and the ROCr keeps track of it. Does this make sense?
That makes sense to me. It's what we are assuming in the nextgen plugins. The user should map the host pointer, the plugin detects that it's pinned memory, and we internally use the associated agent pointer for the asynchronous transfer.
My understanding from the weekly call is that "locked" is for a device agent, not the device.
It can be used in HSA memory copy calls, but is_device_ptr is for device (kernel) access and is thus incorrect here.