This patch adds API support for the atk_pinned trait for omp_alloc. It does not implement kmp_target_lock_mem and kmp_target_unlock_mem in libomptarget, but prepares libomp for it. Patches to libomptarget to implement lock/unlock coming after this one.
Details
Diff Detail
Event Timeline
Generally looks good. Great we are gaining support upstream. I have one question though and I hope some host runtime ppl can take a look too.
openmp/runtime/src/kmp_alloc.cpp | ||
---|---|---|
1658 | Calling lock_mem on alloc_device memory is probably not going to work well, right? | |
1683 | Same as above. |
Pinning does not apply to device memory on AMDGPU target, only to host memory, as explained here:
https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_amd.h#L1526
CUDA API seems to agree with that:
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1gb65da58f444e7230d3322b6126bb4902
openmp/runtime/src/kmp_alloc.cpp | ||
---|---|---|
1658 | Good catch. For now, I moved the pinning to the llvm_omp_target_host_memo_alloc. However, given that this is a predefined memory allocator, I do not see any way the OpenMP API would allow users to set a special trait for it. Setting traits is something you can do when initializing an allocator, which needs to be seeded with a predefined memory space - not with a predefined allocator. |
openmp/runtime/src/kmp_alloc.cpp | ||
---|---|---|
1658 | I think we should remove it. "_host" allocator should already pin memory, plus your point. |
openmp/runtime/src/kmp_alloc.cpp | ||
---|---|---|
2026–2029 |
| |
openmp/runtime/test/api/omp_pinned.c | ||
2 | Can we add another RUN line with env OMP_ALLOCATOR=omp_default_mem_space:pinned=true %libomp-run. This would have to be used in tandem with something like pinned_alloc2 = omp_get_default_allocator() below doing a second allocation and free. |
openmp/runtime/test/api/omp_pinned.c | ||
---|---|---|
3 | %libomp-compile-and-run -> %libomp-run As written now with %libomp-compile-and-run, LIT will set the environment variable for the re-compilation only. It translates to: $ env OMP_ALLOCATOR=... <compilation_command> && ./executable We can avoid the recompilation and have the environment variable apply to the run command by using %libomp-run |
Calling lock_mem on alloc_device memory is probably not going to work well, right?
I am unsure we need to lock this here at all, it's either "_host" (=pinned), "_shared" (=managed), or "_device" (=GPU) memory.
When does locking make sense in those three cases and can we guarantee it won't cause problems?