The NextGen plugins use the information regarding new mapping/unmappings to lock/unlock the corresponding host buffer and speed up the host-device transfers. Handle also the buffers that were already locked externally, i.e., from the application.
mapped/unmapped is a confusing name, especially in the context of OpenMP, you can always check if a pointer is mapped in libomptarget. Of course I understand the "map" here is not OpenMP map.
I'm working on Nvidia GPUs, so IMO pinned memory makes sense. Not sure if AMD people would object, though in the implementation you are using that name. ;-)
@tianshilei1992 This is supposed to mean "mapped" in the OpenMP sense. So the plugins are informed if a mapping occurs.
I think this is fine. Two nits and Ye's comment need to be addressed. @tianshilei1992 please accept or suggest a different name. Since we really mean "OpenMP mapped" here, I find the name fitting.
If it servers as notifying data mapping, __tgt_rtl_notify_data_map or something like that sounds more appropriate to me. __tgt_rtl_data_mapped sounds like checking whether it is mapped, and that's how it confused me initially.
Can be tgt_rtl_data_notify_mapped and tgt_rtl_data_notify_unmapped by keeping __tgt_rtl_data prefix
I don't understand this part of code.
While it is mapped, we want to keep it (by default) pinned. The escape hatch (env var) is missing though.
Unconditionally pinning mapped host memory should be an option to opt-in rather than default.
Mapping scalars as firstprivate or as tofrom? The former would not go through this mechanism. The latter makes reasonable sense to pin the page, no?
I agree we need an env var, default is the question.
I was referring to map(tofrom) scalars. firstprivte is not considered a map technically. I don't think you want to pay the cost of pinning 10 pages due to 10 scalars. Pinning is very expensive (interacting with OS).
- Fixing comments.
- Re-writing the PinnedAllocationMapTy functions to be clearer and easier to read.
- Renaming RTL API functions.
- Disable automatic locking of mapped host buffers by default. Can be enabled by setting LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS envar.
- Although the previous envar is not set, mapped buffers that were externally locked (e.g, the app allocating HSA memory) are registered into the plugin's map and we use a single-step asynchronous HSA memory transfer. (@ye-luo)
This patch still detects invalid partial overlapping in offloading/complex_reduction.cpp, which I have to fix. The previous patch fails with OpenMC, I've to check if this also fails.
It's an output parameter. I'll add the proper comment.
The locking of mapped host buffers is disabled by default. Can be enabled through the LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS envar. The envar accepts boolean values (on/off) and a special option:
- off: Do not lock mapped host buffers (default).
- on: Lock mapped host buffers automatically, but do not report lock failures if the plugin fails to lock them.
- mandatory: Lock mapped host buffers automatically and treat locking failures in the plugins as fatal errors. This option may be useful for debugging purposes.
This is probably a very late report but I happened to be looking at openmp bots for a different reason, some of the error returns in this change are failing to compile:
Despite the "gcc" name, this bot appears to use clang-11 which is still supported for building llvm.
This is a simplified version: https://godbolt.org/z/x91h9KcGq
I'm not sure how we're expected to use Expected (no pun intended) in this scenario. There is probably a helper function somewhere.
This should resolve it, thanks for bringing this to my attention https://reviews.llvm.org/rGedc03550063c.