diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -141,50 +141,66 @@ Info: Entering OpenMP data region at zaxpy.cpp:14:1 with 2 arguments: Info: to(X[0:N])[16384] Info: tofrom(Y[0:N])[16384] - Info: Creating new map entry with HstPtrBegin=0x00007ffde9e99000, - TgtPtrBegin=0x00007f15dc600000, Size=16384, Name=X[0:N] - Info: Copying data from host to device, HstPtr=0x00007ffde9e99000, - TgtPtr=0x00007f15dc600000, Size=16384, Name=X[0:N] - Info: Creating new map entry with HstPtrBegin=0x00007ffde9e95000, - TgtPtrBegin=0x00007f15dc604000, Size=16384, Name=Y[0:N] - Info: Copying data from host to device, HstPtr=0x00007ffde9e95000, - TgtPtr=0x00007f15dc604000, Size=16384, Name=Y[0:N] + Info: Creating new map entry with HstPtrBegin=0x00007fff0d259a40, + TgtPtrBegin=0x00007fdba5800000, Size=16384, RefCount=1, Name=X[0:N] + Info: Copying data from host to device, HstPtr=0x00007fff0d259a40, + TgtPtr=0x00007fdba5800000, Size=16384, Name=X[0:N] + Info: Creating new map entry with HstPtrBegin=0x00007fff0d255a40, + TgtPtrBegin=0x00007fdba5804000, Size=16384, RefCount=1, Name=Y[0:N] + Info: Copying data from host to device, HstPtr=0x00007fff0d255a40, + TgtPtr=0x00007fdba5804000, Size=16384, Name=Y[0:N] Info: OpenMP Host-Device pointer mappings after block at zaxpy.cpp:14:1: Info: Host Ptr Target Ptr Size (B) RefCount Declaration - Info: 0x00007ffde9e95000 0x00007f15dc604000 16384 1 Y[0:N] at zaxpy.cpp:13:17 - Info: 0x00007ffde9e99000 0x00007f15dc600000 16384 1 X[0:N] at zaxpy.cpp:13:11 + Info: 0x00007fff0d255a40 0x00007fdba5804000 16384 1 Y[0:N] at zaxpy.cpp:13:17 + Info: 0x00007fff0d259a40 0x00007fdba5800000 16384 1 X[0:N] at zaxpy.cpp:13:11 Info: Entering OpenMP kernel at zaxpy.cpp:6:1 with 4 arguments: Info: firstprivate(N)[8] (implicit) Info: use_address(Y)[0] (implicit) Info: tofrom(D)[16] (implicit) Info: use_address(X)[0] (implicit) - Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffde9e95000, - TgtPtrBegin=0x00007f15dc604000, Size=0, updated RefCount=2, Name=Y - Info: Creating new map entry with HstPtrBegin=0x00007ffde9e94fb0, - TgtPtrBegin=0x00007f15dc608000, Size=16, Name=D - Info: Copying data from host to device, HstPtr=0x00007ffde9e94fb0, - TgtPtr=0x00007f15dc608000, Size=16, Name=D - Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffde9e99000, - TgtPtrBegin=0x00007f15dc600000, Size=0, updated RefCount=2, Name=X - Info: Launching kernel __omp_offloading_fd02_e25f6e76__Z5zaxpyPSt7complexIdES1_S0_m_l6 + Info: Mapping exists (implicit) with HstPtrBegin=0x00007fff0d255a40, + TgtPtrBegin=0x00007fdba5804000, Size=0, RefCount=2 (incremented), Name=Y + Info: Creating new map entry with HstPtrBegin=0x00007fff0d2559f0, + TgtPtrBegin=0x00007fdba5808000, Size=16, RefCount=1, Name=D + Info: Copying data from host to device, HstPtr=0x00007fff0d2559f0, + TgtPtr=0x00007fdba5808000, Size=16, Name=D + Info: Mapping exists (implicit) with HstPtrBegin=0x00007fff0d259a40, + TgtPtrBegin=0x00007fdba5800000, Size=0, RefCount=2 (incremented), Name=X + Info: Mapping exists with HstPtrBegin=0x00007fff0d255a40, + TgtPtrBegin=0x00007fdba5804000, Size=0, RefCount=2 (update suppressed) + Info: Mapping exists with HstPtrBegin=0x00007fff0d2559f0, + TgtPtrBegin=0x00007fdba5808000, Size=16, RefCount=1 (update suppressed) + Info: Mapping exists with HstPtrBegin=0x00007fff0d259a40, + TgtPtrBegin=0x00007fdba5800000, Size=0, RefCount=2 (update suppressed) + Info: Launching kernel __omp_offloading_10305_c08c86__Z5zaxpyPSt7complexIdES1_S0_m_l6 with 8 blocks and 128 threads in SPMD mode - Info: Copying data from device to host, TgtPtr=0x00007f15dc608000, - HstPtr=0x00007ffde9e94fb0, Size=16, Name=D - Info: Removing map entry with HstPtrBegin=0x00007ffde9e94fb0, - TgtPtrBegin=0x00007f15dc608000, Size=16, Name=D + Info: Mapping exists with HstPtrBegin=0x00007fff0d259a40, + TgtPtrBegin=0x00007fdba5800000, Size=0, RefCount=1 (decremented) + Info: Mapping exists with HstPtrBegin=0x00007fff0d2559f0, + TgtPtrBegin=0x00007fdba5808000, Size=16, RefCount=1 (deferred final decrement) + Info: Copying data from device to host, TgtPtr=0x00007fdba5808000, + HstPtr=0x00007fff0d2559f0, Size=16, Name=D + Info: Mapping exists with HstPtrBegin=0x00007fff0d255a40, + TgtPtrBegin=0x00007fdba5804000, Size=0, RefCount=1 (decremented) + Info: Removing map entry with HstPtrBegin=0x00007fff0d2559f0, + TgtPtrBegin=0x00007fdba5808000, Size=16, Name=D Info: OpenMP Host-Device pointer mappings after block at zaxpy.cpp:6:1: Info: Host Ptr Target Ptr Size (B) RefCount Declaration - Info: 0x00007ffde9e95000 0x00007f15dc604000 16384 1 Y[0:N] at zaxpy.cpp:13:17 - Info: 0x00007ffde9e99000 0x00007f15dc600000 16384 1 X[0:N] at zaxpy.cpp:13:11 + Info: 0x00007fff0d255a40 0x00007fdba5804000 16384 1 Y[0:N] at zaxpy.cpp:13:17 + Info: 0x00007fff0d259a40 0x00007fdba5800000 16384 1 X[0:N] at zaxpy.cpp:13:11 Info: Exiting OpenMP data region at zaxpy.cpp:14:1 with 2 arguments: Info: to(X[0:N])[16384] Info: tofrom(Y[0:N])[16384] - Info: Copying data from device to host, TgtPtr=0x00007f15dc604000, - HstPtr=0x00007ffde9e95000, Size=16384, Name=Y[0:N] - Info: Removing map entry with HstPtrBegin=0x00007ffde9e95000, - TgtPtrBegin=0x00007f15dc604000, Size=16384, Name=Y[0:N] - Info: Removing map entry with HstPtrBegin=0x00007ffde9e99000, - TgtPtrBegin=0x00007f15dc600000, Size=16384, Name=X[0:N] + Info: Mapping exists with HstPtrBegin=0x00007fff0d255a40, + TgtPtrBegin=0x00007fdba5804000, Size=16384, RefCount=1 (deferred final decrement) + Info: Copying data from device to host, TgtPtr=0x00007fdba5804000, + HstPtr=0x00007fff0d255a40, Size=16384, Name=Y[0:N] + Info: Mapping exists with HstPtrBegin=0x00007fff0d259a40, + TgtPtrBegin=0x00007fdba5800000, Size=16384, RefCount=1 (deferred final decrement) + Info: Removing map entry with HstPtrBegin=0x00007fff0d255a40, + TgtPtrBegin=0x00007fdba5804000, Size=16384, Name=Y[0:N] + Info: Removing map entry with HstPtrBegin=0x00007fff0d259a40, + TgtPtrBegin=0x00007fdba5800000, Size=16384, Name=X[0:N] From this information, we can see the OpenMP kernel being launched on the CUDA device with enough threads and blocks for all ``1024`` iterations of the loop in diff --git a/openmp/libomptarget/src/device.h b/openmp/libomptarget/src/device.h --- a/openmp/libomptarget/src/device.h +++ b/openmp/libomptarget/src/device.h @@ -88,6 +88,10 @@ } bool isRefCountInf() const { return RefCount == INFRefCount; } + + std::string refCountToStr() const { + return isRefCountInf() ? "INF" : std::to_string(getRefCount()); + } }; typedef uintptr_t HstPtrBeginTy; diff --git a/openmp/libomptarget/src/device.cpp b/openmp/libomptarget/src/device.cpp --- a/openmp/libomptarget/src/device.cpp +++ b/openmp/libomptarget/src/device.cpp @@ -76,16 +76,20 @@ } // Mapping does not exist, allocate it with refCount=INF - auto Res = HostDataToTargetMap.emplace( - (uintptr_t)HstPtrBegin /*HstPtrBase*/, - (uintptr_t)HstPtrBegin /*HstPtrBegin*/, - (uintptr_t)HstPtrBegin + Size /*HstPtrEnd*/, - (uintptr_t)TgtPtrBegin /*TgtPtrBegin*/, nullptr, true /*IsRefCountINF*/); - auto NewEntry = Res.first; + const HostDataToTargetTy &newEntry = + *HostDataToTargetMap + .emplace( + /*HstPtrBase=*/(uintptr_t)HstPtrBegin, + /*HstPtrBegin=*/(uintptr_t)HstPtrBegin, + /*HstPtrEnd=*/(uintptr_t)HstPtrBegin + Size, + /*TgtPtrBegin=*/(uintptr_t)TgtPtrBegin, /*Name=*/nullptr, + /*IsRefCountINF=*/true) + .first; DP("Creating new map entry: HstBase=" DPxMOD ", HstBegin=" DPxMOD - ", HstEnd=" DPxMOD ", TgtBegin=" DPxMOD "\n", - DPxPTR(NewEntry->HstPtrBase), DPxPTR(NewEntry->HstPtrBegin), - DPxPTR(NewEntry->HstPtrEnd), DPxPTR(NewEntry->TgtPtrBegin)); + ", HstEnd=" DPxMOD ", TgtBegin=" DPxMOD ", RefCount=%s\n", + DPxPTR(newEntry.HstPtrBase), DPxPTR(newEntry.HstPtrBegin), + DPxPTR(newEntry.HstPtrEnd), DPxPTR(newEntry.TgtPtrBegin), + newEntry.refCountToStr().c_str()); DataMapMtx.unlock(); @@ -211,18 +215,16 @@ ((lr.Flags.ExtendsBefore || lr.Flags.ExtendsAfter) && IsImplicit)) { auto &HT = *lr.Entry; IsNew = false; - if (UpdateRefCount) HT.incRefCount(); - uintptr_t tp = HT.TgtPtrBegin + ((uintptr_t)HstPtrBegin - HT.HstPtrBegin); INFO(OMP_INFOTYPE_MAPPING_EXISTS, DeviceID, "Mapping exists%s with HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD ", " - "Size=%" PRId64 ",%s RefCount=%s, Name=%s\n", + "Size=%" PRId64 ", RefCount=%s (%s), Name=%s\n", (IsImplicit ? " (implicit)" : ""), DPxPTR(HstPtrBegin), DPxPTR(tp), - Size, (UpdateRefCount ? " updated" : ""), - HT.isRefCountInf() ? "INF" : std::to_string(HT.getRefCount()).c_str(), + Size, HT.refCountToStr().c_str(), + UpdateRefCount ? "incremented" : "update suppressed", (HstPtrName) ? getNameFromMapping(HstPtrName).c_str() : "unknown"); rc = (void *)tp; } else if ((lr.Flags.ExtendsBefore || lr.Flags.ExtendsAfter) && !IsImplicit) { @@ -246,9 +248,9 @@ // In addition to the mapping rules above, the close map modifier forces the // mapping of the variable to the device. if (Size) { - DP("Return HstPtrBegin " DPxMOD " Size=%" PRId64 " RefCount=%s\n", - DPxPTR((uintptr_t)HstPtrBegin), Size, - (UpdateRefCount ? " updated" : "")); + DP("Return HstPtrBegin " DPxMOD " Size=%" PRId64 " for unified shared " + "memory\n", + DPxPTR((uintptr_t)HstPtrBegin), Size); IsHostPtr = true; rc = HstPtrBegin; } @@ -263,13 +265,18 @@ // If it is not contained and Size > 0, we should create a new entry for it. IsNew = true; uintptr_t tp = (uintptr_t)allocData(Size, HstPtrBegin); + const HostDataToTargetTy &newEntry = + *HostDataToTargetMap + .emplace((uintptr_t)HstPtrBase, (uintptr_t)HstPtrBegin, + (uintptr_t)HstPtrBegin + Size, tp, HstPtrName) + .first; INFO(OMP_INFOTYPE_MAPPING_CHANGED, DeviceID, "Creating new map entry with " - "HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD ", Size=%ld, Name=%s\n", + "HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD ", Size=%ld, " + "RefCount=%s, Name=%s\n", DPxPTR(HstPtrBegin), DPxPTR(tp), Size, + newEntry.refCountToStr().c_str(), (HstPtrName) ? getNameFromMapping(HstPtrName).c_str() : "unknown"); - HostDataToTargetMap.emplace((uintptr_t)HstPtrBase, (uintptr_t)HstPtrBegin, - (uintptr_t)HstPtrBegin + Size, tp, HstPtrName); rc = (void *)tp; } @@ -292,25 +299,35 @@ if (lr.Flags.IsContained || (!MustContain && (lr.Flags.ExtendsBefore || lr.Flags.ExtendsAfter))) { auto &HT = *lr.Entry; + // We do not decrement the reference count to zero here. deallocTgtPtr does + // that atomically with removing the mapping. Otherwise, before this thread + // removed the mapping in deallocTgtPtr, another thread could retrieve the + // mapping, increment and decrement back to zero, and then both threads + // would try to remove the mapping, resulting in a double free. IsLast = HT.getRefCount() == 1; - - if (!IsLast && UpdateRefCount) + const char *RefCountAction; + if (!UpdateRefCount) + RefCountAction = "update suppressed"; + else if (IsLast) + RefCountAction = "deferred final decrement"; + else { + RefCountAction = "decremented"; HT.decRefCount(); - + } uintptr_t tp = HT.TgtPtrBegin + ((uintptr_t)HstPtrBegin - HT.HstPtrBegin); - DP("Mapping exists with HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD ", " - "Size=%" PRId64 ",%s RefCount=%s\n", - DPxPTR(HstPtrBegin), DPxPTR(tp), Size, - (UpdateRefCount ? " updated" : ""), - HT.isRefCountInf() ? "INF" : std::to_string(HT.getRefCount()).c_str()); + INFO(OMP_INFOTYPE_MAPPING_EXISTS, DeviceID, + "Mapping exists with HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD ", " + "Size=%" PRId64 ", RefCount=%s (%s)\n", + DPxPTR(HstPtrBegin), DPxPTR(tp), Size, HT.refCountToStr().c_str(), + RefCountAction); rc = (void *)tp; } else if (PM->RTLs.RequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY) { // If the value isn't found in the mapping and unified shared memory // is on then it means we have stumbled upon a value which we need to // use directly from the host. - DP("Get HstPtrBegin " DPxMOD " Size=%" PRId64 " RefCount=%s\n", - DPxPTR((uintptr_t)HstPtrBegin), Size, - (UpdateRefCount ? " updated" : "")); + DP("Get HstPtrBegin " DPxMOD " Size=%" PRId64 " for unified shared " + "memory\n", + DPxPTR((uintptr_t)HstPtrBegin), Size); IsHostPtr = true; rc = HstPtrBegin; } diff --git a/openmp/libomptarget/src/private.h b/openmp/libomptarget/src/private.h --- a/openmp/libomptarget/src/private.h +++ b/openmp/libomptarget/src/private.h @@ -118,11 +118,11 @@ for (const auto &HostTargetMap : Device.HostDataToTargetMap) { SourceInfo Info(HostTargetMap.HstPtrName); INFO(OMP_INFOTYPE_ALL, Device.DeviceID, - DPxMOD " " DPxMOD " %-8" PRIuPTR " %-8" PRId64 " %s at %s:%d:%d\n", + DPxMOD " " DPxMOD " %-8" PRIuPTR " %-8s %s at %s:%d:%d\n", DPxPTR(HostTargetMap.HstPtrBegin), DPxPTR(HostTargetMap.TgtPtrBegin), HostTargetMap.HstPtrEnd - HostTargetMap.HstPtrBegin, - HostTargetMap.getRefCount(), Info.getName(), Info.getFilename(), - Info.getLine(), Info.getColumn()); + HostTargetMap.refCountToStr().c_str(), Info.getName(), + Info.getFilename(), Info.getLine(), Info.getColumn()); } Device.DataMapMtx.unlock(); } diff --git a/openmp/libomptarget/test/offloading/info.c b/openmp/libomptarget/test/offloading/info.c --- a/openmp/libomptarget/test/offloading/info.c +++ b/openmp/libomptarget/test/offloading/info.c @@ -6,6 +6,10 @@ #define N 64 +#pragma omp declare target +int global; +#pragma omp end declare target + extern void __tgt_set_info_flag(unsigned); int main() { @@ -19,10 +23,10 @@ // INFO: Libomptarget device 0 info: alloc(A[0:64])[256] // INFO: Libomptarget device 0 info: tofrom(B[0:64])[256] // INFO: Libomptarget device 0 info: to(C[0:64])[256] -// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=A[0:64] -// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=B[0:64] +// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, RefCount=1, Name=A[0:64] +// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, RefCount=1, Name=B[0:64] // INFO: Libomptarget device 0 info: Copying data from host to device, HstPtr={{.*}}, TgtPtr={{.*}}, Size=256, Name=B[0:64] -// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=C[0:64] +// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, RefCount=1, Name=C[0:64] // INFO: Libomptarget device 0 info: Copying data from host to device, HstPtr={{.*}}, TgtPtr={{.*}}, Size=256, Name=C[0:64] // INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:{{[0-9]+}}:{{[0-9]+}}: // INFO: Libomptarget device 0 info: Host Ptr Target Ptr Size (B) RefCount Declaration @@ -45,6 +49,9 @@ // INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=C[0:64] // INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=B[0:64] // INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=A[0:64] +// INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:[[#%u,]]:[[#%u,]]: +// INFO: Libomptarget device 0 info: Host Ptr Target Ptr Size (B) RefCount Declaration +// INFO: Libomptarget device 0 info: [[#%#x,]] [[#%#x,]] 4 INF unknown at unknown:0:0 #pragma omp target data map(alloc:A[0:N]) map(tofrom:B[0:N]) map(to:C[0:N]) #pragma omp target firstprivate(val) { val = 1; }