This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Add environment variables to change stack / heap size in the CUDA plugin
ClosedPublic

Authored by jhuber6 on Jul 22 2021, 6:20 PM.

Details

Summary

This patch adds support for two environment variables to configure the device.
`LIBOMPTARGET_STACK_SIZE` sets the amount of memory in bytes that each thread
has for its stack. `LIBOMPTARGET_HEAP_SIZE` sets the amount of heap memory
that can be allocated using malloc / free on the device.
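
As a rough illustration of the behaviour described above, the following is a minimal sketch and not the code from this patch. The helper `readByteCountEnv`, the `DeviceLimit` enum, and `applyRequestedLimits` are hypothetical names introduced here. In the real plugin the values are passed to `cuCtxSetLimit` with `CU_LIMIT_STACK_SIZE` and `CU_LIMIT_MALLOC_HEAP_SIZE`; the sketch takes a callback instead so that it compiles without the CUDA SDK. It also assumes the variables hold plain decimal byte counts.

```cpp
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper, not taken from the patch: parse a non-negative decimal
// byte count from the environment variable `Name` into `Out`.
// Returns false when the variable is unset, empty, or malformed.
bool readByteCountEnv(const char *Name, std::size_t &Out) {
  const char *Value = std::getenv(Name);
  // Require a leading digit so that signs and leading whitespace are rejected.
  if (!Value || !std::isdigit(static_cast<unsigned char>(Value[0])))
    return false;

  errno = 0;
  char *End = nullptr;
  unsigned long long Parsed = std::strtoull(Value, &End, 10);
  // Reject trailing characters (e.g. "64k") and values that overflow.
  if (errno == ERANGE || *End != '\0')
    return false;

  Out = static_cast<std::size_t>(Parsed);
  return true;
}

// Stand-in for the two CUDA limits named in the review (assumption: the real
// code uses CU_LIMIT_STACK_SIZE and CU_LIMIT_MALLOC_HEAP_SIZE directly).
enum class DeviceLimit { StackSize, MallocHeapSize };

// Apply whichever limits the user requested through the environment.
// `SetLimit` is any callable taking (DeviceLimit, std::size_t).
// Returns the number of limits that were applied.
template <typename SetLimitFn> int applyRequestedLimits(SetLimitFn SetLimit) {
  int Applied = 0;
  std::size_t Bytes = 0;
  if (readByteCountEnv("LIBOMPTARGET_STACK_SIZE", Bytes)) {
    SetLimit(DeviceLimit::StackSize, Bytes);
    ++Applied;
  }
  if (readByteCountEnv("LIBOMPTARGET_HEAP_SIZE", Bytes)) {
    SetLimit(DeviceLimit::MallocHeapSize, Bytes);
    ++Applied;
  }
  return Applied;
}
```

Under these assumptions, launching an offloading program with `LIBOMPTARGET_STACK_SIZE=4096` set in the environment would invoke the callback once, for the stack limit, and leave the heap limit at the driver default.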

Event Timeline

jhuber6 created this revision. Jul 22 2021, 6:20 PM
jhuber6 requested review of this revision. Jul 22 2021, 6:20 PM
This revision is now accepted and ready to land. Jul 22 2021, 6:24 PM
This revision was landed with ongoing or failed builds. Jul 22 2021, 6:40 PM
This revision was automatically updated to reflect the committed changes.
abhinavgaba added inline comments.
openmp/libomptarget/plugins/cuda/src/rtl.cpp
652

These enums don't seem to be defined in cuda.h, or somewhere else. Can you please take a look?

jdoerfert added inline comments. Jul 22 2021, 9:22 PM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
652

My cuda.h defines them.

/opt/cuda/targets/x86_64-linux/include/cuda.h
1130:    CU_LIMIT_STACK_SIZE                       = 0x00, /**< GPU thread stack size */

https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html lists them as well.
Are you sure they are not there?

abhinavgaba added inline comments. Jul 22 2021, 9:32 PM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
652

I meant in the compiler sources. The other enums used in libomptarget seem to be defined in openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h. On a machine that doesn't have the CUDA drivers installed, the absence of these definitions causes a build failure for me with these errors:

llvm/openmp/libomptarget/plugins/cuda/src/rtl.cpp: In member function ‘int {anonymous}::DeviceRTLTy::initDevice(int)’:
llvm/openmp/libomptarget/plugins/cuda/src/rtl.cpp:649:25: error: ‘CU_LIMIT_STACK_SIZE’ was not declared in this scope
if (cuCtxSetLimit(CU_LIMIT_STACK_SIZE, StackLimit) != CUDA_SUCCESS)
                  ^~~~~~~~~~~~~~~~~~~

I looked at the buildbots, but they skip building the CUDA plugin altogether, so they don't report any failures. For instance, https://lab.llvm.org/buildbot/#/builders/84/builds/12107/steps/4/logs/stdio has this in the log:

-- Could NOT find LIBOMPTARGET_DEP_CUDA_DRIVER (missing: LIBOMPTARGET_DEP_CUDA_DRIVER_LIBRARIES) 
-- Could NOT find LIBOMPTARGET_DEP_VEO (missing: LIBOMPTARGET_DEP_VEO_LIBRARIES LIBOMPTARGET_DEP_VEOSINFO_LIBRARIES LIBOMPTARGET_DEP_VEO_INCLUDE_DIRS) 
-- LIBOMPTARGET: Building offloading runtime library libomptarget.
...
-- LIBOMPTARGET: Not building CUDA offloading plugin: libelf dependency not found.
...
-- check-libomptarget does nothing.
abhinavgaba added inline comments. Jul 22 2021, 9:57 PM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
652

This patch makes the build pass for me, but I have no way to verify it. @jhuber6 can you please take a look?

diff --git a/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp b/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp
index c84b3814065e..235efd2728de 100644
--- a/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp
+++ b/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp
@@ -61,6 +61,9 @@ DLWRAP(cuDeviceCanAccessPeer, 3);
 DLWRAP(cuCtxEnablePeerAccess, 2);
 DLWRAP(cuMemcpyPeerAsync, 6);

+DLWRAP(cuCtxGetLimit, 2);
+DLWRAP(cuCtxSetLimit, 2);
+
 DLWRAP_FINALIZE();

 #ifndef DYNAMIC_CUDA_PATH
diff --git a/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h b/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
index 045c39cacc97..17aa2a12ef6c 100644
--- a/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
+++ b/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
@@ -34,6 +34,17 @@ typedef enum CUstream_flags_enum {
   CU_STREAM_NON_BLOCKING = 0x1,
 } CUstream_flags;

+typedef enum CUlimit_enum {
+  CU_LIMIT_STACK_SIZE = 0x0,
+  CU_LIMIT_PRINTF_FIFO_SIZE = 0x1,
+  CU_LIMIT_MALLOC_HEAP_SIZE = 0x2,
+  CU_LIMIT_DEV_RUNTIME_SYNC_DEPTH = 0x3,
+  CU_LIMIT_DEV_RUNTIME_PENDING_LAUNCH_COUNT = 0x4,
+  CU_LIMIT_MAX_L2_FETCH_GRANULARITY = 0x5,
+  CU_LIMIT_PERSISTING_L2_CACHE_SIZE = 0x6,
+  CU_LIMIT_MAX
+} CUlimit;
+
 typedef enum CUdevice_attribute_enum {
   CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X = 2,
   CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X = 5,
@@ -100,4 +111,7 @@ CUresult cuCtxEnablePeerAccess(CUcontext, unsigned);
 CUresult cuMemcpyPeerAsync(CUdeviceptr, CUcontext, CUdeviceptr, CUcontext,
                            size_t, CUstream);

+CUresult cuCtxGetLimit(size_t *, CUlimit);
+CUresult cuCtxSetLimit(CUlimit, size_t);
+
jdoerfert added inline comments. Jul 22 2021, 10:01 PM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
652

Right, that makes sense.

The above looks good to me. Could you commit it?

Johannes,

I do not have the access rights needed for git push. Could you please help commit the fix?

Thanks,
Abhinav

@pengfei kindly agreed to help commit it. Thanks, Pengfei.

There are three failures in my local check-all run. Is this expected?

Failed Tests (3):
  libomptarget :: x86_64-pc-linux-gnu :: offloading/memory_manager.cpp
  libomptarget :: x86_64-pc-linux-gnu :: offloading/parallel_offloading_map.cpp
  libomptarget :: x86_64-pc-linux-gnu :: offloading/taskloop_offload_nowait.cpp

Pengfei, these failures come from a different part of the code that should not be affected by this change.

I see these failures in my environment (RHEL 8.2, gcc 8.3.1) for commit 4a76bd0e (the previous buildable commit) as well.

pengfei added a comment. Edited Jul 23 2021, 1:52 AM

I see. Then I committed it as rGf7c92995c0e1f95.

Thanks, Pengfei.

Thanks for fixing it, I forgot about building without the SDK.