This is an archive of the discontinued LLVM Phabricator instance.

Differential D106751

[OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget`
ClosedPublic

Authored by josemonsalve2 on Jul 24 2021, 10:58 AM.

Details

Reviewers

jdoerfert
JonChesterfield
ggeorgakoudis
jhuber6
baziotis
sstefan1
uenoku
tianshilei1992

Commits

rGd2f85d0910ce: [OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget`

Summary

This patch introduces a function in the device's plugin to print the
device information. It relates to another patch that introduces
a CLI tool to obtain the device information from the OpenMP library directly.
It is inspired by PGI's pgaccelinfo.

The modifications are as follows:

- Introduce the optional `void __tgt_rtl_print_device_info(RTLDevID)` function into the RTL.
- Introduce the `bool __tgt_print_device_info(devID)` function into the `omptarget` interface. It returns false if the RTL does not implement the function.
- Add `bool printDeviceInfo(RTLDevID)` to `DeviceTy`.
- Implement `__tgt_rtl_print_device_info` for CUDA, adding the CUDA driver API calls it needs (`cuDeviceGetName`, `cuDeviceTotalMem`, `cuDriverGetVersion`).
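The split described above (an optional plugin entry point plus a device-level method that reports whether the plugin provides it) can be modeled with a nullable function pointer. This is a minimal sketch under that assumption: `PluginTy`, `DeviceSketch` and their fields are hypothetical simplifications rather than libomptarget's real types, and the symbol lookup itself (e.g. via `dlsym`) is assumed to have already happened.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-plugin function table filled in when an
// RTL is loaded; a null pointer means the plugin does not export the symbol.
struct PluginTy {
  void (*print_device_info)(int32_t) = nullptr;
};

// Hypothetical, simplified analogue of a device-level printDeviceInfo().
struct DeviceSketch {
  PluginTy *RTL;
  int32_t RTLDeviceID;

  // Returns false when the optional entry point is missing, true otherwise.
  bool printDeviceInfo() const {
    if (!RTL || !RTL->print_device_info)
      return false;
    RTL->print_device_info(RTLDeviceID);
    return true;
  }
};
```

Keeping the null check in the device layer is what lets plugins that predate the new entry point keep working unchanged.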

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

josemonsalve2 created this revision. Jul 24 2021, 10:58 AM

Herald added subscribers: guansong, yaxunl. Jul 24 2021, 10:58 AM

josemonsalve2 requested review of this revision. Jul 24 2021, 10:58 AM

Herald added a project: Restricted Project. Jul 24 2021, 10:58 AM

Herald added a subscriber: openmp-commits.

josemonsalve2 added a child revision: D106752: [OpenMP][Tool] Introducing the `llvm-omp-device-info` tool. Jul 24 2021, 11:01 AM

Harbormaster completed remote builds in B116041: Diff 361470. Jul 24 2021, 11:27 AM

Not obvious to me that the functionality has much to do with the plugin. Could do a standalone tool instead?

I think there's a tool called nvidia-smi that does something similar. There's definitely one called rocminfo that does. The latter prints 'human readable' output, which gets in the way of scripting with it.

openmp/libomptarget/plugins/cuda/src/rtl.cpp
1536	Verbose. This could probably be formatted as a table containing the CUDA API call, the text to print, and possibly the corresponding HSA API call, then iterated over to print or build JSON, etc.
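The table-driven layout suggested above could look roughly like this. It is only a sketch: the attribute IDs, labels and the `QueryFn` callback are hypothetical placeholders standing in for `CUdevice_attribute` values and `cuDeviceGetAttribute`, so the example builds without the CUDA SDK.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// One row per queried property: an opaque attribute ID plus a label.
struct InfoRow {
  int AttrID;        // placeholder for a CUdevice_attribute value
  const char *Label; // human-readable description
};

// Callback standing in for the vendor query; returns false on error.
using QueryFn = std::function<bool(int AttrID, int &Value)>;

// Render the table as "Label: value" lines for humans.
std::string renderText(const std::vector<InfoRow> &Rows, const QueryFn &Query) {
  std::ostringstream OS;
  for (const InfoRow &R : Rows) {
    int V = 0;
    OS << "  " << R.Label << ": ";
    if (Query(R.AttrID, V))
      OS << V;
    else
      OS << "<error>";
    OS << '\n';
  }
  return OS.str();
}

// Render the same table as a flat JSON object (labels as keys, no escaping).
std::string renderJSON(const std::vector<InfoRow> &Rows, const QueryFn &Query) {
  std::ostringstream OS;
  OS << '{';
  bool First = true;
  for (const InfoRow &R : Rows) {
    int V = 0;
    if (!Query(R.AttrID, V))
      continue; // omit failed queries from machine-readable output
    OS << (First ? "" : ",") << '"' << R.Label << "\":" << V;
    First = false;
  }
  OS << '}';
  return OS.str();
}
```

With this shape, adding a property (or an HSA column) means adding one row instead of another hand-written `printf` block, and scripting can consume the JSON form.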

tianshilei1992 added inline comments. Jul 24 2021, 11:34 AM

openmp/libomptarget/plugins/cuda/src/rtl.cpp
1539	Suggest putting all implementation details into the class above.

In D106751#2902593, @JonChesterfield wrote:

Not obvious to me that the functionality has much to do with the plugin. Could do a standalone tool instead?

I think there's a tool called nvidia-smi that does something similar. There's definitely one called rocminfo that does. The latter prints 'human readable' output, which gets in the way of scripting with it.

Jon, the idea is to have this information as seen from libomptarget and the runtime itself. It isn't intended to replace each vendor's standalone tool. Developers using OpenMP could isolate the runtime and its view of the system without having to write and compile code. I'm thinking of adding the value that goes in the -fopenmp-targets flag, so users know which devices are supported, how to compile for them, and what their characteristics are.

Think of remote offloading, virtual GPUs and other targets that may not have access to an nvidia-smi-like tool. I believe this is also why PGI provides one.

josemonsalve2 added inline comments. Jul 24 2021, 3:22 PM

openmp/libomptarget/plugins/cuda/src/rtl.cpp
1536	I believe this is a good idea. Let me think about how to adopt it.
1539	Also a good idea.

saiislam added a subscriber: saiislam. Jul 25 2021, 5:20 AM

I'm fine with getting this in first and cleaning it up more in-tree. Makes a nice addition to 13, useful for people. @jhuber6 should probably be exposed via a _INFO flag too.

@tianshilei1992 @JonChesterfield any objections?

openmp/libomptarget/plugins/cuda/src/rtl.cpp
1536	I agree but I think this can be done later, together with some other improvements, e.g., what output stream to use.
openmp/libomptarget/src/device.h
278	brief doxygen comment please

In D106751#2903124, @jdoerfert wrote:

I'm fine with getting this in first and cleaning it up more in-tree. Makes a nice addition to 13, useful for people. @jhuber6 should probably be exposed via a _INFO flag too.

@tianshilei1992 @JonChesterfield any objections?

Adding it as some information you can query could be useful. I'd call this method when we initialize the plugin if 0x40 in the bitfield is set.
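A sketch of the gating jhuber6 describes, assuming the information flags arrive as an integer bitfield in the `LIBOMPTARGET_INFO` environment variable and that `0x40` is the bit chosen for this dump. The helper names are hypothetical, and parsing with base 0 (so users may write decimal or hexadecimal) is a choice of this sketch, not a statement about how libomptarget parses the variable.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical flag value taken from the review comment.
constexpr uint32_t InfoPrintDeviceInfo = 0x40;

// Parse an info bitfield such as "64" or "0x40"; unset or invalid text -> 0.
uint32_t parseInfoFlags(const char *Env) {
  if (!Env || !*Env)
    return 0;
  char *End = nullptr;
  unsigned long V = std::strtoul(Env, &End, /*base=*/0);
  return (End && *End == '\0') ? static_cast<uint32_t>(V) : 0;
}

// Decide at plugin initialization whether to dump device information.
bool shouldPrintDeviceInfo(uint32_t Flags) {
  return (Flags & InfoPrintDeviceInfo) != 0;
}

// Possible call site during initialization (sketch, hypothetical names):
//   if (shouldPrintDeviceInfo(parseInfoFlags(std::getenv("LIBOMPTARGET_INFO"))))
//     for (int32_t ID = 0; ID < NumDevices; ++ID) printDeviceInfo(ID);
```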

In D106751#2903128, @jhuber6 wrote:

In D106751#2903124, @jdoerfert wrote:

I'm fine with getting this in first and cleaning it up more in-tree. Makes a nice addition to 13, useful for people. @jhuber6 should probably be exposed via a _INFO flag too.

@tianshilei1992 @JonChesterfield any objections?

Adding it as some information you can query could be useful. I'd call this method when we initialize the plugin if 0x40 in the bitfield is set.

yep.

In D106751#2903124, @jdoerfert wrote:

I'm fine with getting this in first and cleaning it up more in-tree. Makes a nice addition to 13, useful for people. @jhuber6 should probably be exposed via a _INFO flag too.

@tianshilei1992 @JonChesterfield any objections?

Sounds good. Improvements can be done later, but the code structure should be settled now.

openmp/libomptarget/include/omptarget.h
336	Better to use `int` as return type as it is a C function.
openmp/libomptarget/plugins/cuda/src/rtl.cpp
19	Why do we need the C header?

jdoerfert added inline comments. Jul 25 2021, 10:05 AM

openmp/libomptarget/plugins/cuda/src/rtl.cpp
19	`<string>` below will include this anyway.
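To illustrate the `int` return type suggested for `__tgt_print_device_info`: C code that includes the header cannot use `bool` without `<stdbool.h>` (before C23), whereas `int` works in every C dialect. A sketch with hypothetical names (`printDeviceInfoImpl`, `print_device_info_c`) that keeps `bool` internally and exposes `int` through `extern "C"`:

```cpp
#include <cstdint>

// Internal C++ helper; bool is natural here (hypothetical stub).
static bool printDeviceInfoImpl(int64_t DeviceId) {
  return DeviceId >= 0; // pretend only non-negative IDs are valid
}

// Header-style declaration usable from both C and C++.
#ifdef __cplusplus
extern "C" {
#endif
// Returns nonzero on success, 0 otherwise.
int print_device_info_c(int64_t DeviceId);
#ifdef __cplusplus
}
#endif

// The definition inherits C linkage from the declaration above.
int print_device_info_c(int64_t DeviceId) {
  return printDeviceInfoImpl(DeviceId) ? 1 : 0;
}
```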

If it's for debugging openmp, we should do the cuda queries once and store their result, then use that result in printing and elsewhere. Otherwise there's a risk that the value printed is different to the one used, or that an error will prevent the other queries working in the print when they're done later.

I'd be inclined to do the above and the code cleanup before landing but don't strongly object to refactoring in tree.
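JonChesterfield's query-once approach could be sketched as below. It is only an illustration under stated assumptions: `DeviceSnapshot`, its two fields and the `SnapshotQueryFn` callback are hypothetical stand-ins for the CUDA attribute queries, so the code builds without the SDK. jdoerfert's reply argues that such caching adds complexity for little gain, and the landed code queries CUDA directly, so this shows the option that was discussed rather than the committed design.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical set of properties captured once, e.g. at plugin initialization.
struct DeviceSnapshot {
  int WarpSize = 0;
  int MaxThreadsPerBlock = 0;
  std::vector<std::string> Errors; // names of queries that failed at capture
};

// Hypothetical stand-in for cuDeviceGetAttribute: returns false on failure.
using SnapshotQueryFn = std::function<bool(const char *Name, int &Value)>;

// Run every query up front, before a later failure can make the driver
// return errors for all subsequent calls (the "sticky error" concern).
DeviceSnapshot captureSnapshot(const SnapshotQueryFn &Query) {
  DeviceSnapshot S;
  auto Get = [&](const char *Name, int &Field) {
    if (!Query(Name, Field))
      S.Errors.push_back(Name);
  };
  Get("warp_size", S.WarpSize);
  Get("max_threads_per_block", S.MaxThreadsPerBlock);
  return S;
}

// Printing reads only the stored values, so it shows what the runtime saw.
std::string describe(const DeviceSnapshot &S) {
  return "warp size " + std::to_string(S.WarpSize) + ", max threads/block " +
         std::to_string(S.MaxThreadsPerBlock);
}
```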

In D106751#2903161, @JonChesterfield wrote:

If it's for debugging openmp, we should do the cuda queries once and store their result, then use that result in printing and elsewhere. Otherwise there's a risk that the value printed is different to the one used, or that an error will prevent the other queries working in the print when they're done later.

That doesn't make as much sense as you think. Most of the values are not actually "used" anywhere. Some that are might be overwritten per target region now or in the future. All in all, there is little reason to cache stuff, if you want to know cuda values, ask cuda, (or HSA, ...). If we cache stuff there is more risk and complexity for no gain.

Printing values that openmp doesn't use seems misleading for debugging openmp. Cuda has a sticky error model where once something goes wrong, all/some calls into cuda fail afterwards. That makes querying information after failure less likely to work than querying the information before failure.

Despite those concerns, and the one about signal to noise ratio in the diff, if this is useful for cuda/openmp dev in practice then go for it.

Updating minor comments. A major redesign with a less verbose solution will be added later.

Harbormaster completed remote builds in B116324: Diff 361874. Jul 26 2021, 7:42 PM

This revision is now accepted and ready to land. Jul 26 2021, 8:56 PM

Rebase to main

Harbormaster completed remote builds in B116576: Diff 362227. Jul 27 2021, 6:31 PM

This revision was landed with ongoing or failed builds. Jul 27 2021, 6:48 PM

Closed by commit rGd2f85d0910ce: [OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget` (authored by Jose M Monsalve Diaz <jmonsalvediaz@anl.gov>, committed by tianshilei1992).

This revision was automatically updated to reflect the committed changes.

tianshilei1992 added a commit: rGd2f85d0910ce: [OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget`.

zsrkmyn added a subscriber: zsrkmyn. Jul 27 2021, 8:19 PM

zsrkmyn added inline comments.

openmp/libomptarget/plugins/cuda/src/rtl.cpp
1187	Hi, I just found the macro CU_DEVICE_ATTRIBUTE_GPU_OVERLAP is defined nowhere inside the llvm source tree, leading to compilation failure on machines w/o cuda SDK. Could you help take a look?

jhuber6 added inline comments. Jul 27 2021, 8:21 PM

openmp/libomptarget/plugins/cuda/src/rtl.cpp
1187	Someone needs to add it to `/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` I'm assuming.

jdoerfert added inline comments. Jul 27 2021, 8:21 PM

openmp/libomptarget/plugins/cuda/src/rtl.cpp
1187	Only this one?

abhinavgaba added a subscriber: abhinavgaba. Jul 27 2021, 8:30 PM

zsrkmyn added inline comments. Jul 27 2021, 8:35 PM

openmp/libomptarget/plugins/cuda/src/rtl.cpp

1187

From my build log, there's no matching function for call to the following 2 functions,

cuDeviceGetName
cuDeviceTotalMem

And the following macros are not defined

CU_DEVICE_ATTRIBUTE_GPU_OVERLAP
CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y
CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z
CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y
CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z
CU_DEVICE_ATTRIBUTE_MAX_PITCH
CU_DEVICE_ATTRIBUTE_CLOCK_RATE
CU_DEVICE_ATTRIBUTE_INTEGRATED
CU_DEVICE_ATTRIBUTE_COMPUTE_MODE

Not sure if there are more errors.

zsrkmyn added inline comments. Jul 27 2021, 9:05 PM

openmp/libomptarget/plugins/cuda/src/rtl.cpp

1187

My build passed with the following patch.

Quite a lot of macros are missing and I'm too lazy to check them one by one, so I just added all of them from [1]. I would really appreciate it if someone could help land it.

[1] https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html

diff --git a/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h b/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
index 0814db7e9d26..14049e1f7559 100644
--- a/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
+++ b/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
@@ -46,9 +46,132 @@ typedef enum CUlimit_enum {
 } CUlimit;
 
 typedef enum CUdevice_attribute_enum {
+  CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK = 1,
   CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X = 2,
+  CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y = 3,
+  CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z = 4,
   CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X = 5,
+  CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y = 6,
+  CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z = 7,
+  CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK = 8,
+  CU_DEVICE_ATTRIBUTE_SHARED_MEMORY_PER_BLOCK = 8,
+  CU_DEVICE_ATTRIBUTE_TOTAL_CONSTANT_MEMORY = 9,
   CU_DEVICE_ATTRIBUTE_WARP_SIZE = 10,
+  CU_DEVICE_ATTRIBUTE_MAX_PITCH = 11,
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK = 12,
+  CU_DEVICE_ATTRIBUTE_REGISTERS_PER_BLOCK = 12,
+  CU_DEVICE_ATTRIBUTE_CLOCK_RATE = 13,
+  CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT = 14,
+  CU_DEVICE_ATTRIBUTE_GPU_OVERLAP = 15,
+  CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT = 16,
+  CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT = 17,
+  CU_DEVICE_ATTRIBUTE_INTEGRATED = 18,
+  CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY = 19,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_MODE = 20,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_WIDTH = 21,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_WIDTH = 22,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_HEIGHT = 23,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH = 24,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT = 25,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH = 26,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_WIDTH = 27,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_HEIGHT = 28,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_LAYERS = 29,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_WIDTH = 27,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_HEIGHT = 28,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES = 29,
+  CU_DEVICE_ATTRIBUTE_SURFACE_ALIGNMENT = 30,
+  CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31,
+  CU_DEVICE_ATTRIBUTE_ECC_ENABLED = 32,
+  CU_DEVICE_ATTRIBUTE_PCI_BUS_ID = 33,
+  CU_DEVICE_ATTRIBUTE_PCI_DEVICE_ID = 34,
+  CU_DEVICE_ATTRIBUTE_TCC_DRIVER = 35,
+  CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE = 36,
+  CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH = 37,
+  CU_DEVICE_ATTRIBUTE_L2_CACHE_SIZE = 38,
+  CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
+  CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
+  CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LAYERED_WIDTH = 42,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LAYERED_LAYERS = 43,
+  CU_DEVICE_ATTRIBUTE_CAN_TEX2D_GATHER = 44,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_GATHER_WIDTH = 45,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_GATHER_HEIGHT = 46,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH_ALTERNATE = 47,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT_ALTERNATE = 48,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH_ALTERNATE = 49,
+  CU_DEVICE_ATTRIBUTE_PCI_DOMAIN_ID = 50,
+  CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT = 51,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURECUBEMAP_WIDTH = 52,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURECUBEMAP_LAYERED_WIDTH = 53,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURECUBEMAP_LAYERED_LAYERS = 54,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE1D_WIDTH = 55,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_WIDTH = 56,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_HEIGHT = 57,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE3D_WIDTH = 58,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE3D_HEIGHT = 59,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE3D_DEPTH = 60,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE1D_LAYERED_WIDTH = 61,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE1D_LAYERED_LAYERS = 62,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_LAYERED_WIDTH = 63,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_LAYERED_HEIGHT = 64,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_LAYERED_LAYERS = 65,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACECUBEMAP_WIDTH = 66,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACECUBEMAP_LAYERED_WIDTH = 67,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACECUBEMAP_LAYERED_LAYERS = 68,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LINEAR_WIDTH = 69,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_WIDTH = 70,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_HEIGHT = 71,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_PITCH = 72,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_MIPMAPPED_WIDTH = 73,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_MIPMAPPED_HEIGHT = 74,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76,
+  CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_MIPMAPPED_WIDTH = 77,
+  CU_DEVICE_ATTRIBUTE_STREAM_PRIORITIES_SUPPORTED = 78,
+  CU_DEVICE_ATTRIBUTE_GLOBAL_L1_CACHE_SUPPORTED = 79,
+  CU_DEVICE_ATTRIBUTE_LOCAL_L1_CACHE_SUPPORTED = 80,
+  CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR = 81,
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82,
+  CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY = 83,
+  CU_DEVICE_ATTRIBUTE_MULTI_GPU_BOARD = 84,
+  CU_DEVICE_ATTRIBUTE_MULTI_GPU_BOARD_GROUP_ID = 85,
+  CU_DEVICE_ATTRIBUTE_HOST_NATIVE_ATOMIC_SUPPORTED = 86,
+  CU_DEVICE_ATTRIBUTE_SINGLE_TO_DOUBLE_PRECISION_PERF_RATIO = 87,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS = 88,
+  CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS = 89,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_PREEMPTION_SUPPORTED = 90,
+  CU_DEVICE_ATTRIBUTE_CAN_USE_HOST_POINTER_FOR_REGISTERED_MEM = 91,
+  CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS = 92,
+  CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS = 93,
+  CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_WAIT_VALUE_NOR = 94,
+  CU_DEVICE_ATTRIBUTE_COOPERATIVE_LAUNCH = 95,
+  CU_DEVICE_ATTRIBUTE_COOPERATIVE_MULTI_DEVICE_LAUNCH = 96,
+  CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN = 97,
+  CU_DEVICE_ATTRIBUTE_CAN_FLUSH_REMOTE_WRITES = 98,
+  CU_DEVICE_ATTRIBUTE_HOST_REGISTER_SUPPORTED = 99,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES = 100,
+  CU_DEVICE_ATTRIBUTE_DIRECT_MANAGED_MEM_ACCESS_FROM_HOST = 101,
+  CU_DEVICE_ATTRIBUTE_VIRTUAL_ADDRESS_MANAGEMENT_SUPPORTED = 102,
+  CU_DEVICE_ATTRIBUTE_VIRTUAL_MEMORY_MANAGEMENT_SUPPORTED = 102,
+  CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR_SUPPORTED = 103,
+  CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_WIN32_HANDLE_SUPPORTED = 104,
+  CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_WIN32_KMT_HANDLE_SUPPORTED = 105,
+  CU_DEVICE_ATTRIBUTE_MAX_BLOCKS_PER_MULTIPROCESSOR = 106,
+  CU_DEVICE_ATTRIBUTE_GENERIC_COMPRESSION_SUPPORTED = 107,
+  CU_DEVICE_ATTRIBUTE_MAX_PERSISTING_L2_CACHE_SIZE = 108,
+  CU_DEVICE_ATTRIBUTE_MAX_ACCESS_POLICY_WINDOW_SIZE = 109,
+  CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_WITH_CUDA_VMM_SUPPORTED = 110,
+  CU_DEVICE_ATTRIBUTE_RESERVED_SHARED_MEMORY_PER_BLOCK = 111,
+  CU_DEVICE_ATTRIBUTE_SPARSE_CUDA_ARRAY_SUPPORTED = 112,
+  CU_DEVICE_ATTRIBUTE_READ_ONLY_HOST_REGISTER_SUPPORTED = 113,
+  CU_DEVICE_ATTRIBUTE_TIMELINE_SEMAPHORE_INTEROP_SUPPORTED = 114,
+  CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED = 115,
+  CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED = 116,
+  CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_FLUSH_WRITES_OPTIONS = 117,
+  CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_WRITES_ORDERING = 118,
+  CU_DEVICE_ATTRIBUTE_MEMPOOL_SUPPORTED_HANDLE_TYPES = 119,
+  CU_DEVICE_ATTRIBUTE_MAX,
 } CUdevice_attribute;
 
 typedef enum CUfunction_attribute_enum {
@@ -66,6 +189,12 @@ typedef enum CUmemAttach_flags_enum {
   CU_MEM_ATTACH_SINGLE = 0x4,
 } CUmemAttach_flags;
 
+typedef enum CUcomputeMode_enum {
+  CU_COMPUTEMODE_DEFAULT = 0,
+  CU_COMPUTEMODE_PROHIBITED = 2,
+  CU_COMPUTEMODE_EXCLUSIVE_PROCESS = 3,
+} CUcompute_mode;
+
 CUresult cuCtxGetDevice(CUdevice *);
 CUresult cuDeviceGet(CUdevice *, int);
 CUresult cuDeviceGetAttribute(int *, CUdevice_attribute, CUdevice);
@@ -73,8 +202,8 @@ CUresult cuDeviceGetCount(int *);
 CUresult cuFuncGetAttribute(int *, CUfunction_attribute, CUfunction);
 
 // Device info
-CUresult cuDeviceGetName(char *, int, CUdevice *);
-CUresult cuDeviceTotalMem(size_t *, CUdevice *);
+CUresult cuDeviceGetName(char *, int, CUdevice);
+CUresult cuDeviceTotalMem(size_t *, CUdevice);
 CUresult cuDriverGetVersion(int *);
 
 CUresult cuGetErrorString(CUresult, const char **);
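The patch above also adds a `CUcompute_mode` enum, presumably so that the integer returned for `CU_DEVICE_ATTRIBUTE_COMPUTE_MODE` can be printed by name. A hypothetical helper for that (`computeModeToText` is not part of the patch) could reuse only the enumerator values from the diff, redeclared locally here so the sketch builds without the CUDA SDK:

```cpp
// Local copy of the enumerators added to dynamic_cuda/cuda.h in the patch.
typedef enum CUcomputeMode_enum {
  CU_COMPUTEMODE_DEFAULT = 0,
  CU_COMPUTEMODE_PROHIBITED = 2,
  CU_COMPUTEMODE_EXCLUSIVE_PROCESS = 3,
} CUcompute_mode;

// Map the attribute value to text; values not listed in the patch
// fall through to "UNKNOWN".
const char *computeModeToText(int Mode) {
  switch (Mode) {
  case CU_COMPUTEMODE_DEFAULT:
    return "DEFAULT";
  case CU_COMPUTEMODE_PROHIBITED:
    return "PROHIBITED";
  case CU_COMPUTEMODE_EXCLUSIVE_PROCESS:
    return "EXCLUSIVE PROCESS";
  default:
    return "UNKNOWN";
  }
}
```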

Sorry for the delay. Working on this.

josemonsalve2 mentioned this in D106933: [OpenMP] Fixing missing variables when CUDA SDK not in system. Jul 27 2021, 9:39 PM

jdoerfert mentioned this in rG88e66fa60ae5: [OpenMP] Fixing missing variables when CUDA SDK not in system. Jul 27 2021, 9:46 PM

@zsrkmyn Thanks for the report and patch https://reviews.llvm.org/rG88e66fa60ae5bad764455b5a0337aa47233f657d

Thanks for quickly fixing it :-)

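Revision Contents

- openmp/libomptarget/include/omptarget.h: 1 line
- openmp/libomptarget/include/omptargetplugin.h: 3 lines
- openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h: 5 lines
- openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp: 5 lines
- openmp/libomptarget/plugins/cuda/src/rtl.cpp: 179 lines
- openmp/libomptarget/plugins/exports: 1 line
- openmp/libomptarget/src/device.h: 4 lines
- openmp/libomptarget/src/device.cpp: 8 lines
- openmp/libomptarget/src/exports: 1 line
- openmp/libomptarget/src/interface.cpp: 5 lines
- openmp/libomptarget/src/rtl.h: 2 lines
- openmp/libomptarget/src/rtl.cpp: 2 lines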

Diff 362259

openmp/libomptarget/include/omptarget.h

[...]
 void __kmpc_push_target_tripcount(int64_t device_id, uint64_t loop_tripcount);

 void __kmpc_push_target_tripcount_mapper(ident_t *loc, int64_t device_id,
                                          uint64_t loop_tripcount);

 void __tgt_set_info_flag(uint32_t);

+int __tgt_print_device_info(int64_t device_id);
 #ifdef __cplusplus
 }
 #endif

 #ifdef __cplusplus
 #define EXTERN extern "C"
 #else
 #define EXTERN extern
 #endif

 #endif // _OMPTARGET_H_

openmp/libomptarget/include/omptargetplugin.h

[...]
 // Device synchronization. In case of success, return zero. Otherwise, return an
 // error code.
 int32_t __tgt_rtl_synchronize(int32_t ID, __tgt_async_info *AsyncInfo);

 // Set plugin's internal information flag externally.
 void __tgt_rtl_set_info_flag(uint32_t);

+// Print the device information
+void __tgt_rtl_print_device_info(int32_t ID);
+
 #ifdef __cplusplus
 }
 #endif

 #endif // _OMPTARGETPLUGIN_H_

openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h

[...]
 } CUmemAttach_flags;

 CUresult cuCtxGetDevice(CUdevice *);
 CUresult cuDeviceGet(CUdevice *, int);
 CUresult cuDeviceGetAttribute(int *, CUdevice_attribute, CUdevice);
 CUresult cuDeviceGetCount(int *);
 CUresult cuFuncGetAttribute(int *, CUfunction_attribute, CUfunction);

+// Device info
+CUresult cuDeviceGetName(char *, int, CUdevice *);
+CUresult cuDeviceTotalMem(size_t *, CUdevice *);
+CUresult cuDriverGetVersion(int *);
+
 CUresult cuGetErrorString(CUresult, const char **);
 CUresult cuInit(unsigned);
 CUresult cuLaunchKernel(CUfunction, unsigned, unsigned, unsigned, unsigned,
                         unsigned, unsigned, unsigned, CUstream, void **,
                         void **);

 CUresult cuMemAlloc(CUdeviceptr *, size_t);
 CUresult cuMemAllocHost(void **, size_t);
[...]

openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp

[...]
 DLWRAP_INTERNAL(cuInit, 1);

 DLWRAP(cuCtxGetDevice, 1);
 DLWRAP(cuDeviceGet, 2);
 DLWRAP(cuDeviceGetAttribute, 3);
 DLWRAP(cuDeviceGetCount, 1);
 DLWRAP(cuFuncGetAttribute, 3);

+// Device info
+DLWRAP(cuDeviceGetName, 3);
+DLWRAP(cuDeviceTotalMem, 2);
+DLWRAP(cuDriverGetVersion, 1);
+
 DLWRAP(cuGetErrorString, 2);
 DLWRAP(cuLaunchKernel, 11);

 DLWRAP(cuMemAlloc, 2);
 DLWRAP(cuMemAllocHost, 2);
 DLWRAP(cuMemAllocManaged, 3);

 DLWRAP(cuMemcpyDtoDAsync, 4);
[...]

openmp/libomptarget/plugins/cuda/src/rtl.cpp

[...]
 //===----------------------------------------------------------------------===//

 #include <cassert>
 #include <cstddef>
 #include <cuda.h>
 #include <list>
 #include <memory>
 #include <mutex>
 #include <string>
 #include <unordered_map>
 #include <vector>

 #include "Debug.h"
 #include "omptargetplugin.h"

 #define TARGET_NAME CUDA
 #define DEBUG_PREFIX "Target " GETNAME(TARGET_NAME) " RTL"
[...]
 #define CUDA_ERR_STRING(err) \
   do { \
     const char *errStr = nullptr; \
     CUresult errStr_status = cuGetErrorString(err, &errStr); \
     if (errStr_status == CUDA_SUCCESS) \
       REPORT("%s \n", errStr); \
   } while (false)
 #endif // OMPTARGET_DEBUG

+#define BOOL2TEXT(b) ((b) ? "Yes" : "No")
+
 #include "elf_common.h"

 /// Keep entries table per device.
 struct FuncOrGblEntryTy {
   __tgt_target_table Table;
   std::vector<__tgt_offload_entry> Entries;
 };

[...]
   int synchronize(const int DeviceId, __tgt_async_info *AsyncInfo) const {
[...]
     if (Err != CUDA_SUCCESS) {
       DP("Error when synchronizing stream. stream = " DPxMOD
          ", async info ptr = " DPxMOD "\n",
          DPxPTR(Stream), DPxPTR(AsyncInfo));
       CUDA_ERR_STRING(Err);
     }
     return (Err == CUDA_SUCCESS) ? OFFLOAD_SUCCESS : OFFLOAD_FAIL;
   }

+  void printDeviceInfo(int32_t device_id) {
+    char TmpChar[1000];
+    std::string TmpStr;
+    size_t TmpSt;
+    int TmpInt, TmpInt2, TmpInt3;
+
+    CUdevice Device;
+    checkResult(cuDeviceGet(&Device, device_id),
+                "Error returned from cuCtxGetDevice\n");
+
+    cuDriverGetVersion(&TmpInt);
+    printf(" CUDA Driver Version: \t\t%d \n", TmpInt);
+    printf(" CUDA Device Number: \t\t%d \n", device_id);
+    checkResult(cuDeviceGetName(TmpChar, 1000, Device),
+                "Error returned from cuDeviceGetName\n");
+    printf(" Device Name: \t\t\t%s \n", TmpChar);
+    checkResult(cuDeviceTotalMem(&TmpSt, Device),
+                "Error returned from cuDeviceTotalMem\n");
+    printf(" Global Memory Size: \t\t%zu bytes \n", TmpSt);
+    checkResult(cuDeviceGetAttribute(
+                    &TmpInt, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT, Device),
+                "Error returned from cuDeviceGetAttribute\n");
+    printf(" Number of Multiprocessors: \t\t%d \n", TmpInt);
+    checkResult(
+        cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_GPU_OVERLAP, Device),
cuDeviceGet(CUdevice , int); CUresult cuDeviceGetAttribute(int , CUdevice_attribute, CUdevice); @@ -73,8 +202,8 @@ CUresult cuDeviceGetCount(int ); CUresult cuFuncGetAttribute(int , CUfunction_attribute, CUfunction); // Device info -CUresult cuDeviceGetName(char , int, CUdevice ); -CUresult cuDeviceTotalMem(size_t , CUdevice ); +CUresult cuDeviceGetName(char , int, CUdevice); +CUresult cuDeviceTotalMem(size_t , CUdevice); CUresult cuDriverGetVersion(int ); CUresult cuGetErrorString(CUresult, const char ); zsrkmyn:** My build passed w/ the following patch. There are quite a lot macros are missing and I'm too…
		"Error returned from cuDeviceGetAttribute\n");
    printf(" Concurrent Copy and Execution: \t%s \n", BOOL2TEXT(TmpInt));
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_TOTAL_CONSTANT_MEMORY, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Total Constant Memory: \t\t%d bytes\n", TmpInt);
    checkResult(
        cuDeviceGetAttribute(
            &TmpInt, CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" Max Shared Memory per Block: \t%d bytes \n", TmpInt);
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Registers per Block: \t\t%d \n", TmpInt);
    checkResult(
        cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_WARP_SIZE, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" Warp Size: \t\t\t\t%d Threads \n", TmpInt);
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Maximum Threads per Block: \t\t%d \n", TmpInt);
    checkResult(cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X,
                                     Device),
                "Error returned from cuDeviceGetAttribute\n");
    checkResult(cuDeviceGetAttribute(&TmpInt2,
                                     CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y, Device),
                "Error returned from cuDeviceGetAttribute\n");
    checkResult(cuDeviceGetAttribute(&TmpInt3,
                                     CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Maximum Block Dimensions: \t\t%d, %d, %d \n", TmpInt, TmpInt2,
           TmpInt3);
    checkResult(
        cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X, Device),
        "Error returned from cuDeviceGetAttribute\n");
    checkResult(cuDeviceGetAttribute(&TmpInt2, CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y,
                                     Device),
                "Error returned from cuDeviceGetAttribute\n");
    checkResult(cuDeviceGetAttribute(&TmpInt3, CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z,
                                     Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Maximum Grid Dimensions: \t\t%d x %d x %d \n", TmpInt, TmpInt2,
           TmpInt3);
    checkResult(
        cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_MAX_PITCH, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" Maximum Memory Pitch: \t\t%d bytes \n", TmpInt);
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Texture Alignment: \t\t\t%d bytes \n", TmpInt);
    checkResult(
        cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" Clock Rate: \t\t\t%d kHz\n", TmpInt);
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Execution Timeout: \t\t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(
        cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_INTEGRATED, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" Integrated Device: \t\t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Can Map Host Memory: \t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(
        cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_COMPUTE_MODE, Device),
        "Error returned from cuDeviceGetAttribute\n");
    if (TmpInt == CU_COMPUTEMODE_DEFAULT)
      TmpStr = "DEFAULT";
    else if (TmpInt == CU_COMPUTEMODE_PROHIBITED)
      TmpStr = "PROHIBITED";
    else if (TmpInt == CU_COMPUTEMODE_EXCLUSIVE_PROCESS)
      TmpStr = "EXCLUSIVE PROCESS";
    else
      TmpStr = "unknown";
    printf(" Compute Mode: \t\t\t%s \n", TmpStr.c_str());
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Concurrent Kernels: \t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(
        cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_ECC_ENABLED, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" ECC Enabled: \t\t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Memory Clock Rate: \t\t\t%d kHz\n", TmpInt);
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Memory Bus Width: \t\t\t%d bits\n", TmpInt);
    checkResult(
        cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_L2_CACHE_SIZE, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" L2 Cache Size: \t\t\t%d bytes \n", TmpInt);
    checkResult(
        cuDeviceGetAttribute(
            &TmpInt, CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" Max Threads Per SMP: \t\t%d \n", TmpInt);
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Async Engines: \t\t\t%s (%d) \n", BOOL2TEXT(TmpInt), TmpInt);
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Unified Addressing: \t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(
        cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" Managed Memory: \t\t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(
        cuDeviceGetAttribute(
            &TmpInt, CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" Concurrent Managed Memory: \t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(
        cuDeviceGetAttribute(
            &TmpInt, CU_DEVICE_ATTRIBUTE_COMPUTE_PREEMPTION_SUPPORTED, Device),
        "Error returned from cuDeviceGetAttribute\n");
    printf(" Preemption Supported: \t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(cuDeviceGetAttribute(
                    &TmpInt, CU_DEVICE_ATTRIBUTE_COOPERATIVE_LAUNCH, Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Cooperative Launch: \t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(cuDeviceGetAttribute(&TmpInt, CU_DEVICE_ATTRIBUTE_MULTI_GPU_BOARD,
                                     Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Multi-Device Boards: \t\t%s \n", BOOL2TEXT(TmpInt));
    checkResult(cuDeviceGetAttribute(&TmpInt,
                                     CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR,
                                     Device),
                "Error returned from cuDeviceGetAttribute\n");
    checkResult(cuDeviceGetAttribute(&TmpInt2,
                                     CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR,
                                     Device),
                "Error returned from cuDeviceGetAttribute\n");
    printf(" Compute Capabilities: \t\t%d%d \n", TmpInt, TmpInt2);
  }
};
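The printDeviceInfo body above maps CU_DEVICE_ATTRIBUTE_COMPUTE_MODE to text with an if/else chain that fills a std::string. A minimal alternative, sketched under the assumption that only the three modes declared in the dynamic_cuda/cuda.h patch need names, is a helper that returns a string literal; computeModeToString is a hypothetical name, not part of the patch. It switches on the raw int rather than casting it to the enum first, so any unexpected value reported by the driver is handled well-defined as "unknown".

```cpp
#include <cassert>
#include <cstring>

// Copied from the dynamic_cuda/cuda.h change in this review; only these
// three compute modes are declared there.
typedef enum CUcomputeMode_enum {
  CU_COMPUTEMODE_DEFAULT = 0,
  CU_COMPUTEMODE_PROHIBITED = 2,
  CU_COMPUTEMODE_EXCLUSIVE_PROCESS = 3,
} CUcompute_mode;

// Hypothetical helper: turn the int written by cuDeviceGetAttribute for
// CU_DEVICE_ATTRIBUTE_COMPUTE_MODE into the label the plugin prints.
// Switching on the int (not on a cast enum) keeps unknown values well defined.
const char *computeModeToString(int Mode) {
  switch (Mode) {
  case CU_COMPUTEMODE_DEFAULT:
    return "DEFAULT";
  case CU_COMPUTEMODE_PROHIBITED:
    return "PROHIBITED";
  case CU_COMPUTEMODE_EXCLUSIVE_PROCESS:
    return "EXCLUSIVE PROCESS";
  default:
    return "unknown";
  }
}
```

With such a helper the print site could become printf(" Compute Mode: \t\t\t%s \n", computeModeToString(TmpInt)); and TmpStr would no longer be needed for this line.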

DeviceRTLTy DeviceRTL;
} // namespace

// Exposed library API function
#ifdef __cplusplus
extern "C" {
[... 184 unchanged lines not shown; context: int32_t __tgt_rtl_synchronize(int32_t device_id, ...]
  return DeviceRTL.synchronize(device_id, async_info_ptr);
}

void __tgt_rtl_set_info_flag(uint32_t NewInfoLevel) {
  std::atomic<uint32_t> &InfoLevel = getInfoLevelInternal();
  InfoLevel.store(NewInfoLevel);
}

void __tgt_rtl_print_device_info(int32_t device_id) {
  assert(DeviceRTL.isValidDeviceId(device_id) && "device_id is invalid");
  DeviceRTL.printDeviceInfo(device_id);
JonChesterfield (Not Done): Verbose. This could probably be formatted as a table containing the CUDA API call, the text to print, and possibly the corresponding HSA API call, then iterate over that table, printing or building JSON, etc.
josemonsalve2 (Author, Done): I believe this is a good idea. Let me think about how to adopt it.
jdoerfert (Not Done): I agree, but I think this can be done later, together with some other improvements, e.g., what output stream to use.
}
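The table-driven layout JonChesterfield suggests can be sketched as one array of (attribute, label, format) rows plus a single loop. Everything below is hypothetical: AttrId stands in for the CU_DEVICE_ATTRIBUTE_* values, the QueryFn callback stands in for cuDeviceGetAttribute combined with checkResult, and the "Yes"/"No" rendering is an assumption, not necessarily what BOOL2TEXT produces. Rendering is kept separate from printing, so the same table could feed a JSON emitter or a different output stream, which is the follow-up jdoerfert mentions.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Hypothetical attribute IDs; a real table would hold CU_DEVICE_ATTRIBUTE_*
// values (and, per the review, possibly the matching HSA query).
enum class AttrId { WarpSize, MaxThreadsPerBlock, TotalConstantMemory, EccEnabled };

// How the queried integer is rendered.
enum class Kind { Count, Bytes, Bool };

struct AttrRow {
  AttrId Id;
  const char *Label;
  Kind Format;
};

// One row per printed line: adding an attribute is a one-line change.
static const AttrRow AttrTable[] = {
    {AttrId::WarpSize, "Warp Size", Kind::Count},
    {AttrId::MaxThreadsPerBlock, "Maximum Threads per Block", Kind::Count},
    {AttrId::TotalConstantMemory, "Total Constant Memory", Kind::Bytes},
    {AttrId::EccEnabled, "ECC Enabled", Kind::Bool},
};

// Stand-in for cuDeviceGetAttribute + checkResult: false means the query failed.
using QueryFn = std::function<bool(AttrId, int &)>;

// Build one "Label: value" line per table row.
std::vector<std::string> renderDeviceInfo(const QueryFn &Query) {
  std::vector<std::string> Lines;
  for (const AttrRow &Row : AttrTable) {
    int Value = 0;
    std::string Text;
    if (!Query(Row.Id, Value)) {
      Text = "<error>";
    } else {
      switch (Row.Format) {
      case Kind::Count:
        Text = std::to_string(Value);
        break;
      case Kind::Bytes:
        Text = std::to_string(Value) + " bytes";
        break;
      case Kind::Bool:
        Text = Value ? "Yes" : "No";
        break;
      }
    }
    Lines.push_back(std::string(Row.Label) + ": " + Text);
  }
  return Lines;
}

// Printing is a separate step, so the output stream is the caller's choice.
void printDeviceInfo(const QueryFn &Query, std::FILE *Out = stdout) {
  for (const std::string &Line : renderDeviceInfo(Query))
    std::fprintf(Out, "    %s\n", Line.c_str());
}
```

A failed query produces an "<error>" line instead of aborting the whole listing; whether that or the current checkResult behaviour is preferable is a design choice the sketch does not settle.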

#ifdef __cplusplus
tianshilei1992 (Not Done): I suggest putting all the implementation details into the class above.
josemonsalve2 (Author, Done): Also a good idea.
}
#endif

openmp/libomptarget/plugins/exports

[... 17 unchanged lines not shown; context: global: ...]
    __tgt_rtl_run_target_team_region_async;
    __tgt_rtl_run_target_region;
    __tgt_rtl_run_target_region_async;
    __tgt_rtl_synchronize;
    __tgt_rtl_register_lib;
    __tgt_rtl_unregister_lib;
    __tgt_rtl_supports_empty_images;
    __tgt_rtl_set_info_flag;
    __tgt_rtl_print_device_info;
  local:
    *;
};

openmp/libomptarget/src/device.h

[... 269 unchanged lines not shown; context: int32_t runTeamRegion(void *TgtEntryPtr, void **TgtVarsPtr, ...]
                        ptrdiff_t *TgtOffsets, int32_t TgtVarsSize,
                        int32_t NumTeams, int32_t ThreadLimit,
                        uint64_t LoopTripCount, AsyncInfoTy &AsyncInfo);

  /// Synchronize device/queue/event based on \p AsyncInfo and return
  /// OFFLOAD_SUCCESS/OFFLOAD_FAIL when succeeds/fails.
  int32_t synchronize(AsyncInfoTy &AsyncInfo);

  /// Calls the corresponding print in the \p RTLDevID
jdoerfert (Not Done): Brief doxygen comment, please.
  /// device RTL to obtain the information of the specific device.
  bool printDeviceInfo(int32_t RTLDevID);

private:
  // Call to RTL
  void init(); // To be called only via DeviceTy::initOnce()
};

/// Map between Device ID (i.e. openmp device id) and its DeviceTy.
typedef std::vector<DeviceTy> DevicesTy;

[... 29 unchanged lines not shown ...]

openmp/libomptarget/src/device.cpp

[... 505 unchanged lines not shown; context: int32_t DeviceTy::runRegion(void *TgtEntryPtr, void **TgtVarsPtr, ...]
  if (!RTL->run_region || !RTL->synchronize)
    return RTL->run_region(RTLDeviceID, TgtEntryPtr, TgtVarsPtr, TgtOffsets,
                           TgtVarsSize);
  else
    return RTL->run_region_async(RTLDeviceID, TgtEntryPtr, TgtVarsPtr,
                                 TgtOffsets, TgtVarsSize, AsyncInfo);
}

// Print device information.
bool DeviceTy::printDeviceInfo(int32_t RTLDevID) {
  if (!RTL->print_device_info)
    return false;
  RTL->print_device_info(RTLDevID);
  return true;
}
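DeviceTy::printDeviceInfo treats the plugin entry point as optional: a null function pointer means the plugin does not export __tgt_rtl_print_device_info, and the caller gets false instead of a crash. The sketch below isolates that dispatch pattern with hypothetical, trimmed-down RTLInfoSketch and DeviceSketch types standing in for RTLInfoTy and DeviceTy. As a design alternative (not what the patch does), it reads the device's own RTLDeviceID member instead of taking it as a parameter, since the caller shown in interface.cpp passes that same member back in.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as `typedef void(print_device_info_ty)(int32_t);` in rtl.h.
typedef void(print_device_info_ty)(int32_t);

// Hypothetical, trimmed-down stand-in for RTLInfoTy.
struct RTLInfoSketch {
  // Stays null when the plugin does not export __tgt_rtl_print_device_info.
  print_device_info_ty *print_device_info = nullptr;
};

// Hypothetical, trimmed-down stand-in for DeviceTy.
struct DeviceSketch {
  RTLInfoSketch *RTL;
  int32_t RTLDeviceID;

  // Returns false if the plugin lacks the entry point, true after calling it.
  bool printDeviceInfo() const {
    assert(RTL && "device is not attached to a plugin");
    if (!RTL->print_device_info)
      return false;
    RTL->print_device_info(RTLDeviceID);
    return true;
  }
};
```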

// Run team region on device.
int32_t DeviceTy::runTeamRegion(void *TgtEntryPtr, void **TgtVarsPtr,
                                ptrdiff_t *TgtOffsets, int32_t TgtVarsSize,
                                int32_t NumTeams, int32_t ThreadLimit,
                                uint64_t LoopTripCount,
                                AsyncInfoTy &AsyncInfo) {
  if (!RTL->run_team_region_async || !RTL->synchronize)
    return RTL->run_team_region(RTLDeviceID, TgtEntryPtr, TgtVarsPtr,
[... 56 unchanged lines not shown ...]

openmp/libomptarget/src/exports

[... 34 unchanged lines not shown; context: global: ...]
    omp_target_memcpy;
    omp_target_memcpy_rect;
    omp_target_associate_ptr;
    omp_target_disassociate_ptr;
    llvm_omp_target_alloc_host;
    llvm_omp_target_alloc_shared;
    llvm_omp_target_alloc_device;
    __tgt_set_info_flag;
    __tgt_print_device_info;
  local:
    *;
};

openmp/libomptarget/src/interface.cpp

[... 460 unchanged lines not shown ...]
EXTERN void __tgt_set_info_flag(uint32_t NewInfoLevel) {
  std::atomic<uint32_t> &InfoLevel = getInfoLevelInternal();
  InfoLevel.store(NewInfoLevel);
  for (auto &R : PM->RTLs.AllRTLs) {
    if (R.set_info_flag)
      R.set_info_flag(NewInfoLevel);
  }
}

EXTERN int __tgt_print_device_info(int64_t device_id) {
  return PM->Devices[device_id].printDeviceInfo(
      PM->Devices[device_id].RTLDeviceID);
}
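__tgt_print_device_info indexes PM->Devices with the caller's device_id without validating it, so an out-of-range or negative ID is undefined behaviour. A hedged sketch of a guarded variant follows. DeviceStub and printDeviceInfoChecked are hypothetical names, std::vector stands in for PM->Devices (device.h declares DevicesTy as std::vector<DeviceTy>), and any further checks the real runtime may need, such as whether the device has been initialized, are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for DeviceTy with just what the check needs.
struct DeviceStub {
  int32_t RTLDeviceID;
  bool HasPrinter; // models whether the plugin exports the entry point
  bool printDeviceInfo(int32_t /*RTLDevID*/) const { return HasPrinter; }
};

// Guarded variant of __tgt_print_device_info (hypothetical name): reject
// device numbers outside the table before indexing; report failure as 0.
int printDeviceInfoChecked(const std::vector<DeviceStub> &Devices,
                           int64_t DeviceId) {
  if (DeviceId < 0 || static_cast<std::size_t>(DeviceId) >= Devices.size())
    return 0;
  const DeviceStub &Dev = Devices[static_cast<std::size_t>(DeviceId)];
  return Dev.printDeviceInfo(Dev.RTLDeviceID);
}
```

Returning 0 for an invalid ID keeps the int-as-bool convention of the existing entry point, so a tool such as llvm-omp-device-info could treat "no such device" and "plugin has no printer" the same way.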

openmp/libomptarget/src/rtl.h

[... 49 unchanged lines not shown; context: struct RTLInfoTy { ...]
  typedef int32_t(run_team_region_async_ty)(int32_t, void *, void **,
                                            ptrdiff_t *, int32_t, int32_t,
                                            int32_t, uint64_t,
                                            __tgt_async_info *);
  typedef int64_t(init_requires_ty)(int64_t);
  typedef int64_t(synchronize_ty)(int32_t, __tgt_async_info *);
  typedef int32_t (*register_lib_ty)(__tgt_bin_desc *);
  typedef int32_t(supports_empty_images_ty)();
  typedef void(print_device_info_ty)(int32_t);
  typedef void(set_info_flag_ty)(uint32_t);

  int32_t Idx = -1;             // RTL index, index is the number of devices
                                // of other RTLs that were registered before,
                                // i.e. the OpenMP index of the first device
                                // to be registered with this RTL.
  int32_t NumberOfDevices = -1; // Number of devices this RTL deals with.

[... 22 unchanged lines not shown; context: #endif ...]
  run_team_region_ty *run_team_region = nullptr;
  run_team_region_async_ty *run_team_region_async = nullptr;
  init_requires_ty *init_requires = nullptr;
  synchronize_ty *synchronize = nullptr;
  register_lib_ty register_lib = nullptr;
  register_lib_ty unregister_lib = nullptr;
  supports_empty_images_ty *supports_empty_images = nullptr;
  set_info_flag_ty *set_info_flag = nullptr;
  print_device_info_ty *print_device_info = nullptr;

  // Are there images associated with this RTL.
  bool isUsed = false;

  // Mutex for thread-safety when calling RTL interface functions.
  // It is easier to enforce thread-safety at the libomptarget level,
  // so that developers of new RTLs do not have to worry about it.
  std::mutex Mtx;
[... 57 unchanged lines not shown ...]

openmp/libomptarget/src/rtl.cpp

[... 171 unchanged lines not shown; context: #endif ...]
    *((void **)&R.register_lib) =
        dlsym(dynlib_handle, "__tgt_rtl_register_lib");
    *((void **)&R.unregister_lib) =
        dlsym(dynlib_handle, "__tgt_rtl_unregister_lib");
    *((void **)&R.supports_empty_images) =
        dlsym(dynlib_handle, "__tgt_rtl_supports_empty_images");
    *((void **)&R.set_info_flag) =
        dlsym(dynlib_handle, "__tgt_rtl_set_info_flag");
    *((void **)&R.print_device_info) =
        dlsym(dynlib_handle, "__tgt_rtl_print_device_info");
  }

  DP("RTLs loaded!\n");

  return;
}
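The loader resolves each plugin entry point by name with dlsym and stores the address through a void ** cast; symbols a plugin does not export simply stay null, which is what makes __tgt_rtl_print_device_info optional for non-CUDA plugins. The sketch below models that without a real shared library: lookupSymbol over a std::map is a hypothetical stand-in for dlsym, and resolveOptional (also hypothetical) replaces the *((void **)&...) idiom with a typed template. Converting between void * and a function pointer is only conditionally supported in ISO C++, but POSIX dlsym depends on it and GCC and Clang accept it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Entry-point types mirroring rtl.h.
typedef void(print_device_info_ty)(int32_t);
typedef void(set_info_flag_ty)(uint32_t);

// Hypothetical, trimmed-down RTLInfoTy: every optional entry point starts null.
struct RTLInfoSketch {
  set_info_flag_ty *set_info_flag = nullptr;
  print_device_info_ty *print_device_info = nullptr;
};

// Hypothetical stand-in for dlsym(): returns nullptr for names not exported.
using ExportTable = std::map<std::string, void *>;
void *lookupSymbol(const ExportTable &Exports, const char *Name) {
  auto It = Exports.find(Name);
  return It == Exports.end() ? nullptr : It->second;
}

// Store the looked-up address in a typed function-pointer slot; a missing
// symbol leaves the slot null, so callers must check before calling.
template <typename FnTy>
void resolveOptional(const ExportTable &Exports, const char *Name,
                     FnTy *&Slot) {
  Slot = reinterpret_cast<FnTy *>(lookupSymbol(Exports, Name));
}
```

Callers then test the slot for null before calling it, as DeviceTy::printDeviceInfo does with RTL->print_device_info.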

////////////////////////////////////////////////////////////////////////////////
[... 280 unchanged lines not shown ...]


Diff 362259
