This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP] New api ompx_get_device_num_units(int devid)
Needs ReviewPublic

Authored by gregrodgers on Oct 4 2022, 8:28 AM.

Details

Reviewers
jdoerfert
Summary

This returns the number of physical processors that can run a team on the specified device. For AMD, this is the number of CUs; for Nvidia, it is the number of SMs. For CPUs, this COULD be the number of sockets if multiple teams are supported. This API is needed for optimizing cross-team reductions, where we want to minimize the number of intermediate per-team reduction values.

If the device id is the initial device, 1 is returned.

Diff Detail

Event Timeline

gregrodgers created this revision.Oct 4 2022, 8:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2022, 8:28 AM
gregrodgers requested review of this revision.Oct 4 2022, 8:28 AM
Herald added a project: Restricted Project. · View Herald Transcript
gregrodgers retitled this revision from [OPENMP] New api ompx_get_team_procs(devid) returns the number of physical processors that can run a team. For AMD, this is the number of CUs. for Nvidia, this is number of SMs. For CPUs, this COULD be number of sockets if multiple teams are... to [OPENMP] New api ompx_get_team_procs(devid) .Oct 4 2022, 8:35 AM
gregrodgers edited the summary of this revision. (Show Details)

The new interface to query the number of physical processors (or whatever it should be called) is fine, but the name is a little bit questionable. It implies how an OpenMP team is mapped to the low level model. I think it is possible that the execution model mapping is per-kernel. It doesn't sound great to have that implication in a general interface name.

"The new interface to query the number of physical processors (or whatever it should be called) is fine, but the name is a little bit questionable. It implies how an OpenMP team is mapped to the low level model. I think it is possible that the execution model mapping is per-kernel. It doesn't sound great to have that implication in a general interface name."

Thank you for the comment. It is the detail of the execution model as implemented by the plugin that we need to expose, namely the team. I cannot think of a model or implementation that does not map one or more teams to something physical (hence the word "proc"). This commit updates all the plugins that have devices, including generic-elf-64bit. I can imagine an implementation that maps a team to a host socket in a multi-socket system. This is why I did not put the word "target" in the name: the devid could be the host or initial device, which is why ompx_get_team_procs(omp_get_initial_device()) returns 1.

To minimize cross-team communication, one needs to know the minimum number of teams that activates all the hardware. See https://reviews.llvm.org/D135299 for how ompx_get_team_procs(devid) will be used to allocate storage for cross-team (xteam) communication.

There is an existing API, omp_get_num_procs(), which currently maps to the number of hardware threads in a block. It is callable in a target region, and I believe it typically returns the number of threads. But the number of threads is quite different from the number of physical locations where a team can run.

Currently ompx_get_team_procs(devid) is only callable from host.

The change to openmp/libomptarget/plugins/exports needs to be deleted. Upstream changed this file to include all functions beginning with __tgt_rtl.

gregrodgers retitled this revision from [OPENMP] New api ompx_get_team_procs(devid) to [OPENMP] New api ompx_get_team_procs(devid).
gregrodgers edited the summary of this revision. (Show Details)
  • Remove change to openmp/libomptarget/plugins/exports so this revision will merge with current trunk

As @tianshilei1992 noted, the naming is not great. Team can technically map to lots of things and soon they will. What about "thread groups"? Or even "sockets"?
Also, there is some duplication I pointed out below.

openmp/libomptarget/include/device.h
325

It's unclear why we need to store this in two places, the plugins and here. Other device data only lives in the plugins, this should too.

openmp/libomptarget/plugins/cuda/src/rtl.cpp
357

This should go into DeviceData, and in the new plugin interface it's different again.

  • Merge branch 'main' into arcpatch-D135162
  • move NumberOfTeamProcs into DeviceData per comment from jdoerfert

"Team can technically map to lots of things and soon they will. What about "thread groups"? Or even "sockets"?
Also, there is some duplication I pointed out below."

But teams are exactly what we are trying to address with ompx_get_team_procs(). How many physical things can a team "map to"/"run on"? This is needed because a user (or runtime) may want to limit the number of teams created to utilize all (or some subset) of the physical team processors. At one point I was thinking of utilizing the places API in OpenMP 5.2 spec chapter 18.3. If you imagine a place is like a device and a place partition is like a CU (or Nvidia SM), then omp_get_partition_num_places() would be close to what is needed. One of several problems with using the places API is that it is internal; I need an external API to help in setting the number of teams on the specified device id.

The other big problem with the places API is that it is written for thread management, to work with thread affinity. I think overloading this with a team (a group of threads) would get us in trouble fast.

I realize that the number of teams is typically application specific (or should be), and having this API may be a gun aimed at one's foot. But when cross-team coordination is necessary, it can be beneficial to limit the number of teams so as to utilize all the hardware while minimizing the coordination among teams.

I addressed both of your inline comments, one with a fix and the other with an explanation.

openmp/libomptarget/include/device.h
325

This is the value on the host DeviceTy that the getter and setter access. The getter backs the new external API ompx_get_team_procs(devid). The setter is called when the device is initialized and gets the value from the plugin, which now stores it in the DeviceData.

openmp/libomptarget/plugins/cuda/src/rtl.cpp
357

Done.

"Team can technically map to lots of things and soon they will. What about "thread groups"? Or even "sockets"?
Also, there is some duplication I pointed out below."

But teams are exactly what we are trying to address with ompx_get_team_procs(). How many physical things can a team "map to"/"run on"? This is needed because a user (or runtime) may want to limit the number of teams created to utilize all (or some subset) of the number physical team processors.

I understand all that, but as I said, CUs/SMs are not the only thing we can map teams onto. I could map two teams on each of them, or one team on two of them (potentially with some compiler fixup logic), etc.

Your use case is: get me the number of processor groups so I can determine the number of teams I want. The hardware doesn't provide "team processors" but something else, and with a certain mapping in mind it makes sense to query the number of these things. However, conflating this with "teams" at the user level is problematic, as "teams" can map in various ways, as mentioned.

At one point I was thinking of utilizing the places API in OpenMP 5.2 spec chapter 18.3. If you imagine a place is like a device and a place partition is like a CU (or Nvidia SM), then omp_get_partition_num_places() would be close. One of several problems with using the places API is that it is internal; I need an external API to help in setting the number of teams on the specified device id.

I can see external APIs being useful for queries, but setting the number should not be a thing. Getters reach the plugin managing the device and return the number of "hardware thing-ys".

The other big problem with places api is that it is written for thread management to work with thread affinity. I think overloading this with a team (group of threads) would get us in trouble fast.

Fair, I don't think this extension needs to unify such things.

openmp/libomptarget/include/device.h
325

But why do we need to store it in the plugin and here? No other information, e.g., the maximum number of threads, is stored twice. This just copies the value from the plugin once, which seems to provide little benefit.

Is this an LLVM-only API, or will other vendors support it as well? Is there an ompx_llvm_ namespace?

"Is this an LLVM-only API, or will other vendors support it as well? Is there an ompx_llvm_ namespace?"

It will be LLVM-only at first. Unsure if the others pick it up; there is a chance. llvm_ instead of ompx_ works for me; doing both is a mouthful, but sure.

I think I get your rationale now. A thread executes on a place's proc; in fact, many threads can execute on (map to) a place's proc. We need the same sort of "mapping" for a team or thread group. We did not call a place a thread_proc.
As the name is now, a team_proc is a place where a team can run. That can be a CU, an SM, a VE, a host socket, a host NUMA domain, a node, or something else. The code in this review covers all existing (in trunk) offload plugins that can execute a team.

So I am still searching for the right term. The term would fill in the blank in these two sentences: a place is to a thread as a _____ is to a team. One or more threads execute on a place's proc, whereas one or more teams execute on a _______ proc.
The current thread API is omp_get_place_num_procs(int place_num). The new ompx API for this review should be something like omp_get_NEWTERM_num_procs(int device).

I am open to suggestions.

See answer to inline comment.

openmp/libomptarget/include/device.h
325

"No other information is stored twice"

Not exactly. I tried to follow the mechanics of the omp_get_num_devices API as closely as possible. The plugin API __tgt_rtl_number_of_devices returns DeviceRTL.getNumbofDevices and initializes the PM->Devices vector in the plugin manager (outside the plugin) during initialization. Similarly, the new plugin API __tgt_rtl_number_of_team_procs returns the value stored in the plugin at initialization, which is also stored in the PM. I don't understand why there is a lock in omp_get_num_devices around access to PM, but I copied the lock in ompx_get_team_procs. I was just trying to be consistent with an existing API.

gregrodgers added a comment.EditedOct 17 2022, 9:51 PM

Is the name ompx_get_device_num_units(devid) acceptable?

I agree that ompx_llvm_ is a mouthful. But the OpenMP standard reserved the prefix ompx_ for implementation-defined functions.

"I agree that ompx_llvm_ is a mouthful. But the OpenMP standard reserved the prefix ompx_ for implementation-defined functions."

We are an implementation, hence we can use ompx_, that's the point.

  • change name from ompx_get_team_procs(devid) to ompx_get_device_num_units(devid). Internal names have also been changed.
gregrodgers retitled this revision from [OPENMP] New api ompx_get_team_procs(devid) to [OPENMP] New api ompx_get_device_num_units(int devid).Oct 19 2022, 1:17 PM

I use ompx_ in research on top of the LLVM OpenMP runtime. I have no plans to build an OpenMP runtime myself.