This is an archive of the discontinued LLVM Phabricator instance.

Differential D50522

[OpenMP][libomptarget] Bringing up to spec with respect to OMP_TARGET_OFFLOAD env var
ClosedPublic

Authored by AlexEichenberger on Aug 9 2018, 10:55 AM.


Details

Reviewers

gtbercea
ABataev
grokos
caomhin
Hahnfeld

Commits

rG1b4a666ba584: [OpenMP][libomptarget] Bringing up to spec with respect to OMP_TARGET_OFFLOAD…
rL340542: [OpenMP][libomptarget] Bringing up to spec with respect to OMP_TARGET_OFFLOAD…
rOMP340542: [OpenMP][libomptarget] Bringing up to spec with respect to OMP_TARGET_OFFLOAD…

Summary

Until now, only OMP_TARGET_OFFLOAD=DISABLED was implemented. This patch adds support for the remaining values, MANDATORY and DEFAULT.

Diff Detail

Repository: rOMP OpenMP

Event Timeline

AlexEichenberger created this revision. Aug 9 2018, 10:55 AM

Herald added subscribers: openmp-commits, guansong. Aug 9 2018, 10:55 AM

Does this patch supersede D44522?

As discussed in there I don't see that DEFAULT means MANDATORY only iff there was a successful offload. From TR7, page 610, lines 17/18:

The DEFAULT value specifies that when one or more target devices are available, the runtime
behaves as if this environment variable is set to MANDATORY [...]

Another remark from D44522: I think we need to handle API methods as well.

libomptarget/src/rtl.cpp
49–64	There was an agreement with Intel to have a query function `__kmpc_get_target_offload`. Please use that one.

This revision now requires changes to proceed. Aug 9 2018, 11:17 AM

In D50522#1194100, @Hahnfeld wrote:

Does this patch supersede D44522?

yes

As discussed in there I don't see that DEFAULT means MANDATORY only iff there was a successful offload. From TR7, page 610, lines 17/18:

The DEFAULT value specifies that when one or more target devices are available, the runtime
behaves as if this environment variable is set to MANDATORY [...]

The question is how we define "available." IMO, it means success of the first command. Happy to reconsider, but I think most of our users are fine with success of the first command.

Another remark from D44522: I think we need to handle API methods as well.

I am all for using the suggested __kmpc_get_target_offload function, which is implemented. If we do so, we will be depending on a definition in kmp.h. Do we want a redundant definition of the following enum?

enum kmp_target_offload_kind {
  tgt_disabled = 0,
  tgt_default = 1,
  tgt_mandatory = 2
};

I assume we want to do that, as we don't want to include kmp.h in libomptarget.
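As a rough illustration of the redundant definition discussed above, the sketch below repeats the enum locally and maps the text of OMP_TARGET_OFFLOAD to it. `ParseTargetOffload` and `EqualsIgnoreCase` are hypothetical helpers, not part of libomptarget or kmp.h; in the patch itself the value is meant to come from `__kmpc_get_target_offload`. Matching case-insensitively and falling back to DEFAULT for unset or unrecognized values are assumptions of this sketch.

```cpp
#include <cctype>

// Local copy of the enum quoted above, so that kmp.h need not be included.
enum kmp_target_offload_kind {
  tgt_disabled = 0,
  tgt_default = 1,
  tgt_mandatory = 2
};

// Hypothetical helper: case-insensitive comparison of two C strings.
static bool EqualsIgnoreCase(const char *a, const char *b) {
  for (; *a && *b; ++a, ++b) {
    if (std::tolower(static_cast<unsigned char>(*a)) !=
        std::tolower(static_cast<unsigned char>(*b)))
      return false;
  }
  return *a == *b; // equal only if both strings end at the same position
}

// Hypothetical helper: map the text of OMP_TARGET_OFFLOAD to the enum.
// A null pointer (variable unset) or an unknown value yields tgt_default.
kmp_target_offload_kind ParseTargetOffload(const char *value) {
  if (value == nullptr)
    return tgt_default;
  if (EqualsIgnoreCase(value, "DISABLED"))
    return tgt_disabled;
  if (EqualsIgnoreCase(value, "MANDATORY"))
    return tgt_mandatory;
  return tgt_default;
}
```

A caller could pass the result of `std::getenv("OMP_TARGET_OFFLOAD")` (from `<cstdlib>`) directly, since the helper accepts a null pointer.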

In D50522#1194113, @AlexEichenberger wrote:

In D50522#1194100, @Hahnfeld wrote:

As discussed in there I don't see that DEFAULT means MANDATORY only iff there was a successful offload. From TR7, page 610, lines 17/18:

The DEFAULT value specifies that when one or more target devices are available, the runtime
behaves as if this environment variable is set to MANDATORY [...]

The question is how we define "available." IMO, it means success of the first command. Happy to reconsider, but I think most of our users are fine with success of the first command.

You are right, "available" is not defined in the standard. I've always thought of it as "plugged into the system", i.e. all devices that are visible to the CUDA runtime. That would match the current implementation of omp_get_num_devices, which is defined to return "the number of available target devices".
Actually this behaviour would be important for us as we have our GPUs configured exclusively. So when there is already a process running, all other users get a runtime error. In that case it would be very helpful to have libomptarget abort the program.

Another remark from D44522: I think we need to handle API methods as well.

I am all for using the suggested __kmpc_get_target_offload function, which is implemented. If we do so, we will be depending on a definition in kmp.h. Do we want a redundant definition of the
enum kmp_target_offload_kind {
  tgt_disabled = 0,
  tgt_default = 1,
  tgt_mandatory = 2
};
I assume we want to do that, as we don't want to include kmp.h in libomptarget.

Yes, D44522 also had that.

RaviNarayanaswamy added a subscriber: RaviNarayanaswamy. Aug 9 2018, 1:23 PM

RaviNarayanaswamy added inline comments.

libomptarget/src/interface.cpp
52	Why not move the check of handle_target_outcome into CheckDeviceAndCtors so you don't have to do it every time you invoke this function?

In D50522#1194210, @Hahnfeld wrote:

You are right, "available" is not defined in the standard. I've always thought of it as "plugged into the system", i.e. all devices that are visible to the CUDA runtime. That would match the current implementation of omp_get_num_devices, which is defined to return "the number of available target devices".
Actually this behaviour would be important for us as we have our GPUs configured exclusively. So when there is already a process running, all other users get a runtime error. In that case it would be very helpful to have libomptarget abort the program.

So you like deciding available on first use? This is what your comment seems to imply, but I am not 100% sure.

I think that our current interpretation of “available” for devices is
reasonable. There may be many reasons that a device is not available, even
if it is plugged in. Deciding that it is available because we were able to
use it seems the most dynamic method of determining this.

Kevin O’Brien

In D50522#1194805, @AlexEichenberger wrote:

In D50522#1194210, @Hahnfeld wrote:

You are right, "available" is not defined in the standard. I've always thought of it as "plugged into the system", i.e. all devices that are visible to the CUDA runtime. That would match the current implementation of omp_get_num_devices, which is defined to return "the number of available target devices".
Actually this behaviour would be important for us as we have our GPUs configured exclusively. So when there is already a process running, all other users get a runtime error. In that case it would be very helpful to have libomptarget abort the program.

So you like deciding available on first use? This is what your comment seems to imply, but I am not 100% sure.

No, I favor the implication "visible" -> "available" which is the same interpretation that omp_get_num_devices is using (in its current form).
If we implemented the behaviour of this patch ("successful offload to ONE device" -> "ALL devices available", "error on ONE device" -> "NO devices available") we'd need to change the API methods. That would probably imply probing all devices at runtime startup - because after all we don't know which device the user is going to use. IIRC that was to be avoided, libomptarget uses lazy initialization at the moment.

In D50522#1194808, @caomhin wrote:

I think that our current interpretation of “available” for devices is
reasonable. There may be many reasons that a device is not available, even
if it is plugged in. Deciding that it is available because we were able to
use it seems the most dynamic method of determining this.

Suppose we have 2 devices plugged into the system, and the first one cannot be used (for whatever reason: hardware failure, exclusive configuration and somebody else is running, etc.).
Now a clever application sees the two (because omp_get_num_devices() returns 2) and does:

#pragma omp parallel num_threads(omp_get_num_devices())
{
  #pragma omp target device(omp_get_thread_num())
  { }
}

I think the runtime behaviour with this patch depends on the execution order (and exposes a race condition in handle_target_outcome on TargetOffloadPolicy; let's ignore that for now):

If target device(0) executes first, libomptarget will notice the error and silently disable offloading. All target regions will execute on the host.
If however target device(1) executes first and returns successfully, libomptarget will raise OMP_TARGET_OFFLOAD to MANDATORY and will abort execution when catching the error of target device(0).

I don't think that makes much sense. IMO the runtime should detect two "visible" -> "available" devices and abort execution in all cases.
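The order dependence described above can be reproduced with a small model. This is only a sketch of the "decide on first use" rule with a single global policy; `SimulateFirstUsePolicy` and both enums are hypothetical names, and real offloading is replaced by a list of success flags given in execution order.

```cpp
#include <vector>

enum Policy { Disabled = 0, Default = 1, Mandatory = 2 };
enum Result { AllOnHost, AllOnDevice, Aborted };

// Models one global policy that starts as Default and is decided by the
// first offload attempt. `outcomes` holds, in execution order, whether each
// attempted offload would succeed on its device.
Result SimulateFirstUsePolicy(const std::vector<bool> &outcomes) {
  Policy policy = Default;
  for (bool success : outcomes) {
    if (policy == Default)
      policy = success ? Mandatory : Disabled; // the first attempt decides
    if (policy == Disabled)
      continue; // this region silently runs on the host
    if (!success)
      return Aborted; // Mandatory and the offload failed
  }
  return policy == Mandatory ? AllOnDevice : AllOnHost;
}
```

With device 0 unusable and device 1 working, the sequence `{false, true}` (device 0 runs first) ends with everything on the host, while `{true, false}` (device 1 runs first) aborts, which is the inconsistency the comment points out.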

In D50522#1194805, @AlexEichenberger wrote:

So you like deciding available on first use? This is what your comment seems to imply, but I am not 100% sure.

As I understand Jonas, he would prefer semantically something like:

devices-available =  omp_get_num_devices()>0

This does not depend on a successful offload, just compares whether a device is there:

if devices-available && DEFAULT:
  continue as if MANDATORY
else:
  continue as if DISABLED

In D50522#1195086, @protze.joachim wrote:
In D50522#1194805, @AlexEichenberger wrote:

So you like deciding available on first use? This is what your comment seems to imply, but I am not 100% sure.

As I understand Jonas, he would prefer semantically something like:
devices-available =  omp_get_num_devices()>0

Yes, because that's how the standard defines omp_get_num_devices(): to return the number of available devices.
And because it results in a sane behaviour for my example.

This does not depend on a successful offload, just compares whether a device is there:
if devices-available && DEFAULT:
  continue as if MANDATORY
else:
  continue as if DISABLED

Almost, !devices-available && MANDATORY should not result in DISABLED ;-)

if DEFAULT:
  if devices-available:
    continue as if MANDATORY
  else:
    continue as if DISABLED
endif
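The corrected pseudocode above translates into a small pure function. This is a sketch only: `ResolvePolicy` is a hypothetical name, the enum repeats the values quoted earlier in the thread, and `num_devices` stands in for whatever omp_get_num_devices() would return at the time of the decision.

```cpp
enum kmp_target_offload_kind {
  tgt_disabled = 0,
  tgt_default = 1,
  tgt_mandatory = 2
};

// Only DEFAULT is rewritten; an explicit MANDATORY or DISABLED setting is
// returned unchanged, even when no device is available.
kmp_target_offload_kind ResolvePolicy(kmp_target_offload_kind policy,
                                      int num_devices) {
  if (policy != tgt_default)
    return policy;
  return num_devices > 0 ? tgt_mandatory : tgt_disabled;
}
```

Keeping the device count as a parameter leaves open when the decision is taken, which is the point still under discussion in the thread.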

I disagree. As it is currently written, omp_get_num_devices() can also grow over time, so it can return zero at first and a larger number later. What does that do to DEFAULT?
I believe the current policy is ok; if any device fails for any reason on first use, it becomes disabled. Anyone who wants to rely on devices being there should use the MANDATORY policy.
I am happy to add a lock around the change of DEFAULT to MANDATORY or DISABLED.

In D50522#1195317, @AlexEichenberger wrote:

I disagree. As it is currently written, omp_get_num_devices() can also grow over time, so you can also have it return zero, only to increase later to a larger number. What does that do to DEFAULT?

Ok, fair point. I think we need to decide on the first entry from user code: If one construct fell back to the host all following constructs should, shouldn't they?

I believe the current policy is ok; if any device fails for any reason on first use, it becomes disabled. Anyone who wants to rely on devices being there should use the MANDATORY policy.

Huh, but there is only one global TargetOffloadPolicy. So how can we disable it = a single device?

In D50522#1194903, @Hahnfeld wrote:
Suppose we have 2 devices plugged into the system, and the first one cannot be used (for whatever reason: hardware failure, exclusive configuration and somebody else is running, etc.).
Now a clever application sees the two (because omp_get_num_devices() returns 2) and does:
#pragma omp parallel num_threads(omp_get_num_devices())
{
  #pragma omp target device(omp_get_thread_num())
  { }
}
I think the runtime behaviour with this patch depends on the execution order (and exposes a race condition in handle_target_outcome on TargetOffloadPolicy; let's ignore that for now):

If target device(0) executes first, libomptarget will notice the error and silently disable offloading. All target regions will execute on the host.

If however target device(1) executes first and returns successfully, libomptarget will raise OMP_TARGET_OFFLOAD to MANDATORY and will abort execution when catching the error of target device(0).

I don't think that makes much sense. IMO the runtime should detect two "visible" -> "available" devices and abort execution in all cases.

Did you consider this example?

In D50522#1194903, @Hahnfeld wrote:
Suppose we have 2 devices plugged into the system, and the first one cannot be used (for whatever reason: hardware failure, exclusive configuration and somebody else is running, etc.).
Now a clever application sees the two (because omp_get_num_devices() returns 2) and does:
#pragma omp parallel num_threads(omp_get_num_devices())
{
  #pragma omp target device(omp_get_thread_num())
  { }
}
I think the runtime behaviour with this patch depends on the execution order (and exposes a race condition in handle_target_outcome on TargetOffloadPolicy; let's ignore that for now):

If target device(0) executes first, libomptarget will notice the error and silently disable offloading. All target regions will execute on the host.

If however target device(1) executes first and returns successfully, libomptarget will raise OMP_TARGET_OFFLOAD to MANDATORY and will abort execution when catching the error of target device(0).

I don't think that makes much sense. IMO the runtime should detect two "visible" -> "available" devices and abort execution in all cases.
Did you consider this example?

Your example can show a different issue as well:

A program first does a register_lib that has code for a device which does not exist on that machine. Thus omp_get_num_devices is zero, and DEFAULT becomes DISABLED. The program then does a register_lib that has code for a device that does exist on this machine, but offloading is now disabled.

A way out of this is to see the bigger picture. If the user wants to guarantee execution of all targets on a device, the user must use MANDATORY. If the user does not want devices, then DISABLED is called for. DEFAULT is a best effort; it will not be perfect, nor does it have to be.

I suggest that deciding on the first attempt to actually offload is as valid a policy as any, and is simple to understand. Regardless of how we do it, what we really want to avoid is executing some kernels on the device and some on the host (ignoring here explicit orders from the program via the "if(0)" clause). Both policies, "num_devices>0" and "decide on first invocation", satisfy this.

Added a mutex around changing DEFAULT to MANDATORY or DISABLED
Added proper action when discovering failure (which is immediate abort of function being performed)
Failure to locate data during an update is now tolerated (a warning could be issued)

AlexEichenberger marked an inline comment as done. Aug 10 2018, 3:18 PM

AlexEichenberger added inline comments.

libomptarget/src/interface.cpp
52	Ravi, I did your change but reverted it for the following reason. I liked to have all of the handling of the target_offload variable in the same file (interface.cpp). When I moved it into CheckDeviceAndCtors, I needed to insert it in 2 places within the function, and handle_target_outcome was now in two different files. If you feel strongly about your request, I will be happy to move the code into CheckDeviceAndCtors.

Alex: If you feel strongly about your request, I will be happy to move the code into CheckDeviceAndCtors.
It is just a suggestion. Either way is fine.

@Hahnfeld and I discussed the behavior and found a very inconsistent behavior. Let's consider something like the following code, under the constraint, that device(1) is busy (will fail to offload) and device(0) is free:

In D50522#1194903, @Hahnfeld wrote:

#pragma omp parallel num_threads(omp_get_num_devices())
{
  #pragma omp target device(omp_get_thread_num())
  { }
}

We found the following cases:

1) first offload to device(0) -> succeed -> set MANDATORY -> abort on offloading to device(1)
2) first offload to device(1) -> fail -> set DISABLED -> continue execution on the host
3) start offloading to device(0) and device(1) -> fail on 1 -> set DISABLED, but execution of offloaded code will complete -> although we are in DISABLED, we executed code on a device

The locking only prevents the race on the variable, but not the behavior in 3).

From our point of view, the decision between DISABLED and MANDATORY must be before starting any offloading.

Another reason to implement such behavior:
If the offloaded code changes the memory state (and the target region code is not idempotent), deciding up front prevents inconsistent states of execution that would arise if the initial offloading failed after partial execution on the device and the region were then executed a second time on the host.

Default policy is now selected when registering the libraries, and will resolve to mandatory or disabled depending on whether there are one or more devices or none.

Good arguments. I see why it is good to keep the number of devices returned consistent with the OMP_TARGET_OFFLOAD environment variable.
Implemented the requested changes.

Hahnfeld requested changes to this revision. Aug 13 2018, 12:10 PM

Hahnfeld added inline comments.

libomptarget/include/omptarget.h
188–194 ↗	(On Diff #160413)	Please remove, there is a CMake flag to do this.
libomptarget/src/device.cpp
368–370	I think this implementation can live directly in `interface.cpp` as no other file should use it. Please make the function `static` and I think internal function names follow a CamelCase naming convention; `device_is_ready` seems to be the exception...
385–386	Asserts are no-ops when building with `-DNDEBUG` which is the default for `Release` builds. Call `exit(1)` directly? I think we should provide the user with an error message when aborting the execution, `DP` is subject to `LIBOMPTARGET_DEBUG` and only available in `Debug` builds. (PGI's OpenACC implementation prints a detailed error code from CUDA which is probably not possible in the target agnostic part. I think that's ok for now, the user can use `cuda-gdb` or other tools...)
libomptarget/src/device.h
24–31	Is there a particular reason the code is put into `device.h` / `device.cpp`? IMO it's not related to device management. For later reuse in API methods (if deemed necessary after the discussion on `omp-lang`) I think this should go into `private.h`.
libomptarget/src/rtl.cpp
230–231	I think this code path is not triggered when there is no matching RTL at all. At the moment this causes an `assert` (in `Debug` builds) because `TargetOffloadPolicy` is still `tgt_default`. However I don't think we can decide on a single call to `__tgt_register_lib` for the reason you mentioned last week: The number of available devices can change when more device images are registered. As such I think we should move the decision handling `tgt_default` to the beginning of all interface methods that can come from a user's `target` construct. What do you think?
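The remark on lines 385–386 (an `assert` disappears under `-DNDEBUG`, and `DP` only prints in `Debug` builds) can be illustrated with an explicit check. This is a sketch, not the code that landed: `MustAbort` and `HandleTargetOutcome` are hypothetical names, the policy is passed as a parameter instead of being read from a global, and the message text is illustrative.

```cpp
#include <cstdio>
#include <cstdlib>

enum kmp_target_offload_kind {
  tgt_disabled = 0,
  tgt_default = 1,
  tgt_mandatory = 2
};

// Hypothetical predicate: a failed offload is fatal only under MANDATORY.
bool MustAbort(kmp_target_offload_kind policy, bool success) {
  return policy == tgt_mandatory && !success;
}

// Hypothetical replacement for the assert-based check in the diff.
void HandleTargetOutcome(kmp_target_offload_kind policy, bool success) {
  if (MustAbort(policy, success)) {
    // fprintf + exit stay active with -DNDEBUG, unlike assert(), and are
    // visible in Release builds, unlike DP().
    std::fprintf(stderr, "Libomptarget fatal error: failure of target "
                         "construct while offloading is mandatory\n");
    std::exit(1);
  }
}
```

Splitting the decision from the side effect keeps the abort condition easy to check in isolation.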

This revision now requires changes to proceed. Aug 13 2018, 12:10 PM

Addressed all remaining issues

responded to comments.

libomptarget/src/interface.cpp
52	Made the handle_target_outcome static to the interface.cpp as suggested by Jonas
libomptarget/src/rtl.cpp
230–231	Agreed, you are absolutely right

Please also clang-format your changes.

libomptarget/src/interface.cpp
28–61	This seems overly complex where we now only need a single `fprintf` (which is thread-safe) inside an `if`. Do you have upcoming patches that will use these functions?
65	Please rename to CamelCase, something like `HandleTargetOutcome`? (`device_is_ready` seems to be the exception)
67–77	After some thinking I guess this actually works in all cases: If there is at least one device, the code path in `RTLsTy::LoadRTLs` will be executed. Otherwise all functions check whether the device ID is less than `Devices.size()` (mostly in `device_is_ready` which is called from `CheckDeviceAndCtors`). This can never be true when there is no RTL and the execution ends up here. However I don't find this very intuitive and it will be very easy to mess up with future changes. I suggest to do this upfront, for example with a helper function:

static bool IsOffloadDisabled() {
  if (TargetOffloadPolicy == tgt_default) {
    // ...
  }
  return TargetOffloadPolicy == tgt_disabled;
}

The first line of each interface function would then become `if (IsOffloadDisabled()) return;`. In addition the duplicate code could be dropped from `RTLsTy::LoadRTLs`. Do you have concerns about this straight-forward solution?
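One possible way to fill in the elided body of the suggested helper is sketched below. Everything beyond the names `IsOffloadDisabled`, `TargetOffloadPolicy` and the `tgt_*` values is an assumption: `NumAvailableDevices` stands in for omp_get_num_devices(), `TargetOffloadMtx` and `ExampleInterfaceEntry` are hypothetical, and the policy is made a `std::atomic` here so that the check outside the lock is well defined.

```cpp
#include <atomic>
#include <mutex>

enum kmp_target_offload_kind {
  tgt_disabled = 0,
  tgt_default = 1,
  tgt_mandatory = 2
};

// Stand-ins for runtime state (assumptions of this sketch).
static std::atomic<kmp_target_offload_kind> TargetOffloadPolicy{tgt_default};
static std::mutex TargetOffloadMtx;
static int NumAvailableDevices = 0; // stand-in for omp_get_num_devices()

// Resolves DEFAULT on first use, then reports whether offloading is off.
static bool IsOffloadDisabled() {
  if (TargetOffloadPolicy.load() == tgt_default) {
    std::lock_guard<std::mutex> Guard(TargetOffloadMtx);
    // Re-check under the lock: another thread may have decided already.
    if (TargetOffloadPolicy.load() == tgt_default)
      TargetOffloadPolicy.store(NumAvailableDevices > 0 ? tgt_mandatory
                                                        : tgt_disabled);
  }
  return TargetOffloadPolicy.load() == tgt_disabled;
}

// Hypothetical interface function: returns true if it would offload.
static bool ExampleInterfaceEntry() {
  if (IsOffloadDisabled())
    return false; // the caller runs the host version instead
  return true;    // the real code would continue with the offload here
}
```

Once the policy has left `tgt_default`, the lock is never taken again, which matches the expectation voiced in the thread that it is acquired only once in practice.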

Implemented suggested changes

responded to comments

libomptarget/src/interface.cpp
28–61	Yes, I intend this to be used for all output feedback to users. Ideally, I would find a way to share the infrastructure in kmpc, but for the moment this should do it. Note that the lock is only acquired prior to aborting, so overheads are not an issue.
67–77	good suggestions, re-inserted the lock as now there could be a race condition. Again, in practice, this should not happen often. In practice, the lock should be acquired only once, so this is ok

Looks mostly good.

In D50522#1199469, @Hahnfeld wrote:

Please also clang-format your changes.

I'm not sure whether you did this; especially the new functions in interface.cpp don't seem to follow the style I'm used to...

libomptarget/src/interface.cpp
28–61	In that case I think the functions should be moved to `omptarget.cpp` (or a completely new file?) because they'll be used by more than just the interface? Inverting `cond` seems weird, could you explain that choice? If that's to model `assert` I think the function should be renamed to `AssertOrFatalMessage`. To me the current function name implies `if (cond) FatalMessage(...)`, but that may be just me.

AlexEichenberger marked an inline comment as done. Aug 14 2018, 8:48 PM

AlexEichenberger added inline comments.

libomptarget/src/interface.cpp
28–61	I agree... maybe private.cpp? I named them AssertWithFatalMessage, but then you mentioned that asserts are turned off in release mode, so I thought that would be confusing too. But if you prefer Assert, that was my first choice too.

Hahnfeld added inline comments. Aug 15 2018, 12:11 AM

libomptarget/src/interface.cpp
28–61	Another option would be to remove `FatalMessageWithCond` and put in the `if` statement directly. That would a) be clear, b) avoid choosing a name and whether to invert `cond` or not, and c) deduplicate code and make `FatalMessage` used (which is not at the moment). For one single function I don't think we need to start a new file right now. IMO that could go to `omptarget.cpp`.
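The option described above, a single `FatalMessage` called from a plain `if`, might look roughly as follows. This is a sketch under assumptions: only the name `FatalMessage` comes from the review, its signature here is invented, `FormatFatalMessage` is a hypothetical helper that exists so the text can be checked without terminating, and the message prefix is illustrative.

```cpp
#include <cstdarg> // va_list, va_start, va_end
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: builds the message text with printf-style arguments.
std::string FormatFatalMessage(int error_num, const char *fmt, ...) {
  char body[256];
  va_list args;
  va_start(args, fmt);
  std::vsnprintf(body, sizeof(body), fmt, args);
  va_end(args);
  return "Libomptarget fatal error " + std::to_string(error_num) + ": " + body;
}

// Prints the message and terminates; there is no condition parameter, so
// nothing has to be inverted and the name says what happens.
[[noreturn]] void FatalMessage(const std::string &message) {
  std::fprintf(stderr, "%s\n", message.c_str());
  std::exit(1);
}
```

A call site then spells out its own condition, for example `if (!success) FatalMessage(FormatFatalMessage(1, "device %d failed", id));`, which avoids both the naming question and the inverted `cond`.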

fixed clang-format and moved fatal message function

updated comments status

Looks good, thanks for the changes.

This revision is now accepted and ready to land. Aug 16 2018, 12:59 AM

Hahnfeld mentioned this in D44522: [Libomptarget] Full implementation of the target-offload-icv. Aug 16 2018, 1:00 AM

Thanks to all for your valuable comments, much appreciated, really contributed to the quality of the patch

Looking through the code one more time, I realized that there was no default init for this key variable:

kmp_target_offload_kind_t TargetOffloadPolicy = tgt_default;

It's better for it to be init to zero value, to be absolutely safe.

Alex, will you land this anytime soon? (I'd like to backport for our local installation of Clang 7.0...)

Hahnfeld mentioned this in D51107: [LIBOMPTARGET] Add support for mapping of lambda captures. Aug 23 2018, 1:16 AM

Closed by commit rOMP340542: [OpenMP][libomptarget] Bringing up to spec with respect to OMP_TARGET_OFFLOAD… (authored by AlexEichenberger). Aug 23 2018, 9:23 AM

This revision was automatically updated to reflect the committed changes.

mikerice added a subscriber: mikerice. Aug 27 2018, 10:42 AM

mikerice added inline comments.

libomptarget/src/omptarget.cpp
36	Anyone else having trouble building after this change?

.../llvm/projects/openmp/libomptarget/src/omptarget.cpp:37:21: error: 'va_start' was not declared in this scope
  va_start(args, fmt);
                    ^
.../llvm/projects/openmp/libomptarget/src/omptarget.cpp:43:14: error: 'va_end' was not declared in this scope
  va_end(args);
             ^

It seems my environment would really like a stdarg.h in this file.

Yes, see D51226 which will replace the varargs function by a macro.

AlexEichenberger mentioned this in D51285: Fix a build issue on Debian Jessie. Aug 27 2018, 7:59 PM

Diff 159957

libomptarget/src/device.h

[15 lines not shown]

 #include <cstddef>
 #include <climits>
 #include <list>
 #include <map>
 #include <mutex>
 #include <vector>

 // Forward declarations.
 struct RTLInfoTy;
 struct __tgt_bin_desc;
 struct __tgt_target_table;

 #define INF_REF_CNT (LONG_MAX>>1) // leave room for additions/subtractions
 #define CONSIDERED_INF(x) (x > (INF_REF_CNT>>1))

Inline comment (Hahnfeld): Is there a particular reason the code is put into `device.h` / `device.cpp`? IMO it's not related to device management. For later reuse in API methods (if deemed necessary after the discussion on `omp-lang`) I think this should go into `private.h`.

+enum TargetOffloadPolicyEnum {
+  // forces all target computation on device if present, on the host if not present
+  OMP_TARGET_OFFLOAD_DEFAULT = 0,
+  // forces all target computations on the host
+  OMP_TARGET_OFFLOAD_DISABLED = 1,
+  // forces all target computations on the devices, abort if failed to execute on device
+  OMP_TARGET_OFFLOAD_MANDATORY = 2
+};

 /// Map between host data and target data.
 struct HostDataToTargetTy {
   uintptr_t HstPtrBase; // host info.
   uintptr_t HstPtrBegin;
   uintptr_t HstPtrEnd; // non-inclusive.

   uintptr_t TgtPtrBegin; // target info.

[116 lines not shown]

 private:
   // Call to RTL
   void init(); // To be called only via DeviceTy::initOnce()
 };

 /// Map between Device ID (i.e. openmp device id) and its DeviceTy.
 typedef std::vector<DeviceTy> DevicesTy;
 extern DevicesTy Devices;
+extern TargetOffloadPolicyEnum TargetOffloadPolicy;
 extern bool device_is_ready(int device_num);
+extern void handle_target_outcome(bool success);

 #endif

libomptarget/src/device.cpp

[15 lines not shown]

 #include "rtl.h"

 #include <cassert>
 #include <climits>
 #include <string>

 /// Map between Device ID (i.e. openmp device id) and its DeviceTy.
 DevicesTy Devices;
+// Store target policy (disabled, mandatory, default)
+TargetOffloadPolicyEnum TargetOffloadPolicy;

 int DeviceTy::associatePtr(void *HstPtrBegin, void *TgtPtrBegin, int64_t Size) {
   DataMapMtx.lock();

   // Check if entry exists
   for (auto &HT : HostDataToTargetMap) {
     if ((uintptr_t)HstPtrBegin == HT.HstPtrBegin) {
       // Mapping already exists

[326 lines not shown]

   if (!Device.IsInit && Device.initOnce() != OFFLOAD_SUCCESS) {
     DP("Failed to init device %d\n", device_num);
     return false;
   }

   DP("Device %d is ready to use.\n", device_num);

   return true;
 }

+// manage the success or failure of a target construct
+void handle_target_outcome(bool success)

Inline comment (Hahnfeld): I think this implementation can live directly in `interface.cpp` as no other file should use it. Please make the function `static` and I think internal function names follow a CamelCase naming convention; `device_is_ready` seems to be the exception...

+{
+  switch (TargetOffloadPolicy) {
+  case OMP_TARGET_OFFLOAD_DEFAULT:
+    // if we have a success, now all computations must go to device
+    if (success) {
+      DP("Default TARGET OFFLOAD policy is now mandatory "
+         "(due to successful handling of a target construct)");
+      TargetOffloadPolicy = OMP_TARGET_OFFLOAD_MANDATORY;
+    } else {
+      DP("Default TARGET OFFLOAD policy is now disabled "
+         "(due to unsuccessful handling of a target construct)");
+      TargetOffloadPolicy = OMP_TARGET_OFFLOAD_DISABLED;
+    }
+    break;
+  case OMP_TARGET_OFFLOAD_MANDATORY:
+    if (! success) {

Inline comment (Hahnfeld): Asserts are no-ops when building with `-DNDEBUG` which is the default for `Release` builds. Call `exit(1)` directly? I think we should provide the user with an error message when aborting the execution, `DP` is subject to `LIBOMPTARGET_DEBUG` and only available in `Debug` builds. (PGI's OpenACC implementation prints a detailed error code from CUDA which is probably not possible in the target agnostic part. I think that's ok for now, the user can use `cuda-gdb` or other tools...)

+      DP("failure of target construct when expecting to successfully offload");
+      assert(success);
+    }
+    break;
+  case OMP_TARGET_OFFLOAD_DISABLED:
+    if (success) {
+      DP("failure of target construct when expecting to fail offloading");
+      assert(! success);
+    }
+    break;
+  }
+}

libomptarget/src/interface.cpp

Show All 19 Lines

#include <cassert>		#include <cassert>
#include <cstdlib>		#include <cstdlib>

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
/// adds a target shared library to the target execution image		/// adds a target shared library to the target execution image
EXTERN void __tgt_register_lib(__tgt_bin_desc *desc) {		EXTERN void __tgt_register_lib(__tgt_bin_desc *desc) {
RTLs.RegisterLib(desc);		RTLs.RegisterLib(desc);
}		}

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
/// unloads a target shared library		/// unloads a target shared library
EXTERN void __tgt_unregister_lib(__tgt_bin_desc *desc) {		EXTERN void __tgt_unregister_lib(__tgt_bin_desc *desc) {
RTLs.UnregisterLib(desc);		RTLs.UnregisterLib(desc);
}		}

/// creates host-to-target data mapping, stores it in the		/// creates host-to-target data mapping, stores it in the
/// libomptarget.so internal structure (an entry in a stack of data maps)		/// libomptarget.so internal structure (an entry in a stack of data maps)
/// and passes the data to the device.		/// and passes the data to the device.
EXTERN void __tgt_target_data_begin(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_begin(int64_t device_id, int32_t arg_num,
void args_base, void args, int64_t arg_sizes, int64_t arg_types) {		void args_base, void args, int64_t arg_sizes, int64_t arg_types) {
DP("Entering data begin region for device %" PRId64 " with %d mappings\n",		DP("Entering data begin region for device %" PRId64 " with %d mappings\n",
device_id, arg_num);		device_id, arg_num);

// No devices available?		// No devices available?
if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
DP("Use default device id %" PRId64 "\n", device_id);		DP("Use default device id %" PRId64 "\n", device_id);
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %" PRId64 " ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
		handle_target_outcome(false);
		RaviNarayanaswamy: Why not move the check of handle_target_outcome into CheckDeviceAndCtors so you don't have to do it every time you invoke this function?
		AlexEichenberger (author): Ravi, I made your change but reverted it for the following reason. I would like to keep all of the handling of the target_offload variable in the same file (interface.cpp). When I moved it into CheckDeviceAndCtors, I needed to insert it in 2 places within the function, and handle_target_outcome was then in two different files. If you feel strongly about your request, I will be happy to move the code into CheckDeviceAndCtors.
		AlexEichenberger (author): Made handle_target_outcome static to interface.cpp, as suggested by Jonas.
return;		return;
}		}

DeviceTy& Device = Devices[device_id];		DeviceTy& Device = Devices[device_id];

#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
for (int i=0; i<arg_num; ++i) {		for (int i=0; i<arg_num; ++i) {
DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),		", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),
		Hahnfeld: This seems overly complex where we now only need a single `fprintf` (which is thread-safe) inside an `if`. Do you have upcoming patches that will use these functions?
		AlexEichenberger (author): Yes, I intend this to be used for all output feedback to users. Ideally, I would find a way to share the infrastructure in kmpc, but for the moment this should do. Note that the lock is only acquired prior to aborting, so overhead is not an issue.
		Hahnfeld: In that case I think the functions should be moved to `omptarget.cpp` (or a completely new file?) because they'll be used by more than just the interface. Inverting `cond` seems weird; could you explain that choice? If that's to model `assert`, I think the function should be renamed to `AssertOrFatalMessage`. To me the current function name implies `if (cond) FatalMessage(...)`, but that may be just me.
		AlexEichenberger (author): I agree... maybe private.cpp? I named them AssertWithFatalMessage, but then you mentioned that asserts are turned off in release mode, so I thought that would be confusing too. But if you prefer Assert, that was my first choice too.
		Hahnfeld: Another option would be to remove `FatalMessageWithCond` and put in the `if` statement directly. That would a) be clear, b) avoid choosing a name and whether to invert `cond` or not, and c) deduplicate code and make `FatalMessage` used (which it is not at the moment). For one single function I don't think we need to start a new file right now. IMO that could go to `omptarget.cpp`.
arg_sizes[i], arg_types[i]);		arg_sizes[i], arg_types[i]);
}		}
#endif		#endif

		Hahnfeld: Please rename to CamelCase, something like `HandleTargetOutcome`? (`device_is_ready` seems to be the exception.)
target_data_begin(Device, arg_num, args_base, args, arg_sizes, arg_types);		int rc = target_data_begin(Device, arg_num, args_base,
		args, arg_sizes, arg_types);
		handle_target_outcome(rc == OFFLOAD_SUCCESS);
}		}

EXTERN void __tgt_target_data_begin_nowait(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_begin_nowait(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,
int32_t depNum, void *depList, int32_t noAliasDepNum,		int32_t depNum, void *depList, int32_t noAliasDepNum,
void *noAliasDepList) {		void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		if (depNum + noAliasDepNum > 0)
__kmpc_omp_taskwait(NULL, 0);		__kmpc_omp_taskwait(NULL, 0);

		Hahnfeld: After some thinking I guess this actually works in all cases: 1) If there is at least one device, the code path in `RTLsTy::LoadRTLs` will be executed. 2) Otherwise all functions check whether the device ID is less than `Devices.size()` (mostly in `device_is_ready`, which is called from `CheckDeviceAndCtors`). This can never be true when there is no RTL, and the execution ends up here. However, I don't find this very intuitive, and it will be very easy to mess up with future changes. I suggest doing this upfront, for example with a helper function:
		  static bool IsOffloadDisabled() {
		    if (TargetOffloadPolicy == tgt_default) {
		      // ...
		    }
		    return TargetOffloadPolicy == tgt_disabled;
		  }
		The first line of each interface function would then become `if (IsOffloadDisabled()) return;`. In addition, the duplicate code could be dropped from `RTLsTy::LoadRTLs`. Do you have concerns about this straightforward solution?
		AlexEichenberger (author): Good suggestions; I re-inserted the lock, as there could now be a race condition. Again, in practice this should not happen often, and the lock should be acquired only once, so this is OK.
__tgt_target_data_begin(device_id, arg_num, args_base, args, arg_sizes,		__tgt_target_data_begin(device_id, arg_num, args_base, args, arg_sizes,
arg_types);		arg_types);
}		}

/// passes data from the target, releases target memory and destroys		/// passes data from the target, releases target memory and destroys
/// the host-target mapping (top entry from the stack of data maps)		/// the host-target mapping (top entry from the stack of data maps)
/// created by the last __tgt_target_data_begin.		/// created by the last __tgt_target_data_begin.
EXTERN void __tgt_target_data_end(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_end(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
DP("Entering data end region with %d mappings\n", arg_num);		DP("Entering data end region with %d mappings\n", arg_num);

// No devices available?		// No devices available?
if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
}		}

RTLsMtx.lock();		RTLsMtx.lock();
size_t Devices_size = Devices.size();		size_t Devices_size = Devices.size();
RTLsMtx.unlock();		RTLsMtx.unlock();
if (Devices_size <= (size_t)device_id) {		if (Devices_size <= (size_t)device_id) {
DP("Device ID %" PRId64 " does not have a matching RTL.\n", device_id);		DP("Device ID %" PRId64 " does not have a matching RTL.\n", device_id);
		handle_target_outcome(false);
return;		return;
}		}

DeviceTy &Device = Devices[device_id];		DeviceTy &Device = Devices[device_id];
if (!Device.IsInit) {		if (!Device.IsInit) {
DP("Uninit device: ignore");		DP("Uninit device: ignore");
		handle_target_outcome(false);
return;		return;
}		}

#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
for (int i=0; i<arg_num; ++i) {		for (int i=0; i<arg_num; ++i) {
DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),		", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),
arg_sizes[i], arg_types[i]);		arg_sizes[i], arg_types[i]);
}		}
#endif		#endif

target_data_end(Device, arg_num, args_base, args, arg_sizes, arg_types);		int rc = target_data_end(Device, arg_num, args_base,
		args, arg_sizes, arg_types);
		handle_target_outcome(rc == OFFLOAD_SUCCESS);
}		}

EXTERN void __tgt_target_data_end_nowait(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_end_nowait(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,
int32_t depNum, void *depList, int32_t noAliasDepNum,		int32_t depNum, void *depList, int32_t noAliasDepNum,
void *noAliasDepList) {		void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		if (depNum + noAliasDepNum > 0)
__kmpc_omp_taskwait(NULL, 0);		__kmpc_omp_taskwait(NULL, 0);

__tgt_target_data_end(device_id, arg_num, args_base, args, arg_sizes,		__tgt_target_data_end(device_id, arg_num, args_base, args, arg_sizes,
arg_types);		arg_types);
}		}

EXTERN void __tgt_target_data_update(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_update(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
DP("Entering data update with %d mappings\n", arg_num);		DP("Entering data update with %d mappings\n", arg_num);

// No devices available?		// No devices available?
if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %" PRId64 " ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
		handle_target_outcome(false);
return;		return;
}		}

DeviceTy& Device = Devices[device_id];		DeviceTy& Device = Devices[device_id];
target_data_update(Device, arg_num, args_base, args, arg_sizes, arg_types);		int rc = target_data_update(Device, arg_num, args_base,
		args, arg_sizes, arg_types);
		handle_target_outcome(rc == OFFLOAD_SUCCESS);
}		}

EXTERN void __tgt_target_data_update_nowait(		EXTERN void __tgt_target_data_update_nowait(
int64_t device_id, int32_t arg_num, void **args_base, void **args,		int64_t device_id, int32_t arg_num, void **args_base, void **args,
int64_t *arg_sizes, int64_t *arg_types, int32_t depNum, void *depList,		int64_t *arg_sizes, int64_t *arg_types, int32_t depNum, void *depList,
int32_t noAliasDepNum, void *noAliasDepList) {		int32_t noAliasDepNum, void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		if (depNum + noAliasDepNum > 0)
__kmpc_omp_taskwait(NULL, 0);		__kmpc_omp_taskwait(NULL, 0);

__tgt_target_data_update(device_id, arg_num, args_base, args, arg_sizes,		__tgt_target_data_update(device_id, arg_num, args_base, args, arg_sizes,
arg_types);		arg_types);
}		}

EXTERN int __tgt_target(int64_t device_id, void *host_ptr, int32_t arg_num,		EXTERN int __tgt_target(int64_t device_id, void *host_ptr, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
DP("Entering target region with entry point " DPxMOD " and device Id %"		DP("Entering target region with entry point " DPxMOD " and device Id %"
PRId64 "\n", DPxPTR(host_ptr), device_id);		PRId64 "\n", DPxPTR(host_ptr), device_id);

if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %" PRId64 " ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
		handle_target_outcome(false);
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}

#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
for (int i=0; i<arg_num; ++i) {		for (int i=0; i<arg_num; ++i) {
DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),		", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),
arg_sizes[i], arg_types[i]);		arg_sizes[i], arg_types[i]);
}		}
#endif		#endif

int rc = target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,		int rc = target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,
arg_types, 0, 0, false /*team*/);		arg_types, 0, 0, false /*team*/);
		handle_target_outcome(rc == OFFLOAD_SUCCESS);
return rc;		return rc;
}		}

EXTERN int __tgt_target_nowait(int64_t device_id, void *host_ptr,		EXTERN int __tgt_target_nowait(int64_t device_id, void *host_ptr,
int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,		int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,
int64_t *arg_types, int32_t depNum, void *depList, int32_t noAliasDepNum,		int64_t *arg_types, int32_t depNum, void *depList, int32_t noAliasDepNum,
void *noAliasDepList) {		void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		if (depNum + noAliasDepNum > 0)
Show All 10 Lines	DP("Entering target region with entry point " DPxMOD " and device Id %"
PRId64 "\n", DPxPTR(host_ptr), device_id);		PRId64 "\n", DPxPTR(host_ptr), device_id);

if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %" PRId64 " ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
		handle_target_outcome(false);
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}

#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
for (int i=0; i<arg_num; ++i) {		for (int i=0; i<arg_num; ++i) {
DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),		", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),
arg_sizes[i], arg_types[i]);		arg_sizes[i], arg_types[i]);
}		}
#endif		#endif

int rc = target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,		int rc = target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,
arg_types, team_num, thread_limit, true /*team*/);		arg_types, team_num, thread_limit, true /*team*/);
		handle_target_outcome(rc == OFFLOAD_SUCCESS);

return rc;		return rc;
}		}

EXTERN int __tgt_target_teams_nowait(int64_t device_id, void *host_ptr,		EXTERN int __tgt_target_teams_nowait(int64_t device_id, void *host_ptr,
int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,		int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,
int64_t *arg_types, int32_t team_num, int32_t thread_limit, int32_t depNum,		int64_t *arg_types, int32_t team_num, int32_t thread_limit, int32_t depNum,
void *depList, int32_t noAliasDepNum, void *noAliasDepList) {		void *depList, int32_t noAliasDepNum, void *noAliasDepList) {
Show All 9 Lines
EXTERN void __kmpc_push_target_tripcount(int64_t device_id,		EXTERN void __kmpc_push_target_tripcount(int64_t device_id,
uint64_t loop_tripcount) {		uint64_t loop_tripcount) {
if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %" PRId64 " ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
		handle_target_outcome(false);
return;		return;
}		}

DP("__kmpc_push_target_tripcount(%" PRId64 ", %" PRIu64 ")\n", device_id,		DP("__kmpc_push_target_tripcount(%" PRId64 ", %" PRIu64 ")\n", device_id,
loop_tripcount);		loop_tripcount);
Devices[device_id].loopTripCnt = loop_tripcount;		Devices[device_id].loopTripCnt = loop_tripcount;
}		}
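
The pattern repeated through the interface functions above (report failure when the device is not ready, otherwise report whether the internal call returned `OFFLOAD_SUCCESS`) can be summarised in a small policy sketch. This is not the patch's `handle_target_outcome`; the names `OffloadPolicy`, `Outcome`, `ResolvePolicy`, and `ClassifyOutcome` are hypothetical. The sketch assumes the reading of TR7 quoted in the review (DEFAULT behaves like MANDATORY when at least one device is available) and additionally assumes that DEFAULT with no devices falls back to the host.

```cpp
#include <cstddef>

// Hypothetical names; the real patch uses its own identifiers.
enum class OffloadPolicy { Default, Disabled, Mandatory };
enum class Outcome { Continue, FallBackToHost, Fatal };

// Resolve DEFAULT into a concrete policy. Assumption: with one or more
// devices DEFAULT acts like MANDATORY, with none it acts like DISABLED.
inline OffloadPolicy ResolvePolicy(OffloadPolicy policy,
                                   std::size_t numDevices) {
  if (policy != OffloadPolicy::Default)
    return policy;
  return numDevices > 0 ? OffloadPolicy::Mandatory : OffloadPolicy::Disabled;
}

// Decide what an interface function should do after an offload attempt.
// A failed offload is fatal only when the resolved policy is MANDATORY;
// otherwise the caller is expected to run the host version.
inline Outcome ClassifyOutcome(OffloadPolicy policy, std::size_t numDevices,
                               bool success) {
  if (success)
    return Outcome::Continue;
  if (ResolvePolicy(policy, numDevices) == OffloadPolicy::Mandatory)
    return Outcome::Fatal;
  return Outcome::FallBackToHost;
}
```

Keeping the decision in one function, as the reviewers request, means each `__tgt_*` entry point only has to pass a success flag rather than re-implement the policy check.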

libomptarget/src/omptarget.cpp

Show All 27 Lines
/* All begin addresses for partially mapped structs must be 8-aligned in order		/* All begin addresses for partially mapped structs must be 8-aligned in order
* to ensure proper alignment of members. E.g.		* to ensure proper alignment of members. E.g.
*		*
* struct S {		* struct S {
* int a; // 4-aligned		* int a; // 4-aligned
* int b; // 4-aligned		* int b; // 4-aligned
* int *p; // 8-aligned		* int *p; // 8-aligned
* } s1;		* } s1;
* ...		* ...
		mikerice: Anyone else having trouble building after this change?
		  .../llvm/projects/openmp/libomptarget/src/omptarget.cpp:37:21: error: 'va_start' was not declared in this scope
		  va_start(args, fmt);
		  ^
		  .../llvm/projects/openmp/libomptarget/src/omptarget.cpp:43:14: error: 'va_end' was not declared in this scope
		  va_end(args);
		  ^
		It seems my environment would really like a stdarg.h in this file.
* #pragma omp target map(tofrom: s1.b, s1.p[0:N])		* #pragma omp target map(tofrom: s1.b, s1.p[0:N])
* {		* {
* s1.b = 5;		* s1.b = 5;
* for (int i...) s1.p[i] = ...;		* for (int i...) s1.p[i] = ...;
* }		* }
*		*
* Here we are mapping s1 starting from member b, so BaseAddress=&s1=&s1.a and		* Here we are mapping s1 starting from member b, so BaseAddress=&s1=&s1.a and
* BeginAddress=&s1.b. Let's assume that the struct begins at address 0x100,		* BeginAddress=&s1.b. Let's assume that the struct begins at address 0x100,
▲ Show 20 Lines • Show All 407 Lines • ▼ Show 20 Lines	if ((arg_types[i] & OMP_TGT_MAPTYPE_FROM) || DelEntry) {
}		}
}		}
}		}

return rc;		return rc;
}		}

/// Internal function to pass data to/from the target.		/// Internal function to pass data to/from the target.
void target_data_update(DeviceTy &Device, int32_t arg_num,		int target_data_update(DeviceTy &Device, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
// process each input.		// process each input.
for (int32_t i = 0; i < arg_num; ++i) {		for (int32_t i = 0; i < arg_num; ++i) {
if ((arg_types[i] & OMP_TGT_MAPTYPE_LITERAL) ||		if ((arg_types[i] & OMP_TGT_MAPTYPE_LITERAL) ||
(arg_types[i] & OMP_TGT_MAPTYPE_PRIVATE))		(arg_types[i] & OMP_TGT_MAPTYPE_PRIVATE))
continue;		continue;

void *HstPtrBegin = args[i];		void *HstPtrBegin = args[i];
int64_t MapSize = arg_sizes[i];		int64_t MapSize = arg_sizes[i];
bool IsLast;		bool IsLast;
void *TgtPtrBegin = Device.getTgtPtrBegin(HstPtrBegin, MapSize, IsLast,		void *TgtPtrBegin = Device.getTgtPtrBegin(HstPtrBegin, MapSize, IsLast,
false);		false);

if (arg_types[i] & OMP_TGT_MAPTYPE_FROM) {		if (arg_types[i] & OMP_TGT_MAPTYPE_FROM) {
DP("Moving %" PRId64 " bytes (tgt:" DPxMOD ") -> (hst:" DPxMOD ")\n",		DP("Moving %" PRId64 " bytes (tgt:" DPxMOD ") -> (hst:" DPxMOD ")\n",
arg_sizes[i], DPxPTR(TgtPtrBegin), DPxPTR(HstPtrBegin));		arg_sizes[i], DPxPTR(TgtPtrBegin), DPxPTR(HstPtrBegin));
Device.data_retrieve(HstPtrBegin, TgtPtrBegin, MapSize);		int rt = Device.data_retrieve(HstPtrBegin, TgtPtrBegin, MapSize);
		if (rt != OFFLOAD_SUCCESS) {
		DP("Copying data from device failed.\n");
		return OFFLOAD_FAIL;
		}

uintptr_t lb = (uintptr_t) HstPtrBegin;		uintptr_t lb = (uintptr_t) HstPtrBegin;
uintptr_t ub = (uintptr_t) HstPtrBegin + MapSize;		uintptr_t ub = (uintptr_t) HstPtrBegin + MapSize;
Device.ShadowMtx.lock();		Device.ShadowMtx.lock();
for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();		for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();
it != Device.ShadowPtrMap.end(); ++it) {		it != Device.ShadowPtrMap.end(); ++it) {
void **ShadowHstPtrAddr = (void**) it->first;		void **ShadowHstPtrAddr = (void**) it->first;
if ((uintptr_t) ShadowHstPtrAddr < lb)		if ((uintptr_t) ShadowHstPtrAddr < lb)
continue;		continue;
if ((uintptr_t) ShadowHstPtrAddr >= ub)		if ((uintptr_t) ShadowHstPtrAddr >= ub)
break;		break;
DP("Restoring original host pointer value " DPxMOD " for host pointer "		DP("Restoring original host pointer value " DPxMOD " for host pointer "
DPxMOD "\n", DPxPTR(it->second.HstPtrVal),		DPxMOD "\n", DPxPTR(it->second.HstPtrVal),
DPxPTR(ShadowHstPtrAddr));		DPxPTR(ShadowHstPtrAddr));
*ShadowHstPtrAddr = it->second.HstPtrVal;		*ShadowHstPtrAddr = it->second.HstPtrVal;
}		}
Device.ShadowMtx.unlock();		Device.ShadowMtx.unlock();
}		}

if (arg_types[i] & OMP_TGT_MAPTYPE_TO) {		if (arg_types[i] & OMP_TGT_MAPTYPE_TO) {
DP("Moving %" PRId64 " bytes (hst:" DPxMOD ") -> (tgt:" DPxMOD ")\n",		DP("Moving %" PRId64 " bytes (hst:" DPxMOD ") -> (tgt:" DPxMOD ")\n",
arg_sizes[i], DPxPTR(HstPtrBegin), DPxPTR(TgtPtrBegin));		arg_sizes[i], DPxPTR(HstPtrBegin), DPxPTR(TgtPtrBegin));
Device.data_submit(TgtPtrBegin, HstPtrBegin, MapSize);		int rt = Device.data_submit(TgtPtrBegin, HstPtrBegin, MapSize);
		if (rt != OFFLOAD_SUCCESS) {
		DP("Copying data to device failed.\n");
		return OFFLOAD_FAIL;
		}
uintptr_t lb = (uintptr_t) HstPtrBegin;		uintptr_t lb = (uintptr_t) HstPtrBegin;
uintptr_t ub = (uintptr_t) HstPtrBegin + MapSize;		uintptr_t ub = (uintptr_t) HstPtrBegin + MapSize;
Device.ShadowMtx.lock();		Device.ShadowMtx.lock();
for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();		for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();
it != Device.ShadowPtrMap.end(); ++it) {		it != Device.ShadowPtrMap.end(); ++it) {
void **ShadowHstPtrAddr = (void**) it->first;		void **ShadowHstPtrAddr = (void**) it->first;
if ((uintptr_t) ShadowHstPtrAddr < lb)		if ((uintptr_t) ShadowHstPtrAddr < lb)
continue;		continue;
if ((uintptr_t) ShadowHstPtrAddr >= ub)		if ((uintptr_t) ShadowHstPtrAddr >= ub)
break;		break;
DP("Restoring original target pointer value " DPxMOD " for target "		DP("Restoring original target pointer value " DPxMOD " for target "
"pointer " DPxMOD "\n", DPxPTR(it->second.TgtPtrVal),		"pointer " DPxMOD "\n", DPxPTR(it->second.TgtPtrVal),
DPxPTR(it->second.TgtPtrAddr));		DPxPTR(it->second.TgtPtrAddr));
Device.data_submit(it->second.TgtPtrAddr,		rt = Device.data_submit(it->second.TgtPtrAddr,
&it->second.TgtPtrVal, sizeof(void *));		&it->second.TgtPtrVal, sizeof(void *));
		if (rt != OFFLOAD_SUCCESS) {
		DP("Copying data to device failed.\n");
		return OFFLOAD_FAIL;
		}
}		}
Device.ShadowMtx.unlock();		Device.ShadowMtx.unlock();
}		}
}		}
		return OFFLOAD_SUCCESS;
}		}

/// performs the same actions as data_begin in case arg_num is		/// performs the same actions as data_begin in case arg_num is
/// non-zero and initiates run of the offloaded region on the target platform;		/// non-zero and initiates run of the offloaded region on the target platform;
/// if arg_num is non-zero after the region execution is done it also		/// if arg_num is non-zero after the region execution is done it also
/// performs the same action as data_update and data_end above. This function		/// performs the same action as data_update and data_end above. This function
/// returns 0 if it was able to transfer the execution to a target and an		/// returns 0 if it was able to transfer the execution to a target and an
/// integer different from zero otherwise.		/// integer different from zero otherwise.
▲ Show 20 Lines • Show All 186 Lines • Show Last 20 Lines
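
The build failure mikerice reports in this file is the usual symptom of calling `va_start`/`va_end` without including `<cstdarg>` (or `stdarg.h`); some standard library headers pull it in transitively, others do not. The following is a minimal sketch of a variadic message formatter with the include in place. The helper name `FormatFatalMessage` and the fixed 256-byte buffer are assumptions for illustration, not the code added by this patch, which presumably prints the message and aborts.

```cpp
#include <cstdarg> // va_list, va_start, va_end
#include <cstdio>  // std::vsnprintf
#include <string>

// Hypothetical helper: format a printf-style message into a std::string.
// A fatal-message routine would print the result to stderr and abort.
inline std::string FormatFatalMessage(const char *fmt, ...) {
  char buf[256]; // assumed upper bound; longer messages are truncated
  va_list args;
  va_start(args, fmt);
  std::vsnprintf(buf, sizeof(buf), fmt, args);
  va_end(args);
  return std::string(buf);
}
```

For example, `FormatFatalMessage("rc=%d", 7)` yields the string `rc=7`.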

libomptarget/src/private.h

	Show All 18 Lines
	#include <cstdint>			#include <cstdint>

	extern int target_data_begin(DeviceTy &Device, int32_t arg_num,			extern int target_data_begin(DeviceTy &Device, int32_t arg_num,
	void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types);			void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types);

	extern int target_data_end(DeviceTy &Device, int32_t arg_num, void **args_base,			extern int target_data_end(DeviceTy &Device, int32_t arg_num, void **args_base,
	void **args, int64_t *arg_sizes, int64_t *arg_types);			void **args, int64_t *arg_sizes, int64_t *arg_types);

	extern void target_data_update(DeviceTy &Device, int32_t arg_num,			extern int target_data_update(DeviceTy &Device, int32_t arg_num,
	void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types);			void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types);

	extern int target(int64_t device_id, void *host_ptr, int32_t arg_num,			extern int target(int64_t device_id, void *host_ptr, int32_t arg_num,
	void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,			void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,
	int32_t team_num, int32_t thread_limit, int IsTeamConstruct);			int32_t team_num, int32_t thread_limit, int IsTeamConstruct);

	extern int CheckDeviceAndCtors(int64_t device_id);			extern int CheckDeviceAndCtors(int64_t device_id);

	// Implemented in libomp, they are called from within __tgt_* functions.			// Implemented in libomp, they are called from within __tgt_* functions.
	#ifdef __cplusplus			#ifdef __cplusplus
	extern "C" {			extern "C" {
	#endif			#endif
	int omp_get_default_device(void) __attribute__((weak));			int omp_get_default_device(void) __attribute__((weak));
	int32_t __kmpc_omp_taskwait(void *loc_ref, int32_t gtid) __attribute__((weak));			int32_t __kmpc_omp_taskwait(void *loc_ref, int32_t gtid) __attribute__((weak));
	#ifdef __cplusplus			#ifdef __cplusplus
	}			}
	#endif			#endif

	#ifdef OMPTARGET_DEBUG			#ifdef OMPTARGET_DEBUG
	extern int DebugLevel;			extern int DebugLevel;

	#define DP(...) \			#define DP(...) \
	do { \			do { \
	if (DebugLevel > 0) { \			if (DebugLevel > 0) { \
	DEBUGP("Libomptarget", __VA_ARGS__); \			DEBUGP("Libomptarget", __VA_ARGS__); \
	} \			} \
	} while (false)			} while (false)
	#else // OMPTARGET_DEBUG			#else // OMPTARGET_DEBUG
	#define DP(...) {}			#define DP(...) {}
	#endif // OMPTARGET_DEBUG			#endif // OMPTARGET_DEBUG

	#endif			#endif

libomptarget/src/rtl.cpp

Show All 40 Lines
void RTLsTy::LoadRTLs() {		void RTLsTy::LoadRTLs() {
#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
if (char *envStr = getenv("LIBOMPTARGET_DEBUG")) {		if (char *envStr = getenv("LIBOMPTARGET_DEBUG")) {
DebugLevel = std::stoi(envStr);		DebugLevel = std::stoi(envStr);
}		}
#endif // OMPTARGET_DEBUG		#endif // OMPTARGET_DEBUG

// Parse environment variable OMP_TARGET_OFFLOAD (if set)		// Parse environment variable OMP_TARGET_OFFLOAD (if set)
		TargetOffloadPolicy = OMP_TARGET_OFFLOAD_DEFAULT;
char *envStr = getenv("OMP_TARGET_OFFLOAD");		char *envStr = getenv("OMP_TARGET_OFFLOAD");
if (envStr && !strcmp(envStr, "DISABLED")) {		if (envStr) {
		if (!strcmp(envStr, "DISABLED")) {
DP("Target offloading disabled by environment\n");		DP("Target offloading disabled by environment\n");
		TargetOffloadPolicy = OMP_TARGET_OFFLOAD_DISABLED;
return;		return;
		} else if (!strcmp(envStr, "MANDATORY")) {
		DP("Target offloading forced to be mandatory by environment\n");
		TargetOffloadPolicy = OMP_TARGET_OFFLOAD_MANDATORY;
		} else if (!strcmp(envStr, "DEFAULT")) {
		DP("Target offloading forced to default policy by environment\n");
		} else {
		DP("Target offloading with unknown value\n");
		}
}		}
		Hahnfeld: There was an agreement with Intel to have a query function `__kmpc_get_target_offload`. Please use that one.

DP("Loading RTLs...\n");		DP("Loading RTLs...\n");

// Attempt to open all the plugins and, if they exist, check if the interface		// Attempt to open all the plugins and, if they exist, check if the interface
// is correct and if they are supporting any devices.		// is correct and if they are supporting any devices.
for (auto *Name : RTLNames) {		for (auto *Name : RTLNames) {
DP("Loading library '%s'...\n", Name);		DP("Loading library '%s'...\n", Name);
void *dynlib_handle = dlopen(Name, RTLD_NOW);		void *dynlib_handle = dlopen(Name, RTLD_NOW);
▲ Show 20 Lines • Show All 149 Lines • ▼ Show 20 Lines	for (auto &R : RTLs.AllRTLs) {

DP("Image " DPxMOD " is compatible with RTL %s!\n",		DP("Image " DPxMOD " is compatible with RTL %s!\n",
DPxPTR(img->ImageStart), R.RTLName.c_str());		DPxPTR(img->ImageStart), R.RTLName.c_str());

// If this RTL is not already in use, initialize it.		// If this RTL is not already in use, initialize it.
if (!R.isUsed) {		if (!R.isUsed) {
// Initialize the device information for the RTL we are about to use.		// Initialize the device information for the RTL we are about to use.
DeviceTy device(&R);		DeviceTy device(&R);

size_t start = Devices.size();		size_t start = Devices.size();
		Hahnfeld: I think this code path is not triggered when there is no matching RTL at all. At the moment this causes an `assert` (in `Debug` builds) because `TargetOffloadPolicy` is still `tgt_default`. However, I don't think we can decide on a single call to `__tgt_register_lib`, for the reason you mentioned last week: the number of available devices can change when more device images are registered. As such, I think we should move the decision handling `tgt_default` to the beginning of all interface methods that can come from a user's `target` construct. What do you think?
		AlexEichenberger (author): Agreed, you are absolutely right.
Devices.resize(start + R.NumberOfDevices, device);		Devices.resize(start + R.NumberOfDevices, device);
for (int32_t device_id = 0; device_id < R.NumberOfDevices;		for (int32_t device_id = 0; device_id < R.NumberOfDevices;
device_id++) {		device_id++) {
// global device ID		// global device ID
Devices[start + device_id].DeviceID = start + device_id;		Devices[start + device_id].DeviceID = start + device_id;
// RTL local device ID		// RTL local device ID
Devices[start + device_id].RTLDeviceID = device_id;		Devices[start + device_id].RTLDeviceID = device_id;

▲ Show 20 Lines • Show All 140 Lines • Show Last 20 Lines
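
The environment parsing shown in `RTLsTy::LoadRTLs` above can be isolated into a pure function, which makes the fallback behaviour easy to check. This is a sketch under assumptions: `OffloadPolicy`, `ParseOffloadPolicy`, and `PolicyFromEnvironment` are hypothetical names, and, as in the diff, an unset or unrecognised value is treated as the default policy. The review asks for the value to be obtained through `__kmpc_get_target_offload` instead, which this sketch does not model.

```cpp
#include <cstdlib> // std::getenv
#include <cstring> // std::strcmp

// Hypothetical enum; the diff uses OMP_TARGET_OFFLOAD_* style constants.
enum class OffloadPolicy { Default, Disabled, Mandatory };

// Map the raw variable value to a policy. nullptr (variable unset) and
// unknown strings both yield Default, mirroring the diff's final else branch.
inline OffloadPolicy ParseOffloadPolicy(const char *value) {
  if (!value)
    return OffloadPolicy::Default;
  if (!std::strcmp(value, "DISABLED"))
    return OffloadPolicy::Disabled;
  if (!std::strcmp(value, "MANDATORY"))
    return OffloadPolicy::Mandatory;
  return OffloadPolicy::Default;
}

// Read OMP_TARGET_OFFLOAD from the process environment.
inline OffloadPolicy PolicyFromEnvironment() {
  return ParseOffloadPolicy(std::getenv("OMP_TARGET_OFFLOAD"));
}
```

Like the `strcmp` calls in the diff, the comparison is case-sensitive, so a value such as `mandatory` falls through to the default policy.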