This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] Handle offload policy in push_target_tripcount
ClosedPublic

Authored by Hahnfeld on Jul 12 2019, 3:27 AM.

Details

Summary

If the first target region in a program calls the push_target_tripcount
function, libomptarget didn't handle the offload policy correctly.
This could lead to unexpected error messages as seen in
http://lists.llvm.org/pipermail/openmp-dev/2019-June/002561.html

To solve this, add a check calling IsOffloadDisabled() as all other
entry points already do. If this method returns false, libomptarget
is effectively disabled.

Diff Detail

Repository
rL LLVM

Event Timeline

Hahnfeld created this revision.Jul 12 2019, 3:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2019, 3:27 AM

Could you provide a test?

Could you provide a test?

Ah yes, forgot to mention: We don't have tests for OMP_TARGET_OFFLOAD and I don't think that's easily possible because the host plugins don't return errors.

I do not understand what caused the error you linked. Which part of the push_tripcount call is problematic? I would have expected get_default_device to return the host id when offload is disabled and the code to not choke one the host device.

I do not understand what caused the error you linked. Which part of the push_tripcount call is problematic? I would have expected get_default_device to return the host id when offload is disabled and the code to not choke one the host device.

According to the spec omp_get_default_device returns default-device-var and I couldn't find a paragraph that implies special handling when offloading is disabled.

Consequently, omp_get_default_device always returns a "real" device and if CheckDeviceAndCtors fails (in the linked case, it was because the user didn't compile for the right architecture), the call to HandleTargetOutcome(false) will result in the error message as seen in the linked post.

I do not understand what caused the error you linked. Which part of the push_tripcount call is problematic? I would have expected get_default_device to return the host id when offload is disabled and the code to not choke one the host device.

According to the spec omp_get_default_device returns default-device-var and I couldn't find a paragraph that implies special handling when offloading is disabled.

Consequently, omp_get_default_device always returns a "real" device and if CheckDeviceAndCtors fails (in the linked case, it was because the user didn't compile for the right architecture), the call to HandleTargetOutcome(false) will result in the error message as seen in the linked post.

So why not change omp_get_default_device to return the host device number if offloading is disabled? My reasoning is: Why should "offload disabled" be different from "no offload", e.g., because there are no devices available.

I do not understand what caused the error you linked. Which part of the push_tripcount call is problematic? I would have expected get_default_device to return the host id when offload is disabled and the code to not choke one the host device.

According to the spec omp_get_default_device returns default-device-var and I couldn't find a paragraph that implies special handling when offloading is disabled.

Consequently, omp_get_default_device always returns a "real" device and if CheckDeviceAndCtors fails (in the linked case, it was because the user didn't compile for the right architecture), the call to HandleTargetOutcome(false) will result in the error message as seen in the linked post.

So why not change omp_get_default_device to return the host device number if offloading is disabled? My reasoning is: Why should "offload disabled" be different from "no offload", e.g., because there are no devices available.

So you propose to change the default-device-var ICV? Because at the moment, I think omp_get_default_device returns the value of OMP_DEFAULT_DEVICE or 0, regardless of whether the device exists or not. The user can only find out via omp_get_num_devices which will return the correct value 0 in both cases.

Sure, we can still change the ICV (although I'm not sure if the runtime library can overwrite its value with the one returned by omp_get_initial_device()), but we still don't need to call any of CheckDeviceAndCtors and friends, so we can short-circuit all of this by checking IsOffloadDisabled(). After all, setting OMP_TARGET_OFFLOAD=disabled should disable all of libomptarget.

I do not understand what caused the error you linked. Which part of the push_tripcount call is problematic? I would have expected get_default_device to return the host id when offload is disabled and the code to not choke one the host device.

According to the spec omp_get_default_device returns default-device-var and I couldn't find a paragraph that implies special handling when offloading is disabled.

Consequently, omp_get_default_device always returns a "real" device and if CheckDeviceAndCtors fails (in the linked case, it was because the user didn't compile for the right architecture), the call to HandleTargetOutcome(false) will result in the error message as seen in the linked post.

So why not change omp_get_default_device to return the host device number if offloading is disabled? My reasoning is: Why should "offload disabled" be different from "no offload", e.g., because there are no devices available.

So you propose to change the default-device-var ICV? Because at the moment, I think omp_get_default_device returns the value of OMP_DEFAULT_DEVICE or 0, regardless of whether the device exists or not. The user can only find out via omp_get_num_devices which will return the correct value 0 in both cases.

(I might not fully grasp the situation here, if so, feel free to present me a bigger picture).

I actually did not try to suggest a change of any ICV value. Let me try to explain:

My thought was that the following conditions should more or less result in the same execution:

  • offloading is disabled, e.g., through an environment variable,
  • offloading is not possible, e.g., there is no device,
  • offloading is not possible, e.g., there was a problem initializing the device.

Part of what confuses me now is:

I think omp_get_default_device returns the value of OMP_DEFAULT_DEVICE or 0, regardless of whether the device exists or not.

I would have thought that omp_get_default_device returns the host device id if there is no other one available.

[..] After all, setting OMP_TARGET_OFFLOAD=disabled should disable all of libomptarget.

I agree. That is why I think adding a conditional to __kmpc_push_target_tripcount is not a conceptual fix of the issue. (Maybe I am wrong though).
The "building blocks", or alternatively all entry points, of the library have to work fine when offloading is disabled. Fixing __kmpc_push_target_tripcount
looks like a triage of one entry point.

I do not understand what caused the error you linked. Which part of the push_tripcount call is problematic? I would have expected get_default_device to return the host id when offload is disabled and the code to not choke one the host device.

According to the spec omp_get_default_device returns default-device-var and I couldn't find a paragraph that implies special handling when offloading is disabled.

Consequently, omp_get_default_device always returns a "real" device and if CheckDeviceAndCtors fails (in the linked case, it was because the user didn't compile for the right architecture), the call to HandleTargetOutcome(false) will result in the error message as seen in the linked post.

So why not change omp_get_default_device to return the host device number if offloading is disabled? My reasoning is: Why should "offload disabled" be different from "no offload", e.g., because there are no devices available.

So you propose to change the default-device-var ICV? Because at the moment, I think omp_get_default_device returns the value of OMP_DEFAULT_DEVICE or 0, regardless of whether the device exists or not. The user can only find out via omp_get_num_devices which will return the correct value 0 in both cases.

(I might not fully grasp the situation here, if so, feel free to present me a bigger picture).

I actually did not try to suggest a change of any ICV value. Let me try to explain:

My thought was that the following conditions should more or less result in the same execution:

  • offloading is disabled, e.g., through an environment variable,
  • offloading is not possible, e.g., there is no device,
  • offloading is not possible, e.g., there was a problem initializing the device.

Agreed, and it already does.

Part of what confuses me now is:

I think omp_get_default_device returns the value of OMP_DEFAULT_DEVICE or 0, regardless of whether the device exists or not.

I would have thought that omp_get_default_device returns the host device id if there is no other one available.

No, as I described above, the routine only returns the value of the ICV. This may be a device that doesn't exist, but omp_get_default_device doesn't care for this case.

[..] After all, setting OMP_TARGET_OFFLOAD=disabled should disable all of libomptarget.

I agree. That is why I think adding a conditional to __kmpc_push_target_tripcount is not a conceptual fix of the issue. (Maybe I am wrong though).
The "building blocks", or alternatively all entry points, of the library have to work fine when offloading is disabled. Fixing __kmpc_push_target_tripcount
looks like a triage of one entry point.

All other entry points have this guard, __kmpc_push_target_tripcount is the only one where it's missing. Maybe I should have stated this upfront, sorry.

jdoerfert accepted this revision.Jul 23 2019, 4:05 AM

All other entry points have this guard, __kmpc_push_target_tripcount is the only one where it's missing. Maybe I should have stated this upfront, sorry.

That I did not know.

LGTM

This revision is now accepted and ready to land.Jul 23 2019, 4:05 AM
Hahnfeld edited the summary of this revision. (Show Details)Jul 23 2019, 7:11 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 23 2019, 7:22 AM