This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Add option to make offloading mandatory
ClosedPublic

Authored by jhuber6 on Feb 22 2022, 1:31 PM.

Details

Summary

Currently when we generate OpenMP offloading code we always make
fallback code for the CPU. This is necessary for implementing features
like conditional offloading and ensuring that unhandled pragmas don't
result in missing symbols. However, this is problematic for a few cases.
For offloading tests we can silently fail to the host without realizing
that offloading failed. Additionally, this makes it impossible to
provide interoperabiility to other offloading schemes like HIP or CUDA
because those methods do not provide any such host fallback guaruntee.
this patch adds the -fopenmp-offload-mandatory flag to prevent
generating the fallback symbol on the CPU and instead replaces the
function with a dummy global and the failed branch with 'unreachable'.

Diff Detail

Event Timeline

jhuber6 created this revision.Feb 22 2022, 1:31 PM
jhuber6 requested review of this revision.Feb 22 2022, 1:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2022, 1:31 PM

This is necessary for implementing features like conditional offloading and ensuring that unhandled pragmas don't result in missing symbols.

This behavior is part of the standard.

For offloading tests we can silently fail to the host without realizing that offloading failed.

It is controlled by the OMP_TARGET_OFFLOAD env variable, no? You can set this env var to mandatory to avoid this problem.

This is necessary for implementing features like conditional offloading and ensuring that unhandled pragmas don't result in missing symbols.

This behavior is part of the standard.

I believe it's reasonable to have this as an option flag to defy the standard, we have other flags that do this already (e.g. -fopenmp-cuda-mode).

For offloading tests we can silently fail to the host without realizing that offloading failed.

It is controlled by the OMP_TARGET_OFFLOAD env variable, no? You can set this env var to mandatory to avoid this problem.

Yes, I don't think we set it in the tests right now for some reason. But the main reason I made this patch is for interoperability. Without this if you want to call a CUDA function from the OpenMP device you'd need a variant and a dummy implementation. If you don't write a dummy implementation you'll get a linker error, if you don't use a variant you'll override the CUDA version.

This is necessary for implementing features like conditional offloading and ensuring that unhandled pragmas don't result in missing symbols.

This behavior is part of the standard.

I believe it's reasonable to have this as an option flag to defy the standard, we have other flags that do this already (e.g. -fopenmp-cuda-mode).

I mean it is not the implementation feature.

For offloading tests we can silently fail to the host without realizing that offloading failed.

It is controlled by the OMP_TARGET_OFFLOAD env variable, no? You can set this env var to mandatory to avoid this problem.

Yes, I don't think we set it in the tests right now for some reason.

I thought it is the default behavior. But we need to set it for offloading tests to be sure in their behavior.

But the main reason I made this patch is for interoperability. Without this if you want to call a CUDA function from the OpenMP device you'd need a variant and a dummy implementation. If you don't write a dummy implementation you'll get a linker error, if you don't use a variant you'll override the CUDA version.

Ah, ok, I see. How is supposed to be used? In Cuda code or in plain C/C++ code?

But the main reason I made this patch is for interoperability. Without this if you want to call a CUDA function from the OpenMP device you'd need a variant and a dummy implementation. If you don't write a dummy implementation you'll get a linker error, if you don't use a variant you'll override the CUDA function.

Ah, ok, I see. How is supposed to be used? In Cuda code or in plain C/C++ code?

I haven't finalized the implementation, but the basic support I've tested was calling a __device__ function compiled from another file with OpenMP, with this patch the source files would look like this for example. I think the inverse would also be possible given some code on the CUDA side. Calling CUDA kernels would take some extra work.

__device__ int cuda() { return 0; }
int cuda(void);
#pragma omp declare target device_type(nohost) to(cuda)

int main() {
  int x = 1;
#pragma omp target map(from : x)
  x = cuda();

  return x;
}

But the main reason I made this patch is for interoperability. Without this if you want to call a CUDA function from the OpenMP device you'd need a variant and a dummy implementation. If you don't write a dummy implementation you'll get a linker error, if you don't use a variant you'll override the CUDA function.

Ah, ok, I see. How is supposed to be used? In Cuda code or in plain C/C++ code?

I haven't finalized the implementation, but the basic support I've tested was calling a __device__ function compiled from another file with OpenMP, with this patch the source files would look like this for example. I think the inverse would also be possible given some code on the CUDA side. Calling CUDA kernels would take some extra work.

__device__ int cuda() { return 0; }
int cuda(void);
#pragma omp declare target device_type(nohost) to(cuda)

int main() {
  int x = 1;
#pragma omp target map(from : x)
  x = cuda();

  return x;
}

What if we have #pragma omp target if (...) or #pragma omp target device(...)?

But the main reason I made this patch is for interoperability. Without this if you want to call a CUDA function from the OpenMP device you'd need a variant and a dummy implementation. If you don't write a dummy implementation you'll get a linker error, if you don't use a variant you'll override the CUDA function.

Ah, ok, I see. How is supposed to be used? In Cuda code or in plain C/C++ code?

I haven't finalized the implementation, but the basic support I've tested was calling a __device__ function compiled from another file with OpenMP, with this patch the source files would look like this for example. I think the inverse would also be possible given some code on the CUDA side. Calling CUDA kernels would take some extra work.

__device__ int cuda() { return 0; }
int cuda(void);
#pragma omp declare target device_type(nohost) to(cuda)

int main() {
  int x = 1;
#pragma omp target map(from : x)
  x = cuda();

  return x;
}

What if we have #pragma omp target if (...) or #pragma omp target device(...)?

If we have #pragma omp target if (...) then that requires a host fallback and violates the assertion the user passed in, it will hit the unreachable and fail. If the user passed in #pragma omp target device(...) we will assume that a host implementation exists as well.

But the main reason I made this patch is for interoperability. Without this if you want to call a CUDA function from the OpenMP device you'd need a variant and a dummy implementation. If you don't write a dummy implementation you'll get a linker error, if you don't use a variant you'll override the CUDA function.

Ah, ok, I see. How is supposed to be used? In Cuda code or in plain C/C++ code?

I haven't finalized the implementation, but the basic support I've tested was calling a __device__ function compiled from another file with OpenMP, with this patch the source files would look like this for example. I think the inverse would also be possible given some code on the CUDA side. Calling CUDA kernels would take some extra work.

__device__ int cuda() { return 0; }
int cuda(void);
#pragma omp declare target device_type(nohost) to(cuda)

int main() {
  int x = 1;
#pragma omp target map(from : x)
  x = cuda();

  return x;
}

What if we have #pragma omp target if (...) or #pragma omp target device(...)?

If we have #pragma omp target if (...) then that requires a host fallback and violates the assertion the user passed in, it will hit the unreachable and fail. If the user passed in #pragma omp target device(...) we will assume that a host implementation exists as well.

Do you have a check for the last case (with the device clause)?

If we have #pragma omp target if (...) then that requires a host fallback and violates the assertion the user passed in, it will hit the unreachable and fail. If the user passed in #pragma omp target device(...) we will assume that a host implementation exists as well.

Do you have a check for the last case (with the device clause)?

In the test file I had a global, but forgot to check the globals to show that @x exists on the device. I should also put an if (0) to show that we always hit unreachable in that case.

jhuber6 updated this revision to Diff 410846.Feb 23 2022, 9:10 AM

Adding test case to check if codegen for unreachables, and an extra function to show that it is not created for the host while the other is. Also added an error message when the user specified offloading is mandatory but couldn't be created due to if(0) or a lack of triples.

Could you add a test with the device clause too?

Could you add a test with the device clause too?

Which clause exactly?

Could you add a test with the device clause too?

Which clause exactly?

device(device_id)

jhuber6 updated this revision to Diff 410851.Feb 23 2022, 9:24 AM

Adding test function with device clause

I assume it would be good to notify the user somehow about target regions, which may require execution on the host. Maybe add a note during the codegen phase?

I assume it would be good to notify the user somehow about target regions, which may require execution on the host. Maybe add a note during the codegen phase?

Technically all of them may require execution on the host according to the documentation. We could emit a warning whenever we codegen a target region with an if clause, but I feel the user should have a good enough idea that if won't work if they specifically turn on the flag that removes host execution.

This revision is now accepted and ready to land.Feb 23 2022, 9:36 AM

Thanks! Seems a good thing to add to the offloading test runner, preferably in a separate change to avoid reverting this in case of unforeseen problems

Thanks! Seems a good thing to add to the offloading test runner, preferably in a separate change to avoid reverting this in case of unforeseen problems

Could definitely do that, it doesn't seem like we test if anywhere in the OpenMP tests so it shouldn't break anything.

jhuber6 updated this revision to Diff 410919.Feb 23 2022, 1:01 PM

Guarding where we set attrs in the case that it's not a valid function now.

This revision was landed with ongoing or failed builds.Feb 23 2022, 1:45 PM
This revision was automatically updated to reflect the committed changes.