Page MenuHomePhabricator

Add interop support for OpenMP AMD GPU plugin
Needs ReviewPublic

Authored by jz10 on Nov 7 2022, 9:24 PM.

Details

Reviewers
jdoerfert
Summary

Add interop related functionalities for OpenMP AMD GPU plugin, including get async queue, get device reference and get backend runtime's ref ID

Diff Detail

Event Timeline

jz10 created this revision.Nov 7 2022, 9:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2022, 9:24 PM
jz10 requested review of this revision.Nov 7 2022, 9:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2022, 9:24 PM
jdoerfert added inline comments.Nov 11 2022, 5:09 PM
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2698

No return after else.

2709

Reverse the cases, small case first.

jz10 updated this revision to Diff 474938.Nov 12 2022, 4:31 AM

Thanks Johannes, your comments were addressed in this revision

Where are the tests?

openmp/libomptarget/include/omptargetplugin.h
167

Why don't we "just" return the int32_t? The indirection seems counterintuitive to me. Also, "DriverInfo" is not really descriptive. Can you specify what it returns, wrt. standard defined structs etc.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2689

Please don't cast to "int". If anything, pick a explicit size.

2698

No else after return (is what I meant). Also other places.

openmp/libomptarget/src/interop.cpp
103

This seems odd. Is the runtime and the vendor the same thing for us?

235

Do you know it's null or allocated here?

235–237

Why is the ready check happening after the rest, could you explain the problem here?

250

Why do we need this new function if we call init device info afterwards anyway? It seems we now have two functions we always call in sequence to get two numbers out of the plugin. If so, we should merge them.

251

Leftover?

jz10 updated this revision to Diff 475433.Nov 15 2022, 5:22 AM

Thanks Johannes, I addressed your comments as follows:

  1. "Why don't we "just" return the int32_t? The indirection seems counterintuitive to me. Also, "DriverInfo" is not really descriptive. Can you specify what it returns, wrt. standard defined structs etc."

I revised the interface and API name

  1. "Please don't cast to "int". If anything, pick a explicit size."

fixed

  1. "No else after return (is what I meant). Also other places."

fixed

  1. "This seems odd. Is the runtime and the vendor the same thing for us?"

this is the test code that was not removed, fixed

  1. Do you know it's null or allocated here?

fixed, see below

  1. "Why do we need this new function if we call init device info afterwards anyway? It seems we now have two functions we always call in sequence to get two numbers out of the plugin. If so, we should merge them."

I revised the implementation of "tgt_interop_init" based on its original workflow.
The original workflow for "
tgt_interop_init" is :
i. handle task dependencies
ii. create omp_interop_val_t, but the interop type was constantly set as CUDA (this is why we need to retrieve the driver type from plugin)
iii. check if the device is ready, if not, set the error string but preserve the omp_interop_val_t ;
iv. call init_device_info to setup the Context and Device fields for tgt_device_info object, if this failed, then destroy omp_interop_val_t;
v. if the interop type is kmp_interop_type_tasksync, then call init_async_info to setup the async Queue for
tgt_async_info object, if this failed, then destroy omp_interop_val_t.

based on these steps, we currently don't merge the omp_interop_val_t object creation together with init_device_info,

  1. "Why is the ready check happening after the rest, could you explain the problem here?"

fixed

  1. "Where is test case"

I added the test case as openmp/libomptarget/test/api/test_interop_amdgpu.c

  1. "Why do we need this new function if we call init device info afterwards anyway? It seems we now have two functions we always call in sequence to get two numbers out of the plugin. If so, we should merge them."

I revised the implementation of "tgt_interop_init" based on its original workflow.
The original workflow for "
tgt_interop_init" is :
i. handle task dependencies
ii. create omp_interop_val_t, but the interop type was constantly set as CUDA (this is why we need to retrieve the driver type from plugin)
iii. check if the device is ready, if not, set the error string but preserve the omp_interop_val_t ;
iv. call init_device_info to setup the Context and Device fields for tgt_device_info object, if this failed, then destroy omp_interop_val_t;
v. if the interop type is kmp_interop_type_tasksync, then call init_async_info to setup the async Queue for
tgt_async_info object, if this failed, then destroy omp_interop_val_t.

So, keeping the original flow is not as important as designing it properly.
Right now, if the device is not ready, you still call into the device RTL.
Either we can always do that, and therefore can always call init_device_info to initialize the thing in the plugin, or we cannot always do that, and should consequently not call get_driver_type if the device is not ready.
To determine which, we need to look at the "non-ready" case and what that means. If the latter is the case, we can initialize the interop type with "unknown" instead of CUDA.
Does that make sense?

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2708

When I said, no return after else, I mean, remove the "else" not the "return". You should take a look at the LLVM coding style guide.

if (...) {
  ...
  return A;
} else {
  ...
}
return B;

should be

if (...) {
  ...
  return A;
}
...
return B;

You still have the former pattern in your code and you now again have the complicated case first. It's easier to read if you have

if (...) {
  // short
  return;
}
//long

than

if (...) {
  // long
} else {
  // short
  return;
}
jz10 updated this revision to Diff 476120.Nov 17 2022, 7:26 AM

Thanks Johannes,

  1. for the "else and return" thing, I fixed the code style based on you suggestion, please check if that works
  1. for the workflow of __tgt_interop_init, I agree with you point and revised the workflow as:

i. handle task dependencies
ii. check if the device is ready, if not, set the interop object as omp_interop_none;
iii. create omp_interop_val_t that is corresponding the actual type of backend runtime, it has two types currently CUDA and HIP, other backend runtime support are not fully available yet;
iv. call init_device_info to setup the Context and Device fields for tgt_device_info object, if this failed, then destroy omp_interop_val_t;
v. if the interop type is kmp_interop_type_tasksync, then call init_async_info to setup the async Queue for tgt_async_info object, if this failed, then destroy omp_interop_val_t.

So the main changes is at step ii and iii, since the functionality of deviceIsReady is to check: 1. if the RTL object (backend runtime) was created or not ; 2. if the RTL initOnce was successfully invoked and the isInit flag was set, if neither of those 2 conditions satisfied, the interop can not provide any valuable information to user (e.g. the queue object or device information), so we just set interop object as omp_interop_none if the invocation of deviceIsReady failed;

  1. "Either we can always do that, and therefore can always call init_device_info to initialize the thing in the plugin,"

This needs the plugin to be aware about the interop related data structure and APIs, I'm not sure if that is right way to manage interop related content

  1. "or we cannot always do that, and should consequently not call get_driver_type if the device is not ready."

This is what we are doing so far.

With the new changes, we return omp_interop_none a lot w/o telling the user what went wrong. Why not create an object and set the error string as we did before?
You also still have two entry points into the plugin, we do not need that. The omp_interop_val_t can be initialized by a single init call into the plugin. This will not only remove all the boilerplate of a new plugin API, but also cut down on the complexity of the initialization code in libomptarget.

jz10 updated this revision to Diff 476677.Nov 19 2022, 7:04 AM

Thanks Johannes

  1. "With the new changes, we return omp_interop_none a lot w/o telling the user what went wrong. Why not create an object and set the error string as we did before?"

The check of deviceIsReady returns omp_interop_val_t object with error message if device was not ready

  1. "You also still have two entry points into the plugin, we do not need that. The omp_interop_val_t can be initialized by a single init call into the plugin. This will not only remove all the boilerplate of a new plugin API, but also cut down on the complexity of the initialization code in libomptarget."

A new API "get_interop_val " creates omp_interop_val_t from plugin directly, so this also simplifies the code in "__tgt_interop_init". The API "get_driver_type" is removed.

This is still too complex and inconsistent:

  1. Remove the old plugin API.
  2. Always allocate an interop object in one place, not in 3, default initialize it.
  3. If something goes wrong, set an error message and return the object.
  4. Pass the interop to the plugin to be initialized.
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2692

Here you write *AsyncInfo but only below you allocate memory for it. Either there is an object and you overwrite it below or there is not and this will segfault.

openmp/libomptarget/src/interop.cpp
103

^^^

211

This API is now unused from the outside, right? Can we get rid of it?

222

No error message here?

jz10 updated this revision to Diff 476739.Nov 20 2022, 3:37 AM

Thanks Johannes

  1. "Remove the old plugin API."

The "tgt_rtl_init_device_info" and "tgt_rtl_init_async_info" were from CUDA plugin, so I'm not sure if we should remove "__tgt_rtl_init_device_info" since it was not used ourside.

  1. "Always allocate an interop object in one place, not in 3, default initialize it."

fixed

  1. "If something goes wrong, set an error message and return the object."

fixed, but the checking of API symbols from dynamic lib still returns omp_interop_none,

  1. "Pass the interop to the plugin to be initialized."

Made it, but here's one issue, the "vendor_id" and "backend_type_id" used to be const type, but we made them as non-constant now.

  1. "Here you write *AsyncInfo but only below you allocate memory for it. Either there is an object and you overwrite it below or there is not and this will segfault."

fixed

  1. openmp/libomptarget/src/interop.cpp ln;209 "No error message here?"

This checks if the dynamic lib was loaded successfully or not, if not, then no interop capacity is available, thus omp_interop_none should be enough

  1. openmp/libomptarget/src/interop.cpp ln;211 "This API is now unused from the outside, right? Can we get rid of it?"

this is same as 1st issue above

The "tgt_rtl_init_device_info" and "tgt_rtl_init_async_info" were from CUDA plugin, so I'm not sure if we should remove "__tgt_rtl_init_device_info" since it was not used ourside.

I'm now very confused what public APIs exist and which ones are actually needed/used. We should have a single one, but I'm fairly sure that's not the case, right? Why not. I don't understand the sentence I quoted. Maybe elaborate a little more if you think we should have more than one external plugin API.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2696

There is an assert followed by a check, that seems wrong. Either it's a pre-condition or it's something we want to handle gracefully.

2696

A nullptr queue is not an error code. It is a perfectly valid state meaning there is no outstanding async work. Check the CUDA impl, it will just return OFFLOAD_FAIL if the DeviceId is invalid, do the same.

2720–2726

There is an assert followed by a check, that seems wrong. Either it's a pre-condition or it's something we want to handle gracefully.

2729–2731
2742

We are actually not using hip at all. There is no backend or vendor part that is hip. We use HSA. Add a new enum value for both.

openmp/libomptarget/src/interop.cpp
71

Why the ( )?

openmp/libomptarget/test/api/omp_interop_amdgpu.c
78

Clang format this file please.

jz10 updated this revision to Diff 479522.Dec 1 2022, 10:03 PM

Thanks Johannes, here are replies

  1. "I'm now very confused what public APIs exist and which ones are actually needed/used. We should have a single one, but I'm fairly sure that's not the case, right? Why not. I don't understand the sentence I quoted. Maybe elaborate a little more if you think we should have more than one external plugin API."

Sorry that brings this confusion. What I was trying to say is plugins/cuda has "tgt_rtl_init_async_info" and "tgt_rtl_init_device_info", so I added those two APIs to plugins/amdgpu accordingly. To my understanding of your suggestion (maybe wrong) , is to get rid of "__tgt_rtl_init_device_info" could be removed since it is not invoked at somewhere else

  1. "There is an assert followed by a check, that seems wrong. Either it's a pre-condition or it's something we want to handle gracefully.

A nullptr queue is not an error code. It is a perfectly valid state meaning there is no outstanding async work. Check the CUDA impl, it will just return OFFLOAD_FAIL if the DeviceId is invalid, do the same.
There is an assert followed by a check, that seems wrong. Either it's a pre-condition or it's something we want to handle gracefully."

fixed

  1. "We are actually not using hip at all. There is no backend or vendor part that is hip. We use HSA. Add a new enum value for both."

added a new enum as 'amdhsa', since 'hsa' has been used for namespace

  1. "Why the ( )?"

fixed, but there are some other legacy code that still use ("...")

  1. "Clang format this file please"

fixed

, is to get rid of "__tgt_rtl_init_device_info" could be removed since it is not invoked at somewhere else

yes, please.

jz10 updated this revision to Diff 479837.Dec 3 2022, 7:05 AM

Thanks Johannes, I removed the __tgt_init_device_info API related definitions. Please check if that works.