Add interop related functionalities for OpenMP AMD GPU plugin, including get
async queue, get device reference and get backend runtime's ref ID.
Depends on D155692
Differential D137607
[OpenMP][AMDGPU] Add interop support for OpenMP AMD GPU plugin jz10 on Nov 7 2022, 9:24 PM. Authored by
Details
Diff Detail Event TimelineComment Actions Where are the tests?
Comment Actions Thanks Johannes, I addressed your comments as follows:
I revised the interface and API name
fixed
fixed
this is the test code that was not removed, fixed
fixed, see below
I revised the implementation of "tgt_interop_init" based on its original workflow. based on these steps, we currently don't merge the omp_interop_val_t object creation together with init_device_info,
fixed
I added the test case as openmp/libomptarget/test/api/test_interop_amdgpu.c Comment Actions So, keeping the original flow is not as important as designing it properly.
Comment Actions Thanks Johannes,
i. handle task dependencies So the main changes is at step ii and iii, since the functionality of deviceIsReady is to check: 1. if the RTL object (backend runtime) was created or not ; 2. if the RTL initOnce was successfully invoked and the isInit flag was set, if neither of those 2 conditions satisfied, the interop can not provide any valuable information to user (e.g. the queue object or device information), so we just set interop object as omp_interop_none if the invocation of deviceIsReady failed;
This needs the plugin to be aware about the interop related data structure and APIs, I'm not sure if that is right way to manage interop related content
This is what we are doing so far. Comment Actions With the new changes, we return omp_interop_none a lot w/o telling the user what went wrong. Why not create an object and set the error string as we did before? Comment Actions Thanks Johannes
The check of deviceIsReady returns omp_interop_val_t object with error message if device was not ready
A new API "get_interop_val " creates omp_interop_val_t from plugin directly, so this also simplifies the code in "__tgt_interop_init". The API "get_driver_type" is removed. Comment Actions This is still too complex and inconsistent:
Comment Actions Thanks Johannes
The "tgt_rtl_init_device_info" and "tgt_rtl_init_async_info" were from CUDA plugin, so I'm not sure if we should remove "__tgt_rtl_init_device_info" since it was not used ourside.
fixed
fixed, but the checking of API symbols from dynamic lib still returns omp_interop_none,
Made it, but here's one issue, the "vendor_id" and "backend_type_id" used to be const type, but we made them as non-constant now.
fixed
This checks if the dynamic lib was loaded successfully or not, if not, then no interop capacity is available, thus omp_interop_none should be enough
this is same as 1st issue above Comment Actions
I'm now very confused what public APIs exist and which ones are actually needed/used. We should have a single one, but I'm fairly sure that's not the case, right? Why not. I don't understand the sentence I quoted. Maybe elaborate a little more if you think we should have more than one external plugin API.
Comment Actions Thanks Johannes, here are replies
Sorry that brings this confusion. What I was trying to say is plugins/cuda has "tgt_rtl_init_async_info" and "tgt_rtl_init_device_info", so I added those two APIs to plugins/amdgpu accordingly. To my understanding of your suggestion (maybe wrong) , is to get rid of "__tgt_rtl_init_device_info" could be removed since it is not invoked at somewhere else
A nullptr queue is not an error code. It is a perfectly valid state meaning there is no outstanding async work. Check the CUDA impl, it will just return OFFLOAD_FAIL if the DeviceId is invalid, do the same. fixed
added a new enum as 'amdhsa', since 'hsa' has been used for namespace
fixed, but there are some other legacy code that still use ("...")
fixed Comment Actions
yes, please. Comment Actions Thanks Johannes, I removed the __tgt_init_device_info API related definitions. Please check if that works. Comment Actions The commit currently mentions me as an author, I believe that this should not be the case, as I did not do the original work and only part of the porting.
|
Why don't we "just" return the int32_t? The indirection seems counterintuitive to me. Also, "DriverInfo" is not really descriptive. Can you specify what it returns, wrt. standard defined structs etc.