This is an archive of the discontinued LLVM Phabricator instance.

[Libomptarget] Begin implementing support for RPC services
ClosedPublic

Authored by jhuber6 on Jul 2 2023, 3:50 PM.

Details

Summary

This patch adds the intial support for running an RPC server in
libomptarget to handle host services. We interface with the library
provided by the libc project to stand up a basic server. We introduce
a new type that is controlled by the plugin and has each device
intialize its interface. We then run a basic server to check the RPC
buffer.

This patch does not fully implement the interface. In the future each
plugin will want to define special handlers via the interface to support
things like malloc or H2D copies coming from RPC. We will also want to
allow the plugin to specify t he number of ports. This is currently
capped in the implementation but will be adjusted soon.

Right now running the server is handled by whatever thread ends up doing
the waiting. This is probably not a completely sound solution but I am
not overly familiar with the behaviour of OpenMP tasks and what would be
required here. This works okay with synchrnous regions, and somewhat
fine with nowait regions, but I've observed some weird behavior when
one of those regions calls exit.

Diff Detail

Event Timeline

jhuber6 created this revision.Jul 2 2023, 3:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2023, 3:50 PM
jhuber6 requested review of this revision.Jul 2 2023, 3:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2023, 3:50 PM
jdoerfert added inline comments.Jul 3 2023, 10:18 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
533

This I don't get. We wait for 8k units, then we run the server once, then we wait again if the kernel is not done. Why isn't this part of the loop below with a timeout that adjusts based on the RPC needs?

1071

We can grab that information in the constructor.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
558

Can you hoist Plugin.getRPCServer(), to make it more readable.

I'm not a fan of RPCDevice as a name, we have too many "devices"

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
854

docs

openmp/libomptarget/plugins-nextgen/common/PluginInterface/RPC.cpp
70–105

Can the GenericDevice register these handlers?
A vector of them somewhere, or just let it call a register callback.
I would not make these special and define them here.
Move all into the GenericDevice and it will call an Impl method for the subclasses to register more.

135

This seems like a bad idea. We should use the device ID to index the Devices array, and we really need to rename this.

jhuber6 added inline comments.Jul 3 2023, 10:27 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
533

I wasn't sure how to mesh it with the existing logic, since we will presumably have streams that didn't come from kernel launches and we have some active timeout exception here where I guess we query at a higher rate compared to the blocking one? I just worked around it but I could try to integrate it. AMDGPU was much more difficult here than CUDA because of the indirection.

1071

I tried that, turns out we initialize these streams before we load the binary image, which is requited to setup the RPC server since we first need to know where the symbol is to initialize it.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/CMakeLists.txt
18

Forgot to remove this.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
558

Sure, I'll try to think of a better name.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
854

You mean explain this in more detail?

openmp/libomptarget/plugins-nextgen/common/PluginInterface/RPC.cpp
70–105

So this will probably only be necessary for things like malloc / memcpy. Figured those are already provided by the generic device so we should be able to hide that complexity in here right next to the implementation. Didn't feel like it warranted abstraction.

135

Yeah you can make multiple. Only reason I didn't want because you'd need to allocate them because of the references, but we can just std::make_unique.

tianshilei1992 added inline comments.Jul 3 2023, 10:45 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
259

looks like irrelevant changes

jhuber6 added inline comments.Jul 3 2023, 10:53 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
259

People somehow commit changes without clang-formatting and then it ends up here because my editor will auto format. Somewhat annoying because I've clang formatted all of these files at least twice by now. I'll trim it.

jhuber6 updated this revision to Diff 536847.Jul 3 2023, 11:01 AM
jhuber6 marked an inline comment as done.

Addressing comments

jhuber6 marked 4 inline comments as done.Jul 3 2023, 11:01 AM
jdoerfert added inline comments.Jul 3 2023, 12:40 PM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
533

I'm not convinced the thread starting the thing should run the server, but there is merit to having this option. However, the timeout still seems wrong to me. The fact that we have two locations we wait on (finish and RPC) is also not great.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
558

RPC adapter? RCP service, ...

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
259

Set your editor to only format the lines you changed. Or clang format and precommit.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/RPC.cpp
70–105

But this takes away the extension point, and splits the callbacks. We might want to use different mallocs per kernel. And having all callbacks in "one place" seems like a good idea as well.

jhuber6 added inline comments.Jul 3 2023, 1:32 PM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
533

Yeah, the alternative is to run a dedicated server thread, but I asked Shilei and he said that this should be fine. When running the thread I ran into problems, e.g. a synchronous kernel can exit without waiting for any async RPC calls to finish (exit, free, (maybe an async print variant?). Secondly when exiting it was tough to tell when the thread was alive or who should clean things up.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/RPC.cpp
70–105

I figured that this portion of the malloc would be pretty straightforward, since I think its usage should be more similar to setting the breakpoint, i.e. we only do the RPC path if we run out of some preallocated buffer space.

jhuber6 updated this revision to Diff 536904.Jul 3 2023, 2:46 PM

Merge loops, Instance -> Handle.

jdoerfert added inline comments.Jul 3 2023, 4:11 PM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
913

Name, also elsewhere.

2857–2858

auto RPCHandle = ...
set(RPCHandle)

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
259

recommit

jhuber6 updated this revision to Diff 536916.Jul 3 2023, 4:14 PM

Address comments

jhuber6 updated this revision to Diff 537884.Jul 6 2023, 2:27 PM

Add a test. This creates some transitive dependencies on the libcgpu.a library
exported from libc. I had to change it to emit it in a location that we can
find in the tests at least for the build process. Secondly, this cannot use the
wrapper headers as those are create in-place by the installation logic in the
resource dir. We'll need to declare the things on the GPU manually. Third, this
cannot have a dependency on the libcgpu.a target as that could potentially be
in another CMake invocation. So be aware that changes to that will not be
triggered by a re-run of check-openmp if the libc project was changed.

Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2023, 2:27 PM
jdoerfert accepted this revision.Jul 6 2023, 8:34 PM

LG, minor nits

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
259

unrelated. precommit

openmp/libomptarget/plugins-nextgen/common/PluginInterface/RPC.cpp
51

I'd again swap the cases, also below.

137

Hoist getDeviceId, 4 times makes it hard to read.

140

Check spelling

openmp/libomptarget/plugins-nextgen/common/PluginInterface/RPC.h
45

private:

openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
482

Swap the cases, simple first.

This revision is now accepted and ready to land.Jul 6 2023, 8:34 PM
jhuber6 updated this revision to Diff 538095.Jul 7 2023, 5:06 AM

Address nits

arsenm added a subscriber: arsenm.Jul 7 2023, 5:20 AM
arsenm added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
1145

Can you use unique_ptr?

jhuber6 added inline comments.Jul 7 2023, 5:24 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
1145

I was keeping it consistent with the GlobalHandler above. I think there was a specific reason for why we did that originally but I don't remember off the top of my head. Might've been something about ordering of the destructors.

arsenm added inline comments.Jul 7 2023, 5:26 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
1145

Also don't need the null checks