This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][Libomptarget] Introduce Remote Offloading Plugin
ClosedPublic

Authored by atmnpatel on Jan 24 2021, 10:48 AM.

Details

Summary

This introduces a remote offloading plugin for libomptarget. This
implementation relies on gRPC and protobuf, so this library will only
build if both libraries are available on the system. The corresponding
server is compiled to openmp-offloading-server.

This is a large change, but the only way to split this up is into RTL/server
but I fear that could introduce an inconsistency amongst them.

Ideally, tests for this should be added to the current ones that but that is
problematic for at least one reason. Given that libomptarget registers plugin
on a first-come-first-serve basis, if we wanted to offload onto a local x86
through a different process, then we'd have to either re-order the plugin list
in rtl.cpp (which is what I did locally for testing) or find a better
solution for runtime plugin registration in libomptarget.

Diff Detail

Event Timeline

atmnpatel created this revision.Jan 24 2021, 10:48 AM
atmnpatel requested review of this revision.Jan 24 2021, 10:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2021, 10:48 AM

clangd has also machinery for detecting grpc:
https://reviews.llvm.org/D77794

jdoerfert added inline comments.Jan 25 2021, 4:38 PM
openmp/libomptarget/plugins/remote/src/Client.h
184

Remove these /// things all over the place.

openmp/libomptarget/plugins/remote/src/rtl.cpp
25

I think you can just include common/elf_common/elf_common.h now

tianshilei1992 added inline comments.Jan 25 2021, 5:09 PM
openmp/libomptarget/plugins/remote/src/rtl.cpp
25

#include "elf_common.h" instead, and then link to elf_common in CMake.

As a proof of concept, I like this. It shows roughly the plumbing needed to launch a target region on a different machine to the one that is running the main process which clearly has serious potential as a programming model.

Review gets rapidly less detailed after the first few comments. Overall this is a bit weird stylistically but the broad pattern of how the client and server fit together looks as expected.

I don't think any of this is thread safe, which is bad as the plugin can definitely be called from multiple threads on the host.

There are no tests which is unfortunate as there's a lot of subtle control flow in the patch and a lot of ways RPC can go wrong. The documentation should probably mention that 'printf' will output on the server.

Given the lack of testing, and some general scepticism about how robustly this handles things going wrong, I think this should be guarded by a cmake variable. LIBOMPTARGET_ENABLE_EXPERIMENTAL_REMOTE_PLUGIN or similar.

It looks from the valid binary test like this will claim to be a working plugin for most binaries if the server is running. The libomptarget logic for choosing a plugin picks the first plugin that claims to work. I think that will need to be refined to take multiple valid plugins into account, and perhaps surface a user interface for it.

openmp/libomptarget/plugins/remote/include/openmp.proto
43

Would probably go with I32 and I64 (or Int32, Int64) instead of Int and Long, as some platforms expect long to be 32 bits

openmp/libomptarget/plugins/remote/lib/Utils.cpp
70

std::map on a void* is a little strange - it's an ordered tree structure, so will probably be sorting by address. unordered_map?

73

malloc is a bit of a hazard in c++. This looks like it's missing a placement new.

(*Desc). to Desc-> ?

84

Not sure about this loop, seems to be at least one too many iterator. Maybe clearer as a for (size_t i = 0; ...) style one.

openmp/libomptarget/plugins/remote/src/Client.cpp
59

This is a bit sketchy. Try again on failure, when the failure cause is unknown, is not very robust. Bounding the number of events by an environment variable seems reasonable though.

In particular, it means the functions passed must all be safe to run multiple times, as the server might have run it and then fallen over, and that's not clear from the interface.

openmp/libomptarget/plugins/remote/src/rtl.cpp
32

This setup looks similar to some of the code above, perhaps worth putting in a common header

atmnpatel updated this revision to Diff 319206.Jan 25 2021, 9:47 PM
atmnpatel marked 9 inline comments as done.

Updates.

I don't think any of this is thread safe, which is bad as the plugin can definitely be called from multiple threads on the host.

Yep, it is not designed for that and I added a warning in the documentation.

The documentation should probably mention that 'printf' will output on the server.

I wrapped it so its only visible during debug/info.

Given the lack of testing, and some general scepticism about how robustly this handles things going wrong, I think this should be guarded by a cmake variable. LIBOMPTARGET_ENABLE_EXPERIMENTAL_REMOTE_PLUGIN or similar.

Agreed and Done.

It looks from the valid binary test like this will claim to be a working plugin for most binaries if the server is running. The libomptarget logic for choosing a plugin picks the first plugin that claims to work. I think that will need to be refined to take multiple valid plugins into account, and perhaps surface a user interface for it.

Yessir, ideally we get a config file at runtime that has priorities/a more explicit fallback mechanism.

@tschuett Thanks for pointing that out! I forgot to check for existing gRPC/protobuf cmake machinery. At this point however, it seems like we'd have to include clang's cmake infrastructure into openmp which seems like a decision that should be made carefully, so I'm going to leave it this way for now.

openmp/libomptarget/plugins/remote/src/Client.cpp
59

I removed it for now, it's not necessary for a MVP.

A few nits but given the opt-in semantics now I don't see a reason not to include this plugin (as we included other plugins before).

openmp/docs/SupportAndFAQ.rst
119

I think we can use a generated link here. :ref:... maybe with a custom anchor in front of the remote offloading section .. _remote_offloading:

openmp/docs/design/Runtimes.rst
321

In the beginning, say that any devices of the remote host will be exposed to the application as if they were local devices. That is you can offload to the host or GPUs of the remote machine using the appropriate device number. If the remote host is the host, you basically see every device twice, once native and through the remote interface.

openmp/libomptarget/plugins/remote/include/Utils.h
15

Address the clang tidy warning plz

48

Avoid a return type like this and prefer a struct with named members. Also consider passing such a struct as reference to avoid dynamic allocations.

98

Can we avoid the global namespace here? What if we put everything in the remoteoffloading namespace?

openmp/libomptarget/plugins/remote/lib/Utils.cpp
276

This is magic, add a comment at the top.

JonChesterfield accepted this revision.Jan 26 2021, 8:48 AM

I'm happy with this. It's a work in progress, but clearly marked experimental. My hope is it'll facilitate experimentation in this area.

It's plausible that this will spawn various uses. I'd like to run a test suite across multiple different architectures from a central point. Some people may prefer it to openmpi. If it gains traction, we can implement a robust RFC subsystem and appropriate application side config.

This revision is now accepted and ready to land.Jan 26 2021, 8:48 AM
jdoerfert accepted this revision.Jan 26 2021, 8:54 AM

LG, with the nits I mentioned before.

atmnpatel marked 6 inline comments as done.

Updates.

atmnpatel added a comment.EditedJan 26 2021, 12:12 PM

Most of these clang-tidy errors are from a mismatch between gRPC naming conventions and ours, I'm going to keep it as the gRPC ones because that's what we do elsewhere (in clangd for instance) unless there are any objections.

This revision was landed with ongoing or failed builds.Jan 26 2021, 12:34 PM
This revision was automatically updated to reflect the committed changes.