This is an archive of the discontinued LLVM Phabricator instance.

[libc] Begin implementing a library for the RPC server
ClosedPublic

Authored by jhuber6 on Mar 28 2023, 8:20 AM.

Details

Summary

This patch begins providing a generic static library that wraps around
the raw rpc.h interface. As discussed in the corresponding RFC,
https://discourse.llvm.org/t/rfc-libc-exporting-the-rpc-interface-for-the-gpu-libc/71030,
we want to begin exporting RPC services to external users. In order to
do this we decided to not expose the rpc.h header by wrapping around
its functionality. This is done with a C-interface as we make heavy use
of callbacks and allows us to provide a predictable interface.

Diff Detail

Event Timeline

jhuber6 created this revision.Mar 28 2023, 8:20 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 28 2023, 8:20 AM
jhuber6 requested review of this revision.Mar 28 2023, 8:20 AM
jhuber6 updated this revision to Diff 509038.Mar 28 2023, 8:49 AM

Make include directory public to simplify usage.

jhuber6 updated this revision to Diff 509345.Mar 29 2023, 6:46 AM

Changing implementation to use a C-based interface. This is primarily because
it's required to actually hide the implementation details of the RPC server. The
rpc.h header should not be visible outside of internal libc projects. This
also makes it easier to provide as a simple library with an expected symbol
name.

jhuber6 updated this revision to Diff 510043.Mar 31 2023, 8:02 AM

Forgot to reset memory, and ping.

jhuber6 updated this revision to Diff 528564.Jun 5 2023, 1:29 PM

Updating and rebasing.

I'm not completely happy with the interface but we can always modify it later.
This should provide what the libc developers wanted and separate the
implementation of rpc.h and the provided server. This has some downsides
compared to just exporting the full header somehow, but also has some
advantages, considering that we may not need to really provde that much custom
utility outside of libc for users, so it's easier to consoliate the
functionality in a defined interface.

Let me know what should be changed, there's a lot of cruft associated with
registering the custom handlers for malloc / free but I don't think there's
another way to get around it.

Answering the question from discord: Normally, a single header file named <some>_service.h is to be used by both the client and the server. Client's will use the client API from that header file, where as servers will use the server API. That way, the message types (or opcodes as you are calling) will be in a single shared header file.

libc/utils/gpu/server/Server.h
46 ↗(On Diff #528564)

Why are there separate functions allocation and deallocation?

51 ↗(On Diff #528564)

Update comment.

54 ↗(On Diff #528564)

Ditto.

61 ↗(On Diff #528564)

Why not just rpc_shutdown?

jhuber6 marked 3 inline comments as done.Jun 6 2023, 4:17 AM
jhuber6 added inline comments.
libc/utils/gpu/server/Server.h
46 ↗(On Diff #528564)

I'll change the name to free but we need both to adequately de allocate the shared memory.

jhuber6 updated this revision to Diff 528799.Jun 6 2023, 4:50 AM

Making suggested changes

sivachandra accepted this revision.Jun 12 2023, 10:45 PM

OK from my side but GPU review should be done by a GPU expert.

This revision is now accepted and ready to land.Jun 12 2023, 10:45 PM
libc/utils/gpu/loader/amdgpu/Loader.cpp
55

Call into the other one,

static void handle_error(rpc_status_t) {
  handle_error("Failure in the RPC server");
}
libc/utils/gpu/loader/nvptx/Loader.cpp
50

i still really dislike the copy/paste going on here

72

Does this work? It looks like the same stream running the kernel is being used to provide malloc/free, and I'd expect that to deadlock

libc/utils/gpu/server/Server.cpp
35 ↗(On Diff #528799)

Could we go with a vector of Device instead of the new+array construct?

44 ↗(On Diff #528799)

Why is this a heap allocated thing, as opposed to static State state; ?

51 ↗(On Diff #528799)

Could make the counter 64 bit and delete the test against max as a counter >= address space size can't overflow

In general the DIY reference counting is a bit odd - is there a reason this isn't a shared_ptr?

107 ↗(On Diff #528799)

I'm still hopeful that we'll come up with a better idea than rpc::MAX_LANE_SIZE

libc/utils/gpu/server/Server.h
57 ↗(On Diff #528799)

This looks like the type of a function, not the type of a function pointer. It's used as an argument to functions where it'll decay to the pointer type. More conventionally written with an extra *

typedef void(*rpc_free_ty)(void *ptr, void *data);

Is there a benefit to declaring this as the function type as opposed to the function pointer type?

70 ↗(On Diff #528799)

Want (void) if this is meant to be usable from C (the guards about suggest it is)

C++ thinks foo() is a function of no arguments. C thinks it's some aberration from the past (though that might have been dropped in the last standard).

jhuber6 added inline comments.Jun 13 2023, 12:33 PM
libc/utils/gpu/loader/nvptx/Loader.cpp
72

There's a test for this that's been running on https://lab.llvm.org/buildbot/#/builders/46 for a few weeks now and it hasn't deadlocked as far as I can tell. It's a completely separate stream called memory_stream that's just created here. The one running the kernel is just called stream. This requires CUDA 11.2 IIRC.

libc/utils/gpu/server/Server.cpp
35 ↗(On Diff #528799)

It's a static size so a constant sized array should be more correct.

44 ↗(On Diff #528799)

It's just easier to check if it's been initialized because the pointer is nullable. We coiuld probably make it a static thing and have a flag instead if you'd like.

51 ↗(On Diff #528799)

Would a shared pointer give us the same semantics? We would be allocating it multiple times and not copying it.

107 ↗(On Diff #528799)

We could use a vector and push back into it instead, or preallocate according to the size above, but this was the easiest solution.

jhuber6 marked 5 inline comments as done.Jun 14 2023, 1:27 PM
jhuber6 added inline comments.
libc/utils/gpu/server/Server.h
57 ↗(On Diff #528799)

I just forgot to add it and my IDE took care of the conversions when I got an error, will change.

70 ↗(On Diff #528799)

Forgot about that quirk of C, thanks.

jhuber6 updated this revision to Diff 531484.Jun 14 2023, 1:27 PM
jhuber6 marked 2 inline comments as done.

Addressing comments

JonChesterfield accepted this revision.Jun 14 2023, 3:03 PM

Interesting that the cuda path doesn't deadlock, I wonder if that's a fix from previous cuda revisions. Thanks

Interesting that the cuda path doesn't deadlock, I wonder if that's a fix from previous cuda revisions. Thanks

Appreciate the thorough review.

Yeah I heard about that problem as well which made me nervous about how well we could support a real malloc in the future. But pleasingly it's been running on the sm_70 and sm_60 testers for awhile and it seems good as long as your CUDA is new enough. As it stands we'll probably need to turn this function into a hard error if we integrate it into OpenMP and CUDA is too old.

This revision was landed with ongoing or failed builds.Jun 15 2023, 9:02 AM
This revision was automatically updated to reflect the committed changes.