This is an archive of the discontinued LLVM Phabricator instance.

[libc] Enable integration tests targeting the GPU
ClosedPublic

Authored by jhuber6 on Mar 16 2023, 1:19 PM.

Details

Summary

This patch enables integration tests running on the GPU. This uses the
RPC interface implemented in D145913 to compile the necessary
dependencies for the integration test object. We can then use this to
compile the objects for the GPU directly and execute them using the AMD
HSA loader combined with its RPC server. For example, the following
commands are used to execute the integration tests:

$ clang++ --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nostdlib -flto -ffreestanding \
    crt1.o io.o quick_exit.o test.o rpc_client.o args_test.o -o image
$ ./amdhsa_loader image 1 2 5
args_test.cpp:24: Expected 'my_streq(argv[3], "3")' to be true, but is false

This currently only works with a single-threaded client implementation
running on AMDGPU. Further work will implement multiple clients for AMD
and the ability to run on NVPTX as well.

Depends on D145913

Diff Detail

Event Timeline

jhuber6 created this revision.Mar 16 2023, 1:19 PM
Herald added projects: Restricted Project, Restricted Project.Mar 16 2023, 1:19 PM
jhuber6 requested review of this revision.Mar 16 2023, 1:19 PM

The libc infra components LGTM.

libc/test/integration/startup/gpu/args_test.cpp
2

I suppose you are not using the Linux one because you don't have envp on the GPU? Or is it future work? Also, unrelated to this change, what about global constructors/destructors on the GPUs?

jhuber6 added inline comments.Mar 17 2023, 5:02 AM
libc/test/integration/startup/gpu/args_test.cpp
2

I could copy the env over as well; could do that in a follow-up patch.

jhuber6 updated this revision to Diff 506048.Mar 17 2023, 5:23 AM

We should only set the special libc.utils.gpu.loader target once for the target under test.

lntue added inline comments.Mar 17 2023, 8:38 AM
libc/cmake/modules/LLVMLibCTestRules.cmake
415–416

Nit: Maybe create a list for future extension like:

set(INTEGRATION_TARGETS linux gpu)
if (NOT (${LIBC_TARGET_OS} IN_LIST INTEGRATION_TARGETS))
...
jhuber6 updated this revision to Diff 506098.Mar 17 2023, 8:43 AM

Addressing comments

sivachandra added inline comments.Mar 17 2023, 9:13 AM
libc/test/integration/startup/gpu/args_test.cpp
2

If you copy the envp over, will the env var checks pass?

jhuber6 added inline comments.Mar 17 2023, 9:14 AM
libc/test/integration/startup/gpu/args_test.cpp
2

Presumably, I'd just need to give it the same treatment that I gave for argv.
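
For illustration, a rough sketch of what that treatment might look like on the loader side. device_alloc is a hypothetical stand-in for whatever device-visible allocation the loader actually uses (e.g. a coarse-grained HSA pool); none of these names come from the patch itself.

#include <cstddef>
#include <cstring>

// Hypothetical: returns memory visible to both host and device.
void *device_alloc(size_t size);

// Copies a null-terminated table of strings (argv or envp) into
// device-visible memory so the GPU-side main() can read it.
char **copy_string_table(char **table) {
  size_t count = 0;
  while (table[count])
    ++count;
  char **dev_table =
      static_cast<char **>(device_alloc((count + 1) * sizeof(char *)));
  for (size_t i = 0; i < count; ++i) {
    size_t len = strlen(table[i]) + 1;
    char *dev_str = static_cast<char *>(device_alloc(len));
    memcpy(dev_str, table[i], len);
    dev_table[i] = dev_str; // pointer into device-visible memory
  }
  dev_table[count] = nullptr; // argv and envp are null-terminated
  return dev_table;
}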

sivachandra accepted this revision.Mar 17 2023, 9:54 AM

OK from the libc structuring point of view.

libc/cmake/modules/LLVMLibCTestRules.cmake
415–416

Nit: maybe supported_platforms, to avoid confusion with the "target cpu" or "build target".

libc/test/integration/startup/gpu/args_test.cpp
2

You can do it as a follow up of course. At that point, can we merge the linux and gpu tests?

This revision is now accepted and ready to land.Mar 17 2023, 9:54 AM
jhuber6 added inline comments.Mar 17 2023, 9:56 AM
libc/test/integration/startup/gpu/args_test.cpp
2

Yeah, we could do that. I'm wondering if I should add some more basic tests for the GPU right now. I'd like to be able to test the functionality of the entrypoints, but I believe that's beyond the scope of these integration tests for now.

JonChesterfield accepted this revision.Mar 17 2023, 10:02 AM

Global ctor/dtor is tricky. I have a suspicion hip and openmp have done different things there. When to run the dtor is a challenge, and how many threads the ctors run with may have implementation divergence.

> Global ctor/dtor is tricky. I have a suspicion hip and openmp have done different things there. When to run the dtor is a challenge, and how many threads the ctors run with may have implementation divergence.

I guess in the context of the loader launching the kernel, a device constructor would just be a kernel called before the main kernel, and a destructor one called after it. The problem with HIP and OpenMP is that they want to do host stuff as well, so the ordering is weird since we need to run the "device" constructors after main has been called. It's certainly possible to implement, but it would need to be exported as a list of kernels to call, which is somewhat reinventing OpenMP and HIP.
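
For concreteness, a minimal sketch of that ordering under assumed names: the loader would launch a constructor kernel before the main kernel and a destructor kernel after it. Image, launch_kernel, _begin, _start, and _end are all illustrative here, not an existing interface.

struct Image; // opaque handle to the loaded GPU executable

// Assumed helper: launches the named kernel with the given arguments and
// blocks until it finishes, returning its exit value.
int launch_kernel(Image &image, const char *name, int argc = 0,
                  char **argv = nullptr);

int run_image(Image &image, int argc, char **argv) {
  launch_kernel(image, "_begin");          // global constructors
  int ret = launch_kernel(image, "_start", // the test's main()
                          argc, argv);
  launch_kernel(image, "_end");            // global destructors
  return ret;
}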

This revision was landed with ongoing or failed builds.Mar 17 2023, 10:55 AM
This revision was automatically updated to reflect the committed changes.
JonChesterfield added a comment.EditedMar 17 2023, 11:22 AM

Libc could walk the ctors array before main and the dtors after, as normal function pointers. We might even be able to persuade hip or openmp to lower their ctor/dtors as function pointers in the same arrays, where they happen to launch a special kernel that walks at least one of them.

> Libc could walk the ctors array before main and the dtors after, as normal function pointers. We might even be able to persuade hip or openmp to lower their ctor/dtors as function pointers in the same arrays, where they happen to launch a special kernel that walks at least one of them.

On Linux/ELF platforms, the libc already runs through the function pointers in .init_array and .fini_array: https://github.com/llvm/llvm-project/blob/main/libc/startup/linux/x86_64/start.cpp#L111

If the GPU side of things can follow the same pattern, then it should be straightforward to extend the Linux approach to GPUs.
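
For reference, a condensed sketch of that pattern, modeled on the linked start.cpp. The __init_array_start/__fini_array_start symbol pairs are the standard linker-provided array bounds; whether the GPU toolchain emits them the same way is an assumption, and the real Linux implementation also forwards argc/argv/env to the init callbacks.

#include <stddef.h>
#include <stdint.h>

extern "C" {
// Linker-provided bounds of the .init_array and .fini_array sections.
extern uintptr_t __init_array_start[];
extern uintptr_t __init_array_end[];
extern uintptr_t __fini_array_start[];
extern uintptr_t __fini_array_end[];
}

using Callback = void(void);

static void call_init_array_callbacks() {
  size_t count = __init_array_end - __init_array_start;
  for (size_t i = 0; i < count; ++i)
    reinterpret_cast<Callback *>(__init_array_start[i])();
}

static void call_fini_array_callbacks() {
  // Destructors run in reverse order of registration.
  size_t count = __fini_array_end - __fini_array_start;
  for (size_t i = count; i > 0; --i)
    reinterpret_cast<Callback *>(__fini_array_start[i - 1])();
}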