This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add the '--threads' and '--blocks' options to the GPU loaders
ClosedPublic

Authored by jhuber6 on Apr 16 2023, 5:53 PM.

Details

Summary

We will want to test the GPU libc with multiple threads in the future.
This patch adds the --threads and --blocks options to set the x
dimension of the block and grid sizes, respectively. We use CUDA
terminology rather than OpenCL for familiarity.
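A hypothetical invocation could then look like the following (the loader and test binary names here are illustrative, not part of this patch):

nvptx-loader --threads 64 --blocks 4 ./gpu_test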

Depends on D148288 D148342

Diff Detail

Event Timeline

jhuber6 created this revision. Apr 16 2023, 5:53 PM
Herald added projects: Restricted Project, Restricted Project. Apr 16 2023, 5:53 PM
jdoerfert accepted this revision. Apr 17 2023, 10:53 AM

LG, one nit.

libc/utils/gpu/loader/Main.cpp
39

Move this first, then emit an error if the file was not opened.
Right now, random arguments will be silently ignored.
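Something along these lines, as a rough sketch (the argument handling is illustrative, not the actual Main.cpp):

#include <cstdio>
#include <cstdlib>
#include <cstring>

int main(int argc, char **argv) {
  if (argc < 2) {
    std::fprintf(stderr, "usage: loader <image> [--threads N] [--blocks N]\n");
    return EXIT_FAILURE;
  }

  // Open the input image up front and fail loudly if it cannot be read.
  std::FILE *image = std::fopen(argv[1], "r");
  if (!image) {
    std::fprintf(stderr, "error: could not open '%s'\n", argv[1]);
    return EXIT_FAILURE;
  }

  unsigned threads = 1, blocks = 1;
  for (int i = 2; i < argc; ++i) {
    if (!std::strcmp(argv[i], "--threads") && i + 1 < argc)
      threads = std::atoi(argv[++i]);
    else if (!std::strcmp(argv[i], "--blocks") && i + 1 < argc)
      blocks = std::atoi(argv[++i]);
    else {
      // Reject anything unrecognized instead of silently ignoring it.
      std::fprintf(stderr, "error: unknown argument '%s'\n", argv[i]);
      return EXIT_FAILURE;
    }
  }

  // ... load 'image' and launch the kernel with <blocks, threads> ...
  std::fclose(image);
  return EXIT_SUCCESS;
}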

This revision is now accepted and ready to land. Apr 17 2023, 10:53 AM
jhuber6 updated this revision to Diff 514327. Apr 17 2023, 11:15 AM

Addressing comments

tra added inline comments. Apr 17 2023, 11:30 AM
libc/utils/gpu/loader/Main.cpp
33

Nit: Can we use strtoul instead? Sometimes it's convenient to be able to use hex numbers, and you would also get an indication when the input is not a number.
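A small sketch of what that could look like; the helper name is made up, base 0 accepts decimal, octal, and 0x-prefixed hex, and the end pointer catches non-numeric input:

#include <cstdio>
#include <cstdlib>

// Hypothetical helper: parse a numeric option value, accepting hex input
// and diagnosing arguments that are not numbers at all.
static unsigned long parse_unsigned(const char *arg, const char *flag) {
  char *end = nullptr;
  unsigned long value = std::strtoul(arg, &end, /*base=*/0);
  if (end == arg || *end != '\0') {
    std::fprintf(stderr, "error: '%s' expects a number, got '%s'\n", flag, arg);
    std::exit(EXIT_FAILURE);
  }
  return value;
}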

libc/utils/gpu/loader/nvptx/Loader.cpp
160–164

If we're allowing controlling the number of blocks/threads at all, is there a reason not to allow specifying all dimensions?

jhuber6 added inline comments. Apr 17 2023, 11:35 AM
libc/utils/gpu/loader/Main.cpp
33

Good point, I'm just used to using the quick-and-dirty atoi.

libc/utils/gpu/loader/nvptx/Loader.cpp
160–164

I wasn't sure if I should bother, but I could definitely add it since it's hardly any more work than what's here.

I think internally our implementations will need to generate all of their thread IDs using the full dimensions as well, but that's probably standard.
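For reference, the usual linearization over all three dimensions looks roughly like this (generic C++ with the indices passed in as parameters; a real implementation would read them from the hardware intrinsics instead):

#include <stdint.h>

// Sketch: collapse a 3-D (block, thread) position into one flat thread ID.
// tid_*/bid_* are thread/block indices, dim_* the block dimensions, and
// grid_* the grid dimensions.
uint64_t flat_thread_id(uint32_t tid_x, uint32_t tid_y, uint32_t tid_z,
                        uint32_t bid_x, uint32_t bid_y, uint32_t bid_z,
                        uint32_t dim_x, uint32_t dim_y, uint32_t dim_z,
                        uint32_t grid_x, uint32_t grid_y) {
  // Threads within a block are laid out x-fastest, as are blocks in the grid.
  uint64_t threads_per_block = uint64_t(dim_x) * dim_y * dim_z;
  uint64_t thread_in_block = tid_x + dim_x * (tid_y + uint64_t(dim_y) * tid_z);
  uint64_t block_in_grid = bid_x + grid_x * (bid_y + uint64_t(grid_y) * bid_z);
  return block_in_grid * threads_per_block + thread_in_block;
}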

tra added inline comments. Apr 17 2023, 11:42 AM
libc/utils/gpu/loader/nvptx/Loader.cpp
160–164

That's one common pattern. However, there are also use cases where small kernels benefit performance-wise from being able to use the x/y/z indices directly, without having to compute a single thread ID and then split it back into sub-indices.

There's also a limit on how large the individual dimensions can be, so specifying a single one may not be sufficient for large inputs:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications-technical-specifications-per-compute-capability
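For context, a launch through the CUDA driver API (which the nvptx loader uses) already takes all six dimensions, plus a dynamic shared-memory size that could back a future option. A rough sketch, with illustrative names and no error handling:

#include <cuda.h>

// Illustrative wrapper around the driver-API launch; 'kernel', 'stream', and
// 'args' are assumed to be set up elsewhere in the loader.
CUresult launch_kernel(CUfunction kernel, CUstream stream, void **args,
                       unsigned blocks_x, unsigned blocks_y, unsigned blocks_z,
                       unsigned threads_x, unsigned threads_y,
                       unsigned threads_z) {
  // The sharedMemBytes argument is where a dynamic shared memory option
  // would plug in.
  return cuLaunchKernel(kernel, blocks_x, blocks_y, blocks_z, threads_x,
                        threads_y, threads_z, /*sharedMemBytes=*/0, stream,
                        args, /*extra=*/nullptr);
}

Being able to spread blocks across y and z also sidesteps the per-dimension limits mentioned above.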

jhuber6 updated this revision to Diff 514364. Apr 17 2023, 12:37 PM

Adding options for the y and z dimensions.

tra accepted this revision. Apr 17 2023, 12:42 PM

LGTM.

Do we want to allow specifying the amount of shared memory as well?

> LGTM.
>
> Do we want to allow specifying the amount of shared memory as well?

I'm not sure we'll have a use for dynamic shared memory in libc yet. It might be worthwhile for testing, though, since this is a generic kernel launcher.

jhuber6 updated this revision to Diff 514370. Apr 17 2023, 12:45 PM

Forgot to switch to strtoul.

sivachandra accepted this revision. Apr 19 2023, 12:20 AM

This is outside my area of expertise, so rubber stamping.

This revision was landed with ongoing or failed builds. Apr 19 2023, 6:02 AM
This revision was automatically updated to reflect the committed changes.