[Libomptarget] Add an external interface to dynamic shared memory
ClosedPublic

Authored by jhuber6 on Oct 1 2021, 12:11 PM.

Details

Summary

This patch adds an external interface to access the dynamic shared
memory buffer in the device runtime. The function introduced is
`llvm_omp_get_dynamic_shared`. This includes a host-side
definition that only returns a null pointer so that it can be used when
host-fallback is enabled without crashing. Support for dynamic shared
memory was also ported to the old device runtime.
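
To illustrate the host-fallback behavior described above, here is a minimal sketch of how calling code might guard against the null pointer. `host_get_dynamic_shared` is a hypothetical stand-in for the host-side definition of `llvm_omp_get_dynamic_shared` (whose exact signature is not shown in this summary), and `sum_squares` is an invented example routine, not part of the patch.

```cpp
#include <cstddef>

// Hypothetical stand-in for the host-side definition: on the host there is no
// dynamic shared memory buffer, so the function only returns a null pointer.
void *host_get_dynamic_shared() { return nullptr; }

// Invented example routine: writes i*i into a scratch buffer and sums the
// entries. It uses the dynamic shared buffer when one is available and falls
// back to caller-provided storage when the pointer is null.
int sum_squares(int n, int *fallback) {
  int *scratch = static_cast<int *>(host_get_dynamic_shared());
  if (scratch == nullptr)
    scratch = fallback; // host-fallback path: no shared buffer exists

  int total = 0;
  for (int i = 0; i < n; ++i) {
    scratch[i] = i * i;
    total += scratch[i];
  }
  return total;
}
```

The point of the sketch is only that a null return is a valid, non-crashing result on the host, so callers should check the pointer before dereferencing it.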

Event Timeline

jhuber6 requested review of this revision. Oct 1 2021, 12:11 PM
jhuber6 created this revision.
JonChesterfield added a subscriber: JonChesterfield.

Exciting! Will take a close look early next week. Surprised there's no change to the GPU plugins needed.

> Exciting! Will take a close look early next week. Surprised there's no change to the GPU plugins needed.

That was introduced in D110006. For CUDA it's easy, since the dynamic shared memory size is just an argument to the kernel launch function. I haven't implemented it for AMD yet.
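
As a rough illustration of "just an argument to the kernel launch function", the sketch below parses a user-supplied byte count and stores it in a launch configuration. Everything here is hypothetical: `LaunchConfig`, `parse_shared_bytes`, and `make_config` are invented names and do not reflect the actual plugin code in D110006. In the CUDA driver API, the corresponding launch parameter is the `sharedMemBytes` argument of `cuLaunchKernel`.

```cpp
#include <cstdlib>

// Hypothetical launch configuration; field names are illustrative only.
struct LaunchConfig {
  unsigned grid_dim;
  unsigned block_dim;
  unsigned dynamic_shared_bytes; // would be forwarded as the launch's
                                 // dynamic shared memory size
};

// Parse a decimal byte count from a string, e.g. the value of a
// user-provided setting. Returns 0 for null, empty, or malformed input.
unsigned parse_shared_bytes(const char *text) {
  if (text == nullptr || *text == '\0')
    return 0;
  char *end = nullptr;
  unsigned long value = std::strtoul(text, &end, 10);
  if (*end != '\0')
    return 0; // trailing non-digit characters: treat as malformed
  return static_cast<unsigned>(value);
}

// Build a configuration that carries the requested size alongside the
// grid and block dimensions.
LaunchConfig make_config(unsigned grid, unsigned block, const char *setting) {
  return LaunchConfig{grid, block, parse_shared_bytes(setting)};
}
```

A plugin following this shape would read the setting once, then pass `dynamic_shared_bytes` along with the other launch dimensions on every kernel launch.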

JonChesterfield added inline comments. Oct 4 2021, 8:38 AM
openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu, line 24

^ static

jhuber6 updated this revision to Diff 378193. Oct 8 2021, 6:17 AM
jhuber6 edited the summary of this revision.

Fix and ping.

JonChesterfield accepted this revision. Oct 8 2021, 7:21 AM

The plumbing here is all uncontroversial; it's just a wrapper over the OpenMP pragma.

This won't work on amdgpu as-is. We will need to pass the environment variable through to the HSA packet and see what code clang emits for the allocator construct; if that doesn't match what HIP uses, we will need to add lowering in the back end. There's nothing there that can't be done, we just need to find the time.

This revision is now accepted and ready to land. Oct 8 2021, 7:21 AM
jhuber6 added a comment. Edited Oct 8 2021, 7:24 AM

> The plumbing here is all uncontroversial; it's just a wrapper over the OpenMP pragma.
>
> This won't work on amdgpu as-is. We will need to pass the environment variable through to the HSA packet and see what code clang emits for the allocator construct; if that doesn't match what HIP uses, we will need to add lowering in the back end. There's nothing there that can't be done, we just need to find the time.

NVPTX just looks for anything matching the `extern shared x[]` pattern in the PTX and hooks that pointer up to dynamic shared memory. I'm not sure whether AMD uses a similar method, but if it does, I think all that would need to be done is to add the argument to the config struct used in the AMD plugin.