
[OpenMP] Add support for changing stack size in device RTL
Accepted · Public

Authored by jhuber6 on Sep 21 2021, 8:36 AM.

Details

Summary

This patch adds support for using dynamic shared memory in the stack
using the size introduced in D110108. This allows dynamic shared memory
to be scaled in the runtime and statically removed by DCE when the size
is zero.

Depends on D110108

Event Timeline

jhuber6 created this revision. Sep 21 2021, 8:36 AM
jhuber6 requested review of this revision. Sep 21 2021, 8:36 AM
Herald added a project: Restricted Project. Sep 21 2021, 8:36 AM
jhuber6 updated this revision to Diff 374259. Sep 22 2021, 9:24 AM

Making the usage use dynamic memory as well. Now the usage only uses as much
shared memory as there are threads in the block.

tianshilei1992 added inline comments. Sep 22 2021, 9:31 AM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
1240

+? Not *?

jhuber6 added inline comments. Sep 22 2021, 9:55 AM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
1240

This is a linear region of memory; we allocate one byte per thread to save each thread's memory usage. This allocates enough memory for the stack, plus enough memory for all threads that are active.
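
To make the arithmetic behind the `+` concrete, here is a minimal sketch of the size computation as described in this reply. It is not the code under review: the function name, parameter names, and the constant are hypothetical, and it assumes the stack size is a per-block byte count and that exactly one usage byte is reserved per thread. It needs C++14 or later.

```cpp
#include <cstddef>

// Hypothetical helper; the names below do not come from rtl.cpp.
// Computes the dynamic shared memory for one block as a single linear
// region: the stack itself, followed by one usage byte per thread.
constexpr std::size_t dynamicSharedMemoryBytes(std::size_t StackBytes,
                                               std::size_t ThreadsPerBlock) {
  // Assumption from the reply above: one byte per thread records that
  // thread's memory usage.
  constexpr std::size_t UsageBytesPerThread = 1;

  // Bookkeeping area: one slot for every thread in the block.
  const std::size_t UsageTrackingBytes = ThreadsPerBlock * UsageBytesPerThread;

  // The two areas sit side by side, so the sizes are added. Multiplying
  // would instead give every thread its own full-sized stack.
  return StackBytes + UsageTrackingBytes;
}

// A 1024-byte stack shared by a block of 128 threads needs 1152 bytes.
static_assert(dynamicSharedMemoryBytes(1024, 128) == 1152,
              "stack bytes plus one usage byte per thread");
```

Under these assumptions the region grows by one byte per additional thread, so the thread count only affects the bookkeeping portion of the total.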

jdoerfert accepted this revision. Oct 1 2021, 10:39 AM

LG. Two comments.

openmp/libomptarget/plugins/cuda/src/rtl.cpp
932

We should add a TODO here. It's unreasonable that we copy stuff from the device even though the host has the image with the information. I know this is how we do it for other stuff too, but in general it seems sub-optimal.

1241

Put these things in separate variables with an explanation of what they mean and how the size is computed. As currently written, this is just magic.

This revision is now accepted and ready to land. Oct 1 2021, 10:39 AM