This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Only initialize a single queue/stream/event eagerly
AbandonedPublic

Authored by jdoerfert on Jun 2 2023, 2:43 PM.

Details

Summary

Initialization for many queues/streams/events might come at a cost even
if we do not use them. This patch lazily initializes them, otherwise
nothing (major) is supposed to change. A minor difference is the
handling of an error in the initialization of an AMD queue (other than
the first). We now report an error but continue with the first queue;
unsure if this will ever come up.

Diff Detail

Event Timeline

jdoerfert created this revision.Jun 2 2023, 2:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2023, 2:43 PM
jdoerfert requested review of this revision.Jun 2 2023, 2:43 PM
tianshilei1992 added inline comments.Jun 2 2023, 3:25 PM
openmp/docs/design/Runtimes.rst
1200

FWIW, events are used more frequently than streams even for single thread offloading.

FWIW, currently talking to @jplehr and @mhalk about the queue <-> stream mapping. Might be a follow up.

openmp/docs/design/Runtimes.rst
1200

True, not sure what the right value is. I can keep them at 32 if ppl prefer.

mhalk added inline comments.Jun 6 2023, 4:20 AM
openmp/docs/design/Runtimes.rst
1200

Just wanted to point out (since it was not directly obvious to me):
for value LIBOMPTARGET_NUM_INITIAL_STREAMS=n, getNextQueue will be called n times.

So, the patch would have to be adapted if LIBOMPTARGET_NUM_INITIAL_STREAMS will be increased.
Otherwise, one would have again n <= OMPX_NumQueues HSA queues from the get-go, even if they're not used.

kevinsala added inline comments.
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
585

Is there any reason to maintain both init and initLazy instead of making init lazy by default?

I am unsure we need this after all. @mhalk, can u check, if we don't need this we can scrap it.

mhalk added a comment.Jul 28 2023, 5:20 AM

Just looked through this patch and D154523.

Don't have a clear "yes"/"no".
IMO merging the two makes sense as they achieve the same: "single HSA queue eager init".

The busy tracking will revert like half of this patch's changes in amdgpu/src/rtl.cpp.
Should you tend to scrap it, I will have to incorporate ~30ish LoC from this one.
That is, the docs & setting of default/initial values (esp. outside of amdgpu/src/rtl.cpp).

jdoerfert abandoned this revision.Aug 1 2023, 1:23 PM

Subsumed by D154523.