This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] Active and blocking HSA wait states
AbandonedPublic

Authored by jplehr on Mar 24 2023, 3:28 PM.

Details

Summary

This patch adds the timeout-wait for HSA signal wait states. Meaning
that the thread is going into a blocking state only after a certain
timeout.
The idea of the implementation is copied from the AOMP implementation.
This came from a discussion about a hang observed in the wait() implementation
of NextGen plugin.

Diff Detail

Event Timeline

jplehr created this revision.Mar 24 2023, 3:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2023, 3:28 PM
jplehr requested review of this revision.Mar 24 2023, 3:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2023, 3:28 PM
jdoerfert added inline comments.Mar 24 2023, 3:44 PM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
501

I would think we want a Env var and all Signal behave the same, so no argument here, wdyt?

558
570

assert?

573–575
ye-luo added inline comments.Mar 24 2023, 4:46 PM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
555

const uint64_t MicrosToWait

1964

Not getting what you mean by "Example use".
Could you document the magic number 300000?

jplehr added inline comments.Mar 27 2023, 1:04 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
501

In AOMP, we actually use different values for data transfers and kernel launches.
I don't think we need this for now, but we should do some experimentation to see what works best with NextGen plugin.
So, I'll change that and make it an EnvVar.

1964

Example use is copied from AOMP.
For data transfers we use this timeout (for kernels we use a different one). As said above, probably we should remove for now, and do some experimentation.

jplehr added inline comments.Mar 27 2023, 1:25 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
570

I think the intention was that if MicrosToWait == 0 we fall back to the original behavior, but I just saw that we are now checking in two places.
If this patch is desirable, I would prefer to move the activeWaitImpl implementation into the AMDGPUSignaltTy::wait().

kevinsala added inline comments.
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
568

Nit. This assignment is not needed, right? We could define it when it's used:

hsa_signal_value_t Got = hsa_signal_wait_scacquire(...);

571

Could we make the exit condition consistent in both activeWaitImpl and waitImpl? The first is waiting until Signal != Init (1) and the second is waiting until Signal == 0.

jplehr marked an inline comment as done.Mar 27 2023, 3:58 AM
jplehr added inline comments.
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
571

Sure.
I would vote for the != version. though I'd need to see what changes are necessary. Basically, making the wait implementation logically the same as in AOMP: https://github.com/RadeonOpenCompute/llvm-project/blob/amd-stg-open/openmp/libomptarget/plugins/amdgpu/impl/impl.cpp#L16

Or are there objections?

kevinsala added inline comments.Mar 27 2023, 4:48 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
571

That new condition looks good to me. But we should extend the code documentation of these two functions to specify the new behavior. The wait operation does no longer stop when the signal is observed with a zero value, but when we stop observing the Init (default: 1) value.

Unfortunately my original problem still persists with this patch. The active wait timed out and then still stuck in the blocking wait.
Tested 3, 30, 300, 3000, 30000, 300000, 3000000 on my W6800. I have to consider some software between hsa and the hardware is broken.

Could we please not have a customisable timeout until someone argues passionately for one? It seems unlikely that it'll get much use and we'll end up carrying that branch for ages. Also environment variables are bad things.

Unfortunately my original problem still persists with this patch. The active wait timed out and then still stuck in the blocking wait.
Tested 3, 30, 300, 3000, 30000, 300000, 3000000 on my W6800. I have to consider some software between hsa and the hardware is broken.

That's too bad. Which use case are you running into the issue?

Given that is does not help the problem, I agree with @JonChesterfield and would prefer to first do some more experiments with the NextGen plugin and this to see its impact. If we come to the conclusion that having this is desirable, we can move it forward.
I will still update the patch with the comments.

jplehr abandoned this revision.May 4 2023, 2:36 AM

Abandoning in favor of https://reviews.llvm.org/D148808