Currently, in a target task, host thread spins when invoking synchronization after kernel/transfer submission.
This patch adds LIBOMPTARGET_USE_NOWAIT_EVENT environment variable to enable the code path to unblock host thread in an deferred target task by recording an event for synchronization and calling taskyield.
Need LIBOMP_USE_HIDDEN_HELPER_TASK=0 LIBOMPTARGET_USE_NOWAIT_EVENT=1 to make this feature work nicely.
https://github.com/ye-luo/openmp-target/blob/master/hands-on/gemv/7-gemv-omp-target-many-matrices-taskloop/gemv-omp-target-many-matrices-taskloop.cpp
Is the test case I played with.
This function is defined in libomp, so it needs to be declared with the weak attribute in private.h alongside the other API functions from libomp (see private.h, the code block around line 90). Otherwise, we make libomptarget dependent on libomp, whereas we want it to be able to be build independently from any specific host OpenMP runtime.