Previously for nowait target, CG emitted a function call to __tgt_target_nowait, etc. However, in OpenMP RTL, these functions just directly call the no-nowait version, which means nowait is not working as expected.
OpenMP specification says a target is acutally a target task, which is an untied and detachable task. It is natural to go to the direction that generates a task for a nowait target. However, OpenMP task has a problem that it must be within to a parallel region; otherwise the task will be executed immediately. As a result, if we directly wrap to a regular task, the target nowait outside of a parallel region is still a synchronous version.
In D77609, I added the support for unshackled task in OpenMP RTL. Basically, unshackled task is a task that is not bound to any parallel region. So all nowait target will be tranformed into an unshackled task. In order to distinguish from regular task, a new flag bit is set for unshackled task. This flag will be used by RTL for later process.
Since all target tasks are allocated via __kmpc_omp_target_task_alloc, and in current libomptarget, __kmpc_omp_target_task_alloc just calls __kmpc_omp_task_alloc. Therefore, we can modify the flag in __kmpc_omp_target_task_alloc so that we don't need to modify the FE too much. If users choose to opt out the feature, they just need to use a RTL w/o support of unshackled threads.
As a result, in this patch, the target nowait region is simply wrapped into a regular task. Later once we have RTL support for unshackled tasks, the wrapped tasks can be executed by unshackled threads w/o changes in the FE.