The basic design is to create an outermost parallel team. It is not a regular team: it is created only when the first unshackled task is encountered, and it is responsible only for executing unshackled tasks. We first use pthread_create to create a new thread, which serves as both the initial thread and the master thread of the unshackled team. This initial thread then initializes a new root, just as the RTL does during its own initialization. After that, it calls __kmpc_fork_call directly, as if the initial thread had encountered a parallel region. The wrapped function for this team behaves differently depending on the thread. The master thread (the initial thread created via pthread_create on Linux) waits on a condition variable that is signaled only when the RTL is being destroyed. The other worker threads do nothing in the wrapped function. The master thread has to wait there because, in the current implementation, once the master thread finishes the wrapped function of this team, it starts to free the team, which is not what we want.
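The waiting behavior of the master thread can be illustrated with a minimal pthread sketch. It is not the code in this patch: every name below (shutdown_lock, shutdown_cond, unshackled_master, signal_shutdown, run_demo) is hypothetical, and root initialization and the __kmpc_fork_call step are left out. It only shows a thread that blocks on a condition variable until a teardown routine signals it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of these are libomp identifiers. */
static pthread_mutex_t shutdown_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t shutdown_cond = PTHREAD_COND_INITIALIZER;
static bool shutting_down = false;
static bool master_returned = false;

/* Stand-in for the wrapped function as run by the master thread:
   block until teardown is signaled, so the team is not freed early. */
static void *unshackled_master(void *arg) {
  (void)arg;
  pthread_mutex_lock(&shutdown_lock);
  while (!shutting_down) /* the loop guards against spurious wakeups */
    pthread_cond_wait(&shutdown_cond, &shutdown_lock);
  master_returned = true;
  pthread_mutex_unlock(&shutdown_lock);
  return NULL;
}

/* Stand-in for runtime teardown: wake the master so it can return. */
static void signal_shutdown(void) {
  pthread_mutex_lock(&shutdown_lock);
  shutting_down = true;
  pthread_cond_signal(&shutdown_cond);
  pthread_mutex_unlock(&shutdown_lock);
}

/* Create the master thread, signal teardown, and join it.
   Returns 1 if the master returned after being signaled. */
static int run_demo(void) {
  pthread_t tid;
  int rc = pthread_create(&tid, NULL, unshackled_master, NULL);
  assert(rc == 0);
  signal_shutdown();
  rc = pthread_join(tid, NULL);
  assert(rc == 0);
  return master_returned ? 1 : 0;
}
```

Checking the shutting_down flag under the mutex means the signal is not lost even if teardown runs before the master thread reaches pthread_cond_wait.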
Two environment variables, LIBOMP_NUM_UNSHACKLED_THREADS and LIBOMP_USE_UNSHACKLED_TASK, are also added to configure the number of unshackled threads and to enable or disable this feature, respectively. By default, the number of unshackled threads is 8.
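As a rough sketch of how such settings could be read, the helpers below parse the two values. This is not the parsing code in the patch: the function names, the accepted spellings for the boolean, and the fallback handling for invalid input are all assumptions made for illustration. Only the variable names and the default of 8 come from the description above.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_UNSHACKLED_THREADS 8 /* default stated in the description */

/* Hypothetical helper: parse a thread count, falling back to the
   default when the value is unset, malformed, or not positive. */
static int parse_num_unshackled_threads(const char *value) {
  if (value == NULL || *value == '\0')
    return DEFAULT_UNSHACKLED_THREADS;
  char *end = NULL;
  long n = strtol(value, &end, 10);
  if (*end != '\0' || n <= 0 || n > INT_MAX)
    return DEFAULT_UNSHACKLED_THREADS;
  return (int)n;
}

/* Hypothetical helper: parse an on/off switch. The accepted spellings
   are an assumption; unknown or unset values yield the fallback. */
static bool parse_use_unshackled_task(const char *value, bool fallback) {
  if (value == NULL)
    return fallback;
  if (strcmp(value, "1") == 0 || strcmp(value, "true") == 0)
    return true;
  if (strcmp(value, "0") == 0 || strcmp(value, "false") == 0)
    return false;
  return fallback;
}
```

A caller would pass in the environment directly, for example parse_num_unshackled_threads(getenv("LIBOMP_NUM_UNSHACKLED_THREADS")).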
Here are some open issues to be discussed:
- The master thread goes to sleep once initialization is finished. As Andrey mentioned, we might need to wake it from time to time to do some work. What kind of update or check should be put here?