The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks. We first use pthread_create to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls __kmpc_fork_call. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via pthread_create on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.
Two environment variables, LIBOMP_NUM_HIDDEN_HELPER_THREADS and LIBOMP_USE_HIDDEN_HELPER_TASK, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.
Here are some open issues to be discussed:
- The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?
This breaks the convention that the struct size is 32 bits. The correct addition of extra flag under condition should look like: