This patch is only a draft that shows my design for implementing unshackled tasks. It still has some known problems: for example, the error messages are not correct, and the flag for unshackled tasks is always set to 1. I will fix these if the design itself is generally acceptable.
The basic design is to create an outermost parallel team. It is not a regular team because it has its own root. We first use `pthread_create` to create a new thread; let's call it the initial thread, which is also the master thread of the unshackled team. This initial thread then initializes a new root, just as the RTL does during its own initialization. After that, it directly calls `__kmpc_fork_call`, as if the initial thread had encountered a parallel region.

The wrapped function for this team behaves differently depending on the thread. The master thread, which is the initial thread we create via `pthread_create` on Linux, waits on a condition variable that is signaled only when the RTL is being destructed. The other worker threads do nothing. The master thread has to wait there because, in the current implementation, once the master thread returns from the team's wrapped function, it starts to free the team, which is not what we want.
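The lifecycle described above can be sketched with plain POSIX threads. This is a minimal standalone illustration, not the code in the patch: `unshackled_microtask`, `unshackled_initial_thread`, `start_unshackled_team`, `stop_unshackled_team` and `kTeamSize` are hypothetical names, root initialization is omitted, and the call to `__kmpc_fork_call` is simulated by starting the worker threads directly. What it shows is the key invariant: the master cannot return from the wrapped function, and therefore cannot free the team, until shutdown signals the condition variable.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical team size; libomp would take this from its own settings.
constexpr int kTeamSize = 4;

static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t g_cond = PTHREAD_COND_INITIALIZER;
static bool g_shutdown = false;  // guarded by g_mutex
static std::atomic<int> g_workers_returned{0};
static std::atomic<bool> g_master_returned{false};
static pthread_t g_initial_thread;

// Stand-in for the wrapped function (microtask) passed to __kmpc_fork_call.
static void unshackled_microtask(int tid) {
  if (tid != 0) {
    g_workers_returned.fetch_add(1);  // workers: nothing to do here
    return;
  }
  // Master: block until the runtime shuts down, so that returning from the
  // microtask does not trigger freeing of the team.
  pthread_mutex_lock(&g_mutex);
  while (!g_shutdown)  // loop guards against spurious wakeups
    pthread_cond_wait(&g_cond, &g_mutex);
  pthread_mutex_unlock(&g_mutex);
  g_master_returned.store(true);
}

static void *worker_entry(void *arg) {
  unshackled_microtask(static_cast<int>(reinterpret_cast<std::intptr_t>(arg)));
  return nullptr;
}

// Entry point of the thread created with pthread_create. In libomp it would
// initialize a new root and call __kmpc_fork_call; this sketch simulates the
// fork by starting kTeamSize - 1 workers and running tid 0 on itself.
static void *unshackled_initial_thread(void *) {
  pthread_t workers[kTeamSize - 1];
  for (int i = 1; i < kTeamSize; ++i) {
    int rc = pthread_create(&workers[i - 1], nullptr, worker_entry,
                            reinterpret_cast<void *>(static_cast<std::intptr_t>(i)));
    assert(rc == 0 && "failed to create worker");
    (void)rc;
  }
  unshackled_microtask(0);  // master blocks here until shutdown
  for (int i = 0; i < kTeamSize - 1; ++i)
    pthread_join(workers[i], nullptr);
  return nullptr;
}

bool start_unshackled_team() {
  return pthread_create(&g_initial_thread, nullptr, unshackled_initial_thread,
                        nullptr) == 0;
}

// Called during runtime destruction: wake the master, then wait for the
// whole unshackled team to exit.
void stop_unshackled_team() {
  pthread_mutex_lock(&g_mutex);
  g_shutdown = true;
  pthread_cond_signal(&g_cond);
  pthread_mutex_unlock(&g_mutex);
  pthread_join(g_initial_thread, nullptr);
}
```

In this sketch `start_unshackled_team()` would run at RTL initialization and `stop_unshackled_team()` at destruction; between the two calls the master stays parked on `g_cond`.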
Here are some open issues to be discussed:
1. As you can see, we still use a platform-dependent method to create the thread. Could we use a platform-independent one such as `std::thread` instead? The advantage of the C++ thread library is that we would not need to implement the same functions for each platform. I can think of two potential drawbacks. First, the C++ thread library does not let us set thread attributes, and we do have some configurations, such as the stack size. Second, `libomp.so` currently does not depend on `libc++.so`, so using the C++ thread library would add one more dependency. Is that acceptable? The same question applies to other primitives besides thread creation, such as condition variables.
2. Synchronization, synchronization, synchronization. What policy should we use for unshackled tasks?
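To make the attribute concern in item 1 concrete, here is a small comparison. It assumes POSIX threads, and `create_with_stack_size`, `run_portable` and `mark_ran` are hypothetical helpers rather than libomp functions. The pthread path can request a stack size through `pthread_attr_setstacksize` before the thread starts, while the `std::thread` constructor accepts only a callable and its arguments, so the thread always gets the implementation's default stack size.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <thread>

// Platform-dependent path: the attribute object lets us request a stack
// size (or other attributes) before the thread starts.
int create_with_stack_size(pthread_t *tid, std::size_t stack_size,
                           void *(*fn)(void *), void *arg) {
  pthread_attr_t attr;
  int rc = pthread_attr_init(&attr);
  if (rc != 0)
    return rc;
  rc = pthread_attr_setstacksize(&attr, stack_size);  // EINVAL if too small
  if (rc == 0)
    rc = pthread_create(tid, &attr, fn, arg);
  pthread_attr_destroy(&attr);
  return rc;
}

// Portable path: std::thread has no stack-size parameter, so the
// implementation's default is used.
void run_portable(int *out) {
  std::thread t([out] { *out = 2; });
  t.join();
}

// Example thread body used with create_with_stack_size.
void *mark_ran(void *arg) {
  assert(arg != nullptr);
  *static_cast<int *>(arg) = 1;
  return nullptr;
}
```

A stack size below `PTHREAD_STACK_MIN` makes `pthread_attr_setstacksize` fail, so the helper reports the error instead of starting the thread.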