Jonas Hahnfeld reported a bug in shutdown in the presence of tasking.
This change fixes a race in the shutdown code when threads are being reaped. Threads spinning in the fork barrier and searching for tasks to steal may identify other threads as potential victims, even though those threads may have already been reaped.
The fix adds a simple per-thread flag that lets each thread indicate that it is in a reapable state while shutdown is in progress. The shutdown code then forces all threads out of the fork barrier and waits until every thread is reapable before reaping any of them.
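For readers following along, the two-phase shutdown described above can be sketched in standalone C++. This is only an illustration of the idea, not the code in the patch: `Pool`, `Worker`, `ReapState`, `start_pool`, `worker_loop`, and `shutdown_and_reap` are hypothetical names, the spin loop is a stand-in for the fork barrier, and the loop over `pool.workers` is a stand-in for the search for steal victims.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-ins for illustration; not the runtime's actual types.
enum class ReapState { NotSafe, Safe };

struct Worker {
  std::atomic<ReapState> reap_state{ReapState::NotSafe};
  std::thread handle;
};

struct Pool {
  std::atomic<bool> shutting_down{false};
  std::vector<std::unique_ptr<Worker>> workers;
};

// Worker body: spin in a stand-in for the fork barrier, peeking at other
// workers the way a task stealer would. Peeking is only valid while no
// worker has been reaped yet.
void worker_loop(Pool &pool, Worker &self) {
  while (!pool.shutting_down.load(std::memory_order_acquire)) {
    for (auto &victim : pool.workers)
      (void)victim->reap_state.load(std::memory_order_relaxed);
    std::this_thread::yield();
  }
  // Past this point the worker no longer touches any other worker.
  self.reap_state.store(ReapState::Safe, std::memory_order_release);
}

// Create every Worker before starting any thread, so the vector is never
// modified while workers iterate over it.
void start_pool(Pool &pool, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    pool.workers.push_back(std::make_unique<Worker>());
  for (auto &w : pool.workers)
    w->handle = std::thread(worker_loop, std::ref(pool), std::ref(*w));
}

// Two-phase shutdown: release the workers, wait until all of them are
// reapable, and only then reap any of them. Returns the number reaped.
std::size_t shutdown_and_reap(Pool &pool) {
  pool.shutting_down.store(true, std::memory_order_release);

  // Phase 1: wait for every worker to declare itself reapable.
  for (auto &w : pool.workers)
    while (w->reap_state.load(std::memory_order_acquire) != ReapState::Safe)
      std::this_thread::yield();

  // Phase 2: no worker can be looking at another one now, so reap them all.
  std::size_t reaped = 0;
  for (auto &w : pool.workers) {
    assert(w->reap_state.load() == ReapState::Safe);
    w->handle.join();
    ++reaped;
  }
  pool.workers.clear();
  return reaped;
}
```

The point of the sketch is the ordering: if the join in phase 2 were interleaved with the wait in phase 1, a worker still in its spin loop could inspect a worker that had already been torn down, which is the race described above.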
Do we need this here or would it be enough to have the flag completely handled in __kmp_execute_tasks_template?
I don't know whether that would create a race on th_reap_state. If __kmp_execute_tasks_template is not guaranteed to be called at least once before a barrier finishes, why aren't there problems with multiple parallel regions? Each thread will have th.th_reap_state = KMP_SAFE_TO_REAP at the end of the first parallel region...