This is the second attempt to exclude untied tasks from the task scheduling constraint (TSC); the first one was rejected because it caused the kastors/strassen-task-dep benchmark to hang.
This change works for the benchmark because the task stealing algorithm has been re-implemented so that any task in the victim thread's queue may be stolen, rather than only the head task.
A check has also been added: if the thief thread is spinning on a barrier, any task may be stolen.
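For readers following the description, here is a minimal sketch of the idea, not the code from the patch. The `Task` record, `may_steal`, and `steal_any` are hypothetical names, and the constraint check is a deliberate simplification (assumption: a tied candidate is allowed only if it descends from the thief's current tied task).

```cpp
#include <deque>

// Hypothetical, simplified task record; not the runtime's own task data type.
struct Task {
  int id;
  bool tied;          // false for untied tasks
  const Task *parent; // nullptr for a root task
};

// True if `ancestor` appears on the parent chain of `t`.
static bool is_descendant_of(const Task *t, const Task *ancestor) {
  for (const Task *p = t->parent; p != nullptr; p = p->parent)
    if (p == ancestor)
      return true;
  return false;
}

// Simplified scheduling-constraint check (an assumption for illustration).
static bool may_steal(const Task *candidate, const Task *thief_current,
                      bool thief_in_barrier) {
  if (thief_in_barrier)
    return true; // thief spins on a barrier: any task may be stolen
  if (!candidate->tied)
    return true; // untied tasks are excluded from the constraint
  if (thief_current == nullptr)
    return true; // no suspended tied task constrains the thief
  return is_descendant_of(candidate, thief_current);
}

// Scan the whole victim queue from the head and remove the first allowed
// task, instead of giving up when the head task is not allowed.
static const Task *steal_any(std::deque<const Task *> &victim,
                             const Task *thief_current,
                             bool thief_in_barrier) {
  for (auto it = victim.begin(); it != victim.end(); ++it) {
    if (may_steal(*it, thief_current, thief_in_barrier)) {
      const Task *stolen = *it;
      victim.erase(it);
      return stolen;
    }
  }
  return nullptr; // nothing in the queue satisfies the constraint
}
```

With a head-only policy, a disallowed task at the head of the queue blocks every task behind it; scanning the whole queue lets the thief make progress when any later task is allowed.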
Does this need to be a global variable? If I understand the current patch correctly, __kmp_task_untied_encountered will be set to true once the first untied task has been encountered. However, it is never reset, potentially degrading performance for later parallel regions in a program that only use tied tasks, right?
Wouldn't it make sense to make this a property of the current parallel region (maybe in kmp_taskteam_t)? Or would that cause problems because of the current implementation of barriers, where the teams may no longer be valid for workers that are in the fork barrier?
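To make the suggestion concrete, a rough sketch of a per-region flag could look like the following. `TaskTeam` here is a hypothetical stand-in, not the real kmp_taskteam_t, and all function names are made up for illustration; the sketch does not address the fork-barrier lifetime concern raised above.

```cpp
#include <atomic>

// Hypothetical stand-in for a per-region task team structure.
struct TaskTeam {
  std::atomic<bool> untied_encountered{false};
};

// Called when an untied task is created in this region (hypothetical hook).
inline void note_untied_task(TaskTeam &tt) {
  tt.untied_encountered.store(true, std::memory_order_relaxed);
}

// Queried by the scheduler to decide whether the extra handling is needed.
inline bool needs_untied_handling(const TaskTeam &tt) {
  return tt.untied_encountered.load(std::memory_order_relaxed);
}

// Called when the structure is set up for a new parallel region, so a later
// region that uses only tied tasks starts with a clear flag.
inline void begin_parallel_region(TaskTeam &tt) {
  tt.untied_encountered.store(false, std::memory_order_relaxed);
}
```

The point is only that the flag's lifetime would follow the region rather than the whole program; whether workers waiting in the fork barrier can safely read it is the open question.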