This patch fixes https://bugs.llvm.org/show_bug.cgi?id=49066.
For detachable tasks, the assumption that a proxy task cannot have remaining child tasks when it completes no longer holds.
Instead of incrementing and decrementing the incomplete task count, a high-order bit of that count is flipped to mark the incomplete proxy task and to wait for it.
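As a rough illustration of the high-order-bit idea (not the patch itself), here is a minimal sketch. The names `TaskCounter`, `kProxyFlag`, and all member functions are hypothetical, and the flag value is an assumption; the runtime uses its own atomics and field names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag bit; assumed to sit well above any realistic child count.
constexpr std::int32_t kProxyFlag = 0x40000000;

// Hypothetical stand-in for a task's incomplete-child counter.
struct TaskCounter {
  std::atomic<std::int32_t> incomplete{0};

  // Ordinary children still use increment/decrement on the low bits.
  void add_child() { incomplete.fetch_add(1); }
  void finish_child() { incomplete.fetch_sub(1); }

  // The proxy marks itself incomplete by setting the flag bit, so the
  // child count in the low bits is left untouched.
  void mark_proxy_incomplete() { incomplete.fetch_or(kProxyFlag); }
  void clear_proxy_incomplete() { incomplete.fetch_and(~kProxyFlag); }

  // A waiter only inspects the flag bit, not the child count.
  bool proxy_pending() const { return (incomplete.load() & kProxyFlag) != 0; }

  // Child count with the flag bit masked out.
  std::int32_t children() const { return incomplete.load() & ~kProxyFlag; }
};
```

In this sketch, a waiter would spin or yield while `proxy_pending()` returns true. Remaining children do not disturb that wait, because they only change the low bits.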
Why can't this be part of __kmp_bottom_half_finish_proxy? Is the proxy task supposed to be completed before __kmp_bottom_half_finish_proxy executes?