This patch fixes the "performance regression" reported in https://bugs.llvm.org/show_bug.cgi?id=51235. In fact, it has nothing to do with performance. The root cause is that the stolen task is not allowed to execute on another thread because, by default, it is a tied task. Since a hidden helper task will always be executed by hidden helper threads, it should be untied.
Diff Detail

Repository: rG LLVM Github Monorepo
Unit Tests

Time | Test
---|---
520 ms | x64 debian > libomp.lock::omp_init_lock.c
Event Timeline
@tianshilei1992 When flaky tests fail or hang on the build bots, this is often triggered by unusual thread interleaving due to over-subscription.
To reproduce on my own system, it sometimes helped to oversubscribe the system by running multiple instances of the flaky test at the same time:
```shell
for i in $(seq 20); do ./gtid.cpp.out & done
```
or to repeat until failing/hanging:
```shell
while for i in $(seq 20); do ./gtid.cpp.out & done; wait; do true; done
```
Can you reproduce the hang on your system and verify that this change fixes the issue?
Yup. On my machine, it can be reproduced almost 100% of the time by using numactl to restrict the run to 6 threads, the same as the test machine:
```shell
$ numactl -C 0,1,2,3,4,5 ./a.out
```
After I applied the patch, I didn't see the failure again. Can you try with numactl on your side to see whether it can be easily reproduced and whether this patch fixes it? Thanks.
openmp/runtime/src/kmp_tasking.cpp:2793
> A side question here, what does "TSC" mean here?
Running the following command, I get the assertion below from the test:

```shell
for i in $(seq 4); do numactl -C 0,1,24,25 env LD_PRELOAD=openmp/runtime/src/libomp.so openmp/runtime/test/tasking/hidden_helper_task/Output/gtid.cpp.tmp & done; wait
```

```
gtid.cpp.tmp: llvm-project/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp:58: void assert_gtid(int) [hidden_helper_task = false]: Assertion `v == 0 || v > __kmp_hidden_helper_threads_num' failed.
```
openmp/runtime/src/kmp_tasking.cpp:2793
> TSC = task stealing constraint (see OpenMP spec)
Thanks. I'll see what I should do.
openmp/runtime/src/kmp_tasking.cpp:2793
> I guess it is "Task Scheduling Constraints", isn't it?
openmp/runtime/src/kmp_tasking.cpp:2793
> I meant task scheduling constraint
I cannot reproduce your failure. Can you double-check that the right libomp.so is being used?
```
➜ clang++ -fopenmp -L $HOME/Documents/deploy/openmp/release/lib $HOME/Documents/vscode/llvm-project/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp -o gtid
➜ for i in $(seq 20); do numactl -C 0,1,24,25 ./gtid & done; wait
[2] 1304258
[3] 1304259
[4] 1304260
[5] 1304261
[6] 1304262
[7] 1304263
[8] 1304264
[9] 1304265
[10] 1304266
[11] 1304267
[12] 1304268
[13] 1304269
[14] 1304270
[15] 1304271
[16] 1304272
[17] 1304273
[18] 1304274
[19] 1304275
[20] 1304276
[21] 1304277
PASS
[8]  1304264 done  numactl -C 0,1,24,25 ./gtid
PASS
[9]  1304265 done  numactl -C 0,1,24,25 ./gtid
PASS
PASS
PASS
[2]  1304258 done  numactl -C 0,1,24,25 ./gtid
[7]  1304263 done  numactl -C 0,1,24,25 ./gtid
[10] 1304266 done  numactl -C 0,1,24,25 ./gtid
PASS
[13] 1304269 done  numactl -C 0,1,24,25 ./gtid
PASS
[5]  1304261 done  numactl -C 0,1,24,25 ./gtid
PASS
PASS
[20] - 1304276 done  numactl -C 0,1,24,25 ./gtid
[11] 1304267 done  numactl -C 0,1,24,25 ./gtid
PASS
PASS
[17] 1304273 done  numactl -C 0,1,24,25 ./gtid
[18] 1304274 done  numactl -C 0,1,24,25 ./gtid
PASS
PASS
[19] - 1304275 done  numactl -C 0,1,24,25 ./gtid
PASS
[16] - 1304272 done  numactl -C 0,1,24,25 ./gtid
PASS
[21] + 1304277 done  numactl -C 0,1,24,25 ./gtid
[12] 1304268 done  numactl -C 0,1,24,25 ./gtid
PASS
PASS
[15] + 1304271 done  numactl -C 0,1,24,25 ./gtid
[4]  1304260 done  numactl -C 0,1,24,25 ./gtid
PASS
PASS
[6]  - 1304262 done  numactl -C 0,1,24,25 ./gtid
[14] + 1304270 done  numactl -C 0,1,24,25 ./gtid
PASS
[3]  + 1304259 done  numactl -C 0,1,24,25 ./gtid
➜ ldd gtid
        linux-vdso.so.1 (0x00007fffa74ad000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f20e2952000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x00007f20e280d000)
        libomp.so => /home/shiltian/Documents/deploy/openmp/release/lib/libomp.so (0x00007f20e272a000)
        libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007f20e270f000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f20e26ed000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00007f20e2523000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f20e2b44000)
        librt.so.1 => /usr/lib64/librt.so.1 (0x00007f20e2516000)
        libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007f20e250f000)
```
I execute in BUILD/runtime/runtime-bin and use LD_PRELOAD to make sure that I use the right libomp.
The system I use for testing has two sockets with 24 cores each, so by using cores 24 and 25 I make sure to get the extra latency of communication between the sockets.
I added a print statement before the assertion and found that v=-1.
I added an atomic counter to count the executed tasks for each iteration of the for loop. If the assertion triggers, the task count is 2 instead of 3.
Okay, thanks. Sadly, I still cannot reproduce it, even with the same configuration as yours:
```shell
for i in $(seq 1000); do numactl -C 0,1,29,30 ./gtid & done; wait
```
It ran 1000 times and nothing happened…
Looking at your ldd output, I realize that we use different C++ runtime libraries. I always build against libc++. Could this make a difference?
I reran the tests with different clang compiler versions (after building libomp standalone) and can reproduce the assertion with clang 11 and clang 12.
For depend.cpp, the assertion triggers with data = 7.
Thanks for the info. I tried with libc++, but still had no luck.
```
➜ ldd gtid
        linux-vdso.so.1 (0x00007ffd799ba000)
        libc++.so.1 => /home/shiltian/Documents/deploy/llvm/release/lib/libc++.so.1 (0x00007f05e1261000)
        libc++abi.so.1 => /home/shiltian/Documents/deploy/llvm/release/lib/libc++abi.so.1 (0x00007f05e1226000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x00007f05e10e1000)
        libomp.so => /home/shiltian/Documents/deploy/openmp/release/lib/libomp.so (0x00007f05e0ffe000)
        libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007f05e0fe3000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f05e0fbf000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00007f05e0df5000)
        librt.so.1 => /usr/lib64/librt.so.1 (0x00007f05e0dea000)
        libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x00007f05e0de0000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f05e134f000)
        libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007f05e0dd9000)
```
Can you try with GDB to see if the extra 8 threads are created?
I still get the same assertions.
It looks like the last task is not guaranteed to finish before the taskwait. The taskwait function spins until td_incomplete_child_tasks becomes 0. Is it possible that this variable is not properly maintained for hidden helper tasks?
(PS: I applied the patches from D106977, D107121, and D107316 for my tests)
This is how I tried to refine the tests to see what is going wrong:
```diff
diff --git a/openmp/runtime/test/tasking/hidden_helper_task/depend.cpp b/openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
index 4bc27c1..a6e8cbb 100644
--- a/openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
+++ b/openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
@@ -44,6 +44,10 @@ template <int I>
 kmp_int32 omp_task_entry(kmp_int32 gtid, kmp_task_t_with_privates *task) {
   auto shareds = reinterpret_cast<anon *>(task->task.shareds);
   auto p = shareds->data;
+  auto v = *p;
+  if (v != I - 1)
+    printf("Fail: %i expect %i\n", v, I - 1);
+  assert(v == I - 1);
   *p += I;
   return 0;
 }
@@ -118,6 +122,8 @@ int main(int argc, char *argv[]) {

     // Wait for all tasks
     __kmpc_omp_taskwait(nullptr, gtid);
+    if (data != 15)
+      printf("Fail: %i\n", data);
     assert(data == 15);
   }
diff --git a/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp b/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
index 8cec95b..8cab6c9 100644
--- a/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
+++ b/openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
@@ -50,11 +50,15 @@ kmp_int32 omp_task_entry(kmp_int32 gtid, kmp_task_t_with_privates *task) {
   return 0;
 }

-template <bool hidden_helper_task> void assert_gtid(int v) {
+template <bool hidden_helper_task> void assert_gtid(int v, int I) {
   if (__kmp_hidden_helper_threads_num) {
     if (hidden_helper_task) {
+      if (!(v > 0 && v <= __kmp_hidden_helper_threads_num))
+        printf("Fail(%i, %i): v = %i\n", hidden_helper_task, I, v);
       assert(v > 0 && v <= __kmp_hidden_helper_threads_num);
     } else {
+      if (!(v == 0 || v > __kmp_hidden_helper_threads_num))
+        printf("Fail(%i, %i): v = %i\n", hidden_helper_task, I, v);
       assert(v == 0 || v > __kmp_hidden_helper_threads_num);
     }
   } else {
@@ -68,7 +72,7 @@ int main(int argc, char *argv[]) {
   constexpr const int N = 1024;
 #pragma omp parallel for
   for (int i = 0; i < N; ++i) {
-    int32_t data1 = -1, data2 = -1, data3 = -1;
+    int32_t data1 = -4, data2 = -4, data3 = -4;
     int depvar;
     int32_t gtid = __kmpc_global_thread_num(nullptr);
@@ -117,9 +121,9 @@ int main(int argc, char *argv[]) {
     __kmpc_omp_taskwait(nullptr, gtid);

     // FIXME: 8 here is not accurate
-    assert_gtid<false>(data1);
-    assert_gtid<true>(data2);
-    assert_gtid<false>(data3);
+    assert_gtid<false>(data1, 1);
+    assert_gtid<true>(data2, 2);
+    assert_gtid<false>(data3, 3);
   }

   std::cout << "PASS\n";
```
I didn't see an issue regarding the child task counter.
Increment:
```cpp
if (flags->proxy == TASK_PROXY || flags->detachable == TASK_DETACHABLE ||
    flags->hidden_helper ||
    !(taskdata->td_flags.team_serial || taskdata->td_flags.tasking_ser)) {
  KMP_ATOMIC_INC(&parent_task->td_incomplete_child_tasks);
  ...
```
Decrement:
```cpp
if (!(taskdata->td_flags.team_serial || taskdata->td_flags.tasking_ser) ||
    taskdata->td_flags.detachable == TASK_DETACHABLE ||
    taskdata->td_flags.hidden_helper) {
  __kmp_release_deps(gtid, taskdata);
// Predecrement simulated by "- 1" calculation
#if KMP_DEBUG
  children = -1 +
#endif
      KMP_ATOMIC_DEC(&taskdata->td_parent->td_incomplete_child_tasks);
  ...
```
Another increment of the counter is in __kmp_task_dup_alloc, which is used for taskloop and is unrelated here. There is no other increment/decrement.
I did check the counter for the gtid case; after the three tasks are created, the counter is 3, as expected.
https://bugs.llvm.org/show_bug.cgi?id=49816 also reports that one task is missing, so I'm thinking the issue is not introduced by this patch. Can you double-check whether you can still observe the failure without the patch? If it is not introduced by this patch, I feel we could first land this patch and then see whether we can reproduce the issue in another way.
I think I found the issue with the two tests:
On our system, the default environment sets OMP_NUM_THREADS=1. When I set OMP_NUM_THREADS=5 before running the tests, the tests succeed. I guess you should be able to reproduce the assertion by exporting OMP_NUM_THREADS=1 before running the tests.
Since the tests do not specify an explicit number of threads (using the num_threads clause or setting the environment variable), the team runs serially.
The taskwait synchronization fails in that case because the last task has td_flags.team_serial set. I think td_incomplete_child_tasks needs to be maintained for serialized teams as soon as a hidden helper or detachable task is created in a task. @AndreyChurbanov, how should we handle the synchronization for such tasks?
For better test coverage, we should add two RUN lines to the tests, one setting OMP_NUM_THREADS=1 and one setting OMP_NUM_THREADS=5.
I can confirm that this patch fixes the deadlock issue in the runtime.
Please rephrase the subject of the patch before committing. It's not about performance; it really fixes a possible deadlock in the runtime.
Right. In D106519 I tried to reorder the code so that the dependence release happens before the counter is decremented, which makes sure the last task is pushed to the queue first. At that moment, if the initial thread happens to have finished flag.execute_tasks(...), taskdata->td_incomplete_child_tasks becomes 0, so it exits the loop. If the initial thread is still spinning, trying to execute tasks, the test will pass.
A detached task doesn't have this problem because the thread executing the bottom half can execute the task itself. But for a hidden helper task, if a regular task depends on a hidden helper task and happens to be the last one, it has to be pushed to its "encountering team", and whether it will actually be executed really depends on scheduling.
The issue you linked is somewhat different.
Using detached tasks, the gtid/depend test is similar to:
```cpp
#include <assert.h>
#include <atomic>
#include <omp.h>
#include <thread>
#include <unistd.h>

std::atomic<int> a{0};
std::thread t;

void async(omp_event_handle_t e) {
  sleep(1);
  assert(a == 2);
  omp_fulfill_event(e);
}

int main(int argc, char *argv[]) {
#pragma omp parallel master
  {
#pragma omp task depend(out : a)
    { a = 1; }
    omp_event_handle_t event;
#pragma omp task depend(out : a) detach(event)
    {
      a = 2;
      t = std::thread(async, event);
    }
#pragma omp task depend(out : a)
    { a = 4; }
#pragma omp taskwait
    assert(a == 4);
  }
  t.join();
  return 0;
}
```
Then compile/execute:
```shell
$ clang++ -fopenmp test.cpp
$ for i in $(seq 10); do numactl -C 0 env OMP_NUM_THREADS=1 ./a.out & done
```
For me, ~6/10 processes print `Assertion 'a==4' failed`.
Yeah, thanks for that. I updated my previous comment. It surprised me that even detached tasks can also fail... After I reordered the code, the counter can only be decremented once the chain of dependent tasks has finished.