This is an archive of the discontinued LLVM Phabricator instance.


Differential D98838

[OpenMP] Fixed a crash in hidden helper thread
Closed, Public

Authored by tianshilei1992 on Mar 17 2021, 7:07 PM.


Details

Reviewers

ronlieb
protze.joachim
lebedev.ri
JonChesterfield
jdoerfert

Commits

rG2df65f87c1ea: [OpenMP] Fixed a crash in hidden helper thread

Summary

It has been reported that, after hidden helper threads are enabled, a program
can sometimes hit the assertion new_gtid < __kmp_threads_capacity. The root
cause is as follows. Say the default __kmp_threads_capacity is N. If hidden
helper threads are enabled, __kmp_threads_capacity is offset to N+8 by
default. If the number of threads we need exceeds N+8, e.g. via a
num_threads clause, we need to expand __kmp_threads. In
__kmp_expand_threads, the expansion starts from __kmp_threads_capacity and
repeatedly doubles it until the new capacity meets the requirement. Assume
the new requirement is Y. If Y happens to satisfy (N+8)*2^X=Y, where X is the
number of iterations, the new capacity is not enough because 8 of the slots
are reserved for hidden helper threads.

Here is an example.

#include <vector>

int main(int argc, char *argv[]) {
  constexpr const size_t N = 1344;
  std::vector<int> data(N);

#pragma omp parallel for
  for (unsigned i = 0; i < N; ++i) {
    data[i] = i;
  }

#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

  return 0;
}

My CPU is 20C40T, so __kmp_threads_capacity is 160. After the offset,
__kmp_threads_capacity becomes 168. Since 1344 = (160+8)*2^3, the assertion
is hit.
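
The arithmetic in this example can be checked with a small model. The sketch below is not the runtime's code: `capacity_after_doubling` is a hypothetical helper that only mimics the doubling described in the summary (it omits the clamp to the system thread limit), and the constants are the numbers quoted above.

```cpp
// Simplified model of the doubling described in the summary.
// `capacity_after_doubling` is a hypothetical helper, not a libomp function.
constexpr int capacity_after_doubling(int capacity, int required) {
  do {
    capacity *= 2; // the capacity is doubled at least once
  } while (capacity < required);
  return capacity;
}

constexpr int kHiddenHelpers = 8;     // default number of hidden helper threads
constexpr int kDefaultCapacity = 160; // 4 * 40 hardware threads
constexpr int kOffsetCapacity = kDefaultCapacity + kHiddenHelpers; // 168
constexpr int kRequested = 1344;      // num_threads(N) in the example

constexpr int kNewCapacity =
    capacity_after_doubling(kOffsetCapacity, kRequested);

// Three doublings land exactly on the request: 168 -> 336 -> 672 -> 1344.
static_assert(kNewCapacity == 1344, "doubling stops at the requested size");
// Eight of those slots are reserved, so fewer than 1344 remain for the team.
static_assert(kNewCapacity - kHiddenHelpers < kRequested,
              "not enough slots left for regular threads");
```

The block compiles as a standalone translation unit (C++14 or later); the `static_assert` lines check the numbers at compile time.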

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tianshilei1992 created this revision. Mar 17 2021, 7:07 PM

Herald added subscribers: guansong, yaxunl. Mar 17 2021, 7:07 PM

tianshilei1992 requested review of this revision. Mar 17 2021, 7:07 PM

Herald added a project: Restricted Project. Mar 17 2021, 7:07 PM

Herald added subscribers: openmp-commits, sstefan1.

tianshilei1992 mentioned this in D77609: [OpenMP] Added the support for hidden helper task in RTL. Mar 17 2021, 7:07 PM

avoid potential integer overflow

Harbormaster completed remote builds in B94373: Diff 331440. Mar 17 2021, 7:37 PM

Harbormaster completed remote builds in B94375: Diff 331442. Mar 17 2021, 7:55 PM

The test case should really try to exceed the current capacity by 1.

I don't think this patch really solves the fundamental issue that results in the assertion.

openmp/runtime/src/kmp_runtime.cpp
3556–3564	I don't think, this is the right fix for the problem. `__kmp_threads_capacity` is the size of the `__kmp_threads` array. If a call to `__kmp_expand_threads` asks to expand the array by 1, you don't need to expand by additional hidden threads (as they are not placed at the end). The hidden threads were already part of the `__kmp_threads_capacity` before expansion.
3633	This might not call `__kmp_expand_threads(1)`, if not all `__kmp_hidden_helper_threads_num` are created before this code is reached and `__kmp_all_nth - __kmp_created_hidden_helper_threads + __kmp_hidden_helper_threads_num >= capacity`.
3664	Please don't forget about this fix. I don't care whether you fix it here or in a follow-up patch.
3667–3669	This for-loop implicitly assumes, that it will find an empty space in `[0:__kmp_threads_capacity)`, i.e. implicitly assumes `gtid < __kmp_threads_capacity` , as stated in the assertion below - but you skip the first few index in the range. Line 3620 fails to provide this guarantee for your modified numbering scheme.
openmp/runtime/test/tasking/hidden_helper_task/num_threads.cpp
12 ↗	(On Diff #331442)	`omp_get_num_threads()` will always return 1 in serial context.

This revision now requires changes to proceed. Mar 18 2021, 2:39 AM

protze.joachim added inline comments. Mar 18 2021, 4:33 AM

openmp/runtime/src/kmp_runtime.cpp
4336	This assertion triggers for your test application, when the runtime is built in debug mode.
openmp/runtime/test/tasking/hidden_helper_task/num_threads.cpp
25 ↗	(On Diff #331442)	The assertion in kmp_runtime.cpp:4323 triggers also for `num_threads(__kmp_threads_capacity+1)`. In that case, `__kmp_expand_threads` is never called, so your patch would have no effect at all. To really fix the issue, all cases which try to calculate the required number of threads and eventually would call `__kmp_expand_threads` need to consider the extra space for the hidden threads, so that they first realize that space is missing and second request enough space for the hidden threads. With this consideration, `__kmp_expand_threads` does not need any change.

Reproducer for the assertion in Line 3660:

#include <omp.h>
#include <vector>
#include <thread>
#include <chrono>

void dummy_root(){
  int nthreads = omp_get_max_threads();
  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}


int main(int argc, char *argv[]) {
  const int N = 4 * omp_get_num_procs();
  std::vector<int> data(N);
  std::thread root(dummy_root);
#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

  root.join();
  return 0;
}

tianshilei1992 added inline comments. Mar 18 2021, 6:01 AM

openmp/runtime/src/kmp_runtime.cpp
3556–3564	Even w/o hidden helper thread, expansion by 1 will not result in increment by 1 because the `newCapacity` always doubles. Say originally it is 32, and we ask for expansion by 1, `newCapacity` will be 64 instead of 33. Therefore, whether we add extra space for hidden helper thread doesn’t waste too much memory here.
3664	Sure. I’ll include in this patch.
3667–3669	Because we already take the number of hidden helper thread into account when setting `__kmp_threads_capacity`, this assumption holds, right?
openmp/runtime/test/tasking/hidden_helper_task/num_threads.cpp
12 ↗	(On Diff #331442)	This logic is from code to set the capacity, specifically in the comment. It uses `$OMP_NUM_THREADS` originally so here I changed to `omp_get_num_threads()`.

protze.joachim added inline comments. Mar 18 2021, 7:11 AM

openmp/runtime/src/kmp_runtime.cpp
3667–3669	No, as I showed with my reproducer, your patched code will never be reached in specific cases. The same holds for the case I annotated in your test case: when the application asks for `N = __kmp_threads_capacity` threads, your patch is never reached, but the assertion still triggers (and prevents access beyond allocated memory!).
openmp/runtime/test/tasking/hidden_helper_task/num_threads.cpp
12 ↗	(On Diff #331442)	Right, `omp_get_max_threads()` is the function, which gives you $OMP_NUM_THREADS. `omp_get_num_threads()` does something different. Please don't rely on the names of the functions, but check the OpenMP spec.
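
The difference between the two routines can be seen in a few lines. This is a minimal sketch, not part of the patch: `ThreadCounts` and `query_thread_counts` are hypothetical names, and the fallback definitions used when the file is built without OpenMP support are stand-ins that only mimic single-threaded behavior.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>
#else
// Stand-ins for builds without OpenMP (assumption: serial execution only).
static int omp_get_num_threads() { return 1; }
static int omp_get_max_threads() { return 1; }
#endif

struct ThreadCounts {
  int num_threads_serial;   // omp_get_num_threads() outside a parallel region
  int max_threads;          // omp_get_max_threads(), follows OMP_NUM_THREADS
  int num_threads_parallel; // omp_get_num_threads() inside a parallel region
};

ThreadCounts query_thread_counts() {
  ThreadCounts counts{};
  counts.num_threads_serial = omp_get_num_threads();
  counts.max_threads = omp_get_max_threads();
#pragma omp parallel
  {
#pragma omp single
    counts.num_threads_parallel = omp_get_num_threads();
  }
  // In a serial context the current team is the calling thread only.
  assert(counts.num_threads_serial == 1);
  return counts;
}
```

Calling `query_thread_counts()` from the sequential part of a program reports 1 for the serial query regardless of `OMP_NUM_THREADS`, which is why the test needs `omp_get_max_threads()` to mirror the capacity formula.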

tianshilei1992 added inline comments. Mar 18 2021, 8:44 AM

openmp/runtime/src/kmp_runtime.cpp
3667–3669	I got your point. That is a nice catch! Thanks. I'll come up with a good way to fix it.

[AMD Official Use Only - Internal Distribution Only]

Slightly related:
In Wednesday's multi-company meeting, we concluded that the helper task patch should be reverted from LLVM 12 while we continue to actively work on the issues in trunk.

Who is taking care of that? Or whom should we notify?

Ron

I don't recall that we reached that conclusion. As I remember it, the patch would be reverted if we couldn't fix the issues before the release. No?

comments

We should pull this from the 12 release. Lots of effort at the last minute to stop a complicated patch asserting, after it has been patched several times already, is unlikely to yield a stable release.

tianshilei1992 marked 5 inline comments as done. Mar 18 2021, 9:59 AM

tianshilei1992 added inline comments.

openmp/runtime/src/kmp_runtime.cpp
4336	Can you try again?

[AMD Official Use Only - Internal Distribution Only]

I totally agree with reverting this from 12.
Let's help our LLVM release engineers produce a quality release.
Knowingly leaving in a patch that has this much contention and churn is not going to lead to a quality release.

Also, that is what I thought we concluded in the meeting.

Ron

I filed https://bugs.llvm.org/show_bug.cgi?id=49631 to make release managers aware that we have a problem here.

Ron,

I would prefer that you remove "[AMD Official Use Only - Internal Distribution Only]" from the mail.

Thanks
Ravi

[AMD Public Use]

Removed in both places, sorry about that; darn mailer configuration.

Shilei,

How much time do you think you need to resolve this, or to decide whether to revert it or disable it with macros in 12.0?

Some would like to stabilize their performance numbers and would like to do it as early as possible.
Thanks
Ravi

In D98838#2635221, @RaviNarayanaswamy wrote:
Shilei,
How much time do you think you need to resolve this, or to decide whether to revert it or disable it with macros in 12.0?
Some would like to stabilize their performance numbers and would like to do it as early as possible.
Thanks
Ravi

For the assertion problem, I expect this patch to fix that, and hopefully people can give it a shot. For the performance regression, I didn't observe it at least for now with HPC2021. I'll contact Ron for his reproducer.

Harbormaster completed remote builds in B94489: Diff 331598. Mar 18 2021, 11:12 AM

[AMD Public Use]

Shilei,
I offered you a SPEC CPU 619.lbm reproducer for the performance issue.
It takes 2 minutes or less to compile and run.
Do you want that?

Ron

In D98838#2635259, @ronlieb wrote:

[AMD Public Use]

Shilei,
I offered you a SPEC CPU 619.lbm reproducer for the performance issue.
It takes 2 minutes or less to compile and run.
Do you want that?

Ron

Yes, please.

[AMD Public Use]

Awesome, I will send it along in a private email due to SPEC confidentiality rules.
I.e., I cannot attach the source here.

Look for something shortly.
Please ask me for any help you might need on the reproducer.

Ron

In D98838#2634873, @ronlieb wrote:

In Wednesday's multi-company meeting, we concluded that the helper task patch should be reverted from LLVM 12 while we continue to actively work on the issues in trunk.

Who is taking care of that? Or whom should we notify?

I also don't recall that conclusion.

In D98838#2634957, @JonChesterfield wrote:

We should pull this from the 12 release. Lots of effort at the last minute to stop a complicated patch asserting, after it has been patched several times already, is unlikely to yield a stable release.

Pulling this is not necessarily easy either, I haven't checked though.
Have you or @ronlieb tried this solution, especially if the environment
variable now works to disable all the side-effects?

Let me be direct for a second so we don't end up here again in a few months:
The patch was on phab for ~1 year, nobody cared, this is a very common phenomenon.
It also has been merged for weeks. I get the fact that we want a stable release
but showing up last minute just saying we need to pull stuff is *not* helpful
from an overall perspective. I say this especially because the number of people/
organizations that develop and upstream complex features is very limited. If you
want to benefit from such efforts you should be prepared to help, IMHO. That does
mean to do some testing and reviewing *before* the last release candidate is due.
Not to say this was not tested, but the capabilities are arguably different here.

[AMD Public Use]

The environment variable LIBOMP_USE_HIDDEN_HELPER_TASK=0 does not solve the issues I have seen.
Nor did it resolve the issue in the simple test case Joachim posted...
Which I tried both with and without LIBOMP_USE_HIDDEN_HELPER_TASK=0
Joachim might be able to confirm same?

Ron

LIBOMP_NUM_HIDDEN_HELPER_THREADS=0 avoids the segfault/assertion for the two test cases I attached to the bugzilla issue. This kind of makes sense, as 0 hidden threads cannot create a hole in the __kmp_threads array.
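
Why a nonzero helper count can leave a hole can be illustrated with a toy model of the slot search. It is only a sketch under stated assumptions: slot 0 holds the initial thread, slots 1 through `helpers` are reserved for hidden helper threads that have not been created yet, regular threads fill the remaining slots contiguously, and no slot is ever freed. The function names are hypothetical, not libomp APIs.

```cpp
// Toy model of the thread array; all names are hypothetical.

// Index at which a linear search for a free slot stops once `regular`
// regular threads (the initial thread included) are registered.
constexpr int next_free_slot(int regular, int helpers) {
  return regular == 0 ? 0 : helpers + regular;
}

// Expansion check that compares the thread count with the full array size.
constexpr bool expands_without_reserve(int registered, int capacity) {
  return registered >= capacity;
}

// Expansion check that first takes the reserved helper slots out.
constexpr bool expands_with_reserve(int registered, int capacity,
                                    int helpers) {
  return registered >= capacity - helpers;
}

constexpr int kCapacity = 40;
constexpr int kHelpers = 8;
constexpr int kRegular = 32; // slot 0 plus slots 9..39 are occupied

// The next search would stop at index 40, one past the end of the array.
static_assert(next_free_slot(kRegular, kHelpers) == kCapacity,
              "search runs off the end");
// Counting against the full size does not notice that (32 < 40).
static_assert(!expands_without_reserve(kRegular, kCapacity),
              "no expansion is requested");
// Counting against the usable size requests the expansion in time.
static_assert(expands_with_reserve(kRegular, kCapacity, kHelpers),
              "expansion is requested");
// With zero helpers there is no hole: a full array always trips the check.
static_assert(next_free_slot(kCapacity, 0) == kCapacity &&
                  expands_without_reserve(kCapacity, kCapacity),
              "zero helpers cannot create a hole");
```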

If you still see performance regression (I could not reproduce this with lbm built from SPEC CPU 2006, for which I explicitly turned on the contained OpenMP code), I guess the code adds some additional synchronization, which was not there before.

I did some experiments with different versions of lbm:

HPC2021: no performance regression observed (three variants: with hht, with hht but disabled via env, and w/o hht by reverting the change).
Ron's reproducer: performance regression observed when running with numactl --localalloc --physcpubind=0-xxx. In this case, disabling hht via env helps. When running w/o numactl, there is almost no performance difference (it is unclear whether the tiny difference is noise).

All run 10 times.

__kmp_hidden_helper_initialize() always initializes all hidden threads at once. Right? In this case, your modifications make sense.

I tested the patch applied to the release branch. It fixes both of my reproducers.
Please add my reproducers as test cases.

Also remove the unnecessary code of your first shot.

openmp/runtime/src/kmp_runtime.cpp
3556–3564	This change is unnecessary.

In D98838#2635538, @protze.joachim wrote:

Please add my reproducers as test cases.

I just saw that you fused the reproducers into a single test. I'd prefer to have them as separate tests. This will make it easier to spot the source of future failures. The individual parallel regions might also change the capacity, so that the individual issues are not triggered.

comments

In D98838#2635538, @protze.joachim wrote:

__kmp_hidden_helper_initialize() always initializes all hidden threads at once. Right? In this case, your modifications make sense.

Right.

In D98838#2635384, @jdoerfert wrote:

Let me be direct for a second so we don't end up here again in a few months:
The patch was on phab for ~1 year, nobody cared, this is a very common phenomenon.
It also has been merged for weeks. I get the fact that we want a stable release
but showing up last minute just saying we need to pull stuff is *not* helpful
from an overall perspective. I say this especially because the number of people/
organizations that develop and upstream complex features is very limited. If you
want to benefit from such efforts you should be prepared to help, IMHO. That does
mean to do some testing and reviewing *before* the last release candidate is due.
Not to say this was not tested, but the capabilities are arguably different here.

Well, sort of. We reported segfaults on it pretty much when it landed,
https://reviews.llvm.org/D77609 has a comment from Jan 18 this year. It landed after:

The test still doesn't work ideally on amdgpu, but it no longer crashes, and
some of the print statements within the target region are seen.

which given it's host only code was a fairly clear sign that things weren't right.

Granted Ron & I didn't review or read the change (as far as I know), but then we're
mostly fighting to get amdgpu working and didn't anticipate a host-only change
breaking gpu offloading.

In the spirit of direct, I didn't care about this change until it landed and broke stuff.

LGTM after the latest update.

This revision is now accepted and ready to land. Mar 18 2021, 2:05 PM

Harbormaster completed remote builds in B94544: Diff 331675. Mar 18 2021, 3:14 PM

Closed by commit rG2df65f87c1ea: [OpenMP] Fixed a crash in hidden helper thread (authored by tianshilei1992). Mar 18 2021, 3:25 PM

This revision was automatically updated to reflect the committed changes.

tianshilei1992 added a commit: rG2df65f87c1ea: [OpenMP] Fixed a crash in hidden helper thread.

In D98838#2635728, @JonChesterfield wrote:

In D98838#2635384, @jdoerfert wrote:

In the spirit of direct, I didn't care about this change until it landed and broke stuff.

As far as I can tell, we merged Jan 25, Ron reported an issue March 15. In addition to the
review time, it was upstream for 8 weeks before you reported it broke stuff. Given that delay
I would not throw rocks at people claiming they did not do any testing. It's not like we don't
try to setup LLVM/OpenMP CI and such.

[AMD Public Use]

We have a continuous integration process that takes essentially trunk changes and moves them a month's worth at a time, into our production testing branch. This branch has 100's of hours of testing.
We recently moved from Dec 8 to Jan 26 commits, and did so about 2 weeks ago, and that is when we started to see the problems. Would we like to test the larger batches of changes sooner? Yes of course. We reported the problem fairly quickly after we saw the assert issue.

There are other companies who wait until LLVM releases and then move to integrating and testing that release branch source. These companies have yet to start testing llvm12, so they have not seen the patch in question.

Hope that provides a bit of clarity into the time lags that we see and will see moving from commit into product releases.

Ron

In D98838#2636101, @ronlieb wrote:

We reported the problem fairly quickly after we saw the assert issue.

I appreciate your report. Seriously. However, no one would tell us how to reproduce the bug. Even now that this patch has been merged, I still haven't received any reproducer (in any form) from whoever reported the issue at the very beginning. I understand that we're approaching the release and that we want a stable product. However, if nobody provides steps to reproduce bugs and people just ask to revert the patch, we will probably NEVER have new features.

The AMD AOCC Compiler team reported to me this morning that they are able to reproduce the SPEC CPU performance regressions when the patch is present.

They are able to recover the lost performance when they set the two environment variables using SPEC config file rules

preENV_LIBOMP_USE_HIDDEN_HELPER_TASK=OFF
preENV_LIBOMP_NUM_HIDDEN_HELPER_THREADS=0

which for a non-SPEC CPU program would be simply

export LIBOMP_USE_HIDDEN_HELPER_TASK=OFF
export LIBOMP_NUM_HIDDEN_HELPER_THREADS=0

tianshilei1992 mentioned this in D99020: [OpenMP] Disable hidden helper task by default. Mar 20 2021, 9:59 AM

Revision Contents

Path                                                          Size
openmp/runtime/src/kmp_runtime.cpp                            24 lines
openmp/runtime/src/kmp_settings.cpp                           7 lines
openmp/runtime/test/tasking/hidden_helper_task/capacity.cpp   60 lines

Diff 331598

openmp/runtime/src/kmp_runtime.cpp

 [...] #endif /* USE_LOAD_BALANCE */
   // Check if the threads array is large enough, or needs expanding.
   // See comment in __kmp_register_root() about the adjustment if
   // __kmp_threads[0] == NULL.
   capacity = __kmp_threads_capacity;
   if (TCR_PTR(__kmp_threads[0]) == NULL) {
     --capacity;
   }
+  // If it is not for initializing the hidden helper team, we need to take
+  // __kmp_hidden_helper_threads_num out of the capacity because it is included
+  // in __kmp_threads_capacity.
+  if (__kmp_enable_hidden_helper && !TCR_4(__kmp_init_hidden_helper_threads)) {
+    capacity -= __kmp_hidden_helper_threads_num;
+  }
   if (__kmp_nth + new_nthreads -
           (root->r.r_active ? 1 : root->r.r_hot_team->t.t_nproc) >
       capacity) {
     // Expand the threads array.
     int slotsRequired = __kmp_nth + new_nthreads -
                         (root->r.r_active ? 1 : root->r.r_hot_team->t.t_nproc) -
                         capacity;
     int slotsAdded = __kmp_expand_threads(slotsRequired);

 [...] #endif
   }
   minimumRequiredCapacity = __kmp_threads_capacity + nNeed;
   newCapacity = __kmp_threads_capacity;
   do {
     newCapacity = newCapacity <= (__kmp_sys_max_nth >> 1) ? (newCapacity << 1)
                                                           : __kmp_sys_max_nth;
   } while (newCapacity < minimumRequiredCapacity);
+  // If hidden helper thread is enabled, we also need to count the number
+  if (__kmp_enable_hidden_helper) {
+    if (UNLIKELY(newCapacity >
+                 __kmp_sys_max_nth - __kmp_hidden_helper_threads_num)) {
+      newCapacity = __kmp_sys_max_nth;
+    } else {
+      newCapacity += __kmp_hidden_helper_threads_num;
+    }

   newThreads = (kmp_info_t **)__kmp_allocate(
       (sizeof(kmp_info_t *) + sizeof(kmp_root_t *)) * newCapacity + CACHE_LINE);
   newRoot =
       (kmp_root_t **)((char *)newThreads + sizeof(kmp_info_t *) * newCapacity);
   KMP_MEMCPY(newThreads, __kmp_threads,
              __kmp_threads_capacity * sizeof(kmp_info_t *));
   KMP_MEMCPY(newRoot, __kmp_root,
              __kmp_threads_capacity * sizeof(kmp_root_t *));
 [...] /* 2007-03-02:
      (2) we cannot detect initial thread reliably (the first thread which does
      serial initialization may be not a real initial thread).
   */
   capacity = __kmp_threads_capacity;
   if (!initial_thread && TCR_PTR(__kmp_threads[0]) == NULL) {
     --capacity;
   }
+  // If it is not for initializing the hidden helper team, we need to take
+  // __kmp_hidden_helper_threads_num out of the capacity because it is included
+  // in __kmp_threads_capacity.
+  if (__kmp_enable_hidden_helper && !TCR_4(__kmp_init_hidden_helper_threads)) {
+    capacity -= __kmp_hidden_helper_threads_num;
+  }
   /* see if there are too many threads */
   if (__kmp_all_nth >= capacity && !__kmp_expand_threads(1)) {

     if (__kmp_tp_cached) {
       __kmp_fatal(KMP_MSG(CantRegisterNewThread),
                   KMP_HNT(Set_ALL_THREADPRIVATE, __kmp_tp_capacity),
                   KMP_HNT(PossibleSystemLimitOnThreads), __kmp_msg_null);
     } else {
       __kmp_fatal(KMP_MSG(CantRegisterNewThread), KMP_HNT(SystemLimitOnThreads),
                   __kmp_msg_null);
     }
 [...] if (TCR_4(__kmp_init_hidden_helper_threads)) {
     KMP_ASSERT(gtid <= __kmp_hidden_helper_threads_num);
     KA_TRACE(1, ("__kmp_register_root: found slot in threads array for "
                  "hidden helper thread: T#%d\n",
                  gtid));
   } else {
     /* find an available thread slot */
     // Don't reassign the zero slot since we need that to only be used by
     // initial thread. Slots for hidden helper threads should also be skipped.
-    if (initial_thread && __kmp_threads[0] == NULL) {
+    if (initial_thread && TCR_PTR(__kmp_threads[0]) == NULL) {

       gtid = 0;
     } else {
       for (gtid = __kmp_hidden_helper_threads_num + 1;
            TCR_PTR(__kmp_threads[gtid]) != NULL; gtid++)
         ;

     }
     KA_TRACE(
         1, ("__kmp_register_root: found slot in threads array: T#%d\n", gtid));
     KMP_ASSERT(gtid < __kmp_threads_capacity);
   }
   /* update global accounting */
   __kmp_all_nth++;
 [...] #endif
   {
     int new_start_gtid = TCR_4(__kmp_init_hidden_helper_threads)
                              ? 1
                              : __kmp_hidden_helper_threads_num + 1;
     for (new_gtid = new_start_gtid; TCR_PTR(__kmp_threads[new_gtid]) != NULL;
          ++new_gtid) {
       KMP_DEBUG_ASSERT(new_gtid < __kmp_threads_capacity);

     }
     if (TCR_4(__kmp_init_hidden_helper_threads)) {
       KMP_DEBUG_ASSERT(new_gtid <= __kmp_hidden_helper_threads_num);
     }
   /* allocate space for it. */
 [...]
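
The effect of the capacity adjustment on the expansion request can be checked with a small model. This is a sketch under assumptions, not libomp code: `usable_capacity` and `slots_required` are hypothetical helpers that mirror the two expressions in the hunks above, and the thread counts (40 threads already running and reused from the hot team, 1344 requested, an array of 168 slots with 8 of them reserved) are illustrative values based on the example in the summary.

```cpp
// Hypothetical model of the adjusted capacity check; not libomp code.
constexpr int usable_capacity(int threads_capacity, bool slot0_empty,
                              bool helpers_enabled, bool initializing_helpers,
                              int helpers) {
  int capacity = threads_capacity;
  if (slot0_empty)
    --capacity; // slot 0 is kept for the initial thread
  if (helpers_enabled && !initializing_helpers)
    capacity -= helpers; // reserved slots are not available to the team
  return capacity;
}

// Extra slots to request, or 0 if the array is already large enough.
constexpr int slots_required(int nth, int new_nthreads, int reused,
                             int capacity) {
  int needed = nth + new_nthreads - reused;
  return needed > capacity ? needed - capacity : 0;
}

constexpr int kArraySize = 168; // 160 + 8 reserved helper slots

// Without the adjustment the request is 1176 slots, so the minimum required
// capacity is 168 + 1176 = 1344, which three doublings of 168 hit exactly.
static_assert(slots_required(40, 1344, 40, kArraySize) == 1176,
              "request computed against the full array size");
// With the adjustment the usable capacity is 160 and the request grows to
// 1184, so the minimum required capacity becomes 168 + 1184 = 1352.
static_assert(usable_capacity(kArraySize, false, true, false, 8) == 160,
              "helper slots are taken out");
static_assert(slots_required(40, 1344, 40, 160) == 1184,
              "request computed against the usable capacity");
```

In this model, a minimum of 1352 forces the doubling one step past 1344, which leaves room for the 8 reserved slots.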

openmp/runtime/src/kmp_settings.cpp

 [...] int __kmp_initial_threads_capacity(int req_nproc) {
   /* MIN( MAX( 32, 4 * $OMP_NUM_THREADS, 4 * omp_get_num_procs() ),
    * __kmp_max_nth) */
   if (nth < (4 * req_nproc))
     nth = (4 * req_nproc);
   if (nth < (4 * __kmp_xproc))
     nth = (4 * __kmp_xproc);

   // If hidden helper task is enabled, we initialize the thread capacity with
-  // extra
-  // __kmp_hidden_helper_threads_num.
-  nth += __kmp_hidden_helper_threads_num;
+  // extra __kmp_hidden_helper_threads_num.
+  if (__kmp_enable_hidden_helper) {
+    nth += __kmp_hidden_helper_threads_num;
+  }

   if (nth > __kmp_max_nth)
     nth = __kmp_max_nth;

   return nth;
 }

 int __kmp_default_tp_capacity(int req_nproc, int max_nth,
 [...]
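
The initial capacity computed by this function can be reproduced for the machine mentioned in the summary. The sketch below is a standalone model, not the runtime's function: `initial_threads_capacity` is a hypothetical name, the starting value of 32 is taken from the comment in the hunk (the initialization of `nth` is not shown there), and the globals are passed in as parameters.

```cpp
// Hypothetical standalone model of __kmp_initial_threads_capacity().
// Assumption: nth starts at 32, as the MIN/MAX comment in the hunk suggests.
constexpr int initial_threads_capacity(int req_nproc, int xproc,
                                       bool helpers_enabled, int helpers,
                                       int max_nth) {
  int nth = 32;
  if (nth < 4 * req_nproc)
    nth = 4 * req_nproc;
  if (nth < 4 * xproc)
    nth = 4 * xproc;
  if (helpers_enabled)
    nth += helpers; // extra slots for the hidden helper threads
  if (nth > max_nth)
    nth = max_nth;
  return nth;
}

// 40 hardware threads: 4 * 40 = 160, plus 8 helper slots gives 168.
static_assert(initial_threads_capacity(40, 40, true, 8, 1 << 20) == 168,
              "capacity with hidden helper threads");
// With the feature disabled the capacity stays at 160.
static_assert(initial_threads_capacity(40, 40, false, 8, 1 << 20) == 160,
              "capacity without hidden helper threads");
```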

openmp/runtime/test/tasking/hidden_helper_task/capacity.cpp

This file was added.

// RUN: %libomp-cxx-compile-and-run

#include <omp.h>

#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>
#include <thread>
#include <vector>

void dummy_root() {
  // omp_get_max_threads() will do middle initialization
  int nthreads = omp_get_max_threads();
  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}

int main(int argc, char *argv[]) {
  constexpr const int __kmp_hidden_helper_threads_num = 8;

  // Create a new thread to initialize the OpenMP RTL. The new thread will not
  // be taken as the "initial thread".
  std::thread root(dummy_root);

  const int __kmp_threads_capacity =
      std::min(std::max(std::max(32, 4 * omp_get_max_threads()),
                        4 * omp_get_num_procs()),
               std::numeric_limits<int>::max());
  const int capacity = __kmp_threads_capacity + __kmp_hidden_helper_threads_num;
  const int N = 2 * capacity;

  std::vector<int> data(N);

#pragma omp parallel for
  for (unsigned i = 0; i < N; ++i) {
    data[i] = i;
  }

#pragma omp parallel for num_threads(capacity)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

#pragma omp parallel for num_threads(capacity + 1)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

  for (unsigned i = 0; i < N; ++i) {
    assert(data[i] == 4 * i);
  }

  root.join();

  return 0;
}
