It is reported that after enabling hidden helper threads, a program
can sometimes hit the assertion new_gtid < __kmp_threads_capacity. The root
cause is as follows. Let's say the default __kmp_threads_capacity is
N. If hidden helper threads are enabled, __kmp_threads_capacity is offset
to N+8 by default. If the number of threads we need exceeds N+8, e.g. via
a num_threads clause, we need to expand __kmp_threads. In
__kmp_expand_threads, the expansion starts from __kmp_threads_capacity and
repeatedly doubles it until the new capacity meets the requirement. Let's
assume the new requirement is Y. If Y happens to satisfy
(N+8)*2^X=Y, where X is the number of iterations, the new capacity is not
enough because 8 of its slots are taken by hidden helper threads.
Here is an example.
    #include <vector>

    int main(int argc, char *argv[]) {
      constexpr const size_t N = 1344;
      std::vector<int> data(N);

    #pragma omp parallel for
      for (unsigned i = 0; i < N; ++i) {
        data[i] = i;
      }

    #pragma omp parallel for num_threads(N)
      for (unsigned i = 0; i < N; ++i) {
        data[i] += i;
      }

      return 0;
    }
My CPU is 20C40T, so __kmp_threads_capacity is 160. After the offset,
__kmp_threads_capacity becomes 168. Since 1344 = (160+8)*2^3, the assertion
is hit.
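The arithmetic can be checked with a small sketch. This is a simplified model of the doubling as described above, not the runtime's actual __kmp_expand_threads; the names doubled_capacity, regular_slots, kHiddenHelpers, kStart and kRequired are hypothetical, and the numbers are the ones from this example.

```cpp
#include <cstddef>

// Hypothetical, simplified model of the doubling described in the summary.
// It is NOT the code of __kmp_expand_threads. Assumes capacity > 0.
constexpr std::size_t doubled_capacity(std::size_t capacity,
                                       std::size_t required) {
  while (capacity < required)
    capacity *= 2; // double until the requirement is met
  return capacity;
}

// Slots left for regular threads once the hidden helper slots are set aside.
// Assumes capacity >= hidden_helpers.
constexpr std::size_t regular_slots(std::size_t capacity,
                                    std::size_t hidden_helpers) {
  return capacity - hidden_helpers;
}

// Values taken from the example: default capacity 160, 8 hidden helpers,
// num_threads(1344).
constexpr std::size_t kHiddenHelpers = 8;
constexpr std::size_t kStart = 160 + kHiddenHelpers; // 168 after the offset
constexpr std::size_t kRequired = 1344;

// 168 -> 336 -> 672 -> 1344: the loop stops exactly at the requirement.
static_assert(doubled_capacity(kStart, kRequired) == 1344,
              "three doublings reach exactly 1344");

// Only 1336 of those slots are left for regular threads, fewer than 1344.
static_assert(regular_slots(doubled_capacity(kStart, kRequired),
                            kHiddenHelpers) < kRequired,
              "capacity falls short by the hidden helper slots");
```

Under this model, any requirement that is not exactly (N+8)*2^X overshoots when doubled, which may explain why the assertion is only hit sometimes.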
I don't think this is the right fix for the problem.
__kmp_threads_capacity is the size of the __kmp_threads array. If a call to __kmp_expand_threads asks to expand the array by 1, you don't need to expand by additional slots for the hidden threads (they are not placed at the end). The hidden threads were already part of __kmp_threads_capacity before the expansion.