This patch adds lower-bound and upper-bound to num_teams clause according to OpenMP 5.1 specification. The number of initial teams created is implementation defined, but it will be greater than or equal to lower bound and less than or equal to the upper bound. If num_teams clause is not present, the number of initial teams created is implementation defined, but it will be greater than or equal to 1.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
openmp/runtime/src/kmp.h | ||
---|---|---|
3367 | Given the version name, should we go with _51? | |
openmp/runtime/src/kmp_runtime.cpp | ||
7537 | IIRC, this is not correct anymore in 5.1. We cannot cap the number of teams. We need to implement a loop if the number of physical teams is smaller than the number requested teams to basically emulate as many as have been requested. |
openmp/runtime/src/kmp_runtime.cpp | ||
---|---|---|
7537 | According to the spec, when num_teams clause is not present, the number of teams created should be implementation defined, but it will be greater or equal to 1. 5.1 spec - "If the num_teams clause is not specified and the value of the nteams-var ICV is greater than zero, the number of teams created is less or equal to the value of the nteams-var ICV. Otherwise, the number of teams created is implementation defined, but it will be greater than or equal to 1." |
openmp/runtime/src/kmp.h | ||
---|---|---|
3367 | Sounds good. I'll change it to _51. |
openmp/runtime/src/kmp_runtime.cpp | ||
---|---|---|
7537 | I'm not arguing against that. What I'm trying to say is that we cannot cap the number of teams. So we cannot do nteams = min(nteams, __kmp_teams_max_nth). OpenMP 5.1 says, you get a number between the lower and upper bound. So if the lower bound is not smaller or equal than __kmp_teams_max_nth, we cannot cap at __kmp_teams_max_nth. This is explicitly different from num_threads on a parallel. For example, to implement N teams with a single thread, e.g., if __kmp_teams_max_nth was 1, you would do something like this: for (unsigned team_no = 0; team_no < num_teams_requested; ++team_no) { set_team_no_for_thread(team_no); execute_team_body(); } |
openmp/runtime/src/kmp_runtime.cpp | ||
---|---|---|
7537 |
Why? Once the specification allows to choose any number of teams, why would we complicate things without the need?
Right. But this is not the case in this code branch. Here we have neither lower nor upper bound. We don't cap the number of teams when we have lower and/or upper bound. But again, it is not the case here. |
openmp/runtime/src/kmp_runtime.cpp | ||
---|---|---|
7537 |
You are both right, in this branch there is no problem. I was under the impression we cap it in this conditional and I choose a bad location for my comment. If there is no capping anywhere, there is no problem. Apologies for the noise. |
openmp/runtime/test/teams/kmp_num_teams.c | ||
---|---|---|
69 | Hello, I'm wondering if nteams could have an uninitialized use here. It is defined conditionally at line 41. I was seeing nteams had a random value. This happened in a very specific testing environment and sorry I cannot provide you a repro. Thanks. |
openmp/runtime/test/teams/kmp_num_teams.c | ||
---|---|---|
69 | When the encountering thread finished executing the target construct it should have also executed line 41. That said, there might be a capturing issue here. Do we need to capture nteams as shared explicitly to avoid a firstprivate copy in the target teams region? |
openmp/runtime/test/teams/kmp_num_teams.c | ||
---|---|---|
69 | Once target construct does not have shared clause, map(tofrom:..) probably needed for a, nteams and nthreads. |
openmp/runtime/test/teams/kmp_num_teams.c | ||
---|---|---|
69 | OK, I'm seeing nteams has different addresses in the master and team threads. Maybe shared is needed. |
openmp/runtime/test/teams/kmp_num_teams.c | ||
---|---|---|
69 | Sorry, I was in a hurry. My previous fix should not work with real offload, because the test was intended to check the teams construct on a host. With real offload the test cannot pass because num_teams and thread_limit won't be propagated to the device. So the proper fix would be to remove the target construct, and execute the teams on host. I've committed another patch 5631842d1810 that should definitely fix the problem. I apologize to miss the test problem during review (I actually don't have a machine with an accelerator at hand). |
openmp/runtime/test/teams/kmp_num_teams.c | ||
---|---|---|
69 | Thanks for the quick fix! |
clang-tidy: warning: invalid case style for function '__kmp_push_num_teams_5' [readability-identifier-naming]
not useful
clang-tidy: warning: invalid case style for parameter 'loc' [readability-identifier-naming]
not useful
clang-tidy: warning: invalid case style for parameter 'gtid' [readability-identifier-naming]
not useful
clang-tidy: warning: invalid case style for parameter 'num_teams_lb' [readability-identifier-naming]
not useful