This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Add bounds to num_teams clause (OpenMP 5.1)
ClosedPublic

Authored by Nawrin on Feb 1 2021, 2:14 PM.

Details

Summary

This patch adds lower-bound and upper-bound to num_teams clause according to OpenMP 5.1 specification. The number of initial teams created is implementation defined, but it will be greater than or equal to lower bound and less than or equal to the upper bound. If num_teams clause is not present, the number of initial teams created is implementation defined, but it will be greater than or equal to 1.

Diff Detail

Event Timeline

Nawrin created this revision.Feb 1 2021, 2:14 PM
Nawrin requested review of this revision.Feb 1 2021, 2:14 PM
jdoerfert added inline comments.Feb 1 2021, 2:23 PM
openmp/runtime/src/kmp.h
3367

Given the version name, should we go with _51?

openmp/runtime/src/kmp_runtime.cpp
7545

IIRC, this is not correct anymore in 5.1. We cannot cap the number of teams. We need to implement a loop if the number of physical teams is smaller than the number requested teams to basically emulate as many as have been requested.

Nawrin added inline comments.Feb 2 2021, 2:49 PM
openmp/runtime/src/kmp_runtime.cpp
7545

According to the spec, when num_teams clause is not present, the number of teams created should be implementation defined, but it will be greater or equal to 1.

5.1 spec - "If the num_teams clause is not specified and the value of the nteams-var ICV is greater than zero, the number of teams created is less or equal to the value of the nteams-var ICV. Otherwise, the number of teams created is implementation defined, but it will be greater than or equal to 1."

Nawrin added inline comments.Feb 2 2021, 2:50 PM
openmp/runtime/src/kmp.h
3367

Sounds good. I'll change it to _51.

jdoerfert added inline comments.Feb 2 2021, 2:59 PM
openmp/runtime/src/kmp_runtime.cpp
7545

I'm not arguing against that. What I'm trying to say is that we cannot cap the number of teams. So we cannot do nteams = min(nteams, __kmp_teams_max_nth).

OpenMP 5.1 says, you get a number between the lower and upper bound. So if the lower bound is not smaller or equal than __kmp_teams_max_nth, we cannot cap at __kmp_teams_max_nth. This is explicitly different from num_threads on a parallel. For example, to implement N teams with a single thread, e.g., if __kmp_teams_max_nth was 1, you would do something like this:

for (unsigned team_no = 0; team_no < num_teams_requested; ++team_no) {
  set_team_no_for_thread(team_no);
  execute_team_body();
}
AndreyChurbanov added inline comments.Feb 4 2021, 6:07 AM
openmp/runtime/src/kmp_runtime.cpp
7545

I'm not arguing against that. What I'm trying to say is that we cannot cap the number of teams. So we cannot do nteams = min(nteams, __kmp_teams_max_nth).

Why? Once the specification allows to choose any number of teams, why would we complicate things without the need?

OpenMP 5.1 says, you get a number between the lower and upper bound. So if the lower bound is not smaller or equal than __kmp_teams_max_nth, we cannot cap at __kmp_teams_max_nth.

Right. But this is not the case in this code branch. Here we have neither lower nor upper bound.

We don't cap the number of teams when we have lower and/or upper bound. But again, it is not the case here.

jdoerfert resigned from this revision.Feb 4 2021, 6:31 AM
jdoerfert added inline comments.
openmp/runtime/src/kmp_runtime.cpp
7545

Right. But this is not the case in this code branch. Here we have neither lower nor upper bound.

You are both right, in this branch there is no problem. I was under the impression we cap it in this conditional and I choose a bad location for my comment. If there is no capping anywhere, there is no problem. Apologies for the noise.

Nawrin updated this revision to Diff 322504.Feb 9 2021, 2:30 PM

Update function name from _5 to _51

This revision is now accepted and ready to land.Feb 10 2021, 11:15 AM
This revision was automatically updated to reflect the committed changes.
hoy added a subscriber: hoy.Feb 12 2021, 4:41 PM
hoy added inline comments.
openmp/runtime/test/teams/kmp_num_teams.c
68

Hello, I'm wondering if nteams could have an uninitialized use here. It is defined conditionally at line 41. I was seeing nteams had a random value. This happened in a very specific testing environment and sorry I cannot provide you a repro. Thanks.

jdoerfert added inline comments.
openmp/runtime/test/teams/kmp_num_teams.c
68

When the encountering thread finished executing the target construct it should have also executed line 41. That said, there might be a capturing issue here. Do we need to capture nteams as shared explicitly to avoid a firstprivate copy in the target teams region?

openmp/runtime/test/teams/kmp_num_teams.c
68

Once target construct does not have shared clause, map(tofrom:..) probably needed for a, nteams and nthreads.

hoy added inline comments.Feb 12 2021, 5:01 PM
openmp/runtime/test/teams/kmp_num_teams.c
68

OK, I'm seeing nteams has different addresses in the master and team threads. Maybe shared is needed.

openmp/runtime/test/teams/kmp_num_teams.c
68

I tried to fix the test in commit 091e8da.
Strictly the a needs to be in in_reduction and reduction, but I hope atomic to work here. Otherwise map(tofrom) should be changed to in_reduction(+:a) reduction(+:a).

openmp/runtime/test/teams/kmp_num_teams.c
68

Sorry, I was in a hurry. My previous fix should not work with real offload, because the test was intended to check the teams construct on a host. With real offload the test cannot pass because num_teams and thread_limit won't be propagated to the device. So the proper fix would be to remove the target construct, and execute the teams on host. I've committed another patch 5631842d1810 that should definitely fix the problem. I apologize to miss the test problem during review (I actually don't have a machine with an accelerator at hand).

hoy added inline comments.Feb 12 2021, 6:09 PM
openmp/runtime/test/teams/kmp_num_teams.c
68

Thanks for the quick fix!