This is an archive of the discontinued LLVM Phabricator instance.

Fix thread_limits to work properly for teams construct
ClosedPublic

Authored by tlwilmar on Jan 16 2019, 1:10 PM.

Details

Summary

The thread-limit-var and omp_get_thread_limit API was not perfectly handled for teams construct. Now, when modified by thread_limit clause, omp_get_thread_limit reports the correct value. In addition, the value is restored when leaving the teams construct to what it was in the encountering context.

This is done partly by creating the notion of a Contention Group root (CG root) that keeps track of the thread at the root of each separate CG, the thread-limit-var associated with the CG, and associated counter of active threads within the contention group.

thread-limits are passed from master to worker threads via an entry in the ICV data structure. When a "contention group switch" occurs, a new CG root record is made and passed from master to worker. A thread could potentially have several CG root records if it encounters multiple nested teams constructs (but at the moment the spec doesn't allow for nested teams, so the most one could have currently is 2). The master of the teams masters gets the thread-limit clause value stored to its local ICV structure, and the other teams masters copy it from the master. The thread-limit is set from that ICV copy and restored to the ICV copy when entering and leaving the teams construct.

This change also fixes a bug when the top-level teams construct team gets reused, and OMP_DYNAMIC was true, which can cause the expected size of this team to be smaller than what was actually allocated. The fix updates the size of the team after its threads were reserved.

Diff Detail

Repository
rL LLVM

Event Timeline

tlwilmar created this revision.Jan 16 2019, 1:10 PM

I'm a bit confused about the explicit copy per task.

runtime/src/kmp.h
1851 ↗(On Diff #182129)

Is this entry necessary? From my understanding of the spec, one copy per cg or thread should be sufficient.

runtime/src/kmp_runtime.cpp
7280 ↗(On Diff #182129)

If this is the reason for a per task icv, can't we reset the cg value to the initial value after leaving the teams region?

AndreyChurbanov added inline comments.Feb 7 2019, 5:53 AM
runtime/src/kmp.h
1851 ↗(On Diff #182129)

Maybe you are correct, but the specification says it is per-task ICV. So this looks a safer way to go. E.g. in future the setter routine can be added.

runtime/src/kmp_runtime.cpp
5676 ↗(On Diff #182129)

This looks incorrect, because worker thread can only belong to one CG at a time, not to chain of CGs.

7280 ↗(On Diff #182129)

I'd guess the main reason for a per task ICV is the specification requirement. Technically the implementation can ignore it and make the ICV per contention group, but this sounds a bit risky to me.

protze.joachim added inline comments.Feb 7 2019, 6:04 AM
runtime/src/kmp.h
1851 ↗(On Diff #182129)

The spec does not litteraly say per-task ICV. The scope is "data environment".

data environment: The variables associated with the execution of a given region.

Since the value can only be set/changed on a per-cg base, this should be sufficient.

AndreyChurbanov added inline comments.Feb 7 2019, 9:38 AM
runtime/src/kmp.h
1851 ↗(On Diff #182129)

I lean to agreeing with you that thread-limit-var is rather per-contention-group. Regardless that each task has its own data environment, and spec says "There is one copy of this ICV per data environment", skipping per-task thread-limit-var ICV might be feasible. Though not easy, as the push / pop operations for CG-root structures need to be revised then.

I was going for the per-data-environment specification for the ICV, though I agree it is only needed per contention group. Is there any consensus on this? Leave as is, or remove per-data-environment ICV?

Since there is only one other ICV which might be handled similary (bind-var), I will not further argue for making a special case for thread-limit-var. So leave it as it is.

AndreyChurbanov requested changes to this revision.Feb 11 2019, 3:40 AM

Per-task ICV works for me.

But please fix the broken CG threads counter in __kmp_free_thread.

This revision now requires changes to proceed.Feb 11 2019, 3:40 AM
tlwilmar updated this revision to Diff 186290.Feb 11 2019, 10:44 AM

Fix update for non-CG_root threads in __kmp_free_thread.

This revision is now accepted and ready to land.Feb 11 2019, 10:46 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2019, 1:04 PM

@tlwilmar are you registered on llvm's bugzilla?
This patch introduced a leak, please see https://bugs.llvm.org/show_bug.cgi?id=41494

openmp/trunk/runtime/src/kmp_runtime.cpp
3869

This is being leaked in the end