This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][OMPT][clang] task frame support fixed in __kmpc_fork_call
ClosedPublic

Authored by vladaindjic on Oct 25 2021, 2:45 AM.

Details

Reviewers

protze.joachim
jlpeyton
AndreyChurbanov
hbae
jdoerfert

Commits

rGf2410bfb1c49: [OpenMP][OMPT][clang] task frame support fixed in __kmpc_fork_call

Summary

__kmp_fork_call sets the enter_frame of the active task (th_current_task)
before a new parallel region begins. After the region has finished, the
enter_frame is cleared.

The old implementation of __kmpc_fork_call didn’t clear the enter_frame of active task.
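
The set-then-clear behavior described above can be illustrated with a small standalone model. This is only a sketch and not the runtime code: model_frame_t, model_fork_call, and observe_region are hypothetical names, and a plain pointer stands in for the runtime's frame record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task's frame record (not the real ompt_frame_t). */
typedef struct {
  void *enter_frame; /* non-NULL only while the task is inside the "runtime" */
} model_frame_t;

typedef void (*model_region_fn)(model_frame_t *encountering, void *arg);

/* Model of the wrapper: publish the frame, run the region, then clear it. */
void model_fork_call(model_frame_t *encountering, void *frame_addr,
                     model_region_fn region, void *arg) {
  assert(encountering != NULL && frame_addr != NULL);
  encountering->enter_frame = frame_addr; /* set before the region begins */
  region(encountering, arg);              /* stands in for fork + join */
  encountering->enter_frame = NULL;       /* clear after the region has finished */
}

/* Region body that records what a tool would see while the region runs. */
void observe_region(model_frame_t *encountering, void *arg) {
  *(void **)arg = encountering->enter_frame;
}
```

Inside the region a tool sees the published frame address. After the call returns the field is empty again, which is the step the old implementation skipped.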

Also, the way of initializing the enter_frame of the active task was wrong.
Consider the following two OpenMP programs.

The first program: Let R1 be a serialized parallel region that encloses another serialized
parallel region R2. Assume that the thread executing R2 is about to create a new serialized
parallel region R3 by calling __kmpc_fork_call. This thread is responsible for setting the
enter_frame of R2’s implicit task. Note that the information about R2’s implicit task is stored
in master_th->th.th_current_task at this moment, while lwt holds the information about
R1’s implicit task. The old implementation uses lwt and therefore overwrites the enter_frame of
R1’s implicit task instead of setting it on R2’s implicit task. The new implementation uses
master_th->th.th_current_task instead.
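
A minimal model of the first program may help show why the two lookups disagree. The types and functions below (model_task_t, model_thread_t, pick_old_lwt, pick_new_current, mark_enter) are hypothetical simplifications, and the sketch assumes the state described above: lwt describes R1 while th_current_task describes R2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the runtime structures. */
typedef struct {
  const char *name;
  void *enter_frame;
} model_task_t;

typedef struct {
  model_task_t *current_task; /* models master_th->th.th_current_task */
  model_task_t *lwt_task;     /* models the task that lwt describes */
} model_thread_t;

/* Old selection in the serialized case: take the task recorded in lwt. */
model_task_t *pick_old_lwt(const model_thread_t *th) {
  assert(th != NULL && th->lwt_task != NULL);
  return th->lwt_task;
}

/* New selection: always take the thread's current task. */
model_task_t *pick_new_current(const model_thread_t *th) {
  assert(th != NULL && th->current_task != NULL);
  return th->current_task;
}

/* Record the frame address on whichever task was selected. */
void mark_enter(model_task_t *task, void *frame_addr) {
  assert(task != NULL);
  task->enter_frame = frame_addr;
}
```

With current_task pointing at R2 and lwt_task pointing at R1, pick_old_lwt returns R1 and pick_new_current returns R2, so only the new selection marks the task that is actually creating R3.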

The second program: Consider an OpenMP program that contains a parallel region R1 which encloses
an explicit task T. Assume that a thread has to create another parallel region R2 while
executing T. __kmpc_fork_call is responsible for creating R2 and setting the enter_frame of T,
whose information is stored in master_th->th.th_current_task.
The old implementation tries to set the frame of parent_team->t.t_implicit_task_taskdata[tid],
which corresponds to the implicit task of R1 instead of T.
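
The second program can be modeled in the same spirit. Again this is a sketch under stated assumptions: model_team_t, model_worker_t, pick_old_team_slot, and pick_new_task are hypothetical names, and a fixed team size of 2 is assumed only to keep the example small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified task record. */
typedef struct {
  void *enter_frame;
} model_task_t;

#define MODEL_TEAM_SIZE 2

/* Models a team with one implicit task per thread
   (compare parent_team->t.t_implicit_task_taskdata). */
typedef struct {
  model_task_t implicit_tasks[MODEL_TEAM_SIZE];
} model_team_t;

typedef struct {
  int tid;                    /* thread number within the team */
  model_team_t *team;         /* models master_th->th.th_team */
  model_task_t *current_task; /* models master_th->th.th_current_task */
} model_worker_t;

/* Old selection without lwt: the implicit task slot of the enclosing team. */
model_task_t *pick_old_team_slot(model_worker_t *th) {
  assert(th != NULL && th->team != NULL);
  assert(th->tid >= 0 && th->tid < MODEL_TEAM_SIZE);
  return &th->team->implicit_tasks[th->tid];
}

/* New selection: the task the thread is executing right now. */
model_task_t *pick_new_task(model_worker_t *th) {
  assert(th != NULL && th->current_task != NULL);
  return th->current_task;
}
```

While the thread executes the explicit task T, current_task points at T, so the two selectors return different records. The old one returns R1's implicit task, which is not the task that encounters the new parallel region.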

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vladaindjic created this revision. Oct 25 2021, 2:45 AM

Herald added subscribers: guansong, yaxunl. Oct 25 2021, 2:45 AM

vladaindjic requested review of this revision. Oct 25 2021, 2:45 AM

Herald added a reviewer: jdoerfert. Oct 25 2021, 2:45 AM

Herald added a project: Restricted Project.

Herald added subscribers: openmp-commits, sstefan1.

Harbormaster completed remote builds in B130396: Diff 381898. Oct 25 2021, 2:51 AM

clang-formatted diffs in kmp_csupport.cpp

Harbormaster completed remote builds in B130397: Diff 381899. Oct 25 2021, 2:59 AM

Looks like we missed updating the code in kmp_csupport when the management of lwt was refactored.
lgtm

This revision is now accepted and ready to land. Oct 25 2021, 8:26 AM

@protze.joachim Thank you for the revision. Could you please land this commit for me?

Closed by commit rGf2410bfb1c49: [OpenMP][OMPT][clang] task frame support fixed in __kmpc_fork_call (authored by vladaindjic, committed by protze.joachim). Oct 25 2021, 9:23 AM

This revision was automatically updated to reflect the committed changes.

protze.joachim added a commit: rGf2410bfb1c49: [OpenMP][OMPT][clang] task frame support fixed in __kmpc_fork_call.

Revision Contents

Path                                                                  Size
openmp/runtime/src/kmp_csupport.cpp                                   16 lines
openmp/runtime/test/ompt/parallel/nested_serialized_task_frames.c    81 lines
openmp/runtime/test/ompt/parallel/region_in_expl_task_task_frames.c  87 lines

Diff 382024

openmp/runtime/src/kmp_csupport.cpp

[282 lines not shown]
 #endif
 {
   va_list ap;
   va_start(ap, microtask);

 #if OMPT_SUPPORT
   ompt_frame_t *ompt_frame;
   if (ompt_enabled.enabled) {
     kmp_info_t *master_th = __kmp_threads[gtid];
-    kmp_team_t *parent_team = master_th->th.th_team;
-    ompt_lw_taskteam_t *lwt = parent_team->t.ompt_serialized_team_info;
-    if (lwt)
-      ompt_frame = &(lwt->ompt_task_info.frame);
-    else {
-      int tid = __kmp_tid_from_gtid(gtid);
-      ompt_frame = &(
-          parent_team->t.t_implicit_task_taskdata[tid].ompt_task_info.frame);
-    }
+    ompt_frame = &master_th->th.th_current_task->ompt_task_info.frame;
     ompt_frame->enter_frame.ptr = OMPT_GET_FRAME_ADDRESS(0);
   }
   OMPT_STORE_RETURN_ADDRESS(gtid);
 #endif

 #if INCLUDE_SSC_MARKS
   SSC_MARK_FORKING();
 #endif
   __kmp_fork_call(loc, gtid, fork_context_intel, argc,
                   VOLATILE_CAST(microtask_t) microtask, // "wrapped" task
                   VOLATILE_CAST(launch_t) __kmp_invoke_task_func,
                   kmp_va_addr_of(ap));
 #if INCLUDE_SSC_MARKS
   SSC_MARK_JOINING();
 #endif
   __kmp_join_call(loc, gtid
 #if OMPT_SUPPORT
                   ,
                   fork_context_intel
 #endif
   );

   va_end(ap);
+
+#if OMPT_SUPPORT
+  if (ompt_enabled.enabled) {
+    ompt_frame->enter_frame = ompt_data_none;
+  }
+#endif
 }

 #if KMP_STATS_ENABLED
   if (previous_state == stats_state_e::SERIAL_REGION) {
     KMP_EXCHANGE_PARTITIONED_TIMER(OMP_serial);
     KMP_SET_THREAD_STATE(previous_state);
   } else {
     KMP_POP_PARTITIONED_TIMER();
[4,138 lines not shown]

openmp/runtime/test/ompt/parallel/nested_serialized_task_frames.c

This file was added.

				// RUN: %libomp-compile-and-run | %sort-threads | FileCheck %s
				// REQUIRES: ompt

				#include "callback.h"
				#include <omp.h>

				int main()
				{
				#pragma omp parallel num_threads(1)
				  {
				    // region 0
				#pragma omp parallel num_threads(1)
				    {
				      // region 1
				#pragma omp parallel num_threads(1)
				      {
				        // region 2
				        // region 2's implicit task
				        print_ids(0);
				        // region 1's implicit task
				        print_ids(1);
				        // region 0's implicit task
				        print_ids(2);
				        // initial task
				        print_ids(3);
				      }
				    }
				  }

				// Check if libomp supports the callbacks for this test.
				// CHECK-NOT: {{^}}0: Could not register callback 'ompt_callback_task_create'
				// CHECK-NOT: {{^}}0: Could not register callback 'ompt_callback_implicit_task'


				// CHECK: {{^}}0: NULL_POINTER=[[NULL:.*$]]
				// CHECK: {{^}}[[MASTER_ID:[0-9]+]]: ompt_event_initial_task_begin: parallel_id=[[INITIAL_PARALLEL_ID:[0-9]+]], task_id=[[INITIAL_TASK_ID:[0-9]+]], actual_parallelism=1, index=1, flags=1

				// region 0
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_parallel_begin
				// CHECK-SAME: parent_task_frame.exit=[[NULL]], parent_task_frame.reenter=[[INITIAL_TASK_FRAME_ENTER:0x[0-f]+]],
				// CHECK-SAME: parallel_id=[[PARALLEL_ID_0:[0-9]+]]
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_implicit_task_begin: parallel_id=[[PARALLEL_ID_0]], task_id=[[TASK_ID_0:[0-9]+]]

				// region 1
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_parallel_begin
				// CHECK-SAME: parent_task_frame.exit=[[REGION_0_FRAME_EXIT:0x[0-f]+]], parent_task_frame.reenter=[[REGION_0_FRAME_ENTER:0x[0-f]+]],
				// CHECK-SAME: parallel_id=[[PARALLEL_ID_1:[0-9]+]]
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_implicit_task_begin: parallel_id=[[PARALLEL_ID_1]], task_id=[[TASK_ID_1:[0-9]+]]

				// region 2
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_parallel_begin
				// CHECK-SAME: parent_task_frame.exit=[[REGION_1_FRAME_EXIT:0x[0-f]+]], parent_task_frame.reenter=[[REGION_1_FRAME_ENTER:0x[0-f]+]],
				// CHECK-SAME: parallel_id=[[PARALLEL_ID_2:[0-9]+]]
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_implicit_task_begin: parallel_id=[[PARALLEL_ID_2]], task_id=[[TASK_ID_2:[0-9]+]]

				// region 2's implicit task information (exit frame should be set, while enter should be NULL)
				// CHECK: {{^}}[[MASTER_ID]]: task level 0: parallel_id=[[PARALLEL_ID_2]], task_id=[[TASK_ID_2]]
				// CHECK-SAME: exit_frame={{0x[0-f]+}}
				// CHECK-SAME: reenter_frame=[[NULL]]
				// CHECK-SAME: task_type=ompt_task_implicit

				// region 1's implicit task information (both exit and enter frames should be set)
				// CHECK: {{^}}[[MASTER_ID]]: task level 1: parallel_id=[[PARALLEL_ID_1]], task_id=[[TASK_ID_1]]
				// CHECK-SAME: exit_frame=[[REGION_1_FRAME_EXIT]]
				// CHECK-SAME: reenter_frame=[[REGION_1_FRAME_ENTER]]
				// CHECK-SAME: task_type=ompt_task_implicit

				// region 0's implicit task information (both exit and enter frames should be set)
				// CHECK: {{^}}[[MASTER_ID]]: task level 2: parallel_id=[[PARALLEL_ID_0]], task_id=[[TASK_ID_0]]
				// CHECK-SAME: exit_frame=[[REGION_0_FRAME_EXIT]]
				// CHECK-SAME: reenter_frame=[[REGION_0_FRAME_ENTER]]
				// CHECK-SAME: task_type=ompt_task_implicit

				// region 0's initial task information (both exit and enter frames should be set)
				// CHECK: {{^}}[[MASTER_ID]]: task level 3: parallel_id=[[INITIAL_PARALLEL_ID]], task_id=[[INITIAL_TASK_ID]]
				// CHECK-SAME: exit_frame=[[NULL]]
				// CHECK-SAME: reenter_frame=[[INITIAL_TASK_FRAME_ENTER]]
				// CHECK-SAME: task_type=ompt_task_initial

				return 0;
				}
				No newline at end of file

openmp/runtime/test/ompt/parallel/region_in_expl_task_task_frames.c

This file was added.

				// RUN: %libomp-compile-and-run | %sort-threads | FileCheck %s
				// REQUIRES: ompt

				#include "callback.h"
				#include <omp.h>

				int main()
				{
				#pragma omp parallel num_threads(2)
				  {
				    if (omp_get_thread_num() == 0) {
				      // region 0
				#pragma omp task if(0)
				      {
				        // explicit task immediately executed by the initial master thread
				#pragma omp parallel num_threads(2)
				        {
				          if (omp_get_thread_num() == 0) {
				            // Note that this is executed by the initial master thread
				            // region 1
				            // region 1's implicit task
				            print_ids(0);
				            // explicit task
				            print_ids(1);
				            // region 0's implicit task
				            print_ids(2);
				            // initial task
				            print_ids(3);
				          }
				        }
				      }
				    }
				  }

				// Check if libomp supports the callbacks for this test.
				// CHECK-NOT: {{^}}0: Could not register callback 'ompt_callback_task_create'
				// CHECK-NOT: {{^}}0: Could not register callback 'ompt_callback_implicit_task'


				// CHECK: {{^}}0: NULL_POINTER=[[NULL:.*$]]
				// CHECK: {{^}}[[MASTER_ID:[0-9]+]]: ompt_event_initial_task_begin: parallel_id=[[INITIAL_PARALLEL_ID:[0-9]+]], task_id=[[INITIAL_TASK_ID:[0-9]+]], actual_parallelism=1, index=1, flags=1

				// region 0
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_parallel_begin
				// CHECK-SAME: parent_task_frame.exit=[[NULL]], parent_task_frame.reenter=[[INITIAL_TASK_FRAME_ENTER:0x[0-f]+]],
				// CHECK-SAME: parallel_id=[[PARALLEL_ID_0:[0-9]+]]
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_implicit_task_begin: parallel_id=[[PARALLEL_ID_0]], task_id=[[TASK_ID_0:[0-9]+]]

				// explicit task
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_task_create: parent_task_id=[[TASK_ID_0]]
				// CHECK-SAME: parent_task_frame.exit=[[REGION_0_FRAME_EXIT:0x[0-f]+]]
				// CHECK-SAME: parent_task_frame.reenter=[[REGION_0_FRAME_ENTER:0x[0-f]+]]
				// CHECK-SAME: new_task_id=[[TASK_ID_1:[0-9]+]]
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_task_schedule: first_task_id=[[TASK_ID_0]], second_task_id=[[TASK_ID_1]]

				// region 1
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_parallel_begin
				// CHECK-SAME: parent_task_frame.exit=[[EXPLICIT_TASK_FRAME_EXIT:0x[0-f]+]], parent_task_frame.reenter=[[EXPLICIT_TASK_FRAME_ENTER:0x[0-f]+]],
				// CHECK-SAME: parallel_id=[[PARALLEL_ID_1:[0-9]+]]
				// CHECK: {{^}}[[MASTER_ID]]: ompt_event_implicit_task_begin: parallel_id=[[PARALLEL_ID_1]], task_id=[[TASK_ID_2:[0-9]+]]

				// region 1's implicit task information (exit frame should be set, while enter should be NULL)
				// CHECK: {{^}}[[MASTER_ID]]: task level 0: parallel_id=[[PARALLEL_ID_1]], task_id=[[TASK_ID_2]]
				// CHECK-SAME: exit_frame={{0x[0-f]+}}
				// CHECK-SAME: reenter_frame=[[NULL]]
				// CHECK-SAME: task_type=ompt_task_implicit

				// explicit task information (both exit and enter frames should be set)
				// CHECK: {{^}}[[MASTER_ID]]: task level 1: parallel_id=[[PARALLEL_ID_0]], task_id=[[TASK_ID_1]]
				// CHECK-SAME: exit_frame=[[EXPLICIT_TASK_FRAME_EXIT]]
				// CHECK-SAME: reenter_frame=[[EXPLICIT_TASK_FRAME_ENTER]]
				// CHECK-SAME: task_type=ompt_task_explicit

				// region 0's implicit task information (both exit and enter frames should be set)
				// CHECK: {{^}}[[MASTER_ID]]: task level 2: parallel_id=[[PARALLEL_ID_0]], task_id=[[TASK_ID_0]]
				// CHECK-SAME: exit_frame=[[REGION_0_FRAME_EXIT]]
				// CHECK-SAME: reenter_frame=[[REGION_0_FRAME_ENTER]]
				// CHECK-SAME: task_type=ompt_task_implicit

				// region 0's initial task information (both exit and enter frames should be set)
				// CHECK: {{^}}[[MASTER_ID]]: task level 3: parallel_id=[[INITIAL_PARALLEL_ID]], task_id=[[INITIAL_TASK_ID]]
				// CHECK-SAME: exit_frame=[[NULL]]
				// CHECK-SAME: reenter_frame=[[INITIAL_TASK_FRAME_ENTER]]
				// CHECK-SAME: task_type=ompt_task_initial

				return 0;
				}
				No newline at end of file