This is an archive of the discontinued LLVM Phabricator instance.

Differential D146642

[OpenMP] Implement task record and replay mechanism
ClosedPublic

Authored by yuchenle on Mar 22 2023, 9:29 AM.

Details

Reviewers

jdoerfert
josemonsalve2
randreshg
jhuber6
tianshilei1992
AndreyChurbanov
tlwilmar
jlpeyton

Commits

rG36d4e4c9b5f6: [OpenMP] Implement task record and replay mechanism

Summary

This patch implements the "task record and replay" mechanism. The idea is to store tasks and their dependencies in the runtime so that future executions do not pay the cost of task creation and dependency resolution. The objective is to improve the performance of fine-grained tasks, both those created by "omp task" and those created by "taskloop".

The entry point of the recording phase is __kmpc_start_record_task, and the end of the recording is triggered by __kmpc_end_record_task.

Tasks encapsulated between a record start and a record end are saved: the runtime stores their structures and dependencies, together referred to as a task dependency graph (TDG), in order to replay them in subsequent executions. A TDG replay starts by scheduling all root tasks (tasks that have no input dependencies). No hash table is involved in tracking the dependencies, and tasks do not need to be created again.
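The replay idea described above can be sketched as follows. This is a minimal, sequential illustration and not the runtime's implementation: `Node`, `replay`, and `replay_diamond` are hypothetical names, and a plain integer counter stands in for the atomic predecessor counter a multithreaded runtime would need.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical, simplified stand-in for a recorded TDG node.
struct Node {
  std::vector<int> successors; // ids of the nodes that depend on this one
  int npredecessors = 0;       // total number of predecessors (recorded once)
  int pending = 0;             // predecessors still outstanding in this replay
};

// Replays the graph sequentially and returns the order in which nodes ran.
std::vector<int> replay(std::vector<Node> &tdg) {
  std::queue<int> ready;
  for (std::size_t i = 0; i < tdg.size(); ++i) {
    tdg[i].pending = tdg[i].npredecessors; // reset counters for this replay
    if (tdg[i].npredecessors == 0)
      ready.push(static_cast<int>(i)); // root task: no input dependencies
  }
  std::vector<int> order;
  while (!ready.empty()) {
    int id = ready.front();
    ready.pop();
    order.push_back(id); // "execute" the task
    for (int succ : tdg[id].successors)
      if (--tdg[succ].pending == 0)
        ready.push(succ); // last predecessor finished: successor is ready
  }
  assert(order.size() == tdg.size()); // holds for any acyclic recorded graph
  return order;
}

// Example graph (diamond): 0 -> {1, 2} -> 3
std::vector<int> replay_diamond() {
  std::vector<Node> tdg(4);
  tdg[0].successors = {1, 2};
  tdg[1].successors = {3};
  tdg[2].successors = {3};
  tdg[1].npredecessors = 1;
  tdg[2].npredecessors = 1;
  tdg[3].npredecessors = 2;
  return replay(tdg);
}
```

No dependency lookup happens during the replay: each finished node only decrements the counters of its recorded successors, which is why the hash table is no longer needed.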

At the beginning of __kmpc_start_record_task, we check whether a TDG has already been recorded. If so, the function replays the TDG by calling __kmp_exec_tdg and returns 0; if not, it starts recording and returns 1.

An integer uniquely identifies each TDG. Currently, this identifier must be incremented manually in the source code. Depending on how this feature is eventually used in the library, the caller function must take care of it. The caller function must also implement a mechanism to skip the associated region, according to the return value of __kmpc_start_record_task.
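The caller-side protocol can be illustrated with a small mock. Everything below is a hypothetical stand-in, not the runtime API: `MockRuntime`, `start_record`, `end_record`, and `create_task` only mimic the return-value contract described above (1 means "record the region", 0 means "already replayed, skip the region"). Calling the end function on the replay path as well is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// Hypothetical mock of the runtime side: one recorded task list per TDG id.
struct MockRuntime {
  std::map<int, std::vector<std::function<void()>>> tdgs;
  int recording_id = -1; // id of the TDG being recorded, or -1 if none

  // Returns 1 when recording starts, 0 after replaying a recorded TDG.
  int start_record(int tdg_id) {
    auto it = tdgs.find(tdg_id);
    if (it != tdgs.end()) {
      for (auto &task : it->second)
        task(); // replay: run the stored tasks without re-creating them
      return 0;
    }
    tdgs[tdg_id] = {}; // create an empty record for this id
    recording_id = tdg_id;
    return 1;
  }

  // Ends the recording; a no-op when the region was replayed.
  void end_record(int tdg_id) {
    if (recording_id == tdg_id)
      recording_id = -1;
  }

  // Runs the task and, while recording, stores it for later replays.
  void create_task(std::function<void()> task) {
    if (recording_id != -1)
      tdgs[recording_id].push_back(task);
    task();
  }
};

// Caller side: the region body is executed only when start_record returns 1.
int run_region(int iterations) {
  MockRuntime rt;
  int num_exec = 0;
  for (int i = 0; i < iterations; ++i) {
    if (rt.start_record(/*tdg_id=*/0)) {
      // First iteration only: create (and record) the task.
      rt.create_task([&num_exec] { ++num_exec; });
    }
    rt.end_record(/*tdg_id=*/0);
  }
  return num_exec;
}
```

With three iterations, the task body runs once while recording and twice through replay, so the counter ends at 3 even though the task is created only once.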

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yuchenle created this revision. Mar 22 2023, 9:29 AM

yuchenle created this object with edit policy "Administrators".

Herald added a project: Restricted Project. Mar 22 2023, 9:29 AM

Herald added subscribers: sunshaoce, guansong, yaxunl.

yuchenle requested review of this revision. Mar 22 2023, 9:29 AM

Herald added subscribers: openmp-commits, jplehr, sstefan1. Mar 22 2023, 9:29 AM

yuchenle changed the edit policy from "Administrators" to "All Users". Mar 22 2023, 9:30 AM

josemonsalve2 added reviewers: randreshg, jhuber6, tianshilei1992. Mar 22 2023, 9:31 AM

Harbormaster completed remote builds in B221050: Diff 507372. Mar 22 2023, 9:33 AM

Adding some initial comments

openmp/runtime/src/kmp.h
2485	It may be better to expose these as env variables that can be configured by the user
2497	This should probably use doxygen comments `///`
2498–2506	Remember to run clang-format on the changes through the git hook
2513	Be consistent with the TDG in upper case // Flags related to a TDG
openmp/runtime/src/kmp_taskdeps.cpp
224	I believe `KMP_ASSERT` instead of `printf` is better. Others, please advise.
293	This should be `if (task) {` with a leading space before the bracket. But `clang-format` can help with these
openmp/runtime/src/kmp_taskdeps.h
98	Remove commented code that's not used.
105	You may be able to use the macros in `kmp_debug.h` for these messages
openmp/runtime/src/kmp_tasking.cpp
3470	Not needed

hyviquel added a subscriber: hyviquel. Mar 22 2023, 9:52 AM

Changes according to some of @josemonsalve2's comments

Harbormaster completed remote builds in B221072: Diff 507427. Mar 22 2023, 10:45 AM

Please upload the patch with full context.

openmp/runtime/src/kmp.h
2489	Use an (inline) function. No need for macros.

rneveu added a subscriber: rneveu. Mar 23 2023, 6:50 AM

Changed the TDG_RECORD macro to an inline function
Applied clang-format where it was missed
Changed the TDG tests so that they now use clangxx instead of clang, and deleted unnecessary "include" in them

Harbormaster completed remote builds in B221531: Diff 508024. Mar 24 2023, 3:37 AM

josemonsalve2 added inline comments. Mar 24 2023, 5:04 PM

openmp/runtime/src/kmp_tasking.cpp
1195–1197	It may be better to use `&&` instead of nested ifs: `if (is_taskgraph && __kmp_track_children_task(taskdata) && taskdata->td_taskgroup)`

Deleted include <new> in tasking.cpp, which was not used
Deleted global variable kmp_max_nesting, which is not used in this commit
Merged nested if as suggested
Simplified initialization of is_taskgraph in __kmp_task_finish

Harbormaster completed remote builds in B221909: Diff 508516. Mar 27 2023, 1:07 AM

tianshilei1992 added reviewers: AndreyChurbanov, tlwilmar, jlpeyton. Mar 28 2023, 4:31 PM

I think it's better to guard the entire related code with a macro.

openmp/runtime/src/kmp_global.cpp
571	add an extra line to the end
openmp/runtime/src/kmp_tasking.cpp
19	unrelated change

In D146642#4228996, @tianshilei1992 wrote:

I think it's better to guard the entire related code with a macro.

What do you mean by this Shilei? To disable this feature?

In D146642#4229048, @josemonsalve2 wrote:

What do you mean by this Shilei? To disable this feature?

Like an opt-in feature.

In D146642#4229067, @tianshilei1992 wrote:

Like an opt-in feature.

Why do you think so? What's the harm of leaving it enabled, as this is just an API function? Are you saying this to save space in the structs?

In D146642#4229069, @josemonsalve2 wrote:

Why do you think so? What's the harm of leaving it enabled, as this is just an API function? Are you saying this to save space in the structs?

It's not just an API function. It contains many runtime checks that can potentially compromise performance for users who don't need the feature. I'd prefer to handle it similarly to OMPT.

In D146642#4229078, @tianshilei1992 wrote:

It's not just an API function. It contains many runtime checks that can potentially compromise performance for users who don't need the feature. I'd prefer to handle it similarly to OMPT.

Got it. It can produce a performance degradation in regular tasks, even when not used. I think that's a fair idea.

@yuchenle have you measured the overhead of this in regular tasks when no recording is used? This is especially important for you guys who use fine-grained tasking

NUM_TDG_LIMIT is now an env variable KMP_MAX_TDGS.
Reverted a few changes unrelated to task record & replay.

Harbormaster completed remote builds in B222455: Diff 509278. Mar 29 2023, 3:13 AM

In D146642#4229088, @josemonsalve2 wrote:

Got it. It can produce a performance degradation in regular tasks, even when not used. I think that's a fair idea.

@yuchenle have you measured the overhead of this in regular tasks when no recording is used? This is especially important for you guys who use fine-grained tasking

Sorry for the delay. I was trying to generate some data. Hopefully, I will write some scripts to automate this process in the future, so that anyone (including me) can test this patch's performance impact with ease.
I ran a heat propagation simulation (https://github.com/yuchenle/tdg-benchs/tree/master/heat) on an exclusive node of Marenostrum 4 with different granularities and numbers of threads. So far, according to the results (https://www.dropbox.com/s/jur1qrftmw2epvk/LLVM%20RR%20Perf.xlsx?dl=0), the performance impact is small to unnoticeable.
That said, @Munesanz (Adrian Munera) and I agreed to guard this patch with a macro to rule out performance concerns.
I will update the patch : )

Guarded the changes with the macro OMPX_TASKGRAPH. This macro can be defined as 1 or TRUE at CMake configuration time via LIBOMP_OMPX_TASKGRAPH. Its default value is FALSE.
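As an illustration of what such a compile-time guard buys, the sketch below shows a feature-specific field and branch that only exist when the macro is enabled. `TaskData` and `finish_task` are hypothetical stand-ins, not code from the patch, and the `#ifndef` default is added only so that the example compiles on its own.

```cpp
#include <cassert>

// OMPX_TASKGRAPH is normally set by the build system; default it to 0 here so
// that the sketch compiles standalone (an assumption of this example).
#ifndef OMPX_TASKGRAPH
#define OMPX_TASKGRAPH 0
#endif

// Hypothetical stand-in for a task descriptor.
struct TaskData {
  int id = 0;
#if OMPX_TASKGRAPH
  bool is_taskgraph = false; // field only exists when the feature is built in
#endif
};

// Returns true if the task took the record-and-replay path.
bool finish_task(const TaskData &task) {
#if OMPX_TASKGRAPH
  if (task.is_taskgraph)
    return true; // TDG-specific handling would go here
#else
  (void)task; // default build: no extra field and no extra branch
#endif
  return false;
}
```

In the default build the guarded field and check are compiled out entirely, which is the point of making the feature opt-in.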

Harbormaster completed remote builds in B223805: Diff 511100. Apr 5 2023, 7:57 AM

In D146642#4240082, @yuchenle wrote:

Sorry for the delay. I was trying to generate some data. Hopefully, I will write some scripts to automate this process in the future, so that anyone (including me) can test this patch's performance impact with ease.
I ran Heat propagation simulation (https://github.com/yuchenle/tdg-benchs/tree/master/heat) on an exclusive node of Marenostrum 4 with different granularities and numbers of threads. So far, according to the results (https://www.dropbox.com/s/jur1qrftmw2epvk/LLVM%20RR%20Perf.xlsx?dl=0) the performance impact is small to unnoticeable.

What is the impact to taskbench in the EPCC OpenMP micro benchmark?

Similarly, what is the impact to SPEC OMP 2012 kdtree?

randreshg added inline comments. Apr 12 2023, 7:42 PM

openmp/runtime/test/tasking/omp_record_replay.cpp
22	I think the test will fail if you don't guarantee that this line will be executed atomically.

yuchenle marked an inline comment as done. Apr 13 2023, 6:32 AM

yuchenle added inline comments.

openmp/runtime/test/tasking/omp_record_replay.cpp
22	randreshg: I think the test will fail if you don't guarantee that this line will be executed atomically.
	yuchenle: You are right, if the TDG is asynchronous. However, in the current implementation, TDG execution is synchronous, using taskgroup. Plus, there is only one node in the TDG, so there is no contention. Only one thread can access "num_exec" at any given time : )
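For reference, if TDG execution were ever asynchronous or the graph had several concurrent nodes, the counter could be made robust with an atomic increment. The following is a minimal sketch of that idea only, not the test from the patch: `count_executions` is a hypothetical helper, and `std::thread` stands in for the OpenMP threads that would run the replayed tasks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Runs num_threads threads that each bump a shared counter per_thread times.
int count_executions(int num_threads, int per_thread) {
  std::atomic<int> num_exec{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t)
    workers.emplace_back([&num_exec, per_thread] {
      for (int i = 0; i < per_thread; ++i)
        num_exec.fetch_add(1, std::memory_order_relaxed); // atomic increment
    });
  for (auto &w : workers)
    w.join(); // joining makes all increments visible to the final load
  return num_exec.load();
}
```

With a plain `int` the concurrent increments would be a data race; with the atomic, the final count always equals the number of increments performed.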

In D146642#4246126, @protze.joachim wrote:

What is the impact to taskbench in the EPCC OpenMP micro benchmark?

Similarly, what is the impact to SPEC OMP 2012 kdtree?

Hi, I had time to generate some numbers with the EPCC microbenchmarks. The results are reported in the same Dropbox Excel spreadsheet, tab "EPCC chart".
"vanilla libomp time" is a build with OMPX_TASKGRAPH=FALSE, and "RR" is a build with OMPX_TASKGRAPH=1, which includes the branches introduced in this patch.
The testbed is the same as in my previous comment, a node of Marenostrum 4.

baodi added a subscriber: baodi. Apr 24 2023, 12:55 PM

tianshilei1992 added inline comments. Apr 24 2023, 12:55 PM

openmp/runtime/test/tasking/omp_record_replay_deps.cpp
2	I think you will need to guard these newly added test cases such that they will only be executed if `LIBOMP_OMPX_TASKGRAPH` is enabled; otherwise it might cause unexpected failures.

The tests added for the record & replay mechanism now require the CMake variable LIBOMP_OMPX_TASKGRAPH=TRUE (or =1) at build time. They are not run by default to avoid failures, as suggested by @tianshilei1992.

Harbormaster completed remote builds in B229375: Diff 518646. May 2 2023, 12:33 AM

Ping @jdoerfert @tianshilei1992

LGTM

This revision is now accepted and ready to land. May 12 2023, 10:19 AM

Closed by commit rG36d4e4c9b5f6: [OpenMP] Implement task record and replay mechanism (authored by Chenle Yu <chenle.yu@bsc.es>, committed by Jose M Monsalve Diaz <jmonsalvediaz@anl.gov>). May 15 2023, 8:04 AM

This revision was automatically updated to reflect the committed changes.

Jose M Monsalve Diaz <jmonsalvediaz@anl.gov> added a commit: rG36d4e4c9b5f6: [OpenMP] Implement task record and replay mechanism.

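Revision Contents

Path	Size
openmp/runtime/src/kmp.h	71 lines
openmp/runtime/src/kmp_global.cpp	10 lines
openmp/runtime/src/kmp_settings.cpp	12 lines
openmp/runtime/src/kmp_taskdeps.h	21 lines
openmp/runtime/src/kmp_taskdeps.cpp	109 lines
openmp/runtime/src/kmp_tasking.cpp	372 lines
openmp/runtime/test/tasking/omp_record_replay.cpp	47 lines
openmp/runtime/test/tasking/omp_record_replay_deps.cpp	62 lines
openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp	75 lines
openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp	49 lines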

Diff 509278

openmp/runtime/src/kmp.h

Show First 20 Lines • Show All 2,475 Lines • ▼ Show 20 Lines
typedef struct {		typedef struct {
kmp_event_type_t type;		kmp_event_type_t type;
kmp_tas_lock_t lock;		kmp_tas_lock_t lock;
union {		union {
kmp_task_t *task;		kmp_task_t *task;
} ed;		} ed;
} kmp_event_t;		} kmp_event_t;

		// Initial number of allocated nodes while recording
		#define INIT_MAPSIZE 50

		typedef struct kmp_taskgraph_flags { /* This needs to be exactly 32 bits */
		unsigned nowait : 1;
		unsigned re_record : 1;
		unsigned reserved : 30;
		} kmp_taskgraph_flags_t;

		/// Represents a TDG node
		typedef struct kmp_node_info {
		kmp_task_t *task; // Pointer to the actual task
		kmp_int32 *successors; // Array of the successors ids
		kmp_int32 nsuccessors; // Number of successors of the node
		std::atomic<kmp_int32>
		npredecessors_counter; // Number of predecessors on the fly
		kmp_int32 npredecessors; // Total number of predecessors
		kmp_int32 successors_size; // Number of allocated successors ids
		kmp_taskdata_t *parent_task; // Parent implicit task
		} kmp_node_info_t;

		/// Represent a TDG's current status
		typedef enum kmp_tdg_status {
		KMP_TDG_NONE = 0,
		KMP_TDG_RECORDING = 1,
		KMP_TDG_READY = 2
		} kmp_tdg_status_t;

		/// Structure that contains a TDG
		typedef struct kmp_tdg_info {
		kmp_int32 tdg_id; // Unique identifier of the TDG
		kmp_taskgraph_flags_t tdg_flags; // Flags related to a TDG
		kmp_int32 map_size; // Number of allocated TDG nodes
		kmp_int32 num_roots; // Number of root tasks in the TDG
		kmp_int32 *root_tasks; // Array of tasks identifiers that are roots
		kmp_node_info_t *record_map; // Array of TDG nodes
		kmp_tdg_status_t tdg_status =
		KMP_TDG_NONE; // Status of the TDG (recording, ready...)
		std::atomic<kmp_int32> num_tasks; // Number of TDG nodes
		kmp_bootstrap_lock_t
		graph_lock; // Protect graph attributes when updated via taskloop_recur
		// Taskloop reduction related
		void *rec_taskred_data; // Data to pass to __kmpc_task_reduction_init or
		// __kmpc_taskred_init
		kmp_int32 rec_num_taskred;
		} kmp_tdg_info_t;

		extern kmp_int32 __kmp_max_tdgs;
		extern kmp_tdg_info_t **__kmp_global_tdgs;
		extern kmp_int32 __kmp_curr_tdg_idx;
		extern kmp_int32 __kmp_successors_size;
		extern std::atomic<kmp_int32> __kmp_tdg_task_id;
		extern kmp_int32 __kmp_num_tdg;

#ifdef BUILD_TIED_TASK_STACK		#ifdef BUILD_TIED_TASK_STACK

/* Tied Task stack definitions */		/* Tied Task stack definitions */
typedef struct kmp_stack_block {		typedef struct kmp_stack_block {
kmp_taskdata_t *sb_block[TASK_STACK_BLOCK_SIZE];		kmp_taskdata_t *sb_block[TASK_STACK_BLOCK_SIZE];
struct kmp_stack_block *sb_next;		struct kmp_stack_block *sb_next;
struct kmp_stack_block *sb_prev;		struct kmp_stack_block *sb_prev;
} kmp_stack_block_t;		} kmp_stack_block_t;
Show All 31 Lines	typedef struct kmp_tasking_flags { /* Total struct must be exactly 32 bits */
// (0) [>= 2 threads]		// (0) [>= 2 threads]
/* If either team_serial or tasking_ser is set, task team may be NULL */		/* If either team_serial or tasking_ser is set, task team may be NULL */
/* Task State Flags: */		/* Task State Flags: */
unsigned started : 1; /* 1==started, 0==not started */		unsigned started : 1; /* 1==started, 0==not started */
unsigned executing : 1; /* 1==executing, 0==not executing */		unsigned executing : 1; /* 1==executing, 0==not executing */
unsigned complete : 1; /* 1==complete, 0==not complete */		unsigned complete : 1; /* 1==complete, 0==not complete */
unsigned freed : 1; /* 1==freed, 0==allocated */		unsigned freed : 1; /* 1==freed, 0==allocated */
unsigned native : 1; /* 1==gcc-compiled task, 0==intel */		unsigned native : 1; /* 1==gcc-compiled task, 0==intel */
unsigned reserved31 : 7; /* reserved for library use */		unsigned onced : 1; /* 1==ran once already, 0==never ran, record & replay purposes */
		unsigned reserved31 : 6; /* reserved for library use */

} kmp_tasking_flags_t;		} kmp_tasking_flags_t;

typedef struct kmp_target_data {		typedef struct kmp_target_data {
void *async_handle; // libomptarget async handle for task completion query		void *async_handle; // libomptarget async handle for task completion query
} kmp_target_data_t;		} kmp_target_data_t;

struct kmp_taskdata { /* aligned during dynamic allocation */		struct kmp_taskdata { /* aligned during dynamic allocation */
Show All 33 Lines
#if defined(KMP_GOMP_COMPAT)		#if defined(KMP_GOMP_COMPAT)
// GOMP sends in a copy function for copy constructors		// GOMP sends in a copy function for copy constructors
void (*td_copy_func)(void *, void *);		void (*td_copy_func)(void *, void *);
#endif		#endif
kmp_event_t td_allow_completion_event;		kmp_event_t td_allow_completion_event;
#if OMPT_SUPPORT		#if OMPT_SUPPORT
ompt_task_info_t ompt_task_info;		ompt_task_info_t ompt_task_info;
#endif		#endif
		bool is_taskgraph = 0; // whether the task is within a TDG
		kmp_tdg_info_t *tdg; // used to associate task with a TDG
kmp_target_data_t td_target_data;		kmp_target_data_t td_target_data;
}; // struct kmp_taskdata		}; // struct kmp_taskdata

// Make sure padding above worked		// Make sure padding above worked
KMP_BUILD_ASSERT(sizeof(kmp_taskdata_t) % sizeof(void *) == 0);		KMP_BUILD_ASSERT(sizeof(kmp_taskdata_t) % sizeof(void *) == 0);

// Data for task team but per thread		// Data for task team but per thread
typedef struct kmp_base_thread_data {		typedef struct kmp_base_thread_data {
▲ Show 20 Lines • Show All 1,524 Lines • ▼ Show 20 Lines	KMP_EXPORT int __kmpc_test_nest_lock(ident_t *loc, kmp_int32 gtid,
void **user_lock);		void **user_lock);

KMP_EXPORT void __kmpc_init_lock_with_hint(ident_t *loc, kmp_int32 gtid,		KMP_EXPORT void __kmpc_init_lock_with_hint(ident_t *loc, kmp_int32 gtid,
void **user_lock, uintptr_t hint);		void **user_lock, uintptr_t hint);
KMP_EXPORT void __kmpc_init_nest_lock_with_hint(ident_t *loc, kmp_int32 gtid,		KMP_EXPORT void __kmpc_init_nest_lock_with_hint(ident_t *loc, kmp_int32 gtid,
void **user_lock,		void **user_lock,
uintptr_t hint);		uintptr_t hint);

		// Taskgraph's Record & Replay mechanism
		// __kmp_tdg_is_recording: check whether a given TDG is recording
		// status: the tdg's current status
		static inline bool __kmp_tdg_is_recording(kmp_tdg_status_t status) {
		return status == KMP_TDG_RECORDING;
		}

		KMP_EXPORT kmp_int32 __kmpc_start_record_task(ident_t *loc, kmp_int32 gtid,
		kmp_int32 input_flags,
		kmp_int32 tdg_id);
		KMP_EXPORT void __kmpc_end_record_task(ident_t *loc, kmp_int32 gtid,
		kmp_int32 input_flags, kmp_int32 tdg_id);
/* Interface to fast scalable reduce methods routines */		/* Interface to fast scalable reduce methods routines */

KMP_EXPORT kmp_int32 __kmpc_reduce_nowait(		KMP_EXPORT kmp_int32 __kmpc_reduce_nowait(
ident_t *loc, kmp_int32 global_tid, kmp_int32 num_vars, size_t reduce_size,		ident_t *loc, kmp_int32 global_tid, kmp_int32 num_vars, size_t reduce_size,
void *reduce_data, void (*reduce_func)(void *lhs_data, void *rhs_data),		void *reduce_data, void (*reduce_func)(void *lhs_data, void *rhs_data),
kmp_critical_name *lck);		kmp_critical_name *lck);
KMP_EXPORT void __kmpc_end_reduce_nowait(ident_t *loc, kmp_int32 global_tid,		KMP_EXPORT void __kmpc_end_reduce_nowait(ident_t *loc, kmp_int32 global_tid,
kmp_critical_name *lck);		kmp_critical_name *lck);
▲ Show 20 Lines • Show All 461 Lines • Show Last 20 Lines

openmp/runtime/src/kmp_global.cpp

	Show First 20 Lines • Show All 551 Lines • ▼ Show 20 Lines
	// OMP Pause Resources			// OMP Pause Resources
	kmp_pause_status_t __kmp_pause_status = kmp_not_paused;			kmp_pause_status_t __kmp_pause_status = kmp_not_paused;

	// Nesting mode			// Nesting mode
	int __kmp_nesting_mode = 0;			int __kmp_nesting_mode = 0;
	int __kmp_nesting_mode_nlevels = 1;			int __kmp_nesting_mode_nlevels = 1;
	int *__kmp_nesting_nth_level;			int *__kmp_nesting_nth_level;

				// TDG record & replay
				kmp_int32 __kmp_max_tdgs = 100;
				kmp_tdg_info_t **__kmp_global_tdgs = NULL;
				kmp_int32
				__kmp_curr_tdg_idx; // Id of the current TDG being recorded or executed
				kmp_int32 __kmp_num_tdg = 0;
				kmp_int32 __kmp_successors_size = 10; // Initial successor list size for
				// recording
				std::atomic<kmp_int32> __kmp_tdg_task_id = 0;
	// end of file //			// end of file //


openmp/runtime/src/kmp_settings.cpp

Show First 20 Lines • Show All 1,232 Lines • ▼ Show 20 Lines	if (__kmp_nested_nth.nth) {
if (__kmp_dflt_team_nth_ub < __kmp_dflt_team_nth) {		if (__kmp_dflt_team_nth_ub < __kmp_dflt_team_nth) {
__kmp_dflt_team_nth_ub = __kmp_dflt_team_nth;		__kmp_dflt_team_nth_ub = __kmp_dflt_team_nth;
}		}
}		}
}		}
K_DIAG(1, ("__kmp_dflt_team_nth == %d\n", __kmp_dflt_team_nth));		K_DIAG(1, ("__kmp_dflt_team_nth == %d\n", __kmp_dflt_team_nth));
} // __kmp_stg_parse_num_threads		} // __kmp_stg_parse_num_threads

		static void __kmp_stg_parse_max_tdgs(char const *name, char const *value,
		void *data) {
		__kmp_stg_parse_int(name, value, 0, INT_MAX, &__kmp_max_tdgs);
		} // __kmp_stg_parse_max_tdgs

		static void __kmp_std_print_max_tdgs(kmp_str_buf_t *buffer, char const *name,
		void *data) {
		__kmp_stg_print_int(buffer, name, __kmp_max_tdgs);
		} // __kmp_std_print_max_tdgs

static void __kmp_stg_parse_num_hidden_helper_threads(char const *name,		static void __kmp_stg_parse_num_hidden_helper_threads(char const *name,
char const *value,		char const *value,
void *data) {		void *data) {
__kmp_stg_parse_int(name, value, 0, 16, &__kmp_hidden_helper_threads_num);		__kmp_stg_parse_int(name, value, 0, 16, &__kmp_hidden_helper_threads_num);
// If the number of hidden helper threads is zero, we disable hidden helper		// If the number of hidden helper threads is zero, we disable hidden helper
// task		// task
if (__kmp_hidden_helper_threads_num == 0) {		if (__kmp_hidden_helper_threads_num == 0) {
__kmp_enable_hidden_helper = FALSE;		__kmp_enable_hidden_helper = FALSE;
▲ Show 20 Lines • Show All 4,338 Lines • ▼ Show 20 Lines	#endif
__kmp_stg_print_omp_cancellation, NULL, 0, 0},		__kmp_stg_print_omp_cancellation, NULL, 0, 0},
{"OMP_ALLOCATOR", __kmp_stg_parse_allocator, __kmp_stg_print_allocator,		{"OMP_ALLOCATOR", __kmp_stg_parse_allocator, __kmp_stg_print_allocator,
NULL, 0, 0},		NULL, 0, 0},
{"LIBOMP_USE_HIDDEN_HELPER_TASK", __kmp_stg_parse_use_hidden_helper,		{"LIBOMP_USE_HIDDEN_HELPER_TASK", __kmp_stg_parse_use_hidden_helper,
__kmp_stg_print_use_hidden_helper, NULL, 0, 0},		__kmp_stg_print_use_hidden_helper, NULL, 0, 0},
{"LIBOMP_NUM_HIDDEN_HELPER_THREADS",		{"LIBOMP_NUM_HIDDEN_HELPER_THREADS",
__kmp_stg_parse_num_hidden_helper_threads,		__kmp_stg_parse_num_hidden_helper_threads,
__kmp_stg_print_num_hidden_helper_threads, NULL, 0, 0},		__kmp_stg_print_num_hidden_helper_threads, NULL, 0, 0},
		{"KMP_MAX_TDGS", __kmp_stg_parse_max_tdgs, __kmp_std_print_max_tdgs, NULL,
		0, 0},

#if OMPT_SUPPORT		#if OMPT_SUPPORT
{"OMP_TOOL", __kmp_stg_parse_omp_tool, __kmp_stg_print_omp_tool, NULL, 0,		{"OMP_TOOL", __kmp_stg_parse_omp_tool, __kmp_stg_print_omp_tool, NULL, 0,
0},		0},
{"OMP_TOOL_LIBRARIES", __kmp_stg_parse_omp_tool_libraries,		{"OMP_TOOL_LIBRARIES", __kmp_stg_parse_omp_tool_libraries,
__kmp_stg_print_omp_tool_libraries, NULL, 0, 0},		__kmp_stg_print_omp_tool_libraries, NULL, 0, 0},
{"OMP_TOOL_VERBOSE_INIT", __kmp_stg_parse_omp_tool_verbose_init,		{"OMP_TOOL_VERBOSE_INIT", __kmp_stg_parse_omp_tool_verbose_init,
__kmp_stg_print_omp_tool_verbose_init, NULL, 0, 0},		__kmp_stg_print_omp_tool_verbose_init, NULL, 0, 0},
▲ Show 20 Lines • Show All 863 Lines • Show Last 20 Lines

openmp/runtime/src/kmp_taskdeps.h

Show First 20 Lines • Show All 86 Lines • ▼ Show 20 Lines
#else		#else
__kmp_thread_free(thread, h);		__kmp_thread_free(thread, h);
#endif		#endif
}		}

extern void __kmpc_give_task(kmp_task_t *ptask, kmp_int32 start);		extern void __kmpc_give_task(kmp_task_t *ptask, kmp_int32 start);

static inline void __kmp_release_deps(kmp_int32 gtid, kmp_taskdata_t *task) {		static inline void __kmp_release_deps(kmp_int32 gtid, kmp_taskdata_t *task) {

		if (task->is_taskgraph && !(__kmp_tdg_is_recording(task->tdg->tdg_status))) {
		kmp_node_info_t *TaskInfo = &(task->tdg->record_map[task->td_task_id]);

		for (int i = 0; i < TaskInfo->nsuccessors; i++) {
		kmp_int32 successorNumber = TaskInfo->successors[i];
		kmp_node_info_t *successor = &(task->tdg->record_map[successorNumber]);
		kmp_int32 npredecessors = KMP_ATOMIC_DEC(&successor->npredecessors_counter) - 1;
		if (successor->task != nullptr && npredecessors == 0) {
		__kmp_omp_task(gtid, successor->task, false);
		}
		}
		return;
		}

kmp_info_t *thread = __kmp_threads[gtid];		kmp_info_t *thread = __kmp_threads[gtid];
kmp_depnode_t *node = task->td_depnode;		kmp_depnode_t *node = task->td_depnode;

// Check mutexinoutset dependencies, release locks		// Check mutexinoutset dependencies, release locks
if (UNLIKELY(node && (node->dn.mtx_num_locks < 0))) {		if (UNLIKELY(node && (node->dn.mtx_num_locks < 0))) {
// negative num_locks means all locks were acquired		// negative num_locks means all locks were acquired
node->dn.mtx_num_locks = -node->dn.mtx_num_locks;		node->dn.mtx_num_locks = -node->dn.mtx_num_locks;
for (int i = node->dn.mtx_num_locks - 1; i >= 0; --i) {		for (int i = node->dn.mtx_num_locks - 1; i >= 0; --i) {
Show All 12 Lines	static inline void __kmp_release_deps(kmp_int32 gtid, kmp_taskdata_t *task) {

if (!node)		if (!node)
return;		return;

KA_TRACE(20, ("__kmp_release_deps: T#%d notifying successors of task %p.\n",		KA_TRACE(20, ("__kmp_release_deps: T#%d notifying successors of task %p.\n",
gtid, task));		gtid, task));

KMP_ACQUIRE_DEPNODE(gtid, node);		KMP_ACQUIRE_DEPNODE(gtid, node);
		if (!task->is_taskgraph ||
		(task->is_taskgraph && !__kmp_tdg_is_recording(task->tdg->tdg_status)))
node->dn.task =		node->dn.task =
NULL; // mark this task as finished, so no new dependencies are generated		NULL; // mark this task as finished, so no new dependencies are generated
KMP_RELEASE_DEPNODE(gtid, node);		KMP_RELEASE_DEPNODE(gtid, node);

kmp_depnode_list_t *next;		kmp_depnode_list_t *next;
kmp_taskdata_t *next_taskdata;		kmp_taskdata_t *next_taskdata;
for (kmp_depnode_list_t *p = node->dn.successors; p; p = next) {		for (kmp_depnode_list_t *p = node->dn.successors; p; p = next) {
kmp_depnode_t *successor = p->node;		kmp_depnode_t *successor = p->node;
#if USE_ITT_BUILD && USE_ITT_NOTIFY		#if USE_ITT_BUILD && USE_ITT_NOTIFY
__itt_sync_releasing(successor);		__itt_sync_releasing(successor);
▲ Show 20 Lines • Show All 56 Lines • Show Last 20 Lines

openmp/runtime/src/kmp_taskdeps.cpp

Show First 20 Lines • Show All 212 Lines • ▼ Show 20 Lines	#endif
new_head->next = list;		new_head->next = list;

return new_head;		return new_head;
}		}

static inline void __kmp_track_dependence(kmp_int32 gtid, kmp_depnode_t *source,		static inline void __kmp_track_dependence(kmp_int32 gtid, kmp_depnode_t *source,
kmp_depnode_t *sink,		kmp_depnode_t *sink,
kmp_task_t *sink_task) {		kmp_task_t *sink_task) {
		kmp_taskdata_t *task_source = KMP_TASK_TO_TASKDATA(source->dn.task);
		kmp_taskdata_t *task_sink = KMP_TASK_TO_TASKDATA(sink_task);
		if (source->dn.task && sink_task) {
		// Not supporting dependency between two tasks that one is within the TDG
		// and the other is not
		KMP_ASSERT(task_source->is_taskgraph == task_sink->is_taskgraph);
		}
		if (task_sink->is_taskgraph &&
		__kmp_tdg_is_recording(task_sink->tdg->tdg_status)) {
		kmp_node_info_t *source_info =
		&task_sink->tdg->record_map[task_source->td_task_id];
		bool exists = false;
		for (int i = 0; i < source_info->nsuccessors; i++) {
		if (source_info->successors[i] == task_sink->td_task_id) {
		exists = true;
		break;
		}
		}
		if (!exists) {
		if (source_info->nsuccessors >= source_info->successors_size) {
		source_info->successors_size = 2 * source_info->successors_size;
		kmp_int32 *old_succ_ids = source_info->successors;
		kmp_int32 *new_succ_ids = (kmp_int32 *)__kmp_allocate(
		source_info->successors_size * sizeof(kmp_int32));
		source_info->successors = new_succ_ids;
		__kmp_free(old_succ_ids);
		}

		source_info->successors[source_info->nsuccessors] = task_sink->td_task_id;
		source_info->nsuccessors++;

		kmp_node_info_t *sink_info =
		&(task_sink->tdg->record_map[task_sink->td_task_id]);
		sink_info->npredecessors++;
		}
		}
#ifdef KMP_SUPPORT_GRAPH_OUTPUT		#ifdef KMP_SUPPORT_GRAPH_OUTPUT
kmp_taskdata_t *task_source = KMP_TASK_TO_TASKDATA(source->dn.task);		kmp_taskdata_t *task_source = KMP_TASK_TO_TASKDATA(source->dn.task);
// do not use sink->dn.task as that is only filled after the dependences		// do not use sink->dn.task as that is only filled after the dependences
// are already processed!		// are already processed!
kmp_taskdata_t *task_sink = KMP_TASK_TO_TASKDATA(sink_task);		kmp_taskdata_t *task_sink = KMP_TASK_TO_TASKDATA(sink_task);

__kmp_printf("%d(%s) -> %d(%s)\n", source->dn.id,		__kmp_printf("%d(%s) -> %d(%s)\n", source->dn.id,
task_source->td_ident->psource, sink->dn.id,		task_source->td_ident->psource, sink->dn.id,
[20 lines not shown]
static inline kmp_int32		static inline kmp_int32
__kmp_depnode_link_successor(kmp_int32 gtid, kmp_info_t *thread,		__kmp_depnode_link_successor(kmp_int32 gtid, kmp_info_t *thread,
kmp_task_t *task, kmp_depnode_t *node,		kmp_task_t *task, kmp_depnode_t *node,
kmp_depnode_list_t *plist) {		kmp_depnode_list_t *plist) {
if (!plist)		if (!plist)
return 0;		return 0;
kmp_int32 npredecessors = 0;		kmp_int32 npredecessors = 0;
// link node as successor of list elements		// link node as successor of list elements
for (kmp_depnode_list_t *p = plist; p; p = p->next) {		for (kmp_depnode_list_t *p = plist; p; p = p->next) {
		josemonsalve2: This should be `if (task) {` with a leading space before the bracket. But `clang-format` can help with these.
kmp_depnode_t *dep = p->node;		kmp_depnode_t *dep = p->node;
		kmp_tdg_status tdg_status = KMP_TDG_NONE;
		if (task) {
		kmp_taskdata_t *td = KMP_TASK_TO_TASKDATA(task);
		if (td->is_taskgraph)
		tdg_status = KMP_TASK_TO_TASKDATA(task)->tdg->tdg_status;
		if (__kmp_tdg_is_recording(tdg_status))
		__kmp_track_dependence(gtid, dep, node, task);
		}
if (dep->dn.task) {		if (dep->dn.task) {
KMP_ACQUIRE_DEPNODE(gtid, dep);		KMP_ACQUIRE_DEPNODE(gtid, dep);
if (dep->dn.task) {		if (dep->dn.task) {
		if (!(__kmp_tdg_is_recording(tdg_status)) && task)
__kmp_track_dependence(gtid, dep, node, task);		__kmp_track_dependence(gtid, dep, node, task);
dep->dn.successors = __kmp_add_node(thread, dep->dn.successors, node);		dep->dn.successors = __kmp_add_node(thread, dep->dn.successors, node);
KA_TRACE(40, ("__kmp_process_deps: T#%d adding dependence from %p to "		KA_TRACE(40, ("__kmp_process_deps: T#%d adding dependence from %p to "
"%p\n",		"%p\n",
gtid, KMP_TASK_TO_TASKDATA(dep->dn.task),		gtid, KMP_TASK_TO_TASKDATA(dep->dn.task),
KMP_TASK_TO_TASKDATA(task)));		KMP_TASK_TO_TASKDATA(task)));
npredecessors++;		npredecessors++;
}		}
KMP_RELEASE_DEPNODE(gtid, dep);		KMP_RELEASE_DEPNODE(gtid, dep);
}		}
}		}
return npredecessors;		return npredecessors;
}		}

static inline kmp_int32 __kmp_depnode_link_successor(kmp_int32 gtid,		static inline kmp_int32 __kmp_depnode_link_successor(kmp_int32 gtid,
kmp_info_t *thread,		kmp_info_t *thread,
kmp_task_t *task,		kmp_task_t *task,
kmp_depnode_t *source,		kmp_depnode_t *source,
kmp_depnode_t *sink) {		kmp_depnode_t *sink) {
if (!sink)		if (!sink)
return 0;		return 0;
kmp_int32 npredecessors = 0;		kmp_int32 npredecessors = 0;
		kmp_tdg_status tdg_status = KMP_TDG_NONE;
		kmp_taskdata_t *td = KMP_TASK_TO_TASKDATA(task);
		if (task) {
		if (td->is_taskgraph)
		tdg_status = KMP_TASK_TO_TASKDATA(task)->tdg->tdg_status;
		if (__kmp_tdg_is_recording(tdg_status) && sink->dn.task)
		__kmp_track_dependence(gtid, sink, source, task);
		}
if (sink->dn.task) {		if (sink->dn.task) {
// synchronously add source to sink' list of successors		// synchronously add source to sink' list of successors
KMP_ACQUIRE_DEPNODE(gtid, sink);		KMP_ACQUIRE_DEPNODE(gtid, sink);
if (sink->dn.task) {		if (sink->dn.task) {
		if (!(__kmp_tdg_is_recording(tdg_status)) && task)
__kmp_track_dependence(gtid, sink, source, task);		__kmp_track_dependence(gtid, sink, source, task);
sink->dn.successors = __kmp_add_node(thread, sink->dn.successors, source);		sink->dn.successors = __kmp_add_node(thread, sink->dn.successors, source);
KA_TRACE(40, ("__kmp_process_deps: T#%d adding dependence from %p to "		KA_TRACE(40, ("__kmp_process_deps: T#%d adding dependence from %p to "
"%p\n",		"%p\n",
gtid, KMP_TASK_TO_TASKDATA(sink->dn.task),		gtid, KMP_TASK_TO_TASKDATA(sink->dn.task),
KMP_TASK_TO_TASKDATA(task)));		KMP_TASK_TO_TASKDATA(task)));
		if (__kmp_tdg_is_recording(tdg_status)) {
		kmp_taskdata_t *tdd = KMP_TASK_TO_TASKDATA(sink->dn.task);
		if (tdd->is_taskgraph) {
		if (tdd->td_flags.onced)
		// decrement npredecessors if sink->dn.task belongs to a taskgraph
		// and
		// 1) the task is reset to its initial state (by kmp_free_task) or
		// 2) the task is complete but not yet reset
		npredecessors--;
		}
		}
npredecessors++;		npredecessors++;
}		}
KMP_RELEASE_DEPNODE(gtid, sink);		KMP_RELEASE_DEPNODE(gtid, sink);
}		}
return npredecessors;		return npredecessors;
}		}

static inline kmp_int32		static inline kmp_int32
[288 lines not shown]	kmp_int32 __kmpc_omp_task_with_deps(ident_t *loc_ref, kmp_int32 gtid,

kmp_taskdata_t *new_taskdata = KMP_TASK_TO_TASKDATA(new_task);		kmp_taskdata_t *new_taskdata = KMP_TASK_TO_TASKDATA(new_task);
KA_TRACE(10, ("__kmpc_omp_task_with_deps(enter): T#%d loc=%p task=%p\n", gtid,		KA_TRACE(10, ("__kmpc_omp_task_with_deps(enter): T#%d loc=%p task=%p\n", gtid,
loc_ref, new_taskdata));		loc_ref, new_taskdata));
__kmp_assert_valid_gtid(gtid);		__kmp_assert_valid_gtid(gtid);
kmp_info_t *thread = __kmp_threads[gtid];		kmp_info_t *thread = __kmp_threads[gtid];
kmp_taskdata_t *current_task = thread->th.th_current_task;		kmp_taskdata_t *current_task = thread->th.th_current_task;

		// record TDG with deps
		if (new_taskdata->is_taskgraph &&
		__kmp_tdg_is_recording(new_taskdata->tdg->tdg_status)) {
		kmp_tdg_info_t *tdg = new_taskdata->tdg;
		// extend record_map if needed
		if (new_taskdata->td_task_id >= tdg->map_size) {
		__kmp_acquire_bootstrap_lock(&tdg->graph_lock);
		if (new_taskdata->td_task_id >= tdg->map_size) {
		kmp_uint old_size = tdg->map_size;
		kmp_uint new_size = old_size * 2;
		kmp_node_info_t *old_record = tdg->record_map;
		kmp_node_info_t *new_record = (kmp_node_info_t *)__kmp_allocate(
		new_size * sizeof(kmp_node_info_t));
		KMP_MEMCPY(new_record, tdg->record_map,
		old_size * sizeof(kmp_node_info_t));
		tdg->record_map = new_record;

		__kmp_free(old_record);

		for (kmp_int i = old_size; i < new_size; i++) {
		kmp_int32 *successorsList = (kmp_int32 *)__kmp_allocate(
		__kmp_successors_size * sizeof(kmp_int32));
		new_record[i].task = nullptr;
		new_record[i].successors = successorsList;
		new_record[i].nsuccessors = 0;
		new_record[i].npredecessors = 0;
		new_record[i].successors_size = __kmp_successors_size;
		KMP_ATOMIC_ST_REL(&new_record[i].npredecessors_counter, 0);
		}
		// update the size at the end, so that we avoid other
		// threads use old_record while map_size is already updated
		tdg->map_size = new_size;
		}
		__kmp_release_bootstrap_lock(&tdg->graph_lock);
		}
		tdg->record_map[new_taskdata->td_task_id].task = new_task;
		tdg->record_map[new_taskdata->td_task_id].parent_task =
		new_taskdata->td_parent;
		KMP_ATOMIC_INC(&tdg->num_tasks);
		}
#if OMPT_SUPPORT		#if OMPT_SUPPORT
if (ompt_enabled.enabled) {		if (ompt_enabled.enabled) {
if (!current_task->ompt_task_info.frame.enter_frame.ptr)		if (!current_task->ompt_task_info.frame.enter_frame.ptr)
current_task->ompt_task_info.frame.enter_frame.ptr =		current_task->ompt_task_info.frame.enter_frame.ptr =
OMPT_GET_FRAME_ADDRESS(0);		OMPT_GET_FRAME_ADDRESS(0);
if (ompt_enabled.ompt_callback_task_create) {		if (ompt_enabled.ompt_callback_task_create) {
ompt_callbacks.ompt_callback(ompt_callback_task_create)(		ompt_callbacks.ompt_callback(ompt_callback_task_create)(
&(current_task->ompt_task_info.task_data),		&(current_task->ompt_task_info.task_data),
[290 lines not shown]

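Editorial note on the edge-recording logic added to `__kmp_track_dependence` in the file above: the following is a minimal standalone sketch of that bookkeeping (skip an edge that is already recorded, otherwise append it, doubling the successor array when it is full). `NodeInfo`, `node_init`, `node_destroy`, and `record_edge` are hypothetical names for illustration only and are not part of the OpenMP runtime. The sketch copies the existing successor ids into the new storage when it grows; the patch as shown allocates the larger array without a visible copy, so treating the copy as intended behavior is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for kmp_node_info_t; only edge bookkeeping is modeled.
struct NodeInfo {
  int32_t *successors;     // ids of tasks that depend on this node
  int32_t nsuccessors;     // number of ids currently stored
  int32_t successors_size; // capacity of the successors array
  int32_t npredecessors;   // number of incoming edges
};

// Gives a node an empty successor list with room for `capacity` ids.
inline void node_init(NodeInfo &n, int32_t capacity) {
  n.successors =
      static_cast<int32_t *>(std::malloc(capacity * sizeof(int32_t)));
  n.nsuccessors = 0;
  n.successors_size = capacity;
  n.npredecessors = 0;
}

// Releases the successor list of a node.
inline void node_destroy(NodeInfo &n) {
  std::free(n.successors);
  n.successors = nullptr;
}

// Records the edge source_id -> sink_id at most once.
// Returns true if a new edge was added, false if it already existed.
inline bool record_edge(NodeInfo *map, int32_t source_id, int32_t sink_id) {
  NodeInfo &src = map[source_id];
  int32_t *end = src.successors + src.nsuccessors;
  // Skip edges that were already recorded.
  if (std::find(src.successors, end, sink_id) != end)
    return false;
  // Grow by doubling, preserving the ids recorded so far.
  if (src.nsuccessors >= src.successors_size) {
    int32_t new_size = 2 * src.successors_size;
    int32_t *grown =
        static_cast<int32_t *>(std::malloc(new_size * sizeof(int32_t)));
    std::copy(src.successors, end, grown);
    std::free(src.successors);
    src.successors = grown;
    src.successors_size = new_size;
  }
  src.successors[src.nsuccessors++] = sink_id;
  map[sink_id].npredecessors++;
  return true;
}
```

In this sketch a caller would invoke `record_edge(map, source_id, sink_id)` once per discovered dependence. Nodes whose `npredecessors` is still zero after recording correspond to the root tasks that the summary says are scheduled first during replay.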
openmp/runtime/src/kmp_tasking.cpp

[10 lines not shown]
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "kmp.h"		#include "kmp.h"
#include "kmp_i18n.h"		#include "kmp_i18n.h"
#include "kmp_itt.h"		#include "kmp_itt.h"
#include "kmp_stats.h"		#include "kmp_stats.h"
#include "kmp_wait_release.h"		#include "kmp_wait_release.h"
#include "kmp_taskdeps.h"		#include "kmp_taskdeps.h"

tianshilei1992: unrelated change
#if OMPT_SUPPORT		#if OMPT_SUPPORT
#include "ompt-specific.h"		#include "ompt-specific.h"
#endif		#endif

#if ENABLE_LIBOMPTARGET		#if ENABLE_LIBOMPTARGET
// Declaration of synchronization function from libomptarget.		// Declaration of synchronization function from libomptarget.
extern "C" void __tgt_target_nowait_query(void **) KMP_WEAK_ATTRIBUTE_INTERNAL;		extern "C" void __tgt_target_nowait_query(void **) KMP_WEAK_ATTRIBUTE_INTERNAL;
#endif		#endif

/* forward declaration */		/* forward declaration */
static void __kmp_enable_tasking(kmp_task_team_t *task_team,		static void __kmp_enable_tasking(kmp_task_team_t *task_team,
kmp_info_t *this_thr);		kmp_info_t *this_thr);
static void __kmp_alloc_task_deque(kmp_info_t *thread,		static void __kmp_alloc_task_deque(kmp_info_t *thread,
kmp_thread_data_t *thread_data);		kmp_thread_data_t *thread_data);
static int __kmp_realloc_task_threads_data(kmp_info_t *thread,		static int __kmp_realloc_task_threads_data(kmp_info_t *thread,
kmp_task_team_t *task_team);		kmp_task_team_t *task_team);
static void __kmp_bottom_half_finish_proxy(kmp_int32 gtid, kmp_task_t *ptask);		static void __kmp_bottom_half_finish_proxy(kmp_int32 gtid, kmp_task_t *ptask);
		int __kmp_taskloop_task(int gtid, void *ptask);

#ifdef BUILD_TIED_TASK_STACK		#ifdef BUILD_TIED_TASK_STACK

// __kmp_trace_task_stack: print the tied tasks from the task stack in order		// __kmp_trace_task_stack: print the tied tasks from the task stack in order
// from top do bottom		// from top do bottom
//		//
// gtid: global thread identifier for thread containing stack		// gtid: global thread identifier for thread containing stack
// thread_data: thread data for task team thread containing stack		// thread_data: thread data for task team thread containing stack
[228 lines not shown]	if (current->td_flags.tasktype == TASK_EXPLICIT ||
KMP_DEBUG_ASSERT(parent != NULL);		KMP_DEBUG_ASSERT(parent != NULL);
}		}
if (parent != current)		if (parent != current)
return false;		return false;
}		}
}		}
// Check mutexinoutset dependencies, acquire locks		// Check mutexinoutset dependencies, acquire locks
kmp_depnode_t *node = tasknew->td_depnode;		kmp_depnode_t *node = tasknew->td_depnode;
if (UNLIKELY(node && (node->dn.mtx_num_locks > 0))) {		if (!tasknew->is_taskgraph && UNLIKELY(node && (node->dn.mtx_num_locks > 0))) {
for (int i = 0; i < node->dn.mtx_num_locks; ++i) {		for (int i = 0; i < node->dn.mtx_num_locks; ++i) {
KMP_DEBUG_ASSERT(node->dn.mtx_locks[i] != NULL);		KMP_DEBUG_ASSERT(node->dn.mtx_locks[i] != NULL);
if (__kmp_test_lock(node->dn.mtx_locks[i], gtid))		if (__kmp_test_lock(node->dn.mtx_locks[i], gtid))
continue;		continue;
// could not get the lock, release previous locks		// could not get the lock, release previous locks
for (int j = i - 1; j >= 0; --j)		for (int j = i - 1; j >= 0; --j)
__kmp_release_lock(node->dn.mtx_locks[j], gtid);		__kmp_release_lock(node->dn.mtx_locks[j], gtid);
return false;		return false;
[590 lines not shown]	KMP_DEBUG_ASSERT(taskdata->td_allocated_child_tasks == 0 ||
taskdata->td_flags.task_serial == 1);		taskdata->td_flags.task_serial == 1);
KMP_DEBUG_ASSERT(taskdata->td_incomplete_child_tasks == 0);		KMP_DEBUG_ASSERT(taskdata->td_incomplete_child_tasks == 0);
kmp_task_t *task = KMP_TASKDATA_TO_TASK(taskdata);		kmp_task_t *task = KMP_TASKDATA_TO_TASK(taskdata);
// Clear data to not be re-used later by mistake.		// Clear data to not be re-used later by mistake.
task->data1.destructors = NULL;		task->data1.destructors = NULL;
task->data2.priority = 0;		task->data2.priority = 0;

taskdata->td_flags.freed = 1;		taskdata->td_flags.freed = 1;
		// do not free tasks in taskgraph
		if (!taskdata->is_taskgraph) {
// deallocate the taskdata and shared variable blocks associated with this task		// deallocate the taskdata and shared variable blocks associated with this task
#if USE_FAST_MEMORY		#if USE_FAST_MEMORY
__kmp_fast_free(thread, taskdata);		__kmp_fast_free(thread, taskdata);
#else /* ! USE_FAST_MEMORY */		#else /* ! USE_FAST_MEMORY */
__kmp_thread_free(thread, taskdata);		__kmp_thread_free(thread, taskdata);
#endif		#endif
		} else {
		taskdata->td_flags.complete = 0;
		taskdata->td_flags.started = 0;
		taskdata->td_flags.freed = 0;
		taskdata->td_flags.executing = 0;
		taskdata->td_flags.task_serial =
		(taskdata->td_parent->td_flags.final ||
		taskdata->td_flags.team_serial || taskdata->td_flags.tasking_ser);

		// taskdata->td_allow_completion_event.pending_events_count = 1;
		KMP_ATOMIC_ST_RLX(&taskdata->td_untied_count, 0);
		KMP_ATOMIC_ST_RLX(&taskdata->td_incomplete_child_tasks, 0);
		// start at one because counts current task and children
		KMP_ATOMIC_ST_RLX(&taskdata->td_allocated_child_tasks, 1);
		}

KA_TRACE(20, ("__kmp_free_task: T#%d freed task %p\n", gtid, taskdata));		KA_TRACE(20, ("__kmp_free_task: T#%d freed task %p\n", gtid, taskdata));
}		}

// __kmp_free_task_and_ancestors: free the current task and ancestors without		// __kmp_free_task_and_ancestors: free the current task and ancestors without
// children		// children
//		//
// gtid: Global thread ID of calling thread		// gtid: Global thread ID of calling thread
// taskdata: task to free		// taskdata: task to free
[70 lines not shown]
// don't track the children, task synchronization will be broken.		// don't track the children, task synchronization will be broken.
static bool __kmp_track_children_task(kmp_taskdata_t *taskdata) {		static bool __kmp_track_children_task(kmp_taskdata_t *taskdata) {
kmp_tasking_flags_t flags = taskdata->td_flags;		kmp_tasking_flags_t flags = taskdata->td_flags;
bool ret = !(flags.team_serial || flags.tasking_ser);		bool ret = !(flags.team_serial || flags.tasking_ser);
ret = ret || flags.proxy == TASK_PROXY ||		ret = ret || flags.proxy == TASK_PROXY ||
flags.detachable == TASK_DETACHABLE || flags.hidden_helper;		flags.detachable == TASK_DETACHABLE || flags.hidden_helper;
ret = ret ||		ret = ret ||
KMP_ATOMIC_LD_ACQ(&taskdata->td_parent->td_incomplete_child_tasks) > 0;		KMP_ATOMIC_LD_ACQ(&taskdata->td_parent->td_incomplete_child_tasks) > 0;
		if (taskdata->td_taskgroup && taskdata->is_taskgraph)
		ret = ret || KMP_ATOMIC_LD_ACQ(&taskdata->td_taskgroup->count) > 0;
return ret;		return ret;
}		}

// __kmp_task_finish: bookkeeping to do when a task finishes execution		// __kmp_task_finish: bookkeeping to do when a task finishes execution
//		//
// gtid: global thread ID for calling thread		// gtid: global thread ID for calling thread
// task: task to be finished		// task: task to be finished
// resumed_task: task to be resumed. (may be NULL if task is serialized)		// resumed_task: task to be resumed. (may be NULL if task is serialized)
//		//
// template<ompt>: effectively ompt_enabled.enabled!=0		// template<ompt>: effectively ompt_enabled.enabled!=0
// the version with ompt=false is inlined, allowing to optimize away all ompt		// the version with ompt=false is inlined, allowing to optimize away all ompt
// code in this case		// code in this case
template <bool ompt>		template <bool ompt>
static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,		static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
kmp_taskdata_t *resumed_task) {		kmp_taskdata_t *resumed_task) {
kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);		kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
kmp_info_t *thread = __kmp_threads[gtid];		kmp_info_t *thread = __kmp_threads[gtid];
kmp_task_team_t *task_team =		kmp_task_team_t *task_team =
thread->th.th_task_team; // might be NULL for serial teams...		thread->th.th_task_team; // might be NULL for serial teams...
		// to avoid seg fault when we need to access taskdata->td_flags after free when using vanilla taskloop
		bool is_taskgraph;
#if KMP_DEBUG		#if KMP_DEBUG
kmp_int32 children = 0;		kmp_int32 children = 0;
#endif		#endif
KA_TRACE(10, ("__kmp_task_finish(enter): T#%d finishing task %p and resuming "		KA_TRACE(10, ("__kmp_task_finish(enter): T#%d finishing task %p and resuming "
"task %p\n",		"task %p\n",
gtid, taskdata, resumed_task));		gtid, taskdata, resumed_task));

KMP_DEBUG_ASSERT(taskdata->td_flags.tasktype == TASK_EXPLICIT);		KMP_DEBUG_ASSERT(taskdata->td_flags.tasktype == TASK_EXPLICIT);

		is_taskgraph = taskdata->is_taskgraph;

// Pop task from stack if tied		// Pop task from stack if tied
#ifdef BUILD_TIED_TASK_STACK		#ifdef BUILD_TIED_TASK_STACK
if (taskdata->td_flags.tiedness == TASK_TIED) {		if (taskdata->td_flags.tiedness == TASK_TIED) {
__kmp_pop_task_stack(gtid, thread, taskdata);		__kmp_pop_task_stack(gtid, thread, taskdata);
}		}
#endif /* BUILD_TIED_TASK_STACK */		#endif /* BUILD_TIED_TASK_STACK */

if (UNLIKELY(taskdata->td_flags.tiedness == TASK_UNTIED)) {		if (UNLIKELY(taskdata->td_flags.tiedness == TASK_UNTIED)) {
[90 lines not shown]	if (taskdata->td_target_data.async_handle != NULL) {
__kmpc_give_task(task, __kmp_tid_from_gtid(gtid));		__kmpc_give_task(task, __kmp_tid_from_gtid(gtid));
if (KMP_HIDDEN_HELPER_THREAD(gtid))		if (KMP_HIDDEN_HELPER_THREAD(gtid))
__kmp_hidden_helper_worker_thread_signal();		__kmp_hidden_helper_worker_thread_signal();
completed = false;		completed = false;
}		}

if (completed) {		if (completed) {
taskdata->td_flags.complete = 1; // mark the task as completed		taskdata->td_flags.complete = 1; // mark the task as completed
		taskdata->td_flags.onced = 1; // mark the task as ran once already

#if OMPT_SUPPORT		#if OMPT_SUPPORT
// This is not a detached task, we are done here		// This is not a detached task, we are done here
if (ompt)		if (ompt)
__ompt_task_finish(task, resumed_task, ompt_task_complete);		__ompt_task_finish(task, resumed_task, ompt_task_complete);
#endif		#endif
// TODO: What would be the balance between the conditions in the function		// TODO: What would be the balance between the conditions in the function
// and an atomic operation?		// and an atomic operation?
if (__kmp_track_children_task(taskdata)) {		if (__kmp_track_children_task(taskdata)) {
__kmp_release_deps(gtid, taskdata);		__kmp_release_deps(gtid, taskdata);
// Predecrement simulated by "- 1" calculation		// Predecrement simulated by "- 1" calculation
#if KMP_DEBUG		#if KMP_DEBUG
children = -1 +		children = -1 +
#endif		#endif
KMP_ATOMIC_DEC(&taskdata->td_parent->td_incomplete_child_tasks);		KMP_ATOMIC_DEC(&taskdata->td_parent->td_incomplete_child_tasks);
KMP_DEBUG_ASSERT(children >= 0);		KMP_DEBUG_ASSERT(children >= 0);
if (taskdata->td_taskgroup)		if (taskdata->td_taskgroup && !taskdata->is_taskgraph)
KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);		KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);
} else if (task_team && (task_team->tt.tt_found_proxy_tasks ||		} else if (task_team && (task_team->tt.tt_found_proxy_tasks ||
task_team->tt.tt_hidden_helper_task_encountered)) {		task_team->tt.tt_hidden_helper_task_encountered)) {
// if we found proxy or hidden helper tasks there could exist a dependency		// if we found proxy or hidden helper tasks there could exist a dependency
// chain with the proxy task as origin		// chain with the proxy task as origin
__kmp_release_deps(gtid, taskdata);		__kmp_release_deps(gtid, taskdata);
}		}
// td_flags.executing must be marked as 0 after __kmp_release_deps has been		// td_flags.executing must be marked as 0 after __kmp_release_deps has been
[22 lines not shown]	#endif
thread->th.th_current_task = resumed_task;		thread->th.th_current_task = resumed_task;
if (completed)		if (completed)
__kmp_free_task_and_ancestors(gtid, taskdata, thread);		__kmp_free_task_and_ancestors(gtid, taskdata, thread);

// TODO: GEH - make sure root team implicit task is initialized properly.		// TODO: GEH - make sure root team implicit task is initialized properly.
// KMP_DEBUG_ASSERT( resumed_task->td_flags.executing == 0 );		// KMP_DEBUG_ASSERT( resumed_task->td_flags.executing == 0 );
resumed_task->td_flags.executing = 1; // resume previous task		resumed_task->td_flags.executing = 1; // resume previous task

		if (is_taskgraph && __kmp_track_children_task(taskdata) &&
		taskdata->td_taskgroup) {
		// TDG: we only release taskgroup barrier here because
		josemonsalve2: It may be better to use `and` instead of a nested if: `if (is_taskgraph && __kmp_track_children_task(taskdata) && taskdata->td_taskgroup)`
		// free_task_and_ancestors will call
		// __kmp_free_task, which resets all task parameters such as
		// taskdata->started, etc. If we release the barrier earlier, these
		// parameters could be read before being reset. This is not an issue for
		// non-TDG implementation because we never reuse a task(data) structure
		KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);
		}

KA_TRACE(		KA_TRACE(
10, ("__kmp_task_finish(exit): T#%d finished task %p, resuming task %p\n",		10, ("__kmp_task_finish(exit): T#%d finished task %p, resuming task %p\n",
gtid, taskdata, resumed_task));		gtid, taskdata, resumed_task));

return;		return;
}		}

template <bool ompt>		template <bool ompt>
[100 lines not shown]	void __kmp_init_implicit_task(ident_t *loc_ref, kmp_info_t *this_thr,
task->td_flags.task_serial = 1;		task->td_flags.task_serial = 1;
task->td_flags.tasking_ser = (__kmp_tasking_mode == tskm_immediate_exec);		task->td_flags.tasking_ser = (__kmp_tasking_mode == tskm_immediate_exec);
task->td_flags.team_serial = (team->t.t_serialized) ? 1 : 0;		task->td_flags.team_serial = (team->t.t_serialized) ? 1 : 0;

task->td_flags.started = 1;		task->td_flags.started = 1;
task->td_flags.executing = 1;		task->td_flags.executing = 1;
task->td_flags.complete = 0;		task->td_flags.complete = 0;
task->td_flags.freed = 0;		task->td_flags.freed = 0;
		task->td_flags.onced = 0;

task->td_depnode = NULL;		task->td_depnode = NULL;
task->td_last_tied = task;		task->td_last_tied = task;
task->td_allow_completion_event.type = KMP_EVENT_UNINITIALIZED;		task->td_allow_completion_event.type = KMP_EVENT_UNINITIALIZED;

if (set_curr_task) { // only do this init first time thread is created		if (set_curr_task) { // only do this init first time thread is created
KMP_ATOMIC_ST_REL(&task->td_incomplete_child_tasks, 0);		KMP_ATOMIC_ST_REL(&task->td_incomplete_child_tasks, 0);
// Not used: don't need to deallocate implicit task		// Not used: don't need to deallocate implicit task
[20 lines not shown]
// parallel region.		// parallel region.
//		//
// thread: thread data structure corresponding to implicit task		// thread: thread data structure corresponding to implicit task
void __kmp_finish_implicit_task(kmp_info_t *thread) {		void __kmp_finish_implicit_task(kmp_info_t *thread) {
kmp_taskdata_t *task = thread->th.th_current_task;		kmp_taskdata_t *task = thread->th.th_current_task;
if (task->td_dephash) {		if (task->td_dephash) {
int children;		int children;
task->td_flags.complete = 1;		task->td_flags.complete = 1;
		task->td_flags.onced = 1;
children = KMP_ATOMIC_LD_ACQ(&task->td_incomplete_child_tasks);		children = KMP_ATOMIC_LD_ACQ(&task->td_incomplete_child_tasks);
kmp_tasking_flags_t flags_old = task->td_flags;		kmp_tasking_flags_t flags_old = task->td_flags;
if (children == 0 && flags_old.complete == 1) {		if (children == 0 && flags_old.complete == 1) {
kmp_tasking_flags_t flags_new = flags_old;		kmp_tasking_flags_t flags_new = flags_old;
flags_new.complete = 0;		flags_new.complete = 0;
if (KMP_COMPARE_AND_STORE_ACQ32(RCAST(kmp_int32 *, &task->td_flags),		if (KMP_COMPARE_AND_STORE_ACQ32(RCAST(kmp_int32 *, &task->td_flags),
*RCAST(kmp_int32 *, &flags_old),		*RCAST(kmp_int32 *, &flags_old),
*RCAST(kmp_int32 *, &flags_new))) {		*RCAST(kmp_int32 *, &flags_new))) {
[213 lines not shown]	#endif
taskdata->td_flags.task_serial =		taskdata->td_flags.task_serial =
(parent_task->td_flags.final || taskdata->td_flags.team_serial ||		(parent_task->td_flags.final || taskdata->td_flags.team_serial ||
taskdata->td_flags.tasking_ser || flags->merged_if0);		taskdata->td_flags.tasking_ser || flags->merged_if0);

taskdata->td_flags.started = 0;		taskdata->td_flags.started = 0;
taskdata->td_flags.executing = 0;		taskdata->td_flags.executing = 0;
taskdata->td_flags.complete = 0;		taskdata->td_flags.complete = 0;
taskdata->td_flags.freed = 0;		taskdata->td_flags.freed = 0;
		taskdata->td_flags.onced = 0;
KMP_ATOMIC_ST_RLX(&taskdata->td_incomplete_child_tasks, 0);		KMP_ATOMIC_ST_RLX(&taskdata->td_incomplete_child_tasks, 0);
// start at one because counts current task and children		// start at one because counts current task and children
KMP_ATOMIC_ST_RLX(&taskdata->td_allocated_child_tasks, 1);		KMP_ATOMIC_ST_RLX(&taskdata->td_allocated_child_tasks, 1);
taskdata->td_taskgroup =		taskdata->td_taskgroup =
parent_task->td_taskgroup; // task inherits taskgroup from the parent task		parent_task->td_taskgroup; // task inherits taskgroup from the parent task
taskdata->td_dephash = NULL;		taskdata->td_dephash = NULL;
taskdata->td_depnode = NULL;		taskdata->td_depnode = NULL;
taskdata->td_target_data.async_handle = NULL;		taskdata->td_target_data.async_handle = NULL;
[19 lines not shown]	if (__kmp_track_children_task(taskdata)) {
}		}
if (flags->hidden_helper) {		if (flags->hidden_helper) {
taskdata->td_flags.task_serial = FALSE;		taskdata->td_flags.task_serial = FALSE;
// Increment the number of hidden helper tasks to be executed		// Increment the number of hidden helper tasks to be executed
KMP_ATOMIC_INC(&__kmp_unexecuted_hidden_helper_tasks);		KMP_ATOMIC_INC(&__kmp_unexecuted_hidden_helper_tasks);
}		}
}		}

		if (__kmp_max_tdgs &&
		__kmp_tdg_is_recording(
		__kmp_global_tdgs[__kmp_curr_tdg_idx]->tdg_status) &&
		(task_entry != (kmp_routine_entry_t)__kmp_taskloop_task)) {
		taskdata->is_taskgraph = 1;
		taskdata->tdg = __kmp_global_tdgs[__kmp_curr_tdg_idx];
		taskdata->td_task_id = KMP_ATOMIC_INC(&__kmp_tdg_task_id);
		}
KA_TRACE(20, ("__kmp_task_alloc(exit): T#%d created task %p parent=%p\n",		KA_TRACE(20, ("__kmp_task_alloc(exit): T#%d created task %p parent=%p\n",
gtid, taskdata, taskdata->td_parent));		gtid, taskdata, taskdata->td_parent));

return task;		return task;
}		}

kmp_task_t *__kmpc_omp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,		kmp_task_t *__kmpc_omp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
kmp_int32 flags, size_t sizeof_kmp_task_t,		kmp_int32 flags, size_t sizeof_kmp_task_t,
[326 lines not shown]
// TASK_CURRENT_NOT_QUEUED (0) if did not suspend and queue current task to		// TASK_CURRENT_NOT_QUEUED (0) if did not suspend and queue current task to
// be resumed later.		// be resumed later.
// TASK_CURRENT_QUEUED (1) if suspended and queued the current task to be		// TASK_CURRENT_QUEUED (1) if suspended and queued the current task to be
// resumed later.		// resumed later.
kmp_int32 __kmp_omp_task(kmp_int32 gtid, kmp_task_t *new_task,		kmp_int32 __kmp_omp_task(kmp_int32 gtid, kmp_task_t *new_task,
bool serialize_immediate) {		bool serialize_immediate) {
kmp_taskdata_t *new_taskdata = KMP_TASK_TO_TASKDATA(new_task);		kmp_taskdata_t *new_taskdata = KMP_TASK_TO_TASKDATA(new_task);

		if (new_taskdata->is_taskgraph &&
		__kmp_tdg_is_recording(new_taskdata->tdg->tdg_status)) {
		kmp_tdg_info_t *tdg = new_taskdata->tdg;
		// extend the record_map if needed
		if (new_taskdata->td_task_id >= new_taskdata->tdg->map_size) {
		__kmp_acquire_bootstrap_lock(&tdg->graph_lock);
		// map_size could have been updated by another thread if recursive
		// taskloop
		if (new_taskdata->td_task_id >= tdg->map_size) {
		kmp_uint old_size = tdg->map_size;
		kmp_uint new_size = old_size * 2;
		kmp_node_info_t *old_record = tdg->record_map;
		kmp_node_info_t *new_record = (kmp_node_info_t *)__kmp_allocate(
		new_size * sizeof(kmp_node_info_t));

		KMP_MEMCPY(new_record, old_record, old_size * sizeof(kmp_node_info_t));
		tdg->record_map = new_record;

		__kmp_free(old_record);

		for (kmp_int i = old_size; i < new_size; i++) {
		kmp_int32 *successorsList = (kmp_int32 *)__kmp_allocate(
		__kmp_successors_size * sizeof(kmp_int32));
		new_record[i].task = nullptr;
		new_record[i].successors = successorsList;
		new_record[i].nsuccessors = 0;
		new_record[i].npredecessors = 0;
		new_record[i].successors_size = __kmp_successors_size;
		KMP_ATOMIC_ST_REL(&new_record[i].npredecessors_counter, 0);
		}
		// update the size at the end, so that we avoid other
		// threads use old_record while map_size is already updated
		tdg->map_size = new_size;
		}
		__kmp_release_bootstrap_lock(&tdg->graph_lock);
		}
		// record a task
		if (tdg->record_map[new_taskdata->td_task_id].task == nullptr) {
		tdg->record_map[new_taskdata->td_task_id].task = new_task;
		tdg->record_map[new_taskdata->td_task_id].parent_task =
		new_taskdata->td_parent;
		KMP_ATOMIC_INC(&tdg->num_tasks);
		}
		}

/* Should we execute the new task or queue it? For now, let's just always try		/* Should we execute the new task or queue it? For now, let's just always try
to queue it. If the queue fills up, then we'll execute it. */		to queue it. If the queue fills up, then we'll execute it. */
if (new_taskdata->td_flags.proxy == TASK_PROXY ||		if (new_taskdata->td_flags.proxy == TASK_PROXY ||
__kmp_push_task(gtid, new_task) == TASK_NOT_PUSHED) // if cannot defer		__kmp_push_task(gtid, new_task) == TASK_NOT_PUSHED) // if cannot defer
{ // Execute this task immediately		{ // Execute this task immediately
kmp_taskdata_t *current_task = __kmp_threads[gtid]->th.th_current_task;		kmp_taskdata_t *current_task = __kmp_threads[gtid]->th.th_current_task;
if (serialize_immediate)		if (serialize_immediate)
new_taskdata->td_flags.task_serial = 1;		new_taskdata->td_flags.task_serial = 1;
[232 lines not shown]	if (ompt) {
if (ompt_enabled.ompt_callback_sync_region) {		if (ompt_enabled.ompt_callback_sync_region) {
ompt_callbacks.ompt_callback(ompt_callback_sync_region)(		ompt_callbacks.ompt_callback(ompt_callback_sync_region)(
ompt_sync_region_taskwait, ompt_scope_end, my_parallel_data,		ompt_sync_region_taskwait, ompt_scope_end, my_parallel_data,
my_task_data, return_address);		my_task_data, return_address);
}		}
taskdata->ompt_task_info.frame.enter_frame = ompt_data_none;		taskdata->ompt_task_info.frame.enter_frame = ompt_data_none;
}		}
#endif // OMPT_SUPPORT && OMPT_OPTIONAL		#endif // OMPT_SUPPORT && OMPT_OPTIONAL

}		}

KA_TRACE(10, ("__kmpc_omp_taskwait(exit): T#%d task %p finished waiting, "		KA_TRACE(10, ("__kmpc_omp_taskwait(exit): T#%d task %p finished waiting, "
"returning TASK_CURRENT_NOT_QUEUED\n",		"returning TASK_CURRENT_NOT_QUEUED\n",
gtid, taskdata));		gtid, taskdata));

return TASK_CURRENT_NOT_QUEUED;		return TASK_CURRENT_NOT_QUEUED;
}		}

#if OMPT_SUPPORT && OMPT_OPTIONAL		#if OMPT_SUPPORT && OMPT_OPTIONAL
[249 lines not shown]
Initialize task reduction for the taskgroup.		Initialize task reduction for the taskgroup.

Note: this entry supposes the optional compiler-generated initializer routine		Note: this entry supposes the optional compiler-generated initializer routine
has single parameter - pointer to object to be initialized. That means		has single parameter - pointer to object to be initialized. That means
the reduction either does not use omp_orig object, or the omp_orig is accessible		the reduction either does not use omp_orig object, or the omp_orig is accessible
without help of the runtime library.		without help of the runtime library.
*/		*/
void *__kmpc_task_reduction_init(int gtid, int num, void *data) {		void *__kmpc_task_reduction_init(int gtid, int num, void *data) {
		if (__kmp_max_tdgs &&
		__kmp_tdg_is_recording(
		__kmp_global_tdgs[__kmp_curr_tdg_idx]->tdg_status)) {
		kmp_tdg_info_t *this_tdg = __kmp_global_tdgs[__kmp_curr_tdg_idx];
		this_tdg->rec_taskred_data =
		__kmp_allocate(sizeof(kmp_task_red_input_t) * num);
		this_tdg->rec_num_taskred = num;
		KMP_MEMCPY(this_tdg->rec_taskred_data, data,
		sizeof(kmp_task_red_input_t) * num);
		}
return __kmp_task_reduction_init(gtid, num, (kmp_task_red_input_t *)data);		return __kmp_task_reduction_init(gtid, num, (kmp_task_red_input_t *)data);
}		}

/*!		/*!
@ingroup TASKING		@ingroup TASKING
@param gtid Global thread ID		@param gtid Global thread ID
@param num Number of data items to reduce		@param num Number of data items to reduce
@param data Array of data for reduction		@param data Array of data for reduction
@return The taskgroup identifier		@return The taskgroup identifier

Initialize task reduction for the taskgroup.		Initialize task reduction for the taskgroup.

Note: this entry supposes the optional compiler-generated initializer routine		Note: this entry supposes the optional compiler-generated initializer routine
has two parameters, pointer to object to be initialized and pointer to omp_orig		has two parameters, pointer to object to be initialized and pointer to omp_orig
*/		*/
void *__kmpc_taskred_init(int gtid, int num, void *data) {		void *__kmpc_taskred_init(int gtid, int num, void *data) {
		if (__kmp_max_tdgs &&
		__kmp_tdg_is_recording(
		__kmp_global_tdgs[__kmp_curr_tdg_idx]->tdg_status)) {
		kmp_tdg_info_t *this_tdg = __kmp_global_tdgs[__kmp_curr_tdg_idx];
		this_tdg->rec_taskred_data =
		__kmp_allocate(sizeof(kmp_task_red_input_t) * num);
		this_tdg->rec_num_taskred = num;
		KMP_MEMCPY(this_tdg->rec_taskred_data, data,
		sizeof(kmp_task_red_input_t) * num);
		}
return __kmp_task_reduction_init(gtid, num, (kmp_taskred_input_t *)data);		return __kmp_task_reduction_init(gtid, num, (kmp_taskred_input_t *)data);
}		}

// Copy task reduction data (except for shared pointers).		// Copy task reduction data (except for shared pointers).
template <typename T>		template <typename T>
void __kmp_task_reduction_init_copy(kmp_info_t *thr, int num, T *data,		void __kmp_task_reduction_init_copy(kmp_info_t *thr, int num, T *data,
kmp_taskgroup_t *tg, void *reduce_data) {		kmp_taskgroup_t *tg, void *reduce_data) {
kmp_taskred_data_t *arr;		kmp_taskred_data_t *arr;
[30 lines not shown]	void *__kmpc_task_reduction_get_th_data(int gtid, void *tskgrp, void *data) {
kmp_taskgroup_t *tg = (kmp_taskgroup_t *)tskgrp;		kmp_taskgroup_t *tg = (kmp_taskgroup_t *)tskgrp;
if (tg == NULL)		if (tg == NULL)
tg = thread->th.th_current_task->td_taskgroup;		tg = thread->th.th_current_task->td_taskgroup;
KMP_ASSERT(tg != NULL);		KMP_ASSERT(tg != NULL);
kmp_taskred_data_t *arr = (kmp_taskred_data_t *)(tg->reduce_data);		kmp_taskred_data_t *arr = (kmp_taskred_data_t *)(tg->reduce_data);
kmp_int32 num = tg->reduce_num_data;		kmp_int32 num = tg->reduce_num_data;
kmp_int32 tid = thread->th.th_info.ds.ds_tid;		kmp_int32 tid = thread->th.th_info.ds.ds_tid;

		if ((thread->th.th_current_task->is_taskgraph) &&
		(!__kmp_tdg_is_recording(
		__kmp_global_tdgs[__kmp_curr_tdg_idx]->tdg_status))) {
		tg = thread->th.th_current_task->td_taskgroup;
		KMP_ASSERT(tg != NULL);
		KMP_ASSERT(tg->reduce_data != NULL);
		arr = (kmp_taskred_data_t *)(tg->reduce_data);
		num = tg->reduce_num_data;
		}

KMP_ASSERT(data != NULL);		KMP_ASSERT(data != NULL);
while (tg != NULL) {		while (tg != NULL) {
for (int i = 0; i < num; ++i) {		for (int i = 0; i < num; ++i) {
if (!arr[i].flags.lazy_priv) {		if (!arr[i].flags.lazy_priv) {
if (data == arr[i].reduce_shar ||		if (data == arr[i].reduce_shar ||
(data >= arr[i].reduce_priv && data < arr[i].reduce_pend))		(data >= arr[i].reduce_priv && data < arr[i].reduce_pend))
return (char *)(arr[i].reduce_priv) + tid * arr[i].reduce_size;		return (char *)(arr[i].reduce_priv) + tid * arr[i].reduce_size;
} else {		} else {
[827 lines not shown]	#endif /* USE_ITT_BUILD */
if (final_spin &&		if (final_spin &&
KMP_ATOMIC_LD_ACQ(&current_task->td_incomplete_child_tasks) == 0) {		KMP_ATOMIC_LD_ACQ(&current_task->td_incomplete_child_tasks) == 0) {
// First, decrement the #unfinished threads, if that has not already been		// First, decrement the #unfinished threads, if that has not already been
// done. This decrement might be to the spin location, and result in the		// done. This decrement might be to the spin location, and result in the
// termination condition being satisfied.		// termination condition being satisfied.
if (!*thread_finished) {		if (!*thread_finished) {
#if KMP_DEBUG		#if KMP_DEBUG
kmp_int32 count = -1 +		kmp_int32 count = -1 +
#endif		#endif
		josemonsalve2 (Done): Not needed
KMP_ATOMIC_DEC(unfinished_threads);		KMP_ATOMIC_DEC(unfinished_threads);
KA_TRACE(20, ("__kmp_execute_tasks_template: T#%d dec "		KA_TRACE(20, ("__kmp_execute_tasks_template: T#%d dec "
"unfinished_threads to %d task_team=%p\n",		"unfinished_threads to %d task_team=%p\n",
gtid, count, task_team));		gtid, count, task_team));
*thread_finished = TRUE;		*thread_finished = TRUE;
}		}

// It is now unsafe to reference thread->th.th_team !!!		// It is now unsafe to reference thread->th.th_team !!!
[865 lines not shown]	/* The finish of the proxy tasks is divided in two pieces:
half. */		half. */
static void __kmp_first_top_half_finish_proxy(kmp_taskdata_t *taskdata) {		static void __kmp_first_top_half_finish_proxy(kmp_taskdata_t *taskdata) {
KMP_DEBUG_ASSERT(taskdata->td_flags.tasktype == TASK_EXPLICIT);		KMP_DEBUG_ASSERT(taskdata->td_flags.tasktype == TASK_EXPLICIT);
KMP_DEBUG_ASSERT(taskdata->td_flags.proxy == TASK_PROXY);		KMP_DEBUG_ASSERT(taskdata->td_flags.proxy == TASK_PROXY);
KMP_DEBUG_ASSERT(taskdata->td_flags.complete == 0);		KMP_DEBUG_ASSERT(taskdata->td_flags.complete == 0);
KMP_DEBUG_ASSERT(taskdata->td_flags.freed == 0);		KMP_DEBUG_ASSERT(taskdata->td_flags.freed == 0);

taskdata->td_flags.complete = 1; // mark the task as completed		taskdata->td_flags.complete = 1; // mark the task as completed
		taskdata->td_flags.onced = 1;

if (taskdata->td_taskgroup)		if (taskdata->td_taskgroup)
KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);		KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);

// Create an imaginary children for this task so the bottom half cannot		// Create an imaginary children for this task so the bottom half cannot
// release the task before we have completed the second top half		// release the task before we have completed the second top half
KMP_ATOMIC_OR(&taskdata->td_incomplete_child_tasks, PROXY_TASK_FLAG);		KMP_ATOMIC_OR(&taskdata->td_incomplete_child_tasks, PROXY_TASK_FLAG);
}		}
[182 lines not shown]	#endif
}		}
}		}

// __kmp_task_dup_alloc: Allocate the taskdata and make a copy of source task		// __kmp_task_dup_alloc: Allocate the taskdata and make a copy of source task
// for taskloop		// for taskloop
//		//
// thread: allocating thread		// thread: allocating thread
// task_src: pointer to source task to be duplicated		// task_src: pointer to source task to be duplicated
		// taskloop_recur: used only when dealing with taskgraph,
		// indicating whether we need to update task->td_task_id
// returns: a pointer to the allocated kmp_task_t structure (task).		// returns: a pointer to the allocated kmp_task_t structure (task).
kmp_task_t *__kmp_task_dup_alloc(kmp_info_t *thread, kmp_task_t *task_src) {		kmp_task_t *__kmp_task_dup_alloc(kmp_info_t *thread, kmp_task_t *task_src,
		int taskloop_recur) {
kmp_task_t *task;		kmp_task_t *task;
kmp_taskdata_t *taskdata;		kmp_taskdata_t *taskdata;
kmp_taskdata_t *taskdata_src = KMP_TASK_TO_TASKDATA(task_src);		kmp_taskdata_t *taskdata_src = KMP_TASK_TO_TASKDATA(task_src);
kmp_taskdata_t *parent_task = taskdata_src->td_parent; // same parent task		kmp_taskdata_t *parent_task = taskdata_src->td_parent; // same parent task
size_t shareds_offset;		size_t shareds_offset;
size_t task_size;		size_t task_size;

KA_TRACE(10, ("__kmp_task_dup_alloc(enter): Th %p, source task %p\n", thread,		KA_TRACE(10, ("__kmp_task_dup_alloc(enter): Th %p, source task %p\n", thread,
[11 lines not shown]
#else		#else
taskdata = (kmp_taskdata_t *)__kmp_thread_malloc(thread, task_size);		taskdata = (kmp_taskdata_t *)__kmp_thread_malloc(thread, task_size);
#endif /* USE_FAST_MEMORY */		#endif /* USE_FAST_MEMORY */
KMP_MEMCPY(taskdata, taskdata_src, task_size);		KMP_MEMCPY(taskdata, taskdata_src, task_size);

task = KMP_TASKDATA_TO_TASK(taskdata);		task = KMP_TASKDATA_TO_TASK(taskdata);

// Initialize new task (only specific fields not affected by memcpy)		// Initialize new task (only specific fields not affected by memcpy)
		if (!taskdata->is_taskgraph || taskloop_recur)
taskdata->td_task_id = KMP_GEN_TASK_ID();		taskdata->td_task_id = KMP_GEN_TASK_ID();
		else if (taskdata->is_taskgraph &&
		__kmp_tdg_is_recording(taskdata_src->tdg->tdg_status))
		taskdata->td_task_id = KMP_ATOMIC_INC(&__kmp_tdg_task_id);
if (task->shareds != NULL) { // need setup shareds pointer		if (task->shareds != NULL) { // need setup shareds pointer
shareds_offset = (char *)task_src->shareds - (char *)taskdata_src;		shareds_offset = (char *)task_src->shareds - (char *)taskdata_src;
task->shareds = &((char *)taskdata)[shareds_offset];		task->shareds = &((char *)taskdata)[shareds_offset];
KMP_DEBUG_ASSERT((((kmp_uintptr_t)task->shareds) & (sizeof(void *) - 1)) ==		KMP_DEBUG_ASSERT((((kmp_uintptr_t)task->shareds) & (sizeof(void *) - 1)) ==
0);		0);
}		}
taskdata->td_alloc_thread = thread;		taskdata->td_alloc_thread = thread;
taskdata->td_parent = parent_task;		taskdata->td_parent = parent_task;
[210 lines not shown]	if (i == num_tasks - 1) {
if ((kmp_uint64)st > ub_glob - upper)		if ((kmp_uint64)st > ub_glob - upper)
lastpriv = 1;		lastpriv = 1;
} else { // negative loop stride		} else { // negative loop stride
KMP_DEBUG_ASSERT(upper + st < *ub);		KMP_DEBUG_ASSERT(upper + st < *ub);
if (upper - ub_glob < (kmp_uint64)(-st))		if (upper - ub_glob < (kmp_uint64)(-st))
lastpriv = 1;		lastpriv = 1;
}		}
}		}
next_task = __kmp_task_dup_alloc(thread, task); // allocate new task		next_task =
		__kmp_task_dup_alloc(thread, task,
		/* taskloop_recur */ 0); // allocate new task
kmp_taskdata_t *next_taskdata = KMP_TASK_TO_TASKDATA(next_task);		kmp_taskdata_t *next_taskdata = KMP_TASK_TO_TASKDATA(next_task);
kmp_taskloop_bounds_t next_task_bounds =		kmp_taskloop_bounds_t next_task_bounds =
kmp_taskloop_bounds_t(next_task, task_bounds);		kmp_taskloop_bounds_t(next_task, task_bounds);

// adjust task-specific bounds		// adjust task-specific bounds
next_task_bounds.set_lb(lower);		next_task_bounds.set_lb(lower);
if (next_taskdata->td_flags.native) {		if (next_taskdata->td_flags.native) {
next_task_bounds.set_ub(upper + (st > 0 ? 1 : -1));		next_task_bounds.set_ub(upper + (st > 0 ? 1 : -1));
[180 lines not shown]	if (last_chunk < 0) {
ext0 = extras;		ext0 = extras;
tc1 = grainsize * n_tsk1;		tc1 = grainsize * n_tsk1;
tc0 = tc - tc1;		tc0 = tc - tc1;
}		}
ub0 = lower + st * (tc0 - 1);		ub0 = lower + st * (tc0 - 1);
lb1 = ub0 + st;		lb1 = ub0 + st;

// create pattern task for 2nd half of the loop		// create pattern task for 2nd half of the loop
next_task = __kmp_task_dup_alloc(thread, task); // duplicate the task		next_task =
		__kmp_task_dup_alloc(thread, task,
		/* taskloop_recur */ 1); // duplicate the task
// adjust lower bound (upper bound is not changed) for the 2nd half		// adjust lower bound (upper bound is not changed) for the 2nd half
*(kmp_uint64 *)((char *)next_task + lower_offset) = lb1;		*(kmp_uint64 *)((char *)next_task + lower_offset) = lb1;
if (ptask_dup != NULL) // construct firstprivates, etc.		if (ptask_dup != NULL) // construct firstprivates, etc.
ptask_dup(next_task, task, 0);		ptask_dup(next_task, task, 0);
*ub = ub0; // adjust upper bound for the 1st half		*ub = ub0; // adjust upper bound for the 1st half

// create auxiliary task for 2nd half of the loop		// create auxiliary task for 2nd half of the loop
// make sure new task has same parent task as the pattern task		// make sure new task has same parent task as the pattern task
[16 lines not shown]	#endif
p->extras = ext1;		p->extras = ext1;
p->last_chunk = last_chunk1;		p->last_chunk = last_chunk1;
p->tc = tc1;		p->tc = tc1;
p->num_t_min = num_t_min;		p->num_t_min = num_t_min;
#if OMPT_SUPPORT		#if OMPT_SUPPORT
p->codeptr_ra = codeptr_ra;		p->codeptr_ra = codeptr_ra;
#endif		#endif

		kmp_taskdata_t *new_task_data = KMP_TASK_TO_TASKDATA(new_task);
		new_task_data->tdg = taskdata->tdg;
		new_task_data->is_taskgraph = 0;

#if OMPT_SUPPORT		#if OMPT_SUPPORT
// schedule new task with correct return address for OMPT events		// schedule new task with correct return address for OMPT events
__kmp_omp_taskloop_task(NULL, gtid, new_task, codeptr_ra);		__kmp_omp_taskloop_task(NULL, gtid, new_task, codeptr_ra);
#else		#else
__kmp_omp_task(gtid, new_task, true); // schedule new task		__kmp_omp_task(gtid, new_task, true); // schedule new task
#endif		#endif

// execute the 1st half of current subrange		// execute the 1st half of current subrange
[23 lines not shown]	static void __kmp_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
KMP_DEBUG_ASSERT(task != NULL);		KMP_DEBUG_ASSERT(task != NULL);
if (nogroup == 0) {		if (nogroup == 0) {
#if OMPT_SUPPORT && OMPT_OPTIONAL		#if OMPT_SUPPORT && OMPT_OPTIONAL
OMPT_STORE_RETURN_ADDRESS(gtid);		OMPT_STORE_RETURN_ADDRESS(gtid);
#endif		#endif
__kmpc_taskgroup(loc, gtid);		__kmpc_taskgroup(loc, gtid);
}		}

		KMP_ATOMIC_DEC(&__kmp_tdg_task_id);
// =========================================================================		// =========================================================================
// calculate loop parameters		// calculate loop parameters
kmp_taskloop_bounds_t task_bounds(task, lb, ub);		kmp_taskloop_bounds_t task_bounds(task, lb, ub);
kmp_uint64 tc;		kmp_uint64 tc;
// compiler provides global bounds here		// compiler provides global bounds here
kmp_uint64 lower = task_bounds.get_lb();		kmp_uint64 lower = task_bounds.get_lb();
kmp_uint64 upper = task_bounds.get_ub();		kmp_uint64 upper = task_bounds.get_ub();
kmp_uint64 ub_glob = upper; // global upper used to calc lastprivate flag		kmp_uint64 ub_glob = upper; // global upper used to calc lastprivate flag
[231 lines not shown]	bool __kmpc_omp_has_task_team(kmp_int32 gtid) {
kmp_info_t *thread = __kmp_thread_from_gtid(gtid);		kmp_info_t *thread = __kmp_thread_from_gtid(gtid);
kmp_taskdata_t *taskdata = thread->th.th_current_task;		kmp_taskdata_t *taskdata = thread->th.th_current_task;

if (!taskdata)		if (!taskdata)
return FALSE;		return FALSE;

return taskdata->td_task_team != NULL;		return taskdata->td_task_team != NULL;
}		}

		// __kmp_find_tdg: identify a TDG through its ID
		// tdg_id: ID of the TDG
		// returns: If a TDG corresponding to this ID is found and is not in
		// its initial state, return the pointer to it, otherwise nullptr
		kmp_tdg_info_t *__kmp_find_tdg(kmp_int32 tdg_id) {
		kmp_tdg_info_t *res = nullptr;
		if (__kmp_max_tdgs == 0)
		return res;

		if (__kmp_global_tdgs == NULL)
		__kmp_global_tdgs = (kmp_tdg_info_t **)__kmp_allocate(
		sizeof(kmp_tdg_info_t *) * __kmp_max_tdgs);

		if ((__kmp_global_tdgs[tdg_id]) &&
		(__kmp_global_tdgs[tdg_id]->tdg_status != KMP_TDG_NONE))
		res = __kmp_global_tdgs[tdg_id];
		return res;
		}

		// __kmp_exec_tdg: launch the execution of a previously
		// recorded TDG
		// gtid: Global Thread ID
		// tdg: Pointer to the TDG
		void __kmp_exec_tdg(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
		KMP_DEBUG_ASSERT(tdg->tdg_status == KMP_TDG_READY);
		KA_TRACE(10, ("__kmp_exec_tdg(enter): T#%d tdg_id=%d num_roots=%d\n", gtid,
		tdg->tdg_id, tdg->num_roots));
		kmp_node_info_t *this_record_map = tdg->record_map;
		kmp_int32 *this_root_tasks = tdg->root_tasks;
		kmp_int32 this_num_roots = tdg->num_roots;
		kmp_int32 this_num_tasks = KMP_ATOMIC_LD_RLX(&tdg->num_tasks);

		kmp_info_t *thread = __kmp_threads[gtid];
		kmp_taskdata_t *parent_task = thread->th.th_current_task;

		if (tdg->rec_taskred_data) {
		__kmpc_taskred_init(gtid, tdg->rec_num_taskred, tdg->rec_taskred_data);
		}

		for (kmp_int32 j = 0; j < this_num_tasks; j++) {
		kmp_taskdata_t *td = KMP_TASK_TO_TASKDATA(this_record_map[j].task);

		td->td_parent = parent_task;
		this_record_map[j].parent_task = parent_task;

		kmp_taskgroup_t *parent_taskgroup =
		this_record_map[j].parent_task->td_taskgroup;

		KMP_ATOMIC_ST_RLX(&this_record_map[j].npredecessors_counter,
		this_record_map[j].npredecessors);
		KMP_ATOMIC_INC(&this_record_map[j].parent_task->td_incomplete_child_tasks);

		if (parent_taskgroup) {
		KMP_ATOMIC_INC(&parent_taskgroup->count);
		// The taskgroup is different so we must update it
		td->td_taskgroup = parent_taskgroup;
		} else if (td->td_taskgroup != nullptr) {
		// If the parent doesn't have a taskgroup, remove it from the task
		td->td_taskgroup = nullptr;
		}
		if (this_record_map[j].parent_task->td_flags.tasktype == TASK_EXPLICIT)
		KMP_ATOMIC_INC(&this_record_map[j].parent_task->td_allocated_child_tasks);
		}

		for (kmp_int32 j = 0; j < this_num_roots; ++j) {
		__kmp_omp_task(gtid, this_record_map[this_root_tasks[j]].task, true);
		}
		KA_TRACE(10, ("__kmp_exec_tdg(exit): T#%d tdg_id=%d num_roots=%d\n", gtid,
		tdg->tdg_id, tdg->num_roots));
		}
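
Editorial sketch (not part of the patch): the replay step above resets each node's predecessor counter and then schedules only the root tasks, with no dependency hash table involved. The serial model below illustrates that idea under simplifying assumptions. `Node`, `counter`, and `replay` are hypothetical names; the real runtime uses `kmp_node_info_t`, atomic counters, and `__kmp_omp_task` across several threads.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical, simplified stand-in for one recorded TDG node.
struct Node {
  std::vector<int> successors; // indices of nodes that depend on this one
  int npredecessors = 0;       // fixed once recording has finished
  int counter = 0;             // working copy, reset before every replay
};

// Serially replays a recorded graph and returns the execution order.
// Assumption: `roots` lists exactly the nodes with npredecessors == 0.
std::vector<int> replay(std::vector<Node> &graph,
                        const std::vector<int> &roots) {
  // Reset the working counters, as the replay path does before scheduling.
  for (Node &n : graph)
    n.counter = n.npredecessors;

  std::queue<int> ready;
  for (int r : roots)
    ready.push(r);

  std::vector<int> order;
  while (!ready.empty()) {
    int id = ready.front();
    ready.pop();
    order.push_back(id); // "execute" the task
    // Releasing a successor only needs a counter decrement.
    for (int s : graph[id].successors)
      if (--graph[s].counter == 0)
        ready.push(s);
  }
  return order;
}
```

For the three-task graph used in omp_record_replay_deps.cpp (two independent writers followed by one reader), calling `replay` repeatedly yields the same order each time, because the counters are restored from `npredecessors` at the start of every call.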

		// __kmp_start_record: set up a TDG structure and turn the
		// recording flag to true
		// gtid: Global Thread ID of the encountering thread
		// input_flags: Flags associated with the TDG
		// tdg_id: ID of the TDG to record
		static inline void __kmp_start_record(kmp_int32 gtid,
		kmp_taskgraph_flags_t *flags,
		kmp_int32 tdg_id) {
		kmp_tdg_info_t *tdg =
		(kmp_tdg_info_t *)__kmp_allocate(sizeof(kmp_tdg_info_t));
		__kmp_global_tdgs[__kmp_curr_tdg_idx] = tdg;
		// Initializing the TDG structure
		tdg->tdg_id = tdg_id;
		tdg->map_size = INIT_MAPSIZE;
		tdg->num_roots = -1;
		tdg->root_tasks = nullptr;
		tdg->tdg_status = KMP_TDG_RECORDING;
		tdg->rec_num_taskred = 0;
		tdg->rec_taskred_data = nullptr;
		KMP_ATOMIC_ST_RLX(&tdg->num_tasks, 0);

		// Initializing the list of nodes in this TDG
		kmp_node_info_t *this_record_map =
		(kmp_node_info_t *)__kmp_allocate(INIT_MAPSIZE * sizeof(kmp_node_info_t));
		for (kmp_int32 i = 0; i < INIT_MAPSIZE; i++) {
		kmp_int32 *successorsList =
		(kmp_int32 *)__kmp_allocate(__kmp_successors_size * sizeof(kmp_int32));
		this_record_map[i].task = nullptr;
		this_record_map[i].successors = successorsList;
		this_record_map[i].nsuccessors = 0;
		this_record_map[i].npredecessors = 0;
		this_record_map[i].successors_size = __kmp_successors_size;
		KMP_ATOMIC_ST_RLX(&this_record_map[i].npredecessors_counter, 0);
		}

		__kmp_global_tdgs[__kmp_curr_tdg_idx]->record_map = this_record_map;
		}

		// __kmpc_start_record_task: Wrapper around __kmp_start_record to mark
		// the beginning of the record process of a task region
		// loc_ref: Location of TDG, not used yet
		// gtid: Global Thread ID of the encountering thread
		// input_flags: Flags associated with the TDG
		// tdg_id: ID of the TDG to record, for now, incremental integer
		// returns: 1 if we record, otherwise, 0
		kmp_int32 __kmpc_start_record_task(ident_t *loc_ref, kmp_int32 gtid,
		kmp_int32 input_flags, kmp_int32 tdg_id) {

		kmp_int32 res;
		kmp_taskgraph_flags_t *flags = (kmp_taskgraph_flags_t *)&input_flags;
		KA_TRACE(10,
		("__kmpc_start_record_task(enter): T#%d loc=%p flags=%d tdg_id=%d\n",
		gtid, loc_ref, input_flags, tdg_id));

		if (__kmp_max_tdgs == 0) {
		KA_TRACE(
		10,
		("__kmpc_start_record_task(abandon): T#%d loc=%p flags=%d tdg_id = %d, "
		"__kmp_max_tdgs = 0\n",
		gtid, loc_ref, input_flags, tdg_id));
		return 1;
		}

		__kmpc_taskgroup(loc_ref, gtid);
		if (kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id)) {
		// TODO: use re_record flag
		__kmp_exec_tdg(gtid, tdg);
		res = 0;
		} else {
		__kmp_curr_tdg_idx = tdg_id;
		KMP_DEBUG_ASSERT(__kmp_curr_tdg_idx < __kmp_max_tdgs);
		__kmp_start_record(gtid, flags, tdg_id);
		__kmp_num_tdg++;
		res = 1;
		}
		KA_TRACE(10, ("__kmpc_start_record_task(exit): T#%d TDG %d starts to %s\n",
		gtid, tdg_id, res ? "record" : "execute"));
		return res;
		}
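
Editorial sketch (not part of the patch): the return value of `__kmpc_start_record_task` tells the caller whether to run the region (1, recording) or skip it (0, the stored TDG was replayed). The toy model below mimics that caller-side protocol without libomp. `Recorder`, `add_task`, and `run_region` are hypothetical names introduced only for illustration, and tasks here run serially at record time.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime's TDG table; not the libomp API.
class Recorder {
public:
  // Returns true if the caller must execute (and thereby record) the region,
  // false if a previously recorded graph was replayed instead.
  bool start(int tdg_id) {
    auto it = graphs_.find(tdg_id);
    if (it != graphs_.end()) {
      for (auto &task : it->second)
        task(); // replay: no task creation, no dependency resolution
      return false;
    }
    graphs_[tdg_id]; // create an empty graph and start recording into it
    current_ = tdg_id;
    return true;
  }
  // Runs the task once and stores it for later replays.
  void add_task(std::function<void()> task) {
    task();
    graphs_[current_].push_back(std::move(task));
  }
  void end(int /*tdg_id*/) { current_ = -1; }

private:
  std::map<int, std::vector<std::function<void()>>> graphs_;
  int current_ = -1;
};

struct Counts {
  int recorded; // how many times the region body was entered
  int executed; // how many times the task body ran
};

// Mirrors the loop shape of the record/replay tests in this revision.
Counts run_region(int iterations) {
  Recorder rec;
  Counts c{0, 0};
  for (int i = 0; i < iterations; ++i) {
    if (rec.start(/*tdg_id=*/0)) {
      ++c.recorded;
      rec.add_task([&c] { ++c.executed; });
    }
    rec.end(/*tdg_id=*/0);
  }
  return c;
}
```

With 100 iterations, the region body is entered once and the task body runs 100 times, which is the same invariant that omp_record_replay.cpp asserts with `num_tasks == 1` and `num_exec == NT`.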

		// __kmp_end_record: set up a TDG after recording it
		// gtid: Global thread ID
		// tdg: Pointer to the TDG
		void __kmp_end_record(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
		// Store roots
		kmp_node_info_t *this_record_map = tdg->record_map;
		kmp_int32 this_num_tasks = KMP_ATOMIC_LD_RLX(&tdg->num_tasks);
		kmp_int32 *this_root_tasks =
		(kmp_int32 *)__kmp_allocate(this_num_tasks * sizeof(kmp_int32));
		kmp_int32 this_map_size = tdg->map_size;
		kmp_int32 this_num_roots = 0;
		kmp_info_t *thread = __kmp_threads[gtid];

		for (kmp_int32 i = 0; i < this_num_tasks; i++) {
		if (this_record_map[i].npredecessors == 0) {
		this_root_tasks[this_num_roots++] = i;
		}
		}

		// Update with roots info and mapsize
		tdg->map_size = this_map_size;
		tdg->num_roots = this_num_roots;
		tdg->root_tasks = this_root_tasks;
		KMP_DEBUG_ASSERT(tdg->tdg_status == KMP_TDG_RECORDING);
		tdg->tdg_status = KMP_TDG_READY;

		if (thread->th.th_current_task->td_dephash) {
		__kmp_dephash_free(thread, thread->th.th_current_task->td_dephash);
		thread->th.th_current_task->td_dephash = NULL;
		}

		// Reset predecessor counter
		for (kmp_int32 i = 0; i < this_num_tasks; i++) {
		KMP_ATOMIC_ST_RLX(&this_record_map[i].npredecessors_counter,
		this_record_map[i].npredecessors);
		}
		KMP_ATOMIC_ST_RLX(&__kmp_tdg_task_id, 0);
		}
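
Editorial sketch (not part of the patch): at the end of recording, the roots are simply the nodes whose predecessor count is zero. The snippet below shows that selection on plain vectors. Deriving the counts from successor lists is an assumption made for the example; in the patch, `npredecessors` is filled in during dependency resolution. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Derives predecessor counts from per-node successor lists (illustration only).
std::vector<int>
count_predecessors(const std::vector<std::vector<int>> &successors) {
  std::vector<int> npred(successors.size(), 0);
  for (const auto &succ : successors)
    for (int s : succ)
      ++npred[s];
  return npred;
}

// Collects the indices of nodes without input dependencies.
std::vector<int> find_roots(const std::vector<int> &npred) {
  std::vector<int> roots;
  for (std::size_t i = 0; i < npred.size(); ++i)
    if (npred[i] == 0)
      roots.push_back(static_cast<int>(i));
  return roots;
}
```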

		// __kmpc_end_record_task: wrapper around __kmp_end_record to mark
		// the end of recording phase
		//
		// loc_ref: Source location information
		// gtid: Global thread ID
		// input_flags: Flags attached to the graph
		// tdg_id: ID of the TDG just finished recording
		void __kmpc_end_record_task(ident_t *loc_ref, kmp_int32 gtid,
		kmp_int32 input_flags, kmp_int32 tdg_id) {
		kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id);

		KA_TRACE(10, ("__kmpc_end_record_task(enter): T#%d loc=%p finishes recording"
		" tdg=%d with flags=%d\n",
		gtid, loc_ref, tdg_id, input_flags));
		if (__kmp_max_tdgs) {
		// TODO: use input_flags->nowait
		__kmpc_end_taskgroup(loc_ref, gtid);
		if (__kmp_tdg_is_recording(tdg->tdg_status))
		__kmp_end_record(gtid, tdg);
		}
		KA_TRACE(10, ("__kmpc_end_record_task(exit): T#%d loc=%p finished recording"
		" tdg=%d, its status is now READY\n",
		gtid, loc_ref, tdg_id));
		}

openmp/runtime/test/tasking/omp_record_replay.cpp

This file was added.

				// RUN: %libomp-cxx-compile-and-run
				#include <iostream>
				#include <cassert>
				#define NT 100

				// Compiler-generated code (emulation)
				typedef struct ident {
				void* dummy;
				} ident_t;


				#ifdef __cplusplus
				extern "C" {
				int __kmpc_global_thread_num(ident_t *);
				int __kmpc_start_record_task(ident_t *, int, int, int);
				void __kmpc_end_record_task(ident_t *, int, int , int);
				}
				#endif

				void func(int *num_exec) {
				(*num_exec)++;
				}
				randreshg (Done): I think the test will fail if you don't guarantee that this line will be executed atomically.
				yuchenle (Author, Done): > I think the test will fail if you don't guarantee that this line will be executed atomically.
				You are right, if the TDG is asynchronous. However, in the current implementation, TDG execution is synchronous, using taskgroup. Plus, there is only one node in the TDG, so there is no contention. Only one thread can access "num_exec" at any given time. :)

				int main() {
				int num_exec = 0;
				int num_tasks = 0;
				int x=0;
				#pragma omp parallel
				#pragma omp single
				for (int iter = 0; iter < NT; ++iter) {
				int gtid = __kmpc_global_thread_num(nullptr);
				int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
				if (res) {
				num_tasks++;
				#pragma omp task
				func(&num_exec);
				}
				__kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
				}

				assert(num_tasks==1);
				assert(num_exec==NT);

				std::cout << "Passed" << std::endl;
				return 0;
				}
				// CHECK: Passed

openmp/runtime/test/tasking/omp_record_replay_deps.cpp

This file was added.

				// RUN: %libomp-cxx-compile-and-run
				#include <iostream>
				tianshilei1992 (Not Done): I think you will need to guard these newly added test cases such that they will only be executed if `LIBOMP_OMPX_TASKGRAPH` is enabled; otherwise it might cause unexpected failures.
				#include <cassert>
				#define NT 100
				#define MULTIPLIER 100
				#define DECREMENT 5

				int val;
				// Compiler-generated code (emulation)
				typedef struct ident {
				void* dummy;
				} ident_t;


				#ifdef __cplusplus
				extern "C" {
				int __kmpc_global_thread_num(ident_t *);
				int __kmpc_start_record_task(ident_t *, int, int, int);
				void __kmpc_end_record_task(ident_t *, int, int, int);
				}
				#endif

				void sub() {
				#pragma omp atomic
				val -= DECREMENT;
				}

				void add() {
				#pragma omp atomic
				val += DECREMENT;
				}

				void mult() {
				// no atomicity needed, can only be executed by 1 thread
				// and no concurrency with other tasks possible
				val *= MULTIPLIER;
				}

				int main() {
				val = 0;
				int x, y;
				#pragma omp parallel
				#pragma omp single
				for (int iter = 0; iter < NT; ++iter) {
				int gtid = __kmpc_global_thread_num(nullptr);
				int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
				if (res) {
				#pragma omp task depend(out:y)
				add();
				#pragma omp task depend(out:x)
				sub();
				#pragma omp task depend(in:x,y)
				mult();
				}
				__kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
				}
				assert(val==0);

				std::cout << "Passed" << std::endl;
				return 0;
				}
				// CHECK: Passed

openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp

This file was added.

				// RUN: %libomp-cxx-compile-and-run
				#include <iostream>
				#include <cassert>
				#define NT 20
				#define MULTIPLIER 100
				#define DECREMENT 5

				// Compiler-generated code (emulation)
				typedef struct ident {
				void* dummy;
				} ident_t;

				int val;
				#ifdef __cplusplus
				extern "C" {
				int __kmpc_global_thread_num(ident_t *);
				int __kmpc_start_record_task(ident_t *, int, int, int);
				void __kmpc_end_record_task(ident_t *, int, int , int);
				}
				#endif

				void sub() {
				#pragma omp atomic
				val -= DECREMENT;
				}

				void add() {
				#pragma omp atomic
				val += DECREMENT;
				}

				void mult() {
				// no atomicity needed, can only be executed by 1 thread
				// and no concurrency with other tasks possible
				val *= MULTIPLIER;
				}

				int main() {
				int num_tasks = 0;
				int x, y;
				#pragma omp parallel
				#pragma omp single
				for (int iter = 0; iter < NT; ++iter) {
				int gtid = __kmpc_global_thread_num(nullptr);
				int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
				if (res) {
				num_tasks++;
				#pragma omp task depend(out:y)
				add();
				#pragma omp task depend(out:x)
				sub();
				#pragma omp task depend(in:x,y)
				mult();
				}
				__kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
				res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 1);
				if (res) {
				num_tasks++;
				#pragma omp task depend(out:y)
				add();
				#pragma omp task depend(out:x)
				sub();
				#pragma omp task depend(in:x,y)
				mult();
				}
				__kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 1);
				}

				assert(num_tasks==2);
				assert(val==0);

				std::cout << "Passed" << std::endl;
				return 0;
				}
				// CHECK: Passed

openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp

This file was added.

				// RUN: %libomp-cxx-compile-and-run
				#include <iostream>
				#include <cassert>

				#define NT 20
				#define N 128*128

				typedef struct ident {
				void* dummy;
				} ident_t;


				#ifdef __cplusplus
				extern "C" {
				int __kmpc_global_thread_num(ident_t *);
				int __kmpc_start_record_task(ident_t *, int, int, int);
				void __kmpc_end_record_task(ident_t *, int, int , int);
				}
				#endif

				int main() {
				int num_tasks = 0;

				int array[N];
				for (int i = 0; i < N; ++i)
				array[i] = 1;

				long sum = 0;
				#pragma omp parallel
				#pragma omp single
				for (int iter = 0; iter < NT; ++iter) {
				int gtid = __kmpc_global_thread_num(nullptr);
				int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
				if (res) {
				num_tasks++;
				#pragma omp taskloop reduction(+:sum) num_tasks(4096)
				for (int i = 0; i < N; ++i) {
				sum += array[i];
				}
				}
				__kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
				}
				assert(sum==N*NT);
				assert(num_tasks==1);

				std::cout << "Passed" << std::endl;
				return 0;
				}
				// CHECK: Passed