This is an archive of the discontinued LLVM Phabricator instance.

Differential D103648

[OpenMP] libomp: fix dynamic loop dispatcher
Closed, Public

Authored by AndreyChurbanov on Jun 3 2021, 2:23 PM.

Details

Reviewers

jlpeyton
hbae
tlwilmar
Nawrin
jdoerfert

Commits

rG5dd4d0d46fb8: [OpenMP] libomp: fix dynamic loop dispatcher

Summary

Restructured the dynamic loop dispatcher code.
Fixed the handling of dispatch buffers for the nonmonotonic dynamic (static_steal) schedule:

eliminated the possibility of stealing iterations of the wrong loop when the victim thread has switched its buffer to work on another loop;
fixed a race when the victim thread has switched its buffer to work in a nested parallel region;
eliminated the "static" property of the schedule, so that a single thread can now execute the whole loop.
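
The buffer hand-off described in the summary can be pictured as a small state machine on a per-buffer flag. The following is a minimal sketch under stated assumptions, not the libomp implementation: the state names match the `steal_flag` values in the diff, while `LoopBuffer`, `owner_init` and `thief_take_unused` are hypothetical names, and the logic a thief uses to take iterations from a READY buffer is left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Buffer states, named after the steal_flag values introduced by the patch.
enum : std::uint32_t { UNUSED = 0, CLAIMED = 1, READY = 2, THIEF = 3 };

// Hypothetical, simplified per-loop buffer (not the libomp structure).
struct LoopBuffer {
  std::atomic<std::uint32_t> steal_flag{UNUSED};
  std::uint32_t count = 0; // first chunk still to be executed
  std::uint32_t ub = 0;    // one past the last chunk owned
};

// Owner side: try UNUSED -> CLAIMED, fill in the range, then publish READY.
// Returns false if a thief already took the buffer (flag is THIEF).
inline bool owner_init(LoopBuffer &b, std::uint32_t first, std::uint32_t last) {
  assert(first <= last);
  std::uint32_t expected = UNUSED;
  bool claimed = b.steal_flag.compare_exchange_strong(expected, CLAIMED);
  b.count = first;
  b.ub = claimed ? last : first; // empty range if the buffer was stolen
  if (claimed)
    b.steal_flag.store(READY, std::memory_order_release);
  return claimed;
}

// Thief side: try UNUSED -> THIEF to take a range its owner never claimed.
inline bool thief_take_unused(LoopBuffer &b) {
  std::uint32_t expected = UNUSED;
  return b.steal_flag.compare_exchange_strong(expected, THIEF);
}
```

Because both transitions out of UNUSED go through a compare-and-swap, exactly one of the owner and the thief wins, which is what lets a single thread end up executing a whole loop.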

Diff Detail

Event Timeline

AndreyChurbanov created this revision. Jun 3 2021, 2:23 PM

Herald added subscribers: jfb, guansong, yaxunl. Jun 3 2021, 2:23 PM

AndreyChurbanov requested review of this revision. Jun 3 2021, 2:23 PM

Herald added a reviewer: jdoerfert. Jun 3 2021, 2:23 PM

Herald added subscribers: openmp-commits, sstefan1.

Harbormaster completed remote builds in B107556: Diff 349687. Jun 3 2021, 3:40 PM

Enable one more test fixed by this patch.

Harbormaster completed remote builds in B108047: Diff 350381. Jun 7 2021, 12:24 PM

Can you elaborate in the summary on what you're fixing about the static_steal schedule?

openmp/runtime/src/kmp_dispatch.cpp
1252–1254	Can we get rid of these commented lines?
1395–1397	Can we get rid of these commented lines?

Some comments removed.

LGTM

This revision is now accepted and ready to land. Jun 17 2021, 1:06 PM

Harbormaster completed remote builds in B109785: Diff 352814. Jun 18 2021, 1:16 AM

This revision was landed with ongoing or failed builds. Jun 22 2021, 6:29 AM

Closed by commit rG5dd4d0d46fb8: [OpenMP] libomp: fix dynamic loop dispatcher (authored by AndreyChurbanov).

This revision was automatically updated to reflect the committed changes.

AndreyChurbanov marked 2 inline comments as done.

AndreyChurbanov added a commit: rG5dd4d0d46fb8: [OpenMP] libomp: fix dynamic loop dispatcher.

Hello, I was seeing an assert firing in kmp_dispatch.cpp when running the omp_parallel_reduction.c test with a -DLLVM_ENABLE_ASSERTIONS enabled toolset. Would you mind taking a look? Thanks.

******************** TEST 'libomp :: parallel/omp_parallel_reduction.c' FAILED ********************
Script:
... omitted 

Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1453).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.

AndreyChurbanov mentioned this in D104880: [OpenMP][NFC] Fix wrong debug assertion. Jun 24 2021, 3:44 PM

In D103648#2839482, @hoy wrote:
Hello, I was seeing an assert firing in kmp_dispatch.cpp when running the omp_parallel_reduction.c test with a -DLLVM_ENABLE_ASSERTIONS enabled toolset. Would you mind taking a look? Thanks.

Thanks for reporting.
Actually the wrong assertion existed before this patch, which apparently increased the probability of its triggering.
I've fixed the assertion in https://reviews.llvm.org/D104880.

In D103648#2839789, @AndreyChurbanov wrote:
Thanks for reporting.
Actually the wrong assertion existed before this patch, which apparently increased the probability of its triggering.
I've fixed the assertion in https://reviews.llvm.org/D104880.

Thanks for the fast turnaround!

AndreyChurbanov mentioned this in rGb2787945f9cd: [OpenMP][NFC] libomp: fix wrong debug assertion. Jun 24 2021, 4:02 PM

rogfer01 added a subscriber: rogfer01. Aug 13 2021, 3:48 AM

rogfer01 added inline comments.

openmp/runtime/src/kmp_dispatch.cpp
911	Sometimes (not always, so it looks like a data race), when running this test on a 64-bit Arm machine with 46 cores (and on a Power9 machine with 40 cores), all the threads end up waiting here and the test stops making progress. All the cases I've seen happen with `KMP_DISP_NUM_BUFFERS=3` and `-DMY_SCHEDULE=guided`. Any idea how I could debug this further?

A quick look at `sh->buffer_index` shows it is `volatile` and is updated in

sh->buffer_index += __kmp_dispatch_num_buffers;
KD_TRACE(100, ("__kmp_dispatch_next: T#%d change buffer_index:%d\n", gtid, sh->buffer_index));
KMP_MB(); /* Flush all pending memory write invalidates. */

Given that this is not an atomic operation (although it is followed by a memory barrier), my only hypothesis is that the original load of `sh->buffer_index` might have read an old value, but that would suggest `KMP_MB()` is not effective on these targets. So I am at a loss here. Thanks!
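
For comparison with the `volatile` update plus separate barrier discussed in the comment above, here is a minimal sketch of the same kind of hand-off written with `std::atomic` release/acquire operations. It is an illustration under stated assumptions, not the libomp code: `SharedBuffer`, `advance_buffer`, `wait_for_buffer` and `demo_handoff` are hypothetical names, and nothing here establishes that the reported hang is caused by the `volatile` update.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the shared per-loop record (not the libomp type).
struct SharedBuffer {
  std::atomic<std::uint32_t> buffer_index{0};
  int payload = 0; // plain data published together with the index
};

// Releases the buffer for the loop that is num_buffers ahead.
inline void advance_buffer(SharedBuffer &sh, std::uint32_t num_buffers) {
  assert(num_buffers > 0);
  // Release ordering makes earlier writes visible to an acquiring reader.
  sh.buffer_index.fetch_add(num_buffers, std::memory_order_release);
}

// Spins until the buffer is free for loop number my_index.
inline void wait_for_buffer(const SharedBuffer &sh, std::uint32_t my_index) {
  while (sh.buffer_index.load(std::memory_order_acquire) != my_index)
    std::this_thread::yield();
}

// Runs one release/wait round and returns the payload seen by the waiter.
inline int demo_handoff(std::uint32_t num_buffers) {
  SharedBuffer sh;
  int seen = -1;
  std::thread waiter([&] {
    wait_for_buffer(sh, num_buffers);
    seen = sh.payload;
  });
  sh.payload = 42;
  advance_buffer(sh, num_buffers);
  waiter.join();
  return seen;
}
```

The atomic read-modify-write and the acquire load form a synchronizes-with pair, so the waiter is guaranteed to observe both the new index and the data written before it.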

AndreyChurbanov added inline comments. Aug 13 2021, 6:01 AM

openmp/runtime/src/kmp_dispatch.cpp
911	Which test(s) do you see hanging? There may indeed be a data race somewhere in the code; I will try to take a look. Still, it would be better to know the exact test case.

rogfer01 added inline comments. Aug 18 2021, 9:29 AM

openmp/runtime/src/kmp_dispatch.cpp
911	Hi Andrey, apologies, I forgot to mention in the comment above that `env/kmp_set_dispatch_buf.c` easily deadlocks with `KMP_DISP_NUM_BUFFERS=3` and `-DMY_SCHEDULE=guided`. Kind regards,

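Revision Contents

Path                                                              Size
openmp/runtime/src/kmp.h                                          29 lines
openmp/runtime/src/kmp_dispatch.h                                 9 lines
openmp/runtime/src/kmp_dispatch.cpp                               455 lines
openmp/runtime/src/kmp_dispatch_hier.h                            2 lines
openmp/runtime/src/kmp_settings.cpp                               7 lines
openmp/runtime/test/env/kmp_set_dispatch_buf.c                    6 lines
openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c        6 lines
openmp/runtime/test/worksharing/for/omp_for_schedule_runtime.c    4 lines
openmp/runtime/test/worksharing/for/omp_par_in_loop.c             28 lines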

Diff 352814

openmp/runtime/src/kmp.h

#if KMP_STATIC_STEAL_ENABLED		#if KMP_STATIC_STEAL_ENABLED
typedef struct KMP_ALIGN_CACHE dispatch_private_info32 {		typedef struct KMP_ALIGN_CACHE dispatch_private_info32 {
kmp_int32 count;		kmp_int32 count;
kmp_int32 ub;		kmp_int32 ub;
/* Adding KMP_ALIGN_CACHE here doesn't help / can hurt performance */		/* Adding KMP_ALIGN_CACHE here doesn't help / can hurt performance */
kmp_int32 lb;		kmp_int32 lb;
kmp_int32 st;		kmp_int32 st;
kmp_int32 tc;		kmp_int32 tc;
kmp_int32 static_steal_counter; /* for static_steal only; maybe better to put		kmp_lock_t *steal_lock; // lock used for chunk stealing
after ub */		// KMP_ALIGN(32) ensures (if the KMP_ALIGN macro is turned on)
kmp_lock_t *th_steal_lock; // lock used for chunk stealing
// KMP_ALIGN( 16 ) ensures ( if the KMP_ALIGN macro is turned on )
// a) parm3 is properly aligned and		// a) parm3 is properly aligned and
// b) all parm1-4 are in the same cache line.		// b) all parm1-4 are on the same cache line.
// Because of parm1-4 are used together, performance seems to be better		// Because of parm1-4 are used together, performance seems to be better
// if they are in the same line (not measured though).		// if they are on the same cache line (not measured though).

struct KMP_ALIGN(32) { // AC: changed 16 to 32 in order to simplify template		struct KMP_ALIGN(32) { // AC: changed 16 to 32 in order to simplify template
kmp_int32 parm1; // structures in kmp_dispatch.cpp. This should		kmp_int32 parm1; // structures in kmp_dispatch.cpp. This should
kmp_int32 parm2; // make no real change at least while padding is off.		kmp_int32 parm2; // make no real change at least while padding is off.
kmp_int32 parm3;		kmp_int32 parm3;
kmp_int32 parm4;		kmp_int32 parm4;
};		};

kmp_uint32 ordered_lower;		kmp_uint32 ordered_lower;
kmp_uint32 ordered_upper;		kmp_uint32 ordered_upper;
#if KMP_OS_WINDOWS		#if KMP_OS_WINDOWS
// This var can be placed in the hole between 'tc' and 'parm1', instead of
// 'static_steal_counter'. It would be nice to measure execution times.
// Conditional if/endif can be removed at all.
kmp_int32 last_upper;		kmp_int32 last_upper;
#endif /* KMP_OS_WINDOWS */		#endif /* KMP_OS_WINDOWS */
} dispatch_private_info32_t;		} dispatch_private_info32_t;

typedef struct KMP_ALIGN_CACHE dispatch_private_info64 {		typedef struct KMP_ALIGN_CACHE dispatch_private_info64 {
kmp_int64 count; // current chunk number for static & static-steal scheduling		kmp_int64 count; // current chunk number for static & static-steal scheduling
kmp_int64 ub; /* upper-bound */		kmp_int64 ub; /* upper-bound */
/* Adding KMP_ALIGN_CACHE here doesn't help / can hurt performance */		/* Adding KMP_ALIGN_CACHE here doesn't help / can hurt performance */
kmp_int64 lb; /* lower-bound */		kmp_int64 lb; /* lower-bound */
kmp_int64 st; /* stride */		kmp_int64 st; /* stride */
kmp_int64 tc; /* trip count (number of iterations) */		kmp_int64 tc; /* trip count (number of iterations) */
kmp_int64 static_steal_counter; /* for static_steal only; maybe better to put		kmp_lock_t *steal_lock; // lock used for chunk stealing
after ub */
kmp_lock_t *th_steal_lock; // lock used for chunk stealing
/* parm[1-4] are used in different ways by different scheduling algorithms */		/* parm[1-4] are used in different ways by different scheduling algorithms */

// KMP_ALIGN( 32 ) ensures ( if the KMP_ALIGN macro is turned on )		// KMP_ALIGN( 32 ) ensures ( if the KMP_ALIGN macro is turned on )
// a) parm3 is properly aligned and		// a) parm3 is properly aligned and
// b) all parm1-4 are in the same cache line.		// b) all parm1-4 are in the same cache line.
// Because of parm1-4 are used together, performance seems to be better		// Because of parm1-4 are used together, performance seems to be better
// if they are in the same line (not measured though).		// if they are in the same line (not measured though).

struct KMP_ALIGN(32) {		struct KMP_ALIGN(32) {
kmp_int64 parm1;		kmp_int64 parm1;
kmp_int64 parm2;		kmp_int64 parm2;
kmp_int64 parm3;		kmp_int64 parm3;
kmp_int64 parm4;		kmp_int64 parm4;
};		};

kmp_uint64 ordered_lower;		kmp_uint64 ordered_lower;
kmp_uint64 ordered_upper;		kmp_uint64 ordered_upper;
#if KMP_OS_WINDOWS		#if KMP_OS_WINDOWS
// This var can be placed in the hole between 'tc' and 'parm1', instead of
// 'static_steal_counter'. It would be nice to measure execution times.
// Conditional if/endif can be removed at all.
kmp_int64 last_upper;		kmp_int64 last_upper;
#endif /* KMP_OS_WINDOWS */		#endif /* KMP_OS_WINDOWS */
} dispatch_private_info64_t;		} dispatch_private_info64_t;
#else /* KMP_STATIC_STEAL_ENABLED */		#else /* KMP_STATIC_STEAL_ENABLED */
typedef struct KMP_ALIGN_CACHE dispatch_private_info32 {		typedef struct KMP_ALIGN_CACHE dispatch_private_info32 {
kmp_int32 lb;		kmp_int32 lb;
kmp_int32 ub;		kmp_int32 ub;
kmp_int32 st;		kmp_int32 st;
[...]

typedef struct KMP_ALIGN_CACHE dispatch_private_info {		typedef struct KMP_ALIGN_CACHE dispatch_private_info {
union private_info {		union private_info {
dispatch_private_info32_t p32;		dispatch_private_info32_t p32;
dispatch_private_info64_t p64;		dispatch_private_info64_t p64;
} u;		} u;
enum sched_type schedule; /* scheduling algorithm */		enum sched_type schedule; /* scheduling algorithm */
kmp_sched_flags_t flags; /* flags (e.g., ordered, nomerge, etc.) */		kmp_sched_flags_t flags; /* flags (e.g., ordered, nomerge, etc.) */
		std::atomic<kmp_uint32> steal_flag; // static_steal only, state of a buffer
kmp_int32 ordered_bumped;		kmp_int32 ordered_bumped;
// To retain the structure size after making ordered_iteration scalar
kmp_int32 ordered_dummy[KMP_MAX_ORDERED - 3];
// Stack of buffers for nest of serial regions		// Stack of buffers for nest of serial regions
struct dispatch_private_info *next;		struct dispatch_private_info *next;
kmp_int32 type_size; /* the size of types in private_info */		kmp_int32 type_size; /* the size of types in private_info */
#if KMP_USE_HIER_SCHED		#if KMP_USE_HIER_SCHED
kmp_int32 hier_id;		kmp_int32 hier_id;
void *parent; /* hierarchical scheduling parent pointer */		void *parent; /* hierarchical scheduling parent pointer */
#endif		#endif
enum cons_type pushed_ws;		enum cons_type pushed_ws;
} dispatch_private_info_t;		} dispatch_private_info_t;

typedef struct dispatch_shared_info32 {		typedef struct dispatch_shared_info32 {
/* chunk index under dynamic, number of idle threads under static-steal;		/* chunk index under dynamic, number of idle threads under static-steal;
iteration index otherwise */		iteration index otherwise */
volatile kmp_uint32 iteration;		volatile kmp_uint32 iteration;
volatile kmp_uint32 num_done;		volatile kmp_int32 num_done;
volatile kmp_uint32 ordered_iteration;		volatile kmp_uint32 ordered_iteration;
// Dummy to retain the structure size after making ordered_iteration scalar		// Dummy to retain the structure size after making ordered_iteration scalar
kmp_int32 ordered_dummy[KMP_MAX_ORDERED - 1];		kmp_int32 ordered_dummy[KMP_MAX_ORDERED - 1];
} dispatch_shared_info32_t;		} dispatch_shared_info32_t;

typedef struct dispatch_shared_info64 {		typedef struct dispatch_shared_info64 {
/* chunk index under dynamic, number of idle threads under static-steal;		/* chunk index under dynamic, number of idle threads under static-steal;
iteration index otherwise */		iteration index otherwise */
volatile kmp_uint64 iteration;		volatile kmp_uint64 iteration;
volatile kmp_uint64 num_done;		volatile kmp_int64 num_done;
volatile kmp_uint64 ordered_iteration;		volatile kmp_uint64 ordered_iteration;
// Dummy to retain the structure size after making ordered_iteration scalar		// Dummy to retain the structure size after making ordered_iteration scalar
kmp_int64 ordered_dummy[KMP_MAX_ORDERED - 3];		kmp_int64 ordered_dummy[KMP_MAX_ORDERED - 3];
} dispatch_shared_info64_t;		} dispatch_shared_info64_t;

typedef struct dispatch_shared_info {		typedef struct dispatch_shared_info {
union shared_info {		union shared_info {
dispatch_shared_info32_t s32;		dispatch_shared_info32_t s32;
[...]	typedef struct kmp_disp {
void (*th_deo_fcn)(int *gtid, int *cid, ident_t *);		void (*th_deo_fcn)(int *gtid, int *cid, ident_t *);
/* Vector for END ORDERED SECTION */		/* Vector for END ORDERED SECTION */
void (*th_dxo_fcn)(int *gtid, int *cid, ident_t *);		void (*th_dxo_fcn)(int *gtid, int *cid, ident_t *);

dispatch_shared_info_t *th_dispatch_sh_current;		dispatch_shared_info_t *th_dispatch_sh_current;
dispatch_private_info_t *th_dispatch_pr_current;		dispatch_private_info_t *th_dispatch_pr_current;

dispatch_private_info_t *th_disp_buffer;		dispatch_private_info_t *th_disp_buffer;
kmp_int32 th_disp_index;		kmp_uint32 th_disp_index;
kmp_int32 th_doacross_buf_idx; // thread's doacross buffer index		kmp_int32 th_doacross_buf_idx; // thread's doacross buffer index
volatile kmp_uint32 *th_doacross_flags; // pointer to shared array of flags		volatile kmp_uint32 *th_doacross_flags; // pointer to shared array of flags
kmp_int64 *th_doacross_info; // info on loop bounds		kmp_int64 *th_doacross_info; // info on loop bounds
#if KMP_USE_INTERNODE_ALIGNMENT		#if KMP_USE_INTERNODE_ALIGNMENT
char more_padding[INTERNODE_CACHE_LINE];		char more_padding[INTERNODE_CACHE_LINE];
#endif		#endif
} kmp_disp_t;		} kmp_disp_t;


openmp/runtime/src/kmp_dispatch.h

[...]	template <typename T> struct dispatch_private_infoXX_template {
typedef typename traits_t<T>::unsigned_t UT;		typedef typename traits_t<T>::unsigned_t UT;
typedef typename traits_t<T>::signed_t ST;		typedef typename traits_t<T>::signed_t ST;
UT count; // unsigned		UT count; // unsigned
T ub;		T ub;
/* Adding KMP_ALIGN_CACHE here doesn't help / can hurt performance */		/* Adding KMP_ALIGN_CACHE here doesn't help / can hurt performance */
T lb;		T lb;
ST st; // signed		ST st; // signed
UT tc; // unsigned		UT tc; // unsigned
T static_steal_counter; // for static_steal only; maybe better to put after ub		kmp_lock_t *steal_lock; // lock used for chunk stealing
kmp_lock_t *th_steal_lock; // lock used for chunk stealing
/* parm[1-4] are used in different ways by different scheduling algorithms */		/* parm[1-4] are used in different ways by different scheduling algorithms */

// KMP_ALIGN( 32 ) ensures ( if the KMP_ALIGN macro is turned on )		// KMP_ALIGN( 32 ) ensures ( if the KMP_ALIGN macro is turned on )
// a) parm3 is properly aligned and		// a) parm3 is properly aligned and
// b) all parm1-4 are in the same cache line.		// b) all parm1-4 are in the same cache line.
// Because of parm1-4 are used together, performance seems to be better		// Because of parm1-4 are used together, performance seems to be better
// if they are in the same line (not measured though).		// if they are in the same line (not measured though).

[...]	template <typename T> struct KMP_ALIGN_CACHE dispatch_private_info_template {
// duplicate alignment here, otherwise size of structure is not correct in our		// duplicate alignment here, otherwise size of structure is not correct in our
// compiler		// compiler
union KMP_ALIGN_CACHE private_info_tmpl {		union KMP_ALIGN_CACHE private_info_tmpl {
dispatch_private_infoXX_template<T> p;		dispatch_private_infoXX_template<T> p;
dispatch_private_info64_t p64;		dispatch_private_info64_t p64;
} u;		} u;
enum sched_type schedule; /* scheduling algorithm */		enum sched_type schedule; /* scheduling algorithm */
kmp_sched_flags_t flags; /* flags (e.g., ordered, nomerge, etc.) */		kmp_sched_flags_t flags; /* flags (e.g., ordered, nomerge, etc.) */
		std::atomic<kmp_uint32> steal_flag; // static_steal only, state of a buffer
kmp_uint32 ordered_bumped;		kmp_uint32 ordered_bumped;
// to retain the structure size after making order
kmp_int32 ordered_dummy[KMP_MAX_ORDERED - 3];
dispatch_private_info *next; /* stack of buffers for nest of serial regions */		dispatch_private_info *next; /* stack of buffers for nest of serial regions */
kmp_uint32 type_size;		kmp_uint32 type_size;
#if KMP_USE_HIER_SCHED		#if KMP_USE_HIER_SCHED
kmp_int32 hier_id;		kmp_int32 hier_id;
kmp_hier_top_unit_t<T> *hier_parent;		kmp_hier_top_unit_t<T> *hier_parent;
// member functions		// member functions
kmp_int32 get_hier_id() const { return hier_id; }		kmp_int32 get_hier_id() const { return hier_id; }
kmp_hier_top_unit_t<T> *get_parent() { return hier_parent; }		kmp_hier_top_unit_t<T> *get_parent() { return hier_parent; }
#endif		#endif
enum cons_type pushed_ws;		enum cons_type pushed_ws;
};		};

// replaces dispatch_shared_info{32,64} structures and		// replaces dispatch_shared_info{32,64} structures and
// dispatch_shared_info{32,64}_t types		// dispatch_shared_info{32,64}_t types
template <typename T> struct dispatch_shared_infoXX_template {		template <typename T> struct dispatch_shared_infoXX_template {
typedef typename traits_t<T>::unsigned_t UT;		typedef typename traits_t<T>::unsigned_t UT;
		typedef typename traits_t<T>::signed_t ST;
/* chunk index under dynamic, number of idle threads under static-steal;		/* chunk index under dynamic, number of idle threads under static-steal;
iteration index otherwise */		iteration index otherwise */
volatile UT iteration;		volatile UT iteration;
volatile UT num_done;		volatile ST num_done;
volatile UT ordered_iteration;		volatile UT ordered_iteration;
// to retain the structure size making ordered_iteration scalar		// to retain the structure size making ordered_iteration scalar
UT ordered_dummy[KMP_MAX_ORDERED - 3];		UT ordered_dummy[KMP_MAX_ORDERED - 3];
};		};

// replaces dispatch_shared_info structure and dispatch_shared_info_t type		// replaces dispatch_shared_info structure and dispatch_shared_info_t type
template <typename T> struct dispatch_shared_info_template {		template <typename T> struct dispatch_shared_info_template {
typedef typename traits_t<T>::unsigned_t UT;		typedef typename traits_t<T>::unsigned_t UT;

openmp/runtime/src/kmp_dispatch.cpp

[...]	static inline int __kmp_get_monotonicity(ident_t *loc, enum sched_type schedule,
else if (SCHEDULE_HAS_NONMONOTONIC(schedule))		else if (SCHEDULE_HAS_NONMONOTONIC(schedule))
monotonicity = SCHEDULE_NONMONOTONIC;		monotonicity = SCHEDULE_NONMONOTONIC;
else if (SCHEDULE_HAS_MONOTONIC(schedule))		else if (SCHEDULE_HAS_MONOTONIC(schedule))
monotonicity = SCHEDULE_MONOTONIC;		monotonicity = SCHEDULE_MONOTONIC;

return monotonicity;		return monotonicity;
}		}

		#if KMP_STATIC_STEAL_ENABLED
		enum { // values for steal_flag (possible states of private per-loop buffer)
		UNUSED = 0,
		CLAIMED = 1, // owner thread started initialization
		READY = 2, // available for stealing
		THIEF = 3 // finished by owner, or claimed by thief
		// possible state changes:
		// 0 -> 1 owner only, sync
		// 0 -> 3 thief only, sync
		// 1 -> 2 owner only, async
		// 2 -> 3 owner only, async
		// 3 -> 2 owner only, async
		// 3 -> 0 last thread finishing the loop, async
		};
		#endif

// Initialize a dispatch_private_info_template<T> buffer for a particular		// Initialize a dispatch_private_info_template<T> buffer for a particular
// type of schedule,chunk. The loop description is found in lb (lower bound),		// type of schedule,chunk. The loop description is found in lb (lower bound),
// ub (upper bound), and st (stride). nproc is the number of threads relevant		// ub (upper bound), and st (stride). nproc is the number of threads relevant
// to the scheduling (often the number of threads in a team, but not always if		// to the scheduling (often the number of threads in a team, but not always if
// hierarchical scheduling is used). tid is the id of the thread calling		// hierarchical scheduling is used). tid is the id of the thread calling
// the function within the group of nproc threads. It will have a value		// the function within the group of nproc threads. It will have a value
// between 0 and nproc - 1. This is often just the thread id within a team, but		// between 0 and nproc - 1. This is often just the thread id within a team, but
// is not necessarily the case when using hierarchical scheduling.		// is not necessarily the case when using hierarchical scheduling.
[...]	if (schedule == kmp_sch_static) {
schedule = __kmp_static;		schedule = __kmp_static;
} else {		} else {
if (schedule == kmp_sch_runtime) {		if (schedule == kmp_sch_runtime) {
// Use the scheduling specified by OMP_SCHEDULE (or __kmp_sch_default if		// Use the scheduling specified by OMP_SCHEDULE (or __kmp_sch_default if
// not specified)		// not specified)
schedule = team->t.t_sched.r_sched_type;		schedule = team->t.t_sched.r_sched_type;
monotonicity = __kmp_get_monotonicity(loc, schedule, use_hier);		monotonicity = __kmp_get_monotonicity(loc, schedule, use_hier);
schedule = SCHEDULE_WITHOUT_MODIFIERS(schedule);		schedule = SCHEDULE_WITHOUT_MODIFIERS(schedule);
		if (pr->flags.ordered) // correct monotonicity for ordered loop if needed
		monotonicity = SCHEDULE_MONOTONIC;
// Detail the schedule if needed (global controls are differentiated		// Detail the schedule if needed (global controls are differentiated
// appropriately)		// appropriately)
if (schedule == kmp_sch_guided_chunked) {		if (schedule == kmp_sch_guided_chunked) {
schedule = __kmp_guided;		schedule = __kmp_guided;
} else if (schedule == kmp_sch_static) {		} else if (schedule == kmp_sch_static) {
schedule = __kmp_static;		schedule = __kmp_static;
}		}
// Use the chunk size specified by OMP_SCHEDULE (or default if not		// Use the chunk size specified by OMP_SCHEDULE (or default if not
[...]	if (active) {
if (pr->flags.ordered) {		if (pr->flags.ordered) {
pr->ordered_bumped = 0;		pr->ordered_bumped = 0;
pr->u.p.ordered_lower = 1;		pr->u.p.ordered_lower = 1;
pr->u.p.ordered_upper = 0;		pr->u.p.ordered_upper = 0;
}		}
}		}

switch (schedule) {		switch (schedule) {
#if (KMP_STATIC_STEAL_ENABLED)		#if KMP_STATIC_STEAL_ENABLED
case kmp_sch_static_steal: {		case kmp_sch_static_steal: {
T ntc, init;		T ntc, init;

KD_TRACE(100,		KD_TRACE(100,
("__kmp_dispatch_init_algorithm: T#%d kmp_sch_static_steal case\n",		("__kmp_dispatch_init_algorithm: T#%d kmp_sch_static_steal case\n",
gtid));		gtid));

ntc = (tc % chunk ? 1 : 0) + tc / chunk;		ntc = (tc % chunk ? 1 : 0) + tc / chunk;
if (nproc > 1 && ntc >= nproc) {		if (nproc > 1 && ntc >= nproc) {
KMP_COUNT_BLOCK(OMP_LOOP_STATIC_STEAL);		KMP_COUNT_BLOCK(OMP_LOOP_STATIC_STEAL);
T id = tid;		T id = tid;
T small_chunk, extras;		T small_chunk, extras;
		kmp_uint32 old = UNUSED;
		int claimed = pr->steal_flag.compare_exchange_strong(old, CLAIMED);
		if (traits_t<T>::type_size > 4) {
		// AC: TODO: check if 16-byte CAS available and use it to
		// improve performance (probably wait for explicit request
		// before spending time on this).
		// For now use dynamically allocated per-private-buffer lock,
		// free memory in __kmp_dispatch_next when status==0.
		pr->u.p.steal_lock = (kmp_lock_t *)__kmp_allocate(sizeof(kmp_lock_t));
		__kmp_init_lock(pr->u.p.steal_lock);
		}
small_chunk = ntc / nproc;		small_chunk = ntc / nproc;
extras = ntc % nproc;		extras = ntc % nproc;

init = id * small_chunk + (id < extras ? id : extras);		init = id * small_chunk + (id < extras ? id : extras);
pr->u.p.count = init;		pr->u.p.count = init;
		if (claimed) { // are we succeeded in claiming own buffer?
pr->u.p.ub = init + small_chunk + (id < extras ? 1 : 0);		pr->u.p.ub = init + small_chunk + (id < extras ? 1 : 0);
		// Other threads will inspect steal_flag when searching for a victim.
pr->u.p.parm2 = lb;		// READY means other threads may steal from this thread from now on.
		KMP_ATOMIC_ST_REL(&pr->steal_flag, READY);
		} else {
		// other thread has stolen whole our range
		KMP_DEBUG_ASSERT(pr->steal_flag == THIEF);
		pr->u.p.ub = init; // mark there is no iterations to work on
		}
		pr->u.p.parm2 = ntc; // save number of chunks
// parm3 is the number of times to attempt stealing which is		// parm3 is the number of times to attempt stealing which is
// proportional to the number of chunks per thread up until		// nproc (just a heuristics, could be optimized later on).
// the maximum value of nproc.		pr->u.p.parm3 = nproc;
pr->u.p.parm3 = KMP_MIN(small_chunk + extras, nproc);
pr->u.p.parm4 = (id + 1) % nproc; // remember neighbour tid		pr->u.p.parm4 = (id + 1) % nproc; // remember neighbour tid
pr->u.p.st = st;
if (traits_t<T>::type_size > 4) {
// AC: TODO: check if 16-byte CAS available and use it to
// improve performance (probably wait for explicit request
// before spending time on this).
// For now use dynamically allocated per-thread lock,
// free memory in __kmp_dispatch_next when status==0.
KMP_DEBUG_ASSERT(pr->u.p.th_steal_lock == NULL);
pr->u.p.th_steal_lock =
(kmp_lock_t *)__kmp_allocate(sizeof(kmp_lock_t));
__kmp_init_lock(pr->u.p.th_steal_lock);
}
break;		break;
} else {		} else {
/* too few chunks: switching to kmp_sch_dynamic_chunked */		/* too few chunks: switching to kmp_sch_dynamic_chunked */
schedule = kmp_sch_dynamic_chunked;		schedule = kmp_sch_dynamic_chunked;
KD_TRACE(100, ("__kmp_dispatch_init_algorithm: T#%d switching to "		KD_TRACE(100, ("__kmp_dispatch_init_algorithm: T#%d switching to "
"kmp_sch_dynamic_chunked\n",		"kmp_sch_dynamic_chunked\n",
gtid));		gtid));
goto dynamic_init;		goto dynamic_init;
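
The initial split of chunks among threads computed in this hunk can be worked through on a small case. The sketch below reuses the formulas from the diff (`ntc`, `small_chunk`, `extras`, `init`) in standalone form. `ChunkRange`, `num_chunks` and `initial_range` are hypothetical names introduced for illustration, and fixed 64-bit unsigned types stand in for the template parameter `T`.

```cpp
#include <cassert>
#include <cstdint>

// Half-open range of chunk indices [first, last) assigned to one thread.
struct ChunkRange {
  std::uint64_t first;
  std::uint64_t last;
};

// Number of chunks needed to cover tc iterations (rounds up).
inline std::uint64_t num_chunks(std::uint64_t tc, std::uint64_t chunk) {
  assert(chunk > 0);
  return (tc % chunk ? 1 : 0) + tc / chunk;
}

// Initial chunk range of thread id among nproc threads: every thread gets
// small_chunk chunks, and the first `extras` threads get one more.
inline ChunkRange initial_range(std::uint64_t ntc, std::uint64_t nproc,
                                std::uint64_t id) {
  assert(nproc > 0 && id < nproc);
  std::uint64_t small_chunk = ntc / nproc;
  std::uint64_t extras = ntc % nproc;
  std::uint64_t init = id * small_chunk + (id < extras ? id : extras);
  return {init, init + small_chunk + (id < extras ? 1 : 0)};
}
```

For `tc = 10`, `chunk = 3` and `nproc = 3` this gives 4 chunks, split as [0, 2), [2, 3) and [3, 4).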
[...]	if (!active) {
/* What happens when number of threads changes, need to resize buffer? */		/* What happens when number of threads changes, need to resize buffer? */
pr = reinterpret_cast<dispatch_private_info_template<T> *>(		pr = reinterpret_cast<dispatch_private_info_template<T> *>(
&th->th.th_dispatch		&th->th.th_dispatch
->th_disp_buffer[my_buffer_index % __kmp_dispatch_num_buffers]);		->th_disp_buffer[my_buffer_index % __kmp_dispatch_num_buffers]);
sh = reinterpret_cast<dispatch_shared_info_template<T> volatile *>(		sh = reinterpret_cast<dispatch_shared_info_template<T> volatile *>(
&team->t.t_disp_buffer[my_buffer_index % __kmp_dispatch_num_buffers]);		&team->t.t_disp_buffer[my_buffer_index % __kmp_dispatch_num_buffers]);
KD_TRACE(10, ("__kmp_dispatch_init: T#%d my_buffer_index:%d\n", gtid,		KD_TRACE(10, ("__kmp_dispatch_init: T#%d my_buffer_index:%d\n", gtid,
my_buffer_index));		my_buffer_index));
		if (sh->buffer_index != my_buffer_index) { // too many loops in progress?
		KD_TRACE(100, ("__kmp_dispatch_init: T#%d before wait: my_buffer_index:%d"
		" sh->buffer_index:%d\n",
		gtid, my_buffer_index, sh->buffer_index));
		__kmp_wait<kmp_uint32>(&sh->buffer_index, my_buffer_index,
		rogfer01: Sometimes (not always, so it looks like a data race), when running this test on a 64-bit Arm machine with 46 cores (and on a Power9 machine with 40 cores), all the threads end up waiting here, so the test makes no further progress. All the cases I've seen happen with `KMP_DISP_NUM_BUFFERS=3` and `-DMY_SCHEDULE=guided`. Any idea how I could debug this further? A quick look at `sh->buffer_index` shows that it is `volatile` and that it is updated in
		    sh->buffer_index += __kmp_dispatch_num_buffers;
		    KD_TRACE(100, ("__kmp_dispatch_next: T#%d change buffer_index:%d\n", gtid, sh->buffer_index));
		    KMP_MB(); /* Flush all pending memory write invalidates. */
		Given that this is not an atomic operation (although it is followed by a memory barrier), my only hypothesis is that the original load of `sh->buffer_index` might have read an old value, but that would suggest `KMP_MB()` is not effective on these targets? So I am at a loss here. Thanks!
		AndreyChurbanov (author): Which test(s) do you see hanging? There may indeed be a data race somewhere in the code; I will try to take a look. Still, it would be better to know the exact test case.
		rogfer01: Hi Andrey, apologies, I forgot to mention in the comment above that `env/kmp_set_dispatch_buf.c` was easy to deadlock with `KMP_DISP_NUM_BUFFERS=3` and `-DMY_SCHEDULE=guided`. Kind regards,
		__kmp_eq<kmp_uint32> USE_ITT_BUILD_ARG(NULL));
		// Note: KMP_WAIT() cannot be used there: buffer index and
		// my_buffer_index are always 32-bit integers.
		KD_TRACE(100, ("__kmp_dispatch_init: T#%d after wait: my_buffer_index:%d "
		"sh->buffer_index:%d\n",
		gtid, my_buffer_index, sh->buffer_index));
		}
}		}
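The inline thread above asks whether a plain `+=` on a `volatile` `buffer_index` followed by a barrier is enough to publish the new index to waiting threads. As a point of comparison only, here is a minimal sketch of the same handoff written with C++11 atomics. `SharedBuffer`, `release_buffer` and `buffer_ready` are hypothetical names for illustration, not libomp symbols, and nothing here is claimed to be the cause of, or the fix for, the reported hang.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one shared dispatch buffer (not the libomp type).
struct SharedBuffer {
  std::atomic<std::uint32_t> buffer_index{0}; // "name" of the loop allowed to use it
  std::uint32_t num_done = 0;                 // per-loop state reset before release
};

// Last finishing thread: reset per-loop state, then publish the next index.
// The release ordering makes the reset visible to whoever observes the new index.
inline void release_buffer(SharedBuffer &sh, std::uint32_t num_buffers) {
  assert(num_buffers > 0);
  sh.num_done = 0;
  sh.buffer_index.fetch_add(num_buffers, std::memory_order_release);
}

// Waiting thread: the buffer is free once its index equals my_buffer_index.
// The acquire load pairs with the release increment above.
inline bool buffer_ready(const SharedBuffer &sh, std::uint32_t my_buffer_index) {
  return sh.buffer_index.load(std::memory_order_acquire) == my_buffer_index;
}
```

A waiter would spin on `buffer_ready(sh, my_buffer_index)`; with three buffers, the loop that follows loop 0 in the same slot would wait for index 3.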

__kmp_dispatch_init_algorithm(loc, gtid, pr, schedule, lb, ub, st,		__kmp_dispatch_init_algorithm(loc, gtid, pr, schedule, lb, ub, st,
#if USE_ITT_BUILD		#if USE_ITT_BUILD
&cur_chunk,		&cur_chunk,
#endif		#endif
chunk, (T)th->th.th_team_nproc,		chunk, (T)th->th.th_team_nproc,
(T)th->th.th_info.ds.ds_tid);		(T)th->th.th_info.ds.ds_tid);
if (active) {		if (active) {
if (pr->flags.ordered == 0) {		if (pr->flags.ordered == 0) {
th->th.th_dispatch->th_deo_fcn = __kmp_dispatch_deo_error;		th->th.th_dispatch->th_deo_fcn = __kmp_dispatch_deo_error;
th->th.th_dispatch->th_dxo_fcn = __kmp_dispatch_dxo_error;		th->th.th_dispatch->th_dxo_fcn = __kmp_dispatch_dxo_error;
} else {		} else {
th->th.th_dispatch->th_deo_fcn = __kmp_dispatch_deo<UT>;		th->th.th_dispatch->th_deo_fcn = __kmp_dispatch_deo<UT>;
th->th.th_dispatch->th_dxo_fcn = __kmp_dispatch_dxo<UT>;		th->th.th_dispatch->th_dxo_fcn = __kmp_dispatch_dxo<UT>;
}		}
}

if (active) {
/* The name of this buffer should be my_buffer_index when it's free to use
* it */

KD_TRACE(100, ("__kmp_dispatch_init: T#%d before wait: my_buffer_index:%d "
"sh->buffer_index:%d\n",
gtid, my_buffer_index, sh->buffer_index));
__kmp_wait<kmp_uint32>(&sh->buffer_index, my_buffer_index,
__kmp_eq<kmp_uint32> USE_ITT_BUILD_ARG(NULL));
// Note: KMP_WAIT() cannot be used there: buffer index and
// my_buffer_index are always 32-bit integers.
KMP_MB(); /* is this necessary? */
KD_TRACE(100, ("__kmp_dispatch_init: T#%d after wait: my_buffer_index:%d "
"sh->buffer_index:%d\n",
gtid, my_buffer_index, sh->buffer_index));

th->th.th_dispatch->th_dispatch_pr_current = (dispatch_private_info_t *)pr;		th->th.th_dispatch->th_dispatch_pr_current = (dispatch_private_info_t *)pr;
th->th.th_dispatch->th_dispatch_sh_current =		th->th.th_dispatch->th_dispatch_sh_current =
CCAST(dispatch_shared_info_t *, (volatile dispatch_shared_info_t *)sh);		CCAST(dispatch_shared_info_t *, (volatile dispatch_shared_info_t *)sh);
#if USE_ITT_BUILD		#if USE_ITT_BUILD
if (pr->flags.ordered) {		if (pr->flags.ordered) {
__kmp_itt_ordered_init(gtid);		__kmp_itt_ordered_init(gtid);
}		}
// Report loop metadata		// Report loop metadata
[47 lines hidden]	buff = __kmp_str_format(
traits_t<T>::spec, traits_t<T>::spec, traits_t<T>::spec);		traits_t<T>::spec, traits_t<T>::spec, traits_t<T>::spec);
KD_TRACE(10, (buff, gtid, pr->schedule, pr->flags.ordered, pr->u.p.lb,		KD_TRACE(10, (buff, gtid, pr->schedule, pr->flags.ordered, pr->u.p.lb,
pr->u.p.ub, pr->u.p.st, pr->u.p.tc, pr->u.p.count,		pr->u.p.ub, pr->u.p.st, pr->u.p.tc, pr->u.p.count,
pr->u.p.ordered_lower, pr->u.p.ordered_upper, pr->u.p.parm1,		pr->u.p.ordered_lower, pr->u.p.ordered_upper, pr->u.p.parm1,
pr->u.p.parm2, pr->u.p.parm3, pr->u.p.parm4));		pr->u.p.parm2, pr->u.p.parm3, pr->u.p.parm4));
__kmp_str_free(&buff);		__kmp_str_free(&buff);
}		}
#endif		#endif
#if (KMP_STATIC_STEAL_ENABLED)
// It cannot be guaranteed that after execution of a loop with some other
// schedule kind all the parm3 variables will contain the same value. Even if
// all parm3 will be the same, it still exists a bad case like using 0 and 1
// rather than program life-time increment. So the dedicated variable is
// required. The 'static_steal_counter' is used.
if (pr->schedule == kmp_sch_static_steal) {
// Other threads will inspect this variable when searching for a victim.
// This is a flag showing that other threads may steal from this thread
// since then.
volatile T *p = &pr->u.p.static_steal_counter;
*p = *p + 1;
}
#endif // ( KMP_STATIC_STEAL_ENABLED )

#if OMPT_SUPPORT && OMPT_OPTIONAL		#if OMPT_SUPPORT && OMPT_OPTIONAL
if (ompt_enabled.ompt_callback_work) {		if (ompt_enabled.ompt_callback_work) {
ompt_team_info_t *team_info = __ompt_get_teaminfo(0, NULL);		ompt_team_info_t *team_info = __ompt_get_teaminfo(0, NULL);
ompt_task_info_t *task_info = __ompt_get_task_info_object(0);		ompt_task_info_t *task_info = __ompt_get_task_info_object(0);
ompt_callbacks.ompt_callback(ompt_callback_work)(		ompt_callbacks.ompt_callback(ompt_callback_work)(
ompt_work_loop, ompt_scope_begin, &(team_info->parallel_data),		ompt_work_loop, ompt_scope_begin, &(team_info->parallel_data),
&(task_info->task_data), pr->u.p.tc, OMPT_LOAD_RETURN_ADDRESS(gtid));		&(task_info->task_data), pr->u.p.tc, OMPT_LOAD_RETURN_ADDRESS(gtid));
}		}
[73 lines hidden]
template <typename UT>		template <typename UT>
static void __kmp_dispatch_finish_chunk(int gtid, ident_t *loc) {		static void __kmp_dispatch_finish_chunk(int gtid, ident_t *loc) {
typedef typename traits_t<UT>::signed_t ST;		typedef typename traits_t<UT>::signed_t ST;
__kmp_assert_valid_gtid(gtid);		__kmp_assert_valid_gtid(gtid);
kmp_info_t *th = __kmp_threads[gtid];		kmp_info_t *th = __kmp_threads[gtid];

KD_TRACE(100, ("__kmp_dispatch_finish_chunk: T#%d called\n", gtid));		KD_TRACE(100, ("__kmp_dispatch_finish_chunk: T#%d called\n", gtid));
if (!th->th.th_team->t.t_serialized) {		if (!th->th.th_team->t.t_serialized) {
// int cid;
dispatch_private_info_template<UT> *pr =		dispatch_private_info_template<UT> *pr =
reinterpret_cast<dispatch_private_info_template<UT> *>(		reinterpret_cast<dispatch_private_info_template<UT> *>(
th->th.th_dispatch->th_dispatch_pr_current);		th->th.th_dispatch->th_dispatch_pr_current);
dispatch_shared_info_template<UT> volatile *sh =		dispatch_shared_info_template<UT> volatile *sh =
reinterpret_cast<dispatch_shared_info_template<UT> volatile *>(		reinterpret_cast<dispatch_shared_info_template<UT> volatile *>(
th->th.th_dispatch->th_dispatch_sh_current);		th->th.th_dispatch->th_dispatch_sh_current);
KMP_DEBUG_ASSERT(pr);		KMP_DEBUG_ASSERT(pr);
KMP_DEBUG_ASSERT(sh);		KMP_DEBUG_ASSERT(sh);
KMP_DEBUG_ASSERT(th->th.th_dispatch ==		KMP_DEBUG_ASSERT(th->th.th_dispatch ==
&th->th.th_team->t.t_dispatch[th->th.th_info.ds.ds_tid]);		&th->th.th_team->t.t_dispatch[th->th.th_info.ds.ds_tid]);

// for (cid = 0; cid < KMP_MAX_ORDERED; ++cid) {
UT lower = pr->u.p.ordered_lower;		UT lower = pr->u.p.ordered_lower;
UT upper = pr->u.p.ordered_upper;		UT upper = pr->u.p.ordered_upper;
UT inc = upper - lower + 1;		UT inc = upper - lower + 1;

if (pr->ordered_bumped == inc) {		if (pr->ordered_bumped == inc) {
KD_TRACE(		KD_TRACE(
1000,		1000,
("__kmp_dispatch_finish: T#%d resetting ordered_bumped to zero\n",		("__kmp_dispatch_finish: T#%d resetting ordered_bumped to zero\n",
[89 lines hidden]	if (pr->u.p.tc == 0) {
KD_TRACE(10,		KD_TRACE(10,
("__kmp_dispatch_next_algorithm: T#%d early exit trip count is "		("__kmp_dispatch_next_algorithm: T#%d early exit trip count is "
"zero status:%d\n",		"zero status:%d\n",
gtid, status));		gtid, status));
return 0;		return 0;
}		}

switch (pr->schedule) {		switch (pr->schedule) {
#if (KMP_STATIC_STEAL_ENABLED)		#if KMP_STATIC_STEAL_ENABLED
case kmp_sch_static_steal: {		case kmp_sch_static_steal: {
T chunk = pr->u.p.parm1;		T chunk = pr->u.p.parm1;
		UT nchunks = pr->u.p.parm2;
KD_TRACE(100,		KD_TRACE(100,
("__kmp_dispatch_next_algorithm: T#%d kmp_sch_static_steal case\n",		("__kmp_dispatch_next_algorithm: T#%d kmp_sch_static_steal case\n",
gtid));		gtid));

trip = pr->u.p.tc - 1;		trip = pr->u.p.tc - 1;

if (traits_t<T>::type_size > 4) {		if (traits_t<T>::type_size > 4) {
// use lock for 8-byte and CAS for 4-byte induction		// use lock for 8-byte induction variable.
// variable. TODO (optional): check and use 16-byte CAS		// TODO (optional): check presence and use 16-byte CAS
kmp_lock_t *lck = pr->u.p.th_steal_lock;		kmp_lock_t *lck = pr->u.p.steal_lock;
KMP_DEBUG_ASSERT(lck != NULL);		KMP_DEBUG_ASSERT(lck != NULL);
if (pr->u.p.count < (UT)pr->u.p.ub) {		if (pr->u.p.count < (UT)pr->u.p.ub) {
		KMP_DEBUG_ASSERT(pr->steal_flag == READY);
__kmp_acquire_lock(lck, gtid);		__kmp_acquire_lock(lck, gtid);
// try to get own chunk of iterations		// try to get own chunk of iterations
init = (pr->u.p.count)++;		init = (pr->u.p.count)++;
status = (init < (UT)pr->u.p.ub);		status = (init < (UT)pr->u.p.ub);
__kmp_release_lock(lck, gtid);		__kmp_release_lock(lck, gtid);
} else {		} else {
status = 0; // no own chunks		status = 0; // no own chunks
}		}
if (!status) { // try to steal		if (!status) { // try to steal
kmp_info_t **other_threads = team->t.t_threads;		kmp_lock_t *lckv; // victim buffer's lock
T while_limit = pr->u.p.parm3;		T while_limit = pr->u.p.parm3;
T while_index = 0;		T while_index = 0;
T id = pr->u.p.static_steal_counter; // loop id
int idx = (th->th.th_dispatch->th_disp_index - 1) %		int idx = (th->th.th_dispatch->th_disp_index - 1) %
__kmp_dispatch_num_buffers; // current loop index		__kmp_dispatch_num_buffers; // current loop index
// note: victim thread can potentially execute another loop		// note: victim thread can potentially execute another loop
// TODO: algorithm of searching for a victim		KMP_ATOMIC_ST_REL(&pr->steal_flag, THIEF); // mark self buffer inactive
// should be cleaned up and measured
while ((!status) && (while_limit != ++while_index)) {		while ((!status) && (while_limit != ++while_index)) {
dispatch_private_info_template<T> *victim;		dispatch_private_info_template<T> *v;
T remaining;		T remaining;
T victimIdx = pr->u.p.parm4;		T victimId = pr->u.p.parm4;
T oldVictimIdx = victimIdx ? victimIdx - 1 : nproc - 1;		T oldVictimId = victimId ? victimId - 1 : nproc - 1;
victim = reinterpret_cast<dispatch_private_info_template<T> *>(		v = reinterpret_cast<dispatch_private_info_template<T> *>(
&other_threads[victimIdx]->th.th_dispatch->th_disp_buffer[idx]);		&team->t.t_dispatch[victimId].th_disp_buffer[idx]);
KMP_DEBUG_ASSERT(victim);		KMP_DEBUG_ASSERT(v);
while ((victim == pr || id != victim->u.p.static_steal_counter) &&		while ((v == pr || KMP_ATOMIC_LD_RLX(&v->steal_flag) == THIEF) &&
oldVictimIdx != victimIdx) {		oldVictimId != victimId) {
victimIdx = (victimIdx + 1) % nproc;		victimId = (victimId + 1) % nproc;
victim = reinterpret_cast<dispatch_private_info_template<T> *>(		v = reinterpret_cast<dispatch_private_info_template<T> *>(
&other_threads[victimIdx]->th.th_dispatch->th_disp_buffer[idx]);		&team->t.t_dispatch[victimId].th_disp_buffer[idx]);
KMP_DEBUG_ASSERT(victim);		KMP_DEBUG_ASSERT(v);
}		}
if (victim == pr || id != victim->u.p.static_steal_counter) {		if (v == pr || KMP_ATOMIC_LD_RLX(&v->steal_flag) == THIEF) {
continue; // try once more (nproc attempts in total)		continue; // try once more (nproc attempts in total)
// no victim is ready yet to participate in stealing
// because no victim passed kmp_init_dispatch yet
}		}
if (victim->u.p.count + 2 > (UT)victim->u.p.ub) {		if (KMP_ATOMIC_LD_RLX(&v->steal_flag) == UNUSED) {
		jlpeyton: Can we get rid of these commented lines?
pr->u.p.parm4 = (victimIdx + 1) % nproc; // shift start tid		kmp_uint32 old = UNUSED;
continue; // not enough chunks to steal, goto next victim		// try to steal whole range from inactive victim
}		status = v->steal_flag.compare_exchange_strong(old, THIEF);
		if (status) {
lck = victim->u.p.th_steal_lock;		// initialize self buffer with victim's whole range of chunks
KMP_ASSERT(lck != NULL);		T id = victimId;
		T small_chunk, extras;
		small_chunk = nchunks / nproc; // chunks per thread
		extras = nchunks % nproc;
		init = id * small_chunk + (id < extras ? id : extras);
__kmp_acquire_lock(lck, gtid);		__kmp_acquire_lock(lck, gtid);
limit = victim->u.p.ub; // keep initial ub		pr->u.p.count = init + 1; // exclude one we execute immediately
if (victim->u.p.count >= limit ||		pr->u.p.ub = init + small_chunk + (id < extras ? 1 : 0);
(remaining = limit - victim->u.p.count) < 2) {
__kmp_release_lock(lck, gtid);		__kmp_release_lock(lck, gtid);
pr->u.p.parm4 = (victimIdx + 1) % nproc; // next victim		pr->u.p.parm4 = (id + 1) % nproc; // remember neighbour tid
continue; // not enough chunks to steal		// no need to reinitialize other thread invariants: lb, st, etc.
		#ifdef KMP_DEBUG
		{
		char *buff;
		// create format specifiers before the debug output
		buff = __kmp_str_format(
		"__kmp_dispatch_next: T#%%d stolen chunks from T#%%d, "
		"count:%%%s ub:%%%s\n",
		traits_t<UT>::spec, traits_t<T>::spec);
		KD_TRACE(10, (buff, gtid, id, pr->u.p.count, pr->u.p.ub));
		__kmp_str_free(&buff);
}		}
// stealing succeeded, reduce victim's ub by 1/4 of undone chunks or		#endif
// by 1		// activate non-empty buffer and let others steal from us
if (remaining > 3) {		if (pr->u.p.count < (UT)pr->u.p.ub)
		KMP_ATOMIC_ST_REL(&pr->steal_flag, READY);
		break;
		}
		}
		if (KMP_ATOMIC_LD_RLX(&v->steal_flag) != READY ||
		v->u.p.count >= (UT)v->u.p.ub) {
		pr->u.p.parm4 = (victimId + 1) % nproc; // shift start victim tid
		continue; // no chunks to steal, try next victim
		}
		lckv = v->u.p.steal_lock;
		KMP_ASSERT(lckv != NULL);
		__kmp_acquire_lock(lckv, gtid);
		limit = v->u.p.ub; // keep initial ub
		if (v->u.p.count >= limit) {
		__kmp_release_lock(lckv, gtid);
		pr->u.p.parm4 = (victimId + 1) % nproc; // shift start victim tid
		continue; // no chunks to steal, try next victim
		}

		// stealing succeded, reduce victim's ub by 1/4 of undone chunks
		// TODO: is this heuristics good enough??
		remaining = limit - v->u.p.count;
		if (remaining > 7) {
// steal 1/4 of remaining		// steal 1/4 of remaining
KMP_COUNT_DEVELOPER_VALUE(FOR_static_steal_stolen, remaining >> 2);		KMP_COUNT_DEVELOPER_VALUE(FOR_static_steal_stolen, remaining >> 2);
init = (victim->u.p.ub -= (remaining >> 2));		init = (v->u.p.ub -= (remaining >> 2));
} else {		} else {
// steal 1 chunk of 2 or 3 remaining		// steal 1 chunk of 1..7 remaining
KMP_COUNT_DEVELOPER_VALUE(FOR_static_steal_stolen, 1);		KMP_COUNT_DEVELOPER_VALUE(FOR_static_steal_stolen, 1);
init = (victim->u.p.ub -= 1);		init = (v->u.p.ub -= 1);
}		}
__kmp_release_lock(lck, gtid);		__kmp_release_lock(lckv, gtid);
		#ifdef KMP_DEBUG
		{
		char *buff;
		// create format specifiers before the debug output
		buff = __kmp_str_format(
		"__kmp_dispatch_next: T#%%d stolen chunks from T#%%d, "
		"count:%%%s ub:%%%s\n",
		traits_t<UT>::spec, traits_t<UT>::spec);
		KD_TRACE(10, (buff, gtid, victimId, init, limit));
		__kmp_str_free(&buff);
		}
		#endif
KMP_DEBUG_ASSERT(init + 1 <= limit);		KMP_DEBUG_ASSERT(init + 1 <= limit);
pr->u.p.parm4 = victimIdx; // remember victim to steal from		pr->u.p.parm4 = victimId; // remember victim to steal from
status = 1;		status = 1;
while_index = 0;		// now update own count and ub with stolen range excluding init chunk
// now update own count and ub with stolen range but init chunk		__kmp_acquire_lock(lck, gtid);
__kmp_acquire_lock(pr->u.p.th_steal_lock, gtid);
pr->u.p.count = init + 1;		pr->u.p.count = init + 1;
pr->u.p.ub = limit;		pr->u.p.ub = limit;
__kmp_release_lock(pr->u.p.th_steal_lock, gtid);		__kmp_release_lock(lck, gtid);
		// activate non-empty buffer and let others steal from us
		if (init + 1 < limit)
		KMP_ATOMIC_ST_REL(&pr->steal_flag, READY);
} // while (search for victim)		} // while (search for victim)
} // if (try to find victim and steal)		} // if (try to find victim and steal)
} else {		} else {
// 4-byte induction variable, use 8-byte CAS for pair (count, ub)		// 4-byte induction variable, use 8-byte CAS for pair (count, ub)
		// as all operations on pair (count, ub) must be done atomically
typedef union {		typedef union {
struct {		struct {
UT count;		UT count;
T ub;		T ub;
} p;		} p;
kmp_int64 b;		kmp_int64 b;
} union_i4;		} union_i4;
// All operations on 'count' or 'ub' must be combined atomically
// together.
{
union_i4 vold, vnew;		union_i4 vold, vnew;
		if (pr->u.p.count < (UT)pr->u.p.ub) {
		KMP_DEBUG_ASSERT(pr->steal_flag == READY);
vold.b = *(volatile kmp_int64 *)(&pr->u.p.count);		vold.b = *(volatile kmp_int64 *)(&pr->u.p.count);
vnew = vold;		vnew.b = vold.b;
vnew.p.count++;		vnew.p.count++; // get chunk from head of self range
while (!KMP_COMPARE_AND_STORE_ACQ64(		while (!KMP_COMPARE_AND_STORE_REL64(
(volatile kmp_int64 *)&pr->u.p.count,		(volatile kmp_int64 *)&pr->u.p.count,
*VOLATILE_CAST(kmp_int64 *) & vold.b,		*VOLATILE_CAST(kmp_int64 *) & vold.b,
*VOLATILE_CAST(kmp_int64 *) & vnew.b)) {		*VOLATILE_CAST(kmp_int64 *) & vnew.b)) {
KMP_CPU_PAUSE();		KMP_CPU_PAUSE();
vold.b = *(volatile kmp_int64 *)(&pr->u.p.count);		vold.b = *(volatile kmp_int64 *)(&pr->u.p.count);
vnew = vold;		vnew.b = vold.b;
vnew.p.count++;		vnew.p.count++;
}		}
vnew = vold;		init = vold.p.count;
init = vnew.p.count;		status = (init < (UT)vold.p.ub);
status = (init < (UT)vnew.p.ub);		} else {
		status = 0; // no own chunks
}		}
		if (!status) { // try to steal
if (!status) {
kmp_info_t **other_threads = team->t.t_threads;
T while_limit = pr->u.p.parm3;		T while_limit = pr->u.p.parm3;
T while_index = 0;		T while_index = 0;
T id = pr->u.p.static_steal_counter; // loop id
int idx = (th->th.th_dispatch->th_disp_index - 1) %		int idx = (th->th.th_dispatch->th_disp_index - 1) %
__kmp_dispatch_num_buffers; // current loop index		__kmp_dispatch_num_buffers; // current loop index
// note: victim thread can potentially execute another loop		// note: victim thread can potentially execute another loop
// TODO: algorithm of searching for a victim		KMP_ATOMIC_ST_REL(&pr->steal_flag, THIEF); // mark self buffer inactive
// should be cleaned up and measured
while ((!status) && (while_limit != ++while_index)) {		while ((!status) && (while_limit != ++while_index)) {
dispatch_private_info_template<T> *victim;		dispatch_private_info_template<T> *v;
union_i4 vold, vnew;
T remaining;		T remaining;
T victimIdx = pr->u.p.parm4;		T victimId = pr->u.p.parm4;
T oldVictimIdx = victimIdx ? victimIdx - 1 : nproc - 1;		T oldVictimId = victimId ? victimId - 1 : nproc - 1;
victim = reinterpret_cast<dispatch_private_info_template<T> *>(		v = reinterpret_cast<dispatch_private_info_template<T> *>(
&other_threads[victimIdx]->th.th_dispatch->th_disp_buffer[idx]);		&team->t.t_dispatch[victimId].th_disp_buffer[idx]);
KMP_DEBUG_ASSERT(victim);		KMP_DEBUG_ASSERT(v);
while ((victim == pr || id != victim->u.p.static_steal_counter) &&		while ((v == pr || KMP_ATOMIC_LD_RLX(&v->steal_flag) == THIEF) &&
oldVictimIdx != victimIdx) {		oldVictimId != victimId) {
victimIdx = (victimIdx + 1) % nproc;		victimId = (victimId + 1) % nproc;
victim = reinterpret_cast<dispatch_private_info_template<T> *>(		v = reinterpret_cast<dispatch_private_info_template<T> *>(
&other_threads[victimIdx]->th.th_dispatch->th_disp_buffer[idx]);		&team->t.t_dispatch[victimId].th_disp_buffer[idx]);
KMP_DEBUG_ASSERT(victim);		KMP_DEBUG_ASSERT(v);
}		}
if (victim == pr || id != victim->u.p.static_steal_counter) {		if (v == pr || KMP_ATOMIC_LD_RLX(&v->steal_flag) == THIEF) {
continue; // try once more (nproc attempts in total)		continue; // try once more (nproc attempts in total)
// no victim is ready yet to participate in stealing
// because no victim passed kmp_init_dispatch yet
}		}
pr->u.p.parm4 = victimIdx; // new victim found		if (KMP_ATOMIC_LD_RLX(&v->steal_flag) == UNUSED) {
		jlpeyton: Can we get rid of these commented lines?
while (1) { // CAS loop if victim has enough chunks to steal		kmp_uint32 old = UNUSED;
vold.b = *(volatile kmp_int64 *)(&victim->u.p.count);		// try to steal whole range from inactive victim
vnew = vold;		status = v->steal_flag.compare_exchange_strong(old, THIEF);
		if (status) {
KMP_DEBUG_ASSERT((vnew.p.ub - 1) * (UT)chunk <= trip);		// initialize self buffer with victim's whole range of chunks
if (vnew.p.count >= (UT)vnew.p.ub ||		T id = victimId;
(remaining = vnew.p.ub - vnew.p.count) < 2) {		T small_chunk, extras;
pr->u.p.parm4 = (victimIdx + 1) % nproc; // shift start victim id		small_chunk = nchunks / nproc; // chunks per thread
break; // not enough chunks to steal, goto next victim		extras = nchunks % nproc;
		init = id * small_chunk + (id < extras ? id : extras);
		vnew.p.count = init + 1;
		vnew.p.ub = init + small_chunk + (id < extras ? 1 : 0);
		// write pair (count, ub) at once atomically
		#if KMP_ARCH_X86
		KMP_XCHG_FIXED64((volatile kmp_int64 *)(&pr->u.p.count), vnew.b);
		#else
		*(volatile kmp_int64 *)(&pr->u.p.count) = vnew.b;
		#endif
		pr->u.p.parm4 = (id + 1) % nproc; // remember neighbour tid
		// no need to initialize other thread invariants: lb, st, etc.
		#ifdef KMP_DEBUG
		{
		char *buff;
		// create format specifiers before the debug output
		buff = __kmp_str_format(
		"__kmp_dispatch_next: T#%%d stolen chunks from T#%%d, "
		"count:%%%s ub:%%%s\n",
		traits_t<UT>::spec, traits_t<T>::spec);
		KD_TRACE(10, (buff, gtid, id, pr->u.p.count, pr->u.p.ub));
		__kmp_str_free(&buff);
		}
		#endif
		// activate non-empty buffer and let others steal from us
		if (pr->u.p.count < (UT)pr->u.p.ub)
		KMP_ATOMIC_ST_REL(&pr->steal_flag, READY);
		break;
		}
}		}
if (remaining > 3) {		while (1) { // CAS loop with check if victim still has enough chunks
		// many threads may be stealing concurrently from same victim
		vold.b = *(volatile kmp_int64 *)(&v->u.p.count);
		if (KMP_ATOMIC_LD_ACQ(&v->steal_flag) != READY ||
		vold.p.count >= (UT)vold.p.ub) {
		pr->u.p.parm4 = (victimId + 1) % nproc; // shift start victim id
		break; // no chunks to steal, try next victim
		}
		vnew.b = vold.b;
		remaining = vold.p.ub - vold.p.count;
// try to steal 1/4 of remaining		// try to steal 1/4 of remaining
vnew.p.ub -= remaining >> 2;		// TODO: is this heuristics good enough??
		if (remaining > 7) {
		vnew.p.ub -= remaining >> 2; // steal from tail of victim's range
} else {		} else {
vnew.p.ub -= 1; // steal 1 chunk of 2 or 3 remaining		vnew.p.ub -= 1; // steal 1 chunk of 1..7 remaining
}		}
KMP_DEBUG_ASSERT((vnew.p.ub - 1) * (UT)chunk <= trip);		KMP_DEBUG_ASSERT((vnew.p.ub - 1) * (UT)chunk <= trip);
// TODO: Should this be acquire or release?		if (KMP_COMPARE_AND_STORE_REL64(
if (KMP_COMPARE_AND_STORE_ACQ64(		(volatile kmp_int64 *)&v->u.p.count,
(volatile kmp_int64 *)&victim->u.p.count,
*VOLATILE_CAST(kmp_int64 *) & vold.b,		*VOLATILE_CAST(kmp_int64 *) & vold.b,
*VOLATILE_CAST(kmp_int64 *) & vnew.b)) {		*VOLATILE_CAST(kmp_int64 *) & vnew.b)) {
// stealing succeeded		// stealing succedded
		#ifdef KMP_DEBUG
		{
		char *buff;
		// create format specifiers before the debug output
		buff = __kmp_str_format(
		"__kmp_dispatch_next: T#%%d stolen chunks from T#%%d, "
		"count:%%%s ub:%%%s\n",
		traits_t<T>::spec, traits_t<T>::spec);
		KD_TRACE(10, (buff, gtid, victimId, vnew.p.ub, vold.p.ub));
		__kmp_str_free(&buff);
		}
		#endif
KMP_COUNT_DEVELOPER_VALUE(FOR_static_steal_stolen,		KMP_COUNT_DEVELOPER_VALUE(FOR_static_steal_stolen,
vold.p.ub - vnew.p.ub);		vold.p.ub - vnew.p.ub);
status = 1;		status = 1;
while_index = 0;		pr->u.p.parm4 = victimId; // keep victim id
// now update own count and ub		// now update own count and ub
init = vnew.p.ub;		init = vnew.p.ub;
vold.p.count = init + 1;		vold.p.count = init + 1;
#if KMP_ARCH_X86		#if KMP_ARCH_X86
KMP_XCHG_FIXED64((volatile kmp_int64 *)(&pr->u.p.count), vold.b);		KMP_XCHG_FIXED64((volatile kmp_int64 *)(&pr->u.p.count), vold.b);
#else		#else
*(volatile kmp_int64 *)(&pr->u.p.count) = vold.b;		*(volatile kmp_int64 *)(&pr->u.p.count) = vold.b;
#endif		#endif
		// activate non-empty buffer and let others steal from us
		if (vold.p.count < (UT)vold.p.ub)
		KMP_ATOMIC_ST_REL(&pr->steal_flag, READY);
break;		break;
} // if (check CAS result)		} // if (check CAS result)
KMP_CPU_PAUSE(); // CAS failed, repeatedly attempt		KMP_CPU_PAUSE(); // CAS failed, repeatedly attempt
} // while (try to steal from particular victim)		} // while (try to steal from particular victim)
} // while (search for victim)		} // while (search for victim)
} // if (try to find victim and steal)		} // if (try to find victim and steal)
} // if (4-byte induction variable)		} // if (4-byte induction variable)
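The 4-byte path above treats the pair (count, ub) as one 64-bit word so that the owner (taking from the head) and thieves (taking from the tail) never see a half-updated range. The following is a minimal sketch of that idea using `std::atomic` instead of the runtime's `KMP_COMPARE_AND_STORE_*` macros. The names `pack`, `take_own` and `steal_tail` are hypothetical, the `steal_flag` protocol is left out, and unlike the diff the owner here checks `count < ub` before incrementing.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Pack (count, ub) into one 64-bit word: count in the low half, ub in the high half.
inline std::uint64_t pack(std::uint32_t count, std::uint32_t ub) {
  return (static_cast<std::uint64_t>(ub) << 32) | count;
}
inline std::uint32_t count_of(std::uint64_t w) { return static_cast<std::uint32_t>(w); }
inline std::uint32_t ub_of(std::uint64_t w) { return static_cast<std::uint32_t>(w >> 32); }

// Owner: take one chunk from the head of its own range.
inline bool take_own(std::atomic<std::uint64_t> &range, std::uint32_t &chunk) {
  std::uint64_t old = range.load(std::memory_order_relaxed);
  while (count_of(old) < ub_of(old)) {
    std::uint64_t desired = pack(count_of(old) + 1, ub_of(old));
    // On failure, compare_exchange_weak reloads `old` and the loop re-checks.
    if (range.compare_exchange_weak(old, desired, std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
      chunk = count_of(old);
      return true;
    }
  }
  return false; // no own chunks left
}

// Thief: steal chunks [first, last) from the tail of the victim's range,
// using the 1/4-of-remaining (or single chunk) heuristic from the diff.
inline bool steal_tail(std::atomic<std::uint64_t> &victim, std::uint32_t &first,
                       std::uint32_t &last) {
  std::uint64_t old = victim.load(std::memory_order_acquire);
  while (count_of(old) < ub_of(old)) {
    std::uint32_t remaining = ub_of(old) - count_of(old);
    std::uint32_t take = remaining > 7 ? (remaining >> 2) : 1;
    assert(take >= 1 && take <= remaining);
    std::uint64_t desired = pack(count_of(old), ub_of(old) - take);
    if (victim.compare_exchange_weak(old, desired, std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
      first = ub_of(old) - take;
      last = ub_of(old);
      return true;
    }
  }
  return false; // nothing to steal, try next victim
}
```

Because both halves change in a single compare-and-swap, a thief that shrinks `ub` and an owner that advances `count` cannot both claim the same chunk.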
if (!status) {		if (!status) {
*p_lb = 0;		*p_lb = 0;
*p_ub = 0;		*p_ub = 0;
if (p_st != NULL)		if (p_st != NULL)
*p_st = 0;		*p_st = 0;
} else {		} else {
start = pr->u.p.parm2;		start = pr->u.p.lb;
init *= chunk;		init *= chunk;
limit = chunk + init - 1;		limit = chunk + init - 1;
incr = pr->u.p.st;		incr = pr->u.p.st;
KMP_COUNT_DEVELOPER_VALUE(FOR_static_steal_chunks, 1);		KMP_COUNT_DEVELOPER_VALUE(FOR_static_steal_chunks, 1);

KMP_DEBUG_ASSERT(init <= trip);		KMP_DEBUG_ASSERT(init <= trip);
		// keep track of done chunks for possible early exit from stealing
		// TODO: count executed chunks locally with rare update of shared location
		// test_then_inc<ST>((volatile ST *)&sh->u.s.iteration);
if ((last = (limit >= trip)) != 0)		if ((last = (limit >= trip)) != 0)
limit = trip;		limit = trip;
if (p_st != NULL)		if (p_st != NULL)
*p_st = incr;		*p_st = incr;

if (incr == 1) {		if (incr == 1) {
*p_lb = start + init;		*p_lb = start + init;
*p_ub = start + limit;		*p_ub = start + limit;
} else {		} else {
*p_lb = start + init * incr;		*p_lb = start + init * incr;
*p_ub = start + limit * incr;		*p_ub = start + limit * incr;
}		}

if (pr->flags.ordered) {
pr->u.p.ordered_lower = init;
pr->u.p.ordered_upper = limit;
} // if
} // if		} // if
break;		break;
} // case		} // case
#endif // ( KMP_STATIC_STEAL_ENABLED )		#endif // KMP_STATIC_STEAL_ENABLED
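The arithmetic used in the `kmp_sch_static_steal` case above can be checked in isolation. The sketch below reproduces two pieces of it: the initial chunk range a thread owns (the same `small_chunk`/`extras` formula the thief uses to take over an `UNUSED` victim's whole range) and the number of chunks taken from a `READY` victim. `ChunkRange`, `initial_range` and `steal_amount` are hypothetical names for illustration. Note that the diff stores `count = init + 1` because the thief runs chunk `init` immediately, whereas this helper returns the untouched range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type: chunks [count, ub) belong to one thread.
struct ChunkRange {
  std::uint32_t count; // first chunk index not yet taken
  std::uint32_t ub;    // one past the last owned chunk index
};

// Initial range for thread `id` of `nproc` when the loop has `nchunks` chunks.
// The first `extras` threads each receive one extra chunk.
inline ChunkRange initial_range(std::uint32_t nchunks, std::uint32_t nproc,
                                std::uint32_t id) {
  assert(nproc > 0 && id < nproc);
  std::uint32_t small_chunk = nchunks / nproc; // chunks per thread
  std::uint32_t extras = nchunks % nproc;
  std::uint32_t init = id * small_chunk + (id < extras ? id : extras);
  return {init, init + small_chunk + (id < extras ? 1u : 0u)};
}

// Chunks a thief removes from the tail of a victim with `remaining` chunks left
// (new-side heuristic: a quarter when more than 7 remain, otherwise one).
inline std::uint32_t steal_amount(std::uint32_t remaining) {
  assert(remaining >= 1);
  return remaining > 7 ? (remaining >> 2) : 1u;
}
```

For 10 chunks on 4 threads this gives the ranges [0,3), [3,6), [6,8) and [8,10), so every chunk is owned exactly once before any stealing starts.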
case kmp_sch_static_balanced: {		case kmp_sch_static_balanced: {
KD_TRACE(		KD_TRACE(
10,		10,
("__kmp_dispatch_next_algorithm: T#%d kmp_sch_static_balanced case\n",		("__kmp_dispatch_next_algorithm: T#%d kmp_sch_static_balanced case\n",
gtid));		gtid));
/* check if thread has any iteration to do */		/* check if thread has any iteration to do */
if ((status = !pr->u.p.count) != 0) {		if ((status = !pr->u.p.count) != 0) {
pr->u.p.count = 1;		pr->u.p.count = 1;
[628 lines hidden]	if (pr->flags.use_hier)
status = sh->hier->next(loc, gtid, pr, &last, p_lb, p_ub, p_st);		status = sh->hier->next(loc, gtid, pr, &last, p_lb, p_ub, p_st);
else		else
#endif // KMP_USE_HIER_SCHED		#endif // KMP_USE_HIER_SCHED
status = __kmp_dispatch_next_algorithm<T>(gtid, pr, sh, &last, p_lb, p_ub,		status = __kmp_dispatch_next_algorithm<T>(gtid, pr, sh, &last, p_lb, p_ub,
p_st, th->th.th_team_nproc,		p_st, th->th.th_team_nproc,
th->th.th_info.ds.ds_tid);		th->th.th_info.ds.ds_tid);
// status == 0: no more iterations to execute		// status == 0: no more iterations to execute
if (status == 0) {		if (status == 0) {
UT num_done;		ST num_done;
		num_done = test_then_inc<ST>(&sh->u.s.num_done);
num_done = test_then_inc<ST>((volatile ST *)&sh->u.s.num_done);
#ifdef KMP_DEBUG		#ifdef KMP_DEBUG
{		{
char *buff;		char *buff;
// create format specifiers before the debug output		// create format specifiers before the debug output
buff = __kmp_str_format(		buff = __kmp_str_format(
"__kmp_dispatch_next: T#%%d increment num_done:%%%s\n",		"__kmp_dispatch_next: T#%%d increment num_done:%%%s\n",
traits_t<UT>::spec);		traits_t<ST>::spec);
KD_TRACE(10, (buff, gtid, sh->u.s.num_done));		KD_TRACE(10, (buff, gtid, sh->u.s.num_done));
__kmp_str_free(&buff);		__kmp_str_free(&buff);
}		}
#endif		#endif

#if KMP_USE_HIER_SCHED		#if KMP_USE_HIER_SCHED
pr->flags.use_hier = FALSE;		pr->flags.use_hier = FALSE;
#endif		#endif
if ((ST)num_done == th->th.th_team_nproc - 1) {		if (num_done == th->th.th_team_nproc - 1) {
#if (KMP_STATIC_STEAL_ENABLED)		#if KMP_STATIC_STEAL_ENABLED
if (pr->schedule == kmp_sch_static_steal &&		if (pr->schedule == kmp_sch_static_steal) {
traits_t<T>::type_size > 4) {
int i;		int i;
int idx = (th->th.th_dispatch->th_disp_index - 1) %		int idx = (th->th.th_dispatch->th_disp_index - 1) %
__kmp_dispatch_num_buffers; // current loop index		__kmp_dispatch_num_buffers; // current loop index
kmp_info_t **other_threads = team->t.t_threads;
// loop complete, safe to destroy locks used for stealing		// loop complete, safe to destroy locks used for stealing
for (i = 0; i < th->th.th_team_nproc; ++i) {		for (i = 0; i < th->th.th_team_nproc; ++i) {
dispatch_private_info_template<T> *buf =		dispatch_private_info_template<T> *buf =
reinterpret_cast<dispatch_private_info_template<T> *>(		reinterpret_cast<dispatch_private_info_template<T> *>(
&other_threads[i]->th.th_dispatch->th_disp_buffer[idx]);		&team->t.t_dispatch[i].th_disp_buffer[idx]);
kmp_lock_t *lck = buf->u.p.th_steal_lock;		KMP_ASSERT(buf->steal_flag == THIEF); // buffer must be inactive
		KMP_ATOMIC_ST_RLX(&buf->steal_flag, UNUSED);
		if (traits_t<T>::type_size > 4) {
		// destroy locks used for stealing
		kmp_lock_t *lck = buf->u.p.steal_lock;
KMP_ASSERT(lck != NULL);		KMP_ASSERT(lck != NULL);
__kmp_destroy_lock(lck);		__kmp_destroy_lock(lck);
__kmp_free(lck);		__kmp_free(lck);
buf->u.p.th_steal_lock = NULL;		buf->u.p.steal_lock = NULL;
		}
}		}
}		}
#endif		#endif
/* NOTE: release this buffer to be reused */		/* NOTE: release shared buffer to be reused */

KMP_MB(); /* Flush all pending memory write invalidates. */		KMP_MB(); /* Flush all pending memory write invalidates. */

sh->u.s.num_done = 0;		sh->u.s.num_done = 0;
sh->u.s.iteration = 0;		sh->u.s.iteration = 0;

/* TODO replace with general release procedure? */		/* TODO replace with general release procedure? */
if (pr->flags.ordered) {		if (pr->flags.ordered) {
sh->u.s.ordered_iteration = 0;		sh->u.s.ordered_iteration = 0;
}		}

KMP_MB(); /* Flush all pending memory write invalidates. */

sh->buffer_index += __kmp_dispatch_num_buffers;		sh->buffer_index += __kmp_dispatch_num_buffers;
KD_TRACE(100, ("__kmp_dispatch_next: T#%d change buffer_index:%d\n",		KD_TRACE(100, ("__kmp_dispatch_next: T#%d change buffer_index:%d\n",
gtid, sh->buffer_index));		gtid, sh->buffer_index));

KMP_MB(); /* Flush all pending memory write invalidates. */		KMP_MB(); /* Flush all pending memory write invalidates. */

} // if		} // if
if (__kmp_env_consistency_check) {		if (__kmp_env_consistency_check) {
[518 lines not shown]
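The buffer arithmetic in the hunk above (`(th_disp_index - 1) % __kmp_dispatch_num_buffers` and `sh->buffer_index += __kmp_dispatch_num_buffers`) can be read as a ring of dispatch buffers. The following is a minimal sketch of that reading, not the runtime's code. `NUM_BUFFERS`, `shared_buf_t`, `slot_for_loop`, `can_claim` and `release_slot` are hypothetical names, and the value 7 is only an example. It also assumes the per-thread loop counter is incremented when a loop starts, which would explain the `- 1` in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for __kmp_dispatch_num_buffers; 7 is only an example. */
enum { NUM_BUFFERS = 7 };

/* Hypothetical reduced form of a shared dispatch buffer. */
typedef struct {
  unsigned buffer_index; /* sequence number of the loop allowed to use this slot */
} shared_buf_t;

/* Slot used by the loop with 0-based sequence number loop_seq. */
unsigned slot_for_loop(unsigned loop_seq) { return loop_seq % NUM_BUFFERS; }

/* A loop may use its slot only once the slot's index matches its number. */
int can_claim(const shared_buf_t *sh, unsigned loop_seq) {
  assert(sh != NULL);
  return sh->buffer_index == loop_seq;
}

/* Releasing the slot hands it to the loop NUM_BUFFERS positions later. */
void release_slot(shared_buf_t *sh) {
  assert(sh != NULL);
  sh->buffer_index += NUM_BUFFERS;
}
```

Under these assumptions, loops 3 and 10 share slot 3, and loop 10 can claim it only after loop 3 has released it. The patch appears to protect the same kind of reuse when it resets `steal_flag` before the buffer is recycled.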

openmp/runtime/src/kmp_dispatch_hier.h

[234 lines not shown]	default:
// don't use the core_barrier_impl for more than 8 threads		// don't use the core_barrier_impl for more than 8 threads
KMP_ASSERT(0);		KMP_ASSERT(0);
}		}
return wait_val;		return wait_val;
}		}

public:		public:
static void reset_private(kmp_int32 num_active,		static void reset_private(kmp_int32 num_active,
kmp_hier_private_bdata_t *tdata);		kmp_hier_private_bdata_t *tdata);
		Lint (pre-merge checks, clang-tidy): error: unknown type name 'kmp_hier_private_bdata_t' [clang-diagnostic-error]
static void reset_shared(kmp_int32 num_active,		static void reset_shared(kmp_int32 num_active,
kmp_hier_shared_bdata_t<T> *bdata);		kmp_hier_shared_bdata_t<T> *bdata);
static void barrier(kmp_int32 id, kmp_hier_shared_bdata_t<T> *bdata,		static void barrier(kmp_int32 id, kmp_hier_shared_bdata_t<T> *bdata,
kmp_hier_private_bdata_t *tdata);		kmp_hier_private_bdata_t *tdata);
};		};

template <typename T>		template <typename T>
void core_barrier_impl<T>::reset_private(kmp_int32 num_active,		void core_barrier_impl<T>::reset_private(kmp_int32 num_active,
kmp_hier_private_bdata_t *tdata) {		kmp_hier_private_bdata_t *tdata) {
tdata->num_active = num_active;		tdata->num_active = num_active;
tdata->index = 0;		tdata->index = 0;
tdata->wait_val[0] = tdata->wait_val[1] = get_wait_val(num_active);		tdata->wait_val[0] = tdata->wait_val[1] = get_wait_val(num_active);
}		}
template <typename T>		template <typename T>
void core_barrier_impl<T>::reset_shared(kmp_int32 num_active,		void core_barrier_impl<T>::reset_shared(kmp_int32 num_active,
kmp_hier_shared_bdata_t<T> *bdata) {		kmp_hier_shared_bdata_t<T> *bdata) {
bdata->val[0] = bdata->val[1] = 0LL;		bdata->val[0] = bdata->val[1] = 0LL;
bdata->status[0] = bdata->status[1] = 0LL;		bdata->status[0] = bdata->status[1] = 0LL;
}		}
template <typename T>		template <typename T>
void core_barrier_impl<T>::barrier(kmp_int32 id,		void core_barrier_impl<T>::barrier(kmp_int32 id,
kmp_hier_shared_bdata_t<T> *bdata,		kmp_hier_shared_bdata_t<T> *bdata,
kmp_hier_private_bdata_t *tdata) {		kmp_hier_private_bdata_t *tdata) {
kmp_uint64 current_index = tdata->index;		kmp_uint64 current_index = tdata->index;
kmp_uint64 next_index = 1 - current_index;		kmp_uint64 next_index = 1 - current_index;
kmp_uint64 current_wait_value = tdata->wait_val[current_index];		kmp_uint64 current_wait_value = tdata->wait_val[current_index];
kmp_uint64 next_wait_value =		kmp_uint64 next_wait_value =
(current_wait_value ? 0 : get_wait_val(tdata->num_active));		(current_wait_value ? 0 : get_wait_val(tdata->num_active));
KD_TRACE(10, ("core_barrier_impl::barrier(): T#%d current_index:%llu "		KD_TRACE(10, ("core_barrier_impl::barrier(): T#%d current_index:%llu "
"next_index:%llu curr_wait:%llu next_wait:%llu\n",		"next_index:%llu curr_wait:%llu next_wait:%llu\n",
__kmp_get_gtid(), current_index, next_index, current_wait_value,		__kmp_get_gtid(), current_index, next_index, current_wait_value,
next_wait_value));		next_wait_value));
char v = (current_wait_value ? '\1' : '\0');		char v = (current_wait_value ? '\1' : '\0');
(RCAST(volatile char *, &(bdata->val[current_index])))[id] = v;		(RCAST(volatile char *, &(bdata->val[current_index])))[id] = v;
__kmp_wait<kmp_uint64>(&(bdata->val[current_index]), current_wait_value,		__kmp_wait<kmp_uint64>(&(bdata->val[current_index]), current_wait_value,
__kmp_eq<kmp_uint64> USE_ITT_BUILD_ARG(NULL));		__kmp_eq<kmp_uint64> USE_ITT_BUILD_ARG(NULL));
tdata->wait_val[current_index] = next_wait_value;		tdata->wait_val[current_index] = next_wait_value;
tdata->index = next_index;		tdata->index = next_index;
}		}

// Counter barrier implementation		// Counter barrier implementation
// Can be used in a unit with arbitrary number of active threads		// Can be used in a unit with arbitrary number of active threads
template <typename T> class counter_barrier_impl {		template <typename T> class counter_barrier_impl {
public:		public:
static void reset_private(kmp_int32 num_active,		static void reset_private(kmp_int32 num_active,
kmp_hier_private_bdata_t *tdata);		kmp_hier_private_bdata_t *tdata);
static void reset_shared(kmp_int32 num_active,		static void reset_shared(kmp_int32 num_active,
kmp_hier_shared_bdata_t<T> *bdata);		kmp_hier_shared_bdata_t<T> *bdata);
static void barrier(kmp_int32 id, kmp_hier_shared_bdata_t<T> *bdata,		static void barrier(kmp_int32 id, kmp_hier_shared_bdata_t<T> *bdata,
kmp_hier_private_bdata_t *tdata);		kmp_hier_private_bdata_t *tdata);
};		};

template <typename T>		template <typename T>
void counter_barrier_impl<T>::reset_private(kmp_int32 num_active,		void counter_barrier_impl<T>::reset_private(kmp_int32 num_active,
kmp_hier_private_bdata_t *tdata) {		kmp_hier_private_bdata_t *tdata) {
tdata->num_active = num_active;		tdata->num_active = num_active;
tdata->index = 0;		tdata->index = 0;
tdata->wait_val[0] = tdata->wait_val[1] = (kmp_uint64)num_active;		tdata->wait_val[0] = tdata->wait_val[1] = (kmp_uint64)num_active;
}		}
template <typename T>		template <typename T>
void counter_barrier_impl<T>::reset_shared(kmp_int32 num_active,		void counter_barrier_impl<T>::reset_shared(kmp_int32 num_active,
kmp_hier_shared_bdata_t<T> *bdata) {		kmp_hier_shared_bdata_t<T> *bdata) {
bdata->val[0] = bdata->val[1] = 0LL;		bdata->val[0] = bdata->val[1] = 0LL;
bdata->status[0] = bdata->status[1] = 0LL;		bdata->status[0] = bdata->status[1] = 0LL;
}		}
template <typename T>		template <typename T>
void counter_barrier_impl<T>::barrier(kmp_int32 id,		void counter_barrier_impl<T>::barrier(kmp_int32 id,
kmp_hier_shared_bdata_t<T> *bdata,		kmp_hier_shared_bdata_t<T> *bdata,
kmp_hier_private_bdata_t *tdata) {		kmp_hier_private_bdata_t *tdata) {
volatile kmp_int64 *val;		volatile kmp_int64 *val;
kmp_uint64 current_index = tdata->index;		kmp_uint64 current_index = tdata->index;
kmp_uint64 next_index = 1 - current_index;		kmp_uint64 next_index = 1 - current_index;
kmp_uint64 current_wait_value = tdata->wait_val[current_index];		kmp_uint64 current_wait_value = tdata->wait_val[current_index];
kmp_uint64 next_wait_value = current_wait_value + tdata->num_active;		kmp_uint64 next_wait_value = current_wait_value + tdata->num_active;

KD_TRACE(10, ("counter_barrier_impl::barrier(): T#%d current_index:%llu "		KD_TRACE(10, ("counter_barrier_impl::barrier(): T#%d current_index:%llu "
"next_index:%llu curr_wait:%llu next_wait:%llu\n",		"next_index:%llu curr_wait:%llu next_wait:%llu\n",
[25 lines not shown]	if (active == 1)
return;		return;
hier_barrier.zero();		hier_barrier.zero();
if (active >= 2 && active <= 8) {		if (active >= 2 && active <= 8) {
core_barrier_impl<T>::reset_shared(active, &hier_barrier);		core_barrier_impl<T>::reset_shared(active, &hier_barrier);
} else {		} else {
counter_barrier_impl<T>::reset_shared(active, &hier_barrier);		counter_barrier_impl<T>::reset_shared(active, &hier_barrier);
}		}
}		}
void reset_private_barrier(kmp_hier_private_bdata_t *tdata) {		void reset_private_barrier(kmp_hier_private_bdata_t *tdata) {
KMP_DEBUG_ASSERT(tdata);		KMP_DEBUG_ASSERT(tdata);
KMP_DEBUG_ASSERT(active > 0);		KMP_DEBUG_ASSERT(active > 0);
if (active == 1)		if (active == 1)
return;		return;
if (active >= 2 && active <= 8) {		if (active >= 2 && active <= 8) {
core_barrier_impl<T>::reset_private(active, tdata);		core_barrier_impl<T>::reset_private(active, tdata);
} else {		} else {
counter_barrier_impl<T>::reset_private(active, tdata);		counter_barrier_impl<T>::reset_private(active, tdata);
}		}
}		}
void barrier(kmp_int32 id, kmp_hier_private_bdata_t *tdata) {		void barrier(kmp_int32 id, kmp_hier_private_bdata_t *tdata) {
KMP_DEBUG_ASSERT(tdata);		KMP_DEBUG_ASSERT(tdata);
KMP_DEBUG_ASSERT(active > 0);		KMP_DEBUG_ASSERT(active > 0);
KMP_DEBUG_ASSERT(id >= 0 && id < active);		KMP_DEBUG_ASSERT(id >= 0 && id < active);
if (active == 1) {		if (active == 1) {
tdata->index = 1 - tdata->index;		tdata->index = 1 - tdata->index;
return;		return;
}		}
if (active >= 2 && active <= 8) {		if (active >= 2 && active <= 8) {
[104 lines not shown]	private:
int next_recurse(ident_t *loc, int gtid, kmp_hier_top_unit_t<T> *current,		int next_recurse(ident_t *loc, int gtid, kmp_hier_top_unit_t<T> *current,
kmp_int32 *p_last, T *p_lb, T *p_ub, ST *p_st,		kmp_int32 *p_last, T *p_lb, T *p_ub, ST *p_st,
kmp_int32 previous_id, int hier_level) {		kmp_int32 previous_id, int hier_level) {
int status;		int status;
kmp_info_t *th = __kmp_threads[gtid];		kmp_info_t *th = __kmp_threads[gtid];
auto parent = current->get_parent();		auto parent = current->get_parent();
bool last_layer = (hier_level == get_num_layers() - 1);		bool last_layer = (hier_level == get_num_layers() - 1);
KMP_DEBUG_ASSERT(th);		KMP_DEBUG_ASSERT(th);
kmp_hier_private_bdata_t *tdata = &(th->th.th_hier_bar_data[hier_level]);		kmp_hier_private_bdata_t *tdata = &(th->th.th_hier_bar_data[hier_level]);
		Lint (pre-merge checks, clang-tidy): error: unknown type name 'kmp_hier_private_bdata_t'; error: use of undeclared identifier 'kmp_hier_private_bdata_t'; did you mean 'kmp_hier_shared_bdata_t'?; error: no member named 'th_hier_bar_data' in 'kmp_base_info' [clang-diagnostic-error]
KMP_DEBUG_ASSERT(current);		KMP_DEBUG_ASSERT(current);
KMP_DEBUG_ASSERT(hier_level >= 0);		KMP_DEBUG_ASSERT(hier_level >= 0);
KMP_DEBUG_ASSERT(hier_level < get_num_layers());		KMP_DEBUG_ASSERT(hier_level < get_num_layers());
KMP_DEBUG_ASSERT(tdata);		KMP_DEBUG_ASSERT(tdata);
KMP_DEBUG_ASSERT(parent || last_layer);		KMP_DEBUG_ASSERT(parent || last_layer);

KD_TRACE(		KD_TRACE(
1, ("kmp_hier_t.next_recurse(): T#%d (%d) called\n", gtid, hier_level));		1, ("kmp_hier_t.next_recurse(): T#%d (%d) called\n", gtid, hier_level));
[19 lines not shown]	if (previous_id == 0) {
th->th.th_dispatch->th_dispatch_sh_current);		th->th.th_dispatch->th_dispatch_sh_current);
nproc = (T)get_top_level_nproc();		nproc = (T)get_top_level_nproc();
} else {		} else {
// middle layers use the shared buffer inside the kmp_hier_top_unit_t		// middle layers use the shared buffer inside the kmp_hier_top_unit_t
// structure		// structure
KD_TRACE(10, ("kmp_hier_t.next_recurse(): T#%d (%d) using hier sh\n",		KD_TRACE(10, ("kmp_hier_t.next_recurse(): T#%d (%d) using hier sh\n",
gtid, hier_level));		gtid, hier_level));
my_sh =		my_sh =
parent->get_curr_sh(th->th.th_hier_bar_data[hier_level + 1].index);		parent->get_curr_sh(th->th.th_hier_bar_data[hier_level + 1].index);
nproc = (T)parent->get_num_active();		nproc = (T)parent->get_num_active();
}		}
my_pr = current->get_my_pr();		my_pr = current->get_my_pr();
KMP_DEBUG_ASSERT(my_sh);		KMP_DEBUG_ASSERT(my_sh);
KMP_DEBUG_ASSERT(my_pr);		KMP_DEBUG_ASSERT(my_pr);
enum sched_type schedule = get_sched(hier_level);		enum sched_type schedule = get_sched(hier_level);
ST chunk = (ST)get_chunk(hier_level);		ST chunk = (ST)get_chunk(hier_level);
status = __kmp_dispatch_next_algorithm<T>(gtid, my_pr, my_sh,		status = __kmp_dispatch_next_algorithm<T>(gtid, my_pr, my_sh,
[10 lines not shown]	if (previous_id == 0) {
__kmp_type_convert(hier_id, &hid);		__kmp_type_convert(hier_id, &hid);
status = next_recurse(loc, gtid, parent, &contains_last, &my_lb, &my_ub,		status = next_recurse(loc, gtid, parent, &contains_last, &my_lb, &my_ub,
&my_st, hid, hier_level + 1);		&my_st, hid, hier_level + 1);
KD_TRACE(		KD_TRACE(
10,		10,
("kmp_hier_t.next_recurse(): T#%d (%d) hier_next() returned %d\n",		("kmp_hier_t.next_recurse(): T#%d (%d) hier_next() returned %d\n",
gtid, hier_level, status));		gtid, hier_level, status));
if (status == 1) {		if (status == 1) {
kmp_hier_private_bdata_t *upper_tdata =		kmp_hier_private_bdata_t *upper_tdata =
&(th->th.th_hier_bar_data[hier_level + 1]);		&(th->th.th_hier_bar_data[hier_level + 1]);
my_sh = parent->get_curr_sh(upper_tdata->index);		my_sh = parent->get_curr_sh(upper_tdata->index);
KD_TRACE(10, ("kmp_hier_t.next_recurse(): T#%d (%d) about to init\n",		KD_TRACE(10, ("kmp_hier_t.next_recurse(): T#%d (%d) about to init\n",
gtid, hier_level));		gtid, hier_level));
__kmp_dispatch_init_algorithm(loc, gtid, my_pr, schedule,		__kmp_dispatch_init_algorithm(loc, gtid, my_pr, schedule,
parent->get_curr_lb(upper_tdata->index),		parent->get_curr_lb(upper_tdata->index),
parent->get_curr_ub(upper_tdata->index),		parent->get_curr_ub(upper_tdata->index),
parent->get_curr_st(upper_tdata->index),		parent->get_curr_st(upper_tdata->index),
#if USE_ITT_BUILD		#if USE_ITT_BUILD
[360 lines not shown]
template <typename T>		template <typename T>
void __kmp_dispatch_init_hierarchy(ident_t *loc, int n,		void __kmp_dispatch_init_hierarchy(ident_t *loc, int n,
kmp_hier_layer_e *new_layers,		kmp_hier_layer_e *new_layers,
enum sched_type *new_scheds,		enum sched_type *new_scheds,
typename traits_t<T>::signed_t *new_chunks,		typename traits_t<T>::signed_t *new_chunks,
T lb, T ub,		T lb, T ub,
typename traits_t<T>::signed_t st) {		typename traits_t<T>::signed_t st) {
int tid, gtid, num_hw_threads, num_threads_per_layer1, active;		int tid, gtid, num_hw_threads, num_threads_per_layer1, active;
int my_buffer_index;		unsigned int my_buffer_index;
kmp_info_t *th;		kmp_info_t *th;
kmp_team_t *team;		kmp_team_t *team;
dispatch_private_info_template<T> *pr;		dispatch_private_info_template<T> *pr;
dispatch_shared_info_template<T> volatile *sh;		dispatch_shared_info_template<T> volatile *sh;
gtid = __kmp_entry_gtid();		gtid = __kmp_entry_gtid();
tid = __kmp_tid_from_gtid(gtid);		tid = __kmp_tid_from_gtid(gtid);
#ifdef KMP_DEBUG		#ifdef KMP_DEBUG
KD_TRACE(10, ("__kmp_dispatch_init_hierarchy: T#%d called: %d layer(s)\n",		KD_TRACE(10, ("__kmp_dispatch_init_hierarchy: T#%d called: %d layer(s)\n",
[177 lines not shown]
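For readers unfamiliar with `core_barrier_impl<T>::barrier` shown earlier in this file, here is a single-threaded sketch of the byte-per-thread scheme it appears to use. It is an illustration under assumptions, not the runtime's implementation. `wait_val_for` guesses that `get_wait_val` returns a word whose low `num_active` bytes are each 0x01. The struct and function names are hypothetical. The real code writes its byte through a `volatile char *` and then spins in `__kmp_wait`; the sketch uses shifts and omits the spin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced forms of the private and shared barrier data. */
typedef struct {
  int num_active;
  int index;            /* which of the two slots is used next */
  uint64_t wait_val[2]; /* value this thread waits for in each slot */
} private_bdata_t;

typedef struct {
  uint64_t val[2]; /* one byte per thread in each slot */
} shared_bdata_t;

/* Assumed shape of get_wait_val: low num_active bytes are each 0x01. */
uint64_t wait_val_for(int num_active) {
  assert(num_active >= 2 && num_active <= 8); /* one byte per thread */
  uint64_t v = 0;
  for (int i = 0; i < num_active; ++i)
    v |= (uint64_t)1 << (8 * i);
  return v;
}

void reset_private(private_bdata_t *td, int num_active) {
  td->num_active = num_active;
  td->index = 0;
  td->wait_val[0] = td->wait_val[1] = wait_val_for(num_active);
}

/* Thread `id` checks in: it sets its byte to 1 when waiting for the full
   pattern, or clears it when waiting for zero, then flips to the other slot.
   A real barrier would spin until sh->val[cur] == cur_wait before flipping. */
void arrive(shared_bdata_t *sh, private_bdata_t *td, int id) {
  int cur = td->index;
  uint64_t cur_wait = td->wait_val[cur];
  uint64_t mask = (uint64_t)0xFF << (8 * id);
  uint64_t byte = (uint64_t)(cur_wait ? 1 : 0) << (8 * id);
  sh->val[cur] = (sh->val[cur] & ~mask) | byte;
  td->wait_val[cur] = cur_wait ? 0 : wait_val_for(td->num_active);
  td->index = 1 - cur;
}
```

Because a 64-bit word holds only eight such bytes, this scheme cannot serve more than eight threads, which matches the `active >= 2 && active <= 8` checks in the diff; larger units fall back to `counter_barrier_impl`.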

openmp/runtime/src/kmp_settings.cpp

[4,016 lines not shown]	#endif // KMP_USE_HIER_SCHED
// AC: TODO: probably remove TRAPEZOIDAL (OMP 3.0 does not allow it)		// AC: TODO: probably remove TRAPEZOIDAL (OMP 3.0 does not allow it)
else if (!__kmp_strcasecmp_with_sentinel("auto", ptr, *delim))		else if (!__kmp_strcasecmp_with_sentinel("auto", ptr, *delim))
sched = kmp_sch_auto;		sched = kmp_sch_auto;
else if (!__kmp_strcasecmp_with_sentinel("trapezoidal", ptr, *delim))		else if (!__kmp_strcasecmp_with_sentinel("trapezoidal", ptr, *delim))
sched = kmp_sch_trapezoidal;		sched = kmp_sch_trapezoidal;
else if (!__kmp_strcasecmp_with_sentinel("static", ptr, *delim))		else if (!__kmp_strcasecmp_with_sentinel("static", ptr, *delim))
sched = kmp_sch_static;		sched = kmp_sch_static;
#if KMP_STATIC_STEAL_ENABLED		#if KMP_STATIC_STEAL_ENABLED
else if (!__kmp_strcasecmp_with_sentinel("static_steal", ptr, *delim))		else if (!__kmp_strcasecmp_with_sentinel("static_steal", ptr, *delim)) {
sched = kmp_sch_static_steal;		// replace static_steal with dynamic to better cope with ordered loops
		sched = kmp_sch_dynamic_chunked;
		sched_modifier = sched_type::kmp_sch_modifier_nonmonotonic;
		}
#endif		#endif
else {		else {
// If there is no proper schedule kind, then this schedule is invalid		// If there is no proper schedule kind, then this schedule is invalid
KMP_WARNING(StgInvalidValue, name, value);		KMP_WARNING(StgInvalidValue, name, value);
__kmp_omp_schedule_restore();		__kmp_omp_schedule_restore();
return NULL;		return NULL;
}		}

[2,218 lines not shown]
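The effect of the `static_steal` change in the hunk above is that the schedule name is still accepted but is translated into dynamic scheduling with the nonmonotonic modifier. A minimal sketch of that mapping follows. The enum, struct and function names are hypothetical stand-ins, and chunk parsing and the sentinel handling of `__kmp_strcasecmp_with_sentinel` are left out.

```c
#include <ctype.h>

/* Hypothetical stand-ins for the runtime's schedule kinds. */
typedef enum {
  SCHED_INVALID,
  SCHED_STATIC,
  SCHED_DYNAMIC,
  SCHED_GUIDED,
  SCHED_AUTO,
  SCHED_TRAPEZOIDAL
} sched_kind_t;

typedef struct {
  sched_kind_t kind;
  int nonmonotonic; /* 1 if the nonmonotonic modifier is implied */
} sched_choice_t;

/* Case-insensitive comparison of two whole strings. */
static int equals_ignore_case(const char *a, const char *b) {
  while (*a && *b) {
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return 0;
    ++a;
    ++b;
  }
  return *a == '\0' && *b == '\0';
}

/* Map a schedule name to a kind; static_steal becomes nonmonotonic dynamic. */
sched_choice_t parse_schedule_kind(const char *name) {
  sched_choice_t r = {SCHED_INVALID, 0};
  if (equals_ignore_case(name, "static")) {
    r.kind = SCHED_STATIC;
  } else if (equals_ignore_case(name, "dynamic")) {
    r.kind = SCHED_DYNAMIC;
  } else if (equals_ignore_case(name, "guided")) {
    r.kind = SCHED_GUIDED;
  } else if (equals_ignore_case(name, "auto")) {
    r.kind = SCHED_AUTO;
  } else if (equals_ignore_case(name, "trapezoidal")) {
    r.kind = SCHED_TRAPEZOIDAL;
  } else if (equals_ignore_case(name, "static_steal")) {
    r.kind = SCHED_DYNAMIC; /* replaced, as in the patch */
    r.nonmonotonic = 1;
  }
  return r;
}
```

This is consistent with the updated RUN lines in omp_for_schedule_runtime.c, where `OMP_SCHEDULE=static_steal` is now expected to report kind 2 instead of 102.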

openmp/runtime/test/env/kmp_set_dispatch_buf.c

// RUN: %libomp-compile		// RUN: %libomp-compile
// RUN: env KMP_DISP_NUM_BUFFERS=0 %libomp-run		// RUN: env KMP_DISP_NUM_BUFFERS=0 %libomp-run
// RUN: env KMP_DISP_NUM_BUFFERS=1 %libomp-run		// RUN: env KMP_DISP_NUM_BUFFERS=1 %libomp-run
// RUN: env KMP_DISP_NUM_BUFFERS=3 %libomp-run		// RUN: env KMP_DISP_NUM_BUFFERS=3 %libomp-run
// RUN: env KMP_DISP_NUM_BUFFERS=4 %libomp-run		// RUN: env KMP_DISP_NUM_BUFFERS=4 %libomp-run
// RUN: env KMP_DISP_NUM_BUFFERS=7 %libomp-run		// RUN: env KMP_DISP_NUM_BUFFERS=7 %libomp-run
// RUN: %libomp-compile -DMY_SCHEDULE=guided		// RUN: %libomp-compile -DMY_SCHEDULE=guided
// RUN: env KMP_DISP_NUM_BUFFERS=1 %libomp-run		// RUN: env KMP_DISP_NUM_BUFFERS=1 %libomp-run
// RUN: env KMP_DISP_NUM_BUFFERS=3 %libomp-run		// RUN: env KMP_DISP_NUM_BUFFERS=3 %libomp-run
// RUN: env KMP_DISP_NUM_BUFFERS=4 %libomp-run		// RUN: env KMP_DISP_NUM_BUFFERS=4 %libomp-run
// RUN: env KMP_DISP_NUM_BUFFERS=7 %libomp-run		// RUN: env KMP_DISP_NUM_BUFFERS=7 %libomp-run
// UNSUPPORTED: clang-11, clang-12, clang-13		// UNSUPPORTED: clang-11, clang-12
#include <stdio.h>		#include <stdio.h>
#include <omp.h>		#include <omp.h>
#include <stdlib.h>		#include <stdlib.h>
#include <limits.h>		#include <limits.h>
#include "omp_testsuite.h"		#include "omp_testsuite.h"

#define INCR 7		#define INCR 7
#define MY_MAX 200		#define MY_MAX 200
[52 lines not shown]	for (i = MY_MAX; i >= MY_MIN; i-=INCR)
b_known_value++;		b_known_value++;
}		}

for(i = 0; i < REPETITIONS; i++) {		for(i = 0; i < REPETITIONS; i++) {
if(!test_kmp_set_disp_num_buffers()) {		if(!test_kmp_set_disp_num_buffers()) {
num_failed++;		num_failed++;
}		}
}		}
		if (num_failed == 0)
		printf("passed\n");
		else
		printf("failed %d\n", num_failed);
return num_failed;		return num_failed;
}		}

openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c

// RUN: %libomp-compile && %libomp-run 7		// RUN: %libomp-compile && %libomp-run 7
// RUN: %libomp-run 0 && %libomp-run -1		// RUN: %libomp-run 0 && %libomp-run -1
// RUN: %libomp-run 1 && %libomp-run 2 && %libomp-run 5		// RUN: %libomp-run 1 && %libomp-run 2 && %libomp-run 5
// RUN: %libomp-compile -DMY_SCHEDULE=guided && %libomp-run 7		// RUN: %libomp-compile -DMY_SCHEDULE=guided && %libomp-run 7
// RUN: %libomp-run 1 && %libomp-run 2 && %libomp-run 5		// RUN: %libomp-run 1 && %libomp-run 2 && %libomp-run 5
// UNSUPPORTED: clang-11, clang-12, clang-13		// UNSUPPORTED: clang-11, clang-12
#include <stdio.h>		#include <stdio.h>
#include <omp.h>		#include <omp.h>
#include <stdlib.h>		#include <stdlib.h>
#include <limits.h>		#include <limits.h>
#include "omp_testsuite.h"		#include "omp_testsuite.h"

#define INCR 7		#define INCR 7
#define MY_MAX 200		#define MY_MAX 200
[68 lines not shown]	for (i = MY_MAX; i >= MY_MIN; i-=INCR)
b_known_value++;		b_known_value++;
}		}

for(i = 0; i < REPETITIONS; i++) {		for(i = 0; i < REPETITIONS; i++) {
if(!test_kmp_set_disp_num_buffers()) {		if(!test_kmp_set_disp_num_buffers()) {
num_failed++;		num_failed++;
}		}
}		}
		if (num_failed == 0)
		printf("passed\n");
		else
		printf("failed %d\n", num_failed);
return num_failed;		return num_failed;
}		}

openmp/runtime/test/worksharing/for/omp_for_schedule_runtime.c

	// RUN: %libomp-compile			// RUN: %libomp-compile
	// RUN: env OMP_SCHEDULE=static %libomp-run 1 0			// RUN: env OMP_SCHEDULE=static %libomp-run 1 0
	// RUN: env OMP_SCHEDULE=static,10 %libomp-run 1 10			// RUN: env OMP_SCHEDULE=static,10 %libomp-run 1 10
	// RUN: env OMP_SCHEDULE=dynamic %libomp-run 2 1			// RUN: env OMP_SCHEDULE=dynamic %libomp-run 2 1
	// RUN: env OMP_SCHEDULE=dynamic,11 %libomp-run 2 11			// RUN: env OMP_SCHEDULE=dynamic,11 %libomp-run 2 11
	// RUN: env OMP_SCHEDULE=guided %libomp-run 3 1			// RUN: env OMP_SCHEDULE=guided %libomp-run 3 1
	// RUN: env OMP_SCHEDULE=guided,12 %libomp-run 3 12			// RUN: env OMP_SCHEDULE=guided,12 %libomp-run 3 12
	// RUN: env OMP_SCHEDULE=auto %libomp-run 4 1			// RUN: env OMP_SCHEDULE=auto %libomp-run 4 1
	// RUN: env OMP_SCHEDULE=trapezoidal %libomp-run 101 1			// RUN: env OMP_SCHEDULE=trapezoidal %libomp-run 101 1
	// RUN: env OMP_SCHEDULE=trapezoidal,13 %libomp-run 101 13			// RUN: env OMP_SCHEDULE=trapezoidal,13 %libomp-run 101 13
	// RUN: env OMP_SCHEDULE=static_steal %libomp-run 102 1			// RUN: env OMP_SCHEDULE=static_steal %libomp-run 2 1
	// RUN: env OMP_SCHEDULE=static_steal,14 %libomp-run 102 14			// RUN: env OMP_SCHEDULE=static_steal,14 %libomp-run 2 14

	#include <stdio.h>			#include <stdio.h>
	#include <stdlib.h>			#include <stdlib.h>
	#include <math.h>			#include <math.h>
	#include "omp_testsuite.h"			#include "omp_testsuite.h"

	int sum;			int sum;
	char* correct_kind_string;			char* correct_kind_string;
	[63 lines not shown]

openmp/runtime/test/worksharing/for/omp_par_in_loop.c

This file was added.

				// RUN: %libomp-compile-and-run
				//
				#include <stdlib.h>
				#include <stdio.h>
				#include <math.h>
				#include <omp.h>

				#define TYPE long
				#define MAX_ITER (TYPE)((TYPE)1000000)
				#define EVERY (TYPE)((TYPE)100000)

				int main(int argc, char* argv[]) {
				TYPE x = MAX_ITER;
				omp_set_max_active_levels(2);
				omp_set_num_threads(2);
				#pragma omp parallel for schedule(nonmonotonic:dynamic,1)
				for (TYPE i = 0; i < x; i++) {
				int tid = omp_get_thread_num();
				omp_set_num_threads(1);
				#pragma omp parallel proc_bind(spread)
				{
				if (i % EVERY == (TYPE)0)
				printf("Outer thread %d at iter %ld\n", tid, i);
				}
				}
				printf("passed\n");
				return 0;
				}