This is an archive of the discontinued LLVM Phabricator instance.


Differential D95819

[OpenMP] libomp cleanup: move fast allocation routines to kmp_tasking.cpp
Abandoned · Public

Authored by AndreyChurbanov on Feb 1 2021, 2:14 PM.


Details

Reviewers

hbae
jlpeyton
tlwilmar
Nawrin
jdoerfert

Summary

Move internal memory cache routines from kmp_alloc.cpp to kmp_tasking.cpp, where they are mostly used.
This can give the compiler more opportunities to optimize tasking code that allocates memory on a hot path.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

AndreyChurbanov created this revision. Feb 1 2021, 2:14 PM

Herald added subscribers: guansong, yaxunl. Feb 1 2021, 2:14 PM

AndreyChurbanov requested review of this revision. Feb 1 2021, 2:14 PM

Herald added a reviewer: jdoerfert. Feb 1 2021, 2:14 PM

Herald added subscribers: openmp-commits, sstefan1.

I don't know if this is the right direction. Placing code based on call profiles seems to break the idea of modularity. I mean, ___kmp_fast_allocate is now a "tasking" thing?
I didn't see a reply yet, what about LTO for the runtime?

Harbormaster completed remote builds in B87422: Diff 320596. Feb 1 2021, 3:21 PM

In D95819#2534991, @jdoerfert wrote:

I didn't see a reply yet, what about LTO for the runtime?

@jdoerfert, thanks for the hint. I ran some performance experiments on the SPEC OMP2012 376.kdtree test (which was the initial trigger for this patch). The results showed that the patch does give some speedup with the current library build, but hurts performance with an LTO build. Moreover, the performance gain disappears if I also apply the diff from https://reviews.llvm.org/D95816 (called patch2 in the data below).

Some numbers (run time in seconds), using the Intel 19 compiler + libomp on 2x Xeon Gold 6252 (48 cores, 48 threads used):

Build              Time (s)
trunk              401
trunk+patch        395
trunk+patch2       395
trunk-lto          385
trunk+patch-lto    387

I see a similar performance trend on the other platforms I have for testing.

So, given that the library built with LTO performs better and this patch hurts that build, I am abandoning it.
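To put these timings in proportion, the small sketch below converts them into relative differences. Only the four timings come from the comment above. The helper `percent_faster` and the constant names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Relative change of `candidate` versus `baseline`, in percent of `baseline`.
// Positive: candidate took less time (faster). Negative: it took more (slower).
double percent_faster(double baseline, double candidate) {
  return (baseline - candidate) / baseline * 100.0;
}

// 376.kdtree run times reported above, in seconds.
constexpr double kTrunk = 401.0;
constexpr double kTrunkPatch = 395.0;
constexpr double kTrunkLto = 385.0;
constexpr double kTrunkPatchLto = 387.0;
```

With these inputs, the patch alone is about 1.5% faster than trunk, an LTO build of trunk is about 4.0% faster, and the patch on top of LTO is about 0.5% slower than LTO alone.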

In D95819#2547482, @AndreyChurbanov wrote:

So, given that the library built with LTO performs better and this patch hurts that build, I am abandoning it.

Very interesting numbers!

Should we enable LTO for the runtime build by default (assuming the compiler + linker combo allows it)? I doubt the compile-time hit is too bad, and the performance win shown would certainly be worth it.

Revision Contents

Path                                  Size
openmp/runtime/src/kmp.h              24 lines
openmp/runtime/src/kmp_alloc.cpp      233 lines
openmp/runtime/src/kmp_tasking.cpp    220 lines

Diff 320596

openmp/runtime/src/kmp.h

[...]
	} kmp_base_task_team_t;			} kmp_base_task_team_t;

	union KMP_ALIGN_CACHE kmp_task_team {			union KMP_ALIGN_CACHE kmp_task_team {
	kmp_base_task_team_t tt;			kmp_base_task_team_t tt;
	double tt_align; /* use worst case alignment */			double tt_align; /* use worst case alignment */
	char tt_pad[KMP_PAD(kmp_base_task_team_t, CACHE_LINE)];			char tt_pad[KMP_PAD(kmp_base_task_team_t, CACHE_LINE)];
	};			};

				// Declarations for custom fast memory allocator
				// NOTE: bufsize must be a signed datatype
				#if KMP_OS_WINDOWS
				#if KMP_ARCH_X86 || KMP_ARCH_ARM
				typedef kmp_int32 bufsize;
				#else
				typedef kmp_int64 bufsize;
				#endif
				#else
				typedef ssize_t bufsize;
				#endif // KMP_OS_WINDOWS

				typedef struct kmp_mem_descr { // Memory block descriptor.
				void *ptr_allocated; // Pointer returned by malloc(), subject for free().
				size_t size_allocated; // Size of allocated memory block.
				void *ptr_aligned; // Pointer to aligned memory, to be used by client code.
				size_t size_aligned; // Size of aligned memory block.
				} kmp_mem_descr_t;

	#if (USE_FAST_MEMORY == 3) || (USE_FAST_MEMORY == 5)			#if (USE_FAST_MEMORY == 3) || (USE_FAST_MEMORY == 5)
	// Free lists keep same-size free memory slots for fast memory allocation			// Free lists keep same-size free memory slots for fast memory allocation
	// routines			// routines
	typedef struct kmp_free_list {			typedef struct kmp_free_list {
	void *th_free_list_self; // Self-allocated tasks free list			void *th_free_list_self; // Self-allocated tasks free list
	void *th_free_list_sync; // Self-allocated tasks stolen/returned by other			void *th_free_list_sync; // Self-allocated tasks stolen/returned by other
	// threads			// threads
	void *th_free_list_other; // Non-self free list (to be returned to owner's			void *th_free_list_other; // Non-self free list (to be returned to owner's
[...]
	} kmp_base_info_t;			} kmp_base_info_t;

	typedef union KMP_ALIGN_CACHE kmp_info {			typedef union KMP_ALIGN_CACHE kmp_info {
	double th_align; /* use worst case alignment */			double th_align; /* use worst case alignment */
	char th_pad[KMP_PAD(kmp_base_info_t, CACHE_LINE)];			char th_pad[KMP_PAD(kmp_base_info_t, CACHE_LINE)];
	kmp_base_info_t th;			kmp_base_info_t th;
	} kmp_info_t;			} kmp_info_t;

				// some memory allocation wrappers declaration
				extern void *kmp_b_alloc(kmp_info_t *th, bufsize s);
				extern void kmp_b_free(kmp_info_t *th, void *buf);
				extern void kmp_b_dequeue(kmp_info_t *th);

	// OpenMP thread team data structures			// OpenMP thread team data structures

	typedef struct kmp_base_data { volatile kmp_uint32 t_value; } kmp_base_data_t;			typedef struct kmp_base_data { volatile kmp_uint32 t_value; } kmp_base_data_t;

	typedef union KMP_ALIGN_CACHE kmp_sleep_team {			typedef union KMP_ALIGN_CACHE kmp_sleep_team {
	double dt_align; /* use worst case alignment */			double dt_align; /* use worst case alignment */
	char dt_pad[KMP_PAD(kmp_base_data_t, CACHE_LINE)];			char dt_pad[KMP_PAD(kmp_base_data_t, CACHE_LINE)];
	kmp_base_data_t dt;			kmp_base_data_t dt;
[...]
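In the fast allocator, the `kmp_mem_descr_t` block descriptor sits immediately before the aligned pointer that is returned to the caller. This lets `___kmp_fast_free` recover it by subtracting `sizeof(kmp_mem_descr_t)`. The sketch below reproduces that layout and uses the same rounding expression as `___kmp_fast_allocate`. It assumes `std::malloc`/`std::free` in place of libomp's `bget`/`brel`. The names `mem_descr`, `aligned_alloc_with_descr`, `descr_of` and `free_with_descr` are hypothetical, and `kLine` stands in for `DCACHE_LINE` (128).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Same four fields as kmp_mem_descr_t in the diff.
struct mem_descr {
  void *ptr_allocated;        // pointer returned by the underlying allocator
  std::size_t size_allocated; // total raw size (the diff's fast path leaves
                              // this unset and later reuses it as a list length)
  void *ptr_aligned;          // the fast allocator stores the owning thread here
  std::size_t size_aligned;   // usable size of the aligned block
};

constexpr std::uintptr_t kLine = 128; // hypothetical stand-in for DCACHE_LINE

// Hypothetical helper: carve a kLine-aligned user block out of a raw buffer and
// store the descriptor directly in front of it.
void *aligned_alloc_with_descr(std::size_t size, void *owner) {
  std::size_t alloc_size = size + sizeof(mem_descr) + kLine;
  void *raw = std::malloc(alloc_size);
  if (raw == nullptr)
    return nullptr;
  // Same rounding as ___kmp_fast_allocate: skip past the descriptor, then
  // round down to a kLine boundary (this always stays past the descriptor).
  std::uintptr_t user =
      (reinterpret_cast<std::uintptr_t>(raw) + sizeof(mem_descr) + kLine) &
      ~(kLine - 1);
  auto *descr = reinterpret_cast<mem_descr *>(user - sizeof(mem_descr));
  descr->ptr_allocated = raw;
  descr->size_allocated = alloc_size;
  descr->ptr_aligned = owner;
  descr->size_aligned = size;
  return reinterpret_cast<void *>(user);
}

// Recover the descriptor from the user pointer alone.
mem_descr *descr_of(void *user) {
  return reinterpret_cast<mem_descr *>(reinterpret_cast<std::uintptr_t>(user) -
                                       sizeof(mem_descr));
}

void free_with_descr(void *user) { std::free(descr_of(user)->ptr_allocated); }
```

The rounding starts from `raw + sizeof(mem_descr) + kLine`. As a result, the aligned pointer always leaves room for the descriptor in front of it and for `size` bytes after it, so no extra bookkeeping is needed to find either one.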

openmp/runtime/src/kmp_alloc.cpp

[...]
#if KMP_USE_BGET		#if KMP_USE_BGET

/* Thread private buffer management code */		/* Thread private buffer management code */

typedef int (*bget_compact_t)(size_t, int);		typedef int (*bget_compact_t)(size_t, int);
typedef void *(*bget_acquire_t)(size_t);		typedef void *(*bget_acquire_t)(size_t);
typedef void (*bget_release_t)(void *);		typedef void (*bget_release_t)(void *);

/* NOTE: bufsize must be a signed datatype */

#if KMP_OS_WINDOWS
#if KMP_ARCH_X86 || KMP_ARCH_ARM
typedef kmp_int32 bufsize;
#else
typedef kmp_int64 bufsize;
#endif
#else
typedef ssize_t bufsize;
#endif // KMP_OS_WINDOWS

/* The three modes of operation are, fifo search, lifo search, and best-fit */		/* The three modes of operation are, fifo search, lifo search, and best-fit */

typedef enum bget_mode {		typedef enum bget_mode {
bget_mode_fifo = 0,		bget_mode_fifo = 0,
bget_mode_lifo = 1,		bget_mode_lifo = 1,
bget_mode_best = 2		bget_mode_best = 2
} bget_mode_t;		} bget_mode_t;

[...]	void __kmpc_free(int gtid, void *ptr, const omp_allocator_handle_t allocator) {
KE_TRACE(10, ("__kmpc_free: T#%d freed %p (%p)\n", gtid, desc.ptr_alloc,		KE_TRACE(10, ("__kmpc_free: T#%d freed %p (%p)\n", gtid, desc.ptr_alloc,
allocator));		allocator));
}		}

/* If LEAK_MEMORY is defined, __kmp_free() will not free memory. It causes		/* If LEAK_MEMORY is defined, __kmp_free() will not free memory. It causes
memory leaks, but it may be useful for debugging memory corruptions, used		memory leaks, but it may be useful for debugging memory corruptions, used
freed pointers, etc. */		freed pointers, etc. */
/* #define LEAK_MEMORY */		/* #define LEAK_MEMORY */
struct kmp_mem_descr { // Memory block descriptor.
void *ptr_allocated; // Pointer returned by malloc(), subject for free().
size_t size_allocated; // Size of allocated memory block.
void *ptr_aligned; // Pointer to aligned memory, to be used by client code.
size_t size_aligned; // Size of aligned memory block.
};
typedef struct kmp_mem_descr kmp_mem_descr_t;

/* Allocate memory on requested boundary, fill allocated memory with 0x00.		/* Allocate memory on requested boundary, fill allocated memory with 0x00.
NULL is NEVER returned, __kmp_abort() is called in case of memory allocation		NULL is NEVER returned, __kmp_abort() is called in case of memory allocation
error. Must use __kmp_free when freeing memory allocated by this routine! */		error. Must use __kmp_free when freeing memory allocated by this routine! */
static void *___kmp_allocate_align(size_t size,		static void *___kmp_allocate_align(size_t size,
size_t alignment KMP_SRC_LOC_DECL) {		size_t alignment KMP_SRC_LOC_DECL) {
/* __kmp_allocate() allocates (by call to malloc()) bigger memory block than		/* __kmp_allocate() allocates (by call to malloc()) bigger memory block than
requested to return properly aligned pointer. Original pointer returned		requested to return properly aligned pointer. Original pointer returned
[...]
#else		#else
free_src_loc(descr.ptr_allocated KMP_SRC_LOC_PARM);		free_src_loc(descr.ptr_allocated KMP_SRC_LOC_PARM);
#endif		#endif
#endif		#endif
KMP_MB();		KMP_MB();
KE_TRACE(25, ("<- __kmp_free() returns\n"));		KE_TRACE(25, ("<- __kmp_free() returns\n"));
} // func ___kmp_free		} // func ___kmp_free

#if USE_FAST_MEMORY == 3		// some memory allocation wrappers definition
// Allocate fast memory by first scanning the thread's free lists		void *kmp_b_alloc(kmp_info_t *th, bufsize s) { return bget(th, s); }
// If a chunk the right size exists, grab it off the free list.		void kmp_b_free(kmp_info_t *th, void *buf) { brel(th, buf); }
// Otherwise allocate normally using kmp_thread_malloc.		void kmp_b_dequeue(kmp_info_t *th) { __kmp_bget_dequeue(th); }

// AC: How to choose the limit? Just get 16 for now...
#define KMP_FREE_LIST_LIMIT 16

// Always use 128 bytes for determining buckets for caching memory blocks
#define DCACHE_LINE 128

void *___kmp_fast_allocate(kmp_info_t *this_thr, size_t size KMP_SRC_LOC_DECL) {		#if USE_FAST_MEMORY == 3
void *ptr;
size_t num_lines, idx;
int index;
void *alloc_ptr;
size_t alloc_size;
kmp_mem_descr_t *descr;

KE_TRACE(25, ("-> __kmp_fast_allocate( T#%d, %d ) called from %s:%d\n",
__kmp_gtid_from_thread(this_thr), (int)size KMP_SRC_LOC_PARM));

num_lines = (size + DCACHE_LINE - 1) / DCACHE_LINE;
idx = num_lines - 1;
KMP_DEBUG_ASSERT(idx >= 0);
if (idx < 2) {
index = 0; // idx is [ 0, 1 ], use first free list
num_lines = 2; // 1, 2 cache lines or less than cache line
} else if ((idx >>= 2) == 0) {
index = 1; // idx is [ 2, 3 ], use second free list
num_lines = 4; // 3, 4 cache lines
} else if ((idx >>= 2) == 0) {
index = 2; // idx is [ 4, 15 ], use third free list
num_lines = 16; // 5, 6, ..., 16 cache lines
} else if ((idx >>= 2) == 0) {
index = 3; // idx is [ 16, 63 ], use fourth free list
num_lines = 64; // 17, 18, ..., 64 cache lines
} else {
goto alloc_call; // 65 or more cache lines ( > 8KB ), don't use free lists
}

ptr = this_thr->th.th_free_lists[index].th_free_list_self;
if (ptr != NULL) {
// pop the head of no-sync free list
this_thr->th.th_free_lists[index].th_free_list_self = *((void **)ptr);
KMP_DEBUG_ASSERT(
this_thr ==
((kmp_mem_descr_t *)((kmp_uintptr_t)ptr - sizeof(kmp_mem_descr_t)))
->ptr_aligned);
goto end;
}
ptr = TCR_SYNC_PTR(this_thr->th.th_free_lists[index].th_free_list_sync);
if (ptr != NULL) {
// no-sync free list is empty, use sync free list (filled in by other
// threads only)
// pop the head of the sync free list, push NULL instead
while (!KMP_COMPARE_AND_STORE_PTR(
&this_thr->th.th_free_lists[index].th_free_list_sync, ptr, nullptr)) {
KMP_CPU_PAUSE();
ptr = TCR_SYNC_PTR(this_thr->th.th_free_lists[index].th_free_list_sync);
}
// push the rest of chain into no-sync free list (can be NULL if there was
// the only block)
this_thr->th.th_free_lists[index].th_free_list_self = *((void **)ptr);
KMP_DEBUG_ASSERT(
this_thr ==
((kmp_mem_descr_t *)((kmp_uintptr_t)ptr - sizeof(kmp_mem_descr_t)))
->ptr_aligned);
goto end;
}

alloc_call:
// haven't found block in the free lists, thus allocate it
size = num_lines * DCACHE_LINE;

alloc_size = size + sizeof(kmp_mem_descr_t) + DCACHE_LINE;
KE_TRACE(25, ("__kmp_fast_allocate: T#%d Calling __kmp_thread_malloc with "
"alloc_size %d\n",
__kmp_gtid_from_thread(this_thr), alloc_size));
alloc_ptr = bget(this_thr, (bufsize)alloc_size);

// align ptr to DCACHE_LINE
ptr = (void *)((((kmp_uintptr_t)alloc_ptr) + sizeof(kmp_mem_descr_t) +
DCACHE_LINE) &
~(DCACHE_LINE - 1));
descr = (kmp_mem_descr_t *)(((kmp_uintptr_t)ptr) - sizeof(kmp_mem_descr_t));

descr->ptr_allocated = alloc_ptr; // remember allocated pointer
// we don't need size_allocated
descr->ptr_aligned = (void *)this_thr; // remember allocating thread
// (it is already saved in bget buffer,
// but we may want to use another allocator in future)
descr->size_aligned = size;

end:
KE_TRACE(25, ("<- __kmp_fast_allocate( T#%d ) returns %p\n",
__kmp_gtid_from_thread(this_thr), ptr));
return ptr;
} // func __kmp_fast_allocate

// Free fast memory and place it on the thread's free list if it is of
// the correct size.
void ___kmp_fast_free(kmp_info_t *this_thr, void *ptr KMP_SRC_LOC_DECL) {
kmp_mem_descr_t *descr;
kmp_info_t *alloc_thr;
size_t size;
size_t idx;
int index;

KE_TRACE(25, ("-> __kmp_fast_free( T#%d, %p ) called from %s:%d\n",
__kmp_gtid_from_thread(this_thr), ptr KMP_SRC_LOC_PARM));
KMP_ASSERT(ptr != NULL);

descr = (kmp_mem_descr_t *)(((kmp_uintptr_t)ptr) - sizeof(kmp_mem_descr_t));

KE_TRACE(26, (" __kmp_fast_free: size_aligned=%d\n",
(int)descr->size_aligned));

size = descr->size_aligned; // 2, 4, 16, 64, 65, 66, ... cache lines

idx = DCACHE_LINE * 2; // 2 cache lines is minimal size of block
if (idx == size) {
index = 0; // 2 cache lines
} else if ((idx <<= 1) == size) {
index = 1; // 4 cache lines
} else if ((idx <<= 2) == size) {
index = 2; // 16 cache lines
} else if ((idx <<= 2) == size) {
index = 3; // 64 cache lines
} else {
KMP_DEBUG_ASSERT(size > DCACHE_LINE * 64);
goto free_call; // 65 or more cache lines ( > 8KB )
}

alloc_thr = (kmp_info_t *)descr->ptr_aligned; // get thread owning the block
if (alloc_thr == this_thr) {
// push block to self no-sync free list, linking previous head (LIFO)
*((void **)ptr) = this_thr->th.th_free_lists[index].th_free_list_self;
this_thr->th.th_free_lists[index].th_free_list_self = ptr;
} else {
void *head = this_thr->th.th_free_lists[index].th_free_list_other;
if (head == NULL) {
// Create new free list
this_thr->th.th_free_lists[index].th_free_list_other = ptr;
*((void **)ptr) = NULL; // mark the tail of the list
descr->size_allocated = (size_t)1; // head of the list keeps its length
} else {
// need to check existed "other" list's owner thread and size of queue
kmp_mem_descr_t *dsc =
(kmp_mem_descr_t *)((char *)head - sizeof(kmp_mem_descr_t));
// allocating thread, same for all queue nodes
kmp_info_t *q_th = (kmp_info_t *)(dsc->ptr_aligned);
size_t q_sz =
dsc->size_allocated + 1; // new size in case we add current task
if (q_th == alloc_thr && q_sz <= KMP_FREE_LIST_LIMIT) {
// we can add current task to "other" list, no sync needed
*((void **)ptr) = head;
descr->size_allocated = q_sz;
this_thr->th.th_free_lists[index].th_free_list_other = ptr;
} else {
// either queue blocks owner is changing or size limit exceeded
// return old queue to allocating thread (q_th) synchronously,
// and start new list for alloc_thr's tasks
void *old_ptr;
void *tail = head;
void *next = *((void **)head);
while (next != NULL) {
KMP_DEBUG_ASSERT(
// queue size should decrease by 1 each step through the list
((kmp_mem_descr_t *)((char *)next - sizeof(kmp_mem_descr_t)))
->size_allocated +
1 ==
((kmp_mem_descr_t *)((char *)tail - sizeof(kmp_mem_descr_t)))
->size_allocated);
tail = next; // remember tail node
next = *((void **)next);
}
KMP_DEBUG_ASSERT(q_th != NULL);
// push block to owner's sync free list
old_ptr = TCR_PTR(q_th->th.th_free_lists[index].th_free_list_sync);
/* the next pointer must be set before setting free_list to ptr to avoid
exposing a broken list to other threads, even for an instant. */
*((void **)tail) = old_ptr;

while (!KMP_COMPARE_AND_STORE_PTR(
&q_th->th.th_free_lists[index].th_free_list_sync, old_ptr, head)) {
KMP_CPU_PAUSE();
old_ptr = TCR_PTR(q_th->th.th_free_lists[index].th_free_list_sync);
*((void **)tail) = old_ptr;
}

// start new list of not-selt tasks
this_thr->th.th_free_lists[index].th_free_list_other = ptr;
*((void **)ptr) = NULL;
descr->size_allocated = (size_t)1; // head of queue keeps its length
}
}
}
goto end;

free_call:
KE_TRACE(25, ("__kmp_fast_free: T#%d Calling __kmp_thread_free for size %d\n",
__kmp_gtid_from_thread(this_thr), size));
__kmp_bget_dequeue(this_thr); /* Release any queued buffers */
brel(this_thr, descr->ptr_allocated);

end:
KE_TRACE(25, ("<- __kmp_fast_free() returns\n"));

} // func __kmp_fast_free

// Initialize the thread free lists related to fast memory		// Initialize the thread free lists related to fast memory
// Only do this when a thread is initially created.		// Only do this when a thread is initially created.
void __kmp_initialize_fast_memory(kmp_info_t *this_thr) {		void __kmp_initialize_fast_memory(kmp_info_t *this_thr) {
KE_TRACE(10, ("__kmp_initialize_fast_memory: Called from th %p\n", this_thr));		KE_TRACE(10, ("__kmp_initialize_fast_memory: Called from th %p\n", this_thr));

memset(this_thr->th.th_free_lists, 0, NUM_LISTS * sizeof(kmp_free_list_t));		memset(this_thr->th.th_free_lists, 0, NUM_LISTS * sizeof(kmp_free_list_t));
}		}
[...]
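`___kmp_fast_allocate` rounds a request up to 2, 4, 16 or 64 cache lines of `DCACHE_LINE` (128) bytes and picks one of four free lists. `___kmp_fast_free` later recovers the list index from `size_aligned` alone. The sketch below mirrors both computations so that the round trip can be checked. The names `classify_request`, `index_from_block_size` and `size_class` are hypothetical, and the sketch assumes `size > 0`.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kLine = 128; // hypothetical stand-in for DCACHE_LINE

struct size_class {
  int index;             // free-list index 0..3, or -1: too big, no free list
  std::size_t num_lines; // rounded block size in cache lines
};

// Mirrors the bucketing in ___kmp_fast_allocate. Precondition: size > 0
// (for size == 0, num_lines - 1 would wrap around in size_t).
size_class classify_request(std::size_t size) {
  std::size_t num_lines = (size + kLine - 1) / kLine;
  std::size_t idx = num_lines - 1;
  if (idx < 2)
    return {0, 2}; // 1-2 lines
  if ((idx >>= 2) == 0)
    return {1, 4}; // 3-4 lines
  if ((idx >>= 2) == 0)
    return {2, 16}; // 5-16 lines
  if ((idx >>= 2) == 0)
    return {3, 64}; // 17-64 lines
  return {-1, num_lines}; // 65+ lines: bypass the free lists
}

// Mirrors the index recovery in ___kmp_fast_free (size_aligned is in bytes).
int index_from_block_size(std::size_t size_aligned) {
  std::size_t idx = kLine * 2; // 2 lines is the smallest cached block
  if (idx == size_aligned)
    return 0;
  if ((idx <<= 1) == size_aligned)
    return 1; // 4 lines
  if ((idx <<= 2) == size_aligned)
    return 2; // 16 lines
  if ((idx <<= 2) == size_aligned)
    return 3; // 64 lines
  return -1;
}
```

The round trip holds because every cached block has exactly 2, 4, 16 or 64 lines. A block of 65 or more lines keeps its exact line count, matches none of the four sizes, and goes straight back to the underlying allocator.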

openmp/runtime/src/kmp_tasking.cpp

[...]	if (__kmp_enable_hidden_helper) {
auto &input_flags = reinterpret_cast<kmp_tasking_flags_t &>(flags);		auto &input_flags = reinterpret_cast<kmp_tasking_flags_t &>(flags);
input_flags.hidden_helper = TRUE;		input_flags.hidden_helper = TRUE;
}		}

return __kmpc_omp_task_alloc(loc_ref, gtid, flags, sizeof_kmp_task_t,		return __kmpc_omp_task_alloc(loc_ref, gtid, flags, sizeof_kmp_task_t,
sizeof_shareds, task_entry);		sizeof_shareds, task_entry);
}		}

		#if USE_FAST_MEMORY == 3
		// Allocate fast memory by first scanning the thread's free lists
		// If a chunk the right size exists, grab it off the free list.
		// Otherwise allocate normally using kmp_thread_malloc.

		// AC: How to choose the limit? Just get 16 for now...
		#define KMP_FREE_LIST_LIMIT 16

		// Always use 128 bytes for determining buckets for caching memory blocks
		#define DCACHE_LINE 128

		void *___kmp_fast_allocate(kmp_info_t *this_thr, size_t size KMP_SRC_LOC_DECL) {
		void *ptr;
		size_t num_lines, idx;
		int index;
		void *alloc_ptr;
		size_t alloc_size;
		kmp_mem_descr_t *descr;

		KE_TRACE(25, ("-> __kmp_fast_allocate( T#%d, %d ) called from %s:%d\n",
		__kmp_gtid_from_thread(this_thr), (int)size KMP_SRC_LOC_PARM));

		num_lines = (size + DCACHE_LINE - 1) / DCACHE_LINE;
		idx = num_lines - 1;
		KMP_DEBUG_ASSERT(idx >= 0);
		if (idx < 2) {
		index = 0; // idx is [ 0, 1 ], use first free list
		num_lines = 2; // 1, 2 cache lines or less than cache line
		} else if ((idx >>= 2) == 0) {
		index = 1; // idx is [ 2, 3 ], use second free list
		num_lines = 4; // 3, 4 cache lines
		} else if ((idx >>= 2) == 0) {
		index = 2; // idx is [ 4, 15 ], use third free list
		num_lines = 16; // 5, 6, ..., 16 cache lines
		} else if ((idx >>= 2) == 0) {
		index = 3; // idx is [ 16, 63 ], use fourth free list
		num_lines = 64; // 17, 18, ..., 64 cache lines
		} else {
		goto alloc_call; // 65 or more cache lines ( > 8KB ), don't use free lists
		}

		ptr = this_thr->th.th_free_lists[index].th_free_list_self;
		if (ptr != NULL) {
		// pop the head of no-sync free list
		this_thr->th.th_free_lists[index].th_free_list_self = *((void **)ptr);
		KMP_DEBUG_ASSERT(
		this_thr ==
		((kmp_mem_descr_t *)((kmp_uintptr_t)ptr - sizeof(kmp_mem_descr_t)))
		->ptr_aligned);
		goto end;
		}
		ptr = TCR_SYNC_PTR(this_thr->th.th_free_lists[index].th_free_list_sync);
		if (ptr != NULL) {
		// no-sync free list is empty, use sync free list (filled in by other
		// threads only)
		// pop the head of the sync free list, push NULL instead
		while (!KMP_COMPARE_AND_STORE_PTR(
		&this_thr->th.th_free_lists[index].th_free_list_sync, ptr, nullptr)) {
		KMP_CPU_PAUSE();
		ptr = TCR_SYNC_PTR(this_thr->th.th_free_lists[index].th_free_list_sync);
		}
		// push the rest of chain into no-sync free list (can be NULL if there was
		// the only block)
		this_thr->th.th_free_lists[index].th_free_list_self = *((void **)ptr);
		KMP_DEBUG_ASSERT(
		this_thr ==
		((kmp_mem_descr_t *)((kmp_uintptr_t)ptr - sizeof(kmp_mem_descr_t)))
		->ptr_aligned);
		goto end;
		}

		alloc_call:
		// haven't found block in the free lists, thus allocate it
		size = num_lines * DCACHE_LINE;

		alloc_size = size + sizeof(kmp_mem_descr_t) + DCACHE_LINE;
		KE_TRACE(25, ("__kmp_fast_allocate: T#%d Calling __kmp_thread_malloc with "
		"alloc_size %d\n",
		__kmp_gtid_from_thread(this_thr), alloc_size));
		#if INTEL_PRIVATE
		// AC: replace tbbmalloc with BGET for big blocks until the issues with
		// 358.botsalgn test is fixed:
		// CQ180918: slowdown,
		// CQ292223: memory consumption,
		// CQ291050: unneeded allocations.
		// alloc_ptr = ___kmp_thread_malloc( this_thr, alloc_size KMP_SRC_LOC_PARM );
		#endif /* INTEL_PRIVATE */
		alloc_ptr = kmp_b_alloc(this_thr, (bufsize)alloc_size);

		// align ptr to DCACHE_LINE
		ptr = (void *)((((kmp_uintptr_t)alloc_ptr) + sizeof(kmp_mem_descr_t) +
		DCACHE_LINE) &
		~(DCACHE_LINE - 1));
		descr = (kmp_mem_descr_t *)(((kmp_uintptr_t)ptr) - sizeof(kmp_mem_descr_t));

		descr->ptr_allocated = alloc_ptr; // remember allocated pointer
		// we don't need size_allocated
		descr->ptr_aligned = (void *)this_thr; // remember allocating thread
		// (it is already saved in bget buffer,
		// but we may want to use another allocator in future)
		descr->size_aligned = size;

		end:
		KE_TRACE(25, ("<- __kmp_fast_allocate( T#%d ) returns %p\n",
		__kmp_gtid_from_thread(this_thr), ptr));
		return ptr;
		} // func __kmp_fast_allocate

		// Free fast memory and place it on the thread's free list if it is of
		// the correct size.
		void ___kmp_fast_free(kmp_info_t this_thr, void ptr KMP_SRC_LOC_DECL) {
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for function '___kmp_fast_free' [readability-identifier-naming] not useful clang-tidy: warning: invalid case style for parameter 'this_thr' [readability-identifier-naming] not useful clang-tidy: warning: invalid case style for parameter 'ptr' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '___kmp_fast_free' [readability-identifier…
		kmp_mem_descr_t *descr;
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'descr' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'descr' [readability-identifier-naming]…
		kmp_info_t *alloc_thr;
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'alloc_thr' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'alloc_thr' [readability-identifier…
		size_t size;
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'size' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'size' [readability-identifier-naming]…
		size_t idx;
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'idx' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'idx' [readability-identifier-naming]…
		int index;

		KE_TRACE(25, ("-> __kmp_fast_free( T#%d, %p ) called from %s:%d\n",
		__kmp_gtid_from_thread(this_thr), ptr KMP_SRC_LOC_PARM));
		KMP_ASSERT(ptr != NULL);

		descr = (kmp_mem_descr_t *)(((kmp_uintptr_t)ptr) - sizeof(kmp_mem_descr_t));

		KE_TRACE(26, (" __kmp_fast_free: size_aligned=%d\n",
		(int)descr->size_aligned));

		size = descr->size_aligned; // 2, 4, 16, 64, 65, 66, ... cache lines

		idx = DCACHE_LINE * 2; // 2 cache lines is the minimal block size
		if (idx == size) {
		index = 0; // 2 cache lines
		} else if ((idx <<= 1) == size) {
		index = 1; // 4 cache lines
		} else if ((idx <<= 2) == size) {
		index = 2; // 16 cache lines
		} else if ((idx <<= 2) == size) {
		index = 3; // 64 cache lines
		} else {
		KMP_DEBUG_ASSERT(size > DCACHE_LINE * 64);
		goto free_call; // 65 or more cache lines ( > 8KB )
		}

		alloc_thr = (kmp_info_t *)descr->ptr_aligned; // get thread owning the block
		if (alloc_thr == this_thr) {
		// push block to self no-sync free list, linking previous head (LIFO)
		*((void **)ptr) = this_thr->th.th_free_lists[index].th_free_list_self;
		this_thr->th.th_free_lists[index].th_free_list_self = ptr;
		} else {
		void *head = this_thr->th.th_free_lists[index].th_free_list_other;
		if (head == NULL) {
		// Create new free list
		this_thr->th.th_free_lists[index].th_free_list_other = ptr;
		*((void **)ptr) = NULL; // mark the tail of the list
		descr->size_allocated = (size_t)1; // head of the list keeps its length
		} else {
		// need to check the existing "other" list's owner thread and queue size
		kmp_mem_descr_t *dsc =
		(kmp_mem_descr_t *)((char *)head - sizeof(kmp_mem_descr_t));
		// allocating thread, same for all queue nodes
		kmp_info_t *q_th = (kmp_info_t *)(dsc->ptr_aligned);
		size_t q_sz =
		dsc->size_allocated + 1; // new size in case we add current task
		if (q_th == alloc_thr && q_sz <= KMP_FREE_LIST_LIMIT) {
		// we can add the current task to the "other" list; no sync needed
		*((void **)ptr) = head;
		descr->size_allocated = q_sz;
		this_thr->th.th_free_lists[index].th_free_list_other = ptr;
		} else {
		// either the queue's owner thread is changing or the size limit is exceeded:
		// return the old queue to the allocating thread (q_th) synchronously,
		// and start a new list for alloc_thr's tasks
		void *old_ptr;
		void *tail = head;
		void *next = *((void **)head);
		while (next != NULL) {
		KMP_DEBUG_ASSERT(
		// queue size should decrease by 1 each step through the list
		((kmp_mem_descr_t *)((char *)next - sizeof(kmp_mem_descr_t)))
		->size_allocated +
		1 ==
		((kmp_mem_descr_t *)((char *)tail - sizeof(kmp_mem_descr_t)))
		->size_allocated);
		tail = next; // remember tail node
		next = *((void **)next);
		}
		KMP_DEBUG_ASSERT(q_th != NULL);
		// push block to owner's sync free list
		old_ptr = TCR_PTR(q_th->th.th_free_lists[index].th_free_list_sync);
		/* the next pointer must be set before setting free_list to ptr to avoid
		exposing a broken list to other threads, even for an instant. */
		*((void **)tail) = old_ptr;

		while (!KMP_COMPARE_AND_STORE_PTR(
		&q_th->th.th_free_lists[index].th_free_list_sync, old_ptr, head)) {
		KMP_CPU_PAUSE();
		old_ptr = TCR_PTR(q_th->th.th_free_lists[index].th_free_list_sync);
		*((void **)tail) = old_ptr;
		}

		// start a new list of not-self tasks
		this_thr->th.th_free_lists[index].th_free_list_other = ptr;
		*((void **)ptr) = NULL;
		descr->size_allocated = (size_t)1; // head of queue keeps its length
		}
		}
		}
		goto end;

		free_call:
		KE_TRACE(25, ("__kmp_fast_free: T#%d Calling __kmp_thread_free for size %d\n",
		__kmp_gtid_from_thread(this_thr), size));
		kmp_b_dequeue(this_thr); /* Release any queued buffers */
		kmp_b_free(this_thr, descr->ptr_allocated);

		end:
		KE_TRACE(25, ("<- __kmp_fast_free() returns\n"));

		} // func __kmp_fast_free
		#endif // USE_FAST_MEMORY

/*!
@ingroup TASKING
@param loc_ref location of the original task directive
@param gtid Global Thread ID of encountering thread
@param new_task task thunk allocated by __kmpc_omp_task_alloc() for the ''new
task''
@param naffins Number of affinity items
@param affin_list List of affinity items