
Details

Reviewers

gtbercea
grokos

Commits

rOMP359341: [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.
rGc03fe7317606: [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.
rL359341: [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.

Summary

The parallelLevel counter must be kept on a per-thread basis to fully
support L2+ parallelism; otherwise we may end up with undefined behavior.
Introduce parallelLevel on a per-warp basis using shared memory. This
avoids the synchronization problems and allows full support for L2+
parallelism in SPMD mode with no runtime.
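
The per-warp counter described in the summary can be modeled on the host. This is a minimal sketch under stated assumptions, not the runtime's implementation: a 1024-thread block limit and a 32-lane warp are assumed values, and the names `kWarpSize`, `kMaxThreadsPerTeam`, `LevelTable`, `lowest_active_lane`, and `enter_serialized_parallel` are hypothetical. Real device code would obtain the active-lane mask from the hardware instead of taking it as a parameter.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned kWarpSize = 32;            // assumed warp width
constexpr unsigned kMaxThreadsPerTeam = 1024; // assumed block size limit

// One 8-bit counter per warp instead of one per block or one per thread.
using LevelTable = std::array<std::uint8_t, kMaxThreadsPerTeam / kWarpSize>;

// Index of the lowest set bit in a non-zero active-lane mask.
inline unsigned lowest_active_lane(std::uint32_t active_mask) {
  unsigned lane = 0;
  while (((active_mask >> lane) & 1u) == 0u)
    ++lane;
  return lane;
}

// Simulates all active lanes of one warp entering a nested parallel region.
// Only the leader (lowest active lane) bumps the warp's counter, so the
// counter grows by exactly one no matter how many lanes are active.
inline void enter_serialized_parallel(LevelTable &levels, unsigned warp,
                                      std::uint32_t active_mask) {
  if (active_mask == 0)
    return; // no active lanes, nothing to update
  const unsigned leader = lowest_active_lane(active_mask);
  for (unsigned lane = 0; lane < kWarpSize; ++lane) {
    const bool active = ((active_mask >> lane) & 1u) != 0u;
    if (active && lane == leader)
      ++levels[warp];
  }
}
```

With a mask such as `0xFFFFFFC0` (lanes 6 to 31 active), lane 6 is the leader and the warp's counter goes from 0 to 1; counters of other warps are untouched.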

Diff Detail

Repository: rL LLVM

Event Timeline

ABataev created this revision. Apr 19 2019, 1:46 PM

Herald added a project: Restricted Project. Apr 19 2019, 1:46 PM

Herald added subscribers: jdoerfert, guansong.

Harbormaster completed remote builds in B30782: Diff 195904. Apr 19 2019, 1:48 PM

Why is it enough to have one counter per warp, what happens if threads within a warp diverge? Before D55773 we had a counter per thread...

Updated the test after the fix + outlined warp/lane id operations into functions.
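
For reference, the warp and lane id operations mentioned here reduce to a division and a mask on the thread index within the block. The sketch below mirrors that arithmetic on the host; `kWarpSize = 32` is an assumed value, and `warp_id` and `lane_id` are hypothetical names that take the thread index as a parameter instead of reading it from the hardware.

```cpp
constexpr unsigned kWarpSize = 32; // assumed; the mask trick needs a power of two

// Warp index of a thread within its block.
constexpr unsigned warp_id(unsigned thread_in_block) {
  return thread_in_block / kWarpSize;
}

// Lane index of a thread within its warp.
constexpr unsigned lane_id(unsigned thread_in_block) {
  return thread_in_block & (kWarpSize - 1);
}

static_assert(warp_id(0) == 0 && lane_id(0) == 0, "first thread of the block");
static_assert(warp_id(37) == 1 && lane_id(37) == 5, "second warp, lane 5");
```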

Harbormaster completed remote builds in B30845: Diff 196112. Apr 22 2019, 12:05 PM

In D60918#1473401, @Hahnfeld wrote:

Why is it enough to have one counter per warp, what happens if threads within a warp diverge? Before D55773 we had a counter per thread...

A per-thread counter significantly affects performance. It requires a class allocated in global memory, with a dynamically controlled queue to manage that object. Doru removed this at the cost of some functionality. At the moment, that solution does not work effectively even with warp divergence; to handle L2 parallelism correctly you must compile your code with full runtime support. This patch fixes that problem: it supports divergence between warps and also fixes the problem with thread divergence within a warp. Since divergent threads are serialized, it is OK to keep the parallelism level counter on a per-warp basis. The test demonstrates thread divergence and produces the correct result in SPMD mode without the runtime.
We could use a per-thread counter, but it requires 1K of shared memory. This solution saves shared memory and uses only 32 bytes per block.
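
The 1K versus 32 bytes comparison follows from one 8-bit counter per thread versus one per warp. The check below is a sketch that assumes a limit of 1024 threads per block and a warp size of 32; both constants and the function names are illustrative, chosen to be consistent with the figures quoted above.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxThreadsPerTeam = 1024; // assumed block size limit
constexpr std::size_t kWarpSize = 32;            // assumed warp width

// Shared-memory bytes per block with one 8-bit counter per thread.
constexpr std::size_t per_thread_bytes() {
  return kMaxThreadsPerTeam * sizeof(std::uint8_t);
}

// Shared-memory bytes per block with one 8-bit counter per warp.
constexpr std::size_t per_warp_bytes() {
  return (kMaxThreadsPerTeam / kWarpSize) * sizeof(std::uint8_t);
}

static_assert(per_thread_bytes() == 1024, "1K for per-thread counters");
static_assert(per_warp_bytes() == 32, "32 bytes for per-warp counters");
```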

jdoerfert added inline comments. Apr 22 2019, 12:51 PM

libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp
18 ↗	(On Diff #196112)	I'm confused by the ternary operator (also the one below). If the first target thread executes this critical region, `isHost` is `-1` and it will be set to `omp_is_initial_device()`. The second thread comes along and will set it to `omp_is_initial_device` because it was, at any point in time, either `-1` or `omp_is_initial_device`. And so on. So why the ternary operator?
27 ↗	(On Diff #196112)	I don't get the calculation in the comment. Could you please elaborate? In general, I'm confused because you rewrite the test again. Was the old test broken? If not, why don't we keep both? If it was, can you explain why? I have more and more the feeling changes are added and modified faster than anyone actually understands them, leading to "tests" that simply verify the current behavior.
30 ↗	(On Diff #196112)	Why do we need this ternary operator, shouldn't ParallelLevel2 always be set to L2?
38 ↗	(On Diff #196112)	As far as I can tell, a value of -1 for isHost will not cause the test to fail. `(bool)-1` is `true` and you never verify the "Runtime error" is not printed.

ABataev marked 4 inline comments as done. Apr 22 2019, 1:06 PM

ABataev added inline comments.

libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp
18 ↗	(On Diff #196112)	Because it is a test. If `omp_is_initial_device()` at least once return unexpected value, `isHost` will be set to 1 instead of the expected 0.
27 ↗	(On Diff #196112)	The first version of the test did not test the required functionality: SPMD mode without runtime. I fixed the test and committed it to test the construct that supports SPMD + no runtime. This version of the test goes further and adds testing of the parallel level with thread divergence. The calculations are simple (see the comment below). The outer loop iterates from 0 to 10, the scheduling is `static, 1`, and we have the condition `omp_get_thread_num() > 5`, so only 4 threads will execute the inner `parallel for` construct (10 is the number of iterations in the outer loop, 6 is the first thread that will execute the inner loop; need to fix the expression from 9-5 to 10-6 in the comment). Each inner loop iterates 10 times, and for the second level of parallelism `omp_get_level()` must return 2. Thus, the expected result is 4 * 10 * 2 = 80.
30 ↗	(On Diff #196112)	Same here: it is a test, and it checks that `ParallelLevel2` is set correctly to 2 in all required threads, not 3, 1, 0 or something else.
38 ↗	(On Diff #196112)	It will, since the expected result is `Target region executed on the device`. With `isHost` equal to `-1` the result will be `Target region executed on the host`.
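
The expected value of `Count` can be re-derived with a serial model of the test. This is a sketch, not the test itself: it assumes at least 10 threads, so that with `schedule(static, 1)` outer iteration J runs on thread J, and it assumes `omp_get_level()` returns 1 in the outer region and 2 in the nested one. The names `CountParts` and `expected_count` are hypothetical.

```cpp
// Contributions to Count from the nested and non-nested outer iterations.
struct CountParts {
  int nested; // iterations that run the inner parallel for (level 2)
  int flat;   // iterations that only add the outer level (level 1)
};

// Serial model: outer iteration j is assumed to run on thread j.
inline CountParts expected_count(int outer_iters, int first_nested_thread,
                                 int inner_iters) {
  CountParts parts{0, 0};
  for (int j = 0; j < outer_iters; ++j) {
    const int thread = j; // assumption: schedule(static, 1), enough threads
    if (thread >= first_nested_thread) {
      for (int i = 0; i < inner_iters; ++i)
        parts.nested += 2; // assumed omp_get_level() in the nested region
    } else {
      parts.flat += 1; // assumed omp_get_level() in the outer region
    }
  }
  return parts;
}
```

Calling `expected_count(10, 6, 10)` models the condition `omp_get_thread_num() > 5`: the nested part is (10 - 6) * 10 * 2 = 80, and adding the 6 non-nested iterations gives the 86 checked by the final version of the test.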

Fixed the comments for the expected results.

Harbormaster completed remote builds in B30846: Diff 196125. Apr 22 2019, 1:11 PM

jdoerfert added inline comments. Apr 22 2019, 1:48 PM

libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp

18 ↗

(On Diff #196112)

I don't think it makes sense to complicate this test case by checking for the behavior of omp_is_initial_device.

Also, your method of determining "at least once return unexpected value" for omp_is_initial_device is flawed. I sketched a path below, and I don't think it is the only way to get the "correct" result even though the omp_is_initial_device implementation is, for some reason, broken. Finally, setting isHost to 1 is not distinguishable from running on the initial device, so in a test case for omp_is_initial_device we should probably be able to distinguish these cases.

Let's say we are on the device, so omp_is_initial_device should be false (=0), and we run with 4 threads. Now, an alternating omp_is_initial_device would still result in isHost = 0:

thread #	`isHost`	`omp_is_initial_device()` (see _below_)
0	-1	0
1	0	1
2	1	0
3	1	1/0

_below_: If there are two values the first is the result of the call in the condition, the second in the consequence.
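
The table can be replayed with a small host-side model. This is a sketch under assumptions: the threads enter the critical region in order 0 to 3, the ternary has the older form `(isHost < 0 || isHost == f()) ? f() : 1`, which appears to be what the table traces, and `AlternatingDevice` is a hypothetical broken stand-in for `omp_is_initial_device()` that returns 0, 1, 0, 1, ... on successive calls.

```cpp
// Hypothetical broken stand-in: alternates 0, 1, 0, 1, ... on each call.
struct AlternatingDevice {
  int calls = 0;
  int operator()() { return calls++ % 2; }
};

// Applies the ternary once per simulated thread, in thread order, and
// returns the final value of is_host.
inline int run_is_host_update(int num_threads) {
  AlternatingDevice is_initial_device;
  int is_host = -1;
  for (int t = 0; t < num_threads; ++t) {
    // The call in the condition is skipped when is_host < 0 (short circuit);
    // the call in the consequence happens only when the condition holds.
    is_host = (is_host < 0 || is_host == is_initial_device())
                  ? is_initial_device()
                  : 1;
  }
  return is_host;
}
```

With four simulated threads the intermediate values are 0, 1, 1, 0, so the final result is 0 even though the stand-in returned 1 twice.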

27 ↗

(On Diff #196112)

need to fix the expression from 9-5 to 10-6 in the comment

That helps a lot, thanks.

30 ↗

(On Diff #196112)

Same as above, this doesn't test much.

Fixed test checks.

Harbormaster completed remote builds in B30890: Diff 196251. Apr 23 2019, 7:58 AM

Update to top.

Harbormaster completed remote builds in B30954: Diff 196511. Apr 24 2019, 1:23 PM

Other comments?

I think it looks good now.

This revision is now accepted and ready to land. Apr 26 2019, 12:18 PM

Closed by commit rL359341: [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode. (authored by ABataev). Apr 26 2019, 12:28 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Apr 26 2019, 12:28 PM

Herald added a subscriber: llvm-commits.

jdoerfert mentioned this in D62199: [OPENMP][NVPTX]Fix barriers and parallel level counters, NFC. May 24 2019, 6:49 PM

Diff 196899

openmp/trunk/libomptarget/deviceRTLs/nvptx/src/libcall.cu

Show First 20 Lines • Show All 159 Lines • ▼ Show 20 Lines	EXTERN int omp_get_max_active_levels(void) {
return rc;		return rc;
}		}

EXTERN int omp_get_level(void) {		EXTERN int omp_get_level(void) {
if (isRuntimeUninitialized()) {		if (isRuntimeUninitialized()) {
ASSERT0(LT_FUSSY, isSPMDMode(),		ASSERT0(LT_FUSSY, isSPMDMode(),
"Expected SPMD mode only with uninitialized runtime.");		"Expected SPMD mode only with uninitialized runtime.");
// parallelLevel starts from 0, need to add 1 for correct level.		// parallelLevel starts from 0, need to add 1 for correct level.
return parallelLevel + 1;		return parallelLevel[GetWarpId()] + 1;
}		}
int level = 0;		int level = 0;
omptarget_nvptx_TaskDescr *currTaskDescr =		omptarget_nvptx_TaskDescr *currTaskDescr =
getMyTopTaskDescriptor(isSPMDMode());		getMyTopTaskDescriptor(isSPMDMode());
ASSERT0(LT_FUSSY, currTaskDescr,		ASSERT0(LT_FUSSY, currTaskDescr,
"do not expect fct to be called in a non-active thread");		"do not expect fct to be called in a non-active thread");
do {		do {
if (currTaskDescr->IsParallelConstruct()) {		if (currTaskDescr->IsParallelConstruct()) {
▲ Show 20 Lines • Show All 339 Lines • Show Last 20 Lines

openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omp_data.cu

Show All 25 Lines	__device__
omptarget_nvptx_Queue<omptarget_nvptx_ThreadPrivateContext, OMP_STATE_COUNT>		omptarget_nvptx_Queue<omptarget_nvptx_ThreadPrivateContext, OMP_STATE_COUNT>
omptarget_nvptx_device_State[MAX_SM];		omptarget_nvptx_device_State[MAX_SM];

__device__ omptarget_nvptx_SimpleMemoryManager		__device__ omptarget_nvptx_SimpleMemoryManager
omptarget_nvptx_simpleMemoryManager;		omptarget_nvptx_simpleMemoryManager;
__device__ __shared__ uint32_t usedMemIdx;		__device__ __shared__ uint32_t usedMemIdx;
__device__ __shared__ uint32_t usedSlotIdx;		__device__ __shared__ uint32_t usedSlotIdx;

__device__ __shared__ uint8_t parallelLevel;		__device__ __shared__ uint8_t parallelLevel[MAX_THREADS_PER_TEAM / WARPSIZE];

// Pointer to this team's OpenMP state object		// Pointer to this team's OpenMP state object
__device__ __shared__		__device__ __shared__
omptarget_nvptx_ThreadPrivateContext *omptarget_nvptx_threadPrivateContext;		omptarget_nvptx_ThreadPrivateContext *omptarget_nvptx_threadPrivateContext;

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
// The team master sets the outlined parallel function in this variable to		// The team master sets the outlined parallel function in this variable to
// communicate with the workers. Since it is in shared memory, there is one		// communicate with the workers. Since it is in shared memory, there is one
Show All 23 Lines

openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h

	Show First 20 Lines • Show All 400 Lines • ▼ Show 20 Lines
	////////////////////////////////////////////////////////////////////////////////			////////////////////////////////////////////////////////////////////////////////
	// global data tables			// global data tables
	////////////////////////////////////////////////////////////////////////////////			////////////////////////////////////////////////////////////////////////////////

	extern __device__ omptarget_nvptx_SimpleMemoryManager			extern __device__ omptarget_nvptx_SimpleMemoryManager
	omptarget_nvptx_simpleMemoryManager;			omptarget_nvptx_simpleMemoryManager;
	extern __device__ __shared__ uint32_t usedMemIdx;			extern __device__ __shared__ uint32_t usedMemIdx;
	extern __device__ __shared__ uint32_t usedSlotIdx;			extern __device__ __shared__ uint32_t usedSlotIdx;
	extern __device__ __shared__ uint8_t parallelLevel;			extern __device__ __shared__ uint8_t
				parallelLevel[MAX_THREADS_PER_TEAM / WARPSIZE];
	extern __device__ __shared__			extern __device__ __shared__
	omptarget_nvptx_ThreadPrivateContext *omptarget_nvptx_threadPrivateContext;			omptarget_nvptx_ThreadPrivateContext *omptarget_nvptx_threadPrivateContext;

	extern __device__ __shared__ uint32_t execution_param;			extern __device__ __shared__ uint32_t execution_param;
	extern __device__ __shared__ void *ReductionScratchpadPtr;			extern __device__ __shared__ void *ReductionScratchpadPtr;

	////////////////////////////////////////////////////////////////////////////////			////////////////////////////////////////////////////////////////////////////////
	// work function (outlined parallel/simd functions) and arguments.			// work function (outlined parallel/simd functions) and arguments.
	Show All 25 Lines

openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu

	Show First 20 Lines • Show All 89 Lines • ▼ Show 20 Lines
	EXTERN void __kmpc_spmd_kernel_init(int ThreadLimit, int16_t RequiresOMPRuntime,			EXTERN void __kmpc_spmd_kernel_init(int ThreadLimit, int16_t RequiresOMPRuntime,
	int16_t RequiresDataSharing) {			int16_t RequiresDataSharing) {
	PRINT0(LD_IO, "call to __kmpc_spmd_kernel_init\n");			PRINT0(LD_IO, "call to __kmpc_spmd_kernel_init\n");

	if (!RequiresOMPRuntime) {			if (!RequiresOMPRuntime) {
	// If OMP runtime is not required don't initialize OMP state.			// If OMP runtime is not required don't initialize OMP state.
	setExecutionParameters(Spmd, RuntimeUninitialized);			setExecutionParameters(Spmd, RuntimeUninitialized);
	if (GetThreadIdInBlock() == 0) {			if (GetThreadIdInBlock() == 0) {
	parallelLevel = 0;
	usedSlotIdx = smid() % MAX_SM;			usedSlotIdx = smid() % MAX_SM;
				parallelLevel[0] = 0;
				} else if (GetLaneId() == 0) {
				parallelLevel[GetWarpId()] = 0;
	}			}
	__SYNCTHREADS();			__SYNCTHREADS();
	return;			return;
	}			}
	setExecutionParameters(Spmd, RuntimeInitialized);			setExecutionParameters(Spmd, RuntimeInitialized);

	//			//
	// Team Context Initialization.			// Team Context Initialization.
	▲ Show 20 Lines • Show All 77 Lines • Show Last 20 Lines

openmp/trunk/libomptarget/deviceRTLs/nvptx/src/parallel.cu

	Show First 20 Lines • Show All 333 Lines • ▼ Show 20 Lines
	////////////////////////////////////////////////////////////////////////////////			////////////////////////////////////////////////////////////////////////////////

	EXTERN void __kmpc_serialized_parallel(kmp_Ident *loc, uint32_t global_tid) {			EXTERN void __kmpc_serialized_parallel(kmp_Ident *loc, uint32_t global_tid) {
	PRINT0(LD_IO, "call to __kmpc_serialized_parallel\n");			PRINT0(LD_IO, "call to __kmpc_serialized_parallel\n");

	if (checkRuntimeUninitialized(loc)) {			if (checkRuntimeUninitialized(loc)) {
	ASSERT0(LT_FUSSY, checkSPMDMode(loc),			ASSERT0(LT_FUSSY, checkSPMDMode(loc),
	"Expected SPMD mode with uninitialized runtime.");			"Expected SPMD mode with uninitialized runtime.");
	__SYNCTHREADS();			unsigned tnum = __ACTIVEMASK();
	if (GetThreadIdInBlock() == 0)			int leader = __ffs(tnum) - 1;
	++parallelLevel;			__SHFL_SYNC(tnum, leader, leader);
	__SYNCTHREADS();			if (GetLaneId() == leader)
				++parallelLevel[GetWarpId()];
				__SHFL_SYNC(tnum, leader, leader);

	return;			return;
	}			}

	// assume this is only called for nested parallel			// assume this is only called for nested parallel
	int threadId = GetLogicalThreadIdInBlock(checkSPMDMode(loc));			int threadId = GetLogicalThreadIdInBlock(checkSPMDMode(loc));

	// unlike actual parallel, threads in the same team do not share			// unlike actual parallel, threads in the same team do not share
	Show All 23 Lines

	EXTERN void __kmpc_end_serialized_parallel(kmp_Ident *loc,			EXTERN void __kmpc_end_serialized_parallel(kmp_Ident *loc,
	uint32_t global_tid) {			uint32_t global_tid) {
	PRINT0(LD_IO, "call to __kmpc_end_serialized_parallel\n");			PRINT0(LD_IO, "call to __kmpc_end_serialized_parallel\n");

	if (checkRuntimeUninitialized(loc)) {			if (checkRuntimeUninitialized(loc)) {
	ASSERT0(LT_FUSSY, checkSPMDMode(loc),			ASSERT0(LT_FUSSY, checkSPMDMode(loc),
	"Expected SPMD mode with uninitialized runtime.");			"Expected SPMD mode with uninitialized runtime.");
	__SYNCTHREADS();			unsigned tnum = __ACTIVEMASK();
	if (GetThreadIdInBlock() == 0)			int leader = __ffs(tnum) - 1;
	--parallelLevel;			__SHFL_SYNC(tnum, leader, leader);
	__SYNCTHREADS();			if (GetLaneId() == leader)
				--parallelLevel[GetWarpId()];
				__SHFL_SYNC(tnum, leader, leader);
	return;			return;
	}			}

	// pop stack			// pop stack
	int threadId = GetLogicalThreadIdInBlock(checkSPMDMode(loc));			int threadId = GetLogicalThreadIdInBlock(checkSPMDMode(loc));
	omptarget_nvptx_TaskDescr *currTaskDescr = getMyTopTaskDescriptor(threadId);			omptarget_nvptx_TaskDescr *currTaskDescr = getMyTopTaskDescriptor(threadId);
	// set new top			// set new top
	omptarget_nvptx_threadPrivateContext->SetTopLevelTaskDescr(			omptarget_nvptx_threadPrivateContext->SetTopLevelTaskDescr(
	threadId, currTaskDescr->GetPrevTaskDescr());			threadId, currTaskDescr->GetPrevTaskDescr());
	// free			// free
	SafeFree(currTaskDescr, (char *)"new seq parallel task");			SafeFree(currTaskDescr, (char *)"new seq parallel task");
	currTaskDescr = getMyTopTaskDescriptor(threadId);			currTaskDescr = getMyTopTaskDescriptor(threadId);
	currTaskDescr->RestoreLoopData();			currTaskDescr->RestoreLoopData();
	}			}

	EXTERN uint16_t __kmpc_parallel_level(kmp_Ident *loc, uint32_t global_tid) {			EXTERN uint16_t __kmpc_parallel_level(kmp_Ident *loc, uint32_t global_tid) {
	PRINT0(LD_IO, "call to __kmpc_parallel_level\n");			PRINT0(LD_IO, "call to __kmpc_parallel_level\n");

	if (checkRuntimeUninitialized(loc)) {			if (checkRuntimeUninitialized(loc)) {
	ASSERT0(LT_FUSSY, checkSPMDMode(loc),			ASSERT0(LT_FUSSY, checkSPMDMode(loc),
	"Expected SPMD mode with uninitialized runtime.");			"Expected SPMD mode with uninitialized runtime.");
	return parallelLevel + 1;			return parallelLevel[GetWarpId()] + 1;
	}			}

	int threadId = GetLogicalThreadIdInBlock(checkSPMDMode(loc));			int threadId = GetLogicalThreadIdInBlock(checkSPMDMode(loc));
	omptarget_nvptx_TaskDescr *currTaskDescr =			omptarget_nvptx_TaskDescr *currTaskDescr =
	omptarget_nvptx_threadPrivateContext->GetTopLevelTaskDescr(threadId);			omptarget_nvptx_threadPrivateContext->GetTopLevelTaskDescr(threadId);
	if (currTaskDescr->InL2OrHigherParallelRegion())			if (currTaskDescr->InL2OrHigherParallelRegion())
	return 2;			return 2;
	else if (currTaskDescr->InParallelRegion())			else if (currTaskDescr->InParallelRegion())
	▲ Show 20 Lines • Show All 50 Lines • Show Last 20 Lines

openmp/trunk/libomptarget/deviceRTLs/nvptx/src/support.h

	Show All 34 Lines
	// get info from machine			// get info from machine
	////////////////////////////////////////////////////////////////////////////////			////////////////////////////////////////////////////////////////////////////////

	// get low level ids of resources			// get low level ids of resources
	INLINE int GetThreadIdInBlock();			INLINE int GetThreadIdInBlock();
	INLINE int GetBlockIdInKernel();			INLINE int GetBlockIdInKernel();
	INLINE int GetNumberOfBlocksInKernel();			INLINE int GetNumberOfBlocksInKernel();
	INLINE int GetNumberOfThreadsInBlock();			INLINE int GetNumberOfThreadsInBlock();
				INLINE unsigned GetWarpId();
				INLINE unsigned GetLaneId();

	// get global ids to locate tread/team info (constant regardless of OMP)			// get global ids to locate tread/team info (constant regardless of OMP)
	INLINE int GetLogicalThreadIdInBlock(bool isSPMDExecutionMode);			INLINE int GetLogicalThreadIdInBlock(bool isSPMDExecutionMode);
	INLINE int GetMasterThreadID();			INLINE int GetMasterThreadID();
	INLINE int GetNumberOfWorkersInTeam();			INLINE int GetNumberOfWorkersInTeam();

	// get OpenMP thread and team ids			// get OpenMP thread and team ids
	INLINE int GetOmpThreadId(int threadId, bool isSPMDExecutionMode,			INLINE int GetOmpThreadId(int threadId, bool isSPMDExecutionMode,
	▲ Show 20 Lines • Show All 41 Lines • Show Last 20 Lines

openmp/trunk/libomptarget/deviceRTLs/nvptx/src/supporti.h

Show First 20 Lines • Show All 96 Lines • ▼ Show 20 Lines
INLINE int GetThreadIdInBlock() { return threadIdx.x; }		INLINE int GetThreadIdInBlock() { return threadIdx.x; }

INLINE int GetBlockIdInKernel() { return blockIdx.x; }		INLINE int GetBlockIdInKernel() { return blockIdx.x; }

INLINE int GetNumberOfBlocksInKernel() { return gridDim.x; }		INLINE int GetNumberOfBlocksInKernel() { return gridDim.x; }

INLINE int GetNumberOfThreadsInBlock() { return blockDim.x; }		INLINE int GetNumberOfThreadsInBlock() { return blockDim.x; }

		INLINE unsigned GetWarpId() { return threadIdx.x / WARPSIZE; }

		INLINE unsigned GetLaneId() { return threadIdx.x & (WARPSIZE - 1); }

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
//		//
// Calls to the Generic Scheme Implementation Layer (assuming 1D layout)		// Calls to the Generic Scheme Implementation Layer (assuming 1D layout)
//		//
////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////

// The master thread id is the first thread (lane) of the last warp.		// The master thread id is the first thread (lane) of the last warp.
// Thread id is 0 indexed.		// Thread id is 0 indexed.
Show All 36 Lines	INLINE int GetOmpThreadId(int threadId, bool isSPMDExecutionMode,
bool isRuntimeUninitialized) {		bool isRuntimeUninitialized) {
// omp_thread_num		// omp_thread_num
int rc;		int rc;

if (isRuntimeUninitialized) {		if (isRuntimeUninitialized) {
ASSERT0(LT_FUSSY, isSPMDExecutionMode,		ASSERT0(LT_FUSSY, isSPMDExecutionMode,
"Uninitialized runtime with non-SPMD mode.");		"Uninitialized runtime with non-SPMD mode.");
// For level 2 parallelism all parallel regions are executed sequentially.		// For level 2 parallelism all parallel regions are executed sequentially.
if (parallelLevel > 0)		if (parallelLevel[GetWarpId()] > 0)
rc = 0;		rc = 0;
else		else
rc = GetThreadIdInBlock();		rc = GetThreadIdInBlock();
} else {		} else {
omptarget_nvptx_TaskDescr *currTaskDescr =		omptarget_nvptx_TaskDescr *currTaskDescr =
omptarget_nvptx_threadPrivateContext->GetTopLevelTaskDescr(threadId);		omptarget_nvptx_threadPrivateContext->GetTopLevelTaskDescr(threadId);
rc = currTaskDescr->ThreadId();		rc = currTaskDescr->ThreadId();
}		}
return rc;		return rc;
}		}

INLINE int GetNumberOfOmpThreads(int threadId, bool isSPMDExecutionMode,		INLINE int GetNumberOfOmpThreads(int threadId, bool isSPMDExecutionMode,
bool isRuntimeUninitialized) {		bool isRuntimeUninitialized) {
// omp_num_threads		// omp_num_threads
int rc;		int rc;

if (isRuntimeUninitialized) {		if (isRuntimeUninitialized) {
ASSERT0(LT_FUSSY, isSPMDExecutionMode,		ASSERT0(LT_FUSSY, isSPMDExecutionMode,
"Uninitialized runtime with non-SPMD mode.");		"Uninitialized runtime with non-SPMD mode.");
// For level 2 parallelism all parallel regions are executed sequentially.		// For level 2 parallelism all parallel regions are executed sequentially.
if (parallelLevel > 0)		if (parallelLevel[GetWarpId()] > 0)
rc = 1;		rc = 1;
else		else
rc = GetNumberOfThreadsInBlock();		rc = GetNumberOfThreadsInBlock();
} else {		} else {
omptarget_nvptx_TaskDescr *currTaskDescr =		omptarget_nvptx_TaskDescr *currTaskDescr =
omptarget_nvptx_threadPrivateContext->GetTopLevelTaskDescr(threadId);		omptarget_nvptx_threadPrivateContext->GetTopLevelTaskDescr(threadId);
ASSERT0(LT_FUSSY, currTaskDescr, "expected a top task descr");		ASSERT0(LT_FUSSY, currTaskDescr, "expected a top task descr");
rc = currTaskDescr->ThreadsInTeam();		rc = currTaskDescr->ThreadsInTeam();
▲ Show 20 Lines • Show All 90 Lines • Show Last 20 Lines

openmp/trunk/libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp

	// RUN: %compilexx-run-and-check			// RUN: %compilexx-run-and-check

	#include <stdio.h>			#include <stdio.h>
	#include <omp.h>			#include <omp.h>

	int main(void) {			int main(void) {
	int isHost = -1;			int isHost = -1;
	int ParallelLevel1 = -1, ParallelLevel2 = -1;			int ParallelLevel1 = -1, ParallelLevel2 = -1;
				int Count = 0;

	#pragma omp target parallel for map(tofrom \			#pragma omp target parallel for map(tofrom \
	: isHost, ParallelLevel1, ParallelLevel2)			: isHost, ParallelLevel1, ParallelLevel2), reduction(+: Count) schedule(static, 1)
	for (int J = 0; J < 10; ++J) {			for (int J = 0; J < 10; ++J) {
	#pragma omp critical			#pragma omp critical
	{			{
	isHost = (isHost < 0 || isHost == omp_is_initial_device())			isHost = (isHost < 0 || isHost == 0) ? omp_is_initial_device() : isHost;
	? omp_is_initial_device()			ParallelLevel1 = (ParallelLevel1 < 0 || ParallelLevel1 == 1)
	: 1;			? omp_get_level()
	ParallelLevel1 =			: ParallelLevel1;
	(ParallelLevel1 < 0 || ParallelLevel1 == 1) ? omp_get_level() : 2;
	}			}
				if (omp_get_thread_num() > 5) {
	int L2;			int L2;
	#pragma omp parallel for schedule(dynamic) lastprivate(L2)			#pragma omp parallel for schedule(dynamic) lastprivate(L2) reduction(+: Count)
	for (int I = 0; I < 10; ++I)			for (int I = 0; I < 10; ++I) {
	L2 = omp_get_level();			L2 = omp_get_level();
				Count += omp_get_level(); // (10-6)*10*2 = 80
				}
	#pragma omp critical			#pragma omp critical
	ParallelLevel2 = (ParallelLevel2 < 0 || ParallelLevel2 == 2) ? L2 : 1;			ParallelLevel2 =
				(ParallelLevel2 < 0 || ParallelLevel2 == 2) ? L2 : ParallelLevel2;
				} else {
				Count += omp_get_level(); // 6 * 1 = 6
				}
	}			}

	if (isHost < 0) {			if (isHost < 0) {
	printf("Runtime error, isHost=%d\n", isHost);			printf("Runtime error, isHost=%d\n", isHost);
	}			}

	// CHECK: Target region executed on the device			// CHECK: Target region executed on the device
	printf("Target region executed on the %s\n", isHost ? "host" : "device");			printf("Target region executed on the %s\n", isHost ? "host" : "device");
	// CHECK: Parallel level in SPMD mode: L1 is 1, L2 is 2			// CHECK: Parallel level in SPMD mode: L1 is 1, L2 is 2
	printf("Parallel level in SPMD mode: L1 is %d, L2 is %d\n", ParallelLevel1,			printf("Parallel level in SPMD mode: L1 is %d, L2 is %d\n", ParallelLevel1,
	ParallelLevel2);			ParallelLevel2);
				// Final result of Count is (10-6)(num of loops)*10(num of iterations)*2(par
				// level) + 6(num of iterations) * 1(par level)
				// CHECK: Expected count = 86
				printf("Expected count = %d\n", Count);

	return isHost;			return isHost;
	}			}

This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.
ClosedPublic
