This is an archive of the discontinued LLVM Phabricator instance.

Differential D74092

Changed omp_get_max_threads() implementation to more closely match spec description.
Closed · Public

Authored by estewart08 on Feb 5 2020, 2:14 PM.

Details

Reviewers

jdoerfert
ABataev
grokos
JonChesterfield

Commits

rG190a11148b75: Changed omp_get_max_threads() implementation to more closely match spec…

Summary

The 5.0 spec states, "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine." The attached test shows Max Threads: 96, Num Threads: 128 without the proposed change. The number of threads should not exceed the (max) nthreads ICV, hence we should return the higher SPMD thread number even when omp_get_max_threads() is called in a generic kernel. This change does fail the api test, max_threads.c, because now it would return 64 instead of 32.
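The numbers in the summary can be reproduced with a small host-side model. This is only a sketch: the helper names are hypothetical, the warp size of 32 and the block size of 128 are taken from the discussion and the test output, and the rounding used for the old value is an assumption about how the worker count is derived (it does yield the 96 reported above).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: NVPTX warp size, as implied by the "32 master warp"
   mentioned in the review discussion. */
enum { WARP_SIZE = 32 };

/* Hypothetical model of the old value in a generic kernel: worker threads
   only, i.e. the block size with the last (master) warp excluded. */
static int max_threads_before(int threads_in_block) {
  assert(threads_in_block > 0);
  return (threads_in_block - 1) & ~(WARP_SIZE - 1);
}

/* Hypothetical model of the new value: every thread in the block. */
static int max_threads_after(int threads_in_block) {
  assert(threads_in_block > 0);
  return threads_in_block;
}

/* The spec property: the reported maximum must be an upper bound on the
   size of a team formed by a later parallel construct. */
static bool is_upper_bound(int max_threads, int num_threads) {
  return max_threads >= num_threads;
}
```

Under these assumptions, max_threads_before(128) gives 96, which is not an upper bound on an SPMD team of 128 threads, while max_threads_after(128) gives 128, which is.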

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

estewart08 created this revision. Feb 5 2020, 2:14 PM

Herald added a project: Restricted Project. Feb 5 2020, 2:14 PM

Herald added a subscriber: openmp-commits.

Harbormaster completed remote builds in B45812: Diff 242751. Feb 5 2020, 2:16 PM

estewart08 retitled this revision from 'Changed omp_get_max_threads() implementation to more closely match spec description: "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a...' to 'Changed omp_get_max_threads() implementation to more closely match spec description.' Feb 5 2020, 2:30 PM

estewart08 edited the summary of this revision.

This change does fail the api test, max_threads.c, because now it would return 64 instead of 32.

Can you please adjust the test and make it part of the commit.

The reasoning looks OK to me and I like the associated test. I don't understand the behaviour change in max_threads though, it looks like:

#pragma omp target teams map(MaxThreadsL1, MaxThreadsL2) thread_limit(32)      \
    num_teams(1)
  {
    MaxThreadsL1 = omp_get_max_threads();
// ...
  }

  // CHECK: Non-SPMD MaxThreadsL1 = 32
  printf("Non-SPMD MaxThreadsL1 = %d\n", MaxThreadsL1);

Which I think is the developer asking for a maximum number of threads of 32 - do you mean both instances of 32 become 64, or just the second one? Agreed that it should be part of this patch, otherwise we'd break the build by committing.

Further, the get_max_threads result might be architecture dependent. Do nvptx and amdgcn return the same value in equivalent contexts? We might need an #ifdef in the test, or a separate instance of the test under amdgcn if they do behave differently here.

I can definitely add the change to max_threads.c to this review. The CHECK would become 64 because, with this proposed change, we now count all threads: 32 from thread_limit + 32 for the master warp.

// CHECK: Non-SPMD MaxThreadsL1 = 64

Yes, the test I proposed would be for nvptx only due to the fact that the other tests reside in the nvptx directory and the original max_threads test was checking nvptx values as well. Is the plan to convert all tests so that they support different architectures in the future and move them to common?
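The 32 + 32 arithmetic above can be sketched as a small helper. This is a hedged illustration, not the runtime's code: kernel_mode and threads_in_block are hypothetical names, the warp size of 32 is assumed for nvptx, and the rule (worker threads + WARP_SIZE for Non-SPMD kernels, worker threads only for SPMD kernels) is the one stated in the FIXME comment added to max_threads.c.

```c
#include <assert.h>

/* Assumption: NVPTX warp size. */
enum { WARP_SIZE = 32 };

/* Hypothetical tag for the two kernel execution modes discussed here. */
typedef enum { KERNEL_GENERIC, KERNEL_SPMD } kernel_mode;

/* Hypothetical helper: total threads in a block for a kernel launched with
   the given thread_limit. A generic (Non-SPMD) kernel carries one extra
   master warp on top of its worker threads; an SPMD kernel does not. */
static int threads_in_block(kernel_mode mode, int thread_limit) {
  assert(thread_limit > 0);
  return mode == KERNEL_GENERIC ? thread_limit + WARP_SIZE : thread_limit;
}
```

With thread_limit(32), threads_in_block(KERNEL_GENERIC, 32) evaluates to 64, the value the updated CHECK line expects, and threads_in_block(KERNEL_SPMD, 32) stays at 32.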

Update max_threads.c api test to match the change for omp_get_max_threads().

Harbormaster completed remote builds in B45869: Diff 242931. Feb 6 2020, 9:44 AM

Change looks good to me. @jdoerfert, @ABataev, @grokos?

Is the plan to convert all tests so that they support different architectures in the future and move them to common?

That sounds reasonable, though I'm not sure it's established as a plan. A fair amount of (to be added) tests should be totally architecture agnostic so those should end up under common. And some will always be architecture dependent.

This is a bug fix that is correct and helpful, one minor nit inlined but other than that LGTM.

In D74092#1862799, @JonChesterfield wrote:

Change looks good to me. @jdoerfert, @ABataev, @grokos?

Is the plan to convert all tests so that they support different architectures in the future and move them to common?

That sounds reasonable, though I'm not sure it's established as a plan. A fair amount of (to be added) tests should be totally architecture agnostic so those should end up under common. And some will always be architecture dependent.

+1, yes most should be independent and only if we have to we go dependent.

In D74092#1861046, @estewart08 wrote:
I can definitely add the change to max_threads.c to this review. The CHECK would become 64 because, with this proposed change, we now count all threads: 32 from thread_limit + 32 for the master warp.
// CHECK: Non-SPMD MaxThreadsL1 = 64
Yes, the test I proposed would be for nvptx only due to the fact that the other tests reside in the nvptx directory and the original max_threads test was checking nvptx values as well. Is the plan to convert all tests so that they support different architectures in the future and move them to common?

The entire 32 + "32 master warp" is a problematic construction we should get rid of. It's an implementation detail that leaks out and confuses people.
If we have different "warp sizes" we can have multiple check prefixes, especially since we will have to generalize compile-run-and-check anyway.

openmp/libomptarget/deviceRTLs/nvptx/test/api/max_threads.c
Lines 22–29: Nit: Please add a "Fixme" comment here explaining why 32, or actually "WARP_SIZE", would be the right thing here, and why we see 64 instead.

This revision is now accepted and ready to land. Feb 7 2020, 12:24 AM

Added FIXME comment to describe change in omp_get_max_threads behavior.

Harbormaster completed remote builds in B45988: Diff 243299. Feb 7 2020, 3:20 PM

Thanks. Do you have commit access to the llvm github, and if not, would you prefer to wait until that is granted or have someone else land this on your behalf?

I've landed this on Ethan's behalf as I believe he's distracted by non-llvm activities at present.

Closed by commit rG190a11148b75: Changed omp_get_max_threads() implementation to more closely match spec… (authored by estewart08, committed by JonChesterfield). Feb 12 2020, 3:36 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
openmp/libomptarget/deviceRTLs/common/src/omptarget.cu	2 lines
openmp/libomptarget/deviceRTLs/nvptx/test/api/get_max_threads.c	22 lines
openmp/libomptarget/deviceRTLs/nvptx/test/api/max_threads.c	9 lines

Diff 244292

openmp/libomptarget/deviceRTLs/common/src/omptarget.cu

(62 lines not shown)
 EXTERN void __kmpc_kernel_init(int ThreadLimit, int16_t RequiresOMPRuntime) {
   // to point to the level zero task ICV. That ICV was init in
   // InitTeamDescr()
   omptarget_nvptx_threadPrivateContext->SetTopLevelTaskDescr(
       threadId, currTeamDescr.LevelZeroTaskDescr());

   // set number of threads and thread limit in team to started value
   omptarget_nvptx_TaskDescr *currTaskDescr =
       omptarget_nvptx_threadPrivateContext->GetTopLevelTaskDescr(threadId);
-  nThreads = GetNumberOfWorkersInTeam();
+  nThreads = GetNumberOfThreadsInBlock();
   threadLimit = ThreadLimit;
 }

 EXTERN void __kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized) {
   PRINT0(LD_IO, "call to __kmpc_kernel_deinit\n");
   ASSERT0(LT_FUSSY, IsOMPRuntimeInitialized,
           "Generic always requires initialized runtime.");
   // Enqueue omp state object for use by another team.
(100 lines not shown)

openmp/libomptarget/deviceRTLs/nvptx/test/api/get_max_threads.c

This file was added.

// RUN: %compile-run-and-check
#include <omp.h>
#include <stdio.h>

int main(){
  int max_threads = -1;
  int num_threads = -1;

  #pragma omp target map(tofrom: max_threads)
  max_threads = omp_get_max_threads();

  #pragma omp target parallel map(tofrom: num_threads)
  {
    #pragma omp master
    num_threads = omp_get_num_threads();
  }

  // CHECK: Max Threads: 128, Num Threads: 128
  printf("Max Threads: %d, Num Threads: %d\n", max_threads, num_threads);

  return 0;
}

openmp/libomptarget/deviceRTLs/nvptx/test/api/max_threads.c

(13 lines not shown)
 #pragma omp target teams map(MaxThreadsL1, MaxThreadsL2) thread_limit(32) \
     num_teams(1)
   {
     MaxThreadsL1 = omp_get_max_threads();
 #pragma omp parallel reduction(unique : MaxThreadsL2)
     { MaxThreadsL2 = omp_get_max_threads(); }
   }

-  // CHECK: Non-SPMD MaxThreadsL1 = 32
+  //FIXME: This Non-SPMD kernel will have 32 active threads due to
+  // thread_limit. However, Non-SPMD MaxThreadsL1 is the total number of
+  // threads in block (64 in this case), which translates to worker
+  // threads + WARP_SIZE for Non-SPMD kernels and worker threads for SPMD
+  // kernels. According to the spec, omp_get_max_threads must return the
+  // max active threads possible between the two kernel types.
+
+  // CHECK: Non-SPMD MaxThreadsL1 = 64
   printf("Non-SPMD MaxThreadsL1 = %d\n", MaxThreadsL1);
   // CHECK: Non-SPMD MaxThreadsL2 = 1
   printf("Non-SPMD MaxThreadsL2 = %d\n", MaxThreadsL2);

   // SPMD mode with full runtime
   MaxThreadsL2 = -1;
 #pragma omp target parallel reduction(unique : MaxThreadsL2)
   { MaxThreadsL2 = omp_get_max_threads(); }
(16 lines not shown)