This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions.
Closed, Public

Authored by ABataev on Apr 11 2019, 1:42 PM.

Details

Summary

If a kernel is executed in SPMD mode and an L2+ (second-level or deeper) parallel-for
region with dynamic scheduling is executed, the dynamic scheduling functions are
called. These functions expect full runtime support, but SPMD kernels may be executed
without the full runtime, which leads to a runtime crash in the compiled program.
This patch fixes the problem and also fixes the handling of the parallelism level in
SPMD mode, which is required as part of this change.
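
For illustration, here is a minimal sketch (an assumed example, not code from the patch) of the kind of code affected: a combined target construct that is typically compiled as an SPMD kernel, with a second-level parallel-for loop that uses dynamic scheduling.

```c
#include <stdio.h>

int main(void) {
  int out[10] = {0};
  // The combined construct below is typically compiled as an SPMD kernel,
  // possibly without the full runtime.
  #pragma omp target teams distribute parallel for map(tofrom : out)
  for (int i = 0; i < 10; ++i) {
    int acc = 0;
    // Second-level (L2) parallel region with dynamic scheduling: the
    // dispatch functions it calls expected full runtime support, which is
    // what caused the crash this patch fixes.
    #pragma omp parallel for schedule(dynamic) reduction(+ : acc)
    for (int j = 0; j < 100; ++j)
      acc += 1;
    out[i] = acc;
  }
  printf("out[0] = %d\n", out[0]); // expected: 100
  return 0;
}
```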

Event Timeline

ABataev created this revision. Apr 11 2019, 1:42 PM
Herald added a project: Restricted Project. Apr 11 2019, 1:42 PM
This revision is now accepted and ready to land. Apr 11 2019, 2:15 PM

LGTM

Is there a way we can test this?

> LGTM
>
> Is there a way we can test this?

It is tested in the internal testsuite; I don't know when it is going to be committed to trunk.

> LGTM
>
> Is there a way we can test this?
>
> It is tested in the internal testsuite; I don't know when it is going to be committed to trunk.

There are two problems:

  1. The internal testsuite did run before this patch, right? So it is unclear what that means.
  2. Changes done upstream might break this without us noticing for a while and without being able to know a priori.

Why don't we have unit tests here or in the llvm-test suite?

> LGTM
>
> Is there a way we can test this?
>
> It is tested in the internal testsuite; I don't know when it is going to be committed to trunk.
>
> There are two problems:
>
>   1. The internal testsuite did run before this patch, right? So it is unclear what that means.

No, the tests ran with this patch.

>   2. Changes done upstream might break this without us noticing for a while and without being able to know a priori.

We test everything before doing any changes.

> Why don't we have unit tests here or in the llvm-test suite?

Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests. I know that someone worked on the target-based testsuite, but I don't know when it is going to be ready.

> LGTM
>
> Is there a way we can test this?
>
> It is tested in the internal testsuite; I don't know when it is going to be committed to trunk.
>
> There are two problems:
>
>   1. The internal testsuite did run before this patch, right? So it is unclear what that means.
>
> No, the tests ran with this patch.

The internal test suite did run before this commit as well, even though the code was buggy, so it is unclear to me what "the tests" therefore means.
Now you might have added some tests which nobody else can check, while we just see some changes that add "+ 1".
How should one review this? Just as important, how should one now ensure this doesn't break in the future?

>   2. Changes done upstream might break this without us noticing for a while and without being able to know a priori.
>
> We test everything before doing any changes.

The problem is not that you do not test everything; the problem is that the rest of us cannot.

> Why don't we have unit tests here or in the llvm-test suite?
>
> Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests.

We write Google unit tests for various components; maybe something like that works here as well. A test that makes sure the initial output of omp_get_level is now 1 would then be great. It is far from trivial to determine that omp_get_level, if called with an uninitialized device RT, should return parallelLevel + 1.
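
As a rough sketch of such a check (an assumed example, not part of this patch; the harness lines of whatever test suite hosts it are omitted):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
  int level = -1;
  // The combined construct is typically lowered to an SPMD kernel.
  #pragma omp target teams distribute parallel for map(tofrom : level)
  for (int i = 0; i < 1; ++i) {
    // We are inside one parallel region here, so omp_get_level() should
    // report 1 (i.e. parallelLevel + 1), even when the lightweight SPMD
    // runtime is used instead of the full runtime.
    level = omp_get_level();
  }
  printf("omp_get_level in target region: %d\n", level); // expected: 1
  return 0;
}
```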

> I know that someone worked on the target-based testsuite, but I don't know when it is going to be ready.

There is the V&V test suite: https://crpl.cis.udel.edu/ompvvsollve/
We could also add OpenMP target tests to the LLVM Test Suite and run them if people define CMake flags.

> LGTM
>
> Is there a way we can test this?
>
> It is tested in the internal testsuite; I don't know when it is going to be committed to trunk.
>
> There are two problems:
>
>   1. The internal testsuite did run before this patch, right? So it is unclear what that means.
>
> No, the tests ran with this patch.
>
> The internal test suite did run before this commit as well, even though the code was buggy, so it is unclear to me what "the tests" therefore means.
> Now you might have added some tests which nobody else can check, while we just see some changes that add "+ 1".
> How should one review this? Just as important, how should one now ensure this doesn't break in the future?
>
>   2. Changes done upstream might break this without us noticing for a while and without being able to know a priori.
>
> We test everything before doing any changes.
>
> The problem is not that you do not test everything; the problem is that the rest of us cannot.
>
> Why don't we have unit tests here or in the llvm-test suite?
>
> Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests.
>
> We write Google unit tests for various components; maybe something like that works here as well. A test that makes sure the initial output of omp_get_level is now 1 would then be great. It is far from trivial to determine that omp_get_level, if called with an uninitialized device RT, should return parallelLevel + 1.
>
> I know that someone worked on the target-based testsuite, but I don't know when it is going to be ready.
>
> There is the V&V test suite: https://crpl.cis.udel.edu/ompvvsollve/
> We could also add OpenMP target tests to the LLVM Test Suite and run them if people define CMake flags.

Actually, it was the testsuite that revealed the problems with the runtime, but only after some changes I made in the compiler to run more constructs in SPMD mode. Before that, they were all executed in non-SPMD mode and the problem was masked. And I don't see a problem here, since exhaustive testing is impossible in principle.
If you have a testsuite and are ready to prepare and send an RFC, solve the problems with the license, organize it, set up buildbots, and provide support, then go ahead. We can do everything, but it requires a lot of time. I agree that we need target-specific testing.

> Why don't we have unit tests here or in the llvm-test suite?
>
> Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests. I know that someone worked on the target-based testsuite, but I don't know when it is going to be ready.

We have https://reviews.llvm.org/D51687; it's just that nobody bothers to add tests.

> ...
>
> Why don't we have unit tests here or in the llvm-test suite?
>
> Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests.
>
> We write Google unit tests for various components; maybe something like that works here as well. A test that makes sure the initial output of omp_get_level is now 1 would then be great. It is far from trivial to determine that omp_get_level, if called with an uninitialized device RT, should return parallelLevel + 1.
>
> I know that someone worked on the target-based testsuite, but I don't know when it is going to be ready.
>
> There is the V&V test suite: https://crpl.cis.udel.edu/ompvvsollve/
> We could also add OpenMP target tests to the LLVM Test Suite and run them if people define CMake flags.
>
> Actually, it was the testsuite that revealed the problems with the runtime, but only after some changes I made in the compiler to run more constructs in SPMD mode. Before that, they were all executed in non-SPMD mode and the problem was masked. And I don't see a problem here, since exhaustive testing is impossible in principle.
> If you have a testsuite and are ready to prepare and send an RFC, solve the problems with the license, organize it, set up buildbots, and provide support, then go ahead. We can do everything, but it requires a lot of time. I agree that we need target-specific testing.

Our general policy is that all commits that can have tests should have tests. We have OpenMP target tests in libomptarget/test -- and given that you've added tests there yourself, I assume that you know this ;) -- plus tests in libomptarget/deviceRTLs/nvptx/test -- although it sounds like this situation can be triggered using portable code, so I'd prefer we add a test in libomptarget/test. Can you please do that?

> ...
>
> Why don't we have unit tests here or in the llvm-test suite?
>
> Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests.
>
> We write Google unit tests for various components; maybe something like that works here as well. A test that makes sure the initial output of omp_get_level is now 1 would then be great. It is far from trivial to determine that omp_get_level, if called with an uninitialized device RT, should return parallelLevel + 1.
>
> I know that someone worked on the target-based testsuite, but I don't know when it is going to be ready.
>
> There is the V&V test suite: https://crpl.cis.udel.edu/ompvvsollve/
> We could also add OpenMP target tests to the LLVM Test Suite and run them if people define CMake flags.
>
> Actually, it was the testsuite that revealed the problems with the runtime, but only after some changes I made in the compiler to run more constructs in SPMD mode. Before that, they were all executed in non-SPMD mode and the problem was masked. And I don't see a problem here, since exhaustive testing is impossible in principle.
> If you have a testsuite and are ready to prepare and send an RFC, solve the problems with the license, organize it, set up buildbots, and provide support, then go ahead. We can do everything, but it requires a lot of time. I agree that we need target-specific testing.
>
> Our general policy is that all commits that can have tests should have tests. We have OpenMP target tests in libomptarget/test -- and given that you've added tests there yourself, I assume that you know this ;) -- plus tests in libomptarget/deviceRTLs/nvptx/test -- although it sounds like this situation can be triggered using portable code, so I'd prefer we add a test in libomptarget/test. Can you please do that?

Sure, if we have a testing infrastructure for this, I'll add the test. I just missed the tests for NVPTX; I will definitely add it.

ABataev updated this revision to Diff 194900. Apr 12 2019, 8:59 AM

Added a test.

> Our general policy is that all commits that can have tests should have tests. We have OpenMP target tests in libomptarget/test -- and given that you've added tests there yourself, I assume that you know this ;) -- plus tests in libomptarget/deviceRTLs/nvptx/test -- although it sounds like this situation can be triggered using portable code, so I'd prefer we add a test in libomptarget/test. Can you please do that?
>
> Sure, if we have a testing infrastructure for this, I'll add the test. I just missed the tests for NVPTX; I will definitely add it.

I think we need to be careful about adding nvptx tests to libomptarget/test: They can be executed using Clang later than 6.0.0, but that version wasn't able to offload to GPUs. Given that the changes are limited to libomptarget-nvptx (because its parallelism is kind of special), I think the new test should go to libomptarget/deviceRTLs/nvptx/test. Just my 2 cents...

> Our general policy is that all commits that can have tests should have tests. We have OpenMP target tests in libomptarget/test -- and given that you've added tests there yourself, I assume that you know this ;) -- plus tests in libomptarget/deviceRTLs/nvptx/test -- although it sounds like this situation can be triggered using portable code, so I'd prefer we add a test in libomptarget/test. Can you please do that?
>
> Sure, if we have a testing infrastructure for this, I'll add the test. I just missed the tests for NVPTX; I will definitely add it.
>
> I think we need to be careful about adding nvptx tests to libomptarget/test: They can be executed using Clang later than 6.0.0, but that version wasn't able to offload to GPUs. Given that the changes are limited to libomptarget-nvptx (because its parallelism is kind of special), I think the new test should go to libomptarget/deviceRTLs/nvptx/test. Just my 2 cents...

I don't see anything in this test that is nvptx specific. Is there something about the semantics that makes it specific to nvptx? We need to build a suite of tests for accelerator offloading in general. We'll have other accelerator backends (e.g., for AMD GPUs), and the offloading tests should apply to them too. Also, I don't understand what Clang 6 support has to do with adding tests... clearly, we'll add tests for bugs, or already have, that will then fail on older versions of Clang. And if a target is not supported, the associated libomptarget-compile-run is ignored, no?

> I think we need to be careful about adding nvptx tests to libomptarget/test: They can be executed using Clang later than 6.0.0, but that version wasn't able to offload to GPUs. Given that the changes are limited to libomptarget-nvptx (because its parallelism is kind of special), I think the new test should go to libomptarget/deviceRTLs/nvptx/test. Just my 2 cents...
>
> I don't see anything in this test that is nvptx specific. Is there something about the semantics that makes it specific to nvptx? We need to build a suite of tests for accelerator offloading in general. We'll have other accelerator backends (e.g., for AMD GPUs), and the offloading tests should apply to them too.

First, I agree that the test is not specific to nvptx and should pass for all targets. However (at least so far) libomptarget/test is for tests that exercise the target-agnostic part of libomptarget, like starting target regions, environment variables, mapping, etc.

> Also, I don't understand what Clang 6 support has to do with adding tests... clearly, we'll add tests for bugs, or already have, that will then fail on older versions of Clang.

In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.

> And if a target is not supported, the associated libomptarget-compile-run is ignored, no?

Yes and no: Yes, if a plugin (meaning the library that interfaces with the vendor libraries for launching target regions) is not compiled, the libomptarget-compile-run for that target expands to echoes. However, there is currently no way of finding out if a given test can actually run (for example, there needs to be a GPU plugged into the system). In theory you could use vendor commands like nvidia-smi to query that, but that still does not guarantee that the tests have a chance to pass (for various reasons; the one that I care most about is that we have our GPUs configured in Exclusive mode, so if there's a process already running on the GPU, all others that try to create a context will get a runtime error), and IMHO it would be a poor development if check-openmp would simply stop working in these cases.
(Side note: CUDA in Clang does the same: they have tests in the test-suite that can actually be run on real hardware; the Clang tests just check the generated code.)

I can see your point that having generic tests live below libomptarget-nvptx is not ideal, but I think it's the best place we have right now given that apparently nobody plans to work on more infrastructure (which is sad).

> I think we need to be careful about adding nvptx tests to libomptarget/test: They can be executed using Clang later than 6.0.0, but that version wasn't able to offload to GPUs. Given that the changes are limited to libomptarget-nvptx (because its parallelism is kind of special), I think the new test should go to libomptarget/deviceRTLs/nvptx/test. Just my 2 cents...
>
> I don't see anything in this test that is nvptx specific. Is there something about the semantics that makes it specific to nvptx? We need to build a suite of tests for accelerator offloading in general. We'll have other accelerator backends (e.g., for AMD GPUs), and the offloading tests should apply to them too.
>
> First, I agree that the test is not specific to nvptx and should pass for all targets. However (at least so far) libomptarget/test is for tests that exercise the target-agnostic part of libomptarget, like starting target regions, environment variables, mapping, etc.
>
> Also, I don't understand what Clang 6 support has to do with adding tests... clearly, we'll add tests for bugs, or already have, that will then fail on older versions of Clang.
>
> In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.

Okay. This patch review is not the right place to discuss the libomptarget support for old Clang versions. We should have a separate thread on this subject.

> And if a target is not supported, the associated libomptarget-compile-run is ignored, no?
>
> Yes and no: Yes, if a plugin (meaning the library that interfaces with the vendor libraries for launching target regions) is not compiled, the libomptarget-compile-run for that target expands to echoes. However, there is currently no way of finding out if a given test can actually run (for example, there needs to be a GPU plugged into the system). In theory you could use vendor commands like nvidia-smi to query that, but that still does not guarantee that the tests have a chance to pass (for various reasons; the one that I care most about is that we have our GPUs configured in Exclusive mode, so if there's a process already running on the GPU, all others that try to create a context will get a runtime error), and IMHO it would be a poor development if check-openmp would simply stop working in these cases.
> (Side note: CUDA in Clang does the same: they have tests in the test-suite that can actually be run on real hardware; the Clang tests just check the generated code.)

What check command do you run to run these nvptx tests?

> I can see your point that having generic tests live below libomptarget-nvptx is not ideal, but I think it's the best place we have right now given that apparently nobody plans to work on more infrastructure (which is sad).

I'm aware of several groups working on different libomptarget plugins and other related things (including my team), so I suspect that reality might not be as sad as you believe. Nevertheless, what infrastructure do we actually want here? Should we have the ability to ask make check-openmp to take a list of targets to use to run all of the offload tests so that the user can specify the names of the targets that will actually work?

In any case, let's move forward with adding this test in that directory, and then we'll address the infrastructure issue as follow-up work.

> What check command do you run to run these nvptx tests?

The target is called check-libomptarget-nvptx and it's not run by check-openmp. The reasoning is basically what I've described in my previous answers, (maybe) some more in the initial revision D51687.

> In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.
>
> Okay. This patch review is not the right place to discuss the libomptarget support for old Clang versions. We should have a separate thread on this subject.
>
> In any case, let's move forward with adding this test in that directory, and then we'll address the infrastructure issue as follow-up work.

So put differently, you're proposing to land this in its current form (which will break for some users, including me) and wait for "somebody" to work on the infrastructure to fix things?

> I'm aware of several groups working on different libomptarget plugins and other related things (including my team), so I suspect that reality might not be as sad as you believe.

Working on internal testing (such as many have) is not the same as having this upstream. That's what I was referring to (sorry if that was ambiguous); I'm sure that many people are working on OpenMP offloading and the related runtime libraries.

> Nevertheless, what infrastructure do we actually want here? Should we have the ability to ask make check-openmp to take a list of targets to use to run all of the offload tests so that the user can specify the names of the targets that will actually work?

In my opinion we need exactly what we have with check-libomptarget-nvptx; maybe it needs to be generalized for future targets. In its easiest setup we would just have a target for each one of them, like the current check-libomptarget-nvptx for Nvidia, check-libomptarget-gcn for AMD, and so on.

ABataev updated this revision to Diff 195177. Apr 15 2019, 7:13 AM

Moved the test to nvptx directory.

> What check command do you run to run these nvptx tests?
>
> The target is called check-libomptarget-nvptx and it's not run by check-openmp. The reasoning is basically what I've described in my previous answers, (maybe) some more in the initial revision D51687.
>
> In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.
>
> Okay. This patch review is not the right place to discuss the libomptarget support for old Clang versions. We should have a separate thread on this subject.
>
> In any case, let's move forward with adding this test in that directory, and then we'll address the infrastructure issue as follow-up work.
>
> So put differently, you're proposing to land this in its current form (which will break for some users, including me) and wait for "somebody" to work on the infrastructure to fix things?

Of course I'm not. I'm proposing that we put the test in the nvptx directory and then address the fact that it should apply to other offloading targets as a follow-up.

> I'm aware of several groups working on different libomptarget plugins and other related things (including my team), so I suspect that reality might not be as sad as you believe.
>
> Working on internal testing (such as many have) is not the same as having this upstream. That's what I was referring to (sorry if that was ambiguous); I'm sure that many people are working on OpenMP offloading and the related runtime libraries.

I know what you meant, but if nothing else, as others want to add other backends, there will be collective work that will need to be done to enable that, including making sure that the testing infrastructure works correctly. Everyone is expected to contribute to basic common infrastructure.

> Nevertheless, what infrastructure do we actually want here? Should we have the ability to ask make check-openmp to take a list of targets to use to run all of the offload tests so that the user can specify the names of the targets that will actually work?
>
> In my opinion we need exactly what we have with check-libomptarget-nvptx; maybe it needs to be generalized for future targets. In its easiest setup we would just have a target for each one of them, like the current check-libomptarget-nvptx for Nvidia, check-libomptarget-gcn for AMD, and so on.

This means that we have tests in the target directories which should apply to all targets, including the host targets, but aren't run for those targets. This test, for example, won't be run against the CPU target configurations, and that's undesirable. I imagine that we actually want something like check-libomptarget with LIBOMPTARGET_TEST_TARGETS=ppc64le,nvptx,gcn,whatever. We can discuss this on a separate thread.

> In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.
>
> Okay. This patch review is not the right place to discuss the libomptarget support for old Clang versions. We should have a separate thread on this subject.
>
> In any case, let's move forward with adding this test in that directory, and then we'll address the infrastructure issue as follow-up work.
>
> So put differently, you're proposing to land this in its current form (which will break for some users, including me) and wait for "somebody" to work on the infrastructure to fix things?
>
> Of course I'm not. I'm proposing that we put the test in the nvptx directory and then address the fact that it should apply to other offloading targets as a follow-up.

Okay, sorry for the wrong interpretation, I misread your last sentence.

So, is the test good, or should I put it into some other directory?

> So, is the test good, or should I put it into some other directory?

@Hahnfeld , libomptarget/deviceRTLs/nvptx/test/parallel works for you for now, correct?

> So, is the test good, or should I put it into some other directory?
>
> @Hahnfeld , libomptarget/deviceRTLs/nvptx/test/parallel works for you for now, correct?

I didn't test locally, but that should be fine, yes.

This revision was automatically updated to reflect the committed changes.