This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions.
ClosedPublic

Authored by ABataev on Apr 11 2019, 1:42 PM.

Details

Summary

If the kernel is executed in SPMD mode and an L2+ parallel-for region
with dynamic scheduling is executed inside it, the dynamic-scheduling
functions are called. They expect full runtime support, but SPMD kernels
may be executed without the full runtime, which leads to a runtime crash
in the compiled program. This patch fixes the problem and also fixes the
handling of the parallelism level in SPMD mode, which is required as
part of this change.

Diff Detail

Repository
rOMP OpenMP

Event Timeline

ABataev created this revision.Apr 11 2019, 1:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 11 2019, 1:42 PM
This revision is now accepted and ready to land.Apr 11 2019, 2:15 PM

LGTM

Is there a way we can test this?

LGTM

Is there a way we can test this?

It is tested in the internal testsuite; I don't know when it is going to be committed to trunk.

LGTM

Is there a way we can test this?

It is tested in the internal testsuite; I don't know when it is going to be committed to trunk.

There are two problems:

  1. The internal testsuite did run before this patch, right? So it is unclear what that means.
  2. Changes done upstream might break this without us noticing for a while and without being able to know a priori.

Why don't we have unit tests here or in the llvm-test suite?

LGTM

Is there a way we can test this?

It is tested in the internal testsuite; I don't know when it is going to be committed to trunk.

There are two problems:

  1. The internal testsuite did run before this patch, right? So it is unclear what that means.

No, the tests ran with this patch.

  1. Changes done upstream might break this without us noticing for a while and without being able to know a priori.

We test everything before doing any changes.

Why don't we have unit tests here or in the llvm-test suite?

Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests. I know that someone worked on a target-based testsuite, but I don't know when it is going to be ready.

LGTM

Is there a way we can test this?

It is tested in the internal testsuite; I don't know when it is going to be committed to trunk.

There are two problems:

  1. The internal testsuite did run before this patch, right? So it is unclear what that means.

No, the tests ran with this patch.

The internal test suite did run before this commit as well, even though it was buggy. It is unclear to me what "the tests" therefore means.
Now you might have added some tests which nobody can check; we just see some changes that add "+ 1".
How should one review this? Similarly important, how should one now ensure this doesn't break in the future?

  1. Changes done upstream might break this without us noticing for a while and without being able to know a priori.

We test everything before doing any changes.

The problem is not that you do not test everything, the problem is that the rest cannot.

Why don't we have unit tests here or in the llvm-test suite?

Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests.

We write Google unit tests for various components; maybe something like that works here as well. A test that makes sure the initial output of omp_get_level is now 1 would then be great. It is far from trivial to determine that omp_get_level, if called with an uninitialized device RT, should return parallelLevel + 1.

I know, that someone worked on the target-based testsuite, but don't know when it is going to be ready.

There is the V&V test suite: https://crpl.cis.udel.edu/ompvvsollve/
We could also add OpenMP target tests into the LLVM Test Suite and run them if people define CMake flags.

LGTM

Is there a way we can test this?

It is tested in the internal testsuite; I don't know when it is going to be committed to trunk.

There are two problems:

  1. The internal testsuite did run before this patch, right? So it is unclear what that means.

No, the tests ran with this patch.

The internal test suite did run before this commit as well, even though it was buggy. It is unclear to me what "the tests" therefore means.
Now you might have added some tests which nobody can check; we just see some changes that add "+ 1".
How should one review this? Similarly important, how should one now ensure this doesn't break in the future?

  1. Changes done upstream might break this without us noticing for a while and without being able to know a priori.

We test everything before doing any changes.

The problem is not that you do not test everything, the problem is that the rest cannot.

Why don't we have unit tests here or in the llvm-test suite?

Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests.

We write Google unit tests for various components; maybe something like that works here as well. A test that makes sure the initial output of omp_get_level is now 1 would then be great. It is far from trivial to determine that omp_get_level, if called with an uninitialized device RT, should return parallelLevel + 1.

I know, that someone worked on the target-based testsuite, but don't know when it is going to be ready.

There is the V&V test suite: https://crpl.cis.udel.edu/ompvvsollve/
We could also add OpenMP target tests into the LLVM Test Suite and run them if people define CMake flags.

Actually, it was the testsuite that revealed the problems with the runtime, but only after some changes I made in the compiler to run more constructs in SPMD mode. Before that, they were all executed in non-SPMD mode and the problem was masked. And I don't see a problem here, since exhaustive testing is impossible in principle.
If you have a testsuite and are ready to prepare and send an RFC, solve the license problems, organize it, set up buildbots, and provide support, then go ahead. We can do everything, but it requires a lot of time. I agree that we need target-specific testing.

Why don't we have unit tests here or in the llvm-test suite?

Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests. I know that someone worked on a target-based testsuite, but I don't know when it is going to be ready.

We have https://reviews.llvm.org/D51687; it's just that nobody bothers to add tests.

...

Why don't we have unit tests here or in the llvm-test suite?

Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests.

We write Google unit tests for various components; maybe something like that works here as well. A test that makes sure the initial output of omp_get_level is now 1 would then be great. It is far from trivial to determine that omp_get_level, if called with an uninitialized device RT, should return parallelLevel + 1.

I know, that someone worked on the target-based testsuite, but don't know when it is going to be ready.

There is the V&V test suite: https://crpl.cis.udel.edu/ompvvsollve/
We could also add OpenMP target tests into the LLVM Test Suite and run them if people define CMake flags.

Actually, it was the testsuite that revealed the problems with the runtime, but only after some changes I made in the compiler to run more constructs in SPMD mode. Before that, they were all executed in non-SPMD mode and the problem was masked. And I don't see a problem here, since exhaustive testing is impossible in principle.
If you have a testsuite and are ready to prepare and send an RFC, solve the license problems, organize it, set up buildbots, and provide support, then go ahead. We can do everything, but it requires a lot of time. I agree that we need target-specific testing.

Our general policy is that all commits that can have tests should have tests. We have OpenMP target tests in libomptarget/test -- and given that you've added tests there yourself, I assume that you know this ;) -- plus tests in libomptarget/deviceRTLs/nvptx/test. It sounds like this situation can be triggered using portable code, though, so I'd prefer we add a test in libomptarget/test. Can you please do that?

...

Why don't we have unit tests here or in the llvm-test suite?

Because this is a library. Do you have an idea how to write unit tests for it? It can be tested only with executable tests.

We write Google unit tests for various components; maybe something like that works here as well. A test that makes sure the initial output of omp_get_level is now 1 would then be great. It is far from trivial to determine that omp_get_level, if called with an uninitialized device RT, should return parallelLevel + 1.

I know, that someone worked on the target-based testsuite, but don't know when it is going to be ready.

There is the V&V test suite: https://crpl.cis.udel.edu/ompvvsollve/
We could also add OpenMP target tests into the LLVM Test Suite and run them if people define CMake flags.

Actually, it was the testsuite that revealed the problems with the runtime, but only after some changes I made in the compiler to run more constructs in SPMD mode. Before that, they were all executed in non-SPMD mode and the problem was masked. And I don't see a problem here, since exhaustive testing is impossible in principle.
If you have a testsuite and are ready to prepare and send an RFC, solve the license problems, organize it, set up buildbots, and provide support, then go ahead. We can do everything, but it requires a lot of time. I agree that we need target-specific testing.

Our general policy is that all commits that can have tests should have tests. We have OpenMP target tests in libomptarget/test -- and given that you've added tests there yourself, I assume that you know this ;) -- plus tests in libomptarget/deviceRTLs/nvptx/test. It sounds like this situation can be triggered using portable code, though, so I'd prefer we add a test in libomptarget/test. Can you please do that?

Sure, if we have a testing infrastructure for this, I'll add the test. I just missed the tests for NVPTX; I will definitely add one.

ABataev updated this revision to Diff 194900.Apr 12 2019, 8:59 AM

Added a test.

Our general policy is that all commits that can have tests should have tests. We have OpenMP target tests in libomptarget/test -- and given that you've added tests there yourself, I assume that you know this ;) -- plus tests in libomptarget/deviceRTLs/nvptx/test. It sounds like this situation can be triggered using portable code, though, so I'd prefer we add a test in libomptarget/test. Can you please do that?

Sure, if we have a testing infrastructure for this, I'll add the test. I just missed the tests for NVPTX; I will definitely add one.

I think we need to be careful about adding nvptx tests to libomptarget/test: they can be executed using Clang versions as old as 6.0.0, but that version wasn't able to offload to GPUs. Given that the changes are limited to libomptarget-nvptx (because its parallelism is kind of special), I think the new test should go into libomptarget/deviceRTLs/nvptx/test. Just my 2 cents...

Our general policy is that all commits that can have tests should have tests. We have OpenMP target tests in libomptarget/test -- and given that you've added tests there yourself, I assume that you know this ;) -- plus tests in libomptarget/deviceRTLs/nvptx/test. It sounds like this situation can be triggered using portable code, though, so I'd prefer we add a test in libomptarget/test. Can you please do that?

Sure, if we have a testing infrastructure for this, I'll add the test. I just missed the tests for NVPTX; I will definitely add one.

I think we need to be careful about adding nvptx tests to libomptarget/test: they can be executed using Clang versions as old as 6.0.0, but that version wasn't able to offload to GPUs. Given that the changes are limited to libomptarget-nvptx (because its parallelism is kind of special), I think the new test should go into libomptarget/deviceRTLs/nvptx/test. Just my 2 cents...

I don't see anything in this test that is nvptx specific. Is there something about the semantics that makes it specific to nvptx? We need to build up a suite of tests for accelerator offloading in general. We'll have other accelerator backends (e.g., for AMD GPUs), and the offloading tests should apply to them too. Also, I don't understand what Clang 6 support has to do with adding tests... clearly, we'll add tests for bugs, or already have, that will then fail on older versions of Clang. And if a target is not supported, the associated libomptarget-compile-run is ignored, no?

I think we need to be careful about adding nvptx tests to libomptarget/test: they can be executed using Clang versions as old as 6.0.0, but that version wasn't able to offload to GPUs. Given that the changes are limited to libomptarget-nvptx (because its parallelism is kind of special), I think the new test should go into libomptarget/deviceRTLs/nvptx/test. Just my 2 cents...

I don't see anything in this test that is nvptx specific. Is there something about the semantics that makes it specific to nvptx? We need to build up a suite of tests for accelerator offloading in general. We'll have other accelerator backends (e.g., for AMD GPUs), and the offloading tests should apply to them too.

First, I agree that the test is not specific to nvptx and should pass for all targets. However, (at least so far) libomptarget/test is for tests that exercise the target-agnostic part of libomptarget, such as starting target regions, environment variables, mapping, etc.

Also, I don't understand what Clang 6 support has to do with adding tests... clearly, we'll add tests for bugs, or already have, that will then fail on older versions of Clang.

In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.
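For illustration, the directive in question could look like this in a test's header. The UNSUPPORTED directive itself is standard lit; the clang-6/clang-7 feature names and the RUN substitution are hypothetical and would need to match what the local lit configuration actually defines:

```c
// RUN: %libomptarget-compile-run-and-check-nvptx64-nvidia-cuda
// UNSUPPORTED: clang-6, clang-7
```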

And if a target is not supported, the associated libomptarget-compile-run is ignored, no?

Yes and no: yes, if a plugin (meaning the library that interfaces with the vendor libraries for launching target regions) is not compiled, the libomptarget-compile-run for that target expands to echos. However, there is currently no way of finding out if a given test can actually run (for example, there needs to be a GPU plugged into the system). In theory you could use vendor commands like nvidia-smi to query that, but that still does not guarantee that the tests have a chance to pass (for various reasons; the one that I care most about is that we have our GPUs configured in Exclusive mode, so if there's a process already running on the GPU, all others that try to create a context will get a runtime error), and IMHO it would be a poor development if check-openmp simply stopped working in these cases.
(Side note: CUDA in Clang does the same, they have tests in test-suite that can actually be run on real hardware; the Clang tests just check the generated code.)

I can see your point that having generic tests live below libomptarget-nvptx is not ideal, but I think it's the best place we have right now given that apparently nobody plans to work on more infrastructure (which is sad).

I think we need to be careful about adding nvptx tests to libomptarget/test: They can be executed using Clang later than 6.0.0, but that version wasn't able to offload to GPUs. Given that the changes are limited to libomptarget-nvptx (because its parallelism is kind of special), I think the new test should go to libomptarget/deviceRTLs/nvptx/test. Just my 2 cents...

I don't see anything in this test that is nvptx specific. Is there something about the semantics that makes it specific to nvptx? We need to build up a suite of tests for accelerator offloading in general. We'll have other accelerator backends (e.g., for AMD GPUs), and the offloading tests should apply to them too.

First, I agree that the test is not specific to nvptx and should pass for all targets. However, (at least so far) libomptarget/test is for tests that exercise the target-agnostic part of libomptarget, such as starting target regions, environment variables, mapping, etc.

Also, I don't understand what Clang 6 support has to do with adding tests... clearly, we'll add tests for bugs, or already have, that will then fail on older versions of Clang.

In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.

Okay. This patch review is not the right place to discuss the libomptarget support for old Clang versions. We should have a separate thread on this subject.

And if a target is not supported, the associated libomptarget-compile-run is ignored, no?

Yes and no: yes, if a plugin (meaning the library that interfaces with the vendor libraries for launching target regions) is not compiled, the libomptarget-compile-run for that target expands to echos. However, there is currently no way of finding out if a given test can actually run (for example, there needs to be a GPU plugged into the system). In theory you could use vendor commands like nvidia-smi to query that, but that still does not guarantee that the tests have a chance to pass (for various reasons; the one that I care most about is that we have our GPUs configured in Exclusive mode, so if there's a process already running on the GPU, all others that try to create a context will get a runtime error), and IMHO it would be a poor development if check-openmp simply stopped working in these cases.
(Side note: CUDA in Clang does the same, they have tests in test-suite that can actually be run on real hardware; the Clang tests just check the generated code.)

What check command do you run to run these nvptx tests?

I can see your point that having generic tests live below libomptarget-nvptx is not ideal, but I think it's the best place we have right now given that apparently nobody plans to work on more infrastructure (which is sad).

I'm aware of several groups working on different libomptarget plugins and other related things (including my team), so I suspect that reality might not be as sad as you believe. Nevertheless, what infrastructure do we actually want here? Should we have the ability to ask make check-openmp to take a list of targets to use to run all of the offload tests so that the user can specify the names of the targets that will actually work?

In any case, let's move forward with adding this test in that directory, and then we'll address the infrastructure issue as follow-up work.

What check command do you run to run these nvptx tests?

The target is called check-libomptarget-nvptx and it's not run by check-openmp. The reasoning is basically what I've described in my previous answers, (maybe) some more in the initial revision D51687.

In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.

Okay. This patch review is not the right place to discuss the libomptarget support for old Clang versions. We should have a separate thread on this subject.

In any case, let's move forward with adding this test in that directory, and then we'll address the infrastructure issue as follow-up work.

So put differently, you're proposing to land this in its current form (which will break for some users, including me) and wait for "somebody" to work on the infrastructure to fix things?

I'm aware of several groups working on different libomptarget plugins and other related things (including my team), so I suspect that reality might not be as sad as you believe.

Working on internal testing (such as many have) is not the same as having this upstream. That's what I was referring to (sorry if that was ambiguous), I'm sure that many people are working on OpenMP offloading and the related runtime libraries.

Nevertheless, what infrastructure do we actually want here? Should we have the ability to ask make check-openmp to take a list of targets to use to run all of the offload tests so that the user can specify the names of the targets that will actually work?

In my opinion we need exactly what we have with check-libomptarget-nvptx, maybe it needs to be generalized for future targets. In its easiest setup we would just have a target for each one of them, like the current name check-libomptarget-nvptx for Nvidia, check-libomptarget-gcn for AMD and so on.

ABataev updated this revision to Diff 195177.Apr 15 2019, 7:13 AM

Moved the test to the nvptx directory.

What check command do you run to run these nvptx tests?

The target is called check-libomptarget-nvptx and it's not run by check-openmp. The reasoning is basically what I've described in my previous answers, (maybe) some more in the initial revision D51687.

In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.

Okay. This patch review is not the right place to discuss the libomptarget support for old Clang versions. We should have a separate thread on this subject.

In any case, let's move forward with adding this test in that directory, and then we'll address the infrastructure issue as follow-up work.

So put differently, you're proposing to land this in its current form (which will break for some users, including me) and wait for "somebody" to work on the infrastructure to fix things?

Of course I'm not. I'm proposing that we put the test in the nvptx directory and then address the fact that it should apply to other offloading targets as follow up.

I'm aware of several groups working on different libomptarget plugins and other related things (including my team), so I suspect that reality might not be as sad as you believe.

Working on internal testing (such as many have) is not the same as having this upstream. That's what I was referring to (sorry if that was ambiguous), I'm sure that many people are working on OpenMP offloading and the related runtime libraries.

I know what you meant, but if nothing else, as others want to add other backends, there will be collective work that will need to be done to enable that, including making sure that the testing infrastructure works correctly. Everyone is expected to contribute to basic common infrastructure.

Nevertheless, what infrastructure do we actually want here? Should we have the ability to ask make check-openmp to take a list of targets to use to run all of the offload tests so that the user can specify the names of the targets that will actually work?

In my opinion we need exactly what we have with check-libomptarget-nvptx, maybe it needs to be generalized for future targets. In its easiest setup we would just have a target for each one of them, like the current name check-libomptarget-nvptx for Nvidia, check-libomptarget-gcn for AMD and so on.

This means that we have tests in the target directories which should apply to all targets, including the host targets, but aren't run for those targets. This test, for example, won't be run against the CPU target configurations, and that's undesirable. I imagine that we actually want something like check-libomptarget LIBOMPTARGET_TEST_TARGETS=ppc64le,nvptx,gcn,whatever. We can discuss this on a separate thread.

In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.

Okay. This patch review is not the right place to discuss the libomptarget support for old Clang versions. We should have a separate thread on this subject.

In any case, let's move forward with adding this test in that directory, and then we'll address the infrastructure issue as follow-up work.

So put differently, you're proposing to land this in its current form (which will break for some users, including me) and wait for "somebody" to work on the infrastructure to fix things?

Of course I'm not. I'm proposing that we put the test in the nvptx directory and then address the fact that it should apply to other offloading targets as follow up.

Okay, sorry for the wrong interpretation, I misread your last sentence.

So, is the test good, or should I put it into some other directory?

So, is the test good, or should I put it into some other directory?

@Hahnfeld, libomptarget/deviceRTLs/nvptx/test/parallel works for you for now, correct?

So, is the test good, or should I put it into some other directory?

@Hahnfeld, libomptarget/deviceRTLs/nvptx/test/parallel works for you for now, correct?

I didn't test locally, but that should be fine, yes.

This revision was automatically updated to reflect the committed changes.