The OpenMP device runtime needs to support the OpenMP standard. However,
constructs like nested parallelism are very uncommon in real applications,
yet they add complexity to the runtime that is sometimes difficult to
optimize out. As a stop-gap for performance, we should supply an argument
that selectively disables this feature. This patch adds the
-fopenmp-assume-no-nested-parallelism argument, which explicitly
disables the use of nested parallelism in OpenMP.
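As a hedged illustration of what the flag rules out, the sketch below shows two versions of the same 2-D scaling loop. The function names, array sizes, and the `verify_scaled` helper are hypothetical. `scale_nested` opens a parallel region inside another parallel region, which is the pattern the flag assumes never occurs. `scale_flat` uses `collapse(2)` to expose the same iterations through a single level of parallelism, so it stays valid under the assumption. Without `-fopenmp` the pragmas are ignored and both functions run serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4
#define COLS 8

/* Two levels of parallelism: the inner `parallel for` opens a new parallel
   region inside the outer one. Code like this violates the assumption made
   by -fopenmp-assume-no-nested-parallelism. */
void scale_nested(double *a, double s) {
#pragma omp target parallel for map(tofrom : a[0 : ROWS * COLS])
  for (int i = 0; i < ROWS; ++i) {
#pragma omp parallel for
    for (int j = 0; j < COLS; ++j)
      a[i * COLS + j] *= s;
  }
}

/* Same computation with a single level of parallelism: collapse(2) merges
   both loops into one iteration space, so no parallel region is nested. */
void scale_flat(double *a, double s) {
#pragma omp target teams distribute parallel for collapse(2) \
    map(tofrom : a[0 : ROWS * COLS])
  for (int i = 0; i < ROWS; ++i)
    for (int j = 0; j < COLS; ++j)
      a[i * COLS + j] *= s;
}

/* Hypothetical check: assumes the array was initialised to a[k] = k before
   scaling, and asserts that every element now equals k * s. */
void verify_scaled(const double *a, double s) {
  for (size_t k = 0; k < ROWS * COLS; ++k)
    assert(a[k] == (double)k * s);
}
```

When offloading, this would be built with something like `clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-assume-no-nested-parallelism`; only `scale_flat` is safe to compile that way.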
Details
Diff Detail
Repository: rG LLVM Github Monorepo
Unit Tests
Time | Test
---|---
2,240 ms | x64 debian > Clang.Driver::amdgpu-openmp-toolchain-new.c
Event Timeline
This looks good, but what happens when the user accidentally adds a nested parallel region while this option is turned on? Do we get serial (correct) execution?
With the code as it is, the runtime will simply ignore the level and continue executing. This will probably be broken, as we don't adjust some of the other state variables for this case. Rather than reducing nested regions to a single thread, the flag asserts that nested parallelism will not be used in any capacity. We could potentially make the serialized behavior work, but it would be more complicated. There is an assertion that fires if nested parallelism is attempted while it is disabled, but it is only checked if the user enables assertions via -fopenmp-debug.
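To make the difference between "serialize the nested region" and "assume there is none" concrete, here is a toy model of the behavior described in the reply. It is entirely hypothetical: the `toy_runtime` struct and the `toy_parallel_enter`/`toy_parallel_exit` functions are illustrative names, not code from the OpenMP device runtime. Level tracking is skipped under the assumption, and the nesting check only runs when debug checks are enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; names and fields are illustrative only and are
   not taken from the OpenMP device runtime. */
typedef struct {
  bool assume_no_nested; /* models -fopenmp-assume-no-nested-parallelism */
  bool debug_checks;     /* models assertions enabled via -fopenmp-debug */
  int level;             /* current parallel nesting level */
  int violations;        /* nested regions caught by the debug check */
} toy_runtime;

/* Enter a parallel region and return the level it runs at. */
int toy_parallel_enter(toy_runtime *rt) {
  if (rt->assume_no_nested) {
    /* The check only exists when debug checks are on. A real runtime would
       assert here; the toy records the violation so it can be inspected. */
    if (rt->debug_checks && rt->level > 0)
      rt->violations++;
    /* The level is not tracked: every region believes it is at level 1. */
    rt->level = 1;
    return rt->level;
  }
  return ++rt->level;
}

/* Leave a parallel region. Without level tracking, leaving an inner region
   also resets the state the outer region relies on. */
void toy_parallel_exit(toy_runtime *rt) {
  rt->level = rt->assume_no_nested ? 0 : rt->level - 1;
}
```

In the model, a nested region under the assumption runs as if it were at level 1, and exiting it leaves the still-active outer region with a level of 0. This mirrors, in simplified form, why the reply expects the ignored level to leave other state inconsistent.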