Page MenuHomePhabricator

[OpenMP] Introduce kernel environment

Authored by tianshilei1992 on Jan 25 2023, 11:22 AM.



This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like __kernel_name_exec_mode. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jdoerfert added inline comments.Jan 25 2023, 12:04 PM

Unused for now.

tianshilei1992 marked 6 inline comments as done.

rebase, add more code, fix comments

tianshilei1992 edited the summary of this revision. (Show Details)Jan 25 2023, 3:54 PM
tianshilei1992 added inline comments.

This will be removed properly later.

jdoerfert added inline comments.Jan 25 2023, 5:28 PM

This looks wrong. We want either flag to allow us to remove nested parallelism handling. So
__omp_rtl_assume_no_nested_parallelism is good enough, and
!state::getKernelEnvironment().Configuration.MayUseNestedParallelism is good enough.

tianshilei1992 added inline comments.Jan 25 2023, 5:31 PM

Yeah, since I already updated OpenMPOpt.cpp, __omp_rtl_assume_no_nested_parallelism will not be generated.

rebase, fix comments, added more code.

features are finished now. time to move to tests.

fix bugs and add plugin support

rebase and fix problems

rebase and fix bugs

rebase and fix bugs


These changes in old plugins will be gone because the old plugins will be removed.

tianshilei1992 retitled this revision from [WIP][OpenMP] Introduce kernel argument to [OpenMP] Introduce kernel environment.Feb 1 2023, 9:05 AM
tianshilei1992 edited the summary of this revision. (Show Details)
jdoerfert added inline comments.Feb 14 2023, 10:32 PM

That is not the kernel, at least not when clang emits a debug wrapper as part of -g, I think.
We have one workaround for this problem already but we need to provide a generic way. I'll add something.


We can't have it constant as a whole. We need a constant part and a non-constant part. Let's add them like that from the very beginning.
I'd suggest we have a third struct type that is references from the kernelenv. It contains for now only the Level value, but we can/will add more things. It's an independent global such that we can make the kernelenv constant.


Maybe close the namespace here.


Shouldn't we always have one here?


This should happen above in the initializer. We should assume no-nested-parallelism and go back on that assumption if need be.


And here you need to check nested-parallelism then too.


__omp_rtl_assume_no_nested_parallelism is generated by clang, the value depends on -fopenmp-assume-no-nested-parallelism

tianshilei1992 marked 7 inline comments as done.

rebase and fix comments


This doesn't work as expected. Need to figure it out.

fix the WA, rebase, and update some tests

tianshilei1992 edited the summary of this revision. (Show Details)Feb 17 2023, 8:23 PM
jdoerfert accepted this revision.Feb 21 2023, 11:02 AM



Typo: tt

583 ↗(On Diff #498559)

We used to default to SPMD w/o a _exec_mode. Unsure if this will cause problems. I am inclined to allow missing _kernel_env as well and use SPMD then.

This revision is now accepted and ready to land.Feb 21 2023, 11:02 AM

rebase, update tests, fix comments

tianshilei1992 marked 2 inline comments as done.Feb 21 2023, 7:19 PM
tianshilei1992 added inline comments.
583 ↗(On Diff #498559)

Now we need the global variable to call target_init at runtime, so missing it sounds like a bug. I'm not sure if we would optimize the symbol out, but I kinda doubt it.

Pass two benchmarks: XSBench and RSBench.
The patch will break the old plugins as well. I'm wondering we might want to land it after D142820.

jdoerfert added inline comments.Feb 22 2023, 9:12 AM
583 ↗(On Diff #498559)

Missing used to mean, and does, that it's not a "target region" kernel. So it might be a con/de-structor or a kernel coming from elsewhere, e.g., CUDA. We should continue that behavior I think.

tianshilei1992 marked 2 inline comments as done.

rebase and fix comments

update tests in LLVM

Only llvm/test/Transforms/OpenMP/spmdization_constant_prop.ll is left.

update more tests

add the last test

This revision was landed with ongoing or failed builds.Apr 22 2023, 5:47 PM
This revision was automatically updated to reflect the committed changes.

@tianshilei1992 this seems to have broken the amdgpu buildbot, could you please address quickly ?
or revert and fix ?

tianshilei1992 reopened this revision.Apr 22 2023, 5:58 PM

@tianshilei1992 this seems to have broken the amdgpu buildbot, could you please address quickly ?
or revert and fix ?


This revision is now accepted and ready to land.Apr 22 2023, 5:58 PM

Fix two test failres. It looks like not working on AMDGPU because of the global
read on the host is not correct. Needs to investigate more.

tianshilei1992 added inline comments.May 1 2023, 3:14 PM
11 ↗(On Diff #516110)

@arsenm Does AMDGPU support a global struct with constant initializer? I currently encountered an error that, the generated IR has correct initializer, but when I read from the global struct from the image (ELF), the corresponding memory looks pretty random. I just need one of the first three i8 values.