This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Introduce kernel environment
ClosedPublic

Authored by tianshilei1992 on Jan 25 2023, 11:22 AM.

Details

Summary

This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like __kernel_name_exec_mode. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depend on D155886.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2023, 11:22 AM
tianshilei1992 requested review of this revision.Jan 25 2023, 11:22 AM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript
arsenm added a subscriber: arsenm.Jan 25 2023, 11:29 AM

Description not clear

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
3936–3938

ConstantExpr::getAddrSpaceCast

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
3525–3532

Optimizations should be a separate change

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
3525–3532

That's gonna break the tests because the interfaces between compiler and runtime are changed.

Link the old review too. Pre-commit, w/o review, the file rename and include changes.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
321

Do we need the offset parts? I think we should only work with field numbers not offsets.

openmp/libomptarget/DeviceRTL/include/Configuration.h
19 ↗(On Diff #492197)

Precommit.

24 ↗(On Diff #492197)

Unused for now.

openmp/libomptarget/DeviceRTL/src/Configuration.cpp
26

Keep it for now. Makes the transition easier. Use an || below.

tianshilei1992 marked 6 inline comments as done.

rebase, add more code, fix comments

tianshilei1992 edited the summary of this revision. (Show Details)Jan 25 2023, 3:54 PM
tianshilei1992 added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
2643

This will be removed properly later.

jdoerfert added inline comments.Jan 25 2023, 5:28 PM
openmp/libomptarget/DeviceRTL/src/Configuration.cpp
57

This looks wrong. We want either flag to allow us to remove nested parallelism handling. So
__omp_rtl_assume_no_nested_parallelism is good enough, and
!state::getKernelEnvironment().Configuration.MayUseNestedParallelism is good enough.

tianshilei1992 added inline comments.Jan 25 2023, 5:31 PM
openmp/libomptarget/DeviceRTL/src/Configuration.cpp
57

Yeah, since I already updated OpenMPOpt.cpp, __omp_rtl_assume_no_nested_parallelism will not be generated.

rebase, fix comments, added more code.

features are finished now. time to move to tests.

fix bugs and add plugin support

rebase and fix problems

rebase and fix bugs

rebase and fix bugs

openmp/libomptarget/plugins/cuda/src/rtl.cpp
26

These changes in old plugins will be gone because the old plugins will be removed.

tianshilei1992 retitled this revision from [WIP][OpenMP] Introduce kernel argument to [OpenMP] Introduce kernel environment.Feb 1 2023, 9:05 AM
tianshilei1992 edited the summary of this revision. (Show Details)
jdoerfert added inline comments.Feb 14 2023, 10:32 PM
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
3919

That is not the kernel, at least not when clang emits a debug wrapper as part of -g, I think.
We have one workaround for this problem already but we need to provide a generic way. I'll add something.

3931

We can't have it constant as a whole. We need a constant part and a non-constant part. Let's add them like that from the very beginning.
I'd suggest we have a third struct type that is references from the kernelenv. It contains for now only the Level value, but we can/will add more things. It's an independent global such that we can make the kernelenv constant.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
212

Maybe close the namespace here.

3748

Shouldn't we always have one here?

3755

This should happen above in the initializer. We should assume no-nested-parallelism and go back on that assumption if need be.

4473

And here you need to check nested-parallelism then too.

openmp/libomptarget/DeviceRTL/src/Configuration.cpp
57

__omp_rtl_assume_no_nested_parallelism is generated by clang, the value depends on -fopenmp-assume-no-nested-parallelism

tianshilei1992 marked 7 inline comments as done.

rebase and fix comments

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
3923

This doesn't work as expected. Need to figure it out.

fix the WA, rebase, and update some tests

tianshilei1992 edited the summary of this revision. (Show Details)Feb 17 2023, 8:23 PM
jdoerfert accepted this revision.Feb 21 2023, 11:02 AM

LG

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
3917

Typo: tt

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
596

We used to default to SPMD w/o a _exec_mode. Unsure if this will cause problems. I am inclined to allow missing _kernel_env as well and use SPMD then.

This revision is now accepted and ready to land.Feb 21 2023, 11:02 AM

rebase, update tests, fix comments

tianshilei1992 marked 2 inline comments as done.Feb 21 2023, 7:19 PM
tianshilei1992 added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
596

Now we need the global variable to call target_init at runtime, so missing it sounds like a bug. I'm not sure if we would optimize the symbol out, but I kinda doubt it.

Pass two benchmarks: XSBench and RSBench.
The patch will break the old plugins as well. I'm wondering we might want to land it after D142820.

jdoerfert added inline comments.Feb 22 2023, 9:12 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
596

Missing used to mean, and does, that it's not a "target region" kernel. So it might be a con/de-structor or a kernel coming from elsewhere, e.g., CUDA. We should continue that behavior I think.

tianshilei1992 marked 2 inline comments as done.

rebase and fix comments

update tests in LLVM

Only llvm/test/Transforms/OpenMP/spmdization_constant_prop.ll is left.

update more tests

add the last test

This revision was landed with ongoing or failed builds.Apr 22 2023, 5:47 PM
This revision was automatically updated to reflect the committed changes.

@tianshilei1992 this seems to have broken the amdgpu buildbot, could you please address quickly ?
or revert and fix ?

tianshilei1992 reopened this revision.Apr 22 2023, 5:58 PM

@tianshilei1992 this seems to have broken the amdgpu buildbot, could you please address quickly ?
or revert and fix ?

Reverted.

This revision is now accepted and ready to land.Apr 22 2023, 5:58 PM

Fix two test failres. It looks like not working on AMDGPU because of the global
read on the host is not correct. Needs to investigate more.

tianshilei1992 added inline comments.May 1 2023, 3:14 PM
llvm/test/Transforms/OpenMP/always_inline_device.ll
13–19

@arsenm Does AMDGPU support a global struct with constant initializer? I currently encountered an error that, the generated IR has correct initializer, but when I read from the global struct from the image (ELF), the corresponding memory looks pretty random. I just need one of the first three i8 values.

How does a function find the corresponding kernel environment at runtime?

For AMDGPU it is something in the linker. If I directly read the struct from .o file, the value is correct, but if I read from .img which is generated by lld, then the value is messy/random.

@__omp_offloading_32_71401f4c_main_l12_dynamic_environment = internal addrspace(1) global %struct.DynamicEnvironmentTy zeroinitializer
@__omp_offloading_32_71401f4c_main_l12_kernel_environment = local_unnamed_addr addrspace(1) constant %struct.KernelEnvironmentTy { %struct.ConfigurationEnvironmentTy { i8 1, i8 1, i8 1 }, ptr addrspacecast (ptr addrspace(1) @1 to ptr), ptr addrspacecast (ptr addrspace(1) @__omp_offloading_32_71401f4c_main_l12_dynamic_environment to ptr) }

Does the linkage of the global __omp_offloading_32_71401f4c_main_l12_kernel_environment look suspicious?

Herald added a project: Restricted Project. · View Herald Transcript

rebase.

I'll land it and see if AMD buildbot will be happy.

This revision was landed with ongoing or failed builds.Jul 23 2023, 3:36 PM
This revision was automatically updated to reflect the committed changes.
tianshilei1992 reopened this revision.Jul 23 2023, 8:34 PM
This revision is now accepted and ready to land.Jul 23 2023, 8:34 PM

rebase and fix test issues

rebase and prepare for landing

This revision was landed with ongoing or failed builds.Jul 26 2023, 10:35 AM
This revision was automatically updated to reflect the committed changes.
dhruvachak added inline comments.
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
3944

Is there a reason this has to be ExternalLinkage? Can we use GlobalValue::WeakAnyLinkage here? The external linkage leads to multiply defined linker errors downstream on test cases that have a target region in a header file. For some reason, the problem does not repro on the main branch.
@tianshilei1992 @jdoerfert

tianshilei1992 added inline comments.Aug 4 2023, 7:15 PM
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
3944

It has been fixed.

dhruvachak added inline comments.Aug 4 2023, 8:46 PM
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
3944

It has been fixed.

Thanks. I hadn't pulled, so did not see it earlier.

The changes in this patch do not actually work correctly with kernels that really need to be in Generic mode. The ExecMode value recovered in the kmpc_kernel_init function is 3 in the case where it needs to be 1.

The problem lies with the OpenMPOpt changes since disabling them makes everything work again.

openmp/libomptarget/DeviceRTL/src/State.cpp