This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Introduce basic JIT support to OpenMP target offloading
ClosedPublic

Authored by tianshilei1992 on Dec 4 2022, 7:26 PM.

Details

Summary

This patch adds basic JIT support for OpenMP target offloading. Currently it only works on NVIDIA GPUs.

Support for AMDGPU can be added easily by implementing just three interface functions. However, the infrastructure needs a small extra extension (adding a pre-process hook) to support AMDGPU portability, because the AMDGPU backend reads target features from functions. https://github.com/shiltian/llvm-project/commit/02bc7effccc6ff2f5ab3fe5218336094c0485766#diff-321c2038035972ad4994ff9d85b29950ba72c08a79891db5048b8f5d46915314R432 shows roughly how it works.

As for testing, even though I added the corresponding code to the CMake files, the test still cannot be triggered because some code is missing in the new plugin's CMake file, which has nothing to do with this patch. It will be fixed later.

In order to enable JIT mode, -foffload-lto is needed when compiling, and -foffload-lto -Wl,--embed-bitcode is needed when linking. That implies that LTO is required to enable JIT mode.
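For example, a JIT-enabled build might look like the following (only the LTO-related flags are what this patch requires; the offload-target flag and file names are illustrative):

  clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -foffload-lto -c foo.c -o foo.o
  clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -foffload-lto -Wl,--embed-bitcode foo.o -o foo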

Diff Detail

Event Timeline

tianshilei1992 created this revision.Dec 4 2022, 7:26 PM
Herald added a project: Restricted Project. Dec 4 2022, 7:26 PM
tianshilei1992 requested review of this revision.Dec 4 2022, 7:26 PM
Herald added projects: Restricted Project, Restricted Project. Dec 4 2022, 7:26 PM
tianshilei1992 added inline comments.Dec 4 2022, 7:29 PM
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
879 ↗(On Diff #479964)

This will be pushed by Joseph in another patch.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/CMakeLists.txt
15–22

I guess this might cause the issue of non-protected global symbols.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
185

Is there any way to avoid writing it to a file here?

256

Is there a better way to compare two triples?

tianshilei1992 added inline comments.Dec 4 2022, 7:32 PM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
229

I might change the return value to Expected<bool> so that it can pass the error info back to the caller.
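A minimal sketch of what that could look like (the function name and the check are hypothetical), using LLVM's Expected/Error utilities:

#include "llvm/Support/Error.h"
#include "llvm/Support/MemoryBuffer.h"

// Hypothetical validity check that returns Expected<bool> so the caller gets a
// reason on failure rather than a bare false.
llvm::Expected<bool> isValidImage(llvm::MemoryBufferRef Image) {
  if (Image.getBufferSize() == 0)
    return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                   "empty device image");
  // ... real validity checks would go here ...
  return true;
}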

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
723

Do we want a configurable value for the OptLevel, or can we determine from somewhere else what value was used at compile time?

Why do we have the JIT in the nextgen plugins? I figured that JIT would be handled by libomptarget proper rather than the plugins. I guess this is needed for per-kernel specialization? My idea of the rough pseudocode would be like this, and we wouldn't need a complex class hierarchy. Also, I don't know if we can skip ptxas by giving CUDA the PTX directly; we will probably still need to invoke lld on the command line though, right?

for each image:
  if image is bitcode
    image = compile(image)
  register(image)
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
879 ↗(On Diff #479964)

Did that this morning.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/CMakeLists.txt
15–22

Should we be able to put all this in the add_llvm_library?

openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
48–52

We could probably limit these to the ones we actually care about since we know the triples. Not sure if it would save us much runtime.

185

Why do we need to invoke LTO here? I figured that we could call the backend directly since we have no need to actually link any files, and we may not need to run more expensive optimizations when the bitcode is already optimized. If you do that, then you should be able to just use a raw_svector_ostream as your output stream and get the compiled output written to that buffer.
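A minimal sketch of that approach, assuming an already-optimized Module M and an existing TargetMachine TM (the enum spelling matches the LLVM version at the time of this review; CGFT_AssemblyFile emits PTX for NVPTX targets):

#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"

// Run only the codegen backend and capture its output in memory instead of
// writing it to a temporary file.
llvm::SmallVector<char, 0> emitAssembly(llvm::Module &M,
                                        llvm::TargetMachine &TM) {
  llvm::SmallVector<char, 0> Buffer;
  llvm::raw_svector_ostream OS(Buffer);
  llvm::legacy::PassManager PM;
  if (TM.addPassesToEmitFile(PM, OS, /*DwoOut=*/nullptr,
                             llvm::CGFT_AssemblyFile))
    llvm::report_fatal_error("Target does not support emitting this file type");
  PM.run(M);
  // Buffer now holds the backend output and can be wrapped in a MemoryBuffer.
  return Buffer;
}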

Why do we have the JIT in the nextgen plugins? I figured that JIT would be handled by libomptarget proper rather than the plugins. I guess this is needed for per-kernel specialization? My idea of the rough pseudocode would be like this, and we wouldn't need a complex class hierarchy. Also, I don't know if we can skip ptxas by giving CUDA the PTX directly; we will probably still need to invoke lld on the command line though, right?

for each image:
  if image is bitcode
    image = compile(image)
  register(image)

We could handle them in libomptarget, but that's going to require adding another two interface functions: is_valid_bitcode_image and compile_bitcode_image. It is doable. Handling them in the plugin as a separate module can just reuse the two existing interfaces.

Also, I don't know if we can skip ptxas by giving CUDA the PTX directly; we will probably still need to invoke lld on the command line though, right?

for each image:
  if image is bitcode
    image = compile(image)
  register(image)

We can give CUDA PTX directly, since the CUDA JIT effectively just calls ptxas (as opposed to ptxas -c, which would require nvlink afterwards).
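For illustration, a minimal sketch of handing PTX text straight to the CUDA driver (the driver API calls are standard; everything else, including error handling, is an assumption):

#include <cuda.h>

// The driver JIT-compiles the PTX with its built-in ptxas; no separate
// ptxas/nvlink invocation is needed. PTX is assumed to be null-terminated.
CUmodule loadPTX(const char *PTX) {
  CUmodule Module = nullptr;
  CUresult Res = cuModuleLoadDataEx(&Module, PTX, /*numOptions=*/0,
                                    /*options=*/nullptr,
                                    /*optionValues=*/nullptr);
  if (Res != CUDA_SUCCESS) {
    // Real code would translate Res into a proper error here.
  }
  return Module;
}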

openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
185

For the purpose of this basic JIT support, we indeed just need the backend. However, since we have plans for super-optimization, etc., having an optimization pipeline here is also useful.

Why do we have the JIT in the nextgen plugins? I figured that JIT would be handled by libomptarget proper rather than the plugins. I guess this is needed for per-kernel specialization? My idea of the rough pseudocode would be like this, and we wouldn't need a complex class hierarchy. Also, I don't know if we can skip ptxas by giving CUDA the PTX directly; we will probably still need to invoke lld on the command line though, right?

for each image:
  if image is bitcode
    image = compile(image)
  register(image)

We could handle them in libomptarget, but that's going to require adding another two interface functions: is_valid_bitcode_image and compile_bitcode_image. It is doable. Handling them in the plugin as a separate module can just reuse the two existing interfaces.

Would we need to consult the plugin? We can just check the magic directly; if it's bitcode, we just compile it for its triple. If this was wrong, then the plugin will error when it gets the compiled image.
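A hedged sketch of such a check in libomptarget (the function name is hypothetical), using LLVM's magic-byte detection:

#include "llvm/BinaryFormat/Magic.h"
#include "llvm/Support/MemoryBuffer.h"

// Detect LLVM bitcode without consulting the plugin by inspecting the image's
// magic bytes.
bool isBitcodeImage(llvm::MemoryBufferRef Image) {
  return llvm::identify_magic(Image.getBuffer()) == llvm::file_magic::bitcode;
}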

Also, I don't know if we can skip ptxas by giving CUDA the PTX directly; we will probably still need to invoke lld on the command line though, right?

for each image:
  if image is bitcode
    image = compile(image)
  register(image)

We can give CUDA PTX directly, since the CUDA JIT effectively just calls ptxas (as opposed to ptxas -c, which would require nvlink afterwards).

That makes it easier for us, so the only command line tool we need to call is lld for AMDGPU.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
185

We should be able to configure our own optimization pipeline in that case; we might want the extra control as well.

tianshilei1992 added a comment.EditedDec 5 2022, 7:52 AM

Why do we have the JIT in the nextgen plugins? I figured that JIT would be handled by libomptarget proper rather than the plugins. I guess this is needed for per-kernel specialization? My idea of the rough pseudocode would be like this, and we wouldn't need a complex class hierarchy. Also, I don't know if we can skip ptxas by giving CUDA the PTX directly; we will probably still need to invoke lld on the command line though, right?

for each image:
  if image is bitcode
    image = compile(image)
  register(image)

We could handle them in libomptarget, but that's going to require adding another two interface functions: is_valid_bitcode_image and compile_bitcode_image. It is doable. Handling them in the plugin as a separate module can just reuse the two existing interfaces.

Would we need to consult the plugin? We can just check the magic directly; if it's bitcode, we just compile it for its triple. If this was wrong, then the plugin will error when it gets the compiled image.

I prefer to error out at an earlier stage. In particular, if we have a bitcode image and both NVIDIA and AMD support JIT, then both plugins will report a valid binary and continue compiling the image, initializing the plugin, etc., which could give us wrong results.

Also, I don't know if we can skip ptxas by giving CUDA the PTX directly; we will probably still need to invoke lld on the command line though, right?

for each image:
  if image is bitcode
    image = compile(image)
  register(image)

We can give CUDA PTX directly, since the CUDA JIT effectively just calls ptxas (as opposed to ptxas -c, which would require nvlink afterwards).

That makes it easier for us, so the only command line tool we need to call is lld for AMDGPU.

Yes, for AMDGPU it can be handled in the post-processing, which is not done in this patch.
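A rough sketch of what such a post-processing step could look like (the lld path, flags, and temporary-file handling are all assumptions for illustration, not what a later patch necessarily does):

#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Program.h"

// Link the backend's AMDGPU output into a loadable image by invoking lld as an
// external tool.
bool linkWithLLD(llvm::StringRef LLDPath, llvm::StringRef InputPath,
                 llvm::StringRef OutputPath) {
  llvm::SmallVector<llvm::StringRef> Args = {LLDPath, "-flavor", "gnu",
                                             "-shared", "-o", OutputPath,
                                             InputPath};
  int RC = llvm::sys::ExecuteAndWait(LLDPath, Args);
  return RC == 0;
}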

tianshilei1992 added inline comments.Dec 5 2022, 7:54 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
185

That would mean we basically rewrite the opt and backend functions from LTO.cpp. I thought about just invoking the backend before, especially since using LTO requires us to build the resolution table. However, on second thought, I think it would be better to just use LTO.

jhuber6 added inline comments.Dec 5 2022, 8:06 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
185

Building the passes isn't too complicated; it would take up about the same amount of space as the symbol resolutions, and it has the advantage that we don't need to write the output to a file. I could write an implementation for this to see how well it works.
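For reference, a sketch of what building the default pipeline ourselves could look like (the O3 level is an assumption):

#include "llvm/Analysis/CGSCCPassManager.h"
#include "llvm/Analysis/LoopAnalysisManager.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Target/TargetMachine.h"

// Run the default per-module optimization pipeline on the device module
// without going through the LTO machinery.
void optimizeModule(llvm::Module &M, llvm::TargetMachine &TM) {
  llvm::LoopAnalysisManager LAM;
  llvm::FunctionAnalysisManager FAM;
  llvm::CGSCCAnalysisManager CGAM;
  llvm::ModuleAnalysisManager MAM;

  llvm::PassBuilder PB(&TM);
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  llvm::ModulePassManager MPM =
      PB.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O3);
  MPM.run(M, MAM);
}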

jplehr added a subscriber: jplehr.Dec 5 2022, 8:13 AM

drop LTO and fix comments

tianshilei1992 marked 5 inline comments as done.Dec 7 2022, 12:35 PM
tianshilei1992 added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/CMakeLists.txt
30–54

Have to figure out which components are needed here.

add build components

We should probably make a test for this. Do we currently test the nextgen plugins?

All but a test and this looks good to me.

We probably want to enable a new test configuration to have each test run in JIT mode.

rebase and refine

It currently crashes in setupLLVMOptimizationRemarks

jdoerfert added inline comments.Dec 9 2022, 10:42 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
65

Not unreachable. Use a printf and an abort.

rebase and fix opt error

rebase and fix comments

tianshilei1992 marked an inline comment as done.Dec 11 2022, 7:37 PM
jdoerfert added inline comments.Dec 12 2022, 8:13 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
723

We most likely want an env var, and maybe later we can even pass the value through. An env var is good enough for now.
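A minimal sketch of reading such an env var (the variable name LIBOMPTARGET_JIT_OPT_LEVEL is hypothetical, used only for illustration):

#include <cstdlib>

// Read the JIT optimization level from the environment, defaulting to 3 and
// clamping out-of-range values.
static unsigned getJITOptLevel() {
  unsigned OptLevel = 3;
  if (const char *Env = std::getenv("LIBOMPTARGET_JIT_OPT_LEVEL"))
    OptLevel = static_cast<unsigned>(std::atoi(Env));
  return OptLevel > 3 ? 3 : OptLevel;
}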

tianshilei1992 retitled this revision from [WIP][OpenMP] Introduce basic JIT support to OpenMP target offloading to [OpenMP] Introduce basic JIT support to OpenMP target offloading.Dec 12 2022, 9:14 AM
tianshilei1992 edited the summary of this revision. (Show Details)

add env for opt level

tianshilei1992 marked an inline comment as done and an inline comment as not done.Dec 12 2022, 9:24 AM
tianshilei1992 added inline comments.
openmp/cmake/OpenMPTesting.cmake
6 ↗(On Diff #482158)

Oh, this change is unrelated. Will remove it later.

tianshilei1992 edited the summary of this revision. (Show Details)Dec 12 2022, 1:35 PM

Some nits.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
122

I'm tempted to move this into LLVM somewhere since it's been duplicated so many times.

133

typo

151

Why do we need TT if we expect to read the triple out of the Module?

277–278

Should work.

302

Why a std::list? Since we use pointers we shouldn't need to worry about having a stable pointer and could use a SmallVector or similar right?

320

Should probably prefer C++ casts even if they are ridiculously verbose.

344

MCPU is the more common version AFAICT.

openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
825

Missing comment?

jdoerfert accepted this revision.Dec 12 2022, 10:19 PM

LG, as there seem to be only nits left. We can expand on this in-tree.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
122

Let's do this as a follow up.

185

Agreed. We can test it in a follow up and decide then.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.h
42
This revision is now accepted and ready to land.Dec 12 2022, 10:19 PM
tianshilei1992 marked an inline comment as done.Dec 13 2022, 4:54 AM
tianshilei1992 added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
185

I already switched to building the pipeline ourselves and dropped LTO.

tianshilei1992 marked 10 inline comments as done.

rebase and fix comments

openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
277–278

What if OS is not null-terminated?

jhuber6 added inline comments.Dec 27 2022, 9:01 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
277–278

Do we need it to be? The memory buffer should contain the size, unless we need to convert it to a C string somewhere. In that case you could do OS << '\0' but it would probably mess up the size.

rebase and fix comment

tianshilei1992 marked an inline comment as done.Dec 27 2022, 11:19 AM
tianshilei1992 added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/JIT.cpp
277–278

str() returns a StringRef, which is good. I thought it returned const char *.

This revision was landed with ongoing or failed builds.Dec 27 2022, 4:07 PM
This revision was automatically updated to reflect the committed changes.
tianshilei1992 marked an inline comment as done.

Seems like this broke the AMDGPU buildbot, please resolve:
https://lab.llvm.org/buildbot/#/builders/193/builds/24122

Seems like this broke the AMDGPU buildbot, please resolve:
https://lab.llvm.org/buildbot/#/builders/193/builds/24122

Reverted. Will fix it soon.

tianshilei1992 reopened this revision.Dec 27 2022, 7:17 PM
This revision is now accepted and ready to land.Dec 27 2022, 7:17 PM

fix compile error

This revision was landed with ongoing or failed builds.Dec 27 2022, 7:19 PM
This revision was automatically updated to reflect the committed changes.
tianshilei1992 marked an inline comment as done.
ye-luo added a subscriber: ye-luo.EditedDec 28 2022, 8:21 AM

Got tons of runtime failures:

target doesn't support jit
UNREACHABLE executed at /gpfs/jlse-fs0/users/yeluo/opt/llvm-clang/llvm-project-nightly/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h:543!

on an AMD GPU (gfx906) when running the miniqmc ctest. The failure happens with LIBOMPTARGET_NEXTGEN_PLUGINS=1.

hbae added a subscriber: hbae.Jan 6 2023, 6:54 AM

Looks like GCC 7.5 cannot build LLVM after this change. Could you please take a look?

In file included from /localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:11:0:
/localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h: In member function ‘virtual llvm::Expected<std::unique_ptr<llvm::MemoryBuffer> > llvm::omp::target::plugin::GenericDeviceTy::doJITPostProcessing(std::unique_ptr<llvm::MemoryBuffer>) const’:
/localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h:389:12: error: could not convert ‘MB’ from ‘std::unique_ptr<llvm::MemoryBuffer>’ to ‘llvm::Expected<std::unique_ptr<llvm::MemoryBuffer> >’
     return MB;
            ^~

Looks like GCC 7.5 cannot build LLVM after this change. Could you please take a look?

In file included from /localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:11:0:
/localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h: In member function ‘virtual llvm::Expected<std::unique_ptr<llvm::MemoryBuffer> > llvm::omp::target::plugin::GenericDeviceTy::doJITPostProcessing(std::unique_ptr<llvm::MemoryBuffer>) const’:
/localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h:389:12: error: could not convert ‘MB’ from ‘std::unique_ptr<llvm::MemoryBuffer>’ to ‘llvm::Expected<std::unique_ptr<llvm::MemoryBuffer> >’
     return MB;
            ^~

Some older GCCs have a problem with the implicit move on copy elision, AFAIK. I'll add a std::move; let me know if that fixes it.
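For reference, a sketch of the workaround based on the error above (the post-processing body is elided):

#include <memory>
#include <utility>
#include "llvm/Support/Error.h"
#include "llvm/Support/MemoryBuffer.h"

// Make the move explicit when returning a std::unique_ptr as an Expected,
// instead of relying on the implicit conversion during copy elision, which
// older GCC rejects.
llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
doJITPostProcessing(std::unique_ptr<llvm::MemoryBuffer> MB) {
  // ... post-processing elided ...
  return std::move(MB); // 'return MB;' fails to compile with GCC 7.5
}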

hbae added a comment.Jan 6 2023, 7:07 AM

Looks like GCC 7.5 cannot build LLVM after this change. Could you please take a look?

In file included from /localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:11:0:
/localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h: In member function ‘virtual llvm::Expected<std::unique_ptr<llvm::MemoryBuffer> > llvm::omp::target::plugin::GenericDeviceTy::doJITPostProcessing(std::unique_ptr<llvm::MemoryBuffer>) const’:
/localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h:389:12: error: could not convert ‘MB’ from ‘std::unique_ptr<llvm::MemoryBuffer>’ to ‘llvm::Expected<std::unique_ptr<llvm::MemoryBuffer> >’
     return MB;
            ^~

Some older GCCs have a problem with the implicit move on copy elision, AFAIK. I'll add a std::move; let me know if that fixes it.

Yes, that fixes it. We have two more places in JIT.cpp (lines 126 and 185).

Looks like GCC 7.5 cannot build LLVM after this change. Could you please take a look?

In file included from /localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:11:0:
/localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h: In member function ‘virtual llvm::Expected<std::unique_ptr<llvm::MemoryBuffer> > llvm::omp::target::plugin::GenericDeviceTy::doJITPostProcessing(std::unique_ptr<llvm::MemoryBuffer>) const’:
/localdisk/hbae/LLVM/llvm-base/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h:389:12: error: could not convert ‘MB’ from ‘std::unique_ptr<llvm::MemoryBuffer>’ to ‘llvm::Expected<std::unique_ptr<llvm::MemoryBuffer> >’
     return MB;
            ^~

Some older GCCs have a problem with the implicit move on copy elision, AFAIK. I'll add a std::move; let me know if that fixes it.

Yes, that fixes it. We have two more places in JIT.cpp (lines 126 and 185).

Thanks for pointing them out, I'll fix them right away.