This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Link the bitcode library late for device LTO
ClosedPublic

Authored by jhuber6 on Jan 11 2022, 12:54 PM.

Details

Summary

This patch adds support for linking the OpenMP device bitcode library
late when doing LTO. This simply passes it in as an additional device
file when doing the final device linking phase with LTO. This has the
advantage that we don't link it multiple times, and the device
references do not get inlined and prevent us from doing needed OpenMP
optimizations when we have visiblity of the whole module.

Depends on D116975

Diff Detail

Event Timeline

jhuber6 created this revision.Jan 11 2022, 12:54 PM
jhuber6 requested review of this revision.Jan 11 2022, 12:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2022, 12:54 PM

Do we have a hook where we could link it at, uh, link time on nvtptx without LTO? Amdgpu had a llvm-link already there. Always bothered me a little that we link a copy per TU then hope the optimiser sorts it out nicely.

Do we have a hook where we could link it at, uh, link time on nvtptx without LTO? Amdgpu had a llvm-link already there. Always bothered me a little that we link a copy per TU then hope the optimiser sorts it out nicely.

I think it would be difficult, we could compile the library stand-alone and link it normally in the linker wrapper, but the performance would probably be bad. We could potentially only link everything once, but when we compile we don't really know how many files will need it right? It would definitely improve compile times if we could somehow do it though.

Oh, right. Nvptx is still lowering to machine code per-tu. We don't want the devicertl linking as machine code, so it has to go in per-tu. Or we could link nvptx as IR instead, and send that plus amdgpu down the same code path. Probably makes applications faster and compiles slower. Which sort of brings us to this patch with a different default.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
289

Can/should probably always link it late on amdgpu

clang/lib/Driver/ToolChains/Clang.cpp
8176

This looks familiar - is there a function in commonargs that already does this?

jhuber6 added inline comments.Jan 11 2022, 4:45 PM
clang/lib/Driver/ToolChains/Clang.cpp
8176

It's implemented individually when we select it in the CUDA and AMDGPUOpenMP toolchains, could probably try to unify it into a utility function to save a few lines.

jhuber6 updated this revision to Diff 402925.Jan 25 2022, 8:26 AM

Squash other uncommitted changes.

tianshilei1992 accepted this revision.Jan 31 2022, 7:01 PM
tianshilei1992 added a subscriber: tianshilei1992.

LGTM with two nits.

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
75

target-library is not the common name we call it. Maybe device-runtime-library?

967

Is this still needed now?

This revision is now accepted and ready to land.Jan 31 2022, 7:01 PM
jhuber6 added inline comments.Jan 31 2022, 7:04 PM
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
75

I think I called it bitcode-library later, since technically it's just any bitcode file that's linked in along LTO.

967

It still breaks, but I removed it later. My ordering for these patches is a bit of a mess, sorry.

This revision was landed with ongoing or failed builds.Jan 31 2022, 8:12 PM
This revision was automatically updated to reflect the committed changes.