This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][AMDGPU] Add support for linking libomptarget bitcode
AcceptedPublic

Authored by pdhaliwal on Feb 8 2021, 1:15 AM.

Details

Summary

This patch uses the existing logic of CUDA for searching libomptarget
and extracts it to a common method.

Diff Detail

Event Timeline

pdhaliwal created this revision.Feb 8 2021, 1:15 AM
pdhaliwal requested review of this revision.Feb 8 2021, 1:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2021, 1:15 AM
pdhaliwal updated this revision to Diff 322052.EditedFeb 8 2021, 1:51 AM

Missed some changes,

  • Fix openmp-offload.c test failure
  • Fix amdgpu-openmp-toolchain.c test failure

I like this. Using the same logic, in the same function call, to find this library on either gpu is the right thing to do. Looks like a non functional change on nvptx, though phab doesn't make that obvious.

clang/include/clang/Driver/Options.td
949

I think there's an aliasing mechanism in the Options handling, where we can use device_bc_path as the canonical choice and nvptx_bc_path as a backwards-compatible argument that sets device_bc_path

clang/lib/Driver/ToolChains/CommonArgs.cpp
1655

This, I think, can be reduced to checking OPT_libomptarget_device_bc_path_EQ if the aliasing machinery I remember does actually exist

pdhaliwal updated this revision to Diff 322090.Feb 8 2021, 5:47 AM

Addressed review comments.

pdhaliwal marked 2 inline comments as done.Feb 8 2021, 5:47 AM

LGTM. Let's wait for someone using nvptx to sanity check

JonChesterfield added a comment.EditedFeb 8 2021, 11:55 AM

Edit: Debugged further, rewriting comment.

Current error message on missing library is:
'error: No library 'libomptarget-amdgcn-gfx906.bc' found in the default clang lib directory or in LIBRARY_PATH. Please use --libomptarget-nvptx-bc-path to specify nvptx bitcode libarary'

Written in ./clang/include/clang/Basic/DiagnosticDriverKinds.td entry err_drv_omp_offload_target_missingbcruntime should probably refer to 'device' instead of 'nvptx' (error message change only)

Cuda.cpp calls addOpenMPDeviceRTL guarded by nogpulib, AMDGPUOpenMP unconditionally calls it. That means the deviceRTL is needed on disk when building the deviceRTL. Not so good. We need a if (DriverArgs.hasArg(options::OPT_nogpulib)) return; in AMDGPUOpenMPToolChain::addClangTargetOptions. Comment inline as well.

The existing search logic looks in clang's lib and LIBRARY_PATH, I think we should probably look in the runtime directory as well for running from the build tree. That's separate to this change though.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
193

Need if (DriverArgs.hasArg(options::OPT_nogpulib)) return; here or we can't build the deviceRTL without already having one on disk

Generally LGTM.

The existing search logic looks in clang's lib and LIBRARY_PATH, I think we should probably look in the runtime directory as well for running from the build tree. That's separate to this change though.

I don't think it's necessary as we can add the runtime directory to LIBRARY_PATH when configuring lit.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
193

FWIW, NVPTX deviceRTLs is built by directly calling FE, not via clang driver. clang -fopenmp -fopenmp-targets=xxx basically consists of two passes, and therefore generates two IRs, which is not what we expect. I'm not sure we really need the if statement.

pdhaliwal updated this revision to Diff 322298.Feb 8 2021, 11:36 PM
  • Added check for nogpulib
  • Fixed diagnostic message
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
193

That explains what I was missing about the ptx cmake I think. I've had to locally hack around clang creating a bundle, which llvm-link chokes on, because cuda-device-only et al are ignored by openmp.

I think this check is right - it means nogpulib will exclude the rtl on both GPUs. Nvptx already has it in the control flow.

Whether RTL cmake should bypass the driver is interesting, but I think separate to this patch.

JonChesterfield accepted this revision.Feb 9 2021, 2:03 AM
This revision is now accepted and ready to land.Feb 9 2021, 2:03 AM
tianshilei1992 added inline comments.Feb 9 2021, 6:11 AM
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
193

cuda-device-only is only for CUDA compilation. We don’t have an option to invoke device compilation only.

Some background discussion about the diagnostic. This change means people using nvptx, where the library cannot be found, will now be advised to use libomptarget-device-bc-path instead of libomptarget-nvptx-bc-path. If they use libomptarget-nvptx-bc-path anyway, it'll work as intended. That avoids us adding libomptarget-amdgcn-bc-path and requiring some more conditional compilation for multiarch codebases.

@tianshilei1992 @jdoerfert can we agree on 'libomptarget-device-bc-path' being the better one to recommend in the error message, despite that being a minor change to the current behaviour? It'll slightly surprise people parsing error messages, such as our test, but shouldn't cause any confusion.

I'm personally OK with using libomptarget-nvptx-bc-path to indicate where to look for the amdgcn bitcode as well but can see that causing confusion. I'm assuming that both gpu runtimes will go in same directory - there may be a future where one invocation targets nvptx and amdgcn at the same time, but even then I'd prefer all the runtimes live in the same place in the filesystem.

tianshilei1992 requested changes to this revision.Feb 9 2021, 9:20 AM

@tianshilei1992 @jdoerfert can we agree on 'libomptarget-device-bc-path' being the better one to recommend in the error message, despite that being a minor change to the current behaviour? It'll slightly surprise people parsing error messages, such as our test, but shouldn't cause any confusion.

After a second thought, I don't think it is feasible. Consider the following senecio:

clang -fopenmp -fopenmp-targets=nvptx64,amdgcn source.cpp

We cannot use one option for the two different targets, and the alias might not work as well, especially in terms of the driver.

This revision now requires changes to proceed.Feb 9 2021, 9:20 AM
JonChesterfield added a comment.EditedFeb 9 2021, 10:07 AM

Using one option for both targets seems great - if both have put the devicertl in the same folder. Which I suppose they might not have.

Maybe keep it separate for now, one for nvptx and one for amdgcn, and hope for a common 'device' later.

pdhaliwal updated this revision to Diff 322638.EditedFeb 10 2021, 3:22 AM
pdhaliwal marked an inline comment as done.

I have removed libomptarget-device-bc-path and have added amdgcn one. For diagnostic,
instead of having one per architecture, I have used the same and added second
parameter to err_drv_omp_offload_target_missingbcruntime for having arch specifc message.

Parameter to err_drv_omp_offload_target_missingbcruntime is sensible. @tianshilei1992? With this we can set the path for nvptx and amdgcn independently.

This revision is now accepted and ready to land.Feb 11 2021, 11:07 AM
This revision was landed with ongoing or failed builds.Feb 11 2021, 9:42 PM
This revision was automatically updated to reflect the committed changes.
jdoerfert reopened this revision.Jan 8 2022, 11:31 AM
jdoerfert added inline comments.
clang/include/clang/Driver/Options.td
948

Why do we need two options that literally do the same thing?
I cannot think of a use case where we would specify two distinct paths, can anyone else?

This revision is now accepted and ready to land.Jan 8 2022, 11:31 AM
clang/include/clang/Driver/Options.td
948

It's for when someone has decided to put nvptx bitcode in one directory and amdgpu in another. That's presently useless. It might be more helpful once we can target both platforms from one compile, but even then passing bitcode-path once per arch seems better.

In favour of throwing one option away and renaming the other to work on both/all targets