This patch takes the logic CUDA already uses to search for libomptarget and extracts it into a common method.
Missed some changes:
- Fix openmp-offload.c test failure
- Fix amdgpu-openmp-toolchain.c test failure
I like this. Using the same logic, in the same function call, to find this library on either GPU is the right thing to do. Looks like a non-functional change on nvptx, though phab doesn't make that obvious.
clang/include/clang/Driver/Options.td:949
I think there's an aliasing mechanism in the Options handling, where we can use device_bc_path as the canonical choice and nvptx_bc_path as a backwards-compatible argument that sets device_bc_path.

clang/lib/Driver/ToolChains/CommonArgs.cpp:1655
This, I think, can be reduced to checking OPT_libomptarget_device_bc_path_EQ if the aliasing machinery I remember does actually exist.
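For reference, a minimal sketch of that aliasing mechanism, assuming the option names proposed in this review (the TableGen side appears as a comment since the final spellings aren't settled; the fragment would sit inside the shared search function):

```cpp
// Options.td (TableGen), roughly:
//   def libomptarget_device_bc_path_EQ
//       : Joined<["--"], "libomptarget-device-bc-path=">;
//   def libomptarget_nvptx_bc_path_EQ
//       : Joined<["--"], "libomptarget-nvptx-bc-path=">,
//         Alias<libomptarget_device_bc_path_EQ>;
//
// Option::matches() follows the unalias chain, so once the alias exists the
// driver only needs to query the canonical option; uses of the old nvptx
// spelling are reported under OPT_libomptarget_device_bc_path_EQ as well.
if (const llvm::opt::Arg *A =
        DriverArgs.getLastArg(options::OPT_libomptarget_device_bc_path_EQ))
  LibraryPaths.emplace_back(A->getValue());
```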
Edit: Debugged further, rewriting comment.
The current error message on a missing library is:
'error: No library 'libomptarget-amdgcn-gfx906.bc' found in the default clang lib directory or in LIBRARY_PATH. Please use --libomptarget-nvptx-bc-path to specify nvptx bitcode library'
The entry err_drv_omp_offload_target_missingbcruntime in ./clang/include/clang/Basic/DiagnosticDriverKinds.td, where that text lives, should probably refer to 'device' instead of 'nvptx' (an error-message change only).
Cuda.cpp calls addOpenMPDeviceRTL guarded by nogpulib; AMDGPUOpenMP calls it unconditionally. That means the deviceRTL must already be on disk when building the deviceRTL itself. Not so good. We need an if (DriverArgs.hasArg(options::OPT_nogpulib)) return; in AMDGPUOpenMPToolChain::addClangTargetOptions, sketched below. Comment inline as well.
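A sketch of where that guard would sit, assuming this patch's names (addOpenMPDeviceRTL and a GPUArch member), which may differ in the final revision:

```cpp
// Illustrative only: mirrors the nogpulib guard already present in Cuda.cpp.
void AMDGPUOpenMPToolChain::addClangTargetOptions(
    const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
    Action::OffloadKind DeviceOffloadingKind) const {
  // ... existing option handling ...

  // With -nogpulib we must not require a deviceRTL on disk, otherwise the
  // deviceRTL itself cannot be built through the driver.
  if (DriverArgs.hasArg(options::OPT_nogpulib))
    return;

  std::string BitcodeSuffix = "amdgcn-" + GPUArch; // e.g. "amdgcn-gfx906"
  addOpenMPDeviceRTL(getDriver(), DriverArgs, CC1Args, BitcodeSuffix,
                     getTriple());
}
```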
The existing search logic looks in clang's lib and LIBRARY_PATH, I think we should probably look in the runtime directory as well for running from the build tree. That's separate to this change though.
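For context, the rough shape of that existing search as extracted into CommonArgs.cpp (paraphrased, not the patch verbatim; D and BitcodeSuffix are the helper's parameters):

```cpp
// Candidate directories: LIBRARY_PATH entries first, then clang's lib dir.
SmallVector<std::string, 8> LibraryPaths;

if (llvm::Optional<std::string> LibPath =
        llvm::sys::Process::GetEnv("LIBRARY_PATH")) {
  SmallVector<StringRef, 8> Dirs;
  const char Sep[] = {llvm::sys::EnvPathSeparator, '\0'};
  llvm::SplitString(*LibPath, Dirs, Sep);
  for (StringRef Dir : Dirs)
    LibraryPaths.emplace_back(Dir.trim().str());
}

SmallString<256> DefaultLibPath(llvm::sys::path::parent_path(D.Dir));
llvm::sys::path::append(DefaultLibPath, "lib");
LibraryPaths.emplace_back(DefaultLibPath.str().str());

// First directory containing libomptarget-<suffix>.bc wins.
std::string LibOmpTargetName =
    ("libomptarget-" + BitcodeSuffix + ".bc").str();
for (const std::string &Dir : LibraryPaths) {
  SmallString<256> Candidate(Dir);
  llvm::sys::path::append(Candidate, LibOmpTargetName);
  if (llvm::sys::fs::exists(Candidate)) {
    CC1Args.push_back("-mlink-builtin-bitcode");
    CC1Args.push_back(DriverArgs.MakeArgString(Candidate));
    return;
  }
}
// Otherwise: emit err_drv_omp_offload_target_missingbcruntime.
```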
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:193
Need if (DriverArgs.hasArg(options::OPT_nogpulib)) return; here, or we can't build the deviceRTL without already having one on disk.
Generally LGTM.
I don't think it's necessary as we can add the runtime directory to LIBRARY_PATH when configuring lit.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:193
FWIW, the NVPTX deviceRTL is built by directly calling the FE, not via the clang driver. clang -fopenmp -fopenmp-targets=xxx basically consists of two passes, and therefore generates two IRs, which is not what we expect. I'm not sure we really need the if statement.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:193
That explains what I was missing about the ptx cmake, I think. I've had to locally hack around clang creating a bundle, which llvm-link chokes on, because cuda-device-only et al. are ignored by OpenMP. I think this check is right: it means nogpulib will exclude the RTL on both GPUs. Nvptx already has it in the control flow. Whether the RTL cmake should bypass the driver is interesting, but I think separate to this patch.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:193
cuda-device-only is only for CUDA compilation. We don't have an option to invoke device compilation only.
Some background discussion about the diagnostic. This change means people using nvptx, where the library cannot be found, will now be advised to use libomptarget-device-bc-path instead of libomptarget-nvptx-bc-path. If they use libomptarget-nvptx-bc-path anyway, it'll work as intended. That avoids us adding libomptarget-amdgcn-bc-path and requiring some more conditional compilation for multiarch codebases.
@tianshilei1992 @jdoerfert can we agree on 'libomptarget-device-bc-path' being the better one to recommend in the error message, despite that being a minor change to the current behaviour? It'll slightly surprise people parsing error messages, such as our test, but shouldn't cause any confusion.
I'm personally OK with using libomptarget-nvptx-bc-path to indicate where to look for the amdgcn bitcode as well, but can see that causing confusion. I'm assuming that both GPU runtimes will go in the same directory. There may be a future where one invocation targets nvptx and amdgcn at the same time, but even then I'd prefer all the runtimes live in the same place in the filesystem.
On second thought, I don't think it is feasible. Consider the following scenario:
clang -fopenmp -fopenmp-targets=nvptx64,amdgcn source.cpp
We cannot use one option for the two different targets, and the alias might not work either, especially in terms of the driver.
Using one option for both targets seems great - if both have put the devicertl in the same folder. Which I suppose they might not have.
Maybe keep it separate for now, one for nvptx and one for amdgcn, and hope for a common 'device' later.
I have removed libomptarget-device-bc-path and added an amdgcn one. For the diagnostic, instead of having one per architecture, I have kept a single entry and added a second parameter to err_drv_omp_offload_target_missingbcruntime so the message is arch-specific.
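A sketch of what that looks like; the diagnostic wording here is illustrative, not the committed text, and the TableGen entry is shown as a comment:

```cpp
// DiagnosticDriverKinds.td (TableGen), roughly:
//   def err_drv_omp_offload_target_missingbcruntime : Error<
//     "no library '%0' found in the default clang lib directory or in "
//     "LIBRARY_PATH; use '--libomptarget-%1-bc-path' to specify %1 bitcode "
//     "library">;
//
// Driver side: the architecture string becomes the second argument, so nvptx
// and amdgcn get matching, arch-specific advice from one entry.
D.Diag(diag::err_drv_omp_offload_target_missingbcruntime)
    << LibOmpTargetName << /*arch, e.g.*/ "amdgcn";
```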
The parameter to err_drv_omp_offload_target_missingbcruntime is sensible. @tianshilei1992? With this we can set the path for nvptx and amdgcn independently.
clang/include/clang/Driver/Options.td:948
Why do we need two options that literally do the same thing?
clang/include/clang/Driver/Options.td:948
It's for when someone has decided to put nvptx bitcode in one directory and amdgpu in another. That's presently useless. It might be more helpful once we can target both platforms from one compile, but even then passing a bitcode path once per arch seems better. I'm in favour of throwing one option away and renaming the other to work on both/all targets.
Why do we need two options that literally do the same thing?
I cannot think of a use case where we would specify two distinct paths, can anyone else?