Page MenuHomePhabricator

[libomptarget] Try a fallback devicertl if the preferred one is missing
AbandonedPublic

Authored by JonChesterfield on Feb 17 2021, 10:02 AM.

Details

Summary

[libomptarget] Try a fallback devicertl if the preferred one is missing

Clang may be used with a cuda version that is newer than the clang release.
This may fail in various ways, but there is also a credible chance it will
work OK, for sufficiently small version slip.

As discussed in the weekly call, this change adds a fallback path to clang
which looks for a devicertl library that doesn't have a specific cuda version
encoded in the name, specifically 'libomptarget-nvptx-unknown.bc'. It also
warns about this, and recommends the user specify a devicertl they expect to
work manually.

Diff Detail

Unit TestsFailed

TimeTest
590 msx64 debian > Clang.Driver::amdgpu-openmp-toolchain.c
Script: -- : 'RUN: at line 2'; env LIBRARY_PATH=/mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/Inputs/hip_dev_lib /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/amdgpu-openmp-toolchain.c 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/amdgpu-openmp-toolchain.c
1,360 msx64 debian > Clang.Driver::openmp-offload-gpu.c
Script: -- : 'RUN: at line 17'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_35 -Xopenmp-target -march=sm_60 /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/openmp-offload-gpu.c 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefix=CHK-FOPENMP-TARGET-ARCHS /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/openmp-offload-gpu.c
40 msx64 debian > Clang.Misc::warning-flags.c
Script: -- : 'RUN: at line 1'; diagtool list-warnings > /mnt/disks/ssd0/agent/llvm-project/build/tools/clang/test/Misc/Output/warning-flags.c.tmp 2>&1
430 msx64 windows > Clang.Driver::amdgpu-openmp-toolchain.c
Script: -- : 'RUN: at line 2'; env LIBRARY_PATH=C:\ws\w16-1\llvm-project\premerge-checks\clang\test\Driver/Inputs/hip_dev_lib c:\ws\w16-1\llvm-project\premerge-checks\build\bin\clang.exe -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 C:\ws\w16-1\llvm-project\premerge-checks\clang\test\Driver\amdgpu-openmp-toolchain.c 2>&1 | c:\ws\w16-1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w16-1\llvm-project\premerge-checks\clang\test\Driver\amdgpu-openmp-toolchain.c
10,920 msx64 windows > Clang.Driver::openmp-offload-gpu.c
Script: -- : 'RUN: at line 17'; c:\ws\w16-1\llvm-project\premerge-checks\build\bin\clang.exe -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_35 -Xopenmp-target -march=sm_60 C:\ws\w16-1\llvm-project\premerge-checks\clang\test\Driver\openmp-offload-gpu.c 2>&1 | c:\ws\w16-1\llvm-project\premerge-checks\build\bin\filecheck.exe -check-prefix=CHK-FOPENMP-TARGET-ARCHS C:\ws\w16-1\llvm-project\premerge-checks\clang\test\Driver\openmp-offload-gpu.c
View Full Test Results (6 Failed)

Event Timeline

JonChesterfield requested review of this revision.Feb 17 2021, 10:02 AM
JonChesterfield created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 17 2021, 10:02 AM
  • drop whitespace from .td
clang/lib/Driver/ToolChains/CommonArgs.cpp
1695–1707

If you're using Twine, + is not needed.

llvm::Twine LibOmpTargetName("libomptarget-", BitcodeSuffix ,".bc");
1697

The BitcodeSuffix for NVPTX consists of three parts: nvptx, cuda_xxx, and sm_yy. We might want to have an unknown for every sm?

1699–1700
bool FoundBCLibrary = tryAppendBuiltinBitcodeGivenPaths(
        LibraryPaths, DriverArgs, CC1Args, LibOmpTargetName);

Remove the definition above?

clang/include/clang/Basic/DiagnosticDriverKinds.td
265

Besides, we might also need to tell users this bitcode library might not work as expected.

  • review comments
JonChesterfield marked an inline comment as done.Feb 17 2021, 10:48 AM
JonChesterfield added inline comments.
clang/lib/Driver/ToolChains/CommonArgs.cpp
1647

^clang-tidy warning about a twine in a parameter list [llvm-twine-local] looks like a bug

1695–1707

This doesn't appear to be the case. Doesn't compile anyway, no matching constructor.

The discussion around the clang-tidy warning states that Twine variables shouldn't be used as local variables because they don't make copies of the arguments, so one can hit use after free. I don't agree with that, but it doesn't matter very much so have changed to std::string.

1697

Lets not go there. 'Unknown' is a reasonably strong indicator for 'this might not work'. One per sm suggests we've somehow tested whether it's going to be OK or not.

I'd prefer we stick with 'using a cuda that didn't exist when your clang shipped is a bad idea, get a newer clang or an older cuda'. This is a sketch of how we could go the other way.

clang/include/clang/Basic/DiagnosticDriverKinds.td
265

Could you propose preferred wording?

Harbormaster completed remote builds in B89565: Diff 324342.

Suggestion is to resolve libomptarget-nvptx-unknown.bc to a cp of the bitcode libary built for the newest sm_xx and ptx version clang knows of.

tianshilei1992 added inline comments.Feb 17 2021, 3:12 PM
clang/include/clang/Basic/DiagnosticDriverKinds.td
265

maybe the following one?

def warn_drv_omp_offload_target_missingbcruntime : Warning<"No library '%0' found in the default clang lib directory or in LIBRARY_PATH. Fall back to '%1' instead.">;

Please also update the test.

Suggestion is to resolve libomptarget-nvptx-unknown.bc to a cp of the bitcode libary built for the newest sm_xx and ptx version clang knows of.

Why a copy instead of a symlink?

  • address review comment

Please also update the test.

Suggestion is to resolve libomptarget-nvptx-unknown.bc to a cp of the bitcode libary built for the newest sm_xx and ptx version clang knows of.

Why a copy instead of a symlink?

I don't trust symlinks to survive packaging (tar or deb) or copying around the filesystem during install. Possibly showing scars from windows development there.

Tests passing locally, as expected. Do you mean add a new one that jury rigs cuda version and has a libomptarget-nvptx-unknown.bc on disk somewhere?

Can we wrap this up and backport it, last known issue we should fix for 12.

If you like it, accept it. If there's more stuff to do first, please say so.

ye-luo added a comment.EditedFeb 18 2021, 9:52 AM

Does this patch include creating 'libomptarget-nvptx-unknown.bc'?

tianshilei1992 accepted this revision.Feb 18 2021, 10:13 AM

LGTM. Please also update the test(s) before commit.

This revision is now accepted and ready to land.Feb 18 2021, 10:13 AM

Which tests?

Choosing a file to duplicate with the unknown.bc name is left separate. This just adds code to look for that file if the first choice failed.

ye-luo requested changes to this revision.Feb 18 2021, 12:18 PM

Let user to copy the bc file is not feasible.
Handle this in CMake please.
libomptarget-nvptx-unknown.bc

This revision now requires changes to proceed.Feb 18 2021, 12:18 PM
ye-luo accepted this revision.Feb 18 2021, 2:56 PM

Got it. Copy a file can be tricky. Compile one more can be easily done in cmake.

This revision is now accepted and ready to land.Feb 18 2021, 2:56 PM

Is this obsolete with the change to devicertl cmake? Would prefer abandon to land if so

I think we might not this patch. We’re gonna not support old version of CUDA anyway.

ye-luo added a comment.EditedFeb 22 2021, 6:15 AM

to me this is still desired + cmake creating libomptarget-nvptx-unknown.bc as a solution for forward compatibility until a clean solution lands. I don't want repetitively asking the admin to copy files for every new CUDA toolkit installation.

to me this is still desired + cmake creating libomptarget-nvptx-unknown.bc as a solution for forward compatibility until a clean solution lands.

We’ll have newer version LLVM like 12.1 or 12.01 w/ a *right* solution. I don’t think we need to think that further.

ye-luo added a comment.EditedFeb 22 2021, 6:20 AM

to me this is still desired + cmake creating libomptarget-nvptx-unknown.bc as a solution for forward compatibility until a clean solution lands.

We’ll have newer version LLVM like 12.1 or 12.01 w/ a *right* solution. I don’t think we need to think that further.

This doesn't help people who needs to run exactly 12.0. Also cannot wait for a minor release, need things to work right away when a new cuda toolkit is installed intentionally.

to me this is still desired + cmake creating libomptarget-nvptx-unknown.bc as a solution for forward compatibility until a clean solution lands.

We’ll have newer version LLVM like 12.1 or 12.01 w/ a *right* solution. I don’t think we need to think that further.

This doesn't help people who needs to run exactly 12.0. Also cannot wait for a minor release, need things to work right away when a new cuda toolkit is installed intentionally.

First, we don’t know whether a new version of CUDA will be released during this time, especially considering the release history of CUDA, 11.3 is not that possible, and 12 will not come out so soon. Second, even if NVIDIA’s people are so brilliant and release CUDA 12 in a short time, we have a mechanism for user to work around the issue using the option libomptarget-nvptx-bc-path. This patch is just a work around, and it contains so many uncertainties, which cannot be called “forward compatibility” at all.

JonChesterfield abandoned this revision.Mar 15 2021, 12:21 PM

I'm going to abandon this. I'm not confident that a cuda toolkit that is newer than the compiler will work with it correctly and would prefer it take some jury rigging on the end users part to put the two together.

I get that 'works out of the box, whatever cuda is installed' would be an amazing feature to have. This patch sort of promises that without necessarily delivering on the 'works' part.