This is an archive of the discontinued LLVM Phabricator instance.

[Clang][OpenMP] Add partial support for Static Device Libraries
ClosedPublic

Authored by saiislam on Jun 30 2021, 6:54 AM.

Details

Summary

An archive containing device code object files can be passed to
clang command line for linking. For each given offload target
it creates a device specific archives which is either passed to llvm-link
if the target is amdgpu, or to clang-nvlink-wrapper if the target is
nvptx. -L/-l flags are used to specify these fat archives on the command
line. E.g.

clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib

It currently doesn't support linking an archive directly, like:

clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a

Linking with x86 offload also does not work.

Diff Detail

Event Timeline

saiislam created this revision.Jun 30 2021, 6:54 AM
saiislam requested review of this revision.Jun 30 2021, 6:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2021, 6:54 AM

Not a thorough review but comments to address.

clang/lib/Driver/ToolChains/CommonArgs.cpp
1663

Coding convention.

1675

This will accumulate more and more entries in AOBFileNames, shouldn't you clear it at the beginning?

1685
1811
1819

Lots of places you create new std::strings for no reason, use StringRef or const std::string & to avoid that (especially in range loops).

Please no conditional like a != ".." && a != ".." ..... Use a compile time set or string switch.

grokos added inline comments.Jul 9 2021, 10:29 AM
clang/lib/Driver/ToolChains/CommonArgs.cpp
1690

"a" --> ".a" (add a dot)

1797–1799

I'm with @jdoerfert here, you can use a set of library names which are known to not have device-specific SDLs and check whether that set contains SDL_Name. Also, SDL_Names can be a set of unique entries, this way even if you try to add the same library twice it won't be added. This quadratic-complexity loop looks ugly...

saiislam updated this revision to Diff 361590.Jul 26 2021, 1:19 AM
  1. Simplified the nested loop to look for repeated SDLs.
  2. Minimized creation of std::string in favour of StringRef wherever possible.
  3. Added test cases
saiislam updated this revision to Diff 361591.Jul 26 2021, 1:23 AM

Couple of wrong files got added in the last commit.

saiislam marked 5 inline comments as done.Jul 26 2021, 1:29 AM
saiislam added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
7745–7758

It will be replaced with a method invocation to get GPUArch in an upcoming patch to support multi-architecture compilation.

clang/lib/Driver/ToolChains/CommonArgs.cpp
1690

"a" is second argument of GetTemporayPath and doesn't the prefix "."

saiislam updated this revision to Diff 361633.Jul 26 2021, 5:56 AM
saiislam marked 2 inline comments as done.

Added instructions to generate a fat archive and reduced the size of attached libFatArchive.a used for testing.

x64 debian > Clang.Driver::fat_archive.cpp Failed

@ABataev @grokos any comments?

ye-luo added a subscriber: ye-luo.EditedJul 28 2021, 11:11 AM

Do I must use llvm-ar/ranlib or system ar/ranlib is OK?

  1. existing use case breaks

Use https://github.com/ye-luo/openmp-target/blob/master/tests/math/modf.cpp
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.cpp # still OK
two steps
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.cpp -c
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.o
clang-14: warning: Unknown CUDA version. version.txt: 11.0.228. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]
nvlink fatal : Could not open input file '/tmp/modf-0bf89b.cubin'
clang-14: error: nvlink command failed with exit code 1 (use -v to see invocation)

  1. could you make my test case working?

https://github.com/ye-luo/openmp-target/tree/master/tests/link_static_fat_bin
both compile-amd.sh and compile.sh doesn't work for me.
linking was successful but no device code is in the executable and the run fails.

Does your test actually test a run?

Do I must use llvm-ar/ranlib or system ar/ranlib is OK?

  1. existing use case breaks

Use https://github.com/ye-luo/openmp-target/blob/master/tests/math/modf.cpp
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.cpp # still OK
two steps
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.cpp -c
$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.o
clang-14: warning: Unknown CUDA version. version.txt: 11.0.228. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]
nvlink fatal : Could not open input file '/tmp/modf-0bf89b.cubin'
clang-14: error: nvlink command failed with exit code 1 (use -v to see invocation)

  1. could you make my test case working?

https://github.com/ye-luo/openmp-target/tree/master/tests/link_static_fat_bin
both compile-amd.sh and compile.sh doesn't work for me.
linking was successful but no device code is in the executable and the run fails.

Does your test actually test a run?

  1. System ar/ranlib is OK.
  2. modf.cpp test case is breaking due to incompatibility between bundle entry ID formats which got introduced by my last patch D93525. I think D106809 should fix this, but I haven't tested it on this case yet.
  3. link_static_fat_bin is working fine. The missing component was inability of nvlink to take archive of cubin files as input. It requires a wrapper over nvlink (D108291). Please review.
tra added a subscriber: tra.Aug 23 2021, 1:43 PM
tra added inline comments.
clang/lib/Driver/ToolChains/CommonArgs.cpp
1599–1618

This could probably be collapsed into a couple of nested loops:

for (auto base: {"/libbc-", "/lib"}) {
  std::string ext = base == "/lib" ? ".bc" : ".a";
  for( auto suffix : {Lib + "-" + Arch + "-" + Target, Lib + "-" + Arch, Lib } }) {
    SDLs.append({"/libdevice" + base + suffix + ext,  base + suffix + ext })
  }
}
ye-luo added inline comments.Aug 23 2021, 2:16 PM
clang/lib/Driver/ToolChains/CommonArgs.h
62

Differentiate the names of all the three AddStaticDeviceLibs functions and add documentation. Be sure to do document every function added in this patch.

ye-luo added a comment.Sep 6 2021, 7:18 PM

@saiislam since clang-nvlink-wrapper has landed, could you update this patch?

@saiislam since clang-nvlink-wrapper has landed, could you update this patch?

@ye-luo , In the last multi-company OpenMP-dev meeting it was suggested that D106809 should be reviewed separately because it is logically a separate patch. But, D106809 is required for this patch so that it doesn't break our downstream pipeline.
Could you please take a look at D106809?

saiislam updated this revision to Diff 371946.Sep 10 2021, 10:02 AM
saiislam marked an inline comment as done.

Added documentation and other fixes suggested by reviewers.

@saiislam do my test cases work on your side? I tried this patch and still got linking failure.

@saiislam do my test cases work on your side? I tried this patch and still got linking failure.

Hey @ye-luo,
I am able to successfully compile and run all (36) tests in https://github.com/ye-luo/openmp-target (branch master, commit id: dd0544f).
Here are the steps I followed (all seems to be straightforward):

  1. Get updated trunk
  2. Apply this patch
  3. Build and install llvm-project from step 2.
  4. Set CC and CXX variables to newly installed location from step 3.
  5. Get openmp-target repo
  6. cmake, make, make all, make test

Output of make test

/work/saiyedul/openmp-target/build$ make test
Running tests...
Test project /work/saiyedul/openmp-target/build

Start  1: cxx.complex_reduction_cpu

1/36 Test #1: cxx.complex_reduction_cpu .......................... Passed 0.01 sec

Start  2: cxx.complex_reduction

2/36 Test #2: cxx.complex_reduction .............................. Passed 0.01 sec

Start  3: cxx.complex

3/36 Test #3: cxx.complex ........................................ Passed 0.00 sec

Start  4: f.complex

4/36 Test #4: f.complex .......................................... Passed 0.00 sec

Start  5: cxx.global_static

5/36 Test #5: cxx.global_static .................................. Passed 0.00 sec

Start  6: cxx.constexpr

6/36 Test #6: cxx.constexpr ...................................... Passed 0.00 sec

Start  7: cxx.link_static_fat_bin

7/36 Test #7: cxx.link_static_fat_bin ............................ Passed 0.00 sec

Start  8: cxx.linker_outlined_function_collision

8/36 Test #8: cxx.linker_outlined_function_collision ............. Passed 0.00 sec

Start  9: cxx.math_FP_ZERO

9/36 Test #9: cxx.math_FP_ZERO ................................... Passed 0.00 sec

Start 10: cxx.math_header_only

10/36 Test #10: cxx.math_header_only ............................... Passed 0.00 sec

Start 11: cxx.math_modf

11/36 Test #11: cxx.math_modf ...................................... Passed 0.00 sec

Start 12: cxx.math_sqrt_simd

12/36 Test #12: cxx.math_sqrt_simd ................................. Passed 0.00 sec

Start 13: cxx.math_sin_cos

13/36 Test #13: cxx.math_sin_cos ................................... Passed 0.00 sec

Start 14: cxx.math_sin_simd

14/36 Test #14: cxx.math_sin_simd .................................. Passed 0.00 sec

Start 15: cxx.math_sincos

15/36 Test #15: cxx.math_sincos .................................... Passed 0.00 sec

Start 16: cxx.math_sincos_simd

16/36 Test #16: cxx.math_sincos_simd ............................... Passed 0.00 sec

Start 17: cxx.math_sincos_simd_template

17/36 Test #17: cxx.math_sincos_simd_template ...................... Passed 0.00 sec

Start 18: cxx.target_teams_private__distribute

18/36 Test #18: cxx.target_teams_private__distribute ............... Passed 0.00 sec

Start 19: cxx.target__teams_private__distribute

19/36 Test #19: cxx.targetteams_privatedistribute .............. Passed 0.00 sec

Start 20: cxx.target_teams_distribute_private

20/36 Test #20: cxx.target_teams_distribute_private ................ Passed 0.00 sec

Start 21: cxx.target_teams__distribute_private

21/36 Test #21: cxx.target_teams__distribute_private ............... Passed 0.00 sec

Start 22: cxx.target__teams_distribute_private

22/36 Test #22: cxx.target__teams_distribute_private ............... Passed 0.00 sec

Start 23: cxx.target__teams__distribute_private

23/36 Test #23: cxx.targetteamsdistribute_private .............. Passed 0.00 sec

Start 24: cxx.target_teams_distribute_parallel_for_private

24/36 Test #24: cxx.target_teams_distribute_parallel_for_private ... Passed 0.00 sec

Start 25: f.target_teams_distribute_parallel_for_private

25/36 Test #25: f.target_teams_distribute_parallel_for_private ..... Passed 0.00 sec

Start 26: f.target_teams_distribute_private

26/36 Test #26: f.target_teams_distribute_private .................. Passed 0.00 sec

Start 27: cxx.target_nowait_task

27/36 Test #27: cxx.target_nowait_task ............................. Passed 0.00 sec

Start 28: cxx.taskloop_offload_nowait

28/36 Test #28: cxx.taskloop_offload_nowait ........................ Passed 0.00 sec

Start 29: cxx.taskloop

29/36 Test #29: cxx.taskloop ....................................... Passed 0.00 sec

Start 30: cxx.omp-task-bug

30/36 Test #30: cxx.omp-task-bug ................................... Passed 0.00 sec

Start 31: cxx.host_bug_libomp

31/36 Test #31: cxx.host_bug_libomp ................................ Passed 0.00 sec

Start 32: f.use_device_ptr_target

32/36 Test #32: f.use_device_ptr_target ............................ Passed 0.00 sec

Start 33: f.allocatable_array_device

33/36 Test #33: f.allocatable_array_device ......................... Passed 0.00 sec

Start 34: f.allocatable_array_device_ptr

34/36 Test #34: f.allocatable_array_device_ptr ..................... Passed 0.00 sec

Start 35: f.allocatable_array_device_isptr

35/36 Test #35: f.allocatable_array_device_isptr ................... Passed 0.00 sec

Start 36: f.allocatable_array_device_ptr_isptr

36/36 Test #36: f.allocatable_array_device_ptr_isptr ............... Passed 0.00 sec

100% tests passed, 0 tests failed out of 36

Label Time Summary:
cxx = 0.08 sec*proc (28 tests)
fortran = 0.02 sec*proc (8 tests)

Total Test time (real) = 0.12 sec

@saiislam did you turn on offload? https://github.com/ye-luo/openmp-target/wiki/OpenMP-offload-compilers#llvm-clang
On NVIDIA, it fails at CMake step. On AMD, make step stops because of unrelated issue.

Please make the exact reproducer 1 working. Right now I got

$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.o
nvlink fatal   : Could not open input file '/tmp/modf-88b730.cubin'
/soft/llvm/main-patched/bin/clang-nvlink-wrapper: error: 'nvlink' failed
clang-14: error: nvlink command failed with exit code 1 (use -v to see invocation)
saiislam updated this revision to Diff 372307.Sep 13 2021, 11:55 AM

Fix for file types in fat archive.

@saiislam did you turn on offload? https://github.com/ye-luo/openmp-target/wiki/OpenMP-offload-compilers#llvm-clang
On NVIDIA, it fails at CMake step. On AMD, make step stops because of unrelated issue.

Please make the exact reproducer 1 working. Right now I got

$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.o
nvlink fatal   : Could not open input file '/tmp/modf-88b730.cubin'
/soft/llvm/main-patched/bin/clang-nvlink-wrapper: error: 'nvlink' failed
clang-14: error: nvlink command failed with exit code 1 (use -v to see invocation)

@ye-luo
Can you please take this latest update and verify if it is working for you?
It is working for me.

Following modifications were also required for the compile.sh and compile-amd.sh.
Assuming name of your fat archive is libmylib.a and it is present in the current directory, the command to use it will be as follows:

clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib

or

clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 main.cpp -L. -lmylib

So, compile.sh should be:

clang++ -fopenmp -fopenmp-targets=nvptx64 -c -Xopenmp-target=nvptx64 -march=sm_50 classA.cpp
rm -f libmylib.a
ar qc libmylib.a classA.o
ranlib libmylib.a
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib
./a.out

And compile-amd.sh should be:

clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 -c classA.cpp
rm -f libmylib.a
llvm-ar qc libmylib.a classA.o
llvm-ranlib libmylib.a
clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 main.cpp -L. -lmylib
./a.out

the modf test still doesn't work. The issue was from unbundle.
case 1 works.

clang++ -fopenmp -fopenmp-targets=nvptx64 modf.cpp -c
clang++ -fopenmp -fopenmp-targets=nvptx64 modf.o

case 2

clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.cpp -c
clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.o

doesn't work. The issue was a failure in unbundling

"/soft/llvm/main-patched/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,openmp-nvptx64-sm_80 -inputs=modf.o -outputs=/tmp/modf-99be57.o,/tmp/modf-88a9eb.cubin -unbundle -allow-missing-bundles

ends up an empty cubin file. If I remove -sm_80 on the unbundle line
It seems that we need to have sm_80 when the object file is created.

When clang is compiled CLANG_OPENMP_NVPTX_DEFAULT_ARCH can be used to set the default sm_XX so it doesn't needs to be provided at compile time. If not set default is sm_35.
So in general sm_XX is always known when the compiler is used for compiling.

saiislam added a comment.EditedSep 14 2021, 4:26 AM

the modf test still doesn't work. The issue was from unbundle.
case 1 works.

clang++ -fopenmp -fopenmp-targets=nvptx64 modf.cpp -c
clang++ -fopenmp -fopenmp-targets=nvptx64 modf.o

case 2

clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.cpp -c
clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.o

doesn't work. The issue was a failure in unbundling

"/soft/llvm/main-patched/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,openmp-nvptx64-sm_80 -inputs=modf.o -outputs=/tmp/modf-99be57.o,/tmp/modf-88a9eb.cubin -unbundle -allow-missing-bundles

ends up an empty cubin file. If I remove -sm_80 on the unbundle line
It seems that we need to have sm_80 when the object file is created.

When clang is compiled CLANG_OPENMP_NVPTX_DEFAULT_ARCH can be used to set the default sm_XX so it doesn't needs to be provided at compile time. If not set default is sm_35.
So in general sm_XX is always known when the compiler is used for compiling.

Both cases 1 and 2 are working fine in my setup using these commands. I have sm_50, though.
If you think unbundler is the the issue, can you please tell what all code objects are inside modf.o bundle in both cases with following command:

clang-offload-bundler -type=o --inputs=modf.o --list

In my case it prints the following for case 1:

openmp-nvptx64
host-x86_64-unknown-linux-gnu

for case 2:

openmp-nvptx64-sm_50
host-x86_64-unknown-linux-gnu

ye-luo added a comment.EditedSep 14 2021, 6:43 AM

case 1

yeluo@epyc-server:~/opt/openmp-target/tests/math$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 modf.cpp -c
yeluo@epyc-server:~/opt/openmp-target/tests/math$ clang-offload-bundler -type=o --inputs=modf.o --list
openmp-nvptx64
host-x86_64-unknown-linux-gnu

case2

yeluo@epyc-server:~/opt/openmp-target/tests/math$ clang++ -fopenmp -fopenmp-targets=nvptx64 modf.cpp -c
yeluo@epyc-server:~/opt/openmp-target/tests/math$ clang-offload-bundler -type=o --inputs=modf.o --list
openmp-nvptx64
host-x86_64-unknown-linux-gnu

also try nvptx64-nvidia-cuda

yeluo@epyc-server:~/opt/openmp-target/tests/math$ clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda modf.cpp -c
warning: linking module '/soft/llvm/main-20210910/lib/libomptarget-nvptx-sm_80.bc': Linking two modules of different target triples: '/soft/llvm/main-20210910/lib/libomptarget-nvptx-sm_80.bc' is 'nvptx64' whereas 'modf.cpp' is 'nvptx64-nvidia-cuda'
 [-Wlinker-warnings]
1 warning generated.
yeluo@epyc-server:~/opt/openmp-target/tests/math$ clang-offload-bundler -type=o --inputs=modf.o --list
openmp-nvptx64-nvidia-cuda
host-x86_64-unknown-linux-gnu

None of them have sm_80 added.

Here is my clang build recipe

cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=$INSTALL_FOLDER \
    -DLLVM_ENABLE_BACKTRACES=ON \
    -DLLVM_ENABLE_WERROR=OFF \
    -DBUILD_SHARED_LIBS=OFF \
    -DLLVM_ENABLE_RTTI=ON \
    -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU;NVPTX" \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DLLVM_ENABLE_RUNTIMES="libcxxabi;libcxx;openmp" \
    -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES="80,61" \
    -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_80 \
    -DLIBOMPTARGET_NVPTX_MAX_SM=38 \
    -DLIBOMPTARGET_ENABLE_DEBUG=ON \
    ../llvm-project/llvm
saiislam updated this revision to Diff 372514.Sep 14 2021, 10:34 AM

Rebase and a minor fix.

  1. modf works now.
  1. if I modify the complile.sh
clang++ -fopenmp -fopenmp-targets=nvptx64 -c classA.cpp
rm -f libmylib.a
ar qc libmylib.a classA.o
ranlib libmylib.a
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib
./a.out

doesn't work. I think the solution is adding sm_XX to the module name regardless of user command line.

3, directly linking static archive doesn't work.

clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a

CMake generates this style of link line. So this really needs to work.

only the following case works right now.

clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib
  1. modf works now.
  1. if I modify the complile.sh
clang++ -fopenmp -fopenmp-targets=nvptx64 -c classA.cpp
rm -f libmylib.a
ar qc libmylib.a classA.o
ranlib libmylib.a
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib
./a.out

doesn't work. I think the solution is adding sm_XX to the module name regardless of user command line.

3, directly linking static archive doesn't work.

clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a

CMake generates this style of link line. So this really needs to work.

only the following case works right now.

clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib

Thanks for confirming that modf and this patch works with -l/-L options.

The option of adding sm_XX in Bundle Entry ID when user hasn't used -march flag, comes under command line simplification. I have a bunch of upcoming patches which will significantly simplify OpenMP command line for GPU offloading. But, don't you think this feature is different than supporting static device libraries and should be dealt separately?

Our plan was always to support SDLs using -l and -L options, as described in Greg Rodgers's presentation about static device libraries in last year's LLVM-CTH Workshop. It was also discussed in multi company meetings. We can explore supporting direct linking of SDLs as described by you, but it seems to me that it is out of scope of this patch. What do you think?

PS: We (mulit-company OpenMP-dev meetings) have been tracking this feature for a while and we would very much like it to be picked for llvm-13.

PS: We (mulit-company OpenMP-dev meetings) have been tracking this feature for a while and we would very much like it to be picked for llvm-13.

I think llvm-13 is on -final now so I think we've missed the train for big feature work. That's not so bad, 14 branches in ~6 months.

The option of adding sm_XX in Bundle Entry ID when user hasn't used -march flag, comes under command line simplification. I have a bunch of upcoming patches which will significantly simplify OpenMP command line for GPU offloading. But, don't you think this feature is different than supporting static device libraries and should be dealt separately?

Simplifying command line options is a different topic and we have not agreed upon how to do the simplification. In general, that is unrelated to this broken test case. But it can probably be fixed later with a separate patch.

Our plan was always to support SDLs using -l and -L options, as described in Greg Rodgers's presentation about static device libraries in last year's LLVM-CTH Workshop. It was also discussed in multi company meetings. We can explore supporting direct linking of SDLs as described by you, but it seems to me that it is out of scope of this patch. What do you think?

To me, supporting both -L/-l and libXXX.a are both required for static libraries. It is not reasonable to request users to use one way not the other. AOMP had no issue working in both ways.

PS: We (mulit-company OpenMP-dev meetings) have been tracking this feature for a while and we would very much like it to be picked for llvm-13.

Although we wanted to deliver this feature to users, I don't think we should tell users that linking static archive works with the current quality. So in my view, It is not suitable to backport it o 13.

This patch is good. It makes part of the static linking feature work but we can not claim "Static Device Libraries" is fully working. So please update the tile and also in the description saying what is working and what is not. Also need a test case for nvptx64.
In addition, I'd like to see commitment of addressing both above failed test cases before accepting this patch to help incremental development.

by the way, I found offload to x86 with static linking doesn't work with -L. -lmylib
https://github.com/ye-luo/openmp-target/blob/master/tests/linking/link_static_fat_bin/compile-x86.sh
I think this can also be addressed separately. FYI, AOMP happily makes it working.

This patch doesn't seem to break anything on my side.
@saiislam could you

  1. address all the in-source review comments
  2. update the title to [Clang][OpenMP] Add partial support for Static Device Libraries
  3. update the patch description about what works and what doesn't.

I will accept this patch once these minor revisions are added.

This patch doesn't seem to break anything on my side.
@saiislam could you

  1. address all the in-source review comments
  2. update the title to [Clang][OpenMP] Add partial support for Static Device Libraries
  3. update the patch description about what works and what doesn't.

I will accept this patch once these minor revisions are added.

Sure. I will do these changes. Thanks.

pdhaliwal added inline comments.
clang/test/Driver/fat_archive.cpp
9 ↗(On Diff #372514)

Here, LIBRARY_PATH is specifying path to build directory of openmp runtime which might not be available when openmp is not built. Why not use %S/Inputs directory and put the required files into that?

saiislam retitled this revision from [Clang][OpenMP] Add support for Static Device Libraries to [Clang][OpenMP] Add partial support for Static Device Libraries.Sep 24 2021, 10:06 AM
saiislam edited the summary of this revision. (Show Details)
saiislam updated this revision to Diff 374897.Sep 24 2021, 10:21 AM
saiislam marked an inline comment as done.

Added nvptx test cases, simplified amdgpu test case, modified commit message.

saiislam updated this revision to Diff 375648.Sep 28 2021, 11:25 AM

fixed the typos in the test cases

saiislam updated this revision to Diff 375672.Sep 28 2021, 12:54 PM

fixed nvptx test

ye-luo added inline comments.Sep 29 2021, 7:37 AM
clang/lib/Driver/ToolChains/CommonArgs.h
62

@saiislam Fix this?

saiislam marked an inline comment as done.Sep 29 2021, 7:55 AM
saiislam added inline comments.
clang/lib/Driver/ToolChains/CommonArgs.h
62

I have added documentation along with function definition in CommonArgs.cpp. Should I move it here? I thought keep documentation and code at the same place will improve readability.

ye-luo added inline comments.Sep 29 2021, 8:35 AM
clang/lib/Driver/ToolChains/CommonArgs.h
62

The docs are good. I was looking for "Differentiate the names of all the three AddStaticDeviceLibs functions"

ye-luo added inline comments.Oct 5 2021, 9:18 AM
clang/lib/Driver/ToolChains/CommonArgs.h
62
saiislam updated this revision to Diff 377500.Oct 6 2021, 4:32 AM
saiislam marked 3 inline comments as done.

Function name refactoring for AddStaticDeviceLibs.

ye-luo accepted this revision.Oct 6 2021, 11:00 AM

LGTM.

This revision is now accepted and ready to land.Oct 6 2021, 11:00 AM
This revision was landed with ongoing or failed builds.Oct 6 2021, 9:46 PM
This revision was automatically updated to reflect the committed changes.

Fixed windows build bot errors after landing this patch with 06404d5488ea505b00f711393973db3ae32d01e9

thakis added a subscriber: thakis.Oct 7 2021, 4:14 AM

Looks like this breaks tests on Mac: http://45.33.8.238/mac/36543/step_7.txt

Please take a look and revert for now if it takes a while to fix.

Looks like this breaks tests on Mac: http://45.33.8.238/mac/36543/step_7.txt

Please take a look and revert for now if it takes a while to fix.

Thanks for reporting. Taking a look.

thakis added a comment.Oct 7 2021, 6:22 AM

It's been broken for a while now. Should we revert this for now?

thakis added a comment.Oct 7 2021, 6:31 AM

Looks like the test is failing on Windows too: http://45.33.8.238/win/46512/step_7.txt

thakis added a comment.Oct 7 2021, 6:33 AM

Here's a failure on an official LLVM buildbot: https://lab.llvm.org/buildbot/#/builders/123/builds/6536 https://lab.llvm.org/buildbot/#/builders/123 has been red for over 8 hours now.

Blocking windows + osx for a couple of hours is bad, let's revert it. Will do so myself when I get to a desktop

Blocking windows + osx for a couple of hours is bad, let's revert it. Will do so myself when I get to a desktop

Here is the fix: D111311
Please review it.

thakis added a comment.Oct 7 2021, 6:48 AM

Fix didn't help, still broken at least on Windows: http://45.33.8.238/win/46515/step_7.txt

Fix didn't help, still broken at least on Windows: http://45.33.8.238/win/46515/step_7.txt

Ohk. Reverting this patch in a few minutes.

thakis added a comment.Oct 7 2021, 6:51 AM

Looks like the test looks for "ld" but Windows prints "\link.exe" -- maybe you need to pass an explicit triple to the compiler? Or relax the check line.

MyDeveloperDay added inline comments.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
120

This file now fails clang-format

saiislam marked an inline comment as done.Oct 11 2021, 6:55 AM
saiislam added inline comments.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
120

Thanks for reporting. Fixed it in D111545. Please have a look.