This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Fix HIP include path
ClosedPublic

Authored by yaxunl on Feb 18 2022, 7:20 AM.

Details

Summary

The clang compiler prepends the HIP header include paths to the search
list using -internal-isystem when building for the HIP language. This
prevents warnings related to things like reserved identifiers when
including the HIP headers even when ROCm is installed in a non-system
directory, such as /opt/rocm.

However, when HIP is installed in /usr, then the prepended include
path would be /usr/include. That is a problem, because the C standard
library headers are stored in /usr/include and the C++ standard
library headers must come before the C library headers in the search
path list (because the C++ standard library headers use #include_next
to include the C standard library headers).

While the HIP wrapper headers _do_ need to be earlier in the search
than the C++ headers, those headers get their own subdirectory and
their own explicit -internal-isystem argument. This include path is for
<hip/hip_runtime_api.h> and <hip/hip_runtime.h>, which do not require a
particular search ordering with respect to the C or C++ headers. Thus,
HIP include path is added after other system include paths.

With contribution from Cordell Bloor.

Diff Detail

Event Timeline

yaxunl created this revision.Feb 18 2022, 7:20 AM
yaxunl requested review of this revision.Feb 18 2022, 7:20 AM
tra added inline comments.Feb 22 2022, 10:43 AM
clang/lib/Driver/ToolChains/AMDGPU.cpp
524–525

My impression, after reading the problem description, is that the actual issue is that we're using -internal-isystem for the HIP SDK includes , not that we add the include path to them.

Instead of trying to guess whether we happen to match some hardcoded path where we think the standard headers may live, I'd rather use -I or its internal equivalent, if we have it. Hardcoded paths *will* be wrong for someone. E.g. I'm pretty sure /usr/anything is not going to work on windows. Of for our internal builds.

yaxunl added inline comments.Feb 23 2022, 7:55 AM
clang/lib/Driver/ToolChains/AMDGPU.cpp
524–525

changing -internal-isystem to -I is a solution, as the same path showing up first with both -I then with -internal-isystem will be treated as -internal-isystem and placed in the latter position.

However, one drawback is that this may cause regression due to warnings emitted for HIP headers.

I think I may let AddHIPIncludeArgs return the HIP include path instead of adding it right away, then let clang add it after the system include paths. I may rename AddHIPIncludeArgs as AddHIPWrapperIncludeArgsAndGetHIPIncludePath. What do you think? Thanks.

tra added inline comments.Feb 23 2022, 12:13 PM
clang/lib/Driver/ToolChains/AMDGPU.cpp
524–525

one drawback is that this may cause regression due to warnings emitted for HIP headers.

If I understand the patch description correctly, allowing these warnings was the purpose. Is that not the case?

[current state] prevents warnings related to things like reserved identifiers when including the HIP headers even when ROCm is installed in a non-system directory, such as /opt/rocm.

I'm fine with separating include paths of wrappers and the SDK headers. I think we already do something similar for CUDA (though they are still added with -isystem-include).

yaxunl added inline comments.Feb 23 2022, 3:29 PM
clang/lib/Driver/ToolChains/AMDGPU.cpp
524–525

Some warnings are unnecessary for HIP header files e.g. warnings about macro definitions starting with __. Some applications use -Wpedantic -Werror, which can cause unnecessary errors in HIP header. Also we have customers who use the latest clang with older HIP runtime. If we switch to -I, we may get regressions.

AFAIK all language headers e.g. CUDA, OpenMP, are included by -internal-isystem by clang, therefore I think HIP better follow this convention.

tra added inline comments.Feb 23 2022, 3:40 PM
clang/lib/Driver/ToolChains/AMDGPU.cpp
524–525

I may have misinterpreted the patch description.

So -include-isystem is not the problem. The problem is that when HIP SDK includes are installed under /usr, their inclusion along with the wrappers messes with the standard c/c++ header inclusion order . Separating the wrappers and HIP SDK include paths, and moving HIP includes towards the end of the search path is the way to fix it. HIP SDK headers will still be still included via -isystem-include.

Did I get that right?

yaxunl added inline comments.Feb 23 2022, 7:09 PM
clang/lib/Driver/ToolChains/AMDGPU.cpp
524–525

Right.

Since HIP headers does not use include_next, it does not matter whether HIP include path is before or after standard C++ include path. However, HIP include path may be the same as /usr or /usr/local, which need to be after standard C++ include path. Therefore moving HIP include path to be after system path solves the issue.

yaxunl updated this revision to Diff 411150.Feb 24 2022, 8:58 AM
yaxunl edited the summary of this revision. (Show Details)
yaxunl added a reviewer: linjamaki.

add HIP include path after adding other system include paths

tra added inline comments.Feb 24 2022, 11:34 AM
clang/lib/Driver/ToolChains/AMDGPU.cpp
487

I don't think you need to both add args and retrieve HIP SDK path.

520–521

This is the only change needed to this function.

clang/lib/Driver/ToolChains/Clang.cpp
1470 ↗(On Diff #411150)

Path to HIP includes should be added in AddClangSystemIncludeArgs which is called from the loop just above.

yaxunl updated this revision to Diff 411290.Feb 24 2022, 7:14 PM

revised by Artem's comments

yaxunl planned changes to this revision.Feb 25 2022, 6:36 AM

I just found one issue with the current patch. It adds HIP include path for non-HIP programs.

We should only add HIP include path for JobAction with HIP offloading kind. However, AddClangSystemIncludeArgs is not per job action.

I feel I should not complicate AddClangSystemIncludeArgs API by making it accept a JobAction argument. Then I should add HIP include path in Clang::AddPreprocessingOptions instead of AddClangSystemIncludeArgs. Then I have to add ToolChain::AddPostSystemHIPIncludeArgs since not all ToolChain have RocmInstallation.

Basically this will end up as ToolChain having two APIs: one for adding HIP wrapper include args, one for adding HIP include args.

tra added a comment.Feb 28 2022, 12:13 PM

I just found one issue with the current patch. It adds HIP include path for non-HIP programs.

We should only add HIP include path for JobAction with HIP offloading kind. However, AddClangSystemIncludeArgs is not per job action.
I feel I should not complicate AddClangSystemIncludeArgs API by making it accept a JobAction argument.

I'm not sure I understand. In general we do want host and device-side compilations to be as close as we can make them and that includes include paths. AddClangSystemIncludeArgs is a toolchain method and does exactly what we need -- add the same include path to both host and device compilations when HIP (or CUDA) toolchain is used.
I don't quite see how you could end up with a HIP toolchain in a non-HIP compilation.

What do I miss?

I just found one issue with the current patch. It adds HIP include path for non-HIP programs.

We should only add HIP include path for JobAction with HIP offloading kind. However, AddClangSystemIncludeArgs is not per job action.
I feel I should not complicate AddClangSystemIncludeArgs API by making it accept a JobAction argument.

I'm not sure I understand. In general we do want host and device-side compilations to be as close as we can make them and that includes include paths. AddClangSystemIncludeArgs is a toolchain method and does exactly what we need -- add the same include path to both host and device compilations when HIP (or CUDA) toolchain is used.
I don't quite see how you could end up with a HIP toolchain in a non-HIP compilation.

What do I miss?

Users may use clang driver to compile HIP program and C++ program with one clang driver invocation, e.g.

clang --offload-arch=gfx906 a.hip b.cpp

Clang driver will create job actions for a.hip and b.cpp separately. The job actions for a.hip have HIP offload kind. The job actions for b.cpp have 'none' offload kind.

Only job actions having HIP offload kind should have HIP include paths, otherwise, even if clang driver is used for compiling one single C++ program, it will add HIP include path.

tra added a comment.Feb 28 2022, 3:52 PM

Users may use clang driver to compile HIP program and C++ program with one clang driver invocation, e.g.

clang --offload-arch=gfx906 a.hip b.cpp

Clang driver will create job actions for a.hip and b.cpp separately. The job actions for a.hip have HIP offload kind. The job actions for b.cpp have 'none' offload kind.

Only job actions having HIP offload kind should have HIP include paths, otherwise, even if clang driver is used for compiling one single C++ program, it will add HIP include path.

Are you saying that clang driver would pick HIP toolchain for the C++ compilation? It does not make sense. It that were the case we would be risking picking up C++ toolchain for the HIP compilation, too and that would obviously not work at all.

AFAICT, clang certainly does not add CUDA include paths to C++ compilations passed to the top-level invocation. I believe this should work for HIP, too. Search for cuda-11.5 below:

# bin/clang++ --cuda-path=$HOME/local/cuda-11.5.0 -### -c a.cu b.cc

clang version 15.0.0
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/home/tra/work/llvm/build/release+assert+zapcc/bin
 "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/bin/clang-15" "-cc1" "-triple" "nvptx64-nvidia-cuda" "-aux-triple" "x86_64-unknown-linux-gnu" "-S" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "a.cu" "-mrelocation-model" "static" "-mframe-pointer=all" "-fno-rounding-math" "-fno-verbose-asm" "-no-integrated-as" "-aux-target-cpu" "x86-64" "-fcuda-is-device" "-mllvm" "-enable-memcpyopt-without-libcalls" "-mlink-builtin-bitcode" "/usr/local/home/tra/local/cuda-11.5.0/nvvm/libdevice/libdevice.10.bc" "-target-feature" "+ptx75" "-target-sdk-version=11.5" "-target-cpu" "sm_35" "-mllvm" "-treat-scalable-fixed-error-as-warning" "-debugger-tuning=gdb" "-fno-dwarf-directory-asm" "-resource-dir" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include/cuda_wrappers" "-include" "__clang_cuda_runtime_wrapper.h" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-internal-isystem" "/usr/local/home/tra/local/cuda-11.5.0/include" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-fdeprecated-macro" "-fno-autolink" "-fdebug-compilation-dir=/usr/local/home/tra/work/llvm/build/release+assert+zapcc" "-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fcxx-exceptions" "-fexceptions" "-fcolor-diagnostics" "-cuid=fce3a17633fb941f" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "/tmp/a-8486d3/a-sm_35.s" "-x" "cuda" "a.cu"
 "/usr/local/home/tra/local/cuda-11.5.0/bin/ptxas" "-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "/tmp/a-7c357d/a-sm_35.o" "/tmp/a-8486d3/a-sm_35.s"
 "/usr/local/home/tra/local/cuda-11.5.0/bin/fatbinary" "-64" "--create" "/tmp/a-c46038.fatbin" "--image=profile=sm_35,file=/tmp/a-7c357d/a-sm_35.o" "--image=profile=compute_35,file=/tmp/a-8486d3/a-sm_35.s"
 "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/bin/clang-15" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-target-sdk-version=11.5" "-aux-triple" "nvptx64-nvidia-cuda" "-emit-obj" "-mrelax-all" "--mrelax-relocations" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "a.cu" "-mrelocation-model" "static" "-mframe-pointer=all" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-tune-cpu" "generic" "-mllvm" "-treat-scalable-fixed-error-as-warning" "-debugger-tuning=gdb" "-fcoverage-compilation-dir=/usr/local/home/tra/work/llvm/build/release+assert+zapcc" "-resource-dir" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include/cuda_wrappers" "-include" "__clang_cuda_runtime_wrapper.h" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-internal-isystem" "/usr/local/home/tra/local/cuda-11.5.0/include" "-fdeprecated-macro" "-fdebug-compilation-dir=/usr/local/home/tra/work/llvm/build/release+assert+zapcc" "-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fcxx-exceptions" "-fexceptions" "-fcolor-diagnostics" "-fcuda-include-gpubinary" "/tmp/a-c46038.fatbin" "-cuid=fce3a17633fb941f" "-faddrsig" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "a.o" "-x" "cuda" "a.cu"
 "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/bin/clang-15" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-emit-obj" "-mrelax-all" "--mrelax-relocations" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "b.cc" "-mrelocation-model" "static" "-mframe-pointer=all" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-tune-cpu" "generic" "-mllvm" "-treat-scalable-fixed-error-as-warning" "-debugger-tuning=gdb" "-fcoverage-compilation-dir=/usr/local/home/tra/work/llvm/build/release+assert+zapcc" "-resource-dir" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-fdeprecated-macro" "-fdebug-compilation-dir=/usr/local/home/tra/work/llvm/build/release+assert+zapcc" "-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fcxx-exceptions" "-fexceptions" "-fcolor-diagnostics" "-faddrsig" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "b.o" "-x" "c++" "b.cc"
yaxunl added a comment.Mar 1 2022, 7:11 AM

Users may use clang driver to compile HIP program and C++ program with one clang driver invocation, e.g.

clang --offload-arch=gfx906 a.hip b.cpp

Clang driver will create job actions for a.hip and b.cpp separately. The job actions for a.hip have HIP offload kind. The job actions for b.cpp have 'none' offload kind.

Only job actions having HIP offload kind should have HIP include paths, otherwise, even if clang driver is used for compiling one single C++ program, it will add HIP include path.

Are you saying that clang driver would pick HIP toolchain for the C++ compilation? It does not make sense. It that were the case we would be risking picking up C++ toolchain for the HIP compilation, too and that would obviously not work at all.

AFAICT, clang certainly does not add CUDA include paths to C++ compilations passed to the top-level invocation. I believe this should work for HIP, too. Search for cuda-11.5 below:

# bin/clang++ --cuda-path=$HOME/local/cuda-11.5.0 -### -c a.cu b.cc

clang version 15.0.0
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/home/tra/work/llvm/build/release+assert+zapcc/bin
 "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/bin/clang-15" "-cc1" "-triple" "nvptx64-nvidia-cuda" "-aux-triple" "x86_64-unknown-linux-gnu" "-S" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "a.cu" "-mrelocation-model" "static" "-mframe-pointer=all" "-fno-rounding-math" "-fno-verbose-asm" "-no-integrated-as" "-aux-target-cpu" "x86-64" "-fcuda-is-device" "-mllvm" "-enable-memcpyopt-without-libcalls" "-mlink-builtin-bitcode" "/usr/local/home/tra/local/cuda-11.5.0/nvvm/libdevice/libdevice.10.bc" "-target-feature" "+ptx75" "-target-sdk-version=11.5" "-target-cpu" "sm_35" "-mllvm" "-treat-scalable-fixed-error-as-warning" "-debugger-tuning=gdb" "-fno-dwarf-directory-asm" "-resource-dir" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include/cuda_wrappers" "-include" "__clang_cuda_runtime_wrapper.h" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-internal-isystem" "/usr/local/home/tra/local/cuda-11.5.0/include" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-fdeprecated-macro" "-fno-autolink" "-fdebug-compilation-dir=/usr/local/home/tra/work/llvm/build/release+assert+zapcc" "-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fcxx-exceptions" "-fexceptions" "-fcolor-diagnostics" "-cuid=fce3a17633fb941f" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "/tmp/a-8486d3/a-sm_35.s" "-x" "cuda" "a.cu"
 "/usr/local/home/tra/local/cuda-11.5.0/bin/ptxas" "-m64" "-O0" "--gpu-name" "sm_35" "--output-file" "/tmp/a-7c357d/a-sm_35.o" "/tmp/a-8486d3/a-sm_35.s"
 "/usr/local/home/tra/local/cuda-11.5.0/bin/fatbinary" "-64" "--create" "/tmp/a-c46038.fatbin" "--image=profile=sm_35,file=/tmp/a-7c357d/a-sm_35.o" "--image=profile=compute_35,file=/tmp/a-8486d3/a-sm_35.s"
 "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/bin/clang-15" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-target-sdk-version=11.5" "-aux-triple" "nvptx64-nvidia-cuda" "-emit-obj" "-mrelax-all" "--mrelax-relocations" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "a.cu" "-mrelocation-model" "static" "-mframe-pointer=all" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-tune-cpu" "generic" "-mllvm" "-treat-scalable-fixed-error-as-warning" "-debugger-tuning=gdb" "-fcoverage-compilation-dir=/usr/local/home/tra/work/llvm/build/release+assert+zapcc" "-resource-dir" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include/cuda_wrappers" "-include" "__clang_cuda_runtime_wrapper.h" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-internal-isystem" "/usr/local/home/tra/local/cuda-11.5.0/include" "-fdeprecated-macro" "-fdebug-compilation-dir=/usr/local/home/tra/work/llvm/build/release+assert+zapcc" "-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fcxx-exceptions" "-fexceptions" "-fcolor-diagnostics" "-fcuda-include-gpubinary" "/tmp/a-c46038.fatbin" "-cuid=fce3a17633fb941f" "-faddrsig" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "a.o" "-x" "cuda" "a.cu"
 "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/bin/clang-15" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-emit-obj" "-mrelax-all" "--mrelax-relocations" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "b.cc" "-mrelocation-model" "static" "-mframe-pointer=all" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-tune-cpu" "generic" "-mllvm" "-treat-scalable-fixed-error-as-warning" "-debugger-tuning=gdb" "-fcoverage-compilation-dir=/usr/local/home/tra/work/llvm/build/release+assert+zapcc" "-resource-dir" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward" "-internal-isystem" "/usr/local/home/tra/work/llvm/build/release+assert+zapcc/lib/clang/15.0.0/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-fdeprecated-macro" "-fdebug-compilation-dir=/usr/local/home/tra/work/llvm/build/release+assert+zapcc" "-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fcxx-exceptions" "-fexceptions" "-fcolor-diagnostics" "-faddrsig" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "b.o" "-x" "c++" "b.cc"

If any input file is HIP program, clang driver will use HIP offload kind for all inputs. This behavior is similar as cuda-clang. Therefore if any input file is HIP program, HIP include path is added for each input file compilation. I think this is acceptable.

However, when there is only C++ input file, clang driver should not add HIP or CUDA include path. The current patch breaks that. That's why I think it needs change.

tra added a comment.Mar 1 2022, 9:30 AM

If any input file is HIP program, clang driver will use HIP offload kind for all inputs. This behavior is similar as cuda-clang.

I do not think this is the case as illustrated by the example above. CUDA paths are only added to CUDA compilation. C++ compilation of b.cc does not have any of them.

Therefore if any input file is HIP program, HIP include path is added for each input file compilation. I think this is acceptable.

I disagree. HIP-specific include path should apply to HIP-specific cc1 compilation only.

However, when there is only C++ input file, clang driver should not add HIP or CUDA include path.

Agreed. AFAICT this is the case for CUDA.

The current patch breaks that. That's why I think it needs change.

I agree. Hence my suggestion to add the path via toolchain-specific AddClangSystemIncludeArgs.

If any input file is HIP program, clang driver will use HIP offload kind for all inputs. This behavior is similar as cuda-clang.

I do not think this is the case as illustrated by the example above. CUDA paths are only added to CUDA compilation. C++ compilation of b.cc does not have any of them.

I noticed that with "-c" HIP/CUDA include path is not used for C++ program. However without "-c" HIP/CUDA include path is used for C++ program. Probably this is a bug.

Therefore if any input file is HIP program, HIP include path is added for each input file compilation. I think this is acceptable.

I disagree. HIP-specific include path should apply to HIP-specific cc1 compilation only.

However, when there is only C++ input file, clang driver should not add HIP or CUDA include path.

Agreed. AFAICT this is the case for CUDA.

The current patch breaks that. That's why I think it needs change.

I agree. Hence my suggestion to add the path via a toolchain-specific AddClangSystemIncludeArgs.

In the current patch, AddClangSystemIncludeArgs is modified to add HIP include path. However, there is no good way to know if the current job action is HIP or C++ program.

The signature of AddClangSystemIncludeArgs is

void Linux::AddClangSystemIncludeArgs(const ArgList &DriverArgs,
                                      ArgStringList &CC1Args) const ;

In the case of two input files a.hip and b.cpp, DriverArgs contain both a.hip and b.cpp. Clang does not know if the call of AddClangSystemIncludeArgs is for a.hip or b.cpp. The current patch adds HIP include path for both HIP and C++ programs.

To fix this issue, we either need to add a JobAction argument to AddClangSystemIncludeArgs to let clang know the current call of AddClangSystemIncludeArgs is for HIP program or C++ program, or we need to add HIP include path in a location where the current JobAction is known.

tra added a comment.Mar 1 2022, 11:19 AM

If any input file is HIP program, clang driver will use HIP offload kind for all inputs. This behavior is similar as cuda-clang.

I do not think this is the case as illustrated by the example above. CUDA paths are only added to CUDA compilation. C++ compilation of b.cc does not have any of them.

I noticed that with "-c" HIP/CUDA include path is not used for C++ program. However without "-c" HIP/CUDA include path is used for C++ program. Probably this is a bug.

This is, indeed, a bug. cc1 invocation for a C++ file should have remained the same, yet we're passing not only include paths but also a bunch of other CUDA-related options that are not relevant for C++ compilation.

In the current patch, AddClangSystemIncludeArgs is modified to add HIP include path. However, there is no good way to know if the current job action is HIP or C++ program.

The signature of AddClangSystemIncludeArgs is

void Linux::AddClangSystemIncludeArgs(const ArgList &DriverArgs,
                                      ArgStringList &CC1Args) const ;

In the case of two input files a.hip and b.cpp, DriverArgs contain both a.hip and b.cpp. Clang does not know if the call of AddClangSystemIncludeArgs is for a.hip or b.cpp. The current patch adds HIP include path for both HIP and C++ programs.

To fix this issue, we either need to add a JobAction argument to AddClangSystemIncludeArgs to let clang know the current call of AddClangSystemIncludeArgs is for HIP program or C++ program, or we need to add HIP include path in a location where the current JobAction is known.

I think we first need to figure out first why compilation w/o -c behaves incorrectly and what we can do about it.
I suspect if we fix it and make clang add options consistently regardless of whether we use -c or not, then we would not need to pass job info around. That is, unless the fix is to pass that info around. :-/

IMO mixed-language compilation falls into a grey area where it may sometimes work, but I doubt it's expected to work in general. I don't think it's worth complicating things over it.

yaxunl added a comment.Mar 3 2022, 8:43 AM

If any input file is HIP program, clang driver will use HIP offload kind for all inputs. This behavior is similar as cuda-clang.

I do not think this is the case as illustrated by the example above. CUDA paths are only added to CUDA compilation. C++ compilation of b.cc does not have any of them.

I noticed that with "-c" HIP/CUDA include path is not used for C++ program. However without "-c" HIP/CUDA include path is used for C++ program. Probably this is a bug.

This is, indeed, a bug. cc1 invocation for a C++ file should have remained the same, yet we're passing not only include paths but also a bunch of other CUDA-related options that are not relevant for C++ compilation.

In the current patch, AddClangSystemIncludeArgs is modified to add HIP include path. However, there is no good way to know if the current job action is HIP or C++ program.

The signature of AddClangSystemIncludeArgs is

void Linux::AddClangSystemIncludeArgs(const ArgList &DriverArgs,
                                      ArgStringList &CC1Args) const ;

In the case of two input files a.hip and b.cpp, DriverArgs contain both a.hip and b.cpp. Clang does not know if the call of AddClangSystemIncludeArgs is for a.hip or b.cpp. The current patch adds HIP include path for both HIP and C++ programs.

To fix this issue, we either need to add a JobAction argument to AddClangSystemIncludeArgs to let clang know the current call of AddClangSystemIncludeArgs is for HIP program or C++ program, or we need to add HIP include path in a location where the current JobAction is known.

I think we first need to figure out first why compilation w/o -c behaves incorrectly and what we can do about it.
I suspect if we fix it and make clang add options consistently regardless of whether we use -c or not, then we would not need to pass job info around. That is, unless the fix is to pass that info around. :-/

IMO mixed-language compilation falls into a grey area where it may sometimes work, but I doubt it's expected to work in general. I don't think it's worth complicating things over it.

I created two patches for fixing the existing issues about CUDA/HIP include path

https://reviews.llvm.org/D120910 - HIP specific issue for "-c" with mixed HIP/C++ programs

https://reviews.llvm.org/D120911 - common CUDA/HIP issue for linking with mixed HIP/C++ programs

These issues are due to active offload kind of job actions not set correctly for mixed CUDA or HIP programs and C++ programs.

These issues are orthogonal to the issue which the current patch is trying to resolve.

Even with the above two issues fixed, my argument for the current patch still holds. And it is not just for mixed-language. It happens with single language too. To fix this issue, we either need to add a JobAction argument to AddClangSystemIncludeArgs to let clang know the current call of AddClangSystemIncludeArgs is for HIP program or C++ program, or we need to add HIP include path in a location where the current JobAction is known.

Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 8:43 AM
yaxunl updated this revision to Diff 414100.Mar 9 2022, 7:39 AM

use -idirafter to include HIP include path

yaxunl added a comment.Mar 9 2022, 7:41 AM

I found a simple fix. Use -idirafter instead of -isystem-internal. It is still system include path but will be added after all other system include paths.

tra accepted this revision.Mar 9 2022, 10:38 AM

I still don't quite understand your reluctance to use AddClangSystemIncludeArgs for adding HIP path, but if this change works for HIP, I'm fine with it.

Also, thank you for fixing the issue with inconsistent sub-compilation options for mixed-language compilations.

This revision is now accepted and ready to land.Mar 9 2022, 10:38 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2022, 5:58 PM