Add ROCm RCCL project compilation and tests to HIP builder.
Support building RCCL with multiple ROCm installations.
Details
- Reviewers
yaxunl
Diff Detail
- Repository
- rT test-suite
Event Timeline
This is a work in progress. RCCL requires hipcc for the link step, but the CMake file would not let me override CMAKE_LINKER with hipcc. hipcc is needed because it handles static libraries created with -fgpu-rdc.
This patch therefore requires upstream clang to support consuming static libraries that contain bundles. Currently, the link step with clang++ fails with the following error (running ninja hip-tests-rccl):
[81/82] Linking CXX executable External/HIP/RcclMultiTests-hip-4.2.0
FAILED: External/HIP/RcclMultiTests-hip-4.2.0
: && /root/llvm-project/build/bin/clang++ -O3 -DNDEBUG External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllGatherMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllReduceGroupMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllReduceMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllToAllMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_BroadcastMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_CombinedCallsMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_GatherMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_GroupCallsMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_ReduceMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_ReduceScatterMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_ScatterMultiProcess.cpp.o -o External/HIP/RcclMultiTests-hip-4.2.0 -Wl,-rpath,/root/Externals/hip/rocm-4.2.0/lib -L/root/Externals/hip/rocm-4.2.0/lib -lamdhip64 External/HIP/libRcclLib-hip-4.2.0.a -fgpu-rdc -ldl /root/Externals/hip/rocm-4.2.0/lib/libamdhip64.so /usr/local/lib/libgtest.a /usr/local/lib/libgtest_main.a -lhsa-runtime64 -lrt -lpthread -lamd_comgr && :
External/HIP/libRcclLib-hip-4.2.0.a(sendrecv.cpp.o):(.hipFatBinSegment+0x8): undefined reference to `__hip_fatbin'
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
[82/82] Linking CXX executable External/HIP/RcclSingleTests-hip-4.2.0
FAILED: External/HIP/RcclSingleTests-hip-4.2.0
: && /root/llvm-project/build/bin/clang++ -O3 -DNDEBUG External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllGather.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllReduce.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllReduceAbort.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllReduceGroup.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllToAll.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllToAllv.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_Broadcast.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_BroadcastAbort.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_CombinedCalls.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_Gather.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_GroupCalls.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_Reduce.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_ReduceScatter.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_Scatter.cpp.o -o External/HIP/RcclSingleTests-hip-4.2.0 -Wl,-rpath,/root/Externals/hip/rocm-4.2.0/lib -L/root/Externals/hip/rocm-4.2.0/lib -lamdhip64 External/HIP/libRcclLib-hip-4.2.0.a -fgpu-rdc -ldl /root/Externals/hip/rocm-4.2.0/lib/libamdhip64.so /usr/local/lib/libgtest.a /usr/local/lib/libgtest_main.a -lhsa-runtime64 -lrt -lpthread -lamd_comgr && :
External/HIP/libRcclLib-hip-4.2.0.a(sendrecv.cpp.o):(.hipFatBinSegment+0x8): undefined reference to `__hip_fatbin'
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
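On the CMAKE_LINKER override mentioned above: CMake normally links C++ executables through the compiler driver, using the rule variable CMAKE_CXX_LINK_EXECUTABLE, which is why clang++ appears in the failing link command. One possible workaround, sketched here and not tested against this patch, is to override that rule variable so hipcc drives the link. The variable HIPCC_EXECUTABLE and its path are hypothetical and are not part of this patch; whether this interacts cleanly with the per-ROCm-version targets is also an open assumption.

```cmake
# Hypothetical: location of hipcc for one ROCm installation (not from the patch).
set(HIPCC_EXECUTABLE "/root/Externals/hip/rocm-4.2.0/bin/hipcc"
    CACHE FILEPATH "hipcc used only for the link step (assumed path)")

# CMake's default C++ link rule is:
#   <CMAKE_CXX_COMPILER> <FLAGS> <CMAKE_CXX_LINK_FLAGS> <LINK_FLAGS>
#   <OBJECTS> -o <TARGET> <LINK_LIBRARIES>
# Swapping the driver for hipcc leaves compilation with clang++ untouched
# and changes only how executables are linked in this directory scope.
set(CMAKE_CXX_LINK_EXECUTABLE
    "${HIPCC_EXECUTABLE} <FLAGS> <CMAKE_CXX_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>")
```

Because the rule variable applies to every C++ executable defined in the same scope, it would need to be set only in the directory that defines the RCCL test targets, and set once per ROCm version if each version should link with its own hipcc.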