This is an archive of the discontinued LLVM Phabricator instance.

[WIP][test-suite] Add HIP RCCL project tests
Needs Review, Public

Authored by ashi1 on Jul 6 2021, 1:15 PM.

Details

Reviewers
yaxunl
Summary

Add ROCm RCCL project compilation and tests to HIP builder.
Support building RCCL with multiple ROCm installations.

Event Timeline

ashi1 created this revision. Jul 6 2021, 1:15 PM
ashi1 requested review of this revision. Jul 6 2021, 1:15 PM
ashi1 added a comment. Jul 6 2021, 1:20 PM

This is a work in progress. RCCL requires hipcc for the link step; however, the CMake file would not let me override CMAKE_LINKER with hipcc. hipcc is needed because it can handle the static libraries created with -fgpu-rdc.
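A possible workaround, sketched here and not tested against this patch: for GCC/Clang-style toolchains with the Ninja or Makefile generators, CMake links C++ executables through the compiler driver by expanding the rule template CMAKE_CXX_LINK_EXECUTABLE, rather than by invoking CMAKE_LINKER. Overriding that template should route only the link step through hipcc. HIPCC_EXECUTABLE and ROCM_PATH below are hypothetical names, not variables the test-suite defines.

```cmake
# Hypothetical cache variable: the hipcc used for linking only.
# ROCM_PATH is assumed to point at a single ROCm installation.
set(HIPCC_EXECUTABLE "${ROCM_PATH}/bin/hipcc"
    CACHE FILEPATH "hipcc used for the RCCL link step")

# CMake's default rule for this toolchain family is
#   <CMAKE_CXX_COMPILER> <FLAGS> <CMAKE_CXX_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>
# Replacing the first token with hipcc keeps compilation on clang++
# while the final executable link goes through hipcc.
set(CMAKE_CXX_LINK_EXECUTABLE
    "${HIPCC_EXECUTABLE} <FLAGS> <CMAKE_CXX_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>")
```

Since CMAKE_CXX_LINK_EXECUTABLE is an ordinary variable, it would apply to every C++ executable linked in that directory scope, not only the RCCL targets; supporting several ROCm installations this way would likely need a separate scope per hipcc.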

Therefore, this patch requires upstream clang to support consuming static libraries whose members contain offload bundles. Until it does, the link step with clang++ fails with the following error (when running ninja hip-tests-rccl):

[81/82] Linking CXX executable External/HIP/RcclMultiTests-hip-4.2.0
FAILED: External/HIP/RcclMultiTests-hip-4.2.0
: && /root/llvm-project/build/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllGatherMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllReduceGroupMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllReduceMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllToAllMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_BroadcastMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_CombinedCallsMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_GatherMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_GroupCallsMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_ReduceMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_ReduceScatterMultiProcess.cpp.o External/HIP/CMakeFiles/RcclMultiTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_ScatterMultiProcess.cpp.o -o External/HIP/RcclMultiTests-hip-4.2.0  -Wl,-rpath,/root/Externals/hip/rocm-4.2.0/lib  -L/root/Externals/hip/rocm-4.2.0/lib -lamdhip64  External/HIP/libRcclLib-hip-4.2.0.a  -fgpu-rdc  -ldl  /root/Externals/hip/rocm-4.2.0/lib/libamdhip64.so  /usr/local/lib/libgtest.a  /usr/local/lib/libgtest_main.a  -lhsa-runtime64  -lrt  -lpthread  -lamd_comgr && :
External/HIP/libRcclLib-hip-4.2.0.a(sendrecv.cpp.o):(.hipFatBinSegment+0x8): undefined reference to `__hip_fatbin'
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
[82/82] Linking CXX executable External/HIP/RcclSingleTests-hip-4.2.0
FAILED: External/HIP/RcclSingleTests-hip-4.2.0
: && /root/llvm-project/build/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllGather.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllReduce.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllReduceAbort.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllReduceGroup.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllToAll.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_AllToAllv.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_Broadcast.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_BroadcastAbort.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_CombinedCalls.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_Gather.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_GroupCalls.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_Reduce.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_ReduceScatter.cpp.o External/HIP/CMakeFiles/RcclSingleTests-hip-4.2.0.dir/root/Externals/hip/rccl/test/test_Scatter.cpp.o -o External/HIP/RcclSingleTests-hip-4.2.0  -Wl,-rpath,/root/Externals/hip/rocm-4.2.0/lib  -L/root/Externals/hip/rocm-4.2.0/lib -lamdhip64  External/HIP/libRcclLib-hip-4.2.0.a  -fgpu-rdc  -ldl  /root/Externals/hip/rocm-4.2.0/lib/libamdhip64.so  /usr/local/lib/libgtest.a  /usr/local/lib/libgtest_main.a  -lhsa-runtime64  -lrt  -lpthread  -lamd_comgr && :
External/HIP/libRcclLib-hip-4.2.0.a(sendrecv.cpp.o):(.hipFatBinSegment+0x8): undefined reference to `__hip_fatbin'
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
ashi1 updated this revision to Diff 356812. Jul 6 2021, 1:23 PM

Added context.