This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Remove mlir-rocm-runner
ClosedPublic

Authored by csigg on Mar 11 2021, 12:36 PM.

Details

Summary

This change does for ROCm what D97463, D98203, D98360, and D98396 did for CUDA.

I did not try to compile SerializeToHsaco.cpp or run mlir/test/Integration/GPU/ROCM because I don't have an AMD card. I did fix the things with obvious bit-rot, though.

Diff Detail

Event Timeline

csigg created this revision.Mar 11 2021, 12:36 PM
csigg requested review of this revision.Mar 11 2021, 12:36 PM
csigg updated this revision to Diff 330164.Mar 12 2021, 12:10 AM

Do not depend on D98396.

csigg edited the summary of this revision. (Show Details)
csigg updated this revision to Diff 330168.Mar 12 2021, 12:17 AM
csigg edited the summary of this revision. (Show Details)

Rebase.

csigg updated this revision to Diff 330210.Mar 12 2021, 5:03 AM

Various fixes.

I gave this change a spin on genesiscloud.com (Radeon Instinct™ MI25, ROCm HIP version: 3.5.20214-a2917cd).

The code now compiles, but the integration tests all fail with an assembler initialization error.
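
For reference, the assembler initialization in question is the per-target registration that happens before creating an AMDGPU target machine. A minimal sketch, assuming the standard LLVM per-target entry points (this is not the code in the patch):

// Minimal sketch (not this patch's code): registering the AMDGPU backend
// components before creating a target machine. A missing or broken AsmParser
// library typically surfaces at exactly this step.
#include "llvm/Support/TargetSelect.h"

static void initializeAmdGpuBackend() {
  LLVMInitializeAMDGPUTarget();
  LLVMInitializeAMDGPUTargetInfo();
  LLVMInitializeAMDGPUTargetMC();
  LLVMInitializeAMDGPUAsmParser();
  LLVMInitializeAMDGPUAsmPrinter();
}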

I also had to remove the 'code-object-v3' feature because LLVM no longer recognizes it.

For future reference, here is how I tested:

sudo apt-get install ninja-build
sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
wget -qO - https://apt.kitware.com/keys/kitware-archive-latest.asc | sudo apt-key add -
sudo apt-add-repository 'deb https://apt.kitware.com/ubuntu/ bionic main'
sudo apt-get update
sudo apt-get install cmake

wget https://reviews.llvm.org/D98447?download=true -O rocm.patch
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git apply ../rocm.patch
mkdir build
cd build

CC=clang-12 CXX=clang++-12 cmake ../llvm \
    '-DCMAKE_CUDA_COMPILER=/media/samsung_ssd_850_pro/cuda/cuda-11.1/bin/nvcc' \
    -DLLVM_BUILD_EXAMPLES=ON \
    '-DLLVM_TARGETS_TO_BUILD=host;AMDGPU' \
    -DLLVM_ENABLE_PROJECTS="clang;mlir;lld" \
    '-DMLIR_INCLUDE_INTEGRATION_TESTS=ON' \
    '-DMLIR_ROCM_RUNNER_ENABLED=1' \
    -DBUILD_SHARED_LIBS=ON \
    -DLLVM_CCACHE_BUILD=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    '-DLLVM_LIT_ARGS=-v -vv' \
    -GNinja

ninja check-mlir

@csigg Do I understand correctly that the goal of this changeset is to merge everything under mlir-cpu-runner?

I'll check this changeset on a ROCm 4.0 machine and provide feedback.

csigg updated this revision to Diff 330234.Mar 12 2021, 7:02 AM

Another fix.

gpu-to-hsaco.mlir now passes. The other three integration tests still fail, but that seems unrelated to this change and should be dealt with separately.

Test output:

$ env LIT_FILTER=GPU/ROCM ninja check-mlir
-- Testing: 4 of 897 tests, 4 workers --
FAIL: MLIR :: Integration/GPU/ROCM/two-modules.mlir (1 of 4)
******************** TEST 'MLIR :: Integration/GPU/ROCM/two-modules.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1'; /home/ubuntu/llvm-project/build/bin/mlir-opt /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/two-modules.mlir -gpu-kernel-outlining -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco)' -gpu-to-llvm | /home/ubuntu/llvm-project/build/bin/mlir-cpu-runner --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_rocm_runtime.so --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_runner_utils.so --entry-point-result=void | /home/ubuntu/llvm-project/build/bin/FileCheck /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/two-modules.mlir
--
Exit Code: 2

Command Output (stderr):
--
+ : 'RUN: at line 1'
+ /home/ubuntu/llvm-project/build/bin/mlir-opt /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/two-modules.mlir -gpu-kernel-outlining '-pass-pipeline=gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco)' -gpu-to-llvm
+ /home/ubuntu/llvm-project/build/bin/mlir-cpu-runner --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_rocm_runtime.so --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_runner_utils.so --entry-point-result=void
+ /home/ubuntu/llvm-project/build/bin/FileCheck /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/two-modules.mlir
/home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/two-modules.mlir:37:1: error: 'func' op symbol declaration cannot have public visibility
func @mgpuMemGetDeviceMemRef1dInt32(%ptr : memref<?xi32>) -> (memref<?xi32>)
^
/home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/two-modules.mlir:37:1: note: see current operation: "func"() ( {
}) {sym_name = "mgpuMemGetDeviceMemRef1dInt32", type = (memref<?xi32>) -> memref<?xi32>} : () -> ()
Error: entry point not found
FileCheck error: '<stdin>' is empty.
FileCheck command line: /home/ubuntu/llvm-project/build/bin/FileCheck /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/two-modules.mlir

--

********************
FAIL: MLIR :: Integration/GPU/ROCM/vector-transferops.mlir (2 of 4)
******************** TEST 'MLIR :: Integration/GPU/ROCM/vector-transferops.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1'; /home/ubuntu/llvm-project/build/bin/mlir-opt /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir -gpu-kernel-outlining -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco)' -gpu-to-llvm | /home/ubuntu/llvm-project/build/bin/mlir-cpu-runner --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_rocm_runtime.so --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_runner_utils.so --entry-point-result=void | /home/ubuntu/llvm-project/build/bin/FileCheck /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir
--
Exit Code: 2

Command Output (stderr):
--
+ : 'RUN: at line 1'
+ /home/ubuntu/llvm-project/build/bin/mlir-cpu-runner --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_rocm_runtime.so --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_runner_utils.so --entry-point-result=void
+ /home/ubuntu/llvm-project/build/bin/FileCheck /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir
+ /home/ubuntu/llvm-project/build/bin/mlir-opt /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir -gpu-kernel-outlining '-pass-pipeline=gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco)' -gpu-to-llvm
/home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir:67:3: error: 'scf.for' op operand #0 must be index, but got 'i64'
 scf.for %i = %c0 to %c4 step %c1 {
 ^
/home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir:67:3: note: see current operation: "scf.for"(%0, %2, %1) ( {
^bb0(%arg0: index): // no predecessors
 "std.store"(%4, %18, %arg0) : (f32, !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>, index) -> ()
 "std.store"(%4, %32, %arg0) : (f32, !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>, index) -> ()
 "scf.yield"() : () -> ()
}) : (i64, i64, i64) -> ()
Error: entry point not found
FileCheck error: '<stdin>' is empty.
FileCheck command line: /home/ubuntu/llvm-project/build/bin/FileCheck /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir

--

********************
FAIL: MLIR :: Integration/GPU/ROCM/vecadd.mlir (3 of 4)
******************** TEST 'MLIR :: Integration/GPU/ROCM/vecadd.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1'; /home/ubuntu/llvm-project/build/bin/mlir-opt /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vecadd.mlir -gpu-kernel-outlining -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco)' -gpu-to-llvm | /home/ubuntu/llvm-project/build/bin/mlir-cpu-runner --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_rocm_runtime.so --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_runner_utils.so --entry-point-result=void | /home/ubuntu/llvm-project/build/bin/FileCheck /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vecadd.mlir
--
Exit Code: 2

Command Output (stderr):
--
+ : 'RUN: at line 1'
+ /home/ubuntu/llvm-project/build/bin/mlir-cpu-runner --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_rocm_runtime.so --shared-libs=/home/ubuntu/llvm-project/build/lib/libmlir_runner_utils.so --entry-point-result=void
+ /home/ubuntu/llvm-project/build/bin/FileCheck /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vecadd.mlir
+ /home/ubuntu/llvm-project/build/bin/mlir-opt /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vecadd.mlir -gpu-kernel-outlining '-pass-pipeline=gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco)' -gpu-to-llvm
/home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vecadd.mlir:38:3: error: 'scf.for' op operand #0 must be index, but got 'i64'
 scf.for %i = %c0 to %c5 step %c1 {
 ^
/home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vecadd.mlir:38:3: note: see current operation: "scf.for"(%0, %2, %1) ( {
^bb0(%arg0: index): // no predecessors
 "std.store"(%3, %17, %arg0) : (f32, !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>, index) -> ()
 "std.store"(%3, %31, %arg0) : (f32, !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>, index) -> ()
 "scf.yield"() : () -> ()
}) : (i64, i64, i64) -> ()
Error: entry point not found
FileCheck error: '<stdin>' is empty.
FileCheck command line: /home/ubuntu/llvm-project/build/bin/FileCheck /home/ubuntu/llvm-project/mlir/test/Integration/GPU/ROCM/vecadd.mlir

--

********************
PASS: MLIR :: Integration/GPU/ROCM/gpu-to-hsaco.mlir (4 of 4)
********************
Failed Tests (3):
 MLIR :: Integration/GPU/ROCM/two-modules.mlir
 MLIR :: Integration/GPU/ROCM/vecadd.mlir
 MLIR :: Integration/GPU/ROCM/vector-transferops.mlir


Testing Time: 0.91s
 Excluded: 893
 Passed : 1
 Failed : 3
FAILED: tools/mlir/test/CMakeFiles/check-mlir
csigg added a comment.Mar 12 2021, 7:08 AM

@csigg Do I understand correctly that the goal of this changeset is to merge everything under mlir-cpu-runner?

The diffs referenced in the description contain more context, but basically:

  • file-move the gpu runner tests to the 'Integration' directory
  • file-move gpuKernelsToBlob to Dialect/GPU/Transforms
  • code-move the lowering to a GPU blob from mlir-rocm-runner to mlir-opt
  • change the integration tests from mlir-rocm-runner to mlir-opt + mlir-cpu-runner
  • remove mlir-rocm-runner

I'll check this changeset on a ROCm 4.0 machine and provide feedback.

Thanks! Please use the latest revision ;-)

@csigg May I ask how you configured the build?

With this cmake command:

cmake -G Ninja ../llvm \
    -DLLVM_ENABLE_PROJECTS="mlir;lld" \
    -DLLVM_BUILD_EXAMPLES=ON \
    -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DBUILD_SHARED_LIBS=ON \
    -DLLVM_BUILD_LLVM_DYLIB=ON \
    -DMLIR_ROCM_RUNNER_ENABLED=1

I'm getting:

tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPU.dir/Transforms/SerializeToHsaco.cpp.o: In function `std::_Function_handler<std::unique_ptr<mlir::Pass, std::default_delete<mlir::Pass> > (), mlir::registerGpuSerializeToHsacoPass()::{lambda()#1}>::_M_invoke(std::_Any_data const&)':
SerializeToHsaco.cpp:(.text._ZNSt17_Function_handlerIFSt10unique_ptrIN4mlir4PassESt14default_deleteIS2_EEvEZNS1_31registerGpuSerializeToHsacoPassEvEUlvE_E9_M_invokeERKSt9_Any_data+0x34): undefined reference to `LLVMInitializeAMDGPUAsmParser'
tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPU.dir/Transforms/SerializeToHsaco.cpp.o: In function `(anonymous namespace)::SerializeToHsacoPass::createHsaco(llvm::SmallVectorImpl<char> const&)':
SerializeToHsaco.cpp:(.text._ZN12_GLOBAL__N_120SerializeToHsacoPass11createHsacoERKN4llvm15SmallVectorImplIcEE+0x51c): undefined reference to `lld::elf::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&, llvm::raw_ostream&)'
mlir/test/lib/Transforms/TestConvertGPUKernelToHsaco.cpp
28

PTX -> LLVM IR?

@csigg I would need your help with the proper way to configure the build so I can test the patch. With the cmake command I use downstream, I run into the link errors mentioned above.

Also, in our downstream work we rebased recently and found that we have to disable multi-threading when we run MLIR passes.

static LogicalResult runMLIRPasses(ModuleOp m) {

  // TODO(sjw): fix multi-threading race condition
  m.getContext()->disableMultithreading();

  PassManager pm(m.getContext());
  applyPassManagerCLOptions(pm);

To emit HSACO for the AMD GPU platform, lld is used, and lld itself doesn't seem to be thread-safe.

mlir/test/lib/Transforms/TestConvertGPUKernelToHsaco.cpp
37

"code-object-v3" is dropped in recent LLVM AMDGPU backend so this line is not needed.

csigg updated this revision to Diff 330431.Mar 13 2021, 1:26 AM

Simplify dependencies.

csigg updated this revision to Diff 330461.Mar 13 2021, 11:24 AM

Fix dependencies for libMLIR.so, fix tests.

csigg marked 2 inline comments as done.Mar 13 2021, 11:30 AM

@csigg May I ask how you configured the build?

With this cmake command:

cmake -G Ninja ../llvm \
    -DLLVM_ENABLE_PROJECTS="mlir;lld" \
    -DLLVM_BUILD_EXAMPLES=ON \
    -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DBUILD_SHARED_LIBS=ON \
    -DLLVM_BUILD_LLVM_DYLIB=ON \
    -DMLIR_ROCM_RUNNER_ENABLED=1

My cmake command is above. Yours seems to be missing -DMLIR_INCLUDE_INTEGRATION_TESTS=ON.

Unrelated, but note that SHARED_LIBS + DYLIB is not supported.

I'm getting: ... `undefined reference to `lld::elf::link`

This should be fixed now. lldELF is not an LLVM target, so this needed a 'creative' solution.

I also managed to fix the tests.

csigg updated this revision to Diff 330466.Mar 13 2021, 12:13 PM

Another fix.

csigg added a comment.Mar 16 2021, 3:10 AM

Also, in our downstream work we rebased recently and found that we have to disable multi-threading when we run MLIR passes.
To emit HSACO for the AMD GPU platform, lld is used, and lld itself doesn't seem to be thread-safe.

There is a lock around the lld call. I think it should be fine.
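
For illustration, the pattern being referred to is roughly the following sketch; the function and variable names are made up, and only the lld::elf::link signature is taken from the link error quoted above:

// Sketch of serializing all calls into lld behind a single mutex.
#include <mutex>

#include "lld/Common/Driver.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/raw_ostream.h"

static bool linkWithLld(llvm::ArrayRef<const char *> args) {
  static std::mutex lldMutex;  // one lock shared by all pass instances
  std::lock_guard<std::mutex> guard(lldMutex);
  return lld::elf::link(args, /*canExitEarly=*/false, llvm::outs(),
                        llvm::errs());
}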

Thanks for cleaning this up! Looks good in general but I also cannot test this. It would be nice if we had a builder for this. @whchung do you have a setup that could test this continuously or at least periodically to make sure it does not break?

mlir/lib/Dialect/GPU/CMakeLists.txt
100

nit: which rocm runner?

120

Same here.

146

What does this do? I do not understand cmake enough to make sense of this.

csigg updated this revision to Diff 331201.Mar 17 2021, 3:18 AM

Update error messages.

csigg marked 2 inline comments as done.Mar 17 2021, 3:25 AM
csigg added inline comments.
mlir/lib/Dialect/GPU/CMakeLists.txt
146

MLIR_LLVM_LINK_COMPONENTS is the list of LLVM libraries that are linked into libMLIR.so. Normally it is populated by the LINK_COMPONENTS argument of add_mlir_library(), but here that call has already happened. We need an alias (really just a different name for an existing target) because the LINK_COMPONENTS names are implicitly prefixed with 'LLVM' when translated to target names.
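
Roughly, the mechanism is the following hypothetical sketch; the target and component names are made up and are not the CMake code in this patch:

# LINK_COMPONENTS entries become target names by prepending "LLVM", so a
# target without that prefix can only be pulled in through this list via an
# alias that carries the expected prefix. Names below are illustrative only.
add_library(LLVMAliasToLldElf ALIAS lldELF)
list(APPEND MLIR_LLVM_LINK_COMPONENTS AliasToLldElf)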

whchung accepted this revision.Mar 18 2021, 3:44 PM

The patch has been verified on ROCm 4.0.

This revision is now accepted and ready to land.Mar 18 2021, 3:44 PM

Thanks for cleaning this up! Looks good in general but I also cannot test this. It would be nice if we had a builder for this. @whchung do you have a setup that could test this continuously or at least periodically to make sure it does not break?

@herhut We have CI nodes checking our work downstream. We used to have a job checking the tip of LLVM, but it has been offline for some time. I'll work with my colleague to bring it back online.

This revision was landed with ongoing or failed builds.Mar 19 2021, 12:24 AM
This revision was automatically updated to reflect the committed changes.
MaskRay added a subscriber: MaskRay.EditedJan 28 2022, 1:46 PM

Coming from https://reviews.llvm.org/D108850#3280778

I am not comfortable with mlir having such an API dependency on lld::elf::link.
lld::elf::link (i.e. using lld as a library) is still not recommended.
It also feels odd that mlir needs to use lld.

Library usage may not handle concurrency well.
You may switch to spawning an ld.lld process if LLVM_ENABLE_PROJECTS includes lld.
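
A sketch of that alternative, assuming ld.lld can be found on PATH; the names and flags are illustrative and not proposed code for this patch:

// Spawn ld.lld as a child process instead of calling lld::elf::link
// in-process, sidestepping library re-entrancy concerns.
#include <string>

#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/Program.h"

static bool runLdLld(llvm::StringRef inputObject, llvm::StringRef outputHsaco) {
  auto lld = llvm::sys::findProgramByName("ld.lld");
  if (!lld)
    return false;
  llvm::SmallVector<llvm::StringRef, 8> args = {"ld.lld", "-shared",
                                                inputObject, "-o", outputHsaco};
  std::string errMsg;
  int rc = llvm::sys::ExecuteAndWait(*lld, args, /*Env=*/llvm::None,
                                     /*Redirects=*/{}, /*SecondsToWait=*/0,
                                     /*MemoryLimit=*/0, &errMsg);
  return rc == 0;
}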

mlir/tools/mlir-rocm-runner/CMakeLists.txt