Page MenuHomePhabricator

[mlir][gpu] Introduce mlir-rocm-runner.

Authored by whchung on May 27 2020, 4:33 PM.



mlir-rocm-runner is introduced in this commit to execute GPU modules on ROCm
platform. A small wrapper to encapsulate ROCm's HIP runtime API is also inside
the commit.

Due to behavior of ROCm, raw pointers inside memrefs passed to gpu.launch
must be modified on the host side to properly capture the pointer values
addressable on the GPU.

LLVM MC is used to assemble AMD GCN ISA coming out from
ConvertGPUKernelToBlobPass to binary form, and LLD is used to produce a shared
ELF object which could be loaded by ROCm HIP runtime.

gfx900 is the default target be used right now, although it could be altered via
an option in mlir-rocm-runner. Future revisions may consider using ROCm Agent
Enumerator to detect the right target on the system.

Notice AMDGPU Code Object V2 is used in this revision. Future enhancements may
upgrade to AMDGPU Code Object V3.

Bitcode libraries in ROCm-Device-Libs, which implements math routines exposed in
rocdl dialect are not yet linked, and is left as a TODO in the logic.

Diff Detail

Event Timeline

whchung created this revision.May 27 2020, 4:33 PM
Herald added a project: Restricted Project. · View Herald Transcript

This patch depends on D80142.

whchung updated this revision to Diff 266929.May 28 2020, 9:57 AM

Tame clang-tidy.

whchung updated this revision to Diff 266935.May 28 2020, 10:13 AM

Improve ROCm HIP library search logic.

whchung updated this revision to Diff 266937.May 28 2020, 10:24 AM

Add CMake checks to ensure lld is enabled building mlir-rocm-runner.

whchung updated this revision to Diff 266947.May 28 2020, 10:53 AM

Fix build errors in a shared lib build.

rriddle added inline comments.May 28 2020, 11:24 AM


whchung updated this revision to Diff 266982.May 28 2020, 12:07 PM

std::string -> StringRef.

whchung marked an inline comment as done.May 28 2020, 12:08 PM
jerryyin added inline comments.

When building with the following option: -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON

Build failed with linker issues like below:

mlir-rocm-runner.cpp:(.text.startup.main+0x5e): undefined reference to `LLVMInitializeNVPTXTargetInfo' 
mlir-rocm-runner.cpp:(.text.startup.main+0x63): undefined reference to `LLVMInitializeAMDGPUTargetInfo'
mlir-rocm-runner.cpp:(.text.startup.main+0x6d): undefined reference to `LLVMInitializeNVPTXTargetMC'
mlir-rocm-runner.cpp:(.text.startup.main+0x72): undefined reference to `LLVMInitializeAMDGPUTargetMC'
mlir-rocm-runner.cpp:(.text.startup.main+0x86): undefined reference to `LLVMInitializeAMDGPUTargetInfo'
mlir-rocm-runner.cpp:(.text.startup.main+0x8b): undefined reference to `LLVMInitializeAMDGPUTargetMC'

Needs the following targets: LLVMAMDGPUInfo, LLVMAMDGPUDesc, LLVMNVPTXInfo, LLVMNVPTXDesc.

whchung marked 2 inline comments as done.May 28 2020, 1:39 PM
whchung added inline comments.

@jerryyin The issue should be no more should you adopt D80739.

jerryyin accepted this revision.Jun 3 2020, 12:07 PM

Good job, mostly minor issues, except the one related with the difference between mlir-rocm-runner and mlir-cuda-runner. Giving an accept to unblock future development, but we need to figure out the best way to resolve the differences between runner wrapper interfaces.


Adding a TODO regarding on supporting other target chips?


I only see you use the raw os stream, not the bos. Should you swap os with bos in the lines follows?


Nit: Could you add a const to &isaBlob just so that it is clear it is input


Nit: Use the stack allocated llvm::FileRemover instead of having to manually track/remove it?


I'm a bit concerned that we get diverged between different user APIs. It is probably fine now, but will be a maintenance burden later. I will have to, in the future, duplicate and sync every cuda-rocm-runner test with us.

Desirably, we can reach to a shape where cuda-runner and rocm-runner tests can be merged into one.

This revision is now accepted and ready to land.Jun 3 2020, 12:07 PM
whchung marked 7 inline comments as done.Jun 3 2020, 1:32 PM
whchung added inline comments.

Other targets such as gfx906 gfx908 can already work via this option. I'll add a TODO to detect native AMD GPU ISA version on the system.


This only has to deal with host pinned memrefs. In real-world applications memrefs could be explicitly allocated on GPU, and transferred between CPU and GPU.

whchung updated this revision to Diff 268292.Jun 3 2020, 1:40 PM
whchung marked 2 inline comments as done.

Address code review comments.

@jerryyin I've revised the commit based on your comments. Could you give this patch another around of review? Thanks.

whchung marked an inline comment as done.Jun 3 2020, 1:41 PM
jerryyin accepted this revision.Jun 3 2020, 2:00 PM



Thanks for clarifying

whchung updated this revision to Diff 268467.Jun 4 2020, 7:29 AM


whchung updated this revision to Diff 268658.Jun 4 2020, 9:04 PM

Remove NOLINT.

This revision was automatically updated to reflect the committed changes.