[mlir][gpu] Introduce mlir-rocm-runner.

This commit introduces mlir-rocm-runner, which executes GPU modules on the
ROCm platform. It also includes a small wrapper that encapsulates ROCm's HIP
runtime API.

Due to the behavior of ROCm, raw pointers inside memrefs passed to gpu.launch
must be modified on the host side so that they hold pointer values
addressable on the GPU.
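
As a rough illustration of that host-side rewrite, here is a minimal sketch
for a rank-1 float memref. The names MemRef1D, TranslateFn and toDeviceMemRef
are hypothetical and are not taken from this commit. The translation hook is
assumed to wrap something like hipHostGetDevicePointer on memory registered
with hipHostRegister; pointing both descriptor pointers at the translated
aligned pointer is likewise an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical rank-1 strided memref descriptor (illustrative field names).
struct MemRef1D {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t size;
  int64_t stride;
};

// Hypothetical hook mapping a host pointer to a GPU-addressable pointer.
// Assumption: on ROCm this would wrap hipHostGetDevicePointer.
using TranslateFn = float *(*)(float *hostPtr);

// Return a copy of the descriptor whose pointers are GPU-addressable.
// Offset, size and stride are carried over unchanged.
MemRef1D toDeviceMemRef(const MemRef1D &host, TranslateFn translate) {
  assert(translate && "a translation hook is required");
  float *device = translate(host.aligned);
  // Assumption: both pointers refer to the translated aligned pointer.
  return MemRef1D{device, device, host.offset, host.size, host.stride};
}
```

The host descriptor is left untouched, so the caller can keep using it for
host-side accesses while passing the returned copy to the kernel launch.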

LLVM MC is used to assemble the AMD GCN ISA emitted by
ConvertGPUKernelToBlobPass into binary form, and LLD is used to produce a
shared ELF object that can be loaded by the ROCm HIP runtime.

gfx900 is currently the default target, although it can be changed via an
option in mlir-rocm-runner. Future revisions may consider using the ROCm Agent
Enumerator to detect the right target on the system.

Note that AMDGPU Code Object V2 is used in this revision. Future enhancements
may upgrade to AMDGPU Code Object V3.

The bitcode libraries in ROCm-Device-Libs, which implement the math routines
exposed in the rocdl dialect, are not yet linked. This is left as a TODO in
the code.

Differential Revision: https://reviews.llvm.org/D80676