This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][JIT] Introduce support for AMDGPU
ClosedPublic

Authored by jdoerfert on Dec 27 2022, 11:24 PM.

Details

Summary

To JIT kernels for AMDGPUs we need to provide the architecture, the

triple, and a post-link callback. The first two are simple, the last one
is a little more complicated since we need to invoke `lld`. There is
some library interface but for that we need the lld library, which is
not generally available, thus we go with the executable for now. In
either way we need to manifest the (amdgcn) object file and read the
output from another file. We should try to avoid that in the future.
The options for `lld` are copied from the way clang invokes it.

Diff Detail

Event Timeline

jdoerfert created this revision.Dec 27 2022, 11:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 27 2022, 11:24 PM
jdoerfert requested review of this revision.Dec 27 2022, 11:24 PM

Instead of invoking lld and copying the flags, could you invoke clang instead? The flags may change over time. Rust calls cc for linking.

arsenm added inline comments.Dec 28 2022, 4:38 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1654

SmallString

1656

Don’t need llvm::

1673

Also need to get ecc and xnack target features?

1679

Do we still need this flag with real LTO

1688

ExecuteAndWait should switch to std::error_code

1696

Return StringRef

2329

Why not return a complete triple

tianshilei1992 added inline comments.Dec 28 2022, 6:27 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1696

It is not always good to return a StringRef here. In the NVPTX implementation, the arch string is built on the fly (in the return statement).

2329

Because we don't know the other components of a complete triple, though the assumption here is if the arch is same, the module should be compatible.

arsenm added inline comments.Dec 28 2022, 7:26 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1696

That doesn’t mean it should be done there

2329

That defeats the point of the other triple fields

jhuber6 added inline comments.Dec 28 2022, 7:33 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1679

I don't think we need it since we should be linking only the object files to a shared object. Probably copied from the LinkerWrapper, I think it should probably be removed there as well.

tianshilei1992 added inline comments.Dec 28 2022, 8:17 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1696

this is override function

tianshilei1992 added inline comments.Dec 28 2022, 8:22 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
2329

yes, and then when we compare, we have to get arch again, which I don’t see the point.

Instead of invoking lld and copying the flags, could you invoke clang instead? The flags may change over time. Rust calls cc for linking.

I think invoking lld directly should be fine. It's the only linker that can link AMDGPU images and we already call it directly in the clang-linker-wrapper tool. To use clang you would need to pass --target=amdgcn-amd-amdhsa which most likely isn't compatible between different user cc's so we can't use that trick. If you do use clang is just gives this invocation, which probably won't be changed too much in the future:

"ld.lld" "device.o" "-shared" "-o" "a.out"

@arsenm Not sure if/how we should deal with the feature flags (xsnack, etc.) I'll leave that up to AMD people to look into. This should give us basic support, tested with OpenMC on a gfx90a.

Rust does not really care whether cc is clang or gcc. My point is that it is safer to invoke clang and let it manage the link step.

jdoerfert marked 14 inline comments as done.Dec 28 2022, 9:59 AM

I will update the nits I marked as done but didn't comment on. The commented stuff are kept for "follow ups", assuming people don't object. Anything else?

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1673

Let's keep that question for a follow up. I honestly don't know. Basic support should work w/o all of this. AMD folks will look into it (and how to propagate the information).

1688

Not my department.

1696

Can't call a StringRef w/o refactoring stuff. Once we JIT per kernel, not per module, this might become interesting, for now, it's less of an issue.

2329

Similar answer as to above. Once we look at more than the arch + sub-arch, e.g., features, we might want to restructure this. I'll assume AMD (@jhuber6) will take a look how to do that.

jdoerfert marked 4 inline comments as done.Dec 28 2022, 10:04 AM

Rust does not really care whether cc is clang or gcc. My point is that it is safer to invoke clang and let it manage the link step.

FWIW, what we really want is to use JITLink or lld as a library (neither we can really do right now). If people think using clang is better than lld directly, I can change it for sure.

Halide is apparently using lld as a library: https://reviews.llvm.org/D140726

Halide is apparently using lld as a library: https://reviews.llvm.org/D140726

I even started using that interface (`lld::elf::link) but:

  1. It doesn't solve our problem of materializing the two files for the link step, and
  2. It requires the lld libraries, which we don't build by default and there is no guarantee the system lld (e.g., distributed with rocm) provides those.

@arsenm Not sure if/how we should deal with the feature flags (xsnack, etc.) I'll leave that up to AMD people to look into. This should give us basic support, tested with OpenMC on a gfx90a.

Feature flags are not needed if you don't need portability. I do have a piece of code copied from somewhere that I don't remember: https://github.com/shiltian/llvm-project/commit/02bc7effccc6ff2f5ab3fe5218336094c0485766#diff-321c2038035972ad4994ff9d85b29950ba72c08a79891db5048b8f5d46915314R432.

jdoerfert updated this revision to Diff 486142.Jan 3 2023, 6:53 PM

Addressed comments, I think.

jdoerfert updated this revision to Diff 486143.Jan 3 2023, 6:54 PM

Pick the right patch file.

tianshilei1992 accepted this revision.Jan 3 2023, 7:56 PM

JIT part looks good to me. Not sure if the invocation of external tool will have more concerns.

This revision is now accepted and ready to land.Jan 3 2023, 7:56 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2023, 10:15 AM