This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][AMDGPU] Add the support for `malloc` and `free` for AMDGPU
AbandonedPublic

Authored by tianshilei1992 on Jul 14 2022, 10:26 AM.

Details

Summary

This patch adds the basic support for malloc and free for AMDGPU.
It borrows the idea from clang/lib/Headers/__clang_hip_runtime_wrapper.h but
ignores the two checks: HIP version, and whether enabling device malloc. Arguably,
this doesn't make things worse. Currently no matter which HIP version users are
using, or whether the device malloc is enabled, they will all get link error if
they use malloc and free in target region. Then for users using pre ROCm 4.5,
they still get link error. That also helps to get rid of the check whether device
malloc is enabled because it is only needed for before ROCm 4.5. The good part is,
users using ROCm 4.5+ would get the feature.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2022, 10:26 AM
tianshilei1992 requested review of this revision.Jul 14 2022, 10:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2022, 10:26 AM

remove unrelated changes

fix rong header macro

jdoerfert added inline comments.Jul 14 2022, 11:15 AM
openmp/libomptarget/DeviceRTL/include/Memory.h
19

I don't think we can/should define it like this. Maybe using size_t = decltype(sizeof(char)); instead? Also, this should arguably be in Types.h.

openmp/libomptarget/DeviceRTL/src/Memory.cpp
32

I think we want (now or later) a second level of indirection here.
malloc -> _OMP::impl::malloc -> (__ockl_dm_alloc, ...)

Also, we need to annotate all external functions properly, e.g., see malloc and free in State.cpp.
Finally, combine this with existing malloc/free declarations and uses in the runtime.

tianshilei1992 marked 2 inline comments as done.Jul 14 2022, 12:15 PM
tianshilei1992 added inline comments.
openmp/libomptarget/DeviceRTL/src/Memory.cpp
32

That would be our next step after we have a proper allocator.

tianshilei1992 marked an inline comment as done.Jul 14 2022, 12:15 PM
jdoerfert accepted this revision.Jul 14 2022, 12:30 PM

LG, minor nits

openmp/libomptarget/DeviceRTL/include/Types.h
45

We don't require 8 byte size_t, do we? size_t is by definition supposed to be the sizeof result type, not 8 byte.

openmp/libomptarget/DeviceRTL/src/Memory.cpp
25

Rename the arguments.

Unrelated: I really have no words for dealloc taking a unsigned long long.

This revision is now accepted and ready to land.Jul 14 2022, 12:30 PM

rebase and fix comments

tianshilei1992 marked 2 inline comments as done.Jul 14 2022, 6:25 PM
JonChesterfield requested changes to this revision.Jul 15 2022, 12:02 AM

This is missing changes to the host plugin to set up the state malloc uses, and missing a host call implementation to do the allocation. So as written this isn't going to work. What's the benefit?

This revision now requires changes to proceed.Jul 15 2022, 12:02 AM
tianshilei1992 planned changes to this revision.EditedJul 15 2022, 5:50 AM

It looks like there are a couple of things still missing:

  • malloc and free defined for AMDGPU using declare variant cannot be linked into the device module correctly. Their names still contain declare variant mangling such that they are dropped directly, leaving malloc and free still undefined.
  • Even manually link the functions into device module, because some host setup are missing, as Jon pointed out, there is still some memory error:
[GPU Memory Error] Addr: 0x0 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-8 (Agent handle: 0x1428d10) on address (nil). Reason: Page not present or supervisor privilege.

Yep, the host plugin will be passing 0 at an offset where the ockl machinery expects some non-null value. I think that allocator uses some initial block allocated per kernel launch (or possibly per device image), but it should also fall back on a host call to get more memory when that is exhausted which will not work in trunk. It might be documented what setup is expected for the pointer (in particular whether it's shared across kernels and how much memory should be allocated there), otherwise it could be reverse engineered from hip.

Alternatively we could use a different implementation for malloc/free.

tianshilei1992 abandoned this revision.Jul 28 2022, 8:53 PM

Move to a more general way to implement it w/o depending on vendor interfaces.