This patch adds the basic support for malloc and free for AMDGPU.
It borrows the idea from clang/lib/Headers/__clang_hip_runtime_wrapper.h but
ignores the two checks: HIP version, and whether enabling device malloc. Arguably,
this doesn't make things worse. Currently no matter which HIP version users are
using, or whether the device malloc is enabled, they will all get link error if
they use malloc and free in target region. Then for users using pre ROCm 4.5,
they still get link error. That also helps to get rid of the check whether device
malloc is enabled because it is only needed for before ROCm 4.5. The good part is,
users using ROCm 4.5+ would get the feature.
Details
- Reviewers
jdoerfert JonChesterfield jhuber6
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
openmp/libomptarget/DeviceRTL/include/Memory.h | ||
---|---|---|
19 | I don't think we can/should define it like this. Maybe using size_t = decltype(sizeof(char)); instead? Also, this should arguably be in Types.h. | |
openmp/libomptarget/DeviceRTL/src/Memory.cpp | ||
32 | I think we want (now or later) a second level of indirection here. Also, we need to annotate all external functions properly, e.g., see malloc and free in State.cpp. |
openmp/libomptarget/DeviceRTL/src/Memory.cpp | ||
---|---|---|
32 | That would be our next step after we have a proper allocator. |
LG, minor nits
openmp/libomptarget/DeviceRTL/include/Types.h | ||
---|---|---|
45 | We don't require 8 byte size_t, do we? size_t is by definition supposed to be the sizeof result type, not 8 byte. | |
openmp/libomptarget/DeviceRTL/src/Memory.cpp | ||
25 | Rename the arguments. Unrelated: I really have no words for dealloc taking a unsigned long long. |
This is missing changes to the host plugin to set up the state malloc uses, and missing a host call implementation to do the allocation. So as written this isn't going to work. What's the benefit?
It looks like there are a couple of things still missing:
- malloc and free defined for AMDGPU using declare variant cannot be linked into the device module correctly. Their names still contain declare variant mangling such that they are dropped directly, leaving malloc and free still undefined.
- Even manually link the functions into device module, because some host setup are missing, as Jon pointed out, there is still some memory error:
[GPU Memory Error] Addr: 0x0 Reason: Page not present or supervisor privilege. Memory access fault by GPU node-8 (Agent handle: 0x1428d10) on address (nil). Reason: Page not present or supervisor privilege.
Yep, the host plugin will be passing 0 at an offset where the ockl machinery expects some non-null value. I think that allocator uses some initial block allocated per kernel launch (or possibly per device image), but it should also fall back on a host call to get more memory when that is exhausted which will not work in trunk. It might be documented what setup is expected for the pointer (in particular whether it's shared across kernels and how much memory should be allocated there), otherwise it could be reverse engineered from hip.
Alternatively we could use a different implementation for malloc/free.
I don't think we can/should define it like this. Maybe using size_t = decltype(sizeof(char)); instead? Also, this should arguably be in Types.h.