The shared memory stack in the device runtime assumes that its uses are not intertwined.
D135037 breaks that assumption, which can corrupt the shared stack.
This patch moves the thread array to heap memory. Since this is already the slow
path, the added cost should not matter much.
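To illustrate the kind of problem described above, here is a minimal host-side sketch of how a LIFO bump allocator can hand out overlapping memory when its uses are not strictly nested, and how moving one of the allocations to the heap sidesteps that. `BumpStack`, `interleavedUseOverlaps`, and `heapArrayAvoidsOverlap` are hypothetical names for illustration only. They are not the DeviceRTL API, and the exact failure triggered by D135037 may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a LIFO shared memory stack (not the DeviceRTL code).
struct BumpStack {
  static constexpr std::size_t Capacity = 256;
  unsigned char Data[Capacity];
  std::size_t Top = 0;

  // Reserve Bytes at the top of the stack; returns nullptr when full.
  void *push(std::size_t Bytes) {
    if (Top + Bytes > Capacity)
      return nullptr;
    void *Ptr = &Data[Top];
    Top += Bytes;
    return Ptr;
  }

  // Release the topmost Bytes. Only correct when pops mirror pushes in reverse.
  void pop(std::size_t Bytes) {
    assert(Bytes <= Top && "pop without a matching push");
    Top -= Bytes;
  }
};

// Returns true if a new allocation overlaps one that is still live.
bool interleavedUseOverlaps() {
  BumpStack S;
  unsigned char *A = static_cast<unsigned char *>(S.push(64)); // long-lived
  unsigned char *B = static_cast<unsigned char *>(S.push(32)); // still live below
  (void)A;
  S.pop(64); // A's owner releases first, out of LIFO order; Top is now 32
  unsigned char *C = static_cast<unsigned char *>(S.push(48)); // spans [32, 80)
  // B spans [64, 96), so C and B share the bytes [64, 80).
  return C < B + 32 && B < C + 48;
}

// Returns true if keeping the long-lived array on the heap leaves the stack intact.
bool heapArrayAvoidsOverlap() {
  BumpStack S;
  void **ThreadArray = static_cast<void **>(std::malloc(8 * sizeof(void *)));
  if (!ThreadArray)
    return false;
  void *B = S.push(32);
  std::free(ThreadArray); // freeing the heap array never moves S.Top
  bool Intact = (S.Top == 32) && (B == static_cast<void *>(S.Data));
  S.pop(32);
  return Intact && S.Top == 0;
}
```

In the sketch, the heap allocation is slower than a stack push, but its lifetime is independent of the stack's push/pop ordering.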
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
openmp/libomptarget/DeviceRTL/src/State.cpp, line 266
Allocating these from global memory will definitely slow down the slow path further, but I believe the better performance on the fast path is worth it. This won't fix the problem on AMDGPU until we support a malloc implementation upstream (@ronlieb).