This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][DeviceRTL] Fix an issue that thread array might be corrupted
ClosedPublic

Authored by tianshilei1992 on Oct 6 2022, 12:57 PM.

Details

Summary

The shared memory stack in the device runtime assumes no interleaved uses.
D135037 breaks that assumption, potentially corrupting the shared stack.
This patch moves the thread array to heap memory. Since this is already the slow
path, the extra cost does not matter much.
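
To illustrate why interleaved uses are a problem for a stack-style allocator, here is a minimal host-side sketch. It is not the DeviceRTL code: `BumpStack`, `push`, `pop`, and `interleavedUseOverlaps` are hypothetical names, and the sketch only assumes that the shared stack behaves like a bump-pointer allocator whose releases must come in LIFO order.

```cpp
#include <cstddef>

// Hypothetical bump-pointer stack, for illustration only.
// It is NOT the DeviceRTL shared memory stack implementation.
struct BumpStack {
  alignas(8) unsigned char Data[256];
  std::size_t Top = 0;

  // Hand out the next Bytes bytes and advance the top of the stack.
  void *push(std::size_t Bytes) {
    void *Ptr = &Data[Top];
    Top += Bytes;
    return Ptr;
  }

  // Assumes Ptr is the most recent live allocation (LIFO order).
  // Nothing checks this, which is exactly the fragile assumption.
  void pop(void *Ptr, std::size_t Bytes) {
    (void)Ptr;
    Top -= Bytes;
  }
};

// Returns true if releasing out of LIFO order makes a new allocation
// overlap an allocation that is still live.
bool interleavedUseOverlaps() {
  BumpStack Stack;
  void *A = Stack.push(64); // occupies bytes [0, 64)
  void *B = Stack.push(32); // occupies bytes [64, 96)

  Stack.pop(A, 64);         // A released before B: Top drops to 32
  void *C = Stack.push(48); // occupies bytes [32, 80), B is still live

  auto *BBegin = static_cast<unsigned char *>(B);
  auto *CBegin = static_cast<unsigned char *>(C);
  // Standard interval-overlap test on [BBegin, BBegin+32) and [CBegin, CBegin+48).
  return CBegin < BBegin + 32 && BBegin < CBegin + 48;
}
```

In this sketch, `C` ends up sharing bytes with the still-live `B`, so a write through one silently clobbers the other. An allocation served from heap memory does not depend on release order, which is presumably why moving the thread array there sidesteps the issue.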

Diff Detail

Event Timeline

tianshilei1992 created this revision. Oct 6 2022, 12:57 PM
Herald added a project: Restricted Project. Oct 6 2022, 12:57 PM
tianshilei1992 requested review of this revision. Oct 6 2022, 12:57 PM
Herald added a project: Restricted Project. Oct 6 2022, 12:57 PM
jhuber6 added inline comments. Oct 6 2022, 12:58 PM
openmp/libomptarget/DeviceRTL/src/State.cpp
266

fix build issue

tianshilei1992 marked an inline comment as done. Oct 6 2022, 1:01 PM
jhuber6 accepted this revision. Oct 6 2022, 1:03 PM
jhuber6 added a subscriber: ronlieb.

Allocating these from global memory will definitely slow down the slow path further, but I believe the better performance on the fast path is worth it. This won't fix the problem on AMDGPU until we support a malloc implementation upstream (@ronlieb).

This revision is now accepted and ready to land. Oct 6 2022, 1:03 PM