The RAII class used for debugging RTL entry keeps track of the current
depth in a shared variable. That variable had a global initializer, which
isn't supported on AMDGPU. This patch removes the initializer and
instead sets the variable to zero when the state is initialized in the runtime.
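
For readers unfamiliar with the pattern, here is a minimal host-side sketch of the idea, not the actual DeviceRTL code. The names `RTLDepth`, `initState`, `DebugEntryRAII` and `nestedDepth` are hypothetical, and a plain global stands in for the shared variable, so this only illustrates the control flow: the counter carries no initializer that the code relies on, and it is reset explicitly when the state is initialized.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the shared depth counter. It has no
// initializer; the sketch relies only on the explicit reset in initState().
uint32_t RTLDepth;

// Hypothetical state-initialization hook. The patch performs the
// equivalent reset where the runtime initializes its state.
void initState() { RTLDepth = 0; }

// RAII helper: records the depth at entry and increments the counter,
// then decrements it again when the scope is left.
class DebugEntryRAII {
public:
  DebugEntryRAII() : Depth(RTLDepth++) {}
  ~DebugEntryRAII() {
    assert(RTLDepth > 0 && "unbalanced RTL entry/exit");
    --RTLDepth;
  }
  DebugEntryRAII(const DebugEntryRAII &) = delete;
  DebugEntryRAII &operator=(const DebugEntryRAII &) = delete;

  // Depth observed when this scope was entered.
  uint32_t depth() const { return Depth; }

private:
  uint32_t Depth;
};

// Enters two nested scopes and reports the depth seen by the inner one.
uint32_t nestedDepth() {
  DebugEntryRAII Outer;
  DebugEntryRAII Inner;
  return Inner.depth();
}
```

Calling `initState()` before any `DebugEntryRAII` is constructed plays the role of the removed initializer. The sketch is single-threaded and says nothing about the race raised in the review comments.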
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
LG, though racy, see comment.
openmp/libomptarget/DeviceRTL/src/State.cpp | ||
---|---|---|
370–372 | Put it under this guard. |
Comment Actions
There's a recent LLVM pass (AMDGPU-specific, I think) that gathers ctors/dtors into one or more kernels that expect to be executed with one wavefront, possibly with one thread active at the entry point. I'm aware of two bugs in it, one fixed and one with a patch outstanding. The amdgpu plugin doesn't actually run these kernels yet but probably will shortly, at least the constructors; I'm not clear how we would know when to run the destructors. At that point we could optionally revert this. Looks good for now.
Moving one-off initialisation to a global ctor, once said ctor is actually executed by the plugin, is probably a minor performance win: the work is done once instead of once per kernel.