Page MenuHomePhabricator

[OpenMP] Fix initializer not working on AMDGPU
ClosedPublic

Authored by jhuber6 on Nov 15 2021, 8:06 PM.

Details

Summary

The RAII class used for debugging RTL entry used a shared variable to
keep track of the current depth. This used a global initializer, which
isn't supported on AMDGPU. This patch removes the initializer and
instead sets it to zero when the state is initialized in the runtime.

Diff Detail

Event Timeline

jhuber6 created this revision.Nov 15 2021, 8:06 PM
jhuber6 requested review of this revision.Nov 15 2021, 8:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2021, 8:06 PM
jhuber6 updated this revision to Diff 387484.Nov 15 2021, 8:14 PM

Put the debug init call in the wrong init function.

LG, though racy, see comment.

openmp/libomptarget/DeviceRTL/src/State.cpp
370–372

Put it under this guard.

jdoerfert accepted this revision.Nov 15 2021, 8:22 PM
This revision is now accepted and ready to land.Nov 15 2021, 8:22 PM
JonChesterfield accepted this revision.EditedNov 16 2021, 12:14 AM
JonChesterfield added a reviewer: JonChesterfield.
JonChesterfield added a subscriber: JonChesterfield.

There's a recent (amdgpu specific I think) llvm pass that gathers ctor/dtor into one or more kernels that expect to be executed with one wavefront, possibly one thread active at the entry point. I'm aware of two bugs in it, one fixed and one with a patch outstanding. These aren't actually run by the amdgpu plugin yet but probably will be shortly. At least the constructors, I'm not clear how we would know when to run the destructors. At that point we could optionally revert this, looks good for now.

Moving one-off initialisation to a global ctor, once said ctor is actually executed by the plugin, is probably a minor performance win. Do some work once instead of once per kernel.

This revision was automatically updated to reflect the committed changes.