This is an archive of the discontinued LLVM Phabricator instance.

[Libomptarget] Deinitialize AMDGPU global state more intentionally
ClosedPublic

Authored by jhuber6 on Aug 2 2022, 12:31 PM.

Details

Summary

A previous patch made the destruction of the HSA plugin more
deterministic. However, there were still other global values that are not
handled this way. When attempting to call a destructor kernel, the
device would have already been uninitialized and we could not find the
appropriate kernel to call. This is because they were stored in global
containers that had their destructors called already. Merges this global
state into the rest of the info state by putting those global values
inside of the global pointer already allocated and deallocated by the
constructor and destructor. This should allow the AMDGPU plugin to
correctly identify the destructors if we were to run them.

Diff Detail

Event Timeline

jhuber6 created this revision.Aug 2 2022, 12:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2022, 12:31 PM
jhuber6 requested review of this revision.Aug 2 2022, 12:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2022, 12:31 PM

Lots of noise in the diff. Could you do a precommit one where you move the struct / functions around in the file without changing them then rebase this patch?

jhuber6 updated this revision to Diff 449434.Aug 2 2022, 2:14 PM

Trying to clean up.

That being said, I've been doing some more thinking about how to handle this and I don't think this is the right way to go. We can set an explicit order on the destruction of the plugins, but that doesn't necessarily mean that the memory or libraries depending on that plugin will still be valid. I think this is the core issue behind why I was seeing weird behaviour porting CUDA to this method. For the CUDA RTL, the CUDA library may already be unloaded by the time we try to uninititalize the data and i don't think there's any way to guaruntee this one way or the other.

I'm thinking one reasonable way to solve this is to instead handle device destructors via atexit(), I believe that function is supposed to be called before standard user exit, e.g. after main() and exit(). That would allow us to run those functions before the destructors start unwinding presumably. Then to solve the problem that Jon's original patch aimed to solve, we simply make accessing internal plugin state illegal and explicitly dlclose the plugin at the start to prevent this. For this we would need to then remove the is_valid_binary_info calls, this should be easily replaced with some data structure that saves them instead of trying to derive them all over again. There may be some problems with this approach as well, but we really need an intelligent way to handle this stuff that won't blow up.

Strongly against going with atexit. The lifetime of these things needs to be sanely nested without use after free errors and we're only going to get that with explicit control flow.

I suspect the root cause of at least some problems is we don't bother to call dlclose and we should do so. I think we can guarantee relative lifetime of hsa/cuda vs the plugin. The edge between host executable and libomptarget is more alarming, but I expect the fix to that is to stop using global destructors in libomptarget and emit a call to tear it down from the main executable instead.

JonChesterfield accepted this revision.Aug 2 2022, 3:15 PM
This revision is now accepted and ready to land.Aug 2 2022, 3:15 PM