This is an archive of the discontinued LLVM Phabricator instance.

[openmp][amdgpu] Tear down amdgpu plugin accurately
ClosedPublic

Authored by JonChesterfield on Jul 28 2022, 9:12 AM.

Details

Summary

Moves DeviceInfo global to heap to accurately control lifetime.
Moves calls from libomptarget to deinit_plugin later, plugins need to stay
alive until very shortly before libomptarget is destructed.

Leaving the deinit_plugin calls where initially inserted hits use after
free from the dynamic_module.c offloading test (verified with valgrind
that the new location is sound with respect to this)

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 9:12 AM
JonChesterfield requested review of this revision.Jul 28 2022, 9:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 9:12 AM
JonChesterfield added a comment.EditedJul 28 2022, 9:13 AM

@jhuber6 I believe this will fix the segfault you were seeing locally, please verify
@others we probably want the same transform in nvptx, let me know if you'd like me to write it

tianshilei1992 requested changes to this revision.Jul 28 2022, 9:23 AM

This doesn't work because the life time of a plugin is shorter than libomptarget.

This revision now requires changes to proceed.Jul 28 2022, 9:23 AM

This doesn't work because the life time of a plugin is shorter than libomptarget.

Is there a reason for this? We initialize them late sure, but why do we destruct them earlier?

This doesn't work because the life time of a plugin is shorter than libomptarget.

Is there a reason for this? We initialize them late sure, but why do we destruct them earlier?

I'm not very familiar with dynamic linker, but I guess it just tries to destruct them in a reverse order. I did some experiments before, and that showed when libomptarget starts to destruct, the plugin is already destroyed. Any call from d'tor of libomptarget to plugins will lead to UB, or in most of cases for NVPTX, segfault.

I don't remember if there is a "hard requirement" wrt the order but given the current code the plugin needs to outlive libomptarget (IIRC). We tried multiple times to tear down things, and I think we need the user code tear down hook to find a stable order, everything else might not be. Though even the user code destructor might be called after we unload the librarie(s), which would be bad.

JonChesterfield added a comment.EditedJul 28 2022, 10:06 AM

The order of execution of constructors is an emergent property of how the compiler annotated functions (e.g. __attribute__((destructor(101))) above), what structures the linker built from those (which usually depends on linker invocation) and the C runtime, how the linker associates dynamic objects with the executable, how the loader processes those objects and those passed by dlopen. To a good approximation it is unpredictable. Destructors probably execute in reverse order of constructors but I wouldn't bet on it where dlopen (and present or missing dlclose) are involved.

The plugin lifetime cannot encompass the libomptarget lifetime because they are dlopened by libomptarget. In principle the plugins could live beyond libomptarget, but doing so would mean we cannot close them explicitly and must rely on the emergent behaviour of destructors. That is what is presently segfaulting on the 15 release branch on Joseph's machine.

Any call from d'tor of libomptarget to plugins will lead to UB

This is untrue in general. If the plugin global destructor has executed before the libomptarget destructor and left the plugin in a state where calls to it will segfault then yes, it will segfault.

This patch changes the amdgpu plugin to have well defined lifetime between calls to plugin_init and plugin_deinit, and puts the call to plugin_deinit in the libomptarget destructor. That strictly nests the lifetime of the plugin within that of libomptarget while arranging for no calls to follow the one to plugin_deinit. That is the only nesting that can be correct.

However the order of global constructors / destructors within the plugin relative to those of other shared objects is not especially likely to respect this definition, so the fix there is to not have global constructor / destructors within the plugin. Strictly speaking there's still a statically allocated 8 byte pointer here, but as the destructor for that emits no code we're fine.

Finally this patch makes no difference to nvptx whatsoever as it hasn't implemented init_plugin or deinit_plugin, so nvptx maintains the current roll the die and hope semantics. It can opt into well defined ones, like this patch does to amdgpu, whenever we'd like to stop it unexpectedly segfaulting.

Any call from d'tor of libomptarget to plugins will lead to UB

This is untrue in general. If the plugin global destructor has executed before the libomptarget destructor and left the plugin in a state where calls to it will segfault then yes, it will segfault.

That is exactly the case.

This patch changes the amdgpu plugin to have well defined lifetime between calls to plugin_init and plugin_deinit, and puts the call to plugin_deinit in the libomptarget destructor. That strictly nests the lifetime of the plugin within that of libomptarget while arranging for no calls to follow the one to plugin_deinit. That is the only nesting that can be correct.

Of course you can limit the lifetime between calls to plugin_init and plugin_deinit, but the key point is when they are called.

However the order of global constructors / destructors within the plugin relative to those of other shared objects is not especially likely to respect this definition, so the fix there is to not have global constructor / destructors within the plugin. Strictly speaking there's still a statically allocated 8 byte pointer here, but as the destructor for that emits no code we're fine.

Is it possible the buffer where DeviceInfoState points is freed implicitly before deinit_plugin?

Finally this patch makes no difference to nvptx whatsoever as it hasn't implemented init_plugin or deinit_plugin, so nvptx maintains the current roll the die and hope semantics. It can opt into well defined ones, like this patch does to amdgpu, whenever we'd like to stop it unexpectedly segfaulting.

However, it changes the behavior of libomptarget, even though for now others don't implement the functions. It opens the door to mystery for others.

I'll be generally okay if we document all details in __attribute__((destructor(101))) void deinit() before we call R->deinit_plugin, such as in what scenario this could work, and in what cases it may not work as expected.

Any call from d'tor of libomptarget to plugins will lead to UB

This is untrue in general. If the plugin global destructor has executed before the libomptarget destructor and left the plugin in a state where calls to it will segfault then yes, it will segfault.

That is exactly the case.

Sometimes. It's a property of the system it runs on at present.

However the order of global constructors / destructors within the plugin relative to those of other shared objects is not especially likely to respect this definition, so the fix there is to not have global constructor / destructors within the plugin. Strictly speaking there's still a statically allocated 8 byte pointer here, but as the destructor for that emits no code we're fine.

Is it possible the buffer where DeviceInfoState points is freed implicitly before deinit_plugin?

No. It's allocated by new and freed with delete.

Finally this patch makes no difference to nvptx whatsoever as it hasn't implemented init_plugin or deinit_plugin, so nvptx maintains the current roll the die and hope semantics. It can opt into well defined ones, like this patch does to amdgpu, whenever we'd like to stop it unexpectedly segfaulting.

However, it changes the behavior of libomptarget, even though for now others don't implement the functions. It opens the door to mystery for others.

I'll be generally okay if we document all details in __attribute__((destructor(101))) void deinit() before we call R->deinit_plugin, such as in what scenario this could work, and in what cases it may not work as expected.

Can't be done. Depends on compiler and libc. As implemented this will work for any plugin that doesn't depend on undefined behaviour in its global constructor/destructor order, but those plugins are of course broken at present. This patch leaves nvptx unchanged but I'd really like to fix amdgpu for the 15 branch whatever we do with that.

  • Comment the call to deinit_plugin
This revision is now accepted and ready to land.Jul 28 2022, 11:56 AM

This patch fixes the crash I was seeing locally, thanks.