This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Fixed the issue that memory manager calls plugin interface to release buffered memory even after the plugin has been destroyed
AbandonedPublic

Authored by tianshilei1992 on Jan 7 2021, 12:30 PM.

Details

Summary

Currently on some platforms the program might crash when the program
exits. It is because the memory manager calls plugin interface to release its
buffered memory and the plugin has already been destroyed. After some search
online, it seems that loaded dynamic libraries will be destroyed in the reverse
order of loading, which means by no means, those plugin libraries loaded by
dlopen will be destroyed after libomtarget. We need something to tell the
RTLInfoTy that the hold plugin is not valid anymore so don't call those
interfaces. This is done by registering a callback function in the plugin so that
it will be called when the plugin starts to destroy itself. The callback function
for now only does one thing: release the memory manager so that all buffered
target memory will be released. We can add more features in the future if
needed.

Diff Detail

Event Timeline

tianshilei1992 created this revision.Jan 7 2021, 12:30 PM
tianshilei1992 requested review of this revision.Jan 7 2021, 12:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2021, 12:30 PM
jdoerfert added inline comments.Jan 7 2021, 12:42 PM
openmp/libomptarget/src/MemoryManager.cpp
114

Can't we do this in the destructor MemoryManagerTy::~MemoryManagerTy() ?

Only check the validity in MemoryManagerTy::~MemoryManagerTy

tianshilei1992 marked an inline comment as done.Jan 7 2021, 12:48 PM
tianshilei1992 added inline comments.
openmp/libomptarget/src/MemoryManager.cpp
114

Nice point! Did that.

One nit, works for me otherwise. Anyone else?

openmp/libomptarget/src/MemoryManager.cpp
108

Early exit or braces please, former preferred.

tianshilei1992 marked an inline comment as done.

Updated the coding style

I suppose this is better, though it seems backwards to me. Plus we still leak memory with this design.

I was hoping the plugin would expose a function that libomptarget calls to do the cleanup, in place of today's global destructor. Libomptarget can then do an ordered cleanup, after the memory manager destructor has run.

tianshilei1992 edited the summary of this revision. (Show Details)Jan 7 2021, 1:25 PM
Harbormaster completed remote builds in B84375: Diff 315219.
vzakhari added inline comments.
openmp/libomptarget/src/rtl.cpp
444–445

Can we just implement this instead?

tianshilei1992 added inline comments.Jan 7 2021, 4:45 PM
openmp/libomptarget/src/rtl.cpp
444–445

At this point, the plugin has already been destroyed.

vzakhari added inline comments.Jan 7 2021, 5:01 PM
openmp/libomptarget/src/rtl.cpp
444–445

How is that possible, since we never call dlclose() for the plugins?
Is the problem specific to some particular execution conditions?

Should we just "deinitialize" the memory manager here instead of postponing the "deinitialization" to MemoryManagerTy desctructor, which will obviously be executed at the wrong time.

tianshilei1992 added inline comments.Jan 7 2021, 6:00 PM
openmp/libomptarget/src/rtl.cpp
444–445

That's most tough part of this question. libomptarget has no idea when its opened plugin will be destroyed. Actually, you can see there is no dlclose in current code. What we can know is, plugin will be destroyed before libomptarget. You can easily verify it by adding DP(...) to the destructor of the plugin.

vzakhari added inline comments.Jan 7 2021, 6:39 PM
openmp/libomptarget/src/rtl.cpp
444–445

This sounds really strange at least for Linux, since:

The dynamic linker maintains reference counts for object handles, so a dynamically loaded shared object is not deallocated until dlclose() has been called on it as many times as dlopen() has succeeded on it.

Do you have a reproducer for this kind of an issue? I must be missing something here...

tianshilei1992 marked 4 inline comments as done.Jan 7 2021, 6:44 PM
tianshilei1992 added inline comments.
openmp/libomptarget/src/rtl.cpp
444–445

You can use openmp/libomptarget/test/offloading/memory_manager.cpp and turn debug print on via setting env LIBOMPTARGET_DEBUG to a large number like 10. Then you can see a lot of logs.

...
Libomptarget --> omp_target_free deallocated device ptr
PASS
(1) Target CUDA RTL --> CUDA plugin destroyed
(2) Libomptarget --> __tgt_unregister_lib
Libomptarget --> Unloading target library!
Libomptarget --> Image 0x0000000000403870 is compatible with RTL 0x0000000000e4a8b0!
Libomptarget --> Unregistered image 0x0000000000403870 from RTL 0x0000000000e4a8b0!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor 0x00000000004abb60
Libomptarget --> Done unregistering library!
Libomptarget --> Deinit target library!
Target CUDA RTL --> Error returned from cuCtxSetCurrent
Target CUDA RTL --> Unresolved CUDA error code: 4
Target CUDA RTL --> Unsuccessful cuGetErrorString return status: 4
Target CUDA RTL --> Error returned from cuCtxSetCurrent
...

Line (1) and (2) were printed because I added DP to the destructor of CUDA plugin and function __tgt_unregister_lib. The last tens of lines are all error messages because CUDA plugin was destroyed before libomptarget.

vzakhari added inline comments.Jan 7 2021, 7:24 PM
openmp/libomptarget/src/rtl.cpp
444–445

Ah, I see. The global destructor for DeviceRTL is called before __tgt_unregister_lib is called. The plugin (the shared object) is not actually deallocated yet. Thank you for the explanation.

I am not sure libomptarget has to be involved here, since this may be done exclusively at the plugin level, i.e. DeviceRTL may be dynamically allocated/deallocated using global init/fini functions, and then all plugin APIs may return early, if they are invoked after DeviceRTL has been deallocated.

Just to be clear, I am not against this change-set :)

Anyway, libomptarget has to be redesigned a bit because the plugins loading initiated in a global constructor (which calls RTLsTy::RegisterLib) is violating Microsoft conventions for DLL initialization (e.g. you cannot create a DLL with offload and then LoadLibrary this DLL, because this will result in LoadLibrary->...->LoadLibrary call stack, which is not allowed). We should probably load the plugins on the first OpenMP construct/API encountered during the execution. I assume we will have to define clear entry/exit points for plugins then, and this will solve the issue that you are addressing here.

tianshilei1992 marked 2 inline comments as done.Jan 7 2021, 7:38 PM
tianshilei1992 added inline comments.
openmp/libomptarget/src/rtl.cpp
444–445

libomptarget is involved because we have a memory manager which will releases all its buffered memory in its destructor. It is an object in libomptarget. So its destructor is invoked when libomptarget is being destroyed. Now we know at this moment the plugin has already been destroyed, and the memory manager still calls plugin interfaces, causing program crash on some platforms. However, usually it will not crash on CUDA platforms, just having some errors because the plugin is using some CUDA related data which have been explicitly destroyed when the plugin is destroyed.

Redesign of libomptarget is beyond the scope of this patch. It might be done when LLVM OpenMP decides to support Windows. For now this patch is like a WA.

tianshilei1992 marked an inline comment as done.Jan 7 2021, 7:39 PM

I suppose this is better, though it seems backwards to me. Plus we still leak memory with this design.

The memory will be eventually released by the driver if the process is terminated, right?

I was hoping the plugin would expose a function that libomptarget calls to do the cleanup, in place of today's global destructor. Libomptarget can then do an ordered cleanup, after the memory manager destructor has run.

I have some new ideas to make the cleanup possible. Will update the patch accordingly.

I suppose this is better, though it seems backwards to me. Plus we still leak memory with this design.

The memory will be eventually released by the driver if the process is terminated, right?

Maybe. Probably platform-specific. I definitely have to power cycle GPUs to get them working again occasionally so the kernel doesn't (always?) leave the stack in a just-booted state after process exit.

Leaking memory by design thwarts using memory tracking tools to find accidental leaks. In this case, it's indicative that we've failed to establish static lifetimes for the components. Too much global state.

tcramer added a subscriber: tcramer.Jan 8 2021, 5:57 AM

Fixed the issue that target memory are not released when the plugin is destroyed

tianshilei1992 edited the summary of this revision. (Show Details)Jan 9 2021, 12:58 PM

The complexity of this approach is very high relative to instantiating MemoryManagerTy in the cuda plugin, and optionally in any others that want to use it.

openmp/libomptarget/plugins/cuda/src/rtl.cpp
282

It looks like this stuff needs to be implemented in every plugin, since they all use the memory manager at present

The complexity of this approach is very high relative to instantiating MemoryManagerTy in the cuda plugin, and optionally in any others that want to use it.

This approach does not only fix the issue, but also open a door for future development. RTLInfoTy is used by DeviceTy, so each one has pointers to the other. It is a weak dependence so not a problem. We have a lot of similar things in LLVM, such as use and user.

The callback function can also be used later as it is the only chance for libomptarget to know when the plugin is destroyed and do something needed.

In the future we will add something to let a plugin exchange information with libomptarget. For example, a plugin can tell libomptarget what features it can support and libomptarget can use that information accordingly. Moving the memory manager to plugin and taking it as a common part is also feasible.

tianshilei1992 abandoned this revision.Jan 10 2021, 5:13 PM

Proposed to move the memory manager to plugin. See D94379 for more details.