This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Explicitly deinitialize device resources
ClosedPublic

Authored by jdoerfert on Feb 17 2022, 2:14 PM.

Details

Summary
There are two problems this patch tries to address:
1) We currently free resources in a random order wrt. plugin and
   libomptarget destruction. This patch should ensure the CUDA plugin
   is less fragile if something during the deinitialization goes wrong.
2) We need to support (hard) pause runtime calls eventually. This patch
   allows us to free all associated resources, though we cannot
   reinitialize the device yet.

Follow up patch will associate one event pool per device/context.

Diff Detail

Event Timeline

jdoerfert created this revision.Feb 17 2022, 2:14 PM
jdoerfert requested review of this revision.Feb 17 2022, 2:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 17 2022, 2:14 PM
Herald added a subscriber: sstefan1. · View Herald Transcript

I don't think this patch can fix the first issue because we don't know whether the plugin has already been destroyed when objects from libomptarget are destroyed. I had experience that calling plugin functions in destructor of one object from libomptarget caused crash because plugin has already been released.

jdoerfert updated this revision to Diff 409955.Feb 18 2022, 9:02 AM

Add unregister plugin call for CUDA plugin, deinitialize all devices then

jdoerfert updated this revision to Diff 409956.Feb 18 2022, 9:05 AM

Do not deinit the plugin device when the libomptarget device is destroyed.

I don't think this patch can fix the first issue because we don't know whether the plugin has already been destroyed when objects from libomptarget are destroyed. I had experience that calling plugin functions in destructor of one object from libomptarget caused crash because plugin has already been released.

OK, that seems to be a separate issue. I think with the __tgt_rtl_unregister_lib we might have that one under control.

So this patch does (1) introduce per device "deinit" capabilities (from libomptarget this is unused for now).
And, (2) deinit all devices as part of the user code destructor (via __tgt_rtl_unregister_lib)

jdoerfert retitled this revision from [OpenMP][WIP] Explicitly deinitialize device resources to [OpenMP] Explicitly deinitialize device resources.Mar 4 2022, 12:42 PM
jdoerfert edited the summary of this revision. (Show Details)
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 12:42 PM
jdoerfert updated this revision to Diff 413107.Mar 4 2022, 12:42 PM
jdoerfert updated this revision to Diff 413245.Mar 5 2022, 1:46 PM
jdoerfert edited the summary of this revision. (Show Details)

Updates

`

openmp/libomptarget/plugins/cuda/src/rtl.cpp
781

The conditionals here might help us avoid the crashes we've seen before.

tianshilei1992 added inline comments.Mar 5 2022, 7:42 PM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
566

Why do we need it?

jdoerfert added inline comments.Mar 6 2022, 2:54 PM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
566

Because we need to know the state of a device. Is it initialized or not. This flag tells us.

tianshilei1992 added inline comments.Mar 6 2022, 6:37 PM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
566

Here they are assigned all at once, but we only initialize the device on demand.

jdoerfert updated this revision to Diff 413339.Mar 6 2022, 10:06 PM

Addressed comments

tianshilei1992 added inline comments.Mar 7 2022, 7:37 AM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
590

Actually this could have data race. Say if multiple threads are trying to initialize it at the same time. I don't remember we have the guard in libomptarget. We could potentially use std::vector<std::once_flag> instead.

jdoerfert marked an inline comment as done.Mar 7 2022, 7:49 AM
jdoerfert added inline comments.
openmp/libomptarget/plugins/cuda/src/rtl.cpp
590

This is something libomptarget should handle. There is little reason N plugins should do it and also little reason to assume plugins should allow concurrent initialization of the same device. That said, DeviceTy::init() in libomptarget is guarded by a std::call_once which should do the trick just fine.

tianshilei1992 added inline comments.Mar 7 2022, 7:57 AM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
26

Why do we want omptarget.h here?

590

I'm fine with that, as long as we really do it.

767

This none_of is useless here because InitializedFlags[DeviceId] is reset beforehand.

openmp/libomptarget/src/device.cpp
471

Who calls it?

tianshilei1992 added inline comments.Mar 7 2022, 7:58 AM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
767

Oh, none_of, nvm.

jdoerfert marked an inline comment as done.Mar 7 2022, 8:13 AM
jdoerfert added inline comments.
openmp/libomptarget/plugins/cuda/src/rtl.cpp
26

Because my clang complete things we use it (probably for some types).

590

If we wouldn't, how is this patch making it worse? All the assignments below would be racy as well.

767

This is going to be replaced in the follow up when we have one event pool per device (which we need).

openmp/libomptarget/src/device.cpp
471

Nobody anymore, probably. Was called but I removed it. We would call it for hard_shutdown though.

tianshilei1992 accepted this revision.Mar 7 2022, 5:53 PM

LG w/ some nits.

openmp/libomptarget/plugins/cuda/src/rtl.cpp
26

omptargetplugin.h includes omptarget.h already, but it's a good practice to include what we need. Good to keep it.

590

I agree. No problem. I mean, we need to do guard it. If not in this patch, then we do it in another patch.

openmp/libomptarget/src/device.cpp
471

We can call it in the deinit function emitted by clang in another patch once we land it.

This revision is now accepted and ready to land.Mar 7 2022, 5:53 PM
tianshilei1992 added inline comments.
openmp/libomptarget/plugins/cuda/src/rtl.cpp
760

Modules are not indexed by device ID. We need std::vector<std::vector<CUmodule>> instead.