There are two problems this patch tries to address: 1) We currently free resources in a random order wrt. plugin and libomptarget destruction. This patch should ensure the CUDA plugin is less fragile if something during the deinitialization goes wrong. 2) We need to support (hard) pause runtime calls eventually. This patch allows us to free all associated resources, though we cannot reinitialize the device yet. Follow up patch will associate one event pool per device/context.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I don't think this patch can fix the first issue because we don't know whether the plugin has already been destroyed when objects from libomptarget are destroyed. I had experience that calling plugin functions in destructor of one object from libomptarget caused crash because plugin has already been released.
OK, that seems to be a separate issue. I think with the __tgt_rtl_unregister_lib we might have that one under control.
So this patch does (1) introduce per device "deinit" capabilities (from libomptarget this is unused for now).
And, (2) deinit all devices as part of the user code destructor (via __tgt_rtl_unregister_lib)
`
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
781 | The conditionals here might help us avoid the crashes we've seen before. |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
566 | Why do we need it? |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
566 | Because we need to know the state of a device. Is it initialized or not. This flag tells us. |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
566 | Here they are assigned all at once, but we only initialize the device on demand. |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
590 | Actually this could have data race. Say if multiple threads are trying to initialize it at the same time. I don't remember we have the guard in libomptarget. We could potentially use std::vector<std::once_flag> instead. |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
590 | This is something libomptarget should handle. There is little reason N plugins should do it and also little reason to assume plugins should allow concurrent initialization of the same device. That said, DeviceTy::init() in libomptarget is guarded by a std::call_once which should do the trick just fine. |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
767 | Oh, none_of, nvm. |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
26 | Because my clang complete things we use it (probably for some types). | |
590 | If we wouldn't, how is this patch making it worse? All the assignments below would be racy as well. | |
767 | This is going to be replaced in the follow up when we have one event pool per device (which we need). | |
openmp/libomptarget/src/device.cpp | ||
471 | Nobody anymore, probably. Was called but I removed it. We would call it for hard_shutdown though. |
LG w/ some nits.
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
26 | omptargetplugin.h includes omptarget.h already, but it's a good practice to include what we need. Good to keep it. | |
590 | I agree. No problem. I mean, we need to do guard it. If not in this patch, then we do it in another patch. | |
openmp/libomptarget/src/device.cpp | ||
471 | We can call it in the deinit function emitted by clang in another patch once we land it. |
openmp/libomptarget/plugins/cuda/src/rtl.cpp | ||
---|---|---|
760 | Modules are not indexed by device ID. We need std::vector<std::vector<CUmodule>> instead. |
Why do we want omptarget.h here?