This patch fixes a segfault that was appearing when the plugin fails to initialize and then is deinitialized. Also, do not call hsa_shut_down if the hsa_init failed.
Details
Diff Detail
Event Timeline
Another solution might be to call any random HSA function and see if it returns HSA_STATUS_ERROR_NOT_INITIALIZED. Not sure if that would save us much.
I figured we'd just return an HSA_STATUS_ERROR immediately. But I guess in that case we'd still call the destructor?
In that case, where hsa_init returns HSA_STATUS_ERROR because there is no HSA library found, the plugin object (AMDGPUPluginTy) is kept alive. We set the number of its devices to zero but there are no further initializations. We then have to handle the destructor.
This is fine overall, but fewer random globals lying around it always good. I don't know if it's possible to do something like if (some_random_hsa_call && some_random_hsa_call() != HSA_STATUS_NOT_INITIALIZED) since it default value of the pointer might not be null.
Double checked, global variables should be initialized to zero in C++ so you can probably just check if the function pointer is non-null to see if HSA was dynamically loaded.
fewer random globals lying around it always good
Not sure if you refer to the Initialized variable. In this case it's not a global, it's a data member of the AMDGPUPluginTy class.
I mean when we link against HSA we have something like hsa_deinit() as a function with a valid address. When we load it dynamically it's just forward declared like HSA_STATUS_T (hsa_deinit())(void); which is initialized to zero.
I'll update the patch with that change, thanks! We can avoid calling another HSA function and call directly hsa_shut_down ignoring the error code, right?
if (hsa_shut_down) hsa_shut_down();
That would handle the case where HSA wasn't dynamically loaded successfully. If hsa_init() was never called but HSA was successfully loaded / linked, you'd get that HSA_STATUS_ERROR_NOT_INITIALIZED code returned instead.
@jhuber6 I was testing the change but the compilation reports the following warning:
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2331:22: warning: the address of ‘hsa_status_t hsa_shut_down()’ will never be NULL [-Waddress] if (hsa_shut_down)
When we build with dynamic_hsa, hsa_shut_down (and the rest) appear as a defined function. Looking at nm libomptarget.rtl.amdgpu.nextgen.so, it appears the symbol with symbol type t, i.e. text (code) section. So it sounds this will never be null. I guess hsa_shut_down is a function defined in dynamic_hsa/hsa.cpp acting as a wrapper function.
I see, most likely from how we do the DLWRAP stuff I'm guessing. It's not a huge deal either way. Go ahead and land this since it's not trivial to replace like that.