This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] Fix deinit of NextGen AMDGPU plugin
ClosedPublic

Authored by kevinsala on Jan 19 2023, 10:56 AM.

Details

Summary

This patch fixes a segfault that was appearing when the plugin fails to initialize and then is deinitialized. Also, do not call hsa_shut_down if the hsa_init failed.

Diff Detail

Event Timeline

kevinsala created this revision.Jan 19 2023, 10:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 19 2023, 10:56 AM
kevinsala requested review of this revision.Jan 19 2023, 10:56 AM

Another solution might be to call any random HSA function and see if it returns HSA_STATUS_ERROR_NOT_INITIALIZED. Not sure if that would save us much.

Another solution might be to call any random HSA function and see if it returns HSA_STATUS_ERROR_NOT_INITIALIZED. Not sure if that would save us much.

Would that work when dlopening HSA library with dynamic_hsa and no library is found?

Another solution might be to call any random HSA function and see if it returns HSA_STATUS_ERROR_NOT_INITIALIZED. Not sure if that would save us much.

Would that work when dlopening HSA library with dynamic_hsa and no library is found?

I figured we'd just return an HSA_STATUS_ERROR immediately. But I guess in that case we'd still call the destructor?

Another solution might be to call any random HSA function and see if it returns HSA_STATUS_ERROR_NOT_INITIALIZED. Not sure if that would save us much.

Would that work when dlopening HSA library with dynamic_hsa and no library is found?

I figured we'd just return an HSA_STATUS_ERROR immediately. But I guess in that case we'd still call the destructor?

In that case, where hsa_init returns HSA_STATUS_ERROR because there is no HSA library found, the plugin object (AMDGPUPluginTy) is kept alive. We set the number of its devices to zero but there are no further initializations. We then have to handle the destructor.

jhuber6 accepted this revision.Jan 19 2023, 11:54 AM

This is fine overall, but fewer random globals lying around it always good. I don't know if it's possible to do something like if (some_random_hsa_call && some_random_hsa_call() != HSA_STATUS_NOT_INITIALIZED) since it default value of the pointer might not be null.

This revision is now accepted and ready to land.Jan 19 2023, 11:54 AM

This is fine overall, but fewer random globals lying around it always good. I don't know if it's possible to do something like if (some_random_hsa_call && some_random_hsa_call() != HSA_STATUS_NOT_INITIALIZED) since it default value of the pointer might not be null.

Double checked, global variables should be initialized to zero in C++ so you can probably just check if the function pointer is non-null to see if HSA was dynamically loaded.

This is fine overall, but fewer random globals lying around it always good. I don't know if it's possible to do something like if (some_random_hsa_call && some_random_hsa_call() != HSA_STATUS_NOT_INITIALIZED) since it default value of the pointer might not be null.

fewer random globals lying around it always good

Not sure if you refer to the Initialized variable. In this case it's not a global, it's a data member of the AMDGPUPluginTy class.

Not sure if you refer to the Initialized variable. In this case it's not a global, it's a data member of the AMDGPUPluginTy class.

I mean when we link against HSA we have something like hsa_deinit() as a function with a valid address. When we load it dynamically it's just forward declared like HSA_STATUS_T (hsa_deinit())(void); which is initialized to zero.

Not sure if you refer to the Initialized variable. In this case it's not a global, it's a data member of the AMDGPUPluginTy class.

I mean when we link against HSA we have something like hsa_deinit() as a function with a valid address. When we load it dynamically it's just forward declared like HSA_STATUS_T (hsa_deinit())(void); which is initialized to zero.

I'll update the patch with that change, thanks! We can avoid calling another HSA function and call directly hsa_shut_down ignoring the error code, right?

if (hsa_shut_down)
  hsa_shut_down();

Not sure if you refer to the Initialized variable. In this case it's not a global, it's a data member of the AMDGPUPluginTy class.

I mean when we link against HSA we have something like hsa_deinit() as a function with a valid address. When we load it dynamically it's just forward declared like HSA_STATUS_T (hsa_deinit())(void); which is initialized to zero.

I'll update the patch with that change, thanks! We can avoid calling another HSA function and call directly hsa_shut_down ignoring the error code, right?

if (hsa_shut_down)
  hsa_shut_down();

That would handle the case where HSA wasn't dynamically loaded successfully. If hsa_init() was never called but HSA was successfully loaded / linked, you'd get that HSA_STATUS_ERROR_NOT_INITIALIZED code returned instead.

@jhuber6 I was testing the change but the compilation reports the following warning:

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2331:22: warning: the address of ‘hsa_status_t hsa_shut_down()’ will never be NULL [-Waddress]
     if (hsa_shut_down)

When we build with dynamic_hsa, hsa_shut_down (and the rest) appear as a defined function. Looking at nm libomptarget.rtl.amdgpu.nextgen.so, it appears the symbol with symbol type t, i.e. text (code) section. So it sounds this will never be null. I guess hsa_shut_down is a function defined in dynamic_hsa/hsa.cpp acting as a wrapper function.

@jhuber6 I was testing the change but the compilation reports the following warning:

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2331:22: warning: the address of ‘hsa_status_t hsa_shut_down()’ will never be NULL [-Waddress]
     if (hsa_shut_down)

When we build with dynamic_hsa, hsa_shut_down (and the rest) appear as a defined function. Looking at nm libomptarget.rtl.amdgpu.nextgen.so, it appears the symbol with symbol type t, i.e. text (code) section. So it sounds this will never be null. I guess hsa_shut_down is a function defined in dynamic_hsa/hsa.cpp acting as a wrapper function.

I see, most likely from how we do the DLWRAP stuff I'm guessing. It's not a huge deal either way. Go ahead and land this since it's not trivial to replace like that.