This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget][amdgpu] More robust handling of failure to init HSA
ClosedPublic

Authored by JonChesterfield on Jul 25 2021, 1:48 PM.

Details

Summary

If hsa_init fails, subsequent calls into hsa are not safe. Except for
hsa_init, but we don't retry on failure.

This patch:

  • deletes a print that called into hsa to ask why it can't call into hsa
  • drops a merge conflict block next to that print
  • reliably initializes number of devices to zero
  • skips the plugin destructor contents if the constructor failed to init hsa

Tested by making hsa_init return error, and by forcing the dynamic library
use which was then deleted from disk. Before this patch, both segv. After it,
friendly message about offloading being unavailable.

Diff Detail

Event Timeline

JonChesterfield requested review of this revision.Jul 25 2021, 1:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2021, 1:49 PM
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
977

This paranoia may be unwarranted - calling __tgt_rtl_number_of_devices will return zero, so the library shouldn't attempt to initialize any. Will drop this sanity check if preferred.

This revision is now accepted and ready to land.Jul 25 2021, 1:51 PM
This revision was landed with ongoing or failed builds.Jul 25 2021, 3:16 PM
This revision was automatically updated to reflect the committed changes.
JonChesterfield marked an inline comment as not done.

Landed without the bounds check on device_id in __tgt_rtl_init_device. We might want to check that device_id is non-negative and less than the claimed number of devices, but if so, it should probably be on every external function, and possibly only in debug mode.