This is an archive of the discontinued LLVM Phabricator instance.

[Libomptarget] Add checks for AMDGPU TargetID using new image info
ClosedPublic

Authored by saiislam on Jun 14 2022, 10:41 AM.

Details

Summary

This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.

Depends on D127432

Diff Detail

Event Timeline

saiislam created this revision.Jun 14 2022, 10:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 10:41 AM
saiislam requested review of this revision.Jun 14 2022, 10:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 10:41 AM
saiislam updated this revision to Diff 436855.Jun 14 2022, 10:54 AM

Forgot to invert the condition in an if block.

Looks fine to me given what I already created. I'll let someone with more knowledge of AMD's systems comment however.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2052

Compute capability is a CUDA thing right.

The control flow here is quite complicated and it's not immediately obvious why things are being stashed in a map. I vaguely recall the rules for choosing compatible images by target ID being complicated and undocumented.

Did you reverse engineer this from the HIP implementation, or is there now documentation for how the +/-/any stuff is meant to resolve?

saiislam updated this revision to Diff 437051.Jun 14 2022, 11:52 PM
saiislam marked an inline comment as done.

Added comments for parseTargeID() .

The control flow here is quite complicated and it's not immediately obvious why things are being stashed in a map. I vaguely recall the rules for choosing compatible images by target ID being complicated and undocumented.

Did you reverse engineer this from the HIP implementation, or is there now documentation for how the +/-/any stuff is meant to resolve?

I implemented and documented the compatibility checking approach last year. Here is our complete documentation and implementation in the downstream.
Upstream reviewers at that time suggested to split TargetID specific checking to a separate patch, hence only a subset of checks were documented and implemented in upstream.

Implementation in this patch is exact application of my above patches.

target-feature subsection in this doc nicely explains the allowed interactions between On/Off/Any settings of target features and forms the basis of this compatibility checking algorithm.

kosarev added a reviewer: Restricted Project.Jun 16 2022, 8:23 AM
saiislam updated this revision to Diff 447145.Jul 24 2022, 10:47 AM

Updated with StringMap and StringRef ADTs. Also refactored based on new formating style.

JonChesterfield accepted this revision.Jul 25 2022, 12:52 AM
This revision is now accepted and ready to land.Jul 25 2022, 12:52 AM
This revision was landed with ongoing or failed builds.Jul 25 2022, 2:44 AM
This revision was automatically updated to reflect the committed changes.

Is it possible to move code around to avoid re-writing the TargetID parsing?

We have effectively the same parseTargetID in clang/lib/Basic/TargetID.cpp

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2074

@saiislam the commit in the repo doesn't match this diff, any guesses?

https://github.com/llvm/llvm-project/commit/4075a811ad99b7e263b8b99954cef8c96b042e22

-  hsa_status_t Err;
-
+  hsa_status_t Err = hsa_init();
+  if (Err != HSA_STATUS_SUCCESS) {
+    DP("HSA Initialization Failed.\n");
+    return HSA_STATUS_ERROR;
+  }

That doesn't have a matching hsa_shutdown. Found this while chasing a segfault, didn't expect to see multiple calls to hsa_init from the debugger.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2074

Patch to revert that in D132965.

Apologies for the delay in response.

@scott.linder, this code is in OpenMP's admgpu plugin which is called by OpenMP device runtime.
Even though we are dependent on LLVM libraries, but I don't think we can access clang library from
here. Hence the logic replication.

@JonChesterfield, I first committed this patch as rG471f2abc62d9 but it had to be reverted
(rG8cbf4a386b67) because AMDGPU builtbot was failing multiple libomptarget-amdgcn tests
(https://lab.llvm.org/buildbot/#/builders/193/builds/15846).

I fixed it by adding this initialization in the following commit (rG4075a811ad99).