
Details

Reviewers

jdoerfert
JonChesterfield
jhuber6
yaxunl

Group Reviewers

Restricted Project

Commits

rG4075a811ad99: [Libomptarget] Add checks for AMDGPU TargetID using new image info
rG471f2abc62d9: [Libomptarget] Add checks for AMDGPU TargetID using new image info

Summary

This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the environment.

Depends on D127432

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

saiislam created this revision. Jun 14 2022, 10:41 AM

Herald added a project: Restricted Project. Jun 14 2022, 10:41 AM

Herald added subscribers: kosarev, kerbowa, t-tye and 4 others.

saiislam requested review of this revision. Jun 14 2022, 10:41 AM

Herald added a project: Restricted Project. Jun 14 2022, 10:41 AM

Herald added subscribers: openmp-commits, wdng.

Harbormaster completed remote builds in B169772: Diff 436851. Jun 14 2022, 10:49 AM

Herald added a subscriber: sstefan1. Jun 14 2022, 10:49 AM

Forgot to invert the condition in an if block.

Harbormaster completed remote builds in B169774: Diff 436855. Jun 14 2022, 11:00 AM

Looks fine to me given what I already created. I'll let someone with more knowledge of AMD's systems comment however.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2052	Compute capability is a CUDA thing right.

The control flow here is quite complicated and it's not immediately obvious why things are being stashed in a map. I vaguely recall the rules for choosing compatible images by target ID being complicated and undocumented.

Did you reverse engineer this from the HIP implementation, or is there now documentation for how the +/-/any stuff is meant to resolve?

Added comments for parseTargetID().

In D127769#3583335, @JonChesterfield wrote:

The control flow here is quite complicated and it's not immediately obvious why things are being stashed in a map. I vaguely recall the rules for choosing compatible images by target ID being complicated and undocumented.

Did you reverse engineer this from the HIP implementation, or is there now documentation for how the +/-/any stuff is meant to resolve?

I implemented and documented the compatibility checking approach last year. Here is our complete documentation and implementation downstream.
Upstream reviewers at that time suggested splitting the TargetID-specific checking into a separate patch, so only a subset of the checks was documented and implemented upstream.

The implementation in this patch is an exact application of my patches above.

The target-feature subsection in this doc explains the allowed interactions between the On/Off/Any settings of target features and forms the basis of this compatibility checking algorithm.
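As a rough illustration of the On/Off/Any rule described in this comment, the sketch below uses only the standard library. The names `FeatureMap` and `featuresCompatible` are hypothetical and do not appear in the patch; a feature that is absent from a map is taken to mean Any, which is an assumption based on how the patch's parseTargetID leaves unspecified features unmapped.

```cpp
#include <map>
#include <string>

// Hypothetical representation: true = On ('+'), false = Off ('-').
// A feature that is missing from the map is treated as Any.
using FeatureMap = std::map<std::string, bool>;

// Hypothetical helper, not part of the patch: every feature the image pins
// to On or Off must be pinned to the same value by the environment.
bool featuresCompatible(const FeatureMap &Img, const FeatureMap &Env) {
  for (const auto &Feature : Img) {
    auto It = Env.find(Feature.first);
    if (It == Env.end())
      return false; // image is On/Off, environment is Any
    if (It->second != Feature.second)
      return false; // On versus Off mismatch
  }
  // Features the image leaves as Any accept whatever the environment has.
  return true;
}
```

Under these assumptions, an image with no explicit features is accepted by any environment of the same processor, while an image built with `xnack+` is rejected by an environment that reports `xnack-` or does not report xnack at all.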

Harbormaster completed remote builds in B169915: Diff 437051. Jun 14 2022, 11:58 PM

kosarev added a reviewer: Restricted Project. Jun 16 2022, 8:23 AM

Ping

Updated with StringMap and StringRef ADTs. Also refactored based on the new formatting style.

Harbormaster completed remote builds in B177238: Diff 447145. Jul 24 2022, 10:53 AM

JonChesterfield accepted this revision. Jul 25 2022, 12:52 AM

This revision is now accepted and ready to land. Jul 25 2022, 12:52 AM

This revision was landed with ongoing or failed builds. Jul 25 2022, 2:44 AM

Closed by commit rG471f2abc62d9: [Libomptarget] Add checks for AMDGPU TargetID using new image info (authored by saiislam).

This revision was automatically updated to reflect the committed changes.

saiislam added a commit: rG471f2abc62d9: [Libomptarget] Add checks for AMDGPU TargetID using new image info.

saiislam added a reverting change: rG8cbf4a386b67: Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info". Jul 25 2022, 3:34 AM

LGTM

saiislam added a commit: rG4075a811ad99: [Libomptarget] Add checks for AMDGPU TargetID using new image info. Jul 26 2022, 12:45 AM

Is it possible to move code around to avoid re-writing the TargetID parsing?

We have effectively the same parseTargetID in clang/lib/Basic/TargetID.cpp
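To make the duplicated logic concrete, here is a minimal standard-library sketch of splitting a target ID such as `gfx90a:sramecc+:xnack-` into a processor name and a feature map. It is neither the clang implementation nor the code in this patch: `ParsedTargetID` and `parseTargetIDSketch` are hypothetical names, and the sketch performs no validation of processor or feature names.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical result type: processor name plus explicitly signed features.
struct ParsedTargetID {
  std::string Arch;
  std::map<std::string, bool> Features; // true = '+', false = '-'
};

// Hypothetical helper: split on ':' and read the trailing sign of each item.
// Items without a trailing '+' or '-' are ignored (left as Any).
ParsedTargetID parseTargetIDSketch(const std::string &TargetID) {
  ParsedTargetID Result;
  std::size_t Pos = TargetID.find(':');
  Result.Arch = TargetID.substr(0, Pos);
  while (Pos != std::string::npos) {
    std::size_t Next = TargetID.find(':', Pos + 1);
    std::size_t Len =
        Next == std::string::npos ? std::string::npos : Next - Pos - 1;
    std::string Item = TargetID.substr(Pos + 1, Len);
    if (Item.size() > 1 && (Item.back() == '+' || Item.back() == '-'))
      Result.Features[Item.substr(0, Item.size() - 1)] = Item.back() == '+';
    Pos = Next;
  }
  return Result;
}
```

A shared helper along these lines would let both callers agree on how a sign is attached to a feature name, which is the point of the question about avoiding a second parser.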

JonChesterfield added inline comments. Aug 30 2022, 11:33 AM

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2074	@saiislam the commit in the repo doesn't match this diff, any guesses?

https://github.com/llvm/llvm-project/commit/4075a811ad99b7e263b8b99954cef8c96b042e22

-  hsa_status_t Err;
-
+  hsa_status_t Err = hsa_init();
+  if (Err != HSA_STATUS_SUCCESS) {
+    DP("HSA Initialization Failed.\n");
+    return HSA_STATUS_ERROR;
+  }

That doesn't have a matching hsa_shutdown. Found this while chasing a segfault, didn't expect to see multiple calls to hsa_init from the debugger.

JonChesterfield added inline comments. Aug 30 2022, 12:43 PM

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2074	Patch to revert that in D132965.

Apologies for the delay in response.

@scott.linder, this code is in OpenMP's amdgpu plugin, which is called by the OpenMP device runtime.
Even though we depend on LLVM libraries, I don't think we can access the clang library from
here. Hence the logic replication.

@JonChesterfield, I first committed this patch as rG471f2abc62d9, but it had to be reverted
(rG8cbf4a386b67) because the AMDGPU buildbot was failing multiple libomptarget-amdgcn tests
(https://lab.llvm.org/buildbot/#/builders/193/builds/15846).

I fixed it by adding this initialization in the following commit (rG4075a811ad99).
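The concern raised about an initialization call without a matching shutdown can be illustrated with a scope guard. The sketch below deliberately does not call the HSA API: `runtimeInit`, `runtimeShutDown`, `RuntimeRefCount`, `RuntimeGuard` and `queryWithRuntime` are hypothetical stand-ins for a reference-counted runtime, which is how the HSA runtime is understood to treat repeated hsa_init calls (an assumption here, not something this review states).

```cpp
#include <cassert>

// Hypothetical stand-ins for a reference-counted runtime. NOT the HSA API.
static int RuntimeRefCount = 0;

bool runtimeInit() {
  ++RuntimeRefCount;
  return true;
}

void runtimeShutDown() {
  assert(RuntimeRefCount > 0 && "shutdown without matching init");
  --RuntimeRefCount;
}

// Scope guard: each successful init is balanced by exactly one shutdown.
class RuntimeGuard {
  bool Ok;

public:
  RuntimeGuard() : Ok(runtimeInit()) {}
  ~RuntimeGuard() {
    if (Ok)
      runtimeShutDown();
  }
  RuntimeGuard(const RuntimeGuard &) = delete;
  RuntimeGuard &operator=(const RuntimeGuard &) = delete;
  bool ok() const { return Ok; }
};

// Hypothetical caller: the reference is held only while Guard is in scope.
int queryWithRuntime() {
  RuntimeGuard Guard;
  if (!Guard.ok())
    return -1;
  return RuntimeRefCount; // number of references held at this point
}
```

With a guard like this, an early return from an error path cannot leave the reference count elevated, which is the kind of imbalance the inline comment describes seeing from the debugger.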

Diff 437051

openmp/libomptarget/plugins/amdgpu/dynamic_hsa/hsa.h

 typedef enum {
   HSA_DEVICE_TYPE_CPU = 0,
   HSA_DEVICE_TYPE_GPU = 1,
   HSA_DEVICE_TYPE_DSP = 2
 } hsa_device_type_t;

 typedef enum {
-  HSA_ISA_INFO_NAME = 1,
+  HSA_ISA_INFO_NAME_LENGTH = 0,
+  HSA_ISA_INFO_NAME = 1
 } hsa_isa_info_t;

 typedef enum {
   HSA_AGENT_INFO_NAME = 0,
   HSA_AGENT_INFO_VENDOR_NAME = 1,
   HSA_AGENT_INFO_PROFILE = 4,
   HSA_AGENT_INFO_WAVEFRONT_SIZE = 6,
   HSA_AGENT_INFO_WORKGROUP_MAX_DIM = 7,

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp


#include <algorithm>

#include <assert.h>

#include <cstdio>

#include <cstdlib>

#include <cstring>

#include <functional>

#include <libelf.h>

#include <list>

#include <map>

#include <memory>

#include <mutex>

#include <shared_mutex>

#include <unordered_map>

#include <vector>

#include "interop_hsa.h"

#include "impl_runtime.h"


public:

std::vector<hsa_agent_t> CPUAgents;

// Device properties

std::vector<int> ComputeUnits;

std::vector<int> GroupsPerDevice;

std::vector<int> ThreadsPerGroup;

std::vector<int> WarpSize;

std::vector<std::string> GPUName;

std::vector<std::string> TargetID;

// OpenMP properties

std::vector<int> NumTeams;

std::vector<int> NumThreads;

// OpenMP Environment properties

EnvironmentVariables Env;


namespace core {

hsa_status_t allow_access_to_all_gpu_agents(void *ptr) {

return hsa_amd_agents_allow_access(DeviceInfo.HSAAgents.size(),

&DeviceInfo.HSAAgents[0], NULL, ptr);

}

} // namespace core

static hsa_status_t get_isa_info(hsa_isa_t isa, void *data) {

hsa_status_t err;

uint32_t name_len;

err = hsa_isa_get_info_alt(isa, HSA_ISA_INFO_NAME_LENGTH, &name_len);

if (err != HSA_STATUS_SUCCESS) {

DP("Error getting ISA info length\n");

return err;

}

char target_id[name_len];

err = hsa_isa_get_info_alt(isa, HSA_ISA_INFO_NAME, target_id);

if (err != HSA_STATUS_SUCCESS) {

DP("Error getting ISA info name\n");

return err;

}

auto TripleTargetID = llvm::StringRef(target_id);

if (TripleTargetID.consume_front_insensitive("amdgcn-amd-amdhsa")) {

DeviceInfo.TargetID.push_back(TripleTargetID.ltrim('-').str());

}

return HSA_STATUS_SUCCESS;

}

/// Parse a target_id to get processor arch and feature map.

/// Returns processor subarch.

/// Returns target_id features in \p map argument.

/// If the \p target_id contains feature+, map it to true.

/// If the \p target_id contains feature-, map it to false.

/// If the \p target_id does not contain a feature (default), do not map it.

llvm::StringRef parseTargetID(llvm::StringRef target_id,

std::map<std::string, bool> *map) {

if (target_id.empty())

return llvm::StringRef();

auto arch_feature = target_id.split(":");

auto arch = arch_feature.first;

auto features = arch_feature.second;

if (features.empty())

return arch;

if (features.contains_insensitive("sramecc+")) {

map->insert(std::pair<std::string, bool>("sramecc", true));

} else if (features.contains_insensitive("sramecc-")) {

map->insert(std::pair<std::string, bool>("sramecc", false));

}

if (features.contains_insensitive("xnack+")) {

map->insert(std::pair<std::string, bool>("xnack", true));

} else if (features.contains_insensitive("xnack-")) {

map->insert(std::pair<std::string, bool>("xnack", false));

}

return arch;

}

/// Checks if an image \p img_info is compatible with current

/// system's environment \p env_info

bool is_image_compatible_with_env(char *img_info, std::string env_info) {

llvm::StringRef img_tid(img_info), env_tid(env_info);

// Compatible in case of exact match

if (img_tid == env_tid) {

DP("Compatible: Exact match \t[Image: %s]\t:\t[Environment: %s]\n",

img_tid.data(), env_tid.data());

return true;

}

// Incompatible if Archs mismatch.

std::map<std::string, bool> img_map, env_map;

llvm::StringRef img_arch = parseTargetID(img_tid, &img_map);

llvm::StringRef env_arch = parseTargetID(env_tid, &env_map);

// Both env_arch and img_arch can't be empty here.

if (env_arch.empty() || img_arch.empty() || !img_arch.contains(env_arch)) {

DP("Incompatible: Processor mismatch \t[Image: %s]\t:\t[Environment: %s]\n",

img_tid.data(), env_tid.data());

return false;

}

// Incompatible if image has more features than the environment, irrespective

// of type or sign of features.

if (img_map.size() > env_map.size()) {

DP("Incompatible: Image has more features than the environment \t[Image: "

"%s]\t:\t[Environment: %s]\n",

img_tid.data(), env_tid.data());

return false;

}

// Compatible if each target feature specified by the environment is

// compatible with target feature of the image. The target feature is

// compatible if the image does not specify it (meaning Any), or if it

// specifies it with the same value (meaning On or Off).

for (const auto &img_feature : img_map) {

auto env_feature = env_map.find(img_feature.first);

if (env_feature == env_map.end()) {

DP("Incompatible: Value of Image's non-ANY feature is not matching with "

"the Environment feature's ANY value \t[Image: %s]\t:\t[Environment: "

"%s]\n",

img_tid.data(), env_tid.data());

return false;

} else if (env_feature->first == img_feature.first &&

env_feature->second != img_feature.second) {

DP("Incompatible: Value of Image's non-ANY feature is not matching with "

"the Environment feature's non-ANY value \t[Image: "

"%s]\t:\t[Environment: %s]\n",

img_tid.data(), env_tid.data());

return false;

}

}

// Image is compatible if all features of Environment are:

// - either, present in the Image's features map with the same sign,

// - or, the feature is missing from Image's features map i.e. it is

// set to ANY

DP("Compatible: Target IDs are compatible \t[Image: %s]\t:\t[Environment: "

"%s]\n",

img_tid.data(), env_tid.data());

return true;

}

extern "C" {

int32_t __tgt_rtl_is_valid_binary(__tgt_device_image *image) {

return elf_machine_id_is_amdgcn(image);

}

int32_t __tgt_rtl_is_valid_binary_info(__tgt_device_image *image,

__tgt_image_info *info) {

if (!__tgt_rtl_is_valid_binary(image))

return false;

// A subarchitecture was not specified. Assume it is compatible.

if (!info->Arch)

return true;

int32_t NumberOfDevices = __tgt_rtl_number_of_devices();

for (int32_t DeviceId = 0; DeviceId < NumberOfDevices; ++DeviceId) {

__tgt_rtl_init_device(DeviceId);

hsa_agent_t agent = DeviceInfo.HSAAgents[DeviceId];

hsa_status_t err = hsa_agent_iterate_isas(agent, get_isa_info, &DeviceId);

if (err != HSA_STATUS_SUCCESS) {

DP("Error iterating ISAs\n");

return false;

}

if (!is_image_compatible_with_env(info->Arch, DeviceInfo.TargetID[DeviceId]))

return false;

}

DP("Image has Target ID compatible with the current environment: %s\n", info->Arch);

return true;

}

int __tgt_rtl_number_of_devices() {

// If the construction failed, no methods are safe to call

if (DeviceInfo.ConstructionSucceeded) {

return DeviceInfo.NumberOfDevices;

} else {

DP("AMDGPU plugin construction failed. Zero devices available\n");

return 0;

}

}

int64_t __tgt_rtl_init_requires(int64_t RequiresFlags) {

DP("Init requires flags to %ld\n", RequiresFlags);

DeviceInfo.RequiresFlags = RequiresFlags;

return RequiresFlags;

}

int32_t __tgt_rtl_init_device(int device_id) {

hsa_status_t err;


// this is per device id init

DP("Initialize the device id: %d\n", device_id);

hsa_agent_t agent = DeviceInfo.HSAAgents[device_id];

// Get number of Compute Unit

uint32_t compute_units = 0;

err = hsa_agent_get_info(


openmp/libomptarget/plugins/exports

 VERS1.0 {
   global:
     __tgt_rtl_is_valid_binary;
+    __tgt_rtl_is_valid_binary_info;
     __tgt_rtl_is_data_exchangable;
     __tgt_rtl_number_of_devices;
     __tgt_rtl_init_requires;
     __tgt_rtl_init_device;
     __tgt_rtl_deinit_device;
     __tgt_rtl_load_binary;
     __tgt_rtl_data_alloc;
     __tgt_rtl_data_submit;

This is an archive of the discontinued LLVM Phabricator instance.

[Libomptarget] Add checks for AMDGPU TargetID using new image info
ClosedPublic
