Details

Reviewers

JonChesterfield
ronlieb
jdoerfert

Commits

rG486110eb4134: [AMDGPU][Libomptarget] Remove global KernelNameMap

Summary

KernelNameMap contains entries like "key.kd" => key which clearly
could be replaced by simple logic of removing suffix from the key.
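
A minimal sketch of that suffix logic, assuming the symbol is held in a std::string. The helper name stripKdSuffix is hypothetical and does not appear in the patch, which truncates a char buffer instead.

```cpp
#include <string>

// Hypothetical helper (not from the patch): copies `symbol` without its
// trailing ".kd" into `out`. Returns false and leaves `out` untouched if the
// suffix is absent or if nothing would remain once it is removed.
bool stripKdSuffix(const std::string &symbol, std::string &out) {
  static const std::string Suffix = ".kd";
  if (symbol.size() <= Suffix.size())
    return false;
  if (symbol.compare(symbol.size() - Suffix.size(), Suffix.size(), Suffix) != 0)
    return false;
  out = symbol.substr(0, symbol.size() - Suffix.size());
  return true;
}
```

Returning a bool rather than silently passing the input through keeps the "symbol did not have the expected form" case visible to the caller.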

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pdhaliwal created this revision. (May 18 2021, 7:21 AM)

Herald added subscribers: kerbowa, t-tye, tpr and 5 others. (May 18 2021, 7:21 AM)

pdhaliwal requested review of this revision. (May 18 2021, 7:21 AM)

Herald added a project: Restricted Project. (May 18 2021, 7:21 AM)

Herald added subscribers: openmp-commits, wdng.

pdhaliwal added a child revision: D102692: [AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy. (May 18 2021, 7:25 AM)

pdhaliwal edited the summary of this revision.

Herald added a reviewer: jdoerfert. (May 18 2021, 7:26 AM)

Herald added a subscriber: sstefan1.

pdhaliwal mentioned this in D102692: [AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy. (May 18 2021, 7:26 AM)

Let's delete it instead. It maps 'foo' to 'foo.kd'. We can replace that with a function that appends .kd to the string.

Harbormaster completed remote builds in B105042: Diff 346172. (May 18 2021, 10:36 AM)

Removed KernelNameMap.

pdhaliwal retitled this revision from "[AMDGPU][Libomptarget] Move KernelNameMap to function scope" to "[AMDGPU][Libomptarget] Remove global KernelNameMap". (May 19 2021, 12:53 AM)

pdhaliwal edited the summary of this revision.

I like the direction. Could we hold it for a day or so? I'd like to check through the uses of the kernel name to see if there's a missing edge case, or if we can simplify this a step further.

It looks like the msgpack data always contains the foo and the foo.kd strings, under different keys. I wonder if that's something we can rely on the compiler emitting.

openmp/libomptarget/plugins/amdgpu/impl/system.cpp
864	I think this is the earliest point we can check the foo to foo.kd mapping is in place, so that we can error out if it isn't
1036	Can we use the foo.kd form everywhere instead of truncating it?

In D102691#2767846, @JonChesterfield wrote:

I like the direction. Could we hold it for a day or so? I'd like to check through the uses of the kernel name to see if there's a missing edge case, or if we can simplify this a step further.

It looks like the msgpack data always contains the foo and the foo.kd strings, under different keys. I wonder if that's something we can rely on the compiler emitting.

The HSA ABI metadata is defined in https://llvm.org/docs/AMDGPUUsage.html#code-object-metadata . The kernel .symbol attribute provides the ELF symbol that should be used to access the kernel descriptor used in the AQL packet to specify the kernel to execute. The intent is not to make assumptions about the symbols, but simply use the metadata.

Harbormaster completed remote builds in B105161: Diff 346354. (May 19 2021, 1:20 AM)

Indeed, but HSA is not the only piece here. This is tied to some generic openmp data, which I think uses the IR symbol name as the ID. That seems to be .symbol.

So far, for code object versions above v2 (v2 is unimplemented), one string is derivable from the other by appending .kd, so the suggestion here was to use only one of the two strings.

Taken a closer look at this. Cut down example msgpack,

".name" : "__omp_vmul_l8",
".symbol" : "__omp_vmul_l8.kd",

Most things use the 'name' version, i.e. without the trailing .kd suffix. The symbol name passed into the openmp runtime is that one, and that's the form used as a key in KernelInfoTable.

The HSA functions expect the suffixed version, and that's the one SYMBOL_INFO_NAME returns.

So what we presently do is:

Build a map from foo.kd to foo
Check, obliquely, that the original metadata had both foo.kd and foo present
Use that map to strip .kd from the result of a HSA call

The logic to drop a trailing .kd therefore looks like the right change to make. We can check the .name and .symbol fields when reading the metadata and error out if they don't match the expected format, and likewise error out if HSA returns a string that doesn't end in .kd. At that point we can rip out the map.
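
The two checks described above can be sketched as follows. This is an illustration under stated assumptions, not the code that landed: the Status enum stands in for the hsa_status_t values the plugin returns, and both function names are hypothetical. Unlike the diff below, the second function verifies the suffix before truncating.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical status codes; placeholders for the hsa_status_t values
// used by the real plugin.
enum class Status { Success, InvalidCodeObject, Error };

// Check 1: while reading the metadata, require .symbol == .name + ".kd".
Status checkMetadataNames(const std::string &kernelName,
                          const std::string &symbolName) {
  return (kernelName + ".kd") == symbolName ? Status::Success
                                            : Status::InvalidCodeObject;
}

// Check 2: when HSA returns a symbol name, require the ".kd" suffix and
// truncate the buffer in place so it matches the key used by the runtime.
// `length` is the number of characters in `name`, excluding any terminator.
Status truncateHsaName(char *name, std::size_t length) {
  const std::size_t SuffixLen = 3; // length of ".kd"
  if (length <= SuffixLen ||
      std::memcmp(name + length - SuffixLen, ".kd", SuffixLen) != 0)
    return Status::Error;
  name[length - SuffixLen] = '\0';
  return Status::Success;
}
```

With both checks in place, nothing needs to remember the foo.kd to foo pairing between the metadata pass and the symbol iteration, which is what makes the global map removable.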

That's close to the patch as written above, but now I've actually worked through the control flow myself as well.

JonChesterfield added inline comments. (May 20 2021, 9:19 AM)

openmp/libomptarget/plugins/amdgpu/impl/system.cpp
137	these ^ are probably not the ideal structures. We convert char* into std::string at each call site, and map() is expected to be slower than unordered_map for the ~random access lookup. vector<unordered_map<const char*, info_t>> with the custom equality / hasher for C strings is a reasonable choice.
1036	^that turns out to be worse, truncating is the way to go. Probably cleaner to mutate the name[] buffer than do extra things with strings.
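
For the comment on line 137, a rough sketch of what a table keyed on C strings might look like. Everything here is an assumption for illustration: info_t is a placeholder for atl_kernel_info_t, the functor names are made up, and FNV-1a is just one reasonable hash choice. The review does not prescribe any of these details.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Placeholder for atl_kernel_info_t; the real struct has many more fields.
struct info_t {
  unsigned kernel_segment_size;
};

// Hash the characters the pointer refers to, not the pointer value
// (64-bit FNV-1a).
struct CStrHash {
  std::size_t operator()(const char *s) const {
    std::uint64_t h = 0xcbf29ce484222325ULL;
    for (; *s != '\0'; ++s) {
      h ^= static_cast<unsigned char>(*s);
      h *= 0x100000001b3ULL;
    }
    return static_cast<std::size_t>(h);
  }
};

// Compare by contents so two different buffers holding the same name match.
struct CStrEqual {
  bool operator()(const char *a, const char *b) const {
    return std::strcmp(a, b) == 0;
  }
};

using KernelMap = std::unordered_map<const char *, info_t, CStrHash, CStrEqual>;
// One map per GPU, mirroring the shape of KernelInfoTable.
using KernelTable = std::vector<KernelMap>;
```

The catch with this layout is ownership: the map stores only pointers, so every key string has to outlive its entry. The existing std::string keys avoid that concern at the cost of a conversion at each call site.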

Added check for .kd in symbol name.

Harbormaster completed remote builds in B105576: Diff 346946. (May 21 2021, 1:44 AM)

Couple of style comments above. I think this works as-is, so we have confirmation that the map of strings can go. Which is great!

openmp/libomptarget/plugins/amdgpu/impl/system.cpp
877	I think the above works but could be clearer as: (kernelName + ".kd") != symbolName. kernelName can drop out of scope after this test (probably move it into a nested brace to make that clear), so we could mutate it instead of making a new string, which probably saves an allocation. But in general, '(foo + ".kd") == bar' seems simpler than the find calls.
927–928	Let's drop the debug print and comment here now that the map is gone.
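
If the temporary string from the concatenation ever mattered, the same test can be written without building one. This is an alternative sketch, not what the patch does and not the mutate-in-place idea from the comment; the function name isKdSymbolOf is hypothetical.

```cpp
#include <string>

// Returns true when symbolName is exactly kernelName followed by ".kd",
// without constructing a concatenated temporary.
bool isKdSymbolOf(const std::string &kernelName,
                  const std::string &symbolName) {
  static const char Suffix[] = ".kd";
  const std::string::size_type SuffixLen = sizeof(Suffix) - 1;
  return symbolName.size() == kernelName.size() + SuffixLen &&
         symbolName.compare(0, kernelName.size(), kernelName) == 0 &&
         symbolName.compare(kernelName.size(), SuffixLen, Suffix) == 0;
}
```

The concatenating form is still easier to read, which is the point made in the comment; this variant only trades some clarity for avoiding the allocation.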

Simplified if condition.

pdhaliwal marked 3 inline comments as done. (May 21 2021, 5:29 AM)

Harbormaster completed remote builds in B105609: Diff 346998. (May 21 2021, 6:15 AM)

Like it, thanks!

This revision is now accepted and ready to land. (May 21 2021, 6:59 AM)

This revision was landed with ongoing or failed builds. (May 24 2021, 1:46 AM)

Closed by commit rG486110eb4134: [AMDGPU][Libomptarget] Remove global KernelNameMap (authored by pdhaliwal).

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rG486110eb4134: [AMDGPU][Libomptarget] Remove global KernelNameMap.

Herald added a subscriber: foad. (May 24 2021, 1:46 AM)

pdhaliwal mentioned this in rG7648b6978e55: [AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy. (May 26 2021, 3:02 AM)

Diff 346998

openmp/libomptarget/plugins/amdgpu/impl/system.cpp

 /*===--------------------------------------------------------------------------
  * ATMI (Asynchronous Task and Memory Interface)
  *
  * This file is distributed under the MIT License. See LICENSE.txt for details.
  *===------------------------------------------------------------------------*/
 #include <gelf.h>
[Lint, pre-merge checks: clang-tidy: error: 'gelf.h' file not found [clang-diagnostic-error]]
 #include <libelf.h>

 #include <cassert>
 #include <cstdarg>
 #include <fstream>
 #include <iomanip>
 #include <iostream>
 #include <set>
[... 112 unchanged lines not shown ...]
 };

 // global variables. TODO: Get rid of these
 atmi_machine_t g_atmi_machine;
 ATLMachine g_atl_machine;

 std::vector<hsa_amd_memory_pool_t> atl_gpu_kernarg_pools;

-std::map<std::string, std::string> KernelNameMap;
 std::vector<std::map<std::string, atl_kernel_info_t>> KernelInfoTable;
 std::vector<std::map<std::string, atl_symbol_info_t>> SymbolInfoTable;
[Inline comment, JonChesterfield, Not Done: see the comment on line 137 in the event timeline]

 bool g_atmi_initialized = false;
 bool g_atmi_hostcall_required = false;

 /*
    atlc is all internal global values.
    The structure atl_context_t is defined in atl_internal.h
    Most references will use the global structure prefix atlc.
 */
[... 710 unchanged lines not shown; enclosing scope: for (size_t i = 0; i < kernelsSize; i++) { ...]
     if (msgpack_errors != 0) {
       printf("[%s:%d] %s failed\n", __FILE__, __LINE__,
              "element lookup in kernel metadata");
       return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
     }

     msgpack_errors += map_lookup_string(element, ".name", &kernelName);
     msgpack_errors += map_lookup_string(element, ".symbol", &symbolName);
     if (msgpack_errors != 0) {
[Inline comment, JonChesterfield, Done: see the comment on line 864 in the event timeline]
       printf("[%s:%d] %s failed\n", __FILE__, __LINE__,
              "strings lookup in kernel metadata");
       return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
     }

+    // Make sure that kernelName + ".kd" == symbolName
+    if ((kernelName + ".kd") != symbolName) {
+      printf("[%s:%d] Kernel name mismatching symbol: %s != %s + .kd\n",
+             __FILE__, __LINE__, symbolName.c_str(), kernelName.c_str());
+      return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
+    }
+
     atl_kernel_info_t info = {0, 0, 0, 0, 0, 0, 0, 0, 0, {}, {}, {}};
[Inline comment, JonChesterfield, Done: see the comment on line 877 in the event timeline]

     uint64_t sgpr_count, vgpr_count, sgpr_spill_count, vgpr_spill_count;
     msgpack_errors += map_lookup_uint64_t(element, ".sgpr_count", &sgpr_count);
     if (msgpack_errors != 0) {
       printf("[%s:%d] %s failed\n", __FILE__, __LINE__,
              "sgpr count metadata lookup in kernel metadata");
       return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
     }
[... 33 unchanged lines not shown; enclosing scope: for (size_t i = 0; i < kernelsSize; i++) { ...]
     uint64_t kernel_segment_size;
     msgpack_errors += map_lookup_uint64_t(element, ".kernarg_segment_size",
                                           &kernel_segment_size);
     if (msgpack_errors != 0) {
       printf("[%s:%d] %s failed\n", __FILE__, __LINE__,
              "kernarg segment size metadata lookup in kernel metadata");
       return HSA_STATUS_ERROR_INVALID_CODE_OBJECT;
     }

-    // create a map from symbol to name
-    DEBUG_PRINT("Kernel symbol %s; Name: %s; Size: %lu\n", symbolName.c_str(),
-                kernelName.c_str(), kernel_segment_size);
-    KernelNameMap[symbolName] = kernelName;
-
     bool hasHiddenArgs = false;
[Inline comment, JonChesterfield, Done: see the comment on lines 927–928 in the event timeline]
     if (kernel_segment_size > 0) {
       uint64_t argsSize;
       size_t offset = 0;

       msgpack::byte_range args_array;
       msgpack_errors +=
           map_lookup_array(element, ".args", &args_array, &argsSize);
       if (msgpack_errors != 0) {
[... 86 unchanged lines not shown; enclosing scope: if (type == HSA_SYMBOL_KIND_KERNEL) { ...]
     char *name = reinterpret_cast<char *>(malloc(name_length + 1));
     err = hsa_executable_symbol_get_info(symbol,
                                          HSA_EXECUTABLE_SYMBOL_INFO_NAME, name);
     if (err != HSA_STATUS_SUCCESS) {
       printf("[%s:%d] %s failed: %s\n", __FILE__, __LINE__,
              "Symbol info extraction", get_error_string(err));
       return err;
     }
-    name[name_length] = 0;
+    // remove the suffix .kd from symbol name.
+    name[name_length - 3] = 0;

-    if (KernelNameMap.find(std::string(name)) == KernelNameMap.end()) {
-      // did not find kernel name in the kernel map; this can happen only
-      // if the ROCr API for getting symbol info (name) is different from
-      // the comgr method of getting symbol info
-      return HSA_STATUS_ERROR;
-    }
     atl_kernel_info_t info;
-    std::string kernelName = KernelNameMap[std::string(name)];
+    std::string kernelName(name);
[Lint, pre-merge checks: clang-tidy: warning: invalid case style for variable 'kernelName' [readability-identifier-naming]]
     // by now, the kernel info table should already have an entry
[Inline comments, JonChesterfield, Not Done: see the two comments on line 1036 in the event timeline]
     // because the non-ROCr custom code object parsing is called before
     // iterating over the code object symbols using ROCr
     if (KernelInfoTable[gpu].find(kernelName) == KernelInfoTable[gpu].end()) {
       return HSA_STATUS_ERROR;
     }
     // found, so assign and update
     info = KernelInfoTable[gpu][kernelName];

[... 203 unchanged lines not shown ...]

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][Libomptarget] Remove global KernelNameMap
ClosedPublic
