Differential D153725: [clang] Make amdgpu-arch tool work on Windows
Authored by yaxunl on Jun 25 2023, 10:22 AM

Currently the amdgpu-arch tool detects AMD GPUs by dynamically loading the HSA runtime shared library and using HSA APIs; the HSA runtime is not available on Windows. This patch makes the tool work on Windows by dynamically loading the HIP runtime DLL and using HIP APIs instead.
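
For reference, a minimal sketch of the detection path described above, assuming the HIP SDK headers are available for the hipDeviceProp_t definition and that the 64-bit runtime ships as amdhip64.dll; the DLL name and the unversioned hipGetDeviceProperties export are assumptions (newer HIP releases also version that symbol, so a real implementation has to keep the struct layout and the resolved symbol in sync). This is not the patch itself, just an illustration of the approach, and an in-tree tool would more plausibly go through llvm::sys::DynamicLibrary than raw Win32 calls:

```cpp
// Minimal sketch, not the patch itself: enumerate AMD GPUs on Windows by
// loading the HIP runtime DLL at run time and resolving its device queries.
// Requires the HIP SDK headers for hipDeviceProp_t; the DLL name and the
// unversioned symbol name are assumptions (newer runtimes also export a
// versioned hipGetDeviceProperties whose struct layout must match).
#define __HIP_PLATFORM_AMD__
#include <hip/hip_runtime_api.h>
#include <windows.h>
#include <cstdio>

typedef hipError_t (*hipGetDeviceCount_t)(int *);
typedef hipError_t (*hipGetDeviceProperties_t)(hipDeviceProp_t *, int);

int main() {
  HMODULE HipRT = ::LoadLibraryA("amdhip64.dll");
  if (!HipRT) {
    std::fprintf(stderr, "failed to load amdhip64.dll\n");
    return 1;
  }

  auto GetCount = reinterpret_cast<hipGetDeviceCount_t>(
      ::GetProcAddress(HipRT, "hipGetDeviceCount"));
  auto GetProps = reinterpret_cast<hipGetDeviceProperties_t>(
      ::GetProcAddress(HipRT, "hipGetDeviceProperties"));
  if (!GetCount || !GetProps)
    return 1;

  int Count = 0;
  if (GetCount(&Count) != hipSuccess)
    return 1;

  for (int I = 0; I < Count; ++I) {
    hipDeviceProp_t Prop;
    if (GetProps(&Prop, I) == hipSuccess)
      // gcnArchName carries the full target ID, e.g. gfx906:sramecc+:xnack-.
      std::printf("%s\n", Prop.gcnArchName);
  }

  ::FreeLibrary(HipRT);
  return 0;
}
```
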
A lot of CMake relies on this just being an ordered list of architectures, so we'd probably need to make that an opt-in thing.

Using the HIP runtime reports xnack and ecc: it reports the full target ID, including accurate xnack and ecc features (e.g. gfx906:sramecc+:xnack-), which is ready to be used by clang.

Also w.r.t. the target-id, I'm wondering what a good solution would be. Right now the main usage of amdgpu-arch is both to detect -mcpu / -march in CMake and to fill in the architecture via --offload-arch=native or -fopenmp-target=amdgcn-amd-amdhsa. We may want to add a flag to specify whether to include target-id information in the reported architectures.

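For illustration, a hypothetical sketch of what such an opt-in flag could look like in the tool; the flag name, default, and output shape are all assumptions, not part of this patch:

```cpp
// Hypothetical opt-in flag for printing the full target ID; nothing here is
// part of the actual patch.
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"

static llvm::cl::opt<bool> PrintTargetID(
    "target-id", llvm::cl::init(false),
    llvm::cl::desc("Print the full target ID (xnack/sramecc features) "
                   "instead of the bare gfx architecture"));

int main(int argc, char **argv) {
  llvm::cl::ParseCommandLineOptions(argc, argv, "AMD GPU arch detector\n");
  // Suppose HIP-based detection produced this string.
  llvm::StringRef Detected = "gfx906:sramecc+:xnack-";
  // Default: strip the feature suffix so CMake keeps getting a plain list.
  llvm::outs() << (PrintTargetID ? Detected : Detected.split(':').first)
               << '\n';
  return 0;
}
```

With the default left off, CMake would keep receiving a plain ordered list of gfx architectures, addressing the concern above.
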
The right thing to do on Linux for this is to query the driver directly. That is, the kernel should populate some string under /sys that we read. That isn't yet implemented. Does Windows happen to have that functionality available? (I landed here while trying to work out why tests aren't running, because we now print errors about failing to load libamdhip64.so when HSA fails.)

Comment Actions It should definitely not do that. That's what this redundant thing does . The kernel doesn't know the names of these devices. The kernel knows different names that map to PCI ids that are not the same as the gfx numbers. The compiler should not be responsible for maintaining yet another name mapping table and should go through a real API
This sounds very un-windows like. I assume the equivalent is digging around in the registry
There are a lot of PCI-ID-to-gfx906-style tables lying around already. There used to be one in roct; last time I looked, people wanted to move that somewhere else. I don't really want to copy/paste it. The problem with using the proper API via HSA or similar is twofold:

I don't follow this. You don't need this to work to perform the build or the test build. You may need it to execute the tests, but if HSA doesn't exist they won't be able to run anyway. If a build is invoking these tools at cmake time, it's just broken.

Jon is probably referring to a recurring problem we've noticed with the libc tests on HSA: they will sometimes fail when running with multiple threads, see https://lab.llvm.org/staging/#/builders/247/builds/2599/steps/10/logs/stdio. We haven't been able to track down whether or not that's a bug in the implementation or the interface somewhere.

And the libomptarget build is in fact doing that, but it shouldn't have to. What it's doing actually seems really unreasonable: it's only building the locally found targets when it should be building all targetable devices. The inconvenience there is that that's too many devices, so as a build-time hack you should be able to opt in to a restricted subset. Even better would be if we only built a copy for a reasonable subset of targets (i.e. one per generation where there's actually some semblance of compatibility). Or we could just capitulate and rely on the hacks the device libs do.

It definitely annoys me. The argument is that you can't usefully run some large N number of programs at the same time anyway, and the driver failing to open is a rate limit. The problem is there are things we could usefully do, like this query, without needing to run a kernel as well. The net effect is we don't run tests widely in parallel, because they fail if we do, for this and possibly other reasons.

The test detection is an awkward compromise between people who want to run the GPU tests and people who don't, and reflects the diverse hardware in use and variation in whether CUDA / HSA are installed.

The libomptarget build uses it mostly to determine if it should build the tests; we don't want to configure tests for a system that cannot support them. The libc tests, however, require it to set the architecture for their test configuration, since we can't support multiple test architectures at the same time; that required too much work, so I shelved it. We more or less just say "If you've got HSA / CUDA we expect to run tests".

The HIP path should work on Linux too. I generally think we should build as much code as possible on all hosts, so how about

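For illustration only, a minimal sketch of probing both runtimes in order on Linux so that a missing libamdhip64.so is not reported as an error when HSA is usable; the library names and the fallback order are assumptions rather than anything proposed here:

```cpp
// Illustration only: probe the HSA runtime first and fall back to the HIP
// runtime on Linux, so a missing libamdhip64.so is not reported as an error
// when HSA can do the job. Library names and ordering are assumptions.
#include <dlfcn.h>
#include <cstdio>

int main() {
  if (void *HSA = dlopen("libhsa-runtime64.so.1", RTLD_LAZY)) {
    std::puts("detecting via HSA");
    // ... resolve hsa_init/hsa_iterate_agents and query agents here ...
    dlclose(HSA);
    return 0;
  }
  if (void *HIP = dlopen("libamdhip64.so", RTLD_LAZY)) {
    std::puts("detecting via HIP");
    // ... resolve hipGetDeviceCount/hipGetDeviceProperties as in the
    // Windows sketch above ...
    dlclose(HIP);
    return 0;
  }
  std::fprintf(stderr, "no usable AMD GPU runtime found\n");
  return 1;
}
```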