This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed
ClosedPublic

Authored by pdhaliwal on Apr 6 2021, 5:21 AM.

Details

Summary

This patch adds a new clang tool named amdgpu-arch which uses
HSA to detect the installed AMD GPUs and report the march of each.
This tool is built only if the system has HSA installed.

The value printed by amdgpu-arch is used to fill in -march when
the latter is not explicitly provided in -Xopenmp-target.

Diff Detail

Event Timeline

pdhaliwal created this revision.Apr 6 2021, 5:21 AM
pdhaliwal requested review of this revision.Apr 6 2021, 5:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2021, 5:21 AM
pdhaliwal planned changes to this revision.Apr 6 2021, 5:22 AM

Working on tests.

This change is partly motivated by wanting to check in runtime tests for openmp that execute on whatever hardware is available locally. It is functionally similar to an out-of-tree bash script called mygpu that contains manually curated tables of pci.ids, and to a python script called rocm_agent_enumerator that calls a C++ tool called rocminfo and tries to parse the output, with a different table of pci.ids for when that fails.

Ultimately, the bottom of this stack is a map from pci.id to gfx number stored in the user space driver thunk library, roct. That is linked into hsa. It would be simpler programming to copy & paste that map from roct into the openmp clang driver, at the cost of inevitable divergence between the architecture clang detects and the architecture the runtime libraries detect. Spawning a process and reading stdout is a bit messy, but it beats copying the table, and it beats linking the gpu driver into clang in order to get at the table of numbers. This seems the right balance to me.

It should be possible to do something similar with cuda for nvptx, but that should be a separate executable. Partly to handle the permutations of cuda / hsa that may be present on a system. I haven't found the corresponding API call in cuda. The standalone tool nvidia-smi might be willing to emit sm_70 or similar to stdout, but I haven't found the right command line flags for that either. Rocminfo does not appear to be configurable, and is not necessarily present when compiling for amdgpu.

A bunch of comments inline, mostly style. I think there's a use-after-free bug.

It looks like our existing command line handling could be more robust. In particular, there should be error messages about march where there seem to be asserts. That makes it slightly unclear how we should handle things like the helper tool returning a string clang doesn't recognise (e.g. doesn't start with gfx). Something to revise separate to this patch I think.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
73

AMDGPU_ARCH_PROGRAM_NAME?

78

This looks like there are too many stringrefs. Redirecting stdout to a temporary file seems reasonable, but I'd expect the pointers into OutputBuf to be invalid when it drops out of scope. Perhaps return a smallvector of smallstrings instead?

Also, we're probably expecting fewer than 8 different gpus, probably as few as 1 in the most common case, so maybe a smallvector<type,1>

85

s/const int RC =//

86

can we pass {} for execArgs here?

106

Perhaps run all_of against the whole range and drop the size() > 1 test?

156

We shouldn't be handling unknown or missing march= fields with asserts. I see that this is already the case in multiple places, so let's go with a matching assert for this and aspire to fix that in a separate patch.

clang/tools/amdgpu-arch/AMDGPUArch.cpp
33

Simpler code if we drop the class and pass in the vector<string> itself as the void*

35

Does this null terminate for any length of GPU name? Wondering if we should explicitly zero out the last char.

46

Unsure these should be writing to stderr. We capture stdout, stderr probably goes to the user. We could exit 1 instead as clang is going to treat any failure to guess the arch identically

clang/tools/amdgpu-arch/CMakeLists.txt
12

Assuming this matches the logic in amdgpu openmp plugin

yaxunl added inline comments.Apr 6 2021, 6:52 AM
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
100

This function is useful for AMDGPU toolchain and HIP toolchain. Can it be a member of AMDGPU toolchain?

clang/tools/amdgpu-arch/CMakeLists.txt
10

This tool does not use omp header file. Why is this needed?

pdhaliwal updated this revision to Diff 335761.Apr 7 2021, 2:24 AM
pdhaliwal marked 10 inline comments as done.

Addressed review comments.

RE test: Since the tool's output is contingent on the result of an HSA API call, a test
that would always PASS on systems with different AMD GPUs while always being ignored
on systems without AMD GPUs would not work. I welcome suggestions on how to resolve this.

pdhaliwal added a comment.EditedApr 7 2021, 2:25 AM
This comment has been deleted.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
156

Matched this one with below.

clang/tools/amdgpu-arch/AMDGPUArch.cpp
35

Checked the rocr-runtime, the output is null terminated.

46

Removed the fprintf.

yaxunl added a comment.Apr 7 2021, 6:39 AM

LGTM about AMDGPU toolchain change. Thanks.

I'm happy with this as-is. @jdoerfert is this close enough to what you expected when we discussed this offline?

Adding it directly to AMDGPU.h was a good suggestion. Makes it easy for other amdgpu language drivers to pick it up.

That it's a binary on disk also means that more DIY build systems, e.g. freestanding C++ stuff, can call it directly, in -march=$(./amdgpu-arch) fashion.

clang/tools/amdgpu-arch/AMDGPUArch.cpp
52

This strategy looks good. Briefly considered whether we should print while iterating, but that will misbehave if there is an error after printing the first gpu.

unsigned is strictly the wrong type here, though it doesn't matter in practice. Could go with a range based for loop instead.

no strong opinion rn, I would add tests though

We'll have slightly indirect testing once this is used to enable D99656. There are two pieces that can be tested:

1/ The clang handling. That we can test with a minor change. Add a command line argument to clang that specifies the name of the tool to call to get the architecture (and defaults to amdgpu-arch). Then write N tests that call N different shell scripts that return the various cases. The plumbing for that argument may prove useful when adding a corresponding nvptx tool.

2/ The call into hsa. That probably means stubbing out hsa. Kind of interested in that - a hsa library that can be jury rigged to fail various calls is a path to a robust amdgpu plugin. Way more complexity in that fake stub than in this code though. I don't think that's worthwhile here.

clang/tools/amdgpu-arch/AMDGPUArch.cpp
43

missed one, return 1 on failure to initialize

pdhaliwal updated this revision to Diff 336037.Apr 8 2021, 2:13 AM
  • Addressed review comments
  • Added LIT test case
JonChesterfield added inline comments.Apr 8 2021, 6:28 AM
clang/include/clang/Driver/Options.td
921

I'd expect path to be the directory in which the tool is found, whereas this is used to name the tool itself. Perhaps 'amdgpu_arch_tool='? We might be able to write the default string inline as part of the argument definition, otherwise the current handling looks ok

clang/test/Driver/amdgpu-openmp-system-arch.c
2

This seems fairly clear. We might want to drop the fcuda-is-device from the regex as it doesn't matter for the test.

Failing cases (maybe driven by a separate C file) are probably:
print nothing, return 0
print nothing, return 1
print two different strings, return 0

pdhaliwal updated this revision to Diff 336351.Apr 9 2021, 1:58 AM

Added tests for the failing cases

pdhaliwal updated this revision to Diff 336353.Apr 9 2021, 2:00 AM
pdhaliwal marked an inline comment as done.

Fix permissions

pdhaliwal marked an inline comment as done.Apr 9 2021, 2:01 AM

New tests look good to me, thanks.

clang/include/clang/Driver/Options.td
922

Path to tool used for detecting AMD GPU arch in the system.

?

clang/lib/Driver/ToolChains/AMDGPU.h
105

Comment out of date. Possibly delete sentence starting Location.

gregrodgers requested changes to this revision.Apr 14 2021, 7:05 AM

I have two serious concerns with this tool: 1. It does not provide the infrastructure to identify runtime capabilities to satisfy the requirements of a compiled image. 2. It is not architecturally neutral or extensible to other architectures. I have a different proposed tool called offload-arch. Here is the help text for offload-arch.

grodgers@r4:/tmp/grodgers/git/aomp13/llvm-project/clang/tools/offload-arch/build$ ./offload-arch -h

offload-arch: Print offload architecture(s) for the current active system.
              or lookup information about offload architectures

Usage:
  offload-arch [ Options ] [ Optional lookup-value ]

  With no options, offload-arch prints a value for the first offload architecture
  found in the current system.  This value can be used by various clang frontends.
  For example, to compile for openmp offloading on the current system
  one could invoke clang with the following command:

  clang -fopenmp -fopenmp-targets=`offload-arch` foo.c

  If an optional lookup-value is specified, offload-arch will
  check if the value is either a valid offload-arch or a codename
  and lookup requested additional information. For example,
  this provides all information for offload-arch gfx906:

  offload-arch gfx906 -v

  Options:
  -h  Print this help message
  -a  Print values for all devices. Don't stop at first device found.
  -c  Print codename
  -n  Print numeric pci-id
  -t  Print recommended offload triple.
  -v  Verbose = -a -c -n -t
  -l  Print long codename found in pci.ids file
  -r  Print capabilities of current system to satisfy runtime requirements
      of compiled offload images.  This option is used by the runtime to
      choose the correct image when multiple compiled images are available.

  The alias amdgpu-arch returns 1 if no amdgcn GPU is found.
  The alias nvidia-arch returns 1 if no cuda GPU is found.
  These aliases are useful to determine if architecture-specific tests
  should be run. Or these aliases could be used to conditionally load
  architecture-specific software.

Copyright (c) 2021 ADVANCED MICRO DEVICES, INC.
This revision now requires changes to proceed.Apr 14 2021, 7:05 AM
  1. It does not provide the infrastructure to identify runtime capabilities to satisfy requirements of a compiled image.

I believe we only require a value for '-march=' to unblock running tests on CI machines. I'd guess you're referring to target id stuff where clang fills in reasonable defaults already.

  2. It is not architecturally neutral or extensible to other architectures.

Yes. By design. Clang calling into some unrelated tool for nvptx seems fine, nvidia-smi may already suffice for that purpose.

If we want to do something clever about guessing whether the machine has an nvidia card or an amdgpu one, clang can call both this tool and the nvidia one, and check both return codes.

Dependence on hsa is not necessary. The amdgpu and nvidia drivers both use PCI codes available in /sys . We should use architecture independent methods as much as possible.

Dependence on hsa is not necessary. The amdgpu and nvidia drivers both use PCI codes available in /sys . We should use architecture independent methods as much as possible.

I may see the disagreement.

  • You are describing, and have implemented, a tool for querying the current system to discover what hardware it has available.
  • This patch is a tool for querying whether to run tests on the current system, and if so, what -march should make those tests succeed.

Those aren't quite the same thing. In particular, finding a recognised gpu on this system is necessary but insufficient to execute on it.
The real condition we require for the amdgpu tests is 'does the hsa on the current system recognise the gpu', and calling into hsa is a great way to establish that.

A tool for introspection would ideally be dependency free. In the best case, it's just a function in clang, and we don't need any of the subprocess file io cruft.

A tool for guessing whether the in tree runtime tests should be attempted on the current system would ideally be this one, to exactly match when hsa can run them.

Rebase and review comments

pdhaliwal marked 2 inline comments as done.Apr 14 2021, 11:39 PM
JonChesterfield accepted this revision.Apr 15 2021, 1:50 AM

I've built this, checked it behaves as expected, checked clang does something reasonable when the executable is missing. All looks good to me, explicitly accepting.

We may need an internal call with Greg to work out how to unblock this given his objections above.

gregrodgers added inline comments.Apr 15 2021, 5:19 AM
clang/tools/amdgpu-arch/CMakeLists.txt
10

What happens when /opt/rocm is not available? Again, we need a cross-architecture mechanism to identify the offload-arch.

clang/tools/amdgpu-arch/CMakeLists.txt
10

Exactly the same as the amdgpu plugin. The cmake detection is char for char identical. This will look in CMAKE_INSTALL_PREFIX, which is where I install these libs when using trunk, and falls back to /opt/rocm which seems to be convenient for some users.

clang/tools/amdgpu-arch/CMakeLists.txt
10

Which may need revising at some point - I like installing hsa as if it was an llvm subcomponent, but other people might want a different convention. As long as we remember to change this file + amdgpu's cmake at the same time, all good.

gregrodgers accepted this revision.Apr 15 2021, 6:00 AM

I am removing my objection with the understanding that we will either replace or enhance amdgpu-arch with the cross-architecture tool offload-arch as described in my comments above. Also, this patch changes amd toolchains to use amdgpu-arch. So it will not interfere with how I expect the cross-architecture tool to be used to greatly simplify openmp command line arguments.

This revision is now accepted and ready to land.Apr 15 2021, 6:00 AM
t-tye added inline comments.Apr 15 2021, 8:54 AM
clang/tools/amdgpu-arch/CMakeLists.txt
10

/opt/rocm will not work with the side-by-side ROCm installation which installs ROCm in directories with the version number. Should there be the ability to configure this?

clang/tools/amdgpu-arch/CMakeLists.txt
10

If /opt/rocm isn't a safe out of the box option for finding rocm, let's remove that from here and the amdgpu plugin. Pre-existing hazard so doesn't need to block this patch.

I'm installing roct and rocr to the same CMAKE_INSTALL_PREFIX as llvm which is why the first clause works out.

How are the rocm components meant to find the corresponding pieces? If there's a rocm cmake install dir variable we could add that to the hints.

This revision was landed with ongoing or failed builds.Apr 15 2021, 10:26 PM
This revision was automatically updated to reflect the committed changes.

I've reverted this from main for now as there seems to be an issue with executing the test script on some CI machines.

The failing log pointed at "-mllvm" "--amdhsa-code-object-version=4", which is a hard error if the amdgpu triple is missing from the llvm. I see the test features amdgpu-registered-target. Perhaps that does not behave as one might wish?

I'd suggest rebuilding locally without the amdgpu triple enabled and see if that fails comparably, and if so, it's another argument for fixing D98746. Until the front end can run without amdgpu triple built for the middle end, I don't see how we can have any front end tests for amdgpu.

pdhaliwal reopened this revision.Apr 20 2021, 3:59 AM
This revision is now accepted and ready to land.Apr 20 2021, 3:59 AM
pdhaliwal updated this revision to Diff 338810.Apr 20 2021, 3:59 AM

Reopening this. This version is supposed to fix the buildbot failures on PPC machines.
Since I don't have a PPC machine I am not sure if this will work. But the logic
followed here is motivated by Clang :: Driver/program-path-priority.c, so hopefully
it will pass the CI.

pdhaliwal requested review of this revision.Apr 20 2021, 4:00 AM

Couple of minor points above. I think the increase in error reporting granularity will be helpful if this falls over in the field, as well as helping if we need a third try to get through PPC CI.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
202

Can we reasonably factor out this duplication?

clang/test/Driver/amdgpu-openmp-system-arch-fail.c
11

I'm not sure we can assume /bin/true exists everywhere, perhaps create a shell script on the fly that just exits?

pdhaliwal updated this revision to Diff 338843.Apr 20 2021, 6:04 AM
pdhaliwal marked 2 inline comments as done.

Review comments addressed.

JonChesterfield accepted this revision.Apr 20 2021, 6:36 AM

LG, thank you for the debugging, and for the more descriptive failure reporting

This revision is now accepted and ready to land.Apr 20 2021, 6:36 AM
This revision was landed with ongoing or failed builds.Apr 20 2021, 10:06 PM
This revision was automatically updated to reflect the committed changes.
JonChesterfield added a comment.EditedApr 21 2021, 6:47 AM

I was under the impression that #!/usr/bin/env sh is a sensible invocation for running a shell on various systems. The current theory for this struggling with the ppc buildbot is that fedora doesn't support that. Ad hoc searching suggests 'sh' is required to exist in /bin on posix-like systems, and the TestRunner.sh script under clang tests starts with #!/bin/sh

It's still kind of shotgun debugging, but we could change to that #! header across the tests and ensure they all end with a newline while we're at it.

edit: found a colleague with a red hat machine, which reports

line 4: return: can only `return' from a function or sourced script

pdhaliwal reopened this revision.Apr 21 2021, 7:41 AM
This revision is now accepted and ready to land.Apr 21 2021, 7:41 AM
pdhaliwal updated this revision to Diff 339235.Apr 21 2021, 7:41 AM

Replaced the return commands in test scripts with the exit command. It seems like
return is handled a bit differently on fedora/rhel machines.

pdhaliwal requested review of this revision.Apr 21 2021, 7:41 AM
JonChesterfield accepted this revision.Apr 21 2021, 7:43 AM
This revision is now accepted and ready to land.Apr 21 2021, 7:43 AM
This revision was landed with ongoing or failed builds.Apr 21 2021, 10:20 PM
This revision was automatically updated to reflect the committed changes.

This change broke multi-stage builds (even if AMDGPU is disabled as a target). The problem is that clang/tools/amdgpu-arch/AMDGPUArch.cpp can't find hsa.h. Is there a quick fix?

This change broke multi-stage builds (even if AMDGPU is disabled as a target). The problem is that clang/tools/amdgpu-arch/AMDGPUArch.cpp can't find hsa.h. Is there a quick fix?

That should not be the case. The cmake for building AMDGPUArch.cpp is guarded by

find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)
if (NOT ${hsa-runtime64_FOUND})
  return()
endif()

The intent is for this to be built only when hsa-runtime64 is found on disk, at which point hsa.h is meant to be found within that package via cmake magic. Perhaps we need to explicitly specify the include directory, though we do not do so in the openmp plugin this was copied from, and I do not have hsa.h on any global search path.

Little harm will be done by reverting this (again...) and reapplying, but it would be really helpful if you can share some more information about the build config that is failing.

This change broke multi-stage builds (even if AMDGPU is disabled as a target). The problem is that clang/tools/amdgpu-arch/AMDGPUArch.cpp can't find hsa.h. Is there a quick fix?

That should not be the case. The cmake for building AMDGPUArch.cpp is guarded by

find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)
if (NOT ${hsa-runtime64_FOUND})
  return()
endif()

The intent is for this to be built only when hsa-runtime64 is found on disk, at which point hsa.h is meant to be found within that package via cmake magic. Perhaps we need to explicitly specify the include directory, though we do not do so in the openmp plugin this was copied from, and I do not have hsa.h on any global search path.

Little harm will be done by reverting this (again...) and reapplying, but it would be really helpful if you can share some more information about the build config that is failing.

As it turns out, I do have the HSA runtime installed so perhaps the problem is that the header is actually hsa/hsa.h and not hsa.h?

And if it helps:

$ sudo find / -xdev -name hsa.h
/usr/include/hsa/hsa.h
/opt/rocm-3.10.0/include/hsa/hsa.h
/opt/rocm-4.0.0/include/hsa/hsa.h

JonChesterfield added a comment.EditedApr 22 2021, 11:31 AM

That's interesting. The various permutations I have on disk are also all path/include/hsa/hsa.h. The existing in tree use of hsa.h is the amdgpu plugin, which uses #include "hsa.h" and #include <hsa.h>, which seems unlikely to be correct.

I'm going to patch this on the fly to hsa/hsa.h as that looks very likely to be the fix.

edit: or not, the include path cmake summons on my local system is -isystem $HOME/llvm-install/include/hsa, so hsa/hsa doesn't work.

Tiresome. Will revert this for now.

edit: talked to some other people. Some follow up,

  • what do you have CMAKE_PREFIX_PATH set to? It seems that's the mechanism for picking from multiple rocm installs
  • is the /usr/include one a manual install? I'm paranoid about installing stuff under /, but maybe that's a common destination for it

That's interesting. The various permutations I have on disk are also all path/include/hsa/hsa.h. The existing in tree use of hsa.h is the amdgpu plugin, which uses #include "hsa.h" and #include <hsa.h>, which seems unlikely to be correct.

I'm going to patch this on the fly to hsa/hsa.h as that looks very likely to be the fix.

edit: or not, the include path cmake summons on my local system is -isystem $HOME/llvm-install/include/hsa, so hsa/hsa doesn't work.

Tiresome. Will revert this for now.

Ya, I bet. I also find this interesting because this fails my "three stage" test at the third stage which is rare. My stages:

  1. build and test llvm+clang (release without asserts) with the system compiler release
  2. build and test llvm+clang (release with asserts) with the just-built compiler
  3. build and test llvm+clang (release without assert) with compiler built in stage one
JonChesterfield added a comment.EditedApr 22 2021, 12:02 PM

It's not obvious to me why any of those stages would get a different result for the search for rocr. Do you do things with chroot/jails to ensure isolation for some of them?

edit: are the three builds from empty directories? cmake has some unusual caching properties, it might find rocr in one build, remember that in the filesystem, then assume it found it later when actually it wouldn't have

We could also do various soft-fail things. For example, if we find rocr, but can't build it for any reason, warn and continue with the rest of the build. Clang doesn't mind if the tool is missing.

It's not obvious to me why any of those stages would get a different result for the search for rocr. Do you do things with chroot/jails to ensure isolation for some of them?

Technically? Yes. Practically? No. My setup just remounts a temporary /tmp during the build to garbage collect stuff the build puts in /tmp. And the second/third stages use the clang built/installed from the first stage but otherwise it's not different.

When find_package can find rocr, but it sets up include paths that don't work with it, I think that constitutes a broken rocr install. At least, one that is difficult to use from cmake.

The packages are meant to deal with that sort of thing which suggests the /usr/include one. If you don't have a symlink in /opt/rocm then the HINTS above shouldn't be finding either of those, which also suggests the /usr/include. Can I interest you in moving / deleting that copy of hsa, to see if it's a machine-local quirk?

I removed all HSA/ROCM via the package manager and everything is fine now. Have you considered using __has_include() instead of CMake? It seems to be supported by all contemporary compilers, even MSVC.

JonChesterfield added a comment.EditedApr 22 2021, 3:32 PM

I removed all HSA/ROCM via the package manager and everything is fine now. Have you considered using __has_include() instead of CMake? It seems to be supported by all contemporary compilers, even MSVC.

To confirm, you can build the pre-revert version of this patch after that adjustment? If so I'll give this another try. (edit: misread, if all hsa is gone from your system, it won't try to build this tool at all, so you're good)

I have indeed considered something along the lines of

#if defined(__has_include)
#if __has_include("hsa.h")
#include "hsa.h"
#elif __has_include("hsa/hsa.h")
#include "hsa/hsa.h"
#endif
#else
#include "hsa.h"
#endif

It's tempting to forward declare the few pieces used instead. Hopefully the cmake find_package stuff will be robust enough to avoid such things.

It's still failing but I can disable HSA on this machine for now. FYI --

FAILED: tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o
/p/tllvm/bin/clang++ -DGTEST_HAS_RTTI=0 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/tools/amdgpu-arch -I/home/dave/ro_s/lp/clang/tools/amdgpu-arch -I/home/dave/ro_s/lp/clang/include -Itools/clang/include -Iinclude -I/home/dave/ro_s/lp/llvm/include -Werror=switch -Wno-deprecated-copy -stdlib=libc++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O2 -DNDEBUG  -march=skylake -fno-vectorize -fno-slp-vectorize -fno-tree-slp-vectorize -fno-exceptions -fno-rtti -std=c++14 -MD -MT tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o -MF tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o.d -o tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o -c /home/dave/ro_s/lp/clang/tools/amdgpu-arch/AMDGPUArch.cpp
/home/dave/ro_s/lp/clang/tools/amdgpu-arch/AMDGPUArch.cpp:14:10: fatal error: 'hsa.h' file not found
#include <hsa.h>
         ^~~~~~~
1 error generated.

None of those include paths look like the result of find_package locating a hsa so I'd guess this some quirk of cmake. It seems it is setting hsa-runtime64_FOUND but not doing the include path setup to match. I don't know how to debug that. @pdhaliwal?

JonChesterfield added a comment.EditedMay 5 2021, 10:52 AM

We may need more robust error handling around this. I'm seeing test failures when building clang,

clang-13: error: Cannot determine AMDGPU architecture: $HOME/llvm-build/llvm/./bin/amdgpu-arch: Execute failed: No such file or directory. Consider passing it via --march.

The executable exists, but when run directly it fails with error while loading shared libraries: libhsa-runtime64.so.1: cannot open shared object file: No such file or directory
It's built with RUNPATH = [$ORIGIN/../lib], which doesn't necessarily work from the build directory.

Two somewhat orthogonal things here I think.

  • skip running the corresponding clang tests when ./amdgpu-arch fails, whatever reason it fails for (which I thought was already the case, but can't find that in D99656)
  • be able to run from the build directory, which seems to be the default location for the tests (e.g. D101926)

Changing clang_target_link_libraries to target_link_libraries made no difference. There is some cmake handling available around rpath but I haven't been able to determine what it should be set to.

edit: becoming increasingly tempting to statically link rocr, but that presently fails in a different mode

I am investigating the find_package issue.

I could not find anything in the cmake files which could point to the issue mentioned here. @davezarzycki, are you on fedora/redhat?

Yes, Fedora 34 (x86-64).

JonChesterfield added a comment.EditedMay 6 2021, 6:28 PM

There is a potential hazard here for parallel compilation and to a lesser extent testing. (I have just learned that) kfd imposes a process limit which is low enough that we may see hsa_init fail under multiple processes.

@jdoerfert you talked me out of embedding a map from pci.ids to architecture as we'd have to keep it up to date, and it can diverge from how the runtime libraries identify the hardware. I'm starting to think that's the lesser of two evils. Would probably look something like

id = cat-somewhere-in-/sys
switch(id) {
default:
  return nullptr;
case 0x67C0:
case 0x67C1:
case 0x67C2:
  return "gfx803";
... // ~ 100 lines on this theme, picks up new entries when new hardware is released
}

where the table could go in this tool, or just in AMDGPU.cpp somewhere and wipe out all the failure modes from above

Greg was also interested in having the pci.ids table in amdgpu-arch. And keeping this table inside the target/amdgpu directory sounds like a good idea. Overall, I agree with not having a dependency on hsa, as it has caused many issues.

I have put up a patch D102067 which uses __has_include as a workaround for header not found issue. @davezarzycki can you check if this resolves the issue?