Page MenuHomePhabricator

[AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed
ClosedPublic

Authored by pdhaliwal on Apr 6 2021, 5:21 AM.

Details

Summary

This patch adds new clang tool named amdgpu-arch which uses
HSA to detect installed AMDGPU and report back latter's march.
This tool is built only if system has HSA installed.

The value printed by amdgpu-arch is used to fill -march when
latter is not explicitly provided in -Xopenmp-target.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
pdhaliwal marked an inline comment as done.Apr 9 2021, 2:01 AM

New tests look good to me, thanks.

clang/include/clang/Driver/Options.td
928

Path to tool used for detecting AMD GPU arch in the system.

?

clang/lib/Driver/ToolChains/AMDGPU.h
105

Comment out of date. Possibly delete sentence starting Location.

gregrodgers requested changes to this revision.Wed, Apr 14, 7:05 AM

I have two serious concerns with this tool . 1. It does not provide the infrastructure to identify runtime capabilities to satisfy requirements of a compiled image. 2. It is not architecturally neutral or extensible to other architectures. I have a different proposed tool called offload-arch . Here is the help text for offload-arch.

grodgers@r4:/tmp/grodgers/git/aomp13/llvm-project/clang/tools/offload-arch/build$ ./offload-arch -h

offload-arch: Print offload architecture(s) for the current active system.
              or lookup information about offload architectures

Usage:
  offload-arch [ Options ] [ Optional lookup-value ]

  With no options, offload-arch prints a value for first offload architecture
  found in current system.  This value can be used by various clang frontends.
  For example, to compile for openmp offloading on current current system
  one could invoke clang with the following command:

  clang -fopenmp -openmp-targets=`offload-arch` foo.c

  If an optional lookup-value is specified, offload-arch will
  check if the value is either a valid offload-arch or a codename
  and lookup requested additional information. For example,
  this provides all information for offload-arch gfx906:

  offload-arch gfx906 -v

  Options:
  -h  Print this help message
  -a  Print values for all devices. Don't stop at first device found.
  -c  Print codename
  -n  Print numeric pci-id
  -t  Print recommended offload triple.
  -v  Verbose = -a -c -n -t
  -l  Print long codename found in pci.ids file
  -r  Print capabilities of current system to satisfy runtime requirements
      of compiled offload images.  This option is used by the runtime to
      choose correct image when multiple compiled images are availble.

  The alias amdgpu-arch returns 1 if no amdgcn GPU is found.
  The alias nvidia-arch returns 1 if no cuda GPU is found.
  These aliases are useful to determine if architecture-specific tests
  should be run. Or these aliases could be used to conditionally load
  archecture-specific software.

Copyright (c) 2021 ADVANCED MICRO DEVICES, INC.
This revision now requires changes to proceed.Wed, Apr 14, 7:05 AM
  1. It does not provide the infrastructure to identify runtime capabilities to satisfy requirements of a compiled image.

I believe we only require a value for '-march=' to unblock running tests on CI machines. I'd guess you're referring to target id stuff where clang fills in reasonable defaults already.

  1. It is not architecturally neutral or extensible to other architectures.

Yes. By design. Clang calling into some unrelated tool for nvptx seems fine, nvidia-smi may already suffice for that purpose.

If we want to do something clever about guessing whether the machine has an nvidia card or an amdgpu one, clang can call both this tool and the nvidia one, and check both return codes.

Dependence on hsa is not necessary. The amdgpu and nvidia drivers both use PCI codes available in /sys . We should use architecture independent methods as much as possible.

Dependence on hsa is not necessary. The amdgpu and nvidia drivers both use PCI codes available in /sys . We should use architecture independent methods as much as possible.

I may see the disagreement.

  • You are describing, and have implemented, a tool for querying the current system to discover what hardware it has available.
  • This patch is a tool for querying whether to run tests on the current system, and if so, what -march should make those tests succeed.

Those aren't quite the same thing. In particular, finding a recognised gpu on this system is necessary but insufficient to execute on it.
The real condition we require for the amdgpu tests is 'does the hsa on the current system recognise the gpu', and calling into hsa is a great way to establish that.

A tool for introspection would ideally be dependency free. In the best case, it's just a function in clang, and we don't need any of the subprocess file io cruft.

A tool for guessing whether the in tree runtime tests should be attempted on the current system would ideally be this one, to exactly match when hsa can run them.

Rebase and review comments

pdhaliwal marked 2 inline comments as done.Wed, Apr 14, 11:39 PM
JonChesterfield accepted this revision.Thu, Apr 15, 1:50 AM

I've built this, checked it behaves as expected, checked clang does something reasonable when the executable is missing. All looks good to me, explicitly accepting.

We may need an internal call with Greg to work out how to unblock this given his objections above.

gregrodgers added inline comments.Thu, Apr 15, 5:19 AM
clang/tools/amdgpu-arch/CMakeLists.txt
10

What happens when /opt/rocm is not available? Again, we need a cross-architecture mechanism to identify the offload-arch.

clang/tools/amdgpu-arch/CMakeLists.txt
10

Exactly the same as the amdgpu plugin. The cmake detection is char for char identical. This will look in CMAKE_INSTALL_PREFIX, which is where I install these libs when using trunk, and falls back to /opt/rocm which seems to be convenient for some users.

clang/tools/amdgpu-arch/CMakeLists.txt
10

Which may need revising at some point - I like installing hsa as if it was an llvm subcomponent, but other people might want a different convention. As long as we remember to change this file + amdgpu's cmake at the same time, all good.

gregrodgers accepted this revision.Thu, Apr 15, 6:00 AM

I am removing my objection with the understanding that we will either replace or enhance amdgpu-arch with the cross-architecture tool offload-arch as described in my comments above. Also, this patch changes amd toolchains to use amdgpu-arch. So it will not interfere with how I expect the cross-architecture tool to be used to greatly simplify openmp command line arguments.

This revision is now accepted and ready to land.Thu, Apr 15, 6:00 AM
t-tye added inline comments.Thu, Apr 15, 8:54 AM
clang/tools/amdgpu-arch/CMakeLists.txt
10

/opt/tocm will not work with the side-by-side ROCm installation which installs ROCm in directories with the version number. Should there be the ability to configure this?

clang/tools/amdgpu-arch/CMakeLists.txt
10

If /opt/rocm isn't a safe out of the box option for finding rocm, let's remove that from here and the amdgpu plugin. Pre-existing hazard so doesn't need to block this patch.

I'm installing roct and rocr to the same CMAKE_INSTALL_PREFIX as llvm which is why the first clause works out.

How are the rocm components meant to find the corresponding pieces? If there's a rocm cmake install dir variable we could add that to the hints.

This revision was landed with ongoing or failed builds.Thu, Apr 15, 10:26 PM
This revision was automatically updated to reflect the committed changes.

I've reverted this from main for now as there seems to be issue with executing test script on some CI machines.

The failing log pointed at "-mllvm" "--amdhsa-code-object-version=4", which is a hard error if the amdgpu triple is missing from the llvm. I see the test features amdgpu-registered-target. Perhaps that does not behave as one might wish?

I'd suggest rebuilding locally without the amdgpu triple enabled and see if that fails comparably, and if so, it's another argument for fixing D98746. Until the front end can run without amdgpu triple built for the middle end, I don't see how we can have any front end tests for amdgpu.

pdhaliwal reopened this revision.Tue, Apr 20, 3:59 AM
This revision is now accepted and ready to land.Tue, Apr 20, 3:59 AM
pdhaliwal updated this revision to Diff 338810.Tue, Apr 20, 3:59 AM

Reopening this. This version is supposed to fix the buildbot failures on PPC machines.
Since I don't have PPC machine I am not sure if this will work. But the logic
followed here is motivated from Clang :: Driver/program-path-priority.c, so hopefully
it will pass the CI.

pdhaliwal requested review of this revision.Tue, Apr 20, 4:00 AM

Couple of minor points above. I think the increase in error reporting granularity will be helpful if this falls over in the field, as well as helping if we need a third try to get through PPC CI

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
212

Can we reasonably factor out this duplication?

clang/test/Driver/amdgpu-openmp-system-arch-fail.c
11

I'm not sure we can assume /bin/true exists everywhere, perhaps create a shell script on the fly that just exits?

pdhaliwal updated this revision to Diff 338843.Tue, Apr 20, 6:04 AM
pdhaliwal marked 2 inline comments as done.

Review comments addressed.

JonChesterfield accepted this revision.Tue, Apr 20, 6:36 AM

LG, thank you for the debugging, and for the more descriptive failure reporting

This revision is now accepted and ready to land.Tue, Apr 20, 6:36 AM
This revision was landed with ongoing or failed builds.Tue, Apr 20, 10:06 PM
This revision was automatically updated to reflect the committed changes.
JonChesterfield added a comment.EditedWed, Apr 21, 6:47 AM

I was under the impression that #!/usr/bin/env sh is a sensible invocation for running a shell on various systems. The current theory for this struggling with the ppc buildbot is that fedora doesn't support that. Ad hoc searching suggests 'sh' is required to exist in /bin on posix-like systems, and the TestRunner.sh script under clang tests starts with #!/bin/sh

It's still kind of shotgun debugging, but we could change to that #! header across the tests and ensure they all end with a newline while we're at it.

edit: found a colleague with a red hat machine, which reports

line 4: return: can only `return' from a function or sourced script

pdhaliwal reopened this revision.Wed, Apr 21, 7:41 AM
This revision is now accepted and ready to land.Wed, Apr 21, 7:41 AM
pdhaliwal updated this revision to Diff 339235.Wed, Apr 21, 7:41 AM

Replaced the return commands in test scripts with exit command. It seems like
return is handled bit differently on fedora/rhel machines.

pdhaliwal requested review of this revision.Wed, Apr 21, 7:41 AM
JonChesterfield accepted this revision.Wed, Apr 21, 7:43 AM
This revision is now accepted and ready to land.Wed, Apr 21, 7:43 AM
This revision was landed with ongoing or failed builds.Wed, Apr 21, 10:20 PM
This revision was automatically updated to reflect the committed changes.

This change broke multi-stage builds (even if AMDGPU is disabled as a target). The problem is that clang/tools/amdgpu-arch/AMDGPUArch.cpp can't find hsa.h. Is there a quick fix?

This change broke multi-stage builds (even if AMDGPU is disabled as a target). The problem is that clang/tools/amdgpu-arch/AMDGPUArch.cpp can't find hsa.h. Is there a quick fix?

That should not be the case. The cmake for building AMDGPUArch.cpp is guarded by

find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)
if (NOT ${hsa-runtime64_FOUND})
    return

The intent is for this to be built only when hsa-runtime64 is found on disk, at which point hsa.h is meant to be found within that package via cmake magic. Perhaps we need to explicitly specify the include directory, though we do not do so in the openmp plugin this was copied from, and I do not have hsa.h on any global search path.

Little harm will be done by reverting this (again...) and reapplying, but it would be really helpful if you can share some more information about the build config that is failing.

This change broke multi-stage builds (even if AMDGPU is disabled as a target). The problem is that clang/tools/amdgpu-arch/AMDGPUArch.cpp can't find hsa.h. Is there a quick fix?

That should not be the case. The cmake for building AMDGPUArch.cpp is guarded by

find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)
if (NOT ${hsa-runtime64_FOUND})
    return

The intent is for this to be built only when hsa-runtime64 is found on disk, at which point hsa.h is meant to be found within that package via cmake magic. Perhaps we need to explicitly specify the include directory, though we do not do so in the openmp plugin this was copied from, and I do not have hsa.h on any global search path.

Little harm will be done by reverting this (again...) and reapplying, but it would be really helpful if you can share some more information about the build config that is failing.

As it turns out, I do have the HSA runtime installed so perhaps the problem is that the header is actually hsa/hsa.h and not hsa.h?

And if it helps:

$ sudo find / -xdev -name hsa.h
/usr/include/hsa/hsa.h
/opt/rocm-3.10.0/include/hsa/hsa.h
/opt/rocm-4.0.0/include/hsa/hsa.h

JonChesterfield added a comment.EditedThu, Apr 22, 11:31 AM

That's interesting. The various permutations I have on disk are also all path/include/hsa/hsa.h. The existing in tree use of hsa.h is the amdgpu plugin, which uses #include "hsa.h" and #include <hsa.h>, which seems unlikely to be correct.

I'm going to patch this on the fly to hsa/hsa.h as that looks very likely to be the fix.

edit: or not, the include path cmake summons on my local system is -isystem $HOME/llvm-install/include/hsa, so hsa/hsa doesn't work.

Tiresome. Will revert this for now.

edit: talked to some other people. Some follow up,

  • what do you have CMAKE_PREFIX_PATH set to? It seems that's the mechanism for picking from multiple rocm installs
  • is the /usr/include one a manual install? I'm paranoid about installing stuff under /, but maybe that's a common destination for it

That's interesting. The various permutations I have on disk are also all path/include/hsa/hsa.h. The existing in tree use of hsa.h is the amdgpu plugin, which uses #include "hsa.h" and #include <hsa.h>, which seems unlikely to be correct.

I'm going to patch this on the fly to hsa/hsa.h as that looks very likely to be the fix.

edit: or not, the include path cmake summons on my local system is -isystem $HOME/llvm-install/include/hsa, so hsa/hsa doesn't work.

Tiresome. Will revert this for now.

Ya, I bet. I also find this interesting because this fails my "three stage" test at the third stage which is rare. My stages:

  1. build and test llvm+clang (release without asserts) with the system compiler release
  2. build and test llvm+clang (release with asserts) with the just build compiler
  3. build and test llvm+clang (release without assert) with compiler built in stage one
JonChesterfield added a comment.EditedThu, Apr 22, 12:02 PM

It's not obvious to me why any of those stages would get a different result for the search for rocr. Do you do things with chroot/jails to ensure isolation for some of them?

edit: are the three builds from empty directories? cmake has some unusual caching properties, it might find rocr in one build, remember that in the filesystem, then assume it found it later when actually it wouldn't have

We could also do various soft-fail things. For example, if we find rocr, but can't build it for any reason, warn and continue with the rest of the build. Clang doesn't mind if the tool is missing.

It's not obvious to me why any of those stages would get a different result for the search for rocr. Do you do things with chroot/jails to ensure isolation for some of them?

Technically? Yes. Practically? No. My setup just remounts a temporary /tmp during the build to garbage collect stuff the build puts in /tmp. And the second/third stages use the clang built/installed from the first stage but otherwise it's not different.

When find_package can find rocr, but it sets up include paths that don't work with it, I think that constitutes a broken rocr install. At least, one that is difficult to use from cmake.

The packages are meant to deal with that sort of thing which suggests the /usr/include one. If you don't have a symlink in /opt/rocm then the HINTS above shouldn't be finding either of those, which also suggests the /usr/include. Can I interest you in moving / deleting that copy of hsa, so see if it's a machine-local quirk?

I removed all HSA/ROCM via the package manager and everything is fine now. Have you considered using __has_include() instead of CMake? It seems to be supported by all contemporary compilers, even MSVC.

JonChesterfield added a comment.EditedThu, Apr 22, 3:32 PM

I removed all HSA/ROCM via the package manager and everything is fine now. Have you considered using __has_include() instead of CMake? It seems to be supported by all contemporary compilers, even MSVC.

To confirm, you can build the pre-revert version of this patch after that adjustment? If so I'll give this another try. (edit: misread, if all hsa is gone from your system, it won't try to build this tool at all, so you're good)

I have indeed considered something along the lines of

#if __has_include
#if __has_include("hsa.h")
#include "hsa.h"
#elif __has_include("hsa/hsa.h")
#include "hsa/hsa.h"
#endif
#else
#include "hsa.h"
#endif

It's tempting to forward declare the few pieces used instead. Hopefully the cmake find_package stuff will be robust enough to avoid such things.

It's still failing but I can disable HSA on this machine for now. FYI --

FAILED: tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o
/p/tllvm/bin/clang++ -DGTEST_HAS_RTTI=0 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/tools/amdgpu-arch -I/home/dave/ro_s/lp/clang/tools/amdgpu-arch -I/home/dave/ro_s/lp/clang/include -Itools/clang/include -Iinclude -I/home/dave/ro_s/lp/llvm/include -Werror=switch -Wno-deprecated-copy -stdlib=libc++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O2 -DNDEBUG  -march=skylake -fno-vectorize -fno-slp-vectorize -fno-tree-slp-vectorize -fno-exceptions -fno-rtti -std=c++14 -MD -MT tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o -MF tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o.d -o tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o -c /home/dave/ro_s/lp/clang/tools/amdgpu-arch/AMDGPUArch.cpp
/home/dave/ro_s/lp/clang/tools/amdgpu-arch/AMDGPUArch.cpp:14:10: fatal error: 'hsa.h' file not found
#include <hsa.h>
         ^~~~~~~
1 error generated.

None of those include paths look like the result of find_package locating a hsa so I'd guess this some quirk of cmake. It seems it is setting hsa-runtime64_FOUND but not doing the include path setup to match. I don't know how to debug that. @pdhaliwal?

JonChesterfield added a comment.EditedWed, May 5, 10:52 AM

We may need more robust error handling around this. I'm seeing test failures when building clang,

clang-13: error: Cannot determine AMDGPU architecture: $HOME/llvm-build/llvm/./bin/amdgpu-arch: Execute failed: No such file or directory. Consider passing it via --march.

The executable exists, but when run directly it fails with error while loading shared libraries: libhsa-runtime64.so.1: cannot open shared object file: No such file or directory
It's built with RUNPATH = [$ORIGIN/../lib], which doesn't necessarily work from the build directory.

Two somewhat orthogonal things here I think.

  • skip running the corresponding clang tests when ./amdgpu-arch fails, whatever reason it fails for (which I thought was already the case, but can't find that in D99656)
  • be able to run from the build directory, which seems to be the default location for the tests (e.g. D101926)

Changing clang_target_link_libraries to target_link_libraries made no difference. There is some cmake handling available around rpath but I haven't been able to determine what it should be set to.

edit: becoming increasingly tempting to statically link rocr, but that presently fails in a different mode

I am investigating the find_package issue.

I could not find anything in the cmake files which could point to the issue mentioned here. @davezarzycki, are you on fedora/redhat?

Yes, Fedora 34 (x86-64).

JonChesterfield added a comment.EditedThu, May 6, 6:28 PM

There is a potential hazard here for parallel compilation and to a lesser extent testing. (I have just learned that) kfd imposes a process limit which is low enough that we may see hsa_init fail under multiple processes.

@jdoerfert you talked me out of embedding a map from pci.ids to architecture as we'd have to keep it up to date, and it can diverge from how the runtime libraries identify the hardware. I'm starting to think that's the lesser of two evils. Would probably look something like

id = cat-somewhere-in-/sys
switch(id) {
default:
  return nullptr;
case 0x67C0:
case 0x67C1:
case 0x67C2:
  return "gfx803";
... // ~ 100 lines on this theme, picks up new entries when new hardware is released
}

where the table could go in this tool, or just in AMDGPU.cpp somewhere and wipe out all the failure modes from above

Greg was also interested in having pci ids table in amdgpu-arch. And, keeping this table inside the target/amdgpu directory sounds like a good idea. Overall, I agree with not having dependency on hsa as it has caused many issues.

I have put up a patch D102067 which uses __has_include as a workaround for header not found issue. @davezarzycki can you check if this resolves the issue?