This is an archive of the discontinued LLVM Phabricator instance.

Differential D99949

[AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed
Closed, Public

Authored by pdhaliwal on Apr 6 2021, 5:21 AM.

Details

Reviewers

JonChesterfield
ronlieb
jdoerfert
ABataev
gregrodgers
yaxunl

Commits

rG722d4d8e7585: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed
rG3194761d2763: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed
rG7029cffc4e78: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed

Summary

This patch adds a new clang tool named amdgpu-arch, which uses
HSA to detect the installed AMD GPUs and report their -march values.
The tool is built only if HSA is installed on the system.

The value printed by amdgpu-arch is used to fill in -march when
it is not explicitly provided via -Xopenmp-target.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pdhaliwal created this revision. Apr 6 2021, 5:21 AM

Herald added subscribers: kerbowa, guansong, t-tye and 7 others. Apr 6 2021, 5:21 AM

pdhaliwal requested review of this revision. Apr 6 2021, 5:21 AM

Herald added a project: Restricted Project. Apr 6 2021, 5:21 AM

Herald added subscribers: cfe-commits, sstefan1, wdng.

Working on tests.

This change is partly motivated by wanting to check in runtime tests for openmp that execute on whatever hardware is available locally. It is functionally similar to an out of tree bash script called mygpu that contains manually curated tables of pci.ids and to a python script called rocm_agent_enumerator that calls a c++ tool called rocminfo and tries to parse the output, with a different table of pci.ids for when that fails.

Ultimately, the bottom of this stack is a map from pci.id to gfx number stored in the user space driver thunk library, roct. That is linked into hsa. It would be simpler programming to copy&paste that map from roct into the openmp clang driver at the cost of inevitable divergence between the architecture clang detects and the architecture the runtime libraries detect. Spawning a process and reading stdout is a bit messy, but it beats copying the table, and it beats linking the gpu driver into clang in order to get at the table of numbers. This seems the right balance to me.

It should be possible to do something similar with cuda for nvptx, but that should be a separate executable. Partly to handle the permutations of cuda / hsa that may be present on a system. I haven't found the corresponding API call in cuda. The standalone tool nvidia-smi might be willing to emit sm_70 or similar to stdout, but I haven't found the right command line flags for that either. Rocminfo does not appear to be configurable, and is not necessarily present when compiling for amdgpu.

A bunch of comments inline, mostly style. I think there's a use-after-free bug.

It looks like our existing command line handling could be more robust. In particular, there should be error messages about march where there seem to be asserts. That makes it slightly unclear how we should handle things like the helper tool returning a string clang doesn't recognise (e.g. doesn't start with gfx). Something to revise separate to this patch I think.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
75	AMDGPU_ARCH_PROGRAM_NAME?
80	This looks like there are too many stringrefs. Redirecting stdout to a temporary file seems reasonable, but I'd expect the pointers into OutputBuf to be invalid when it drops out of scope. Perhaps return a smallvector of smallstrings instead? Also, we're probably expecting fewer than 8 different gpus, probably as few as 1 in the most common case, so maybe a smallvector<type,1>
87	`s/const int RC =//`
88	can we pass {} for execArgs here?
108	Perhaps run all_of against the whole range and drop if size () > 1 test?
167	We shouldn't be handling unknown or missing march= fields with asserts. I see that this is already the case in multiple places, so let's go with a matching assert for this and aspire to fix that in a separate patch.
clang/tools/amdgpu-arch/AMDGPUArch.cpp
33	Simpler code if we drop the class and pass in the vector<string> itself as the void*
35	Does this null terminate for any length of GPU name? Wondering if we should explicitly zero out the last char.
46	Unsure these should be writing to stderr. We capture stdout, stderr probably goes to the user. We could exit 1 instead as clang is going to treat any failure to guess the arch identically
clang/tools/amdgpu-arch/CMakeLists.txt
12	Assuming this matches the logic in amdgpu openmp plugin
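
The lifetime concern raised for line 80 of AMDGPUOpenMP.cpp can be illustrated with standard-library types standing in for LLVM's StringRef and SmallVector. This is a minimal sketch, not the code in the patch: splitLines and readDetectedArchs are hypothetical names, and the hard-coded buffer stands in for the temporary file that captures the tool's stdout. The point is that returning owning strings keeps the result valid after the local buffer is destroyed, which views into that buffer would not.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not the patch's code: split the captured stdout of the
// detection tool into one owning std::string per non-empty line.
std::vector<std::string> splitLines(const std::string &buffer) {
  std::vector<std::string> lines;
  std::istringstream stream(buffer);
  std::string line;
  while (std::getline(stream, line))
    if (!line.empty())
      lines.push_back(line); // copies the characters out of the buffer
  return lines;
}

// Stand-in for "read the temporary file the tool's stdout was redirected to".
std::vector<std::string> readDetectedArchs() {
  std::string outputBuf = "gfx906\ngfx906\n"; // local buffer, destroyed on return
  // Safe: the returned vector owns its strings. Returning views (pointers)
  // into outputBuf instead would leave them dangling once it goes out of scope.
  return splitLines(outputBuf);
}
```

With LLVM types the same shape would be a small vector of small strings, as the inline comment suggests.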

Harbormaster completed remote builds in B97290: Diff 335492. Apr 6 2021, 6:43 AM

yaxunl added inline comments. Apr 6 2021, 6:52 AM

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
102	This function is useful for AMDGPU toolchain and HIP toolchain. Can it be a member of AMDGPU toolchain?
clang/tools/amdgpu-arch/CMakeLists.txt
10	This tool does not use omp header file. Why is this needed?

Addressed review comments.

RE test: Since the tool's output depends on the result of an HSA API call, I don't see how to add a test
that always passes on systems with different AMD GPUs and is always skipped on systems
without an AMD GPU. I welcome suggestions on how to resolve this.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
167	Matched this one with below.
clang/tools/amdgpu-arch/AMDGPUArch.cpp
35	Checked the rocr-runtime, the output is null terminated.
46	Remove fprintf

pdhaliwal added a reviewer: yaxunl. Apr 7 2021, 2:26 AM

Harbormaster completed remote builds in B97476: Diff 335761. Apr 7 2021, 2:56 AM

LGTM about AMDGPU toolchain change. Thanks.

I'm happy with this as-is. @jdoerfert is this close enough to what you expected when we discussed this offline?

Adding it directly to AMDGPU.h was a good suggestion. Makes it easy for other amdgpu language drivers to pick it up.

That it's a binary on disk also means that more DIY build systems, e.g. freestanding C++ stuff, can call it directly, in -march=$(./amdgpu-arch) fashion.

clang/tools/amdgpu-arch/AMDGPUArch.cpp
52	This strategy looks good. Briefly considered whether we should print while iterating, but that will misbehave if there is an error after printing the first gpu. unsigned is strictly the wrong type here, though it doesn't matter in practice. Could go with a range based for loop instead.
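
The two suggestions for AMDGPUArch.cpp (pass the result vector itself as the void* user data, and collect all names before printing any) can be sketched as follows. Everything here is an assumption made for illustration: enumerateDevices is a stand-in for a C-style iteration API and is not the HSA API, and the failMidway flag exists only to simulate an error after the first device has been reported.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a C-style enumeration API that reports each device through a
// callback plus an opaque user-data pointer. A non-zero return means failure.
using DeviceCallback = int (*)(const char *name, void *userData);

int enumerateDevices(const std::vector<const char *> &devices, bool failMidway,
                     DeviceCallback callback, void *userData) {
  for (std::size_t i = 0; i < devices.size(); ++i) {
    if (failMidway && i == 1)
      return 1; // simulate an error after the first device was reported
    if (int status = callback(devices[i], userData))
      return status;
  }
  return 0;
}

// The user data is the result vector itself, so no wrapper class is needed.
int collectName(const char *name, void *userData) {
  assert(userData != nullptr);
  static_cast<std::vector<std::string> *>(userData)->emplace_back(name);
  return 0;
}

// Collect first, render afterwards: `out` stays empty unless the whole
// enumeration succeeded, so a caller never sees a partial list.
int listDevices(const std::vector<const char *> &devices, bool failMidway,
                std::string &out) {
  out.clear();
  std::vector<std::string> names;
  if (enumerateDevices(devices, failMidway, collectName, &names) != 0)
    return 1;
  for (const std::string &name : names) // range-based loop, no index type to pick
    out += name + "\n";
  return 0;
}
```

A tool built this way would print out and return 0 on success, and print nothing and return 1 on any failure.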

no strong opinion rn, I would add tests though

We'll have slightly indirect testing once this is used to enable D99656. There are two pieces that can be tested:

1/ The clang handling. That we can test with a minor change. Add a command line argument to clang that specifies the name of the tool to call to get the architecture (and defaults to amdgpu-arch). Then write N tests that call N different shell scripts that return the various cases. The plumbing for that argument may prove useful when adding a corresponding nvptx tool.

2/ The call into hsa. That probably means stubbing out hsa. Kind of interested in that - a hsa library that can be jury rigged to fail various calls is a path to a robust amdgpu plugin. Way more complexity in that fake stub than in this code though. I don't think that's worthwhile here.

clang/tools/amdgpu-arch/AMDGPUArch.cpp
43	missed one, return 1 on failure to initialize

Addressed review comments
Added LIT test case

Herald added subscribers: jansvoboda11, dang. Apr 8 2021, 2:13 AM

Harbormaster completed remote builds in B97674: Diff 336037. Apr 8 2021, 2:47 AM

JonChesterfield added inline comments. Apr 8 2021, 6:28 AM

clang/include/clang/Driver/Options.td
927	I'd expect path to be the directory in which the tool is found, whereas this is used to name the tool itself. Perhaps 'amdgpu_arch_tool='? We might be able to write the default string inline as part of the argument definition, otherwise the current handling looks ok
clang/test/Driver/amdgpu-openmp-system-arch.c
2	This seems fairly clear. We might want to drop the fcuda-is-device from the regex as it doesn't matter for the test. Failing cases (maybe driven by a separate C file) are probably: print nothing and return 0; print nothing and return 1; print two different strings and return 0.
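
The cases listed in this comment amount to a small decision function on the driver side. The sketch below is hypothetical (pickArch and its parameters are illustrative names, not taken from the patch) and assumes the tool's stdout has already been split into one string per line. It also uses all_of over the whole range, as suggested earlier in the review, so no separate size() > 1 check is needed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch: decide whether the tool's result yields a usable
// -march value. Returns true and sets `arch` on success; returns false when
// detection should be reported as a failure.
bool pickArch(int exitCode, const std::vector<std::string> &printed,
              std::string &arch) {
  if (exitCode != 0)
    return false; // tool failed, whatever it printed
  if (printed.empty())
    return false; // tool succeeded but reported no GPU
  // all_of over the whole range also covers the single-GPU case.
  const bool allSame = std::all_of(
      printed.begin(), printed.end(),
      [&](const std::string &candidate) { return candidate == printed.front(); });
  if (!allSame)
    return false; // different GPUs: no single -march fits
  arch = printed.front();
  return true;
}
```

Each failing case from the comment maps to one early return, which makes it straightforward to cover them with one stub script per case.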

Added tests for the failing cases

Fix permissions

pdhaliwal marked an inline comment as done. Apr 9 2021, 2:01 AM

Harbormaster completed remote builds in B97907: Diff 336351. Apr 9 2021, 2:45 AM

Harbormaster completed remote builds in B97909: Diff 336353. Apr 9 2021, 3:07 AM

New tests look good to me, thanks.

clang/include/clang/Driver/Options.td
928	Path to tool used for detecting AMD GPU arch in the system. ?
clang/lib/Driver/ToolChains/AMDGPU.h
105	Comment out of date. Possibly delete sentence starting Location.

I have two serious concerns with this tool. 1. It does not provide the infrastructure to identify runtime capabilities to satisfy requirements of a compiled image. 2. It is not architecturally neutral or extensible to other architectures. I have a different proposed tool called offload-arch. Here is the help text for offload-arch.

grodgers@r4:/tmp/grodgers/git/aomp13/llvm-project/clang/tools/offload-arch/build$ ./offload-arch -h

offload-arch: Print offload architecture(s) for the current active system.
              or lookup information about offload architectures

Usage:
  offload-arch [ Options ] [ Optional lookup-value ]

  With no options, offload-arch prints a value for first offload architecture
  found in current system.  This value can be used by various clang frontends.
  For example, to compile for openmp offloading on current current system
  one could invoke clang with the following command:

  clang -fopenmp -openmp-targets=`offload-arch` foo.c

  If an optional lookup-value is specified, offload-arch will
  check if the value is either a valid offload-arch or a codename
  and lookup requested additional information. For example,
  this provides all information for offload-arch gfx906:

  offload-arch gfx906 -v

  Options:
  -h  Print this help message
  -a  Print values for all devices. Don't stop at first device found.
  -c  Print codename
  -n  Print numeric pci-id
  -t  Print recommended offload triple.
  -v  Verbose = -a -c -n -t
  -l  Print long codename found in pci.ids file
  -r  Print capabilities of current system to satisfy runtime requirements
      of compiled offload images.  This option is used by the runtime to
      choose correct image when multiple compiled images are availble.

  The alias amdgpu-arch returns 1 if no amdgcn GPU is found.
  The alias nvidia-arch returns 1 if no cuda GPU is found.
  These aliases are useful to determine if architecture-specific tests
  should be run. Or these aliases could be used to conditionally load
  archecture-specific software.

Copyright (c) 2021 ADVANCED MICRO DEVICES, INC.

This revision now requires changes to proceed. Apr 14 2021, 7:05 AM

In D99949#2688869, @gregrodgers wrote:

It does not provide the infrastructure to identify runtime capabilities to satisfy requirements of a compiled image.

I believe we only require a value for '-march=' to unblock running tests on CI machines. I'd guess you're referring to target id stuff where clang fills in reasonable defaults already.

In D99949#2688869, @gregrodgers wrote:

It is not architecturally neutral or extensible to other architectures.

Yes. By design. Clang calling into some unrelated tool for nvptx seems fine, nvidia-smi may already suffice for that purpose.

If we want to do something clever about guessing whether the machine has an nvidia card or an amdgpu one, clang can call both this tool and the nvidia one, and check both return codes.

Dependence on hsa is not necessary. The amdgpu and nvidia drivers both use PCI codes available in /sys . We should use architecture independent methods as much as possible.

In D99949#2689583, @gregrodgers wrote:

Dependence on hsa is not necessary. The amdgpu and nvidia drivers both use PCI codes available in /sys . We should use architecture independent methods as much as possible.

I may see the disagreement.

You are describing, and have implemented, a tool for querying the current system to discover what hardware it has available.
This patch is a tool for querying whether to run tests on the current system, and if so, what -march should make those tests succeed.

Those aren't quite the same thing. In particular, finding a recognised gpu on this system is necessary but insufficient to execute on it.
The real condition we require for the amdgpu tests is 'does the hsa on the current system recognise the gpu', and calling into hsa is a great way to establish that.

A tool for introspection would ideally be dependency free. In the best case, it's just a function in clang, and we don't need any of the subprocess file io cruft.

A tool for guessing whether the in tree runtime tests should be attempted on the current system would ideally be this one, to exactly match when hsa can run them.

Rebase and review comments

pdhaliwal marked 2 inline comments as done. Apr 14 2021, 11:39 PM

Harbormaster completed remote builds in B98821: Diff 337640. Apr 15 2021, 12:17 AM

I've built this, checked it behaves as expected, checked clang does something reasonable when the executable is missing. All looks good to me, explicitly accepting.

We may need an internal call with Greg to work out how to unblock this given his objections above.

gregrodgers added inline comments. Apr 15 2021, 5:19 AM

clang/tools/amdgpu-arch/CMakeLists.txt
10	What happens when /opt/rocm is not available? Again, we need a cross-architecture mechanism to identify the offload-arch.

JonChesterfield added inline comments. Apr 15 2021, 5:31 AM

clang/tools/amdgpu-arch/CMakeLists.txt
10	Exactly the same as the amdgpu plugin. The cmake detection is char for char identical. This will look in CMAKE_INSTALL_PREFIX, which is where I install these libs when using trunk, and falls back to /opt/rocm which seems to be convenient for some users.

JonChesterfield added inline comments. Apr 15 2021, 5:55 AM

clang/tools/amdgpu-arch/CMakeLists.txt
10	Which may need revising at some point - I like installing hsa as if it was an llvm subcomponent, but other people might want a different convention. As long as we remember to change this file + amdgpu's cmake at the same time, all good.

I am removing my objection with the understanding that we will either replace or enhance amdgpu-arch with the cross-architecture tool offload-arch as described in my comments above. Also, this patch changes amd toolchains to use amdgpu-arch. So it will not interfere with how I expect the cross-architecture tool to be used to greatly simplify openmp command line arguments.

This revision is now accepted and ready to land. Apr 15 2021, 6:00 AM

t-tye added inline comments. Apr 15 2021, 8:54 AM

clang/tools/amdgpu-arch/CMakeLists.txt
10	/opt/rocm will not work with the side-by-side ROCm installation, which installs ROCm in directories with the version number. Should there be the ability to configure this?

JonChesterfield added inline comments. Apr 15 2021, 9:16 AM

clang/tools/amdgpu-arch/CMakeLists.txt
10	If /opt/rocm isn't a safe out of the box option for finding rocm, let's remove that from here and the amdgpu plugin. Pre-existing hazard so doesn't need to block this patch. I'm installing roct and rocr to the same CMAKE_INSTALL_PREFIX as llvm which is why the first clause works out. How are the rocm components meant to find the corresponding pieces? If there's a rocm cmake install dir variable we could add that to the hints.

This revision was landed with ongoing or failed builds. Apr 15 2021, 10:26 PM

Closed by commit rG7029cffc4e78: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed (authored by pdhaliwal).

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rG7029cffc4e78: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.

pdhaliwal added a reverting change: rGefc013ec4d95: Revert "[AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed". Apr 16 2021, 2:17 AM

I've reverted this from main for now, as there seems to be an issue with executing the test script on some CI machines.

The failing log pointed at "-mllvm" "--amdhsa-code-object-version=4", which is a hard error if the amdgpu triple is missing from the llvm. I see the test features amdgpu-registered-target. Perhaps that does not behave as one might wish?

I'd suggest rebuilding locally without the amdgpu triple enabled and see if that fails comparably, and if so, it's another argument for fixing D98746. Until the front end can run without amdgpu triple built for the middle end, I don't see how we can have any front end tests for amdgpu.

pdhaliwal reopened this revision. Apr 20 2021, 3:59 AM

This revision is now accepted and ready to land. Apr 20 2021, 3:59 AM

Reopening this. This version is supposed to fix the buildbot failures on PPC machines.
Since I don't have a PPC machine, I am not sure whether this will work. But the logic
followed here is modeled on Clang :: Driver/program-path-priority.c, so hopefully
it will pass the CI.

pdhaliwal requested review of this revision. Apr 20 2021, 4:00 AM

Couple of minor points above. I think the increase in error reporting granularity will be helpful if this falls over in the field, as well as helping if we need a third try to get through PPC CI

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
212	Can we reasonably factor out this duplication?
clang/test/Driver/amdgpu-openmp-system-arch-fail.c
11	I'm not sure we can assume /bin/true exists everywhere, perhaps create a shell script on the fly that just exits?

Harbormaster completed remote builds in B99681: Diff 338810. Apr 20 2021, 5:05 AM

Review comments addressed.

LG, thank you for the debugging, and for the more descriptive failure reporting

This revision is now accepted and ready to land. Apr 20 2021, 6:36 AM

Harbormaster completed remote builds in B99701: Diff 338843. Apr 20 2021, 7:20 AM

This revision was landed with ongoing or failed builds. Apr 20 2021, 10:06 PM

Closed by commit rG3194761d2763: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed (authored by pdhaliwal).

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rG3194761d2763: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.

pdhaliwal added a reverting change: rG0ad50bf27f89: Revert "[AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed". Apr 21 2021, 1:05 AM

I was under the impression that #!/usr/bin/env sh is a sensible invocation for running a shell on various systems. The current theory for this struggling with the ppc buildbot is that fedora doesn't support that. Ad hoc searching suggests 'sh' is required to exist in /bin on posix-like systems, and the TestRunner.sh script under clang tests starts with #!/bin/sh

It's still kind of shotgun debugging, but we could change to that #! header across the tests and ensure they all end with a newline while we're at it.

edit: found a colleague with a red hat machine, which reports

line 4: return: can only `return' from a function or sourced script

pdhaliwal reopened this revision. Apr 21 2021, 7:41 AM

This revision is now accepted and ready to land. Apr 21 2021, 7:41 AM

Replaced the return commands in the test scripts with exit commands. It seems that
return is handled a bit differently on Fedora/RHEL machines.

pdhaliwal requested review of this revision. Apr 21 2021, 7:41 AM

JonChesterfield accepted this revision. Apr 21 2021, 7:43 AM

This revision is now accepted and ready to land. Apr 21 2021, 7:43 AM

Harbormaster completed remote builds in B99982: Diff 339235. Apr 21 2021, 8:14 AM

This revision was landed with ongoing or failed builds. Apr 21 2021, 10:20 PM

Closed by commit rG722d4d8e7585: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed (authored by pdhaliwal).

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rG722d4d8e7585: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.

This change broke multi-stage builds (even if AMDGPU is disabled as a target). The problem is that clang/tools/amdgpu-arch/AMDGPUArch.cpp can't find hsa.h. Is there a quick fix?

In D99949#2709297, @davezarzycki wrote:

This change broke multi-stage builds (even if AMDGPU is disabled as a target). The problem is that clang/tools/amdgpu-arch/AMDGPUArch.cpp can't find hsa.h. Is there a quick fix?

That should not be the case. The cmake for building AMDGPUArch.cpp is guarded by

find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)
if (NOT ${hsa-runtime64_FOUND})
  return()
endif()

The intent is for this to be built only when hsa-runtime64 is found on disk, at which point hsa.h is meant to be found within that package via cmake magic. Perhaps we need to explicitly specify the include directory, though we do not do so in the openmp plugin this was copied from, and I do not have hsa.h on any global search path.

Little harm will be done by reverting this (again...) and reapplying, but it would be really helpful if you can share some more information about the build config that is failing.

As it turns out, I do have the HSA runtime installed so perhaps the problem is that the header is actually hsa/hsa.h and not hsa.h?

And if it helps:

$ sudo find / -xdev -name hsa.h
/usr/include/hsa/hsa.h
/opt/rocm-3.10.0/include/hsa/hsa.h
/opt/rocm-4.0.0/include/hsa/hsa.h

That's interesting. The various permutations I have on disk are also all path/include/hsa/hsa.h. The existing in tree use of hsa.h is the amdgpu plugin, which uses #include "hsa.h" and #include <hsa.h>, which seems unlikely to be correct.

I'm going to patch this on the fly to hsa/hsa.h as that looks very likely to be the fix.

edit: or not, the include path cmake summons on my local system is -isystem $HOME/llvm-install/include/hsa, so hsa/hsa doesn't work.

Tiresome. Will revert this for now.

edit: talked to some other people. Some follow up,

what do you have CMAKE_PREFIX_PATH set to? It seems that's the mechanism for picking from multiple rocm installs
is the /usr/include one a manual install? I'm paranoid about installing stuff under /, but maybe that's a common destination for it

JonChesterfield added a reverting change: rG24c1ed3b34f7: Revert "[AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed". Apr 22 2021, 11:39 AM

Ya, I bet. I also find this interesting because this fails my "three stage" test at the third stage which is rare. My stages:

build and test llvm+clang (release without asserts) with the system compiler release
build and test llvm+clang (release with asserts) with the just-built compiler
build and test llvm+clang (release without asserts) with the compiler built in stage one

It's not obvious to me why any of those stages would get a different result for the search for rocr. Do you do things with chroot/jails to ensure isolation for some of them?

edit: are the three builds from empty directories? cmake has some unusual caching properties, it might find rocr in one build, remember that in the filesystem, then assume it found it later when actually it wouldn't have

We could also do various soft-fail things. For example, if we find rocr, but can't build it for any reason, warn and continue with the rest of the build. Clang doesn't mind if the tool is missing.

In D99949#2709785, @JonChesterfield wrote:

It's not obvious to me why any of those stages would get a different result for the search for rocr. Do you do things with chroot/jails to ensure isolation for some of them?

Technically? Yes. Practically? No. My setup just remounts a temporary /tmp during the build to garbage collect stuff the build puts in /tmp. And the second/third stages use the clang built/installed from the first stage but otherwise it's not different.

When find_package can find rocr, but it sets up include paths that don't work with it, I think that constitutes a broken rocr install. At least, one that is difficult to use from cmake.

The packages are meant to deal with that sort of thing, which suggests the /usr/include one. If you don't have a symlink in /opt/rocm then the HINTS above shouldn't be finding either of those, which also suggests the /usr/include one. Can I interest you in moving / deleting that copy of hsa, to see if it's a machine-local quirk?

I removed all HSA/ROCM via the package manager and everything is fine now. Have you considered using __has_include() instead of CMake? It seems to be supported by all contemporary compilers, even MSVC.

In D99949#2710273, @davezarzycki wrote:

I removed all HSA/ROCM via the package manager and everything is fine now. Have you considered using __has_include() instead of CMake? It seems to be supported by all contemporary compilers, even MSVC.

To confirm, you can build the pre-revert version of this patch after that adjustment? If so I'll give this another try. (edit: misread, if all hsa is gone from your system, it won't try to build this tool at all, so you're good)

I have indeed considered something along the lines of

#if defined(__has_include)
#if __has_include("hsa.h")
#include "hsa.h"
#elif __has_include("hsa/hsa.h")
#include "hsa/hsa.h"
#endif
#else
#include "hsa.h"
#endif

It's tempting to forward declare the few pieces used instead. Hopefully the cmake find_package stuff will be robust enough to avoid such things.

Live again as of 15be0c41d2e5

It's still failing but I can disable HSA on this machine for now. FYI --

FAILED: tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o
/p/tllvm/bin/clang++ -DGTEST_HAS_RTTI=0 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/tools/amdgpu-arch -I/home/dave/ro_s/lp/clang/tools/amdgpu-arch -I/home/dave/ro_s/lp/clang/include -Itools/clang/include -Iinclude -I/home/dave/ro_s/lp/llvm/include -Werror=switch -Wno-deprecated-copy -stdlib=libc++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O2 -DNDEBUG  -march=skylake -fno-vectorize -fno-slp-vectorize -fno-tree-slp-vectorize -fno-exceptions -fno-rtti -std=c++14 -MD -MT tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o -MF tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o.d -o tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o -c /home/dave/ro_s/lp/clang/tools/amdgpu-arch/AMDGPUArch.cpp
/home/dave/ro_s/lp/clang/tools/amdgpu-arch/AMDGPUArch.cpp:14:10: fatal error: 'hsa.h' file not found
#include <hsa.h>
         ^~~~~~~
1 error generated.

None of those include paths look like the result of find_package locating a hsa, so I'd guess this is some quirk of cmake. It seems it is setting hsa-runtime64_FOUND but not doing the include path setup to match. I don't know how to debug that. @pdhaliwal?

We may need more robust error handling around this. I'm seeing test failures when building clang,

clang-13: error: Cannot determine AMDGPU architecture: $HOME/llvm-build/llvm/./bin/amdgpu-arch: Execute failed: No such file or directory. Consider passing it via --march.

The executable exists, but when run directly it fails with error while loading shared libraries: libhsa-runtime64.so.1: cannot open shared object file: No such file or directory
It's built with RUNPATH = [$ORIGIN/../lib], which doesn't necessarily work from the build directory.

Two somewhat orthogonal things here I think.

skip running the corresponding clang tests when ./amdgpu-arch fails, whatever reason it fails for (which I thought was already the case, but can't find that in D99656)
be able to run from the build directory, which seems to be the default location for the tests (e.g. D101926)

Changing clang_target_link_libraries to target_link_libraries made no difference. There is some cmake handling available around rpath but I haven't been able to determine what it should be set to.

edit: becoming increasingly tempting to statically link rocr, but that presently fails in a different mode

I am investigating the find_package issue.

pdhaliwal mentioned this in D101926: [amdgpu-arch] Fix rpath to run from build dir. May 5 2021, 11:50 PM

I could not find anything in the cmake files which could point to the issue mentioned here. @davezarzycki, are you on fedora/redhat?

Yes, Fedora 34 (x86-64).

There is a potential hazard here for parallel compilation and to a lesser extent testing. (I have just learned that) kfd imposes a process limit which is low enough that we may see hsa_init fail under multiple processes.

@jdoerfert you talked me out of embedding a map from pci.ids to architecture as we'd have to keep it up to date, and it can diverge from how the runtime libraries identify the hardware. I'm starting to think that's the lesser of two evils. Would probably look something like

id = cat-somewhere-in-/sys
switch(id) {
default:
  return nullptr;
case 0x67C0:
case 0x67C1:
case 0x67C2:
  return "gfx803";
... // ~ 100 lines on this theme, picks up new entries when new hardware is released
}

where the table could go in this tool, or just in AMDGPU.cpp somewhere and wipe out all the failure modes from above

Greg was also interested in having pci ids table in amdgpu-arch. And, keeping this table inside the target/amdgpu directory sounds like a good idea. Overall, I agree with not having dependency on hsa as it has caused many issues.

pdhaliwal mentioned this in D102067: [amdgpu-arch] Guard hsa.h with __has_include. May 7 2021, 5:13 AM

I have put up a patch, D102067, which uses __has_include as a workaround for the header-not-found issue. @davezarzycki, can you check if this resolves the issue?
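For readers unfamiliar with the technique, here is a minimal sketch of guarding an include with `__has_include`. It is not the content of D102067; the macro `EXAMPLE_HAVE_HSA_HEADER` and the two helper functions are hypothetical names chosen for this illustration.

```cpp
// Include hsa.h only if the preprocessor can find it, so the file still
// compiles on systems where the header is missing.
#if defined(__has_include)
#if __has_include("hsa.h")
#include "hsa.h"
#define EXAMPLE_HAVE_HSA_HEADER 1
#endif
#endif

#ifndef EXAMPLE_HAVE_HSA_HEADER
#define EXAMPLE_HAVE_HSA_HEADER 0
#endif

// Hypothetical helper: true if the header was found at compile time.
static bool haveHsaHeader() { return EXAMPLE_HAVE_HSA_HEADER != 0; }

// Hypothetical helper: the exit code a detection tool built this way could
// fall back to when it was compiled without the header.
static int exitCodeForBuild() { return haveHsaHeader() ? 0 : 1; }
```

The nested `#if` matters: testing `defined(__has_include)` first keeps the file valid for preprocessors that do not provide the operator at all.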

pdhaliwal mentioned this in rGc711aa0f6f9d: [amdgpu-arch] Guard hsa.h with __has_include. May 10 2021, 12:34 AM

Revision Contents

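Path	Size
clang/include/clang/Basic/DiagnosticDriverKinds.td	2 lines
clang/include/clang/Driver/Options.td	2 lines
clang/lib/Driver/ToolChains/AMDGPU.h	8 lines
clang/lib/Driver/ToolChains/AMDGPU.cpp	79 lines
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp	41 lines
clang/test/Driver/Inputs/amdgpu-arch/amdgpu_arch_different	4 lines
clang/test/Driver/Inputs/amdgpu-arch/amdgpu_arch_fail	2 lines
clang/test/Driver/Inputs/amdgpu-arch/amdgpu_arch_gfx906	3 lines
clang/test/Driver/Inputs/amdgpu-arch/amdgpu_arch_gfx908_gfx908	4 lines
clang/test/Driver/amdgpu-openmp-system-arch-fail.c	28 lines
clang/test/Driver/amdgpu-openmp-system-arch.c	24 lines
clang/tools/CMakeLists.txt	2 lines
clang/tools/amdgpu-arch/AMDGPUArch.cpp	59 lines
clang/tools/amdgpu-arch/CMakeLists.txt	17 lines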

Diff 339475

clang/include/clang/Basic/DiagnosticDriverKinds.td


	def err_drv_no_rocm_device_lib : Error<			def err_drv_no_rocm_device_lib : Error<
	"cannot find ROCm device library%select{| for %1}0. Provide its path via --rocm-path or "			"cannot find ROCm device library%select{| for %1}0. Provide its path via --rocm-path or "
	"--rocm-device-lib-path, or pass -nogpulib to build without ROCm device library.">;			"--rocm-device-lib-path, or pass -nogpulib to build without ROCm device library.">;
	def err_drv_no_hip_runtime : Error<			def err_drv_no_hip_runtime : Error<
	"cannot find HIP runtime. Provide its path via --rocm-path, or pass "			"cannot find HIP runtime. Provide its path via --rocm-path, or pass "
	"-nogpuinc to build without HIP runtime.">;			"-nogpuinc to build without HIP runtime.">;

				def err_drv_undetermined_amdgpu_arch : Error<
				"Cannot determine AMDGPU architecture: %0. Consider passing it via --march.">;
	def err_drv_cuda_version_unsupported : Error<			def err_drv_cuda_version_unsupported : Error<
	"GPU arch %0 is supported by CUDA versions between %1 and %2 (inclusive), "			"GPU arch %0 is supported by CUDA versions between %1 and %2 (inclusive), "
	"but installation at %3 is %4. Use --cuda-path to specify a different CUDA "			"but installation at %3 is %4. Use --cuda-path to specify a different CUDA "
	"install, pass a different GPU arch with --cuda-gpu-arch, or pass "			"install, pass a different GPU arch with --cuda-gpu-arch, or pass "
	"--no-cuda-version-check.">;			"--no-cuda-version-check.">;
	def warn_drv_unknown_cuda_version: Warning<			def warn_drv_unknown_cuda_version: Warning<
	"Unknown CUDA version. %0 Assuming the latest supported version %1">,			"Unknown CUDA version. %0 Assuming the latest supported version %1">,
	InGroup<CudaUnknownVersion>;			InGroup<CudaUnknownVersion>;

clang/include/clang/Driver/Options.td

	defm cuda_short_ptr : BoolFOption<"cuda-short-ptr",			defm cuda_short_ptr : BoolFOption<"cuda-short-ptr",
	TargetOpts<"NVPTXUseShortPointers">, DefaultFalse,			TargetOpts<"NVPTXUseShortPointers">, DefaultFalse,
	PosFlag<SetTrue, [CC1Option], "Use 32-bit pointers for accessing const/local/shared address spaces">,			PosFlag<SetTrue, [CC1Option], "Use 32-bit pointers for accessing const/local/shared address spaces">,
	NegFlag<SetFalse>>;			NegFlag<SetFalse>>;
	def rocm_path_EQ : Joined<["--"], "rocm-path=">, Group<i_Group>,			def rocm_path_EQ : Joined<["--"], "rocm-path=">, Group<i_Group>,
	HelpText<"ROCm installation path, used for finding and automatically linking required bitcode libraries.">;			HelpText<"ROCm installation path, used for finding and automatically linking required bitcode libraries.">;
	def hip_path_EQ : Joined<["--"], "hip-path=">, Group<i_Group>,			def hip_path_EQ : Joined<["--"], "hip-path=">, Group<i_Group>,
	HelpText<"HIP runtime installation path, used for finding HIP version and adding HIP include path.">;			HelpText<"HIP runtime installation path, used for finding HIP version and adding HIP include path.">;
				def amdgpu_arch_tool_EQ : Joined<["--"], "amdgpu-arch-tool=">, Group<i_Group>,
				JonChesterfield (inline comment, Done): I'd expect path to be the directory in which the tool is found, whereas this is used to name the tool itself. Perhaps 'amdgpu_arch_tool='? We might be able to write the default string inline as part of the argument definition; otherwise the current handling looks OK.
				HelpText<"Tool used for detecting AMD GPU arch in the system.">;
				JonChesterfield (inline comment, Done): "Path to tool used for detecting AMD GPU arch in the system."?
	def rocm_device_lib_path_EQ : Joined<["--"], "rocm-device-lib-path=">, Group<Link_Group>,			def rocm_device_lib_path_EQ : Joined<["--"], "rocm-device-lib-path=">, Group<Link_Group>,
	HelpText<"ROCm device library path. Alternative to rocm-path.">;			HelpText<"ROCm device library path. Alternative to rocm-path.">;
	def : Joined<["--"], "hip-device-lib-path=">, Alias<rocm_device_lib_path_EQ>;			def : Joined<["--"], "hip-device-lib-path=">, Alias<rocm_device_lib_path_EQ>;
	def hip_device_lib_EQ : Joined<["--"], "hip-device-lib=">, Group<Link_Group>,			def hip_device_lib_EQ : Joined<["--"], "hip-device-lib=">, Group<Link_Group>,
	HelpText<"HIP device library">;			HelpText<"HIP device library">;
	def hip_version_EQ : Joined<["--"], "hip-version=">,			def hip_version_EQ : Joined<["--"], "hip-version=">,
	HelpText<"HIP version in the format of major.minor.patch">;			HelpText<"HIP version in the format of major.minor.patch">;
	def fhip_dump_offload_linker_script : Flag<["-"], "fhip-dump-offload-linker-script">,			def fhip_dump_offload_linker_script : Flag<["-"], "fhip-dump-offload-linker-script">,

clang/lib/Driver/ToolChains/AMDGPU.h

Show First 20 Lines • Show All 94 Lines • ▼ Show 20 Lines	public:
}		}

/// Needed for translating LTO options.		/// Needed for translating LTO options.
const char *getDefaultLinker() const override { return "ld.lld"; }		const char *getDefaultLinker() const override { return "ld.lld"; }

/// Should skip argument.		/// Should skip argument.
bool shouldSkipArgument(const llvm::opt::Arg *Arg) const;		bool shouldSkipArgument(const llvm::opt::Arg *Arg) const;

		/// Uses amdgpu_arch tool to get arch of the system GPU. Will return error
		/// if unable to find one.
		llvm::Error getSystemGPUArch(const llvm::opt::ArgList &Args,
		JonChesterfield (inline comment, Done): Comment out of date. Possibly delete the sentence starting "Location".
		std::string &GPUArch) const;

protected:		protected:
/// Check and diagnose invalid target ID specified by -mcpu.		/// Check and diagnose invalid target ID specified by -mcpu.
void checkTargetID(const llvm::opt::ArgList &DriverArgs) const;		void checkTargetID(const llvm::opt::ArgList &DriverArgs) const;

/// Get GPU arch from -mcpu without checking.		/// Get GPU arch from -mcpu without checking.
StringRef getGPUArch(const llvm::opt::ArgList &DriverArgs) const;		StringRef getGPUArch(const llvm::opt::ArgList &DriverArgs) const;

		llvm::Error detectSystemGPUs(const llvm::opt::ArgList &Args,
		SmallVector<std::string, 1> &GPUArchs) const;
};		};

class LLVM_LIBRARY_VISIBILITY ROCMToolChain : public AMDGPUToolChain {		class LLVM_LIBRARY_VISIBILITY ROCMToolChain : public AMDGPUToolChain {
public:		public:
ROCMToolChain(const Driver &D, const llvm::Triple &Triple,		ROCMToolChain(const Driver &D, const llvm::Triple &Triple,
const llvm::opt::ArgList &Args);		const llvm::opt::ArgList &Args);
void		void
addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,		addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,

clang/lib/Driver/ToolChains/AMDGPU.cpp

//===--- AMDGPU.cpp - AMDGPU ToolChain Implementations ----------- C++ --===//		//===--- AMDGPU.cpp - AMDGPU ToolChain Implementations ----------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "CommonArgs.h"		#include "CommonArgs.h"
#include "InputInfo.h"		#include "InputInfo.h"
#include "clang/Basic/TargetID.h"		#include "clang/Basic/TargetID.h"
#include "clang/Driver/Compilation.h"		#include "clang/Driver/Compilation.h"
#include "clang/Driver/DriverDiagnostic.h"		#include "clang/Driver/DriverDiagnostic.h"
		#include "clang/Driver/Options.h"
#include "llvm/Option/ArgList.h"		#include "llvm/Option/ArgList.h"
		#include "llvm/Support/Error.h"
		#include "llvm/Support/FileUtilities.h"
		#include "llvm/Support/LineIterator.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/VirtualFileSystem.h"		#include "llvm/Support/VirtualFileSystem.h"
		#include <system_error>

		#define AMDGPU_ARCH_PROGRAM_NAME "amdgpu-arch"

using namespace clang::driver;		using namespace clang::driver;
using namespace clang::driver::tools;		using namespace clang::driver::tools;
using namespace clang::driver::toolchains;		using namespace clang::driver::toolchains;
using namespace clang;		using namespace clang;
using namespace llvm::opt;		using namespace llvm::opt;

// Look for sub-directory starts with PackageName under ROCm candidate path.		// Look for sub-directory starts with PackageName under ROCm candidate path.
▲ Show 20 Lines • Show All 684 Lines • ▼ Show 20 Lines	void AMDGPUToolChain::checkTargetID(

llvm::StringMap<bool> FeatureMap;		llvm::StringMap<bool> FeatureMap;
auto OptionalGpuArch = parseTargetID(getTriple(), TargetID, &FeatureMap);		auto OptionalGpuArch = parseTargetID(getTriple(), TargetID, &FeatureMap);
if (!OptionalGpuArch) {		if (!OptionalGpuArch) {
getDriver().Diag(clang::diag::err_drv_bad_target_id) << TargetID;		getDriver().Diag(clang::diag::err_drv_bad_target_id) << TargetID;
}		}
}		}

		llvm::Error
		AMDGPUToolChain::detectSystemGPUs(const ArgList &Args,
		SmallVector<std::string, 1> &GPUArchs) const {
		std::string Program;
		if (Arg *A = Args.getLastArg(options::OPT_amdgpu_arch_tool_EQ))
		Program = A->getValue();
		else
		Program = GetProgramPath(AMDGPU_ARCH_PROGRAM_NAME);
		llvm::SmallString<64> OutputFile;
		llvm::sys::fs::createTemporaryFile("print-system-gpus", "" /* No Suffix */,
		OutputFile);
		llvm::FileRemover OutputRemover(OutputFile.c_str());
		llvm::Optional<llvm::StringRef> Redirects[] = {
		{""},
		StringRef(OutputFile),
		{""},
		};

		std::string ErrorMessage;
		if (int Result = llvm::sys::ExecuteAndWait(
		Program.c_str(), {}, {}, Redirects, /* SecondsToWait */ 0,
		/*MemoryLimit*/ 0, &ErrorMessage)) {
		if (Result > 0) {
		ErrorMessage = "Exited with error code " + std::to_string(Result);
		} else if (Result == -1) {
		ErrorMessage = "Execute failed: " + ErrorMessage;
		} else {
		ErrorMessage = "Crashed: " + ErrorMessage;
		}

		return llvm::createStringError(std::error_code(),
		Program + ": " + ErrorMessage);
		}

		llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> OutputBuf =
		llvm::MemoryBuffer::getFile(OutputFile.c_str());
		if (!OutputBuf) {
		return llvm::createStringError(OutputBuf.getError(),
		"Failed to read stdout of " + Program +
		": " + OutputBuf.getError().message());
		}

		for (llvm::line_iterator LineIt(**OutputBuf); !LineIt.is_at_end(); ++LineIt) {
		GPUArchs.push_back(LineIt->str());
		}
		return llvm::Error::success();
		}

		llvm::Error AMDGPUToolChain::getSystemGPUArch(const ArgList &Args,
		std::string &GPUArch) const {
		// detect the AMDGPU installed in system
		SmallVector<std::string, 1> GPUArchs;
		auto Err = detectSystemGPUs(Args, GPUArchs);
		if (Err) {
		return Err;
		}
		if (GPUArchs.empty()) {
		return llvm::createStringError(std::error_code(),
		"No AMD GPU detected in the system");
		}
		GPUArch = GPUArchs[0];
		if (GPUArchs.size() > 1) {
		bool AllSame = std::all_of(
		GPUArchs.begin(), GPUArchs.end(),
		[&](const StringRef &GPUArch) { return GPUArch == GPUArchs.front(); });
		if (!AllSame)
		return llvm::createStringError(
		std::error_code(), "Multiple AMD GPUs found with different archs");
		}
		return llvm::Error::success();
		}

void ROCMToolChain::addClangTargetOptions(		void ROCMToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,		const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
AMDGPUToolChain::addClangTargetOptions(DriverArgs, CC1Args,		AMDGPUToolChain::addClangTargetOptions(DriverArgs, CC1Args,
DeviceOffloadingKind);		DeviceOffloadingKind);

// For the OpenCL case where there is no offload target, accept -nostdlib to		// For the OpenCL case where there is no offload target, accept -nostdlib to
// disable bitcode linking.		// disable bitcode linking.
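The selection logic in detectSystemGPUs and getSystemGPUArch can be sketched in standard C++ without the LLVM types. This is an illustration only: the function names `parseArchLines` and `pickSystemArch` are hypothetical, skipping empty lines is an assumption of the sketch, and the `all_of` call follows the inline review suggestion to run it over the whole range instead of testing `size() > 1` first.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split the captured stdout of the detection tool into one arch per line.
// Empty lines are skipped (an assumption of this sketch).
static std::vector<std::string> parseArchLines(const std::string &Output) {
  std::vector<std::string> Archs;
  std::istringstream Stream(Output);
  for (std::string Line; std::getline(Stream, Line);)
    if (!Line.empty())
      Archs.push_back(Line);
  return Archs;
}

// Choose the single arch to compile for. Returns an empty string on success
// and writes the result to Arch; otherwise returns an error message.
static std::string pickSystemArch(const std::vector<std::string> &Archs,
                                  std::string &Arch) {
  if (Archs.empty())
    return "No AMD GPU detected in the system";
  // all_of is trivially true for a single element, so no separate
  // size() > 1 test is needed.
  bool AllSame =
      std::all_of(Archs.begin(), Archs.end(), [&](const std::string &A) {
        return A == Archs.front();
      });
  if (!AllSame)
    return "Multiple AMD GPUs found with different archs";
  Arch = Archs.front();
  return std::string();
}
```

The two error strings are copied from the patch so that the sketch lines up with the diagnostics checked in the driver tests.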

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

//===- AMDGPUOpenMP.cpp - AMDGPUOpenMP ToolChain Implementation -- C++ --===//		//===- AMDGPUOpenMP.cpp - AMDGPUOpenMP ToolChain Implementation -- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPUOpenMP.h"		#include "AMDGPUOpenMP.h"
#include "AMDGPU.h"		#include "AMDGPU.h"
#include "CommonArgs.h"		#include "CommonArgs.h"
#include "InputInfo.h"		#include "InputInfo.h"
		#include "clang/Basic/DiagnosticDriver.h"
#include "clang/Driver/Compilation.h"		#include "clang/Driver/Compilation.h"
#include "clang/Driver/Driver.h"		#include "clang/Driver/Driver.h"
#include "clang/Driver/DriverDiagnostic.h"		#include "clang/Driver/DriverDiagnostic.h"
#include "clang/Driver/Options.h"		#include "clang/Driver/Options.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
		#include "llvm/Support/FormatAdapters.h"
		#include "llvm/Support/FormatVariadic.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"

using namespace clang::driver;		using namespace clang::driver;
using namespace clang::driver::toolchains;		using namespace clang::driver::toolchains;
using namespace clang::driver::tools;		using namespace clang::driver::tools;
using namespace clang;		using namespace clang;
using namespace llvm::opt;		using namespace llvm::opt;

Show All 35 Lines	else if (A->getOption().matches(options::OPT_O)) {
.Case("s", "2")		.Case("s", "2")
.Case("z", "2")		.Case("z", "2")
.Case("g", "1")		.Case("g", "1")
.Default("0");		.Default("0");
}		}
CmdArgs.push_back(Args.MakeArgString("-O" + OOpt));		CmdArgs.push_back(Args.MakeArgString("-O" + OOpt));
}		}
}		}

		static bool checkSystemForAMDGPU(const ArgList &Args, const AMDGPUToolChain &TC,
		std::string &GPUArch) {
		if (auto Err = TC.getSystemGPUArch(Args, GPUArch)) {
		JonChesterfield (inline comment, Done): AMDGPU_ARCH_PROGRAM_NAME?
		std::string ErrMsg =
		llvm::formatv("{0}", llvm::fmt_consume(std::move(Err)));
		TC.getDriver().Diag(diag::err_drv_undetermined_amdgpu_arch) << ErrMsg;
		return false;
		}
		JonChesterfield (inline comment, Done): This looks like there are too many stringrefs. Redirecting stdout to a temporary file seems reasonable, but I'd expect the pointers into OutputBuf to be invalid when it drops out of scope. Perhaps return a smallvector of smallstrings instead? Also, we're probably expecting fewer than 8 different gpus, probably as few as 1 in the most common case, so maybe a smallvector<type,1>.

		return true;
		}
} // namespace		} // namespace

const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(		const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(
Compilation &C, const JobAction &JA, const InputInfoList &Inputs,		Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
		JonChesterfield (inline comment, Done): `s/const int RC =//`
const ArgList &Args, StringRef SubArchName,		const ArgList &Args, StringRef SubArchName,
		JonChesterfield (inline comment, Done): Can we pass {} for execArgs here?
StringRef OutputFilePrefix) const {		StringRef OutputFilePrefix) const {
ArgStringList CmdArgs;		ArgStringList CmdArgs;

for (const auto &II : Inputs)		for (const auto &II : Inputs)
if (II.isFilename())		if (II.isFilename())
CmdArgs.push_back(II.getFilename());		CmdArgs.push_back(II.getFilename());
// Add an intermediate output file.		// Add an intermediate output file.
CmdArgs.push_back("-o");		CmdArgs.push_back("-o");
const char *OutputFileName =		const char *OutputFileName =
getOutputFileName(C, OutputFilePrefix, "-linked", "bc");		getOutputFileName(C, OutputFilePrefix, "-linked", "bc");
CmdArgs.push_back(OutputFileName);		CmdArgs.push_back(OutputFileName);
const char *Exec =		const char *Exec =
Args.MakeArgString(getToolChain().GetProgramPath("llvm-link"));		Args.MakeArgString(getToolChain().GetProgramPath("llvm-link"));
C.addCommand(std::make_unique<Command>(		C.addCommand(std::make_unique<Command>(
		yaxunl (inline comment, Done): This function is useful for AMDGPU toolchain and HIP toolchain. Can it be a member of AMDGPU toolchain?
JA, *this, ResponseFileSupport::AtFileCurCP(), Exec, CmdArgs, Inputs,		JA, *this, ResponseFileSupport::AtFileCurCP(), Exec, CmdArgs, Inputs,
InputInfo(&JA, Args.MakeArgString(OutputFileName))));		InputInfo(&JA, Args.MakeArgString(OutputFileName))));
return OutputFileName;		return OutputFileName;
}		}

const char *AMDGCN::OpenMPLinker::constructLlcCommand(		const char *AMDGCN::OpenMPLinker::constructLlcCommand(
		JonChesterfield (inline comment, Done): Perhaps run all_of against the whole range and drop the if size() > 1 test?
Compilation &C, const JobAction &JA, const InputInfoList &Inputs,		Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,		const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,
llvm::StringRef OutputFilePrefix, const char *InputFileName,		llvm::StringRef OutputFilePrefix, const char *InputFileName,
bool OutputIsAsm) const {		bool OutputIsAsm) const {
// Construct llc command.		// Construct llc command.
ArgStringList LlcArgs;		ArgStringList LlcArgs;
// The input to llc is the output from opt.		// The input to llc is the output from opt.
LlcArgs.push_back(InputFileName);		LlcArgs.push_back(InputFileName);
Show All 38 Lines

// For amdgcn the inputs of the linker job are device bitcode and output is		// For amdgcn the inputs of the linker job are device bitcode and output is
// object file. It calls llvm-link, opt, llc, then lld steps.		// object file. It calls llvm-link, opt, llc, then lld steps.
void AMDGCN::OpenMPLinker::ConstructJob(Compilation &C, const JobAction &JA,		void AMDGCN::OpenMPLinker::ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output,		const InputInfo &Output,
const InputInfoList &Inputs,		const InputInfoList &Inputs,
const ArgList &Args,		const ArgList &Args,
const char *LinkingOutput) const {		const char *LinkingOutput) const {
		const ToolChain &TC = getToolChain();
assert(getToolChain().getTriple().isAMDGCN() && "Unsupported target");		assert(getToolChain().getTriple().isAMDGCN() && "Unsupported target");

StringRef GPUArch = Args.getLastArgValue(options::OPT_march_EQ);		const toolchains::AMDGPUOpenMPToolChain &AMDGPUOpenMPTC =
assert(GPUArch.startswith("gfx") && "Unsupported sub arch");		static_cast<const toolchains::AMDGPUOpenMPToolChain &>(TC);
		JonChesterfield (inline comment, Done): We shouldn't be handling unknown or missing march= fields with asserts. I see that this is already the case in multiple places, so let's go with a matching assert for this and aspire to fix that in a separate patch.
		pdhaliwal (author, inline comment, Done): Matched this one with below.

		std::string GPUArch = Args.getLastArgValue(options::OPT_march_EQ).str();
		if (GPUArch.empty()) {
		if (!checkSystemForAMDGPU(Args, AMDGPUOpenMPTC, GPUArch))
		return;
		}

// Prefix for temporary file name.		// Prefix for temporary file name.
std::string Prefix;		std::string Prefix;
for (const auto &II : Inputs)		for (const auto &II : Inputs)
if (II.isFilename())		if (II.isFilename())
Prefix =		Prefix = llvm::sys::path::stem(II.getFilename()).str() + "-" + GPUArch;
llvm::sys::path::stem(II.getFilename()).str() + "-" + GPUArch.str();
assert(Prefix.length() && "no linker inputs are files ");		assert(Prefix.length() && "no linker inputs are files ");

// Each command outputs different files.		// Each command outputs different files.
const char *LLVMLinkCommand =		const char *LLVMLinkCommand =
constructLLVMLinkCommand(C, JA, Inputs, Args, GPUArch, Prefix);		constructLLVMLinkCommand(C, JA, Inputs, Args, GPUArch, Prefix);

// Produce readable assembly if save-temps is enabled.		// Produce readable assembly if save-temps is enabled.
if (C.getDriver().isSaveTempsEnabled())		if (C.getDriver().isSaveTempsEnabled())
Show All 14 Lines	AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,
getProgramPaths().push_back(getDriver().Dir);		getProgramPaths().push_back(getDriver().Dir);
}		}

void AMDGPUOpenMPToolChain::addClangTargetOptions(		void AMDGPUOpenMPToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,		const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);		HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);

StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);		std::string GPUArch = DriverArgs.getLastArgValue(options::OPT_march_EQ).str();
assert(!GpuArch.empty() && "Must have an explicit GPU arch.");		if (GPUArch.empty()) {
		if (!checkSystemForAMDGPU(DriverArgs, *this, GPUArch))
		JonChesterfield (inline comment, Done): Can we reasonably factor out this duplication?
		return;
		}

assert(DeviceOffloadingKind == Action::OFK_OpenMP &&		assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
"Only OpenMP offloading kinds are supported.");		"Only OpenMP offloading kinds are supported.");

CC1Args.push_back("-target-cpu");		CC1Args.push_back("-target-cpu");
CC1Args.push_back(DriverArgs.MakeArgStringRef(GpuArch));		CC1Args.push_back(DriverArgs.MakeArgStringRef(GPUArch));
CC1Args.push_back("-fcuda-is-device");		CC1Args.push_back("-fcuda-is-device");

if (DriverArgs.hasArg(options::OPT_nogpulib))		if (DriverArgs.hasArg(options::OPT_nogpulib))
return;		return;
std::string BitcodeSuffix = "amdgcn-" + GpuArch.str();		std::string BitcodeSuffix = "amdgcn-" + GPUArch;
addOpenMPDeviceRTL(getDriver(), DriverArgs, CC1Args, BitcodeSuffix,		addOpenMPDeviceRTL(getDriver(), DriverArgs, CC1Args, BitcodeSuffix,
getTriple());		getTriple());
}		}

llvm::opt::DerivedArgList *AMDGPUOpenMPToolChain::TranslateArgs(		llvm::opt::DerivedArgList *AMDGPUOpenMPToolChain::TranslateArgs(
const llvm::opt::DerivedArgList &Args, StringRef BoundArch,		const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const {		Action::OffloadKind DeviceOffloadKind) const {
DerivedArgList *DAL =		DerivedArgList *DAL =
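One way to answer the inline question about factoring out the duplicated "use -march if given, otherwise detect" logic is a small helper shared by both call sites. The sketch below is not part of the patch: `resolveGPUArch` and its callback parameter are hypothetical, and plain `std::string` stands in for the driver's argument types.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: prefer an explicit -march value; otherwise ask the
// detection callback to fill in GPUArch. Returns false if neither source
// yields an arch, in which case the caller should stop.
static bool resolveGPUArch(const std::string &ExplicitMarch,
                           const std::function<bool(std::string &)> &Detect,
                           std::string &GPUArch) {
  if (!ExplicitMarch.empty()) {
    GPUArch = ExplicitMarch;
    return true;
  }
  return Detect(GPUArch);
}
```

Passing detection in as a callback keeps the helper independent of how the arch is found, so the same function would serve whether detection runs an external tool or consults a built-in table.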

clang/test/Driver/Inputs/amdgpu-arch/amdgpu_arch_different

This file was added.

Property	Old Value	New Value
File Mode	null	100755

				#!/bin/sh
				echo gfx908
				echo gfx906
				exit 0

clang/test/Driver/Inputs/amdgpu-arch/amdgpu_arch_fail

This file was added.

Property	Old Value	New Value
File Mode	null	100755

				#!/bin/sh
				exit 1

clang/test/Driver/Inputs/amdgpu-arch/amdgpu_arch_gfx906

This file was added.

Property	Old Value	New Value
File Mode	null	100755

				#!/bin/sh
				echo "gfx906"
				exit 0

clang/test/Driver/Inputs/amdgpu-arch/amdgpu_arch_gfx908_gfx908

This file was added.

Property	Old Value	New Value
File Mode	null	100755

				#!/bin/sh
				echo gfx908
				echo gfx908
				exit 0

clang/test/Driver/amdgpu-openmp-system-arch-fail.c

This file was added.

				// REQUIRES: system-linux
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target
				// REQUIRES: shell

				// RUN: mkdir -p %t
				// RUN: rm -f %t/amdgpu_arch_fail %t/amdgpu_arch_different
				// RUN: cp %S/Inputs/amdgpu-arch/amdgpu_arch_fail %t/
				// RUN: cp %S/Inputs/amdgpu-arch/amdgpu_arch_different %t/
				// RUN: echo '#!/bin/sh' > %t/amdgpu_arch_empty
				// RUN: chmod +x %t/amdgpu_arch_fail
				JonChesterfield (inline comment, Done): I'm not sure we can assume /bin/true exists everywhere; perhaps create a shell script on the fly that just exits?
				// RUN: chmod +x %t/amdgpu_arch_different
				// RUN: chmod +x %t/amdgpu_arch_empty

				// case when amdgpu_arch returns nothing or fails
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_fail %s 2>&1 \
				// RUN: | FileCheck %s --check-prefix=NO-OUTPUT-ERROR
				// NO-OUTPUT-ERROR: error: Cannot determine AMDGPU architecture{{.*}}Exited with error code 1. Consider passing it via --march

				// case when amdgpu_arch returns multiple gpus but all are different
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_different %s 2>&1 \
				// RUN: | FileCheck %s --check-prefix=MULTIPLE-OUTPUT-ERROR
				// MULTIPLE-OUTPUT-ERROR: error: Cannot determine AMDGPU architecture: Multiple AMD GPUs found with different archs. Consider passing it via --march

				// case when amdgpu_arch does not return anything with successful execution
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_empty %s 2>&1 \
				// RUN: | FileCheck %s --check-prefix=EMPTY-OUTPUT
				// EMPTY-OUTPUT: error: Cannot determine AMDGPU architecture: No AMD GPU detected in the system. Consider passing it via --march

clang/test/Driver/amdgpu-openmp-system-arch.c

This file was added.

				// REQUIRES: system-linux
				// REQUIRES: x86-registered-target
				JonChesterfield (inline comment, Done): This seems fairly clear. We might want to drop the fcuda-is-device from the regex as it doesn't matter for the test. Failing cases (maybe driven by a separate C file) are probably: print nothing, return 0; print nothing, return 1; print two different strings, return 0.
				// REQUIRES: amdgpu-registered-target
				// REQUIRES: shell

				// RUN: mkdir -p %t
				// RUN: rm -f %t/amdgpu_arch_gfx906
				// RUN: cp %S/Inputs/amdgpu-arch/amdgpu_arch_gfx906 %t/
				// RUN: cp %S/Inputs/amdgpu-arch/amdgpu_arch_gfx908_gfx908 %t/
				// RUN: chmod +x %t/amdgpu_arch_gfx906
				// RUN: chmod +x %t/amdgpu_arch_gfx908_gfx908

				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_gfx906 %s 2>&1 \
				// RUN: | FileCheck %s
				// CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "[[GFX:gfx906]]"
				// CHECK: llvm-link{{.*}}"-o" "{{.*}}amdgpu-openmp-system-arch-{{.*}}-[[GFX]]-linked-{{.*}}.bc"
				// CHECK: llc{{.*}}amdgpu-openmp-system-arch-{{.*}}-[[GFX]]-linked-{{.*}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=[[GFX]]" "-filetype=obj" "-o"{{.*}}amdgpu-openmp-system-arch-{{.*}}-[[GFX]]-{{.*}}.o"

				// case when amdgpu_arch returns multiple gpus but of same arch
				// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib --amdgpu-arch-tool=%t/amdgpu_arch_gfx908_gfx908 %s 2>&1 \
				// RUN: | FileCheck %s --check-prefix=CHECK-MULTIPLE
				// CHECK-MULTIPLE: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "[[GFX:gfx908]]"
				// CHECK-MULTIPLE: llvm-link{{.*}}"-o" "{{.*}}amdgpu-openmp-system-arch-{{.*}}-[[GFX]]-linked-{{.*}}.bc"
				// CHECK-MULTIPLE: llc{{.*}}amdgpu-openmp-system-arch-{{.*}}-[[GFX]]-linked-{{.*}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=[[GFX]]" "-filetype=obj" "-o"{{.*}}amdgpu-openmp-system-arch-{{.*}}-[[GFX]]-{{.*}}.o"

clang/tools/CMakeLists.txt

	# subdirectory. It contains tools developed as part of the Clang/LLVM project			# subdirectory. It contains tools developed as part of the Clang/LLVM project
	# on top of the Clang tooling platform. We keep them in a separate repository			# on top of the Clang tooling platform. We keep them in a separate repository
	# to keep the primary Clang repository small and focused.			# to keep the primary Clang repository small and focused.
	# It also may be included by LLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR.			# It also may be included by LLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR.
	add_llvm_external_project(clang-tools-extra extra)			add_llvm_external_project(clang-tools-extra extra)

	# libclang may require clang-tidy in clang-tools-extra.			# libclang may require clang-tidy in clang-tools-extra.
	add_clang_subdirectory(libclang)			add_clang_subdirectory(libclang)

				add_clang_subdirectory(amdgpu-arch)

clang/tools/amdgpu-arch/AMDGPUArch.cpp

This file was added.

				//===- AMDGPUArch.cpp - list AMDGPU installed ----------- C++ ----------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements a tool for detecting name of AMDGPU installed in system
				// using HSA. This tool is used by AMDGPU OpenMP driver.
				//
				//===----------------------------------------------------------------------===//

				#include <hsa.h>
				#include <string>
				#include <vector>

				static hsa_status_t iterateAgentsCallback(hsa_agent_t Agent, void *Data) {
				hsa_device_type_t DeviceType;
				hsa_status_t Status =
				hsa_agent_get_info(Agent, HSA_AGENT_INFO_DEVICE, &DeviceType);

				// continue only if device type is GPU
				if (Status != HSA_STATUS_SUCCESS || DeviceType != HSA_DEVICE_TYPE_GPU) {
				return Status;
				}

				std::vector<std::string> *GPUs =
				static_cast<std::vector<std::string> *>(Data);
				char GPUName[64];
				Status = hsa_agent_get_info(Agent, HSA_AGENT_INFO_NAME, GPUName);
				if (Status != HSA_STATUS_SUCCESS) {
				return Status;
				JonChesterfield (Done): Simpler code if we drop the class and pass in the vector<string> itself as the void*
				}
				GPUs->push_back(GPUName);
				JonChesterfield (Not Done): Does this null terminate for any length of GPU name? Wondering if we should explicitly zero out the last char.
				pdhaliwal (Author, Done): Checked the rocr-runtime, the output is null terminated.
				return HSA_STATUS_SUCCESS;
				}

				int main() {
				hsa_status_t Status = hsa_init();
				if (Status != HSA_STATUS_SUCCESS) {
				return 1;
				}
				JonChesterfield (Not Done): missed one, return 1 on failure to initialize

				std::vector<std::string> GPUs;
				Status = hsa_iterate_agents(iterateAgentsCallback, &GPUs);
				JonChesterfield (Done): Unsure these should be writing to stderr. We capture stdout, stderr probably goes to the user. We could exit 1 instead as clang is going to treat any failure to guess the arch identically
				pdhaliwal (Author, Done): Remove fprintf
				if (Status != HSA_STATUS_SUCCESS) {
				return 1;
				}

				for (const auto &GPU : GPUs)
				printf("%s\n", GPU.c_str());
				JonChesterfield (Not Done): This strategy looks good. Briefly considered whether we should print while iterating, but that will misbehave if there is an error after printing the first gpu. unsigned is strictly the wrong type here, though it doesn't matter in practice. Could go with a range based for loop instead.

				if (GPUs.size() < 1)
				return 1;

				hsa_shut_down();
				return 0;
				}
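
The review threads on this file touch on three points: passing the vector<string> itself through the void* argument, guaranteeing null termination of the name buffer, and printing only after iteration has succeeded. The sketch below illustrates them without depending on HSA. iterateFakeAgents and collectName are hypothetical stand-ins that only mimic the shape of hsa_iterate_agents and its callback, and the explicit zero-initialised buffer is a defensive choice for illustration, not what the patch does (the author states that rocr-runtime already null terminates the name).

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a C-style iteration API: the callback receives
// one raw agent name plus an opaque user pointer, and returns 0 on success.
using AgentCallback = int (*)(const char *RawName, void *Data);

// Hypothetical driver loop: stop at the first callback failure and
// propagate its status, as a C iteration API typically would.
int iterateFakeAgents(const std::vector<const char *> &Agents,
                      AgentCallback Callback, void *Data) {
  for (const char *Agent : Agents) {
    int Status = Callback(Agent, Data);
    if (Status != 0)
      return Status;
  }
  return 0;
}

// The opaque pointer is the result vector itself, so no wrapper class is
// needed. Names are collected here and printed by the caller only after the
// whole iteration has succeeded.
int collectName(const char *RawName, void *Data) {
  if (!RawName)
    return 1; // simulate a failed name query
  auto *GPUs = static_cast<std::vector<std::string> *>(Data);
  char Name[64] = {0}; // zero-filled, so the last byte stays '\0'
  std::strncpy(Name, RawName, sizeof(Name) - 1);
  GPUs->push_back(Name);
  return 0;
}
```

A caller would pass the address of a std::vector<std::string> as Data, check the returned status, and print the collected names only when the status is 0, so that a failure partway through never leaves partial output on stdout.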

clang/tools/amdgpu-arch/CMakeLists.txt

This file was added.

				# //===----------------------------------------------------------------------===//
				# //
				# // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				# // See https://llvm.org/LICENSE.txt for details.
				# // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				# //
				# //===----------------------------------------------------------------------===//

				find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)
				if (NOT ${hsa-runtime64_FOUND})
				yaxunl (Done): This tool does not use omp header file. Why is this needed?
				gregrodgers (Not Done): What happens when /opt/rocm is not available? Again, we need a cross-architecture mechanism to identify the offload-arch.
				JonChesterfield (Not Done): Exactly the same as the amdgpu plugin. The cmake detection is char for char identical. This will look in CMAKE_INSTALL_PREFIX, which is where I install these libs when using trunk, and falls back to /opt/rocm which seems to be convenient for some users.
				JonChesterfield (Not Done): Which may need revising at some point - I like installing hsa as if it was an llvm subcomponent, but other people might want a different convention. As long as we remember to change this file + amdgpu's cmake at the same time, all good.
				t-tye (Not Done): /opt/rocm will not work with the side-by-side ROCm installation which installs ROCm in directories with the version number. Should there be the ability to configure this?
				JonChesterfield (Not Done): If /opt/rocm isn't a safe out of the box option for finding rocm, let's remove that from here and the amdgpu plugin. Pre-existing hazard so doesn't need to block this patch. I'm installing roct and rocr to the same CMAKE_INSTALL_PREFIX as llvm which is why the first clause works out. How are the rocm components meant to find the corresponding pieces? If there's a rocm cmake install dir variable we could add that to the hints.
				message(STATUS "Not building amdgpu-arch: hsa-runtime64 not found")
				return()
				JonChesterfield (Not Done): Assuming this matches the logic in amdgpu openmp plugin
				endif()

				add_clang_tool(amdgpu-arch AMDGPUArch.cpp)

				clang_target_link_libraries(amdgpu-arch PRIVATE hsa-runtime64::hsa-runtime64)
