Differential D153725
[clang] Make amdgpu-arch tool work on Windows
Authored by yaxunl on Jun 25 2023, 10:22 AM.

Details

Currently the amdgpu-arch tool detects AMD GPUs by dynamically loading the HSA runtime shared library and using HSA APIs, which are not available on Windows. This patch makes it work on Windows by dynamically loading the HIP runtime DLL and using HIP APIs.
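For context, a minimal sketch of the Windows detection approach described above is shown below. It is not the patch's code: the DLL name amdhip64.dll and the hipGetDeviceCount entry point are the public HIP runtime interface, while the error handling, the output format, and the omission of the gcnArchName query the real tool performs are simplifications for illustration.

```cpp
// Minimal sketch, not the patch itself: detect AMD GPUs on Windows by loading
// the HIP runtime DLL at run time, so the tool needs neither HSA (which is not
// available on Windows) nor a link-time dependency on the HIP runtime.
#include <windows.h>
#include <cstdio>

// hipGetDeviceCount returns a hipError_t (an enum); 0 means hipSuccess.
using hipGetDeviceCount_t = int (*)(int *);

int main() {
  // The HIP runtime ships with the AMD driver / HIP SDK as amdhip64.dll.
  HMODULE HipRuntime = LoadLibraryA("amdhip64.dll");
  if (!HipRuntime) {
    std::fprintf(stderr, "failed to load amdhip64.dll\n");
    return 1;
  }

  auto GetDeviceCount = reinterpret_cast<hipGetDeviceCount_t>(
      GetProcAddress(HipRuntime, "hipGetDeviceCount"));
  if (!GetDeviceCount) {
    std::fprintf(stderr, "hipGetDeviceCount not found\n");
    return 1;
  }

  int DeviceCount = 0;
  if (GetDeviceCount(&DeviceCount) != 0) {
    std::fprintf(stderr, "hipGetDeviceCount failed\n");
    return 1;
  }

  // The real amdgpu-arch additionally resolves hipGetDeviceProperties and
  // prints each device's gcnArchName (e.g. gfx906); this sketch only reports
  // the device count to stay self-contained.
  std::fprintf(stdout, "found %d AMD GPU device(s)\n", DeviceCount);
  return 0;
}
```

This mirrors how the existing tool already loads the HSA runtime dynamically on Linux rather than linking against it.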
Event Timeline

Comment Actions
A lot of CMake relies on this just being an ordered list of architectures, so we'd probably need to make that an opt-in thing.
Comment Actions
Using the HIP runtime lets us report xnack and ecc. It reports the target ID including the accurate xnack and ecc features, which is ready to be used by clang (e.g. gfx906:sramecc+:xnack- rather than plain gfx906).
Comment Actions
Also, w.r.t. target-id, I'm wondering what a good solution would be. Right now the main usage of amdgpu-arch is both to detect the -mcpu / -march in CMake and to fill in the architecture via --offload-arch=native or -fopenmp-target=amdgcn-amd-amdhsa. We may want to add a flag to specify whether to include target-id information in the reported architectures.
Comment Actions
The right thing to do on Linux for this is to query the driver directly. That is, the kernel should populate some string under /sys that we read. That isn't yet implemented. Does Windows happen to have that functionality available? (I landed here while trying to work out why tests aren't running, because we now print errors about failing to load libamdhip64.so when HSA fails.)
Comment Actions
It should definitely not do that. That's what this redundant thing does. The kernel doesn't know the names of these devices. The kernel knows different names that map to PCI IDs, which are not the same as the gfx numbers. The compiler should not be responsible for maintaining yet another name-mapping table and should go through a real API.

This sounds very un-Windows-like. I assume the equivalent is digging around in the registry.
Comment Actions
There are a lot of PCI-ID-to-gfx906-style tables lying around already. There used to be one in roct; last time I looked, people wanted to move that somewhere else. I don't really want to copy/paste it. The problem with using the proper API via HSA or similar is twofold:
Comment Actions
I don't follow this. You don't need this to work to perform the build and test build. You may need it to execute the tests, but if HSA doesn't exist they won't be able to run anyway. If a build is invoking these tools at CMake time, it's just broken.

Comment Actions
Jon is probably referring to a recurring problem we've noticed with the libc tests on HSA: they will sometimes fail when running with multiple threads, see https://lab.llvm.org/staging/#/builders/247/builds/2599/steps/10/logs/stdio. I haven't been able to track down whether or not that's a bug in the implementation or the interface somewhere.

Comment Actions
And the libomptarget build is in fact doing that, but it shouldn't have to. What it's doing actually seems really unreasonable: it's only building the locally found targets when it should be building all targetable devices. The inconvenience there is that's too many devices, so as a build-time hack you should be able to opt in to a restricted subset. Even better would be if we only built a copy for a reasonable subset of targets (i.e. one per generation where there's actually some semblance of compatibility). Or we could just capitulate and rely on the hacks the device libs do.

Comment Actions
It definitely annoys me. The argument is that you can't usefully run some large number N of programs at the same time anyway, and the driver failing to open is a rate limit. The problem is there are things we could usefully do, like this query, without needing to run a kernel as well. The net effect is we don't run tests widely in parallel, because they fail if we do, for this and possibly other reasons.

Comment Actions
The test detection is an awkward compromise between people who want to run the GPU tests and people who don't, and reflects the diverse hardware in use and variation in whether CUDA / HSA are installed.

Comment Actions
The libomptarget build uses it mostly to determine whether it should build the tests; we don't want to configure tests for a system that cannot support them. The libc tests, however, require it to set the architecture for their test configuration, since we can't support multiple test architectures at the same time; that required too much work so I shelved it. We more or less just say "If you've got HSA / CUDA we expect to run tests".
Doesn't LLVM know if it's being built for Windows? Maybe we should key off of that instead and then conditionally add_sources for a single function that satisfies the same "print all the architectures" thing.
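For what it's worth, a hypothetical sketch of that kind of split is below: one detection entry point, with a HIP-backed body compiled on Windows and an HSA-backed body elsewhere. The helper contents are placeholders, and the review comment is really suggesting doing the equivalent selection in CMake by conditionally adding sources.

```cpp
// Hypothetical sketch only: select the detection backend at compile time based
// on the target OS, rather than probing both runtimes at run time.
#include <cstdio>
#include <string>
#include <vector>

#ifdef _WIN32
static std::vector<std::string> detectAMDGPUArchs() {
  // Windows build: would dynamically load amdhip64.dll and query devices via
  // HIP, as sketched earlier; placeholder result here.
  return {};
}
#else
static std::vector<std::string> detectAMDGPUArchs() {
  // Non-Windows build: would dynamically load the HSA runtime and enumerate
  // GPU agents via HSA; placeholder result here.
  return {};
}
#endif

int main() {
  // Print one architecture per line, matching the "ordered list of
  // architectures" format that the CMake consumers expect.
  for (const std::string &Arch : detectAMDGPUArchs())
    std::printf("%s\n", Arch.c_str());
  return 0;
}
```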