This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Allow compiling multiple target architectures with OpenMP
ClosedPublic

Authored by jhuber6 on Apr 30 2022, 4:04 PM.

Details

Summary

This patch adds support for OpenMP to use the --offload-arch and
--no-offload-arch options. Traditionally, OpenMP has only supported
compiling for a single architecture via the -Xopenmp-target option.
Now we can pass in a bound architecture and use that if given, otherwise
we default to the value of the -march option as before.

Note that this only applies the basic support, the OpenMP target runtime
does not yet know how to choose between multiple architectures.
Additionally other parts of the offloading toolchain (e.g. LTO) require
the -march option, these should be worked out later.

Diff Detail

Event Timeline

jhuber6 created this revision.Apr 30 2022, 4:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2022, 4:04 PM
jhuber6 requested review of this revision.Apr 30 2022, 4:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2022, 4:04 PM
saiislam added inline comments.May 1 2022, 10:41 PM
clang/test/Driver/amdgpu-openmp-toolchain-new.c
6

Wouldn't it be better if the user is not required to specify the triple in this shorthand version? We can infer the triple from the GPUArch. We have this support in our downstream branch.

clang  --target=x86_64-unknown-linux-gnu -fopenmp --offload-arch=gfx906 helloworld.c -o helloworld
jhuber6 added inline comments.May 2 2022, 5:15 AM
clang/test/Driver/amdgpu-openmp-toolchain-new.c
6

We could, HIP and CUDA both use some kind of getAMDOffloadTargetTriple. I guess in this case we would consider OpenMP offloading active if the user specified -fopenmp and --offload-arch? I could do this in a separate patch.

saiislam added inline comments.May 2 2022, 5:20 AM
clang/test/Driver/amdgpu-openmp-toolchain-new.c
6

Yes, exactly. OpenMP offloading should be active when -fopenmp and --offload-arch both are present.

Thank you!

saiislam added inline comments.May 2 2022, 5:28 AM
clang/test/Driver/amdgpu-openmp-toolchain-new.c
6

Following code might be useful for your patch (it assumes that OffloadArch is associated with each device tool chain so that multiple archs of same triple can be compiled together):

  1. GetTargetInfoFromOffloadArch()
  2. Driver::GetTargetInfoFromMarch()
  3. Driver::GetTargetInfoFromOffloadArchOpts()
  4. modified definition of Driver::CreateOffloadingDeviceToolChains()
jhuber6 added inline comments.May 2 2022, 5:41 AM
clang/test/Driver/amdgpu-openmp-toolchain-new.c
6

I'll look into it, I was thinking of a good way to specify architectures per triple. So we could theoretically have --offload-arch=sm_70 and --offload_arch=gfx908 work in unison and it might just be easy to group the triples from the architecture.

saiislam added inline comments.May 2 2022, 6:44 AM
clang/test/Driver/amdgpu-openmp-toolchain-new.c
6

Along with this, we would also like to support --offload-arch=gfx906 and --offload-arch=gfx908 in the same command.

jhuber6 added inline comments.May 2 2022, 6:54 AM
clang/test/Driver/amdgpu-openmp-toolchain-new.c
6

This patch already supports that, we'll compile for all the architectures and they'll all end up linked in the linker wrapper. What's missing is the changes to select an appropriate image in the libomptarget runtime.

jhuber6 updated this revision to Diff 426521.May 2 2022, 2:29 PM

Changing slightly, I'm using the getArgsForToolchain to only get the --offload-arch options for that toolchain. This lets us quality it with options like -Xopenmp-target= so we can now specify architectures per-toolchain without it causing an error. For example, the following should work:

clang input.c -fopenmp -fopenmp-targets=nvptx64,amdgcn -Xopenmp-targets=amdgcn --offload-arch=gfx803 -Xopenmp-targets=nvptx64 --offload-arch=sm_70 -c
tra added a comment.May 2 2022, 3:35 PM
clang/test/Driver/openmp-offload-gpu-new.c
56

You may want to add a test that when no --offload-arch= is specified, driver makes a reasonable default choice for both nvptx and amdgpu.

jhuber6 added inline comments.May 2 2022, 3:43 PM
clang/test/Driver/openmp-offload-gpu-new.c
56

OpenMP uses a different default from CUDA / HIP. If not specified, it will use the CLANG_OPENMP_NVPTX_DEFAULT_ARCH definition (Should be set automatically when CMake configures Clang) or the amdgpu-arch tool and pass it in via the -march= option. This is indicated by passing an empty StringRef to the bound architecture while CUDA / HIP use some default. Elsewhere, if we see the BoundArchitecture is empty we just get the value of the -march option instead. This should be covered by some other tests and this doesn't change the default behaviour.

tra accepted this revision.May 5 2022, 3:47 PM

LGTM in general.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
313

I'd change checkSystemForAMDGPU to return the Arch or empty string.

I'd also simplify the code to something like this:

std::string Arch = DAL->getLastArgValue(options::OPT_march_EQ).str();
if (Arch.empty()) {
  Arch = !BoundArch.empty() ? BoundArch :  checkSystemForAMDGPU(Args, *this);
  DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), Arch);
}
This revision is now accepted and ready to land.May 5 2022, 3:47 PM
jhuber6 added inline comments.May 5 2022, 4:46 PM
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
313

Having checkSystemForAMDGPU cause errors when trying to test this was a pain. It'll take a bit more plumbing to address that so I think I'll leave this as-is for this patch and address it in a follow-up.