Page MenuHomePhabricator

[OpenMP] Add OpenMP offloading toolchain for AMDGPU
ClosedPublic

Authored by pdhaliwal on Jan 19 2021, 3:45 AM.

Details

Summary

This patch adds AMDGPUOpenMPToolChain for supporting OpenMP
offloading to AMD GPU's.

Originally authored by Greg Rodgers

Diff Detail

Unit TestsFailed

TimeTest
450 msx64 debian > Clang.Driver::openmp-offload-gpu.c
Script: -- : 'RUN: at line 17'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_35 -Xopenmp-target -march=sm_60 /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/openmp-offload-gpu.c 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefix=CHK-FOPENMP-TARGET-ARCHS /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/openmp-offload-gpu.c
1,000 msx64 debian > Clang.Driver::openmp-offload.c
Script: -- : 'RUN: at line 16'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -### -fopenmp=libomp -fopenmp-targets=aaa-bbb-ccc-ddd /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/openmp-offload.c 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefix=CHK-INVALID-TARGET /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/openmp-offload.c
650 msx64 windows > Clang.Driver::amdgpu-openmp-toolchain.c
Script: -- : 'RUN: at line 2'; c:\ws\w16c2-1\llvm-project\premerge-checks\build\bin\clang.exe -### -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx906 C:\ws\w16c2-1\llvm-project\premerge-checks\clang\test\Driver\amdgpu-openmp-toolchain.c 2>&1 | c:\ws\w16c2-1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w16c2-1\llvm-project\premerge-checks\clang\test\Driver\amdgpu-openmp-toolchain.c
630 msx64 windows > Clang.Driver::openmp-offload-gpu.c
Script: -- : 'RUN: at line 17'; c:\ws\w16c2-1\llvm-project\premerge-checks\build\bin\clang.exe -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_35 -Xopenmp-target -march=sm_60 C:\ws\w16c2-1\llvm-project\premerge-checks\clang\test\Driver\openmp-offload-gpu.c 2>&1 | c:\ws\w16c2-1\llvm-project\premerge-checks\build\bin\filecheck.exe -check-prefix=CHK-FOPENMP-TARGET-ARCHS C:\ws\w16c2-1\llvm-project\premerge-checks\clang\test\Driver\openmp-offload-gpu.c
3,900 msx64 windows > Clang.Driver::openmp-offload.c
Script: -- : 'RUN: at line 16'; c:\ws\w16c2-1\llvm-project\premerge-checks\build\bin\clang.exe -### -fopenmp=libomp -fopenmp-targets=aaa-bbb-ccc-ddd C:\ws\w16c2-1\llvm-project\premerge-checks\clang\test\Driver\openmp-offload.c 2>&1 | c:\ws\w16c2-1\llvm-project\premerge-checks\build\bin\filecheck.exe -check-prefix=CHK-INVALID-TARGET C:\ws\w16c2-1\llvm-project\premerge-checks\clang\test\Driver\openmp-offload.c

Event Timeline

pdhaliwal created this revision.Jan 19 2021, 3:45 AM
pdhaliwal requested review of this revision.Jan 19 2021, 3:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 19 2021, 3:45 AM

This patch was written, roughly, by:

  • copying the known-working openmp driver from rocm into the trunk source tree
  • deleting lots of stuff that didn't look necessary
  • deleting some stuff that is broadly necessary, but the specifics are up for debate

The idea is to move language-independent but amdgcn-specific code into ROCMToolChain. Some has already gone in, others (like computeMSVCVersion) will likely move too.

Regarding the rest of the end to end stack:

  • host plugin works, same code in trunk / rocm / aomp
  • device plugin will work once it's building as openmp, modulo printf and malloc
  • compiler backend will work for spmd kernels today, will work for generic kernels after D94648 or equivalent lands

Regarding tests (which need the unimplemented bit filled in with the next patch):

  • Runtime tests (for spmd kernels) are working locally with a jury rigged devicertl
  • Codegen tests are proving awkward to update. Dropping line number would help, but there's still a difference in addrspace cast distribution. I'm hoping the scripts involved in generating the nvptx cases can be adapted.
pdhaliwal updated this revision to Diff 317553.Jan 19 2021, 6:56 AM

Fix clang-tidy error

Won't this just prevent us from building clang due to the missing cmake changes? We need somewhat testable chunks.

This patch was written, roughly, by:

  • copying the known-working openmp driver from rocm into the trunk source tree
  • deleting lots of stuff that didn't look necessary
  • deleting some stuff that is broadly necessary, but the specifics are up for debate

The idea is to move language-independent but amdgcn-specific code into ROCMToolChain. Some has already gone in, others (like computeMSVCVersion) will likely move too.

Regarding the rest of the end to end stack:

  • host plugin works, same code in trunk / rocm / aomp
  • device plugin will work once it's building as openmp, modulo printf and malloc
  • compiler backend will work for spmd kernels today, will work for generic kernels after D94648 or equivalent lands

Regarding tests (which need the unimplemented bit filled in with the next patch):

  • Runtime tests (for spmd kernels) are working locally with a jury rigged devicertl
  • Codegen tests are proving awkward to update. Dropping line number would help, but there's still a difference in addrspace cast distribution. I'm hoping the scripts involved in generating the nvptx cases can be adapted.

Could you add the tests for the tools invocation to check that the newly added classes/functions correctly translate options/flags?

pdhaliwal updated this revision to Diff 317810.Jan 20 2021, 1:49 AM

Won't this just prevent us from building clang due to the missing cmake changes?

It compiles and builds fine, however, I wasn't actually aware such sanity checking being present. It turns out
the unknown files inside llvm/ will lead cmake to report error but such reporting will not happen inside clang. Maybe such checks
were not enabled inside clang. Anyways thanks for pointing out. I will keep that in mind in future.

The idea for this patch was basically to introduce AMDGPUToolChain classes without much of the functionality in order
to keep its size in check. And the second patch would have integrated the toolchain with driver along with testing.
But during the intermediate time of the two patches, bare files would have existed (never built and tested).

I have updated this patch to now include somewhat functional driver along with tests.

pdhaliwal retitled this revision from [OpenMP] Add OpenMP offloading toolchain skeleton for AMDGPU to [OpenMP] Add OpenMP offloading toolchain for AMDGPU.Jan 20 2021, 2:16 AM
pdhaliwal edited the summary of this revision. (Show Details)
pdhaliwal updated this revision to Diff 317831.Jan 20 2021, 4:09 AM

Fixed failing debian tests

pdhaliwal updated this revision to Diff 318120.EditedJan 20 2021, 11:51 PM
  • Moved common methods of HIP and OpenMP to base AMDGPUToolChain
  • Removed unnecessary asserts

The deviceRTL is changing from cuda/hip to openmp at present. It would be good to be able to compile that as openmp for amdgpu, which needs a patch roughly like this and probably some follow up.

It's plausible that the contents of this file are only really of interest to AMD people, in which case we should probably land it in order to iterate in tree instead of downstream. It now has the cmake hook and a proof of concept test case.

@ABataev @grokos @jdoerfert your guidance would be very welcome, but equally if you'd prefer leave us to sink or swim that's completely reasonable too.

Thanks!

LGTM. But, please wait from someone outside AMD to accept it.

Overall, looks reasonable. I'm not a fan of the way things are hooked up but that is also true for the NVPTX toolchain and for now unavoidable I guess. Some nits below.

clang/lib/Driver/Driver.cpp
2999

Why the substr copy?

clang/lib/Driver/ToolChains/AMDGPU.h
72 ↗(On Diff #318120)

What is the coding style here? Pick upper case or lower case, form the above I assume upper case.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66

To verify, O0 is not mapped to O2, correct?

Overall, looks reasonable. I'm not a fan of the way things are hooked up but that is also true for the NVPTX toolchain and for now unavoidable I guess.

This is loosely based on the nvptx and hip toolchain files, but looking at others in clang the style seems similar.

clang/lib/Driver/Driver.cpp
2999

As in VStr.startswith("-march=gfx")? Slightly prettier. Not sure ad hoc arg parsing like this is ever good though

clang/lib/Driver/ToolChains/AMDGPU.h
72 ↗(On Diff #318120)

Yeah, should be supportsProfiling() as it's a function. Should probably propose that as a minimal patch to the existing code. There may be other casing mistakes. ConstructJob looks suspect, but all the files call it that instead of constructJob, so maybe I'm missing a part of the style guide.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66

I think opt -O0 is an error, though it does look like this will rewrite it to O2 instead. Which seems bad.

JonChesterfield added inline comments.Feb 1 2021, 8:21 AM
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66

Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are passed to clang as a single invocation, but I don't see why we need an opt invocation here. Passing the llvm-link'ed code to llc as-is, without this implicit lto style opt invocation, is probably a better default.

I believe this is 'safe', in the sense that it can't break anything other than AMDGPU's openmp implementation, which doesn't work yet anyway. The change to non-amd code is to OpenMPActionBuilder, and while parsing -march=gfx isn't pretty, it won't fire on nvptx. I'm not particularly confident it is correct, but it's hard to do comprehensive testing before there's an end to end spike. I'd like to iterate in tree, with an expectation that parts of this will need to change - @jdoerfert does the OpenMP patch look adequate for now?

clang/lib/Driver/Driver.cpp
755

This follows the same logic as nvptx so is likely to be correct

Generally OK, the more I look at it the more things pop up. Other than my comments, I'm fine with this going in.

clang/lib/Driver/Driver.cpp
767

what if we say TT.isNVPTX() || TT.isAMDGCN() in line 745, rename the Cuda things to "device" and the toolchain stuff is in a conditional? I see the triple is hardcoded here for some reason and I don't know why not to use TT. Neverhteless, it seems less copying is better, we will eventually have more architectures and 3 * 13 lines means 3 times the bugs and 3 times the required changes in the future.

3004

This does not look right. IsAMDGCN is not a question that should be answered by a -Xopenmp-target flag.

In general, why skip these passes here. Isn't what you really want to do, pass -emit-llvm[-bc] to the device compilation job?

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
67

Just to make it clear, we should default to O0.

clang/test/Driver/amdgpu-openmp-toolchain.c
3

Would this at all work without the march flag?

pdhaliwal updated this revision to Diff 320722.Feb 2 2021, 2:39 AM

Addressed review comments.

  • Combined the toolchain creation logic for nvptx and amdgcn
  • Replaced -Xopenmp-target with -emit-llvm-bc inside AMDGPUOpenMP.cpp
  • Removed opt from pipeline
pdhaliwal marked 3 inline comments as done.Feb 2 2021, 2:47 AM

After addressing the review comments, I have internally verified changes on few simple test programs. They seem to be working fine.

clang/lib/Driver/Driver.cpp
3004

I have removed the logic from here instead now I have added -emit-llvm-bc using addClangTargetOptions in AMDGPUOpenMP.

clang/test/Driver/amdgpu-openmp-toolchain.c
3

This would not work without -march flag as amdgcn is not forward/backward compatible and there is currently no way to detect system gpu arch.

JonChesterfield added inline comments.Feb 2 2021, 3:03 AM
clang/test/Driver/amdgpu-openmp-toolchain.c
3

The command line invocation openmp needs -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 could be friendlier.

Auto-detecting march is convenient in general but probably bad for clang tests as we want them to run on systems that don't have an amdgpu installed.

Looks much simpler, thanks!

clang/lib/Driver/Driver.cpp
758

Minor suggestion,

if (TT.isNVPTX() {
 ...
} else {
  assert(TT.isAMDGCN());
  ...
}

? Semantically equivalent, but if () else if () looks like there is a missing else clause.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
44

Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably the same logic in another toolchain

65

Default("0") and update the comment 'we map anything else'

pdhaliwal updated this revision to Diff 320740.Feb 2 2021, 3:58 AM
  • Use 0 for default -O option
  • Rename addOptLevelArgs to addLLCOptArg
clang/lib/Driver/Driver.cpp
758

I think, having assert in last else is bit cleaner. What do you think?

JonChesterfield added inline comments.Feb 2 2021, 4:02 AM
clang/lib/Driver/Driver.cpp
758

I also prefer the if () else {assert()} construct.

jdoerfert accepted this revision.Feb 2 2021, 10:50 AM

fine with me, assuming this doesn't disturb anything outside the AMD pipeline.

This revision is now accepted and ready to land.Feb 2 2021, 10:50 AM