This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][OpenMP] Support linking of math libraries
ClosedPublic

Authored by pdhaliwal on Jul 14 2021, 7:36 AM.

Details

Summary

Math libraries are linked only when -lm is specified. This is because
host system could be missing rocm-device-libs.

Diff Detail

Event Timeline

pdhaliwal created this revision.Jul 14 2021, 7:36 AM
pdhaliwal requested review of this revision.Jul 14 2021, 7:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2021, 7:36 AM
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
260

I recognise this comment. Is this a bunch of logic that can be moved into the base class and then called from here and hip?

pdhaliwal added inline comments.Jul 14 2021, 7:55 AM
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
260

This is copied (after removing stuff related to opencl) from https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/AMDGPU.cpp#L841
I wanted to make call to ROCMToolChain::addClangTargetOptions, but there is some extra logic in it which is irrelevant to OpenMP. I will move the library linking into a separate common method as you suggest.

pdhaliwal updated this revision to Diff 358614.Jul 14 2021, 8:18 AM

Move linking logic to a common method.

JonChesterfield added inline comments.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
275

Gone searching. This stuff has already been copied & pasted between AMDGPU.cpp and HIP.cpp. And diverged, looks like HIP has gained a bunch of inverse flags that AMDGPU has not, and some flags are duplicated (OPT_cl_fp32_correctly_rounded_divide_sqrt, OPT_fhip_fp32_correctly_rounded_divide_sqrt).

@b-sumner / @t-tye / @yaxunl / @scchan

I'd like to suggest that we use the same handling of these flags on opencl / hip / c++ openmp. Since the names have diverged and we don't want to break backwards compatibility, I'd like to check for all three (hip, no_hip, cl) flags for each one, with last one wins semantics, and do that from a single method called by all the amdgpu language drivers.

That way hip and opencl will continue working as they do today, except hip will additionally accept some opencl flags and do the right thing, and likewise opencl.

If that is unacceptable, the near future involves OPT_fopenmp_fp32_correctly_rounded_device_sqrt and similar, and I'd much rather use the same path for all the languages than copy&paste again.

@pdhaliwal getCommonBitcodeLibs will splice in ockl as well as ocml and dependencies. OpenMP doesn't call into ockl so I'd like to avoid that, preferably by moving the ockl append out of getCommonBitcodeLibs

@amdgpu in general can we get rid of this oclc_isa_version_xxx.bc file, which only defines a single constant in each case
@__oclc_ISA_version = linkonce_odr protected local_unnamed_addr addrspace(4) constant i32 9006, align 4
in favour of emitting that linkonce_odr symbol from clang? Clang already knows which gpu it is compiling for, and uses that to find the file with the corresponding name on disk, so it could instead emit that symbol, letting us drop the O(N) tiny files from the install tree and slightly improving compile time?

yaxunl added inline comments.Jul 14 2021, 9:07 AM
clang/lib/Driver/ToolChains/AMDGPU.cpp
910–920

I agree that we'd better refactor this part as a common function used by HIP/OpenMP/OpenCL. However I disagree to let HIP use OpenCL options. HIP uses clang generic options or HIP/CUDA shared options to control these flags. I think OpenMP may consider using similar options as HIP instead of OpenCL options.

clang/lib/Driver/ToolChains/AMDGPU.cpp
910–920

It's a mixture. Some flags look generic, some have hip in the name. My preference would be to have a generic named flag for each one, and alias hip/opencp/cuda onto the generic one.

Aliasing the flags would mean what works today continues working while simplifying the control flow in clang. Keeping flags with different names that do the same thing is certainly possible but doesn't seem a feature. It makes clang more complicated in order to make user build scripts more complicated.

yaxunl added inline comments.Jul 14 2021, 9:47 AM
clang/lib/Driver/ToolChains/AMDGPU.cpp
910–920

OpenCL options are defined by OpenCL spec. There may be difficulties to alias them to other options.

I am OK to alias HIP options to more generic options.

clang/lib/Driver/ToolChains/AMDGPU.cpp
910–920

Works for me. We could fold HIP and OpenMP together, both using generic options, and then leave OpenCL to adopt the same if they wish.

pdhaliwal updated this revision to Diff 361967.Jul 27 2021, 3:48 AM

Extract the options from HIP/OpenMP to a common method in base class.

pdhaliwal added inline comments.Jul 27 2021, 3:51 AM
clang/lib/Driver/ToolChains/AMDGPU.cpp
923–924

I wanted to rename these to something generic like -fgpu-fp32.... but due to some weird reason aliasing wasn't working. Anyhow, my suggestion is to make that change in separate patch.

I don't like that this pulls in ockl automatically but don't think that's a blocker. OK on my side, @yaxunl?

Due to the current state of math headers, I was unable to test this patch without ockl. But last time when headers were working, I was actually required to link ockl for a symbol (I forgot the name). I will update once I am able to get the math headers work again.

yaxunl added inline comments.Jul 28 2021, 7:12 AM
clang/lib/Driver/ToolChains/AMDGPU.cpp
831–860

I think we'd better absorb this part into the newly added function getCommonDeviceLibOptions so that we have a centralized location for determining device libs. We could use offload kind of the toolchain to differentiate between OpenCL/HIP/OpenMP.

clang/lib/Driver/ToolChains/AMDGPU.cpp
831–860

getCommonBitcodeLibs is called by opencl with some other set of constraints around argument names.

Persuading opencl to use the same arguments, getting rid of some of the files, doing things with aliasing, or however else we want to dice this problem is separable from linking the bitcode into openmp and can be left for a later patch. Using a common path for HIP and OpenMP seems a step in the right direction.

It might take quite a long time to reach consensus on how to deduplicate the two remaining copies, which I'd guess is why they were copy/pasted to begin with.

yaxunl added inline comments.Jul 28 2021, 8:28 AM
clang/lib/Driver/ToolChains/AMDGPU.cpp
831–860

We do not need to use shared options for HIP/OpenMP/OpenCL. We could use if/else to use different options for HIP/OpenCL.

I am suggesting this because this part of code is largely the same as getCommonDeviceLibOptions, except the option names. Also the name getCommonDeviceLibOptions indicates that is a common function for all languages, otherwise it should be renamed as getCommonDeviceLibOptionsForHIPAndOpenMP

clang/lib/Driver/ToolChains/AMDGPU.h
141

Maybe rename it as getCommonDeviceLibNames since it returns the bc file names instead of options.

Pls add a comment like 'returns a list of device library names shared by different languages".

yaxunl added inline comments.Jul 28 2021, 8:33 AM
clang/lib/Driver/ToolChains/AMDGPU.cpp
831–860

If it is out of scope for this patch I am OK to leave this part out. I can create a patch to refactor this part later.

pdhaliwal updated this revision to Diff 362717.Jul 29 2021, 4:36 AM

Rename method to getCommonDeviceLibNames

pdhaliwal updated this revision to Diff 362718.Jul 29 2021, 4:37 AM

Missed comment.

JonChesterfield accepted this revision.Jul 30 2021, 6:14 AM

I think this is good enough for now, assuming the majority of OvO is running locally. More cleanups to do after landing.

This revision is now accepted and ready to land.Jul 30 2021, 6:14 AM
yaxunl accepted this revision.Jul 30 2021, 6:43 AM

LGTM. Thanks.

This revision was automatically updated to reflect the committed changes.

It's been pointed out to me that -lm is a linker flag so it's weird to require it at compile time. I haven't thought of a good fix for that yet.

We don't need to splice in ocml for each compilation unit, so can move the test+splice into the link phase, except that amdgcn doesn't have a very well defined link phase as it keeps everything in IR. We could look for -c or equivalent, i.e. only check for ocml when building an executable, and not when compiling a translation unit to IR.