
[OpenMP] Add math functions support in OpenMP offloading
Abandoned · Public

Authored by gtbercea on Apr 19 2019, 11:12 AM.

Details

Summary

This patch adds an OpenMP specific math functions header to the lib/Headers folder and ensures it is passed to Clang.

Note:
This is an example of how support for math functions could be implemented. Before expanding this to include other math functions please let me know if you have any comments, concerns or proposed changes.

Diff Detail

Event Timeline

gtbercea created this revision. Apr 19 2019, 11:12 AM
ABataev added inline comments. Apr 19 2019, 11:23 AM
lib/Headers/__clang_openmp_math.h
15

Also, versions for float and long double

22

Add powf(float), powl(long double), sinf(float), sinl(long double)

ABataev added inline comments. Apr 19 2019, 11:26 AM
lib/Headers/__clang_openmp_math.h
3

Why __CLANG_OMP_CMATH_H__? Your file is ..._math.h, not ..._cmath.h. Plus, it seems to me, you're missing the standard header for the file.

jdoerfert added inline comments. Apr 19 2019, 1:40 PM
include/clang/Driver/ToolChain.h
575

Copy & paste comment

lib/Headers/__clang_openmp_math.h
6

Why is this NVPTX specific?

To follow up on my comment about why this is NVPTX specific:

Is there a reason why this has to happen in the Cuda ToolChain part?
I would have assumed us to add the declarations similar to the ones provided in __clang_openmp_math.h whenever we may compile for a target.
So, if we have any OpenMP target related code in the TU, we add the header __clang_openmp_target_math.h which defines "common" math functions as you did in __clang_openmp_math.h (without the NVPTX guard). The runtime will then implement __kmpc_XXXX as it sees fit.

gtbercea updated this revision to Diff 195915. Apr 19 2019, 2:25 PM
gtbercea edited the summary of this revision.
  • Address comments.
gtbercea marked 5 inline comments as done. Apr 19 2019, 2:26 PM

So the scheme is: pow is defined in __clang_openmp_math.h to call __kmpc_pow. This lives in libomptarget-nvptx (both bc and static lib) and just calls pow which works because nvcc and Clang in CUDA mode make sure that the call gets routed into libdevice?

Did you test that something like pow(d, 2) is optimized by LLVM to d * d? There's a pass doing so (can't recall the name) and from my previous attempts it didn't work well if you hid the function name instead of the known pow one.

gtbercea added a comment (edited). Apr 24 2019, 6:58 AM

So the scheme is: pow is defined in __clang_openmp_math.h to call __kmpc_pow. This lives in libomptarget-nvptx (both bc and static lib) and just calls pow which works because nvcc and Clang in CUDA mode make sure that the call gets routed into libdevice?

Did you test that something like pow(d, 2) is optimized by LLVM to d * d? There's a pass doing so (can't recall the name) and from my previous attempts it didn't work well if you hid the function name instead of the known pow one.

The transformation was blocked by a check in optimizePow() which prevented pow(x,2) from becoming x*x. By adding the pow functions to the TLI, the transformation now applies. This has been fixed in the LLVM patch. SQRT is eliminated as usual; no change for that.

gtbercea updated this revision to Diff 196619. Apr 25 2019, 6:04 AM
  • Use macros.
gtbercea retitled this revision from "[OpenMP][WIP] Add math functions support in OpenMP offloading" to "[OpenMP] Add math functions support in OpenMP offloading". Apr 25 2019, 11:07 AM

The last two comments in D47849 indicated exploration of a different approach, and one which still seems superior to this one. Can you please comment on why you're now pursuing this approach instead?

gtbercea added a comment (edited). Apr 25 2019, 1:10 PM

The last two comments in D47849 indicated exploration of a different approach, and one which still seems superior to this one. Can you please comment on why you're now pursuing this approach instead?

This solution is following Alexey's suggestions. This solution allows the optimization of math calls if they apply (example: pow(x,2) => x*x ) which was one of the issues in the previous solution I implemented.

This comment was removed by gtbercea.

The last two comments in D47849 indicated exploration of a different approach, and one which still seems superior to this one. Can you please comment on why you're now pursuing this approach instead?

Hal, as far as I can tell, this solution is similar to yours but with a slightly different implementation. If there are particular aspects about this patch you would like to discuss/give feedback on please let me know.

gtbercea updated this revision to Diff 196725. Apr 25 2019, 1:59 PM
  • Update patch.
jdoerfert added inline comments. Apr 29 2019, 7:05 PM
lib/Driver/ToolChains/Clang.cpp
1159

Here is another "NVPTX" specialization that I don't think we need. At least with more targets we need to relax this condition.

lib/Headers/__clang_openmp_math.h
13

Why is this NVPTX specific (again)?

The last two comments in D47849 indicated exploration of a different approach, and one which still seems superior to this one. Can you please comment on why you're now pursuing this approach instead?

...

Hal, as far as I can tell, this solution is similar to yours but with a slightly different implementation. If there are particular aspects about this patch you would like to discuss/give feedback on please let me know.

The solution I suggested had the advantages of:

  1. Being able to directly reuse the code in __clang_cuda_device_functions.h. On the other hand, using this solution we need to implement a wrapper function for every math function. When __clang_cuda_device_functions.h is updated, we need to update the OpenMP wrapper as well.
  2. Providing access to wrappers for other CUDA intrinsics in a natural way (e.g., rnorm3d) [it looks a bit nicer to provide a host version of rnorm3d than __nv_rnorm3d in user code].
  3. Being similar to the "declare variant" functionality from OpenMP 5, and thus, I suspect, closer to the solution we'll eventually be able to apply in a standard way to all targets.
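For reference, the OpenMP 5 "declare variant" functionality mentioned in point 3 selects a target-specific implementation declaratively. A rough sketch (assumptions: requires an OpenMP 5 compiler with offloading support, and __nv_sin stands in for the libdevice entry point; this is not the patch's mechanism):

```c
#pragma omp declare target
double __nv_sin(double); /* provided by libdevice on NVPTX */
#pragma omp end declare target

/* When compiling for an nvptx device context, calls to sin resolve
   to __nv_sin; the host keeps its regular libm sin. */
#pragma omp declare variant(__nv_sin) match(device = {arch(nvptx, nvptx64)})
double sin(double x);
```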

What are all of the long-double functions going to do on NVPTX?

This solution is following Alexey's suggestions. This solution allows the optimization of math calls if they apply (example: pow(x,2) => x*x ) which was one of the issues in the previous solution I implemented.

So we're also missing that optimization for CUDA code when compiling with Clang? Isn't this also something that, regardless, should be fixed?

Also, how fragile is this? We inline bottom up but this optimization needs to apply before inlining?

Finally, regardless of all of this, do we really need to preinclude this header? Can't we do this with a math.h wrapper?

The last two comments in D47849 indicated exploration of a different approach, and one which still seems superior to this one. Can you please comment on why you're now pursuing this approach instead?

...

Hal, as far as I can tell, this solution is similar to yours but with a slightly different implementation. If there are particular aspects about this patch you would like to discuss/give feedback on please let me know.

The solution I suggested had the advantages of:

  1. Being able to directly reuse the code in __clang_cuda_device_functions.h. On the other hand, using this solution we need to implement a wrapper function for every math function. When __clang_cuda_device_functions.h is updated, we need to update the OpenMP wrapper as well.

I'd even go as far as to argue that __clang_cuda_device_functions.h should include the internal math.h wrapper to get all math functions. See also the next comment.

  2. Providing access to wrappers for other CUDA intrinsics in a natural way (e.g., rnorm3d) [it looks a bit nicer to provide a host version of rnorm3d than __nv_rnorm3d in user code].

@hfinkel
I don't see why you want to mix CUDA intrinsics with math.h overloads. I added a rough outline of how I imagined the internal math.h header to look like as a comment in D47849. Could you elaborate how that differs from what you imagine and how the other intrinsics come in?

  3. Being similar to the "declare variant" functionality from OpenMP 5, and thus, I suspect, closer to the solution we'll eventually be able to apply in a standard way to all targets.

I can see this.

This solution is following Alexey's suggestions. This solution allows the optimization of math calls if they apply (example: pow(x,2) => x*x ) which was one of the issues in the previous solution I implemented.

So we're also missing that optimization for CUDA code when compiling with Clang? Isn't this also something that, regardless, should be fixed?

Maybe through a general built-in recognition and lowering into target specific implementations/intrinsics late again?

The last two comments in D47849 indicated exploration of a different approach, and one which still seems superior to this one. Can you please comment on why you're now pursuing this approach instead?

...

Hal, as far as I can tell, this solution is similar to yours but with a slightly different implementation. If there are particular aspects about this patch you would like to discuss/give feedback on please let me know.

The solution I suggested had the advantages of:

  1. Being able to directly reuse the code in __clang_cuda_device_functions.h. On the other hand, using this solution we need to implement a wrapper function for every math function. When __clang_cuda_device_functions.h is updated, we need to update the OpenMP wrapper as well.

I'd even go as far as to argue that __clang_cuda_device_functions.h should include the internal math.h wrapper to get all math functions. See also the next comment.

  2. Providing access to wrappers for other CUDA intrinsics in a natural way (e.g., rnorm3d) [it looks a bit nicer to provide a host version of rnorm3d than __nv_rnorm3d in user code].

@hfinkel
I don't see why you want to mix CUDA intrinsics with math.h overloads.

What I had in mind was matching non-standard functions in a standard way. For example, let's just say that I have a CUDA kernel that uses the rnorm3d function, or I otherwise have a function that I'd like to write in OpenMP that will make good use of this CUDA function (because it happens to have an efficient device implementation). This is a function that CUDA provides, in the global namespace, although it's not standard.

Then I can do something like this (depending on how we setup the implementation):

double rnorm3d(double a,  double b, double c) {
  return sqrt(a*a + b*b + c*c);
}

...

#pragma omp target
{
  double a = ..., b = ..., c = ...;
  double r = rnorm3d(a, b, c);
}

and, if we use the CUDA math headers for CUDA math-function support, then this might "just work." To be clear, I can see an argument for having this work being a bad idea ;) -- but it has the advantage of providing a way to take advantage of system-specific functions while still writing completely-portable code.

I added a rough outline of how I imagined the internal math.h header to look like as a comment in D47849. Could you elaborate how that differs from what you imagine and how the other intrinsics come in?

That looks like what I had in mind (including __clang_cuda_device_functions.h to get the device functions.)

  3. Being similar to the "declare variant" functionality from OpenMP 5, and thus, I suspect, closer to the solution we'll eventually be able to apply in a standard way to all targets.

I can see this.

This solution is following Alexey's suggestions. This solution allows the optimization of math calls if they apply (example: pow(x,2) => x*x ) which was one of the issues in the previous solution I implemented.

So we're also missing that optimization for CUDA code when compiling with Clang? Isn't this also something that, regardless, should be fixed?

Maybe through a general built-in recognition and lowering into target specific implementations/intrinsics late again?

I suspect that we need to match the intrinsics and perform the optimizations in LLVM at that level in order to get the optimizations for CUDA.

tra added a comment. Apr 30 2019, 10:33 AM

+1 to Hal's comments.

@jdoerfert :

I'd even go as far as to argue that __clang_cuda_device_functions.h should include the internal math.h wrapper to get all math functions. See also the next comment.

I'd argue the other way around -- include __clang_cuda_device_functions.h from math.h and do not preinclude anything.
If the user does not include math.h, it should not have its namespace polluted by some random stuff. NVCC did this, but that's one of the most annoying 'features' we have to be compatible with for the sake of keeping existing nvcc-compilable CUDA code happy.

If users do include math.h, it should do the right thing, for both sides of the compilation.
IMO It's math.h that should be triggering pulling device functions in.

lib/Driver/ToolChains/Clang.cpp
1157–1159

This functionality is OpenMP-specific, but the function name AddMathDeviceFunctions() is not. I'd rather keep the OpenMP specialization down where it can be easily seen. Could this check be pushed down into CudaInstallationDetector::AddMathDeviceFunctions()?

Another potential problem here is that this file will be pre-included only for the device. It will potentially result in more observable semantic differences between host and device compilations. I don't know if it matters for OpenMP, though.

IMO intercepting math.h and providing device-specific overloads *in addition* to the regular math.h functions would be a better approach.

Another problem with pre-included files is that sometimes users may intentionally need to *not* include them.
For CUDA we have -nocudainc flag. Your change, at least, will need something similar, IMO.

The last two comments in D47849 indicated exploration of a different approach, and one which still seems superior to this one. Can you please comment on why you're now pursuing this approach instead?

...

Hal, as far as I can tell, this solution is similar to yours but with a slightly different implementation. If there are particular aspects about this patch you would like to discuss/give feedback on please let me know.

The solution I suggested had the advantages of:

  1. Being able to directly reuse the code in __clang_cuda_device_functions.h. On the other hand, using this solution we need to implement a wrapper function for every math function. When __clang_cuda_device_functions.h is updated, we need to update the OpenMP wrapper as well.

I'd even go as far as to argue that __clang_cuda_device_functions.h should include the internal math.h wrapper to get all math functions. See also the next comment.

  2. Providing access to wrappers for other CUDA intrinsics in a natural way (e.g., rnorm3d) [it looks a bit nicer to provide a host version of rnorm3d than __nv_rnorm3d in user code].

@hfinkel
I don't see why you want to mix CUDA intrinsics with math.h overloads.

What I had in mind was matching non-standard functions in a standard way. For example, let's just say that I have a CUDA kernel that uses the rnorm3d function, or I otherwise have a function that I'd like to write in OpenMP that will make good use of this CUDA function (because it happens to have an efficient device implementation). This is a function that CUDA provides, in the global namespace, although it's not standard.

Then I can do something like this (depending on how we setup the implementation):

double rnorm3d(double a,  double b, double c) {
  return sqrt(a*a + b*b + c*c);
}
 
...
 
#pragma omp target
{
  double a = ..., b = ..., c = ...;
  double r = rnorm3d(a, b, c);
}

and, if we use the CUDA math headers for CUDA math-function support, then this might "just work." To be clear, I can see an argument for having this work being a bad idea ;) -- but it has the advantage of providing a way to take advantage of system-specific functions while still writing completely-portable code.

Matching rnorm3d and replacing it with some nvvm "intrinsic" is not something I'd like to see depend on whether math.h was included. As you say, that is not how it works in Cuda either. I'm in favor of reusing the built-in recognition mechanism:
That is, if the target is nvptx and the name is rnorm3d, we match that name and use the appropriate intrinsic, as we already do for other targets.

I added a rough outline of how I imagined the internal math.h header to look like as a comment in D47849. Could you elaborate how that differs from what you imagine and how the other intrinsics come in?

That looks like what I had in mind (including __clang_cuda_device_functions.h to get the device functions.)

  3. Being similar to the "declare variant" functionality from OpenMP 5, and thus, I suspect, closer to the solution we'll eventually be able to apply in a standard way to all targets.

I can see this.

This solution is following Alexey's suggestions. This solution allows the optimization of math calls if they apply (example: pow(x,2) => x*x ) which was one of the issues in the previous solution I implemented.

So we're also missing that optimization for CUDA code when compiling with Clang? Isn't this also something that, regardless, should be fixed?

Maybe through a general built-in recognition and lowering into target specific implementations/intrinsics late again?

I suspect that we need to match the intrinsics and perform the optimizations in LLVM at that level in order to get the optimizations for CUDA.

That seems reasonable to me. We could also match other intrinsics, e.g., rnorm3d, here as well, both by name and by the computation pattern.

In D60907#1484643, @tra wrote:

+1 to Hal's comments.

@jdoerfert :

I'd even go as far as to argue that __clang_cuda_device_functions.h should include the internal math.h wrapper to get all math functions. See also the next comment.

I'd argue the other way around -- include __clang_cuda_device_functions.h from math.h and do not preinclude anything.
If the user does not include math.h, it should not have its namespace polluted by some random stuff. NVCC did this, but that's one of the most annoying 'features' we have to be compatible with for the sake of keeping existing nvcc-compilable CUDA code happy.

If users do include math.h, it should do the right thing, for both sides of the compilation.
IMO It's math.h that should be triggering pulling device functions in.

I actually don't want to preinclude anything and my arguments are (mostly) for the OpenMP offloading code path not necessarily Cuda.
Maybe to clarify, what I want is:

  1. Make sure the clang/Headers/math.h is found first if math.h is included.
  2. Use a scheme similar to the one described https://reviews.llvm.org/D47849#1483653 in clang/Headers/math.h
  3. Only add math.h function overloads in our math.h. <- This is debatable
  4. Include clang/Headers/math.h from __clang_cuda_device_functions.h to avoid duplication of math function declarations.
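Points 1-3 above could translate into a wrapper along these lines (a sketch of the idea only; the guards, the #include_next, and the placement of overloads are assumptions based on the outline referenced in D47849, not code from this patch):

```c
/* clang/Headers/math.h -- placed so it is found before the system
   math.h when compiling OpenMP target code (sketch). */
#if defined(_OPENMP) && defined(__NVPTX__)
/* math.h function overloads for the device would be declared here,
   e.g. forwarding to the libdevice implementations. */
#endif
/* Always chain to the system math.h for the host declarations. */
#include_next <math.h>
```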
tra added a comment. Apr 30 2019, 11:40 AM

I actually don't want to preinclude anything and my arguments are (mostly) for the OpenMP offloading code path not necessarily Cuda.
Maybe to clarify, what I want is:

  1. Make sure the clang/Headers/math.h is found first if math.h is included.
  2. Use a scheme similar to the one described https://reviews.llvm.org/D47849#1483653 in clang/Headers/math.h
  3. Only add math.h function overloads in our math.h. <- This is debatable

Agreed.

  4. Include clang/Headers/math.h from __clang_cuda_device_functions.h to avoid duplication of math function declarations.

This is not needed for CUDA. math.h is included early on in __clang_cuda_runtime_wrapper.h (via <cmath>), so by the time __clang_cuda_device_functions.h is included, math.h has already been included one way or another -- either in step 3 above, or directly by the __clang_cuda_runtime_wrapper.h

gtbercea abandoned this revision. May 15 2019, 12:53 PM
gtbercea marked an inline comment as done.

Replaced by: D61399