This is an archive of the discontinued LLVM Phabricator instance.

Differential D60906

[OpenMP][libomptarget] Add math functions support in OpenMP offloading
Abandoned · Public

Authored by gtbercea on Apr 19 2019, 11:10 AM.

Details

Reviewers

ABataev
hfinkel
caomhin
tra

Summary

Add the `__kmpc_*` math function definitions (pow, log, sin, and cos variants) to libomptarget-nvptx.bc.

Diff Detail

Repository

rOMP OpenMP

Build Status

Buildable 30991
Build 30990: arc lint + arc unit

Event Timeline

gtbercea created this revision. Apr 19 2019, 11:10 AM

Herald added a project: Restricted Project. Apr 19 2019, 11:10 AM

Herald added subscribers: openmp-commits, jdoerfert, guansong, mgorny.

gtbercea added a reviewer: tra. Apr 19 2019, 11:13 AM

gtbercea added a child revision: D60907: [OpenMP] Add math functions support in OpenMP offloading.

ABataev added inline comments. Apr 19 2019, 11:20 AM

libomptarget/deviceRTLs/nvptx/src/interface.h
570	I think you need to add `__kmpc_powf(float)`, `__kmpc_powl(long double)`, `__kmpc_sinf(float)`, `__kmpc_sinl(long double)`

Address comments.

gtbercea marked an inline comment as done. Apr 19 2019, 2:34 PM

Harbormaster completed remote builds in B30788: Diff 195920. Apr 19 2019, 2:34 PM

I guess here, and other places, we will later need to add more function/type combinations. Given that they all follow the same scheme, we should probably use a template for generation soon.

Use macros.

Harbormaster completed remote builds in B30991: Diff 196620. Apr 25 2019, 6:05 AM

gtbercea retitled this revision from [OpenMP][libomptarget][WIP] Add math functions support in OpenMP offloading to [OpenMP][libomptarget] Add math functions support in OpenMP offloading. Apr 25 2019, 11:07 AM

jdoerfert added inline comments. Apr 29 2019, 6:49 PM

libomptarget/deviceRTLs/nvptx/src/interface.h
585	Shouldn't we do this with macros again? I would even propose a separate "math_macro.inc" file that is included in both places. In one, the macro `DECL_ONLY` is set and we get only declarations, while in the other we expand to definitions. The idea is to simplify future maintenance and cut down on code. Finally, we probably want to reuse "math_macro.inc" in the other deviceRTLs as well, so we always support the same functions across all targets. Does that make sense?
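
The shared-list idea in the comment above could look roughly like the sketch below. It is not code from this revision: the `sketch_` function names and the `SKETCH_` macros are hypothetical, the list macro stands in for the proposed "math_macro.inc" so that the example fits in one file, and host `<math.h>` functions stand in for the libdevice calls.

```cpp
#include <cassert>
#include <math.h>

// Hypothetical stand-in for the contents of "math_macro.inc": one entry per
// wrapper, written as (type, underlying function, wrapper name).
#define SKETCH_MATH_LIST_1(X)        \
  X(float, sinf, sketch_kmpc_sinf)   \
  X(double, sin, sketch_kmpc_sin)    \
  X(float, cosf, sketch_kmpc_cosf)   \
  X(double, cos, sketch_kmpc_cos)

// Header side (the DECL_ONLY case): expand every entry to a declaration.
#define SKETCH_DECL_1(ty, fn, wrapper) extern "C" ty wrapper(ty x);
SKETCH_MATH_LIST_1(SKETCH_DECL_1)
#undef SKETCH_DECL_1

// Implementation side: expand the same entries to full definitions.
#define SKETCH_DEF_1(ty, fn, wrapper) \
  extern "C" ty wrapper(ty x) { return fn(x); }
SKETCH_MATH_LIST_1(SKETCH_DEF_1)
#undef SKETCH_DEF_1

// Small self-check that the generated wrappers forward to the math library.
inline void sketch_check_list_wrappers() {
  assert(sketch_kmpc_sin(0.0) == 0.0);
  assert(sketch_kmpc_cosf(0.0f) == 1.0f);
}
```

With this layout, supporting a new function means adding one line to the list, and the declarations and definitions cannot drift apart.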
libomptarget/deviceRTLs/nvptx/src/math.cu
21	I was thinking we have some macro for float/double generation:
	    #define __OPENMP_MATH_FUNC_1_FP(__fn, __kmpc_fn) \
	      __OPENMP_MATH_FUNC_1(float, __fn, __kmpc_fn) \
	      __OPENMP_MATH_FUNC_1(double, __fn, __kmpc_fn)
	so we can cut down further below
29	I don't think we want the semicolon at the end.
34	Wouldn't we get the correct conversion for: `__OPENMP_MATH_FUNC_2(long double, pow, __kmpc_powl);` If not, should we add a two type version that casts the arguments and define `__OPENMP_MATH_FUNC_2` in terms of that one?
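
The three macro suggestions for math.cu (a float/double generator, no trailing semicolon, and a two-type variant that casts its arguments) can be combined as in the following sketch. It is an illustration under assumptions, not the code that was eventually committed: all `SKETCH_` and `sketch_` names are hypothetical, host `<math.h>` stands in for libdevice, and the `##` token pasting is one possible way to handle the `log`/`logf` naming split, which the macro quoted in the comment does not address.

```cpp
#include <cassert>
#include <math.h>

// Two-type form: the wrapper signature uses `ty`, but the arguments are cast
// to `call_ty` before the underlying function is called.
#define SKETCH_MATH_FUNC_2_CAST(ty, call_ty, fn, wrapper)  \
  extern "C" ty wrapper(ty x, ty y) {                      \
    return static_cast<ty>(fn(static_cast<call_ty>(x),     \
                              static_cast<call_ty>(y)));   \
  }

// Same-type form, defined in terms of the two-type form.
#define SKETCH_MATH_FUNC_2(ty, fn, wrapper) \
  SKETCH_MATH_FUNC_2_CAST(ty, ty, fn, wrapper)

// Single-argument wrapper.
#define SKETCH_MATH_FUNC_1(ty, fn, wrapper) \
  extern "C" ty wrapper(ty x) { return fn(x); }

// Generate the float and double wrappers from one line. This relies on the
// C naming convention that the float variant of `fn` is called `fn` + `f`.
#define SKETCH_MATH_FUNC_1_FP(fn, wrapper)      \
  SKETCH_MATH_FUNC_1(float, fn##f, wrapper##f)  \
  SKETCH_MATH_FUNC_1(double, fn, wrapper)

// Invocations carry no trailing semicolon: each one is a full definition.
SKETCH_MATH_FUNC_1_FP(log2, sketch_kmpc_log2)
SKETCH_MATH_FUNC_2(double, pow, sketch_kmpc_pow)
// long double wrapper computed in double precision, as in the diff.
SKETCH_MATH_FUNC_2_CAST(long double, double, pow, sketch_kmpc_powl)

// Small self-check of the generated wrappers.
inline void sketch_check_macro_wrappers() {
  assert(fabs(sketch_kmpc_log2(8.0) - 3.0) < 1e-12);
  assert(fabsl(sketch_kmpc_powl(2.0L, 10.0L) - 1024.0L) < 1e-9L);
}
```

Because each macro expands to a complete function definition, a `;` after the invocation would be an empty declaration, which some compilers flag (for example with clang's `-Wextra-semi`). The explicit casts in the two-type form make the narrowing to `double` visible instead of depending on which overloads of `pow` happen to be in scope.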

Replaced by: D61399

Revision Contents

Path	Size
libomptarget/deviceRTLs/nvptx/CMakeLists.txt	1 line
libomptarget/deviceRTLs/nvptx/src/interface.h	27 lines
libomptarget/deviceRTLs/nvptx/src/math.cu	60 lines

Diff 196620

libomptarget/deviceRTLs/nvptx/CMakeLists.txt

  set(cuda_src_files
      src/data_sharing.cu
      src/libcall.cu
      src/loop.cu
      src/omptarget-nvptx.cu
      src/parallel.cu
      src/reduction.cu
      src/sync.cu
      src/task.cu
+     src/math.cu
  )

  set(omp_data_objects src/omp_data.cu)

  # Get the compute capability the user requested or use SM_35 by default.
  # SM_35 is what clang uses by default.
  set(default_capabilities 35)
  if (DEFINED LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY)

libomptarget/deviceRTLs/nvptx/src/interface.h

	EXTERN void __kmpc_get_team_static_memory(int16_t isSPMDExecutionMode,
	                                          const void *buf, size_t size,
	                                          int16_t is_shared, const void **res);

	EXTERN void __kmpc_restore_team_static_memory(int16_t isSPMDExecutionMode,
	                                              int16_t is_shared);

				// POW
				EXTERN float __kmpc_powf(float a, float b);
				EXTERN double __kmpc_pow(double, double);
				EXTERN long double __kmpc_powl(long double a, long double b);

				// LOG
				EXTERN double __kmpc_log(double);
				EXTERN float __kmpc_logf(float);
				EXTERN double __kmpc_log10(double);
				EXTERN float __kmpc_log10f(float);
				EXTERN double __kmpc_log1p(double);
				EXTERN float __kmpc_log1pf(float);
				EXTERN double __kmpc_log2(double);
				EXTERN float __kmpc_log2f(float);
				EXTERN double __kmpc_logb(double);
				EXTERN float __kmpc_logbf(float);

				// SIN
				EXTERN float __kmpc_sinf(float);
				EXTERN double __kmpc_sin(double);
				EXTERN long double __kmpc_sinl(long double);

				// COS
				EXTERN float __kmpc_cosf(float);
				EXTERN double __kmpc_cos(double);
				EXTERN long double __kmpc_cosl(long double);

	#endif

libomptarget/deviceRTLs/nvptx/src/math.cu

This file was added.

				//===------------ math.cu - NVPTX OpenMP math constructs --------- CUDA -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the implementation of math functions not already handled.
				// Functions in this file will end up as part of the .bc library of the
				// device and will call libdevice functions.
				//
				//===----------------------------------------------------------------------===//

				#include "omptarget-nvptx.h"

				// Single argument functions
				#define __OPENMP_MATH_FUNC_1(__ty, __fn, __kmpc_fn) \
				EXTERN __ty __kmpc_fn(__ty __x) { \
				return __fn(__x); \
				}

				// Double argument functions
				#define __OPENMP_MATH_FUNC_2(__ty, __fn, __kmpc_fn) \
				EXTERN __ty __kmpc_fn(__ty __x, __ty __y) { \
				return __fn(__x, __y); \
				}

				__OPENMP_MATH_FUNC_2(float, powf, __kmpc_powf);
				__OPENMP_MATH_FUNC_2(double, pow, __kmpc_pow);
				// no powl defined for the GPU device so use pow.
				EXTERN long double __kmpc_powl(long double a, long double b) {
				return pow((double) a, (double) b);
				}

				// LOG
				__OPENMP_MATH_FUNC_1(double, log, __kmpc_log);
				__OPENMP_MATH_FUNC_1(float, logf, __kmpc_logf);
				__OPENMP_MATH_FUNC_1(double, log10, __kmpc_log10);
				__OPENMP_MATH_FUNC_1(float, log10f, __kmpc_log10f);
				__OPENMP_MATH_FUNC_1(double, log1p, __kmpc_log1p);
				__OPENMP_MATH_FUNC_1(float, log1pf, __kmpc_log1pf);
				__OPENMP_MATH_FUNC_1(double, log2, __kmpc_log2);
				__OPENMP_MATH_FUNC_1(float, log2f, __kmpc_log2f);
				__OPENMP_MATH_FUNC_1(double, logb, __kmpc_logb);
				__OPENMP_MATH_FUNC_1(float, logbf, __kmpc_logbf);

				__OPENMP_MATH_FUNC_1(float, sinf, __kmpc_sinf);
				__OPENMP_MATH_FUNC_1(double, sin, __kmpc_sin);
				// no sinl defined for the GPU device so use sin.
				EXTERN long double __kmpc_sinl(long double a) {
				return sin((double) a);
				}

				// COS
				__OPENMP_MATH_FUNC_1(float, cosf, __kmpc_cosf);
				__OPENMP_MATH_FUNC_1(double, cos, __kmpc_cos);
				EXTERN long double __kmpc_cosl(long double a) {
				return cos((double) a);
				}
				No newline at end of file