This is an archive of the discontinued LLVM Phabricator instance.

Differential D101209

[PowerPC] Provide fastmath sqrt and div functions in altivec.h
Closed, Public

Authored by nemanjai on Apr 23 2021, 4:40 PM.


Details

Reviewers

cebowleratibm
bmahjour

Group Reviewers

Restricted Project

Commits

rGc3da07d216dd: [PowerPC] Provide fastmath sqrt and div functions in altivec.h

Summary

This adds the long-overdue implementations of `vec_rsqrt` and `vec_recipdiv`, which have been part of the ABI document and are now part of the "Power Vector Intrinsic Programming Reference" (PVIPR).

The approach is to add new builtins and to emit code with the fast flag regardless of whether fastmath was specified on the command line.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nemanjai created this revision. Apr 23 2021, 4:40 PM

Herald added subscribers: shchenz, kbarton. Apr 23 2021, 4:40 PM

nemanjai requested review of this revision. Apr 23 2021, 4:40 PM

Herald added a project: Restricted Project. Apr 23 2021, 4:40 PM

Harbormaster completed remote builds in B100703: Diff 340201. Apr 23 2021, 6:50 PM

qiucf added a subscriber: qiucf. Apr 25 2021, 9:31 PM

qiucf added inline comments.

clang/lib/CodeGen/CGBuiltin.cpp

15121	Seems FMF will be automatically restored without the three lines.

vector float test_recipdivd(vector float a, vector float b) {
  vector float x = vec_recipdiv(a, b);
  vector float y = x + b;
  return y;
}

define dso_local <4 x float> @test_recipdivd(<4 x float> %a, <4 x float> %b) {
entry:
  %recipdiv.i = fdiv fast <4 x float> %a, %b
  %add = fadd <4 x float> %recipdiv.i, %b
  ret <4 x float> %add
}

See https://reviews.llvm.org/D96231#inline-901337.

nemanjai added inline comments. Apr 26 2021, 12:07 PM

clang/lib/CodeGen/CGBuiltin.cpp
15121	Thanks for finding that. I did notice that and was wondering how the FMF flags return to what they were in the X86 code. So I added the reset of the flags just to be on the safe side. Now that I see that, I'll get rid of those.

bmahjour requested changes to this revision. Apr 28 2021, 8:46 AM

bmahjour added inline comments.

clang/lib/CodeGen/CGBuiltin.cpp
15129	I wonder if we can do better than "fdiv fast"... does the current lowering of "fdiv fast" employ an estimation algorithm via iterative refinement on POWER?
15133	This doesn't implement a reciprocal square root, it just performs a square root! At the very least we need a divide instruction following the call to the intrinsic, but I'm not sure if that'll result in the most optimal codegen at the end. Perhaps we need a new builtin?
clang/test/CodeGen/builtins-ppc-vsx.c
2297	See my comment above about the missing reciprocal operation.

This revision now requires changes to proceed. Apr 28 2021, 8:46 AM

nemanjai added inline comments. Apr 28 2021, 11:10 AM

clang/lib/CodeGen/CGBuiltin.cpp
15129	Yes. This `fast` includes `arcp` which will trigger the estimation+refinement algorithm in the back end.
15133	Oh, I misread the documentation. This really seems like a bizarre thing to offer a user. I will change this to `1/sqrt()`. In terms of providing optimal performance, with fast-math, the optimizer should get rid of the divide. If compiled at `-O0`, it isn't reasonable to expect optimal performance to begin with.
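
As a rough illustration of what an "estimation+refinement" division looks like, here is a minimal C sketch of Newton-Raphson refinement of a reciprocal estimate. It is not the PowerPC back end's actual lowering. The helper names (`recip_seed`, `recip_refine`, `approx_div`) and the linear seed formula are assumptions made for the example; a real target would start from a hardware estimate instruction instead.

```c
#include <assert.h>

/* Hypothetical stand-in for a hardware reciprocal-estimate instruction.
   A linear guess at 1/b that is only meaningful for b in [0.5, 1]. */
static float recip_seed(float b) {
    return 48.0f / 17.0f - (32.0f / 17.0f) * b;
}

/* Newton-Raphson refinement of x ~= 1/b.
   Each step roughly squares the relative error of the estimate. */
static float recip_refine(float b, float x, int steps) {
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i) {
        x = x * (2.0f - b * x);
    }
    return x;
}

/* a / b computed as a * (1/b), without dividing by b. */
static float approx_div(float a, float b, int steps) {
    assert(b >= 0.5f && b <= 1.0f); /* domain of the illustrative seed */
    return a * recip_refine(b, recip_seed(b), steps);
}
```

With these inputs, `approx_div(3.0f, 0.75f, 0)` returns the unrefined guess of roughly 4.24, while three refinement steps bring the result to within about 1e-4 of the exact quotient 4.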

Changed rsqrt to be an actual reciprocal rather than just a refined sqrt estimate.

I have verified that the code generated is equivalent to gcc's and the results produced are the same.
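
To make the distinction between a square root and a reciprocal square root concrete, here is a minimal C sketch that refines an estimate of `1/sqrt(a)` and then recovers `sqrt(a)` from it. This only illustrates the arithmetic. The function names are hypothetical, the starting estimate is supplied by the caller, and nothing here is taken from the patch or from the back end.

```c
#include <assert.h>

/* Newton-Raphson refinement of y ~= 1/sqrt(a), for a > 0.
   The starting value y must already be a reasonable estimate. */
static float rsqrt_refine(float a, float y, int steps) {
    assert(a > 0.0f && steps >= 0);
    for (int i = 0; i < steps; ++i) {
        y = y * (1.5f - 0.5f * a * y * y);
    }
    return y;
}

/* sqrt(a) recovered from the reciprocal square root:
   sqrt(a) = a * (1/sqrt(a)). */
static float sqrt_from_rsqrt(float a, float rsqrt_a) {
    return a * rsqrt_a;
}
```

Starting from the estimate 0.4 for `a = 4`, four steps converge to approximately 0.5, which is the reciprocal square root, and multiplying by `a` gives approximately 2, which is the square root. The two values differ, which is why `vec_rsqrt` has to produce the former.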

Harbormaster completed remote builds in B101597: Diff 341456. Apr 29 2021, 4:28 AM

LGTM

This revision is now accepted and ready to land. Apr 29 2021, 3:52 PM

This revision was landed with ongoing or failed builds. Apr 30 2021, 5:18 PM

Closed by commit rGc3da07d216dd: [PowerPC] Provide fastmath sqrt and div functions in altivec.h (authored by nemanjai).

This revision was automatically updated to reflect the committed changes.

nemanjai added a commit: rGc3da07d216dd: [PowerPC] Provide fastmath sqrt and div functions in altivec.h.

Revision Contents

Path                                         Size
clang/include/clang/Basic/BuiltinsPPC.def    6 lines
clang/lib/CodeGen/CGBuiltin.cpp              19 lines
clang/lib/Headers/altivec.h                  22 lines
clang/test/CodeGen/builtins-ppc-altivec.c    18 lines
clang/test/CodeGen/builtins-ppc-vsx.c        18 lines

Diff 342094

clang/include/clang/Basic/BuiltinsPPC.def

 [...]
 BUILTIN(__builtin_subf128_round_to_odd, "LLdLLdLLd", "")
 BUILTIN(__builtin_mulf128_round_to_odd, "LLdLLdLLd", "")
 BUILTIN(__builtin_divf128_round_to_odd, "LLdLLdLLd", "")
 BUILTIN(__builtin_fmaf128_round_to_odd, "LLdLLdLLdLLd", "")
 BUILTIN(__builtin_truncf128_round_to_odd, "dLLd", "")
 BUILTIN(__builtin_vsx_scalar_extract_expq, "ULLiLLd", "")
 BUILTIN(__builtin_vsx_scalar_insert_exp_qp, "LLdLLdULLi", "")

+// Fastmath by default builtins
+BUILTIN(__builtin_ppc_rsqrtf, "V4fV4f", "")
+BUILTIN(__builtin_ppc_rsqrtd, "V2dV2d", "")
+BUILTIN(__builtin_ppc_recipdivf, "V4fV4fV4f", "")
+BUILTIN(__builtin_ppc_recipdivd, "V2dV2dV2d", "")
+
 // HTM builtins
 BUILTIN(__builtin_tbegin, "UiUIi", "")
 BUILTIN(__builtin_tend, "UiUIi", "")

 BUILTIN(__builtin_tabort, "UiUi", "")
 BUILTIN(__builtin_tabortdc, "UiUiUiUi", "")
 BUILTIN(__builtin_tabortdci, "UiUiUii", "")
 BUILTIN(__builtin_tabortwc, "UiUiUiUi", "")
 [...]

clang/lib/CodeGen/CGBuiltin.cpp

 Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
 [...]
   case PPC::BI__builtin_vsx_xvabsdp:
   case PPC::BI__builtin_vsx_xvabssp: {
     llvm::Type *ResultType = ConvertType(E->getType());
     Value *X = EmitScalarExpr(E->getArg(0));
     llvm::Function *F = CGM.getIntrinsic(Intrinsic::fabs, ResultType);
     return Builder.CreateCall(F, X);
   }

+  // Fastmath by default
+  case PPC::BI__builtin_ppc_recipdivf:
+  case PPC::BI__builtin_ppc_recipdivd:
+  case PPC::BI__builtin_ppc_rsqrtf:
+  case PPC::BI__builtin_ppc_rsqrtd: {
+    Builder.getFastMathFlags().setFast();
+    llvm::Type *ResultType = ConvertType(E->getType());
+    Value *X = EmitScalarExpr(E->getArg(0));
+
+    if (BuiltinID == PPC::BI__builtin_ppc_recipdivf ||
+        BuiltinID == PPC::BI__builtin_ppc_recipdivd) {
+      Value *Y = EmitScalarExpr(E->getArg(1));
+      return Builder.CreateFDiv(X, Y, "recipdiv");
+    }
+    auto *One = ConstantFP::get(ResultType, 1.0);
+    llvm::Function *F = CGM.getIntrinsic(Intrinsic::sqrt, ResultType);
+    return Builder.CreateFDiv(One, Builder.CreateCall(F, X), "rsqrt");
+  }
+
   // FMA variations
   case PPC::BI__builtin_vsx_xvmaddadp:
   case PPC::BI__builtin_vsx_xvmaddasp:
   case PPC::BI__builtin_vsx_xvnmaddadp:
   case PPC::BI__builtin_vsx_xvnmaddasp:
   case PPC::BI__builtin_vsx_xvmsubadp:
   case PPC::BI__builtin_vsx_xvmsubasp:
   case PPC::BI__builtin_vsx_xvnmsubadp:
 [...]

clang/lib/Headers/altivec.h

 [...]
 }

 #ifdef __VSX__
 static __inline__ vector double __ATTRS_o_ai vec_rsqrte(vector double __a) {
   return __builtin_vsx_xvrsqrtedp(__a);
 }
 #endif

+static vector float __ATTRS_o_ai vec_rsqrt(vector float __a) {
+  return __builtin_ppc_rsqrtf(__a);
+}
+
+#ifdef __VSX__
+static vector double __ATTRS_o_ai vec_rsqrt(vector double __a) {
+  return __builtin_ppc_rsqrtd(__a);
+}
+#endif
+
 /* vec_vrsqrtefp */

 static __inline__ __vector float __attribute__((__always_inline__))
 vec_vrsqrtefp(vector float __a) {
   return __builtin_altivec_vrsqrtefp(__a);
 }

 /* vec_xvtsqrt */
 [...]
 static vector signed short __ATTRS_o_ai vec_nabs(vector signed short __a) {
   return __builtin_altivec_vminsh(__a, -__a);
 }

 static vector signed char __ATTRS_o_ai vec_nabs(vector signed char __a) {
   return __builtin_altivec_vminsb(__a, -__a);
 }

+static vector float __ATTRS_o_ai vec_recipdiv(vector float __a,
+                                              vector float __b) {
+  return __builtin_ppc_recipdivf(__a, __b);
+}
+
+#ifdef __VSX__
+static vector double __ATTRS_o_ai vec_recipdiv(vector double __a,
+                                               vector double __b) {
+  return __builtin_ppc_recipdivd(__a, __b);
+}
+#endif
+
 #ifdef __POWER10_VECTOR__

 /* vec_extractm */

 static __inline__ unsigned int __ATTRS_o_ai
 vec_extractm(vector unsigned char __a) {
   return __builtin_altivec_vextractbm(__a);
 }
 [...]

clang/test/CodeGen/builtins-ppc-altivec.c

 void test12() {
 [...]
   vec_xst_be(vui, param_sll, &param_ui);
   // CHECK: store <4 x i32> %{{[0-9]+}}, <4 x i32>* %{{[0-9]+}}, align 1
   // CHECK-LE: call void @llvm.ppc.vsx.stxvw4x.be(<4 x i32> %{{[0-9]+}}, i8* %{{[0-9]+}})

   vec_xst_be(vf, param_sll, &param_f);
   // CHECK: store <4 x float> %{{[0-9]+}}, <4 x float>* %{{[0-9]+}}, align 1
   // CHECK-LE: call void @llvm.ppc.vsx.stxvw4x.be(<4 x i32> %{{[0-9]+}}, i8* %{{[0-9]+}})
 }

+vector float test_rsqrtf(vector float a, vector float b) {
+  // CHECK-LABEL: test_rsqrtf
+  // CHECK: call fast <4 x float> @llvm.sqrt.v4f32
+  // CHECK: fdiv fast <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
+  // CHECK-LE-LABEL: test_rsqrtf
+  // CHECK-LE: call fast <4 x float> @llvm.sqrt.v4f32
+  // CHECK-LE: fdiv fast <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
+  return vec_rsqrt(a);
+}
+
+vector float test_recipdivf(vector float a, vector float b) {
+  // CHECK-LABEL: test_recipdivf
+  // CHECK: fdiv fast <4 x float>
+  // CHECK-LE-LABEL: test_recipdivf
+  // CHECK-LE: fdiv fast <4 x float>
+  return vec_recipdiv(a, b);
+}

clang/test/CodeGen/builtins-ppc-vsx.c

 [...]
   // CHECK-DAG: load{{.*}}%a
   // CHECK-DAG: load{{.*}}%b
   // CHECK-NOT: SEPARATOR
   // CHECK-DAG: [[RA:%[0-9]+]] = load <2 x double>, <2 x double>* %a.addr
   // CHECK-DAG: [[RB:%[0-9]+]] = load <2 x double>, <2 x double>* %b.addr
   // CHECK-NEXT: call <2 x double> @llvm.copysign.v2f64(<2 x double> [[RA]], <2 x double> [[RB]])
   __builtin_vsx_xvcpsgndp(a, b);
 }

+vector double test_recipdivd(vector double a, vector double b) {
+  // CHECK-LABEL: test_recipdivd
+  // CHECK: fdiv fast <2 x double>
+  // CHECK-LE-LABEL: test_recipdivd
+  // CHECK-LE: fdiv fast <2 x double>
+  return vec_recipdiv(a, b);
+}
+
+vector double test_rsqrtd(vector double a, vector double b) {
+  // CHECK-LABEL: test_rsqrtd
+  // CHECK: call fast <2 x double> @llvm.sqrt.v2f64
+  // CHECK: fdiv fast <2 x double> <double 1.000000e+00, double 1.000000e+00>
+  // CHECK-LE-LABEL: test_rsqrtd
+  // CHECK-LE: call fast <2 x double> @llvm.sqrt.v2f64
+  // CHECK-LE: fdiv fast <2 x double> <double 1.000000e+00, double 1.000000e+00>
+  return vec_rsqrt(a);
+}
