This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Add generic fnmsub intrinsic
ClosedPublic

Authored by qiucf on Dec 19 2021, 8:48 PM.

Details

Reviewers
rzurob
jsji
nemanjai
shchenz
Group Reviewers
Restricted Project
Commits
rGb2497e54356d: [PowerPC] Add generic fnmsub intrinsic
Summary

Currently in Clang, we have various builtins for the fnmsub operation:

  • __builtin_vsx_xvnmsubasp/__builtin_vsx_xvnmsubadp for float/double vectors; they are transformed into -fma(a, b, -c) in LLVM IR
  • __builtin_ppc_fnmsubs/__builtin_ppc_fnmsub for float/double scalars; they generate the corresponding intrinsics in IR

But for the vector versions of the builtins, the three-op chain may be recognized as expensive by some passes (such as early CSE). We need some way to keep the fnmsub form until code generation.

This patch introduces the ppc.fnmsub.* intrinsic to unify the four fnmsub intrinsics.

Diff Detail

Unit Tests: Failed

Event Timeline

qiucf created this revision.Dec 19 2021, 8:48 PM
qiucf requested review of this revision.Dec 19 2021, 8:48 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 19 2021, 8:48 PM

Converting more generic code to target-specific intrinsics is sometimes necessary to ensure the generic IR doesn't get transformed in a way that is disadvantageous. I believe the description of this review claims that to be the case for these negated FMAs. The obvious disadvantage of producing target-specific intrinsics is that the optimizer knows nothing about them, so no advantageous transformations can happen either (i.e. hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing).

The description of this patch includes no test case that shows the optimizer performing an undesirable transformation. So the motivation for making the front end produce more opaque code is not at all clear.

qiucf added a comment.Jan 19 2022, 8:30 PM

Converting more generic code to target-specific intrinsics is sometimes necessary to ensure the generic IR doesn't get transformed in a way that is disadvantageous. I believe the description of this review claims that to be the case for these negated FMAs. The obvious disadvantage of producing target-specific intrinsics is that the optimizer knows nothing about them, so no advantageous transformations can happen either (i.e. hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing).

The description of this patch includes no test case that shows the optimizer performing an undesirable transformation. So the motivation for making the front end produce more opaque code is not at all clear.

Here's a pretty simple case: vector float foo(vector float a, vector float b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, a*b); }

It currently produces xvnegsp+xvmulsp+xvnmaddasp; after this patch it produces xvmulsp+xvnmsubasp. In more complicated cases, we see many more unexpected instructions generated.

qiucf added a comment.Feb 9 2022, 9:37 PM

gentle ping

hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing).

Agreed. Imagine a case where the neg and fma (from the fnmsub) can both be CSE'd with another neg and fma, so we can eliminate the fnmsub entirely. But after we convert it to an intrinsic, we may lose the opportunity to CSE the fnmsub.

Here's a pretty simple case: vector float foo(vector float a, vector float b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, a*b); }
It currently produces xvnegsp+xvmulsp+xvnmaddasp; after this patch it produces xvmulsp+xvnmsubasp. In more complicated cases, we see many more unexpected instructions generated.

This is narrowed down from a real-world case. After CSE'ing part of the fnmsub, it is hard to optimize it back to a single hardware fnmsub instruction, since we normally check the use count of a register during the combine and may bail out if it is not 1.

Is it possible to get some performance data for floating-point workloads with this patch? @qiucf

llvm/include/llvm/IR/IntrinsicsPowerPC.td
1737

When llvm_anyfloat_ty is f32 or f64, we will generate two intrinsics with the same semantics: llvm.ppc.nmsub.f32 + llvm.ppc.fnmsubs, and llvm.ppc.nmsub.f64 + llvm.ppc.fnmsub. At first glance, it seems we cannot delete int_ppc_fnmsub and int_ppc_fnmsubs, because they exist for XL compatibility and XL has separate fnmsub builtins for float and double that we need to map one to one. Better to check whether it is possible to replace int_ppc_fnmsub and int_ppc_fnmsubs with int_ppc_nmsub. If they can be replaced, we can use a meaningful name like int_ppc_fnmsub for the new intrinsic.

qiucf added a comment.Feb 22 2022, 1:45 AM

hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing).

Agreed. Imagine a case where the neg and fma (from the fnmsub) can both be CSE'd with another neg and fma, so we can eliminate the fnmsub entirely. But after we convert it to an intrinsic, we may lose the opportunity to CSE the fnmsub.

Here's a pretty simple case: vector float foo(vector float a, vector float b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, a*b); }
It currently produces xvnegsp+xvmulsp+xvnmaddasp; after this patch it produces xvmulsp+xvnmsubasp. In more complicated cases, we see many more unexpected instructions generated.

This is narrowed down from a real-world case. After CSE'ing part of the fnmsub, it is hard to optimize it back to a single hardware fnmsub instruction, since we normally check the use count of a register during the combine and may bail out if it is not 1.

Is it possible to get some performance data for floating-point workloads with this patch? @qiucf

Thanks. I did not see any performance change in common benchmarks.

llvm/include/llvm/IR/IntrinsicsPowerPC.td
1737

We can do that, but it requires more work and seems beyond the scope of this patch. See D105930: we'll need to handle the builtin in Clang, and the builtin explicitly generates type-M VSX instructions (I guess to reduce copies in simple cases).

qiucf updated this revision to Diff 411046.Feb 24 2022, 1:59 AM
qiucf marked an inline comment as done.
qiucf edited the summary of this revision. (Show Details)

Replace existing ppc.fnmsub and ppc.fnmsubs.

shchenz accepted this revision as: shchenz.Mar 2 2022, 6:21 PM

LGTM. Two nits about the comments and tests.

Please wait a few days in case other reviewers have comments.

clang/lib/CodeGen/CGBuiltin.cpp
15196

why add this comment here?

clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c
98 ↗(On Diff #411046)

If we improve the check lines to CHECK-COUNT, do we still need the original CHECKs?

This revision is now accepted and ready to land.Mar 2 2022, 6:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 6:21 PM
qiucf updated this revision to Diff 413333.Mar 6 2022, 9:05 PM
qiucf marked an inline comment as done.
qiucf added inline comments.
clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c
98 ↗(On Diff #411046)

Yes, otherwise we can't capture the right operands of llvm.ppc.fnmsub.f64.

This revision was landed with ongoing or failed builds.Mar 6 2022, 9:06 PM
This revision was automatically updated to reflect the committed changes.