This is an archive of the discontinued LLVM Phabricator instance.

Differential D116015

[PowerPC] Add generic fnmsub intrinsic
ClosedPublic

Authored by qiucf on Dec 19 2021, 8:48 PM.

Details

Reviewers

rzurob
jsji
nemanjai
shchenz

Group Reviewers

Restricted Project

Commits

rGb2497e54356d: [PowerPC] Add generic fnmsub intrinsic

Summary

Currently, Clang has several builtins for the fnmsub operation:

__builtin_vsx_xvnmsubasp/__builtin_vsx_xvnmsubadp for float/double vectors, which are transformed into -fma(a, b, -c) in LLVM IR
__builtin_ppc_fnmsubs/__builtin_ppc_fnmsub for float/double scalars, which generate the corresponding intrinsic in IR

For the vector builtins, however, the three-operation chain may be recognized as expensive by some passes (such as early CSE). We need some way to keep the fnmsub form until code generation.

This patch introduces the ppc.fnmsub.* intrinsic to unify the four fnmsub intrinsics.
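
As a rough illustration of the identity the summary relies on, the sketch below writes the scalar fnmsub semantics, -(a * b - c), both as a single helper and as the three-operation chain (fneg, fma, fneg) that generic IR would carry. The function names are hypothetical and `std::fma` merely stands in for `llvm.fma`; none of this is code from the patch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reference helper: fnmsub(a, b, c) = -(a * b - c),
// written as -fma(a, b, -c) as in the summary above.
double fnmsubRef(double a, double b, double c) {
  return -std::fma(a, b, -c);
}

// The same computation spelled out as the three-operation chain
// (fneg, fma, fneg) that the optimizer sees as separate instructions.
double fnmsubChain(double a, double b, double c) {
  double negC = -c;                     // first fneg
  double fused = std::fma(a, b, negC);  // fused multiply-add
  return -fused;                        // second fneg
}

// Small self-check on exactly representable inputs: -(2 * 3 - 1) == -5.
void checkFnmsubIdentity() {
  assert(fnmsubRef(2.0, 3.0, 1.0) == -5.0);
  assert(fnmsubChain(2.0, 3.0, 1.0) == fnmsubRef(2.0, 3.0, 1.0));
}
```

The two helpers compute the same value; the difference that matters for this review is only whether the three steps are visible to the optimizer individually or hidden behind one opaque call.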

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,060 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg_mask.c
	60,050 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vlsegff.c
	60,040 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg_mask.c
	60,050 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vloxseg_mask.c
	60,040 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vluxseg_mask.c
		View Full Test Results (8 Failed)

Event Timeline

qiucf created this revision. Dec 19 2021, 8:48 PM

Herald added subscribers: kbarton, hiraditya. Dec 19 2021, 8:48 PM

qiucf requested review of this revision. Dec 19 2021, 8:48 PM

Herald added projects: Restricted Project, Restricted Project. Dec 19 2021, 8:48 PM

Herald added subscribers: llvm-commits, cfe-commits.

Harbormaster completed remote builds in B140033: Diff 395369. Dec 19 2021, 9:46 PM

Converting more generic code to target-specific intrinsics is sometimes necessary to ensure the generic IR doesn't get transformed in a way that is disadvantageous. I believe that the description of this review claims that to be the case for these negated FMA's. The obvious disadvantage of producing target-specific intrinsics is that the optimizer knows nothing about them so no advantageous transformations can happen either (i.e. hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing).

The description of this patch includes no test case that shows the optimizer performing an undesirable transformation. So the motivation for making the front end produce more opaque code is not at all clear.

In D116015#3203067, @nemanjai wrote:

Converting more generic code to target-specific intrinsics is sometimes necessary to ensure the generic IR doesn't get transformed in a way that is disadvantageous. I believe that the description of this review claims that to be the case for these negated FMA's. The obvious disadvantage of producing target-specific intrinsics is that the optimizer knows nothing about them so no advantageous transformations can happen either (i.e. hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing).

The description of this patch includes no test case that shows the optimizer performing an undesirable transformation. So the motivation for making the front end produce more opaque code is not at all clear.

Here's a pretty simple case: vector float foo(vector float a, vector float b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, a*b); }

It currently produces xvnegsp+xvmulsp+xvnmaddasp; after this patch it produces xvmulsp+xvnmsubasp. In more complicated cases, we can see many more unexpected instructions generated.

gentle ping

hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing.

Agreed. Imagine a case where the neg and fma (from fnmsub) can both be CSE-ed with another neg and fma, so that we can eliminate the fnmsub entirely. After we convert it to an intrinsic, we may lose the opportunity to CSE the fnmsub.

Here's a pretty simple case: vector float foo(vector float a, vector float b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, a*b); }
It currently produces xvnegsp+xvmulsp+xvnmaddasp; after this patch it produces xvmulsp+xvnmsubasp. In more complicated cases, we can see many more unexpected instructions generated.

This is narrowed down from a real-world case. After CSE has taken part of the fnmsub chain, it is hard to combine it back into a single hardware fnmsub instruction, because we normally check the number of uses of a register and may exit the combine if that number is not 1.

Is it possible to get some perf data for some float workloads with this patch? @qiucf
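
The one-use restriction mentioned in this comment can be sketched with a toy expression graph. Everything here (`Node`, `Op`, `canFoldToFnmsub`) is hypothetical and far simpler than the real DAG combiner; it only shows why sharing part of the chain with another user can block folding fneg(fma(a, b, fneg(c))) back into one fnmsub.

```cpp
#include <cassert>
#include <utility>
#include <vector>

enum class Op { Leaf, FNeg, FMA };

// Toy expression node; numUses counts how many other nodes consume it.
struct Node {
  Op op;
  std::vector<const Node*> operands;
  int numUses;

  Node(Op o, std::vector<const Node*> ops = {}, int uses = 1)
      : op(o), operands(std::move(ops)), numUses(uses) {
    assert(uses >= 0 && "use count cannot be negative");
  }
};

// Hypothetical combine check: fneg(fma(a, b, fneg(c))) is folded into one
// fnmsub only when the inner fma and the inner fneg each have exactly one use.
bool canFoldToFnmsub(const Node& root) {
  if (root.op != Op::FNeg || root.operands.size() != 1)
    return false;
  const Node* fmaNode = root.operands[0];
  if (fmaNode->op != Op::FMA || fmaNode->operands.size() != 3 ||
      fmaNode->numUses != 1)
    return false;
  const Node* negC = fmaNode->operands[2];
  return negC->op == Op::FNeg && negC->operands.size() == 1 &&
         negC->numUses == 1;
}
```

In this model, once CSE makes the inner fneg (or the fma) feed a second consumer, its use count becomes 2 and the fold is rejected, which is the situation the intrinsic is meant to avoid.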

llvm/include/llvm/IR/IntrinsicsPowerPC.td
1737	When `llvm_anyfloat_ty` is `f32` or `f64`, we will generate two intrinsics with the same semantics: `llvm.ppc.nmsub.f32` + `llvm.ppc.fnmsubs` and `llvm.ppc.nmsub.f64` + `llvm.ppc.fnmsub`. At first glance, it seems we cannot delete `int_ppc_fnmsub` and `int_ppc_fnmsubs`, because they exist for XL compatibility, and XL has separate fnmsub builtins for float and double that we need to map one to one. It would be better to check whether `int_ppc_fnmsub` and `int_ppc_fnmsubs` can be replaced with `int_ppc_nmsub`. If they can, we can use a meaningful name like `int_ppc_fnmsub` for the new intrinsic.

In D116015#3326148, @shchenz wrote:

hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing.

Agreed. Imagine a case where the neg and fma (from fnmsub) can both be CSE-ed with another neg and fma, so that we can eliminate the fnmsub entirely. After we convert it to an intrinsic, we may lose the opportunity to CSE the fnmsub.

Here's a pretty simple case: vector float foo(vector float a, vector float b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, a*b); }
It currently produces xvnegsp+xvmulsp+xvnmaddasp; after this patch it produces xvmulsp+xvnmsubasp. In more complicated cases, we can see many more unexpected instructions generated.

This is narrowed down from a real-world case. After CSE has taken part of the fnmsub chain, it is hard to combine it back into a single hardware fnmsub instruction, because we normally check the number of uses of a register and may exit the combine if that number is not 1.

Is it possible to get some perf data for some float workloads with this patch? @qiucf

Thanks. I did not see any performance change in some common benchmarks.

llvm/include/llvm/IR/IntrinsicsPowerPC.td
1737	We can do that, but that requires more work and seems beyond this patch's scope. See D105930, we'll need to handle the builtin in Clang. And the builtin explicitly generates type-M VSX instructions (I guess to reduce copy in simple cases).

Replace existing ppc.fnmsub and ppc.fnmsubs.

Harbormaster completed remote builds in B151213: Diff 411046. Feb 24 2022, 2:48 AM

LGTM. Two nits about the comments and tests.

Please wait a few days in case other reviewers have comments.

clang/lib/CodeGen/CGBuiltin.cpp
15196	why add this comment here?
clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c
98	If we improve the check lines to CHECK-COUNT, do we still need the original CHECKs?

This revision is now accepted and ready to land. Mar 2 2022, 6:21 PM

Herald added a project: Restricted Project. Mar 2 2022, 6:21 PM

qiucf updated this revision to Diff 413333. Mar 6 2022, 9:05 PM

qiucf marked an inline comment as done.

qiucf added inline comments.

clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c
98	Yes, otherwise we can't capture the right operands of `llvm.ppc.fnmsub.f64`.

This revision was landed with ongoing or failed builds. Mar 6 2022, 9:06 PM

Closed by commit rGb2497e54356d: [PowerPC] Add generic fnmsub intrinsic (authored by qiucf).

This revision was automatically updated to reflect the committed changes.

qiucf added a commit: rGb2497e54356d: [PowerPC] Add generic fnmsub intrinsic.

Harbormaster completed remote builds in B152830: Diff 413333. Mar 6 2022, 10:10 PM

Revision Contents

Path	Size
clang/lib/CodeGen/CGBuiltin.cpp	11 lines
clang/test/CodeGen/PowerPC/builtins-ppc-fma.c	8 lines
clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c	8 lines
clang/test/CodeGen/PowerPC/builtins-ppc-vsx.c	16 lines
clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c	6 lines
llvm/include/llvm/IR/IntrinsicsPowerPC.td	12 lines
llvm/lib/Target/PowerPC/PPCISelLowering.cpp	13 lines
llvm/lib/Target/PowerPC/PPCInstrInfo.td	2 lines
llvm/lib/Target/PowerPC/PPCInstrVSX.td	2 lines
llvm/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.ll	91 lines

Diff 413333

clang/lib/CodeGen/CGBuiltin.cpp

[... unchanged lines not shown ...]
   }
   case PPC::BI__builtin_ppc_load2r: {
     Function *F = CGM.getIntrinsic(Intrinsic::ppc_load2r);
     Ops[0] = Builder.CreateBitCast(Ops[0], Int8PtrTy);
     Value *LoadIntrinsic = Builder.CreateCall(F, Ops);
     return Builder.CreateTrunc(LoadIntrinsic, Int16Ty);
   }
   // FMA variations
+  case PPC::BI__builtin_ppc_fnmsub:
+  case PPC::BI__builtin_ppc_fnmsubs:
   case PPC::BI__builtin_vsx_xvmaddadp:
   case PPC::BI__builtin_vsx_xvmaddasp:
   case PPC::BI__builtin_vsx_xvnmaddadp:
   case PPC::BI__builtin_vsx_xvnmaddasp:
   case PPC::BI__builtin_vsx_xvmsubadp:
   case PPC::BI__builtin_vsx_xvmsubasp:
   case PPC::BI__builtin_vsx_xvnmsubadp:
   case PPC::BI__builtin_vsx_xvnmsubasp: {
[... 22 unchanged lines not shown ...]
         return Builder.CreateFNeg(Builder.CreateCall(F, {X, Y, Z}), "neg");
       case PPC::BI__builtin_vsx_xvmsubadp:
       case PPC::BI__builtin_vsx_xvmsubasp:
         if (Builder.getIsFPConstrained())
           return Builder.CreateConstrainedFPCall(
               F, {X, Y, Builder.CreateFNeg(Z, "neg")});
         else
           return Builder.CreateCall(F, {X, Y, Builder.CreateFNeg(Z, "neg")});
+      case PPC::BI__builtin_ppc_fnmsub:
+      case PPC::BI__builtin_ppc_fnmsubs:
       case PPC::BI__builtin_vsx_xvnmsubadp:
       case PPC::BI__builtin_vsx_xvnmsubasp:
         if (Builder.getIsFPConstrained())
           return Builder.CreateFNeg(
               Builder.CreateConstrainedFPCall(
                   F, {X, Y, Builder.CreateFNeg(Z, "neg")}),
               "neg");
         else
-          return Builder.CreateFNeg(
-              Builder.CreateCall(F, {X, Y, Builder.CreateFNeg(Z, "neg")}),
-              "neg");
+          return Builder.CreateCall(
+              CGM.getIntrinsic(Intrinsic::ppc_fnmsub, ResultType), {X, Y, Z});
     }
     llvm_unreachable("Unknown FMA operation");
     return nullptr; // Suppress no-return warning
   }

   case PPC::BI__builtin_vsx_insertword: {
     llvm::Function *F = CGM.getIntrinsic(Intrinsic::ppc_vsx_xxinsertw);

     // Third argument is a compile time constant int. It must be clamped to
[... unchanged lines not shown ...]

clang/test/CodeGen/PowerPC/builtins-ppc-fma.c

[... 26 unchanged lines not shown ...]
 void test_fma(void) {
   // CHECK: [[RESULT:%[^ ]+]] = fneg <4 x float> %{{.*}}
   // CHECK: @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT]])

   vd = __builtin_vsx_xvmsubadp(vd, vd, vd);
   // CHECK: [[RESULT:%[^ ]+]] = fneg <2 x double> %{{.*}}
   // CHECK: <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]])

   vf = __builtin_vsx_xvnmsubasp(vf, vf, vf);
-  // CHECK: [[RESULT:%[^ ]+]] = fneg <4 x float> %{{.*}}
-  // CHECK: [[RESULT2:%[^ ]+]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT]])
-  // CHECK: fneg <4 x float> [[RESULT2]]
+  // CHECK: call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})

   vd = __builtin_vsx_xvnmsubadp(vd, vd, vd);
-  // CHECK: [[RESULT:%[^ ]+]] = fneg <2 x double> %{{.*}}
-  // CHECK: [[RESULT2:%[^ ]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]])
-  // CHECK: fneg <2 x double> [[RESULT2]]
+  // CHECK: call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
 }

clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c

[... unchanged lines not shown ...]
   // CHECK-UNCONSTRAINED: [[RESULT:%[^ ]+]] = fneg <2 x double> %{{.*}}
   // CHECK-UNCONSTRAINED: @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]])
   // CHECK-CONSTRAINED: [[RESULT:%[^ ]+]] = fneg <2 x double> %{{.*}}
   // CHECK-CONSTRAINED: @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]], metadata !"round.tonearest", metadata !"fpexcept.strict")
   // CHECK-ASM: xvmsubadp

   vf = __builtin_vsx_xvnmsubasp(vf, vf, vf);
   // CHECK-LABEL: try-xvnmsubasp
-  // CHECK-UNCONSTRAINED: [[RESULT0:%[^ ]+]] = fneg <4 x float> %{{.*}}
-  // CHECK-UNCONSTRAINED: [[RESULT1:%[^ ]+]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT0]])
-  // CHECK-UNCONSTRAINED: fneg <4 x float> [[RESULT1]]
+  // CHECK-UNCONSTRAINED: call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
   // CHECK-CONSTRAINED: [[RESULT0:%[^ ]+]] = fneg <4 x float> %{{.*}}
   // CHECK-CONSTRAINED: [[RESULT1:%[^ ]+]] = call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT0]], metadata !"round.tonearest", metadata !"fpexcept.strict")
   // CHECK-CONSTRAINED: fneg <4 x float> [[RESULT1]]
   // CHECK-ASM: xvnmsubasp

   vd = __builtin_vsx_xvnmsubadp(vd, vd, vd);
   // CHECK-LABEL: try-xvnmsubadp
-  // CHECK-UNCONSTRAINED: [[RESULT0:%[^ ]+]] = fneg <2 x double> %{{.*}}
-  // CHECK-UNCONSTRAINED: [[RESULT1:%[^ ]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT0]])
-  // CHECK-UNCONSTRAINED: fneg <2 x double> [[RESULT1]]
+  // CHECK-UNCONSTRAINED: call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
   // CHECK-CONSTRAINED: [[RESULT0:%[^ ]+]] = fneg <2 x double> %{{.*}}
   // CHECK-CONSTRAINED: [[RESULT1:%[^ ]+]] = call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT0]], metadata !"round.tonearest", metadata !"fpexcept.strict")
   // CHECK-CONSTRAINED: fneg <2 x double> [[RESULT1]]
   // CHECK-ASM: xvnmsubadp
 }

clang/test/CodeGen/PowerPC/builtins-ppc-vsx.c

[... unchanged lines not shown ...]

   res_vd = vec_nmadd(vd, vd, vd);
   // CHECK: [[FM:[0-9]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}})
   // CHECK-NEXT: fneg <2 x double> %[[FM]]
   // CHECK-LE: [[FM:[0-9]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}})
   // CHECK-LE-NEXT: fneg <2 x double> %[[FM]]

   res_vf = vec_nmsub(vf, vf, vf);
-  // CHECK: fneg <4 x float> %{{[0-9]+}}
-  // CHECK-NEXT: call <4 x float> @llvm.fma.v4f32(<4 x float> %{{[0-9]+}}, <4 x float> %{{[0-9]+}}, <4 x float>
-  // CHECK: fneg <4 x float> %{{[0-9]+}}
-  // CHECK-LE: fneg <4 x float> %{{[0-9]+}}
-  // CHECK-LE-NEXT: call <4 x float> @llvm.fma.v4f32(<4 x float> %{{[0-9]+}}, <4 x float> %{{[0-9]+}}, <4 x float>
-  // CHECK-LE: fneg <4 x float> %{{[0-9]+}}
+  // CHECK: call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %{{[0-9]+}}, <4 x float> %{{[0-9]+}}, <4 x float>
+  // CHECK-LE: call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %{{[0-9]+}}, <4 x float> %{{[0-9]+}}, <4 x float>

   res_vd = vec_nmsub(vd, vd, vd);
-  // CHECK: fneg <2 x double> %{{[0-9]+}}
-  // CHECK-NEXT: [[FM:[0-9]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double>
-  // CHECK-NEXT: fneg <2 x double> %[[FM]]
-  // CHECK-LE: fneg <2 x double> %{{[0-9]+}}
-  // CHECK-LE-NEXT: [[FM:[0-9]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double>
-  // CHECK-LE-NEXT: fneg <2 x double> %[[FM]]
+  // CHECK: [[FM:[0-9]+]] = call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double>
+  // CHECK-LE: [[FM:[0-9]+]] = call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double>

   /* vec_nor */
   res_vsll = vec_nor(vsll, vsll);
   // CHECK: or <2 x i64>
   // CHECK: xor <2 x i64>
   // CHECK-LE: or <2 x i64>
   // CHECK-LE: xor <2 x i64>

[... unchanged lines not shown ...]

clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c

[... unchanged lines not shown ...]
 //
 float fnmadds (float f) {
   return __fnmadds (f, f, f);
 }

 // CHECK-LABEL: @fnmsub(
 // CHECK: [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT: store double [[D:%.*]], double* [[D_ADDR]], align 8
+// CHECK-COUNT-3: load double, double* [[D_ADDR]], align 8
 // CHECK-NEXT: [[TMP0:%.*]] = load double, double* [[D_ADDR]], align 8
 // CHECK-NEXT: [[TMP1:%.*]] = load double, double* [[D_ADDR]], align 8
 // CHECK-NEXT: [[TMP2:%.*]] = load double, double* [[D_ADDR]], align 8
-// CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.ppc.fnmsub(double [[TMP0]], double [[TMP1]], double [[TMP2]])
+// CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.ppc.fnmsub.f64(double [[TMP0]], double [[TMP1]], double [[TMP2]])
 // CHECK-NEXT: ret double [[TMP3]]
 //
 double fnmsub (double d) {
   return __fnmsub (d, d, d);
 }

 // CHECK-LABEL: @fnmsubs(
 // CHECK: [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT: store float [[F:%.*]], float* [[F_ADDR]], align 4
+// CHECK-COUNT-3: load float, float* [[F_ADDR]], align 4
 // CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[F_ADDR]], align 4
 // CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[F_ADDR]], align 4
 // CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[F_ADDR]], align 4
-// CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.ppc.fnmsubs(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.ppc.fnmsub.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
 // CHECK-NEXT: ret float [[TMP3]]
 //
 float fnmsubs (float f) {
   return __fnmsubs (f, f, f);
 }

 // CHECK-LABEL: @fre(
 // CHECK: [[D_ADDR:%.*]] = alloca double, align 8
[... 19 unchanged lines not shown ...]

llvm/include/llvm/IR/IntrinsicsPowerPC.td

[... unchanged lines not shown ...]
                  [llvm_double_ty, llvm_double_ty, llvm_double_ty],
                  [IntrNoMem]>;
   def int_ppc_fnmadds
       : GCCBuiltin<"__builtin_ppc_fnmadds">,
         Intrinsic <[llvm_float_ty],
                    [llvm_float_ty, llvm_float_ty, llvm_float_ty],
                    [IntrNoMem]>;
   def int_ppc_fnmsub
-      : GCCBuiltin<"__builtin_ppc_fnmsub">,
-        Intrinsic <[llvm_double_ty],
-                   [llvm_double_ty, llvm_double_ty, llvm_double_ty],
-                   [IntrNoMem]>;
-  def int_ppc_fnmsubs
-      : GCCBuiltin<"__builtin_ppc_fnmsubs">,
-        Intrinsic <[llvm_float_ty],
-                   [llvm_float_ty, llvm_float_ty, llvm_float_ty],
-                   [IntrNoMem]>;
+      : Intrinsic<[llvm_anyfloat_ty],
+                  [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
+                  [IntrNoMem]>;
   def int_ppc_fre
       : GCCBuiltin<"__builtin_ppc_fre">,
         Intrinsic <[llvm_double_ty], [llvm_double_ty], [IntrNoMem]>;
   def int_ppc_fres
       : GCCBuiltin<"__builtin_ppc_fres">,
         Intrinsic <[llvm_float_ty], [llvm_float_ty], [IntrNoMem]>;
   def int_ppc_addex
       : GCCBuiltin<"__builtin_ppc_addex">,
         Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty, llvm_i32_ty],
                   [IntrNoMem, IntrHasSideEffects, ImmArg<ArgIndex<2>>]>;
   def int_ppc_fsel : GCCBuiltin<"__builtin_ppc_fsel">,
                      Intrinsic<[llvm_double_ty], [llvm_double_ty, llvm_double_ty,
                                llvm_double_ty], [IntrNoMem]>;
   def int_ppc_fsels : GCCBuiltin<"__builtin_ppc_fsels">,
                       Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty,
                                 llvm_float_ty], [IntrNoMem]>;
   def int_ppc_frsqrte : GCCBuiltin<"__builtin_ppc_frsqrte">,
                         Intrinsic<[llvm_double_ty], [llvm_double_ty], [IntrNoMem]>;
[... unchanged lines not shown ...]
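
Because the new `int_ppc_fnmsub` is overloaded on `llvm_anyfloat_ty`, each concrete type gets its own suffixed name, as the updated CHECK lines show (`llvm.ppc.fnmsub.f32`, `.f64`, `.v4f32`, `.v2f64`). The sketch below is a deliberately simplified model of how those four names are formed; `FloatType` and `fnmsubIntrinsicName` are hypothetical and do not reproduce LLVM's actual name-mangling code.

```cpp
#include <cassert>
#include <string>

// Toy description of a floating-point type: element width in bits and
// number of vector lanes (1 means scalar).
struct FloatType {
  unsigned elementBits;
  unsigned lanes;
};

// Builds the suffixed intrinsic name for the four types exercised by the
// tests in this revision. Only 32- and 64-bit elements are modeled here.
std::string fnmsubIntrinsicName(FloatType ty) {
  assert((ty.elementBits == 32 || ty.elementBits == 64) && ty.lanes >= 1);
  std::string suffix = "f" + std::to_string(ty.elementBits);
  if (ty.lanes > 1)
    suffix = "v" + std::to_string(ty.lanes) + suffix;
  return "llvm.ppc.fnmsub." + suffix;
}
```

With this model, a scalar double maps to `llvm.ppc.fnmsub.f64` and a four-lane float vector to `llvm.ppc.fnmsub.v4f32`, matching the names that replace `llvm.ppc.fnmsub` and `llvm.ppc.fnmsubs` in the tests.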

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

	Show First 20 Lines • Show All 91 Lines • ▼ Show 20 Lines
	setOperationAction(ISD::GET_DYNAMIC_AREA_OFFSET, MVT::i64, Custom);			setOperationAction(ISD::GET_DYNAMIC_AREA_OFFSET, MVT::i64, Custom);
	setOperationAction(ISD::EH_DWARF_CFA, MVT::i32, Custom);			setOperationAction(ISD::EH_DWARF_CFA, MVT::i32, Custom);
	setOperationAction(ISD::EH_DWARF_CFA, MVT::i64, Custom);			setOperationAction(ISD::EH_DWARF_CFA, MVT::i64, Custom);

	// We want to custom lower some of our intrinsics.			// We want to custom lower some of our intrinsics.
	setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);			setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
	setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::f64, Custom);			setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::f64, Custom);
	setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::ppcf128, Custom);			setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::ppcf128, Custom);
				setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v4f32, Custom);
				setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v2f64, Custom);

	// To handle counter-based loop conditions.			// To handle counter-based loop conditions.
	setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i1, Custom);			setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i1, Custom);

	setOperationAction(ISD::INTRINSIC_VOID, MVT::i8, Custom);			setOperationAction(ISD::INTRINSIC_VOID, MVT::i8, Custom);
	setOperationAction(ISD::INTRINSIC_VOID, MVT::i16, Custom);			setOperationAction(ISD::INTRINSIC_VOID, MVT::i16, Custom);
	setOperationAction(ISD::INTRINSIC_VOID, MVT::i32, Custom);			setOperationAction(ISD::INTRINSIC_VOID, MVT::i32, Custom);
	setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);			setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);
	▲ Show 20 Lines • Show All 182 Lines • ▼ Show 20 Lines
	PPC::SELECT_CC_I4, dl, MVT::i32,			PPC::SELECT_CC_I4, dl, MVT::i32,
	{SDValue(DAG.getMachineNode(CmprOpc, dl, MVT::i32, Op.getOperand(2),			{SDValue(DAG.getMachineNode(CmprOpc, dl, MVT::i32, Op.getOperand(2),
	Op.getOperand(1)),			Op.getOperand(1)),
	0),			0),
	DAG.getConstant(1, dl, MVT::i32), DAG.getConstant(0, dl, MVT::i32),			DAG.getConstant(1, dl, MVT::i32), DAG.getConstant(0, dl, MVT::i32),
	DAG.getTargetConstant(PPC::PRED_EQ, dl, MVT::i32)}),			DAG.getTargetConstant(PPC::PRED_EQ, dl, MVT::i32)}),
	0);			0);
	}			}
				case Intrinsic::ppc_fnmsub: {
				EVT VT = Op.getOperand(1).getValueType();
				if (!Subtarget.hasVSX() \|\| (!Subtarget.hasFloat128() && VT == MVT::f128))
				return DAG.getNode(
				ISD::FNEG, dl, VT,
				DAG.getNode(ISD::FMA, dl, VT, Op.getOperand(1), Op.getOperand(2),
				DAG.getNode(ISD::FNEG, dl, VT, Op.getOperand(3))));
				return DAG.getNode(PPCISD::FNMSUB, dl, VT, Op.getOperand(1),
				Op.getOperand(2), Op.getOperand(3));
				}
	case Intrinsic::ppc_convert_f128_to_ppcf128:			case Intrinsic::ppc_convert_f128_to_ppcf128:
	case Intrinsic::ppc_convert_ppcf128_to_f128: {			case Intrinsic::ppc_convert_ppcf128_to_f128: {
	RTLIB::Libcall LC = IntrinsicID == Intrinsic::ppc_convert_ppcf128_to_f128			RTLIB::Libcall LC = IntrinsicID == Intrinsic::ppc_convert_ppcf128_to_f128
	? RTLIB::CONVERT_PPCF128_F128			? RTLIB::CONVERT_PPCF128_F128
	: RTLIB::CONVERT_F128_PPCF128;			: RTLIB::CONVERT_F128_PPCF128;
	MakeLibCallOptions CallOptions;			MakeLibCallOptions CallOptions;
	std::pair<SDValue, SDValue> Result =			std::pair<SDValue, SDValue> Result =
	makeLibCall(DAG, LC, Op.getValueType(), Op.getOperand(1), CallOptions,			makeLibCall(DAG, LC, Op.getValueType(), Op.getOperand(1), CallOptions,
	▲ Show 20 Lines • Show All 182 Lines • ▼ Show 20 Lines
	break;			break;
	}			}
	case ISD::INTRINSIC_WO_CHAIN: {			case ISD::INTRINSIC_WO_CHAIN: {
	switch (cast<ConstantSDNode>(N->getOperand(0))->getZExtValue()) {			switch (cast<ConstantSDNode>(N->getOperand(0))->getZExtValue()) {
	case Intrinsic::ppc_pack_longdouble:			case Intrinsic::ppc_pack_longdouble:
	Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl, MVT::ppcf128,			Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl, MVT::ppcf128,
	N->getOperand(2), N->getOperand(1)));			N->getOperand(2), N->getOperand(1)));
	break;			break;
				case Intrinsic::ppc_fnmsub:
	case Intrinsic::ppc_convert_f128_to_ppcf128:			case Intrinsic::ppc_convert_f128_to_ppcf128:
	Results.push_back(LowerINTRINSIC_WO_CHAIN(SDValue(N, 0), DAG));			Results.push_back(LowerINTRINSIC_WO_CHAIN(SDValue(N, 0), DAG));
	break;			break;
	}			}
	break;			break;
	}			}
	case ISD::VAARG: {			case ISD::VAARG: {
	if (!Subtarget.isSVR4ABI() || Subtarget.isPPC64())			if (!Subtarget.isSVR4ABI() || Subtarget.isPPC64())
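For reference, the value that both lowering paths of the `Intrinsic::ppc_fnmsub` case above are meant to produce, `-fma(a, b, -c)`, can be sketched in plain C++. This is only an illustration and not code from the patch: `fnmsub_ref` is a hypothetical helper name, and `std::fma` stands in for the `ISD::FMA` node.

```cpp
#include <cmath>

// Hypothetical reference helper (not part of the patch). It mirrors the
// fallback expansion FNEG(FMA(a, b, FNEG(c))) that the lowering emits when
// VSX is unavailable, or for f128 without hardware float128 support.
inline double fnmsub_ref(double a, double b, double c) {
  // std::fma computes a * b + (-c) with a single rounding step.
  return -std::fma(a, b, -c);
}

// Single-precision overload, matching the f32 form of the intrinsic.
inline float fnmsub_ref(float a, float b, float c) {
  return -std::fma(a, b, -c);
}
```

With these definitions, `fnmsub_ref(2.0, 3.0, 1.0)` evaluates to `-5.0`, that is, `-(2 * 3 - 1)`.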

llvm/lib/Target/PowerPC/PPCInstrInfo.td

	(FCPSGND (COPY_TO_REGCLASS $frA, F8RC), $frB)>;			(FCPSGND (COPY_TO_REGCLASS $frA, F8RC), $frB)>;
	def : Pat<(fcopysign f32:$frB, f64:$frA),			def : Pat<(fcopysign f32:$frB, f64:$frA),
	(FCPSGNS (COPY_TO_REGCLASS $frA, F4RC), $frB)>;			(FCPSGNS (COPY_TO_REGCLASS $frA, F4RC), $frB)>;
	}			}

	// XL Compat intrinsics.			// XL Compat intrinsics.
	def : Pat<(int_ppc_fmsub f64:$A, f64:$B, f64:$C), (FMSUB $A, $B, $C)>;			def : Pat<(int_ppc_fmsub f64:$A, f64:$B, f64:$C), (FMSUB $A, $B, $C)>;
	def : Pat<(int_ppc_fmsubs f32:$A, f32:$B, f32:$C), (FMSUBS $A, $B, $C)>;			def : Pat<(int_ppc_fmsubs f32:$A, f32:$B, f32:$C), (FMSUBS $A, $B, $C)>;
	def : Pat<(int_ppc_fnmsub f64:$A, f64:$B, f64:$C), (FNMSUB $A, $B, $C)>;
	def : Pat<(int_ppc_fnmsubs f32:$A, f32:$B, f32:$C), (FNMSUBS $A, $B, $C)>;
	def : Pat<(int_ppc_fnmadd f64:$A, f64:$B, f64:$C), (FNMADD $A, $B, $C)>;			def : Pat<(int_ppc_fnmadd f64:$A, f64:$B, f64:$C), (FNMADD $A, $B, $C)>;
	def : Pat<(int_ppc_fnmadds f32:$A, f32:$B, f32:$C), (FNMADDS $A, $B, $C)>;			def : Pat<(int_ppc_fnmadds f32:$A, f32:$B, f32:$C), (FNMADDS $A, $B, $C)>;
	def : Pat<(int_ppc_fre f64:$A), (FRE $A)>;			def : Pat<(int_ppc_fre f64:$A), (FRE $A)>;
	def : Pat<(int_ppc_fres f32:$A), (FRES $A)>;			def : Pat<(int_ppc_fres f32:$A), (FRES $A)>;

	include "PPCInstrAltivec.td"			include "PPCInstrAltivec.td"
	include "PPCInstrSPE.td"			include "PPCInstrSPE.td"
	include "PPCInstr64Bit.td"			include "PPCInstr64Bit.td"

llvm/lib/Target/PowerPC/PPCInstrVSX.td

	def : Pat<(v2i64 (PPCvcmp_rec v2i64:$vA, v2i64:$vB, 711)),			def : Pat<(v2i64 (PPCvcmp_rec v2i64:$vA, v2i64:$vB, 711)),
	(VCMPGTUB_rec DblwdCmp.MRGUGT, (v2i64 (XXLXORz)))>;			(VCMPGTUB_rec DblwdCmp.MRGUGT, (v2i64 (XXLXORz)))>;
	def : Pat<(v2i64 (PPCvcmp_rec v2i64:$vA, v2i64:$vB, 199)),			def : Pat<(v2i64 (PPCvcmp_rec v2i64:$vA, v2i64:$vB, 199)),
	(VCMPGTUB_rec DblwdCmp.MRGEQ, (v2i64 (XXLXORz)))>;			(VCMPGTUB_rec DblwdCmp.MRGEQ, (v2i64 (XXLXORz)))>;
	} // AddedComplexity = 0			} // AddedComplexity = 0

	// XL Compat builtins.			// XL Compat builtins.
	def : Pat<(int_ppc_fmsub f64:$A, f64:$B, f64:$C), (XSMSUBMDP $A, $B, $C)>;			def : Pat<(int_ppc_fmsub f64:$A, f64:$B, f64:$C), (XSMSUBMDP $A, $B, $C)>;
	def : Pat<(int_ppc_fnmsub f64:$A, f64:$B, f64:$C), (XSNMSUBMDP $A, $B, $C)>;
	def : Pat<(int_ppc_fnmadd f64:$A, f64:$B, f64:$C), (XSNMADDMDP $A, $B, $C)>;			def : Pat<(int_ppc_fnmadd f64:$A, f64:$B, f64:$C), (XSNMADDMDP $A, $B, $C)>;
	def : Pat<(int_ppc_fre f64:$A), (XSREDP $A)>;			def : Pat<(int_ppc_fre f64:$A), (XSREDP $A)>;
	def : Pat<(int_ppc_frsqrte vsfrc:$XB), (XSRSQRTEDP $XB)>;			def : Pat<(int_ppc_frsqrte vsfrc:$XB), (XSRSQRTEDP $XB)>;
	} // HasVSX			} // HasVSX

	// Any big endian VSX subtarget.			// Any big endian VSX subtarget.
	let Predicates = [HasVSX, IsBigEndian] in {			let Predicates = [HasVSX, IsBigEndian] in {
	def : Pat<(v2f64 (scalar_to_vector f64:$A)),			def : Pat<(v2f64 (scalar_to_vector f64:$A)),
	[182 lines not shown]
	(v2i64 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;			(v2i64 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;
	def : Pat<(v8i16 (bitconvert (v16i8 immAllOnesV))),			def : Pat<(v8i16 (bitconvert (v16i8 immAllOnesV))),
	(v8i16 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;			(v8i16 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;
	def : Pat<(v16i8 (bitconvert (v16i8 immAllOnesV))),			def : Pat<(v16i8 (bitconvert (v16i8 immAllOnesV))),
	(v16i8 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;			(v16i8 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;

	// XL Compat builtins.			// XL Compat builtins.
	def : Pat<(int_ppc_fmsubs f32:$A, f32:$B, f32:$C), (XSMSUBMSP $A, $B, $C)>;			def : Pat<(int_ppc_fmsubs f32:$A, f32:$B, f32:$C), (XSMSUBMSP $A, $B, $C)>;
	def : Pat<(int_ppc_fnmsubs f32:$A, f32:$B, f32:$C), (XSNMSUBMSP $A, $B, $C)>;
	def : Pat<(int_ppc_fnmadds f32:$A, f32:$B, f32:$C), (XSNMADDMSP $A, $B, $C)>;			def : Pat<(int_ppc_fnmadds f32:$A, f32:$B, f32:$C), (XSNMADDMSP $A, $B, $C)>;
	def : Pat<(int_ppc_fres f32:$A), (XSRESP $A)>;			def : Pat<(int_ppc_fres f32:$A), (XSRESP $A)>;
	def : Pat<(i32 (int_ppc_extract_exp f64:$A)),			def : Pat<(i32 (int_ppc_extract_exp f64:$A)),
	(EXTRACT_SUBREG (XSXEXPDP (COPY_TO_REGCLASS $A, VSFRC)), sub_32)>;			(EXTRACT_SUBREG (XSXEXPDP (COPY_TO_REGCLASS $A, VSFRC)), sub_32)>;
	def : Pat<(int_ppc_extract_sig f64:$A),			def : Pat<(int_ppc_extract_sig f64:$A),
	(XSXSIGDP (COPY_TO_REGCLASS $A, VSFRC))>;			(XSXSIGDP (COPY_TO_REGCLASS $A, VSFRC))>;
	def : Pat<(f64 (int_ppc_insert_exp f64:$A, i64:$B)),			def : Pat<(f64 (int_ppc_insert_exp f64:$A, i64:$B)),
	(COPY_TO_REGCLASS (XSIEXPDP (COPY_TO_REGCLASS $A, G8RC), $B), F8RC)>;			(COPY_TO_REGCLASS (XSIEXPDP (COPY_TO_REGCLASS $A, G8RC), $B), F8RC)>;

llvm/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.ll

	; CHECK-PWR7-NEXT: blr			; CHECK-PWR7-NEXT: blr
	entry:			entry:
	%0 = tail call float @llvm.ppc.fnmadds(float %f, float %f2, float %f3)			%0 = tail call float @llvm.ppc.fnmadds(float %f, float %f2, float %f3)
	ret float %0			ret float %0
	}			}

	declare float @llvm.ppc.fnmadds(float, float, float)			declare float @llvm.ppc.fnmadds(float, float, float)

	define dso_local double @fnmsub_t0(double %d, double %d2, double %d3) {			define dso_local float @fnmsub_f32(float %f, float %f2, float %f3) {
	; CHECK-PWR8-LABEL: fnmsub_t0:			; CHECK-PWR8-LABEL: fnmsub_f32:
	; CHECK-PWR8: # %bb.0: # %entry			; CHECK-PWR8: # %bb.0: # %entry
	; CHECK-PWR8-NEXT: xsnmsubmdp 1, 2, 3			; CHECK-PWR8-NEXT: xsnmsubasp 3, 1, 2
				; CHECK-PWR8-NEXT: fmr 1, 3
	; CHECK-PWR8-NEXT: blr			; CHECK-PWR8-NEXT: blr
	;			;
	; CHECK-NOVSX-LABEL: fnmsub_t0:			; CHECK-NOVSX-LABEL: fnmsub_f32:
				; CHECK-NOVSX: # %bb.0: # %entry
				; CHECK-NOVSX-NEXT: fnmsubs 1, 1, 2, 3
				; CHECK-NOVSX-NEXT: blr
				;
				; CHECK-PWR7-LABEL: fnmsub_f32:
				; CHECK-PWR7: # %bb.0: # %entry
				; CHECK-PWR7-NEXT: fnmsubs 1, 1, 2, 3
				; CHECK-PWR7-NEXT: blr
				entry:
				%0 = tail call float @llvm.ppc.fnmsub.f32(float %f, float %f2, float %f3)
				ret float %0
				}

				declare float @llvm.ppc.fnmsub.f32(float, float, float)

				define dso_local double @fnmsub_f64(double %f, double %f2, double %f3) {
				; CHECK-PWR8-LABEL: fnmsub_f64:
				; CHECK-PWR8: # %bb.0: # %entry
				; CHECK-PWR8-NEXT: xsnmsubadp 3, 1, 2
				; CHECK-PWR8-NEXT: fmr 1, 3
				; CHECK-PWR8-NEXT: blr
				;
				; CHECK-NOVSX-LABEL: fnmsub_f64:
	; CHECK-NOVSX: # %bb.0: # %entry			; CHECK-NOVSX: # %bb.0: # %entry
	; CHECK-NOVSX-NEXT: fnmsub 1, 1, 2, 3			; CHECK-NOVSX-NEXT: fnmsub 1, 1, 2, 3
	; CHECK-NOVSX-NEXT: blr			; CHECK-NOVSX-NEXT: blr
	;			;
	; CHECK-PWR7-LABEL: fnmsub_t0:			; CHECK-PWR7-LABEL: fnmsub_f64:
	; CHECK-PWR7: # %bb.0: # %entry			; CHECK-PWR7: # %bb.0: # %entry
	; CHECK-PWR7-NEXT: xsnmsubmdp 1, 2, 3			; CHECK-PWR7-NEXT: xsnmsubadp 3, 1, 2
				; CHECK-PWR7-NEXT: fmr 1, 3
	; CHECK-PWR7-NEXT: blr			; CHECK-PWR7-NEXT: blr
	entry:			entry:
	%0 = tail call double @llvm.ppc.fnmsub(double %d, double %d2, double %d3)			%0 = tail call double @llvm.ppc.fnmsub.f64(double %f, double %f2, double %f3)
	ret double %0			ret double %0
	}			}

	declare double @llvm.ppc.fnmsub(double, double, double)			declare double @llvm.ppc.fnmsub.f64(double, double, double)

	define dso_local float @fnmsubs_t0(float %f, float %f2, float %f3) {			define dso_local <4 x float> @fnmsub_v4f32(<4 x float> %f, <4 x float> %f2, <4 x float> %f3) {
	; CHECK-PWR8-LABEL: fnmsubs_t0:			; CHECK-PWR8-LABEL: fnmsub_v4f32:
	; CHECK-PWR8: # %bb.0: # %entry			; CHECK-PWR8: # %bb.0: # %entry
	; CHECK-PWR8-NEXT: xsnmsubmsp 1, 2, 3			; CHECK-PWR8-NEXT: xvnmsubasp 36, 34, 35
				; CHECK-PWR8-NEXT: vmr 2, 4
	; CHECK-PWR8-NEXT: blr			; CHECK-PWR8-NEXT: blr
	;			;
	; CHECK-NOVSX-LABEL: fnmsubs_t0:			; CHECK-NOVSX-LABEL: fnmsub_v4f32:
	; CHECK-NOVSX: # %bb.0: # %entry			; CHECK-NOVSX: # %bb.0: # %entry
	; CHECK-NOVSX-NEXT: fnmsubs 1, 1, 2, 3			; CHECK-NOVSX-NEXT: fnmsubs 1, 1, 5, 9
				; CHECK-NOVSX-NEXT: fnmsubs 2, 2, 6, 10
				; CHECK-NOVSX-NEXT: fnmsubs 3, 3, 7, 11
				; CHECK-NOVSX-NEXT: fnmsubs 4, 4, 8, 12
	; CHECK-NOVSX-NEXT: blr			; CHECK-NOVSX-NEXT: blr
	;			;
	; CHECK-PWR7-LABEL: fnmsubs_t0:			; CHECK-PWR7-LABEL: fnmsub_v4f32:
	; CHECK-PWR7: # %bb.0: # %entry			; CHECK-PWR7: # %bb.0: # %entry
	; CHECK-PWR7-NEXT: fnmsubs 1, 1, 2, 3			; CHECK-PWR7-NEXT: xvnmsubasp 36, 34, 35
				; CHECK-PWR7-NEXT: vmr 2, 4
	; CHECK-PWR7-NEXT: blr			; CHECK-PWR7-NEXT: blr
	entry:			entry:
	%0 = tail call float @llvm.ppc.fnmsubs(float %f, float %f2, float %f3)			%0 = tail call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %f, <4 x float> %f2, <4 x float> %f3)
	ret float %0			ret <4 x float> %0
				}

				declare <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float>, <4 x float>, <4 x float>)

				define dso_local <2 x double> @fnmsub_v2f64(<2 x double> %f, <2 x double> %f2, <2 x double> %f3) {
				; CHECK-PWR8-LABEL: fnmsub_v2f64:
				; CHECK-PWR8: # %bb.0: # %entry
				; CHECK-PWR8-NEXT: xvnmsubadp 36, 34, 35
				; CHECK-PWR8-NEXT: vmr 2, 4
				; CHECK-PWR8-NEXT: blr
				;
				; CHECK-NOVSX-LABEL: fnmsub_v2f64:
				; CHECK-NOVSX: # %bb.0: # %entry
				; CHECK-NOVSX-NEXT: fnmsub 1, 1, 3, 5
				; CHECK-NOVSX-NEXT: fnmsub 2, 2, 4, 6
				; CHECK-NOVSX-NEXT: blr
				;
				; CHECK-PWR7-LABEL: fnmsub_v2f64:
				; CHECK-PWR7: # %bb.0: # %entry
				; CHECK-PWR7-NEXT: xvnmsubadp 36, 34, 35
				; CHECK-PWR7-NEXT: vmr 2, 4
				; CHECK-PWR7-NEXT: blr
				entry:
				%0 = tail call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %f, <2 x double> %f2, <2 x double> %f3)
				ret <2 x double> %0
	}			}

	declare float @llvm.ppc.fnmsubs(float, float, float)			declare <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double>, <2 x double>, <2 x double>)

	define dso_local double @fre(double %d) {			define dso_local double @fre(double %d) {
	; CHECK-PWR8-LABEL: fre:			; CHECK-PWR8-LABEL: fre:
	; CHECK-PWR8: # %bb.0: # %entry			; CHECK-PWR8: # %bb.0: # %entry
	; CHECK-PWR8-NEXT: xsredp 1, 1			; CHECK-PWR8-NEXT: xsredp 1, 1
	; CHECK-PWR8-NEXT: blr			; CHECK-PWR8-NEXT: blr
	;			;
	; CHECK-NOVSX-LABEL: fre:			; CHECK-NOVSX-LABEL: fre:
