This is an archive of the discontinued LLVM Phabricator instance.

Differential D116015

[PowerPC] Add generic fnmsub intrinsic
ClosedPublic

Authored by qiucf on Dec 19 2021, 8:48 PM.

Details

Reviewers

rzurob
jsji
nemanjai
shchenz

Group Reviewers

Restricted Project

Commits

rGb2497e54356d: [PowerPC] Add generic fnmsub intrinsic

Summary

Currently, Clang has several builtins for the fnmsub operation:

__builtin_vsx_xvnmsubasp/__builtin_vsx_xvnmsubadp for float/double vectors, which are transformed into -fma(a, b, -c) in LLVM IR
__builtin_ppc_fnmsubs/__builtin_ppc_fnmsub for float/double scalars, which generate the corresponding intrinsic in IR

For the vector builtins, however, the three-operation chain may be treated as expensive by some passes (such as EarlyCSE). We need some way to keep the fnmsub form until code generation.

This patch introduces the ppc.fnmsub.* intrinsic to unify the four fnmsub intrinsics.
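
For reference, the semantics under discussion can be sketched in portable C++. This is only an illustration of the arithmetic, not code from the patch: the helper names `fnmsub_ref`, `fnmsub_v4` and `fnmsub_examples` are hypothetical, and `std::array<float, 4>` merely stands in for a `<4 x float>` vector.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference helper: fnmsub(a, b, c) = -(a * b - c), written as
// the -fma(a, b, -c) chain that the vector builtins used to expand to.
template <typename T> T fnmsub_ref(T a, T b, T c) {
  return -std::fma(a, b, -c);
}

// Lane-by-lane emulation of the vector form. std::array is an assumption of
// this sketch; it is not how Clang or LLVM represent vectors.
std::array<float, 4> fnmsub_v4(const std::array<float, 4> &a,
                               const std::array<float, 4> &b,
                               const std::array<float, 4> &c) {
  std::array<float, 4> r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = fnmsub_ref(a[i], b[i], c[i]);
  return r;
}

// Small usage example with exactly representable values.
void fnmsub_examples() {
  assert(fnmsub_ref(2.0, 3.0, 1.0) == -5.0);     // -(2*3 - 1)
  assert(fnmsub_ref(1.5f, 2.0f, 0.5f) == -2.5f); // -(1.5*2 - 0.5)
}
```

The chain has three operations (inner negate, fma, outer negate), which is the "3 op chain" that the summary wants to keep together as one intrinsic until code generation.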

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

qiucf created this revision. Dec 19 2021, 8:48 PM

Herald added subscribers: kbarton, hiraditya. Dec 19 2021, 8:48 PM

qiucf requested review of this revision. Dec 19 2021, 8:48 PM

Herald added projects: Restricted Project, Restricted Project. Dec 19 2021, 8:48 PM

Herald added subscribers: llvm-commits, cfe-commits.

Harbormaster completed remote builds in B140033: Diff 395369. Dec 19 2021, 9:46 PM

Converting more generic code to target-specific intrinsics is sometimes necessary to ensure the generic IR doesn't get transformed in a way that is disadvantageous. I believe that the description of this review claims that to be the case for these negated FMA's. The obvious disadvantage of producing target-specific intrinsics is that the optimizer knows nothing about them so no advantageous transformations can happen either (i.e. hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing).

The description of this patch includes no test case that shows the optimizer performing an undesirable transformation. So the motivation for making the front end produce more opaque code is not at all clear.

Here's a pretty simple case: vector float foo(vector float a, vector float b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, a*b); }

It currently produces xvnegsp+xvmulsp+xvnmaddasp; after this patch it produces xvmulsp+xvnmsubasp. In more complicated cases, we can see many more unexpected instructions generated.

gentle ping

hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing

Agreed. Imagine a case where the neg and the fma (from fnmsub) can both be CSE'd with another neg and fma, so the fnmsub is eliminated entirely. After we convert it to an intrinsic, we may lose the opportunity to CSE the fnmsub.

Here's a pretty simple case: vector float foo(vector float a, vector float b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, a*b); }
It currently produces xvnegsp+xvmulsp+xvnmaddasp; after this patch it produces xvmulsp+xvnmsubasp. In more complicated cases, we can see many more unexpected instructions generated.

This is narrowed down from a real-world case. Once CSE has reused part of the fnmsub chain, it is hard to combine it back into a single hardware fnmsub instruction: we normally check the use count of a register, and if it is not 1 we may bail out of the combine.
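
To make the use-count concern concrete, here is a toy model of such a combine. It is not LLVM's SelectionDAG API: `Op`, `Node`, `canFoldToFnmsub` and `foldsWithInnerNegUses` are hypothetical names, and the one-use rule is a simplified assumption about how the real combine behaves.

```cpp
#include <cassert>

// Toy expression node; NumUses counts how many other nodes consume it.
enum class Op { Leaf, FNeg, FMA };

struct Node {
  Op Kind;
  const Node *Ops[3];
  unsigned NumUses;
  bool hasOneUse() const { return NumUses == 1; }
};

// Returns true when Root matches fneg(fma(a, b, fneg(c))) and every
// intermediate node has exactly one user, so folding removes the whole chain.
bool canFoldToFnmsub(const Node &Root) {
  if (Root.Kind != Op::FNeg || !Root.Ops[0])
    return false;
  const Node &Fma = *Root.Ops[0];
  if (Fma.Kind != Op::FMA || !Fma.hasOneUse() || !Fma.Ops[2])
    return false;
  const Node &InnerNeg = *Fma.Ops[2];
  return InnerNeg.Kind == Op::FNeg && InnerNeg.hasOneUse();
}

// Builds the chain and reports whether it folds when the inner fneg has
// InnerNegUses users (more than one models a value shared after CSE).
bool foldsWithInnerNegUses(unsigned InnerNegUses) {
  Node A{Op::Leaf, {nullptr, nullptr, nullptr}, 1};
  Node B{Op::Leaf, {nullptr, nullptr, nullptr}, 1};
  Node C{Op::Leaf, {nullptr, nullptr, nullptr}, 1};
  Node InnerNeg{Op::FNeg, {&C, nullptr, nullptr}, InnerNegUses};
  Node Fma{Op::FMA, {&A, &B, &InnerNeg}, 1};
  Node Root{Op::FNeg, {&Fma, nullptr, nullptr}, 1};
  return canFoldToFnmsub(Root);
}
```

In this model `foldsWithInnerNegUses(1)` folds, while `foldsWithInnerNegUses(2)` does not, which mirrors the situation described above where a shared sub-expression blocks the combine.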

Is it possible to get some perf data for some float workloads with this patch? @qiucf

llvm/include/llvm/IR/IntrinsicsPowerPC.td
1737	When `llvm_anyfloat_ty` is `f32` or `f64`, we will generate two intrinsics with the same semantics: `llvm.ppc.nmsub.f32` + `llvm.ppc.fnmsubs` and `llvm.ppc.nmsub.f64` + `llvm.ppc.fnmsub`. At first glance, it seems we cannot delete `int_ppc_fnmsub` and `int_ppc_fnmsubs`, because they exist for XL compatibility, XL has separate fnmsub builtins for float and double, and we need to map them one to one. It would be better to check whether `int_ppc_fnmsub` and `int_ppc_fnmsubs` can be replaced with `int_ppc_nmsub`. If they can, we can use a meaningful name like `int_ppc_fnmsub` for the new intrinsic.

Thanks. I did not see any performance change in some common benchmarks.

llvm/include/llvm/IR/IntrinsicsPowerPC.td
1737	We can do that, but that requires more work and seems beyond this patch's scope. See D105930, we'll need to handle the builtin in Clang. And the builtin explicitly generates type-M VSX instructions (I guess to reduce copy in simple cases).

Replace existing ppc.fnmsub and ppc.fnmsubs.
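
With this update, the four builtins share one overloaded intrinsic whose name carries a type suffix, as the updated tests in this revision show (`.f32`, `.f64`, `.v4f32`, `.v2f64`). The mapping can be sketched as follows; `fnmsubIntrinsicName` and `intrinsicForBuiltin` are hypothetical helpers for illustration, not LLVM or Clang APIs, and the mapping reflects only the non-constrained code path.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the overloaded intrinsic name for an element
// type ("f32" or "f64") and a lane count, following the suffixes that appear
// in this revision's tests.
std::string fnmsubIntrinsicName(const std::string &Elt, unsigned Lanes) {
  std::string Suffix = Lanes > 1 ? "v" + std::to_string(Lanes) + Elt : Elt;
  return "llvm.ppc.fnmsub." + Suffix;
}

// Hypothetical lookup from a Clang builtin to the intrinsic it calls when
// floating-point constraints are not in effect. Unknown names yield "".
std::string intrinsicForBuiltin(const std::string &Builtin) {
  if (Builtin == "__builtin_ppc_fnmsubs")
    return fnmsubIntrinsicName("f32", 1);
  if (Builtin == "__builtin_ppc_fnmsub")
    return fnmsubIntrinsicName("f64", 1);
  if (Builtin == "__builtin_vsx_xvnmsubasp")
    return fnmsubIntrinsicName("f32", 4);
  if (Builtin == "__builtin_vsx_xvnmsubadp")
    return fnmsubIntrinsicName("f64", 2);
  return "";
}
```

For example, `intrinsicForBuiltin("__builtin_vsx_xvnmsubasp")` yields `llvm.ppc.fnmsub.v4f32`, matching the CHECK lines in builtins-ppc-fma.c.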

Harbormaster completed remote builds in B151213: Diff 411046. Feb 24 2022, 2:48 AM

LGTM. Two nits about the comments and tests.

Please wait a few days in case other reviewers have comments.

clang/lib/CodeGen/CGBuiltin.cpp
15196	why add this comment here?
clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c
98	If we improve the check lines to CHECK-COUNT, do we still need the original CHECKs?

This revision is now accepted and ready to land. Mar 2 2022, 6:21 PM

Herald added a project: Restricted Project. Mar 2 2022, 6:21 PM

qiucf updated this revision to Diff 413333. Mar 6 2022, 9:05 PM

qiucf marked an inline comment as done.

qiucf added inline comments.

clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c
98	Yes, otherwise we can't capture the right operands of `llvm.ppc.fnmsub.f64`.

This revision was landed with ongoing or failed builds. Mar 6 2022, 9:06 PM

Closed by commit rGb2497e54356d: [PowerPC] Add generic fnmsub intrinsic (authored by qiucf).

This revision was automatically updated to reflect the committed changes.

qiucf added a commit: rGb2497e54356d: [PowerPC] Add generic fnmsub intrinsic.

Harbormaster completed remote builds in B152830: Diff 413333. Mar 6 2022, 10:10 PM

Revision Contents

Path                                                      Size
clang/lib/CodeGen/CGBuiltin.cpp                           11 lines
clang/test/CodeGen/PowerPC/builtins-ppc-fma.c             8 lines
clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c   8 lines
clang/test/CodeGen/PowerPC/builtins-ppc-vsx.c             16 lines
clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c   6 lines
llvm/include/llvm/IR/IntrinsicsPowerPC.td                 12 lines
llvm/lib/Target/PowerPC/PPCISelLowering.cpp               13 lines
llvm/lib/Target/PowerPC/PPCInstrInfo.td                   2 lines
llvm/lib/Target/PowerPC/PPCInstrVSX.td                    2 lines
llvm/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.ll   91 lines

Diff 413334

clang/lib/CodeGen/CGBuiltin.cpp

@@ 15,187 lines not shown @@ case X86::BI__builtin_ia32_vfmaddcsh_round_mask3: {
     static constexpr int Mask[] = {0, 5, 6, 7};
     return Builder.CreateShuffleVector(Call, Ops[2], Mask);
   }
   }
 }

 Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
                                            const CallExpr *E) {
   SmallVector<Value*, 4> Ops;

   for (unsigned i = 0, e = E->getNumArgs(); i != e; i++) {
     if (E->getArg(i)->getType()->isArrayType())
       Ops.push_back(EmitArrayToPointerDecay(E->getArg(i)).getPointer());
     else
       Ops.push_back(EmitScalarExpr(E->getArg(i)));
   }

@@ 568 lines not shown @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
   }
   case PPC::BI__builtin_ppc_load2r: {
     Function *F = CGM.getIntrinsic(Intrinsic::ppc_load2r);
     Ops[0] = Builder.CreateBitCast(Ops[0], Int8PtrTy);
     Value *LoadIntrinsic = Builder.CreateCall(F, Ops);
     return Builder.CreateTrunc(LoadIntrinsic, Int16Ty);
   }
   // FMA variations
+  case PPC::BI__builtin_ppc_fnmsub:
+  case PPC::BI__builtin_ppc_fnmsubs:
   case PPC::BI__builtin_vsx_xvmaddadp:
   case PPC::BI__builtin_vsx_xvmaddasp:
   case PPC::BI__builtin_vsx_xvnmaddadp:
   case PPC::BI__builtin_vsx_xvnmaddasp:
   case PPC::BI__builtin_vsx_xvmsubadp:
   case PPC::BI__builtin_vsx_xvmsubasp:
   case PPC::BI__builtin_vsx_xvnmsubadp:
   case PPC::BI__builtin_vsx_xvnmsubasp: {
@@ 22 lines not shown @@ switch (BuiltinID) {
         return Builder.CreateFNeg(Builder.CreateCall(F, {X, Y, Z}), "neg");
     case PPC::BI__builtin_vsx_xvmsubadp:
     case PPC::BI__builtin_vsx_xvmsubasp:
       if (Builder.getIsFPConstrained())
         return Builder.CreateConstrainedFPCall(
             F, {X, Y, Builder.CreateFNeg(Z, "neg")});
       else
         return Builder.CreateCall(F, {X, Y, Builder.CreateFNeg(Z, "neg")});
+    case PPC::BI__builtin_ppc_fnmsub:
+    case PPC::BI__builtin_ppc_fnmsubs:
     case PPC::BI__builtin_vsx_xvnmsubadp:
     case PPC::BI__builtin_vsx_xvnmsubasp:
       if (Builder.getIsFPConstrained())
         return Builder.CreateFNeg(
             Builder.CreateConstrainedFPCall(
                 F, {X, Y, Builder.CreateFNeg(Z, "neg")}),
             "neg");
       else
-        return Builder.CreateFNeg(
-            Builder.CreateCall(F, {X, Y, Builder.CreateFNeg(Z, "neg")}),
-            "neg");
+        return Builder.CreateCall(
+            CGM.getIntrinsic(Intrinsic::ppc_fnmsub, ResultType), {X, Y, Z});
     }
     llvm_unreachable("Unknown FMA operation");
     return nullptr; // Suppress no-return warning
   }

   case PPC::BI__builtin_vsx_insertword: {
     llvm::Function *F = CGM.getIntrinsic(Intrinsic::ppc_vsx_xxinsertw);

     // Third argument is a compile time constant int. It must be clamped to
@@ 3,195 lines not shown @@

clang/test/CodeGen/PowerPC/builtins-ppc-fma.c

@@ 26 lines not shown @@ void test_fma(void) {
   // CHECK: [[RESULT:%[^ ]+]] = fneg <4 x float> %{{.*}}
   // CHECK: @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT]])

   vd = __builtin_vsx_xvmsubadp(vd, vd, vd);
   // CHECK: [[RESULT:%[^ ]+]] = fneg <2 x double> %{{.*}}
   // CHECK: <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]])

   vf = __builtin_vsx_xvnmsubasp(vf, vf, vf);
-  // CHECK: [[RESULT:%[^ ]+]] = fneg <4 x float> %{{.*}}
-  // CHECK: [[RESULT2:%[^ ]+]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT]])
-  // CHECK: fneg <4 x float> [[RESULT2]]
+  // CHECK: call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})

   vd = __builtin_vsx_xvnmsubadp(vd, vd, vd);
-  // CHECK: [[RESULT:%[^ ]+]] = fneg <2 x double> %{{.*}}
-  // CHECK: [[RESULT2:%[^ ]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]])
-  // CHECK: fneg <2 x double> [[RESULT2]]
+  // CHECK: call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
 }

clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c

@@ 136 lines not shown @@ void test_float(void) {
   // CHECK-UNCONSTRAINED: [[RESULT:%[^ ]+]] = fneg <2 x double> %{{.*}}
   // CHECK-UNCONSTRAINED: @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]])
   // CHECK-CONSTRAINED: [[RESULT:%[^ ]+]] = fneg <2 x double> %{{.*}}
   // CHECK-CONSTRAINED: @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]], metadata !"round.tonearest", metadata !"fpexcept.strict")
   // CHECK-ASM: xvmsubadp

   vf = __builtin_vsx_xvnmsubasp(vf, vf, vf);
   // CHECK-LABEL: try-xvnmsubasp
-  // CHECK-UNCONSTRAINED: [[RESULT0:%[^ ]+]] = fneg <4 x float> %{{.*}}
-  // CHECK-UNCONSTRAINED: [[RESULT1:%[^ ]+]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT0]])
-  // CHECK-UNCONSTRAINED: fneg <4 x float> [[RESULT1]]
+  // CHECK-UNCONSTRAINED: call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
   // CHECK-CONSTRAINED: [[RESULT0:%[^ ]+]] = fneg <4 x float> %{{.*}}
   // CHECK-CONSTRAINED: [[RESULT1:%[^ ]+]] = call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT0]], metadata !"round.tonearest", metadata !"fpexcept.strict")
   // CHECK-CONSTRAINED: fneg <4 x float> [[RESULT1]]
   // CHECK-ASM: xvnmsubasp

   vd = __builtin_vsx_xvnmsubadp(vd, vd, vd);
   // CHECK-LABEL: try-xvnmsubadp
-  // CHECK-UNCONSTRAINED: [[RESULT0:%[^ ]+]] = fneg <2 x double> %{{.*}}
-  // CHECK-UNCONSTRAINED: [[RESULT1:%[^ ]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT0]])
-  // CHECK-UNCONSTRAINED: fneg <2 x double> [[RESULT1]]
+  // CHECK-UNCONSTRAINED: call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
   // CHECK-CONSTRAINED: [[RESULT0:%[^ ]+]] = fneg <2 x double> %{{.*}}
   // CHECK-CONSTRAINED: [[RESULT1:%[^ ]+]] = call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT0]], metadata !"round.tonearest", metadata !"fpexcept.strict")
   // CHECK-CONSTRAINED: fneg <2 x double> [[RESULT1]]
   // CHECK-ASM: xvnmsubadp
 }

clang/test/CodeGen/PowerPC/builtins-ppc-vsx.c

@@ 888 lines not shown @@ // CHECK-LE-NEXT: fneg <4 x float> %[[FM]]

   res_vd = vec_nmadd(vd, vd, vd);
 // CHECK: [[FM:[0-9]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}})
 // CHECK-NEXT: fneg <2 x double> %[[FM]]
 // CHECK-LE: [[FM:[0-9]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}})
 // CHECK-LE-NEXT: fneg <2 x double> %[[FM]]

   res_vf = vec_nmsub(vf, vf, vf);
-// CHECK: fneg <4 x float> %{{[0-9]+}}
-// CHECK-NEXT: call <4 x float> @llvm.fma.v4f32(<4 x float> %{{[0-9]+}}, <4 x float> %{{[0-9]+}}, <4 x float>
-// CHECK: fneg <4 x float> %{{[0-9]+}}
-// CHECK-LE: fneg <4 x float> %{{[0-9]+}}
-// CHECK-LE-NEXT: call <4 x float> @llvm.fma.v4f32(<4 x float> %{{[0-9]+}}, <4 x float> %{{[0-9]+}}, <4 x float>
-// CHECK-LE: fneg <4 x float> %{{[0-9]+}}
+// CHECK: call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %{{[0-9]+}}, <4 x float> %{{[0-9]+}}, <4 x float>
+// CHECK-LE: call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %{{[0-9]+}}, <4 x float> %{{[0-9]+}}, <4 x float>

   res_vd = vec_nmsub(vd, vd, vd);
-// CHECK: fneg <2 x double> %{{[0-9]+}}
-// CHECK-NEXT: [[FM:[0-9]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double>
-// CHECK-NEXT: fneg <2 x double> %[[FM]]
-// CHECK-LE: fneg <2 x double> %{{[0-9]+}}
-// CHECK-LE-NEXT: [[FM:[0-9]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double>
-// CHECK-LE-NEXT: fneg <2 x double> %[[FM]]
+// CHECK: [[FM:[0-9]+]] = call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double>
+// CHECK-LE: [[FM:[0-9]+]] = call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %{{[0-9]+}}, <2 x double> %{{[0-9]+}}, <2 x double>

   /* vec_nor */
   res_vsll = vec_nor(vsll, vsll);
 // CHECK: or <2 x i64>
 // CHECK: xor <2 x i64>
 // CHECK-LE: or <2 x i64>
 // CHECK-LE: xor <2 x i64>

@@ 2,031 lines not shown @@

clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c

@@ 89 lines not shown @@
 //
 float fnmadds (float f) {
   return __fnmadds (f, f, f);
 }

 // CHECK-LABEL: @fnmsub(
 // CHECK: [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT: store double [[D:%.*]], double* [[D_ADDR]], align 8
+// CHECK-COUNT-3: load double, double* [[D_ADDR]], align 8
 // CHECK-NEXT: [[TMP0:%.*]] = load double, double* [[D_ADDR]], align 8
 // CHECK-NEXT: [[TMP1:%.*]] = load double, double* [[D_ADDR]], align 8
 // CHECK-NEXT: [[TMP2:%.*]] = load double, double* [[D_ADDR]], align 8
-// CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.ppc.fnmsub(double [[TMP0]], double [[TMP1]], double [[TMP2]])
+// CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.ppc.fnmsub.f64(double [[TMP0]], double [[TMP1]], double [[TMP2]])
 // CHECK-NEXT: ret double [[TMP3]]
 //
 double fnmsub (double d) {
   return __fnmsub (d, d, d);
 }

 // CHECK-LABEL: @fnmsubs(
 // CHECK: [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT: store float [[F:%.*]], float* [[F_ADDR]], align 4
+// CHECK-COUNT-3: load float, float* [[F_ADDR]], align 4
 // CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[F_ADDR]], align 4
 // CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[F_ADDR]], align 4
 // CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[F_ADDR]], align 4
-// CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.ppc.fnmsubs(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.ppc.fnmsub.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
 // CHECK-NEXT: ret float [[TMP3]]
 //
 float fnmsubs (float f) {
   return __fnmsubs (f, f, f);
 }

 // CHECK-LABEL: @fre(
 // CHECK: [[D_ADDR:%.*]] = alloca double, align 8
@@ 19 lines not shown @@

llvm/include/llvm/IR/IntrinsicsPowerPC.td

@@ 1,716 lines not shown @@ def int_ppc_fnmadd
                  [llvm_double_ty, llvm_double_ty, llvm_double_ty],
                  [IntrNoMem]>;
   def int_ppc_fnmadds
       : GCCBuiltin<"__builtin_ppc_fnmadds">,
         Intrinsic <[llvm_float_ty],
                    [llvm_float_ty, llvm_float_ty, llvm_float_ty],
                    [IntrNoMem]>;
   def int_ppc_fnmsub
-      : GCCBuiltin<"__builtin_ppc_fnmsub">,
-        Intrinsic <[llvm_double_ty],
-                   [llvm_double_ty, llvm_double_ty, llvm_double_ty],
-                   [IntrNoMem]>;
-  def int_ppc_fnmsubs
-      : GCCBuiltin<"__builtin_ppc_fnmsubs">,
-        Intrinsic <[llvm_float_ty],
-                   [llvm_float_ty, llvm_float_ty, llvm_float_ty],
+      : Intrinsic<[llvm_anyfloat_ty],
+                  [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
                   [IntrNoMem]>;
   def int_ppc_fre
       : GCCBuiltin<"__builtin_ppc_fre">,
         Intrinsic <[llvm_double_ty], [llvm_double_ty], [IntrNoMem]>;
   def int_ppc_fres
       : GCCBuiltin<"__builtin_ppc_fres">,
         Intrinsic <[llvm_float_ty], [llvm_float_ty], [IntrNoMem]>;
   def int_ppc_addex
       : GCCBuiltin<"__builtin_ppc_addex">,
         Intrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty, llvm_i32_ty],
                   [IntrNoMem, IntrHasSideEffects, ImmArg<ArgIndex<2>>]>;
   def int_ppc_fsel : GCCBuiltin<"__builtin_ppc_fsel">,
                      Intrinsic<[llvm_double_ty], [llvm_double_ty, llvm_double_ty,
                                 llvm_double_ty], [IntrNoMem]>;
   def int_ppc_fsels : GCCBuiltin<"__builtin_ppc_fsels">,
                       Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty,
                                  llvm_float_ty], [IntrNoMem]>;
   def int_ppc_frsqrte : GCCBuiltin<"__builtin_ppc_frsqrte">,
                         Intrinsic<[llvm_double_ty], [llvm_double_ty], [IntrNoMem]>;
@@ 58 lines not shown @@

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 621 Lines • ▼ Show 20 Lines	PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
setOperationAction(ISD::GET_DYNAMIC_AREA_OFFSET, MVT::i64, Custom);		setOperationAction(ISD::GET_DYNAMIC_AREA_OFFSET, MVT::i64, Custom);
setOperationAction(ISD::EH_DWARF_CFA, MVT::i32, Custom);		setOperationAction(ISD::EH_DWARF_CFA, MVT::i32, Custom);
setOperationAction(ISD::EH_DWARF_CFA, MVT::i64, Custom);		setOperationAction(ISD::EH_DWARF_CFA, MVT::i64, Custom);

// We want to custom lower some of our intrinsics.		// We want to custom lower some of our intrinsics.
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::f64, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::f64, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::ppcf128, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::ppcf128, Custom);
		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v4f32, Custom);
		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v2f64, Custom);

// To handle counter-based loop conditions.		// To handle counter-based loop conditions.
setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i1, Custom);		setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i1, Custom);

setOperationAction(ISD::INTRINSIC_VOID, MVT::i8, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::i8, Custom);
setOperationAction(ISD::INTRINSIC_VOID, MVT::i16, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::i16, Custom);
setOperationAction(ISD::INTRINSIC_VOID, MVT::i32, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::i32, Custom);
setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);
▲ Show 20 Lines • Show All 9,906 Lines • ▼ Show 20 Lines	return SDValue(
PPC::SELECT_CC_I4, dl, MVT::i32,		PPC::SELECT_CC_I4, dl, MVT::i32,
{SDValue(DAG.getMachineNode(CmprOpc, dl, MVT::i32, Op.getOperand(2),		{SDValue(DAG.getMachineNode(CmprOpc, dl, MVT::i32, Op.getOperand(2),
Op.getOperand(1)),		Op.getOperand(1)),
0),		0),
DAG.getConstant(1, dl, MVT::i32), DAG.getConstant(0, dl, MVT::i32),		DAG.getConstant(1, dl, MVT::i32), DAG.getConstant(0, dl, MVT::i32),
DAG.getTargetConstant(PPC::PRED_EQ, dl, MVT::i32)}),		DAG.getTargetConstant(PPC::PRED_EQ, dl, MVT::i32)}),
0);		0);
}		}
		case Intrinsic::ppc_fnmsub: {
		EVT VT = Op.getOperand(1).getValueType();
		if (!Subtarget.hasVSX() \|\| (!Subtarget.hasFloat128() && VT == MVT::f128))
		return DAG.getNode(
		ISD::FNEG, dl, VT,
		DAG.getNode(ISD::FMA, dl, VT, Op.getOperand(1), Op.getOperand(2),
		DAG.getNode(ISD::FNEG, dl, VT, Op.getOperand(3))));
		return DAG.getNode(PPCISD::FNMSUB, dl, VT, Op.getOperand(1),
		Op.getOperand(2), Op.getOperand(3));
		}
case Intrinsic::ppc_convert_f128_to_ppcf128:		case Intrinsic::ppc_convert_f128_to_ppcf128:
case Intrinsic::ppc_convert_ppcf128_to_f128: {		case Intrinsic::ppc_convert_ppcf128_to_f128: {
RTLIB::Libcall LC = IntrinsicID == Intrinsic::ppc_convert_ppcf128_to_f128		RTLIB::Libcall LC = IntrinsicID == Intrinsic::ppc_convert_ppcf128_to_f128
? RTLIB::CONVERT_PPCF128_F128		? RTLIB::CONVERT_PPCF128_F128
: RTLIB::CONVERT_F128_PPCF128;		: RTLIB::CONVERT_F128_PPCF128;
MakeLibCallOptions CallOptions;		MakeLibCallOptions CallOptions;
std::pair<SDValue, SDValue> Result =		std::pair<SDValue, SDValue> Result =
makeLibCall(DAG, LC, Op.getValueType(), Op.getOperand(1), CallOptions,		makeLibCall(DAG, LC, Op.getValueType(), Op.getOperand(1), CallOptions,
...	case ISD::INTRINSIC_W_CHAIN: {
break;		break;
}		}
case ISD::INTRINSIC_WO_CHAIN: {		case ISD::INTRINSIC_WO_CHAIN: {
switch (cast<ConstantSDNode>(N->getOperand(0))->getZExtValue()) {		switch (cast<ConstantSDNode>(N->getOperand(0))->getZExtValue()) {
case Intrinsic::ppc_pack_longdouble:		case Intrinsic::ppc_pack_longdouble:
Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl, MVT::ppcf128,		Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl, MVT::ppcf128,
N->getOperand(2), N->getOperand(1)));		N->getOperand(2), N->getOperand(1)));
break;		break;
		case Intrinsic::ppc_fnmsub:
case Intrinsic::ppc_convert_f128_to_ppcf128:		case Intrinsic::ppc_convert_f128_to_ppcf128:
Results.push_back(LowerINTRINSIC_WO_CHAIN(SDValue(N, 0), DAG));		Results.push_back(LowerINTRINSIC_WO_CHAIN(SDValue(N, 0), DAG));
break;		break;
}		}
break;		break;
}		}
case ISD::VAARG: {		case ISD::VAARG: {
if (!Subtarget.isSVR4ABI() || Subtarget.isPPC64())		if (!Subtarget.isSVR4ABI() || Subtarget.isPPC64())
...
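
The `Intrinsic::ppc_fnmsub` case above falls back to FNEG(FMA(a, b, FNEG(c))) when VSX is unavailable (or when the type is f128 without Float128 support), and otherwise emits a `PPCISD::FNMSUB` node. As a rough host-side illustration of the arithmetic both paths are expected to compute, here is a minimal C++ sketch. `fnmsub_reference` is a hypothetical helper and not part of the patch; it assumes the -(a * b - c) semantics given in the summary as -fma(a, b, -c).

```cpp
#include <cmath>

// Hypothetical reference model (not from the patch): computes -(a * b - c)
// with a single rounding, mirroring the fallback FNEG(FMA(a, b, FNEG(c))).
template <typename T>
T fnmsub_reference(T a, T b, T c) {
  // std::fma(a, b, -c) evaluates a * b - c with one rounding step;
  // the outer negation corresponds to the final FNEG node.
  return -std::fma(a, b, -c);
}
```

For example, `fnmsub_reference(2.0, 3.0, 1.0)` evaluates to -5.0. Under the default round-to-nearest mode, `fnmsub_reference(1.0, 1.0, 1.0)` should yield negative zero, which is one reason the negated form is kept distinct from a plain `c - a * b`.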

llvm/lib/Target/PowerPC/PPCInstrInfo.td

...	def : Pat<(fcopysign f64:$frB, f32:$frA),
(FCPSGND (COPY_TO_REGCLASS $frA, F8RC), $frB)>;		(FCPSGND (COPY_TO_REGCLASS $frA, F8RC), $frB)>;
def : Pat<(fcopysign f32:$frB, f64:$frA),		def : Pat<(fcopysign f32:$frB, f64:$frA),
(FCPSGNS (COPY_TO_REGCLASS $frA, F4RC), $frB)>;		(FCPSGNS (COPY_TO_REGCLASS $frA, F4RC), $frB)>;
}		}

// XL Compat intrinsics.		// XL Compat intrinsics.
def : Pat<(int_ppc_fmsub f64:$A, f64:$B, f64:$C), (FMSUB $A, $B, $C)>;		def : Pat<(int_ppc_fmsub f64:$A, f64:$B, f64:$C), (FMSUB $A, $B, $C)>;
def : Pat<(int_ppc_fmsubs f32:$A, f32:$B, f32:$C), (FMSUBS $A, $B, $C)>;		def : Pat<(int_ppc_fmsubs f32:$A, f32:$B, f32:$C), (FMSUBS $A, $B, $C)>;
def : Pat<(int_ppc_fnmsub f64:$A, f64:$B, f64:$C), (FNMSUB $A, $B, $C)>;
def : Pat<(int_ppc_fnmsubs f32:$A, f32:$B, f32:$C), (FNMSUBS $A, $B, $C)>;
def : Pat<(int_ppc_fnmadd f64:$A, f64:$B, f64:$C), (FNMADD $A, $B, $C)>;		def : Pat<(int_ppc_fnmadd f64:$A, f64:$B, f64:$C), (FNMADD $A, $B, $C)>;
def : Pat<(int_ppc_fnmadds f32:$A, f32:$B, f32:$C), (FNMADDS $A, $B, $C)>;		def : Pat<(int_ppc_fnmadds f32:$A, f32:$B, f32:$C), (FNMADDS $A, $B, $C)>;
def : Pat<(int_ppc_fre f64:$A), (FRE $A)>;		def : Pat<(int_ppc_fre f64:$A), (FRE $A)>;
def : Pat<(int_ppc_fres f32:$A), (FRES $A)>;		def : Pat<(int_ppc_fres f32:$A), (FRES $A)>;

include "PPCInstrAltivec.td"		include "PPCInstrAltivec.td"
include "PPCInstrSPE.td"		include "PPCInstrSPE.td"
include "PPCInstr64Bit.td"		include "PPCInstr64Bit.td"
...

llvm/lib/Target/PowerPC/PPCInstrVSX.td

...
def : Pat<(v2i64 (PPCvcmp_rec v2i64:$vA, v2i64:$vB, 711)),		def : Pat<(v2i64 (PPCvcmp_rec v2i64:$vA, v2i64:$vB, 711)),
(VCMPGTUB_rec DblwdCmp.MRGUGT, (v2i64 (XXLXORz)))>;		(VCMPGTUB_rec DblwdCmp.MRGUGT, (v2i64 (XXLXORz)))>;
def : Pat<(v2i64 (PPCvcmp_rec v2i64:$vA, v2i64:$vB, 199)),		def : Pat<(v2i64 (PPCvcmp_rec v2i64:$vA, v2i64:$vB, 199)),
(VCMPGTUB_rec DblwdCmp.MRGEQ, (v2i64 (XXLXORz)))>;		(VCMPGTUB_rec DblwdCmp.MRGEQ, (v2i64 (XXLXORz)))>;
} // AddedComplexity = 0		} // AddedComplexity = 0

// XL Compat builtins.		// XL Compat builtins.
def : Pat<(int_ppc_fmsub f64:$A, f64:$B, f64:$C), (XSMSUBMDP $A, $B, $C)>;		def : Pat<(int_ppc_fmsub f64:$A, f64:$B, f64:$C), (XSMSUBMDP $A, $B, $C)>;
def : Pat<(int_ppc_fnmsub f64:$A, f64:$B, f64:$C), (XSNMSUBMDP $A, $B, $C)>;
def : Pat<(int_ppc_fnmadd f64:$A, f64:$B, f64:$C), (XSNMADDMDP $A, $B, $C)>;		def : Pat<(int_ppc_fnmadd f64:$A, f64:$B, f64:$C), (XSNMADDMDP $A, $B, $C)>;
def : Pat<(int_ppc_fre f64:$A), (XSREDP $A)>;		def : Pat<(int_ppc_fre f64:$A), (XSREDP $A)>;
def : Pat<(int_ppc_frsqrte vsfrc:$XB), (XSRSQRTEDP $XB)>;		def : Pat<(int_ppc_frsqrte vsfrc:$XB), (XSRSQRTEDP $XB)>;
} // HasVSX		} // HasVSX

// Any big endian VSX subtarget.		// Any big endian VSX subtarget.
let Predicates = [HasVSX, IsBigEndian] in {		let Predicates = [HasVSX, IsBigEndian] in {
def : Pat<(v2f64 (scalar_to_vector f64:$A)),		def : Pat<(v2f64 (scalar_to_vector f64:$A)),
...	def : Pat<(v2i64 (bitconvert (v16i8 immAllOnesV))),
(v2i64 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;		(v2i64 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;
def : Pat<(v8i16 (bitconvert (v16i8 immAllOnesV))),		def : Pat<(v8i16 (bitconvert (v16i8 immAllOnesV))),
(v8i16 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;		(v8i16 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;
def : Pat<(v16i8 (bitconvert (v16i8 immAllOnesV))),		def : Pat<(v16i8 (bitconvert (v16i8 immAllOnesV))),
(v16i8 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;		(v16i8 (COPY_TO_REGCLASS(XXLEQVOnes), VSRC))>;

// XL Compat builtins.		// XL Compat builtins.
def : Pat<(int_ppc_fmsubs f32:$A, f32:$B, f32:$C), (XSMSUBMSP $A, $B, $C)>;		def : Pat<(int_ppc_fmsubs f32:$A, f32:$B, f32:$C), (XSMSUBMSP $A, $B, $C)>;
def : Pat<(int_ppc_fnmsubs f32:$A, f32:$B, f32:$C), (XSNMSUBMSP $A, $B, $C)>;
def : Pat<(int_ppc_fnmadds f32:$A, f32:$B, f32:$C), (XSNMADDMSP $A, $B, $C)>;		def : Pat<(int_ppc_fnmadds f32:$A, f32:$B, f32:$C), (XSNMADDMSP $A, $B, $C)>;
def : Pat<(int_ppc_fres f32:$A), (XSRESP $A)>;		def : Pat<(int_ppc_fres f32:$A), (XSRESP $A)>;
def : Pat<(i32 (int_ppc_extract_exp f64:$A)),		def : Pat<(i32 (int_ppc_extract_exp f64:$A)),
(EXTRACT_SUBREG (XSXEXPDP (COPY_TO_REGCLASS $A, VSFRC)), sub_32)>;		(EXTRACT_SUBREG (XSXEXPDP (COPY_TO_REGCLASS $A, VSFRC)), sub_32)>;
def : Pat<(int_ppc_extract_sig f64:$A),		def : Pat<(int_ppc_extract_sig f64:$A),
(XSXSIGDP (COPY_TO_REGCLASS $A, VSFRC))>;		(XSXSIGDP (COPY_TO_REGCLASS $A, VSFRC))>;
def : Pat<(f64 (int_ppc_insert_exp f64:$A, i64:$B)),		def : Pat<(f64 (int_ppc_insert_exp f64:$A, i64:$B)),
(COPY_TO_REGCLASS (XSIEXPDP (COPY_TO_REGCLASS $A, G8RC), $B), F8RC)>;		(COPY_TO_REGCLASS (XSIEXPDP (COPY_TO_REGCLASS $A, G8RC), $B), F8RC)>;
...

llvm/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.ll

	...
	; CHECK-PWR7-NEXT: blr			; CHECK-PWR7-NEXT: blr
	entry:			entry:
	%0 = tail call float @llvm.ppc.fnmadds(float %f, float %f2, float %f3)			%0 = tail call float @llvm.ppc.fnmadds(float %f, float %f2, float %f3)
	ret float %0			ret float %0
	}			}

	declare float @llvm.ppc.fnmadds(float, float, float)			declare float @llvm.ppc.fnmadds(float, float, float)

	define dso_local double @fnmsub_t0(double %d, double %d2, double %d3) {			define dso_local float @fnmsub_f32(float %f, float %f2, float %f3) {
	; CHECK-PWR8-LABEL: fnmsub_t0:			; CHECK-PWR8-LABEL: fnmsub_f32:
	; CHECK-PWR8: # %bb.0: # %entry			; CHECK-PWR8: # %bb.0: # %entry
	; CHECK-PWR8-NEXT: xsnmsubmdp 1, 2, 3			; CHECK-PWR8-NEXT: xsnmsubasp 3, 1, 2
				; CHECK-PWR8-NEXT: fmr 1, 3
	; CHECK-PWR8-NEXT: blr			; CHECK-PWR8-NEXT: blr
	;			;
	; CHECK-NOVSX-LABEL: fnmsub_t0:			; CHECK-NOVSX-LABEL: fnmsub_f32:
				; CHECK-NOVSX: # %bb.0: # %entry
				; CHECK-NOVSX-NEXT: fnmsubs 1, 1, 2, 3
				; CHECK-NOVSX-NEXT: blr
				;
				; CHECK-PWR7-LABEL: fnmsub_f32:
				; CHECK-PWR7: # %bb.0: # %entry
				; CHECK-PWR7-NEXT: fnmsubs 1, 1, 2, 3
				; CHECK-PWR7-NEXT: blr
				entry:
				%0 = tail call float @llvm.ppc.fnmsub.f32(float %f, float %f2, float %f3)
				ret float %0
				}

				declare float @llvm.ppc.fnmsub.f32(float, float, float)

				define dso_local double @fnmsub_f64(double %f, double %f2, double %f3) {
				; CHECK-PWR8-LABEL: fnmsub_f64:
				; CHECK-PWR8: # %bb.0: # %entry
				; CHECK-PWR8-NEXT: xsnmsubadp 3, 1, 2
				; CHECK-PWR8-NEXT: fmr 1, 3
				; CHECK-PWR8-NEXT: blr
				;
				; CHECK-NOVSX-LABEL: fnmsub_f64:
	; CHECK-NOVSX: # %bb.0: # %entry			; CHECK-NOVSX: # %bb.0: # %entry
	; CHECK-NOVSX-NEXT: fnmsub 1, 1, 2, 3			; CHECK-NOVSX-NEXT: fnmsub 1, 1, 2, 3
	; CHECK-NOVSX-NEXT: blr			; CHECK-NOVSX-NEXT: blr
	;			;
	; CHECK-PWR7-LABEL: fnmsub_t0:			; CHECK-PWR7-LABEL: fnmsub_f64:
	; CHECK-PWR7: # %bb.0: # %entry			; CHECK-PWR7: # %bb.0: # %entry
	; CHECK-PWR7-NEXT: xsnmsubmdp 1, 2, 3			; CHECK-PWR7-NEXT: xsnmsubadp 3, 1, 2
				; CHECK-PWR7-NEXT: fmr 1, 3
	; CHECK-PWR7-NEXT: blr			; CHECK-PWR7-NEXT: blr
	entry:			entry:
	%0 = tail call double @llvm.ppc.fnmsub(double %d, double %d2, double %d3)			%0 = tail call double @llvm.ppc.fnmsub.f64(double %f, double %f2, double %f3)
	ret double %0			ret double %0
	}			}

	declare double @llvm.ppc.fnmsub(double, double, double)			declare double @llvm.ppc.fnmsub.f64(double, double, double)

	define dso_local float @fnmsubs_t0(float %f, float %f2, float %f3) {			define dso_local <4 x float> @fnmsub_v4f32(<4 x float> %f, <4 x float> %f2, <4 x float> %f3) {
	; CHECK-PWR8-LABEL: fnmsubs_t0:			; CHECK-PWR8-LABEL: fnmsub_v4f32:
	; CHECK-PWR8: # %bb.0: # %entry			; CHECK-PWR8: # %bb.0: # %entry
	; CHECK-PWR8-NEXT: xsnmsubmsp 1, 2, 3			; CHECK-PWR8-NEXT: xvnmsubasp 36, 34, 35
				; CHECK-PWR8-NEXT: vmr 2, 4
	; CHECK-PWR8-NEXT: blr			; CHECK-PWR8-NEXT: blr
	;			;
	; CHECK-NOVSX-LABEL: fnmsubs_t0:			; CHECK-NOVSX-LABEL: fnmsub_v4f32:
	; CHECK-NOVSX: # %bb.0: # %entry			; CHECK-NOVSX: # %bb.0: # %entry
	; CHECK-NOVSX-NEXT: fnmsubs 1, 1, 2, 3			; CHECK-NOVSX-NEXT: fnmsubs 1, 1, 5, 9
				; CHECK-NOVSX-NEXT: fnmsubs 2, 2, 6, 10
				; CHECK-NOVSX-NEXT: fnmsubs 3, 3, 7, 11
				; CHECK-NOVSX-NEXT: fnmsubs 4, 4, 8, 12
	; CHECK-NOVSX-NEXT: blr			; CHECK-NOVSX-NEXT: blr
	;			;
	; CHECK-PWR7-LABEL: fnmsubs_t0:			; CHECK-PWR7-LABEL: fnmsub_v4f32:
	; CHECK-PWR7: # %bb.0: # %entry			; CHECK-PWR7: # %bb.0: # %entry
	; CHECK-PWR7-NEXT: fnmsubs 1, 1, 2, 3			; CHECK-PWR7-NEXT: xvnmsubasp 36, 34, 35
				; CHECK-PWR7-NEXT: vmr 2, 4
	; CHECK-PWR7-NEXT: blr			; CHECK-PWR7-NEXT: blr
	entry:			entry:
	%0 = tail call float @llvm.ppc.fnmsubs(float %f, float %f2, float %f3)			%0 = tail call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %f, <4 x float> %f2, <4 x float> %f3)
	ret float %0			ret <4 x float> %0
				}

				declare <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float>, <4 x float>, <4 x float>)

				define dso_local <2 x double> @fnmsub_v2f64(<2 x double> %f, <2 x double> %f2, <2 x double> %f3) {
				; CHECK-PWR8-LABEL: fnmsub_v2f64:
				; CHECK-PWR8: # %bb.0: # %entry
				; CHECK-PWR8-NEXT: xvnmsubadp 36, 34, 35
				; CHECK-PWR8-NEXT: vmr 2, 4
				; CHECK-PWR8-NEXT: blr
				;
				; CHECK-NOVSX-LABEL: fnmsub_v2f64:
				; CHECK-NOVSX: # %bb.0: # %entry
				; CHECK-NOVSX-NEXT: fnmsub 1, 1, 3, 5
				; CHECK-NOVSX-NEXT: fnmsub 2, 2, 4, 6
				; CHECK-NOVSX-NEXT: blr
				;
				; CHECK-PWR7-LABEL: fnmsub_v2f64:
				; CHECK-PWR7: # %bb.0: # %entry
				; CHECK-PWR7-NEXT: xvnmsubadp 36, 34, 35
				; CHECK-PWR7-NEXT: vmr 2, 4
				; CHECK-PWR7-NEXT: blr
				entry:
				%0 = tail call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %f, <2 x double> %f2, <2 x double> %f3)
				ret <2 x double> %0
	}			}

	declare float @llvm.ppc.fnmsubs(float, float, float)			declare <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double>, <2 x double>, <2 x double>)

	define dso_local double @fre(double %d) {			define dso_local double @fre(double %d) {
	; CHECK-PWR8-LABEL: fre:			; CHECK-PWR8-LABEL: fre:
	; CHECK-PWR8: # %bb.0: # %entry			; CHECK-PWR8: # %bb.0: # %entry
	; CHECK-PWR8-NEXT: xsredp 1, 1			; CHECK-PWR8-NEXT: xsredp 1, 1
	; CHECK-PWR8-NEXT: blr			; CHECK-PWR8-NEXT: blr
	;			;
	; CHECK-NOVSX-LABEL: fre:			; CHECK-NOVSX-LABEL: fre:
	...
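
The CHECK-NOVSX lines for `fnmsub_v4f32` and `fnmsub_v2f64` above show the vector form of the intrinsic being split into one scalar `fnmsubs`/`fnmsub` per lane when VSX is off. The following C++ sketch models that lane-by-lane behaviour on the host. `fnmsub_lanes` is a hypothetical name used only for illustration, and the per-lane formula -(a * b - c) is assumed from the summary's -fma(a, b, -c).

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical model (not from the patch): applies -(a * b - c) to each lane
// independently, the way the non-VSX output uses one scalar op per element.
template <typename T, std::size_t N>
std::array<T, N> fnmsub_lanes(const std::array<T, N>& a,
                              const std::array<T, N>& b,
                              const std::array<T, N>& c) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    // One fused multiply-subtract followed by a negation per lane.
    out[i] = -std::fma(a[i], b[i], -c[i]);
  }
  return out;
}
```

With `std::array<float, 4>` inputs this corresponds to the four `fnmsubs` instructions in the CHECK-NOVSX block, and with `std::array<double, 2>` to the two `fnmsub` instructions.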

Revision Contents

Diff 413334

clang/lib/CodeGen/CGBuiltin.cpp

clang/test/CodeGen/PowerPC/builtins-ppc-fma.c

clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c

clang/test/CodeGen/PowerPC/builtins-ppc-vsx.c

clang/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.c

llvm/include/llvm/IR/IntrinsicsPowerPC.td

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

llvm/lib/Target/PowerPC/PPCInstrInfo.td

llvm/lib/Target/PowerPC/PPCInstrVSX.td

llvm/test/CodeGen/PowerPC/builtins-ppc-xlcompat-math.ll