This is an archive of the discontinued LLVM Phabricator instance.

Differential D62993

[PowerPC] Emit scalar min/max instructions with unsafe fp math
ClosedPublic

Authored by nemanjai on Jun 6 2019, 8:39 PM.


Details

Reviewers

hfinkel
jsji
kbarton
lei
stefanp
steven.zhang

Commits

rG25a41ad24200: [PowerPC] Emit scalar fp min/max instructions

Summary

This is something I meant to do a long time ago but never got around to it. These instructions should be an improvement over the compare/fsel sequence we currently emit.

The semantics of the instructions as specified in the ISA match the semantics specified in the description of the nodes.
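For context on the compare/fsel sequence mentioned above, here is a minimal sketch of why an fsel-based select is treated as a finite-math-only optimization. It assumes the Power ISA behavior of `fsel` (the result is the second operand when the first is greater than or equal to zero, and the third otherwise, including for a NaN). The function names are hypothetical, and the subtract-then-fsel shape is a simplified model, not the exact sequence the backend emits.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Model of fsel: the result is frc when fra >= 0.0, and frb otherwise
// (a NaN in fra fails the comparison, so frb is chosen).
double fsel_model(double fra, double frc, double frb) {
  return (fra >= 0.0) ? frc : frb;
}

// Hypothetical model of lowering "a >= b ? tv : fv" as a subtract plus fsel.
double select_ge_via_fsel(double a, double b, double tv, double fv) {
  return fsel_model(a - b, tv, fv);
}

// Reference semantics of the same select, using a real comparison.
double select_ge_reference(double a, double b, double tv, double fv) {
  return (a >= b) ? tv : fv;
}
```

For finite inputs the two functions agree. With `a` and `b` both equal to positive infinity, `a - b` is a NaN, so the fsel model picks `fv` while the reference picks `tv`.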

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai created this revision. Jun 6 2019, 8:39 PM

Herald added a project: Restricted Project. Jun 6 2019, 8:39 PM

Herald added a subscriber: hiraditya.

It is a great idea to exploit xsmindp/xsmaxdp! But it looks like we could make this more general rather than restricting it to UnsafeFPMath?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
550 ↗	(On Diff #203487)	Why do we need `TM.Options.UnsafeFPMath` here? If `ISD::FMAXNUM_IEEE` is generated, then the semantics are exactly the same as `xsmaxdp`, so we should be safe to use `xsmaxdp`. I think we can also add actions for `ISD::FMAXNUM`/`FMAXIMUM` and `ISD::FMINNUM`/`FMINIMUM`; for those we do need `TM.Options.UnsafeFPMath` or `TM.Options.NoNaNsFPMath`/`NoSignedZerosFPMath`.
llvm/lib/Target/PowerPC/PPCInstrVSX.td
984 ↗	(On Diff #203487)	Can we add similar patterns for `fmaxnum`/`fmaximum` and `fminnum`/`fminimum` with `Predicates`?
llvm/test/CodeGen/PowerPC/scalar-min-max.ll
2 ↗	(On Diff #203487)	Maybe add `RUN` lines for `--enable-no-nans-fp-math`/`-enable-no-signed-zeros-fp-math`?
7 ↗	(On Diff #203487)	It would be great if we could pre-commit the test case so that this patch shows only the difference.
48 ↗	(On Diff #203487)	Why do we need these attributes? It looks like these should be in a different `RUN` line?
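The distinction drawn above between the `FMAXNUM`-style and `FMAXIMUM`-style nodes can be sketched in plain C++. This is an illustrative model, assuming that the maxnum family treats a quiet NaN operand as missing data (as `std::fmax` does) and that the maximum family propagates NaNs and orders -0.0 below +0.0. The function names are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// maxNum-style behavior: a quiet NaN operand is treated as missing, so the
// other operand is returned. std::fmax behaves this way.
double maxnum_model(double a, double b) { return std::fmax(a, b); }

// maximum-style behavior: any NaN operand makes the result NaN, and +0.0 is
// treated as greater than -0.0.
double maximum_model(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (a == b)                       // equal values, or zeros of either sign
    return std::signbit(a) ? b : a; // prefer the operand without the sign bit
  return (a > b) ? a : b;
}
```

Under no-NaNs and no-signed-zeros assumptions the two models return equal values for every input, which is why the reviewer ties the non-IEEE nodes to those fast-math options.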

Move out of review queue, need author's action.

This revision now requires changes to proceed. Aug 27 2019, 7:54 PM

Herald added subscribers: shchenz, wuzish, MaskRay. Aug 27 2019, 7:54 PM

amyk added a subscriber: amyk. Sep 12 2019, 10:57 PM

amyk added inline comments.

llvm/test/CodeGen/PowerPC/scalar-min-max.ll
2 ↗	(On Diff #203487)	Maybe also add `-verify-machineinstrs` to our tests?

nemanjai marked 2 inline comments as done. Oct 22 2019, 4:54 PM

nemanjai added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
550 ↗	(On Diff #203487)	When I originally did this, it would produce these nodes along with `ISD::FCANONICALIZE` when unsafe fp math isn't specified. However, the DAG combiner seems to have been modified to not do that any longer. The instructions themselves handle SNaNs correctly, so we can handle the inputs coming from `ISD::FCANONICALIZE` anyway. However, I don't really see a point in legalizing `FMAXNUM/FMINNUM` since we will just get the `_IEEE` versions even with fast math.
llvm/test/CodeGen/PowerPC/scalar-min-max.ll
7 ↗	(On Diff #203487)	I will add a RUN line without the FMF flags which will show the difference in codegen in the test case itself.

Remove the requirement for unsafe math for the FMAXNUM_IEEE and FMINNUM_IEEE. Add codegen for P9 xsmaxdp/xsmindp. Improve the test case.

I see that one sanity test failed.

FAIL: LLVM :: CodeGen/PowerPC/ctr-minmaxnum.ll (29517 of 50289)
******************** TEST 'LLVM :: CodeGen/PowerPC/ctr-minmaxnum.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/qshanz/work/build/bin/llc -mtriple=powerpc64-unknown-linux-gnu -verify-machineinstrs -mcpu=pwr7 < /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll | /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll
: 'RUN: at line 2';   /home/qshanz/work/build/bin/llc -mtriple=powerpc64-unknown-linux-gnu -verify-machineinstrs -mcpu=a2q < /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll | /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll --check-prefix=QPX
--
Exit Code: 2

Command Output (stderr):
--
LLVM ERROR: Cannot select: t14: f32 = fcanonicalize t2
  t2: f32,ch = CopyFromReg t0, Register:f32 %2
    t1: f32 = Register %2
In function: test1
FileCheck error: '-' is empty.
FileCheck command line:  /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll

lib/Target/PowerPC/PPCISelLowering.cpp
555	We will get the ISD::FMAXNUM/ISD::FMINNUM node if we mark it as legal.

	define dso_local float @testfmax_fast(float %a, float %b) {
	entry:
	  %cmp = fcmp fast ogt float %a, %b
	  %cond = select i1 %cmp, float %a, float %b
	  ret float %cond
	}

	llc test.ll -mattr=+vsx

	The same applies to the intrinsics llvm.minnum/llvm.maxnum.

	Initial selection DAG: %bb.0 'testfmax_fast:entry'
	SelectionDAG has 11 nodes:
	  t0: ch = EntryToken
	  t2: f32,ch = CopyFromReg t0, Register:f32 %0
	  t4: f32,ch = CopyFromReg t0, Register:f32 %1
	  t6: i1 = setcc nnan ninf nsz arcp contract afn reassoc t2, t4, setgt:ch
	  t7: f32 = fmaxnum t2, t4
	  t9: ch,glue = CopyToReg t0, Register:f32 $f1, t7
	  t10: ch = PPCISD::RET_FLAG t9, Register:f32 $f1, t9:1

	The node is built directly, not combined from select_cc. I think we need to lower it based on whether we know the operand is NaN or not (i.e. isKnownNeverNaN()).
7217	Would `hasVSX()` be clearer?
7221	Do we need logic to handle the case where an operand is NaN?

	If src1 or src2 is a SNaN, an Invalid Operation exception occurs.
	If either src1 or src2 is a NaN, result is src2.
	Otherwise, if src1 is less than src2, result is src1.
	Otherwise, result is src2.

	The ISA documentation is a bit confusing here. Doesn't NaN include both SNaN and QNaN? The condition in the second "if" covers the first one.
lib/Target/PowerPC/PPCInstrInfo.td
120	SDT_PPCFPMinMax ?

This revision now requires changes to proceed. Oct 23 2019, 1:33 AM

steven.zhang added inline comments. Oct 23 2019, 4:48 AM

lib/Target/PowerPC/PPCISelLowering.cpp
555	Hmm, ignore the above comments. It is right to have the DAG generate the IEEE node instead of the non-IEEE one, as the hardware instruction has the same semantics.
7221	Please also ignore the above comment. After double-checking, XSMAXCDP/XSMINCDP perfectly match the semantics, whether or not an operand is NaN.
test/CodeGen/PowerPC/scalar-min-max.ll
13	It would be great to have some tests that verify the behavior on P9 when an operand is an SNaN/QNaN. However, this is NOT a must.
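The conclusion that the type-C instruction matches the select being lowered can be checked with a small model. This is a sketch only: `xsmincdp_model` transcribes the rule quoted earlier in the thread (ignoring the Invalid Operation exception for signaling NaNs), `select_lt` is the C-style select, and `same_bits` is a hypothetical helper for bitwise comparison.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Model of the quoted rule for the type-C minimum. The exception raised for
// signaling NaNs is not modeled.
double xsmincdp_model(double src1, double src2) {
  if (std::isnan(src1) || std::isnan(src2))
    return src2;
  if (src1 < src2)
    return src1;
  return src2;
}

// The C-style select that a less-than select_cc on the same operands computes.
double select_lt(double a, double b) { return (a < b) ? a : b; }

// Bitwise comparison, so that NaN results and the sign of zero are checked too.
bool same_bits(double x, double y) {
  std::uint64_t xb, yb;
  std::memcpy(&xb, &x, sizeof xb);
  std::memcpy(&yb, &y, sizeof yb);
  return xb == yb;
}
```

Because an ordered less-than comparison is false whenever either operand is a NaN, `select_lt` falls through to its second operand in exactly the cases where the quoted rule returns src2.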

In D62993#1718495, @steven.zhang wrote:
I see there is one sanity failed.
LLVM ERROR: Cannot select: t14: f32 = fcanonicalize t2
t2: f32,ch = CopyFromReg t0, Register:f32 %2
  t1: f32 = Register %2
In function: test1
FileCheck error: '-' is empty.
FileCheck command line: /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll

Ah, that's where I saw the issue I mentioned in https://reviews.llvm.org/D62993?id=203487#inline-622927
I'll fix it up in the next update.

lib/Target/PowerPC/PPCISelLowering.cpp
555	I don't dispute that we will get the nodes if we mark them legal. However, I do not think that we will get these nodes in more situations than we get the `_IEEE` versions. The way I see it, with `ninf nsz nnan`, the nodes are equivalent since the only difference between them is the handling of NaNs and (presumably) signed zeros.
7207	This needs to change from `isISA3_0()` to `hasP9Vector()`.
7217	I don't understand this comment. This is only available in ISA3.0, so `hasVSX()` is not adequate. And VSX is a requirement for P9Vector, so why would I need both?
7221	Quiet NaNs are fine. Signaling NaNs cause exceptions. This is fine, signaling NaNs are supposed to cause exceptions as far as I can tell.
lib/Target/PowerPC/PPCInstrInfo.td
120	Sounds good. Will do.

Add handling for ISD::FCANONICALIZE.
Add a few tests.
Fix up some naming.

LGTM from my aspect.

steven.zhang added inline comments. Oct 23 2019, 8:01 PM

lib/Target/PowerPC/PPCInstrVSX.td
1267	Just out of curiosity, why do we set the complexity to 1 here instead of 400?

nemanjai marked an inline comment as done. Oct 23 2019, 9:46 PM

nemanjai added inline comments.

lib/Target/PowerPC/PPCInstrVSX.td
1267	Ha ha... because it's a bug. I'll change it to 400 on the commit. It doesn't change semantics because there isn't an equivalent FPU choice to be made here, but it should still be consistent.

@jsji Any further comments?

Some nits; the biggest question is about line 7223.

lib/Target/PowerPC/PPCISelLowering.cpp
1304	Typo? `XSMAXCDP` not `XSMAXCPD`.
7208–7209	Although this isn't a problem right now, because we always return before using `Flags`, I think it would be better to move this after the new code, to avoid a potential trap for future programmers.
7219	We will lose some opportunities here, e.g. with `-mcpu=pwr9 --enable-no-nans-fp-math --enable-no-signed-zeros-fp-math`? We will catch new opportunities for max/min, but will give up lowering all the other CCs?
lib/Target/PowerPC/PPCInstrVSX.td
1290	Nit: The pattern order here is not consistent with the above: instead of having one min, one max, we start to put all the mins together, then the maxes. It isn't a problem, but it would be easier to read/check if we make it consistent.
test/CodeGen/PowerPC/ctr-minmaxnum.ll
69	QPX should NOT be affected, so shouldn't change here?
144	QPX, shouldn't change.
test/CodeGen/PowerPC/scalar-min-max.ll
3	I would be more interested to see `-mcpu=pwr9` with `--enable-no-nans-fp-math` instead of `-mcpu=pwr8`. :)

nemanjai marked 6 inline comments as done. Oct 25 2019, 4:12 PM

nemanjai added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
1304	Oops. Thanks, I'll fix it.
7208–7209	Fair enough, I'll move this down below the min/max switch.
7219	That is a really good point, this will prevent us from generating the `fsel` on P9. I'll fix it up.
lib/Target/PowerPC/PPCInstrVSX.td
1290	I agree 100% that it's nicer to keep it organized and consistent. Will update, thank you.
test/CodeGen/PowerPC/ctr-minmaxnum.ll
69	Oops, overzealous search-and-replace...
test/CodeGen/PowerPC/scalar-min-max.ll
3	I think it is good to show that we get the same instructions with fast math on both P8 and P9.

Address the minor comments and fix the early exit on P9 with no NaNs/Infs.
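The control flow discussed in this round can be summarized with a standalone model. This is a sketch that mirrors the `LowerSELECT_CC` hunk as displayed in Diff 226519 further down this page. The enum and struct names are hypothetical stand-ins rather than LLVM types, and the committed code may differ.

```cpp
#include <cassert>

// Hypothetical stand-ins for the state the lowering consults.
enum class CondCode { SETOGT, SETGT, SETOLT, SETLT, SETGE, SETEQ };
enum class Lowering { KeepSelectCC, EmitXSMAXCDP, EmitXSMINCDP, TryFsel };

struct Query {
  bool HasP9Vector;   // models Subtarget.hasP9Vector()
  bool HasNoInfs;     // models Options.NoInfsFPMath
  bool HasNoNaNs;     // models Options.NoNaNsFPMath
  bool OperandsMatch; // models LHS == TV && RHS == FV
  CondCode CC;
};

Lowering classify(const Query &Q) {
  // Without P9 vector support, only finite math permits any lowering.
  if (!Q.HasP9Vector && (!Q.HasNoInfs || !Q.HasNoNaNs))
    return Lowering::KeepSelectCC;

  if (Q.HasP9Vector && Q.OperandsMatch) {
    switch (Q.CC) {
    case CondCode::SETOGT:
    case CondCode::SETGT:
      return Lowering::EmitXSMAXCDP;
    case CondCode::SETOLT:
    case CondCode::SETLT:
      return Lowering::EmitXSMINCDP;
    default:
      // Not a min/max; fsel remains an option only under finite math.
      if (!(Q.HasNoInfs && Q.HasNoNaNs))
        return Lowering::KeepSelectCC;
      break;
    }
  }
  // As in the displayed diff, every remaining case continues to the existing
  // fsel-based lowering.
  return Lowering::TryFsel;
}
```

The `default` case is the part jsji's comment on line 7219 prompted: on P9 with finite math, a condition code that is not a min/max still reaches the fsel path instead of returning early.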

LGTM. Thanks.

This revision is now accepted and ready to land. Oct 27 2019, 9:11 AM

Oh, maybe the title should be updated to remove "unsafe fp math" to avoid confusion?

Closed by commit rG25a41ad24200: [PowerPC] Emit scalar fp min/max instructions (authored by nemanjai). Oct 28 2019, 5:54 PM

This revision was automatically updated to reflect the committed changes.

ZhangKang mentioned this in D74701: [PowerPC] Fix the unexpected modification caused by D62993 in LowerSELECT_CC for power9. Feb 17 2020, 12:52 AM

ZhangKang mentioned this in rGb083d7a3460d: [PowerPC] Fix the unexpected modification caused by D62993 in LowerSELECT_CC…. Feb 25 2020, 7:00 PM

Revision Contents

Path	Size
lib/Target/PowerPC/ (four files)	3 lines, 42 lines, 7 lines, 69 lines
test/CodeGen/PowerPC/ctr-minmaxnum.ll	40 lines
test/CodeGen/PowerPC/scalar-min-max.ll	203 lines

Diff 226519

lib/Target/PowerPC/PPCISelLowering.h

[45 lines not shown]
namespace PPCISD {
enum NodeType : unsigned {		enum NodeType : unsigned {
// Start the numbering where the builtin ops and target ops leave off.		// Start the numbering where the builtin ops and target ops leave off.
FIRST_NUMBER = ISD::BUILTIN_OP_END,		FIRST_NUMBER = ISD::BUILTIN_OP_END,

/// FSEL - Traditional three-operand fsel node.		/// FSEL - Traditional three-operand fsel node.
///		///
FSEL,		FSEL,

		/// XSMAXCDP, XSMINCDP - C-type min/max instructions.
		XSMAXCDP, XSMINCDP,

/// FCFID - The FCFID instruction, taking an f64 operand and producing		/// FCFID - The FCFID instruction, taking an f64 operand and producing
/// and f64 value containing the FP representation of the integer that		/// and f64 value containing the FP representation of the integer that
/// was temporarily in the f64 operand.		/// was temporarily in the f64 operand.
FCFID,		FCFID,

/// Newer FCFID[US] integer-to-floating-point conversion instructions for		/// Newer FCFID[US] integer-to-floating-point conversion instructions for
/// unsigned integers and single-precision outputs.		/// unsigned integers and single-precision outputs.
FCFIDU, FCFIDS, FCFIDUS,		FCFIDU, FCFIDS, FCFIDUS,
[1,178 lines not shown]

lib/Target/PowerPC/PPCISelLowering.cpp


[542 lines not shown]
if (Subtarget.use64BitRegs()) {
setOperationAction(ISD::SRL_PARTS, MVT::i64, Custom);		setOperationAction(ISD::SRL_PARTS, MVT::i64, Custom);
} else {		} else {
// 32-bit PowerPC wants to expand i64 shifts itself.		// 32-bit PowerPC wants to expand i64 shifts itself.
setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);
}		}

		if (Subtarget.hasVSX()) {
		setOperationAction(ISD::FMAXNUM_IEEE, MVT::f64, Legal);
		setOperationAction(ISD::FMAXNUM_IEEE, MVT::f32, Legal);
		setOperationAction(ISD::FMINNUM_IEEE, MVT::f64, Legal);
		setOperationAction(ISD::FMINNUM_IEEE, MVT::f32, Legal);
		}

if (Subtarget.hasAltivec()) {		if (Subtarget.hasAltivec()) {
// First set operation action for all vector types to expand. Then we		// First set operation action for all vector types to expand. Then we
// will selectively turn on ones that can be effectively codegen'd.		// will selectively turn on ones that can be effectively codegen'd.
for (MVT VT : MVT::fixedlen_vector_valuetypes()) {		for (MVT VT : MVT::fixedlen_vector_valuetypes()) {
// add/sub are legal for all supported vector VT's.		// add/sub are legal for all supported vector VT's.
setOperationAction(ISD::ADD, VT, Legal);		setOperationAction(ISD::ADD, VT, Legal);
setOperationAction(ISD::SUB, VT, Legal);		setOperationAction(ISD::SUB, VT, Legal);

[730 lines not shown]
bool PPCTargetLowering::preferIncOfAddToSubOfNot(EVT VT) const {		bool PPCTargetLowering::preferIncOfAddToSubOfNot(EVT VT) const {
return VT.isScalarInteger();		return VT.isScalarInteger();
}		}

const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {		const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
switch ((PPCISD::NodeType)Opcode) {		switch ((PPCISD::NodeType)Opcode) {
case PPCISD::FIRST_NUMBER: break;		case PPCISD::FIRST_NUMBER: break;
case PPCISD::FSEL: return "PPCISD::FSEL";		case PPCISD::FSEL: return "PPCISD::FSEL";
		case PPCISD::XSMAXCDP: return "PPCISD::XSMAXCDP";
		case PPCISD::XSMINCDP: return "PPCISD::XSMINCDP";
case PPCISD::FCFID: return "PPCISD::FCFID";		case PPCISD::FCFID: return "PPCISD::FCFID";
case PPCISD::FCFIDU: return "PPCISD::FCFIDU";		case PPCISD::FCFIDU: return "PPCISD::FCFIDU";
case PPCISD::FCFIDS: return "PPCISD::FCFIDS";		case PPCISD::FCFIDS: return "PPCISD::FCFIDS";
case PPCISD::FCFIDUS: return "PPCISD::FCFIDUS";		case PPCISD::FCFIDUS: return "PPCISD::FCFIDUS";
case PPCISD::FCTIDZ: return "PPCISD::FCTIDZ";		case PPCISD::FCTIDZ: return "PPCISD::FCTIDZ";
case PPCISD::FCTIWZ: return "PPCISD::FCTIWZ";		case PPCISD::FCTIWZ: return "PPCISD::FCTIWZ";
case PPCISD::FCTIDUZ: return "PPCISD::FCTIDUZ";		case PPCISD::FCTIDUZ: return "PPCISD::FCTIDUZ";
case PPCISD::FCTIWUZ: return "PPCISD::FCTIWUZ";		case PPCISD::FCTIWUZ: return "PPCISD::FCTIWUZ";
[5,878 lines not shown]
/// LowerSELECT_CC - Lower floating point select_cc's into fsel instruction when		/// LowerSELECT_CC - Lower floating point select_cc's into fsel instruction when
/// possible.		/// possible.
SDValue PPCTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {		SDValue PPCTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
// Not FP? Not a fsel.		// Not FP? Not a fsel.
if (!Op.getOperand(0).getValueType().isFloatingPoint() ||		if (!Op.getOperand(0).getValueType().isFloatingPoint() ||
!Op.getOperand(2).getValueType().isFloatingPoint())		!Op.getOperand(2).getValueType().isFloatingPoint())
return Op;		return Op;

		bool HasNoInfs = DAG.getTarget().Options.NoInfsFPMath;
		bool HasNoNaNs = DAG.getTarget().Options.NoNaNsFPMath;
// We might be able to do better than this under some circumstances, but in		// We might be able to do better than this under some circumstances, but in
// general, fsel-based lowering of select is a finite-math-only optimization.		// general, fsel-based lowering of select is a finite-math-only optimization.
// For more information, see section F.3 of the 2.06 ISA specification.		// For more information, see section F.3 of the 2.06 ISA specification.
if (!DAG.getTarget().Options.NoInfsFPMath ||		// With ISA 3.0, we have xsmaxcdp/xsmincdp which are OK to emit even in the
!DAG.getTarget().Options.NoNaNsFPMath)		// presence of infinities.
		if (!Subtarget.hasP9Vector() && (!HasNoInfs || !HasNoNaNs))
return Op;		return Op;
// TODO: Propagate flags from the select rather than global settings.
SDNodeFlags Flags;
Flags.setNoInfs(true);
Flags.setNoNaNs(true);

ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(4))->get();		ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(4))->get();

EVT ResVT = Op.getValueType();		EVT ResVT = Op.getValueType();
EVT CmpVT = Op.getOperand(0).getValueType();		EVT CmpVT = Op.getOperand(0).getValueType();
SDValue LHS = Op.getOperand(0), RHS = Op.getOperand(1);		SDValue LHS = Op.getOperand(0), RHS = Op.getOperand(1);
SDValue TV = Op.getOperand(2), FV = Op.getOperand(3);		SDValue TV = Op.getOperand(2), FV = Op.getOperand(3);
SDLoc dl(Op);		SDLoc dl(Op);

		if (Subtarget.hasP9Vector() && LHS == TV && RHS == FV) {
		switch (CC) {
		default:
		// Not a min/max but with finite math, we may still be able to use fsel.
		if (HasNoInfs && HasNoNaNs)
		break;
		return Op;
		case ISD::SETOGT:
		case ISD::SETGT:
		return DAG.getNode(PPCISD::XSMAXCDP, dl, Op.getValueType(), LHS, RHS);
		case ISD::SETOLT:
		case ISD::SETLT:
		return DAG.getNode(PPCISD::XSMINCDP, dl, Op.getValueType(), LHS, RHS);
		}
		}

		// TODO: Propagate flags from the select rather than global settings.
		SDNodeFlags Flags;
		Flags.setNoInfs(true);
		Flags.setNoNaNs(true);

// If the RHS of the comparison is a 0.0, we don't need to do the		// If the RHS of the comparison is a 0.0, we don't need to do the
// subtraction at all.		// subtraction at all.
SDValue Sel1;		SDValue Sel1;
if (isFloatingPointZero(RHS))		if (isFloatingPointZero(RHS))
switch (CC) {		switch (CC) {
default: break; // SETUO etc aren't handled by fsel.		default: break; // SETUO etc aren't handled by fsel.
case ISD::SETNE:		case ISD::SETNE:
std::swap(TV, FV);		std::swap(TV, FV);
[8,299 lines not shown]

lib/Target/PowerPC/PPCInstrInfo.td

	[111 lines not shown]
	def SDT_PPCqvlfsb : SDTypeProfile<1, 1, [			def SDT_PPCqvlfsb : SDTypeProfile<1, 1, [
	SDTCisVec<0>, SDTCisPtrTy<1>			SDTCisVec<0>, SDTCisPtrTy<1>
	]>;			]>;

	def SDT_PPCextswsli : SDTypeProfile<1, 2, [ // extswsli			def SDT_PPCextswsli : SDTypeProfile<1, 2, [ // extswsli
	SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisInt<2>			SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisInt<2>
	]>;			]>;

				def SDT_PPCFPMinMax : SDTypeProfile<1, 2, [
				SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>
				]>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// PowerPC specific DAG Nodes.			// PowerPC specific DAG Nodes.
	//			//

	def PPCfre : SDNode<"PPCISD::FRE", SDTFPUnaryOp, []>;			def PPCfre : SDNode<"PPCISD::FRE", SDTFPUnaryOp, []>;
	def PPCfrsqrte: SDNode<"PPCISD::FRSQRTE", SDTFPUnaryOp, []>;			def PPCfrsqrte: SDNode<"PPCISD::FRSQRTE", SDTFPUnaryOp, []>;

	def PPCfcfid : SDNode<"PPCISD::FCFID", SDTFPUnaryOp, []>;			def PPCfcfid : SDNode<"PPCISD::FCFID", SDTFPUnaryOp, []>;
	[32 lines not shown]
	// Perform FADD in round-to-zero mode.			// Perform FADD in round-to-zero mode.
	def PPCfaddrtz: SDNode<"PPCISD::FADDRTZ", SDTFPBinOp, []>;			def PPCfaddrtz: SDNode<"PPCISD::FADDRTZ", SDTFPBinOp, []>;


	def PPCfsel : SDNode<"PPCISD::FSEL",			def PPCfsel : SDNode<"PPCISD::FSEL",
	// Type constraint for fsel.			// Type constraint for fsel.
	SDTypeProfile<1, 3, [SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>,			SDTypeProfile<1, 3, [SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>,
	SDTCisFP<0>, SDTCisVT<1, f64>]>, []>;			SDTCisFP<0>, SDTCisVT<1, f64>]>, []>;
				def PPCxsmaxc : SDNode<"PPCISD::XSMAXCDP", SDT_PPCFPMinMax, []>;
				def PPCxsminc : SDNode<"PPCISD::XSMINCDP", SDT_PPCFPMinMax, []>;
	def PPChi : SDNode<"PPCISD::Hi", SDTIntBinOp, []>;			def PPChi : SDNode<"PPCISD::Hi", SDTIntBinOp, []>;
	def PPClo : SDNode<"PPCISD::Lo", SDTIntBinOp, []>;			def PPClo : SDNode<"PPCISD::Lo", SDTIntBinOp, []>;
	def PPCtoc_entry: SDNode<"PPCISD::TOC_ENTRY", SDTIntBinOp,			def PPCtoc_entry: SDNode<"PPCISD::TOC_ENTRY", SDTIntBinOp,
	[SDNPMayLoad, SDNPMemOperand]>;			[SDNPMayLoad, SDNPMemOperand]>;
	def PPCvmaddfp : SDNode<"PPCISD::VMADDFP", SDTFPTernaryOp, []>;			def PPCvmaddfp : SDNode<"PPCISD::VMADDFP", SDTFPTernaryOp, []>;
	def PPCvnmsubfp : SDNode<"PPCISD::VNMSUBFP", SDTFPTernaryOp, []>;			def PPCvnmsubfp : SDNode<"PPCISD::VNMSUBFP", SDTFPTernaryOp, []>;

	def PPCppc32GOT : SDNode<"PPCISD::PPC32_GOT", SDTIntLeaf, []>;			def PPCppc32GOT : SDNode<"PPCISD::PPC32_GOT", SDTIntLeaf, []>;
	[4,858 lines not shown]

lib/Target/PowerPC/PPCInstrVSX.td

[1,249 lines not shown]
def : Pat<(f64 (PPCfcfidu (PPCmtvsra (i64 (vector_extract v2i64:$S, 0))))),
(f64 (XSCVUXDDP (COPY_TO_REGCLASS $S, VSFRC)))>;		(f64 (XSCVUXDDP (COPY_TO_REGCLASS $S, VSFRC)))>;
def : Pat<(f64 (PPCfcfidu (PPCmtvsra (i64 (vector_extract v2i64:$S, 1))))),		def : Pat<(f64 (PPCfcfidu (PPCmtvsra (i64 (vector_extract v2i64:$S, 1))))),
(f64 (XSCVUXDDP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;		(f64 (XSCVUXDDP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;
} // IsBigEndian		} // IsBigEndian

} // AddedComplexity		} // AddedComplexity
} // HasVSX		} // HasVSX

		def FpMinMax {
		dag F32Min = (COPY_TO_REGCLASS (XSMINDP (COPY_TO_REGCLASS $A, VSFRC),
		(COPY_TO_REGCLASS $B, VSFRC)),
		VSSRC);
		dag F32Max = (COPY_TO_REGCLASS (XSMAXDP (COPY_TO_REGCLASS $A, VSFRC),
		(COPY_TO_REGCLASS $B, VSFRC)),
		VSSRC);
		}

		let AddedComplexity = 400, Predicates = [HasVSX] in {
		// f32 Min.
		def : Pat<(f32 (fminnum_ieee f32:$A, f32:$B)),
		(f32 FpMinMax.F32Min)>;
		def : Pat<(f32 (fminnum_ieee (fcanonicalize f32:$A), f32:$B)),
		(f32 FpMinMax.F32Min)>;
		def : Pat<(f32 (fminnum_ieee f32:$A, (fcanonicalize f32:$B))),
		(f32 FpMinMax.F32Min)>;
		def : Pat<(f32 (fminnum_ieee (fcanonicalize f32:$A), (fcanonicalize f32:$B))),
		(f32 FpMinMax.F32Min)>;
		// F32 Max.
		def : Pat<(f32 (fmaxnum_ieee f32:$A, f32:$B)),
		(f32 FpMinMax.F32Max)>;
		def : Pat<(f32 (fmaxnum_ieee (fcanonicalize f32:$A), f32:$B)),
		(f32 FpMinMax.F32Max)>;
		def : Pat<(f32 (fmaxnum_ieee f32:$A, (fcanonicalize f32:$B))),
		(f32 FpMinMax.F32Max)>;
		def : Pat<(f32 (fmaxnum_ieee (fcanonicalize f32:$A), (fcanonicalize f32:$B))),
		(f32 FpMinMax.F32Max)>;

		// f64 Min.
		def : Pat<(f64 (fminnum_ieee f64:$A, f64:$B)),
		(f64 (XSMINDP $A, $B))>;
		def : Pat<(f64 (fminnum_ieee (fcanonicalize f64:$A), f64:$B)),
		(f64 (XSMINDP $A, $B))>;
		def : Pat<(f64 (fminnum_ieee f64:$A, (fcanonicalize f64:$B))),
		(f64 (XSMINDP $A, $B))>;
		def : Pat<(f64 (fminnum_ieee (fcanonicalize f64:$A), (fcanonicalize f64:$B))),
		(f64 (XSMINDP $A, $B))>;
		// f64 Max.
		def : Pat<(f64 (fmaxnum_ieee f64:$A, f64:$B)),
		(f64 (XSMAXDP $A, $B))>;
		def : Pat<(f64 (fmaxnum_ieee (fcanonicalize f64:$A), f64:$B)),
		(f64 (XSMAXDP $A, $B))>;
		def : Pat<(f64 (fmaxnum_ieee f64:$A, (fcanonicalize f64:$B))),
		(f64 (XSMAXDP $A, $B))>;
		def : Pat<(f64 (fmaxnum_ieee (fcanonicalize f64:$A), (fcanonicalize f64:$B))),
		(f64 (XSMAXDP $A, $B))>;
		}

def ScalarLoads {		def ScalarLoads {
dag Li8 = (i32 (extloadi8 xoaddr:$src));		dag Li8 = (i32 (extloadi8 xoaddr:$src));
dag ZELi8 = (i32 (zextloadi8 xoaddr:$src));		dag ZELi8 = (i32 (zextloadi8 xoaddr:$src));
dag ZELi8i64 = (i64 (zextloadi8 xoaddr:$src));		dag ZELi8i64 = (i64 (zextloadi8 xoaddr:$src));
dag SELi8 = (i32 (sext_inreg (extloadi8 xoaddr:$src), i8));		dag SELi8 = (i32 (sext_inreg (extloadi8 xoaddr:$src), i8));
dag SELi8i64 = (i64 (sext_inreg (extloadi8 xoaddr:$src), i8));		dag SELi8i64 = (i64 (sext_inreg (extloadi8 xoaddr:$src), i8));

dag Li16 = (i32 (extloadi16 xoaddr:$src));		dag Li16 = (i32 (extloadi16 xoaddr:$src));
[1,613 lines not shown]
def XVTSTDCDP : XX2_RD6_DCMX7_RS6<60, 15, 5,
(outs vsrc:$XT), (ins u7imm:$DCMX, vsrc:$XB),		(outs vsrc:$XT), (ins u7imm:$DCMX, vsrc:$XB),
"xvtstdcdp $XT, $XB, $DCMX", IIC_VecFP,		"xvtstdcdp $XT, $XB, $DCMX", IIC_VecFP,
[(set v2i64: $XT,		[(set v2i64: $XT,
(int_ppc_vsx_xvtstdcdp v2f64:$XB, timm:$DCMX))]>;		(int_ppc_vsx_xvtstdcdp v2f64:$XB, timm:$DCMX))]>;

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

// Maximum/Minimum Type-C/Type-J DP		// Maximum/Minimum Type-C/Type-J DP
// XT.dword[1] = 0xUUUU_UUUU_UUUU_UUUU, so we use vsrc for XT		def XSMAXCDP : XX3_XT5_XA5_XB5<60, 128, "xsmaxcdp", vsfrc, vsfrc, vsfrc,
def XSMAXCDP : XX3_XT5_XA5_XB5<60, 128, "xsmaxcdp", vsrc, vsfrc, vsfrc,		IIC_VecFP,
IIC_VecFP, []>;		[(set f64:$XT, (PPCxsmaxc f64:$XA, f64:$XB))]>;
def XSMAXJDP : XX3_XT5_XA5_XB5<60, 144, "xsmaxjdp", vsrc, vsfrc, vsfrc,		def XSMAXJDP : XX3_XT5_XA5_XB5<60, 144, "xsmaxjdp", vsrc, vsfrc, vsfrc,
IIC_VecFP, []>;		IIC_VecFP, []>;
def XSMINCDP : XX3_XT5_XA5_XB5<60, 136, "xsmincdp", vsrc, vsfrc, vsfrc,		def XSMINCDP : XX3_XT5_XA5_XB5<60, 136, "xsmincdp", vsfrc, vsfrc, vsfrc,
IIC_VecFP, []>;		IIC_VecFP,
		[(set f64:$XT, (PPCxsminc f64:$XA, f64:$XB))]>;
def XSMINJDP : XX3_XT5_XA5_XB5<60, 152, "xsminjdp", vsrc, vsfrc, vsfrc,		def XSMINJDP : XX3_XT5_XA5_XB5<60, 152, "xsminjdp", vsrc, vsfrc, vsfrc,
IIC_VecFP, []>;		IIC_VecFP, []>;

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

// Vector Byte-Reverse H/W/D/Q Word		// Vector Byte-Reverse H/W/D/Q Word
def XXBRH : XX2_XT6_XO5_XB6<60, 7, 475, "xxbrh", vsrc, []>;		def XXBRH : XX2_XT6_XO5_XB6<60, 7, 475, "xxbrh", vsrc, []>;
def XXBRW : XX2_XT6_XO5_XB6<60, 15, 475, "xxbrw", vsrc, []>;		def XXBRW : XX2_XT6_XO5_XB6<60, 15, 475, "xxbrw", vsrc, []>;
[... 790 lines not shown ...]
// Round & Convert QP -> DP/SP		// Round & Convert QP -> DP/SP
def : Pat<(f64 (fpround f128:$src)), (f64 (XSCVQPDP $src))>;		def : Pat<(f64 (fpround f128:$src)), (f64 (XSCVQPDP $src))>;
def : Pat<(f32 (fpround f128:$src)), (f32 (XSRSP (XSCVQPDPO $src)))>;		def : Pat<(f32 (fpround f128:$src)), (f32 (XSRSP (XSCVQPDPO $src)))>;

// Convert SP -> QP		// Convert SP -> QP
def : Pat<(f128 (fpextend f32:$src)),		def : Pat<(f128 (fpextend f32:$src)),
(f128 (XSCVDPQP (COPY_TO_REGCLASS $src, VFRC)))>;		(f128 (XSCVDPQP (COPY_TO_REGCLASS $src, VFRC)))>;

		def : Pat<(f32 (PPCxsmaxc f32:$XA, f32:$XB)),
		(f32 (COPY_TO_REGCLASS (XSMAXCDP (COPY_TO_REGCLASS $XA, VSSRC),
		(COPY_TO_REGCLASS $XB, VSSRC)),
		VSSRC))>;
		def : Pat<(f32 (PPCxsminc f32:$XA, f32:$XB)),
		(f32 (COPY_TO_REGCLASS (XSMINCDP (COPY_TO_REGCLASS $XA, VSSRC),
		(COPY_TO_REGCLASS $XB, VSSRC)),
		VSSRC))>;

} // end HasP9Vector, AddedComplexity		} // end HasP9Vector, AddedComplexity

let AddedComplexity = 400 in {		let AddedComplexity = 400 in {
let Predicates = [IsISA3_0, HasP9Vector, HasDirectMove, IsBigEndian] in {		let Predicates = [IsISA3_0, HasP9Vector, HasDirectMove, IsBigEndian] in {
def : Pat<(f128 (PPCbuild_fp128 i64:$rB, i64:$rA)),		def : Pat<(f128 (PPCbuild_fp128 i64:$rB, i64:$rA)),
(f128 (COPY_TO_REGCLASS (MTVSRDD $rB, $rA), VRRC))>;		(f128 (COPY_TO_REGCLASS (MTVSRDD $rB, $rA), VRRC))>;
}		}
let Predicates = [IsISA3_0, HasP9Vector, HasDirectMove, IsLittleEndian] in {		let Predicates = [IsISA3_0, HasP9Vector, HasDirectMove, IsLittleEndian] in {
[... 585 lines not shown ...]

test/CodeGen/PowerPC/ctr-minmaxnum.ll

[... 30 lines not shown ...]
loop_body:		loop_body:
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test1:		; CHECK-LABEL: test1:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fminf		; CHECK: xsmindp
; CHECK-NOT: bl fminf		; CHECK-NOT: xsmindp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test1v(<4 x float> %f, <4 x float>* %fp) {		define void @test1v(<4 x float> %f, <4 x float>* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call <4 x float> @llvm.minnum.v4f32(<4 x float> %f, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>)		%0 = call <4 x float> @llvm.minnum.v4f32(<4 x float> %f, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>)
store <4 x float> %0, <4 x float>* %fp, align 16		store <4 x float> %0, <4 x float>* %fp, align 16
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 4		%2 = icmp eq i64 %1, 4
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test1v:		; CHECK-LABEL: test1v:
; CHECK: xvminsp		; CHECK: xvminsp
; CHECK-NOT: bl fminf		; CHECK-NOT: xsmindp
; CHECK: mtctr		; CHECK: mtctr
; CHECK-NOT: bl fminf		; CHECK-NOT: xsmindp
; CHECK: blr		; CHECK: blr

; QPX-LABEL: test1v:		; QPX-LABEL: test1v:
; QPX: mtctr		; QPX: mtctr
; QPX-NOT: bl fminf		; QPX-NOT: bl fminf
		jsji (Not Done): QPX should NOT be affected, so these lines shouldn't change?
		nemanjai (Author, Done): Oops, overzealous search-and-replace...
; QPX: blr		; QPX: blr

define void @test1a(float %f, float* %fp) {		define void @test1a(float %f, float* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call float @fminf(float %f, float 1.0) readnone		%0 = call float @fminf(float %f, float 1.0) readnone
store float %0, float* %fp, align 4		store float %0, float* %fp, align 4
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test1a:		; CHECK-LABEL: test1a:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fminf		; CHECK: xsmindp
; CHECK-NOT: bl fminf		; CHECK-NOT: xsmindp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test2(float %f, float* %fp) {		define void @test2(float %f, float* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call float @llvm.maxnum.f32(float %f, float 1.0)		%0 = call float @llvm.maxnum.f32(float %f, float 1.0)
store float %0, float* %fp, align 4		store float %0, float* %fp, align 4
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test2:		; CHECK-LABEL: test2:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmaxf		; CHECK: xsmaxdp
; CHECK-NOT: bl fmaxf		; CHECK-NOT: xsmaxdp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test2v(<4 x double> %f, <4 x double>* %fp) {		define void @test2v(<4 x double> %f, <4 x double>* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call <4 x double> @llvm.maxnum.v4f64(<4 x double> %f, <4 x double> <double 1.0, double 1.0, double 1.0, double 1.0>)		%0 = call <4 x double> @llvm.maxnum.v4f64(<4 x double> %f, <4 x double> <double 1.0, double 1.0, double 1.0, double 1.0>)
store <4 x double> %0, <4 x double>* %fp, align 16		store <4 x double> %0, <4 x double>* %fp, align 16
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 4		%2 = icmp eq i64 %1, 4
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test2v:		; CHECK-LABEL: test2v:
; CHECK: xvmaxdp		; CHECK: xvmaxdp
; CHECK: xvmaxdp		; CHECK: xvmaxdp
; CHECK-NOT: bl fmax		; CHECK-NOT: xsmaxdp
; CHECK: mtctr		; CHECK: mtctr
; CHECK-NOT: bl fmax		; CHECK-NOT: xsmaxdp
; CHECK: blr		; CHECK: blr

; QPX-LABEL: test2v:		; QPX-LABEL: test2v:
; QPX: mtctr		; QPX: mtctr
; QPX-NOT: bl fmax		; QPX-NOT: bl fmax
		jsji (Not Done): QPX shouldn't change.
; QPX: blr		; QPX: blr

define void @test2a(float %f, float* %fp) {		define void @test2a(float %f, float* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call float @fmaxf(float %f, float 1.0) readnone		%0 = call float @fmaxf(float %f, float 1.0) readnone
store float %0, float* %fp, align 4		store float %0, float* %fp, align 4
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test2a:		; CHECK-LABEL: test2a:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmaxf		; CHECK: xsmaxdp
; CHECK-NOT: bl fmaxf		; CHECK-NOT: xsmaxdp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test3(double %f, double* %fp) {		define void @test3(double %f, double* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call double @llvm.minnum.f64(double %f, double 1.0)		%0 = call double @llvm.minnum.f64(double %f, double 1.0)
store double %0, double* %fp, align 8		store double %0, double* %fp, align 8
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test3:		; CHECK-LABEL: test3:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmin		; CHECK: xsmindp
; CHECK-NOT: bl fmin		; CHECK-NOT: xsmindp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test3a(double %f, double* %fp) {		define void @test3a(double %f, double* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call double @fmin(double %f, double 1.0) readnone		%0 = call double @fmin(double %f, double 1.0) readnone
store double %0, double* %fp, align 8		store double %0, double* %fp, align 8
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test3a:		; CHECK-LABEL: test3a:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmin		; CHECK: xsmindp
; CHECK-NOT: bl fmin		; CHECK-NOT: xsmindp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test4(double %f, double* %fp) {		define void @test4(double %f, double* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call double @llvm.maxnum.f64(double %f, double 1.0)		%0 = call double @llvm.maxnum.f64(double %f, double 1.0)
store double %0, double* %fp, align 8		store double %0, double* %fp, align 8
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test4:		; CHECK-LABEL: test4:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmax		; CHECK: xsmaxdp
; CHECK-NOT: bl fmax		; CHECK-NOT: xsmaxdp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test4a(double %f, double* %fp) {		define void @test4a(double %f, double* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call double @fmax(double %f, double 1.0) readnone		%0 = call double @fmax(double %f, double 1.0) readnone
store double %0, double* %fp, align 8		store double %0, double* %fp, align 8
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test4a:		; CHECK-LABEL: test4a:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmax		; CHECK: xsmaxdp
; CHECK-NOT: bl fmax		; CHECK-NOT: xsmaxdp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

test/CodeGen/PowerPC/scalar-min-max.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mcpu=pwr8 -ppc-asm-full-reg-names --enable-unsafe-fp-math \
				; RUN: -verify-machineinstrs --enable-no-signed-zeros-fp-math \
				jsji (Not Done): I would be more interested to see `-mcpu=pwr9` with `--enable-no-nans-fp-math` instead of `-mcpu=pwr8`. :)
				nemanjai (Author, Done): I think it is good to show that we get the same instructions with fast math on both P8 and P9.
				; RUN: --enable-no-nans-fp-math \
				; RUN: -mtriple=powerpc64le-unknown-unknown < %s \| FileCheck %s
				; RUN: llc -mcpu=pwr9 -ppc-asm-full-reg-names --enable-unsafe-fp-math \
				; RUN: -verify-machineinstrs --enable-no-signed-zeros-fp-math \
				; RUN: --enable-no-nans-fp-math \
				; RUN: -mtriple=powerpc64le-unknown-unknown < %s \| FileCheck %s
				; RUN: llc -mcpu=pwr9 -ppc-asm-full-reg-names -verify-machineinstrs \
				; RUN: -mtriple=powerpc64le-unknown-unknown < %s \| FileCheck %s \
				; RUN: --check-prefix=NO-FAST-P9
				; RUN: llc -mcpu=pwr8 -ppc-asm-full-reg-names -verify-machineinstrs \
				steven.zhang (Not Done): It would be great to have a test that verifies the behavior on P9 when an operand is SNaN/QNaN. However, this is NOT a must.
				; RUN: -mtriple=powerpc64le-unknown-unknown < %s \| FileCheck %s \
				; RUN: --check-prefix=NO-FAST-P8
				define dso_local float @testfmax(float %a, float %b) local_unnamed_addr {
				; CHECK-LABEL: testfmax:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmaxdp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testfmax:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmaxcdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testfmax:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: fcmpu cr0, f1, f2
				; NO-FAST-P8-NEXT: bgtlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp ogt float %a, %b
				%cond = select i1 %cmp, float %a, float %b
				ret float %cond
				}

				define dso_local double @testdmax(double %a, double %b) local_unnamed_addr {
				; CHECK-LABEL: testdmax:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmaxdp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testdmax:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmaxcdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testdmax:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: xscmpudp cr0, f1, f2
				; NO-FAST-P8-NEXT: bgtlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp ogt double %a, %b
				%cond = select i1 %cmp, double %a, double %b
				ret double %cond
				}

				define dso_local float @testfmin(float %a, float %b) local_unnamed_addr {
				; CHECK-LABEL: testfmin:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmindp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testfmin:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmincdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testfmin:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: fcmpu cr0, f1, f2
				; NO-FAST-P8-NEXT: bltlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp olt float %a, %b
				%cond = select i1 %cmp, float %a, float %b
				ret float %cond
				}

				define dso_local double @testdmin(double %a, double %b) local_unnamed_addr {
				; CHECK-LABEL: testdmin:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmindp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testdmin:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmincdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testdmin:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: xscmpudp cr0, f1, f2
				; NO-FAST-P8-NEXT: bltlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp olt double %a, %b
				%cond = select i1 %cmp, double %a, double %b
				ret double %cond
				}

				define dso_local float @testfmax_fast(float %a, float %b) local_unnamed_addr {
				; CHECK-LABEL: testfmax_fast:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmaxdp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testfmax_fast:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmaxcdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testfmax_fast:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: fcmpu cr0, f1, f2
				; NO-FAST-P8-NEXT: bgtlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp fast ogt float %a, %b
				%cond = select i1 %cmp, float %a, float %b
				ret float %cond
				}
				define dso_local double @testdmax_fast(double %a, double %b) local_unnamed_addr {
				; CHECK-LABEL: testdmax_fast:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmaxdp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testdmax_fast:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmaxcdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testdmax_fast:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: xscmpudp cr0, f1, f2
				; NO-FAST-P8-NEXT: bgtlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp fast ogt double %a, %b
				%cond = select i1 %cmp, double %a, double %b
				ret double %cond
				}
				define dso_local float @testfmin_fast(float %a, float %b) local_unnamed_addr {
				; CHECK-LABEL: testfmin_fast:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmindp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testfmin_fast:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmincdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testfmin_fast:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: fcmpu cr0, f1, f2
				; NO-FAST-P8-NEXT: bltlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp fast olt float %a, %b
				%cond = select i1 %cmp, float %a, float %b
				ret float %cond
				}
				define dso_local double @testdmin_fast(double %a, double %b) local_unnamed_addr {
				; CHECK-LABEL: testdmin_fast:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmindp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testdmin_fast:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmincdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testdmin_fast:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: xscmpudp cr0, f1, f2
				; NO-FAST-P8-NEXT: bltlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp fast olt double %a, %b
				%cond = select i1 %cmp, double %a, double %b
				ret double %cond
				}
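
The tests above only expect `xsmaxdp`/`xsmindp` when the no-NaNs and no-signed-zeros options are passed; without them, P9 gets `xsmaxcdp`/`xsmincdp` and P8 keeps the compare-and-branch sequence. One likely reason is that the compare-and-select form in these tests and an `fmax`-style maximum disagree when an operand is NaN. The sketch below illustrates that difference in plain C++. It is not part of the patch, and the helper names `select_max`, `select_min`, and `max_forms_agree` are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: the compare-and-select form used by the tests
// (fcmp ogt + select). It returns b whenever the comparison is false,
// which includes every case where either operand is NaN.
inline double select_max(double a, double b) { return a > b ? a : b; }

// Hypothetical helper: the matching form for min (fcmp olt + select).
inline double select_min(double a, double b) { return a < b ? a : b; }

// Returns true when the select form and the C library's fmax produce the
// same value for (a, b). Two NaN results are treated as agreeing.
inline bool max_forms_agree(double a, double b) {
  const double s = select_max(a, b);
  const double m = std::fmax(a, b);
  return (std::isnan(s) && std::isnan(m)) || s == m;
}
```

With a NaN in the second operand, `select_max(1.0, NaN)` yields NaN while `std::fmax(1.0, NaN)` yields `1.0`, so `max_forms_agree` returns false; with the NaN in the first operand both forms yield `1.0`. This is the kind of case the SNaN/QNaN test requested in the review would exercise.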

Revision Contents

Diff 226519