This is an archive of the discontinued LLVM Phabricator instance.

Differential D62993

[PowerPC] Emit scalar min/max instructions with unsafe fp math
ClosedPublic

Authored by nemanjai on Jun 6 2019, 8:39 PM.

Details

Reviewers

hfinkel
jsji
kbarton
lei
stefanp
steven.zhang

Commits

rG25a41ad24200: [PowerPC] Emit scalar fp min/max instructions

Summary

This is something I meant to do a long time ago but never got around to it. These instructions should be an improvement over the compare/fsel sequence we currently emit.

The semantics of the instructions as specified in the ISA match the semantics specified in the description of the nodes.
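For context on the compare/fsel sequence mentioned above, here is a minimal C++ sketch of why an fsel-based select is only safe under finite math. It assumes the usual fsel rule (pick the first source when the condition operand is >= 0.0, otherwise, including for NaN, pick the second). The helper names `fselModel`, `selectGEViaFsel` and `selectGE` are hypothetical and exist only for this illustration; this is a simplified model, not the code LLVM emits.

```cpp
#include <cassert>
#include <limits>

// Simplified model of PowerPC fsel: choose `ifGE` when `cond >= 0.0`,
// otherwise (including when `cond` is NaN) choose `otherwise`.
inline double fselModel(double cond, double ifGE, double otherwise) {
  return (cond >= 0.0) ? ifGE : otherwise;
}

// Hypothetical helper: "a >= b ? x : y" expressed as fsel(a - b, x, y).
inline double selectGEViaFsel(double a, double b, double x, double y) {
  return fselModel(a - b, x, y);
}

// Reference semantics of the same select using a real comparison.
inline double selectGE(double a, double b, double x, double y) {
  return (a >= b) ? x : y;
}
```

With finite inputs the two helpers agree. With `a = b = +inf`, the subtraction yields NaN, so the fsel form picks `y` while the real comparison picks `x`.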

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nemanjai created this revision. Jun 6 2019, 8:39 PM

Herald added a project: Restricted Project. Jun 6 2019, 8:39 PM

Herald added a subscriber: hiraditya.

It is a great idea to exploit xsmindp/xsmaxdp! But it looks like we could make this more general rather than restricting it to UnsafeFPMath?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
551	Why do we need `TM.Options.UnsafeFPMath` here? If `ISD::FMAXNUM_IEEE` is generated, then its semantics are exactly the same as those of `xsmaxdp`, so we should be safe to use `xsmaxdp`. I think we can also add actions for `ISD::FMAXNUM`/`FMAXIMUM` and `ISD::FMINNUM`/`FMINIMUM`; for those we do need `TM.Options.UnsafeFPMath` or `TM.Options.NoNaNsFPMath`/`NoSignedZerosFPMath`.
llvm/lib/Target/PowerPC/PPCInstrVSX.td
999	Can we add similar patterns for `fmaxnum`/`fmaximum` and `fminnum`/`fminimum` with `Predicates`?
llvm/test/CodeGen/PowerPC/scalar-min-max.ll
3	Maybe add `RUN` lines for `--enable-no-nans-fp-math`/`-enable-no-signed-zeros-fp-math`?
8	It would be great if we could pre-commit the test case so that the review shows only the difference.
49	Why do we need these attributes? It looks like these should go in a different `RUN` line?

Moving this out of the review queue; it needs action from the author.

This revision now requires changes to proceed. Aug 27 2019, 7:54 PM

Herald added subscribers: shchenz, wuzish, MaskRay. Aug 27 2019, 7:54 PM

amyk added a subscriber: amyk. Sep 12 2019, 10:57 PM

amyk added inline comments.

llvm/test/CodeGen/PowerPC/scalar-min-max.ll
3	Maybe also add `-verify-machineinstrs` to our tests?

nemanjai marked 2 inline comments as done. Oct 22 2019, 4:54 PM

nemanjai added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
551	When I originally did this, it would produce these nodes along with `ISD::FCANONICALIZE` when unsafe fp math wasn't specified. However, the DAG combiner seems to have been modified to no longer do that. The instructions themselves handle SNaNs correctly, so we can handle inputs coming from `ISD::FCANONICALIZE` anyway. However, I don't really see a point in legalizing `FMAXNUM`/`FMINNUM` since we will just get the `_IEEE` versions even with fast math.
llvm/test/CodeGen/PowerPC/scalar-min-max.ll
8	I will add a RUN line without the FMF flags which will show the difference in codegen in the test case itself.

Remove the requirement for unsafe math for the FMAXNUM_IEEE and FMINNUM_IEEE. Add codegen for P9 xsmaxdp/xsmindp. Improve the test case.

I see that one sanity test failed.

FAIL: LLVM :: CodeGen/PowerPC/ctr-minmaxnum.ll (29517 of 50289)
******************** TEST 'LLVM :: CodeGen/PowerPC/ctr-minmaxnum.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/qshanz/work/build/bin/llc -mtriple=powerpc64-unknown-linux-gnu -verify-machineinstrs -mcpu=pwr7 < /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll | /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll
: 'RUN: at line 2';   /home/qshanz/work/build/bin/llc -mtriple=powerpc64-unknown-linux-gnu -verify-machineinstrs -mcpu=a2q < /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll | /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll --check-prefix=QPX
--
Exit Code: 2

Command Output (stderr):
--
LLVM ERROR: Cannot select: t14: f32 = fcanonicalize t2
  t2: f32,ch = CopyFromReg t0, Register:f32 %2
    t1: f32 = Register %2
In function: test1
FileCheck error: '-' is empty.
FileCheck command line:  /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll

lib/Target/PowerPC/PPCISelLowering.cpp
555 ↗	(On Diff #226093)	We will get the ISD::FMAXNUM/ISD::FMINNUM node if we mark it as legal.

	define dso_local float @testfmax_fast(float %a, float %b) {
	entry:
	  %cmp = fcmp fast ogt float %a, %b
	  %cond = select i1 %cmp, float %a, float %b
	  ret float %cond
	}

	llc test.ll -mattr=+vsx

	The same happens for the llvm.minnum/llvm.maxnum intrinsics.

	Initial selection DAG: %bb.0 'testfmax_fast:entry'
	SelectionDAG has 11 nodes:
	  t0: ch = EntryToken
	  t2: f32,ch = CopyFromReg t0, Register:f32 %0
	  t4: f32,ch = CopyFromReg t0, Register:f32 %1
	  t6: i1 = setcc nnan ninf nsz arcp contract afn reassoc t2, t4, setgt:ch
	  t7: f32 = fmaxnum t2, t4
	  t9: ch,glue = CopyToReg t0, Register:f32 $f1, t7
	  t10: ch = PPCISD::RET_FLAG t9, Register:f32 $f1, t9:1

	The node is built directly, not combined from select_cc. And I think we need to lower it if we know whether or not the operand is a NaN (i.e. isKnownNeverNaN()).
7221 ↗	(On Diff #226093)	Would hasVSX() be clearer?
7225 ↗	(On Diff #226093)	Do we need logic to handle the case where an operand is a NaN? The ISA says:

	If src1 or src2 is a SNaN, an Invalid Operation exception occurs.
	If either src1 or src2 is a NaN, result is src2.
	Otherwise, if src1 is less than src2, result is src1.
	Otherwise, result is src2.

	The ISA documentation is a bit confusing here. Doesn't NaN include both SNaN and QNaN? The condition in the second "if" covers the first one.
lib/Target/PowerPC/PPCInstrInfo.td
120 ↗	(On Diff #226093)	SDT_PPCFPMinMax ?

This revision now requires changes to proceed. Oct 23 2019, 1:33 AM

steven.zhang added inline comments. Oct 23 2019, 4:48 AM

lib/Target/PowerPC/PPCISelLowering.cpp
555 ↗	(On Diff #226093)	Hmm, ignore the above comments. It is right to have the DAG generate the IEEE node instead of the non-IEEE one, as the hardware has an instruction with matching semantics.
7225 ↗	(On Diff #226093)	Please also ignore the above comments: after double-checking, XSMAXCDP/XSMINCDP match the semantics perfectly, whether or not an operand is a NaN.
test/CodeGen/PowerPC/scalar-min-max.ll
12 ↗	(On Diff #226093)	It would be great to have some tests that verify the behavior when an operand is a SNaN/QNaN on P9. However, this is NOT a must.
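To make the NaN discussion above concrete, the following C++ sketch models the quoted type-C min rule (exceptions and status flags are ignored) and compares it with the `a < b ? a : b` select shape. The max variant is an assumption that simply mirrors the quoted min rule. All function names are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Model of the quoted rule for the type-C min, ignoring exceptions:
// any NaN operand -> src2; else src1 if src1 < src2; else src2.
inline double minTypeC(double src1, double src2) {
  if (std::isnan(src1) || std::isnan(src2))
    return src2;
  return (src1 < src2) ? src1 : src2;
}

// Assumed mirror image of the rule above for the type-C max.
inline double maxTypeC(double src1, double src2) {
  if (std::isnan(src1) || std::isnan(src2))
    return src2;
  return (src1 > src2) ? src1 : src2;
}

// The select shapes that correspond to ordered less-than / greater-than.
inline double minViaSelect(double a, double b) { return (a < b) ? a : b; }
inline double maxViaSelect(double a, double b) { return (a > b) ? a : b; }

// Equality that treats two NaNs as the same result.
inline bool sameResult(double x, double y) {
  return (std::isnan(x) && std::isnan(y)) || x == y;
}
```

In this model the select form and the type-C rule give the same result for every input, NaN or not, because a comparison involving a NaN is false and the select then falls through to its second operand.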

In D62993#1718495, @steven.zhang wrote:
I see that one sanity test failed.
LLVM ERROR: Cannot select: t14: f32 = fcanonicalize t2
t2: f32,ch = CopyFromReg t0, Register:f32 %2
  t1: f32 = Register %2
In function: test1
FileCheck error: '-' is empty.
FileCheck command line: /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll

Ah, that's where I saw the issue I mentioned in https://reviews.llvm.org/D62993?id=203487#inline-622927
I'll fix it up in the next update.

lib/Target/PowerPC/PPCISelLowering.cpp
555 ↗	(On Diff #226093)	I don't dispute that we will get the nodes if we mark them legal. However, I do not think that we will get these nodes in more situations than we get the `_IEEE` versions. The way I see it, with `ninf nsz nnan`, the nodes are equivalent since the only difference between them is the handling of NaNs and (presumably) signed zeros.
7205 ↗	(On Diff #226093)	This needs to change from `isISA3_0()` to `hasP9Vector()`.
7221 ↗	(On Diff #226093)	I don't understand this comment. This is only available in ISA 3.0, so `hasVSX()` is not adequate. And VSX is a requirement for P9Vector, so why would I need both?
7225 ↗	(On Diff #226093)	Quiet NaNs are fine. Signaling NaNs cause exceptions. This is fine; signaling NaNs are supposed to cause exceptions as far as I can tell.
lib/Target/PowerPC/PPCInstrInfo.td
120 ↗	(On Diff #226093)	Sounds good. Will do.
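As a small illustration of why NaN handling is the point of difference in the comments above, the sketch below contrasts a NaN-ignoring maximum with the compare-and-select form. `std::fmax` is used here only as a stand-in for "return the non-NaN operand" behaviour; that choice and the helper name `selectMax` are assumptions made for this example, not a statement about any particular instruction.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Compare-and-select maximum: a comparison with a NaN is false,
// so the second operand is returned whenever either input is a NaN.
inline double selectMax(double a, double b) { return (a > b) ? a : b; }

// True when the NaN-ignoring maximum and the select form agree.
inline bool agreeOnMax(double a, double b) {
  const double lib = std::fmax(a, b);
  const double sel = selectMax(a, b);
  return (std::isnan(lib) && std::isnan(sel)) || lib == sel;
}
```

For ordinary numbers `agreeOnMax` holds. For `(1.0, NaN)` it does not: `std::fmax` returns 1.0 while the select form returns the NaN, which is why the two forms are only interchangeable when NaNs can be ruled out.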

Add handling for ISD::FCANONICALIZE.
Add a few tests.
Fix up some naming.

LGTM from my aspect.

steven.zhang added inline comments. Oct 23 2019, 8:01 PM

lib/Target/PowerPC/PPCInstrVSX.td
1267 ↗	(On Diff #226125)	Just out of curiosity, why do we set the complexity to 1 here instead of 400?

nemanjai marked an inline comment as done. Oct 23 2019, 9:46 PM

nemanjai added inline comments.

lib/Target/PowerPC/PPCInstrVSX.td
1267 ↗	(On Diff #226125)	Ha ha... because it's a bug. I'll change it to 400 on the commit. It doesn't change semantics because there isn't an equivalent FPU choice to be made here, but it should still be consistent.

@jsji Any further comments?

Some nits; the biggest question is about line 7223.

lib/Target/PowerPC/PPCISelLowering.cpp
1304 ↗	(On Diff #226125)	Typo? `XSMAXCDP` not `XSMAXCPD`.
7210 ↗	(On Diff #226125)	Although this won't cause a problem right now because we always return before using `Flags`, I think it would be better to move this after the new code to avoid a potential trap for future programmers.
7223 ↗	(On Diff #226125)	We will lose some opportunities here, e.g. with `-mcpu=pwr9 --enable-no-nans-fp-math --enable-no-signed-zeros-fp-math`. We will catch new opportunities for max/min, but will we give up lowering all the other CCs?
lib/Target/PowerPC/PPCInstrVSX.td
1290 ↗	(On Diff #226125)	Nit: The pattern order here is not consistent with the above: instead of having one min, then one max, we put all the mins together, then the maxes. It won't cause a problem, but it would be easier to read/check if we made it consistent.
test/CodeGen/PowerPC/ctr-minmaxnum.ll
69 ↗	(On Diff #226125)	QPX should NOT be affected, so this shouldn't change?
144 ↗	(On Diff #226125)	QPX; this shouldn't change.
test/CodeGen/PowerPC/scalar-min-max.ll
2 ↗	(On Diff #226125)	I would be more interested to see `-mcpu=pwr9` with `--enable-no-nans-fp-math` instead of `-mcpu=pwr8`. :)

nemanjai marked 6 inline comments as done. Oct 25 2019, 4:12 PM

nemanjai added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
1304 ↗	(On Diff #226125)	Oops. Thanks, I'll fix it.
7210 ↗	(On Diff #226125)	Fair enough, I'll move this down below the min/max switch.
7223 ↗	(On Diff #226125)	That is a really good point, this will prevent us from generating the `fsel` on P9. I'll fix it up.
lib/Target/PowerPC/PPCInstrVSX.td
1290 ↗	(On Diff #226125)	I agree 100% that it's nicer to keep it organized and consistent. Will update, thank you.
test/CodeGen/PowerPC/ctr-minmaxnum.ll
69 ↗	(On Diff #226125)	Oops, overzealous search-and-replace...
test/CodeGen/PowerPC/scalar-min-max.ll
2 ↗	(On Diff #226125)	I think it is good to show that we get the same instructions with fast math on both P8 and P9.

Address the minor comments and fix the early exit on P9 with no NaNs/Infs.
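The early-exit behaviour discussed in this thread can be sketched as a plain decision function. This is a simplified model of the control flow in the final diff of `LowerSELECT_CC`, not the LLVM code itself: the enums, the `SelectInfo` struct and `classify` are hypothetical names, and the real function goes on to do the actual fsel lowering.

```cpp
#include <cassert>

// Hypothetical stand-ins for the condition codes and possible outcomes.
enum class CondCode { SETOGT, SETGT, SETOLT, SETLT, SETGE, Other };
enum class Lowering { KeepSelectCC, XSMAXCDP, XSMINCDP, TryFsel };

struct SelectInfo {
  bool hasP9Vector;   // Subtarget.hasP9Vector()
  bool noInfs;        // Options.NoInfsFPMath
  bool noNaNs;        // Options.NoNaNsFPMath
  bool operandsMatch; // LHS == TV && RHS == FV
  CondCode cc;
};

// Mirrors the order of checks in the diff as shown on this page.
inline Lowering classify(const SelectInfo &s) {
  const bool finiteMath = s.noInfs && s.noNaNs;
  // Without P9 vector support, fsel is the only option and needs finite math.
  if (!s.hasP9Vector && !finiteMath)
    return Lowering::KeepSelectCC;
  if (s.hasP9Vector && s.operandsMatch) {
    switch (s.cc) {
    case CondCode::SETOGT:
    case CondCode::SETGT:
      return Lowering::XSMAXCDP;
    case CondCode::SETOLT:
    case CondCode::SETLT:
      return Lowering::XSMINCDP;
    default:
      // Not a min/max; fsel is still possible, but only with finite math.
      if (!finiteMath)
        return Lowering::KeepSelectCC;
      break;
    }
  }
  return Lowering::TryFsel;
}
```

Note that, in this model of the diff, a P9 select whose operands are not in min/max form reaches the fsel path without a finite-math check. The later revision D74701 referenced on this page is titled as a fix to `LowerSELECT_CC` for power9; the sketch does not attempt to model that follow-up.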

LGTM. Thanks.

This revision is now accepted and ready to land. Oct 27 2019, 9:11 AM

Oh, maybe the title should be updated to remove "unsafe fp math" to avoid confusion?

Closed by commit rG25a41ad24200: [PowerPC] Emit scalar fp min/max instructions (authored by nemanjai). Oct 28 2019, 5:54 PM

This revision was automatically updated to reflect the committed changes.

ZhangKang mentioned this in D74701: [PowerPC] Fix the unexpected modification caused by D62993 in LowerSELECT_CC for power9. Feb 17 2020, 12:52 AM

ZhangKang mentioned this in rGb083d7a3460d: [PowerPC] Fix the unexpected modification caused by D62993 in LowerSELECT_CC…. Feb 25 2020, 7:00 PM

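Revision Contents

Path	Size
llvm/lib/Target/PowerPC/PPCISelLowering.h	3 lines
llvm/lib/Target/PowerPC/PPCISelLowering.cpp	42 lines
llvm/lib/Target/PowerPC/PPCInstrInfo.td	7 lines
llvm/lib/Target/PowerPC/PPCInstrVSX.td	69 lines
llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll	40 lines
llvm/test/CodeGen/PowerPC/scalar-min-max.ll	203 lines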

Diff 226825

llvm/lib/Target/PowerPC/PPCISelLowering.h

@@ 45 lines not shown @@ namespace PPCISD {
   enum NodeType : unsigned {
     // Start the numbering where the builtin ops and target ops leave off.
     FIRST_NUMBER = ISD::BUILTIN_OP_END,

     /// FSEL - Traditional three-operand fsel node.
     ///
     FSEL,

+    /// XSMAXCDP, XSMINCDP - C-type min/max instructions.
+    XSMAXCDP, XSMINCDP,
+
     /// FCFID - The FCFID instruction, taking an f64 operand and producing
     /// and f64 value containing the FP representation of the integer that
     /// was temporarily in the f64 operand.
     FCFID,

     /// Newer FCFID[US] integer-to-floating-point conversion instructions for
     /// unsigned integers and single-precision outputs.
     FCFIDU, FCFIDS, FCFIDUS,
@@ 1,178 lines not shown @@

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

@@ 542 lines not shown @@ if (Subtarget.use64BitRegs()) {
     setOperationAction(ISD::SRL_PARTS, MVT::i64, Custom);
   } else {
     // 32-bit PowerPC wants to expand i64 shifts itself.
     setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);
     setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);
     setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);
   }

+  if (Subtarget.hasVSX()) {
+    setOperationAction(ISD::FMAXNUM_IEEE, MVT::f64, Legal);
+    setOperationAction(ISD::FMAXNUM_IEEE, MVT::f32, Legal);
+    setOperationAction(ISD::FMINNUM_IEEE, MVT::f64, Legal);
+    setOperationAction(ISD::FMINNUM_IEEE, MVT::f32, Legal);
+  }
+
   if (Subtarget.hasAltivec()) {
     // First set operation action for all vector types to expand. Then we
     // will selectively turn on ones that can be effectively codegen'd.
     for (MVT VT : MVT::fixedlen_vector_valuetypes()) {
       // add/sub are legal for all supported vector VT's.
       setOperationAction(ISD::ADD, VT, Legal);
       setOperationAction(ISD::SUB, VT, Legal);

@@ 730 lines not shown @@
 bool PPCTargetLowering::preferIncOfAddToSubOfNot(EVT VT) const {
   return VT.isScalarInteger();
 }

 const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
   switch ((PPCISD::NodeType)Opcode) {
   case PPCISD::FIRST_NUMBER: break;
   case PPCISD::FSEL: return "PPCISD::FSEL";
+  case PPCISD::XSMAXCDP: return "PPCISD::XSMAXCDP";
+  case PPCISD::XSMINCDP: return "PPCISD::XSMINCDP";
   case PPCISD::FCFID: return "PPCISD::FCFID";
   case PPCISD::FCFIDU: return "PPCISD::FCFIDU";
   case PPCISD::FCFIDS: return "PPCISD::FCFIDS";
   case PPCISD::FCFIDUS: return "PPCISD::FCFIDUS";
   case PPCISD::FCTIDZ: return "PPCISD::FCTIDZ";
   case PPCISD::FCTIWZ: return "PPCISD::FCTIWZ";
   case PPCISD::FCTIDUZ: return "PPCISD::FCTIDUZ";
   case PPCISD::FCTIWUZ: return "PPCISD::FCTIWUZ";
@@ 5,904 lines not shown @@
 /// LowerSELECT_CC - Lower floating point select_cc's into fsel instruction when
 /// possible.
 SDValue PPCTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   // Not FP? Not a fsel.
   if (!Op.getOperand(0).getValueType().isFloatingPoint() ||
       !Op.getOperand(2).getValueType().isFloatingPoint())
     return Op;

+  bool HasNoInfs = DAG.getTarget().Options.NoInfsFPMath;
+  bool HasNoNaNs = DAG.getTarget().Options.NoNaNsFPMath;
   // We might be able to do better than this under some circumstances, but in
   // general, fsel-based lowering of select is a finite-math-only optimization.
   // For more information, see section F.3 of the 2.06 ISA specification.
-  if (!DAG.getTarget().Options.NoInfsFPMath ||
-      !DAG.getTarget().Options.NoNaNsFPMath)
+  // With ISA 3.0, we have xsmaxcdp/xsmincdp which are OK to emit even in the
+  // presence of infinities.
+  if (!Subtarget.hasP9Vector() && (!HasNoInfs || !HasNoNaNs))
     return Op;
-  // TODO: Propagate flags from the select rather than global settings.
-  SDNodeFlags Flags;
-  Flags.setNoInfs(true);
-  Flags.setNoNaNs(true);

   ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(4))->get();

   EVT ResVT = Op.getValueType();
   EVT CmpVT = Op.getOperand(0).getValueType();
   SDValue LHS = Op.getOperand(0), RHS = Op.getOperand(1);
   SDValue TV = Op.getOperand(2), FV = Op.getOperand(3);
   SDLoc dl(Op);

+  if (Subtarget.hasP9Vector() && LHS == TV && RHS == FV) {
+    switch (CC) {
+    default:
+      // Not a min/max but with finite math, we may still be able to use fsel.
+      if (HasNoInfs && HasNoNaNs)
+        break;
+      return Op;
+    case ISD::SETOGT:
+    case ISD::SETGT:
+      return DAG.getNode(PPCISD::XSMAXCDP, dl, Op.getValueType(), LHS, RHS);
+    case ISD::SETOLT:
+    case ISD::SETLT:
+      return DAG.getNode(PPCISD::XSMINCDP, dl, Op.getValueType(), LHS, RHS);
+    }
+  }
+
+  // TODO: Propagate flags from the select rather than global settings.
+  SDNodeFlags Flags;
+  Flags.setNoInfs(true);
+  Flags.setNoNaNs(true);
+
   // If the RHS of the comparison is a 0.0, we don't need to do the
   // subtraction at all.
   SDValue Sel1;
   if (isFloatingPointZero(RHS))
     switch (CC) {
     default: break; // SETUO etc aren't handled by fsel.
     case ISD::SETNE:
       std::swap(TV, FV);
@@ 8,299 lines not shown @@

llvm/lib/Target/PowerPC/PPCInstrInfo.td

@@ 111 lines not shown @@
 def SDT_PPCqvlfsb : SDTypeProfile<1, 1, [
   SDTCisVec<0>, SDTCisPtrTy<1>
 ]>;

 def SDT_PPCextswsli : SDTypeProfile<1, 2, [ // extswsli
   SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisInt<2>
 ]>;

+def SDT_PPCFPMinMax : SDTypeProfile<1, 2, [
+  SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>
+]>;
+
 //===----------------------------------------------------------------------===//
 // PowerPC specific DAG Nodes.
 //

 def PPCfre : SDNode<"PPCISD::FRE", SDTFPUnaryOp, []>;
 def PPCfrsqrte: SDNode<"PPCISD::FRSQRTE", SDTFPUnaryOp, []>;

 def PPCfcfid : SDNode<"PPCISD::FCFID", SDTFPUnaryOp, []>;
@@ 32 lines not shown @@
 // Perform FADD in round-to-zero mode.
 def PPCfaddrtz: SDNode<"PPCISD::FADDRTZ", SDTFPBinOp, []>;


 def PPCfsel : SDNode<"PPCISD::FSEL",
    // Type constraint for fsel.
    SDTypeProfile<1, 3, [SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>,
                         SDTCisFP<0>, SDTCisVT<1, f64>]>, []>;
+def PPCxsmaxc : SDNode<"PPCISD::XSMAXCDP", SDT_PPCFPMinMax, []>;
+def PPCxsminc : SDNode<"PPCISD::XSMINCDP", SDT_PPCFPMinMax, []>;
 def PPChi : SDNode<"PPCISD::Hi", SDTIntBinOp, []>;
 def PPClo : SDNode<"PPCISD::Lo", SDTIntBinOp, []>;
 def PPCtoc_entry: SDNode<"PPCISD::TOC_ENTRY", SDTIntBinOp,
                          [SDNPMayLoad, SDNPMemOperand]>;
 def PPCvmaddfp : SDNode<"PPCISD::VMADDFP", SDTFPTernaryOp, []>;
 def PPCvnmsubfp : SDNode<"PPCISD::VNMSUBFP", SDTFPTernaryOp, []>;

 def PPCppc32GOT : SDNode<"PPCISD::PPC32_GOT", SDTIntLeaf, []>;
@@ 4,858 lines not shown @@

llvm/lib/Target/PowerPC/PPCInstrVSX.td

@@ 990 lines not shown @@ def : Pat<(v4i32 (or (and (vnot_ppc v4i32:$C), v4i32:$A),
           (v4i32 (XXSEL $A, $B, $C))>;

 let Predicates = [IsBigEndian] in {
 def : Pat<(v2f64 (scalar_to_vector f64:$A)),
           (v2f64 (SUBREG_TO_REG (i64 1), $A, sub_64))>;

 def : Pat<(f64 (extractelt v2f64:$S, 0)),
           (f64 (EXTRACT_SUBREG $S, sub_64))>;
 def : Pat<(f64 (extractelt v2f64:$S, 1)),
           (f64 (EXTRACT_SUBREG (XXPERMDI $S, $S, 2), sub_64))>;
 }

 let Predicates = [IsLittleEndian] in {
 def : Pat<(v2f64 (scalar_to_vector f64:$A)),
           (v2f64 (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64),
                            (SUBREG_TO_REG (i64 1), $A, sub_64), 0))>;

@@ 242 lines not shown @@ def : Pat<(f64 (PPCfcfidu (PPCmtvsra (i64 (vector_extract v2i64:$S, 0))))),
           (f64 (XSCVUXDDP (COPY_TO_REGCLASS $S, VSFRC)))>;
 def : Pat<(f64 (PPCfcfidu (PPCmtvsra (i64 (vector_extract v2i64:$S, 1))))),
           (f64 (XSCVUXDDP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;
 } // IsBigEndian

 } // AddedComplexity
 } // HasVSX

+def FpMinMax {
+  dag F32Min = (COPY_TO_REGCLASS (XSMINDP (COPY_TO_REGCLASS $A, VSFRC),
+                                          (COPY_TO_REGCLASS $B, VSFRC)),
+                                 VSSRC);
+  dag F32Max = (COPY_TO_REGCLASS (XSMAXDP (COPY_TO_REGCLASS $A, VSFRC),
+                                          (COPY_TO_REGCLASS $B, VSFRC)),
+                                 VSSRC);
+}
+
+let AddedComplexity = 400, Predicates = [HasVSX] in {
+  // f32 Min.
+  def : Pat<(f32 (fminnum_ieee f32:$A, f32:$B)),
+            (f32 FpMinMax.F32Min)>;
+  def : Pat<(f32 (fminnum_ieee (fcanonicalize f32:$A), f32:$B)),
+            (f32 FpMinMax.F32Min)>;
+  def : Pat<(f32 (fminnum_ieee f32:$A, (fcanonicalize f32:$B))),
+            (f32 FpMinMax.F32Min)>;
+  def : Pat<(f32 (fminnum_ieee (fcanonicalize f32:$A), (fcanonicalize f32:$B))),
+            (f32 FpMinMax.F32Min)>;
+  // F32 Max.
+  def : Pat<(f32 (fmaxnum_ieee f32:$A, f32:$B)),
+            (f32 FpMinMax.F32Max)>;
+  def : Pat<(f32 (fmaxnum_ieee (fcanonicalize f32:$A), f32:$B)),
+            (f32 FpMinMax.F32Max)>;
+  def : Pat<(f32 (fmaxnum_ieee f32:$A, (fcanonicalize f32:$B))),
+            (f32 FpMinMax.F32Max)>;
+  def : Pat<(f32 (fmaxnum_ieee (fcanonicalize f32:$A), (fcanonicalize f32:$B))),
+            (f32 FpMinMax.F32Max)>;
+
+  // f64 Min.
+  def : Pat<(f64 (fminnum_ieee f64:$A, f64:$B)),
+            (f64 (XSMINDP $A, $B))>;
+  def : Pat<(f64 (fminnum_ieee (fcanonicalize f64:$A), f64:$B)),
+            (f64 (XSMINDP $A, $B))>;
+  def : Pat<(f64 (fminnum_ieee f64:$A, (fcanonicalize f64:$B))),
+            (f64 (XSMINDP $A, $B))>;
+  def : Pat<(f64 (fminnum_ieee (fcanonicalize f64:$A), (fcanonicalize f64:$B))),
+            (f64 (XSMINDP $A, $B))>;
+  // f64 Max.
+  def : Pat<(f64 (fmaxnum_ieee f64:$A, f64:$B)),
+            (f64 (XSMAXDP $A, $B))>;
+  def : Pat<(f64 (fmaxnum_ieee (fcanonicalize f64:$A), f64:$B)),
+            (f64 (XSMAXDP $A, $B))>;
+  def : Pat<(f64 (fmaxnum_ieee f64:$A, (fcanonicalize f64:$B))),
+            (f64 (XSMAXDP $A, $B))>;
+  def : Pat<(f64 (fmaxnum_ieee (fcanonicalize f64:$A), (fcanonicalize f64:$B))),
+            (f64 (XSMAXDP $A, $B))>;
+}
+
 def ScalarLoads {
   dag Li8 = (i32 (extloadi8 xoaddr:$src));
   dag ZELi8 = (i32 (zextloadi8 xoaddr:$src));
   dag ZELi8i64 = (i64 (zextloadi8 xoaddr:$src));
   dag SELi8 = (i32 (sext_inreg (extloadi8 xoaddr:$src), i8));
   dag SELi8i64 = (i64 (sext_inreg (extloadi8 xoaddr:$src), i8));

   dag Li16 = (i32 (extloadi16 xoaddr:$src));
@@ 1,613 lines not shown @@ def XVTSTDCDP : XX2_RD6_DCMX7_RS6<60, 15, 5,
                     (outs vsrc:$XT), (ins u7imm:$DCMX, vsrc:$XB),
                     "xvtstdcdp $XT, $XB, $DCMX", IIC_VecFP,
                     [(set v2i64: $XT,
                      (int_ppc_vsx_xvtstdcdp v2f64:$XB, timm:$DCMX))]>;

   //===--------------------------------------------------------------------===//

   // Maximum/Minimum Type-C/Type-J DP
-  // XT.dword[1] = 0xUUUU_UUUU_UUUU_UUUU, so we use vsrc for XT
-  def XSMAXCDP : XX3_XT5_XA5_XB5<60, 128, "xsmaxcdp", vsrc, vsfrc, vsfrc,
-                                 IIC_VecFP, []>;
+  def XSMAXCDP : XX3_XT5_XA5_XB5<60, 128, "xsmaxcdp", vsfrc, vsfrc, vsfrc,
+                                 IIC_VecFP,
+                                 [(set f64:$XT, (PPCxsmaxc f64:$XA, f64:$XB))]>;
   def XSMAXJDP : XX3_XT5_XA5_XB5<60, 144, "xsmaxjdp", vsrc, vsfrc, vsfrc,
                                  IIC_VecFP, []>;
-  def XSMINCDP : XX3_XT5_XA5_XB5<60, 136, "xsmincdp", vsrc, vsfrc, vsfrc,
-                                 IIC_VecFP, []>;
+  def XSMINCDP : XX3_XT5_XA5_XB5<60, 136, "xsmincdp", vsfrc, vsfrc, vsfrc,
+                                 IIC_VecFP,
+                                 [(set f64:$XT, (PPCxsminc f64:$XA, f64:$XB))]>;
   def XSMINJDP : XX3_XT5_XA5_XB5<60, 152, "xsminjdp", vsrc, vsfrc, vsfrc,
                                  IIC_VecFP, []>;

   //===--------------------------------------------------------------------===//

   // Vector Byte-Reverse H/W/D/Q Word
   def XXBRH : XX2_XT6_XO5_XB6<60, 7, 475, "xxbrh", vsrc, []>;
   def XXBRW : XX2_XT6_XO5_XB6<60, 15, 475, "xxbrw", vsrc, []>;
@@ 790 lines not shown @@
   // Round & Convert QP -> DP/SP
   def : Pat<(f64 (fpround f128:$src)), (f64 (XSCVQPDP $src))>;
   def : Pat<(f32 (fpround f128:$src)), (f32 (XSRSP (XSCVQPDPO $src)))>;

   // Convert SP -> QP
   def : Pat<(f128 (fpextend f32:$src)),
             (f128 (XSCVDPQP (COPY_TO_REGCLASS $src, VFRC)))>;

+  def : Pat<(f32 (PPCxsmaxc f32:$XA, f32:$XB)),
+            (f32 (COPY_TO_REGCLASS (XSMAXCDP (COPY_TO_REGCLASS $XA, VSSRC),
+                                             (COPY_TO_REGCLASS $XB, VSSRC)),
+                                   VSSRC))>;
+  def : Pat<(f32 (PPCxsminc f32:$XA, f32:$XB)),
+            (f32 (COPY_TO_REGCLASS (XSMINCDP (COPY_TO_REGCLASS $XA, VSSRC),
+                                             (COPY_TO_REGCLASS $XB, VSSRC)),
+                                   VSSRC))>;
+
 } // end HasP9Vector, AddedComplexity

 let AddedComplexity = 400 in {
 let Predicates = [IsISA3_0, HasP9Vector, HasDirectMove, IsBigEndian] in {
   def : Pat<(f128 (PPCbuild_fp128 i64:$rB, i64:$rA)),
             (f128 (COPY_TO_REGCLASS (MTVSRDD $rB, $rA), VRRC))>;
 }
 let Predicates = [IsISA3_0, HasP9Vector, HasDirectMove, IsLittleEndian] in {
@@ 585 lines not shown @@

llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll

@@ 30 lines not shown @@ loop_body:
   br i1 %2, label %loop_exit, label %loop_body

 loop_exit:
   ret void
 }

 ; CHECK-LABEL: test1:
 ; CHECK-NOT: mtctr
-; CHECK: bl fminf
-; CHECK-NOT: bl fminf
+; CHECK: xsmindp
+; CHECK-NOT: xsmindp
 ; CHECK-NOT: mtctr
 ; CHECK: blr

 define void @test1v(<4 x float> %f, <4 x float>* %fp) {
 entry:
   br label %loop_body

 loop_body:
   %invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
   %0 = call <4 x float> @llvm.minnum.v4f32(<4 x float> %f, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>)
   store <4 x float> %0, <4 x float>* %fp, align 16
   %1 = add i64 %invar_address.dim.0.01, 1
   %2 = icmp eq i64 %1, 4
   br i1 %2, label %loop_exit, label %loop_body

 loop_exit:
   ret void
 }

 ; CHECK-LABEL: test1v:
 ; CHECK: xvminsp
-; CHECK-NOT: bl fminf
+; CHECK-NOT: xsmindp
 ; CHECK: mtctr
-; CHECK-NOT: bl fminf
+; CHECK-NOT: xsmindp
 ; CHECK: blr

 ; QPX-LABEL: test1v:
 ; QPX: mtctr
 ; QPX-NOT: bl fminf
 ; QPX: blr

 define void @test1a(float %f, float* %fp) {
@@ 9 lines not shown @@ loop_body:
   br i1 %2, label %loop_exit, label %loop_body

 loop_exit:
   ret void
 }

 ; CHECK-LABEL: test1a:
 ; CHECK-NOT: mtctr
-; CHECK: bl fminf
-; CHECK-NOT: bl fminf
+; CHECK: xsmindp
+; CHECK-NOT: xsmindp
 ; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test2(float %f, float* %fp) {		define void @test2(float %f, float* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call float @llvm.maxnum.f32(float %f, float 1.0)		%0 = call float @llvm.maxnum.f32(float %f, float 1.0)
store float %0, float* %fp, align 4		store float %0, float* %fp, align 4
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test2:		; CHECK-LABEL: test2:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmaxf		; CHECK: xsmaxdp
; CHECK-NOT: bl fmaxf		; CHECK-NOT: xsmaxdp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test2v(<4 x double> %f, <4 x double>* %fp) {		define void @test2v(<4 x double> %f, <4 x double>* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call <4 x double> @llvm.maxnum.v4f64(<4 x double> %f, <4 x double> <double 1.0, double 1.0, double 1.0, double 1.0>)		%0 = call <4 x double> @llvm.maxnum.v4f64(<4 x double> %f, <4 x double> <double 1.0, double 1.0, double 1.0, double 1.0>)
store <4 x double> %0, <4 x double>* %fp, align 16		store <4 x double> %0, <4 x double>* %fp, align 16
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 4		%2 = icmp eq i64 %1, 4
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test2v:		; CHECK-LABEL: test2v:
; CHECK: xvmaxdp		; CHECK: xvmaxdp
; CHECK: xvmaxdp		; CHECK: xvmaxdp
; CHECK-NOT: bl fmax		; CHECK-NOT: xsmaxdp
; CHECK: mtctr		; CHECK: mtctr
; CHECK-NOT: bl fmax		; CHECK-NOT: xsmaxdp
; CHECK: blr		; CHECK: blr

; QPX-LABEL: test2v:		; QPX-LABEL: test2v:
; QPX: mtctr		; QPX: mtctr
; QPX-NOT: bl fmax		; QPX-NOT: bl fmax
; QPX: blr		; QPX: blr

define void @test2a(float %f, float* %fp) {		define void @test2a(float %f, float* %fp) {
Show All 9 Lines	loop_body:
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test2a:		; CHECK-LABEL: test2a:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmaxf		; CHECK: xsmaxdp
; CHECK-NOT: bl fmaxf		; CHECK-NOT: xsmaxdp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test3(double %f, double* %fp) {		define void @test3(double %f, double* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call double @llvm.minnum.f64(double %f, double 1.0)		%0 = call double @llvm.minnum.f64(double %f, double 1.0)
store double %0, double* %fp, align 8		store double %0, double* %fp, align 8
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test3:		; CHECK-LABEL: test3:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmin		; CHECK: xsmindp
; CHECK-NOT: bl fmin		; CHECK-NOT: xsmindp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test3a(double %f, double* %fp) {		define void @test3a(double %f, double* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call double @fmin(double %f, double 1.0) readnone		%0 = call double @fmin(double %f, double 1.0) readnone
store double %0, double* %fp, align 8		store double %0, double* %fp, align 8
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test3a:		; CHECK-LABEL: test3a:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmin		; CHECK: xsmindp
; CHECK-NOT: bl fmin		; CHECK-NOT: xsmindp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test4(double %f, double* %fp) {		define void @test4(double %f, double* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call double @llvm.maxnum.f64(double %f, double 1.0)		%0 = call double @llvm.maxnum.f64(double %f, double 1.0)
store double %0, double* %fp, align 8		store double %0, double* %fp, align 8
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test4:		; CHECK-LABEL: test4:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmax		; CHECK: xsmaxdp
; CHECK-NOT: bl fmax		; CHECK-NOT: xsmaxdp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

define void @test4a(double %f, double* %fp) {		define void @test4a(double %f, double* %fp) {
entry:		entry:
br label %loop_body		br label %loop_body

loop_body:		loop_body:
%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]		%invar_address.dim.0.01 = phi i64 [ 0, %entry ], [ %1, %loop_body ]
%0 = call double @fmax(double %f, double 1.0) readnone		%0 = call double @fmax(double %f, double 1.0) readnone
store double %0, double* %fp, align 8		store double %0, double* %fp, align 8
%1 = add i64 %invar_address.dim.0.01, 1		%1 = add i64 %invar_address.dim.0.01, 1
%2 = icmp eq i64 %1, 2		%2 = icmp eq i64 %1, 2
br i1 %2, label %loop_exit, label %loop_body		br i1 %2, label %loop_exit, label %loop_body

loop_exit:		loop_exit:
ret void		ret void
}		}

; CHECK-LABEL: test4a:		; CHECK-LABEL: test4a:
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: bl fmax		; CHECK: xsmaxdp
; CHECK-NOT: bl fmax		; CHECK-NOT: xsmaxdp
; CHECK-NOT: mtctr		; CHECK-NOT: mtctr
; CHECK: blr		; CHECK: blr

llvm/test/CodeGen/PowerPC/scalar-min-max.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mcpu=pwr8 -ppc-asm-full-reg-names --enable-unsafe-fp-math \
				; RUN: -verify-machineinstrs --enable-no-signed-zeros-fp-math \
				jsji (inline comment, not done): Maybe add `RUN` lines for `--enable-no-nans-fp-math`/`-enable-no-signed-zeros-fp-math`?
				amyk (inline comment, not done): Maybe also add `-verify-machineinstrs` to our tests?
				; RUN: --enable-no-nans-fp-math \
				; RUN: -mtriple=powerpc64le-unknown-unknown < %s | FileCheck %s
				; RUN: llc -mcpu=pwr9 -ppc-asm-full-reg-names --enable-unsafe-fp-math \
				; RUN: -verify-machineinstrs --enable-no-signed-zeros-fp-math \
				; RUN: --enable-no-nans-fp-math \
				jsji (inline comment, not done): It would be great if we could pre-commit the test case so that the diff shows only the difference.
				nemanjai (author, inline comment, done): I will add a RUN line without the FMF flags which will show the difference in codegen in the test case itself.
				; RUN: -mtriple=powerpc64le-unknown-unknown < %s | FileCheck %s
				; RUN: llc -mcpu=pwr9 -ppc-asm-full-reg-names -verify-machineinstrs \
				; RUN: -mtriple=powerpc64le-unknown-unknown < %s | FileCheck %s \
				; RUN: --check-prefix=NO-FAST-P9
				; RUN: llc -mcpu=pwr8 -ppc-asm-full-reg-names -verify-machineinstrs \
				; RUN: -mtriple=powerpc64le-unknown-unknown < %s | FileCheck %s \
				; RUN: --check-prefix=NO-FAST-P8
				define dso_local float @testfmax(float %a, float %b) local_unnamed_addr {
				; CHECK-LABEL: testfmax:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmaxdp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testfmax:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmaxcdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testfmax:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: fcmpu cr0, f1, f2
				; NO-FAST-P8-NEXT: bgtlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp ogt float %a, %b
				%cond = select i1 %cmp, float %a, float %b
				ret float %cond
				}

				define dso_local double @testdmax(double %a, double %b) local_unnamed_addr {
				; CHECK-LABEL: testdmax:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmaxdp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testdmax:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmaxcdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				jsji (inline comment, not done): Why do we need these attributes? It looks like these should be in different `RUN` lines.
				;
				; NO-FAST-P8-LABEL: testdmax:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: xscmpudp cr0, f1, f2
				; NO-FAST-P8-NEXT: bgtlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp ogt double %a, %b
				%cond = select i1 %cmp, double %a, double %b
				ret double %cond
				}

				define dso_local float @testfmin(float %a, float %b) local_unnamed_addr {
				; CHECK-LABEL: testfmin:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmindp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testfmin:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmincdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testfmin:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: fcmpu cr0, f1, f2
				; NO-FAST-P8-NEXT: bltlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp olt float %a, %b
				%cond = select i1 %cmp, float %a, float %b
				ret float %cond
				}

				define dso_local double @testdmin(double %a, double %b) local_unnamed_addr {
				; CHECK-LABEL: testdmin:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmindp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testdmin:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmincdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testdmin:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: xscmpudp cr0, f1, f2
				; NO-FAST-P8-NEXT: bltlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp olt double %a, %b
				%cond = select i1 %cmp, double %a, double %b
				ret double %cond
				}

				define dso_local float @testfmax_fast(float %a, float %b) local_unnamed_addr {
				; CHECK-LABEL: testfmax_fast:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmaxdp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testfmax_fast:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmaxcdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testfmax_fast:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: fcmpu cr0, f1, f2
				; NO-FAST-P8-NEXT: bgtlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp fast ogt float %a, %b
				%cond = select i1 %cmp, float %a, float %b
				ret float %cond
				}
				define dso_local double @testdmax_fast(double %a, double %b) local_unnamed_addr {
				; CHECK-LABEL: testdmax_fast:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmaxdp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testdmax_fast:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmaxcdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testdmax_fast:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: xscmpudp cr0, f1, f2
				; NO-FAST-P8-NEXT: bgtlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp fast ogt double %a, %b
				%cond = select i1 %cmp, double %a, double %b
				ret double %cond
				}
				define dso_local float @testfmin_fast(float %a, float %b) local_unnamed_addr {
				; CHECK-LABEL: testfmin_fast:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmindp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testfmin_fast:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmincdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testfmin_fast:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: fcmpu cr0, f1, f2
				; NO-FAST-P8-NEXT: bltlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp fast olt float %a, %b
				%cond = select i1 %cmp, float %a, float %b
				ret float %cond
				}
				define dso_local double @testdmin_fast(double %a, double %b) local_unnamed_addr {
				; CHECK-LABEL: testdmin_fast:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xsmindp f1, f1, f2
				; CHECK-NEXT: blr
				;
				; NO-FAST-P9-LABEL: testdmin_fast:
				; NO-FAST-P9: # %bb.0: # %entry
				; NO-FAST-P9-NEXT: xsmincdp f1, f1, f2
				; NO-FAST-P9-NEXT: blr
				;
				; NO-FAST-P8-LABEL: testdmin_fast:
				; NO-FAST-P8: # %bb.0: # %entry
				; NO-FAST-P8-NEXT: xscmpudp cr0, f1, f2
				; NO-FAST-P8-NEXT: bltlr cr0
				; NO-FAST-P8-NEXT: # %bb.1: # %entry
				; NO-FAST-P8-NEXT: fmr f1, f2
				; NO-FAST-P8-NEXT: blr
				entry:
				%cmp = fcmp fast olt double %a, %b
				%cond = select i1 %cmp, double %a, double %b
				ret double %cond
				}
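
As a reading aid for the tests above: the IR being matched is an ordered compare followed by a select, which is not the same operation as C's `fmax`/`fmin` when an operand is NaN. The C sketch below illustrates that difference only. The helper names `select_max`, `select_min`, and `nan_order_matters` are hypothetical and are not part of the patch.

```c
#include <math.h>

/* Hypothetical helper mirroring the IR in testdmax:
 *   %cmp  = fcmp ogt double %a, %b
 *   %cond = select i1 %cmp, double %a, double %b
 * An ordered compare is false when either operand is NaN,
 * so the second operand is returned in that case. */
double select_max(double a, double b) { return a > b ? a : b; }

/* Hypothetical helper mirroring testdmin (fcmp olt + select). */
double select_min(double a, double b) { return a < b ? a : b; }

/* Returns 1 when swapping the NaN operand changes the result,
 * i.e. the compare/select form is sensitive to operand order. */
int nan_order_matters(void) {
  double nan_first = select_max(NAN, 1.0); /* compare is false, yields b = 1.0 */
  double nan_second = select_max(1.0, NAN); /* compare is false, yields b = NaN */
  return nan_first == 1.0 && isnan(nan_second);
}
```

By contrast, C's `fmax` returns the non-NaN operand when exactly one argument is NaN, regardless of operand order. This is one way to see why NaN and signed-zero assumptions matter when a compare/select pattern is mapped onto a min/max instruction.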

Revision Contents

Diff 226825

llvm/lib/Target/PowerPC/PPCISelLowering.h

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

llvm/lib/Target/PowerPC/PPCInstrInfo.td

llvm/lib/Target/PowerPC/PPCInstrVSX.td

llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll

llvm/test/CodeGen/PowerPC/scalar-min-max.ll
