This is an archive of the discontinued LLVM Phabricator instance.

Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control
ClosedPublic

Authored by mcberg2017 on Jul 23 2019, 3:56 PM.

Details

Reviewers

spatel
arsenm
hfinkel
wristow
craig.topper

Commits

rG005d705d4392: Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize…
rL367486: Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize…

Summary

Honoring no signed zeros is also available as a separate user control through clang, regardless of fast-math or UnsafeFPMath context. The DAG guards should reflect this.
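
The shape of the guard change can be sketched as a small standalone model. The struct and function names below are hypothetical stand-ins, not LLVM's actual option or flag classes; only the form of the old and new conditions is taken from this revision.

```cpp
#include <cassert>

// Hypothetical stand-in for the global (command-line / function-level) options.
struct GlobalOpts {
  bool UnsafeFPMath = false;
  bool NoSignedZerosFPMath = false;
};

// Hypothetical stand-in for the per-node fast-math flags.
struct NodeFlags {
  bool NoSignedZeros = false;
};

// Old guard: signed zeros may be ignored under UnsafeFPMath or a node-level nsz flag.
bool oldGuard(const GlobalOpts &O, const NodeFlags &F) {
  return O.UnsafeFPMath || F.NoSignedZeros;
}

// New guard: the dedicated no-signed-zeros option is consulted instead.
bool newGuard(const GlobalOpts &O, const NodeFlags &F) {
  return O.NoSignedZerosFPMath || F.NoSignedZeros;
}

// A user who requests only "no signed zeros" is honored by the new guard.
void selfCheck() {
  GlobalOpts O;
  O.NoSignedZerosFPMath = true;
  NodeFlags F;
  assert(!oldGuard(O, F));
  assert(newGuard(O, F));
}
```

In this model, setting only the no-signed-zeros option enables the fold under the new guard, while the old guard would have required the broader unsafe-math setting.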

Diff Detail

Repository: rL LLVM

Event Timeline

mcberg2017 created this revision. Jul 23 2019, 3:56 PM

Herald added subscribers: jsji, MaskRay, javed.absar and 4 others. Jul 23 2019, 3:56 PM

mcberg2017 marked an inline comment as done. Jul 23 2019, 3:59 PM

mcberg2017 added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12059 ↗	(On Diff #211364)	In this case, UnsafeFPMath holds reassociation context.

Herald added a subscriber: wuzish. Jul 23 2019, 4:00 PM

nhaehnle removed a subscriber: nhaehnle. Jul 24 2019, 12:15 AM

Updated with one case...

Code changes look good. See inline for some comments about the tests.

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll
3 ↗	(On Diff #211588)	Here and other test files: is it possible to remove "-enable-unsafe-fp-math" and still get the same test results? If we can tighten up the constraints, that would help move us away from the global requirements. Another option would be to specify the function-level attribute only on the tests that still require more than 'nsz' to produce the expected test results.
test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	I don't know how to interpret this test diff: regression, improvement, or does this test no longer accomplish its original intent?

spatel added inline comments. Jul 27 2019, 6:04 AM

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll
3 ↗	(On Diff #211588)	Still another option (and moves us closer still to the goal of IR/node-level flags only): can we remove the global settings entirely, add FMF to the IR in these tests, and maintain their intent?

mcberg2017 marked 2 inline comments as done. Jul 29 2019, 9:47 AM

mcberg2017 added inline comments.

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll
3 ↗	(On Diff #211588)	We need the new flag or attribute to keep the results the same. I will look over the tests and see where (hopefully all) we can use attributes in place of the flags.
test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	Yes, we no longer optimize this case. Should I just remove the gcn-nsz-dag context for fneg_fadd_0?

spatel added subscribers: nhaehnle, foad, rampitec. Jul 29 2019, 10:07 AM

spatel added inline comments.

test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	Someone with AMDGPU knowledge should answer that. cc'ing @arsenm @nhaehnle @foad @rampitec

rampitec added inline comments. Jul 29 2019, 10:12 AM

test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	This is clearly a regression.

mcberg2017 marked an inline comment as done. Jul 29 2019, 10:31 AM

mcberg2017 added inline comments.

test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	I will debug this case; it looks like there's an additional dependency with the new code shape wrt fadd.

The new code is actually better by 1 instruction, we just never completed the full match on the test. In the old path we had -enable-no-signed-zeros-fp-math on but no way to reach it for the zero fold of the fadd via llc as the flags are all user controlled. This should not be a regression.

test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)

with the change of

// N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
if (N1C && N1C->isZero())
  if (N1C->isNegative() || Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())
    return N0;

we get this DAG:

SelectionDAG has 21 nodes:

t0: ch = EntryToken
t2: f32,ch = CopyFromReg t0, Register:f32 %0
          t27: f32 = fneg t28
        t13: i1 = setcc t27, t2, setuge:ch
      t15: f32 = select t13, t2, t28
    t17: i1 = setcc t15, ConstantFP:f32<0.000000e+00>, setule:ch
  t19: f32 = select t17, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<nan>
t21: ch,glue = CopyToReg t0, Register:f32 $vgpr0, t19
    t4: f32,ch = CopyFromReg t0, Register:f32 %1
  t7: f32 = fdiv ConstantFP:f32<1.000000e+00>, t4
t28: f32 = fmul nnan arcp contract reassoc t7, ConstantFP:f32<-0.000000e+00>
t22: ch = RETURN_TO_EPILOG t21, Register:f32 $vgpr0, t21:1

for which we produce this assembler:

fneg_fadd_0: ; @fneg_fadd_0
; %bb.0: ; %.entry

v_rcp_f32_e32 v0, s1
v_mov_b32_e32 v1, s0
v_mov_b32_e32 v2, 0x7fc00000
v_mul_f32_e32 v0, 0x80000000, v0
v_cmp_nlt_f32_e64 vcc, -v0, s0
v_cndmask_b32_e32 v0, v0, v1, vcc
v_cmp_nlt_f32_e32 vcc, 0, v0
v_cndmask_b32_e64 v0, v2, 0, vcc

with the code as it currently stands (unmodified):

// N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
if (N1C && N1C->isZero())
  if (N1C->isNegative() || Options.UnsafeFPMath || Flags.hasNoSignedZeros())
    return N0;

we get this DAG:

SelectionDAG has 21 nodes:

t0: ch = EntryToken
t2: f32,ch = CopyFromReg t0, Register:f32 %0
      t4: f32,ch = CopyFromReg t0, Register:f32 %1
    t7: f32 = fdiv ConstantFP:f32<1.000000e+00>, t4
  t9: f32 = fmul t7, ConstantFP:f32<0.000000e+00>
t11: f32 = fadd t9, ConstantFP:f32<0.000000e+00>
        t13: i1 = setcc t11, t2, setuge:ch
        t14: f32 = fneg t11
      t15: f32 = select t13, t2, t14
    t17: i1 = setcc t15, ConstantFP:f32<0.000000e+00>, setule:ch
  t19: f32 = select t17, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<nan>
t21: ch,glue = CopyToReg t0, Register:f32 $vgpr0, t19
t22: ch = RETURN_TO_EPILOG t21, Register:f32 $vgpr0, t21:1

for which we fold a fused multiply-add and produce this assembler:

fneg_fadd_0: ; @fneg_fadd_0
; %bb.0: ; %.entry

v_rcp_f32_e32 v0, s1
v_bfrev_b32_e32 v1, 1
v_mov_b32_e32 v2, s0
v_mac_f32_e32 v1, v0, v1
v_cmp_nlt_f32_e64 vcc, -v1, s0
v_cndmask_b32_e32 v0, v1, v2, vcc
v_mov_b32_e32 v1, 0x7fc00000
v_cmp_nlt_f32_e32 vcc, 0, v0
v_cndmask_b32_e64 v0, v1, 0, vcc
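
The reason the fold quoted above is unconditional for -0.0 but needs a no-signed-zeros guarantee for +0.0 can be checked with ordinary IEEE-754 arithmetic. The following is a minimal sketch, assuming the default round-to-nearest mode and a compiler not running in fast-math mode; the helper names are made up for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <cmath>

// True when a and b compare equal and carry the same sign bit,
// which is what distinguishes -0.0 from +0.0.
bool sameValueAndSign(double a, double b) {
  return a == b && std::signbit(a) == std::signbit(b);
}

// Is the rewrite "x + c --> x" exact for this particular x?
bool addZeroFoldIsExact(double x, double c) {
  volatile double sum = x + c; // volatile discourages compile-time folding
  return sameValueAndSign(sum, x);
}

void selfCheck() {
  // Adding -0.0 never changes the value or the sign of x.
  assert(addZeroFoldIsExact(1.5, -0.0));
  assert(addZeroFoldIsExact(0.0, -0.0));
  assert(addZeroFoldIsExact(-0.0, -0.0));
  // Adding +0.0 turns -0.0 into +0.0, so the fold needs nsz.
  assert(!addZeroFoldIsExact(-0.0, 0.0));
}
```

The only input for which `x + 0.0` differs from `x` is `x = -0.0`, which is exactly the case the nsz flag or option lets the combiner ignore.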

rampitec added inline comments. Jul 29 2019, 12:51 PM

test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	I am trying to understand what the existing ISA does, and it is:

v_rcp_f32_e32 v0, s1      ; v0 = 1 / s1
v_bfrev_b32_e32 v1, 1     ; v1 = 0x80000000 = -0.0
v_mov_b32_e32 v2, s0      ; v2 = s0
v_mac_f32_e32 v1, v0, v1  ; v1 = v1 * v0 + v1 = v0 * -0.0 - 0.0 = 0

Instead of that fancy mac instruction, that seems to be all now folded into:

v_mul_f32_e32 v0, 0x80000000, v0  ; v0 = v0 * -0.0

I.e. it is hardly practically performance-relevant code. The comment above says it used to assert, so I guess this is just a regression test. Given no signed zeros this is as good as just v0 = 0, but that's a different matter. I have no objection to this test change.
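
The bit-level claim in this walkthrough, that reversing the bits of 1 yields the pattern for -0.0, can be reproduced on a host machine. This is a sketch assuming `float` is IEEE-754 binary32; `reverseBits32` and `bitsToFloat` are illustrative helpers, not AMDGPU or LLVM APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reverse the order of the 32 bits in v (a host-side model of a bit-reverse).
uint32_t reverseBits32(uint32_t v) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (v & 1u);
    v >>= 1;
  }
  return r;
}

// Reinterpret a 32-bit pattern as a float without violating aliasing rules.
float bitsToFloat(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

void selfCheck() {
  // Bit-reversing 1 sets only the sign bit.
  assert(reverseBits32(1u) == 0x80000000u);
  // That pattern is negative zero: equal to 0.0f, but with the sign bit set.
  float negZero = bitsToFloat(0x80000000u);
  assert(negZero == 0.0f);
  assert(std::signbit(negZero));
}
```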

Note: test/CodeGen/AMDGPU/fneg-combines.ll needs rearchitecting, so I left it in options-flag form. test/CodeGen/PowerPC/fmf-propagation.ll has portions that can be removed once we stop using the options flags, so I am leaving it in its current form, with mods, until that happens. test/CodeGen/X86/fp-fast.ll uses a subset of fmf that equates to the options flags that were used (let me know if you want to generalize to fast or use a smaller subset). All the others use either fast or context-relevant fmf and have been converted to not use options flags. Have a look and see what needs editing; currently this all passes testing.

spatel added inline comments. Jul 31 2019, 3:44 AM

test/CodeGen/AArch64/fadd-combines.ll
150–151 ↗	(On Diff #212412)	This comment is incorrect now. MachineCombiner was relying on the function attribute?
test/CodeGen/PowerPC/fma-mutate.ll
6 ↗	(On Diff #212412)	Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be.
13–16 ↗	(On Diff #212412)	Would it be better to auto-generate the complete output for these tests using utils/update_llc_test_checks.py?
test/CodeGen/PowerPC/qpx-recipest.ll
1 ↗	(On Diff #212412)	Same comment as above: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be.
test/CodeGen/PowerPC/recipest.ll
1 ↗	(On Diff #212412)	Same comment as above: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be.
test/CodeGen/X86/dagcombine-unsafe-math.ll
64 ↗	(On Diff #212412)	If 'contract' is required, that is unnecessary? Mark with a 'FIXME' note?
test/CodeGen/X86/fp-fast.ll
8–9 ↗	(On Diff #212412)	Same as earlier comment: If 'contract' is required, that is unnecessary? Mark with a 'FIXME' note?
test/CodeGen/X86/fp-fold.ll
1 ↗	(On Diff #212412)	Same comment as earlier: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be. Definitely use the auto-generation script when possible for x86 tests.

Added a TODO comment for machine combines in test/CodeGen/AArch64/fadd-combines.ll. test/CodeGen/PowerPC/fma-mutate.ll, test/CodeGen/PowerPC/qpx-recipest.ll, test/CodeGen/PowerPC/recipest.ll, and test/CodeGen/X86/fp-fold.ll now have just standard CHECK lines. The fmf contract flag was removed from test/CodeGen/X86/dagcombine-unsafe-math.ll and test/CodeGen/X86/fp-fast.ll.

LGTM - thanks for the test file updates. See inline comment about the AArch64 test.

test/CodeGen/AArch64/fadd-combines.ll
150–152 ↗	(On Diff #212621)	That comment seems wrong from the start. This form has better throughput than 3 chained adds. Ideally, this would be FMA?

2.0 * x + 101.0

The '17' variable names in the check lines are wrong too:

1109917696 --> 0x42280000 --> 42.0
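
The decoding of the `mov` immediates in the AArch64 check lines can be verified with a short host-side sketch. It assumes `float` is IEEE-754 binary32; `bitsToFloat` and `chainedForm` are illustrative helpers, not part of the test or of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit integer immediate as the float it encodes.
float bitsToFloat(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// The test IR computes a1 = x + 42, a2 = a1 + 17, a3 = a1 + a2.
float chainedForm(float x) {
  float a1 = x + 42.0f;
  float a2 = a1 + 17.0f;
  return a1 + a2;
}

void selfCheck() {
  // The immediate bound to the "W17" variable actually encodes 42.0.
  assert(bitsToFloat(1109917696u) == 42.0f);
  // The immediate bound to "W59" encodes 59.0 (42.0 + 17.0).
  assert(bitsToFloat(1114374144u) == 59.0f);
  // For a small exactly representable x, the chain equals 2.0 * x + 101.0.
  assert(chainedForm(1.0f) == 2.0f * 1.0f + 101.0f);
}
```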

This revision is now accepted and ready to land. Jul 31 2019, 1:22 PM

Closed by commit rL367486: Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize… (authored by mcberg2017). Jul 31 2019, 3:01 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Jul 31 2019, 3:01 PM

Herald added a subscriber: llvm-commits.

Revision Contents

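Path	Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	12 lines
llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp	2 lines
llvm/trunk/test/CodeGen/AArch64/fadd-combines.ll	25 lines
llvm/trunk/test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll	26 lines
llvm/trunk/test/CodeGen/AMDGPU/ffloor.f64.ll	28 lines
llvm/trunk/test/CodeGen/AMDGPU/fneg-combines.ll	7 lines
llvm/trunk/test/CodeGen/PowerPC/ (four files, names not preserved)	19 lines, 2 lines, 212 lines, 277 lines
llvm/trunk/test/CodeGen/X86/dagcombine-unsafe-math.ll	7 lines
llvm/trunk/test/CodeGen/X86/fmul-combines.ll	54 lines
llvm/trunk/test/CodeGen/X86/fp-fast.ll	76 lines
llvm/trunk/test/CodeGen/X86/fp-fold.ll	244 lines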

Diff 212674

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show First 20 Lines • Show All 833 Lines • ▼ Show 20 Lines	if (TLI.isOperationLegal(ISD::ConstantFP, VT) &&
return 1;		return 1;
return llvm::all_of(Op->op_values(), [&](SDValue N) {		return llvm::all_of(Op->op_values(), [&](SDValue N) {
return N.isUndef() ||		return N.isUndef() ||
TLI.isFPImmLegal(neg(cast<ConstantFPSDNode>(N)->getValueAPF()), VT,		TLI.isFPImmLegal(neg(cast<ConstantFPSDNode>(N)->getValueAPF()), VT,
ForCodeSize);		ForCodeSize);
});		});
}		}
case ISD::FADD:		case ISD::FADD:
if (!Options->UnsafeFPMath && !Flags.hasNoSignedZeros())		if (!Options->NoSignedZerosFPMath && !Flags.hasNoSignedZeros())
return 0;		return 0;

// After operation legalization, it might not be legal to create new FSUBs.		// After operation legalization, it might not be legal to create new FSUBs.
if (LegalOperations && !TLI.isOperationLegalOrCustom(ISD::FSUB, VT))		if (LegalOperations && !TLI.isOperationLegalOrCustom(ISD::FSUB, VT))
return 0;		return 0;

// fold (fneg (fadd A, B)) -> (fsub (fneg A), B)		// fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
if (char V = isNegatibleForFree(Op.getOperand(0), LegalOperations, TLI,		if (char V = isNegatibleForFree(Op.getOperand(0), LegalOperations, TLI,
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	for (SDValue C : Op->op_values()) {
}		}
APFloat V = cast<ConstantFPSDNode>(C)->getValueAPF();		APFloat V = cast<ConstantFPSDNode>(C)->getValueAPF();
V.changeSign();		V.changeSign();
Ops.push_back(DAG.getConstantFP(V, SDLoc(Op), C.getValueType()));		Ops.push_back(DAG.getConstantFP(V, SDLoc(Op), C.getValueType()));
}		}
return DAG.getBuildVector(Op.getValueType(), SDLoc(Op), Ops);		return DAG.getBuildVector(Op.getValueType(), SDLoc(Op), Ops);
}		}
case ISD::FADD:		case ISD::FADD:
assert(Options.UnsafeFPMath || Flags.hasNoSignedZeros());		assert(Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros());

// fold (fneg (fadd A, B)) -> (fsub (fneg A), B)		// fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
if (isNegatibleForFree(Op.getOperand(0), LegalOperations,		if (isNegatibleForFree(Op.getOperand(0), LegalOperations,
DAG.getTargetLoweringInfo(), &Options, ForCodeSize,		DAG.getTargetLoweringInfo(), &Options, ForCodeSize,
Depth + 1))		Depth + 1))
return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),		return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
GetNegatedExpression(Op.getOperand(0), DAG,		GetNegatedExpression(Op.getOperand(0), DAG,
LegalOperations, ForCodeSize,		LegalOperations, ForCodeSize,
▲ Show 20 Lines • Show All 11,088 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visitFADD(SDNode *N) {

// canonicalize constant to RHS		// canonicalize constant to RHS
if (N0CFP && !N1CFP)		if (N0CFP && !N1CFP)
return DAG.getNode(ISD::FADD, DL, VT, N1, N0, Flags);		return DAG.getNode(ISD::FADD, DL, VT, N1, N0, Flags);

// N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)		// N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);		ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
if (N1C && N1C->isZero())		if (N1C && N1C->isZero())
if (N1C->isNegative() || Options.UnsafeFPMath || Flags.hasNoSignedZeros())		if (N1C->isNegative() || Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())
return N0;		return N0;

if (SDValue NewSel = foldBinOpIntoSelect(N))		if (SDValue NewSel = foldBinOpIntoSelect(N))
return NewSel;		return NewSel;

// fold (fadd A, (fneg B)) -> (fsub A, B)		// fold (fadd A, (fneg B)) -> (fsub A, B)
if ((!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FSUB, VT)) &&		if ((!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FSUB, VT)) &&
isNegatibleForFree(N1, LegalOperations, TLI, &Options, ForCodeSize) == 2)		isNegatibleForFree(N1, LegalOperations, TLI, &Options, ForCodeSize) == 2)
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	if ((Options.NoNaNsFPMath || Flags.hasNoNaNs()) && AllowNewConst) {
// If allowed, fold (fadd x, (fneg x)) -> 0.0		// If allowed, fold (fadd x, (fneg x)) -> 0.0
if (N1.getOpcode() == ISD::FNEG && N1.getOperand(0) == N0)		if (N1.getOpcode() == ISD::FNEG && N1.getOperand(0) == N0)
return DAG.getConstantFP(0.0, DL, VT);		return DAG.getConstantFP(0.0, DL, VT);
}		}

// If 'unsafe math' or reassoc and nsz, fold lots of things.		// If 'unsafe math' or reassoc and nsz, fold lots of things.
// TODO: break out portions of the transformations below for which Unsafe is		// TODO: break out portions of the transformations below for which Unsafe is
// considered and which do not require both nsz and reassoc		// considered and which do not require both nsz and reassoc
if ((Options.UnsafeFPMath ||		if (((Options.UnsafeFPMath && Options.NoSignedZerosFPMath) ||
(Flags.hasAllowReassociation() && Flags.hasNoSignedZeros())) &&		(Flags.hasAllowReassociation() && Flags.hasNoSignedZeros())) &&
AllowNewConst) {		AllowNewConst) {
// fadd (fadd x, c1), c2 -> fadd x, c1 + c2		// fadd (fadd x, c1), c2 -> fadd x, c1 + c2
if (N1CFP && N0.getOpcode() == ISD::FADD &&		if (N1CFP && N0.getOpcode() == ISD::FADD &&
isConstantFPBuildVectorOrConstantFP(N0.getOperand(1))) {		isConstantFPBuildVectorOrConstantFP(N0.getOperand(1))) {
SDValue NewC = DAG.getNode(ISD::FADD, DL, VT, N0.getOperand(1), N1, Flags);		SDValue NewC = DAG.getNode(ISD::FADD, DL, VT, N0.getOperand(1), N1, Flags);
return DAG.getNode(ISD::FADD, DL, VT, N0.getOperand(0), NewC, Flags);		return DAG.getNode(ISD::FADD, DL, VT, N0.getOperand(0), NewC, Flags);
}		}
▲ Show 20 Lines • Show All 102 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visitFSUB(SDNode *N) {
if (N0CFP && N1CFP)		if (N0CFP && N1CFP)
return DAG.getNode(ISD::FSUB, DL, VT, N0, N1, Flags);		return DAG.getNode(ISD::FSUB, DL, VT, N0, N1, Flags);

if (SDValue NewSel = foldBinOpIntoSelect(N))		if (SDValue NewSel = foldBinOpIntoSelect(N))
return NewSel;		return NewSel;

// (fsub A, 0) -> A		// (fsub A, 0) -> A
if (N1CFP && N1CFP->isZero()) {		if (N1CFP && N1CFP->isZero()) {
if (!N1CFP->isNegative() || Options.UnsafeFPMath ||		if (!N1CFP->isNegative() || Options.NoSignedZerosFPMath ||
Flags.hasNoSignedZeros()) {		Flags.hasNoSignedZeros()) {
return N0;		return N0;
}		}
}		}

if (N0 == N1) {		if (N0 == N1) {
// (fsub x, x) -> 0.0		// (fsub x, x) -> 0.0
if (Options.NoNaNsFPMath || Flags.hasNoNaNs())		if (Options.NoNaNsFPMath || Flags.hasNoNaNs())
Show All 10 Lines	if (N0CFP->isNegative() ||
(Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())) {		(Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())) {
if (isNegatibleForFree(N1, LegalOperations, TLI, &Options, ForCodeSize))		if (isNegatibleForFree(N1, LegalOperations, TLI, &Options, ForCodeSize))
return GetNegatedExpression(N1, DAG, LegalOperations, ForCodeSize);		return GetNegatedExpression(N1, DAG, LegalOperations, ForCodeSize);
if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))		if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);		return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);
}		}
}		}

if ((Options.UnsafeFPMath ||		if (((Options.UnsafeFPMath && Options.NoSignedZerosFPMath) ||
(Flags.hasAllowReassociation() && Flags.hasNoSignedZeros()))		(Flags.hasAllowReassociation() && Flags.hasNoSignedZeros()))
&& N1.getOpcode() == ISD::FADD) {		&& N1.getOpcode() == ISD::FADD) {
// X - (X + Y) -> -Y		// X - (X + Y) -> -Y
if (N0 == N1->getOperand(0))		if (N0 == N1->getOperand(0))
return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(1), Flags);		return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(1), Flags);
// X - (Y + X) -> -Y		// X - (Y + X) -> -Y
if (N0 == N1->getOperand(1))		if (N0 == N1->getOperand(1))
return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(0), Flags);		return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(0), Flags);
▲ Show 20 Lines • Show All 8,650 Lines • Show Last 20 Lines
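
The `X - (X + Y) -> -Y` fold in the hunk above stays guarded by both reassociation and no-signed-zeros, and plain IEEE-754 arithmetic shows why each is needed. This is a minimal sketch assuming double precision with round-to-nearest and no fast-math compilation; `subOfAdd` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>

// Evaluate x - (x + y) exactly as written, with the intermediate sum rounded.
double subOfAdd(double x, double y) {
  volatile double sum = x + y; // volatile keeps the rounded intermediate
  return x - sum;
}

void selfCheck() {
  // Rounding: 1e20 + 1.0 rounds back to 1e20, so the result is 0.0, not -1.0.
  assert(subOfAdd(1e20, 1.0) == 0.0);
  // Signed zero: 1.0 - (1.0 + 0.0) is +0.0, while -Y would be -0.0.
  assert(!std::signbit(subOfAdd(1.0, 0.0)));
  assert(std::signbit(-0.0));
  // When nothing is lost to rounding, the two forms agree.
  assert(subOfAdd(1.0, 2.0) == -2.0);
}
```

The first assertion is the reassociation hazard and the second is the signed-zero hazard, which matches the pairing of `reassoc` with `nsz` in the node-flag form of the guard.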

llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Show First 20 Lines • Show All 4,624 Lines • ▼ Show 20 Lines	if (OpOpcode == ISD::EXTRACT_VECTOR_ELT &&
return Operand.getOperand(0);		return Operand.getOperand(0);
break;		break;
case ISD::FNEG:		case ISD::FNEG:
// Negation of an unknown bag of bits is still completely undefined.		// Negation of an unknown bag of bits is still completely undefined.
if (OpOpcode == ISD::UNDEF)		if (OpOpcode == ISD::UNDEF)
return getUNDEF(VT);		return getUNDEF(VT);

// -(X-Y) -> (Y-X) is unsafe because when X==Y, -0.0 != +0.0		// -(X-Y) -> (Y-X) is unsafe because when X==Y, -0.0 != +0.0
if ((getTarget().Options.UnsafeFPMath || Flags.hasNoSignedZeros()) &&		if ((getTarget().Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros()) &&
OpOpcode == ISD::FSUB)		OpOpcode == ISD::FSUB)
return getNode(ISD::FSUB, DL, VT, Operand.getOperand(1),		return getNode(ISD::FSUB, DL, VT, Operand.getOperand(1),
Operand.getOperand(0), Flags);		Operand.getOperand(0), Flags);
if (OpOpcode == ISD::FNEG) // --X -> X		if (OpOpcode == ISD::FNEG) // --X -> X
return Operand.getOperand(0);		return Operand.getOperand(0);
break;		break;
case ISD::FABS:		case ISD::FABS:
if (OpOpcode == ISD::FNEG) // abs(-X) -> abs(X)		if (OpOpcode == ISD::FNEG) // abs(-X) -> abs(X)
▲ Show 20 Lines • Show All 4,963 Lines • Show Last 20 Lines
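
The source comment in this hunk, that `-(X-Y) -> (Y-X)` is unsafe when `X == Y`, can be observed directly. The sketch below assumes IEEE-754 doubles with round-to-nearest and no fast-math compilation; both helper names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// The original expression: negate the difference.
double negOfSub(double x, double y) {
  volatile double d = x - y; // volatile discourages compile-time folding
  return -d;
}

// The rewritten expression: subtract in the opposite order.
double swappedSub(double x, double y) {
  volatile double d = y - x;
  return d;
}

void selfCheck() {
  // For x == y the difference is +0.0, so negating gives -0.0 ...
  assert(std::signbit(negOfSub(1.0, 1.0)));
  // ... but the swapped subtraction gives +0.0.
  assert(!std::signbit(swappedSub(1.0, 1.0)));
  // For distinct operands the two forms agree.
  assert(negOfSub(3.0, 1.0) == swappedSub(3.0, 1.0));
}
```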

llvm/trunk/test/CodeGen/AArch64/fadd-combines.ll

	Show First 20 Lines • Show All 140 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: fadd s0, [[TMP1]], [[TMP2]]			; CHECK-NEXT: fadd s0, [[TMP1]], [[TMP2]]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%a1 = fadd float %x, 42.0			%a1 = fadd float %x, 42.0
	%a2 = fadd nsz reassoc float %a1, 17.0			%a2 = fadd nsz reassoc float %a1, 17.0
	%a3 = fadd float %a1, %a2			%a3 = fadd float %a1, %a2
	ret float %a3			ret float %a3
	}			}

	; DAGCombiner transforms this into: (x + 59.0) + (x + 17.0).			; DAGCombiner transforms this into: (x + 17.0) + (x + 59.0).
	; The machine combiner transforms this into a chain of 3 dependent adds:			define float @fadd_const_multiuse_attr(float %x) {
	; ((x + 59.0) + 17.0) + x

	define float @fadd_const_multiuse_attr(float %x) #0 {
	; CHECK-LABEL: fadd_const_multiuse_attr:			; CHECK-LABEL: fadd_const_multiuse_attr:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-DAG: mov [[W59:w[0-9]+]], #1114374144
	; CHECK-DAG: mov [[W17:w[0-9]+]], #1109917696			; CHECK-DAG: mov [[W17:w[0-9]+]], #1109917696
	; CHECK-NEXT: fmov [[FP59:s[0-9]+]], [[W59]]			; CHECK-DAG: mov [[W59:w[0-9]+]], #1114374144
	; CHECK-NEXT: fmov [[FP17:s[0-9]+]], [[W17]]			; CHECK-NEXT: fmov [[FP17:s[0-9]+]], [[W17]]
	; CHECK-NEXT: fadd [[TMP1:s[0-9]+]], s0, [[FP59]]			; CHECK-NEXT: fmov [[FP59:s[0-9]+]], [[W59]]
	; CHECK-NEXT: fadd [[TMP2:s[0-9]+]], [[FP17]], [[TMP1]]			; CHECK-NEXT: fadd [[TMP1:s[0-9]+]], s0, [[FP17]]
	; CHECK-NEXT: fadd s0, s0, [[TMP2]]			; CHECK-NEXT: fadd [[TMP2:s[0-9]+]], s0, [[FP59]]
				; CHECK-NEXT: fadd s0, [[TMP1]], [[TMP2]]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%a1 = fadd float %x, 42.0			%a1 = fadd fast float %x, 42.0
	%a2 = fadd float %a1, 17.0			%a2 = fadd fast float %a1, 17.0
	%a3 = fadd float %a1, %a2			%a3 = fadd fast float %a1, %a2
	ret float %a3			ret float %a3
	}			}

	attributes #0 = { "unsafe-fp-math"="true" }

	declare void @use(double)			declare void @use(double)

llvm/trunk/test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll

	; RUN: llc -march=amdgcn -enable-no-signed-zeros-fp-math=0 < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-SAFE %s			; RUN: llc -march=amdgcn < %s | FileCheck --check-prefixes=GCN,GCN-FMF,GCN-SAFE %s
	; RUN: llc -march=amdgcn -enable-no-signed-zeros-fp-math=1 < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-UNSAFE %s
	; RUN: llc -march=amdgcn -enable-unsafe-fp-math < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-UNSAFE %s

	declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone

	; Test that the -enable-no-signed-zeros-fp-math flag works			; Test that the -enable-no-signed-zeros-fp-math flag works

	; GCN-LABEL: {{^}}fneg_fsub_f32:			; GCN-LABEL: {{^}}fneg_fsub_f32_fmf:
	; GCN: v_sub_f32_e32 [[SUB:v[0-9]+]], {{v[0-9]+}}, {{v[0-9]+}}			; GCN: v_sub_f32_e32 [[SUB:v[0-9]+]], {{v[0-9]+}}, {{v[0-9]+}}
	; GCN-SAFE: v_xor_b32_e32 v{{[0-9]+}}, 0x80000000, [[SUB]]			; GCN-FMF-NOT: xor
				define amdgpu_kernel void @fneg_fsub_f32_fmf(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%add = add i32 %tid, 1
				%gep = getelementptr float, float addrspace(1)* %in, i32 %tid
				%b_ptr = getelementptr float, float addrspace(1)* %in, i32 %add
				%a = load float, float addrspace(1)* %gep, align 4
				%b = load float, float addrspace(1)* %b_ptr, align 4
				%result = fsub fast float %a, %b
				%neg.result = fsub fast float -0.0, %result
				store float %neg.result, float addrspace(1)* %out, align 4
				ret void
				}

	; GCN-UNSAFE-NOT: xor			; GCN-LABEL: {{^}}fneg_fsub_f32_safe:
	define amdgpu_kernel void @fneg_fsub_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {			; GCN: v_sub_f32_e32 [[SUB:v[0-9]+]], {{v[0-9]+}}, {{v[0-9]+}}
				; GCN-SAFE: v_xor_b32_e32 v{{[0-9]+}}, 0x80000000, [[SUB]]
				define amdgpu_kernel void @fneg_fsub_f32_safe(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%add = add i32 %tid, 1			%add = add i32 %tid, 1
	%gep = getelementptr float, float addrspace(1)* %in, i32 %tid			%gep = getelementptr float, float addrspace(1)* %in, i32 %tid
	%b_ptr = getelementptr float, float addrspace(1)* %in, i32 %add			%b_ptr = getelementptr float, float addrspace(1)* %in, i32 %add
	%a = load float, float addrspace(1)* %gep, align 4			%a = load float, float addrspace(1)* %gep, align 4
	%b = load float, float addrspace(1)* %b_ptr, align 4			%b = load float, float addrspace(1)* %b_ptr, align 4
	%result = fsub float %a, %b			%result = fsub float %a, %b
	%neg.result = fsub float -0.0, %result			%neg.result = fsub float -0.0, %result
	store float %neg.result, float addrspace(1)* %out, align 4			store float %neg.result, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/trunk/test/CodeGen/AMDGPU/ffloor.f64.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs -enable-unsafe-fp-math < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -enable-unsafe-fp-math < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs -enable-unsafe-fp-math < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s

	declare double @llvm.fabs.f64(double %Val)			declare double @llvm.fabs.f64(double %Val)
	declare double @llvm.floor.f64(double) nounwind readnone			declare double @llvm.floor.f64(double) nounwind readnone
	declare <2 x double> @llvm.floor.v2f64(<2 x double>) nounwind readnone			declare <2 x double> @llvm.floor.v2f64(<2 x double>) nounwind readnone
	declare <3 x double> @llvm.floor.v3f64(<3 x double>) nounwind readnone			declare <3 x double> @llvm.floor.v3f64(<3 x double>) nounwind readnone
	declare <4 x double> @llvm.floor.v4f64(<4 x double>) nounwind readnone			declare <4 x double> @llvm.floor.v4f64(<4 x double>) nounwind readnone
	declare <8 x double> @llvm.floor.v8f64(<8 x double>) nounwind readnone			declare <8 x double> @llvm.floor.v8f64(<8 x double>) nounwind readnone
	declare <16 x double> @llvm.floor.v16f64(<16 x double>) nounwind readnone			declare <16 x double> @llvm.floor.v16f64(<16 x double>) nounwind readnone

	; FUNC-LABEL: {{^}}ffloor_f64:			; FUNC-LABEL: {{^}}ffloor_f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; SI: v_fract_f64_e32			; SI: v_fract_f64_e32
	; SI-DAG: v_min_f64			; SI-DAG: v_min_f64
	; SI-DAG: v_cmp_class_f64_e64 vcc			; SI-DAG: v_cmp_class_f64_e64 vcc
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_add_f64			; SI: v_add_f64
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @ffloor_f64(double addrspace(1)* %out, double %x) {			define amdgpu_kernel void @ffloor_f64(double addrspace(1)* %out, double %x) {
	%y = call double @llvm.floor.f64(double %x) nounwind readnone			%y = call fast double @llvm.floor.f64(double %x) nounwind readnone
	store double %y, double addrspace(1)* %out			store double %y, double addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_f64_neg:			; FUNC-LABEL: {{^}}ffloor_f64_neg:
	; CI: v_floor_f64_e64			; CI: v_floor_f64_e64
	; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT:s[[0-9]+:[0-9]+]]]			; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT:s[[0-9]+:[0-9]+]]]
	; SI-DAG: v_min_f64			; SI-DAG: v_min_f64
	; SI-DAG: v_cmp_class_f64_e64 vcc			; SI-DAG: v_cmp_class_f64_e64 vcc
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT]]			; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @ffloor_f64_neg(double addrspace(1)* %out, double %x) {			define amdgpu_kernel void @ffloor_f64_neg(double addrspace(1)* %out, double %x) {
	%neg = fsub double 0.0, %x			%neg = fsub nsz double 0.0, %x
	%y = call double @llvm.floor.f64(double %neg) nounwind readnone			%y = call fast double @llvm.floor.f64(double %neg) nounwind readnone
	store double %y, double addrspace(1)* %out			store double %y, double addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_f64_neg_abs:			; FUNC-LABEL: {{^}}ffloor_f64_neg_abs:
	; CI: v_floor_f64_e64			; CI: v_floor_f64_e64
	; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -|[[INPUT:s[[0-9]+:[0-9]+]]]|			; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -|[[INPUT:s[[0-9]+:[0-9]+]]]|
	; SI-DAG: v_min_f64			; SI-DAG: v_min_f64
	; SI-DAG: v_cmp_class_f64_e64 vcc			; SI-DAG: v_cmp_class_f64_e64 vcc
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -|[[INPUT]]|			; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -|[[INPUT]]|
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @ffloor_f64_neg_abs(double addrspace(1)* %out, double %x) {			define amdgpu_kernel void @ffloor_f64_neg_abs(double addrspace(1)* %out, double %x) {
	%abs = call double @llvm.fabs.f64(double %x)			%abs = call fast double @llvm.fabs.f64(double %x)
	%neg = fsub double 0.0, %abs			%neg = fsub nsz double 0.0, %abs
	%y = call double @llvm.floor.f64(double %neg) nounwind readnone			%y = call fast double @llvm.floor.f64(double %neg) nounwind readnone
	store double %y, double addrspace(1)* %out			store double %y, double addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v2f64:			; FUNC-LABEL: {{^}}ffloor_v2f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	define amdgpu_kernel void @ffloor_v2f64(<2 x double> addrspace(1)* %out, <2 x double> %x) {			define amdgpu_kernel void @ffloor_v2f64(<2 x double> addrspace(1)* %out, <2 x double> %x) {
	%y = call <2 x double> @llvm.floor.v2f64(<2 x double> %x) nounwind readnone			%y = call fast <2 x double> @llvm.floor.v2f64(<2 x double> %x) nounwind readnone
	store <2 x double> %y, <2 x double> addrspace(1)* %out			store <2 x double> %y, <2 x double> addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v3f64:			; FUNC-LABEL: {{^}}ffloor_v3f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI-NOT: v_floor_f64_e32			; CI-NOT: v_floor_f64_e32
	define amdgpu_kernel void @ffloor_v3f64(<3 x double> addrspace(1)* %out, <3 x double> %x) {			define amdgpu_kernel void @ffloor_v3f64(<3 x double> addrspace(1)* %out, <3 x double> %x) {
	%y = call <3 x double> @llvm.floor.v3f64(<3 x double> %x) nounwind readnone			%y = call fast <3 x double> @llvm.floor.v3f64(<3 x double> %x) nounwind readnone
	store <3 x double> %y, <3 x double> addrspace(1)* %out			store <3 x double> %y, <3 x double> addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v4f64:			; FUNC-LABEL: {{^}}ffloor_v4f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	define amdgpu_kernel void @ffloor_v4f64(<4 x double> addrspace(1)* %out, <4 x double> %x) {			define amdgpu_kernel void @ffloor_v4f64(<4 x double> addrspace(1)* %out, <4 x double> %x) {
	%y = call <4 x double> @llvm.floor.v4f64(<4 x double> %x) nounwind readnone			%y = call fast <4 x double> @llvm.floor.v4f64(<4 x double> %x) nounwind readnone
	store <4 x double> %y, <4 x double> addrspace(1)* %out			store <4 x double> %y, <4 x double> addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v8f64:			; FUNC-LABEL: {{^}}ffloor_v8f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	define amdgpu_kernel void @ffloor_v8f64(<8 x double> addrspace(1)* %out, <8 x double> %x) {			define amdgpu_kernel void @ffloor_v8f64(<8 x double> addrspace(1)* %out, <8 x double> %x) {
	%y = call <8 x double> @llvm.floor.v8f64(<8 x double> %x) nounwind readnone			%y = call fast <8 x double> @llvm.floor.v8f64(<8 x double> %x) nounwind readnone
	store <8 x double> %y, <8 x double> addrspace(1)* %out			store <8 x double> %y, <8 x double> addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v16f64:			; FUNC-LABEL: {{^}}ffloor_v16f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	define amdgpu_kernel void @ffloor_v16f64(<16 x double> addrspace(1)* %out, <16 x double> %x) {			define amdgpu_kernel void @ffloor_v16f64(<16 x double> addrspace(1)* %out, <16 x double> %x) {
	%y = call <16 x double> @llvm.floor.v16f64(<16 x double> %x) nounwind readnone			%y = call fast <16 x double> @llvm.floor.v16f64(<16 x double> %x) nounwind readnone
	store <16 x double> %y, <16 x double> addrspace(1)* %out			store <16 x double> %y, <16 x double> addrspace(1)* %out
	ret void			ret void
	}			}
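A note on the `fsub nsz double 0.0, %abs` line in `ffloor_f64_neg_abs` above: the `nsz` flag matters because `0.0 - x` and `-x` disagree on the sign of zero when `x` is `+0.0`, so folding the subtraction into a negation is only valid when signed zeros may be ignored. The following is a minimal sketch of that arithmetic in Python, assuming the default round-to-nearest mode; the helper names are hypothetical and it does not model the DAG combine itself.

```python
import math

def sign_of(x: float) -> str:
    """Return '+' or '-' for the sign bit of x, including for zeros."""
    return "-" if math.copysign(1.0, x) < 0 else "+"

def fsub_zero(x: float) -> float:
    """Model of `fsub 0.0, x` (hypothetical helper)."""
    return 0.0 - x

def fneg(x: float) -> float:
    """Model of a true negation of x (hypothetical helper)."""
    return -x

if __name__ == "__main__":
    for x in (0.0, -0.0, 2.5):
        print(f"x={x!r}: 0.0-x has sign {sign_of(fsub_zero(x))}, "
              f"-x has sign {sign_of(fneg(x))}")
```

Only the `x = +0.0` input separates the two forms, which is exactly the value `fabs` can produce in that test.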

llvm/trunk/test/CodeGen/AMDGPU/fneg-combines.ll

[... 213 lines not shown; enclosing function: define amdgpu_kernel void @v_fneg_add_multi_use_fneg_x_f32(float addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr, float %c) #0 { ...]
ret void		ret void
}		}

; This one asserted with -enable-no-signed-zeros-fp-math		; This one asserted with -enable-no-signed-zeros-fp-math
; GCN-LABEL: {{^}}fneg_fadd_0:		; GCN-LABEL: {{^}}fneg_fadd_0:
; GCN-SAFE-DAG: v_mad_f32 [[A:v[0-9]+]],		; GCN-SAFE-DAG: v_mad_f32 [[A:v[0-9]+]],
; GCN-SAFE-DAG: v_cmp_ngt_f32_e32 {{.*}}, [[A]]		; GCN-SAFE-DAG: v_cmp_ngt_f32_e32 {{.*}}, [[A]]
; GCN-SAFE-DAG: v_cndmask_b32_e64 v{{[0-9]+}}, -[[A]]		; GCN-SAFE-DAG: v_cndmask_b32_e64 v{{[0-9]+}}, -[[A]]
; GCN-NSZ-DAG: v_mac_f32_e32 [[C:v[0-9]+]],		; GCN-NSZ-DAG: v_rcp_f32_e32 [[A:v[0-9]+]],
; GCN-NSZ-DAG: v_cmp_nlt_f32_e64 {{.*}}, -[[C]]		; GCN-NSZ-DAG: v_mov_b32_e32 [[B:v[0-9]+]],
		; GCN-NSZ-DAG: v_mov_b32_e32 [[C:v[0-9]+]],
		; GCN-NSZ-DAG: v_mul_f32_e32 [[D:v[0-9]+]],
		; GCN-NSZ-DAG: v_cmp_nlt_f32_e64 {{.*}}, -[[D]]

define amdgpu_ps float @fneg_fadd_0(float inreg %tmp2, float inreg %tmp6, <4 x i32> %arg) local_unnamed_addr #0 {		define amdgpu_ps float @fneg_fadd_0(float inreg %tmp2, float inreg %tmp6, <4 x i32> %arg) local_unnamed_addr #0 {
.entry:		.entry:
%tmp7 = fdiv float 1.000000e+00, %tmp6		%tmp7 = fdiv float 1.000000e+00, %tmp6
%tmp8 = fmul float 0.000000e+00, %tmp7		%tmp8 = fmul float 0.000000e+00, %tmp7
%tmp9 = fmul reassoc nnan arcp contract float 0.000000e+00, %tmp8		%tmp9 = fmul reassoc nnan arcp contract float 0.000000e+00, %tmp8
%.i188 = fadd float %tmp9, 0.000000e+00		%.i188 = fadd float %tmp9, 0.000000e+00
%tmp10 = fcmp uge float %.i188, %tmp2		%tmp10 = fcmp uge float %.i188, %tmp2
[... 2,292 lines not shown ...]

llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll

	; Test several VSX FMA mutation opportunities. The first one isn't a			; Test several VSX FMA mutation opportunities. The first one isn't a
	; reasonable transformation because the killed product register is the			; reasonable transformation because the killed product register is the
	; same as the FMA target register. The second one is legal. The third			; same as the FMA target register. The second one is legal. The third
	; one doesn't fit the feeding-copy pattern.			; one doesn't fit the feeding-copy pattern.

	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s
	target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"			target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	declare double @llvm.sqrt.f64(double)			declare double @llvm.sqrt.f64(double)

	define double @foo3(double %a) nounwind {			define double @foo3_fmf(double %a) nounwind {
	%r = call double @llvm.sqrt.f64(double %a)			; CHECK: @foo3_fmf
	ret double %r

	; CHECK: @foo3
	; CHECK-NOT: fmr			; CHECK-NOT: fmr
	; CHECK: xsmaddmdp			; CHECK: xsmaddmdp
	; CHECK: xsmaddadp			; CHECK: xsmaddadp
				%r = call fast double @llvm.sqrt.f64(double %a)
				ret double %r
				}

				define double @foo3_safe(double %a) nounwind {
				; CHECK: @foo3_safe
				; CHECK-NOT: fmr
				; CHECK: xssqrtdp
				%r = call double @llvm.sqrt.f64(double %a)
				ret double %r
	}			}

llvm/trunk/test/CodeGen/PowerPC/fmf-propagation.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: llc < %s -mtriple=powerpc64le -debug-only=isel -o /dev/null 2>&1 | FileCheck %s --check-prefix=FMFDEBUG			; RUN: llc < %s -mtriple=powerpc64le -debug-only=isel -o /dev/null 2>&1 | FileCheck %s --check-prefix=FMFDEBUG
	; RUN: llc < %s -mtriple=powerpc64le | FileCheck %s --check-prefix=FMF			; RUN: llc < %s -mtriple=powerpc64le | FileCheck %s --check-prefix=FMF
	; RUN: llc < %s -mtriple=powerpc64le -debug-only=isel -o /dev/null 2>&1 -enable-unsafe-fp-math -enable-no-nans-fp-math | FileCheck %s --check-prefix=GLOBALDEBUG			; RUN: llc < %s -mtriple=powerpc64le -debug-only=isel -o /dev/null 2>&1 -enable-unsafe-fp-math -enable-no-nans-fp-math | FileCheck %s --check-prefix=GLOBALDEBUG
	; RUN: llc < %s -mtriple=powerpc64le -enable-unsafe-fp-math -enable-no-nans-fp-math | FileCheck %s --check-prefix=GLOBAL			; RUN: llc < %s -mtriple=powerpc64le -enable-unsafe-fp-math -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math | FileCheck %s --check-prefix=GLOBAL

	; Test FP transforms using instruction/node-level fast-math-flags.			; Test FP transforms using instruction/node-level fast-math-flags.
	; We're also checking debug output to verify that FMF is propagated to the newly created nodes.			; We're also checking debug output to verify that FMF is propagated to the newly created nodes.
	; The run with the global unsafe param tests the pre-FMF behavior using regular instructions/nodes.			; The run with the global unsafe param tests the pre-FMF behavior using regular instructions/nodes.

	declare float @llvm.fma.f32(float, float, float)			declare float @llvm.fma.f32(float, float, float)
	declare float @llvm.sqrt.f32(float)			declare float @llvm.sqrt.f32(float)

	[... 465 lines not shown ...]

llvm/trunk/test/CodeGen/PowerPC/qpx-recipest.ll

	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2q -enable-unsafe-fp-math | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2q | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2q | FileCheck -check-prefix=CHECK-SAFE %s
	target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"			target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	declare <4 x double> @llvm.sqrt.v4f64(<4 x double>)			declare <4 x double> @llvm.sqrt.v4f64(<4 x double>)
	declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)

	define <4 x double> @foo(<4 x double> %a, <4 x double> %b) nounwind {			define <4 x double> @foo_fmf(<4 x double> %a, <4 x double> %b) nounwind {
	entry:			; CHECK-LABEL: @foo_fmf
	%x = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)
	%r = fdiv <4 x double> %a, %x
	ret <4 x double> %r

	; CHECK-LABEL: @foo
	; CHECK: qvfrsqrte			; CHECK: qvfrsqrte
	; CHECK-DAG: qvfmul			; CHECK-DAG: qvfmul
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using			; CHECK-DAG: qvfmsub
	; an qvfmadd instead of a qvfnmsub			; CHECK-DAG: qvfnmsub
	; CHECK-DAG: qvfmadd
	; CHECK-DAG: qvfmadd
	; CHECK: qvfmul			; CHECK: qvfmul
	; CHECK: qvfmul			; CHECK: qvfmul
	; CHECK: qvfmadd			; CHECK: qvfnmsub
	; CHECK: qvfmul			; CHECK: qvfmul
	; CHECK: qvfmul			; CHECK: qvfmul
	; CHECK: blr			; CHECK: blr
				entry:
	; CHECK-SAFE-LABEL: @foo			%x = call fast <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)
	; CHECK-SAFE: fsqrt			%r = fdiv fast <4 x double> %a, %x
	; CHECK-SAFE: fdiv			ret <4 x double> %r
	; CHECK-SAFE: blr
	}			}

	define <4 x double> @foof(<4 x double> %a, <4 x float> %b) nounwind {			define <4 x double> @foo_safe(<4 x double> %a, <4 x double> %b) nounwind {
				; CHECK-LABEL: @foo_safe
				; CHECK: fsqrt
				; CHECK: fdiv
				; CHECK: blr
	entry:			entry:
	%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)			%x = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)
	%y = fpext <4 x float> %x to <4 x double>			%r = fdiv <4 x double> %a, %x
	%r = fdiv <4 x double> %a, %y
	ret <4 x double> %r			ret <4 x double> %r
				}

	; CHECK-LABEL: @foof			define <4 x double> @foof_fmf(<4 x double> %a, <4 x float> %b) nounwind {
				; CHECK-LABEL: @foof_fmf
	; CHECK: qvfrsqrtes			; CHECK: qvfrsqrtes
	; CHECK-DAG: qvfmuls			; CHECK-DAG: qvfmuls
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using			; FIXME: We're currently loading two constants here (1.5 and -1.5), and using
	; an qvfmadd instead of a qvfnmsubs			; an qvfmadd instead of a qvfnmsubs
	; CHECK-DAG: qvfmadds			; CHECK-DAG: qvfmadds
	; CHECK-DAG: qvfmadds			; CHECK-DAG: qvfmadds
	; CHECK: qvfmuls			; CHECK: qvfmuls
	; CHECK: qvfmul			; CHECK: qvfmul
	; CHECK: blr			; CHECK: blr
				entry:
	; CHECK-SAFE-LABEL: @foof			%x = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
	; CHECK-SAFE: fsqrts			%y = fpext <4 x float> %x to <4 x double>
	; CHECK-SAFE: fdiv			%r = fdiv fast <4 x double> %a, %y
	; CHECK-SAFE: blr			ret <4 x double> %r
	}			}

	define <4 x float> @food(<4 x float> %a, <4 x double> %b) nounwind {			define <4 x double> @foof_safe(<4 x double> %a, <4 x float> %b) nounwind {
				; CHECK-LABEL: @foof_safe
				; CHECK: fsqrts
				; CHECK: fdiv
				; CHECK: blr
	entry:			entry:
	%x = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)			%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
	%y = fptrunc <4 x double> %x to <4 x float>			%y = fpext <4 x float> %x to <4 x double>
	%r = fdiv <4 x float> %a, %y			%r = fdiv <4 x double> %a, %y
	ret <4 x float> %r			ret <4 x double> %r
				}

	; CHECK-LABEL: @food			define <4 x float> @food_fmf(<4 x float> %a, <4 x double> %b) nounwind {
				; CHECK-LABEL: @food_fmf
	; CHECK: qvfrsqrte			; CHECK: qvfrsqrte
	; CHECK-DAG: qvfmul			; CHECK-DAG: qvfmul
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using			; CHECK-DAG: qvfmsub
	; an qvfmadd instead of a qvfnmsub			; CHECK-DAG: qvfnmsub
	; CHECK-DAG: qvfmadd
	; CHECK-DAG: qvfmadd
	; CHECK: qvfmul			; CHECK: qvfmul
	; CHECK: qvfmul			; CHECK: qvfmul
	; CHECK: qvfmadd			; CHECK: qvfnmsub
	; CHECK: qvfmul			; CHECK: qvfmul
	; CHECK: qvfrsp			; CHECK: qvfrsp
	; CHECK: qvfmuls			; CHECK: qvfmuls
	; CHECK: blr			; CHECK: blr
				entry:
	; CHECK-SAFE-LABEL: @food			%x = call fast <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)
	; CHECK-SAFE: fsqrt			%y = fptrunc <4 x double> %x to <4 x float>
	; CHECK-SAFE: fdivs			%r = fdiv fast <4 x float> %a, %y
	; CHECK-SAFE: blr			ret <4 x float> %r
	}			}

	define <4 x float> @goo(<4 x float> %a, <4 x float> %b) nounwind {			define <4 x float> @food_safe(<4 x float> %a, <4 x double> %b) nounwind {
				; CHECK-LABEL: @food_safe
				; CHECK: fsqrt
				; CHECK: fdivs
				; CHECK: blr
	entry:			entry:
	%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)			%x = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)
	%r = fdiv <4 x float> %a, %x			%y = fptrunc <4 x double> %x to <4 x float>
				%r = fdiv <4 x float> %a, %y
	ret <4 x float> %r			ret <4 x float> %r
				}

	; CHECK-LABEL: @goo			define <4 x float> @goo_fmf(<4 x float> %a, <4 x float> %b) nounwind {
				; CHECK-LABEL: @goo_fmf
	; CHECK: qvfrsqrtes			; CHECK: qvfrsqrtes
	; CHECK-DAG: qvfmuls			; CHECK-DAG: qvfmuls
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using			; FIXME: We're currently loading two constants here (1.5 and -1.5), and using
	; an qvfmadd instead of a qvfnmsubs			; an qvfmadd instead of a qvfnmsubs
	; CHECK-DAG: qvfmadds			; CHECK-DAG: qvfmadds
	; CHECK-DAG: qvfmadds			; CHECK-DAG: qvfmadds
	; CHECK: qvfmuls			; CHECK: qvfmuls
	; CHECK: qvfmuls			; CHECK: qvfmuls
	; CHECK: blr			; CHECK: blr
				entry:
	; CHECK-SAFE-LABEL: @goo			%x = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
	; CHECK-SAFE: fsqrts			%r = fdiv fast <4 x float> %a, %x
	; CHECK-SAFE: fdivs			ret <4 x float> %r
	; CHECK-SAFE: blr
	}			}

	define <4 x double> @foo2(<4 x double> %a, <4 x double> %b) nounwind {			define <4 x float> @goo_safe(<4 x float> %a, <4 x float> %b) nounwind {
				; CHECK-LABEL: @goo_safe
				; CHECK: fsqrts
				; CHECK: fdivs
				; CHECK: blr
	entry:			entry:
	%r = fdiv <4 x double> %a, %b			%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
	ret <4 x double> %r			%r = fdiv <4 x float> %a, %x
				ret <4 x float> %r
				}

	; CHECK-LABEL: @foo2			define <4 x double> @foo2_fmf(<4 x double> %a, <4 x double> %b) nounwind {
				; CHECK-LABEL: @foo2_fmf
	; CHECK: qvfre			; CHECK: qvfre
	; CHECK: qvfnmsub			; CHECK: qvfnmsub
	; CHECK: qvfmadd			; CHECK: qvfmadd
	; CHECK: qvfnmsub			; CHECK: qvfnmsub
	; CHECK: qvfmadd			; CHECK: qvfmadd
	; CHECK: qvfmul			; CHECK: qvfmul
	; CHECK: blr			; CHECK: blr
				entry:
	; CHECK-SAFE-LABEL: @foo2			%r = fdiv fast <4 x double> %a, %b
	; CHECK-SAFE: fdiv			ret <4 x double> %r
	; CHECK-SAFE: blr
	}			}

	define <4 x float> @goo2(<4 x float> %a, <4 x float> %b) nounwind {			define <4 x double> @foo2_safe(<4 x double> %a, <4 x double> %b) nounwind {
	entry:			; CHECK-LABEL: @foo2_safe
	%r = fdiv <4 x float> %a, %b			; CHECK: fdiv
	ret <4 x float> %r			; CHECK: blr
				%r = fdiv <4 x double> %a, %b
				ret <4 x double> %r
				}

	; CHECK-LABEL: @goo2			define <4 x float> @goo2_fmf(<4 x float> %a, <4 x float> %b) nounwind {
				; CHECK-LABEL: @goo2_fmf
	; CHECK: qvfres			; CHECK: qvfres
	; CHECK: qvfnmsubs			; CHECK: qvfnmsubs
	; CHECK: qvfmadds			; CHECK: qvfmadds
	; CHECK: qvfmuls			; CHECK: qvfmuls
	; CHECK: blr			; CHECK: blr
				entry:
	; CHECK-SAFE-LABEL: @goo2			%r = fdiv fast <4 x float> %a, %b
	; CHECK-SAFE: fdivs			ret <4 x float> %r
	; CHECK-SAFE: blr
	}			}

	define <4 x double> @foo3(<4 x double> %a) nounwind {			define <4 x float> @goo2_safe(<4 x float> %a, <4 x float> %b) nounwind {
				; CHECK-LABEL: @goo2_safe
				; CHECK: fdivs
				; CHECK: blr
	entry:			entry:
	%r = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %a)			%r = fdiv <4 x float> %a, %b
	ret <4 x double> %r			ret <4 x float> %r
				}

	; CHECK-LABEL: @foo3			define <4 x double> @foo3_fmf(<4 x double> %a) nounwind {
				; CHECK-LABEL: @foo3_fmf
	; CHECK: qvfrsqrte			; CHECK: qvfrsqrte
	; CHECK: qvfmul			; CHECK: qvfmul
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using			; CHECK-DAG: qvfmsub
	; an qvfmadd instead of a qvfnmsub
	; CHECK-DAG: qvfmadd
	; CHECK-DAG: qvfcmpeq			; CHECK-DAG: qvfcmpeq
	; CHECK-DAG: qvfmadd			; CHECK-DAG: qvfnmsub
	; CHECK-DAG: qvfmul			; CHECK-DAG: qvfmul
	; CHECK-DAG: qvfmul			; CHECK-DAG: qvfmul
	; CHECK-DAG: qvfmadd			; CHECK-DAG: qvfnmsub
	; CHECK-DAG: qvfmul			; CHECK-DAG: qvfmul
	; CHECK-DAG: qvfmul			; CHECK-DAG: qvfmul
	; CHECK: qvfsel			; CHECK: qvfsel
	; CHECK: blr			; CHECK: blr
				entry:
	; CHECK-SAFE-LABEL: @foo3			%r = call fast <4 x double> @llvm.sqrt.v4f64(<4 x double> %a)
	; CHECK-SAFE: fsqrt			ret <4 x double> %r
	; CHECK-SAFE: blr
	}			}

	define <4 x float> @goo3(<4 x float> %a) nounwind {			define <4 x double> @foo3_safe(<4 x double> %a) nounwind {
				; CHECK-LABEL: @foo3_safe
				; CHECK: fsqrt
				; CHECK: blr
	entry:			entry:
	%r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)			%r = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %a)
	ret <4 x float> %r			ret <4 x double> %r
				}

	; CHECK-LABEL: @goo3			define <4 x float> @goo3_fmf(<4 x float> %a) nounwind {
				; CHECK-LABEL: @goo3_fmf
	; CHECK: qvfrsqrtes			; CHECK: qvfrsqrtes
	; CHECK: qvfmuls			; CHECK: qvfmuls
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using			; FIXME: We're currently loading two constants here (1.5 and -1.5), and using
	; an qvfmadds instead of a qvfnmsubs			; an qvfmadds instead of a qvfnmsubs
	; CHECK-DAG: qvfmadds			; CHECK-DAG: qvfmadds
	; CHECK-DAG: qvfcmpeq			; CHECK-DAG: qvfcmpeq
	; CHECK-DAG: qvfmadds			; CHECK-DAG: qvfmadds
	; CHECK-DAG: qvfmuls			; CHECK-DAG: qvfmuls
	; CHECK-DAG: qvfmuls			; CHECK-DAG: qvfmuls
	; CHECK: qvfsel			; CHECK: qvfsel
	; CHECK: blr			; CHECK: blr
				entry:
				%r = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
				ret <4 x float> %r
				}

	; CHECK-SAFE-LABEL: @goo3			define <4 x float> @goo3_safe(<4 x float> %a) nounwind {
	; CHECK-SAFE: fsqrts			; CHECK-LABEL: @goo3_safe
	; CHECK-SAFE: blr			; CHECK: fsqrts
				; CHECK: blr
				entry:
				%r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
				ret <4 x float> %r
	}			}

llvm/trunk/test/CodeGen/PowerPC/recipest.ll

	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=-vsx | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=-vsx | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=-vsx | FileCheck -check-prefix=CHECK-SAFE %s

	target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"			target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	declare double @llvm.sqrt.f64(double)			declare double @llvm.sqrt.f64(double)
	declare float @llvm.sqrt.f32(float)			declare float @llvm.sqrt.f32(float)
	declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)

	define double @foo(double %a, double %b) nounwind {			define double @foo_fmf(double %a, double %b) nounwind {
	%x = call double @llvm.sqrt.f64(double %b)			; CHECK: @foo_fmf
	%r = fdiv double %a, %x
	ret double %r

	; CHECK: @foo
	; CHECK: frsqrte			; CHECK: frsqrte
	; CHECK: fmul			; CHECK: fmul
	; CHECK-NEXT: fmadd			; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmadd			; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK: blr			; CHECK: blr
				%x = call fast double @llvm.sqrt.f64(double %b)
	; CHECK-SAFE: @foo			%r = fdiv fast double %a, %x
	; CHECK-SAFE: fsqrt			ret double %r
	; CHECK-SAFE: fdiv
	; CHECK-SAFE: blr
	}			}

	define double @no_estimate_refinement_f64(double %a, double %b) #0 {			define double @foo_safe(double %a, double %b) nounwind {
				; CHECK: @foo_safe
				; CHECK: fsqrt
				; CHECK: fdiv
				; CHECK: blr
	%x = call double @llvm.sqrt.f64(double %b)			%x = call double @llvm.sqrt.f64(double %b)
	%r = fdiv double %a, %x			%r = fdiv double %a, %x
	ret double %r			ret double %r
				}

				define double @no_estimate_refinement_f64(double %a, double %b) #0 {
	; CHECK-LABEL: @no_estimate_refinement_f64			; CHECK-LABEL: @no_estimate_refinement_f64
	; CHECK: frsqrte			; CHECK: frsqrte
	; CHECK-NOT: fmadd			; CHECK-NOT: fmadd
	; CHECK: fmul			; CHECK: fmul
	; CHECK-NOT: fmadd			; CHECK-NOT: fmadd
	; CHECK: blr			; CHECK: blr
	}			%x = call fast double @llvm.sqrt.f64(double %b)
				%r = fdiv fast double %a, %x

	define double @foof(double %a, float %b) nounwind {
	%x = call float @llvm.sqrt.f32(float %b)
	%y = fpext float %x to double
	%r = fdiv double %a, %y
	ret double %r			ret double %r
				}

	; CHECK: @foof			define double @foof_fmf(double %a, float %b) nounwind {
				; CHECK: @foof_fmf
	; CHECK-DAG: frsqrtes			; CHECK-DAG: frsqrtes
	; CHECK: fmuls			; CHECK: fmuls
	; CHECK-NEXT: fmadds			; CHECK-NEXT: fmadds
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				%x = call fast float @llvm.sqrt.f32(float %b)
	; CHECK-SAFE: @foof			%y = fpext float %x to double
	; CHECK-SAFE: fsqrts			%r = fdiv fast double %a, %y
	; CHECK-SAFE: fdiv			ret double %r
	; CHECK-SAFE: blr
	}			}

	define float @food(float %a, double %b) nounwind {			define double @foof_safe(double %a, float %b) nounwind {
	%x = call double @llvm.sqrt.f64(double %b)			; CHECK: @foof_safe
	%y = fptrunc double %x to float			; CHECK: fsqrts
	%r = fdiv float %a, %y			; CHECK: fdiv
	ret float %r			; CHECK: blr
				%x = call float @llvm.sqrt.f32(float %b)
				%y = fpext float %x to double
				%r = fdiv double %a, %y
				ret double %r
				}

	; CHECK: @foo			define float @food_fmf(float %a, double %b) nounwind {
				; CHECK: @food_fmf
	; CHECK-DAG: frsqrte			; CHECK-DAG: frsqrte
	; CHECK: fmul			; CHECK: fmul
	; CHECK-NEXT: fmadd			; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmadd			; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: frsp			; CHECK-NEXT: frsp
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				%x = call fast double @llvm.sqrt.f64(double %b)
	; CHECK-SAFE: @foo			%y = fptrunc double %x to float
	; CHECK-SAFE: fsqrt			%r = fdiv fast float %a, %y
	; CHECK-SAFE: fdivs			ret float %r
	; CHECK-SAFE: blr
	}			}

	define float @goo(float %a, float %b) nounwind {			define float @food_safe(float %a, double %b) nounwind {
	%x = call float @llvm.sqrt.f32(float %b)			; CHECK: @food_safe
	%r = fdiv float %a, %x			; CHECK: fsqrt
				; CHECK: fdivs
				; CHECK: blr
				%x = call double @llvm.sqrt.f64(double %b)
				%y = fptrunc double %x to float
				%r = fdiv float %a, %y
	ret float %r			ret float %r
				}

	; CHECK: @goo			define float @goo_fmf(float %a, float %b) nounwind {
				; CHECK: @goo_fmf
	; CHECK-DAG: frsqrtes			; CHECK-DAG: frsqrtes
	; CHECK: fmuls			; CHECK: fmuls
	; CHECK-NEXT: fmadds			; CHECK-NEXT: fmadds
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				%x = call fast float @llvm.sqrt.f32(float %b)
	; CHECK-SAFE: @goo			%r = fdiv fast float %a, %x
	; CHECK-SAFE: fsqrts			ret float %r
	; CHECK-SAFE: fdivs
	; CHECK-SAFE: blr
	}			}

				define float @goo_safe(float %a, float %b) nounwind {
	define float @no_estimate_refinement_f32(float %a, float %b) #0 {			; CHECK: @goo_safe
				; CHECK: fsqrts
				; CHECK: fdivs
				; CHECK: blr
	%x = call float @llvm.sqrt.f32(float %b)			%x = call float @llvm.sqrt.f32(float %b)
	%r = fdiv float %a, %x			%r = fdiv float %a, %x
	ret float %r			ret float %r
				}

				define float @no_estimate_refinement_f32(float %a, float %b) #0 {
	; CHECK-LABEL: @no_estimate_refinement_f32			; CHECK-LABEL: @no_estimate_refinement_f32
	; CHECK: frsqrtes			; CHECK: frsqrtes
	; CHECK-NOT: fmadds			; CHECK-NOT: fmadds
	; CHECK: fmuls			; CHECK: fmuls
	; CHECK-NOT: fmadds			; CHECK-NOT: fmadds
	; CHECK: blr			; CHECK: blr
				%x = call fast float @llvm.sqrt.f32(float %b)
				%r = fdiv fast float %a, %x
				ret float %r
	}			}

	; Recognize that this is rsqrt(a) * rcp(b) * c,			; Recognize that this is rsqrt(a) * rcp(b) * c,
	; not 1 / ( 1 / sqrt(a)) * rcp(b) * c.			; not 1 / ( 1 / sqrt(a)) * rcp(b) * c.
	define float @rsqrt_fmul(float %a, float %b, float %c) {			define float @rsqrt_fmul_fmf(float %a, float %b, float %c) {
	%x = call float @llvm.sqrt.f32(float %a)			; CHECK: @rsqrt_fmul_fmf
	%y = fmul float %x, %b
	%z = fdiv float %c, %y
	ret float %z

	; CHECK: @rsqrt_fmul
	; CHECK-DAG: frsqrtes			; CHECK-DAG: frsqrtes
	; CHECK-DAG: fres			; CHECK-DAG: fres
	; CHECK-DAG: fnmsubs			; CHECK-DAG: fnmsubs
	; CHECK-DAG: fmuls			; CHECK-DAG: fmuls
	; CHECK-DAG: fmadds			; CHECK-DAG: fmadds
	; CHECK-DAG: fmadds			; CHECK-DAG: fmadds
	; CHECK: fmuls			; CHECK: fmuls
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				%x = call fast float @llvm.sqrt.f32(float %a)
				%y = fmul fast float %x, %b
				%z = fdiv fast float %c, %y
				ret float %z
				}

				; Recognize that this is rsqrt(a) * rcp(b) * c,
				; not 1 / ( 1 / sqrt(a)) * rcp(b) * c.
				define float @rsqrt_fmul_safe(float %a, float %b, float %c) {
				; CHECK: @rsqrt_fmul_safe
				; CHECK: fsqrts
				; CHECK: fmuls
				; CHECK: fdivs
				; CHECK: blr
				%x = call float @llvm.sqrt.f32(float %a)
				%y = fmul float %x, %b
				%z = fdiv float %c, %y
				ret float %z
				}
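The comment on the `rsqrt_fmul` tests above says that `c / (sqrt(a) * b)` should be recognized as `rsqrt(a) * rcp(b) * c`. As a rough illustration of why this needs fast-math flags, the sketch below (hypothetical helper names, exact `1.0 / ...` divisions standing in for the hardware estimate instructions) evaluates both forms. They are equal as real-number expressions and agree closely in floating point, but they round at different points, so bit-for-bit equality is not guaranteed.

```python
import math

def direct(a: float, b: float, c: float) -> float:
    """The expression as written in the test IR: c / (sqrt(a) * b)."""
    return c / (math.sqrt(a) * b)

def rewritten(a: float, b: float, c: float) -> float:
    """The reassociated form: rsqrt(a) * rcp(b) * c.

    1.0 / x is used here in place of the reciprocal estimate
    instructions, which is an assumption made for illustration only.
    """
    rsqrt_a = 1.0 / math.sqrt(a)
    rcp_b = 1.0 / b
    return rsqrt_a * rcp_b * c

if __name__ == "__main__":
    a, b, c = 2.0, 3.0, 5.0
    print(direct(a, b, c), rewritten(a, b, c))
```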

	; CHECK-SAFE: @rsqrt_fmul			define <4 x float> @hoo_fmf(<4 x float> %a, <4 x float> %b) nounwind {
	; CHECK-SAFE: fsqrts			; CHECK: @hoo_fmf
	; CHECK-SAFE: fmuls			; CHECK: vrsqrtefp
	; CHECK-SAFE: fdivs			%x = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
	; CHECK-SAFE: blr			%r = fdiv fast <4 x float> %a, %x
				ret <4 x float> %r
	}			}

	define <4 x float> @hoo(<4 x float> %a, <4 x float> %b) nounwind {			define <4 x float> @hoo_safe(<4 x float> %a, <4 x float> %b) nounwind {
				; CHECK: @hoo_safe
				; CHECK-NOT: vrsqrtefp
				; CHECK: blr
	%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)			%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
	%r = fdiv <4 x float> %a, %x			%r = fdiv <4 x float> %a, %x
	ret <4 x float> %r			ret <4 x float> %r

	; CHECK: @hoo
	; CHECK: vrsqrtefp

	; CHECK-SAFE: @hoo
	; CHECK-SAFE-NOT: vrsqrtefp
	; CHECK-SAFE: blr
	}			}

	define double @foo2(double %a, double %b) nounwind {			define double @foo2_fmf(double %a, double %b) nounwind {
	%r = fdiv double %a, %b			; CHECK: @foo2_fmf
	ret double %r

	; CHECK: @foo2
	; CHECK-DAG: fre			; CHECK-DAG: fre
	; CHECK-DAG: fnmsub			; CHECK-DAG: fnmsub
	; CHECK: fmadd			; CHECK: fmadd
	; CHECK-NEXT: fnmsub			; CHECK-NEXT: fnmsub
	; CHECK-NEXT: fmadd			; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				%r = fdiv fast double %a, %b
	; CHECK-SAFE: @foo2			ret double %r
	; CHECK-SAFE: fdiv
	; CHECK-SAFE: blr
	}			}

	define float @goo2(float %a, float %b) nounwind {			define double @foo2_safe(double %a, double %b) nounwind {
	%r = fdiv float %a, %b			; CHECK: @foo2_safe
	ret float %r			; CHECK: fdiv
				; CHECK: blr
				%r = fdiv double %a, %b
				ret double %r
				}

	; CHECK: @goo2			define float @goo2_fmf(float %a, float %b) nounwind {
				; CHECK: @goo2_fmf
	; CHECK-DAG: fres			; CHECK-DAG: fres
	; CHECK-DAG: fnmsubs			; CHECK-DAG: fnmsubs
	; CHECK: fmadds			; CHECK: fmadds
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				%r = fdiv fast float %a, %b
	; CHECK-SAFE: @goo2			ret float %r
	; CHECK-SAFE: fdivs
	; CHECK-SAFE: blr
	}			}

	define <4 x float> @hoo2(<4 x float> %a, <4 x float> %b) nounwind {			define float @goo2_safe(float %a, float %b) nounwind {
	%r = fdiv <4 x float> %a, %b			; CHECK: @goo2_safe
	ret <4 x float> %r			; CHECK: fdivs
				; CHECK: blr
				%r = fdiv float %a, %b
				ret float %r
				}

	; CHECK: @hoo2			define <4 x float> @hoo2_fmf(<4 x float> %a, <4 x float> %b) nounwind {
				; CHECK: @hoo2_fmf
	; CHECK: vrefp			; CHECK: vrefp
				%r = fdiv fast <4 x float> %a, %b
	; CHECK-SAFE: @hoo2			ret <4 x float> %r
	; CHECK-SAFE-NOT: vrefp
	; CHECK-SAFE: blr
	}			}

	define double @foo3(double %a) nounwind {			define <4 x float> @hoo2_safe(<4 x float> %a, <4 x float> %b) nounwind {
	%r = call double @llvm.sqrt.f64(double %a)			; CHECK: @hoo2_safe
	ret double %r			; CHECK-NOT: vrefp
				; CHECK: blr
				%r = fdiv <4 x float> %a, %b
				ret <4 x float> %r
				}

	; CHECK: @foo3			define double @foo3_fmf(double %a) nounwind {
				; CHECK: @foo3_fmf
	; CHECK: fcmpu			; CHECK: fcmpu
	; CHECK-DAG: frsqrte			; CHECK-DAG: frsqrte
	; CHECK: fmul			; CHECK: fmul
	; CHECK-NEXT: fmadd			; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmadd			; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK: blr			; CHECK: blr
				%r = call fast double @llvm.sqrt.f64(double %a)
	; CHECK-SAFE: @foo3			ret double %r
	; CHECK-SAFE: fsqrt
	; CHECK-SAFE: blr
	}			}

	define float @goo3(float %a) nounwind {			define double @foo3_safe(double %a) nounwind {
	%r = call float @llvm.sqrt.f32(float %a)			; CHECK: @foo3_safe
	ret float %r			; CHECK: fsqrt
				; CHECK: blr
				%r = call double @llvm.sqrt.f64(double %a)
				ret double %r
				}

	; CHECK: @goo3			define float @goo3_fmf(float %a) nounwind {
				; CHECK: @goo3_fmf
	; CHECK: fcmpu			; CHECK: fcmpu
	; CHECK-DAG: frsqrtes			; CHECK-DAG: frsqrtes
	; CHECK: fmuls			; CHECK: fmuls
	; CHECK-NEXT: fmadds			; CHECK-NEXT: fmadds
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK: blr			; CHECK: blr
				%r = call fast float @llvm.sqrt.f32(float %a)
	; CHECK-SAFE: @goo3			ret float %r
	; CHECK-SAFE: fsqrts
	; CHECK-SAFE: blr
	}			}

	define <4 x float> @hoo3(<4 x float> %a) nounwind {			define float @goo3_safe(float %a) nounwind {
	%r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)			; CHECK: @goo3_safe
	ret <4 x float> %r			; CHECK: fsqrts
				; CHECK: blr
				%r = call float @llvm.sqrt.f32(float %a)
				ret float %r
				}

	; CHECK: @hoo3			define <4 x float> @hoo3_fmf(<4 x float> %a) nounwind {
				; CHECK: @hoo3_fmf
	; CHECK: vrsqrtefp			; CHECK: vrsqrtefp
	; CHECK-DAG: vcmpeqfp			; CHECK-DAG: vcmpeqfp
				%r = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
				ret <4 x float> %r
				}

	; CHECK-SAFE: @hoo3			define <4 x float> @hoo3_safe(<4 x float> %a) nounwind {
	; CHECK-SAFE-NOT: vrsqrtefp			; CHECK: @hoo3_safe
	; CHECK-SAFE: blr			; CHECK-NOT: vrsqrtefp
				; CHECK: blr
				%r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
				ret <4 x float> %r
	}			}

	attributes #0 = { nounwind "reciprocal-estimates"="sqrtf:0,sqrtd:0" }			attributes #0 = { nounwind "reciprocal-estimates"="sqrtf:0,sqrtd:0" }
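A note on the estimate sequences checked above: the `frsqrte`/`fmul`/`fmadd` chains in the `_fmf` variants are consistent with a reciprocal square root estimate refined by Newton-Raphson steps, with the leading `fcmpu` presumably guarding the zero input. The Python below is only a sketch of that arithmetic under those assumptions, not the DAGCombiner implementation; `crude_rsqrt_estimate` is a hypothetical stand-in for the hardware estimate instruction.

```python
import math


def crude_rsqrt_estimate(a: float) -> float:
    """Hypothetical stand-in for a hardware estimate: 1/sqrt(a) kept to ~8 bits."""
    mantissa, exponent = math.frexp(1.0 / math.sqrt(a))
    return math.ldexp(round(mantissa * 256) / 256, exponent)


def sqrt_via_estimate(a: float, steps: int = 2) -> float:
    """Approximate sqrt(a) as a * rsqrt(a), refining the estimate `steps` times."""
    if a == 0.0:
        # Without this guard the result would be 0 * inf = NaN.
        return a
    est = crude_rsqrt_estimate(a)
    for _ in range(steps):
        # One Newton-Raphson step for f(e) = 1/e^2 - a.
        est = est * (1.5 - 0.5 * a * est * est)
    return a * est
```

Each step roughly squares the relative error, which is why a low-precision estimate needs only a step or two. The `_safe` variants carry no fast-math flags, so they are expected to keep the exact `fsqrt`/`fsqrts` instead.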

llvm/trunk/test/CodeGen/X86/dagcombine-unsafe-math.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=corei7-avx | FileCheck %s
	; RUN: llc < %s -enable-unsafe-fp-math -mtriple=x86_64-apple-darwin -mcpu=corei7-avx | FileCheck %s


	; rdar://13126763			; rdar://13126763
	; Expression "x + x*x" was mistakenly transformed into "x * 3.0f".			; Expression "x + x*x" was mistakenly transformed into "x * 3.0f".

	define float @test1(float %x) {			define float @test1(float %x) {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	[... 46 lines not shown ...]
	define float @test5(<4 x float> %x) {			define float @test5(<4 x float> %x) {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%splat = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> zeroinitializer			%splat = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> zeroinitializer
	%v1 = extractelement <4 x float> %splat, i32 1			%v1 = extractelement <4 x float> %splat, i32 1
	%v0 = extractelement <4 x float> %splat, i32 0			%v0 = extractelement <4 x float> %splat, i32 0
	%add1 = fadd float %v0, %v1			%add1 = fadd reassoc nsz float %v0, %v1
	%v2 = extractelement <4 x float> %splat, i32 2			%v2 = extractelement <4 x float> %splat, i32 2
	%add2 = fadd float %v2, %add1			%add2 = fadd reassoc nsz float %v2, %add1
	ret float %add2			ret float %add2
	}			}

llvm/trunk/test/CodeGen/X86/fmul-combines.ll

	[... 70 lines not shown ...]
	; CHECK-LABEL: constant_fold_fmul_v4f32_undef:			; CHECK-LABEL: constant_fold_fmul_v4f32_undef:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: movaps {{.*#+}} xmm0 = [8.0E+0,NaN,8.0E+0,NaN]			; CHECK-NEXT: movaps {{.*#+}} xmm0 = [8.0E+0,NaN,8.0E+0,NaN]
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> <float 4.0, float undef, float 4.0, float 4.0>, <float 2.0, float 2.0, float 2.0, float undef>			%y = fmul <4 x float> <float 4.0, float undef, float 4.0, float 4.0>, <float 2.0, float 2.0, float 2.0, float undef>
	ret <4 x float> %y			ret <4 x float> %y
	}			}

	define <4 x float> @fmul0_v4f32_nsz_nnan(<4 x float> %x) #0 {			define <4 x float> @fmul0_v4f32_nsz_nnan(<4 x float> %x) {
	; CHECK-LABEL: fmul0_v4f32_nsz_nnan:			; CHECK-LABEL: fmul0_v4f32_nsz_nnan:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul nnan nsz <4 x float> %x, <float 0.0, float 0.0, float 0.0, float 0.0>			%y = fmul nnan nsz <4 x float> %x, <float 0.0, float 0.0, float 0.0, float 0.0>
	ret <4 x float> %y			ret <4 x float> %y
	}			}

	define <4 x float> @fmul0_v4f32_undef(<4 x float> %x) #0 {			define <4 x float> @fmul0_v4f32_undef(<4 x float> %x) {
	; CHECK-LABEL: fmul0_v4f32_undef:			; CHECK-LABEL: fmul0_v4f32_undef:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul nnan nsz <4 x float> %x, <float undef, float 0.0, float undef, float 0.0>			%y = fmul nnan nsz <4 x float> %x, <float undef, float 0.0, float undef, float 0.0>
	ret <4 x float> %y			ret <4 x float> %y
	}			}

	define <4 x float> @fmul_c2_c4_v4f32(<4 x float> %x) #0 {			define <4 x float> @fmul_c2_c4_v4f32(<4 x float> %x) {
	; CHECK-LABEL: fmul_c2_c4_v4f32:			; CHECK-LABEL: fmul_c2_c4_v4f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> %x, <float 2.0, float 2.0, float 2.0, float 2.0>			%y = fmul fast <4 x float> %x, <float 2.0, float 2.0, float 2.0, float 2.0>
	%z = fmul <4 x float> %y, <float 4.0, float 4.0, float 4.0, float 4.0>			%z = fmul fast <4 x float> %y, <float 4.0, float 4.0, float 4.0, float 4.0>
	ret <4 x float> %z			ret <4 x float> %z
	}			}

	define <4 x float> @fmul_c3_c4_v4f32(<4 x float> %x) #0 {			define <4 x float> @fmul_c3_c4_v4f32(<4 x float> %x) {
	; CHECK-LABEL: fmul_c3_c4_v4f32:			; CHECK-LABEL: fmul_c3_c4_v4f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> %x, <float 3.0, float 3.0, float 3.0, float 3.0>			%y = fmul fast <4 x float> %x, <float 3.0, float 3.0, float 3.0, float 3.0>
	%z = fmul <4 x float> %y, <float 4.0, float 4.0, float 4.0, float 4.0>			%z = fmul fast <4 x float> %y, <float 4.0, float 4.0, float 4.0, float 4.0>
	ret <4 x float> %z			ret <4 x float> %z
	}			}

	; CHECK: float 5			; CHECK: float 5
	; CHECK: float 12			; CHECK: float 12
	; CHECK: float 21			; CHECK: float 21
	; CHECK: float 32			; CHECK: float 32

	; We should be able to pre-multiply the two constant vectors.			; We should be able to pre-multiply the two constant vectors.
	define <4 x float> @fmul_v4f32_two_consts_no_splat(<4 x float> %x) #0 {			define <4 x float> @fmul_v4f32_two_consts_no_splat(<4 x float> %x) {
	; CHECK-LABEL: fmul_v4f32_two_consts_no_splat:			; CHECK-LABEL: fmul_v4f32_two_consts_no_splat:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> %x, <float 1.0, float 2.0, float 3.0, float 4.0>			%y = fmul fast <4 x float> %x, <float 1.0, float 2.0, float 3.0, float 4.0>
	%z = fmul <4 x float> %y, <float 5.0, float 6.0, float 7.0, float 8.0>			%z = fmul fast <4 x float> %y, <float 5.0, float 6.0, float 7.0, float 8.0>
	ret <4 x float> %z			ret <4 x float> %z
	}			}

	; Same as above, but reverse operands to make sure non-canonical form is also handled.			; Same as above, but reverse operands to make sure non-canonical form is also handled.
	define <4 x float> @fmul_v4f32_two_consts_no_splat_non_canonical(<4 x float> %x) #0 {			define <4 x float> @fmul_v4f32_two_consts_no_splat_non_canonical(<4 x float> %x) {
	; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_non_canonical:			; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_non_canonical:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, %x			%y = fmul fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, %x
	%z = fmul <4 x float> <float 5.0, float 6.0, float 7.0, float 8.0>, %y			%z = fmul fast <4 x float> <float 5.0, float 6.0, float 7.0, float 8.0>, %y
	ret <4 x float> %z			ret <4 x float> %z
	}			}

	; Node-level FMF and no function-level attributes.			; Node-level FMF and no function-level attributes.

	define <4 x float> @fmul_v4f32_two_consts_no_splat_reassoc(<4 x float> %x) {			define <4 x float> @fmul_v4f32_two_consts_no_splat_reassoc(<4 x float> %x) {
	; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_reassoc:			; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_reassoc:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	[... 18 lines not shown ...]

	; CHECK: float 6			; CHECK: float 6
	; CHECK: float 14			; CHECK: float 14
	; CHECK: float 24			; CHECK: float 24
	; CHECK: float 36			; CHECK: float 36

	; More than one use of a constant multiply should not inhibit the optimization.			; More than one use of a constant multiply should not inhibit the optimization.
	; Instead of a chain of 2 dependent mults, this test will have 2 independent mults.			; Instead of a chain of 2 dependent mults, this test will have 2 independent mults.
	define <4 x float> @fmul_v4f32_two_consts_no_splat_multiple_use(<4 x float> %x) #0 {			define <4 x float> @fmul_v4f32_two_consts_no_splat_multiple_use(<4 x float> %x) {
	; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_multiple_use:			; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_multiple_use:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> %x, <float 1.0, float 2.0, float 3.0, float 4.0>			%y = fmul fast <4 x float> %x, <float 1.0, float 2.0, float 3.0, float 4.0>
	%z = fmul <4 x float> %y, <float 5.0, float 6.0, float 7.0, float 8.0>			%z = fmul fast <4 x float> %y, <float 5.0, float 6.0, float 7.0, float 8.0>
	%a = fadd <4 x float> %y, %z			%a = fadd fast <4 x float> %y, %z
	ret <4 x float> %a			ret <4 x float> %a
	}			}

	; PR22698 - http://llvm.org/bugs/show_bug.cgi?id=22698			; PR22698 - http://llvm.org/bugs/show_bug.cgi?id=22698
	; Make sure that we don't infinite loop swapping constants back and forth.			; Make sure that we don't infinite loop swapping constants back and forth.

	; CHECK: float 24			; CHECK: float 24
	; CHECK: float 24			; CHECK: float 24
	; CHECK: float 24			; CHECK: float 24
	; CHECK: float 24			; CHECK: float 24

	define <4 x float> @PR22698_splats(<4 x float> %a) #0 {			define <4 x float> @PR22698_splats(<4 x float> %a) {
	; CHECK-LABEL: PR22698_splats:			; CHECK-LABEL: PR22698_splats:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%mul1 = fmul fast <4 x float> <float 2.0, float 2.0, float 2.0, float 2.0>, <float 3.0, float 3.0, float 3.0, float 3.0>			%mul1 = fmul fast <4 x float> <float 2.0, float 2.0, float 2.0, float 2.0>, <float 3.0, float 3.0, float 3.0, float 3.0>
	%mul2 = fmul fast <4 x float> <float 4.0, float 4.0, float 4.0, float 4.0>, %mul1			%mul2 = fmul fast <4 x float> <float 4.0, float 4.0, float 4.0, float 4.0>, %mul1
	%mul3 = fmul fast <4 x float> %a, %mul2			%mul3 = fmul fast <4 x float> %a, %mul2
	ret <4 x float> %mul3			ret <4 x float> %mul3
	}			}

	; Same as above, but verify that non-splat vectors are handled correctly too.			; Same as above, but verify that non-splat vectors are handled correctly too.

	; CHECK: float 45			; CHECK: float 45
	; CHECK: float 120			; CHECK: float 120
	; CHECK: float 231			; CHECK: float 231
	; CHECK: float 384			; CHECK: float 384

	define <4 x float> @PR22698_no_splats(<4 x float> %a) #0 {			define <4 x float> @PR22698_no_splats(<4 x float> %a) {
	; CHECK-LABEL: PR22698_no_splats:			; CHECK-LABEL: PR22698_no_splats:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%mul1 = fmul fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, <float 5.0, float 6.0, float 7.0, float 8.0>			%mul1 = fmul fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, <float 5.0, float 6.0, float 7.0, float 8.0>
	%mul2 = fmul fast <4 x float> <float 9.0, float 10.0, float 11.0, float 12.0>, %mul1			%mul2 = fmul fast <4 x float> <float 9.0, float 10.0, float 11.0, float 12.0>, %mul1
	%mul3 = fmul fast <4 x float> %a, %mul2			%mul3 = fmul fast <4 x float> %a, %mul2
	ret <4 x float> %mul3			ret <4 x float> %mul3
	}			}

	define float @fmul_c2_c4_f32(float %x) #0 {			define float @fmul_c2_c4_f32(float %x) {
	; CHECK-LABEL: fmul_c2_c4_f32:			; CHECK-LABEL: fmul_c2_c4_f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul float %x, 2.0			%y = fmul fast float %x, 2.0
	%z = fmul float %y, 4.0			%z = fmul fast float %y, 4.0
	ret float %z			ret float %z
	}			}

	define float @fmul_c3_c4_f32(float %x) #0 {			define float @fmul_c3_c4_f32(float %x) {
	; CHECK-LABEL: fmul_c3_c4_f32:			; CHECK-LABEL: fmul_c3_c4_f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul float %x, 3.0			%y = fmul fast float %x, 3.0
	%z = fmul float %y, 4.0			%z = fmul fast float %y, 4.0
	ret float %z			ret float %z
	}			}

	define float @fmul_fneg_fneg_f32(float %x, float %y) {			define float @fmul_fneg_fneg_f32(float %x, float %y) {
	; CHECK-LABEL: fmul_fneg_fneg_f32:			; CHECK-LABEL: fmul_fneg_fneg_f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulss %xmm1, %xmm0			; CHECK-NEXT: mulss %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%x.neg = fsub float -0.0, %x			%x.neg = fsub float -0.0, %x
	%y.neg = fsub float -0.0, %y			%y.neg = fsub float -0.0, %y
	%mul = fmul float %x.neg, %y.neg			%mul = fmul float %x.neg, %y.neg
	ret float %mul			ret float %mul
	}			}

	define <4 x float> @fmul_fneg_fneg_v4f32(<4 x float> %x, <4 x float> %y) {			define <4 x float> @fmul_fneg_fneg_v4f32(<4 x float> %x, <4 x float> %y) {
	; CHECK-LABEL: fmul_fneg_fneg_v4f32:			; CHECK-LABEL: fmul_fneg_fneg_v4f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps %xmm1, %xmm0			; CHECK-NEXT: mulps %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%x.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %x			%x.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %x
	%y.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %y			%y.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %y
	%mul = fmul <4 x float> %x.neg, %y.neg			%mul = fmul <4 x float> %x.neg, %y.neg
	ret <4 x float> %mul			ret <4 x float> %mul
	}			}

	attributes #0 = { "less-precise-fpmad"="true" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "unsafe-fp-math"="true" }
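The constant-pool values checked in fmul-combines.ll can be reproduced by hand, which also shows why these folds need `reassoc` (or `fast`) on the nodes once the function-level `unsafe-fp-math` attribute is gone. The Python below is a sketch of the arithmetic only; the variable names are illustrative, and the `multiple_use` derivation is my reading of the checked constants rather than something stated in the test.

```python
import math


def premultiply(c1, c2):
    """Fold (x * c1) * c2 into x * (c1 * c2), lane by lane."""
    return [a * b for a, b in zip(c1, c2)]


C1 = [1.0, 2.0, 3.0, 4.0]
C2 = [5.0, 6.0, 7.0, 8.0]
C3 = [9.0, 10.0, 11.0, 12.0]

# fmul_v4f32_two_consts_no_splat: expects float 5, 12, 21, 32.
two_consts = premultiply(C1, C2)

# fmul_v4f32_two_consts_no_splat_multiple_use: x*C1 + (x*C1)*C2 collapses to
# x * (C1 + C1*C2), matching float 6, 14, 24, 36.
multiple_use = [a + a * b for a, b in zip(C1, C2)]

# PR22698_no_splats: expects float 45, 120, 231, 384.
no_splats = premultiply(C3, premultiply(C1, C2))

# Reassociation is not value-preserving in general: the strict order overflows.
x = 1e308
strict = (x * 2.0) * 0.5        # x * 2.0 overflows to inf first
reassociated = x * (2.0 * 0.5)  # the constants fold to 1.0, so x survives
```

Because the strict and reassociated results can differ (here `inf` versus `1e308`), the combine is only legal when the instructions themselves permit reassociation.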

llvm/trunk/test/CodeGen/X86/fp-fast.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=avx < %s | FileCheck %s
	; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=avx -enable-unsafe-fp-math --enable-no-nans-fp-math < %s | FileCheck %s

	define float @test1(float %a) {			define float @test1(float %a) #0 {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fadd float %a, %a			%t1 = fadd nnan reassoc nsz float %a, %a
	%r = fadd float %t1, %t1			%r = fadd nnan reassoc nsz float %t1, %t1
	ret float %r			ret float %r
	}			}

	define float @test2(float %a) {			define float @test2(float %a) #0 {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float 4.0, %a			%t1 = fmul nnan reassoc nsz float 4.0, %a
	%t2 = fadd float %a, %a			%t2 = fadd nnan reassoc nsz float %a, %a
	%r = fadd float %t1, %t2			%r = fadd nnan reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test3(float %a) {			define float @test3(float %a) #0 {
	; CHECK-LABEL: test3:			; CHECK-LABEL: test3:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float %a, 4.0			%t1 = fmul nnan reassoc nsz float %a, 4.0
	%t2 = fadd float %a, %a			%t2 = fadd nnan reassoc nsz float %a, %a
	%r = fadd float %t1, %t2			%r = fadd nnan reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test4(float %a) {			define float @test4(float %a) #0 {
	; CHECK-LABEL: test4:			; CHECK-LABEL: test4:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fadd float %a, %a			%t1 = fadd nnan reassoc nsz float %a, %a
	%t2 = fmul float 4.0, %a			%t2 = fmul nnan reassoc nsz float 4.0, %a
	%r = fadd float %t1, %t2			%r = fadd nnan reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test5(float %a) {			define float @test5(float %a) #0 {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fadd float %a, %a			%t1 = fadd nnan reassoc nsz float %a, %a
	%t2 = fmul float %a, 4.0			%t2 = fmul nnan reassoc nsz float %a, 4.0
	%r = fadd float %t1, %t2			%r = fadd nnan reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test6(float %a) {			define float @test6(float %a) #0 {
	; CHECK-LABEL: test6:			; CHECK-LABEL: test6:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float 2.0, %a			%t1 = fmul nnan reassoc nsz float 2.0, %a
	%t2 = fadd float %a, %a			%t2 = fadd nnan reassoc nsz float %a, %a
	%r = fsub float %t1, %t2			%r = fsub nnan reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test7(float %a) {			define float @test7(float %a) #0 {
	; CHECK-LABEL: test7:			; CHECK-LABEL: test7:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float %a, 2.0			%t1 = fmul nnan reassoc nsz float %a, 2.0
	%t2 = fadd float %a, %a			%t2 = fadd nnan reassoc nsz float %a, %a
	%r = fsub float %t1, %t2			%r = fsub nnan reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test8(float %a) {			define float @test8(float %a) #0 {
	; CHECK-LABEL: test8:			; CHECK-LABEL: test8:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float %a, 0.0			%t1 = fmul nsz float %a, 0.0
	%t2 = fadd float %a, %t1			%t2 = fadd nnan reassoc nsz float %a, %t1
	ret float %t2			ret float %t2
	}			}

	define float @test9(float %a) {			define float @test9(float %a) #0 {
	; CHECK-LABEL: test9:			; CHECK-LABEL: test9:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float 0.0, %a			%t1 = fmul nsz float 0.0, %a
	%t2 = fadd float %t1, %a			%t2 = fadd nnan reassoc nsz float %t1, %a
	ret float %t2			ret float %t2
	}			}

	define float @test10(float %a) {			define float @test10(float %a) #0 {
	; CHECK-LABEL: test10:			; CHECK-LABEL: test10:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fsub float -0.0, %a			%t1 = fsub nsz float -0.0, %a
	%t2 = fadd float %a, %t1			%t2 = fadd nnan reassoc nsz float %a, %t1
	ret float %t2			ret float %t2
	}			}

llvm/trunk/test/CodeGen/X86/fp-fold.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefixes=ANY,STRICT
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -enable-unsafe-fp-math | FileCheck %s --check-prefixes=ANY,UNSAFE			define float @fadd_zero_strict(float %x) {
				; CHECK-LABEL: fadd_zero_strict:
	define float @fadd_zero(float %x) {			; CHECK: # %bb.0:
	; STRICT-LABEL: fadd_zero:			; CHECK-NEXT: xorps %xmm1, %xmm1
	; STRICT: # %bb.0:			; CHECK-NEXT: addss %xmm1, %xmm0
	; STRICT-NEXT: xorps %xmm1, %xmm1			; CHECK-NEXT: retq
	; STRICT-NEXT: addss %xmm1, %xmm0
	; STRICT-NEXT: retq
	;
	; UNSAFE-LABEL: fadd_zero:
	; UNSAFE: # %bb.0:
	; UNSAFE-NEXT: retq
	%r = fadd float %x, 0.0			%r = fadd float %x, 0.0
	ret float %r			ret float %r
	}			}

	define float @fadd_negzero(float %x) {			define float @fadd_negzero(float %x) {
	; ANY-LABEL: fadd_negzero:			; CHECK-LABEL: fadd_negzero:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%r = fadd float %x, -0.0			%r = fadd float %x, -0.0
	ret float %r			ret float %r
	}			}

	define float @fadd_produce_zero(float %x) {			define float @fadd_produce_zero(float %x) {
	; ANY-LABEL: fadd_produce_zero:			; CHECK-LABEL: fadd_produce_zero:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%neg = fsub nsz float 0.0, %x			%neg = fsub nsz float 0.0, %x
	%r = fadd nnan float %neg, %x			%r = fadd nnan float %neg, %x
	ret float %r			ret float %r
	}			}

	define float @fadd_reassociate(float %x) {			define float @fadd_reassociate(float %x) {
	; ANY-LABEL: fadd_reassociate:			; CHECK-LABEL: fadd_reassociate:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: addss {{.*}}(%rip), %xmm0			; CHECK-NEXT: addss {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%sum = fadd float %x, 8.0			%sum = fadd float %x, 8.0
	%r = fadd reassoc nsz float %sum, 12.0			%r = fadd reassoc nsz float %sum, 12.0
	ret float %r			ret float %r
	}			}

	define float @fadd_negzero_nsz(float %x) {			define float @fadd_negzero_nsz(float %x) {
	; ANY-LABEL: fadd_negzero_nsz:			; CHECK-LABEL: fadd_negzero_nsz:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%r = fadd nsz float %x, -0.0			%r = fadd nsz float %x, -0.0
	ret float %r			ret float %r
	}			}

	define float @fadd_zero_nsz(float %x) {			define float @fadd_zero_nsz(float %x) {
	; ANY-LABEL: fadd_zero_nsz:			; CHECK-LABEL: fadd_zero_nsz:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%r = fadd nsz float %x, 0.0			%r = fadd nsz float %x, 0.0
	ret float %r			ret float %r
	}			}

	define float @fsub_zero(float %x) {			define float @fsub_zero(float %x) {
	; ANY-LABEL: fsub_zero:			; CHECK-LABEL: fsub_zero:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%r = fsub float %x, 0.0			%r = fsub float %x, 0.0
	ret float %r			ret float %r
	}			}

	define float @fsub_self(float %x) {			define float @fsub_self(float %x) {
	; ANY-LABEL: fsub_self:			; CHECK-LABEL: fsub_self:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%r = fsub nnan float %x, %x			%r = fsub nnan float %x, %x
	ret float %r			ret float %r
	}			}

	define float @fsub_neg_x_y(float %x, float %y) {			define float @fsub_neg_x_y(float %x, float %y) {
	; ANY-LABEL: fsub_neg_x_y:			; CHECK-LABEL: fsub_neg_x_y:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: subss %xmm0, %xmm1			; CHECK-NEXT: subss %xmm0, %xmm1
	; ANY-NEXT: movaps %xmm1, %xmm0			; CHECK-NEXT: movaps %xmm1, %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%neg = fsub nsz float 0.0, %x			%neg = fsub nsz float 0.0, %x
	%r = fadd nsz float %neg, %y			%r = fadd nsz float %neg, %y
	ret float %r			ret float %r
	}			}

	define float @fsub_neg_y(float %x, float %y) {			define float @fsub_neg_y(float %x, float %y) {
	; ANY-LABEL: fsub_neg_y:			; CHECK-LABEL: fsub_neg_y:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: mulss {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%mul = fmul float %x, 5.0			%mul = fmul float %x, 5.0
	%add = fadd float %mul, %y			%add = fadd float %mul, %y
	%r = fsub nsz reassoc float %y, %add			%r = fsub nsz reassoc float %y, %add
	ret float %r			ret float %r
	}			}

	define <4 x float> @fsub_neg_y_vector(<4 x float> %x, <4 x float> %y) {			define <4 x float> @fsub_neg_y_vector(<4 x float> %x, <4 x float> %y) {
	; ANY-LABEL: fsub_neg_y_vector:			; CHECK-LABEL: fsub_neg_y_vector:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%mul = fmul <4 x float> %x, <float 5.0, float 5.0, float 5.0, float 5.0>			%mul = fmul <4 x float> %x, <float 5.0, float 5.0, float 5.0, float 5.0>
	%add = fadd <4 x float> %mul, %y			%add = fadd <4 x float> %mul, %y
	%r = fsub nsz reassoc <4 x float> %y, %add			%r = fsub nsz reassoc <4 x float> %y, %add
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x float> @fsub_neg_y_vector_nonuniform(<4 x float> %x, <4 x float> %y) {			define <4 x float> @fsub_neg_y_vector_nonuniform(<4 x float> %x, <4 x float> %y) {
	; ANY-LABEL: fsub_neg_y_vector_nonuniform:			; CHECK-LABEL: fsub_neg_y_vector_nonuniform:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%mul = fmul <4 x float> %x, <float 5.0, float 6.0, float 7.0, float 8.0>			%mul = fmul <4 x float> %x, <float 5.0, float 6.0, float 7.0, float 8.0>
	%add = fadd <4 x float> %mul, %y			%add = fadd <4 x float> %mul, %y
	%r = fsub nsz reassoc <4 x float> %y, %add			%r = fsub nsz reassoc <4 x float> %y, %add
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define float @fsub_neg_y_commute(float %x, float %y) {			define float @fsub_neg_y_commute(float %x, float %y) {
	; ANY-LABEL: fsub_neg_y_commute:			; CHECK-LABEL: fsub_neg_y_commute:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: mulss {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%mul = fmul float %x, 5.0			%mul = fmul float %x, 5.0
	%add = fadd float %y, %mul			%add = fadd float %y, %mul
	%r = fsub nsz reassoc float %y, %add			%r = fsub nsz reassoc float %y, %add
	ret float %r			ret float %r
	}			}

	define <4 x float> @fsub_neg_y_commute_vector(<4 x float> %x, <4 x float> %y) {			define <4 x float> @fsub_neg_y_commute_vector(<4 x float> %x, <4 x float> %y) {
	; ANY-LABEL: fsub_neg_y_commute_vector:			; CHECK-LABEL: fsub_neg_y_commute_vector:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%mul = fmul <4 x float> %x, <float 5.0, float 5.0, float 5.0, float 5.0>			%mul = fmul <4 x float> %x, <float 5.0, float 5.0, float 5.0, float 5.0>
	%add = fadd <4 x float> %y, %mul			%add = fadd <4 x float> %y, %mul
	%r = fsub nsz reassoc <4 x float> %y, %add			%r = fsub nsz reassoc <4 x float> %y, %add
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	; Y - (X + Y) --> -X			; Y - (X + Y) --> -X

	define float @fsub_fadd_common_op_fneg(float %x, float %y) {			define float @fsub_fadd_common_op_fneg(float %x, float %y) {
	; ANY-LABEL: fsub_fadd_common_op_fneg:			; CHECK-LABEL: fsub_fadd_common_op_fneg:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: xorps {{.*}}(%rip), %xmm0			; CHECK-NEXT: xorps {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%a = fadd float %x, %y			%a = fadd float %x, %y
	%r = fsub reassoc nsz float %y, %a			%r = fsub reassoc nsz float %y, %a
	ret float %r			ret float %r
	}			}

	; Y - (X + Y) --> -X			; Y - (X + Y) --> -X

	define <4 x float> @fsub_fadd_common_op_fneg_vec(<4 x float> %x, <4 x float> %y) {			define <4 x float> @fsub_fadd_common_op_fneg_vec(<4 x float> %x, <4 x float> %y) {
	; ANY-LABEL: fsub_fadd_common_op_fneg_vec:			; CHECK-LABEL: fsub_fadd_common_op_fneg_vec:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: xorps {{.*}}(%rip), %xmm0			; CHECK-NEXT: xorps {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%a = fadd <4 x float> %x, %y			%a = fadd <4 x float> %x, %y
	%r = fsub nsz reassoc <4 x float> %y, %a			%r = fsub nsz reassoc <4 x float> %y, %a
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	; Y - (Y + X) --> -X			; Y - (Y + X) --> -X
	; Commute operands of the 'add'.			; Commute operands of the 'add'.

	define float @fsub_fadd_common_op_fneg_commute(float %x, float %y) {			define float @fsub_fadd_common_op_fneg_commute(float %x, float %y) {
	; ANY-LABEL: fsub_fadd_common_op_fneg_commute:			; CHECK-LABEL: fsub_fadd_common_op_fneg_commute:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: xorps {{.*}}(%rip), %xmm0			; CHECK-NEXT: xorps {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%a = fadd float %y, %x			%a = fadd float %y, %x
	%r = fsub reassoc nsz float %y, %a			%r = fsub reassoc nsz float %y, %a
	ret float %r			ret float %r
	}			}

	; Y - (Y + X) --> -X			; Y - (Y + X) --> -X

	define <4 x float> @fsub_fadd_common_op_fneg_commute_vec(<4 x float> %x, <4 x float> %y) {			define <4 x float> @fsub_fadd_common_op_fneg_commute_vec(<4 x float> %x, <4 x float> %y) {
	; ANY-LABEL: fsub_fadd_common_op_fneg_commute_vec:			; CHECK-LABEL: fsub_fadd_common_op_fneg_commute_vec:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: xorps {{.*}}(%rip), %xmm0			; CHECK-NEXT: xorps {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%a = fadd <4 x float> %y, %x			%a = fadd <4 x float> %y, %x
	%r = fsub reassoc nsz <4 x float> %y, %a			%r = fsub reassoc nsz <4 x float> %y, %a
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define float @fsub_negzero(float %x) {			define float @fsub_negzero_strict(float %x) {
	; STRICT-LABEL: fsub_negzero:			; CHECK-LABEL: fsub_negzero_strict:
	; STRICT: # %bb.0:			; CHECK: # %bb.0:
	; STRICT-NEXT: xorps %xmm1, %xmm1			; CHECK-NEXT: xorps %xmm1, %xmm1
	; STRICT-NEXT: addss %xmm1, %xmm0			; CHECK-NEXT: addss %xmm1, %xmm0
	; STRICT-NEXT: retq			; CHECK-NEXT: retq
	;
	; UNSAFE-LABEL: fsub_negzero:
	; UNSAFE: # %bb.0:
	; UNSAFE-NEXT: retq
	%r = fsub float %x, -0.0			%r = fsub float %x, -0.0
	ret float %r			ret float %r
	}			}

	define <4 x float> @fsub_negzero_vector(<4 x float> %x) {			define float @fsub_negzero_nsz(float %x) {
	; STRICT-LABEL: fsub_negzero_vector:			; CHECK-LABEL: fsub_negzero_nsz:
	; STRICT: # %bb.0:			; CHECK: # %bb.0:
	; STRICT-NEXT: xorps %xmm1, %xmm1			; CHECK-NEXT: retq
	; STRICT-NEXT: addps %xmm1, %xmm0			%r = fsub nsz float %x, -0.0
	; STRICT-NEXT: retq			ret float %r
	;			}
	; UNSAFE-LABEL: fsub_negzero_vector:
	; UNSAFE: # %bb.0:			define <4 x float> @fsub_negzero_strict_vector(<4 x float> %x) {
	; UNSAFE-NEXT: retq			; CHECK-LABEL: fsub_negzero_strict_vector:
				; CHECK: # %bb.0:
				; CHECK-NEXT: xorps %xmm1, %xmm1
				; CHECK-NEXT: addps %xmm1, %xmm0
				; CHECK-NEXT: retq
	%r = fsub <4 x float> %x, <float -0.0, float -0.0, float -0.0, float -0.0>			%r = fsub <4 x float> %x, <float -0.0, float -0.0, float -0.0, float -0.0>
	ret <4 x float> %r			ret <4 x float> %r
	}			}

				define <4 x float> @fsub_negzero_nsz_vector(<4 x float> %x) {
				; CHECK-LABEL: fsub_negzero_nsz_vector:
				; CHECK: # %bb.0:
				; CHECK-NEXT: retq
				%r = fsub nsz <4 x float> %x, <float -0.0, float -0.0, float -0.0, float -0.0>
				ret <4 x float> %r
				}

	define float @fsub_zero_nsz_1(float %x) {			define float @fsub_zero_nsz_1(float %x) {
	; ANY-LABEL: fsub_zero_nsz_1:			; CHECK-LABEL: fsub_zero_nsz_1:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%r = fsub nsz float %x, 0.0			%r = fsub nsz float %x, 0.0
	ret float %r			ret float %r
	}			}

	define float @fsub_zero_nsz_2(float %x) {			define float @fsub_zero_nsz_2(float %x) {
	; ANY-LABEL: fsub_zero_nsz_2:			; CHECK-LABEL: fsub_zero_nsz_2:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: xorps {{.*}}(%rip), %xmm0			; CHECK-NEXT: xorps {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%r = fsub nsz float 0.0, %x			%r = fsub nsz float 0.0, %x
	ret float %r			ret float %r
	}			}

	define float @fsub_negzero_nsz(float %x) {
	; ANY-LABEL: fsub_negzero_nsz:
	; ANY: # %bb.0:
	; ANY-NEXT: retq
	%r = fsub nsz float %x, -0.0
	ret float %r
	}

	define float @fmul_zero(float %x) {			define float @fmul_zero(float %x) {
	; ANY-LABEL: fmul_zero:			; CHECK-LABEL: fmul_zero:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%r = fmul nnan nsz float %x, 0.0			%r = fmul nnan nsz float %x, 0.0
	ret float %r			ret float %r
	}			}

	define float @fmul_one(float %x) {			define float @fmul_one(float %x) {
	; ANY-LABEL: fmul_one:			; CHECK-LABEL: fmul_one:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%r = fmul float %x, 1.0			%r = fmul float %x, 1.0
	ret float %r			ret float %r
	}			}

	define float @fmul_x_const_const(float %x) {			define float @fmul_x_const_const(float %x) {
	; ANY-LABEL: fmul_x_const_const:			; CHECK-LABEL: fmul_x_const_const:
	; ANY: # %bb.0:			; CHECK: # %bb.0:
	; ANY-NEXT: mulss {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; CHECK-NEXT: retq
	%mul = fmul reassoc float %x, 9.0			%mul = fmul reassoc float %x, 9.0
	%r = fmul reassoc float %mul, 4.0			%r = fmul reassoc float %mul, 4.0
	ret float %r			ret float %r
	}			}
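The `fsub_negzero` / `fsub_negzero_nsz` pairs above differ only in the `nsz` flag, and that flag decides whether `x - (-0.0)` may be folded to `x`. The following is a minimal sketch of the underlying IEEE-754 arithmetic, not part of the patch. It assumes Python floats behave as IEEE doubles under round-to-nearest, and the helper names `sign_bit`, `fsub_negzero`, and `fsub_zero_lhs` are hypothetical, chosen to echo the test names.

```python
import math


def sign_bit(v: float) -> bool:
    """Return True when v carries a negative sign, including -0.0."""
    return math.copysign(1.0, v) < 0.0


def fsub_negzero(x: float) -> float:
    """Model of `fsub float %x, -0.0` evaluated without any folding."""
    return x - (-0.0)


def fsub_zero_lhs(x: float) -> float:
    """Model of `fsub float 0.0, %x` evaluated without any folding."""
    return 0.0 - x


if __name__ == "__main__":
    for x in (1.5, -1.5, 0.0, -0.0):
        r = fsub_negzero(x)
        # Folding to x is exact only when value and sign bit both match.
        same = (r == x) and (sign_bit(r) == sign_bit(x))
        print(f"x={x!r:>5}  x-(-0.0)={r!r:>5}  fold-to-x exact: {same}")

    for x in (2.0, 0.0):
        r = fsub_zero_lhs(x)
        # Folding to -x is exact only when value and sign bit both match.
        same = (r == -x) and (sign_bit(r) == sign_bit(-x))
        print(f"x={x!r:>5}  0.0-x={r!r:>5}  fold-to-fneg exact: {same}")
```

The only input for which `x - (-0.0)` differs from `x` is `x = -0.0`, where the sum `-0.0 + 0.0` rounds to `+0.0`. That is why the strict variants keep the `xorps`/`addss` sequence while the `nsz` variants reduce to a bare `retq`. The same reasoning applies to `fsub nsz float 0.0, %x` in `fsub_zero_nsz_2`, which becomes a sign flip: `0.0 - 0.0` is `+0.0`, whereas negating `0.0` gives `-0.0`.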

Revision Contents

Diff 212674

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

llvm/trunk/test/CodeGen/AArch64/fadd-combines.ll

llvm/trunk/test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll

llvm/trunk/test/CodeGen/AMDGPU/ffloor.f64.ll

llvm/trunk/test/CodeGen/AMDGPU/fneg-combines.ll

llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll

llvm/trunk/test/CodeGen/PowerPC/fmf-propagation.ll

llvm/trunk/test/CodeGen/PowerPC/qpx-recipest.ll

llvm/trunk/test/CodeGen/PowerPC/recipest.ll

llvm/trunk/test/CodeGen/X86/dagcombine-unsafe-math.ll

llvm/trunk/test/CodeGen/X86/fmul-combines.ll

llvm/trunk/test/CodeGen/X86/fp-fast.ll

llvm/trunk/test/CodeGen/X86/fp-fold.ll
