This is an archive of the discontinued LLVM Phabricator instance.

Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control
Closed Public

Authored by mcberg2017 on Jul 23 2019, 3:56 PM.

Details

Reviewers

spatel
arsenm
hfinkel
wristow
craig.topper

Commits

rG005d705d4392: Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize…
rL367486: Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize…

Summary

Honoring no signed zeros is also available through clang as a separate user control, independent of any fast-math or UnsafeFPMath context, so the DAG guards should reflect this.
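To make concrete what the no-signed-zeros control licenses, here is a minimal standalone C++ sketch. It is not LLVM code, and `addIsIdentity` is a hypothetical helper introduced only for illustration. Assuming IEEE-754 arithmetic with the default rounding mode, and a compiler that is not itself running in a fast-math mode, it shows that adding -0.0 never changes a value, while adding +0.0 turns -0.0 into +0.0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper for illustration: returns true when x + c yields exactly
// x, treating +0.0 and -0.0 as distinct values. NaN inputs are not considered.
bool addIsIdentity(double x, double c) {
  double sum = x + c;
  return sum == x && std::signbit(sum) == std::signbit(x);
}
```

Calling `addIsIdentity(-0.0, 0.0)` returns false, which is why folding `x + 0.0` to `x` needs an nsz guarantee, whereas `addIsIdentity(x, -0.0)` holds for every non-NaN `x`.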

Event Timeline

mcberg2017 created this revision. Jul 23 2019, 3:56 PM

mcberg2017 marked an inline comment as done. Jul 23 2019, 3:59 PM

mcberg2017 added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12059	In this case, UnsafeFPMath holds reassociation context.
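The effect of this guard change can be modeled with a small standalone C++ sketch. The structs below are hypothetical stand-ins for the global options and the per-node fast-math flags, not the real LLVM types, and the extra `AllowNewConst` term in the fadd guard is left out for brevity.

```cpp
#include <cassert>

// Hypothetical stand-ins for the global option bits and per-node flags.
struct GlobalFPOptions {
  bool UnsafeFPMath;
  bool NoSignedZerosFPMath;
};
struct NodeFlags {
  bool AllowReassociation;
  bool NoSignedZeros;
};

// Guard before this revision: the global unsafe bit alone was sufficient.
bool oldReassocGuard(GlobalFPOptions O, NodeFlags F) {
  return O.UnsafeFPMath || (F.AllowReassociation && F.NoSignedZeros);
}

// Guard after this revision: UnsafeFPMath supplies only the reassociation
// half and must be paired with the global no-signed-zeros option.
bool newReassocGuard(GlobalFPOptions O, NodeFlags F) {
  return (O.UnsafeFPMath && O.NoSignedZerosFPMath) ||
         (F.AllowReassociation && F.NoSignedZeros);
}
```

In this model, a configuration with only `UnsafeFPMath` set passes the old guard but not the new one. A global nsz option combined with a node-level reassoc flag does not pass either, because each side of the `||` must be satisfied from a single source.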

nhaehnle removed a subscriber: nhaehnle. Jul 24 2019, 12:15 AM

updated with one case...

Code changes look good. See inline for some comments about the tests.

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll
1	Here and other test files: is it possible to remove "-enable-unsafe-fp-math" and still get the same test results? If we can tighten up the constraints, that would help move us away from the global requirements. Another option would be to specify the function-level attribute only on the tests that still require more than 'nsz' to produce the expected test results.
test/CodeGen/AMDGPU/fneg-combines.ll
222–227	I don't know how to interpret this test diff: regression, improvement, or does this test no longer accomplish its original intent?

spatel added inline comments. Jul 27 2019, 6:04 AM

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll
1	Still another option (and moves us closer still to the goal of IR/node-level flags only): can we remove the global settings entirely, add FMF to the IR in these tests, and maintain their intent?

mcberg2017 marked 2 inline comments as done. Jul 29 2019, 9:47 AM

mcberg2017 added inline comments.

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll
1	We need the new flag or attribute to keep the results the same. I will look over the tests and see where (hopefully all) we can use attributes in place of the flags.
test/CodeGen/AMDGPU/fneg-combines.ll
222–227	Yes, we no longer optimize this case. Should I just remove the gcn-nsz-dag context for fneg_fadd_0?

spatel added subscribers: nhaehnle, foad, rampitec. Jul 29 2019, 10:07 AM

spatel added inline comments.

test/CodeGen/AMDGPU/fneg-combines.ll
222–227	Someone with AMDGPU knowledge should answer that. cc'ing @arsenm @nhaehnle @foad @rampitec

rampitec added inline comments. Jul 29 2019, 10:12 AM

test/CodeGen/AMDGPU/fneg-combines.ll
222–227	This is clearly a regression.

mcberg2017 marked an inline comment as done. Jul 29 2019, 10:31 AM

mcberg2017 added inline comments.

test/CodeGen/AMDGPU/fneg-combines.ll
222–227	I will debug this case. It looks like there's an additional dependency with the new code shape with respect to fadd.

The new code is actually better by one instruction; the test simply never completed the full match. On the old path, -enable-no-signed-zeros-fp-math was on, but the zero fold of the fadd could not be reached through llc because the flags are all user controlled. This should not be a regression.

test/CodeGen/AMDGPU/fneg-combines.ll

222–227

with the change of

// N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
if (N1C && N1C->isZero())
  if (N1C->isNegative() || Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())
    return N0;

we get this DAG:

SelectionDAG has 21 nodes:

t0: ch = EntryToken
t2: f32,ch = CopyFromReg t0, Register:f32 %0
          t27: f32 = fneg t28
        t13: i1 = setcc t27, t2, setuge:ch
      t15: f32 = select t13, t2, t28
    t17: i1 = setcc t15, ConstantFP:f32<0.000000e+00>, setule:ch
  t19: f32 = select t17, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<nan>
t21: ch,glue = CopyToReg t0, Register:f32 $vgpr0, t19
    t4: f32,ch = CopyFromReg t0, Register:f32 %1
  t7: f32 = fdiv ConstantFP:f32<1.000000e+00>, t4
t28: f32 = fmul nnan arcp contract reassoc t7, ConstantFP:f32<-0.000000e+00>
t22: ch = RETURN_TO_EPILOG t21, Register:f32 $vgpr0, t21:1

for which we produce this assembler:

fneg_fadd_0: ; @fneg_fadd_0
; %bb.0: ; %.entry

v_rcp_f32_e32 v0, s1
v_mov_b32_e32 v1, s0
v_mov_b32_e32 v2, 0x7fc00000
v_mul_f32_e32 v0, 0x80000000, v0
v_cmp_nlt_f32_e64 vcc, -v0, s0
v_cndmask_b32_e32 v0, v0, v1, vcc
v_cmp_nlt_f32_e32 vcc, 0, v0
v_cndmask_b32_e64 v0, v2, 0, vcc

With the code as it currently stands (unmodified):

// N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
if (N1C && N1C->isZero())
  if (N1C->isNegative() || Options.UnsafeFPMath || Flags.hasNoSignedZeros())
    return N0;

we get this DAG:

SelectionDAG has 21 nodes:

t0: ch = EntryToken
t2: f32,ch = CopyFromReg t0, Register:f32 %0
      t4: f32,ch = CopyFromReg t0, Register:f32 %1
    t7: f32 = fdiv ConstantFP:f32<1.000000e+00>, t4
  t9: f32 = fmul t7, ConstantFP:f32<0.000000e+00>
t11: f32 = fadd t9, ConstantFP:f32<0.000000e+00>
        t13: i1 = setcc t11, t2, setuge:ch
        t14: f32 = fneg t11
      t15: f32 = select t13, t2, t14
    t17: i1 = setcc t15, ConstantFP:f32<0.000000e+00>, setule:ch
  t19: f32 = select t17, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<nan>
t21: ch,glue = CopyToReg t0, Register:f32 $vgpr0, t19
t22: ch = RETURN_TO_EPILOG t21, Register:f32 $vgpr0, t21:1

for which we fold a fused multiply-add, and produce this assembler:

fneg_fadd_0: ; @fneg_fadd_0
; %bb.0: ; %.entry

v_rcp_f32_e32 v0, s1
v_bfrev_b32_e32 v1, 1
v_mov_b32_e32 v2, s0
v_mac_f32_e32 v1, v0, v1
v_cmp_nlt_f32_e64 vcc, -v1, s0
v_cndmask_b32_e32 v0, v1, v2, vcc
v_mov_b32_e32 v1, 0x7fc00000
v_cmp_nlt_f32_e32 vcc, 0, v0
v_cndmask_b32_e64 v0, v1, 0, vcc

rampitec added inline comments. Jul 29 2019, 12:51 PM

test/CodeGen/AMDGPU/fneg-combines.ll
222–227	I am trying to understand what the existing ISA does, and it is:
v_rcp_f32_e32 v0, s1 ; v0 = 1 / s1
v_bfrev_b32_e32 v1, 1 ; v1 = 0x80000000 = -0.0
v_mov_b32_e32 v2, s0 ; v2 = s0
v_mac_f32_e32 v1, v0, v1 ; v1 = v1 * v0 + v1 = v0 * -0.0 - 0.0 = 0
Instead of that fancy mac instruction, that all now seems to be folded into:
v_mul_f32_e32 v0, 0x80000000, v0 ; v0 = v0 * -0.0
I.e. it is hardly practically performance-relevant code. The comment above says it used to assert, so I guess this is just a regression test. Given no signed zeroes this is as good as just v0 = 0, but that's a different matter. I have no objection to this test change.
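The arithmetic in this walkthrough can be checked with a short standalone C++ sketch. The helper names `bitReverse32`, `bitsToFloat`, and `mac` are hypothetical, and `mac` only models the value computed by a plain multiply-accumulate, not the exact rounding behavior of the hardware instruction. It assumes a 32-bit IEEE-754 `float`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reverse the bit order of a 32-bit word, as the walkthrough describes for
// v_bfrev_b32 applied to the constant 1.
std::uint32_t bitReverse32(std::uint32_t v) {
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (v & 1u);
    v >>= 1;
  }
  return r;
}

// Reinterpret a 32-bit pattern as a single-precision value.
float bitsToFloat(std::uint32_t bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Plain multiply-accumulate: returns a * b + acc.
float mac(float a, float b, float acc) { return a * b + acc; }
```

Reversing the bits of 1 gives 0x80000000, the bit pattern of -0.0f, so for a finite input both `mac(v0, -0.0f, -0.0f)` and `v0 * -0.0f` produce a zero, which matches the observation that the multiply alone replaces the older multiply-accumulate sequence.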

Note: test/CodeGen/AMDGPU/fneg-combines.ll needs rearchitecting, so I left it in options-flag form. test/CodeGen/PowerPC/fmf-propagation.ll has portions that can be removed once we stop using the options flags, so I am leaving it in its current form, with modifications, until that happens. test/CodeGen/X86/fp-fast.ll uses a subset of FMF that equates to the options flags that were used (let me know if you would rather generalize to fast or use a smaller subset). All the others use either fast or context-relevant FMF and have been converted to not use options flags. Have a look and see what needs editing; currently this all passes testing.

spatel added inline comments. Jul 31 2019, 3:44 AM

test/CodeGen/AArch64/fadd-combines.ll
150–151	This comment is incorrect now. MachineCombiner was relying on the function attribute?
test/CodeGen/PowerPC/fma-mutate.ll
6	Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be.
13–16	Would it be better to auto-generate the complete output for these tests using utils/update_llc_test_checks.py?
test/CodeGen/PowerPC/qpx-recipest.ll
1	Same comment as above: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be.
test/CodeGen/PowerPC/recipest.ll
1	Same comment as above: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be.
test/CodeGen/X86/dagcombine-unsafe-math.ll
64	If 'contract' is required, that is unnecessary? Mark with a 'FIXME' note?
test/CodeGen/X86/fp-fast.ll
8–9	Same as earlier comment: If 'contract' is required, that is unnecessary? Mark with a 'FIXME' note?
test/CodeGen/X86/fp-fold.ll
1	Same comment as earlier: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be. Definitely use the auto-generation script when possible for x86 tests.

Added a TODO comment for machine combines in test/CodeGen/AArch64/fadd-combines.ll. test/CodeGen/PowerPC/fma-mutate.ll, test/CodeGen/PowerPC/qpx-recipest.ll, test/CodeGen/PowerPC/recipest.ll, and test/CodeGen/X86/fp-fold.ll now have just standard CHECK lines. The fmf contract flag was removed from test/CodeGen/X86/dagcombine-unsafe-math.ll and test/CodeGen/X86/fp-fast.ll.

LGTM - thanks for the test file updates. See inline comment about the AArch64 test.

test/CodeGen/AArch64/fadd-combines.ll
152–154	That comment seems wrong from the start. This form has better throughput than 3 chained adds. Ideally, this would be an FMA: 2.0 * x + 101.0? The '17' variable names in the check lines are wrong too: 1109917696 --> 0x42280000 --> 42.0
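The constants mentioned in this comment can be decoded with a few lines of standalone C++. This is only a sketch: `bitsToFloat` and `chainedAdds` are hypothetical names, a 32-bit IEEE-754 `float` is assumed, and `chainedAdds` mirrors the three fadd instructions of the test function without any fast-math rewriting.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a single-precision value.
float bitsToFloat(std::uint32_t bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Mirrors the IR of fadd_const_multiuse_attr:
//   a1 = x + 42.0; a2 = a1 + 17.0; a3 = a1 + a2
float chainedAdds(float x) {
  float a1 = x + 42.0f;
  float a2 = a1 + 17.0f;
  return a1 + a2;
}
```

Decoding the two immediates from the CHECK lines gives 42.0 for 1109917696 and 59.0 for 1114374144, so the register captured as `W17` actually holds 42.0. For inputs where every intermediate sum is exact, `chainedAdds(x)` equals `2.0f * x + 101.0f`, the closed form suggested in the comment.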

This revision is now accepted and ready to land. Jul 31 2019, 1:22 PM

Closed by commit rL367486: Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize… (authored by mcberg2017). Jul 31 2019, 3:01 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	12 lines
lib/CodeGen/SelectionDAG/SelectionDAG.cpp	2 lines
test/CodeGen/AArch64/fadd-combines.ll	20 lines
test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll	26 lines
test/CodeGen/AMDGPU/ffloor.f64.ll	28 lines
test/CodeGen/AMDGPU/fneg-combines.ll	7 lines
test/CodeGen/PowerPC/ (four files, names not shown)	21 lines, 2 lines, 280 lines, 381 lines
test/CodeGen/X86/dagcombine-unsafe-math.ll	7 lines
test/CodeGen/X86/fmul-combines.ll	54 lines
test/CodeGen/X86/fp-fast.ll	76 lines
test/CodeGen/X86/fp-fold.ll	50 lines

Diff 212412

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ 814 lines not shown @@ if (TLI.isOperationLegal(ISD::ConstantFP, VT) &&
       return 1;
     return llvm::all_of(Op->op_values(), [&](SDValue N) {
       return N.isUndef() ||
              TLI.isFPImmLegal(neg(cast<ConstantFPSDNode>(N)->getValueAPF()), VT,
                               ForCodeSize);
     });
   }
   case ISD::FADD:
-    if (!Options->UnsafeFPMath && !Flags.hasNoSignedZeros())
+    if (!Options->NoSignedZerosFPMath && !Flags.hasNoSignedZeros())
       return 0;
 
     // After operation legalization, it might not be legal to create new FSUBs.
     if (LegalOperations && !TLI.isOperationLegalOrCustom(ISD::FSUB, VT))
       return 0;
 
     // fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
     if (char V = isNegatibleForFree(Op.getOperand(0), LegalOperations, TLI,
@@ 56 lines not shown @@ for (SDValue C : Op->op_values()) {
       }
       APFloat V = cast<ConstantFPSDNode>(C)->getValueAPF();
       V.changeSign();
       Ops.push_back(DAG.getConstantFP(V, SDLoc(Op), C.getValueType()));
     }
     return DAG.getBuildVector(Op.getValueType(), SDLoc(Op), Ops);
   }
   case ISD::FADD:
-    assert(Options.UnsafeFPMath || Flags.hasNoSignedZeros());
+    assert(Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros());
 
     // fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
     if (isNegatibleForFree(Op.getOperand(0), LegalOperations,
                            DAG.getTargetLoweringInfo(), &Options, ForCodeSize,
                            Depth + 1))
       return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
                          GetNegatedExpression(Op.getOperand(0), DAG,
                                               LegalOperations, ForCodeSize,
@@ 11,088 lines not shown @@ SDValue DAGCombiner::visitFADD(SDNode *N) {
 
   // canonicalize constant to RHS
   if (N0CFP && !N1CFP)
     return DAG.getNode(ISD::FADD, DL, VT, N1, N0, Flags);
 
   // N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
   ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
   if (N1C && N1C->isZero())
-    if (N1C->isNegative() || Options.UnsafeFPMath || Flags.hasNoSignedZeros())
+    if (N1C->isNegative() || Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())
       return N0;
 
   if (SDValue NewSel = foldBinOpIntoSelect(N))
     return NewSel;
 
   // fold (fadd A, (fneg B)) -> (fsub A, B)
   if ((!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FSUB, VT)) &&
       isNegatibleForFree(N1, LegalOperations, TLI, &Options, ForCodeSize) == 2)
@@ 41 lines not shown @@ if ((Options.NoNaNsFPMath || Flags.hasNoNaNs()) && AllowNewConst) {
     // If allowed, fold (fadd x, (fneg x)) -> 0.0
     if (N1.getOpcode() == ISD::FNEG && N1.getOperand(0) == N0)
       return DAG.getConstantFP(0.0, DL, VT);
   }
 
   // If 'unsafe math' or reassoc and nsz, fold lots of things.
   // TODO: break out portions of the transformations below for which Unsafe is
   // considered and which do not require both nsz and reassoc
-  if ((Options.UnsafeFPMath ||
+  if (((Options.UnsafeFPMath && Options.NoSignedZerosFPMath) ||
        (Flags.hasAllowReassociation() && Flags.hasNoSignedZeros())) &&
       AllowNewConst) {
     // fadd (fadd x, c1), c2 -> fadd x, c1 + c2
     if (N1CFP && N0.getOpcode() == ISD::FADD &&
         isConstantFPBuildVectorOrConstantFP(N0.getOperand(1))) {
       SDValue NewC = DAG.getNode(ISD::FADD, DL, VT, N0.getOperand(1), N1, Flags);
       return DAG.getNode(ISD::FADD, DL, VT, N0.getOperand(0), NewC, Flags);
     }
@@ 102 lines not shown @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
   if (N0CFP && N1CFP)
     return DAG.getNode(ISD::FSUB, DL, VT, N0, N1, Flags);
 
   if (SDValue NewSel = foldBinOpIntoSelect(N))
     return NewSel;
 
   // (fsub A, 0) -> A
   if (N1CFP && N1CFP->isZero()) {
-    if (!N1CFP->isNegative() || Options.UnsafeFPMath ||
+    if (!N1CFP->isNegative() || Options.NoSignedZerosFPMath ||
         Flags.hasNoSignedZeros()) {
       return N0;
     }
   }
 
   if (N0 == N1) {
     // (fsub x, x) -> 0.0
     if (Options.NoNaNsFPMath || Flags.hasNoNaNs())
@@ 10 lines not shown @@ if (N0CFP->isNegative() ||
         (Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())) {
       if (isNegatibleForFree(N1, LegalOperations, TLI, &Options, ForCodeSize))
         return GetNegatedExpression(N1, DAG, LegalOperations, ForCodeSize);
       if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
         return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);
     }
   }
 
-  if ((Options.UnsafeFPMath ||
+  if (((Options.UnsafeFPMath && Options.NoSignedZerosFPMath) ||
        (Flags.hasAllowReassociation() && Flags.hasNoSignedZeros()))
       && N1.getOpcode() == ISD::FADD) {
     // X - (X + Y) -> -Y
     if (N0 == N1->getOperand(0))
       return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(1), Flags);
     // X - (Y + X) -> -Y
     if (N0 == N1->getOperand(1))
       return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(0), Flags);
@@ 8,618 lines not shown @@
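Two of the folds guarded in this file can be probed numerically with a standalone C++ sketch. The function names are hypothetical, and the sketch assumes IEEE-754 doubles compiled without fast-math. It compares `fneg (fadd A, B)` against `fsub (fneg A), B`, and evaluates `X - (X + Y)` as written so it can be compared against the folded result `-Y`.

```cpp
#include <cassert>
#include <cmath>

// fneg (fadd A, B), evaluated as written.
double negOfSum(double a, double b) { return -(a + b); }

// The folded form: fsub (fneg A), B.
double negASubB(double a, double b) { return (-a) - b; }

// X - (X + Y), evaluated as written; the fold replaces it with -Y.
double subOfSum(double x, double y) { return x - (x + y); }
```

For A = 1.0 and B = -1.0 the two negation forms both yield zero but with opposite signs, so that fold only needs nsz. For X = 1e20 and Y = 1.0 the sum X + Y rounds back to X, so the written form yields 0.0 while the folded form would yield -1.0, which is why that fold also needs reassociation.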

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

@@ 4,639 lines not shown @@ if (OpOpcode == ISD::EXTRACT_VECTOR_ELT &&
       return Operand.getOperand(0);
     break;
   case ISD::FNEG:
     // Negation of an unknown bag of bits is still completely undefined.
     if (OpOpcode == ISD::UNDEF)
       return getUNDEF(VT);
 
     // -(X-Y) -> (Y-X) is unsafe because when X==Y, -0.0 != +0.0
-    if ((getTarget().Options.UnsafeFPMath || Flags.hasNoSignedZeros()) &&
+    if ((getTarget().Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros()) &&
         OpOpcode == ISD::FSUB)
       return getNode(ISD::FSUB, DL, VT, Operand.getOperand(1),
                      Operand.getOperand(0), Flags);
     if (OpOpcode == ISD::FNEG) // --X -> X
       return Operand.getOperand(0);
     break;
   case ISD::FABS:
     if (OpOpcode == ISD::FNEG) // abs(-X) -> abs(X)
@@ 4,938 lines not shown @@
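The source comment in this hunk, that -(X-Y) -> (Y-X) is unsafe when X == Y, can be reproduced with a minimal standalone C++ sketch. The function names are hypothetical, and IEEE-754 doubles with the default rounding mode and no fast-math compilation are assumed.

```cpp
#include <cassert>
#include <cmath>

// fneg (fsub X, Y), evaluated as written.
double negOfSub(double x, double y) { return -(x - y); }

// The folded form: fsub Y, X.
double swappedSub(double x, double y) { return y - x; }
```

When X equals Y, `negOfSub` returns -0.0 and `swappedSub` returns +0.0. For any other finite pair without overflow the two agree, which is why a no-signed-zeros guarantee alone is enough to permit this fold.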

test/CodeGen/AArch64/fadd-combines.ll

@@ 141 lines not shown @@
 ; CHECK-NEXT: ret
   %a1 = fadd float %x, 42.0
   %a2 = fadd nsz reassoc float %a1, 17.0
   %a3 = fadd float %a1, %a2
   ret float %a3
 }
 
 ; DAGCombiner transforms this into: (x + 59.0) + (x + 17.0).
 ; The machine combiner transforms this into a chain of 3 dependent adds:
 ; ((x + 59.0) + 17.0) + x
 
-define float @fadd_const_multiuse_attr(float %x) #0 {
+define float @fadd_const_multiuse_attr(float %x) {
 ; CHECK-LABEL: fadd_const_multiuse_attr:
 ; CHECK: // %bb.0:
-; CHECK-DAG: mov [[W59:w[0-9]+]], #1114374144
 ; CHECK-DAG: mov [[W17:w[0-9]+]], #1109917696
-; CHECK-NEXT: fmov [[FP59:s[0-9]+]], [[W59]]
+; CHECK-DAG: mov [[W59:w[0-9]+]], #1114374144
 ; CHECK-NEXT: fmov [[FP17:s[0-9]+]], [[W17]]
-; CHECK-NEXT: fadd [[TMP1:s[0-9]+]], s0, [[FP59]]
-; CHECK-NEXT: fadd [[TMP2:s[0-9]+]], [[FP17]], [[TMP1]]
-; CHECK-NEXT: fadd s0, s0, [[TMP2]]
+; CHECK-NEXT: fmov [[FP59:s[0-9]+]], [[W59]]
+; CHECK-NEXT: fadd [[TMP1:s[0-9]+]], s0, [[FP17]]
+; CHECK-NEXT: fadd [[TMP2:s[0-9]+]], s0, [[FP59]]
+; CHECK-NEXT: fadd s0, [[TMP1]], [[TMP2]]
 ; CHECK-NEXT: ret
-  %a1 = fadd float %x, 42.0
-  %a2 = fadd float %a1, 17.0
-  %a3 = fadd float %a1, %a2
+  %a1 = fadd fast float %x, 42.0
+  %a2 = fadd fast float %a1, 17.0
+  %a3 = fadd fast float %a1, %a2
   ret float %a3
 }
 
-attributes #0 = { "unsafe-fp-math"="true" }
-
 declare void @use(double)

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll

-; RUN: llc -march=amdgcn -enable-no-signed-zeros-fp-math=0 < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-SAFE %s
-; RUN: llc -march=amdgcn -enable-no-signed-zeros-fp-math=1 < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-UNSAFE %s
-; RUN: llc -march=amdgcn -enable-unsafe-fp-math < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-UNSAFE %s
+; RUN: llc -march=amdgcn < %s | FileCheck --check-prefixes=GCN,GCN-FMF,GCN-SAFE %s
 
 declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
 
 ; Test that the -enable-no-signed-zeros-fp-math flag works
 
-; GCN-LABEL: {{^}}fneg_fsub_f32:
+; GCN-LABEL: {{^}}fneg_fsub_f32_fmf:
 ; GCN: v_sub_f32_e32 [[SUB:v[0-9]+]], {{v[0-9]+}}, {{v[0-9]+}}
-; GCN-SAFE: v_xor_b32_e32 v{{[0-9]+}}, 0x80000000, [[SUB]]
+; GCN-FMF-NOT: xor
+define amdgpu_kernel void @fneg_fsub_f32_fmf(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
+  %tid = call i32 @llvm.amdgcn.workitem.id.x()
+  %add = add i32 %tid, 1
+  %gep = getelementptr float, float addrspace(1)* %in, i32 %tid
+  %b_ptr = getelementptr float, float addrspace(1)* %in, i32 %add
+  %a = load float, float addrspace(1)* %gep, align 4
+  %b = load float, float addrspace(1)* %b_ptr, align 4
+  %result = fsub fast float %a, %b
+  %neg.result = fsub fast float -0.0, %result
+  store float %neg.result, float addrspace(1)* %out, align 4
+  ret void
+}
 
-; GCN-UNSAFE-NOT: xor
-define amdgpu_kernel void @fneg_fsub_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
+; GCN-LABEL: {{^}}fneg_fsub_f32_safe:
+; GCN: v_sub_f32_e32 [[SUB:v[0-9]+]], {{v[0-9]+}}, {{v[0-9]+}}
+; GCN-SAFE: v_xor_b32_e32 v{{[0-9]+}}, 0x80000000, [[SUB]]
+define amdgpu_kernel void @fneg_fsub_f32_safe(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
   %tid = call i32 @llvm.amdgcn.workitem.id.x()
   %add = add i32 %tid, 1
   %gep = getelementptr float, float addrspace(1)* %in, i32 %tid
   %b_ptr = getelementptr float, float addrspace(1)* %in, i32 %add
   %a = load float, float addrspace(1)* %gep, align 4
   %b = load float, float addrspace(1)* %b_ptr, align 4
   %result = fsub float %a, %b
   %neg.result = fsub float -0.0, %result
   store float %neg.result, float addrspace(1)* %out, align 4
   ret void
 }
 
 attributes #0 = { nounwind }

test/CodeGen/AMDGPU/ffloor.f64.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs -enable-unsafe-fp-math < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -enable-unsafe-fp-math < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs -enable-unsafe-fp-math < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s

	declare double @llvm.fabs.f64(double %Val)			declare double @llvm.fabs.f64(double %Val)
	declare double @llvm.floor.f64(double) nounwind readnone			declare double @llvm.floor.f64(double) nounwind readnone
	declare <2 x double> @llvm.floor.v2f64(<2 x double>) nounwind readnone			declare <2 x double> @llvm.floor.v2f64(<2 x double>) nounwind readnone
	declare <3 x double> @llvm.floor.v3f64(<3 x double>) nounwind readnone			declare <3 x double> @llvm.floor.v3f64(<3 x double>) nounwind readnone
	declare <4 x double> @llvm.floor.v4f64(<4 x double>) nounwind readnone			declare <4 x double> @llvm.floor.v4f64(<4 x double>) nounwind readnone
	declare <8 x double> @llvm.floor.v8f64(<8 x double>) nounwind readnone			declare <8 x double> @llvm.floor.v8f64(<8 x double>) nounwind readnone
	declare <16 x double> @llvm.floor.v16f64(<16 x double>) nounwind readnone			declare <16 x double> @llvm.floor.v16f64(<16 x double>) nounwind readnone

	; FUNC-LABEL: {{^}}ffloor_f64:			; FUNC-LABEL: {{^}}ffloor_f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; SI: v_fract_f64_e32			; SI: v_fract_f64_e32
	; SI-DAG: v_min_f64			; SI-DAG: v_min_f64
	; SI-DAG: v_cmp_class_f64_e64 vcc			; SI-DAG: v_cmp_class_f64_e64 vcc
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_add_f64			; SI: v_add_f64
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @ffloor_f64(double addrspace(1)* %out, double %x) {			define amdgpu_kernel void @ffloor_f64(double addrspace(1)* %out, double %x) {
	%y = call double @llvm.floor.f64(double %x) nounwind readnone			%y = call fast double @llvm.floor.f64(double %x) nounwind readnone
	store double %y, double addrspace(1)* %out			store double %y, double addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_f64_neg:			; FUNC-LABEL: {{^}}ffloor_f64_neg:
	; CI: v_floor_f64_e64			; CI: v_floor_f64_e64
	; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT:s[[0-9]+:[0-9]+]]]			; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT:s[[0-9]+:[0-9]+]]]
	; SI-DAG: v_min_f64			; SI-DAG: v_min_f64
	; SI-DAG: v_cmp_class_f64_e64 vcc			; SI-DAG: v_cmp_class_f64_e64 vcc
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT]]			; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT]]
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @ffloor_f64_neg(double addrspace(1)* %out, double %x) {			define amdgpu_kernel void @ffloor_f64_neg(double addrspace(1)* %out, double %x) {
	%neg = fsub double 0.0, %x			%neg = fsub nsz double 0.0, %x
	%y = call double @llvm.floor.f64(double %neg) nounwind readnone			%y = call fast double @llvm.floor.f64(double %neg) nounwind readnone
	store double %y, double addrspace(1)* %out			store double %y, double addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_f64_neg_abs:			; FUNC-LABEL: {{^}}ffloor_f64_neg_abs:
	; CI: v_floor_f64_e64			; CI: v_floor_f64_e64
	; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -|[[INPUT:s[[0-9]+:[0-9]+]]]|			; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -|[[INPUT:s[[0-9]+:[0-9]+]]]|
	; SI-DAG: v_min_f64			; SI-DAG: v_min_f64
	; SI-DAG: v_cmp_class_f64_e64 vcc			; SI-DAG: v_cmp_class_f64_e64 vcc
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_cndmask_b32_e32			; SI: v_cndmask_b32_e32
	; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -|[[INPUT]]|			; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -|[[INPUT]]|
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @ffloor_f64_neg_abs(double addrspace(1)* %out, double %x) {			define amdgpu_kernel void @ffloor_f64_neg_abs(double addrspace(1)* %out, double %x) {
	%abs = call double @llvm.fabs.f64(double %x)			%abs = call fast double @llvm.fabs.f64(double %x)
	%neg = fsub double 0.0, %abs			%neg = fsub nsz double 0.0, %abs
	%y = call double @llvm.floor.f64(double %neg) nounwind readnone			%y = call fast double @llvm.floor.f64(double %neg) nounwind readnone
	store double %y, double addrspace(1)* %out			store double %y, double addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v2f64:			; FUNC-LABEL: {{^}}ffloor_v2f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	define amdgpu_kernel void @ffloor_v2f64(<2 x double> addrspace(1)* %out, <2 x double> %x) {			define amdgpu_kernel void @ffloor_v2f64(<2 x double> addrspace(1)* %out, <2 x double> %x) {
	%y = call <2 x double> @llvm.floor.v2f64(<2 x double> %x) nounwind readnone			%y = call fast <2 x double> @llvm.floor.v2f64(<2 x double> %x) nounwind readnone
	store <2 x double> %y, <2 x double> addrspace(1)* %out			store <2 x double> %y, <2 x double> addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v3f64:			; FUNC-LABEL: {{^}}ffloor_v3f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI-NOT: v_floor_f64_e32			; CI-NOT: v_floor_f64_e32
	define amdgpu_kernel void @ffloor_v3f64(<3 x double> addrspace(1)* %out, <3 x double> %x) {			define amdgpu_kernel void @ffloor_v3f64(<3 x double> addrspace(1)* %out, <3 x double> %x) {
	%y = call <3 x double> @llvm.floor.v3f64(<3 x double> %x) nounwind readnone			%y = call fast <3 x double> @llvm.floor.v3f64(<3 x double> %x) nounwind readnone
	store <3 x double> %y, <3 x double> addrspace(1)* %out			store <3 x double> %y, <3 x double> addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v4f64:			; FUNC-LABEL: {{^}}ffloor_v4f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	define amdgpu_kernel void @ffloor_v4f64(<4 x double> addrspace(1)* %out, <4 x double> %x) {			define amdgpu_kernel void @ffloor_v4f64(<4 x double> addrspace(1)* %out, <4 x double> %x) {
	%y = call <4 x double> @llvm.floor.v4f64(<4 x double> %x) nounwind readnone			%y = call fast <4 x double> @llvm.floor.v4f64(<4 x double> %x) nounwind readnone
	store <4 x double> %y, <4 x double> addrspace(1)* %out			store <4 x double> %y, <4 x double> addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v8f64:			; FUNC-LABEL: {{^}}ffloor_v8f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	define amdgpu_kernel void @ffloor_v8f64(<8 x double> addrspace(1)* %out, <8 x double> %x) {			define amdgpu_kernel void @ffloor_v8f64(<8 x double> addrspace(1)* %out, <8 x double> %x) {
	%y = call <8 x double> @llvm.floor.v8f64(<8 x double> %x) nounwind readnone			%y = call fast <8 x double> @llvm.floor.v8f64(<8 x double> %x) nounwind readnone
	store <8 x double> %y, <8 x double> addrspace(1)* %out			store <8 x double> %y, <8 x double> addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v16f64:			; FUNC-LABEL: {{^}}ffloor_v16f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	define amdgpu_kernel void @ffloor_v16f64(<16 x double> addrspace(1)* %out, <16 x double> %x) {			define amdgpu_kernel void @ffloor_v16f64(<16 x double> addrspace(1)* %out, <16 x double> %x) {
	%y = call <16 x double> @llvm.floor.v16f64(<16 x double> %x) nounwind readnone			%y = call fast <16 x double> @llvm.floor.v16f64(<16 x double> %x) nounwind readnone
	store <16 x double> %y, <16 x double> addrspace(1)* %out			store <16 x double> %y, <16 x double> addrspace(1)* %out
	ret void			ret void
	}			}

test/CodeGen/AMDGPU/fneg-combines.ll

define amdgpu_kernel void @v_fneg_add_multi_use_fneg_x_f32(float addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr, float %c) #0 {
ret void		ret void
}		}

; This one asserted with -enable-no-signed-zeros-fp-math		; This one asserted with -enable-no-signed-zeros-fp-math
; GCN-LABEL: {{^}}fneg_fadd_0:		; GCN-LABEL: {{^}}fneg_fadd_0:
; GCN-SAFE-DAG: v_mad_f32 [[A:v[0-9]+]],		; GCN-SAFE-DAG: v_mad_f32 [[A:v[0-9]+]],
; GCN-SAFE-DAG: v_cmp_ngt_f32_e32 {{.*}}, [[A]]		; GCN-SAFE-DAG: v_cmp_ngt_f32_e32 {{.*}}, [[A]]
; GCN-SAFE-DAG: v_cndmask_b32_e64 v{{[0-9]+}}, -[[A]]		; GCN-SAFE-DAG: v_cndmask_b32_e64 v{{[0-9]+}}, -[[A]]
; GCN-NSZ-DAG: v_mac_f32_e32 [[C:v[0-9]+]],		; GCN-NSZ-DAG: v_rcp_f32_e32 [[A:v[0-9]+]],
; GCN-NSZ-DAG: v_cmp_nlt_f32_e64 {{.*}}, -[[C]]		; GCN-NSZ-DAG: v_mov_b32_e32 [[B:v[0-9]+]],
		; GCN-NSZ-DAG: v_mov_b32_e32 [[C:v[0-9]+]],
		; GCN-NSZ-DAG: v_mul_f32_e32 [[D:v[0-9]+]],
		; GCN-NSZ-DAG: v_cmp_nlt_f32_e64 {{.*}}, -[[D]]

		spatel: I don't know how to interpret this test diff: regression, improvement, or does this test no longer accomplish its original intent?
		mcberg2017: Yes, we no longer optimize this case. Should I just remove the GCN-NSZ-DAG checks for fneg_fadd_0?
		spatel: Someone with AMDGPU knowledge should answer that. cc'ing @arsenm @nhaehnle @foad @rampitec
		rampitec: This is clearly a regression.
		mcberg2017: I will debug this case. It looks like there's an additional dependency with the new code shape wrt fadd.
		mcberg2017: With the change to:

		    // N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
		    ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
		    if (N1C && N1C->isZero())
		      if (N1C->isNegative() || Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())
		        return N0;

		we get this DAG:

		    SelectionDAG has 21 nodes:
		    t0: ch = EntryToken
		    t2: f32,ch = CopyFromReg t0, Register:f32 %0
		    t27: f32 = fneg t28
		    t13: i1 = setcc t27, t2, setuge:ch
		    t15: f32 = select t13, t2, t28
		    t17: i1 = setcc t15, ConstantFP:f32<0.000000e+00>, setule:ch
		    t19: f32 = select t17, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<nan>
		    t21: ch,glue = CopyToReg t0, Register:f32 $vgpr0, t19
		    t4: f32,ch = CopyFromReg t0, Register:f32 %1
		    t7: f32 = fdiv ConstantFP:f32<1.000000e+00>, t4
		    t28: f32 = fmul nnan arcp contract reassoc t7, ConstantFP:f32<-0.000000e+00>
		    t22: ch = RETURN_TO_EPILOG t21, Register:f32 $vgpr0, t21:1

		for which we produce this assembly:

		    fneg_fadd_0: ; @fneg_fadd_0
		    ; %bb.0: ; %.entry
		    v_rcp_f32_e32 v0, s1
		    v_mov_b32_e32 v1, s0
		    v_mov_b32_e32 v2, 0x7fc00000
		    v_mul_f32_e32 v0, 0x80000000, v0
		    v_cmp_nlt_f32_e64 vcc, -v0, s0
		    v_cndmask_b32_e32 v0, v0, v1, vcc
		    v_cmp_nlt_f32_e32 vcc, 0, v0
		    v_cndmask_b32_e64 v0, v2, 0, vcc

		With the code as it currently stands (unmodified):

		    // N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
		    ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
		    if (N1C && N1C->isZero())
		      if (N1C->isNegative() || Options.UnsafeFPMath || Flags.hasNoSignedZeros())
		        return N0;

		we get this DAG:

		    SelectionDAG has 21 nodes:
		    t0: ch = EntryToken
		    t2: f32,ch = CopyFromReg t0, Register:f32 %0
		    t4: f32,ch = CopyFromReg t0, Register:f32 %1
		    t7: f32 = fdiv ConstantFP:f32<1.000000e+00>, t4
		    t9: f32 = fmul t7, ConstantFP:f32<0.000000e+00>
		    t11: f32 = fadd t9, ConstantFP:f32<0.000000e+00>
		    t13: i1 = setcc t11, t2, setuge:ch
		    t14: f32 = fneg t11
		    t15: f32 = select t13, t2, t14
		    t17: i1 = setcc t15, ConstantFP:f32<0.000000e+00>, setule:ch
		    t19: f32 = select t17, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<nan>
		    t21: ch,glue = CopyToReg t0, Register:f32 $vgpr0, t19
		    t22: ch = RETURN_TO_EPILOG t21, Register:f32 $vgpr0, t21:1

		for which we fold a fused multiply-add and produce this assembly:

		    fneg_fadd_0: ; @fneg_fadd_0
		    ; %bb.0: ; %.entry
		    v_rcp_f32_e32 v0, s1
		    v_bfrev_b32_e32 v1, 1
		    v_mov_b32_e32 v2, s0
		    v_mac_f32_e32 v1, v0, v1
		    v_cmp_nlt_f32_e64 vcc, -v1, s0
		    v_cndmask_b32_e32 v0, v1, v2, vcc
		    v_mov_b32_e32 v1, 0x7fc00000
		    v_cmp_nlt_f32_e32 vcc, 0, v0
		    v_cndmask_b32_e64 v0, v1, 0, vcc
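The guard in the combine quoted above treats the two zero constants differently because IEEE-754 addition does: adding -0.0 returns the other operand unchanged, while adding +0.0 turns -0.0 into +0.0, so that second fold is only valid when signed zeros may be ignored. A minimal host-side sketch of that difference follows. It is not LLVM code; the helper names `bitsOf`, `foldNegZeroIsExact`, and `foldPosZeroIsExact` are made up for illustration, and it assumes the default round-to-nearest mode and a compiler that is not run with fast-math flags.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the raw IEEE-754 bit pattern of a float, so that +0.0 and -0.0
// (which compare equal with ==) can be told apart.
inline std::uint32_t bitsOf(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return Bits;
}

// Does "X + -0.0 --> X" preserve the exact bits of X?
inline bool foldNegZeroIsExact(float X) {
  return bitsOf(X + (-0.0f)) == bitsOf(X);
}

// Does "X + +0.0 --> X" preserve the exact bits of X?
// This is the case that needs a no-signed-zeros guarantee.
inline bool foldPosZeroIsExact(float X) {
  return bitsOf(X + 0.0f) == bitsOf(X);
}
```

Under those assumptions, `foldPosZeroIsExact(-0.0f)` is the only non-NaN input for which either helper returns false, which is why the `+0.0` form of the fold sits behind `NoSignedZerosFPMath` or the `nsz` flag.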
		rampitec: I am trying to understand what the existing ISA does, and it is:

		    v_rcp_f32_e32 v0, s1        ; v0 = 1 / s1
		    v_bfrev_b32_e32 v1, 1       ; v1 = 0x80000000 = -0.0
		    v_mov_b32_e32 v2, s0        ; v2 = s0
		    v_mac_f32_e32 v1, v0, v1    ; v1 = v1 * v0 + v1 = v0 * -0.0 - 0.0 = 0

		Instead of that fancy mac instruction, all of that now seems to be folded into:

		    v_mul_f32_e32 v0, 0x80000000, v0    ; v0 = v0 * -0.0

		I.e. this is hardly performance-relevant code in practice. The comment above says it used to assert, so I guess this is just a regression test. Given no signed zeroes this is as good as just v0 = 0, but that's a different matter. I have no objection to this test change.
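The arithmetic in the walkthrough above can be checked on the host. The following is a rough model only, assuming `v_bfrev_b32` reverses the bit order of a 32-bit word and `v_mac_f32 dst, a, b` computes `a * b + dst`, as the comment describes. `reverseBits32`, `bitsOf`, and `macModel` are hypothetical helper names, not AMDGPU or LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverse the bit order of a 32-bit word (assumed behavior of v_bfrev_b32).
inline std::uint32_t reverseBits32(std::uint32_t V) {
  std::uint32_t R = 0;
  for (int I = 0; I < 32; ++I) {
    R = (R << 1) | (V & 1u);
    V >>= 1;
  }
  return R;
}

// Return the raw IEEE-754 bit pattern of a float.
inline std::uint32_t bitsOf(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return Bits;
}

// Model of "v_mac_f32 dst, a, b" as dst = a * b + dst (an assumption taken
// from the review comment, not from the ISA manual).
inline float macModel(float Dst, float A, float B) {
  return A * B + Dst;
}
```

In this model, reversing the bits of 1 gives 0x80000000, which is the bit pattern of -0.0f, and for finite inputs `macModel(-0.0f, v0, -0.0f)` yields the same bits as `v0 * -0.0f`. That is consistent with the reviewer's reading that the old mac sequence and the new single multiply compute the same value here.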
define amdgpu_ps float @fneg_fadd_0(float inreg %tmp2, float inreg %tmp6, <4 x i32> %arg) local_unnamed_addr #0 {		define amdgpu_ps float @fneg_fadd_0(float inreg %tmp2, float inreg %tmp6, <4 x i32> %arg) local_unnamed_addr #0 {
.entry:		.entry:
%tmp7 = fdiv float 1.000000e+00, %tmp6		%tmp7 = fdiv float 1.000000e+00, %tmp6
%tmp8 = fmul float 0.000000e+00, %tmp7		%tmp8 = fmul float 0.000000e+00, %tmp7
%tmp9 = fmul reassoc nnan arcp contract float 0.000000e+00, %tmp8		%tmp9 = fmul reassoc nnan arcp contract float 0.000000e+00, %tmp8
%.i188 = fadd float %tmp9, 0.000000e+00		%.i188 = fadd float %tmp9, 0.000000e+00
%tmp10 = fcmp uge float %.i188, %tmp2		%tmp10 = fcmp uge float %.i188, %tmp2
%tmp11 = fsub float -0.000000e+00, %.i188		%tmp11 = fsub float -0.000000e+00, %.i188

test/CodeGen/PowerPC/fma-mutate.ll

	; Test several VSX FMA mutation opportunities. The first one isn't a			; Test several VSX FMA mutation opportunities. The first one isn't a
	; reasonable transformation because the killed product register is the			; reasonable transformation because the killed product register is the
	; same as the FMA target register. The second one is legal. The third			; same as the FMA target register. The second one is legal. The third
	; one doesn't fit the feeding-copy pattern.			; one doesn't fit the feeding-copy pattern.

	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck --check-prefixes=CHECK-SAFE,FMF %s
				spatel: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR make it clear what the difference in output is expected to be.
	target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"			target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	declare double @llvm.sqrt.f64(double)			declare double @llvm.sqrt.f64(double)

	define double @foo3(double %a) nounwind {			define double @foo3_fmf(double %a) nounwind {
	%r = call double @llvm.sqrt.f64(double %a)			; FMF: @foo3_fmf
				; FMF-NOT: fmr
				; FMF: xsmaddmdp
				; FMF: xsmaddadp
				spatel: Would it be better to auto-generate the complete output for these tests using utils/update_llc_test_checks.py?
				%r = call fast double @llvm.sqrt.f64(double %a)
	ret double %r			ret double %r
				}

	; CHECK: @foo3			define double @foo3_safe(double %a) nounwind {
	; CHECK-NOT: fmr			; CHECK-SAFE: @foo3_safe
	; CHECK: xsmaddmdp			; CHECK-SAFE-NOT: fmr
	; CHECK: xsmaddadp			; CHECK-SAFE: xssqrtdp
				%r = call double @llvm.sqrt.f64(double %a)
				ret double %r
	}			}

test/CodeGen/PowerPC/fmf-propagation.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: llc < %s -mtriple=powerpc64le -debug-only=isel -o /dev/null 2>&1 | FileCheck %s --check-prefix=FMFDEBUG			; RUN: llc < %s -mtriple=powerpc64le -debug-only=isel -o /dev/null 2>&1 | FileCheck %s --check-prefix=FMFDEBUG
	; RUN: llc < %s -mtriple=powerpc64le | FileCheck %s --check-prefix=FMF			; RUN: llc < %s -mtriple=powerpc64le | FileCheck %s --check-prefix=FMF
	; RUN: llc < %s -mtriple=powerpc64le -debug-only=isel -o /dev/null 2>&1 -enable-unsafe-fp-math -enable-no-nans-fp-math | FileCheck %s --check-prefix=GLOBALDEBUG			; RUN: llc < %s -mtriple=powerpc64le -debug-only=isel -o /dev/null 2>&1 -enable-unsafe-fp-math -enable-no-nans-fp-math | FileCheck %s --check-prefix=GLOBALDEBUG
	; RUN: llc < %s -mtriple=powerpc64le -enable-unsafe-fp-math -enable-no-nans-fp-math | FileCheck %s --check-prefix=GLOBAL			; RUN: llc < %s -mtriple=powerpc64le -enable-unsafe-fp-math -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math | FileCheck %s --check-prefix=GLOBAL

	; Test FP transforms using instruction/node-level fast-math-flags.			; Test FP transforms using instruction/node-level fast-math-flags.
	; We're also checking debug output to verify that FMF is propagated to the newly created nodes.			; We're also checking debug output to verify that FMF is propagated to the newly created nodes.
	; The run with the global unsafe param tests the pre-FMF behavior using regular instructions/nodes.			; The run with the global unsafe param tests the pre-FMF behavior using regular instructions/nodes.

	declare float @llvm.fma.f32(float, float, float)			declare float @llvm.fma.f32(float, float, float)
	declare float @llvm.sqrt.f32(float)			declare float @llvm.sqrt.f32(float)


test/CodeGen/PowerPC/qpx-recipest.ll

	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2q -enable-unsafe-fp-math | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2q | FileCheck --check-prefixes=CHECK-SAFE,FMF %s
				spatel: Same comment as above: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR make it clear what the difference in output is expected to be.
	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2q | FileCheck -check-prefix=CHECK-SAFE %s
	target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"			target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	declare <4 x double> @llvm.sqrt.v4f64(<4 x double>)			declare <4 x double> @llvm.sqrt.v4f64(<4 x double>)
	declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)

	define <4 x double> @foo(<4 x double> %a, <4 x double> %b) nounwind {			define <4 x double> @foo_fmf(<4 x double> %a, <4 x double> %b) nounwind {
				; FMF-LABEL: @foo_fmf
				; FMF: qvfrsqrte
				; FMF-DAG: qvfmul
				; FMF-DAG: qvfmsub
				; FMF-DAG: qvfnmsub
				; FMF: qvfmul
				; FMF: qvfmul
				; FMF: qvfnmsub
				; FMF: qvfmul
				; FMF: qvfmul
				; FMF: blr
	entry:			entry:
	%x = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)			%x = call fast <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)
	%r = fdiv <4 x double> %a, %x			%r = fdiv fast <4 x double> %a, %x
	ret <4 x double> %r			ret <4 x double> %r
				}

	; CHECK-LABEL: @foo			define <4 x double> @foo_safe(<4 x double> %a, <4 x double> %b) nounwind {
	; CHECK: qvfrsqrte			; CHECK-SAFE-LABEL: @foo_safe
	; CHECK-DAG: qvfmul
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using
	; an qvfmadd instead of a qvfnmsub
	; CHECK-DAG: qvfmadd
	; CHECK-DAG: qvfmadd
	; CHECK: qvfmul
	; CHECK: qvfmul
	; CHECK: qvfmadd
	; CHECK: qvfmul
	; CHECK: qvfmul
	; CHECK: blr

	; CHECK-SAFE-LABEL: @foo
	; CHECK-SAFE: fsqrt			; CHECK-SAFE: fsqrt
	; CHECK-SAFE: fdiv			; CHECK-SAFE: fdiv
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
	}

	define <4 x double> @foof(<4 x double> %a, <4 x float> %b) nounwind {
	entry:			entry:
	%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)			%x = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)
	%y = fpext <4 x float> %x to <4 x double>			%r = fdiv <4 x double> %a, %x
	%r = fdiv <4 x double> %a, %y
	ret <4 x double> %r			ret <4 x double> %r
				}

	; CHECK-LABEL: @foof			define <4 x double> @foof_fmf(<4 x double> %a, <4 x float> %b) nounwind {
	; CHECK: qvfrsqrtes			; FMF-LABEL: @foof_fmf
	; CHECK-DAG: qvfmuls			; FMF: qvfrsqrtes
				; FMF-DAG: qvfmuls
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using			; FIXME: We're currently loading two constants here (1.5 and -1.5), and using
	; an qvfmadd instead of a qvfnmsubs			; an qvfmadd instead of a qvfnmsubs
	; CHECK-DAG: qvfmadds			; FMF-DAG: qvfmadds
	; CHECK-DAG: qvfmadds			; FMF-DAG: qvfmadds
	; CHECK: qvfmuls			; FMF: qvfmuls
	; CHECK: qvfmul			; FMF: qvfmul
	; CHECK: blr			; FMF: blr
				entry:
				%x = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
				%y = fpext <4 x float> %x to <4 x double>
				%r = fdiv fast <4 x double> %a, %y
				ret <4 x double> %r
				}

	; CHECK-SAFE-LABEL: @foof			define <4 x double> @foof_safe(<4 x double> %a, <4 x float> %b) nounwind {
				; CHECK-SAFE-LABEL: @foof_safe
	; CHECK-SAFE: fsqrts			; CHECK-SAFE: fsqrts
	; CHECK-SAFE: fdiv			; CHECK-SAFE: fdiv
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				entry:
				%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
				%y = fpext <4 x float> %x to <4 x double>
				%r = fdiv <4 x double> %a, %y
				ret <4 x double> %r
	}			}

	define <4 x float> @food(<4 x float> %a, <4 x double> %b) nounwind {			define <4 x float> @food_fmf(<4 x float> %a, <4 x double> %b) nounwind {
				; FMF-LABEL: @food_fmf
				; FMF: qvfrsqrte
				; FMF-DAG: qvfmul
				; FMF-DAG: qvfmsub
				; FMF-DAG: qvfnmsub
				; FMF: qvfmul
				; FMF: qvfmul
				; FMF: qvfnmsub
				; FMF: qvfmul
				; FMF: qvfrsp
				; FMF: qvfmuls
				; FMF: blr
	entry:			entry:
	%x = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)			%x = call fast <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)
	%y = fptrunc <4 x double> %x to <4 x float>			%y = fptrunc <4 x double> %x to <4 x float>
	%r = fdiv <4 x float> %a, %y			%r = fdiv fast <4 x float> %a, %y
	ret <4 x float> %r			ret <4 x float> %r
				}

	; CHECK-LABEL: @food			define <4 x float> @food_safe(<4 x float> %a, <4 x double> %b) nounwind {
	; CHECK: qvfrsqrte			; CHECK-SAFE-LABEL: @food_safe
	; CHECK-DAG: qvfmul
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using
	; an qvfmadd instead of a qvfnmsub
	; CHECK-DAG: qvfmadd
	; CHECK-DAG: qvfmadd
	; CHECK: qvfmul
	; CHECK: qvfmul
	; CHECK: qvfmadd
	; CHECK: qvfmul
	; CHECK: qvfrsp
	; CHECK: qvfmuls
	; CHECK: blr

	; CHECK-SAFE-LABEL: @food
	; CHECK-SAFE: fsqrt			; CHECK-SAFE: fsqrt
	; CHECK-SAFE: fdivs			; CHECK-SAFE: fdivs
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
	}

	define <4 x float> @goo(<4 x float> %a, <4 x float> %b) nounwind {
	entry:			entry:
	%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)			%x = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %b)
	%r = fdiv <4 x float> %a, %x			%y = fptrunc <4 x double> %x to <4 x float>
				%r = fdiv <4 x float> %a, %y
	ret <4 x float> %r			ret <4 x float> %r
				}

	; CHECK-LABEL: @goo			define <4 x float> @goo_fmf(<4 x float> %a, <4 x float> %b) nounwind {
	; CHECK: qvfrsqrtes			; FMF-LABEL: @goo_fmf
	; CHECK-DAG: qvfmuls			; FMF: qvfrsqrtes
				; FMF-DAG: qvfmuls
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using			; FIXME: We're currently loading two constants here (1.5 and -1.5), and using
	; an qvfmadd instead of a qvfnmsubs			; an qvfmadd instead of a qvfnmsubs
	; CHECK-DAG: qvfmadds			; FMF-DAG: qvfmadds
	; CHECK-DAG: qvfmadds			; FMF-DAG: qvfmadds
	; CHECK: qvfmuls			; FMF: qvfmuls
	; CHECK: qvfmuls			; FMF: qvfmuls
	; CHECK: blr			; FMF: blr
				entry:
				%x = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
				%r = fdiv fast <4 x float> %a, %x
				ret <4 x float> %r
				}

	; CHECK-SAFE-LABEL: @goo			define <4 x float> @goo_safe(<4 x float> %a, <4 x float> %b) nounwind {
				; CHECK-SAFE-LABEL: @goo_safe
	; CHECK-SAFE: fsqrts			; CHECK-SAFE: fsqrts
	; CHECK-SAFE: fdivs			; CHECK-SAFE: fdivs
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				entry:
				%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
				%r = fdiv <4 x float> %a, %x
				ret <4 x float> %r
	}			}

	define <4 x double> @foo2(<4 x double> %a, <4 x double> %b) nounwind {			define <4 x double> @foo2_fmf(<4 x double> %a, <4 x double> %b) nounwind {
				; FMF-LABEL: @foo2_fmf
				; FMF: qvfre
				; FMF: qvfnmsub
				; FMF: qvfmadd
				; FMF: qvfnmsub
				; FMF: qvfmadd
				; FMF: qvfmul
				; FMF: blr
	entry:			entry:
	%r = fdiv <4 x double> %a, %b			%r = fdiv fast <4 x double> %a, %b
	ret <4 x double> %r			ret <4 x double> %r
				}

	; CHECK-LABEL: @foo2			define <4 x double> @foo2_safe(<4 x double> %a, <4 x double> %b) nounwind {
	; CHECK: qvfre			; CHECK-SAFE-LABEL: @foo2_safe
	; CHECK: qvfnmsub
	; CHECK: qvfmadd
	; CHECK: qvfnmsub
	; CHECK: qvfmadd
	; CHECK: qvfmul
	; CHECK: blr

	; CHECK-SAFE-LABEL: @foo2
	; CHECK-SAFE: fdiv			; CHECK-SAFE: fdiv
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%r = fdiv <4 x double> %a, %b
				ret <4 x double> %r
	}			}

	define <4 x float> @goo2(<4 x float> %a, <4 x float> %b) nounwind {			define <4 x float> @goo2_fmf(<4 x float> %a, <4 x float> %b) nounwind {
				; FMF-LABEL: @goo2_fmf
				; FMF: qvfres
				; FMF: qvfnmsubs
				; FMF: qvfmadds
				; FMF: qvfmuls
				; FMF: blr
	entry:			entry:
	%r = fdiv <4 x float> %a, %b			%r = fdiv fast <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
				}

	; CHECK-LABEL: @goo2			define <4 x float> @goo2_safe(<4 x float> %a, <4 x float> %b) nounwind {
	; CHECK: qvfres			; CHECK-SAFE-LABEL: @goo2_safe
	; CHECK: qvfnmsubs
	; CHECK: qvfmadds
	; CHECK: qvfmuls
	; CHECK: blr

	; CHECK-SAFE-LABEL: @goo2
	; CHECK-SAFE: fdivs			; CHECK-SAFE: fdivs
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				entry:
				%r = fdiv <4 x float> %a, %b
				ret <4 x float> %r
	}			}

	define <4 x double> @foo3(<4 x double> %a) nounwind {			define <4 x double> @foo3_fmf(<4 x double> %a) nounwind {
				; FMF-LABEL: @foo3_fmf
				; FMF: qvfrsqrte
				; FMF: qvfmul
				; FMF-DAG: qvfmsub
				; FMF-DAG: qvfcmpeq
				; FMF-DAG: qvfnmsub
				; FMF-DAG: qvfmul
				; FMF-DAG: qvfmul
				; FMF-DAG: qvfnmsub
				; FMF-DAG: qvfmul
				; FMF-DAG: qvfmul
				; FMF: qvfsel
				; FMF: blr
	entry:			entry:
	%r = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %a)			%r = call fast <4 x double> @llvm.sqrt.v4f64(<4 x double> %a)
	ret <4 x double> %r			ret <4 x double> %r
				}

	; CHECK-LABEL: @foo3			define <4 x double> @foo3_safe(<4 x double> %a) nounwind {
	; CHECK: qvfrsqrte			; CHECK-SAFE-LABEL: @foo3_safe
	; CHECK: qvfmul
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using
	; an qvfmadd instead of a qvfnmsub
	; CHECK-DAG: qvfmadd
	; CHECK-DAG: qvfcmpeq
	; CHECK-DAG: qvfmadd
	; CHECK-DAG: qvfmul
	; CHECK-DAG: qvfmul
	; CHECK-DAG: qvfmadd
	; CHECK-DAG: qvfmul
	; CHECK-DAG: qvfmul
	; CHECK: qvfsel
	; CHECK: blr

	; CHECK-SAFE-LABEL: @foo3
	; CHECK-SAFE: fsqrt			; CHECK-SAFE: fsqrt
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
	}

	define <4 x float> @goo3(<4 x float> %a) nounwind {
	entry:			entry:
	%r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)			%r = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %a)
	ret <4 x float> %r			ret <4 x double> %r
				}

	; CHECK-LABEL: @goo3			define <4 x float> @goo3_fmf(<4 x float> %a) nounwind {
	; CHECK: qvfrsqrtes			; FMF-LABEL: @goo3_fmf
	; CHECK: qvfmuls			; FMF: qvfrsqrtes
				; FMF: qvfmuls
	; FIXME: We're currently loading two constants here (1.5 and -1.5), and using			; FIXME: We're currently loading two constants here (1.5 and -1.5), and using
	; an qvfmadds instead of a qvfnmsubs			; an qvfmadds instead of a qvfnmsubs
	; CHECK-DAG: qvfmadds			; FMF-DAG: qvfmadds
	; CHECK-DAG: qvfcmpeq			; FMF-DAG: qvfcmpeq
	; CHECK-DAG: qvfmadds			; FMF-DAG: qvfmadds
	; CHECK-DAG: qvfmuls			; FMF-DAG: qvfmuls
	; CHECK-DAG: qvfmuls			; FMF-DAG: qvfmuls
	; CHECK: qvfsel			; FMF: qvfsel
	; CHECK: blr			; FMF: blr
				entry:
				%r = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
				ret <4 x float> %r
				}

	; CHECK-SAFE-LABEL: @goo3			define <4 x float> @goo3_safe(<4 x float> %a) nounwind {
				; CHECK-SAFE-LABEL: @goo3_safe
	; CHECK-SAFE: fsqrts			; CHECK-SAFE: fsqrts
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				entry:
				%r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
				ret <4 x float> %r
	}			}

test/CodeGen/PowerPC/recipest.ll

	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=-vsx | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=-vsx | FileCheck --check-prefixes=CHECK-SAFE,FMF %s
				spatel: Same comment as above: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR make it clear what the difference in output is expected to be.
	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=-vsx | FileCheck -check-prefix=CHECK-SAFE %s

	target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"			target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	declare double @llvm.sqrt.f64(double)			declare double @llvm.sqrt.f64(double)
	declare float @llvm.sqrt.f32(float)			declare float @llvm.sqrt.f32(float)
	declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)

	define double @foo(double %a, double %b) nounwind {			define double @foo_fmf(double %a, double %b) nounwind {
	%x = call double @llvm.sqrt.f64(double %b)			; FMF: @foo
	%r = fdiv double %a, %x			; FMF: frsqrte
				; FMF: fmul
				; FMF-NEXT: fmadd
				; FMF-NEXT: fmul
				; FMF-NEXT: fmul
				; FMF-NEXT: fmul
				; FMF-NEXT: fmadd
				; FMF-NEXT: fmul
				; FMF-NEXT: fmul
				; FMF-NEXT: fmul
				; FMF: blr
				%x = call fast double @llvm.sqrt.f64(double %b)
				%r = fdiv fast double %a, %x
	ret double %r			ret double %r
				}

	; CHECK: @foo			define double @foo_safe(double %a, double %b) nounwind {
	; CHECK: frsqrte
	; CHECK: fmul
	; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul
	; CHECK: blr

	; CHECK-SAFE: @foo			; CHECK-SAFE: @foo
	; CHECK-SAFE: fsqrt			; CHECK-SAFE: fsqrt
	; CHECK-SAFE: fdiv			; CHECK-SAFE: fdiv
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
	}

	define double @no_estimate_refinement_f64(double %a, double %b) #0 {
	%x = call double @llvm.sqrt.f64(double %b)			%x = call double @llvm.sqrt.f64(double %b)
	%r = fdiv double %a, %x			%r = fdiv double %a, %x
	ret double %r			ret double %r

	; CHECK-LABEL: @no_estimate_refinement_f64
	; CHECK: frsqrte
	; CHECK-NOT: fmadd
	; CHECK: fmul
	; CHECK-NOT: fmadd
	; CHECK: blr
	}			}

				define double @no_estimate_refinement_f64(double %a, double %b) #0 {
				; FMF-LABEL: @no_estimate_refinement_f64
				; FMF: frsqrte
				; FMF-NOT: fmadd
				; FMF: fmul
				; FMF-NOT: fmadd
				; FMF: blr
				%x = call fast double @llvm.sqrt.f64(double %b)
				%r = fdiv fast double %a, %x
				ret double %r
				}

	define double @foof(double %a, float %b) nounwind {			define double @foof_fmf(double %a, float %b) nounwind {
	%x = call float @llvm.sqrt.f32(float %b)			; FMF: @foof_fmf
				; FMF-DAG: frsqrtes
				; FMF: fmuls
				; FMF-NEXT: fmadds
				; FMF-NEXT: fmuls
				; FMF-NEXT: fmuls
				; FMF-NEXT: fmul
				; FMF-NEXT: blr
				%x = call fast float @llvm.sqrt.f32(float %b)
	%y = fpext float %x to double			%y = fpext float %x to double
	%r = fdiv double %a, %y			%r = fdiv fast double %a, %y
	ret double %r			ret double %r
				}

	; CHECK: @foof			define double @foof_safe(double %a, float %b) nounwind {
	; CHECK-DAG: frsqrtes			; CHECK-SAFE: @foof_safe
	; CHECK: fmuls
	; CHECK-NEXT: fmadds
	; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmul
	; CHECK-NEXT: blr

	; CHECK-SAFE: @foof
	; CHECK-SAFE: fsqrts			; CHECK-SAFE: fsqrts
	; CHECK-SAFE: fdiv			; CHECK-SAFE: fdiv
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%x = call float @llvm.sqrt.f32(float %b)
				%y = fpext float %x to double
				%r = fdiv double %a, %y
				ret double %r
	}			}

	define float @food(float %a, double %b) nounwind {			define float @food_fmf(float %a, double %b) nounwind {
	%x = call double @llvm.sqrt.f64(double %b)			; FMF: @food_fmf
				; FMF-DAG: frsqrte
				; FMF: fmul
				; FMF-NEXT: fmadd
				; FMF-NEXT: fmul
				; FMF-NEXT: fmul
				; FMF-NEXT: fmul
				; FMF-NEXT: fmadd
				; FMF-NEXT: fmul
				; FMF-NEXT: fmul
				; FMF-NEXT: frsp
				; FMF-NEXT: fmuls
				; FMF-NEXT: blr
				%x = call fast double @llvm.sqrt.f64(double %b)
	%y = fptrunc double %x to float			%y = fptrunc double %x to float
	%r = fdiv float %a, %y			%r = fdiv fast float %a, %y
	ret float %r			ret float %r
				}

	; CHECK: @foo			define float @food_safe(float %a, double %b) nounwind {
	; CHECK-DAG: frsqrte			; CHECK-SAFE: @food_safe
	; CHECK: fmul
	; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul
	; CHECK-NEXT: frsp
	; CHECK-NEXT: fmuls
	; CHECK-NEXT: blr

	; CHECK-SAFE: @foo
	; CHECK-SAFE: fsqrt			; CHECK-SAFE: fsqrt
	; CHECK-SAFE: fdivs			; CHECK-SAFE: fdivs
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%x = call double @llvm.sqrt.f64(double %b)
				%y = fptrunc double %x to float
				%r = fdiv float %a, %y
				ret float %r
	}			}

	define float @goo(float %a, float %b) nounwind {			define float @goo_fmf(float %a, float %b) nounwind {
	%x = call float @llvm.sqrt.f32(float %b)			; FMF: @goo_fmf
	%r = fdiv float %a, %x			; FMF-DAG: frsqrtes
				; FMF: fmuls
				; FMF-NEXT: fmadds
				; FMF-NEXT: fmuls
				; FMF-NEXT: fmuls
				; FMF-NEXT: fmuls
				; FMF-NEXT: blr
				%x = call fast float @llvm.sqrt.f32(float %b)
				%r = fdiv fast float %a, %x
	ret float %r			ret float %r
				}

	; CHECK: @goo			define float @goo_safe(float %a, float %b) nounwind {
	; CHECK-DAG: frsqrtes			; CHECK-SAFE: @goo_safe
	; CHECK: fmuls
	; CHECK-NEXT: fmadds
	; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls
	; CHECK-NEXT: blr

	; CHECK-SAFE: @goo
	; CHECK-SAFE: fsqrts			; CHECK-SAFE: fsqrts
	; CHECK-SAFE: fdivs			; CHECK-SAFE: fdivs
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
	}


	define float @no_estimate_refinement_f32(float %a, float %b) #0 {
	%x = call float @llvm.sqrt.f32(float %b)			%x = call float @llvm.sqrt.f32(float %b)
	%r = fdiv float %a, %x			%r = fdiv float %a, %x
	ret float %r			ret float %r
				}

	; CHECK-LABEL: @no_estimate_refinement_f32			define float @no_estimate_refinement_f32(float %a, float %b) #0 {
	; CHECK: frsqrtes			; FMF-LABEL: @no_estimate_refinement_f32
	; CHECK-NOT: fmadds			; FMF: frsqrtes
	; CHECK: fmuls			; FMF-NOT: fmadds
	; CHECK-NOT: fmadds			; FMF: fmuls
	; CHECK: blr			; FMF-NOT: fmadds
				; FMF: blr
				%x = call fast float @llvm.sqrt.f32(float %b)
				%r = fdiv fast float %a, %x
				ret float %r
	}			}

	; Recognize that this is rsqrt(a) * rcp(b) * c,			; Recognize that this is rsqrt(a) * rcp(b) * c,
	; not 1 / ( 1 / sqrt(a)) * rcp(b) * c.			; not 1 / ( 1 / sqrt(a)) * rcp(b) * c.
	define float @rsqrt_fmul(float %a, float %b, float %c) {			define float @rsqrt_fmul_fmf(float %a, float %b, float %c) {
	%x = call float @llvm.sqrt.f32(float %a)			; FMF: @rsqrt_fmul_fmf
	%y = fmul float %x, %b			; FMF-DAG: frsqrtes
	%z = fdiv float %c, %y			; FMF-DAG: fres
				; FMF-DAG: fnmsubs
				; FMF-DAG: fmuls
				; FMF-DAG: fmadds
				; FMF-DAG: fmadds
				; FMF: fmuls
				; FMF-NEXT: fmuls
				; FMF-NEXT: fmuls
				; FMF-NEXT: blr
				%x = call fast float @llvm.sqrt.f32(float %a)
				%y = fmul fast float %x, %b
				%z = fdiv fast float %c, %y
	ret float %z			ret float %z
				}

	; CHECK: @rsqrt_fmul			; Recognize that this is rsqrt(a) * rcp(b) * c,
	; CHECK-DAG: frsqrtes			; not 1 / ( 1 / sqrt(a)) * rcp(b) * c.
	; CHECK-DAG: fres			define float @rsqrt_fmul_safe(float %a, float %b, float %c) {
	; CHECK-DAG: fnmsubs			; CHECK-SAFE: @rsqrt_fmul_safe
	; CHECK-DAG: fmuls
	; CHECK-DAG: fmadds
	; CHECK-DAG: fmadds
	; CHECK: fmuls
	; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls
	; CHECK-NEXT: blr

	; CHECK-SAFE: @rsqrt_fmul
	; CHECK-SAFE: fsqrts			; CHECK-SAFE: fsqrts
	; CHECK-SAFE: fmuls			; CHECK-SAFE: fmuls
	; CHECK-SAFE: fdivs			; CHECK-SAFE: fdivs
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%x = call float @llvm.sqrt.f32(float %a)
				%y = fmul float %x, %b
				%z = fdiv float %c, %y
				ret float %z
	}			}

	define <4 x float> @hoo(<4 x float> %a, <4 x float> %b) nounwind {			define <4 x float> @hoo_fmf(<4 x float> %a, <4 x float> %b) nounwind {
	%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)			; FMF: @hoo_fmf
	%r = fdiv <4 x float> %a, %x			; FMF: vrsqrtefp
				%x = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
				%r = fdiv fast <4 x float> %a, %x
	ret <4 x float> %r			ret <4 x float> %r
				}

	; CHECK: @hoo			define <4 x float> @hoo_safe(<4 x float> %a, <4 x float> %b) nounwind {
	; CHECK: vrsqrtefp			; CHECK-SAFE: @hoo_safe

	; CHECK-SAFE: @hoo
	; CHECK-SAFE-NOT: vrsqrtefp			; CHECK-SAFE-NOT: vrsqrtefp
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
				%r = fdiv <4 x float> %a, %x
				ret <4 x float> %r
	}			}

	define double @foo2(double %a, double %b) nounwind {			define double @foo2_fmf(double %a, double %b) nounwind {
	%r = fdiv double %a, %b			; FMF: @foo2_fmf
				; FMF-DAG: fre
				; FMF-DAG: fnmsub
				; FMF: fmadd
				; FMF-NEXT: fnmsub
				; FMF-NEXT: fmadd
				; FMF-NEXT: fmul
				; FMF-NEXT: blr
				%r = fdiv fast double %a, %b
	ret double %r			ret double %r
				}

	; CHECK: @foo2			define double @foo2_safe(double %a, double %b) nounwind {
	; CHECK-DAG: fre			; CHECK-SAFE: @foo2_safe
	; CHECK-DAG: fnmsub
	; CHECK: fmadd
	; CHECK-NEXT: fnmsub
	; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul
	; CHECK-NEXT: blr

	; CHECK-SAFE: @foo2
	; CHECK-SAFE: fdiv			; CHECK-SAFE: fdiv
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%r = fdiv double %a, %b
				ret double %r
	}			}

	define float @goo2(float %a, float %b) nounwind {			define float @goo2_fmf(float %a, float %b) nounwind {
	%r = fdiv float %a, %b			; FMF: @goo2_fmf
				; FMF-DAG: fres
				; FMF-DAG: fnmsubs
				; FMF: fmadds
				; FMF-NEXT: fmuls
				; FMF-NEXT: blr
				%r = fdiv fast float %a, %b
	ret float %r			ret float %r
				}

	; CHECK: @goo2			define float @goo2_safe(float %a, float %b) nounwind {
	; CHECK-DAG: fres			; CHECK-SAFE: @goo2_safe
	; CHECK-DAG: fnmsubs
	; CHECK: fmadds
	; CHECK-NEXT: fmuls
	; CHECK-NEXT: blr

	; CHECK-SAFE: @goo2
	; CHECK-SAFE: fdivs			; CHECK-SAFE: fdivs
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%r = fdiv float %a, %b
				ret float %r
	}			}

	define <4 x float> @hoo2(<4 x float> %a, <4 x float> %b) nounwind {			define <4 x float> @hoo2_fmf(<4 x float> %a, <4 x float> %b) nounwind {
	%r = fdiv <4 x float> %a, %b			; FMF: @hoo2_fmf
				; FMF: vrefp
				%r = fdiv fast <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
				}

	; CHECK: @hoo2			define <4 x float> @hoo2_safe(<4 x float> %a, <4 x float> %b) nounwind {
	; CHECK: vrefp			; CHECK-SAFE: @hoo2_safe

	; CHECK-SAFE: @hoo2
	; CHECK-SAFE-NOT: vrefp			; CHECK-SAFE-NOT: vrefp
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%r = fdiv <4 x float> %a, %b
				ret <4 x float> %r
	}			}

	define double @foo3(double %a) nounwind {			define double @foo3_fmf(double %a) nounwind {
	%r = call double @llvm.sqrt.f64(double %a)			; FMF: @foo3_fmf
				; FMF: fcmpu
				; FMF-DAG: frsqrte
				; FMF: fmul
				; FMF-NEXT: fmadd
				; FMF-NEXT: fmul
				; FMF-NEXT: fmul
				; FMF-NEXT: fmul
				; FMF-NEXT: fmadd
				; FMF-NEXT: fmul
				; FMF-NEXT: fmul
				; FMF: blr
				%r = call fast double @llvm.sqrt.f64(double %a)
	ret double %r			ret double %r
				}

	; CHECK: @foo3			define double @foo3_safe(double %a) nounwind {
	; CHECK: fcmpu			; CHECK-SAFE: @foo3_safe
	; CHECK-DAG: frsqrte
	; CHECK: fmul
	; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul
	; CHECK: blr

	; CHECK-SAFE: @foo3
	; CHECK-SAFE: fsqrt			; CHECK-SAFE: fsqrt
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%r = call double @llvm.sqrt.f64(double %a)
				ret double %r
	}			}

	define float @goo3(float %a) nounwind {			define float @goo3_fmf(float %a) nounwind {
	%r = call float @llvm.sqrt.f32(float %a)			; FMF: @goo3_fmf
				; FMF: fcmpu
				; FMF-DAG: frsqrtes
				; FMF: fmuls
				; FMF-NEXT: fmadds
				; FMF-NEXT: fmuls
				; FMF-NEXT: fmuls
				; FMF: blr
				%r = call fast float @llvm.sqrt.f32(float %a)
	ret float %r			ret float %r
				}

	; CHECK: @goo3			define float @goo3_safe(float %a) nounwind {
	; CHECK: fcmpu			; CHECK-SAFE: @goo3_safe
	; CHECK-DAG: frsqrtes
	; CHECK: fmuls
	; CHECK-NEXT: fmadds
	; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls
	; CHECK: blr

	; CHECK-SAFE: @goo3
	; CHECK-SAFE: fsqrts			; CHECK-SAFE: fsqrts
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%r = call float @llvm.sqrt.f32(float %a)
				ret float %r
	}			}

	define <4 x float> @hoo3(<4 x float> %a) nounwind {			define <4 x float> @hoo3_fmf(<4 x float> %a) nounwind {
	%r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)			; FMF: @hoo3_fmf
				; FMF: vrsqrtefp
				; FMF-DAG: vcmpeqfp
				%r = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
	ret <4 x float> %r			ret <4 x float> %r
				}

	; CHECK: @hoo3			define <4 x float> @hoo3_safe(<4 x float> %a) nounwind {
	; CHECK: vrsqrtefp			; CHECK-SAFE: @hoo3_safe
	; CHECK-DAG: vcmpeqfp

	; CHECK-SAFE: @hoo3
	; CHECK-SAFE-NOT: vrsqrtefp			; CHECK-SAFE-NOT: vrsqrtefp
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
				%r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
				ret <4 x float> %r
	}			}

	attributes #0 = { nounwind "reciprocal-estimates"="sqrtf:0,sqrtd:0" }			attributes #0 = { nounwind "reciprocal-estimates"="sqrtf:0,sqrtd:0" }

test/CodeGen/X86/dagcombine-unsafe-math.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=corei7-avx | FileCheck %s
	; RUN: llc < %s -enable-unsafe-fp-math -mtriple=x86_64-apple-darwin -mcpu=corei7-avx | FileCheck %s


	; rdar://13126763			; rdar://13126763
	; Expression "x + x*x" was mistakenly transformed into "x * 3.0f".			; Expression "x + x*x" was mistakenly transformed into "x * 3.0f".

	define float @test1(float %x) {			define float @test1(float %x) {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	... 46 lines not shown ...
	define float @test5(<4 x float> %x) {			define float @test5(<4 x float> %x) {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%splat = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> zeroinitializer			%splat = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> zeroinitializer
	%v1 = extractelement <4 x float> %splat, i32 1			%v1 = extractelement <4 x float> %splat, i32 1
	%v0 = extractelement <4 x float> %splat, i32 0			%v0 = extractelement <4 x float> %splat, i32 0
	%add1 = fadd float %v0, %v1			%add1 = fadd contract reassoc nsz float %v0, %v1
				spatel (inline comment): If 'contract' is required here, that requirement seems unnecessary. Mark it with a 'FIXME' note?
	%v2 = extractelement <4 x float> %splat, i32 2			%v2 = extractelement <4 x float> %splat, i32 2
	%add2 = fadd float %v2, %add1			%add2 = fadd contract reassoc nsz float %v2, %add1
	ret float %add2			ret float %add2
	}			}

test/CodeGen/X86/fmul-combines.ll

	... 70 lines not shown ...
	; CHECK-LABEL: constant_fold_fmul_v4f32_undef:			; CHECK-LABEL: constant_fold_fmul_v4f32_undef:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: movaps {{.*#+}} xmm0 = [8.0E+0,NaN,8.0E+0,NaN]			; CHECK-NEXT: movaps {{.*#+}} xmm0 = [8.0E+0,NaN,8.0E+0,NaN]
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> <float 4.0, float undef, float 4.0, float 4.0>, <float 2.0, float 2.0, float 2.0, float undef>			%y = fmul <4 x float> <float 4.0, float undef, float 4.0, float 4.0>, <float 2.0, float 2.0, float 2.0, float undef>
	ret <4 x float> %y			ret <4 x float> %y
	}			}

	define <4 x float> @fmul0_v4f32_nsz_nnan(<4 x float> %x) #0 {			define <4 x float> @fmul0_v4f32_nsz_nnan(<4 x float> %x) {
	; CHECK-LABEL: fmul0_v4f32_nsz_nnan:			; CHECK-LABEL: fmul0_v4f32_nsz_nnan:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul nnan nsz <4 x float> %x, <float 0.0, float 0.0, float 0.0, float 0.0>			%y = fmul nnan nsz <4 x float> %x, <float 0.0, float 0.0, float 0.0, float 0.0>
	ret <4 x float> %y			ret <4 x float> %y
	}			}

	define <4 x float> @fmul0_v4f32_undef(<4 x float> %x) #0 {			define <4 x float> @fmul0_v4f32_undef(<4 x float> %x) {
	; CHECK-LABEL: fmul0_v4f32_undef:			; CHECK-LABEL: fmul0_v4f32_undef:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul nnan nsz <4 x float> %x, <float undef, float 0.0, float undef, float 0.0>			%y = fmul nnan nsz <4 x float> %x, <float undef, float 0.0, float undef, float 0.0>
	ret <4 x float> %y			ret <4 x float> %y
	}			}

	define <4 x float> @fmul_c2_c4_v4f32(<4 x float> %x) #0 {			define <4 x float> @fmul_c2_c4_v4f32(<4 x float> %x) {
	; CHECK-LABEL: fmul_c2_c4_v4f32:			; CHECK-LABEL: fmul_c2_c4_v4f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> %x, <float 2.0, float 2.0, float 2.0, float 2.0>			%y = fmul fast <4 x float> %x, <float 2.0, float 2.0, float 2.0, float 2.0>
	%z = fmul <4 x float> %y, <float 4.0, float 4.0, float 4.0, float 4.0>			%z = fmul fast <4 x float> %y, <float 4.0, float 4.0, float 4.0, float 4.0>
	ret <4 x float> %z			ret <4 x float> %z
	}			}

	define <4 x float> @fmul_c3_c4_v4f32(<4 x float> %x) #0 {			define <4 x float> @fmul_c3_c4_v4f32(<4 x float> %x) {
	; CHECK-LABEL: fmul_c3_c4_v4f32:			; CHECK-LABEL: fmul_c3_c4_v4f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> %x, <float 3.0, float 3.0, float 3.0, float 3.0>			%y = fmul fast <4 x float> %x, <float 3.0, float 3.0, float 3.0, float 3.0>
	%z = fmul <4 x float> %y, <float 4.0, float 4.0, float 4.0, float 4.0>			%z = fmul fast <4 x float> %y, <float 4.0, float 4.0, float 4.0, float 4.0>
	ret <4 x float> %z			ret <4 x float> %z
	}			}

	; CHECK: float 5			; CHECK: float 5
	; CHECK: float 12			; CHECK: float 12
	; CHECK: float 21			; CHECK: float 21
	; CHECK: float 32			; CHECK: float 32

	; We should be able to pre-multiply the two constant vectors.			; We should be able to pre-multiply the two constant vectors.
	define <4 x float> @fmul_v4f32_two_consts_no_splat(<4 x float> %x) #0 {			define <4 x float> @fmul_v4f32_two_consts_no_splat(<4 x float> %x) {
	; CHECK-LABEL: fmul_v4f32_two_consts_no_splat:			; CHECK-LABEL: fmul_v4f32_two_consts_no_splat:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> %x, <float 1.0, float 2.0, float 3.0, float 4.0>			%y = fmul fast <4 x float> %x, <float 1.0, float 2.0, float 3.0, float 4.0>
	%z = fmul <4 x float> %y, <float 5.0, float 6.0, float 7.0, float 8.0>			%z = fmul fast <4 x float> %y, <float 5.0, float 6.0, float 7.0, float 8.0>
	ret <4 x float> %z			ret <4 x float> %z
	}			}

	; Same as above, but reverse operands to make sure non-canonical form is also handled.			; Same as above, but reverse operands to make sure non-canonical form is also handled.
	define <4 x float> @fmul_v4f32_two_consts_no_splat_non_canonical(<4 x float> %x) #0 {			define <4 x float> @fmul_v4f32_two_consts_no_splat_non_canonical(<4 x float> %x) {
	; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_non_canonical:			; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_non_canonical:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, %x			%y = fmul fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, %x
	%z = fmul <4 x float> <float 5.0, float 6.0, float 7.0, float 8.0>, %y			%z = fmul fast <4 x float> <float 5.0, float 6.0, float 7.0, float 8.0>, %y
	ret <4 x float> %z			ret <4 x float> %z
	}			}

	; Node-level FMF and no function-level attributes.			; Node-level FMF and no function-level attributes.

	define <4 x float> @fmul_v4f32_two_consts_no_splat_reassoc(<4 x float> %x) {			define <4 x float> @fmul_v4f32_two_consts_no_splat_reassoc(<4 x float> %x) {
	; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_reassoc:			; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_reassoc:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	... 18 lines not shown ...

	; CHECK: float 6			; CHECK: float 6
	; CHECK: float 14			; CHECK: float 14
	; CHECK: float 24			; CHECK: float 24
	; CHECK: float 36			; CHECK: float 36

	; More than one use of a constant multiply should not inhibit the optimization.			; More than one use of a constant multiply should not inhibit the optimization.
	; Instead of a chain of 2 dependent mults, this test will have 2 independent mults.			; Instead of a chain of 2 dependent mults, this test will have 2 independent mults.
	define <4 x float> @fmul_v4f32_two_consts_no_splat_multiple_use(<4 x float> %x) #0 {			define <4 x float> @fmul_v4f32_two_consts_no_splat_multiple_use(<4 x float> %x) {
	; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_multiple_use:			; CHECK-LABEL: fmul_v4f32_two_consts_no_splat_multiple_use:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul <4 x float> %x, <float 1.0, float 2.0, float 3.0, float 4.0>			%y = fmul fast <4 x float> %x, <float 1.0, float 2.0, float 3.0, float 4.0>
	%z = fmul <4 x float> %y, <float 5.0, float 6.0, float 7.0, float 8.0>			%z = fmul fast <4 x float> %y, <float 5.0, float 6.0, float 7.0, float 8.0>
	%a = fadd <4 x float> %y, %z			%a = fadd fast <4 x float> %y, %z
	ret <4 x float> %a			ret <4 x float> %a
	}			}

	; PR22698 - http://llvm.org/bugs/show_bug.cgi?id=22698			; PR22698 - http://llvm.org/bugs/show_bug.cgi?id=22698
	; Make sure that we don't infinite loop swapping constants back and forth.			; Make sure that we don't infinite loop swapping constants back and forth.

	; CHECK: float 24			; CHECK: float 24
	; CHECK: float 24			; CHECK: float 24
	; CHECK: float 24			; CHECK: float 24
	; CHECK: float 24			; CHECK: float 24

	define <4 x float> @PR22698_splats(<4 x float> %a) #0 {			define <4 x float> @PR22698_splats(<4 x float> %a) {
	; CHECK-LABEL: PR22698_splats:			; CHECK-LABEL: PR22698_splats:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%mul1 = fmul fast <4 x float> <float 2.0, float 2.0, float 2.0, float 2.0>, <float 3.0, float 3.0, float 3.0, float 3.0>			%mul1 = fmul fast <4 x float> <float 2.0, float 2.0, float 2.0, float 2.0>, <float 3.0, float 3.0, float 3.0, float 3.0>
	%mul2 = fmul fast <4 x float> <float 4.0, float 4.0, float 4.0, float 4.0>, %mul1			%mul2 = fmul fast <4 x float> <float 4.0, float 4.0, float 4.0, float 4.0>, %mul1
	%mul3 = fmul fast <4 x float> %a, %mul2			%mul3 = fmul fast <4 x float> %a, %mul2
	ret <4 x float> %mul3			ret <4 x float> %mul3
	}			}

	; Same as above, but verify that non-splat vectors are handled correctly too.			; Same as above, but verify that non-splat vectors are handled correctly too.

	; CHECK: float 45			; CHECK: float 45
	; CHECK: float 120			; CHECK: float 120
	; CHECK: float 231			; CHECK: float 231
	; CHECK: float 384			; CHECK: float 384

	define <4 x float> @PR22698_no_splats(<4 x float> %a) #0 {			define <4 x float> @PR22698_no_splats(<4 x float> %a) {
	; CHECK-LABEL: PR22698_no_splats:			; CHECK-LABEL: PR22698_no_splats:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%mul1 = fmul fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, <float 5.0, float 6.0, float 7.0, float 8.0>			%mul1 = fmul fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, <float 5.0, float 6.0, float 7.0, float 8.0>
	%mul2 = fmul fast <4 x float> <float 9.0, float 10.0, float 11.0, float 12.0>, %mul1			%mul2 = fmul fast <4 x float> <float 9.0, float 10.0, float 11.0, float 12.0>, %mul1
	%mul3 = fmul fast <4 x float> %a, %mul2			%mul3 = fmul fast <4 x float> %a, %mul2
	ret <4 x float> %mul3			ret <4 x float> %mul3
	}			}

	define float @fmul_c2_c4_f32(float %x) #0 {			define float @fmul_c2_c4_f32(float %x) {
	; CHECK-LABEL: fmul_c2_c4_f32:			; CHECK-LABEL: fmul_c2_c4_f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul float %x, 2.0			%y = fmul fast float %x, 2.0
	%z = fmul float %y, 4.0			%z = fmul fast float %y, 4.0
	ret float %z			ret float %z
	}			}

	define float @fmul_c3_c4_f32(float %x) #0 {			define float @fmul_c3_c4_f32(float %x) {
	; CHECK-LABEL: fmul_c3_c4_f32:			; CHECK-LABEL: fmul_c3_c4_f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0			; CHECK-NEXT: mulss {{.*}}(%rip), %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = fmul float %x, 3.0			%y = fmul fast float %x, 3.0
	%z = fmul float %y, 4.0			%z = fmul fast float %y, 4.0
	ret float %z			ret float %z
	}			}

	define float @fmul_fneg_fneg_f32(float %x, float %y) {			define float @fmul_fneg_fneg_f32(float %x, float %y) {
	; CHECK-LABEL: fmul_fneg_fneg_f32:			; CHECK-LABEL: fmul_fneg_fneg_f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulss %xmm1, %xmm0			; CHECK-NEXT: mulss %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%x.neg = fsub float -0.0, %x			%x.neg = fsub float -0.0, %x
	%y.neg = fsub float -0.0, %y			%y.neg = fsub float -0.0, %y
	%mul = fmul float %x.neg, %y.neg			%mul = fmul float %x.neg, %y.neg
	ret float %mul			ret float %mul
	}			}

	define <4 x float> @fmul_fneg_fneg_v4f32(<4 x float> %x, <4 x float> %y) {			define <4 x float> @fmul_fneg_fneg_v4f32(<4 x float> %x, <4 x float> %y) {
	; CHECK-LABEL: fmul_fneg_fneg_v4f32:			; CHECK-LABEL: fmul_fneg_fneg_v4f32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: mulps %xmm1, %xmm0			; CHECK-NEXT: mulps %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%x.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %x			%x.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %x
	%y.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %y			%y.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %y
	%mul = fmul <4 x float> %x.neg, %y.neg			%mul = fmul <4 x float> %x.neg, %y.neg
	ret <4 x float> %mul			ret <4 x float> %mul
	}			}

	attributes #0 = { "less-precise-fpmad"="true" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "unsafe-fp-math"="true" }
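The `; CHECK: float ...` constants in this file are consistent with the constant operands being folded together lane by lane, which the `fast` flags on the multiplies permit. As a rough illustration, written in Python rather than LLVM IR and using hypothetical helper names (`fold_mul_chain`, `fold_multiple_use`) that are not part of the patch, the expected constant-pool values can be reproduced as follows. This is only arithmetic on the test constants; it does not model what DAGCombiner does internally.

```python
from functools import reduce
from operator import mul


def fold_mul_chain(*const_vectors):
    """Lane-wise product of constant vectors: (x * c1) * c2 -> x * (c1 * c2)."""
    return [float(reduce(mul, lanes)) for lanes in zip(*const_vectors)]


def fold_multiple_use(c1, c2):
    """y = x * c1, z = y * c2, a = y + z  ->  a = x * (c1 + c1 * c2), lane-wise."""
    return [float(a + a * b) for a, b in zip(c1, c2)]


# fmul_v4f32_two_consts_no_splat: CHECK lines expect 5, 12, 21, 32.
print(fold_mul_chain([1, 2, 3, 4], [5, 6, 7, 8]))

# fmul_v4f32_two_consts_no_splat_multiple_use: CHECK lines expect 6, 14, 24, 36.
print(fold_multiple_use([1, 2, 3, 4], [5, 6, 7, 8]))

# PR22698_splats: CHECK lines expect 24 in every lane.
print(fold_mul_chain([2] * 4, [3] * 4, [4] * 4))

# PR22698_no_splats: CHECK lines expect 45, 120, 231, 384.
print(fold_mul_chain([1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]))
```

The small integer constants used in these tests multiply exactly in floating point, so the folded values match the CHECK lines exactly; with arbitrary constants the regrouped product could round differently, which is why the fold needs a reassociation-permitting flag.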

test/CodeGen/X86/fp-fast.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=avx < %s | FileCheck %s
	; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=avx -enable-unsafe-fp-math --enable-no-nans-fp-math < %s | FileCheck %s

	define float @test1(float %a) {			define float @test1(float %a) #0 {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fadd float %a, %a			%t1 = fadd nnan contract reassoc nsz float %a, %a
	%r = fadd float %t1, %t1			%r = fadd nnan contract reassoc nsz float %t1, %t1
				spatel (inline comment): Same as the earlier comment: if 'contract' is required here, that requirement seems unnecessary. Mark it with a 'FIXME' note?
	ret float %r			ret float %r
	}			}

	define float @test2(float %a) {			define float @test2(float %a) #0 {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float 4.0, %a			%t1 = fmul nnan contract reassoc nsz float 4.0, %a
	%t2 = fadd float %a, %a			%t2 = fadd nnan contract reassoc nsz float %a, %a
	%r = fadd float %t1, %t2			%r = fadd nnan contract reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test3(float %a) {			define float @test3(float %a) #0 {
	; CHECK-LABEL: test3:			; CHECK-LABEL: test3:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float %a, 4.0			%t1 = fmul nnan contract reassoc nsz float %a, 4.0
	%t2 = fadd float %a, %a			%t2 = fadd nnan contract reassoc nsz float %a, %a
	%r = fadd float %t1, %t2			%r = fadd nnan contract reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test4(float %a) {			define float @test4(float %a) #0 {
	; CHECK-LABEL: test4:			; CHECK-LABEL: test4:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fadd float %a, %a			%t1 = fadd nnan contract reassoc nsz float %a, %a
	%t2 = fmul float 4.0, %a			%t2 = fmul nnan contract reassoc nsz float 4.0, %a
	%r = fadd float %t1, %t2			%r = fadd nnan contract reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test5(float %a) {			define float @test5(float %a) #0 {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fadd float %a, %a			%t1 = fadd nnan contract reassoc nsz float %a, %a
	%t2 = fmul float %a, 4.0			%t2 = fmul nnan contract reassoc nsz float %a, 4.0
	%r = fadd float %t1, %t2			%r = fadd nnan contract reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test6(float %a) {			define float @test6(float %a) #0 {
	; CHECK-LABEL: test6:			; CHECK-LABEL: test6:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float 2.0, %a			%t1 = fmul nnan contract reassoc nsz float 2.0, %a
	%t2 = fadd float %a, %a			%t2 = fadd nnan contract reassoc nsz float %a, %a
	%r = fsub float %t1, %t2			%r = fsub nnan contract reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test7(float %a) {			define float @test7(float %a) #0 {
	; CHECK-LABEL: test7:			; CHECK-LABEL: test7:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float %a, 2.0			%t1 = fmul nnan contract reassoc nsz float %a, 2.0
	%t2 = fadd float %a, %a			%t2 = fadd nnan contract reassoc nsz float %a, %a
	%r = fsub float %t1, %t2			%r = fsub nnan contract reassoc nsz float %t1, %t2
	ret float %r			ret float %r
	}			}

	define float @test8(float %a) {			define float @test8(float %a) #0 {
	; CHECK-LABEL: test8:			; CHECK-LABEL: test8:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float %a, 0.0			%t1 = fmul nnan contract reassoc nsz float %a, 0.0
	%t2 = fadd float %a, %t1			%t2 = fadd nnan contract reassoc nsz float %a, %t1
	ret float %t2			ret float %t2
	}			}

	define float @test9(float %a) {			define float @test9(float %a) #0 {
	; CHECK-LABEL: test9:			; CHECK-LABEL: test9:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float 0.0, %a			%t1 = fmul nnan contract reassoc nsz float 0.0, %a
	%t2 = fadd float %t1, %a			%t2 = fadd nnan contract reassoc nsz float %t1, %a
	ret float %t2			ret float %t2
	}			}

	define float @test10(float %a) {			define float @test10(float %a) #0 {
	; CHECK-LABEL: test10:			; CHECK-LABEL: test10:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fsub float -0.0, %a			%t1 = fsub nnan contract reassoc nsz float -0.0, %a
	%t2 = fadd float %a, %t1			%t2 = fadd nnan contract reassoc nsz float %a, %t1
	ret float %t2			ret float %t2
	}			}
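The folds exercised in fp-fast.ll are only valid under relaxed floating-point rules, which is why the new side of the diff attaches per-instruction flags instead of relying on `-enable-unsafe-fp-math`. The sketch below, in Python with hypothetical helper names, illustrates two of the underlying IEEE-754 behaviours using double precision: regrouping additions can change the result (the reason `reassoc` is needed), and `2*a - (a + a)` is not always zero (the reason `nnan` is needed for the fold in `test6` and `test7`). It makes no claim about which flags DAGCombiner actually checks, and it does not exercise `contract` at all, which is the flag the reviewer questions.

```python
import math


def sum_left(a, b, c):
    """Evaluate (a + b) + c, as written in source order."""
    return (a + b) + c


def sum_right(a, b, c):
    """Evaluate a + (b + c), a regrouping that reassociation would permit."""
    return a + (b + c)


def double_minus_sum(a):
    """Evaluate 2*a - (a + a), the pattern from test6/test7."""
    return 2.0 * a - (a + a)


# Regrouping changes the answer: -1e16 + 1.0 rounds back to -1e16.
print(sum_left(1e16, -1e16, 1.0))   # 1.0
print(sum_right(1e16, -1e16, 1.0))  # 0.0

# For finite inputs the difference is zero, but inf - inf is NaN.
print(double_minus_sum(3.5))        # 0.0
print(double_minus_sum(math.inf))   # nan
```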

test/CodeGen/X86/fp-fold.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefixes=ANY,STRICT			; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefixes=ANY,STRICT
				spatel (inline comment): Same comment as earlier: selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR make it clear what the difference in output is expected to be. Definitely use the auto-generation script when possible for x86 tests.
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -enable-unsafe-fp-math | FileCheck %s --check-prefixes=ANY,UNSAFE

	define float @fadd_zero(float %x) {			define float @fadd_zero_strict(float %x) {
	; STRICT-LABEL: fadd_zero:			; STRICT-LABEL: fadd_zero_strict:
	; STRICT: # %bb.0:			; STRICT: # %bb.0:
	; STRICT-NEXT: xorps %xmm1, %xmm1			; STRICT-NEXT: xorps %xmm1, %xmm1
	; STRICT-NEXT: addss %xmm1, %xmm0			; STRICT-NEXT: addss %xmm1, %xmm0
	; STRICT-NEXT: retq			; STRICT-NEXT: retq
	;
	; UNSAFE-LABEL: fadd_zero:
	; UNSAFE: # %bb.0:
	; UNSAFE-NEXT: retq
	%r = fadd float %x, 0.0			%r = fadd float %x, 0.0
	ret float %r			ret float %r
	}			}

	define float @fadd_negzero(float %x) {			define float @fadd_negzero(float %x) {
	; ANY-LABEL: fadd_negzero:			; ANY-LABEL: fadd_negzero:
	; ANY: # %bb.0:			; ANY: # %bb.0:
	; ANY-NEXT: retq			; ANY-NEXT: retq
	... 164 lines not shown ...
	; ANY: # %bb.0:			; ANY: # %bb.0:
	; ANY-NEXT: xorps {{.*}}(%rip), %xmm0			; ANY-NEXT: xorps {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; ANY-NEXT: retq
	%a = fadd <4 x float> %y, %x			%a = fadd <4 x float> %y, %x
	%r = fsub reassoc nsz <4 x float> %y, %a			%r = fsub reassoc nsz <4 x float> %y, %a
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define float @fsub_negzero(float %x) {			define float @fsub_negzero_strict(float %x) {
	; STRICT-LABEL: fsub_negzero:			; STRICT-LABEL: fsub_negzero_strict:
	; STRICT: # %bb.0:			; STRICT: # %bb.0:
	; STRICT-NEXT: xorps %xmm1, %xmm1			; STRICT-NEXT: xorps %xmm1, %xmm1
	; STRICT-NEXT: addss %xmm1, %xmm0			; STRICT-NEXT: addss %xmm1, %xmm0
	; STRICT-NEXT: retq			; STRICT-NEXT: retq
	;
	; UNSAFE-LABEL: fsub_negzero:
	; UNSAFE: # %bb.0:
	; UNSAFE-NEXT: retq
	%r = fsub float %x, -0.0			%r = fsub float %x, -0.0
	ret float %r			ret float %r
	}			}

	define <4 x float> @fsub_negzero_vector(<4 x float> %x) {			define float @fsub_negzero_nsz(float %x) {
	; STRICT-LABEL: fsub_negzero_vector:			; ANY-LABEL: fsub_negzero_nsz:
				; ANY: # %bb.0:
				; ANY-NEXT: retq
				%r = fsub nsz float %x, -0.0
				ret float %r
				}

				define <4 x float> @fsub_negzero_strict_vector(<4 x float> %x) {
				; STRICT-LABEL: fsub_negzero_strict_vector:
	; STRICT: # %bb.0:			; STRICT: # %bb.0:
	; STRICT-NEXT: xorps %xmm1, %xmm1			; STRICT-NEXT: xorps %xmm1, %xmm1
	; STRICT-NEXT: addps %xmm1, %xmm0			; STRICT-NEXT: addps %xmm1, %xmm0
	; STRICT-NEXT: retq			; STRICT-NEXT: retq
	;
	; UNSAFE-LABEL: fsub_negzero_vector:
	; UNSAFE: # %bb.0:
	; UNSAFE-NEXT: retq
	%r = fsub <4 x float> %x, <float -0.0, float -0.0, float -0.0, float -0.0>			%r = fsub <4 x float> %x, <float -0.0, float -0.0, float -0.0, float -0.0>
	ret <4 x float> %r			ret <4 x float> %r
	}			}

				define <4 x float> @fsub_negzero_nsz_vector(<4 x float> %x) {
				; ANY-LABEL: fsub_negzero_nsz_vector:
				; ANY: # %bb.0:
				; ANY-NEXT: retq
				%r = fsub nsz <4 x float> %x, <float -0.0, float -0.0, float -0.0, float -0.0>
				ret <4 x float> %r
				}

	define float @fsub_zero_nsz_1(float %x) {			define float @fsub_zero_nsz_1(float %x) {
	; ANY-LABEL: fsub_zero_nsz_1:			; ANY-LABEL: fsub_zero_nsz_1:
	; ANY: # %bb.0:			; ANY: # %bb.0:
	; ANY-NEXT: retq			; ANY-NEXT: retq
	%r = fsub nsz float %x, 0.0			%r = fsub nsz float %x, 0.0
	ret float %r			ret float %r
	}			}

	define float @fsub_zero_nsz_2(float %x) {			define float @fsub_zero_nsz_2(float %x) {
	; ANY-LABEL: fsub_zero_nsz_2:			; ANY-LABEL: fsub_zero_nsz_2:
	; ANY: # %bb.0:			; ANY: # %bb.0:
	; ANY-NEXT: xorps {{.*}}(%rip), %xmm0			; ANY-NEXT: xorps {{.*}}(%rip), %xmm0
	; ANY-NEXT: retq			; ANY-NEXT: retq
	%r = fsub nsz float 0.0, %x			%r = fsub nsz float 0.0, %x
	ret float %r			ret float %r
	}			}

	define float @fsub_negzero_nsz(float %x) {
	; ANY-LABEL: fsub_negzero_nsz:
	; ANY: # %bb.0:
	; ANY-NEXT: retq
	%r = fsub nsz float %x, -0.0
	ret float %r
	}

	define float @fmul_zero(float %x) {			define float @fmul_zero(float %x) {
	; ANY-LABEL: fmul_zero:			; ANY-LABEL: fmul_zero:
	; ANY: # %bb.0:			; ANY: # %bb.0:
	; ANY-NEXT: xorps %xmm0, %xmm0			; ANY-NEXT: xorps %xmm0, %xmm0
	; ANY-NEXT: retq			; ANY-NEXT: retq
	%r = fmul nnan nsz float %x, 0.0			%r = fmul nnan nsz float %x, 0.0
	ret float %r			ret float %r
	}			}
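For readers wondering why the `fsub_negzero` tests above were split into `_strict` and `_nsz` variants: `x - (-0.0)` is lowered as `x + 0.0` (the `xorps`/`addss` pair in the STRICT checks), and under default round-to-nearest that only changes the result when `x` is `-0.0`, where the sum is `+0.0`. Dropping the operation is therefore only valid when signed zeros may be ignored. The following is a minimal C++ sketch of that arithmetic, not code from this patch; the helper names are hypothetical, and it assumes compilation without `-ffast-math` or `-fno-signed-zeros`.

```cpp
#include <cassert>
#include <cmath>

// Strict IEEE-754 evaluation of x - (-0.0f), equivalent to x + (+0.0f).
// The volatile local (an illustrative choice) keeps the compiler from
// folding the subtraction away at compile time.
inline float sub_negzero_strict(float x) {
  volatile float negzero = -0.0f;
  return x - negzero;
}

// What the nsz fold produces: the operand is returned unchanged.
inline float sub_negzero_nsz_folded(float x) { return x; }

// True when the strict result and the folded result differ in sign bit.
inline bool fold_changes_sign(float x) {
  return std::signbit(sub_negzero_strict(x)) !=
         std::signbit(sub_negzero_nsz_folded(x));
}
```

Calling `fold_changes_sign(-0.0f)` reports a difference, while ordinary nonzero inputs and `+0.0f` do not, which is consistent with the fold being guarded by `nsz` rather than by the broader UnsafeFPMath option.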

Revision Contents

Diff 212412

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

test/CodeGen/AArch64/fadd-combines.ll

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll

test/CodeGen/AMDGPU/ffloor.f64.ll

test/CodeGen/AMDGPU/fneg-combines.ll

test/CodeGen/PowerPC/fma-mutate.ll

test/CodeGen/PowerPC/fmf-propagation.ll

test/CodeGen/PowerPC/qpx-recipest.ll

test/CodeGen/PowerPC/recipest.ll

test/CodeGen/X86/dagcombine-unsafe-math.ll

test/CodeGen/X86/fmul-combines.ll

test/CodeGen/X86/fp-fast.ll

test/CodeGen/X86/fp-fold.ll
