This is an archive of the discontinued LLVM Phabricator instance.

Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control
ClosedPublic

Authored by mcberg2017 on Jul 23 2019, 3:56 PM.


Details

Reviewers

spatel
arsenm
hfinkel
wristow
craig.topper

Commits

rG005d705d4392: Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize…
rL367486: Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize…

Summary

Honoring no-signed-zeros is also available as a separate user control through clang, regardless of fast-math or UnsafeFPMath context. The DAG guards should reflect this.
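
The guard change described in the summary can be sketched as follows. This is a simplified model, not the LLVM code: GlobalFPOptions, NodeFlags, and the function names are hypothetical stand-ins for the global options and node-level flags that the DAG guards consult.

```cpp
#include <cassert>

// Hypothetical stand-ins for the global target options and the per-node
// fast-math flags; the real LLVM types carry many more fields.
struct GlobalFPOptions {
  bool UnsafeFPMath = false;
  bool NoSignedZerosFPMath = false;
};

struct NodeFlags {
  bool NoSignedZeros = false;
};

// Guard shape before this revision: global unsafe math stood in for nsz.
bool mayIgnoreSignedZeroOld(const GlobalFPOptions &O, const NodeFlags &F) {
  return O.UnsafeFPMath || F.NoSignedZeros;
}

// Guard shape after this revision: only an nsz control enables the fold.
bool mayIgnoreSignedZeroNew(const GlobalFPOptions &O, const NodeFlags &F) {
  return O.NoSignedZerosFPMath || F.NoSignedZeros;
}

// Example: a user who requested only no-signed-zeros (no unsafe math).
void demoGuards() {
  GlobalFPOptions NszOnly;
  NszOnly.NoSignedZerosFPMath = true;
  NodeFlags NoFlags;
  assert(!mayIgnoreSignedZeroOld(NszOnly, NoFlags)); // old guard misses it
  assert(mayIgnoreSignedZeroNew(NszOnly, NoFlags));  // new guard honors it
}
```

Calling demoGuards() from a main function exercises both guard shapes for the nsz-only configuration.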

Diff Detail

Event Timeline

mcberg2017 created this revision. Jul 23 2019, 3:56 PM

Herald added subscribers: jsji, MaskRay, javed.absar and 4 others. Jul 23 2019, 3:56 PM

mcberg2017 marked an inline comment as done. Jul 23 2019, 3:59 PM

mcberg2017 added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12059	In this case, UnsafeFPMath holds reassociation context.

Herald added a subscriber: wuzish. Jul 23 2019, 4:00 PM

nhaehnle removed a subscriber: nhaehnle. Jul 24 2019, 12:15 AM

updated with one case...

Code changes look good. See inline for some comments about the tests.

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll
3	Here and other test files: is it possible to remove "-enable-unsafe-fp-math" and still get the same test results? If we can tighten up the constraints, that would help move us away from the global requirements. Another option would be to specify the function-level attribute only on the tests that still require more than 'nsz' to produce the expected test results.
test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	I don't know how to interpret this test diff: regression, improvement, or does this test no longer accomplish its original intent?

spatel added inline comments. Jul 27 2019, 6:04 AM

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll
3	Still another option (and moves us closer still to the goal of IR/node-level flags only): can we remove the global settings entirely, add FMF to the IR in these tests, and maintain their intent?

mcberg2017 marked 2 inline comments as done. Jul 29 2019, 9:47 AM

mcberg2017 added inline comments.

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll
3	We need the new flag or attribute to keep the results the same. I will look over the tests and see where (hopefully all) we can use attributes in place of the flags.
test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	Yes, we no longer optimize this case. Should I just remove the gcn-nsz-dag context for fneg_fadd_0?

spatel added subscribers: nhaehnle, foad, rampitec. Jul 29 2019, 10:07 AM

spatel added inline comments.

test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	Someone with AMDGPU knowledge should answer that. cc'ing @arsenm @nhaehnle @foad @rampitec

rampitec added inline comments. Jul 29 2019, 10:12 AM

test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	This is clearly a regression.

mcberg2017 marked an inline comment as done. Jul 29 2019, 10:31 AM

mcberg2017 added inline comments.

test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	I will debug this case, it looks like there's an additional dependency with the new code shape wrt fadd.

The new code is actually better by one instruction; we just never completed the full match on the test. In the old path we had -enable-no-signed-zeros-fp-math on, but there was no way to reach the zero fold of the fadd via llc, as the flags are all user controlled. This should not be a regression.

test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	With the change of

// N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
if (N1C && N1C->isZero())
  if (N1C->isNegative() || Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())
    return N0;

we get this DAG:

SelectionDAG has 21 nodes:

t0: ch = EntryToken
t2: f32,ch = CopyFromReg t0, Register:f32 %0
          t27: f32 = fneg t28
        t13: i1 = setcc t27, t2, setuge:ch
      t15: f32 = select t13, t2, t28
    t17: i1 = setcc t15, ConstantFP:f32<0.000000e+00>, setule:ch
  t19: f32 = select t17, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<nan>
t21: ch,glue = CopyToReg t0, Register:f32 $vgpr0, t19
    t4: f32,ch = CopyFromReg t0, Register:f32 %1
  t7: f32 = fdiv ConstantFP:f32<1.000000e+00>, t4
t28: f32 = fmul nnan arcp contract reassoc t7, ConstantFP:f32<-0.000000e+00>
t22: ch = RETURN_TO_EPILOG t21, Register:f32 $vgpr0, t21:1

for which we produce this assembler:

fneg_fadd_0: ; @fneg_fadd_0
; %bb.0: ; %.entry

v_rcp_f32_e32 v0, s1
v_mov_b32_e32 v1, s0
v_mov_b32_e32 v2, 0x7fc00000
v_mul_f32_e32 v0, 0x80000000, v0
v_cmp_nlt_f32_e64 vcc, -v0, s0
v_cndmask_b32_e32 v0, v0, v1, vcc
v_cmp_nlt_f32_e32 vcc, 0, v0
v_cndmask_b32_e64 v0, v2, 0, vcc

With the code the way it is currently posed (unmodified):

// N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
if (N1C && N1C->isZero())
  if (N1C->isNegative() || Options.UnsafeFPMath || Flags.hasNoSignedZeros())
    return N0;

we get this DAG:

SelectionDAG has 21 nodes:

t0: ch = EntryToken
t2: f32,ch = CopyFromReg t0, Register:f32 %0
      t4: f32,ch = CopyFromReg t0, Register:f32 %1
    t7: f32 = fdiv ConstantFP:f32<1.000000e+00>, t4
  t9: f32 = fmul t7, ConstantFP:f32<0.000000e+00>
t11: f32 = fadd t9, ConstantFP:f32<0.000000e+00>
        t13: i1 = setcc t11, t2, setuge:ch
        t14: f32 = fneg t11
      t15: f32 = select t13, t2, t14
    t17: i1 = setcc t15, ConstantFP:f32<0.000000e+00>, setule:ch
  t19: f32 = select t17, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<nan>
t21: ch,glue = CopyToReg t0, Register:f32 $vgpr0, t19
t22: ch = RETURN_TO_EPILOG t21, Register:f32 $vgpr0, t21:1

for which we fold a fused multiply-add and produce this assembler:

fneg_fadd_0: ; @fneg_fadd_0
; %bb.0: ; %.entry

v_rcp_f32_e32 v0, s1
v_bfrev_b32_e32 v1, 1
v_mov_b32_e32 v2, s0
v_mac_f32_e32 v1, v0, v1
v_cmp_nlt_f32_e64 vcc, -v1, s0
v_cndmask_b32_e32 v0, v1, v2, vcc
v_mov_b32_e32 v1, 0x7fc00000
v_cmp_nlt_f32_e32 vcc, 0, v0
v_cndmask_b32_e64 v0, v1, 0, vcc
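
The fold quoted in this comment (N0 + -0.0 --> N0, with +0.0 allowed only under nsz) rests on IEEE-754 signed-zero behavior. A minimal sketch follows, assuming the default round-to-nearest mode and a build without fast-math compiler flags; the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// True when a and b compare equal and carry the same sign bit.
bool sameValueAndSign(float a, float b) {
  return a == b && std::signbit(a) == std::signbit(b);
}

// x + (-0.0) gives back x, sign included, for any non-NaN x.
bool addNegZeroKeepsX(float x) { return sameValueAndSign(x + (-0.0f), x); }

// x + (+0.0) gives back x except for x == -0.0, where the sum is +0.0.
bool addPosZeroKeepsX(float x) { return sameValueAndSign(x + 0.0f, x); }

void demoZeroFold() {
  assert(addNegZeroKeepsX(-0.0f));
  assert(addNegZeroKeepsX(0.0f));
  assert(addPosZeroKeepsX(2.5f));
  assert(!addPosZeroKeepsX(-0.0f)); // -0.0 + +0.0 == +0.0: the sign is lost
}
```

The single failing case, x == -0.0 with a +0.0 addend, is the one that the nsz option or flag permits the combiner to ignore.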

rampitec added inline comments. Jul 29 2019, 12:51 PM

test/CodeGen/AMDGPU/fneg-combines.ll
222–226 ↗	(On Diff #211588)	I am trying to understand what the existing ISA does, and it is:

v_rcp_f32_e32 v0, s1 ; v0 = 1 / s1
v_bfrev_b32_e32 v1, 1 ; v1 = 0x80000000 = -0.0
v_mov_b32_e32 v2, s0 ; v2 = s0
v_mac_f32_e32 v1, v0, v1 ; v1 = v1 * v0 + v1 = v0 * -0.0 - 0.0 = 0

Instead of that fancy mac instruction, it all now seems to be folded into:

v_mul_f32_e32 v0, 0x80000000, v0 ; v0 = v0 * -0.0

I.e., this is hardly performance-relevant code in practice. The comment above says it used to assert, so I guess this is just a regression test. Given no signed zeroes, this is as good as just v0 = 0, but that's a different matter. I have no objection to this test change.
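
The arithmetic in this comment can be checked on the host. The sketch below assumes 32-bit IEEE-754 floats and follows the reviewer's reading of the instructions (bit reversal for v_bfrev_b32, acc = a * b + acc for v_mac_f32); the function names are illustrative and are not AMDGPU intrinsics.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reverse the bit order of a 32-bit word, as v_bfrev_b32 is read above.
std::uint32_t reverseBits32(std::uint32_t v) {
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (v & 1u);
    v >>= 1;
  }
  return r;
}

// Reinterpret a 32-bit pattern as an IEEE-754 single.
float bitsToFloat(std::uint32_t bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Models the old sequence: the accumulator is preloaded with -0.0 and then
// updated as acc = rcp * acc + acc.
float macWithNegZero(float rcp) {
  float negZero = bitsToFloat(reverseBits32(1u));
  return rcp * negZero + negZero;
}

void demoBfrevMac() {
  assert(reverseBits32(1u) == 0x80000000u);
  assert(std::signbit(bitsToFloat(0x80000000u))); // the pattern is -0.0
  assert(macWithNegZero(2.0f) == 0.0f);           // a zero for finite inputs
}
```

For a finite input the result is a zero whose sign depends on the input sign, which is why the sequence collapses once signed zeros may be ignored.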

Note: test/CodeGen/AMDGPU/fneg-combines.ll needs rearchitecting, so I left it in options-flag form. test/CodeGen/PowerPC/fmf-propagation.ll has portions that can be removed once we stop using the options flags, so I am leaving it in its current form, with modifications, until that happens. test/CodeGen/X86/fp-fast.ll uses a subset of FMF that equates to the options flags that were used (let me know if you would rather generalize to fast or a smaller subset). All the others use either fast or context-relevant FMF and have been converted to not use options flags. Have a look and see what needs editing; currently this all passes testing.

spatel added inline comments. Jul 31 2019, 3:44 AM

test/CodeGen/AArch64/fadd-combines.ll
150–151	This comment is incorrect now. MachineCombiner was relying on the function attribute?
test/CodeGen/PowerPC/fma-mutate.ll
6	Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be.
14–17	Would it be better to auto-generate the complete output for these tests using utils/update_llc_test_checks.py?
test/CodeGen/PowerPC/qpx-recipest.ll
2	Same comment as above: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be.
test/CodeGen/PowerPC/recipest.ll
2	Same comment as above: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be.
test/CodeGen/X86/dagcombine-unsafe-math.ll
65	If 'contract' is required, that is unnecessary? Mark with a 'FIXME' note?
test/CodeGen/X86/fp-fast.ll
10–11	Same as earlier comment: If 'contract' is required, that is unnecessary? Mark with a 'FIXME' note?
test/CodeGen/X86/fp-fold.ll
2–3	Same comment as earlier: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be. Definitely use the auto-generation script when possible for x86 tests.

Added a TODO comment for machine combines in test/CodeGen/AArch64/fadd-combines.ll. test/CodeGen/PowerPC/fma-mutate.ll, test/CodeGen/PowerPC/qpx-recipest.ll, test/CodeGen/PowerPC/recipest.ll, and test/CodeGen/X86/fp-fold.ll now have just standard CHECK lines. The FMF contract flag was removed from test/CodeGen/X86/dagcombine-unsafe-math.ll and test/CodeGen/X86/fp-fast.ll.

LGTM - thanks for the test file updates. See inline comment about the AArch64 test.

test/CodeGen/AArch64/fadd-combines.ll
152–154	That comment seems wrong from the start. This form has better throughput than 3 chained adds. Ideally, this would be FMA?
2.0 * x + 101.0
The '17' variable names in the check lines are wrong too:
1109917696 --> 0x42280000 --> 42.0
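
The two observations in this comment can be reproduced with a short host-side check. This is a sketch assuming 32-bit IEEE-754 floats; decodeF32 and multiUse are illustrative names, and multiUse simply evaluates the IR of fadd_const_multiuse_attr as written.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit integer immediate as an IEEE-754 single.
float decodeF32(std::uint32_t bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// The IR of fadd_const_multiuse_attr, evaluated directly.
float multiUse(float x) {
  float a1 = x + 42.0f;
  float a2 = a1 + 17.0f;
  return a1 + a2;
}

void demoImmediates() {
  assert(decodeF32(1109917696u) == 42.0f); // 0x42280000, bound to W17 in the checks
  assert(decodeF32(1114374144u) == 59.0f); // 0x426C0000, bound to W59 in the checks
  assert(multiUse(1.0f) == 2.0f * 1.0f + 101.0f);
}
```

So the register that the CHECK lines call W17 actually holds 42.0, and for exactly representable inputs the function computes 2.0 * x + 101.0.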

This revision is now accepted and ready to land. Jul 31 2019, 1:22 PM

Closed by commit rL367486: Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize… (authored by mcberg2017). Jul 31 2019, 3:01 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Jul 31 2019, 3:01 PM

Herald added a subscriber: llvm-commits.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	10 lines
lib/CodeGen/SelectionDAG/SelectionDAG.cpp	2 lines
test/CodeGen/AArch64/fadd-combines.ll	2 lines
test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll	2 lines
test/CodeGen/AMDGPU/ffloor.f64.ll	6 lines
test/CodeGen/PowerPC/ (four entries, file names not listed)	2 lines each
test/CodeGen/X86/dagcombine-unsafe-math.ll	2 lines
test/CodeGen/X86/fmul-combines.ll	2 lines
test/CodeGen/X86/fp-fast.ll	2 lines
test/CodeGen/X86/fp-fold.ll	2 lines

Diff 211364

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


@@ 814 lines not shown @@ if (TLI.isOperationLegal(ISD::ConstantFP, VT) &&
 return 1;
 return llvm::all_of(Op->op_values(), [&](SDValue N) {
 return N.isUndef() ||
 TLI.isFPImmLegal(neg(cast<ConstantFPSDNode>(N)->getValueAPF()), VT,
 ForCodeSize);
 });
 }
 case ISD::FADD:
-if (!Options->UnsafeFPMath && !Flags.hasNoSignedZeros())
+if (!Options->NoSignedZerosFPMath && !Flags.hasNoSignedZeros())
 return 0;

 // After operation legalization, it might not be legal to create new FSUBs.
 if (LegalOperations && !TLI.isOperationLegalOrCustom(ISD::FSUB, VT))
 return 0;

 // fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
 if (char V = isNegatibleForFree(Op.getOperand(0), LegalOperations, TLI,
@@ 56 lines not shown @@ for (SDValue C : Op->op_values()) {
 }
 APFloat V = cast<ConstantFPSDNode>(C)->getValueAPF();
 V.changeSign();
 Ops.push_back(DAG.getConstantFP(V, SDLoc(Op), C.getValueType()));
 }
 return DAG.getBuildVector(Op.getValueType(), SDLoc(Op), Ops);
 }
 case ISD::FADD:
-assert(Options.UnsafeFPMath || Flags.hasNoSignedZeros());
+assert(Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros());

 // fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
 if (isNegatibleForFree(Op.getOperand(0), LegalOperations,
 DAG.getTargetLoweringInfo(), &Options, ForCodeSize,
 Depth + 1))
 return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
 GetNegatedExpression(Op.getOperand(0), DAG,
 LegalOperations, ForCodeSize,
@@ 11,146 lines not shown @@ if ((Options.NoNaNsFPMath || Flags.hasNoNaNs()) && AllowNewConst) {
 // If allowed, fold (fadd x, (fneg x)) -> 0.0
 if (N1.getOpcode() == ISD::FNEG && N1.getOperand(0) == N0)
 return DAG.getConstantFP(0.0, DL, VT);
 }

 // If 'unsafe math' or reassoc and nsz, fold lots of things.
 // TODO: break out portions of the transformations below for which Unsafe is
 // considered and which do not require both nsz and reassoc
-if ((Options.UnsafeFPMath ||
+if (((Options.UnsafeFPMath && Options.NoSignedZerosFPMath) ||
[Inline comment] mcberg2017: In this case, UnsafeFPMath holds reassociation context.
 (Flags.hasAllowReassociation() && Flags.hasNoSignedZeros())) &&
 AllowNewConst) {
 // fadd (fadd x, c1), c2 -> fadd x, c1 + c2
 if (N1CFP && N0.getOpcode() == ISD::FADD &&
 isConstantFPBuildVectorOrConstantFP(N0.getOperand(1))) {
 SDValue NewC = DAG.getNode(ISD::FADD, DL, VT, N0.getOperand(1), N1, Flags);
 return DAG.getNode(ISD::FADD, DL, VT, N0.getOperand(0), NewC, Flags);
 }
@@ 102 lines not shown @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 if (N0CFP && N1CFP)
 return DAG.getNode(ISD::FSUB, DL, VT, N0, N1, Flags);

 if (SDValue NewSel = foldBinOpIntoSelect(N))
 return NewSel;

 // (fsub A, 0) -> A
 if (N1CFP && N1CFP->isZero()) {
-if (!N1CFP->isNegative() || Options.UnsafeFPMath ||
+if (!N1CFP->isNegative() || Options.NoSignedZerosFPMath ||
 Flags.hasNoSignedZeros()) {
 return N0;
 }
 }

 if (N0 == N1) {
 // (fsub x, x) -> 0.0
 if (Options.NoNaNsFPMath || Flags.hasNoNaNs())
@@ 10 lines not shown @@ if (N0CFP->isNegative() ||
 (Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())) {
 if (isNegatibleForFree(N1, LegalOperations, TLI, &Options, ForCodeSize))
 return GetNegatedExpression(N1, DAG, LegalOperations, ForCodeSize);
 if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
 return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);
 }
 }

-if ((Options.UnsafeFPMath ||
+if (((Options.UnsafeFPMath && Options.NoSignedZerosFPMath) ||
 (Flags.hasAllowReassociation() && Flags.hasNoSignedZeros()))
 && N1.getOpcode() == ISD::FADD) {
 // X - (X + Y) -> -Y
 if (N0 == N1->getOperand(0))
 return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(1), Flags);
 // X - (Y + X) -> -Y
 if (N0 == N1->getOperand(1))
 return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(0), Flags);
@@ 8,618 lines not shown @@
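
The last fold in this file, X - (X + Y) -> -Y, is guarded by both reassociation and nsz. A small host-side sketch of why both are needed follows, assuming IEEE-754 single precision with round-to-nearest and no fast-math compiler flags; the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// X - (X + Y) evaluated as written. The volatile store forces the
// intermediate sum to be rounded to float before the subtraction.
float subOfAdd(float x, float y) {
  volatile float sum = x + y;
  return x - sum;
}

// The folded form, -Y.
float foldedNegY(float y) { return -y; }

void demoReassocFold() {
  // Rounding: 1.0 is absorbed by 1e20, so the unfolded form yields 0.
  assert(subOfAdd(1.0e20f, 1.0f) == 0.0f);
  assert(foldedNegY(1.0f) == -1.0f);
  // Signed zero: the unfolded form yields +0.0, the folded form -0.0.
  assert(!std::signbit(subOfAdd(0.0f, 0.0f)));
  assert(std::signbit(foldedNegY(0.0f)));
}
```

The first pair shows the value change that reassociation has to license, and the second pair shows the sign change that nsz has to license.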

lib/CodeGen/SelectionDAG/SelectionDAG.cpp


@@ 4,639 lines not shown @@ if (OpOpcode == ISD::EXTRACT_VECTOR_ELT &&
 return Operand.getOperand(0);
 break;
 case ISD::FNEG:
 // Negation of an unknown bag of bits is still completely undefined.
 if (OpOpcode == ISD::UNDEF)
 return getUNDEF(VT);

 // -(X-Y) -> (Y-X) is unsafe because when X==Y, -0.0 != +0.0
-if ((getTarget().Options.UnsafeFPMath || Flags.hasNoSignedZeros()) &&
+if ((getTarget().Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros()) &&
 OpOpcode == ISD::FSUB)
 return getNode(ISD::FSUB, DL, VT, Operand.getOperand(1),
 Operand.getOperand(0), Flags);
 if (OpOpcode == ISD::FNEG) // --X -> X
 return Operand.getOperand(0);
 break;
 case ISD::FABS:
 if (OpOpcode == ISD::FNEG) // abs(-X) -> abs(X)
@@ 4,938 lines not shown @@
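
The code comment in this hunk (-(X-Y) -> (Y-X) is unsafe when X==Y) and the fneg-of-fadd fold guarded in DAGCombiner.cpp both come down to the sign of a zero result. A minimal sketch, assuming IEEE-754 arithmetic with round-to-nearest and no fast-math compiler flags; the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// -(X - Y) and its folded form (Y - X).
float negOfSub(float x, float y) { return -(x - y); }
float swappedSub(float x, float y) { return y - x; }

// -(A + B) and its folded form (-A) - B.
float negOfAdd(float a, float b) { return -(a + b); }
float subOfNeg(float a, float b) { return (-a) - b; }

void demoSignedZeroFolds() {
  // X == Y: the original gives -0.0, the folded form gives +0.0.
  assert(std::signbit(negOfSub(3.0f, 3.0f)));
  assert(!std::signbit(swappedSub(3.0f, 3.0f)));
  // A == +0.0, B == -0.0: the original gives -0.0, the folded form +0.0.
  assert(std::signbit(negOfAdd(0.0f, -0.0f)));
  assert(!std::signbit(subOfNeg(0.0f, -0.0f)));
}
```

In both pairs the values compare equal and only the sign bit differs, which is the difference that an nsz option or flag allows the combiner to disregard.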

test/CodeGen/AArch64/fadd-combines.ll

@@ 141 lines not shown @@
 ; CHECK-NEXT: ret
 %a1 = fadd float %x, 42.0
 %a2 = fadd nsz reassoc float %a1, 17.0
 %a3 = fadd float %a1, %a2
 ret float %a3
 }

 ; DAGCombiner transforms this into: (x + 59.0) + (x + 17.0).
 ; The machine combiner transforms this into a chain of 3 dependent adds:
 ; ((x + 59.0) + 17.0) + x
[Inline comment] spatel: This comment is incorrect now. MachineCombiner was relying on the function attribute?

 define float @fadd_const_multiuse_attr(float %x) #0 {
 ; CHECK-LABEL: fadd_const_multiuse_attr:
[Inline comment] spatel: That comment seems wrong from the start. This form has better throughput than 3 chained adds. Ideally, this would be FMA? 2.0 * x + 101.0 The '17' variable names in the check lines are wrong too: 1109917696 --> 0x42280000 --> 42.0
 ; CHECK: // %bb.0:
 ; CHECK-DAG: mov [[W59:w[0-9]+]], #1114374144
 ; CHECK-DAG: mov [[W17:w[0-9]+]], #1109917696
 ; CHECK-NEXT: fmov [[FP59:s[0-9]+]], [[W59]]
 ; CHECK-NEXT: fmov [[FP17:s[0-9]+]], [[W17]]
 ; CHECK-NEXT: fadd [[TMP1:s[0-9]+]], s0, [[FP59]]
 ; CHECK-NEXT: fadd [[TMP2:s[0-9]+]], [[FP17]], [[TMP1]]
 ; CHECK-NEXT: fadd s0, s0, [[TMP2]]
 ; CHECK-NEXT: ret
 %a1 = fadd float %x, 42.0
 %a2 = fadd float %a1, 17.0
 %a3 = fadd float %a1, %a2
 ret float %a3
 }

-attributes #0 = { "unsafe-fp-math"="true" }
+attributes #0 = { "unsafe-fp-math"="true" "no-signed-zeros-fp-math"="true" }

 declare void @use(double)

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll

 ; RUN: llc -march=amdgcn -enable-no-signed-zeros-fp-math=0 < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-SAFE %s
 ; RUN: llc -march=amdgcn -enable-no-signed-zeros-fp-math=1 < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-UNSAFE %s
-; RUN: llc -march=amdgcn -enable-unsafe-fp-math < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-UNSAFE %s
+; RUN: llc -march=amdgcn -enable-unsafe-fp-math -enable-no-signed-zeros-fp-math < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-UNSAFE %s
[Inline comment] spatel: Here and other test files: is it possible to remove "-enable-unsafe-fp-math" and still get the same test results? If we can tighten up the constraints, that would help move us away from the global requirements. Another option would be to specify the function-level attribute only on the tests that still require more than 'nsz' to produce the expected test results.
[Inline comment] spatel: Still another option (and moves us closer still to the goal of IR/node-level flags only): can we remove the global settings entirely, add FMF to the IR in these tests, and maintain their intent?
[Inline comment] mcberg2017: We need the new flag or attribute to keep the results the same. I will look over the tests and see where (hopefully all) we can use attributes in place of the flags.

 declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone

 ; Test that the -enable-no-signed-zeros-fp-math flag works

 ; GCN-LABEL: {{^}}fneg_fsub_f32:
 ; GCN: v_sub_f32_e32 [[SUB:v[0-9]+]], {{v[0-9]+}}, {{v[0-9]+}}
 ; GCN-SAFE: v_xor_b32_e32 v{{[0-9]+}}, 0x80000000, [[SUB]]
@@ 16 lines not shown @@

test/CodeGen/AMDGPU/ffloor.f64.ll

-; RUN: llc -march=amdgcn -verify-machineinstrs -enable-unsafe-fp-math < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
-; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -enable-unsafe-fp-math < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s
-; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs -enable-unsafe-fp-math < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s
+; RUN: llc -march=amdgcn -verify-machineinstrs -enable-unsafe-fp-math -enable-no-signed-zeros-fp-math < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
+; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -enable-unsafe-fp-math -enable-no-signed-zeros-fp-math < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s
+; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs -enable-unsafe-fp-math -enable-no-signed-zeros-fp-math < %s | FileCheck -check-prefix=CI -check-prefix=FUNC %s

 declare double @llvm.fabs.f64(double %Val)
 declare double @llvm.floor.f64(double) nounwind readnone
 declare <2 x double> @llvm.floor.v2f64(<2 x double>) nounwind readnone
 declare <3 x double> @llvm.floor.v3f64(<3 x double>) nounwind readnone
 declare <4 x double> @llvm.floor.v4f64(<4 x double>) nounwind readnone
 declare <8 x double> @llvm.floor.v8f64(<8 x double>) nounwind readnone
 declare <16 x double> @llvm.floor.v16f64(<16 x double>) nounwind readnone
@@ 117 lines not shown @@

test/CodeGen/PowerPC/fma-mutate.ll

 ; Test several VSX FMA mutation opportunities. The first one isn't a
 ; reasonable transformation because the killed product register is the
 ; same as the FMA target register. The second one is legal. The third
 ; one doesn't fit the feeding-copy pattern.

-; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s
+; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -enable-no-signed-zeros-fp-math -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s
[Inline comment] spatel: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be.
 target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
 target triple = "powerpc64-unknown-linux-gnu"

 declare double @llvm.sqrt.f64(double)

 define double @foo3(double %a) nounwind {
 %r = call double @llvm.sqrt.f64(double %a)
 ret double %r

 ; CHECK: @foo3
 ; CHECK-NOT: fmr
[Inline comment] spatel: Would it be better to auto-generate the complete output for these tests using utils/update_llc_test_checks.py?
 ; CHECK: xsmaddmdp
 ; CHECK: xsmaddadp
 }

test/CodeGen/PowerPC/fmf-propagation.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; REQUIRES: asserts
 ; RUN: llc < %s -mtriple=powerpc64le -debug-only=isel -o /dev/null 2>&1 | FileCheck %s --check-prefix=FMFDEBUG
 ; RUN: llc < %s -mtriple=powerpc64le | FileCheck %s --check-prefix=FMF
 ; RUN: llc < %s -mtriple=powerpc64le -debug-only=isel -o /dev/null 2>&1 -enable-unsafe-fp-math -enable-no-nans-fp-math | FileCheck %s --check-prefix=GLOBALDEBUG
-; RUN: llc < %s -mtriple=powerpc64le -enable-unsafe-fp-math -enable-no-nans-fp-math | FileCheck %s --check-prefix=GLOBAL
+; RUN: llc < %s -mtriple=powerpc64le -enable-unsafe-fp-math -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math | FileCheck %s --check-prefix=GLOBAL

 ; Test FP transforms using instruction/node-level fast-math-flags.
 ; We're also checking debug output to verify that FMF is propagated to the newly created nodes.
 ; The run with the global unsafe param tests the pre-FMF behavior using regular instructions/nodes.

 declare float @llvm.fma.f32(float, float, float)
 declare float @llvm.sqrt.f32(float)

@@ 465 lines not shown @@

test/CodeGen/PowerPC/qpx-recipest.ll

	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2q -enable-unsafe-fp-math \| FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2q -enable-unsafe-fp-math -enable-no-signed-zeros-fp-math \| FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2q \| FileCheck -check-prefix=CHECK-SAFE %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2q \| FileCheck -check-prefix=CHECK-SAFE %s
				spatelUnsubmitted Not Done Reply Inline Actions Same comment as above: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR makes it clear what the difference in output is expected to be. spatel: Same comment as above: Selectively using 2 different labels for the same RUN line confused me.
	target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"			target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	declare <4 x double> @llvm.sqrt.v4f64(<4 x double>)			declare <4 x double> @llvm.sqrt.v4f64(<4 x double>)
	declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)

	define <4 x double> @foo(<4 x double> %a, <4 x double> %b) nounwind {			define <4 x double> @foo(<4 x double> %a, <4 x double> %b) nounwind {
	entry:			entry:

test/CodeGen/PowerPC/recipest.ll

	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=-vsx | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -enable-no-signed-zeros-fp-math -mattr=-vsx | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=-vsx | FileCheck -check-prefix=CHECK-SAFE %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=-vsx | FileCheck -check-prefix=CHECK-SAFE %s
				spatel: Same comment as above: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR make it clear what the difference in output is expected to be.

	target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"			target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	declare double @llvm.sqrt.f64(double)			declare double @llvm.sqrt.f64(double)
	declare float @llvm.sqrt.f32(float)			declare float @llvm.sqrt.f32(float)
	declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)


test/CodeGen/X86/dagcombine-unsafe-math.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -enable-unsafe-fp-math -mtriple=x86_64-apple-darwin -mcpu=corei7-avx | FileCheck %s			; RUN: llc < %s -enable-unsafe-fp-math -enable-no-signed-zeros-fp-math -mtriple=x86_64-apple-darwin -mcpu=corei7-avx | FileCheck %s


	; rdar://13126763			; rdar://13126763
	; Expression "x + x*x" was mistakenly transformed into "x * 3.0f".			; Expression "x + x*x" was mistakenly transformed into "x * 3.0f".

	define float @test1(float %x) {			define float @test1(float %x) {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	...
	define float @test5(<4 x float> %x) {			define float @test5(<4 x float> %x) {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%splat = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> zeroinitializer			%splat = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> zeroinitializer
	%v1 = extractelement <4 x float> %splat, i32 1			%v1 = extractelement <4 x float> %splat, i32 1
	%v0 = extractelement <4 x float> %splat, i32 0			%v0 = extractelement <4 x float> %splat, i32 0
	%add1 = fadd float %v0, %v1			%add1 = fadd float %v0, %v1
				spatel: If 'contract' is required, that is unnecessary? Mark with a 'FIXME' note?
	%v2 = extractelement <4 x float> %splat, i32 2			%v2 = extractelement <4 x float> %splat, i32 2
	%add2 = fadd float %v2, %add1			%add2 = fadd float %v2, %add1
	ret float %add2			ret float %add2
	}			}

test/CodeGen/X86/fmul-combines.ll

	; CHECK-NEXT: mulps %xmm1, %xmm0			; CHECK-NEXT: mulps %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%x.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %x			%x.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %x
	%y.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %y			%y.neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %y
	%mul = fmul <4 x float> %x.neg, %y.neg			%mul = fmul <4 x float> %x.neg, %y.neg
	ret <4 x float> %mul			ret <4 x float> %mul
	}			}

	attributes #0 = { "less-precise-fpmad"="true" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "unsafe-fp-math"="true" }			attributes #0 = { "less-precise-fpmad"="true" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "unsafe-fp-math"="true" "no-signed-zeros-fp-math"="true" }

test/CodeGen/X86/fp-fast.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=avx -enable-unsafe-fp-math --enable-no-nans-fp-math < %s | FileCheck %s			; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=avx -enable-unsafe-fp-math --enable-no-nans-fp-math -enable-no-signed-zeros-fp-math < %s | FileCheck %s

	define float @test1(float %a) {			define float @test1(float %a) {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fadd float %a, %a			%t1 = fadd float %a, %a
	%r = fadd float %t1, %t1			%r = fadd float %t1, %t1
	ret float %r			ret float %r
				spatel: Same as earlier comment: If 'contract' is required, that is unnecessary? Mark with a 'FIXME' note?
	}			}

	define float @test2(float %a) {			define float @test2(float %a) {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; CHECK-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t1 = fmul float 4.0, %a			%t1 = fmul float 4.0, %a

test/CodeGen/X86/fp-fold.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefixes=ANY,STRICT			; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefixes=ANY,STRICT
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -enable-unsafe-fp-math | FileCheck %s --check-prefixes=ANY,UNSAFE			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -enable-unsafe-fp-math -enable-no-signed-zeros-fp-math | FileCheck %s --check-prefixes=ANY,UNSAFE
				spatel: Same comment as earlier: Selectively using 2 different labels for the same RUN line confused me. That seems unnecessary because the function name and IR make it clear what the difference in output is expected to be. Definitely use the auto-generation script when possible for x86 tests.

	define float @fadd_zero(float %x) {			define float @fadd_zero(float %x) {
	; STRICT-LABEL: fadd_zero:			; STRICT-LABEL: fadd_zero:
	; STRICT: # %bb.0:			; STRICT: # %bb.0:
	; STRICT-NEXT: xorps %xmm1, %xmm1			; STRICT-NEXT: xorps %xmm1, %xmm1
	; STRICT-NEXT: addss %xmm1, %xmm0			; STRICT-NEXT: addss %xmm1, %xmm0
	; STRICT-NEXT: retq			; STRICT-NEXT: retq
	;			;

Revision Contents

Diff 211364

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

test/CodeGen/AArch64/fadd-combines.ll

test/CodeGen/AMDGPU/enable-no-signed-zeros-fp-math.ll

test/CodeGen/AMDGPU/ffloor.f64.ll

test/CodeGen/PowerPC/fma-mutate.ll

test/CodeGen/PowerPC/fmf-propagation.ll

test/CodeGen/PowerPC/qpx-recipest.ll

test/CodeGen/PowerPC/recipest.ll

test/CodeGen/X86/dagcombine-unsafe-math.ll

test/CodeGen/X86/fmul-combines.ll

test/CodeGen/X86/fp-fast.ll

test/CodeGen/X86/fp-fold.ll
