
Details

Reviewers

spatel
wristow
arsenm

Commits

rGca38254601ce: extend folding fsub/fadd to fneg for FMF
rL339357: extend folding fsub/fadd to fneg for FMF

Summary

This change provides a common optimization path for both Unsafe and FMF-driven optimization of this fsub fold, adding reassociation, as it is the flag that most closely represents the transformation.

Diff Detail

Event Timeline

mcberg2017 created this revision. · Aug 2 2018, 9:59 AM

Herald added a subscriber: wdng. · Aug 2 2018, 9:59 AM

spatel added inline comments. · Aug 7 2018, 7:47 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10949–10952	Moving this block is unrelated/NFC?
test/CodeGen/X86/fp-fold.ll
98–100	A minimal test doesn't need fmul/fabs, but we do need 2 tests (1 for each of the commuted variants).

In IR, we do this transform with 'reassoc nsz':

define float @fsub_neg_y(float %x, float %y) {
  %add = fadd float %x, %y
  %r = fsub nsz reassoc float %y, %add
  ret float %r
}

$ opt -instcombine fsub.ll -S

define float @fsub_neg_y(float %x, float %y) {
  %1 = fsub reassoc nsz float -0.000000e+00, %x
  ret float %1
}

I think we want both flags for this transform so we have permission to do this:
original code: 0.0 - (0.0 + 0.0) --> 0.0
transformed: -(0.0) --> -0.0
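
To make the need for both flags concrete, here is a minimal C++ sketch. It is not part of the patch, and the helper names `original`, `folded` and `same_result` are made up for illustration. It evaluates Y - (X + Y) and -X with ordinary float arithmetic, assuming IEEE-754 round-to-nearest and a compiler that is not run with fast-math options.

```cpp
#include <cmath>

// Hypothetical helpers mirroring the IR before and after the fold.
// Y - (X + Y), evaluated with strict float semantics.
float original(float x, float y) {
  volatile float add = x + y; // volatile forces the sum to be rounded to float
  return y - add;
}

// The folded form: -X.
float folded(float x) { return -x; }

// True if the two forms agree in both value and sign of zero.
bool same_result(float x, float y) {
  float a = original(x, y);
  float b = folded(x);
  return a == b && std::signbit(a) == std::signbit(b);
}
```

With these definitions, `same_result(2.0f, 3.0f)` holds, but `same_result(1e-20f, 1.0f)` does not, because X is absorbed when rounding X + Y (this is what `reassoc` permits). `same_result(0.0f, 0.0f)` also fails: the original form gives +0.0 and the folded form gives -0.0, which is what `nsz` permits.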

isNegatibleForFree shouldn't be a necessary predicate?

mcberg2017 added inline comments. · Aug 7 2018, 8:24 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10949–10952	No, it's not NFC; isNegatibleForFree should have been moved earlier when I was doing the other modifications. It sweeps up an opportunity and removes the chance for this combination to occur in some cases.
test/CodeGen/X86/fp-fold.ll
98–100	Ok, will do.

I like these test versions a little better, they show an opportunity yet to be had for the IR optimizer.

mcberg2017 added inline comments. · Aug 7 2018, 10:29 AM

test/CodeGen/X86/fp-fold.ll
227	no worries, I will remove this...

In D50195#1191124, @mcberg2017 wrote:

I like these test versions a little better, they show an opportunity yet to be had for the IR optimizer.

Yes, that makes it easier to see. Can you commit the tests with baseline CHECKs as a preliminary step, so we just see the diff in this patch?

I'm still confused though - are we avoiding the general transform (no constants):

define float @fsub_neg_y(float %x, float %y) {
  %add = fadd float %x, %y
  %r = fsub nsz reassoc float %y, %add
  ret float %r
}

...because that produces worse codegen somehow?

Nope, we can do either; I will upload the non-constant version. Also, I will split fp-fold.ll to use just unsafe for the initial check as NFC before this change.

With the non const version for both tests and some cleanup.

Updated and sync'd with fsub tests from r339197.

spatel mentioned this in D50417: [InstCombine] fold fneg into constant operand of fmul/fdiv. · Aug 7 2018, 4:29 PM

spatel mentioned this in rL339248: [InstCombine] fold fneg into constant operand of fmul/fdiv. · Aug 8 2018, 7:29 AM

I'm still confused. I don't see why we need to check isNegatibleForFree().

I added the IR transform for the test I mentioned in an earlier comment here:
rL339267

Isn't that the same pattern that we're trying to fold using flags in this patch?

It's possible that between the above patch and:
rL339171
rL339174
rL339176
rL339248
rL339266
...that we've solved the motivating case for this patch?

I'm not opposed to having this in the DAG too if the pattern can appear late, but if we can make it more general (and correspond to the IR fold), I think that would be better.

Even with all the above changes, the opportunity persists. We need to move:

// fold (fsub A, (fneg B)) -> (fadd A, B)

if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))
  return DAG.getNode(ISD::FADD, DL, VT, N0,
                     GetNegatedExpression(N1, DAG, LegalOperations), Flags);

after the match, as it sweeps up part of the key expression and never completes the rest, leaving it partially optimized for some cases (in the test case noted, it's for zero gain). Basically, this key context should be after all the major specialization matches to give each of the others a chance to complete.
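
The ordering problem can be sketched with a toy expression tree. This is not the SelectionDAG API: `Expr`, `negatable`, `negate`, `foldGeneric`, `foldCommonOperand` and `combine` are made-up names, and `negatable`/`negate` are only loose stand-ins for isNegatibleForFree and GetNegatedExpression. The sketch assumes a combiner in which the first matching fold wins.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy expression tree; all names here are illustrative assumptions.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;
struct Expr {
  enum Kind { Var, Const, Add, Sub, Mul, Neg } kind = Var;
  std::string name; // Var only
  int value = 0;    // Const only
  ExprPtr lhs, rhs; // operands; Neg uses lhs only
};

ExprPtr make(Expr::Kind k, ExprPtr l = nullptr, ExprPtr r = nullptr) {
  auto e = std::make_shared<Expr>();
  e->kind = k;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}
ExprPtr var(const std::string &n) { auto e = make(Expr::Var); e->name = n; return e; }
ExprPtr constant(int v) { auto e = make(Expr::Const); e->value = v; return e; }

// Constants negate for free; mul/add are negatable if either operand is.
bool negatable(const ExprPtr &e) {
  switch (e->kind) {
  case Expr::Const: return true;
  case Expr::Mul:
  case Expr::Add: return negatable(e->lhs) || negatable(e->rhs);
  default: return false;
  }
}

// Builds the negated expression; the caller must check negatable(e) first.
ExprPtr negate(const ExprPtr &e) {
  if (e->kind == Expr::Const) return constant(-e->value);
  bool left = negatable(e->lhs);
  if (e->kind == Expr::Mul)
    return left ? make(Expr::Mul, negate(e->lhs), e->rhs)
                : make(Expr::Mul, e->lhs, negate(e->rhs));
  // Add: -(A + B) -> (-A) - B or (-B) - A
  return left ? make(Expr::Sub, negate(e->lhs), e->rhs)
              : make(Expr::Sub, negate(e->rhs), e->lhs);
}

// Generic fold: A - B -> A + (-B) when B negates for free.
ExprPtr foldGeneric(const ExprPtr &e) {
  if (e->kind == Expr::Sub && negatable(e->rhs))
    return make(Expr::Add, e->lhs, negate(e->rhs));
  return nullptr;
}

// Specialized fold: X - (X + Y) -> -Y and X - (Y + X) -> -Y.
ExprPtr foldCommonOperand(const ExprPtr &e) {
  if (e->kind != Expr::Sub || e->rhs->kind != Expr::Add) return nullptr;
  if (e->lhs == e->rhs->lhs) return make(Expr::Neg, e->rhs->rhs);
  if (e->lhs == e->rhs->rhs) return make(Expr::Neg, e->rhs->lhs);
  return nullptr;
}

// The first matching fold wins, like a visit function that returns early.
ExprPtr combine(const ExprPtr &e, bool genericFirst) {
  ExprPtr r = genericFirst ? foldGeneric(e) : foldCommonOperand(e);
  if (!r) r = genericFirst ? foldCommonOperand(e) : foldGeneric(e);
  return r ? r : e;
}

std::string str(const ExprPtr &e) {
  switch (e->kind) {
  case Expr::Var: return e->name;
  case Expr::Const: return std::to_string(e->value);
  case Expr::Neg: return "(-" + str(e->lhs) + ")";
  case Expr::Add: return "(" + str(e->lhs) + " + " + str(e->rhs) + ")";
  case Expr::Sub: return "(" + str(e->lhs) + " - " + str(e->rhs) + ")";
  case Expr::Mul: return "(" + str(e->lhs) + " * " + str(e->rhs) + ")";
  }
  return "";
}

// Builds y - (x * 5 + y), the shape used in the fsub_neg_y test.
ExprPtr example() {
  ExprPtr x = var("x"), y = var("y");
  return make(Expr::Sub, y, make(Expr::Add, make(Expr::Mul, x, constant(5)), y));
}
```

In this toy model, `str(combine(example(), true))` yields `(y + ((x * -5) - y))`, still a mul, a sub and an add, while `str(combine(example(), false))` yields `(-(x * 5))`. Running the generic fold first rewrites the fsub into an fadd, so the common-operand pattern no longer matches.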

spatel mentioned this in rL339293: [x86] add tests for fsub+fadd with FMF; NFC. · Aug 8 2018, 3:18 PM

spatel mentioned this in rL339299: [DAGCombiner] loosen constraints for fsub+fadd fold. · Aug 8 2018, 4:05 PM

In D50195#1192761, @mcberg2017 wrote:
Even with all the above changes, the opportunity persists. We need to move:

// fold (fsub A, (fneg B)) -> (fadd A, B)
if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))
  return DAG.getNode(ISD::FADD, DL, VT, N0,
                     GetNegatedExpression(N1, DAG, LegalOperations), Flags);
after the match, as it sweeps up part of the key expression and never completes the rest, leaving it partially optimized for some cases (in the test case noted, it's for zero gain). Basically, this key context should be after all the major specialization matches to give each of the others a chance to complete.

Ok, I understand now after adding some more tests to see what was going on. I removed the isNegatibleForFree() constraint as a preliminary step with rL339299.
So there are actually 2 independent changes in this patch currently:

1. Enable the fold with flags.
2. Rearrange the order of folds, so we don't miss the fold if we're starting from fsub.

I don't think there's any controversy with #1. Please go ahead with that change as another preliminary step (or I can do it if you'd prefer).
IIUC, the 2nd change is really just making up for the lack of an equivalent fadd fold for this:
rL339176
If you want to solve that by rearranging the order of folds, I suppose that's fine, but adding the fadd fold is likely the stronger solution.

I see what you meant: removing isNegatibleForFree/GetNegatedExpression in favor of fneg when that is what we are doing. I have the rest of the change ready now. Should be up in a bit...

With changes to the new fold tests you added and the code mod.

spatel added inline comments. · Aug 8 2018, 7:14 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10943	Not sure how to expose this with a test, but we should pass the Flags when creating the FNEG nodes now that we’re using them to enable the transform?

It's either that or peer through them like fp_round and fp_extend, which seems more natural since these usually camp over arithmetic ops.

At this point there isn't any code either way that checks for fneg with respect to peering through or mining flags...

In D50195#1193315, @mcberg2017 wrote:

At this point there isn't any code either way that checks for fneg...

Agree, but I think we should pass them to be consistent with the transform in IR. It can’t hurt to have them on the new node.

With the flags added...

LGTM

This revision is now accepted and ready to land. · Aug 9 2018, 4:33 AM

Closed by commit rL339357: extend folding fsub/fadd to fneg for FMF (authored by mcberg2017). · Aug 9 2018, 10:00 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. · Aug 9 2018, 10:00 AM

Diff 159847

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

 [...]
   if (N0CFP && N0CFP->isZero()) {
     if (Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros()) {
       if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))
         return GetNegatedExpression(N1, DAG, LegalOperations);
       if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
         return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);
     }
   }

-  // fold (fsub A, (fneg B)) -> (fadd A, B)
-  if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))
-    return DAG.getNode(ISD::FADD, DL, VT, N0,
-                       GetNegatedExpression(N1, DAG, LegalOperations), Flags);
-
-  if (Options.UnsafeFPMath && N1.getOpcode() == ISD::FADD) {
+  if ((Options.UnsafeFPMath ||
+      (Flags.hasAllowReassociation() && Flags.hasNoSignedZeros()))
+      && N1.getOpcode() == ISD::FADD) {
     // X - (X + Y) -> -Y
     if (N0 == N1->getOperand(0))
       return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(1));
     // X - (Y + X) -> -Y
     if (N0 == N1->getOperand(1))
       return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(0));
   }

+  // fold (fsub A, (fneg B)) -> (fadd A, B)
+  if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))
+    return DAG.getNode(ISD::FADD, DL, VT, N0,
+                       GetNegatedExpression(N1, DAG, LegalOperations), Flags);
+
   // FSUB -> FMA combines:
   if (SDValue Fused = visitFSUBForFMACombine(N)) {
     AddToWorklist(Fused.getNode());
     return Fused;
   }

   return SDValue();
 }
 [...]

test/CodeGen/X86/fp-fold.ll

 [...]
 ; ANY-NEXT: movaps %xmm1, %xmm0
 ; ANY-NEXT: retq
   %neg = fsub nsz float 0.0, %x
   %r = fadd nsz float %neg, %y
   ret float %r
 }

 define float @fsub_neg_y(float %x, float %y) {
-; STRICT-LABEL: fsub_neg_y:
-; STRICT: # %bb.0:
-; STRICT-NEXT: mulss {{.*}}(%rip), %xmm0
-; STRICT-NEXT: addss %xmm1, %xmm0
-; STRICT-NEXT: subss %xmm0, %xmm1
-; STRICT-NEXT: movaps %xmm1, %xmm0
-; STRICT-NEXT: retq
-;
-; UNSAFE-LABEL: fsub_neg_y:
-; UNSAFE: # %bb.0:
-; UNSAFE-NEXT: mulss {{.*}}(%rip), %xmm0
-; UNSAFE-NEXT: subss %xmm1, %xmm0
-; UNSAFE-NEXT: addss %xmm1, %xmm0
-; UNSAFE-NEXT: retq
+; ANY-LABEL: fsub_neg_y:
+; ANY: # %bb.0:
+; ANY-NEXT: mulss {{.*}}(%rip), %xmm0
+; ANY-NEXT: retq
   %mul = fmul float %x, 5.0
   %add = fadd float %mul, %y
   %r = fsub nsz reassoc float %y, %add
   ret float %r
 }

 define float @fsub_neg_y_commute(float %x, float %y) {
-; STRICT-LABEL: fsub_neg_y_commute:
-; STRICT: # %bb.0:
-; STRICT-NEXT: mulss {{.*}}(%rip), %xmm0
-; STRICT-NEXT: addss %xmm1, %xmm0
-; STRICT-NEXT: subss %xmm0, %xmm1
-; STRICT-NEXT: movaps %xmm1, %xmm0
-; STRICT-NEXT: retq
-;
-; UNSAFE-LABEL: fsub_neg_y_commute:
-; UNSAFE: # %bb.0:
-; UNSAFE-NEXT: mulss {{.*}}(%rip), %xmm0
-; UNSAFE-NEXT: subss %xmm1, %xmm0
-; UNSAFE-NEXT: addss %xmm1, %xmm0
-; UNSAFE-NEXT: retq
+; ANY-LABEL: fsub_neg_y_commute:
+; ANY: # %bb.0:
+; ANY-NEXT: mulss {{.*}}(%rip), %xmm0
+; ANY-NEXT: retq
   %mul = fmul float %x, 5.0
   %add = fadd float %y, %mul
   %r = fsub nsz reassoc float %y, %add
   ret float %r
 }
 ; Y - (X + Y) --> -X

 define float @fsub_fadd_common_op_fneg(float %x, float %y) {
-; STRICT-LABEL: fsub_fadd_common_op_fneg:
-; STRICT: # %bb.0:
-; STRICT-NEXT: addss %xmm1, %xmm0
-; STRICT-NEXT: subss %xmm0, %xmm1
-; STRICT-NEXT: movaps %xmm1, %xmm0
-; STRICT-NEXT: retq
-;
-; UNSAFE-LABEL: fsub_fadd_common_op_fneg:
-; UNSAFE: # %bb.0:
-; UNSAFE-NEXT: xorps {{.*}}(%rip), %xmm0
-; UNSAFE-NEXT: retq
+; ANY-LABEL: fsub_fadd_common_op_fneg:
+; ANY: # %bb.0:
+; ANY-NEXT: xorps {{.*}}(%rip), %xmm0
+; ANY-NEXT: retq
   %a = fadd float %x, %y
   %r = fsub reassoc nsz float %y, %a
   ret float %r
 }

 ; Y - (X + Y) --> -X

 define <4 x float> @fsub_fadd_common_op_fneg_vec(<4 x float> %x, <4 x float> %y) {
-; STRICT-LABEL: fsub_fadd_common_op_fneg_vec:
-; STRICT: # %bb.0:
-; STRICT-NEXT: addps %xmm1, %xmm0
-; STRICT-NEXT: subps %xmm0, %xmm1
-; STRICT-NEXT: movaps %xmm1, %xmm0
-; STRICT-NEXT: retq
-;
-; UNSAFE-LABEL: fsub_fadd_common_op_fneg_vec:
-; UNSAFE: # %bb.0:
-; UNSAFE-NEXT: xorps {{.*}}(%rip), %xmm0
-; UNSAFE-NEXT: retq
+; ANY-LABEL: fsub_fadd_common_op_fneg_vec:
+; ANY: # %bb.0:
+; ANY-NEXT: xorps {{.*}}(%rip), %xmm0
+; ANY-NEXT: retq
   %a = fadd <4 x float> %x, %y
   %r = fsub nsz reassoc <4 x float> %y, %a
   ret <4 x float> %r
 }

 ; Y - (Y + X) --> -X
 ; Commute operands of the 'add'.

 define float @fsub_fadd_common_op_fneg_commute(float %x, float %y) {
-; STRICT-LABEL: fsub_fadd_common_op_fneg_commute:
-; STRICT: # %bb.0:
-; STRICT-NEXT: addss %xmm1, %xmm0
-; STRICT-NEXT: subss %xmm0, %xmm1
-; STRICT-NEXT: movaps %xmm1, %xmm0
-; STRICT-NEXT: retq
-;
-; UNSAFE-LABEL: fsub_fadd_common_op_fneg_commute:
-; UNSAFE: # %bb.0:
-; UNSAFE-NEXT: xorps {{.*}}(%rip), %xmm0
-; UNSAFE-NEXT: retq
+; ANY-LABEL: fsub_fadd_common_op_fneg_commute:
+; ANY: # %bb.0:
+; ANY-NEXT: xorps {{.*}}(%rip), %xmm0
+; ANY-NEXT: retq
   %a = fadd float %y, %x
   %r = fsub reassoc nsz float %y, %a
   ret float %r
 }

 ; Y - (Y + X) --> -X

 define <4 x float> @fsub_fadd_common_op_fneg_commute_vec(<4 x float> %x, <4 x float> %y) {
-; STRICT-LABEL: fsub_fadd_common_op_fneg_commute_vec:
-; STRICT: # %bb.0:
-; STRICT-NEXT: addps %xmm1, %xmm0
-; STRICT-NEXT: subps %xmm0, %xmm1
-; STRICT-NEXT: movaps %xmm1, %xmm0
-; STRICT-NEXT: retq
-;
-; UNSAFE-LABEL: fsub_fadd_common_op_fneg_commute_vec:
-; UNSAFE: # %bb.0:
-; UNSAFE-NEXT: xorps {{.*}}(%rip), %xmm0
-; UNSAFE-NEXT: retq
+; ANY-LABEL: fsub_fadd_common_op_fneg_commute_vec:
+; ANY: # %bb.0:
+; ANY-NEXT: xorps {{.*}}(%rip), %xmm0
+; ANY-NEXT: retq
   %a = fadd <4 x float> %y, %x
   %r = fsub reassoc nsz <4 x float> %y, %a
   ret <4 x float> %r
 }

 define float @fsub_negzero(float %x) {
 ; STRICT-LABEL: fsub_negzero:
 ; STRICT: # %bb.0:
 [...]
 define float @fmul_x_const_const(float %x) {
 ; ANY-LABEL: fmul_x_const_const:
 ; ANY: # %bb.0:
 ; ANY-NEXT: mulss {{.*}}(%rip), %xmm0
 ; ANY-NEXT: retq
   %mul = fmul reassoc float %x, 9.0
   %r = fmul reassoc float %mul, 4.0
   ret float %r
 }

This is an archive of the discontinued LLVM Phabricator instance.

extend folding fsub/fadd to fneg for FMF
ClosedPublic