⚙ D50195 extend folding fsub/fadd to fneg for FMF

mcberg2017 created this revision.Aug 2 2018, 9:59 AM

Herald added a subscriber: wdng. · View Herald TranscriptAug 2 2018, 9:59 AM

spatel added inline comments.Aug 7 2018, 7:47 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10961–10964	Moving this block is unrelated/NFC?
test/CodeGen/X86/fp-fold.ll
98–100	A minimal test doesn't need fmul/fabs, but we do need 2 tests (1 for each of the commuted variants).

In IR, we do this transform with 'reassoc nsz':

define float @fsub_neg_y(float %x, float %y) {
  %add = fadd float %x, %y
  %r = fsub nsz reassoc float %y, %add
  ret float %r
}

$ opt -instcombine fsub.ll -S

define float @fsub_neg_y(float %x, float %y) {
  %1 = fsub reassoc nsz float -0.000000e+00, %x
  ret float %1
}

I think we want both flags for this transform so we have permission to do this:
original code: -0.0 - (-0.0 + (-0.0)) --> -0.0
transformed: -(-0.0) --> 0.0

IsNegatibleForFree shouldn't be a necessary predicate?

mcberg2017 added inline comments.Aug 7 2018, 8:24 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10961–10964	No it's not NFC, isNegatibleForFree should have been moved earlier when I was doing the other modifications. It sweeps up an opportunity and removes the chance for this combination to occur in in some cases.
test/CodeGen/X86/fp-fold.ll
98–100	Ok, will do.

I like these test versions a little better, they show an opportunity yet to be had for the IR optimizer.

mcberg2017 added inline comments.Aug 7 2018, 10:29 AM

test/CodeGen/X86/fp-fold.ll
168	no worries, I will remove this...

In D50195#1191124, @mcberg2017 wrote:

I like these test versions a little better, they show an opportunity yet to be had for the IR optimizer.

Yes, that makes it easier to see. Can you commit the tests with baselines CHECKs as a preliminary step, so we just see the diff in this patch?

I'm still confused though - are we avoiding the general transform (no constants):

define float @fsub_neg_y(float %x, float %y) {
  %add = fadd float %x, %y
  %r = fsub nsz reassoc float %y, %add
  ret float %r
}

...because that produces worse codegen somehow?

Nope, we can do either, I will upload the non constant version. Also I will split fp_fold.ll to use just unsafe for the initial check as NFC before this change.

With the non const version for both tests and some cleanup.

Updated and sync'd with fsub tests from r339197.

spatel mentioned this in D50417: [InstCombine] fold fneg into constant operand of fmul/fdiv.Aug 7 2018, 4:29 PM

spatel mentioned this in rL339248: [InstCombine] fold fneg into constant operand of fmul/fdiv.Aug 8 2018, 7:29 AM

I'm still confused. I don't see why we need to check isNegatibleForFree().

I added the IR transform for the test I mentioned in an earlier comment here:
rL339267

Isn't that the same pattern that we're trying to fold using flags in this patch?

It's possible that between the above patch and:
rL339171
rL339174
rL339176
rL339248
rL339266
...that we've solved the motivating case for this patch?

I'm not opposed to having this in the DAG too if the pattern can appear late, but if we can make it more general (and correspond to the IR fold), I think that would be better.

Even with all the above changes, the opportunity persists. We need to move:

// fold (fsub A, (fneg B)) -> (fadd A, B)

if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))
  return DAG.getNode(ISD::FADD, DL, VT, N0,
                     GetNegatedExpression(N1, DAG, LegalOperations), Flags);

after the match as it sweeps up part of the key expression, and never completes the rest, leaving it partially optimized for some cases (in the test case noted, its for zero gain). Basically, this key context should be after all the major specialization matches to give each of the others a chance to complete.

spatel mentioned this in rL339293: [x86] add tests for fsub+fadd with FMF; NFC.Aug 8 2018, 3:18 PM

spatel mentioned this in rL339299: [DAGCombiner] loosen constraints for fsub+fadd fold.Aug 8 2018, 4:05 PM

In D50195#1192761, @mcberg2017 wrote:
Even with all the above changes, the opportunity persists. We need to move:

// fold (fsub A, (fneg B)) -> (fadd A, B)
if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))
  return DAG.getNode(ISD::FADD, DL, VT, N0,
                     GetNegatedExpression(N1, DAG, LegalOperations), Flags);
after the match as it sweeps up part of the key expression, and never completes the rest, leaving it partially optimized for some cases (in the test case noted, its for zero gain). Basically, this key context should be after all the major specialization matches to give each of the others a chance to complete.

Ok, I understand now after adding some more tests to see what was going on. I removed the isNegatibleForFree() constraint as a preliminary step with rL339299.
So there are actually 2 independent changes in this patch currently:

Enable the fold with flags.
Rearrange the order of folds, so we don't miss the fold if we're starting from fsub.

I don't think there's any controversy with #1. Please go ahead with that change as another preliminary step (or I can do it if you'd prefer).
IIUC, the 2nd change is really just making up for the lack of an equivalent fadd fold for this:
rL339176
If you want to solve that by rearranging the order of folds, I suppose that's fine, but adding the fadd fold is likely the stronger solution.

I see what you meant, removing isNegatibleForFree/GetNegatedExpression in favor of fneg when that is what we doing. I have the rest of the change ready now. Should be up in a bit...

With changes to the new fold tests you added and the code mod.

spatel added inline comments.Aug 8 2018, 7:14 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10943–10944	Not sure how to expose this with a test, but we should pass the Flags when creating the FNEG nodes now that we’re using them to enable the transform?

Its either that or peer through them like fp_round and fp_extend, which seems more natural since these usually camp over arithmetics.

At this point there isn't any code either way that checks for fneg wrt to peering through or mining flags...

In D50195#1193315, @mcberg2017 wrote:

At this point there isn't any code either way that checks for fine...

Agree, but I think we should pass them to be consistent with the transform in IR. It can’t hurt to have them on the new node.

With the flags added...

LGTM

This revision is now accepted and ready to land.Aug 9 2018, 4:33 AM

Closed by commit rL339357: extend folding fsub/fadd to fneg for FMF (authored by mcberg2017). · Explain WhyAug 9 2018, 10:00 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. · View Herald TranscriptAug 9 2018, 10:00 AM

This is an archive of the discontinued LLVM Phabricator instance.

extend folding fsub/fadd to fneg for FMF
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 159620

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

test/CodeGen/X86/fp-fold.ll