This is an archive of the discontinued LLVM Phabricator instance.

extend folding fsub/fadd to fneg for FMF
ClosedPublic

Authored by mcberg2017 on Aug 2 2018, 9:59 AM.

Details

Summary

This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold adding reassociation, as it the flag that most closely represents the translation

Diff Detail

Repository
rL LLVM

Event Timeline

mcberg2017 created this revision.Aug 2 2018, 9:59 AM
spatel added inline comments.Aug 7 2018, 7:47 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10949–10952 ↗(On Diff #158787)

Moving this block is unrelated/NFC?

test/CodeGen/X86/fp-fold.ll
98–100 ↗(On Diff #158787)

A minimal test doesn't need fmul/fabs, but we do need 2 tests (1 for each of the commuted variants).

spatel added a comment.Aug 7 2018, 8:10 AM

In IR, we do this transform with 'reassoc nsz':

define float @fsub_neg_y(float %x, float %y) {
  %add = fadd float %x, %y
  %r = fsub nsz reassoc float %y, %add
  ret float %r
}
$ opt -instcombine fsub.ll -S

define float @fsub_neg_y(float %x, float %y) {
  %1 = fsub reassoc nsz float -0.000000e+00, %x
  ret float %1
}

I think we want both flags for this transform so we have permission to do this:
original code: -0.0 - (-0.0 + (-0.0)) --> -0.0
transformed: -(-0.0) --> 0.0

IsNegatibleForFree shouldn't be a necessary predicate?

mcberg2017 added inline comments.Aug 7 2018, 8:24 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10949–10952 ↗(On Diff #158787)

No it's not NFC, isNegatibleForFree should have been moved earlier when I was doing the other modifications. It sweeps up an opportunity and removes the chance for this combination to occur in in some cases.

test/CodeGen/X86/fp-fold.ll
98–100 ↗(On Diff #158787)

Ok, will do.

I like these test versions a little better, they show an opportunity yet to be had for the IR optimizer.

mcberg2017 added inline comments.Aug 7 2018, 10:29 AM
test/CodeGen/X86/fp-fold.ll
168 ↗(On Diff #159539)

no worries, I will remove this...

I like these test versions a little better, they show an opportunity yet to be had for the IR optimizer.

Yes, that makes it easier to see. Can you commit the tests with baselines CHECKs as a preliminary step, so we just see the diff in this patch?

I'm still confused though - are we avoiding the general transform (no constants):

define float @fsub_neg_y(float %x, float %y) {
  %add = fadd float %x, %y
  %r = fsub nsz reassoc float %y, %add
  ret float %r
}

...because that produces worse codegen somehow?

mcberg2017 added a comment.EditedAug 7 2018, 10:46 AM

Nope, we can do either, I will upload the non constant version. Also I will split fp_fold.ll to use just unsafe for the initial check as NFC before this change.

With the non const version for both tests and some cleanup.

mcberg2017 updated this revision to Diff 159620.Aug 7 2018, 4:21 PM

Updated and sync'd with fsub tests from r339197.

spatel added a comment.Aug 8 2018, 9:38 AM

I'm still confused. I don't see why we need to check isNegatibleForFree().

I added the IR transform for the test I mentioned in an earlier comment here:
rL339267

Isn't that the same pattern that we're trying to fold using flags in this patch?

It's possible that between the above patch and:
rL339171
rL339174
rL339176
rL339248
rL339266
...that we've solved the motivating case for this patch?

I'm not opposed to having this in the DAG too if the pattern can appear late, but if we can make it more general (and correspond to the IR fold), I think that would be better.

Even with all the above changes, the opportunity persists. We need to move:

// fold (fsub A, (fneg B)) -> (fadd A, B)

if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))
  return DAG.getNode(ISD::FADD, DL, VT, N0,
                     GetNegatedExpression(N1, DAG, LegalOperations), Flags);

after the match as it sweeps up part of the key expression, and never completes the rest, leaving it partially optimized for some cases (in the test case noted, its for zero gain). Basically, this key context should be after all the major specialization matches to give each of the others a chance to complete.

spatel added a comment.Aug 8 2018, 4:16 PM

Even with all the above changes, the opportunity persists. We need to move:

// fold (fsub A, (fneg B)) -> (fadd A, B)

if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))
  return DAG.getNode(ISD::FADD, DL, VT, N0,
                     GetNegatedExpression(N1, DAG, LegalOperations), Flags);

after the match as it sweeps up part of the key expression, and never completes the rest, leaving it partially optimized for some cases (in the test case noted, its for zero gain). Basically, this key context should be after all the major specialization matches to give each of the others a chance to complete.

Ok, I understand now after adding some more tests to see what was going on. I removed the isNegatibleForFree() constraint as a preliminary step with rL339299.
So there are actually 2 independent changes in this patch currently:

  1. Enable the fold with flags.
  2. Rearrange the order of folds, so we don't miss the fold if we're starting from fsub.

I don't think there's any controversy with #1. Please go ahead with that change as another preliminary step (or I can do it if you'd prefer).
IIUC, the 2nd change is really just making up for the lack of an equivalent fadd fold for this:
rL339176
If you want to solve that by rearranging the order of folds, I suppose that's fine, but adding the fadd fold is likely the stronger solution.

I see what you meant, removing isNegatibleForFree/GetNegatedExpression in favor of fneg when that is what we doing. I have the rest of the change ready now. Should be up in a bit...

mcberg2017 updated this revision to Diff 159847.EditedAug 8 2018, 6:59 PM

With changes to the new fold tests you added and the code mod.

spatel added inline comments.Aug 8 2018, 7:14 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10943 ↗(On Diff #159847)

Not sure how to expose this with a test, but we should pass the Flags when creating the FNEG nodes now that we’re using them to enable the transform?

Its either that or peer through them like fp_round and fp_extend, which seems more natural since these usually camp over arithmetics.

mcberg2017 added a comment.EditedAug 8 2018, 7:18 PM

At this point there isn't any code either way that checks for fneg wrt to peering through or mining flags...

spatel added a comment.Aug 8 2018, 7:20 PM

At this point there isn't any code either way that checks for fine...

Agree, but I think we should pass them to be consistent with the transform in IR. It can’t hurt to have them on the new node.

mcberg2017 updated this revision to Diff 159851.Aug 8 2018, 9:05 PM

With the flags added...

spatel accepted this revision.Aug 9 2018, 4:33 AM

LGTM

This revision is now accepted and ready to land.Aug 9 2018, 4:33 AM
This revision was automatically updated to reflect the committed changes.