This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold adding reassociation, as it the flag that most closely represents the translation
Details
Diff Detail
Event Timeline
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10961–10964 | Moving this block is unrelated/NFC? | |
test/CodeGen/X86/fp-fold.ll | ||
98–100 | A minimal test doesn't need fmul/fabs, but we do need 2 tests (1 for each of the commuted variants). |
In IR, we do this transform with 'reassoc nsz':
define float @fsub_neg_y(float %x, float %y) { %add = fadd float %x, %y %r = fsub nsz reassoc float %y, %add ret float %r }
$ opt -instcombine fsub.ll -S define float @fsub_neg_y(float %x, float %y) { %1 = fsub reassoc nsz float -0.000000e+00, %x ret float %1 }
I think we want both flags for this transform so we have permission to do this:
original code: -0.0 - (-0.0 + (-0.0)) --> -0.0
transformed: -(-0.0) --> 0.0
IsNegatibleForFree shouldn't be a necessary predicate?
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10961–10964 | No it's not NFC, isNegatibleForFree should have been moved earlier when I was doing the other modifications. It sweeps up an opportunity and removes the chance for this combination to occur in in some cases. | |
test/CodeGen/X86/fp-fold.ll | ||
98–100 | Ok, will do. |
I like these test versions a little better, they show an opportunity yet to be had for the IR optimizer.
test/CodeGen/X86/fp-fold.ll | ||
---|---|---|
168 | no worries, I will remove this... |
Yes, that makes it easier to see. Can you commit the tests with baselines CHECKs as a preliminary step, so we just see the diff in this patch?
I'm still confused though - are we avoiding the general transform (no constants):
define float @fsub_neg_y(float %x, float %y) { %add = fadd float %x, %y %r = fsub nsz reassoc float %y, %add ret float %r }
...because that produces worse codegen somehow?
Nope, we can do either, I will upload the non constant version. Also I will split fp_fold.ll to use just unsafe for the initial check as NFC before this change.
I'm still confused. I don't see why we need to check isNegatibleForFree().
I added the IR transform for the test I mentioned in an earlier comment here:
rL339267
Isn't that the same pattern that we're trying to fold using flags in this patch?
It's possible that between the above patch and:
rL339171
rL339174
rL339176
rL339248
rL339266
...that we've solved the motivating case for this patch?
I'm not opposed to having this in the DAG too if the pattern can appear late, but if we can make it more general (and correspond to the IR fold), I think that would be better.
Even with all the above changes, the opportunity persists. We need to move:
// fold (fsub A, (fneg B)) -> (fadd A, B)
if (isNegatibleForFree(N1, LegalOperations, TLI, &Options)) return DAG.getNode(ISD::FADD, DL, VT, N0, GetNegatedExpression(N1, DAG, LegalOperations), Flags);
after the match as it sweeps up part of the key expression, and never completes the rest, leaving it partially optimized for some cases (in the test case noted, its for zero gain). Basically, this key context should be after all the major specialization matches to give each of the others a chance to complete.
Ok, I understand now after adding some more tests to see what was going on. I removed the isNegatibleForFree() constraint as a preliminary step with rL339299.
So there are actually 2 independent changes in this patch currently:
- Enable the fold with flags.
- Rearrange the order of folds, so we don't miss the fold if we're starting from fsub.
I don't think there's any controversy with #1. Please go ahead with that change as another preliminary step (or I can do it if you'd prefer).
IIUC, the 2nd change is really just making up for the lack of an equivalent fadd fold for this:
rL339176
If you want to solve that by rearranging the order of folds, I suppose that's fine, but adding the fadd fold is likely the stronger solution.
I see what you meant, removing isNegatibleForFree/GetNegatedExpression in favor of fneg when that is what we doing. I have the rest of the change ready now. Should be up in a bit...
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10943–10944 | Not sure how to expose this with a test, but we should pass the Flags when creating the FNEG nodes now that we’re using them to enable the transform? |
Its either that or peer through them like fp_round and fp_extend, which seems more natural since these usually camp over arithmetics.
At this point there isn't any code either way that checks for fneg wrt to peering through or mining flags...
Agree, but I think we should pass them to be consistent with the transform in IR. It can’t hurt to have them on the new node.
Not sure how to expose this with a test, but we should pass the Flags when creating the FNEG nodes now that we’re using them to enable the transform?