This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Simplify addends reordering logic
ClosedPublic

Authored by kovdan01 on Jan 14 2022, 4:44 AM.

Details

Summary

Previously some constants were not pushed to the top of the resulting
expression tree as intended by the algorithm. We can remove the logic from
simplifyFAdd and rely on SimplifyAssociativeOrCommutative to do that.

Diff Detail

Event Timeline

kovdan01 created this revision.Jan 14 2022, 4:44 AM
kovdan01 requested review of this revision.Jan 14 2022, 4:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 14 2022, 4:44 AM
kovdan01 updated this revision to Diff 399962.Jan 14 2022, 5:37 AM

I can't tell if this has a greater effect than shown by the test diff. That diff actually demonstrates a basic missing canonicalization in IR - no fast-math is needed to convert fsub to fadd in that example or any of these more general cases:

define float @fmul_c1(float %x, float %y) {
  %m = fmul float %x, 7.000000e+00
  %r = fsub float %y, %m
  ret float %r
}

define float @fdiv_c0(float %x, float %y) {
  %m = fdiv float 7.000000e+00, %x
  %r = fsub float %y, %m
  ret float %r
}

define float @fdiv_c1(float %x, float %y) {
  %m = fdiv float %x, 7.000000e+00
  %r = fsub float %y, %m
  ret float %r
}

We do have this transform in the backend though (pushing the implicit negation op into the constant operand). For example, if you codegen this with llc for x86, you'll see 3 "addss" instructions and no "subss". Would adding those canonicalizations to IR solve your motivating case(s)?
Or is there a larger example of a change resulting from this patch?

I can't tell if this has a greater effect than shown by the test diff. That diff actually demonstrates a basic missing canonicalization in IR - no fast-math is needed to convert fsub to fadd in that example or any of these more general cases:

define float @fmul_c1(float %x, float %y) {
  %m = fmul float %x, 7.000000e+00
  %r = fsub float %y, %m
  ret float %r
}

define float @fdiv_c0(float %x, float %y) {
  %m = fdiv float 7.000000e+00, %x
  %r = fsub float %y, %m
  ret float %r
}

define float @fdiv_c1(float %x, float %y) {
  %m = fdiv float %x, 7.000000e+00
  %r = fsub float %y, %m
  ret float %r
}

We do have this transform in the backend though (pushing the implicit negation op into the constant operand). For example, if you codegen this with llc for x86, you'll see 3 "addss" instructions and no "subss". Would adding those canonicalizations to IR solve your motivating case(s)?
Or is there a larger example of a change resulting from this patch?

Thanks for your comment! I don’t have a larger example that results from this patch and adding the canonicalization that you mentioned will solve my motivation case. I can implement that and submit as a separate patch if that’s applicable for you. Could you please tell me the right place to add the canonicalization to?

Getting back to the current patch – I believe that it is also worth to be merged – at least, in terms of refactoring (having code that is intended to do the same thing in two places looks weird IMHO).

spatel accepted this revision.Jan 17 2022, 6:31 AM

I can't tell if this has a greater effect than shown by the test diff. That diff actually demonstrates a basic missing canonicalization in IR - no fast-math is needed to convert fsub to fadd in that example or any of these more general cases:

define float @fmul_c1(float %x, float %y) {
  %m = fmul float %x, 7.000000e+00
  %r = fsub float %y, %m
  ret float %r
}

define float @fdiv_c0(float %x, float %y) {
  %m = fdiv float 7.000000e+00, %x
  %r = fsub float %y, %m
  ret float %r
}

define float @fdiv_c1(float %x, float %y) {
  %m = fdiv float %x, 7.000000e+00
  %r = fsub float %y, %m
  ret float %r
}

We do have this transform in the backend though (pushing the implicit negation op into the constant operand). For example, if you codegen this with llc for x86, you'll see 3 "addss" instructions and no "subss". Would adding those canonicalizations to IR solve your motivating case(s)?
Or is there a larger example of a change resulting from this patch?

Thanks for your comment! I don’t have a larger example that results from this patch and adding the canonicalization that you mentioned will solve my motivation case. I can implement that and submit as a separate patch if that’s applicable for you. Could you please tell me the right place to add the canonicalization to?

I think it would start from InstCombinerImpl::visitFSub() and be very similar to the specialization that already exists as foldFNegIntoConstant. I already drafted a patch for it while looking at this, so I can clean that up and post it (the harder part is adding a pile of tests to check FMF propagation!).

Getting back to the current patch – I believe that it is also worth to be merged – at least, in terms of refactoring (having code that is intended to do the same thing in two places looks weird IMHO).

I agree. I'm still not sure exactly what happens within FAddCombine, but this patch just removes code, so LGTM.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
540

typo: supper -> super

This revision is now accepted and ready to land.Jan 17 2022, 6:31 AM
spatel added inline comments.Jan 17 2022, 6:38 AM
llvm/test/Transforms/Reassociate/fast-basictest.ll
2

I missed that this test is under Reassociate. It's generally wrong for a regression test to run 3 different passes, but it can be a separate cleanup patch.

kovdan01 updated this revision to Diff 400582.Jan 17 2022, 9:28 AM

I already drafted a patch for it while looking at this, so I can clean that up and post it (the harder part is adding a pile of tests to check FMF propagation!).

OK, please let me know if you need any help with that. If I understood you correctly, you plan to submit the patch by yourself. If so, could you please mention me in revision to keep me informed? Thanks!

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
540

Fixed

llvm/test/Transforms/Reassociate/fast-basictest.ll
2

Got it, will submit a separate patch to fix this

This revision was landed with ongoing or failed builds.Jan 18 2022, 5:01 AM
This revision was automatically updated to reflect the committed changes.

I already drafted a patch for it while looking at this, so I can clean that up and post it (the harder part is adding a pile of tests to check FMF propagation!).

OK, please let me know if you need any help with that. If I understood you correctly, you plan to submit the patch by yourself. If so, could you please mention me in revision to keep me informed? Thanks!

Looking over the regression tests, I noticed that we would miss a factoring optimization if we canonicalize fsub->fadd more often. This would potentially be inverting a fold done by the reassociation pass:

/// Recursively analyze an expression to build a list of instructions that have
/// negative floating-point constant operands. The caller can then transform
/// the list to create positive constants for better reassociation and CSE.

...so I'm not sure if it is worth doing that transform generally.