This is an archive of the discontinued LLVM Phabricator instance.

[InstCombineAddSub opportunities]: More opportunities to factorize FAdd/FSub when unsafeAlgebra is present for Inst
Needs ReviewPublic

Authored by aditya_nandakumar on Aug 17 2015, 7:45 PM.

Details

Reviewers
llvm-commits
Summary

Canonicalizing negative constants out of expressions causes significant instruction count increase (not all of which is cleaned up by instcombine due to ordering of expressions)

The following patch tries to factorize more combinations of FAdd/FSub instructions when unsafeAlgebra is set. This reduces the regression in codegen instructions from the above patch.

Some examples -
Here op* is restricted to FAdd/FSub.
-> Transform (A op1 B) op2 C -> A op3 (B op4 C) if (B op4 C) factorizes
Eg. (A + X*C1) + X * C2 -> A + X* (C1 + C2)
-> Transform A op1 (B op2 C) -> (A op3 B) op4 C) if (A op3 B) factorizes
Eg. (A + X*C1) - X*C2 -> A + X *(C1-C2)
-> Transform ( A op1 B) op2 C -> (A op3 C) op4 B if (A op3 C) factorizes
Eg. (X * C1 - B) + X * C2 -> X * (C1 - C2) - B
-> Transform A op1 (B op2 C) -> (A op3 C) op4 B
Eg. X * C1 - (B + X * C2) -> X * (C1 - C2) - B

I'll add test cases post feedback.

Diff Detail

Event Timeline

aditya_nandakumar retitled this revision from to [InstCombineAddSub opportunities]: More opportunities to factorize FAdd/FSub when unsafeAlgebra is present for Inst.
aditya_nandakumar updated this object.
aditya_nandakumar added a reviewer: llvm-commits.
aditya_nandakumar set the repository for this revision to rL LLVM.
aditya_nandakumar added a subscriber: llvm-commits.

Please reupload with additional context in the diff.

aditya_nandakumar removed rL LLVM as the repository for this revision.

Updated diff with -U99999

Updated diff with test cases that cover all the 16 possible combinations of optimizations.

Please format your change with clang-format.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
597–598

Just use cast<Instruction>, it will assert for you.

Ran it through clang-format.

Use cast instead of dyn_cast + assert.

aditya_nandakumar marked an inline comment as done.

Updated comments.
Moved the checking of operands directly to if statement instead of assigning to bool and checking.

Removed unnecessary {}

Out of curiosity, why doesn't the reassociate pass catch your test cases?

lib/Transforms/InstCombine/InstCombineAddSub.cpp
451

Please use hasOneUse instead.

466–467

Elsewhere, you use A, B and C instead of a, b, and c. Please be consistent.

573–577

Please be consistent where you have comments.

772

This is commented out.

Updated to use hasOneUse
Consistent comments.

The IR I see in the test case is a simplified version of running reassociate followed by instcombine - the original IR contained lots of -ve floating point constants which reassociate now canonicalizes further expecting instcombine to cleanup. Instcombine is unable to simplify possibly due how reassociate arranges the expressions (my guess).

aditya_nandakumar marked 3 inline comments as done.Aug 21 2015, 2:33 PM

Is there anything else that needs to be done here?

Thanks
Aditya

I am not convinced as to why we need this in InstCombine. This seems inside the purview of Reassociate.

I ran a quarter of your test cases under reassociate twice and it seems to optimize it just fine. There seems to be a bug in reassociate that requires us running it twice, I think that is the actual bug.

test/Transforms/InstCombine/FAddFSubAssociativeFactorize.ll
12–19

Two runs of the reassociate pass achieve the desired result.

25–32

Two runs of reassociate get this case:

%tmp2 = fmul fast half %a, 0xH4500
%tmp6 = fmul fast half %b, 0xH3C00 ; <- this is just a multiply by 1.
%tmp5 = fadd fast half %tmp6, %tmp2
38–45

Likewise, two runs of reassociate get this case.

67–74

Two runs also get this case.

I'll try taking a look at reassociate for the same test case. I hope that would also fix the regression I was originally looking at.

Also out of general curiosity, shouldn't instcombine be capable of handling these patterns?(if reassociate is not run before instcombine in a pass pipeline).

Also out of general curiosity, shouldn't instcombine be capable of handling these patterns?(if reassociate is not run before instcombine in a pass pipeline).

IMO, we should endeavor to make InstCombine expect output canonicalized by reassociate (where possible/reasonable) and tell users to feed reassociate into InstCombine.