This is an archive of the discontinued LLVM Phabricator instance.

[simplifycfg] Handle 3 sequential branches while the first two can infer the third one.
Needs ReviewPublic

Authored by yinyuefengyi on Nov 22 2018, 12:52 AM.

Details

Summary

e.g. if a>=b && a!=b, then we can infer a>b. this patch will combine the
a>=b and a!=b into a new compare instruction a>b first, then do the next
level implication calculation by reusing the existing isImpliedCondition
function.

Diff Detail

Event Timeline

yinyuefengyi created this revision.Nov 22 2018, 12:52 AM
yinyuefengyi retitled this revision from Handle 3 sequential branches while the first two can infer the third one. to [simplifycfg] Handle 3 sequential branches while the first two can infer the third one..Nov 22 2018, 12:58 AM
lebedev.ri added inline comments.
llvm/lib/IR/Instructions.cpp
3401–3402 ↗(On Diff #175016)

This should already exist elsewhere, i think
https://godbolt.org/z/efgnhT

yinyuefengyi added a subscriber: yinyuefengyi.

use existing logic to implement 'AND' operation of two compare instruction.

getICmpCode in InstCombine already done this job, reuse this logic instead of adding redundant function.
add most test cases to cover.

Please always upload all patches with full context (-U99999)

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5953–5955

And this too might already exist..
Have you seen [[ https://github.com/llvm-mirror/llvm/blob/5e3c9a56cb080c7a5131d6063d04d1d4a22044d0/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | InstCombiner::foldAndOfICmps() ]] ?

yinyuefengyi marked an inline comment as done.Nov 25 2018, 11:59 PM
yinyuefengyi added inline comments.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5953–5955

Thanks for the comments.
I've read the InstCombiner::foldAndOfIcmps(), it is a local member functions with a lot of other member functions called inside, seems not easy to call it directly from class SimplifyCFGOpt, do you have any suggestions about this?

lebedev.ri added inline comments.Nov 26 2018, 12:04 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5953–5955

Perhaps it can be promoted to not be private? (cc @spatel)
https://github.com/llvm-mirror/llvm/blob/5e3c9a56cb080c7a5131d6063d04d1d4a22044d0/lib/Transforms/InstCombine/InstCombineInternal.h#L437-L581
Duplicating the exact same logic is probably not the best answer here..

yinyuefengyi updated this revision to Diff 175202.EditedNov 26 2018, 12:38 AM
yinyuefengyi marked an inline comment as not done.

upload full diff with -U99999 for other's review comments.

lebedev.ri suggested to use InstCombiner::foldAndOfICmps() instead of duplicate logic, this requires function change from private to public, also a new instance of InstComber is needed in SimplifyCFGOpt, pending for discussions.
Thanks.

spatel added inline comments.Nov 26 2018, 7:39 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5953–5955

I think the way to share the code is to move it from InstCombine to CmpInstAnalysis.cpp. That's how CmpInstAnalysis was created in the first place.

I'm not sure if would avoid that work, but have you looked at trying to simplify the icmp in CorrelatedValuePropagation instead of here in SimplifyCFG?

jsji added a subscriber: jsji.Nov 26 2018, 12:47 PM
yinyuefengyi added inline comments.Nov 27 2018, 12:32 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5953–5955

Thanks for the reply.
This simplifyCFG patch is cross basic block optimization that need back access 3 block's prediction instruction, no dependency of other passes. But the CorrelatedValuePropagation can only do instruction level optimization IN BLOCK based on LazyValueInfo, like infer the condition "x < Const" to be true or false, quite different with this patch's target.

Also, I've investigated about moving InstCombiner::foldAndOfICmps() to other file like CmpInstAnalysis.cpp, more than 1000+ LOC changes required as this function calls 10+ member functions like foldAndOrOfICmpsOfAndWithPow2(), simplifyRangeCheck(), insertRangeTest() and static functions getNewICmpValue(), foldLogOpOfMaskedICmps,foldLogOpOfMaskedICmps(), foldAndOrOfEqualityCmpsWithConstants(), foldSignedTruncationCheck(). In addition, the variable IRBuilder<TargetFolder, IRBuilderCallbackInserter> is different from outside need change.
Seems this HUGE refactor should be done in another patch if necessary, after the foldAndOfICmps() is moved to accessible place, this SimplifyCFGOpt patch can be tidy.

spatel added inline comments.Nov 27 2018, 6:34 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5953–5955

Thanks for checking on CVP. I haven't looked into that pass very much, but if you can simplify the icmp to true/false within that pass, then doesn't that allow dead code elimination/instcombine/simplifycfg to later kill all of the unused instructions?

I agree that any refactoring of the code from instcombine should be done in separate patches from this one. I also agree that it's a big mess of code. Maybe we can break that job up to allow progress for this patch?

As a preliminary step, you can commit the test cases from this patch to trunk with the current output. For that, please use FileCheck rather than 'grep' and use utils/update_test_checks.py to generate the assertions. There are many examples of test files that use that script, but feel free to ask questions if it's not clear how to use it. I also ask that you give each test a comment, so we can tell what the differences are between the tests.

nikic added a subscriber: nikic.Nov 27 2018, 7:56 AM
yinyuefengyi added inline comments.Nov 28 2018, 12:23 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5953–5955

Thanks, spatel. The test case is already split and uploaded to https://reviews.llvm.org/D54994.

The problem is this type of icmps ( a>=b && a!=b => a>b) can't be simplified, they can only be folded by the function InstCombiner::foldAndOfICmps().

I just come up with another idea that whether can this be implemented in this way of combination optimization:
opt < %s -simplifycfg -instcombine -simplifycfg -S

the first round of 'simplifycfg' will do the condition check , if BCmpPred equals to (ACmpPred && BCmpPred), do nothing(means it may be folded, to avoid endless loop), else replace the BCmpPred with the AND instruction, this AND instruction can be folded to BCmpPred2(i.e. a>b) by the next 'instcombine' pass in foldAndOfICmps(). then the second round of 'simplifycfg' can do the implication check with BCmpPred2 && CCmpPred, to merge Block 3 or eliminated it.

I tried this experiment on my code base, it can work. But I am not sure this kind of pass combination introduces PASS DEPENDENCY will cause side effect or break the LLVM design rule. If this implementation is also not acceptable, then possibly the only way is refactoring the function foldAndOfICmps() from instcombine. Fundamentally, a shared foldAndOfICmps() can achieve best benefits.

spatel added a comment.Dec 2 2018, 9:26 AM

I've been looking at this:
rL347868
rL347896
rL348088
...and I'm still not exactly sure how we should fix it, but I'm confident that what this patch is proposing is not the right answer. We don't want to create a dummy instruction only to analyze and delete it.
My best guess is that we can handle this case using a small enhancement to ValueTracking plus a minimal change to the callers in InstCombine and/or SimplifyCFG. These are hacks vs. a proper solution that uses a dominator tree and a more significant fix to CVP, but it's what we already do in both InstCombine and SimplifyCFG, so there's really no additional cost for optimizing simple cases like this.