Page MenuHomePhabricator

Add CallSiteSplitting pass
ClosedPublic

Authored by junbuml on Oct 20 2017, 12:31 PM.

Details

Summary

This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :

  1. If a call site is dominated by an OR condition and if any of its arguments

are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.

Split from :

if (!ptr || c)
  callee(ptr);

to :

if (!ptr)
  callee(null ptr)  // set the known constant value
else if (c)
  callee(nonnull ptr)  // set non-null attribute in the argument
  1. We can also split a call-site based on constant incoming values of a PHI

For example,
from :

BB0:
 %c = icmp eq i32 %i1, %i2
 br i1 %c, label %BB2, label %BB1
BB1:
 br label %BB2
BB2:
 %p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
 call void @bar(i32 %p)

to

BB0:
 %c = icmp eq i32 %i1, %i2
 br i1 %c, label %BB2-split0, label %BB1
BB1:
 br label %BB2-split1
BB2-split0:
 call void @bar(i32 0)
 br label %BB2
BB2-split1:
 call void @bar(i32 1)
 br label %BB2
BB2:
 %p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]

Enabled this for O3 and LTO. I didn't see any significant code size increase in my spec2000/2006/2017 tests on aarch64. Observed 20% performance gain in spec2017/gcc without regressions in other benchmarks.

I've added only two simple tests to demonstrate this pass, but I will add more
tests covering what this patch doing if the overall approach is reasonable.

Diff Detail

Repository
rL LLVM

Event Timeline

junbuml created this revision.Oct 20 2017, 12:31 PM
mcrosier edited edge metadata.Oct 23 2017, 7:23 AM

Graham,
This is the inlining work we discussed at the dev meeting. Would you mind adding the Manchester developer that work on the related patch internally?

davidxl added inline comments.Oct 23 2017, 9:31 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
473 ↗(On Diff #119685)

should this be moved after CFG simplification pass ? The latter may undo the splitting.

lib/Transforms/Scalar/CallSiteSplitting.cpp
169 ↗(On Diff #119685)

Document this method -- describe parameters etc.

321 ↗(On Diff #119685)

move the check to the caller.

326 ↗(On Diff #119685)

move the check to caller.

365 ↗(On Diff #119685)

if splitCallSite returns false, how can CallInst1 be non-null?

375 ↗(On Diff #119685)

nit: shorten the name to just 'doCallSiteSplitting'

377 ↗(On Diff #119685)

use range for loop: for (BasicBlock &BB: F)

379 ↗(On Diff #119685)

use range loop.

382 ↗(On Diff #119685)

Skip intrinsicInst. Or if there are folding opportunities, add test cases to cover it.

test/Transforms/CallSiteSplitting/callsite-split.ll
1 ↗(On Diff #119685)

add test for new PM.

junbuml added inline comments.Oct 23 2017, 1:21 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
473 ↗(On Diff #119685)

This patch will split a call-site into two call-sites with different arguments. So, I doubt if SimplifyCFG can merge it back. Moreover, if SimpliyCFG cannot merge this back, then don't you think we need to somehow merge the call-site split in this pass back to original in later pass in case when not inlined?

Regarding the pass order, I think we need to place this pass even before constant propagation to allow more propagation.

junbuml updated this revision to Diff 120318.Oct 25 2017, 2:39 PM
junbuml marked 4 inline comments as done.

Move this pass before IPSCCP and addressed David's comments.

junbuml marked an inline comment as done.Oct 25 2017, 2:39 PM
junbuml added inline comments.
lib/Transforms/Scalar/CallSiteSplitting.cpp
365 ↗(On Diff #119685)

CallInst1 or CallInst2 could be created when detecting constrained arguments from an OR condition in createCallSitesOnOrPredicatedArgument(), but SplitBlockPredecessors() not always guarantee splitting predecessors. In case where we cannot split, we should clean up CallInst1 or CallInst2 if created already.

379 ↗(On Diff #119685)

I don't think we can use range because we modify the IR (remove the call instruction and add blocks) over the iterations.

382 ↗(On Diff #119685)

Added check for IntrinsicInst and InstructionTriviallyDead. Can you give me little more detail about folding opportunities ?

fhahn added a subscriber: fhahn.Oct 25 2017, 2:42 PM

I'll have a closer look later.

For now, CC @sdesmalen , he worked on a similar pass.

mcrosier added inline comments.Oct 26 2017, 7:22 AM
lib/Transforms/Scalar/CallSiteSplitting.cpp
25 ↗(On Diff #120318)

I believe this should be:

//  if (!ptr)
//    callee(null)         // set the known constant value
//  else if (c)
//    callee(nonnull ptr)  // set non-null attribute in the argument
113 ↗(On Diff #120318)
assert(PBI->isConditional() && "Expected predecessor block to end in a conditional branch.");
275 ↗(On Diff #120318)

You might consider skipping Constant arguments before checking for the NonNull attribute.

302 ↗(On Diff #120318)

Improve indent:

if (match(Cond, m_ICmp(Pred, m_Value(), m_Constant())))
  return;
ICmpInst *Cmp = cast<ICmpInst>(Cond);
if (isCondRelevantToAnyCallArgument(Cmp, CS))
  if (Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE)
    BranchInsts.push_back(PBI);
314 ↗(On Diff #120318)
BasicBlock *Parent = Instr->getParent();
if (Instr != (Parent->getFirstNonPHI()))
  return false;

for (BasicBlock::iterator BI = Parent->begin(), BE = Parent->end();
425 ↗(On Diff #120318)

I think you can just use a reference to BB:

BasicBlock &BB = *BI++;
for (BasicBlock::iterator II = BB.begin(), IE = BB.end(); II != IE;) {
479 ↗(On Diff #120318)

You can eliminate the Changed boolean by:

if (!doCallSiteSplitting(F, TLI))
  return PreservedAnalyses::all();

I'll have a closer look later.

For now, CC @sdesmalen , he worked on a similar pass.

Thanks, Florian.

junbuml updated this revision to Diff 120420.Oct 26 2017, 7:57 AM
junbuml marked 6 inline comments as done.

Addressed Chad's comments.
Thanks for the review.

fhahn added inline comments.Oct 26 2017, 9:08 AM
lib/Passes/PassBuilder.cpp
546 ↗(On Diff #120318)

Is there any reason you use Level == 3 here and Level > 2 in lib/Transforms/IPO/PassManagerBuilder.cpp?

lib/Transforms/Scalar/CallSiteSplitting.cpp
115 ↗(On Diff #120420)

Would it make sense to add an assertion making sure PBI->getCondition() is an ICmpInst? I assume it could be a FCmpInst as well, if the caller passes a wrong BranchInsts. Same forthe cast to Constant later on.

116 ↗(On Diff #120420)

I think Op0 and Op1 could do with more expressive names. Op0 is the argument that gets replaced by the constant Op1, right?

144 ↗(On Diff #120420)

You could use continue here to get rid of the else { and reduce the nest in the loop.

244 ↗(On Diff #120420)

range based for loop?

251 ↗(On Diff #120420)

We have the same loop at 3 points now. Would it make sense to add a helper function for that?

296 ↗(On Diff #120420)

just !HeaderBB?

307 ↗(On Diff #120420)

maybe have this earlier as a cheap check before we do the more expensive checks?

316 ↗(On Diff #120420)

Unneeded braces

319 ↗(On Diff #120420)

range based for loop?

352 ↗(On Diff #120420)

!Header?

397 ↗(On Diff #120420)

Unneeded braces?

77 ↗(On Diff #120318)

ConstValue seems unused?

82 ↗(On Diff #120318)

A range-based-for loop might make things easier here.

fhahn added inline comments.Oct 26 2017, 9:30 AM
test/Transforms/CallSiteSplitting/callsite-split.ll
1 ↗(On Diff #120420)

Do we really need to run -inline -instcombine and -jump-threading? Wouldn't it be enough to just declare your callee functions and then just check if it has been split properly and the new call sites have the correct arguments?

Also, it might be a good idea to add a few negative tests, e.g. conditions with non-constants and so on

junbuml added inline comments.Oct 26 2017, 9:37 AM
test/Transforms/CallSiteSplitting/callsite-split.ll
1 ↗(On Diff #120420)

I totally agree with this. I will add tests simply shows how we split and many negative cases as well.
The intention of this test was to demonstrate the motivation case we observed.

junbuml updated this revision to Diff 120456.Oct 26 2017, 11:20 AM
junbuml marked 13 inline comments as done.

Fixed Florian's comments. I will add more tests soon.
Thanks for the review.

lib/Passes/PassBuilder.cpp
546 ↗(On Diff #120318)

Level in lib/Passes/PassBuilder.cpp is an enum defined :
enum OptimizationLevel {

  O0,
  O1,
  O2,
  O3,
  Os,
  Oz
};

while, OptLevel in lib/Transforms/IPO/PassManagerBuilder.cpp is defined :

/// The Optimization Level - Specify the basic optimization level.
///    0 = -O0, 1 = -O1, 2 = -O2, 3 = -O3
unsigned OptLevel;
lib/Transforms/Scalar/CallSiteSplitting.cpp
251 ↗(On Diff #120420)

Changed to a range-based loop, but didn't add helper function because the purpose of each loop slightly different: first one adds an attribute, and the second one set argument value for one call-site, and the third one set arguments for two call-sites. IMHO, iterating for each case doesn't make a big confusion.

307 ↗(On Diff #120420)

Agree.

junbuml edited the summary of this revision. (Show Details)Oct 26 2017, 11:26 AM
junbuml updated this revision to Diff 120634.Oct 27 2017, 10:23 AM

Added a test file to show simple cases covered by this pass.

Hi David,
Currently I placed this pass before IPSCCP, so ran performance tests (spec2000/2006/2017/llvm-test-suite) on aarch64. No performance / size regression was observed. Still keep 20% performance gain in spec2017/gcc. Please let me know if you are okay with this pass order.

Thanks,
Jun

davidxl added inline comments.Oct 30 2017, 3:50 PM
lib/Transforms/Scalar/CallSiteSplitting.cpp
103 ↗(On Diff #120634)

Takend --> Taken

408 ↗(On Diff #120634)

Assert CallIInst1 and CallInst2 are null here before return false.

410 ↗(On Diff #120634)

It seems better to not to speculatively clone the call instructions and delete them later when not needed. The analysis function should instead return 'clone' records indicating how to do the cloning during the transformation phase.

junbuml updated this revision to Diff 121000.Oct 31 2017, 9:44 AM
junbuml marked 2 inline comments as done.

Addressed David's comments. Please let me know any comment.
Thanks for the review.

lib/Transforms/Scalar/CallSiteSplitting.cpp
410 ↗(On Diff #120634)

Added canSplitCallSite() and checked if the call-site can be split first. So, we do not speculatively clone call instructions. The new call-sites cloned should be split in splitCallSite() without any trouble when splitting preds (added assert for this). Now, I changed splitCallSite() to return void.

junbuml updated this revision to Diff 121353.Nov 2 2017, 1:21 PM

Rebased and added debug messages.

davidxl accepted this revision.Nov 2 2017, 1:49 PM

lgtm

This revision is now accepted and ready to land.Nov 2 2017, 1:49 PM
fhahn added a comment.Nov 2 2017, 3:09 PM

Great, thanks! lgtm too.

This revision was automatically updated to reflect the committed changes.

Thanks for the review.