This is an archive of the discontinued LLVM Phabricator instance.

Add CallSiteSplitting pass
ClosedPublic

Authored by junbuml on Oct 20 2017, 12:31 PM.

Details

Summary

This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :

  1. If a call site is dominated by an OR condition and if any of its arguments

are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.

Split from :

if (!ptr || c)
  callee(ptr);

to :

if (!ptr)
  callee(null ptr)  // set the known constant value
else if (c)
  callee(nonnull ptr)  // set non-null attribute in the argument
  1. We can also split a call-site based on constant incoming values of a PHI

For example,
from :

BB0:
 %c = icmp eq i32 %i1, %i2
 br i1 %c, label %BB2, label %BB1
BB1:
 br label %BB2
BB2:
 %p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
 call void @bar(i32 %p)

to

BB0:
 %c = icmp eq i32 %i1, %i2
 br i1 %c, label %BB2-split0, label %BB1
BB1:
 br label %BB2-split1
BB2-split0:
 call void @bar(i32 0)
 br label %BB2
BB2-split1:
 call void @bar(i32 1)
 br label %BB2
BB2:
 %p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]

Enabled this for O3 and LTO. I didn't see any significant code size increase in my spec2000/2006/2017 tests on aarch64. Observed 20% performance gain in spec2017/gcc without regressions in other benchmarks.

I've added only two simple tests to demonstrate this pass, but I will add more
tests covering what this patch doing if the overall approach is reasonable.

Diff Detail

Event Timeline

junbuml created this revision.Oct 20 2017, 12:31 PM
mcrosier edited edge metadata.Oct 23 2017, 7:23 AM

Graham,
This is the inlining work we discussed at the dev meeting. Would you mind adding the Manchester developer that work on the related patch internally?

davidxl added inline comments.Oct 23 2017, 9:31 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
485

should this be moved after CFG simplification pass ? The latter may undo the splitting.

lib/Transforms/Scalar/CallSiteSplitting.cpp
170

Document this method -- describe parameters etc.

322

move the check to the caller.

327

move the check to caller.

366

if splitCallSite returns false, how can CallInst1 be non-null?

376

nit: shorten the name to just 'doCallSiteSplitting'

378

use range for loop: for (BasicBlock &BB: F)

380

use range loop.

383

Skip intrinsicInst. Or if there are folding opportunities, add test cases to cover it.

test/Transforms/CallSiteSplitting/callsite-split.ll
2

add test for new PM.

junbuml added inline comments.Oct 23 2017, 1:21 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
485

This patch will split a call-site into two call-sites with different arguments. So, I doubt if SimplifyCFG can merge it back. Moreover, if SimpliyCFG cannot merge this back, then don't you think we need to somehow merge the call-site split in this pass back to original in later pass in case when not inlined?

Regarding the pass order, I think we need to place this pass even before constant propagation to allow more propagation.

junbuml updated this revision to Diff 120318.Oct 25 2017, 2:39 PM
junbuml marked 4 inline comments as done.

Move this pass before IPSCCP and addressed David's comments.

junbuml marked an inline comment as done.Oct 25 2017, 2:39 PM
junbuml added inline comments.
lib/Transforms/Scalar/CallSiteSplitting.cpp
366

CallInst1 or CallInst2 could be created when detecting constrained arguments from an OR condition in createCallSitesOnOrPredicatedArgument(), but SplitBlockPredecessors() not always guarantee splitting predecessors. In case where we cannot split, we should clean up CallInst1 or CallInst2 if created already.

380

I don't think we can use range because we modify the IR (remove the call instruction and add blocks) over the iterations.

383

Added check for IntrinsicInst and InstructionTriviallyDead. Can you give me little more detail about folding opportunities ?

fhahn added a subscriber: fhahn.Oct 25 2017, 2:42 PM

I'll have a closer look later.

For now, CC @sdesmalen , he worked on a similar pass.

mcrosier added inline comments.Oct 26 2017, 7:22 AM
lib/Transforms/Scalar/CallSiteSplitting.cpp
26

I believe this should be:

//  if (!ptr)
//    callee(null)         // set the known constant value
//  else if (c)
//    callee(nonnull ptr)  // set non-null attribute in the argument
114
assert(PBI->isConditional() && "Expected predecessor block to end in a conditional branch.");
276

You might consider skipping Constant arguments before checking for the NonNull attribute.

303

Improve indent:

if (match(Cond, m_ICmp(Pred, m_Value(), m_Constant())))
  return;
ICmpInst *Cmp = cast<ICmpInst>(Cond);
if (isCondRelevantToAnyCallArgument(Cmp, CS))
  if (Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE)
    BranchInsts.push_back(PBI);
315
BasicBlock *Parent = Instr->getParent();
if (Instr != (Parent->getFirstNonPHI()))
  return false;

for (BasicBlock::iterator BI = Parent->begin(), BE = Parent->end();
426

I think you can just use a reference to BB:

BasicBlock &BB = *BI++;
for (BasicBlock::iterator II = BB.begin(), IE = BB.end(); II != IE;) {
480

You can eliminate the Changed boolean by:

if (!doCallSiteSplitting(F, TLI))
  return PreservedAnalyses::all();

I'll have a closer look later.

For now, CC @sdesmalen , he worked on a similar pass.

Thanks, Florian.

junbuml updated this revision to Diff 120420.Oct 26 2017, 7:57 AM
junbuml marked 6 inline comments as done.

Addressed Chad's comments.
Thanks for the review.

fhahn added inline comments.Oct 26 2017, 9:08 AM
lib/Passes/PassBuilder.cpp
553

Is there any reason you use Level == 3 here and Level > 2 in lib/Transforms/IPO/PassManagerBuilder.cpp?

lib/Transforms/Scalar/CallSiteSplitting.cpp
78

ConstValue seems unused?

83

A range-based-for loop might make things easier here.

116

Would it make sense to add an assertion making sure PBI->getCondition() is an ICmpInst? I assume it could be a FCmpInst as well, if the caller passes a wrong BranchInsts. Same forthe cast to Constant later on.

117

I think Op0 and Op1 could do with more expressive names. Op0 is the argument that gets replaced by the constant Op1, right?

145

You could use continue here to get rid of the else { and reduce the nest in the loop.

245

range based for loop?

252

We have the same loop at 3 points now. Would it make sense to add a helper function for that?

297

just !HeaderBB?

308

maybe have this earlier as a cheap check before we do the more expensive checks?

317

Unneeded braces

320

range based for loop?

353

!Header?

398

Unneeded braces?

fhahn added inline comments.Oct 26 2017, 9:30 AM
test/Transforms/CallSiteSplitting/callsite-split.ll
2

Do we really need to run -inline -instcombine and -jump-threading? Wouldn't it be enough to just declare your callee functions and then just check if it has been split properly and the new call sites have the correct arguments?

Also, it might be a good idea to add a few negative tests, e.g. conditions with non-constants and so on

junbuml added inline comments.Oct 26 2017, 9:37 AM
test/Transforms/CallSiteSplitting/callsite-split.ll
2

I totally agree with this. I will add tests simply shows how we split and many negative cases as well.
The intention of this test was to demonstrate the motivation case we observed.

junbuml updated this revision to Diff 120456.Oct 26 2017, 11:20 AM
junbuml marked 13 inline comments as done.

Fixed Florian's comments. I will add more tests soon.
Thanks for the review.

lib/Passes/PassBuilder.cpp
553

Level in lib/Passes/PassBuilder.cpp is an enum defined :
enum OptimizationLevel {

  O0,
  O1,
  O2,
  O3,
  Os,
  Oz
};

while, OptLevel in lib/Transforms/IPO/PassManagerBuilder.cpp is defined :

/// The Optimization Level - Specify the basic optimization level.
///    0 = -O0, 1 = -O1, 2 = -O2, 3 = -O3
unsigned OptLevel;
lib/Transforms/Scalar/CallSiteSplitting.cpp
252

Changed to a range-based loop, but didn't add helper function because the purpose of each loop slightly different: first one adds an attribute, and the second one set argument value for one call-site, and the third one set arguments for two call-sites. IMHO, iterating for each case doesn't make a big confusion.

308

Agree.

junbuml edited the summary of this revision. (Show Details)Oct 26 2017, 11:26 AM
junbuml updated this revision to Diff 120634.Oct 27 2017, 10:23 AM

Added a test file to show simple cases covered by this pass.

Hi David,
Currently I placed this pass before IPSCCP, so ran performance tests (spec2000/2006/2017/llvm-test-suite) on aarch64. No performance / size regression was observed. Still keep 20% performance gain in spec2017/gcc. Please let me know if you are okay with this pass order.

Thanks,
Jun

davidxl added inline comments.Oct 30 2017, 3:50 PM
lib/Transforms/Scalar/CallSiteSplitting.cpp
104

Takend --> Taken

409

Assert CallIInst1 and CallInst2 are null here before return false.

411

It seems better to not to speculatively clone the call instructions and delete them later when not needed. The analysis function should instead return 'clone' records indicating how to do the cloning during the transformation phase.

junbuml updated this revision to Diff 121000.Oct 31 2017, 9:44 AM
junbuml marked 2 inline comments as done.

Addressed David's comments. Please let me know any comment.
Thanks for the review.

lib/Transforms/Scalar/CallSiteSplitting.cpp
411

Added canSplitCallSite() and checked if the call-site can be split first. So, we do not speculatively clone call instructions. The new call-sites cloned should be split in splitCallSite() without any trouble when splitting preds (added assert for this). Now, I changed splitCallSite() to return void.

junbuml updated this revision to Diff 121353.Nov 2 2017, 1:21 PM

Rebased and added debug messages.

davidxl accepted this revision.Nov 2 2017, 1:49 PM

lgtm

This revision is now accepted and ready to land.Nov 2 2017, 1:49 PM
fhahn added a comment.Nov 2 2017, 3:09 PM

Great, thanks! lgtm too.

This revision was automatically updated to reflect the committed changes.

Thanks for the review.

thakis added a subscriber: thakis.Mar 29 2020, 10:31 AM
thakis added inline comments.
llvm/trunk/lib/Transforms/Scalar/CallSiteSplitting.cpp
370 ↗(On Diff #121523)

Was this break supposed to be in the if above? As-is, it unconditionally breaks after the first iteration of the loop.

But even if it was a line further up, we'd only check the first Phi in the parent and check if it's an arg. I don't know if that's enough.

fhahn added inline comments.Mar 30 2020, 1:51 PM
llvm/trunk/lib/Transforms/Scalar/CallSiteSplitting.cpp
370 ↗(On Diff #121523)

It looks like the code indeed bails out too early. I think we should check all PHIs. I've put up D77089. It should not matter if there are unrelated PHIs in between, but maybe the goal was to be extremely conservative @junbuml ?