This is an archive of the discontinued LLVM Phabricator instance.


Differential D47949

[callsitesplit] Limit the # of predecessors walk when recording condition
Needs Revision
Public

Authored by trentxintong on Jun 8 2018, 9:07 AM.


Details

Reviewers

fhahn
junbuml
davide

Summary

We have some pathological cases (generated by machine) which
callsite splitting chokes on.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 19099
Build 19099: arc lint + arc unit

Event Timeline

trentxintong created this revision. Jun 8 2018, 9:07 AM

Harbormaster completed remote builds in B19099: Diff 150527. Jun 8 2018, 9:07 AM

Thanks for the patch! Could you add a test case for this too? It can be very simple: just set callsite-predecessor-walk-threshold=1 and have a usable condition at the 2nd predecessor.

Will do.

It seems like this is trying to hide some algorithmic problem in the pass.
I think we should try to fix the pass instead. @trentxintong can you provide such cases?

The reason why I say this is that we sprinkled cutoffs over the optimizer and some of them have bitten us back. Given that CallSiteSplitting is a relatively new pass (and a clean/simple one), maybe there's something we can try before giving up.

This revision now requires changes to proceed. Jun 8 2018, 9:34 AM

The pass currently walks back the single predecessors of a call site and records conditions relevant to the call site. There are cases where, in doing so, we end up visiting the same blocks over and over again. It would probably be better to do a single traversal of the CFG and record relevant conditions once. I have to think a bit more about it, but maybe PredicateInfo could be helpful here and allow us to avoid visiting predecessors.
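To make the repeated-visit concern concrete, here is a minimal sketch on a toy CFG model. It is not the pass's code: `Block`, `CondCache`, `collectNaive`, and `collectCached` are hypothetical names, an `int` stands in for a recorded condition, and the chain of single predecessors is assumed to be acyclic. The naive walk re-traverses the shared chain for every query, while the cached variant touches each block once.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a basic block with at most one single predecessor.
struct Block {
  const Block *SinglePred = nullptr;
  int Cond = 0; // Toy stand-in for the condition recorded at this block.
};

// Hypothetical cache: conditions seen from a block up to the top of its chain.
using CondCache = std::unordered_map<const Block *, std::vector<int>>;

// Naive walk: every query re-traverses the whole chain above Start.
// Visits is incremented once per block touched.
std::vector<int> collectNaive(const Block *Start, std::size_t &Visits) {
  std::vector<int> Conds;
  for (const Block *B = Start; B; B = B->SinglePred) {
    ++Visits;
    Conds.push_back(B->Cond);
  }
  return Conds;
}

// Cached walk: stop at the first block whose result is already known, then
// fill in the cache for the newly seen blocks from the top down.
// Assumes the chain is acyclic.
const std::vector<int> &collectCached(const Block *Start, CondCache &Cache,
                                      std::size_t &Visits) {
  assert(Start && "need a starting block");
  std::vector<const Block *> Pending;
  for (const Block *B = Start; B && !Cache.count(B); B = B->SinglePred) {
    ++Visits;
    Pending.push_back(B);
  }
  for (auto It = Pending.rbegin(); It != Pending.rend(); ++It) {
    const Block *Cur = *It;
    std::vector<int> Conds{Cur->Cond};
    if (Cur->SinglePred) {
      const std::vector<int> &Above = Cache.at(Cur->SinglePred);
      Conds.insert(Conds.end(), Above.begin(), Above.end());
    }
    Cache.emplace(Cur, std::move(Conds));
  }
  return Cache.at(Start);
}
```

The cache here stores a full list per block, which trades memory for simplicity; a real forward-style analysis would presumably share that state rather than copy it.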

In D47949#1126627, @fhahn wrote:

The pass currently walks back the single predecessors of a call site and records conditions relevant to the call site. There are cases where, in doing so, we end up visiting the same blocks over and over again. It would probably be better to do a single traversal of the CFG and record relevant conditions once. I have to think a bit more about it, but maybe PredicateInfo could be helpful here and allow us to avoid visiting predecessors.

Yes, that could be one solution. In general, I don't think that we should visit a block more than a constant number of times to split the call sites.

@davide @fhahn I am sorry that I can't provide the source code. But one of the cases I can see that resulted in walking a long way up the chain came from a sequence of object declarations and initializations (constructors which can throw), which cascaded into a block with 2 predecessors.

Actually, in this case we did not manage to find any condition relevant to the arguments, i.e. the terminator is an InvokeInst.

Before we fix the pass, do you think it's reasonable to land this patch (with test cases provided), with a FIXME on the knob referencing this discussion/review, so that we can remove it after the fundamental problem with the pass is addressed? That would help us and possibly other people in similar situations.

I'll let @junbuml or @fhahn make the call, as they have touched this more than I have, but I think we should see whether it's feasible to fix the algorithmic complexity (or understand why it's harder) before pushing this band-aid.

In D47949#1130490, @trentxintong wrote:

@davide @fhahn I am sorry that I can't provide the source code. But one of the cases I can see that resulted in walking a long way up the chain came from a sequence of object declarations and initializations (constructors which can throw), which cascaded into a block with 2 predecessors.

There is no need for (C/C++) source code. A simple LLVM IR test case in test/Transforms/CallSiteSplitting/ would be good.

Actually, in this case we did not manage to find any condition relevant to the arguments, i.e. the terminator is an InvokeInst.

Before we fix the pass, do you think it's reasonable to land this patch (with test cases provided), with a FIXME on the knob referencing this discussion/review, so that we can remove it after the fundamental problem with the pass is addressed? That would help us and possibly other people in similar situations.

So I had a look at using PredicateInfo for this. It does something very similar, but unfortunately I don't think we can just use it, as it only inserts copies in code paths dominated by the condition, whereas for callsitesplitting the call site is usually not dominated by the conditions; only the predecessors we create when splitting are. However, we should be able to use a similar approach to traverse the CFG just once.

I can look into that over the next few weeks. I am not sure if it is worth putting the fix in until then, but it could be valuable as a temporary fix for some cases.

Sorry for coming back to this just now. Unfortunately I won't have time to turn this into a forward-style analysis before the 7.0 branch. IMO it would be fine to get this in for the 7.0 branch, to limit compile time in edge cases, and to create a ticket to improve the situation. What do you think, @davide? In general, we currently only support splitting a very limited set of call sites, so we would have to limit the forward analysis to the relevant call sites; otherwise we end up doing (much) more work on average.

In any case, we should still get a test case for the change.

@fhahn Thank you for doing this. This is not a blocking issue for us, but it would be nice to have it fixed (for us and possibly other users of LLVM in general). If you and @davide agree we should do this before having a real fix, I can write a test and land this. Otherwise, I am fine waiting for the real fix.

hiraditya added a subscriber: hiraditya. Jul 17 2018, 12:21 PM

@trentxintong I just updated D44627, which stops backtracking once we hit IDom(call site), because conditions further up won't be impacted by splitting. Any chance you could try this patch to see if it improves things on your test case?
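As a rough illustration of stopping the backward walk at a given block, here is a sketch on a toy model. It is not the code from D44627: `Block` and `walkUntil` are hypothetical names, `StopAt` stands in for a precomputed immediate dominator of the call site, and recording the stop block itself is a choice made here for illustration.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a basic block with at most one single predecessor.
struct Block {
  const Block *SinglePred = nullptr;
  int Id = 0;
};

// Walks up the single predecessors of Pred and returns the Ids of the blocks
// whose condition would be recorded, in visit order. The walk ends once the
// block just recorded is StopAt, when the chain ends, or when a block repeats.
std::vector<int> walkUntil(const Block *Pred, const Block *StopAt) {
  assert(Pred && "need a starting block");
  std::vector<int> Recorded{Pred->Id};
  std::unordered_set<const Block *> Visited;
  const Block *From = Pred;
  const Block *To = Pred;
  while (To != StopAt && !Visited.count(From->SinglePred) &&
         (From = From->SinglePred)) {
    Recorded.push_back(From->Id);
    Visited.insert(From);
    To = From;
  }
  return Recorded;
}
```

With a chain B0 <- B1 <- B2 <- B3 <- B4 and `StopAt` set to B2, a walk from B4 records B4, B3, and B2 and never touches B1 or B0, which bounds the work by the distance to the stop block rather than by the length of the chain.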

Thanks for the change @fhahn. I will dig up my test case and run it.

In D47949#1292403, @trentxintong wrote:

Thanks for the change @fhahn. I will dig up my test case and run it.

@trentxintong did you manage to get any numbers for the test case?

Revision Contents

Path: lib/Transforms/Scalar/CallSiteSplitting.cpp
Size: 13 lines

Diff 150527

lib/Transforms/Scalar/CallSiteSplitting.cpp

[... 78 lines not shown ...]
 /// DuplicationThreshold. Those instructions need to be duplicated in all
 /// split blocks.
 static cl::opt<unsigned>
     DuplicationThreshold("callsite-splitting-duplication-threshold", cl::Hidden,
                          cl::desc("Only allow instructions before a call, if "
                                   "their cost is below DuplicationThreshold"),
                          cl::init(5));
 
+
+/// Only allow N level of predecessors to be walked.
+static cl::opt<unsigned>
+    PredecessorWalkThreshold("callsite-predecessor-walk-threshold", cl::Hidden,
+                             cl::desc("Only allow N number of predecessors to be"
+                                      "walked"),
+                             cl::init(16));
+
+
 static void addNonNullAttribute(CallSite CS, Value *Op) {
   unsigned ArgNo = 0;
   for (auto &I : CS.args()) {
     if (&*I == Op)
       CS.addParamAttr(ArgNo, Attribute::NonNull);
     ++ArgNo;
   }
 }
[... 52 lines not shown; enclosing context: if (isCondRelevantToAnyCallArgument(Cmp, CS)) ...]
                                      : Cmp->getInversePredicate()});
 }
 
 /// Record ICmp conditions relevant to any argument in CS following Pred's
 /// single predecessors. If there are conflicting conditions along a path, like
 /// x == 1 and x == 0, the first condition will be used.
 static void recordConditions(CallSite CS, BasicBlock *Pred,
                              ConditionsTy &Conditions) {
+  uint32_t PredWalked = 0;
   recordCondition(CS, Pred, CS.getInstruction()->getParent(), Conditions);
   BasicBlock *From = Pred;
   BasicBlock *To = Pred;
   SmallPtrSet<BasicBlock *, 4> Visited;
   while (!Visited.count(From->getSinglePredecessor()) &&
          (From = From->getSinglePredecessor())) {
     recordCondition(CS, From, To, Conditions);
     Visited.insert(From);
     To = From;
+    // Make sure we do not traverse too many predecessors up the CFG.
+    if (++PredWalked > PredecessorWalkThreshold)
+      break;
   }
 }
 
 static void addConditions(CallSite CS, const ConditionsTy &Conditions) {
   for (auto &Cond : Conditions) {
     Value *Arg = Cond.first->getOperand(0);
     Constant *ConstVal = cast<Constant>(Cond.first->getOperand(1));
     if (Cond.second == ICmpInst::ICMP_EQ)
[... 394 lines not shown ...]
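The loop shape in recordConditions can be modeled without LLVM to see how the cutoff behaves. The following is a sketch under stated assumptions, not the patch itself: `Block` and `walkWithThreshold` are hypothetical names, a block Id stands in for a recorded condition, and the threshold is passed as a plain parameter instead of being read from a `cl::opt`.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a basic block with at most one single predecessor.
struct Block {
  const Block *SinglePred = nullptr;
  int Id = 0;
};

// Mirrors the control flow of the patched loop: record Pred, then walk up the
// single predecessors, counting each step and breaking once the counter
// exceeds Threshold. Returns the Ids of the recorded blocks in visit order.
std::vector<int> walkWithThreshold(const Block *Pred, unsigned Threshold) {
  assert(Pred && "need a starting block");
  std::vector<int> Recorded{Pred->Id};
  unsigned PredWalked = 0;
  std::unordered_set<const Block *> Visited;
  const Block *From = Pred;
  while (!Visited.count(From->SinglePred) && (From = From->SinglePred)) {
    Recorded.push_back(From->Id);
    Visited.insert(From);
    // The counter is checked after the block has already been recorded.
    if (++PredWalked > Threshold)
      break;
  }
  return Recorded;
}
```

In this toy model the counter is tested after recording, so a threshold of T lets the walk record up to T + 1 single predecessors above Pred. With a threshold of 1, for example, the second predecessor is still recorded, which seems worth keeping in mind when choosing the threshold for a regression test.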