This is an archive of the discontinued LLVM Phabricator instance.

[callsitesplit] Limit the # of predecessors walk when recording condition
Needs RevisionPublic

Authored by trentxintong on Jun 8 2018, 9:07 AM.

Details

Summary

We have some pathological cases (generated by machine) which
callsite splitting chokes on.

Diff Detail

Event Timeline

trentxintong created this revision.Jun 8 2018, 9:07 AM
fhahn added a comment.Jun 8 2018, 9:09 AM

Thanks for the patch! Could you add a test case for this too? It can be very simple, just set callsite-predecessor-walk-threshold=1 and have a usable condition at the 2nd predecessor

davide requested changes to this revision.Jun 8 2018, 9:34 AM
davide added a subscriber: davide.

It seems like this is trying to hide some algorithmic problem in the pass.
I think we should try to fix the pass instead. @trentxintong can you provide such cases?

The reason why I say this is that we sprinkled cutoffs over the optimizer and some of them have bitten us back. Given that CallSiteSplitting is a relatively new pass (and a clean/simple one), maybe there's something we can try before giving up.

This revision now requires changes to proceed.Jun 8 2018, 9:34 AM
fhahn added a comment.Jun 8 2018, 9:40 AM

The pass currently walks back the single predecessors of a call site and records conditions relevant to the call site. There are cases where doing so we end up visiting the same blocks over and over again. It would probably be better to do a single traversal of the CFG and record relevant conditions once. I have to think a bit more about it, but maybe PredicateInfo could be helpful here and allow us to avoid visiting predecessors.

davide added a comment.Jun 8 2018, 9:51 AM

The pass currently walks back the single predecessors of a call site and records conditions relevant to the call site. There are cases where doing so we end up visiting the same blocks over and over again. It would probably be better to do a single traversal of the CFG and record relevant conditions once. I have to think a bit more about it, but maybe PredicateInfo could be helpful here and allow us to avoid visiting predecessors.

Yes, that could be one solution. In general, I don't think that we should visit a block more than a constant amount of time to split the callsites.

@davide @fhahn I am sorry that I cant provide the source code. But one of the cases that resulted in chasing up the chain a lot I can see came from a sequence of object declarations & initializations (constructors which can throw). And they cascaded into a block with 2 predecessors.

Actually in this case we did not manage to find anything interesting to arguments, i.e. the terminator is an InvokeInst.

Before we fix the pass, do you think its reasonable to land this patch (with test cases provided) and a FIXME on the knob about this discussion/review so that we can remove it after the fundamental problem with the pass is addressed to help us and also possibly other people with similar situations.

I'll let @junbuml or @fhahn to make the call as they touched this more than me but I think we should see whether it's feasible to fix the algorithmic complexity (or understand why it's harder) before pushing this bandaid.

fhahn added a comment.Jun 14 2018, 2:50 PM

@davide @fhahn I am sorry that I cant provide the source code. But one of the cases that resulted in chasing up the chain a lot I can see came from a sequence of object declarations & initializations (constructors which can throw). And they cascaded into a block with 2 predecessors.

There is no need for (C/C++) source code. A simple LLVM IR test case in test/Transforms/CallSiteSplitting/ would be good.

Actually in this case we did not manage to find anything interesting to arguments, i.e. the terminator is an InvokeInst.

Before we fix the pass, do you think its reasonable to land this patch (with test cases provided) and a FIXME on the knob about this discussion/review so that we can remove it after the fundamental problem with the pass is addressed to help us and also possibly other people with similar situations.

So I had a look at using PredicateInfo for this. It does something very similar, but unfortunately I don't think we can just use it, as it only inserts copies in code paths dominated by the condition, whereas for callsitesplitting, the callsite is usually not dominated by the conditions. Only the predecessors we create when splitting are. However we should be able to use a similar approach to just traverse the CFG once.

I can look into that over the next few weeks. I am not sure if it is worth putting the fix in until then, but it could be valuable as a temporary fix for some cases.

fhahn added a comment.Jul 16 2018, 9:57 AM

Sorry for coming back to this just now. Unfortunately I won't have time to turn this into a forward-style analysis before the 7.0 branch. IMO it would be fine to get this in for the 7.0 branch, to limit compile time in edge cases, and creating a ticket to improve the situation. What do you think @davide ? In general, currently we are only supporting splitting a very limited set of call sites, so we would have to limit the forward-analysis to the relevant call sites, otherwise we end up doing (much) more work on average.

In any case, we should get still get a test case for the change.

@fhahn Thank you for doing this. This is not a blocking issue for us. But it would be nice to have it fixed (for us and the possibly other users of LLVM in general). If you and @davide agree we should do this before having a real fix. I can write a test and land this. Otherwise, I am fine waiting for the real fix.

fhahn added a comment.Nov 8 2018, 6:38 AM

@trentxintong I just updated D44627, which stops backtracking once we hit IDom(call site), because conditions further up won't be impacted by splitting. Any chance you could try this patch to see if it improves things on your test case?

Thanks for the change @fhahn. I will dig up my test case and run it.

fhahn added a comment.Jan 11 2019, 8:57 AM

Thanks for the change @fhahn. I will dig up my test case and run it.

@trentxintong did you manage to get any numbers for the test case?