This is an archive of the discontinued LLVM Phabricator instance.

[CallSiteSplitting] properly split musttail calls
ClosedPublic

Authored by indutny on Feb 23 2018, 11:09 PM.

Details

Summary

musttail calls can't be naively splitted. The split blocks must
include not only the call instruction itself, but also (optional)
bitcast and return instructions that follow it.

Clone bitcast and ret, place them into the split blocks, and
remove the tail block when done.

Diff Detail

Repository
rL LLVM

Event Timeline

indutny created this revision.Feb 23 2018, 11:09 PM

A question: what is backport policy of LLVM project? I'd love to see this fix in 5.0.0 .

indutny updated this revision to Diff 135801.Feb 24 2018, 11:28 AM

Move condition to proper function.

indutny updated this revision to Diff 135803.Feb 24 2018, 12:01 PM
indutny retitled this revision from [CallSiteSplitting] do not split musttail calls to [CallSiteSplitting] properly split musttail calls.
indutny edited the summary of this revision. (Show Details)

Proper implementation!

indutny updated this revision to Diff 135804.Feb 24 2018, 12:08 PM

Fix non-musttail calls.

Thanks for catching this.

Different cloning using UnreachableInst could be a way to handle musttail. However, I prefer keeping the current way of splitting CS as it as. Instead, we may be able to handle musttail (clone bitcast/retrun and remove TailBB) at the end of splitCallSite() in a separate function.

lib/Transforms/Scalar/CallSiteSplitting.cpp
268 ↗(On Diff #135804)

Instead of cloning differently for musttail using UnreachableInst, I think it also possible to keep the current way of splitting as it is, but for musttail, we can handle it (clone bitcast and return and remove TailBB) as a separate function which being called at the end of this function.

358 ↗(On Diff #135804)

Here, I think we can call a separate function which handle the musttail case. So we can specifically clone bitcast and return and remove TailBB.

test/Transforms/CallSiteSplitting/musttail.ll
21 ↗(On Diff #135804)

Can we add two more test cases :

  1. return void .
  2. return without bitcast.
indutny updated this revision to Diff 135987.Feb 26 2018, 2:29 PM
indutny marked 3 inline comments as done.

Address review comments.

Hello!

Just wanted to say thank you for giving a thorough review! I appreciate it!

junbuml added inline comments.Feb 27 2018, 7:56 AM
lib/Transforms/Scalar/CallSiteSplitting.cpp
224 ↗(On Diff #135987)

You can change "if (PN != nulltr)" into "if (PN)".

Isn't it possible to set isVoid here?

236 ↗(On Diff #135987)

assert(RI != nullptr && -> assert(RI &&

240 ↗(On Diff #135987)

Although BI is no longer used, ++BI in assert doesn't seem to be a good idea. std::next(BI) should be better.

250 ↗(On Diff #135987)

It's not always true because we can have some instructions before the call-site split (e.g., callsite-instructions-before-call.ll).

254 ↗(On Diff #135987)

We don't need to do this in the loop.

258–267 ↗(On Diff #135987)

Look like we can make a separate function for cloning.

286 ↗(On Diff #135987)

if (RI) should be better.

414 ↗(On Diff #135987)

The original CallInst must be removed in above while loop (line 394).

indutny updated this revision to Diff 136101.Feb 27 2018, 10:37 AM
indutny marked 7 inline comments as done.

Address feedback.

LGTM.
I want to let Florian also review this as well.

lib/Transforms/Scalar/CallSiteSplitting.cpp
212 ↗(On Diff #136101)

Since this is only for Return/BitCast for musttail, it will be good to name it more specifically. Something like cloneReturnInstForMustTail ?

273 ↗(On Diff #136101)

It would be better if you can clean up DEBUG little bit more. I think DEBUG only for BitCast and Return cloned is good enough.

indutny updated this revision to Diff 136147.Feb 27 2018, 1:40 PM
indutny marked 3 inline comments as done.

Rename function, and remove extra debug calls.

fhahn added a comment.Feb 28 2018, 1:54 AM

Thanks for the patch. I think it would be simpler (and less code) to integrate musttail handling directly in splitCallSite, rather than patching up afterwards. See my inline comments and please let me know what you think/if I am missing something.

lib/Transforms/Scalar/CallSiteSplitting.cpp
376 ↗(On Diff #136147)

For musttail calls, I think we could clone the bitcast & ret instructions here and move them after NewCi. Instead of adding the result to the Call phi node, we just have to updated the cloned bitcast/ret instruction to use NewCI instead of the original one.

379 ↗(On Diff #136147)

For musttail calls, we should be able to just delete the BB once we handled all predecessors and we are done.

indutny added a comment.EditedFeb 28 2018, 12:38 PM

Hello, @fhahn !

Thank you for taking a look at it. I'll be glad to make the changes. Just one question before I start: it seems that the suggestions of @junbuml and you are to some extent contradictory. At first I was suggested to leave the function as it is, and now the suggestion is to move things back to it. Given this, what do we actually desire it to look like? Let's decide on this before making any additional changes.

Thanks for your time,
Fedor.

fhahn added a comment.Feb 28 2018, 1:23 PM

Hello, @fhahn !

Thank you for taking a look at it. I'll be glad to make the changes. Just one question before I start: it seems that the suggestions of @junbuml and you are to some extent contradictory. At first I was suggested to leave the function as it is, and now the suggestion is to move things back to it. Given this, what do we actually desire it to look like? Let's decide on this before making any additional changes.

Ah I missed the full history, but had a look from the start now. I think the initial patch was slightly different to what I outlined. With my suggestion, the only changes to the main loop of splitCallSite would be cloning and moving the bitcast/ret instructions for musttail calls to the newly created predecessor blocks (this should be done in a separate function). And then we can just erase the TailBB, if it does not have any predecessors any longer (which should always the case ATM for musttail calls, as we split all predecessors for now).

If that works as I think, it should be just a couple of lines changed in splitCallSite and we handle musttail directly, rather than after the fact, which IMO makes it easier to see what's going on. I might be missing something though. Jun, what do you think?

What I didn't want to have initially was creating unreachable to rely on DuplicateInstructionsInSplitBetween to clone BitCast/Ret. As a way to avoid that, I suggested to separate handling musttail from existing cloning. I think what Florian suggested could make fixMustTailCallSite even simpler. If you make fixMustTailCallSite to handle each pred split instead of handling two preds splits, I think you can call the modified fixMustTailCallSite instead of doing CallPN->addIncoming() in splitCallSite();

Sounds like a plan! Will work on it today. Thank you, guys!

indutny updated this revision to Diff 136409.Feb 28 2018, 3:28 PM
indutny marked 2 inline comments as done.

Address feedback.

There's a huge FIXME in the code that depends on @fhahn's patch: https://reviews.llvm.org/D43822 . Will remove it as soon as that patch lands. Hopefully it won't deter you from reviewing it ;)

indutny edited the summary of this revision. (Show Details)Feb 28 2018, 3:43 PM
fhahn added inline comments.Mar 1 2018, 10:43 AM
lib/Transforms/Scalar/CallSiteSplitting.cpp
338 ↗(On Diff #136409)

We should only do that for musttail calls, otherwise we crash.

I think we hit a slightly different problem here that won't be fixed by D43822. A different iterator gets invalidated. I'll have a closer look tomorrow.

indutny added inline comments.Mar 1 2018, 12:35 PM
lib/Transforms/Scalar/CallSiteSplitting.cpp
338 ↗(On Diff #136409)

Ooops. I'll take a look at it today, hopefully this won't be in conflict with your work since you plan to start tomorrow.

indutny updated this revision to Diff 136678.Mar 1 2018, 10:29 PM
indutny marked 2 inline comments as done.

Proper fixup for pending clone issue.

fhahn added a comment.Mar 2 2018, 7:45 AM

Thanks for updating! I left a few more comments, but it looks good overall. I just left a few thoughts about structuring the code to deal with musttail in splitCallSite. Sorry if I was not too clear earlier. Anyways, those are mostly my personal preference. Also, I think doing the deletion after the loop is totally fine and avoids problems.

lib/Transforms/Scalar/CallSiteSplitting.cpp
215 ↗(On Diff #136678)

I don't think we need to do that.

278 ↗(On Diff #136678)

Personally I would just move the code in this block and the code for the musttail handling in the main loop to a single function, named something like addMusttailReturn. It means we might end up doing a bit of additional work, but it keeps splitCallSite simpler.

283 ↗(On Diff #136678)

Not sure if we need those assertions here. If anything is wrong here, then it should have been invalid earlier too.

289 ↗(On Diff #136678)

Not sure if we need those assertions here. If anything is wrong here, then it should have been invalid earlier too.

327 ↗(On Diff #136678)

I think it would be slightly better to have that code a function addMusttailReturn or something.

333 ↗(On Diff #136678)

I don't think D43822 fixes the problem. For now I think it's OK to just remove the terminators after the main loop (and drop the fixme).

350 ↗(On Diff #136678)

Please merge that with the block that deletes the terminators. I would drop the fixme and add a comment with a note that we delete it after we split all predecessors to avoid iterator invalidation.

indutny updated this revision to Diff 136785.Mar 2 2018, 10:19 AM
indutny marked 7 inline comments as done.

Address feedback.


This should do it. Thank you for reviewing this!

lib/Transforms/Scalar/CallSiteSplitting.cpp
215 ↗(On Diff #136678)

We need it for fancy names in output IR. Those %ca1, %ca2 come out because of it.

fhahn accepted this revision.Mar 2 2018, 11:00 AM

Great thanks! LGTM. Just 2 small nits that can be addressed before committing, without additional review

lib/Transforms/Scalar/CallSiteSplitting.cpp
303 ↗(On Diff #136785)

blokcs -> blocks

353 ↗(On Diff #136785)

range based for loop?

This revision is now accepted and ready to land.Mar 2 2018, 11:00 AM
indutny updated this revision to Diff 136810.Mar 2 2018, 11:05 AM
indutny marked 3 inline comments as done.

Address feedback.


Done. Thanks for taking a look! Will land it later today unless there's more to change!

fhahn added a comment.Mar 2 2018, 11:11 AM

Great. Please wait a bit with committing so @junbuml has a chance to raise any additional concerns.

This revision was automatically updated to reflect the committed changes.

Thank you, everyone!

Sorry for my late response. I was off for past few days. Thanks for working on this.