This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Enable tail calls for all Thumb1 targets.
Changes PlannedPublic

Authored by efriedma on Jul 17 2018, 7:14 PM.

Details

Summary

It isn't really profitable in general, but it's profitable in cases where there we don't need to spill LR and the callee is a function pointer.

We don't actually generate a tail-call until after isel: we can't tell whether it will be profitable at that point, so we delay the decision to a separate Thumb1TailCallOptimizer pass.

Depends on D49459.

Diff Detail

Repository
rL LLVM

Event Timeline

efriedma created this revision.Jul 17 2018, 7:14 PM
chill accepted this revision.Jul 20 2018, 3:31 AM

LGTM. test/CodeGen/Thumb/PR35481.ll needs an update though ( and a typo there CHECK-V47)

This revision is now accepted and ready to land.Jul 20 2018, 3:31 AM

Hello. This has code to prevent it from increasing codesize, right? I'm seeing the opposite happen, at least at -Oz. Perhaps because of an interplay between block layout or jump tables or something like it?

One of the examples I looked at is in csibe, with OpenTCP/smtp/smtp_client.c showing some increase.

I'm not sure how this could increase size; the only thing I can think of is that there's some bad interaction with tail merge. Maybe we need to do reverse transform earlier? (I'd appreciate a testcase, if you have one.)

efriedma planned changes to this revision.Jul 26 2018, 1:03 PM

Spent some time looking at David's testcases; looks like there are a few problems which lead to codesize increases:

  1. SupportsTailCall = true changes the result of ARMTargetLowering::mayBeEmittedAsTailCall, which makes CodeGenPrepare clone a bunch of "ret" instructions. The result should be cleaned up again by tail merge, but the default tail merge threshold is too high ("bl foo; pop {r4, pc}" is two instructions, and the default threshold is 3 instructions).
  2. The reverse transform in Thumb1FrameLowering::restoreCalleeSavedRegisters adds the wrong operands to the tPOP_RET instruction; it's based on the arguments to the tail call, not the returned values. As a consequence, tail merge thinks the instructions don't match. Also, there's a theoretical miscompile, but in practice the compiler probably won't insert code there anyway, so I'm not sure how to trigger it. Not sure how to fix this; I don't think we track the necessary information anywhere. (Tail merge could be fixed to merge instructions with mismatched implicit operands, which is probably a good idea anyway, but that wouldn't really solve the underlying incorrect modeling.)

So this needs more work.

efriedma updated this revision to Diff 159626.Aug 7 2018, 5:07 PM
efriedma edited the summary of this revision. (Show Details)

Updated to use a pass after isel to convert a call in tail position to a tail-call, rather than try to do the opposite in frame lowering. I'm not completely happy with this; the pass is a lot more code than I'd really like. But I don't have any better ideas.

Still need to look more at the CodeGenPrepare tail duplication issue; we might need to suppress the transform for Thumb1, even if that means missing some cases we could tail-call. And I probably need some more testcases.

This revision is now accepted and ready to land.Aug 7 2018, 5:07 PM
efriedma planned changes to this revision.Aug 30 2018, 2:06 PM
chill resigned from this revision.Jul 9 2019, 5:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2019, 5:41 AM