Page MenuHomePhabricator

Fix fastcc/tailcallopt for ARM64
Needs ReviewPublic

Authored by Jiangning on May 6 2014, 8:34 PM.

Details

Reviewers
t.p.northover
Summary

ARM64 only supports sibling call, and it doesn't really support aggressive tail call optimization as AArch64 is able to do right now. This patch is to support aggressive tail call optimization for ARM64. There is a tracker at http://llvm.org/bugs/show_bug.cgi?id=19426 describing the issue, and this is blocking the merge from AArch64 to ARM64.

Originally ARM64 is using IsTailCall to represent sibling call. With this patch, IsTailCall is changed to be the original tail call eligibility meaning, and IsSibCall is added to describe the real sibling call.

Both sibling call and tail call optimizations use branch instruction to replace the call, and remove the return instruction after calling the callee, but the difference is sibling call doesn't break traditional ABI, i.e. not pop argument stack, while tail call does.

IsTailCall, TailCallOpt and IsSibCall have the following relationships,

  • IsTailCall and TailCallOpt are orthogonal.
  • IsTailCall means a callee isEligibleForTailCallOptimization().
  • TailCallOpt means command line option switch Options.GuaranteedTailCallOpt only.
  • IsSibCall = (!TailCallOpt && IsTailCall)

For Darwin, I'm not sure if we still need to guarantee 16-byte alignment, so the tests don't really cover arm64-darwin yet.

Thanks,
-Jiangning

Diff Detail

Event Timeline

Jiangning updated this revision to Diff 9138.May 6 2014, 8:34 PM
Jiangning retitled this revision from to Fix fastcc/tailcallopt for ARM64.
Jiangning updated this object.
Jiangning edited the test plan for this revision. (Show Details)
Jiangning added a reviewer: t.p.northover.
Jiangning added a subscriber: Unknown Object (MLST).
t.p.northover edited edge metadata.May 7 2014, 2:55 AM

Hi Jiangning,

Thanks very much for working on this! It's an interesting area, but a very tricky one. Mostly it looks OK, I've just got a few comments:

lib/Target/ARM64/ARM64FrameLowering.cpp
478

TCRETURN in the ARM64 world.

lib/Target/ARM64/ARM64ISelLowering.cpp
1769

Darwin definitely does need the stack to be aligned to 16 bytes: a segfault occurs if a misaligned SP is ever used.

2146–2148

Again, Darwin has the same constraints as other platforms here.

I also think this might be the wrong place to do this, on reflection. I'd like to try & keep NumBytes unaligned (it's mostly corrected for by a RoundUpToAlignment in ARM64FrameLowering::eliminateCallFramePseudoInstr and the more accurate value is useful to PrologEpilogInserter).

So perhaps it's best to call RoundUpToAlignment only where it's actually needed: line 2169 & 2390 by the looks of it. On the other hand, if it can be moved to eliminateCallFramePseudoInstr that would be even more consistent (that may make calculations impossible or unnecessarily difficult in ISelLowering though, so twist yourself into contortions trying it).

2236–2241

I think this is probably dodgy for byval arguments. Perhaps leave the big-endian changes for James to thrash out?

2247

There's some trailing spaces on this line (& others). Might be an idea to clang-format the patch if they're not obvious in your editor.

lib/Target/ARM64/ARM64InstrInfo.td
89

Couldn't this be a 2-argument node rather than a variadic one? You seem to have modified all uses.

jmolloy added a subscriber: jmolloy.May 7 2014, 6:35 AM
jmolloy added inline comments.
lib/Target/ARM64/ARM64ISelLowering.cpp
2236–2241

Yes, this looks like code that was in AArch64 but has been refactored for ARM64. It should all fall out in the wash as this code is now fully committed to ARM64 ToT.

Jiangning updated this revision to Diff 9202.May 8 2014, 2:13 AM
Jiangning edited edge metadata.

This new version of patch fixed the followings,

  1. Merged with TOT, and in particular, merged the logic around big-endian in LowerCall.
  2. Changed the argument of SDT_ARM64TCRET from -1 to 2.
  3. Fixed the 16-byte alignment requirement issue following Tim's feedback.
  4. The patch is formatted with tool git-clang-format.

Thanks,
-Jiangning

Hi Tim,

I updated with new patch and see my comments below.

Thanks,
-Jiangning

lib/Target/ARM64/ARM64FrameLowering.cpp
478

Tim, it should be TC_RETURN rather than TCRETURN, I think.

lib/Target/ARM64/ARM64ISelLowering.cpp
2146–2148

OK. I modified this and keep the raw stack size in NumBytes, and added RoundUpToAlignment to the locations you pointed out. I think you are correct.

I also think it's quite normal to calculate and keep the stack size info in LowerCall, because at this moment we can easily know the callee we are handling, and logic is interleaved with judging if it is a tailcall. Moving to eliminateCallFramePseudoInstr sounds not quite reasonable to me.

2236–2241

I merged with Jame's fix on TOT, but the logic should still be similar to this, and I think James almost kept the same logic here.

2247

I see. Sorry about that! Now I know I can use git-clang-format to clean up my patch. Thanks for your kindly reminder.

lib/Target/ARM64/ARM64InstrInfo.td
89

Yeah. I tried and it's amazing 2-argument works. Actually with this patch the TC_RETURN may have more arguments rather than 2, because addTokenForArgument will add dependence to register argument and RA will guarantee not to clobber them. Actually I don't quite understand why the pattern match can pass the argument check if I set argument to 2. But anyway, given that number 2 works, I get it changed in my new patch.

Jiangning updated this revision to Diff 9376.May 14 2014, 12:32 AM

No real code logic changes, and only make the following changes,

  1. Merged with trunk
  2. Reformat with git-clang-format

Hi Jiangning,

Sorry for the delay, I somehow missed the fact that you'd updated the patch. And then I somehow managed to close the tab I was doing the review in.

Anyway, I've just got a comment on a couple of asserts which probably need adjusting for ARM64. Other than that it looks fine to me. No need to post another version, I don't think.

Cheers.

Tim.

lib/Target/ARM64/ARM64FrameLowering.cpp
134

It looks like ARM64's emitFrameOffset can handle larger immediates (up to 24 bits easily, beyond that inefficiently). Same thing later.

Hi Tim,

Modified as you suggested and committed as r208837.

Thanks,
-Jiangning

2014-05-14 16:17 GMT+08:00 Tim Northover <t.p.northover@gmail.com>:

Hi Jiangning,

Sorry for the delay, I somehow missed the fact that you'd updated the
patch. And then I somehow managed to close the tab I was doing the review
in.

Anyway, I've just got a comment on a couple of asserts which probably need
adjusting for ARM64. Other than that it looks fine to me. No need to post
another version, I don't think.

Cheers.

Tim.

================
Comment at: lib/Target/ARM64/ARM64FrameLowering.cpp:134
@@ +133,3 @@
+ // it is a limitation that needs dealing with.
+ assert(Amount > -0xfff && Amount < 0xfff && "call frame too large");
+ emitFrameOffset(MBB, I, DL, ARM64::SP, ARM64::SP, Amount, TII);


It looks like ARM64's emitFrameOffset can handle larger immediates (up to
24 bits easily, beyond that inefficiently). Same thing later.

http://reviews.llvm.org/D3633