This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Change TCReturn to tBL if tailcall optimization fails.
ClosedPublic

Authored by sanwou01 on Jan 23 2017, 5:23 AM.

Details

Summary

The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.

Event Timeline

sanwou01 created this revision.Jan 23 2017, 5:23 AM

Tests in https://reviews.llvm.org/D29073 . FYI, I do not have commit access.

rengolin edited edge metadata.Jan 24 2017, 3:11 AM

Hi,

The tests really need to be here. If anything happens, cross-reverting is a nightmare and can be really hard to work out if the commits are in different ranges.

If this fix is needed for the other patch, than the only way I can think of is for both to be on the same commit. If this is just an independent fix, than it needs tests with it.

cheers,
--renato

sanwou01 updated this revision to Diff 85560.Jan 24 2017, 4:42 AM

@rengolin sorry about that, now with tests.

Thanks!

Now, I'm trying to understand what the problem is.

It seems that a previous process to deal with tail calls missed a spot, and you're adding the fix on the last possible stage to just change it to a branch&link.

The idea seems fine, but I'm worried that the implementation could leave untested areas uncovered.

Not to mention that, if there is a process that deals with tail calls, the code should not leave that unchecked. Ie. there should be no TCRETURN* after it at all.

I don't remember well that part of the code, so I may be missing something, but it looks to me that there is a more encompassing solution that we're not seeing here.

Also, from the test alone, it's not clear what cases fail to be processed and what don't. Can you elaborate on the description of the review what was the problem you found, what was the approach and what cases you hope to have covered?

cheers,
--renato

Thanks!

Now, I'm trying to understand what the problem is.

It seems that a previous process to deal with tail calls missed a spot, and you're adding the fix on the last possible stage to just change it to a branch&link.

The idea seems fine, but I'm worried that the implementation could leave untested areas uncovered.

Not to mention that, if there is a process that deals with tail calls, the code should not leave that unchecked. Ie. there should be no TCRETURN* after it at all.

I don't remember well that part of the code, so I may be missing something, but it looks to me that there is a more encompassing solution that we're not seeing here.

Also, from the test alone, it's not clear what cases fail to be processed and what don't. Can you elaborate on the description of the review what was the problem you found, what was the approach and what cases you hope to have covered?

cheers,
--renato

The tail call optimisation is performed before register allocation, so at that point we don't know if LR is being spilt or not. If LR was spilt to the stack, then we cannot do a tail call optimisation. That would involve popping back into LR which is not possible in Thumb1 code.

To me, this seems like the logical place to catch this case.

If you're happy with that explanation, I can move it into the commit message. My apologies that is was missing; it took some digging to find the rationale for this bit of code.

The tail call optimisation is performed before register allocation, so at that point we don't know if LR is being spilt or not. If LR was spilt to the stack, then we cannot do a tail call optimisation. That would involve popping back into LR which is not possible in Thumb1 code.

Right, this is a better explanation, thanks!

My concern of this being here is that this is a method that restores registers saved by spillCalleeSavedRegisters, and it shouldn't be changing the return instruction.

Also, that loop is all about needing the POP instruction or not, so any code that is not POP related shouldn't be there.

I'm not familiar with the tail call machinery, so I can't recommend you a better place. I'm adding James and Diana who have worked around frame lowering more than I did. Feel free to include more people you know worked in the area, too.

cheers,
--renato

olista01 edited edge metadata.Feb 1 2017, 5:39 AM

There's a big comment in ARMSubtarget.cpp (line ~200) explaining the problem with being unable to pop back into LR, but it seems to have fallen out of sync with the code because it claims that we don't do this optimisation. Could you update that comment to match the code?

test/CodeGen/ARM/v8m-tail-call.ll
4

I think these tests can be greatly simplified. For the first test case, this reproduces the bug:

define void @test() {
; CHECK-LABEL: test:
entry:
  %call = tail call i32 @foo()
  %tail = tail call i32 @foo()
  ret void
; CHECK: bl foo
; CHECK: bl foo
; CHECK-NOT: b foo
}
declare i32 @foo()

There should also be CHECK-LABELs for each of the test functions.

sanwou01 updated this revision to Diff 86650.Feb 1 2017, 9:27 AM
sanwou01 edited the summary of this revision. (Show Details)

Updated patch re @olista01 comments.

olista01 accepted this revision.Feb 1 2017, 9:29 AM

A few typos in the comment, otherwise LGTM.

lib/Target/ARM/ARMSubtarget.cpp
205

Typo: "an extra instructions"

207

Typo: generate generate

This revision is now accepted and ready to land.Feb 1 2017, 9:29 AM
sanwou01 updated this revision to Diff 86663.Feb 1 2017, 10:08 AM
sanwou01 marked 3 inline comments as done.

Fixed the typos, thanks!

sanwou01 closed this revision.Feb 3 2017, 3:27 AM