This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Target/ARM/
-
Target/
-
ARM/
-
Thumb1FrameLowering.cpp

Differential D29020

[ARM] Change TCReturn to tBL if tailcall optimization fails.
ClosedPublic

Authored by sanwou01 on Jan 23 2017, 5:23 AM.

Download Raw Diff

Details

Reviewers

rovka
rengolin
jmolloy
olista01

Commits

rGa9941857579f: [ARM] Change TCReturn to tBL if tailcall optimization fails.
rL294000: [ARM] Change TCReturn to tBL if tailcall optimization fails.

Summary

The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.

Diff Detail

Build Status

Buildable 3177
Build 3177: arc lint + arc unit

Event Timeline

sanwou01 created this revision.Jan 23 2017, 5:23 AM

Herald added a subscriber: aemerson. · View Herald TranscriptJan 23 2017, 5:23 AM

Tests in https://reviews.llvm.org/D29073 . FYI, I do not have commit access.

Hi,

The tests really need to be here. If anything happens, cross-reverting is a nightmare and can be really hard to work out if the commits are in different ranges.

If this fix is needed for the other patch, than the only way I can think of is for both to be on the same commit. If this is just an independent fix, than it needs tests with it.

cheers,
--renato

rengolin mentioned this in D29073: [ARM] Enable Cortex-M23 and Cortex-M33 support..Jan 24 2017, 3:22 AM

@rengolin sorry about that, now with tests.

Thanks!

Now, I'm trying to understand what the problem is.

It seems that a previous process to deal with tail calls missed a spot, and you're adding the fix on the last possible stage to just change it to a branch&link.

The idea seems fine, but I'm worried that the implementation could leave untested areas uncovered.

Not to mention that, if there is a process that deals with tail calls, the code should not leave that unchecked. Ie. there should be no TCRETURN* after it at all.

I don't remember well that part of the code, so I may be missing something, but it looks to me that there is a more encompassing solution that we're not seeing here.

Also, from the test alone, it's not clear what cases fail to be processed and what don't. Can you elaborate on the description of the review what was the problem you found, what was the approach and what cases you hope to have covered?

cheers,
--renato

In D29020#654783, @rengolin wrote:

Thanks!

Now, I'm trying to understand what the problem is.

It seems that a previous process to deal with tail calls missed a spot, and you're adding the fix on the last possible stage to just change it to a branch&link.

The idea seems fine, but I'm worried that the implementation could leave untested areas uncovered.

Not to mention that, if there is a process that deals with tail calls, the code should not leave that unchecked. Ie. there should be no TCRETURN* after it at all.

I don't remember well that part of the code, so I may be missing something, but it looks to me that there is a more encompassing solution that we're not seeing here.

Also, from the test alone, it's not clear what cases fail to be processed and what don't. Can you elaborate on the description of the review what was the problem you found, what was the approach and what cases you hope to have covered?

cheers,
--renato

The tail call optimisation is performed before register allocation, so at that point we don't know if LR is being spilt or not. If LR was spilt to the stack, then we cannot do a tail call optimisation. That would involve popping back into LR which is not possible in Thumb1 code.

To me, this seems like the logical place to catch this case.

If you're happy with that explanation, I can move it into the commit message. My apologies that is was missing; it took some digging to find the rationale for this bit of code.

In D29020#654923, @sanwou01 wrote:

The tail call optimisation is performed before register allocation, so at that point we don't know if LR is being spilt or not. If LR was spilt to the stack, then we cannot do a tail call optimisation. That would involve popping back into LR which is not possible in Thumb1 code.

Right, this is a better explanation, thanks!

My concern of this being here is that this is a method that restores registers saved by spillCalleeSavedRegisters, and it shouldn't be changing the return instruction.

Also, that loop is all about needing the POP instruction or not, so any code that is not POP related shouldn't be there.

I'm not familiar with the tail call machinery, so I can't recommend you a better place. I'm adding James and Diana who have worked around frame lowering more than I did. Feel free to include more people you know worked in the area, too.

cheers,
--renato

There's a big comment in ARMSubtarget.cpp (line ~200) explaining the problem with being unable to pop back into LR, but it seems to have fallen out of sync with the code because it claims that we don't do this optimisation. Could you update that comment to match the code?

test/CodeGen/ARM/v8m-tail-call.ll
3 ↗	(On Diff #85560)	I think these tests can be greatly simplified. For the first test case, this reproduces the bug: define void @test() { ; CHECK-LABEL: test: entry: %call = tail call i32 @foo() %tail = tail call i32 @foo() ret void ; CHECK: bl foo ; CHECK: bl foo ; CHECK-NOT: b foo } declare i32 @foo() There should also be CHECK-LABELs for each of the test functions.

Updated patch re @olista01 comments.

A few typos in the comment, otherwise LGTM.

lib/Target/ARM/ARMSubtarget.cpp
205 ↗	(On Diff #86650)	Typo: "an extra instructions"
207 ↗	(On Diff #86650)	Typo: generate generate

This revision is now accepted and ready to land.Feb 1 2017, 9:29 AM

Fixed the typos, thanks!

sanwou01 closed this revision.Feb 3 2017, 3:27 AM

efriedma mentioned this in D39599: [ARM] Fix incorrect conversion of a tail call to an ordinary call.Nov 13 2017, 12:13 PM

Revision Contents

Path

Size

lib/

Target/

ARM/

Thumb1FrameLowering.cpp

10 lines

Diff 85363

lib/Target/ARM/Thumb1FrameLowering.cpp

Show First 20 Lines • Show All 860 Lines • ▼ Show 20 Lines	for (unsigned i = CSI.size(); i != 0; --i) {
if (Reg == ARM::LR) {		if (Reg == ARM::LR) {
if (MBB.succ_empty()) {		if (MBB.succ_empty()) {
// Special epilogue for vararg functions. See emitEpilogue		// Special epilogue for vararg functions. See emitEpilogue
if (isVarArg)		if (isVarArg)
continue;		continue;
// ARMv4T requires BX, see emitEpilogue		// ARMv4T requires BX, see emitEpilogue
if (!STI.hasV5TOps())		if (!STI.hasV5TOps())
continue;		continue;
		// Tailcall optimization failed; change TCRETURN to a tBL
		if (MI->getOpcode() == ARM::TCRETURNdi \|\|
		MI->getOpcode() == ARM::TCRETURNri) {
		unsigned Opcode = MI->getOpcode() == ARM::TCRETURNdi
		? ARM::tBL : ARM::tBLXr;
		MachineInstrBuilder BL = BuildMI(MF, DL, TII.get(Opcode));
		BL.add(predOps(ARMCC::AL));
		BL.add(MI->getOperand(0));
		MBB.insert(MI, &*BL);
		}
Reg = ARM::PC;		Reg = ARM::PC;
(*MIB).setDesc(TII.get(ARM::tPOP_RET));		(*MIB).setDesc(TII.get(ARM::tPOP_RET));
if (MI != MBB.end())		if (MI != MBB.end())
MIB.copyImplicitOps(*MI);		MIB.copyImplicitOps(*MI);
MI = MBB.erase(MI);		MI = MBB.erase(MI);
} else		} else
// LR may only be popped into PC, as part of return sequence.		// LR may only be popped into PC, as part of return sequence.
// If this isn't the return sequence, we'll need emitPopSpecialFixUp		// If this isn't the return sequence, we'll need emitPopSpecialFixUp
Show All 15 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Change TCReturn to tBL if tailcall optimization fails.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 85363

lib/Target/ARM/Thumb1FrameLowering.cpp

[ARM] Change TCReturn to tBL if tailcall optimization fails.
ClosedPublic