This is an archive of the discontinued LLVM Phabricator instance.

Differential D29020

[ARM] Change TCReturn to tBL if tailcall optimization fails.
ClosedPublic

Authored by sanwou01 on Jan 23 2017, 5:23 AM.

Details

Reviewers

rovka
rengolin
jmolloy
olista01

Commits

rGa9941857579f: [ARM] Change TCReturn to tBL if tailcall optimization fails.
rL294000: [ARM] Change TCReturn to tBL if tailcall optimization fails.

Summary

The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.
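
The reasoning in this summary can be sketched as a small standalone model. This is not LLVM code: `Terminator`, `planEpilogue`, and `checkDemotion` are hypothetical names, the output strings are only assembly-like, and the model assumes LR is the only spilt register and ignores the vararg and ARMv4T special cases that the real patch handles.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the block terminators discussed in the review.
enum class Terminator { Return, TailCallDirect, TailCallIndirect };

// Toy model of the end of a Thumb1 function. `Callee` is a symbol for a
// direct call or a register name for an indirect one.
std::vector<std::string> planEpilogue(Terminator Term, bool LRSpilled,
                                      const std::string &Callee) {
  std::vector<std::string> Out;
  if (!LRSpilled) {
    // LR still holds the return address, so a tail call can stay a branch.
    switch (Term) {
    case Terminator::TailCallDirect:   Out.push_back("b " + Callee); break;
    case Terminator::TailCallIndirect: Out.push_back("bx " + Callee); break;
    case Terminator::Return:           Out.push_back("bx lr"); break;
    }
    return Out;
  }
  // LR was spilt. Thumb1 POP can restore PC but not LR, so a tail call is
  // demoted to an ordinary call followed by a return through POP {pc}.
  if (Term == Terminator::TailCallDirect)
    Out.push_back("bl " + Callee);
  else if (Term == Terminator::TailCallIndirect)
    Out.push_back("blx " + Callee);
  Out.push_back("pop {pc}");
  return Out;
}

// Usage: a direct tail call in a function that spilt LR becomes a call.
inline void checkDemotion() {
  assert(planEpilogue(Terminator::TailCallDirect, true, "foo").front() ==
         "bl foo");
}
```

The point of the model is only the branch on `LRSpilled`: that fact is unknown when the tail call is first formed, which is why the decision has to be revisited after register allocation.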

Diff Detail

Build Status

Buildable 3495
Build 3495: arc lint + arc unit

Event Timeline

sanwou01 created this revision. Jan 23 2017, 5:23 AM

Herald added a subscriber: aemerson. Jan 23 2017, 5:23 AM

Tests in https://reviews.llvm.org/D29073 . FYI, I do not have commit access.

Hi,

The tests really need to be here. If anything happens, cross-reverting is a nightmare and can be really hard to work out if the commits are in different ranges.

If this fix is needed for the other patch, then the only way I can think of is for both to be in the same commit. If this is just an independent fix, then it needs tests with it.

cheers,
--renato

rengolin mentioned this in D29073: [ARM] Enable Cortex-M23 and Cortex-M33 support. Jan 24 2017, 3:22 AM

@rengolin sorry about that, now with tests.

Thanks!

Now, I'm trying to understand what the problem is.

It seems that a previous process to deal with tail calls missed a spot, and you're adding the fix on the last possible stage to just change it to a branch&link.

The idea seems fine, but I'm worried that the implementation could leave untested areas uncovered.

Not to mention that, if there is a process that deals with tail calls, the code should not leave that unchecked. I.e., there should be no TCRETURN* after it at all.

I don't remember that part of the code well, so I may be missing something, but it looks to me like there is a more encompassing solution that we're not seeing here.

Also, from the test alone, it's not clear which cases fail to be processed and which don't. Can you elaborate in the review description on what problem you found, what approach you took, and which cases you hope to have covered?

cheers,
--renato

The tail call optimisation is performed before register allocation, so at that point we don't know if LR is being spilt or not. If LR was spilt to the stack, then we cannot do a tail call optimisation. That would involve popping back into LR which is not possible in Thumb1 code.

To me, this seems like the logical place to catch this case.

If you're happy with that explanation, I can move it into the commit message. My apologies that it was missing; it took some digging to find the rationale for this bit of code.

In D29020#654923, @sanwou01 wrote:

The tail call optimisation is performed before register allocation, so at that point we don't know if LR is being spilt or not. If LR was spilt to the stack, then we cannot do a tail call optimisation. That would involve popping back into LR which is not possible in Thumb1 code.

Right, this is a better explanation, thanks!

My concern with this being here is that this is a method that restores registers saved by spillCalleeSavedRegisters, and it shouldn't be changing the return instruction.

Also, that loop is all about whether the POP instruction is needed or not, so any code that is not POP related shouldn't be there.

I'm not familiar with the tail call machinery, so I can't recommend a better place. I'm adding James and Diana, who have worked around frame lowering more than I have. Feel free to include more people you know have worked in the area, too.

cheers,
--renato

There's a big comment in ARMSubtarget.cpp (line ~200) explaining the problem with being unable to pop back into LR, but it seems to have fallen out of sync with the code because it claims that we don't do this optimisation. Could you update that comment to match the code?

test/CodeGen/ARM/v8m-tail-call.ll
4	I think these tests can be greatly simplified. For the first test case, this reproduces the bug:

    define void @test() {
    ; CHECK-LABEL: test:
    entry:
      %call = tail call i32 @foo()
      %tail = tail call i32 @foo()
      ret void
    ; CHECK: bl foo
    ; CHECK: bl foo
    ; CHECK-NOT: b foo
    }

    declare i32 @foo()

There should also be CHECK-LABELs for each of the test functions.

Updated patch re @olista01 comments.

A few typos in the comment, otherwise LGTM.

lib/Target/ARM/ARMSubtarget.cpp
205	Typo: "an extra instructions"
207	Typo: generate generate

This revision is now accepted and ready to land. Feb 1 2017, 9:29 AM

Fixed the typos, thanks!

sanwou01 closed this revision. Feb 3 2017, 3:27 AM

efriedma mentioned this in D39599: [ARM] Fix incorrect conversion of a tail call to an ordinary call. Nov 13 2017, 12:13 PM

Revision Contents

Path                                      Size
lib/Target/ARM/ARMSubtarget.cpp           12 lines
lib/Target/ARM/Thumb1FrameLowering.cpp    10 lines
test/CodeGen/ARM/v8m-tail-call.ll         23 lines

Diff 86663

lib/Target/ARM/ARMSubtarget.cpp

[... 194 lines hidden ...]
   if (isTargetNaCl() || isAAPCS16_ABI())
     stackAlignment = 16;

   // FIXME: Completely disable sibcall for Thumb1 since ThumbRegisterInfo::
   // emitEpilogue is not ready for them. Thumb tail calls also use t2B, as
   // the Thumb1 16-bit unconditional branch doesn't have sufficient relocation
   // support in the assembler and linker to be used. This would need to be
   // fixed to fully support tail calls in Thumb1.
   //
-  // Doing this is tricky, since the LDM/POP instruction on Thumb doesn't take
-  // LR. This means if we need to reload LR, it takes an extra instructions,
-  // which outweighs the value of the tail call; but here we don't know yet
-  // whether LR is going to be used. Probably the right approach is to
-  // generate the tail call here and turn it back into CALL/RET in
-  // emitEpilogue if LR is used.
+  // For ARMv8-M, we /do/ implement tail calls. Doing this is tricky for v8-M
+  // baseline, since the LDM/POP instruction on Thumb doesn't take LR. This
+  // means if we need to reload LR, it takes extra instructions, which outweighs
+  // the value of the tail call; but here we don't know yet whether LR is going
+  // to be used. We generate the tail call here and turn it back into CALL/RET
+  // in emitEpilogue if LR is used.

   // Thumb1 PIC calls to external symbols use BX, so they can be tail calls,
   // but we need to make sure there are enough registers; the only valid
   // registers are the 4 used for parameters. We don't currently do this
   // case.

   SupportsTailCall = !isThumb() || hasV8MBaselineOps();

[... 166 lines hidden ...]
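
The `SupportsTailCall` assignment at the end of this hunk reduces to a two-input predicate. A minimal standalone sketch, in which `SubtargetModel` and `supportsTailCall` are hypothetical stand-ins for the real `ARMSubtarget` queries and are not part of LLVM:

```cpp
// Hypothetical stand-in for the two ARMSubtarget queries used in the hunk.
struct SubtargetModel {
  bool IsThumb;
  bool HasV8MBaselineOps;
};

// Mirrors `SupportsTailCall = !isThumb() || hasV8MBaselineOps();`.
constexpr bool supportsTailCall(SubtargetModel S) {
  return !S.IsThumb || S.HasV8MBaselineOps;
}

// The only combination that disables tail calls is Thumb without the
// v8-M baseline operations.
static_assert(supportsTailCall({false, false}), "ARM mode");
static_assert(supportsTailCall({false, true}), "ARM mode");
static_assert(!supportsTailCall({true, false}), "Thumb, no v8-M baseline ops");
static_assert(supportsTailCall({true, true}), "Thumb with v8-M baseline ops");
```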

lib/Target/ARM/Thumb1FrameLowering.cpp

[... 860 lines hidden; enclosing context: for (unsigned i = CSI.size(); i != 0; --i) { ...]
     if (Reg == ARM::LR) {
       if (MBB.succ_empty()) {
         // Special epilogue for vararg functions. See emitEpilogue
         if (isVarArg)
           continue;
         // ARMv4T requires BX, see emitEpilogue
         if (!STI.hasV5TOps())
           continue;
+        // Tailcall optimization failed; change TCRETURN to a tBL
+        if (MI->getOpcode() == ARM::TCRETURNdi ||
+            MI->getOpcode() == ARM::TCRETURNri) {
+          unsigned Opcode = MI->getOpcode() == ARM::TCRETURNdi
+                            ? ARM::tBL : ARM::tBLXr;
+          MachineInstrBuilder BL = BuildMI(MF, DL, TII.get(Opcode));
+          BL.add(predOps(ARMCC::AL));
+          BL.add(MI->getOperand(0));
+          MBB.insert(MI, &*BL);
+        }
         Reg = ARM::PC;
         (*MIB).setDesc(TII.get(ARM::tPOP_RET));
         if (MI != MBB.end())
           MIB.copyImplicitOps(*MI);
         MI = MBB.erase(MI);
       } else
         // LR may only be popped into PC, as part of return sequence.
         // If this isn't the return sequence, we'll need emitPopSpecialFixUp
[... 15 lines hidden ...]
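
The added lines choose a call opcode from the tail-call pseudo they replace. That mapping can be sketched in isolation as follows. The `Op` enum and `callForFailedTailCall` are hypothetical: the enumerator names echo the ARM backend opcodes in the hunk, but this is not the backend's real opcode type, and `Op::NoCall` is an illustrative placeholder for "insert nothing".

```cpp
// Hypothetical opcode enum; names echo the opcodes that appear in the hunk.
enum class Op { TCRETURNdi, TCRETURNri, tBL, tBLXr, tPOP_RET, NoCall };

// Returns the call opcode that stands in for a failed tail call, or
// Op::NoCall when the terminator is not a tail-call pseudo.
constexpr Op callForFailedTailCall(Op Terminator) {
  return Terminator == Op::TCRETURNdi   ? Op::tBL    // direct call
         : Terminator == Op::TCRETURNri ? Op::tBLXr  // call through register
                                        : Op::NoCall;
}

static_assert(callForFailedTailCall(Op::TCRETURNdi) == Op::tBL, "direct");
static_assert(callForFailedTailCall(Op::TCRETURNri) == Op::tBLXr, "indirect");
```

In the patch itself the new call is inserted before the terminator, which is then erased once the POP has been turned into `tPOP_RET`; the sketch covers only the opcode choice.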

test/CodeGen/ARM/v8m-tail-call.ll

This file was added.

; RUN: llc %s -o - -mtriple=thumbv8m.base | FileCheck %s

define void @test() {
; CHECK-LABEL: test:
entry:
  %call = tail call i32 @foo()
  %tail = tail call i32 @foo()
  ret void
; CHECK: bl foo
; CHECK: bl foo
; CHECK-NOT: b foo
}

define void @test2() {
; CHECK-LABEL: test2:
entry:
  %tail = tail call i32 @foo()
  ret void
; CHECK: b foo
; CHECK-NOT: bl foo
}

declare i32 @foo()
